WU's Not Being Assigned by 171.67.108.102/171.67.108.105/?

Moderators: Site Moderators, FAHC Science Team

TLS2000
Posts: 2
Joined: Fri May 12, 2017 12:53 am

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by TLS2000 »

Seems to have improved a lot in the last 24 hours. I'm still getting the occasional WU not assigned, but it's nowhere near as bad as it was.
JimF
Posts: 652
Joined: Thu Jan 21, 2010 2:03 pm

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by JimF »

Nert wrote:This whole episode is sad and disrespectful to the people that contribute to this project. Two questions come to mind:

1) Why do the volunteer contributors have a sense of urgency while those responsible for the project do not?

2) These problems ALWAYS seem to happen over holiday weekends. Is everything so fragile that it fails when no one is there to hand-hold the systems and keep them running?
Good questions. The information flow on this project is all downhill. The purpose of the moderators (helpful though they may be in many cases) is to shield the developers from problems rather than feeding information back to them. These are not new issues (and a lot of others not apparent at the moment). They have been going on for years. PG's usual response is to start a new public relations campaign to make up for the people who leave.
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Aurum »

Joe_H wrote:I have heard back that it is being looked into, but nothing further to post. The first reports came in on a Friday evening and were reported to PG on Saturday morning. This is a relatively major holiday weekend, so limited staff would be available to work on this.
When I was a graduate student I did NOT get holidays. When I worked at Intel I was on-call 24x7. I bet they could even remedy this remotely. I notified Pande and Chodera and have heard nothing :?: :?: :?:
Please notify us when the servers are working reliably so I can move my rigs back to F@H. I'm down below 20% of my capacity and if they don't all have WUs when I get home today I'll move the last of them to another project.
In Science We Trust
JimF
Posts: 652
Joined: Thu Jan 21, 2010 2:03 pm

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by JimF »

Adam A. Wanderer wrote: As sad as these developments are, I'll stick with F@H. There's just no other project that does the work F@H does. And, F@H has improved over the years, I hope it'll continue to do so.
That is a good choice for their science, which I think is quite good too (though not being an expert, I can't prove it). I will check back by the end of the year to see if any problems are resolved. Given their usual rate of progress, that should be sufficient.
JonasTheMovie
Posts: 88
Joined: Wed Jan 06, 2016 4:16 am
Location: Northern Sweden

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by JonasTheMovie »

I see the same problem with slots stalling; glad to see I'm not alone, if you read me right.

But I have to ask, what is the main problem here?
That unresponsive project owners stall contributors' slots when those slots happen to be directed to their projects/servers, or
That the client does not recognize a stall after multiple failed attempts to download a new assignment, and so does not move on to another project?

Each time this has happened, a reboot has "solved" my problem: a new WU has downloaded and has been processing for a day, until I happen to come upon a problematic server again.
Since the reboot helps, that tells me that the client should be able to recognize the problem and go on to the next project/server.
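
The fallback behavior asked for above can be sketched roughly as follows. This is only an illustration in Python, not the FAHClient's actual logic: `request_work`, `try_assign`, `max_failures`, and the host names are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def request_work(
    servers: Sequence[str],
    try_assign: Callable[[str], Optional[str]],
    max_failures: int = 3,
) -> Optional[str]:
    """Return the first work unit obtained, skipping servers that keep failing."""
    for server in servers:
        for _attempt in range(max_failures):
            work_unit = try_assign(server)
            if work_unit is not None:
                return work_unit
        # This server failed max_failures times in a row, so try the next one
        # instead of retrying it forever.
    return None


def fake_assign(server: str) -> Optional[str]:
    # Hypothetical stand-in for a network request: one server never has work.
    if server == "stalled.example":
        return None
    return f"WU from {server}"


result = request_work(["stalled.example", "healthy.example"], fake_assign)
print(result)
```

The point of the sketch is the bounded inner loop: after a fixed number of failed attempts the slot gives up on that server and moves on, which is roughly what a reboot seems to achieve by hand.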
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by foldy »

Adam A. Wanderer wrote:
Nert wrote:Was any form of "hacking" or a virus involved?
I hope the Stanford IT is robust and has many backups so if those things happen they can recover from it.
For the donors the worst case is the servers don't work for some days. For the science the worst case would be if the folding results are lost or corrupted.
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Aurum »

boristsybin wrote:
Serge_Grenier wrote:Seems <client-type v='beta'/> is working to get WUs since yesterday.
seems it works
Good idea. I'll try it when I get home.
I used to use client-type v='advanced' to try to send the biggest jobs to my best rigs but it did not seem to have any effect so I deleted them.
In Science We Trust
rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by rwh202 »

SteveWillis wrote:I should mention that my older machine has also not had any problem at all. Only my newer machine had the problem. I mentioned it earlier but didn't bother to include my log.

Code:

*********************** Log Started 2017-05-29T23:18:46Z ***********************
23:18:46:************************* Folding@home Client *************************
23:18:46:    Website: http://folding.stanford.edu/
23:18:46:  Copyright: (c) 2009-2014 Stanford University
23:18:46:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:18:46:       Args: --child --lifeline 1895 /etc/fahclient/config.xml --run-as
23:18:46:             fahclient --pid-file=/var/run/fahclient.pid --daemon
23:18:46:     Config: /etc/fahclient/config.xml
23:18:46:******************************** Build ********************************
23:18:46:    Version: 7.4.4
23:18:46:       Date: Mar 4 2014
23:18:46:       Time: 12:02:38
23:18:46:    SVN Rev: 4130
23:18:46:     Branch: fah/trunk/client
23:18:46:   Compiler: GNU 4.4.7
23:18:46:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
23:18:46:             -fno-unsafe-math-optimizations -msse2
23:18:46:   Platform: linux2 3.2.0-1-amd64
23:18:46:       Bits: 64
23:18:46:       Mode: Release
23:18:46:******************************* System ********************************
23:18:46:        CPU: AMD FX(tm)-8320 Eight-Core Processor
23:18:46:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
23:18:46:       CPUs: 8
23:18:46:     Memory: 31.32GiB
23:18:46:Free Memory: 30.66GiB
23:18:46:    Threads: POSIX_THREADS
23:18:46: OS Version: 3.19
23:18:46:Has Battery: false
23:18:46: On Battery: false
23:18:46: UTC Offset: -5
23:18:46:        PID: 1897
23:18:46:        CWD: /var/lib/fahclient
23:18:46:         OS: Linux 3.19.0-32-generic x86_64
23:18:46:    OS Arch: AMD64
23:18:46:       GPUs: 6
23:18:46:      GPU 0: NVIDIA:7 GP104 [GeForce GTX 1080] 8873
23:18:46:      GPU 1: UNSUPPORTED: NV3 [PCI]
23:18:46:      GPU 2: NVIDIA:7 GP104 [GeForce GTX 1080] 8873
23:18:46:      GPU 3: UNSUPPORTED: NV3 [PCI]
23:18:46:      GPU 4: NVIDIA:7 GP104 [GeForce GTX 1080] 8873
23:18:46:      GPU 5: UNSUPPORTED: NV3 [PCI]
23:18:46:       CUDA: 6.1
23:18:46:CUDA Driver: 8000
23:18:46:***********************************************************************
23:18:46:<config>
23:18:46:  <!-- Client Control -->
23:18:46:  <fold-anon v='true'/>
23:18:46:
23:18:46:  <!-- Folding Core -->
23:18:46:  <checkpoint v='30'/>
23:18:46:
23:18:46:  <!-- Folding Slot Configuration -->
23:18:46:  <cause v='HUNTINGTONS'/>
23:18:46:
23:18:46:  <!-- Network -->
23:18:46:  <proxy v=':8080'/>
23:18:46:
23:18:46:  <!-- Slot Control -->
23:18:46:  <power v='full'/>
23:18:46:
23:18:46:  <!-- User Information -->
23:18:46:  <passkey v='********************************'/>
23:18:46:  <team v='224497'/>
23:18:46:  <user v='DarthMouse_ALL_1GD5nCZbh7gNo1SESPLT24xEd2Jsu4rTP9'/>
23:18:46:
23:18:46:  <!-- Work Unit Control -->
23:18:46:  <next-unit-percentage v='100'/>
23:18:46:
23:18:46:  <!-- Folding Slots -->
23:18:46:  <slot id='0' type='GPU'/>
23:18:46:  <slot id='1' type='GPU'/>
23:18:46:  <slot id='2' type='GPU'/>
23:18:46:</config>

Is this the one that works? If so, I think it could be the <cause v='HUNTINGTONS'/>

I've added that flag and got work straight away on 3 different rigs. I'm guessing that this flag (and others, like beta) gives you preferential referral to non-affected WorkServers.
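
For anyone who would rather script the change than click through FAHControl, here is a minimal sketch in Python that adds or updates an option element in a config of the shape shown in the log above. The helper name `set_option` and the sample string are made up for illustration, and editing config.xml by hand (presumably with the client stopped) is an assumption, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET


def set_option(config_xml: str, name: str, value: str) -> str:
    """Set <name v='value'/> under <config>, adding the element if missing."""
    root = ET.fromstring(config_xml)
    element = root.find(name)
    if element is None:
        # Option not present yet: append it as a new child of <config>.
        element = ET.SubElement(root, name)
    element.set("v", value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, cut-down config in the same format as the log above.
sample = "<config><power v='full'/><slot id='0' type='GPU'/></config>"
updated = set_option(sample, "cause", "HUNTINGTONS")
updated = set_option(updated, "client-type", "beta")
print(updated)
```

Note that ElementTree drops XML comments when parsing by default, so the `<!-- ... -->` section markers would not survive a round trip through this helper.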

Thanks!
PS3EdOlkkola
Posts: 184
Joined: Tue Aug 26, 2014 9:48 pm
Hardware configuration: 10 SMP folding slots on Intel Phi "Knights Landing" system, configured as 24 CPUs/slot
9 AMD GPU folding slots
31 Nvidia GPU folding slots
50 total folding slots
Average PPD/slot = 459,500
Location: Dallas, TX

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by PS3EdOlkkola »

@rwh202 I can confirm that changing the cause preference to Huntington's does avoid the problematic work server/assignment server. All slots are finally operational. Changing this value got 14 slots that were in "ready" mode to get a work unit and start processing. The procedure is to pause the slot that's in "ready" mode, then go to Configure, select the Advanced tab, then select Huntington's as the Cause Preference, click Save, then un-pause the slot. The slot should pick up a work unit right away. Thanks rwh202 :)
Hardware config viewtopic.php?f=66&t=17997&p=277235#p277235
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Aurum »

We might just cure Huntington's tonight with the entire F@H network cranking it :D :shock: :lol:
In Science We Trust
SteveWillis
Posts: 409
Joined: Fri Apr 15, 2016 12:42 am
Hardware configuration: PC 1:
Linux Mint 17.3
three gtx 1080 GPUs One on a powered header
Motherboard = [MB-AM3-AS-SB-990FXR2] qty 1 Asus Sabertooth 990FX(+59.99)
CPU = [CPU-AM3-FX-8320BR] qty 1 AMD FX 8320 Eight Core 3.5GHz(+41.99)

PC2:
Linux Mint 18
Open air case
Motherboard: ASUS Crosshair V Formula-Z AM3+ AMD 990FX SATA 6Gb/s USB 3.0 ATX AMD
AMD FD6300WMHKBOX FX-6300 6-Core Processor Black Edition with Cooler Master Hyper 212 EVO - CPU Cooler with 120mm PWM Fan
three gtx 1080,
one gtx 1080 TI on a powered header

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by SteveWillis »

Yes that is the one that works.
1080 and 1080TI GPUs on Linux Mint
Hypocritus
Posts: 40
Joined: Sat Jan 30, 2010 2:38 am
Location: Washington D.C.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by Hypocritus »

boristsybin wrote:
Serge_Grenier wrote:Seems <client-type v='beta'/> is working to get WUs since yesterday.
seems it works

Code:

client-type
beta
definitely works for me.

I have 4 rigs, and kept wondering why the last two of them never had the WS x.x.x.105 issues that the first two kept having. I assumed it was the 1080 Ti's that kept the malpracticing server at bay in those two 100% uptime rigs. But then I was like, "Why is @PS3EdOlkkola having such a huge problem if I am not? Surely he has lots of high-end cards too..."

Lo and behold, when I checked, the last two rigs had the "beta" flag set in them, whereas my first two rigs didn't. So I went into FAHControl > Configure > Expert (tab) > Extra client options > then added the above "beta" flag to the first two rigs as well > hit OK > hit Save. The next time any slot checked, it got a "beta" assignment right away.

Since then I have had zero problems. Although I suspect the PPD is "slightly" lower than non-beta, at least I don't have to waste my time pausing and unpausing several times an hour.

Good find @Serge_Grenier
ifolder
Posts: 64
Joined: Sat Sep 19, 2015 12:44 pm

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by ifolder »

PS3EdOlkkola wrote:@rwh202 I can confirm that changing the cause preference to Huntington's does avoid the problematic work server/assignment server. All slots are finally operational. Changing this value got 14 slots that were in "ready" mode to get a work unit and start processing. The procedure is to pause the slot that's in "ready" mode, then go to Configure, select the Advanced tab, then select Huntington's as the Cause Preference, click Save, then un-pause the slot. The slot should pick up a work unit right away. Thanks rwh202 :)
Is it possible to do that in some way through telnet localhost 36330?
ifolder
Posts: 64
Joined: Sat Sep 19, 2015 12:44 pm

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by ifolder »

JimF wrote:PG's usual response is to start a new public relations campaign to make up for the people who leave.
PG should probably also start a campaign to get more biologists to join their team and work on folding projects, because the paper publication rate is far from keeping up with the increase in the network's computational power... That's quite a lot of electricity spent worldwide for rather few published papers in recent years...
msultan
Pande Group Member
Posts: 134
Joined: Mon Jun 24, 2013 10:27 pm

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Post by msultan »

Hello everyone,
I apologize for the late response. 171.67.108.105 is my WS, which has been given assignments by the AS even though it has no assignable jobs. We are currently trying to fix the problem with the AS where it keeps sending jobs to my WS. In the meantime, I have reduced the priority of my WS so that it doesn't assign jobs as frequently (it is currently 1/10 of the original value).
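
To give a feel for what a 1/10 priority could mean in practice, here is a small Python sketch. It assumes, purely for illustration, that the AS hands out requests in proportion to each WS's priority; how the real AS weighs servers is not stated here. The server names and numbers are hypothetical.

```python
from typing import Mapping


def assignment_share(weights: Mapping[str, float], server: str) -> float:
    """Fraction of requests a server receives under weight-proportional selection."""
    total = sum(weights.values())
    return weights[server] / total


# Hypothetical example: three work servers that start with equal priority.
before = {"ws_105": 10.0, "ws_a": 10.0, "ws_b": 10.0}
# Same servers after ws_105 is cut to 1/10 of its original priority.
after = {**before, "ws_105": 1.0}

print(f"before: {assignment_share(before, 'ws_105'):.3f}")
print(f"after:  {assignment_share(after, 'ws_105'):.3f}")
```

Under that assumption the affected server's share falls from one in three requests to one in twenty-one, so failed assignments should become much rarer without disappearing entirely.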

I am terribly sorry for all the problems that this issue is causing everyone. We appreciate all of your support and hope this doesn't turn you away from F@H. Again, I am sorry for the problem, and we are trying to fix it.
Best,
Muneeb