Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 3:16 pm
by TLS2000
Seems to have improved a lot in the last 24 hours. I'm still getting the occasional WU not assigned, but it's nowhere near as bad as it was.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 3:17 pm
by JimF
Nert wrote:This whole episode is sad and disrespectful to the people that contribute to this project. Two questions come to mind:

1) Why do the volunteer contributors have a sense of urgency and those responsible for the project do not?

2) These problems ALWAYS seem to happen over holiday weekends. Is everything so fragile that it fails when no one is there to hand-hold the systems and keep them running?
Good questions. The information flow on this project is all downhill. The purpose of the moderators (helpful though they may be in many cases) is to shield the developers from problems rather than feeding information back to them. These are not new issues (and a lot of others not apparent at the moment). They have been going on for years. PG's usual response is to start a new public relations campaign to make up for the people who leave.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 3:47 pm
by Aurum
Joe_H wrote:I have heard back that it is being looked into, but nothing further to post. The first reports came in on a Friday evening and were reported to PG on Saturday morning. This is a relatively major holiday weekend, so limited staff would be available to work on this.
When I was a graduate student I did NOT get holidays. When I worked at Intel I was on-call 24x7. I bet they could even remedy this remotely. I notified Pande and Chodera and have heard nothing :?: :?: :?:
Please notify us when the servers are working reliably so I can move my rigs back to F@H. I'm down below 20% of my capacity and if they don't all have WUs when I get home today I'll move the last of them to another project.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 3:58 pm
by JimF
Adam A. Wanderer wrote: As sad as these developments are, I'll stick with F@H. There's just no other project that does the work F@H does. And, F@H has improved over the years, I hope it'll continue to do so.
That is a good choice for their science, which I think is quite good too (though not being an expert, I can't prove it). I will check back by the end of the year to see if any problems are resolved. Given their usual rate of progress, that should be sufficient.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 4:24 pm
by JonasTheMovie
I see the same problem with slots stalling. Glad to see I'm not alone, if you read me right.

But I have to ask, what is the main problem here?
That there are unresponsive project owners who stall contributors' slots if they happen to be directed to those projects/servers, or
That the client does not recognize a stall caused by repeated failures to download a new assignment and then try another project?

Each time this has happened a reboot has "solved" my problem, a new WU has downloaded and it has been processing for a day, till I happen to come upon a problematic server.
Since the reboot helps, that tells me that the client should be able to recognize the problem and go on to the next project/server.
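
To make the second option concrete, here is a rough sketch of the "give up on a stalled server and move on" behaviour described above. It is not how FAHClient works internally; the function names, the server names, and the three-failure limit are all made up for illustration.

```python
from typing import Callable, Optional, Sequence


def request_work(
    servers: Sequence[str],
    fetch: Callable[[str], Optional[str]],
    max_failures: int = 3,
) -> Optional[str]:
    """Try each server in turn; abandon one after max_failures empty replies."""
    for server in servers:
        for _attempt in range(max_failures):
            work_unit = fetch(server)
            if work_unit is not None:
                return work_unit
        # This server came back empty max_failures times in a row: treat it
        # as stalled and move on instead of retrying it forever.
    return None


# Toy stand-in for the network call (hypothetical names): one server
# never has work, the other always does.
def fake_fetch(server: str) -> Optional[str]:
    return None if server == "ws-empty" else f"WU from {server}"


print(request_work(["ws-empty", "ws-ok"], fake_fetch))  # prints "WU from ws-ok"
```

The point is only that the fallback can live in the client loop itself, so no reboot is needed to break out of a bad server.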

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 4:46 pm
by foldy
Adam A. Wanderer wrote:
Nert wrote:Was any form of "hacking" or a virus involved?
I hope the Stanford IT is robust and has many backups so if those things happen they can recover from it.
For the donors the worst case is the servers don't work for some days. For the science the worst case would be if the folding results are lost or corrupted.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 4:55 pm
by Aurum
boristsybin wrote:
Serge_Grenier wrote:Seems <client-type v='beta'/> is working to get WUs since yesterday.
seems it works
Good idea. I'll try it when I get home.
I used to use client-type v='advanced' to try to send the biggest jobs to my best rigs but it did not seem to have any effect so I deleted them.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 5:11 pm
by rwh202
SteveWillis wrote:I should mention that my older machine has also not had any problem at all. Only my newer machine had the problem. I mentioned it earlier but didn't bother to include my log.

*********************** Log Started 2017-05-29T23:18:46Z ***********************
23:18:46:************************* Folding@home Client *************************
23:18:46:    Website: http://folding.stanford.edu/
23:18:46:  Copyright: (c) 2009-2014 Stanford University
23:18:46:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:18:46:       Args: --child --lifeline 1895 /etc/fahclient/config.xml --run-as
23:18:46:             fahclient --pid-file=/var/run/fahclient.pid --daemon
23:18:46:     Config: /etc/fahclient/config.xml
23:18:46:******************************** Build ********************************
23:18:46:    Version: 7.4.4
23:18:46:       Date: Mar 4 2014
23:18:46:       Time: 12:02:38
23:18:46:    SVN Rev: 4130
23:18:46:     Branch: fah/trunk/client
23:18:46:   Compiler: GNU 4.4.7
23:18:46:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
23:18:46:             -fno-unsafe-math-optimizations -msse2
23:18:46:   Platform: linux2 3.2.0-1-amd64
23:18:46:       Bits: 64
23:18:46:       Mode: Release
23:18:46:******************************* System ********************************
23:18:46:        CPU: AMD FX(tm)-8320 Eight-Core Processor
23:18:46:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
23:18:46:       CPUs: 8
23:18:46:     Memory: 31.32GiB
23:18:46:Free Memory: 30.66GiB
23:18:46:    Threads: POSIX_THREADS
23:18:46: OS Version: 3.19
23:18:46:Has Battery: false
23:18:46: On Battery: false
23:18:46: UTC Offset: -5
23:18:46:        PID: 1897
23:18:46:        CWD: /var/lib/fahclient
23:18:46:         OS: Linux 3.19.0-32-generic x86_64
23:18:46:    OS Arch: AMD64
23:18:46:       GPUs: 6
23:18:46:      GPU 0: NVIDIA:7 GP104 [GeForce GTX 1080] 8873
23:18:46:      GPU 1: UNSUPPORTED: NV3 [PCI]
23:18:46:      GPU 2: NVIDIA:7 GP104 [GeForce GTX 1080] 8873
23:18:46:      GPU 3: UNSUPPORTED: NV3 [PCI]
23:18:46:      GPU 4: NVIDIA:7 GP104 [GeForce GTX 1080] 8873
23:18:46:      GPU 5: UNSUPPORTED: NV3 [PCI]
23:18:46:       CUDA: 6.1
23:18:46:CUDA Driver: 8000
23:18:46:***********************************************************************
23:18:46:<config>
23:18:46:  <!-- Client Control -->
23:18:46:  <fold-anon v='true'/>
23:18:46:
23:18:46:  <!-- Folding Core -->
23:18:46:  <checkpoint v='30'/>
23:18:46:
23:18:46:  <!-- Folding Slot Configuration -->
23:18:46:  <cause v='HUNTINGTONS'/>
23:18:46:
23:18:46:  <!-- Network -->
23:18:46:  <proxy v=':8080'/>
23:18:46:
23:18:46:  <!-- Slot Control -->
23:18:46:  <power v='full'/>
23:18:46:
23:18:46:  <!-- User Information -->
23:18:46:  <passkey v='********************************'/>
23:18:46:  <team v='224497'/>
23:18:46:  <user v='DarthMouse_ALL_1GD5nCZbh7gNo1SESPLT24xEd2Jsu4rTP9'/>
23:18:46:
23:18:46:  <!-- Work Unit Control -->
23:18:46:  <next-unit-percentage v='100'/>
23:18:46:
23:18:46:  <!-- Folding Slots -->
23:18:46:  <slot id='0' type='GPU'/>
23:18:46:  <slot id='1' type='GPU'/>
23:18:46:  <slot id='2' type='GPU'/>
23:18:46:</config>

Is this the one that works? If so, I think it could be the <cause v='HUNTINGTONS'/>

I've added that flag and got work straight away on 3 different rigs. I'm guessing that this flag (and others, like beta) gives you preferential referral to non-affected WorkServers.

Thanks!
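
For anyone who prefers to script the change rather than click through the GUI on every rig, below is a minimal sketch that adds or updates a top-level option in config.xml text. The helper name `set_option` and the sample config are assumptions for illustration; the element shapes (`<cause v='HUNTINGTONS'/>`, `<client-type v='beta'/>`) are the ones quoted in this thread.

```python
import xml.etree.ElementTree as ET


def set_option(config_xml: str, name: str, value: str) -> str:
    """Return config_xml with <name v='value'/> added or updated at top level."""
    root = ET.fromstring(config_xml)
    element = root.find(name)
    if element is None:
        # Option not present yet: append it under <config>.
        element = ET.SubElement(root, name)
    element.set("v", value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, cut-down config for demonstration only.
sample = "<config><power v='full'/><slot id='0' type='GPU'/></config>"
updated = set_option(sample, "cause", "HUNTINGTONS")
print(updated)
```

Note that this sketch works on a string, not on the live file, and that ElementTree's default parser drops XML comments, so the section comments seen in the log above would not survive a round trip. It is probably safest to apply something like this only while the client is stopped.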

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 6:48 pm
by PS3EdOlkkola
@rwh202 I can confirm that changing the cause preference to Huntington's does avoid the problematic work server/assignment server. All slots are finally operational. Changing this value got 14 slots that were in "ready" mode to get a work unit and start processing. The procedure is to pause the slot that's in "ready" mode, then go to Configure, select the Advanced tab, set the Cause Preference to Huntington's, click Save, then un-pause the slot. The slot should pick up a work unit right away. Thanks rwh202 :)

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 6:54 pm
by Aurum
We might just cure Huntington's tonight with the entire F@H network cranking it :D :shock: :lol:

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 7:53 pm
by SteveWillis
Yes that is the one that works.

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 8:06 pm
by Hypocritus
boristsybin wrote:
Serge_Grenier wrote:Seems <client-type v='beta'/> is working to get WUs since yesterday.
seems it works

client-type
beta
definitely works for me.

I have 4 rigs, and kept wondering why the last two of them never had the WS x.x.x.105 issues that the first two kept having. I assumed it was the 1080 Ti's that kept the malpracticing server at bay in those two 100% uptime rigs. But then I was like, "why is @PS3EdOlkkola having such a huge problem if I am not? surely he has lots of high-end cards too..."

Lo and behold, when I checked, the last two rigs had the "beta" flag set in them, whereas my first two rigs didn't. So I went into FAHControl > Configure > Expert (tab) > Extra client options > then added the above "beta" flag to the first two rigs as well > hit OK > hit Save. The next time any slot checked, it got a "beta" assignment right away.

Since then I have had zero problems. Although I suspect the PPD is "slightly" lower than non-beta, at least I don't have to waste my time pausing and unpausing several times an hour.

Good find @Serge_Grenier

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 9:14 pm
by ifolder
PS3EdOlkkola wrote:@rwh202 I can confirm that changing the cause preference to Huntington's does avoid the problematic work server/assignment server. All slots are finally operational. Changing this value got 14 slots that were in "ready" mode to get a work unit and start processing. The procedure is to pause the slot that's in "ready" mode, then go to Configure, select the Advanced tab, set the Cause Preference to Huntington's, click Save, then un-pause the slot. The slot should pick up a work unit right away. Thanks rwh202 :)
Is it possible to do that in some way through telnet localhost 36330?
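
A rough sketch of what that could look like over the port mentioned above. The command strings (`pause`, `options cause=...`, `unpause`) are assumptions based on the v7 client's text command interface and are not confirmed anywhere in this thread; typing `help` at the telnet prompt should show what the installed client really accepts. The client may also require a password or restrict which addresses can connect. The helper names are made up.

```python
import socket
from typing import List


def cause_commands(slot_id: int, cause: str) -> List[str]:
    """Build command lines mirroring the GUI steps: pause, set cause, unpause."""
    return [
        f"pause {slot_id}",          # assumed command name
        f"options cause={cause}",    # assumed name=value syntax
        f"unpause {slot_id}",        # assumed command name
    ]


def send_commands(commands: List[str], host: str = "localhost", port: int = 36330) -> str:
    """Send newline-terminated commands and return whatever the client replies."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(("\n".join(commands) + "\n").encode("ascii"))
        conn.settimeout(2)
        chunks = []
        try:
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                chunks.append(data)
        except socket.timeout:
            # No more output within 2 seconds: assume the client is done replying.
            pass
        return b"".join(chunks).decode("ascii", errors="replace")


# Only prints the commands; call send_commands(...) on a machine running the client.
print("\n".join(cause_commands(0, "HUNTINGTONS")))
```

Building the command list separately from sending it makes it easy to check what would be sent before touching a live client.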

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 9:22 pm
by ifolder
JimF wrote:PG's usual response is to start a new public relations campaign to make up for the people who leave.
PG should probably also start a campaign to get more biologists joining their team and working on folding projects, because the paper publication rate is far from keeping pace with the increase in the network's computational power... That's quite a lot of electricity spent worldwide for quite a few published papers in recent years...

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Posted: Tue May 30, 2017 9:39 pm
by msultan
Hello everyone,
I apologize for the late response. 171.67.108.105 is my WS, which has been given assignments by the AS even though it has no assignable jobs. We are currently trying to fix the problem with the AS where it keeps sending jobs to my WS. In the meantime, I have reduced the priority of my WS so that it doesn't assign jobs as frequently (it is currently 1/10 of the original value).

I am terribly sorry for all the problems that this issue is causing everyone. We appreciate all of your support and hope this doesn't turn you away from F@H. Again, I am sorry for the problem, and we are trying to fix it.
Best,
Muneeb
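
As a back-of-the-envelope illustration of what cutting a server's priority to 1/10 might do, the sketch below assumes the AS picks a work server with probability proportional to its weight. That selection rule, the server names, and the starting weights are all assumptions; the post above only states the 1/10 factor.

```python
def assignment_share(weights: dict, server: str) -> float:
    """Fraction of assignments a server gets under weight-proportional selection."""
    return weights[server] / sum(weights.values())


# Hypothetical pool of three equally weighted work servers.
before = {"ws-105": 1.0, "ws-other-1": 1.0, "ws-other-2": 1.0}
# Same pool with the affected server's weight cut to 1/10.
after = dict(before, **{"ws-105": 0.1})

print(f"{assignment_share(before, 'ws-105'):.1%}")  # prints 33.3%
print(f"{assignment_share(after, 'ws-105'):.1%}")   # prints 4.8%
```

Under that assumption the affected server is chosen far less often but not never, which would fit the reports of an occasional unassigned WU after things improved.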