WU's Not Being Assigned by 171.67.108.102/171.67.108.105/?

Moderators: Site Moderators, PandeGroup

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby TLS2000 » Tue May 30, 2017 3:16 pm

Seems to have improved a lot in the last 24 hours. I'm still getting the occasional WU not assigned, but it's nowhere near as bad as it was.
TLS2000
 
Posts: 2
Joined: Fri May 12, 2017 12:53 am

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby JimF » Tue May 30, 2017 3:17 pm

Nert wrote:This whole episode is sad and disrespectful to the people that contribute to this project. Two questions come to mind:

1) Why do the volunteer contributors have a sense of urgency and those responsible for the project do not?

2) These problems ALWAYS seem to happen over holiday weekends. Is everything so fragile that it fails when no one is there to hand-hold the systems and keep them running?


Good questions. The information flow on this project is all downhill. The purpose of the moderators (helpful though they may be in many cases) is to shield the developers from problems rather than feeding information back to them. These are not new issues (and a lot of others not apparent at the moment). They have been going on for years. PG's usual response is to start a new public relations campaign to make up for the people who leave.
JimF
 
Posts: 446
Joined: Thu Jan 21, 2010 2:03 pm

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby Aurum » Tue May 30, 2017 3:47 pm

Joe_H wrote:I have heard back that it is being looked into, but nothing further to post. The first reports came in on a Friday evening and were reported to PG on Saturday morning. This is a relatively major holiday weekend, so limited staff would be available to work on this.
When I was a graduate student I did NOT get holidays. When I worked at Intel I was on-call 24x7. I bet they could even remedy this remotely. I notified Pande and Chodera and have heard nothing :?: :?: :?:
Please notify us when the servers are working reliably so I can move my rigs back to F@H. I'm down below 20% of my capacity and if they don't all have WUs when I get home today I'll move the last of them to another project.
Aurum
 
Posts: 293
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby JimF » Tue May 30, 2017 3:58 pm

Adam A. Wanderer wrote:As sad as these developments are, I'll stick with F@H. There's just no other project that does the work F@H does. And, F@H has improved over the years, I hope it'll continue to do so.

That is a good choice for their science, which I think is quite good too (though not being an expert, I can't prove it). I will check back by the end of the year to see if any problems are resolved. Given their usual rate of progress, that should be sufficient.
JimF
 
Posts: 446
Joined: Thu Jan 21, 2010 2:03 pm

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby JonasTheMovie » Tue May 30, 2017 4:24 pm

I see the same problem with slots stalling. Glad to see I'm not alone, if you read me right.

But I have to ask, what is the main problem here?
That there are unresponsive project owners whose servers stall contributors' slots if they happen to be directed to those projects/servers, or
That the client does not recognize a stall caused by multiple failures to download a new assignment and move on to another project?

Each time this has happened a reboot has "solved" my problem, a new WU has downloaded and it has been processing for a day, till I happen to come upon a problematic server.
Since the reboot helps, that tells me that the client should be able to recognize the problem and go on to the next project/server.
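
The idea in the post above (give up on a server after repeated failed assignments and try the next one) can be sketched roughly as follows. This is only an illustration of the suggestion, not how FAHClient actually behaves; the function names, the server names, and the failure threshold of 3 are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def fetch_with_failover(
    servers: Iterable[str],
    request_work: Callable[[str], Optional[str]],
    max_failures: int = 3,
) -> Optional[str]:
    """Try each server in order; abandon one after max_failures empty replies."""
    for server in servers:
        for _attempt in range(max_failures):
            work_unit = request_work(server)
            if work_unit is not None:
                return work_unit  # got an assignment, stop searching
        # This server returned nothing max_failures times in a row,
        # so fall through to the next candidate instead of stalling.
    return None  # every candidate server was exhausted


# Hypothetical stand-in for a network request: "server-a" never has work.
def fake_request(server: str) -> Optional[str]:
    return "WU-from-" + server if server == "server-b" else None


print(fetch_with_failover(["server-a", "server-b"], fake_request))
```

A real client would also need back-off delays between attempts, which are left out here to keep the sketch short.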
JonasTheMovie
 
Posts: 88
Joined: Wed Jan 06, 2016 4:16 am
Location: Northern Sweden

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby foldy » Tue May 30, 2017 4:46 pm

Adam A. Wanderer wrote:
Nert wrote:Was any form of "hacking" or a virus involved?

I hope the Stanford IT is robust and has many backups so if those things happen they can recover from it.
For the donors the worst case is the servers don't work for some days. For the science the worst case would be if the folding results are lost or corrupted.
foldy
 
Posts: 1133
Joined: Sat Dec 01, 2012 3:43 pm

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby Aurum » Tue May 30, 2017 4:55 pm

boristsybin wrote:
Serge_Grenier wrote:Seems <client-type v='beta'/> is working to get WUs since yesterday.

seems it works
Good idea. I'll try it when I get home.
I used to use client-type v='advanced' to try to send the biggest jobs to my best rigs but it did not seem to have any effect so I deleted them.
Aurum
 
Posts: 293
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby rwh202 » Tue May 30, 2017 5:11 pm

SteveWillis wrote:I should mention that my older machine has also not had any problem at all. Only my newer machine had the problem. I mentioned it earlier but didn't bother to include my log.


Code:
*********************** Log Started 2017-05-29T23:18:46Z ***********************
23:18:46:************************* Folding@home Client *************************
23:18:46:    Website: http://folding.stanford.edu/
23:18:46:  Copyright: (c) 2009-2014 Stanford University
23:18:46:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:18:46:       Args: --child --lifeline 1895 /etc/fahclient/config.xml --run-as
23:18:46:             fahclient --pid-file=/var/run/fahclient.pid --daemon
23:18:46:     Config: /etc/fahclient/config.xml
23:18:46:******************************** Build ********************************
23:18:46:    Version: 7.4.4
23:18:46:       Date: Mar 4 2014
23:18:46:       Time: 12:02:38
23:18:46:    SVN Rev: 4130
23:18:46:     Branch: fah/trunk/client
23:18:46:   Compiler: GNU 4.4.7
23:18:46:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
23:18:46:             -fno-unsafe-math-optimizations -msse2
23:18:46:   Platform: linux2 3.2.0-1-amd64
23:18:46:       Bits: 64
23:18:46:       Mode: Release
23:18:46:******************************* System ********************************
23:18:46:        CPU: AMD FX(tm)-8320 Eight-Core Processor
23:18:46:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
23:18:46:       CPUs: 8
23:18:46:     Memory: 31.32GiB
23:18:46:Free Memory: 30.66GiB
23:18:46:    Threads: POSIX_THREADS
23:18:46: OS Version: 3.19
23:18:46:Has Battery: false
23:18:46: On Battery: false
23:18:46: UTC Offset: -5
23:18:46:        PID: 1897
23:18:46:        CWD: /var/lib/fahclient
23:18:46:         OS: Linux 3.19.0-32-generic x86_64
23:18:46:    OS Arch: AMD64
23:18:46:       GPUs: 6
23:18:46:      GPU 0: NVIDIA:7 GP104 [GeForce GTX 1080] 8873
23:18:46:      GPU 1: UNSUPPORTED: NV3 [PCI]
23:18:46:      GPU 2: NVIDIA:7 GP104 [GeForce GTX 1080] 8873
23:18:46:      GPU 3: UNSUPPORTED: NV3 [PCI]
23:18:46:      GPU 4: NVIDIA:7 GP104 [GeForce GTX 1080] 8873
23:18:46:      GPU 5: UNSUPPORTED: NV3 [PCI]
23:18:46:       CUDA: 6.1
23:18:46:CUDA Driver: 8000
23:18:46:***********************************************************************
23:18:46:<config>
23:18:46:  <!-- Client Control -->
23:18:46:  <fold-anon v='true'/>
23:18:46:
23:18:46:  <!-- Folding Core -->
23:18:46:  <checkpoint v='30'/>
23:18:46:
23:18:46:  <!-- Folding Slot Configuration -->
23:18:46:  <cause v='HUNTINGTONS'/>
23:18:46:
23:18:46:  <!-- Network -->
23:18:46:  <proxy v=':8080'/>
23:18:46:
23:18:46:  <!-- Slot Control -->
23:18:46:  <power v='full'/>
23:18:46:
23:18:46:  <!-- User Information -->
23:18:46:  <passkey v='********************************'/>
23:18:46:  <team v='224497'/>
23:18:46:  <user v='DarthMouse_ALL_1GD5nCZbh7gNo1SESPLT24xEd2Jsu4rTP9'/>
23:18:46:
23:18:46:  <!-- Work Unit Control -->
23:18:46:  <next-unit-percentage v='100'/>
23:18:46:
23:18:46:  <!-- Folding Slots -->
23:18:46:  <slot id='0' type='GPU'/>
23:18:46:  <slot id='1' type='GPU'/>
23:18:46:  <slot id='2' type='GPU'/>
23:18:46:</config>



Is this the one that works? If so, I think it could be the <cause v='HUNTINGTONS'/>

I've added that flag and got work straight away on 3 different rigs. I'm guessing that this flag (and others, like beta) gives you preferential referral to non-affected WorkServers.
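
For anyone who would rather script the change than click through FAHControl, here is a minimal sketch that adds or updates an option such as `<cause v='HUNTINGTONS'/>` in a config document shaped like the one in the log above. The helper name `set_option` and the sample input are hypothetical. It assumes the client is stopped while the file is edited and that the position of the element inside `<config>` does not matter, neither of which has been verified here.

```python
import xml.etree.ElementTree as ET


def set_option(config_xml: str, name: str, value: str) -> str:
    """Return config_xml with <name v='value'/> added or updated under <config>."""
    root = ET.fromstring(config_xml)
    element = root.find(name)
    if element is None:
        # Option not present yet: append it as a new child of <config>.
        element = ET.SubElement(root, name)
    element.set("v", value)
    # Note: the default parser discards XML comments, so comments such as
    # <!-- Folding Slots --> are not preserved in the returned text.
    return ET.tostring(root, encoding="unicode")


# Trimmed-down sample modelled on the config dump in the log (hypothetical input).
sample = "<config><power v='full'/><slot id='0' type='GPU'/></config>"
updated = set_option(sample, "cause", "HUNTINGTONS")
print(updated)
```

The same helper could be called with `"client-type"` and `"beta"` for the other workaround mentioned in this thread. Back up the real config.xml before trying anything like this.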

Thanks!
rwh202
 
Posts: 324
Joined: Mon Nov 15, 2010 8:51 pm
Location: South Coast, UK

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby PS3EdOlkkola » Tue May 30, 2017 6:48 pm

@rwh202 I can confirm that changing the cause preference to Huntington's does avoid the problematic work server/assignment server. All slots are finally operational. Changing this value got 14 slots that were in "ready" mode to get a work unit and start processing. The procedure is to pause the slot that's in "ready" mode, then go to Configure, select the Advanced tab, then select Huntington's as the Cause Preference, click Save, then un-pause the slot. The slot should pick up a work unit right away. Thanks rwh202 :)
PS3EdOlkkola
 
Posts: 185
Joined: Tue Aug 26, 2014 9:48 pm
Location: Dallas, TX

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby Aurum » Tue May 30, 2017 6:54 pm

We might just cure Huntington's tonight with the entire F@H network cranking it :D :shock: :lol:
Aurum
 
Posts: 293
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby SteveWillis » Tue May 30, 2017 7:53 pm

Yes that is the one that works.
folding as DarthMouse_ALL_1GD5nCZbh7gNo1SESPLT24xEd2Jsu4rTP9
Currently folding on 13 GPUs on Linux Mint
SteveWillis
 
Posts: 372
Joined: Fri Apr 15, 2016 12:42 am

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby Hypocritus » Tue May 30, 2017 8:06 pm

boristsybin wrote:
Serge_Grenier wrote:Seems <client-type v='beta'/> is working to get WUs since yesterday.


seems it works

Code:
client-type
beta

definitely works for me.

I have 4 rigs, and kept wondering why the last two of them never had the WS x.x.x.105 issues that the first two kept having. I assumed it was the 1080 Ti's that kept the malpracticing server at bay in those two 100% uptime rigs. But then I was like, "why is @PS3EdOlkkola having such a huge problem if I am not? surely he has lots of high-end cards too..."

Lo and behold, when I checked, the last two rigs had the "beta" flag set in them, whereas my first two rigs didn't. So I went into FAHControl > Configure > Expert (tab) > Extra client options > then added the above "beta" flag to the first two rigs as well > hit OK > hit Save. The next time any slot checked, it got a "beta" assignment right away.

Since then I have had zero problems. Although I suspect the PPD is "slightly" lower than non-beta, at least I don't have to waste my time pausing and unpausing several times an hour.

Good find @Serge_Grenier
Hypocritus
 
Posts: 26
Joined: Sat Jan 30, 2010 2:38 am

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby ifolder » Tue May 30, 2017 9:14 pm

PS3EdOlkkola wrote:@rwh202 I can confirm that changing the cause preference to Huntington's does avoid the problematic work server/assignment server. All slots are finally operational. Changing this value got 14 slots that were in "ready" mode to get a work unit and start processing. The procedure is to pause the slot that's in "ready" mode, then go to Configure, select the Advanced tab, then select Huntington's as the Cause Preference, click Save, then un-pause the slot. The slot should pick up a work unit right away. Thanks rwh202 :)


Is it possible to do that in some way through telnet localhost 36330?
ifolder
 
Posts: 49
Joined: Sat Sep 19, 2015 12:44 pm

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby ifolder » Tue May 30, 2017 9:22 pm

JimF wrote:PG's usual response is to start a new public relations campaign to make up for the people who leave.


PG should probably also start a campaign to get more biologists to join their team and work on folding projects, because the paper publication rate is far from keeping up with the increase in the network's computational power... That's quite a lot of electricity spent worldwide for rather few published papers in recent years...
ifolder
 
Posts: 49
Joined: Sat Sep 19, 2015 12:44 pm

Re: WU's Not Being Assigned by 171.67.108.102/171.67.108.105

Postby msultan » Tue May 30, 2017 9:39 pm

Hello everyone,
I apologize for the late response. 171.67.108.105 is my WS, which has been given assignments by the AS even though it has no assignable jobs. We are currently trying to fix the problem with the AS where it keeps sending jobs to my WS. In the meantime, I have reduced the priority of my WS so that it doesn't assign jobs as frequently (it is currently 1/10 of the original value).
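
As a rough illustration of why cutting the priority to 1/10 reduces failed assignments but does not eliminate them, here is a small calculation. It rests on two assumptions that are not confirmed anywhere in this thread: that the AS picks a WS with probability proportional to its priority, and that there is exactly one other WS with the same original priority. The server labels and weights are made up.

```python
from typing import Dict


def assignment_shares(weights: Dict[str, float]) -> Dict[str, float]:
    """Share of requests each server receives if selection is weight-proportional."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical weights: two servers start equal, then one drops to 1/10.
before = assignment_shares({"affected": 1.0, "other": 1.0})
after = assignment_shares({"affected": 0.1, "other": 1.0})
print(f"before: {before['affected']:.1%}, after: {after['affected']:.1%}")
# prints "before: 50.0%, after: 9.1%"
```

With more servers in the pool the affected share would be smaller still, but under this model it only reaches zero if the weight itself is zero.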

I am terribly sorry for all the problems that this issue is causing everyone. We appreciate all of your support and hope this doesn't turn you away from F@H. Again, I am sorry for the problem, and we are trying to fix it.
Best,
Muneeb
msultan
Pande Group Member
 
Posts: 134
Joined: Mon Jun 24, 2013 10:27 pm
