WS server for GPU down?

Postby Reuzenkakatoe » Tue May 19, 2020 4:20 pm

Good day,

I'm aware of the recent server overload issues but I would still like to make sure that my system is fully operational.

I have moved my (previously working with F@H) GPU card to another machine. My new machine has one CPU slot and one GPU slot.
The CPU slot is working fine but the GPU fails to contact the F@H server. Running F@H 7.6.13. The relevant part of the logfile:

14:47:44:WU01:FS01:Connecting to assign1.foldingathome.org:80
14:47:45:WU01:FS01:Assigned to work server 192.0.2.1
14:47:45:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GF108 [GeForce GT 630] 311 from 192.0.2.1
14:47:45:WU01:FS01:Connecting to 192.0.2.1:8080
14:48:06:WARNING:WU01:FS01:WorkServer connection failed on port 8080 trying 80
14:48:06:WU01:FS01:Connecting to 192.0.2.1:80
14:48:27:ERROR:WU01:FS01:Exception: Failed to connect to 192.0.2.1:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

I have configured the Windows 10 firewall to allow F@H through - and also tried with the firewall disabled.
Below is a trace route:

C:\Windows\system32>tracert 192.0.2.1

Tracing route to 192.0.2.1 over a maximum of 30 hops

1 <1 ms <1 ms <1 ms 192.168.0.253 # <- my home router
2 8 ms 6 ms 5 ms 10.255.168.1 # <- my ISP
3 8 ms 6 ms 7 ms mnd-rc0001-cr101-ae95-0.core.as9143.net [213.51.175.217]
4 * * * Request timed out.
5 * * * Request timed out.
6 * * * Request timed out.
7 * * * Request timed out.

This has been going on for several days now, so I would just like a quick confirmation: is the F@H WS really down or is my system to blame?

Thanks very much in advance,
Reuzenkakatoe
 
Posts: 6
Joined: Wed Apr 01, 2020 12:41 pm

Re: WS server for GPU down?

Postby Neil-B » Tue May 19, 2020 4:23 pm

Can you post your log, including the top 200 lines, so we can see the configuration? That message indicates there may be an issue, since 192.0.2.1 is where the AS (assignment server) sends requests when the GPU is not capable of folding for some reason or other.

For help on posting logs please see https://foldingforum.org/viewtopic.php?f=61&t=26036

Your card may be right on the borderline of being able to fold - there have been a few issues recently where cards have "dropped off" the list where they shouldn't … I'll link a post explaining shortly.

Different card … but might be the same type of issue … https://foldingforum.org/viewtopic.php?f=80&t=34525&start=15#p334242.

Was the swap from one machine to the other "straight away" or was there some time lag (weeks/months) between this?
1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent, Quadro K420 1GB, FAH 7.6.13
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro, Quadro M1000M 2GB, FAH 7.6.13
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro, GTX 750Ti 2GB, FAH 7.6.13
Neil-B
 
Posts: 1217
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: WS server for GPU down?

Postby JimboPalmer » Tue May 19, 2020 4:48 pm

14:47:45:WU01:FS01:Assigned to work server 192.0.2.1
14:47:45:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GF108 [GeForce GT 630] 311 from 192.0.2.1

192.0.2.1 is where the software sends you if your hardware won't fold
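To check a longer log for this symptom, a minimal sketch along these lines could scan for "Assigned to work server" lines that point at the placeholder address. The line format is taken from the excerpt above; the function name, the regular expression, and the assumption that 192.0.2.1 is the only placeholder are illustrative, not part of the F@H client.

```python
import re

# Assumption: the placeholder address seen in this thread's log excerpt.
UNSUPPORTED_WS = "192.0.2.1"

# Matches lines such as:
# 14:47:45:WU01:FS01:Assigned to work server 192.0.2.1
ASSIGN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WU(?P<wu>\d+):FS(?P<slot>\d+):"
    r"Assigned to work server (?P<server>\S+)\s*$"
)


def find_placeholder_assignments(log_lines, placeholder=UNSUPPORTED_WS):
    """Return (time, slot, server) for each assignment to the placeholder."""
    hits = []
    for line in log_lines:
        match = ASSIGN_RE.match(line.strip())
        if match and match.group("server") == placeholder:
            hits.append(
                (match.group("time"), "FS" + match.group("slot"), match.group("server"))
            )
    return hits


# Lines copied from the log posted at the top of this thread.
sample = [
    "14:47:44:WU01:FS01:Connecting to assign1.foldingathome.org:80",
    "14:47:45:WU01:FS01:Assigned to work server 192.0.2.1",
    "14:47:45:WU01:FS01:Connecting to 192.0.2.1:8080",
]
print(find_placeholder_assignments(sample))
```

Any hit suggests the slot was turned away at assignment time, so the connection failures that follow are a consequence rather than a sign that a real work server is down.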

https://www.techpowerup.com/gpu-specs/g ... t-630.c816

Your card meets the minimum specifications to begin folding. It supports OpenCL 1.1 and double-precision floating-point math. (F@H actually tests with OpenCL 1.2.)

With only 96 threads, I doubt it can finish folding before the deadline.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
JimboPalmer
 
Posts: 1965
Joined: Mon Feb 16, 2009 5:12 am
Location: Greenwood MS USA

Re: WS server for GPU down?

Postby Neil-B » Tue May 19, 2020 5:15 pm

Those threads are double speed, so this card is slightly quicker than my K420 (according to various comparison sites, as far as one can trust these), which has squeaked in under the deadline on the few WUs I have run on it. You may find that at the moment, with a large pool of GPU folders with fast kit, most WUs that you fold with this (if we can get it working for you) will have been reissued at timeout and returned by a quicker machine well before yours finishes the WU. It will be worth using the WU Status app to check this.
Neil-B
 
Posts: 1217
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: WS server for GPU down?

Postby Reuzenkakatoe » Tue May 19, 2020 7:32 pm

Problem solved! All hints together made me decide to put the GPU card back into its original machine.
It's working again. The working rig is an AMD K10-5800K on a fast Gigabyte motherboard with 32 Gigs of very fast RAM.
The non-working rig was a rather ancient Intel Q6600 on a prehistoric Asus P5 main board with only 4Gigs of DDR2. Also a quad-core, but way slower than the K10-Gigabyte combination.
The F@H server probably correctly rejected this slow old piece of junk. I guess a GPU relies on support from a decent CPU on a fast system bus.

Next time I'll try to do the obvious instead of calling for help. I apologize for taking up your valuable time.

Great, so many very helpful answers in such a short time. You're all great people and I will endeavor to support Folding@Home.
You all helped me out in a tremendous way. Thanks so much to all of you!
Reuzenkakatoe
 
Posts: 6
Joined: Wed Apr 01, 2020 12:41 pm

Re: WS server for GPU down?

Postby bruce » Tue May 19, 2020 7:42 pm

I guess a GPU relies on support from a decent CPU on a fast system


Actually, the definition of a decent CPU is pretty broad. Each GPU will generally need one CPU thread which will be dedicated to it. (It can be a thread on a pretty slow CPU if that's what you have in your system.)
bruce
 
Posts: 19697
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: WS server for GPU down?

Postby HugoNotte » Tue May 19, 2020 7:50 pm

I have tried folding on a GT 630M, which has the same specs as the desktop version apart from being 150 MHz slower. But I don't think that would make a big difference. It's really not worth it, since it exceeded the Timeout on every WU. It did manage to finish before Deadline, but I feel GPUs that regularly exceed Timeout put additional strain on the server, since the same WU then gets sent out again.
HugoNotte
 
Posts: 69
Joined: Tue Apr 07, 2020 8:09 pm

Re: WS server for GPU down?

Postby Neil-B » Tue May 19, 2020 8:06 pm

At the moment, when the available GPU resource is greater than GPU WU availability (for the most part), timed-out WUs do get reissued fairly soon after Timeout. Once the GPU WU pool grows to meet demand (or the GPU resource shrinks), timed-out WUs only get reissued once they get to the head of the queue. So under "normal" circumstances, as long as the GPU folds WUs within Deadline, it would probably be worth doing so. Under current circumstances there is a fair chance that a reissued WU will complete well within the original Deadline.
Neil-B
 
Posts: 1217
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: WS server for GPU down?

Postby Reuzenkakatoe » Thu May 21, 2020 9:06 am

HugoNotte wrote: I have tried folding on a GT 630M, which has the same specs as the desktop version apart from being 150 MHz slower. But I don't think that would make a big difference. It's really not worth it, since it exceeded the Timeout on every WU. It did manage to finish before Deadline, but I feel GPUs that regularly exceed Timeout put additional strain on the server, since the same WU then gets sent out again.


My GT 630 also often exceeds the timeout. Timeouts are often 24 hours, which is rather tight. However, my understanding was that finishing before the expiration date is still useful to the F@H project. Is this true? If not, I'll turn off the GPU altogether so that the running CPU threads get some more breathing space.
Interesting stuff, this.
Reuzenkakatoe
 
Posts: 6
Joined: Wed Apr 01, 2020 12:41 pm

Re: WS server for GPU down?

Postby Neil-B » Thu May 21, 2020 9:49 am

Under normal circumstances even after timeout the WU would be in a queueing system and might not be reissued for days - so missing timeout may well still mean you return the WU before anyone else, even right the way up to expiration.

At the moment there is a fair chance that a WU will be reissued fairly shortly after timeout, so if you miss timeout by more than the time it takes a fast GPU to fold the WU, that GPU may well get there first.

Since the reissued WU might end up on another slower GPU, for the most part, if the WU you are processing will clearly complete before expiration, then letting it do so seems fine to me. Some people may advise that your GPU is too slow, but really, if it can complete within expiration then it isn't, imo (and, by definition, in FAH's opinion).
Neil-B
 
Posts: 1217
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: WS server for GPU down?

Postby Reuzenkakatoe » Thu May 21, 2020 7:53 pm

Thanks Neil for your clear answer. Much appreciated.
Reuzenkakatoe
 
Posts: 6
Joined: Wed Apr 01, 2020 12:41 pm

Re: WS server for GPU down?

Postby bruce » Thu May 21, 2020 8:04 pm

Suppose your GPU doesn't finish the WU before the Timeout clock expires and the WU does get duplicated. You've already got a 1-day (or whatever) head start on completing the WU. As has already been stated, there's no way to predict which person will complete it first and who will complete it second, so it's certainly worth continuing to work on the WU. As far as science is concerned, the first returned result will generate the next Gen in that trajectory and science will move on. FAH gives both persons credit for finishing the WU as long as it's before the Final Deadline.

Everyone's deadlines are established at the time the WU is assigned so yours don't coincide with the other person's deadlines.
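As a rough illustration of that last point, the sketch below computes each donor's Timeout and Final Deadline from their own assignment time. The 1-day timeout matches the "often 24 hours" figure mentioned earlier in the thread; the 3-day expiration and the moment of reissue are hypothetical values chosen only for the example, and the function name is made up.

```python
from datetime import datetime, timedelta


def deadlines(assigned_at, timeout, expiration):
    """Return (timeout_at, expires_at), both measured from this copy's assignment time."""
    return assigned_at + timeout, assigned_at + expiration


# Hypothetical values for illustration only; real values vary per project.
TIMEOUT = timedelta(days=1)
EXPIRATION = timedelta(days=3)

# First donor receives the WU.
first_assigned = datetime(2020, 5, 19, 12, 0)
first_timeout, first_expires = deadlines(first_assigned, TIMEOUT, EXPIRATION)

# Assumption: the WU is reissued to a second donor right at the first donor's Timeout.
second_assigned = first_timeout
second_timeout, second_expires = deadlines(second_assigned, TIMEOUT, EXPIRATION)

print("first donor: ", first_timeout, first_expires)
print("second donor:", second_timeout, second_expires)
print("head start:  ", second_assigned - first_assigned)
```

Under these assumed numbers the two sets of deadlines are simply shifted by the reissue delay, which is the head start the slower GPU keeps.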
bruce
 
Posts: 19697
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: WS server for GPU down?

Postby PantherX » Thu May 21, 2020 8:22 pm

bruce wrote:...FAH gives both persons credit for finishing the WU as long as it's before the Final Deadline.

Just to expand on what bruce said, here's the points overview:
Before the Timeout: Base credits + Bonus points
After Timeout and before Expiration: Base credits
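The overview above can be written as a small lookup. This is only a sketch: the function name and the hour values are hypothetical, the handling of a return exactly on a boundary is an assumption, and the "no credit" case after Expiration is inferred from bruce's remark that credit is given as long as the WU comes back before the Final Deadline.

```python
def credit_for(elapsed_hours, timeout_hours, expiration_hours):
    """Classify the credit for a WU returned elapsed_hours after assignment."""
    if elapsed_hours <= timeout_hours:
        return "base credits + bonus points"
    if elapsed_hours <= expiration_hours:
        return "base credits"
    # Inferred from the thread: nothing is credited after the Final Deadline.
    return "no credit"


# Hypothetical deadlines: 24 h Timeout, 72 h Expiration.
for hours in (20, 30, 80):
    print(hours, "h ->", credit_for(hours, 24, 72))
```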
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
PantherX
Site Moderator
 
Posts: 6345
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: WS server for GPU down?

Postby Joe_H » Thu May 21, 2020 9:46 pm

Reuzenkakatoe wrote:
HugoNotte wrote: I have tried folding on a GT 630M, which has the same specs as the desktop version apart from being 150 MHz slower. But I don't think that would make a big difference. It's really not worth it, since it exceeded the Timeout on every WU. It did manage to finish before Deadline, but I feel GPUs that regularly exceed Timeout put additional strain on the server, since the same WU then gets sent out again.


My GT 630 also often exceeds the timeout. Timeouts are often 24 hours, which is rather tight. However, my understanding was that finishing before the expiration date is still useful to the F@H project. Is this true? If not, I'll turn off the GPU altogether so that the running CPU threads get some more breathing space.
Interesting stuff, this.


Besides the other answers given, it is hard to put out a hard and fast rule for the GT 630 and some of the other cards nVidia has branded on the low end. They have used the GT 630 designation for at least 4 different desktop cards. Two of them were based on the Fermi GPU chips, basically rebranded GT 400 series cards. The other two were based on Kepler chips and have more shader cores than the Fermi based cards.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 6451
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: WS server for GPU down?

Postby Boluker » Mon Jun 01, 2020 10:25 pm

Hi folks! Our poor work servers are straining under the load from all of you amazing folks contributing your spare computing cycles. We're actively working to spin up more servers on our end to handle the load, but please bear with us---it may take another day or two before we can fully scale up to handle the load.
Boluker
 
Posts: 1
Joined: Mon Jun 01, 2020 10:22 pm

