Quadro NVS 160M Failing

Post requests to add new GPUs to the official whitelist here.

Moderators: Site Moderators, FAHC Science Team

gwildperson
Posts: 450
Joined: Tue Dec 04, 2007 8:36 pm

Re: Quadro NVS 160M Failing

Post by gwildperson »

A WU can remain in the work queue if there's a problem sending it. Your best option is to post the portion of the log near where project 5769 (run 7, clone 91, gen 4467) finished and started trying to send its results.
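
One way to pull out "the portion of the log near" a particular WU is a grep-style context search on its project/run/clone/gen string. This is a hypothetical helper, not part of FAHClient. It assumes the log uses the "project:5769 run:7 clone:91 gen:4467" format seen in the excerpts in this thread, and you supply the path to your own log file.

```python
import re
from typing import List


def wu_log_excerpt(lines: List[str], project: int, run: int, clone: int,
                   gen: int, context: int = 5) -> List[str]:
    """Return log lines within `context` lines of any line that names the WU.

    Works like `grep -C`: overlapping windows are merged and the original
    line order is preserved.
    """
    # Matches the "project:5769 run:7 clone:91 gen:4467" style used in the log;
    # the trailing \b stops gen:4467 from also matching gen:44670.
    pattern = re.compile(
        rf"project:{project} run:{run} clone:{clone} gen:{gen}\b"
    )
    keep = set()
    for i, line in enumerate(lines):
        if pattern.search(line):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return [lines[i] for i in sorted(keep)]


def print_excerpt(path: str, project: int, run: int, clone: int, gen: int,
                  context: int = 5) -> None:
    """Print the excerpt for one WU from the client log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    for line in wu_log_excerpt(lines, project, run, clone, gen, context):
        print(line)
```

For this thread's WU that would be print_excerpt(path_to_log, 5769, 7, 91, 4467); the context window also catches the WARNING/ERROR lines that follow a send attempt but don't repeat the project string.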

The "Unknown" credit problem can be ignored. You'll still get the right number of points when the WU is uploaded; the server just never told the client what that number would be. A server upgrade will fix that someday.
AYColumbia
Posts: 39
Joined: Thu Oct 20, 2011 1:44 pm

Re: Quadro NVS 160M Failing

Post by AYColumbia »

Here's the first time it occurs in the log:

21:00:29:WU01:FS01:FahCore 0xa4 started
21:00:29:WU00:FS00:Sending unit results: id:00 state:SEND error:DUMPED project:5769 run:7 clone:91 gen:4467 core:0x11 unit:0x344702ac50086e8e1173005b00071689
21:00:29:WU00:FS00:Uploading 6.46KiB to 171.67.108.11
21:00:29:WU02:FS00:Connecting to assign-GPU.stanford.edu:80
21:00:29:WU00:FS00:Connecting to 171.67.108.11:8080
21:00:29:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to read stream
21:00:29:WU00:FS00:Trying to send results to collection server
21:00:29:WU00:FS00:Uploading 6.46KiB to 171.67.108.25
21:00:29:WU00:FS00:Connecting to 171.67.108.25:8080
21:00:29:WU02:FS00:News: Welcome to Folding@Home
21:00:29:WU02:FS00:Assigned to work server 171.67.108.21
21:00:29:WU02:FS00:Requesting new work unit for slot 00: READY gpu:0:"G98M [Quadro NVS 160M]" from 171.67.108.21
21:00:29:WU02:FS00:Connecting to 171.67.108.21:8080

Here's the last entry (there were similar retry entries in between):

11:01:56:WU00:FS00:Sending unit results: id:00 state:SEND error:DUMPED project:5769 run:7 clone:91 gen:4467 core:0x11 unit:0x344702ac50086e8e1173005b00071689
11:01:56:WU00:FS00:Uploading 6.46KiB to 171.67.108.11
11:01:57:WU00:FS00:Connecting to 171.67.108.11:8080
11:01:57:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to read stream
11:01:57:WU00:FS00:Trying to send results to collection server
11:01:57:WU00:FS00:Uploading 6.46KiB to 171.67.108.25
11:01:57:WU00:FS00:Connecting to 171.67.108.25:8080
11:01:58:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
11:01:58:WU00:FS00:Connecting to 171.67.108.25:80
11:01:59:ERROR:WU00:FS00:Exception: Failed to connect to 171.67.108.25:80: No connection could be made because the target machine actively refused it.
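
The second excerpt shows the full upload fallback order for WU00: the work server on port 8080, then the collection server on 8080, then the collection server on port 80. That order is read off these log lines, not taken from any client documentation. A minimal sketch (hypothetical helper names) that lists the connection attempts for one slot, which makes repeated retries easier to compare:

```python
import re
from typing import List, Tuple

# Matches connection lines such as
#   "11:01:57:WU00:FS00:Connecting to 171.67.108.25:8080"
# WARNING/ERROR lines put the level before the WU tag, so they do not match.
CONNECT_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?P<unit>WU\d+):FS\d+:"
    r"Connecting to (?P<host>[\w.-]+):(?P<port>\d+)$"
)


def connection_attempts(lines: List[str], unit: str = "WU00") -> List[Tuple[str, str, int]]:
    """Return (time, host, port) for every connection attempt logged for `unit`."""
    attempts = []
    for line in lines:
        m = CONNECT_RE.match(line.strip())
        if m and m.group("unit") == unit:
            attempts.append((m.group("time"), m.group("host"), int(m.group("port"))))
    return attempts
```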
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Quadro NVS 160M Failing

Post by bruce »

I'm not sure anything can be done about this WU. It was reissued and has since been completed successfully by others, so the original WU was not corrupt. Whether the problem was caused by something on your system (such as your 160M) or by something on the server, I simply don't know, and "Failed to read stream" isn't a message I fully understand. I guess we'll just have to forget about this one and see whether it happens again, either to you or to others.

If it still shows in your Work Queue, I would either run ./FAHClient --dump 00 or delete the 00 folder inside the work folder, but that's not particularly important.
AYColumbia
Posts: 39
Joined: Thu Oct 20, 2011 1:44 pm

Re: Quadro NVS 160M Failing

Post by AYColumbia »

Ah, thanks for the tip on getting rid of it. BTW, how quickly does a WU get reassigned? Sometimes I pause folding, or I may need to reboot my laptop. Does the WU get reassigned during these short windows?
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Quadro NVS 160M Failing

Post by bruce »

A pause or a reboot has nothing to do with reassignment, since the server has no knowledge of either.

1) If the client uploads a successful result, the project moves on.
2) If the client uploads an error report or a report that the WU has been deleted, the same WU is reassigned quickly.
3) If the WU simply disappears (e.g., after reformatting and reinstalling the OS and FAH, or whatever happened to you), the server waits until the Timeout (Preferred Deadline) to conclude that it probably won't see a result from the machine it was assigned to, and then reassigns it to someone else.

That's why the QRB (Quick Return Bonus) is based on the Timeout. The extra delay waiting for the WU to expire, plus the cost of having someone else process the lost WU, adds up over time, so it's best to avoid it whenever possible.
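
The three cases can be modeled as a small decision function. This is a hypothetical sketch of the behavior as described in this post, not Folding@home server code; the names Report and server_action and the returned strings are made up for illustration.

```python
from datetime import datetime, timedelta
from enum import Enum, auto


class Report(Enum):
    """What the server has heard back about an assigned WU (hypothetical model)."""
    RESULT = auto()   # case 1: a successful result was uploaded
    ERROR = auto()    # case 2: an error report, or a report that the WU was deleted
    NOTHING = auto()  # case 3: nothing received (a pause or reboot also looks like this)


def server_action(report: Report, assigned_at: datetime,
                  timeout: timedelta, now: datetime) -> str:
    """Return what the server does next for one WU under the three cases."""
    if report is Report.RESULT:
        return "move on"
    if report is Report.ERROR:
        return "reassign now"
    # Nothing received: the server can only wait for the Timeout to pass.
    if now >= assigned_at + timeout:
        return "reassign to another donor"
    return "wait"
```

A pause or reboot never sends anything to the server, so from its side that is simply case 3 with the Timeout not yet reached: the function returns "wait".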
AYColumbia
Posts: 39
Joined: Thu Oct 20, 2011 1:44 pm

Re: Quadro NVS 160M Failing

Post by AYColumbia »

Thanks bruce.