GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26


Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby noorman » Tue Feb 16, 2010 2:45 pm

.

I wouldn't delete anything just yet; I'm sure Stanford is working on the problems.
They've only just started their morning ...

They have been notified, and since then I've seen CS6 go into REJECT and back into Accepting. IMO that means they are working on it.


.
- stopped Linux SMP w. HT on i7-860@3.5 GHz
....................................
Folded since 10-06-04 till 09-2010
noorman
 
Posts: 553
Joined: Sun Dec 02, 2007 2:26 pm
Location: Belgium, near the International Sea-Port of Antwerp

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby noorman » Tue Feb 16, 2010 3:16 pm

.

NetLoad is coming down fast already on CS6 (171.67.108.26) :D


.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby Pette Broad » Tue Feb 16, 2010 3:23 pm

I'm not seeing any problems today; all of my GPU clients are getting, completing and uploading work. The funny thing is that, unlike many others, I don't have any units waiting to be sent. I did go around all my machines to delete any units the server had already received, but none of them were in the queue anyway. Oh, I did have one unsent unit when I got up this morning, but by the time I'd had my cup of tea it had been uploaded.

Pete
Pette Broad
 
Posts: 184
Joined: Mon Dec 03, 2007 9:38 pm
Location: Chester U.K

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby Flathead74 » Tue Feb 16, 2010 3:38 pm

I'm still seeing issues with some systems this morning.
This is a multi-GPU system:

[15:12:20] Completed 23%
[15:12:27] - Couldn't send HTTP request to server
[15:12:27] + Could not connect to Work Server (results)
[15:12:27] (171.67.108.21:80)
[15:12:27] - Error: Could not transmit unit 06 (completed February 15) to work server.
[15:12:27] - 16 failed uploads of this unit.
[15:12:27] - Read packet limit of 540015616... Set to 524286976.


[15:12:27] + Attempting to send results [February 16 15:12:27 UTC]
[15:12:27] - Reading file work/wuresults_06.dat from core
[15:12:27] (Read 131152 bytes from disk)
[15:12:27] Connecting to http://171.67.108.26:8080/
[15:13:10] Completed 24%
[15:14:00] Completed 25%
[15:14:51] Completed 26%
[15:15:41] Completed 27%
[15:16:31] Completed 28%
[15:17:21] Completed 29%
[15:17:38] Posted data.
[15:17:38] Initial: 0000; - Uploaded at ~0 kB/s
[15:17:38] - Averaged speed for that direction ~44 kB/s
[15:17:38] - Server does not have record of this unit. Will try again later.
[15:17:38] Could not transmit unit 06 to Collection server; keeping in queue.
[15:17:38] + Sent 0 of 1 completed units to the server
[15:17:38] - Autosend completed
[15:17:38] + Working...
[15:18:11] Completed 30%
Flathead74
 
Posts: 618
Joined: Sun Dec 02, 2007 6:08 pm
Location: Central New York

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby VijayPande » Tue Feb 16, 2010 4:10 pm

I think we've had a breakthrough (well, maybe that's too strong a term), but we've certainly found something that will help. People should be getting lots of credits soon. We have to see whether this will fix all of the problems.

Also, Joe is working on this today and may contact some of you for additional information to help us debug this.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
VijayPande
Pande Group Member
 
Posts: 2727
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby bruce » Tue Feb 16, 2010 4:33 pm

Please watch for updates in viewtopic.php?f=24&t=13474 rather than posting "me too" messages.
bruce
Site Admin
 
Posts: 16851
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby Teddy » Tue Feb 16, 2010 6:47 pm

Not really a "me too", but the client just pauses and won't upload a protein downloaded from .71.

Code:
[18:06:02] Completed 100%
[18:06:02] Successful run
[18:06:02] DynamicWrapper: Finished Work Unit: sleep=10000
[18:06:12] Reserved 101312 bytes for xtc file; Cosm status=0
[18:06:12] Allocated 101312 bytes for xtc file
[18:06:12] - Reading up to 101312 from "work/wudata_06.xtc": Read 101312
[18:06:12] Read 101312 bytes from xtc file; available packet space=786329152
[18:06:12] xtc file hash check passed.
[18:06:12] Reserved 30216 30216 786329152 bytes for arc file=<work/wudata_06.trr> Cosm status=0
[18:06:12] Allocated 30216 bytes for arc file
[18:06:12] - Reading up to 30216 from "work/wudata_06.trr": Read 30216
[18:06:12] Read 30216 bytes from arc file; available packet space=786298936
[18:06:12] trr file hash check passed.
[18:06:12] Allocated 560 bytes for edr file
[18:06:12] Read bedfile
[18:06:12] edr file hash check passed.
[18:06:12] Logfile not read.
[18:06:12] GuardedRun: success in DynamicWrapper
[18:06:12] GuardedRun: done
[18:06:12] Run: GuardedRun completed.
[18:06:17] + Opened results file
[18:06:17] - Writing 132600 bytes of core data to disk...
[18:06:17] Done: 132088 -> 131648 (compressed to 99.6 percent)
[18:06:17]   ... Done.
[18:06:17] DeleteFrameFiles: successfully deleted file=work/wudata_06.ckp
[18:06:17] Shutting down core
[18:06:17]
[18:06:17] Folding@home Core Shutdown: FINISHED_UNIT
[18:06:21] CoreStatus = 64 (100)
[18:06:21] Unit 6 finished with 97 percent of time to deadline remaining.
[18:06:21] Updated performance fraction: 0.977286
[18:06:21] Sending work to server
[18:06:21] Project: 10105 (Run 412, Clone 5, Gen 4)
[18:06:21] - Read packet limit of 540015616... Set to 524286976.


[18:06:21] + Attempting to send results [February 16 18:06:21 UTC]
[18:06:21] - Reading file work/wuresults_06.dat from core
[18:06:21]   (Read 132160 bytes from disk)
[18:06:21] Connecting to http://171.64.65.71:8080/
[18:14:48] Posted data.
[18:34:48] Initial: 001A; ***** Got a SIGTERM signal (2)
[18:36:44] Killing all core threads

Folding@Home Client Shutdown.

It just stops at "Initial: 001A".

Restarting the client gives
Code:
[18:36:48] Loaded queue successfully.
[18:36:48] - Preparing to get new work unit...
[18:36:48] + Attempting to get work packet
[18:36:48] - Autosending finished units... [February 16 18:36:48 UTC]
[18:36:48] - Will indicate memory of 2046 MB[18:36:48] Trying to send all finished work units

[18:36:48] Project: 10105 (Run 412, Clone 5, Gen 4)
[18:36:48] - Detect CPU.- Read packet limit of 540015616...  Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 10
Set to 524286976.
[18:36:48] - Connecting to assignment server
[18:36:48] Connecting to http://assign-GPU.stanford.edu:8080/


[18:36:48] + Attempting to send results [February 16 18:36:48 UTC]
[18:36:48] - Reading file work/wuresults_06.dat from core
[18:36:48]   (Read 132160 bytes from disk)
[18:36:48] Connecting to http://171.64.65.71:8080/
[18:36:53] Posted data.
[18:36:53] Initial: 43AB; - Successful: assigned to (171.67.108.11).
[18:36:53] + News From Folding@Home: Welcome to Folding@Home
[18:36:53] Loaded queue successfully.
[18:36:53] Connecting to http://171.67.108.11:8080/
[18:36:54] Posted data.
[18:36:54] Initial: 0000; - Receiving payload (expected size: 45932)
[18:36:55] - Downloaded at ~44 kB/s
[18:36:55] - Averaged speed for that direction ~41 kB/s
[18:36:55] + Received work.
[18:36:55] + Closed connections
[18:36:55]
[18:36:55] + Processing work unit
[18:36:55] Core required: FahCore_11.exe
[18:36:55] Core found.
[18:36:55] Working on queue slot 07 [February 16 18:36:55 UTC]
[18:36:55] + Working ...
[18:36:55] - Calling '.\FahCore_11.exe -dir work/ -suffix 07 -checkpoint 3 -forceasm -verbose -lifeline 201908 -version 623'

[18:36:56]
[18:36:56] *------------------------------*
[18:36:56] Folding@Home GPU Core
[18:36:56] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[18:36:56]
[18:36:56] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[18:36:56] Build host: amoeba
[18:36:56] Board Type: Nvidia
[18:36:56] Core      :
[18:36:56] Preparing to commence simulation
[18:36:56] - Assembly optimizations manually forced on.
[18:36:56] - Not checking prior termination.
[18:36:56] - Expanded 45420 -> 251112 (decompressed 552.8 percent)
[18:36:56] Called DecompressByteArray: compressed_data_size=45420 data_size=251112, decompressed_data_size=251112 diff=0
[18:36:56] - Digital signature verified
[18:36:56]
[18:36:56] Project: 5769 (Run 5, Clone 7, Gen 1994)
[18:36:56]
[18:36:56] Assembly optimizations on if available.
[18:36:56] Entering M.D.
[18:36:57] Posted data.
[18:36:57] Initial: 0000; - Uploaded at ~14 kB/s
[18:36:57] - Averaged speed for that direction ~19 kB/s
[18:36:57] - Server has already received unit.
[18:36:57] + Sent 0 of 1 completed units to the server
[18:36:57] - Autosend completed
[18:37:02] Tpr hash work/wudata_07.tpr:  1916471031 2789500384 358254593 3107103354 2349751758
[18:37:02]
[18:37:02] Calling fah_main args: 14 usage=100
[18:37:02]
[18:37:02] Working on Protein
[18:37:03] Client config found, loading data.
[18:37:03] Starting GUI Server
[18:37:43] Completed 1%
[18:38:23] Completed 2%


So was the unit uploaded, and why does it hang after the "Initial" step?

Teddy
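One way to approach the question "was the unit uploaded?" is to tally the server responses that appear in the client log. The sketch below is a minimal, hypothetical helper (`upload_outcomes` and its short labels are made up for illustration); the message strings are copied from the logs quoted in this thread, and other client versions may word them differently. A count of "already received" only says what the server replied; as glussier notes later in the thread, it does not by itself prove the unit was credited.

```python
from collections import Counter

# Server/client messages copied verbatim from logs posted in this thread.
# The short labels on the left are hypothetical names for this sketch.
OUTCOMES = {
    "already received": "Server has already received unit.",
    "no record": "Server does not have record of this unit.",
    "no ACK": "Unknown packet returned from server, expected ACK for results",
    "no connection": "Could not connect to Work Server",
}

def upload_outcomes(log_text: str) -> Counter:
    """Count how often each known upload response appears in a FAH log."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, message in OUTCOMES.items():
            if message in line:
                counts[label] += 1
    return counts
```

Feeding it the text of FAHlog.txt gives a quick picture of whether a client is stuck in a retry loop (many "no ACK" or "already received" hits for the same unit) or simply cannot reach the server.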
Teddy
 
Posts: 161
Joined: Tue Feb 12, 2008 3:05 am
Location: Canberra, Australia

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby bruce » Tue Feb 16, 2010 7:15 pm

Teddy wrote: Not really a me too but the client just pauses and won't upload a protein downloaded from 71.
. . .
So is the unit uploaded & why does it hang after the initial bit...

Teddy


As you know, they're working on the problem. You're asking the exact same questions that the developers are (probably) asking themselves. Also, you're probably seeing an interim fix which improves part of the problem but causes something different to happen. Until they call the problem resolved, all I can tell you is that they're working on the problem.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby glussier » Tue Feb 16, 2010 8:23 pm

bruce wrote: Also, you're probably seeing an interim fix which improves part of the problem but causes something different to happen.


For me, it had been working OK since yesterday evening. Now I'm back to the same "- Server has already received unit." message I had on Sunday. So there is a change all right; however, I'm not sure it's in the right direction.
glussier
 
Posts: 16
Joined: Wed Nov 18, 2009 3:57 am

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby bruce » Tue Feb 16, 2010 9:21 pm

With the current setup, it looks like the client successfully uploads the WU the first time but something goes wrong. The server does not confirm the upload (the message about the ACK) so the client continues to believe that it was not uploaded. Then the client repeatedly tries to upload the WU even though the server doesn't want a second copy. If this is correct, then the "already received unit" message is probably correct and the most direct approach would be to delete the WU manually. The earlier message "server has no record" really means the same thing: The collection server is no longer authorized to accept a copy of this WU because FAH is no longer waiting for you to return it.

Before taking that sort of action, however, we need to confirm that my assessment is true.
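A toy model of the failure mode described above may make the reasoning easier to follow: the server records the work unit (WU) on the first upload, but the ACK never reaches the client, so the client keeps the unit queued and every retry is refused. Everything here (the class, the `ack_lost_on_first` flag, the reply strings) is a hypothetical simplification for illustration, not the real FAH client or server protocol.

```python
from dataclasses import dataclass, field

@dataclass
class CollectionServer:
    """Hypothetical server that accepts each work unit exactly once."""
    received: set = field(default_factory=set)

    def upload(self, wu_id: str) -> str:
        if wu_id in self.received:
            return "already received"
        self.received.add(wu_id)
        return "ACK"

def client_send(server: CollectionServer, wu_id: str, attempts: int,
                ack_lost_on_first: bool = True):
    """Retry the upload up to `attempts` times.

    Returns the replies the client saw and whether the unit is still
    queued. In this model only an ACK clears the unit from the queue.
    """
    seen = []
    in_queue = True
    for i in range(attempts):
        reply = server.upload(wu_id)
        if i == 0 and ack_lost_on_first:
            # The server stored the result, but the ACK was lost in transit.
            reply = "unknown packet"
        seen.append(reply)
        if reply == "ACK":
            in_queue = False
            break
    return seen, in_queue
```

With the first ACK lost, the client sees "unknown packet" followed only by "already received" replies and never clears the unit, even though the server already holds a copy; with `ack_lost_on_first=False` a single upload succeeds.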

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby filu » Tue Feb 16, 2010 9:33 pm

The same thing again, and it had been working fine. It seems I'm only wasting electricity.
Code:
[20:35:40] Completed 100%
[20:35:40] Successful run
[20:35:40] DynamicWrapper: Finished Work Unit: sleep=10000
[20:35:50] Reserved 101348 bytes for xtc file; Cosm status=0
[20:35:50] Allocated 101348 bytes for xtc file
[20:35:50] - Reading up to 101348 from "work/wudata_03.xtc": Read 101348
[20:35:50] Read 101348 bytes from xtc file; available packet space=786329116
[20:35:50] xtc file hash check passed.
[20:35:50] Reserved 30216 30216 786329116 bytes for arc file=<work/wudata_03.trr> Cosm status=0
[20:35:50] Allocated 30216 bytes for arc file
[20:35:50] - Reading up to 30216 from "work/wudata_03.trr": Read 30216
[20:35:50] Read 30216 bytes from arc file; available packet space=786298900
[20:35:50] trr file hash check passed.
[20:35:50] Allocated 560 bytes for edr file
[20:35:50] Read bedfile
[20:35:50] edr file hash check passed.
[20:35:50] Logfile not read.
[20:35:50] GuardedRun: success in DynamicWrapper
[20:35:50] GuardedRun: done
[20:35:50] Run: GuardedRun completed.
[20:35:50] + Opened results file
[20:35:50] - Writing 132636 bytes of core data to disk...
[20:35:50] Done: 132124 -> 131610 (compressed to 99.6 percent)
[20:35:50]   ... Done.
[20:35:50] DeleteFrameFiles: successfully deleted file=work/wudata_03.ckp
[20:35:50] Shutting down core
[20:35:50]
[20:35:50] Folding@home Core Shutdown: FINISHED_UNIT
[20:35:53] CoreStatus = 64 (100)
[20:35:53] Sending work to server
[20:35:53] Project: 10102 (Run 970, Clone 9, Gen 9)
[20:35:53] - Read packet limit of 540015616... Set to 524286976.


[20:35:53] + Attempting to send results [February 16 20:35:53 UTC]
[21:12:21] - Unknown packet returned from server, expected ACK for results
[21:12:21] - Error: Could not transmit unit 03 (completed February 16) to work server.
[21:12:21]   Keeping unit 03 in queue.
[21:12:21] Project: 10102 (Run 970, Clone 9, Gen 9)
[21:12:21] - Read packet limit of 540015616... Set to 524286976.


[21:12:21] + Attempting to send results [February 16 21:12:21 UTC]
[21:12:48] - Server has already received unit.
[21:12:48] - Preparing to get new work unit...
[21:12:48] + Attempting to get work packet
[21:12:48] - Connecting to assignment server
[21:12:49] - Successful: assigned to (171.64.65.71).
[21:12:49] + News From Folding@Home: Welcome to Folding@Home
[21:12:49] Loaded queue successfully.
[21:13:01] + Closed connections
[21:13:01]
[21:13:01] + Processing work unit
[21:13:01] Core required: FahCore_11.exe
[21:13:01] Core found.
[21:13:01] Working on queue slot 04 [February 16 21:13:01 UTC]
[21:13:01] + Working ...
[21:13:01]
[21:13:01] *------------------------------*
[21:13:01] Folding@Home GPU Core
[21:13:01] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[21:13:01]
[21:13:01] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[21:13:01] Build host: amoeba
[21:13:01] Board Type: Nvidia
[21:13:01] Core      :
[21:13:01] Preparing to commence simulation
[21:13:01] - Looking at optimizations...
[21:13:01] DeleteFrameFiles: successfully deleted file=work/wudata_04.ckp
[21:13:01] - Created dyn
[21:13:01] - Files status OK
[21:13:01] - Expanded 88717 -> 447307 (decompressed 504.1 percent)
[21:13:01] Called DecompressByteArray: compressed_data_size=88717 data_size=447307, decompressed_data_size=447307 diff=0
[21:13:01] - Digital signature verified
[21:13:01]
[21:13:01] Project: 10102 (Run 104, Clone 6, Gen 10)
[21:13:01]
[21:13:01] Assembly optimizations on if available.
[21:13:01] Entering M.D.
[21:13:07] Tpr hash work/wudata_04.tpr:  1694142915 2516875213 1619875581 2972462457 686066882
[21:13:07]
[21:13:07] Calling fah_main args: 14 usage=100
i7-2600K@4.8 Asus P8P67 EVO 2x2GB GTX480
i7-920@4.0 GA-EX58-UD5 3x2GB 2xGTX560Ti
2x Xeon 5620 6x 2GB
2x Xeon 5645 6x 2GB
filu
 
Posts: 76
Joined: Mon Aug 03, 2009 9:33 am
Location: Krzeszyce, Poland

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby Nathan_P » Tue Feb 16, 2010 9:47 pm

171.67.108.26 is in REJECT again; CPU load is 10+ and net load is 828. Can someone please give it a nudge?
Censorship leads to dictatorship

Nathan_P
 
Posts: 1321
Joined: Wed Apr 01, 2009 9:22 pm
Location: Jersey, Channel islands

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby ikerekes » Tue Feb 16, 2010 9:48 pm

@Bruce
The situation might be even worse :(
I have just finished a WU, and the client doesn't even try to upload it... it just gets the next WU:
Code:
[20:57:48] Completed 98%
[20:59:37] Completed 99%
[21:01:25] Completed 100%
[21:01:25] Successful run
[21:01:25] DynamicWrapper: Finished Work Unit: sleep=10000
[21:01:35] Reserved 101284 bytes for xtc file; Cosm status=0
[21:01:35] Allocated 101284 bytes for xtc file
[21:01:35] - Reading up to 101284 from "work/wudata_05.xtc": Read 101284
[21:01:35] Read 101284 bytes from xtc file; available packet space=786329180
[21:01:35] xtc file hash check passed.
[21:01:35] Reserved 30216 30216 786329180 bytes for arc file=<work/wudata_05.trr> Cosm status=0
[21:01:35] Allocated 30216 bytes for arc file
[21:01:35] - Reading up to 30216 from "work/wudata_05.trr": Read 30216
[21:01:35] Read 30216 bytes from arc file; available packet space=786298964
[21:01:35] trr file hash check passed.
[21:01:35] Allocated 560 bytes for edr file
[21:01:35] Read bedfile
[21:01:35] edr file hash check passed.
[21:01:35] Logfile not read.
[21:01:35] GuardedRun: success in DynamicWrapper
[21:01:35] GuardedRun: done
[21:01:35] Run: GuardedRun completed.
[21:01:40] + Opened results file
[21:01:40] - Writing 132572 bytes of core data to disk...
[21:01:40] Done: 132060 -> 131598 (compressed to 99.6 percent)
[21:01:40]   ... Done.
[21:01:40] DeleteFrameFiles: successfully deleted file=work/wudata_05.ckp
[21:01:40] Shutting down core
[21:01:40]
[21:01:40] Folding@home Core Shutdown: FINISHED_UNIT
[21:01:43] CoreStatus = 64 (100)
[21:01:43] Sending work to server
[21:01:43] - Preparing to get new work unit...
[21:01:43] + Attempting to get work packet
[21:01:43] - Connecting to assignment server
[21:01:43] - Successful: assigned to (171.67.108.21).
[21:01:43] + News From Folding@Home: Welcome to Folding@Home
[21:01:43] Loaded queue successfully.
[21:01:44] + Closed connections
[21:01:44]
[21:01:44] + Processing work unit
[21:01:44] Core required: FahCore_11.exe
[21:01:44] Core found.
[21:01:44] Working on queue slot 06 [February 16 21:01:44 UTC]
[21:01:44] + Working ...
[21:01:44]
[21:01:44] *------------------------------*
[21:01:44] Folding@Home GPU Core
[21:01:44] Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
[21:01:44]
[21:01:44] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[21:01:44] Build host: amoeba
[21:01:44] Board Type: Nvidia
[21:01:44] Core      :
[21:01:44] Preparing to commence simulation
[21:01:44] - Looking at optimizations...
[21:01:44] DeleteFrameFiles: successfully deleted file=work/wudata_06.ckp
[21:01:44] - Created dyn
[21:01:44] - Files status OK
[21:01:44] - Expanded 18488 -> 137900 (decompressed 745.8 percent)
[21:01:44] Called DecompressByteArray: compressed_data_size=18488 data_size=137900, decompressed_data_size=137900 diff=0
[21:01:44] - Digital signature verified
[21:01:44]
[21:01:44] Project: 3469 (Run 32, Clone 147, Gen 0)
[21:01:44]
[21:01:44] Assembly optimizations on if available.
[21:01:44] Entering M.D.
[21:01:50] Tpr hash work/wudata_06.tpr:  2877148812 3580960738 2109933301 2426840157 610428056
[21:01:50]
[21:01:50] Calling fah_main args: 14 usage=100
[21:01:50]
[21:01:50] Working on Fs-peptide-GBSA
[21:01:50] Client config found, loading data.
[21:01:50] Starting GUI Server
[21:02:39] Completed 1%
[21:03:29] Completed 2%
[21:04:18] Completed 3%
[21:05:07] Completed 4%
[21:05:57] Completed 5%
[21:06:46] Completed 6%
[21:07:36] Completed 7%
[21:08:25] Completed 8%
[21:09:14] Completed 9%
[21:10:04] Completed 10%
[21:10:53] Completed 11%
[21:11:42] Completed 12%
[21:12:32] Completed 13%
[21:13:21] Completed 14%
[21:14:10] Completed 15%
[21:15:00] Completed 16%
[21:15:49] Completed 17%
[21:16:38] Completed 18%
[21:17:28] Completed 19%
[21:18:17] Completed 20%
[21:19:06] Completed 21%
[21:19:56] Completed 22%
[21:20:45] Completed 23%

The corresponding qd listing looks like this:
Code:
kerekei@ubiQ6600-desktop:~$ foldingathome/qd -f /mnt/ocngpu|grep Index -A 4
 Index 7: finished 23.5 X min speed
  server: 171.64.65.71:8080; project: 10105
  Folding: run 439, clone 8, generation 3; benchmark 0; misc: 500, 200
  issue: Mon Feb 15 18:13:27 2010; begin: Mon Feb 15 18:13:26 2010
  end: Mon Feb 15 21:17:39 2010; due: Thu Feb 18 18:13:26 2010 (3 days)
--
 Index 8: finished 23.8 X min speed
  server: 171.64.65.71:8080; project: 10103
  Folding: run 841, clone 8, generation 5; benchmark 0; misc: 500, 200
  issue: Mon Feb 15 21:21:16 2010; begin: Mon Feb 15 21:21:15 2010
  end: Tue Feb 16 00:22:50 2010; due: Thu Feb 18 21:21:15 2010 (3 days)
--
 Index 9: deleted
  server: 171.64.65.71:8080; project: 10103
  Folding: run 960, clone 3, generation 5; benchmark 0; misc: 500, 200
  issue: Tue Feb 16 00:35:51 2010; begin: Tue Feb 16 00:35:51 2010
  end: Tue Feb 16 03:27:03 2010; due: Fri Feb 19 00:35:51 2010 (3 days)
--
 Index 0: finished 353.00 pts (154.242 pt/hr) 31.5 X min speed
  server: 171.67.108.11:8080; project: 5772
  Folding: run 5, clone 142, generation 2096; benchmark 0; misc: 500, 200
  issue: Tue Feb 16 03:27:05 2010; begin: Tue Feb 16 03:27:05 2010
  end: Tue Feb 16 05:44:24 2010; due: Fri Feb 19 03:27:05 2010 (3 days)
--
 Index 1: ready for upload 164 X min speed
  server: 171.67.108.21:8080; project: 3470
  Folding: run 10, clone 62, generation 0; benchmark 0; misc: 500, 200
  issue: Fri Feb 12 20:27:41 2010; begin: Fri Feb 12 20:27:36 2010
  end: Fri Feb 12 21:46:40 2010; due: Sun Feb 21 20:27:36 2010 (9 days)
--
 Index 2: ready for upload
  server: 171.64.65.71:8080; project: 10102
  Folding: run 363, clone 0, generation 9; benchmark 0; misc: 500, 200
  issue: Mon Feb 15 08:19:47 2010; begin: Mon Feb 15 08:19:47 2010
  end: Mon Feb 15 09:02:25 2010; due: Thu Feb 18 08:19:47 2010 (3 days)
--
 Index 3: deleted 783.00 pts
  server: 171.67.108.21:8080; project: 5783
  Folding: run 7, clone 28, generation 29; benchmark 0; misc: 500, 200
  issue: Tue Feb 16 05:45:02 2010; begin: Tue Feb 16 05:45:01 2010
  end: Tue Feb 16 08:49:51 2010; due: Sat Mar 13 05:45:01 2010 (25 days)
--
 Index 4: finished 353.00 pts (246.231 pt/hr) 50.2 X min speed
  server: 171.67.108.11:8080; project: 5767
  Folding: run 0, clone 139, generation 1831; benchmark 0; misc: 500, 200
  issue: Tue Feb 16 08:53:15 2010; begin: Tue Feb 16 08:53:14 2010
  end: Tue Feb 16 10:19:15 2010; due: Fri Feb 19 08:53:14 2010 (3 days)
--
 Index 5: ready for upload 23.8 X min speed
  server: 171.64.65.71:8080; project: 10105
  Folding: run 497, clone 5, generation 4; benchmark 0; misc: 500, 200
  issue: Tue Feb 16 11:00:07 2010; begin: Tue Feb 16 11:00:06 2010
  end: Tue Feb 16 14:01:43 2010; due: Fri Feb 19 11:00:06 2010 (3 days)
--
 Index 6: folding now 157 X min speed; 49% complete
  server: 171.67.108.21:8080; project: 3469
  Folding: run 32, clone 147, generation 0; benchmark 0; misc: 500, 200
  issue: Tue Feb 16 14:01:45 2010; begin: Tue Feb 16 14:01:44 2010
  expect: Tue Feb 16 15:24:12 2010; due: Thu Feb 25 14:01:44 2010 (9 days)
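With many queue slots it is easy to miss which ones are still stuck. The sketch below pulls the slots that qd reports as "ready for upload", together with their server and project. The line format is inferred only from the listing above (other qd versions may print differently), and `pending_uploads` is a hypothetical helper, not part of qd.

```python
import re

# Patterns inferred from the qd output shown above.
INDEX_RE = re.compile(r"^\s*Index (\d+): (.*)$")
SERVER_RE = re.compile(r"^\s*server: ([^;]+); project: (\d+)")

def pending_uploads(qd_text: str):
    """Return (index, server, project) for each slot marked 'ready for upload'."""
    pending = []
    current = None  # (index, status text) of the slot currently being read
    for line in qd_text.splitlines():
        m = INDEX_RE.match(line)
        if m:
            current = (int(m.group(1)), m.group(2))
            continue
        m = SERVER_RE.match(line)
        if m and current and current[1].startswith("ready for upload"):
            pending.append((current[0], m.group(1), int(m.group(2))))
    return pending
```

For the listing above this would flag slots 1, 2 and 5, which makes it easy to see that most of the stuck results belong to 171.64.65.71.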
ikerekes
 
Posts: 170
Joined: Thu Nov 13, 2008 4:18 pm
Location: Calgary, Canada

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby glussier » Tue Feb 16, 2010 9:55 pm

bruce wrote: With the current setup, it looks like the client successfully uploads the WU the first time but something goes wrong. The server does not confirm the upload (the message about the ACK) so the client continues to believe that it was not uploaded. Then the client repeatedly tries to upload the WU even though the server doesn't want a second copy. If this is correct, then the "already received unit" message is probably correct and the most direct approach would be to delete the WU manually. The earlier message "server has no record" really means the same thing: The collection server is no longer authorized to accept a copy of this WU because FAH is no longer waiting for you to return it.

Before taking that sort of action, however, we need to confirm that my assessment is true.


I'm not so sure about that, because I never get credited the points for those work units. I've been down 15 to 20k points/day since last Sunday.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Postby noorman » Tue Feb 16, 2010 10:21 pm

Nathan_P wrote: 171.67.108.26 is in reject again, cpu load is 10+ and the net load is 828. Can someone please give it a nudge
.

It's Accepting again ...

They are looking into this.

Some work servers don't use a collection server (CS) to collect their results; they do that themselves.

If your client doesn't progress anymore and can't or won't get work after the results have been sent (see the messages Bruce talked about in the post above this one), you can try stopping the client, deleting the work folder and the queue.dat file, and restarting the client.

I did so at around noon my time, and now, 12 hours later, I have been assigned other GPU servers that are working 100% and are giving me work and collecting results normally ...
It's worth a try :D
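The reset above can be scripted. This sketch swaps deletion for a move into a timestamped backup folder, so the files can still be inspected later. `reset_client_state` and the assumed layout (a `work` folder and `queue.dat` directly inside the client directory) are hypothetical; stop the client before running it, because the code does not check. Any results still sitting in `work` will no longer be seen, and therefore not uploaded, by the client.

```python
import shutil
from datetime import datetime
from pathlib import Path

def reset_client_state(client_dir: str) -> Path:
    """Move work/ and queue.dat into a timestamped backup folder.

    Assumes the client is already stopped; nothing here verifies that.
    Returns the path of the backup folder.
    """
    root = Path(client_dir)
    backup = root / ("backup-" + datetime.now().strftime("%Y%m%d-%H%M%S"))
    backup.mkdir()
    for name in ("work", "queue.dat"):
        target = root / name
        if target.exists():
            # Move rather than delete, so nothing is lost irreversibly.
            shutil.move(str(target), str(backup / name))
    return backup
```

On restart the client finds no queue and no work folder, which should make it start from scratch as in the manual procedure described above.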


.
