bollix47 wrote:There appears to be two problems here.
1. Why is the transfer failing for the Work Server(129.74.85.15)?
2. Why is the Collection Server(129.74.85.16) dumping the results?
p.s. I just uploaded another WU to 129.74.85.15 and it worked fine.

1. Why is the transfer failing for the Work Server(129.74.85.15)?
129.74.85.15 (fahnd03) was in reject mode sometime between 09:30 and 10:00 PDT.
Code:
Tue Sep 25 09:30:01 PDT 2012 129.74.85.15 fahnd03 izaguirre SMP full Accepting
Tue Sep 25 09:40:00 PDT 2012 129.74.85.15 fahnd03 izaguirre SMP full Reject
Tue Sep 25 09:50:00 PDT 2012 129.74.85.15 fahnd03 izaguirre SMP full Reject
Tue Sep 25 10:00:01 PDT 2012 129.74.85.15 fahnd03 izaguirre SMP full Accepting
To make things a little easier, I filtered your log:
Code:
16:30:15:WU00:FS00:0xa4:Completed 10000000 out of 10000000 steps (100%)
16:30:15:WU00:FS00:0xa4:DynamicWrapper: Finished Work Unit: sleep=10000
16:30:25:WU00:FS00:0xa4:
16:30:25:WU00:FS00:0xa4:Finished Work Unit:
16:30:25:WU00:FS00:0xa4:- Reading up to 2011920 from "00/wudata_01.trr": Read 2011920
16:30:25:WU00:FS00:0xa4:trr file hash check passed.
16:30:25:WU00:FS00:0xa4:- Reading up to 209244 from "00/wudata_01.xtc": Read 209244
16:30:25:WU00:FS00:0xa4:xtc file hash check passed.
16:30:25:WU00:FS00:0xa4:edr file hash check passed.
16:30:25:WU00:FS00:0xa4:logfile size: 79524
16:30:25:WU00:FS00:0xa4:Leaving Run
16:30:28:WU00:FS00:0xa4:- Writing 2325160 bytes of core data to disk...
16:30:29:WU00:FS00:0xa4:Done: 2324648 -> 1693397 (compressed to 72.8 percent)
16:30:29:WU00:FS00:0xa4: ... Done.
16:30:29:WU00:FS00:0xa4:- Shutting down core
16:30:29:WU00:FS00:0xa4:
16:30:29:WU00:FS00:0xa4:Folding@home Core Shutdown: FINISHED_UNIT
16:30:29:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
16:30:29:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:7025 run:2 clone:85 gen:37 core:0xa4 unit:0x000000570001329c4dfbad0cb27fd6b3
16:30:29:WU00:FS00:Uploading 1.62MiB to 129.74.85.15
16:30:29:WU00:FS00:Connecting to 129.74.85.15:8080
16:32:02:WU00:FS00:Upload 3.87%
16:32:02:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
16:32:02:WU00:FS00:Trying to send results to collection server
16:32:02:WU00:FS00:Uploading 1.62MiB to 129.74.85.16
16:32:02:WU00:FS00:Connecting to 129.74.85.16:8080
16:39:13:WU00:FS00:Upload 3.87%
16:39:13:ERROR:WU00:FS00:Exception: Transfer failed
16:39:13:WU00:FS00:Sending unit results: id:00 state:SEND error:OK project:7025 run:2 clone:85 gen:37 core:0xa4 unit:0x000000570001329c4dfbad0cb27fd6b3
16:39:13:WU00:FS00:Uploading 1.62MiB to 129.74.85.15
16:39:13:WU00:FS00:Connecting to 129.74.85.15:8080
16:52:43:WU00:FS00:Upload 3.87%
16:52:43:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
16:52:43:WU00:FS00:Trying to send results to collection server
16:52:43:WU00:FS00:Uploading 1.62MiB to 129.74.85.16
16:52:43:WU00:FS00:Connecting to 129.74.85.16:8080
16:52:49:WU00:FS00:Upload 58.03%
16:52:57:WU00:FS00:Upload complete
16:52:57:WU00:FS00:Server responded WORK_QUIT (404)
16:52:57:WARNING:WU00:FS00:Server did not like results, dumping
16:52:57:WU00:FS00:Cleaning up
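For anyone who wants to trim their own log the same way, here is a minimal sketch of one way to do it. It is not necessarily how the log above was filtered; the function name `filter_log` and the line pattern are my own assumptions, based only on the line format visible in this log (time, optional WARNING:/ERROR: tag, then the WU and slot IDs).

```python
import re

def filter_log(lines, wu="WU00", slot="FS00"):
    """Keep only the lines that belong to one work unit / slot pair.

    Assumed line shape (taken from the log excerpt in this thread):
    HH:MM:SS:[WARNING:|ERROR:]WUxx:FSxx:message
    """
    pattern = re.compile(
        r"^\d{2}:\d{2}:\d{2}:(?:WARNING:|ERROR:)?"
        + re.escape(wu) + ":" + re.escape(slot) + ":"
    )
    return [line for line in lines if pattern.match(line)]
```

Usage would be something like `filter_log(open("log.txt").read().splitlines())`, where `log.txt` is a placeholder for wherever your client log lives.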
Your initial upload attempt (129.74.85.15) failed at 16:32:02 UTC (09:32:02 PDT).
The second attempt (129.74.85.16) failed at 16:39:13 UTC (09:39:13 PDT).
The third attempt (129.74.85.15 again) failed at 16:52:43 UTC (09:52:43 PDT).
The fourth attempt (back to 129.74.85.16) completed, but the server then rejected the results.
It seems both attempted uploads to the WS (129.74.85.15) occurred during the short time it was in reject mode.

JuanPabloCuervo's 3 failed WS uploads also fell within this timeframe (16:51:13, 16:51:20 and 16:54:18 UTC = 09:51:13, 09:51:20 and 09:54:18 PDT).
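If you want to check other failure times against the server status yourself, the comparison can be sketched roughly like this. It assumes a fixed UTC-7 offset for PDT and hard-codes the four status snapshots quoted in this post; the names `SNAPSHOTS`, `to_pdt` and `surrounding_states` are made up for the illustration. Note that the snapshots are about 10 minutes apart, so they only bracket a failure time; they cannot show the exact moment the server changed state.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PDT treated as a fixed UTC-7 offset (valid for 25 Sep 2012).
PDT = timezone(timedelta(hours=-7))

# Status snapshots for 129.74.85.15 as quoted in this post (PDT times).
SNAPSHOTS = [
    ("09:30:01", "Accepting"),
    ("09:40:00", "Reject"),
    ("09:50:00", "Reject"),
    ("10:00:01", "Accepting"),
]

def to_pdt(utc_hms, day=(2012, 9, 25)):
    """Convert an HH:MM:SS client log time (UTC) to a PDT datetime."""
    t = datetime.strptime(utc_hms, "%H:%M:%S").replace(
        year=day[0], month=day[1], day=day[2], tzinfo=timezone.utc
    )
    return t.astimezone(PDT)

def surrounding_states(utc_hms):
    """Return (PDT time, state at the last snapshot at or before it,
    state at the next snapshot after it)."""
    local = to_pdt(utc_hms).strftime("%H:%M:%S")
    # Zero-padded HH:MM:SS strings sort correctly as plain text.
    before = [state for t, state in SNAPSHOTS if t <= local]
    after = [state for t, state in SNAPSHOTS if t > local]
    return local, (before[-1] if before else None), (after[0] if after else None)

for failed_at in ("16:32:02", "16:52:43", "16:51:13", "16:51:20", "16:54:18"):
    print(failed_at, "UTC ->", surrounding_states(failed_at))
```

The 16:32:02 failure lands between an Accepting and a Reject snapshot, so the server must have switched to reject somewhere in that gap; the 16:5x failures all come after a Reject snapshot and before the next Accepting one.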
2. Why is the Collection Server(129.74.85.16) dumping the results?
Is 129.74.85.16 (fahnd04) actually a CS?
Server status lists it as classic...
And if it really is another (dysfunctional) CS, perhaps it should be on standby like the rest.
