Project: 7503 (Run 0, Clone 56, Gen 147) - NO CREDIT

Moderators: Site Moderators, PandeGroup

Project: 7503 (Run 0, Clone 56, Gen 147) - NO CREDIT

Postby Bob8421 » Thu Nov 17, 2011 3:09 pm

Code: Select all
[03:00:01] + Processing work unit
[03:00:01] Core required: FahCore_a3.exe
[03:00:01] Core found.
[03:00:01] Working on queue slot 01 [November 16 03:00:01 UTC]
[03:00:01] + Working ...
[03:00:01] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 8 -checkpoint 10 -verbose -lifeline 3792 -version 634'

[03:00:01]
[03:00:01] *------------------------------*
[03:00:01] Folding@Home Gromacs SMP Core
[03:00:01] Version 2.27 (Dec. 15, 2010)
[03:00:01]
[03:00:01] Preparing to commence simulation
[03:00:01] - Looking at optimizations...
[03:00:01] - Created dyn
[03:00:01] - Files status OK
[03:00:01] - Expanded 690919 -> 814976 (decompressed 117.9 percent)
[03:00:01] Called DecompressByteArray: compressed_data_size=690919 data_size=814976, decompressed_data_size=814976 diff=0
[03:00:01] - Digital signature verified
[03:00:01]
[03:00:01] Project: 7503 (Run 0, Clone 56, Gen 147)
[03:00:01]
[03:00:01] Assembly optimizations on if available.
[03:00:01] Entering M.D.
[03:00:07] Mapping NT from 8 to 8
[03:00:07] Completed 0 out of 1000000 steps  (0%)
[03:03:28] Completed 10000 out of 1000000 steps  (1%)
[03:06:49] Completed 20000 out of 1000000 steps  (2%)
[03:10:11] Completed 30000 out of 1000000 steps  (3%)
[03:13:33] Completed 40000 out of 1000000 steps  (4%)
[03:16:57] Completed 50000 out of 1000000 steps  (5%)
[03:20:21] Completed 60000 out of 1000000 steps  (6%)
[03:23:45] Completed 70000 out of 1000000 steps  (7%)
[03:27:10] Completed 80000 out of 1000000 steps  (8%)
[03:30:35] Completed 90000 out of 1000000 steps  (9%)
[03:33:59] Completed 100000 out of 1000000 steps  (10%)
[03:37:23] Completed 110000 out of 1000000 steps  (11%)
[03:40:46] Completed 120000 out of 1000000 steps  (12%)
[03:44:09] Completed 130000 out of 1000000 steps  (13%)
[03:47:31] Completed 140000 out of 1000000 steps  (14%)
[03:50:53] Completed 150000 out of 1000000 steps  (15%)
[03:54:16] Completed 160000 out of 1000000 steps  (16%)
[03:55:10] - Autosending finished units... [November 16 03:55:10 UTC]
[03:55:10] Trying to send all finished work units
[03:55:10] + No unsent completed units remaining.
[03:55:10] - Autosend completed
[03:57:40] Completed 170000 out of 1000000 steps  (17%)
[04:01:02] Completed 180000 out of 1000000 steps  (18%)
[04:04:23] Completed 190000 out of 1000000 steps  (19%)
[04:07:48] Completed 200000 out of 1000000 steps  (20%)
[04:11:10] Completed 210000 out of 1000000 steps  (21%)
[04:14:31] Completed 220000 out of 1000000 steps  (22%)
[04:17:54] Completed 230000 out of 1000000 steps  (23%)
[04:21:19] Completed 240000 out of 1000000 steps  (24%)
[04:24:44] Completed 250000 out of 1000000 steps  (25%)
[04:28:10] Completed 260000 out of 1000000 steps  (26%)
[04:31:35] Completed 270000 out of 1000000 steps  (27%)
[04:34:59] Completed 280000 out of 1000000 steps  (28%)
[04:38:24] Completed 290000 out of 1000000 steps  (29%)
[04:41:48] Completed 300000 out of 1000000 steps  (30%)
[04:45:12] Completed 310000 out of 1000000 steps  (31%)
[04:48:34] Completed 320000 out of 1000000 steps  (32%)
[04:51:59] Completed 330000 out of 1000000 steps  (33%)
[04:55:26] Completed 340000 out of 1000000 steps  (34%)
[04:58:51] Completed 350000 out of 1000000 steps  (35%)
[05:02:15] Completed 360000 out of 1000000 steps  (36%)
[05:05:42] Completed 370000 out of 1000000 steps  (37%)
[05:09:08] Completed 380000 out of 1000000 steps  (38%)
[05:12:33] Completed 390000 out of 1000000 steps  (39%)
[05:15:58] Completed 400000 out of 1000000 steps  (40%)
[05:19:24] Completed 410000 out of 1000000 steps  (41%)
[05:22:48] Completed 420000 out of 1000000 steps  (42%)
[05:26:13] Completed 430000 out of 1000000 steps  (43%)
[05:29:40] Completed 440000 out of 1000000 steps  (44%)
[05:33:05] Completed 450000 out of 1000000 steps  (45%)
[05:36:31] Completed 460000 out of 1000000 steps  (46%)
[05:40:01] Completed 470000 out of 1000000 steps  (47%)
[05:43:28] Completed 480000 out of 1000000 steps  (48%)
[05:46:56] Completed 490000 out of 1000000 steps  (49%)
[05:50:22] Completed 500000 out of 1000000 steps  (50%)
[05:53:46] Completed 510000 out of 1000000 steps  (51%)
[05:57:10] Completed 520000 out of 1000000 steps  (52%)
[06:00:37] Completed 530000 out of 1000000 steps  (53%)
[06:04:02] Completed 540000 out of 1000000 steps  (54%)
[06:07:28] Completed 550000 out of 1000000 steps  (55%)
[06:10:54] Completed 560000 out of 1000000 steps  (56%)
[06:14:20] Completed 570000 out of 1000000 steps  (57%)
[06:17:47] Completed 580000 out of 1000000 steps  (58%)
[06:21:15] Completed 590000 out of 1000000 steps  (59%)
[06:24:42] Completed 600000 out of 1000000 steps  (60%)
[06:28:06] Completed 610000 out of 1000000 steps  (61%)
[06:31:35] Completed 620000 out of 1000000 steps  (62%)
[06:34:56] Completed 630000 out of 1000000 steps  (63%)
[06:38:19] Completed 640000 out of 1000000 steps  (64%)
[06:41:42] Completed 650000 out of 1000000 steps  (65%)
[06:45:05] Completed 660000 out of 1000000 steps  (66%)
[06:48:31] Completed 670000 out of 1000000 steps  (67%)
[06:51:55] Completed 680000 out of 1000000 steps  (68%)
[06:55:20] Completed 690000 out of 1000000 steps  (69%)
[06:58:49] Completed 700000 out of 1000000 steps  (70%)
[07:02:17] Completed 710000 out of 1000000 steps  (71%)
[07:05:45] Completed 720000 out of 1000000 steps  (72%)
[07:09:13] Completed 730000 out of 1000000 steps  (73%)
[07:12:40] Completed 740000 out of 1000000 steps  (74%)
[07:16:04] Completed 750000 out of 1000000 steps  (75%)
[07:19:28] Completed 760000 out of 1000000 steps  (76%)
[07:22:51] Completed 770000 out of 1000000 steps  (77%)
[07:26:14] Completed 780000 out of 1000000 steps  (78%)
[07:29:37] Completed 790000 out of 1000000 steps  (79%)
[07:32:59] Completed 800000 out of 1000000 steps  (80%)
[07:36:23] Completed 810000 out of 1000000 steps  (81%)
[07:39:48] Completed 820000 out of 1000000 steps  (82%)
[07:43:16] Completed 830000 out of 1000000 steps  (83%)
[07:46:43] Completed 840000 out of 1000000 steps  (84%)
[07:50:11] Completed 850000 out of 1000000 steps  (85%)
[07:53:36] Completed 860000 out of 1000000 steps  (86%)
[07:57:01] Completed 870000 out of 1000000 steps  (87%)
[08:00:27] Completed 880000 out of 1000000 steps  (88%)
[08:03:52] Completed 890000 out of 1000000 steps  (89%)
[08:07:19] Completed 900000 out of 1000000 steps  (90%)
[08:10:47] Completed 910000 out of 1000000 steps  (91%)
[08:14:14] Completed 920000 out of 1000000 steps  (92%)
[08:17:41] Completed 930000 out of 1000000 steps  (93%)
[08:21:08] Completed 940000 out of 1000000 steps  (94%)
[08:24:36] Completed 950000 out of 1000000 steps  (95%)
[08:28:05] Completed 960000 out of 1000000 steps  (96%)
[08:31:34] Completed 970000 out of 1000000 steps  (97%)
[08:35:01] Completed 980000 out of 1000000 steps  (98%)
[08:38:30] Completed 990000 out of 1000000 steps  (99%)
[08:42:00] Completed 1000000 out of 1000000 steps  (100%)
[08:42:01] DynamicWrapper: Finished Work Unit: sleep=10000
[08:42:10]
[08:42:10] Finished Work Unit:
[08:42:10] - Reading up to 3547440 from "work/wudata_01.trr": Read 3547440
[08:42:10] trr file hash check passed.
[08:42:10] - Reading up to 4401496 from "work/wudata_01.xtc": Read 4401496
[08:42:10] xtc file hash check passed.
[08:42:10] edr file hash check passed.
[08:42:10] logfile size: 640259
[08:42:10] Leaving Run
[08:42:10] - Writing 8653423 bytes of core data to disk...
[08:42:12] Done: 8652911 -> 7913684 (compressed to 91.4 percent)
[08:42:12]   ... Done.
[08:42:13] - Shutting down core
[08:42:13]
[08:42:13] Folding@home Core Shutdown: FINISHED_UNIT
[08:42:16] CoreStatus = 64 (100)
[08:42:16] Unit 1 finished with 95 percent of time to deadline remaining.
[08:42:16] Updated performance fraction: 0.946723
[08:42:16] Sending work to server
[08:42:16] Project: 7503 (Run 0, Clone 56, Gen 147)


[08:42:16] + Attempting to send results [November 16 08:42:16 UTC]
[08:42:16] - Reading file work/wuresults_01.dat from core
[08:42:16]   (Read 7914196 bytes from disk)
[08:42:16] Connecting to http://128.143.199.97:8080/
[08:42:37] - Couldn't send HTTP request to server
[08:42:37] + Could not connect to Work Server (results)
[08:42:37]     (128.143.199.97:8080)
[08:42:37] + Retrying using alternative port
[08:42:37] Connecting to http://128.143.199.97:80/
[08:42:58] - Couldn't send HTTP request to server
[08:42:58] + Could not connect to Work Server (results)
[08:42:58]     (128.143.199.97:80)
[08:42:58] - Error: Could not transmit unit 01 (completed November 16) to work server.
[08:42:58] - 1 failed uploads of this unit.
[08:42:58]   Keeping unit 01 in queue.
[08:42:58] Trying to send all finished work units
[08:42:58] Project: 7503 (Run 0, Clone 56, Gen 147)


[08:42:58] + Attempting to send results [November 16 08:42:58 UTC]
[08:42:58] - Reading file work/wuresults_01.dat from core
[08:42:58]   (Read 7914196 bytes from disk)
[08:42:58] Connecting to http://128.143.199.97:8080/
[08:43:19] - Couldn't send HTTP request to server
[08:43:19] + Could not connect to Work Server (results)
[08:43:19]     (128.143.199.97:8080)
[08:43:19] + Retrying using alternative port
[08:43:19] Connecting to http://128.143.199.97:80/
[08:46:25] - Couldn't send HTTP request to server
[08:46:25] + Could not connect to Work Server (results)
[08:46:25]     (128.143.199.97:80)
[08:46:25] - Error: Could not transmit unit 01 (completed November 16) to work server.
[08:46:25] - 2 failed uploads of this unit.


[08:46:25] + Attempting to send results [November 16 08:46:25 UTC]
[08:46:25] - Reading file work/wuresults_01.dat from core
[08:46:25]   (Read 7914196 bytes from disk)
[08:46:25] Connecting to http://130.237.165.141:8080/
[08:47:23] Posted data.
[08:47:23] Initial: 0000; - Uploaded at ~133 kB/s
[08:47:23] - Averaged speed for that direction ~133 kB/s
[08:47:23] + Results successfully sent
[08:47:23] Thank you for your contribution to Folding@Home.
[08:47:23] + Number of Units Completed: 814

[08:47:23]   Successfully sent unit 01 to Collection server.
[08:47:24] + Sent 1 of 1 completed units to the server
Bob8421
 
Posts: 94
Joined: Tue Dec 22, 2009 5:16 pm

Re: Project: 7503 (Run 0, Clone 56, Gen 147) - NO CREDIT

Postby sortofageek » Thu Nov 17, 2011 3:41 pm

There is no data back on Project: 7503 (Run 0, Clone 56, Gen 147) yet. I will mark it for moderator followup. If we can make any determination about the WU, someone will post that in this topic.
User avatar
sortofageek
Site Admin
 
Posts: 3172
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: Project: 7503 (Run 0, Clone 56, Gen 147) - NO CREDIT

Postby sortofageek » Thu Nov 17, 2011 3:43 pm

[08:47:23] Successfully sent unit 01 to Collection server.


Notice the WU went back to the CS instead of the Work Server. The CS will return it to the WS when it can receive it. The points can't be credited until then.

The WS is 128.143.199.97 which was exhibiting some issues yesterday. Dr. Kasson rebooted it and it still may be in catch-up mode after having been in Reject status several times yesterday.
User avatar
sortofageek
Site Admin
 
Posts: 3172
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: Project: 7503 (Run 0, Clone 56, Gen 147) - NO CREDIT

Postby Bob8421 » Fri Nov 18, 2011 1:13 pm

Thanks for the explanation. Just how long is "catch-up mode" supposed to take? It's been over two days now and still no credit.
Bob8421
 
Posts: 94
Joined: Tue Dec 22, 2009 5:16 pm

Re: Project: 7503 (Run 0, Clone 56, Gen 147) - NO CREDIT

Postby bruce » Mon Nov 21, 2011 8:02 pm

Bob8421 wrote:Just how long is "catch-up mode" supposed to take? It's been over two days now and still no credit.


Let's separate this into two issues. You need to know whether the WU was uploaded or if it's still trying to upload.

If a server has been down/had troubles/etc. there are an unknown number of clients still trying to upload their results. The server resources can accept uploads at some maximum rate and some clients may have to retry again later. There is no reasonable way to measure how many clients still have un-uploaded results so you can't predict how long the server will need to operate near that maximum rate before traffic diminishes to a more normal level.

Once your results have actually been uploaded, we would be looking for issues associate with the crediting process, not with what's going on with the server, itself. There might be different crediting issues depending on whether the WU was uploaded to the Work Server or the Collection Server, but the catch-up-mode comment would no longer apply.

There's now one credit record for that WU, but it's somebody else that completed it.
bruce
 
Posts: 20837
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 7503 (Run 0, Clone 56, Gen 147) - NO CREDIT

Postby sortofageek » Wed Nov 23, 2011 3:09 pm

We now have this piece of the puzzle.

Days taken to complete WU: 6.63

Hi Bob8421 (team 11314),
Your WU (P7503 R0 C56 G147) was added to the stats database on 2011-11-22 10:07:55 for 552 points of credit.
User avatar
sortofageek
Site Admin
 
Posts: 3172
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: Project: 7503 (Run 0, Clone 56, Gen 147) - NO CREDIT

Postby Bob8421 » Fri Nov 25, 2011 5:38 pm

Thanks for the update. It explains a few things.

According to the log at the top of this thread, the actual time to complete the work unit was 5.79 HOURS, not 6.63 DAYS. Apparently the other 6 days were needed to get it from the collection server back to the work server. It seems that points are calculated using the time the work server finally receives the work unit, not the time that the collection server first received it. That kind of makes the collection server totally useless since the work unit could have just as well stayed on my system for another 6 days and gotten the same number of points. It should have been 4103 points if the actual time the work unit was received was used in the calculation. Now I hate the collection server even more than before!

This also explains one other mystery while creating still another. On the 22nd I received credit for 2 work units and 1196 points that I had no explanation for. This work unit appears to be one of them, but I still have no idea what the other work unit and 644 points were for. It has been months since I had anything that was not properly credited and if something that old was picked up now it should have been worth either thousands of points or most likely nothing at all.
Bob8421
 
Posts: 94
Joined: Tue Dec 22, 2009 5:16 pm

Re: Project: 7503 (Run 0, Clone 56, Gen 147) - NO CREDIT

Postby bruce » Fri Nov 25, 2011 11:57 pm

Bob8421 wrote:Working on queue slot 01 [November 16 03:00:01 UTC]

[08:46:25] + Attempting to send results [November 16 08:46:25 UTC]
[08:47:23] + Results successfully sent


Apparently you're right. The assign time matche but most of the time was between when the WU went to the Collection Server and when the data actually arrived at the Stats server.
WU assigned to donor at: 2011-11-15 18:57:55 PST >>> 2011-11-16 02:57:55 UTC
Entered into logs at: 2011-11-22 10:03:30 PST >>> 2011-11-22 18:03:30 UTC

During those 6 days I'm sure there were others either before or after your upload that are in the same predicament and each case will have a unique time. Manually running a recredit and awarding baseline points is extra work, but it's better than nothing. Managing individual bonus calculations it a LOT of manual work especially since (apparently) the CS didn't log the upload time, only the time that the information reached the stats server. The Pande Group will probably fix this bug going forward but my hunch is that they won't correct the bonus retroactively, much as you'd like that to happen.

Sorry. :(
bruce
 
Posts: 20837
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 7503 (Run 0, Clone 56, Gen 147) - NO CREDIT

Postby Bob8421 » Sat Nov 26, 2011 1:41 pm

bruce wrote:The Pande Group will probably fix this bug going forward but my hunch is that they won't correct the bonus retroactively, much as you'd like that to happen.

Sorry. :(

Don't be sorry. After a few days passed I assumed that work unit was lost forever, so just getting the baseline points was kind of a bonus. And, of course, those 644 mystery points don't hurt. They seem to be baseline points for a 7504 work unit, but all my 7504s were credited properly in the first place.

What kind of shocked me was seeing that they didn't use the time the unit was actually returned. To me getting the points credited properly seemed to be the main purpose of even having a collection server rather than just waiting 6 hours to try the work server again and then giving reduced points.
Bob8421
 
Posts: 94
Joined: Tue Dec 22, 2009 5:16 pm

Re: Project: 7503 (Run 0, Clone 56, Gen 147) - NO CREDIT

Postby bruce » Sat Nov 26, 2011 10:07 pm

Bob8421 wrote:What kind of shocked me was seeing that they didn't use the time the unit was actually returned. To me getting the points credited properly seemed to be the main purpose of even having a collection server rather than just waiting 6 hours to try the work server again and then giving reduced points.


It shocked me too. I've reported it as a bug and hopefully they will find a way to fix it. Under normal circumstances, it's a small enough difference that nobody ever noticed -- or at least nobody ever reported it where I saw it.
bruce
 
Posts: 20837
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.


Return to Issues with a specific WU

Who is online

Users browsing this forum: No registered users and 1 guest

cron