Problems with Project: 8101 (Run 2, Clone 3, Gen 84)

EXT64
Posts: 323
Joined: Mon Apr 09, 2012 11:54 pm

Post by EXT64 »

This is the first time I've seen a "Server reports problem with unit" error with this machine or project. After the upload was rejected, the client picked up the same unit again, so hopefully it isn't a bad WU; otherwise I'll be out 2 days of work.

As this computer has completed at least 97 WUs without a problem and is not OC'ed, I assume the issue is a random memory error (I don't have ECC yet), a network error, or a server error (I've seen a few people get this error randomly recently).

I will report back with how this unit goes the second time around. Should finish sometime tomorrow morning.
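As a rough check on that estimate, the remaining time can be extrapolated from the "Completed N out of M steps" lines in the client log. This is only a sketch: the function name `estimate_remaining` is made up for illustration, it assumes the timestamp format shown in the log in this post, and it assumes progress lines are less than a day apart so a midnight rollover can be detected.

```python
import re
from datetime import timedelta

# Matches lines such as "[08:04:41] Completed 2500 out of 250000 steps  (1%)".
FRAME_RE = re.compile(
    r"\[(\d\d):(\d\d):(\d\d)\] Completed (\d+) out of (\d+) steps"
)


def estimate_remaining(lines):
    """Extrapolate time left from the first and last progress lines.

    Returns a timedelta, or None when there is not enough progress
    information to extrapolate from.
    """
    points = []
    day_offset = 0
    prev_secs = None
    for line in lines:
        m = FRAME_RE.search(line)
        if not m:
            continue
        hours, minutes, seconds, done, total = map(int, m.groups())
        secs = hours * 3600 + minutes * 60 + seconds
        # The log has no date on each line, so a timestamp that goes
        # backwards is treated as a midnight rollover.
        if prev_secs is not None and secs < prev_secs:
            day_offset += 86400
        prev_secs = secs
        points.append((secs + day_offset, done, total))

    if len(points) < 2:
        return None
    t_first, done_first, _ = points[0]
    t_last, done_last, total = points[-1]
    if done_last == done_first:
        return None
    elapsed = t_last - t_first
    remaining_steps = total - done_last
    return timedelta(
        seconds=round(elapsed * remaining_steps / (done_last - done_first))
    )
```

Feeding it the lines of the client log (for example via `open(...).read().splitlines()` on whatever log file the client writes) gives a linear estimate only; it does not account for pauses or changes in load.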

The computer is a 4 x AMD Opteron 6172 at stock clocks, with 32GB of non-ECC DDR3-1333.

Code:

[07:48:13] + Number of Units Completed: 97

thekraken: The Kraken 0.7-pre15 (compiled Thu Jun 28 17:37:19 EDT 2012 by tpickeri@Sovereign.mshome.net)
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: PID: 5653
thekraken: Logging to thekraken.log
[07:48:19] Trying to send all finished work units
[07:48:19] + No unsent completed units remaining.
[07:48:19] - Preparing to get new work unit...
[07:48:19] Cleaning up work directory
[07:48:19] + Attempting to get work packet
[07:48:19] Passkey found
[07:48:19] - Will indicate memory of 32058 MB
[07:48:19] - Connecting to assignment server
[07:48:19] Connecting to http://assign.stanford.edu:8080/
[07:48:19] Posted data.
[07:48:19] Initial: 8F80; - Successful: assigned to (128.143.231.201).
[07:48:19] + News From Folding@Home: Welcome to Folding@Home
[07:48:20] Loaded queue successfully.
[07:48:20] Sent data
[07:48:20] Connecting to http://128.143.231.201:8080/
[07:48:26] Posted data.
[07:48:26] Initial: 0000; - Receiving payload (expected size: 30302410)
[07:48:35] - Downloaded at ~3288 kB/s
[07:48:35] - Averaged speed for that direction ~6891 kB/s
[07:48:35] + Received work.
[07:48:35] Trying to send all finished work units
[07:48:35] + No unsent completed units remaining.
[07:48:35] + Closed connections
[07:48:35] 
[07:48:35] + Processing work unit
[07:48:35] Core required: FahCore_a5.exe
[07:48:35] Core found.
[07:48:35] Working on queue slot 08 [September 29 07:48:35 UTC]
[07:48:35] + Working ...
[07:48:35] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 08 -np 48 -checkpoint 15 -verbose -lifeline 23549 -version 634'

thekraken: The Kraken 0.7-pre15 (compiled Thu Jun 28 17:37:19 EDT 2012 by tpickeri@Sovereign.mshome.net)
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: PID: 5656
thekraken: Logging to thekraken.log
[07:48:35] 
[07:48:35] *------------------------------*
[07:48:35] Folding@Home Gromacs SMP Core
[07:48:35] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[07:48:35] 
[07:48:35] Preparing to commence simulation
[07:48:35] - Looking at optimizations...
[07:48:35] - Created dyn
[07:48:35] - Files status OK
[07:48:39] - Expanded 30301898 -> 33158020 (decompressed 109.4 percent)
[07:48:39] Called DecompressByteArray: compressed_data_size=30301898 data_size=33158020, decompressed_data_size=33158020 diff=0
[07:48:40] - Digital signature verified
[07:48:40] 
[07:48:40] Project: 8101 (Run 2, Clone 3, Gen 84)
[07:48:40] 
[07:48:40] Assembly optimizations on if available.
[07:48:40] Entering M.D.
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, 
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

Reading file work/wudata_08.tpr, VERSION 4.5.5-dev-20120903-d64b9e3 (single precision)
[07:48:47] Mapping NT from 48 to 48 
Starting 48 threads
Making 2D domain decomposition 12 x 4 x 1
starting mdrun 'FP_membrane in water'
21250000 steps,  85000.0 ps (continuing from step 21000000,  84000.0 ps).
[07:48:54] Completed 0 out of 250000 steps  (0%)

NOTE: Turning on dynamic load balancing

[08:04:41] Completed 2500 out of 250000 steps  (1%)
[08:19:55] Completed 5000 out of 250000 steps  (2%)
[08:35:15] Completed 7500 out of 250000 steps  (3%)
[08:50:31] Completed 10000 out of 250000 steps  (4%)
[09:05:48] Completed 12500 out of 250000 steps  (5%)
[09:08:19] - Autosending finished units... [September 29 09:08:19 UTC]
[09:08:19] Trying to send all finished work units
[09:08:19] + No unsent completed units remaining.
[09:08:19] - Autosend completed
[09:21:09] Completed 15000 out of 250000 steps  (6%)
[09:36:26] Completed 17500 out of 250000 steps  (7%)
[09:51:43] Completed 20000 out of 250000 steps  (8%)
[10:06:59] Completed 22500 out of 250000 steps  (9%)
[10:22:18] Completed 25000 out of 250000 steps  (10%)
[10:37:31] Completed 27500 out of 250000 steps  (11%)
[10:52:44] Completed 30000 out of 250000 steps  (12%)
[11:07:58] Completed 32500 out of 250000 steps  (13%)
[11:23:14] Completed 35000 out of 250000 steps  (14%)
[11:38:28] Completed 37500 out of 250000 steps  (15%)
[11:53:42] Completed 40000 out of 250000 steps  (16%)
[12:08:55] Completed 42500 out of 250000 steps  (17%)
[12:24:14] Completed 45000 out of 250000 steps  (18%)
[12:39:27] Completed 47500 out of 250000 steps  (19%)
[12:54:40] Completed 50000 out of 250000 steps  (20%)
[13:09:54] Completed 52500 out of 250000 steps  (21%)
[13:25:11] Completed 55000 out of 250000 steps  (22%)
[13:40:23] Completed 57500 out of 250000 steps  (23%)
[13:55:35] Completed 60000 out of 250000 steps  (24%)
[14:10:46] Completed 62500 out of 250000 steps  (25%)
[14:26:00] Completed 65000 out of 250000 steps  (26%)
[14:41:11] Completed 67500 out of 250000 steps  (27%)
[14:56:20] Completed 70000 out of 250000 steps  (28%)
[15:08:19] - Autosending finished units... [September 29 15:08:19 UTC]
[15:08:19] Trying to send all finished work units
[15:08:19] + No unsent completed units remaining.
[15:08:19] - Autosend completed
[15:11:30] Completed 72500 out of 250000 steps  (29%)
[15:26:45] Completed 75000 out of 250000 steps  (30%)
[15:41:57] Completed 77500 out of 250000 steps  (31%)
[15:57:08] Completed 80000 out of 250000 steps  (32%)
[16:12:19] Completed 82500 out of 250000 steps  (33%)
[16:27:36] Completed 85000 out of 250000 steps  (34%)
[16:42:49] Completed 87500 out of 250000 steps  (35%)
[16:58:00] Completed 90000 out of 250000 steps  (36%)
[17:13:12] Completed 92500 out of 250000 steps  (37%)
[17:28:28] Completed 95000 out of 250000 steps  (38%)
[17:43:39] Completed 97500 out of 250000 steps  (39%)
[17:58:49] Completed 100000 out of 250000 steps  (40%)
[18:14:00] Completed 102500 out of 250000 steps  (41%)
[18:29:16] Completed 105000 out of 250000 steps  (42%)
[18:44:27] Completed 107500 out of 250000 steps  (43%)
[18:59:37] Completed 110000 out of 250000 steps  (44%)
[19:14:47] Completed 112500 out of 250000 steps  (45%)
[19:30:02] Completed 115000 out of 250000 steps  (46%)
[19:45:12] Completed 117500 out of 250000 steps  (47%)
[20:00:25] Completed 120000 out of 250000 steps  (48%)
[20:15:37] Completed 122500 out of 250000 steps  (49%)
[20:30:51] Completed 125000 out of 250000 steps  (50%)
[20:46:02] Completed 127500 out of 250000 steps  (51%)
[21:01:13] Completed 130000 out of 250000 steps  (52%)
[21:08:19] - Autosending finished units... [September 29 21:08:19 UTC]
[21:08:19] Trying to send all finished work units
[21:08:19] + No unsent completed units remaining.
[21:08:19] - Autosend completed
[21:16:24] Completed 132500 out of 250000 steps  (53%)
[21:31:41] Completed 135000 out of 250000 steps  (54%)
[21:46:52] Completed 137500 out of 250000 steps  (55%)
[22:02:03] Completed 140000 out of 250000 steps  (56%)
[22:17:14] Completed 142500 out of 250000 steps  (57%)
[22:32:27] Completed 145000 out of 250000 steps  (58%)
[22:47:37] Completed 147500 out of 250000 steps  (59%)
[23:02:45] Completed 150000 out of 250000 steps  (60%)
[23:17:56] Completed 152500 out of 250000 steps  (61%)
[23:33:11] Completed 155000 out of 250000 steps  (62%)
[23:48:21] Completed 157500 out of 250000 steps  (63%)
[00:03:30] Completed 160000 out of 250000 steps  (64%)
[00:18:39] Completed 162500 out of 250000 steps  (65%)
[00:33:54] Completed 165000 out of 250000 steps  (66%)
[00:49:08] Completed 167500 out of 250000 steps  (67%)
[01:04:18] Completed 170000 out of 250000 steps  (68%)
[01:19:29] Completed 172500 out of 250000 steps  (69%)
[01:34:43] Completed 175000 out of 250000 steps  (70%)
[01:49:55] Completed 177500 out of 250000 steps  (71%)
[02:05:06] Completed 180000 out of 250000 steps  (72%)
[02:20:16] Completed 182500 out of 250000 steps  (73%)
[02:35:32] Completed 185000 out of 250000 steps  (74%)
[02:50:41] Completed 187500 out of 250000 steps  (75%)
[03:05:52] Completed 190000 out of 250000 steps  (76%)
[03:08:19] - Autosending finished units... [September 30 03:08:19 UTC]
[03:08:19] Trying to send all finished work units
[03:08:19] + No unsent completed units remaining.
[03:08:19] - Autosend completed
[03:21:07] Completed 192500 out of 250000 steps  (77%)
[03:36:17] Completed 195000 out of 250000 steps  (78%)
[03:51:29] Completed 197500 out of 250000 steps  (79%)
[04:06:39] Completed 200000 out of 250000 steps  (80%)
[04:21:55] Completed 202500 out of 250000 steps  (81%)
[04:37:06] Completed 205000 out of 250000 steps  (82%)
[04:52:16] Completed 207500 out of 250000 steps  (83%)
[05:07:27] Completed 210000 out of 250000 steps  (84%)
[05:22:41] Completed 212500 out of 250000 steps  (85%)
[05:37:52] Completed 215000 out of 250000 steps  (86%)
[05:53:01] Completed 217500 out of 250000 steps  (87%)
[06:08:11] Completed 220000 out of 250000 steps  (88%)
[06:23:26] Completed 222500 out of 250000 steps  (89%)
[06:38:37] Completed 225000 out of 250000 steps  (90%)
[06:53:50] Completed 227500 out of 250000 steps  (91%)
[07:09:01] Completed 230000 out of 250000 steps  (92%)
[07:24:18] Completed 232500 out of 250000 steps  (93%)
[07:39:28] Completed 235000 out of 250000 steps  (94%)
[07:54:38] Completed 237500 out of 250000 steps  (95%)
[08:09:51] Completed 240000 out of 250000 steps  (96%)
[08:25:40] Completed 242500 out of 250000 steps  (97%)
[08:41:04] Completed 245000 out of 250000 steps  (98%)
[08:56:26] Completed 247500 out of 250000 steps  (99%)
[09:08:19] - Autosending finished units... [September 30 09:08:19 UTC]
[09:08:19] Trying to send all finished work units
[09:08:19] + No unsent completed units remaining.
[09:08:19] - Autosend completed
[09:11:49] Completed 250000 out of 250000 steps  (100%)

Writing final coordinates.

 Average load imbalance: 0.4 %
 Part of the total run time spent waiting due to load imbalance: 0.1 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 %


        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:  91389.441  91389.441    100.0
                       1d01h23:09
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   1189.272     61.916      0.945     25.386

Thanx for Using GROMACS - Have a Nice Day

[09:12:02] DynamicWrapper: Finished Work Unit: sleep=10000
[09:12:12] 
[09:12:12] Finished Work Unit:
[09:12:12] - Reading up to 64340496 from "work/wudata_08.trr": Read 64340496
[09:12:13] trr file hash check passed.
[09:12:13] - Reading up to 31615480 from "work/wudata_08.xtc": Read 31615480
[09:12:14] xtc file hash check passed.
[09:12:14] edr file hash check passed.
[09:12:14] logfile size: 193292
[09:12:14] Leaving Run
[09:12:19] - Writing 96310144 bytes of core data to disk...
[09:12:51] Done: 96309632 -> 91542438 (compressed to 5.8 percent)
[09:12:51]   ... Done.
[09:13:11] - Shutting down core
[09:13:11] 
[09:13:11] Folding@home Core Shutdown: FINISHED_UNIT
[09:13:13] CoreStatus = 64 (100)
[09:13:13] Unit 8 finished with 74 percent of time to deadline remaining.
[09:13:13] Updated performance fraction: 0.734408
[09:13:13] Sending work to server
[09:13:13] Project: 8101 (Run 2, Clone 3, Gen 84)


[09:13:13] + Attempting to send results [September 30 09:13:13 UTC]
[09:13:13] - Reading file work/wuresults_08.dat from core
[09:13:13]   (Read 91542950 bytes from disk)
[09:13:13] Connecting to http://128.143.231.201:8080/
[09:13:21] Posted data.
[09:13:21] Initial: 0000; - Uploaded at ~11174 kB/s
[09:13:21] - Averaged speed for that direction ~6055 kB/s
[09:13:21] - Server reports problem with unit.
[09:13:21] Trying to send all finished work units
[09:13:21] + No unsent completed units remaining.
[09:13:21] - Preparing to get new work unit...
[09:13:21] Cleaning up work directory
[09:13:21] + Attempting to get work packet
[09:13:21] Passkey found
[09:13:21] - Will indicate memory of 32058 MB
[09:13:21] - Connecting to assignment server
[09:13:21] Connecting to http://assign.stanford.edu:8080/
[09:13:22] Posted data.
[09:13:22] Initial: 8F80; - Successful: assigned to (128.143.231.201).
[09:13:22] + News From Folding@Home: Welcome to Folding@Home
[09:13:22] Loaded queue successfully.
[09:13:22] Sent data
[09:13:22] Connecting to http://128.143.231.201:8080/
[09:13:30] Posted data.
[09:13:30] Initial: 0000; - Receiving payload (expected size: 30302410)
[09:13:35] - Downloaded at ~5918 kB/s
[09:13:35] - Averaged speed for that direction ~6697 kB/s
[09:13:35] + Received work.
[09:13:35] Trying to send all finished work units
[09:13:35] + No unsent completed units remaining.
[09:13:35] + Closed connections
[09:13:35] 
[09:13:35] + Processing work unit
[09:13:35] Core required: FahCore_a5.exe
[09:13:35] Core found.
[09:13:35] Working on queue slot 09 [September 30 09:13:35 UTC]
[09:13:35] + Working ...
[09:13:35] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 09 -np 48 -checkpoint 15 -verbose -lifeline 23549 -version 634'

thekraken: The Kraken 0.7-pre15 (compiled Thu Jun 28 17:37:19 EDT 2012 by tpickeri@Sovereign.mshome.net)
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: PID: 26778
thekraken: Logging to thekraken.log
[09:13:35] 
[09:13:35] *------------------------------*
[09:13:35] Folding@Home Gromacs SMP Core
[09:13:35] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[09:13:35] 
[09:13:35] Preparing to commence simulation
[09:13:35] - Looking at optimizations...
[09:13:35] - Created dyn
[09:13:35] - Files status OK
[09:13:39] - Expanded 30301898 -> 33158020 (decompressed 109.4 percent)
[09:13:39] Called DecompressByteArray: compressed_data_size=30301898 data_size=33158020, decompressed_data_size=33158020 diff=0
[09:13:39] - Digital signature verified
[09:13:39] 
[09:13:39] Project: 8101 (Run 2, Clone 3, Gen 84)
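For anyone who wants to check their own logs for the same pattern, here is a minimal sketch that scans client log lines for "Server reports problem with unit" and notes whether the next unit assigned was the same Project/Run/Clone/Gen. The function name `find_rejected_units` and the returned dictionary layout are my own invention for illustration; the only things taken from the log above are the two message formats.

```python
import re

# Matches "Project: 8101 (Run 2, Clone 3, Gen 84)" anywhere in a line.
PROJECT_RE = re.compile(
    r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)
REJECT_TEXT = "Server reports problem with unit"


def find_rejected_units(lines):
    """Return one dict per rejected upload found in the log lines.

    Each dict has:
      "unit":       (project, run, clone, gen) of the rejected unit
      "reassigned": True/False if a later Project line shows whether the
                    same unit came back, or None if the log ends first
    """
    rejected = []
    last_unit = None
    for line in lines:
        m = PROJECT_RE.search(line)
        if m:
            unit = tuple(int(g) for g in m.groups())
            # The first Project line after a rejection tells us what the
            # server handed out next.
            if rejected and rejected[-1]["reassigned"] is None:
                rejected[-1]["reassigned"] = unit == rejected[-1]["unit"]
            last_unit = unit
        elif REJECT_TEXT in line and last_unit is not None:
            rejected.append({"unit": last_unit, "reassigned": None})
    return rejected
```

Run against the log in this post, the rejection at 09:13:21 would be paired with the Project line that follows it, which is how the re-download of the same unit shows up.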
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4 4x512MB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460's for approx. 70K PPD
Location: Salem, OR, USA

Re: Problems with Project: 8101 (Run 2, Clone 3, Gen 84)

Post by P5-133XL »

Currently the bigadv work servers are having issues. See: viewtopic.php?f=18&t=22566

Stanford knows of the issue and is working on it.
EXT64
Posts: 323
Joined: Mon Apr 09, 2012 11:54 pm

Re: Problems with Project: 8101 (Run 2, Clone 3, Gen 84)

Post by EXT64 »

Thanks for letting me know. I'll let this unit try again, and if it fails again I'll go to SMP.