Merged problems with projects 6903/6904, Part 1

Moderators: Site Moderators, FAHC Science Team

rhavern
Posts: 425
Joined: Mon Dec 03, 2007 8:45 am
Location: UK

Re: Merged problems with projects 6903/6904

Post by rhavern »

It looks like the bad WU Project: 6903 (Run 6, Clone 0, Gen 72) is still floating around. I picked it up on March 5 at 12:53:35 UTC and it is still broken (500000 steps). This is on a 4P 6176SE (12-core, 2.3 GHz) running v6.34 on Ubuntu 10.10.

Code:

[12:53:12] + Attempting to get work packet
[12:53:12] Passkey found
[12:53:12] - Will indicate memory of 32234 MB
[12:53:12] - Connecting to assignment server
[12:53:12] Connecting to http://assign.stanford.edu:8080/
[12:53:12] Posted data.
[12:53:12] Initial: ED82; - Successful: assigned to (130.237.232.237).
[12:53:12] + News From Folding@Home: Welcome to Folding@Home
[12:53:12] Loaded queue successfully.
[12:53:12] Sent data
[12:53:12] Connecting to http://130.237.232.237:8080/
[12:53:25] Posted data.
[12:53:25] Initial: 0000; - Receiving payload (expected size: 57246952)
[12:53:35] - Downloaded at ~5590 kB/s
[12:53:35] - Averaged speed for that direction ~6817 kB/s
[12:53:35] + Received work.
[12:53:35] Trying to send all finished work units
[12:53:35] + No unsent completed units remaining.
[12:53:35] + Closed connections
[12:53:35] 
[12:53:35] + Processing work unit
[12:53:35] Core required: FahCore_a5.exe
[12:53:35] Core found.
[12:53:35] Working on queue slot 09 [March 5 12:53:35 UTC]
[12:53:35] + Working ...
[12:53:35] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 09 -np 48 -checkpoint 30 -verbose -lifeline 2470 -version 634'

[12:53:35] 
[12:53:35] *------------------------------*
[12:53:35] Folding@Home Gromacs SMP Core
[12:53:35] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[12:53:35] 
[12:53:35] Preparing to commence simulation
[12:53:35] - Looking at optimizations...
[12:53:35] - Created dyn
[12:53:35] - Files status OK
[12:53:41] - Expanded 57246440 -> 71846524 (decompressed 50.4 percent)
[12:53:41] Called DecompressByteArray: compressed_data_size=57246440 data_size=71846524, decompressed_data_size=71846524 diff=0
[12:53:42] - Digital signature verified
[12:53:42] 
[12:53:42] Project: 6903 (Run 6, Clone 0, Gen 72)
[12:53:42] 
[12:53:42] Assembly optimizations on if available.
[12:53:42] Entering M.D.
[12:53:51] Mapping NT from 48 to 48 
[12:53:55] Completed 0 out of 500000 steps  (0%)
[13:20:04] Completed 5000 out of 500000 steps  (1%)
[13:25:37]  NT from 48 to 48 
[13:26:43] Resuming from checkpoint
[13:27:10] Verified work/wudata_09.log
[13:27:10] Verified work/wudata_09.trr
[13:27:10] Verified work/wudata_09.xtc
[13:27:10] Verified work/wudata_09.edr
[13:27:11] Completed 5775 out of 500000 steps  (1%)
[13:48:43] Completed 10000 out of 500000 steps  (2%)
[14:14:17] Completed 15000 out of 500000 steps  (3%)
[14:39:54] Completed 20000 out of 500000 steps  (4%)
[15:05:32] Completed 25000 out of 500000 steps  (5%)
[15:31:07] Completed 30000 out of 500000 steps  (6%)
[15:56:40] Completed 35000 out of 500000 steps  (7%)
[16:22:09] Completed 40000 out of 500000 steps  (8%)
[16:47:43] Completed 45000 out of 500000 steps  (9%)
[17:13:18] Completed 50000 out of 500000 steps  (10%)
[17:18:57] - Autosending finished units... [March 5 17:18:57 UTC]
[17:18:57] Trying to send all finished work units
[17:18:57] + No unsent completed units remaining.
[17:18:57] - Autosend completed
[17:38:52] Completed 55000 out of 500000 steps  (11%)
[18:04:27] Completed 60000 out of 500000 steps  (12%)
[18:29:59] Completed 65000 out of 500000 steps  (13%)
[18:55:27] Completed 70000 out of 500000 steps  (14%)
[19:21:01] Completed 75000 out of 500000 steps  (15%)
[19:46:35] Completed 80000 out of 500000 steps  (16%)
[20:12:08] Completed 85000 out of 500000 steps  (17%)
[20:37:42] Completed 90000 out of 500000 steps  (18%)
[21:03:14] Completed 95000 out of 500000 steps  (19%)
[21:28:47] Completed 100000 out of 500000 steps  (20%)
[21:54:14] Completed 105000 out of 500000 steps  (21%)
[22:19:47] Completed 110000 out of 500000 steps  (22%)
[22:45:21] Completed 115000 out of 500000 steps  (23%)
[23:10:54] Completed 120000 out of 500000 steps  (24%)
[23:18:57] - Autosending finished units... [March 5 23:18:57 UTC]
[23:18:57] Trying to send all finished work units
[23:18:57] + No unsent completed units remaining.
[23:18:57] - Autosend completed
[23:36:25] Completed 125000 out of 500000 steps  (25%)
[00:01:56] Completed 130000 out of 500000 steps  (26%)
[00:27:25] Completed 135000 out of 500000 steps  (27%)
[00:52:52] Completed 140000 out of 500000 steps  (28%)
[01:18:25] Completed 145000 out of 500000 steps  (29%)
[01:43:56] Completed 150000 out of 500000 steps  (30%)
[02:09:30] Completed 155000 out of 500000 steps  (31%)
[02:35:01] Completed 160000 out of 500000 steps  (32%)
[03:00:31] Completed 165000 out of 500000 steps  (33%)
[03:26:02] Completed 170000 out of 500000 steps  (34%)
[03:51:26] Completed 175000 out of 500000 steps  (35%)
[04:16:56] Completed 180000 out of 500000 steps  (36%)
[04:42:27] Completed 185000 out of 500000 steps  (37%)
[05:07:59] Completed 190000 out of 500000 steps  (38%)
[05:18:57] - Autosending finished units... [March 6 05:18:57 UTC]
[05:18:57] Trying to send all finished work units
[05:18:57] + No unsent completed units remaining.
[05:18:57] - Autosend completed
[05:33:30] Completed 195000 out of 500000 steps  (39%)
[05:58:58] Completed 200000 out of 500000 steps  (40%)
[06:24:23] Completed 205000 out of 500000 steps  (41%)
[06:49:52] Completed 210000 out of 500000 steps  (42%)
[07:15:23] Completed 215000 out of 500000 steps  (43%)
[07:40:52] Completed 220000 out of 500000 steps  (44%)
[08:06:26] Completed 225000 out of 500000 steps  (45%)
[08:31:58] Completed 230000 out of 500000 steps  (46%)
[08:57:33] Completed 235000 out of 500000 steps  (47%)
[09:23:01] Completed 240000 out of 500000 steps  (48%)
[09:48:35] Completed 245000 out of 500000 steps  (49%)
[10:14:10] Completed 250000 out of 500000 steps  (50%)
[10:39:48] Completed 255000 out of 500000 steps  (51%)
[11:05:19] ***** Got an Activate signal (2)
[11:05:19] Killing all core threads
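For anyone who wants to put a number on the pace of a log like the one above, here is a minimal Python sketch that averages the wall-clock time per 1% from the "Completed ... steps" lines. The function name is made up for illustration. The client log carries no dates, so the sketch assumes that a clock jumping backwards means midnight passed, and any pause or restart between the first and last progress line will inflate the result.

```python
import re

# Matches client log lines such as:
# [13:48:43] Completed 10000 out of 500000 steps  (2%)
PROGRESS = re.compile(
    r"\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps"
)


def seconds_per_percent(log_lines):
    """Average wall-clock seconds per 1% between the first and last progress line."""
    points = []  # (elapsed seconds, steps done, total steps)
    day_offset = 0
    last_clock = None
    for line in log_lines:
        m = PROGRESS.search(line)
        if not m:
            continue
        h, mi, s, done, total = map(int, m.groups())
        clock = h * 3600 + mi * 60 + s
        # Assumption: the log has no date, so a clock going backwards means midnight.
        if last_clock is not None and clock < last_clock:
            day_offset += 86400
        last_clock = clock
        points.append((clock + day_offset, done, total))
    if len(points) < 2:
        return None
    t0, d0, total = points[0]
    t1, d1, _ = points[-1]
    if d1 <= d0:
        return None
    # 1% of the unit is total / 100 steps.
    return (t1 - t0) * total / ((d1 - d0) * 100)
```

Feeding it two consecutive progress lines from the log gives the time for that single 1% step; feeding it the whole log gives the average including any downtime.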
Folding since 1 WU=1 point
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Merged problems with projects 6903/6904

Post by Grandpa_01 »

Must be a slippery little devil. It looks like Kasson is going to need a bigger net. :P
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
ChelseaOilman
Posts: 1037
Joined: Sun Dec 02, 2007 3:47 pm
Location: Colorado @ 10,000 feet

Re: Merged problems with projects 6903/6904

Post by ChelseaOilman »

Project: 6903 (Run 6, Clone 0, Gen 72)

I reported it as a bad WU. For the 2nd time.
rhavern
Posts: 425
Joined: Mon Dec 03, 2007 8:45 am
Location: UK

Re: Merged problems with projects 6903/6904

Post by rhavern »

ChelseaOilman wrote: Project: 6903 (Run 6, Clone 0, Gen 72)

I reported it as a bad WU. For the 2nd time.

Funny -1

I got this evil WU again on 7 March 2012 at 09:43 UTC. Trying to flush even harder this time so it goes down.

Please make it stop.
Folding since 1 WU=1 point
ChelseaOilman
Posts: 1037
Joined: Sun Dec 02, 2007 3:47 pm
Location: Colorado @ 10,000 feet

Re: Merged problems with projects 6903/6904

Post by ChelseaOilman »

Got this one yesterday and reported it bad.

Code:

[20:19:37] Project: 6903 (Run 6, Clone 0, Gen 72)
[20:19:37] 
[20:19:37] Assembly optimizations on if available.
[20:19:37] Entering M.D.
[20:19:46] Mapping NT from 48 to 48 
[20:19:50] Completed 0 out of 500000 steps  (0%)
Got this one this morning and reported it bad.

Code:

[10:27:11] Project: 6904 (Run 1, Clone 16, Gen 44)
[10:27:11] 
[10:27:11] Assembly optimizations on if available.
[10:27:11] Entering M.D.
[10:27:20] Mapping NT from 48 to 48 
[10:27:25] Completed 0 out of 500000 steps  (0%)
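Since the same units keep coming back, a small script can check a client log for them automatically. Below is a minimal Python sketch that pulls the Project/Run/Clone/Gen out of "Project:" lines and compares it with the two units reported in this thread. The function name and the hard-coded set are illustrative assumptions, not part of any official tool.

```python
import re

# Matches lines such as: [20:19:37] Project: 6903 (Run 6, Clone 0, Gen 72)
PRCG = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

# Work units reported as bad in this thread: (project, run, clone, gen).
REPORTED_BAD = {(6903, 6, 0, 72), (6904, 1, 16, 44)}


def find_reported_units(log_lines, known_bad=REPORTED_BAD):
    """Return the known-bad units that appear in the log, in order of first appearance."""
    hits = []
    for line in log_lines:
        m = PRCG.search(line)
        if not m:
            continue
        prcg = tuple(int(g) for g in m.groups())
        if prcg in known_bad and prcg not in hits:
            hits.append(prcg)
    return hits
```

Run over the lines of the client log, it returns an empty list when neither unit has been assigned.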
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: Merged problems with projects 6903/6904

Post by kasson »

The appropriate tools appear not to have stopped this work unit. I tried again this morning and am filing a bug investigation report with the server code maintainer.
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Merged problems with projects 6903/6904

Post by Grandpa_01 »

The bigger net has arrived :P Thanks Kasson
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
orion
Posts: 135
Joined: Sun Dec 02, 2007 12:45 pm
Hardware configuration: 4p/4 MC ES @ 3.0GHz/32GB
4p/4x6128 @ 2.47GHz/32GB
2p/2 IL ES @ 2.7GHz/16GB
1p/8150/8GB
1p/1090T/4GB
Location: neither here nor there

Re: Merged problems with projects 6903/6904

Post by orion »

ChelseaOilman wrote: Project: 6903 (Run 6, Clone 0, Gen 72)

I reported it as a bad WU. For the 2nd time.

My turn :lol:

Looks like it's still getting through the net; I caught it today.

Code:

[00:47:58] Project: 6903 (Run 6, Clone 0, Gen 72)
[00:47:58] 
[00:47:58] Assembly optimizations on if available.
[00:47:58] Entering M.D.
[00:48:05] Using Gromacs checkpoints
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, 
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

[00:48:18] Mapping NT from 48 to 48 
Reading file work/wudata_03.tpr, VERSION 4.5.4-dev-20110530-cc815 (single precision)
Starting 48 threads

Reading checkpoint file work/wudata_03.cpt generated: Sun Mar 11 18:38:22 2012


Making 2D domain decomposition 8 x 6 x 1
starting mdrun 'Overlay'
18250000 steps,  73000.0 ps (continuing from step 17778395,  71113.6 ps).
[00:49:40] Resuming from checkpoint
[00:49:54] Verified work/wudata_03.log
[00:49:56] Verified work/wudata_03.trr
[00:49:57] Verified work/wudata_03.xtc
[00:49:57] Verified work/wudata_03.edr
[00:50:00] Completed 28395 out of 500000 steps  (5%)

NOTE: Turning on dynamic load balancing

[00:58:03] Completed 30000 out of 500000 steps  (6%)
[01:22:33] Completed 35000 out of 500000 steps  (7%)
[01:45:57] Completed 40000 out of 500000 steps  (8%)
[02:11:28] Completed 45000 out of 500000 steps  (9%)
[02:37:35] Completed 50000 out of 500000 steps  (10%)
iustus quia...
orion
Posts: 135
Joined: Sun Dec 02, 2007 12:45 pm
Hardware configuration: 4p/4 MC ES @ 3.0GHz/32GB
4p/4x6128 @ 2.47GHz/32GB
2p/2 IL ES @ 2.7GHz/16GB
1p/8150/8GB
1p/1090T/4GB
Location: neither here nor there

Re: Merged problems with projects 6903/6904

Post by orion »

Picked it up again today after dumping it.
iustus quia...
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III i7 970 4.3Ghz DDR3 2000 2-500GB Seagate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Merged problems with projects 6903/6904

Post by Grandpa_01 »

And Kasson starts pulling his hair out. :lol:
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
ChelseaOilman
Posts: 1037
Joined: Sun Dec 02, 2007 3:47 pm
Location: Colorado @ 10,000 feet

Re: Merged problems with projects 6903/6904

Post by ChelseaOilman »

Orion, I don't know how much good it will do, but I reported Project: 6903 (Run 6, Clone 0, Gen 72) as a bad WU again.
ChelseaOilman
Posts: 1037
Joined: Sun Dec 02, 2007 3:47 pm
Location: Colorado @ 10,000 feet

Re: Merged problems with projects 6903/6904

Post by ChelseaOilman »

Just received another bad WU.
[02:40:06] Project: 6904 (Run 1, Clone 16, Gen 44)
[02:40:06]
[02:40:06] Assembly optimizations on if available.
[02:40:06] Entering M.D.
[02:40:15] Mapping NT from 48 to 48
[02:40:20] Completed 0 out of 500000 steps (0%)
tear
Posts: 254
Joined: Sun Dec 02, 2007 4:08 am
Hardware configuration: None
Location: Rocky Mountains

Re: Merged problems with projects 6903/6904

Post by tear »

Gotta remove machinedependent.dat as well when dumping the unit -- otherwise it boomerangs right back in no time...
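A minimal Python sketch of that one step, assuming machinedependent.dat sits in the client's own directory and that the client has been stopped first. The function name is hypothetical, and the rest of the dump procedure is left out because the thread does not spell it out.

```python
from pathlib import Path


def remove_machine_id_file(client_dir):
    """Delete machinedependent.dat from client_dir; return True if a file was removed."""
    # Assumption: the file lives directly in the client directory.
    target = Path(client_dir) / "machinedependent.dat"
    if target.is_file():
        target.unlink()
        return True
    return False
```

Call it with the path to the client directory before restarting the client; it returns False if the file was not there.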
One man's ceiling is another man's floor.
3.0charlie
Posts: 13
Joined: Wed Jul 29, 2009 4:34 pm

Re: Merged problems with projects 6903/6904

Post by 3.0charlie »

Picked up the 6903 R6 C0 G72 yesterday. Same bad WU.
Folding for Hardware Canucks
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Merged problems with projects 6903/6904

Post by bruce »

Project: 6903 (Run 6, Clone 0, Gen 72) reported again.