Project: 3065 (Run 1, Clone 37, Gen 35)


EvilAlchemist
Posts: 53
Joined: Fri Feb 08, 2008 4:24 pm
Hardware configuration: 2 x X5550 Xeons - SuperMicro MBD-X8DAi-O
Server 2008 R2 x64 - 12GB Crucial DDR3 ECC Ram
PCP&C 910 Silencer - 1 x HIS 4850 ICEQ Turbo Edition

6 x E5530 Xeons (3 Systems) - SUPERMICRO MBD-X8DTL-i-O
Server 2008 R2 x64 - 8GB DDR3 GSkill Non-ECC Ram
Seasonic 80+ Bronze 380w PSU

2 x E5504 - SUPERMICRO MBD-X8DTL-i-O
Server 2008 R2 x64 - 6GB DDR3 GSkill Non-ECC Ram
2.3 TB Raid 5 Array - Corsair 520 Power Supply

E5504 - EVGA X58 ATX Motherboard
Windows 7 x64 - 6GB DDR3 GSkill Non-ECC Ram
Seasonic 300 Power Supply

Intel X5550 CPU - EVGA X58 Micro ATX Motherboard
Windows 7 x64 - 3GB Corsair DDR3-1600
Corsair 550 Power Supply - ATI 4350

Dell Vostro 1500 Laptop - Intel T9300 C2D CPU
Windows 7 x64 - 4 GB DDR2-6400 - nVidia 8400m GS

Xeon 3075 C2D - Intel P35 Motherboard - 4GB DDR2 Non-ECC Ram
Server 2008 R2 x64 - Seasonic 300 Power Supply
Location: Columbia, Tennessee

Post by EvilAlchemist »

This WU fails shortly after 15% every time, right after a "long 1-4 interactions" warning:

[03:59:50] + Processing work unit
[03:59:50] Core required: FahCore_a1.exe
[03:59:50] Core found.
[03:59:50] Working on Unit 01 [May 21 03:59:50]
[03:59:50] + Working ...
[03:59:50] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 30 -forceasm -verbose -lifeline 525 -version 602'

[03:59:50] 
[03:59:50] *------------------------------*
[03:59:50] Folding@Home Gromacs SMP Core
[03:59:50] Version 1.74 (November 27, 2006)
[03:59:50] 
[03:59:50] Preparing to commence simulation
[03:59:50] - Ensuring status. Please wait.
[03:59:51] - Starting from initial work packet
[03:59:51] 
[03:59:51] Project: 3065 (Run 1, Clone 37, Gen 35)
[03:59:51] 
[03:59:51] Assembly optimizations on if available.
[03:59:51] Entering M.D.
[04:00:08]  on if available.
[04:00:08] Entering M.D.
[04:00:14] Protein: 66728 p3065_lambda5_99sb_big
[04:00:14] Writing local files
[04:00:14] Extra SSEWriting local files
[04:00:14] Completed 0 out of 2500000 steps  (0 percent)
[04:18:33] Writing local files
[04:18:33] Completed 25000 out of 2500000 steps  (1 percent)
[04:37:11] Writing local files
[04:37:11] Completed 50000 out of 2500000 steps  (2 percent)
[04:55:52] Writing local files
[04:55:52] Completed 75000 out of 2500000 steps  (3 percent)
[05:14:31] Writing local files
[05:14:31] Completed 100000 out of 2500000 steps  (4 percent)
[05:33:11] Writing local files
[05:33:11] Completed 125000 out of 2500000 steps  (5 percent)
[05:51:51] Writing local files
[05:51:51] Completed 150000 out of 2500000 steps  (6 percent)
[06:10:30] Writing local files
[06:10:30] Completed 175000 out of 2500000 steps  (7 percent)
[06:29:09] Writing local files
[06:29:09] Completed 200000 out of 2500000 steps  (8 percent)
[06:47:48] Writing local files
[06:47:49] Completed 225000 out of 2500000 steps  (9 percent)
[07:06:27] Writing local files
[07:06:27] Completed 250000 out of 2500000 steps  (10 percent)
[07:25:06] Writing local files
[07:25:06] Completed 275000 out of 2500000 steps  (11 percent)
[07:43:45] Writing local files
[07:43:45] Completed 300000 out of 2500000 steps  (12 percent)
[08:02:23] Writing local files
[08:02:23] Completed 325000 out of 2500000 steps  (13 percent)
[08:21:00] Writing local files
[08:21:00] Completed 350000 out of 2500000 steps  (14 percent)
[08:39:37] Writing local files
[08:39:37] Completed 375000 out of 2500000 steps  (15 percent)
[08:55:10] Warning:  long 1-4 interactions
[08:55:14] CoreStatus = 0 (0)
[08:55:14] Client-core communications error: ERROR 0x0
[08:55:14] Deleting current work unit & continuing...
[08:59:35] - Warning: Could not delete all work unit files (1): Core returned invalid code
[08:59:35] Trying to send all finished work units
[08:59:35] + No unsent completed units remaining.
[08:59:35] - Preparing to get new work unit...
[08:59:35] + Attempting to get work packet
[08:59:35] - Will indicate memory of 2010 MB
[08:59:35] - Connecting to assignment server
[08:59:35] Connecting to http://assign.stanford.edu:8080/
[08:59:36] Posted data.
[08:59:36] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[08:59:36] + News From Folding@Home: Welcome to Folding@Home
[08:59:36] Loaded queue successfully.
[08:59:36] Connecting to http://171.64.65.63:8080/
[08:59:38] Posted data.
[08:59:38] Initial: 0000; - Receiving payload (expected size: 1654831)
[08:59:41] - Downloaded at ~538 kB/s
[08:59:41] - Averaged speed for that direction ~471 kB/s
[08:59:41] + Received work.
[08:59:41] + Closed connections
[08:59:46] 
[08:59:46] + Processing work unit
[08:59:46] Core required: FahCore_a1.exe
[08:59:46] Core found.
[08:59:46] Working on Unit 02 [May 21 08:59:46]
[08:59:46] + Working ...
[08:59:46] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 02 -checkpoint 30 -forceasm -verbose -lifeline 525 -version 602'

[08:59:46] 
[08:59:46] *------------------------------*
[08:59:46] Folding@Home Gromacs SMP Core
[08:59:46] Version 1.74 (November 27, 2006)
[08:59:46] 
[08:59:46] Preparing to commence simulation
[08:59:46] - Ensuring status. Please wait.
[09:00:03] - Assembly optimizations manually forced on.
[09:00:03] - Not checking prior termination.
[09:00:03] - Expanded 1654319 -> 9524377 (decompressed 575.7 percent)
[09:00:03] - Starting from initial work packet
[09:00:03] 
[09:00:03] Project: 3065 (Run 1, Clone 37, Gen 35)
[09:00:03] 
[09:00:03] Assembly optimizations on if available.
[09:00:03] Entering M.D.
[09:00:09] Rejecting checkpoint
[09:00:10] Protein: 66728 p3065_lambda5_99sb_bigExtra SSE boost OK.
[09:00:10] 
[09:00:10] Extra SSE boost OK.
[09:00:10] Writing local files
[09:00:10] Completed 0 out of 2500000 steps  (0 percent)
[09:18:49] Writing local files
[09:18:49] Completed 25000 out of 2500000 steps  (1 percent)
[09:37:29] Writing local files
[09:37:29] Completed 50000 out of 2500000 steps  (2 percent)
[09:56:08] Writing local files
[09:56:08] Completed 75000 out of 2500000 steps  (3 percent)
[09:59:35] - Autosending finished units...
[09:59:35] Trying to send all finished work units
[09:59:35] + No unsent completed units remaining.
[09:59:35] - Autosend completed
[10:14:47] Writing local files
[10:14:48] Completed 100000 out of 2500000 steps  (4 percent)
[10:33:27] Writing local files
[10:33:27] Completed 125000 out of 2500000 steps  (5 percent)
[10:52:07] Writing local files
[10:52:07] Completed 150000 out of 2500000 steps  (6 percent)
[11:10:47] Writing local files
[11:10:47] Completed 175000 out of 2500000 steps  (7 percent)
[11:29:28] Writing local files
[11:29:28] Completed 200000 out of 2500000 steps  (8 percent)
[11:48:08] Writing local files
[11:48:08] Completed 225000 out of 2500000 steps  (9 percent)
[12:06:49] Writing local files
[12:06:49] Completed 250000 out of 2500000 steps  (10 percent)
[12:25:29] Writing local files
[12:25:29] Completed 275000 out of 2500000 steps  (11 percent)
[12:44:09] Writing local files
[12:44:09] Completed 300000 out of 2500000 steps  (12 percent)
[13:02:50] Writing local files
[13:02:50] Completed 325000 out of 2500000 steps  (13 percent)
[13:21:29] Writing local files
[13:21:29] Completed 350000 out of 2500000 steps  (14 percent)
[13:40:07] Writing local files
[13:40:07] Completed 375000 out of 2500000 steps  (15 percent)
[13:55:39] Warning:  long 1-4 interactions
[13:55:43] CoreStatus = 0 (0)
[13:55:43] Client-core communications error: ERROR 0x0
[13:55:43] Deleting current work unit & continuing...
[14:00:04] - Warning: Could not delete all work unit files (2): Core returned invalid code
[14:00:04] Trying to send all finished work units
[14:00:04] + No unsent completed units remaining.
[14:00:04] - Preparing to get new work unit...
[14:00:04] + Attempting to get work packet
[14:00:04] - Will indicate memory of 2010 MB
[14:00:04] - Connecting to assignment server
[14:00:04] Connecting to http://assign.stanford.edu:8080/
[14:00:04] Posted data.
[14:00:04] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[14:00:04] + News From Folding@Home: Welcome to Folding@Home
[14:00:04] Loaded queue successfully.
[14:00:04] Connecting to http://171.64.65.63:8080/
[14:00:07] Posted data.
[14:00:07] Initial: 0000; - Receiving payload (expected size: 1654831)
[14:00:09] - Downloaded at ~808 kB/s
[14:00:09] - Averaged speed for that direction ~583 kB/s
[14:00:09] + Received work.
[14:00:09] + Closed connections
[14:00:14] 
[14:00:14] + Processing work unit
[14:00:14] Core required: FahCore_a1.exe
[14:00:14] Core found.
[14:00:14] Working on Unit 03 [May 21 14:00:14]
[14:00:14] + Working ...
[14:00:14] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 03 -checkpoint 30 -forceasm -verbose -lifeline 525 -version 602'

[14:00:14] 
[14:00:14] *------------------------------*
[14:00:14] Folding@Home Gromacs SMP Core
[14:00:14] Version 1.74 (November 27, 2006)
[14:00:14] 
[14:00:14] Preparing to commence simulation
[14:00:14] - Ensuring status. Please wait.
[14:00:32] - Assembly optimizations manually forced on.
[14:00:32] - Not checking prior termination.
[14:00:32] - Expanded 1654319 -> 9524377 (decompressed 575.7 percent)
[14:00:32] - Starting from initial work packet
[14:00:32] 
[14:00:32] Project: 3065 (Run 1, Clone 37, Gen 35)
[14:00:32] 
[14:00:32] Assembly optimizations on if available.
[14:00:32] Entering M.D.
[14:00:38] Rejecting checkpoint
[14:00:38] Protein: 66728 p3065_lambda5_99sb_bigExtra SSE boost OK.
[14:00:38] 
[14:00:38] Extra SSE boost OK.
[14:00:38] Writing local files
[14:00:38] Completed 0 out of 2500000 steps  (0 percent)
[14:19:18] Writing local files
[14:19:18] Completed 25000 out of 2500000 steps  (1 percent)
[14:37:56] Writing local files
[14:37:56] Completed 50000 out of 2500000 steps  (2 percent)
[14:56:35] Writing local files
[14:56:35] Completed 75000 out of 2500000 steps  (3 percent)
[15:15:15] Writing local files
[15:15:15] Completed 100000 out of 2500000 steps  (4 percent)
[15:33:55] Writing local files
[15:33:55] Completed 125000 out of 2500000 steps  (5 percent)
[15:52:33] Writing local files
[15:52:33] Completed 150000 out of 2500000 steps  (6 percent)
[15:59:35] - Autosending finished units...
[15:59:35] Trying to send all finished work units
[15:59:35] + No unsent completed units remaining.
[15:59:35] - Autosend completed
[16:11:13] Writing local files
[16:11:13] Completed 175000 out of 2500000 steps  (7 percent)
[16:29:51] Writing local files
[16:29:51] Completed 200000 out of 2500000 steps  (8 percent)
[16:48:29] Writing local files
[16:48:30] Completed 225000 out of 2500000 steps  (9 percent)
[17:07:09] Writing local files
[17:07:09] Completed 250000 out of 2500000 steps  (10 percent)
[17:25:49] Writing local files
[17:25:49] Completed 275000 out of 2500000 steps  (11 percent)
[17:44:29] Writing local files
[17:44:29] Completed 300000 out of 2500000 steps  (12 percent)
[18:03:09] Writing local files
[18:03:09] Completed 325000 out of 2500000 steps  (13 percent)
[18:21:46] Writing local files
[18:21:46] Completed 350000 out of 2500000 steps  (14 percent)
[18:40:24] Writing local files
[18:40:24] Completed 375000 out of 2500000 steps  (15 percent)
[18:55:55] Warning:  long 1-4 interactions
[18:55:55] 
[18:55:55] Folding@home Core Shutdown: INTERRUPTED
anandhanju
Posts: 526
Joined: Mon Dec 03, 2007 4:33 am
Location: Australia

Re: Project: 3065 (Run 1, Clone 37, Gen 35)

Post by anandhanju »

Stop the client when the WU is at about 12%, make sure all FAH processes have ended, and then restart it. This MAY work.
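
If you want to confirm that every attempt really stops at the same step before trying this, here is a minimal Python sketch that assumes the log format posted above. The function names (`summarize_attempts`, `same_failure_point`) and the shortened `sample` text are made up for illustration; none of this is part of the FAH client.

```python
import re

# Patterns assume the client log format shown in this thread.
UNIT_RE = re.compile(r"Working on Unit (\d+)")
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
WARN_RE = re.compile(r"Warning:\s+long 1-4 interactions")


def summarize_attempts(log_text):
    """Return one dict per work-unit attempt found in log_text."""
    attempts = []
    current = None
    for line in log_text.splitlines():
        unit = UNIT_RE.search(line)
        if unit:
            # A new "Working on Unit NN" line starts a new attempt.
            current = {"unit": unit.group(1), "last_step": None,
                       "total_steps": None, "warned": False}
            attempts.append(current)
            continue
        if current is None:
            continue
        step = STEP_RE.search(line)
        if step:
            # Remember the most recent progress line for this attempt.
            current["last_step"] = int(step.group(1))
            current["total_steps"] = int(step.group(2))
        elif WARN_RE.search(line):
            current["warned"] = True
    return attempts


def same_failure_point(attempts):
    """True if 2+ attempts hit the warning and all stopped at the same step."""
    warned = [a for a in attempts if a["warned"]]
    return len(warned) >= 2 and len({a["last_step"] for a in warned}) == 1


# Shortened excerpt of the log above, used only to exercise the functions.
sample = """\
[03:59:50] Working on Unit 01 [May 21 03:59:50]
[08:39:37] Completed 375000 out of 2500000 steps  (15 percent)
[08:55:10] Warning:  long 1-4 interactions
[08:59:46] Working on Unit 02 [May 21 08:59:46]
[13:40:07] Completed 375000 out of 2500000 steps  (15 percent)
[13:55:39] Warning:  long 1-4 interactions
"""

for attempt in summarize_attempts(sample):
    print(attempt["unit"], attempt["last_step"], attempt["warned"])
print("same failure point:", same_failure_point(summarize_attempts(sample)))
```

To run it against a real log, read the client's log file into a string and pass that to `summarize_attempts` instead of `sample`.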