Multiple WU: Client-core communications error: ERROR 0x0

kRa2y_kArMa
Posts: 2
Joined: Sat Aug 09, 2008 10:16 pm

Post by kRa2y_kArMa »

I am running 3 clients on my Mac Pro (8 cores at 2.8 GHz each, 2 GB of 800 MHz FB-DIMM RAM, 3 ATI HD 2600 XT graphics cards). Recently my work units have been quitting partway through and starting over. Within the last three days I have lost 6 work units, each between 18% and 63% complete.

Client #1: 2 lost work units, both at 24%
Project: 3064 (Run 4, Clone 20, Gen 94)
Project: 3064 (Run 4, Clone 20, Gen 94)

Client #2: 4 lost work units, all at 63%
Project: 3062 (Run 2, Clone 112, Gen 35)
Project: 3062 (Run 2, Clone 112, Gen 35)
Project: 3062 (Run 2, Clone 112, Gen 35)
Project: 3062 (Run 2, Clone 112, Gen 35)

Client #3: 1 lost work unit at 18%
Project: 3064 (Run 4, Clone 69, Gen 61)
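
A tally like the one above could be pulled from the client logs with a short script rather than by hand. This is only a minimal sketch: it assumes the log lines look like the ones pasted in this post, and the function name `tally_failures` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Patterns match the log lines as they appear in this post.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
PERCENT_RE = re.compile(r"Completed \d+ out of \d+ steps\s+\((\d+) percent\)")
ERROR_MARKER = "Client-core communications error"


def tally_failures(log_lines):
    """Count failed work units, keyed by (project, run, clone, gen, last percent)."""
    failures = Counter()
    current_unit = None   # most recent Project line seen
    last_percent = None   # most recent progress report for that unit
    for line in log_lines:
        project = PROJECT_RE.search(line)
        if project:
            current_unit = tuple(int(g) for g in project.groups())
            last_percent = None
            continue
        percent = PERCENT_RE.search(line)
        if percent:
            last_percent = int(percent.group(1))
            continue
        if ERROR_MARKER in line and current_unit is not None:
            failures[current_unit + (last_percent,)] += 1
            current_unit = None  # avoid counting the same unit twice
    return failures


# Hypothetical sample: a few lines copied from the Client 1 log.
sample = [
    "[17:12:38] Project: 3064 (Run 4, Clone 20, Gen 94)",
    "[22:16:18] Completed 1200000 out of 5000000 steps  (24 percent)",
    "[22:24:04] Client-core communications error: ERROR 0x0",
    "[12:04:34] Project: 3064 (Run 4, Clone 20, Gen 94)",
    "[17:32:55] Completed 1200000 out of 5000000 steps  (24 percent)",
    "[17:40:43] Client-core communications error: ERROR 0x0",
]
for key, count in tally_failures(sample).items():
    print(key, count)
```

Keying on the last reported percentage makes it easy to see when the same Project/Run/Clone/Gen fails at the same point every time, which is the pattern in the list above.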


Each client is installed in its own folder, has a unique machine ID, and has the -local flag set. I am running -forceasm on each. Could this be causing the problem?
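
For context on the setup: every "Calling" line in the logs in this post launches the core with `mpiexec -np 4`, so the three clients together start more processes than the 8 cores the client detects. The arithmetic is sketched below using only values from those logs. Whether this has anything to do with the crashes is not established here; it is just a number worth having on hand.

```python
# Values taken from the logs in this post.
cores_detected = 8     # "8 cores detected"
clients = 3            # three client folders, each started with -smp
ranks_per_client = 4   # "mpiexec -np 4" in every "Calling" line

total_ranks = clients * ranks_per_client
ratio = total_ranks / cores_detected

print(f"{total_ranks} MPI ranks on {cores_detected} cores "
      f"({ratio:.2f} ranks per core)")
```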

Thanks for your help,
Ben (kRa2y_kArMa)

Below is the complete log file for each client that is running:

CLIENT 1 CODE

Last login: Wed Aug  6 02:48:41 on console
ben-bartholomews-mac-pro:~ benbartholomew$ ~/Library/folding@home
-bash: /Users/benbartholomew/Library/folding@home: is a directory
ben-bartholomews-mac-pro:~ benbartholomew$ cd ~/Library/folding@home
ben-bartholomews-mac-pro:folding@home benbartholomew$ ./fah6 -smp -local -forceasm -verbosity 9
Using local directory for configuration

Using local directory for work files
8 cores detected


--- Opening Log file [August 6 09:02:00] 

                       Folding@Home Client Version 6.02


Launch directory: /Users/benbartholomew/Library/folding@home
Executable: ./fah6
Arguments: -smp -local -forceasm -verbosity 9 

[09:02:00] - Ask before connecting: No
[09:02:00] - User name: kRa2y_kArMa (Team 99661)
[09:02:00] - User ID: 18B84C4E13FDBD09
[09:02:00] - Machine ID: 2
[09:02:00] 

...3 successfully completed units...

[17:12:18] - Connecting to assignment server
[17:12:18] Connecting to http://assign.stanford.edu:8080/
[17:12:18] Posted data.
[17:12:18] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[17:12:18] + News From Folding@Home: Welcome to Folding@Home
[17:12:18] Loaded queue successfully.
[17:12:18] Connecting to http://171.64.65.63:8080/
[17:12:19] Posted data.
[17:12:19] Initial: 0000; - Receiving payload (expected size: 607985)
[17:12:21] - Downloaded at ~296 kB/s
[17:12:21] - Averaged speed for that direction ~461 kB/s
[17:12:21] + Received work.
[17:12:21] Trying to send all finished work units
[17:12:21] + No unsent completed units remaining.
[17:12:21] + Closed connections
[17:12:21] 
[17:12:21] + Processing work unit
[17:12:21] Core required: FahCore_a1.exe
[17:12:21] Core found.
[17:12:21] Working on Unit 08 [August 8 17:12:21]
[17:12:21] + Working ...
[17:12:21] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 08 -checkpoint 10 -forceasm -verbose -lifeline 290 -version 602'

[17:12:21] 
[17:12:21] *------------------------------*
[17:12:21] Folding@Home Gromacs SMP Core
[17:12:21] Version 1.74 (September 24, 2007)
[17:12:21] 
[17:12:21] Preparing to commence simulation
[17:12:21] - Ensuring status. Please wait.
[17:12:38] - Assembly optimizations manually forced on.
[17:12:38] - Not checking prior termination.
[17:12:38] - Expanded 607473 -> 3255941 (decompressed 535.9 percent)
[17:12:38] - Starting from initial work packet
[17:12:38] 
[17:12:38] Project: 3064 (Run 4, Clone 20, Gen 94)
[17:12:38] 
[17:12:38] Assembly optimizations on if available.
[17:12:38] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=1, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=2, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=3, HOSTNAME=ben-bartholomews-mac-pro.local
NODEID=1 argc=15
NODEID=2 argc=15
NODEID=3 argc=15
NODEID=0 argc=15

[17:12:44] mdrunner cpfilename: 
[17:12:44] Rejecting checkpoint
starting mdrun 'p3064_lambda5_2003'
5000000 steps,  10000.0 ps.

[17:12:45] Protein: p3064_lambda5_2003Extra SSE boost OK.
[17:12:45] 
[17:12:45] Extra SSE boost OK.
[17:12:45] Writing local files

... 1% every 12-13min ...

[22:16:18] Writing local files
[22:16:18] Completed 1200000 out of 5000000 steps  (24 percent)
[22:23:59] Warning:  long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[22:24:04] CoreStatus = 0 (0)
[22:24:04] Client-core communications error: ERROR 0x0
[22:24:04] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 18
[22:28:31] - Warning: Could not delete all work unit files (8): Core returned invalid code
[22:28:31] Trying to send all finished work units
[22:28:31] + No unsent completed units remaining.

...successfully completed unit...

[0]2:Return code = 0, signaled with Quit
[12:04:12] - Warning: Could not delete all work unit files (9): Core file absent
[12:04:12] Trying to send all finished work units
[12:04:12] + No unsent completed units remaining.
[12:04:12] - Preparing to get new work unit...
[12:04:12] + Attempting to get work packet
[12:04:12] - Will indicate memory of 2147483647 MB
[12:04:12] - Connecting to assignment server
[12:04:12] Connecting to http://assign.stanford.edu:8080/
[12:04:13] Posted data.
[12:04:13] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[12:04:13] + News From Folding@Home: Welcome to Folding@Home
[12:04:13] Loaded queue successfully.
[12:04:13] Connecting to http://171.64.65.63:8080/
[12:04:15] Posted data.
[12:04:15] Initial: 0000; - Receiving payload (expected size: 607985)
[12:04:16] - Downloaded at ~593 kB/s
[12:04:16] - Averaged speed for that direction ~500 kB/s
[12:04:16] + Received work.
[12:04:16] Trying to send all finished work units
[12:04:16] + No unsent completed units remaining.
[12:04:16] + Closed connections
[12:04:16] 
[12:04:16] + Processing work unit
[12:04:16] Core required: FahCore_a1.exe
[12:04:16] Core found.
[12:04:16] Working on Unit 00 [August 9 12:04:16]
[12:04:16] + Working ...
[12:04:16] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 00 -checkpoint 10 -forceasm -verbose -lifeline 290 -version 602'

[12:04:16] 
[12:04:16] *------------------------------*
[12:04:16] Folding@Home Gromacs SMP Core
[12:04:16] Version 1.74 (September 24, 2007)
[12:04:16] 
[12:04:16] Preparing to commence simulation
[12:04:16] - Ensuring status. Please wait.
[12:04:33] - Assembly optimizations manually forced on.
[12:04:33] - Not checking prior termination.
[12:04:34] - Expanded 607473 -> 3255941 (decompressed 535.9 percent)
[12:04:34] - Starting from initial work packet
[12:04:34] 
[12:04:34] Project: 3064 (Run 4, Clone 20, Gen 94)
[12:04:34] 
[12:04:34] Assembly optimizations on if available.
[12:04:34] Entering M.D.
NNODES=4, MYRANK=2, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=0, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=1, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=3, HOSTNAME=ben-bartholomews-mac-pro.local
NODEID=0 argc=15
NODEID=2 argc=15
NODEID=3 argc=15
NODEID=1 argc=15

[12:04:40] mdrunner cpfilename: 
[12:04:40] Protein: p3064_lambda5_2003
[12:04:40] Writing local files
starting mdrun 'p3064_lambda5_2003'
5000000 steps,  10000.0 ps.

[12:04:40] Extra SSE boost OK.

... 1% every 12-14 min ...

[17:32:55] Completed 1200000 out of 5000000 steps  (24 percent)
[17:40:38] Warning:  long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[17:40:43] CoreStatus = 0 (0)
[17:40:43] Client-core communications error: ERROR 0x0
[17:40:43] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 18
[17:45:10] - Warning: Could not delete all work unit files (0): Core returned invalid code
[17:45:10] Trying to send all finished work units
[17:45:10] + No unsent completed units remaining.
[17:45:10] - Preparing to get new work unit...

CLIENT 2 CODE

Last login: Wed Aug  6 03:00:08 on ttys000
ben-bartholomews-mac-pro:~ benbartholomew$ cd ~/Library/folding@home2
ben-bartholomews-mac-pro:folding@home2 benbartholomew$ ./fah6 -smp -local -forceasm -verbosity 9
Using local directory for configuration

Using local directory for work files
8 cores detected


--- Opening Log file [August 6 09:01:28] 


                       Folding@Home Client Version 6.02


Launch directory: /Users/benbartholomew/Library/folding@home2
Executable: ./fah6
Arguments: -smp -local -forceasm -verbosity 9 

[09:01:28] - Ask before connecting: No
[09:01:28] - User name: kRa2y_kArMa (Team 99661)
[09:01:28] - User ID: 6F54B5B92204A333
[09:01:28] - Machine ID: 3
[09:01:28] 
[09:01:28] Loaded queue successfully.

... completed 1 work unit ...

[03:19:23] - Preparing to get new work unit...
[03:19:23] + Attempting to get work packet
[03:19:23] - Will indicate memory of 2147483647 MB
[03:19:23] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 6
[03:19:23] - Connecting to assignment server
[03:19:23] Connecting to http://assign.stanford.edu:8080/
[03:19:23] Posted data.
[03:19:23] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[03:19:23] + News From Folding@Home: Welcome to Folding@Home
[03:19:24] Loaded queue successfully.
[03:19:24] Connecting to http://171.64.65.63:8080/
[03:19:25] Posted data.
[03:19:25] Initial: 0000; - Receiving payload (expected size: 610337)
[03:19:26] - Downloaded at ~596 kB/s
[03:19:26] - Averaged speed for that direction ~510 kB/s
[03:19:26] + Received work.
[03:19:26] Trying to send all finished work units
[03:19:26] + No unsent completed units remaining.
[03:19:26] + Closed connections
[03:19:26] 
[03:19:26] + Processing work unit
[03:19:26] Core required: FahCore_a1.exe
[03:19:26] Core found.
[03:19:26] Working on Unit 09 [August 7 03:19:26]
[03:19:26] + Working ...
[03:19:26] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 09 -checkpoint 5 -forceasm -verbose -lifeline 283 -version 602'

[03:19:26] 
[03:19:26] *------------------------------*
[03:19:26] Folding@Home Gromacs SMP Core
[03:19:26] Version 1.74 (September 24, 2007)
[03:19:26] 
[03:19:26] Preparing to commence simulation
[03:19:26] - Ensuring status. Please wait.
[03:19:43] - Assembly optimizations manually forced on.
[03:19:43] - Not checking prior termination.
[03:19:44] - Expanded 609825 -> 3263133 (decompressed 535.0 percent)
[03:19:44] - Starting from initial work packet
[03:19:44] 
[03:19:44] Project: 3062 (Run 2, Clone 112, Gen 35)
[03:19:44] 
[03:19:44] Assembly optimizations on if available.
[03:19:44] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=1, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=2, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=3, HOSTNAME=ben-bartholomews-mac-pro.local
NODEID=1 argc=15
NODEID=2 argc=15
NODEID=3 argc=15
NODEID=0 argc=15

starting mdrun 'p3062_lambda5_99sb'
5000000 steps,  10000.0 ps.

[03:19:50] mdrunner cpfilename: 
[03:19:50] Protein: p3062_lambdaProtein: p3062_lambda5_99sbExtra SSE boost OK.
[03:19:50] 
[03:19:50] Extra SSE boost OK.

...1% every 15min ...

[19:17:50] Writing local files
[19:17:50] Completed 3150000 out of 5000000 steps  (63 percent)
[19:22:49] Timered checkpoint triggered.
[19:27:49] Timered checkpoint triggered.
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[19:30:11] CoreStatus = 0 (0)
[19:30:11] Client-core communications error: ERROR 0x0
[19:30:11] Deleting current work unit & continuing...
[0]0:Return code = 18
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 18
[0]3:Return code = 0, signaled with Quit
[19:34:38] - Warning: Could not delete all work unit files (9): Core returned invalid code
[19:34:38] Trying to send all finished work units
[19:34:38] + No unsent completed units remaining.
[19:34:38] - Preparing to get new work unit...
[19:34:38] + Attempting to get work packet
[19:34:38] - Will indicate memory of 2147483647 MB
[19:34:38] - Connecting to assignment server
[19:34:38] Connecting to http://assign.stanford.edu:8080/
[19:34:38] Posted data.
[19:34:38] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[19:34:38] + News From Folding@Home: Welcome to Folding@Home
[19:34:39] Loaded queue successfully.
[19:34:39] Connecting to http://171.64.65.63:8080/
[19:34:40] Posted data.
[19:34:40] Initial: 0000; - Receiving payload (expected size: 610337)
[19:34:41] - Downloaded at ~596 kB/s
[19:34:41] - Averaged speed for that direction ~527 kB/s
[19:34:41] + Received work.
[19:34:41] + Closed connections
[19:34:46] 
[19:34:46] + Processing work unit
[19:34:46] Core required: FahCore_a1.exe
[19:34:46] Core found.
[19:34:46] Working on Unit 00 [August 7 19:34:46]
[19:34:46] + Working ...
[19:34:46] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 00 -checkpoint 5 -forceasm -verbose -lifeline 283 -version 602'

[19:34:46] 
[19:34:46] *------------------------------*
[19:34:46] Folding@Home Gromacs SMP Core
[19:34:46] Version 1.74 (September 24, 2007)
[19:34:46] 
[19:34:46] Preparing to commence simulation
[19:34:46] - Ensuring status. Please wait.
[19:35:03] - Assembly optimizations manually forced on.
[19:35:03] - Not checking prior termination.
[19:35:04] - Expanded 609825 -> 3263133 (decompressed 535.0 percent)
[19:35:04] - Starting from initial work packet
[19:35:04] 
[19:35:04] Project: 3062 (Run 2, Clone 112, Gen 35)
[19:35:04] 
[19:35:04] Assembly optimizations on if available.
[19:35:04] Entering M.D.
NNODES=4, MYRANK=1, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=0, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=3, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=2, HOSTNAME=ben-bartholomews-mac-pro.local
NODEID=0 argc=15
NODEID=3 argc=15
NODEID=1 argc=15
NODEID=2 argc=15

starting mdrun 'p3062_lambda5_99sb'
5000000 steps,  10000.0 ps.

[19:35:10] mdrunner cpfilename: 
[19:35:10] Protein: p3062_lambdaProtein: p3062_lambda5_99sbExtra SSE boost OK.
[19:35:10] 
[19:35:10] Extra SSE boost OK.

... 1% every 13min ...

[08:51:42] Writing local files
[08:51:42] Completed 3150000 out of 5000000 steps  (63 percent)
[08:56:42] Timered checkpoint triggered.
[09:01:32] - Autosending finished units...
[09:01:32] Trying to send all finished work units
[09:01:32] + No unsent completed units remaining.
[09:01:32] - Autosend completed
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[09:01:44] CoreStatus = 0 (0)
[09:01:44] Client-core communications error: ERROR 0x0
[09:01:44] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 18
[0]3:Return code = 0, signaled with Quit
[09:06:11] - Warning: Could not delete all work unit files (0): Core returned invalid code
[09:06:11] Trying to send all finished work units
[09:06:11] + No unsent completed units remaining.
[09:06:11] - Preparing to get new work unit...
[09:06:11] + Attempting to get work packet
[09:06:11] - Will indicate memory of 2147483647 MB
[09:06:11] - Connecting to assignment server
[09:06:11] Connecting to http://assign.stanford.edu:8080/
[09:06:12] Posted data.
[09:06:12] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[09:06:12] + News From Folding@Home: Welcome to Folding@Home
[09:06:12] Loaded queue successfully.
[09:06:12] Connecting to http://171.64.65.63:8080/
[09:06:13] Posted data.
[09:06:13] Initial: 0000; - Receiving payload (expected size: 610337)
[09:06:14] - Downloaded at ~596 kB/s
[09:06:14] - Averaged speed for that direction ~541 kB/s
[09:06:14] + Received work.
[09:06:14] + Closed connections
[09:06:19] 
[09:06:19] + Processing work unit
[09:06:19] Core required: FahCore_a1.exe
[09:06:19] Core found.
[09:06:19] Working on Unit 01 [August 8 09:06:19]
[09:06:19] + Working ...
[09:06:19] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 5 -forceasm -verbose -lifeline 283 -version 602'

[09:06:20] 
[09:06:20] *------------------------------*
[09:06:20] Folding@Home Gromacs SMP Core
[09:06:20] Version 1.74 (September 24, 2007)
[09:06:20] 
[09:06:20] Preparing to commence simulation
[09:06:20] - Ensuring status. Please wait.
[09:06:37] - Assembly optimizations manually forced on.
[09:06:37] - Not checking prior termination.
[09:06:37] - Expanded 609825 -> 3263133 (decompressed 535.0 percent)
[09:06:37] - Starting from initial work packet
[09:06:37] 
[09:06:37] Project: 3062 (Run 2, Clone 112, Gen 35)
[09:06:37] 
[09:06:37] Assembly optimizations on if available.
[09:06:37] Entering M.D.
NNODES=4, MYRANK=3, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=0, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=2, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=1, HOSTNAME=ben-bartholomews-mac-pro.local
NODEID=0 argc=15
NODEID=1 argc=15
NODEID=2 argc=15
NODEID=3 argc=15

[09:06:43] mdrunner cpfilename: 
[09:06:43] Rejecting checkpoint
starting mdrun 'p3062_lambda5_99sb'
5000000 steps,  10000.0 ps.

[09:06:44] Protein: p3062_lambda5_99sbExtra SSE boost OK.
[09:06:44] 
[09:06:44] Extra SSE boost OK.

... 1% every 13min ...

[22:20:31] Writing local files
[22:20:31] Completed 3150000 out of 5000000 steps  (63 percent)
[22:25:31] Timered checkpoint triggered.
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[22:29:43] CoreStatus = 0 (0)
[22:29:43] Client-core communications error: ERROR 0x0
[22:29:43] - Attempting to download new core...
[22:29:43] + Downloading new core: FahCore_a1.exe
[22:29:43] Downloading core (/~pande/OSX/x86/Core_a1.fah from www.stanford.edu)
...download status updates...
[22:29:45] Verifying core Core_a1.fah...
[22:29:45] Signature is VALID
[22:29:45] 
[22:29:45] Trying to unzip core FahCore_a1.exe
[22:29:46] Decompressed FahCore_a1.exe (3246776 bytes) successfully
[22:29:46] + Core successfully engaged
[22:29:46] Deleting current work unit & continuing...
[0]0:Return code = 18
[0]1:Return code = 18
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[22:34:13] - Warning: Could not delete all work unit files (1): Core returned invalid code
[22:34:13] Trying to send all finished work units
[22:34:13] + No unsent completed units remaining.
[22:34:13] - Preparing to get new work unit...

... 1 completed unit ...

CLIENT 3 CODE

Last login: Wed Aug  6 03:00:17 on ttys001
ben-bartholomews-mac-pro:~ benbartholomew$ cd ~/Library/folding@home3
ben-bartholomews-mac-pro:folding@home3 benbartholomew$ ./fah6 -smp -local -forceasm -verbosity 9
Using local directory for configuration

Using local directory for work files
8 cores detected

--- Opening Log file [August 6 09:01:05] 

                       Folding@Home Client Version 6.02

Launch directory: /Users/benbartholomew/Library/folding@home3
Executable: ./fah6
Arguments: -smp -local -forceasm -verbosity 9 

[09:01:05] - Ask before connecting: No
[09:01:05] - User name: kRa2y_kArMa (Team 99661)
[09:01:05] - User ID: 6FF2C70674316DCD
[09:01:05] - Machine ID: 4
[09:01:05] 
[09:01:05] Loaded queue successfully.

... 3 work units completed ....

[17:14:36] - Preparing to get new work unit...
[17:14:36] + Attempting to get work packet
[17:14:36] - Will indicate memory of 2147483647 MB
[17:14:36] - Connecting to assignment server
[17:14:36] Connecting to http://assign.stanford.edu:8080/
[17:14:37] Posted data.
[17:14:37] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[17:14:37] + News From Folding@Home: Welcome to Folding@Home
[17:14:37] Loaded queue successfully.
[17:14:37] Connecting to http://171.64.65.63:8080/
[17:14:38] Posted data.
[17:14:38] Initial: 0000; - Receiving payload (expected size: 608484)
[17:14:39] - Downloaded at ~594 kB/s
[17:14:39] - Averaged speed for that direction ~466 kB/s
[17:14:39] + Received work.
[17:14:39] Trying to send all finished work units
[17:14:39] + No unsent completed units remaining.
[17:14:39] + Closed connections
[17:14:39] 
[17:14:39] + Processing work unit
[17:14:39] Core required: FahCore_a1.exe
[17:14:39] Core found.
[17:14:39] Working on Unit 09 [August 8 17:14:39]
[17:14:39] + Working ...
[17:14:39] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 09 -checkpoint 5 -forceasm -verbose -lifeline 277 -version 602'

[17:14:39] 
[17:14:39] *------------------------------*
[17:14:39] Folding@Home Gromacs SMP Core
[17:14:39] Version 1.74 (September 24, 2007)
[17:14:39] 
[17:14:39] Preparing to commence simulation
[17:14:39] - Ensuring status. Please wait.
[17:14:56] - Assembly optimizations manually forced on.
[17:14:56] - Not checking prior termination.
[17:14:57] - Expanded 607972 -> 3255941 (decompressed 535.5 percent)
[17:14:57] - Starting from initial work packet
[17:14:57] 
[17:14:57] Project: 3064 (Run 4, Clone 69, Gen 61)
[17:14:57] 
[17:14:57] Assembly optimizations on if available.
[17:14:57] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=1, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=3, HOSTNAME=ben-bartholomews-mac-pro.local
NNODES=4, MYRANK=2, HOSTNAME=ben-bartholomews-mac-pro.local
NODEID=1 argc=15
NODEID=0 argc=15
NODEID=2 argc=15
NODEID=3 argc=15


[17:15:03] mdrunner cpfilename: 
[17:15:03] Rejecting checkpoint
starting mdrun 'p3064_lambda5_2003'
5000000 steps,  10000.0 ps.

[17:15:03] Protein: p3064_lambda5_2003Extra SSE boost OK.

....1% completed every 12-13min...

[21:02:00] Completed 900000 out of 5000000 steps  (18 percent)
[21:02:20] Warning:  long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[21:02:25] CoreStatus = 0 (0)
[21:02:25] Client-core communications error: ERROR 0x0
[21:02:25] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 18
[21:06:52] - Warning: Could not delete all work unit files (9): Core returned invalid code
[21:06:52] Trying to send all finished work units
[21:06:52] + No unsent completed units remaining.
