Project: 3043 (Run 2, Clone 60, Gen 43) Failing at 15%

dschief
Posts: 146
Joined: Tue Dec 04, 2007 5:56 am
Hardware configuration: ASUS P5K-E, Q6600/ 8 gig ram Win-7

2X ASUS z97-K 16 G Ram Win-7_64

Post by dschief »

P3043, Run 2, Clone 60, Gen 43

I am getting "long 1-4 interactions" warnings followed by segmentation faults.

The WU is deleted, then the same one is reloaded.

Q6600 at stock clock (2.4 GHz)
Linux (Fedora 8), 2 GB memory

This box has been very dependable, completing multiple 3062 and 3065 WUs without any glitches.

Has anyone completed this one?
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Post by 7im »

dschief wrote: Has anyone completed this one?
No.
dschief

Post by dschief »

7im wrote:
dschief wrote: Has anyone completed this one?
No.
I think we may have a lemon on our hands.

The current attempt is at 9%; I will post an update in the AM.
dschief

Post by dschief »

Well, it died again right at 15%.

This time it downloaded a different WU, which has made it to 21%.

Here's the log showing the last two failures.
[06:21:37] - Machine ID: 1
[06:21:37]
[06:21:37] Loaded queue successfully.
[06:21:37] - Autosending finished units...
[06:21:37] Trying to send all finished work units
[06:21:37] + No unsent completed units remaining.
[06:21:37] - Autosend completed
[06:21:37] - Preparing to get new work unit...
[06:21:37] + Attempting to get work packet
[06:21:37] - Will indicate memory of 2013 MB
[06:21:37] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 15, Stepping: 11
[06:21:37] - Connecting to assignment server
[06:21:37] Connecting to http://assign.stanford.edu:8080/
[06:21:37] Posted data.
[06:21:37] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[06:21:37] + News From Folding@Home: Welcome to Folding@Home
[06:21:38] Loaded queue successfully.
[06:21:38] Connecting to http://171.64.65.63:8080/
[06:21:38] Posted data.
[06:21:38] Initial: 0000; - Receiving payload (expected size: 283537)
[06:21:40] - Downloaded at ~138 kB/s
[06:21:40] - Averaged speed for that direction ~138 kB/s
[06:21:40] + Received work.
[06:21:40] + Closed connections
[06:21:40]
[06:21:40] + Processing work unit
[06:21:40] Core required: FahCore_a1.exe
[06:21:40] Core found.
[06:21:40] Working on Unit 01 [April 11 06:21:40]
[06:21:40] + Working ...
[06:21:40] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 01 -priority 96 -checkpoint 15 -verbose -lifeline 4313 -version 601'

[06:21:40]
[06:21:40] *------------------------------*
[06:21:40] Folding@Home Gromacs SMP Core
[06:21:40] Version 1.74 (November 27, 2006)
[06:21:40]
[06:21:40] Preparing to commence simulation
[06:21:40] - Ensuring status. Please wait.
[06:21:40] Created dyn
[06:21:40] - Files status OK
[06:21:40] - Expanded 283025 -> 1508541 (decompressed 533.0 percent)
[06:21:40] - Starting from initial work packet
[06:21:40]
[06:21:40] Project: 3043 (Run 2, Clone 60, Gen 43)
[06:21:40]
[06:21:40] Assembly optimizations on if available.
[06:21:40] Entering M.D.
[06:21:57] 0 percent)
[06:21:57] - Starting from initial work packet
[06:21:57]
[06:21:57] Project: 3043 (Run 2, Clone 60, Gen 43)
[06:21:57]
[06:21:57] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=3, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=1, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=2, HOSTNAME=localhost.localdomain
NODEID=1 argc=15
NODEID=2 argc=15
NODEID=3 argc=15
NODEID=0 argc=15
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2004, The GROMACS development team,
check out http://www.gromacs.org for more information.

This inclusion of Gromacs code in the Folding@Home Core is under
a special license (see http://folding.stanford.edu/gromacs.html)
specially granted to Stanford by the copyright holders. If you
are interested in using Gromacs, visit http://www.gromacs.org where
you can download a free version of Gromacs under
the terms of the GNU General Public License (GPL) as published
by the Free Software Foundation; either version 2 of the License,
or (at your option) any later version.

starting mdrun '9684 p3029_SMP-emsv-03'
10000000 steps, 20000.0 ps.

[06:22:04] 684 p3029_SMP-emsv-03
[06:22:04] Writing local files
[06:22:04] Extra SSE Extra SSE Writing local files
[06:22:04] Completed 0 out of 10000000 steps (0 percent)
[06:36:18] Writing local files
[06:36:18] Completed 100000 out of 10000000 steps (1 percent)
[06:50:35] Writing local files
[06:50:35] Completed 200000 out of 10000000 steps (2 percent)
[07:04:52] Writing local files
[07:04:52] Completed 300000 out of 10000000 steps (3 percent)
[07:19:09] Writing local files
[07:19:09] Completed 400000 out of 10000000 steps (4 percent)
[07:33:33] Writing local files
[07:33:33] Completed 500000 out of 10000000 steps (5 percent)
[07:47:51] Writing local files
[07:47:51] Completed 600000 out of 10000000 steps (6 percent)
[08:02:08] Writing local files
[08:02:08] Completed 700000 out of 10000000 steps (7 percent)
[08:16:25] Writing local files
[08:16:25] Completed 800000 out of 10000000 steps (8 percent)
[08:30:41] Writing local files
[08:30:41] Completed 900000 out of 10000000 steps (9 percent)
[08:44:55] Writing local files
[08:44:55] Completed 1000000 out of 10000000 steps (10 percent)
[08:59:12] Writing local files
[08:59:12] Completed 1100000 out of 10000000 steps (11 percent)
[09:13:29] Writing local files
[09:13:29] Completed 1200000 out of 10000000 steps (12 percent)
[09:27:45] Writing local files
[09:27:45] Completed 1300000 out of 10000000 steps (13 percent)
[09:42:02] Writing local files
[09:42:02] Completed 1400000 out of 10000000 steps (14 percent)
[09:56:19] Writing local files
[09:56:19] Completed 1500000 out of 10000000 steps (15 percent)
[10:08:04] Warning: long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[10:08:08] CoreStatus = 0 (0)
[10:08:08] Client-core communications error: ERROR 0x0
[10:08:08] Deleting current work unit & continuing...
[0]0:Return code = 18
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[10:12:29] - Warning: Could not delete all work unit files (1): Core returned invalid code
[10:12:29] Trying to send all finished work units
[10:12:29] + No unsent completed units remaining.
[10:12:29] - Preparing to get new work unit...
[10:12:29] + Attempting to get work packet
[10:12:29] - Will indicate memory of 2013 MB
[10:12:29] - Connecting to assignment server
[10:12:29] Connecting to http://assign.stanford.edu:8080/
[10:12:29] Posted data.
[10:12:29] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[10:12:29] + News From Folding@Home: Welcome to Folding@Home
[10:12:29] Loaded queue successfully.
[10:12:29] Connecting to http://171.64.65.63:8080/
[10:12:29] Posted data.
[10:12:29] Initial: 0000; - Receiving payload (expected size: 283537)
[10:12:31] - Downloaded at ~138 kB/s
[10:12:31] - Averaged speed for that direction ~138 kB/s
[10:12:31] + Received work.
[10:12:31] + Closed connections
[10:12:36]
[10:12:36] + Processing work unit
[10:12:36] Core required: FahCore_a1.exe
[10:12:36] Core found.
[10:12:36] Working on Unit 02 [April 11 10:12:36]
[10:12:36] + Working ...
[10:12:36] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 02 -priority 96 -checkpoint 15 -verbose -lifeline 4313 -version 601'

[10:12:36]
[10:12:36] *------------------------------*
[10:12:36] Folding@Home Gromacs SMP Core
[10:12:36] Version 1.74 (November 27, 2006)
[10:12:36]
[10:12:36] Preparing to commence simulation
[10:12:36] - Ensuring status. Please wait.
[10:12:53] - Looking at optimizations...
[10:12:53] - Working with standard loops on this execution.
[10:12:53] - Previous termination of core w- Expanded 283025 -> 1508541 (d- Expanded 283025 -> 1508541 (d- Expanded 283025 -> 150854- Starting from initial work packet
[10:12:53]
[10:12:53] Prog from initial work packet
[10:12:53]
[10:12:53] Project: 3Entering M.D.
[10:12:53] one 60, Gen 43)
[10:12:53]
[10:12:53] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=1, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=2, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=3, HOSTNAME=localhost.localdomain
NODEID=3 argc=15
NODEID=0 argc=15
NODEID=1 argc=15
NODEID=2 argc=15
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2004, The GROMACS development team,
check out http://www.gromacs.org for more information.

This inclusion of Gromacs code in the Folding@Home Core is under
a special license (see http://folding.stanford.edu/gromacs.html)
specially granted to Stanford by the copyright holders. If you
are interested in using Gromacs, visit http://www.gromacs.org where
you can download a free version of Gromacs under
the terms of the GNU General Public License (GPL) as published
by the Free Software Foundation; either version 2 of the License,
or (at your option) any later version.

[10:12:59] Rejecting checkpoint
starting mdrun '9684 p3029_SMP-emsv-03'
10000000 steps, 20000.0 ps.

[10:13:00] Extra SSE boost OK.
[10:13:00] SMP-emsv-03Extra SSE boost OK.
[10:13:00]
[10:13:00] Extra SSE boost OK.
[10:13:00] Writing local files
[10:13:00] Completed 0 out of 10000000 steps (0 percent)
[10:27:11] Writing local files
[10:27:11] Completed 100000 out of 10000000 steps (1 percent)
[10:41:25] Writing local files
[10:41:25] Completed 200000 out of 10000000 steps (2 percent)
[10:55:37] Writing local files
[10:55:37] Completed 300000 out of 10000000 steps (3 percent)
[11:09:51] Writing local files
[11:09:51] Completed 400000 out of 10000000 steps (4 percent)
[11:24:05] Writing local files
[11:24:05] Completed 500000 out of 10000000 steps (5 percent)
[11:38:18] Writing local files
[11:38:18] Completed 600000 out of 10000000 steps (6 percent)
[11:52:42] Writing local files
[11:52:42] Completed 700000 out of 10000000 steps (7 percent)
[12:07:00] Writing local files
[12:07:00] Completed 800000 out of 10000000 steps (8 percent)
[12:21:17] Writing local files
[12:21:17] Completed 900000 out of 10000000 steps (9 percent)
[12:21:37] - Autosending finished units...
[12:21:37] Trying to send all finished work units
[12:21:37] + No unsent completed units remaining.
[12:21:37] - Autosend completed
[12:35:29] Writing local files
[12:35:29] Completed 1000000 out of 10000000 steps (10 percent)
[12:49:42] Writing local files
[12:49:42] Completed 1100000 out of 10000000 steps (11 percent)
[13:03:57] Writing local files
[13:03:57] Completed 1200000 out of 10000000 steps (12 percent)
[13:18:11] Writing local files
[13:18:11] Completed 1300000 out of 10000000 steps (13 percent)
[13:32:24] Writing local files
[13:32:24] Completed 1400000 out of 10000000 steps (14 percent)
[13:46:37] Writing local files
[13:46:37] Completed 1500000 out of 10000000 steps (15 percent)
[13:58:18] Warning: long 1-4 interactions
[13:58:18]
[13:58:18] Folding@home Core Shutdown: INTERRUPTED
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[13:58:22] CoreStatus = 0 (0)
[13:58:22] Client-core communications error: ERROR 0x0
[13:58:22] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 18
[0]3:Return code = 0, signaled with Quit
[14:02:44] - Warning: Could not delete all work unit files (2): Core returned invalid code
[14:02:44] Trying to send all finished work units
[14:02:44] + No unsent completed units remaining.
[14:02:44] - Preparing to get new work unit...
[14:02:44] + Attempting to get work packet
[14:02:44] - Will indicate memory of 2013 MB
[14:02:44] - Connecting to assignment server
[14:02:44] Connecting to http://assign.stanford.edu:8080/
[14:02:44] Posted data.
[14:02:44] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[14:02:44] + News From Folding@Home: Welcome to Folding@Home
[14:02:44] Loaded queue successfully.
[14:02:44] Connecting to http://171.64.65.63:8080/
[14:02:44] Posted data.
[14:02:44] Initial: 0000; - Error: Bad packet type from server, expected work assignment
[14:02:44] - Attempt #1 to get work failed, and no other work to do.
Waiting before retry.
[14:02:49] + Attempting to get work packet
[14:02:49] - Will indicate memory of 2013 MB
[14:02:49] - Connecting to assignment server
[14:02:49] Connecting to http://assign.stanford.edu:8080/
[14:02:49] Posted data.
[14:02:49] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[14:02:49] + News From Folding@Home: Welcome to Folding@Home
[14:02:50] Loaded queue successfully.
[14:02:50] Connecting to http://171.64.65.63:8080/
[14:02:52] Posted data.
[14:02:52] Initial: 0000; - Receiving payload (expected size: 1652750)
[14:03:02] - Downloaded at ~161 kB/s
[14:03:02] - Averaged speed for that direction ~146 kB/s
[14:03:02] + Received work.
[14:03:02] + Closed connections
[14:03:07]
[14:03:07] + Processing work unit
[14:03:07] Core required: FahCore_a1.exe
[14:03:07] Core found.
[14:03:07] Working on Unit 03 [April 11 14:03:07]
[14:03:07] + Working ...
[14:03:07] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 03 -priority 96 -checkpoint 15 -verbose -lifeline 4313 -version 601'

[14:03:07]
[14:03:07] *------------------------------*
[14:03:07] Folding@Home Gromacs SMP Core
[14:03:07] Version 1.74 (November 27, 2006)
[14:03:07]
[14:03:07] Preparing to commence simulation
[14:03:07] - Ensuring status. Please wait.
[14:03:24] - Looking at optimizations...
[14:03:24] - Working with standard loops on this execution.
[14:03:24] - Previous termination of core was improper.
[14:03:24] - Going to use standard loops.
[14:03:24] - Files status OK
[14:03:25] - Expanded 1652238 -> 9524377 (decompressed 576.4 percent)
[14:03:25] ne 10, Gen 45)
[14:03:25]
[14:03:25] Entering M.D.
[14:03:25] one 10
[14:03:25] Project: 3Entering M.D.
[14:03:25] one 10, Gen 45)
[14:03:25]
[14:03:25] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=1, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=2, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=3, HOSTNAME=localhost.localdomain
NODEID=2 argc=15
NODEID=3 argc=15
NODEID=0 argc=15
NODEID=1 argc=15
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2004, The GROMACS development team,
check out http://www.gromacs.org for more information.

This inclusion of Gromacs code in the Folding@Home Core is under
a special license (see http://folding.stanford.edu/gromacs.html)
specially granted to Stanford by the copyright holders. If you
are interested in using Gromacs, visit http://www.gromacs.org where
you can download a free version of Gromacs under
the terms of the GNU General Public License (GPL) as published
by the Free Software Foundation; either version 2 of the License,
or (at your option) any later version.

starting mdrun '66728 p3065_lambda5_99sb_big'
2500000 steps, 5000.0 ps.

[14:03:32] ting local files
[14:03:32] Extra SSE boost OK.
[14:03:32] Writing local files
[14:03:32] Completed 0 out of 2500000 steps (0 percent)
[14:18:32] Timered checkpoint triggered.
[14:24:39] Writing local files
[14:24:39] Completed 25000 out of 2500000 steps (1 percent)
[14:39:38] Timered checkpoint triggered.
[14:45:46] Writing local files
[14:45:46] Completed 50000 out of 2500000 steps (2 percent)
[15:00:46] Timered checkpoint triggered.
[15:06:53] Writing local files
[15:06:53] Completed 75000 out of 2500000 steps (3 percent)
[15:21:53] Timered checkpoint triggered.
[15:28:01] Writing local files
[15:28:01] Completed 100000 out of 2500000 steps (4 percent)
[15:43:01] Timered checkpoint triggered.
[15:49:08] Writing local files
[15:49:08] Completed 125000 out of 2500000 steps (5 percent)
[16:04:08] Timered checkpoint triggered.
[16:10:15] Writing local files
[16:10:15] Completed 150000 out of 2500000 steps (6 percent)
[16:25:15] Timered checkpoint triggered.
[16:31:23] Writing local files
[16:31:23] Completed 175000 out of 2500000 steps (7 percent)
[16:46:23] Timered checkpoint triggered.
[16:52:31] Writing local files
[16:52:31] Completed 200000 out of 2500000 steps (8 percent)
[17:07:31] Timered checkpoint triggered.
[17:13:38] Writing local files
[17:13:38] Completed 225000 out of 2500000 steps (9 percent)
[17:28:38] Timered checkpoint triggered.
[17:34:46] Writing local files
[17:34:46] Completed 250000 out of 2500000 steps (10 percent)
[17:49:46] Timered checkpoint triggered.
[17:55:54] Writing local files
[17:55:54] Completed 275000 out of 2500000 steps (11 percent)
[18:10:54] Timered checkpoint triggered.
[18:17:04] Writing local files
[18:17:04] Completed 300000 out of 2500000 steps (12 percent)
[18:21:37] - Autosending finished units...
[18:21:37] Trying to send all finished work units
[18:21:37] + No unsent completed units remaining.
[18:21:37] - Autosend completed
[18:32:03] Timered checkpoint triggered.
[18:38:12] Writing local files
[18:38:12] Completed 325000 out of 2500000 steps (13 percent)
[18:53:12] Timered checkpoint triggered.
[18:59:21] Writing local files
[18:59:22] Completed 350000 out of 2500000 steps (14 percent)
[19:14:21] Timered checkpoint triggered.
[19:20:31] Writing local files
[19:20:31] Completed 375000 out of 2500000 steps (15 percent)
[19:35:31] Timered checkpoint triggered.
[19:41:40] Writing local files
[19:41:40] Completed 400000 out of 2500000 steps (16 percent)
[19:56:40] Timered checkpoint triggered.
[20:02:47] Writing local files
[20:02:48] Completed 425000 out of 2500000 steps (17 percent)
[20:17:47] Timered checkpoint triggered.
[20:23:56] Writing local files
[20:23:56] Completed 450000 out of 2500000 steps (18 percent)
[20:38:56] Timered checkpoint triggered.
[20:45:06] Writing local files
[20:45:06] Completed 475000 out of 2500000 steps (19 percent)
[21:00:06] Timered checkpoint triggered.
[21:06:15] Writing local files
[21:06:15] Completed 500000 out of 2500000 steps (20 percent)
[21:21:15] Timered checkpoint triggered.
[21:27:27] Writing local files
[21:27:28] Completed 525000 out of 2500000 steps (21 percent)