Project: 3065 (Run 2, Clone 14, Gen 56) multiple seg faults

dschief
Posts: 146
Joined: Tue Dec 04, 2007 5:56 am
Hardware configuration: ASUS P5K-E, Q6600/ 8 gig ram Win-7

2X ASUS z97-K 16 G Ram Win-7_64

Post by dschief »

P3065_lambda5_99sb_big has seg faulted three straight times at 4%.
Q6600 at stock clocks
FC8, kernel 2.6.25.11-60
Core temps: 44, 44, 42, 42

Code:

          are interested in using Gromacs, visit www.gromacs.org where
                you can download a free version of Gromacs under
         the terms of the GNU General Public License (GPL) as published
       by the Free Software Foundation; either version 2 of the License,
                     or (at your option) any later version.

[18:14:50] sb_big
[18:14:50] Writing local files
starting mdrun '66728 p3065_lambda5_99sb_big'
2500000 steps,   5000.0 ps.

[18:14:50] Extra SSE boost OK.
[18:14:50] a5_99sb_bigExtra SSE boost OK.
[18:14:50] 
[18:14:50] Extra SSE boost OK.
[18:14:50] Writing local files
[18:14:50] Completed 0 out of 2500000 steps  (0 percent)
[18:29:50] Timered checkpoint triggered.
[18:37:05] Writing local files
[18:37:05] Completed 25000 out of 2500000 steps  (1 percent)
[18:52:05] Timered checkpoint triggered.
[18:59:18] Writing local files
[18:59:18] Completed 50000 out of 2500000 steps  (2 percent)
[19:14:18] Timered checkpoint triggered.
[19:21:31] Writing local files
[19:21:31] Completed 75000 out of 2500000 steps  (3 percent)
[19:36:31] Timered checkpoint triggered.
[19:43:43] Writing local files
[19:43:43] Completed 100000 out of 2500000 steps  (4 percent)
[19:44:35] Warning:  long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[19:44:40] CoreStatus = 0 (0)
[19:44:40] Client-core communications error: ERROR 0x0
[19:44:40] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 18
[19:49:01] - Warning: Could not delete all work unit files (1): Core returned invalid code
[19:49:01] Trying to send all finished work units
[19:49:01] + No unsent completed units remaining.
[19:49:01] - Preparing to get new work unit...
[19:49:01] + Attempting to get work packet
[19:49:01] - Will indicate memory of 2013 MB
[19:49:01] - Connecting to assignment server
[19:49:01] Connecting to http://assign.stanford.edu:8080/
[19:49:01] Posted data.
[19:49:01] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[19:49:01] + News From Folding@Home: Welcome to Folding@Home
[19:49:02] Loaded queue successfully.
[19:49:02] Connecting to http://171.64.65.63:8080/
[19:49:04] Posted data.
[19:49:04] Initial: 0000; - Receiving payload (expected size: 1657366)
[19:49:15] - Downloaded at ~147 kB/s
[19:49:15] - Averaged speed for that direction ~158 kB/s
[19:49:15] + Received work.
[19:49:15] + Closed connections
[19:49:20] 
[19:49:20] + Processing work unit
[19:49:20] Core required: FahCore_a1.exe
[19:49:20] Core found.
[19:49:20] Working on Unit 02 [August 5 19:49:20]
[19:49:20] + Working ...
[19:49:20] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 02 -priority 96 -checkpoint 15 -verbose -lifeline 3081 -version 602'

[19:49:20] 
[19:49:20] *------------------------------*
[19:49:20] Folding@Home Gromacs SMP Core
[19:49:20] Version 1.74 (November 27, 2006)
[19:49:20] 
[19:49:20] Preparing to commence simulation
[19:49:20] - Ensuring status. Please wait.
[19:49:37] - Looking at optimizations...
[19:49:37] - Working with standard loops on this execution.
[19:49:37] - Previous termination of core was improper.
[19:49:37] - Going to use standard loops.
[19:49:37] - Files status OK
[19:49:37] - Expanded 1656854 -> 9524377 (decompressed 574.8 percent)
[19:49:38] ne tarting from Entering M.D.
[19:49:38] acket
[19:49:38] 
[19:49:38] Project: 3065 (Run 2, Clone 14, Gen 56)
[19:49:38] 
[19:49:38] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=3, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=1, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=2, HOSTNAME=localhost.localdomain
NODEID=3 argc=15
NODEID=0 argc=15
NODEID=1 argc=15
NODEID=2 argc=15
      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2004, The GROMACS development team,
            check out http://www.gromacs.org for more information.

        This inclusion of Gromacs code in the Folding@Home Core is under
        a special license (see http://folding.stanford.edu/gromacs.html)
         specially granted to Stanford by the copyright holders. If you
          are interested in using Gromacs, visit www.gromacs.org where
                you can download a free version of Gromacs under
         the terms of the GNU General Public License (GPL) as published
       by the Free Software Foundation; either version 2 of the License,
                     or (at your option) any later version.

starting mdrun '66728 p3065_lambda5_99sb_big'
2500000 steps,   5000.0 ps.

[19:49:44] ting local files
[19:49:44] Extra SSE boost OK.
[19:49:44] Writing local files
[19:49:44] 
[19:49:44] Extra SSE boost OK.
[19:49:45] 00000 steps  (0 percent)
[20:04:44] Timered checkpoint triggered.
[20:11:35] Writing local files
[20:11:35] Completed 25000 out of 2500000 steps  (1 percent)
[20:26:34] Timered checkpoint triggered.
[20:33:28] Writing local files
[20:33:28] Completed 50000 out of 2500000 steps  (2 percent)
[20:48:27] Timered checkpoint triggered.
[20:55:19] Writing local files
[20:55:19] Completed 75000 out of 2500000 steps  (3 percent)
[21:10:19] Timered checkpoint triggered.
[21:17:13] Writing local files
[21:17:13] Completed 100000 out of 2500000 steps  (4 percent)
[21:18:04] Warning:  long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[21:18:09] CoreStatus = 0 (0)
[21:18:09] Client-core communications error: ERROR 0x0
[21:18:09] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 18
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[21:22:30] - Warning: Could not delete all work unit files (2): Core returned invalid code
[21:22:30] Trying to send all finished work units
[21:22:30] + No unsent completed units remaining.
[21:22:30] - Preparing to get new work unit...
[21:22:30] + Attempting to get work packet
[21:22:30] - Will indicate memory of 2013 MB
[21:22:30] - Connecting to assignment server
[21:22:30] Connecting to http://assign.stanford.edu:8080/
[21:22:30] Posted data.
[21:22:30] Initial: 40AB; - Successful: assigned to (171.64.65.63).
[21:22:30] + News From Folding@Home: Welcome to Folding@Home
[21:22:30] Loaded queue successfully.
[21:22:30] Connecting to http://171.64.65.63:8080/
[21:22:32] Posted data.
[21:22:32] Initial: 0000; - Receiving payload (expected size: 1657366)
[21:22:43] - Downloaded at ~147 kB/s
[21:22:43] - Averaged speed for that direction ~156 kB/s
[21:22:43] + Received work.
[21:22:43] + Closed connections
[21:22:48] 
[21:22:48] + Processing work unit
[21:22:48] Core required: FahCore_a1.exe
[21:22:48] Core found.
[21:22:48] Working on Unit 03 [August 5 21:22:48]
[21:22:48] + Working ...
[21:22:48] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 03 -priority 96 -checkpoint 15 -verbose -lifeline 3081 -version 602'

[21:22:48] 
[21:22:48] *------------------------------*
[21:22:48] Folding@Home Gromacs SMP Core
[21:22:48] Version 1.74 (November 27, 2006)
[21:22:48] 
[21:22:48] Preparing to commence simulation
[21:22:48] - Ensuring status. Please wait.
[21:23:05] - Looking at optimizations...
[21:23:05] - Working with standard loops on this execution.
[21:23:05] - Previous termination of core was improper.
[21:23:05] - Going to use standard loops.
[21:23:05] - Files status OK
[21:23:06] - Expanded 1656854 -> 9524377 (decompressed 574.8 percent)
[21:23:06] ne 14, Gen 56)
[21:23:06] 
[21:23:06] Entering M.D.
[21:23:06] acket
[21:23:06] 
[21:23:06] Project: 3065 (Run 2, Clone 14, Gen 56)
[21:23:06] 
[21:23:06] Entering M.D.
NNODES=4, MYRANK=1, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=2, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=3, HOSTNAME=localhost.localdomain
NNODES=4, MYRANK=0, HOSTNAME=localhost.localdomain
NODEID=1 argc=15
NODEID=2 argc=15
NODEID=3 argc=15
NODEID=0 argc=15
      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2004, The GROMACS development team,
            check out http://www.gromacs.org for more information.

        This inclusion of Gromacs code in the Folding@Home Core is under
        a special license (see http://folding.stanford.edu/gromacs.html)
         specially granted to Stanford by the copyright holders. If you
          are interested in using Gromacs, visit www.gromacs.org where
                you can download a free version of Gromacs under
         the terms of the GNU General Public License (GPL) as published
       by the Free Software Foundation; either version 2 of the License,
                     or (at your option) any later version.

starting mdrun '66728 p3065_lambda5_99sb_big'
2500000 steps,   5000.0 ps.

[21:23:13] ting local files
[21:23:13] Extra SSE boost OK.
[21:23:13] Writing local files
[21:23:13] 
[21:23:13] Extra SSE boost OK.
[21:23:13] 00000 steps  (0 percent)
[21:38:14] Timered checkpoint triggered.
[21:45:05] Writing local files
[21:45:05] Completed 25000 out of 2500000 steps  (1 percent)
[22:00:05] Timered checkpoint triggered.
[22:06:59] Writing local files
[22:06:59] Completed 50000 out of 2500000 steps  (2 percent)
[22:21:59] Timered checkpoint triggered.
[22:28:54] Writing local files
[22:28:54] Completed 75000 out of 2500000 steps  (3 percent)
[22:43:54] Timered checkpoint triggered.
[22:50:46] Writing local files
[22:50:46] Completed 100000 out of 2500000 steps  (4 percent)
[22:51:38] Warning:  long 1-4 interactions
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[22:51:42] CoreStatus = 0 (0)
[22:51:42] Client-core communications error: ERROR 0x0
Now working on a new WU.
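For anyone who wants to check their own log for the same pattern (every attempt dying at the same percentage), here is a minimal Python sketch. It is not part of the client; the function name `summarize_attempts` and the regular expressions are my own assumptions, based only on the line shapes visible in the log above, so other client or core versions may need different patterns.

```python
import re

# Assumed patterns, copied from the line shapes in the pasted log.
START_RE = re.compile(r"starting mdrun")
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps\s+\((\d+) percent\)")
SEGV_RE = re.compile(r"signaled with Segmentation fault")


def summarize_attempts(log_text):
    """Return one dict per mdrun attempt found in the log text.

    Each dict holds the last percentage reported before the attempt ended
    and how many ranks reported a segmentation fault.
    """
    attempts = []
    current = None
    for line in log_text.splitlines():
        if START_RE.search(line):
            # A new "starting mdrun" line opens a new attempt.
            current = {"last_percent": None, "segfault_ranks": 0}
            attempts.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first attempt
        match = STEP_RE.search(line)
        if match:
            current["last_percent"] = int(match.group(3))
        elif SEGV_RE.search(line):
            current["segfault_ranks"] += 1
    return attempts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize.py FAHlog.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, attempt in enumerate(summarize_attempts(handle.read()), 1):
            print(number, attempt)
```

If every attempt shows the same `last_percent` and a nonzero `segfault_ranks`, that points at the work unit rather than a one-off glitch, which is what the log above suggests.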
Last edited by 7im on Tue Aug 05, 2008 3:53 pm, edited 1 time in total.
Reason: Edit thread title to standard WU format
uncle_fungus
Site Admin
Posts: 1288
Joined: Fri Nov 30, 2007 9:37 am
Location: Oxfordshire, UK

Re: P3065 R2 Cl 14 Gen 56 multiple seg faults

Post by uncle_fungus »

So far the stats DB contains two entries for this WU: one for 0 points and one for a little over 90.