Project 2608 (Run 0, Clone 389, Gen 18)

Post by dutchmm » Fri Dec 28, 2007 4:30 pm

This one seems to have killed off the folding service I was running (v6.0 beta 1).
Here are the relevant pieces of the log file. What a bummer: I was away for 5 days after it went tits up.

[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 18
[05:15:14] - Preparing to get new work unit...
[05:15:14] + Attempting to get work packet
[05:15:14] - Connecting to assignment server
[05:15:14] - Successful: assigned to (171.64.65.56).
[05:15:14] + News From Folding@Home: Welcome to Folding@Home
[05:15:15] Loaded queue successfully.
[05:15:30] + Closed connections
[05:15:30]
[05:15:30] + Processing work unit
[05:15:30] Core required: FahCore_a1.exe
[05:15:30] Core found.
[05:15:30] Working on Unit 06 [December 23 05:15:30]
[05:15:30] + Working ...
[05:15:30]
[05:15:30] *------------------------------*
[05:15:30] Folding@Home Gromacs SMP Core
[05:15:30] Version 1.74 (November 27, 2006)
[05:15:30]
[05:15:30] Preparing to commence simulation
[05:15:30] - Ensuring status. Pltions...
[05:15:30] - Working with standard loops on this execution.
[05:15:30] - Previous termination of core was improper.
[05:15:30] - Files status OK
[05:15:30] ial work packet
[05:15:30]
[05:15:30] Project: 2608 (Run 0, Clone 389, Gen 18)
[05:15:30]
[05:15:30] Assembly optimizations on if available.
[05:15:30] Entering M.D.
[05:15:30] un 0, Clone 389, Gen 18)
[05:15:30]
[05:15:30] Entering M.D.
[05:15:47] acket
[05:15:47]
[05:15:47] Project: 2608 (Run 0, Clone 389, Gen 18)
[05:15:47]
[05:15:47] Entering M.D.
NNODES=4, MYRANK=1, HOSTNAME=localhost
NNODES=4, MYRANK=3, HOSTNAME=localhost
NNODES=4, MYRANK=0, HOSTNAME=localhost
NNODES=4, MYRANK=2, HOSTNAME=localhost
NODEID=0 argc=15
NODEID=3 argc=15
NODEID=1 argc=15
NODEID=2 argc=15
[05:15:54] Protein: Protein
[05:15:54] Writing local files
starting mdrun 'Protein'
500000 steps, 15000.0 ps.

[05:15:54]
[05:15:54] Writing local files
[05:15:54] Extra SSE boost OK.
[05:15:55] Writing local files
[05:15:55] Completed 0 out of 500000 steps (0 percent)
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2004, The GROMACS development team,
check out http://www.gromacs.org for more information.

This inclusion of Gromacs code in the Folding@Home Core is under
a special license (see http://folding.stanford.edu/gromacs.html)
specially granted to Stanford by the copyright holders. If you
are interested in using Gromacs, visit http://www.gromacs.org where
you can download a free version of Gromacs under
the terms of the GNU General Public License (GPL) as published
by the Free Software Foundation; either version 2 of the License,
or (at your option) any later version.

[05:15:59] CoreStatus = 0 (0)
[05:15:59] Client-core communications error: ERROR 0x0
[05:15:59] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 18
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[05:20:21] - Preparing to get new work unit...
[05:20:21] + Attempting to get work packet
[05:20:21] - Connecting to assignment server
[05:20:21] - Successful: assigned to (171.64.65.56).
[05:20:21] + News From Folding@Home: Welcome to Folding@Home
[05:20:21] Loaded queue successfully.
[05:20:29] + Closed connections
[05:20:34]
[05:20:34] + Processing work unit
[05:20:34] Core required: FahCore_a1.exe
[05:20:34] Core found.
[05:20:34] Working on Unit 07 [December 23 05:20:34]
[05:20:34] + Working ...
[05:20:34]
[05:20:34] *------------------------------*
[05:20:34] Folding@Home Gromacs SMP Core
[05:20:34] Version 1.74 (November 27, 2006)
[05:20:34]
[05:20:34] Preparing to commence simulation
[05:20:34] - Ensuring status. Please wait.
[05:20:52] - Looking at optimizations...
[05:20:52] - Working with standard loops on this execution.
[05:20:52] - Previous termination of core was improper.
[05:20:52] - Going to use standard loops.
[05:20:52] - Files status OK
[05:20:52] (decompressed 559.7 percent)
[05:20:52] 9 (decompressed 559.7 percent)
[05:20:52] ne 389, Gen 18)
[05:20:52]
[05:20:52] Entering M.D.
[05:20:53] cket
[05:20:53]
[05:20:53] Project: 2608 (Run 0, Clone 389, Gen 18)
[05:20:53]
[05:20:53] Entering M.D.
NNODES=4, MYRANK=1, HOSTNAME=localhost
NNODES=4, MYRANK=2, HOSTNAME=localhost
NNODES=4, MYRANK=3, HOSTNAME=localhost
NNODES=4, MYRANK=0, HOSTNAME=localhost
NODEID=2 argc=15
NODEID=0 argc=15
NODEID=1 argc=15
NODEID=3 argc=15
[05:20:59] Rejecting checkpoint
starting mdrun 'Protein'
500000 steps, 15000.0 ps.

[05:20:59] Protein: ProteinExtra SSE boost OK.
[05:20:59]
[05:21:00] Extra SSE boost OK.
[05:21:00] Writing local files
[05:21:00] Completed 0 out of 500000 steps (0 percent)
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2004, The GROMACS development team,
check out http://www.gromacs.org for more information.

This inclusion of Gromacs code in the Folding@Home Core is under
a special license (see http://folding.stanford.edu/gromacs.html)
specially granted to Stanford by the copyright holders. If you
are interested in using Gromacs, visit http://www.gromacs.org where
you can download a free version of Gromacs under
the terms of the GNU General Public License (GPL) as published
by the Free Software Foundation; either version 2 of the License,
or (at your option) any later version.

[05:21:04] CoreStatus = 0 (0)
[05:21:04] Client-core communications error: ERROR 0x0
[05:21:04] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 18
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 0, signaled with Quit
[05:25:26] - Preparing to get new work unit...
[05:25:26] + Attempting to get work packet
[05:25:26] - Connecting to assignment server
[05:25:27] - Successful: assigned to (171.64.65.56).
[05:25:27] + News From Folding@Home: Welcome to Folding@Home
[05:25:27] Loaded queue successfully.
[05:25:35] + Closed connections
[05:25:40]
[05:25:40] + Processing work unit
[05:25:40] Core required: FahCore_a1.exe
[05:25:40] Core found.
[05:25:40] Working on Unit 08 [December 23 05:25:40]
[05:25:40] + Working ...
[05:25:40]
[05:25:40] *------------------------------*
[05:25:40] Folding@Home Gromacs SMP Core
[05:25:40] Version 1.74 (November 27, 2006)
[05:25:40]
[05:25:40] Preparing to commence simulation
[05:25:40] - Ensuring status. Please wait.
[05:25:57] - Looking at optimizations...
[05:25:57] - Working with standard loops on this execution.
[05:25:57] - Created dyn
[05:25:57] - Fi- Expanded 3164996 -> 17715569 (decompressed 559.7 percent)
[05:25:57] Files status OK
[05:25:57] (decompressed 559.7 percent)
[05:25:57] 9 (decompressed 559.7 percent)
[05:25:57] )
[05:25:57]
[05:25:57] Entering M.D.
[05:25:58] 08 (Run 0, Clone 389, Gen 18)
[05:25:58]
[05:25:58] Entering M.D.
[05:25:58] ne 389, Gen 18)
[05:25:58]
[05:25:58] Entering M.D.
NNODES=4, MYRANK=0, HOSTNAME=localhost
NNODES=4, MYRANK=3, HOSTNAME=localhost
NNODES=4, MYRANK=2, HOSTNAME=localhost
NNODES=4, MYRANK=1, HOSTNAME=localhost
NODEID=2 argc=15
NODEID=3 argc=15
NODEID=0 argc=15
NODEID=1 argc=15
[05:26:04] files
[05:26:04] ing checkpoint
starting mdrun 'Protein'
500000 steps, 15000.0 ps.

[05:26:05] Protein: Protein
[05:26:05] Writing local files
[05:26:05] Extra SSE boost OK.
[05:26:05] Writing local files
[05:26:05] Completed 0 out of 500000 steps (0 percent)
[05:26:05]
[05:26:05] Folding@home Core Shutdown: INTERRUPTED
[0]0:Return code = 102
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2004, The GROMACS development team,
check out http://www.gromacs.org for more information.

This inclusion of Gromacs code in the Folding@Home Core is under
a special license (see http://folding.stanford.edu/gromacs.html)
specially granted to Stanford by the copyright holders. If you
are interested in using Gromacs, visit http://www.gromacs.org where
you can download a free version of Gromacs under
the terms of the GNU General Public License (GPL) as published
by the Free Software Foundation; either version 2 of the License,
or (at your option) any later version.

[05:26:09] CoreStatus = 66 (102)
[05:26:09] + Shutdown requested by user. Exiting.
Folding@Home Client Shutdown.
dutchmm
 
Posts: 15
Joined: Fri Dec 14, 2007 3:37 pm

Re: Project 2608 (Run 0, Clone 389, Gen 18)

Post by bruce » Sat Dec 29, 2007 5:43 am

It appears that you've received a corrupt WU. The server assumes that the WU was corrupted in transmission, and it generally retransmits the same WU three times before moving on to something else. Presumably that's what happened after the end of the log that you posted.

Thank you for the report.
bruce
 
Posts: 20122
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: Project 2608 (Run 0, Clone 389, Gen 18)

Post by dutchmm » Sat Dec 29, 2007 11:33 pm

So, if I am running the folding software as a service, I need to write some script that checks the status every 24 hours or so and restarts the service if it appears not to be running (as was the case when I got back from my holiday)? Anyone got one for Linux?
dutchmm
 
Posts: 15
Joined: Fri Dec 14, 2007 3:37 pm

Re: Project 2608 (Run 0, Clone 389, Gen 18)

Post by Ivoshiee » Sat Dec 29, 2007 11:57 pm

dutchmm wrote: So, if I am running the folding software as a service, I need to write some script to check the status every 24 hours or so, and if the service appears not to be running (as was the case when I got back from my holiday), it should restart it? Anyone got one for Linux?

There have been some mentions of scripts/apps of this kind, but I've never used one.
I think it should be easy to write one around finstall/folding: just check whether there are any FAH core PIDs and act accordingly.

Example: (http://ra.vendomar.ee/%7Eivo/fsuspend)
Code:
#!/bin/bash

. ./folding DoNothing >/dev/null

#CPUs="$(get_dirs)"

FAHcheck >/dev/null

# Suspend (SIGSTOP) every FAH client and core process whose PID was collected above
kill -STOP $sFaHPids $sfahclientPids $sFAHCorePids


http://ra.vendomar.ee/%7Eivo/finstall
http://ra.vendomar.ee/%7Eivo/finstall_in_action.html
http://fahwiki.net/index.php/The_finstall_script#Developers_guide_to_finstall_.26_folding_scripts
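
The check described above (look for FAH processes, restart if there are none) could be sketched as a standalone watchdog along the following lines. This is only a minimal sketch and does not use finstall: the process-name pattern `FahCore|fah6`, the start command `/etc/init.d/folding start`, and the names `fah_running` and `fah_watchdog` are all assumptions for illustration, so adjust them to match your install.

```bash
#!/bin/bash
# Watchdog sketch: restart the Folding@Home client if none of its processes exist.
# Assumptions (adjust for your install): the process-name pattern and the
# start command below are placeholders, not taken from finstall.
FAH_PATTERN="${FAH_PATTERN:-FahCore|fah6}"
FAH_START_CMD="${FAH_START_CMD:-/etc/init.d/folding start}"

# Succeeds if any process name matches FAH_PATTERN.
fah_running() {
    if command -v pgrep >/dev/null 2>&1; then
        pgrep "$FAH_PATTERN" >/dev/null
    else
        # Fallback for systems without pgrep: scan Linux /proc directly.
        grep -Eqs "$FAH_PATTERN" /proc/[0-9]*/comm
    fi
}

# Stays silent while the client is up; otherwise logs and runs the start command.
fah_watchdog() {
    if ! fah_running; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') FAH not running, restarting"
        sh -c "$FAH_START_CMD"
    fi
}

# Only act when called with --run, so the file can also be sourced safely.
if [ "${1:-}" = "--run" ]; then
    fah_watchdog
fi
```

Saved as, say, fah_watchdog.sh (a hypothetical name), it could be run from cron with a line such as `0 * * * * /path/to/fah_watchdog.sh --run >> /path/to/fah_watchdog.log 2>&1` to check hourly rather than every 24 hours. Note that it only helps when the client has actually exited; it does nothing about a client that is still running but keeps deleting the same work unit.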
Ivoshiee
Site Moderator
 
Posts: 822
Joined: Sun Dec 02, 2007 1:05 am
Location: Estonia

