Project: 2671 (Run 37, Clone 79, Gen 78) - NaN detected

SGirbau
Posts: 3
Joined: Wed Apr 22, 2009 5:48 am

Post by SGirbau »

Hello,
my Linux SMP client has loaded this same WU twice, and both times it failed the same way:

[07:34:33] 
[07:34:33] *------------------------------*
[07:34:33] Folding@Home Gromacs SMP Core
[07:34:33] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009)
[07:34:33] 
[07:34:33] Preparing to commence simulation
[07:34:33] - Ensuring status. Please wait.
[07:34:42] - Looking at optimizations...
[07:34:42] - Working with standard loops on this execution.
[07:34:42] - Files status OK
[07:34:43] - Expanded 1513330 -> 24038109 (decompressed 1588.4 percent)
[07:34:43] Called DecompressByteArray: compressed_data_size=1513330 data_size=24038109, decompressed_data_size=24038109 diff=0
[07:34:43] - Digital signature verified
[07:34:43] 
[07:34:43] Project: 2671 (Run 37, Clone 79, Gen 78)
[07:34:43] 
[07:34:43] Entering M.D.
NNODES=4, MYRANK=1, HOSTNAME=bartleby
NNODES=4, MYRANK=2, HOSTNAME=bartleby
NNODES=4, MYRANK=3, HOSTNAME=bartleby
NNODES=4, MYRANK=0, HOSTNAME=bartleby
NODEID=0 argc=20
NODEID=1 argc=20
NODEID=2 argc=20
NODEID=3 argc=20
Reading file work/wudata_06.tpr, VERSION 3.3.99_development_20070618 (single precision)
Note: tpx file_version 48, software version 68

NOTE: The tpr file used for this simulation is in an old format, for less memory usage and possibly more performance create a new tpr file with an up to date version of grompp

Making 1D domain decomposition 1 x 1 x 4
starting mdrun '22908 system in water'
19750002 steps,  39500.0 ps (continuing from step 19500002,  39000.0 ps).

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: md.c, line: 2169

Fatal error:
NaN detected at step 19500002

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 2, will try to stop all the nodes
Halting parallel program mdrun on CPU 2 out of 4

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: md.c, line: 2169

Fatal error:
NaN detected at step 19500002

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 3, will try to stop all the nodes
Halting parallel program mdrun on CPU 3 out of 4

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: md.c, line: 2169

Fatal error:
NaN detected at step 19500002

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun on CPU 0 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0

gcq#0: Thanx for Using GROMACS - Have a Nice Day

application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2

-------------------------------------------------------
Program mdrun, VERSION 4.0.99_development_20090605
Source code file: md.c, line: 2169

Fatal error:
NaN detected at step 19500002

For more information and tips for trouble shooting please check the GROMACS Wiki at
http://wiki.gromacs.org/index.php/Errors
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Error on node 1, will try to stop all the nodes
Halting parallel program mdrun on CPU 1 out of 4

gcq#0: Thanx for Using GROMACS - Have a Nice Day

Thank you.
parkut
Posts: 364
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA

Re: Project: 2671 (Run 37, Clone 79, Gen 78) - NaN detected

Post by parkut »

This is a known bad work unit: viewtopic.php?f=19&t=11098&start=0