PRCG 14580 (Run 0, Clone 424, Gen 1) Fatal Error, core count

Moderators: Site Moderators, FAHC Science Team

RunningTurtle
Posts: 7
Joined: Mon Mar 30, 2020 11:38 pm

Post by RunningTurtle »

It appears that PRCG 14580 cannot decompose the domain across a 15-core client, and that it is not configured to reduce the core count automatically to compensate.

Log Snippet:

17:06:55:WU02:FS00:0xa7:Project: 14580 (Run 0, Clone 424, Gen 1)
17:06:55:WU02:FS00:0xa7:Reading tar file core.xml
17:06:55:WU02:FS00:0xa7:Reading tar file frame1.tpr
17:06:55:WU02:FS00:0xa7:Digital signatures verified
17:06:55:WU02:FS00:0xa7:Calling: mdrun -s frame1.tpr -o frame1.trr -x frame1.xtc -cpt 15 -nt 15
17:06:55:WU02:FS00:0xa7:Steps: first=500000 total=500000
17:06:55:WU02:FS00:0xa7:ERROR:
17:06:55:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
17:06:55:WU02:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
17:06:55:WU02:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
17:06:55:WU02:FS00:0xa7:ERROR:
17:06:55:WU02:FS00:0xa7:ERROR:Fatal error:
17:06:55:WU02:FS00:0xa7:ERROR:There is no domain decomposition for 15 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
17:06:55:WU02:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
17:06:55:WU02:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
17:06:55:WU02:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
17:06:55:WU02:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
17:06:55:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
17:07:00:WU02:FS00:0xa7:WARNING:Unexpected exit() call
17:07:00:WU02:FS00:0xa7:WARNING:Unexpected exit from science code
17:07:00:WU02:FS00:0xa7:Saving result file ../logfile_01.txt
17:07:00:WU02:FS00:0xa7:Saving result file md.log
17:07:00:WU02:FS00:0xa7:Saving result file science.log
17:07:00:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
17:07:00:WU02:FS00:Starting
17:07:00:WU02:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 02 -suffix 01 -version 705 -lifeline 953 -checkpoint 15 -np 15
17:07:00:WU02:FS00:Started FahCore on PID 16660
17:07:00:WU02:FS00:Core PID:16670
17:07:00:WU02:FS00:FahCore 0xa7 started
I reduced the client core count to 12, and the work unit completed. Other CPU work units, however, reduced the core count automatically when they could not be decomposed across 15 cores.
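For anyone wondering why 15 fails while 12 works, here is a rough sketch of the constraint in the error message. It assumes a simplified model: the ranks must form an nx * ny * nz grid, and every cell must be at least the minimum cell size (1.37225 nm in the log) along each box dimension. The 6.0 nm cubic box is a made-up value for illustration, since the real box size is not in the log snippet, and the function names are hypothetical. Real GROMACS also considers things like separate PME ranks, which this ignores.

```python
from math import floor


def max_cells(box_nm, min_cell_nm):
    """Most cells that fit along each box dimension (at least 1)."""
    return tuple(max(1, floor(length / min_cell_nm)) for length in box_nm)


def valid_grids(n_ranks, box_nm, min_cell_nm):
    """All (nx, ny, nz) grids with nx*ny*nz == n_ranks that respect the cell limit."""
    limit_x, limit_y, limit_z = max_cells(box_nm, min_cell_nm)
    grids = []
    for nx in range(1, limit_x + 1):
        if n_ranks % nx:
            continue
        rest = n_ranks // nx
        for ny in range(1, limit_y + 1):
            if rest % ny:
                continue
            nz = rest // ny
            if nz <= limit_z:
                grids.append((nx, ny, nz))
    return grids


def largest_usable_rank_count(requested, box_nm, min_cell_nm):
    """Step down from the requested count until some grid fits."""
    for n in range(requested, 0, -1):
        if valid_grids(n, box_nm, min_cell_nm):
            return n
    return 1


# ASSUMPTION: hypothetical 6.0 nm cubic box; only MIN_CELL comes from the log.
BOX = (6.0, 6.0, 6.0)
MIN_CELL = 1.37225

print(valid_grids(15, BOX, MIN_CELL))   # 15 = 3 * 5 needs 5 cells along one axis
print(valid_grids(12, BOX, MIN_CELL))
print(largest_usable_rank_count(15, BOX, MIN_CELL))
```

With this assumed box, at most 4 cells fit per dimension, so 15, 14 and 13 all have a factor that is too large, and 12 is the first count that fits. That matches what I saw, but only because of the box size I picked; a different box would give a different answer.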
Joe_H
Site Admin
Posts: 7870
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: PRCG 14580 (Run 0, Clone 424, Gen 1) Fatal Error, core c

Post by Joe_H »

Thanks for the report. I have passed it on, and the assignment parameters for thread count have been adjusted.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3