_a7 core crashing in Gromacs


Re: _a7 core crashing in Gromacs (p16403)

Postby Zzyzx » Sun Jul 05, 2020 3:49 am

PantherX wrote: FYI, the researcher has decided to err on the side of caution and has prevented 24 CPUs from receiving Project 16403.

Still getting this assigned on 48 CPUs:
Code:
02:46:05:WU01:FS00:Starting
02:46:05:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 827168 -checkpoint 5 -np 48
02:46:05:WU01:FS00:Started FahCore on PID 1017750
02:46:05:WU01:FS00:Core PID:1017754
02:46:05:WU01:FS00:FahCore 0xa7 started
02:46:06:WU01:FS00:0xa7:*********************** Log Started 2020-07-05T02:46:05Z ***********************
02:46:06:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
02:46:06:WU01:FS00:0xa7:       Type: 0xa7
02:46:06:WU01:FS00:0xa7:       Core: Gromacs
02:46:06:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 1017750 -checkpoint 5 -np
02:46:06:WU01:FS00:0xa7:             48
02:46:06:WU01:FS00:0xa7:************************************ CBang *************************************
02:46:06:WU01:FS00:0xa7:       Date: Nov 5 2019
02:46:06:WU01:FS00:0xa7:       Time: 06:06:57
02:46:06:WU01:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
02:46:06:WU01:FS00:0xa7:     Branch: master
02:46:06:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
02:46:06:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
02:46:06:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
02:46:06:WU01:FS00:0xa7:       Bits: 64
02:46:06:WU01:FS00:0xa7:       Mode: Release
02:46:06:WU01:FS00:0xa7:************************************ System ************************************
02:46:06:WU01:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
02:46:06:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 62 Stepping 4
02:46:06:WU01:FS00:0xa7:       CPUs: 48
02:46:06:WU01:FS00:0xa7:     Memory: 15.48GiB
02:46:06:WU01:FS00:0xa7:Free Memory: 2.72GiB
02:46:06:WU01:FS00:0xa7:    Threads: POSIX_THREADS
02:46:06:WU01:FS00:0xa7: OS Version: 4.18
02:46:06:WU01:FS00:0xa7:Has Battery: false
02:46:06:WU01:FS00:0xa7: On Battery: false
02:46:06:WU01:FS00:0xa7: UTC Offset: -7
02:46:06:WU01:FS00:0xa7:        PID: 1017754
02:46:06:WU01:FS00:0xa7:        CWD: /var/lib/fahclient/work
02:46:06:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
02:46:06:WU01:FS00:0xa7:    Version: 0.0.18
02:46:06:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
02:46:06:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
02:46:06:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
02:46:06:WU01:FS00:0xa7:       Date: Nov 5 2019
02:46:06:WU01:FS00:0xa7:       Time: 06:13:26
02:46:06:WU01:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
02:46:06:WU01:FS00:0xa7:     Branch: master
02:46:06:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
02:46:06:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
02:46:06:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
02:46:06:WU01:FS00:0xa7:       Bits: 64
02:46:06:WU01:FS00:0xa7:       Mode: Release
02:46:06:WU01:FS00:0xa7:************************************ Build *************************************
02:46:06:WU01:FS00:0xa7:       SIMD: avx_256
02:46:06:WU01:FS00:0xa7:********************************************************************************
02:46:06:WU01:FS00:0xa7:Project: 16403 (Run 1152, Clone 0, Gen 130)
02:46:06:WU01:FS00:0xa7:Unit: 0x0000009196880e6e5e8be0da76e9eaba
02:46:06:WU01:FS00:0xa7:Reading tar file core.xml
02:46:06:WU01:FS00:0xa7:Reading tar file frame130.tpr
02:46:06:WU01:FS00:0xa7:Digital signatures verified
02:46:06:WU01:FS00:0xa7:Calling: mdrun -s frame130.tpr -o frame130.trr -x frame130.xtc -cpt 5 -nt 48
02:46:06:WU01:FS00:0xa7:Steps: first=65000000 total=500000
02:46:06:WU01:FS00:0xa7:ERROR:
02:46:06:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
02:46:06:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
02:46:06:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
02:46:06:WU01:FS00:0xa7:ERROR:
02:46:06:WU01:FS00:0xa7:ERROR:Fatal error:
02:46:06:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 40 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
02:46:06:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
02:46:06:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
02:46:06:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
02:46:06:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
02:46:06:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
02:46:10:WU01:FS00:0xa7:WARNING:Unexpected exit() call
02:46:10:WU01:FS00:0xa7:WARNING:Unexpected exit from science code
02:46:10:WU01:FS00:0xa7:Saving result file ../logfile_01.txt
02:46:10:WU01:FS00:0xa7:Saving result file md.log
02:46:10:WU01:FS00:0xa7:Saving result file science.log
02:46:11:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
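
For anyone wondering what the "no domain decomposition for 40 ranks" message is complaining about, here is a rough sketch of the check. It is not the GROMACS algorithm (that lives in domdec.c and also handles things like -rcon, -dds and non-rectangular boxes); it only assumes a rectangular box and asks whether 40 ranks can be split into an nx * ny * nz grid where every cell is at least the minimum size from the error message. The box size below is made up for illustration, since the real box is inside frame130.tpr and is not shown in the log. The names `grids` and `feasible_grids` are mine. Note that the log reports 40 ranks even though mdrun was called with -nt 48; the log does not say where the other 8 threads went.

```python
def grids(n_ranks):
    """Return every ordered (nx, ny, nz) with nx * ny * nz == n_ranks."""
    out = []
    for nx in range(1, n_ranks + 1):
        if n_ranks % nx:
            continue
        rest = n_ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny == 0:
                out.append((nx, ny, rest // ny))
    return out


def feasible_grids(n_ranks, box_nm, min_cell_nm):
    """Keep only grids whose cells are at least min_cell_nm along each axis."""
    return [
        g for g in grids(n_ranks)
        if all(b / n >= min_cell_nm for b, n in zip(box_nm, g))
    ]


MIN_CELL_NM = 1.45733       # taken from the error message in the log
BOX_NM = (6.0, 6.0, 6.0)    # HYPOTHETICAL box; the real one is not in the log

for ranks in (16, 40):
    print(ranks, feasible_grids(ranks, BOX_NM, MIN_CELL_NM))
```

With this toy box at most 4 cells fit along each axis, so 40 (which needs a factor of 5 somewhere) has no valid grid while 16 has several. The real box will give different numbers, but the same kind of constraint is what makes some rank counts fail.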
“Don't lose your mind trying to set it free...”
Zzyzx
 
Posts: 8
Joined: Thu Apr 18, 2013 5:20 pm
Location: Phoenix, Arizona, USA

Re: _a7 core crashing in Gromacs (p16403)

Postby kyleedwardsny » Sat Jul 11, 2020 1:39 am

Zzyzx wrote: Still getting this assigned on 48 CPUs:


Yep, I'm getting the same thing.

I have set my config to 16 cores until this gets worked out.
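
In case it helps anyone doing the same, below is a minimal sketch of that change in Python. It assumes the v7 client's config.xml keeps the thread count as a `cpus` element with a `v` attribute inside a CPU `slot`; the sample config in the script is my own illustration, not a file posted in this thread, and `set_cpu_slot_threads` is a hypothetical helper name. It works on a string only and does not touch any file on disk.

```python
import xml.etree.ElementTree as ET

# ASSUMED layout of a FAHClient v7 config.xml with one 48-thread CPU slot.
SAMPLE_CONFIG = """<config>
  <slot id='0' type='CPU'>
    <cpus v='48'/>
  </slot>
</config>"""


def set_cpu_slot_threads(config_xml, n_cpus):
    """Return config_xml with every CPU slot limited to n_cpus threads."""
    root = ET.fromstring(config_xml)
    for slot in root.iter("slot"):
        if slot.get("type") != "CPU":
            continue  # leave GPU slots alone
        cpus = slot.find("cpus")
        if cpus is None:
            # No explicit count yet, so add the element.
            cpus = ET.SubElement(slot, "cpus")
        cpus.set("v", str(n_cpus))
    return ET.tostring(root, encoding="unicode")


print(set_cpu_slot_threads(SAMPLE_CONFIG, 16))
```

The same setting can be changed by hand in the client's slot configuration; the script is only meant to show which value is being changed.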
kyleedwardsny
 
Posts: 10
Joined: Fri Apr 10, 2020 11:09 pm
