_a7 core crashing in Gromacs

Moderators: Site Moderators, FAHC Science Team

_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: _a7 core crashing in Gromacs

Post by _r2w_ben »

kyleedwardsny wrote:Alright, so, is there anything I can do to skip over this work unit, or otherwise get it to work? It's been crashing over and over since yesterday afternoon and keeping my computer idle :(
Follow JimboPalmer's post and lower the number of CPUs assigned to the slot.
Set it to 23, click OK, click Save, and then check the log to see if it started successfully.
Repeat lowering by 1 until it works.

After the work unit finishes you can set the CPUs back to 24.
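
For anyone who prefers to see the "lower by 1 until it works" procedure written out, here is a minimal sketch. The `starts_ok` callable is hypothetical: it stands in for "save the setting and check the log by hand", and is not part of any FAH tool.

```python
from typing import Callable, Optional


def first_working_cpu_count(
    start: int,
    starts_ok: Callable[[int], bool],
    minimum: int = 1,
) -> Optional[int]:
    """Step the CPU count down by 1 from `start` until `starts_ok` accepts it.

    `starts_ok` is a placeholder for the manual check described above
    (apply the count, then look at the log for a successful start).
    """
    for cpus in range(start, minimum - 1, -1):
        if starts_ok(cpus):
            return cpus
    return None  # nothing between `start` and `minimum` worked


if __name__ == "__main__":
    # Stand-in check for illustration only: pretend 23 fails and 22 works.
    print(first_working_cpu_count(23, lambda n: n <= 22))
```

Once the work unit finishes, the count goes back to the original value as described above.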
kyleedwardsny
Posts: 10
Joined: Fri Apr 10, 2020 10:09 pm

Re: _a7 core crashing in Gromacs

Post by kyleedwardsny »

This is on a Linux server. There is no graphical application for me to click buttons. I'm guessing I need to edit a config file instead?
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: _a7 core crashing in Gromacs

Post by Joe_H »

The answer was given in the first response: change the number of CPU threads it tries to run on. You can move the slider to Medium or Light if you have left control to that, or change the number in FAHControl's Configure. I would suggest 18 or 16 threads as a starting place.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: _a7 core crashing in Gromacs

Post by _r2w_ben »

kyleedwardsny wrote:This is on a Linux server. There is no graphical application for me to click buttons. I'm guessing I need to edit a config file instead?
Yes, config.xml should be located in /etc/fahclient/

Change

<slot id='0' type='CPU'/>
to

<slot id='0' type='CPU'>
    <cpus v='23' />
</slot>
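
If you would rather script the change than edit by hand, something along these lines could work. This is only a sketch: the `<config>` root element in the sample is an assumption about the file layout, `set_slot_cpus` is a name made up for illustration, and the code works on a string so it does not touch /etc/fahclient/config.xml itself.

```python
import xml.etree.ElementTree as ET


def set_slot_cpus(config_xml: str, slot_id: str, cpus: int) -> str:
    """Return config XML with <cpus v='N'/> set on the matching CPU slot."""
    root = ET.fromstring(config_xml)
    for slot in root.iter("slot"):
        if slot.get("id") == slot_id and slot.get("type") == "CPU":
            node = slot.find("cpus")
            if node is None:
                # No explicit CPU count yet: add the child element.
                node = ET.SubElement(slot, "cpus")
            node.set("v", str(cpus))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample input only; the <config> wrapper is an assumed layout.
    sample = "<config><slot id='0' type='CPU'/></config>"
    print(set_slot_cpus(sample, "0", 23))
```

Running it twice on the same slot just updates the existing value rather than adding a second cpus entry.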
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: _a7 core crashing in Gromacs

Post by PantherX »

kyleedwardsny wrote:Hi PantherX, sorry about the log file confusion on my part. Here is the system configuration portion of /config/log...
That's all good, the important thing to remember is we got there in the end :)

I have noticed that you're not using a passkey. While it is recommended for security and bonus points, it's optional so have a read here and then make a decision: https://foldingathome.org/support/faq/points/passkey/
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
kyleedwardsny
Posts: 10
Joined: Fri Apr 10, 2020 10:09 pm

Re: _a7 core crashing in Gromacs

Post by kyleedwardsny »

_r2w_ben wrote:Yes, config.xml should be located in /etc/fahclient/

Change

<slot id='0' type='CPU'/>
to

<slot id='0' type='CPU'>
    <cpus v='23' />
</slot>
Thank you _r2w_ben, this was exactly the information I needed! My client is now churning through the work unit with 16 out of 24 cores and has not crashed.
kyleedwardsny
Posts: 10
Joined: Fri Apr 10, 2020 10:09 pm

Re: _a7 core crashing in Gromacs

Post by kyleedwardsny »

PantherX wrote:I have noticed that you're not using a passkey.
Thanks PantherX, I have heeded your advice and generated a passkey.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am

Re: _a7 core crashing in Gromacs

Post by PantherX »

FYI, I have confirmation from the Project owner that Project 16417 will no longer be assigned to 24 CPUs. Thanks all for your report :)
kyleedwardsny
Posts: 10
Joined: Fri Apr 10, 2020 10:09 pm

Re: _a7 core crashing in Gromacs

Post by kyleedwardsny »

Excellent! I will consider this issue closed then.
m1geo
Posts: 10
Joined: Tue Mar 31, 2020 2:07 am
Location: Cambridge, UK
Contact:

Re: _a7 core crashing in Gromacs

Post by m1geo »

PantherX wrote:FYI, I have confirmation from the Project owner that Project 16417 will no longer be assigned to 24 CPUs. Thanks all for your report :)
Just received the same from project 16403. 24 CPUs. I've changed to 23 now, and it's working. I guess I need to put a GPU in here to make use of the "spare" CPU ;)

AMD Ryzen 9 3900X, 64 GB RAM, No GPU, Ubuntu Linux 19.10 x64.

Error below:

15:58:00:WU00:FS00:Starting
15:58:00:WU00:FS00:Removing old file './work/00/logfile_01-20200415-152559.txt'
15:58:00:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 1639 -checkpoint 15 -np 24
15:58:00:WU00:FS00:Started FahCore on PID 7868
15:58:00:WU00:FS00:Core PID:7872
15:58:00:WU00:FS00:FahCore 0xa7 started
15:58:00:WU00:FS00:0xa7:*********************** Log Started 2020-04-15T15:58:00Z ***********************
15:58:00:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
15:58:00:WU00:FS00:0xa7:       Type: 0xa7
15:58:00:WU00:FS00:0xa7:       Core: Gromacs
15:58:00:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 7868 -checkpoint 15 -np
15:58:00:WU00:FS00:0xa7:             24
15:58:00:WU00:FS00:0xa7:************************************ CBang *************************************
15:58:00:WU00:FS00:0xa7:       Date: Nov 5 2019
15:58:00:WU00:FS00:0xa7:       Time: 06:06:57
15:58:00:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
15:58:00:WU00:FS00:0xa7:     Branch: master
15:58:00:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
15:58:00:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
15:58:00:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
15:58:00:WU00:FS00:0xa7:       Bits: 64
15:58:00:WU00:FS00:0xa7:       Mode: Release
15:58:00:WU00:FS00:0xa7:************************************ System ************************************
15:58:00:WU00:FS00:0xa7:        CPU: AMD Ryzen 9 3900X 12-Core Processor
15:58:00:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
15:58:00:WU00:FS00:0xa7:       CPUs: 24
15:58:00:WU00:FS00:0xa7:     Memory: 62.79GiB
15:58:00:WU00:FS00:0xa7:Free Memory: 61.72GiB
15:58:00:WU00:FS00:0xa7:    Threads: POSIX_THREADS
15:58:00:WU00:FS00:0xa7: OS Version: 5.3
15:58:00:WU00:FS00:0xa7:Has Battery: false
15:58:00:WU00:FS00:0xa7: On Battery: false
15:58:00:WU00:FS00:0xa7: UTC Offset: 1
15:58:00:WU00:FS00:0xa7:        PID: 7872
15:58:00:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
15:58:00:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
15:58:00:WU00:FS00:0xa7:    Version: 0.0.18
15:58:00:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:58:00:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
15:58:00:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
15:58:00:WU00:FS00:0xa7:       Date: Nov 5 2019
15:58:00:WU00:FS00:0xa7:       Time: 06:13:26
15:58:00:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
15:58:00:WU00:FS00:0xa7:     Branch: master
15:58:00:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
15:58:00:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
15:58:00:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
15:58:00:WU00:FS00:0xa7:       Bits: 64
15:58:00:WU00:FS00:0xa7:       Mode: Release
15:58:00:WU00:FS00:0xa7:************************************ Build *************************************
15:58:00:WU00:FS00:0xa7:       SIMD: avx_256
15:58:00:WU00:FS00:0xa7:********************************************************************************
15:58:00:WU00:FS00:0xa7:Project: 16403 (Run 1069, Clone 0, Gen 34)
15:58:00:WU00:FS00:0xa7:Unit: 0x0000002696880e6e5e8be09915f36d54
15:58:00:WU00:FS00:0xa7:Reading tar file core.xml
15:58:00:WU00:FS00:0xa7:Reading tar file frame34.tpr
15:58:00:WU00:FS00:0xa7:Digital signatures verified
15:58:00:WU00:FS00:0xa7:Calling: mdrun -s frame34.tpr -o frame34.trr -x frame34.xtc -cpt 15 -nt 24
15:58:00:WU00:FS00:0xa7:Steps: first=17000000 total=500000
15:58:00:WU00:FS00:0xa7:ERROR:
15:58:00:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
15:58:00:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
15:58:00:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
15:58:00:WU00:FS00:0xa7:ERROR:
15:58:00:WU00:FS00:0xa7:ERROR:Fatal error:
15:58:00:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
15:58:00:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
15:58:00:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
15:58:00:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
15:58:00:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
15:58:00:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
15:58:05:WU00:FS00:0xa7:WARNING:Unexpected exit() call
15:58:05:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
15:58:05:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
15:58:05:WU00:FS00:0xa7:Saving result file md.log
15:58:05:WU00:FS00:0xa7:Saving result file science.log
15:58:05:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
Thanks
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: _a7 core crashing in Gromacs

Post by bruce »

You can reduce the number of CPUs used to some number that works. The current version of GROMACS is allocating some threads to PME processing, and we're still figuring out the details and what to do about it. All 24 cores are being used: four are allocated to PME and 20 to domain processing, which is not ideal.
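
To see that split in your own log, a rough sketch like the following pulls out the two relevant numbers. The regular expressions are written against the two log lines posted earlier in this thread and are an assumption about their general shape; treating "threads minus ranks" as the PME share simply mirrors the 4 + 20 description above, and `summarize_dd_failure` is a made-up helper name.

```python
import re
from typing import Optional, Tuple

# Patterns modelled on the log lines quoted in this thread (assumed format).
NT_RE = re.compile(r"Calling: mdrun .*-nt (\d+)")
DD_RE = re.compile(
    r"no domain decomposition for (\d+) ranks .*minimum cell size of ([\d.]+) nm"
)


def summarize_dd_failure(log_text: str) -> Optional[Tuple[int, int, float]]:
    """Return (threads requested, domain ranks, min cell size in nm), or None."""
    nt = NT_RE.search(log_text)
    dd = DD_RE.search(log_text)
    if not (nt and dd):
        return None
    return int(nt.group(1)), int(dd.group(1)), float(dd.group(2))


if __name__ == "__main__":
    log = (
        "15:58:00:WU00:FS00:0xa7:Calling: mdrun -s frame34.tpr -o frame34.trr "
        "-x frame34.xtc -cpt 15 -nt 24\n"
        "15:58:00:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 "
        "ranks that is compatible with the given box and a minimum cell size "
        "of 1.45733 nm\n"
    )
    threads, ranks, cell = summarize_dd_failure(log)
    print(f"{threads} threads, {ranks} domain ranks, {threads - ranks} other, {cell} nm")
```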

In config.xml you'll find something like this:
<slot type="CPU" id="0">
    <cpus v="16"/>
</slot>

Adding the cpus entry will allow you to control the setting. 18 might work, or 16, or 12.

When we figure out what's going on with PME, we'll get back to you. 24 should work, but it doesn't.

If you want maximum utilization right now, the earlier suggestion of using two slots, one of 16 and one of 8 is a good one.

That would look something like this:

<slot type="CPU" id="0">
    <cpus v="16"/>
</slot>
<slot type="CPU" id="1">
    <cpus v="8"/>
</slot>
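
If you want to try other splits, here is a small sketch that generates the slot entries and refuses a split that asks for more threads than the machine has. `build_cpu_slots` and its `total_threads` parameter are hypothetical names for illustration; only the slot/cpus layout comes from the snippet above.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_cpu_slots(cpu_counts: List[int], total_threads: int) -> str:
    """Build one <slot type="CPU"> entry per count, with ids starting at 0."""
    if sum(cpu_counts) > total_threads:
        raise ValueError("slots ask for more threads than the machine has")
    lines = []
    for slot_id, cpus in enumerate(cpu_counts):
        slot = ET.Element("slot", {"type": "CPU", "id": str(slot_id)})
        ET.SubElement(slot, "cpus", {"v": str(cpus)})
        lines.append(ET.tostring(slot, encoding="unicode"))
    return "\n".join(lines)


if __name__ == "__main__":
    # The 16 + 8 split suggested above, on a 24-thread machine.
    print(build_cpu_slots([16, 8], total_threads=24))
```

The output is meant to be pasted into config.xml by hand; the sketch does not write the file.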

(Off-topic and just for my information) Is it impossible to make FAHControl work in your environment? Would a text-based editor be better?
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: _a7 core crashing in Gromacs

Post by Neil-B »

For the most part 24core works fine, so if you have a 24core slot don't feel you have to change it down (unless, of course, it is to get an issue like this one cleared). Occasionally a project comes along that has issues, but hopefully these get picked up in beta/advanced and are simply not assigned to 24core slots. On advanced, 24core was kept working pretty much 24/7 over the last month's leaner period, but I may just have been lucky. I'll check back, but I'm fairly sure I had no failures (there might have been one).
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
m1geo
Posts: 10
Joined: Tue Mar 31, 2020 2:07 am
Location: Cambridge, UK
Contact:

Re: _a7 core crashing in Gromacs

Post by m1geo »

I'm not the OP. I was just reporting that this was still ongoing, since the last post before mine said it was resolved.

I switched it down to 23 CPUs for the time being and that's working fine.

Keep up the good work.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am

Re: _a7 core crashing in Gromacs

Post by PantherX »

m1geo wrote:
PantherX wrote:FYI, I have confirmation from the Project owner that Project 16417 will no longer be assigned to 24 CPUs. Thanks all for your report :)
Just received the same from project 16403. 24 CPUs. I've changed to 23 now, and it's working. I guess I need to put a GPU in here to make use of the "spare" CPU ;)...
Please note that Project 16403 is different from Project 16417. However, I have informed the researcher, so let's see what happens :)
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am

Re: _a7 core crashing in Gromacs

Post by PantherX »

FYI, the researcher has decided to err on the side of caution and has prevented 24-CPU slots from receiving Project 16403.