Fatal Error with WU


muziqaz
Posts: 901
Joined: Sun Dec 16, 2007 6:22 pm
Hardware configuration: 7950x3D, 5950x, 5800x3D, 3900x
7900xtx, Radeon 7, 5700xt, 6900xt, RX 550 640SP
Location: London

Re: Fatal Error with WU

Post by muziqaz »

It has been observed recently that the sweet spot for the current crop of projects is 10 or 12 threads. Going beyond that still reduces TPF (time per frame), but not by a big margin. Given the way bonuses work, though, every second counts :)
Maybe with AMD's push towards many-core architectures and the popularity of Zen, we might see a comeback of more optimized, more parallel GROMACS code :)
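To make "every second counts" concrete: under the commonly cited Quick Return Bonus (QRB) formula, a WU's credit is its base points times max(1, sqrt(k × deadline / elapsed)), so points per day grow faster than linearly as TPF drops. The sketch below assumes that formula, treats a WU as 100 frames (TPF being the time per 1%), ignores upload delays, and uses hypothetical values for base points, k, deadline and TPF; real per-project values differ.

```python
import math

SECONDS_PER_DAY = 86400

def wu_credit(base_points: float, k: float, deadline_days: float, elapsed_days: float) -> float:
    """Credit for one WU under the QRB formula: base * max(1, sqrt(k * deadline / elapsed))."""
    return base_points * max(1.0, math.sqrt(k * deadline_days / elapsed_days))

def points_per_day(base_points: float, k: float, deadline_days: float, tpf_seconds: float) -> float:
    """Points per day for back-to-back WUs, assuming 100 frames per WU and no upload delay."""
    elapsed_days = 100 * tpf_seconds / SECONDS_PER_DAY
    return wu_credit(base_points, k, deadline_days, elapsed_days) / elapsed_days

# Hypothetical project: 5000 base points, k = 0.75, 3-day deadline.
slow = points_per_day(5000, 0.75, 3.0, tpf_seconds=60)  # e.g. a 12-thread slot
fast = points_per_day(5000, 0.75, 3.0, tpf_seconds=55)  # e.g. a wider slot
print(f"{fast / slow:.3f}")  # 1.139: a ~8% shorter TPF gives ~14% more PPD
```

While the bonus is active, PPD scales roughly as TPF^-1.5, which is why a modest TPF gain from extra threads can still be worth having.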
FAH Beta tester
dfreeman
Posts: 6
Joined: Thu Mar 19, 2020 9:28 am

Re: Fatal Error with WU

Post by dfreeman »

With four slots, each pinned to its own socket, I now have system time below 5% when all slots are working. Before, it was ~20% using two slots spanning two sockets each. So it's definitely worth pinning each slot to one socket. Is there a way to do it within the client, or do I have to run four clients?
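On Linux, one way to do per-socket pinning outside the client is to group logical CPUs by socket and set the CPU affinity of each client process. This is a sketch under stated assumptions, not a FAHClient feature: it parses the output of `lscpu --parse=CPU,SOCKET` and calls `os.sched_setaffinity`, which is Linux-only, and `pin_to_socket` is a hypothetical helper name. Affinity applies to the given process/thread ID and is inherited by threads and child processes started afterwards, so it should be applied before the client launches its cores; launching each client under `taskset -c` achieves the same.

```python
import os
import subprocess
from collections import defaultdict

def cpus_by_socket(lscpu_output: str) -> dict[int, list[int]]:
    """Group logical CPU ids by physical socket from `lscpu --parse=CPU,SOCKET` output."""
    groups = defaultdict(list)
    for line in lscpu_output.splitlines():
        if not line or line.startswith("#"):
            continue  # lscpu prefixes its header lines with '#'
        cpu, socket = (int(field) for field in line.split(",")[:2])
        groups[socket].append(cpu)
    return dict(groups)

def pin_to_socket(pid: int, socket: int) -> None:
    """Hypothetical helper: restrict `pid` (e.g. one FAHClient instance) to one socket's CPUs."""
    result = subprocess.run(
        ["lscpu", "--parse=CPU,SOCKET"], capture_output=True, text=True, check=True
    )
    os.sched_setaffinity(pid, cpus_by_socket(result.stdout)[socket])
```

With four separate client instances, each would get `pin_to_socket(client_pid, n)` for its own socket `n`.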
muziqaz

Re: Fatal Error with WU

Post by muziqaz »

FAHClient can have as many slots as you want. I can't help with how to set them up in your environment, as I'm not familiar with it :)
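For reference, FAHClient v7 keeps its slots in `config.xml`. The sketch below assumes the v7 layout, where each slot is a `<slot id=... type='CPU'>` element with a `<cpus v=.../>` child, and builds one CPU slot per entry with Python's standard `xml.etree.ElementTree`; `build_cpu_slots` is a hypothetical helper and the slot sizes are only examples. Editing through FAHControl, or with the client stopped, is the safer route.

```python
import xml.etree.ElementTree as ET

def build_cpu_slots(thread_counts: list[int]) -> str:
    """Return a <config> fragment with one CPU slot per thread count (assumed v7 layout)."""
    config = ET.Element("config")
    for slot_id, threads in enumerate(thread_counts):
        slot = ET.SubElement(config, "slot", id=str(slot_id), type="CPU")
        ET.SubElement(slot, "cpus", v=str(threads))
    return ET.tostring(config, encoding="unicode")

# Example: two CPU slots of 12 and 9 threads, leaving spare threads for a GPU slot.
print(build_cpu_slots([12, 9]))
```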
jaos
Posts: 7
Joined: Tue Jun 02, 2020 4:28 pm
Hardware configuration: gnujaos Team 236734
GNU/Linux, AMD 3900X on X590, RX5700XT

Re: Fatal Error with WU

Post by jaos »

Had a bad WU that required me to drop my CPU core count:


16:07:50:WU01:FS01:FahCore 0xa7 started
16:07:50:WU01:FS01:0xa7:*********************** Log Started 2020-06-02T16:07:50Z ***********************
16:07:50:WU01:FS01:0xa7:************************** Gromacs Folding@home Core ***************************
16:07:50:WU01:FS01:0xa7:       Type: 0xa7
16:07:50:WU01:FS01:0xa7:       Core: Gromacs
16:07:50:WU01:FS01:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 24132 -checkpoint 15 -np
16:07:50:WU01:FS01:0xa7:             23
16:07:50:WU01:FS01:0xa7:************************************ CBang *************************************
16:07:50:WU01:FS01:0xa7:       Date: Nov 5 2019
16:07:50:WU01:FS01:0xa7:       Time: 06:06:57
16:07:50:WU01:FS01:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
16:07:50:WU01:FS01:0xa7:     Branch: master
16:07:50:WU01:FS01:0xa7:   Compiler: GNU 8.3.0
16:07:50:WU01:FS01:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
16:07:50:WU01:FS01:0xa7:   Platform: linux2 4.19.0-5-amd64
16:07:50:WU01:FS01:0xa7:       Bits: 64
16:07:50:WU01:FS01:0xa7:       Mode: Release
16:07:50:WU01:FS01:0xa7:************************************ System ************************************
16:07:50:WU01:FS01:0xa7:        CPU: AMD Ryzen 9 3900X 12-Core Processor
16:07:50:WU01:FS01:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:07:50:WU01:FS01:0xa7:       CPUs: 24
16:07:50:WU01:FS01:0xa7:     Memory: 31.37GiB
16:07:50:WU01:FS01:0xa7:Free Memory: 1.56GiB
16:07:50:WU01:FS01:0xa7:    Threads: POSIX_THREADS
16:07:50:WU01:FS01:0xa7: OS Version: 5.7
16:07:50:WU01:FS01:0xa7:Has Battery: false
16:07:50:WU01:FS01:0xa7: On Battery: false
16:07:50:WU01:FS01:0xa7: UTC Offset: -4
16:07:50:WU01:FS01:0xa7:        PID: 24136
16:07:50:WU01:FS01:0xa7:        CWD: ...
16:07:50:WU01:FS01:0xa7:******************************** Build - libFAH ********************************
16:07:50:WU01:FS01:0xa7:    Version: 0.0.18
16:07:50:WU01:FS01:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:07:50:WU01:FS01:0xa7:  Copyright: 2019 foldingathome.org
16:07:50:WU01:FS01:0xa7:   Homepage: https://foldingathome.org/
16:07:50:WU01:FS01:0xa7:       Date: Nov 5 2019
16:07:50:WU01:FS01:0xa7:       Time: 06:13:26
16:07:50:WU01:FS01:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
16:07:50:WU01:FS01:0xa7:     Branch: master
16:07:50:WU01:FS01:0xa7:   Compiler: GNU 8.3.0
16:07:50:WU01:FS01:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
16:07:50:WU01:FS01:0xa7:   Platform: linux2 4.19.0-5-amd64
16:07:50:WU01:FS01:0xa7:       Bits: 64
16:07:50:WU01:FS01:0xa7:       Mode: Release
16:07:50:WU01:FS01:0xa7:************************************ Build *************************************
16:07:50:WU01:FS01:0xa7:       SIMD: avx_256
16:07:50:WU01:FS01:0xa7:********************************************************************************
16:07:50:WU01:FS01:0xa7:Project: 14246 (Run 0, Clone 8, Gen 258)
16:07:50:WU01:FS01:0xa7:Unit: 0x000001c580fccb0a5d6fe21c31576508
16:07:50:WU01:FS01:0xa7:Digital signatures verified
16:07:50:WU01:FS01:0xa7:Reducing thread count from 23 to 22 to avoid domain decomposition by a prime number > 3
16:07:50:WU01:FS01:0xa7:Reducing thread count from 22 to 21 to avoid domain decomposition with large prime factor 11
16:07:50:WU01:FS01:0xa7:Calling: mdrun -s frame258.tpr -o frame258.trr -x frame258.xtc -cpi state.cpt -cpt 15 -nt 21
16:07:50:WU01:FS01:0xa7:Steps: first=64500000 total=250000
16:07:50:WU01:FS01:0xa7:ERROR:
16:07:50:WU01:FS01:0xa7:ERROR:-------------------------------------------------------
16:07:50:WU01:FS01:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
16:07:50:WU01:FS01:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
16:07:50:WU01:FS01:0xa7:ERROR:
16:07:50:WU01:FS01:0xa7:ERROR:Fatal error:
16:07:50:WU01:FS01:0xa7:ERROR:There is no domain decomposition for 16 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
16:07:50:WU01:FS01:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
16:07:50:WU01:FS01:0xa7:ERROR:Look in the log file for details on the domain decomposition
16:07:50:WU01:FS01:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
16:07:50:WU01:FS01:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
16:07:50:WU01:FS01:0xa7:ERROR:-------------------------------------------------------
16:07:55:WU01:FS01:0xa7:WARNING:Unexpected exit() call
16:07:55:WU01:FS01:0xa7:WARNING:Unexpected exit from science code
U: gnujaos T: 236734
GNU/Linux, AMD 3900X on X509, RX5700XT
muziqaz

Re: Fatal Error with WU

Post by muziqaz »

jaos wrote:Had a bad WU that required me to drop my cpu core count:

Thanks for that; the project owner has been informed.
Also, reduce your slot thread count to 21. No project will ever fold on the 23 threads you have free. The best setup would be one slot with CPU:21 and maybe a second CPU slot with CPU:2. I take it you have a GPU folding as well.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Fatal Error with WU

Post by bruce »

Microsoft's licensing policy makes you pay more for a license that supports >64 threads ... and since a lot of @home Donors run plain-vanilla Windows, many people run into that limit. (I'm not familiar enough with the Docker image to comment on it yet.)
jaos

Re: Fatal Error with WU

Post by jaos »

muziqaz wrote: Thanks for that; the project owner has been informed.
Also, reduce your slot thread count to 21. No project will ever fold on the 23 threads you have free. The best setup would be one slot with CPU:21 and maybe a second CPU slot with CPU:2. I take it you have a GPU folding as well.
Yes, I have a GPU as well. Is it better to split my cores up by 7s in different slots?
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: Fatal Error with WU

Post by JimboPalmer »

jaos wrote:Yes, I have a GPU as well. Is it better to split my cores up by 7s in different slots?
No.

F@H has difficulty with CPU counts that are large primes or multiples of them.
7 is always large, 5 is sometimes large, and 3 is never large. Try to choose a number that is a multiple of 2 and/or 3.
1, 2, 3, 4, 6, 8, 9, 12, 16, 18, 20, 21, 24, 27, 30, 32 are known good numbers of CPUs to choose. (_r2w_ben has advised me of more good numbers.)
5, 10, 15, 20, 25, 28 may work most of the time.

Other numbers will bite you.

From a Science and Points perspective, the largest number is best, as it completes the WU quickest. So two slots of 21/2 produce more Points/Science than three slots of 9/8/6, even though both use all 23 available threads.
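The rule of thumb above (counts whose only prime factors are 2 and 3 are safest, a factor of 5 is borderline, larger primes are trouble) can be written as a small check. This sketch encodes only that stated rule; the extra counts in the list that came from reported experience (20, 21, 28, 30) are classified more conservatively by it, and `classify_cpu_count` is a hypothetical name.

```python
def largest_prime_factor(n: int) -> int:
    """Largest prime factor of n; returns 1 for n == 1."""
    largest, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            largest, n = p, n // p
        p += 1
    return n if n > 1 else largest

def classify_cpu_count(n: int) -> str:
    """Classify a slot's CPU count by its largest prime factor (rule of thumb, not a core guarantee)."""
    lpf = largest_prime_factor(n)
    if lpf <= 3:
        return "good"
    if lpf == 5:
        return "may work"
    return "avoid"

good = [n for n in range(1, 33) if classify_cpu_count(n) == "good"]
print(good)  # [1, 2, 3, 4, 6, 8, 9, 12, 16, 18, 24, 27, 32]
```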
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
bruce

Re: Fatal Error with WU

Post by bruce »

No. Don't use 7. But the idea is often sound.

Many people split up their cores into smaller numbers containing only small factors ... like 24, 16, or 12.

FAH will always reduce 7 to 6, leaving one thread unused. If you choose 64 and it works, fine ... until the next project is assigned and it's harder to find a workable number, leaving too much of the machine idle.
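The reduction described here is visible in the first log in this thread (23 → 22 → 21). The sketch below reproduces only what those log lines show and is not the FahCore's actual code: a prime count above 3 is reduced by one, and so is a count whose largest prime factor exceeds a threshold. The threshold of 7 is an assumption inferred from that log, where 22 (factor 11) was reduced but 21 (factor 7) was accepted; `reduce_thread_count` is a hypothetical name.

```python
def largest_prime_factor(n: int) -> int:
    """Largest prime factor of n; returns 1 for n == 1."""
    largest, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            largest, n = p, n // p
        p += 1
    return n if n > 1 else largest

def is_prime(n: int) -> bool:
    return n >= 2 and largest_prime_factor(n) == n

def reduce_thread_count(n: int, max_prime_factor: int = 7) -> tuple[int, list[str]]:
    """Step the thread count down while it is a prime > 3 or has a 'large' prime factor."""
    messages = []
    while n > 3:
        if is_prime(n):
            reason = "by a prime number > 3"
        elif largest_prime_factor(n) > max_prime_factor:
            reason = f"with large prime factor {largest_prime_factor(n)}"
        else:
            break
        messages.append(
            f"Reducing thread count from {n} to {n - 1} to avoid domain decomposition {reason}"
        )
        n -= 1
    return n, messages

print(reduce_thread_count(23)[0], reduce_thread_count(7)[0])  # 21 6
```

Passing this check does not guarantee a decomposition exists: as the later log shows, 21 threads still failed for project 14524.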
muziqaz

Re: Fatal Error with WU

Post by muziqaz »

12 and 9 might be safest, with less chance of running into scaling issues :)
jaos

Re: Fatal Error with WU

Post by jaos »

muziqaz wrote:12 and 9 might be safest, with less chance of running into scaling issues :)
Thanks all for the information and suggestions!
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Fatal Error with WU

Post by PantherX »

Not sure whether you're aware, but there's an official Docker image from F@H: https://github.com/FoldingAtHome/containers
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
muziqaz

Re: Fatal Error with WU

Post by muziqaz »

PantherX wrote:Not sure whether you're aware, but there's an official Docker image from F@H: https://github.com/FoldingAtHome/containers
Let's not muddy the waters further with extra layers of complexity :D :D There is absolutely no need to use a middleman to adjust simple slot settings :)
Last edited by muziqaz on Fri Jun 12, 2020 8:19 pm, edited 1 time in total.
jaos

Re: Fatal Error with WU

Post by jaos »

Another bad WU


13:47:55:WU01:FS01:0xa7:*********************** Log Started 2020-06-12T13:47:54Z ***********************
13:47:55:WU01:FS01:0xa7:************************** Gromacs Folding@home Core ***************************
13:47:55:WU01:FS01:0xa7:       Type: 0xa7
13:47:55:WU01:FS01:0xa7:       Core: Gromacs
13:47:55:WU01:FS01:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 21173 -checkpoint 15 -np
13:47:55:WU01:FS01:0xa7:             21
13:47:55:WU01:FS01:0xa7:************************************ CBang *************************************
13:47:55:WU01:FS01:0xa7:       Date: Nov 5 2019
13:47:55:WU01:FS01:0xa7:       Time: 06:06:57
13:47:55:WU01:FS01:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
13:47:55:WU01:FS01:0xa7:     Branch: master
13:47:55:WU01:FS01:0xa7:   Compiler: GNU 8.3.0
13:47:55:WU01:FS01:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
13:47:55:WU01:FS01:0xa7:   Platform: linux2 4.19.0-5-amd64
13:47:55:WU01:FS01:0xa7:       Bits: 64
13:47:55:WU01:FS01:0xa7:       Mode: Release
13:47:55:WU01:FS01:0xa7:************************************ System ************************************
13:47:55:WU01:FS01:0xa7:        CPU: AMD Ryzen 9 3900X 12-Core Processor
13:47:55:WU01:FS01:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
13:47:55:WU01:FS01:0xa7:       CPUs: 24
13:47:55:WU01:FS01:0xa7:     Memory: 31.37GiB
13:47:55:WU01:FS01:0xa7:Free Memory: 8.25GiB
13:47:55:WU01:FS01:0xa7:    Threads: POSIX_THREADS
13:47:55:WU01:FS01:0xa7: OS Version: 5.7
13:47:55:WU01:FS01:0xa7:Has Battery: false
13:47:55:WU01:FS01:0xa7: On Battery: false
13:47:55:WU01:FS01:0xa7: UTC Offset: -4
13:47:55:WU01:FS01:0xa7:        PID: 21177
13:47:55:WU01:FS01:0xa7:        CWD: /home/jason/projects/fah/work
13:47:55:WU01:FS01:0xa7:******************************** Build - libFAH ********************************
13:47:55:WU01:FS01:0xa7:    Version: 0.0.18
13:47:55:WU01:FS01:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:47:55:WU01:FS01:0xa7:  Copyright: 2019 foldingathome.org
13:47:55:WU01:FS01:0xa7:   Homepage: https://foldingathome.org/
13:47:55:WU01:FS01:0xa7:       Date: Nov 5 2019
13:47:55:WU01:FS01:0xa7:       Time: 06:13:26
13:47:55:WU01:FS01:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
13:47:55:WU01:FS01:0xa7:     Branch: master
13:47:55:WU01:FS01:0xa7:   Compiler: GNU 8.3.0
13:47:55:WU01:FS01:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
13:47:55:WU01:FS01:0xa7:   Platform: linux2 4.19.0-5-amd64
13:47:55:WU01:FS01:0xa7:       Bits: 64
13:47:55:WU01:FS01:0xa7:       Mode: Release
13:47:55:WU01:FS01:0xa7:************************************ Build *************************************
13:47:55:WU01:FS01:0xa7:       SIMD: avx_256
13:47:55:WU01:FS01:0xa7:********************************************************************************
13:47:55:WU01:FS01:0xa7:Project: 14524 (Run 482, Clone 2, Gen 33)
13:47:55:WU01:FS01:0xa7:Unit: 0x0000003580fccb0a5e459ba4615a0d2d
13:47:55:WU01:FS01:0xa7:Reading tar file core.xml
13:47:55:WU01:FS01:0xa7:Reading tar file frame33.tpr
13:47:55:WU01:FS01:0xa7:Digital signatures verified
13:47:55:WU01:FS01:0xa7:Calling: mdrun -s frame33.tpr -o frame33.trr -x frame33.xtc -cpt 15 -nt 21
13:47:55:WU01:FS01:0xa7:Steps: first=8250000 total=250000
13:47:55:WU01:FS01:0xa7:ERROR:
13:47:55:WU01:FS01:0xa7:ERROR:-------------------------------------------------------
13:47:55:WU01:FS01:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
13:47:55:WU01:FS01:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
13:47:55:WU01:FS01:0xa7:ERROR:
13:47:55:WU01:FS01:0xa7:ERROR:Fatal error:
13:47:55:WU01:FS01:0xa7:ERROR:There is no domain decomposition for 16 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
13:47:55:WU01:FS01:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
13:47:55:WU01:FS01:0xa7:ERROR:Look in the log file for details on the domain decomposition
13:47:55:WU01:FS01:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
13:47:55:WU01:FS01:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors

bruce

Re: Fatal Error with WU

Post by bruce »

I would reconfigure the CPU slot(s) with FAHClient. The software tried several times to avoid this problem by reducing the thread count, but 21 still didn't work. Start with 16, and maybe add another slot with 8 threads. The slider isn't well designed for wider CPUs. (Help is on the way, with the potential for a new FAHCore "soon".)
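Why can 21 threads still fail? In both logs GROMACS reports 16 ranks, presumably the particle-particle ranks left after it set some of the 21 aside for PME, and it needs a grid nx × ny × nz = 16 whose cells are each at least the minimum cell size (1.45733 nm and 1.4227 nm in the logs). The sketch below is a simplified check that assumes a rectangular box and ignores GROMACS's other constraints (PME rank choice, triclinic boxes, load balancing); the 5.5 nm box is hypothetical, since the logs do not print the box size, and the function names are illustrative.

```python
def grid_shapes(n_ranks: int):
    """Yield every (nx, ny, nz) with nx * ny * nz == n_ranks."""
    for nx in range(1, n_ranks + 1):
        if n_ranks % nx:
            continue
        rest = n_ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny == 0:
                yield nx, ny, rest // ny

def find_dd_grid(n_ranks: int, box_nm: tuple, min_cell_nm: float):
    """Return the first grid whose cells are all >= min_cell_nm along each axis, or None."""
    for grid in grid_shapes(n_ranks):
        if all(length / cells >= min_cell_nm for length, cells in zip(box_nm, grid)):
            return grid
    return None

box = (5.5, 5.5, 5.5)  # hypothetical cubic box, nm
for ranks in (16, 12, 8):
    print(ranks, find_dd_grid(ranks, box, 1.45733))
# 16 None       (every 16-cell grid has >= 4 cells on some axis, and 5.5 / 4 < 1.457)
# 12 (2, 2, 3)
# 8 (2, 2, 2)
```

In this toy box, 12 or 8 ranks fit where 16 do not, which is consistent with trying a smaller slot; how many PP ranks a given thread count actually produces is decided by GROMACS, not by this sketch.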