ERROR:There is no domain decomposition for 10 ranks that is

Posted: Sun Jun 21, 2020 2:43 am
by HannuN

Code:

...
02:36:12:WU01:FS00:Starting
02:36:12:WU01:FS00:Removing old file 'work/01/logfile_01-20200621-020412.txt'
02:36:12:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 23033 -checkpoint 15 -np 11
02:36:12:WU01:FS00:Started FahCore on PID 3582
02:36:12:WU01:FS00:Core PID:3586
02:36:12:WU01:FS00:FahCore 0xa7 started
02:36:13:WU01:FS00:0xa7:*********************** Log Started 2020-06-21T02:36:12Z ***********************
02:36:13:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
02:36:13:WU01:FS00:0xa7:       Type: 0xa7
02:36:13:WU01:FS00:0xa7:       Core: Gromacs
02:36:13:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 3582 -checkpoint 15 -np
02:36:13:WU01:FS00:0xa7:             11
02:36:13:WU01:FS00:0xa7:************************************ CBang *************************************
02:36:13:WU01:FS00:0xa7:       Date: Nov 5 2019
02:36:13:WU01:FS00:0xa7:       Time: 06:06:57
02:36:13:WU01:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
02:36:13:WU01:FS00:0xa7:     Branch: master
02:36:13:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
02:36:13:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
02:36:13:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
02:36:13:WU01:FS00:0xa7:       Bits: 64
02:36:13:WU01:FS00:0xa7:       Mode: Release
02:36:13:WU01:FS00:0xa7:************************************ System ************************************
02:36:13:WU01:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
02:36:13:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
02:36:13:WU01:FS00:0xa7:       CPUs: 12
02:36:13:WU01:FS00:0xa7:     Memory: 31.29GiB
02:36:13:WU01:FS00:0xa7:Free Memory: 26.72GiB
02:36:13:WU01:FS00:0xa7:    Threads: POSIX_THREADS
02:36:13:WU01:FS00:0xa7: OS Version: 5.3
02:36:13:WU01:FS00:0xa7:Has Battery: false
02:36:13:WU01:FS00:0xa7: On Battery: false
02:36:13:WU01:FS00:0xa7: UTC Offset: 2
02:36:13:WU01:FS00:0xa7:        PID: 3586
02:36:13:WU01:FS00:0xa7:        CWD: /var/lib/fahclient/work
02:36:13:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
02:36:13:WU01:FS00:0xa7:    Version: 0.0.18
02:36:13:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
02:36:13:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
02:36:13:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
02:36:13:WU01:FS00:0xa7:       Date: Nov 5 2019
02:36:13:WU01:FS00:0xa7:       Time: 06:13:26
02:36:13:WU01:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
02:36:13:WU01:FS00:0xa7:     Branch: master
02:36:13:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
02:36:13:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
02:36:13:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
02:36:13:WU01:FS00:0xa7:       Bits: 64
02:36:13:WU01:FS00:0xa7:       Mode: Release
02:36:13:WU01:FS00:0xa7:************************************ Build *************************************
02:36:13:WU01:FS00:0xa7:       SIMD: avx_256
02:36:13:WU01:FS00:0xa7:********************************************************************************
02:36:13:WU01:FS00:0xa7:Project: 14523 (Run 915, Clone 2, Gen 50)
02:36:13:WU01:FS00:0xa7:Unit: 0x0000004f80fccb0a5e459bc34b84af55
02:36:13:WU01:FS00:0xa7:Reading tar file core.xml
02:36:13:WU01:FS00:0xa7:Reading tar file frame50.tpr
02:36:13:WU01:FS00:0xa7:Digital signatures verified
02:36:13:WU01:FS00:0xa7:Reducing thread count from 11 to 10 to avoid domain decomposition by a prime number > 3
02:36:13:WU01:FS00:0xa7:Calling: mdrun -s frame50.tpr -o frame50.trr -x frame50.xtc -cpt 15 -nt 10
02:36:13:WU01:FS00:0xa7:Steps: first=12500000 total=250000
02:36:13:WU01:FS00:0xa7:ERROR:
02:36:13:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
02:36:13:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
02:36:13:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
02:36:13:WU01:FS00:0xa7:ERROR:
02:36:13:WU01:FS00:0xa7:ERROR:Fatal error:
02:36:13:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 10 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
02:36:13:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
02:36:13:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
02:36:13:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
02:36:13:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
02:36:13:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
02:36:17:WU01:FS00:0xa7:WARNING:Unexpected exit() call
02:36:17:WU01:FS00:0xa7:WARNING:Unexpected exit from science code
02:36:17:WU01:FS00:0xa7:Saving result file ../logfile_01.txt
02:36:17:WU01:FS00:0xa7:Saving result file md.log
02:36:17:WU01:FS00:0xa7:Saving result file science.log
02:36:18:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
Mod Edit: Added Code Tags - PantherX

Re: ERROR:There is no domain decomposition for 10 ranks that

Posted: Sun Jun 21, 2020 12:06 pm
by uyaem
Did your client dump the WU after, or is it stuck in a loop with this?
If stuck in a loop, temporarily reduce the CPU count to 9 or 8.

Re: ERROR:There is no domain decomposition for 10 ranks that

Posted: Mon Jul 06, 2020 9:17 pm
by HannuN
It is stuck in a loop.

I'm fairly fed up with this. It happens more often recently.
I check the clients at most once per 24h.

Code:

21:12:02:WU02:FS00:0xa7:ERROR:
21:12:02:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
21:12:02:WU02:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
21:12:02:WU02:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
21:12:02:WU02:FS00:0xa7:ERROR:
21:12:02:WU02:FS00:0xa7:ERROR:Fatal error:
21:12:02:WU02:FS00:0xa7:ERROR:There is no domain decomposition for 10 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
21:12:02:WU02:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
21:12:02:WU02:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
21:12:02:WU02:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
21:12:02:WU02:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
21:12:02:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
21:12:02:WU01:FS01:0x22:Completed 90000 out of 600000 steps (15%)
21:12:07:WU02:FS00:0xa7:WARNING:Unexpected exit() call
21:12:07:WU02:FS00:0xa7:WARNING:Unexpected exit from science code


Re: ERROR:There is no domain decomposition for 10 ranks that

Posted: Mon Jul 06, 2020 9:38 pm
by Joe_H
Then change the CPU thread setting to something else. As shown in your first log, you have 12 CPU threads, and half of those come from HT (Hyper-Threading). Using just the 6 FPUs by assigning a value of 6 will get about 75-80% of the possible processing power out of your system. Leaving it set at 11 will always adjust down to 10.
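
For anyone wondering why 11 always ends up as 10: the log line "Reducing thread count from 11 to 10 to avoid domain decomposition by a prime number > 3" suggests something like the rule below. This is only a rough sketch inferred from that one log message; the function names are made up for illustration and the real FahCore_a7 logic is not shown anywhere in this thread.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check, fine for small thread counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def reduce_thread_count(requested: int) -> int:
    """Hypothetical: step down while the count is a prime larger than 3."""
    n = requested
    while n > 3 and is_prime(n):
        n -= 1
    return n


print(reduce_thread_count(11))  # 10
print(reduce_thread_count(6))   # 6
```

Note that a rule like this only avoids primes. It does not notice that 10 still contains the factor 5, which is where the domain decomposition error comes from.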

Re: ERROR:There is no domain decomposition for 10 ranks that

Posted: Tue Jul 07, 2020 1:29 am
by JimboPalmer
Welcome to Folding@Home!

I am going to give some philosophy, then advice.

If a GPU exists, one thread (F@H calls them CPUs) is devoted to each GPU. In your example, while you have 12 threads, the most you can fold on is 11, as one is feeding data to the GPU.

The science code is called GROMACS, and there is a team of programmers who work on improving it. Then F@H uses GROMACS to fold on CPUs.
https://en.wikipedia.org/wiki/GROMACS

Sadly, GROMACS has issues with 'large' prime numbers and numbers with 'large' prime factors. 2 and 3 are not large, 5 is sometimes large, and 7 and up are always large. In your case, 10 has 2 and 5 as factors and 5 is causing trouble. You will notice that the numbers being recommended only have 2 and 3 as factors.

(During your schooling, they swore you would use that math as a grown-up. This is it.)

So that is why folks are suggesting 9, 8, or 6 as better choices than 11 or 10.
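
If you want to check a thread count yourself, the rule of thumb above can be written down as a few lines of Python. This is just a sketch of the rule as described in this post (2 and 3 fine, 5 sometimes a problem, 7 and up always a problem); the function names and the "good" / "may work" / "avoid" labels are my own, and whether a given count actually works still depends on the project's box size.

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (returns 1 for n == 1)."""
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    if n > 1:
        # Whatever is left over is itself a prime factor.
        largest = n
    return largest


def rate_thread_count(n: int) -> str:
    """Rate a CPU count by its largest prime factor (rule of thumb only)."""
    lpf = largest_prime_factor(n)
    if lpf <= 3:
        return "good"
    if lpf == 5:
        return "may work"
    return "avoid"


for n in (6, 8, 9, 10, 11):
    print(n, rate_thread_count(n))
```

With this rule, 6, 8, and 9 come out as "good", 10 as "may work", and 11 as "avoid".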

You are using Linux, which I know little about, but if you have a GUI that supports FAHControl, here are the directions.

Start FAHControl.

On the left of the screen is a Configure button; click it.

Now you get a screen with a Slots tab; click it.

In the white field there should be a cpu item; click it and then click Edit.

By default F@H sets the number of CPUs to -1, meaning let the software decide.
You can enter any number from 1 to the number of threads your CPU supports.

If you have GPUs, F@H reserves one CPU per GPU to feed it data across the PCIE bus.

1, 2, 3, 4, 6, 8, and 9 are good numbers of CPUs to choose.
5 and 10 may work most of the time. Other numbers will bite you.
Type the number you want, and click Save.
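
To put the numbers above together for a specific machine, here is a small sketch that lists the "good" CPU counts given the total thread count and the number of GPUs. It assumes, as described above, that one thread is reserved per GPU and that only counts built from factors of 2 and 3 are considered safe; the function names are hypothetical and this is not anything the client itself runs.

```python
def only_factors_2_and_3(n: int) -> bool:
    """True if n has no prime factors other than 2 and 3 (1 counts as True)."""
    for p in (2, 3):
        while n % p == 0:
            n //= p
    return n == 1


def candidate_cpu_counts(hardware_threads: int, gpus: int = 0) -> list:
    """List safe CPU counts after reserving one thread per GPU."""
    usable = hardware_threads - gpus
    return [n for n in range(1, usable + 1) if only_factors_2_and_3(n)]


# 12 hardware threads with one GPU slot, as in the logs in this thread.
print(candidate_cpu_counts(12, gpus=1))  # [1, 2, 3, 4, 6, 8, 9]
```

The largest entry is the one to try first if you want to keep as many threads busy as possible.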

Re: ERROR:There is no domain decomposition for 10 ranks that

Posted: Wed Jul 08, 2020 2:39 pm
by HannuN
...
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 25533
14:38:43:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)

Re: ERROR:There is no domain decomposition for 10 ranks that

Posted: Wed Jul 08, 2020 9:33 pm
by _r2w_ben
HannuN, thanks for the reports! Could you include the project number for the most recent occurrence?