Problem with p14520

Moderators: Site Moderators, FAHC Science Team

steffenmoser
Posts: 6
Joined: Fri Nov 07, 2014 10:10 am

Problem with p14520

Post by steffenmoser »

Hi all,

The project p14520 results in the following error directly after start:

Code:

11:34:14:WU01:FS00:0xa7:ERROR:Fatal error:
11:34:14:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 15 ranks that is compatible with the given box and a minimum cell size of 1.46925 nm
11:34:14:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
11:34:14:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
11:34:14:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
11:34:14:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
I run client 7.5.1 on openSUSE 15.0 with Linux kernel 4.12, on an AMD Opteron(tm) Processor 4386 system (2 CPUs, 16 cores) with 128 GB of RAM. Here are the details:

Code:

11:34:59:WU01:FS00:Starting
11:34:59:WU01:FS00:Running FahCore: /home/steffen/dnet/fah/FAHCoreWrapper /home/steffen/dnet/fah/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 705 -lifeline 6982 -checkpoint 15 -np 15
11:34:59:WU01:FS00:Started FahCore on PID 7033
11:34:59:WU01:FS00:Core PID:7037
11:34:59:WU01:FS00:FahCore 0xa7 started
11:35:00:WU01:FS00:0xa7:*********************** Log Started 2020-03-07T11:34:59Z ***********************
11:35:00:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
11:35:00:WU01:FS00:0xa7:       Type: 0xa7
11:35:00:WU01:FS00:0xa7:       Core: Gromacs
11:35:00:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 705 -lifeline 7033 -checkpoint 15 -np
11:35:00:WU01:FS00:0xa7:             15
11:35:00:WU01:FS00:0xa7:************************************ CBang *************************************
11:35:00:WU01:FS00:0xa7:       Date: Nov 5 2019
11:35:00:WU01:FS00:0xa7:       Time: 06:06:57
11:35:00:WU01:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
11:35:00:WU01:FS00:0xa7:     Branch: master
11:35:00:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
11:35:00:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
11:35:00:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
11:35:00:WU01:FS00:0xa7:       Bits: 64
11:35:00:WU01:FS00:0xa7:       Mode: Release
11:35:00:WU01:FS00:0xa7:************************************ System ************************************
11:35:00:WU01:FS00:0xa7:        CPU: AMD Opteron(tm) Processor 4386
11:35:00:WU01:FS00:0xa7:     CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
11:35:00:WU01:FS00:0xa7:       CPUs: 16
11:35:00:WU01:FS00:0xa7:     Memory: 125.90GiB
11:35:00:WU01:FS00:0xa7:Free Memory: 111.75GiB
11:35:00:WU01:FS00:0xa7:    Threads: POSIX_THREADS
11:35:00:WU01:FS00:0xa7: OS Version: 4.12
11:35:00:WU01:FS00:0xa7:Has Battery: false
11:35:00:WU01:FS00:0xa7: On Battery: false
11:35:00:WU01:FS00:0xa7: UTC Offset: 1
11:35:00:WU01:FS00:0xa7:        PID: 7037
11:35:00:WU01:FS00:0xa7:        CWD: /home/steffen/dnet/fah/work
11:35:00:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
11:35:00:WU01:FS00:0xa7:    Version: 0.0.18
11:35:00:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
11:35:00:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
11:35:00:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
11:35:00:WU01:FS00:0xa7:       Date: Nov 5 2019
11:35:00:WU01:FS00:0xa7:       Time: 06:13:26
11:35:00:WU01:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
11:35:00:WU01:FS00:0xa7:     Branch: master
11:35:00:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
11:35:00:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
11:35:00:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
11:35:00:WU01:FS00:0xa7:       Bits: 64
11:35:00:WU01:FS00:0xa7:       Mode: Release
11:35:00:WU01:FS00:0xa7:************************************ Build *************************************
11:35:00:WU01:FS00:0xa7:       SIMD: avx_256
11:35:00:WU01:FS00:0xa7:********************************************************************************
11:35:00:WU01:FS00:0xa7:Project: 14520 (Run 0, Clone 18, Gen 29)
11:35:00:WU01:FS00:0xa7:Unit: 0x0000002480fccb0a5e349475e906b0f3
11:35:00:WU01:FS00:0xa7:Reading tar file core.xml
11:35:00:WU01:FS00:0xa7:Reading tar file frame29.tpr
11:35:00:WU01:FS00:0xa7:Digital signatures verified
11:35:00:WU01:FS00:0xa7:Calling: mdrun -s frame29.tpr -o frame29.trr -x frame29.xtc -cpt 15 -nt 15
11:35:00:WU01:FS00:0xa7:Steps: first=7250000 total=250000
11:35:00:WU01:FS00:0xa7:ERROR:
11:35:00:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
11:35:00:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
11:35:00:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
11:35:00:WU01:FS00:0xa7:ERROR:
11:35:00:WU01:FS00:0xa7:ERROR:Fatal error:
11:35:00:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 15 ranks that is compatible with the given box and a minimum cell size of 1.46925 nm
11:35:00:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
11:35:00:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
11:35:00:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
11:35:00:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
11:35:00:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
Does anybody know what I could do here? Unfortunately, my machine currently receives only p14520 work units.

Kind regards,
Steffen
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: Problem with p14520

Post by foldy »

Maybe it helps to reduce the number of CPU threads to 14, or to an even lower number?
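
To illustrate why a lower thread count can help, here is a rough sketch of the check behind the error message: can the rank count be split into an nx x ny x nz grid whose cells are all at least the minimum cell size? The box dimensions are not given anywhere in this thread, so the 6.0 nm cubic box is purely a made-up assumption, and `feasible_grids` is a hypothetical helper, not a GROMACS function. The real domain decomposition in GROMACS considers more than this.

```python
from itertools import product

# Minimum cell size taken from the error message in the first post.
MIN_CELL_NM = 1.46925

# ASSUMPTION: the real box of p14520 is not shown in this thread.
# A 6.0 nm cubic box is used here only for illustration.
HYPOTHETICAL_BOX_NM = (6.0, 6.0, 6.0)


def feasible_grids(ranks, box_nm, min_cell_nm):
    """Return all (nx, ny, nz) grids with nx * ny * nz == ranks in which
    every cell edge (box length / cell count) is at least min_cell_nm."""
    grids = []
    for nx, ny in product(range(1, ranks + 1), repeat=2):
        if ranks % (nx * ny) != 0:
            continue  # nx * ny must divide the rank count exactly
        nz = ranks // (nx * ny)
        cells = (nx, ny, nz)
        if all(length / n >= min_cell_nm for length, n in zip(box_nm, cells)):
            grids.append(cells)
    return grids


if __name__ == "__main__":
    for ranks in (16, 15, 14, 12):
        grids = feasible_grids(ranks, HYPOTHETICAL_BOX_NM, MIN_CELL_NM)
        print(ranks, "ranks:", grids if grids else "no compatible grid")
```

With this assumed box, at most 4 cells fit along each axis, so any rank count that needs a factor of 5 or 7 along one axis cannot be placed, while counts built only from 2s and 3s can.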
steffenmoser
Posts: 6
Joined: Fri Nov 07, 2014 10:10 am

Re: Problem with p14520

Post by steffenmoser »

foldy wrote: Maybe it helps to reduce the number of CPU threads to 14, or to an even lower number?
Yes, it helped. Thank you very much. The thread count was then even reduced further automatically:

Code:

15:17:25:WU01:FS00:0xa7:Reducing thread count from 14 to 13 to avoid domain decomposition with large prime factor 7
15:17:25:WU01:FS00:0xa7:Reducing thread count from 13 to 12 to avoid domain decomposition by a prime number > 3
15:17:25:WU01:FS00:0xa7:Calling: mdrun -s frame7.tpr -o frame7.trr -x frame7.xtc -cpt 15 -nt 12
I am just wondering whether it would be better to define two CPU slots, so that no slot spans both sockets on this NUMA architecture.

Kind regards,
Steffen
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: Problem with p14520

Post by JimboPalmer »

The issue is 'large' primes and their multiples.
Your rig has 16 threads (F@H will call them CPUs) but for some reason is only using 15 of them. (It is possible you have 'helpfully' edited out all mention of your GPU; it is also possible you are running on Medium instead of Full.)
15 is 5 x 3, and its factor of 5 is treated as 'large' in the WU, but not in the client. Setting it to 14 (7 x 2) is so obviously 'large' that the client recognizes it and adjusts to 12.
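
As a rough sketch of that adjustment, the following reproduces the two log lines posted above (14 to 13 because of prime factor 7, then 13 to 12 because 13 is a prime greater than 3). This is not the actual client or core code: the function names are made up, and the threshold of 7 for a 'large' prime factor is only an assumption inferred from the fact that 14 was reduced while 15 was passed through unchanged.

```python
def prime_factors(n):
    """Return the prime factors of n (with repetition) in ascending order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def adjust_thread_count(n, large_prime=7):
    """Lower n until it is neither a prime > 3 nor divisible by a prime
    >= large_prime. ASSUMPTION: large_prime=7 is inferred from the log
    lines in this thread, not taken from any documentation."""
    while n > 1:
        factors = prime_factors(n)
        is_prime = len(factors) == 1
        if (is_prime and n > 3) or max(factors) >= large_prime:
            n -= 1  # step down by one thread and check again
        else:
            break
    return n


if __name__ == "__main__":
    for requested in (16, 15, 14):
        print(requested, "->", adjust_thread_count(requested))
```

Under these assumptions 16 and 15 are left alone and 14 ends up at 12, which matches what was reported here: 15 slipped through the check and only failed later inside GROMACS.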

If you have a supported GPU, a CPU slot of 12 and a CPU slot of 3 might make the most Points Per Day. (You have no way of forcing two CPU slots of 6 to divide among chips)

If you do not fold on a GPU, manually setting 16 CPUs in a single slot is even better.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Problem with p14520

Post by toTOW »

I've notified the researcher in charge of this project about the issue with 15 threads on this project.

If you're not feeding a GPU on this system, you'll have less trouble if you set the client to use all 16 of your threads ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.