Certain WUs keep erroring with high thread numbers

Postby Celmor » Mon May 18, 2020 6:16 pm

I've tried thread counts of 28, 30 and 32, and even two slots of 16 threads each (one of which was affected), but after some time, whenever a particular WU is served, the core keeps failing and then SEGFAULTing (and creating dumps) with these errors in the log:
16:28:28:WU00:FS00:0xa7:ERROR:
16:28:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:28:28:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
16:28:28:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
16:28:28:WU00:FS00:0xa7:ERROR:
16:28:28:WU00:FS00:0xa7:ERROR:Fatal error:
16:28:28:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
16:28:28:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
16:28:28:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
16:28:28:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
16:28:28:WU00:FS00:0xa7:ERROR:website at <cut>
16:28:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:28:33:WU00:FS00:0xa7:WARNING:Unexpected exit() call
16:28:33:WU00:FS00:0xa7:WARNING:Unexpected exit from science code


I have a 16-core/32-thread processor (AMD Ryzen 9 3950X) and would like to keep a few cores free for the system, but I don't want to go as low as 24 threads for FAH or partition the cores into slots because of the penalty (WUs returned faster earn more points). Nothing is overclocked (the CPU runs at 4 GHz), and as far as I can tell there are no overheating or other instability issues. I have been experiencing this problem for over half a month, ever since I started folding on this hardware. I therefore suspect the problem is caused by the number of assigned CPU threads.
I have read the troubleshooting guide, which covers my error code and states that thread counts that are a power of 2 should be safe, but since I also experienced problems with 16 and 32 threads, this did not help.
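
To illustrate what I understand the error message to be complaining about, here is a rough sketch of a domain decomposition feasibility check. It is not how GROMACS actually decides (the real logic also involves options such as -rcon and -dds, and the log shows -nt 28 while the error mentions 20 ranks, so presumably not every thread becomes a decomposition rank). The function name `feasible_grids` and the 5.0 nm cubic box are made up for illustration; only the minimum cell size of 1.37225 nm comes from my log, and the real box size of this WU is unknown to me.

```python
def feasible_grids(ranks, box_nm, min_cell_nm):
    """Return every (nx, ny, nz) with nx*ny*nz == ranks whose cells are
    at least min_cell_nm wide along each box axis (simplified model)."""
    grids = []
    for nx in range(1, ranks + 1):
        if ranks % nx:
            continue
        rest = ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny:
                continue
            nz = rest // ny
            cells = (nx, ny, nz)
            # Each cell edge is the box length divided by the cell count on that axis.
            if all(length / n >= min_cell_nm for length, n in zip(box_nm, cells)):
                grids.append(cells)
    return grids


if __name__ == "__main__":
    box = (5.0, 5.0, 5.0)   # HYPOTHETICAL box in nm, not taken from the WU
    min_cell = 1.37225      # minimum cell size reported in the error message
    for ranks in (12, 16, 18, 20):
        print(ranks, feasible_grids(ranks, box, min_cell))
```

With this made-up box at most 3 cells fit along each axis, so 20 (which needs a factor of 5) and 16 (which needs a factor of 4 somewhere) have no valid grid, while 12 and 18 do. That would at least be consistent with a power of 2 not being automatically safe for a small box, but it is only a toy model.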

Thanks in advance for any pointers.

Full log:
16:20:27:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 334756 -checkpoint 15 -np 28
16:20:27:WU00:FS00:Started FahCore on PID 2022548
16:20:27:WU00:FS00:Core PID:2022552
16:20:27:WU00:FS00:FahCore 0xa7 started
16:20:28:WU00:FS00:0xa7:*********************** Log Started 2020-05-18T16:20:27Z ***********************
16:20:28:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
16:20:28:WU00:FS00:0xa7:       Type: 0xa7
16:20:28:WU00:FS00:0xa7:       Core: Gromacs
16:20:28:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 2022548 -checkpoint 15
16:20:28:WU00:FS00:0xa7:             -np 28
16:20:28:WU00:FS00:0xa7:************************************ CBang *************************************
16:20:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:20:28:WU00:FS00:0xa7:       Time: 06:06:57
16:20:28:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
16:20:28:WU00:FS00:0xa7:     Branch: master
16:20:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:20:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
16:20:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:20:28:WU00:FS00:0xa7:       Bits: 64
16:20:28:WU00:FS00:0xa7:       Mode: Release
16:20:28:WU00:FS00:0xa7:************************************ System ************************************
16:20:28:WU00:FS00:0xa7:        CPU: AMD Ryzen 9 3950X 16-Core Processor
16:20:28:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:20:28:WU00:FS00:0xa7:       CPUs: 32
16:20:28:WU00:FS00:0xa7:     Memory: 47.06GiB
16:20:28:WU00:FS00:0xa7:Free Memory: 6.54GiB
16:20:28:WU00:FS00:0xa7:    Threads: POSIX_THREADS
16:20:28:WU00:FS00:0xa7: OS Version: 5.6
16:20:28:WU00:FS00:0xa7:Has Battery: false
16:20:28:WU00:FS00:0xa7: On Battery: false
16:20:28:WU00:FS00:0xa7: UTC Offset: 2
16:20:28:WU00:FS00:0xa7:        PID: 2022552
16:20:28:WU00:FS00:0xa7:        CWD: /var/lib/private/fah/work
16:20:28:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
16:20:28:WU00:FS00:0xa7:    Version: 0.0.18
16:20:28:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:20:28:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
16:20:28:WU00:FS00:0xa7:   Homepage: <cut>
16:20:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:20:28:WU00:FS00:0xa7:       Time: 06:13:26
16:20:28:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
16:20:28:WU00:FS00:0xa7:     Branch: master
16:20:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:20:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
16:20:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:20:28:WU00:FS00:0xa7:       Bits: 64
16:20:28:WU00:FS00:0xa7:       Mode: Release
16:20:28:WU00:FS00:0xa7:************************************ Build *************************************
16:20:28:WU00:FS00:0xa7:       SIMD: avx_256
16:20:28:WU00:FS00:0xa7:********************************************************************************
16:20:28:WU00:FS00:0xa7:Project: 16427 (Run 0, Clone 2618, Gen 121)
16:20:28:WU00:FS00:0xa7:Unit: 0x0000008da8f5c67d5e924aab863769f3
16:20:28:WU00:FS00:0xa7:Reading tar file core.xml
16:20:28:WU00:FS00:0xa7:Reading tar file frame121.tpr
16:20:28:WU00:FS00:0xa7:Digital signatures verified
16:20:28:WU00:FS00:0xa7:Calling: mdrun -s frame121.tpr -o frame121.trr -x frame121.xtc -cpt 15 -nt 28
16:20:28:WU00:FS00:0xa7:Steps: first=60500000 total=500000
16:20:28:WU00:FS00:0xa7:ERROR:
16:20:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:20:28:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
16:20:28:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
16:20:28:WU00:FS00:0xa7:ERROR:
16:20:28:WU00:FS00:0xa7:ERROR:Fatal error:
16:20:28:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
16:20:28:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
16:20:28:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
16:20:28:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
16:20:28:WU00:FS00:0xa7:ERROR:website at <cut>
16:20:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:20:33:WU00:FS00:0xa7:WARNING:Unexpected exit() call
16:20:33:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
16:20:33:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
16:20:33:WU00:FS00:0xa7:Saving result file md.log
16:20:33:WU00:FS00:0xa7:Saving result file science.log
16:20:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
16:21:27:WU00:FS00:Starting
16:21:27:WU00:FS00:Removing old file 'work/00/logfile_01-20200518-154927.txt'
16:21:27:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 334756 -checkpoint 15 -np 28
16:21:27:WU00:FS00:Started FahCore on PID 2022637
16:21:27:WU00:FS00:Core PID:2022641
16:21:27:WU00:FS00:FahCore 0xa7 started
16:21:28:WU00:FS00:0xa7:*********************** Log Started 2020-05-18T16:21:27Z ***********************
16:21:28:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
16:21:28:WU00:FS00:0xa7:       Type: 0xa7
16:21:28:WU00:FS00:0xa7:       Core: Gromacs
16:21:28:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 2022637 -checkpoint 15
16:21:28:WU00:FS00:0xa7:             -np 28
16:21:28:WU00:FS00:0xa7:************************************ CBang *************************************
16:21:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:21:28:WU00:FS00:0xa7:       Time: 06:06:57
16:21:28:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
16:21:28:WU00:FS00:0xa7:     Branch: master
16:21:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:21:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
16:21:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:21:28:WU00:FS00:0xa7:       Bits: 64
16:21:28:WU00:FS00:0xa7:       Mode: Release
16:21:28:WU00:FS00:0xa7:************************************ System ************************************
16:21:28:WU00:FS00:0xa7:        CPU: AMD Ryzen 9 3950X 16-Core Processor
16:21:28:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:21:28:WU00:FS00:0xa7:       CPUs: 32
16:21:28:WU00:FS00:0xa7:     Memory: 47.06GiB
16:21:28:WU00:FS00:0xa7:Free Memory: 6.53GiB
16:21:28:WU00:FS00:0xa7:    Threads: POSIX_THREADS
16:21:28:WU00:FS00:0xa7: OS Version: 5.6
16:21:28:WU00:FS00:0xa7:Has Battery: false
16:21:28:WU00:FS00:0xa7: On Battery: false
16:21:28:WU00:FS00:0xa7: UTC Offset: 2
16:21:28:WU00:FS00:0xa7:        PID: 2022641
16:21:28:WU00:FS00:0xa7:        CWD: /var/lib/private/fah/work
16:21:28:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
16:21:28:WU00:FS00:0xa7:    Version: 0.0.18
16:21:28:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:21:28:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
16:21:28:WU00:FS00:0xa7:   Homepage: <cut>
16:21:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:21:28:WU00:FS00:0xa7:       Time: 06:13:26
16:21:28:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
16:21:28:WU00:FS00:0xa7:     Branch: master
16:21:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:21:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
16:21:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:21:28:WU00:FS00:0xa7:       Bits: 64
16:21:28:WU00:FS00:0xa7:       Mode: Release
16:21:28:WU00:FS00:0xa7:************************************ Build *************************************
16:21:28:WU00:FS00:0xa7:       SIMD: avx_256
16:21:28:WU00:FS00:0xa7:********************************************************************************
16:21:28:WU00:FS00:0xa7:Project: 16427 (Run 0, Clone 2618, Gen 121)
16:21:28:WU00:FS00:0xa7:Unit: 0x0000008da8f5c67d5e924aab863769f3
16:21:28:WU00:FS00:0xa7:Reading tar file core.xml
16:21:28:WU00:FS00:0xa7:Reading tar file frame121.tpr
16:21:28:WU00:FS00:0xa7:Digital signatures verified
16:21:28:WU00:FS00:0xa7:Calling: mdrun -s frame121.tpr -o frame121.trr -x frame121.xtc -cpt 15 -nt 28
16:21:28:WU00:FS00:0xa7:Steps: first=60500000 total=500000
16:21:28:WU00:FS00:0xa7:ERROR:
16:21:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:21:28:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
16:21:28:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
16:21:28:WU00:FS00:0xa7:ERROR:
16:21:28:WU00:FS00:0xa7:ERROR:Fatal error:
16:21:28:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
16:21:28:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
16:21:28:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
16:21:28:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
16:21:28:WU00:FS00:0xa7:ERROR:website at <cut>
16:21:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:21:33:WU00:FS00:0xa7:WARNING:Unexpected exit() call
16:21:33:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
16:21:33:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
16:21:33:WU00:FS00:0xa7:Saving result file md.log
16:21:33:WU00:FS00:0xa7:Saving result file science.log
16:21:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
16:22:28:WU00:FS00:Starting
16:22:28:WU00:FS00:Removing old file 'work/00/logfile_01-20200518-155027.txt'
16:22:28:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 334756 -checkpoint 15 -np 28
16:22:28:WU00:FS00:Started FahCore on PID 2022724
16:22:28:WU00:FS00:Core PID:2022728
16:22:28:WU00:FS00:FahCore 0xa7 started
16:22:28:WU00:FS00:0xa7:*********************** Log Started 2020-05-18T16:22:28Z ***********************
16:22:28:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
16:22:28:WU00:FS00:0xa7:       Type: 0xa7
16:22:28:WU00:FS00:0xa7:       Core: Gromacs
16:22:28:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 2022724 -checkpoint 15
16:22:28:WU00:FS00:0xa7:             -np 28
16:22:28:WU00:FS00:0xa7:************************************ CBang *************************************
16:22:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:22:28:WU00:FS00:0xa7:       Time: 06:06:57
16:22:28:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
16:22:28:WU00:FS00:0xa7:     Branch: master
16:22:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:22:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
16:22:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:22:28:WU00:FS00:0xa7:       Bits: 64
16:22:28:WU00:FS00:0xa7:       Mode: Release
16:22:28:WU00:FS00:0xa7:************************************ System ************************************
16:22:28:WU00:FS00:0xa7:        CPU: AMD Ryzen 9 3950X 16-Core Processor
16:22:28:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:22:28:WU00:FS00:0xa7:       CPUs: 32
16:22:28:WU00:FS00:0xa7:     Memory: 47.06GiB
16:22:28:WU00:FS00:0xa7:Free Memory: 6.60GiB
16:22:28:WU00:FS00:0xa7:    Threads: POSIX_THREADS
16:22:28:WU00:FS00:0xa7: OS Version: 5.6
16:22:28:WU00:FS00:0xa7:Has Battery: false
16:22:28:WU00:FS00:0xa7: On Battery: false
16:22:28:WU00:FS00:0xa7: UTC Offset: 2
16:22:28:WU00:FS00:0xa7:        PID: 2022728
16:22:28:WU00:FS00:0xa7:        CWD: /var/lib/private/fah/work
16:22:28:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
16:22:28:WU00:FS00:0xa7:    Version: 0.0.18
16:22:28:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:22:28:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
16:22:28:WU00:FS00:0xa7:   Homepage: <cut>
16:22:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:22:28:WU00:FS00:0xa7:       Time: 06:13:26
16:22:28:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
16:22:28:WU00:FS00:0xa7:     Branch: master
16:22:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:22:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
16:22:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:22:28:WU00:FS00:0xa7:       Bits: 64
16:22:28:WU00:FS00:0xa7:       Mode: Release
16:22:28:WU00:FS00:0xa7:************************************ Build *************************************
16:22:28:WU00:FS00:0xa7:       SIMD: avx_256
16:22:28:WU00:FS00:0xa7:********************************************************************************
16:22:28:WU00:FS00:0xa7:Project: 16427 (Run 0, Clone 2618, Gen 121)
16:22:28:WU00:FS00:0xa7:Unit: 0x0000008da8f5c67d5e924aab863769f3
16:22:28:WU00:FS00:0xa7:Reading tar file core.xml
16:22:28:WU00:FS00:0xa7:Reading tar file frame121.tpr
16:22:28:WU00:FS00:0xa7:Digital signatures verified
16:22:28:WU00:FS00:0xa7:Calling: mdrun -s frame121.tpr -o frame121.trr -x frame121.xtc -cpt 15 -nt 28
16:22:28:WU00:FS00:0xa7:Steps: first=60500000 total=500000
16:22:28:WU00:FS00:0xa7:ERROR:
16:22:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:22:28:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
16:22:28:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
16:22:28:WU00:FS00:0xa7:ERROR:
16:22:28:WU00:FS00:0xa7:ERROR:Fatal error:
16:22:28:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
16:22:28:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
16:22:28:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
16:22:28:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
16:22:28:WU00:FS00:0xa7:ERROR:website at <cut>
16:22:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:22:33:WU00:FS00:0xa7:WARNING:Unexpected exit() call
16:22:33:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
16:22:33:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
16:22:33:WU00:FS00:0xa7:Saving result file md.log
16:22:33:WU00:FS00:0xa7:Saving result file science.log
16:22:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
16:23:28:WU00:FS00:Starting
16:23:28:WU00:FS00:Removing old file 'work/00/logfile_01-20200518-155127.txt'
16:23:28:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 334756 -checkpoint 15 -np 28
16:23:28:WU00:FS00:Started FahCore on PID 2022811
16:23:28:WU00:FS00:Core PID:2022815
16:23:28:WU00:FS00:FahCore 0xa7 started
16:23:28:WU00:FS00:0xa7:*********************** Log Started 2020-05-18T16:23:28Z ***********************
16:23:28:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
16:23:28:WU00:FS00:0xa7:       Type: 0xa7
16:23:28:WU00:FS00:0xa7:       Core: Gromacs
16:23:28:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 2022811 -checkpoint 15
16:23:28:WU00:FS00:0xa7:             -np 28
16:23:28:WU00:FS00:0xa7:************************************ CBang *************************************
16:23:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:23:28:WU00:FS00:0xa7:       Time: 06:06:57
16:23:28:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
16:23:28:WU00:FS00:0xa7:     Branch: master
16:23:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:23:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
16:23:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:23:28:WU00:FS00:0xa7:       Bits: 64
16:23:28:WU00:FS00:0xa7:       Mode: Release
16:23:28:WU00:FS00:0xa7:************************************ System ************************************
16:23:28:WU00:FS00:0xa7:        CPU: AMD Ryzen 9 3950X 16-Core Processor
16:23:28:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:23:28:WU00:FS00:0xa7:       CPUs: 32
16:23:28:WU00:FS00:0xa7:     Memory: 47.06GiB
16:23:28:WU00:FS00:0xa7:Free Memory: 6.65GiB
16:23:28:WU00:FS00:0xa7:    Threads: POSIX_THREADS
16:23:28:WU00:FS00:0xa7: OS Version: 5.6
16:23:28:WU00:FS00:0xa7:Has Battery: false
16:23:28:WU00:FS00:0xa7: On Battery: false
16:23:28:WU00:FS00:0xa7: UTC Offset: 2
16:23:28:WU00:FS00:0xa7:        PID: 2022815
16:23:28:WU00:FS00:0xa7:        CWD: /var/lib/private/fah/work
16:23:28:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
16:23:28:WU00:FS00:0xa7:    Version: 0.0.18
16:23:28:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:23:28:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
16:23:28:WU00:FS00:0xa7:   Homepage: <cut>
16:23:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:23:28:WU00:FS00:0xa7:       Time: 06:13:26
16:23:28:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
16:23:28:WU00:FS00:0xa7:     Branch: master
16:23:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:23:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
16:23:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:23:28:WU00:FS00:0xa7:       Bits: 64
16:23:28:WU00:FS00:0xa7:       Mode: Release
16:23:28:WU00:FS00:0xa7:************************************ Build *************************************
16:23:28:WU00:FS00:0xa7:       SIMD: avx_256
16:23:28:WU00:FS00:0xa7:********************************************************************************
16:23:28:WU00:FS00:0xa7:Project: 16427 (Run 0, Clone 2618, Gen 121)
16:23:28:WU00:FS00:0xa7:Unit: 0x0000008da8f5c67d5e924aab863769f3
16:23:28:WU00:FS00:0xa7:Reading tar file core.xml
16:23:28:WU00:FS00:0xa7:Reading tar file frame121.tpr
16:23:28:WU00:FS00:0xa7:Digital signatures verified
16:23:28:WU00:FS00:0xa7:Calling: mdrun -s frame121.tpr -o frame121.trr -x frame121.xtc -cpt 15 -nt 28
16:23:28:WU00:FS00:0xa7:Steps: first=60500000 total=500000
16:23:28:WU00:FS00:0xa7:ERROR:
16:23:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:23:28:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
16:23:28:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
16:23:28:WU00:FS00:0xa7:ERROR:
16:23:28:WU00:FS00:0xa7:ERROR:Fatal error:
16:23:28:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
16:23:28:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
16:23:28:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
16:23:28:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
16:23:28:WU00:FS00:0xa7:ERROR:website at <cut>
16:23:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:23:33:WU00:FS00:0xa7:WARNING:Unexpected exit() call
16:23:33:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
16:23:33:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
16:23:33:WU00:FS00:0xa7:Saving result file md.log
16:23:33:WU00:FS00:0xa7:Saving result file science.log
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:0xa7:Caught signal SIGSEGV(11) on PID 2022815
16:23:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
16:24:28:WU00:FS00:Starting
16:24:28:WU00:FS00:Removing old file 'work/00/logfile_01-20200518-155227.txt'
16:24:28:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 334756 -checkpoint 15 -np 28
16:24:28:WU00:FS00:Started FahCore on PID 2022973
16:24:28:WU00:FS00:Core PID:2022977
16:24:28:WU00:FS00:FahCore 0xa7 started
16:24:28:WU00:FS00:0xa7:*********************** Log Started 2020-05-18T16:24:28Z ***********************
16:24:28:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
16:24:28:WU00:FS00:0xa7:       Type: 0xa7
16:24:28:WU00:FS00:0xa7:       Core: Gromacs
16:24:28:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 2022973 -checkpoint 15
16:24:28:WU00:FS00:0xa7:             -np 28
16:24:28:WU00:FS00:0xa7:************************************ CBang *************************************
16:24:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:24:28:WU00:FS00:0xa7:       Time: 06:06:57
16:24:28:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
16:24:28:WU00:FS00:0xa7:     Branch: master
16:24:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:24:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
16:24:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:24:28:WU00:FS00:0xa7:       Bits: 64
16:24:28:WU00:FS00:0xa7:       Mode: Release
16:24:28:WU00:FS00:0xa7:************************************ System ************************************
16:24:28:WU00:FS00:0xa7:        CPU: AMD Ryzen 9 3950X 16-Core Processor
16:24:28:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:24:28:WU00:FS00:0xa7:       CPUs: 32
16:24:28:WU00:FS00:0xa7:     Memory: 47.06GiB
16:24:28:WU00:FS00:0xa7:Free Memory: 6.60GiB
16:24:28:WU00:FS00:0xa7:    Threads: POSIX_THREADS
16:24:28:WU00:FS00:0xa7: OS Version: 5.6
16:24:28:WU00:FS00:0xa7:Has Battery: false
16:24:28:WU00:FS00:0xa7: On Battery: false
16:24:28:WU00:FS00:0xa7: UTC Offset: 2
16:24:28:WU00:FS00:0xa7:        PID: 2022977
16:24:28:WU00:FS00:0xa7:        CWD: /var/lib/private/fah/work
16:24:28:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
16:24:28:WU00:FS00:0xa7:    Version: 0.0.18
16:24:28:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:24:28:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
16:24:28:WU00:FS00:0xa7:   Homepage: <cut>
16:24:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:24:28:WU00:FS00:0xa7:       Time: 06:13:26
16:24:28:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
16:24:28:WU00:FS00:0xa7:     Branch: master
16:24:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:24:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
16:24:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:24:28:WU00:FS00:0xa7:       Bits: 64
16:24:28:WU00:FS00:0xa7:       Mode: Release
16:24:28:WU00:FS00:0xa7:************************************ Build *************************************
16:24:28:WU00:FS00:0xa7:       SIMD: avx_256
16:24:28:WU00:FS00:0xa7:********************************************************************************
16:24:28:WU00:FS00:0xa7:Project: 16427 (Run 0, Clone 2618, Gen 121)
16:24:28:WU00:FS00:0xa7:Unit: 0x0000008da8f5c67d5e924aab863769f3
16:24:28:WU00:FS00:0xa7:Reading tar file core.xml
16:24:28:WU00:FS00:0xa7:Reading tar file frame121.tpr
16:24:28:WU00:FS00:0xa7:Digital signatures verified
16:24:28:WU00:FS00:0xa7:Calling: mdrun -s frame121.tpr -o frame121.trr -x frame121.xtc -cpt 15 -nt 28
16:24:28:WU00:FS00:0xa7:Steps: first=60500000 total=500000
16:24:28:WU00:FS00:0xa7:ERROR:
16:24:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:24:28:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
16:24:28:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
16:24:28:WU00:FS00:0xa7:ERROR:
16:24:28:WU00:FS00:0xa7:ERROR:Fatal error:
16:24:28:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
16:24:28:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
16:24:28:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
16:24:28:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
16:24:28:WU00:FS00:0xa7:ERROR:website at <cut>
16:24:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:24:33:WU00:FS00:0xa7:WARNING:Unexpected exit() call
16:24:33:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
16:24:33:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
16:24:33:WU00:FS00:0xa7:Saving result file md.log
16:24:33:WU00:FS00:0xa7:Saving result file science.log
16:24:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
16:25:28:WU00:FS00:Starting
16:25:28:WU00:FS00:Removing old file 'work/00/logfile_01-20200518-155327.txt'
16:25:28:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 334756 -checkpoint 15 -np 28
16:25:28:WU00:FS00:Started FahCore on PID 2023083
16:25:28:WU00:FS00:Core PID:2023087
16:25:28:WU00:FS00:FahCore 0xa7 started
16:25:28:WU00:FS00:0xa7:*********************** Log Started 2020-05-18T16:25:28Z ***********************
16:25:28:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
16:25:28:WU00:FS00:0xa7:       Type: 0xa7
16:25:28:WU00:FS00:0xa7:       Core: Gromacs
16:25:28:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 2023083 -checkpoint 15
16:25:28:WU00:FS00:0xa7:             -np 28
16:25:28:WU00:FS00:0xa7:************************************ CBang *************************************
16:25:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:25:28:WU00:FS00:0xa7:       Time: 06:06:57
16:25:28:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
16:25:28:WU00:FS00:0xa7:     Branch: master
16:25:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:25:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
16:25:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:25:28:WU00:FS00:0xa7:       Bits: 64
16:25:28:WU00:FS00:0xa7:       Mode: Release
16:25:28:WU00:FS00:0xa7:************************************ System ************************************
16:25:28:WU00:FS00:0xa7:        CPU: AMD Ryzen 9 3950X 16-Core Processor
16:25:28:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:25:28:WU00:FS00:0xa7:       CPUs: 32
16:25:28:WU00:FS00:0xa7:     Memory: 47.06GiB
16:25:28:WU00:FS00:0xa7:Free Memory: 6.53GiB
16:25:28:WU00:FS00:0xa7:    Threads: POSIX_THREADS
16:25:28:WU00:FS00:0xa7: OS Version: 5.6
16:25:28:WU00:FS00:0xa7:Has Battery: false
16:25:28:WU00:FS00:0xa7: On Battery: false
16:25:28:WU00:FS00:0xa7: UTC Offset: 2
16:25:28:WU00:FS00:0xa7:        PID: 2023087
16:25:28:WU00:FS00:0xa7:        CWD: /var/lib/private/fah/work
16:25:28:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
16:25:28:WU00:FS00:0xa7:    Version: 0.0.18
16:25:28:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:25:28:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
16:25:28:WU00:FS00:0xa7:   Homepage: <cut>
16:25:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:25:28:WU00:FS00:0xa7:       Time: 06:13:26
16:25:28:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
16:25:28:WU00:FS00:0xa7:     Branch: master
16:25:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:25:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
16:25:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:25:28:WU00:FS00:0xa7:       Bits: 64
16:25:28:WU00:FS00:0xa7:       Mode: Release
16:25:28:WU00:FS00:0xa7:************************************ Build *************************************
16:25:28:WU00:FS00:0xa7:       SIMD: avx_256
16:25:28:WU00:FS00:0xa7:********************************************************************************
16:25:28:WU00:FS00:0xa7:Project: 16427 (Run 0, Clone 2618, Gen 121)
16:25:28:WU00:FS00:0xa7:Unit: 0x0000008da8f5c67d5e924aab863769f3
16:25:28:WU00:FS00:0xa7:Reading tar file core.xml
16:25:28:WU00:FS00:0xa7:Reading tar file frame121.tpr
16:25:28:WU00:FS00:0xa7:Digital signatures verified
16:25:28:WU00:FS00:0xa7:Calling: mdrun -s frame121.tpr -o frame121.trr -x frame121.xtc -cpt 15 -nt 28
16:25:28:WU00:FS00:0xa7:Steps: first=60500000 total=500000
16:25:28:WU00:FS00:0xa7:ERROR:
16:25:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:25:28:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
16:25:28:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
16:25:28:WU00:FS00:0xa7:ERROR:
16:25:28:WU00:FS00:0xa7:ERROR:Fatal error:
16:25:28:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
16:25:28:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
16:25:28:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
16:25:28:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
16:25:28:WU00:FS00:0xa7:ERROR:website at <cut>
16:25:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:25:33:WU00:FS00:0xa7:WARNING:Unexpected exit() call
16:25:33:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
16:25:33:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
16:25:33:WU00:FS00:0xa7:Saving result file md.log
16:25:33:WU00:FS00:0xa7:Saving result file science.log
16:25:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
16:26:28:WU00:FS00:Starting
16:26:28:WU00:FS00:Removing old file 'work/00/logfile_01-20200518-155427.txt'
16:26:28:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 334756 -checkpoint 15 -np 28
16:26:28:WU00:FS00:Started FahCore on PID 2023175
16:26:28:WU00:FS00:Core PID:2023179
16:26:28:WU00:FS00:FahCore 0xa7 started
16:26:28:WU00:FS00:0xa7:*********************** Log Started 2020-05-18T16:26:28Z ***********************
16:26:28:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
16:26:28:WU00:FS00:0xa7:       Type: 0xa7
16:26:28:WU00:FS00:0xa7:       Core: Gromacs
16:26:28:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 2023175 -checkpoint 15
16:26:28:WU00:FS00:0xa7:             -np 28
16:26:28:WU00:FS00:0xa7:************************************ CBang *************************************
16:26:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:26:28:WU00:FS00:0xa7:       Time: 06:06:57
16:26:28:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
16:26:28:WU00:FS00:0xa7:     Branch: master
16:26:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:26:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
16:26:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:26:28:WU00:FS00:0xa7:       Bits: 64
16:26:28:WU00:FS00:0xa7:       Mode: Release
16:26:28:WU00:FS00:0xa7:************************************ System ************************************
16:26:28:WU00:FS00:0xa7:        CPU: AMD Ryzen 9 3950X 16-Core Processor
16:26:28:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:26:28:WU00:FS00:0xa7:       CPUs: 32
16:26:28:WU00:FS00:0xa7:     Memory: 47.06GiB
16:26:28:WU00:FS00:0xa7:Free Memory: 6.59GiB
16:26:28:WU00:FS00:0xa7:    Threads: POSIX_THREADS
16:26:28:WU00:FS00:0xa7: OS Version: 5.6
16:26:28:WU00:FS00:0xa7:Has Battery: false
16:26:28:WU00:FS00:0xa7: On Battery: false
16:26:28:WU00:FS00:0xa7: UTC Offset: 2
16:26:28:WU00:FS00:0xa7:        PID: 2023179
16:26:28:WU00:FS00:0xa7:        CWD: /var/lib/private/fah/work
16:26:28:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
16:26:28:WU00:FS00:0xa7:    Version: 0.0.18
16:26:28:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:26:28:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
16:26:28:WU00:FS00:0xa7:   Homepage: <cut>
16:26:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:26:28:WU00:FS00:0xa7:       Time: 06:13:26
16:26:28:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
16:26:28:WU00:FS00:0xa7:     Branch: master
16:26:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:26:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
16:26:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:26:28:WU00:FS00:0xa7:       Bits: 64
16:26:28:WU00:FS00:0xa7:       Mode: Release
16:26:28:WU00:FS00:0xa7:************************************ Build *************************************
16:26:28:WU00:FS00:0xa7:       SIMD: avx_256
16:26:28:WU00:FS00:0xa7:********************************************************************************
16:26:28:WU00:FS00:0xa7:Project: 16427 (Run 0, Clone 2618, Gen 121)
16:26:28:WU00:FS00:0xa7:Unit: 0x0000008da8f5c67d5e924aab863769f3
16:26:28:WU00:FS00:0xa7:Reading tar file core.xml
16:26:28:WU00:FS00:0xa7:Reading tar file frame121.tpr
16:26:28:WU00:FS00:0xa7:Digital signatures verified
16:26:28:WU00:FS00:0xa7:Calling: mdrun -s frame121.tpr -o frame121.trr -x frame121.xtc -cpt 15 -nt 28
16:26:28:WU00:FS00:0xa7:Steps: first=60500000 total=500000
16:26:28:WU00:FS00:0xa7:ERROR:
16:26:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:26:28:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
16:26:28:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
16:26:28:WU00:FS00:0xa7:ERROR:
16:26:28:WU00:FS00:0xa7:ERROR:Fatal error:
16:26:28:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
16:26:28:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
16:26:28:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
16:26:28:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
16:26:28:WU00:FS00:0xa7:ERROR:website at <cut>
16:26:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:26:33:WU00:FS00:0xa7:WARNING:Unexpected exit() call
16:26:33:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
16:26:33:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
16:26:33:WU00:FS00:0xa7:Saving result file md.log
16:26:33:WU00:FS00:0xa7:Saving result file science.log
16:26:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
16:27:28:WU00:FS00:Starting
16:27:28:WU00:FS00:Removing old file 'work/00/logfile_01-20200518-155527.txt'
16:27:28:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 334756 -checkpoint 15 -np 28
16:27:28:WU00:FS00:Started FahCore on PID 2023260
16:27:28:WU00:FS00:Core PID:2023264
16:27:28:WU00:FS00:FahCore 0xa7 started
16:27:28:WU00:FS00:0xa7:*********************** Log Started 2020-05-18T16:27:28Z ***********************
16:27:28:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
16:27:28:WU00:FS00:0xa7:       Type: 0xa7
16:27:28:WU00:FS00:0xa7:       Core: Gromacs
16:27:28:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 2023260 -checkpoint 15
16:27:28:WU00:FS00:0xa7:             -np 28
16:27:28:WU00:FS00:0xa7:************************************ CBang *************************************
16:27:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:27:28:WU00:FS00:0xa7:       Time: 06:06:57
16:27:28:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
16:27:28:WU00:FS00:0xa7:     Branch: master
16:27:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:27:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
16:27:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:27:28:WU00:FS00:0xa7:       Bits: 64
16:27:28:WU00:FS00:0xa7:       Mode: Release
16:27:28:WU00:FS00:0xa7:************************************ System ************************************
16:27:28:WU00:FS00:0xa7:        CPU: AMD Ryzen 9 3950X 16-Core Processor
16:27:28:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:27:28:WU00:FS00:0xa7:       CPUs: 32
16:27:28:WU00:FS00:0xa7:     Memory: 47.06GiB
16:27:28:WU00:FS00:0xa7:Free Memory: 6.59GiB
16:27:28:WU00:FS00:0xa7:    Threads: POSIX_THREADS
16:27:28:WU00:FS00:0xa7: OS Version: 5.6
16:27:28:WU00:FS00:0xa7:Has Battery: false
16:27:28:WU00:FS00:0xa7: On Battery: false
16:27:28:WU00:FS00:0xa7: UTC Offset: 2
16:27:28:WU00:FS00:0xa7:        PID: 2023264
16:27:28:WU00:FS00:0xa7:        CWD: /var/lib/private/fah/work
16:27:28:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
16:27:28:WU00:FS00:0xa7:    Version: 0.0.18
16:27:28:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:27:28:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
16:27:28:WU00:FS00:0xa7:   Homepage: <cut>
16:27:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:27:28:WU00:FS00:0xa7:       Time: 06:13:26
16:27:28:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
16:27:28:WU00:FS00:0xa7:     Branch: master
16:27:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:27:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
16:27:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:27:28:WU00:FS00:0xa7:       Bits: 64
16:27:28:WU00:FS00:0xa7:       Mode: Release
16:27:28:WU00:FS00:0xa7:************************************ Build *************************************
16:27:28:WU00:FS00:0xa7:       SIMD: avx_256
16:27:28:WU00:FS00:0xa7:********************************************************************************
16:27:28:WU00:FS00:0xa7:Project: 16427 (Run 0, Clone 2618, Gen 121)
16:27:28:WU00:FS00:0xa7:Unit: 0x0000008da8f5c67d5e924aab863769f3
16:27:28:WU00:FS00:0xa7:Reading tar file core.xml
16:27:28:WU00:FS00:0xa7:Reading tar file frame121.tpr
16:27:28:WU00:FS00:0xa7:Digital signatures verified
16:27:28:WU00:FS00:0xa7:Calling: mdrun -s frame121.tpr -o frame121.trr -x frame121.xtc -cpt 15 -nt 28
16:27:28:WU00:FS00:0xa7:Steps: first=60500000 total=500000
16:27:28:WU00:FS00:0xa7:ERROR:
16:27:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:27:28:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
16:27:28:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
16:27:28:WU00:FS00:0xa7:ERROR:
16:27:28:WU00:FS00:0xa7:ERROR:Fatal error:
16:27:28:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
16:27:28:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
16:27:28:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
16:27:28:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
16:27:28:WU00:FS00:0xa7:ERROR:website at <cut>
16:27:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:27:33:WU00:FS00:0xa7:WARNING:Unexpected exit() call
16:27:33:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
16:27:33:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
16:27:33:WU00:FS00:0xa7:Saving result file md.log
16:27:33:WU00:FS00:0xa7:Saving result file science.log
16:27:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
16:28:28:WU00:FS00:Starting
16:28:28:WU00:FS00:Removing old file 'work/00/logfile_01-20200518-155627.txt'
16:28:28:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 334756 -checkpoint 15 -np 28
16:28:28:WU00:FS00:Started FahCore on PID 2023347
16:28:28:WU00:FS00:Core PID:2023351
16:28:28:WU00:FS00:FahCore 0xa7 started
16:28:28:WU00:FS00:0xa7:*********************** Log Started 2020-05-18T16:28:28Z ***********************
16:28:28:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
16:28:28:WU00:FS00:0xa7:       Type: 0xa7
16:28:28:WU00:FS00:0xa7:       Core: Gromacs
16:28:28:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 2023347 -checkpoint 15
16:28:28:WU00:FS00:0xa7:             -np 28
16:28:28:WU00:FS00:0xa7:************************************ CBang *************************************
16:28:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:28:28:WU00:FS00:0xa7:       Time: 06:06:57
16:28:28:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
16:28:28:WU00:FS00:0xa7:     Branch: master
16:28:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:28:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
16:28:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:28:28:WU00:FS00:0xa7:       Bits: 64
16:28:28:WU00:FS00:0xa7:       Mode: Release
16:28:28:WU00:FS00:0xa7:************************************ System ************************************
16:28:28:WU00:FS00:0xa7:        CPU: AMD Ryzen 9 3950X 16-Core Processor
16:28:28:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:28:28:WU00:FS00:0xa7:       CPUs: 32
16:28:28:WU00:FS00:0xa7:     Memory: 47.06GiB
16:28:28:WU00:FS00:0xa7:Free Memory: 6.58GiB
16:28:28:WU00:FS00:0xa7:    Threads: POSIX_THREADS
16:28:28:WU00:FS00:0xa7: OS Version: 5.6
16:28:28:WU00:FS00:0xa7:Has Battery: false
16:28:28:WU00:FS00:0xa7: On Battery: false
16:28:28:WU00:FS00:0xa7: UTC Offset: 2
16:28:28:WU00:FS00:0xa7:        PID: 2023351
16:28:28:WU00:FS00:0xa7:        CWD: /var/lib/private/fah/work
16:28:28:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
16:28:28:WU00:FS00:0xa7:    Version: 0.0.18
16:28:28:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:28:28:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
16:28:28:WU00:FS00:0xa7:   Homepage: <cut>
16:28:28:WU00:FS00:0xa7:       Date: Nov 5 2019
16:28:28:WU00:FS00:0xa7:       Time: 06:13:26
16:28:28:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
16:28:28:WU00:FS00:0xa7:     Branch: master
16:28:28:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
16:28:28:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
16:28:28:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
16:28:28:WU00:FS00:0xa7:       Bits: 64
16:28:28:WU00:FS00:0xa7:       Mode: Release
16:28:28:WU00:FS00:0xa7:************************************ Build *************************************
16:28:28:WU00:FS00:0xa7:       SIMD: avx_256
16:28:28:WU00:FS00:0xa7:********************************************************************************
16:28:28:WU00:FS00:0xa7:Project: 16427 (Run 0, Clone 2618, Gen 121)

(Because of the forum's restriction on posting URLs, I had to cut the URLs out of the log, replacing each one with "<cut>".)
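
For anyone who wants to check their own log for the same pattern, here is a minimal sketch in Python that pulls out the thread count passed to mdrun and the rank count named in the GROMACS fatal error. The function name `summarize_failures` and the `SAMPLE` string are made up for illustration; the regular expressions only assume the line formats visible in the log above. For this log it reports 28 threads but 20 ranks. The gap of 8 is presumably threads that GROMACS set aside for separate PME ranks, but that is an assumption, since the excerpt does not confirm it.

```python
import re

# Patterns match the mdrun invocation and the GROMACS fatal error as they
# appear in the log excerpts in this thread (an assumption about the format).
NT_RE = re.compile(r"Calling: mdrun .*-nt (\d+)")
RANKS_RE = re.compile(r"no domain decomposition for (\d+) ranks")
CELL_RE = re.compile(r"minimum cell size of ([\d.]+) nm")


def summarize_failures(log_text):
    """Return one dict per domain-decomposition failure found in log_text."""
    failures = []
    threads = None  # most recent -nt value seen before an error line
    for line in log_text.splitlines():
        m = NT_RE.search(line)
        if m:
            threads = int(m.group(1))
            continue
        m = RANKS_RE.search(line)
        if m:
            cell = CELL_RE.search(line)
            failures.append({
                "threads": threads,
                "ranks": int(m.group(1)),
                "min_cell_nm": float(cell.group(1)) if cell else None,
            })
    return failures


# Two lines copied from the log above, used as a hypothetical test input.
SAMPLE = """\
16:24:28:WU00:FS00:0xa7:Calling: mdrun -s frame121.tpr -o frame121.trr -x frame121.xtc -cpt 15 -nt 28
16:24:28:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
"""

if __name__ == "__main__":
    for failure in summarize_failures(SAMPLE):
        print(failure)
```

Running it over a whole client log instead of `SAMPLE` (for example by reading the file into a string first) would show whether every failed restart used the same thread and rank counts.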
Celmor
 
Posts: 4
Joined: Mon May 18, 2020 5:39 pm

Re: Certain WUs keep erroring with high thread numbers

Postby Celmor » Mon May 18, 2020 6:27 pm

Rest of the log (which I couldn't post in OP because of the character limit):

Code:
16:28:28:WU00:FS00:0xa7:Unit: 0x0000008da8f5c67d5e924aab863769f3
16:28:28:WU00:FS00:0xa7:Reading tar file core.xml
16:28:28:WU00:FS00:0xa7:Reading tar file frame121.tpr
16:28:28:WU00:FS00:0xa7:Digital signatures verified
16:28:28:WU00:FS00:0xa7:Calling: mdrun -s frame121.tpr -o frame121.trr -x frame121.xtc -cpt 15 -nt 28
16:28:28:WU00:FS00:0xa7:Steps: first=60500000 total=500000
16:28:28:WU00:FS00:0xa7:ERROR:
16:28:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:28:28:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
16:28:28:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
16:28:28:WU00:FS00:0xa7:ERROR:
16:28:28:WU00:FS00:0xa7:ERROR:Fatal error:
16:28:28:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
16:28:28:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
16:28:28:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
16:28:28:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
16:28:28:WU00:FS00:0xa7:ERROR:website at <cut>
16:28:28:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
16:28:33:WU00:FS00:0xa7:WARNING:Unexpected exit() call
16:28:33:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
16:28:33:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
16:28:33:WU00:FS00:0xa7:Saving result file md.log
16:28:33:WU00:FS00:0xa7:Saving result file science.log
16:28:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
16:29:08:Adding folding slot 01: READY cpu:24
16:29:08:Removing old file 'configs/config-20200518-040834.xml'
16:29:08:Saving configuration to /etc/foldingathome/config.xml
16:29:08:<config>
16:29:08:  <!-- Folding Slot Configuration -->
16:29:08:  <cause v='COVID_19'/>
16:29:08:  <gpu v='false'/>
16:29:08:
16:29:08:  <!-- Network -->
16:29:08:  <proxy v=':8080'/>
16:29:08:
16:29:08:  <!-- Slot Control -->
16:29:08:  <pause-on-battery v='false'/>
16:29:08:  <power v='full'/>
16:29:08:
16:29:08:  <!-- User Information -->
16:29:08:  <passkey v='*****'/>
16:29:08:  <team v='223518'/>
16:29:08:  <user v='celmor'/>
16:29:08:
16:29:08:  <!-- Folding Slots -->
16:29:08:  <slot id='1' type='CPU'>
16:29:08:    <cpus v='24'/>
16:29:08:  </slot>
16:29:08:</config>
16:29:08:WARNING:WU00:FS00:Slot ID 0 no longer exists, migrating to FS01
16:29:28:WU00:FS01:Starting
16:29:28:WARNING:WU00:FS01:Changed SMP threads from 28 to 24 this can cause some work units to fail
16:29:28:WU00:FS01:Removing old file 'work/00/logfile_01-20200518-155727.txt'
16:29:28:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 334756 -checkpoint 15 -np 24
16:29:28:WU00:FS01:Started FahCore on PID 2023439
16:29:28:WU00:FS01:Core PID:2023443
16:29:28:WU00:FS01:FahCore 0xa7 started
16:29:28:WU00:FS01:0xa7:*********************** Log Started 2020-05-18T16:29:28Z ***********************
16:29:28:WU00:FS01:0xa7:************************** Gromacs Folding@home Core ***************************
16:29:28:WU00:FS01:0xa7:       Type: 0xa7
16:29:28:WU00:FS01:0xa7:       Core: Gromacs
16:29:28:WU00:FS01:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 2023439 -checkpoint 15
16:29:28:WU00:FS01:0xa7:             -np 24
16:29:28:WU00:FS01:0xa7:************************************ CBang *************************************
16:29:28:WU00:FS01:0xa7:       Date: Nov 5 2019
16:29:28:WU00:FS01:0xa7:       Time: 06:06:57
16:29:28:WU00:FS01:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
16:29:28:WU00:FS01:0xa7:     Branch: master
16:29:28:WU00:FS01:0xa7:   Compiler: GNU 8.3.0
16:29:28:WU00:FS01:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
16:29:28:WU00:FS01:0xa7:   Platform: linux2 4.19.0-5-amd64
16:29:28:WU00:FS01:0xa7:       Bits: 64
16:29:28:WU00:FS01:0xa7:       Mode: Release
16:29:28:WU00:FS01:0xa7:************************************ System ************************************
16:29:28:WU00:FS01:0xa7:        CPU: AMD Ryzen 9 3950X 16-Core Processor
16:29:28:WU00:FS01:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
16:29:28:WU00:FS01:0xa7:       CPUs: 32
16:29:28:WU00:FS01:0xa7:     Memory: 47.06GiB
16:29:28:WU00:FS01:0xa7:Free Memory: 6.56GiB
16:29:28:WU00:FS01:0xa7:    Threads: POSIX_THREADS
16:29:28:WU00:FS01:0xa7: OS Version: 5.6
16:29:28:WU00:FS01:0xa7:Has Battery: false
16:29:28:WU00:FS01:0xa7: On Battery: false
16:29:28:WU00:FS01:0xa7: UTC Offset: 2
16:29:28:WU00:FS01:0xa7:        PID: 2023443
16:29:28:WU00:FS01:0xa7:        CWD: /var/lib/private/fah/work
16:29:28:WU00:FS01:0xa7:******************************** Build - libFAH ********************************
16:29:28:WU00:FS01:0xa7:    Version: 0.0.18
16:29:28:WU00:FS01:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
16:29:28:WU00:FS01:0xa7:  Copyright: 2019 foldingathome.org
16:29:28:WU00:FS01:0xa7:   Homepage: <cut>
16:29:28:WU00:FS01:0xa7:       Date: Nov 5 2019
16:29:28:WU00:FS01:0xa7:       Time: 06:13:26
16:29:28:WU00:FS01:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
16:29:28:WU00:FS01:0xa7:     Branch: master
16:29:28:WU00:FS01:0xa7:   Compiler: GNU 8.3.0
16:29:28:WU00:FS01:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
16:29:28:WU00:FS01:0xa7:   Platform: linux2 4.19.0-5-amd64
16:29:28:WU00:FS01:0xa7:       Bits: 64
16:29:28:WU00:FS01:0xa7:       Mode: Release
16:29:28:WU00:FS01:0xa7:************************************ Build *************************************
16:29:28:WU00:FS01:0xa7:       SIMD: avx_256
16:29:28:WU00:FS01:0xa7:********************************************************************************
16:29:28:WU00:FS01:0xa7:Project: 16427 (Run 0, Clone 2618, Gen 121)
16:29:28:WU00:FS01:0xa7:Unit: 0x0000008da8f5c67d5e924aab863769f3
16:29:28:WU00:FS01:0xa7:Reading tar file core.xml
16:29:28:WU00:FS01:0xa7:Reading tar file frame121.tpr
16:29:28:WU00:FS01:0xa7:Digital signatures verified
16:29:28:WU00:FS01:0xa7:Calling: mdrun -s frame121.tpr -o frame121.trr -x frame121.xtc -cpt 15 -nt 24
16:29:28:WU00:FS01:0xa7:Steps: first=60500000 total=500000
16:29:29:WU00:FS01:0xa7:Completed 1 out of 500000 steps (0%)
16:29:43:Removing old file 'configs/config-20200518-051742.xml'
16:29:43:Saving configuration to /etc/foldingathome/config.xml
16:29:43:<config>
16:29:43:  <!-- Folding Slot Configuration -->
16:29:43:  <cause v='COVID_19'/>
16:29:43:  <gpu v='false'/>
16:29:43:
16:29:43:  <!-- Network -->
16:29:43:  <proxy v=':8080'/>
16:29:43:
16:29:43:  <!-- Slot Control -->
16:29:43:  <pause-on-battery v='false'/>
16:29:43:  <power v='full'/>
16:29:43:
16:29:43:  <!-- User Information -->
16:29:43:  <passkey v='*****'/>
16:29:43:  <team v='223518'/>
16:29:43:  <user v='celmor'/>
16:29:43:
16:29:43:  <!-- Folding Slots -->
16:29:43:  <slot id='1' type='CPU'>
16:29:43:    <cpus v='24'/>
16:29:43:  </slot>
16:29:43:</config>
16:29:56:WU00:FS01:0xa7:Completed 5000 out of 500000 steps (1%)
16:30:25:WU00:FS01:0xa7:Completed 10000 out of 500000 steps (2%)
16:30:52:WU00:FS01:0xa7:Completed 15000 out of 500000 steps (3%)
16:31:21:WU00:FS01:0xa7:Completed 20000 out of 500000 steps (4%)
16:31:49:WU00:FS01:0xa7:Completed 25000 out of 500000 steps (5%)
16:32:16:WU00:FS01:0xa7:Completed 30000 out of 500000 steps (6%)
16:32:45:WU00:FS01:0xa7:Completed 35000 out of 500000 steps (7%)
16:33:13:WU00:FS01:0xa7:Completed 40000 out of 500000 steps (8%)
16:33:41:WU00:FS01:0xa7:Completed 45000 out of 500000 steps (9%)
16:34:09:WU00:FS01:0xa7:Completed 50000 out of 500000 steps (10%)
16:34:38:WU00:FS01:0xa7:Completed 55000 out of 500000 steps (11%)
16:35:06:WU00:FS01:0xa7:Completed 60000 out of 500000 steps (12%)
16:35:34:WU00:FS01:0xa7:Completed 65000 out of 500000 steps (13%)
16:36:03:WU00:FS01:0xa7:Completed 70000 out of 500000 steps (14%)
16:36:33:WU00:FS01:0xa7:Completed 75000 out of 500000 steps (15%)
16:37:01:WU00:FS01:0xa7:Completed 80000 out of 500000 steps (16%)
16:37:29:WU00:FS01:0xa7:Completed 85000 out of 500000 steps (17%)
16:37:57:WU00:FS01:0xa7:Completed 90000 out of 500000 steps (18%)
16:38:25:WU00:FS01:0xa7:Completed 95000 out of 500000 steps (19%)
16:38:54:WU00:FS01:0xa7:Completed 100000 out of 500000 steps (20%)
16:39:23:WU00:FS01:0xa7:Completed 105000 out of 500000 steps (21%)
16:39:52:WU00:FS01:0xa7:Completed 110000 out of 500000 steps (22%)
16:40:21:WU00:FS01:0xa7:Completed 115000 out of 500000 steps (23%)
16:40:50:WU00:FS01:0xa7:Completed 120000 out of 500000 steps (24%)
16:41:20:WU00:FS01:0xa7:Completed 125000 out of 500000 steps (25%)
16:41:48:WU00:FS01:0xa7:Completed 130000 out of 500000 steps (26%)
16:42:18:WU00:FS01:0xa7:Completed 135000 out of 500000 steps (27%)
16:42:48:WU00:FS01:0xa7:Completed 140000 out of 500000 steps (28%)
16:43:16:WU00:FS01:0xa7:Completed 145000 out of 500000 steps (29%)
16:43:44:WU00:FS01:0xa7:Completed 150000 out of 500000 steps (30%)
16:44:13:WU00:FS01:0xa7:Completed 155000 out of 500000 steps (31%)
16:44:42:WU00:FS01:0xa7:Completed 160000 out of 500000 steps (32%)
16:45:11:WU00:FS01:0xa7:Completed 165000 out of 500000 steps (33%)
16:45:41:WU00:FS01:0xa7:Completed 170000 out of 500000 steps (34%)
16:46:10:WU00:FS01:0xa7:Completed 175000 out of 500000 steps (35%)
16:46:39:WU00:FS01:0xa7:Completed 180000 out of 500000 steps (36%)
16:47:09:WU00:FS01:0xa7:Completed 185000 out of 500000 steps (37%)
16:47:42:WU00:FS01:0xa7:Completed 190000 out of 500000 steps (38%)
16:48:13:WU00:FS01:0xa7:Completed 195000 out of 500000 steps (39%)
16:48:44:WU00:FS01:0xa7:Completed 200000 out of 500000 steps (40%)
16:49:15:WU00:FS01:0xa7:Completed 205000 out of 500000 steps (41%)
16:49:46:WU00:FS01:0xa7:Completed 210000 out of 500000 steps (42%)
16:50:17:WU00:FS01:0xa7:Completed 215000 out of 500000 steps (43%)
16:50:49:WU00:FS01:0xa7:Completed 220000 out of 500000 steps (44%)
16:51:23:WU00:FS01:0xa7:Completed 225000 out of 500000 steps (45%)
16:51:53:WU00:FS01:0xa7:Completed 230000 out of 500000 steps (46%)
16:52:23:WU00:FS01:0xa7:Completed 235000 out of 500000 steps (47%)
16:52:52:WU00:FS01:0xa7:Completed 240000 out of 500000 steps (48%)
16:53:22:WU00:FS01:0xa7:Completed 245000 out of 500000 steps (49%)
16:53:51:WU00:FS01:0xa7:Completed 250000 out of 500000 steps (50%)
16:54:20:WU00:FS01:0xa7:Completed 255000 out of 500000 steps (51%)
16:54:49:WU00:FS01:0xa7:Completed 260000 out of 500000 steps (52%)
16:55:18:WU00:FS01:0xa7:Completed 265000 out of 500000 steps (53%)
16:55:47:WU00:FS01:0xa7:Completed 270000 out of 500000 steps (54%)
16:56:17:WU00:FS01:0xa7:Completed 275000 out of 500000 steps (55%)
16:56:46:WU00:FS01:0xa7:Completed 280000 out of 500000 steps (56%)
16:57:15:WU00:FS01:0xa7:Completed 285000 out of 500000 steps (57%)
16:57:46:WU00:FS01:0xa7:Completed 290000 out of 500000 steps (58%)
16:58:15:WU00:FS01:0xa7:Completed 295000 out of 500000 steps (59%)
16:58:45:WU00:FS01:0xa7:Completed 300000 out of 500000 steps (60%)
16:59:14:WU00:FS01:0xa7:Completed 305000 out of 500000 steps (61%)
16:59:47:WU00:FS01:0xa7:Completed 310000 out of 500000 steps (62%)
17:00:15:WU00:FS01:0xa7:Completed 315000 out of 500000 steps (63%)
17:00:45:WU00:FS01:0xa7:Completed 320000 out of 500000 steps (64%)
17:01:16:WU00:FS01:0xa7:Completed 325000 out of 500000 steps (65%)
17:01:47:WU00:FS01:0xa7:Completed 330000 out of 500000 steps (66%)
17:02:17:WU00:FS01:0xa7:Completed 335000 out of 500000 steps (67%)
17:02:47:WU00:FS01:0xa7:Completed 340000 out of 500000 steps (68%)
17:03:17:WU00:FS01:0xa7:Completed 345000 out of 500000 steps (69%)
17:03:45:WU00:FS01:0xa7:Completed 350000 out of 500000 steps (70%)
17:04:15:WU00:FS01:0xa7:Completed 355000 out of 500000 steps (71%)
17:04:44:WU00:FS01:0xa7:Completed 360000 out of 500000 steps (72%)
17:05:14:WU00:FS01:0xa7:Completed 365000 out of 500000 steps (73%)
17:05:43:WU00:FS01:0xa7:Completed 370000 out of 500000 steps (74%)
17:06:12:WU00:FS01:0xa7:Completed 375000 out of 500000 steps (75%)
17:06:41:WU00:FS01:0xa7:Completed 380000 out of 500000 steps (76%)
17:07:10:WU00:FS01:0xa7:Completed 385000 out of 500000 steps (77%)
17:07:39:WU00:FS01:0xa7:Completed 390000 out of 500000 steps (78%)
17:08:11:WU00:FS01:0xa7:Completed 395000 out of 500000 steps (79%)
17:08:40:WU00:FS01:0xa7:Completed 400000 out of 500000 steps (80%)
17:09:09:WU00:FS01:0xa7:Completed 405000 out of 500000 steps (81%)
17:09:38:WU00:FS01:0xa7:Completed 410000 out of 500000 steps (82%)
17:10:07:WU00:FS01:0xa7:Completed 415000 out of 500000 steps (83%)
17:10:37:WU00:FS01:0xa7:Completed 420000 out of 500000 steps (84%)
17:11:06:WU00:FS01:0xa7:Completed 425000 out of 500000 steps (85%)
17:11:36:WU00:FS01:0xa7:Completed 430000 out of 500000 steps (86%)
17:12:07:WU00:FS01:0xa7:Completed 435000 out of 500000 steps (87%)
17:12:38:WU00:FS01:0xa7:Completed 440000 out of 500000 steps (88%)
17:13:08:WU00:FS01:0xa7:Completed 445000 out of 500000 steps (89%)
17:13:39:WU00:FS01:0xa7:Completed 450000 out of 500000 steps (90%)
17:14:09:WU00:FS01:0xa7:Completed 455000 out of 500000 steps (91%)
17:14:39:WU00:FS01:0xa7:Completed 460000 out of 500000 steps (92%)
17:15:09:WU00:FS01:0xa7:Completed 465000 out of 500000 steps (93%)
17:15:38:WU00:FS01:0xa7:Completed 470000 out of 500000 steps (94%)
17:16:09:WU00:FS01:0xa7:Completed 475000 out of 500000 steps (95%)
17:16:38:WU00:FS01:0xa7:Completed 480000 out of 500000 steps (96%)
17:17:09:WU00:FS01:0xa7:Completed 485000 out of 500000 steps (97%)
17:17:39:WU00:FS01:0xa7:Completed 490000 out of 500000 steps (98%)
17:18:08:WU00:FS01:0xa7:Completed 495000 out of 500000 steps (99%)
17:18:09:WU01:FS01:Connecting to assign1.foldingathome.org:80
17:18:09:WU01:FS01:Assigned to work server 13.82.98.119
17:18:09:WU01:FS01:Requesting new work unit for slot 01: RUNNING cpu:24 from 13.82.98.119
17:18:09:WU01:FS01:Connecting to 13.82.98.119:8080
17:18:10:WU01:FS01:Downloading 7.46MiB
17:18:16:WU01:FS01:Download 82.15%
17:18:17:WU01:FS01:Download complete
17:18:17:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13871 run:0 clone:2493 gen:129 core:0xa7 unit:0x000000950d5262775e791af05417f40b
17:18:39:WU00:FS01:0xa7:Completed 500000 out of 500000 steps (100%)
17:18:40:WU00:FS01:0xa7:Saving result file ../logfile_01.txt
17:18:40:WU00:FS01:0xa7:Saving result file frame121.trr
17:18:40:WU00:FS01:0xa7:Saving result file frame121.xtc
17:18:40:WU00:FS01:0xa7:Saving result file md.log
17:18:40:WU00:FS01:0xa7:Saving result file science.log
17:18:40:WU00:FS01:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
17:18:40:WU00:FS01:FahCore returned: FINISHED_UNIT (100 = 0x64)
17:18:40:WU00:FS01:Sending unit results: id:00 state:SEND error:NO_ERROR project:16427 run:0 clone:2618 gen:121 core:0xa7 unit:0x0000008da8f5c67d5e924aab863769f3
17:18:40:WU00:FS01:Uploading 1.70MiB to 168.245.198.125
17:18:40:WU00:FS01:Connecting to 168.245.198.125:8080
17:18:40:WU01:FS01:Starting
17:18:40:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/private/fah/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 334756 -checkpoint 15 -np 24
17:18:40:WU01:FS01:Started FahCore on PID 2026534
17:18:40:WU01:FS01:Core PID:2026538
17:18:40:WU01:FS01:FahCore 0xa7 started
17:18:41:WU01:FS01:0xa7:*********************** Log Started 2020-05-18T17:18:40Z ***********************
17:18:41:WU01:FS01:0xa7:************************** Gromacs Folding@home Core ***************************
17:18:41:WU01:FS01:0xa7:       Type: 0xa7
17:18:41:WU01:FS01:0xa7:       Core: Gromacs
17:18:41:WU01:FS01:0xa7:       Args: -dir 01 -suffix 01 -version 706 -lifeline 2026534 -checkpoint 15
17:18:41:WU01:FS01:0xa7:             -np 24
17:18:41:WU01:FS01:0xa7:************************************ CBang *************************************
17:18:41:WU01:FS01:0xa7:       Date: Nov 5 2019
17:18:41:WU01:FS01:0xa7:       Time: 06:06:57
17:18:41:WU01:FS01:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
17:18:41:WU01:FS01:0xa7:     Branch: master
17:18:41:WU01:FS01:0xa7:   Compiler: GNU 8.3.0
17:18:41:WU01:FS01:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
17:18:41:WU01:FS01:0xa7:   Platform: linux2 4.19.0-5-amd64
17:18:41:WU01:FS01:0xa7:       Bits: 64
17:18:41:WU01:FS01:0xa7:       Mode: Release
17:18:41:WU01:FS01:0xa7:************************************ System ************************************
17:18:41:WU01:FS01:0xa7:        CPU: AMD Ryzen 9 3950X 16-Core Processor
17:18:41:WU01:FS01:0xa7:     CPU ID: AuthenticAMD Family 23 Model 113 Stepping 0
17:18:41:WU01:FS01:0xa7:       CPUs: 32
17:18:41:WU01:FS01:0xa7:     Memory: 47.06GiB
17:18:41:WU01:FS01:0xa7:Free Memory: 6.46GiB
17:18:41:WU01:FS01:0xa7:    Threads: POSIX_THREADS
17:18:41:WU01:FS01:0xa7: OS Version: 5.6
17:18:41:WU01:FS01:0xa7:Has Battery: false
17:18:41:WU01:FS01:0xa7: On Battery: false
17:18:41:WU01:FS01:0xa7: UTC Offset: 2
17:18:41:WU01:FS01:0xa7:        PID: 2026538
17:18:41:WU01:FS01:0xa7:        CWD: /var/lib/private/fah/work
17:18:41:WU01:FS01:0xa7:******************************** Build - libFAH ********************************
17:18:41:WU01:FS01:0xa7:    Version: 0.0.18
17:18:41:WU01:FS01:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:18:41:WU01:FS01:0xa7:  Copyright: 2019 foldingathome.org
17:18:41:WU01:FS01:0xa7:   Homepage: <cut>
17:18:41:WU01:FS01:0xa7:       Date: Nov 5 2019
17:18:41:WU01:FS01:0xa7:       Time: 06:13:26
17:18:41:WU01:FS01:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
17:18:41:WU01:FS01:0xa7:     Branch: master
17:18:41:WU01:FS01:0xa7:   Compiler: GNU 8.3.0
17:18:41:WU01:FS01:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
17:18:41:WU01:FS01:0xa7:   Platform: linux2 4.19.0-5-amd64
17:18:41:WU01:FS01:0xa7:       Bits: 64
17:18:41:WU01:FS01:0xa7:       Mode: Release
17:18:41:WU01:FS01:0xa7:************************************ Build *************************************
17:18:41:WU01:FS01:0xa7:       SIMD: avx_256
17:18:41:WU01:FS01:0xa7:********************************************************************************
17:18:41:WU01:FS01:0xa7:Project: 13871 (Run 0, Clone 2493, Gen 129)
17:18:41:WU01:FS01:0xa7:Unit: 0x000000950d5262775e791af05417f40b
17:18:41:WU01:FS01:0xa7:Reading tar file core.xml
17:18:41:WU01:FS01:0xa7:Reading tar file frame129.tpr
17:18:41:WU01:FS01:0xa7:Digital signatures verified
17:18:41:WU01:FS01:0xa7:Calling: mdrun -s frame129.tpr -o frame129.trr -x frame129.xtc -e frame129.edr -cpt 15 -nt 24
17:18:41:WU01:FS01:0xa7:Steps: first=16125000 total=125000
17:18:42:WU01:FS01:0xa7:Completed 1 out of 125000 steps (0%)
17:18:46:WU00:FS01:Upload complete
17:18:46:WU00:FS01:Server responded WORK_ACK (400)
17:18:46:WU00:FS01:Final credit estimate, 4205.00 points
17:18:46:WU00:FS01:Cleaning up
17:19:16:WU01:FS01:0xa7:Completed 1250 out of 125000 steps (1%)
17:19:46:WU01:FS01:0xa7:Completed 2500 out of 125000 steps (2%)
17:20:17:WU01:FS01:0xa7:Completed 3750 out of 125000 steps (3%)
17:20:49:WU01:FS01:0xa7:Completed 5000 out of 125000 steps (4%)
17:21:19:WU01:FS01:0xa7:Completed 6250 out of 125000 steps (5%)
17:21:49:WU01:FS01:0xa7:Completed 7500 out of 125000 steps (6%)
17:22:19:WU01:FS01:0xa7:Completed 8750 out of 125000 steps (7%)
17:22:49:WU01:FS01:0xa7:Completed 10000 out of 125000 steps (8%)
17:23:20:WU01:FS01:0xa7:Completed 11250 out of 125000 steps (9%)
17:23:49:WU01:FS01:0xa7:Completed 12500 out of 125000 steps (10%)
17:24:20:WU01:FS01:0xa7:Completed 13750 out of 125000 steps (11%)
17:24:51:WU01:FS01:0xa7:Completed 15000 out of 125000 steps (12%)
17:25:20:WU01:FS01:0xa7:Completed 16250 out of 125000 steps (13%)
17:25:51:WU01:FS01:0xa7:Completed 17500 out of 125000 steps (14%)
Celmor
 
Posts: 4
Joined: Mon May 18, 2020 5:39 pm

Re: Certain WUs keep erroring with high thread numbers

Postby Neil-B » Mon May 18, 2020 6:32 pm

28 can be an issue as it is a multiple of 7. I would have expected 32, 27 and 24 all to be OK. I run 32/56 and 24/56 slots and can't recall a Domain Decomp error on either. When I used to run 28/56 slots they were relatively commonplace. A change from 28 to 32 won't take effect until a new WU is downloaded, but a step down should. I can only see a 28-thread slot in use in the first part of the log; the second part has a 24-thread slot, which looks as if it has worked?

So try 27 as that should be ok.
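
As a rough illustration of why some counts are harder to place than others: the GROMACS error says it could not split the box into one cell per rank while keeping every cell at least the minimum cell size. The sketch below factorises a thread count and enumerates the nx x ny x nz grids that would fit. It is a simplification, not the actual GROMACS algorithm (which also sets aside PME ranks and applies further constraints), and the 5.0 nm cubic box is a made-up value because the real box size is not shown in the logs. The function names are hypothetical.

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def feasible_grids(ranks, box, min_cell):
    """List (nx, ny, nz) grids with nx*ny*nz == ranks whose cells
    are all at least min_cell long in each dimension.

    box is (x, y, z) in nm. Simplified: ignores PME ranks and
    everything else GROMACS checks.
    """
    grids = []
    for nx in range(1, ranks + 1):
        if ranks % nx:
            continue
        rest = ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny:
                continue
            nz = rest // ny
            if all(length / n >= min_cell
                   for length, n in zip(box, (nx, ny, nz))):
                grids.append((nx, ny, nz))
    return grids


# ASSUMPTION: hypothetical 5.0 nm cubic box; the minimum cell size is the
# one reported in the error from the first post.
box = (5.0, 5.0, 5.0)
min_cell = 1.37225

for ranks in (20, 27):
    print(ranks, prime_factors(ranks), feasible_grids(ranks, box, min_cell))
```

With this assumed box at most 3 cells fit along each axis, so a count containing a factor of 5 or 7 has nowhere to go, while 27 = 3 x 3 x 3 fits. A real WU with a different box will behave differently.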
Last edited by Neil-B on Mon May 18, 2020 6:34 pm, edited 1 time in total.
1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent, Quadro K420 1GB, FAH 7.6.13
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro, Quadro M1000M 2GB, FAH 7.6.13
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro, GTX 750Ti 2GB, FAH 7.6.13
Neil-B
 
Posts: 1217
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: Certain WUs keep erroring with high thread numbers

Postby _r2w_ben » Mon May 18, 2020 6:32 pm

Welcome to the forum Celmor!

Thanks for the logs showing the project numbers that are failing. Can you try 27 threads?
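
For longer logs, a small script can pull out which projects hit the domain decomposition error and with how many ranks. This is only a sketch: the regular expressions are assumptions based on the log lines quoted in this thread, and `failing_projects` is a hypothetical helper, not part of FAHClient.

```python
import re

# Matches either the core's "Project: N" line or the client's
# "Received Unit: ... project:N" line, keyed by the WU id (e.g. WU02).
PROJECT_RE = re.compile(
    r"(WU\d+):FS\d+:(?:0x[0-9a-f]+:Project: (\d+)"
    r"|Received Unit: .*?project:(\d+))"
)

# Matches the GROMACS domain decomposition failure line.
ERROR_RE = re.compile(
    r"(WU\d+):FS\d+:0x[0-9a-f]+:ERROR:There is no domain decomposition "
    r"for (\d+) ranks .* minimum cell size of ([\d.]+) nm"
)


def failing_projects(lines):
    """Return (project, ranks, min_cell_nm) for each decomposition error.

    project is None if no project line was seen for that WU beforehand.
    """
    current = {}   # WU id -> last project number seen
    failures = []
    for line in lines:
        m = PROJECT_RE.search(line)
        if m:
            current[m.group(1)] = int(m.group(2) or m.group(3))
            continue
        m = ERROR_RE.search(line)
        if m:
            wu, ranks, cell = m.groups()
            failures.append((current.get(wu), int(ranks), float(cell)))
    return failures


# Two lines in the format seen in this thread's logs.
sample = [
    "13:11:44:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR "
    "project:14534 run:267 clone:2 gen:31 core:0xa7 "
    "unit:0x00000023cedfaa925ea34b60de2e5a23",
    "13:12:44:WU02:FS01:0xa7:ERROR:There is no domain decomposition for 16 "
    "ranks that is compatible with the given box and a minimum cell size "
    "of 1.46925 nm",
]
print(failing_projects(sample))
```

To run it over a saved log, pass an open file instead of the sample list, e.g. `failing_projects(open("log.txt"))`, where `log.txt` is a placeholder path.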
_r2w_ben
 
Posts: 277
Joined: Wed Apr 23, 2008 4:11 pm

Re: Certain WUs keep erroring with high thread numbers

Postby Celmor » Tue May 19, 2020 5:36 am

Thanks for the answers. I'm now running 27 threads and so far so good (although I now have a WU/work queue hanging around without a corresponding slot). I had expected the ideal thread count to be a power of 2.
I have some more historical logs, with the slot configs and the errors that followed extracted, from configs like 8+8+16, 16+16, 28, 30 and 32, IIRC.
I could upload complete logs if I were allowed to post a hyperlink (39M in extracted form), but here's the "reduced" version:
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:<config>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <!-- Folding Slot Configuration -->
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <cause v='COVID_19'/>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <gpu v='false'/>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <!-- Network -->
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <proxy v=':8080'/>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <!-- Slot Control -->
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <pause-on-battery v='false'/>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <power v='light'/>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <!-- User Information -->
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <passkey v='*****'/>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <team v='223518'/>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <user v='celmor'/>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <!-- Folding Slots -->
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <slot id='0' type='CPU'>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:    <cpus v='16'/>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  </slot>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  <slot id='1' type='CPU'>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:    <cpus v='16'/>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:  </slot>
May 04 21:43:42 Celmor-PC FAHClient[270074]: 19:43:42:</config>

May 05 15:11:44 Celmor-PC FAHClient[270074]: 13:11:44:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:14534 run:267 clone:2 gen:31 core:0xa7 unit:0x00000023cedfaa925ea34b60de2e5a23
May 05 15:12:40 Celmor-PC FAHClient[270074]: 13:12:40:WU00:FS01:0xa7:Completed 250000 out of 250000 steps (100%)
May 05 15:12:43 Celmor-PC FAHClient[270074]: 13:12:43:WU00:FS01:0xa7:Saving result file ../logfile_01.txt
May 05 15:12:43 Celmor-PC FAHClient[270074]: 13:12:43:WU00:FS01:0xa7:Saving result file frame15.trr
May 05 15:12:43 Celmor-PC FAHClient[270074]: 13:12:43:WU00:FS01:0xa7:Saving result file frame15.xtc
May 05 15:12:43 Celmor-PC FAHClient[270074]: 13:12:43:WU00:FS01:0xa7:Saving result file md.log
May 05 15:12:43 Celmor-PC FAHClient[270074]: 13:12:43:WU00:FS01:0xa7:Saving result file science.log
May 05 15:12:43 Celmor-PC FAHClient[270074]: 13:12:43:WU00:FS01:0xa7:Folding@home Core Shutdown: FINISHED_UNIT
May 05 15:12:44 Celmor-PC FAHClient[270074]: 13:12:44:WU02:FS01:0xa7:ERROR:
May 05 15:12:44 Celmor-PC FAHClient[270074]: 13:12:44:WU02:FS01:0xa7:ERROR:-------------------------------------------------------
May 05 15:12:44 Celmor-PC FAHClient[270074]: 13:12:44:WU02:FS01:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
May 05 15:12:44 Celmor-PC FAHClient[270074]: 13:12:44:WU02:FS01:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
May 05 15:12:44 Celmor-PC FAHClient[270074]: 13:12:44:WU02:FS01:0xa7:ERROR:
May 05 15:12:44 Celmor-PC FAHClient[270074]: 13:12:44:WU02:FS01:0xa7:ERROR:Fatal error:
May 05 15:12:44 Celmor-PC FAHClient[270074]: 13:12:44:WU02:FS01:0xa7:ERROR:There is no domain decomposition for 16 ranks that is compatible with the given box and a minimum cell size of 1.46925 nm
May 05 15:12:44 Celmor-PC FAHClient[270074]: 13:12:44:WU02:FS01:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
May 05 15:12:44 Celmor-PC FAHClient[270074]: 13:12:44:WU02:FS01:0xa7:ERROR:Look in the log file for details on the domain decomposition
May 05 15:12:44 Celmor-PC FAHClient[270074]: 13:12:44:WU02:FS01:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
May 05 15:12:44 Celmor-PC FAHClient[270074]: 13:12:44:WU02:FS01:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
May 05 15:12:44 Celmor-PC FAHClient[270074]: 13:12:44:WU02:FS01:0xa7:ERROR:-------------------------------------------------------
May 05 15:12:49 Celmor-PC FAHClient[270074]: 13:12:49:WU02:FS01:0xa7:WARNING:Unexpected exit() call
May 05 15:12:49 Celmor-PC FAHClient[270074]: 13:12:49:WU02:FS01:0xa7:WARNING:Unexpected exit from science code
May 05 15:12:49 Celmor-PC FAHClient[270074]: 13:12:49:WU02:FS01:0xa7:Saving result file ../logfile_01.txt
May 05 15:12:49 Celmor-PC FAHClient[270074]: 13:12:49:WU02:FS01:0xa7:Saving result file md.log
May 05 15:12:49 Celmor-PC FAHClient[270074]: 13:12:49:WU02:FS01:0xa7:Saving result file science.log
May 05 15:12:50 Celmor-PC FAHClient[270074]: 13:12:50:WU02:FS01:FahCore returned: INTERRUPTED (102 = 0x66)

May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:<config>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <!-- Folding Slot Configuration -->
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <cause v='COVID_19'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <gpu v='false'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <!-- Network -->
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <proxy v=':8080'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <!-- Slot Control -->
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <pause-on-battery v='false'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <power v='light'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <!-- User Information -->
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <passkey v='*****'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <team v='223518'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <user v='celmor'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <!-- Folding Slots -->
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <slot id='0' type='CPU'>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:    <cpus v='16'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:    <paused v='true'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  </slot>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <slot id='1' type='CPU'>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:    <cpus v='16'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:    <paused v='true'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  </slot>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:</config>

May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Project: 14534 (Run 267, Clone 2, Gen 31)
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Unit: 0x00000023cedfaa925ea34b60de2e5a23
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Reading tar file core.xml
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Reading tar file frame31.tpr
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Digital signatures verified
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Calling: mdrun -s frame31.tpr -o frame31.trr -x frame31.xtc -cpt 15 -nt 16
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Steps: first=31000000 total=1000000
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:-------------------------------------------------------
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:Fatal error:
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:There is no domain decomposition for 16 ranks that is compatible with the given box and a minimum cell size of 1.46925 nm
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:Look in the log file for details on the domain decomposition
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:-------------------------------------------------------

May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:<config>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <!-- Folding Slot Configuration -->
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <cause v='COVID_19'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <gpu v='false'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <!-- Network -->
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <proxy v=':8080'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <!-- Slot Control -->
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <pause-on-battery v='false'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <power v='light'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <!-- User Information -->
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <passkey v='*****'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <team v='223518'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <user v='celmor'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <!-- Folding Slots -->
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <slot id='0' type='CPU'>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:    <cpus v='16'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:    <paused v='true'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  </slot>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <slot id='1' type='CPU'>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:    <cpus v='16'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  </slot>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <slot id='2' type='CPU'>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:    <cpus v='8'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  </slot>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  <slot id='3' type='CPU'>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:    <cpus v='8'/>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:  </slot>
May 06 01:22:37 Celmor-PC FAHClient[1127419]: 23:22:37:</config>

May 06 10:18:27 Celmor-PC FAHClient[1127419]: 08:18:27:WU03:FS03:0xa7:Project: 14728 (Run 263, Clone 1, Gen 0)
May 06 10:18:27 Celmor-PC FAHClient[1127419]: 08:18:27:WU03:FS03:0xa7:Unit: 0x000000029bf7a4d65ea6f14e10a13c17
May 06 10:18:27 Celmor-PC FAHClient[1127419]: 08:18:27:WU03:FS03:0xa7:Reading tar file core.xml
May 06 10:18:27 Celmor-PC FAHClient[1127419]: 08:18:27:WU03:FS03:0xa7:Reading tar file frame0.tpr
May 06 10:18:27 Celmor-PC FAHClient[1127419]: 08:18:27:WU03:FS03:0xa7:Digital signatures verified
May 06 10:18:27 Celmor-PC FAHClient[1127419]: 08:18:27:WU03:FS03:0xa7:Calling: mdrun -s frame0.tpr -o frame0.trr -cpt 15 -nt 8
May 06 10:18:27 Celmor-PC FAHClient[1127419]: 08:18:27:WU03:FS03:0xa7:Steps: first=0 total=250000
May 06 10:18:27 Celmor-PC FAHClient[1127419]: 08:18:27:WU04:FS01:0xa7:Completed 185000 out of 250000 steps (74%)
May 06 10:18:28 Celmor-PC FAHClient[1127419]: 08:18:28:WU03:FS03:0xa7:Completed 1 out of 250000 steps (0%)
May 06 10:18:32 Celmor-PC FAHClient[1127419]: 08:18:32:WU00:FS03:Upload complete
May 06 10:18:32 Celmor-PC FAHClient[1127419]: 08:18:32:WU00:FS03:Server responded WORK_ACK (400)
May 06 10:18:32 Celmor-PC FAHClient[1127419]: 08:18:32:WU00:FS03:Final credit estimate, 7415.00 points
May 06 10:18:32 Celmor-PC FAHClient[1127419]: 08:18:32:WU00:FS03:Cleaning up
May 06 10:19:02 Celmor-PC FAHClient[1127419]: 08:19:02:WU04:FS01:0xa7:Completed 187500 out of 250000 steps (75%)
May 06 10:19:11 Celmor-PC FAHClient[1127419]: 08:19:11:WU01:FS02:0xa7:Completed 237500 out of 250000 steps (95%)
May 06 10:19:30 Celmor-PC FAHClient[1127419]: 08:19:30:WU03:FS03:0xa7:Completed 2500 out of 250000 steps (1%)
May 06 10:19:35 Celmor-PC FAHClient[1127419]: 08:19:35:WU04:FS01:0xa7:Completed 190000 out of 250000 steps (76%)
May 06 10:20:10 Celmor-PC FAHClient[1127419]: 08:20:10:WU04:FS01:0xa7:Completed 192500 out of 250000 steps (77%)
May 06 10:20:14 Celmor-PC FAHClient[1127419]: 08:20:14:WU01:FS02:0xa7:Completed 240000 out of 250000 steps (96%)
May 06 10:20:31 Celmor-PC FAHClient[1127419]: 08:20:31:WU03:FS03:0xa7:Completed 5000 out of 250000 steps (2%)
May 06 10:20:43 Celmor-PC FAHClient[1127419]: 08:20:43:WU04:FS01:0xa7:Completed 195000 out of 250000 steps (78%)
May 06 10:20:52 Celmor-PC FAHClient[1127419]: 08:20:52:WU03:FS03:0xa7:ERROR:
May 06 10:20:52 Celmor-PC FAHClient[1127419]: 08:20:52:WU03:FS03:0xa7:ERROR:-------------------------------------------------------
May 06 10:20:52 Celmor-PC FAHClient[1127419]: 08:20:52:WU03:FS03:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
May 06 10:20:52 Celmor-PC FAHClient[1127419]: 08:20:52:WU03:FS03:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/pme.c, line: 754
May 06 10:20:52 Celmor-PC FAHClient[1127419]: 08:20:52:WU03:FS03:0xa7:ERROR:
May 06 10:20:52 Celmor-PC FAHClient[1127419]: 08:20:52:WU03:FS03:0xa7:ERROR:Fatal error:
May 06 10:20:52 Celmor-PC FAHClient[1127419]: 08:20:52:WU03:FS03:0xa7:ERROR:1 particles communicated to PME rank 0 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x.
May 06 10:20:52 Celmor-PC FAHClient[1127419]: 08:20:52:WU03:FS03:0xa7:ERROR:This usually means that your system is not well equilibrated.
May 06 10:20:52 Celmor-PC FAHClient[1127419]: 08:20:52:WU03:FS03:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
May 06 10:20:52 Celmor-PC FAHClient[1127419]: 08:20:52:WU03:FS03:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
May 06 10:20:52 Celmor-PC FAHClient[1127419]: 08:20:52:WU03:FS03:0xa7:ERROR:-------------------------------------------------------
May 06 10:20:57 Celmor-PC FAHClient[1127419]: 08:20:57:WU03:FS03:0xa7:ERROR:
May 06 10:20:57 Celmor-PC FAHClient[1127419]: 08:20:57:WU03:FS03:0xa7:ERROR:-------------------------------------------------------
May 06 10:20:57 Celmor-PC FAHClient[1127419]: 08:20:57:WU03:FS03:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
May 06 10:20:57 Celmor-PC FAHClient[1127419]: 08:20:57:WU03:FS03:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/pme.c, line: 754
May 06 10:20:57 Celmor-PC FAHClient[1127419]: 08:20:57:WU03:FS03:0xa7:ERROR:
May 06 10:20:57 Celmor-PC FAHClient[1127419]: 08:20:57:WU03:FS03:0xa7:ERROR:Fatal error:
May 06 10:20:57 Celmor-PC FAHClient[1127419]: 08:20:57:WU03:FS03:0xa7:ERROR:1 particles communicated to PME rank 0 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension x.
May 06 10:20:57 Celmor-PC FAHClient[1127419]: 08:20:57:WU03:FS03:0xa7:ERROR:This usually means that your system is not well equilibrated.
May 06 10:20:57 Celmor-PC FAHClient[1127419]: 08:20:57:WU03:FS03:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
May 06 10:20:57 Celmor-PC FAHClient[1127419]: 08:20:57:WU03:FS03:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
May 06 10:20:57 Celmor-PC FAHClient[1127419]: 08:20:57:WU03:FS03:0xa7:ERROR:-------------------------------------------------------

May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:<config>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <!-- Folding Slot Configuration -->
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <cause v='COVID_19'/>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <gpu v='false'/>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <!-- Network -->
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <proxy v=':8080'/>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <!-- Slot Control -->
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <pause-on-battery v='false'/>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <power v='light'/>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <!-- User Information -->
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <passkey v='*****'/>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <team v='223518'/>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <user v='celmor'/>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <!-- Folding Slots -->
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <slot id='0' type='CPU'>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:    <cpus v='16'/>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  </slot>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <slot id='1' type='CPU'>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:    <cpus v='16'/>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  </slot>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <slot id='2' type='CPU'>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:    <cpus v='8'/>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  </slot>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  <slot id='3' type='CPU'>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:    <cpus v='8'/>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:  </slot>
May 06 19:04:01 Celmor-PC FAHClient[1127419]: 17:04:01:</config>

May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:Project: 14534 (Run 267, Clone 2, Gen 31)
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:Unit: 0x00000023cedfaa925ea34b60de2e5a23
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:Reading tar file core.xml
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:Reading tar file frame31.tpr
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:Digital signatures verified
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:Calling: mdrun -s frame31.tpr -o frame31.trr -x frame31.xtc -cpt 15 -nt 16
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:Steps: first=31000000 total=1000000
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:ERROR:
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:ERROR:
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:ERROR:Fatal error:
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:ERROR:There is no domain decomposition for 16 ranks that is compatible with the given box and a minimum cell size of 1.46925 nm
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
May 06 19:04:32 Celmor-PC FAHClient[1127419]: 17:04:32:WU02:FS00:0xa7:ERROR:-------------------------------------------------------

May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:<config>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <!-- Folding Slot Configuration -->
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <cause v='COVID_19'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <gpu v='false'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <!-- Network -->
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <proxy v=':8080'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <!-- Slot Control -->
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <pause-on-battery v='false'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <power v='light'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <!-- User Information -->
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <passkey v='*****'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <team v='223518'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <user v='celmor'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <!-- Folding Slots -->
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <slot id='0' type='CPU'>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:    <cpus v='16'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:    <paused v='true'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  </slot>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <slot id='1' type='CPU'>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:    <cpus v='16'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:    <paused v='true'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  </slot>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <slot id='2' type='CPU'>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:    <cpus v='8'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:    <paused v='true'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  </slot>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  <slot id='3' type='CPU'>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:    <cpus v='8'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:    <paused v='true'/>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:  </slot>
May 06 20:06:02 Celmor-PC FAHClient[1127419]: 18:06:02:</config>

May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:Project: 14534 (Run 267, Clone 2, Gen 31)
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:Unit: 0x00000023cedfaa925ea34b60de2e5a23
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:Reading tar file core.xml
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:Reading tar file frame31.tpr
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:Digital signatures verified
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:Calling: mdrun -s frame31.tpr -o frame31.trr -x frame31.xtc -cpt 15 -nt 16
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:Steps: first=31000000 total=1000000
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:ERROR:
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:ERROR:
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:ERROR:Fatal error:
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:ERROR:There is no domain decomposition for 16 ranks that is compatible with the given box and a minimum cell size of 1.46925 nm
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
May 06 20:07:07 Celmor-PC FAHClient[1127419]: 18:07:07:WU02:FS00:0xa7:ERROR:-------------------------------------------------------

May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:<config>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <!-- Folding Slot Configuration -->
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <cause v='COVID_19'/>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <gpu v='false'/>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <!-- Network -->
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <proxy v=':8080'/>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <!-- Slot Control -->
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <pause-on-battery v='false'/>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <power v='light'/>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <!-- User Information -->
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <passkey v='*****'/>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <team v='223518'/>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <user v='celmor'/>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <!-- Folding Slots -->
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <slot id='0' type='CPU'>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:    <cpus v='16'/>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  </slot>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <slot id='1' type='CPU'>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:    <cpus v='16'/>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  </slot>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <slot id='2' type='CPU'>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:    <cpus v='8'/>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  </slot>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  <slot id='3' type='CPU'>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:    <cpus v='8'/>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:  </slot>
May 06 20:08:04 Celmor-PC FAHClient[1127419]: 18:08:04:</config>

May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:Project: 14534 (Run 267, Clone 2, Gen 31)
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:Unit: 0x00000023cedfaa925ea34b60de2e5a23
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:Reading tar file core.xml
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:Reading tar file frame31.tpr
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:Digital signatures verified
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:Calling: mdrun -s frame31.tpr -o frame31.trr -x frame31.xtc -cpt 15 -nt 16
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:Steps: first=31000000 total=1000000
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:ERROR:
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:ERROR:
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:ERROR:Fatal error:
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:ERROR:There is no domain decomposition for 16 ranks that is compatible with the given box and a minimum cell size of 1.46925 nm
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
May 06 20:08:07 Celmor-PC FAHClient[1127419]: 18:08:07:WU02:FS00:0xa7:ERROR:-------------------------------------------------------

May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:<config>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <!-- Folding Slot Configuration -->
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <cause v='COVID_19'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <gpu v='false'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <!-- Network -->
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <proxy v=':8080'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <!-- Slot Control -->
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <pause-on-battery v='false'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <power v='light'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <!-- User Information -->
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <passkey v='*****'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <team v='223518'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <user v='celmor'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <!-- Folding Slots -->
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <slot id='0' type='CPU'>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:    <cpus v='16'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:    <paused v='true'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  </slot>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <slot id='1' type='CPU'>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:    <cpus v='16'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  </slot>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <slot id='2' type='CPU'>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:    <cpus v='8'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  </slot>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  <slot id='3' type='CPU'>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:    <cpus v='8'/>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:  </slot>
May 07 13:40:19 Celmor-PC FAHClient[1127419]: 11:40:19:</config>

May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:Project: 14534 (Run 267, Clone 2, Gen 31)
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:Unit: 0x00000023cedfaa925ea34b60de2e5a23
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:Reading tar file core.xml
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:Reading tar file frame31.tpr
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:Digital signatures verified
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:Calling: mdrun -s frame31.tpr -o frame31.trr -x frame31.xtc -cpt 15 -nt 16
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:Steps: first=31000000 total=1000000
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:ERROR:
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:ERROR:-------------------------------------------------------
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:ERROR:
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:ERROR:Fatal error:
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:ERROR:There is no domain decomposition for 16 ranks that is compatible with the given box and a minimum cell size of 1.46925 nm
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
May 07 17:40:25 Celmor-PC FAHClient[1127419]: 15:40:25:WU02:FS00:0xa7:ERROR:-------------------------------------------------------

May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:<config>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <!-- Folding Slot Configuration -->
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <cause v='COVID_19'/>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <gpu v='false'/>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <!-- Network -->
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <proxy v=':8080'/>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <!-- Slot Control -->
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <pause-on-battery v='false'/>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <power v='light'/>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <!-- User Information -->
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <passkey v='*****'/>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <team v='223518'/>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <user v='celmor'/>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <!-- Folding Slots -->
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <slot id='1' type='CPU'>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:    <cpus v='16'/>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  </slot>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <slot id='2' type='CPU'>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:    <cpus v='8'/>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  </slot>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  <slot id='3' type='CPU'>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:    <cpus v='8'/>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:  </slot>
May 07 17:41:16 Celmor-PC FAHClient[1127419]: 15:41:16:</config>

...
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:<config>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <!-- Folding Slot Configuration -->
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <cause v='COVID_19'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <gpu v='false'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <!-- Network -->
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <proxy v=':8080'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <!-- Slot Control -->
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <pause-on-battery v='false'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <power v='light'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <!-- User Information -->
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <passkey v='*****'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <team v='223518'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <user v='celmor'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <!-- Folding Slots -->
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <slot id='0' type='CPU'>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:    <cpus v='16'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:    <paused v='true'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  </slot>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  <slot id='1' type='CPU'>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:    <cpus v='16'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:    <paused v='true'/>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:  </slot>
May 05 23:14:31 Celmor-PC FAHClient[1127419]: 21:14:31:</config>

May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Project: 14534 (Run 267, Clone 2, Gen 31)
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Unit: 0x00000023cedfaa925ea34b60de2e5a23
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Reading tar file core.xml
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Reading tar file frame31.tpr
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Digital signatures verified
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Calling: mdrun -s frame31.tpr -o frame31.trr -x frame31.xtc -cpt 15 -nt 16
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:Steps: first=31000000 total=1000000
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:-------------------------------------------------------
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:Fatal error:
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:There is no domain decomposition for 16 ranks that is compatible with the given box and a minimum cell size of 1.46925 nm
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:Look in the log file for details on the domain decomposition
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
May 05 23:15:33 Celmor-PC FAHClient[1127419]: 21:15:33:WU02:FS01:0xa7:ERROR:-------------------------------------------------------

May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:<config>
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <!-- Folding Slot Configuration -->
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <cause v='COVID_19'/>
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <gpu v='false'/>
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <!-- Network -->
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <proxy v=':8080'/>
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <!-- Slot Control -->
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <pause-on-battery v='false'/>
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <power v='full'/>
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <!-- User Information -->
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <passkey v='*****'/>
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <team v='223518'/>
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <user v='celmor'/>
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <!-- Folding Slots -->
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  <slot id='1' type='CPU'>
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:    <cpus v='30'/>
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:  </slot>
May 18 14:57:14 Celmor-PC FAHClient[334756]: 12:57:14:</config>

May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:Project: 16406 (Run 106, Clone 1, Gen 196)
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:Unit: 0x000000e1a8f5c67d5e801f05e2818afe
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:Reading tar file core.xml
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:Reading tar file frame196.tpr
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:Digital signatures verified
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:Reducing thread count from 29 to 28 to avoid domain decomposition by a prime number > 3
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:Calling: mdrun -s frame196.tpr -o frame196.trr -x frame196.xtc -cpt 15 -nt 28
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:Steps: first=98000000 total=500000
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:ERROR:
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:ERROR:-------------------------------------------------------
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:ERROR:
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:ERROR:Fatal error:
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:ERROR:Look in the log file for details on the domain decomposition
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
May 18 14:57:17 Celmor-PC FAHClient[334756]: 12:57:17:WU00:FS01:0xa7:ERROR:-------------------------------------------------------
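
For illustration only, here is a rough sketch of the kind of check the error above seems to describe: splitting the box into a grid of nx * ny * nz cells, one per rank, where no cell may be narrower than the minimum cell size. The function name `valid_grids` and the 6 nm cubic box are assumptions made up for this example; the real box size is not in the log, and the real GROMACS check also deals with non-rectangular boxes and load-balancing margins. The mismatch between `-nt 28` and "20 ranks" is presumably because some threads are set aside as PME-only ranks, but that is a guess from the log, not something it states.

```python
def valid_grids(n_ranks, box_nm, min_cell_nm):
    """Return every (nx, ny, nz) with nx * ny * nz == n_ranks whose cells
    are at least min_cell_nm wide along each axis of a rectangular box."""
    grids = []
    for nx in range(1, n_ranks + 1):
        if n_ranks % nx:
            continue
        rest = n_ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny:
                continue
            nz = rest // ny
            cells = (box_nm[0] / nx, box_nm[1] / ny, box_nm[2] / nz)
            if all(c >= min_cell_nm for c in cells):
                grids.append((nx, ny, nz))
    return grids


# Hypothetical box: the actual dimensions are NOT given in the log.
box = (6.0, 6.0, 6.0)
# Minimum cell size taken from the error message above.
min_cell = 1.37225

for ranks in (20, 16, 12, 8):
    print(ranks, valid_grids(ranks, box, min_cell))
```

With this made-up box, at most 4 cells fit along each axis, so any rank count with a prime factor of 5 or more (such as 20) has no valid grid, while 16, 12 and 8 do. That would be consistent with the client avoiding thread counts that contain large primes, although the actual limits depend on each WU's box.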

May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:Project: 16406 (Run 106, Clone 1, Gen 196)
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:Unit: 0x000000e1a8f5c67d5e801f05e2818afe
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:Reading tar file core.xml
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:Reading tar file frame196.tpr
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:Digital signatures verified
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:Calling: mdrun -s frame196.tpr -o frame196.trr -x frame196.xtc -cpt 15 -nt 28
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:Steps: first=98000000 total=500000
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:ERROR:
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:ERROR:
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:ERROR:Fatal error:
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
May 18 15:20:24 Celmor-PC FAHClient[334756]: 13:20:24:WU00:FS00:0xa7:ERROR:-------------------------------------------------------

May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:<config>
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <!-- Folding Slot Configuration -->
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <cause v='COVID_19'/>
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <gpu v='false'/>
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <!-- Network -->
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <proxy v=':8080'/>
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <!-- Slot Control -->
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <pause-on-battery v='false'/>
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <power v='full'/>
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <!-- User Information -->
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <passkey v='*****'/>
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <team v='223518'/>
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <user v='celmor'/>
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <!-- Folding Slots -->
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  <slot id='0' type='CPU'>
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:    <cpus v='28'/>
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:  </slot>
May 18 15:21:18 Celmor-PC FAHClient[334756]: 13:21:18:</config>

May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:Project: 16427 (Run 0, Clone 2618, Gen 121)
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:Unit: 0x0000008da8f5c67d5e924aab863769f3
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:Reading tar file core.xml
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:Reading tar file frame121.tpr
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:Digital signatures verified
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:Calling: mdrun -s frame121.tpr -o frame121.trr -x frame121.xtc -cpt 15 -nt 28
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:Steps: first=60500000 total=500000
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:ERROR:
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:ERROR:
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:ERROR:Fatal error:
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
May 18 17:00:21 Celmor-PC FAHClient[334756]: 15:00:21:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
Celmor
 
Posts: 4
Joined: Mon May 18, 2020 5:39 pm

Re: Certain WUs keep erroring with high thread numbers

Postby Joe_H » Tue May 19, 2020 6:01 am

Usually, multiples of 2 are good. The same goes for multiples of 3. Analysis of how the decomposition code works, done by one of the people who posts here, indicates that above 18-20 threads it gets a bit more complicated and depends on the geometry of the bounding box involved.

For instance, in the second post the log indicates 28 threads assigned, but the decomposition fails for 20 ranks. 8 threads were reserved for PME calculations, and the bounding box had dimensions that would not divide by 5, so neither a 1x4x5 nor a 2x2x5 grid was possible.
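
As a rough illustration of the point above, here is a minimal Python sketch that lists every way to split a rank count into a 3D grid and keeps only the grids whose cells stay at least as large as the minimum cell size from the error message. The box edge lengths in `BOX` are made up, since the real box for this WU is not given in the thread, and `grids` / `feasible_grids` are hypothetical helper names. The actual GROMACS check is more involved than this.

```python
def grids(n_ranks):
    """All ordered (nx, ny, nz) triples with nx * ny * nz == n_ranks."""
    out = []
    for nx in range(1, n_ranks + 1):
        if n_ranks % nx:
            continue
        rest = n_ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny == 0:
                out.append((nx, ny, rest // ny))
    return out


def feasible_grids(n_ranks, box_nm, min_cell_nm):
    """Grids where every cell edge is at least min_cell_nm long."""
    return [
        g for g in grids(n_ranks)
        if all(length / n >= min_cell_nm for length, n in zip(box_nm, g))
    ]


# ASSUMPTION: hypothetical box edge lengths in nm, not taken from the WU.
BOX = (6.0, 6.0, 6.0)
MIN_CELL = 1.37225  # minimum cell size quoted in the error message

print("20 ranks:", feasible_grids(20, BOX, MIN_CELL))
print("18 ranks:", feasible_grids(18, BOX, MIN_CELL))
```

With this made-up 6 nm cube, at most 4 cells fit along any edge, so 20 ranks (which always needs a factor of 5 or more along some edge) has no valid grid, while 18 ranks does.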

For thread counts of 16 and under, projects have usually been tested and will not be assigned to thread counts that are either too high or problematic. But a few do get through internal and beta testing, and reports on which projects have issues help in refining assignments. Full logs are not needed, just enough to show how the WU started and the error that occurred.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 6451
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: Certain WUs keep erroring with high thread numbers

Postby Celmor » Tue May 19, 2020 6:13 am

Joe_H wrote:Usually, multiples of 2 are good. The same goes for multiples of 3. Analysis of how the decomposition code works, done by one of the people who posts here, indicates that above 18-20 threads it gets a bit more complicated and depends on the geometry of the bounding box involved.

For instance, in the second post the log indicates 28 threads assigned, but the decomposition fails for 20 ranks. 8 threads were reserved for PME calculations, and the bounding box had dimensions that would not divide by 5, so neither a 1x4x5 nor a 2x2x5 grid was possible.

For thread counts of 16 and under, projects have usually been tested and will not be assigned to thread counts that are either too high or problematic. But a few do get through internal and beta testing, and reports on which projects have issues help in refining assignments. Full logs are not needed, just enough to show how the WU started and the error that occurred.


Thanks for the explanation and insight.
Though I must say it would have been nice to have that information in the troubleshooting guide (other than just "Safe values are 4, 6, 8, 12, 24, 32, and 64.").
There's a (temporary) paste of the log: paste.xinu.at/Q3jtP/
Celmor
 
Posts: 4
Joined: Mon May 18, 2020 5:39 pm

Re: Certain WUs keep erroring with high thread numbers

Postby Joe_H » Tue May 19, 2020 6:24 am

Well, the additional "safe'ish" settings end up depending on the project, so they are hard to give in a general guide. For instance, there is a small group of projects that run quite well on a thread setting of 21, but many other projects error out on a multiple of 7.
Joe_H
Site Admin
 
Posts: 6451
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: Certain WUs keep erroring with high thread numbers

Postby PantherX » Tue May 19, 2020 8:30 am

Celmor wrote:...Thanks for the explanation and insight.
Though I must say it would have been nice to have that information in the troubleshooting guide (other than just "Safe values are 4, 6, 8, 12, 24, 32, and 64.")...

Thanks for your feedback. I have updated the content slightly to provide a more meaningful workflow without getting too specific/technical:
...There is no domain decomposition...
  1. Modify the CPU value (Advanced Control/FAHControl -> Configure -> Slots Tab -> CPU Slot -> Edit -> New CPU Value) to a smaller one that is neither a prime number bigger than 3 nor a multiple of 5. Generally speaking, the safe values are: 2, 4, 6, 8, 12, 24, 32, and 64. There can be other safe numbers too, but those are Project specific. You can create a topic or search the Forums. The reason some values fail is that the FahCore can't divide the assigned WU across all the CPUs (technical details). Create a new topic here, provide the log file, and state the original CPU value so that the F@H Team can prevent that Project from being assigned to those CPU values.
...
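
The rule of thumb in that guide text can be written down as a quick check. This is only a sketch of the guide's wording, with hypothetical function names (`passes_rule_of_thumb`, `next_lower_candidate`); it is not how FAHClient or the FahCore decides anything. Note that 28 passes this check and still failed for the WUs in this thread, so treat it as a first filter only.

```python
def is_prime(n):
    """Plain trial-division primality test, fine for small thread counts."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


# List quoted in the guide text above.
GENERALLY_SAFE = (2, 4, 6, 8, 12, 24, 32, 64)


def passes_rule_of_thumb(cpus):
    """True if cpus is neither a prime bigger than 3 nor a multiple of 5."""
    if cpus > 3 and is_prime(cpus):
        return False
    return cpus % 5 != 0


def next_lower_candidate(cpus):
    """Largest value <= cpus that passes the rule of thumb (hypothetical helper)."""
    while cpus > 1 and not passes_rule_of_thumb(cpus):
        cpus -= 1
    return cpus


for candidate in (25, 27, 28, 29, 30):
    print(candidate, passes_rule_of_thumb(candidate), next_lower_candidate(candidate))
```

Every value in the guide's list passes the check, and `next_lower_candidate(29)` gives 28, the same step the log shows the core taking on its own.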
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
PantherX
Site Moderator
 
Posts: 6345
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: Certain WUs keep erroring with high thread numbers

Postby Neil-B » Tue May 19, 2020 8:54 am

I am intrigued as to why 27 is pretty much always left off the list ... have there been past issues with this? ... it is only divisible by low primes ... is there some issue with PME that makes this risky?
Neil-B
 
Posts: 1217
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: Certain WUs keep erroring with high thread numbers

Postby PantherX » Tue May 19, 2020 9:49 am

Neil-B wrote:I am intrigued as to why 27 is pretty much always left off the list ... have there been past issues with this? ... it is only divisible by low primes ... is there some issue with PME that makes this risky?

I have purposely left it off that list simply because 27 CPUs is not a "natural" number of CPUs. By that, I mean it is not a core count found on the CPUs released by AMD/Intel, which are what researchers use when they run systems dedicated to GROMACS.

While _r2w_ben's awesome analysis hasn't shown that 27 fails for the current CPU projects (viewtopic.php?p=330501#p330501), there's no guarantee that future projects will work. Thus, that list contains the safest options to ensure that it just works. I like to apply the 80/20 rule. My guides tend to cover the 80% and give a bit of a hint for the other 20%, but not too much detail, as it might become a challenge for me to maintain in future. Focusing on the 80% provides the best balance for me :)
PantherX
Site Moderator
 
Posts: 6345
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: Certain WUs keep erroring with high thread numbers

Postby Neil-B » Tue May 19, 2020 11:40 am

Ta for the logic … makes absolute sense …

I may switch my 32/56 and 24/56 slots to a pair of 27/56 to give a little headroom for other stuff and see how it goes - might be able to report back that it is best avoided !! … I'm intrigued as to how they will fare in total PPD against my existing setup - part of me thinks I will see a noticeable drop in PPD (and hence science throughput) from dropping the highest core count slot by 5 and also having two fewer cores - but the other part is wondering if this is the easiest way to free up a couple of cores for non FAHCore duties and reduce any contention without having to take more than a couple of cores hit (24>18, 32>24).

In case anyone is interested:

OK so have been running 2x 27/56 slots instead of my usual 24/56 and 32/56 slots this afternoon and have had no issues with faults - all WUs completed OK … tbh it seems, off a very short run admittedly, that my regular setup delivers higher PPD (and therefore better science) than this adjusted setup so I guess the contention issues of having my regular two slots utilising all available threads are less damaging than dropping two threads to alleviate this (by reducing the 32 to 27 and increasing the 24 to 27)

I'll run it for a day and compare (as close as I can) like for like WUs from the same projects and see how the actual figures add up - but it's looking like I'll be back to 24/56 and 32/56 tomorrow afternoon.
Neil-B
 
Posts: 1217
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: Certain WUs keep erroring with high thread numbers

Postby _r2w_ben » Wed May 20, 2020 1:13 am

Neil-B wrote:OK so have been running 2x 27/56 slots instead of my usual 24/56 and 32/56 slots this afternoon and have had no issues with faults - all WUs completed OK … tbh it seems, off a very short run admittedly, that my regular setup delivers higher PPD (and therefore better science) than this adjusted setup so I guess the contention issues of having my regular two slots utilising all available threads are less damaging than dropping two threads to alleviate this (by reducing the 32 to 27 and increasing the 24 to 27)

I'll run it for a day and compare (as close as I can) like for like WUs from the same projects and see how the actual figures add up - but it's looking like I'll be back to 24/56 and 32/56 tomorrow afternoon.

24 vs. 27 threads is an interesting comparison. Here's some of the data I've collected on thread allocations for those two core counts:
                                            Expected
Thread allocations                          Winner     Reason
24 = 12 PP + 12 PME   27 = 15 PP + 12 PME   27         25% more PP threads. PME looks higher than needed on 24
24 = 15 PP +  9 PME   27 = 15 PP + 12 PME   Draw       Same PP threads. 27 if PME was a bottleneck on 24
24 = 16 PP +  8 PME   27 = 18 PP +  9 PME   27         12.5% more PP threads. PME/Total are both 1/3
24 = 18 PP +  6 PME   27 = 18 PP +  9 PME   Draw       Same PP threads. 27 if PME was a bottleneck on 24
24 = 20 PP +  4 PME   27 = 24 PP +  3 PME   27         20% more PP threads
24 = 20 PP +  4 PME   27 = 18 PP +  9 PME   24         10% less PP threads. PME looks higher than needed on 27

I would expect performance between 24 and 27 threads to be a tossup dependent on projects that are currently running.
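
For anyone who wants to redo the percentages for other splits, here is a small Python sketch. The (PP, PME) pairs are copied from the 24 vs. 27 table above; `pp_change_percent` is just an illustrative helper name, and the script says nothing about which split a given project will actually get.

```python
def pp_change_percent(pp_a, pp_b):
    """Percent change in PP threads going from allocation A to allocation B."""
    return (pp_b - pp_a) / pp_a * 100


# ((PP, PME) on 24 threads, (PP, PME) on 27 threads), one entry per table row.
ROWS = [
    ((12, 12), (15, 12)),
    ((15, 9), (15, 12)),
    ((16, 8), (18, 9)),
    ((18, 6), (18, 9)),
    ((20, 4), (24, 3)),
    ((20, 4), (18, 9)),
]

for (pp24, pme24), (pp27, pme27) in ROWS:
    # Sanity check: each split must add up to the slot's thread count.
    assert pp24 + pme24 == 24 and pp27 + pme27 == 27
    change = pp_change_percent(pp24, pp27)
    print(f"{pp24:2d} PP -> {pp27:2d} PP: {change:+.1f}% PP threads")
```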
_r2w_ben
 
Posts: 277
Joined: Wed Apr 23, 2008 4:11 pm

Re: Certain WUs keep erroring with high thread numbers

Postby Neil-B » Wed May 20, 2020 10:14 am

_r2w_ben wrote:I would expect performance between 24 and 27 threads to be a tossup dependent on projects that are currently running.

… Gold Star awarded for Crystal Ball Excellence :)

The 27slots were overall maybe a shave above the 24slot … So the Gain of a 27 instead of a 24 was minimal - the loss of a 32 for a 27 was significant - overall the drop in PPD fairly closely matched the loss from 32 to 27 so the gain by reducing contention was in my case not noticeable over a fairly small sample - I changed slots back to normal after about 12 hours (16 or so WUs completed).

All of the 27slot WUs completed OK (from some 10+ different projects) … I could do the switch again on a longer term if checking 25thread slots beta testing was something needed by researchers - but if people tend to steer clear of this slot then I guess that it isn't that critical?
Neil-B
 
Posts: 1217
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: Certain WUs keep erroring with high thread numbers

Postby _r2w_ben » Wed May 20, 2020 11:04 pm

25 threads will fail more often than 24 and 27. It normally fails on the same work units as 10, 15, and 28 threads. You could use 25 if you want to try to catch domain decomposition errors that slipped through internal testing.

Here's a chart for 24 vs. 25.
                                            Expected
Thread allocations                          Winner     Reason
24 = 12 PP + 12 PME   25 = 15 PP + 10 PME   25         25% more PP threads. PME looks higher than needed on 24
24 = 15 PP +  9 PME   25 = 15 PP + 10 PME   Draw       Same PP threads. 25 if PME was a bottleneck on 24
24 = 16 PP +  8 PME   25 = 15 PP + 10 PME   24         6% less PP threads
24 = 18 PP +  6 PME   25 = 20 PP +  5 PME   25         11% more PP threads
24 = 18 PP +  6 PME   25 = 15 PP + 10 PME   24         17% less PP threads. PME looks higher than needed on 25
24 = 20 PP +  4 PME   25 = 20 PP +  5 PME   Draw       Same PP threads. 25 if PME was a bottleneck on 24
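
One pattern that can be read off the numbers posted in this thread: every PP count listed for 25 threads, and the 20 PP ranks from the 28-thread error log, is a multiple of 5, which lines up with the earlier remark about boxes that will not divide by 5. The sketch below only checks that observation against the thread's own data. It is not a description of how GROMACS picks PP/PME splits, and the dictionary is limited to the splits quoted here.

```python
# PP rank counts taken from this thread: the tables for 24, 25 and 27 threads,
# plus 28 threads = 20 PP + 8 PME from the error log.
PP_COUNTS = {
    24: [12, 15, 16, 18, 20],
    25: [15, 20],
    27: [15, 18, 24],
    28: [20],
}


def always_multiple_of_5(pp_counts):
    """True if every listed PP count needs a factor of 5 in its grid."""
    return all(pp % 5 == 0 for pp in pp_counts)


for threads, pps in sorted(PP_COUNTS.items()):
    print(threads, "threads, every listed PP count divisible by 5:",
          always_multiple_of_5(pps))
```

On this data, 24 and 27 have at least one listed split that avoids a factor of 5, while 25 and 28 do not.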
_r2w_ben
 
Posts: 277
Joined: Wed Apr 23, 2008 4:11 pm

