Project 16417 fails on high core count machines

Area256
Posts: 3
Joined: Wed Jul 14, 2010 2:49 pm

Post by Area256 »

I'm having a problem with Project 16417 (and possibly similar ones) on very high core count machines (128 threads in this case).

I think the issue is that too many threads are assigned to the core: if I limit the number of threads, the unit runs. Frustratingly, when the unit fails the client just keeps retrying it instead of reporting the failure and fetching another unit, so my system sits idle, retrying constantly, until I limit the number of cores for this unit by switching to "Light" folding power.

Full logs:

22:34:41:WU00:FS00:Starting
22:34:41:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 704 -lifeline 8778 -checkpoint 15 -np 128
22:34:41:WU00:FS00:Started FahCore on PID 11885
22:34:41:WU00:FS00:Core PID:11889
22:34:41:WU00:FS00:FahCore 0xa7 started
22:34:41:WU00:FS00:0xa7:*********************** Log Started 2020-04-06T22:34:41Z ***********************
22:34:41:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
22:34:41:WU00:FS00:0xa7:       Type: 0xa7
22:34:41:WU00:FS00:0xa7:       Core: Gromacs
22:34:41:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 704 -lifeline 11885 -checkpoint 15 -np
22:34:41:WU00:FS00:0xa7:             128
22:34:41:WU00:FS00:0xa7:************************************ CBang *************************************
22:34:41:WU00:FS00:0xa7:       Date: Nov 5 2019
22:34:41:WU00:FS00:0xa7:       Time: 06:06:57
22:34:41:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
22:34:41:WU00:FS00:0xa7:     Branch: master
22:34:41:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
22:34:41:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
22:34:41:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
22:34:41:WU00:FS00:0xa7:       Bits: 64
22:34:41:WU00:FS00:0xa7:       Mode: Release
22:34:41:WU00:FS00:0xa7:************************************ System ************************************
22:34:41:WU00:FS00:0xa7:        CPU: AMD EPYC 7702 64-Core Processor
22:34:41:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 49 Stepping 0
22:34:41:WU00:FS00:0xa7:       CPUs: 128
22:34:41:WU00:FS00:0xa7:     Memory: 251.54GiB
22:34:41:WU00:FS00:0xa7:Free Memory: 244.47GiB
22:34:41:WU00:FS00:0xa7:    Threads: POSIX_THREADS
22:34:41:WU00:FS00:0xa7: OS Version: 5.3
22:34:41:WU00:FS00:0xa7:Has Battery: false
22:34:41:WU00:FS00:0xa7: On Battery: false
22:34:41:WU00:FS00:0xa7: UTC Offset: 0
22:34:41:WU00:FS00:0xa7:        PID: 11889
22:34:41:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
22:34:41:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
22:34:41:WU00:FS00:0xa7:    Version: 0.0.18
22:34:41:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
22:34:41:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
22:34:41:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
22:34:41:WU00:FS00:0xa7:       Date: Nov 5 2019
22:34:41:WU00:FS00:0xa7:       Time: 06:13:26
22:34:41:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
22:34:41:WU00:FS00:0xa7:     Branch: master
22:34:41:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
22:34:41:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
22:34:41:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
22:34:41:WU00:FS00:0xa7:       Bits: 64
22:34:41:WU00:FS00:0xa7:       Mode: Release
22:34:41:WU00:FS00:0xa7:************************************ Build *************************************
22:34:41:WU00:FS00:0xa7:       SIMD: avx_256
22:34:41:WU00:FS00:0xa7:********************************************************************************
22:34:41:WU00:FS00:0xa7:Project: 16417 (Run 535, Clone 2, Gen 7)
22:34:41:WU00:FS00:0xa7:Unit: 0x0000000796880e6e5e8a61a9533cfa03
22:34:41:WU00:FS00:0xa7:Reading tar file core.xml
22:34:41:WU00:FS00:0xa7:Reading tar file frame7.tpr
22:34:41:WU00:FS00:0xa7:Digital signatures verified
22:34:41:WU00:FS00:0xa7:Calling: mdrun -s frame7.tpr -o frame7.trr -x frame7.xtc -cpt 15 -nt 128
22:34:41:WU00:FS00:0xa7:Steps: first=1750000 total=250000
22:34:41:WU00:FS00:0xa7:ERROR:
22:34:41:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
22:34:41:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
22:34:41:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
22:34:41:WU00:FS00:0xa7:ERROR:
22:34:41:WU00:FS00:0xa7:ERROR:Fatal error:
22:34:41:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 96 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
22:34:41:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
22:34:41:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
22:34:41:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
22:34:41:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
22:34:41:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
22:34:46:WU00:FS00:0xa7:WARNING:Unexpected exit() call
22:34:46:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
22:34:46:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
22:34:46:WU00:FS00:0xa7:Saving result file md.log
22:34:46:WU00:FS00:0xa7:Saving result file science.log
22:34:46:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
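
To compare reports like this one, it can help to pull out just the thread count passed to mdrun and the rank count that GROMACS complains about. Below is a minimal Python sketch, assuming the log format shown above; the function name `summarize_dd_failure` and the two regular expressions are illustrative and not part of FAHClient.

```python
import re

# Patterns for the two lines of interest in a FAHClient log (illustrative).
NT_RE = re.compile(r"Calling: mdrun .*-nt (\d+)")
DD_RE = re.compile(
    r"no domain decomposition for (\d+) ranks .* minimum cell size of ([\d.]+) nm"
)


def summarize_dd_failure(log_text):
    """Return (threads, failing_ranks, min_cell_nm), or None if either line is missing."""
    nt = NT_RE.search(log_text)
    dd = DD_RE.search(log_text)
    if not (nt and dd):
        return None
    return int(nt.group(1)), int(dd.group(1)), float(dd.group(2))


# Two lines copied from the log above.
sample = """\
22:34:41:WU00:FS00:0xa7:Calling: mdrun -s frame7.tpr -o frame7.trr -x frame7.xtc -cpt 15 -nt 128
22:34:41:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 96 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
"""

result = summarize_dd_failure(sample)
print(result)  # (128, 96, 1.4227)
```

Here 128 threads were requested but the error refers to 96 ranks; the remaining 32 were presumably set aside as PME-only ranks, which is something mdrun does when it guesses a PME split.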

My config.xml:

<config>
  <!-- Client Control -->
  <fold-anon v='true'/>

  <!-- Folding Slot Configuration -->
  <gpu v='false'/>

  <!-- HTTP Server -->
  <allow v='127.0.0.1'/>

  <!-- Network -->
  <proxy v=':8080'/>

  <!-- Remote Command Server -->
  <password v='***'/>

  <!-- Slot Control -->
  <power v='light'/>

  <!-- User Information -->
  <passkey v='***'/>
  <team v='***'/>
  <user v='***'/>

  <!-- Web Server -->
  <web-allow v='127.0.0.1'/>

  <!-- Folding Slots -->
  <slot id='0' type='CPU'>
    <client-type v='bigadv'/>
  </slot>
</config>
Joe_H
Site Admin
Posts: 7857
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Post by Joe_H »

Only a limited number of CPU projects will run on that many threads; the rest usually have upper limits set for assignment.

What was the size you were able to run this project on? I will pass this information back to the researcher so they can check and adjust the limits for this project.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3

Post by Area256 »

I was able to run it on 64 threads (which seems to be the limit set by the "Light" option). I'm afraid I didn't test anything higher.

Post by Joe_H »

Other settings would need to be made through FAHControl's Configure. But that at least gives some useful limits.
sukritsingh
Scientist
Posts: 135
Joined: Sat Mar 14, 2020 11:53 pm

Post by sukritsingh »

Thanks for flagging! I've updated the project's thread limits so that it only runs on 64 cores or fewer, since that was the count you tested successfully.
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Post by Neil-B »

FYI: a larger core count might have been the reason my Project 16417 (Run 2023, Clone 4, Gen 2) unit failed on a 24-core slot. It has since been completed successfully by someone else, so the upper limit might be lower than 64?

12:26:12:WU01:FS00:Connecting to 65.254.110.245:8080
12:26:12:WU01:FS00:Assigned to work server 150.136.14.110
12:26:12:WU01:FS00:Requesting new work unit for slot 00: READY cpu:24 from 150.136.14.110
12:26:12:WU01:FS00:Connecting to 150.136.14.110:8080
12:26:13:WU01:FS00:Downloading 2.34MiB
12:26:14:WU01:FS00:Download complete
12:26:14:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:16417 run:2023 clone:4 gen:2 core:0xa7 unit:0x0000000296880e6e5e8a604c5fef0873
12:26:14:WU01:FS00:Starting
12:26:14:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\OpDoubleHelix\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/avx/Core_a7.fah/FahCore_a7.exe -dir 01 -suffix 01 -version 705 -lifeline 13272 -checkpoint 5 -np 24
12:26:14:WU01:FS00:Started FahCore on PID 10236
12:26:14:WU01:FS00:Core PID:10032
12:26:14:WU01:FS00:FahCore 0xa7 started
12:26:14:WU01:FS00:0xa7:*********************** Log Started 2020-04-06T12:26:14Z ***********************
12:26:14:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
12:26:14:WU01:FS00:0xa7:       Type: 0xa7
12:26:14:WU01:FS00:0xa7:       Core: Gromacs
12:26:14:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 705 -lifeline 10236 -checkpoint 5 -np
12:26:14:WU01:FS00:0xa7:             24
12:26:14:WU01:FS00:0xa7:************************************ CBang *************************************
12:26:14:WU01:FS00:0xa7:       Date: Oct 26 2019
12:26:14:WU01:FS00:0xa7:       Time: 01:38:25
12:26:14:WU01:FS00:0xa7:   Revision: c46a1a011a24143739ac7218c5a435f66777f62f
12:26:14:WU01:FS00:0xa7:     Branch: master
12:26:14:WU01:FS00:0xa7:   Compiler: Visual C++ 2008
12:26:14:WU01:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
12:26:14:WU01:FS00:0xa7:   Platform: win32 10
12:26:14:WU01:FS00:0xa7:       Bits: 64
12:26:14:WU01:FS00:0xa7:       Mode: Release
12:26:14:WU01:FS00:0xa7:************************************ System ************************************
12:26:14:WU01:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
12:26:14:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
12:26:14:WU01:FS00:0xa7:       CPUs: 56
12:26:14:WU01:FS00:0xa7:     Memory: 511.75GiB
12:26:14:WU01:FS00:0xa7:Free Memory: 500.27GiB
12:26:14:WU01:FS00:0xa7:    Threads: WINDOWS_THREADS
12:26:14:WU01:FS00:0xa7: OS Version: 6.2
12:26:14:WU01:FS00:0xa7:Has Battery: false
12:26:14:WU01:FS00:0xa7: On Battery: false
12:26:14:WU01:FS00:0xa7: UTC Offset: 1
12:26:14:WU01:FS00:0xa7:        PID: 10032
12:26:14:WU01:FS00:0xa7:        CWD: C:\Users\OpDoubleHelix\AppData\Roaming\FAHClient\work
12:26:14:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
12:26:14:WU01:FS00:0xa7:    Version: 0.0.18
12:26:14:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
12:26:15:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
12:26:15:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
12:26:15:WU01:FS00:0xa7:       Date: Oct 26 2019
12:26:15:WU01:FS00:0xa7:       Time: 01:52:30
12:26:15:WU01:FS00:0xa7:   Revision: c1e3513b1bc0c16013668f2173ee969e5995b38e
12:26:15:WU01:FS00:0xa7:     Branch: master
12:26:15:WU01:FS00:0xa7:   Compiler: Visual C++ 2008
12:26:15:WU01:FS00:0xa7:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
12:26:15:WU01:FS00:0xa7:   Platform: win32 10
12:26:15:WU01:FS00:0xa7:       Bits: 64
12:26:15:WU01:FS00:0xa7:       Mode: Release
12:26:15:WU01:FS00:0xa7:************************************ Build *************************************
12:26:15:WU01:FS00:0xa7:       SIMD: avx_256
12:26:15:WU01:FS00:0xa7:********************************************************************************
12:26:15:WU01:FS00:0xa7:Project: 16417 (Run 2023, Clone 4, Gen 2)
12:26:15:WU01:FS00:0xa7:Unit: 0x0000000296880e6e5e8a604c5fef0873
12:26:15:WU01:FS00:0xa7:Reading tar file core.xml
12:26:15:WU01:FS00:0xa7:Reading tar file frame2.tpr
12:26:15:WU01:FS00:0xa7:Digital signatures verified
12:26:15:WU01:FS00:0xa7:Calling: mdrun -s frame2.tpr -o frame2.trr -x frame2.xtc -cpt 5 -nt 24
12:26:15:WU01:FS00:0xa7:Steps: first=500000 total=250000
12:26:15:WU01:FS00:0xa7:ERROR:
12:26:15:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
12:26:15:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
12:26:15:WU01:FS00:0xa7:ERROR:Source code file: C:\build\fah\core-a7-avx-release\windows-10-64bit-core-a7-avx-release\gromacs-core\build\gromacs\src\gromacs\mdlib\domdec.c, line: 6902
12:26:15:WU01:FS00:0xa7:ERROR:
12:26:15:WU01:FS00:0xa7:ERROR:Fatal error:
12:26:15:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
12:26:15:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
12:26:15:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
12:26:15:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
12:26:15:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
12:26:15:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
12:26:20:WU01:FS00:0xa7:WARNING:Unexpected exit() call
12:26:20:WU01:FS00:0xa7:WARNING:Unexpected exit from science code
12:26:20:WU01:FS00:0xa7:Saving result file ..\logfile_01.txt
12:26:20:WU01:FS00:0xa7:Saving result file md.log
12:26:20:WU01:FS00:0xa7:Saving result file science.log
12:26:20:WU01:FS00:0xa7:WARNING:While cleaning up: boost::filesystem::remove: The process cannot access the file because it is being used by another process: "01/md.log"
12:26:20:WU01:FS00:0xa7:Folding@home Core Shutdown: BAD_WORK_UNIT
12:26:20:WARNING:WU01:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
12:26:20:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:16417 run:2023 clone:4 gen:2 core:0xa7 unit:0x0000000296880e6e5e8a604c5fef0873
12:26:20:WU01:FS00:Uploading 20.00KiB to 150.136.14.110
12:26:20:WU01:FS00:Connecting to 150.136.14.110:8080
12:26:20:WU01:FS00:Upload complete
12:26:20:WU01:FS00:Server responded WORK_ACK (400)
12:26:20:WU01:FS00:Cleaning up
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)

Post by Neil-B »

Further update from another thread (viewtopic.php?f=19&t=34072#p323443): might have to go as low as 8 cores for this?

Ignore me: Joe_H has responded on the other thread. Given that 64 cores worked, he reckons a multiple of 5 is probably the issue. I'm not sure about my 24-core failure, but I'm happy for that to be an anomaly.
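
A rough way to see why a factor of 5 could matter: the 24-thread failure above was reported for 20 ranks, and any 3D grid of 20 cells needs at least 5 cells along one axis. The Python sketch below enumerates grids and checks them against a minimum cell size. It is a simplification, not what mdrun actually does (it ignores triclinic boxes and load-balancing details), and the box size is not shown anywhere in these logs, so `ASSUMED_BOX_NM` is a made-up value chosen only for illustration.

```python
def dd_grids(ranks):
    """All ordered (nx, ny, nz) with nx * ny * nz == ranks."""
    grids = []
    for nx in range(1, ranks + 1):
        if ranks % nx:
            continue
        rest = ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny == 0:
                grids.append((nx, ny, rest // ny))
    return grids


def fits(ranks, box_nm, min_cell_nm):
    """True if some grid keeps every cell at least min_cell_nm wide.

    Simplification: rectangular box split into equal cells.
    """
    max_cells = [int(length // min_cell_nm) for length in box_nm]
    return any(
        all(n <= m for n, m in zip(grid, max_cells)) for grid in dd_grids(ranks)
    )


MIN_CELL_NM = 1.4227                  # from the GROMACS error message
ASSUMED_BOX_NM = (6.5, 6.5, 6.5)      # ASSUMPTION: hypothetical box, not from the logs

# 20 and 96 are rank counts from the errors in this thread;
# 36 is an arbitrary comparison value.
for ranks in (20, 96, 36):
    print(ranks, fits(ranks, ASSUMED_BOX_NM, MIN_CELL_NM))
# prints:
# 20 False
# 96 False
# 36 True
```

With this assumed box at most 4 cells fit along each axis, so 20 (which needs a 5 somewhere) and 96 (which needs more than 4 x 4 x 4 cells) both fail, while 36 = 3 x 3 x 4 fits. A different box size would shift these results.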
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Post by PantherX »

FYI, I have confirmation from the Project owner that Project 16417 will no longer be assigned to 24 CPUs. Thanks all for your report :)
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Zzyzx
Posts: 8
Joined: Thu Apr 18, 2013 4:20 pm
Hardware configuration: Core i7 920
16GB DDR3 RAM
GTX 580
Location: Phoenix, Arizona, USA

Post by Zzyzx »

PantherX wrote: FYI, I have confirmation from the Project owner that Project 16417 will no longer be assigned to 24 CPUs. Thanks all for your report :)

Hey there! I got assigned 16417 on a 24c/48t machine today and had the same issue:

13:47:45:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 84821 -checkpoint 5 -np 48
13:47:45:WU00:FS00:Started FahCore on PID 64119
13:47:45:WU00:FS00:Core PID:64123
13:47:45:WU00:FS00:FahCore 0xa7 started
13:47:45:WU00:FS00:0xa7:*********************** Log Started 2020-04-20T13:47:45Z ***********************
13:47:45:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
13:47:45:WU00:FS00:0xa7:       Type: 0xa7
13:47:45:WU00:FS00:0xa7:       Core: Gromacs
13:47:45:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 64119 -checkpoint 5 -np
13:47:45:WU00:FS00:0xa7:             48
13:47:45:WU00:FS00:0xa7:************************************ CBang *************************************
13:47:45:WU00:FS00:0xa7:       Date: Nov 5 2019
13:47:45:WU00:FS00:0xa7:       Time: 06:06:57
13:47:45:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
13:47:45:WU00:FS00:0xa7:     Branch: master
13:47:45:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
13:47:45:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
13:47:45:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
13:47:45:WU00:FS00:0xa7:       Bits: 64
13:47:45:WU00:FS00:0xa7:       Mode: Release
13:47:45:WU00:FS00:0xa7:************************************ System ************************************
13:47:45:WU00:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
13:47:45:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 62 Stepping 4
13:47:45:WU00:FS00:0xa7:       CPUs: 48
13:47:45:WU00:FS00:0xa7:     Memory: 15.48GiB
13:47:45:WU00:FS00:0xa7:Free Memory: 8.42GiB
13:47:45:WU00:FS00:0xa7:    Threads: POSIX_THREADS
13:47:45:WU00:FS00:0xa7: OS Version: 4.18
13:47:45:WU00:FS00:0xa7:Has Battery: false
13:47:45:WU00:FS00:0xa7: On Battery: false
13:47:45:WU00:FS00:0xa7: UTC Offset: -7
13:47:45:WU00:FS00:0xa7:        PID: 64123
13:47:45:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
13:47:45:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
13:47:45:WU00:FS00:0xa7:    Version: 0.0.18
13:47:45:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:47:45:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
13:47:45:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
13:47:45:WU00:FS00:0xa7:       Date: Nov 5 2019
13:47:45:WU00:FS00:0xa7:       Time: 06:13:26
13:47:45:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
13:47:45:WU00:FS00:0xa7:     Branch: master
13:47:45:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
13:47:45:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
13:47:45:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
13:47:45:WU00:FS00:0xa7:       Bits: 64
13:47:45:WU00:FS00:0xa7:       Mode: Release
13:47:45:WU00:FS00:0xa7:************************************ Build *************************************
13:47:45:WU00:FS00:0xa7:       SIMD: avx_256
13:47:45:WU00:FS00:0xa7:********************************************************************************
13:47:45:WU00:FS00:0xa7:Project: 16417 (Run 473, Clone 2, Gen 83)
13:47:45:WU00:FS00:0xa7:Unit: 0x0000005a96880e6e5e8a61200c024db9
13:47:45:WU00:FS00:0xa7:Reading tar file core.xml
13:47:45:WU00:FS00:0xa7:Reading tar file frame83.tpr
13:47:45:WU00:FS00:0xa7:Digital signatures verified
13:47:45:WU00:FS00:0xa7:Calling: mdrun -s frame83.tpr -o frame83.trr -x frame83.xtc -cpt 5 -nt 48
13:47:45:WU00:FS00:0xa7:Steps: first=20750000 total=250000
13:47:45:WU00:FS00:0xa7:ERROR:
13:47:45:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
13:47:45:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
13:47:45:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
13:47:45:WU00:FS00:0xa7:ERROR:
13:47:45:WU00:FS00:0xa7:ERROR:Fatal error:
13:47:45:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 40 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
13:47:45:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
13:47:45:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
13:47:45:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
13:47:45:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
13:47:45:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
13:47:50:WU00:FS00:0xa7:WARNING:Unexpected exit() call
13:47:50:WU00:FS00:0xa7:WARNING:Unexpected exit from science code
13:47:50:WU00:FS00:0xa7:Saving result file ../logfile_01.txt
13:47:50:WU00:FS00:0xa7:Saving result file md.log
13:47:50:WU00:FS00:0xa7:Saving result file science.log
13:47:51:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
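
The numbers in this post can be tied together with a little arithmetic. The md.log excerpt in this post reports 48 ranks, 8 of them PME-only, and a P-LINCS distance of 1.138 nm. The Python sketch below rests on two assumptions that the logs do not confirm: that mdrun's `-dds` option is at its documented default of 0.8, and that the larger of the two reported distances sets the limit. Under those assumptions it reproduces the 40 ranks, and approximately the 1.4227 nm, named in the error.

```python
# Values copied from the md.log excerpt in this post.
total_ranks = 48      # "Initializing Domain Decomposition on 48 ranks"
pme_ranks = 8         # "Using 8 separate PME ranks, as guessed by mdrun"
plincs_nm = 1.138     # "Estimated maximum distance required for P-LINCS"
bonded_nm = 0.472     # "Minimum cell size due to bonded interactions"

# ASSUMPTION: -dds is at its documented default; the logs do not show it.
dds = 0.8

pp_ranks = total_ranks - pme_ranks
limit_nm = max(plincs_nm, bonded_nm) / dds

print(pp_ranks)            # 40
print(round(limit_nm, 4))  # 1.4225
```

The small gap between 1.4225 and 1.4227 is consistent with 1.138 being a rounded value in the log.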

Here is md.log in case it helps:

Log file opened on Mon Apr 20 06:47:45 2020
Host: direwolf-fah.wolfeindustrie.com  pid: 64123  rank ID: 0  number of ranks:  1
GROMACS:    GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown

GROMACS is written by:
Emile Apol         Rossen Apostolov   Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren  Rudi van Drunen    Anton Feenstra     Sebastian Fritsch
Gerrit Groenhof    Christoph Junghans Peter Kasson       Carsten Kutzner
Per Larsson        Justin A. Lemkul   Magnus Lundborg    Pieter Meulenhoff
Erik Marklund      Teemu Murtola      Szilard Pall       Sander Pronk
Roland Schulz      Alexey Shvetsov    Michael Shirts     Alfons Sijbers
Peter Tieleman     Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2014, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.


GROMACS:      GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown

Gromacs version:    VERSION 5.0.4-20191026-456f0d636-unknown
GIT SHA1 hash:      456f0d636b694d70ef483843dbb1b1383643ee12
Branched from:      unknown
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     disabled
GPU support:        disabled
invsqrt routine:    gmx_software_invsqrt(x)
SIMD instructions:  AVX_256
FFT library:        fftw-3.3.8-sse2-avx
RDTSCP usage:       disabled
C++11 compilation:  disabled
TNG support:        enabled
Tracing support:    disabled
Built on:           Wed Mar 22 01:02:31 UTC 2017
Built by:           root@69562b3fdcef [CMAKE]
Build OS/arch:      Linux 4.9.0-1-amd64 x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz
Build CPU family:   6   Model: 58   Stepping: 9
Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/cc GNU 8.3.0
C compiler flags:    -mavx   -I/host/debian-stable-64bit-core-a7-avx-release/libfah/build/src -I/host/debian-stable-64bit-core-a7-avx-release/cbang/build/include -Wno-maybe-uninitialized -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter -Wno-unknown-pragmas  -O3 -DNDEBUG -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds
C++ compiler:       /usr/bin/c++ GNU 8.3.0
C++ compiler flags:  -mavx   -I/host/debian-stable-64bit-core-a7-avx-release/libfah/build/src -I/host/debian-stable-64bit-core-a7-avx-release/cbang/build/include -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function -Wno-unknown-pragmas  -O3 -DNDEBUG -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds
Boost version:      1.55.0 (internal)



++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Can not increase nstlist because verlet-buffer-tolerance is not set or used
Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.004
   nsteps                         = 250000
   init-step                      = 20750000
   simulation-part                = 1
   comm-mode                      = Linear
   nstcomm                        = 5
   bd-fric                        = 0
   ld-seed                        = 2396924895
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 0
   nstvout                        = 0
   nstfout                        = 0
   nstlog                         = 0
   nstcalcenergy                  = 0
   nstenergy                      = 0
   nstxout-compressed             = 5000
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 10
   ns-type                        = Grid
   pbc                            = xyz
   periodic-molecules             = FALSE
   verlet-buffer-tolerance        = -1
   rlist                          = 1.1
   rlistlong                      = 1.1
   nstcalclr                      = 10
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 0.9
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 0.9
   DispCorr                       = EnerPres
   table-extension                = 1
   fourierspacing                 = 0.12
   fourier-nx                     = 72
   fourier-ny                     = 72
   fourier-nz                     = 72
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   implicit-solvent               = No
   gb-algorithm                   = Still
   nstgbradii                     = 1
   rgbradii                       = 1
   gb-epsilon-solvent             = 80
   gb-saltconc                    = 0
   gb-obc-alpha                   = 1
   gb-obc-beta                    = 0.8
   gb-obc-gamma                   = 4.85
   gb-dielectric-offset           = 0.009
   sa-algorithm                   = Ace-approximation
   sa-surface-tension             = 2.05016
   tcoupl                         = V-rescale
   nsttcouple                     = 10
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = FALSE
   pcoupl                         = Parrinello-Rahman
   pcoupltype                     = Isotropic
   nstpcouple                     = 10
   tau-p                          = 1
   compressibility (3x3):
      compressibility[    0]={ 4.50000e-05,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  4.50000e-05,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  4.50000e-05}
   ref-p (3x3):
      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
   refcoord-scaling               = All
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = FALSE
   QMconstraints                  = 0
   QMMMscheme                     = 0
   MMChargeScaleFactor            = 1
qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = TRUE
   Shake-SOR                      = FALSE
   shake-tol                      = 0.0001
   lincs-order                    = 6
   lincs-iter                     = 2
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = no
   rotation                       = FALSE
   interactiveMD                  = FALSE
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = FALSE
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = FALSE
   E-x:
      n = 0
   E-xt:
      n = 0
   E-y:
      n = 0
   E-yt:
      n = 0
   E-z:
      n = 0
   E-zt:
      n = 0
   swapcoords                     = no
   adress                         = FALSE
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
grpopts:
   nrdf:       86119
   ref-t:         300
   tau-t:         0.1
annealing:          No
annealing-npoints:           0
   acc:            0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0

Initializing Domain Decomposition on 48 ranks
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.429 nm, LJ-14, atoms 4153 4162
  multi-body bonded interactions: 0.429 nm, Proper Dih., atoms 4153 4162
Minimum cell size due to bonded interactions: 0.472 nm
Maximum distance for 7 constraints, at 120 deg. angles, all-trans: 1.138 nm
Estimated maximum distance required for P-LINCS: 1.138 nm
This distance will limit the DD cell size, you can override this with -rcon
Guess for relative PME load: 0.17
Will use 40 particle-particle and 8 PME only ranks
This is a guess, check the performance at the end of the log file
Using 8 separate PME ranks, as guessed by mdrun
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 40 cells with a minimum initial size of 1.423 nm
The maximum allowed number of cells is: X 4 Y 4 Z 4
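
A rough way to see why the 48-thread run stops here: the log asks for a grid of 40 particle-particle cells but allows at most 4 cells along each axis. Since 40 = 2^3 x 5, any grid would need a multiple of 5 cells along some axis, so no X*Y*Z layout fits. The Python below is only an illustrative brute-force check under that simplification. `dd_grids` is a hypothetical helper, not GROMACS code, and the real mdrun grid search also weighs box dimensions and cell sizes.

```python
from itertools import product


def dd_grids(n_cells, max_cells):
    """Return every (nx, ny, nz) whose product is n_cells within the per-axis limits.

    Hypothetical helper for illustration only; it ignores box shape and cell size.
    """
    mx, my, mz = max_cells
    return [
        (nx, ny, nz)
        for nx, ny, nz in product(range(1, mx + 1), range(1, my + 1), range(1, mz + 1))
        if nx * ny * nz == n_cells
    ]


# Values taken from the log: 40 PP ranks, at most 4 cells along X, Y and Z.
print(dd_grids(40, (4, 4, 4)))  # [] (no grid fits, because of the factor 5)

# A made-up comparison value: 36 cells do fit inside the same limits.
print(dd_grids(36, (4, 4, 4)))  # [(3, 3, 4), (3, 4, 3), (4, 3, 3)]
```

The second call uses 36 purely as a contrast case; the log does not say how many particle-particle ranks other thread counts end up with.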
With FAH turned down to medium so that it runs on only 47 threads, it runs fine:

Code: Select all

13:55:07:WU00:FS00:Starting
13:55:07:WARNING:WU00:FS00:Changed SMP threads from 48 to 47 this can cause some work units to fail
13:55:07:WU00:FS00:Removing old file 'work/00/logfile_01-20200420-131644.txt'
13:55:07:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 84821 -checkpoint 5 -np 47
13:55:07:WU00:FS00:Started FahCore on PID 64727
13:55:07:WU00:FS00:Core PID:64731
13:55:07:WU00:FS00:FahCore 0xa7 started
13:55:07:WU00:FS00:0xa7:*********************** Log Started 2020-04-20T13:55:07Z ***********************
13:55:07:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
13:55:07:WU00:FS00:0xa7:       Type: 0xa7
13:55:07:WU00:FS00:0xa7:       Core: Gromacs
13:55:07:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 706 -lifeline 64727 -checkpoint 5 -np
13:55:07:WU00:FS00:0xa7:             47
13:55:07:WU00:FS00:0xa7:************************************ CBang *************************************
13:55:07:WU00:FS00:0xa7:       Date: Nov 5 2019
13:55:07:WU00:FS00:0xa7:       Time: 06:06:57
13:55:07:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
13:55:07:WU00:FS00:0xa7:     Branch: master
13:55:07:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
13:55:07:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
13:55:07:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
13:55:07:WU00:FS00:0xa7:       Bits: 64
13:55:07:WU00:FS00:0xa7:       Mode: Release
13:55:07:WU00:FS00:0xa7:************************************ System ************************************
13:55:07:WU00:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
13:55:07:WU00:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 62 Stepping 4
13:55:07:WU00:FS00:0xa7:       CPUs: 48
13:55:07:WU00:FS00:0xa7:     Memory: 15.48GiB
13:55:07:WU00:FS00:0xa7:Free Memory: 8.38GiB
13:55:07:WU00:FS00:0xa7:    Threads: POSIX_THREADS
13:55:07:WU00:FS00:0xa7: OS Version: 4.18
13:55:07:WU00:FS00:0xa7:Has Battery: false
13:55:07:WU00:FS00:0xa7: On Battery: false
13:55:07:WU00:FS00:0xa7: UTC Offset: -7
13:55:07:WU00:FS00:0xa7:        PID: 64731
13:55:07:WU00:FS00:0xa7:        CWD: /var/lib/fahclient/work
13:55:07:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
13:55:07:WU00:FS00:0xa7:    Version: 0.0.18
13:55:07:WU00:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:55:07:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
13:55:07:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
13:55:07:WU00:FS00:0xa7:       Date: Nov 5 2019
13:55:07:WU00:FS00:0xa7:       Time: 06:13:26
13:55:07:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
13:55:07:WU00:FS00:0xa7:     Branch: master
13:55:07:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
13:55:07:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
13:55:07:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
13:55:07:WU00:FS00:0xa7:       Bits: 64
13:55:07:WU00:FS00:0xa7:       Mode: Release
13:55:07:WU00:FS00:0xa7:************************************ Build *************************************
13:55:07:WU00:FS00:0xa7:       SIMD: avx_256
13:55:07:WU00:FS00:0xa7:********************************************************************************
13:55:07:WU00:FS00:0xa7:Project: 16417 (Run 473, Clone 2, Gen 83)
13:55:07:WU00:FS00:0xa7:Unit: 0x0000005a96880e6e5e8a61200c024db9
13:55:07:WU00:FS00:0xa7:Reading tar file core.xml
13:55:07:WU00:FS00:0xa7:Reading tar file frame83.tpr
13:55:07:WU00:FS00:0xa7:Digital signatures verified
13:55:07:WU00:FS00:0xa7:Reducing thread count from 47 to 46 to avoid domain decomposition by a prime number > 3
13:55:07:WU00:FS00:0xa7:Reducing thread count from 46 to 45 to avoid domain decomposition with large prime factor 23
13:55:07:WU00:FS00:0xa7:Calling: mdrun -s frame83.tpr -o frame83.trr -x frame83.xtc -cpt 5 -nt 45
13:55:07:WU00:FS00:0xa7:Steps: first=20750000 total=250000
13:55:08:Removing old file 'configs/config-20200407-010322.xml'
13:55:08:Saving configuration to /etc/fahclient/config.xml
13:55:08:<config>
13:55:08:  <!-- Folding Core -->
13:55:08:  <checkpoint v='5'/>
13:55:08:
13:55:08:  <!-- Folding Slot Configuration -->
13:55:08:  <gpu v='false'/>
13:55:08:
13:55:08:  <!-- HTTP Server -->
13:55:08:  <allow v='10.10.10.0/24 127.0.0.1'/>
13:55:08:
13:55:08:  <!-- Network -->
13:55:08:  <proxy v=':8080'/>
13:55:08:
13:55:08:  <!-- Remote Command Server -->
13:55:08:  <command-allow-no-pass v='10.10.10.0/24 127.0.0.1'/>
13:55:08:  <password v='*****'/>
13:55:08:
13:55:08:  <!-- User Information -->
13:55:08:  <passkey v='*****'/>
13:55:08:  <team v='241312'/>
13:55:08:  <user v='whlee'/>
13:55:08:
13:55:08:  <!-- Folding Slots -->
13:55:08:  <slot id='0' type='CPU'>
13:55:08:    <client-type v='bigbeta'/>
13:55:08:  </slot>
13:55:08:</config>
13:55:09:WU00:FS00:0xa7:Completed 1 out of 250000 steps (0%)
13:55:24:WU00:FS00:0xa7:Completed 2500 out of 250000 steps (1%)
13:55:38:WU00:FS00:0xa7:Completed 5000 out of 250000 steps (2%)
13:55:52:WU00:FS00:0xa7:Completed 7500 out of 250000 steps (3%)
13:56:06:WU00:FS00:0xa7:Completed 10000 out of 250000 steps (4%)
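
The two "Reducing thread count" lines in the log above can be mimicked with a small sketch. This is a guess at the heuristic based only on those log messages, not the FahCore source: `reduce_threads` and `largest_prime_factor` are hypothetical names, and the cutoff `max_factor=5` is an assumption (the log only shows that 23 counts as large and that 45 = 3^2 x 5 is accepted).

```python
def largest_prime_factor(n):
    """Return the largest prime factor of n (returns 1 for n == 1)."""
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    if n > 1:  # whatever remains is itself a prime factor
        largest = n
    return largest


def reduce_threads(requested, max_factor=5):
    """Step the thread count down until it looks decomposition-friendly.

    Assumed rules, inferred from the log wording only:
      1. a count that is itself a prime > 3 is rejected;
      2. a count whose largest prime factor exceeds max_factor is rejected.
    max_factor=5 is an assumption, not a documented value.
    """
    n = requested
    while n > 3:
        lpf = largest_prime_factor(n)
        if lpf == n or lpf > max_factor:
            n -= 1
        else:
            break
    return n


print(reduce_threads(47))  # 45 (47 is prime, 46 = 2 * 23, 45 = 3 * 3 * 5)
print(reduce_threads(48))  # 48 (48 = 2^4 * 3, so nothing is reduced)
```

Under these assumed rules 48 passes the prime check unchanged, which matches the first log, where the failure came later from the 40-cell grid rather than from the thread-count reduction.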
“Don't lose your mind trying to set it free...”
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Project 16417 fails on high core count machines

Post by Neil-B »

You may find setting core count to 32 will complete the WU.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
Joe_H
Site Admin
Posts: 7857
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Project 16417 fails on high core count machines

Post by Joe_H »

Neil-B wrote:You may find setting core count to 32 will complete the WU.
However, if the WU was downloaded with the CPU thread count set to 24, you will not be able to raise it above that number.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Project 16417 fails on high core count machines

Post by Neil-B »

Sorry, I was responding to Zzyzx's post, whose log showed it running 48 and then 47 threads … and I guess there may be a number between 32 and 47/48 that works as well … My bad.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: Project 16417 fails on high core count machines

Post by PantherX »

Zzyzx wrote:...
13:55:08: <client-type v='bigbeta'/>
...
Please note that in the current client, there's no argument value called "bigbeta" so you can remove it.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Zzyzx
Posts: 8
Joined: Thu Apr 18, 2013 4:20 pm
Hardware configuration: Core i7 920
16GB DDR3 RAM
GTX 580
Location: Phoenix, Arizona, USA
Contact:

Re: Project 16417 fails on high core count machines

Post by Zzyzx »

Neil-B wrote:Sorry, I was responding to Zzyzx's post, whose log showed it running 48 and then 47 threads … and I guess there may be a number between 32 and 47/48 that works as well … My bad.
Yeah, I was getting the error with 48 threads. I found that after turning it down to 47 (which the core then reduced to 45 because of primes), it ran just fine.
PantherX wrote:
Zzyzx wrote:...
13:55:08: <client-type v='bigbeta'/>
...
Please note that in the current client, there's no argument value called "bigbeta" so you can remove it.
Ah, thanks, updated!
HendricksSA
Posts: 336
Joined: Fri Jun 26, 2009 4:34 am

Re: Project 16417 fails on high core count machines

Post by HendricksSA »

Project: 16417 (Run 904, Clone 3, Gen 243): domain decomposition failure. I thought this project would no longer be assigned to 48-thread machines after all this conversation, but I got one this morning. I changed to 45 threads per _r2w_ben's advice and it is processing fine. Just passing this along as an FYI.