
Bad Work Unit for All a7 Cores

Posted: Mon Sep 11, 2017 1:43 am
by rbpeake
On one machine, all my a7 work fails immediately:


21:31:07:WU01:FS01:FahCore 0xa7 started
21:31:07:WU01:FS01:0xa7:*********************** Log Started 2017-09-10T21:31:07Z ***********************
21:31:07:WU01:FS01:0xa7:************************** Gromacs Folding@home Core ***************************
21:31:07:WU01:FS01:0xa7:       Type: 0xa7
21:31:07:WU01:FS01:0xa7:       Core: Gromacs
21:31:07:WU01:FS01:0xa7:    Website: http://folding.stanford.edu/
21:31:07:WU01:FS01:0xa7:  Copyright: (c) 2009-2016 Stanford University
21:31:07:WU01:FS01:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:31:07:WU01:FS01:0xa7:       Args: -dir 01 -suffix 01 -version 704 -lifeline 1420 -checkpoint 15 -np
21:31:07:WU01:FS01:0xa7:             22
21:31:07:WU01:FS01:0xa7:     Config: <none>
21:31:07:WU01:FS01:0xa7:************************************ Build *************************************
21:31:07:WU01:FS01:0xa7:    Version: 0.0.11
21:31:07:WU01:FS01:0xa7:       Date: Sep 21 2016
21:31:07:WU01:FS01:0xa7:       Time: 01:53:37
21:31:07:WU01:FS01:0xa7: Repository: Git
21:31:07:WU01:FS01:0xa7:   Revision: 957bd90e68d95ddcf1594dc15ff6c64cc4555146
21:31:07:WU01:FS01:0xa7:     Branch: master
21:31:07:WU01:FS01:0xa7:   Compiler: GNU 4.2.1 Compatible Clang 3.9.0 (trunk 274080)
21:31:07:WU01:FS01:0xa7:    Options: -std=gnu++98 -O3 -funroll-loops -ffast-math -mfpmath=sse
21:31:07:WU01:FS01:0xa7:             -fno-unsafe-math-optimizations -msse2 -I/mingw64/include
21:31:07:WU01:FS01:0xa7:             -Wno-inconsistent-dllimport -Wno-parentheses-equality
21:31:07:WU01:FS01:0xa7:             -Wno-deprecated-register -Wno-unused-local-typedef
21:31:07:WU01:FS01:0xa7:   Platform: linux2 4.6.0-1-amd64
21:31:07:WU01:FS01:0xa7:       Bits: 64
21:31:07:WU01:FS01:0xa7:       Mode: Release
21:31:07:WU01:FS01:0xa7:       SIMD: sse2
21:31:07:WU01:FS01:0xa7:************************************ System ************************************
21:31:07:WU01:FS01:0xa7:        CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
21:31:07:WU01:FS01:0xa7:     CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
21:31:07:WU01:FS01:0xa7:       CPUs: 24
21:31:07:WU01:FS01:0xa7:     Memory: 72.00GiB
21:31:07:WU01:FS01:0xa7:Free Memory: 67.03GiB
21:31:07:WU01:FS01:0xa7:    Threads: WINDOWS_THREADS
21:31:07:WU01:FS01:0xa7: OS Version: 6.1
21:31:07:WU01:FS01:0xa7:Has Battery: false
21:31:07:WU01:FS01:0xa7: On Battery: false
21:31:07:WU01:FS01:0xa7: UTC Offset: -4
21:31:07:WU01:FS01:0xa7:        PID: 4348
21:31:07:WU01:FS01:0xa7:        CWD: C:\Users\Bob\AppData\Roaming\FAHClient\work
21:31:07:WU01:FS01:0xa7:         OS: Windows 7 Professional Service Pack 1
21:31:07:WU01:FS01:0xa7:    OS Arch: AMD64
21:31:07:WU01:FS01:0xa7:********************************************************************************
21:31:07:WU01:FS01:0xa7:Project: 13805 (Run 0, Clone 1278, Gen 104)
21:31:07:WU01:FS01:0xa7:Unit: 0x0000007c80fccb045907bc09f356ea94
21:31:07:WU01:FS01:0xa7:Reading tar file core.xml
21:31:07:WU01:FS01:0xa7:Reading tar file frame104.tpr
21:31:07:WU01:FS01:0xa7:Digital signatures verified
21:31:07:WU01:FS01:0xa7:Reducing thread count from 22 to 21 to avoid domain decomposition with large prime factor 11
21:31:07:WU01:FS01:0xa7:Calling: mdrun -s frame104.tpr -o frame104.trr -x frame104.xtc -cpt 15 -nt 21 -nt 21
21:31:07:WU01:FS01:0xa7:ERROR:
21:31:07:WU01:FS01:0xa7:ERROR:-------------------------------------------------------
21:31:07:WU01:FS01:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20160919-669094a-unknown
21:31:07:WU01:FS01:0xa7:ERROR:Source code file: /host/windows-cross-64bit-core-a7-sse-release/gromacs-core/build/gromacs/src/gromacs/commandline/pargs.cpp, line: 680
21:31:07:WU01:FS01:0xa7:ERROR:
21:31:07:WU01:FS01:0xa7:ERROR:Fatal error:
21:31:07:WU01:FS01:0xa7:ERROR:Double command line argument -nt
21:31:07:WU01:FS01:0xa7:ERROR:
21:31:07:WU01:FS01:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
21:31:07:WU01:FS01:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
21:31:07:WU01:FS01:0xa7:ERROR:-------------------------------------------------------
21:31:11:WU00:FS01:Upload 51.73%
21:31:13:WU01:FS01:0xa7:Steps: first=26000000 total=250000
21:31:17:WU00:FS01:Upload 96.99%
21:31:17:WU01:FS01:0xa7:Completed 1 out of 250000 steps (0%)
21:31:17:WU00:FS01:Upload complete
21:31:17:WU00:FS01:Server responded WORK_ACK (400)
21:31:17:WU00:FS01:Cleaning up
21:31:19:WU01:FS01:0xa7:Saving result file ..\logfile_01.txt
21:31:19:WU01:FS01:0xa7:Saving result file frame104.trr
21:31:19:WU01:FS01:0xa7:Saving result file frame104.xtc
21:31:19:WU01:FS01:0xa7:Saving result file md.log
21:31:19:WU01:FS01:0xa7:Saving result file science.log
21:31:19:WU01:FS01:0xa7:Folding@home Core Shutdown: BAD_WORK_UNIT
21:31:20:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
21:31:20:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13805 run:0 clone:1278 gen:104 core:0xa7 unit:0x0000007c80fccb045907bc09f356ea94
21:31:20:WU01:FS01:Uploading 6.80MiB to 128.252.203.4
21:31:20:WU01:FS01:Connecting to 128.252.203.4:8080
21:31:20:WU00:FS01:Connecting to 171.67.108.45:8080
21:31:21:WU00:FS01:Assigned to work server 155.247.166.220
21:31:21:WU00:FS01:Requesting new work unit for slot 01: READY cpu:23 from 155.247.166.220
21:31:21:WU00:FS01:Connecting to 155.247.166.220:8080
21:31:21:WU00:FS01:Downloading 334.35KiB
21:31:21:WU00:FS01:Download complete
21:31:22:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:8646 run:979 clone:0 gen:20 core:0xa4 unit:0x000000170002894c57e0c3b012ed070c
21:31:22:WU00:FS01:Starting
21:31:22:WARNING:WU00:FS01:AS lowered CPUs from 23 to 20
21:31:22:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Bob\AppData\Roaming\FAHClient\cores/fahwebx.stanford.edu/cores/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 704 -lifeline 3656 -checkpoint 15 -np 20
21:31:22:WU00:FS01:Started FahCore on PID 5044
21:31:22:WU00:FS01:Core PID:5000
21:31:22:WU00:FS01:FahCore 0xa4 started

Re: Bad Work Unit for All a7 Cores

Posted: Mon Sep 11, 2017 1:54 am
by JimboPalmer
The a7 core does not run in a virtual machine (a4 does). Are you virtualized?

In the advanced client, have you tried 20 CPUs? The core keeps counting -nt (the number of threads) down, but then gets confused by the duplicated -nt parameter:

21:31:07:WU01:FS01:0xa7:Reducing thread count from 22 to 21 to avoid domain decomposition with large prime factor 11
21:31:07:WU01:FS01:0xa7:Calling: mdrun -s frame104.tpr -o frame104.trr -x frame104.xtc -cpt 15 -nt 21 -nt 21
21:31:07:WU01:FS01:0xa7:ERROR:
21:31:07:WU01:FS01:0xa7:ERROR:-------------------------------------------------------
21:31:07:WU01:FS01:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20160919-669094a-unknown
21:31:07:WU01:FS01:0xa7:ERROR:Source code file: /host/windows-cross-64bit-core-a7-sse-release/gromacs-core/build/gromacs/src/gromacs/commandline/pargs.cpp, line: 680
21:31:07:WU01:FS01:0xa7:ERROR:
21:31:07:WU01:FS01:0xa7:ERROR:Fatal error:
21:31:07:WU01:FS01:0xa7:ERROR:Double command line argument -nt
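To spot this in your own log, a small Python sketch like the one below scans the logged `Calling:` line for options that appear more than once. The helper name `duplicated_flags` is hypothetical, and it only inspects the logged command string; it does not reproduce how GROMACS itself parses its arguments.

```python
import shlex
from collections import Counter


def duplicated_flags(command_line: str) -> list[str]:
    """Return every option (token starting with '-') that appears more than once.

    Assumption: options are tokens beginning with '-'; values such as '21' or
    'frame104.tpr' are ignored. A negative number would be counted as an
    option, which is acceptable for a quick log check like this one.
    """
    tokens = shlex.split(command_line)
    counts = Counter(tok for tok in tokens if tok.startswith("-"))
    return [flag for flag, n in counts.items() if n > 1]


log_line = ("21:31:07:WU01:FS01:0xa7:Calling: mdrun -s frame104.tpr -o frame104.trr "
            "-x frame104.xtc -cpt 15 -nt 21 -nt 21")
# Keep only the command that follows "Calling: ".
command = log_line.split("Calling: ", 1)[1]
print(duplicated_flags(command))  # ['-nt']
```

Any flag it reports is one that mdrun receives twice, which matches the "Double command line argument -nt" error above.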

(I suspect a line where it goes from 23 to 22 was omitted.)
It goes down from 22 to avoid a multiple of the large prime 11.
Sadly, 21 is also a multiple of a large prime, 7.
20 has 5 as its largest prime factor, and 5 is often OK.
If not, 16 CPUs is a sure thing.
If you wanted, you could make two CPU slots, one with 16 and one with 6; neither has a large prime factor.
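The arithmetic above can be checked with a short trial-division sketch in Python. The helper name `largest_prime_factor` is hypothetical and not part of FAHClient or the core; which factor counts as "large" is the core's own decision (the log shows it avoiding 11 but still choosing 21, whose largest factor is 7).

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (n >= 2) by trial division."""
    largest = 1
    d = 2
    while d * d <= n:
        while n % d == 0:  # divide out every copy of d
            largest = d
            n //= d
        d += 1
    if n > 1:  # whatever remains is itself prime
        largest = n
    return largest


# Candidate CPU counts for this machine, from 24 down to 16.
for cpus in range(24, 15, -1):
    print(cpus, largest_prime_factor(cpus))
# 24 3, 23 23, 22 11, 21 7, 20 5, 19 19, 18 3, 17 17, 16 2

# The suggested two-slot split: 16 + 6.
print(largest_prime_factor(16), largest_prime_factor(6))  # 2 3
```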

In the a4 log you can see the count drop from 23 CPUs to 20, which works.

21:31:21:WU00:FS01:Download complete
21:31:22:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:8646 run:979 clone:0 gen:20 core:0xa4 unit:0x000000170002894c57e0c3b012ed070c
21:31:22:WU00:FS01:Starting
21:31:22:WARNING:WU00:FS01:AS lowered CPUs from 23 to 20

As to why you have 23 CPUs, I suspect one of two scenarios: either you have Folding set to Medium (number of CPUs minus one), or you have a GPU you did not tell us about. If it is the first, you could change to Full, which is 24 CPUs and has no large prime factors. If you have a GPU, turning the CPU slot down to 20 CPUs is best.
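The advice amounts to: start from the CPUs available to the slot and step down until the count has no large prime factor. Below is a minimal Python sketch, assuming "large" means greater than 5 (the rule of thumb in this thread; the core's actual cutoff may differ, since the log shows it accepting 21). The function names `is_smooth` and `pick_cpu_count` are hypothetical.

```python
def is_smooth(n: int, max_prime: int) -> bool:
    """True if every prime factor of n is <= max_prime (n >= 1)."""
    for p in range(2, max_prime + 1):
        # Composite p never divides here: its prime factors were removed earlier.
        while n % p == 0:
            n //= p
    return n == 1


def pick_cpu_count(available: int, max_prime: int = 5) -> int:
    """Largest count <= available whose prime factors are all <= max_prime.

    Assumption: max_prime=5 follows this thread's rule of thumb, not a
    documented limit of the a7 core.
    """
    if available < 1:
        raise ValueError("available must be at least 1")
    for n in range(available, 0, -1):
        if is_smooth(n, max_prime):
            return n
    return 1  # unreachable: 1 is always smooth


print(pick_cpu_count(24))      # 24 (Full: 2^3 * 3)
print(pick_cpu_count(24 - 1))  # 20 (one thread left for the GPU or Medium)
```

With `max_prime=7` the same search on 23 stops at 21, which is what the core tried before hitting the duplicated -nt error.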

Re: Bad Work Unit for All a7 Cores

Posted: Mon Sep 11, 2017 2:59 pm
by rbpeake
I have one GPU folding, which may be when the problem started; using 24 cores was not a problem. I am using the beta client. Would it not automatically adjust the remaining cores to a stable configuration? If not, what do I need to do to fix the problem? Is it a setting under "Configuration"?
Thanks!

Re: Bad Work Unit for All a7 Cores

Posted: Mon Sep 11, 2017 4:17 pm
by rbpeake
I found the setting. I will force the CPU slot to 20 cores; presumably 22 will not work because 11 is a large prime. Thanks.

Re: Bad Work Unit for All a7 Cores

Posted: Mon Sep 11, 2017 5:59 pm
by JimboPalmer
If you are worried about the 3 idle cores, you can make a second CPU slot with CPUs = 3, but it won't add much.

Glad you got it fixed!

Re: Bad Work Unit for All a7 Cores

Posted: Sat Sep 16, 2017 10:45 am
by toTOW
21:31:07:WU01:FS01:0xa7:Calling: mdrun -s frame104.tpr -o frame104.trr -x frame104.xtc -cpt 15 -nt 21 -nt 21
[...]
21:31:07:WU01:FS01:0xa7:ERROR:Fatal error:
21:31:07:WU01:FS01:0xa7:ERROR:Double command line argument -nt
Do you pass arguments manually using extra-core-args in the advanced slot settings?

Re: Bad Work Unit for All a7 Cores

Posted: Sat Sep 16, 2017 3:22 pm
by Joe_H
toTOW wrote:
21:31:07:WU01:FS01:0xa7:Calling: mdrun -s frame104.tpr -o frame104.trr -x frame104.xtc -cpt 15 -nt 21 -nt 21
[...]
21:31:07:WU01:FS01:0xa7:ERROR:Fatal error:
21:31:07:WU01:FS01:0xa7:ERROR:Double command line argument -nt
Do you pass arguments manually using extra-core-args in the advanced slot settings?
There is a known issue with Core_A7; see https://github.com/FoldingAtHome/fah-issues/issues/1179.