Bad Work Unit for All a7 Cores

Postby rbpeake » Mon Sep 11, 2017 1:43 am

On one machine, all my a7 work fails immediately:

21:31:07:WU01:FS01:FahCore 0xa7 started
21:31:07:WU01:FS01:0xa7:*********************** Log Started 2017-09-10T21:31:07Z ***********************
21:31:07:WU01:FS01:0xa7:************************** Gromacs Folding@home Core ***************************
21:31:07:WU01:FS01:0xa7:       Type: 0xa7
21:31:07:WU01:FS01:0xa7:       Core: Gromacs
21:31:07:WU01:FS01:0xa7:    Website: http://folding.stanford.edu/
21:31:07:WU01:FS01:0xa7:  Copyright: (c) 2009-2016 Stanford University
21:31:07:WU01:FS01:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
21:31:07:WU01:FS01:0xa7:       Args: -dir 01 -suffix 01 -version 704 -lifeline 1420 -checkpoint 15 -np
21:31:07:WU01:FS01:0xa7:             22
21:31:07:WU01:FS01:0xa7:     Config: <none>
21:31:07:WU01:FS01:0xa7:************************************ Build *************************************
21:31:07:WU01:FS01:0xa7:    Version: 0.0.11
21:31:07:WU01:FS01:0xa7:       Date: Sep 21 2016
21:31:07:WU01:FS01:0xa7:       Time: 01:53:37
21:31:07:WU01:FS01:0xa7: Repository: Git
21:31:07:WU01:FS01:0xa7:   Revision: 957bd90e68d95ddcf1594dc15ff6c64cc4555146
21:31:07:WU01:FS01:0xa7:     Branch: master
21:31:07:WU01:FS01:0xa7:   Compiler: GNU 4.2.1 Compatible Clang 3.9.0 (trunk 274080)
21:31:07:WU01:FS01:0xa7:    Options: -std=gnu++98 -O3 -funroll-loops -ffast-math -mfpmath=sse
21:31:07:WU01:FS01:0xa7:             -fno-unsafe-math-optimizations -msse2 -I/mingw64/include
21:31:07:WU01:FS01:0xa7:             -Wno-inconsistent-dllimport -Wno-parentheses-equality
21:31:07:WU01:FS01:0xa7:             -Wno-deprecated-register -Wno-unused-local-typedef
21:31:07:WU01:FS01:0xa7:   Platform: linux2 4.6.0-1-amd64
21:31:07:WU01:FS01:0xa7:       Bits: 64
21:31:07:WU01:FS01:0xa7:       Mode: Release
21:31:07:WU01:FS01:0xa7:       SIMD: sse2
21:31:07:WU01:FS01:0xa7:************************************ System ************************************
21:31:07:WU01:FS01:0xa7:        CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
21:31:07:WU01:FS01:0xa7:     CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
21:31:07:WU01:FS01:0xa7:       CPUs: 24
21:31:07:WU01:FS01:0xa7:     Memory: 72.00GiB
21:31:07:WU01:FS01:0xa7:Free Memory: 67.03GiB
21:31:07:WU01:FS01:0xa7:    Threads: WINDOWS_THREADS
21:31:07:WU01:FS01:0xa7: OS Version: 6.1
21:31:07:WU01:FS01:0xa7:Has Battery: false
21:31:07:WU01:FS01:0xa7: On Battery: false
21:31:07:WU01:FS01:0xa7: UTC Offset: -4
21:31:07:WU01:FS01:0xa7:        PID: 4348
21:31:07:WU01:FS01:0xa7:        CWD: C:\Users\Bob\AppData\Roaming\FAHClient\work
21:31:07:WU01:FS01:0xa7:         OS: Windows 7 Professional Service Pack 1
21:31:07:WU01:FS01:0xa7:    OS Arch: AMD64
21:31:07:WU01:FS01:0xa7:********************************************************************************
21:31:07:WU01:FS01:0xa7:Project: 13805 (Run 0, Clone 1278, Gen 104)
21:31:07:WU01:FS01:0xa7:Unit: 0x0000007c80fccb045907bc09f356ea94
21:31:07:WU01:FS01:0xa7:Reading tar file core.xml
21:31:07:WU01:FS01:0xa7:Reading tar file frame104.tpr
21:31:07:WU01:FS01:0xa7:Digital signatures verified
21:31:07:WU01:FS01:0xa7:Reducing thread count from 22 to 21 to avoid domain decomposition with large prime factor 11
21:31:07:WU01:FS01:0xa7:Calling: mdrun -s frame104.tpr -o frame104.trr -x frame104.xtc -cpt 15 -nt 21 -nt 21
21:31:07:WU01:FS01:0xa7:ERROR:
21:31:07:WU01:FS01:0xa7:ERROR:-------------------------------------------------------
21:31:07:WU01:FS01:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20160919-669094a-unknown
21:31:07:WU01:FS01:0xa7:ERROR:Source code file: /host/windows-cross-64bit-core-a7-sse-release/gromacs-core/build/gromacs/src/gromacs/commandline/pargs.cpp, line: 680
21:31:07:WU01:FS01:0xa7:ERROR:
21:31:07:WU01:FS01:0xa7:ERROR:Fatal error:
21:31:07:WU01:FS01:0xa7:ERROR:Double command line argument -nt
21:31:07:WU01:FS01:0xa7:ERROR:
21:31:07:WU01:FS01:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
21:31:07:WU01:FS01:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
21:31:07:WU01:FS01:0xa7:ERROR:-------------------------------------------------------
21:31:11:WU00:FS01:Upload 51.73%
21:31:13:WU01:FS01:0xa7:Steps: first=26000000 total=250000
21:31:17:WU00:FS01:Upload 96.99%
21:31:17:WU01:FS01:0xa7:Completed 1 out of 250000 steps (0%)
21:31:17:WU00:FS01:Upload complete
21:31:17:WU00:FS01:Server responded WORK_ACK (400)
21:31:17:WU00:FS01:Cleaning up
21:31:19:WU01:FS01:0xa7:Saving result file ..\logfile_01.txt
21:31:19:WU01:FS01:0xa7:Saving result file frame104.trr
21:31:19:WU01:FS01:0xa7:Saving result file frame104.xtc
21:31:19:WU01:FS01:0xa7:Saving result file md.log
21:31:19:WU01:FS01:0xa7:Saving result file science.log
21:31:19:WU01:FS01:0xa7:Folding@home Core Shutdown: BAD_WORK_UNIT
21:31:20:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
21:31:20:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13805 run:0 clone:1278 gen:104 core:0xa7 unit:0x0000007c80fccb045907bc09f356ea94
21:31:20:WU01:FS01:Uploading 6.80MiB to 128.252.203.4
21:31:20:WU01:FS01:Connecting to 128.252.203.4:8080
21:31:20:WU00:FS01:Connecting to 171.67.108.45:8080
21:31:21:WU00:FS01:Assigned to work server 155.247.166.220
21:31:21:WU00:FS01:Requesting new work unit for slot 01: READY cpu:23 from 155.247.166.220
21:31:21:WU00:FS01:Connecting to 155.247.166.220:8080
21:31:21:WU00:FS01:Downloading 334.35KiB
21:31:21:WU00:FS01:Download complete
21:31:22:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:8646 run:979 clone:0 gen:20 core:0xa4 unit:0x000000170002894c57e0c3b012ed070c
21:31:22:WU00:FS01:Starting
21:31:22:WARNING:WU00:FS01:AS lowered CPUs from 23 to 20
21:31:22:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Bob\AppData\Roaming\FAHClient\cores/fahwebx.stanford.edu/cores/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 704 -lifeline 3656 -checkpoint 15 -np 20
21:31:22:WU00:FS01:Started FahCore on PID 5044
21:31:22:WU00:FS01:Core PID:5000
21:31:22:WU00:FS01:FahCore 0xa4 started
rbpeake
 
Posts: 328
Joined: Sun Jun 15, 2008 4:39 pm
Location: NYC Metro Area

Re: Bad Work Unit for All a7 Cores

Postby JimboPalmer » Mon Sep 11, 2017 1:54 am

a7 does not run in a virtual machine (a4 does). Are you virtualized?

In the advanced client, have you tried 20 CPUs? The core keeps reducing -nt (the number of threads), but then GROMACS fails because -nt is passed twice:

21:31:07:WU01:FS01:0xa7:Reducing thread count from 22 to 21 to avoid domain decomposition with large prime factor 11
21:31:07:WU01:FS01:0xa7:Calling: mdrun -s frame104.tpr -o frame104.trr -x frame104.xtc -cpt 15 -nt 21 -nt 21
21:31:07:WU01:FS01:0xa7:ERROR:
21:31:07:WU01:FS01:0xa7:ERROR:-------------------------------------------------------
21:31:07:WU01:FS01:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20160919-669094a-unknown
21:31:07:WU01:FS01:0xa7:ERROR:Source code file: /host/windows-cross-64bit-core-a7-sse-release/gromacs-core/build/gromacs/src/gromacs/commandline/pargs.cpp, line: 680
21:31:07:WU01:FS01:0xa7:ERROR:
21:31:07:WU01:FS01:0xa7:ERROR:Fatal error:
21:31:07:WU01:FS01:0xa7:ERROR:Double command line argument -nt

(I suspect a line where it goes from 23 to 22 was omitted.)
It goes down from 22 to avoid a multiple of a large prime, 11.
Sadly, 21 is also a multiple of a large prime, 7.
20 would have 5 as its largest prime factor, and 5 is often OK.
If not, setting the number of CPUs to 16 is a sure thing.
If you wanted, you could make two CPU slots, one with 16 CPUs and one with 6; neither would have large primes.
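
The arithmetic above can be checked with a short Python sketch. It only factors candidate thread counts; it is not how the core itself picks a count. The cutoff `ASSUMED_MAX_OK_PRIME = 5` and the helper names are assumptions for illustration, taken from the remark that 5 is often OK. The thread does not state the actual limit the core uses.

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (n >= 2) by trial division."""
    if n < 2:
        raise ValueError("n must be at least 2")
    largest = 1
    factor = 2
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    if n > 1:
        # Whatever remains after dividing out small factors is itself prime.
        largest = n
    return largest


# Assumption: primes up to 5 are treated as acceptable (not confirmed by the core).
ASSUMED_MAX_OK_PRIME = 5


def first_safe_count(start: int, max_prime: int = ASSUMED_MAX_OK_PRIME) -> int:
    """Count down from start to the first value whose largest prime factor is small."""
    for count in range(start, 1, -1):
        if largest_prime_factor(count) <= max_prime:
            return count
    return 1


if __name__ == "__main__":
    for count in (24, 23, 22, 21, 20, 16, 6):
        print(count, "largest prime factor:", largest_prime_factor(count))
    print("first safe count at or below 23:", first_safe_count(23))
```

Under that assumed cutoff, counting down from 23 skips 23, 22 (11) and 21 (7) and stops at 20, which matches the value the a4 assignment settles on in the log.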

In the a4 setup you can see it drop from 23 CPUs to 20, which works.

21:31:21:WU00:FS01:Download complete
21:31:22:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:8646 run:979 clone:0 gen:20 core:0xa4 unit:0x000000170002894c57e0c3b012ed070c
21:31:22:WU00:FS01:Starting
21:31:22:WARNING:WU00:FS01:AS lowered CPUs from 23 to 20

As to why you have 23 CPUs, I suspect one of two scenarios: either you have Folding on medium (number of CPUs minus one), or you have a GPU you did not tell us about. In the first case, you could change to Full, which is 24 CPUs and has no large prime factors. If you have a GPU, turning the CPU client down to 20 CPUs is best.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
JimboPalmer
 
Posts: 519
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: Bad Work Unit for All a7 Cores

Postby rbpeake » Mon Sep 11, 2017 2:59 pm

I have one GPU folding, and the problem may have started when I added it. Using 24 cores was not a problem. I am using the beta client; would it not automatically adjust the remaining cores to a stable configuration? If not, what do I need to do to fix the problem? Is it a setting under "Configuration"?
Thanks!

Re: Bad Work Unit for All a7 Cores

Postby rbpeake » Mon Sep 11, 2017 4:17 pm

I found the setting. I will force it to 20 cores for the CPU slot; presumably setting it to 22 will not work because 11 is a large prime number. Thanks.

Re: Bad Work Unit for All a7 Cores

Postby JimboPalmer » Mon Sep 11, 2017 5:59 pm

If you are worried about the 3 idle cores, you can make a second CPU slot with CPUs = 3, but it won't add much.

Glad you got it fixed!

Re: Bad Work Unit for All a7 Cores

Postby toTOW » Sat Sep 16, 2017 10:45 am

21:31:07:WU01:FS01:0xa7:Calling: mdrun -s frame104.tpr -o frame104.trr -x frame104.xtc -cpt 15 -nt 21 -nt 21
[...]
21:31:07:WU01:FS01:0xa7:ERROR:Fatal error:
21:31:07:WU01:FS01:0xa7:ERROR:Double command line argument -nt


Do you pass arguments manually using extra-core-args in the advanced slot settings?
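
To spot this kind of failure in a log quickly, here is a minimal Python sketch that looks for flags repeated on a "Calling:" line like the one quoted above. The function name `duplicated_flags` is hypothetical, and the parsing is an assumption based only on the log format shown in this thread. It treats every token starting with "-" as a flag, so a negative numeric value would be miscounted.

```python
import shlex
from collections import Counter
from typing import List


def duplicated_flags(log_line: str) -> List[str]:
    """Return flags that appear more than once after 'Calling:' in a log line."""
    _, found, command = log_line.partition("Calling:")
    if not found:
        return []
    tokens = shlex.split(command)
    # Assumption: any token beginning with '-' is a flag, not a value.
    counts = Counter(tok for tok in tokens if tok.startswith("-"))
    return sorted(flag for flag, seen in counts.items() if seen > 1)


if __name__ == "__main__":
    line = (
        "21:31:07:WU01:FS01:0xa7:Calling: mdrun -s frame104.tpr "
        "-o frame104.trr -x frame104.xtc -cpt 15 -nt 21 -nt 21"
    )
    print(duplicated_flags(line))
```

If the repeated flag shows up without any extra-core-args being set, that suggests the duplication comes from the core rather than from the slot configuration.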
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Site Moderator
 
Posts: 8931
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Bad Work Unit for All a7 Cores

Postby Joe_H » Sat Sep 16, 2017 3:22 pm

toTOW wrote:
21:31:07:WU01:FS01:0xa7:Calling: mdrun -s frame104.tpr -o frame104.trr -x frame104.xtc -cpt 15 -nt 21 -nt 21
[...]
21:31:07:WU01:FS01:0xa7:ERROR:Fatal error:
21:31:07:WU01:FS01:0xa7:ERROR:Double command line argument -nt


Do you pass arguments manually using extra-core-args in the advanced slot settings?


There is a known issue with Core_A7; see https://github.com/FoldingAtHome/fah-issues/issues/1179

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 3883
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

