GPU tasks do not pause once started on Radeon RX Vega

It seems that a lot of GPU problems revolve around specific versions of drivers. Though AMD has their own support structure, you can often learn from information reported by others who fold.

seanthegeek
Posts: 5
Joined: Fri Sep 08, 2017 4:01 pm

GPU tasks do not pause once started on Radeon RX Vega

Post by seanthegeek »

The hardware usage meter on the card also stays maxed out.

Screenshot: https://i.redd.it/5jg7de86kokz.png
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by bruce »

Welcome to FoldingForum.org, seanthegeek.

1) Please post the FAH log in accordance with the link below.
2) I notice that you're running 8 CPUs but they're paused, too. That's strange.
3) If that's Windows, please sort Task Manager's output by CPU utilization and post the list of top processes.
4) I notice you have FAH's performance slider set to the minimum. That's strange, too.
seanthegeek
Posts: 5
Joined: Fri Sep 08, 2017 4:01 pm

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by seanthegeek »

*********************** Log Started 2017-09-08T19:37:05Z ***********************
19:37:05:************************* Folding@home Client *************************
19:37:05:      Website: http://folding.stanford.edu/
19:37:05:    Copyright: (c) 2009-2014 Stanford University
19:37:05:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:37:05:         Args: 
19:37:05:       Config: C:/Users/sean/AppData/Roaming/FAHClient/config.xml
19:37:05:******************************** Build ********************************
19:37:05:      Version: 7.4.4
19:37:05:         Date: Mar 4 2014
19:37:05:         Time: 20:26:54
19:37:05:      SVN Rev: 4130
19:37:05:       Branch: fah/trunk/client
19:37:05:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
19:37:05:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
19:37:05:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
19:37:05:     Platform: win32 XP
19:37:05:         Bits: 32
19:37:05:         Mode: Release
19:37:05:******************************* System ********************************
19:37:05:          CPU: AMD Ryzen 7 1700X Eight-Core Processor
19:37:05:       CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
19:37:05:         CPUs: 16
19:37:05:       Memory: 15.93GiB
19:37:05:  Free Memory: 10.96GiB
19:37:05:      Threads: WINDOWS_THREADS
19:37:05:   OS Version: 6.2
19:37:05:  Has Battery: false
19:37:05:   On Battery: false
19:37:05:   UTC Offset: -4
19:37:05:          PID: 20436
19:37:05:          CWD: C:/Users/sean/AppData/Roaming/FAHClient
19:37:05:           OS: Windows 10 Pro
19:37:05:      OS Arch: AMD64
19:37:05:         GPUs: 1
19:37:05:        GPU 0: ATI:5 [Radeon Rx vega]
19:37:05:         CUDA: Not detected
19:37:05:Win32 Service: false
19:37:05:***********************************************************************
19:37:05:<config>
19:37:05:  <!-- HTTP Server -->
19:37:05:  <allow v='127.0.0.1 192.168.1.0/24'/>
19:37:05:
19:37:05:  <!-- Network -->
19:37:05:  <proxy v=':8080'/>
19:37:05:
19:37:05:  <!-- Remote Command Server -->
19:37:05:  <password v='********'/>
19:37:05:
19:37:05:  <!-- Slot Control -->
19:37:05:  <power v='light'/>
19:37:05:
19:37:05:  <!-- User Information -->
19:37:05:  <passkey v='********************************'/>
19:37:05:  <team v='224497'/>
19:37:05:  <user v='seanw_FLDC_1BddRSZxfZW3xjq4C2r8BWAmr2J9tT7GMW'/>
19:37:05:
19:37:05:  <!-- Folding Slots -->
19:37:05:  <slot id='0' type='CPU'/>
19:37:05:  <slot id='1' type='GPU'>
19:37:05:    <idle v='true'/>
19:37:05:    <paused v='true'/>
19:37:05:  </slot>
19:37:05:</config>
19:37:05:Trying to access database...
19:37:05:Successfully acquired database lock
19:37:05:Enabled folding slot 00: READY cpu:8
19:37:05:Enabled folding slot 01: PAUSED gpu:0:[Radeon Rx vega] (by user)
19:37:05:WU02:FS00:Starting
19:37:05:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/sean/AppData/Roaming/FAHClient/cores/fahwebx.stanford.edu/cores/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 02 -suffix 01 -version 704 -lifeline 20436 -checkpoint 15 -np 8
19:37:05:WU02:FS00:Started FahCore on PID 22920
19:37:05:WU02:FS00:Core PID:22956
19:37:05:WU02:FS00:FahCore 0xa4 started
19:37:06:WU02:FS00:0xa4:
19:37:06:WU02:FS00:0xa4:*------------------------------*
19:37:06:WU02:FS00:0xa4:Folding@Home Gromacs GB Core
19:37:06:WU02:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
19:37:06:WU02:FS00:0xa4:
19:37:06:WU02:FS00:0xa4:Preparing to commence simulation
19:37:06:WU02:FS00:0xa4:- Ensuring status. Please wait.
19:37:15:WU02:FS00:0xa4:- Looking at optimizations...
19:37:15:WU02:FS00:0xa4:- Working with standard loops on this execution.
19:37:15:WU02:FS00:0xa4:- Previous termination of core was improper.
19:37:15:WU02:FS00:0xa4:- Files status OK
19:37:15:WU02:FS00:0xa4:- Expanded 825958 -> 1403472 (decompressed 169.9 percent)
19:37:15:WU02:FS00:0xa4:Called DecompressByteArray: compressed_data_size=825958 data_size=1403472, decompressed_data_size=1403472 diff=0
19:37:15:WU02:FS00:0xa4:- Digital signature verified
19:37:15:WU02:FS00:0xa4:
19:37:15:WU02:FS00:0xa4:Project: 9037 (Run 256, Clone 1, Gen 1221)
19:37:15:WU02:FS00:0xa4:
19:37:15:WU02:FS00:0xa4:Entering M.D.
19:37:21:WU02:FS00:0xa4:Mapping NT from 8 to 8 
19:37:21:WU02:FS00:0xa4:Completed 0 out of 250000 steps  (0%)
19:38:23:WU02:FS00:0xa4:Completed 2500 out of 250000 steps  (1%)
19:39:23:WU02:FS00:0xa4:Completed 5000 out of 250000 steps  (2%)
19:40:23:WU02:FS00:0xa4:Completed 7500 out of 250000 steps  (3%)
19:41:23:WU02:FS00:0xa4:Completed 10000 out of 250000 steps  (4%)
19:42:24:WU02:FS00:0xa4:Completed 12500 out of 250000 steps  (5%)
19:43:24:WU02:FS00:0xa4:Completed 15000 out of 250000 steps  (6%)
19:44:25:WU02:FS00:0xa4:Completed 17500 out of 250000 steps  (7%)
19:45:24:WU02:FS00:0xa4:Completed 20000 out of 250000 steps  (8%)
19:46:24:WU02:FS00:0xa4:Completed 22500 out of 250000 steps  (9%)
19:47:24:WU02:FS00:0xa4:Completed 25000 out of 250000 steps  (10%)
19:48:11:FS00:Shutting core down
19:48:15:WU02:FS00:0xa4:Client no longer detected. Shutting down core 
19:48:15:WU02:FS00:0xa4:
19:48:15:WU02:FS00:0xa4:Folding@home Core Shutdown: CLIENT_DIED
19:48:15:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
19:48:16:Removing old file 'configs/config-20170703-212517.xml'
19:48:16:Saving configuration to config.xml
19:48:16:<config>
19:48:16:  <!-- HTTP Server -->
19:48:16:  <allow v='127.0.0.1 192.168.1.0/24'/>
19:48:16:
19:48:16:  <!-- Network -->
19:48:16:  <proxy v=':8080'/>
19:48:16:
19:48:16:  <!-- Remote Command Server -->
19:48:16:  <password v='********'/>
19:48:16:
19:48:16:  <!-- User Information -->
19:48:16:  <passkey v='********************************'/>
19:48:16:  <team v='224497'/>
19:48:16:  <user v='seanw_FLDC_1BddRSZxfZW3xjq4C2r8BWAmr2J9tT7GMW'/>
19:48:16:
19:48:16:  <!-- Folding Slots -->
19:48:16:  <slot id='0' type='CPU'/>
19:48:16:  <slot id='1' type='GPU'>
19:48:16:    <paused v='true'/>
19:48:16:  </slot>
19:48:16:</config>
19:48:16:WU02:FS00:Starting
19:48:16:WARNING:WU02:FS00:Changed SMP threads from 8 to 14 this can cause some work units to fail
19:48:16:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/sean/AppData/Roaming/FAHClient/cores/fahwebx.stanford.edu/cores/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 02 -suffix 01 -version 704 -lifeline 20436 -checkpoint 15 -np 14
19:48:16:WU02:FS00:Started FahCore on PID 1308
19:48:16:WU02:FS00:Core PID:8008
19:48:16:WU02:FS00:FahCore 0xa4 started
19:48:16:WU02:FS00:0xa4:
19:48:16:WU02:FS00:0xa4:*------------------------------*
19:48:16:WU02:FS00:0xa4:Folding@Home Gromacs GB Core
19:48:16:WU02:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
19:48:16:WU02:FS00:0xa4:
19:48:16:WU02:FS00:0xa4:Preparing to commence simulation
19:48:16:WU02:FS00:0xa4:- Looking at optimizations...
19:48:16:WU02:FS00:0xa4:- Files status OK
19:48:16:WU02:FS00:0xa4:- Expanded 825958 -> 1403472 (decompressed 169.9 percent)
19:48:16:WU02:FS00:0xa4:Called DecompressByteArray: compressed_data_size=825958 data_size=1403472, decompressed_data_size=1403472 diff=0
19:48:16:WU02:FS00:0xa4:- Digital signature verified
19:48:16:WU02:FS00:0xa4:
19:48:16:WU02:FS00:0xa4:Project: 9037 (Run 256, Clone 1, Gen 1221)
19:48:16:WU02:FS00:0xa4:
19:48:16:WU02:FS00:0xa4:Assembly optimizations on if available.
19:48:16:WU02:FS00:0xa4:Entering M.D.
19:48:22:WU02:FS00:0xa4:Mapping NT from 14 to 14 
19:48:22:WU02:FS00:0xa4:mdrun returned 255
19:48:22:WU02:FS00:0xa4:Going to send back what have done -- stepsTotalG=250000
19:48:22:WU02:FS00:0xa4:Work fraction=0.0000 steps=250000.
19:48:23:FS01:Unpaused
19:48:23:WU00:FS01:Starting
19:48:23:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/sean/AppData/Roaming/FAHClient/cores/fahwebx.stanford.edu/cores/Win32/AMD64/ATI/R600/Core_21.fah/FahCore_21.exe -dir 00 -suffix 01 -version 704 -lifeline 20436 -checkpoint 15 -gpu 0 -gpu-vendor ati
19:48:23:WU00:FS01:Started FahCore on PID 19728
19:48:23:WU00:FS01:Core PID:20476
19:48:23:WU00:FS01:FahCore 0x21 started
19:48:23:WU00:FS01:0x21:*********************** Log Started 2017-09-08T19:48:23Z ***********************
19:48:23:WU00:FS01:0x21:Project: 9414 (Run 9, Clone 0, Gen 769)
19:48:23:WU00:FS01:0x21:Unit: 0x0000036fab436c9d585e0690972a6b29
19:48:23:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
19:48:23:WU00:FS01:0x21:Machine: 1
19:48:23:WU00:FS01:0x21:Digital signatures verified
19:48:23:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
19:48:23:WU00:FS01:0x21:Version 0.0.18
19:48:23:WU00:FS01:0x21:  Found a checkpoint file
19:48:26:WU02:FS00:0xa4:logfile size=0 infoLength=0 edr=0 trr=25
19:48:26:WU02:FS00:0xa4:logfile size: 0 info=0 bed=0 hdr=25
19:48:26:WU02:FS00:0xa4:- Writing 640 bytes of core data to disk...
19:48:26:WU02:FS00:0xa4:Done: 128 -> 146 (compressed to 114.0 percent)
19:48:26:WU02:FS00:0xa4:  ... Done.
19:48:28:WU02:FS00:0xa4:
19:48:28:WU02:FS00:0xa4:Folding@home Core Shutdown: EARLY_UNIT_END
19:48:28:WU00:FS01:0x21:Completed 300000 out of 6250000 steps (4%)
19:48:28:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:48:28:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
19:48:28:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:9037 run:256 clone:1 gen:1221 core:0xa4 unit:0x0000053eab436c9e56982b8905c5290c
19:48:28:WU02:FS00:Uploading 658B to 171.67.108.158
19:48:28:WU02:FS00:Connecting to 171.67.108.158:8080
19:48:28:WU01:FS00:Connecting to 171.67.108.45:8080
19:48:28:WU02:FS00:Upload complete
19:48:28:WU02:FS00:Server responded WORK_ACK (400)
19:48:28:WU02:FS00:Cleaning up
19:48:29:WARNING:WU01:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
19:48:29:WU01:FS00:Connecting to 171.64.65.35:80
19:48:29:WARNING:WU01:FS00:Failed to get assignment from '171.64.65.35:80': Empty work server assignment
19:48:29:ERROR:WU01:FS00:Exception: Could not get an assignment
19:48:29:WU01:FS00:Connecting to 171.67.108.45:8080
19:48:30:WARNING:WU01:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
19:48:30:WU01:FS00:Connecting to 171.64.65.35:80
19:48:31:WARNING:WU01:FS00:Failed to get assignment from '171.64.65.35:80': Empty work server assignment
19:48:31:ERROR:WU01:FS00:Exception: Could not get an assignment
19:48:41:WU00:FS01:0x21:Completed 312500 out of 6250000 steps (5%)
19:49:07:FS01:Shutting core down
19:49:07:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
19:49:17:Removing old file 'configs/config-20170707-041705.xml'
19:49:17:Saving configuration to config.xml
19:49:17:<config>
19:49:17:  <!-- HTTP Server -->
19:49:17:  <allow v='127.0.0.1 192.168.1.0/24'/>
19:49:17:
19:49:17:  <!-- Network -->
19:49:17:  <proxy v=':8080'/>
19:49:17:
19:49:17:  <!-- Remote Command Server -->
19:49:17:  <password v='********'/>
19:49:17:
19:49:17:  <!-- Slot Control -->
19:49:17:  <power v='light'/>
19:49:17:
19:49:17:  <!-- User Information -->
19:49:17:  <passkey v='********************************'/>
19:49:17:  <team v='224497'/>
19:49:17:  <user v='seanw_FLDC_1BddRSZxfZW3xjq4C2r8BWAmr2J9tT7GMW'/>
19:49:17:
19:49:17:  <!-- Folding Slots -->
19:49:17:  <slot id='0' type='CPU'/>
19:49:17:  <slot id='1' type='GPU'/>
19:49:17:</config>
19:49:30:WU01:FS00:Connecting to 171.67.108.45:8080
19:49:30:WU01:FS00:Assigned to work server 134.139.52.3
19:49:30:WU01:FS00:Requesting new work unit for slot 00: READY cpu:8 from 134.139.52.3
19:49:30:WU01:FS00:Connecting to 134.139.52.3:8080
19:49:31:WU01:FS00:Downloading 4.14MiB
19:49:35:WU01:FS00:Download complete
19:49:35:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:8216 run:4 clone:25 gen:37 core:0xa7 unit:0x00000028868b340358ed4235bb195156
19:49:35:WU01:FS00:Starting
19:49:35:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/sean/AppData/Roaming/FAHClient/cores/fahwebx.stanford.edu/cores/Win32/AMD64/AVX/Core_a7.fah/FahCore_a7.exe -dir 01 -suffix 01 -version 704 -lifeline 20436 -checkpoint 15 -np 8
19:49:35:WU01:FS00:Started FahCore on PID 24420
19:49:35:WU01:FS00:Core PID:24520
19:49:35:WU01:FS00:FahCore 0xa7 started
19:49:36:WU01:FS00:0xa7:*********************** Log Started 2017-09-08T19:49:35Z ***********************
19:49:36:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
19:49:36:WU01:FS00:0xa7:       Type: 0xa7
19:49:36:WU01:FS00:0xa7:       Core: Gromacs
19:49:36:WU01:FS00:0xa7:    Website: http://folding.stanford.edu/
19:49:36:WU01:FS00:0xa7:  Copyright: (c) 2009-2016 Stanford University
19:49:36:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:49:36:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 704 -lifeline 24420 -checkpoint 15 -np
19:49:36:WU01:FS00:0xa7:             8
19:49:36:WU01:FS00:0xa7:     Config: <none>
19:49:36:WU01:FS00:0xa7:************************************ Build *************************************
19:49:36:WU01:FS00:0xa7:    Version: 0.0.11
19:49:36:WU01:FS00:0xa7:       Date: Sep 21 2016
19:49:36:WU01:FS00:0xa7:       Time: 01:43:48
19:49:36:WU01:FS00:0xa7: Repository: Git
19:49:36:WU01:FS00:0xa7:   Revision: 957bd90e68d95ddcf1594dc15ff6c64cc4555146
19:49:36:WU01:FS00:0xa7:     Branch: master
19:49:36:WU01:FS00:0xa7:   Compiler: GNU 4.2.1 Compatible Clang 3.9.0 (trunk 274080)
19:49:36:WU01:FS00:0xa7:    Options: -std=gnu++98 -O3 -funroll-loops -ffast-math -mfpmath=sse
19:49:36:WU01:FS00:0xa7:             -fno-unsafe-math-optimizations -msse2 -I/mingw64/include
19:49:36:WU01:FS00:0xa7:             -Wno-inconsistent-dllimport -Wno-parentheses-equality
19:49:36:WU01:FS00:0xa7:             -Wno-deprecated-register -Wno-unused-local-typedef
19:49:36:WU01:FS00:0xa7:   Platform: linux2 4.6.0-1-amd64
19:49:36:WU01:FS00:0xa7:       Bits: 64
19:49:36:WU01:FS00:0xa7:       Mode: Release
19:49:36:WU01:FS00:0xa7:       SIMD: avx_256
19:49:36:WU01:FS00:0xa7:************************************ System ************************************
19:49:36:WU01:FS00:0xa7:        CPU: AMD Ryzen 7 1700X Eight-Core Processor
19:49:36:WU01:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
19:49:36:WU01:FS00:0xa7:       CPUs: 16
19:49:36:WU01:FS00:0xa7:     Memory: 15.93GiB
19:49:36:WU01:FS00:0xa7:Free Memory: 9.46GiB
19:49:36:WU01:FS00:0xa7:    Threads: WINDOWS_THREADS
19:49:36:WU01:FS00:0xa7: OS Version: 6.2
19:49:36:WU01:FS00:0xa7:Has Battery: false
19:49:36:WU01:FS00:0xa7: On Battery: false
19:49:36:WU01:FS00:0xa7: UTC Offset: -4
19:49:36:WU01:FS00:0xa7:        PID: 24520
19:49:36:WU01:FS00:0xa7:        CWD: C:\Users\sean\AppData\Roaming\FAHClient\work
19:49:36:WU01:FS00:0xa7:         OS: Windows 10 Pro
19:49:36:WU01:FS00:0xa7:    OS Arch: AMD64
19:49:36:WU01:FS00:0xa7:********************************************************************************
19:49:36:WU01:FS00:0xa7:Project: 8216 (Run 4, Clone 25, Gen 37)
19:49:36:WU01:FS00:0xa7:Unit: 0x00000028868b340358ed4235bb195156
19:49:36:WU01:FS00:0xa7:Reading tar file core.xml
19:49:36:WU01:FS00:0xa7:Reading tar file frame37.tpr
19:49:36:WU01:FS00:0xa7:Digital signatures verified
19:49:36:WU01:FS00:0xa7:Calling: mdrun -cpo frame37.cpt -s frame37.tpr -x frame37.xtc -e frame37.edr -cpi frame37.cpt -cpt 15 -nt 8
19:49:36:WU01:FS00:0xa7:Steps: first=18500000 total=500000
19:49:37:WU01:FS00:0xa7:Completed 1 out of 500000 steps (0%)
19:50:04:FS00:Shutting core down
19:50:04:WU00:FS01:Starting
19:50:04:WU00:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/sean/AppData/Roaming/FAHClient/cores/fahwebx.stanford.edu/cores/Win32/AMD64/ATI/R600/Core_21.fah/FahCore_21.exe -dir 00 -suffix 01 -version 704 -lifeline 20436 -checkpoint 15 -gpu 0 -gpu-vendor ati
19:50:04:WU00:FS01:Started FahCore on PID 15336
19:50:04:WU00:FS01:Core PID:22436
19:50:04:WU00:FS01:FahCore 0x21 started
19:50:04:WU01:FS00:0xa7:WARNING:Console control signal 1 on PID 24520
19:50:04:WU01:FS00:0xa7:Exiting, please wait. . .
19:50:04:WU00:FS01:0x21:*********************** Log Started 2017-09-08T19:50:04Z ***********************
19:50:04:WU00:FS01:0x21:Project: 9414 (Run 9, Clone 0, Gen 769)
19:50:04:WU00:FS01:0x21:Unit: 0x0000036fab436c9d585e0690972a6b29
19:50:04:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
19:50:04:WU00:FS01:0x21:Machine: 1
19:50:04:WU00:FS01:0x21:Digital signatures verified
19:50:04:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
19:50:04:WU00:FS01:0x21:Version 0.0.18
19:50:04:WU00:FS01:0x21:  Found a checkpoint file
19:50:05:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
19:50:05:WU01:FS00:Starting
19:50:05:WARNING:WU01:FS00:Changed SMP threads from 8 to 14 this can cause some work units to fail
19:50:05:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/sean/AppData/Roaming/FAHClient/cores/fahwebx.stanford.edu/cores/Win32/AMD64/AVX/Core_a7.fah/FahCore_a7.exe -dir 01 -suffix 01 -version 704 -lifeline 20436 -checkpoint 15 -np 14
19:50:05:WU01:FS00:Started FahCore on PID 2316
19:50:05:WU01:FS00:Core PID:22972
19:50:05:WU01:FS00:FahCore 0xa7 started
19:50:05:WU01:FS00:0xa7:*********************** Log Started 2017-09-08T19:50:05Z ***********************
19:50:05:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
19:50:05:WU01:FS00:0xa7:       Type: 0xa7
19:50:05:WU01:FS00:0xa7:       Core: Gromacs
19:50:05:WU01:FS00:0xa7:    Website: http://folding.stanford.edu/
19:50:05:WU01:FS00:0xa7:  Copyright: (c) 2009-2016 Stanford University
19:50:05:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:50:05:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 704 -lifeline 2316 -checkpoint 15 -np
19:50:05:WU01:FS00:0xa7:             14
19:50:05:WU01:FS00:0xa7:     Config: <none>
19:50:05:WU01:FS00:0xa7:************************************ Build *************************************
19:50:05:WU01:FS00:0xa7:    Version: 0.0.11
19:50:05:WU01:FS00:0xa7:       Date: Sep 21 2016
19:50:05:WU01:FS00:0xa7:       Time: 01:43:48
19:50:05:WU01:FS00:0xa7: Repository: Git
19:50:05:WU01:FS00:0xa7:   Revision: 957bd90e68d95ddcf1594dc15ff6c64cc4555146
19:50:05:WU01:FS00:0xa7:     Branch: master
19:50:05:WU01:FS00:0xa7:   Compiler: GNU 4.2.1 Compatible Clang 3.9.0 (trunk 274080)
19:50:05:WU01:FS00:0xa7:    Options: -std=gnu++98 -O3 -funroll-loops -ffast-math -mfpmath=sse
19:50:05:WU01:FS00:0xa7:             -fno-unsafe-math-optimizations -msse2 -I/mingw64/include
19:50:05:WU01:FS00:0xa7:             -Wno-inconsistent-dllimport -Wno-parentheses-equality
19:50:05:WU01:FS00:0xa7:             -Wno-deprecated-register -Wno-unused-local-typedef
19:50:05:WU01:FS00:0xa7:   Platform: linux2 4.6.0-1-amd64
19:50:05:WU01:FS00:0xa7:       Bits: 64
19:50:05:WU01:FS00:0xa7:       Mode: Release
19:50:05:WU01:FS00:0xa7:       SIMD: avx_256
19:50:05:WU01:FS00:0xa7:************************************ System ************************************
19:50:05:WU01:FS00:0xa7:        CPU: AMD Ryzen 7 1700X Eight-Core Processor
19:50:05:WU01:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
19:50:05:WU01:FS00:0xa7:       CPUs: 16
19:50:05:WU01:FS00:0xa7:     Memory: 15.93GiB
19:50:05:WU01:FS00:0xa7:Free Memory: 9.37GiB
19:50:05:WU01:FS00:0xa7:    Threads: WINDOWS_THREADS
19:50:05:WU01:FS00:0xa7: OS Version: 6.2
19:50:05:WU01:FS00:0xa7:Has Battery: false
19:50:05:WU01:FS00:0xa7: On Battery: false
19:50:05:WU01:FS00:0xa7: UTC Offset: -4
19:50:05:WU01:FS00:0xa7:        PID: 22972
19:50:05:WU01:FS00:0xa7:        CWD: C:\Users\sean\AppData\Roaming\FAHClient\work
19:50:05:WU01:FS00:0xa7:         OS: Windows 10 Pro
19:50:05:WU01:FS00:0xa7:    OS Arch: AMD64
19:50:05:WU01:FS00:0xa7:********************************************************************************
19:50:05:WU01:FS00:0xa7:Project: 8216 (Run 4, Clone 25, Gen 37)
19:50:05:WU01:FS00:0xa7:Unit: 0x00000028868b340358ed4235bb195156
19:50:05:WU01:FS00:0xa7:Digital signatures verified
19:50:05:WU01:FS00:0xa7:Reducing thread count from 14 to 13 to avoid domain decomposition with large prime factor 7
19:50:05:WU01:FS00:0xa7:Reducing thread count from 13 to 12 to avoid domain decomposition by a prime number > 3
19:50:05:WU01:FS00:0xa7:Calling: mdrun -cpo frame37.cpt -s frame37.tpr -x frame37.xtc -e frame37.edr -cpi frame37.cpt -cpt 15 -nt 13 -nt 12 -nt 12
19:50:05:WU01:FS00:0xa7:ERROR:
19:50:05:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
19:50:05:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20160919-669094a-unknown
19:50:05:WU01:FS00:0xa7:ERROR:Source code file: /host/windows-cross-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/commandline/pargs.cpp, line: 680
19:50:05:WU01:FS00:0xa7:ERROR:
19:50:05:WU01:FS00:0xa7:ERROR:Fatal error:
19:50:05:WU01:FS00:0xa7:ERROR:Double command line argument -nt
19:50:05:WU01:FS00:0xa7:ERROR:
19:50:05:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
19:50:05:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
19:50:05:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
19:50:08:WU00:FS01:0x21:Completed 300000 out of 6250000 steps (4%)
19:50:08:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:50:10:WU01:FS00:0xa7:ERROR:
19:50:10:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
19:50:10:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20160919-669094a-unknown
19:50:10:WU01:FS00:0xa7:ERROR:Source code file: /host/windows-cross-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/commandline/pargs.cpp, line: 680
19:50:10:WU01:FS00:0xa7:ERROR:
19:50:10:WU01:FS00:0xa7:ERROR:Fatal error:
19:50:10:WU01:FS00:0xa7:ERROR:Double command line argument -nt
19:50:10:WU01:FS00:0xa7:ERROR:
19:50:10:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
19:50:10:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
19:50:10:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
19:50:15:WU01:FS00:0xa7:ERROR:
19:50:15:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
19:50:15:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20160919-669094a-unknown
19:50:15:WU01:FS00:0xa7:ERROR:Source code file: /host/windows-cross-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/commandline/pargs.cpp, line: 680
19:50:15:WU01:FS00:0xa7:ERROR:
19:50:15:WU01:FS00:0xa7:ERROR:Fatal error:
19:50:15:WU01:FS00:0xa7:ERROR:Double command line argument -nt
19:50:15:WU01:FS00:0xa7:ERROR:
19:50:15:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
19:50:15:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
19:50:15:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
19:50:18:Removing old file 'configs/config-20170707-070349.xml'
19:50:18:Saving configuration to config.xml
19:50:18:<config>
19:50:18:  <!-- HTTP Server -->
19:50:18:  <allow v='127.0.0.1 192.168.1.0/24'/>
19:50:18:
19:50:18:  <!-- Network -->
19:50:18:  <proxy v=':8080'/>
19:50:18:
19:50:18:  <!-- Remote Command Server -->
19:50:18:  <password v='********'/>
19:50:18:
19:50:18:  <!-- User Information -->
19:50:18:  <passkey v='********************************'/>
19:50:18:  <team v='224497'/>
19:50:18:  <user v='seanw_FLDC_1BddRSZxfZW3xjq4C2r8BWAmr2J9tT7GMW'/>
19:50:18:
19:50:18:  <!-- Folding Slots -->
19:50:18:  <slot id='0' type='CPU'/>
19:50:18:  <slot id='1' type='GPU'/>
19:50:18:</config>
19:50:20:WU01:FS00:0xa7:Steps: first=18500000 total=500000
19:50:21:WU01:FS00:0xa7:Completed 1432 out of 500000 steps (0%)
19:50:21:WU01:FS00:0xa7:Saving result file ..\logfile_01.txt
19:50:21:WU01:FS00:0xa7:Saving result file frame37.cpt
19:50:21:WU01:FS00:0xa7:Saving result file frame37.edr
19:50:21:WU01:FS00:0xa7:Saving result file frame37.xtc
19:50:21:WU01:FS00:0xa7:Saving result file frame37_prev.cpt
19:50:21:WU01:FS00:0xa7:Saving result file science.log
19:50:21:WU01:FS00:0xa7:Folding@home Core Shutdown: BAD_WORK_UNIT
19:50:22:WU00:FS01:0x21:Completed 312500 out of 6250000 steps (5%)
19:50:22:WARNING:WU01:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
19:50:22:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:8216 run:4 clone:25 gen:37 core:0xa7 unit:0x00000028868b340358ed4235bb195156
19:50:22:WU01:FS00:Uploading 3.60MiB to 134.139.52.3
19:50:22:WU01:FS00:Connecting to 134.139.52.3:8080
19:50:22:WU02:FS00:Connecting to 171.67.108.45:8080
19:50:23:WARNING:WU02:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
19:50:23:WU02:FS00:Connecting to 171.64.65.35:80
19:50:24:WARNING:WU02:FS00:Failed to get assignment from '171.64.65.35:80': Empty work server assignment
19:50:24:ERROR:WU02:FS00:Exception: Could not get an assignment
19:50:24:WU02:FS00:Connecting to 171.67.108.45:8080
19:50:24:WARNING:WU02:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
19:50:24:WU02:FS00:Connecting to 171.64.65.35:80
19:50:25:WU01:FS00:Upload complete
19:50:25:WU01:FS00:Server responded WORK_ACK (400)
19:50:25:WU01:FS00:Cleaning up
19:50:25:WARNING:WU02:FS00:Failed to get assignment from '171.64.65.35:80': Empty work server assignment
19:50:25:ERROR:WU02:FS00:Exception: Could not get an assignment
19:51:24:WU02:FS00:Connecting to 171.67.108.45:8080
19:51:24:WARNING:WU02:FS00:Failed to get assignment from '171.67.108.45:8080': Empty work server assignment
19:51:24:WU02:FS00:Connecting to 171.64.65.35:80
19:51:25:WARNING:WU02:FS00:Failed to get assignment from '171.64.65.35:80': Empty work server assignment
19:51:25:ERROR:WU02:FS00:Exception: Could not get an assignment
19:51:30:WU00:FS01:0x21:Completed 375000 out of 6250000 steps (6%)
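
For anyone skimming a long log like the one above, the lines that matter most are the "FahCore returned" entries. Below is a minimal Python sketch that pulls them out. The regular expressions are my own guess at the line layout, based only on the lines in this log, and `core_returns` is a hypothetical helper, not part of FAHClient.

```python
import re

# Assumed layout, inferred from this log only:
#   HH:MM:SS:[WARNING:|ERROR:][WUnn:][FSnn:]message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):"
    r"(?:(?P<level>WARNING|ERROR):)?"
    r"(?:WU(?P<wu>\d+):)?"
    r"(?:FS(?P<slot>\d+):)?"
    r"(?P<message>.*)$"
)

# Matches e.g. "FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
RETURN_RE = re.compile(
    r"FahCore returned: (?P<name>\w+) \((?P<code>\d+) = 0x[0-9a-fA-F]+\)"
)


def core_returns(lines):
    """Return (time, slot, status name, numeric code) for each core exit."""
    results = []
    for line in lines:
        parsed = LINE_RE.match(line.rstrip("\n"))
        if parsed is None:
            continue  # banner lines and anything without a timestamp
        status = RETURN_RE.search(parsed.group("message"))
        if status:
            results.append((
                parsed.group("time"),
                parsed.group("slot"),
                status.group("name"),
                int(status.group("code")),
            ))
    return results


# Three lines copied from the log above.
sample = [
    "19:48:15:WU02:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    "19:48:28:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "19:49:07:FS01:Shutting core down",
]

for entry in core_returns(sample):
    print(entry)
```

Filtering on the status name (INTERRUPTED versus BAD_WORK_UNIT) separates the pauses I triggered from the work units that actually failed.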
seanthegeek
Posts: 5
Joined: Fri Sep 08, 2017 4:01 pm

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by seanthegeek »

Oddly, when I moved the performance slider to medium, I was able to pause the GPU task, and it also worked when I moved it back to light. Whatever was causing the GPU problem seems to have resolved itself.

As for the CPU, based on the log, it seems my CPU can only get jobs when the FAH slider is set to light (8 threads) or full (16 threads). When set to medium (14 threads), the client cannot find any jobs.
Rel25917
Posts: 303
Joined: Wed Aug 15, 2012 2:31 am

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by Rel25917 »

Whats strange about a status of paused waiting for idle? They are clearly set to run on idle and it isn't idle.
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by JimboPalmer »

Things I see 'wrong'
Folding is set to light, so half the CPUs are idle, and the GPU never runs.
19:37:05: <power v='light'/>

14 threads is a multiple of a 'large' prime (7) so will not work well. 15 or 12 threads would be better. The software chose 12.

19:50:05:WU01:FS00:0xa7:Reducing thread count from 14 to 13 to avoid domain decomposition with large prime factor 7
19:50:05:WU01:FS00:0xa7:Reducing thread count from 13 to 12 to avoid domain decomposition by a prime number > 3

It then gets confused and shuts down because the thread count is passed on the command line more than once (note the repeated -nt in the mdrun call).

19:50:05:WU01:FS00:0xa7:ERROR:Double command line argument -nt


Mind you, all of this is CPU folding not GPU folding, which is what you asked about.
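
The thread-count reduction in those two log lines can be sketched in a few lines of Python. This is only an illustration. It assumes that a count is acceptable when its largest prime factor is 3 or less, which is what the "prime number > 3" message suggests. The real core's rule may differ (it might tolerate 5, for example), and both function names are made up for this example.

```python
def largest_prime_factor(n):
    """Return the largest prime factor of n (1 if n has none)."""
    factor, largest = 2, 1
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    # Whatever remains above 1 is itself a prime factor.
    return max(largest, n) if n > 1 else largest


def pick_thread_count(requested, max_prime=3):
    """Step down from `requested` until no prime factor exceeds max_prime.

    max_prime=3 is an assumption taken from the log message, not a
    documented FAH setting.
    """
    threads = requested
    while threads > 1 and largest_prime_factor(threads) > max_prime:
        threads -= 1
    return threads


for requested in (8, 14, 16):
    print(requested, "->", pick_thread_count(requested))
```

Under that assumption, 14 steps down through 13 to 12, which matches what the log shows, while 8 and 16 are left alone.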
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
seanthegeek
Posts: 5
Joined: Fri Sep 08, 2017 4:01 pm

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by seanthegeek »

@Rel25917 Yes, the client is showing that the task is paused, but GPU usage still registers at 100%.

@JimboPalmer That makes sense for the CPU. Thanks. Getting back to the GPU problem: after it had gone away for some reason, it has recurred. When the client shows that the GPU job is paused, the GPU software and hardware LEDs both indicate that the load is at 100%, and it does not change until late in the Windows shutdown process. Interestingly, the GPU fan did spin down, so I wonder if the problem is actually the GPU reporting incorrect usage after this type of compute task is stopped or paused.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by bruce »

* GPUs cannot be modulated. They're either folding at 100% (less whatever is used by other tasks) or they're not folding at all.
* A CPU that runs K threads can be modulated in steps from 1 to K, excluding whatever prime factors GROMACS has trouble with.
* Each GPU runs best if there's a free CPU available to manage data transfers to/from RAM and VRAM.

The power slider is a simplified method of adjustment and is intended primarily for simple systems, where it works because the choices are relatively simple.

For advanced systems, you will probably do best if you manage all of your resources (within the restrictions mentioned above) using FAHControl. It's more complex, but you have full control, including managing each slot's idle setting, if you still plan on using it.

The reason I questioned CPUs running on "idle" is that the OS can manage CPU resources quite well. FAH sets its CPU processing priority to the lowest possible setting so it never interferes with foreground activities. Assign as many CPU threads as you choose and the OS will back off FAH processing to whatever resources remain.

Unfortunately, that's not possible for GPUs. There's no such thing as GPU priority, except for the OS's concept of "idle." Thus people with slow GPUs (certainly excluding those with a Vega) may experience screen lag when they try to use their system, because the GPU is busy when they want a simple screen update.
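
As a rough illustration of managing slots explicitly instead of relying on the slider, here is a small Python sketch that builds a slot section like the one in the config dump in the log. The `cpus` option name is written from memory and should be treated as an assumption to check against your own config.xml. `build_slots` is a hypothetical helper, and in practice FAHControl writes this file for you.

```python
import xml.etree.ElementTree as ET


def build_slots(logical_cpus, gpu_count, cpu_threads=None):
    """Build a <config> element with one CPU slot and one slot per GPU."""
    if cpu_threads is None:
        # Leave one logical CPU free per GPU to handle its data transfers.
        cpu_threads = max(1, logical_cpus - gpu_count)
    config = ET.Element("config")
    cpu_slot = ET.SubElement(config, "slot", id="0", type="CPU")
    # Assumed option name: an explicit thread count for the CPU slot.
    ET.SubElement(cpu_slot, "cpus", v=str(cpu_threads))
    for index in range(gpu_count):
        ET.SubElement(config, "slot", id=str(index + 1), type="GPU")
    return config


# 16 logical CPUs and one GPU, as in the log. 12 threads is passed
# explicitly because the default here (16 - 1 = 15) has 5 as a prime
# factor, which GROMACS may not handle well.
config = build_slots(logical_cpus=16, gpu_count=1, cpu_threads=12)
print(ET.tostring(config, encoding="unicode"))
```

The point is only that the CPU thread count and the GPU slot are set independently, which the single power slider cannot express.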
rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by rwh202 »

bruce wrote: Thus people with slow GPUs (certainly excluding those with a Vega) may experience screen lag when they try to use their system, because the GPU is busy when they want a simple screen update.
Don't worry - that's not a feature just reserved for slow GPUs - you can still get to experience it with a 1080, so I suspect Vega will get in on the act too...
seanthegeek
Posts: 5
Joined: Fri Sep 08, 2017 4:01 pm

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by seanthegeek »

There isn't any lag or resource contention, just continued 100% GPU usage long after the FAH GPU process has exited, as if the card were stuck in an infinite loop, until a complete log out or reboot of Windows. At least one other Redditor had similar problems with other compute workloads:

During those crashes it seems as though the card has gotten into some sort of an infinite loop scenario and the gpu tach still shows 100% even though the process stopped requesting compute from the GPU.
I would assume that the drivers are reporting incorrect usage however the current draw at the wall is still high which suggests otherwise. Often times the fans spool down or never spool up, which then results in a huge purge of heat upon reboot.
https://www.reddit.com/r/Amd/comments/6 ... radeon_rx/

So this looks like a Vega problem, not a FAH one. The RX Vega drivers are still in beta, and are not WHQL (Windows Hardware Quality Labs) certified (see the direct URL to the installer), which is why we can't generate validated results with benchmarking tools yet.

After undervolting and overclocking, the RX Vega 56 is awesome for gaming, but it's not ready for those of us that do casual GPU computing (well, other than mining, apparently). The joys of being an early adopter to a new architecture :\
JiiPee
Posts: 59
Joined: Sun Mar 09, 2008 4:09 pm
Location: FINLAND

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by JiiPee »

Yeah, there is something strange happening. So far I have only seen it happen with folding.

Folding stopped hours ago, but the card is still working hard. Not at full load, because the fan speed dropped, but GPU usage is high and it's dumping a lot of hot air.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by bruce »

Have you reported this problem to AMD?
JiiPee
Posts: 59
Joined: Sun Mar 09, 2008 4:09 pm
Location: FINLAND

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by JiiPee »

bruce wrote: Have you reported this problem to AMD?
I have not. I'm trying to see if it happens with anything other than folding. So far FAH has been the only workload with this behavior.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by bruce »

In any case, report it to them. They do test prerelease drivers with FAH, so whether the drivers need fixing so the GPU doesn't show an incorrect 100%, or FAH is somehow leaving the GPU in a loop, they'll either fix it or tell FAH what to fix.
JiiPee
Posts: 59
Joined: Sun Mar 09, 2008 4:09 pm
Location: FINLAND

Re: GPU tasks do not pause once started on Radeon RX Vega

Post by JiiPee »

I have now reported this to AMD. I still haven't hit this issue with anything other than folding.