R9 280X stuck at 99.99%

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

Farwalker
Posts: 23
Joined: Thu Sep 26, 2013 5:54 pm

R9 280X stuck at 99.99%

Post by Farwalker »

I have two computers running Windows 7 Professional 64-bit, each folding on an AMD R9 280X (even though the client reports it as a 7970). I am running version 7.3.6.
Each computer is stuck at 99.99% on the GPU. This started a few days ago. Before that, they had been running smoothly, yielding about 110,000 PPD each.

Prior to this stoppage these GPUs had been completing 3 or 4 WUs per day.

I used the Advanced Client to remove the GPU, then added it back and saved. This got me a new work unit, which folded until it reached 99% and then stuck there again.

What has changed?
What do I do to get this running again?


*********************** Log Started 2013-12-03T04:07:09Z ***********************
04:07:09:************************* Folding@home Client *************************
04:07:09:      Website: http://folding.stanford.edu/
04:07:09:    Copyright: (c) 2009-2013 Stanford University
04:07:09:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
04:07:09:         Args: 
04:07:09:       Config: C:/Users/GEE/AppData/Roaming/FAHClient/config.xml
04:07:09:******************************** Build ********************************
04:07:09:      Version: 7.3.6
04:07:09:         Date: Feb 18 2013
04:07:09:         Time: 15:25:17
04:07:09:      SVN Rev: 3923
04:07:09:       Branch: fah/trunk/client
04:07:09:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
04:07:09:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
04:07:09:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
04:07:09:     Platform: win32 XP
04:07:09:         Bits: 32
04:07:09:         Mode: Release
04:07:09:******************************* System ********************************
04:07:09:          CPU: Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
04:07:09:       CPU ID: GenuineIntel Family 6 Model 30 Stepping 5
04:07:09:         CPUs: 8
04:07:09:       Memory: 8.00GiB
04:07:09:  Free Memory: 6.19GiB
04:07:09:      Threads: WINDOWS_THREADS
04:07:09:  Has Battery: false
04:07:09:   On Battery: false
04:07:09:   UTC offset: -5
04:07:09:          PID: 4456
04:07:09:          CWD: C:/Users/GEE/AppData/Roaming/FAHClient
04:07:09:           OS: Windows 7 Professional
04:07:09:      OS Arch: AMD64
04:07:09:         GPUs: 1
04:07:09:        GPU 0: ATI:5 Tahiti XT [Radeon HD 7970]
04:07:09:         CUDA: Not detected
04:07:09:Win32 Service: false
04:07:09:***********************************************************************
04:07:09:<config>
04:07:09:  <!-- Folding Core -->
04:07:09:  <checkpoint v='5'/>
04:07:09:
04:07:09:  <!-- Folding Slot Configuration -->
04:07:09:  <power v='full'/>
04:07:09:
04:07:09:  <!-- Network -->
04:07:09:  <proxy v=':8080'/>
04:07:09:
04:07:09:  <!-- User Information -->
04:07:09:  <passkey v='********************************'/>
04:07:09:  <team v='32'/>
04:07:09:  <user v='Farwalker'/>
04:07:09:
04:07:09:  <!-- Folding Slots -->
04:07:09:  <slot id='2' type='CPU'>
04:07:09:    <cpus v='6'/>
04:07:09:  </slot>
04:07:09:  <slot id='0' type='GPU'/>
04:07:09:</config>
04:07:09:Trying to access database...
04:07:13:Successfully acquired database lock
04:07:13:Enabled folding slot 02: READY cpu:6
04:07:13:Enabled folding slot 00: READY gpu:0:Tahiti XT [Radeon HD 7970]
04:07:14:WU00:FS02:Starting
04:07:15:WU00:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/GEE/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a3.fah/FahCore_a3.exe -dir 00 -suffix 01 -version 703 -lifeline 4456 -checkpoint 5 -np 6
04:07:15:WU00:FS02:Started FahCore on PID 5800
04:07:16:WU00:FS02:Core PID:5928
04:07:16:WU00:FS02:FahCore 0xa3 started
04:07:17:WU00:FS02:0xa3:
04:07:17:WU00:FS02:0xa3:*------------------------------*
04:07:17:WU00:FS02:0xa3:Folding@Home Gromacs SMP Core
04:07:17:WU00:FS02:0xa3:Version 2.27 (Dec. 15, 2010)
04:07:17:WU00:FS02:0xa3:
04:07:17:WU00:FS02:0xa3:Preparing to commence simulation
04:07:17:WU00:FS02:0xa3:- Looking at optimizations...
04:07:17:WU00:FS02:0xa3:- Files status OK
04:07:17:WU01:FS00:Starting
04:07:17:WU00:FS02:0xa3:- Expanded 3848960 -> 4383200 (decompressed 113.8 percent)
04:07:17:WU00:FS02:0xa3:Called DecompressByteArray: compressed_data_size=3848960 data_size=4383200, decompressed_data_size=4383200 diff=0
04:07:17:WU00:FS02:0xa3:- Digital signature verified
04:07:17:WU00:FS02:0xa3:
04:07:17:WU00:FS02:0xa3:Project: 8575 (Run 1, Clone 6, Gen 126)
04:07:17:WU00:FS02:0xa3:
04:07:17:WU00:FS02:0xa3:Assembly optimizations on if available.
04:07:17:WU00:FS02:0xa3:Entering M.D.
04:07:18:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/GEE/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe -dir 01 -suffix 01 -version 703 -lifeline 4456 -checkpoint 5 -gpu 0 -gpu-vendor ati
04:07:18:WU01:FS00:Started FahCore on PID 6060
04:07:18:WU01:FS00:Core PID:6072
04:07:18:WU01:FS00:FahCore 0x17 started
04:07:23:WU01:FS00:0x17:*********************** Log Started 2013-12-03T04:07:23Z ***********************
04:07:23:WU01:FS00:0x17:Project: 8900 (Run 128, Clone 7, Gen 11)
04:07:23:WU01:FS00:0x17:Unit: 0x00000014028c126651a64313fbc97223
04:07:23:WU01:FS00:0x17:CPU: 0x00000000000000000000000000000000
04:07:23:WU01:FS00:0x17:Machine: 0
04:07:23:WU01:FS00:0x17:Reading tar file state.xml
04:07:24:WU00:FS02:0xa3:Using Gromacs checkpoints
04:07:25:WU01:FS00:0x17:Reading tar file system.xml
04:07:25:WU00:FS02:0xa3:Mapping NT from 6 to 6 
04:07:26:WU01:FS00:0x17:Reading tar file integrator.xml
04:07:26:WU01:FS00:0x17:Reading tar file core.xml
04:07:26:WU01:FS00:0x17:Digital signatures verified
04:07:31:WU00:FS02:0xa3:Resuming from checkpoint
04:07:31:WU00:FS02:0xa3:Verified 00/wudata_01.log
04:07:33:WU00:FS02:0xa3:Verified 00/wudata_01.trr
04:07:33:WU00:FS02:0xa3:Verified 00/wudata_01.edr
04:07:34:WU00:FS02:0xa3:Completed 416145 out of 500000 steps  (83%)
04:11:48:WU01:FS00:0x17:Completed 0 out of 2500000 steps (0%)
04:12:42:13:127.0.0.1:New Web connection
04:15:34:WU01:FS00:0x17:Completed 25000 out of 2500000 steps (1%)
04:18:56:WU01:FS00:0x17:Completed 50000 out of 2500000 steps (2%)
04:20:06:WU00:FS02:0xa3:Completed 420000 out of 500000 steps  (84%)
04:22:46:WU01:FS00:0x17:Completed 75000 out of 2500000 steps (3%)
04:26:08:WU01:FS00:0x17:Completed 100000 out of 2500000 steps (4%)
04:29:58:WU01:FS00:0x17:Completed 125000 out of 2500000 steps (5%)
04:33:20:WU01:FS00:0x17:Completed 150000 out of 2500000 steps (6%)
04:35:56:WU00:FS02:0xa3:Completed 425000 out of 500000 steps  (85%)
04:37:09:WU01:FS00:0x17:Completed 175000 out of 2500000 steps (7%)
04:49:57:WU01:FS00:0x17:Completed 200000 out of 2500000 steps (8%)
04:51:22:WU01:FS00:0x17:Bad State detected... attempting to resume from last good checkpoint
04:51:46:WU00:FS02:0xa3:Completed 430000 out of 500000 steps  (86%)
04:54:44:WU01:FS00:0x17:Completed 175000 out of 2500000 steps (7%)
04:58:06:WU01:FS00:0x17:Completed 200000 out of 2500000 steps (8%)
05:01:56:WU01:FS00:0x17:Completed 225000 out of 2500000 steps (9%)
05:05:18:WU01:FS00:0x17:Completed 250000 out of 2500000 steps (10%)
05:07:36:WU00:FS02:0xa3:Completed 435000 out of 500000 steps  (87%)
05:12:00:WU01:FS00:0x17:Completed 275000 out of 2500000 steps (11%)
05:23:22:WU00:FS02:0xa3:Completed 440000 out of 500000 steps  (88%)
05:31:27:WU01:FS00:0x17:Completed 300000 out of 2500000 steps (12%)
05:33:47:WU01:FS00:0x17:Bad State detected... attempting to resume from last good checkpoint
05:37:09:WU01:FS00:0x17:Completed 275000 out of 2500000 steps (11%)
05:39:15:WU00:FS02:0xa3:Completed 445000 out of 500000 steps  (89%)
05:40:32:WU01:FS00:0x17:Completed 300000 out of 2500000 steps (12%)
05:47:44:WU01:FS00:0x17:Completed 325000 out of 2500000 steps (13%)
05:55:03:WU00:FS02:0xa3:Completed 450000 out of 500000 steps  (90%)
06:07:11:WU01:FS00:0x17:Completed 350000 out of 2500000 steps (14%)
06:09:31:WU01:FS00:0x17:Bad State detected... attempting to resume from last good checkpoint
06:09:31:WU01:FS00:0x17:Max number of retries reached. Aborting.
06:09:31:WU01:FS00:0x17:ERROR:exception: Max Retries Reached
06:09:31:WU01:FS00:0x17:Saving result file logfile_01.txt
06:09:32:WU01:FS00:0x17:Saving result file badStateCheckpoint_18467
06:09:35:WU01:FS00:0x17:Saving result file badStateCheckpoint_6334
06:09:38:WU01:FS00:0x17:Saving result file badStateForceGroup0_18467Core.xml
06:09:43:WU01:FS00:0x17:Saving result file badStateForceGroup0_18467Ref.xml
06:09:48:WU01:FS00:0x17:Saving result file badStateForceGroup0_41Core.xml
06:09:52:WU01:FS00:0x17:Saving result file badStateForceGroup0_41Ref.xml
06:09:57:WU01:FS00:0x17:Saving result file badStateForceGroup0_6334Core.xml
06:10:02:WU01:FS00:0x17:Saving result file badStateForceGroup0_6334Ref.xml
06:10:07:WU01:FS00:0x17:Saving result file badStateForceGroup1_18467Core.xml
06:10:11:WU01:FS00:0x17:Saving result file badStateForceGroup1_18467Ref.xml
06:10:15:WU01:FS00:0x17:Saving result file badStateForceGroup1_41Core.xml
06:10:20:WU01:FS00:0x17:Saving result file badStateForceGroup1_41Ref.xml
06:10:26:WU01:FS00:0x17:Saving result file badStateForceGroup1_6334Core.xml
06:10:30:WU01:FS00:0x17:Saving result file badStateForceGroup1_6334Ref.xml
06:10:35:WU01:FS00:0x17:Saving result file badStateForceGroup2_18467Core.xml
06:10:40:WU01:FS00:0x17:Saving result file badStateForceGroup2_18467Ref.xml
06:10:45:WU01:FS00:0x17:Saving result file badStateForceGroup2_6334Core.xml
06:10:50:WU01:FS00:0x17:Saving result file badStateForceGroup2_6334Ref.xml
06:10:55:WU01:FS00:0x17:Saving result file log.txt
06:10:55:WU01:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
06:10:56:WARNING:WU01:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
06:10:56:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:8900 run:128 clone:7 gen:11 core:0x17 unit:0x00000014028c126651a64313fbc97223
06:10:56:WU01:FS00:Uploading 7.14MiB to 171.64.65.69
06:10:56:WU01:FS00:Connecting to 171.64.65.69:8080
06:10:56:WU02:FS00:Connecting to assign-GPU.stanford.edu:80
06:10:57:WU02:FS00:News: Welcome to Folding@Home
06:10:57:WU02:FS00:Assigned to work server 171.64.65.69
06:10:57:WU02:FS00:Requesting new work unit for slot 00: READY gpu:0:Tahiti XT [Radeon HD 7970] from 171.64.65.69
06:10:57:WU02:FS00:Connecting to 171.64.65.69:8080
06:10:57:WU02:FS00:Downloading 4.18MiB
06:10:58:WU00:FS02:0xa3:Completed 455000 out of 500000 steps  (91%)
06:11:00:WU02:FS00:Download complete
06:11:00:WU02:FS00:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:8900 run:874 clone:1 gen:55 core:0x17 unit:0x00000058028c126651a6e90f3d809a19
06:11:00:WU02:FS00:Starting
06:11:00:WU02:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/GEE/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe -dir 02 -suffix 01 -version 703 -lifeline 4456 -checkpoint 5 -gpu 0 -gpu-vendor ati
06:11:00:WU02:FS00:Started FahCore on PID 6836
06:11:00:WU02:FS00:Core PID:3988
06:11:00:WU02:FS00:FahCore 0x17 started
06:11:00:WU02:FS00:0x17:*********************** Log Started 2013-12-03T06:11:00Z ***********************
06:11:00:WU02:FS00:0x17:Project: 8900 (Run 874, Clone 1, Gen 55)
06:11:00:WU02:FS00:0x17:Unit: 0x00000058028c126651a6e90f3d809a19
06:11:00:WU02:FS00:0x17:CPU: 0x00000000000000000000000000000000
06:11:00:WU02:FS00:0x17:Machine: 0
06:11:00:WU02:FS00:0x17:Reading tar file state.xml
06:11:01:WU02:FS00:0x17:Reading tar file system.xml
06:11:02:WU01:FS00:Upload 10.51%
06:11:02:WU02:FS00:0x17:Reading tar file integrator.xml
06:11:02:WU02:FS00:0x17:Reading tar file core.xml
06:11:02:WU02:FS00:0x17:Digital signatures verified
06:11:08:WU01:FS00:Upload 21.90%
06:11:14:WU01:FS00:Upload 33.28%
06:11:20:WU01:FS00:Upload 44.67%
06:11:26:WU01:FS00:Upload 56.05%
06:11:32:WU01:FS00:Upload 66.56%
06:11:38:WU01:FS00:Upload 77.95%
06:11:44:WU01:FS00:Upload 89.33%
06:11:50:WU01:FS00:Upload 100.00%
06:11:50:WU01:FS00:Upload complete
06:11:50:WU01:FS00:Server responded WORK_ACK (400)
06:11:50:WU01:FS00:Cleaning up
06:15:19:WU02:FS00:0x17:Completed 0 out of 2500000 steps (0%)
06:19:03:WU02:FS00:0x17:Completed 25000 out of 2500000 steps (1%)
06:22:24:WU02:FS00:0x17:Completed 50000 out of 2500000 steps (2%)
06:26:13:WU02:FS00:0x17:Completed 75000 out of 2500000 steps (3%)
06:26:57:WU00:FS02:0xa3:Completed 460000 out of 500000 steps  (92%)
06:29:34:WU02:FS00:0x17:Completed 100000 out of 2500000 steps (4%)
06:33:22:WU02:FS00:0x17:Completed 125000 out of 2500000 steps (5%)
06:36:43:WU02:FS00:0x17:Completed 150000 out of 2500000 steps (6%)
06:40:32:WU02:FS00:0x17:Completed 175000 out of 2500000 steps (7%)
06:42:48:WU00:FS02:0xa3:Completed 465000 out of 500000 steps  (93%)
06:43:53:WU02:FS00:0x17:Completed 200000 out of 2500000 steps (8%)
06:47:41:WU02:FS00:0x17:Completed 225000 out of 2500000 steps (9%)
06:51:01:WU02:FS00:0x17:Completed 250000 out of 2500000 steps (10%)
06:54:50:WU02:FS00:0x17:Completed 275000 out of 2500000 steps (11%)
06:58:11:WU02:FS00:0x17:Completed 300000 out of 2500000 steps (12%)
06:58:41:WU00:FS02:0xa3:Completed 470000 out of 500000 steps  (94%)
07:01:59:WU02:FS00:0x17:Completed 325000 out of 2500000 steps (13%)
07:05:20:WU02:FS00:0x17:Completed 350000 out of 2500000 steps (14%)
07:09:09:WU02:FS00:0x17:Completed 375000 out of 2500000 steps (15%)
07:12:30:WU02:FS00:0x17:Completed 400000 out of 2500000 steps (16%)
07:14:32:WU00:FS02:0xa3:Completed 475000 out of 500000 steps  (95%)
07:16:18:WU02:FS00:0x17:Completed 425000 out of 2500000 steps (17%)
07:19:07:WARNING:Exception: 1458:127.0.0.1: Receive error: 10054: An existing connection was forcibly closed by the remote host.
07:30:18:WU00:FS02:0xa3:Completed 480000 out of 500000 steps  (96%)
07:46:02:WU00:FS02:0xa3:Completed 485000 out of 500000 steps  (97%)
08:01:46:WU00:FS02:0xa3:Completed 490000 out of 500000 steps  (98%)
08:01:46:WU01:FS02:Connecting to assign3.stanford.edu:8080
08:01:46:WU01:FS02:News: Welcome to Folding@Home
08:01:46:WU01:FS02:Assigned to work server 128.143.231.202
08:01:46:WU01:FS02:Requesting new work unit for slot 02: RUNNING cpu:6 from 128.143.231.202
08:01:46:WU01:FS02:Connecting to 128.143.231.202:8080
08:01:48:WU01:FS02:Downloading 3.63MiB
08:01:48:WU01:FS02:Download complete
08:01:49:WU01:FS02:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:6098 run:5 clone:87 gen:365 core:0xa3 unit:0x000002000a3b1e594e91de9261ebf754
08:17:29:WU00:FS02:0xa3:Completed 495000 out of 500000 steps  (99%)
08:33:13:WU00:FS02:0xa3:Completed 500000 out of 500000 steps  (100%)
08:33:15:WU00:FS02:0xa3:DynamicWrapper: Finished Work Unit: sleep=10000
08:33:25:WU00:FS02:0xa3:
08:33:25:WU00:FS02:0xa3:Finished Work Unit:
08:33:25:WU00:FS02:0xa3:- Reading up to 8056464 from "00/wudata_01.trr": Read 8056464
08:33:25:WU00:FS02:0xa3:trr file hash check passed.
08:33:25:WU00:FS02:0xa3:edr file hash check passed.
08:33:25:WU00:FS02:0xa3:logfile size: 78623
08:33:25:WU00:FS02:0xa3:Leaving Run
08:33:28:WU00:FS02:0xa3:- Writing 8171919 bytes of core data to disk...
08:33:30:WU00:FS02:0xa3:Done: 8171407 -> 7532079 (compressed to 92.1 percent)
08:33:30:WU00:FS02:0xa3:  ... Done.
08:33:34:WU00:FS02:0xa3:- Shutting down core
08:33:34:WU00:FS02:0xa3:
08:33:34:WU00:FS02:0xa3:Folding@home Core Shutdown: FINISHED_UNIT
08:33:34:WU00:FS02:FahCore returned: FINISHED_UNIT (100 = 0x64)
08:33:35:WU00:FS02:Sending unit results: id:00 state:SEND error:NO_ERROR project:8575 run:1 clone:6 gen:126 core:0xa3 unit:0x0000042c0a3b1e59522885acdafae8ea
08:33:35:WU00:FS02:Uploading 7.18MiB to 128.143.231.202
08:33:35:WU00:FS02:Connecting to 128.143.231.202:8080
08:33:35:WU01:FS02:Starting
08:33:35:WU01:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/GEE/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a3.fah/FahCore_a3.exe -dir 01 -suffix 01 -version 703 -lifeline 4456 -checkpoint 5 -np 6
08:33:35:WU01:FS02:Started FahCore on PID 3912
08:33:35:WU01:FS02:Core PID:2612
08:33:35:WU01:FS02:FahCore 0xa3 started
08:33:36:WU01:FS02:0xa3:
08:33:36:WU01:FS02:0xa3:*------------------------------*
08:33:36:WU01:FS02:0xa3:Folding@Home Gromacs SMP Core
08:33:36:WU01:FS02:0xa3:Version 2.27 (Dec. 15, 2010)
08:33:36:WU01:FS02:0xa3:
08:33:36:WU01:FS02:0xa3:Preparing to commence simulation
08:33:36:WU01:FS02:0xa3:- Looking at optimizations...
08:33:36:WU01:FS02:0xa3:- Created dyn
08:33:36:WU01:FS02:0xa3:- Files status OK
08:33:36:WU01:FS02:0xa3:- Expanded 3808809 -> 4136808 (decompressed 108.6 percent)
08:33:36:WU01:FS02:0xa3:Called DecompressByteArray: compressed_data_size=3808809 data_size=4136808, decompressed_data_size=4136808 diff=0
08:33:36:WU01:FS02:0xa3:- Digital signature verified
08:33:36:WU01:FS02:0xa3:
08:33:36:WU01:FS02:0xa3:Project: 6098 (Run 5, Clone 87, Gen 365)
08:33:36:WU01:FS02:0xa3:
08:33:36:WU01:FS02:0xa3:Assembly optimizations on if available.
08:33:36:WU01:FS02:0xa3:Entering M.D.
08:33:41:WU00:FS02:Upload 28.71%
08:33:42:WU01:FS02:0xa3:Mapping NT from 6 to 6 
08:33:42:WU01:FS02:0xa3:Completed 0 out of 500000 steps  (0%)
08:33:47:WU00:FS02:Upload 59.16%
08:33:53:WU00:FS02:Upload 90.48%
08:33:56:WU00:FS02:Upload complete
08:33:56:WU00:FS02:Server responded WORK_ACK (400)
08:33:56:WU00:FS02:Final credit estimate, 10089.00 points
08:33:56:WU00:FS02:Cleaning up
08:48:18:WU01:FS02:0xa3:Completed 5000 out of 500000 steps  (1%)
09:02:53:WU01:FS02:0xa3:Completed 10000 out of 500000 steps  (2%)
09:17:28:WU01:FS02:0xa3:Completed 15000 out of 500000 steps  (3%)
09:32:08:WU01:FS02:0xa3:Completed 20000 out of 500000 steps  (4%)
09:46:44:WU01:FS02:0xa3:Completed 25000 out of 500000 steps  (5%)
10:01:20:WU01:FS02:0xa3:Completed 30000 out of 500000 steps  (6%)
******************************* Date: 2013-12-03 *******************************
10:15:56:WU01:FS02:0xa3:Completed 35000 out of 500000 steps  (7%)
10:30:31:WU01:FS02:0xa3:Completed 40000 out of 500000 steps  (8%)
10:45:06:WU01:FS02:0xa3:Completed 45000 out of 500000 steps  (9%)
10:59:51:WU01:FS02:0xa3:Completed 50000 out of 500000 steps  (10%)
11:14:26:WU01:FS02:0xa3:Completed 55000 out of 500000 steps  (11%)
11:29:01:WU01:FS02:0xa3:Completed 60000 out of 500000 steps  (12%)
11:43:35:WU01:FS02:0xa3:Completed 65000 out of 500000 steps  (13%)
11:58:10:WU01:FS02:0xa3:Completed 70000 out of 500000 steps  (14%)
12:12:46:WU01:FS02:0xa3:Completed 75000 out of 500000 steps  (15%)
12:27:21:WU01:FS02:0xa3:Completed 80000 out of 500000 steps  (16%)
12:41:57:WU01:FS02:0xa3:Completed 85000 out of 500000 steps  (17%)
12:56:32:WU01:FS02:0xa3:Completed 90000 out of 500000 steps  (18%)
13:11:07:WU01:FS02:0xa3:Completed 95000 out of 500000 steps  (19%)
13:25:42:WU01:FS02:0xa3:Completed 100000 out of 500000 steps  (20%)
13:40:18:WU01:FS02:0xa3:Completed 105000 out of 500000 steps  (21%)
13:54:55:WU01:FS02:0xa3:Completed 110000 out of 500000 steps  (22%)
14:09:31:WU01:FS02:0xa3:Completed 115000 out of 500000 steps  (23%)
14:24:06:WU01:FS02:0xa3:Completed 120000 out of 500000 steps  (24%)
14:38:40:WU01:FS02:0xa3:Completed 125000 out of 500000 steps  (25%)
14:53:16:WU01:FS02:0xa3:Completed 130000 out of 500000 steps  (26%)
15:07:52:WU01:FS02:0xa3:Completed 135000 out of 500000 steps  (27%)
15:22:27:WU01:FS02:0xa3:Completed 140000 out of 500000 steps  (28%)
15:34:35:1522:127.0.0.1:New Web connection
15:34:37:WARNING:Exception: 1532:127.0.0.1: Send error: 10053: An established connection was aborted by the software in your host machine.
15:34:38:1535:127.0.0.1:New Web connection
15:34:39:WARNING:1541:127.0.0.1:500 HTTP INTERNAL SERVER ERROR /api/updates?sid: Session id= does not exist
15:34:39:ERROR:Exception in WebHandler: Session id= does not exist
15:34:40:WARNING:1540:127.0.0.1:500 HTTP INTERNAL SERVER ERROR /api/updates?sid: Session id= does not exist
15:34:40:ERROR:Exception in WebHandler: Session id= does not exist
15:34:40:WARNING:Exception: 1533:127.0.0.1: Send error: 10053: An established connection was aborted by the software in your host machine.
15:34:40:WARNING:1543:127.0.0.1:500 HTTP INTERNAL SERVER ERROR /api/updates?sid: Session id= does not exist
15:34:40:ERROR:Exception in WebHandler: Session id= does not exist
15:34:41:WARNING:1544:127.0.0.1:500 HTTP INTERNAL SERVER ERROR /api/updates?sid: Session id= does not exist
15:34:41:ERROR:Exception in WebHandler: Session id= does not exist
15:34:41:WARNING:1545:127.0.0.1:500 HTTP INTERNAL SERVER ERROR /api/updates?sid: Session id= does not exist
15:34:41:ERROR:Exception in WebHandler: Session id= does not exist
15:34:45:WARNING:Exception: 1542:127.0.0.1: Send error: 10053: An established connection was aborted by the software in your host machine.
15:34:52:1549:127.0.0.1:New Web connection
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: R9 280X stuck at 99.99%

Post by bruce »

From your log, it looks like it actually got stuck at 17%, in spite of what Web Control was telling you.
07:16:18:WU02:FS00:0x17:Completed 425000 out of 2500000 steps (17%)

Most likely your GPU is overclocked too much and has become unstable. FAH does an excellent job of using all of the resources of powerful GPUs, and often there's no room left for overclocking.

When this happens, if you restart your system, the WU may resume processing from 16% or 17%, but there's no guarantee it won't hang again, since the hardware is marginally unstable.
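To see where a slot really stopped, independent of what Web Control displays, one option is to scan the log for the last "Completed ... steps" line per work unit and slot. The following is only a minimal sketch: the function name `last_progress` and the regular expression are illustrative assumptions based on the line format visible in the log above, not part of the FAH client.

```python
import re

# Matches progress lines in the format seen in the posted log, e.g.
# 07:16:18:WU02:FS00:0x17:Completed 425000 out of 2500000 steps (17%)
PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):0x\w+:"
    r"Completed (\d+) out of (\d+) steps\s+\((\d+)%\)"
)

def last_progress(lines):
    """Return {(wu, slot): (timestamp, percent)} for the last progress line seen.

    Hypothetical helper for illustration only.
    """
    latest = {}
    for line in lines:
        m = PROGRESS.match(line)
        if m:
            ts, wu, slot, _done, _total, pct = m.groups()
            latest[(wu, slot)] = (ts, int(pct))
    return latest

# A few lines copied from the log in the first post.
sample = [
    "07:12:30:WU02:FS00:0x17:Completed 400000 out of 2500000 steps (16%)",
    "07:14:32:WU00:FS02:0xa3:Completed 475000 out of 500000 steps  (95%)",
    "07:16:18:WU02:FS00:0x17:Completed 425000 out of 2500000 steps (17%)",
    "08:33:13:WU00:FS02:0xa3:Completed 500000 out of 500000 steps  (100%)",
]
result = last_progress(sample)
for key, (ts, pct) in sorted(result.items()):
    print(key, ts, f"{pct}%")
```

If the GPU slot's last entry is hours older than the CPU slot's, as it is here (07:16:18 versus 08:33:13 and later), the GPU core has most likely stopped making progress.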
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: R9 280X stuck at 99.99%

Post by PantherX »

Project: 8900 (Run 128, Clone 7, Gen 11) fails at 14%
06:07:11:WU01:FS00:0x17:Completed 350000 out of 2500000 steps (14%)

Project: 8900 (Run 874, Clone 1, Gen 55) stuck at 17% as bruce pointed out.

If your GPU is overclocked, it might now be unstable due to dust build-up and/or newer drivers. I suggest lowering the overclock, including any factory overclock, and seeing whether that eliminates the issue.
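A rough way to check for instability in a saved log is to count the "Bad State detected" lines per work unit and note which units hit "Max number of retries reached". The sketch below assumes the message wording seen in the posted log; `bad_state_summary` is a hypothetical helper name, not a client feature.

```python
import re
from collections import Counter

# Patterns follow the message text seen in the posted log (an assumption
# about the format; other core versions may word these lines differently).
PREFIX = r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):0x\w+:"
BAD_STATE = re.compile(PREFIX + r"Bad State detected")
ABORT = re.compile(PREFIX + r"Max number of retries reached")

def bad_state_summary(lines):
    """Count bad-state events per (wu, slot) and collect units that aborted."""
    counts = Counter()
    aborted = set()
    for line in lines:
        m = BAD_STATE.match(line)
        if m:
            counts[m.groups()] += 1
            continue
        m = ABORT.match(line)
        if m:
            aborted.add(m.groups())
    return counts, aborted

# Lines copied from the log in the first post.
sample = [
    "04:51:22:WU01:FS00:0x17:Bad State detected... attempting to resume from last good checkpoint",
    "05:33:47:WU01:FS00:0x17:Bad State detected... attempting to resume from last good checkpoint",
    "06:09:31:WU01:FS00:0x17:Bad State detected... attempting to resume from last good checkpoint",
    "06:09:31:WU01:FS00:0x17:Max number of retries reached. Aborting.",
]
counts, aborted = bad_state_summary(sample)
print(dict(counts), aborted)
```

Repeated bad states on the GPU slot while the CPU slot runs cleanly point at the GPU or its clocks rather than at the client.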
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: R9 280X stuck at 99.99%

Post by Joe_H »

Also check the Windows logs for a GPU driver crash and reset that coincides with the WUs reaching 14% and 17%. The log timestamps are in UTC, so you will need to apply the appropriate time offset to find the matching entries in the Windows logs. A driver crash and reset produces the scenario you described: the WU stops progressing in the log while the display in FAHControl or WebControl climbs to 99.99%.
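As a small illustration of that conversion, the sketch below shifts a UTC log time to local time. It takes the date from the "Log Started 2013-12-03" line and the "UTC offset: -5" value from the posted log; the function name `log_time_to_local` is made up for this example, and a fixed offset ignores any daylight-saving change.

```python
from datetime import datetime, timedelta, timezone

def log_time_to_local(log_date, log_time, utc_offset_hours):
    """Convert a UTC log timestamp to local time using a fixed hour offset.

    log_date: 'YYYY-MM-DD' taken from the 'Log Started' or 'Date' line.
    log_time: 'HH:MM:SS' prefix of a log entry.
    utc_offset_hours: value from the log's 'UTC offset' line.
    """
    utc = datetime.strptime(
        f"{log_date} {log_time}", "%Y-%m-%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return utc.astimezone(local_tz)

# Last GPU progress line in the posted log: 07:16:18 UTC, offset -5.
stall = log_time_to_local("2013-12-03", "07:16:18", -5)
print(stall.strftime("%Y-%m-%d %H:%M:%S"))  # 2013-12-03 02:16:18
```

With the local time in hand, look in the Windows Event Viewer around 02:16 on 3 December for display driver events.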
Farwalker
Posts: 23
Joined: Thu Sep 26, 2013 5:54 pm

Re: R9 280X stuck at 99.99%

Post by Farwalker »

ty
No dust issue.
Been overclocked for a couple of weeks and stable until the last three days.
no driver crashes.
no driver changes.
And it is happening on two separate computers.
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: R9 280X stuck at 99.99%

Post by 7im »

No Windows Event Logs for around that time showing any problem?

Also, regarding the OC being stable: past performance is no guarantee of future stability. FahCore_17 is very new and is updated quite frequently. You are actually running an older version (v49). Don't worry, it's not a big issue; the newer version (v52) hasn't been required yet. However, as improvements are made, demand on resources may increase, causing a once-stable system to become unstable. Also, the mix of work units is constantly changing, and even drivers can change performance. It's always good practice to keep an eye on things, but don't assume any overclock setting will stay stable forever.

As a simple test, back off the overclock by 10% and see if you continue to have GPU slots failing.
Farwalker
Posts: 23
Joined: Thu Sep 26, 2013 5:54 pm

Re: R9 280X stuck at 99.99%

Post by Farwalker »

It seems the problem was that my GPUs can fold in overclocked mode, but cannot properly communicate with the folding server through my client (or something like that).

I just set my R9 280X's to default and they work at a core speed of 1100MHz.
For about two weeks they worked at an overclock speed of 1200MHz.

Thank you for your advice. I am back folding on the two 280X's.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: R9 280X stuck at 99.99%

Post by bruce »

Farwalker wrote: It seems the problem was that my GPUs can fold in overclocked mode, but cannot properly communicate with the folding server through my client (or something like that).
It's "something like that."

As I pointed out earlier, at least one WU stopped processing at 17%, most likely due to an overclocking error. In other cases, overclocking can cause errors that are not otherwise detected on your machine. Results containing those hidden errors are rejected by the server rather than integrated into the scientific results. It's not a communication error with the server; it's a rejection of bad results during validation checking.

As has been said elsewhere, if an overclocked GPU error results in a single pixel that's the wrong shade of blue, nobody will notice. If the same type of error occurs during folding, the scientific value of the results is lost and the data should be rejected.