Peculiar problem related to work progressing and driver reset

Postby leonhard » Sat Oct 06, 2012 1:09 pm

I run FAH on a Windows 7 64-bit box with an i7-920 and an HD 7970.
The problem is:
when the video driver resets, FAH's mechanism for calculating progress stops working correctly.
For example, suppose the driver resets when the core is at 52% of a work unit. After the reset, the GPU load drops to 0%, but the core's progress keeps advancing normally, while the log never moves past that point. Eventually the core reaches 99%, finds something wrong, and rolls back to 0%.

Yes, when I notice that the GPU has been reset and the load is 0%, I can pause FAH and resume it, and the GPU load goes back to 99%. But the result is the same as before: the core reaches 99%, finds something wrong, returns to 0%, and restarts.

My point is: if the core's progress and the checkpoint are out of sync, there should be a re-synchronization step that resumes processing from a correct point instead of wasting time.
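The re-synchronization proposed above could look roughly like the sketch below: instead of discarding the whole work unit, fall back to the most recent checkpoint boundary and continue from there. This is a hypothetical model, not how FahCore actually works; `last_checkpoint_step` and `resync_progress` are illustrative names. It assumes checkpoints are written every fixed number of steps starting at step 0, and it borrows numbers from the log posted further down (50000000 total steps, "Setting checkpoint frequency: 500000"), treating that frequency as a step count, which is itself an assumption.

```python
def last_checkpoint_step(current_step: int, checkpoint_interval: int) -> int:
    """Most recent checkpoint boundary at or below current_step.

    Assumes (hypothetically) that a checkpoint is written every
    `checkpoint_interval` steps, starting from step 0.
    """
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    return (current_step // checkpoint_interval) * checkpoint_interval


def resync_progress(current_step: int, total_steps: int, checkpoint_interval: int) -> float:
    """Percent complete after rolling back to the last checkpoint boundary instead of to 0."""
    return 100.0 * last_checkpoint_step(current_step, checkpoint_interval) / total_steps


# Numbers taken from the log later in this thread
total_steps = 50_000_000
checkpoint_interval = 500_000
current_step = 26_000_001
print(last_checkpoint_step(current_step, checkpoint_interval))
print(resync_progress(current_step, total_steps, checkpoint_interval))
```

Rolling back to the last boundary costs at most one checkpoint interval of work rather than the whole unit; what this sketch ignores is the hard part in practice, namely knowing whether that checkpoint was written before or after the reset corrupted anything.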

Re: peculiar problem related to work progressing and driver

Postby bollix47 » Sat Oct 06, 2012 1:45 pm

Welcome to the folding support forum, leonhard.

If you could copy/paste the v7 log, it may be helpful. Click the refresh button above the log (available in Advanced or Expert view), then scroll up to the top and copy the system info and config. Paste that, plus the section where the driver resets, between code tags (available in the Full Editor) in a post here.

Also, what version of graphics drivers are you using?
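If scrolling through the GUI is awkward, the same system info and config header can be pulled out of the log file with a small script. A minimal sketch, assuming the v7 log layout shown later in this thread, where the header ends with a line containing `</config>`; `extract_header` is a hypothetical helper, not part of FAHClient.

```python
def extract_header(log_text: str) -> str:
    """Return log lines from the start through the first line ending in '</config>'.

    If no such line is found, all lines are returned.
    """
    header = []
    for line in log_text.splitlines():
        header.append(line)
        if line.rstrip().endswith("</config>"):
            break  # stop after the config dump; the rest is runtime output
    return "\n".join(header)


# Abbreviated sample in the same format as a v7 log.txt
sample = "\n".join([
    "*********************** Log Started 2012-10-06T13:48:20Z ***********************",
    "13:48:20:      Version: 7.1.52",
    "13:48:20:<config>",
    "13:48:20:  <threads v='4'/>",
    "13:48:20:</config>",
    "13:48:20:Trying to access database...",
])
print(extract_header(sample))
```

On the machine in this thread, the log is `log.txt` in the client's working directory (C:/Users/Administrator/AppData/Roaming/FAHClient, per the `CWD` and `<log v='log.txt'/>` lines), so passing that file's contents, read with `errors="replace"`, to `extract_header` would print the part to paste between code tags.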

Re: peculiar problem related to work progressing and driver

Postby leonhard » Sat Oct 06, 2012 2:50 pm

Driver version: Catalyst 12.8

The driver reset is only the initial trigger for this problem. After a reset, even quitting the FAH application or restarting the system does not help.
The log is below, although it doesn't seem to show anything useful.

Code:
*********************** Log Started 2012-10-06T13:48:20Z ***********************
13:48:20:************************* Folding@home Client *************************
13:48:20:      Website: http://folding.stanford.edu/
13:48:20:    Copyright: (c) 2009-2012 Stanford University
13:48:20:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:48:20:         Args: --lifeline 4648 --command-port=36330
13:48:20:       Config: C:/Users/Administrator/AppData/Roaming/FAHClient/config.xml
13:48:20:******************************** Build ********************************
13:48:20:      Version: 7.1.52
13:48:20:         Date: Mar 20 2012
13:48:20:         Time: 19:37:42
13:48:20:      SVN Rev: 3515
13:48:20:       Branch: fah/trunk/client
13:48:20:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
13:48:20:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
13:48:20:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT
13:48:20:     Platform: win32 XP
13:48:20:         Bits: 32
13:48:20:         Mode: Release
13:48:20:******************************* System ********************************
13:48:20:          CPU: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
13:48:20:       CPU ID: GenuineIntel Family 6 Model 26 Stepping 5
13:48:20:         CPUs: 8
13:48:20:       Memory: 6.00GiB
13:48:20:  Free Memory: 3.61GiB
13:48:20:      Threads: WINDOWS_THREADS
13:48:20:   On Battery: false
13:48:20:   UTC offset: 8
13:48:20:          PID: 2340
13:48:20:          CWD: C:/Users/Administrator/AppData/Roaming/FAHClient
13:48:20:           OS: Windows 7 Ultimate
13:48:20:      OS Arch: AMD64
13:48:20:         GPUs: 2
13:48:20:        GPU 0: ATI:5 Tahiti XT [Radeon HD 7970]
13:48:20:        GPU 1: UNSUPPORTED: Rage XL (Intel Corporation)
13:48:20:         CUDA: Not detected
13:48:20:Win32 Service: false
13:48:20:***********************************************************************
13:48:20:<config>
13:48:20:  <service-description v='Folding@home Client'/>
13:48:20:  <service-restart v='true'/>
13:48:20:  <service-restart-delay v='5000'/>
13:48:20:
13:48:20:  <!-- Client Control -->
13:48:20:  <cycle-rate v='4'/>
13:48:20:  <cycles v='-1'/>
13:48:20:  <data-directory v='.'/>
13:48:20:  <disable-project-lookup v='false'/>
13:48:20:  <exec-directory v='C:\Program Files (x86)\FAHClient'/>
13:48:20:  <exit-when-done v='false'/>
13:48:20:  <threads v='4'/>
13:48:20:
13:48:20:  <!-- Configuration -->
13:48:20:  <config-rotate v='true'/>
13:48:20:  <config-rotate-dir v='configs'/>
13:48:20:  <config-rotate-max v='16'/>
13:48:20:
13:48:20:  <!-- Debugging -->
13:48:20:  <assignment-servers>
13:48:20:    assign3.stanford.edu:8080 assign4.stanford.edu:80
13:48:20:  </assignment-servers>
13:48:20:  <capture-directory v='capture'/>
13:48:20:  <capture-sockets v='false'/>
13:48:20:  <debug-sockets v='false'/>
13:48:20:  <exception-locations v='true'/>
13:48:20:  <gpu-assignment-servers>
13:48:20:    assign-GPU.stanford.edu:80 assign-GPU.stanford.edu:8080
13:48:20:  </gpu-assignment-servers>
13:48:20:  <stack-traces v='false'/>
13:48:20:
13:48:20:  <!-- Error Handling -->
13:48:20:  <max-slot-errors v='5'/>
13:48:20:  <max-unit-errors v='5'/>
13:48:20:
13:48:20:  <!-- FahCore Control -->
13:48:20:  <checkpoint v='3'/>
13:48:20:  <core-dir v='cores'/>
13:48:20:  <core-priority v='idle'/>
13:48:20:  <cpu-affinity v='false'/>
13:48:20:  <cpu-usage v='100'/>
13:48:20:  <no-assembly v='true'/>
13:48:20:
13:48:20:  <!-- Folding Slot Configuration -->
13:48:20:  <client-subtype v='STDCLI'/>
13:48:20:  <client-type v='normal'/>
13:48:20:  <cpu-species v='X86_PENTIUM_II'/>
13:48:20:  <cpu-type v='AMD64'/>
13:48:20:  <cpus v='-1'/>
13:48:20:  <cuda-index v='0'/>
13:48:20:  <gpu v='true'/>
13:48:20:  <gpu-usage v='100'/>
13:48:20:  <max-packet-size v='normal'/>
13:48:20:  <opencl-index v='0'/>
13:48:20:  <os-species v='UNKNOWN'/>
13:48:20:  <os-type v='WIN32'/>
13:48:20:  <project-key v='0'/>
13:48:20:  <smp v='false'/>
13:48:20:
13:48:20:  <!-- Logging -->
13:48:20:  <log v='log.txt'/>
13:48:20:  <log-color v='false'/>
13:48:20:  <log-crlf v='true'/>
13:48:20:  <log-date v='false'/>
13:48:20:  <log-date-periodically v='21600'/>
13:48:20:  <log-debug v='true'/>
13:48:20:  <log-domain v='false'/>
13:48:20:  <log-header v='true'/>
13:48:20:  <log-level v='true'/>
13:48:20:  <log-no-info-header v='true'/>
13:48:20:  <log-redirect v='false'/>
13:48:20:  <log-rotate v='true'/>
13:48:20:  <log-rotate-dir v='logs'/>
13:48:20:  <log-rotate-max v='16'/>
13:48:20:  <log-short-level v='false'/>
13:48:20:  <log-simple-domains v='true'/>
13:48:20:  <log-thread-id v='false'/>
13:48:20:  <log-thread-prefix v='true'/>
13:48:20:  <log-time v='true'/>
13:48:20:  <log-to-screen v='true'/>
13:48:20:  <log-truncate v='false'/>
13:48:20:  <verbosity v='5'/>
13:48:20:
13:48:20:  <!-- Network -->
13:48:20:  <proxy v=':8080'/>
13:48:20:  <proxy-enable v='false'/>
13:48:20:  <proxy-pass v=''/>
13:48:20:  <proxy-user v=''/>
13:48:20:
13:48:20:  <!-- Process Control -->
13:48:20:  <child v='false'/>
13:48:20:  <daemon v='false'/>
13:48:20:  <pid v='false'/>
13:48:20:  <pid-file v='Folding@home Client.pid'/>
13:48:20:  <respawn v='false'/>
13:48:20:  <service v='false'/>
13:48:20:
13:48:20:  <!-- Remote Command Server -->
13:48:20:  <command-address v='0.0.0.0'/>
13:48:20:  <command-allow v='127.0.0.1'/>
13:48:20:  <command-allow-no-pass v='127.0.0.1'/>
13:48:20:  <command-deny v='0.0.0.0/0'/>
13:48:20:  <command-deny-no-pass v='0.0.0.0/0'/>
13:48:20:  <command-port v='36330'/>
13:48:20:
13:48:20:  <!-- Slot Control -->
13:48:20:  <max-shutdown-wait v='60'/>
13:48:20:  <pause-on-battery v='false'/>
13:48:20:  <pause-on-start v='false'/>
13:48:20:
13:48:20:  <!-- User Information -->
13:48:20:  <machine-id v='0'/>
13:48:20:  <passkey v='********************************'/>
13:48:20:  <team v='0'/>
13:48:20:  <user v='leonhard'/>
13:48:20:
13:48:20:  <!-- Work Unit Control -->
13:48:20:  <dump-after-deadline v='true'/>
13:48:20:  <max-queue v='16'/>
13:48:20:  <max-units v='0'/>
13:48:20:  <next-unit-percentage v='99'/>
13:48:20:
13:48:20:  <!-- Folding Slots -->
13:48:20:  <slot id='0' type='GPU'/>
13:48:20:</config>
13:48:20:Trying to access database...
13:48:20:Successfully acquired database lock
13:48:20:Enabled folding slot 00: READY gpu:0:"Tahiti XT [Radeon HD 7970]"
13:48:20:Started thread 3 on PID 2340
13:48:20:Started thread 1 on PID 2340
13:48:20:Started thread 4 on PID 2340
13:48:20:Started thread 6 on PID 2340
13:48:20:WU00:FS00:Starting
13:48:20:Started thread 5 on PID 2340
13:48:20:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Administrator/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_16.fah/FahCore_16.exe -dir 00 -suffix 01 -version 701 -lifeline 2340 -checkpoint 3 -noassembly -gpu 0
13:48:20:WU00:FS00:Started FahCore on PID 2260
13:48:20:Started thread 7 on PID 2340
13:48:20:WU00:FS00:Core PID:3740
13:48:20:WU00:FS00:FahCore 0x16 started
13:48:21:WU00:FS00:0x16:
13:48:21:WU00:FS00:0x16:*------------------------------*
13:48:21:WU00:FS00:0x16:Folding@Home GPU Core
13:48:21:WU00:FS00:0x16:Version 2.11 (Thu Dec 9 15:00:14 PST 2010)
13:48:21:WU00:FS00:0x16:
13:48:21:WU00:FS00:0x16:Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.30729.01 for 80x86
13:48:21:WU00:FS00:0x16:Build host: user-f6d030f24f
13:48:21:WU00:FS00:0x16:Board Type: AMD/OpenCL
13:48:21:WU00:FS00:0x16:Core      : x=16
13:48:21:WU00:FS00:0x16: Window's signal control handler registered.
13:48:21:WU00:FS00:0x16:Preparing to commence simulation
13:48:21:WU00:FS00:0x16:- User disabled assembly optimizations.
13:48:21:WU00:FS00:0x16:- Files status OK
13:48:21:WU00:FS00:0x16:sizeof(CORE_PACKET_HDR) = 512 file=<>
13:48:21:WU00:FS00:0x16:- Expanded 45019 -> 171163 (decompressed 380.2 percent)
13:48:21:WU00:FS00:0x16:Called DecompressByteArray: compressed_data_size=45019 data_size=171163, decompressed_data_size=171163 diff=0
13:48:21:WU00:FS00:0x16:- Digital signature verified
13:48:21:WU00:FS00:0x16:
13:48:21:WU00:FS00:0x16:Project: 11293 (Run 30, Clone 241, Gen 31)
13:48:21:WU00:FS00:0x16:
13:48:21:WU00:FS00:0x16:Entering M.D.
13:48:22:WU00:FS00:0x16:Will resume from checkpoint file 00/wudata_01.ckp
13:48:22:WU00:FS00:0x16:Tpr hash 00/wudata_01.tpr:  3985685935 1755468968 2956476451 526209289 1066074825
13:48:22:WU00:FS00:0x16:Working on ALZHEIMER DISEASE AMYLOID
13:48:22:WU00:FS00:0x16:Client config unavailable.
13:48:23:WU00:FS00:0x16:Starting GUI Server
13:48:23:Started thread 8 on PID 2340
13:48:23:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
13:48:25:WU00:FS00:0x16:Resuming from checkpoint
13:48:25:WU00:FS00:0x16:fcCheckPointResume: retreived and current tpr file hash:
13:48:25:WU00:FS00:0x16:   0   3985685935   3985685935
13:48:25:WU00:FS00:0x16:   1   1755468968   1755468968
13:48:25:WU00:FS00:0x16:   2   2956476451   2956476451
13:48:25:WU00:FS00:0x16:   3    526209289    526209289
13:48:25:WU00:FS00:0x16:   4   1066074825   1066074825
13:48:25:WU00:FS00:0x16:fcCheckPointResume: file hashes same.
13:48:25:WU00:FS00:0x16:fcCheckPointResume: state restored.
13:48:25:WU00:FS00:0x16:fcCheckPointResume: name 00/wudata_01.log Verified 00/wudata_01.log
13:48:25:WU00:FS00:0x16:fcCheckPointResume: name 00/wudata_01.trr Verified 00/wudata_01.trr
13:48:25:WU00:FS00:0x16:fcCheckPointResume: name 00/wudata_01.xtc Verified 00/wudata_01.xtc
13:48:25:WU00:FS00:0x16:fcCheckPointResume: name 00/wudata_01.edr Verified 00/wudata_01.edr
13:48:25:WU00:FS00:0x16:fcCheckPointResume: state restored 2
13:48:25:WU00:FS00:0x16:Resumed from checkpoint
13:48:25:WU00:FS00:0x16:Setting checkpoint frequency: 500000
13:48:25:WU00:FS00:0x16:Completed  26000001 out of 50000000 steps (52%).


At the time of this log, the progress shown on the Status panel was 55.70%, even though the core reports 52%. Interesting!
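The gap between the two numbers can be measured by reading the core's own progress line. A minimal sketch, assuming the `Completed N out of M steps` wording shown in the log above; `core_progress` is a hypothetical helper, and the 55.70% figure is simply the Status-panel value quoted in this post.

```python
import re
from typing import Optional

# Matches core log lines such as
# "13:48:25:WU00:FS00:0x16:Completed  26000001 out of 50000000 steps (52%)."
PROGRESS_RE = re.compile(r"Completed\s+(\d+)\s+out of\s+(\d+)\s+steps")


def core_progress(line: str) -> Optional[float]:
    """Return the percent complete reported by the core, or None if the line does not match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    done, total = int(match.group(1)), int(match.group(2))
    return 100.0 * done / total


line = "13:48:25:WU00:FS00:0x16:Completed  26000001 out of 50000000 steps (52%)."
core = core_progress(line)
panel = 55.70  # Status-panel value quoted in the post
print(f"core: {core:.2f}%  panel: {panel:.2f}%  gap: {panel - core:.2f} points")
```

A gap of several points between the core's step count and the panel suggests the displayed figure is not coming from the same checkpointed state the core resumed from, which fits the symptom described at the start of the thread.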

Re: peculiar problem related to work progressing and driver

Postby leonhard » Sat Oct 06, 2012 3:01 pm

Sometimes it returns to normal after several percentage points, but often it doesn't recover until 99%.

Re: peculiar problem related to work progressing and driver

Postby 7im » Sat Oct 06, 2012 5:59 pm

What's causing the reset?

Re: peculiar problem related to work progressing and driver

Postby P5-133XL » Sat Oct 06, 2012 7:36 pm

Typical causes of resets and their solutions: if you play games while folding, please pause folding before starting the game and unpause it when you are done. If you are overclocking (OC'ing), please lower the overclock.

Video resets are not good for anything. They indicate a basic problem detected by the driver. Typically your clocks get reset to a slow rate when one happens, so both gaming and folding run much slower. There is always a chance that whatever caused the reset (or the reset itself) will corrupt the WU. You really want to avoid these.

Re: peculiar problem related to work progressing and driver

Postby bruce » Sat Oct 06, 2012 9:21 pm

7im wrote: What's causing the reset?


When your CPU drivers reset, the OS detects it and (at least in Windows) you'll get a BSOD. If there's a checkpoint that can be recovered when you restart, work will resume from that point. Otherwise it will restart the WU from frame 0.
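The restart behaviour described above can be modelled as a simple decision: resume from a recoverable checkpoint, otherwise start over at 0. This sketch is a simplified assumption, not FahCore's actual code; `Checkpoint` and `resume_step` are hypothetical names. The hash check mirrors the `fcCheckPointResume: file hashes same.` lines in leonhard's log, where the hash words stored with the checkpoint are compared against the current `.tpr` file before resuming.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Checkpoint:
    step: int                # step count saved in the checkpoint
    tpr_hash: Sequence[int]  # input-file hash words recorded with the checkpoint


def resume_step(checkpoint: Optional[Checkpoint], current_tpr_hash: Sequence[int]) -> int:
    """Step to resume from: the checkpoint's step if it is usable, otherwise 0.

    A checkpoint counts as usable only if it exists and its recorded hash
    matches the hash of the current input file.
    """
    if checkpoint is None:
        return 0  # nothing to recover: restart the WU from the beginning
    if list(checkpoint.tpr_hash) != list(current_tpr_hash):
        return 0  # checkpoint was written for different input: do not trust it
    return checkpoint.step


# Hash words copied from the "Tpr hash" line in leonhard's log
tpr = [3985685935, 1755468968, 2956476451, 526209289, 1066074825]
print(resume_step(Checkpoint(step=26_000_000, tpr_hash=tpr), tpr))
print(resume_step(None, tpr))
```

Note that in leonhard's log the hashes did match and the state was restored, so a check like this only guards against mismatched input files; it cannot detect a checkpoint whose simulation state was already damaged by the GPU reset.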

When your GPU drivers reset, there's no real system crash, although there probably should be. Error recovery for GPUs hasn't had as many years of development as error recovery for CPUs.

The key here is to have a stable system that doesn't force a reset. I don't know of any way to determine whether the cause is (A) a problem with the game, (B) a problem with the FahCore, or (C) an inability of the drivers to deal with the game and the FahCore at the same time. Until the developers can identify the problem, they can't fix it, and the best thing you can do is avoid it. I've seen reports from other donors who have chosen not to run certain games and FAH at the same time. Still others have managed to avoid the problem by focusing on possible hardware issues like overheating, overclocking, or underpowered systems. (Obviously those issues might have already been eliminated in your case.)

