GPU Clients Hang

Postby nfettinger » Thu Feb 02, 2012 3:15 am

I have a multi GPU Setup:
Core2Quad
1xNvidia GTX 470
2xNvidia 8600 GT (for a tri-monitor support).
There are four folding slots, one per GPU and one SMP.
Everything is identified correctly and assigned to the right slot.

My problem is that the only client that works is the one for the second GPU; the others hang, with no errors but no activity either.
I have tried using only one GPU slot, but it only works when set to use the second GPU (device id 1); otherwise it hangs.
I would like to be able to fold on all three cards at once.

Here is what I mean:
Code:
*********************** Log Started 2012-02-02T02:41:37 ************************
02:41:37:WU00:FS02:Starting
02:41:37:WU00:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 00 -suffix 01 -version 701 -checkpoint 15 -gpu 2
02:41:37:WU00:FS02:Started FahCore on PID 4792
02:41:37:WU00:FS02:Core PID:3884
02:41:37:WU00:FS02:FahCore 0x15 started
02:41:38:WU00:FS02:0x15:
02:41:38:WU00:FS02:0x15:*------------------------------*
02:41:38:WU00:FS02:0x15:Folding@Home GPU Core
02:41:38:WU00:FS02:0x15:Version                2.20 (Tue Aug 2 12:06:37 PDT 2011)
02:41:38:WU00:FS02:0x15:Build host             SimbiosNvdWin7
02:41:38:WU00:FS02:0x15:Board Type             NVIDIA/CUDA
02:41:38:WU00:FS02:0x15:Core                   15
02:41:38:WU00:FS02:0x15:GPU device id          2
02:41:38:WU00:FS02:0x15:
02:41:38:WU00:FS02:0x15:Window's signal control handler registered.
02:41:38:WU00:FS02:0x15:Preparing to commence simulation
02:41:38:WU00:FS02:0x15:- Ensuring status. Please wait.
02:41:47:WU00:FS02:0x15:- Looking at optimizations...
02:41:47:WU00:FS02:0x15:- Working with standard loops on this execution.
02:41:47:WU00:FS02:0x15:- Previous termination of core was improper.
02:41:47:WU00:FS02:0x15:- Going to use standard loops.
02:41:47:WU00:FS02:0x15:- Files status OK
02:41:47:WU00:FS02:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
02:41:47:WU00:FS02:0x15:- Expanded 44405 -> 167707 (decompressed 377.6 percent)
02:41:47:WU00:FS02:0x15:Called DecompressByteArray: compressed_data_size=44405 data_size=167707, decompressed_data_size=167707 diff=0
02:41:47:WU00:FS02:0x15:- Digital signature verified
02:41:47:WU00:FS02:0x15:
02:41:47:WU00:FS02:0x15:Project: 6802 (Run 15, Clone 47, Gen 800)
02:41:47:WU00:FS02:0x15:
02:41:47:WU00:FS02:0x15:Entering M.D.
02:41:49:WU00:FS02:0x15:Tpr hash 00/wudata_01.tpr:  4005748037 2405796843 320818263 3040721147 2986720935
02:41:49:WU00:FS02:0x15:calling fah_main gpuDeviceId=2
02:41:49:WU00:FS02:0x15:Working on ALZHEIMER'S DISEASE AMYLOID
02:41:49:WU00:FS02:0x15:Client config unavailable.
02:41:49:WU00:FS02:0x15:Starting GUI Server

Once it reaches this point, there is NO GPU activity and no indication of what is going on.
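
For anyone checking their own log for the same symptom, here is a minimal Python sketch (not part of FAHClient) that reports which folding slots have gone quiet. It assumes the `HH:MM:SS:WUnn:FSnn:` line prefix seen in the log above. The function names, the 10-minute limit, and passing the current time as a string are all assumptions made for this example, and midnight rollover is ignored because the log lines carry no date.

```python
import re
from datetime import datetime, timedelta

# Matches slot lines such as "02:41:49:WU00:FS02:0x15:Starting GUI Server".
# Client-level lines without a WU/FS prefix are skipped.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):WU\d+:(FS\d+):(.*)$")


def last_activity(log_text):
    """Return {slot: (time, message)} for the last line logged by each slot."""
    latest = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp, slot, message = match.groups()
            latest[slot] = (datetime.strptime(stamp, "%H:%M:%S"), message)
    return latest


def silent_slots(log_text, now, limit=timedelta(minutes=10)):
    """List slots whose last log line is older than `limit` at time `now`.

    `now` is a "HH:MM:SS" string on the same day as the log lines.
    """
    current = datetime.strptime(now, "%H:%M:%S")
    return sorted(
        slot
        for slot, (stamp, _) in last_activity(log_text).items()
        if current - stamp > limit
    )
```

A slow work unit can legitimately go longer than the limit between progress lines, so treat a hit as a prompt to check GPU load with a monitoring tool rather than as proof of a hang.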
nfettinger
 
Posts: 15
Joined: Wed Nov 02, 2011 5:50 am

Re: GPU Clients Hang

Postby bruce » Thu Feb 02, 2012 3:34 am

Please post the ****System**** section from the beginning of the log along with the config information that follows it.
bruce
Site Admin
 
Posts: 16851
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU Clients Hang

Postby nfettinger » Wed Feb 08, 2012 2:28 am

As I stated before, only GPU slot 1 works; the rest hang. According to GPU-Z and other monitoring tools, the other GPUs are not doing anything.
This is the log from a clean install I just performed.

Code:
*********************** Log Started 2012-02-08T02:08:21 ************************
02:08:21:************************* Folding@home Client *************************
02:08:21:      Website: http://folding.stanford.edu/
02:08:21:    Copyright: (c) 2009-2012 Stanford University
02:08:21:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
02:08:21:         Args: --lifeline 3332 --command-port=36330
02:08:21:       Config: C:/ProgramData/FAHClient/config.xml
02:08:21:******************************** Build ********************************
02:08:21:      Version: 7.1.43
02:08:21:         Date: Jan 2 2012
02:08:21:         Time: 12:33:05
02:08:21:      SVN Rev: 3223
02:08:21:       Branch: fah/trunk/client
02:08:21:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
02:08:21:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
02:08:21:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT
02:08:21:     Platform: win32 XP
02:08:21:         Bits: 32
02:08:21:         Mode: Release
02:08:21:******************************* System ********************************
02:08:21:          CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
02:08:21:       CPU ID: GenuineIntel Family 6 Model 15 Stepping 11
02:08:21:         CPUs: 4
02:08:21:       Memory: 8.00GiB
02:08:21:  Free Memory: 6.07GiB
02:08:21:      Threads: WINDOWS_THREADS
02:08:21:   On Battery: false
02:08:21:   UTC offset: -5
02:08:21:          PID: 4044
02:08:21:          CWD: C:/ProgramData/FAHClient
02:08:21:           OS: Windows 7 Professional
02:08:21:      OS Arch: AMD64
02:08:21:         GPUs: 3
02:08:21:        GPU 0: NVIDIA:1 G84 [GeForce 8600 GT]
02:08:21:        GPU 1: NVIDIA:1 G84 [GeForce 8600 GT]
02:08:21:        GPU 2: FERMI:1 GF100 [GeForce GTX 470]
02:08:21:         CUDA: 2.0
02:08:21:  CUDA Driver: 4010
02:08:21:Win32 Service: false
02:08:21:***********************************************************************
02:08:21:<config>
02:08:21:  <service-description v='Folding@home Client'/>
02:08:21:  <service-restart v='true'/>
02:08:21:  <service-restart-delay v='5000'/>
02:08:21:
02:08:21:  <!-- Client Control -->
02:08:21:  <cycle-rate v='4'/>
02:08:21:  <cycles v='-1'/>
02:08:21:  <data-directory v='.'/>
02:08:21:  <disable-project-lookup v='false'/>
02:08:21:  <exec-directory v='C:\Program Files (x86)\FAHClient'/>
02:08:21:  <exit-when-done v='false'/>
02:08:21:  <threads v='4'/>
02:08:21:
02:08:21:  <!-- Configuration -->
02:08:21:  <config-rotate v='true'/>
02:08:21:  <config-rotate-dir v='configs'/>
02:08:21:  <config-rotate-max v='16'/>
02:08:21:
02:08:21:  <!-- Debugging -->
02:08:21:  <assignment-servers>
02:08:21:    assign3.stanford.edu:8080 assign4.stanford.edu:80
02:08:21:  </assignment-servers>
02:08:21:  <capture-directory v='capture'/>
02:08:21:  <capture-sockets v='false'/>
02:08:21:  <debug-sockets v='false'/>
02:08:21:  <exception-locations v='true'/>
02:08:21:  <gpu-assignment-servers>
02:08:21:    assign-GPU.stanford.edu:80 assign-GPU.stanford.edu:8080
02:08:21:  </gpu-assignment-servers>
02:08:21:  <stack-traces v='false'/>
02:08:21:
02:08:21:  <!-- Error Handling -->
02:08:21:  <max-slot-errors v='5'/>
02:08:21:  <max-unit-errors v='5'/>
02:08:21:
02:08:21:  <!-- FahCore Control -->
02:08:21:  <checkpoint v='30'/>
02:08:21:  <core-dir v='cores'/>
02:08:21:  <core-priority v='idle'/>
02:08:21:  <cpu-affinity v='false'/>
02:08:21:  <cpu-usage v='100'/>
02:08:21:  <no-assembly v='false'/>
02:08:21:
02:08:21:  <!-- Folding Slot Configuration -->
02:08:21:  <client-subtype v='STDCLI'/>
02:08:21:  <client-type v='normal'/>
02:08:21:  <cpu-species v='X86_PENTIUM_II'/>
02:08:21:  <cpu-type v='AMD64'/>
02:08:21:  <cpus v='-1'/>
02:08:21:  <cuda-index v='0'/>
02:08:21:  <gpu v='true'/>
02:08:21:  <gpu-usage v='100'/>
02:08:21:  <max-packet-size v='normal'/>
02:08:21:  <opencl-index v='0'/>
02:08:21:  <os-species v='UNKNOWN'/>
02:08:21:  <os-type v='WIN32'/>
02:08:21:  <project-key v='0'/>
02:08:21:  <smp v='true'/>
02:08:21:
02:08:21:  <!-- Logging -->
02:08:21:  <log v='log.txt'/>
02:08:21:  <log-color v='false'/>
02:08:21:  <log-crlf v='true'/>
02:08:21:  <log-date v='false'/>
02:08:21:  <log-date-periodically v='21600'/>
02:08:21:  <log-debug v='true'/>
02:08:21:  <log-domain v='false'/>
02:08:21:  <log-header v='true'/>
02:08:21:  <log-level v='true'/>
02:08:21:  <log-no-info-header v='true'/>
02:08:21:  <log-redirect v='false'/>
02:08:21:  <log-rotate v='true'/>
02:08:21:  <log-rotate-dir v='logs'/>
02:08:21:  <log-rotate-max v='16'/>
02:08:21:  <log-short-level v='false'/>
02:08:21:  <log-simple-domains v='true'/>
02:08:21:  <log-thread-id v='false'/>
02:08:21:  <log-thread-prefix v='true'/>
02:08:21:  <log-time v='true'/>
02:08:21:  <log-to-screen v='true'/>
02:08:21:  <log-truncate v='false'/>
02:08:21:  <verbosity v='5'/>
02:08:21:
02:08:21:  <!-- Network -->
02:08:21:  <proxy v=':8080'/>
02:08:21:  <proxy-enable v='false'/>
02:08:21:  <proxy-pass v=''/>
02:08:21:  <proxy-user v=''/>
02:08:21:
02:08:21:  <!-- Process Control -->
02:08:21:  <child v='false'/>
02:08:21:  <daemon v='false'/>
02:08:21:  <pid v='false'/>
02:08:21:  <pid-file v='Folding@home Client.pid'/>
02:08:21:  <respawn v='false'/>
02:08:21:  <service v='false'/>
02:08:21:
02:08:21:  <!-- Remote Command Server -->
02:08:21:  <command-address v='0.0.0.0'/>
02:08:21:  <command-allow v='127.0.0.1'/>
02:08:21:  <command-allow-no-pass v='127.0.0.1'/>
02:08:21:  <command-deny v='0.0.0.0/0'/>
02:08:21:  <command-deny-no-pass v='0.0.0.0/0'/>
02:08:21:  <command-port v='36330'/>
02:08:21:
02:08:21:  <!-- Slot Control -->
02:08:21:  <max-shutdown-wait v='60'/>
02:08:21:  <pause-on-battery v='false'/>
02:08:21:  <pause-on-start v='false'/>
02:08:21:
02:08:21:  <!-- User Information -->
02:08:21:  <machine-id v='0'/>
02:08:21:  <passkey v='********************************'/>
02:08:21:  <team v='97103'/>
02:08:21:  <user v='Nfettinger'/>
02:08:21:
02:08:21:  <!-- Work Unit Control -->
02:08:21:  <dump-after-deadline v='true'/>
02:08:21:  <max-queue v='16'/>
02:08:21:  <max-units v='0'/>
02:08:21:  <next-unit-percentage v='99'/>
02:08:21:
02:08:21:  <!-- Folding Slots -->
02:08:21:  <slot id='0' type='GPU'/>
02:08:21:  <slot id='1' type='GPU'/>
02:08:21:  <slot id='2' type='GPU'/>
02:08:21:  <slot id='3' type='SMP'/>
02:08:21:</config>
02:08:22:Trying to access database...
02:08:22:Successfully acquired database lock
02:08:22:Enabled folding slot 00: READY gpu:0:"G84 [GeForce 8600 GT]"
02:08:22:Enabled folding slot 01: READY gpu:1:"G84 [GeForce 8600 GT]"
02:08:22:Enabled folding slot 02: READY gpu:2:"GF100 [GeForce GTX 470]"
02:08:22:Enabled folding slot 03: READY smp:4
02:08:22:Started thread 5 on PID 4044
02:08:22:Started thread 6 on PID 4044
02:08:22:Started thread 1 on PID 4044
02:08:22:WU00:FS00:Starting
02:08:22:Started thread 3 on PID 4044
02:08:22:Started thread 4 on PID 4044
02:08:22:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/G80/Core_11.fah/FahCore_11.exe -dir 00 -suffix 01 -version 701 -checkpoint 30 -gpu 0
02:08:22:WU00:FS00:Started FahCore on PID 824
02:08:22:Started thread 7 on PID 4044
02:08:22:WU00:FS00:Core PID:3560
02:08:22:WU00:FS00:FahCore 0x11 started
02:08:22:WU02:FS02:Starting
02:08:22:WU02:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 02 -suffix 01 -version 701 -checkpoint 30 -gpu 2
02:08:22:WU02:FS02:Started FahCore on PID 3716
02:08:22:Started thread 8 on PID 4044
02:08:22:WU02:FS02:Core PID:3852
02:08:22:WU02:FS02:FahCore 0x15 started
02:08:22:WU01:FS01:Starting
02:08:22:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/G80/Core_11.fah/FahCore_11.exe -dir 01 -suffix 01 -version 701 -checkpoint 30 -gpu 1
02:08:22:WU01:FS01:Started FahCore on PID 3028
02:08:22:Started thread 9 on PID 4044
02:08:22:WU01:FS01:Core PID:640
02:08:22:WU01:FS01:FahCore 0x11 started
02:08:22:WU03:FS03:Downloading project 7809 description
02:08:22:WU03:FS03:Connecting to fah-web.stanford.edu:80
02:08:22:WU00:FS00:0x11:
02:08:22:WU00:FS00:0x11:*------------------------------*
02:08:22:WU00:FS00:0x11:Folding@Home GPU Core
02:08:22:WU00:FS00:0x11:Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
02:08:22:WU00:FS00:0x11:
02:08:22:WU00:FS00:0x11:Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
02:08:22:WU00:FS00:0x11:Build host: amoeba
02:08:22:WU00:FS00:0x11:Board Type: Nvidia
02:08:22:WU00:FS00:0x11:Core      :
02:08:22:WU00:FS00:0x11:Preparing to commence simulation
02:08:22:WU00:FS00:0x11:- Ensuring status. Please wait.
02:08:22:WU02:FS02:0x15:
02:08:22:WU02:FS02:0x15:*------------------------------*
02:08:22:WU02:FS02:0x15:Folding@Home GPU Core
02:08:22:WU02:FS02:0x15:Version                2.20 (Tue Aug 2 12:06:37 PDT 2011)
02:08:22:WU02:FS02:0x15:Build host             SimbiosNvdWin7
02:08:22:WU02:FS02:0x15:Board Type             NVIDIA/CUDA
02:08:22:WU02:FS02:0x15:Core                   15
02:08:22:WU02:FS02:0x15:GPU device id          2
02:08:22:WU02:FS02:0x15:
02:08:22:WU02:FS02:0x15:Window's signal control handler registered.
02:08:22:WU02:FS02:0x15:Preparing to commence simulation
02:08:22:WU02:FS02:0x15:- Looking at optimizations...
02:08:22:WU02:FS02:0x15:- Files status OK
02:08:22:WU02:FS02:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
02:08:22:WU02:FS02:0x15:- Expanded 44721 -> 169787 (decompressed 379.6 percent)
02:08:22:WU02:FS02:0x15:Called DecompressByteArray: compressed_data_size=44721 data_size=169787, decompressed_data_size=169787 diff=0
02:08:22:WU02:FS02:0x15:- Digital signature verified
02:08:22:WU02:FS02:0x15:
02:08:22:WU02:FS02:0x15:Project: 6800 (Run 19838, Clone 0, Gen 542)
02:08:22:WU02:FS02:0x15:
02:08:22:WU02:FS02:0x15:Assembly optimizations on if available.
02:08:22:WU02:FS02:0x15:Entering M.D.
02:08:22:WU01:FS01:0x11:
02:08:22:WU01:FS01:0x11:*------------------------------*
02:08:22:WU03:FS03:Project 7809 description downloaded successfully
02:08:22:WU01:FS01:0x11:Folding@Home GPU Core
02:08:22:WU01:FS01:0x11:Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
02:08:22:WU01:FS01:0x11:
02:08:22:WU01:FS01:0x11:Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
02:08:22:WU01:FS01:0x11:Build host: amoeba
02:08:22:WU03:FS03:Starting
02:08:22:WU01:FS01:0x11:Board Type: Nvidia
02:08:22:WU03:FS03:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 03 -suffix 01 -version 701 -checkpoint 30 -np 4
02:08:22:WU01:FS01:0x11:Core      :
02:08:22:WU01:FS01:0x11:Preparing to commence simulation
02:08:22:WU01:FS01:0x11:- Looking at optimizations...
02:08:22:WU03:FS03:Started FahCore on PID 1724
02:08:22:WU01:FS01:0x11:- Files status OK
02:08:22:Started thread 10 on PID 4044
02:08:22:WU01:FS01:0x11:- Expanded 45397 -> 251112 (decompressed 553.1 percent)
02:08:23:WU01:FS01:0x11:Called DecompressByteArray: compressed_data_size=45397 data_size=251112, decompressed_data_size=251112 diff=0
02:08:23:WU01:FS01:0x11:- Digital signature verified
02:08:23:WU01:FS01:0x11:
02:08:23:WU01:FS01:0x11:Project: 5770 (Run 10, Clone 350, Gen 1097)
02:08:23:WU01:FS01:0x11:
02:08:23:WU01:FS01:0x11:Assembly optimizations on if available.
02:08:23:WU01:FS01:0x11:Entering M.D.
02:08:23:WU03:FS03:Core PID:2020
02:08:23:WU03:FS03:FahCore 0xa4 started
02:08:23:WU03:FS03:0xa4:
02:08:23:WU03:FS03:0xa4:*------------------------------*
02:08:23:WU03:FS03:0xa4:Folding@Home Gromacs GB Core
02:08:23:WU03:FS03:0xa4:Version 2.27 (Dec. 15, 2010)
02:08:23:WU03:FS03:0xa4:
02:08:23:WU03:FS03:0xa4:Preparing to commence simulation
02:08:23:WU03:FS03:0xa4:- Looking at optimizations...
02:08:23:WU03:FS03:0xa4:- Files status OK
02:08:23:WU03:FS03:0xa4:- Expanded 2079195 -> 5386224 (decompressed 259.0 percent)
02:08:23:WU03:FS03:0xa4:Called DecompressByteArray: compressed_data_size=2079195 data_size=5386224, decompressed_data_size=5386224 diff=0
02:08:23:WU03:FS03:0xa4:- Digital signature verified
02:08:23:WU03:FS03:0xa4:
02:08:23:WU03:FS03:0xa4:Project: 7809 (Run 8, Clone 105, Gen 51)
02:08:23:WU03:FS03:0xa4:
02:08:23:WU03:FS03:0xa4:Assembly optimizations on if available.
02:08:23:WU03:FS03:0xa4:Entering M.D.
02:08:24:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
02:08:24:Started thread 11 on PID 4044
02:08:24:WU02:FS02:0x15:Tpr hash 02/wudata_01.tpr:  2625445221 702242864 3019094029 4024015929 1378055471
02:08:24:WU02:FS02:0x15:calling fah_main gpuDeviceId=2
02:08:24:WU02:FS02:0x15:Working on PEPTIDE (1-42)
02:08:24:WU02:FS02:0x15:Client config unavailable.
02:08:24:WU02:FS02:0x15:Starting GUI Server
02:08:28:WU01:FS01:0x11:Will resume from checkpoint file
02:08:28:WU01:FS01:0x11:Tpr hash 01/wudata_01.tpr:  2470919606 880458732 1116749242 732434962 2317364188
02:08:28:WU01:FS01:0x11:
02:08:28:WU01:FS01:0x11:Calling fah_main args: 14 usage=100
02:08:28:WU01:FS01:0x11:
02:08:28:WU01:FS01:0x11:Working on Protein
02:08:29:WU01:FS01:0x11:Client config unavailable.
02:08:29:WU01:FS01:0x11:Resuming from checkpoint
02:08:29:WU01:FS01:0x11:fcCheckPointResume: retreived and current tpr file hash:
02:08:29:WU01:FS01:0x11:   0   2470919606   2470919606
02:08:29:WU01:FS01:0x11:   1    880458732    880458732
02:08:29:WU01:FS01:0x11:   2   1116749242   1116749242
02:08:29:WU01:FS01:0x11:   3    732434962    732434962
02:08:29:WU01:FS01:0x11:   4   2317364188   2317364188
02:08:29:WU01:FS01:0x11:fcCheckPointResume: file hashes same.
02:08:29:WU01:FS01:0x11:fcCheckPointResume: state restored.
02:08:29:WU01:FS01:0x11:Verified 01/wudata_01.log
02:08:29:WU01:FS01:0x11:Verified 01/wudata_01.edr
02:08:29:WU01:FS01:0x11:Verified 01/wudata_01.xtc
02:08:29:WU03:FS03:0xa4:Mapping NT from 4 to 4
02:08:30:WU01:FS01:0x11:Starting GUI Server
02:08:30:WU03:FS03:0xa4:Completed 0 out of 1500000 steps  (0%)
02:08:31:WU00:FS00:0x11:- Looking at optimizations...
02:08:31:WU00:FS00:0x11:- Working with standard loops on this execution.
02:08:31:WU00:FS00:0x11:- Previous termination of core was improper.
02:08:31:WU00:FS00:0x11:- Files status OK
02:08:31:WU00:FS00:0x11:- Expanded 45445 -> 251112 (decompressed 552.5 percent)
02:08:31:WU00:FS00:0x11:Called DecompressByteArray: compressed_data_size=45445 data_size=251112, decompressed_data_size=251112 diff=0
02:08:31:WU00:FS00:0x11:- Digital signature verified
02:08:31:WU00:FS00:0x11:
02:08:31:WU00:FS00:0x11:Project: 5769 (Run 11, Clone 313, Gen 1881)
02:08:31:WU00:FS00:0x11:
02:08:31:WU00:FS00:0x11:Entering M.D.
02:08:37:WU00:FS00:0x11:Tpr hash 00/wudata_01.tpr:  3799447554 4088103662 2889721865 3075314348 610771359
02:08:37:WU00:FS00:0x11:
02:08:37:WU00:FS00:0x11:Calling fah_main args: 14 usage=100
02:08:37:WU00:FS00:0x11:
02:11:45:WU01:FS01:0x11:Completed 1%
02:15:04:WU01:FS01:0x11:Completed 2%
02:18:23:WU01:FS01:0x11:Completed 3%
02:21:41:WU01:FS01:0x11:Completed 4%
02:25:00:WU01:FS01:0x11:Completed 5%
Last edited by nfettinger on Tue Feb 14, 2012 6:51 pm, edited 1 time in total.

Re: GPU Clients Hang

Postby nfettinger » Wed Feb 08, 2012 2:34 am

FYI:
Nvidia Driver versions:
Video 285.62
nForce 73.20
DirectX 10.0
Last edited by nfettinger on Tue Feb 14, 2012 6:51 pm, edited 1 time in total.

Re: GPU Clients Hang

Postby 7im » Wed Feb 08, 2012 5:20 am

Which GPU is in slot 1? Which one is working?
7im
 
Posts: 13308
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: GPU Clients Hang

Postby nfettinger » Thu Feb 09, 2012 9:54 pm

Slot 0: GTX 470
Slot 1: GeForce 8600 (Working)
Slot 2: GeForce 8600
Last edited by nfettinger on Tue Feb 14, 2012 6:51 pm, edited 1 time in total.

Re: GPU Clients Hang

Postby MtM » Fri Feb 10, 2012 1:08 am

nfettinger wrote:I have a multi GPU Setup:
Core2Quad
1xNvidia GTX 470
2xNvidia 8600 GT (for a tri-monitor support).


Probably not related in any way, but you can run 3 monitors on 2 cards (depending on the cards, sometimes even one).

What is your desktop configuration (which GPU drives the main screen)?
MtM
 
Posts: 3233
Joined: Fri Jun 27, 2008 2:20 pm
Location: The Netherlands

Re: GPU Clients Hang

Postby PantherX » Fri Feb 10, 2012 10:22 pm

How did you install the Client? During the installation did you select GPU+SMP or SMP or GPU?
PantherX
Super Moderator
 
Posts: 6272
Joined: Wed Dec 23, 2009 9:33 am

Re: GPU Clients Hang

Postby nfettinger » Sun Feb 12, 2012 4:25 am

PantherX wrote:How did you install the Client? During the installation did you select GPU+SMP or SMP or GPU?

I have tried both ways, then adding GPUs one at a time.
I also tried GPU only, but no configuration seems to work.

The only way I can get either of the other two to run is if it is the only GPU enabled at start-up.
Last edited by nfettinger on Tue Feb 14, 2012 6:52 pm, edited 2 times in total.

Re: GPU Clients Hang

Postby nfettinger » Sun Feb 12, 2012 4:26 am

MtM wrote:
nfettinger wrote:I have a multi GPU Setup:
Core2Quad
1xNvidia GTX 470
2xNvidia 8600 GT (for a tri-monitor support).


Probably not related in any way, but you can run 3 monitors on 2 cards (depending on the cards, sometimes even one).

What is your desktop configuration (which GPU drives the main screen)?

The main screen is run by the GTX470, which is also in the first PCIe slot of the motherboard.
Last edited by nfettinger on Tue Feb 14, 2012 6:52 pm, edited 2 times in total.

Re: GPU Clients Hang

Postby MtM » Sun Feb 12, 2012 1:05 pm

What do you mean by disabled (in Device Manager)? Do you normally disable devices there? What is your (normal) desktop configuration?

Re: GPU Clients Hang

Postby nfettinger » Sun Feb 12, 2012 10:05 pm

MtM wrote:What do you mean by disabled (in Device Manager)? Do you normally disable devices there? What is your (normal) desktop configuration?

The normal setup is:
One screen per GPU, with the main screen run by the GTX470.
I disabled the cards through Device Manager and rebooted each time to make sure there was not a driver issue.
Last edited by nfettinger on Tue Feb 14, 2012 6:52 pm, edited 2 times in total.

Re: GPU Clients Hang

Postby nfettinger » Sun Feb 12, 2012 10:50 pm

Even when I add only one GPU slot and assign it to a card while all three cards are active, it only works on card 2 (ID 1), as stated above.
Last edited by nfettinger on Tue Feb 14, 2012 6:52 pm, edited 1 time in total.

Re: GPU Clients Hang

Postby MtM » Mon Feb 13, 2012 8:58 am

You could try downloading FAHWatch7 (link in my signature), running the installer, and using the desktop shortcut called 'Diagnostics'. Post the results (it checks FAHClient info and configuration, and adds GPGPU enumeration). It might show something interesting or it might not; I'm not sure, but it can't hurt to try. You can uninstall the application when you're finished.

Re: GPU Clients Hang

Postby nfettinger » Tue Feb 14, 2012 6:20 pm

I found the solution:
It turns out my GPUs are detected in the same order as in Device Manager, GPU-Z, etc. However, for some reason they need to be indexed backwards, since the Nvidia drivers / Control Panel list them in the reverse order. So, for anyone else having this problem, check the Nvidia Control Panel as well! The reason this was not caught before is that when I restarted FAH to make sure the configuration changes took, it erased the values I had set and loaded the default ones (-1 for all). After a fifth install, I tried a NON-clean uninstall / install (keeping the data but erasing the program). This may be a beta bug, but I cannot verify this.

In essence, the work that should have been sent to the GTX470 was sent to the 8600GT and vice versa. This did not cause an error, but the clients just hung there. The GPU basically rejected the work and sat idle. In my opinion, this IS a beta error that needs to be fixed. The client (fah_main) should give some sort of feedback or time out when a problem such as this occurs. Perhaps an error is returned to the client that isn't being caught or is being ignored. Slot 1 always worked because its index matched in both orderings.

Here is the configuration I have now:
Slot 0: gpu index 0, opencl-index 2, cuda-index 2 (now on GTX470)
Slot 1: gpu index 1, opencl-index 1, cuda-index 1 (now on 8600GT)
Slot 2: gpu index 2, opencl-index 0, cuda-index 0 (now on 8600GT)
Slot 3: SMP ... (always worked)

To clarify, in case of any confusion:
GTX470: detected on slot 0, gpu/cl index 0 --> really gpu/cl index 2
8600GT: detected on slot 1, gpu/cl index 1 --> really gpu/cl index 1
8600GT: detected on slot 2, gpu/cl index 2 --> really gpu/cl index 0
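
The reversal described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the driver enumerates the cards in exactly the opposite order from the client, which is what this thread reports for this one machine. The function name and the dictionary keys are made up for the example; they are labels, not a claim about how the client stores its options.

```python
def reversed_indices(detected):
    """Pair each detected GPU position with its index in the opposite order.

    `detected` is the list of card names in the order the client shows them.
    """
    count = len(detected)
    return [
        {
            "slot": position,
            "gpu": name,
            "gpu-index": position,
            # Reverse enumeration: first card becomes last, and so on.
            "cuda-index": count - 1 - position,
            "opencl-index": count - 1 - position,
        }
        for position, name in enumerate(detected)
    ]
```

Calling `reversed_indices(["GTX470", "8600GT", "8600GT"])` gives CUDA/OpenCL indices 2, 1, 0 for slots 0, 1, 2. With an odd number of cards the middle one maps to itself, which would explain why slot 1 kept working.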

It should also be noted that adding them in the reverse order (same index numbers across the board) did NOT fix the problem, since it detected the cards wrong. Using the force-GPU switch (--gpu --gpu-species <type>) did not fix this either; looking back, it should have, but I have no idea why it did not. I assume that if all of the GPUs were of the same type / group, it would not have mattered: the core would still work on the "wrong index, but same type" GPU. I should also point out that the client for the GTX470 sometimes (about one time in three) attempted to work on the 8600GT, ran at about half speed, sputtering up and down for 30 seconds or so, before giving up (no error though).

I would like to thank you all for your help, but especially MtM for his suggestion. The errors produced by FAHWatch7 (not so much the diagnostics output as the run-time errors) helped me narrow down the problem and find a way to fix it. I hope this saves someone else a headache, time, or some hair if they run into the same problem I did. For forum-searching purposes, I am using an Nvidia 780i motherboard; this should make it easier for others to find this thread when frustrated.

*My previous posts have been modified to add quotes; this should clarify which answer went to which question.
