MultiGPU problem

yalexey
Posts: 14
Joined: Sun Oct 30, 2016 5:10 pm

MultiGPU problem

Post by yalexey »

After almost every reboot, I see messages like these in the log:

14:51:48:ERROR:FS01:OpenCL device not found for 'opencl-index' = 3 with vendor ID = 0x10de, plese correct this by removing the manually configured 'opencl-index' option.
14:51:48:ERROR:FS03:'opencl-index'=2 is in use by another folding slot but GPU 1 matches this device's PCI bus=4 and PCI slot=0, please correct this by removing any manually configured 'opencl-index' options.
14:51:49:ERROR:WU03:FS03:Failed to start core: OpenCL device matching slot 3 not found
After that, one or two of the video cards are sometimes left without work. I have to remove slots or assign the index values manually, restart the client, and so on. Deleting alone is not enough; the slot parameters have to be entered by hand.
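
Both error messages above ask for the manually configured index options to be removed. As a rough sketch of that step only, the Python below strips `opencl-index`, `cuda-index` and `gpu-index` from every `<slot>` element of a config given as a string. The option names come from the log in this thread; the function name `strip_index_options` and the sample string are illustrative assumptions, and nothing here shows that this alone fixes the slot assignment.

```python
import xml.etree.ElementTree as ET

# Option names as they appear in the config dump and error messages in this thread.
INDEX_OPTIONS = ("opencl-index", "cuda-index", "gpu-index")


def strip_index_options(config_xml: str) -> str:
    """Return the config XML with device-index options removed from every slot.

    Hypothetical helper for illustration; it is not part of FAHClient.
    """
    root = ET.fromstring(config_xml)
    for slot in root.iter("slot"):
        # Copy the child list first so removal does not disturb iteration.
        for child in list(slot):
            if child.tag in INDEX_OPTIONS:
                slot.remove(child)
    return ET.tostring(root, encoding="unicode")


# Slot section modelled on the config dump posted in this thread.
SAMPLE = """<config>
  <slot id='1' type='GPU'>
    <opencl-index v='3'/>
  </slot>
  <slot id='2' type='GPU'>
    <cuda-index v='1'/>
    <gpu-index v='2'/>
  </slot>
  <slot id='3' type='GPU'>
    <cuda-index v='0'/>
  </slot>
</config>"""

if __name__ == "__main__":
    print(strip_index_options(SAMPLE))
```

If this is tried against a real config.xml, it would be prudent to stop the client and keep a backup copy of the file first.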

I have 4 GPUs in the system. One of them is the CPU's integrated graphics core; the other three are Nvidia GTX 1070 cards from Palit and Gigabyte. The OS is Windows 10.

Is it possible to somehow get rid of this problem completely?
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: MultiGPU problem

Post by bruce »

FAHClient does have some problems in multi-GPU installations. Some changes have been made to the beta version which might or might not help, but it's still an open issue.

Have you tried the V7.4.15 Open Beta?

We need to see more of the log ... from the beginning through the portion you've included.
(See my Signature if you need help)
yalexey
Posts: 14
Joined: Sun Oct 30, 2016 5:10 pm

Re: MultiGPU problem

Post by yalexey »

Yes, it is the V7.4.15 client.

14:51:48:************************* Folding@home Client *************************
14:51:48:        Website: http://folding.stanford.edu/
14:51:48:      Copyright: (c) 2009-2016 Stanford University
14:51:48:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
14:51:48:           Args: 
14:51:48:         Config: C:\Users\folder\AppData\Roaming\FAHClient\config.xml
14:51:48:******************************** Build ********************************
14:51:48:        Version: 7.4.15
14:51:48:           Date: Aug 17 2016
14:51:48:           Time: 04:33:41
14:51:48:     Repository: Git
14:51:48:       Revision: 4f3e0e25571a9f691719f0c273739294bde517dd
14:51:48:         Branch: master
14:51:48:       Compiler: GNU 5.3.1 20160205
14:51:48:        Options: -std=gnu++98 -I/mingw64/include -O3 -funroll-loops -ffast-math
14:51:48:                 -mfpmath=sse -fno-unsafe-math-optimizations -msse2
14:51:48:       Platform: linux2 4.6.0-1-amd64
14:51:48:           Bits: 64
14:51:48:           Mode: Release
14:51:48:******************************* System ********************************
14:51:48:            CPU: Intel(R) Pentium(R) CPU G4400 @ 3.30GHz
14:51:48:         CPU ID: GenuineIntel Family 6 Model 94 Stepping 3
14:51:48:           CPUs: 2
14:51:48:         Memory: 3.92GiB
14:51:48:    Free Memory: 2.38GiB
14:51:48:        Threads: WINDOWS_THREADS
14:51:48:     OS Version: 6.2
14:51:48:    Has Battery: false
14:51:48:     On Battery: false
14:51:48:     UTC Offset: 3
14:51:48:            PID: 4900
14:51:48:            CWD: C:\Users\folder\AppData\Roaming\FAHClient
14:51:48:             OS: Windows 10 Pro
14:51:48:        OS Arch: AMD64
14:51:48:           GPUs: 3
14:51:48:          GPU 0: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
14:51:48:          GPU 1: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
14:51:48:          GPU 2: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
14:51:48:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:8.0
14:51:48:  CUDA Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:6.1 Driver:8.0
14:51:48:  CUDA Device 2: Platform:0 Device:2 Bus:4 Slot:0 Compute:6.1 Driver:8.0
14:51:48:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:373.6
14:51:48:OpenCL Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:1.2 Driver:373.6
14:51:48:OpenCL Device 2: Platform:0 Device:2 Bus:4 Slot:0 Compute:1.2 Driver:373.6
14:51:48:OpenCL Device 3: Platform:1 Device:0 Bus:NA Slot:NA Compute:1.2 Driver:21.20
14:51:48:OpenCL Device 4: Platform:1 Device:1 Bus:NA Slot:NA Compute:1.2 Driver:6.6
14:51:48:  Win32 Service: false
14:51:48:***********************************************************************
14:51:48:<config>
14:51:48:  <service-description v='Folding@home Client'/>
14:51:48:  <service-restart v='true'/>
14:51:48:  <service-restart-delay v='5000'/>
14:51:48:
14:51:48:  <!-- Client Control -->
14:51:48:  <client-threads v='6'/>
14:51:48:  <cycle-rate v='4'/>
14:51:48:  <cycles v='-1'/>
14:51:48:  <data-directory v='.'/>
14:51:48:  <disable-sleep-when-active v='true'/>
14:51:48:  <exec-directory v='C:\Program Files\FAHClient'/>
14:51:48:  <exit-when-done v='false'/>
14:51:48:  <fold-anon v='false'/>
14:51:48:  <open-web-control v='false'/>
14:51:48:
14:51:48:  <!-- Configuration -->
14:51:48:  <config-rotate v='true'/>
14:51:48:  <config-rotate-dir v='configs'/>
14:51:48:  <config-rotate-max v='16'/>
14:51:48:
14:51:48:  <!-- Debugging -->
14:51:48:  <assignment-servers>
14:51:48:    assign3.stanford.edu:8080 assign4.stanford.edu:80
14:51:48:  </assignment-servers>
14:51:48:  <auth-as v='true'/>
14:51:48:  <capture-directory v='capture'/>
14:51:48:  <capture-on-error v='false'/>
14:51:48:  <capture-packets v='false'/>
14:51:48:  <capture-requests v='false'/>
14:51:48:  <capture-responses v='false'/>
14:51:48:  <capture-sockets v='false'/>
14:51:48:  <core-exec v='FahCore_$type'/>
14:51:48:  <core-wrapper-exec v='FAHCoreWrapper'/>
14:51:48:  <debug-sockets v='false'/>
14:51:48:  <exception-locations v='true'/>
14:51:48:  <gpu-assignment-servers>
14:51:48:    assign-GPU.stanford.edu:80 assign-GPU2.stanford.edu:80
14:51:48:  </gpu-assignment-servers>
14:51:48:  <stack-traces v='false'/>
14:51:48:
14:51:48:  <!-- Error Handling -->
14:51:48:  <max-slot-errors v='10'/>
14:51:48:  <max-unit-errors v='5'/>
14:51:48:
14:51:48:  <!-- Folding Core -->
14:51:48:  <checkpoint v='5'/>
14:51:48:  <core-dir v='cores'/>
14:51:48:  <core-priority v='low'/>
14:51:48:  <cpu-affinity v='false'/>
14:51:48:  <cpu-usage v='100'/>
14:51:48:  <gpu-usage v='100'/>
14:51:48:  <no-assembly v='false'/>
14:51:48:
14:51:48:  <!-- Folding Slot Configuration -->
14:51:48:  <cause v='ANY'/>
14:51:48:  <client-subtype v='STDCLI'/>
14:51:48:  <client-type v='advanced'/>
14:51:48:  <cpu-species v='X86_PENTIUM_II'/>
14:51:48:  <cpu-type v='AMD64'/>
14:51:48:  <cpus v='-1'/>
14:51:48:  <disable-viz v='false'/>
14:51:48:  <gpu v='true'/>
14:51:48:  <max-packet-size v='normal'/>
14:51:48:  <os-species v='WIN_8'/>
14:51:48:  <os-type v='WIN32'/>
14:51:48:  <project-key v='0'/>
14:51:48:  <smp v='true'/>
14:51:48:
14:51:48:  <!-- GUI -->
14:51:48:  <gui-enabled v='true'/>
14:51:48:
14:51:48:  <!-- HTTP Server -->
14:51:48:  <allow v='127.0.0.1, 192.168.147.90-192.168.147.120'/>
14:51:48:  <connection-timeout v='60'/>
14:51:48:  <deny v='0/0'/>
14:51:48:  <http-addresses v='0:7396'/>
14:51:48:  <https-addresses v=''/>
14:51:48:  <max-connect-time v='900'/>
14:51:48:  <max-connections v='800'/>
14:51:48:  <max-request-length v='52428800'/>
14:51:48:  <min-connect-time v='300'/>
14:51:48:
14:51:48:  <!-- Logging -->
14:51:48:  <log v='log.txt'/>
14:51:48:  <log-color v='false'/>
14:51:48:  <log-crlf v='true'/>
14:51:48:  <log-date v='false'/>
14:51:48:  <log-date-periodically v='21600'/>
14:51:48:  <log-domain v='false'/>
14:51:48:  <log-header v='true'/>
14:51:48:  <log-level v='true'/>
14:51:48:  <log-no-info-header v='true'/>
14:51:48:  <log-redirect v='false'/>
14:51:48:  <log-rotate v='true'/>
14:51:48:  <log-rotate-dir v='logs'/>
14:51:48:  <log-rotate-max v='16'/>
14:51:48:  <log-short-level v='false'/>
14:51:48:  <log-simple-domains v='true'/>
14:51:48:  <log-thread-id v='false'/>
14:51:48:  <log-thread-prefix v='true'/>
14:51:48:  <log-time v='true'/>
14:51:48:  <log-to-screen v='true'/>
14:51:48:  <log-truncate v='false'/>
14:51:48:  <verbosity v='4'/>
14:51:48:
14:51:48:  <!-- Network -->
14:51:48:  <proxy v='5.189.132.136:1455'/>
14:51:48:  <proxy-enable v='false'/>
14:51:48:  <proxy-pass v=''/>
14:51:48:  <proxy-user v=''/>
14:51:48:
14:51:48:  <!-- Process Control -->
14:51:48:  <child v='false'/>
14:51:48:  <daemon v='false'/>
14:51:48:  <pid v='false'/>
14:51:48:  <pid-file v='Folding@home Client.pid'/>
14:51:48:  <respawn v='false'/>
14:51:48:  <service v='false'/>
14:51:48:
14:51:48:  <!-- Remote Command Server -->
14:51:48:  <command-address v='0.0.0.0'/>
14:51:48:  <command-allow-no-pass v='127.0.0.1, 192.168.147.90-192.168.147.120'/>
14:51:48:  <command-deny-no-pass v='0/0'/>
14:51:48:  <command-enable v='true'/>
14:51:48:  <command-port v='36330'/>
14:51:48:  <password v='*'/>
14:51:48:
14:51:48:  <!-- Slot Control -->
14:51:48:  <idle v='false'/>
14:51:48:  <max-shutdown-wait v='60'/>
14:51:48:  <pause-on-battery v='false'/>
14:51:48:  <pause-on-start v='false'/>
14:51:48:  <paused v='false'/>
14:51:48:  <power v='full'/>
14:51:48:  <streaming v='false'/>
14:51:48:
14:51:48:  <!-- Web Server -->
14:51:48:  <web-allow v='127.0.0.1'/>
14:51:48:  <web-deny v='0/0'/>
14:51:48:  <web-enable v='true'/>
14:51:48:
14:51:48:  <!-- Web Server Sessions -->
14:51:48:  <session-cookie v='sid'/>
14:51:48:  <session-lifetime v='86400'/>
14:51:48:  <session-timeout v='3600'/>
14:51:48:
14:51:48:  <!-- Work Unit Control -->
14:51:48:  <dump-after-deadline v='true'/>
14:51:48:  <max-queue v='16'/>
14:51:48:  <max-units v='0'/>
14:51:48:  <next-unit-percentage v='99'/>
14:51:48:  <stall-detection-enabled v='false'/>
14:51:48:  <stall-percent v='5'/>
14:51:48:  <stall-timeout v='1800'/>
14:51:48:
14:51:48:  <!-- Folding Slots -->
14:51:48:  <slot id='1' type='GPU'>
14:51:48:    <opencl-index v='3'/>
14:51:48:  </slot>
14:51:48:  <slot id='2' type='GPU'>
14:51:48:    <cuda-index v='1'/>
14:51:48:    <gpu-index v='2'/>
14:51:48:  </slot>
14:51:48:  <slot id='3' type='GPU'>
14:51:48:    <cuda-index v='0'/>
14:51:48:  </slot>
14:51:48:</config>
14:51:48:Trying to access database...
14:51:48:Successfully acquired database lock
14:51:48:ERROR:FS01:OpenCL device not found for 'opencl-index' = 3 with vendor ID = 0x10de, plese correct this by removing the manually configured 'opencl-index' option.
14:51:48:Enabled folding slot 01: READY gpu:0:GP104 [GeForce GTX 1070]
14:51:48:Enabled folding slot 02: READY gpu:2:GP104 [GeForce GTX 1070]
14:51:48:ERROR:FS03:'opencl-index'=2 is in use by another folding slot but GPU 1 matches this device's PCI bus=4 and PCI slot=0, please correct this by removing any manually configured 'opencl-index' options.
14:51:48:Enabled folding slot 03: READY gpu:1:GP104 [GeForce GTX 1070]
14:51:48:WU01:FS02:Starting
14:51:48:WU01:FS02:Running FahCore: "C:\Program Files\FAHClient/FAHCoreWrapper.exe" C:\Users\folder\AppData\Roaming\FAHClient\cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe -dir 01 -suffix 01 -version 704 -lifeline 4900 -checkpoint 5 -opencl-platform 0 -gpu-vendor nvidia -gpu 2
14:51:48:WU01:FS02:Started FahCore on PID 4432
14:51:48:WU01:FS02:Core PID:1788
14:51:48:WU01:FS02:FahCore 0x21 started
14:51:49:WU03:FS03:Starting
14:51:49:ERROR:WU03:FS03:Failed to start core: OpenCL device matching slot 3 not found
14:51:49:WU03:FS03:Starting
14:51:49:ERROR:WU03:FS03:Failed to start core: OpenCL device matching slot 3 not found
Apparently, for some reason, the client sometimes enumerates the CPU's built-in graphics core first (as device 0) and sometimes puts it last in the list of devices.
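
One way to check this from a given log is to read the `OpenCL Device` lines and see which platform each global index lands on. The sketch below does that for the listing above. It assumes Platform 0 is the NVIDIA platform, which fits the `-opencl-platform 0 -gpu-vendor nvidia` arguments on the FahCore command line; the helper names are made up for illustration, and how the client itself resolves `opencl-index` is not documented in this thread.

```python
import re

# Matches lines such as:
# 14:51:48:OpenCL Device 3: Platform:1 Device:0 Bus:NA Slot:NA Compute:1.2 Driver:21.20
DEVICE_RE = re.compile(
    r"OpenCL Device (\d+): Platform:(\d+) Device:(\d+) Bus:(\w+) Slot:(\w+)"
)


def parse_opencl_devices(log_text):
    """Map each global OpenCL index to its platform, device, bus and slot."""
    devices = {}
    for m in DEVICE_RE.finditer(log_text):
        devices[int(m.group(1))] = {
            "platform": int(m.group(2)),
            "device": int(m.group(3)),
            "bus": m.group(4),   # kept as text because the log may say "NA"
            "slot": m.group(5),
        }
    return devices


def indices_off_platform(devices, configured, platform=0):
    """Return configured indices that are missing or not on the given platform."""
    return [
        i for i in configured
        if i not in devices or devices[i]["platform"] != platform
    ]


# OpenCL listing copied from the log in this thread.
SAMPLE_LOG = """\
14:51:48:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:373.6
14:51:48:OpenCL Device 1: Platform:0 Device:1 Bus:2 Slot:0 Compute:1.2 Driver:373.6
14:51:48:OpenCL Device 2: Platform:0 Device:2 Bus:4 Slot:0 Compute:1.2 Driver:373.6
14:51:48:OpenCL Device 3: Platform:1 Device:0 Bus:NA Slot:NA Compute:1.2 Driver:21.20
14:51:48:OpenCL Device 4: Platform:1 Device:1 Bus:NA Slot:NA Compute:1.2 Driver:6.6
"""

if __name__ == "__main__":
    devices = parse_opencl_devices(SAMPLE_LOG)
    # Slot 1 in the posted config has opencl-index = 3.
    print(indices_off_platform(devices, [3]))
```

In this log, index 3 falls on Platform 1 rather than on the platform holding the three GTX 1070 cards, which would be consistent with the FS01 error about vendor ID 0x10de.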
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: MultiGPU problem

Post by bruce »

PLEASE turn off the added verbosity. It makes it more difficult to see what you've changed.

[Added to existing GitHub ticket.]
des1957
Posts: 37
Joined: Fri Jan 04, 2013 3:20 pm

Re: MultiGPU problem

Post by des1957 »

This is a known problem with the 7.4.15 beta. On every restart it fails to assign the proper slots. Revert to the previous version. I ran 4 GPUs on the old version with no problems. I tried the beta version and ran into the same problem.
Joe_H
Site Admin
Posts: 7870
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: MultiGPU problem

Post by Joe_H »

One thing that was seen in early reports of this GPU problem and the 7.4.15 public beta is that this bug shows up more for multi-GPU setups where the cards are all the same model. It is less of an issue with mixed GPU installations.
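
The startup log earlier in this thread lists all three GTX 1070 cards at the same PCI location (`Bus:4 Slot:0`), while the CUDA and OpenCL listings show buses 1, 2 and 4. A minimal sketch for spotting that pattern in a log follows; the function name is hypothetical, and whether the duplicated location is the actual cause of the slot mix-up is not confirmed here.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 14:51:48:          GPU 0: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
GPU_RE = re.compile(r"GPU (\d+): Bus:(\d+) Slot:(\d+)")


def duplicate_pci_locations(log_text):
    """Return {(bus, slot): [gpu ids]} for PCI locations reported more than once."""
    groups = defaultdict(list)
    for m in GPU_RE.finditer(log_text):
        location = (int(m.group(2)), int(m.group(3)))
        groups[location].append(int(m.group(1)))
    return {loc: ids for loc, ids in groups.items() if len(ids) > 1}


# GPU listing copied from the log in this thread.
SAMPLE_LOG = """\
14:51:48:           GPUs: 3
14:51:48:          GPU 0: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
14:51:48:          GPU 1: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
14:51:48:          GPU 2: Bus:4 Slot:0 NVIDIA:5 GP104 [GeForce GTX 1070]
"""

if __name__ == "__main__":
    print(duplicate_pci_locations(SAMPLE_LOG))
```

Any non-empty result means several GPU entries share one bus and slot pair, so that pair cannot tell the cards apart.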

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3