Core 21 "Running" but not folding

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

shawn.ucd
Posts: 5
Joined: Mon Nov 23, 2015 6:08 am

Core 21 "Running" but not folding

Post by shawn.ucd »

I've been having a problem over the last few days where the FAHControl client indicates that my GPU is Running, but when I look with aticonfig --otgc, the GPU load is 0%. Also, the ETA on the Work Units can be anywhere between 3 and 8 days, and even though the completion percentage increases, completed frames aren't logged (b/c they aren't really being completed). When using Core 21 -- but not Core 17 -- I sometimes get this message: "Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900." My GPU has successfully completed work units with this same configuration previously.

Current log file is below.

Operating System: Ubuntu 14.04
GPU: Radeon HD 7790 (I know, I know, ATI isn't recommended on linux)
ATI Catalyst Driver: 15.2

Code:

Project: 9643 (Run 0, Clone 1, Gen 102)
Unit: 0x0000008fab436c9b5609bee45a553b68
CPU: 0x00000000000000000000000000000000
Machine: 1
Reading tar file core.xml
Reading tar file integrator.xml
Reading tar file state.xml
Reading tar file system.xml
Digital signatures verified
*************************** Core21 Folding@home Core ***************************
       Type: 33
       Core: Core21
    Website: http://folding.stanford.edu/
  Copyright: (c) 2009-2014 Stanford University
     Author: Yutong Zhao <yutong.zhao@stanford.edu>
       Args: -dir 01 -suffix 01 -version 704 -lifeline 5917 -checkpoint 15 -gpu
             0 -gpu-vendor ati
     Config: <none>
************************************ Build *************************************
    Version: 0.0.12
       Date: Sep 16 2015
       Time: 11:37:16
 Repository: Git
   Revision: fedcb38a95256b5e39e717cd626c0f7b0afdcdf1
     Branch: HEAD
   Compiler: GNU 4.4.7 20120313 (Red Hat 4.4.7-16)
    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
             -fno-unsafe-math-optimizations -msse2
   Platform: linux2 2.6.32-573.3.1.el6.x86_64
       Bits: 64
       Mode: Release
************************************ System ************************************
        CPU: AMD Phenom(tm) II X6 1090T Processor
     CPU ID: AuthenticAMD Family 16 Model 10 Stepping 0
       CPUs: 6
     Memory: 11.73GiB
Free Memory: 6.20GiB
    Threads: POSIX_THREADS
 OS Version: 3.13
Has Battery: false
 On Battery: false
 UTC Offset: -8
        PID: 5921
        CWD: /var/lib/fahclient/work
         OS: Linux 3.13.0-68-generic x86_64
    OS Arch: AMD64
       GPUs: 1
      GPU 0: Bus:1 Slot:0 ATI:5 Bonaire XT [Radeon HD 7790]
       CUDA: Not detected
     OpenCL: Not detected
********************************************************************************
Folding@home GPU Core21 Folding@home Core
Version 0.0.12
No protocol specified
No protocol specified
[1] compatible platform(s):
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 2.0 AMD-APP (1729.3)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.

(1) device(s) found on platform 0:
  -- 0 --
  DEVICE_NAME = AMD Phenom(tm) II X6 1090T Processor
  DEVICE_VENDOR = AuthenticAMD
  DEVICE_VERSION = OpenCL 1.2 AMD-APP (1729.3)

[ Entering Init ]
  Launch time: 2015-11-23T02:43:42Z
  Arguments passed: -dir 01 -suffix 01 -version 704 -lifeline 5917 -checkpoint 15 -gpu 0 -gpu-vendor ati 
[ Leaving  Init ]
[ Entering Main ]
  Reading core settings...
  Total number of steps: 2000000
  XTC write frequency: 100000
[ Initializing Core Contexts ]
  Using platform OpenCL
  Looking for vendor: ati...found on platformId 0
  Deserializing System...
  Setting up Force Groups:
    Group 0: Everything Else
    Group 1: Nonbonded Direct Space
    Group 2: Nonbonded Reciprocal Space
  Found MonteCarloBarostat @ 1.01325 (default) Bar, 300 Kelvin, 25 pressure change frequency.
    Found: 63555 atoms, 6 forces.
  Deserializing State...  done.
    Integrator Type: N6OpenMM18LangevinIntegratorE
    Constraint Tolerance: 1e-05
    Time Step in PS: 0.002
    Temperature: 300
    Friction Coeff: 1
  Checking core state against reference...
  Checking checkpoint state against reference...
[ Initialized Core Contexts... ]
  Using OpenCL on platformId 0 and gpu 0
  v(^_^)v  MD ready starting from step 0

Completed 0 out of 2000000 steps (0%)
Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
Caught signal SIGINT(2) on PID 5921
Exiting, please wait. . .
2015-11-23T02:52:22Z
[ Leaving  Main ]
Folding@home Core Shutdown: INTERRUPTED
Project: 9643 (Run 0, Clone 1, Gen 102)
Unit: 0x0000008fab436c9b5609bee45a553b68
CPU: 0x00000000000000000000000000000000
Machine: 1
Digital signatures verified
*************************** Core21 Folding@home Core ***************************
       Type: 33
       Core: Core21
    Website: http://folding.stanford.edu/
  Copyright: (c) 2009-2014 Stanford University
     Author: Yutong Zhao <yutong.zhao@stanford.edu>
       Args: -dir 01 -suffix 01 -version 704 -lifeline 11070 -checkpoint 15 -gpu
             0 -gpu-vendor ati
     Config: <none>
************************************ Build *************************************
    Version: 0.0.12
       Date: Sep 16 2015
       Time: 11:37:16
 Repository: Git
   Revision: fedcb38a95256b5e39e717cd626c0f7b0afdcdf1
     Branch: HEAD
   Compiler: GNU 4.4.7 20120313 (Red Hat 4.4.7-16)
    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
             -fno-unsafe-math-optimizations -msse2
   Platform: linux2 2.6.32-573.3.1.el6.x86_64
       Bits: 64
       Mode: Release
************************************ System ************************************
        CPU: AMD Phenom(tm) II X6 1090T Processor
     CPU ID: AuthenticAMD Family 16 Model 10 Stepping 0
       CPUs: 6
     Memory: 11.73GiB
Free Memory: 640.14MiB
    Threads: POSIX_THREADS
 OS Version: 3.13
Has Battery: false
 On Battery: false
 UTC Offset: -8
        PID: 11074
        CWD: /var/lib/fahclient/work
         OS: Linux 3.13.0-68-generic x86_64
    OS Arch: AMD64
       GPUs: 1
      GPU 0: Bus:1 Slot:0 ATI:5 Bonaire XT [Radeon HD 7790]
       CUDA: Not detected
     OpenCL: Not detected
********************************************************************************
Folding@home GPU Core21 Folding@home Core
Version 0.0.12
No protocol specified
No protocol specified
[1] compatible platform(s):
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 2.0 AMD-APP (1729.3)
  NAME = AMD Accelerated Parallel Processing
  VENDOR = Advanced Micro Devices, Inc.

(1) device(s) found on platform 0:
  -- 0 --
  DEVICE_NAME = AMD Phenom(tm) II X6 1090T Processor
  DEVICE_VENDOR = AuthenticAMD
  DEVICE_VERSION = OpenCL 1.2 AMD-APP (1729.3)

[ Entering Init ]
  Launch time: 2015-11-23T06:01:05Z
  Arguments passed: -dir 01 -suffix 01 -version 704 -lifeline 11070 -checkpoint 15 -gpu 0 -gpu-vendor ati 
[ Leaving  Init ]
[ Entering Main ]
  Reading core settings...
  Total number of steps: 2000000
  XTC write frequency: 100000
[ Initializing Core Contexts ]
  Using platform OpenCL
  Looking for vendor: ati...found on platformId 0
  Deserializing System...
  Setting up Force Groups:
    Group 0: Everything Else
    Group 1: Nonbonded Direct Space
    Group 2: Nonbonded Reciprocal Space
  Found MonteCarloBarostat @ 1.01325 (default) Bar, 300 Kelvin, 25 pressure change frequency.
    Found: 63555 atoms, 6 forces.
  Deserializing State...  done.
    Integrator Type: N6OpenMM18LangevinIntegratorE
    Constraint Tolerance: 1e-05
    Time Step in PS: 0.002
    Temperature: 300
    Friction Coeff: 1
  Checking core state against reference...
  Checking checkpoint state against reference...
[ Initialized Core Contexts... ]
  Using OpenCL on platformId 0 and gpu 0
  v(^_^)v  MD ready starting from step 0

Completed 0 out of 2000000 steps (0%)
Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Core 21 "Running" but not folding

Post by bruce »

Welcome to foldingforum.org, shawn.ucd.
shawn.ucd wrote:
GPUs: 1
GPU 0: Bus:1 Slot:0 ATI:5 Bonaire XT [Radeon HD 7790]
CUDA: Not detected
OpenCL: Not detected

Reinstall the latest drivers from ATI.

If you installed the GPU drivers from ATI/AMD, OpenCL would have been installed too. You can't fold without it.
If you got them from some other source, it's hard to say what you do have.
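For anyone who wants to check a log for this condition without reading it by eye, here is a minimal Python sketch. It assumes the System block prints its status as "CUDA: ..." and "OpenCL: ..." lines exactly as in the log posted above; the function names are made up for illustration and are not part of the FAH client.

```python
import re

# Matches System-block lines such as "     OpenCL: Not detected".
# Lines like "VERSION = OpenCL 2.0 AMD-APP" do not match, because the
# keyword must be the first thing on the line and be followed by a colon.
STATUS_LINE = re.compile(r"^\s*(CUDA|OpenCL):\s*(.+?)\s*$")


def detection_status(log_text):
    """Collect the CUDA/OpenCL status strings reported in the System block."""
    status = {}
    for line in log_text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            # If the log holds several core start-ups, the last one wins.
            status[match.group(1)] = match.group(2)
    return status


def opencl_missing(log_text):
    """True when the log explicitly reports OpenCL as 'Not detected'."""
    return detection_status(log_text).get("OpenCL") == "Not detected"


sample = """\
       GPUs: 1
      GPU 0: Bus:1 Slot:0 ATI:5 Bonaire XT [Radeon HD 7790]
       CUDA: Not detected
     OpenCL: Not detected
"""

print(detection_status(sample))
print(opencl_missing(sample))
```

Only the OpenCL entry is tested here because an AMD/ATI card never reports CUDA, so a missing CUDA line says nothing about the driver on this system.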
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Core 21 "Running" but not folding

Post by toTOW »

Something is indeed not installed properly. The startup log of the core you posted clearly shows that only one CPU device has been detected in the OpenCL platform :(

Your current WU is probably running on the CPU which is definitely not the target for this core ...
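The same check can be roughed out in Python. This is only a sketch: the core's log prints DEVICE_NAME but not the OpenCL device type, so the CPU test below is a crude name-based guess (an assumption, not something the core reports), and the function names are hypothetical.

```python
import re

# Pulls every "DEVICE_NAME = ..." value out of the core's start-up log.
DEVICE_NAME = re.compile(r"^\s*DEVICE_NAME\s*=\s*(.+?)\s*$", re.MULTILINE)


def opencl_device_names(log_text):
    """Return the OpenCL device names listed in the log, in order."""
    return DEVICE_NAME.findall(log_text)


def looks_like_cpu(device_name):
    """Heuristic only: guess from the name whether the device is a CPU."""
    lowered = device_name.lower()
    return "processor" in lowered or "cpu" in lowered


def only_cpu_devices(log_text):
    """True when at least one device is listed and all of them look like CPUs."""
    names = opencl_device_names(log_text)
    return bool(names) and all(looks_like_cpu(name) for name in names)


sample = """\
(1) device(s) found on platform 0:
  -- 0 --
  DEVICE_NAME = AMD Phenom(tm) II X6 1090T Processor
  DEVICE_VENDOR = AuthenticAMD
  DEVICE_VERSION = OpenCL 1.2 AMD-APP (1729.3)
"""

print(opencl_device_names(sample))
print(only_cpu_devices(sample))
```

A result of True for a GPU slot suggests the OpenCL platform is exposing the CPU only, which is the situation shown in the posted log.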

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
shawn.ucd
Posts: 5
Joined: Mon Nov 23, 2015 6:08 am

Re: Core 21 "Running" but not folding

Post by shawn.ucd »

I guess the thing to do is tear out the Catalyst drivers and start over. What I particularly don't understand is why the FAH client is "folding" on the GPU at all -- or why it thinks it is -- if it doesn't think OpenCL is installed. (And I checked, and OpenCL is, indeed, installed.) Shouldn't the client give me a message along the lines of, "hey buddy, you can't fold on your GPU without OpenCL" instead of indicating that it's running and displaying a progress percentage? Shouldn't the software be set up to fail noisily when it is asked to do something impossible?

Oh well. Maybe this is why AMD cards aren't recommended for folding on linux.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Core 21 "Running" but not folding

Post by bruce »

shawn.ucd wrote:I guess the thing to do is tear out the Catalyst drivers and start over. What I particularly don't understand is why the FAH client is "folding" on the GPU at all -- or why it thinks it is -- if it doesn't think OpenCL is installed. (And I checked, and OpenCL is, indeed, installed.) Shouldn't the client give me a message along the lines of, "hey buddy, you can't fold on your GPU without OpenCL" instead of indicating that it's running and displaying a progress percentage? Shouldn't the software be set up to fail noisily when it is asked to do something impossible?

Oh well. Maybe this is why AMD cards aren't recommended for folding on linux.
It did say that, although not in terms you understood. (See the red message I quoted above: "OpenCL: Not detected".)
shawn.ucd
Posts: 5
Joined: Mon Nov 23, 2015 6:08 am

Re: Core 21 and 17 "Running" but not folding

Post by shawn.ucd »

Yes, the log gives the message that OpenCL isn't detected (the red message), but the client software still shows that the GPU is folding and indicates that it is making progress, which, correct me if I'm wrong, can only happen if OpenCL is installed. So the software is giving the user contradictory answers to the same question. Also, since it is true that OpenCL is installed on the computer in question -- and the drivers are the proprietary AMD drivers and I can adjust settings to my heart's content using the Catalyst control panel, etc. -- one could be forgiven for thinking that the message in the log was wrong. So it's not that I couldn't understand your terms; rather, it's that I was having trouble understanding the software's aberrant behavior.
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Core 21 "Running" but not folding

Post by 7im »

Please post the fah log file showing the client completing multiple percentages.

Also, 99% of the time, when the log shows OpenCL as not detected, the driver won't work for folding. I have seen only one log where folding ran even though the client had not detected it. A driver fix is the first answer you will get every time the log shows OpenCL or CUDA as missing. We can't really cater to the very rare exceptions.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Core 21 "Running" but not folding

Post by bruce »

7im wrote:...Driver fix is the first answer you will get every time with a missing OpenCL or missing CUDA in the log...
Except for the fact that AMD/ATI GPUs will never show CUDA as available. (It's an NVidia proprietary feature.)
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: Core 21 "Running" but not folding

Post by 7im »

Thanks to bruce for the added clarity. CUDA for NV cards, OpenCL for AMD cards, but I think he knew that.
shawn.ucd
Posts: 5
Joined: Mon Nov 23, 2015 6:08 am

Re: Core 21 "Running" but not folding

Post by shawn.ucd »

I completely understand that fixing the driver is the most sensible solution, especially with AMD cards on linux. The log I posted above is from a Work Unit that FAHControl showed as folding: the percentage under the Status tab kept increasing. However, the log wasn't recording each percentage as it completed, which is what tipped me off that it wasn't actually folding.

And this has been happening with all the Work Units for my GPU over the last few days (from Core 21 and 17). I don't get any errors, the folding "starts," the log records, e.g., "Completed 0 out of 2000000 steps (0%)," and then the percent completion increases -- very, very slowly -- under the Status Tab, but the GPU isn't doing anything, and nothing further is logged. Since it takes days to "complete" any of these Work Units, I have only let one or two run until they hit 100%, and I didn't preserve those logs, so I don't know what message is reported.

It's clear that the AMD drivers aren't installed properly, so I can probably fix that problem.
davidcoton
Posts: 1102
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: Core 21 "Running" but not folding

Post by davidcoton »

The logs will probably still be on your system. Try "Refresh" on the Log tab of Advanced Control (AKA FAHControl). If that doesn't work, they will still be in the log directory, something like /var/lib/fahclient/logs.
There is a known bug, due to the way progress is reported to FAHControl, whereby the Status tab assumes progress is being made even without any progress actually being made and logged.
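One way to spot this mismatch is to compare the last progress line in the log with the percentage shown on the Status tab. The sketch below assumes progress lines look like "Completed 0 out of 2000000 steps (0%)", as in the log earlier in this thread. The displayed percentage has to be typed in by hand, and the one-point tolerance is an arbitrary choice, not a documented value.

```python
import re

# Matches progress lines such as "Completed 0 out of 2000000 steps (0%)".
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps")


def last_logged_percent(log_text):
    """Percent complete according to the last progress line, or None."""
    matches = PROGRESS.findall(log_text)
    if not matches:
        return None
    done, total = (int(value) for value in matches[-1])
    if total == 0:
        return None
    return 100.0 * done / total


def status_tab_disagrees(log_text, displayed_percent, tolerance=1.0):
    """True when the displayed percent runs ahead of what the log recorded.

    displayed_percent is read off the Status tab by the user;
    tolerance (in percentage points) is an assumed slack value.
    """
    logged = last_logged_percent(log_text)
    return logged is not None and displayed_percent - logged > tolerance


sample = """\
  v(^_^)v  MD ready starting from step 0

Completed 0 out of 2000000 steps (0%)
"""

print(last_logged_percent(sample))
print(status_tab_disagrees(sample, displayed_percent=12.5))
```

If the Status tab shows, say, 12.5% while the log never got past 0%, the work unit is very likely not progressing at the rate the display suggests.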
shawn.ucd
Posts: 5
Joined: Mon Nov 23, 2015 6:08 am

Re: Core 21 "Running" but not folding

Post by shawn.ucd »

Ah, that sounds like the bug, thanks. The logs are gone because I've since wiped the disk.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Core 21 "Running" but not folding

Post by bruce »

7im wrote:Thanks to bruce for the added clarity. CUDA for NV cards, OpenCL for AMD cards, but I think he knew that.
Almost.

CUDA for NV cards; OpenCL for either. In other words, some FahCores use OpenCL on both, except when there's a CUDA version that can be used on NV. A new FahCore will almost certainly be OpenCL, because the same code is expected to work on either. Later, once it's working well and they have spare development time (Ya, right), a CUDA version may be developed.
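As a toy illustration of that rule, and not the actual selection logic used by the client or any FahCore, the decision could be written as follows. The function name and both parameters are invented for this example.

```python
def pick_platform(gpu_vendor, cuda_build_available):
    """Illustrative rule only: CUDA on NVIDIA when a CUDA build exists, else OpenCL."""
    vendor = gpu_vendor.strip().lower()
    if vendor == "nvidia" and cuda_build_available:
        return "CUDA"
    # AMD/ATI cards, and NVIDIA cards without a CUDA build, fall back to OpenCL.
    return "OpenCL"


print(pick_platform("ati", cuda_build_available=False))
print(pick_platform("nvidia", cuda_build_available=True))
print(pick_platform("nvidia", cuda_build_available=False))
```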