GPU slot continuously returns INTERRUPTED

Postby hiigaran » Tue Apr 04, 2017 8:56 pm

20:48:00:WU05:FS04:0x21:*********************** Log Started 2017-04-04T20:48:00Z ***********************
20:48:00:WU05:FS04:0x21:Project: 10496 (Run 162, Clone 18, Gen 42)
20:48:00:WU05:FS04:0x21:Unit: 0x0000003e8ca304f556bbb19331942678
20:48:00:WU05:FS04:0x21:CPU: 0x00000000000000000000000000000000
20:48:00:WU05:FS04:0x21:Machine: 4
20:48:00:WU05:FS04:0x21:Reading tar file core.xml
20:48:00:WU05:FS04:0x21:Reading tar file system.xml
20:48:01:WU05:FS04:0x21:Reading tar file integrator.xml
20:48:01:WU05:FS04:0x21:Reading tar file state.xml
20:48:01:WU05:FS04:0x21:Digital signatures verified
20:48:01:WU05:FS04:0x21:Folding@home GPU Core21 Folding@home Core
20:48:01:WU05:FS04:0x21:Version 0.0.18
20:48:16:WU05:FS04:FahCore returned: INTERRUPTED (102 = 0x66)


The above is the log from my GPU slot, which keeps repeating. The slot cycles between ready and running the whole time. I was under the impression that INTERRUPTED is caused by either the user or the client pausing the slot, but that is not the case here. This system has been running without any user interaction for several days, and a restart did not help. The other three GPU slots and the CPU slot are working fine. Just in case, I tried setting the priority higher and unchecking the pause-on-battery option. No change.

Any additional information required?

EDIT: Uhh...I think I might have posted this in the wrong forum

EDIT2: I've noticed that the PRCG is always 10496 (162,18,42). Wondering if perhaps it is this particular WU causing the issues. It's always at 0%, so I can't tell if it's the same WU, or if it is restarting the WU completely. Is there a way to force a download of a new WU?
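
One way to check whether it really is the same WU failing each time is to count the INTERRUPTED returns per work unit and slot, together with the last Project line logged for that slot. This is only a rough sketch: the function name count_interrupted and the log path in the usage line are assumptions, and it relies on the log line layout shown in the excerpt above.

```shell
# Count "FahCore returned: INTERRUPTED" lines per WU/slot, tagged with the
# most recent "Project:" line seen for that WU/slot.
# $1 = path to a FAHClient log file (hypothetical helper, not part of FAHClient)
count_interrupted() {
    awk -F: '
        # Remember the latest PRCG reported by the core for this WU/slot.
        /:Project: / { k = $4 ":" $5; sub(/.*:Project: /, ""); prcg[k] = $0 }
        # Tally each INTERRUPTED return under WU/slot plus that PRCG.
        /FahCore returned: INTERRUPTED/ { k = $4 ":" $5; n[k " " prcg[k]]++ }
        END { for (k in n) print n[k], k }
    ' "$1" | sort
}

# Usage (log location is an assumption; adjust to your install):
#   count_interrupted /var/lib/fahclient/log.txt
```

If a single line with a growing count comes back, the slot is retrying one WU rather than fetching new ones.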
Last edited by hiigaran on Tue Apr 04, 2017 9:09 pm, edited 1 time in total.
hiigaran
 
Posts: 109
Joined: Thu Nov 17, 2011 6:01 pm

Re: GPU slot continuously returns INTERRUPTED

Postby bruce » Tue Apr 04, 2017 9:08 pm

Which drivers are installed?
Also, please post the top 100 lines of the log, which show your system's characteristics.
bruce
 
Posts: 20827
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU slot continuously returns INTERRUPTED

Postby hiigaran » Tue Apr 04, 2017 10:03 pm

I'll assume this is everything. Drivers are 370.28.

*********************** Log Started 2017-04-04T20:45:40Z ***********************
20:45:40:************************* Folding@home Client *************************
20:45:40:      Website: http://folding.stanford.edu/
20:45:40:    Copyright: (c) 2009-2016 Stanford University
20:45:40:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
20:45:40:         Args: --child --lifeline 1825 /etc/fahclient/config.xml --run-as
20:45:40:               fahclient --pid-file=/var/run/fahclient.pid --daemon
20:45:40:       Config: /etc/fahclient/config.xml
20:45:40:******************************** Build ********************************
20:45:40:      Version: 7.4.16
20:45:40:         Date: Jan 6 2017
20:45:40:         Time: 08:08:33
20:45:40:   Repository: Git
20:45:40:     Revision: e12187cbb0bd6937c067b9749af011374563b7b9
20:45:40:       Branch: master
20:45:40:     Compiler: GNU 4.9.2
20:45:40:      Options: -std=gnu++98 -O3 -funroll-loops -ffast-math -mfpmath=sse
20:45:40:               -fno-unsafe-math-optimizations -msse2
20:45:40:     Platform: linux2 4.8.0-2-amd64
20:45:40:         Bits: 64
20:45:40:         Mode: Release
20:45:40:******************************* System ********************************
20:45:40:          CPU: Intel(R) Xeon(R) CPU E5-2609 v4 @ 1.70GHz
20:45:40:       CPU ID: GenuineIntel Family 6 Model 79 Stepping 1
20:45:40:         CPUs: 8
20:45:40:       Memory: 7.72GiB
20:45:40:  Free Memory: 7.28GiB
20:45:40:      Threads: POSIX_THREADS
20:45:40:   OS Version: 4.4
20:45:40:  Has Battery: false
20:45:40:   On Battery: false
20:45:40:   UTC Offset: 4
20:45:40:          PID: 1827
20:45:40:          CWD: /var/lib/fahclient
20:45:40:           OS: Linux 4.4.0-66-generic x86_64
20:45:40:      OS Arch: AMD64
20:45:40:         GPUs: 4
20:45:40:        GPU 0: Bus:4 Slot:0 Func:0 NVIDIA:5 GP104 [GeForce GTX 1080]
20:45:40:        GPU 1: Bus:5 Slot:0 Func:0 NVIDIA:5 GP104 [GeForce GTX 1080]
20:45:40:        GPU 2: Bus:8 Slot:0 Func:0 NVIDIA:5 GP104 [GeForce GTX 1080]
20:45:40:        GPU 3: Bus:9 Slot:0 Func:0 NVIDIA:5 GP104 [GeForce GTX 1080]
20:45:40:CUDA Device 0: Platform:0 Device:0 Bus:4 Slot:0 Compute:6.1 Driver:8.0
20:45:40:CUDA Device 1: Platform:0 Device:1 Bus:5 Slot:0 Compute:6.1 Driver:8.0
20:45:40:CUDA Device 2: Platform:0 Device:2 Bus:8 Slot:0 Compute:6.1 Driver:8.0
20:45:40:CUDA Device 3: Platform:0 Device:3 Bus:9 Slot:0 Compute:6.1 Driver:8.0
20:45:40:       OpenCL: Not detected: Failed to open dynamic library 'libOpenCL.so':
20:45:40:               libOpenCL.so: cannot open shared object file: No such file or
20:45:40:               directory
20:45:40:***********************************************************************
20:45:40:<config>
20:45:40:  <!-- Client Control -->
20:45:40:  <fold-anon v='true'/>
20:45:40:
20:45:40:  <!-- Folding Slot Configuration -->
20:45:40:  <gpu v='false'/>
20:45:40:
20:45:40:  <!-- Network -->
20:45:40:  <proxy v=':8080'/>
20:45:40:
20:45:40:  <!-- Slot Control -->
20:45:40:  <power v='full'/>
20:45:40:
20:45:40:  <!-- User Information -->
20:45:40:  <passkey v='********************************'/>
20:45:40:  <team v='212997'/>
20:45:40:  <user v='hiigaran'/>
20:45:40:
20:45:40:  <!-- Folding Slots -->
20:45:40:  <slot id='0' type='CPU'/>
20:45:40:  <slot id='1' type='GPU'>
20:45:40:    <opencl-index v='0'/>
20:45:40:  </slot>
20:45:40:  <slot id='2' type='GPU'>
20:45:40:    <cuda-index v='1'/>
20:45:40:    <opencl-index v='1'/>
20:45:40:  </slot>
20:45:40:  <slot id='3' type='GPU'>
20:45:40:    <cuda-index v='2'/>
20:45:40:    <opencl-index v='2'/>
20:45:40:  </slot>
20:45:40:  <slot id='4' type='GPU'>
20:45:40:    <cuda-index v='3'/>
20:45:40:    <opencl-index v='3'/>
20:45:40:  </slot>
20:45:40:</config>
20:45:40:Switching to user fahclient
20:45:40:Trying to access database...
20:45:41:Successfully acquired database lock
20:45:41:Enabled folding slot 00: READY cpu:4
20:45:41:Enabled folding slot 01: READY gpu:0:GP104 [GeForce GTX 1080]
20:45:41:Enabled folding slot 02: READY gpu:1:GP104 [GeForce GTX 1080]
20:45:41:Enabled folding slot 03: READY gpu:2:GP104 [GeForce GTX 1080]
20:45:41:Enabled folding slot 04: READY gpu:3:GP104 [GeForce GTX 1080]
20:45:41:WU00:FS00:Starting
20:45:41:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 00 -suffix 01 -version 704 -lifeline 1827 -checkpoint 15 -np 4
20:45:41:WU00:FS00:Started FahCore on PID 1837
20:45:41:WU00:FS00:Core PID:1841
20:45:41:WU00:FS00:FahCore 0xa4 started
20:45:42:WU02:FS03:Starting
20:45:42:WU02:FS03:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 02 -suffix 01 -version 704 -lifeline 1827 -checkpoint 15 -gpu-vendor nvidia -opencl-device 2 -cuda-device 2 -gpu 2
20:45:42:WU02:FS03:Started FahCore on PID 1847
20:45:42:WU02:FS03:Core PID:1851
20:45:42:WU02:FS03:FahCore 0x21 started
20:45:42:WU00:FS00:0xa4:
20:45:42:WU00:FS00:0xa4:*------------------------------*
20:45:42:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
20:45:42:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
20:45:42:WU00:FS00:0xa4:
20:45:42:WU00:FS00:0xa4:Preparing to commence simulation
20:45:42:WU00:FS00:0xa4:- Looking at optimizations...
20:45:42:WU00:FS00:0xa4:- Files status OK
20:45:42:WU00:FS00:0xa4:- Expanded 887768 -> 2072336 (decompressed 233.4 percent)
20:45:42:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=887768 data_size=2072336, decompressed_data_size=2072336 diff=0
20:45:42:WU00:FS00:0xa4:- Digital signature verified
20:45:42:WU00:FS00:0xa4:
20:45:42:WU00:FS00:0xa4:Project: 8633 (Run 0, Clone 52, Gen 62)
20:45:42:WU00:FS00:0xa4:
20:45:42:WU00:FS00:0xa4:Assembly optimizations on if available.
20:45:42:WU00:FS00:0xa4:Entering M.D.
20:45:42:WU05:FS04:Starting
20:45:42:WU05:FS04:Removing old file './work/05/logfile_01-20170404-201242.txt'
20:45:42:WU05:FS04:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 05 -suffix 01 -version 704 -lifeline 1827 -checkpoint 15 -gpu-vendor nvidia -opencl-device 3 -cuda-device 3 -gpu 3
20:45:42:WU05:FS04:Started FahCore on PID 1855
20:45:42:WU05:FS04:Core PID:1859
20:45:42:WU05:FS04:FahCore 0x21 started
20:45:42:WU04:FS02:Starting
20:45:42:WU04:FS02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 04 -suffix 01 -version 704 -lifeline 1827 -checkpoint 15 -gpu-vendor nvidia -opencl-device 1 -cuda-device 1 -gpu 1
20:45:42:WU04:FS02:Started FahCore on PID 1860
20:45:42:WU04:FS02:Core PID:1864
20:45:42:WU04:FS02:FahCore 0x21 started
20:45:43:WU01:FS01:Starting
20:45:43:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 704 -lifeline 1827 -checkpoint 15 -gpu-vendor nvidia -opencl-device 0 -cuda-device 0 -gpu 0
20:45:43:WU01:FS01:Started FahCore on PID 1865
20:45:43:WU01:FS01:Core PID:1869
20:45:43:WU01:FS01:FahCore 0x21 started
20:45:44:WU05:FS04:0x21:*********************** Log Started 2017-04-04T20:45:43Z ***********************
20:45:44:WU05:FS04:0x21:Project: 10496 (Run 162, Clone 18, Gen 42)
20:45:44:WU05:FS04:0x21:Unit: 0x0000003e8ca304f556bbb19331942678
20:45:44:WU05:FS04:0x21:CPU: 0x00000000000000000000000000000000
20:45:44:WU05:FS04:0x21:Machine: 4
20:45:44:WU05:FS04:0x21:Reading tar file core.xml
20:45:44:WU05:FS04:0x21:Reading tar file system.xml
20:45:44:WU02:FS03:0x21:*********************** Log Started 2017-04-04T20:45:44Z ***********************
20:45:44:WU02:FS03:0x21:Project: 9196 (Run 1, Clone 50, Gen 368)
20:45:44:WU02:FS03:0x21:Unit: 0x000001efab40415c57cb3f32bfc1f20e
20:45:44:WU02:FS03:0x21:CPU: 0x00000000000000000000000000000000
20:45:44:WU02:FS03:0x21:Machine: 3
20:45:44:WU02:FS03:0x21:Digital signatures verified
20:45:44:WU02:FS03:0x21:Folding@home GPU Core21 Folding@home Core
20:45:44:WU02:FS03:0x21:Version 0.0.18
20:45:44:WU01:FS01:0x21:*********************** Log Started 2017-04-04T20:45:44Z ***********************
20:45:44:WU01:FS01:0x21:Project: 9178 (Run 15, Clone 15, Gen 226)
20:45:44:WU01:FS01:0x21:Unit: 0x00000138ab436c6957b24c2a0ac9ed8f
20:45:44:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
20:45:44:WU01:FS01:0x21:Machine: 1
20:45:44:WU01:FS01:0x21:Digital signatures verified
20:45:44:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
20:45:44:WU01:FS01:0x21:Version 0.0.18
20:45:44:WU04:FS02:0x21:*********************** Log Started 2017-04-04T20:45:44Z ***********************
20:45:44:WU04:FS02:0x21:Project: 10496 (Run 102, Clone 14, Gen 73)
20:45:44:WU04:FS02:0x21:Unit: 0x000000568ca304f556bbad604c30b42a
20:45:44:WU04:FS02:0x21:CPU: 0x00000000000000000000000000000000
20:45:44:WU04:FS02:0x21:Machine: 2
20:45:44:WU04:FS02:0x21:Digital signatures verified
20:45:44:WU04:FS02:0x21:Folding@home GPU Core21 Folding@home Core
20:45:44:WU04:FS02:0x21:Version 0.0.18
20:45:45:WU04:FS02:0x21:  Found a checkpoint file
20:45:46:WU02:FS03:0x21:  Found a checkpoint file
20:45:46:WU05:FS04:0x21:Reading tar file integrator.xml
20:45:46:WU05:FS04:0x21:Reading tar file state.xml
20:45:46:WU05:FS04:0x21:Digital signatures verified
20:45:46:WU05:FS04:0x21:Folding@home GPU Core21 Folding@home Core
20:45:46:WU05:FS04:0x21:Version 0.0.18
20:45:46:WU01:FS01:0x21:  Found a checkpoint file
20:45:48:WU00:FS00:0xa4:Using Gromacs checkpoints
20:45:49:WARNING:FS02:Size of positions 18948 does not match topology 18865
20:45:50:WARNING:FS02:Size of positions 18948 does not match topology 18865
20:45:50:WARNING:FS02:Size of positions 18948 does not match topology 18865
20:45:50:WARNING:FS02:Size of positions 18948 does not match topology 18865
20:45:50:WARNING:FS02:Size of positions 18948 does not match topology 18865
20:45:50:WU00:FS00:0xa4:Resuming from checkpoint
20:45:50:WU00:FS00:0xa4:Verified 00/wudata_01.log
20:45:51:WU00:FS00:0xa4:Verified 00/wudata_01.trr
20:45:51:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
20:45:51:WU00:FS00:0xa4:Verified 00/wudata_01.edr
20:45:51:WU00:FS00:0xa4:Completed 856830 out of 1250000 steps  (68%)
20:45:59:WU05:FS04:FahCore returned: INTERRUPTED (102 = 0x66)

Re: GPU slot continuously returns INTERRUPTED

Postby bollix47 » Tue Apr 04, 2017 10:52 pm

20:45:40: OpenCL: Not detected: Failed to open dynamic library 'libOpenCL.so':
20:45:40: libOpenCL.so: cannot open shared object file: No such file or
20:45:40: directory


How were the drivers installed? The installer from NVIDIA also installs OpenCL, but packages from other sources may not.
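
A quick way to see what the dynamic loader knows about is to filter the output of ldconfig -p. A minimal sketch, assuming a glibc-based Linux where ldconfig is available; the function name list_opencl_libs is made up for illustration. It reads the listing on stdin so it can be tried on saved output as well.

```shell
# Print the names of any OpenCL libraries found in an `ldconfig -p` listing.
# Reads the listing on stdin (hypothetical helper name).
list_opencl_libs() {
    awk '$1 ~ /^libOpenCL\.so/ { print $1 }'
}

# Usage:
#   ldconfig -p | list_opencl_libs
```

The client log says it tried to open 'libOpenCL.so', so the name to look for in the output is the unversioned one. If nothing is printed at all, the loader cache has no OpenCL library.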
bollix47
 
Posts: 3319
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: GPU slot continuously returns INTERRUPTED

Postby hiigaran » Tue Apr 04, 2017 11:54 pm

The instructions I followed had these commands:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt-get install nvidia-370 nvidia-settings
sudo apt-get install mesa-common-dev
sudo apt-get install freeglut3-dev


That being said, the other three cards are working just fine. It's just this one card. I've got two identical systems, each with four cards. Just one system, with just one card, on just one slot is having this problem. So I don't know if drivers are the cause of this particular problem.

Re: GPU slot continuously returns INTERRUPTED

Postby _r2w_ben » Wed Apr 05, 2017 2:05 am

Have you tried physically swapping two of the cards to see if the problem moves with the card or is dependent on the PCIe slot?
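
Before pulling cards, it may help to know which physical card the failing slot maps to. The startup log above lists folding slot 04 as gpu:3, and GPU 3 at Bus:9. A small sketch that pulls the bus number for a given GPU index out of a saved copy of the log; the function name gpu_bus and the log path in the usage line are assumptions.

```shell
# Print the PCI bus number the client logged for a given GPU index.
# $1 = path to a FAHClient log file, $2 = GPU index as shown in "GPU N:" lines
# (hypothetical helper, relies on the startup banner format shown above)
gpu_bus() {
    sed -n "s/.*GPU $2: Bus:\([0-9]*\) .*/\1/p" "$1" | sort -u
}

# Usage (log location is an assumption; adjust to your install):
#   gpu_bus /var/lib/fahclient/log.txt 3
```

The result can be compared against lspci output to find the card. Note that lspci prints bus numbers in hexadecimal, and I am not sure which base the client log uses, so treat values above 9 with care.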
_r2w_ben
 
Posts: 115
Joined: Wed Apr 23, 2008 3:11 pm

Re: GPU slot continuously returns INTERRUPTED

Postby bruce » Wed Apr 05, 2017 3:46 am

1) Where is libOpenCL.so? Is it accessible through the path, or is it somewhere that FAH can see, such as the CWD listed in FAH's startup?

2) I suggest you discard Project: 10496 (Run 162, Clone 18, Gen 42), AKA WU05 on your system. Several people have attempted to run this same WU and all have failed. I'll mark it as a corrupt WU and it shouldn't be assigned again after 8am Pacific time.

3) If you then have the same problem with another WU on that GPU, here are some other things you can try.

I'm not certain, but you MAY have to reinstall the drivers AFTER installing the last GPU and you MAY have to reinstall FAHClient after all of that. I'd first pause FAH, then go through the process of reinstalling drivers and software before reactivating FAH's folding function. Depending on how the installers are configured, you may also have to reboot once or twice, but Linux is a lot better about that than Windows.

Let us know if this helps.

Re: GPU slot continuously returns INTERRUPTED

Postby hiigaran » Wed Apr 05, 2017 4:14 am

How do I discard the WU?

Re: GPU slot continuously returns INTERRUPTED

Postby bruce » Wed Apr 05, 2017 4:22 am

The official method is to run FAHClient --dump 05
(note the double dash after the space)


The unofficial method is to pause that slot and delete the subdirectory 05 from inside of the work directory.
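
For the unofficial method, something along these lines should work once the slot is paused. It is only a sketch: the function name discard_wu is made up, and the work directory location is inferred from the posted log (CWD /var/lib/fahclient and a file under ./work/05/), so check it on your own system first.

```shell
# Delete one work-unit subdirectory from a FAHClient work directory.
# $1 = work directory, $2 = two-digit WU id such as 05
# (hypothetical helper; pause the slot before calling it)
discard_wu() {
    work_dir=$1
    wu_id=$2
    # Refuse anything that is not exactly two digits, so a typo cannot
    # turn into a path outside the work directory.
    case $wu_id in
        [0-9][0-9]) ;;
        *) echo "WU id must be two digits, got '$wu_id'" >&2; return 1 ;;
    esac
    if [ ! -d "$work_dir/$wu_id" ]; then
        echo "no such directory: $work_dir/$wu_id" >&2
        return 1
    fi
    rm -rf -- "$work_dir/$wu_id"
}

# Usage (path is an assumption based on the log; needs write permission there):
#   discard_wu /var/lib/fahclient/work 05
```

The two-digit check is there because the function ends in rm -rf, and it is cheap insurance against passing the wrong argument.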

Re: GPU slot continuously returns INTERRUPTED

Postby hiigaran » Wed Apr 05, 2017 6:26 am

Command didn't do anything. Am I supposed to stop FAHClient first? If so, how?

As for the second method, where is the directory? Haven't found it in /etc/fahclient, or in /home/user/.FAH. The former contains a single .xml file, and the latter a single .db file. ls -a shows no hidden files. I'm guessing there has to be another directory.

Re: GPU slot continuously returns INTERRUPTED

Postby Joe_H » Wed Apr 05, 2017 7:19 am

Your current working directory is shown in the log you posted as /var/lib/fahclient; that is the first place to check. Your config file is shown as being in /etc/fahclient. I forget what else, if anything, the client stores there by default.
Joe_H
Site Admin
 
Posts: 3890
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

Re: GPU slot continuously returns INTERRUPTED

Postby SteveWillis » Wed Apr 05, 2017 11:21 am

for me it's in /var/lib/fahclient/work
after I delete the directory I restart the client
sudo /etc/init.d/FAHClient stop
sudo /etc/init.d/FAHClient start
SteveWillis
 
Posts: 231
Joined: Fri Apr 15, 2016 12:42 am

Re: GPU slot continuously returns INTERRUPTED

Postby bruce » Wed Apr 05, 2017 3:10 pm

SteveWillis wrote:for me it's in /var/lib/fahclient/work
after I delete the directory I restart the client
sudo /etc/init.d/FAHClient stop
sudo /etc/init.d/FAHClient start

(or just sudo /etc/init.d/FAHClient restart)

If you use the unofficial method, once the subdirectory is gone, you shouldn't need to restart ... just unpause the slot and it should recover.

If you did it before 8am, FAH may re-download the same WU.

Re: GPU slot continuously returns INTERRUPTED

Postby SteveWillis » Wed Apr 05, 2017 3:19 pm

I was thinking that for some reason it didn't like "restart", but I might have been thinking about something else. Computers can be so finicky.

Re: GPU slot continuously returns INTERRUPTED

Postby hiigaran » Wed Apr 05, 2017 10:34 pm

Alrighty, everything is working as it should be after the delete. Thanks guys.

