GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_UNIT

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Postby m1geo » Wed Apr 01, 2020 4:24 am

Hey, thanks for the confirmation. I've been doing some digging, and I noticed something weird. As the GPU steps up through the power levels, the FAHBench (https://fahbench.github.io/) benchmark falls over as soon as the GPU enters power level 3 (P3). I have never overclocked the GPU (GTX 1070), and the PSU is more than big enough to handle the 150 W dissipation.
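
One way to check the observation above is to log the performance state alongside power draw and clocks while FAHBench runs, then look at what changed just before the failure. The sketch below is a hypothetical helper, not something posted in the thread. It assumes a single NVIDIA GPU at index 0 and `nvidia-smi` on the PATH, and the function names (`parse_samples`, `pstate_changes`, `sample_once`) are made up for illustration. The query fields themselves (`timestamp`, `pstate`, `power.draw`, `clocks.sm`, `clocks.mem`, `temperature.gpu`) are standard `nvidia-smi --query-gpu` properties.

```python
import csv
import io
import subprocess

# Properties requested from nvidia-smi, in the order they appear in each row.
FIELDS = ["timestamp", "pstate", "power.draw", "clocks.sm", "clocks.mem", "temperature.gpu"]


def parse_samples(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(record) != len(FIELDS):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def pstate_changes(rows):
    """Return (previous, current, row) for each sample where the P-state changed."""
    changes = []
    for prev, cur in zip(rows, rows[1:]):
        if prev["pstate"] != cur["pstate"]:
            changes.append((prev["pstate"], cur["pstate"], cur))
    return changes


def sample_once(gpu_index=0):
    """Take one live sample from one GPU (assumes the NVIDIA driver is installed)."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_samples(out)
```

Calling `sample_once()` about once a second while the benchmark runs, and passing the accumulated rows to `pstate_changes()`, should show which transition lines up with the crash.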

The GPU will happily sit running a hacked-about version of the matrix multiply (just the multiply operation in an infinite loop, recompiled):
Image

Again, thanks for the help.
m1geo
 
Posts: 10
Joined: Tue Mar 31, 2020 3:07 am
Location: Cambridge, UK

Postby Joe_H » Wed Apr 01, 2020 7:58 am

The GPU folding core does not use CUDA; it uses OpenCL.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 6435
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Postby m1geo » Wed Apr 01, 2020 2:20 pm

Yeah, I realise this. It was more of a proof of concept that the GPU is there, will talk to the PC, and will run something without crashing.

Postby m1geo » Wed Apr 01, 2020 6:46 pm

Some more debugging with FAHBench https://fahbench.github.io/...

The GPU benchmark runs to about 10% before falling over with either a "NaN" error or some seemingly random exception (usually from clEnqueueMapBuffer).

Image

Image

I'm not too sure what to do with this information, or how to debug further...
Thanks...

Postby toTOW » Thu Apr 02, 2020 12:18 am

That's not a sign that your GPU is in good shape ... :(
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Site Moderator
 
Posts: 5616
Joined: Sun Dec 02, 2007 11:38 am
Location: Bordeaux, France

Postby m1geo » Thu Apr 02, 2020 2:01 am

That's what I thought.

I'm new to GPUs. I'm an electronic engineer, so I don't know the details of GPUs, Nvidia settings, parameters, etc., but I have a pragmatic, considered approach.

What I have learned this evening is that reducing the power limit makes things behave, and I can complete the test.

Image

My current working theory is that there is either a power issue or a clock-speed issue, and that the lower power limit keeps the GPU from entering the state that triggers it. The GPU came from a friend, so maybe he dabbled with overclocking it in the past and didn't remember. Thanks all.
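
The thread does not say which tool was used to lower the power limit. As a rough sketch, assuming `nvidia-smi` is available, the same experiment could be scripted as below. The `-pl` option and the `power.min_limit`, `power.max_limit` and `power.default_limit` query fields are real `nvidia-smi` features; the helper names (`parse_limits`, `read_limits`, `power_limit_command`) are hypothetical, and actually changing the limit normally needs root.

```python
import subprocess

QUERY = "power.min_limit,power.max_limit,power.default_limit"


def parse_limits(text):
    """Parse one line of `--format=csv,noheader,nounits` output into watts."""
    low, high, default = (float(value) for value in text.strip().split(","))
    return {"min": low, "max": high, "default": default}


def read_limits(gpu_index=0):
    """Ask the driver for the allowed power-limit range (assumes nvidia-smi on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_limits(out)


def power_limit_command(watts, limits, gpu_index=0):
    """Build the nvidia-smi command for a limit clamped to the allowed range."""
    clamped = max(limits["min"], min(watts, limits["max"]))
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", f"{clamped:g}"]
```

Stepping the limit down a little at a time and rerunning FAHBench after each change would show roughly where the card starts to behave. The command list is returned rather than executed so it can be inspected before being passed to `subprocess.run`.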

Postby ipkh » Thu Apr 02, 2020 3:55 am

I've never managed to get a bad GPU to work via downclocking. And I have sent at least 4 cards back for warranty service in the past 5 years. 24/7 folding just has a habit of revealing faults with graphics cards.
You should definitely contact the manufacturer about warranty status on that card.
ipkh
 
Posts: 134
Joined: Thu Jul 16, 2015 3:03 pm

Postby m1geo » Thu Apr 02, 2020 1:50 pm

Thanks for the heads up.

Using Unigine Heaven (https://benchmark.unigine.com/heaven) on a loop, I am able to run the GPU at:
* Graphics: -70
* Memory: +500

However, a 10-minute FAHBench session doesn't like that. For FAHBench, I need to run:
* Graphics: -90
* Memory: +300
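
The post does not say how these offsets were applied. On Linux one common route is `nvidia-settings`, so the sketch below builds the two commands for the FAHBench-stable values. Several things here are assumptions rather than facts from the thread: that offsets are set through the `GPUGraphicsClockOffset` and `GPUMemoryTransferRateOffset` attributes, that the top performance level on this card is index 3, and that clock control has been enabled in the X configuration (the Coolbits option). The function name `offset_commands` is hypothetical.

```python
def offset_commands(graphics_offset, memory_offset, gpu=0, perf_level=3):
    """Build nvidia-settings commands for the given clock offsets.

    perf_level is an assumption: the index of the highest performance
    level differs between cards and driver versions.
    """
    target = f"[gpu:{gpu}]"
    return [
        ["nvidia-settings", "-a",
         f"{target}/GPUGraphicsClockOffset[{perf_level}]={graphics_offset}"],
        ["nvidia-settings", "-a",
         f"{target}/GPUMemoryTransferRateOffset[{perf_level}]={memory_offset}"],
    ]


# The values reported as stable under FAHBench in the post above.
fahbench_stable = offset_commands(-90, 300)
```

Each inner list can be passed to `subprocess.run` from within the running X session.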

The card is an Asus ROG Strix GTX 1070 O8G with a heavy factory overclock, so I guess I'm just reducing that default a little. If I had a Windows key, I'd check to see how it performed on Windows. Maybe worth a shot.

Thanks for the help/advice.

Postby kevinjos » Thu Apr 02, 2020 4:05 pm

m1geo - I do not see the same compile error reported by florinandrei. I think we may be dealing with separate issues. Do you consistently see the BAD_WORK_UNIT warning?

florinandrei - have you tried to compile the CUDA samples as shown by m1geo above?
kevinjos
 
Posts: 4
Joined: Sun Mar 29, 2020 10:04 pm

Postby toTOW » Fri Apr 03, 2020 12:59 am

Rule number one for overclocking with FAH: don't touch the VRAM clocks! It creates more issues than it adds performance.

However, factory overclocks should work ... if they don't, then the card needs an RMA.

Does the card run fine in FurMark with the manufacturer's default clocks?

Postby m1geo » Fri Apr 03, 2020 2:16 pm

I finally found my issue. It's bizarre! One of the fan bearings has failed. When the controller tried to spin the fans up, the fan would spin a bit, then jam, then drag the 12V rail down on the GPU. That caused all kinds of weirdness. After simply unplugging that one fan, the card works fine. I have ordered 3 new fans. Thanks for the patience and the advice!

Postby Neil-B » Fri Apr 03, 2020 2:44 pm

Wow … damn good spot/catch … at least fans are (I believe) cheaper than a new GPU card!!
1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent, Quadro K420 1GB, FAH 7.6.13
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro, Quadro M1000M 2GB, FAH 7.6.13
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro, GTX 750Ti 2GB, FAH 7.6.13
Neil-B
 
Posts: 1180
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Postby toTOW » Sat Apr 04, 2020 1:35 pm

That was a nasty one ... :(

Postby kevinjos » Sat Apr 04, 2020 7:54 pm

Right on, good catch! I'm curious whether florinandrei was able to sort out their compile error. Is there a set of standard programs to test the system's ability to compile OpenCL code?

Postby Roadpower » Sun Apr 05, 2020 2:02 pm

Nice catch indeed.
Roadpower
 
Posts: 71
Joined: Mon Mar 16, 2020 6:11 pm
