GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_UNIT

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

m1geo
Posts: 10
Joined: Tue Mar 31, 2020 2:07 am
Location: Cambridge, UK
Contact:

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by m1geo »

Hey, thanks for the confirmation. I've been doing some digging, and I noticed something weird. As the GPU steps up through the power levels, the FAHBench (https://fahbench.github.io/) benchmark falls over as soon as the GPU enters power level 3 (P3). I have never overclocked the GPU (GTX 1070), and the PSU is more than big enough to handle the 150W dissipation.
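One way to see which performance state the card is in when the benchmark fails is to log it alongside power draw, clocks, temperature and fan speed. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and that the installed driver supports these query fields; the names `parse_sample` and `poll` are illustrative and not part of any tool mentioned in this thread.

```python
import csv
import io
import subprocess
import time

# Query fields understood by `nvidia-smi --query-gpu`; run
# `nvidia-smi --help-query-gpu` to see what your driver supports.
FIELDS = ["pstate", "power.draw", "clocks.sm", "temperature.gpu", "fan.speed"]


def parse_sample(csv_line):
    """Turn one CSV line from nvidia-smi into a dict keyed by field name."""
    row = next(csv.reader(io.StringIO(csv_line), skipinitialspace=True))
    return dict(zip(FIELDS, (value.strip() for value in row)))


def poll(interval_s=1.0, gpu_index=0):
    """Print one sample per interval until interrupted with Ctrl+C."""
    cmd = [
        "nvidia-smi",
        f"--id={gpu_index}",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader",
    ]
    while True:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(parse_sample(out.stdout.strip()))
        time.sleep(interval_s)
```

Calling `poll()` in a second terminal while the benchmark runs would show the last state, power draw and fan reading logged before a crash.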

The GPU will happily sit running a hacked-about version of the matrix multiply (just the multiply operation in an infinite loop, recompiled):
Image

Again, thanks for the help.
Joe_H
Site Admin
Posts: 7867
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by Joe_H »

The GPU folding core does not use CUDA, it uses OpenCL.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
m1geo
Posts: 10
Joined: Tue Mar 31, 2020 2:07 am
Location: Cambridge, UK
Contact:

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by m1geo »

Yeah, I realise this. It was more a proof of concept that the GPU is there, will talk to the PC, and will run something without crashing.
m1geo
Posts: 10
Joined: Tue Mar 31, 2020 2:07 am
Location: Cambridge, UK
Contact:

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by m1geo »

Some more debugging with FAHBench (https://fahbench.github.io/)...

The GPU benchmark runs to about 10% before falling over with either a "NaN" error or some random exception (usually from clEnqueueMapBuffer).

Image

Image

I'm not too sure what to do with this information, or how to debug further...
Thanks...
toTOW
Site Moderator
Posts: 6307
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by toTOW »

That's not a sign that your GPU is in good shape ... :(

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
m1geo
Posts: 10
Joined: Tue Mar 31, 2020 2:07 am
Location: Cambridge, UK
Contact:

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by m1geo »

That's what I thought.

I'm new to GPUs. I'm an electronic engineer, so I don't know the details of GPUs, Nvidia settings, parameters, etc., but I have a pragmatic, considered approach.

What I have learned this evening is that reducing the power limit makes things behave, and I can complete the test.

Image

My current working theory is that there is either a power issue or a clock speed issue, and that the lower power limit keeps the GPU out of the state that triggers it. The GPU came from a friend, but maybe he dabbled with overclocking it in the past and didn't remember. Thanks all.
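The post does not say which tool was used to lower the power limit. On Linux, one common route is `nvidia-smi -pl`, and a minimal sketch of wrapping it is shown here. The helper names and the 120 W figure in the usage note are hypothetical; the command usually needs root, and the limit is typically not kept across a reboot.

```python
import subprocess


def power_limit_command(watts, gpu_index=0):
    """Build the nvidia-smi command that caps board power at `watts`."""
    if watts <= 0:
        raise ValueError("power limit must be a positive number of watts")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def set_power_limit(watts, gpu_index=0):
    """Apply the cap by running the command; raises if nvidia-smi fails."""
    subprocess.run(power_limit_command(watts, gpu_index), check=True)
```

For example, `set_power_limit(120)` would ask the driver to cap GPU 0 at 120 W. The driver rejects values outside the range the card allows, which `nvidia-smi -q -d POWER` reports.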
ipkh
Posts: 175
Joined: Thu Jul 16, 2015 2:03 pm

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by ipkh »

I've never managed to get a bad GPU to work via downclocking. And I have sent at least 4 cards back for warranty service in the past 5 years. 24/7 folding just has a habit of revealing faults with graphics cards.
You should definitely contact the manufacturer about warranty status on that card.
m1geo
Posts: 10
Joined: Tue Mar 31, 2020 2:07 am
Location: Cambridge, UK
Contact:

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by m1geo »

Thanks for the heads up.

Using Unigine Heaven (https://benchmark.unigine.com/heaven) on a loop, I am able to run the GPU at:
- Graphics: -70
- Memory: +500

However, a 10 minute FAHBench session doesn't like that. For FAHBench, I need to run:
- Graphics: -90
- Memory: +300

The card is an Asus ROG Strix GTX 1070 O8G with a heavy factory overclock, so I guess I'm just reducing that default a little. If I had a Windows key, I'd check to see how it performed on Windows. Maybe worth a shot.

Thanks for the help/advice.
kevinjos
Posts: 4
Joined: Sun Mar 29, 2020 9:04 pm
Hardware configuration: Igneous - iMac 4GHz 4-Core Intel i7
Valis - 2.2GHz 20-Core Intel Xeon Silver 4114 + 4x Titan V GPUs
Contact:

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by kevinjos »

m1geo - I do not see the same compile error reported by florinandrei. I think we may be dealing with separate issues. Do you consistently see the BAD_WORK_UNIT warning?

florinandrei - have you tried compiling the CUDA samples as shown by m1geo above?
toTOW
Site Moderator
Posts: 6307
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by toTOW »

Rule number one of overclocking with FAH: don't touch the VRAM clocks! It creates more issues than it adds performance.

However, factory overclocks should work ... if they don't, then the card needs an RMA.

Does the card run fine in FurMark with manufacturer default clocks?
m1geo
Posts: 10
Joined: Tue Mar 31, 2020 2:07 am
Location: Cambridge, UK
Contact:

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by m1geo »

I finally found my issue. It's bizarre! One of the fan bearings has failed. When the controller tried to spin the fans up, the fan would spin a bit, then jam, then drag down the 12V rail on the GPU. That caused all kinds of weirdness. With that one fan simply unplugged, the card works fine. I have ordered 3 new fans. Thanks for the patience and the advice!
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by Neil-B »

Wow … damn good spot/catch … at least fans are (I believe) cheaper than a new GPU card !!
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
toTOW
Site Moderator
Posts: 6307
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by toTOW »

That was a nasty one ... :(
kevinjos
Posts: 4
Joined: Sun Mar 29, 2020 9:04 pm
Hardware configuration: Igneous - iMac 4GHz 4-Core Intel i7
Valis - 2.2GHz 20-Core Intel Xeon Silver 4114 + 4x Titan V GPUs
Contact:

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by kevinjos »

Right on, good catch! I'm curious if florinandrei was able to sort out their compile error. Is there a set of standard programs to test the system's ability to compile OpenCL code?
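The thread does not name a standard test suite. As a rough stand-in, a trivial kernel can be built on every OpenCL device the driver exposes. This is a minimal sketch, assuming the third-party `pyopencl` package is installed; `KERNEL_SOURCE` and `try_compile` are illustrative names, and a passing build says nothing about the FAH core's own kernels.

```python
# Trivial OpenCL C kernel: adds two float arrays element-wise.
KERNEL_SOURCE = """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *out)
{
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""


def try_compile(source=KERNEL_SOURCE):
    """Build `source` on every OpenCL device found; return {device name: ok}."""
    # Third-party dependency, imported here so this module loads without it.
    import pyopencl as cl

    results = {}
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            ctx = cl.Context([device])
            try:
                cl.Program(ctx, source).build()
                results[device.name] = True
            except cl.Error as exc:
                print(f"{device.name}: build failed: {exc}")
                results[device.name] = False
    return results
```

Running `print(try_compile())` would report, per device, whether the driver's OpenCL compiler accepted the kernel. Two identical cards share a device name, so they collapse into one entry in this sketch.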
Roadpower
Posts: 71
Joined: Mon Mar 16, 2020 5:11 pm

Re: GTX1060 Linux drv v430 Error compiling kernel BAD_WORK_U

Post by Roadpower »

Nice catch indeed.