
Lots of problems with random FAH crashes on GPU (RX 5600 XT)

Posted: Mon Apr 06, 2020 4:03 pm
by RiaSkies
I don't know if this is a driver or a BIOS issue, or if F@H Core v22 just doesn't like the 5600 XT, or if my card is just butts.

Note: This is (primarily) concerning the FAH core crashing, becoming unstable, or otherwise just not working. At this time, while I have encountered one blue-screen, I have not noticed full system or GPU crashes as a result of running FAH, but I haven't been able to successfully fold on my GPU.

I've had seemingly every issue in the book:

- clCreateCommandQueue (-6) errors getting the program started
- Reaching bad (NaN) states and being unable to load from a checkpoint
- Random crashes in the F@H core
- Core refuses to boot up and no log
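
For reference, the (-6) in the first item above is an OpenCL status code. Below is a minimal Python sketch for decoding such codes when reading logs. The values are the standard ones from the Khronos cl.h header, and only a small subset is listed; the helper name describe_cl_error is hypothetical and not part of F@H or any OpenCL binding.

```python
# Subset of OpenCL status codes as defined in the Khronos cl.h header.
# Only the first few runtime errors are included here.
OPENCL_STATUS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}


def describe_cl_error(code: int) -> str:
    """Return the symbolic name for an OpenCL status code (hypothetical helper)."""
    return OPENCL_STATUS.get(code, f"unknown OpenCL status ({code})")


if __name__ == "__main__":
    # The code reported alongside clCreateCommandQueue in this post.
    print(describe_cl_error(-6))
```

The symbolic name only says what the OpenCL runtime reported. It does not by itself show whether the driver, the card, or the F@H core is at fault.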

Some sample log files can be found here: viewtopic.php?f=61&t=33972

I'm running a Sapphire Pulse graphics card at factory-stock settings, and it was able to run multiple consecutive runs of FurMark (max temp on the last run was 75 C) without any system errors, benchmark shutdowns, or other evidence of instability. I have downloaded the latest AMD drivers, and GPU-Z is detecting that OpenCL support is present on the card.

I'm sort of at wit's end; right now, I have disabled GPU folding since I have yet to successfully fold a protein and constantly submitting bad WU's doesn't help advance the science, but I'd very much like to be able to use this card for the betterment of kicking CoVID in the teeth.

GPU-Z Profile: https://i.imgur.com/AEZoBdO.gif

From what I've seen, there are some issues being reported with GCN-based cards but this is a Navi-based card which is having the issues, so... Not sure where to go from here. Maybe a BIOS flash would fix this?

Re: Lots of problems with random FAH crashes on GPU (RX 5600

Posted: Mon Apr 06, 2020 5:40 pm
by Joe_H
The reference clock setting for a 5600 XT is 1375 with a 1750 boost. Are the Sapphire factory settings higher than that? If so, try reducing the clock speed to the reference values.

Some others using Navi-based cards have posted that they needed to adjust fan and power settings; some searches here should turn those posts up. Also, someone posted recently about one of the latest drivers from AMD having issues with F@H.

Re: Lots of problems with random FAH crashes on GPU (RX 5600

Posted: Tue Apr 07, 2020 6:42 am
by foldy
This is another heavy benchmark you could try running to reproduce the issue:
https://benchmark.unigine.com/superposition

You can limit power usage of the GPU to see if that is the problem.

Re: Lots of problems with random FAH crashes on GPU (RX 5600

Posted: Tue Apr 07, 2020 1:05 pm
by RiaSkies
It seems like increasing the fan speed considerably and lowering power usage to -43% was enough to stabilize F@H to run some GPU folds on this computer, though at some loss to performance. Hopefully new driver development from AMD will be forthcoming that addresses the black screen issue.

Re: Lots of problems with random FAH crashes on GPU (RX 5600

Posted: Tue Apr 07, 2020 7:21 pm
by toTOW
Don't forget the list of known issues in the latest drivers (though this one existed before):
Known Issues
Running Folding@Home while also running an application using hardware acceleration of video content can cause a system hang or black screen. A potential workaround is disabling hardware acceleration for the application that has it enabled.

Re: Lots of problems with random FAH crashes on GPU (RX 5600

Posted: Fri Apr 24, 2020 7:40 pm
by DarkFoss
Try the new AMD 20.4.2 driver; I had those errors too with my Fury X. https://www.amd.com/en/support/kb/relea ... win-20-4-2
Fixed Issue: Radeon RX Vega series graphics products may experience a system hang or black screen when running Folding@Home while also running an application using hardware acceleration of video content.

Project: 11761 (Run 0, Clone 10697, Gen 12)
Project: 14562 (Run 0, Clone 157, Gen 21)
Project: 14561 (Run 0, Clone 828, Gen 1)
Project: 14561 (Run 0, Clone 1905, Gen 0)
Project: 14561 (Run 0, Clone 1173, Gen 2)
Project: 14563 (Run 0, Clone 990, Gen 3)

All of the above completed with zero errors. Project 14561 has 438,651 atoms, 14562 has 371,771 atoms, and 14563 has 448,584 atoms, so that error is gone as well; previously, anything above 165k atoms would fail immediately. The third one listed had actually been failed by an Nvidia card.
The new COVID-19 WUs run hot; all I had to do was increase the minimum fan speed from 500 to 975.