Re: Multiple Issues with AMD GPU Processing?

Posted: Wed Apr 08, 2020 12:06 pm
by mwroggenbuck
A quick update.

I think I am seeing something different than other people. This morning, Einstein at home caused a Radeon control crash and reset. After that, the GPU did no more work (no power draw or utilization).

So apparently, my issue is not isolated to FAH.

This is a new card. It is possible something is wrong with it, but I have not had any problems outside of OpenCL software. All stress tests that I run leave the GPU below 75 degrees. I have been running Einstein at home for several days before this (by itself) without issues, but it does tend to have less utilization than FAH.

In any event, I am going to discontinue OpenCL processing until I have had more time to think about this.

If I do run FAH any more, I will leave the log level at 3 and make sure to save my log file.

Re: Multiple Issues with AMD GPU Processing?

Posted: Wed Apr 08, 2020 9:38 pm
by kwthom
An update...

Appreciate all of the pointers to other threads here where I could learn a bit more about the AMD issues - ugh.

Anyhow...a magic button somewhere must have been pressed. Every time I'm in here looking at my Client Advanced Control screen, I've had a WU crunching away on my GPU.

Nice!

Re: Multiple Issues with AMD GPU Processing?

Posted: Wed Apr 15, 2020 12:30 pm
by mwroggenbuck
I actually have another thread about this, but it turned out my problem was that the GPU could not run the FAH software at the rated GPU clock speed. I reduced the speed by 10%, and it worked fine. I have an RMA and will get a new card. It will be interesting to see if it acts differently.

Re: Multiple Issues with AMD GPU Processing?

Posted: Sat May 02, 2020 11:16 pm
by kwthom
mwroggenbuck wrote:<...>but it turned out my problem was that the GPU could not run the FAH software at the rated GPU clock speed. I reduced the speed by 10%, and it worked fine. <...>
I read this, then really started mucking about with settings...

I now have a stable (I think...) set of settings, but it is underclocked and undervolted by a bit.

A solid day or two of running should confirm this; then I can start tweaking, since the WUs are coming a bit more regularly these days.

Re: Multiple Issues with AMD GPU Processing?

Posted: Sun May 03, 2020 12:07 am
by bruce
mwroggenbuck wrote:This morning, Einstein at home caused a Radeon control crash and reset.
FAH has absolutely no connection with Einstein@home. We can't provide any kind of support for their projects. They may or may not use the shortlist, so any connection you draw between their project and the information provided on the previous pages is entirely your responsibility.

Changing your clock rate won't bypass that problem, but it certainly could bypass some other problems.

Re: Multiple Issues with AMD GPU Processing?

Posted: Sun May 03, 2020 7:51 pm
by mwroggenbuck
Update: my new card works fine. There was a definite stability problem with the old card at the clock rate it was supposed to be able to use.

I realize that FAH and Einstein@home are different programs, but they both exercise OpenCL and the GPU. I was not seeing the shortlist problem that initiated this thread. The fact that my error finally occurred outside of FAH made me believe the problem was not FAH.

I apologize if I sent anyone down the wrong path.

Ultimately, I was fighting two different issues: 1) unstable hardware, and 2) antivirus software that locked a file FAH needed to rename.

Re: Multiple Issues with AMD GPU Processing?

Posted: Mon Jun 01, 2020 2:28 am
by kwthom
I've tweaked a few more things; things are running a bit better.

Yet, I still get periodic shutdowns. The last couple have been related to CPU crunching - weird.

No, I've not saved anything from my last 'unplanned termination event', but a general question...

Is there a publicly accessible repository of the WUs my system(s) have crunched?

Re: Multiple Issues with AMD GPU Processing?

Posted: Mon Jun 01, 2020 9:00 am
by PantherX
kwthom wrote:...Is there a publicly accessible repository of the WUs my system(s) have crunched?
Not officially. However, you can either:
1) Save your log files and get the PRCG details from it.
2) Use HFM.NET to maintain a WU database across your clients (https://github.com/harlam357/hfm-net)
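
For option 1, here is a minimal Python sketch of pulling the PRCG (Project, Run, Clone, Gen) values out of a saved log. It assumes the client writes a line of the form `Project: 16435 (Run 3, Clone 12, Gen 7)` for each WU; that line shape and the function name `extract_prcg` are assumptions for illustration, so check your own log and adjust the pattern if it differs.

```python
import re

# Assumed log line shape (the Run/Clone/Gen numbers here are illustrative only):
#   12:06:31:WU01:FS01:0x22:Project: 16435 (Run 3, Clone 12, Gen 7)
PRCG_RE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)


def extract_prcg(lines):
    """Return unique (project, run, clone, gen) tuples in first-seen order."""
    found = {}
    for line in lines:
        match = PRCG_RE.search(line)
        if match:
            prcg = tuple(int(group) for group in match.groups())
            found.setdefault(prcg, None)  # dicts keep insertion order
    return list(found)
```

To use it, open your saved log and pass the file object in, e.g. `with open("log.txt") as f: print(extract_prcg(f))`, where the path is whatever you named the saved copy.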

Re: Multiple Issues with AMD GPU Processing?

Posted: Tue Jun 02, 2020 1:04 am
by Crawdaddy79
kwthom wrote:I've tweaked a few more things; things are running a bit better.

Yet, I still get periodic shutdowns. The last couple have been related to CPU crunching - weird.

No, I've not saved anything from my last 'unplanned termination event', but a general question...

Is there a publicly accessible repository of the WUs my system(s) have crunched?
I painstakingly built a spreadsheet over the course of two weeks to try to find a pattern in my crashes. I found that Project 16435, for whatever reason, was the project that failed on my system (causing a crash) by a wide margin. Often, if the CPU was folding at the time, the CPU work unit would come back with Guru Meditation errors and get dumped, while the GPU would pick up where it left off. However, once it crashes, it inevitably crashes again and again until it fails; I have about a 60% success rate of finishing these. It's enough that when I notice I've picked up 16435, I pause the CPU slot just to preserve the work. I have never had a crash when the CPU is folding by itself.
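
For anyone who would rather script this than keep a spreadsheet, here is a minimal Python sketch of the same kind of per-project tally. It assumes you have already recorded each WU by hand as a `(project, finished)` pair; the record layout and the name `failure_rates` are hypothetical, not anything FAH provides.

```python
from collections import defaultdict


def failure_rates(records):
    """Tally failures per project.

    records: iterable of (project_number, finished) pairs, where finished
    is True if the WU completed and False if it crashed or was dumped.
    Returns {project_number: (failed, total, failed / total)}.
    """
    counts = defaultdict(lambda: [0, 0])  # project -> [failed, total]
    for project, finished in records:
        counts[project][1] += 1
        if not finished:
            counts[project][0] += 1
    return {p: (failed, total, failed / total)
            for p, (failed, total) in counts.items()}
```

Sorting the result by the rate (the third field) puts the projects that fail most often on your hardware at the top, which is the pattern a spreadsheet like this is meant to surface.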

I hope at least some portion of my post is helpful.

Re: Multiple Issues with AMD GPU Processing?

Posted: Tue Jun 02, 2020 2:49 am
by bruce
kwthom wrote:Is there a publicly accessible repository of the WUs my system(s) have crunched?
It's not what you're looking for, but the last WU from each of your slots can be found here:
https://apps.foldingathome.org/cpu

If you've reinstalled FAH, you will find the WUs that have been processed both by the old and the new installation. If you have several machines running FAH, you'll find all of them that use the name that you enter in the User field at the top.

I see five slots all reporting that the last WU was successfully completed and got bonus points.