Multiple Issues with AMD GPU Processing?

It seems that a lot of GPU problems revolve around specific versions of drivers. Though AMD has their own support structure, you can often learn from information reported by others who fold.

Moderators: Site Moderators, FAHC Science Team

Re: Multiple Issues with AMD GPU Processing?

Postby mwroggenbuck » Wed Apr 08, 2020 1:06 pm

A quick update.

I think I am seeing something different than other people. This morning, Einstein at home caused a Radeon control crash and reset. This was followed by no more work being done in the GPU (no power or utilization).

So apparently, my issue is not isolated to FAH.

This is a new card. It is possible something is wrong with it, but I have not had any problems outside of OpenCL software. All stress tests that I run leave the GPU below 75 degrees. I have been running Einstein at home for several days before this (by itself) without issues, but it does tend to have less utilization than FAH.

In any event, I am going to discontinue OpenCL processing until I have had more time to think about this.

If I do run FAH any more, I will leave the log level at 3 and make sure to save my log file.
mwroggenbuck
 
Posts: 74
Joined: Tue Mar 24, 2020 1:47 pm

Re: Multiple Issues with AMD GPU Processing?

Postby kwthom » Wed Apr 08, 2020 10:38 pm

An update...

Appreciate all of the pointers to other threads here where I could learn a bit more about the AMD issues - ugh.

Anyhow...a magic button somewhere must have been pressed. Every time I'm in here looking at my Client Advanced Control screen, I've had a WU crunching away on my GPU.

Nice!
kwthom
 
Posts: 23
Joined: Mon Mar 30, 2020 12:06 am
Location: Jaynes Station, AZ

Re: Multiple Issues with AMD GPU Processing?

Postby mwroggenbuck » Wed Apr 15, 2020 1:30 pm

I actually have another thread about this, but it turned out my problem was that the GPU could not run the FAH software at the rated GPU clock speed. I reduced the speed by 10%, and it worked fine. I have an RMA and will get a new card. It will be interesting to see if it acts differently.
mwroggenbuck
 
Posts: 74
Joined: Tue Mar 24, 2020 1:47 pm

Re: Multiple Issues with AMD GPU Processing?

Postby kwthom » Sun May 03, 2020 12:16 am

mwroggenbuck wrote:<...>but it turned out my problem was that the GPU could not run the FAH software at the rated GPU clock speed. I reduced the speed by 10%, and it worked fine. <...>

I read this, then really started mucking about with settings...

I now have a stable (I think...) set of settings, but it is underclocked and undervolted by a bit.

A solid day or two will then confirm this, then I can start tweaking as the WU's are now coming a bit more regularly these days.
kwthom
 
Posts: 23
Joined: Mon Mar 30, 2020 12:06 am
Location: Jaynes Station, AZ

Re: Multiple Issues with AMD GPU Processing?

Postby bruce » Sun May 03, 2020 1:07 am

mwroggenbuck wrote:This morning, Einstein at home caused a Radeon control crash and reset.
FAH has absolutely no connection with Einstein@home. We can't provide any kind of support for their projects. They may or may not use the shortlist, so any connection you draw between your Einstein@home crash and the information provided on the previous pages is entirely your responsibility.

Changing your clock rate won't bypass that problem, but it certainly could bypass some other problems.
bruce
 
Posts: 19656
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: Multiple Issues with AMD GPU Processing?

Postby mwroggenbuck » Sun May 03, 2020 8:51 pm

Update: my new card works fine. The old card had a definite stability problem at the clock rate it was supposed to be able to use.

I realize that FAH and Einstein@home are different programs, but they both exercise OpenCL and the GPU. I was not seeing the shortlist problem that initiated this thread. The fact that my error finally occurred outside of FAH made me believe the problem was not FAH.

I apologize if I sent anyone down the wrong path.

Ultimately, I was fighting two different issues: 1) unstable hardware, 2) Anti-virus that locked a file FAH needed to rename.
mwroggenbuck
 
Posts: 74
Joined: Tue Mar 24, 2020 1:47 pm

Re: Multiple Issues with AMD GPU Processing?

Postby kwthom » Mon Jun 01, 2020 3:28 am

I've tweaked a few more things; things are running a bit better.

Yet, I still get periodic shutdowns. The last couple have been related to CPU crunching - weird.

No, I've not saved anything from my last 'unplanned termination event', but a general question...

Is there a public accessible repository of the WU's my system(s) have crunched?
kwthom
 
Posts: 23
Joined: Mon Mar 30, 2020 12:06 am
Location: Jaynes Station, AZ

Re: Multiple Issues with AMD GPU Processing?

Postby PantherX » Mon Jun 01, 2020 10:00 am

kwthom wrote:...Is there a public accessible repository of the WU's my system(s) have crunched?

Not officially. However, you can either:
1) Save your log files and get the PRCG details from it.
2) Use HFM.NET to maintain a WU database across your clients (https://github.com/harlam357/hfm-net)
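
For option 1, here is a minimal Python sketch of pulling the PRCG (Project, Run, Clone, Gen) values out of saved log text. It assumes the log prints lines shaped like "Project: 16435 (Run 3, Clone 12, Gen 45)"; that format, the sample lines, and project 14542 are all assumptions for illustration, so check them against your own log before relying on this.

```python
import re
from typing import List, NamedTuple


class PRCG(NamedTuple):
    project: int
    run: int
    clone: int
    gen: int


# Assumed line shape: "...:Project: 16435 (Run 3, Clone 12, Gen 45)"
PRCG_PATTERN = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)


def extract_prcg(log_text: str) -> List[PRCG]:
    """Return each distinct PRCG found in the text, in first-seen order."""
    seen = {}
    for match in PRCG_PATTERN.finditer(log_text):
        prcg = PRCG(*(int(group) for group in match.groups()))
        seen.setdefault(prcg, None)  # dict keeps insertion order, drops repeats
    return list(seen)


# Made-up sample lines, not copied from a real log.
sample_log = """\
01:02:03:WU00:FS01:0x22:Project: 16435 (Run 3, Clone 12, Gen 45)
01:02:04:WU00:FS01:0x22:Unit: 0x0000002d
09:10:11:WU01:FS00:0xa7:Project: 14542 (Run 0, Clone 7, Gen 9)
09:10:12:WU00:FS01:0x22:Project: 16435 (Run 3, Clone 12, Gen 45)
"""

for prcg in extract_prcg(sample_log):
    print(prcg)
```

To use it on a real file, read the saved log into a string and pass that to extract_prcg in place of sample_log.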
PantherX
Site Moderator
 
Posts: 6327
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: Multiple Issues with AMD GPU Processing?

Postby Crawdaddy79 » Tue Jun 02, 2020 2:04 am

kwthom wrote:I've tweaked a few more things; things are running a bit better.

Yet, I still get periodic shutdowns. The last couple have been related to CPU crunching - weird.

No, I've not saved anything from my last 'unplanned termination event', but a general question...

Is there a public accessible repository of the WU's my system(s) have crunched?


I painstakingly built out a spreadsheet over the course of two weeks to try to find a pattern in my crashes. I found that Project 16435, for whatever reason, was the project that failed on my system (causing a crash) by a wide margin. Often, if the CPU was folding at the time, that work unit would come back with Guru Meditation errors and get dumped, while the GPU would pick up where it left off. But if it crashes once, it inevitably crashes again and again until it fails; I have about a 60% success rate of finishing these. It's enough that when I notice I've picked up 16435, I pause the CPU slot just to preserve the work. I have never had a crash when the CPU is folding by itself.

I hope at least some portion of my post is helpful.
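
For anyone who would rather script this than keep a spreadsheet, the tally could be sketched in Python roughly like this. It is not the actual spreadsheet: the record layout (project number plus a completed flag), the function name, and all of the sample rows are assumptions made up for illustration, and projects 11111 and 22222 are placeholders.

```python
from collections import defaultdict


def failure_rates(records):
    """Map each project to (failures, total, failure_rate).

    records is an iterable of (project, completed) pairs, where
    completed is True if the WU finished and False if it failed.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for project, completed in records:
        totals[project] += 1
        if not completed:
            failures[project] += 1
    return {
        project: (failures[project], totals[project],
                  failures[project] / totals[project])
        for project in totals
    }


# Made-up rows for illustration only.
records = [
    (16435, False), (16435, True), (16435, True),
    (16435, False), (16435, True),
    (11111, True), (11111, True),
    (22222, True),
]

rates = failure_rates(records)

# List the worst offenders first.
for project, (failed, total, rate) in sorted(
    rates.items(), key=lambda item: item[1][2], reverse=True
):
    print(f"Project {project}: {failed}/{total} failed ({rate:.0%})")
```

Sorting by failure rate puts any project that stands out at the top, which is the same pattern the spreadsheet was built to surface.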
Crawdaddy79
 
Posts: 69
Joined: Sat Mar 21, 2020 4:56 pm

Re: Multiple Issues with AMD GPU Processing?

Postby bruce » Tue Jun 02, 2020 3:49 am

kwthom wrote:Is there a public accessible repository of the WU's my system(s) have crunched?

It's not what you're looking for, but the last WU from each of your slots can be found here:
https://apps.foldingathome.org/cpu

If you've reinstalled FAH, you will find the WUs that have been processed both by the old and the new installation. If you have several machines running FAH, you'll find all of them that use the name that you enter in the User field at the top.

I see five slots all reporting that the last WU was successfully completed and got bonus points.
bruce
 
Posts: 19656
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.
