ETA 10 days on RX550: Project 13416 run 1140 clone 293 gen 1

Moderators: Site Moderators, FAHC Science Team

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Postby GregC » Thu Jul 23, 2020 9:55 am

1) If a WU times out or expires, shouldn't that show on the WU status page, the same as when it fails?

2) I'd be interested to know whether I'm the only one with this issue: the GPU stops and fails to make any progress toward finishing the WU.
- If so, should I RMA the GPU card, or report this as a software bug to AMD or OpenMM?
- If not:
a) Does it follow manufacturer, GPU generation, model, or something else, such as OS?
b) Can these (multiple) WUs be inspected for the reason why they stalled?
GregC
 
Posts: 24
Joined: Wed May 20, 2020 1:36 am
Location: Houston, TX

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Postby _r2w_ben » Thu Jul 23, 2020 12:29 pm

GregC wrote: I have FAHCore_A22 at below normal priority, and FAHCore_A07 at idle. Only 6 of 8 threads are assigned.

Good, that should leave enough CPU time available for FahCore_22.exe.

13416 writes to disk 10x more frequently than 13418. Is the FAH work folder on an SSD? Do you observe any CPU time used by antivirus/antimalware processes that could be causing FahCore_22.exe to pause while scans are happening?
_r2w_ben
 
Posts: 281
Joined: Wed Apr 23, 2008 4:11 pm

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Postby GregC » Thu Jul 23, 2020 1:08 pm

The work folder is on an SSD with plenty of free space. That computer has a fresh install of Windows with Microsoft's default AV, and it isn't doing much besides folding. Process Explorer/System Information shows one free CPU thread when the GPU is actively folding, and two free CPU threads when this stalled WU is sitting there.
GregC
 
Posts: 24
Joined: Wed May 20, 2020 1:36 am
Location: Houston, TX

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Postby _r2w_ben » Thu Jul 23, 2020 10:27 pm

Did you try pausing CPU folding when you had one of these slow work units?
_r2w_ben
 
Posts: 281
Joined: Wed Apr 23, 2008 4:11 pm

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Postby GregC » Fri Jul 24, 2020 12:37 am

At night I run both CPU and GPU folding; during the day, GPU only. I don't think that matters, but I can't be 100% sure, so I will try pausing CPU folding and pay attention to the result.
GregC
 
Posts: 24
Joined: Wed May 20, 2020 1:36 am
Location: Houston, TX

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Postby GregC » Wed Jul 29, 2020 10:27 pm

The Ryzen 3100-based PC was unstable even after I replaced the AMD 550 RX with an Nvidia 1660 Super. The instability showed up as an unexpected reset once a day. I root-caused it to an FCLK that was set too high: 1800 MHz was unstable, 1766 MHz was still unstable but better, and 1733 MHz is stable.

I am now running the AMD 550 RX card in an Intel-based PC without issues.

I will try the AMD 550 RX in the Ryzen 3100-based PC again to confirm the issue is resolved, and will keep you posted.
GregC
 
Posts: 24
Joined: Wed May 20, 2020 1:36 am
Location: Houston, TX

Re: ETA 10 days on RX550: Project 13416 run 1140 clone 293 g

Postby GregC » Thu Sep 24, 2020 3:30 pm

All of this was due to hardware instability. Case closed.
GregC
 
Posts: 24
Joined: Wed May 20, 2020 1:36 am
Location: Houston, TX
