13421 WUs with abnormally long runtime

Moderators: Site Moderators, FAHC Science Team

Sparkly
Posts: 73
Joined: Sun Apr 19, 2020 11:01 am

13421 WUs with abnormally long runtime

Post by Sparkly »

PRCG numbers for some abnormally long runtime WUs

In my case this is based on 1 x CPU core running 1 x FAHCore_22

13421 (4444, 0, 0): ETA 13 days
13421 (3142, 11, 0): ETA 11 days

I tested what makes them progress at a more normal speed: giving each of them 2 CPU cores makes them noticeably happier, and giving them 3 CPU cores makes them happier still. That is rather grabby on CPU resources for a GPU WU.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 13421 WUs with abnormally long runtime

Post by bruce »

Sparkly wrote: PRCG numbers for some abnormally long runtime WUs

In my case this is based on 1 x CPU core running 1 x FAHCore_22
Three questions:
* What GPU is doing the processing?
* How long were those assignments processing without interruption before you noted the estimated run-time?
* Are you running other projects that make heavy use of the GPU?
Sparkly
Posts: 73
Joined: Sun Apr 19, 2020 11:01 am

Re: 13421 WUs with abnormally long runtime

Post by Sparkly »

Each WU ran on its own RX580 GPU for about 8-10 hours, reaching roughly 3%, when I recorded the remaining ETA. I then gave each WU more cores to work with, which brought the runtime back to normal with an ETA of about 3 hours. These were the only two WUs running at the time, since everything else was turned off.
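Those numbers hang together: if progress continues at a constant rate, 3% after 8-10 hours extrapolates to roughly 11 to 13.5 days remaining, matching the ETAs above. A minimal sketch of that linear extrapolation (the function name is hypothetical, and FAHClient's own ETA calculation may differ):

```python
def estimate_remaining_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linearly extrapolate remaining time from elapsed time and percent complete.

    Assumes the progress rate stays constant, which is exactly what changes
    once the WU is given more CPU cores.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * (100 - percent_done) / percent_done


if __name__ == "__main__":
    # About 9 hours to reach 3%, as in the post above
    remaining = estimate_remaining_hours(9, 3)
    print(f"{remaining:.0f} h remaining, about {remaining / 24:.1f} days")
```

For 9 hours at 3% this gives 291 hours, or roughly 12 days.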
JohnChodera
Pande Group Member
Posts: 470
Joined: Fri Feb 22, 2013 9:59 pm

Re: 13421 WUs with abnormally long runtime

Post by JohnChodera »

Thanks for the heads-up on these. We're still working to figure out why some RUNs for these are exceptionally long. Our best hypothesis so far is that this has to do with how constraints are handled, but more investigation is necessary. Hopefully we can dig into that this week now that the sprints are running.

As always, huge thanks for sticking with us despite the suboptimal situation right now!

~ John Chodera // MSKCC
Yeroon
Posts: 25
Joined: Tue Jul 07, 2020 11:09 pm

Re: 13421 WUs with abnormally long runtime

Post by Yeroon »

How are you giving the GPU work units more cores? I have lowered the CPU slot to 6 of 12 cores and raised the a22 priority on a WU-by-WU basis when I can, but I only see 2 process threads per WU.
I would like to try this as well, as 13421 performs pretty poorly for me (RX 470: PPD down by half, power use down 30% per card).
Sparkly
Posts: 73
Joined: Sun Apr 19, 2020 11:01 am

Re: 13421 WUs with abnormally long runtime

Post by Sparkly »

Yeroon wrote: How are you giving the gpu work units more cores?
I am using a process manager

https://www.bill2-software.com/processm ... load.shtml

to automatically reduce the number of cores a FAHCore_22 process has access to in the first place. Unless you have reduced the number of cores the process can access when it starts, it will simply grab what it can from the available cores, so setting affinity might not make any difference for you, except for manual load-balancing purposes.

In Windows 10, you can set process affinity manually for each running process:

https://thegeekpage.com/set-affinity-fo ... indows-10/
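On Linux, Python's standard library exposes the same idea through `os.sched_setaffinity`. A minimal sketch, assuming a Linux host and that you already know the PID of the FAHCore_22 process (the helper `limit_to_cores` is hypothetical; on Windows, use the methods linked above instead):

```python
import os


def limit_to_cores(pid: int, n_cores: int) -> set:
    """Pin process `pid` (0 means the calling process) to `n_cores` CPUs.

    Linux-only sketch: os.sched_getaffinity/os.sched_setaffinity do not exist
    on Windows. CPUs are drawn from the set the *calling* process may use, so
    this can widen a previously restricted target as well as narrow it.
    """
    pool = sorted(os.sched_getaffinity(0))
    if not 1 <= n_cores <= len(pool):
        raise ValueError(f"n_cores must be between 1 and {len(pool)}")
    chosen = set(pool[:n_cores])
    os.sched_setaffinity(pid, chosen)  # may need privileges for other users' processes
    return chosen


if __name__ == "__main__":
    print("CPUs this process may use:", sorted(os.sched_getaffinity(0)))
```

For example, `limit_to_cores(pid, 2)` would mirror the 2-core setup described earlier in the thread.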
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 13421 WUs with abnormally long runtime

Post by bruce »

FAHCore_22 has two functions that use a CPU thread. One moves data between main RAM and the GPU. For NVIDIA on Windows, this process uses a spin-wait (rather than an interruptible sleep), so the CPU always appears to be busy. For AMD, it doesn't use a spin-wait.
The other thing it does (for either GPU) is run a sanity check that periodically verifies the current state of the WU. That generally uses CPU resources for a very short time at widely spaced intervals.

To the best of my knowledge, it doesn't add up to much (except for the spin-wait), so I don't really worry about it.
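To illustrate the two waiting strategies described above, here is a minimal, generic Python sketch. It is not FAHCore_22 or driver code, and the function names are hypothetical. A thread waiting on a simulated "GPU" either polls in a loop (spin-wait) or blocks on a `threading.Event` (interruptible sleep), and `time.thread_time()` reports how much CPU time that waiting thread consumed.

```python
import threading
import time


def wait_for(event: threading.Event, spin: bool) -> float:
    """Wait until `event` is set; return the CPU seconds this thread used while waiting."""
    start = time.thread_time()  # CPU time of the current thread only
    if spin:
        # Spin-wait: poll continuously, so the core looks 100% busy.
        while not event.is_set():
            pass
    else:
        # Blocking wait: the OS parks the thread until the event is signalled.
        event.wait()
    return time.thread_time() - start


def cpu_cost_of_waiting(spin: bool, gpu_seconds: float = 0.5) -> float:
    """Simulate waiting `gpu_seconds` for a 'GPU' to finish; return the waiter's CPU time."""
    done = threading.Event()
    result = {}
    waiter = threading.Thread(target=lambda: result.update(cpu=wait_for(done, spin)))
    waiter.start()
    time.sleep(gpu_seconds)  # stand-in for the GPU doing its work
    done.set()
    waiter.join()
    return result["cpu"]


if __name__ == "__main__":
    print(f"spin-wait:     {cpu_cost_of_waiting(spin=True):.3f} s of CPU")
    print(f"blocking wait: {cpu_cost_of_waiting(spin=False):.3f} s of CPU")
```

On a typical machine, the spin-wait burns CPU time close to the full wait while the blocking wait uses almost none, which is why a spin-waiting thread makes a core look permanently busy in a CPU monitor.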
BobWilliams757
Posts: 493
Joined: Fri Apr 03, 2020 2:22 pm
Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X

Re: 13421 WUs with abnormally long runtime

Post by BobWilliams757 »

Of the 20 WUs I've run on this project, only 2 seemed a bit out of range.

RCG 876, 79, 0 was a bit quicker, with a 56-second frame time vs. the 1:11 average, for 215k PPD.

RCG 6262, 31, 0 was the slow one, with a frame time of 2:47 and PPD of 42k.

All the others, with the average frame time of 1:11, netted approximately 150k PPD.

Apart from the one slow WU, all are on the high side of normal for the Vega 11 I'm using. But projects with atom counts below 15k or so are always on the fast side with this little onboard GPU, so really nothing unexpected.
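Frame times translate directly into WU runtimes. A minimal sketch, assuming 100 frames per WU (one frame per 1% of progress, which may not hold for every project); the helper names are hypothetical:

```python
def tpf_seconds(tpf: str) -> int:
    """Convert an 'm:ss' time-per-frame (TPF) string to seconds."""
    minutes, seconds = tpf.split(":")
    return int(minutes) * 60 + int(seconds)


def wu_hours(tpf: str, frames: int = 100) -> float:
    """Estimated WU runtime in hours, assuming `frames` frames per WU."""
    return tpf_seconds(tpf) * frames / 3600


if __name__ == "__main__":
    for label, tpf in [("fast", "0:56"), ("average", "1:11"), ("slow", "2:47")]:
        print(f"{label:>7}: TPF {tpf} -> about {wu_hours(tpf):.2f} h per WU")
    print(f"slow / average: {tpf_seconds('2:47') / tpf_seconds('1:11'):.2f}x")
```

Under that assumption, the three frame times correspond to about 1.56 h, 1.97 h and 4.64 h per WU, so the slow WU takes about 2.35 times as long as an average one.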


Passed for any use in troubleshooting only. I'll fold 'em if they all drop down to slow return times. :D
Fold them if you get them!
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 13421 WUs with abnormally long runtime

Post by bruce »

Thanks.

Troubleshooting is focusing mostly on error reports (crashes, etc.). I don't think anybody notices (except donors like yourself) when a WU takes a longer or shorter time than "normal" unless you report it.

One or two have been extracted for special attention, though.

I suggest a concise report like this one, including the hardware involved.