Hardware Accelerated GPU Scheduling (Win 10)

ajm
Posts: 754
Joined: Sat Mar 21, 2020 5:22 am
Location: Lucerne, Switzerland

Hardware Accelerated GPU Scheduling (Win 10)

Post by ajm »

This new feature of Windows 10 2004 is described on the MS devblog: https://devblogs.microsoft.com/directx/ ... cheduling/

It may bring a GPU performance boost down the road, but for now, if anything, performance snags are being observed: https://youtu.be/wlrWDb1pKXg

It would probably be wise to keep an eye on it, as it may affect FAH as well.

I just enabled it on one machine (2070S with latest drivers). It was smooth, no problem. I don't see any effect yet.
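For anyone who wants to check the setting without opening the Windows graphics settings page, here is a minimal sketch in Python. It rests on an assumption that is not confirmed anywhere in this thread: that the toggle is stored as the `HwSchMode` DWORD under `HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers`, with 2 meaning on and 1 meaning off, as commonly reported. The helper names are made up for illustration.

```python
import sys

# ASSUMPTION: registry location and value meanings are community-reported,
# not taken from this thread or verified against Microsoft documentation.
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
VALUE_NAME = "HwSchMode"


def describe_hwsch_mode(value):
    """Translate the raw registry value into a readable state (hypothetical helper)."""
    if value is None:
        return "not set"
    if value == 2:
        return "enabled"
    if value == 1:
        return "disabled"
    return f"unknown ({value})"


def read_hwsch_mode():
    """Return the raw HwSchMode value, or None if unavailable or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except OSError:
        # Key or value missing, e.g. older Windows build or older driver.
        return None


if __name__ == "__main__":
    print("Hardware-accelerated GPU scheduling:", describe_hwsch_mode(read_hwsch_mode()))
```

This only reads the value. As far as I can tell, a change to the setting needs a restart before it takes effect, so the value shown may not match what the running session is actually using.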

EDIT: This is how it looks when enabled, with FAH set to High performance:

Image

EDIT2: A new WU [13414 (5128, 33, 1)] is now stable (over 10%) at PPD 1997353, which is really high for a 2070S limited to 75% power by Afterburner.

EDIT3: Next WU is smaller [16441 (477, 1, 122)] but still delivers PPD 1928293. I'm going to try that on another machine with AMD and Nvidia GPUs...

EDIT4: Done (1080 ti and 5700XT). No change yet, just like the first time. But the overall gain of some 250-300K PPD on the 2070S has persisted for hours now. This new Hardware Accelerated GPU Scheduling seems to confuse Adrenalin's Power tuning, which is stuck at 345W, whereas GPU-Z sees around 100W.

EDIT5: An hour later, the 1080ti has started a new WU, but without any performance gain. The 5700XT is still finishing its "old" WU. But I'm wondering whether FahCore_22 needs to be stopped entirely in order to pick up the new setting. That's what was necessary for the 2070S. In order to check that, I'll have to finish both GPU WUs and restart FAH. That will take some two hours from now.
The 2070S is now crunching 13416 (381, 21, 0) at PPD 1989081.

EDIT6: The 2070S at 75% power is now delivering 2M+ (2018834) with 13414 (6310, 34, 1). But the 1080ti and the 5700XT (both also power-limited) are "only" at 1604441 and 1154245, respectively, which is excellent but not as clear an improvement over their usual level as on the 2070S. Could be a coincidence so far. We'll see tomorrow.

EDIT7: This morning, the 1080ti/5700XT machine was a complete mess! The AMD card was in a failed state and unable to fold anything, the machine was lagging terribly while doing almost nothing, and the 1080ti was struggling. I could save only the current CPU job... Back on track now. The 2070S is folding smaller WUs (11752s) at 1.6M.

EDIT8: A couple of hours later, it appears that the good results of last night were just a coincidence: all PPDs are now stable at their previous levels.
Last edited by ajm on Sun Jul 05, 2020 6:10 am, edited 8 times in total.
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by JimboPalmer »

On Nvidia, this should work with Pascal and later-generation GPUs (Volta, Turing, Ampere).

In GPU-Z, under Sensors, the Bus Interface Load seemed lower, but I did not do a 'before and after'. If you consider this, look at how it affects the GPU-Z sensors.

I set both Core_22 and Core_21 to High Performance.
Last edited by JimboPalmer on Sat Jul 04, 2020 5:18 pm, edited 1 time in total.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by MeeLee »

I read the main advantage is gaming.
Anand did some tests and only got a gain of a few percent at most (2-3 fps out of 60 fps).
Though no one seems to know exactly what happens, or what about the GPU is accelerated. And it seems to only be supported by Nvidia GPUs.
More than likely it's a more direct interface between the CPU's PCIe lanes and the GPU, as the GPU drivers remain the same.
Perhaps a better way of locking the GPU to a single core, rather than shuffling it around, or perhaps shuffling it around more?
Nvidia drivers are supposed to lock the GPU to a single thread on the CPU, but that doesn't happen.
It stays longer on a core, but it still jumps around (without the feature enabled).

Since MS is heavily investing in Linux lately, I presume some optimizations may have come from Linux!
ajm
Posts: 754
Joined: Sat Mar 21, 2020 5:22 am
Location: Lucerne, Switzerland

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by ajm »

JimboPalmer wrote:In GPU-Z, under Sensors, the Bus Interface Load seemed lower, but I did not do a 'before and after'. If you consider this, look at how it affects the GPU-Z sensors.
Well, I haven't, too bad. But others may. I only checked whether the frequencies and the temps were the same, and they are.
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by JimboPalmer »

MeeLee wrote:I read the main advantage is gaming.
F@H is not a mainstream app; there is no hint that it was tested before and after.
MeeLee wrote:And it seems to only be supported by Nvidia GPUs.

Pascal and later for Nvidia, Navi and later for AMD.
MeeLee wrote:the GPU drivers remain the same.

No, the latest drivers are needed on both platforms to take advantage of Hardware Accelerated GPU Scheduling: 451.48 on Nvidia, 20.5.1 Beta on AMD.
ajm
Posts: 754
Joined: Sat Mar 21, 2020 5:22 am
Location: Lucerne, Switzerland

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by ajm »

Side note: with Win 10 2004, the Task Manager now shows FAH's activity under "3D", but seemingly only for Nvidia cards (on AMD it is visible under Compute 1). And there's no CUDA option anymore.
Weird: the Copy graph stays at zero for the 2070S but shows quite intense activity for the 1080ti, often reaching 100%.

Image
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by bruce »

Microsoft has not done a good job of recognizing that GPUs are legitimate computing devices. When Stream Computing was in its infancy, their developers were convinced that the only reason anybody would have a GPU is to display the desktop or play a game. They still don't do a good job of accounting for the activity.
aetch
Posts: 447
Joined: Thu Jun 25, 2020 3:04 pm
Location: Between chair and keyboard

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by aetch »

I've been having a play about with it as well.
I can't say if it's any better or not. I had a couple of units for 13416 go through a pair of my systems and they fluctuated in speed quite a lot throughout the run (one had GPU scheduling enabled, the other was still on Win 10 1903). I also saw that without GPU scheduling my Copy graph was steady in the 30-45% range, while with GPU scheduling it was all over the map.
I've found that it's not always the first "Copy" graph that sees activity.
I've also had to restart Task Manager a few times to see that activity.
It's early days, and there are a few bugs to iron out.
Folding Rigs - None (25-Jun-2022)
Breach
Posts: 205
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by Breach »

The CUDA option is there in Task Manager provided that you have Hardware-accelerated GPU scheduling set to off. Go figure.

Secondly, I am currently observing a performance loss if that's turned on (Nvidia 1070, 2 GHz clocks):

PRCG: 16913 (15, 9, 2)

eTPF: 3 mins 37 secs
ePPD: 532,245

Turned it off, and I'm back to 750k PPD after 3-4 frames (I normally get 800k).

Edit: Confirmed with a second WU. Methodology:

Started with a new WU, PRCG: 16911 (4, 8, 5)

Off
eTPF 2 mins 13 secs
ePPD 908582

Switched on GPU scheduling, restarted (OK, some minor PPD loss during the restart), waited 20 mins for a few frames to process:

On
eTPF 2 mins 50 secs
ePPD 675245
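To put the off/on figures for 16911 side by side, here is a small Python sketch that converts the two eTPF readings to seconds and works out the relative change. The numbers are the ones quoted above; the function names are just for illustration.

```python
def tpf_seconds(minutes: int, seconds: int) -> int:
    """Convert a time-per-frame reading to seconds."""
    return minutes * 60 + seconds


def percent_change(before: float, after: float) -> float:
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100.0


# Figures quoted above for PRCG 16911 (4, 8, 5)
off_tpf = tpf_seconds(2, 13)   # GPU scheduling off
on_tpf = tpf_seconds(2, 50)    # GPU scheduling on
off_ppd = 908582
on_ppd = 675245

tpf_change = percent_change(off_tpf, on_tpf)
ppd_change = percent_change(off_ppd, on_ppd)

print(f"eTPF: {off_tpf}s -> {on_tpf}s ({tpf_change:+.1f}%)")
print(f"ePPD: {off_ppd} -> {on_ppd} ({ppd_change:+.1f}%)")
```

That works out to frames taking roughly 28% longer and the estimated PPD dropping by roughly 26%. It is a single WU with estimates taken after only a few frames, so treat the percentages as indicative only.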

So, maybe it's Pascal-specific, or because I may be using a beta core, or because of project-specific WUs. But I'm keeping GPU scheduling off for the time being. I'll report this on Nvidia's forums.
Windows 11 x64 / 5800X@5Ghz / 32GB DDR4 3800 CL14 / 4090 FE / Creative Titanium HD / Sennheiser 650 / PSU Corsair AX1200i
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by JimboPalmer »

I have been running hardware scheduling for 5 days on 3 GPUs, and I am seeing a slight decline in points.

While it functions fine, it does not seem to be a desirable feature for F@H.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by bruce »

OK, from reading what you guys wrote, we simply don't understand what this provides for games or for FAH, if anything. I can't offer any more information than you already have EXCEPT the words "Pascal or better" did catch my eye. I do remember reading a Pascal tuning guide. (If anybody is familiar with the Navi white-papers, maybe they talk about a similar feature.)
JimboPalmer wrote:
MeeLee wrote:And it seems to only be supported by Nvidia GPUs.

Pascal and later for Nvidia, Navi and later for AMD.
One of the GPU's internal functions is performed by a "warp scheduler", which initiates internal GPU processes. Let's start with an ASSUMPTION that you can increase your game's frame rate by telling the warp scheduler to concentrate on finishing the processing of a screen update rather than delaying it behind a pending FAH kernel. Wouldn't you want to do that? (Imagine you're a GAMER rather than an admirer of FAH.) This may improve the performance of GPU memory access, but since almost none of FAH's performance depends on GPU memory, we don't really care.

Second, there's a new feature beginning with Pascal called Compute Preemption. Compute Preemption allows compute tasks running on the GPU to be interrupted at instruction-level granularity. The execution context (registers, shared memory, etc.) is swapped to GPU DRAM so that another application can be swapped in and run. Compute Preemption offers an advantage for developers:

> Long-running kernels no longer need to be broken up into small timeslices to avoid an unresponsive graphical user interface or kernel timeouts when a GPU is used simultaneously for compute and graphics.

I think that's probably what we're talking about. I expect that FAH would be guilty of dispatching long-running kernels, but OpenMM would already have broken its work up into small timeslices to minimize screen lag on pre-Pascal GPUs. Maybe this new feature would minimize screen lag on Pascal, but I don't think anybody with Pascal-or-better GPUs ever reported a screen-lag problem.

Again, in an ideal world, maybe it's a nice feature but not useful for FAH.

This involves a lot of speculation on my part, so I could easily be wrong. Feel free to comment.
LazyDev
Posts: 13
Joined: Tue Aug 30, 2016 7:28 pm

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by LazyDev »

I've noticed that the bus interface load has lowered since I enabled the option. I'm running a GTX 1070 on PCIe x1.

GPU-Z and HWMonitor:
https://prnt.sc/thgajm
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by bruce »

LazyDev wrote:I've noticed that the bus interface load has lowered ...
So is the WU running slower?
LazyDev
Posts: 13
Joined: Tue Aug 30, 2016 7:28 pm

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by LazyDev »

bruce wrote:
LazyDev wrote:I've noticed that the bus interface load has lowered ...
So is the WU running slower?
In terms of GPU performance, I didn't observe a difference. However, since the bus load seems to have dropped, it has allowed me to push the boundaries and, as such, has opened up the opportunity to run higher-performing GPUs on just one lane. I need to test this in the future, though.
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Hardware Accelerated GPU Scheduling (Win 10)

Post by MeeLee »

bruce wrote: > Long-running kernels no longer need to be broken up into small timeslices to avoid an unresponsive graphical user interface or kernel timeouts when a GPU is used simultaneously for compute and graphics.

I think that's probably what we're talking about. I expect that FAH would be guilty of dispatching long-running kernels, but OpenMM would already have broken its work up into small timeslices to minimize screen lag on pre-Pascal GPUs. Maybe this new feature would minimize screen lag on Pascal, but I don't think anybody with Pascal-or-better GPUs ever reported a screen-lag problem.

Again, in an ideal world, maybe it's a nice feature but not useful for FAH.

This involves a lot of speculation on my part, so I could easily be wrong. Feel free to comment.
There's no screen lag on my GPUs (like you say, GTX with 384 cores or more).
In BOINC there are some situations in which screen lag can occur, but it's usually when running three or four small WUs at once on a single GPU.