NVidia folding - *real* FahCore_1?.exe CPU utilization?

It seems that a lot of GPU problems revolve around specific versions of drivers. Though NVidia has their own support structure, you can often learn from information reported by others who fold.

Napoleon
Posts: 887
Joined: Wed May 26, 2010 2:31 pm
Hardware configuration: Atom330 (overclocked):
Windows 7 Ultimate 64bit
Intel Atom330 dualcore (4 HyperThreads)
NVidia GT430, core_15 work
2x2GB Kingston KVR1333D3N9K2/4G 1333MHz memory kit
Asus AT3IONT-I Deluxe motherboard
Location: Finland

NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by Napoleon »

It looks like an established fact that folding with NVidia GPUs requires next to nothing from the CPU, and utilities like Task Manager may report zero CPU utilization percentage. Indeed, as I'm writing this, Task Manager reports 0% for both FahCore_11.exe & FahCore_15.exe (projects 5768 and 8018). However, the reality does not seem quite so blissful. Total CPU time spent in the GPU FahCores keeps increasing gradually. Furthermore, when I take a peek with Process Explorer, I steadily get something like this when measuring over 10s intervals:

[Two Process Explorer screenshots: CPU cycle deltas for FahCore_11.exe and FahCore_15.exe over 10s intervals]

Notice anything wrong? Like, FahCore_11.exe uses about 3 300 million CPU cycles and FahCore_15.exe uses about 2 000 million CPU cycles during a 10s interval, but the CPU utilization percentage is practically zero? My CPU is running at 1.71GHz, which means 1 710 million cycles/s * 10s == 17 100 million CPU cycles in 10s, per logical CPU. I've got FahCore_11.exe affinity set to logical CPU 1 and FahCore_15.exe affinity set to logical CPU 3 in order to keep things simple. So FahCore_11.exe actually uses something like 3300 / 17100 * 100% == 19% of one logical CPU and FahCore_15.exe uses about 2000 / 17100 * 100% == 12%, all the time.
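
For anyone who wants to redo this arithmetic with their own readings, here is a minimal Python sketch. The function name and parameters are made up for this illustration, and the cycle counts are just the approximate Process Explorer readings quoted above, so treat it as a back-of-the-envelope check rather than a measurement tool.

```python
def cycles_to_percent(cycles_used, clock_hz, interval_s):
    """Share of one logical CPU consumed, given a cycle delta over an interval."""
    # Cycles a single logical CPU can deliver during the interval
    available = clock_hz * interval_s
    return 100.0 * cycles_used / available


CLOCK_HZ = 1.71e9   # 1.71GHz, as stated in the post
INTERVAL_S = 10     # Process Explorer measurement interval in seconds

# Approximate cycle deltas per 10s interval, taken from the post
core11 = cycles_to_percent(3300e6, CLOCK_HZ, INTERVAL_S)
core15 = cycles_to_percent(2000e6, CLOCK_HZ, INTERVAL_S)

print(f"FahCore_11.exe: {core11:.0f}% of one logical CPU")
print(f"FahCore_15.exe: {core15:.0f}% of one logical CPU")
```

With the numbers above this works out to roughly 19% and 12%, matching the hand calculation.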

If I increase FahCore_1?.exe priorities to High and run something else on logical CPUs 1 & 3, I actually see a noticeable increase in plain old wall clock time to complete some relatively long running CPU task once I start my GPU folding. Unsurprisingly, the increase is very neatly explained with the actual (logical) CPU cycles / 10s it takes to fold with the GPUs. Yet even Process Explorer reports practically 0% CPU utilization percentage! :eo

I'm running 306.23 WHQL drivers, FahCore_11.exe v1.31 and FahCore_15.exe v2.25, but I've actually noticed the issue with older drivers and FahCore_15.exe versions as well. OK, it isn't exactly news that some people doing SMP+NVidia GPU with a HyperThreaded Intel CPU have actually gotten better PPD by allocating dedicated logical CPU(s) to GPU folding and tweaking the FahCore affinities carefully. In any case, I'm wondering if mine is some sort of weird special case, as far as the erroneous CPU utilization percentage reporting goes?

Not that I'm having actual folding problems; the GPUs are producing just fine. However, I'm interested in seeing if I am actually able to do a little bit of myth busting regarding "practically zero percent CPU utilization" when folding with NVidia GPUs...
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by 7im »

Nope. Nothing weird or special.

For example, GPU1 did the exact opposite: it showed 100% CPU utilization but didn't really use all those cycles; it was just polling that much.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460's for aprox. 70K PPD
Location: Salem. OR USA

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by P5-133XL »

Nvidia GPUs don't use 0%; that really is a misconception. There is always overhead, and if you are using Task Manager to measure, there is OS overhead associated with the video subsystem (driver overhead), as well as general OS overhead and task switching that is never reported by that particular tool, which is really designed to hide the OS. My guess is that what you are observing is some of that overhead that is not being reported directly, but that is just a guess. The actual amount of CPU usage will vary by processor, but on my 3.2GHz hyper-threaded P4 running just GPU folding with a GTX 460, Task Manager measures about 4% while Task Manager itself is at 11%. If I uniprocessor fold on the same machine while GPU folding, I'll see the GPU core go much higher (45+% of the hyper-threaded core), while the uniprocessor folding will still starve the GPU, dropping the total PPD significantly. I've seen my Nvidia GTX 460 GPU folding cores on a Q6600 go up to 8% each while SMP folding. It also matters what the core is doing: is it folding, setting itself up to fold, or shutting down and getting ready to send? Each of those processes uses a different amount of CPU.



Really the big difference is the comparison between Nvidia and ATI (which requires a full core all by itself) not that Nvidia is always at 0%.
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by JimboPalmer »

I suspect DMA is using clock cycles even though the CPU is not using them. That would match your results without any CPU time being lost.

http://en.wikipedia.org/wiki/Direct_memory_access

It specifically mentions that getting data on and off a graphics card will use DMA, and F@H will certainly move data on and off the Nvidia card.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
codysluder
Posts: 1024
Joined: Sun Dec 02, 2007 12:43 pm

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by codysluder »

So if I understand correctly, NVIDIA uses DMA, which moves the data without actually posting enough CPU interrupts for Task Manager to pick them up, while AMD uses a CPU-based polling method that moves the data with the CPU and generates a high interrupt rate. Both will take some resources away from SMP, but AMD will cause a large effect while NVIDIA will cause a small effect. In the early days of CD-ROMs, folks made a big deal about whether they used DMA or not.

Somewhere I read that to run AMD, leaving one hyperthreaded (virtual) core per GPU is sufficient, but in a non-hyperthreaded CPU you still need one full core per GPU. I don't suppose the resources stolen by NVIDIA follow the same rules, but whether they do or not isn't likely to be important. My computer has whatever CPU it has and I'm not going to switch just because I want to run GPU+SMP.
Rel25917
Posts: 303
Joined: Wed Aug 15, 2012 2:31 am

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by Rel25917 »

I just did a quick test on my Core 2 Duo system here. Frame times on a project 8066 were running about 2:53 to 2:55 while sitting here browsing the web with no GPU client running. I read this thread and fired up my GPU client (GTX 560 Ti), and frame times are still only 2:55 after 20 minutes. Whatever the GPU client is using, it is not having any noticeable effect on my SMP client.

computer details if interested
Core 2 duo e8400 3ghz overclocked to 4ghz
4k ppd avg high of 5.5k on some projects.
gtx 560 ti slightly underclocked for heat reasons(one of the two fans on it died)
video driver version 280.26
WinXP sp3
artoar_11
Posts: 657
Joined: Sun Nov 22, 2009 8:42 pm
Hardware configuration: AMD R7 3700X @ 4.0 GHz; ASUS ROG STRIX X470-F GAMING; DDR4 2x8GB @ 3.0 GHz; GByte RTX 3060 Ti @ 1890 MHz; Fortron-550W 80+ bronze; Win10 Pro/64
Location: Bulgaria/Team #224497/artoar11_ALL_....

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by artoar_11 »

In early summer I underclocked the GTX 460 by 50MHz. The next day I noticed that the SMP client's TPF had gone down. When I calculated with the Bonus Calculator, I saw that the total PPD (GPU+CPU) was the same. I wondered why I needed to OC the GPU at all.

Later I noticed that some GPU projects use more CPU cycles than others. Examples are GPU projects p8005-8010.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by bruce »

artoar_11 wrote: Later I noticed that some GPU projects use more CPU cycles than others. Examples are GPU projects p8005-8010.
The amount of CPU probably depends on the size of the protein and the size of your GPU. It seems likely that with more data to move, more time is spent moving it. In the days when GPU2 was new, most GPU proteins were 300-900 atoms. As time has passed proteins increased to 1000 - 1500. Now we're seeing more in the 2000+ range. One thing for sure, the trend toward bigger proteins is going to continue, whether that actually means more measurable CPU time or not.
Napoleon

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by Napoleon »

Thanks for all the replies, most informative. For my part, I'm glad I finally figured out (with some proof) exactly why there is some previously unexplained CPU process performance variation in my case. Windows Task Manager gave me misleading information, plain and simple. Not a FAH problem, and of course I'll keep my GPUs folding anyway. Just one of those "nice to know" things, I suppose. At least it isn't some mystery malware stealing my precious CPU cycles. :egeek:
bruce

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by bruce »

You'll find some more information on Microsoft.com. There were some white papers written many years ago about what can and cannot be accurately measured by Task Manager (etc.) and the trade-offs of how much overhead it takes to track things that are "estimated." It's also closely tied to how much overhead the dispatcher is allowed to use when sorting priorities, etc.

Though not actually related, there's some interesting overlap with http://en.wikipedia.org/wiki/Uncertainty_principle
Napoleon

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by Napoleon »

Indeed, the uncertainty principle is universal, I just didn't expect the Task Manager readouts to be off the mark quite as much as they were in my case. Since I've been receiving relatively CPU heavy projects for my GPUs lately (10505 & 8010) and once again gotten into the habit of running two OGR crunchers (pure integer work) along with two classic slots and the two GPU slots, I considered it worthwhile to post some updated info about my case, especially after cutting back on my CPU OC.

First off, I get ~20Mnodes/s performance running just OGR. When I start the CPU slots, OGR drops to ~17Mnodes/s. I'm really surprised by how well HyperThreading manages to parallelize these two different workloads - (mostly) floating point work vs pure integer work. With careful arrangement of workload affinities, it's almost like getting the performance of an equivalent true quad. Not quite 100%, but I consider about 17 / 20 * 100% == 85% rather amazing.

When I throw GPU folding into the mix, OGR performance drops to ~5Mnodes/s. Just so you know, I've carefully arranged priorities and affinities so that GPU folding "steals" its CPU cycles from OGR only, while FAH CPU folding remains largely unaffected by the addition of GPU folding. Task Manager may paint a much prettier picture, but in my case the real life impact of GPU folding on the OGR performance is huge: a drop from 17Mnodes/s to 5Mnodes/s!

EDIT: Looks like the "real" CPU utilization may vary A LOT between various NVidia GPU projects. Currently running P10502 & P7623 (as opposed to 10505 & 8010) and the OGR is crunching at 14Mnodes/s (as opposed to 5Mnodes/s).
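
To put the OGR figures side by side, here is a small Python sketch of the same percentages. The helper and constant names are invented for this example; the rates are the approximate Mnodes/s values quoted in this post, so the results are only as precise as those readings.

```python
def relative_throughput(after, before):
    """Throughput after a change, as a percentage of the throughput before it."""
    return 100.0 * after / before


# OGR rates in Mnodes/s, as reported in this post (approximate)
OGR_ALONE = 20.0
OGR_WITH_CPU_SLOTS = 17.0
OGR_WITH_HEAVY_GPU_WUS = 5.0    # GPU projects 10505 & 8010
OGR_WITH_LIGHT_GPU_WUS = 14.0   # GPU projects 10502 & 7623

# How much OGR throughput survives when the classic CPU slots are started
ht_retained = relative_throughput(OGR_WITH_CPU_SLOTS, OGR_ALONE)

# How much OGR throughput is lost once GPU folding is added on top
heavy_loss = 100.0 - relative_throughput(OGR_WITH_HEAVY_GPU_WUS, OGR_WITH_CPU_SLOTS)
light_loss = 100.0 - relative_throughput(OGR_WITH_LIGHT_GPU_WUS, OGR_WITH_CPU_SLOTS)

print(f"OGR retained with CPU slots running: {ht_retained:.0f}%")
print(f"OGR lost to GPU folding (10505 & 8010): {heavy_loss:.0f}%")
print(f"OGR lost to GPU folding (10502 & 7623): {light_loss:.0f}%")
```

By this estimate the CPU-heavy GPU projects cost OGR roughly 71% of its throughput, while the lighter pair costs roughly 18%.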

Of course, cramming the abovementioned 6 active threads onto a 2C/4T CPU causes scheduling conflicts, cache contention, younameit. Admittedly, I'm running a somewhat strange mix of semi-dedicated 24/7 folding at the moment. Until now, I've thought of my Atom330 platform as a miniature version of its stronger siblings, so I couldn't help wondering if my observations can be extrapolated to a setup which has a stronger CPU but stronger GPUs as well. I suppose the conclusion is that more powerful setups also deal with this kind of resource race much more gracefully, whatever the reason may be. :roll:
Last edited by Napoleon on Wed Oct 24, 2012 1:51 pm, edited 2 times in total.
bruce

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by bruce »

We can speculate about "younameit" but without a better monitoring tool than Taskmgr, it would be just that ... speculation. (The fact that Taskmgr [sometimes] gives optimistic results that happen to favor HyperThreading isn't going to give them a good reason to improve it.) Still, you're certainly getting a lot out of your 330.
P5-133XL

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by P5-133XL »

If you want better accuracy, try using Performance Monitor (also built into Windows). You can isolate and measure individual characteristics and thereby determine which factors are important and which are not.
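
Counter-based measurement of the kind discussed in this thread comes down to sampling a cumulative CPU-time counter twice and dividing the delta by the elapsed wall-clock time. Below is a minimal Python sketch of that idea using only the standard library. It is an assumption-laden illustration, not a description of how Performance Monitor works internally: the helper names are hypothetical, and it measures the script's own process via `time.process_time()`, not a FahCore.

```python
import time


def utilization_percent(cpu_start, cpu_end, wall_start, wall_end):
    """Average CPU utilization between two samples of a cumulative CPU-time counter."""
    elapsed = wall_end - wall_start
    if elapsed <= 0:
        raise ValueError("wall-clock interval must be positive")
    return 100.0 * (cpu_end - cpu_start) / elapsed


def busy_then_idle(busy_s, idle_s):
    """Burn CPU for busy_s seconds, then sleep for idle_s seconds."""
    deadline = time.perf_counter() + busy_s
    while time.perf_counter() < deadline:
        pass  # spin to consume CPU time
    time.sleep(idle_s)  # sleeping consumes wall-clock time but almost no CPU time


if __name__ == "__main__":
    # First sample of the cumulative CPU-time and wall-clock counters
    cpu0, wall0 = time.process_time(), time.perf_counter()
    busy_then_idle(busy_s=0.2, idle_s=0.8)
    # Second sample; the deltas give the average utilization over the interval
    cpu1, wall1 = time.process_time(), time.perf_counter()
    print(f"{utilization_percent(cpu0, cpu1, wall0, wall1):.0f}% of one logical CPU")
```

The longer the sampling interval, the less the result depends on when exactly the samples happen to land, which is why a 10s delta can reveal load that an instantaneous percentage readout rounds away.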
PinHead
Posts: 285
Joined: Tue Jan 24, 2012 3:43 am
Hardware configuration: Quad Q9550 2.83 contains the GPU 57xx - running SMP and GPU
Quad Q6700 2.66 running just SMP
2P 32core Interlagos SMP on linux

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by PinHead »

I didn't go through all the research that Napoleon did (thanks for the info btw), but when I put a GTX570 into a slower Q6700 PC, I noticed that it was getting a bottleneck. CPU frame times were dropping. The GPU was new, so I had no PPD reference.

When I changed the SMP slot to 99%, both the SMP and GPU PPD increased. I think 99.5 or 99.8 would have done the same thing, but I don't think I have that option. :)
7im

Re: NVidia folding - *real* FahCore_1?.exe CPU utilization?

Post by 7im »

Try this same old v6 client trick. Change both back to 100% usage.

Keep SMP slot at default priority of idle. Change GPU slot priority to "low"... slot option = "core-priority" with setting = "low"

Restart clients. What happens to PPD? ;)