To GPU or not to GPU

Postby Madcap » Sun Dec 18, 2011 11:18 pm

Hey guys.

I am currently using the V6.34 for CPU processing (Q9400 quad) and the V7 beta for GPU processing (ATI 6950). I tried using V7 for both but completing one WU for the CPU was estimated to take more than two days so I switched to V6.34. I run Windows 7 64 bit.

Running the GPU folding roughly halves the speed of my simultaneous CPU folding, so is it worth it? CPU folding I can do all the time while the computer is running, as I have water cooling and there is no additional noise, but GPU folding makes noise, so I only want to run that while I am away. I read somewhere that you can set the CPU to only use 3 cores, but that would waste 25% of my resources while I am using the computer (and not doing GPU folding). What is the best strategy? Thanks.

Re: To GPU or not to GPU

Postby JonazzDJ » Mon Dec 19, 2011 3:34 pm

You say you lose 50% of your SMP speed while running your GPU. If you disable one core you only lose 25% of the SMP speed, but you get the extra points from your GPU. I say experiment with that and find out what maximizes your PPD.
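A rough way to run this experiment on paper first is to plug your own measured PPD into a small model. This is a hypothetical sketch: the PPD inputs are placeholders you supply, the 0.5 factor is the slowdown Madcap reported, and the 0.75 factor assumes SMP:3 runs at three quarters of SMP:4 speed without GPU interference. Because the quick return bonus multiplier itself grows with speed, SMP PPD scales faster than linearly; an exponent of 1.5 follows from the bonus formula when the multiplier is above 1, and is an approximation rather than a measured value.

```python
def scaled_ppd(ppd_full: float, speed_fraction: float, exponent: float = 1.0) -> float:
    """Estimate PPD when a slot runs at `speed_fraction` of its full speed.

    exponent=1.0 models plain base points (PPD proportional to speed);
    exponent=1.5 roughly models the quick return bonus regime.
    """
    return ppd_full * speed_fraction ** exponent


def compare(smp4_ppd: float, gpu_ppd: float, exponent: float) -> dict:
    """Total PPD for three hypothetical configurations."""
    return {
        # SMP:4 + GPU: the thread reports SMP speed roughly halved.
        "smp4+gpu": scaled_ppd(smp4_ppd, 0.5, exponent) + gpu_ppd,
        # SMP:3 + GPU: assumed 3/4 speed, no longer slowed by the GPU.
        "smp3+gpu": scaled_ppd(smp4_ppd, 0.75, exponent) + gpu_ppd,
        # SMP:4 only, GPU idle.
        "smp4 only": smp4_ppd,
    }
```

For example, `compare(your_smp4_ppd, your_gpu_ppd, 1.5)` ranks the three options under these assumptions; your real numbers from FAHControl are what settle it.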

Re: To GPU or not to GPU

Postby Napoleon » Mon Dec 19, 2011 4:26 pm

Welcome to the forum, Madcap.

SMP performance is very sensitive to load balancing. It runs only as fast as its slowest thread, so if you slow even one SMP thread down with e.g. the GPU client, it will slow down the whole SMP process. AMD GPU folding needs a considerable amount of processing time on one of the CPUcores, which should explain the dramatic SMP slowdown.
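The "slowest thread" effect can be shown with an idealized model: each SMP thread gets an equal share of work per step, and a step only ends when every thread has finished. This is a simplification for illustration, not how a FahCore is actually implemented; a thread speed of 0.5 stands in for a core that the GPU client is using half the time.

```python
def smp_throughput(thread_speeds: list[float]) -> float:
    """Relative SMP throughput (work per unit time) in lock-step steps.

    Each thread does one unit of work per step; 1.0 means an uncontended
    core. The step time is set by the slowest thread.
    """
    step_time = max(1.0 / s for s in thread_speeds)
    return len(thread_speeds) / step_time


# SMP:4 on four free cores, SMP:4 with one core shared with the GPU client,
# and SMP:3 leaving the fourth core to the GPU client.
clean_smp4 = smp_throughput([1.0, 1.0, 1.0, 1.0])
shared_smp4 = smp_throughput([1.0, 1.0, 1.0, 0.5])
clean_smp3 = smp_throughput([1.0, 1.0, 1.0])
```

In this model the contended SMP:4 drops to half its clean throughput, which is worse than simply running SMP:3.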

One option is to run SMP:3 and uniprocessor+GPU on the remaining CPUcore. Configure the GPU client/slot to have "slightly higher" priority than the SMP and uniprocessor cores. The SMP+uniproc cores will keep your CPU 100% utilized at all times, and the "slightly higher" priority on GPU core will (hopefully) steal cycles only from the uniprocessor core when it is running. When the GPU core is paused, uniprocessor will take 100% of the idle cycles on the remaining free core.

NOTE: for optimal results, you may need to install and configure some 3rd party utility which will enforce core priorities and CPU affinities for you in the background. I personally like Process Lasso. Using ProLasso as an example, I'd enforce SMP:3 (FahCore_a*.exe) to "affinity 1;2;3" and "Below Normal" priority, uniprocessor FAHcore (FahCore_78.exe) to "affinity 0" and "Low" priority, and finally GPU core (FahCore_1*.exe) to "affinity 0" and "Below Normal" priority.
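The Process Lasso rules above can be written down as a small table. The sketch below is hypothetical: the table layout and helper names are made up for illustration, and it only works out which rule matches a process name and the corresponding Windows-style affinity bitmask (bit n set means CPU core n is allowed). It does not apply anything.

```python
from fnmatch import fnmatchcase

# (lowercase name pattern, allowed cores, priority) mirroring the rules above.
RULES = [
    ("fahcore_a*.exe", [1, 2, 3], "Below Normal"),  # SMP:3 core
    ("fahcore_78.exe", [0], "Low"),                 # uniprocessor core
    ("fahcore_1*.exe", [0], "Below Normal"),        # GPU core
]


def affinity_mask(cores: list[int]) -> int:
    """Build an affinity bitmask: bit n set means core n may be used."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def resolve(process_name: str):
    """Return (affinity mask, priority) for a FahCore, or None if no rule matches."""
    name = process_name.lower()  # compare case-insensitively, like Windows
    for pattern, cores, priority in RULES:
        if fnmatchcase(name, pattern):
            return affinity_mask(cores), priority
    return None
```

Applying the result is a separate step; for example, the third-party psutil package offers `Process.cpu_affinity()` (which takes a list of core numbers rather than a mask) and `Process.nice()` with Windows priority-class constants. As the PS notes, an A4 WU on the uniprocessor slot would also match the `fahcore_a*.exe` rule.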

"Below normal" priority will suspend the uniprocessor core on the 0th CPUcore whenever the GPU needs the attention, but the uniprocessor core will immediately use any free cycles on CPUcore 0 when the GPU is paused or folding without needing CPU attention. To the best of my knowledge "Below Normal" priority on SMP:3 core should cause your day-to-day apps - "Normal" priority by default - to suspend the uniprocessor FAHCore first. If you multitask or your apps otherwise need more than one CPUcore concurrently, the OS scheduler will then decide (more or less arbitrarily) which of the "Below Normal" FAHCores/threads to suspend whenever your own stuff needs CPU time. The idea is to "protect" SMP:3 performance as much as possible, without compromising system (CPU) responsiveness.
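Which FAHCore wins CPUcore 0 can be sketched as a strict-priority choice among the processes runnable there. This is a simplification under stated assumptions: real Windows scheduling also round-robins threads of equal priority and temporarily boosts starved or foreground threads, so the lower-priority core can still get occasional slices.

```python
# Subset of Windows priority classes, lowest to highest (Task Manager names).
PRIORITY_RANK = {"Low": 0, "Below Normal": 1, "Normal": 2}


def who_runs(runnable):
    """Given (name, priority) pairs runnable on one core, return the name a
    strict-priority scheduler would run, or None if nothing is runnable."""
    if not runnable:
        return None
    return max(runnable, key=lambda p: PRIORITY_RANK[p[1]])[0]
```

With the GPU core needing CPU time, `who_runs([("uniproc", "Low"), ("gpu", "Below Normal")])` picks the GPU core; once the GPU core goes quiet, only the uniprocessor core is left and it gets the cycles.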

Takes some effort to set up, but IMHO it's worth it. Once it's done, everything is pretty much "start and forget" for 24/7 operation, apart from pausing/restarting the GPU client whenever you feel like it. Once you've configured ProLasso (I'd recommend disabling all the extra features like ProBalance), you can even shut down the ProLasso main window, only leaving ProcessGovernor.exe running in the background. Its CPU usage is near zero, at least if you set the priority/affinity check interval to the maximal 10 seconds. That is entirely sufficient for FAHCore processes, since they run for hours if not days at a time.

This works for v6 clients and v7 beta client slots alike: I've discussed FAHCore executables here, and they are the same for both client versions. However, I'd recommend the v7 beta client and client-type='advanced' for your AMD GPU slot. That way you will receive FAHCore_16.exe and appropriate advanced WUs for AMD 5xxx/6xxx cards. Those will utilize your GPU much more efficiently than the earlier ATI/AMD cores and Work Units. If you decide to use only the v7 client, switch to Advanced (or Expert) view in FAHControl and you'll be able to pause the slot(s) individually.

Another note: at least the v7 client recommends an even number of CPU cores for SMP. To the best of my knowledge, SMP:3 should be OK, but IIRC I've seen error reports with larger primes such as SMP:7 and SMP:11. If points aren't your top priority, the rock-steady solution would be SMP:2 + 2x uniprocessor slots + GPU slot, adjusting affinities to "2;3" for SMP FAHCores and "0;1" for uniprocessor & GPU FAHCores.

Back to the original question... sometimes PPD/W-oriented people choose not to fold on the GPU at all and run SMP only, since GPU and (most) uniprocessor WUs don't have a Quick Return Bonus system in place (yet?)... Madcap, you've set up a passkey, right?

Ultimately, it's up to you to decide which combo is best for you, the rambling above is just what I would do. :eugeek:

PS:
FahCore_a4.exe WUs are a bit problematic concerning the affinity tweaking. They are hybrid WUs for both SMP and fast uniprocessors, so it is possible to get them for uniprocessor slots also. This foils the FahCore_a*.exe "1;2;3" affinity scheme if you happen to get an A4 WU for the uniprocessor slot as well. However, there are currently some other client settings one can adjust in order to receive only FahCore_78 WUs for the uniproc slot. With the v7 public release, 3rd party utilities such as ProLasso may become unnecessary; see ticket #38.
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display

Re: To GPU or not to GPU

Postby bruce » Mon Dec 19, 2011 8:03 pm

Consider another possibility.

Use SMP 3 as Napoleon suggests, along with a Uniprocessor and a GPU client. Leave the Uniprocessor client at IDLE priority but set both the SMP and the GPU client to SLIGHTLY HIGHER priority. That should allow both the SMP and the GPU clients to have dedicated access to the CPU resources they want, and the Uniprocessor client will only get resources that are not used by either of them. The Uniprocessor client will run at full speed whenever there is a break in processing for either the GPU or the SMP client. With these settings, setting affinity would be unnecessary.
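The intent of this scheme can be modeled as handing out whole cores in priority order. This is an idealized sketch with made-up names: it treats "slightly higher" and IDLE as two ranks and ignores thread migration, priority boosts and the multithreading caveat raised later in the thread, so it shows what the priorities are meant to achieve rather than what Windows guarantees.

```python
def allocate(cores: int, demands):
    """Grant whole cores in priority order (higher rank first).

    demands: list of (name, priority_rank, threads_wanted).
    Returns {name: cores_granted}. Equal ranks keep their listed order.
    """
    granted = {}
    free = cores
    for name, _rank, wanted in sorted(demands, key=lambda d: d[1], reverse=True):
        take = min(wanted, free)
        granted[name] = take
        free -= take
    return granted


SLIGHTLY_HIGHER, IDLE = 1, 0
gpu_active = allocate(4, [("smp", SLIGHTLY_HIGHER, 3), ("gpu", SLIGHTLY_HIGHER, 1), ("uni", IDLE, 1)])
gpu_paused = allocate(4, [("smp", SLIGHTLY_HIGHER, 3), ("uni", IDLE, 1)])
```

With the GPU busy, the uniprocessor client gets nothing; with the GPU paused, it picks up the spare core.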

This isn't the configuration we normally recommend, so you'll need to test it and compare it with other options, but it might be the best option, considering the uniqueness of your constraints.

Re: To GPU or not to GPU

Postby Madcap » Mon Dec 19, 2011 10:02 pm

Thank you very much for your in-depth answers. At the moment I am running only the v6.34 on all CPU cores and not folding using the GPU, but I will probably try to experiment with the settings as you suggest.

Should I use only the V7? As mentioned I tried that but the work unit I got was very large (or it was very slow/inefficient to process, I cannot tell). Also, I couldn't find an easy way to start / stop just one slot, they would all stop when I clicked pause. If possible I would like to use v6.34 and run one SMP (3) and one single core thread along with V7 for the GPU.

The computer I am folding on is my main computer, so I need the folding to be non-intrusive with regards to noise. I am sure you understand.

Re: To GPU or not to GPU

Postby Zagen30 » Mon Dec 19, 2011 10:39 pm

Client choice almost never affects WU selection (the two exceptions I know of are 1) the core 16 WUs for AMD cards can only be obtained using v7 and 2) old versions of v6 SMP can't get a lot of the newer projects). The large WU you got was either because of different slot settings or just random chance. What project was it from, and what was the TPF? The large projects have longer deadlines and usually earn more points per WU to compensate for their size and runtime (some projects have more noticeable discrepancies than others based on the hardware that's running them).

To pause one slot, right-click on the slot in FAHControl and select pause.

I've certainly found v7 to be much easier to manage than multiple separate v6 clients, but ultimately it's your choice.

Re: To GPU or not to GPU

Postby Madcap » Mon Dec 19, 2011 11:01 pm

I don't remember the number of the project, but it had four digits. The TPF for the SMP I think was around 26 minutes. It would take about 2 days to complete and would yield about 1300 points when running 4 cores along with the GPU. Isn't that a very bad score? I read about people getting >10k a day. The GPU would complete a unit in about 4 hours, TPF about 2 minutes and 30 seconds.

Is it possible to set up what bruce suggested in V7?
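These figures can be sanity-checked with a little arithmetic, assuming a WU is 100 frames (TPF being the time per 1% of progress). A 26-minute TPF then means about 43 hours per WU, consistent with "about 2 days"; a 2.5-minute GPU TPF gives a bit over 4 hours; and 1300 points over roughly 1.8 days is about 720 points per day before any bonus.

```python
FRAMES_PER_WU = 100  # assumption: one frame per 1% of progress


def wu_hours(tpf_minutes: float, frames: int = FRAMES_PER_WU) -> float:
    """Hours needed to finish one WU at a given time per frame."""
    return tpf_minutes * frames / 60.0


def ppd(points_per_wu: float, tpf_minutes: float, frames: int = FRAMES_PER_WU) -> float:
    """Points per day if WUs are completed back to back."""
    days = tpf_minutes * frames / (60.0 * 24.0)
    return points_per_wu / days


smp_hours = wu_hours(26)      # SMP slot, TPF about 26 minutes
gpu_hours = wu_hours(2.5)     # GPU slot, TPF about 2.5 minutes
smp_base_ppd = ppd(1300, 26)  # base points only, no bonus
```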

Re: To GPU or not to GPU

Postby Zagen30 » Mon Dec 19, 2011 11:51 pm

If you still have the log file, you can check that for WU information; assuming a normal installation, the folder with the logs is [user name]/AppData/Roaming/FAHClient/logs (the current log is just under FAHClient, while the older ones get sent here).
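If you want to pull project numbers out of the logs automatically, a small script can do it. This is a sketch under assumptions: it uses the default location described above (%APPDATA% points at [user name]\AppData\Roaming on Windows 7), and it assumes WU lines in the v7 log contain text of the form "Project: NNNN". Check your own log and adjust the regular expression if the format differs.

```python
import os
import re
from pathlib import Path

# Assumed log text, e.g. "Project: 7014 (Run 0, Clone 1, Gen 2)".
PROJECT_RE = re.compile(r"Project:\s*(\d+)")


def log_dir() -> Path:
    """Default folder for older v7 logs; the current log sits one level up."""
    return Path(os.environ.get("APPDATA", "")) / "FAHClient" / "logs"


def projects_in(lines) -> list[int]:
    """Distinct project numbers mentioned in the given lines, first-seen order."""
    seen: list[int] = []
    for line in lines:
        match = PROJECT_RE.search(line)
        if match and int(match.group(1)) not in seen:
            seen.append(int(match.group(1)))
    return seen
```

For example, `projects_in(open(path, encoding="utf-8", errors="replace"))` for each log file you find in that folder.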

The slow speed may have been due to the GPU. As Napoleon explained, SMP only goes as fast as the slowest core, and AMD cards use up most of one core (that's a problem with AMD's drivers, not something the Pande Group can fix), so the high TPF could have been because of that.

Also, I looked at WUs with roughly 1300 base points, and saw that projects 6040 and 6041 fit the bill. Those are rather large and slow-processing projects, which would match your description. FAHControl may have only been showing the base points; the bonus points are what enable people to get five-figure points per day totals on SMP folding.
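The bonus points mentioned here come from the Quick Return Bonus. A commonly cited form is final = base * max(1, sqrt(k * deadline / elapsed)), where k and the deadline are set per project, and the bonus only applies with a valid passkey. The sketch below uses that form; any k and deadline values you try with it are placeholders, not the real parameters of projects 6040/6041.

```python
import math


def qrb_points(base_points: float, k: float, deadline_days: float, elapsed_days: float) -> float:
    """Credit for one WU under the quick return bonus form described above.

    Returning faster raises the multiplier; it never drops below 1x base.
    """
    factor = math.sqrt(k * deadline_days / elapsed_days)
    return base_points * max(1.0, factor)
```

Because faster returns earn more points per WU and also more WUs per day, bonus PPD grows faster than linearly with speed, which is why well-tuned SMP rigs reach five-figure totals.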

Yes, it's possible to do SMP:3/uniprocessor/GPU in v7. See the guide for information on how to add additional slots.

Re: To GPU or not to GPU

Postby Napoleon » Tue Dec 20, 2011 12:09 am

As I mentioned earlier, you need to switch the v7 FAHControl to Advanced or Expert view (see the top right corner of the FAHControl window). Either of those views will have a slot list, allowing you to pause slots independently: just right-click on the appropriate slot and a menu appears. For example, I sometimes pause my ION GPU slot when I'm at my computer and restart it when I leave; my other slots fold 24/7.

@Bruce, thanks for simplifying my optimization proposal a whole lot, "slightly higher" priority for SMP:3 & GPU slots and Idle priority for the uniprocessor slot should indeed accomplish the same thing in this case, without resorting to 3rd party utilities and cumbersome affinity tweaks. :D

@OP: make sure you have a passkey set up properly, you won't be eligible for SMP bonus points without it.

Re: To GPU or not to GPU

Postby Madcap » Tue Dec 20, 2011 5:51 pm

I've reinstalled V7 and now I have one GPU slot, one uniprocessor slot and one SMP:3 slot. Btw, I found out which project took so long: 7014.

When only the uniprocessor slot is running, the Task manager reports 25 % usage across all cores. How come?
Also, when only the smp:3 slot is running, three cores are at about 90 % while one is at 50 %?
Is there a way to designate which cores should work with which slots?

I cannot find an option to set the priority of each separate slot (there is an option only in Configure > Advanced, but it applies to all slots simultaneously). Should I put something in Configure > Expert?

Thanks for any help.

Re: To GPU or not to GPU

Postby bruce » Tue Dec 20, 2011 6:51 pm

By default, the OS assigns work to any available CPU so a single task will wander randomly across all cores. You CAN designate which CPU is to be used using the Affinity setting. If you use Windows to set it, the setting will not be retained when a WU finishes since a new copy of the FahCore starts up when a new WU is assigned. To retain the settings, you'll need a 3rd party utility such as Process Lasso which Napoleon suggested earlier. (Please re-read his earlier post.)
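A one-off Task Manager setting is lost because each new WU starts a new FahCore process with a new process ID, so a background tool has to keep polling for new processes. The class below is a hypothetical sketch of that bookkeeping only: it reports which FahCore PIDs are new since the last poll, and the caller is expected to apply affinity and priority to every PID it returns.

```python
from fnmatch import fnmatchcase


class AffinityWatcher:
    """Track which FahCore PIDs have already been configured (hypothetical helper)."""

    def __init__(self, pattern: str = "fahcore_*.exe"):
        self.pattern = pattern
        self.configured: set[int] = set()

    def poll(self, snapshot) -> list[int]:
        """snapshot: iterable of (pid, process_name) for running processes.

        Returns PIDs of matching processes not seen on the previous poll.
        """
        live: set[int] = set()
        todo: list[int] = []
        for pid, name in snapshot:
            if fnmatchcase(name.lower(), self.pattern):
                live.add(pid)
                if pid not in self.configured:
                    todo.append(pid)
        # Forget exited PIDs, since Windows may reuse those numbers later.
        self.configured = live
        return todo
```

In practice the snapshot could be built from the third-party psutil package's `process_iter(["pid", "name"])`, polled every few seconds in the same spirit as ProcessGovernor.exe.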

My earlier suggestion doesn't work as well as I'd like it to. When tasks such as SMP request multiple threads AND the total number of threads exceeds the number of processors, the Priority setting doesn't get the same respect as when the total number of independent tasks exceeds the number of processors. Locking in affinity settings may be the only way to do this cleanly, and you may or may not be comfortable with the added level of complexity.

Re: To GPU or not to GPU

Postby Napoleon » Wed Dec 21, 2011 1:24 am

I apologize in advance for the double post, but perhaps these two topics could be merged? The issues and solutions seem very similar to me:

To GPU or not to GPU
Running SMP + GPU optimally

Re: To GPU or not to GPU

Postby Napoleon » Wed Dec 21, 2011 6:22 am

Madcap wrote:When only the uniprocessor slot is running, the Task manager reports 25 % usage across all cores. How come? Also, when only the smp:3 slot is running, three cores are at about 90 % while one is at 50 %?

For uniprocessor only, the logic is simple, your average CPU use with uniprocessor is 1 CPUcore / 4 CPUcores * 100% == 25% for FAH. Like Bruce pointed out, the OS scheduler is just doing its magic, periodically migrating active processes/threads between CPU cores. These migrations happen too quickly for Task Manager/human eye to catch in the graphs, so the uniprocessor only case may look like it's using all four cores concurrently at 25%, but that's not really the case. It just looks like that on the average - (100% +0% +0% +0%) / 4cores - the 100% being rotated quickly through all 4 cores - produces the same average result as (25% +25% +25% +25%) / 4cores, namely 25% average CPU utilization.

Same logic applies to SMP:3 only: 3cores / 4cores * 100% == 75% overall CPU utilization for FAH. If you do the math for the apparent utilization of individual cores, it yields (90% + 90% + 90% + 50%) / 4 cores == 320% / 4 cores == 80% average utilization. The extra 5% compared to the theoretical 75% average utilization from FAH is easily explained as overhead from your non-FAH background jobs, running Task Manager to check CPU utilization being one of them.

Just to make sure: you do see steady 100% CPU utilization for SMP:3 + uniprocessor slots only, don't you? If not, something is wrong.
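The averaging argument above in code form: Task Manager's overall figure is the mean of the per-core percentages, so one busy thread migrating between cores averages out the same as four cores at 25%. The function names are just for illustration.

```python
def average_utilization(per_core: list[float]) -> float:
    """Overall CPU % as the mean of per-core percentages."""
    return sum(per_core) / len(per_core)


def expected_fah_utilization(busy_threads: int, cores: int) -> float:
    """Ideal FAH share of total CPU when each FAH thread keeps one core busy."""
    return 100.0 * busy_threads / cores


uni_snapshot = average_utilization([100, 0, 0, 0])   # one core busy at this instant
uni_smeared = average_utilization([25, 25, 25, 25])  # what the graphs appear to show
smp3_observed = average_utilization([90, 90, 90, 50])
smp3_overhead = smp3_observed - expected_fah_utilization(3, 4)
```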

Re: To GPU or not to GPU

Postby Madcap » Wed Dec 21, 2011 4:26 pm

Thanks for all the input. At the moment I am running the v6.34 with one smp:4 slot, nothing else. I will probably keep it this way for now. It works well and requires no Windows tweaking or external programs to function properly. I might do some more experiments in the future.

From a "consumer" point of view I hope there are plans to develop the folding software further so that the resource sharing between CPU and GPU is optimized automatically.

Re: To GPU or not to GPU

Postby Zagen30 » Wed Dec 21, 2011 5:28 pm

If you're talking about the CPU usage of AMD GPUs, we've been told it's a function of AMD's drivers, and is not something PG can fix.
