Helpful info for those with SMP2 and GPU3 clients.

Postby WoodburyMan » Sat Mar 12, 2011 5:34 pm

Thought I would share a few pointers I picked up while trying to optimize my SMP2 client. The original discussion is in the SMP2 thread here: http://foldingforum.org/viewtopic.php?f=58&t=17835&start=30
I have an i7 920 (quad core with Hyper-Threading, so 8 threads/CPUs) on Windows 7 64-bit with a GTX 570 SC @ 810 MHz.

Basically, there are some new WUs that will not run on an odd number of cores/threads. I used to run SMP with -smp 7 to use 7 of the 8 threads on my i7 920, leaving one free for GPU3. I never set affinity or anything like that, and I used to get 1:38 frame times with the GPU3 client while SMP2 ran this way. Because of these SMP2 work units I had to redo my setup using affinity settings, and it benefited my GPU3 client too.

Setting my SMP2 client to use 6 threads while running the GPU3 client affected SMP2 oddly. I was still getting roughly 1:38 frame times for GPU3, but SMP2 frame times would vary greatly, between 30 and 60 minutes on the same work unit.

By running just the GPU3 client with its affinity set to two threads (one physical core), I found out that although Task Manager reports the GPU3 client's fahcore_15 process using only 1-3% CPU, it really uses 25-40% of a single core depending on the work unit (roughly 10% of the whole CPU). This might be different for every GPU and CPU combo. I used CoreTemp (http://www.alcpu.com/CoreTemp/) to find this out. Without an affinity lock, it was spreading its workload across all threads, using a couple of percent CPU on each one. This is what was causing the bottleneck in my SMP2 client: it threw each thread out of sync with the others, so the other threads had to wait for the single delayed thread to catch up. For some reason the GPU3 client would get priority on those threads even though both had the same priority in Task Manager.

So I set my SMP2 client to use 6 threads and locked its affinity to CPUs 0,1,2,3,4,5 (3 physical cores). It ran each of them at 100% and left CPUs 6 and 7 alone. I then ran the GPU3 client locked to CPUs 6 and 7, and it used 25-40% of the 4th physical core. My GPU3 frame times improved from 1:38 to 1:18 on the same work unit, and have held steady at that across 4 work units now. My SMP2 frame times are holding steady too: the same WU that would vary from 35:00 to 60:00 per frame is now running at around 24:30.
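For anyone wanting to compare their own before/after numbers, here is a minimal Python sketch (the helper names are made up for illustration) that turns m:ss frame times into seconds and works out the change. On the GPU3 numbers above, 1:38 to 1:18 comes out to about 20% less time per frame, or about 26% more frames per hour. It only looks at frame times, not at how points per day are calculated.

```python
def tpf_seconds(tpf: str) -> int:
    """Convert an 'm:ss' (or 'h:mm:ss') time-per-frame string to seconds."""
    seconds = 0
    for part in tpf.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def compare_tpf(before: str, after: str) -> tuple[float, float]:
    """Return (% less time per frame, % more frames per hour)."""
    b, a = tpf_seconds(before), tpf_seconds(after)
    time_saved = (b - a) / b * 100
    throughput_gain = (b / a - 1) * 100
    return time_saved, throughput_gain


saved, gain = compare_tpf("1:38", "1:18")
print(f"{saved:.1f}% less time per frame, {gain:.1f}% more frames per hour")
# 20.4% less time per frame, 25.6% more frames per hour
```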

Now, since setting affinity manually each time I start the clients is a pain, here's the batch file I use to launch both Folding@home clients on startup.

Code:
@echo off
REM SMP2 client: logical CPUs 0-5 (mask 3F)
cd /d C:\Folding\FAH6.34-win32-SMP
start /affinity 3F Folding@home-Win32-x86.exe
REM GPU3 client: logical CPUs 6-7 (mask C0)
cd /d C:\Folding\Folding@home-Win32-GPU_Vista-641_570
start /affinity c0 Folding@home-Win32-GPU.exe

This locks SMP2 to CPUs 0,1,2,3,4,5 and GPU3 to CPUs 6,7, using the /affinity switch of the START command. The cores (fahcore_a3 for SMP2, fahcore_15 for GPU3) are child processes of the clients launched here, and they inherit their parent's affinity when they are created, so the setting stays the same for every new work unit as well.

If you have a system with a different setup, such as 4 or 6 cores, you will have to use different numbers for START's /affinity switch.
The way it works is you write the CPUs you want to use as a binary number. 11111111 would mean all 8 CPUs on in an 8-thread system. If you want CPUs 6,7 to run it would be 11000000, 192 in decimal. Convert that to hex and you get C0; that is why I use /affinity c0 for the GPU3 client. If you want to use CPUs 0,1,2,3,4,5 you would use 00111111, 63 in decimal. Convert that to hex and you get 3F; that's why I use /affinity 3F for my SMP2 client.

It's fairly easy to figure out. CPU numbers go from right to left: 00111111 uses the first 6 CPUs, 00001111 the first 4 and 00000011 the first 2. 00001100 would use the 3rd and 4th CPUs (CPU 2,3). 01010101 would use CPUs 0,2,4,6, and 10101010 would use CPUs 1,3,5,7. Just convert those numbers from binary to decimal, then convert the decimal number to hex. Any questions, let me know!
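The same binary-to-hex conversion can be scripted. Below is a minimal Python sketch (a hypothetical helper, assuming CPUs are numbered from 0 as in the examples above): it sets bit N for each CPU N you want and prints the result in hex, which is the form START /affinity takes.

```python
def affinity_mask(cpus):
    """Return the hex mask for START /affinity that selects the given CPUs."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # CPU N is bit N, counting from the right
    return format(mask, "X")


for cpus in ([0, 1, 2, 3, 4, 5], [6, 7], [2, 3], [0, 2, 4, 6], [1, 3, 5, 7]):
    mask = affinity_mask(cpus)
    print(cpus, format(int(mask, 16), "08b"), mask)
# [0, 1, 2, 3, 4, 5] 00111111 3F
# [6, 7] 11000000 C0
# [2, 3] 00001100 C
# [0, 2, 4, 6] 01010101 55
# [1, 3, 5, 7] 10101010 AA
```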
WoodburyMan
Posts: 46
Joined: Tue Mar 08, 2011 1:30 pm
Location: CT, United States

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby WoodburyMan » Mon Mar 14, 2011 4:32 pm

FYI, this is also giving my SMP2 client a PPD boost when running WUs other than the 101xx projects. Frame times on those other units are 3-4 minutes (about 20%) faster too.

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby Napoleon » Tue Mar 15, 2011 6:07 am

EDIT: Adjust priorities at your own discretion. Windows Task Manager, for example, gives you a warning when you change them. :!:

Good to hear, WoodburyMan. :egeek:

Inspired by the positive results, I had a really wild affinity/priority tweaking idea, in case you'd like to experiment even further:
  1. Configure your system for "Background services" (EDIT: link)
  2. Start the SMP client with "start /affinity 3F Folding@home-Win32-x86.exe" (like you are doing now)
  3. Use e.g. Task Manager to find all the processes with High priority and lock their affinity to CPU6 only (remember "Show processes from all users")
  4. Start the GPU3 client with "start /affinity 80 Folding@home-Win32-GPU.exe" (locks the GPU3 client/FAHcore to CPU7 only)
  5. Check TPF
  6. Set FahCore*.exe process priorities to High
  7. Check again after a few frames
High SMP2 FAHcore priority and the affinity tweaks together should keep all other activities off of CPUs 0, 1, 2, 3, 4 & 5. They'll get whatever CPU6 & CPU7 cycles the High priority system tasks and the High priority GPU3 FAHcore leave them. Also, the CPU7 (hyper)thread is completely devoted to the GPU3 client, which should guarantee maximal GPU utilization at all times. This could make your system seem quite sluggish, but hopefully it will remain responsive enough for casual use: browsing, IM, email, media streaming etc.
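Since several different masks are in play here, a quick way to sanity-check them is to decode each hex value back to the logical CPUs it selects. A small Python sketch, assuming an 8-thread CPU numbered 0-7 (the helper name is hypothetical): it shows that 3F is CPUs 0-5, C0 is CPUs 6 and 7, and 80 is CPU7 only; a CPU6-only mask would be 40.

```python
def cpus_in_mask(mask_hex, num_cpus=8):
    """Return the 0-based CPU numbers selected by a hex affinity mask."""
    mask = int(mask_hex, 16)
    if mask >> num_cpus:
        raise ValueError(f"mask {mask_hex} uses CPUs beyond CPU{num_cpus - 1}")
    return [cpu for cpu in range(num_cpus) if (mask >> cpu) & 1]


for mask in ("3F", "C0", "80", "40"):
    print(mask, cpus_in_mask(mask))
# 3F [0, 1, 2, 3, 4, 5]
# C0 [6, 7]
# 80 [7]
# 40 [6]

# The SMP2 and GPU3 masks should not share any CPU
assert not set(cpus_in_mask("3F")) & set(cpus_in_mask("80"))
```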

Note: you'll lose the FAHCore priorities when a new WU starts. However, if there's any PPD gain and you decide to stick with the tweaks for good, you could use e.g. Process Lasso to automate all of the above steps.
Last edited by Napoleon on Sun Apr 17, 2011 12:20 am, edited 2 times in total.
Win7 64bit, FAH v7, OC'd
2C/4T Atom330 3x667MHz - GT430 2x832.5MHz - ION iGPU 3x466.7MHz
NaCl - Core_15 - display
Napoleon
Posts: 1032
Joined: Wed May 26, 2010 2:31 pm
Location: Finland

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby Napoleon » Tue Mar 15, 2011 7:09 am

Did some benchmarking on my Atom330 (2 physical cores, 4 threads), GT430 & W7 x64 Ultimate. As a baseline, I ran one classic client on core 0 and the GPU3 client on core 1. The results:
  1. Classic: 25min 14s per frame (P6526, R4, C90, G72). Reported CPU time per frame was almost exactly the same as TPF
  2. GPU3: 4min 40s per frame (P6800, R3568, C2, G31)
Once I also forced the classic client onto core 1, the classic client's frame time (read: wall clock time) degraded to 27:33, but the GPU3 frame time remained the same. Quite a performance hit for the classic WU, considering that FahCore_15.exe CPU utilization was only about 2%. Even more interesting, the classic client's CPU time per frame was again almost the same as the wall clock time (27:33). A bit strange; you'd expect the CPU time per frame to remain constant, wouldn't you?

According to Wikipedia, Vista (and Seven) should have very accurate CPU time accounting, using the RDTSC instruction. To investigate, I used Process Explorer to see whether something "extra" takes place when the GPU3 client is running. Sure enough, there was one noticeable change in the System process (PID 4): the context switch delta for "dxgmms1.sys!VidMmInterface+0x26500" leaped to about 500 context switches per second. Thus my wild speculations start once again...

AFAIK, FahCore_15.exe moves data between the GPU and main memory quite frequently. Perhaps at about 1000ms / 500 == 2ms intervals... :?:
Well well, if the amount of data is large enough and goes through CPU cache, perhaps all previous data gets evicted from the cache(s). Could it be that the GPU3 CPU utilization is actually quite low for some projects and WUs, but it still can ruin things for other apps because of this?

I doubt classic - let alone SMP2 - client would particularly like a situation where there's already contention for CPU core resources and some killjoy comes to trash the cache every other millisecond or so. Especially if the OS scheduler allows that particular killjoy to run rampant across all CPU cores unless explicitly instructed otherwise using affinity settings... maybe someone in the know would like to educate me a bit? :ewink:

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby baz657 » Thu Mar 17, 2011 3:18 pm

Napoleon wrote:Note: you'll lose the FAHCore priorities when a new WU starts. However, if there's any PPD gain and you decide to stick with the tweaks for good, you could use e.g Process Lasso to automate all of the above steps.

I did a quick search a couple of weeks back and found a small add-on called prio. It saves whatever priority you set for future instances. Download it from here: http://www.prnwatch.com/download.html (it's the last one down), choose 32- or 64-bit, install & reboot. You'll then have to tick the save priority option when you change it.

You'll have to change the priority of each different core (a3, 11, 15, etc.) once while it's running for it to take effect, but from then on you can forget about it.
baz657
Posts: 64
Joined: Thu Nov 12, 2009 1:40 pm
Location: Chesterfield, UK

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby WoodburyMan » Thu Mar 17, 2011 4:41 pm

Actually, in addition to the /affinity switch, the START command can also set the process priority.
There are a few settings: /LOW, /NORMAL, /HIGH, /REALTIME, /ABOVENORMAL and /BELOWNORMAL.

Much better than using any third-party software IMHO, since it's built into Windows. You just have to make a batch file to start it.

You would run it like the following:
Code:
cd C:\Folding\FAH6.34-win32-SMP
start /affinity 3F /HIGH Folding@home-Win32-x86.exe
cd C:\Folding\Folding@home-Win32-GPU_Vista-641_570
start /affinity c0 /NORMAL Folding@home-Win32-GPU.exe

Setting the clients to start with that process priority should in turn also set their child processes, the Folding@home cores, to the same priority. You can mix and match using any priority you want; I just showed HIGH and NORMAL as an example. I also included my CPU affinity settings, but you don't need them if you don't want to: just delete the /affinity xx part.
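If you launch several clients this way, the START lines can also be generated. Here is a hedged Python sketch (the function name and the priority check are my own; only the START switches listed above are used) that builds one line per client from a list of CPUs and an optional priority. START treats the first quoted argument as a window title, so a program name containing spaces gets an empty "" title in front of it. Whether the priority switch actually carries over to the cores is disputed further down this thread.

```python
# START priority switches as listed above (without the leading slash)
PRIORITIES = {"LOW", "BELOWNORMAL", "NORMAL", "ABOVENORMAL", "HIGH", "REALTIME"}


def start_line(exe, cpus=None, priority=None):
    """Build one 'start' line for a launch batch file (hypothetical helper)."""
    parts = ["start"]
    if " " in exe:
        # START reads the first quoted argument as the window title,
        # so give it an empty title before quoting the program name.
        parts.append('""')
        exe = f'"{exe}"'
    if cpus:
        mask = 0
        for cpu in cpus:
            mask |= 1 << cpu  # CPU N -> bit N of the affinity mask
        parts.append(f"/affinity {mask:X}")
    if priority:
        flag = priority.upper()
        if flag not in PRIORITIES:
            raise ValueError(f"unknown START priority: {priority}")
        parts.append(f"/{flag}")
    parts.append(exe)
    return " ".join(parts)


batch = [
    r"cd C:\Folding\FAH6.34-win32-SMP",
    start_line("Folding@home-Win32-x86.exe", cpus=range(6), priority="high"),
    r"cd C:\Folding\Folding@home-Win32-GPU_Vista-641_570",
    start_line("Folding@home-Win32-GPU.exe", cpus=[6, 7], priority="normal"),
]
print("\n".join(batch))
```

Writing the output to a .bat file gives the same launcher as above, with the masks computed rather than converted by hand.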

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby 7im » Thu Mar 17, 2011 5:09 pm

We don't recommend changing the priority in Windows, because it causes unexpected results. It is also unnecessary, as the client has a priority setting option.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
7im
Posts: 14648
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby WoodburyMan » Thu Mar 17, 2011 5:20 pm

Agreed. I find it unnecessary, as there shouldn't be a need to do it. Setting the priority of the FAH client means the FAH client gets priority over other processes. Ideally you wouldn't want any other processes on the same CPU/thread as the FAH client, period, because they would throw each thread out of sync with the others, which is why the affinity settings are important. Even if you do set the FAH client to a higher priority, the lower-priority processes would still slip in no matter what, slowing down that thread and causing it to fall out of sync with the other threads. The best thing to do would be to move any non-FAH background processes that use CPU over to the cores not used by FAH.

I just put that there in case someone really wishes to try it. I'm not running it myself.

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby bruce » Thu Mar 17, 2011 6:19 pm

I don't think you're recognizing the difference between the client priority and the core priority. The function of the client is to download and upload WUs and to start a FahCore. It's the SMP cores that start multiple threads that need to be synchronized, and the core priority is set directly by the configuration setting in the client. If an SMP FahCore is actively processing a WU while the previous result is being uploaded, the client is going to interrupt the core the same number of times to transmit data from disk to the internet no matter what priority you give it.

Let's not get sidetracked by client priority when it probably doesn't make any difference anyway. Unless you prove it does make a difference, go with 7im's recommendation: don't recommend that others mess with it just because they can. Any time you change something from the default, you should know what you're changing and why, and you should be willing to suffer the consequences when something unexpected happens.

If nothing noticeable happens, change it back, in case you changed something whose effect was not noticed and only shows up much later, when you've forgotten that you're not running the expected configuration.
bruce
Posts: 22712
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby WoodburyMan » Thu Mar 17, 2011 6:51 pm

Not sure if the core/client distinction was aimed at me or not, but I do understand it. However, settings applied to the client when it starts also apply to any processes it creates (the cores): they get the same affinity and the same process priority for Windows scheduling. That's why it works for my affinity settings: I start the client with the settings, and when it creates the core, the core gets the same affinity as the parent process.

I again agree it's pointless to change process priority, but hopefully someone can try it and say for certain. I'm not risking my PPD on it :lol:

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby bruce » Thu Mar 17, 2011 7:01 pm

Setting affinity for the client does also set that affinity for any core that it starts. Setting priority for a client is useless because the client or core (I forget which) sets the priority of the core in accordance with the configuration choice of Lowest_Possible(IDLE) vs. Slightly_higher(LOW).

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby WoodburyMan » Thu Mar 17, 2011 7:25 pm

Ah. I thought it was just scheduling that the core itself tries to regulate (like the % CPU to use, which it fails at). I didn't know it actually set the task's priority in Windows scheduling, which means anything started with START /HIGH or similar would be reset by the core itself when it started. Again, I've never actually tried it and don't plan to. You'd think that if they can set task priority, the core would also be able to do a better job of adhering to the % CPU setting you can enter in the config, and set affinity for its cores.

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby codysluder » Thu Mar 17, 2011 8:41 pm

%CPU was written for the uniprocessor client. It works fine there. It was never updated to account for multiple threads/tasks.
codysluder
Posts: 2128
Joined: Sun Dec 02, 2007 12:43 pm

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby Nathan_P » Thu Mar 17, 2011 9:17 pm

codysluder wrote:%CPU was written for the uniprocessor client. It works fine there. It was never updated to account for multiple threads/tasks.


It works on the GPU2 client as well; I haven't tried GPU3 yet.
Nathan_P
Posts: 1442
Joined: Wed Apr 01, 2009 9:22 pm
Location: Jersey, Channel Islands

Re: Helpful info for those with SMP2 and GPU3 clients.

Postby codysluder » Thu Mar 17, 2011 9:28 pm

In that regard, the GPU cores are much like the uniprocessor cores. For SMP, suspending one thread/task out of many for (100-X)% of the time independently of the other threads/tasks doesn't produce a linear reduction in production or in CPU time. It probably could be rewritten to synchronously suspend processing for all threads/tasks.
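A toy model can make this concrete. The Python sketch below is not how the FahCores actually throttle; every name and number in it is hypothetical. Time is split into slots, each thread may only run during `active` slots out of every `period`, and each step of the simulation needs one running slot from every thread before a barrier lets the next step begin. With two threads throttled to 50%, pausing them together completes 50 steps in 100 slots (half of the unthrottled 100), while pausing them on alternating schedules completes only 19, because the running thread keeps waiting at the barrier for the suspended one.

```python
def is_running(slot, offset, active, period):
    """True if a thread whose run window starts at `offset` may run in `slot`."""
    return (slot - offset) % period < active


def steps_completed(offsets, active, period, total_slots):
    """Count lockstep steps finished within `total_slots` time slots.

    Each step needs one running slot from every thread; the next step can
    only start once the slowest thread has finished (a barrier).
    """
    if not 0 < active <= period:
        raise ValueError("need 0 < active <= period")
    t = 0
    steps = 0
    while True:
        finish = t
        for offset in offsets:
            slot = t
            while not is_running(slot, offset, active, period):
                slot += 1  # thread is suspended: skip ahead to its next window
            finish = max(finish, slot)
        t = finish + 1  # barrier: everyone waits for the slowest thread
        if t > total_slots:  # this step would finish after the last slot
            return steps
        steps += 1


full = steps_completed([0, 0], active=10, period=10, total_slots=100)
together = steps_completed([0, 0], active=5, period=10, total_slots=100)
staggered = steps_completed([0, 5], active=5, period=10, total_slots=100)
print(full, together, staggered)  # 100 50 19
```

Suspending all threads in the same windows, as suggested above, is what keeps the reduction linear in this model.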

Return to V6 GPU3 beta (including Fermi) OpenMM
