GPU2 Suggestions Thread

Re: GPU2 Suggestions Thread

Postby Wrish » Sun Feb 28, 2010 5:51 am

I feel GPU should remain separate from SMP... it's too different. SMP has access to so much RAM, limited parallelism, and very fast serial operation. A GPU has so little RAM (the alternative Tesla is way too expensive right now), tons of parallelism, and much lower clocks. Also, GPU is inherently less stable due to the pace of GPU development and the heretofore frequent turnover in driver code - as compared to CPU generations that stress backward compatibility - not to mention GPGPU is barely out of the Wild West stages.

If a novice can install a GPU systray as well as a uniprocessor systray, then with a little conflict avoidance in the client installers, a novice can install the two in combination; an SMP systray would be a nice addition, though.

P.S. That Linux client doesn't support GPUs. To run GPUs at all so far requires a third-party wrapper, or Windows.
Wrish
 
Posts: 390
Joined: Thu Jan 28, 2010 5:09 am

Re: GPU2 Suggestions Thread

Postby theteofscuba » Sun Feb 28, 2010 6:08 am

Wrish wrote:I feel GPU should remain separate from SMP... it's too different. SMP has access to so much RAM, limited parallelism, and very fast serial operation. A GPU has so little RAM (the alternative Tesla is way too expensive right now), tons of parallelism, and much lower clocks. Also, GPU is inherently less stable due to the pace of GPU development and the heretofore frequent turnover in driver code - as compared to CPU generations that stress backward compatibility - not to mention GPGPU is barely out of the Wild West stages.

If a novice can install a GPU systray as well as a uniprocessor systray, then with a little conflict avoidance in the client installers, a novice can install the two in combination; an SMP systray would be a nice addition, though.

P.S. That Linux client doesn't support GPUs. To run GPUs at all so far requires a third-party wrapper, or Windows.


I have no problem with the idea of having a single client. It might require some changes from what is currently ordinary, but that is OK. I don't make any claim that this is something easy to do, but it is preferable.

Say you had an SMP client and one or more GPU clients on a computer. Each of those clients will run at a particular rate. A unified client shouldn't need to be faster than the separate clients; it only has to ensure that each machine is put to maximum use. Instead of having people download and configure something like three clients, they would only have to download one client, and the unified client would handle scaling to the resources available on the machine that is running it. I assume that *a lot* of people have machines with two or more cores *AND* a compatible GPU! If these people aren't exactly computer savvy, they might only download the uniprocessor client, which is way under total capacity! The unified client's only requirement is to put most of these resources to use with as little effort as possible. Not everyone who runs Folding@home is an expert in computer science. This is why I advocate a unified client. I have no illusions that the unified client would actually be better optimized.
theteofscuba
 
Posts: 171
Joined: Wed Dec 05, 2007 7:15 am

Re: GPU2 Suggestions Thread

Postby Wrish » Sun Feb 28, 2010 6:44 am

Unifying CPU and GPU is a much bigger step than "unifying" SMP with uniprocessor into a systray. A3 should be the new uniprocessor after a qualifying benchmark, but they haven't even done that, so I'm not confident of their abilities to integrate asynchronous CPU and GPU clients. So much more could go wrong, and novices aren't good at troubleshooting a crash. Additionally, most "novices" don't have much of a GPU to worry about. There is a huge variation in performance of contemporary GPUs compared to modern CPUs. Just converting uniprocessor to A3 for the ubiquitous multi-cores would be the biggest help for science, I think.
Wrish
 
Posts: 390
Joined: Thu Jan 28, 2010 5:09 am

Re: GPU2 Suggestions Thread

Postby rfurgy » Sun Feb 28, 2010 7:35 am

theteofscuba wrote:I have no problem with the idea of having a single client. It might require some changes from what is currently ordinary, but that is OK. I don't make any claim that this is something easy to do, but it is preferable.

Say you had an SMP client and one or more GPU clients on a computer. Each of those clients will run at a particular rate. A unified client shouldn't need to be faster than the separate clients; it only has to ensure that each machine is put to maximum use. Instead of having people download and configure something like three clients, they would only have to download one client, and the unified client would handle scaling to the resources available on the machine that is running it. I assume that *a lot* of people have machines with two or more cores *AND* a compatible GPU! If these people aren't exactly computer savvy, they might only download the uniprocessor client, which is way under total capacity! The unified client's only requirement is to put most of these resources to use with as little effort as possible. Not everyone who runs Folding@home is an expert in computer science. This is why I advocate a unified client. I have no illusions that the unified client would actually be better optimized.

Very good point about a unified client being much more user friendly for the not-so-tech-savvy people. Totally valid.

I don't claim to be able to do the required work to make this happen, at least not yet. On the other hand, I do know enough about computers to know that, inside an operating system (no matter which one), everything is coordinated by the CPU, even what the GPU(s) are doing. Basically, a command is given and the CPU has to process it to get the end result. So the GPU client still has to use the CPU and system memory in order to run, simply because it's an application (software) running inside an operating system environment (more software). The operating system has to communicate with the running application (which takes resources), work out what to do with the request (CPU usage), and only then does the work reach its destination and execute. Add anything extra, like tracking (which is like pinging an IP address), a visual aid, or application options (commands which can be executed), and you end up using resources again. With that said, a unified client would not only make it easier for us donors, but would also produce faster results for the scientists on the other end by fully utilising the CPU, GPU, and memory.
rfurgy wrote:An example of the above, on an AMD Athlon 1400 system running Ubuntu: at CPU idle I open System Monitor and see a spike in CPU usage, which drops back off once the process has completed. If I keep watching, the system idles down and stabilises. Then I notice small fluctuations as processes start and stop. If I take note of the CPU % at that point, it might read 6%. If I then go to Edit > Preferences and switch the update interval to 5 seconds, I can effectively drop the CPU usage to 3%. This means that just by changing how often System Monitor polls my hardware for feedback information, I can cut CPU usage by 3 percentage points from that process alone.
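
The polling example above can be put into rough numbers. This is only a back-of-the-envelope sketch: it assumes each refresh costs a fixed amount of CPU time, and both the 0.15 s cost and the 2.5 s starting interval are made-up values chosen to reproduce the 6% and 3% readings, since the post does not give them.

```python
def monitor_cpu_share(cost_per_poll_s: float, interval_s: float) -> float:
    """Fraction of one CPU spent polling: CPU time per poll / time between polls."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return cost_per_poll_s / interval_s


# Assumed values (not from the post): 0.15 s of CPU work per refresh,
# first refreshed every 2.5 s, then every 5 s as in the example.
before = monitor_cpu_share(0.15, 2.5)
after = monitor_cpu_share(0.15, 5.0)
print(f"CPU share before: {before:.0%}, after: {after:.0%}")
```

Under this model, doubling the interval halves the monitoring overhead, which matches the direction of the observation; a real monitor will not be this tidy.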


The above mention of memory leads me to another thing: memory utilisation. I guess this isn't really client specific, but it could still apply here. I currently have 4GB of RAM and am only using about 36% of it with SMP running, Firefox open, music streaming in Rhythmbox, System Monitor open, and a chat going in Pidgin, all at the same time. I wanted to bring this up because I ran into some Linux distros that can load themselves from a CD entirely into a computer's RAM and run as if they were installed on a HDD. This shows me that applications, programs, processes, drivers, modules, etc. can be loaded into memory to produce faster operation of the system. I'm not sure exactly how an F@H client would have to be programmed for better memory use. In theory, and on the evidence of other running applications, there should be a way to get better use from it. Maybe a way to set aside an amount of memory specifically for pre-seeking, if available. Better yet, an option to load a unified SMP+GPU controller completely into memory if enough resources are available.
rfurgy wrote:I think it might be possible to have the two clients then act as hubs on the drive, there simply to be accessed if the controller needs them. Putting the controller of the client processes into memory and having it pre-seek would pretty much remove the slowest step in this part of the process (hard drive to memory). I would also think this would allow enough time to line up what to do with which processor. Also, as a side note, onboard graphics can actually communicate with other hardware faster than an add-on card. Ever since the video card was invented, the expansion slots in which they are installed have always slowed the process. Onboard graphics has much more efficient communication with the other hardware in a system (hence the reason Intel and AMD are planning to devote a CPU core entirely to graphics, thereby removing the communication bottleneck entirely).

Just to end it off, the memory controllers are now being integrated into the CPU, so there is much faster communication between the two (alongside technologies such as Hyper-Threading, HyperTransport and dual-channel memory).

Well, sorry for the novel up there :wink: I could talk about computers all day. I think I'll leave it at that to stew over in everyone's brains for a while and see what we come up with.
Happy Folding Everyone.

P.S. To be honest, what we need is a very stripped down terminal-like custom built Linux distro (which works with Grub2 for multi-booting) that can utilise an entire computer with Uni-processor, SMP, GPU and memory unification. :idea:
rfurgy
 
Posts: 37
Joined: Sun Feb 21, 2010 11:20 pm

Re: GPU2 Suggestions Thread

Postby theteofscuba » Sun Feb 28, 2010 8:11 am

Currently you might notice that the actual science is done in separate processes. For example, the uniprocessor client downloads FahCore_78.exe and spawns it as a separate process. I don't use the SMP client at the moment, but for the sake of argument, we should note that the GPU client downloads FahCore_11.exe for the GPU. The unified client would download FahCore_78 for each free CPU, and one FahCore_11.exe for each available GPU. This is all I expect. I run three uniprocessor clients because I don't want to use MPI or Deino as part of the SMP client on Windows.

The problem as I see it is a bit different by today's standards. When you are running console clients, they print a message every time you pass another 1% towards completing the work unit. If you are spawning multiple CPU and GPU science cores, then I can see how it is iffy to report status for all the science cores being executed. You could probably report these separately, and a third-party app like FahMon would give you appropriate status reports.

I just want to see the client detect how many GPUs are available and how many CPUs are available, and spawn cores that will make full use of the hardware.
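
To make the idea concrete, here is a minimal sketch of the "one science core per device" plan described above. It is not how any real F@H client works: `CoreJob` and `plan_cores` are hypothetical names, the core file names are simply the ones mentioned in this post, and the GPU count is passed in by hand because Python's standard library has no GPU detection.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreJob:
    core: str    # science core executable name, as mentioned in the post
    device: str  # hypothetical label for the hardware slot, e.g. "cpu0"


def plan_cores(cpu_count: int, gpu_count: int) -> list[CoreJob]:
    """Return one job per free CPU plus one job per available GPU."""
    if cpu_count < 0 or gpu_count < 0:
        raise ValueError("device counts must be non-negative")
    jobs = [CoreJob("FahCore_78.exe", f"cpu{i}") for i in range(cpu_count)]
    jobs += [CoreJob("FahCore_11.exe", f"gpu{i}") for i in range(gpu_count)]
    return jobs


# os.cpu_count() can return None, so fall back to a single CPU.
detected_cpus = os.cpu_count() or 1
for job in plan_cores(detected_cpus, gpu_count=1):  # gpu_count is assumed
    print(f"{job.device}: would spawn {job.core}")
```

The sketch only builds the plan; actually spawning the processes and reporting per-core progress is the harder part raised in the paragraph about status messages.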
theteofscuba
 
Posts: 171
Joined: Wed Dec 05, 2007 7:15 am

Re: GPU2 Suggestions Thread

Postby bruce » Sun Feb 28, 2010 8:29 am

theteofscuba wrote:. . . i don't use the SMP client at the moment. . . . because i don't want to use MPI or Deino as part of SMP clients on windows . . . I just want to see the client detect how many GPU are available, how many CPU are available and spawn a core that will make full use of the hardware.


First, I should point out that this is the GPU2 suggestion thread and you're not really making suggestions about the GPU2 client.

Nevertheless, there has been significant progress on one aspect of your suggestion. The new SMP2 core (known also as the A3 core) does a lot of what you're asking about. It detects how many CPUs are available and spawns a core that will make full use of all of them. (...and it does not use MPI or Deino). The older version of SMP is still being phased out, so we're not quite there yet, but it's an important step forward along the lines that you're suggesting.

A new version of the GPU2 client (called GPU3) is being developed. The features have not been announced yet, so I don't know if it will provide the GPU feature that you're requesting. We do know that it will be released in more than one phase, though, and I interpret that to mean that some of the plans are for things that can't be done yet.
bruce
 
Posts: 21413
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU2 Suggestions Thread

Postby shdbcamping » Sun Feb 28, 2010 8:57 am

Wrish wrote:I feel GPU should remain separate from SMP... it's too different. SMP has access to so much RAM, limited parallelism, and very fast serial operation. A GPU has so little RAM (the alternative Tesla is way too expensive right now), tons of parallelism, and much lower clocks. Also, GPU is inherently less stable due to the pace of GPU development and the heretofore frequent turnover in driver code - as compared to CPU generations that stress backward compatibility - not to mention GPGPU is barely out of the Wild West stages.

If a novice can install a GPU systray as well as a uniprocessor systray, then with a little conflict avoidance in the client installers, a novice can install the two in combination; an SMP systray would be a nice addition, though.

P.S. That Linux client doesn't support GPUs. To run GPUs at all so far requires a third-party wrapper, or Windows.

Intel has been exploring this since Larrabee. Sandy Bridge is coming, and my guess is that it ties the GPU more directly to the CPU :wink: . Kind of a 'reverse' Tesla dual-GPU combination. Just a quicker path than a 'bus' to the CPU core. Only time will tell... and that's the point :D

Please provide some links to back up the assumptions :wink: . There are a lot of folks who are speculating on tomorrow and not today :) . If this did not happen there would be no need to advance anything. Try to present your opinion as "your take" instead of as a contradiction or an 'absolute'. You are very knowledgeable, that I will not deny :wink: . But sometimes, folks just want to post an opinion. Please, let us learn to allow that some choose a different path. Right or wrong is not important.

This is just a "suggestion" thread :wink: . I need to learn to 'agree to disagree' as well, trust me :lol: .

Sean
shdbcamping
 
Posts: 587
Joined: Mon Nov 10, 2008 7:57 am

Re: GPU2 Suggestions Thread

Postby rfurgy » Sun Feb 28, 2010 9:25 am

theteofscuba wrote:The problem as I see it is a bit different by today's standards. When you are running console clients, they print a message every time you pass another 1% towards completing the work unit. If you are spawning multiple CPU and GPU science cores, then I can see how it is iffy to report status for all the science cores being executed. You could probably report these separately, and a third-party app like FahMon would give you appropriate status reports.

I just want to see the client detect how many GPUs are available and how many CPUs are available, and spawn cores that will make full use of the hardware.

I think you're on to something there. If I'm not mistaken, FahCore_a1 in SMP kind of does that. I think the problem with it is that it assumes four CPU cores and spawns four folding processes. From what I'm seeing, FahCore_a3 starts a single process but is awesome at multi-threading. So basically it's kind of working the same way Intel's Hyper-Threading works: a single processor/core running multiple threads by the use of a virtual processor. SMP is working in the same manner but from the other side, in order to better utilise multiple threads per CPU/core. I assume this would actually engage the dual-channel memory more efficiently as well.

But back to what you were saying. I think the goal should be developing a stable version of each client type that can effectively detect your hardware and work it, regardless of the number of CPUs/cores, with whatever WU it gets. Maybe even make a set of WUs handed out specifically for benchmarking, and give whatever the standard points are at the time when completed. Then have the client keep some sort of hardware and performance log for the server to sync with, just to make sure it isn't handing you a WU your system can't handle based on those logs. I'm sure a highly tuned WU delegation could help minimise unproductive situations (e.g. certain cores getting errors and deleting entire work units). I don't know if they do automatic debug logs and upload them or not, but if they don't, the clients could very well use it. I've lost three FahCore_a1 WUs this past week myself, which I'm sure would have been better utilised if there had been no error and they had been turned in.
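
As a rough illustration of the hardware-log idea, the sketch below has a server pick the first WU that fits what a client has reported. Everything here is hypothetical: `HardwareLog`, `WorkUnit`, the field names and the example queue are invented for illustration and say nothing about how the real assignment servers work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HardwareLog:
    cpu_cores: int
    ram_mb: int
    # Science cores that recently errored out on this machine (assumed field).
    failed_cores: frozenset[str] = frozenset()


@dataclass(frozen=True)
class WorkUnit:
    name: str
    core: str
    min_cpu_cores: int
    min_ram_mb: int


def eligible(wu: WorkUnit, log: HardwareLog) -> bool:
    """True if the machine meets the WU's needs and has not been failing its core."""
    return (
        wu.min_cpu_cores <= log.cpu_cores
        and wu.min_ram_mb <= log.ram_mb
        and wu.core not in log.failed_cores
    )


def assign(queue: list[WorkUnit], log: HardwareLog) -> Optional[WorkUnit]:
    """Return the first WU in the queue this client can handle, or None."""
    return next((wu for wu in queue if eligible(wu, log)), None)


log = HardwareLog(cpu_cores=2, ram_mb=4096, failed_cores=frozenset({"FahCore_a1"}))
queue = [
    WorkUnit("wu-big", "FahCore_a3", min_cpu_cores=4, min_ram_mb=1024),
    WorkUnit("wu-a1", "FahCore_a1", min_cpu_cores=2, min_ram_mb=512),
    WorkUnit("wu-ok", "FahCore_a3", min_cpu_cores=2, min_ram_mb=512),
]
chosen = assign(queue, log)
print(chosen.name if chosen else "nothing suitable")
```

Skipping a core that has been erroring is one possible policy; a real server might instead retry it after a client or driver update.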

As for the third-party app, that's pretty much how it is now. The thing with a third-party app is that it's an additional program stealing computing power from the client(s) folding at the time. The GUI clients for Windows are nice because they have built-in monitoring (I would think the built-in one would require fewer resources). I rather liked being able to take a peek at it to visually see the state it's in too, but in no way do I think that part of the program should be active unless called upon.

Thinking about all this, Linux support for GPU2 would be awesome. Good multiple-GPU detection and utilisation as well. Better RAM utilisation (high-end systems come nowhere near maxing out their RAM most of the time) to try to help speed up communication to and from the GPU(s). Too bad there isn't a way to transfer the data out through the ports on a video card. I would think that would produce lightning speeds, because that is the normal flow of their design.

Just a thought. The operating systems we are using to run these clients are very good at utilising the whole computer at any given time (usually). As a matter of fact, if I go into a terminal and call up my Linux kernel version, I get this:
Code:
#48-Ubuntu SMP Fri Oct 16 14:05:01 UTC 2009

Notice anything familiar? Yep, SMP. I'm curious to find out whether the Linux 6.29 SMP client could be patched into the kernel on a very stripped-down command-prompt operating system, made to install only what you need to drive your hardware and start folding with full utilisation of all hardware. Not to mention that many newer motherboards are capable of driving the hardware plugged into them. Pretty much, the folding client should be the desktop. :idea:
rfurgy
 
Posts: 37
Joined: Sun Feb 21, 2010 11:20 pm

Re: GPU2 Suggestions Thread

Postby amdfan404 » Thu Mar 18, 2010 4:54 am

I'm really disappointed with the GPU client. I thought it would be using my GPU (and soon my other ASUS ATI HD4670 in CrossFireX), but it also uses 100% of one core. That is a shame, because I won't be able to help out: I participate in BOINC WCG (World Community Grid) and I need my 2 cores (soon to be 4). If this client ever uses 10% or less of my CPU I would run it, and if so I'd like it to use both cores (5% each) so my BOINC WUs on CPU0 won't be delayed too much; the same goes for CPUs with 4, 6 or more cores. Thanks.

PS: I don't want to give 1 of my future 4 cores away. I've been wanting to get 4 cores to add them to the WCG, so it would be great if this GPU client could go into BOINC.
amdfan404
 
Posts: 23
Joined: Thu Mar 18, 2010 3:40 am

Re: GPU2 Suggestions Thread

Postby theteofscuba » Thu Mar 18, 2010 5:05 am

The development of OpenMM and OpenCL is promising. The F@H team or a collaborating group can write the science algorithms once in OpenCL code, then compile them for whatever platform supports OpenCL: x86, ATI, NVIDIA, PowerPC, whatever has an OpenCL library. If OpenCL actually does enumerate devices, which I suspect it does, then it should be easier to have a "unified" client for single-core, multi-core and/or multiple-GPU machines without requiring an expert admin to install it. Today's SMP client is tricky to install, about as tricky as teaching recruits how to install multiple uniprocessor clients, which is my main complaint about recruiting new people to F@H. Also, convincing them to download clients for their GPU(s) *AND* every available CPU is much too complicated and time consuming to teach each recruit.
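
On the enumeration question: OpenCL does expose its platforms and the devices under each one. The sketch below lists them through the third-party `pyopencl` bindings, which is an assumption of this example and not something the F@H clients are known to use. The helper name `list_opencl_devices` is made up, and it returns an empty list when `pyopencl` or an OpenCL driver is missing.

```python
def list_opencl_devices() -> list[tuple[str, str, str]]:
    """Return (platform name, device kind, device name) for each OpenCL device."""
    try:
        import pyopencl as cl  # third-party package; may not be installed
    except ImportError:
        return []

    found = []
    try:
        for platform in cl.get_platforms():
            for dev in platform.get_devices():
                # dev.type is a bit field, so test it against the known flags.
                if dev.type & cl.device_type.GPU:
                    kind = "GPU"
                elif dev.type & cl.device_type.CPU:
                    kind = "CPU"
                else:
                    kind = "other"
                found.append((platform.name, kind, dev.name))
    except cl.Error:
        # Typically raised when no OpenCL platform or driver is available.
        pass
    return found


devices = list_opencl_devices()
for platform_name, kind, name in devices:
    print(f"{platform_name}: {kind} {name}")
if not devices:
    print("No OpenCL devices found (or pyopencl is not installed).")
```

A unified client could use a list like this to decide how many CPU and GPU cores to start, though detecting a device is a long way from running science code on it reliably.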

theteofscuba
 
Posts: 171
Joined: Wed Dec 05, 2007 7:15 am

Re: GPU2 Suggestions Thread

Postby Zagen30 » Thu Mar 18, 2010 6:37 am

amdfan404 wrote:I'm really disappointed with the GPU client. I thought it would be using my GPU (and soon my other ASUS ATI HD4670 in CrossFireX), but it also uses 100% of one core. That is a shame, because I won't be able to help out: I participate in BOINC WCG (World Community Grid) and I need my 2 cores (soon to be 4). If this client ever uses 10% or less of my CPU I would run it, and if so I'd like it to use both cores (5% each) so my BOINC WUs on CPU0 won't be delayed too much; the same goes for CPUs with 4, 6 or more cores. Thanks.

PS: I don't want to give 1 of my future 4 cores away. I've been wanting to get 4 cores to add them to the WCG, so it would be great if this GPU client could go into BOINC.


Look on the ATI board for info about setting Environment Variables. If you set them right, you can significantly reduce CPU usage by the ATI GPU client. I'm pretty sure that many people have gotten it to under 10% CPU usage.
Zagen30
 
Posts: 1814
Joined: Tue Mar 25, 2008 12:45 am

Re: GPU2 Suggestions Thread

Postby toTOW » Thu Mar 18, 2010 2:32 pm

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Site Moderator
 
Posts: 8931
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: GPU2 Suggestions Thread

Postby amdfan404 » Fri Mar 19, 2010 7:09 am

Thx for the help, it worked perfectly :D

Anyway, I have suggested on the BOINC Message Board that the F@H GPU client be added to the BOINC client, since the CPU client is on it as part of the WCG, so the addition seemed logical to me. Of course I pointed out to them what you pointed out to me :)
amdfan404
 
Posts: 23
Joined: Thu Mar 18, 2010 3:40 am

Re: GPU2 Suggestions Thread

Postby bruce » Fri Mar 19, 2010 7:20 am

F@H is not part of WCG nor is it part of BOINC -- and that includes both the CPU client and the GPU client. It's an independent set of research projects run by Stanford University.
bruce
 
Posts: 21413
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU2 Suggestions Thread

Postby amdfan404 » Fri Mar 19, 2010 7:48 am

My bad, I got confused with Human Proteome Folding Phase 2.

Edit: I also noted that with these environment variables the execution is a lot faster, about 90 iterations/s. Weird... when the client used 100% of one of my cores and didn't have the variables, I got about 20 iterations/s max in the 'Display' window. I know this is faster when that window is not open.

So I suggest F@H set those variables according to the volunteer's available hardware. I think this is kind of urgent given how it affects both F@H and the PCs of volunteers who have little or no experience but got interested enough to want to help. Getting as far as learning about F@H, how it works and helps, and how they can participate with their hardware is already more than most people are willing to do. You could add a Java/Flash analyzer to check their hardware and tell them whether they can participate with the CPU/GPU clients, then offer them hyperlinks to download either or both clients, and then improve the GPU client so it can set those environment variables or, what I think would be better, handle those parameters internally and automatically. I think you could get a lot more volunteers a lot faster and more easily, plus you would save support time in the forums. Maybe even a table on your webpage could help potential volunteers see if they have the right hardware for F@H.
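
Setting the variables automatically could be as simple as a small launcher that adds them to the environment before starting the client. The sketch below shows only the mechanism: `EXAMPLE_GPU_YIELD` is a placeholder name, not one of the real variables from the ATI board, and the "client" is a stand-in Python one-liner that echoes the variable back.

```python
import os
import subprocess
import sys


def launch_with_env(cmd: list[str], extra_env: dict[str, str]) -> subprocess.CompletedProcess:
    """Run cmd with the current environment plus the given extra variables."""
    env = dict(os.environ)  # copy, so the launcher's own environment is untouched
    env.update(extra_env)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Placeholder tuning variable; the real names and values are not given here.
TUNING = {"EXAMPLE_GPU_YIELD": "1"}

# Stand-in for the GPU client: a child process that prints the variable it sees.
child = [sys.executable, "-c", "import os; print(os.environ['EXAMPLE_GPU_YIELD'])"]
result = launch_with_env(child, TUNING)
print("child saw:", result.stdout.strip())
```

Choosing the right values per GPU model would still need a lookup table maintained by someone who knows the hardware, which is the part volunteers currently have to work out from the forum.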

Best of luck, cheers.
amdfan404
 
Posts: 23
Joined: Thu Mar 18, 2010 3:40 am
