1000PPD increase with Intel Quad & 2 *ubuntu SMP clients

Testing the new OS X/Linux client in SMP mode.

Postby Ivoshiee » Tue Dec 04, 2007 12:38 pm

It is a nice HOWTO, but there are much easier ways to get FAH v6 running in SMP mode under Linux.

One of those ways is the finstall script:

Code: Select all
wget http://ra.vendomar.ee/~ivo/finstall; chmod +x ./finstall; ./finstall smp

Note: If you still want to run more than 1 SMP client then add a "dirs" option:
Code: Select all
wget http://ra.vendomar.ee/~ivo/finstall; chmod +x ./finstall; ./finstall smp dirs 20


If you insist on running the FAH client in a visible terminal window, you can do so by running:
Code: Select all
~/foldingathome/CPU1/FaH

but you can also run it in the background:
Code: Select all
~/foldingathome/folding start

or as a service:
Code: Select all
~/foldingathome/installService; ~/foldingathome/folding start


Note: 32-bit support on 64-bit Linux must be installed separately.
Ivoshiee
Site Moderator
 
Posts: 1286
Joined: Sun Dec 02, 2007 12:05 am
Location: Estonia

Postby toTOW » Tue Dec 04, 2007 1:01 pm

Your PPD increase is consistent with what happens in other situations:

Quad core under Windows, running two Linux VMs in VMware with two CPUs per VM.
Quad core under Windows, running two Windows SMP clients (not always very stable).

In fact, you're exploiting the fact that on quad cores, a single SMP client doesn't use the full processing power: some time is lost synchronizing threads. By running two SMP clients on a single quad core, you simply use this spare power.
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Site Moderator
 
Posts: 8711
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Postby Ivoshiee » Tue Dec 04, 2007 1:14 pm

00john00 wrote:Thanks for the info. How would you assign the clients to the appropriate cores? Running two clients and assigning them to the appropriate cores is the purpose of this HOWTO.

How much do you gain by assigning a FAH client to fixed cores?

At the moment you can edit the FaH files and add "taskset -c X,Y" in front of the FAH client startup command. If this is truly needed functionality, I can add it to the finstall script...
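A minimal sketch of what such a "taskset -c X,Y" prefix could look like when wrapped in a small helper. The function name run_pinned and the DRY_RUN switch are made up for this illustration and are not part of finstall; the only real tool used is taskset from util-linux.

```shell
# Sketch: start a command pinned to a given list of cores.
# Usage: run_pinned CORES CMD [ARGS...]
# run_pinned and DRY_RUN are hypothetical names used only in this example.
run_pinned() {
    cores=$1
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the command line that would be executed.
        echo "taskset -c $cores $*"
    else
        # taskset -c takes a comma-separated core list, then the command.
        taskset -c "$cores" "$@"
    fi
}

# Example (not executed here): pin an SMP client to cores 0 and 1.
# run_pinned 0,1 ./fah6 -smp
```

Setting DRY_RUN=1 first shows the exact command line without starting a client, which is handy for checking the core list before editing a startup file.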
Ivoshiee
Site Moderator
 
Posts: 1286
Joined: Sun Dec 02, 2007 12:05 am
Location: Estonia

Postby 7im » Tue Dec 04, 2007 4:39 pm

toTOW wrote:..

By running two SMP clients on a single quad core, you simply use this spare power.


No, it's not that simple. You are also returning work units more slowly, which is contrary to the nature and design of "High Performance" clients such as the SMP client.

The project recommendation is to run one fahcore per cpu core.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
7im
 
Posts: 14648
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Postby ChasR » Tue Dec 04, 2007 5:33 pm

You get a 1000 ppd increase in performance without assigning an instance to specific cores, so Ivo's question needs answering. I'll do some testing on one of my dedicated rigs and report back.
ChasR
 
Posts: 698
Joined: Sun Dec 02, 2007 5:36 am
Location: Atlanta, GA

Postby Rebel44 » Tue Dec 04, 2007 5:44 pm

If Stanford wants to avoid this, they have a few choices:

Allow users to set the SMP client to create 2 FAH processes for each detected core - this would make sure that FAH is using all available CPU performance.

Adjust points so that WUs which are finished sooner get a bigger bonus.
Rebel44
 
Posts: 21
Joined: Sun Dec 02, 2007 12:26 pm
Location: Prague - Czech Republic

Postby Ivoshiee » Tue Dec 04, 2007 5:49 pm

Rebel44 wrote:If Stanford wants to avoid this, they have a few choices:

Allow users to set the SMP client to create 2 FAH processes for each detected core - this would make sure that FAH is using all available CPU performance.

Adjust points so that WUs which are finished sooner get a bigger bonus.

We have had this kind of discussion before and we do not need to have it again. You can do whatever you like with your computer, but Stanford encourages a policy of one FAH uniprocessor client per real CPU core, or one FAH SMP client per four CPU cores. They will not enforce this policy, but they like to get WU results back as soon as possible, and running more FAH clients hinders this.
Ivoshiee
Site Moderator
 
Posts: 1286
Joined: Sun Dec 02, 2007 12:05 am
Location: Estonia

Postby hrsetrdr » Tue Dec 04, 2007 6:04 pm

Maybe my view is overly simplistic, but I'm seeing that one Q6600= two E6600's. If I run 2 fah clients on my Q6600, doesn't the TPF basically double?
hrsetrdr
 
Posts: 179
Joined: Sun Dec 02, 2007 4:29 pm
Location: In the Fold somewhere in SoCal.

Postby 7im » Tue Dec 04, 2007 6:22 pm

hrsetrdr wrote:Maybe my view is overly simplistic, but I'm seeing that one Q6600= two E6600's. If I run 2 fah clients on my Q6600, doesn't the TPF basically double?


Close to it, yes. However, the SMP client does not yet scale well, nor does it reach 100% CPU usage. E6600s are CPU bound; Q6600s not so much. The second client on a Q6600 can use some of the cycles the first client leaves unused, which is part of the PPD increase mentioned. The other part comes from moving from the Windows SMP client to the currently more efficient Linux client.
7im
 
Posts: 14648
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Postby Oak37 » Tue Dec 04, 2007 6:33 pm

7im wrote:
toTOW wrote:..

By running two SMP clients on a single quad core, you simply use this spare power.


No, it's not that simple. You are also returning work units more slowly, which is contrary to the nature and design of "High Performance" clients such as the SMP client.

The project recommendation is to run one fahcore per cpu core.

I think 7im's point is very valid, as I've noticed on my Ubuntu box (running one SMP client on 4 cores) that I receive work units with preferred deadlines of 24 hours (the final deadlines are still three days, though).
Which projects were you running before on one SMP client? Which projects are you getting now?
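To put the deadline concern in numbers, here is a rough sketch that converts a time per frame (TPF) into total hours per work unit. It assumes a WU consists of 100 frames (override with the second argument if that does not hold); the function name wu_hours and the TPF values in the example are hypothetical, not measurements from this thread.

```shell
# Sketch: estimate hours needed to finish one WU from its TPF.
# Usage: wu_hours MM:SS[.s] [FRAMES]   (FRAMES defaults to 100, an assumption)
wu_hours() {
    frames=${2:-100}
    echo "$1" | awk -F: -v n="$frames" \
        '{ printf "%.1f\n", ($1 * 60 + $2) * n / 3600 }'
}

# Hypothetical example values:
# wu_hours 7:00    -> one client alone
# wu_hours 14:00   -> same WU if the TPF roughly doubles with two clients
```

With these made-up numbers, a WU that takes about 11.7 hours with one client would take about 23.3 hours once the TPF doubles, which leaves very little margin against a 24 hour preferred deadline.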
Oak37
 
Posts: 134
Joined: Tue Dec 04, 2007 6:21 pm
Location: Ireland

Postby ChasR » Tue Dec 04, 2007 10:04 pm

As Oak points out, if you're running p305x you gain nothing, at best, from running two instances. If you get two p2608s, p2609s, or p2652s, you'll make less PPD than you would running a single instance of any of them. At least one of the two instances has to be a p2604, p2605, p2651, or p2653 for there to be any gain in production at all.
ChasR
 
Posts: 698
Joined: Sun Dec 02, 2007 5:36 am
Location: Atlanta, GA

Postby ChasR » Tue Dec 04, 2007 11:24 pm

Ivoshiee wrote:How much do you gain by assigning a FAH client to fixed cores?

Q6600 @ 3.33 GHz, Ubuntu 7.10, 2 GB ram @ 925 MHz, 4-4-4-12
Unassigned:
Instance 1 2653 (Run 22, Clone 193, Gen 17)
TPF: 10:44.8
PPD: 2358.3

Instance 2 2653 (Run 22, Clone 126, Gen 17)
TPF: 10:56.9
PPD: 2314.9
Total PPD: 4673.19

Assigned:
Instance 1 0,1 2653 (Run 22, Clone 193, Gen 17)
TPF: 10:42.0
PPD: 2368.6

Instance 2 2,3 2653 (Run 22, Clone 126, Gen 17)
TPF: 10:44.0
PPD: 2361.2
Total PPD: 4729.84

Improvement from fixed assignment of each instance to two cores: 56.65 PPD

As close as the results are, testing needs to use the same frames from the same WUs to ensure WU variability doesn't skew the results.
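The arithmetic behind these figures can be sketched as follows. This assumes 100 frames per WU and a credit of 1760 points for p2653; the credit value is not stated in the thread and is inferred here because it reproduces the posted PPD numbers. The function names ppd_from_tpf and ppd_gain are made up for this example.

```shell
# Sketch: PPD = points * 86400 / (TPF in seconds * frames per WU).
# Usage: ppd_from_tpf POINTS MM:SS[.s] [FRAMES]  (FRAMES defaults to 100, an assumption)
ppd_from_tpf() {
    echo "$2" | awk -F: -v pts="$1" -v n="${3:-100}" \
        '{ printf "%.1f\n", pts * 86400 / (($1 * 60 + $2) * n) }'
}

# Sketch: absolute and relative gain between two total PPD figures.
# Usage: ppd_gain BASELINE_TOTAL NEW_TOTAL
ppd_gain() {
    awk -v a="$1" -v b="$2" \
        'BEGIN { printf "%.2f PPD (%.2f%%)\n", b - a, (b - a) / a * 100 }'
}

# Example with the numbers posted above (1760 points is an inferred value):
# ppd_from_tpf 1760 10:44.8
# ppd_gain 4673.19 4729.84
```

Run against the posted totals, the gain from fixed assignment works out to roughly 1.2%, which is small enough that WU-to-WU variation could account for it.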
ChasR
 
Posts: 698
Joined: Sun Dec 02, 2007 5:36 am
Location: Atlanta, GA

Postby bruce » Tue Dec 04, 2007 11:43 pm

hrsetrdr wrote:Maybe my view is overly simplistic, but I'm seeing that one Q6600= two E6600's. If I run 2 fah clients on my Q6600, doesn't the TPF basically double?


Yes, that is too simplistic, for several reasons.

A lot of this still points back to the fact that the Q6600 is effectively two independent E6600s. A true quad would have four cores connected to a single cache with enough bandwidth to keep them all happy.

Each half has an independent cache, which puts a strain on the northbridge when portions of the same WU are split across the two halves. This is one place where you either need to run one client or you need to set affinity properly on two clients.

The other fact is that if the server detects 4 cores, it will give you work assignments that don't work well on only one half of the Q6600 (i.e. assignments that should not be run on an E6600, and especially not on portions of two E6600s separated by the northbridge). You have to face the fact that Stanford is TRYING to make the best use of the hardware it detects, and users who encourage people to disrupt that process are working contrary to Stanford's best efforts to improve the throughput of the overall FAH system. (I really don't want them to decide that server-based WU assignment customization doesn't work, just because users are unwilling to cooperate with the program. I also don't want to see a bunch of WUs with 30hr deadlines duplicated because they were turned in after two days, simply because they still get points even though somebody else had to run them.)

In my mind, running two VMs, each one appearing as a Duo, is a lot closer to acceptable than simply running two clients in the same OS. You can set the affinity for each VM so that it stays entirely within one cache, and CPU detection probably works properly.
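One way to check which cores actually share a cache before picking affinity pairs is sketched below. It assumes the kernel exposes cache topology under /sys/devices/system/cpu (recent Linux kernels do; older ones may not), and the function name list_l2_groups is made up for this example.

```shell
# Sketch: print each distinct group of CPUs that share a level-2 cache.
# Usage: list_l2_groups [SYSFS_CPU_DIR]   (defaults to /sys/devices/system/cpu)
list_l2_groups() {
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*/cache/index*; do
        # Skip entries that lack the expected files (or an unmatched glob).
        [ -r "$dir/level" ] && [ -r "$dir/shared_cpu_list" ] || continue
        if [ "$(cat "$dir/level")" = 2 ]; then
            cat "$dir/shared_cpu_list"
        fi
    done | sort -u
}

# Example (not executed here):
# list_l2_groups
```

On a chip built from two dual-core halves one would expect two groups in the output. The numbering depends on the system, so it is worth checking rather than assuming a particular pairing.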
bruce
 
Posts: 22437
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Postby Ultra Nexus » Wed Dec 05, 2007 1:41 am

Great guide for Linux, 00john00. But haven't you forgotten to include the chmod command that makes the fah6 program executable?

I set it like this: chmod +x ./fah6

BTW, I'm glad you liked my guide. :D
Ultra Nexus
 
Posts: 35
Joined: Sun Dec 02, 2007 8:51 pm
Location: Buenos Aires, ARG

Postby _ikki_ » Wed Dec 05, 2007 8:45 am

hi,

The two physical processors on a Core 2 Quad (effectively two E6600s) are identified by the core pairs 0/2 and 1/3.

Set the affinity accordingly:

Code: Select all
taskset -c 0,2 ./fah6 -smp -verbosity 9


and

Code: Select all
taskset -c 1,3 ./fah6 -smp -verbosity 9


I tried this on my Q6600 (before overclocking it), but I finally went back to one client. Yes, the gain in points is real, but you take more time to return the results to Stanford, and that is not acceptable for a "high performance" client.

Cheers :p
_ikki_
 
Posts: 30
Joined: Wed Dec 05, 2007 8:38 am
