Scheduling single work units on a cluster

mangler
Posts: 2
Joined: Mon Feb 07, 2011 7:56 am

Scheduling single work units on a cluster

Post by mangler »

Hi,

I know this is a bit of a tough question, and probably deserves a wiki page or something other than a post, but I'll start here.

OS: Linux
Version of FAH: Latest

I have been asked to look into the potential of setting up a Folding@home cluster, and I have an existing grid management tool that I need to integrate it with.

What is the current length of time needed to process a work unit on, say, 2 cores of a Xeon X5500-series processor? I need to keep the number of cores in use per system down, so I may need to run multiple independent copies of the client to promote work sharing with other cluster jobs.

Is there anything I should be aware of when setting up a large supercomputer-type job?

What scheduling systems have you used, and what worked or didn't?

Thanks in advance!
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Scheduling single work units on a cluster

Post by bruce »

Welcome to foldingforum.org, mangler.

Yes, it's a tough question. FAH is not designed to run on clusters but inasmuch as each node can be used independently, each node can be considered an independent computer as long as there's some scheduling plan. Stanford does not provide support for scheduling scripts but you might find other forum members who have some experience in that sort of thing, so it's an excellent question.

I suggest you start by reading the other topics mentioning the word "cluster", but I don't remember that there is any specific help with that question.

Does your Linux have both 64- and 32-bit libraries?

The duration of a job is a general question, though. Fundamentally there are several classifications of WUs and most run on Linux. The basic question is how many CPUs/threads are on a typical node. Nodes which have a uniprocessor are treated quite differently than nodes that have SMP-capable CPUs. FAH makes excellent use of an i7/Xeon that can run 8 or more threads locally but does not support inter-node communications. Nodes with dual or quad processors are in an intermediate class. If your cluster is non-uniform, it can get pretty tricky.
mangler
Posts: 2
Joined: Mon Feb 07, 2011 7:56 am

Re: Scheduling single work units on a cluster

Post by mangler »

Hi Bruce:

Yes, both 64- and 32-bit libraries are available.

Unfortunately, due to the other things on the cluster, we schedule each core independently if at all possible to keep our task churn rate as high as possible. I am currently using the -smp 2 flag to allow this.

Currently I am using the bigmem work units, but if the smaller ones run shorter, I would prefer shorter, quicker job sets. If I have to stop folding on a node for some reason, folding may not resume on that node for several weeks, resulting in wasted scientific effort.

Thanks for any advice.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Scheduling single work units on a cluster

Post by bruce »

Big packets (in the client configuration) are different from -bigadv in the parameter list, and I don't know which one you mean by bigmem. Specifying -smp 2 is reasonable if all your nodes are dual processors but can be a problem if some are Hyperthreaded single-core/dual-thread chips.

Most of the deadlines for -smp are around 4 to 6 days and anything that exceeds the deadline will be discarded by the server. The actual processing speed depends mostly on the GFLOPS capability allocated to the FahCore. If the node is a Quad, for example, limiting it to -smp 2 will only use half of the resources.
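
As a rough illustration of the deadline point above, a scheduler could check whether a WU is likely to finish in time before committing a node to it. This is only a sketch: the function name, the runtime estimate, and the "availability" fraction are all assumptions for illustration, not anything the FAH client reports.

```python
from datetime import timedelta


def fits_deadline(est_runtime: timedelta,
                  deadline: timedelta,
                  availability: float = 1.0) -> bool:
    """Return True if a WU is expected to finish before its deadline.

    est_runtime  -- assumed compute time on a dedicated node
    deadline     -- time allowed by the server (e.g. 4 to 6 days for -smp)
    availability -- assumed fraction of wall-clock time the node spends
                    folding (1.0 = dedicated, 0.25 = a quarter of the time)
    """
    if not 0.0 < availability <= 1.0:
        raise ValueError("availability must be in (0, 1]")
    # Stretch the compute time by how often the node is actually folding.
    wall_clock = est_runtime / availability
    return wall_clock <= deadline
```

For example, a WU that needs 2 days of compute fits a 4-day deadline on a dedicated node, but not on a node that only folds a quarter of the time (8 days of wall-clock time).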

The use of the -oneunit flag will end the client whenever a WU is finished rather than downloading a new assignment.
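
One way a grid scheduler might use this is to treat each WU as a single batch job: start the client with -oneunit, wait for it to exit, and release the cores. The sketch below assumes a client binary at a hypothetical path such as ./fah6 and a per-job working directory; only the -smp and -oneunit flags come from this thread.

```python
import subprocess


def build_command(client_path: str, threads: int) -> list[str]:
    """Build the argument list for a single-WU run of the client."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [client_path, "-smp", str(threads), "-oneunit"]


def run_one_unit(client_path: str, work_dir: str, threads: int = 2) -> int:
    """Run the client in work_dir until one WU finishes.

    Returns the client's exit code so the scheduler can decide
    whether to resubmit. client_path is a hypothetical location.
    """
    result = subprocess.run(build_command(client_path, threads),
                            cwd=work_dir, check=False)
    return result.returncode
```

A job script would then call something like run_one_unit("./fah6", "/scratch/fah/job01"), where both paths are placeholders for whatever the cluster actually uses.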
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460's for approx. 70K PPD
Location: Salem. OR USA

Re: Scheduling single work units on a cluster

Post by P5-133XL »

You may find that running the uniprocessor client is a better fit. Its WUs tend to be much smaller than the SMP WUs, allowing for a higher task churn rate. Also, the deadlines tend to be on the order of a month or more, so if you can't get back to a specific WU for a significant time, it is much less likely to have passed its deadline when you do get to it. You can also run multiple uniprocessor clients to fill up a node. The drawbacks are that you are unlikely to make as much PPD as with the SMP WUs, and there is a limit of 16 unique machineID's per machine. I'm just saying that it may be worth considering.
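
To make the "multiple uniprocessor clients" idea concrete, here is a minimal planning sketch that gives each client its own machine ID and working directory. The directory layout and function name are made up for illustration, and the default cap of 16 is simply the figure quoted in this post, so it is left as a parameter in case it does not apply to your platform.

```python
def plan_clients(cores_to_use: int,
                 base_dir: str = "fah",
                 max_machine_ids: int = 16) -> list[tuple[int, str]]:
    """Assign one (machine_id, working_directory) pair per core.

    Each uniprocessor client needs its own directory and a unique
    machine ID; IDs start at 1. base_dir is a hypothetical layout.
    """
    if cores_to_use < 1:
        raise ValueError("need at least one core")
    if cores_to_use > max_machine_ids:
        raise ValueError(
            f"{cores_to_use} clients exceeds the limit of {max_machine_ids}")
    return [(machine_id, f"{base_dir}/client{machine_id:02d}")
            for machine_id in range(1, cores_to_use + 1)]
```

Calling plan_clients(3) yields three pairs, one per client, which a job script could loop over to start each copy in its own directory.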
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Scheduling single work units on a cluster

Post by 7im »

I agree with P5, the CPU client is probably better for non-dedicated nodes. And I doubt the points are the biggest concern.
P5-133XL wrote:...and there is a limit of 16 unique machineID's per machine...
No limit on Linux clients. ;)
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Scheduling single work units on a cluster

Post by bruce »

You might want to read this topic on the same subject (including a comment from VijayPande):
viewtopic.php?f=55&t=17373