Dual X5650 giving me 20k PPD, why so slow?

Nathan_P
Posts: 1180
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by Nathan_P »

Nilem wrote:
Nathan_P wrote:If you haven't already, try linux as the OS. That 20k is low for a pair of x5650's
I didn't try; I actually thought Windows would be a bit faster. Do you have an estimate of how many more points I should get under Linux?
It's been a long time since I ran them, but under core A7 they were getting roughly 50k PPD.
gunnarre
Posts: 567
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by gunnarre »

MeeLee wrote: Disabling HT is also a good way to save power.
On some CPUs, folding with hyperthreading is actually more efficient than without. For example, Chris found that on a 3950X, enabling hyperthreading and folding on max-2 threads was the most efficient way to CPU fold. At least on that processor: https://greenfoldingathome.com/2020/08/ ... threading/
Online: GTX 1660 Super, GTX 1080, GTX 1050 Ti 4G OC, RX580 + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 960, GTX 950
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by bruce »

A lot depends on the specific project.

Suppose your WU uses 75% FP32 instructions, 3% FP64 instructions and 22% other instructions. Your FPU will be 78% busy.
Suppose another WU running on the other "half" of a hyperthreaded pair does the same. The shared FPU will be 100% busy (since it can't be (78+78)% busy), but you'll be getting more total work done: 100/78 ≈ 1.28, which is greater than 1.0.

OTOH, if it's not FAH running on the other "half" it may use as little as 0% FP instructions and much more than 22% other instructions so the competition for shared resources ceases to be a problem.
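
As a rough illustration of the arithmetic above, here is a minimal Python sketch. It assumes a deliberately simple model in which the shared FPU is the only bottleneck and each thread, running alone, keeps it busy for a fixed fraction of the time. The function name `smt_throughput_gain` is made up for this example, and real cores share more than just the FPU.

```python
def smt_throughput_gain(fpu_share):
    """Estimate total throughput of two identical threads sharing one FPU,
    relative to one thread running alone.

    fpu_share is the fraction of time a single thread keeps the FPU busy
    (0.78 in the example above). Simplified model: the FPU is the only
    shared resource and it saturates at 100%.
    """
    if not 0 < fpu_share <= 1:
        raise ValueError("fpu_share must be in (0, 1]")
    # Two threads would like 2 * fpu_share of the FPU, but it caps at 100%.
    combined_busy = min(1.0, 2 * fpu_share)
    # Work done scales with how busy the FPU is.
    return combined_busy / fpu_share


if __name__ == "__main__":
    for share in (0.78, 0.50, 0.40):
        print(f"FPU share {share:.2f}: gain x{smt_throughput_gain(share):.2f}")
```

Under this model the gain only reaches 2x when a single thread uses half of the FPU or less, which fits the point that a non-FAH neighbour competes far less for it.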
Hopfgeist
Posts: 71
Joined: Thu Jul 09, 2020 12:07 pm
Hardware configuration: Dell T420, 2x Xeon E5-2470 v2, NetBSD 10, SunFire X2270 M2, 2x Xeon X5675, NetBSD 9; various other Linux/NetBSD PCs, Macs and virtual servers.
Location: Germany

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by Hopfgeist »

Neil-B wrote:With xeons (at least the more recent ones) the contention impacts the scenario less than other multi thread cpus (or at least that is my experience) .. the output/throughput from my 14 core xeons increases significantly (admittedly not quite double but fairly close) when running 28 threads ... but worth a few tests using an offline core/wu to see the real difference.
I have the X5675 (which is the same family as the 5650, and should be quite similar except for the clock speed), and I get almost identical results with 24 threads vs. 12 threads with CPU affinity set to use only the first thread of each core.

That said, I get close to 100k PPD on a dual-CPU (12 core, 24 thread) system, so 20K for the 5650 seems a bit slow.

Cheers,
HG
Dell PowerEdge T420: 2x Xeon E5-2470 v2
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by MeeLee »

Hopfgeist wrote:
Neil-B wrote:With xeons (at least the more recent ones) the contention impacts the scenario less than other multi thread cpus (or at least that is my experience) .. the output/throughput from my 14 core xeons increases significantly (admittedly not quite double but fairly close) when running 28 threads ... but worth a few tests using an offline core/wu to see the real difference.
I have the X5675 (which is the same family as the 5650, and should be quite similar except for the clock speed), and I get almost identical results with 24 threads vs. 12 threads with CPU affinity set to use only the first thread of each core.

That said, I get close to 100k PPD on a dual-CPU (12 core, 24 thread) system, so 20K for the 5650 seems a bit slow.

Cheers,
HG
Manually setting affinity isn't really feasible in Windows, as with each new WU the affinity gets reset.
Nilem
Posts: 9
Joined: Wed Oct 07, 2020 7:47 pm

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by Nilem »

PantherX wrote:Here's a Google Sheets that have some information for you to make an informed decision:
Same set of hardware for each test: https://docs.google.com/spreadsheets/d/ ... edit#gid=0
Thanks! this is very informative!
MeeLee wrote:If you have PCIE ports available, it may be better to plug a few GPUs in. As a single GT1030 will get a higher score than both of those CPUs.
The HP ProLiant GL380 I run does have an extra PCI-E slot, and I did try to fit a GPU I had lying around, but since my server is a 1U, I only have one available slot and my GPU requires two... I was about to search for a GPU that could fit, but I also read that adding non-HP hardware makes the fans run at full speed because of the missing sensor, and this server can make as much noise as a jet.
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by MeeLee »

Nilem wrote:
PantherX wrote:Here's a Google Sheets that have some information for you to make an informed decision:
Same set of hardware for each test: https://docs.google.com/spreadsheets/d/ ... edit#gid=0
Thanks! this is very informative!
MeeLee wrote:If you have PCIE ports available, it may be better to plug a few GPUs in. As a single GT1030 will get a higher score than both of those CPUs.
The HP ProLiant GL380 I run does have an extra PCI-E slot, and I did try to fit a GPU I had lying around, but since my server is a 1U, I only have one available slot and my GPU requires two... I was about to search for a GPU that could fit, but I also read that adding non-HP hardware makes the fans run at full speed because of the missing sensor, and this server can make as much noise as a jet.
Being stuck with a single-slot GPU limits what you can do with GPU folding. There are some single-slot GTX 1050 cards, and lower-end models.
Adding a GPU shouldn't increase fan speed by much. A 1050 operates at 75W, a 1030 at 35W, and a 730 at 25W. The GPU fans are controlled by the Nvidia software, but you could manually set them however you like with MSI Afterburner.
Nilem
Posts: 9
Joined: Wed Oct 07, 2020 7:47 pm

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by Nilem »

Hopfgeist wrote: That said, I get close to 100k PPD on a dual-CPU (12 core, 24 thread) system, so 20K for the 5650 seems a bit slow.
Interesting. Do you run them under Linux or Windows?
I see that the X5675 is one year younger than the X5650. Maybe there is more to it than just clock speed? But if so, I can't find what on Intel's website.
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by MeeLee »

Nilem wrote:
Hopfgeist wrote: That said, I get close to 100k PPD on a dual-CPU (12 core, 24 thread) system, so 20K for the 5650 seems a bit slow.
Interesting. Do you run them under Linux or Windows?
I see that the X5675 is one year younger than the X5650. Maybe there is more to it than just clock speed? But if so, I can't find what on Intel's website.
May also depend on what memory he's running.
Apparently these CPUs support RAM anywhere from 800 MHz to 1333 MHz.
I bet 800 MHz RAM could be restrictive.
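
To put rough numbers on that, the theoretical peak bandwidth of DDR3 is the transfer rate times 8 bytes per 64-bit channel times the number of channels. The sketch below assumes the quoted figures are DDR3 transfer rates (DDR3-800 and DDR3-1333) and that all three memory channels of each CPU are populated. The function name is hypothetical, and sustained real-world bandwidth will be lower than these peaks.

```python
BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(mega_transfers_per_s, channels):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    if mega_transfers_per_s <= 0 or channels <= 0:
        raise ValueError("rate and channel count must be positive")
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    for rate in (800, 1066, 1333):
        per_cpu = peak_bandwidth_gbs(rate, channels=3)
        print(f"DDR3-{rate}, 3 channels: {per_cpu:.1f} GB/s per CPU")
```

Whether FAH work units are actually limited by memory bandwidth on these CPUs is a separate question that only a test on the real hardware can answer.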
Hopfgeist
Posts: 71
Joined: Thu Jul 09, 2020 12:07 pm
Hardware configuration: Dell T420, 2x Xeon E5-2470 v2, NetBSD 10, SunFire X2270 M2, 2x Xeon X5675, NetBSD 9; various other Linux/NetBSD PCs, Macs and virtual servers.
Location: Germany

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by Hopfgeist »

MeeLee wrote:
Nilem wrote:
Hopfgeist wrote: That said, I get close to 100k PPD on a dual-CPU (12 core, 24 thread) system, so 20K for the 5650 seems a bit slow.
Interesting. Do you run them under Linux or Windows?
I see that the X5675 is one year younger than the X5650. Maybe there is more to it than just clock speed? But if so, I can't find what on Intel's website.
May also depend on what memory he's running.
Apparently these CPUs support RAM anywhere from 800 MHz to 1333 MHz.
I bet 800 MHz RAM could be restrictive.
I run the Linux client on NetBSD in the Linux emulation, which has very little (for FAH practically zero) overhead. It just maps Linux system calls to NetBSD system calls. I use monit to check for the presence of the FahCore process and then have it rescheduled to only use CPUs 0 through 11, which are (on NetBSD) the first threads of each physical core.

The standard NetBSD scheduler is not (yet?) smart enough to avoid running two (process) threads on two (CPU) threads of the same physical core, even though the kernel detects the difference, so it is significantly faster with the CPU affinity set explicitly. Task monitoring software (such as "top") shows only 50% CPU usage, but in fact the FPUs are fully loaded, and get about as hot as when using 24 calculation threads.
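
For readers on Linux rather than NetBSD, the same idea (pin the FahCore process to the first hardware thread of each physical core) can be sketched in Python. This is not the monit setup described above. It assumes a Linux kernel that exposes `thread_siblings_list` under `/sys/devices/system/cpu`, it uses `os.sched_setaffinity`, which is only available on Linux, and the helper names are made up for this example. Finding the FahCore PID is left to the caller.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0,12' or '0-1' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def first_threads(sibling_lists):
    """Keep the lowest-numbered logical CPU of every physical core."""
    return sorted({min(siblings) for siblings in sibling_lists if siblings})


def read_sibling_lists(sysfs_root="/sys/devices/system/cpu"):
    """Read one sibling list per logical CPU from sysfs (Linux only)."""
    lists = []
    for name in sorted(os.listdir(sysfs_root)):
        if not (name.startswith("cpu") and name[3:].isdigit()):
            continue
        path = os.path.join(sysfs_root, name, "topology", "thread_siblings_list")
        try:
            with open(path) as handle:
                lists.append(parse_cpu_list(handle.read()))
        except OSError:
            continue  # offline CPU or topology not exposed
    return lists


def pin_to_first_threads(pid):
    """Restrict pid to one hardware thread per core; returns the CPUs used."""
    cpus = first_threads(read_sibling_lists())
    if not cpus:
        raise RuntimeError("could not determine CPU topology")
    os.sched_setaffinity(pid, cpus)
    return cpus
```

Reading the sibling lists avoids hard-coding "CPUs 0 through 11", because the numbering of hyperthread siblings differs between systems.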

I am quite certain I use 1333 MHz RAM; the machine is a Sun Fire X2270 M2 with 40 GB RAM. Even at 9 years old it's still a very capable machine.

According to Passmark, the X5675 is only a bit faster, even less than the difference in clock speed. Since Passmark tests more than just the CPU core, it seems plausible that they use the same architecture.

It certainly would not be so much slower that it ends up at only 20k PPD for a dual-CPU system. Although early work unit returns are rewarded, and therefore PPD is not linear with processing speed, that's still too much of a difference.
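
To show how non-linear that is, here is a small Python sketch. It assumes the quick-return bonus has the form points = base * max(1, sqrt(k * deadline / elapsed)), which is how I understand the published FAH points formula. The base points, `k` and deadline values used below are invented for illustration and do not belong to any real project.

```python
import math


def wu_points(base_points, k, deadline_days, elapsed_days):
    """Points for one work unit under the assumed quick-return bonus."""
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus


def ppd(base_points, k, deadline_days, elapsed_days):
    """Points per day when work units of this kind are folded back to back."""
    return wu_points(base_points, k, deadline_days, elapsed_days) / elapsed_days


def speed_ratio_for_ppd_ratio(ppd_ratio):
    """Speed difference implied by a PPD ratio while the bonus is active.

    With the bonus active, PPD grows with speed to the power 1.5,
    so the implied speed ratio is ppd_ratio ** (2 / 3).
    """
    return ppd_ratio ** (2.0 / 3.0)


if __name__ == "__main__":
    slow = ppd(1000, 0.75, 3.0, 1.0)  # hypothetical WU finished in 1 day
    fast = ppd(1000, 0.75, 3.0, 0.5)  # same WU finished in half a day
    print(f"2x speed gives {fast / slow:.2f}x PPD")
    print(f"5x PPD implies {speed_ratio_for_ppd_ratio(5):.2f}x speed")
```

Under these assumptions, the gap between 100k and 20k PPD would still need a machine nearly three times as fast, which supports the point that clock speed alone does not explain it.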


Cheers,
HG
Dell PowerEdge T420: 2x Xeon E5-2470 v2
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by MeeLee »

I was about to say "Did you convert the binaries (from RPM or Debian)? As NetBSD/FreeBSD/OpenBSD are not supported operating systems.", until I saw you're emulating them.
You'll ALWAYS get higher PPD from running a program natively, compared to running it in emulation (even if it's just a native supported VM)
Hopfgeist
Posts: 71
Joined: Thu Jul 09, 2020 12:07 pm
Hardware configuration: Dell T420, 2x Xeon E5-2470 v2, NetBSD 10, SunFire X2270 M2, 2x Xeon X5675, NetBSD 9; various other Linux/NetBSD PCs, Macs and virtual servers.
Location: Germany

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by Hopfgeist »

MeeLee wrote:I was about to say "Did you convert the binaries (from RPM or Debian)? As NetBSD/FreeBSD/OpenBSD are not supported operating systems.", until I saw you're emulating them.
You'll ALWAYS get higher PPD from running a program natively, compared to running it in emulation (even if it's just a native supported VM)
The binaries are not running in a virtual machine; they run natively in the operating system. It is rather like running Linux binaries that had been compiled for an older kernel version and/or an older libc version. Unless you use lots and lots of syscalls, there is no measurable overhead. (Incidentally, the a7 core uses a lot more syscalls than the a8 core, but even so, the system load is basically 0 (< 0.05%) when running FaH and nothing else.)

Since the system is running NetBSD for various unrelated reasons, syscall mapping is certainly the most efficient way to run FaH, especially compared to a virtual machine.

Straight from the horse's mouth
[...] it is only a thin software layer, mostly for system calls which are already very similar between the two systems. The application code itself is processed at the full speed of your CPU, so you don't get a degraded performance with the Linux emulation [...]
Besides, I'm not OP complaining about low PPD. I get pretty good PPD, given that it's a 10 year old system.


Cheers,
HG
Dell PowerEdge T420: 2x Xeon E5-2470 v2
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by MeeLee »

Hopfgeist wrote:
MeeLee wrote:I was about to say "Did you convert the binaries (from RPM or Debian)? As NetBSD/FreeBSD/OpenBSD are not supported operating systems.", until I saw you're emulating them.
You'll ALWAYS get higher PPD from running a program natively, compared to running it in emulation (even if it's just a native supported VM)
The binaries are not running in a virtual machine; they run natively in the operating system. It is rather like running Linux binaries that had been compiled for an older kernel version and/or an older libc version. Unless you use lots and lots of syscalls, there is no measurable overhead. (Incidentally, the a7 core uses a lot more syscalls than the a8 core, but even so, the system load is basically 0 (< 0.05%) when running FaH and nothing else.)

Since the system is running NetBSD for various unrelated reasons, syscall mapping is certainly the most efficient way to run FaH, especially compared to a virtual machine.

Straight from the horse's mouth
[...] it is only a thin software layer, mostly for system calls which are already very similar between the two systems. The application code itself is processed at the full speed of your CPU, so you don't get a degraded performance with the Linux emulation [...]
Besides, I'm not OP complaining about low PPD. I get pretty good PPD, given that it's a 10 year old system.


Cheers,
HG
Reminds me of what they said about Wine: it's a sort of older Windows that should add very little overhead.
However, you still have the overhead of the original operating system you're running on, even if the VM is very efficient.
Darth_Peter_dualxeon
Posts: 51
Joined: Fri Mar 20, 2020 3:13 am
Hardware configuration: EVGA SR-2 motherboard
2x Xeon x5670 CPU
64 GB ECC DDR3
Nvidia RTX 2070

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by Darth_Peter_dualxeon »

Hi,
I can say for sure that 20K PPD is on the low side for these CPUs.
Do you have a passkey? Have you already completed something like 10+ work units (if I remember correctly) so that the quick return bonus kicks in?
Do you pause folding or turn off the PC often? (I usually click the Finish button and wait for all WUs to finish before shutting down.)
If you pause a work unit, it finishes later and you lose some of the bonus.


I have a dual Xeon X5670, so similar to your setup, but a little higher clock speed. My CPU slots can do somewhere between 60-90K PPD total (depending on work unit)

I also have all memory slots populated, as these CPUs have a 3-channel memory controller per CPU. This might add some speed if the calculations don't fit in the cache, but I don't know this for sure.
I run a clean install of Linux Mint for folding overnight to keep the room warm. Nothing else runs except a task manager, some hardware monitoring apps and maybe a browser.

What could help:
Do not use only one CPU slot, as some work units do not cope well with the large number of threads these dual Xeons have.
They also sometimes don't like it when the number of CPU cores has a prime factor larger than 3 (or something like that).

I have a CPU:12 and a CPU:9 slot, with some cores left over on the second CPU for the GPU slot and background tasks.
12 cores = 2 * 2 * 3 and 9 = 3 * 3, so no large prime factors.
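
The factorization check above is easy to script. The following Python sketch just lists the prime factors of a candidate slot size. The helper names are made up for this example, and it makes no claim about which counts a given FahCore actually accepts, since the exact rule is only remembered loosely above.

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order, with repeats."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


def largest_prime_factor(n):
    """Largest prime factor of n, or 1 if n has none (n == 1)."""
    return max(prime_factors(n), default=1)


if __name__ == "__main__":
    for threads in (9, 12, 24):
        print(threads, prime_factors(threads), largest_prime_factor(threads))
```

Running it over a range of slot sizes gives a quick shortlist of counts whose largest prime factor stays small, which can then be tested on real work units.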
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: Dual X5650 giving me 20k PPD, why so slow?

Post by PantherX »

I would suggest that you experiment with these values:
CPU:18
CPU:20
CPU:21

These values have been shown to work in the majority of cases, based on this neat chart that _r2w_ben created a while ago: viewtopic.php?f=72&t=34350&start=45

Note that FahCore_a8 WUs can use whatever number of CPUs you provide to it without throwing any errors like FahCore_a7 does.