F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

A forum for discussing FAH-related hardware choices and info on actual products (not speculation).

John_Alan
Posts: 9
Joined: Mon Mar 16, 2020 12:00 am

F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Post by John_Alan »

How much is F@H able to take advantage of CPU memory bandwidth? I am folding with a 2nd generation AMD Epyc 7302p CPU and it has an 8-channel DDR4-3200 memory controller. I currently have only 4 sticks of DDR4 in the machine, thus the CPU is only using 4 memory channels to talk with the memory. With F@H, would I see any benefits in populating all 8 memory channels? I am also GPU folding on this same machine.
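For scale, the theoretical peak bandwidth of the two configurations can be worked out from the DDR4-3200 transfer rate. This is only a rough sketch: it assumes the standard 64-bit (8-byte) data bus per DDR4 channel, and it gives the ceiling of the memory interface, not what F@H would actually use. The function name is made up for illustration.

```python
# Theoretical peak DDR4 bandwidth = transfers per second x bytes per transfer x channels.
BYTES_PER_TRANSFER = 8  # each DDR4 channel has a 64-bit (8-byte) data bus


def peak_bandwidth_gbs(mega_transfers_per_s: int, channels: int) -> float:
    """Return the theoretical peak bandwidth in GB/s (decimal, 10^9 bytes)."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


for channels in (4, 8):
    gbs = peak_bandwidth_gbs(3200, channels)
    print(f"{channels} channels of DDR4-3200: {gbs:.1f} GB/s peak")
```

Going from 4 to 8 populated channels doubles this ceiling (102.4 GB/s to 204.8 GB/s), but that only matters if the workload is actually limited by memory traffic.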
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Post by JimboPalmer »

I am WAY on the other end of computing capability, but I find Dual Channel to be 10% faster than Single Channel on my old laptops.
Last edited by JimboPalmer on Thu Feb 25, 2021 5:12 pm, edited 2 times in total.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Post by Joe_H »

You may see a minor improvement in folding speed, but it might be hard to quantify. The effect may be larger when you get WUs consisting of many atoms, since more data is then being moved to and from RAM.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
John_Alan
Posts: 9
Joined: Mon Mar 16, 2020 12:00 am

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Post by John_Alan »

Thank you for the replies. Has anyone run across any benchmarks or articles that explore these aspects of F@H? I'd gladly pick up another 4 sticks of DDR4 for this machine if you think the additional memory bandwidth will help performance.

On prior Xeon workstation CPUs, it was hard to justify folding on an E5-2680v3/v4 as I rarely saw more than 140,000 PPD, but the Epyc 7302p is pushing 350,000 to 400,000 PPD with similar power consumption. CPU folding on this workstation seems worth it, since some science needs CPUs for the work units.
Callaghan
Posts: 6
Joined: Tue Feb 02, 2021 8:04 am
Location: Somerset, UK

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Post by Callaghan »

I am not a techie, but the above posts prompt me to ask...
My CPU, an Intel Core i5-8400 @ 2.80GHz, has 2 memory channels. Does this mean that the CPU can only access memory from 2 of the 4 memory slots on the MB?
Currently only 2 memory slots are loaded.
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Post by Neil-B »

I think the four slots will be paired into two channels. If the two DIMMs are one in each of the pairs, you will have dual channel, but if they are in the slots next to each other then (normally) you might be running single channel. Say the sockets are a, b, c and d (sometimes mobo vendors actually call them A1, A2, B1 and B2): you want to have one DIMM in a or b and the other in c or d, or, using the other naming convention, use A1 and B1 or A2 and B2. Mobo manuals usually make it clear which ones to use.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Post by JimboPalmer »

Callaghan wrote:MY CPU Intel Core i5-8400 CPU @ 2.80GHz has 2 x memory channels. Does this mean that the CPU can only access memory from 2 of the 4 memory slots on the MB. Currently only 2 memory slots are loaded.
All four slots will work. In single channel mode, after an access the RAM takes about 28 memory cycles (over 100 CPU cycles) to be ready again. In dual channel mode you can access the other channel without waiting that long (11 cycles for mine), so you have 50% odds of reaching the next memory location in a shorter time. (His quad channel allows a 75% chance of not waiting; 8 channel would give him about an 87% chance of not waiting.)
Only very select kinds of software are typically memory channel constrained; databases are one. F@H sees only a minor improvement, as it is usually constrained by floating point math speed.
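The odds quoted above follow a simple rule of thumb: with N channels, the chance that the next access lands on a different channel from the previous one is 1 - 1/N. A minimal sketch of that arithmetic, assuming each access picks a channel uniformly at random (a simplification; real memory controllers interleave addresses and can overlap requests):

```python
def chance_of_not_waiting(channels: int) -> float:
    """Chance the next access hits a different channel than the last one.

    Assumes accesses are spread uniformly at random across the channels.
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return 1 - 1 / channels


for n in (1, 2, 4, 8):
    print(f"{n} channel(s): {chance_of_not_waiting(n):.1%} chance of not waiting")
```

Each doubling of the channel count halves the remaining chance of waiting, so the step from 4 to 8 channels (75% to 87.5%) gains less than the step from 1 to 2.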

If you are interested, I use Speccy to examine RAM; there is tons of detail as you drill down (click the blue links).
https://www.ccleaner.com/speccy
Last edited by JimboPalmer on Thu Feb 25, 2021 10:49 pm, edited 1 time in total.
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Post by Joe_H »

John_Alan wrote:Has anyone run across any benchmarks or articles that explore these aspects of F@H?
I vaguely recall that someone or several someones did tests on memory size and speeds with F@h. They did see a small amount of speed improvement with faster RAM, but I don't recall numbers.

What makes it a bit harder to characterize is the difference in WU size between projects. Some of the smaller ones may fit a large amount of their data in the L2 or L3 cache of the processor. Those will spend much less time accessing data in main RAM than other, much larger projects.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Post by bruce »

It would be wrong to suggest that memory bandwidth has no effect on FAH's speed, but it would also be wrong to suggest that it's a critical feature. FAH's core process for CPUs uses a relatively small amount of RAM. Speed depends almost entirely on the speed at which the CPU can compute floating point numbers.

The speed of FAH for GPUs depends almost entirely on the speed at which the GPU can compute floating point numbers. A secondary limitation may pop up depending on the PCIe bus.

Bottom line: Upgrading your RAM is fine, but you probably won't notice a difference since it's a rather small percentage of a rather complex set of other limitations.
John_Alan
Posts: 9
Joined: Mon Mar 16, 2020 12:00 am

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Post by John_Alan »

Bruce and all, thank you for the info! The 7302p has 128MB of L3 cache, so I would imagine that cache likely is able to hold large parts of each work unit right on the CPU.
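A back-of-the-envelope way to check that guess is to compare an estimated working set against the 128 MB of L3. Everything except the cache size is an assumption here: the bytes-per-atom figure is a placeholder (three 4-byte floats each for position, velocity and force), the atom counts are made-up examples rather than real project sizes, and the cache is treated as one flat pool available to the whole WU.

```python
L3_CACHE_BYTES = 128 * 1024**2  # 128 MB L3, the figure quoted for the 7302P

# ASSUMPTION: hot data touched per atom per step. Position, velocity and force
# as three 4-byte floats each gives 36 bytes. Real folding cores also keep
# neighbour lists and other structures, so this is a placeholder, not a
# measured value.
ASSUMED_BYTES_PER_ATOM = 36


def working_set_mb(atom_count: int, bytes_per_atom: int = ASSUMED_BYTES_PER_ATOM) -> float:
    """Estimated working set in MB (2^20 bytes) for a WU with atom_count atoms."""
    return atom_count * bytes_per_atom / 1024**2


def fits_in_cache(atom_count: int,
                  bytes_per_atom: int = ASSUMED_BYTES_PER_ATOM,
                  cache_bytes: int = L3_CACHE_BYTES) -> bool:
    """True if the estimated working set is no larger than the cache."""
    return atom_count * bytes_per_atom <= cache_bytes


# Hypothetical atom counts, for illustration only.
for atoms in (50_000, 500_000, 5_000_000):
    verdict = "fits" if fits_in_cache(atoms) else "does not fit"
    print(f"{atoms:>9,} atoms: ~{working_set_mb(atoms):.1f} MB, {verdict} in L3")
```

The answer swings entirely on the per-atom figure, so treat it as a way to frame the question rather than a prediction for any particular WU.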
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Post by JimboPalmer »

John_Alan wrote:I would imagine that cache likely is able to hold large parts of each work unit right on the CPU.
This is why databases respond well to multiple channel RAM: they overwhelm the cache that works well with most applications.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: F@H CPU Memory Bandwidth? 4 vs 8 Channel DDR4?

Post by bruce »

John_Alan wrote:,... so I would imagine that cache likely is able to hold large parts of each work unit right on the CPU.
"large parts of..." is a lot different than "all of..."

Unfortunately, every force on every atom must be calculated during every step, so cache performance is significantly degraded unless the entire protein fits.