Bug report: FAH client cannot detect more than 32 CPUs

Moderators: Site Moderators, FAHC Science Team

Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by Neil-B »

jnv11 wrote:Asking the folding cores to pass the data to the client would create loads of messy complexity that is asking for more spaghetti code which is a nightmare to maintain. Furthermore, the current Windows Folding@home client will ignore user requests to set the number of CPU cores in one folding slot to more than 32 cores, so that will need to be changed in the next version of the Folding@home client software.
Not necessarily; it might simply be a change to one line of code that allows both, tbh?! :)

... putting a minor quick fix in the 32-bit client as part of a new release may be a lot quicker, simpler and less messy than coding and building a 64-bit client?
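
To illustrate why a 32-bit build could plausibly stop at 32 CPUs, here is a minimal sketch. It assumes the client tracks CPUs in a pointer-sized bitmask, which nothing in this thread confirms; the function names are hypothetical and not taken from the FAH client source. The 56-CPU figure is just an example count for a dual-socket machine.

```python
def cpus_representable(mask_bits, cpu_count):
    """How many CPUs a bitmask of `mask_bits` bits can address."""
    return min(mask_bits, cpu_count)


def mask_for(cpu_count, mask_bits):
    """One bit per CPU, truncated to the width of the mask type."""
    full = (1 << cpu_count) - 1              # one bit for every CPU present
    return full & ((1 << mask_bits) - 1)     # bits beyond the mask width are lost


if __name__ == "__main__":
    example_cpus = 56  # hypothetical logical CPU count
    for bits in (32, 64):
        visible = bin(mask_for(example_cpus, bits)).count("1")
        print(f"{bits}-bit mask: {visible} of {example_cpus} CPUs addressable")
```

If that assumption holds, widening the mask (or building the client as 64-bit) would be the kind of small change being suggested here.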
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070

(Green/Bold = Active)
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by bruce »

I think somebody at some time in the past decided that if there are 64 threads, that certainly means that the NUMA structure is two sockets of 32 threads each. Perhaps in that timeframe, the OS was not able to figure out the actual NUMA structure or the programmer didn't feel like breaking out the manuals to learn about specific OPs that he hadn't used in years.

It's a bad idea for a single project to make use of more than one socketed CPU. It's not uncommon for independent sockets and independent memory to have loosely coupled clocks, so you either have to seriously down-clock both CPUs or you face race conditions when portions of the calculation have to synchronize with portions which were run based on a different CPU clock.
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Location: UK

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by Neil-B »

bruce wrote:It's a bad idea for a single project to make use of more than one socketed CPU. It's not uncommon for independent sockets and independent memory to have loosely coupled clocks, so you either have to seriously down-clock both CPUs or you face race conditions when portions of the calculation have to synchronize with portions which were run based on a different CPU clock.
From the tests I have run with FaH A7 and A8 cores on various project WUs, and with various totally unrelated high-intensity AI/ML loads (albeit on a twin-Xeon server specifically configured for such loads), this doesn't appear to be an issue. It may have been in the past, and it may even be the case today and I have just been lucky, but I would rather confirm whether this is still the case before consigning this to the discard pile. I also haven't had this issue with multi-threaded AI/ML loads on 4- and 8-CPU servers (not used for FAH).

... and Linux clients are currently running more than 32 threads across multiple sockets without issues, I believe?
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by MeeLee »

I'm wondering whether splitting a CPU into 2 CPU slots assigns cores randomly, or whether it takes into consideration which chiplet they are on?
Like, if I have 16 cores 32 threads on one slot, and the same on another, will the client automatically assign core 0 to core 15 for slot 1, and core 16 to core 31 for slot 2?
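
The layout being asked about can be sketched as follows. This only illustrates the contiguous split described in the question; whether the client or the OS actually does this is exactly what is unknown here, and `contiguous_slots` is a hypothetical helper, not a FAH function. Note also that logical CPU numbering does not necessarily follow chiplet boundaries.

```python
def contiguous_slots(total_cpus, slots):
    """Split CPU indices 0..total_cpus-1 into `slots` contiguous ranges."""
    base, extra = divmod(total_cpus, slots)
    result = []
    start = 0
    for i in range(slots):
        # The first `extra` slots take one additional CPU when the split is uneven.
        size = base + (1 if i < extra else 0)
        result.append(range(start, start + size))
        start += size
    return result


if __name__ == "__main__":
    for number, cpus in enumerate(contiguous_slots(32, 2), start=1):
        print(f"slot {number}: CPU {cpus[0]} to CPU {cpus[-1]}")
```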
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by bruce »

Newer versions of the OS probably do take it into consideration whereas older versions did not.
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by MeeLee »

The OS, or FAH?
How will the OS see which thread to assign, when it's assigned in FAH (through slots)?
I think the OS doesn't see what FAH sees, but could automatically address cores depending on the L-cache requests the tasks pull.
It would be better if in FAH one could somehow assign cores (manually, or automatically).
gunnarre
Posts: 567
Joined: Sun May 24, 2020 7:23 pm
Location: Norway

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by gunnarre »

The OS. As far as I understand, an SMP-aware Linux kernel will by default attempt to keep all the threads of a particular process running on one CPU. So I would test with one slot per CPU and see if that makes folding faster. It should spawn one process for each work unit, and if the kernel does its job correctly, it should run each process on one CPU without having to move threads between them. You shouldn't have to force the process to run on a particular CPU, although you can do so manually if you want:
From the Linux taskset documentation:
Note that the Linux scheduler also supports natural CPU affinity: the scheduler attempts to keep processes on the same CPU as long as practical for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications.
The kernel scheduler has a process tree, and it knows which processes communicate, so a multi-threaded process with a lower thread count than one CPU should automatically be kept on one CPU.
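
For anyone who does want to force affinity manually, a minimal sketch on Linux, assuming Python 3 is available (the `os.sched_*` calls exist only on platforms that support them; `pin_to_cpus` is a hypothetical helper name):

```python
import os


def pin_to_cpus(cpus, pid=0):
    """Restrict a process (0 = the calling process) to the given CPU indices.

    Returns the affinity set the kernel reports afterwards.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity calls are not available on this platform")
    os.sched_setaffinity(pid, set(cpus))
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    # Keep the first half of the currently allowed CPUs (at least one).
    first_half = allowed[: max(1, len(allowed) // 2)]
    print("now restricted to:", sorted(pin_to_cpus(first_half)))
```

The shell equivalent is `taskset -c 0-15 <command>`, which starts a command restricted to CPUs 0 to 15. As the documentation quoted above says, this is only worth doing when the scheduler's natural affinity turns out not to be enough.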

Likewise a NUMA aware Linux kernel should automatically try to localize each process to one NUMA node, to reduce cross-node memory access. I don't run a Threadripper though, so I'm not sure about this part.
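
To check what the kernel actually believes about the NUMA layout, rather than guessing, the per-node CPU lists can be read from sysfs. A rough sketch, assuming a Linux kernel that exposes `/sys/devices/system/node` (the helper names are hypothetical):

```python
import glob
import os


def parse_cpulist(text):
    """Parse a sysfs CPU list such as '0-13,28-41' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list, e.g. a memory-only node
        low, _, high = part.partition("-")
        cpus.update(range(int(low), int(high or low) + 1))
    return cpus


def numa_nodes(root="/sys/devices/system/node"):
    """Map NUMA node id -> set of logical CPUs, or {} if not exposed."""
    nodes = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*", "cpulist")):
        name = os.path.basename(os.path.dirname(path))  # e.g. 'node0'
        with open(path) as handle:
            nodes[int(name[len("node"):])] = parse_cpulist(handle.read())
    return nodes


if __name__ == "__main__":
    for node, cpus in sorted(numa_nodes().items()):
        print(f"node {node}: {len(cpus)} CPUs -> {sorted(cpus)}")
```

Running one slot per reported node and comparing against a single large slot would be one way to test the suggestion above.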
Online: GTX 1660 Super, GTX 1080, GTX 1050 Ti 4G OC, RX580 + occasional CPU folding in the cold.
Offline: Radeon HD 7770, GTX 960, GTX 950
Neil-B
Posts: 2027
Joined: Sun Mar 22, 2020 5:52 pm
Location: UK

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by Neil-B »

using a single slot utilising both cpus makes this a non issue? ... the a8 core works fine under windows utilising two cpus and delivers great performance - just needs a client that allows the core to run as it can :)
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by MeeLee »

Neil-B wrote:using a single slot utilising both cpus makes this a non issue? ... the a8 core works fine under windows utilising two cpus and delivers great performance - just needs a client that allows the core to run as it can :)
We're talking about more than 32 cores or threads, there seem to be some issues.
gunnarre wrote:The OS. As far as I understand, an SMP-aware Linux kernel will by default attempt to keep all the threads of a particular process running on one CPU. So I would test with one slot per CPU and see if that makes folding faster. It should spawn one process for each work unit, and if the kernel does its job correctly, it should run each process on one CPU without having to move threads between them. You shouldn't have to force the process to run on a particular CPU, although you can do so manually if you want:
From the Linux taskset documentation:
Note that the Linux scheduler also supports natural CPU affinity: the scheduler attempts to keep processes on the same CPU as long as practical for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications.
The kernel scheduler has a process tree, and it knows which processes communicate, so a multi-threaded process with a lower thread count than one CPU should automatically be kept on one CPU.

Likewise a NUMA aware Linux kernel should automatically try to localize each process to one NUMA node, to reduce cross-node memory access. I don't run a Threadripper though, so I'm not sure about this part.

Yeah, CPUs are different from chiplets. CPU chiplets are on the same CPU block.
You can have a CPU like a Threadripper, with 2 CPU chiplets, each with 8 or 16 cores on it (and 16 to 32 threads per chiplet, multiplied by the number of chiplets to give the total CPU threads).

It would definitely lower performance if a few threads belonging to one chiplet are running on another chiplet, as they're pulling data from another L-cache section. That data has to be loaded from one L-cache into one CPU core, which then forwards it to the second L-cache block on the other chiplet. That's a lot of latency loss. Hence the question.
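
One way to see which logical CPUs share a cache block, sketched for Linux only. It assumes that cache `index3` in sysfs is the L3 cache, which is typical but not guaranteed (the `level` file next to it says for certain), and the helper names are hypothetical:

```python
import glob
import os
from collections import defaultdict


def group_by_shared_cache(shared_lists):
    """Group CPUs whose shared_cpu_list strings are identical.

    shared_lists maps a logical CPU index to the text of its
    shared_cpu_list file; CPUs behind the same cache report the same text.
    """
    groups = defaultdict(list)
    for cpu, shared in shared_lists.items():
        groups[shared.strip()].append(cpu)
    return sorted(sorted(group) for group in groups.values())


def read_l3_shared_lists(root="/sys/devices/system/cpu"):
    """Read cache/index3/shared_cpu_list for every CPU (assumed to be L3)."""
    result = {}
    pattern = os.path.join(root, "cpu[0-9]*", "cache", "index3", "shared_cpu_list")
    for path in glob.glob(pattern):
        cpu_dir = path.split(os.sep)[-4]  # e.g. 'cpu12'
        with open(path) as handle:
            result[int(cpu_dir[len("cpu"):])] = handle.read()
    return result


if __name__ == "__main__":
    for group in group_by_shared_cache(read_l3_shared_lists()):
        print("CPUs sharing one cache:", group)
```

Each printed group would be a natural candidate for one slot, if keeping a slot's threads behind a single cache is the goal.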
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: Bug report: FAH client cannot detect more than 32 CPUs

Post by PantherX »

MeeLee wrote:
Neil-B wrote:using a single slot utilising both cpus makes this a non issue? ... the a8 core works fine under windows utilising two cpus and delivers great performance - just needs a client that allows the core to run as it can :)
We're talking about more than 32 cores or threads, there seem to be some issues...
To elaborate a bit on what Neil-B posted, FahCore_a8 can run on 32+ threads on Windows when run stand-alone. I have worked with Neil-B, documented various test cases and passed that information on, which was appreciated. While I hope that the fix is easy, I don't know if/when it will be resolved since there are other things that need to be addressed. I am hoping that CPU folding on Windows with 32+ CPUs can become a reality soon-ish (no ETA or promise) :)
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues