More CPU = slower

GunnarB_Hamburg
Posts: 5
Joined: Tue Mar 31, 2020 6:01 am

More CPU = slower

Post by GunnarB_Hamburg »

I have a CPU work package (project 14534, unit 0x00000001cedfaa925ea34b233aef4e3b) running on an Intel i7-8700K with 6 cores/12 threads.
I needed some CPU for myself and reduced the usage from 12 CPUs to 4. This did not result in a significantly longer calculation time, which piqued my interest.

I then reduced it to only 1 CPU and got a TPF of about 5:30. I then doubled to 2 CPUs and again to 4 CPUs. The TPF sank to 1:50, which is to be expected.
I then added 2 more CPUs, but the TPF rose to 2:00 (the GPU was also running and taking CPU time), which I find odd. And no, the CPU is not throttling. Going back to 4 CPUs, it sank again, now to 1:42.

Summary
1 CPU = 5:35 TPF with xx% total CPU load and 4,29 GHz.
4 CPU = 1:42 TPF with 41% total CPU load and 4,29 GHz.
6 CPU = 1:20 TPF with 61% total CPU load and 4,29 GHz. (no GPU calculation)
12 CPU/Threads = 1:33 TPF with 81% total CPU load and 4,31 GHz. (no GPU calculation)

4 CPU = 1: TPF with 52% total CPU load and 4,32 GHz. (with GPU calculation on another package)
6 CPU = 1:53 TPF with 71% total CPU load and 4,29 GHz. (with GPU calculation on another package)

Since this is using a lot more watts, why is it not faster?

I believe that the CPU is doing more paging and cannot use the cache as efficiently.
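
The first block of the summary can be turned into speedup and parallel efficiency relative to the 1 CPU run. This is only a rough sketch: the helper names `to_seconds` and `scaling_table` are made up for illustration, only the TPF values come from the post, and the GPU state for the 1 CPU and 4 CPU rows is not stated there, so the rows may not be strictly comparable.

```python
# TPF values (m:ss) from the first block of the summary, keyed by CPU count.
TPF_SUMMARY = {1: "5:35", 4: "1:42", 6: "1:20", 12: "1:33"}


def to_seconds(tpf: str) -> int:
    """Convert an 'm:ss' time-per-frame string to seconds."""
    minutes, seconds = tpf.split(":")
    return int(minutes) * 60 + int(seconds)


def scaling_table(tpf_by_cpus: dict) -> list:
    """Return (cpus, speedup, efficiency) rows relative to the smallest CPU count."""
    base_cpus = min(tpf_by_cpus)
    base_time = to_seconds(tpf_by_cpus[base_cpus])
    rows = []
    for cpus in sorted(tpf_by_cpus):
        speedup = base_time / to_seconds(tpf_by_cpus[cpus])
        # Efficiency: achieved speedup divided by the ideal (linear) speedup.
        efficiency = speedup / (cpus / base_cpus)
        rows.append((cpus, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for cpus, speedup, efficiency in scaling_table(TPF_SUMMARY):
        print(f"{cpus:2d} CPU: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these numbers, 6 CPUs come out at roughly 4.2x the 1 CPU speed, while 12 threads drop back to roughly 3.6x.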
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: More CPU = slower

Post by _r2w_ben »

Scaling is a challenge for parallel processing. As the number of threads increases, so does the time spent communicating and waiting to synchronize.
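
One way to picture this is a toy cost model in which the compute part shrinks with the thread count while a synchronization term grows with it. The sketch below is purely illustrative: the function `modeled_tpf` and all three coefficients are made-up assumptions, not measurements, and not how GROMACS or the FAH core actually accounts for time.

```python
def modeled_tpf(threads: int, serial: float = 10.0, parallel: float = 320.0,
                sync_per_thread: float = 8.0) -> float:
    """Toy time per frame in seconds: serial part + divided work + sync cost.

    All coefficients are hypothetical and chosen only to show the shape.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return serial + parallel / threads + sync_per_thread * (threads - 1)


def best_thread_count(max_threads: int) -> int:
    """Return the thread count with the lowest modeled time per frame."""
    return min(range(1, max_threads + 1), key=modeled_tpf)


if __name__ == "__main__":
    for n in (1, 2, 4, 6, 8, 12):
        print(f"{n:2d} threads -> {modeled_tpf(n):6.1f} s per frame")
    print("fastest in this toy model:", best_thread_count(12), "threads")
```

With these made-up coefficients the modeled minimum falls at 6 threads, and 12 threads are slower again. Real projects will have different and less tidy curves.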

If the operating system is scheduling well, 6 threads would each end up on a physical core. Beyond that, threads will be sharing resources on physical cores. HT generally improves FAH performance.

Deep in the FAH work folder, there should be a file named md.log. Can you find this section and post it? This will give a bit more information about the characteristics of this project.

Code:

Initializing Domain Decomposition on 12 ranks
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.415 nm, LJ-14, atoms 974 982
  multi-body bonded interactions: 0.415 nm, Proper Dih., atoms 974 982
Minimum cell size due to bonded interactions: 0.457 nm
Maximum distance for 7 constraints, at 120 deg. angles, all-trans: 1.166 nm
Estimated maximum distance required for P-LINCS: 1.166 nm
This distance will limit the DD cell size, you can override this with -rcon
Using 0 separate PME ranks, as there are too few total
 ranks for efficient splitting
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 12 cells with a minimum initial size of 1.457 nm
The maximum allowed number of cells is: X 3 Y 3 Z 3
Domain decomposition grid 2 x 3 x 2, separate PME ranks 0
PME domain decomposition: 2 x 6 x 1
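
For anyone checking their own log, the grid line can be pulled out and compared with the rank count. A minimal sketch, assuming the log contains the two lines in exactly the wording shown above; `parse_dd` is a hypothetical helper, not part of GROMACS or the FAH client.

```python
import re
from math import prod

RANKS_RE = re.compile(r"Initializing Domain Decomposition on (\d+) ranks")
GRID_RE = re.compile(r"Domain decomposition grid (\d+) x (\d+) x (\d+)")


def parse_dd(log_text: str):
    """Return (ranks, (nx, ny, nz)) from an md.log excerpt, or None if absent."""
    ranks = RANKS_RE.search(log_text)
    grid = GRID_RE.search(log_text)
    if not ranks or not grid:
        return None
    return int(ranks.group(1)), tuple(int(g) for g in grid.groups())


if __name__ == "__main__":
    sample = (
        "Initializing Domain Decomposition on 12 ranks\n"
        "Domain decomposition grid 2 x 3 x 2, separate PME ranks 0\n"
    )
    ranks, grid = parse_dd(sample)
    # With 0 separate PME ranks, the cell count should equal the rank count.
    print(ranks, grid, prod(grid) == ranks)
```

In the excerpt above, 2 x 3 x 2 gives 12 cells, one per rank.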
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud
Contact:

Re: More CPU = slower

Post by PantherX »

Welcome to the F@H Forum GunnarB_Hamburg,

Out of curiosity, did you wait for at least 3% progress between each CPU value? The reason is that it takes a bit of time for the TPF value to change after the CPU value has been changed.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
GunnarB_Hamburg
Posts: 5
Joined: Tue Mar 31, 2020 6:01 am

Re: More CPU = slower

Post by GunnarB_Hamburg »

The work package is done, so md.log is gone. I am starting a new package and will then wait 3%. Then I can post again.
Unfortunately, I got a package that does not use more than 4 cores. I need to wait for the next package.
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am

Re: More CPU = slower

Post by PantherX »

The WU (Work Unit) has a maximum number of CPUs, which is determined when the CPU slot contacts the AS (Assignment Server) to get a WU. If you had 4 CPUs assigned to it initially but then increased the value, the change won't take effect until the next WU is downloaded. You can always decrease the number of CPUs while folding a WU, but you cannot increase it beyond the value it was downloaded with.
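
Put as a tiny sketch, the rule described above amounts to a minimum of two values. The function `effective_cpus` is a hypothetical illustration of that rule as stated here, not code from the FAH client.

```python
def effective_cpus(assigned_at_download: int, requested: int) -> int:
    """CPU count a running WU can use under the rule described above."""
    if assigned_at_download < 1 or requested < 1:
        raise ValueError("CPU counts must be >= 1")
    # Decreases apply to the current WU; increases wait for the next download.
    return min(assigned_at_download, requested)


if __name__ == "__main__":
    print(effective_cpus(assigned_at_download=4, requested=12))  # stays at 4
    print(effective_cpus(assigned_at_download=12, requested=4))  # drops to 4
```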
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: More CPU = slower

Post by MeeLee »

Keep an eye on CPU frequency and temperature, and the balance between the two.
The only reason a higher thread count slows down a WU is when the temperatures force boost speeds to go down.
GunnarB_Hamburg
Posts: 5
Joined: Tue Mar 31, 2020 6:01 am

Re: More CPU = slower

Post by GunnarB_Hamburg »

MeeLee, what you say is just totally wrong. There is always more than one reason.
Also, if you had read my post, you would have seen that I checked that.
Last edited by GunnarB_Hamburg on Sun Apr 26, 2020 6:20 pm, edited 1 time in total.
GunnarB_Hamburg
Posts: 5
Joined: Tue Mar 31, 2020 6:01 am

Re: More CPU = slower

Post by GunnarB_Hamburg »

More measurements:

Summary (all at around 4,3 GHz, no GPU calculation; CPU load is from Windows Task Manager, values in brackets are from Intel Extreme Tuning Utility)
2 CPU = 5:54 TPF with 22% total CPU load (19% CPU, 60°C, 49W, no throttling)
4 CPU = 3:02 TPF with 41% total CPU load (34% CPU, 69°C, 72W, no throttling)
6 CPU = 2:10 TPF with 61% total CPU load (51% CPU, 72°C, 97W, no throttling)
12 CPU/Threads = 2:10 TPF with 98% total CPU load (86% CPU, 80°C, 103W, no throttling)

With GPU (working on a different package in parallel)
4 CPU = 3:06 TPF with 41% total CPU load (43% CPU, 75°C, 75W, no throttling)
6 CPU = 3:26 TPF with 72% total CPU load (60% CPU, 80°C, 82W, no throttling)
11 CPU/Threads = 2:12 TPF with 100% total CPU load (93% CPU, 86°C, 100W, no throttling)
Here I used 11 CPUs because this is the automatic selection.
The impact on the GPU calculation time was close to none and not measurable.

If one runs only the CPU, there does not seem to be an issue. If the GPU is also working on a package, there is a slowdown. I still believe it has to do with different packages sharing the same first, second and third level cache.
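
To put a number on that slowdown, the two tables can be compared for the CPU counts measured in both. A minimal sketch: the names `to_seconds` and `gpu_slowdown` are made up for illustration, and only the TPF values are taken from the measurements above.

```python
# TPF values (m:ss) from the two measurement blocks above, keyed by CPU count.
NO_GPU = {2: "5:54", 4: "3:02", 6: "2:10", 12: "2:10"}
WITH_GPU = {4: "3:06", 6: "3:26", 11: "2:12"}


def to_seconds(tpf: str) -> int:
    """Convert an 'm:ss' time-per-frame string to seconds."""
    minutes, seconds = tpf.split(":")
    return int(minutes) * 60 + int(seconds)


def gpu_slowdown(no_gpu: dict, with_gpu: dict) -> dict:
    """Return {cpus: ratio} for CPU counts present in both tables.

    A ratio above 1 means the CPU work unit was slower while the GPU was folding.
    """
    shared = sorted(set(no_gpu) & set(with_gpu))
    return {c: to_seconds(with_gpu[c]) / to_seconds(no_gpu[c]) for c in shared}


if __name__ == "__main__":
    for cpus, ratio in gpu_slowdown(NO_GPU, WITH_GPU).items():
        print(f"{cpus} CPU: {ratio:.2f}x the TPF with the GPU running")
```

By this comparison, 4 CPUs are about 2% slower with the GPU running and 6 CPUs about 58% slower. The 11 and 12 thread rows are not directly comparable because the thread counts differ.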