n=40 but load (and threads) seem to imply n=48 being used

Moderators: Site Moderators, FAHC Science Team

WmLongman
Posts: 6
Joined: Wed Sep 09, 2015 10:06 pm
Hardware configuration: Win 7 x64 AMD A10+GTX660, Dell M4500 i7; iMac i7; Linux Dual E5-2660 v3
Location: About 46N-122W

n=40 but load (and threads) seem to imply n=48 being used

Post by WmLongman »

I have an E5 server with 40 threads. In the config.xml file, I specify that the number of processors is 40. (And also that the client-type is advanced and the max-packet-size is big, in case that matters.)

Usually, I would expect system load to equal 40, but on this system, it is at 48. In fact, if I look at "top -H" it shows me 48 FahCore_a4 threads running.

Should I decrease the CPU count? Or is it normal for an extra 8 threads to spin up? It seems odd to me that the load would be over 40, that's all...

Essential config settings:

Code:

<config>
  <power v='full'/>
  <slot id='0' type='CPU'>
    <checkpoint v='5'/>
    <client-type v='advanced'/>
    <cpus v='40'/>
    <max-packet-size v='big'/>
    <next-unit-percentage v='100'/>
  </slot>
</config>
Dual Intel Xeon E5-2660 v3 = 2 sockets * (10 cores * 2 threads/core) = 40 threads
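
(For anyone who wants to sanity-check that arithmetic on their own box, here's a quick Python sketch -- nothing FAH-specific, it just parses /proc/cpuinfo on Linux:)

Code:

# Count physical packages, cores and hardware threads from /proc/cpuinfo
# to confirm the 2 sockets x 10 cores x 2 threads/core = 40 figure.
packages, cores, threads = set(), set(), 0
block = {}
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.strip():
            key, _, value = line.partition(":")
            block[key.strip()] = value.strip()
        else:  # a blank line ends one processor block
            threads += 1
            packages.add(block.get("physical id"))
            cores.add((block.get("physical id"), block.get("core id")))
            block = {}
print(f"{len(packages)} sockets, {len(cores)} cores, {threads} hardware threads")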

TIA,

--
Bill
...folding proteins since 1984...
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: n=40 but load (and threads) seem to imply n=48 being used

Post by Joe_H »

Welcome to the folding support forum.

Please post the beginning of your log file, especially the first 100 or so lines that show the client information, system info, and the configuration that the client read in. Include enough to show a WU starting to be processed. Directions for finding the log and posting it can be found in this topic - viewtopic.php?f=16&t=26036.

As for the number of threads, the folding core will use a few more than are specified for computations. They are mostly inactive during processing, so they should not need any adjustment to account for their existence.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: n=40 but load (and threads) seem to imply n=48 being used

Post by bruce »

The SMP core does create extra non-folding threads. You should have 40 that are busy plus a few more that are doing nothing but waiting.

If your system is busy doing other things, it might be beneficial to REDUCE the number of threads below 40. Performance is best when all of the folding threads have relatively few interruptions, so 36 or 32 may be a better number, letting all of them do equal amounts of FAH work -- but any reduction is for that reason, not because of the mostly inactive threads that FAH creates.
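
If you want to confirm which of those threads are actually doing work, here's a rough Linux-only Python sketch (not FAH tooling, it just reads /proc) that counts the FahCore threads by scheduler state:

Code:

import os

def thread_states(pid):
    # Tally the threads of a process by state (R = running, S = sleeping, ...).
    states = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/stat") as stat:
            # The state letter is the first field after the "(comm)" part.
            state = stat.read().rsplit(")", 1)[1].split()[0]
        states[state] = states.get(state, 0) + 1
    return states

core_pid = 17833  # replace with the "Core PID" from your own client log
print(thread_states(core_pid))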
WmLongman
Posts: 6
Joined: Wed Sep 09, 2015 10:06 pm
Hardware configuration: Win 7 x64 AMD A10+GTX660, Dell M4500 i7; iMac i7; Linux Dual E5-2660 v3
Location: About 46N-122W

Re: n=40 but load (and threads) seem to imply n=48 being used

Post by WmLongman »

I had actually turned it down to n=36 the other day but it was still using more CPU than I expected, Bruce. I will try turning it down to 32 and see what that does to the times. Currently, the runs are all finishing in a bit more than an hour (~69'). I also find it, uh, interesting that the system is spending half its time in "nice" and half its time in "system".

Here's the top of the log file. Further below is the output of "top" showing all 48 threads as "running" and all with a similar amount of CPU and time.

Thanks for the warm welcome. I'm a LONG time folder, but new to the forum.

Code:

13:19:41:************************* Folding@home Client *************************
13:19:41:    Website: http://folding.stanford.edu/
13:19:41:  Copyright: (c) 2009-2014 Stanford University
13:19:41:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
13:19:41:       Args: --run-as fahclient --config /etc/fahclient/config.xml
13:19:41:     Config: /etc/fahclient/config.xml
13:19:41:******************************** Build ********************************
13:19:41:    Version: 7.4.4
13:19:41:       Date: Mar 4 2014
13:19:41:       Time: 12:01:17
13:19:41:    SVN Rev: 4130
13:19:41:     Branch: fah/trunk/client
13:19:41:   Compiler: GNU 4.1.2 20080704 (Red Hat 4.1.2-46)
13:19:41:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
13:19:41:             -fno-unsafe-math-optimizations -msse2
13:19:41:   Platform: linux2 2.6.18-164.11.1.el5
13:19:41:       Bits: 64
13:19:41:       Mode: Release
13:19:41:******************************* System ********************************
13:19:41:        CPU: Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
13:19:41:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
13:19:41:       CPUs: 40
13:19:41:     Memory: 755.66GiB
13:19:41:Free Memory: 755.05GiB
13:19:41:    Threads: POSIX_THREADS
13:19:41: OS Version: 3.18
13:19:41:Has Battery: false
13:19:41: On Battery: false
13:19:41: UTC Offset: 0
13:19:41:        PID: 17820
13:19:41:        CWD: /var/lib/fahclient
13:19:41:         OS: Linux 3.18.7-gentoo x86_64
13:19:41:    OS Arch: AMD64
13:19:41:       GPUs: 0
13:19:41:       CUDA: Not detected
13:19:41:***********************************************************************
13:19:41:<config>
13:19:41:  <!-- Slot Control -->
13:19:41:  <power v='full'/>
13:19:41:
13:19:41:  <!-- User Information -->
13:19:41:  <passkey v='********************************'/>
13:19:41:  <team v='216737'/>
13:19:41:  <user v='WmLongman'/>
13:19:41:
13:19:41:  <!-- Folding Slots -->
13:19:41:  <slot id='0' type='CPU'>
13:19:41:    <checkpoint v='5'/>
13:19:41:    <client-type v='advanced'/>
13:19:41:    <cpus v='40'/>
13:19:41:    <max-packet-size v='big'/>
13:19:41:    <next-unit-percentage v='100'/>
13:19:41:  </slot>
13:19:41:</config>
13:19:41:Switching to user fahclient
13:19:41:Trying to access database...
13:19:41:Successfully acquired database lock
13:19:41:Enabled folding slot 00: READY cpu:40
13:19:41:WU00:FS00:Connecting to 171.67.108.200:8080
13:19:43:WU00:FS00:Assigned to work server 171.64.65.99
13:19:43:WU00:FS00:Requesting new work unit for slot 00: READY cpu:40 from 171.64.65.99
13:19:43:WU00:FS00:Connecting to 171.64.65.99:8080
13:19:46:WU00:FS00:Downloading 6.24MiB
13:19:48:WU00:FS00:Download complete
13:19:48:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:9752 run:2092 clone:0 gen:206 core:0xa4 unit:0x00000112ab404163554174587ff33dad
13:19:48:WU00:FS00:Starting
13:19:48:WU00:FS00:Running FahCore: /var/usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 00 -suffix 01 -version 704 -lifeline 17820 -checkpoint 5 -np 40
13:19:48:WU00:FS00:Started FahCore on PID 17829
13:19:48:WU00:FS00:Core PID:17833
13:19:48:WU00:FS00:FahCore 0xa4 started
13:19:48:WU00:FS00:0xa4:
13:19:48:WU00:FS00:0xa4:*------------------------------*
13:19:48:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
13:19:48:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
13:19:48:WU00:FS00:0xa4:
13:19:48:WU00:FS00:0xa4:Preparing to commence simulation
13:19:48:WU00:FS00:0xa4:- Looking at optimizations...
13:19:48:WU00:FS00:0xa4:- Created dyn
13:19:48:WU00:FS00:0xa4:- Files status OK
13:19:48:WU00:FS00:0xa4:- Expanded 6546141 -> 22464536 (decompressed 343.1 percent)
13:19:49:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=6546141 data_size=22464536, decompressed_data_size=22464536 diff=0
13:19:49:WU00:FS00:0xa4:- Digital signature verified
13:19:49:WU00:FS00:0xa4:
13:19:49:WU00:FS00:0xa4:Project: 9752 (Run 2092, Clone 0, Gen 206)
13:19:49:WU00:FS00:0xa4:
13:19:49:WU00:FS00:0xa4:Assembly optimizations on if available.
13:19:49:WU00:FS00:0xa4:Entering M.D.
13:19:56:WU00:FS00:0xa4:Completed 0 out of 80000 steps  (0%)
...

Code:

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
19685 fahclie+  39  19 4238012 1.943g   5556 R  2.3  0.3  39:16.26 FahCore_a4
19670 fahclie+  39  19 4238012 1.943g   5556 R  2.3  0.3  38:09.45 FahCore_a4
19659 fahclie+  39  19 4238012 1.943g   5556 R  2.3  0.3  39:08.07 FahCore_a4
19663 fahclie+  39  19 4238012 1.943g   5556 R  2.3  0.3  39:21.15 FahCore_a4
19673 fahclie+  39  19 4238012 1.943g   5556 R  2.3  0.3  39:26.40 FahCore_a4
19645 fahclie+  39  19 4238012 1.943g   5556 R  2.3  0.3  39:30.57 FahCore_a4
19681 fahclie+  39  19 4238012 1.943g   5556 R  2.3  0.3  39:05.07 FahCore_a4
19668 fahclie+  39  19 4238012 1.943g   5556 R  2.3  0.3  39:29.21 FahCore_a4
19672 fahclie+  39  19 4238012 1.943g   5556 R  2.3  0.3  38:01.01 FahCore_a4
19647 fahclie+  39  19 4238012 1.943g   5556 R  2.3  0.3  39:24.11 FahCore_a4
19687 fahclie+  39  19 4238012 1.943g   5556 R  2.3  0.3  39:24.82 FahCore_a4
19650 fahclie+  39  19 4238012 1.943g   5556 R  2.2  0.3  39:17.56 FahCore_a4
19677 fahclie+  39  19 4238012 1.943g   5556 R  2.2  0.3  37:48.13 FahCore_a4
19649 fahclie+  39  19 4238012 1.943g   5556 R  2.2  0.3  37:58.96 FahCore_a4
19667 fahclie+  39  19 4238012 1.943g   5556 R  2.2  0.3  38:01.04 FahCore_a4
19682 fahclie+  39  19 4238012 1.943g   5556 R  2.2  0.3  39:24.35 FahCore_a4
19654 fahclie+  39  19 4238012 1.943g   5556 R  2.2  0.3  38:03.99 FahCore_a4
19664 fahclie+  39  19 4238012 1.943g   5556 R  2.2  0.3  39:38.04 FahCore_a4
19653 fahclie+  39  19 4238012 1.943g   5556 R  2.2  0.3  38:00.69 FahCore_a4
19665 fahclie+  39  19 4238012 1.943g   5556 R  2.2  0.3  37:53.83 FahCore_a4
19656 fahclie+  39  19 4238012 1.943g   5556 R  2.2  0.3  39:29.38 FahCore_a4
19671 fahclie+  39  19 4238012 1.943g   5556 R  2.1  0.3  39:15.75 FahCore_a4
19651 fahclie+  39  19 4238012 1.943g   5556 R  2.1  0.3  37:57.37 FahCore_a4
19683 fahclie+  39  19 4238012 1.943g   5556 R  2.1  0.3  39:28.85 FahCore_a4
19678 fahclie+  39  19 4238012 1.943g   5556 R  2.1  0.3  37:53.62 FahCore_a4
19652 fahclie+  39  19 4238012 1.943g   5556 R  2.1  0.3  37:42.20 FahCore_a4
19688 fahclie+  39  19 4238012 1.943g   5556 R  2.1  0.3  37:47.13 FahCore_a4
19639 fahclie+  39  19 4238012 1.943g   5556 R  2.0  0.3  37:56.07 FahCore_a4
19686 fahclie+  39  19 4238012 1.943g   5556 R  2.0  0.3  37:53.50 FahCore_a4
19661 fahclie+  39  19 4238012 1.943g   5556 R  2.0  0.3  37:53.38 FahCore_a4
19680 fahclie+  39  19 4238012 1.943g   5556 R  2.0  0.3  38:04.64 FahCore_a4
19655 fahclie+  39  19 4238012 1.943g   5556 R  2.0  0.3  37:51.07 FahCore_a4
19674 fahclie+  39  19 4238012 1.943g   5556 R  2.0  0.3  39:24.10 FahCore_a4
19657 fahclie+  39  19 4238012 1.943g   5556 R  2.0  0.3  39:17.06 FahCore_a4
19648 fahclie+  39  19 4238012 1.943g   5556 R  2.0  0.3  38:12.49 FahCore_a4
19676 fahclie+  39  19 4238012 1.943g   5556 R  2.0  0.3  37:46.52 FahCore_a4
19646 fahclie+  39  19 4238012 1.943g   5556 R  1.9  0.3  37:52.04 FahCore_a4
19679 fahclie+  39  19 4238012 1.943g   5556 R  1.9  0.3  39:13.90 FahCore_a4
19684 fahclie+  39  19 4238012 1.943g   5556 R  1.9  0.3  37:39.05 FahCore_a4
19660 fahclie+  39  19 4238012 1.943g   5556 R  1.9  0.3  38:56.11 FahCore_a4
19666 fahclie+  39  19 4238012 1.943g   5556 R  1.9  0.3  37:52.75 FahCore_a4
19642 fahclie+  39  19 4238012 1.943g   5556 R  1.8  0.3  39:32.48 FahCore_a4
19643 fahclie+  39  19 4238012 1.943g   5556 R  1.8  0.3  37:52.77 FahCore_a4
19675 fahclie+  39  19 4238012 1.943g   5556 R  1.8  0.3  39:24.96 FahCore_a4
19658 fahclie+  39  19 4238012 1.943g   5556 R  1.8  0.3  37:55.18 FahCore_a4
19669 fahclie+  39  19 4238012 1.943g   5556 R  1.8  0.3  38:52.36 FahCore_a4
19662 fahclie+  39  19 4238012 1.943g   5556 R  1.8  0.3  39:31.93 FahCore_a4
19644 fahclie+  39  19 4238012 1.943g   5556 R  1.7  0.3  37:45.13 FahCore_a4
19713 root      20   0   20312   2876   2048 R  0.0  0.0   0:00.03 top
Last edited by WmLongman on Thu Sep 10, 2015 6:06 am, edited 1 time in total.
...folding proteins since 1984...
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: n=40 but load (and threads) seem to imply n=48 being used

Post by Joe_H »

I would also suggest returning the checkpoint value to its default of 15 minutes. The extra stopping to create and write out checkpoint files is not helping your CPU utilization.

I'm also not used to seeing top display a PID for each thread. Does the Gentoo scheduler do that? Most systems I'm used to would show one PID for the Core_A4 process and list the number of threads being used by that PID.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
WmLongman
Posts: 6
Joined: Wed Sep 09, 2015 10:06 pm
Hardware configuration: Win 7 x64 AMD A10+GTX660, Dell M4500 i7; iMac i7; Linux Dual E5-2660 v3
Location: About 46N-122W

Re: n=40 but load (and threads) seem to imply n=48 being used

Post by WmLongman »

VERY interesting result with n=32. It completes much more quickly, with "system" load now down to 10%. Also notice how equal the time share of each of the threads becomes. This one finished in ~45' compared to a consistent ~69' with n=40. Wow.

But as I mentioned before, I had already turned this down, to no avail -- just not as far as 32...

Code:

top - 05:37:06 up 5 days, 12:12,  7 users,  load average: 32.06, 32.05, 33.68
Threads: 399 total,  33 running, 366 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 10.3 sy, 69.7 ni, 20.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  79237049+total,  2180004 used, 79019046+free,    26300 buffers
KiB Swap:        0 total,        0 used,        0 free.   307020 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
19836 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.92 FahCore_a4
19838 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.96 FahCore_a4
19846 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.93 FahCore_a4
19848 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.95 FahCore_a4
19826 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:08.76 FahCore_a4
19829 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:08.00 FahCore_a4
19830 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.85 FahCore_a4
19831 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.94 FahCore_a4
19832 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.95 FahCore_a4
19833 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.93 FahCore_a4
19834 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.96 FahCore_a4
19835 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.98 FahCore_a4
19837 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:08.01 FahCore_a4
19839 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.92 FahCore_a4
19840 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.90 FahCore_a4
19841 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.95 FahCore_a4
19842 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:08.00 FahCore_a4
19843 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:08.01 FahCore_a4
19844 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.86 FahCore_a4
19845 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.97 FahCore_a4
19847 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.97 FahCore_a4
19850 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:08.00 FahCore_a4
19851 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.92 FahCore_a4
19852 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.86 FahCore_a4
19853 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.99 FahCore_a4
19854 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.95 FahCore_a4
19855 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.97 FahCore_a4
19856 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:08.03 FahCore_a4
19857 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:08.02 FahCore_a4
19858 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.99 FahCore_a4
19859 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:08.02 FahCore_a4
19849 fahclie+  39  19 2631992 1.379g   5556 R  2.5  0.2  33:07.97 FahCore_a4
   70 root      rt   0       0      0      0 S  0.0  0.0   0:00.29 migration/16
Last edited by WmLongman on Thu Sep 10, 2015 6:07 am, edited 1 time in total.
...folding proteins since 1984...
WmLongman
Posts: 6
Joined: Wed Sep 09, 2015 10:06 pm
Hardware configuration: Win 7 x64 AMD A10+GTX660, Dell M4500 i7; iMac i7; Linux Dual E5-2660 v3
Location: About 46N-122W

Re: n=40 but load (and threads) seem to imply n=48 being used

Post by WmLongman »

Well, in a 45-minute run that's only two checkpoints at a 15-minute interval before the WU is sent...and this system isn't even using a hard drive. As it is, I get eight or so checkpoints over the life of the run, which is a pretty small cost.
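
(Back-of-the-envelope, counting only the intermediate checkpoints written before the WU is sent:)

Code:

# Rough checkpoint count for a ~45 minute work unit at two interval settings.
run_minutes = 45
for interval in (5, 15):
    checkpoints = run_minutes // interval - 1  # skip the one at the very end
    print(f"{interval}-minute interval -> about {checkpoints} checkpoints")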

The top command can show threads if you use the "-H" flag. It's not specific to gentoo.

EDIT:

Hot off the press with n=36 -- yes, load and thread count go up to 48, just like they did the other day....

So, why the jump from 32 to 48?
...folding proteins since 1984...
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: n=40 but load (and threads) seem to imply n=48 being used

Post by toTOW »

If my memory is correct, prime numbers in the domain decomposition cause issues for some projects, so it might be defaulting to the nearest value with no problematic prime factors in the decomposition ...

32 and 48 lead to decompositions with no problematic prime factors; 36 and 40 don't (3 and 5).
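
(A quick illustration -- not anything the core actually runs -- that prints the factorizations in question:)

Code:

# Print the prime factorization of each candidate thread count, so it is
# easy to see which ones contain the factors mentioned above.
def prime_factors(n):
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

for count in (32, 36, 40, 48):
    print(count, "=", " x ".join(map(str, prime_factors(count))))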

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Nathan_P
Posts: 1180
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: n=40 but load (and threads) seem to imply n=48 being used

Post by Nathan_P »

Ref the system/nice load issue - some distros are better for folding than others, and even some kernels are better than others. For a dedicated folding rig I would look into a folding image created by one of the big teams; they have been optimised specifically to fold. On my rigs I generally get system usage of less than 2%.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: n=40 but load (and threads) seem to imply n=48 being used

Post by bruce »

One of the main reasons for high system usage is excessive task switching.

I would look at total CPU usage when FAH is NOT running, then allocate a number of CPU threads to FAH that approximates the number of threads that sit idle most of the time. Thrashing makes any system inefficient.
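
One rough way to measure that on Linux is to sample the aggregate "cpu" line of /proc/stat for a few seconds while FAH is paused (a sketch, not FAH tooling):

Code:

import time

def cpu_times():
    # The first line of /proc/stat holds the aggregate CPU counters:
    # user, nice, system, idle, iowait, irq, softirq, steal, ...
    with open("/proc/stat") as f:
        return list(map(int, f.readline().split()[1:]))

before = cpu_times()
time.sleep(10)
after = cpu_times()
delta = [b - a for a, b in zip(before, after)]
idle_fraction = (delta[3] + delta[4]) / sum(delta)  # idle + iowait
total_threads = 40  # hardware threads on this box
print(f"roughly {idle_fraction * total_threads:.0f} of {total_threads} threads idle")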
WmLongman
Posts: 6
Joined: Wed Sep 09, 2015 10:06 pm
Hardware configuration: Win 7 x64 AMD A10+GTX660, Dell M4500 i7; iMac i7; Linux Dual E5-2660 v3
Location: About 46N-122W

Re: n=40 but load (and threads) seem to imply n=48 being used

Post by WmLongman »

Thanks everyone.

I have it running two slots now, a 32 and an 8. The load is now hovering around 40.7, with the system load percentage at about 8.7%. As toTOW mentioned, it seems that CPU counts which don't factor nicely just get bumped to the nearest "round" number. I couldn't do two twenties, so this is the split that seems to work best.