Project: 17201 (Run 0, Clone 2568, Gen 185) reduced threads



Postby HendricksSA » Mon Sep 21, 2020 7:25 pm

Have we made a change to the server code to reduce threads and avoid domain decomposition errors? I noticed the fans on my 48-thread machine were idling and found it was using only 9 threads to process Project: 17201 (Run 0, Clone 2568, Gen 185). Looking through the log, I found this. It is the first time I've noticed it.

17:29:44:WU00:FS00:Connecting to assign1.foldingathome.org:80
17:29:45:WARNING:WU00:FS00:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
17:29:45:WU00:FS00:Connecting to assign2.foldingathome.org:80
17:29:45:WU00:FS00:Assigned to work server 128.252.203.10
17:29:45:WU00:FS00:Requesting new work unit for slot 00: READY cpu:48 from 128.252.203.10
17:29:45:WU00:FS00:Connecting to 128.252.203.10:8080
17:29:46:WU00:FS00:Downloading 2.22MiB
17:29:47:WU00:FS00:Download complete
17:29:47:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:17201 run:0 clone:2568 gen:185 core:0xa7 unit:0x000000d480fccb0a5f32fed6c8584138
17:29:47:WU00:FS00:Starting
17:29:47:WARNING:WU00:FS00:AS lowered CPUs from 48 to 9
17:29:47:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a7-0.0.19/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 706 -lifeline 1876 -checkpoint 15 -np 9
17:29:47:WU00:FS00:Started FahCore on PID 10403
17:29:47:WU00:FS00:Core PID:10407
17:29:47:WU00:FS00:FahCore 0xa7 started
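A quick way to check whether this has happened on other work units is to scan the client log for the same warning. The sketch below is a minimal Python example; the regular expression is inferred only from the excerpt above, so other client versions may word the line differently, and the function name `find_lowered_cpus` is hypothetical.

```python
import re

# Pattern inferred from the single log excerpt in this thread (assumption):
# other FAHClient versions may format this warning differently.
LOWERED = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WARNING:(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"AS lowered CPUs from (?P<requested>\d+) to (?P<assigned>\d+)"
)


def find_lowered_cpus(lines):
    """Return (time, wu, slot, requested, assigned) for each matching line."""
    hits = []
    for line in lines:
        m = LOWERED.match(line.strip())
        if m:
            hits.append((m["time"], m["wu"], m["slot"],
                         int(m["requested"]), int(m["assigned"])))
    return hits


sample = [
    "17:29:47:WU00:FS00:Starting",
    "17:29:47:WARNING:WU00:FS00:AS lowered CPUs from 48 to 9",
]
print(find_lowered_cpus(sample))
# → [('17:29:47', 'WU00', 'FS00', 48, 9)]
```

To use it, pass the lines of your own client log file to `find_lowered_cpus`.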


Postby bruce » Mon Sep 21, 2020 8:17 pm

Yes. GROMACS has some severe limitations on the number of threads that can be used for a project. The version used in FAHCore_a7 uses some hack-like corrections to make it work. The upcoming version in FAHCore_a8 will change all that, but there will still be some similar issues. Domain decomposition was designed back when CPUs had 1, 2, 4, 8, 12, or 16 cores. Nobody could conceive of trying to use as many threads as you have.
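To picture why the thread count matters, here is a toy model of domain decomposition: the simulation box is cut into an nx × ny × nz grid, one cell per thread, and the most cube-like grid is preferred. This is a hedged Python sketch, not GROMACS's actual algorithm, which also weighs the box shape, a minimum cell size set by the cut-off, and separate PME ranks; `grids` and `most_balanced` are hypothetical names.

```python
def grids(n):
    """All (nx, ny, nz) with nx <= ny <= nz and nx * ny * nz == n."""
    out = []
    for nx in range(1, n + 1):
        if n % nx:
            continue
        rest = n // nx
        for ny in range(nx, rest + 1):
            # Keep only sorted triples so each grid shape appears once.
            if rest % ny == 0 and rest // ny >= ny:
                out.append((nx, ny, rest // ny))
    return out


def most_balanced(n):
    """Grid whose longest side is shortest (most cube-like cells).

    Ties are broken in favour of the largest shortest side.
    """
    return min(grids(n), key=lambda g: (g[2], -g[0]))


for n in (48, 47, 24, 9):
    print(n, most_balanced(n))
# 48 (3, 4, 4)
# 47 (1, 1, 47)
# 24 (2, 3, 4)
# 9 (1, 3, 3)
```

In this model 48 threads give a compact 3 × 4 × 4 grid, while 47 can only be cut into 47 thin slabs along a single axis; slabs that thin can fall below the minimum cell size, which is one reason counts with large prime factors cause trouble.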

The OpenMM code used on GPUs is entirely different ... and if it reduces the parallelism to avoid specific problems, it doesn't tell you about it.

FAH is considering some improvements for CPUs.

I recommend using several CPU slots while avoiding any thread counts with large prime factors.
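Following that advice means checking the largest prime factor of each candidate thread count. Here is a minimal Python sketch, assuming a hypothetical cutoff that treats only 2, 3 and 5 as "small" primes (the thread gives no exact threshold); `friendly` and `even_splits` are illustrative names, not part of any FAH tool.

```python
def largest_prime_factor(n):
    """Largest prime factor of n (returns 1 for n == 1)."""
    largest, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            largest, n = p, n // p
        p += 1
    # Whatever is left above 1 is itself a prime factor.
    return max(largest, n) if n > 1 else largest


# Hypothetical threshold: only 2, 3 and 5 count as "small" primes.
MAX_PRIME = 5


def friendly(n, max_prime=MAX_PRIME):
    """True if every prime factor of n is at most max_prime."""
    return largest_prime_factor(n) <= max_prime


def even_splits(total, max_prime=MAX_PRIME, max_slots=4):
    """(slots, threads_per_slot) pairs that use every thread with friendly sizes."""
    return [(k, total // k) for k in range(1, max_slots + 1)
            if total % k == 0 and friendly(total // k, max_prime)]


print([n for n in range(40, 49) if friendly(n)])  # → [40, 45, 48]
print(even_splits(48))  # → [(1, 48), (2, 24), (3, 16), (4, 12)]
```

On a 48-thread machine this keeps 48 itself plus even splits such as 2 × 24, 3 × 16 or 4 × 12, and flags awkward single-slot sizes such as 42 (6 × 7), 44 (4 × 11) or 46 (2 × 23).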


Postby Joe_H » Mon Sep 21, 2020 8:23 pm

The code has been there for some time. Provided the server settings are also configured correctly, if no WU is available for the requested number of CPU threads, a WU that uses fewer threads will be assigned.

What is unusual here is that the assignment server (AS) went so far down; usually there are WUs available for somewhat higher thread counts. It would be more normal to see a reduction to 32 or 24, for example.
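The fallback described here can be modelled as picking the largest available WU size that does not exceed the slot's request. This Python sketch is only an assumption about the behaviour: the thread does not describe the AS's real selection logic or server settings, the function name `choose_thread_count` is hypothetical, and the WU pools below are made up.

```python
from typing import Iterable, Optional


def choose_thread_count(requested: int, available: Iterable[int]) -> Optional[int]:
    """Largest available WU thread count that does not exceed the request.

    Returns None when nothing fits, i.e. the slot would stay idle.
    """
    fits = [n for n in available if n <= requested]
    return max(fits) if fits else None


# Hypothetical pools of WU sizes; the real AS inventory is not shown in the thread.
print(choose_thread_count(48, {64, 32, 24}))  # → 32
print(choose_thread_count(48, {9, 4}))        # → 9
print(choose_thread_count(2, {9, 4}))         # → None
```

Under this model, a drop from 48 all the way to 9 would simply mean nothing between 10 and 48 threads was on offer at that moment.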

The A8 folding core is less tied to domain decomposition numbers, and there are projects waiting to be created once new servers are ready. Some smaller projects may not use a large number of threads as efficiently, but they will still process. For right now, though, there is a bit of a shortage of CPU WUs, especially for higher thread counts.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3


Postby PantherX » Tue Sep 22, 2020 4:47 am

There were discussions about what to do when a system with X CPUs requests work and none is available. Rather than leaving the CPU idle, the idea was to assign a WU for Y CPUs, where Y < X, so that you can still contribute. As Joe_H mentioned, this is due to a shortage of CPU WUs under some conditions, which will hopefully be resolved soon.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues

