FahCore_a3.exe problem with XEON E5520


Postby petervanderdoes » Tue Jul 13, 2010 2:38 am

Machine setup:
Intel Xeon E5520
centos-release-5-4.el5.centos.1
12 GB memory

Not sure if it's a bug or if I'm doing something wrong.
I start the client with: ./fah6 -verbosity 9 -advmethods

Code:
[01:25:49]
[01:25:49] Loaded queue successfully.
[01:25:49]
[01:25:49] + Processing work unit
[01:25:49] Core required: FahCore_a3.exe
[01:25:49] Core found.
[01:25:49] - Autosending finished units... [July 13 01:25:49 UTC]
[01:25:49] Trying to send all finished work units
[01:25:49] + No unsent completed units remaining.
[01:25:49] - Autosend completed
[01:25:49] Working on queue slot 05 [July 13 01:25:49 UTC]
[01:25:49] + Working ...
[01:25:49] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 05 -np 0 -checkpoint 15 -verbose -lifeline 7244 -version 629'

[01:25:49]
[01:25:49] *------------------------------*
[01:25:49] Folding@Home Gromacs SMP Core
[01:25:49] Version 2.22 (June 10, 2010)
[01:25:49]
[01:25:49] Preparing to commence simulation
[01:25:49] - Ensuring status. Please wait.
[01:25:59] - Looking at optimizations...
[01:25:59] - Working with standard loops on this execution.
[01:25:59] - Created dyn
[01:25:59] - Files status OK
[01:25:59] - Expanded 1764012 -> 2248557 (decompressed 127.4 percent)
[01:25:59] Called DecompressByteArray: compressed_data_size=1764012 data_size=2248557, decompressed_data_size=2248557 diff=0
[01:25:59] - Digital signature verified
[01:25:59]
[01:25:59] Project: 6066 (Run 0, Clone 102, Gen 64)
[01:25:59]
[01:25:59] Entering M.D.
Starting 2 threads
NNODES=2, MYRANK=0, HOSTNAME=thread #0
NNODES=2, MYRANK=1, HOSTNAME=thread #1
Reading file work/wudata_05.tpr, VERSION 4.0.99_development_20090605 (single precision)
Making 1D domain decomposition 1 x 2 x 1
starting mdrun 'Mutant_scan'


I was expecting to see 8 threads.

The command ./fah6 -verbosity 9 -advmethods -smp 8
crashes with CoreStatus = 0 (0):

Code:
[01:29:08]
[01:29:08] Loaded queue successfully.
[01:29:08]
[01:29:08] + Processing work unit
[01:29:08] - Autosending finished units... [July 13 01:29:08 UTC]
[01:29:08] Core required: FahCore_a3.exe
[01:29:08] Trying to send all finished work units
[01:29:08] Core found.
[01:29:08] + No unsent completed units remaining.
[01:29:08] - Autosend completed
[01:29:08] Working on queue slot 06 [July 13 01:29:08 UTC]
[01:29:08] + Working ...
[01:29:08] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 06 -np 8 -checkpoint 15 -verbose -lifeline 7304 -version 629'

[01:29:08]
[01:29:08] *------------------------------*
[01:29:08] Folding@Home Gromacs SMP Core
[01:29:08] Version 2.22 (June 10, 2010)
[01:29:08]
[01:29:08] Preparing to commence simulation
[01:29:08] - Ensuring status. Please wait.
[01:29:18] - Looking at optimizations...
[01:29:18] - Working with standard loops on this execution.
[01:29:18] - Created dyn
[01:29:18] - Files status OK
[01:29:18] - Expanded 1764012 -> 2248557 (decompressed 127.4 percent)
[01:29:18] Called DecompressByteArray: compressed_data_size=1764012 data_size=2248557, decompressed_data_size=2248557 diff=0
[01:29:18] - Digital signature verified
[01:29:18]
[01:29:18] Project: 6066 (Run 0, Clone 102, Gen 64)
[01:29:18]
[01:29:18] Entering M.D.
Starting 8 threads
NNODES=8, MYRANK=2, HOSTNAME=thread #2
NNODES=8, MYRANK=1, HOSTNAME=thread #1
NNODES=8, MYRANK=3, HOSTNAME=thread #3
NNODES=8, MYRANK=4, HOSTNAME=thread #4
NNODES=8, MYRANK=5, HOSTNAME=thread #5
NNODES=8, MYRANK=0, HOSTNAME=thread #0
NNODES=8, MYRANK=7, HOSTNAME=thread #7
NNODES=8, MYRANK=6, HOSTNAME=thread #6
Reading file work/wudata_06.tpr, VERSION 4.0.99_development_20090605 (single precision)
Making 1D domain decomposition 1 x 8 x 1
[01:29:24] CoreStatus = 0 (0)
[01:29:24] Sending work to server
[01:29:24] Project: 6066 (Run 0, Clone 102, Gen 64)
[01:29:24] - Error: Could not get length of results file work/wuresults_06.dat
[01:29:24] - Error: Could not read unit 06 file. Removing from queue.


When I try
./fah6 -verbosity 9 -advmethods -smp 7
everything works fine:

Code:
[01:30:54] Loaded queue successfully.
[01:30:54]
[01:30:54] + Processing work unit
[01:30:54] Core required: FahCore_a3.exe
[01:30:54] Core found.
[01:30:54] - Autosending finished units... [July 13 01:30:54 UTC]
[01:30:54] Trying to send all finished work units
[01:30:54] + No unsent completed units remaining.
[01:30:54] - Autosend completed
[01:30:54] Working on queue slot 08 [July 13 01:30:54 UTC]
[01:30:54] + Working ...
[01:30:54] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 08 -np 7 -checkpoint 15 -verbose -lifeline 8086 -version 629'

[01:30:54]
[01:30:54] *------------------------------*
[01:30:54] Folding@Home Gromacs SMP Core
[01:30:54] Version 2.22 (June 10, 2010)
[01:30:54]
[01:30:54] Preparing to commence simulation
[01:30:54] - Looking at optimizations...
[01:30:54] - Files status OK
[01:30:54] - Expanded 1764012 -> 2248557 (decompressed 127.4 percent)
[01:30:54] Called DecompressByteArray: compressed_data_size=1764012 data_size=2248557, decompressed_data_size=2248557 diff=0
[01:30:54] - Digital signature verified
[01:30:54]
[01:30:54] Project: 6066 (Run 0, Clone 102, Gen 64)
[01:30:54]
[01:30:54] Assembly optimizations on if available.
[01:30:54] Entering M.D.
Starting 7 threads
NNODES=7, MYRANK=1, HOSTNAME=thread #1
NNODES=7, MYRANK=3, HOSTNAME=thread #3
NNODES=7, MYRANK=0, HOSTNAME=thread #0
NNODES=7, MYRANK=2, HOSTNAME=thread #2
Reading file work/wudata_08.tpr, VERSION 4.0.99_development_20090605 (single precision)
NNODES=7, MYRANK=5, HOSTNAME=thread #5
NNODES=7, MYRANK=6, HOSTNAME=thread #6
NNODES=7, MYRANK=4, HOSTNAME=thread #4
Making 1D domain decomposition 1 x 7 x 1
starting mdrun 'Mutant_scan'
32500002 steps,  65000.0 ps (continuing from step 32000002,  64000.0 ps).
[01:31:01] Completed 0 out of 500000 steps  (0%)
Last edited by petervanderdoes on Tue Jul 13, 2010 12:12 pm, edited 1 time in total.

Re: FahCore_a3.exe problem with XEON E5220

Postby bollix47 » Tue Jul 13, 2010 3:14 am

petervanderdoes wrote:
Not sure if it's a bug or if I'm doing something wrong:
I start: ./fah6 -verbosity 9 -advmethods


Try adding -smp

Re: FahCore_a3.exe problem with XEON E5220

Postby petervanderdoes » Tue Jul 13, 2010 11:20 am

Adding -smp results in the FahCore_a3.exe command line:
Code:
Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 06 -np 8 -checkpoint 15 -verbose -lifeline 31246 -version 629'

And crashing with a CoreStatus = 0 (0)

Re: FahCore_a3.exe problem with XEON E5220

Postby Russ_64 » Tue Jul 13, 2010 11:29 am

Download the newer drop-in binary (v6.30), drop it in, and try again.

Oops, sorry: that one is for Windows clients.

Re: FahCore_a3.exe problem with XEON E5220

Postby bollix47 » Tue Jul 13, 2010 11:43 am

petervanderdoes wrote:Adding -smp results in the FahCore_a3.exe command line:
Code:
Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 06 -np 8 -checkpoint 15 -verbose -lifeline 31246 -version 629'

And crashing with a CoreStatus = 0 (0)


That's what I expected since using -smp is the same as using -smp 8 in this case.
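
To confirm which thread count the client actually handed to the core, the -np value can be pulled out of the "Calling" line in the log. A minimal sketch, assuming the client log is saved as FAHlog.txt in the client directory; extract_np is a hypothetical helper name, not part of the client:

```shell
# Hypothetical helper: print the -np value from each "Calling './FahCore_...'" log line.
# Reads log text on stdin and prints one number per matching line.
extract_np() {
    sed -n 's/.*Calling .* -np \([0-9][0-9]*\) .*/\1/p'
}

# Example (assumes the client log is FAHlog.txt in the current directory):
#   extract_np < FAHlog.txt
```

Run against the logs posted earlier in this thread, this would print 0 for the plain run, 8 for -smp 8 and 7 for -smp 7.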

Things to look into would be your PSU and temperatures, as that configuration would be using more power and producing more heat. Also, is anything else running that might make the extra core unavailable to fah? Are you running Linux natively or in a VM? If in a VM, have you allowed it to use 8 cores?

According to Intel, that processor only has 2 cores and no HT, so I don't understand why it's selecting 8 threads when using -smp. It should say np=2, unless you have a 4-socket motherboard and 4 CPUs?

http://ark.intel.com/Product.aspx?id=36593

Ah, I see in your opening post that it's an E5520, not an E5220, which is a very different processor. You might want to change the title by editing your opening post to avoid further confusion.

Perhaps -smp 7 is the correct configuration for that OS or motherboard. I seem to recall others having problems using all cores on certain Linux distributions, but I don't think CentOS was one of them.

Good luck.

Re: FahCore_a3.exe problem with XEON E5220

Postby petervanderdoes » Tue Jul 13, 2010 12:11 pm

Not running in a VM.

How can I check if anything else is locking the last CPU?

Edit: Changed the title, sorry for the confusion there.

Not sure why it only selects two threads when running without -smp, but for me that's not that important.

If adding the last CPU put a strain on power and/or heat, shouldn't I have had problems with -smp 7 as well at times? This machine is not dedicated to folding; it's a webserver as well.

Re: FahCore_a3.exe problem with XEON E5520

Postby bollix47 » Tue Jul 13, 2010 12:27 pm

I would continue using -smp 7 and leave the other core available for your webserver duties.

You can use the top command to see where your CPU time is being used.
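
A rough sketch of how that check might look on a Linux box. It assumes /proc/cpuinfo is available and that top is the procps version, where pressing 1 toggles a per-CPU view; count_logical_cpus is a hypothetical helper name:

```shell
# Hypothetical helper: count the logical CPUs listed in a cpuinfo-style file.
# Defaults to /proc/cpuinfo; a different path can be passed for testing.
count_logical_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Example:
#   count_logical_cpus    # number of logical CPUs the kernel sees
#   top                   # then press 1 to show one usage line per CPU
```

Since -smp selected 8 threads on this machine, the count would be expected to be 8; the per-CPU view in top should show whether another process is already keeping one of them busy.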

Re: FahCore_a3.exe problem with XEON E5520

Postby codysluder » Tue Jul 13, 2010 4:49 pm

Although it doesn't apply to anything you've said, you might also want to try -smp 6. There's a problem with prime thread counts such as -smp 11 or -smp 23, and to a lesser extent with smaller primes like -smp 7. Some folks have found that -smp 6 runs just as fast as -smp 7, and your webserver activities might benefit, too.

YMMV.

