GROMACS - new lambda state with a Gibbs move

handyj
Posts: 3
Joined: Tue Apr 07, 2020 10:52 pm

GROMACS - new lambda state with a Gibbs move

Post by handyj »

I'm using the latest version of the FAHClient and I'm now seeing an error in GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown, on Linux.

This is not the other error related to primes. I changed my config to have 2 slots with 18 CPUs each.

Code:

21:04:35:<config>
21:04:35:  <!-- Folding Slot Configuration -->
21:04:35:  <cpus v='18'/>
21:04:35:  <gpu v='false'/>
21:04:35:
21:04:35:  <!-- Slot Control -->
21:04:35:  <power v='LIGHT'/>
21:04:35:
21:04:35:  <!-- User Information -->
21:04:35:  <passkey v='********************************'/>
21:04:35:  <user v='handyj'/>
21:04:35:
21:04:35:  <!-- Folding Slots -->
21:04:35:  <slot id='0' type='CPU'/>
21:04:35:  <slot id='1' type='CPU'/>
21:04:35:</config>
This is something different I can't seem to find on the forum:

Code:

23:57:52:WU01:FS00:FahCore 0xa7 started
23:57:53:WU01:FS00:0xa7:*********************** Log Started 2020-04-16T23:57:52Z ***********************
23:57:53:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
23:57:53:WU01:FS00:0xa7:       Type: 0xa7
23:57:53:WU01:FS00:0xa7:       Core: Gromacs
23:57:53:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 704 -lifeline 9155 -checkpoint 15 -np
23:57:53:WU01:FS00:0xa7:             18
23:57:53:WU01:FS00:0xa7:************************************ CBang *************************************
23:57:53:WU01:FS00:0xa7:       Date: Nov 5 2019
23:57:53:WU01:FS00:0xa7:       Time: 06:06:57
23:57:53:WU01:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
23:57:53:WU01:FS00:0xa7:     Branch: master
23:57:53:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
23:57:53:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
23:57:53:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
23:57:53:WU01:FS00:0xa7:       Bits: 64
23:57:53:WU01:FS00:0xa7:       Mode: Release
23:57:53:WU01:FS00:0xa7:************************************ System ************************************
23:57:53:WU01:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
23:57:53:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
23:57:53:WU01:FS00:0xa7:       CPUs: 56
23:57:53:WU01:FS00:0xa7:     Memory: 62.77GiB
23:57:53:WU01:FS00:0xa7:Free Memory: 54.84GiB
23:57:53:WU01:FS00:0xa7:    Threads: POSIX_THREADS
23:57:53:WU01:FS00:0xa7: OS Version: 5.5
23:57:53:WU01:FS00:0xa7:Has Battery: false
23:57:53:WU01:FS00:0xa7: On Battery: false
23:57:53:WU01:FS00:0xa7: UTC Offset: -6
23:57:53:WU01:FS00:0xa7:        PID: 9159
23:57:53:WU01:FS00:0xa7:        CWD: /home/xxxxxx/work
23:57:53:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
23:57:53:WU01:FS00:0xa7:    Version: 0.0.18
23:57:53:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:57:53:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
23:57:53:WU01:FS00:0xa7:   Homepage: hXXX/s://foldingathome.org/
23:57:53:WU01:FS00:0xa7:       Date: Nov 5 2019
23:57:53:WU01:FS00:0xa7:       Time: 06:13:26
23:57:53:WU01:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
23:57:53:WU01:FS00:0xa7:     Branch: master
23:57:53:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
23:57:53:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
23:57:53:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
23:57:53:WU01:FS00:0xa7:       Bits: 64
23:57:53:WU01:FS00:0xa7:       Mode: Release
23:57:53:WU01:FS00:0xa7:************************************ Build *************************************
23:57:53:WU01:FS00:0xa7:       SIMD: avx_256
23:57:53:WU01:FS00:0xa7:********************************************************************************
23:57:53:WU01:FS00:0xa7:Project: 14379 (Run 651, Clone 2, Gen 0)
23:57:53:WU01:FS00:0xa7:Unit: 0x00000003455e42075e9330016bdc0c92
23:57:53:WU01:FS00:0xa7:Digital signatures verified
23:57:53:WU01:FS00:0xa7:Calling: mdrun -s frame0.tpr -o frame0.trr -cpt 15 -nt 18
23:57:53:WU01:FS00:0xa7:Steps: first=0 total=250000
23:57:54:WU01:FS00:0xa7:Completed 1 out of 250000 steps (0%)
23:57:56:WU00:FS01:0xa7:Completed 242500 out of 250000 steps (97%)
23:57:59:WU01:FS00:0xa7:ERROR:
23:57:59:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
23:57:59:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
23:57:59:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/expanded.c, line: 946
23:57:59:WU01:FS00:0xa7:ERROR:
23:57:59:WU01:FS00:0xa7:ERROR:Fatal error:
23:57:59:WU01:FS00:0xa7:ERROR:Something wrong in choosing new lambda state with a Gibbs move -- probably underflow in weight determination.
23:57:59:WU01:FS00:0xa7:ERROR:Denominator is:   0 1.0000000000e+00
23:57:59:WU01:FS00:0xa7:ERROR:  i                dE        numerator          weights
23:57:59:WU01:FS00:0xa7:ERROR:  0  0.0000000000e+00 1.0000000000e+00 0.0000000000e+00
23:57:59:WU01:FS00:0xa7:ERROR:  1 -3.6481990814e+01 1.4324276664e-16 1.0000000000e+01
23:57:59:WU01:FS00:0xa7:ERROR:  2 -7.2964065552e+01 2.0516768288e-32 1.0000000000e+01
23:57:59:WU01:FS00:0xa7:ERROR:  3 -1.0944599152e+02 2.9390692441e-48 1.0000000000e+01
23:57:59:WU01:FS00:0xa7:ERROR:  4 -1.4592778015e+02 4.2108553591e-64 1.0000000000e+01
23:57:59:WU01:FS00:0xa7:ERROR:  5 -1.8240985107e+02 6.0312625402e-80 1.0000000000e+01
23:57:59:WU01:FS00:0xa7:ERROR:  6 -2.1889172363e+02 8.6403690378e-96 1.0000000000e+01
23:58:00:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)

I'm trying to report results for Project 14379 (Run 651, Clone 2, Gen 0), but stopping and restarting the client doesn't do anything.

--handyj
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GROMACS - new lambda state with a Gibbs move

Post by Joe_H »

I will pass on the information to the researcher. Is there more about this WU in the log that you could post?

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
handyj
Posts: 3
Joined: Tue Apr 07, 2020 10:52 pm

Re: GROMACS - new lambda state with a Gibbs move

Post by handyj »

Such as?

It repeatedly tries to upload but keeps getting the same error.

How about this?

Code:

01:56:10:WU01:FS00:0xa7:Project: 14379 (Run 651, Clone 2, Gen 0)
01:56:10:WU01:FS00:0xa7:Unit: 0x00000003455e42075e9330016bdc0c92
01:56:10:WU01:FS00:0xa7:Digital signatures verified
01:56:10:WU01:FS00:0xa7:Calling: mdrun -s frame0.tpr -o frame0.trr -cpt 15 -nt 18
01:56:10:WU01:FS00:0xa7:Steps: first=0 total=250000
01:56:12:WU01:FS00:0xa7:Completed 1 out of 250000 steps (0%)
01:56:16:WU01:FS00:0xa7:ERROR:
01:56:16:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
01:56:16:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
01:56:16:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/expanded.c, line: 946
01:56:16:WU01:FS00:0xa7:ERROR:
01:56:16:WU01:FS00:0xa7:ERROR:Fatal error:
01:56:16:WU01:FS00:0xa7:ERROR:Something wrong in choosing new lambda state with a Gibbs move -- probably underflow in weight determination.
01:56:16:WU01:FS00:0xa7:ERROR:Denominator is:   0 1.0000000000e+00
01:56:16:WU01:FS00:0xa7:ERROR:  i                dE        numerator          weights
01:56:16:WU01:FS00:0xa7:ERROR:  0  0.0000000000e+00 1.0000000000e+00 0.0000000000e+00
01:56:16:WU01:FS00:0xa7:ERROR:  1 -3.6458992004e+01 1.4657535569e-16 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  2 -7.2918243408e+01 2.1478762593e-32 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  3 -1.0937696838e+02 3.1490980545e-48 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  4 -1.4583586121e+02 4.6162595020e-64 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  5 -1.8229490662e+02 6.7659374348e-80 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  6 -2.1875386047e+02 9.9175751788e-96 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  7 -2.5521287537e+021.4536388378e-111 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  8 -2.9167184448e+022.1307250648e-127 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  9 -3.2813073730e+023.1234276153e-143 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 10 -3.6458959961e+024.5787689632e-159 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 11 -4.0104867554e+026.7107836789e-175 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 12 -4.3750750732e+029.8379311271e-191 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 13 -4.7396652222e+021.4419652498e-206 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 14 -5.1042547607e+022.1136463215e-222 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 15 -5.4688433838e+023.0984864604e-238 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 16 -5.8334332275e+024.5416522001e-254 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 17 -6.1980236816e+026.6565873568e-270 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 18 -6.5626123047e+029.7581821460e-286 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 19 -6.9272015381e+021.4304072338e-301 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 20 -7.2917913818e+022.0966406898e-317 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 21 -7.2505981445e+021.2898034470e-315 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 22 -7.2556091309e+027.8144633973e-316 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 23 -7.2908026123e+022.3145453785e-317 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 24 -7.3463104248e+028.9905125574e-320 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 25 -7.3796032715e+023.2213080109e-321 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 26 -7.4157794189e+028.3991159793e-323 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 27 -7.4543652344e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 28 -7.4949688721e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 29 -7.5201599121e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 30 -7.5458972168e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 31 -7.5721246338e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 32 -7.5987927246e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 33 -7.6258557129e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 34 -7.6532727051e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 35 -7.6810070801e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 36 -7.7090240479e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 37 -7.7467669678e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 38 -7.7848931885e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 39 -7.8620697021e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:
01:56:16:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
01:56:16:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
01:56:16:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
01:56:16:WU01:FS00:0xa7:ERROR:
01:56:16:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
01:56:16:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
01:56:16:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/expanded.c, line: 946
01:56:16:WU01:FS00:0xa7:ERROR:
01:56:17:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
01:57:10:WU01:FS00:Starting
01:57:10:WU01:FS00:Removing old file './work/01/logfile_01-20200417-012509.txt'/
01:57:10:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /home/user/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 704 -lifeline 9718 -checkpoint 15 -np 18
Joe_H
Site Admin
Posts: 7868
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GROMACS - new lambda state with a Gibbs move

Post by Joe_H »

That is not a repeated attempt to upload, but repeated starts. It appears you have run into a bug with the Linux A7 folding core. After several failed attempts to start, it should fail out the WU and send in a report. The bug is that it doesn't, and instead keeps restarting under some error situations. About all you can do is pause the folding slot that has that WU and delete the WU.
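
To tell repeated core starts apart from upload attempts, one option is to count the "FahCore returned" lines per work unit in the client log. The following is only a rough sketch: the function name `count_core_returns` and the regular expression are assumptions based on the log lines quoted in this thread, not part of any FAH tool.

```python
import re
from collections import Counter

# Pattern modeled on lines quoted in this thread, for example:
# "23:58:00:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)"
RETURN_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):FahCore returned: (\w+)"
)

def count_core_returns(log_lines):
    """Count core exit results, keyed by (work unit, slot, result)."""
    counts = Counter()
    for line in log_lines:
        match = RETURN_LINE.match(line.strip())
        if match:
            counts[match.groups()] += 1
    return counts

# Lines copied from the logs posted above.
sample_log = [
    "23:57:53:WU01:FS00:0xa7:Steps: first=0 total=250000",
    "23:58:00:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    "01:56:17:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    "01:57:10:WU01:FS00:Starting",
]
returns = count_core_returns(sample_log)
for (wu, slot, result), n in returns.items():
    print(f"{wu} {slot}: {result} x{n}")
```

A count that keeps growing for the same WU and slot, with no upload lines in between, suggests the restart loop described here rather than a failed upload.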

Pause the folding slot, go to the work directory shown as CWD near the beginning of the log, and delete the folder named '01', which corresponds to WU01 shown here in the log. Then, when you start the slot again, the client will detect the missing work files and send a report before requesting a new WU for the slot.
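
For anyone who prefers to script the deletion step, here is a minimal sketch. It assumes the slot has already been paused by hand and that the work directory is the CWD path from your own log. The helper name `remove_wu_folder` is hypothetical, and the demo below runs against a throwaway temporary directory rather than a real FAH work directory.

```python
import re
import shutil
import tempfile
from pathlib import Path

def remove_wu_folder(work_dir, wu_id):
    """Delete the folder for one work unit (e.g. '01' for WU01).

    Returns True if a folder was removed, False if it was not there.
    Pause the folding slot before calling this.
    """
    # Only accept two-digit IDs so a typo cannot delete something else.
    if not re.fullmatch(r"\d{2}", wu_id):
        raise ValueError(f"unexpected work unit id: {wu_id!r}")
    target = Path(work_dir) / wu_id
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True

# Demo on a temporary directory standing in for the real work directory.
with tempfile.TemporaryDirectory() as work_dir:
    wu_folder = Path(work_dir) / "01"
    wu_folder.mkdir()
    (wu_folder / "frame0.tpr").write_text("placeholder")
    removed_first = remove_wu_folder(work_dir, "01")
    removed_again = remove_wu_folder(work_dir, "01")
```

In real use, `work_dir` would be the CWD value from the System section of the log and `wu_id` the two digits from the WU label.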

handyj
Posts: 3
Joined: Tue Apr 07, 2020 10:52 pm

Re: GROMACS - new lambda state with a Gibbs move

Post by handyj »

Joe_H,

This worked, thanks. I may have misspoken about the upload; I didn't understand the process.

It may not seem like much but having just that one sentence telling me how to fix it was great. It's not often on forums I see help like that. You gave me just enough info to let my brain do the rest.

Gold star to you.

-handyj
vvoelz
Pande Group Member
Posts: 539
Joined: Sun Dec 02, 2007 8:07 pm
Location: Temple University, Philadelphia PA

Re: GROMACS - new lambda state with a Gibbs move

Post by vvoelz »

Hey handyj and Joe_H:

The verdict is still out on the nature of the error here, but my guess is that it has to do with numerical precision issues in calculating the Monte Carlo probabilities of accepting or rejecting changes in the coupling parameter (which controls the scaling of non-bonded interactions). This is particular to the expanded ensemble code, which we haven’t used much before this most recent work.
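
As a rough illustration of the kind of precision issue guessed at above, the sketch below exponentiates a few dE values copied from the error table in this thread. In double precision, exp(x) becomes exactly zero once x falls below roughly -745, which matches the point where the "numerator" column in the log turns into 0.0000000000e+00. This is not the GROMACS code, and whether it is the actual failure in expanded.c has not been established here. The function names and the shift of 800 are assumptions made for the example.

```python
import math

# dE values copied from the error table above (states 0, 1, 26 and 27).
log_terms = [0.0, -36.458992004, -741.57794189, -745.43652344]

def naive_weights(log_values):
    """Exponentiate directly; very negative inputs underflow to 0.0."""
    return [math.exp(v) for v in log_values]

def stable_probabilities(log_values):
    """Normalize after subtracting the maximum, so the largest term is exp(0)."""
    top = max(log_values)
    shifted = [math.exp(v - top) for v in log_values]
    total = sum(shifted)
    return [s / total for s in shifted]

naive = naive_weights(log_terms)
print(naive)  # the last entry underflows to exactly 0.0

# Hypothetical worst case: every term sits far below the underflow limit,
# so the naive sum (the denominator) collapses to zero.
shifted_down = [v - 800.0 for v in log_terms]
naive_total = sum(naive_weights(shifted_down))
probs = stable_probabilities(shifted_down)
print(naive_total, probs[0])
```

Subtracting the maximum before exponentiating is a common guard against this: the probabilities are mathematically unchanged, but the denominator can no longer underflow to zero.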

IN ANY CASE, inspection of the returns so far reveals that the other CLONEs are just fine, so it really is likely a fluke. I will stop that particular run/clone so it won't be sent out again.