GROMACS - new lambda state with a Gibbs move

Postby handyj » Fri Apr 17, 2020 1:12 am

I'm using the latest version of the FAHClient and I'm now seeing an error in GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown, for Linux.

This is not the other error related to prime CPU counts. I changed my config to have 2 slots with 18 CPUs each.
Code: Select all
21:04:35:<config>
21:04:35:  <!-- Folding Slot Configuration -->
21:04:35:  <cpus v='18'/>
21:04:35:  <gpu v='false'/>
21:04:35:
21:04:35:  <!-- Slot Control -->
21:04:35:  <power v='LIGHT'/>
21:04:35:
21:04:35:  <!-- User Information -->
21:04:35:  <passkey v='********************************'/>
21:04:35:  <user v='handyj'/>
21:04:35:
21:04:35:  <!-- Folding Slots -->
21:04:35:  <slot id='0' type='CPU'/>
21:04:35:  <slot id='1' type='CPU'/>
21:04:35:</config>


This is something different I can't seem to find on the forum:

Code: Select all
23:57:52:WU01:FS00:FahCore 0xa7 started
23:57:53:WU01:FS00:0xa7:*********************** Log Started 2020-04-16T23:57:52Z ***********************
23:57:53:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
23:57:53:WU01:FS00:0xa7:       Type: 0xa7
23:57:53:WU01:FS00:0xa7:       Core: Gromacs
23:57:53:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 704 -lifeline 9155 -checkpoint 15 -np
23:57:53:WU01:FS00:0xa7:             18
23:57:53:WU01:FS00:0xa7:************************************ CBang *************************************
23:57:53:WU01:FS00:0xa7:       Date: Nov 5 2019
23:57:53:WU01:FS00:0xa7:       Time: 06:06:57
23:57:53:WU01:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
23:57:53:WU01:FS00:0xa7:     Branch: master
23:57:53:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
23:57:53:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
23:57:53:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
23:57:53:WU01:FS00:0xa7:       Bits: 64
23:57:53:WU01:FS00:0xa7:       Mode: Release
23:57:53:WU01:FS00:0xa7:************************************ System ************************************
23:57:53:WU01:FS00:0xa7:        CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
23:57:53:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 63 Stepping 2
23:57:53:WU01:FS00:0xa7:       CPUs: 56
23:57:53:WU01:FS00:0xa7:     Memory: 62.77GiB
23:57:53:WU01:FS00:0xa7:Free Memory: 54.84GiB
23:57:53:WU01:FS00:0xa7:    Threads: POSIX_THREADS
23:57:53:WU01:FS00:0xa7: OS Version: 5.5
23:57:53:WU01:FS00:0xa7:Has Battery: false
23:57:53:WU01:FS00:0xa7: On Battery: false
23:57:53:WU01:FS00:0xa7: UTC Offset: -6
23:57:53:WU01:FS00:0xa7:        PID: 9159
23:57:53:WU01:FS00:0xa7:        CWD: /home/xxxxxx/work
23:57:53:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
23:57:53:WU01:FS00:0xa7:    Version: 0.0.18
23:57:53:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:57:53:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
23:57:53:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
23:57:53:WU01:FS00:0xa7:       Date: Nov 5 2019
23:57:53:WU01:FS00:0xa7:       Time: 06:13:26
23:57:53:WU01:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
23:57:53:WU01:FS00:0xa7:     Branch: master
23:57:53:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
23:57:53:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
23:57:53:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
23:57:53:WU01:FS00:0xa7:       Bits: 64
23:57:53:WU01:FS00:0xa7:       Mode: Release
23:57:53:WU01:FS00:0xa7:************************************ Build *************************************
23:57:53:WU01:FS00:0xa7:       SIMD: avx_256
23:57:53:WU01:FS00:0xa7:********************************************************************************
23:57:53:WU01:FS00:0xa7:Project: 14379 (Run 651, Clone 2, Gen 0)
23:57:53:WU01:FS00:0xa7:Unit: 0x00000003455e42075e9330016bdc0c92
23:57:53:WU01:FS00:0xa7:Digital signatures verified
23:57:53:WU01:FS00:0xa7:Calling: mdrun -s frame0.tpr -o frame0.trr -cpt 15 -nt 18
23:57:53:WU01:FS00:0xa7:Steps: first=0 total=250000
23:57:54:WU01:FS00:0xa7:Completed 1 out of 250000 steps (0%)
23:57:56:WU00:FS01:0xa7:Completed 242500 out of 250000 steps (97%)
23:57:59:WU01:FS00:0xa7:ERROR:
23:57:59:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
23:57:59:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
23:57:59:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/expanded.c, line: 946
23:57:59:WU01:FS00:0xa7:ERROR:
23:57:59:WU01:FS00:0xa7:ERROR:Fatal error:
23:57:59:WU01:FS00:0xa7:ERROR:Something wrong in choosing new lambda state with a Gibbs move -- probably underflow in weight determination.
23:57:59:WU01:FS00:0xa7:ERROR:Denominator is:   0 1.0000000000e+00
23:57:59:WU01:FS00:0xa7:ERROR:  i                dE        numerator          weights
23:57:59:WU01:FS00:0xa7:ERROR:  0  0.0000000000e+00 1.0000000000e+00 0.0000000000e+00
23:57:59:WU01:FS00:0xa7:ERROR:  1 -3.6481990814e+01 1.4324276664e-16 1.0000000000e+01
23:57:59:WU01:FS00:0xa7:ERROR:  2 -7.2964065552e+01 2.0516768288e-32 1.0000000000e+01
23:57:59:WU01:FS00:0xa7:ERROR:  3 -1.0944599152e+02 2.9390692441e-48 1.0000000000e+01
23:57:59:WU01:FS00:0xa7:ERROR:  4 -1.4592778015e+02 4.2108553591e-64 1.0000000000e+01
23:57:59:WU01:FS00:0xa7:ERROR:  5 -1.8240985107e+02 6.0312625402e-80 1.0000000000e+01
23:57:59:WU01:FS00:0xa7:ERROR:  6 -2.1889172363e+02 8.6403690378e-96 1.0000000000e+01
23:58:00:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)


I'm trying to report results for Project: 14379 (Run 651, Clone 2, Gen 0), but stopping and restarting the client doesn't change anything.

--handyj
handyj
 
Posts: 3
Joined: Tue Apr 07, 2020 11:52 pm

Re: GROMACS - new lambda state with a Gibbs move

Postby Joe_H » Fri Apr 17, 2020 2:11 am

I will pass on the information to the researcher. Is there more about this WU in the log that you could post?

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 6426
Joined: Tue Apr 21, 2009 5:41 pm
Location: W. MA

Re: GROMACS - new lambda state with a Gibbs move

Postby handyj » Fri Apr 17, 2020 2:52 am

Such as?

It repeatedly tries to upload but keeps getting the same error.

How about this?

Code: Select all
01:56:10:WU01:FS00:0xa7:Project: 14379 (Run 651, Clone 2, Gen 0)
01:56:10:WU01:FS00:0xa7:Unit: 0x00000003455e42075e9330016bdc0c92
01:56:10:WU01:FS00:0xa7:Digital signatures verified
01:56:10:WU01:FS00:0xa7:Calling: mdrun -s frame0.tpr -o frame0.trr -cpt 15 -nt 18
01:56:10:WU01:FS00:0xa7:Steps: first=0 total=250000
01:56:12:WU01:FS00:0xa7:Completed 1 out of 250000 steps (0%)
01:56:16:WU01:FS00:0xa7:ERROR:
01:56:16:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
01:56:16:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
01:56:16:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/expanded.c, line: 946
01:56:16:WU01:FS00:0xa7:ERROR:
01:56:16:WU01:FS00:0xa7:ERROR:Fatal error:
01:56:16:WU01:FS00:0xa7:ERROR:Something wrong in choosing new lambda state with a Gibbs move -- probably underflow in weight determination.
01:56:16:WU01:FS00:0xa7:ERROR:Denominator is:   0 1.0000000000e+00
01:56:16:WU01:FS00:0xa7:ERROR:  i                dE        numerator          weights
01:56:16:WU01:FS00:0xa7:ERROR:  0  0.0000000000e+00 1.0000000000e+00 0.0000000000e+00
01:56:16:WU01:FS00:0xa7:ERROR:  1 -3.6458992004e+01 1.4657535569e-16 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  2 -7.2918243408e+01 2.1478762593e-32 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  3 -1.0937696838e+02 3.1490980545e-48 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  4 -1.4583586121e+02 4.6162595020e-64 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  5 -1.8229490662e+02 6.7659374348e-80 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  6 -2.1875386047e+02 9.9175751788e-96 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  7 -2.5521287537e+021.4536388378e-111 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  8 -2.9167184448e+022.1307250648e-127 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:  9 -3.2813073730e+023.1234276153e-143 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 10 -3.6458959961e+024.5787689632e-159 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 11 -4.0104867554e+026.7107836789e-175 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 12 -4.3750750732e+029.8379311271e-191 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 13 -4.7396652222e+021.4419652498e-206 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 14 -5.1042547607e+022.1136463215e-222 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 15 -5.4688433838e+023.0984864604e-238 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 16 -5.8334332275e+024.5416522001e-254 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 17 -6.1980236816e+026.6565873568e-270 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 18 -6.5626123047e+029.7581821460e-286 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 19 -6.9272015381e+021.4304072338e-301 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 20 -7.2917913818e+022.0966406898e-317 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 21 -7.2505981445e+021.2898034470e-315 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 22 -7.2556091309e+027.8144633973e-316 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 23 -7.2908026123e+022.3145453785e-317 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 24 -7.3463104248e+028.9905125574e-320 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 25 -7.3796032715e+023.2213080109e-321 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 26 -7.4157794189e+028.3991159793e-323 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 27 -7.4543652344e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 28 -7.4949688721e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 29 -7.5201599121e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 30 -7.5458972168e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 31 -7.5721246338e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 32 -7.5987927246e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 33 -7.6258557129e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 34 -7.6532727051e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 35 -7.6810070801e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 36 -7.7090240479e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 37 -7.7467669678e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 38 -7.7848931885e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR: 39 -7.8620697021e+02 0.0000000000e+00 1.0000000000e+01
01:56:16:WU01:FS00:0xa7:ERROR:
01:56:16:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
01:56:16:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
01:56:16:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
01:56:16:WU01:FS00:0xa7:ERROR:
01:56:16:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
01:56:16:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
01:56:16:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/expanded.c, line: 946
01:56:16:WU01:FS00:0xa7:ERROR:
01:56:17:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
01:57:10:WU01:FS00:Starting
01:57:10:WU01:FS00:Removing old file './work/01/logfile_01-20200417-012509.txt'/
01:57:10:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /home/user/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 704 -lifeline 9718 -checkpoint 15 -np 18

Re: GROMACS - new lambda state with a Gibbs move

Postby Joe_H » Fri Apr 17, 2020 4:07 am

That is not a repeated attempt to upload, but repeated starts. It appears you have run into a bug with the Linux A7 folding core. After several failed attempts to start, it should fail out the WU and send in a report. The bug is that it doesn't, and instead keeps restarting under some error conditions. About all you can do is pause the folding slot that has that WU and delete it.
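For anyone who wants to check a log for this restart loop, here is a minimal Python sketch. It assumes the client log lines look like the ones posted in this thread (`HH:MM:SS:WUxx:FSxx:FahCore returned: ...`); the function names and the threshold of 2 are made up for illustration and are not part of FAHClient.

```python
import re
from collections import Counter

# Matches client lines such as:
#   23:58:00:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
RETURN_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):FahCore returned: (\w+)"
)

def count_core_returns(lines):
    """Count how often each (work unit, slot, return status) appears."""
    counts = Counter()
    for line in lines:
        match = RETURN_RE.match(line.strip())
        if match:
            counts[match.groups()] += 1
    return counts

def looping_units(lines, threshold=2):
    """Return (WU, slot, status) keys that were INTERRUPTED at least
    `threshold` times, which hints at the restart loop described above."""
    counts = count_core_returns(lines)
    return sorted(
        key for key, n in counts.items()
        if key[2] == "INTERRUPTED" and n >= threshold
    )

if __name__ == "__main__":
    sample = [
        "23:57:56:WU00:FS01:0xa7:Completed 242500 out of 250000 steps (97%)",
        "23:58:00:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
        "01:56:17:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    ]
    print(looping_units(sample))
```

Run against the two excerpts posted here, this would flag WU01 on FS00, since the core returned INTERRUPTED for it in both.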

Pause the folding slot, go to the work directory shown as CWD in the System section near the beginning of the log, and delete the folder named '01', which corresponds to WU01 in the log. When you start the slot again, the client will detect the missing work files and send a report before requesting a new WU for the slot.
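If you would rather script the delete step than do it by hand, a rough Python sketch follows. The helper name `remove_work_unit` is hypothetical, the work directory must be whatever your own log shows as CWD, and the slot still has to be paused in the client first. The demo at the bottom only touches a throwaway temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

def remove_work_unit(work_dir, wu_number):
    """Delete <work_dir>/<NN> for a stuck work unit (e.g. '01' for WU01).

    Returns True if the folder existed and was removed, False otherwise.
    Pause the folding slot in the client before calling this.
    """
    wu_dir = Path(work_dir) / f"{wu_number:02d}"
    if not wu_dir.is_dir():
        return False
    shutil.rmtree(wu_dir)
    return True

if __name__ == "__main__":
    # Demo on a temporary directory, NOT a real FAHClient work directory.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "01").mkdir()
        (Path(tmp) / "01" / "frame0.tpr").write_text("placeholder")
        print(remove_work_unit(tmp, 1))  # folder present, gets removed
        print(remove_work_unit(tmp, 1))  # folder already gone
```

For real use you would pass the CWD path from your own log and the number from the `WUxx` tag in place of the temporary directory.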

Re: GROMACS - new lambda state with a Gibbs move

Postby handyj » Fri Apr 17, 2020 3:20 pm

Joe_H,

This worked, thanks. I may have misspoken about the upload; I didn't understand the process.

It may not seem like much but having just that one sentence telling me how to fix it was great. It's not often on forums I see help like that. You gave me just enough info to let my brain do the rest.

Gold star to you.

-handyj

Re: GROMACS - new lambda state with a Gibbs move

Postby vvoelz » Sun Apr 19, 2020 11:43 pm

Hey handyj and Joe_H:

The jury is still out on the nature of the error here, but my guess is that it has to do with numerical precision issues in calculating the Monte Carlo probabilities of accepting or rejecting changes in the coupling parameter (which controls the scaling of non-bonded interactions). This is particular to the expanded ensemble code, which we haven’t used much before this most recent work.

In any case, inspection of the returns so far shows that the other CLONEs are just fine, so it really is likely a fluke. I will stop that particular run/clone so it won't be sent out again.
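To illustrate the kind of precision issue guessed at above, here is a small Python sketch. It is not the code in GROMACS's expanded.c, and whether this is the actual cause is unconfirmed. It only assumes, based on the posted table, that the numerator column is consistent with exp(dE), and it reuses four dE values from that table. The function names are made up for the example.

```python
import math

# dE values copied from rows 0, 1, 26 and 27 of the error table in this thread.
dE = [0.0, -3.6458992004e+01, -7.4157794189e+02, -7.4543652344e+02]

def naive_weights(energies):
    """Normalize exp(dE) directly; very negative dE underflows to 0.0."""
    numerators = [math.exp(e) for e in energies]
    denominator = sum(numerators)
    return [n / denominator for n in numerators]

def log_weights(energies):
    """Return log of the normalized weights using the log-sum-exp trick,
    so states with tiny probability keep a finite, ordered log value."""
    largest = max(energies)
    log_denominator = largest + math.log(
        sum(math.exp(e - largest) for e in energies)
    )
    return [e - log_denominator for e in energies]

if __name__ == "__main__":
    for i, (w, lw) in enumerate(zip(naive_weights(dE), log_weights(dE))):
        print(f"{i}  dE={dE[i]: .6e}  weight={w: .6e}  log_weight={lw: .6e}")
```

In double precision, the row with dE near -745.4 comes out as exactly zero in the direct calculation, matching the zeros in the posted table, while its log-space value stays finite. The true probabilities of those states are still vanishingly small either way; the sketch only shows where the information is lost numerically.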
vvoelz
Pande Group Member
 
Posts: 485
Joined: Sun Dec 02, 2007 9:07 pm
Location: Temple University, Philadelphia PA

