FAHCore seg faulting on pop os

FAH provides a V7 client installer for Debian / Mint / Ubuntu / RedHat / CentOS / Fedora. Installation on other distros may or may not be easy but if you can offer help to others, they would appreciate it.

Post by cujomalainey » Thu Mar 26, 2020 3:23 am

I had been running jobs for about a week when I noticed my machine hadn't been under load for about a day. I looked in the log, and it seems there is some sort of issue in one of the libraries. Do I have a bad job in my queue, or is there something wrong with the client? Thanks.

dmesg output
Code:
[13471.708354] FahCore_a7[15763]: segfault at 50 ip 000000000120aa3d sp 00007ffc5bac4cf0 error 4 in FahCore_a7[406000+10cc000]
[13471.708360] Code: 73 08 0f 84 83 00 00 00 48 c7 44 24 30 00 00 00 00 4c 8d 74 24 20 4c 89 74 24 08 f3 0f 7e 64 24 08 66 0f 6c e4 0f 29 64 24 20 <48> 8b 16 4c 39 f2 0f 84 07 01 00 00 4c 39 f6 0f 84 fe 00 00 00 4c
[13531.741271] FahCore_a7[24861]: segfault at 50 ip 000000000120aa3d sp 00007ffce9fff140 error 4 in FahCore_a7[406000+10cc000]
[13531.741277] Code: 73 08 0f 84 83 00 00 00 48 c7 44 24 30 00 00 00 00 4c 8d 74 24 20 4c 89 74 24 08 f3 0f 7e 64 24 08 66 0f 6c e4 0f 29 64 24 20 <48> 8b 16 4c 39 f2 0f 84 07 01 00 00 4c 39 f6 0f 84 fe 00 00 00 4c
[13591.778074] FahCore_a7[1683]: segfault at 50 ip 000000000120aa3d sp 00007ffc3463cff0 error 4 in FahCore_a7[406000+10cc000]
[13591.778112] Code: 73 08 0f 84 83 00 00 00 48 c7 44 24 30 00 00 00 00 4c 8d 74 24 20 4c 89 74 24 08 f3 0f 7e 64 24 08 66 0f 6c e4 0f 29 64 24 20 <48> 8b 16 4c 39 f2 0f 84 07 01 00 00 4c 39 f6 0f 84 fe 00 00 00 4c
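
For anyone reading the dmesg lines above, here is a minimal sketch of how they can be decoded. It assumes the usual x86 Linux layout of the segfault message (faulting address, instruction pointer, page-fault error code in hex, then the mapped object as `[base+size]`); `decode_segfault` is a hypothetical helper name, not part of any FAH or kernel tool.

```python
import re

# Matches the kernel segfault report format seen in the dmesg output above.
SEGFAULT_RE = re.compile(
    r"(?P<name>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the decoded fields of one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)  # x86 page-fault error code, printed in hex
    return {
        "process": m["name"],
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),  # bit 0: 0 = page not mapped
        "access": "write" if err & 2 else "read",  # bit 1
        "user_mode": bool(err & 4),  # bit 2
    }


if __name__ == "__main__":
    line = ("[13471.708354] FahCore_a7[15763]: segfault at 50 ip 000000000120aa3d "
            "sp 00007ffc5bac4cf0 error 4 in FahCore_a7[406000+10cc000]")
    info = decode_segfault(line)
    print(info["process"], hex(info["fault_address"]), hex(info["offset"]),
          info["access"], info["user_mode"])
```

Read this way, each of the three reports is a user-mode read of the unmapped address 0x50 at the same offset (0xe04a3d) into FahCore_a7, which looks like a near-null pointer dereference repeated on every restart rather than three unrelated faults.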


Code:
02:20:22:WU01:FS00:0xa7:************************************ Build *************************************
02:20:22:WU01:FS00:0xa7:       SIMD: avx_256
02:20:22:WU01:FS00:0xa7:********************************************************************************
02:20:22:WU01:FS00:0xa7:Project: 13833 (Run 0, Clone 2650, Gen 2)                                                                                                                             
02:20:22:WU01:FS00:0xa7:Unit: 0x0000000480fccb095e6e56038838e939
02:20:22:WU01:FS00:0xa7:Reading tar file core.xml
02:20:22:WU01:FS00:0xa7:Reading tar file frame2.tpr
02:20:22:WU01:FS00:0xa7:Digital signatures verified
02:20:22:WU01:FS00:0xa7:Calling: mdrun -s frame2.tpr -o frame2.trr -x frame2.xtc -cpt 15 -nt 15         
02:20:22:WU01:FS00:0xa7:Steps: first=500000 total=250000                                                                                                                                       
02:20:22:WU01:FS00:0xa7:ERROR:                                                                                                                                                                 
02:20:22:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
02:20:22:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown                                                                                                       
02:20:22:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
02:20:22:WU01:FS00:0xa7:ERROR:                                                                                                                                                                 
02:20:22:WU01:FS00:0xa7:ERROR:Fatal error:                                                     
02:20:22:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 15 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
02:20:22:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
02:20:22:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
02:20:22:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
02:20:22:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
02:20:22:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
02:20:27:WU01:FS00:0xa7:WARNING:Unexpected exit() call
02:20:27:WU01:FS00:0xa7:WARNING:Unexpected exit from science code
02:20:27:WU01:FS00:0xa7:Saving result file ../logfile_01.txt                                                                                                                                   
02:20:27:WU01:FS00:0xa7:Saving result file md.log                           
02:20:27:WU01:FS00:0xa7:Saving result file science.log                                                                                                                                         
02:20:27:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)                                 
02:21:22:WU01:FS00:Starting                                                                   
02:21:22:WU01:FS00:Removing old file './work/01/logfile_01-20200326-014921.txt'               
02:21:22:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 705 -lifeline 1711 -checkpoint 15 -np 15
02:21:22:WU01:FS00:Started FahCore on PID 22010                                                                                                                                               
02:21:22:WU01:FS00:Core PID:22018                                                             
02:21:22:WU01:FS00:FahCore 0xa7 started
02:21:22:WU01:FS00:0xa7:*********************** Log Started 2020-03-26T02:21:22Z ***********************
02:21:22:WU01:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
02:21:22:WU01:FS00:0xa7:       Type: 0xa7                                                                                                                                                     
02:21:22:WU01:FS00:0xa7:       Core: Gromacs
02:21:22:WU01:FS00:0xa7:       Args: -dir 01 -suffix 01 -version 705 -lifeline 22010 -checkpoint 15 -np
02:21:22:WU01:FS00:0xa7:             15                                                                                                                                                       
02:21:22:WU01:FS00:0xa7:************************************ CBang *************************************                                                                                       
02:21:22:WU01:FS00:0xa7:       Date: Nov 5 2019                                           
02:21:22:WU01:FS00:0xa7:       Time: 06:06:57                                                                                                                                                 
02:21:22:WU01:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9       
02:21:22:WU01:FS00:0xa7:     Branch: master                                                   
02:21:22:WU01:FS00:0xa7:   Compiler: GNU 8.3.0                                                 
02:21:22:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
02:21:22:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64 
02:21:22:WU01:FS00:0xa7:       Bits: 64                                                       
02:21:22:WU01:FS00:0xa7:       Mode: Release                                                   
02:21:22:WU01:FS00:0xa7:************************************ System ************************************
02:21:22:WU01:FS00:0xa7:        CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
02:21:22:WU01:FS00:0xa7:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 12
02:21:22:WU01:FS00:0xa7:       CPUs: 16
02:21:22:WU01:FS00:0xa7:     Memory: 31.28GiB
02:21:22:WU01:FS00:0xa7:Free Memory: 13.97GiB
02:21:22:WU01:FS00:0xa7:    Threads: POSIX_THREADS
02:21:22:WU01:FS00:0xa7: OS Version: 5.3
02:21:22:WU01:FS00:0xa7:Has Battery: false
02:21:22:WU01:FS00:0xa7: On Battery: false
02:21:22:WU01:FS00:0xa7: UTC Offset: -7
02:21:22:WU01:FS00:0xa7:        PID: 22018
02:21:22:WU01:FS00:0xa7:        CWD: /var/lib/fahclient/work
02:21:22:WU01:FS00:0xa7:******************************** Build - libFAH ********************************
02:21:22:WU01:FS00:0xa7:    Version: 0.0.18
02:21:22:WU01:FS00:0xa7:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
02:21:22:WU01:FS00:0xa7:  Copyright: 2019 foldingathome.org
02:21:22:WU01:FS00:0xa7:   Homepage: https://foldingathome.org/
02:21:22:WU01:FS00:0xa7:       Date: Nov 5 2019
02:21:22:WU01:FS00:0xa7:       Time: 06:13:26
02:21:22:WU01:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
02:21:22:WU01:FS00:0xa7:     Branch: master
02:21:22:WU01:FS00:0xa7:   Compiler: GNU 8.3.0
02:21:22:WU01:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
02:21:22:WU01:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
02:21:22:WU01:FS00:0xa7:       Bits: 64
02:21:22:WU01:FS00:0xa7:       Mode: Release
02:21:22:WU01:FS00:0xa7:************************************ Build *************************************
02:21:22:WU01:FS00:0xa7:       SIMD: avx_256
02:21:22:WU01:FS00:0xa7:********************************************************************************
02:21:22:WU01:FS00:0xa7:Project: 13833 (Run 0, Clone 2650, Gen 2)
02:21:22:WU01:FS00:0xa7:Unit: 0x0000000480fccb095e6e56038838e939
02:21:22:WU01:FS00:0xa7:Reading tar file core.xml
02:21:22:WU01:FS00:0xa7:Reading tar file frame2.tpr
02:21:22:WU01:FS00:0xa7:Digital signatures verified
02:21:22:WU01:FS00:0xa7:Calling: mdrun -s frame2.tpr -o frame2.trr -x frame2.xtc -cpt 15 -nt 15
02:21:22:WU01:FS00:0xa7:Steps: first=500000 total=250000
02:21:22:WU01:FS00:0xa7:ERROR:
02:21:22:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
02:21:22:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
02:21:22:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
02:21:22:WU01:FS00:0xa7:ERROR:
02:21:22:WU01:FS00:0xa7:ERROR:Fatal error:
02:21:22:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 15 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
02:21:22:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
02:21:22:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
02:21:22:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
02:21:22:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
02:21:22:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
02:21:27:WU01:FS00:0xa7:WARNING:Unexpected exit() call
02:21:27:WU01:FS00:0xa7:WARNING:Unexpected exit from science code
02:21:27:WU01:FS00:0xa7:Saving result file ../logfile_01.txt
02:21:27:WU01:FS00:0xa7:Saving result file md.log
02:21:27:WU01:FS00:0xa7:Saving result file science.log
02:21:27:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
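
The GROMACS error in the log says no domain decomposition exists for 15 ranks, and suggests changing the number of ranks. A rough sketch of why some rank counts fail: the ranks have to be arranged as an nx × ny × nz grid, and every cell must be at least the minimum cell size. The check below is a simplification (it ignores separate PME ranks, non-rectangular boxes and the `-rcon`/`-dds` settings), and the box size is a made-up value for illustration because the log does not report it.

```python
def grids(ranks):
    """Yield every (nx, ny, nz) with nx * ny * nz == ranks."""
    for nx in range(1, ranks + 1):
        if ranks % nx:
            continue
        rest = ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny == 0:
                yield nx, ny, rest // ny


def feasible_grids(ranks, box, min_cell):
    """Return the grids whose cells are at least min_cell long in every dimension."""
    return [
        grid
        for grid in grids(ranks)
        if all(length / n >= min_cell for length, n in zip(box, grid))
    ]


MIN_CELL = 1.45733  # nm, taken from the error message in the log
BOX = (6.0, 6.0, 6.0)  # nm, HYPOTHETICAL box; the real box is not in the log

if __name__ == "__main__":
    for ranks in range(16, 7, -1):
        count = len(feasible_grids(ranks, BOX, MIN_CELL))
        print(f"{ranks:2d} ranks: {count} feasible grid(s)")
```

With this assumed box at most four cells fit along each axis, so 15 ranks (which needs a factor of 5 or 15) has no valid grid while 16 or 12 do. That matches the general shape of the error, though the actual limits depend on the real box of project 13833.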
cujomalainey
 
Posts: 2
Joined: Thu Mar 26, 2020 3:17 am

Re: FAHCore seg faulting on pop os

Post by Papamatti » Thu Mar 26, 2020 12:38 pm

The same has been happening on my system since today. Manjaro with kernel 5.5.11, AMD Ryzen 7 2700 :(
Papamatti
 
Posts: 1
Joined: Thu Mar 26, 2020 12:33 pm

Re: FAHCore seg faulting on pop os

Post by cujomalainey » Mon Mar 30, 2020 6:05 pm

No idea if it was a bad job that timed out, but I am folding again without having changed anything on my end.

