FAHCore_22 NFS cleanup issue

Nuitari
Posts: 80
Joined: Sun Jun 09, 2019 4:03 am
Hardware configuration: 1x Nvidia 1050ti
1x Nvidia 1660Super
1x Nvidia GTX 660
1x Nvidia 1060 3gb
1x AMD rx570
2x AMD rx560
1x AMD Ryzen 7 PRO 1700
1x AMD Ryzen 7 3700X
1x AMD Phenom II
1x AMD A8-9600
1x Intel i5-4590S

Post by Nuitari »

For various reasons, I use NFS as the root filesystem on a machine, and that includes the work folder for Folding@home.
I get errors like this once a WU finishes:
18:48:41:ERROR:WU01:FS05:Exception: Failed to remove 'work/01/.nfs0000000002a20bfd000000aa': Device or resource busy

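The problem is that FAHCore_22 opens files in directories outside its own WU directory, and NFS behaves somewhat differently than other file systems when it comes to deleting files. A .nfs000... file means the file was deleted while it was still open: instead of removing it, the NFS client renames it and only deletes it once the last open descriptor is closed. Until then, attempts to remove that file fail with "Device or resource busy", as in the error above.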
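For anyone unfamiliar with this NFS behaviour: POSIX lets a process keep reading a file after it has been unlinked, and the Linux NFS client emulates that by renaming the file to .nfs... ("silly rename") rather than deleting it. The Python sketch below is only an illustration: it shows the underlying rule on a local file system, and adds a hypothetical helper, find_silly_renamed, that lists leftover .nfs* files under a directory such as the client's work folder.

```python
import os
import tempfile


def find_silly_renamed(root):
    """Return sorted paths of NFS 'silly-renamed' files under root.

    Assumption: the Linux NFS client names these files '.nfs' followed by
    hex digits; this sketch only checks the '.nfs' prefix.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(".nfs"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def unlink_while_open_demo():
    """Show the POSIX rule NFS has to emulate: an unlinked file stays
    readable through a descriptor that is still open."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "logfile_01.txt")
        with open(path, "w") as f:
            f.write("still here\n")
        reader = open(path)       # some other holder keeps the file open
        os.unlink(path)           # cleanup deletes the name...
        gone = not os.path.exists(path)
        data = reader.read()      # ...but the contents are still readable
        reader.close()            # on NFS, this is when .nfs* would vanish
        return gone, data
```

On a local disk the name simply disappears; on an NFS mount the same unlink would leave a .nfs* entry in the directory until reader.close() runs.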

Here is the relevant lsof output for a FAHCore_22 process:


FahCore_2 5205 root    0u   CHR   136,9      0t0       12 /dev/pts/9
FahCore_2 5205 root    1w   REG    0,26     9204 44303828 /root/fahclient_saruman/work/13/01/science.log (10.0.2.11:/home/miner)
FahCore_2 5205 root    2w   REG    0,26     9204 44303828 /root/fahclient_saruman/work/13/01/science.log (10.0.2.11:/home/miner)
FahCore_2 5205 root    3w   REG    0,26   237168 20185099 /root/fahclient_saruman/log.txt (10.0.2.11:/home/miner)
FahCore_2 5205 root    4w   CHR     1,3      0t0     1049 /dev/null
FahCore_2 5205 root    5u   CHR 226,134      0t0     2939 /dev/dri/renderD134
FahCore_2 5205 root    6uW  REG    0,26        5 44303822 /root/fahclient_saruman/work/13/wudata_01.lock (10.0.2.11:/home/miner)
FahCore_2 5205 root    7wW  REG    0,26        5 44303822 /root/fahclient_saruman/work/13/wudata_01.lock (10.0.2.11:/home/miner)
FahCore_2 5205 root    8u   CHR 226,133      0t0    19414 /dev/dri/renderD133
FahCore_2 5205 root    9w   REG    0,26     5312 44303823 /root/fahclient_saruman/work/13/logfile_01.txt (10.0.2.11:/home/miner)
FahCore_2 5205 root   36r   REG    0,26     6347 44303757 /root/fahclient_saruman/work/09/logfile_01.txt (10.0.2.11:/home/miner)
FahCore_2 5205 root   37r   REG    0,26 23166464 44173838 /root/fahclient_saruman/work/04/.nfs0000000002a20a0e000000ad (10.0.2.11:/home/miner)
FahCore_2 5205 root   38r   REG    0,26     5370 44303690 /root/fahclient_saruman/work/07/logfile_01.txt (10.0.2.11:/home/miner)
FahCore_2 5205 root   39r   REG    0,26   237168 20185099 /root/fahclient_saruman/log.txt (10.0.2.11:/home/miner)
FahCore_2 5205 root   40r   REG    0,26     4443 44303778 /root/fahclient_saruman/work/12/.nfs0000000002a405a2000000b0 (10.0.2.11:/home/miner)
FahCore_2 5205 root   41r   REG    0,26     6438 44303663 /root/fahclient_saruman/work/02/.nfs0000000002a4052f000000ae (10.0.2.11:/home/miner)
FahCore_2 5205 root   42w   REG    0,26        0 44303819 /root/fahclient_saruman/work/14/.nfs0000000002a405cb000000ab (10.0.2.11:/home/miner)
FahCore_2 5205 root   47r   REG    0,26     3117 44303716 /root/fahclient_saruman/work/10/logfile_01-20200430-184301.txt (10.0.2.11:/home/miner)

So for some reason, FAHCore_22 opens every logfile_01.txt in the work folder, not just the one for its own WU. I'm not sure why it would need to do that.
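To reproduce this kind of listing without lsof, here is a rough, Linux-only Python sketch (an illustration, not anything the client ships) that walks /proc/<pid>/fd and reports which processes hold files open under a given directory, similar in spirit to running lsof +D on the work folder. The function name open_files_under is hypothetical.

```python
import os


def open_files_under(root):
    """Map PID -> sorted list of open file paths under `root`.

    Linux-only: resolves each /proc/<pid>/fd/<n> symlink. Processes that
    exit mid-scan or that we may not inspect are skipped silently.
    """
    root = os.path.realpath(root)
    prefix = root.rstrip(os.sep) + os.sep
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process gone or permission denied
            continue
        paths = set()
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target.startswith(prefix):
                paths.add(target)
        if paths:
            result[int(entry)] = sorted(paths)
    return result
```

Run as root against the work folder, a PID whose list contains logfile_01.txt files from several WU directories would show the same pattern as the lsof output above.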
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: FAHCore_22 NFS cleanup issue

Post by PantherX »

The folder structure is designed so that everything under the FAHClient folder (default name) is fully accessible by the client, and nothing else should be there except the files and folders that the client downloads.

In your case, what's happening is that FahCore_22 (or any other FahCore) writes data to FAHClient\Work\XX as it processes the WU. Once it finishes processing the WU, it writes the wuresults_XX.dat file in FAHClient, which is then uploaded. When wuresults_XX.dat has been written successfully, FAHClient\Work\XX is deleted and a new FAHClient\Work\YY is created. Note that the XX and YY numbers are cycled so that there is never a duplicate at any given time; the lowest number is always used. The FAHClient\Work\XX directory contains a file called logfile_01.txt, which holds the process reports of FahCore_22, so once the WU is completed it needs to be cleaned up, which means deleting it.
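The "lowest number is always used" rule can be sketched as follows. This is a hypothetical Python illustration of the behaviour described above, not FAHClient's actual code; the function name next_work_slot and the 100-slot limit are assumptions.

```python
def next_work_slot(in_use, max_slots=100):
    """Return the lowest two-digit work slot name not currently in use.

    Hypothetical sketch of the rule described in the post; max_slots is
    an assumed upper bound, not a documented client limit.
    """
    used = set(in_use)
    for n in range(max_slots):
        name = f"{n:02d}"
        if name not in used:
            return name
    raise RuntimeError("no free work slot")
```

For example, with 00, 01 and 03 busy, the next WU would get 02, so low slot numbers keep recurring over time.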

If you can fix it from your end, that's great. Otherwise, please provide more details so we can see if this is an issue or not :)
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Nuitari

Re: FAHCore_22 NFS cleanup issue

Post by Nuitari »

The problem is that the core opens logfile_01.txt for every WU, not just its own.
From the lsof output I pasted:


FahCore_2 5205 root    9w   REG    0,26     5312 44303823 /root/fahclient_saruman/work/13/logfile_01.txt (10.0.2.11:/home/miner)
FahCore_2 5205 root   36r   REG    0,26     6347 44303757 /root/fahclient_saruman/work/09/logfile_01.txt (10.0.2.11:/home/miner)
FahCore_2 5205 root   37r   REG    0,26 23166464 44173838 /root/fahclient_saruman/work/04/.nfs0000000002a20a0e000000ad (10.0.2.11:/home/miner)
FahCore_2 5205 root   38r   REG    0,26     5370 44303690 /root/fahclient_saruman/work/07/logfile_01.txt (10.0.2.11:/home/miner)

As you can see in this snippet, the FahCore_22 with PID 5205 has the logfile_01.txt for WU 7, 9 and 13 open.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: FAHCore_22 NFS cleanup issue

Post by bruce »

I don't think you're interpreting it correctly. Without headings and a reference to the process ID list, I can only interpret what I see based on how the files are processed.

Are you sure 5205 represents FAHCore_22? That really doesn't make sense. I'll bet 5205 is FAHClient and 5312 / 6347 / 5370 are FAHCore_* processes. When a WU is started, it opens one of 07, 04, 09 or 13 and creates various files in that directory, including one called logfile_01.txt. A different invocation of FAHCore_* opens a different one of those four directories and creates other files, including a new logfile_01.txt. Meanwhile, another process (perhaps FAHClient, perhaps a FAHCoreWrapper.exe process) reads the files created by the FAHCore and assembles a tarball for uploading (called result*). (I'm not sure where it is written.) At that point several of the files may still be open read-only, but they are no longer of interest. The result* file is then managed by FAHClient, which enqueues it for upload.

The directory (07, 04, 09 or 13) and its contents are not disposed of until the log says "Cleaning up", as in this example for /03/:


05:03:09:WU03:FS02:Upload complete
05:03:10:WU03:FS02:Server responded WORK_ACK (400)
05:03:10:WU03:FS02:Final credit estimate, 34995.00 points
05:03:10:WU03:FS02:Cleaning up
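Since "Cleaning up" is the marker for disposal, it can help to scan the log for cleanups that failed. The log excerpts in this thread share the layout HH:MM:SS:[LEVEL:]WUnn:FSnn:message; that layout is inferred from those excerpts only, and parse_log_line / failed_cleanups are hypothetical helper names. A minimal Python sketch:

```python
import re

# Layout inferred from the excerpts in this thread (an assumption):
#   HH:MM:SS:[LEVEL:]WUnn:FSnn:message
LOG_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):"
    r"(?:(?P<level>[A-Z]+):)?"
    r"WU(?P<wu>\d+):FS(?P<fs>\d+):"
    r"(?P<msg>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line doesn't match.
    'level' is None for lines without a level token such as ERROR."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def failed_cleanups(lines):
    """Return the set of WU ids that logged a 'Failed to remove' error."""
    failed = set()
    for line in lines:
        rec = parse_log_line(line)
        if rec and rec["level"] == "ERROR" and "Failed to remove" in rec["msg"]:
            failed.add(rec["wu"])
    return failed
```

Fed the "Cleaning up" lines and the ERROR lines quoted in this thread, this reports which WU slots left a busy .nfs* file behind.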
Nuitari

Re: FAHCore_22 NFS cleanup issue

Post by Nuitari »

Here is the whole process tree as it is currently running:


16911 pts/9    Sl+    1:34      \_ FAHClient
16920 pts/9    SNl    0:06          \_ /usr/bin/FAHCoreWrapper /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 10 -suffix 01 -version 706 -lifeline 16911 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
16924 pts/9    SNl   21:01          |   \_ /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 10 -suffix 01 -version 706 -lifeline 16920 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
16934 pts/9    SNl    0:06          \_ /usr/bin/FAHCoreWrapper /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 16911 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 6 -gpu 6
16938 pts/9    SNl   11:40          |   \_ /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 01 -suffix 01 -version 706 -lifeline 16934 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 6 -gpu 6
16959 pts/9    SNl    0:06          \_ /usr/bin/FAHCoreWrapper /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 11 -suffix 01 -version 706 -lifeline 16911 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 3 -gpu 3
16965 pts/9    SNl   64:02          |   \_ /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 11 -suffix 01 -version 706 -lifeline 16959 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 3 -gpu 3
17294 pts/9    SNl    0:05          \_ /usr/bin/FAHCoreWrapper /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 16911 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 2 -gpu 2
17298 pts/9    RNl   55:43          |   \_ /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 00 -suffix 01 -version 706 -lifeline 17294 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 2 -gpu 2
18798 pts/9    SNl    0:00          \_ /usr/bin/FAHCoreWrapper /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_21.fah/FahCore_21 -dir 08 -suffix 01 -version 706 -lifeline 16911 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 4 -gpu 4
18802 pts/9    SNl    7:40          |   \_ /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_21.fah/FahCore_21 -dir 08 -suffix 01 -version 706 -lifeline 18798 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 4 -gpu 4
18828 pts/9    SNl    0:00          \_ /usr/bin/FAHCoreWrapper /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_21.fah/FahCore_21 -dir 03 -suffix 01 -version 706 -lifeline 16911 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 1 -gpu 1
18832 pts/9    SNl    6:58          |   \_ /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_21.fah/FahCore_21 -dir 03 -suffix 01 -version 706 -lifeline 18828 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 1 -gpu 1
18985 pts/9    SNl    0:00          \_ /usr/bin/FAHCoreWrapper /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 07 -suffix 01 -version 706 -lifeline 16911 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 5 -gpu 5
18989 pts/9    SNl    1:22              \_ /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 -dir 07 -suffix 01 -version 706 -lifeline 18985 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 5 -gpu 5
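Reading the tree: each FahCore's -lifeline is the PID of its FAHCoreWrapper, each wrapper's -lifeline is FAHClient (16911), and -dir is the work slot, so the core at PID 18989 works in work/07 with GPU 5, which matches its cwd in the lsof output below. A small Python sketch for pulling those flags out of a command line follows; the helper names are hypothetical, and it assumes every -flag takes exactly one value, as in the lines above.

```python
import shlex


def parse_core_args(argv):
    """Split a FAHCoreWrapper/FahCore argv into '-flag value' pairs.

    Assumption: every token starting with '-' takes exactly one value;
    other tokens (e.g. the core path given to the wrapper) are positional.
    """
    info = {"program": argv[0], "positional": []}
    i = 1
    while i < len(argv):
        tok = argv[i]
        if tok.startswith("-") and i + 1 < len(argv):
            info[tok.lstrip("-")] = argv[i + 1]
            i += 2
        else:
            info["positional"].append(tok)
            i += 1
    return info


def parse_core_cmdline(text):
    """Parse a command line pasted as text (e.g. from ps output)."""
    return parse_core_args(shlex.split(text))


def read_cmdline(pid):
    """Linux-only: read a live process's argv from /proc/<pid>/cmdline."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    return [arg.decode() for arg in raw.split(b"\0") if arg]
```

For a running core, parse_core_args(read_cmdline(18989)) would give the same fields without copying text out of ps.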

Here is the full lsof output for the FAHClient process:


COMMAND     PID USER   FD   TYPE  DEVICE SIZE/OFF     NODE NAME
FAHClient 16911 root  cwd    DIR    0,26     4096 20186658 /root/fahclient_saruman (10.0.2.11:/home/miner)
FAHClient 16911 root  rtd    DIR    0,26     4096 17565011 / (10.0.2.11:/home/miner)
FAHClient 16911 root  txt    REG    0,26  8054104 19139468 /usr/bin/FAHClient (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26   101200 18352323 /lib/x86_64-linux-gnu/libresolv-2.23.so (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26    27000 18352319 /lib/x86_64-linux-gnu/libnss_dns-2.23.so (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26    47600 18352327 /lib/x86_64-linux-gnu/libnss_files-2.23.so (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    CHR 226,134              2939 /dev/dri/renderD134
FAHClient 16911 root  mem    CHR 226,133             19414 /dev/dri/renderD133
FAHClient 16911 root  mem    CHR 226,132             19254 /dev/dri/renderD132
FAHClient 16911 root  mem    CHR 226,131              2712 /dev/dri/renderD131
FAHClient 16911 root  mem    CHR 226,130              2697 /dev/dri/renderD130
FAHClient 16911 root  mem    CHR 226,129              2547 /dev/dri/renderD129
FAHClient 16911 root  mem    REG    0,26 35159824 18350100 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl12cl64.so (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26 72613376 18350099 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl-orca64.so (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26    96640 18350091 /lib/x86_64-linux-gnu/libgcc_s.so.1 (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    CHR 226,128             18699 /dev/dri/renderD128
FAHClient 16911 root  mem    REG    0,26    84616 22939042 /opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2.4.0 (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26    55656 22939045 /opt/amdgpu/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1.0.0 (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26    31712 18352313 /lib/x86_64-linux-gnu/librt-2.23.so (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26    27632 18350096 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libOpenCL.so.1 (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26  1868984 18352330 /lib/x86_64-linux-gnu/libc-2.23.so (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26  1088952 18352333 /lib/x86_64-linux-gnu/libm-2.23.so (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26    14608 18352315 /lib/x86_64-linux-gnu/libdl-2.23.so (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26   138696 18352317 /lib/x86_64-linux-gnu/libpthread-2.23.so (10.0.2.11:/home/miner)
FAHClient 16911 root  mem    REG    0,26   162632 18352316 /lib/x86_64-linux-gnu/ld-2.23.so (10.0.2.11:/home/miner)
FAHClient 16911 root    0u   CHR   136,9      0t0       12 /dev/pts/9
FAHClient 16911 root    1u   CHR   136,9      0t0       12 /dev/pts/9
FAHClient 16911 root    2u   CHR   136,9      0t0       12 /dev/pts/9
FAHClient 16911 root    3w   REG    0,26    70592 20185100 /root/fahclient_saruman/log.txt (10.0.2.11:/home/miner)
FAHClient 16911 root    4uw  REG    0,26    65536 30277636 /root/fahclient_saruman/work/client.db (10.0.2.11:/home/miner)
FAHClient 16911 root    5u  IPv4   98548      0t0      TCP *:7396 (LISTEN)
FAHClient 16911 root    6u  IPv4   98549      0t0      TCP *:36330 (LISTEN)
FAHClient 16911 root    7u   CHR 226,134      0t0     2939 /dev/dri/renderD134
FAHClient 16911 root    8u   CHR 226,134      0t0     2939 /dev/dri/renderD134
FAHClient 16911 root    9u   CHR   226,6      0t0     2940 /dev/dri/card6
FAHClient 16911 root   10u   CHR 226,133      0t0    19414 /dev/dri/renderD133
FAHClient 16911 root   11u   CHR 226,133      0t0    19414 /dev/dri/renderD133
FAHClient 16911 root   12u   CHR   226,5      0t0    20079 /dev/dri/card5
FAHClient 16911 root   13u   CHR 226,132      0t0    19254 /dev/dri/renderD132
FAHClient 16911 root   14u   CHR 226,132      0t0    19254 /dev/dri/renderD132
FAHClient 16911 root   15u   CHR   226,4      0t0    19255 /dev/dri/card4
FAHClient 16911 root   16u   CHR 226,131      0t0     2712 /dev/dri/renderD131
FAHClient 16911 root   17u   CHR 226,131      0t0     2712 /dev/dri/renderD131
FAHClient 16911 root   18u   CHR   226,3      0t0    19635 /dev/dri/card3
FAHClient 16911 root   19u   CHR 226,130      0t0     2697 /dev/dri/renderD130
FAHClient 16911 root   20u   CHR 226,130      0t0     2697 /dev/dri/renderD130
FAHClient 16911 root   21u   CHR   226,2      0t0    18945 /dev/dri/card2
FAHClient 16911 root   22u   CHR 226,129      0t0     2547 /dev/dri/renderD129
FAHClient 16911 root   23u   CHR 226,129      0t0     2547 /dev/dri/renderD129
FAHClient 16911 root   24u   CHR   226,1      0t0     2548 /dev/dri/card1
FAHClient 16911 root   25u   CHR 226,128      0t0    18699 /dev/dri/renderD128
FAHClient 16911 root   26u   CHR 226,128      0t0    18699 /dev/dri/renderD128
FAHClient 16911 root   27u   CHR   226,0      0t0    18707 /dev/dri/card0
FAHClient 16911 root   28u   CHR 226,128      0t0    18699 /dev/dri/renderD128
FAHClient 16911 root   29u   CHR 226,129      0t0     2547 /dev/dri/renderD129
FAHClient 16911 root   30u   CHR 226,130      0t0     2697 /dev/dri/renderD130
FAHClient 16911 root   31u   CHR 226,131      0t0     2712 /dev/dri/renderD131
FAHClient 16911 root   32u   CHR 226,132      0t0    19254 /dev/dri/renderD132
FAHClient 16911 root   33u   CHR 226,133      0t0    19414 /dev/dri/renderD133
FAHClient 16911 root   34u   CHR 226,134      0t0     2939 /dev/dri/renderD134
FAHClient 16911 root   35u   REG    0,26    33344 44173050 /root/fahclient_saruman/work/client.db-journal (10.0.2.11:/home/miner)
FAHClient 16911 root   37r   REG    0,26     2684 44303665 /root/fahclient_saruman/work/10/logfile_01.txt (10.0.2.11:/home/miner)
FAHClient 16911 root   39r   REG    0,26     2712 44303690 /root/fahclient_saruman/work/01/logfile_01.txt (10.0.2.11:/home/miner)
FAHClient 16911 root   40r   REG    0,26      859 44173344 /root/fahclient_saruman/work/03/logfile_01.txt (10.0.2.11:/home/miner)
FAHClient 16911 root   41r   REG    0,26     1997 44173507 /root/fahclient_saruman/work/07/logfile_01.txt (10.0.2.11:/home/miner)
FAHClient 16911 root   42r   REG    0,26     3091 44303818 /root/fahclient_saruman/work/11/logfile_01.txt (10.0.2.11:/home/miner)
FAHClient 16911 root   43u  IPv4   97239      0t0      TCP 10.0.2.203:36330->gandalf.nuitari.net:50326 (ESTABLISHED)
FAHClient 16911 root   44r   REG    0,26    70592 20185100 /root/fahclient_saruman/log.txt (10.0.2.11:/home/miner)
FAHClient 16911 root   45r   REG    0,26      904 44173332 /root/fahclient_saruman/work/08/logfile_01.txt (10.0.2.11:/home/miner)
FAHClient 16911 root   46r   REG    0,26     3006 44303760 /root/fahclient_saruman/work/00/logfile_01.txt (10.0.2.11:/home/miner)

Here is the lsof output for FAHCoreWrapper at PID 18985, which is the parent of FahCore_22 at PID 18989:


COMMAND     PID USER   FD   TYPE  DEVICE SIZE/OFF     NODE NAME
FAHCoreWr 18985 root  cwd    DIR    0,26     4096 44173350 /root/fahclient_saruman/work (10.0.2.11:/home/miner)
FAHCoreWr 18985 root  rtd    DIR    0,26     4096 17565011 / (10.0.2.11:/home/miner)
FAHCoreWr 18985 root  txt    REG    0,26   847984 19139469 /usr/bin/FAHCoreWrapper (10.0.2.11:/home/miner)
FAHCoreWr 18985 root  mem    REG    0,26  1868984 18352330 /lib/x86_64-linux-gnu/libc-2.23.so (10.0.2.11:/home/miner)
FAHCoreWr 18985 root  mem    REG    0,26  1088952 18352333 /lib/x86_64-linux-gnu/libm-2.23.so (10.0.2.11:/home/miner)
FAHCoreWr 18985 root  mem    REG    0,26    14608 18352315 /lib/x86_64-linux-gnu/libdl-2.23.so (10.0.2.11:/home/miner)
FAHCoreWr 18985 root  mem    REG    0,26   138696 18352317 /lib/x86_64-linux-gnu/libpthread-2.23.so (10.0.2.11:/home/miner)
FAHCoreWr 18985 root  mem    REG    0,26   162632 18352316 /lib/x86_64-linux-gnu/ld-2.23.so (10.0.2.11:/home/miner)
FAHCoreWr 18985 root    0u   CHR   136,9      0t0       12 /dev/pts/9
FAHCoreWr 18985 root    1w  FIFO    0,13      0t0    98197 pipe
FAHCoreWr 18985 root    2w   CHR     1,3      0t0     1049 /dev/null
FAHCoreWr 18985 root    3w   REG    0,26    70887 20185100 /root/fahclient_saruman/log.txt (10.0.2.11:/home/miner)
FAHCoreWr 18985 root    7u   CHR 226,134      0t0     2939 /dev/dri/renderD134
FAHCoreWr 18985 root   10u   CHR 226,133      0t0    19414 /dev/dri/renderD133
FAHCoreWr 18985 root   13u   CHR 226,132      0t0    19254 /dev/dri/renderD132
FAHCoreWr 18985 root   16u   CHR 226,131      0t0     2712 /dev/dri/renderD131
FAHCoreWr 18985 root   19u   CHR 226,130      0t0     2697 /dev/dri/renderD130
FAHCoreWr 18985 root   22u   CHR 226,129      0t0     2547 /dev/dri/renderD129
FAHCoreWr 18985 root   25u   CHR 226,128      0t0    18699 /dev/dri/renderD128
FAHCoreWr 18985 root   28u   CHR 226,128      0t0    18699 /dev/dri/renderD128
FAHCoreWr 18985 root   29u   CHR 226,129      0t0     2547 /dev/dri/renderD129
FAHCoreWr 18985 root   30u   CHR 226,130      0t0     2697 /dev/dri/renderD130
FAHCoreWr 18985 root   31u   CHR 226,131      0t0     2712 /dev/dri/renderD131
FAHCoreWr 18985 root   32u   CHR 226,132      0t0    19254 /dev/dri/renderD132
FAHCoreWr 18985 root   33u   CHR 226,133      0t0    19414 /dev/dri/renderD133
FAHCoreWr 18985 root   34u   CHR 226,134      0t0     2939 /dev/dri/renderD134
FAHCoreWr 18985 root   36r   REG    0,26  2950656 44303709 /root/fahclient_saruman/work/02/.nfs0000000002a4055d000000c0 (10.0.2.11:/home/miner)
FAHCoreWr 18985 root   37r   REG    0,26     2729 44303665 /root/fahclient_saruman/work/10/logfile_01.txt (10.0.2.11:/home/miner)
FAHCoreWr 18985 root   38u  sock     0,9      0t0    99530 protocol: TCP
FAHCoreWr 18985 root   39r   REG    0,26     2712 44303690 /root/fahclient_saruman/work/01/logfile_01.txt (10.0.2.11:/home/miner)
FAHCoreWr 18985 root   40r   REG    0,26      904 44173344 /root/fahclient_saruman/work/03/logfile_01.txt (10.0.2.11:/home/miner)
FAHCoreWr 18985 root   41w   CHR     1,3      0t0     1049 /dev/null
FAHCoreWr 18985 root   42r   REG    0,26     3091 44303818 /root/fahclient_saruman/work/11/logfile_01.txt (10.0.2.11:/home/miner)
FAHCoreWr 18985 root   43u  IPv4   97239      0t0      TCP 10.0.2.203:36330->gandalf.nuitari.net:50326 (ESTABLISHED)
FAHCoreWr 18985 root   44r   REG    0,26    70887 20185100 /root/fahclient_saruman/log.txt (10.0.2.11:/home/miner)
FAHCoreWr 18985 root   45r   REG    0,26      904 44173332 /root/fahclient_saruman/work/08/logfile_01.txt (10.0.2.11:/home/miner)
FAHCoreWr 18985 root   46r   REG    0,26     3006 44303760 /root/fahclient_saruman/work/00/logfile_01.txt (10.0.2.11:/home/miner)
FAHCoreWr 18985 root   47w  FIFO    0,13      0t0    98197 pipe

Here is the lsof output for FahCore_22 at PID 18989:


root@saruman:~# lsof -p 18989
COMMAND     PID USER   FD   TYPE  DEVICE SIZE/OFF     NODE NAME
FahCore_2 18989 root  cwd    DIR    0,26     4096 44173505 /root/fahclient_saruman/work/07/01 (10.0.2.11:/home/miner)
FahCore_2 18989 root  rtd    DIR    0,26     4096 17565011 / (10.0.2.11:/home/miner)
FahCore_2 18989 root  txt    REG    0,26  9770984 32639210 /root/fahclient_saruman/cores/cores.foldingathome.org/v7/lin/64bit/Core_22.fah/FahCore_22 (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    CHR 226,133             19414 /dev/dri/renderD133
FahCore_2 18989 root  mem    CHR 226,134              2939 /dev/dri/renderD134
FahCore_2 18989 root  mem    CHR 226,132             19254 /dev/dri/renderD132
FahCore_2 18989 root  mem    CHR 226,131              2712 /dev/dri/renderD131
FahCore_2 18989 root  mem    CHR 226,130              2697 /dev/dri/renderD130
FahCore_2 18989 root  mem    REG    0,26 72613376 18350099 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl-orca64.so (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    CHR 226,129              2547 /dev/dri/renderD129
FahCore_2 18989 root  mem    REG    0,26 35159824 18350100 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl12cl64.so (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    CHR 226,128             18699 /dev/dri/renderD128
FahCore_2 18989 root  mem    REG    0,26    84616 22939042 /opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2.4.0 (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    REG    0,26    55656 22939045 /opt/amdgpu/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1.0.0 (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    REG    0,26  1868984 18352330 /lib/x86_64-linux-gnu/libc-2.23.so (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    REG    0,26    96640 18350091 /lib/x86_64-linux-gnu/libgcc_s.so.1 (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    REG    0,26  1088952 18352333 /lib/x86_64-linux-gnu/libm-2.23.so (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    REG    0,26  1964904 18482534 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    REG    0,26    31712 18352313 /lib/x86_64-linux-gnu/librt-2.23.so (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    REG    0,26    14608 18352315 /lib/x86_64-linux-gnu/libdl-2.23.so (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    REG    0,26   138696 18352317 /lib/x86_64-linux-gnu/libpthread-2.23.so (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    REG    0,26    27632 18350096 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libOpenCL.so.1 (10.0.2.11:/home/miner)
FahCore_2 18989 root  mem    REG    0,26   162632 18352316 /lib/x86_64-linux-gnu/ld-2.23.so (10.0.2.11:/home/miner)
FahCore_2 18989 root    0u   CHR   136,9      0t0       12 /dev/pts/9
FahCore_2 18989 root    1w   REG    0,26     4273 44173512 /root/fahclient_saruman/work/07/01/science.log (10.0.2.11:/home/miner)
FahCore_2 18989 root    2w   REG    0,26     4273 44173512 /root/fahclient_saruman/work/07/01/science.log (10.0.2.11:/home/miner)
FahCore_2 18989 root    3w   REG    0,26    70661 20185100 /root/fahclient_saruman/log.txt (10.0.2.11:/home/miner)
FahCore_2 18989 root    4w   CHR     1,3      0t0     1049 /dev/null
FahCore_2 18989 root    5uW  REG    0,26        6 44173506 /root/fahclient_saruman/work/07/wudata_01.lock (10.0.2.11:/home/miner)
FahCore_2 18989 root    6wW  REG    0,26        6 44173506 /root/fahclient_saruman/work/07/wudata_01.lock (10.0.2.11:/home/miner)
FahCore_2 18989 root    7u   CHR 226,134      0t0     2939 /dev/dri/renderD134
FahCore_2 18989 root    8w   REG    0,26     1997 44173507 /root/fahclient_saruman/work/07/logfile_01.txt (10.0.2.11:/home/miner)
FahCore_2 18989 root   10u   CHR 226,133      0t0    19414 /dev/dri/renderD133
FahCore_2 18989 root   12u   CHR 226,134      0t0     2939 /dev/dri/renderD134
FahCore_2 18989 root   13u   CHR 226,132      0t0    19254 /dev/dri/renderD132
FahCore_2 18989 root   14u   CHR 226,134      0t0     2939 /dev/dri/renderD134
FahCore_2 18989 root   15u   CHR   226,6      0t0     2940 /dev/dri/card6
FahCore_2 18989 root   16u   CHR 226,131      0t0     2712 /dev/dri/renderD131
FahCore_2 18989 root   17u   CHR 226,133      0t0    19414 /dev/dri/renderD133
FahCore_2 18989 root   18u   CHR 226,133      0t0    19414 /dev/dri/renderD133
FahCore_2 18989 root   19u   CHR 226,130      0t0     2697 /dev/dri/renderD130
FahCore_2 18989 root   20u   CHR   226,5      0t0    20079 /dev/dri/card5
FahCore_2 18989 root   21u   CHR 226,132      0t0    19254 /dev/dri/renderD132
FahCore_2 18989 root   22u   CHR 226,129      0t0     2547 /dev/dri/renderD129
FahCore_2 18989 root   23u   CHR 226,132      0t0    19254 /dev/dri/renderD132
FahCore_2 18989 root   24u   CHR   226,4      0t0    19255 /dev/dri/card4
FahCore_2 18989 root   25u   CHR 226,128      0t0    18699 /dev/dri/renderD128
FahCore_2 18989 root   26u   CHR 226,131      0t0     2712 /dev/dri/renderD131
FahCore_2 18989 root   27u   CHR 226,131      0t0     2712 /dev/dri/renderD131
FahCore_2 18989 root   28u   CHR 226,128      0t0    18699 /dev/dri/renderD128
FahCore_2 18989 root   29u   CHR 226,129      0t0     2547 /dev/dri/renderD129
FahCore_2 18989 root   30u   CHR 226,130      0t0     2697 /dev/dri/renderD130
FahCore_2 18989 root   31u   CHR 226,131      0t0     2712 /dev/dri/renderD131
FahCore_2 18989 root   32u   CHR 226,132      0t0    19254 /dev/dri/renderD132
FahCore_2 18989 root   33u   CHR 226,133      0t0    19414 /dev/dri/renderD133
FahCore_2 18989 root   34u   CHR 226,134      0t0     2939 /dev/dri/renderD134
FahCore_2 18989 root   35u   CHR   226,3      0t0    19635 /dev/dri/card3
FahCore_2 18989 root   36r   REG    0,26  2950656 44303709 /root/fahclient_saruman/work/02/.nfs0000000002a4055d000000c0 (10.0.2.11:/home/miner)
FahCore_2 18989 root   37r   REG    0,26     2729 44303665 /root/fahclient_saruman/work/10/logfile_01.txt (10.0.2.11:/home/miner)
FahCore_2 18989 root   38u  sock     0,9      0t0    99530 protocol: TCP
FahCore_2 18989 root   39r   REG    0,26     2712 44303690 /root/fahclient_saruman/work/01/logfile_01.txt (10.0.2.11:/home/miner)
FahCore_2 18989 root   40r   REG    0,26      859 44173344 /root/fahclient_saruman/work/03/logfile_01.txt (10.0.2.11:/home/miner)
FahCore_2 18989 root   41w   CHR     1,3      0t0     1049 /dev/null
FahCore_2 18989 root   42r   REG    0,26     3091 44303818 /root/fahclient_saruman/work/11/logfile_01.txt (10.0.2.11:/home/miner)
FahCore_2 18989 root   43u  IPv4   97239      0t0      TCP 10.0.2.203:36330->gandalf.nuitari.net:50326 (ESTABLISHED)
FahCore_2 18989 root   44r   REG    0,26    70661 20185100 /root/fahclient_saruman/log.txt (10.0.2.11:/home/miner)
FahCore_2 18989 root   45r   REG    0,26      904 44173332 /root/fahclient_saruman/work/08/logfile_01.txt (10.0.2.11:/home/miner)
FahCore_2 18989 root   46r   REG    0,26     3006 44303760 /root/fahclient_saruman/work/00/logfile_01.txt (10.0.2.11:/home/miner)
FahCore_2 18989 root   47w  FIFO    0,13      0t0    98197 pipe
FahCore_2 18989 root   48u   CHR 226,130      0t0     2697 /dev/dri/renderD130
FahCore_2 18989 root   49u   CHR 226,130      0t0     2697 /dev/dri/renderD130
FahCore_2 18989 root   50u   CHR   226,2      0t0    18945 /dev/dri/card2
FahCore_2 18989 root   51u   CHR 226,129      0t0     2547 /dev/dri/renderD129
FahCore_2 18989 root   52u   CHR 226,129      0t0     2547 /dev/dri/renderD129
FahCore_2 18989 root   53u   CHR   226,1      0t0     2548 /dev/dri/card1
FahCore_2 18989 root   54u   CHR 226,128      0t0    18699 /dev/dri/renderD128
FahCore_2 18989 root   55u   CHR 226,128      0t0    18699 /dev/dri/renderD128
FahCore_2 18989 root   56u   CHR   226,0      0t0    18707 /dev/dri/card0
FahCore_2 18989 root   57u   CHR 226,128      0t0    18699 /dev/dri/renderD128
FahCore_2 18989 root   58u   CHR 226,129      0t0     2547 /dev/dri/renderD129
FahCore_2 18989 root   59u   CHR 226,130      0t0     2697 /dev/dri/renderD130
FahCore_2 18989 root   60u   CHR 226,131      0t0     2712 /dev/dri/renderD131
FahCore_2 18989 root   61u   CHR 226,132      0t0    19254 /dev/dri/renderD132
FahCore_2 18989 root   62u   CHR 226,133      0t0    19414 /dev/dri/renderD133
FahCore_2 18989 root   63u   CHR 226,134      0t0     2939 /dev/dri/renderD134

So what's actually going on is that FD_CLOEXEC does not appear to be set, so whenever FAHClient starts a FAHCoreWrapper (and the wrapper starts its core), the parent's open file descriptors stay open in the child.
This includes not only the log files but also the /dev/dri/* device files (GPUs) that FAHClient opened.
Also, for some reason, the core at PID 18989 opens multiple graphics card devices even though its assigned GPU is #5.
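To illustrate the FD_CLOEXEC point, here is a minimal Python sketch (POSIX only, illustrative; the helper names are made up). It clears the close-on-exec flag on a temporary file's descriptor, execs a fresh Python, and checks whether the child can still see that same file; then it sets FD_CLOEXEC and repeats. Note that Python itself creates descriptors with FD_CLOEXEC set by default, so the leak has to be simulated by clearing the flag.

```python
import fcntl
import os
import subprocess
import sys
import tempfile


def set_cloexec(fd, enabled=True):
    """Set or clear FD_CLOEXEC on a descriptor via fcntl(F_GETFD/F_SETFD)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    if enabled:
        flags |= fcntl.FD_CLOEXEC
    else:
        flags &= ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, flags)


def child_sees_fd(fd, ino):
    """Exec a new Python and report whether `fd` there is still our file.
    Comparing inode numbers avoids a false positive if the child happens
    to reuse the same descriptor number for something else."""
    probe = (
        "import os, sys\n"
        "try:\n"
        f"    ok = os.fstat({fd}).st_ino == {ino}\n"
        "except OSError:\n"
        "    ok = False\n"
        "sys.exit(0 if ok else 1)\n"
    )
    # close_fds=False: subprocess closes nothing itself, so only the
    # FD_CLOEXEC flag decides what survives exec().
    rc = subprocess.run([sys.executable, "-c", probe], close_fds=False).returncode
    return rc == 0


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        fd = tmp.fileno()
        ino = os.fstat(fd).st_ino
        set_cloexec(fd, False)   # like a descriptor opened without O_CLOEXEC
        leaked = child_sees_fd(fd, ino)
        set_cloexec(fd, True)    # the fix: mark it close-on-exec
        closed = not child_sees_fd(fd, ino)
    return leaked, closed
```

In C, the equivalent fix is opening files with O_CLOEXEC or setting FD_CLOEXEC with fcntl() before forking.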

The cleanup partly works, except that I get spammed with:


06:09:40:WU05:FS03:Cleaning up
06:09:40:ERROR:WU05:FS03:Exception: Failed to remove 'work/05/.nfs0000000002a40553000000be': Device or resource busy

until all the WUs that were running when the WU completed are done.
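One way to avoid the repeated errors would be to treat EBUSY during cleanup as "retry later" rather than as an error. The Python sketch below is a hypothetical illustration of that idea, not how FAHClient currently behaves; try_remove_wu_dir and sweep are made-up names, and the EBUSY path can only really be exercised on an NFS mount with a silly-renamed file still held open.

```python
import errno
import os
import shutil


def try_remove_wu_dir(path):
    """Remove a finished WU directory.

    Returns True if it is gone (or was already gone), False if the NFS
    client reported EBUSY, e.g. for a '.nfs*' file another process still
    holds open. Any other error is re-raised.
    """
    try:
        shutil.rmtree(path)
        return True
    except OSError as e:
        if e.errno == errno.ENOENT:
            return True   # already removed
        if e.errno == errno.EBUSY:
            return False  # still held open elsewhere; retry later
        raise


def sweep(pending):
    """Retry each pending directory once; return the ones still busy."""
    return {p for p in pending if not try_remove_wu_dir(p)}
```

A client could call sweep() periodically and log only once per directory, instead of once per attempt.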

All of that could be a very good way of getting heisenbugs.

I've confirmed this on 7.5.1 and 7.6.13.