
Project 14253 causing system hang

Posted: Mon Jun 08, 2020 9:24 pm
by MrKicker
Project 14253
Run 768
Clone 4
Gen 57

Upon starting this WU, my system freezes within 10 seconds, requiring a forced restart. Several attempts to start this WU on my computer have failed, and I've had no stability problems with FAH before.

*********************** Log Started 2020-06-08T19:52:52Z ***********************
19:52:52:Trying to access database...
19:52:52:Successfully acquired database lock
19:52:52:Downloading GPUs.txt from assign1.foldingathome.org:80
19:52:52:Connecting to assign1.foldingathome.org:80
19:52:52:Read GPUs.txt
19:52:52:Enabled folding slot 01: PAUSED gpu:0:Vega 10 XL/XT [Radeon RX Vega 56/64] (by user)
19:52:52:****************************** FAHClient ******************************
19:52:52:        Version: 7.6.13
19:52:52:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:52:52:      Copyright: 2020 foldingathome.org
19:52:52:       Homepage: https://foldingathome.org/
19:52:52:           Date: Apr 27 2020
19:52:52:           Time: 21:21:01
19:52:52:       Revision: 5a652817f46116b6e135503af97f18e094414e3b
19:52:52:         Branch: master
19:52:52:       Compiler: Visual C++ 2008
19:52:52:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
19:52:52:       Platform: win32 10
19:52:52:           Bits: 32
19:52:52:           Mode: Release
19:52:52:           Args: --open-web-control
19:52:52:         Config: C:\Users\Alex\AppData\Roaming\FAHClient\config.xml
19:52:52:******************************** CBang ********************************
19:52:52:           Date: Apr 24 2020
19:52:52:           Time: 17:07:55
19:52:52:       Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
19:52:52:         Branch: master
19:52:52:       Compiler: Visual C++ 2008
19:52:52:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
19:52:52:       Platform: win32 10
19:52:52:           Bits: 32
19:52:52:           Mode: Release
19:52:52:******************************* System ********************************
19:52:52:            CPU: AMD Ryzen 5 1600X Six-Core Processor
19:52:52:         CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
19:52:52:           CPUs: 12
19:52:52:         Memory: 15.93GiB
19:52:52:    Free Memory: 11.32GiB
19:52:52:        Threads: WINDOWS_THREADS
19:52:52:     OS Version: 6.2
19:52:52:    Has Battery: false
19:52:52:     On Battery: false
19:52:52:     UTC Offset: -4
19:52:52:            PID: 11400
19:52:52:            CWD: C:\Users\Alex\AppData\Roaming\FAHClient
19:52:52:  Win32 Service: false
19:52:52:             OS: Windows 10 Home
19:52:52:        OS Arch: AMD64
19:52:52:           GPUs: 1
19:52:52:          GPU 0: Bus:40 Slot:0 Func:0 AMD:5 Vega 10 XL/XT [Radeon RX Vega 56/64]
19:52:52:           CUDA: Not detected: Failed to open dynamic library 'nvcuda.dll': The
19:52:52:                 specified module could not be found.
19:52:52:
19:52:52:OpenCL Device 0: Platform:0 Device:0 Bus:40 Slot:0 Compute:1.2 Driver:3004.8
19:52:52:******************************* libFAH ********************************
19:52:52:           Date: Apr 15 2020
19:52:52:           Time: 14:53:14
19:52:52:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
19:52:52:         Branch: master
19:52:52:       Compiler: Visual C++ 2008
19:52:52:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
19:52:52:       Platform: win32 10
19:52:52:           Bits: 32
19:52:52:           Mode: Release
19:52:52:***********************************************************************
19:52:52:<config>
19:52:52:  <!-- Network -->
19:52:52:  <proxy v=':8080'/>
19:52:52:
19:52:52:  <!-- Slot Control -->
19:52:52:  <power v='full'/>
19:52:52:
19:52:52:  <!-- User Information -->
19:52:52:  <team v='237040'/>
19:52:52:  <user v='MrKicker'/>
19:52:52:
19:52:52:  <!-- Folding Slots -->
19:52:52:  <slot id='1' type='GPU'>
19:52:52:    <paused v='true'/>
19:52:52:  </slot>
19:52:52:</config>
19:52:58:17:127.0.0.1:New Web session
19:53:32:FS01:Unpaused
19:53:32:WU01:FS01:Starting
19:53:32:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\Users\Alex\AppData\Roaming\FAHClient\cores/cores.foldingathome.org/v7/win/64bit/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 11400 -checkpoint 15 -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
19:53:32:WU01:FS01:Started FahCore on PID 3444
19:53:32:WU01:FS01:Core PID:9632
19:53:32:WU01:FS01:FahCore 0x22 started
19:53:33:WU01:FS01:0x22:*********************** Log Started 2020-06-08T19:53:32Z ***********************
19:53:33:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
19:53:33:WU01:FS01:0x22:       Type: 0x22
19:53:33:WU01:FS01:0x22:       Core: Core22
19:53:33:WU01:FS01:0x22:    Website: https://foldingathome.org/
19:53:33:WU01:FS01:0x22:  Copyright: (c) 2009-2018 foldingathome.org
19:53:33:WU01:FS01:0x22:     Author: John Chodera <john.chodera@choderalab.org> and Rafal Wiewiora
19:53:33:WU01:FS01:0x22:             <rafal.wiewiora@choderalab.org>
19:53:33:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 3444 -checkpoint 15
19:53:33:WU01:FS01:0x22:             -gpu-vendor amd -opencl-platform 0 -opencl-device 0 -gpu 0
19:53:33:WU01:FS01:0x22:     Config: <none>
19:53:33:WU01:FS01:0x22:************************************ Build *************************************
19:53:33:WU01:FS01:0x22:    Version: 0.0.5
19:53:33:WU01:FS01:0x22:       Date: Apr 22 2020
19:53:33:WU01:FS01:0x22:       Time: 04:42:59
19:53:33:WU01:FS01:0x22: Repository: Git
19:53:33:WU01:FS01:0x22:   Revision: 2d69202c898bd9bb3e093f51cd32bf411c2a0388
19:53:33:WU01:FS01:0x22:     Branch: HEAD
19:53:33:WU01:FS01:0x22:   Compiler: Visual C++ 2008
19:53:33:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
19:53:33:WU01:FS01:0x22:   Platform: win32 10
19:53:33:WU01:FS01:0x22:       Bits: 64
19:53:33:WU01:FS01:0x22:       Mode: Release
19:53:33:WU01:FS01:0x22:************************************ System ************************************
19:53:33:WU01:FS01:0x22:        CPU: AMD Ryzen 5 1600X Six-Core Processor
19:53:33:WU01:FS01:0x22:     CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
19:53:33:WU01:FS01:0x22:       CPUs: 12
19:53:33:WU01:FS01:0x22:     Memory: 15.93GiB
19:53:33:WU01:FS01:0x22:Free Memory: 11.12GiB
19:53:33:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
19:53:33:WU01:FS01:0x22: OS Version: 6.2
19:53:33:WU01:FS01:0x22:Has Battery: false
19:53:33:WU01:FS01:0x22: On Battery: false
19:53:33:WU01:FS01:0x22: UTC Offset: -4
19:53:33:WU01:FS01:0x22:        PID: 9632
19:53:33:WU01:FS01:0x22:        CWD: C:\Users\Alex\AppData\Roaming\FAHClient\work
19:53:33:WU01:FS01:0x22:         OS: Windows 10 Home
19:53:33:WU01:FS01:0x22:    OS Arch: AMD64
19:53:33:WU01:FS01:0x22:********************************************************************************
19:53:33:WU01:FS01:0x22:Project: 14253 (Run 768, Clone 4, Gen 57)
19:53:33:WU01:FS01:0x22:Unit: 0x00000063cedfaa925eaba95fa9213605
19:53:33:WU01:FS01:0x22:Digital signatures verified
19:53:33:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
19:53:33:WU01:FS01:0x22:Version 0.0.5
19:53:53:Removing old file 'configs/config-20200413-160422.xml'
19:53:53:Saving configuration to config.xml
19:53:53:<config>
19:53:53:  <!-- Network -->
19:53:53:  <proxy v=':8080'/>
19:53:53:
19:53:53:  <!-- Slot Control -->
19:53:53:  <power v='full'/>
19:53:53:
19:53:53:  <!-- User Information -->
19:53:53:  <team v='237040'/>
19:53:53:  <user v='MrKicker'/>
19:53:53:
19:53:53:  <!-- Folding Slots -->
19:53:53:  <slot id='1' type='GPU'/>
19:53:53:</config>
19:54:20:WU01:FS01:0x22:Completed 0 out of 500000 steps (0%)
19:54:20:WU01:FS01:0x22:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900


(Will edit or delete if I get further problems after dumping)

Re: Project 14253 causing system hang

Posted: Mon Jun 08, 2020 10:23 pm
by bruce
It might be a driver problem or it might be a heat problem. Assuming it's a heat problem, how about monitoring the temperature? 10 seconds is about how long it would take for a GPU to shut itself down if the heatsink is not firmly attached to the GPU with a coat of thermal paste, or if the cooling fan has seized up.

Re: Project 14253 causing system hang

Posted: Tue Jun 09, 2020 12:11 am
by MrKicker
I stress tested the GPU to try to tease out the issue. The temperatures were healthy, and I couldn't get any errors outside of FAH.
I found another crashing WU: Project: 14253 (Run 683, Clone 4, Gen 73)
I've tried underclocking my GPU core and GPU memory to get it to work, but to no avail. I'll dump this one too, and hopefully this will be the end of them.

Re: Project 14253 causing system hang

Posted: Tue Jun 09, 2020 12:24 am
by _r2w_ben
p14253 uses more memory than most GPU projects. Can you try running MemtestCL and see if it reports any errors?
https://simtk.org/projects/memtest
or
https://www.majorgeeks.com/files/details/memtestcl.html

Re: Project 14253 causing system hang

Posted: Tue Jun 09, 2020 1:27 am
by MrKicker
I couldn't get MemtestCL to run on all of my card's memory; it didn't seem to want to allocate it. However, I ran the OCCT memory test for 10 iterations with no errors. I've noticed that the WUs that aren't crashing are getting "Particle coordinate is nan" errors. I'm going to try underclocking a bit again and see if they still occur.

--EDIT--
No luck. Maybe my GPU just suddenly decided it didn't want to fold anymore?
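When several work units misbehave, it can help to list exactly which ones logged the "Particle coordinate is nan" error before reporting them. The following is a minimal Python sketch, not an official FAH tool. It assumes core log lines follow the pattern shown in the log above (`WU01:FS01:0x22:Project: 14253 (Run 768, Clone 4, Gen 57)`). It also assumes the NaN message appears on a line carrying the same `WUxx` slot prefix and contains the phrase "Particle coordinate is nan"; the real wording may differ slightly. The function name `find_nan_work_units` is hypothetical.

```python
import re

# Matches core log lines such as:
# 19:53:33:WU01:FS01:0x22:Project: 14253 (Run 768, Clone 4, Gen 57)
PROJECT_RE = re.compile(
    r"(WU\d+):FS\d+:0x[0-9a-fA-F]+:Project: (\d+) "
    r"\(Run (\d+), Clone (\d+), Gen (\d+)\)"
)
WU_RE = re.compile(r"(WU\d+):")

# Assumption: the NaN error line contains this phrase (compared case-insensitively).
NAN_MARKER = "particle coordinate is nan"


def find_nan_work_units(log_text):
    """Return a sorted list of (project, run, clone, gen) tuples for work
    units whose log lines contained the NaN particle-coordinate message."""
    current = {}  # WU slot id (e.g. "WU01") -> (project, run, clone, gen)
    flagged = set()
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            # A new Project line (re)assigns the WU slot to this work unit.
            current[match.group(1)] = tuple(int(g) for g in match.groups()[1:])
            continue
        if NAN_MARKER in line.lower():
            wu_match = WU_RE.search(line)
            if wu_match and wu_match.group(1) in current:
                flagged.add(current[wu_match.group(1)])
    return sorted(flagged)


if __name__ == "__main__":
    import sys

    # Usage: python find_nan.py path/to/log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for project, run, clone, gen in find_nan_work_units(f.read()):
            print(f"Project {project} (Run {run}, Clone {clone}, Gen {gen})")
```

Keying on the `WUxx` prefix rather than on line order keeps the attribution correct if two slots interleave their output in the same log.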