Work units crashing 2x 1070ti set-up


Postby intercessor » Thu Nov 19, 2020 7:03 pm

Hi,
One of the GPUs fails quite often. I switched the motherboard and am currently on a Prime Z270-P with all drivers up to date. Any ideas?

Thanks much




Code:
17:27:43:WU00:FS02:0x22:  Using CUDA and gpu 0
17:27:44:WU00:FS02:0x22:Completed 0 out of 2000000 steps (0%)
17:28:33:WU01:FS00:0x22:Completed 337500 out of 1250000 steps (27%)
17:29:01:WU00:FS02:0x22:An exception occurred at step 17067: Particle coordinate is nan
17:29:01:WU00:FS02:0x22:Max number of attempts to resume from last checkpoint (2) reached. Aborting.
17:29:01:WU00:FS02:0x22:ERROR:114: Max number of attempts to resume from last checkpoint reached.
17:29:01:WU00:FS02:0x22:Saving result file ..\logfile_01.txt
17:29:01:WU00:FS02:0x22:Saving result file science.log
17:29:01:WU00:FS02:0x22:Saving result file state.xml
17:29:06:WU00:FS02:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
17:29:06:WARNING:WU00:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
17:29:06:WU00:FS02:Sending unit results: id:00 state:SEND error:FAULTY project:14904 run:239 clone:5 gen:188 core:0x22 unit:0x0000010081d59d695f4ec9dfa5d32bbc
17:29:06:WU00:FS02:Uploading 9.53MiB to 129.213.157.105
17:29:06:WU00:FS02:Connecting to 129.213.157.105:8080
17:29:07:WU02:FS02:Connecting to assign1.foldingathome.org:80
17:29:07:WU02:FS02:Assigned to work server 18.188.125.154
17:29:07:WU02:FS02:Requesting new work unit for slot 02: gpu:1:0 GP104 [GeForce GTX 1070 Ti] 8186 from 18.188.125.154
17:29:07:WU02:FS02:Connecting to 18.188.125.154:8080


17:27:32:WU00:FS02:0x22:An exception occurred at step 28111: Particle coordinate is nan
17:27:32:WU00:FS02:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
17:27:32:WU00:FS02:0x22:Folding@home Core Shutdown: CORE_RESTART
17:27:32:WARNING:WU00:FS02:FahCore returned: CORE_RESTART (98 = 0x62)
17:27:32:WU00:FS02:Starting
17:27:32:WU00:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 11520 -checkpoint 5 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
17:27:32:WU00:FS02:Started FahCore on PID 7376
17:27:32:WU00:FS02:Core PID:5028
17:27:32:WU00:FS02:FahCore 0x22 started
17:27:33:WU00:FS02:0x22:*********************** Log Started 2020-11-19T17:27:32Z ***********************
17:27:33:WU00:FS02:0x22:*************************** Core22 Folding@home Core ***************************
17:27:33:WU00:FS02:0x22:       Core: Core22
17:27:33:WU00:FS02:0x22:       Type: 0x22
17:27:33:WU00:FS02:0x22:    Version: 0.0.13
17:27:33:WU00:FS02:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
17:27:33:WU00:FS02:0x22:  Copyright: 2020 foldingathome.org
17:27:33:WU00:FS02:0x22:   Homepage: https://foldingathome.org/
17:27:33:WU00:FS02:0x22:       Date: Sep 19 2020
17:27:33:WU00:FS02:0x22:       Time: 02:35:58
17:27:33:WU00:FS02:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
17:27:33:WU00:FS02:0x22:     Branch: core22-0.0.13
17:27:33:WU00:FS02:0x22:   Compiler: Visual C++ 2015
17:27:33:WU00:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:27:33:WU00:FS02:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
17:27:33:WU00:FS02:0x22:   Platform: win32 10
17:27:33:WU00:FS02:0x22:       Bits: 64
17:27:33:WU00:FS02:0x22:       Mode: Release
17:27:33:WU00:FS02:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
17:27:33:WU00:FS02:0x22:             <peastman@stanford.edu>
17:27:33:WU00:FS02:0x22:       Args: -dir 00 -suffix 01 -version 706 -lifeline 7376 -checkpoint 5
17:27:33:WU00:FS02:0x22:             -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
17:27:33:WU00:FS02:0x22:             nvidia -gpu 0 -gpu-usage 100
17:27:33:WU00:FS02:0x22:************************************ libFAH ************************************
17:27:33:WU00:FS02:0x22:       Date: Sep 7 2020
17:27:33:WU00:FS02:0x22:       Time: 19:09:56
17:27:33:WU00:FS02:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
17:27:33:WU00:FS02:0x22:     Branch: HEAD
17:27:33:WU00:FS02:0x22:   Compiler: Visual C++ 2015
17:27:33:WU00:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:27:33:WU00:FS02:0x22:   Platform: win32 10
17:27:33:WU00:FS02:0x22:       Bits: 64
17:27:33:WU00:FS02:0x22:       Mode: Release
17:27:33:WU00:FS02:0x22:************************************ CBang *************************************
17:27:33:WU00:FS02:0x22:       Date: Sep 7 2020
17:27:33:WU00:FS02:0x22:       Time: 19:08:30
17:27:33:WU00:FS02:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
17:27:33:WU00:FS02:0x22:     Branch: HEAD
17:27:33:WU00:FS02:0x22:   Compiler: Visual C++ 2015
17:27:33:WU00:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
17:27:33:WU00:FS02:0x22:   Platform: win32 10
17:27:33:WU00:FS02:0x22:       Bits: 64
17:27:33:WU00:FS02:0x22:       Mode: Release
17:27:33:WU00:FS02:0x22:************************************ System ************************************
17:27:33:WU00:FS02:0x22:        CPU: Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
17:27:33:WU00:FS02:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
17:27:33:WU00:FS02:0x22:       CPUs: 4
17:27:33:WU00:FS02:0x22:     Memory: 3.95GiB
17:27:33:WU00:FS02:0x22:Free Memory: 1.40GiB
17:27:33:WU00:FS02:0x22:    Threads: WINDOWS_THREADS
17:27:33:WU00:FS02:0x22: OS Version: 6.2
17:27:33:WU00:FS02:0x22:Has Battery: false
17:27:33:WU00:FS02:0x22: On Battery: false
17:27:33:WU00:FS02:0x22: UTC Offset: -6
17:27:33:WU00:FS02:0x22:        PID: 5028
17:27:33:WU00:FS02:0x22:        CWD: C:\ProgramData\FAHClient\work
17:27:33:WU00:FS02:0x22:************************************ OpenMM ************************************
17:27:33:WU00:FS02:0x22:   Revision: 189320d0
17:27:33:WU00:FS02:0x22:********************************************************************************
17:27:33:WU00:FS02:0x22:Project: 14904 (Run 239, Clone 5, Gen 188)
17:27:33:WU00:FS02:0x22:Unit: 0x0000010081d59d695f4ec9dfa5d32bbc
17:27:33:WU00:FS02:0x22:Digital signatures verified
17:27:33:WU00:FS02:0x22:Folding@home GPU Core22 Folding@home Core
17:27:33:WU00:FS02:0x22:Version 0.0.13
17:27:33:WU00:FS02:0x22:  Checkpoint write interval: 100000 steps (5%) [20 total]
17:27:33:WU00:FS02:0x22:  JSON viewer frame write interval: 20000 steps (1%) [100 total]
17:27:33:WU00:FS02:0x22:  XTC frame write interval: 50000 steps (2.5%) [40 total]
17:27:33:WU00:FS02:0x22:  Global context and integrator variables write interval: disabled
17:27:33:WU00:FS02:0x22:There are 4 platforms available.
17:27:33:WU00:FS02:0x22:Platform 0: Reference
17:27:33:WU00:FS02:0x22:Platform 1: CPU
17:27:33:WU00:FS02:0x22:Platform 2: OpenCL
17:27:33:WU00:FS02:0x22:  opencl-device 0 specified
17:27:33:WU00:FS02:0x22:Platform 3: CUDA
17:27:33:WU00:FS02:0x22:  cuda-device 0 specified
17:27:40:WU00:FS02:0x22:Attempting to create CUDA context:
17:27:40:WU00:FS02:0x22:  Configuring platform CUDA
17:27:43:WU00:FS02:0x22:  Using CUDA and gpu 0
17:27:44:WU00:FS02:0x22:Completed 0 out of 2000000 steps (0%)
17:28:33:WU01:FS00:0x22:Completed 337500 out of 1250000 steps (27%)
intercessor
 
Posts: 4
Joined: Thu Nov 19, 2020 6:54 pm

Re: Work units crashing 2x 1070ti set-up

Postby JimboPalmer » Thu Nov 19, 2020 8:15 pm

Welcome to Folding@Home!
The first 200 lines of your log tell us a great deal about your setup.
If you post that, many folks will be able to help.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
JimboPalmer
 
Posts: 2074
Joined: Mon Feb 16, 2009 5:12 am
Location: Greenwood MS USA

Re: Work units crashing 2x 1070ti set-up

Postby intercessor » Thu Nov 19, 2020 8:22 pm

Hi,
I think it's there; I pasted the error messages in front of the system setup.
Here is a different crash...
Code:
19:52:11:WU01:FS02:0x22:*********************** Log Started 2020-11-19T19:52:11Z ***********************
19:52:11:WU01:FS02:0x22:*************************** Core22 Folding@home Core ***************************
19:52:11:WU01:FS02:0x22:       Core: Core22
19:52:11:WU01:FS02:0x22:       Type: 0x22
19:52:11:WU01:FS02:0x22:    Version: 0.0.13
19:52:11:WU01:FS02:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:52:11:WU01:FS02:0x22:  Copyright: 2020 foldingathome.org
19:52:11:WU01:FS02:0x22:   Homepage: https://foldingathome.org/
19:52:11:WU01:FS02:0x22:       Date: Sep 19 2020
19:52:11:WU01:FS02:0x22:       Time: 02:35:58
19:52:11:WU01:FS02:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
19:52:11:WU01:FS02:0x22:     Branch: core22-0.0.13
19:52:11:WU01:FS02:0x22:   Compiler: Visual C++ 2015
19:52:11:WU01:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:52:11:WU01:FS02:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
19:52:11:WU01:FS02:0x22:   Platform: win32 10
19:52:11:WU01:FS02:0x22:       Bits: 64
19:52:11:WU01:FS02:0x22:       Mode: Release
19:52:11:WU01:FS02:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
19:52:11:WU01:FS02:0x22:             <peastman@stanford.edu>
19:52:11:WU01:FS02:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 5076 -checkpoint 5
19:52:11:WU01:FS02:0x22:             -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor
19:52:11:WU01:FS02:0x22:             nvidia -gpu 0 -gpu-usage 100
19:52:11:WU01:FS02:0x22:************************************ libFAH ************************************
19:52:11:WU01:FS02:0x22:       Date: Sep 7 2020
19:52:11:WU01:FS02:0x22:       Time: 19:09:56
19:52:11:WU01:FS02:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
19:52:11:WU01:FS02:0x22:     Branch: HEAD
19:52:11:WU01:FS02:0x22:   Compiler: Visual C++ 2015
19:52:11:WU01:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:52:11:WU01:FS02:0x22:   Platform: win32 10
19:52:11:WU01:FS02:0x22:       Bits: 64
19:52:11:WU01:FS02:0x22:       Mode: Release
19:52:11:WU01:FS02:0x22:************************************ CBang *************************************
19:52:11:WU01:FS02:0x22:       Date: Sep 7 2020
19:52:11:WU01:FS02:0x22:       Time: 19:08:30
19:52:11:WU01:FS02:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
19:52:11:WU01:FS02:0x22:     Branch: HEAD
19:52:11:WU01:FS02:0x22:   Compiler: Visual C++ 2015
19:52:11:WU01:FS02:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
19:52:11:WU01:FS02:0x22:   Platform: win32 10
19:52:11:WU01:FS02:0x22:       Bits: 64
19:52:11:WU01:FS02:0x22:       Mode: Release
19:52:11:WU01:FS02:0x22:************************************ System ************************************
19:52:11:WU01:FS02:0x22:        CPU: Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
19:52:11:WU01:FS02:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
19:52:11:WU01:FS02:0x22:       CPUs: 4
19:52:11:WU01:FS02:0x22:     Memory: 3.95GiB
19:52:11:WU01:FS02:0x22:Free Memory: 2.03GiB
19:52:11:WU01:FS02:0x22:    Threads: WINDOWS_THREADS
19:52:11:WU01:FS02:0x22: OS Version: 6.2
19:52:11:WU01:FS02:0x22:Has Battery: false
19:52:11:WU01:FS02:0x22: On Battery: false
19:52:11:WU01:FS02:0x22: UTC Offset: -6
19:52:11:WU01:FS02:0x22:        PID: 9232
19:52:11:WU01:FS02:0x22:        CWD: C:\ProgramData\FAHClient\work
19:52:11:WU01:FS02:0x22:************************************ OpenMM ************************************
19:52:11:WU01:FS02:0x22:   Revision: 189320d0
19:52:11:WU01:FS02:0x22:********************************************************************************
19:52:11:WU01:FS02:0x22:Project: 17420 (Run 0, Clone 883, Gen 69)
19:52:11:WU01:FS02:0x22:Unit: 0x0000005580fccb090000000000000373
19:52:11:WU01:FS02:0x22:Reading tar file core.xml
19:52:11:WU01:FS02:0x22:Reading tar file integrator.xml.bz2
19:52:11:WU01:FS02:0x22:Reading tar file state.xml.bz2
19:52:11:WU01:FS02:0x22:Reading tar file system.xml.bz2
19:52:11:WU01:FS02:0x22:Digital signatures verified
19:52:11:WU01:FS02:0x22:Folding@home GPU Core22 Folding@home Core
19:52:11:WU01:FS02:0x22:Version 0.0.13
19:52:11:WU01:FS02:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
19:52:11:WU01:FS02:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
19:52:11:WU01:FS02:0x22:  XTC frame write interval: 10000 steps (0.8%) [125 total]
19:52:11:WU01:FS02:0x22:  Global context and integrator variables write interval: disabled
19:52:11:WU01:FS02:0x22:There are 4 platforms available.
19:52:11:WU01:FS02:0x22:Platform 0: Reference
19:52:11:WU01:FS02:0x22:Platform 1: CPU
19:52:11:WU01:FS02:0x22:Platform 2: OpenCL
19:52:11:WU01:FS02:0x22:  opencl-device 0 specified
19:52:11:WU01:FS02:0x22:Platform 3: CUDA
19:52:11:WU01:FS02:0x22:  cuda-device 0 specified
19:52:12:WU00:FS02:Upload complete
19:52:12:WU00:FS02:Server responded WORK_ACK (400)
19:52:12:WU00:FS02:Cleaning up
19:52:20:WU01:FS02:0x22:Attempting to create CUDA context:
19:52:20:WU01:FS02:0x22:  Configuring platform CUDA
19:52:24:WU01:FS02:0x22:  Using CUDA and gpu 0
19:52:24:WU01:FS02:0x22:Completed 0 out of 1250000 steps (0%)
19:52:25:WU01:FS02:0x22:Checkpoint completed at step 0
19:53:37:WU01:FS02:0x22:Completed 12500 out of 1250000 steps (1%)
19:54:06:WU01:FS02:0x22:An exception occurred at step 16981: Error invoking kernel: CUDA_ERROR_ILLEGAL_ADDRESS (700)
19:54:06:WU01:FS02:0x22:ERROR:98: Attempting to restart from last good checkpoint by restarting core.
19:54:06:WU01:FS02:0x22:Folding@home Core Shutdown: CORE_RESTART
19:54:18:WARNING:WU01:FS02:FahCore returned an unknown error code which probably indicates that it crashed
19:54:18:WARNING:WU01:FS02:FahCore returned: UNKNOWN_ENUM (-1073740791 = 0xc0000409)
19:54:18:WU01:FS02:Starting
19:54:18:WU01:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 11520 -checkpoint 5 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
19:54:18:WU01:FS02:Started FahCore on PID 7800
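
A side note on the two numbers in the UNKNOWN_ENUM warning: they are the same exit code, printed once as a signed 32-bit integer and once in hex. Here is a minimal sketch of that conversion. The function and table names are made up for this example, and the lookup table is illustrative only: the names for 0x62 and 0x72 are copied from the logs in this thread, and 0xC0000409 is the Windows status STATUS_STACK_BUFFER_OVERRUN.

```python
# Illustrative lookup only. The names for 0x62 and 0x72 come from the logs
# in this thread; 0xC0000409 is the Windows NTSTATUS value
# STATUS_STACK_BUFFER_OVERRUN. This is not a complete list of FahCore codes.
KNOWN_CODES = {
    0x62: "CORE_RESTART",
    0x72: "BAD_WORK_UNIT",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


def describe(code: int) -> str:
    """Format an exit code the way the client log does, plus a name if known."""
    value = to_unsigned32(code)
    name = KNOWN_CODES.get(value, "unknown")
    return f"{code} = {value:#x} ({name})"


print(describe(-1073740791))
print(describe(114))
```

A Windows status code in that position, rather than a small FahCore return value, fits the client's own remark that the core probably crashed instead of shutting down cleanly.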
intercessor
 
Posts: 4
Joined: Thu Nov 19, 2020 6:54 pm

Re: Work units crashing 2x 1070ti set-up

Postby JimboPalmer » Fri Nov 20, 2020 12:30 am

If you do post the first 200 lines of the log (not the error), we will know what CPU you have, how much RAM, what OS, which GPUs, which drivers, what your configuration is in F@H, etc.

It really will help.

While we wait for the front of the log, did you add any hardware after you installed the F@H software? If so, you may wish to uninstall, including data, and reinstall. Traditionally the installer has been better at detecting hardware than the client.
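
For anyone wondering what the front of the log gives away, here is a rough sketch that pulls the CPU, memory, OS, GPU and driver lines out of the first 200 lines of a client log. It assumes the header uses the "timestamp, padded key, colon, value" layout seen in FAHClient 7.6.x logs; the function name and the list of wanted keys are choices made for this example, not anything the client defines.

```python
import re

# Timestamp, optional padding, a key that contains no colon, then the value.
HEADER_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}:\s*([^:]+?):\s*(.*)$")

# Keys picked for this example; adjust to taste.
WANTED = ("CPU", "Memory", "OS", "OS Arch", "GPUs", "Version")
WANTED_PREFIXES = ("GPU ", "CUDA Device", "OpenCL Device")


def summarize_header(log_text, max_lines=200):
    """Collect selected 'key: value' pairs from the start of a client log."""
    summary = {}
    for line in log_text.splitlines()[:max_lines]:
        match = HEADER_LINE.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2)
        if key in WANTED or key.startswith(WANTED_PREFIXES):
            # Keep the first occurrence; later sections repeat some keys.
            summary.setdefault(key, value)
    return summary


sample = """\
20:40:41:        Version: 7.6.21
20:40:41:            CPU: Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
20:40:41:         Memory: 3.95GiB
20:40:41:             OS: Windows 10 Home
20:40:41:           GPUs: 2
20:40:41:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 GP104 [GeForce GTX 1070 Ti] 8186
20:40:41:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:11.1
20:40:41:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:457.30
20:40:41:WU00:FS00:Connecting to assign1.foldingathome.org:80
"""

summary = summarize_header(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Work-unit lines such as the last one in the sample also match the pattern, which is why the result is filtered down to a fixed set of keys.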
JimboPalmer
 
Posts: 2074
Joined: Mon Feb 16, 2009 5:12 am
Location: Greenwood MS USA

Re: Work units crashing 2x 1070ti set-up

Postby intercessor » Fri Nov 20, 2020 1:45 am

Hopefully, this is what you are looking for...?
I had a Biostar motherboard on which only one GPU would run; or rather, both would fold and within a few minutes one would crash. So I switched to the Prime Z270-P, but I have the same issue. Windows 10 and the NVIDIA drivers are all up to date.

Good day

Code:
******************** Log Started 2020-11-19T20:40:41Z ***********************
20:40:41:******************************* libFAH ********************************
20:40:41:           Date: Oct 20 2020
20:40:41:           Time: 13:36:55
20:40:41:       Revision: 5ca109d295a6245e2a2f590b3d0085ad5e567aeb
20:40:41:         Branch: master
20:40:41:       Compiler: Visual C++ 2015
20:40:41:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
20:40:41:       Platform: win32 10
20:40:41:           Bits: 32
20:40:41:           Mode: Release
20:40:41:****************************** FAHClient ******************************
20:40:41:        Version: 7.6.21
20:40:41:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
20:40:41:      Copyright: 2020 foldingathome.org
20:40:41:       Homepage: https://foldingathome.org/
20:40:41:           Date: Oct 20 2020
20:40:41:           Time: 13:41:04
20:40:41:       Revision: 6efbf0e138e22d3963e6a291f78dcb9c6422a278
20:40:41:         Branch: master
20:40:41:       Compiler: Visual C++ 2015
20:40:41:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
20:40:41:       Platform: win32 10
20:40:41:           Bits: 32
20:40:41:           Mode: Release
20:40:41:         Config: C:\ProgramData\FAHClient\config.xml
20:40:41:******************************** CBang ********************************
20:40:41:           Date: Oct 20 2020
20:40:41:           Time: 11:36:18
20:40:41:       Revision: 7e4ce85225d7eaeb775e87c31740181ca603de60
20:40:41:         Branch: master
20:40:41:       Compiler: Visual C++ 2015
20:40:41:        Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Zc:throwingNew /MT
20:40:41:       Platform: win32 10
20:40:41:           Bits: 32
20:40:41:           Mode: Release
20:40:41:******************************* System ********************************
20:40:41:            CPU: Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
20:40:41:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
20:40:41:           CPUs: 4
20:40:41:         Memory: 3.95GiB
20:40:41:    Free Memory: 1.53GiB
20:40:41:        Threads: WINDOWS_THREADS
20:40:41:     OS Version: 6.2
20:40:41:    Has Battery: false
20:40:41:     On Battery: false
20:40:41:     UTC Offset: -6
20:40:41:            PID: 10492
20:40:41:            CWD: C:\ProgramData\FAHClient
20:40:41:  Win32 Service: false
20:40:41:             OS: Windows 10 Home
20:40:41:        OS Arch: AMD64
20:40:41:           GPUs: 2
20:40:41:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:7 GP104 [GeForce GTX 1070 Ti] 8186
20:40:41:          GPU 1: Bus:3 Slot:0 Func:0 NVIDIA:7 GP104 [GeForce GTX 1070 Ti] 8186
20:40:41:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:11.1
20:40:41:  CUDA Device 1: Platform:0 Device:1 Bus:3 Slot:0 Compute:6.1 Driver:11.1
20:40:41:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:457.30
20:40:41:OpenCL Device 1: Platform:0 Device:1 Bus:3 Slot:0 Compute:1.2 Driver:457.30
20:40:41:***********************************************************************
20:40:41:<config>
20:40:41:  <!-- Folding Core -->
20:40:41:  <checkpoint v='5'/>
20:40:41:  <core-priority v='low'/>
20:40:41:
20:40:41:  <!-- Folding Slot Configuration -->
20:40:41:  <cause v='HIGH_PRIORITY'/>
20:40:41:
20:40:41:  <!-- Network -->
20:40:41:  <proxy v=':8080'/>
20:40:41:
20:40:41:  <!-- Slot Control -->
20:40:41:  <pause-on-battery v='false'/>
20:40:41:  <power v='FULL'/>
20:40:41:
20:40:41:  <!-- User Information -->
20:40:41:  <passkey v='*****'/>
20:40:41:  <team v='36837'/>
20:40:41:  <user v='intercessor'/>
20:40:41:
20:40:41:  <!-- Folding Slots -->
20:40:41:  <slot id='0' type='GPU'/>
20:40:41:  <slot id='1' type='GPU'/>
20:40:41:</config>
20:40:41:Trying to access database...
20:40:41:Successfully acquired database lock
20:40:41:FS00:Initialized folding slot 00: gpu:1:0 GP104 [GeForce GTX 1070 Ti] 8186
20:40:41:FS01:Initialized folding slot 01: gpu:3:0 GP104 [GeForce GTX 1070 Ti] 8186
20:40:41:WU00:FS00:Connecting to assign1.foldingathome.org:80
20:40:41:WU01:FS01:Connecting to assign1.foldingathome.org:80
20:40:42:WU00:FS00:Assigned to work server 129.213.157.105
20:40:42:WU00:FS00:Requesting new work unit for slot 00: gpu:1:0 GP104 [GeForce GTX 1070 Ti] 8186 from 129.213.157.105
20:40:42:WU01:FS01:Assigned to work server 18.188.125.154
20:40:42:WU00:FS00:Connecting to 129.213.157.105:8080
20:40:42:WU01:FS01:Requesting new work unit for slot 01: gpu:3:0 GP104 [GeForce GTX 1070 Ti] 8186 from 18.188.125.154
20:40:42:WU01:FS01:Connecting to 18.188.125.154:8080
20:40:43:WU01:FS01:Downloading 7.52MiB
20:40:46:ERROR:WU00:FS00:Exception: 10002: Received short response, expected 512 bytes, got 0
20:40:46:ERROR:WU01:FS01:Exception: Transfer failed
20:40:47:WU00:FS00:Connecting to assign1.foldingathome.org:80
20:40:47:WU01:FS01:Connecting to assign1.foldingathome.org:80
20:40:47:WU00:FS00:Assigned to work server 129.213.157.105
20:40:47:WU00:FS00:Requesting new work unit for slot 00: gpu:1:0 GP104 [GeForce GTX 1070 Ti] 8186 from 129.213.157.105
20:40:47:WU01:FS01:Assigned to work server 128.252.203.9
20:40:47:WU00:FS00:Connecting to 129.213.157.105:8080
20:40:47:WU01:FS01:Requesting new work unit for slot 01: gpu:3:0 GP104 [GeForce GTX 1070 Ti] 8186 from 128.252.203.9
20:40:47:WU01:FS01:Connecting to 128.252.203.9:8080
20:40:47:WU01:FS01:Downloading 12.49MiB
20:40:51:WU01:FS01:Download complete
20:40:51:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:17421 run:0 clone:241 gen:70 core:0x22 unit:0x0000005380fccb0900000000000000f1
20:40:51:WU01:FS01:Starting
20:40:51:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 01 -suffix 01 -version 706 -lifeline 10492 -checkpoint 5 -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu-vendor nvidia -gpu 1 -gpu-usage 100
20:40:51:WU01:FS01:Started FahCore on PID 9832
20:40:51:WU01:FS01:Core PID:3632
20:40:51:WU01:FS01:FahCore 0x22 started
20:40:52:WU01:FS01:0x22:*********************** Log Started 2020-11-19T20:40:52Z ***********************
20:40:52:WU01:FS01:0x22:*************************** Core22 Folding@home Core ***************************
20:40:52:WU01:FS01:0x22:       Core: Core22
20:40:52:WU01:FS01:0x22:       Type: 0x22
20:40:52:WU01:FS01:0x22:    Version: 0.0.13
20:40:52:WU01:FS01:0x22:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
20:40:52:WU01:FS01:0x22:  Copyright: 2020 foldingathome.org
20:40:52:WU01:FS01:0x22:   Homepage: https://foldingathome.org/
20:40:52:WU01:FS01:0x22:       Date: Sep 19 2020
20:40:52:WU01:FS01:0x22:       Time: 02:35:58
20:40:52:WU01:FS01:0x22:   Revision: 571cf95de6de2c592c7c3ed48fcfb2e33e9ea7d3
20:40:52:WU01:FS01:0x22:     Branch: core22-0.0.13
20:40:52:WU01:FS01:0x22:   Compiler: Visual C++ 2015
20:40:52:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
20:40:52:WU01:FS01:0x22:             -DOPENMM_GIT_HASH="\"189320d0\""
20:40:52:WU01:FS01:0x22:   Platform: win32 10
20:40:52:WU01:FS01:0x22:       Bits: 64
20:40:52:WU01:FS01:0x22:       Mode: Release
20:40:52:WU01:FS01:0x22:Maintainers: John Chodera <john.chodera@choderalab.org> and Peter Eastman
20:40:52:WU01:FS01:0x22:             <peastman@stanford.edu>
20:40:52:WU01:FS01:0x22:       Args: -dir 01 -suffix 01 -version 706 -lifeline 9832 -checkpoint 5
20:40:52:WU01:FS01:0x22:             -opencl-platform 0 -opencl-device 1 -cuda-device 1 -gpu-vendor
20:40:52:WU01:FS01:0x22:             nvidia -gpu 1 -gpu-usage 100
20:40:52:WU01:FS01:0x22:************************************ libFAH ************************************
20:40:52:WU01:FS01:0x22:       Date: Sep 7 2020
20:40:52:WU01:FS01:0x22:       Time: 19:09:56
20:40:52:WU01:FS01:0x22:   Revision: 44301ed97b996b63fe736bb8073f22209cb2b603
20:40:52:WU01:FS01:0x22:     Branch: HEAD
20:40:52:WU01:FS01:0x22:   Compiler: Visual C++ 2015
20:40:52:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
20:40:52:WU01:FS01:0x22:   Platform: win32 10
20:40:52:WU01:FS01:0x22:       Bits: 64
20:40:52:WU01:FS01:0x22:       Mode: Release
20:40:52:WU01:FS01:0x22:************************************ CBang *************************************
20:40:52:WU01:FS01:0x22:       Date: Sep 7 2020
20:40:52:WU01:FS01:0x22:       Time: 19:08:30
20:40:52:WU01:FS01:0x22:   Revision: 33fcfc2b3ed2195a423606a264718e31e6b3903f
20:40:52:WU01:FS01:0x22:     Branch: HEAD
20:40:52:WU01:FS01:0x22:   Compiler: Visual C++ 2015
20:40:52:WU01:FS01:0x22:    Options: /TP /nologo /EHa /wd4297 /wd4103 /O2 /Ob3 /Zc:throwingNew /MT
20:40:52:WU01:FS01:0x22:   Platform: win32 10
20:40:52:WU01:FS01:0x22:       Bits: 64
20:40:52:WU01:FS01:0x22:       Mode: Release
20:40:52:WU01:FS01:0x22:************************************ System ************************************
20:40:52:WU01:FS01:0x22:        CPU: Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
20:40:52:WU01:FS01:0x22:     CPU ID: GenuineIntel Family 6 Model 158 Stepping 9
20:40:52:WU01:FS01:0x22:       CPUs: 4
20:40:52:WU01:FS01:0x22:     Memory: 3.95GiB
20:40:52:WU01:FS01:0x22:Free Memory: 1.62GiB
20:40:52:WU01:FS01:0x22:    Threads: WINDOWS_THREADS
20:40:52:WU01:FS01:0x22: OS Version: 6.2
20:40:52:WU01:FS01:0x22:Has Battery: false
20:40:52:WU01:FS01:0x22: On Battery: false
20:40:52:WU01:FS01:0x22: UTC Offset: -6
20:40:52:WU01:FS01:0x22:        PID: 3632
20:40:52:WU01:FS01:0x22:        CWD: C:\ProgramData\FAHClient\work
20:40:52:WU01:FS01:0x22:************************************ OpenMM ************************************
20:40:52:WU01:FS01:0x22:   Revision: 189320d0
20:40:52:WU01:FS01:0x22:********************************************************************************
20:40:52:WU01:FS01:0x22:Project: 17421 (Run 0, Clone 241, Gen 70)
20:40:52:WU01:FS01:0x22:Unit: 0x0000005380fccb0900000000000000f1
20:40:52:WU01:FS01:0x22:Reading tar file core.xml
20:40:52:WU01:FS01:0x22:Reading tar file integrator.xml.bz2
20:40:52:WU01:FS01:0x22:Reading tar file state.xml.bz2
20:40:52:WU01:FS01:0x22:Reading tar file system.xml.bz2
20:40:52:WU01:FS01:0x22:Digital signatures verified
20:40:52:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
20:40:52:WU01:FS01:0x22:Version 0.0.13
20:40:52:WU01:FS01:0x22:  Checkpoint write interval: 25000 steps (2%) [50 total]
20:40:52:WU01:FS01:0x22:  JSON viewer frame write interval: 12500 steps (1%) [100 total]
20:40:52:WU01:FS01:0x22:  XTC frame write interval: 10000 steps (0.8%) [125 total]
20:40:52:WU01:FS01:0x22:  Global context and integrator variables write interval: disabled
20:40:53:WU00:FS00:Downloading 17.29MiB
20:40:53:WU01:FS01:0x22:There are 4 platforms available.
20:40:53:WU01:FS01:0x22:Platform 0: Reference
20:40:53:WU01:FS01:0x22:Platform 1: CPU
20:40:53:WU01:FS01:0x22:Platform 2: OpenCL
20:40:53:WU01:FS01:0x22:  opencl-device 1 specified
20:40:53:WU01:FS01:0x22:Platform 3: CUDA
20:40:53:WU01:FS01:0x22:  cuda-device 1 specified
20:40:55:WU00:FS00:Download complete
20:40:56:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:14909 run:492 clone:5 gen:134 core:0x22 unit:0x000000ac81d59d695f52601e02dc0f4d
20:40:56:WU00:FS00:Starting
20:40:56:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:\ProgramData\FAHClient\cores/cores.foldingathome.org/win/64bit/22-0.0.13/Core_22.fah/FahCore_22.exe -dir 00 -suffix 01 -version 706 -lifeline 10492 -checkpoint 5 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
20:40:56:WU00:FS00:Started FahCore on PID 1368
intercessor
 
Posts: 4
Joined: Thu Nov 19, 2020 6:54 pm

Re: Work units crashing 2x 1070ti set-up

Postby JimboPalmer » Fri Nov 20, 2020 1:42 pm

Sadly, I see no errors in your setup.

Have you tested one card at a time to see if it is hardware?
JimboPalmer
 
Posts: 2074
Joined: Mon Feb 16, 2009 5:12 am
Location: Greenwood MS USA

Re: Work units crashing 2x 1070ti set-up

Postby Neil-B » Fri Nov 20, 2020 3:06 pm

Three thoughts: hardware, power, or temperature related?

Is it always the same GPU crashing? If so, it might just be hardware being pushed too hard by certain types of WU (the topics on altering clocks etc. might help).

Is it always the GPU in a certain motherboard slot, such as the top one? That might indicate a thermal problem. The delay before failure could point to this, or to the power case below, as heat/load builds up and trips.

Does it vary which GPU crashes? That might indicate some kind of power instability if the supply is right near its max.

Having said all that, I am not a GPU folding guru, so there may be better advice available ;)
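
One way to answer the "is it always the same GPU" question is to count core exceptions per folding slot in the log. The sketch below assumes the line layout seen in the logs earlier in this thread (timestamp, WU, FS slot, core type, then "An exception occurred at step N: ..."); the function name is made up for this example.

```python
import re
from collections import Counter

# Matches core exception lines such as:
# 17:29:01:WU00:FS02:0x22:An exception occurred at step 17067: Particle coordinate is nan
EXCEPTION = re.compile(
    r"^\d{2}:\d{2}:\d{2}:WU\d+:(FS\d+):0x[0-9a-fA-F]+:"
    r"An exception occurred at step (\d+): (.*)$"
)


def count_exceptions_by_slot(log_text):
    """Return a Counter mapping folding slot (e.g. 'FS02') to exception count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXCEPTION.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


sample = """\
17:28:33:WU01:FS00:0x22:Completed 337500 out of 1250000 steps (27%)
17:29:01:WU00:FS02:0x22:An exception occurred at step 17067: Particle coordinate is nan
19:54:06:WU01:FS02:0x22:An exception occurred at step 16981: Error invoking kernel: CUDA_ERROR_ILLEGAL_ADDRESS (700)
"""

counts = count_exceptions_by_slot(sample)
for slot, n in counts.most_common():
    print(slot, n)
```

Slot numbers can change after a reinstall, so it is worth matching each slot to its bus ID (the "gpu:1:0" or "gpu:3:0" part of the slot lines) before concluding which physical card is failing.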
1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro, Quadro M1000M 2GB, FAH 7.6.21
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro, GTX 750Ti 2GB, FAH 7.6.21
Neil-B
 
Posts: 1504
Joined: Sun Mar 22, 2020 6:52 pm
Location: UK

Re: Work units crashing 2x 1070ti set-up

Postby intercessor » Fri Nov 20, 2020 11:50 pm

I got around to downclocking the GPU by 100 MHz, and so far it has completed 13% with no issues. It was normally crashing after 1%... Fingers crossed.

Thanks for the suggestions.
intercessor
 
Posts: 4
Joined: Thu Nov 19, 2020 6:54 pm

