Bad Work Unit - need good stress test?


Postby KA1J » Sat May 23, 2020 10:23 pm

I've run into another bad work unit, seen below at 16:33. The first bad work unit/crash was, I was told, picked up by another cruncher and completed successfully.

I have tried to stress test the GPU and every attempt has passed. The first of the two tests was FAHBench using OpenCL, single precision, the DHFR WU and NaN disabled. It ran for 30 minutes with no drama.
Results: Score 179.037 & 23558 atoms.

I ran Unigine Heaven with the below results:

[Image: Unigine Heaven benchmark results]

I do find that when I run benchmarks, the GPU runs a few degrees C cooler than when I run F@H. Since benchmarking gives me no trouble and the tests report no errors, I'm stumped. Is there a test that challenges the GPU as much as F@H does and will tell me whether any errors are coming up?

Searching this forum, I found a reference to dealing with this error by changing Configure > Slots > GPU > Edit OpenCL-index from -1 to 1. I'm asking here before I make that change.

Also, is there a better F@H-compatible stress test?

Thanks


Code:
*********************** Log Started 2020-05-23T15:37:48Z ***********************
16:18:43:WARNING:WU01:FS00:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
16:18:43:WARNING:WU01:FS00:Failed to get assignment from 'assign2.foldingathome.org:80': No WUs available for this configuration
16:18:43:WARNING:WU01:FS00:Failed to get assignment from 'assign3.foldingathome.org:80': No WUs available for this configuration
16:20:41:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
16:21:02:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
16:21:23:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
16:21:44:WARNING:WU00:FS00:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
16:22:24:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
16:23:48:WARNING:WU00:FS00:Exception: Failed to send results to work server: Transfer failed
16:33:22:WU01:FS00:0x22:ERROR:exception: clWaitForEvents
16:33:26:WARNING:WU01:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
16:33:27:WARNING:WU00:FS00:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
16:33:27:WARNING:WU00:FS00:Failed to get assignment from 'assign2.foldingathome.org:80': No WUs available for this configuration
16:33:27:WARNING:WU00:FS00:Failed to get assignment from 'assign3.foldingathome.org:80': No WUs available for this configuration
16:33:27:WARNING:WU00:FS00:Failed to get assignment from 'assign4.foldingathome.org:80': No WUs available for this configuration
16:33:27:ERROR:WU00:FS00:Exception: Could not get an assignment
16:33:27:WARNING:WU00:FS00:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
16:33:47:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
16:34:08:WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
16:34:23:WARNING:WU01:FS00:Exception: Failed to send results to work server: Transfer failed
16:35:29:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
16:35:50:WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
16:36:19:WU00:FS00:0x22:ERROR:exception: clWaitForEvents
16:36:20:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
16:36:20:WARNING:WU02:FS00:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
16:36:21:WARNING:WU02:FS00:Failed to get assignment from 'assign2.foldingathome.org:80': No WUs available for this configuration
16:36:21:WARNING:WU02:FS00:Failed to get assignment from 'assign3.foldingathome.org:80': No WUs available for this configuration
16:36:21:WARNING:WU02:FS00:Failed to get assignment from 'assign4.foldingathome.org:80': No WUs available for this configuration
16:36:21:ERROR:WU02:FS00:Exception: Could not get an assignment
16:36:21:WARNING:WU02:FS00:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
16:36:22:WARNING:WU02:FS00:Failed to get assignment from 'assign2.foldingathome.org:80': No WUs available for this configuration
16:36:22:WARNING:WU02:FS00:Failed to get assignment from 'assign3.foldingathome.org:80': No WUs available for this configuration
16:36:22:WARNING:WU02:FS00:Failed to get assignment from 'assign4.foldingathome.org:80': No WUs available for this configuration
16:36:22:ERROR:WU02:FS00:Exception: Could not get an assignment
16:36:41:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
16:37:06:WARNING:WU01:FS00:WorkServer connection failed on port 8080 trying 80
16:37:21:WARNING:WU02:FS00:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
16:37:21:WARNING:WU02:FS00:Failed to get assignment from 'assign2.foldingathome.org:80': No WUs available for this configuration
16:37:22:WARNING:WU02:FS00:Failed to get assignment from 'assign3.foldingathome.org:80': No WUs available for this configuration
16:37:22:WARNING:WU02:FS00:Failed to get assignment from 'assign4.foldingathome.org:80': No WUs available for this configuration
16:37:22:ERROR:WU02:FS00:Exception: Could not get an assignment
16:37:28:WARNING:WU01:FS00:Exception: Failed to send results to work server: Failed to connect to 3.133.76.19:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
16:38:58:WARNING:WU02:FS00:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
16:38:59:WARNING:WU02:FS00:Failed to get assignment from 'assign2.foldingathome.org:80': No WUs available for this configuration
16:38:59:WARNING:WU02:FS00:Failed to get assignment from 'assign3.foldingathome.org:80': No WUs available for this configuration
16:38:59:WARNING:WU02:FS00:Failed to get assignment from 'assign4.foldingathome.org:80': No WUs available for this configuration
16:38:59:ERROR:WU02:FS00:Exception: Could not get an assignment
ROG Strix Z390-E Motherboard
i7-8700K @ 5100 MHz
32 Gig GSkill RAM
NVIDIA GeForce GTX 1080
KA1J
 
Posts: 43
Joined: Tue Mar 10, 2020 8:53 am
Location: CT

Re: Bad Work Unit - need good stress test?

Postby PantherX » Sun May 24, 2020 12:59 am

Please note that we need to see the entire section of the log file just before the failure, during the failure and after the failure. Unfortunately, this line doesn't tell me what WU it was:
16:36:20:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
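
As a rough illustration of collecting the before/during/after section, here is a minimal Python sketch. It assumes the log has already been read into a list of lines and that failures appear in the `WUxx:FSxx:FahCore returned: BAD_WORK_UNIT` form shown above. The function name `failure_context` and the window sizes are hypothetical choices, not anything the client defines.

```python
import re

# Matches the failure line format quoted above, e.g.
# "16:36:20:WARNING:WU00:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)"
FAILURE = re.compile(r"(WU\d+):(FS\d+):FahCore returned: BAD_WORK_UNIT")


def failure_context(lines, before=50, after=20):
    """Return (wu, slot, excerpt) for every BAD_WORK_UNIT line.

    `before` and `after` are arbitrary window sizes (assumed defaults).
    """
    results = []
    for i, line in enumerate(lines):
        match = FAILURE.search(line)
        if match:
            start = max(0, i - before)
            # Slice covers the lines before, the failure line, and the lines after.
            excerpt = lines[start:i + after + 1]
            results.append((match.group(1), match.group(2), excerpt))
    return results
```

Feeding it the lines of the current log gives one excerpt per failure, which can then be pasted into a post in full.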

Generally speaking, most stress-test and benchmarking applications are less stressful than F@H. Those applications focus on the rendering aspects of the GPU, while folding focuses on the compute aspects. Hence, it is tough to measure the stability of a GPU for folding using third-party applications. The current version of FAHBench doesn't support FahCore_22, but there are plans to get to it once a new version of FahCore_22 is released (no ETA).

BTW, can you also please post the first 100 lines of the log file which will contain the details of your system and the client configuration?
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
PantherX
Site Moderator
 
Posts: 6343
Joined: Wed Dec 23, 2009 10:33 am
Location: Land Of The Long White Cloud

Re: Bad Work Unit - need good stress test?

Postby KA1J » Sun May 24, 2020 1:44 am

That's all that's left of the info from when the bad work unit appeared; I can't post any more from that one. F@H isn't delivering WUs, so I just closed down F@H and restarted it. Here's the log.

Code:
*********************** Log Started 2020-05-24T00:32:06Z ***********************
00:32:06:Trying to access database...
00:32:06:Successfully acquired database lock
00:32:06:Read GPUs.txt
00:32:07:Enabled folding slot 00: READY gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448
00:32:07:****************************** FAHClient ******************************
00:32:07:        Version: 7.6.13
00:32:07:         Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:32:07:      Copyright: 2020 foldingathome.org
00:32:07:       Homepage: https://foldingathome.org/
00:32:07:           Date: Apr 27 2020
00:32:07:           Time: 21:21:01
00:32:07:       Revision: 5a652817f46116b6e135503af97f18e094414e3b
00:32:07:         Branch: master
00:32:07:       Compiler: Visual C++ 2008
00:32:07:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:32:07:       Platform: win32 10
00:32:07:           Bits: 32
00:32:07:           Mode: Release
00:32:07:           Args: --open-web-control
00:32:07:         Config: C:\Users\Zuul\AppData\Roaming\FAHClient\config.xml
00:32:07:******************************** CBang ********************************
00:32:07:           Date: Apr 24 2020
00:32:07:           Time: 17:07:55
00:32:07:       Revision: ea081a3b3b0f4a37c4d0440b4f1bc184197c7797
00:32:07:         Branch: master
00:32:07:       Compiler: Visual C++ 2008
00:32:07:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:32:07:       Platform: win32 10
00:32:07:           Bits: 32
00:32:07:           Mode: Release
00:32:07:******************************* System ********************************
00:32:07:            CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
00:32:07:         CPU ID: GenuineIntel Family 6 Model 158 Stepping 10
00:32:07:           CPUs: 12
00:32:07:         Memory: 31.92GiB
00:32:07:    Free Memory: 17.89GiB
00:32:07:        Threads: WINDOWS_THREADS
00:32:07:     OS Version: 6.2
00:32:07:    Has Battery: false
00:32:07:     On Battery: false
00:32:07:     UTC Offset: -4
00:32:07:            PID: 20836
00:32:07:            CWD: C:\Users\Zuul\AppData\Roaming\FAHClient
00:32:07:  Win32 Service: false
00:32:07:             OS: Windows 10 Enterprise
00:32:07:        OS Arch: AMD64
00:32:07:           GPUs: 1
00:32:07:          GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:8 TU102 [GeForce RTX 2080 Ti Rev. A]
00:32:07:                 M 13448
00:32:07:  CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:7.5 Driver:11.0
00:32:07:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:445.87
00:32:07:******************************* libFAH ********************************
00:32:07:           Date: Apr 15 2020
00:32:07:           Time: 14:53:14
00:32:07:       Revision: 216968bc7025029c841ed6e36e81a03a316890d3
00:32:07:         Branch: master
00:32:07:       Compiler: Visual C++ 2008
00:32:07:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
00:32:07:       Platform: win32 10
00:32:07:           Bits: 32
00:32:07:           Mode: Release
00:32:07:***********************************************************************
00:32:07:<config>
00:32:07:  <!-- Folding Core -->
00:32:07:  <checkpoint v='3'/>
00:32:07:  <core-priority v='low'/>
00:32:07:
00:32:07:  <!-- Network -->
00:32:07:  <proxy v=':8080'/>
00:32:07:
00:32:07:  <!-- Slot Control -->
00:32:07:  <power v='FULL'/>
00:32:07:
00:32:07:  <!-- User Information -->
00:32:07:  <passkey v='*****'/>
00:32:07:  <team v='246763'/>
00:32:07:  <user v='KA1J'/>
00:32:07:
00:32:07:  <!-- Folding Slots -->
00:32:07:  <slot id='0' type='GPU'>
00:32:07:    <opencl-index v='0'/>
00:32:07:  </slot>
00:32:07:</config>
00:32:07:WU00:FS00:Connecting to assign1.foldingathome.org:80
00:32:07:WARNING:WU00:FS00:Failed to get assignment from 'assign1.foldingathome.org:80': No WUs available for this configuration
00:32:07:WU00:FS00:Connecting to assign2.foldingathome.org:80
00:32:08:WARNING:WU00:FS00:Failed to get assignment from 'assign2.foldingathome.org:80': No WUs available for this configuration
00:32:08:WU00:FS00:Connecting to assign3.foldingathome.org:80
00:32:08:WARNING:WU00:FS00:Failed to get assignment from 'assign3.foldingathome.org:80': No WUs available for this configuration
00:32:08:WU00:FS00:Connecting to assign4.foldingathome.org:80
00:32:08:WU00:FS00:Assigned to work server 3.133.76.19
00:32:08:WU00:FS00:Requesting new work unit for slot 00: READY gpu:0:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 3.133.76.19
00:32:08:WU00:FS00:Connecting to 3.133.76.19:8080
00:32:08:3:127.0.0.1:New Web session
00:32:29:WARNING:WU00:FS00:WorkServer connection failed on port 8080 trying 80
00:32:29:WU00:FS00:Connecting to 3.133.76.19:80


It's unfortunate that there's nothing easily available to properly challenge the computational end of the GPU the way F@H does. Thanks for the info.
KA1J
 
Posts: 43
Joined: Tue Mar 10, 2020 8:53 am
Location: CT

Re: Bad Work Unit - need good stress test?

Postby bruce » Sun May 24, 2020 2:57 am

FAH officially does not support overclocking. Maybe that's just another way of saying the FAHCore for GPUs is the de facto benchmark for GPUs :D ... and all those other benchmarks are less stressful than FAH, so don't depend on them for stream computing. (Yes, benchmarking video frame rates is something different.)

If you're looking for other WUs that reported errors, you can find a number of previous log files in the "logs" subdirectory of FAH's data directory.
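
A minimal Python sketch of that search, assuming the archived logs are plain-text `*.txt` files and that a failed WU leaves a `BAD_WORK_UNIT` line like the one quoted earlier in the thread. The function name `bad_wu_counts` is hypothetical.

```python
from pathlib import Path


def bad_wu_counts(log_dir):
    """Map each *.txt file in log_dir to its number of BAD_WORK_UNIT lines.

    Files with no such lines are left out of the result.
    """
    counts = {}
    for path in sorted(Path(log_dir).glob("*.txt")):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = sum("BAD_WORK_UNIT" in line for line in text.splitlines())
        if hits:
            counts[path.name] = hits
    return counts
```

Pointing it at the "logs" folder under the data directory (the CWD shown in the startup log) would list which older logs are worth opening.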
bruce
 
Posts: 19679
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: Bad Work Unit - need good stress test?

Postby KA1J » Sun May 24, 2020 3:25 am

Heh, short of making a Macrium backup to a new drive, recloning that backup, and then running F@H fresh with the same WU and operating conditions (with internet access off), F@H would be a tough one to use as a valid benchmark. :)

For me, and I'm sure for you and most of us, this is important for science and especially for defeating Covid, which is killing thousands worldwide every day. I want to complete the maximum number of WUs possible, and I don't like corrupted WUs if my system caused them. If my system has good integrity and the WU still crashes, so be it; that's beyond my control.

As I read of so many at F@H who are overclocking to get more WUs done, it seems that a test for fine-tuning a GPU against the current core would be proper software for quality control.
KA1J
 
Posts: 43
Joined: Tue Mar 10, 2020 8:53 am
Location: CT

Re: Bad Work Unit - need good stress test?

Postby bruce » Sun May 24, 2020 3:45 am

If FAH could block assignments to overclocked machines, they'd probably do it. Unfortunately when a machine asks for a new assignment, it doesn't say "...and I'm not overclocked" so we have to give them an assignment which may run without error on a cool day and may fail on a hot day (or whatever). FAH's design goal is to make use of ALL of the available resources you choose to donate ... and that has to be based on hardware that meets the original hardware design goals -- no more, no less.

On the other hand, when new code is being developed, beta testing is supposed to find all software QC problems. Unfortunately the beta testing team doesn't happen to have representatives of all OSs, all GPUs and all other client variations, so it's difficult to define how long beta testing needs to run to catch all the potential issues.

Because of the urgency of the MoonShot project, that "supposed to" was relaxed because there were many, many clients that were not well represented during beta testing. To find those other issues, a few projects like 134xx temporarily slipped through the "beta testing only" firewall.

Sorry.
bruce
 
Posts: 19679
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

