Project: 9411 (Run 1718, Clone 0, Gen 54)

Moderators: Site Moderators, FAHC Science Team

davidcoton
Posts: 1102
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Post by davidcoton »

Possible bad WU?
Edited log:

Code:

04:20:27:WU00:FS01:0x17:Project: 9411 (Run 1718, Clone 0, Gen 54)
04:20:27:WU00:FS01:0x17:Unit: 0x00000049ab40413854d27dba9ecd2c90
04:20:27:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
04:20:27:WU00:FS01:0x17:Machine: 1
04:20:27:WU00:FS01:0x17:Reading tar file state.xml
04:20:27:WU00:FS01:0x17:Reading tar file system.xml
04:20:27:WU00:FS01:0x17:Reading tar file integrator.xml
04:20:27:WU00:FS01:0x17:Reading tar file core.xml
04:20:27:WU00:FS01:0x17:Digital signatures verified
04:20:27:WU00:FS01:0x17:Folding@home GPU core17
04:20:27:WU00:FS01:0x17:Version 0.0.52
04:20:48:WU00:FS01:0x17:Completed 0 out of 16000000 steps (0%)
04:20:48:WU00:FS01:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
04:25:28:WU00:FS01:0x17:Completed 160000 out of 16000000 steps (1%)

******************************* Date: 2015-05-06 *******************************

10:22:42:WU00:FS01:0x17:Completed 11840000 out of 16000000 steps (74%)
10:22:47:WU00:FS01:0x17:ERROR:exception: Error downloading array energyBuffer: clEnqueueReadBuffer (-36)
10:22:47:WU00:FS01:0x17:Saving result file logfile_01.txt
10:22:48:WU00:FS01:0x17:Saving result file log.txt
10:22:48:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
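
For anyone triaging a similar log, the two useful facts are the last completed step and the OpenCL status code. The sketch below pulls both out of core17-style lines and decodes the status with a small, deliberately incomplete table taken from the OpenCL headers. The names `summarize`, `OPENCL_ERRORS` and both regular expressions are illustrative assumptions, not part of FAHClient. Status -36 is CL_INVALID_COMMAND_QUEUE, which is consistent with the GPU context having been lost underneath the core, though the log alone does not prove that.

```python
import re

# Partial table of OpenCL status codes (values from CL/cl.h); not exhaustive.
OPENCL_ERRORS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -36: "CL_INVALID_COMMAND_QUEUE",
}

# Assumed line shapes, based only on the log lines quoted in this thread.
ERROR_RE = re.compile(
    r"ERROR:exception: (?P<msg>.*?): (?P<call>cl\w+) \((?P<code>-?\d+)\)"
)
PROGRESS_RE = re.compile(r"Completed (\d+) out of (\d+) steps")


def summarize(log_text):
    """Return (last completed step, total steps, decoded error or None)."""
    done = total = None
    error = None
    for line in log_text.splitlines():
        progress = PROGRESS_RE.search(line)
        if progress:
            done, total = int(progress.group(1)), int(progress.group(2))
        failure = ERROR_RE.search(line)
        if failure:
            code = int(failure.group("code"))
            error = (failure.group("call"), code, OPENCL_ERRORS.get(code, "unknown"))
    return done, total, error


sample = """\
10:22:42:WU00:FS01:0x17:Completed 11840000 out of 16000000 steps (74%)
10:22:47:WU00:FS01:0x17:ERROR:exception: Error downloading array energyBuffer: clEnqueueReadBuffer (-36)
"""
print(summarize(sample))
```

Run against the extract above, this reports that the WU reached step 11840000 of 16000000 before `clEnqueueReadBuffer` failed with CL_INVALID_COMMAND_QUEUE.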
Config, etc.:

Code:

*********************** Log Started 2015-04-25T22:32:19Z ***********************
22:32:19:************************* Folding@home Client *************************
22:32:19:      Website: http://folding.stanford.edu/
22:32:19:    Copyright: (c) 2009-2014 Stanford University
22:32:19:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
22:32:19:         Args: 
22:32:19:       Config: C:/Users/David/AppData/Roaming/FAHClient/config.xml
22:32:19:******************************** Build ********************************
22:32:19:      Version: 7.4.4
22:32:19:         Date: Mar 4 2014
22:32:19:         Time: 20:26:54
22:32:19:      SVN Rev: 4130
22:32:19:       Branch: fah/trunk/client
22:32:19:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
22:32:19:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
22:32:19:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
22:32:19:     Platform: win32 XP
22:32:19:         Bits: 32
22:32:19:         Mode: Release
22:32:19:******************************* System ********************************
22:32:19:          CPU: AMD Athlon(tm) II X4 640 Processor
22:32:19:       CPU ID: AuthenticAMD Family 16 Model 5 Stepping 3
22:32:19:         CPUs: 4
22:32:19:       Memory: 3.12GiB
22:32:19:  Free Memory: 1.74GiB
22:32:19:      Threads: WINDOWS_THREADS
22:32:19:   OS Version: 6.0
22:32:19:  Has Battery: false
22:32:19:   On Battery: false
22:32:19:   UTC Offset: 1
22:32:19:          PID: 4320
22:32:19:          CWD: C:/Users/David/AppData/Roaming/FAHClient
22:32:19:           OS: Windows Vista (TM) Home Premium Service Pack 2
22:32:19:      OS Arch: X86
22:32:19:         GPUs: 1
22:32:19:        GPU 0: NVIDIA:5 GM204 [GeForce GTX 980]
22:32:19:         CUDA: 5.2
22:32:19:  CUDA Driver: 7000
22:32:19:Win32 Service: false
22:32:19:***********************************************************************
22:32:19:<config>
22:32:19:  <!-- Folding Core -->
22:32:19:  <checkpoint v='5'/>
22:32:19:
22:32:19:  <!-- HTTP Server -->
22:32:19:  <allow v='127.0.0.1 192.168.1.0/24'/>
22:32:19:  <deny v='0.0.0.0/0'/>
22:32:19:  <http-addresses v='127.0.0.1:7396 david-ubuntu:7396'/>
22:32:19:
22:32:19:  <!-- Network -->
22:32:19:  <proxy v=':8080'/>
22:32:19:
22:32:19:  <!-- Remote Command Server -->
22:32:19:  <password v='*******'/>
22:32:19:
22:32:19:  <!-- Slot Control -->
22:32:19:  <power v='full'/>
22:32:19:
22:32:19:  <!-- User Information -->
22:32:19:  <passkey v='********************************'/>
22:32:19:  <user v='davidcoton'/>
22:32:19:
22:32:19:  <!-- Web Server -->
22:32:19:  <web-allow v='127.0.0.1 168.192.1.0/24'/>
22:32:19:
22:32:19:  <!-- Folding Slots -->
22:32:19:  <slot id='0' type='CPU'>
22:32:19:    <client-type v='advanced'/>
22:32:19:    <cpus v='3'/>
22:32:19:    <paused v='true'/>
22:32:19:  </slot>
22:32:19:  <slot id='1' type='GPU'>
22:32:19:    <client-type v='advanced'/>
22:32:19:    <paused v='true'/>
22:32:19:  </slot>
22:32:19:</config>
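
One detail in the config above may be worth a second look: `allow` lists 192.168.1.0/24, while `web-allow` lists 168.192.1.0/24. The sketch below is a rough sanity check, assuming both options are meant to cover only loopback and private LAN ranges; that intent, and the helper name `non_private_entries`, are assumptions rather than anything stated in the post. The config fragment is copied from the log with the timestamp prefixes removed.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Fragment of the posted config, with the "22:32:19:" log prefixes stripped.
CONFIG = """
<config>
  <allow v='127.0.0.1 192.168.1.0/24'/>
  <deny v='0.0.0.0/0'/>
  <web-allow v='127.0.0.1 168.192.1.0/24'/>
</config>
"""


def non_private_entries(config_xml, options=("allow", "web-allow")):
    """Return (option, entry) pairs whose range is not loopback or private."""
    root = ET.fromstring(config_xml)
    flagged = []
    for node in root:
        if node.tag not in options:
            continue
        for entry in node.get("v", "").split():
            # strict=False tolerates entries whose host bits are set.
            network = ipaddress.ip_network(entry, strict=False)
            if not network.is_private:
                flagged.append((node.tag, entry))
    return flagged


flagged = non_private_entries(CONFIG)
print(flagged)
```

Only the `web-allow` entry 168.192.1.0/24 is flagged, which would fit a transposed 192.168.1.0/24. Whether that is a typo or intentional is for the config's owner to decide.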
sortofageek
Site Admin
Posts: 3111
Joined: Fri Nov 30, 2007 8:06 pm
Location: Team Helix

Re: Project: 9411 (Run 1718, Clone 0, Gen 54)

Post by sortofageek »

There is nothing back for that one so far. I flagged it so we'll be reminded to check on it.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 9411 (Run 1718, Clone 0, Gen 54)

Post by bruce »

Hi davidcoton (team 0),
Your WU (P9411 R1718 C0 G54) was added to the stats database on 2015-05-06 22:04:17 for 46086.4 points of credit.
davidcoton
Posts: 1102
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: Project: 9411 (Run 1718, Clone 0, Gen 54)

Post by davidcoton »

Interesting. Looks like the bad WU error coincided with a Windows crash. On recovery, the same WU was restarted and ran to completion. I'll need to check Windows logs sometime to see if there are any clues.
billford
Posts: 1005
Joined: Thu May 02, 2013 8:46 pm
Hardware configuration: Full Time:

2x NVidia GTX 980
1x NVidia GTX 780 Ti
2x 3GHz Core i5 PC (Linux)

Retired:

3.2GHz Core i5 PC (Linux)
3.2GHz Core i5 iMac
2.8GHz Core i5 iMac
2.16GHz Core 2 Duo iMac
2GHz Core 2 Duo MacBook
1.6GHz Core 2 Duo Acer laptop
Location: Near Oxford, United Kingdom

Re: Project: 9411 (Run 1718, Clone 0, Gen 54)

Post by billford »

I've seen that, very occasionally: if something goes pear-shaped while a checkpoint is being written out, the WU will simply (attempt to) restart from scratch.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project: 9411 (Run 1718, Clone 0, Gen 54)

Post by bruce »

After certain types of errors, the client will request a new WU and occasionally the servers will give you a new copy of the same WU. That logic is not new.

In addition, there has been an unannounced enhancement to future versions of some GPU cores which recognizes that certain (rare) types of errors can be fixed by restarting from the previous checkpoint, with the loss of very little progress. This will happen automatically. The only time I've seen it restart from scratch without downloading is when no checkpoints have yet been written.
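
As a rough illustration of the behaviour described above, and not the actual core code, the sketch below resumes from the last checkpoint when a recoverable error occurs and starts from scratch only if no checkpoint has been written yet. `RecoverableError`, `run_with_checkpoints`, `flaky_step` and the retry limit are all hypothetical names and choices made for this example.

```python
class RecoverableError(Exception):
    """Hypothetical stand-in for an error that a restart may fix."""


def run_with_checkpoints(total_steps, checkpoint_every, step_fn, max_retries=3):
    """Run step_fn(step) for every step, resuming from the last checkpoint on error.

    Returns the number of restarts that were needed.
    """
    checkpoint = 0  # nothing written yet, so an early failure restarts from scratch
    restarts = 0
    step = 0
    while step < total_steps:
        try:
            step_fn(step)
        except RecoverableError:
            restarts += 1
            if restarts > max_retries:
                raise  # give up; a real core would report the WU as bad
            step = checkpoint  # only work since the last checkpoint is lost
            continue
        step += 1
        if step % checkpoint_every == 0:
            checkpoint = step
    return restarts


# Simulate one transient failure at step 7, with a checkpoint every 5 steps.
failed = set()
executed = []


def flaky_step(step):
    if step == 7 and step not in failed:
        failed.add(step)
        raise RecoverableError("simulated transient GPU error")
    executed.append(step)


restarts = run_with_checkpoints(10, 5, flaky_step)
print(restarts, executed)
```

In this run, steps 5 and 6 are repeated after the failure, while steps 0 to 4 are not, because the checkpoint at step 5 had already been written.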

Neither of these scenarios appears to match the log extract that was posted, so technically we're off-topic.
davidcoton
Posts: 1102
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: Project: 9411 (Run 1718, Clone 0, Gen 54)

Post by davidcoton »

I found nothing useful in the logs. The possible original error was 0x00000116, which seems to be video-card related. The card is normally stable, not overclocked, etc. There were no application errors, so the cause remains a mystery. I just hope it is random and not a h/w failure (old PC which runs my business, new GTX 980 GPU).
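
For reference, stop code 0x00000116 is documented by Microsoft as VIDEO_TDR_FAILURE, raised when an attempt to reset the display driver and recover from a timeout failed. The lookup below is a small sketch for decoding such codes out of an event log; the table is a partial list and `describe_bug_check` is an illustrative name.

```python
# Windows bug check (stop) codes related to the display stack; a partial list.
VIDEO_BUG_CHECKS = {
    0x116: "VIDEO_TDR_FAILURE",
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",
    0x119: "VIDEO_SCHEDULER_INTERNAL_ERROR",
}


def describe_bug_check(code_text):
    """Turn a hex string such as '0x00000116' into 'code: name'."""
    code = int(code_text, 16)
    name = VIDEO_BUG_CHECKS.get(code, "not in this table")
    return f"0x{code:08X}: {name}"


print(describe_bug_check("0x00000116"))
```

A driver reset of this kind would also be one plausible explanation for the core's earlier `clEnqueueReadBuffer (-36)` failure, although nothing in the thread confirms the link.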