project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x0000004

Moderators: Site Moderators, FAHC Science Team

Locked
Devlin85
Posts: 21
Joined: Fri Apr 25, 2014 3:56 pm

project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x0000004

Post by Devlin85 »

03:02:50:WU03:FS00:0x17:Bad State detected... attempting to resume from last good checkpoint
03:02:50:WU03:FS00:0x17:Max number of retries reached. Aborting.
03:02:50:WU03:FS00:0x17:ERROR:exception: Max Retries Reached
03:02:50:WU03:FS00:0x17:Saving result file logfile_01.txt
03:02:50:WU03:FS00:0x17:Saving result file log.txt
03:02:50:WU03:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
03:02:51:WARNING:WU03:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
03:02:51:WU03:FS00:Sending unit results: id:03 state:SEND error:FAULTY project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000000430a3b1e81533f581756f6ca6f
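For anyone wanting to pull the failed units out of a longer log before posting, here is a minimal Python sketch. The regular expression is inferred only from the "Sending unit results" lines quoted in this thread, not from any official log specification, and the function name `faulty_units` is hypothetical.

```python
import re

# Layout inferred from the log excerpts in this thread (an assumption, not a spec).
RESULT_RE = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d):WU(?P<wu>\d+):FS(?P<slot>\d+):Sending unit results: "
    r"id:\d+ state:\w+ error:(?P<error>\w+) "
    r"project:(?P<project>\d+) run:(?P<run>\d+) clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)

def faulty_units(lines):
    """Return (project, run, clone, gen, slot) for every result reported as FAULTY."""
    found = []
    for line in lines:
        m = RESULT_RE.match(line.strip())
        if m and m.group("error") == "FAULTY":
            found.append((
                int(m.group("project")),
                int(m.group("run")),
                int(m.group("clone")),
                int(m.group("gen")),
                m.group("slot"),
            ))
    return found

sample = [
    "03:02:51:WARNING:WU03:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "03:02:51:WU03:FS00:Sending unit results: id:03 state:SEND error:FAULTY "
    "project:9101 run:435 clone:0 gen:65 core:0x17 "
    "unit:0x000000430a3b1e81533f581756f6ca6f",
]
print(faulty_units(sample))
```

Listing project, run, clone and gen together with the slot makes it easier to see whether the failures follow one project or one GPU.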
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am
Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB

Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
Location: Land Of The Long White Cloud

Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000

Post by PantherX »

The WU was successfully completed by another donor so it isn't a bad one.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000

Post by bruce »

0x0000004 indicates that Windows discovered a memory error. I recommend running thorough diagnostics on main RAM, then reducing any RAM overclock if you don't find a bad stick.
Devlin85
Posts: 21
Joined: Fri Apr 25, 2014 3:56 pm

Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000

Post by Devlin85 »

It's 9101 again. Just got another one. It actually locked up my computer afterwards. I'm gonna run some memtests, but the 13000, 13001, and 9408 WUs run and never seem to fail; it's just these 9101s.

Project: 9101 (Run 790, Clone 0, Gen 65)
Unit: 0x000000440a3b1e81533f77b99203944a
CPU: 0x00000000000000000000000000000000
Machine: 0
Reading tar file state.xml
Reading tar file system.xml
Reading tar file integrator.xml
Reading tar file core.xml
Digital signatures verified
Folding@home GPU core17
Version 0.0.52
Completed 0 out of 2500000 steps (0%)
Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
Completed 25000 out of 2500000 steps (1%)
Completed 50000 out of 2500000 steps (2%)
Bad State detected... attempting to resume from last good checkpoint
Completed 25000 out of 2500000 steps (1%)
Completed 50000 out of 2500000 steps (2%)
Bad State detected... attempting to resume from last good checkpoint
Completed 25000 out of 2500000 steps (1%)
Completed 50000 out of 2500000 steps (2%)
Bad State detected... attempting to resume from last good checkpoint
Max number of retries reached. Aborting.
ERROR:exception: Max Retries Reached
Saving result file logfile_01.txt
Saving result file log.txt
Folding@home Core Shutdown: BAD_WORK_UNIT
Last edited by Devlin85 on Mon May 05, 2014 5:41 pm, edited 1 time in total.
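The core log above fails at the same place on every retry. A small Python sketch like the one below can make that pattern explicit by recording the last reported step count before each "Bad State detected" line. It assumes the line wording is exactly as quoted in this post, and `summarize_bad_states` is a hypothetical helper name.

```python
import re

# Matches progress lines such as "Completed 50000 out of 2500000 steps (2%)".
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")

def summarize_bad_states(lines):
    """Return the last reported step count seen before each 'Bad State detected' line."""
    last_step = None
    failures = []
    for line in lines:
        m = STEP_RE.search(line)
        if m:
            last_step = int(m.group(1))
        elif "Bad State detected" in line:
            failures.append(last_step)
    return failures

core_log = """\
Completed 0 out of 2500000 steps (0%)
Completed 25000 out of 2500000 steps (1%)
Completed 50000 out of 2500000 steps (2%)
Bad State detected... attempting to resume from last good checkpoint
Completed 25000 out of 2500000 steps (1%)
Completed 50000 out of 2500000 steps (2%)
Bad State detected... attempting to resume from last good checkpoint
Completed 25000 out of 2500000 steps (1%)
Completed 50000 out of 2500000 steps (2%)
Bad State detected... attempting to resume from last good checkpoint
Max number of retries reached. Aborting.
""".splitlines()

print(summarize_bad_states(core_log))
```

If every entry is the same step count, as it is here, the fault is reproducible at one point in the run rather than scattered at random.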
Devlin85
Posts: 21
Joined: Fri Apr 25, 2014 3:56 pm

Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000

Post by Devlin85 »

Ran MemtestG80

Final error count after 100 iterations over 128 MiB of GPU memory: 0 errors

Also did at 1GB:

Final error count after 50 iterations over 1024 MiB of GPU memory: 0 errors
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000

Post by bruce »

It's not a GPU memory error; it's a main RAM error detected by Windows in the CPU application FahCore_17, not in the GPU. (Windows doesn't manage GPU memory.) Run memtest86*
Devlin85
Posts: 21
Joined: Fri Apr 25, 2014 3:56 pm

Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000

Post by Devlin85 »

bruce wrote: It's not a GPU memory error; it's a main RAM error detected by Windows in the CPU application FahCore_17, not in the GPU. (Windows doesn't manage GPU memory.) Run memtest86*
Ran a couple of memtests. Oddly, it looks like the XMP profile (standard, not OC) is to blame: it didn't produce any errors, it just flat out froze. I went in and set everything manually, and it passed all the tests without issue. Fingers crossed I'm back in business here. I noticed it was having issues and paused it, so it still has the same 9101 WU to process, and I should find out soon.
Devlin85
Posts: 21
Joined: Fri Apr 25, 2014 3:56 pm

Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000

Post by Devlin85 »

Well, either it didn't like the fact that I paused it or it's still something else, but it got to the same spot and bailed on the WU, then requested a new one. No crash this time though; it went right back to processing the new WU.

19:46:44:WU01:FS00:0x17:Completed 0 out of 2500000 steps (0%)
19:46:44:WU01:FS00:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:47:11:Started thread 10 on PID 4412
19:48:31:WU00:FS02:0x17:Completed 0 out of 2000000 steps (0%)
19:48:31:WU00:FS02:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
19:55:52:WU01:FS00:0x17:Completed 25000 out of 2500000 steps (1%)
19:56:12:WU00:FS02:0x17:Completed 20000 out of 2000000 steps (1%)
19:57:22:WU01:FS00:0x17:ERROR:exception: The periodic box size has decreased to less than twice the nonbonded cutoff.
19:57:22:WU01:FS00:0x17:Saving result file logfile_01.txt
19:57:22:WU01:FS00:0x17:Saving result file log.txt
19:57:22:WU01:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
19:57:22:WARNING:WU01:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
19:57:22:WU01:FS00:Sending unit results: id:01 state:SEND error:FAULTY project:9101 run:790 clone:0 gen:65 core:0x17 unit:0x000000440a3b1e81533f77b99203944a
19:57:22:WU01:FS00:Uploading 2.88KiB to 171.64.65.93
19:57:22:WU01:FS00:Connecting to 171.64.65.93:8080
19:57:23:WU02:FS00:Connecting to assign-GPU.stanford.edu:80
19:57:23:WU01:FS00:Upload complete
19:57:23:WU01:FS00:Server responded WORK_ACK (400)
19:57:23:WU01:FS00:Cleaning up
19:57:23:WU02:FS00:News: Welcome to Folding@Home
19:57:23:WU02:FS00:Assigned to work server 171.64.65.93
19:57:23:WU02:FS00:Requesting new work unit for slot 00: READY gpu:0:GK104 [GeForce GTX 760] from 171.64.65.93
Devlin85
Posts: 21
Joined: Fri Apr 25, 2014 3:56 pm

Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000

Post by Devlin85 »

Just bombed another 9101, but this time and last time it gave an error: "ERROR:exception: First periodic box vector must be parallel to x." and "ERROR:exception: The periodic box size has decreased to less than twice the nonbonded cutoff."

20:11:08:WU00:FS02:0x17:Completed 60000 out of 2000000 steps (3%)
20:11:53:WU02:FS00:0x17:ERROR:exception: First periodic box vector must be parallel to x.
20:11:53:WU02:FS00:0x17:Saving result file logfile_01.txt
20:11:53:WU02:FS00:0x17:Saving result file log.txt
20:11:53:WU02:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
20:11:54:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
20:11:54:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:9101 run:757 clone:0 gen:69 core:0x17 unit:0x0000004d0a3b1e81533f74c0698268cb
PantherX
Site Moderator
Posts: 7020
Joined: Wed Dec 23, 2009 9:33 am

Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000

Post by PantherX »

What GPU are you using and what driver version? Please do post the initial section of the log file so we can see the system configuration and F@H settings.
Devlin85
Posts: 21
Joined: Fri Apr 25, 2014 3:56 pm

Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000

Post by Devlin85 »


*********************** Log Started 2014-05-05T19:45:49Z ***********************
19:45:49:************************* Folding@home Client *************************
19:45:49:      Website: http://folding.stanford.edu/
19:45:49:    Copyright: (c) 2009-2013 Stanford University
19:45:49:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
19:45:49:         Args: 
19:45:49:       Config: C:/ProgramData/FAHClient/config.xml
19:45:49:******************************** Build ********************************
19:45:49:      Version: 7.3.6
19:45:49:         Date: Feb 18 2013
19:45:49:         Time: 15:25:17
19:45:49:      SVN Rev: 3923
19:45:49:       Branch: fah/trunk/client
19:45:49:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
19:45:49:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
19:45:49:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
19:45:49:     Platform: win32 XP
19:45:49:         Bits: 32
19:45:49:         Mode: Release
19:45:49:******************************* System ********************************
19:45:49:          CPU: Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz
19:45:49:       CPU ID: GenuineIntel Family 6 Model 62 Stepping 4
19:45:49:         CPUs: 8
19:45:49:       Memory: 15.94GiB
19:45:49:  Free Memory: 14.34GiB
19:45:49:      Threads: WINDOWS_THREADS
19:45:49:  Has Battery: false
19:45:49:   On Battery: false
19:45:49:   UTC offset: -4
19:45:49:          PID: 4412
19:45:49:          CWD: C:/ProgramData/FAHClient
19:45:49:           OS: Windows 8 Evolution 2014 64-Bit
19:45:49:      OS Arch: AMD64
19:45:49:         GPUs: 2
19:45:49:        GPU 0: NVIDIA:3 GK104 [GeForce GTX 760]
19:45:49:        GPU 1: NVIDIA:3 GK104 [GeForce GTX 760]
19:45:49:         CUDA: 3.0
19:45:49:  CUDA Driver: 5050
19:45:49:Win32 Service: false
19:45:49:***********************************************************************
19:45:49:<config>
19:45:49:  <service-description v='Folding@home Client'/>
19:45:49:  <service-restart v='true'/>
19:45:49:  <service-restart-delay v='5000'/>
19:45:49:
19:45:49:  <!-- Client Control -->
19:45:49:  <client-threads v='4'/>
19:45:49:  <cycle-rate v='4'/>
19:45:49:  <cycles v='-1'/>
19:45:49:  <data-directory v='.'/>
19:45:49:  <disable-sleep-when-active v='true'/>
19:45:49:  <exec-directory v='C:\Program Files (x86)\FAHClient'/>
19:45:49:  <exit-when-done v='false'/>
19:45:49:  <fold-anon v='false'/>
19:45:49:  <open-web-control v='false'/>
19:45:49:
19:45:49:  <!-- Configuration -->
19:45:49:  <config-rotate v='true'/>
19:45:49:  <config-rotate-dir v='configs'/>
19:45:49:  <config-rotate-max v='16'/>
19:45:49:
19:45:49:  <!-- Debugging -->
19:45:49:  <assignment-servers>
19:45:49:    assign3.stanford.edu:8080 assign4.stanford.edu:80
19:45:49:  </assignment-servers>
19:45:49:  <capture-directory v='capture'/>
19:45:49:  <capture-on-error v='false'/>
19:45:49:  <capture-packets v='false'/>
19:45:49:  <capture-requests v='false'/>
19:45:49:  <capture-responses v='false'/>
19:45:49:  <capture-sockets v='false'/>
19:45:49:  <debug-sockets v='false'/>
19:45:49:  <exception-locations v='true'/>
19:45:49:  <gpu-assignment-servers>
19:45:49:    assign-GPU.stanford.edu:80 assign-GPU.stanford.edu:8080
19:45:49:  </gpu-assignment-servers>
19:45:49:  <stack-traces v='false'/>
19:45:49:
19:45:49:  <!-- Error Handling -->
19:45:49:  <max-slot-errors v='5'/>
19:45:49:  <max-unit-errors v='5'/>
19:45:49:
19:45:49:  <!-- Folding Core -->
19:45:49:  <checkpoint v='30'/>
19:45:49:  <core-dir v='cores'/>
19:45:49:  <core-priority v='low'/>
19:45:49:  <cpu-affinity v='false'/>
19:45:49:  <cpu-usage v='100'/>
19:45:49:  <gpu-usage v='100'/>
19:45:49:  <no-assembly v='false'/>
19:45:49:
19:45:49:  <!-- Folding Slot Configuration -->
19:45:49:  <cause v='ANY'/>
19:45:49:  <client-subtype v='STDCLI'/>
19:45:49:  <client-type v='advanced'/>
19:45:49:  <cpu-species v='X86_PENTIUM_II'/>
19:45:49:  <cpu-type v='AMD64'/>
19:45:49:  <cpus v='-1'/>
19:45:49:  <cuda-index v='0'/>
19:45:49:  <gpu v='true'/>
19:45:49:  <max-packet-size v='normal'/>
19:45:49:  <opencl-index v='0'/>
19:45:49:  <os-species v='UNKNOWN'/>
19:45:49:  <os-type v='WIN32'/>
19:45:49:  <power v='full'/>
19:45:49:  <project-key v='0'/>
19:45:49:  <smp v='true'/>
19:45:49:
19:45:49:  <!-- Process Control -->
19:45:49:  <child v='false'/>
19:45:49:  <daemon v='false'/>
19:45:49:  <pid v='false'/>
19:45:49:  <pid-file v='Folding@home Client.pid'/>
19:45:49:  <respawn v='false'/>
19:45:49:  <service v='false'/>
19:45:49:
19:45:49:  <!-- Remote Command Server -->
19:45:49:  <command-address v='0.0.0.0'/>
19:45:49:  <command-allow-no-pass v='127.0.0.1'/>
19:45:49:  <command-deny-no-pass v='0/0'/>
19:45:49:  <command-port v='36330'/>
19:45:49:
19:45:49:  <!-- Slot Control -->
19:45:49:  <idle v='false'/>
19:45:49:  <max-shutdown-wait v='60'/>
19:45:49:  <pause-on-battery v='false'/>
19:45:49:  <pause-on-start v='false'/>
19:45:49:
19:45:49:  <!-- Web Server -->
19:45:49:  <session-timeout v='3600'/>
19:45:49:  <web-allow v='127.0.0.1'/>
19:45:49:  <web-deny v='0/0'/>
19:45:49:
19:45:49:  <!-- Work Unit Control -->
19:45:49:  <dump-after-deadline v='true'/>
19:45:49:  <max-queue v='16'/>
19:45:49:  <max-units v='0'/>
19:45:49:  <next-unit-percentage v='98'/>
19:45:49:
19:45:49:  <!-- Folding Slots -->
19:45:49:  <slot id='2' type='GPU'>
19:45:49:    <client-type v='beta'/>
19:45:49:    <gpu-index v='1'/>
19:45:49:  </slot>
19:45:49:  <slot id='0' type='GPU'/>
19:45:49:</config>
Mod edit: Added Code tags to log
Devlin85
Posts: 21
Joined: Fri Apr 25, 2014 3:56 pm

Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000

Post by Devlin85 »

The one having the issue is slot 0; the one set to handle beta is working fine. I made both my slots beta now, and it downloaded a new WU (still 9101) and is cruising through it right now, at 23%.
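To see which slot runs with which client-type, the `<config>` dump in the log above can be read programmatically. The Python sketch below strips the `HH:MM:SS:` prefix and parses the remainder with the standard library's ElementTree. It assumes that a slot without its own `<client-type>` falls back to the top-level value, which is an assumption about the client's behaviour rather than something this thread confirms, and `slots_from_log` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Timestamp prefix as it appears in the log excerpt, e.g. "19:45:49:".
PREFIX_RE = re.compile(r"^\d\d:\d\d:\d\d:")

def slots_from_log(lines):
    """Strip the timestamp prefix from the <config> dump and list each slot's settings."""
    xml_lines = []
    inside = False
    for line in lines:
        body = PREFIX_RE.sub("", line.rstrip("\n"))
        if body.strip() == "<config>":
            inside = True
        if inside:
            xml_lines.append(body)
        if body.strip() == "</config>":
            break
    root = ET.fromstring("\n".join(xml_lines))
    # Assumption: the top-level <client-type> acts as the default for every slot.
    default = root.find("client-type")
    default_type = default.get("v") if default is not None else None
    result = {}
    for slot in root.findall("slot"):
        override = slot.find("client-type")
        gpu_index = slot.find("gpu-index")
        result[slot.get("id")] = {
            "type": slot.get("type"),
            "client-type": override.get("v") if override is not None else default_type,
            "gpu-index": gpu_index.get("v") if gpu_index is not None else None,
        }
    return result

sample = """\
19:45:49:<config>
19:45:49:  <!-- Folding Slot Configuration -->
19:45:49:  <client-type v='advanced'/>
19:45:49:
19:45:49:  <!-- Folding Slots -->
19:45:49:  <slot id='2' type='GPU'>
19:45:49:    <client-type v='beta'/>
19:45:49:    <gpu-index v='1'/>
19:45:49:  </slot>
19:45:49:  <slot id='0' type='GPU'/>
19:45:49:</config>
""".splitlines()

print(slots_from_log(sample))
```

Under that assumption, slot 2 reports beta and slot 0 reports advanced for the excerpt posted here, which matches the description of one beta slot and one non-beta slot.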
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4 4x512GB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460's for aprox. 70K PPD
Location: Salem. OR USA

Re: project:9101 run:435 clone:0 gen:65 core:0x17 unit:0x000

Post by P5-133XL »

You can choose to run closed beta without being a member of the beta team, but note that all support for beta is done in the beta forums and specifically not in the general forums. You will need to be a beta team member to post in the beta forums. You can join the beta team, but you must also be willing to accept all the duties and responsibilities of a beta tester.