13001 WU failure


Postby bfromcolo » Sat Oct 04, 2014 4:22 pm

This system is running Mint 17 with a GTX 750 Ti and NVIDIA driver 343.22, all at stock clocks. It has been running 9201 WUs fine. This is the first 13001 I have seen, and it failed with:

15:49:07:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 447.223 with threshold of 5

What does this error mean?




Code:
*********************** Log Started 2014-10-04T15:45:26Z ***********************
15:45:26:************************* Folding@home Client *************************
15:45:26:    Website: http://folding.stanford.edu/
15:45:26:  Copyright: (c) 2009-2014 Stanford University
15:45:26:     Author: Joseph Coffland <joseph@cauldrondevelopment.com>
15:45:26:       Args: --child --lifeline 2647 /etc/fahclient/config.xml --run-as
15:45:26:             fahclient --pid-file=/var/run/fahclient.pid --daemon
15:45:26:     Config: /etc/fahclient/config.xml
15:45:26:******************************** Build ********************************
15:45:26:    Version: 7.4.4
15:45:26:       Date: Mar 4 2014
15:45:26:       Time: 12:02:38
15:45:26:    SVN Rev: 4130
15:45:26:     Branch: fah/trunk/client
15:45:26:   Compiler: GNU 4.4.7
15:45:26:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
15:45:26:             -fno-unsafe-math-optimizations -msse2
15:45:26:   Platform: linux2 3.2.0-1-amd64
15:45:26:       Bits: 64
15:45:26:       Mode: Release
15:45:26:******************************* System ********************************
15:45:26:        CPU: AMD Phenom(tm) II X6 1045T Processor
15:45:26:     CPU ID: AuthenticAMD Family 16 Model 10 Stepping 0
15:45:26:       CPUs: 6
15:45:26:     Memory: 7.80GiB
15:45:26:Free Memory: 6.92GiB
15:45:26:    Threads: POSIX_THREADS
15:45:26: OS Version: 3.13
15:45:26:Has Battery: false
15:45:26: On Battery: false
15:45:26: UTC Offset: -6
15:45:26:        PID: 2649
15:45:26:        CWD: /var/lib/fahclient
15:45:26:         OS: Linux 3.13.0-24-generic x86_64
15:45:26:    OS Arch: AMD64
15:45:26:       GPUs: 1
15:45:26:      GPU 0: NVIDIA:4 GM107 [GeForce GTX 750 Ti]
15:45:26:       CUDA: 5.0
15:45:26:CUDA Driver: 6050
15:45:26:***********************************************************************
15:45:26:<config>
15:45:26:  <!-- Client Control -->
15:45:26:  <fold-anon v='true'/>
15:45:26:
15:45:26:  <!-- Network -->
15:45:26:  <proxy v=':8080'/>
15:45:26:
15:45:26:  <!-- Slot Control -->
15:45:26:  <power v='full'/>
15:45:26:
15:45:26:  <!-- User Information -->
15:45:26:  <passkey v='********************************'/>
15:45:26:  <team v='37726'/>
15:45:26:  <user v='bfromcolo'/>
15:45:26:
15:45:26:  <!-- Folding Slots -->
15:45:26:  <slot id='1' type='GPU'/>
15:45:26:</config>
15:45:26:Switching to user fahclient
15:45:26:Trying to access database...
15:45:27:Successfully acquired database lock
15:45:27:Enabled folding slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti]
15:45:27:WU00:FS01:Connecting to 171.67.108.201:80
15:45:28:WU00:FS01:Assigned to work server 140.163.4.231
15:45:28:WU00:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
15:45:28:WU00:FS01:Connecting to 140.163.4.231:8080
15:45:29:WU00:FS01:Downloading 4.84MiB
15:45:35:WU00:FS01:Download 71.05%
15:45:37:WU00:FS01:Download complete
15:45:37:WU00:FS01:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:13001 run:378 clone:1 gen:68 core:0x17 unit:0x00000096538b3db75328bad892c4b6cd
15:45:38:WU00:FS01:Starting
15:45:38:WU00:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 00 -suffix 01 -version 704 -lifeline 2649 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
15:45:38:WU00:FS01:Started FahCore on PID 2667
15:45:38:WU00:FS01:Core PID:2671
15:45:38:WU00:FS01:FahCore 0x17 started
15:45:38:WU00:FS01:0x17:*********************** Log Started 2014-10-04T15:45:38Z ***********************
15:45:38:WU00:FS01:0x17:Project: 13001 (Run 378, Clone 1, Gen 68)
15:45:38:WU00:FS01:0x17:Unit: 0x00000096538b3db75328bad892c4b6cd
15:45:38:WU00:FS01:0x17:CPU: 0x00000000000000000000000000000000
15:45:38:WU00:FS01:0x17:Machine: 1
15:45:38:WU00:FS01:0x17:Reading tar file state.xml
15:45:39:WU00:FS01:0x17:Reading tar file system.xml
15:45:39:WU00:FS01:0x17:Reading tar file integrator.xml
15:45:39:WU00:FS01:0x17:Reading tar file core.xml
15:45:39:WU00:FS01:0x17:Digital signatures verified
15:49:07:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 447.223 with threshold of 5
15:49:07:WU00:FS01:0x17:Saving result file logfile_01.txt
15:49:07:WU00:FS01:0x17:Saving result file badStateCheckpoint_57114166
15:49:08:WU00:FS01:0x17:Saving result file badStateForceGroup0_57114166Core.xml
15:49:11:WU00:FS01:0x17:Saving result file badStateForceGroup0_57114166Ref.xml
15:49:14:WU00:FS01:0x17:Saving result file badStateForceGroup1_57114166Core.xml
15:49:16:WU00:FS01:0x17:Saving result file badStateForceGroup1_57114166Ref.xml
15:49:19:WU00:FS01:0x17:Saving result file badStateForceGroup2_57114166Core.xml
15:49:21:WU00:FS01:0x17:Saving result file badStateForceGroup2_57114166Ref.xml
15:49:23:WU00:FS01:0x17:Saving result file log.txt
15:49:23:WU00:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
15:49:24:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:49:24:WU00:FS01:Sending unit results: id:00 state:SEND error:FAULTY project:13001 run:378 clone:1 gen:68 core:0x17 unit:0x00000096538b3db75328bad892c4b6cd
15:49:24:WU00:FS01:Uploading 24.64MiB to 140.163.4.231
15:49:24:WU00:FS01:Connecting to 140.163.4.231:8080


Mod edit: Please use Code tags instead of Quote tags around log files
bfromcolo
 
Posts: 53
Joined: Fri Mar 01, 2013 1:12 am

Re: 13001 WU failure

Postby Joe_H » Sat Oct 04, 2014 5:15 pm

The error indicates that you may have received a bad WU. So far no one has completed this WU, though one person did get about 25% of the way through it.
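
For readers wondering what a "Force RMSE" check might look like: the log above saves paired badStateForceGroup0_57114166Core.xml and badStateForceGroup0_57114166Ref.xml files, which suggests the core compares its own forces against a reference calculation. The following is only a sketch under that assumption. The function names, the per-atom averaging, and the way the threshold is applied are hypothetical and are not taken from FahCore_17.

```python
import math


def force_rmse(core_forces, ref_forces):
    """Root-mean-square error between two lists of (x, y, z) force vectors.

    Assumption: the error is averaged per atom. The real core may differ.
    """
    if len(core_forces) != len(ref_forces) or not core_forces:
        raise ValueError("force lists must be non-empty and the same length")
    total = 0.0
    for (cx, cy, cz), (rx, ry, rz) in zip(core_forces, ref_forces):
        total += (cx - rx) ** 2 + (cy - ry) ** 2 + (cz - rz) ** 2
    return math.sqrt(total / len(core_forces))


def check_forces(core_forces, ref_forces, threshold=5.0):
    """Raise if the RMSE exceeds the threshold (5 is the value shown in the log)."""
    rmse = force_rmse(core_forces, ref_forces)
    if rmse > threshold:
        raise RuntimeError(
            f"Force RMSE error of {rmse:g} with threshold of {threshold:g}"
        )
    return rmse
```

Under this reading, a value of 447 against a threshold of 5 would mean the two force calculations disagree badly, not marginally.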

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Joe_H
Site Admin
 
Posts: 4592
Joined: Tue Apr 21, 2009 4:41 pm
Location: W. MA

Re: 13001 WU failure

Postby Breach » Sat Oct 04, 2014 7:07 pm

I think this is a more general problem following some changes done today to the AS. You have a Maxwell like me and after the change we're being given Core 17 WUs which error out (or even crash the core) - see here:
https://foldingforum.org/viewtopic.php?f=18&t=26807&start=15

I don't know whether this is the case with all Core 17 WUs on Maxwells or just some projects. From what I understand, it's an old problem that has re-emerged with the new AS and the recent changes. After failing every WU I have received, I have stopped GPU folding for now (at least with Core 15 WUs we could do something ;-) ).
Windows 10 x64 / i6700k @4.6Ghz / ASUS Sabertooth Z170 / 16GB DDR4 2400 CL10 / MSI 1070 @2000MHz / Creative Titanium HD / Tube amp, Sennheiser 650 / PSU Corsair AX1200i / Samsung 840 Pro, OCZ Vertex 3 SSD, HDDs in RAID
Breach
 
Posts: 187
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Re: 13001 WU failure

Postby bruce » Sat Oct 04, 2014 10:31 pm

Breach wrote:I think this is a more general problem following some changes done today to the AS. You have a Maxwell like me and after the change we're being given Core 17 WUs which error out (or even crash the core) - see here:
https://foldingforum.org/viewtopic.php?f=18&t=26807&start=15

I don't know whether this is the case with all Core 17 WUs and Maxwells or just some projects. From what I understand it's an old problem which emerged again with the new AS and the recent changes. After failing all WUs I have received I stopped GPU folding for now (at least with Core 15 WUs we could do something ;-)


Maxwell GPUs are most definitely more reliable with the latest drivers than with older versions. I'm not sure if that's significant for FahCore_17, but it's worth considering.

While changes to the AS code have altered the assignment probabilities for specific projects, actual changes may not match with our perception of how particular projects behave.
bruce
 
Posts: 22852
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 13001 WU failure

Postby Kjetil » Sat Oct 04, 2014 10:44 pm

The latest Short Lived Branch version is 343.22, so he has the latest drivers for Linux. I have the same problems on Windows. Isn't it the AS rather than the drivers?
Kjetil
 
Posts: 128
Joined: Sat Apr 14, 2012 5:56 pm
Location: Stavanger Norway

Re: 13001 WU failure

Postby Breach » Sat Oct 04, 2014 10:44 pm

bruce, right now all Core 17 WUs assigned to Maxwells seem to fail (with the latest drivers). In my case it is about 10 out of 10. I posted here because I don't think this is an isolated incident.
Breach
 
Posts: 187
Joined: Sat Mar 09, 2013 8:07 pm
Location: Brussels, Belgium

Re: 13001 WU failure

Postby bfromcolo » Sun Oct 05, 2014 2:33 pm

My system runs 9201 fine, but overnight it stopped processing after 10 consecutive 13001 failures. Is there any flag that would make 9201 WUs more likely to be assigned?

Code:
23:33:52:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 454.735 with threshold of 5
23:34:09:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:38:00:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 453.528 with threshold of 5
23:38:18:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:42:10:WU02:FS01:0x17:ERROR:exception: Force RMSE error of 446.944 with threshold of 5
23:42:28:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:46:29:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 453.412 with threshold of 5
23:46:47:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:50:41:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 451.321 with threshold of 5
23:50:59:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:54:50:WU02:FS01:0x17:ERROR:exception: Force RMSE error of 452.633 with threshold of 5
23:55:07:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
23:59:01:WU03:FS01:0x17:ERROR:exception: Force RMSE error of 455.484 with threshold of 5
23:59:17:WARNING:WU03:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
00:03:25:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 456.956 with threshold of 5
00:03:42:WARNING:WU00:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
00:07:31:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 450.132 with threshold of 5
00:07:48:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
00:11:39:WU02:FS01:0x17:ERROR:exception: Force RMSE error of 452.811 with threshold of 5
00:11:56:WARNING:WU02:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
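
A run of failures like the one above is easier to compare across machines when it is reduced to a few numbers. The following is a minimal sketch that pulls the Force RMSE values out of pasted log text. The function name and the regular expression are illustrative, and the pattern assumes the line format shown in this thread.

```python
import re

# Matches lines such as:
# 23:33:52:WU00:FS01:0x17:ERROR:exception: Force RMSE error of 454.735 with threshold of 5
RMSE_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:WU\d+:FS\d+:0x[0-9a-fA-F]+:ERROR:exception: "
    r"Force RMSE error of ([\d.]+) with threshold of ([\d.]+)"
)


def summarize_rmse_failures(log_text):
    """Return (count, minimum, maximum, mean) of the Force RMSE values in log_text."""
    values = []
    for line in log_text.splitlines():
        match = RMSE_RE.match(line.strip())
        if match:
            values.append(float(match.group(1)))
    if not values:
        return 0, None, None, None
    return len(values), min(values), max(values), sum(values) / len(values)
```

Applied to the excerpt above, this would report ten failures with values between 446.944 and 456.956. All of them are far above the threshold of 5 and close to one another.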
bfromcolo
 
Posts: 53
Joined: Fri Mar 01, 2013 1:12 am

Re: 13001 WU failure

Postby snapshot » Sun Oct 05, 2014 7:36 pm

I've just had the same problem:
Code:
18:57:26:WU02:FS00:0x17:ERROR:exception: Force RMSE error of 455.059 with threshold of 5
18:57:26:WU02:FS00:0x17:Saving result file logfile_01.txt
18:57:26:WU02:FS00:0x17:Saving result file log.txt
18:57:26:WU02:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
18:57:26:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
18:57:26:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:13001 run:62 clone:3 gen:11 core:0x17 unit:0x0000001e538b3db753286153604b81f0
18:57:26:WU02:FS00:Uploading 2.30KiB to 140.163.4.231
18:57:26:WU02:FS00:Connecting to 140.163.4.231:8080
18:57:26:WU02:FS00:Upload complete
18:57:26:WU02:FS00:Server responded WORK_ACK (400)
18:57:26:WU02:FS00:Cleaning up


Nvidia drivers 340.52 under W7 Pro 64. I'll try the 344.11 drivers on my test box but I wasn't using them because they were so poor on 9201s.
Last edited by snapshot on Mon Oct 06, 2014 5:44 am, edited 1 time in total.
snapshot
 
Posts: 121
Joined: Thu Apr 09, 2009 7:25 pm
Location: Wiltshire, UK

Re: 13001 WU failure

Postby 7im » Sun Oct 05, 2014 8:06 pm

What version of fahcore?

On what kind of hardware? We need more info to help you.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
7im
 
Posts: 14648
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: 13001 WU failure

Postby snapshot » Sun Oct 05, 2014 8:23 pm

FAHcore is version 52. Hardware is i7-3770, 16GB RAM, GTX750ti.

Just had another one:
Code:
20:06:01:WU02:FS00:0x17:ERROR:exception: Force RMSE error of 450.68 with threshold of 5
20:06:01:WU02:FS00:0x17:Saving result file logfile_01.txt
20:06:01:WU02:FS00:0x17:Saving result file log.txt
20:06:01:WU02:FS00:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
20:06:01:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
20:06:01:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:13000 run:129 clone:0 gen:50 core:0x17 unit:0x00000066538b3db7530fc0694e857c15
20:06:01:WU02:FS00:Uploading 2.31KiB to 140.163.4.231
20:06:01:WU02:FS00:Connecting to 140.163.4.231:8080
20:06:02:WU02:FS00:Upload complete
20:06:02:WU02:FS00:Server responded WORK_ACK (400)
20:06:02:WU02:FS00:Cleaning up
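
Since the two failures reported here come from different projects (13001 and 13000), it can help to list the project, run, clone and gen of every faulty unit in a log. The following is a rough sketch that assumes the "Sending unit results" line format seen in this thread. The function name is made up for illustration.

```python
import re

# Matches the client's "Sending unit results" line and captures its key fields.
UNIT_RE = re.compile(
    r"Sending unit results: id:(?P<id>\d+) state:(?P<state>\w+) "
    r"error:(?P<error>\w+) project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+) core:(?P<core>0x[0-9a-fA-F]+)"
)


def faulty_units(log_text):
    """Return (project, run, clone, gen) tuples for units reported as FAULTY."""
    units = []
    for match in UNIT_RE.finditer(log_text):
        if match.group("error") == "FAULTY":
            units.append(
                tuple(int(match.group(k)) for k in ("project", "run", "clone", "gen"))
            )
    return units
```

Quoting these four numbers along with the GPU model makes it easier for others to check whether the same unit failed elsewhere.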


This is a system that was 100% stable with 9201, 8108 and 762x WUs and has not had any hardware changes or any extra software installed other than MS updates as I've been away from home for the last four days.
This is preventing me folding with my GPU and, if I can only use the CPU, then I'm just not going to bother.
snapshot
 
Posts: 121
Joined: Thu Apr 09, 2009 7:25 pm
Location: Wiltshire, UK

Re: 13001 WU failure

Postby gwildperson » Mon Oct 06, 2014 12:57 am

snapshot wrote:Nvidia drivers 304.52 under W7 Pro 64. I'll try the 344.11 drivers on my test box but I wasn't using them because they were so poor on 9201s.


Why 344.11, when 344.16 was released 5 days later?
gwildperson
 
Posts: 726
Joined: Tue Dec 04, 2007 8:36 pm

Re: 13001 WU failure

Postby Kjetil » Mon Oct 06, 2014 1:07 am

344.16 is ONLY for the 970 and 980.
Kjetil
 
Posts: 128
Joined: Sat Apr 14, 2012 5:56 pm
Location: Stavanger Norway

Re: 13001 WU failure

Postby Razzaa » Mon Oct 06, 2014 1:44 am

I am having the exact same issues. I have tried numerous things to fix it, but now my GPU won't fold at all.
Razzaa
 
Posts: 2
Joined: Mon Oct 06, 2014 1:41 am

Re: 13001 WU failure

Postby bruce » Mon Oct 06, 2014 2:54 am

Razzaa wrote:I am having the exact same issues. I have tried numerous things to fix it, but now my GPU won't fold at all.


Please report which GPU you have and which drivers you are running.
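
Some of the details requested in this thread can be read straight from the top of a client log. The following is a small sketch that collects them. The field list and function name are illustrative, and the patterns assume the header layout of the logs posted in this topic. The NVIDIA driver version is not among the fields extracted here, so it still needs to be stated separately.

```python
import re

# Header lines look like "15:45:26:      GPU 0: NVIDIA:4 GM107 [GeForce GTX 750 Ti]".
FIELD_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:\s*(Version|OS|GPU \d+|CUDA Driver):\s*(.+?)\s*$"
)
# Core lines look like "14:18:21:WU01:FS01:0x17:Version 0.0.52".
CORE_VERSION_RE = re.compile(r":0x[0-9a-fA-F]+:Version (\S+)")


def support_summary(log_text):
    """Collect client version, OS, GPU and FahCore version from a pasted log."""
    info = {}
    for line in log_text.splitlines():
        field = FIELD_RE.match(line)
        if field:
            # Keep the first occurrence of each field.
            info.setdefault(field.group(1), field.group(2))
            continue
        core = CORE_VERSION_RE.search(line)
        if core:
            info.setdefault("FahCore version", core.group(1))
    return info
```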
bruce
 
Posts: 22852
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 13001 WU failure

Postby Barryfla » Mon Oct 06, 2014 3:41 am

I am having the same problem that others have described. My GTX 750 Ti won't fold. System: driver version 334.89, Windows 7, AMD FX-6350 (6 cores) and 16 GB of RAM.

14:18:11:WU01:FS01:Connecting to 171.67.108.201:80
14:18:12:WU01:FS01:Assigned to work server 140.163.4.231
14:18:12:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GM107 [GeForce GTX 750 Ti] from 140.163.4.231
14:18:12:WU01:FS01:Connecting to 140.163.4.231:8080
14:18:12:WU01:FS01:Downloading 4.84MiB
14:18:17:WU01:FS01:Download complete
14:18:17:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:13001 run:48 clone:4 gen:34 core:0x17 unit:0x00000048538b3db753285d6453ddcf7a
14:18:17:WU01:FS01:Starting
14:18:17:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Barry/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17.exe -dir 01 -suffix 01 -version 704 -lifeline 13732 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
14:18:17:WU01:FS01:Started FahCore on PID 12816
14:18:17:WU01:FS01:Core PID:5296
14:18:17:WU01:FS01:FahCore 0x17 started
14:18:18:WU01:FS01:0x17:*********************** Log Started 2014-10-06T14:18:18Z ***********************
14:18:18:WU01:FS01:0x17:Project: 13001 (Run 48, Clone 4, Gen 34)
14:18:18:WU01:FS01:0x17:Unit: 0x00000048538b3db753285d6453ddcf7a
14:18:18:WU01:FS01:0x17:CPU: 0x00000000000000000000000000000000
14:18:18:WU01:FS01:0x17:Machine: 1
14:18:18:WU01:FS01:0x17:Reading tar file state.xml
14:18:19:WU01:FS01:0x17:Reading tar file system.xml
14:18:20:WU01:FS01:0x17:Reading tar file integrator.xml
14:18:20:WU01:FS01:0x17:Reading tar file core.xml
14:18:20:WU01:FS01:0x17:Digital signatures verified
14:18:21:WU01:FS01:0x17:Folding@home GPU core17
14:18:21:WU01:FS01:0x17:Version 0.0.52
14:22:20:WU01:FS01:0x17:ERROR:exception: Force RMSE error of 455.674 with threshold of 5
14:22:20:WU01:FS01:0x17:Saving result file logfile_01.txt
14:22:20:WU01:FS01:0x17:Saving result file log.txt
14:22:20:WU01:FS01:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
14:22:21:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
14:22:21:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:13001 run:48 clone:4 gen:34 core:0x17 unit:0x00000048538b3db753285d6453ddcf7a
Last edited by Barryfla on Tue Oct 07, 2014 12:07 am, edited 1 time in total.
Barryfla
 
Posts: 5
Joined: Sat Sep 27, 2014 6:10 pm
