Repeated Download Failure

Moderators: Site Moderators, FAHC Science Team

N0OA
Posts: 38
Joined: Wed Feb 13, 2013 6:55 am
Hardware configuration: CPU:4 AMD Phenom II X4 910 @ 2.6GHz
CPU:2 Intel Core Duo T2600 @ 2.13GHz
CPU:4 Intel Core i5
CPU:4 Intel Core i5 M520 2.40 GHz
CPU:8 Intel Core i7-2600K @ 3.40GHz
CPU:8 Intel Core i7-3720QM @ 2.6GHz
CPU:7 Intel Core i7-3770 @ 3.40GHz
CPU:8 Intel Core i7-3820QM @ 2.7GHz
CPU:12 Intel Core i7-3930K @ 3.20GHz
CPU:10 Intel Core i7-3960X Hexa-Core 3.3GHz
CPU:10 Intel Core i7-3960X Hexa-Core 3.3GHz
CPU:2 Intel Pentium® D @ 2.80GHz
CPU:30 Intel XEON CPU E5-2687W @3.1GHz (2x)
GPU NVIDIA GT 640
GPU NVIDIA GT218 [NVS 3100M]
GPU NVIDIA GTX 570 HD EVGA
GPU NVIDIA GTX 660 Ti Zotac
GPU NVIDIA GTX 660 Ti Zotac
GPU NVIDIA GTX 660 Ti Zotac
GPU NVIDIA GTX 660 Ti Zotac
GPU NVIDIA GTX 680 EVGA
GPU NVIDIA GTX 680 EVGA
GPU NVIDIA GTX 680 GeForce
GPU NVIDIA GTX 680 GeForce
GPU NVIDIA GTX 680 GeForce
GPU NVIDIA GTX 680 GIGABYTE
GPU NVIDIA GTX 680 GIGABYTE
GPU NVIDIA GTX Titan EVGA
GPU NVIDIA GTX Titan EVGA
GPU NVIDIA Tesla K20c
Location: Minnesota

Repeated Download Failure

Post by N0OA »

Can anyone provide insight into a problem I've been seeing over the last few days on a GTX Titan? The slot seems to go into a cycle: download, run for a bit, fail, and then download again... Rinse and repeat ;-)

The log seems to indicate a faulty project... Any ideas?

N0OA

Code:

16:25:54:WU00:FS03:0xa3:Completed 454650 out of 500000 steps  (90%)
16:26:13:WU02:FS02:0x17:Completed 0 out of 2000000 steps (0%)
16:26:23:WU00:FS03:0xa3:Completed 455000 out of 500000 steps  (91%)
16:26:24:WU03:FS00:0x17:Completed 1700000 out of 2000000 steps (85%)
16:26:25:WU01:FS01:0x17:Completed 1150000 out of 2000000 steps (57%)
16:27:17:WU02:FS02:0x17:Completed 20000 out of 2000000 steps (1%)
16:27:25:WU02:FS02:0x17:ERROR:exception: Error downloading array energyBuffer: clEnqueueReadBuffer (-36)
16:27:25:WU02:FS02:0x17:Saving result file logfile_01.txt
16:27:25:WU02:FS02:0x17:Saving result file log.txt
16:27:25:WU02:FS02:0x17:Folding@home Core Shutdown: BAD_WORK_UNIT
16:27:25:WARNING:WU02:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
16:27:25:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY project:7811 run:0 clone:185 gen:136 core:0x17 unit:0x0000008f0a3b1e8651db47d527d80df1
16:27:25:WU02:FS02:Uploading 2.45KiB to 171.64.65.98
16:27:25:WU02:FS02:Connecting to 171.64.65.98:8080
16:27:25:WU04:FS02:Connecting to assign-GPU.stanford.edu:80
16:27:25:WU02:FS02:Upload complete
16:27:25:WU02:FS02:Server responded WORK_ACK (400)
16:27:25:WU02:FS02:Cleaning up
16:27:26:WU04:FS02:News: Welcome to Folding@Home
16:27:26:WU04:FS02:Assigned to work server 171.64.65.98
16:27:26:WU04:FS02:Requesting new work unit for slot 02: READY gpu:2:GK110 [GeForce GTX Titan] from 171.64.65.98
16:27:26:WU04:FS02:Connecting to 171.64.65.98:8080
16:27:26:WU04:FS02:Downloading 2.09MiB
16:27:28:WU04:FS02:Download complete
16:27:28:WU04:FS02:Received Unit: id:04 state:DOWNLOAD error:NO_ERROR project:7810 run:0 clone:523 gen:94 core:0x17 unit:0x000000650a3b1e8651d34b92198db5ad
16:27:28:WU04:FS02:Starting
16:27:28:WU04:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/jrice/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/beta/Core_17.fah/FahCore_17.exe -dir 04 -suffix 01 -version 703 -lifeline 3736 -checkpoint 10 -gpu 2 -gpu-vendor nvidia
16:27:28:WU04:FS02:Started FahCore on PID 1636
16:27:28:WU04:FS02:Core PID:2568
16:27:28:WU04:FS02:FahCore 0x17 started
16:27:28:WU04:FS02:0x17:*********************** Log Started 2013-08-12T16:27:28Z ***********************
16:27:28:WU04:FS02:0x17:Project: 7810 (Run 0, Clone 523, Gen 94)
16:27:28:WU04:FS02:0x17:Unit: 0x000000650a3b1e8651d34b92198db5ad
16:27:28:WU04:FS02:0x17:CPU: 0x00000000000000000000000000000000
16:27:28:WU04:FS02:0x17:Machine: 2
16:27:28:WU04:FS02:0x17:Reading tar file state.xml
16:27:29:WU04:FS02:0x17:Reading tar file system.xml
16:27:29:WU04:FS02:0x17:Reading tar file integrator.xml
16:27:29:WU04:FS02:0x17:Reading tar file core.xml
16:27:29:WU04:FS02:0x17:Digital signatures verified
16:27:37:WU01:FS01:0x17:Completed 1160000 out of 2000000 steps (58%)
16:28:16:WU04:FS02:0x17:Completed 0 out of 2000000 steps (0%)
16:28:20:WU03:FS00:0x17:Completed 1720000 out of 2000000 steps (86%)
16:28:26:WARNING:WU04:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
16:28:26:WU04:FS02:Sending unit results: id:04 state:SEND error:FAULTY project:7810 run:0 clone:523 gen:94 core:0x17 unit:0x000000650a3b1e8651d34b92198db5ad
16:28:26:WU04:FS02:Uploading 2.51KiB to 171.64.65.98
16:28:26:WU04:FS02:Connecting to 171.64.65.98:8080
16:28:27:WU04:FS02:Upload complete
16:28:27:WU04:FS02:Server responded WORK_ACK (400)
16:28:27:WU04:FS02:Cleaning up
16:28:27:WU02:FS02:Connecting to assign-GPU.stanford.edu:80
16:28:27:WU02:FS02:News: Welcome to Folding@Home
16:28:27:WU02:FS02:Assigned to work server 171.64.65.98
16:28:27:WU02:FS02:Requesting new work unit for slot 02: READY gpu:2:GK110 [GeForce GTX Titan] from 171.64.65.98
16:28:27:WU02:FS02:Connecting to 171.64.65.98:8080
16:28:28:WU02:FS02:Downloading 2.07MiB
16:28:29:WU02:FS02:Download complete
16:28:29:WU02:FS02:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:7810 run:0 clone:802 gen:5 core:0x17 unit:0x000000070a3b1e8651d34ebb80a17d5b
16:28:29:WU02:FS02:Starting
16:28:29:WU02:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/jrice/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/beta/Core_17.fah/FahCore_17.exe -dir 02 -suffix 01 -version 703 -lifeline 3736 -checkpoint 10 -gpu 2 -gpu-vendor nvidia
16:28:29:WU02:FS02:Started FahCore on PID 3916
16:28:29:WU02:FS02:Core PID:5116
16:28:29:WU02:FS02:FahCore 0x17 started
16:28:30:WU02:FS02:0x17:*********************** Log Started 2013-08-12T16:28:30Z ***********************
16:28:30:WU02:FS02:0x17:Project: 7810 (Run 0, Clone 802, Gen 5)
16:28:30:WU02:FS02:0x17:Unit: 0x000000070a3b1e8651d34ebb80a17d5b
16:28:30:WU02:FS02:0x17:CPU: 0x00000000000000000000000000000000
16:28:30:WU02:FS02:0x17:Machine: 2
16:28:30:WU02:FS02:0x17:Reading tar file state.xml
16:28:30:WU02:FS02:0x17:Reading tar file system.xml
16:28:30:WU02:FS02:0x17:Reading tar file integrator.xml
16:28:30:WU02:FS02:0x17:Reading tar file core.xml
16:28:30:WU02:FS02:0x17:Digital signatures verified
ChristianVirtual
Posts: 1596
Joined: Tue May 28, 2013 12:14 pm
Location: Tokyo

Re: Repeated Download Failure

Post by ChristianVirtual »

Which driver are you using, and do you restart every 24 to 36 hours? There is still a driver bug affecting GTX 7xx GPUs; it might also impact the Titan.
Is any overclock applied? How are the temperatures with multiple Titans?
Please contribute your logs to http://ppd.fahmm.net
bollix47
Posts: 2950
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: Repeated Download Failure

Post by bollix47 »

Status code 0x72 (114, BAD_WORK_UNIT) could indicate a memory problem. If the GPU memory is overclocked, reduce it to the default. Higher memory speed offers little to no advantage for folding, but it can make folding unstable if it's set too high.

You can also run MemtestCL to determine if the memory is flaky.
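Before blaming the project, it can help to tally the failures in the log by folding slot. The sketch below is only an illustration, assuming the log line layout seen in the excerpt above; the function name summarize_failures and the regular expressions are my own and not part of FAHClient. If one slot keeps returning BAD_WORK_UNIT across different projects, the hardware behind that slot is a more likely suspect than any single project.

```python
import re
from collections import Counter

# Assumed layout, taken from the log excerpt in this thread:
# 16:27:25:WARNING:WU02:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
RETURNED = re.compile(
    r"^\d{2}:\d{2}:\d{2}:WARNING:WU(?P<wu>\d+):FS(?P<slot>\d+):"
    r"FahCore returned: (?P<status>\w+) "
    r"\((?P<dec>\d+) = (?P<hex>0x[0-9a-fA-F]+)\)"
)

# Assumed layout: "...:FS02:Sending unit results: ... error:FAULTY project:7811 ..."
FAULTY = re.compile(
    r":FS(?P<slot>\d+):Sending unit results:.*error:FAULTY project:(?P<project>\d+)"
)


def summarize_failures(lines):
    """Count warning-level core returns per (slot, status) and collect
    the project numbers reported as FAULTY for each slot."""
    failures = Counter()   # (slot, status) -> number of occurrences
    projects = {}          # slot -> set of project numbers flagged FAULTY
    for line in lines:
        m = RETURNED.match(line)
        if m:
            failures[(m.group("slot"), m.group("status"))] += 1
            continue
        m = FAULTY.search(line)
        if m:
            projects.setdefault(m.group("slot"), set()).add(m.group("project"))
    return failures, projects


# Lines copied from the log posted above.
SAMPLE = [
    "16:27:25:WARNING:WU02:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "16:27:25:WU02:FS02:Sending unit results: id:02 state:SEND error:FAULTY "
    "project:7811 run:0 clone:185 gen:136 core:0x17 "
    "unit:0x0000008f0a3b1e8651db47d527d80df1",
    "16:28:26:WARNING:WU04:FS02:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    "16:28:26:WU04:FS02:Sending unit results: id:04 state:SEND error:FAULTY "
    "project:7810 run:0 clone:523 gen:94 core:0x17 "
    "unit:0x000000650a3b1e8651d34b92198db5ad",
]

if __name__ == "__main__":
    failures, projects = summarize_failures(SAMPLE)
    for (slot, status), count in sorted(failures.items()):
        print(f"slot {slot}: {status} x{count}, "
              f"projects {sorted(projects.get(slot, []))}")
```

On the posted excerpt, both failures land on slot 02 while the other slots keep making progress, and the two faulty units come from different projects (7811 and 7810), which points at that one GPU rather than at a bad project.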
N0OA
Posts: 38
Joined: Wed Feb 13, 2013 6:55 am
Location: Minnesota

Re: Repeated Download Failure

Post by N0OA »

The machine had been running fine for several months. It was set to 75 °C and underclocked to avoid heat issues, since there are three Titans in this box. Since I had another Titan, I did a little round robin until I figured out which piece of hardware was causing the issue. The problem is now gone, so I guess I have a failing card. I requested an RMA and EVGA will be shipping me a replacement. I've never had a card with such a soft failure. The card worked in all other regards and tested just fine with a few of the GPU test programs I have. But when I looked more closely, it was completing the tests while running at only about 70% of the speed of my other Titans... Something is flaky.

I will put this one in the "no longer an issue" category since it's going back to EVGA. The customer support at EVGA has been great to work with... Maybe that's part of the reason I have so much of their hardware. :-)

N0OA
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicone fan gaskets and silicone mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Repeated Download Failure

Post by 7im »

In your case, a 33% failure rate doesn't seem so good.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.