GPU_MEMTEST_ERROR

Moderators: Site Moderators, PandeGroup

Re: GPU_MEMTEST_ERROR

Postby mmonnin » Thu Jan 24, 2013 11:52 am

Some time ago there was a server assignment problem that gave NVIDIA WUs to AMD cards. The client downloads whichever core the WU needs, so deleting the core won't help. Only once you received a correct WU could you get the proper core to work on.
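To check which core a WU actually asked for, look at the client log. The "Received Unit" line records the core (for example core:0x15), and the "Running FahCore" line shows the directory that core was downloaded into (for example NVIDIA/Fermi/Core_15.fah). The sketch below pulls both values out. It assumes log lines formatted like the ones posted later in this thread, and the function names are made up for illustration. It is not part of FAHClient.

```python
import re

# "Received Unit" lines record the core a WU asks for, e.g. "... core:0x15 unit:..."
RECEIVED_RE = re.compile(r"Received Unit: id:(\d+) .*?core:(0x[0-9a-fA-F]+)")
# "Running FahCore" lines contain the core path, e.g. ".../NVIDIA/Fermi/Core_15.fah/..."
CORE_PATH_RE = re.compile(r"/([^/\s]+)/Core_([0-9a-fA-F]+)\.fah/")


def required_core(line):
    """Return (wu_id, core) from a 'Received Unit' log line, or None."""
    m = RECEIVED_RE.search(line)
    return (m.group(1), m.group(2).lower()) if m else None


def launched_core(line):
    """Return (core_directory, core) from a 'Running FahCore' log line, or None."""
    m = CORE_PATH_RE.search(line)
    return (m.group(1), m.group(2).lower()) if m else None


received = ("23:29:54:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR "
            "project:8072 run:0 clone:3173 gen:8 core:0x15 "
            "unit:0x000000096652edb450b42af6d8e17301")
running = ('23:29:54:WU02:FS01:Running FahCore: "C:\\Program Files (x86)\\FAHClient/'
           'FAHCoreWrapper.exe" C:/Users/Matt/AppData/Roaming/FAHClient/cores/'
           'www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/'
           'FahCore_15.exe -dir 02 -suffix 01 -version 702 -lifeline 4468 '
           '-checkpoint 15 -gpu 1')
print(required_core(received))  # ('02', '0x15')
print(launched_core(running))   # ('Fermi', '15')
```

If an NVIDIA slot shows a core that does not belong under the NVIDIA directories, that would point to the kind of assignment problem described above.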
mmonnin
 
Posts: 335
Joined: Wed Dec 05, 2007 1:27 am

Re: GPU_MEMTEST_ERROR

Postby meltz511 » Sun Jan 27, 2013 11:50 pm

I'm getting the GPU_MEMTEST_ERROR too

Code:
*********************** Log Started 2013-01-27T23:29:52Z ***********************
23:29:52:************************* Folding@home Client *************************
23:29:52:      Website: http://folding.stanford.edu/
23:29:52:    Copyright: (c) 2009-2012 Stanford University
23:29:52:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
23:29:52:         Args: --lifeline 5668 --command-port=36331
23:29:52:       Config: C:/Users/Matt/AppData/Roaming/FAHClient/config.xml
23:29:52:******************************** Build ********************************
23:29:52:      Version: 7.2.9
23:29:52:         Date: Oct 3 2012
23:29:52:         Time: 18:05:48
23:29:52:      SVN Rev: 3578
23:29:52:       Branch: fah/trunk/client
23:29:52:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
23:29:52:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
23:29:52:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
23:29:52:     Platform: win32 XP
23:29:52:         Bits: 32
23:29:52:         Mode: Release
23:29:52:******************************* System ********************************
23:29:52:          CPU: AMD FX(tm)-8350 Eight-Core Processor
23:29:52:       CPU ID: AuthenticAMD Family 21 Model 2 Stepping 0
23:29:52:         CPUs: 8
23:29:52:       Memory: 2.00GiB
23:29:52:  Free Memory: 661.68MiB
23:29:52:      Threads: WINDOWS_THREADS
23:29:52:   On Battery: false
23:29:52:   UTC offset: -5
23:29:52:          PID: 4468
23:29:52:          CWD: C:/Users/Matt/AppData/Roaming/FAHClient
23:29:52:           OS: Windows 7 Ultimate
23:29:52:      OS Arch: AMD64
23:29:52:         GPUs: 2
23:29:52:        GPU 0: NVIDIA:1 G92 [GeForce 9800 GT]
23:29:52:        GPU 1: NVIDIA:3 GK104 [GeForce GTX 660 Ti]
23:29:52:         CUDA: 3.0
23:29:52:  CUDA Driver: 5000
23:29:52:Win32 Service: false
23:29:52:***********************************************************************
23:29:52:<config>
23:29:52:  <!-- Folding Slot Configuration -->
23:29:52:  <gpu v='true'/>
23:29:52:
23:29:52:  <!-- User Information -->
23:29:52:  <passkey v='********************************'/>
23:29:52:  <team v='59'/>
23:29:52:  <user v='Matthew_J'/>
23:29:52:
23:29:52:  <!-- Folding Slots -->
23:29:52:</config>
23:29:52:Trying to access database...
23:29:52:Successfully acquired database lock
23:29:52:Enabled folding slot 00: READY gpu:0:"G92 [GeForce 9800 GT]"
23:29:52:Enabled folding slot 01: READY gpu:1:"GK104 [GeForce GTX 660 Ti]"
23:29:52:Enabled folding slot 02: READY smp:8
23:29:52:WARNING:WU01:No longer matches Slot 2's configuration, migrating to FS00
23:29:52:WARNING:WU00:No longer matches Slot 2's configuration, migrating to FS00
23:29:52:WARNING:WU03:No longer matches Slot 0's configuration, migrating to FS02
23:29:52:WU01:FS00:Starting
23:29:52:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Matt/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/G80/Core_11.fah/FahCore_11.exe -dir 01 -suffix 01 -version 702 -lifeline 4468 -checkpoint 15 -gpu 0
23:29:52:WU01:FS00:Started FahCore on PID 5416
23:29:52:WU01:FS00:Core PID:2484
23:29:52:WU01:FS00:FahCore 0x11 started
23:29:52:WU03:FS02:Starting
23:29:52:WU03:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Matt/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 03 -suffix 01 -version 702 -lifeline 4468 -checkpoint 15 -np 8
23:29:52:WU03:FS02:Started FahCore on PID 5024
23:29:52:WU03:FS02:Core PID:476
23:29:52:WU03:FS02:FahCore 0xa4 started
23:29:52:WU02:FS01:Connecting to assign-GPU.stanford.edu:80
23:29:52:WU02:FS01:News: Welcome to Folding@Home
23:29:52:WU02:FS01:Assigned to work server 171.67.108.36
23:29:52:WU02:FS01:Requesting new work unit for slot 01: READY gpu:1:"GK104 [GeForce GTX 660 Ti]" from 171.67.108.36
23:29:52:WU02:FS01:Connecting to 171.67.108.36:8080
23:29:53:WU01:FS00:0x11:
23:29:53:WU01:FS00:0x11:*------------------------------*
23:29:53:WU01:FS00:0x11:Folding@Home GPU Core
23:29:53:WU01:FS00:0x11:Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
23:29:53:WU01:FS00:0x11:
23:29:53:WU01:FS00:0x11:Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
23:29:53:WU01:FS00:0x11:Build host: amoeba
23:29:53:WU01:FS00:0x11:Board Type: Nvidia
23:29:53:WU01:FS00:0x11:Core      :
23:29:53:WU01:FS00:0x11:Preparing to commence simulation
23:29:53:WU01:FS00:0x11:- Ensuring status. Please wait.
23:29:53:WU03:FS02:0xa4:
23:29:53:WU03:FS02:0xa4:*------------------------------*
23:29:53:WU03:FS02:0xa4:Folding@Home Gromacs GB Core
23:29:53:WU03:FS02:0xa4:Version 2.27 (Dec. 15, 2010)
23:29:53:WU03:FS02:0xa4:
23:29:53:WU03:FS02:0xa4:Preparing to commence simulation
23:29:53:WU03:FS02:0xa4:- Looking at optimizations...
23:29:53:WU03:FS02:0xa4:- Files status OK
23:29:53:WU03:FS02:0xa4:- Expanded 548106 -> 848540 (decompressed 154.8 percent)
23:29:53:WU03:FS02:0xa4:Called DecompressByteArray: compressed_data_size=548106 data_size=848540, decompressed_data_size=848540 diff=0
23:29:53:WU03:FS02:0xa4:- Digital signature verified
23:29:53:WU03:FS02:0xa4:
23:29:53:WU03:FS02:0xa4:Project: 7645 (Run 201, Clone 0, Gen 57)
23:29:53:WU03:FS02:0xa4:
23:29:53:WU03:FS02:0xa4:Assembly optimizations on if available.
23:29:53:WU03:FS02:0xa4:Entering M.D.
23:29:53:WU02:FS01:Downloading 56.19KiB
23:29:54:WU02:FS01:Download complete
23:29:54:WU02:FS01:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:8072 run:0 clone:3173 gen:8 core:0x15 unit:0x000000096652edb450b42af6d8e17301
23:29:54:WU02:FS01:Starting
23:29:54:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Matt/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 02 -suffix 01 -version 702 -lifeline 4468 -checkpoint 15 -gpu 1
23:29:54:WU02:FS01:Started FahCore on PID 5756
23:29:54:WU02:FS01:Core PID:5636
23:29:54:WU02:FS01:FahCore 0x15 started
23:29:54:WU02:FS01:0x15:
23:29:54:WU02:FS01:0x15:*------------------------------*
23:29:54:WU02:FS01:0x15:Folding@Home GPU Core
23:29:54:WU02:FS01:0x15:Version                2.25 (Wed May 9 17:03:01 EDT 2012)
23:29:54:WU02:FS01:0x15:Build host             AmoebaRemote
23:29:54:WU02:FS01:0x15:Board Type             NVIDIA/CUDA
23:29:54:WU02:FS01:0x15:Core                   15
23:29:54:WU02:FS01:0x15:GPU device info vendor=0 device=0 name=NA match=0 deviceId=1
23:29:54:WU02:FS01:0x15:
23:29:54:WU02:FS01:0x15:Window's signal control handler registered.
23:29:54:WU02:FS01:0x15:Preparing to commence simulation
23:29:54:WU02:FS01:0x15:- Looking at optimizations...
23:29:54:WU02:FS01:0x15:DeleteFrameFiles: successfully deleted file=02/wudata_01.ckp
23:29:54:WU02:FS01:0x15:- Created dyn
23:29:54:WU02:FS01:0x15:- Files status OK
23:29:54:WU02:FS01:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
23:29:54:WU02:FS01:0x15:- Expanded 57028 -> 257342 (decompressed 451.2 percent)
23:29:54:WU02:FS01:0x15:Called DecompressByteArray: compressed_data_size=57028 data_size=257342, decompressed_data_size=257342 diff=0
23:29:54:WU02:FS01:0x15:- Digital signature verified
23:29:54:WU02:FS01:0x15:
23:29:54:WU02:FS01:0x15:Project: 8072 (Run 0, Clone 3173, Gen 8)
23:29:54:WU02:FS01:0x15:
23:29:54:WU02:FS01:0x15:Assembly optimizations on if available.
23:29:54:WU02:FS01:0x15:Entering M.D.
23:29:55:Server connection id=1 on 0.0.0.0:36331 from 127.0.0.1
23:29:56:WU02:FS01:0x15:Tpr hash 02/wudata_01.tpr:  1489602251 1343173198 1634455585 1410368382 1511216589
23:29:56:WU02:FS01:0x15:GPU device id=1
23:29:56:WU02:FS01:0x15:Working on Giving Russians Opium May Alter Current Situation
23:29:56:WU02:FS01:0x15:Client config unavailable.
23:29:56:WU02:FS01:0x15:Starting GUI Server
23:29:58:WU03:FS02:0xa4:Using Gromacs checkpoints
23:29:58:WU03:FS02:0xa4:Mapping NT from 8 to 8
23:29:59:WU03:FS02:0xa4:Resuming from checkpoint
23:29:59:WU03:FS02:0xa4:Verified 03/wudata_01.log
23:29:59:WU03:FS02:0xa4:Verified 03/wudata_01.trr
23:29:59:WU03:FS02:0xa4:Verified 03/wudata_01.xtc
23:29:59:WU03:FS02:0xa4:Verified 03/wudata_01.edr
23:29:59:WU03:FS02:0xa4:Completed 1497990 out of 2500000 steps  (59%)
23:30:02:WU01:FS00:0x11:- Looking at optimizations...
23:30:02:WU01:FS00:0x11:- Working with standard loops on this execution.
23:30:02:WU01:FS00:0x11:- Previous termination of core was improper.
23:30:02:WU01:FS00:0x11:- Files status OK
23:30:02:WU01:FS00:0x11:- Expanded 62982 -> 336763 (decompressed 534.6 percent)
23:30:02:WU01:FS00:0x11:Called DecompressByteArray: compressed_data_size=62982 data_size=336763, decompressed_data_size=336763 diff=0
23:30:02:WU01:FS00:0x11:- Digital signature verified
23:30:02:WU01:FS00:0x11:
23:30:02:WU01:FS00:0x11:Project: 10501 (Run 296, Clone 0, Gen 1012)
23:30:02:WU01:FS00:0x11:
23:30:02:WU01:FS00:0x11:Entering M.D.
23:30:08:WU01:FS00:0x11:Tpr hash 01/wudata_01.tpr:  4235738032 1343441180 398228414 616308423 4208352261
23:30:08:WU01:FS00:0x11:
23:30:08:WU01:FS00:0x11:Calling fah_main args: 14 usage=100
23:30:08:WU01:FS00:0x11:
23:30:57:WU02:FS01:0x15:Finished fah_main status=59
23:30:57:WU02:FS01:0x15:mdrun_gpu returned 59
23:30:57:WU02:FS01:0x15:GPU memtest failure
23:30:57:WU02:FS01:0x15:
23:30:57:WU02:FS01:0x15:Folding@home Core Shutdown: GPU_MEMTEST_ERROR
23:30:57:WARNING:WU02:FS01:FahCore returned: GPU_MEMTEST_ERROR (124 = 0x7c)
23:30:57:WU02:FS01:Starting
23:30:57:WU02:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Matt/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 02 -suffix 01 -version 702 -lifeline 4468 -checkpoint 15 -gpu 1
meltz511
 
Posts: 10
Joined: Sun Jan 27, 2013 11:43 pm

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Mon Jan 28, 2013 12:48 am

meltz511 wrote: I'm getting the GPU_MEMTEST_ERROR too

It could be a driver issue, as you have one new GPU and one older GPU:
Code:
23:29:52:        GPU 0: NVIDIA:1 G92 [GeForce 9800 GT]
23:29:52:        GPU 1: NVIDIA:3 GK104 [GeForce GTX 660 Ti]
...
23:30:57:WU02:FS01:0x15:Finished fah_main status=59
23:30:57:WU02:FS01:0x15:mdrun_gpu returned 59
23:30:57:WU02:FS01:0x15:GPU memtest failure
23:30:57:WU02:FS01:0x15:
23:30:57:WU02:FS01:0x15:Folding@home Core Shutdown: GPU_MEMTEST_ERROR
23:30:57:WARNING:WU02:FS01:FahCore returned: GPU_MEMTEST_ERROR (124 = 0x7c)

The error is coming from the new GPU (GTX 660 Ti). High temperature is an unlikely cause, because the GPU had only just started the WU. How large (in watts) is your power supply? Is it big enough for both GPUs?
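The power-supply question can be turned into a rough budget check. The component figures below are assumptions, not measurements. They are approximate published ratings: about 125 W for the FX-8350, 150 W for the GTX 660 Ti and 105 W for the 9800 GT, plus a guessed 100 W for the rest of the system. The 850 W figure is the one the poster reports further down. Keeping continuous load under 80% of the PSU rating is a common rule of thumb, not a Folding@home requirement.

```python
def psu_headroom(psu_watts, component_watts, max_load_fraction=0.8):
    """Return (total_draw, budget, fits) for a simple PSU sizing check.

    budget is the share of the PSU rating we are willing to load
    continuously (0.8 by default, an assumed rule of thumb).
    """
    total = sum(component_watts.values())
    budget = psu_watts * max_load_fraction
    return total, budget, total <= budget


# Assumed, approximate figures for the system in this thread.
parts = {
    "FX-8350 (CPU)": 125,
    "GTX 660 Ti": 150,
    "9800 GT": 105,
    "board, drives, fans (guess)": 100,
}
total, budget, fits = psu_headroom(850, parts)
print(f"estimated draw {total} W, budget {budget:.0f} W, fits: {fits}")
# estimated draw 480 W, budget 680 W, fits: True
```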

Also, check your NVidia driver version. Anything newer than 306.97 can cause problems, and there have been numerous reports of problems with the newer NVidia drivers. Please see the information about it here.

I have dual Gigabyte GTX 660 Ti OC (GV-N66TOC-2GD) cards in my #6 system, please see the specifications here. They self-overclock as high as 1228MHz with rock-solid stability, using the 306.97 driver, and get as much as 39152 PPD per card. So I would suggest trying the 306.97 driver, as many users on the NVidia GeForce Forums have reported that it fixes the problems they were having with newer drivers.
art_l_j_PlanetAMD64
Over 1.04 Billion Total Points
Over 185,000 Work Units
Over 3,800,000 PPD
Overall rank (if points are combined) 20 of 1721690
In memory of my Mother May 12th 1923 - February 10th 2012
art_l_j_PlanetAMD64
 
Posts: 568
Joined: Sun May 30, 2010 2:28 pm

Re: GPU_MEMTEST_ERROR

Postby meltz511 » Mon Jan 28, 2013 5:41 am

Thanks. I'll have to try that out; I've been using the newest driver. The power supply is 850 watts.
meltz511
 
Posts: 10
Joined: Sun Jan 27, 2013 11:43 pm

Re: GPU_MEMTEST_ERROR

Postby meltz511 » Mon Jan 28, 2013 6:19 am

It still doesn't work with the 306.97 driver. As far as heat goes, the 660 Ti is at 29C.

Code:
06:11:38:WU02:FS02:Starting
06:11:38:WU02:FS02:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Matt/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 02 -suffix 01 -version 702 -lifeline 4776 -checkpoint 15 -gpu 1
06:11:38:WU02:FS02:Started FahCore on PID 1512
06:11:38:WU02:FS02:Core PID:5172
06:11:38:WU02:FS02:FahCore 0x15 started
06:11:39:WU02:FS02:0x15:
06:11:39:WU02:FS02:0x15:*------------------------------*
06:11:39:WU02:FS02:0x15:Folding@Home GPU Core
06:11:39:WU02:FS02:0x15:Version                2.25 (Wed May 9 17:03:01 EDT 2012)
06:11:39:WU02:FS02:0x15:Build host             AmoebaRemote
06:11:39:WU02:FS02:0x15:Board Type             NVIDIA/CUDA
06:11:39:WU02:FS02:0x15:Core                   15
06:11:39:WU02:FS02:0x15:GPU device info vendor=0 device=0 name=NA match=0 deviceId=1
06:11:39:WU02:FS02:0x15:
06:11:39:WU02:FS02:0x15:Window's signal control handler registered.
06:11:39:WU02:FS02:0x15:Preparing to commence simulation
06:11:39:WU02:FS02:0x15:- Looking at optimizations...
06:11:39:WU02:FS02:0x15:DeleteFrameFiles: successfully deleted file=02/wudata_01.ckp
06:11:39:WU02:FS02:0x15:- Created dyn
06:11:39:WU02:FS02:0x15:- Files status OK
06:11:39:WU02:FS02:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
06:11:39:WU02:FS02:0x15:- Expanded 56862 -> 257342 (decompressed 452.5 percent)
06:11:39:WU02:FS02:0x15:Called DecompressByteArray: compressed_data_size=56862 data_size=257342, decompressed_data_size=257342 diff=0
06:11:39:WU02:FS02:0x15:- Digital signature verified
06:11:39:WU02:FS02:0x15:
06:11:39:WU02:FS02:0x15:Project: 8072 (Run 0, Clone 1795, Gen 23)
06:11:39:WU02:FS02:0x15:
06:11:39:WU02:FS02:0x15:Assembly optimizations on if available.
06:11:39:WU02:FS02:0x15:Entering M.D.
06:11:40:WU02:FS02:0x15:Tpr hash 02/wudata_01.tpr:  2687930726 564675519 3771986821 1489226444 3898974647
06:11:40:WU02:FS02:0x15:GPU device id=1
06:11:40:WU02:FS02:0x15:Working on Giving Russians Opium May Alter Current Situation
06:11:40:WU02:FS02:0x15:Client config unavailable.
06:11:41:WU02:FS02:0x15:Finished fah_main status=59
06:11:41:WU02:FS02:0x15:mdrun_gpu returned 59
06:11:41:WU02:FS02:0x15:GPU memtest failure
06:11:41:WU02:FS02:0x15:
06:11:41:WU02:FS02:0x15:Folding@home Core Shutdown: GPU_MEMTEST_ERROR
06:11:41:WARNING:WU02:FS02:FahCore returned: GPU_MEMTEST_ERROR (124 = 0x7c)
06:12:12:WU01:FS02:0x15:
06:12:12:WU01:FS02:0x15:*------------------------------*
06:12:12:WU01:FS02:0x15:Folding@Home GPU Core
06:12:12:WU01:FS02:0x15:Version                2.25 (Wed May 9 17:03:01 EDT 2012)
06:12:12:WU01:FS02:0x15:Build host             AmoebaRemote
06:12:12:WU01:FS02:0x15:Board Type             NVIDIA/CUDA
06:12:12:WU01:FS02:0x15:Core                   15
06:12:12:WU01:FS02:0x15:GPU device info vendor=0 device=0 name=NA match=0 deviceId=1
06:12:12:WU01:FS02:0x15:
06:12:12:WU01:FS02:0x15:Window's signal control handler registered.
06:12:12:WU01:FS02:0x15:Preparing to commence simulation
06:12:12:WU01:FS02:0x15:- Looking at optimizations...
06:12:12:WU01:FS02:0x15:DeleteFrameFiles: successfully deleted file=01/wudata_01.ckp
06:12:12:WU01:FS02:0x15:- Created dyn
06:12:12:WU01:FS02:0x15:- Files status OK
06:12:12:WU01:FS02:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
06:12:12:WU01:FS02:0x15:- Expanded 56884 -> 257342 (decompressed 452.3 percent)
06:12:12:WU01:FS02:0x15:Called DecompressByteArray: compressed_data_size=56884 data_size=257342, decompressed_data_size=257342 diff=0
06:12:12:WU01:FS02:0x15:- Digital signature verified
06:12:12:WU01:FS02:0x15:
06:12:12:WU01:FS02:0x15:Project: 8072 (Run 0, Clone 3349, Gen 8)
06:12:12:WU01:FS02:0x15:
06:12:12:WU01:FS02:0x15:Assembly optimizations on if available.
06:12:12:WU01:FS02:0x15:Entering M.D.
06:12:13:WU01:FS02:0x15:Tpr hash 01/wudata_01.tpr:  1815833189 3136151564 669111508 2231064731 1879240843
06:12:13:WU01:FS02:0x15:GPU device id=1
06:12:13:WU01:FS02:0x15:Working on Giving Russians Opium May Alter Current Situation
06:12:13:WU01:FS02:0x15:Client config unavailable.
06:12:14:WU01:FS02:0x15:Starting GUI Server
06:13:14:WU01:FS02:0x15:Finished fah_main status=59
06:13:14:WU01:FS02:0x15:mdrun_gpu returned 59
06:13:14:WU01:FS02:0x15:GPU memtest failure
06:13:14:WU01:FS02:0x15:
06:13:14:WU01:FS02:0x15:Folding@home Core Shutdown: GPU_MEMTEST_ERROR
meltz511
 
Posts: 10
Joined: Sun Jan 27, 2013 11:43 pm

Re: GPU_MEMTEST_ERROR

Postby bruce » Mon Jan 28, 2013 6:28 am

Have you tried either of the stand-alone versions of memtest for GPUs?
- MemtestG80 and MemtestCL
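Stand-alone GPU testers such as MemtestG80 and MemtestCL broadly follow the memtest86 approach. They write known bit patterns into memory, read them back and count mismatches. The Python sketch below only illustrates that idea on an ordinary host-side bytearray. It does not touch the GPU and is not how those tools are actually implemented. The fault-simulating read hook is hypothetical and exists only so the example has something to detect.

```python
def pattern_test(mem, patterns=(0x00, 0xFF, 0x55, 0xAA), read=None):
    """Fill `mem` with each byte pattern, read every cell back, and
    return how many reads did not match what was written.

    `read` is an optional (index, stored_value) -> value hook used only
    to simulate faulty memory in this sketch.
    """
    errors = 0
    for pattern in patterns:
        for i in range(len(mem)):          # write pass
            mem[i] = pattern
        for i in range(len(mem)):          # read-back / verify pass
            value = mem[i] if read is None else read(i, mem[i])
            if value != pattern:
                errors += 1
    return errors


def stuck_bit3_at_cell_100(i, value):
    """Simulated fault: bit 3 of cell 100 always reads back as 0."""
    return value & ~0x08 if i == 100 else value


buf = bytearray(4096)
print(pattern_test(buf))                                # 0
print(pattern_test(buf, read=stuck_bit3_at_cell_100))  # 2 (0xFF and 0xAA fail)
```

The mix of patterns matters. The stuck bit is only caught by patterns that have bit 3 set (0xFF and 0xAA), which is why real testers run several different patterns.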
bruce
 
Posts: 22342
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Mon Jan 28, 2013 6:31 am

meltz511 wrote: still doesn't work with 306.97 driver. as far as heat goes, the 660 ti is 29C.

OK, it might be a bad WU (Project: 8072 (Run 0, Clone 1795, Gen 23)). The WU should be rejected after a certain number of failures, and a new WU will be downloaded in its place. If that doesn't happen, then ask a Mod here if it is OK to dump the WU using the '--dump' command.
bruce wrote: Hopefully you used the --dump nn string to dump the WU rather than manually deleting WU files.

Here 'nn' is a two-digit ID such as 00, 01 or 02. If you post the log file extract at that point, the Mod can tell you which digits to enter in the '--dump' command.
art_l_j_PlanetAMD64
 
Posts: 568
Joined: Sun May 30, 2010 2:28 pm

Re: GPU_MEMTEST_ERROR

Postby bruce » Mon Jan 28, 2013 6:39 am

The log will show :WU00:, :WU01: or similar, and FAHClient Status will show the WU ID (not the slot ID) of the WU you're having trouble with.
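Every relevant log line carries both IDs (for example WU02:FS01). A short script can therefore pull them apart and count how many times each WU has failed with GPU_MEMTEST_ERROR. That helps when deciding which two digits belong in --dump. This is a sketch that assumes the log line format shown in this thread. It reads text only and does not query FAHClient's own state.

```python
import re
from collections import Counter

# e.g. "23:30:57:WARNING:WU02:FS01:FahCore returned: GPU_MEMTEST_ERROR (124 = 0x7c)"
FAIL_RE = re.compile(r":WU(\d{2}):FS(\d{2}):FahCore returned: GPU_MEMTEST_ERROR")


def memtest_failures(lines):
    """Count GPU_MEMTEST_ERROR results per (WU ID, slot ID) pair."""
    counts = Counter()
    for line in lines:
        m = FAIL_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


log = [
    "23:30:57:WU02:FS01:0x15:Folding@home Core Shutdown: GPU_MEMTEST_ERROR",
    "23:30:57:WARNING:WU02:FS01:FahCore returned: GPU_MEMTEST_ERROR (124 = 0x7c)",
    "06:11:41:WARNING:WU02:FS02:FahCore returned: GPU_MEMTEST_ERROR (124 = 0x7c)",
]
for (wu, fs), n in sorted(memtest_failures(log).items()):
    # The digits after "WU" are the WU ID; the "FS" digits are the slot ID.
    print(f"WU{wu} (slot FS{fs}): {n} failure(s)")
```

Only the "FahCore returned" line is counted, so each failed run is counted once even though the core also logs its own shutdown line.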
bruce
 
Posts: 22342
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU_MEMTEST_ERROR

Postby meltz511 » Tue Jan 29, 2013 3:18 am

Alright, I tried both memtests and neither reported errors. And yes, the 8072 WU is not being rejected automatically; it was still there after I left it for 22 hours or so.
meltz511
 
Posts: 10
Joined: Sun Jan 27, 2013 11:43 pm

Re: GPU_MEMTEST_ERROR

Postby meltz511 » Tue Jan 29, 2013 3:34 am

The unit ID is 01. Where do I enter the command? Do I need to launch FAH from the Command Prompt or something?
meltz511
 
Posts: 10
Joined: Sun Jan 27, 2013 11:43 pm

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Tue Jan 29, 2013 3:39 am

meltz511 wrote: alright, tried both memtests and neither has errors. and yeah 8072 is not automatically being rejected. It was still there after I left it for 22 hours or so.

OK, is it still this WU and Work Queue ID?
Code:
06:11:39:WU02:FS02:0x15:Project: 8072 (Run 0, Clone 1795, Gen 23)

If it is, then please ask a Mod if it is OK to dump the WU. If this is not the WU/Work Queue ID, then please post a log file extract showing those values.

I went through the FAHClient/FAHControl documentation, and here is how I think a WU should be properly dumped.

But do not use these instructions until they have been verified as being correct!!!

--------------------------------------------------------------

Potential WU dumping instructions, subject to verification by bruce or a site Moderator:

OK, it might be a bad WU. The WU should be rejected after a certain number of failures, and a new WU will be downloaded in its place. If that doesn't happen, then ask a Mod here if it is OK to dump the WU using the '--dump' command.
bruce wrote: Hopefully you used the --dump nn string to dump the WU rather than manually deleting WU files.

Here 'nn' is a two-digit ID such as 00, 01 or 02. If you post the log file extract at that point, the Mod can tell you which digits to enter in the '--dump' command. The log file will contain something like :WU00: or :WU01:; those are the digits to enter. Also, here is an example of the FAHControl display in Expert mode, showing the Work Queue ID, which contains the same two digits to enter in the '--dump' command:
[Image: FAHControl in Expert mode, showing the Work Queue ID]

The '--dump' command should be performed as follows:
  • Click 'Quit' on FAHControl to exit FAHControl and FAHClient (assumes FAHClient was started by FAHControl)
  • Open a Command Prompt window by doing: Start -> All Programs -> Accessories -> Command Prompt
  • Type the following command and press <Enter>: FAHClient --dump nn (where nn is the Work Queue ID that you found above)
  • Close the Command Prompt window
  • Restart FAHControl (which also by default starts FAHClient)
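The key step above is a single command that takes the Work Queue ID. The helper below, a hypothetical name that is not part of FAHClient, only validates the ID and formats it as two digits. It then returns the argument list for `FAHClient --dump nn`. It does not run anything. As stated above, the procedure itself should still be confirmed by a moderator before anyone uses it.

```python
def dump_command(work_queue_id):
    """Return the argument list for 'FAHClient --dump nn' (nn = two digits).

    Only builds the command; it does not run FAHClient.
    """
    wq = int(work_queue_id)
    if not 0 <= wq <= 99:
        raise ValueError(f"work queue ID must be 00-99, got {work_queue_id!r}")
    return ["FAHClient", "--dump", f"{wq:02d}"]


print(" ".join(dump_command(1)))     # FAHClient --dump 01
print(" ".join(dump_command("01")))  # FAHClient --dump 01
```

Returning a list rather than a single string means it could be handed directly to Python's `subprocess.run` once FAHClient has been quit, without any shell quoting issues.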
art_l_j_PlanetAMD64
 
Posts: 568
Joined: Sun May 30, 2010 2:28 pm

Re: GPU_MEMTEST_ERROR

Postby meltz511 » Tue Jan 29, 2013 4:25 am

Both the folding slot and the work queue ID are 01.
[Image]
meltz511
 
Posts: 10
Joined: Sun Jan 27, 2013 11:43 pm

Re: GPU_MEMTEST_ERROR

Postby meltz511 » Thu Jan 31, 2013 4:44 am

I removed all the slots except the 660 Ti to make troubleshooting easier. I was able to get it to switch to a different work unit, but it still gets a memtest error.
I ran both memtest programs without any errors. I downgraded the GPU driver to 306.97, the computer has an 850 W power supply, and the GPU otherwise works fine: I've run benchmarks with it and it does well. I spent $300 on this graphics card for the purpose of running F@H on it, and I would really like to have this issue resolved.

Code:
04:33:17:Adding folding slot 00: READY gpu:1:"GK104 [GeForce GTX 660 Ti]"
04:33:18:WU00:FS00:Connecting to assign-GPU.stanford.edu:80
04:33:18:WU00:FS00:News: Welcome to Folding@Home
04:33:18:WU00:FS00:Assigned to work server 171.67.108.36
04:33:18:WU00:FS00:Requesting new work unit for slot 00: READY gpu:1:"GK104 [GeForce GTX 660 Ti]" from 171.67.108.36
04:33:18:WU00:FS00:Connecting to 171.67.108.36:8080
04:33:18:Removing old file 'configs/config-20130128-060344.xml'
04:33:18:Saving configuration to config.xml
04:33:18:<config>
04:33:18:  <!-- Folding Slot Configuration -->
04:33:18:  <gpu v='true'/>
04:33:18:
04:33:18:  <!-- Network -->
04:33:18:  <proxy v=':8080'/>
04:33:18:
04:33:18:  <!-- User Information -->
04:33:18:  <passkey v='********************************'/>
04:33:18:  <team v='59'/>
04:33:18:  <user v='Matthew_J'/>
04:33:18:
04:33:18:  <!-- Folding Slots -->
04:33:18:  <slot id='0' type='GPU'>
04:33:18:    <gpu-index v='1'/>
04:33:18:  </slot>
04:33:18:</config>
04:33:19:WU00:FS00:Downloading 56.92KiB
04:33:19:WU00:FS00:Download complete
04:33:19:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:8073 run:0 clone:474 gen:126 core:0x15 unit:0x000000876652edb450b42bf1b9e0eb46
04:33:19:WU00:FS00:Starting
04:33:19:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Matt/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 00 -suffix 01 -version 702 -lifeline 4340 -checkpoint 15 -gpu 1
04:33:19:WU00:FS00:Started FahCore on PID 5680
04:33:19:WU00:FS00:Core PID:4924
04:33:19:WU00:FS00:FahCore 0x15 started
04:33:19:WU00:FS00:Downloading project 8073 description
04:33:19:WU00:FS00:Connecting to fah-web.stanford.edu:80
04:33:20:WU00:FS00:Project 8073 description downloaded successfully
04:33:20:WU00:FS00:0x15:
04:33:20:WU00:FS00:0x15:*------------------------------*
04:33:20:WU00:FS00:0x15:Folding@Home GPU Core
04:33:20:WU00:FS00:0x15:Version                2.25 (Wed May 9 17:03:01 EDT 2012)
04:33:20:WU00:FS00:0x15:Build host             AmoebaRemote
04:33:20:WU00:FS00:0x15:Board Type             NVIDIA/CUDA
04:33:20:WU00:FS00:0x15:Core                   15
04:33:20:WU00:FS00:0x15:GPU device info vendor=0 device=0 name=NA match=0 deviceId=1
04:33:20:WU00:FS00:0x15:
04:33:20:WU00:FS00:0x15:Window's signal control handler registered.
04:33:20:WU00:FS00:0x15:Preparing to commence simulation
04:33:20:WU00:FS00:0x15:- Looking at optimizations...
04:33:20:WU00:FS00:0x15:DeleteFrameFiles: successfully deleted file=00/wudata_01.ckp
04:33:20:WU00:FS00:0x15:- Created dyn
04:33:20:WU00:FS00:0x15:- Files status OK
04:33:20:WU00:FS00:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
04:33:20:WU00:FS00:0x15:- Expanded 57769 -> 259290 (decompressed 448.8 percent)
04:33:20:WU00:FS00:0x15:Called DecompressByteArray: compressed_data_size=57769 data_size=259290, decompressed_data_size=259290 diff=0
04:33:20:WU00:FS00:0x15:- Digital signature verified
04:33:20:WU00:FS00:0x15:
04:33:20:WU00:FS00:0x15:Project: 8073 (Run 0, Clone 474, Gen 126)
04:33:20:WU00:FS00:0x15:
04:33:20:WU00:FS00:0x15:Assembly optimizations on if available.
04:33:20:WU00:FS00:0x15:Entering M.D.
04:33:22:WU00:FS00:0x15:Tpr hash 00/wudata_01.tpr:  1994558765 4224998879 2925319635 3023220307 1375548680
04:33:22:WU00:FS00:0x15:GPU device id=1
04:33:22:WU00:FS00:0x15:Working on God Rules Over Mankind, Animals, Cosmos and Such
04:33:22:WU00:FS00:0x15:Client config unavailable.
04:33:22:WU00:FS00:0x15:Starting GUI Server
04:34:28:WU00:FS00:0x15:Finished fah_main status=59
04:34:28:WU00:FS00:0x15:mdrun_gpu returned 59
04:34:28:WU00:FS00:0x15:GPU memtest failure
04:34:28:WU00:FS00:0x15:
04:34:28:WU00:FS00:0x15:Folding@home Core Shutdown: GPU_MEMTEST_ERROR
meltz511
 
Posts: 10
Joined: Sun Jan 27, 2013 11:43 pm

Re: GPU_MEMTEST_ERROR

Postby P5-133XL » Thu Jan 31, 2013 5:27 am

Have you tried down-clocking the shaders and/or the RAM?
[Image]
P5-133XL
 
Posts: 4034
Joined: Sun Dec 02, 2007 4:36 am
Location: Salem. OR USA

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Thu Jan 31, 2013 7:38 am

I notice you have this:
Code:
04:33:18:  <!-- Folding Slots -->
04:33:18:  <slot id='0' type='GPU'>
04:33:18:    <gpu-index v='1'/>
04:33:18:  </slot>

In my #3 computer, where I installed a brand new GTX660Ti today, I have this:
Code:
19:33:21:  <!-- Folding Slots -->
19:33:21:  <slot id='1' type='SMP'/>
19:33:21:  <slot id='0' type='GPU'>
19:33:21:    <gpu-index v='0'/>
19:33:21:  </slot>

In my #1 computer, where I installed a brand new GTX660Ti 3 days ago, I have this:
Code:
19:25:07:  <!-- Folding Slots -->
19:25:07:  <slot id='0' type='GPU'/>
19:25:07:  <slot id='1' type='SMP'>
19:25:07:    <cpus v='4'/>
19:25:07:  </slot>

All four of my GTX660Tis, all installed within the last 3 weeks, have processed 8072 and 8073 WUs without any failures.

In both my #1 and #3 computers, the GPU is the only one in the computer. I realize that you had 2 GPUs to start with, which could explain the mismatch between the slot id and the gpu-index. Try moving the GPU to a different PCIe connector and getting FAHControl to auto-detect it, as follows:
  • In FAHControl, you must be in either the 'Advanced' or 'Expert' mode, selected from the dropdown menu at the upper right
  • Click 'Pause' in FAHControl, and wait until all slots show Paused
  • Click on 'Configure', and select the 'Slots' tab
  • Click on the 'gpu' slot to highlight it, then click on 'Remove'
  • Click on 'OK', then click on 'Save'
  • Click on 'Quit' to exit FAHControl and FAHClient
  • Shut down the computer, and remove the AC power cable from the power supply
  • Move the GPU to a different PCIe connector
  • Reconnect the AC power cable to the power supply, and power up the computer
FAHClient should auto-detect the GPU, and the configuration should then contain something like this:
Code:
19:25:07:  <!-- Folding Slots -->
19:25:07:  <slot id='0' type='GPU'/>

See if the problem still exists with the new configuration, or if it has been fixed, and let us know. Thanks!
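The slot id versus gpu-index difference can also be checked mechanically. The sketch below takes the `<config>` block the client prints into the log, strips the `HH:MM:SS:` prefixes, and parses it with Python's standard `xml.etree.ElementTree`. It then lists each GPU slot's id alongside its gpu-index, if one is set. The function name is made up for illustration. The sketch only reports what the XML says; whether a difference actually causes the error is the judgment call discussed above.

```python
import re
import xml.etree.ElementTree as ET

# Strip the "HH:MM:SS:" prefix the client adds to each logged config line.
TIMESTAMP_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:")


def gpu_slots(config_lines):
    """Return [(slot_id, gpu_index or None)] for each GPU slot in a logged <config>."""
    xml_text = "\n".join(TIMESTAMP_RE.sub("", line) for line in config_lines)
    root = ET.fromstring(xml_text)  # XML comments are dropped by the default parser
    result = []
    for slot in root.iter("slot"):
        if slot.get("type") != "GPU":
            continue
        index = slot.find("gpu-index")
        result.append((slot.get("id"), None if index is None else index.get("v")))
    return result


logged = """04:33:18:<config>
04:33:18:  <!-- Folding Slots -->
04:33:18:  <slot id='0' type='GPU'>
04:33:18:    <gpu-index v='1'/>
04:33:18:  </slot>
04:33:18:</config>""".splitlines()

for slot_id, gpu_index in gpu_slots(logged):
    if gpu_index is not None and gpu_index != slot_id:
        print(f"slot {slot_id} uses gpu-index {gpu_index}")  # slot 0 uses gpu-index 1
```

With a single auto-detected GPU, as in the #1 computer's config above, the slot has no gpu-index element and the function returns `('0', None)` for it.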

EDIT:
Could you please tell us the manufacturer and model number of your GTX 660 Ti? For example, mine are the Gigabyte GTX 660 Ti OC, model number GV-N66TOC-2GD. Thanks!
art_l_j_PlanetAMD64
 
Posts: 568
Joined: Sun May 30, 2010 2:28 pm
