GPU_MEMTEST_ERROR


Re: GPU_MEMTEST_ERROR

Postby meltz511 » Thu Jan 31, 2013 4:08 pm

I have the Gigabyte GTX 660 Ti OC, model number GV-N66TOC-2GD, same as you. I have a 970 motherboard, so one of the PCIe slots has higher bandwidth than the other, which is why I have the 660 Ti in that particular slot. I can try taking the extra GPU out, though.
meltz511
 
Posts: 14
Joined: Mon Jan 28, 2013 12:43 am

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Thu Jan 31, 2013 6:10 pm

meltz511 wrote:I have the Gigabyte GTX 660 Ti OC, model number GV-N66TOC-2GD, same as you. I have a 970 motherboard, so one of the PCIe slots has higher bandwidth than the other, which is why I have the 660 Ti in that particular slot. I can try taking the extra GPU out, though.

OK, I was thinking that the '<gpu-index v='1'/>' may not be correct, when the GTX 660 Ti is the only GPU that is used by FAHClient. Just something to try.

I don't think the PCIe bandwidth is a real issue with FAH, as it doesn't seem to use much of it, although it would matter for apps that use the PCIe bus much more than FAH does. Most of the time FAH seems to be waiting for the GPU to process each packet in the queue. Someone with more knowledge of the inner workings of FAH could confirm or deny this; these are just my opinions based on the little that I do know about it.
art_l_j_PlanetAMD64
Over 1.04 Billion Total Points
Over 185,000 Work Units
Over 3,800,000 PPD
Overall rank (if points are combined) 20 of 1721690
In memory of my Mother May 12th 1923 - February 10th 2012
art_l_j_PlanetAMD64
 
Posts: 472
Joined: Sun May 30, 2010 3:28 pm

Re: GPU_MEMTEST_ERROR

Postby meltz511 » Fri Feb 01, 2013 12:22 am

It's fixed now. All I did was take the second GPU out, and the 660 Ti ended up downloading work unit 8072 again, but it still worked. The 660 Ti is in the same PCIe slot; nothing else changed. It's pretty strange: just because the 9800 was in there, neither one of them did anything for F@H.
Thanks for your help!
meltz511
 
Posts: 14
Joined: Mon Jan 28, 2013 12:43 am

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Fri Feb 01, 2013 12:45 am

meltz511 wrote:It's fixed now. All I did was take the second GPU out, and the 660 Ti ended up downloading work unit 8072 again, but it still worked. The 660 Ti is in the same PCIe slot; nothing else changed. It's pretty strange: just because the 9800 was in there, neither one of them did anything for F@H.
Thanks for your help!

You're welcome! I really enjoy digging in and figuring out how to fix a problem. Especially for a fellow Gigabyte GV-N66TOC-2GD owner! :D
art_l_j_PlanetAMD64
 
Posts: 472
Joined: Sun May 30, 2010 3:28 pm

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Fri Feb 01, 2013 5:52 pm

I have a GTX 570 and I still get these errors.
When I remove the GPU from FAHControl and re-add it, it starts folding again,
but as soon as it finishes that WU, it fails.

Code:
10:43:26:WU00:FS00:Connecting to 171.67.108.36:8080
10:43:26:WU01:FS00:Starting
10:43:26:WU01:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Administrator/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_15.fah/FahCore_15.exe -dir 01 -suffix 01 -version 702 -lifeline 5672 -checkpoint 20 -gpu 0
10:43:26:WU01:FS00:Started FahCore on PID 1800
10:43:26:WU01:FS00:Core PID:10032
10:43:26:WU01:FS00:FahCore 0x15 started
10:43:27:WU01:FS00:0x15:
10:43:27:WU01:FS00:0x15:*------------------------------*
10:43:27:WU01:FS00:0x15:Folding@Home GPU Core
10:43:27:WU01:FS00:0x15:Version                2.25 (Wed May 9 17:03:01 EDT 2012)
10:43:27:WU01:FS00:0x15:Build host             AmoebaRemote
10:43:27:WU01:FS00:0x15:Board Type             NVIDIA/CUDA
10:43:27:WU01:FS00:0x15:Core                   15
10:43:27:WU01:FS00:0x15:
10:43:27:WU01:FS00:0x15:Window's signal control handler registered.
10:43:27:WU01:FS00:0x15:Preparing to commence simulation
10:43:27:WU01:FS00:0x15:- Looking at optimizations...
10:43:27:WU01:FS00:0x15:DeleteFrameFiles: successfully deleted file=01/wudata_01.ckp
10:43:27:WU01:FS00:0x15:- Created dyn
10:43:27:WU01:FS00:0x15:- Files status OK
10:43:27:WU01:FS00:0x15:sizeof(CORE_PACKET_HDR) = 512 file=<>
10:43:27:WU01:FS00:0x15:- Expanded 56851 -> 257342 (decompressed 452.6 percent)
10:43:27:WU01:FS00:0x15:Called DecompressByteArray: compressed_data_size=56851 data_size=257342, decompressed_data_size=257342 diff=0
10:43:28:WU01:FS00:0x15:- Digital signature verified
10:43:28:WU01:FS00:0x15:
10:43:28:WU01:FS00:0x15:Project: 8072 (Run 0, Clone 457, Gen 68)
10:43:28:WU01:FS00:0x15:
10:43:28:WU01:FS00:0x15:Assembly optimizations on if available.
10:43:28:WU01:FS00:0x15:Entering M.D.
10:43:29:WU01:FS00:0x15:Tpr hash 01/wudata_01.tpr:  3340524695 195695630 1120144990 1696995905 749919428
10:43:29:WU01:FS00:0x15:GPU device id=0
10:43:29:WU01:FS00:0x15:Working on Giving Russians Opium May Alter Current Situation
10:43:29:WU01:FS00:0x15:Client config unavailable.
10:43:29:WU01:FS00:0x15:Starting GUI Server
10:43:29:WU01:FS00:0x15:Finished fah_main status=59
10:43:29:WU01:FS00:0x15:mdrun_gpu returned 59
10:43:29:WU01:FS00:0x15:GPU memtest failure
10:43:30:WU01:FS00:0x15:
10:43:30:WU01:FS00:0x15:Folding@home Core Shutdown: GPU_MEMTEST_ERROR
10:43:30:WARNING:WU01:FS00:FahCore returned: GPU_MEMTEST_ERROR (124 = 0x7c)
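For anyone who wants to scan a long log for this kind of failure, here is a minimal Python sketch. It assumes the warning line always has the shape shown in the log above ("FahCore returned: NAME (decimal = hex)"); the function name `find_core_failures` and the regular expression are illustrative and not part of FAHClient.

```python
import re

# Matches the FAHClient warning that reports a FahCore exit status, e.g.
# "10:43:30:WARNING:WU01:FS00:FahCore returned: GPU_MEMTEST_ERROR (124 = 0x7c)".
# The pattern is an assumption based only on the log excerpt in this thread.
RETURN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WARNING:WU(?P<wu>\d+):FS(?P<slot>\d+):"
    r"FahCore returned: (?P<name>\w+) \((?P<dec>\d+) = (?P<hex>0x[0-9a-fA-F]+)\)"
)


def find_core_failures(log_text):
    """Return one dict per 'FahCore returned' warning found in log_text."""
    failures = []
    for line in log_text.splitlines():
        match = RETURN_RE.match(line.strip())
        if match:
            failures.append({
                "time": match.group("time"),
                "wu": int(match.group("wu")),
                "slot": int(match.group("slot")),
                "name": match.group("name"),
                "code": int(match.group("dec")),
            })
    return failures


# Three lines copied from the log above.
SAMPLE = """\
10:43:29:WU01:FS00:0x15:GPU memtest failure
10:43:30:WU01:FS00:0x15:Folding@home Core Shutdown: GPU_MEMTEST_ERROR
10:43:30:WARNING:WU01:FS00:FahCore returned: GPU_MEMTEST_ERROR (124 = 0x7c)
"""

if __name__ == "__main__":
    for failure in find_core_failures(SAMPLE):
        print(failure)
```

Only the WARNING line is matched, so each failed work unit is counted once even though the core prints the error name twice.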
[T]yphoon
 
Posts: 15
Joined: Sun Sep 30, 2012 8:27 pm

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Fri Feb 01, 2013 6:27 pm

[T]yphoon wrote:I have a GTX 570 and I still get these errors.
When I remove the GPU from FAHControl and re-add it, it starts folding again,
but as soon as it finishes that WU, it fails.

Hi, [T]yphoon, could you please post the information at the top of the log file (System Info and Configuration) as described here? Thanks!

Could you also please provide a hardware description of your system? For an example, see my #6 system's specs here. Thanks again.

Art
art_l_j_PlanetAMD64
 
Posts: 472
Joined: Sun May 30, 2010 3:28 pm

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Fri Feb 01, 2013 10:40 pm

Code:
12:02:56:******************************** Build ********************************
12:02:56:      Version: 7.2.9
12:02:56:         Date: Oct 3 2012
12:02:56:         Time: 18:05:48
12:02:56:      SVN Rev: 3578
12:02:56:       Branch: fah/trunk/client
12:02:56:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
12:02:56:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
12:02:56:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
12:02:56:     Platform: win32 XP
12:02:56:         Bits: 32
12:02:56:         Mode: Release
12:02:56:******************************* System ********************************
12:02:56:          CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
12:02:56:       CPU ID: GenuineIntel Family 6 Model 45 Stepping 6
12:02:56:         CPUs: 12
12:02:56:       Memory: 15.94GiB
12:02:56:  Free Memory: 12.42GiB
12:02:56:      Threads: WINDOWS_THREADS
12:02:56:   On Battery: false
12:02:56:   UTC offset: 1
12:02:56:          PID: 5672
12:02:56:          CWD: C:/Users/Administrator/AppData/Roaming/FAHClient
12:02:56:           OS: Windows 7 Ultimate
12:02:56:      OS Arch: AMD64
12:02:56:         GPUs: 1
12:02:56:        GPU 0: NVIDIA:2 GF110 [Geforce GTX 570]
12:02:56:         CUDA: 2.0
12:02:56:  CUDA Driver: 4020
12:02:56:Win32 Service: false
12:02:56:***********************************************************************
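As a rough aid for reading the System block above, the Python sketch below pulls out the "Key: value" pairs and decodes the "CUDA Driver" number. It assumes that number uses CUDA's usual integer encoding (1000 * major + 10 * minor), so 4020 would mean CUDA 4.2. The "CUDA: 2.0" line appears to be the GPU's compute capability rather than a toolkit version, but that reading is also an assumption. The helper names are made up for illustration.

```python
import re

# A timestamped "Key: value" line such as "12:02:56:  CUDA Driver: 4020".
# Banner lines made of asterisks have no "key:" part and are skipped.
FIELD_RE = re.compile(r"^\d{2}:\d{2}:\d{2}:\s*([^:]+):\s*(.*)$")


def parse_system_block(log_text):
    """Collect 'Key: value' pairs from timestamped log header lines."""
    info = {}
    for line in log_text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            info[match.group(1).strip()] = match.group(2).strip()
    return info


def decode_cuda_version(encoded):
    """Decode an integer of the form 1000*major + 10*minor into 'major.minor'."""
    major, rest = divmod(int(encoded), 1000)
    return f"{major}.{rest // 10}"


# Lines copied from the log header above.
SAMPLE = """\
12:02:56:******************************* System ********************************
12:02:56:         GPUs: 1
12:02:56:        GPU 0: NVIDIA:2 GF110 [Geforce GTX 570]
12:02:56:         CUDA: 2.0
12:02:56:  CUDA Driver: 4020
"""

if __name__ == "__main__":
    info = parse_system_block(SAMPLE)
    print(info["GPU 0"])
    print("CUDA driver API version:", decode_cuda_version(info["CUDA Driver"]))
```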


CPU: Core i7 3960X
Mobo: Asus Rampage IV Extreme
RAM: Corsair Vengeance 16GB DDR3 1866Mhz
GPU: EVGA GTX 570 SC
SSD/HDD: Crucial m4 512GB & Seagate Barracuda 7200.14 2TB
OS: Windows 7 Ultimate 64-bit SP1
FAH: v7.2.9
PSU: Corsair HX750W
[T]yphoon
 
Posts: 15
Joined: Sun Sep 30, 2012 8:27 pm

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Fri Feb 01, 2013 11:21 pm

Thanks, [T]yphoon, for the new information. There are just a couple of things:
  • The log file extract has the System Info, but not the Configuration. Here is a sample from my #6 computer:
    Code:
    04:46:41:<config>
    04:46:41:  <!-- Folding Slot Configuration -->
    04:46:41:  <gpu v='true'/>
    04:46:41:  <max-packet-size v='big'/>
    04:46:41:
    04:46:41:  <!-- Network -->
    04:46:41:  <proxy v=':8080'/>
    04:46:41:
    04:46:41:  <!-- User Information -->
    04:46:41:  <passkey v='********************************'/>
    04:46:41:  <team v='45862'/>
    04:46:41:  <user v='art_l_j_PlanetAMD64'/>
    04:46:41:
    04:46:41:  <!-- Work Unit Control -->
    04:46:41:  <next-unit-percentage v='100'/>
    04:46:41:
    04:46:41:  <!-- Folding Slots -->
    04:46:41:  <slot id='0' type='GPU'>
    04:46:41:    <cuda-index v='0'/>
    04:46:41:    <gpu-index v='0'/>
    04:46:41:    <opencl-index v='0'/>
    04:46:41:  </slot>
    04:46:41:  <slot id='1' type='GPU'>
    04:46:41:    <cuda-index v='1'/>
    04:46:41:    <gpu-index v='1'/>
    04:46:41:    <opencl-index v='1'/>
    04:46:41:  </slot>
    04:46:41:  <slot id='2' type='SMP'>
    04:46:41:    <cpus v='6'/>
    04:46:41:  </slot>
    04:46:41:</config>
  • Please tell us your NVidia driver version. You should be using 306.97, as newer drivers have many reports of causing serious problems; please read about it here. Thanks!
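To check which gpu-index each GPU slot is using without reading the whole Configuration section by eye, something like the following Python sketch may help. It assumes the configuration appears in the log exactly as in the sample above (one XML line per log line, each prefixed with an HH:MM:SS: timestamp); `strip_timestamps` and `gpu_slots` are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET

# Leading "HH:MM:SS:" prefix that the client puts in front of every log line.
TIMESTAMP_RE = re.compile(r"^\s*\d{2}:\d{2}:\d{2}:")


def strip_timestamps(log_text):
    """Remove the timestamp prefix from every line of a log excerpt."""
    return "\n".join(TIMESTAMP_RE.sub("", line) for line in log_text.splitlines())


def gpu_slots(config_xml):
    """Return (slot id, gpu-index or None) for every slot of type GPU."""
    root = ET.fromstring(config_xml)
    slots = []
    for slot in root.findall("slot"):
        if slot.get("type") != "GPU":
            continue
        index = slot.find("gpu-index")
        slots.append(
            (int(slot.get("id")), None if index is None else int(index.get("v")))
        )
    return slots


# Shortened copy of the sample configuration above.
SAMPLE = """\
04:46:41:<config>
04:46:41:  <gpu v='true'/>
04:46:41:  <slot id='0' type='GPU'>
04:46:41:    <gpu-index v='0'/>
04:46:41:  </slot>
04:46:41:  <slot id='1' type='GPU'>
04:46:41:    <gpu-index v='1'/>
04:46:41:  </slot>
04:46:41:  <slot id='2' type='SMP'>
04:46:41:    <cpus v='6'/>
04:46:41:  </slot>
04:46:41:</config>
"""

if __name__ == "__main__":
    print(gpu_slots(strip_timestamps(SAMPLE)))
```

A slot written as an empty element, with no gpu-index child, is reported with None, which makes it easy to spot slots that rely on the client's default.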
Art
art_l_j_PlanetAMD64
 
Posts: 472
Joined: Sun May 30, 2010 3:28 pm

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Sat Feb 02, 2013 12:45 am

My 2nd rig uses version 296.10 of the NVIDIA driver, with no errors there.
Also, because I am folding 24/7 with my GPU, I need the older 296.10 version,
because the newer ones downclock at some point (and they won't return to stock clocks until a reboot).
And trust me, I have used every single one of those 300-series drivers (Beta and WHQL).
I can run the older version for weeks on my 2nd rig without rebooting.

Code:
<config>
  <!-- FahCore Control -->
  <checkpoint v='20'/>

  <!-- Folding Slot Configuration -->
  <gpu v='true'/>
  <smp v='false'/>

  <!-- Network -->
  <proxy v=':8080'/>

  <!-- User Information -->
  <passkey v='********************************'/>
  <team v='37726'/>
  <user v='[T]yphoon'/>

  <!-- Folding Slots -->
  <slot id='0' type='GPU'/>
</config>
[T]yphoon
 
Posts: 15
Joined: Sun Sep 30, 2012 8:27 pm

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sat Feb 02, 2013 1:35 am

[T]yphoon wrote:My 2nd rig uses version 296.10 of the NVIDIA driver, with no errors there.
Also, because I am folding 24/7 with my GPU, I need the older 296.10 version,
because the newer ones downclock at some point (and they won't return to stock clocks until a reboot).
And trust me, I have used every single one of those 300-series drivers (Beta and WHQL).
I can run the older version for weeks on my 2nd rig without rebooting.

OK, thanks for the configuration data. Yes, the older NVidia drivers are just as good for folding as 306.97, as long as they're not so old that they don't support your GPU. But 296.10 is, as you said, just fine for your GPU.

Here is an answer from NVidia Support, which might explain the downclocking problem you were having with the newer drivers:
Setting "Power management mode" from Adaptive to Maximum Performance
Setting Power management mode from "Adaptive" to "Maximum Performance" can improve performance in certain applications when the GPU is throttling the clock speeds incorrectly. To change this setting, with your mouse, right-click over the Windows desktop and select "NVIDIA Control Panel" -> from the NVIDIA Control Panel, select the "Manage 3D settings" from the left column -> click on the Power management mode drop down box and select "Prefer Maximum Performance". Click over the "Apply" button at the bottom of the panel to apply the changes.


I will get back to you when I have looked through the data you sent me. Thanks again.

Art
art_l_j_PlanetAMD64
 
Posts: 472
Joined: Sun May 30, 2010 3:28 pm

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Sat Feb 02, 2013 1:49 am

It was already set to Maximum Performance (that's what I always check for), and it still downclocks after a while.
[T]yphoon
 
Posts: 15
Joined: Sun Sep 30, 2012 8:27 pm

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sat Feb 02, 2013 2:00 am

[T]yphoon wrote:It was already set to Maximum Performance (that's what I always check for), and it still downclocks after a while.

OK, thanks. I use v306.97 on three GTX 660 Ti's and two GTX 460's, as well as v306.81 on one GTX 660 Ti (a WinXP machine, because XP does not have v306.97), and I do not have that downclocking problem. But that doesn't matter, because your driver is good and should not have anything to do with the problem you are experiencing. I'll get back to you as soon as I've gone through the data you sent me.
art_l_j_PlanetAMD64
 
Posts: 472
Joined: Sun May 30, 2010 3:28 pm

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sat Feb 02, 2013 3:04 am

OK, I've gone through the data you sent me, and nothing really jumps out at me, as far as being a possible cause of the problem. Just a few observations:
  • The WU shown in the first log file extract is Project: 8072 (Run 0, Clone 457, Gen 68). I searched for this PRCG, and I don't find any report of it being bad. But perhaps it could be checked by someone with access to the AS and WS databases.
  • I did a Google search for "EVGA GTX 570 SC", and one of the results was this:
    Since i have the evga gtx 570 superclocked i had some issues like crashes (display driver failed) and back to desktop and artifacts.
    The specs of the card:

    Core voltage: 1000 Mv
    Core clock: 797 Mhz
    Shaders: 1594 Mhz
    Mem: 1950 Mhz

    But this seems not so stable hence the crashes.

    So i googled for info and found out that many people have complaints of the 500 series cards. I did reinstall my OS 3 times, I did safe boot, removed my current drivers with driver sweeper. Installed various drivers which didn't work, I'm now with driver 266.58. Temps are very good, CPU idle 17 - 21c, GPU 27 - 29c (watercooled).

    So, people said to increase the voltage, so I did with msi afterburner, increased to 1025mv, but it failed, it crashes with crysis and crysis 2.
    Increased to 1050mv, bam no more crashes not a single one or artifacts.
    What also worked for me is using the stock setting from nvidia in other words downclocking no more crashes or artifacts.

    So, for a test only, could you please try the stock NVidia clocks found here: GTX 570 specs
    That's 732/1464/1900 Graphics/Processor/Memory. What we need to do is get the card into a working state; then the frequencies/voltages can be moved back towards the factory OC values.
  • Please try these, to test the GPU's memory:
    bruce wrote:Have you tried either of the stand-alone versions of memtest for GPUs?
    - MemtestG80 and MemtestCL
  • Do you have anything installed in the other PCIe slots? If yes, please remove them to see if that clears up the problem.
Please let me know when you've had a chance to try these suggestions, and what results you saw. Thanks.
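To put the clock numbers above in perspective, here is a small Python sketch that computes how far the superclocked values quoted in the Google result sit above the stock 732/1464/1900 clocks. It assumes the "Shaders" figure in that quote corresponds to the "Processor" clock in the NVidia specs; the dictionary and function names are only for illustration.

```python
# Stock GTX 570 clocks in MHz, as given above (Graphics/Processor/Memory).
STOCK = {"graphics": 732, "processor": 1464, "memory": 1900}

# Superclocked values from the quoted post; "Shaders" is mapped to
# "processor" here, which is an assumption.
QUOTED_SC = {"graphics": 797, "processor": 1594, "memory": 1950}


def overclock_percent(stock, actual):
    """Return the percentage by which each clock exceeds its stock value."""
    return {
        name: round((actual[name] - stock[name]) / stock[name] * 100, 1)
        for name in stock
    }


if __name__ == "__main__":
    for name, percent in overclock_percent(STOCK, QUOTED_SC).items():
        print(f"{name}: +{percent}% over stock")
```

With these figures the core and shader clocks come out roughly 9% above stock and the memory under 3%, so dropping to stock is a fairly small change for a test.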
art_l_j_PlanetAMD64
 
Posts: 472
Joined: Sun May 30, 2010 3:28 pm

Re: GPU_MEMTEST_ERROR

Postby bruce » Sat Feb 02, 2013 3:09 am

Hi ***(hidden)*** (team 0),
Your WU (P8072 R0 C457 G68) was added to the stats database on 2013-02-01 07:12:02 for 3874 points of credit.
bruce
 
Posts: 19970
Joined: Thu Nov 29, 2007 11:13 pm
Location: So. Cal.

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sat Feb 02, 2013 6:24 am

[T]yphoon, a couple more things I thought of:
  • I realize that you have a Core i7 3960X CPU, and an Asus Rampage IV Extreme motherboard, which are both very different from my #6 system. But there is a list of performance-enhancing BIOS settings here, which might help you to choose the best BIOS settings for your CPU/Mobo combination. Have a look at your Mobo manual, to see which settings might be similar to the ones for my CPU/Mobo combination.
  • There is a list of Windows settings for maximum performance here, which should apply to all CPUs/Mobos/GPUs. Some of the Windows "power-saving features" might explain why the newer NVidia drivers were downclocking your GPU. In any case, the Windows settings described there should help to boost your Folding performance. They have definitely boosted my PPD numbers considerably, compared to the "default" Windows settings.
I hope this helps!
art_l_j_PlanetAMD64
 
Posts: 472
Joined: Sun May 30, 2010 3:28 pm
