GPU_MEMTEST_ERROR

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Sat Feb 02, 2013 12:33 pm

I have had the same setup for months now, and for the last couple of weeks I have been getting these errors.
I already ran the Memtest and it came back without errors (I can fold one WU without a problem).
I also can't switch PCIe slots, because I have a custom watercooling setup.
But the specs are stable; I can bench it without any problems.
There is something wrong with the FAH software.

It uploads a finished WU, gets a new one, the new one doesn't even fold for a second, and bam, it gets the memory error.

I have no problems with my CPU, and the motherboard is using BIOS 3010 (I won't upgrade to a newer BIOS because of instability at 4.7 GHz).

Re: GPU_MEMTEST_ERROR

Postby P5-133XL » Sat Feb 02, 2013 1:05 pm

[T]yphoon wrote:I have no problems with my CPU, and the motherboard is using BIOS 3010 (I won't upgrade to a newer BIOS because of instability at 4.7 GHz).


What are you OC'ing: the GPU, CPU, or RAM? As a test, and regardless of your faith in the stability of the OC, could you please turn all OC'ing off, just for the test? You can turn it all back on afterwards. If folding still doesn't work, then the problem is not the OC; but if folding suddenly starts working, then perhaps you might consider that your OC is not quite as stable as the faith you put in it.

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sat Feb 02, 2013 1:32 pm

[T]yphoon wrote:I have had the same setup for months now, and for the last couple of weeks I have been getting these errors.
I already ran the Memtest and it came back without errors (I can fold one WU without a problem).
OK, thanks for this information. But I have completed several hundred of the GPU WUs (P762x and P807x) on my four GTX 660 Ti's and two GTX 460's, in the last couple of weeks, without any errors. So I am trying to find out what characteristic of your system is causing the problem.

[T]yphoon wrote:I also can't switch PCIe slots, because I have a custom watercooling setup.
I was not asking you to do that; I was asking whether any other cards are installed in the other PCIe slots, apart from the GTX 570 SC. If yes, could you please try removing them, to see if that clears up the problem.

[T]yphoon wrote:But the specs are stable; I can bench it without any problems.
There is something wrong with the FAH software.
This does not follow, because the FAH software (e.g. the downloaded FahCore_15.exe for the GPU) has not changed in the last couple of weeks. And I have no problem with my six GPUs; they complete 100% of these WUs without error.

[T]yphoon wrote:It uploads a finished WU, gets a new one, the new one doesn't even fold for a second, and bam, it gets the memory error.
Something must have changed in your system in the last couple of weeks. I do not mean that you made any hardware/software changes. The one WU you have identified (P8072 R0 C457 G68) was completed successfully by someone else here:
bruce wrote:Hi ***(hidden)*** (team 0),
Your WU (P8072 R0 C457 G68) was added to the stats database on 2013-02-01 07:12:02 for 3874 points of credit.
Could you please provide the settings you have right now for frequency and voltage for your GPU? As I suggested, as an experiment only, try setting all of the GPU values to the stock NVidia (not EVGA) settings: 732/1464/1900 MHz Graphics/Processor/Memory. What we need to do is get the card into a working state; then the frequencies/voltages can be moved back towards the factory EVGA OC values.

[T]yphoon wrote:I have no problems with my CPU, and the motherboard is using BIOS 3010 (I won't upgrade to a newer BIOS because of instability at 4.7 GHz).
I was not suggesting that you had problems; those settings were only for getting the best possible PPD values from your CPU and GPU.

You have to work with me on this, or I cannot help you with the problem you are experiencing.

It's up to you. If you don't help me, then I can't help you.

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sat Feb 02, 2013 3:51 pm

[T]yphoon wrote:It uploads a finished WU, gets a new one, the new one doesn't even fold for a second, and bam, it gets the memory error.
There is one obvious difference between the first WU and the second WU folded by the GPU, and that is the temperature of the GPU and of the memory chips themselves. Apart from that, what difference could there possibly be?

At the beginning of each WU's processing, there is a very strict test of the GPU's memory, to ensure the integrity of the subsequent calculations. This test passes on the first WU, but not on the second WU. The obvious difference (GPU and memory chip temperatures) is the only explanation I can see for that.
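
Just to illustrate the idea (this is not the actual FahCore_15 test; the PyCUDA usage, buffer size, and byte pattern below are my own assumptions), a write/verify pattern test of GPU memory boils down to something like this:

# Rough sketch only, not FahCore code: write a known byte pattern into a
# block of GPU memory, copy it back, and count mismatches.
import numpy as np
import pycuda.autoinit          # creates a CUDA context on the default GPU
import pycuda.driver as cuda

def pattern_test(n_bytes=64 * 1024 * 1024, pattern=0xA5):
    host_src = np.full(n_bytes, pattern, dtype=np.uint8)
    host_dst = np.empty_like(host_src)

    gpu_buf = cuda.mem_alloc(n_bytes)      # allocate device memory
    cuda.memcpy_htod(gpu_buf, host_src)    # host -> device
    cuda.memcpy_dtoh(host_dst, gpu_buf)    # device -> host

    # 0 means the block verified clean
    return int(np.count_nonzero(host_dst != pattern))

if __name__ == "__main__":
    print("memory errors:", pattern_test())

A marginal memory cell that only misbehaves once the chips have warmed up would pass a test like this while the card is cool, and fail it a few minutes later, which is exactly the first-WU-passes, second-WU-fails pattern.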

So I would like to suggest again that, for a test only, you try setting all of the GPU values to the stock NVidia (not EVGA) settings: 732/1464/1900 MHz Graphics/Processor/Memory. What we need to do is get the card into a working state; then the frequencies/voltages can be moved back towards the factory EVGA OC values. This would give us valuable data to work from, to help you fix this problem. Could you please do that, and tell us the results of the test? Thanks!

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sat Feb 02, 2013 5:48 pm

Also, for your information, please see this:
JimF wrote:As noted above, a factory overclock by the card maker can cause problems; my first Fermi card was a factory-overclocked GTS 450 that caused errors. I had to reduce the clock down to about the Nvidia chip specs using MSI Afterburner before it was reliable. Asus is good at selecting their chips for their overclocked cards, but what they do for qualification tests may not apply to Folding. And now the work units vary significantly in their difficulty, with P 76xx being significantly harder, so some might work fine while others fail.

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Sun Feb 03, 2013 1:06 pm

Well, I tested the NVIDIA stock settings.
I folded a WU, finished that one, and the new one failed.
I used a vcore of 1 V for the GPU.

Re: GPU_MEMTEST_ERROR

Postby P5-133XL » Sun Feb 03, 2013 3:05 pm

[T]yphoon wrote:Well, I tested the NVIDIA stock settings.
I folded a WU, finished that one, and the new one failed.
I used a vcore of 1 V for the GPU.


I don't think you understood: when I said to turn off all OC'ing for the test, to see if OC'ing was a factor, I meant all OC'ing, including the CPU and RAM, not just the GPU.

As a side question, since you said you used a vcore for the GPU: are you running in a virtual environment?

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Sun Feb 03, 2013 3:47 pm

I hope you understand what vcore is.
Vcore is just the core voltage of the GPU and has nothing to do with a VM or anything of that sort.
A GPU can't run in a VM anyway, nor can you fold on a GPU in a VM.

EDIT: And what has my CPU got to do with it? It is a GPU memory thing, not my actual RAM.
It even fails when my CPU is idling (I tested it with and without my CPU folding, with my GPU on NVIDIA stock settings).
EDIT2: I should fold with the v6 client to see if it gives the exact same error I am getting with v7.

Re: GPU_MEMTEST_ERROR

Postby P5-133XL » Sun Feb 03, 2013 4:30 pm

The reason to turn off all OC'ing is that a problem's cause may not be identifiable if you only look in one place. Data from the GPU travels through your CPU, your motherboard, and your RAM. Just because the error originates from the GPU does not exclude the possibility that something else that touched the data also harmed it. So a problem's cause may not be as obvious as one would think, especially when the diagnosis has been difficult.

There are VMs that purport to give their guests full hardware access to the video subsystem. However, to my knowledge you are correct that the current crop of virtual machines cannot successfully GPU fold. That does not exclude the possibility that someone is trying, so I asked.

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sun Feb 03, 2013 4:55 pm

[T]yphoon wrote:Well, I tested the NVIDIA stock settings.
I folded a WU, finished that one, and the new one failed.
I used a vcore of 1 V for the GPU.

It would be very helpful if you could give us the full information about each test, including all of the frequencies, voltages, temperatures, TPF, and PPD of the CPU (even if it is not folding) and GPU. TPF and PPD are for the GPU only, unless the CPU is also folding. For the GPU, please include all three frequencies (e.g. 732/1464/1900 MHz Graphics/Processor/Memory). For an example of this information (although without the internal GPU data), please see the latest test results for my six GPUs here.
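
If it helps with collecting that, here is a rough sketch of one way to log the clock and temperature data while a WU folds, by polling nvidia-smi (this assumes a driver/nvidia-smi build that supports these query fields; the file name and one-minute interval are my own choices, and voltages, TPF, and PPD still have to come from your monitoring tool and the client log):

# Rough sketch: append GPU core/memory clocks and temperature to a CSV
# once per minute, so the values at each stage of a WU can be compared.
import subprocess
import time

QUERY = "clocks.gr,clocks.mem,temperature.gpu"

def sample():
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader"])
    return out.decode().strip()

if __name__ == "__main__":
    with open("gpu_log.csv", "a") as log:
        while True:
            log.write(time.strftime("%H:%M:%S") + "," + sample() + "\n")
            log.flush()
            time.sleep(60)   # one sample per minute while the WU folds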

Please record the temperatures when the first WU is 99% complete. Do you have any way to measure the GPU VRM and RAM temperatures? This is very important for reliable operation, but those temperatures are often not available from programs like EVGA Precision X or MSI Afterburner. Please see this:
Grantelb4rt wrote:I do think that your VRM temp is the problem... anything higher than 100°C will cause instability in most cases.
belae wrote:Machine crashed again before I got back home. The GPU temp got up to 78/79 C, and the VRM got to 136 C right before the machine crashed.
belae wrote:Failed again, and the machine locked up with a video card failure. The GPU temp was 78 to 79 C, and the VRM was over 135 C. I am going to file an RMA, and I am quite disappointed with the Vapor-X card. My old card was a 2 GB 6950 with dual fans. I have been very happy with it.
Something is causing the first WU to fold OK while the second WU fails with the GPU_MEMTEST_ERROR. The temperatures of the GPU core and of the GPU's RAM and VRMs are the most logical explanation for this.


[T]yphoon wrote:EDIT: And what has my CPU got to do with it? It is a GPU memory thing, not my actual RAM.

The stability and performance of GPU folding rely on the correct operation of all parts of the system, including the CPU, RAM, PSU, and GPU. That is why we are asking for all of this data.

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sun Feb 03, 2013 5:39 pm

[T]yphoon, perhaps you can ask kiore about the setup of his system here:
kiore wrote:System 1: (desktop)

CPU -AMD Phenom II 955. (SMP)
GPU -NV GTX 560ti and GTX 570
OS -winXPpro 32 sp3
V7.

He does not seem to have a problem with folding on the GTX 570.

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Sun Feb 03, 2013 6:27 pm

Well, the temps aren't the problem (full-cover waterblock), and no, I can't measure the GPU VRM and RAM temperatures.
Even as we speak, the GPU temp is 45C under full load (at idle it is usually around 35C).
It's folding a P8073.
TPF is 00:02:41.
PPD is 20790.

The GPU is still on NVIDIA stock settings with a vcore of 1.000 V.

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sun Feb 03, 2013 7:18 pm

[T]yphoon wrote:Well, the temps aren't the problem (full-cover waterblock), and no, I can't measure the GPU VRM and RAM temperatures.
Even as we speak, the GPU temp is 45C under full load (at idle it is usually around 35C).
It's folding a P8073.
TPF is 00:02:41.
PPD is 20790.

The GPU is still on NVIDIA stock settings with a vcore of 1.000 V.

Thank you for this information. Please confirm that "NVIDIA stock settings" means 732/1464/1900 MHz Graphics/Processor/Memory. Although this is very likely what you mean, we cannot make even very small assumptions about your settings, no matter how obvious it may appear to you (or to us).

And it is unfortunate that the GPU VRM and RAM temperatures are not available. A full-cover waterblock does not mean that we can make any assumptions about temperatures we cannot measure. GPU RAM and VRMs often rely on airflow alone for their cooling, so unless there is a particularly good thermal connection between the RAM and VRMs and the waterblock (which we cannot assume), we cannot rule out excessive GPU RAM and VRM temperatures.

Since your GPU core temperature is good, perhaps you could increase the vcore in small increments, as described here:
So, people said to increase the voltage, so I did with MSI Afterburner. Increased to 1025 mV, but it failed; it crashes with Crysis and Crysis 2.
Increased to 1050 mV, and bam, no more crashes, not a single one, and no artifacts.


It would be very helpful if you could please supply all of the information for each test, as it is necessary to establish a baseline for the results. Then we can see whether a particular change makes things better or worse. Without the full data, we are mostly just taking stabs in the dark.

For your reference, the performance of my GTX 660 Ti's when folding P807x WUs is shown here:
art_l_j_PlanetAMD64 wrote:The latest results for my #6 system, with 807x WUs.

My #6 system, AMD Bulldozer FX-8150 3.9GHz CPU, dual Gigabyte GV-N66TOC-2GD GTX 660 Ti GPU:
  • smp:6 - PRCG 7809 (6, 21, 32), 3900MHz, 50C, TPF 17:43, 7022.16 PPD
  • gpu0 - PRCG 8072 (0, 1404, 37), 1215MHz, 55C, TPF 2:07, 26355.40 PPD
  • gpu1 - PRCG 8073 (0, 222, 85), 1228MHz, 51C, TPF 2:05, 26777.09 PPD
Total for this configuration is 60155 PPD.
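
As a rough sanity check on those numbers (my assumptions: the usual 100 frames per WU and the 3874-point base credit from bruce's stats line earlier; P8073's base credit may differ slightly), PPD for these non-bonus GPU WUs follows directly from the TPF:

# Rough check: PPD = base credit x (WUs completed per day), where WU time = TPF x frames.
def ppd(tpf_seconds, base_credit, frames_per_wu=100):
    seconds_per_wu = tpf_seconds * frames_per_wu
    return base_credit * 86400.0 / seconds_per_wu

print(round(ppd(2 * 60 + 41, 3874)))   # ~20790, in line with the TPF of 2:41 reported above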

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Sun Feb 03, 2013 7:36 pm

Well, there are thermal pads between the VRMs and the waterblock, and on the RAM as well.
But I will see what the v6 client does.
If it finishes the 1st WU and goes nicely with the 2nd WU, then the problem is in the FAH program itself.

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sun Feb 03, 2013 7:53 pm

[T]yphoon wrote:Well, there are thermal pads between the VRMs and the waterblock, and on the RAM as well.

Thank you for this additional information.


[T]yphoon wrote:But I will see what the v6 client does.
If it finishes the 1st WU and goes nicely with the 2nd WU, then the problem is in the FAH program itself.

This does not necessarily follow, even if the v6 client folds all the WUs correctly on your system. Many folders are using GTX 570 (and other types of) GPUs with FAH v7.2.9, on all the different GPU projects (e.g. 807x, 762x, etc.), without any problems. If there were a systemic software problem, as you are suggesting, then there would be hundreds (if not thousands) of problem reports coming in to the forum here. But there are not. And many of the problem reports that do come in are resolved by going from a newer driver (e.g. 310.xx) to a known-good older driver (e.g. 296.10, 306.81, or 306.97).