GPU_MEMTEST_ERROR


Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Sun Feb 03, 2013 7:38 pm

Well, it looks like my v7 client is messed up or there is a bug in it.
I'm now folding a 2nd WU without any problems with v6:
[18:56:57] Completed  45500000 out of 50000000 steps (91%).
[18:59:36] Completed  46000000 out of 50000000 steps (92%).
[19:02:16] Completed  46500000 out of 50000000 steps (93%).
[19:04:56] Completed  47000000 out of 50000000 steps (94%).
[19:07:37] Completed  47500000 out of 50000000 steps (95%).
[19:10:17] Completed  48000000 out of 50000000 steps (96%).
[19:12:57] Completed  48500000 out of 50000000 steps (97%).
[19:15:38] Completed  49000000 out of 50000000 steps (98%).
[19:18:18] Completed  49500000 out of 50000000 steps (99%).
[19:20:57] Completed  50000000 out of 50000000 steps (100%).
[19:20:58] Finished fah_main status=0
[19:20:58] Successful run
[19:20:58] DynamicWrapper: Finished Work Unit: sleep=10000
[19:21:08] Reserved 324184 bytes for xtc file; Cosm status=0
[19:21:08] Allocated 324184 bytes for xtc file
[19:21:08] - Reading up to 324184 from "work/wudata_01.xtc": Read 324184
[19:21:08] Read 324184 bytes from xtc file; available packet space=786106280
[19:21:08] xtc file hash check passed.
[19:21:08] Reserved 20256 20256 786106280 bytes for arc file=<work/wudata_01.trr> Cosm status=0
[19:21:08] Allocated 20256 bytes for arc file
[19:21:08] - Reading up to 20256 from "work/wudata_01.trr": Read 20256
[19:21:08] Read 20256 bytes from arc file; available packet space=786086024
[19:21:08] trr file hash check passed.
[19:21:08] Allocated 544 bytes for edr file
[19:21:08] Read bedfile
[19:21:08] edr file hash check passed.
[19:21:08] Allocated 36703 bytes for logfile
[19:21:08] Read logfile
[19:21:08] GuardedRun: success in DynamicWrapper
[19:21:08] GuardedRun: done
[19:21:08] Run: GuardedRun completed.
[19:21:11] + Opened results file
[19:21:11] - Writing 382199 bytes of core data to disk...
[19:21:11] Done: 381687 -> 350535 (compressed to 91.8 percent)
[19:21:11]   ... Done.
[19:21:11] DeleteFrameFiles: successfully deleted file=work/wudata_01.ckp
[19:21:11] Shutting down core
[19:21:11]
[19:21:11] Folding@home Core Shutdown: FINISHED_UNIT
[19:21:15] CoreStatus = 64 (100)
[19:21:15] Sending work to server
[19:21:15] Project: 8073 (Run 0, Clone 382, Gen 53)
[19:21:15] - Read packet limit of 540015616... Set to 524286976.


[19:21:15] + Attempting to send results [February 3 19:21:15 UTC]
[19:21:15] Gpu type=3 species=20.
[19:21:18] + Results successfully sent
[19:21:18] Thank you for your contribution to Folding@Home.
[19:21:18] + Number of Units Completed: 732

[19:21:22] - Preparing to get new work unit...
[19:21:22] Cleaning up work directory
[19:21:22] + Attempting to get work packet
[19:21:22] Passkey found
[19:21:22] Gpu type=3 species=20.
[19:21:22] - Connecting to assignment server
[19:21:23] - Successful: assigned to (171.67.108.36).
[19:21:23] + News From Folding@Home: Welcome to Folding@Home
[19:21:23] Loaded queue successfully.
[19:21:23] Gpu type=3 species=20.
[19:21:25] + Closed connections
[19:21:25]
[19:21:25] + Processing work unit
[19:21:25] Core required: FahCore_15.exe
[19:21:25] Core found.
[19:21:25] Working on queue slot 02 [February 3 19:21:25 UTC]
[19:21:25] + Working ...
[19:21:25]
[19:21:25] *------------------------------*
[19:21:25] Folding@Home GPU Core
[19:21:25] Version                2.25 (Wed May 9 17:03:01 EDT 2012)
[19:21:25] Build host             AmoebaRemote
[19:21:25] Board Type             NVIDIA/CUDA
[19:21:25] Core                   15
[19:21:25]
[19:21:25] Window's signal control handler registered.
[19:21:25] Preparing to commence simulation
[19:21:25] - Looking at optimizations...
[19:21:25] DeleteFrameFiles: successfully deleted file=work/wudata_02.ckp
[19:21:25] - Created dyn
[19:21:25] - Files status OK
[19:21:25] sizeof(CORE_PACKET_HDR) = 512 file=<>
[19:21:25] - Expanded 57405 -> 259290 (decompressed 451.6 percent)
[19:21:25] Called DecompressByteArray: compressed_data_size=57405 data_size=259290, decompressed_data_size=259290 diff=0
[19:21:25] - Digital signature verified
[19:21:25]
[19:21:25] Project: 8073 (Run 0, Clone 382, Gen 54)
[19:21:25]
[19:21:25] Assembly optimizations on if available.
[19:21:25] Entering M.D.
[19:21:27] Tpr hash work/wudata_02.tpr:  976568545 1448960472 1622442730 78505901 931771268
[19:21:27] GPU device id=0
[19:21:27] Working on God Rules Over Mankind, Animals, Cosmos and Such
[19:21:27] Client config found, loading data.
[19:21:27] Starting GUI Server
[19:22:28] Setting checkpoint frequency: 500000
[19:22:28] Completed         3 out of 50000000 steps (0%).
[19:25:08] Completed    500000 out of 50000000 steps (1%).
[19:27:47] Completed   1000000 out of 50000000 steps (2%).
[19:30:27] Completed   1500000 out of 50000000 steps (3%).
[19:33:06] Completed   2000000 out of 50000000 steps (4%).
[19:35:46] Completed   2500000 out of 50000000 steps (5%).

By the way, I only copied the last 9% of the 1st WU.
[T]yphoon
 
Posts: 15
Joined: Sun Sep 30, 2012 7:27 pm

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sun Feb 03, 2013 8:38 pm

[T]yphoon wrote:Well, it looks like my v7 client is messed up or there is a bug in it.
I'm now folding a 2nd WU without any problems with v6:

I am glad that your system is folding OK now, but making a sweeping generalization about some software based on one unique situation (which I myself have done in the past, right, guys?) is not valid. There could be several reasons for the results you saw, including your use of the v296.10 driver (which I am not suggesting you change). Perhaps the v296.10 driver works well with v6, but not with v7.2.9. I do know that the v306.97 driver works well with v7.2.9. But the important thing is that your system is folding without problems now.
art_l_j_PlanetAMD64
Over 1.04 Billion Total Points
Over 185,000 Work Units
Over 3,800,000 PPD
Overall rank (if points are combined) 20 of 1721690
In memory of my Mother May 12th 1923 - February 10th 2012
art_l_j_PlanetAMD64
 
Posts: 568
Joined: Sun May 30, 2010 2:28 pm

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Sun Feb 03, 2013 9:23 pm

Well, I have had that version of the driver for a long time, as well as that Folding@home program.
My 2nd rig uses the same versions of both the NVIDIA driver and the Folding@home program,
and there are no problems there.

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sun Feb 03, 2013 9:27 pm

I just looked here: http://www.geforce.com/drivers/beta-legacy, and the oldest driver listed is v306.23.

v296.10 was released on 2012-03-13, which predates FAHClient V7.2.9 (released on 2012-10-09). Perhaps there is some incompatibility.

There is even a whole topic in the Forums here: "Nvidia 296.10 WHQL drivers fail", so there are some folders who find that v296.10 does not work for them.

From that forum:
davidcoton wrote:
JimF wrote:The monitor is always set to turn off after 15 minutes, but I think the failure occurred after that, maybe right at the end of the work unit,


From your log, the failure is right at the start of a WU. See http://foldingforum.org/viewtopic.php?f=67&t=20891&p=209119&hilit=295#p209065 for a discussion of this on recent NVidia drivers (I don't use these versions myself, so I'm only offering you the links). What seems to be happening is that, using the monitor off functions, folding fails at the start of a new WU, just as you are seeing. Solutions: older drivers, or disable the auto monitor power off.

Hope that helps,

David

JimF wrote:davidcoton - thanks for the info.

As a test, I tried the following changes in the Nvidia control panel:
    Power Management Mode: Prefer Maximum Performance (I usually have it set that way, but had not done so with the new drivers yet).
    CUDA - GPUs: GeForce GT 240 (in order to exclude the GT 430, which is the display card that drives the monitor)
That successfully completed the work unit, and downloaded the next one properly. Maybe it is a permanent fix, maybe not.

EDIT: The next work unit failed. I am going back to the old drivers.

The_Nephilim wrote:Well, I have had problems with F@H and the 296.10 drivers. Even with full power and no screensaver, the WUs would fail at 99%. I downloaded and installed the new betas and they seem to be working fine.

So the 296.10 Drivers just seem problematic


So, the problem you had, also affected others who were using the v296.10 driver.

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Sun Feb 03, 2013 10:55 pm

I have a GTX 580 on my 2nd rig, and it is also using 7.2.9 with no problems there,
even though it's not running at stock clocks either.

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Sun Feb 03, 2013 11:18 pm

[T]yphoon wrote:I have a GTX 580 on my 2nd rig, and it is also using 7.2.9 with no problems there,
even though it's not running at stock clocks either.

Well, we just found out that each system has its own unique characteristics that can be different from other similar systems. What a surprise.

I had the NVidia WHQL R310.70 driver install perfectly on 2 out of 3 computers here, and then it failed horribly on the 3rd computer:
Problem with the new WHQL R310.70 driver on GTX460s

WinXP SP3 32-bit, and Win7 SP1 32-bit, both worked perfectly with the NVidia WHQL R310.70 driver. But Win7 SP1 64-bit failed spectacularly. And that was a clean, brand-new Windows 7 64-bit installation, with no 3rd-party applications except for a web browser (Firefox) and some simple utilities (CCleaner, Speccy, Core Temp, EVGA Precision X, and APC PowerChute UPS monitoring software).

Re: GPU_MEMTEST_ERROR

Postby bruce » Mon Feb 04, 2013 4:14 am

art_l_j_PlanetAMD64 wrote:v296.10 was released on 2012-03-13, which predates FAHClient V7.2.9 (released on 2012-10-09). Perhaps there is some incompatibility.

Bad assumption.

Nothing in FAHClient V7.2.9 has anything to do with the GPU drivers. The Client downloads a WU and, if necessary, the required version of the FahCore, which is the only part of the FAH software that actually interfaces with the driver. Now if you told me that FahCore_xx Version y.y.y predates driver version zzz.zz, you might convince me there could be an incompatibility.

We actively discourage unnecessary updates to the drivers because the FahCores themselves are tested against the then-current version of the drivers and are rarely updated. Both AMD and NV seem to have a lot of trouble maintaining backward compatibility with the library routines that are in a specific version of a FahCore.
bruce
 
Posts: 22592
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU_MEMTEST_ERROR

Postby art_l_j_PlanetAMD64 » Mon Feb 04, 2013 4:32 am

bruce wrote:
art_l_j_PlanetAMD64 wrote:v296.10 was released on 2012-03-13, which predates FAHClient V7.2.9 (released on 2012-10-09). Perhaps there is some incompatibility.

Bad assumption.

Nothing in FAHClient V7.2.9 has anything to do with the GPU drivers. The Client downloads a WU and, if necessary, the required version of the FahCore, which is the only part of the FAH software that actually interfaces with the driver. Now if you told me that FahCore_xx Version y.y.y predates driver version zzz.zz, you might convince me there could be an incompatibility.

We actively discourage unnecessary updates to the drivers because the FahCores themselves are tested against the then-current version of the drivers and are rarely updated. Both AMD and NV seem to have a lot of trouble maintaining backward compatibility with the library routines that are in a specific version of a FahCore.

Thanks, bruce, this is very good to know. My (incorrect) statement was just uninformed speculation, and it's good to get the true information about it.

Yes, I found out (and I should have known better) that it is a very bad idea to make any driver change, unless it is to try to fix an existing driver problem, or to get some required new features (like new GPU support). Updating just for the heck of it, or to be able to say you've got the newest driver version, is a very poor idea. If it ain't broke, don't fix it.

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Tue Feb 12, 2013 10:38 pm

Well, for some reason I was getting UNSTABLE_MACHINE errors.
Even after increasing to the max safe voltage (1.1 V), it still gave me the UNSTABLE_MACHINE error.
I also tried memtestG80-1.1 and it gave me around 50k errors.

So I tried googling my problem (the card also stayed in power-saving mode forever :( ), and on the EVGA forum they said that a reboot didn't fix it, but a full shutdown and startup did.
So I tried it and boom, it worked (the PC hadn't been shut down in months). I also ran memtestG80-1.1 again and got no errors.
Maybe that's why I was getting these GPU_MEMTEST errors.

Re: GPU_MEMTEST_ERROR

Postby [T]yphoon » Fri Feb 15, 2013 6:04 am

It's possible that FAHControl sees the GeForce GTX 560 as the GeForce 8600 GTS and vice versa (my 2nd rig also has 2 cards and does that as well; luckily they are both Fermi models).
Maybe you can remove the GeForce 8600 GTS, or try to run it solo?

Re: GPU_MEMTEST_ERROR

Postby codysluder » Fri Feb 15, 2013 8:14 pm

The only CPUs that are "too old" are the ones that don't have SSE, and that's really, really old (in CPU-years). As far as GPUs are concerned, when the manufacturer drops support for them, so does Stanford. AMD is only supporting OpenCL 1.1 or 1.2, and that means the 5000 series or greater. Nvidia is still supporting CUDA back as far as the G80, which is now several generations old.
codysluder
 
Posts: 2128
Joined: Sun Dec 02, 2007 12:43 pm
