2 GTX295's installed, UNSTABLE_MACHINE issues

Postby jfarque » Sat Jan 10, 2009 7:23 pm

People with two or more 9800GX2's working in one machine may be best suited to help me with this.

I have two GTX295s installed in my machine. The two GPUs on the card in the first slot work fine, and I got them going lickety-split last night with F@H v6.23. They worked overnight, producing 61 WUs and 3072 points. FahMon 2.3.4 estimates 14,271 PPD for those two cores.

Card 2's cores are being persnickety. I've tried many different configuration combinations on its two cores, and so far nothing has produced anything but UNSTABLE_MACHINE on either core, even when the first card's cores are idle. The WUs I'm getting seem to be from the 57xx projects. I have Everest running to watch the cores' temperatures, and they never go above 62°C, so it's unlikely to be thermal. Running the fans at 100% hasn't changed the result.

I have three GTX295 cards, and once I get two working I'm going to take a stab at getting the third one working in the same machine. I haven't made a serious effort at that yet, but it's pertinent because it gives me a third card to swap in. I have done that swap with no change in the results, so it doesn't seem to be a faulty card.

Until last night this machine had three water-cooled GTX280s in it folding away, so I'm fairly confident in my ability to install and configure the cores (I have RTFM'd). I may be overlooking something, though...

Some information on the machine:
nVidia driver 181.20
nVidia 790i Ultra motherboard, BIOS P8 (latest)
4GB 1600MHz DDR3
Intel 9770 quad-core processor at 4.2GHz, water-cooled
Two EVGA GTX295's on air.
Thermaltake Toughpower 1,200W PSU for electronics
Antec 650W PSU for case fans & water pumps
Vista x64 Ultimate SP1 & fully patched up
4 displays with the desktop extended onto all of them (one for each GPU on the two cards)

Quad SLI is working fine for gaming and is very smooth; so far I'm impressed with the cards. Micro-stuttering in SLI gaming is very hard to detect, and the temperatures stay sane, unlike the GTX280s, where running three of them on air was thermally brutal.

Here's typical log output from those cores:

# Windows GPU Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.23

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Users\jfarque\AppData\Roaming\Folding@home-gpu-4
Arguments: -gpu 3 -verbosity 9 -forcegpu nvidia_g80

[05:11:21] - Ask before connecting: No
[05:11:21] - User name: jfarque (Team 111065)
[05:11:21] - User ID: 35B4FE3C0D7D8DF7
[05:11:21] - Machine ID: 11
[05:11:21]
[05:11:21] Loaded queue successfully.
[05:11:21] Initialization complete
[05:11:21]
[05:11:21] + Processing work unit
[05:11:21] - Autosending finished units... [January 10 05:11:21 UTC]
[05:11:21] Trying to send all finished work units
[05:11:21] + No unsent completed units remaining.
[05:11:21] - Autosend completed
[05:11:21] Core required: FahCore_11.exe
[05:11:21] Core found.
[05:11:21] Working on queue slot 02 [January 10 05:11:21 UTC]
[05:11:21] + Working ...
[05:11:21] - Calling '.\FahCore_11.exe -dir work/ -suffix 02 -priority 96 -nocpulock -checkpoint 15 -verbose -lifeline 3496 -version 623'

[05:11:21]
[05:11:21] *------------------------------*
[05:11:21] Folding@Home GPU Core - Beta
[05:11:21] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[05:11:21]
[05:11:21] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[05:11:21] Build host: amoeba
[05:11:21] Board Type: Nvidia
[05:11:21] Core      :
[05:11:21] Preparing to commence simulation
[05:11:21] - Ensuring status. Please wait.
[05:11:31] - Looking at optimizations...
[05:11:31] - Working with standard loops on this execution.
[05:11:31] - Previous termination of core was improper.
[05:11:31] - Files status OK
[05:11:31] - Expanded 68561 -> 357580 (decompressed 521.5 percent)
[05:11:31] Called DecompressByteArray: compressed_data_size=68561 data_size=357580, decompressed_data_size=357580 diff=0
[05:11:31] - Digital signature verified
[05:11:31]
[05:11:31] Project: 5763 (Run 13, Clone 63, Gen 99)
[05:11:31]
[05:11:31] Entering M.D.
[05:11:38] mdrun_gpu returned
[05:11:38] Going to send back what have done -- stepsTotalG=0
[05:11:38] Work fraction=0.0000 steps=0.
[05:11:41] logfile size=909 infoLength=909 edr=0 trr=25
[05:11:41] - Writing 1447 bytes of core data to disk...
[05:11:41] Done: 935 -> 611 (compressed to 65.3 percent)
[05:11:41]   ... Done.
[05:11:42]
[05:11:42] Folding@home Core Shutdown: UNSTABLE_MACHINE
[05:11:45] CoreStatus = 7A (122)
[05:11:45] Sending work to server
[05:11:45] Project: 5763 (Run 13, Clone 63, Gen 99)
[05:11:45] - Read packet limit of 540015616... Set to 524286976.


[05:11:45] + Attempting to send results [January 10 05:11:45 UTC]
[05:11:45] - Reading file work/wuresults_02.dat from core
[05:11:45]   (Read 1123 bytes from disk)
[05:11:45] Connecting to http://171.64.65.106:8080/
[05:11:46] Posted data.
[05:11:46] Initial: 0000; - Uploaded at ~2 kB/s
[05:11:46] - Averaged speed for that direction ~1 kB/s
[05:11:46] + Results successfully sent
[05:11:46] Thank you for your contribution to Folding@Home.
[05:11:50] Trying to send all finished work units
[05:11:50] + No unsent completed units remaining.
[05:11:50] - Preparing to get new work unit...
[05:11:50] + Attempting to get work packet
[05:11:50] - Will indicate memory of 4093 MB
[05:11:50] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 7
[05:11:50] - Connecting to assignment server
[05:11:50] Connecting to http://assign-GPU.stanford.edu:8080/
[05:11:51] Posted data.
[05:11:51] Initial: 40AB; - Successful: assigned to (171.64.65.106).
[05:11:51] + News From Folding@Home: GPU folding beta
[05:11:51] Loaded queue successfully.
[05:11:51] Connecting to http://171.64.65.106:8080/
[05:11:51] Posted data.
[05:11:51] Initial: 0000; - Receiving payload (expected size: 69081)
[05:11:52] - Downloaded at ~67 kB/s
[05:11:52] - Averaged speed for that direction ~74 kB/s
[05:11:52] + Received work.
[05:11:52] Trying to send all finished work units
[05:11:52] + No unsent completed units remaining.
[05:11:52] + Closed connections
[05:11:57]
[05:11:57] + Processing work unit
[05:11:57] Core required: FahCore_11.exe
[05:11:57] Core found.
[05:11:57] Working on queue slot 03 [January 10 05:11:57 UTC]
[05:11:57] + Working ...
[05:11:57] - Calling '.\FahCore_11.exe -dir work/ -suffix 03 -priority 96 -nocpulock -checkpoint 15 -verbose -lifeline 3496 -version 623'

[05:11:57]
[05:11:57] *------------------------------*
[05:11:57] Folding@Home GPU Core - Beta
[05:11:57] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[05:11:57]
[05:11:57] Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[05:11:57] Build host: amoeba
[05:11:57] Board Type: Nvidia
[05:11:57] Core      :
[05:11:57] Preparing to commence simulation
[05:11:57] - Looking at optimizations...
[05:11:57] - Created dyn
[05:11:57] - Files status OK
[05:11:57] - Expanded 68569 -> 357580 (decompressed 521.4 percent)
[05:11:57] Called DecompressByteArray: compressed_data_size=68569 data_size=357580, decompressed_data_size=357580 diff=0
[05:11:57] - Digital signature verified
[05:11:57]
[05:11:57] Project: 5762 (Run 7, Clone 12, Gen 72)
[05:11:57]
[05:11:57] Assembly optimizations on if available.
[05:11:57] Entering M.D.
[05:12:04] mdrun_gpu returned
[05:12:04] Going to send back what have done -- stepsTotalG=0
[05:12:04] Work fraction=0.0000 steps=0.
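
For anyone gathering evidence across several client instances, a small script can tally how each work unit ended. This is a minimal sketch, assuming the log lines look like the ones above (`Project: 5763 (Run 13, Clone 63, Gen 99)`, `Folding@home Core Shutdown: UNSTABLE_MACHINE`, `CoreStatus = 7A (122)`); the regexes are inferred from this one log rather than from a documented format, `FINISHED_UNIT` is assumed to be the reason logged for a completed unit, and the function names are hypothetical.

```python
import re
from collections import Counter

# Patterns inferred from the v6.23 GPU console log quoted above (an assumption,
# not a documented format).
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Core Shutdown: (\w+)")
STATUS_RE = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")


def scan_log(lines):
    """Tally core shutdown reasons in an iterable of log lines.

    Returns (reasons, failures, statuses):
      reasons  - Counter of shutdown reasons
      failures - list of ((project, run, clone, gen), reason), excluding FINISHED_UNIT
      statuses - CoreStatus codes as integers (hex 7A == 122)
    """
    reasons = Counter()
    failures = []
    statuses = []
    current = None  # most recent Project/Run/Clone/Gen line seen
    for line in lines:
        m = PROJECT_RE.search(line)
        if m:
            current = tuple(int(g) for g in m.groups())
            continue
        m = SHUTDOWN_RE.search(line)
        if m:
            reason = m.group(1)
            reasons[reason] += 1
            if reason != "FINISHED_UNIT":  # assumed label for a completed unit
                failures.append((current, reason))
            continue
        m = STATUS_RE.search(line)
        if m:
            statuses.append(int(m.group(1), 16))
    return reasons, failures, statuses


def scan_file(path):
    """Convenience wrapper: scan one instance's log file on disk."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return scan_log(fh)
```

Running `scan_file` on each instance's log and comparing the `failures` lists makes it easy to see whether every project fails on card 2 or only some of them.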


Here's an iPhone photo I took when I (briefly!) had three cards in. Again, I'm only trying to debug a 2 card issue at the moment.

[Photo: the machine with three GTX295s installed]

jaf

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby ChelseaOilman » Sat Jan 10, 2009 7:38 pm

Did you install the graphics drivers when both cards were installed in the machine? If not, I would try that to make sure the drivers are properly installed for both cards.

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby toTOW » Sat Jan 10, 2009 7:38 pm

You didn't mention it, but did you disable SLI (including the internal SLI on the boards) and extend the desktop to run FAH?
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.


Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby Adanorm » Sat Jan 10, 2009 7:40 pm

Are you sure that 650W PSU is enough?
Owner of the "Meuh farm", a new computing farm for "Alliance Francophone" (team 51)
Project state : 2 Quadcore active

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby toTOW » Sat Jan 10, 2009 7:43 pm

Thermaltake Toughpower 1,200W PSU for electronics
Antec 650W PSU for case fans & water pumps


He doesn't use the 650W PSU to power the machine... but the question is still valid for the 1200W one: are you sure you aren't reaching its limit?

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby jfarque » Sat Jan 10, 2009 7:56 pm

I did disable SLI and PHYSX, and I did extend the desktop onto four monitors.

I have a 650W supply for the fans and pumps and a 1,200W supply for the motherboard, cards, 1 hard disk and 1 DVD-ROM. The supply never flinched under the three GTX280's, but the power requirements are higher for the GTX295's so I've been watching it closely.

I can fold on card 1 with both cores while card 2 is idle without a problem (7 hours overnight). But I can't get even a single core on the second card to attempt folding while card 1's cores are idle. That doesn't rule out supply issues at a 4-core load, but I think it's a reasonable test that this isn't an immediate supply issue, unless there's something I fundamentally don't understand about this configuration.

I can upgrade the power supply and may well do so if I try to get all three GTX295's working in this one machine.

Thanks for the quick responses!

jaf

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby jfarque » Sat Jan 10, 2009 8:00 pm

toTOW wrote:...but did you disable SLI (and internal SLI of the boards)... ?


I disabled Quad-SLI and PHYSX in the nVidia Control Panel. Is there some setting that I'm not aware of that controls the internal SLI? I've dug around and don't see anything like that.

Why might card 1 work and card 2 not under the same settings?

jaf

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby Tobit » Sat Jan 10, 2009 8:08 pm

jfarque wrote:I disabled Quad-SLI and PHYSX in the nVidia Control Panel. Is there some setting that I'm not aware of that controls the internal SLI? I've dug around and don't see anything like that.

Did you remove the actual hardware bridge? I see it still installed in your iPhone pic above.

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby dempaSD » Sat Jan 10, 2009 8:15 pm

While tracing the issue, I'd also try removing the first card and testing the second one on its own, then testing the first card in the second slot, and so on.
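
The swap-testing idea boils down to a small decision table: run one card at a time in each slot, note whether it folds, and see whether the failures follow the card or the slot. A minimal sketch, assuming each test result is recorded as True (folded) or False (UNSTABLE_MACHINE); the card labels and the `suspects` function are hypothetical.

```python
from collections import defaultdict


def suspects(outcomes):
    """Return (suspect_cards, suspect_slots) from single-card test results.

    outcomes maps (card, slot) -> True if the card folded in that slot,
    False if it hit UNSTABLE_MACHINE. A card that fails in every slot it was
    tried in is suspect; likewise a slot that fails with every card tried in it.
    """
    by_card = defaultdict(list)
    by_slot = defaultdict(list)
    for (card, slot), folded in outcomes.items():
        by_card[card].append(folded)
        by_slot[slot].append(folded)
    bad_cards = sorted(c for c, results in by_card.items() if not any(results))
    bad_slots = sorted(s for s, results in by_slot.items() if not any(results))
    return bad_cards, bad_slots


# Hypothetical results: both cards fold in slot 1 but fail in slot 2.
example = {("A", 1): True, ("A", 2): False, ("B", 1): True, ("B", 2): False}
print(suspects(example))  # ([], [2])
```

The table only separates card from slot if every card is tried in more than one slot; a single failing run implicates both. A suspect "slot" here also stands in for whatever comes with that position, such as its power leads.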

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby OldChap » Sat Jan 10, 2009 9:28 pm

Try laying the motherboard horizontally while you are fault-finding. That solved a not-dissimilar issue for me with GX2s. It seems the weight of the cards puts stress on the slots/connections over time, and once a good setup had been disturbed it was hard to re-establish good connections. That plus a complete reinstall of all the folding software worked for me.

Worth a try anyway

good luck

One Jealous Old Chap

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby jfarque » Sat Jan 10, 2009 9:51 pm

OldChap,

I built the machine a few months ago for three water-cooled GTX280s. I was worried about the weight of the cards on the slots too, so I built it horizontally.

Good call though.

[Photo: the horizontally built machine]

jaf

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby OldChap » Sat Jan 10, 2009 10:10 pm

Now I'm even more jealous. I just love that rig !!

Have you tried swapping over the PCIe power leads from the good card? Sorry, I know that's teaching you to suck eggs.

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby toTOW » Sat Jan 10, 2009 10:46 pm

In terms of GPU identifiers (the X value in the -gpu X flag), which ones are working and which aren't?
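
Each GPU gets its own client instance and launch directory, selected with -gpu X (the log above shows `-gpu 3` launched from `Folding@home-gpu-4`, which suggests 0-based indices). A minimal sketch for keeping the index-to-card mapping straight, assuming the driver numbers both GPUs of card 1 before those of card 2; that ordering is an assumption to verify on the actual system, and the executable name and helper functions are hypothetical. Only `-gpu`, `-verbosity 9` and `-forcegpu nvidia_g80` come from the log.

```python
GPUS_PER_CARD = 2  # each GTX295 exposes two GPUs to the driver
CLIENT = "fah-gpu-client.exe"  # hypothetical placeholder for the console client executable
EXTRA_FLAGS = ["-verbosity", "9", "-forcegpu", "nvidia_g80"]  # as in the log above


def card_for_index(gpu_index):
    """1-based card number for a -gpu index, ASSUMING card 1's GPUs are numbered first."""
    return gpu_index // GPUS_PER_CARD + 1


def command_for(gpu_index):
    """Argument list for one client instance bound to one GPU."""
    return [CLIENT, "-gpu", str(gpu_index), *EXTRA_FLAGS]


def failing_cards(results):
    """results maps -gpu index -> last core shutdown reason; return implicated cards."""
    return sorted({card_for_index(i) for i, r in results.items() if r == "UNSTABLE_MACHINE"})


for i in range(4):
    print(f"card {card_for_index(i)}:", " ".join(command_for(i)))
```

If indices 2 and 3 both fail while 0 and 1 fold, `failing_cards` points at card 2 under this assumed ordering; if the failures are split across cards, the ordering assumption itself is worth checking.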

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby ferrari12508 » Sat Jan 10, 2009 10:52 pm

I get that error when my shader clocks are fine for gaming and everyday use but too high for F@H. Get RivaTuner, lower the clocks and see if that helps.

Re: 2 GTX295's installed, UNSTABLE_MACHINE issues

Postby mjbservices » Sun Jan 11, 2009 1:36 am

Do you get the same error if you swap the gpu's from first and second slot and try to fold on the 'second' gpu in the first slot? Just want to make sure we do not have a faulty/marginal gpu.

I have twelve 9800 GTs in six systems, and one card would consistently get UNSTABLE_MACHINE even though it worked great for gaming; it just would not fold, no matter which system I used it in.

Good luck!
