57xx - NV GPUs failing all the time @ all projects

Postby Cenzor » Wed Jan 14, 2009 8:26 pm

If you're seeing issues with your NV GPU (NaNs detected on GPU / Unstable Machine), please fill in a report in this thread: viewtopic.php?f=52&t=7965

The current thread can be used for any discussion related to the issue.
toTOW


I am one of the main folders in our team (folding@sweclockers.com, 37451) and I am speaking for all of us.

What is going on with all these 57xx GPU projects? Do you even test them before you release them?

Why am I so harsh? Well, because every single 57xx GPU project you have released in the past 2 weeks has only ended up in NaNs and all kinds of errors. None of us are able to fold them and yes, we have tried with everything at default settings too. I can add that my personal rig is a GTX280 + 9800GX2 with water cooling and the cards never exceed 50 degrees. I still get these problems, even at default settings.

Please don't use your contributing users as beta testers.

Thank you.

/C
Cenzor
 
Posts: 24
Joined: Sat Aug 16, 2008 5:58 pm

Re: 57xx - Failing all the time @ all projects

Postby Leganfuh » Thu Jan 15, 2009 1:56 am

I would like to comment on this also:

I fold anything that you throw at me and have quite a large investment in my Folding Farm. I do not have most of the problems that fellow folders complain about; that might be because I am not a heavy overclocker. But the fact is, I believe that Stanford sometimes forgets that this is our equipment, our investment, and we loan it to you to help in your research. We ask that you treat it as if it were your own: do not overstress our computers, and make the point system reflect the stress that you inflict on our equipment.

Mike
Leganfuh - Team XCPUS
aka: The Commander 8-)

The Commander Corei7-965 2-GTX 260 Core 216 2-SMP 2-GPU2
(1) Dell XPS 720 2.4 Quad Core, GeForce 8800GTX 2-SMP-MPICH 1-GPU2
(20) Dell Vostro 400's 2.4 Quad Core, GeForce 8800GT 40-SMP-MPICH 20-GPU2
(4) Dell Vostro 410's 2.4 Quad Core, GeForce 8800GT 8-SMP-MPICH 4-GPU2
Leganfuh
 
Posts: 29
Joined: Sat Mar 29, 2008 8:52 pm
Location: Tidewater, Oregon

Re: 57xx - Failing all the time @ all projects

Postby toTOW » Thu Jan 15, 2009 2:20 am

Running a GTX2xx with an earlier generation board has already caused issues ... see in this thread : viewtopic.php?f=52&t=7834

By the way, we're going to need more details: is your hardware overclocked? Are the WUs failing on all boards at once, or only on a particular one? What are your OS and driver versions? ...
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Site Moderator
 
Posts: 8931
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 57xx - Failing all the time @ all projects

Postby Cenzor » Thu Jan 15, 2009 4:00 am

I checked the thread you linked; it seems similar but not the same. The cards fold all 50xx projects just fine, but almost all 57xx WUs keep failing. Sometimes they manage to complete a whole WU, but that is very rare. The problems intensified sometime in the last 1-2 weeks and include EUEs, NaNs and connection problems. As for the connection problems, the client can sit for hours trying to connect to the work server(s), but when I quit it manually and restart it, the connection works fine on the first attempt. I have seen similar looping connection problems before, but not this frequently, and no, firewalls etc. are not involved.

Regarding the other 57xx problems, most of them (as reported in other threads) are related to 5761, but not all. As my team name implies, most of us are pushing our hardware a bit hard, which is why I pointed out that we have also tried stepping down to default settings. I have heard problem reports from several different rigs but the one I personally have the most problems with at the moment is this one:
AMD Phenom 2.5 GHz, Vista Ultimate 64, 8800GTX, 9800GX2. AFAIK the mixed-card problems mostly relate to mixing cards with a big difference in the number of shaders, but the 8800 and 9800 are more or less the same card, just different generations, and until now they have co-existed just fine. The same projects also fail on one of my other machines that has only a single GTX260 card, and on another one with a single GTX280, so mixing cards is not the reason.

Typical problem logs have been posted in nearby threads, mainly about 5761 and NaN errors, and in other 57xx threads. I checked those logs and I have the same errors and problems, so there is no point spamming the same thing again. I might add again that this is at default clocks, with very good cooling and without using "-forceasm" etc. I usually don't post and whine about WUs, but the last week has just been :e(
Last edited by Cenzor on Thu Jan 15, 2009 6:45 am, edited 1 time in total.
Cenzor
 
Posts: 24
Joined: Sat Aug 16, 2008 5:58 pm

Re: 57xx - Failing all the time @ all projects

Postby MoneyGuyBK » Thu Jan 15, 2009 5:40 am

This is starting again......
I had at least a couple of dozen of these EUE'ed over the course of Tuesday and most of Wednesday.......

See my older thread here:
viewtopic.php?f=52&t=7231

Peace
T.E.A.M. “Together Everyone Accomplishes Miracles!”
OC, S. California ... God Bless All
MoneyGuyBK
 
Posts: 562
Joined: Sun Dec 02, 2007 6:40 am
Location: Team_XPS ..... OC, S. Calif

Re: 57xx - Failing all the time @ all projects

Postby Cenzor » Thu Jan 15, 2009 6:42 am

Here is the effect on our team :? =>>

http://folding.extremeoverclocking.com/ ... s=&t=37451

Cenzor
 
Posts: 24
Joined: Sat Aug 16, 2008 5:58 pm

Re: 57xx - Failing all the time @ all projects

Postby shdbcamping » Thu Jan 15, 2009 2:04 pm

Cenzor wrote: Here is the effect on our team :? =>>

http://folding.extremeoverclocking.com/ ... s=&t=37451

And ours as well:
http://folding.extremeoverclocking.com/ ... s=&t=80856

I am wondering how much of this problem has to do with the new NV drivers?

On another note... I had been free of 511-pt 57XX WUs for quite a while. Two days ago I started getting them from the server again. The heat involved with the latest batch makes the last round look like a vacation in Siberia :( . I have 8800GTs running 13C hotter than on the 50XX WUs and 8C hotter than on the 384-pt 57XX WUs. This is the first time I have ever had heat issues above 80C with anything other than my 9800GX2s.

++++++ 1,000 to what Leganfuh posted earlier, as I have recently made major investments in hardware specifically for the purpose of assisting F@H research.
Last edited by shdbcamping on Thu Jan 15, 2009 2:15 pm, edited 1 time in total.
shdbcamping
 
Posts: 587
Joined: Mon Nov 10, 2008 7:57 am

Re: 57xx - Failing all the time @ all projects

Postby Xilikon » Thu Jan 15, 2009 2:14 pm

That's weird. I have 7 GPUs ranging from the 8800GS all the way to the GTX 260 Core 216. All have their shader clocks overclocked and none fail on those WUs besides the rare bad WU. They are all running core version 1.19 or 1.20 with the 178.24 driver, except for my GTX 260 Core 216, which uses the new beta Windows 7 WDDM driver.
Xilikon
 
Posts: 548
Joined: Sun Dec 02, 2007 1:34 pm

Re: 57xx - Failing all the time @ all projects

Postby MtM » Thu Jan 15, 2009 2:15 pm

And since it's not limited to one team, it's not that important imo ;)

That said, I folded quite a few 57xx WUs and had no problems at all with them. So blaming Stanford is not the thing to do. Informing them is something else, but inform without placing blame is my take on this.

The point system should reflect scientific benefit, not stress (it would be nice if they correlated, but stress is not the factor here in any way).

PG should not treat folders differently based on the amount of hardware they spend on the project. Anyone who asks for this by posting his stats and asking for help based on them is, in my opinion, going the wrong route :(

Ps.

Code:
 Project    : 5748
 Core       : GPUv2 Gromacs
 Frames     : 100
 Credit     : 511


 -- C:\Program Files (x86)\Electronic Arts\ --

 Min. time / frame    : 3mn 03s  - 2412,59 ppd
 Avg. time / frame    : 3mn 09s  - 2336,00 ppd
 No current time / frame
 No R3F. time / frame
 No eff. time / frame

 Project    : 5749
 Core       : GPUv2 Gromacs
 Frames     : 100
 Credit     : 511


 -- GSO 2 --

 Min. time / frame    : 2mn 57s  - 2494,37 ppd
 Avg. time / frame    : 3mn 12s  - 2299,50 ppd
 No current time / frame
 No R3F. time / frame
 No eff. time / frame

 Project    : 5750
 Core       : GPUv2 Gromacs
 Frames     : 100
 Credit     : 511


 -- GSO 1 --

 Min. time / frame    : 2mn 36s  - 2830,15 ppd
 Avg. time / frame    : 2mn 54s  - 2537,38 ppd
 No current time / frame
 No R3F. time / frame
 No eff. time / frame

 Project    : 5751
 Core       : GPUv2 Gromacs
 Frames     : 100
 Credit     : 511


 -- GSO 1 --

 Min. time / frame    : 2mn 59s  - 2466,50 ppd
 Avg. time / frame    : 3mn 13s  - 2287,59 ppd
 No current time / frame
 No R3F. time / frame
 No eff. time / frame


 -- GSO 2 --

 Min. time / frame    : 2mn 34s  - 2866,91 ppd
 Avg. time / frame    : 2mn 36s  - 2830,15 ppd
 Current time / frame : 2mn 36s  - 2830,15 ppd
 R3F. time / frame    : 2mn 40s  - 2759,40 ppd
 Eff. time / frame    : 2mn 45s  - 2675,78 ppd

 Project    : 5752
 Core       : GPUv2 Gromacs
 Frames     : 100
 Credit     : 511


 -- GSO 1 --

 Min. time / frame    : 2mn 35s  - 2848,41 ppd
 Avg. time / frame    : 3mn 01s  - 2439,25 ppd
 Current time / frame : 7mn 23s  - 996,62 ppd
 R3F. time / frame    : 6mn 08s  - 1199,74 ppd
 Eff. time / frame    : 3mn 36s  - 2044,00 ppd


 Project    : 5753
 Core       : GPUv2 Gromacs
 Frames     : 100
 Credit     : 511


 -- GSO 1 --

 Min. time / frame    : 2mn 35s  - 2848,41 ppd
 Avg. time / frame    : 2mn 35s  - 2848,41 ppd
 Current time / frame : 2mn 35s  - 2848,41 ppd
 R3F. time / frame    : 2mn 35s  - 2848,41 ppd
 Eff. time / frame    : 1mn 37s  - 4551,59 ppd


 -- GSO 2 --

 Min. time / frame    : 2mn 34s  - 2866,91 ppd
 Avg. time / frame    : 2mn 36s  - 2830,15 ppd
 Current time / frame : 2mn 39s  - 2776,75 ppd
 R3F. time / frame    : 2mn 38s  - 2794,33 ppd
 Eff. time / frame    : 2mn 42s  - 2725,33 ppd


 Project    : 5754
 Core       : GPUv2 Gromacs
 Frames     : 100
 Credit     : 511


 -- GSO 1 --

 Min. time / frame    : 2mn 35s  - 2848,41 ppd
 Avg. time / frame    : 2mn 35s  - 2848,41 ppd
 Current time / frame : 2mn 36s  - 2830,15 ppd
 R3F. time / frame    : 2mn 35s  - 2848,41 ppd
 Eff. time / frame    : 2mn 39s  - 2776,75 ppd


 -- GSO 2 --

 Min. time / frame    : 2mn 38s  - 2794,33 ppd
 Avg. time / frame    : 3mn 39s  - 2016,00 ppd
 Current time / frame : 2mn 49s  - 2612,45 ppd
 R3F. time / frame    : 2mn 44s  - 2692,10 ppd
 Eff. time / frame    : 2mn 41s  - 2742,26 ppd


 Project    : 5755
 Core       : GPUv2 Gromacs
 Frames     : 100
 Credit     : 511


 -- GSO 1 --

 Min. time / frame    : 2mn 58s  - 2480,36 ppd
 Avg. time / frame    : 2mn 58s  - 2480,36 ppd
 No current time / frame
 No R3F. time / frame
 No eff. time / frame

 Project    : 5756
 Core       : GPUv2 Gromacs
 Frames     : 100
 Credit     : 511


 -- GSO 1 --

 Min. time / frame    : 2mn 35s  - 2848,41 ppd
 Avg. time / frame    : 2mn 38s  - 2794,33 ppd
 Current time / frame : 2mn 38s  - 2794,33 ppd
 R3F. time / frame    : 2mn 41s  - 2742,26 ppd
 Eff. time / frame    : 3mn 52s  - 1903,03 ppd


 -- GSO 2 --

 Min. time / frame    : 2mn 33s  - 2885,65 ppd
 Avg. time / frame    : 2mn 38s  - 2794,33 ppd
 Current time / frame : 2mn 39s  - 2776,75 ppd
 R3F. time / frame    : 2mn 39s  - 2776,75 ppd
 Eff. time / frame    : 2mn 43s  - 2708,61 ppd


Etc., etc. I'm getting tired of copy-pasting, but I had no EUE on any of these (besides user error, such as letting my HDD grind to a halt with no free space :lol: ).
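
For anyone who wants to compare their own frame times against the listing above: the ppd figures there appear to follow from the 511-point credit, the 100 frames per WU and the time per frame. The sketch below is inferred from those numbers only, not taken from the monitoring tool's documentation, and the function names `frame_seconds` and `points_per_day` are made up for illustration.

```python
import re

SECONDS_PER_DAY = 86_400


def frame_seconds(text):
    """Convert a frame time written like '3mn 03s' into whole seconds."""
    match = re.fullmatch(r"(\d+)mn (\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised frame time: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


def points_per_day(credit, frames, seconds_per_frame):
    """Estimate points per day, assuming every frame takes the same time."""
    wu_seconds = frames * seconds_per_frame  # time to finish one whole WU
    return credit * SECONDS_PER_DAY / wu_seconds


if __name__ == "__main__":
    # Values taken from the listing: 511 points, 100 frames, 3mn 03s per frame.
    ppd = points_per_day(511, 100, frame_seconds("3mn 03s"))
    print(f"{ppd:.2f} ppd")
```

With the first entry in the listing (3mn 03s per frame) this comes out at about 2412.59 ppd, which matches the figure shown there, so a card that is much slower on the same project should stand out quickly.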
Last edited by MtM on Thu Jan 15, 2009 2:19 pm, edited 1 time in total.
MtM
 
Posts: 3233
Joined: Fri Jun 27, 2008 2:20 pm
Location: The Netherlands

Re: 57xx - Failing all the time @ all projects

Postby Tobit » Thu Jan 15, 2009 2:18 pm

Pande and Co. have never guaranteed that you will always have 100% rock-stable projects. Beta testing is done, but sometimes bad units slip through the cracks and we have hiccups like this. In the end, we are all "testers". If Pande could test every project before releasing it to us, he wouldn't need us, now would he? There is only so much testing they can do before having us start crunching. I am confident that Pande and Co. are working on the problem. I'm sorry your team isn't performing well because of this. However, your attitude doesn't reflect very well on your reasons for participating, as it seems you care more about points and gaining "status" than anything else. You say you "speak for all" of you. I sure hope that not everyone on your team shares your negative attitude. Given your recent sharp decline, maybe you guys have too high a concentration of NVIDIA GPUs; try adding a few more CPUs. :)
Tobit
 
Posts: 675
Joined: Thu Apr 17, 2008 2:35 pm
Location: Manchester, NH USA

Re: 57xx - Failing all the time @ all projects

Postby Xilikon » Thu Jan 15, 2009 2:20 pm

MtM is right. Also, MtM and I are on the beta team and we do QA testing without issues (rest assured that if a project is bad, we would be the first to complain, as I did in the past, saving the general folders some headaches).

If QA (which is a bunch of guys with various hardware all over the world) confirms it works well, then the problem lies more in the setup (excessive overclocking, a bad pick of driver/core version or something else causing issues for the GPU2 client). If you are complaining about temps, this means that cooling is not adequate, so you should take steps to solve that, not ask Stanford to stop releasing those WUs. Sorry to be blunt, but that's not how Stanford works, since they care about the science first.
Xilikon
 
Posts: 548
Joined: Sun Dec 02, 2007 1:34 pm

Re: 57xx - Failing all the time @ all projects

Postby MtM » Thu Jan 15, 2009 2:20 pm

Well said Tobit :)

Edit: same to you Xilikon, of course. And yes, if a range was really bad, we would notice during beta testing. As said before, I had no issue at all with 57xx WUs (I had some problems on my own side, but not related to Stanford). That's why I posted what I did: a recommendation not to place blame on PG, as I am quite sure there is no blame to place there. There might be issues; mixed boards are one thing that needs investigation, as problems occur more often there than with single-type GPU folders. But don't say the WU range is bad when you're the minority experiencing troubles.

If you want action to be taken, give them more information to take action on :)
Last edited by MtM on Thu Jan 15, 2009 2:30 pm, edited 1 time in total.
MtM
 
Posts: 3233
Joined: Fri Jun 27, 2008 2:20 pm
Location: The Netherlands

Re: 57xx - Failing all the time @ all projects

Postby shdbcamping » Thu Jan 15, 2009 2:27 pm

Xilikon wrote: That's weird. I have 7 GPUs ranging from the 8800GS all the way to the GTX 260 Core 216. All have their shader clocks overclocked and none fail on those WUs besides the rare bad WU. They are all running core version 1.19 or 1.20 with the 178.24 driver, except for my GTX 260 Core 216, which uses the new beta Windows 7 WDDM driver.

I agree that the EUE issue varies a lot, as I have not had EUEs on the 57XX WUs on any of my cards so far... with the exception of cards shutting down for heat.
I have the latest 181.20? (I think) on an 8800GT 512, and 180.70 on the GX2s and another 8800GT 512. Running Vista32, Vista64 and XP32. My cards' cores are underclocked ~10% and no shader OC is above ~10%. Memory clocks are stock.

My GX2's had been running 76C max and my 8800GT's 73C max (usually much less).
shdbcamping
 
Posts: 587
Joined: Mon Nov 10, 2008 7:57 am

Re: 57xx - Failing all the time @ all projects

Postby pmsfh2008 » Thu Jan 15, 2009 2:37 pm

I too saw this starting on 1/13/09 with 5765+ WUs.
These are new 678-atom projects.
Looks like too I am getting multiple _01, 02, ... files in the work folder
where otherwise they would be for the same _0x etc.
Gonna put the 178.08 drivers back on for a try.
Let's try to get more info and advice.
pmsfh2008
 
Posts: 75
Joined: Sat Jul 26, 2008 12:17 am
Location: Bradenton, FL

Re: 57xx - Failing all the time @ all projects

Postby MtM » Thu Jan 15, 2009 2:47 pm

Reading back in MoneyGuyBK's thread, I'm thinking more of an issue with multi-GPU folding than with the WUs on their own. All the logs I saw there had a -gpu 2 or higher number.

pmsfh2008, that sounds strange! Why are you saying 'looks like to'? I haven't heard of this before in connection with the 57xx problems reported here.

You always have _0x in your work folder, the x stands for the queue slot the wu occupies?
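
To check which queue slots the files in a work folder belong to, something like the sketch below could help. It only assumes what is said above, that file names carry a two-digit `_0x` slot suffix; the helper name `group_by_slot` and the example file names are invented for illustration and are not taken from the client.

```python
import re
from collections import defaultdict

# Two digits after an underscore, followed by a dot or the end of the name.
SLOT_PATTERN = re.compile(r"_(\d{2})(?:\.|$)")


def group_by_slot(filenames):
    """Group file names by the two-digit slot number found in each name."""
    slots = defaultdict(list)
    for name in filenames:
        match = SLOT_PATTERN.search(name)
        if match:  # names without a slot suffix are ignored
            slots[match.group(1)].append(name)
    return dict(slots)


if __name__ == "__main__":
    # Hypothetical file names, used only to show the grouping.
    example = ["wudata_01.dat", "wudata_01.log", "wudata_02.dat", "notes.txt"]
    for slot, names in sorted(group_by_slot(example).items()):
        print(slot, names)
```

If files for several slots show up at once, that would point to several WUs sitting in the queue rather than one WU spread over several files.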
MtM
 
Posts: 3233
Joined: Fri Jun 27, 2008 2:20 pm
Location: The Netherlands
