P10101

Moderators: slegrand, Site Moderators, PandeGroup

Re: P10101

Postby Slash_2CPU » Sun Jan 03, 2010 2:03 am

dschief wrote:the 285 & the 9800GT are way more powerful than an 8400 GS,

I'm getting clean runs with 9800GTX+
the 9800 has 512 mem & 128 processors

the 8400 has 256 mem & 16 processors.

P10101 is about 1/3 larger than other GPU WUs: 1254 vs. 973 atoms

it sounds like a case of TOO much wu & TOO little card


Actually, the 9800 has 256MB, and the 8400GS has 512MB. I don't think 1100PPD is bad really.

Do you have a solution, or can you add anything besides saying "card is too small"? I doubt that it is even relevant here, since another user sees the same error on a different card and the app does not really seem to care whether it has 8, 16, or 240 cores to run on.
Xeon E5 2687W @ 3.29GHz
2x GTX560 Ti @1840 shaders
i7 950 @ 3.75GHz DDR3-1620
Dual Xeon E5472 @ 8x3.3GHz
Phenom II X6 1055T @ 3.72 GHZ
GT 440 @ 1600 shaders
1.32KW
Slash_2CPU
 
Posts: 273
Joined: Sat Apr 19, 2008 5:15 pm

Re: P10101

Postby VijayPande » Sun Jan 03, 2010 3:35 am

With the discussion here, it looks like an important time to force a core upgrade. We don't like to do this in general since we like to give donors some choices, but this seems like an important issue now and v1.31 has been very thoroughly tested. See this blog post for a few more details:
http://folding.typepad.com/news/2010/01 ... -2010.html
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
VijayPande
Pande Group Member
 
Posts: 2723
Joined: Fri Nov 30, 2007 6:25 am
Location: Stanford

Re: P10101

Postby ElectricVehicle » Sun Jan 03, 2010 7:16 am

I like some things about the v1.31 core. I'm not allowed to talk about power or heat in this thread, so we all lose the value that information adds.

I have noticed all my GPUs slowing down randomly after several days running at full speed under the Client 6.23, Core v1.31. I have not been able to isolate the cause yet, though I'm working on it. Restrictions in this thread have slowed my progress on the issue.

I was previously Client 6.20, Core v1.19.

The rest of my environment is Windows 7 x64, CUDA 2.3, GTX 295 + 9600 GSO.

I like a lot of things about the v1.31 core; it seems to make more use of the GPUs, which is generally a good thing. I'm seeing some other issues that are new and complex: the simultaneous slowdown of all the GPUs, and some other WU behavior that is related to the upgrade, though I have yet to isolate which part of the upgrade is responsible and why.

I'm also not in the same position to evaluate v1.31 vs. v1.19. Even if my issues are in any way related to v1.31, it may be a big enough improvement for the majority of users that it's worth making a mandatory upgrade.

So take this for what it's worth: I'm seeing behavior I can't explain that has some relationship to Client 6.23 and core v1.31. They may not be the cause; I just can't isolate it yet. More as I figure it out! So I guess go for it unless you've seen several others reporting some of the slowdowns I'm seeing. (Core priority was Slightly Higher when I last checked. This computer now has mixed shader counts, but I saw some similar issues before I added the 9600 GSO, when I only had the GTX 295, which is dual GPUs with the same shader counts.)
ElectricVehicle
 
Posts: 304
Joined: Fri Feb 01, 2008 6:41 pm

Re: P10101

Postby davidcoton » Sun Jan 03, 2010 9:17 am

Have now started a memory test (memtestG80) on my 8400GS. So far, after 8.5 hours,

Test iteration 453 (GPU 0, 208 MiB): 96 errors so far

Prof Pande, please note that currently I am only getting 10101 units; no others are being served to me. Therefore, until there is a solution (core or server upgrade, or funding to replace my GPU), I cannot contribute to GPU folding.

Regards,

David
davidcoton
 
Posts: 365
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: P10101

Postby Hyperlife » Sun Jan 03, 2010 6:15 pm

davidcoton wrote:Have now started a memory test (memtestG80) on my 8400GS. So far, after 8.5 hours,

Test iteration 453 (GPU 0, 208 MiB): 96 errors so far

Prof Pande, please note that currently I am only getting 10101 units; no others are being served to me. Therefore, until there is a solution (core or server upgrade, or funding to replace my GPU), I cannot contribute to GPU folding.

If you're getting any errors on the card, then the card is the problem. Contact your manufacturer to begin the RMA process if it's still under warranty.
Hyperlife
 
Posts: 439
Joined: Sun Dec 02, 2007 7:38 am

Re: P10101

Postby Sahkuhnder » Sun Jan 03, 2010 7:02 pm

davidcoton wrote:Have now started a memory test (memtestG80) on my 8400GS. So far, after 8.5 hours,

Test iteration 453 (GPU 0, 208 MiB): 96 errors so far

Prof Pande, please note that currently I am only getting 10101 units; no others are being served to me. Therefore, until there is a solution (core or server upgrade, or funding to replace my GPU), I cannot contribute to GPU folding.


It appears that Dr. Pande is aware of the issue. Link

After running MemtestG80 on roughly 20,000 hosts on the Folding@home network, they found that the majority of consumer-grade cards demonstrated what they called a "non-negligible, pattern-sensitive rate of memory soft errors."
Sahkuhnder
 
Posts: 215
Joined: Sun Dec 02, 2007 5:28 am
Location: Vegas Baby! Yeah!

Re: P10101

Postby davidcoton » Sun Jan 03, 2010 9:19 pm

After over 17 hours testing, here's the final memtestG80 result:

Final error count after 1000 iterations over 208 MiB of GPU memory: 128 errors


I don't think that accounts for every 10101 unit failing; also, the card is now 6% into a 5769 unit. Maybe the results are unreliable, but the link from Sahkuhnder above suggests that this could be a much more general problem.
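
For reference, the two memtestG80 readings quoted in this thread can be turned into a rough errors-per-iteration figure. This is only a minimal Python sketch of that arithmetic: the function name is hypothetical, the numbers are the ones posted above, and it says nothing about which memtestG80 test produced the errors.

```python
def errors_per_iteration(errors: int, iterations: int) -> float:
    """Average number of memory errors reported per test iteration."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return errors / iterations


# Figures quoted in this thread (8400GS, 208 MiB tested).
interim = (96, 453)    # (errors, iterations) after about 8.5 hours
final = (128, 1000)    # (errors, iterations) at the end of the run

interim_rate = errors_per_iteration(*interim)
final_rate = errors_per_iteration(*final)
# Errors that appeared between the interim reading and the end of the run.
late_rate = errors_per_iteration(final[0] - interim[0], final[1] - interim[1])

print(f"first 453 iterations: {interim_rate:.3f} errors/iteration")
print(f"last 547 iterations:  {late_rate:.3f} errors/iteration")
print(f"whole run:            {final_rate:.3f} errors/iteration")
```

The errors are not spread evenly across the run, which would be consistent with an intermittent fault, although two readings are far too few to conclude anything.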

Overall, then, I'm left with several questions:

- Does the processing of 10101 units require a particular memory allocation that is beyond the capabilities of an 8400 card? The 208MB allocation used in my tests is within 8MB of the maximum I could allocate in memtestG80.
- Are results from 8400 and similar "low end" cards reliable? Maybe comparing multiple results on different cards is enough to validate them. If not, should we give up and restrict GPU folding to the "high end" cards?
- If results from 8400 cards are worthwhile, how can we prevent certain units being sent to cards that cannot process them? In particular, how do we stop this happening repeatedly so that a card is doing nothing useful for 2-3 days?

Regards,

David
davidcoton
 
Posts: 365
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: P10101

Postby toTOW » Sun Jan 03, 2010 11:03 pm

These units are bigger, in terms of atoms, than what we're used to seeing. This means that if the whole WU can't stay in VRAM, it'll add a lot of stress to the memory controller and memory chips, which increases the risk of triggering an error.

Can you tell us in which test you get these errors from MemtestG80? (Check the description of MemtestG80 I posted here to interpret the errors: http://en.fah-addict.net/web/web-4-2.php )
Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.

FAH-Addict : latest news, tests and reviews about Folding@Home project.

toTOW
Site Moderator
 
Posts: 7999
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: P10101

Postby davidcoton » Sun Jan 03, 2010 11:08 pm

Unfortunately not. The errors occurred while I wasn't looking, and scrolled out of view before I was aware of them. The summary at the end of the run does not give any detail, only a total.

BTW, the program failed to connect to Stanford to transmit the results -- I don't know if that was a transient problem here, or something at Stanford, or in between.

David
davidcoton
 
Posts: 365
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: P10101

Postby Buck Nasty » Mon Jan 04, 2010 1:52 am

ElectricVehicle wrote:I like some things about the v1.31 core. I'm not allowed to talk about power or heat in this thread, so we all lose the value that information adds.

I have noticed all my GPUs slowing down randomly after several days running at full speed under the Client 6.23, Core v1.31. I have not been able to isolate the cause yet, though I'm working on it. Restrictions in this thread have slowed my progress on the issue.

I was previously Client 6.20, Core v1.19.

The rest of my environment is Windows 7 x64, CUDA 2.3, GTX 295 + 9600 GSO.

I like a lot of things about the v1.31 core; it seems to make more use of the GPUs, which is generally a good thing. I'm seeing some other issues that are new and complex: the simultaneous slowdown of all the GPUs, and some other WU behavior that is related to the upgrade, though I have yet to isolate which part of the upgrade is responsible and why.

I'm also not in the same position to evaluate v1.31 vs. v1.19. Even if my issues are in any way related to v1.31, it may be a big enough improvement for the majority of users that it's worth making a mandatory upgrade.

So take this for what it's worth: I'm seeing behavior I can't explain that has some relationship to Client 6.23 and core v1.31. They may not be the cause; I just can't isolate it yet. More as I figure it out! So I guess go for it unless you've seen several others reporting some of the slowdowns I'm seeing. (Core priority was Slightly Higher when I last checked. This computer now has mixed shader counts, but I saw some similar issues before I added the 9600 GSO, when I only had the GTX 295, which is dual GPUs with the same shader counts.)


If you are using priority/affinity software (e.g. Prifinity 2), the new cores that are downloaded may not be recognized as a favorite setting. They will be set to default and may be locked to a single core, thus slowing production. I have had this happen several times in the past with core upgrades. Also, if you are using Windows Task Manager, the settings reset after every work unit.
Buck Nasty
 
Posts: 23
Joined: Sat Nov 29, 2008 11:06 pm

Re: P10101

Postby bruce » Mon Jan 04, 2010 2:01 am

Buck Nasty wrote:They will be set to default and may be locked to a single core, thus slowing production. I have had this happen several times in the past with core upgrades. Also, if you are using Windows Task Manager, the settings reset after every work unit.


Locking to a single core may slow production, or it may improve it, depending on your system. In any case, there is a setting in the client configuration asking whether you want to lock the client affinity. What setting did you use?

If that setting is disabled, the FahCore naturally inherits the affinity settings from the client whenever a new copy is started (with each new WU). If you choose to do this with Windows Task Manager, you should set Affinity for both the core and the client. Then when the next WU starts, the core will inherit the affinity that you want.

This applies to all FahCores and all recent clients and has nothing to do with Project 10101 or with version 1.31 of the FahCore.
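
As background for the affinity settings discussed above: Windows expresses process affinity as a bitmask with one bit per logical processor (bit 0 is CPU 0), and Task Manager's checkboxes edit that mask. The following is a minimal Python sketch of the encoding only. The helper names are hypothetical, and nothing here reads or changes the affinity of the FAH client or a FahCore.

```python
def affinity_mask(cpus):
    """Return the bitmask with bit i set for each logical CPU index i in cpus."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU indices must be non-negative")
        mask |= 1 << cpu
    return mask


def cpus_from_mask(mask):
    """Return the sorted list of logical CPU indices whose bits are set in mask."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


# A process locked to a single core (CPU 0) versus one allowed on four cores.
single_core = affinity_mask([0])
four_cores = affinity_mask([0, 1, 2, 3])

print(hex(single_core), cpus_from_mask(single_core))
print(hex(four_cores), cpus_from_mask(four_cores))
```

A mask with only one bit set corresponds to the "locked to a single core" case mentioned in this thread.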
bruce
Site Admin
 
Posts: 16837
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: P10101

Postby noprob » Mon Jan 04, 2010 6:06 am

I have been using the 1.31 core since it first came out.

I run 3 88xx nVidia cards at stock freq. on XP Pro SP3 with no issues concerning this WU other than a slight increase in temperature on one of my 88xx cards.
noprob
 
Posts: 73
Joined: Sun Mar 09, 2008 2:48 am
Location: mountains of West Virginia U.S.of A.

Re: P10101

Postby domboy » Mon Jan 04, 2010 2:28 pm

Tobit wrote:
Slash_2CPU wrote:Instantly unstable on stock clocked G98 8400GS.

I'm not surprised but am surprised that someone would run -advmethods on something like an 8400GS in the first place. :eo


I can also confirm that my lone nVidia GPU (220GT) is getting P10101 WUs without the -advmethods flag set. And it folds them just fine...
Just curious, was Project 4744 an nVidia-only project? I recall seeing 548-point WUs on ATI GPU clients a while back, but I don't remember what project those were. I'd like to see something like this P10101 on the ATI side too...
domboy
 
Posts: 138
Joined: Thu Oct 02, 2008 1:42 pm
Location: Wilmington NC

Re: P10101

Postby John Naylor » Mon Jan 04, 2010 2:39 pm

Project 4744 was an ATI project... presumably it is not too difficult to change the core used to process GPU units from one manufacturer to the other, though I am just guessing at that :wink:

Project 4744 Description

EDIT: just read the post below me.... d'oh!
Last edited by John Naylor on Mon Jan 04, 2010 2:51 pm, edited 2 times in total.
Folding whatever I'm sent since March 2006 :) Beta testing since October 2006. www.FAH-Addict.net Administrator since August 2009.
John Naylor
 
Posts: 1039
Joined: Mon Dec 03, 2007 4:36 pm
Location: University of Birmingham, UK

Re: P10101

Postby Tobit » Mon Jan 04, 2010 2:39 pm

domboy wrote:Just curious, was Project 4744 an nVidia-only project?

No, p4744 used to be on ATI. However, there are a lot fewer ATI GPUs running, so, to finish a project, they might have converted some ATI units over to nVidia. I'm not sure if this is the case with p10101 or not, but it might be. Or it could be that these have a high atom count and it would be faster to get them back by switching to nVidia. Remember, every project is benchmarked on an ATI card, so any work unit can run on both kinds of hardware; it just depends on how the work unit and core are configured. If I am wrong on any of this, someone correct me please. :)
Tobit
 
Posts: 743
Joined: Thu Apr 17, 2008 2:35 pm
Location: Manchester, NH USA
