Rash of bad WU all 13000/13001

Moderators: Site Moderators, PandeGroup

Re: Bad WU: 13001 (Run 532, Clone 3, Gen 4)

Postby pinetor » Fri Jun 06, 2014 1:57 am

As to the CPU threads: CPU folding gives very little return compared with the GPU, plus there's the fan noise, etc.

As to GPU overclocking: none. I can't see overclocking a GPU and then running it 24/7. This is my one and only rig, so no super-duper juice.
pinetor
 
Posts: 13
Joined: Fri Jun 06, 2014 12:10 am

Re: Rash of bad WU all 13000/13001

Postby pinetor » Fri Jun 06, 2014 2:05 am

The first time, I can't recall which WU it was. The WU was stuck at 99.9% and I had to reboot.
I then deleted the work folder, and the client picked up another very short WU (9xxx), which ran to completion.
It then picked up another 13000 WU. This locked up the GPU.
I rebooted and let the WU reload... again today the GPU locked up.
I had to hard boot yet again, and the WU reloaded at 80%.
I have yet to complete any 13000/13001 WU.
Currently I have all work paused, as there is no use in running to less than 100%. Plus I am not here when the GPU locks up, so I am not sure what may be happening. I do run GPU-Z. The CPU is water cooled but, as noted, not doing anything.

I will be glad to download Catalyst 14.4 if you think that will get me back to folding.

Re: Bad WU: 13000 (Run 952, Clone 0, Gen 22)

Postby PantherX » Fri Jun 06, 2014 2:14 am

Welcome to the F@H Forum pinetor,

Please note that if you don't want to fold on the CPU, simply remove the CPU Slot by following these instructions:
1) Open up Advanced Control (AKA FAHControl)
2) Click Configure
3) Select the Slots Tab
4) Select the appropriate Slot
5) Click Remove
6) Click Save

Thus, you will only have a single GPU Slot which you can fold on.

Moreover, please refrain from manually deleting the work folder since it delays the progress of science.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time

Welcome To The F@H Support Forum Ӂ Chrome Folding App (Beta) Ӂ Troubleshooting "Bad WUs" Ӂ Troubleshooting Server Connectivity Issues
PantherX
Site Moderator
 
Posts: 6321
Joined: Wed Dec 23, 2009 9:33 am

Re: Rash of bad WU all 13000/13001

Postby PantherX » Fri Jun 06, 2014 2:24 am

pinetor wrote:The first time, I can't recall which WU it was. The WU was stuck at 99.9% and I had to reboot...

In the majority of cases, this is caused by the OS reloading the driver. Can you search the Windows Event Log for messages related to driver reloading? Also, while you have stated that the GPU isn't overclocked, is it factory overclocked by chance? If so, the factory overclock may be unstable, so "down-clock" the card to AMD's stock frequencies.
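If you export the System log entries (for example, from Event Viewer) and want to pick out the driver reloads, a minimal Python sketch could look like the one below. The `EventRecord` shape and the sample entries are hypothetical; the filter assumes Windows records a recovered display-driver timeout as Event ID 4101 from the "Display" source, which is the usual case but worth checking against your own log.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical record shape: assumes System-log entries were exported
# and loaded into these simple records.
@dataclass
class EventRecord:
    timestamp: datetime
    source: str
    event_id: int
    message: str


# Assumption: a display-driver timeout recovery is logged as
# Event ID 4101 from the "Display" source.
TDR_SOURCE = "Display"
TDR_EVENT_ID = 4101


def driver_resets(records):
    """Return the timestamps of display-driver recovery events, oldest first."""
    return sorted(
        r.timestamp
        for r in records
        if r.source == TDR_SOURCE and r.event_id == TDR_EVENT_ID
    )


if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = [
        EventRecord(datetime(2014, 6, 5, 14, 2), "Display", 4101,
                    "Display driver stopped responding and has successfully recovered."),
        EventRecord(datetime(2014, 6, 5, 14, 3), "Service Control Manager", 7036,
                    "A service entered the running state."),
    ]
    for ts in driver_resets(sample):
        print("driver reset at", ts.isoformat())
```

Resets clustered around the times the GPU locked up would point toward an OS-level driver reload rather than a problem with a particular WU.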

pinetor wrote:...It then picked up another 13000 WU. This locked up the GPU.
I rebooted and let the WU reload... again today the GPU locked up.
I had to hard boot yet again, and the WU reloaded at 80%.
I have yet to complete any 13000/13001 WU.
Currently I have all work paused, as there is no use in running to less than 100%. Plus I am not here when the GPU locks up, so I am not sure what may be happening. I do run GPU-Z. The CPU is water cooled but, as noted, not doing anything...

Could you please explain what you mean by "locked up the GPU"? If you mean that the cursor moves very slowly across the screen, that is called screen lag. Screen lag can be caused by a combination of drivers and GPU model (among other factors), because the GPU works in a First In First Out (FIFO) manner and has no priority/scheduling system like the CPU's to manage tasks. If you encounter screen lag, the best solution is to configure the GPU to fold only when the system is idle (http://folding.stanford.edu/home/faq/fa ... ion/#ntoc3).
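To make the FIFO point concrete, here is a toy Python model. It is an illustration, not how the AMD driver actually schedules work: jobs run strictly in submission order on a single engine, so a tiny desktop redraw submitted just after a long folding kernel has to wait for the whole kernel. The job names and millisecond durations are made up.

```python
from collections import deque


def fifo_finish_times(jobs):
    """Toy model: run jobs strictly in submission order on one engine.

    Each job is (name, submit_time_ms, duration_ms). Returns a dict of
    name -> finish time. There is no preemption or priority: a job cannot
    start until every job submitted before it has finished.
    """
    queue = deque(sorted(jobs, key=lambda j: j[1]))
    clock = 0
    finished = {}
    while queue:
        name, submit, duration = queue.popleft()
        start = max(clock, submit)  # wait for both the engine and the submission
        clock = start + duration
        finished[name] = clock
    return finished


if __name__ == "__main__":
    jobs = [
        ("folding kernel", 0, 200),  # long compute batch submitted first
        ("cursor redraw", 5, 1),     # tiny UI job submitted right after
    ]
    print(fifo_finish_times(jobs))  # the 1 ms redraw finishes at 201 ms, not 6 ms
```

A CPU scheduler could preempt the long job to run the redraw almost immediately; the model deliberately has no such mechanism, which is why folding only while idle avoids the lag.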

pinetor wrote:...I will be glad to download Catalyst 14.4 if you think that will get me back to folding.

Donors have reported that 14.4 WHQL improves folding performance over the 13.X WHQL driver series, so you can try it out.

Re: Rash of bad WU all 13000/13001

Postby pinetor » Fri Jun 06, 2014 4:01 am

Thanks for all the assistance.
I have updated Catalyst (complete package) to the latest version.
I have also manually overridden the GPU fan speed to 55%. Generally it never gets above 40%, even after long hours at 98% load.

What I mean by GPU lock-up is that the screen shows a cyan "checkerboard pattern" and the entire system is non-responsive. I have seen a GPU go bad (during mining) and the situation seems similar, so I conclude the GPU is locked up. The CPU is doing very little (13% load) and memory use is below 3GB (2.75GB) out of 8GB, so I don't think either of those subsystems is to blame. I was able to fold about 560k points before any problems popped up (I know that's not impressive, but it represents at least two weeks of error-free folding).

Re: Bad WU: 13000 (Run 952, Clone 0, Gen 22)

Postby pinetor » Fri Jun 06, 2014 4:08 am

Thank you for the welcome!

It certainly would not be my first thought to delete the work folder. But at the time, the WU was stuck at 99% and my GPU (the slot assigned to it) was at 0% load. I let it sit that way for several hours (while browsing the forums). After a few more reboots, I gave up and deleted the folder. This DID get me a non-13000 WU, which ran to completion. However, the next WU and every one since then (on the GPU) have been 13000/13001 WUs.

Re: Bad WU: 13001 (Run 486, Clone 5, Gen 15)

Postby bruce » Fri Jun 06, 2014 4:22 am

P5-133XL wrote:Hi pinetor (team 224497),
Your WU (P13001 R486 C5 G15) was added to the stats database on 2014-06-04 22:03:33 for 13869.6 points of credit.


Partial credit, in spite of the error. Presumably the WU has been reassigned to see if somebody else can complete it but we'll have to wait until they have time to process it before we can report the final status.
bruce
 
Posts: 22698
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Bad WU: 13000 (Run 952, Clone 0, Gen 22)

Postby bruce » Fri Jun 06, 2014 4:36 am

Having the GPU at 0% while the WU appears to be stuck at 99% has been reported many times. There are several possible causes, especially overclocking or overheating or MSRemoteDesktop or a Sleep state, all of which can reset the GPU, thereby stopping all progress. Unfortunately, the estimated progress continues to increase from whatever point the reset happened ... until it reaches 99% in the GUI ... although the log stops reporting progress.

If you discover that the progress indications in the log stop and become unsynchronized with the GUI, you can manually recover by doing a Pause, followed by a Fold.
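The log check described above can be automated roughly as follows. This Python sketch assumes FAHClient v7-style progress lines ("HH:MM:SS:WUxx:FSxx:0x17:Completed N out of M steps (P%)"); the regex, the 30-minute threshold, and the function names are assumptions to adapt to your own log. It only flags a WU whose logged progress has stopped; the recovery is still the manual Pause followed by Fold.

```python
import re
from datetime import date, datetime, time, timedelta

# Assumed progress-line shape (modeled on FAHClient v7 logs); adjust if yours differs:
#   02:41:32:WU01:FS01:0x17:Completed 2300000 out of 2500000 steps (92%)
PROGRESS = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):(WU\d+):\S+:Completed (\d+) out of (\d+) steps"
)


def last_progress(lines):
    """Map each WU id to (time_of_day, steps_done, steps_total) from its last progress line."""
    latest = {}
    for line in lines:
        m = PROGRESS.match(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S").time()
            latest[m.group(2)] = (stamp, int(m.group(3)), int(m.group(4)))
    return latest


def minutes_since(last, now):
    """Minutes between two times of day, assuming less than 24 hours have passed."""
    day = date(2000, 1, 1)  # arbitrary date; only the time-of-day difference matters
    delta = datetime.combine(day, now) - datetime.combine(day, last)
    if delta < timedelta(0):
        delta += timedelta(days=1)  # the log rolled past midnight
    return delta.total_seconds() / 60


def stalled_units(lines, now, threshold_min=30):
    """Return unfinished WU ids whose last logged progress is older than threshold_min."""
    return sorted(
        wu
        for wu, (stamp, done, total) in last_progress(lines).items()
        if done < total and minutes_since(stamp, now) > threshold_min
    )


if __name__ == "__main__":
    log = [
        "02:00:00:WU01:FS01:0x17:Completed 2000000 out of 2500000 steps (80%)",
        "02:05:00:WU01:FS01:0x17:Completed 2025000 out of 2500000 steps (81%)",
    ]
    print(stalled_units(log, time(3, 0)))  # ['WU01']: no progress logged for 55 minutes
```

The GUI estimate keeps climbing after a reset, so the sketch deliberately trusts only the log lines.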

You'll need to narrow the possible causes down to the one that made the OS decide to reset the GPU, and then prevent it from causing another reset.

I see you have reported several WUs with the message "Bad State detected... attempting to resume from last good checkpoint". Other people are not having that problem, although we'll have to wait to confirm that others have successfully completed the same WUs. I have a strong hunch that those messages indicate the same problem as what I've called an OS-initiated GPU reset. There's a good chance that FAH puts heavier computational demands on the GPU than SHA64 does, so GPUs which appeared to be stable are now demonstrably unstable.

Re: Rash of bad WU all 13000/13001

Postby bruce » Fri Jun 06, 2014 5:05 am

Projects 13000/13001 put heavier demands on your system than many other applications. FAH is very good at uncovering systems which have appeared stable until now but which, in fact, are marginal under high load.

See also the answer I provided in one of your other topics.
viewtopic.php?f=19&t=26436&p=265705#p265705

Re: Rash of bad WU all 13000/13001

Postby PantherX » Fri Jun 06, 2014 5:08 am

pinetor wrote:...What I mean by GPU lock-up is that the screen shows a cyan "checkerboard pattern" and the entire system is non-responsive. I have seen a GPU go bad (during mining) and the situation seems similar, so I conclude the GPU is locked up. The CPU is doing very little (13% load) and memory use is below 3GB (2.75GB) out of 8GB, so I don't think either of those subsystems is to blame. I was able to fold about 560k points before any problems popped up (I know that's not impressive, but it represents at least two weeks of error-free folding).

It sounds like a VRAM issue. Maybe your GPU is failing or encountering some serious hardware issue. As a test, can you run some GPU benchmarks (http://www.techpowerup.com/downloads/Benchmarking/) and see if you spot any visual artifacts (http://www.playtool.com/pages/artifacts/artifacts.html) or if the system locks-up/crashes?

Re: Bad WU: 13000 (Run 952, Clone 0, Gen 22)

Postby pinetor » Fri Jun 06, 2014 11:36 pm

Bruce (bruce)
Again, thanks for your patience with me. I suspect you are correct that the 13000 projects are taxing the GPU more than any other task I've set it. I actually have this WU back and have been running it since I posted all this last night... and we have made it through the day!!! I am at 78% completion. The GPU is loaded at 98 to 99% and holding at 50 to 51C (according to GPU-Z). Either the new drivers or the fan at a constant 55% seems to be doing the trick.
I can rule out:
any OC ( CPU, RAM, or GPU)
Remote desktop
Sleep

However, I do have lots of the typical processes that might decide to break things: Synapse updates, Java, AMD, QNAP.

PS: I never did SHA64; too late to get into BTC. I did Scrypt, running up to 3 GPUs, but I sold them all except the lowly 7850.

Still, I think the load is the issue: either the driver could not handle it or there just wasn't enough cooling. I can bump the GPU fans up to 60%, but it's just too noisy.

Thanks again!

Re: Rash of bad WU all 13000/13001

Postby pinetor » Fri Jun 06, 2014 11:38 pm

and stop my folding???? (grins)

so far so good today... fingers crossed.(new drivers/manual fan control)

Re: Bad WU: 13001 (Run 532, Clone 3, Gen 4)

Postby 7im » Sat Jun 07, 2014 12:41 am

Depending on the model of your HD 7800, a lot of GPUs come factory overclocked these days. It's easy to miss unless looking for it.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
7im
 
Posts: 14648
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: Bad WU: 13000 (Run 952, Clone 0, Gen 22)

Postby bruce » Sat Jun 07, 2014 7:10 pm

You've reported several WUs that crashed in separate topics (which makes sense if the problem is associated with a specific WU). You have also opened a general topic about multiple p13000/13001 WUs. I'm going to merge them into a single topic, on the theory that they're all caused by the same sort of problem with your GPU hardware. The posts will be in chronological order, so they may appear intermixed, but at least all the potentially applicable answers will be in one place.

Re: Rash of bad WU all 13000/13001

Postby bollix47 » Wed Jul 09, 2014 9:39 am

Another folder was able to complete the WU successfully:

Hi xxxx (team xxxx),
Your WU (P13001 R486 C5 G15) was added to the stats database on 2014-06-24 04:03:47 for 17123 points of credit.
bollix47
 
Posts: 3499
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada
