Failing units, low ppd, and returned units.

If you're new to FAH and need help getting started or you have very basic questions, start here.


Scarlet-Tech
Posts: 37
Joined: Tue Nov 10, 2015 9:54 pm

Re: Failing units, low ppd, and returned units.

Post by Scarlet-Tech »

It would take me a while to look back. I wonder what the 96xx projects are researching. If they are specific to, say, Alzheimer's, maybe we can set our clients to fold for something else so that they won't pull the projects that fail. Let me look at the project list and see what it returns; it may take a bit on my phone.

9637 Disease type: unspecified
9629 Disease type: unspecified

B, can you set your client to fold for cancer specifically and see if it picks those up?

It is somewhere in the advanced settings menu; I can't remember exactly where. I will suggest this on our forums as well.
mmonnin
Posts: 324
Joined: Wed Dec 05, 2007 1:27 am

Re: Failing units, low ppd, and returned units.

Post by mmonnin »

Scarlet, have you made it home to check your logs for errors yet? I suggest HFM plus Dropbox so you can access the logs while away.

My own GTX 970 sometimes gets Core 18 WUs at around 1:30 TPF and at other times gets Core 21 WUs at around 4:00 TPF. That could easily account for the change in WU count. There is also a sizable PPD difference between the WU types.
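To show how much TPF alone can move the WU count, here is a rough Python sketch. It assumes a WU is 100 frames (each frame is 1% of the WU, which matches the 20000-step increments of the 2000000-step Core 21 WUs logged in this thread) and ignores download/upload time. PPD is deliberately not computed, since it depends on each project's base credit and bonus.

```python
def wu_hours(tpf_seconds: float, frames: int = 100) -> float:
    """Hours to finish one WU at a given time per frame (TPF).

    Assumes `frames` frames per WU (progress is reported in 1% steps,
    so 100 by default) and ignores download/upload overhead.
    """
    return tpf_seconds * frames / 3600.0


def wus_per_day(tpf_seconds: float, frames: int = 100) -> float:
    """WUs completed per 24 hours of continuous folding on one slot."""
    return 24.0 / wu_hours(tpf_seconds, frames)


if __name__ == "__main__":
    # TPF values quoted above for a GTX 970: Core 18 at 1:30, Core 21 at 4:00.
    for label, tpf in (("Core 18, 1:30 TPF", 90), ("Core 21, 4:00 TPF", 240)):
        print(f"{label}: {wu_hours(tpf):.1f} h per WU, {wus_per_day(tpf):.1f} WUs/day")
```

Under these assumptions a 1:30 TPF WU takes 2.5 hours (about 9.6 WUs per day), while a 4:00 TPF WU takes about 6.7 hours (3.6 WUs per day), so a card switching between the two cores will show a very different daily WU count without anything being wrong.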
Scarlet-Tech
Posts: 37
Joined: Tue Nov 10, 2015 9:54 pm

Re: Failing units, low ppd, and returned units.

Post by Scarlet-Tech »

mmonnin wrote:Scarlet, have you made it home to check your logs for errors yet? I suggest HFM + dropbox to access the logs while away.

My own GTX970 sometimes gets Core 18 WUs that are like 1:30 TPF and at other times gets Core 21 WUs that are 4m TPF. That could easily account for the WU count change. There is a pretty good difference in PPD as well between the WU types.
As frustrating as it is, I am still in Arizona. I get home Saturday night and will set up HFM, and I'll keep Dropbox in mind as well.

I am going to try to set up a laptop with a remote connection so that, from now on, I can control my PC while I'm away. I didn't have the money right away, but I will right after I get home.
Ricky
Posts: 483
Joined: Sat Aug 01, 2015 1:34 am
Hardware configuration: 1. Two E5-2630 V3 processors, 64 GB RAM, GTX980SC GPU, and GTX980 GPU running on Windows 8.1.
2. I7-6950X V3 processor, 32 GB RAM, 1 GTX980tiFTW, and 2 GTX1080FTW GPUs running on Windows 8.1.
Location: New Mexico

Re: Failing units, low ppd, and returned units.

Post by Ricky »

I don't believe I have had any 96xx projects run on my system. I have noticed that almost all of my issues were on a factory-overclocked GTX 960. The GTX 980 I fold on is a bottom-of-the-line model, and it has had no issues that I can recall.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Failing units, low ppd, and returned units.

Post by bruce »

It's my impression that
(1) Overclocking, even factory overclocking, increases the error rate for the projects we're talking about.
(2) Maxwell GPUs seem to be more prone to these errors. FAH has temporarily reduced assignments of some projects to Maxwell unless you use a client-type flag. I suppose that's why you're not getting 96xx projects.
(3) Development is working toward better solutions.
bcavnaugh
Posts: 147
Joined: Tue Apr 30, 2013 1:39 pm

Re: Failing units, low ppd, and returned units.

Post by bcavnaugh »

Third BAD_WORK_UNIT Core 21 project:9631 GTX 980HC

11:51:20:WU01:FS04:0x21:Completed 880000 out of 2000000 steps (44%)
11:52:46:WU01:FS04:0x21:Completed 900000 out of 2000000 steps (45%)
11:52:56:WU01:FS04:0x21:Bad State detected... attempting to resume from last good checkpoint
11:52:56:WU01:FS04:0x21:Max number of retries reached. Aborting.
11:52:56:WU01:FS04:0x21:ERROR:Max Retries Reached
11:52:56:WU01:FS04:0x21:Saving result file logfile_01.txt
11:52:56:WU01:FS04:0x21:Saving result file log.txt
11:52:56:WU01:FS04:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
11:52:57:WARNING:WU01:FS04:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
11:52:57:WU01:FS04:Sending unit results: id:01 state:SEND error:FAULTY project:9631 run:1 clone:77 gen:8 core:0x21 unit:0x0000000cab436c9b5609bee204fd8294
11:52:57:WU01:FS04:Uploading 13.00KiB to 171.67.108.155
11:52:57:WU01:FS04:Connecting to 171.67.108.155:8080
11:52:57:WU01:FS04:Upload complete
11:52:57:WU01:FS04:Server responded WORK_ACK (400)
11:52:57:WU01:FS04:Cleaning up
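With failures spread across several rigs, it can help to pull every faulty WU out of a client log in one pass. This Python sketch relies only on the line format visible in the excerpt above: the "Sending unit results" line carrying error:FAULTY plus the project/run/clone/gen fields. The function and type names are hypothetical, and lines whose error field is anything other than FAULTY are skipped.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches result lines like the one in the excerpt above:
# "11:52:57:WU01:FS04:Sending unit results: id:01 state:SEND error:FAULTY
#  project:9631 run:1 clone:77 gen:8 core:0x21 unit:0x..."
RESULT_RE = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d):(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"Sending unit results:.*?error:(?P<error>\w+) "
    r"project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+) core:(?P<core>0x[0-9a-fA-F]+)"
)


class FaultyUnit(NamedTuple):
    time: str
    slot: str
    project: int
    run: int
    clone: int
    gen: int
    core: str


def find_faulty_units(lines: Iterable[str]) -> Iterator[FaultyUnit]:
    """Yield one FaultyUnit per result line reported with error:FAULTY."""
    for line in lines:
        m = RESULT_RE.match(line.strip())
        if m and m["error"] == "FAULTY":
            yield FaultyUnit(
                time=m["time"],
                slot=m["slot"],
                project=int(m["project"]),
                run=int(m["run"]),
                clone=int(m["clone"]),
                gen=int(m["gen"]),
                core=m["core"],
            )
```

Passing an open log file (for example `with open("log.txt") as fh: print(list(find_faulty_units(fh)))`, where the file name is an assumption about your setup) would report the failure above as project 9631, run 1, clone 77, gen 8 on slot FS04, which is the same PRCG information needed when reporting a bad WU.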
US Army Retired | Folding@EVGA The Number One Team in the Folding@Home Community.
bcavnaugh
Posts: 147
Joined: Tue Apr 30, 2013 1:39 pm

Re: Failing units, low ppd, and returned units.

Post by bcavnaugh »

bruce wrote:It's my impression that
(1) Overclocking, even factory overclocking, increases the error rate for the projects we're talking about.
(2) Maxwell GPUs seem to be more prone to these errors. FAH has temporarily reduced assignments of some projects to Maxwell unless you use a client-type flag. I suppose that's why you're not getting 96xx projects.
(3) Development is working toward better solutions.
I have removed the client-type flag altogether on both 980 rigs, so, as usual, we will have to wait several hours to see what comes down the pike on both.

With no flag I get beta projects: ZETA_DEV under Core 18, and UNKNOWN_ENUM under Core 21, e.g. P9704 (R11, C8, G109).

This is the next one to fail: project 9641 (*OPENMM_21, Core 21, P9641). It is not one of the UNKNOWN_ENUM core projects.
Fourth BAD_WORK_UNIT Core 21 project:9641 GTX 980HB
Last edited by bcavnaugh on Wed Nov 18, 2015 3:52 pm, edited 4 times in total.
bcavnaugh
Posts: 147
Joined: Tue Apr 30, 2013 1:39 pm

Re: Failing units, low ppd, and returned units.

Post by bcavnaugh »

Fourth BAD_WORK_UNIT Core 21 project:9641 GTX 980HB

15:30:29:WU02:FS00:0x21:Completed 780000 out of 2000000 steps (39%)
15:31:56:WU02:FS00:0x21:Completed 800000 out of 2000000 steps (40%)
15:32:05:WU02:FS00:0x21:Bad State detected... attempting to resume from last good checkpoint
15:32:05:WU02:FS00:0x21:Max number of retries reached. Aborting.
15:32:05:WU02:FS00:0x21:ERROR:Max Retries Reached
15:32:05:WU02:FS00:0x21:Saving result file logfile_01.txt
15:32:05:WU02:FS00:0x21:Saving result file log.txt
15:32:05:WU02:FS00:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
15:32:05:WARNING:WU02:FS00:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:32:05:WU02:FS00:Sending unit results: id:02 state:SEND error:FAULTY project:9641 run:0 clone:37 gen:31 core:0x21 unit:0x0000002eab436c9b5609bee4be719abe
15:32:05:WU02:FS00:Uploading 12.50KiB to 171.67.108.155
15:32:05:WU02:FS00:Connecting to 171.67.108.155:8080
15:32:06:WU02:FS00:Upload complete
15:32:06:WU02:FS00:Server responded WORK_ACK (400)
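Both logged failures stop partway through the WU: project 9631 at 45% and project 9641 at 40%. As a hypothetical helper (not part of FAH or HFM), this Python sketch records the last "Completed X out of Y steps" line seen for each WU/slot before a "Bad State detected" message, so failure points can be compared across rigs. It assumes only the colon-separated line layout shown in the log above.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches progress lines such as
# "15:31:56:WU02:FS00:0x21:Completed 800000 out of 2000000 steps (40%)"
PROGRESS_RE = re.compile(
    r"^\d\d:\d\d:\d\d:(?P<wu>WU\d+):(?P<slot>FS\d+):0x[0-9a-fA-F]+:"
    r"Completed (?P<done>\d+) out of (?P<total>\d+) steps"
)


def failure_points(lines: Iterable[str]) -> List[Tuple[str, str, float]]:
    """Return (wu, slot, percent) for each 'Bad State detected' message,
    using the last progress line seen for that same WU and slot."""
    last_percent: Dict[Tuple[str, str], float] = {}
    points: List[Tuple[str, str, float]] = []
    for raw in lines:
        line = raw.strip()
        m = PROGRESS_RE.match(line)
        if m:
            key = (m["wu"], m["slot"])
            last_percent[key] = 100.0 * int(m["done"]) / int(m["total"])
            continue
        if "Bad State detected" in line:
            # Layout: HH:MM:SS:WUxx:FSxx:core:message...
            parts = line.split(":")
            key = (parts[3], parts[4])
            if key in last_percent:
                points.append((key[0], key[1], last_percent[key]))
    return points
```

Fed the excerpt above, this would report WU02 on FS00 failing after reaching 40% of its steps.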
z999z3mystorys
Posts: 7
Joined: Mon Mar 18, 2013 3:19 pm

Re: Failing units, low ppd, and returned units.

Post by z999z3mystorys »

bruce wrote:It's my impression that
(1) Overclocking, even factory overclocking, increases the error rate for the projects we're talking about.
(2) Maxwell GPUs seem to be more prone to these errors. FAH has temporarily reduced assignments of some projects to Maxwell unless you use a client-type flag. I suppose that's why you're not getting 96xx projects.
(3) Development is working toward better solutions.
From what I can tell, I've managed to lower the failure rate, but not stop it. I applied every fix I could find that was suggested to help: downclocking my factory-OC GTX 980 to reference speeds (1266 down to 1126 MHz), underclocking my memory another 500 MHz (1000 MHz effective) for the P2 state the projects run in (I'm not sure why they run in P2 instead of with the memory at full speed, but at least the core stays at full speed), and setting PhysX to CPU.

I'm also running my client without any flags.

I'm glad Maxwell GPUs aren't being sent as many of those WUs until a resolution is worked out, given that they seem to have trouble with them.

I'm also glad to see the development team is working toward better solutions. Underclocking isn't my favorite fix, since it obviously slows things down, but it's workable until a better solution is found.
bcavnaugh
Posts: 147
Joined: Tue Apr 30, 2013 1:39 pm

Re: Failing units, low ppd, and returned units.

Post by bcavnaugh »

z999z3mystorys wrote:
bruce wrote:It's my impression that
(1) Overclocking, even factory overclocking, increases the error rate for the projects we're talking about.
(2) Maxwell GPUs seem to be more prone to these errors. FAH has temporarily reduced assignments of some projects to Maxwell unless you use a client-type flag. I suppose that's why you're not getting 96xx projects.
(3) Development is working toward better solutions.
I'm managed to lower, but not stop failure rates from what I can tell. I used all the fixes I could find that suggested might help, downclocking my factory OC GTX 980 to reference speeds(1266 down to 1126), underclocking my Memory another 500mhz(1000mhz effective) for the p2 state that the projects run at (not sure why that is or what's making it do that instead of the memory running at full speed, but the core stays at full speed at least) and setting PhysX to CPU

I'm also running my client without any flags on it.

I'm glad that maxwell GPUs aren't being sent as many of those WU given that they seem to have trouble with it, til some resolution can be worked out.

Also glad to see that the development team is working towards better solutions, as underclocking isn't one of my most favorite solutions, as it slows things down of course, but doable til a better solution is found.
This may be somewhat true, but not for the typical user.
Not all users run overclocking or downclocking software to tune their graphics cards.
bcavnaugh
Posts: 147
Joined: Tue Apr 30, 2013 1:39 pm

Re: Failing units, low ppd, and returned units.

Post by bcavnaugh »

Round 1 Log Files

https://docs.google.com/document/d/1kgN ... sp=sharing
https://docs.google.com/document/d/1i64 ... sp=sharing


My computers are going back to their default overclocked settings, with the GPU set back to 1500 MHz.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Failing units, low ppd, and returned units.

Post by bruce »

At this time, the GPU VRAM clock rate seems to be more important than the core clock rate.
Scarlet-Tech
Posts: 37
Joined: Tue Nov 10, 2015 9:54 pm

Re: Failing units, low ppd, and returned units.

Post by Scarlet-Tech »

bruce wrote:At this time, the GPU VRAM clock rate seems to be more important than the Core clock-rate.

bcavnaugh is showing that the VRAM has been lowered from the stock 7000 MHz to 6000 MHz and below: well under stock speeds, and even lower than last-generation speeds.

The problems persist.

He lowered all clocks drastically, removing any factory overclock and reducing all memory clocks. This shows that, although not everyone is experiencing it, lowering the clocks does not fix the issue in any way, shape, or form.

Since it occurs across multiple systems, and usually on 96xx-series work units as well as some 97xx work units, maybe those projects just aren't compatible with Maxwell cards?

P.S. This isn't a witch hunt, as you suggested before. This is a fact observed across multiple systems, with core clocks lowered to stock and memory clocks lowered well below stock. The issue persists, and these guys are burning a lot of electricity trying to find a fix so Stanford can find more cures.
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengeance (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Failing units, low ppd, and returned units.

Post by 7im »

Yes, the problem persists. Lowering the memory clocks was never a solution, simply a workaround that has helped some people finish more work units while we wait for Stanford to revise and improve the core.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.