Bad State Detected

Post by iceman1992 » Mon Apr 27, 2020 8:10 am

Is this just system instability? Or something with the WU?
Seems to continue just fine, now at 91%
Update: the WU has now been uploaded and acknowledged (WORK_ACK).

Code:
05:56:58:WU00:FS01:0x22:Completed 3600000 out of 8000000 steps (45%)
05:58:31:WU00:FS01:0x22:Completed 3680000 out of 8000000 steps (46%)
06:00:04:WU00:FS01:0x22:Completed 3760000 out of 8000000 steps (47%)
06:00:40:WU00:FS01:0x22:Bad State detected... attempting to resume from last good checkpoint. Is your system overclocked?
06:00:40:WU00:FS01:0x22:Following exception occured: Force RMSE error of 8.13081 with threshold of 5
06:01:50:WU00:FS01:0x22:Completed 3840000 out of 8000000 steps (48%)
06:03:24:WU00:FS01:0x22:Completed 3920000 out of 8000000 steps (49%)
06:04:57:WU00:FS01:0x22:Completed 4000000 out of 8000000 steps (50%)
06:06:30:WU00:FS01:0x22:Completed 4080000 out of 8000000 steps (51%)

Re: Bad State Detected

Post by PantherX » Mon Apr 27, 2020 8:37 am

It's a tough one... it could be a WU that's almost bad but not there yet (potentially the next one could be a bad WU). It could also be hardware related. If you haven't seen this message before and you have been folding for a while, then it is safe to ignore it. If you commonly see that message, then it could be related to your hardware.
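One rough way to tell a one-off from a recurring problem is to count how often the message shows up in the client log. The following is only a minimal Python sketch, assuming log lines shaped like the ones quoted above; the function name count_bad_states is hypothetical and not part of any FAH tool, and other client versions may format their lines differently.

```python
import re
from collections import Counter

# Pattern modeled on the log lines quoted in this thread (assumption):
#   HH:MM:SS:WUnn:FSnn:<core>:Bad State detected...
BAD_STATE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):\S+?:Bad State detected"
)


def count_bad_states(lines):
    """Count 'Bad State detected' messages per folding slot (FSnn)."""
    counts = Counter()
    for line in lines:
        match = BAD_STATE.match(line.strip())
        if match:
            slot = match.group(3)  # e.g. "FS01"
            counts[slot] += 1
    return counts
```

To use it, pass any iterable of log lines, for example an open file object for a saved copy of the client log. A single hit over a long folding history fits the "safe to ignore" case; repeated hits on the same slot would point more toward hardware or clock settings.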

Re: Bad State Detected

Post by iceman1992 » Mon Apr 27, 2020 9:21 am

Okay, I'll ignore it. I don't remember ever seeing that before. Just thought I'd post in case the researchers need to be notified.

Re: Bad State Detected

Post by Joe_H » Mon Apr 27, 2020 3:58 pm

Could be that this particular project/WU pushed your GPU a bit harder, and that was enough to cause an error. If you see this again and you have an overclock, factory or otherwise, try reducing it a little.

Re: Bad State Detected

Post by iceman1992 » Mon Apr 27, 2020 4:15 pm

Joe_H wrote: Could be that this particular project/WU pushed your GPU a bit harder, and that was enough to cause an error. If you see this again and you have an overclock, factory or otherwise, try reducing it a little.

Okay. Not an overclock, but a slight undervolt, GTX1660 1860MHz at 981mV. It hasn't happened again though, will keep monitoring.

Re: Bad State Detected

Post by bruce » Sat Jul 04, 2020 8:13 am

Every project has a range of acceptable errors. In this case, the value 5 was considered acceptable, based on the project owner's estimate. He guessed wrong.

He could have allowed 10 and might still have gotten good answers. He could also have reconfigured the project to run in mixed precision: the sum would then have been computed in FP64, the error would have been much smaller, and the project would have run somewhat slower.

A lot depends on how many atoms contribute to that error sum.
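The effect of accumulation precision and term count can be illustrated with a small Python sketch. It does not reproduce the core's actual Force RMSE check; it only sums random numbers once with FP32 rounding after every addition and once in plain FP64, and compares both against an accurately rounded reference. The helper names (to_f32, sum_f32, accumulation_error) and the uniform random inputs are assumptions made purely for illustration.

```python
import math
import random
import struct


def to_f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Sum values while rounding the running total to FP32 at every step."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


def accumulation_error(n_terms, seed=0):
    """Return (FP32 error, FP64 error) for a sum of n_terms random values."""
    rng = random.Random(seed)
    values = [rng.uniform(0.0, 1.0) for _ in range(n_terms)]
    exact = math.fsum(values)  # accurately rounded reference sum
    err32 = abs(sum_f32(values) - exact)
    err64 = abs(sum(values) - exact)
    return err32, err64


if __name__ == "__main__":
    for n in (1_000, 100_000):
        err32, err64 = accumulation_error(n)
        print(f"{n:>7} terms: FP32 error {err32:.3e}, FP64 error {err64:.3e}")
```

Running it prints both errors for two term counts. The FP32 error should come out far larger than the FP64 one, and it tends to grow as more terms are added, which is the trade-off described above: accumulating in FP64 costs some speed but keeps the error sum much smaller.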