13827 bad work unit

Moderators: Site Moderators, FAHC Science Team

peterjammo
Posts: 90
Joined: Wed Mar 25, 2020 1:19 pm

13827 bad work unit

Post by peterjammo »

16:46:38:WU01:FS00:0xa7:Project: 13827 (Run 228, Clone 1, Gen 118)
16:46:38:WU01:FS00:0xa7:Unit: 0x0000008b80fccb095c9f844bebbc0337
16:46:38:WU01:FS00:0xa7:Digital signatures verified
16:46:38:WU01:FS00:0xa7:Calling: mdrun -s frame118.tpr -o frame118.trr -x frame118.xtc -cpi state.cpt -cpt 15 -nt 2
16:46:39:WU01:FS00:0xa7:Steps: first=14750000 total=125000
16:46:49:WU01:FS00:0xa7:Completed 28731 out of 125000 steps (22%)
16:47:01:WU01:FS00:0xa7:Completed 28750 out of 125000 steps (23%)
******************************* Date: 2020-03-29 *******************************
17:00:53:WU01:FS00:0xa7:Completed 30000 out of 125000 steps (24%)
17:15:11:WU01:FS00:0xa7:Completed 31250 out of 125000 steps (25%)
17:29:07:WU01:FS00:0xa7:Completed 32500 out of 125000 steps (26%)
17:43:06:WU01:FS00:0xa7:Completed 33750 out of 125000 steps (27%)
17:56:59:WU01:FS00:0xa7:Completed 35000 out of 125000 steps (28%)
18:10:56:WU01:FS00:0xa7:Completed 36250 out of 125000 steps (29%)
18:25:15:WU01:FS00:0xa7:Completed 37500 out of 125000 steps (30%)
18:39:26:WU01:FS00:0xa7:Completed 38750 out of 125000 steps (31%)
18:53:21:WU01:FS00:0xa7:Completed 40000 out of 125000 steps (32%)
19:07:15:WU01:FS00:0xa7:Completed 41250 out of 125000 steps (33%)
19:21:37:WU01:FS00:0xa7:Completed 42500 out of 125000 steps (34%)
19:35:35:WU01:FS00:0xa7:Completed 43750 out of 125000 steps (35%)
19:49:28:WU01:FS00:0xa7:Completed 45000 out of 125000 steps (36%)
20:03:23:WU01:FS00:0xa7:Completed 46250 out of 125000 steps (37%)
20:17:42:WU01:FS00:0xa7:Completed 47500 out of 125000 steps (38%)
20:31:43:WU01:FS00:0xa7:Completed 48750 out of 125000 steps (39%)
20:32:09:WU01:FS00:0xa7:ERROR:Guru Meditation #7d3e9559f44e7af8.b7c556aba1c8379f (389120.4665736) '01/01/state.cpt'
20:32:09:WU01:FS00:0xa7:WARNING:Unexpected exit() call
20:32:09:WU01:FS00:0xa7:WARNING:Unexpected exit from science code
20:32:09:WU01:FS00:0xa7:Saving result file ../logfile_01.txt
20:32:09:WU01:FS00:0xa7:Saving result file frame118.trr
20:32:09:WU01:FS00:0xa7:Saving result file frame118.xtc
20:32:09:WU01:FS00:0xa7:Saving result file md.log
20:32:09:WU01:FS00:0xa7:Saving result file science.log
20:32:10:WU01:FS00:0xa7:Folding@home Core Shutdown: BAD_WORK_UNIT
20:32:10:WU01:FS00:0xa7:ERROR:Exception: Core instance does not exists.
20:32:10:WU01:FS00:0xa7:ERROR:Exception: Core instance does not exists.
20:32:10:WU01:FS00:0xa7:ERROR:Exception: Core instance does not exists.
The last error repeated for pages, all with the same timestamp. The client then restarted and is now working on a new WU, also from project 13827.
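For anyone wanting to condense a log like this before posting, here is a minimal sketch in Python. It assumes the line layout seen above (time, WU slot, FS slot, core ID, then the message, all colon-separated); the function names `collapse_repeats` and `problems` are made up for illustration and are not part of the FAH client.

```python
import re
from itertools import groupby

# Assumed layout, inferred from the log above: HH:MM:SS:WUnn:FSnn:0xNN:message
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):(0x[0-9a-f]+):(.*)$")


def collapse_repeats(lines):
    """Merge runs of consecutive identical lines into (line, count) pairs."""
    return [(line, sum(1 for _ in run)) for line, run in groupby(lines)]


def problems(lines):
    """Return the message part of every line flagged ERROR or WARNING."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(5).startswith(("ERROR:", "WARNING:")):
            found.append(m.group(5))
    return found


# A few lines copied from the log above.
sample = [
    "20:32:09:WU01:FS00:0xa7:WARNING:Unexpected exit from science code",
    "20:32:10:WU01:FS00:0xa7:Folding@home Core Shutdown: BAD_WORK_UNIT",
    "20:32:10:WU01:FS00:0xa7:ERROR:Exception: Core instance does not exists.",
    "20:32:10:WU01:FS00:0xa7:ERROR:Exception: Core instance does not exists.",
    "20:32:10:WU01:FS00:0xa7:ERROR:Exception: Core instance does not exists.",
]

for line, count in collapse_repeats(sample):
    suffix = f"  (x{count})" if count > 1 else ""
    print(line + suffix)
```

Only consecutive identical lines are merged, so the same error recurring at a later timestamp would still show up as a separate entry.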
davidcoton
Posts: 1102
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: 13827 bad work unit

Post by davidcoton »

Did it upload? If so, the researchers will have the info to investigate.
I'm not sure whether it comes from a simulation problem not caught by the code, or from a code error. Either way, it's a new one on me. It may or may not be important enough to investigate.
In any case the WU will be re-issued. If someone else completes it, it will be treated as a random error and not taken further.
peterjammo
Posts: 90
Joined: Wed Mar 25, 2020 1:19 pm

Re: 13827 bad work unit

Post by peterjammo »

No, it didn't upload, but the CPU on the machine I was running it on died about an hour into the next WU, after I'd posted this. Fair chance it was my end.
davidcoton
Posts: 1102
Joined: Wed Nov 05, 2008 3:19 pm
Location: Cambridge, UK

Re: 13827 bad work unit

Post by davidcoton »

Thanks for reporting back -- yes that's a fair guess.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 13827 bad work unit

Post by bruce »

Guru Meditation indicates that the filesystem has been corrupted. With that failure, a disk check (chkdsk) might be able to correct the error, but if it's a real hardware error, start with a new disk.

Another possibility is that your AV scanner decided the series of random coordinates in one of FAH's work files looked like a virus and took liberties with those files (a false positive).
peterjammo
Posts: 90
Joined: Wed Mar 25, 2020 1:19 pm

Re: 13827 bad work unit

Post by peterjammo »

The CPU definitely died soon after. It was one which had never been able to run at full clock speed without regular thermal shutdowns. It had never been a problem at 1.6GHz (approx. 2 yrs of intermittent use as a spare/loan laptop), and that's where I was running it. After it shut down yesterday, it wouldn't run long enough to let me change the clock speed. Swapping it for a slower CPU (my last spare) has had it running solid and fairly cool since last night. I'll run a disk check when it finishes the current unit. I don't run any live AV scanners, as I still believe that Linux, run the way I do (nothing installed except from the official repos), doesn't need them.