Solutions to crashes on remote systems?
Posted: Wed Jan 25, 2017 2:34 am
I'm away from home for up to 8 days at a time, during which my folding rigs turn my bedroom into a smelting furnace. That is generally not an issue. However, two problems occasionally crop up that are rather annoying.
Problem 1:
Even on stable systems, we all get the occasional FAHCore crash. I've noticed that if I leave one of the computers running for a prolonged period without checking whether anything has crashed, some or all of the GPU slots end up stuck at 0%, showing "Ready". Restarting the computer fixes this.
Problem 2:
This problem has only happened once, just now, but in case it happens again, it would be useful to know whether there are solutions. My TeamViewer list currently shows one of my three folding rigs as offline. It can't be an internet issue, otherwise the other systems would be shown offline as well. Since TeamViewer (TV) starts up automatically, I know the system did not restart either. The only two possibilities I can think of are that the system has frozen and/or hit a BSOD, or that something has died. Since the former is more likely than the latter (and I certainly hope that is the case!), it will remain stuck in that state until I return and restart the computer myself.
So, regarding the first problem, is there any way to mitigate the issue? I know I could remote in more frequently to keep the loss of computing time to a minimum, but is there a better solution than that? I've seen some people talk about scripting automatic restarts every 24 hours, and I have a feeling that this might be what I'll have to settle for, but again, is there anything better? Can the root problem, the core crashes, be solved? Is there any reason why the client does not recover after a crash and instead leaves the GPU slots stuck at 0%?
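To make the question concrete, here is a rough sketch of the kind of watchdog I have in mind as an alternative to a blind 24-hour restart: only reboot when a slot has actually sat at Ready/0% for a while. It is only the decision logic. `SlotSample`, `stuck_slots` and `restart_command` are names made up for illustration, and how the samples get collected (from the client's log or some remote interface) is an assumption left out here. The only real command in it is the Windows `shutdown /r /t <seconds>` call, which the sketch builds but does not run.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SlotSample:
    """One observation of a folding slot (hypothetical structure)."""
    timestamp: float  # seconds, e.g. from time.time() when the sample was taken
    slot_id: str      # slot identifier as shown by the client, e.g. "01"
    status: str       # e.g. "READY" or "RUNNING"
    percent: float    # progress of the current work unit, 0.0 to 100.0


def stuck_slots(samples: List[SlotSample], min_stuck_seconds: float) -> List[str]:
    """Return ids of slots that have been READY at 0% without interruption
    for at least min_stuck_seconds, up to their most recent sample."""
    stuck_since: Dict[str, float] = {}
    latest: Dict[str, float] = {}
    for s in sorted(samples, key=lambda sample: sample.timestamp):
        latest[s.slot_id] = s.timestamp
        if s.status.upper() == "READY" and s.percent == 0.0:
            # Remember when this uninterrupted READY/0% streak began.
            stuck_since.setdefault(s.slot_id, s.timestamp)
        else:
            # Any other state ends the streak.
            stuck_since.pop(s.slot_id, None)
    return sorted(
        slot_id
        for slot_id, start in stuck_since.items()
        if latest[slot_id] - start >= min_stuck_seconds
    )


def restart_command(delay_seconds: int = 60) -> List[str]:
    """Build the Windows restart command: /r restarts, /t sets the delay."""
    return ["shutdown", "/r", "/t", str(delay_seconds)]
```

A scheduled task could run something like this every half hour and hand `restart_command()` to `subprocess.run` only when `stuck_slots(...)` returns a non-empty list. The threshold would need to be long enough that a slot which is simply waiting for a new work unit does not trigger a reboot.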
As for problem 2, I need to double-check whether automatic restarts are disabled for BSODs, since enabling them might solve the issue (further investigation will obviously have to wait until I get home). But if the system is frozen without being in a BSOD state, can anything be done remotely, or would I be completely out of luck there?