
endless FAH core restart - Project: 9014 (Run 110, Clone 7,

Posted: Sat Aug 15, 2015 9:40 am
by kkemball
I've posted the message below in another section (problems with specific WUs), as a reply to: "Project: 9017 (R 548, C 11, G 0) endless restart".

What happens is that the 7.4.4 SMP client suddenly started giving the error message: FahCore returned: INTERRUPTED (102 = 0x66). The slot goes from "Ready" to "Running", gets interrupted, loops back to "Ready" and starts the WU initialization all over again, and every time it is interrupted just before folding starts.
I've been getting exactly the same thing for more than 24 hours now on this machine (iMac 11.2), BUT with a different WU: Project: 9014 (Run 110, Clone 7, Gen 79).

This Friday afternoon I started getting the same thing on a MacBook Pro with a different WU altogether!

So now both machines are looping and the last WU completed was Friday at 3 PM.

Any ideas?

Thanks,
Kevin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~BEGIN~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


09:36:16:WU00:FS00:Started FahCore on PID 67215
09:36:16:Started thread 42 on PID 63998
09:36:16:WU00:FS00:Core PID:67216
09:36:16:WU00:FS00:FahCore 0xa4 started
09:36:17:WU00:FS00:0xa4:
09:36:17:WU00:FS00:0xa4:*------------------------------*
09:36:17:WU00:FS00:0xa4:Folding@Home Gromacs Core
09:36:17:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
09:36:17:WU00:FS00:0xa4:
09:36:17:WU00:FS00:0xa4:Preparing to commence simulation
09:36:17:WU00:FS00:0xa4:- Ensuring status. Please wait.
09:36:26:WU00:FS00:0xa4:- Looking at optimizations...
09:36:26:WU00:FS00:0xa4:- Working with standard loops on this execution.
09:36:26:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
09:36:26:WU00:FS00:0xa4:- Expanded 691942 -> 1302736 (decompressed 188.2 percent)
09:36:26:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=691942 data_size=1302736, decompressed_data_size=1302736 diff=0
09:36:26:WU00:FS00:0xa4:- Digital signature verified
09:36:26:WU00:FS00:0xa4:
09:36:26:WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)
09:36:26:WU00:FS00:0xa4:
09:36:26:WU00:FS00:0xa4:Entering M.D.
09:36:32:WU00:FS00:0xa4:Mapping NT from 4 to 4 
09:36:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:37:16:WU00:FS00:Starting
09:37:16:WU00:FS00:Removing old file './work/00/logfile_01-20150815-090556.txt'
09:37:16:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/web.stanford.edu/~pande/OSX/AMD64/Core_a4.fah/FahCore_a4" -dir 00 -suffix 01 -version 704 -lifeline 63998 -checkpoint 20 -np 4
09:37:16:WU00:FS00:Started FahCore on PID 67301
09:37:16:Started thread 43 on PID 63998
09:37:16:WU00:FS00:Core PID:67302
09:37:16:WU00:FS00:FahCore 0xa4 started
09:37:17:WU00:FS00:0xa4:
09:37:17:WU00:FS00:0xa4:*------------------------------*
09:37:17:WU00:FS00:0xa4:Folding@Home Gromacs Core
09:37:17:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
09:37:17:WU00:FS00:0xa4:
09:37:17:WU00:FS00:0xa4:Preparing to commence simulation
09:37:17:WU00:FS00:0xa4:- Ensuring status. Please wait.
09:37:26:WU00:FS00:0xa4:- Looking at optimizations...
09:37:26:WU00:FS00:0xa4:- Working with standard loops on this execution.
09:37:26:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
09:37:26:WU00:FS00:0xa4:- Expanded 691942 -> 1302736 (decompressed 188.2 percent)
09:37:26:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=691942 data_size=1302736, decompressed_data_size=1302736 diff=0
09:37:26:WU00:FS00:0xa4:- Digital signature verified
09:37:26:WU00:FS00:0xa4:
09:37:26:WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)
09:37:26:WU00:FS00:0xa4:
09:37:27:WU00:FS00:0xa4:Entering M.D.
09:37:33:WU00:FS00:0xa4:Mapping NT from 4 to 4 
09:37:34:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:38:17:WU00:FS00:Starting
09:38:17:WU00:FS00:Removing old file './work/00/logfile_01-20150815-090613.txt'
09:38:17:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/web.stanford.edu/~pande/OSX/AMD64/Core_a4.fah/FahCore_a4" -dir 00 -suffix 01 -version 704 -lifeline 63998 -checkpoint 20 -np 4
09:38:17:WU00:FS00:Started FahCore on PID 67385
09:38:17:Started thread 44 on PID 63998
09:38:17:WU00:FS00:Core PID:67386
09:38:17:WU00:FS00:FahCore 0xa4 started
09:38:17:WU00:FS00:0xa4:
09:38:17:WU00:FS00:0xa4:*------------------------------*
09:38:17:WU00:FS00:0xa4:Folding@Home Gromacs Core
09:38:17:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
09:38:17:WU00:FS00:0xa4:
09:38:17:WU00:FS00:0xa4:Preparing to commence simulation
09:38:17:WU00:FS00:0xa4:- Ensuring status. Please wait.
09:38:26:WU00:FS00:0xa4:- Looking at optimizations...
09:38:26:WU00:FS00:0xa4:- Working with standard loops on this execution.
09:38:26:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
09:38:26:WU00:FS00:0xa4:- Expanded 691942 -> 1302736 (decompressed 188.2 percent)
09:38:26:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=691942 data_size=1302736, decompressed_data_size=1302736 diff=0
09:38:26:WU00:FS00:0xa4:- Digital signature verified
09:38:26:WU00:FS00:0xa4:
09:38:26:WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)
09:38:26:WU00:FS00:0xa4:
09:38:26:WU00:FS00:0xa4:Entering M.D.
09:38:32:WU00:FS00:0xa4:Mapping NT from 4 to 4 
09:38:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~END~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
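A loop like the one above is easy to miss when skimming a long log by eye. The following is a minimal Python sketch, not part of FAHClient, that scans client log lines and reports work units whose most recent core runs all ended with INTERRUPTED. The regular expressions are assumptions based only on the lines quoted in this post (other client or core versions may format them differently), and the names `find_restart_loops` and `threshold` are hypothetical.

```python
import re
from collections import Counter

# Core banner line, e.g. "WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)".
# Pattern assumed from the log excerpt above.
PROJECT_RE = re.compile(
    r"(WU\d+):FS\d+:0x[0-9a-fA-F]+:Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)
# Client line reporting how the core exited, e.g.
# "WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)".
RETURN_RE = re.compile(r"(WU\d+):FS\d+:FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)")


def find_restart_loops(lines, threshold=3):
    """Return {(slot, (project, run, clone, gen)): streak} for WUs whose
    trailing run of consecutive INTERRUPTED returns is >= threshold."""
    current = {}        # WU slot -> last (project, run, clone, gen) seen for it
    streak = Counter()  # (slot, PRCG) -> consecutive INTERRUPTED returns so far
    for line in lines:
        m = PROJECT_RE.search(line)
        if m:
            current[m.group(1)] = tuple(int(x) for x in m.group(2, 3, 4, 5))
            continue
        m = RETURN_RE.search(line)
        if m:
            key = (m.group(1), current.get(m.group(1)))
            if m.group(2) == "INTERRUPTED":
                streak[key] += 1
            else:
                streak[key] = 0  # any other exit status breaks the streak
    return {key: n for key, n in streak.items() if n >= threshold}


if __name__ == "__main__":
    # Three start/interrupt cycles taken from the log above.
    sample = [
        "09:36:26:WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)",
        "09:36:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
        "09:37:26:WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)",
        "09:37:34:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
        "09:38:26:WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)",
        "09:38:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    ]
    print(find_restart_loops(sample))  # {('WU00', (9014, 110, 7, 79)): 3}
```

Because the function accepts any iterable of strings, it can be run on an open file object for a saved copy of the client log. Only the trailing streak is reported, so a WU that was interrupted a few times but later finished is not flagged.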

Mod edit: Please use Code tags from the Full editor around log file postings

Re: endless FAH core restart - Project: 9014 (Run 110, Clone

Posted: Sat Aug 15, 2015 10:21 am
by ChristianVirtual
Can you please post which server the WU was downloaded from?

Re: endless FAH core restart - Project: 9014 (Run 110, Clone

Posted: Sat Aug 15, 2015 3:53 pm
by bruce
They both come from 171.64.65.124.

Re: endless FAH core restart - Project: 9014 (Run 110, Clone

Posted: Sat Aug 15, 2015 4:12 pm
by ChristianVirtual
I was wondering if it could be a server issue... but in the meantime I got other working WUs from the same server. Busted.

Re: endless FAH core restart - Project: 9014 (Run 110, Clone

Posted: Sat Aug 15, 2015 4:16 pm
by Joe_H
It could just be a coincidence that a number of WUs are all showing as bad at the same time, but just in case I am sending a message to the WS manager.

Re: endless FAH core restart - Project: 9014 (Run 110, Clone

Posted: Sat Aug 15, 2015 9:12 pm
by sryckbos
Thanks for the heads up. I believe we've gotten to the source of this problem (vs the temporary fixes of the last few weeks). It should all be working now. Sorry about all the headache!

Steven

Re: endless FAH core restart - Project: 9014 (Run 110, Clone

Posted: Sun Aug 16, 2015 9:29 am
by kkemball
Thanks for checking!

I dumped the WUs on both machines and they DL'ed new units and started right up.

Cheers,
Kevin