endless FAH core restart - Project: 9014 (Run 110, Clone 7,


kkemball
Posts: 3
Joined: Fri Dec 07, 2012 5:10 pm

Post by kkemball »

I've posted the message below in another section (problems with specific WUs), as a reply to: " Project: 9017 (R 548, C 11, G 0) endless restart ".

What happens is that the 7.4.4 SMP client suddenly starts giving the error message: FahCore returned: INTERRUPTED (102 = 0x66). It goes from "Ready" to "Running", gets interrupted, loops back to "Ready", and starts the WU initialization all over again, only to be interrupted again just before folding starts.
I've been getting exactly the same thing for more than 24 hours now on this machine (iMac 11.2), BUT with a different WU: Project: 9014 (Run 110, Clone 7, Gen 79).

This Friday afternoon I started getting the same thing on a MacBook Pro with a different WU altogether!

So now both machines are looping and the last WU completed was Friday at 3 PM.

Any ideas?

Thanks,
Kevin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~BEGIN~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


09:36:16:WU00:FS00:Started FahCore on PID 67215
09:36:16:Started thread 42 on PID 63998
09:36:16:WU00:FS00:Core PID:67216
09:36:16:WU00:FS00:FahCore 0xa4 started
09:36:17:WU00:FS00:0xa4:
09:36:17:WU00:FS00:0xa4:*------------------------------*
09:36:17:WU00:FS00:0xa4:Folding@Home Gromacs Core
09:36:17:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
09:36:17:WU00:FS00:0xa4:
09:36:17:WU00:FS00:0xa4:Preparing to commence simulation
09:36:17:WU00:FS00:0xa4:- Ensuring status. Please wait.
09:36:26:WU00:FS00:0xa4:- Looking at optimizations...
09:36:26:WU00:FS00:0xa4:- Working with standard loops on this execution.
09:36:26:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
09:36:26:WU00:FS00:0xa4:- Expanded 691942 -> 1302736 (decompressed 188.2 percent)
09:36:26:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=691942 data_size=1302736, decompressed_data_size=1302736 diff=0
09:36:26:WU00:FS00:0xa4:- Digital signature verified
09:36:26:WU00:FS00:0xa4:
09:36:26:WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)
09:36:26:WU00:FS00:0xa4:
09:36:26:WU00:FS00:0xa4:Entering M.D.
09:36:32:WU00:FS00:0xa4:Mapping NT from 4 to 4 
09:36:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:37:16:WU00:FS00:Starting
09:37:16:WU00:FS00:Removing old file './work/00/logfile_01-20150815-090556.txt'
09:37:16:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/web.stanford.edu/~pande/OSX/AMD64/Core_a4.fah/FahCore_a4" -dir 00 -suffix 01 -version 704 -lifeline 63998 -checkpoint 20 -np 4
09:37:16:WU00:FS00:Started FahCore on PID 67301
09:37:16:Started thread 43 on PID 63998
09:37:16:WU00:FS00:Core PID:67302
09:37:16:WU00:FS00:FahCore 0xa4 started
09:37:17:WU00:FS00:0xa4:
09:37:17:WU00:FS00:0xa4:*------------------------------*
09:37:17:WU00:FS00:0xa4:Folding@Home Gromacs Core
09:37:17:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
09:37:17:WU00:FS00:0xa4:
09:37:17:WU00:FS00:0xa4:Preparing to commence simulation
09:37:17:WU00:FS00:0xa4:- Ensuring status. Please wait.
09:37:26:WU00:FS00:0xa4:- Looking at optimizations...
09:37:26:WU00:FS00:0xa4:- Working with standard loops on this execution.
09:37:26:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
09:37:26:WU00:FS00:0xa4:- Expanded 691942 -> 1302736 (decompressed 188.2 percent)
09:37:26:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=691942 data_size=1302736, decompressed_data_size=1302736 diff=0
09:37:26:WU00:FS00:0xa4:- Digital signature verified
09:37:26:WU00:FS00:0xa4:
09:37:26:WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)
09:37:26:WU00:FS00:0xa4:
09:37:27:WU00:FS00:0xa4:Entering M.D.
09:37:33:WU00:FS00:0xa4:Mapping NT from 4 to 4 
09:37:34:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:38:17:WU00:FS00:Starting
09:38:17:WU00:FS00:Removing old file './work/00/logfile_01-20150815-090613.txt'
09:38:17:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/web.stanford.edu/~pande/OSX/AMD64/Core_a4.fah/FahCore_a4" -dir 00 -suffix 01 -version 704 -lifeline 63998 -checkpoint 20 -np 4
09:38:17:WU00:FS00:Started FahCore on PID 67385
09:38:17:Started thread 44 on PID 63998
09:38:17:WU00:FS00:Core PID:67386
09:38:17:WU00:FS00:FahCore 0xa4 started
09:38:17:WU00:FS00:0xa4:
09:38:17:WU00:FS00:0xa4:*------------------------------*
09:38:17:WU00:FS00:0xa4:Folding@Home Gromacs Core
09:38:17:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
09:38:17:WU00:FS00:0xa4:
09:38:17:WU00:FS00:0xa4:Preparing to commence simulation
09:38:17:WU00:FS00:0xa4:- Ensuring status. Please wait.
09:38:26:WU00:FS00:0xa4:- Looking at optimizations...
09:38:26:WU00:FS00:0xa4:- Working with standard loops on this execution.
09:38:26:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
09:38:26:WU00:FS00:0xa4:- Expanded 691942 -> 1302736 (decompressed 188.2 percent)
09:38:26:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=691942 data_size=1302736, decompressed_data_size=1302736 diff=0
09:38:26:WU00:FS00:0xa4:- Digital signature verified
09:38:26:WU00:FS00:0xa4:
09:38:26:WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)
09:38:26:WU00:FS00:0xa4:
09:38:26:WU00:FS00:0xa4:Entering M.D.
09:38:32:WU00:FS00:0xa4:Mapping NT from 4 to 4 
09:38:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~END~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
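The restart loop in the log above can be spotted mechanically by counting back-to-back "FahCore returned: INTERRUPTED" lines for the same work unit and slot. The following is only a rough sketch in Python: the line format is taken from the log posted here, while the helper names and the threshold of 3 are illustrative assumptions, not anything provided by the FAH client.

```python
import re
from collections import defaultdict

# Matches client lines shaped like the ones in the posted log, e.g.
# 09:36:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
RETURN_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(WU\d+):(FS\d+):FahCore returned: (\w+) \(\d+ = 0x[0-9a-fA-F]+\)"
)


def longest_interrupt_streaks(lines):
    """Return {(wu, slot): longest run of consecutive INTERRUPTED returns}."""
    current = defaultdict(int)
    longest = defaultdict(int)
    for line in lines:
        match = RETURN_RE.match(line.strip())
        if not match:
            continue  # not a "FahCore returned" line
        wu, slot, status = match.groups()
        key = (wu, slot)
        if status == "INTERRUPTED":
            current[key] += 1
            longest[key] = max(longest[key], current[key])
        else:
            current[key] = 0  # any other return status breaks the streak
    return dict(longest)


def is_looping(lines, threshold=3):
    """Hypothetical rule of thumb: `threshold` interrupts in a row means a loop."""
    return any(n >= threshold for n in longest_interrupt_streaks(lines).values())


# Lines copied from the log above.
sample = [
    "09:36:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    "09:37:16:WU00:FS00:Starting",
    "09:37:34:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    "09:38:17:WU00:FS00:Starting",
    "09:38:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
]
print(longest_interrupt_streaks(sample))
print(is_looping(sample))
```

To check a real log, pass an open file instead of the sample list, for example `is_looping(open("log.txt"))`, where the file name is a placeholder for wherever your client writes its log.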

Mod edit: Please use Code tags from the Full editor around log file postings
ChristianVirtual
Posts: 1596
Joined: Tue May 28, 2013 12:14 pm
Location: Tokyo

Re: endless FAH core restart - Project: 9014 (Run 110, Clone

Post by ChristianVirtual »

Can you please post which server the WU was downloaded from?
Please contribute your logs to http://ppd.fahmm.net
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: endless FAH core restart - Project: 9014 (Run 110, Clone

Post by bruce »

They both come from 171.64.65.124.
ChristianVirtual
Posts: 1596
Joined: Tue May 28, 2013 12:14 pm
Location: Tokyo

Re: endless FAH core restart - Project: 9014 (Run 110, Clone

Post by ChristianVirtual »

I was wondering whether it could be a server issue ... but in the meantime I got other working WUs from the same server. Busted.
Joe_H
Site Admin
Posts: 7854
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: endless FAH core restart - Project: 9014 (Run 110, Clone

Post by Joe_H »

It could be just a coincidence that a number of WUs are all going bad at the same time, but just in case, I am sending a message to the WS manager.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
sryckbos
Pande Group Member
Posts: 116
Joined: Wed Jun 26, 2013 10:23 pm

Re: endless FAH core restart - Project: 9014 (Run 110, Clone

Post by sryckbos »

Thanks for the heads up. I believe we've gotten to the source of this problem (vs the temporary fixes of the last few weeks). It should all be working now. Sorry about all the headache!

Steven
kkemball
Posts: 3
Joined: Fri Dec 07, 2012 5:10 pm

Re: endless FAH core restart - Project: 9014 (Run 110, Clone

Post by kkemball »

Thanks for checking!

I dumped the WUs on both machines and they DL'ed new units and started right up.

Cheers,
Kevin