endless FAH core restart - Project: 9014 (Run 110, Clone 7,


Postby kkemball » Sat Aug 15, 2015 9:40 am

I posted the message below in another section (problems with specific WUs), as a reply to "Project: 9017 (R 548, C 11, G 0) endless restart".

What happens is that the 7.4.4 SMP client suddenly starts giving the error message: FahCore returned: INTERRUPTED (102 = 0x66). The slot goes from "Ready" to "Running", gets interrupted, loops back to "Ready", and starts the WU initialization all over again, only to be interrupted again just before folding starts.

I've been getting exactly the same thing for more than 24 hours now on this machine (iMac 11.2), BUT with a different WU: Project: 9014 (Run 110, Clone 7, Gen 79).

This Friday afternoon I started getting the same thing on a MacBook Pro with a different WU altogether!

So now both machines are looping and the last WU completed was Friday at 3 PM.

Any ideas?

Thanks,
Kevin


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~BEGIN~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
09:36:16:WU00:FS00:Started FahCore on PID 67215
09:36:16:Started thread 42 on PID 63998
09:36:16:WU00:FS00:Core PID:67216
09:36:16:WU00:FS00:FahCore 0xa4 started
09:36:17:WU00:FS00:0xa4:
09:36:17:WU00:FS00:0xa4:*------------------------------*
09:36:17:WU00:FS00:0xa4:Folding@Home Gromacs Core
09:36:17:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
09:36:17:WU00:FS00:0xa4:
09:36:17:WU00:FS00:0xa4:Preparing to commence simulation
09:36:17:WU00:FS00:0xa4:- Ensuring status. Please wait.
09:36:26:WU00:FS00:0xa4:- Looking at optimizations...
09:36:26:WU00:FS00:0xa4:- Working with standard loops on this execution.
09:36:26:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
09:36:26:WU00:FS00:0xa4:- Expanded 691942 -> 1302736 (decompressed 188.2 percent)
09:36:26:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=691942 data_size=1302736, decompressed_data_size=1302736 diff=0
09:36:26:WU00:FS00:0xa4:- Digital signature verified
09:36:26:WU00:FS00:0xa4:
09:36:26:WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)
09:36:26:WU00:FS00:0xa4:
09:36:26:WU00:FS00:0xa4:Entering M.D.
09:36:32:WU00:FS00:0xa4:Mapping NT from 4 to 4
09:36:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:37:16:WU00:FS00:Starting
09:37:16:WU00:FS00:Removing old file './work/00/logfile_01-20150815-090556.txt'
09:37:16:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/web.stanford.edu/~pande/OSX/AMD64/Core_a4.fah/FahCore_a4" -dir 00 -suffix 01 -version 704 -lifeline 63998 -checkpoint 20 -np 4
09:37:16:WU00:FS00:Started FahCore on PID 67301
09:37:16:Started thread 43 on PID 63998
09:37:16:WU00:FS00:Core PID:67302
09:37:16:WU00:FS00:FahCore 0xa4 started
09:37:17:WU00:FS00:0xa4:
09:37:17:WU00:FS00:0xa4:*------------------------------*
09:37:17:WU00:FS00:0xa4:Folding@Home Gromacs Core
09:37:17:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
09:37:17:WU00:FS00:0xa4:
09:37:17:WU00:FS00:0xa4:Preparing to commence simulation
09:37:17:WU00:FS00:0xa4:- Ensuring status. Please wait.
09:37:26:WU00:FS00:0xa4:- Looking at optimizations...
09:37:26:WU00:FS00:0xa4:- Working with standard loops on this execution.
09:37:26:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
09:37:26:WU00:FS00:0xa4:- Expanded 691942 -> 1302736 (decompressed 188.2 percent)
09:37:26:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=691942 data_size=1302736, decompressed_data_size=1302736 diff=0
09:37:26:WU00:FS00:0xa4:- Digital signature verified
09:37:26:WU00:FS00:0xa4:
09:37:26:WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)
09:37:26:WU00:FS00:0xa4:
09:37:27:WU00:FS00:0xa4:Entering M.D.
09:37:33:WU00:FS00:0xa4:Mapping NT from 4 to 4
09:37:34:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:38:17:WU00:FS00:Starting
09:38:17:WU00:FS00:Removing old file './work/00/logfile_01-20150815-090613.txt'
09:38:17:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/web.stanford.edu/~pande/OSX/AMD64/Core_a4.fah/FahCore_a4" -dir 00 -suffix 01 -version 704 -lifeline 63998 -checkpoint 20 -np 4
09:38:17:WU00:FS00:Started FahCore on PID 67385
09:38:17:Started thread 44 on PID 63998
09:38:17:WU00:FS00:Core PID:67386
09:38:17:WU00:FS00:FahCore 0xa4 started
09:38:17:WU00:FS00:0xa4:
09:38:17:WU00:FS00:0xa4:*------------------------------*
09:38:17:WU00:FS00:0xa4:Folding@Home Gromacs Core
09:38:17:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
09:38:17:WU00:FS00:0xa4:
09:38:17:WU00:FS00:0xa4:Preparing to commence simulation
09:38:17:WU00:FS00:0xa4:- Ensuring status. Please wait.
09:38:26:WU00:FS00:0xa4:- Looking at optimizations...
09:38:26:WU00:FS00:0xa4:- Working with standard loops on this execution.
09:38:26:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
09:38:26:WU00:FS00:0xa4:- Expanded 691942 -> 1302736 (decompressed 188.2 percent)
09:38:26:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=691942 data_size=1302736, decompressed_data_size=1302736 diff=0
09:38:26:WU00:FS00:0xa4:- Digital signature verified
09:38:26:WU00:FS00:0xa4:
09:38:26:WU00:FS00:0xa4:Project: 9014 (Run 110, Clone 7, Gen 79)
09:38:26:WU00:FS00:0xa4:
09:38:26:WU00:FS00:0xa4:Entering M.D.
09:38:32:WU00:FS00:0xa4:Mapping NT from 4 to 4
09:38:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~END~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mod edit: Please use Code tags from the Full editor around log file postings
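
For anyone who wants to check their own log for the same pattern, here is a minimal sketch that counts "FahCore returned" lines per work unit and slot. The regular expression is based only on the log lines posted above; the function names and the threshold of 3 are hypothetical choices for illustration, not part of FAHClient.

```python
import re

# Matches lines shaped like the ones in the log above, e.g.
# 09:36:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
RETURN_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WU(?P<wu>\d+):FS(?P<slot>\d+):"
    r"FahCore returned: (?P<status>\w+) "
    r"\((?P<code>\d+) = 0x(?P<hex>[0-9a-fA-F]+)\)"
)


def count_core_returns(lines):
    """Count FahCore return statuses per (work unit, slot, status)."""
    counts = {}
    for line in lines:
        m = RETURN_RE.match(line.strip())
        if m is None:
            continue  # not a "FahCore returned" line
        key = (m["wu"], m["slot"], m["status"])
        counts[key] = counts.get(key, 0) + 1
    return counts


def looks_like_restart_loop(lines, threshold=3):
    """Assumed heuristic: INTERRUPTED seen `threshold` or more times for one WU/slot."""
    return any(
        status == "INTERRUPTED" and n >= threshold
        for (_, _, status), n in count_core_returns(lines).items()
    )


# The three return lines from the log above, with the "Starting" lines between them.
SAMPLE = """\
09:36:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:37:16:WU00:FS00:Starting
09:37:34:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
09:38:17:WU00:FS00:Starting
09:38:33:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
""".splitlines()

if __name__ == "__main__":
    print(count_core_returns(SAMPLE))
    print(looks_like_restart_loop(SAMPLE))
```

To run it against a real log, pass the open log file (wherever your client writes it) instead of SAMPLE. The threshold is a guess; a single INTERRUPTED on its own may just be a normal pause or shutdown.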
kkemball
 

Postby ChristianVirtual » Sat Aug 15, 2015 10:21 am

Can you please post which server the WU was downloaded from?
ChristianVirtual
 

Postby bruce » Sat Aug 15, 2015 3:53 pm

They both come from 171.64.65.124.
bruce
 

Postby ChristianVirtual » Sat Aug 15, 2015 4:12 pm

I was wondering whether it could be a server issue ... but in the meantime I've received other WUs from the same server that work. So that theory is busted.
ChristianVirtual
 

Postby Joe_H » Sat Aug 15, 2015 4:16 pm

It could be just a coincidence that a number of WUs are all going bad at the same time, but just in case, I am sending a message to the WS manager.
Joe_H
Site Admin
 

Postby sryckbos » Sat Aug 15, 2015 9:12 pm

Thanks for the heads up. I believe we've gotten to the source of this problem (vs the temporary fixes of the last few weeks). It should all be working now. Sorry about all the headache!

Steven
sryckbos
Pande Group Member
 

Postby kkemball » Sun Aug 16, 2015 9:29 am

Thanks for checking!

I dumped the WUs on both machines and they DL'ed new units and started right up.

Cheers,
Kevin
kkemball
 