p9015 INTERRUPTED (102 = 0x66)

parkut
Posts: 364
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA

Post by parkut »

Noticed this one failing to start. The client had retried it more than 28 times by the time I caught it.

Model Name: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
OS: CentOS 5.11 Linux
Memory: 3.77GiB
Client Version: 7.4.4
Core: FahCore_a4.exe
Core Version: 2.27 (Dec. 15, 2010)

20:26:35:WU01:FS00:0xa4:Project: 9015 (Run 140, Clone 2, Gen 51)
20:26:35:WU01:FS00:0xa4:
20:26:35:WU01:FS00:0xa4:Entering M.D.
20:26:41:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
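For anyone who wants to spot this kind of retry loop without reading the whole log, here is a rough Python sketch that counts `FahCore returned:` lines per work unit and slot. The function names and the threshold of 3 are my own assumptions, not anything the client provides, and the regular expression is only based on the log lines quoted in this thread. A manual pause also returns INTERRUPTED, so treat a hit as a hint rather than proof of a bad WU.

```python
import re
from collections import Counter

# Matches client log lines such as:
#   20:26:41:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
#   18:22:43:WARNING:WU00:FS00:FahCore returned: MISSING_WORK_FILES (116 = 0x74)
RETURN_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(?:WARNING:)?(WU\d+):(FS\d+):"
    r"FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)"
)


def count_core_returns(lines):
    """Count FahCore return statuses keyed by (work unit, slot, status)."""
    counts = Counter()
    for line in lines:
        match = RETURN_RE.match(line.strip())
        if match:
            wu, slot, status, _code = match.groups()
            counts[(wu, slot, status)] += 1
    return counts


def stuck_units(lines, threshold=3):
    """Return (work unit, slot) pairs with at least `threshold` INTERRUPTED returns.

    The threshold is an arbitrary assumption; pick whatever suits your logs.
    """
    counts = count_core_returns(lines)
    return sorted(
        (wu, slot)
        for (wu, slot, status), n in counts.items()
        if status == "INTERRUPTED" and n >= threshold
    )
```

To use it, pass the lines of the client log, for example `stuck_units(open("log.txt"))`, where `log.txt` stands for wherever your client writes its log.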

Resolved by stopping the client, deleting the entire contents of the work, logs and cores directories, and restarting FAH:

service FAHClient stop
cd /var/lib/fahclient
(delete everything inside work, logs and cores)
service FAHClient start
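The delete step can be sketched in Python as below. This is only an illustration under assumptions: `clear_client_dirs` is a made-up helper name, the directory names are the three mentioned above, and the client is assumed to be stopped before it runs and started again afterwards with the `service` commands. It throws away any work in progress, so it is a last resort.

```python
import shutil
from pathlib import Path


def clear_client_dirs(data_dir, subdirs=("work", "logs", "cores")):
    """Delete everything inside each listed subdirectory of data_dir.

    The subdirectories themselves are kept, and anything else in data_dir
    (for example config.xml) is left alone. Returns the removed paths.
    """
    removed = []
    base = Path(data_dir)
    for name in subdirs:
        sub = base / name
        if not sub.is_dir():
            continue  # nothing to clear if the directory is absent
        for entry in sub.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # real directory: remove recursively
            else:
                entry.unlink()  # regular file or symlink
            removed.append(entry)
    return removed


# Example call, with the client stopped and run as a user that owns the files:
#   clear_client_dirs("/var/lib/fahclient")
```

Keeping the directories and only emptying them means ownership and permissions on work, logs and cores stay as the package installed them.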
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: p9015 INTERRUPTED (102 = 0x66)

Post by bruce »

A Windows service has a choice about what to do if there's an error. I presume CentOS does, too, but I'm not sure how to configure it.

Do you want the service to stop after an error or restart FAHClient? That's not an easy question to answer.

Re: p9015 INTERRUPTED (102 = 0x66)

Post by parkut »

Simply restarting the client does not resolve the problem. The only way I was able to recover was to dump the WU as described above.

Re: p9015 INTERRUPTED (102 = 0x66)

Post by bruce »

Right. My question deals with the "next time" something unexpected happens.
iBozz
Posts: 89
Joined: Wed Nov 26, 2008 7:01 pm
Hardware configuration: iMac (Retina 5K, 27-inch, 2017), 3.8 GHz Quad-Core Intel Core i5, 64 GB 2400 MHz DDR4, 2TB HD running under macOS Catalina v10.15.7 (19G2021)
Location: NW England, UK

Re: p9015 INTERRUPTED (102 = 0x66)

Post by iBozz »

I've had a similar problem with Project: 9016 (Run 33, Clone 10, Gen 50) on an iMac under Yosemite.

Code:

18:19:19:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
18:19:22:WU00:FS00:Starting
18:19:22:WU00:FS00:Removing old file './work/00/logfile_01-20150801-180158.txt'
18:19:22:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/web.stanford.edu/~pande/OSX/AMD64/Core_a4.fah/FahCore_a4" -dir 00 -suffix 01 -version 704 -lifeline 932 -checkpoint 15 -np 8
18:19:22:WU00:FS00:Started FahCore on PID 936
18:19:22:WU00:FS00:Core PID:937
18:19:22:WU00:FS00:FahCore 0xa4 started
18:19:23:WU00:FS00:0xa4:
18:19:23:WU00:FS00:0xa4:*------------------------------*
18:19:23:WU00:FS00:0xa4:Folding@Home Gromacs Core
18:19:23:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
18:19:23:WU00:FS00:0xa4:
18:19:23:WU00:FS00:0xa4:Preparing to commence simulation
18:19:23:WU00:FS00:0xa4:- Ensuring status. Please wait.
18:19:32:WU00:FS00:0xa4:- Looking at optimizations...
18:19:32:WU00:FS00:0xa4:- Working with standard loops on this execution.
18:19:33:WU00:FS00:0xa4:- Previous termination of core was improper.
18:19:33:WU00:FS00:0xa4:- Going to use standard loops.
18:19:33:WU00:FS00:0xa4:- Files status OK
18:19:33:WU00:FS00:0xa4:- Expanded 180543 -> 716800 (decompressed 397.0 percent)
18:19:33:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=180543 data_size=716800, decompressed_data_size=716800 diff=0
18:19:33:WU00:FS00:0xa4:- Digital signature verified
18:19:34:WU00:FS00:0xa4:
18:19:34:WU00:FS00:0xa4:Project: 9016 (Run 33, Clone 10, Gen 50)
18:19:34:WU00:FS00:0xa4:
18:19:35:WU00:FS00:0xa4:Entering M.D.
18:19:41:WU00:FS00:0xa4:Mapping NT from 8 to 8 
18:19:41:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
18:20:22:WU00:FS00:Starting
18:20:22:WU00:FS00:Removing old file './work/00/logfile_01-20150801-180258.txt'
18:20:22:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/web.stanford.edu/~pande/OSX/AMD64/Core_a4.fah/FahCore_a4" -dir 00 -suffix 01 -version 704 -lifeline 932 -checkpoint 15 -np 8
18:20:22:WU00:FS00:Started FahCore on PID 939
18:20:22:WU00:FS00:Core PID:940
18:20:22:WU00:FS00:FahCore 0xa4 started
18:20:23:WU00:FS00:0xa4:
18:20:23:WU00:FS00:0xa4:*------------------------------*
18:20:23:WU00:FS00:0xa4:Folding@Home Gromacs Core
18:20:23:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
18:20:23:WU00:FS00:0xa4:
18:20:23:WU00:FS00:0xa4:Preparing to commence simulation
18:20:23:WU00:FS00:0xa4:- Ensuring status. Please wait.
18:20:32:WU00:FS00:0xa4:- Looking at optimizations...
18:20:32:WU00:FS00:0xa4:- Working with standard loops on this execution.
18:20:33:WU00:FS00:0xa4:- Previous termination of core was improper.
18:20:33:WU00:FS00:0xa4:- Going to use standard loops.
18:20:33:WU00:FS00:0xa4:- Files status OK
18:20:33:WU00:FS00:0xa4:- Expanded 180543 -> 716800 (decompressed 397.0 percent)
18:20:33:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=180543 data_size=716800, decompressed_data_size=716800 diff=0
18:20:33:WU00:FS00:0xa4:- Digital signature verified
18:20:34:WU00:FS00:0xa4:
18:20:34:WU00:FS00:0xa4:Project: 9016 (Run 33, Clone 10, Gen 50)
18:20:34:WU00:FS00:0xa4:
18:20:35:WU00:FS00:0xa4:Entering M.D.
18:20:41:WU00:FS00:0xa4:Mapping NT from 8 to 8 
18:20:41:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
18:21:22:WU00:FS00:Starting
18:21:22:WU00:FS00:Removing old file './work/00/logfile_01-20150801-180358.txt'
18:21:22:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/web.stanford.edu/~pande/OSX/AMD64/Core_a4.fah/FahCore_a4" -dir 00 -suffix 01 -version 704 -lifeline 932 -checkpoint 15 -np 8
18:21:22:WU00:FS00:Started FahCore on PID 944
18:21:22:WU00:FS00:Core PID:945
18:21:22:WU00:FS00:FahCore 0xa4 started
18:21:23:WU00:FS00:0xa4:
18:21:23:WU00:FS00:0xa4:*------------------------------*
18:21:23:WU00:FS00:0xa4:Folding@Home Gromacs Core
18:21:23:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
18:21:23:WU00:FS00:0xa4:
18:21:23:WU00:FS00:0xa4:Preparing to commence simulation
18:21:23:WU00:FS00:0xa4:- Ensuring status. Please wait.
18:21:32:WU00:FS00:0xa4:- Looking at optimizations...
18:21:32:WU00:FS00:0xa4:- Working with standard loops on this execution.
18:21:33:WU00:FS00:0xa4:- Previous termination of core was improper.
18:21:33:WU00:FS00:0xa4:- Going to use standard loops.
18:21:33:WU00:FS00:0xa4:- Files status OK
18:21:33:WU00:FS00:0xa4:- Expanded 180543 -> 716800 (decompressed 397.0 percent)
18:21:33:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=180543 data_size=716800, decompressed_data_size=716800 diff=0
18:21:33:WU00:FS00:0xa4:- Digital signature verified
18:21:33:WU00:FS00:0xa4:
18:21:33:WU00:FS00:0xa4:Project: 9016 (Run 33, Clone 10, Gen 50)
18:21:33:WU00:FS00:0xa4:
18:21:34:WU00:FS00:0xa4:Entering M.D.
18:21:40:WU00:FS00:0xa4:Mapping NT from 8 to 8 
18:21:41:FS00:Paused
18:21:41:FS00:Shutting core down
18:21:41:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
18:22:04:Removing old file 'configs/config-20140805-155531.xml'
18:22:05:Saving configuration to config.xml
18:22:05:<config>
18:22:05:  <!-- Network -->
18:22:05:  <proxy v=':8080'/>
18:22:05:
18:22:05:  <!-- Slot Control -->
18:22:05:  <power v='full'/>
18:22:05:
18:22:05:  <!-- User Information -->
18:22:05:  <passkey v='********************************'/>
18:22:05:  <team v='37761'/>
18:22:05:  <user v='iBozz'/>
18:22:05:
18:22:05:  <!-- Folding Slots -->
18:22:05:  <slot id='0' type='CPU'>
18:22:05:    <client-type v='advanced'/>
18:22:05:    <paused v='true'/>
18:22:05:  </slot>
18:22:05:</config>
18:22:39:FS00:Unpaused
18:22:39:WU00:FS00:Starting
18:22:39:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/web.stanford.edu/~pande/OSX/AMD64/Core_a4.fah/FahCore_a4" -dir 00 -suffix 01 -version 704 -lifeline 932 -checkpoint 15 -np 8
18:22:39:WU00:FS00:Started FahCore on PID 954
18:22:39:WU00:FS00:Core PID:955
18:22:39:WU00:FS00:FahCore 0xa4 started
18:22:43:WARNING:WU00:FS00:FahCore returned: MISSING_WORK_FILES (116 = 0x74)
18:22:43:WARNING:WU00:FS00:Fatal error, dumping
18:22:44:WU00:FS00:Sending unit results: id:00 state:SEND error:DUMPED project:9016 run:33 clone:10 gen:50 core:0xa4 unit:0x00000049ab40417c554e9d242605f907
18:22:44:WARNING:WU00:FS00:Missing original Unit data, cannot send dump report
18:22:44:WU00:FS00:Cleaning up
18:22:45:WU00:FS00:Connecting to 171.67.108.200:8080
18:22:47:WU00:FS00:Assigned to work server 155.247.166.220
18:22:47:WU00:FS00:Requesting new work unit for slot 00: READY cpu:8 from 155.247.166.220
18:22:47:WU00:FS00:Connecting to 155.247.166.220:8080
18:22:48:WU00:FS00:Downloading 200.37KiB
18:22:48:WU00:FS00:Download complete
18:22:49:WU00:FS00:Received Unit: id:00 state:DOWNLOAD error:NO_ERROR project:6390 run:16 clone:8 gen:12 core:0xa4 unit:0x0000000d0002894c54175684dcd76f78
18:22:49:WU00:FS00:Starting
18:22:49:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper "/Library/Application Support/FAHClient/cores/web.stanford.edu/~pande/OSX/AMD64/Core_a4.fah/FahCore_a4" -dir 00 -suffix 01 -version 704 -lifeline 932 -checkpoint 15 -np 8
18:22:49:WU00:FS00:Started FahCore on PID 956
18:22:49:WU00:FS00:Core PID:957
18:22:49:WU00:FS00:FahCore 0xa4 started
18:22:49:WU00:FS00:0xa4:
18:22:49:WU00:FS00:0xa4:*------------------------------*
18:22:49:WU00:FS00:0xa4:Folding@Home Gromacs Core
18:22:49:WU00:FS00:0xa4:Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
18:22:49:WU00:FS00:0xa4:
18:22:49:WU00:FS00:0xa4:Preparing to commence simulation
18:22:49:WU00:FS00:0xa4:- Looking at optimizations...
18:22:49:WU00:FS00:0xa4:- Created dyn
18:22:49:WU00:FS00:0xa4:- Files status OK
18:22:49:WU00:FS00:0xa4:- Expanded 204671 -> 431956 (decompressed 211.0 percent)
18:22:49:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=204671 data_size=431956, decompressed_data_size=431956 diff=0
18:22:49:WU00:FS00:0xa4:- Digital signature verified
18:22:49:WU00:FS00:0xa4:
18:22:49:WU00:FS00:0xa4:Project: 6390 (Run 16, Clone 8, Gen 12)
18:22:49:WU00:FS00:0xa4:
18:22:49:WU00:FS00:0xa4:Assembly optimizations on if available.
18:22:49:WU00:FS00:0xa4:Entering M.D.
18:22:55:WU00:FS00:0xa4:Mapping NT from 8 to 8 
18:22:56:WU00:FS00:0xa4:Completed 0 out of 2500000 steps  (0%)
18:23:04:Removing old file 'configs/config-20140805-215020.xml'
18:23:04:Saving configuration to config.xml
18:23:04:<config>
18:23:04:  <!-- Network -->
18:23:04:  <proxy v=':8080'/>
18:23:04:
18:23:04:  <!-- Slot Control -->
18:23:04:  <power v='full'/>
18:23:04:
18:23:04:  <!-- User Information -->
18:23:04:  <passkey v='********************************'/>
18:23:04:  <team v='37761'/>
18:23:04:  <user v='iBozz'/>
18:23:04:
18:23:04:  <!-- Folding Slots -->
18:23:04:  <slot id='0' type='CPU'>
18:23:04:    <client-type v='advanced'/>
18:23:04:  </slot>
18:23:04:</config>
18:25:44:WU00:FS00:0xa4:Completed 25000 out of 2500000 steps  (1%)
18:28:36:WU00:FS00:0xa4:Completed 50000 out of 2500000 steps  (2%)

I've deleted the work unit and downloaded another (Project: 6390, Run 16, Clone 8, Gen 12), which is working just fine.

I hope that the log helps someone and also that I have posted in the right place!

Re: p9015 INTERRUPTED (102 = 0x66)

Post by bruce »

Hmmm. Someone else attempted to fold Project: 9016 (Run 33, Clone 10, Gen 50) and got a failure message. Most likely it's a "bad WU".

Hi xxx (team xxx),
Your WU (P9016 R33 C10 G50) was added to the stats database on 2015-07-30 19:08:43 for 0 points of credit.
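When reporting a suspect WU, it helps to quote the Project/Run/Clone/Gen in the short form used in the stats message above. A small Python sketch that pulls it out of a client log line follows; `short_prcg` is a hypothetical helper name, and the pattern is based only on the `Project: ... (Run ..., Clone ..., Gen ...)` lines shown in this thread.

```python
import re

# Matches the core's log line, e.g.
#   18:19:34:WU00:FS00:0xa4:Project: 9016 (Run 33, Clone 10, Gen 50)
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def short_prcg(line):
    """Return the WU identifier as 'P<project> R<run> C<clone> G<gen>', or None."""
    match = PRCG_RE.search(line)
    if match is None:
        return None
    project, run, clone, gen = match.groups()
    return f"P{project} R{run} C{clone} G{gen}"
```

For the failing unit in the log above, this gives the same `P9016 R33 C10 G50` string that appears in the stats message.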