Project: 9017 (R 548, C 11, G 0) endless restart


ChristianVirtual
Posts: 1596
Joined: Tue May 28, 2013 12:14 pm
Location: Tokyo


Post by ChristianVirtual »

On an EC2 instance running a CPU:16 slot under Ubuntu, I got this one this morning. The WU downloaded and tried to start, but failed immediately.

FahCore returned: INTERRUPTED (102 = 0x66)

Is this already known as a bad WU?


00:00:54:Assigned to work server 171.64.65.124
00:00:54:Requesting new work unit for slot 00: RUNNING cpu:16 from 171.64.65.124
00:00:54:Connecting to 171.64.65.124:8080
00:00:54:Downloading 55.77KiB
00:00:54:Download complete
00:00:54:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9017 run:548 clone:11 gen:0 core unit:0x00000000ab40417c55b26f88a3d3a766
00:01:55:Starting
00:01:55:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:01:55:Started FahCore on PID 5220
00:01:55:Core PID:5224
00:01:55:FahCore 0xa4 started
00:01:55:
00:01:55:*------------------------------*
00:01:55:Folding@Home Gromacs GB Core
00:01:55:Version 2.27 (Dec. 15, 2010)
00:01:55:
00:01:55:Preparing to commence simulation
00:01:55:- Looking at optimizations...
00:01:55:- Created dyn
00:01:55:- Files status OK
00:01:55:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:01:55:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:01:55:- Digital signature verified
00:01:55:
00:01:55:Project: 9017 (Run 548, Clone 11, Gen 0)
00:01:55:
00:01:55:Assembly optimizations on if available.
00:01:55:Entering M.D.
00:02:01:FahCore returned: INTERRUPTED (102 = 0x66)
00:02:02:Starting
00:02:02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:02:02:Started FahCore on PID 5229
00:02:02:Core PID:5233
00:02:02:FahCore 0xa4 started
00:02:02:
00:02:02:*------------------------------*
00:02:02:Folding@Home Gromacs GB Core
00:02:02:Version 2.27 (Dec. 15, 2010)
00:02:02:
00:02:02:Preparing to commence simulation
00:02:02:- Ensuring status. Please wait.
00:02:11:- Looking at optimizations...
00:02:11:- Working with standard loops on this execution.
00:02:11:- Previous termination of core was improper.
00:02:11:- Files status OK
00:02:11:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:02:11:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:02:11:- Digital signature verified
00:02:11:
00:02:11:Project: 9017 (Run 548, Clone 11, Gen 0)
00:02:11:
00:02:11:Entering M.D.
00:02:18:FahCore returned: INTERRUPTED (102 = 0x66)
00:03:02:Starting
00:03:02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:03:02:Started FahCore on PID 5238
00:03:02:Core PID:5242
00:03:02:FahCore 0xa4 started
00:03:02:
00:03:02:*------------------------------*
00:03:02:Folding@Home Gromacs GB Core
00:03:02:Version 2.27 (Dec. 15, 2010)
00:03:02:
00:03:02:Preparing to commence simulation
00:03:02:- Ensuring status. Please wait.
00:03:11:- Looking at optimizations...
00:03:11:- Working with standard loops on this execution.
00:03:11:- Previous termination of core was improper.
00:03:11:- Going to use standard loops.
00:03:11:- Files status OK
00:03:11:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:03:11:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:03:11:- Digital signature verified
00:03:11:
00:03:11:Project: 9017 (Run 548, Clone 11, Gen 0)
00:03:11:
00:03:11:Entering M.D.
00:03:18:FahCore returned: INTERRUPTED (102 = 0x66)
00:04:02:Starting
00:04:02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:04:02:Started FahCore on PID 5247
00:04:02:Core PID:5251
00:04:02:FahCore 0xa4 started
00:04:02:
00:04:02:*------------------------------*
00:04:02:Folding@Home Gromacs GB Core
00:04:02:Version 2.27 (Dec. 15, 2010)
00:04:02:
00:04:02:Preparing to commence simulation
00:04:02:- Ensuring status. Please wait.
00:04:11:- Looking at optimizations...
00:04:11:- Working with standard loops on this execution.
00:04:11:- Previous termination of core was improper.
00:04:11:- Going to use standard loops.
00:04:11:- Files status OK
00:04:12:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:04:12:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:04:12:- Digital signature verified
00:04:12:
00:04:12:Project: 9017 (Run 548, Clone 11, Gen 0)
00:04:12:
00:04:12:Entering M.D.
00:04:18:FahCore returned: INTERRUPTED (102 = 0x66)
00:05:02:Starting
00:05:02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:05:02:Started FahCore on PID 5256
00:05:02:Core PID:5260
00:05:02:FahCore 0xa4 started
00:05:02:
00:05:02:*------------------------------*
00:05:02:Folding@Home Gromacs GB Core
00:05:02:Version 2.27 (Dec. 15, 2010)
00:05:02:
00:05:02:Preparing to commence simulation
00:05:02:- Ensuring status. Please wait.
00:05:11:- Looking at optimizations...
00:05:11:- Working with standard loops on this execution.
00:05:11:- Previous termination of core was improper.
00:05:11:- Going to use standard loops.
00:05:11:- Files status OK
00:05:12:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:05:12:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:05:12:- Digital signature verified
00:05:12:
00:05:12:Project: 9017 (Run 548, Clone 11, Gen 0)
00:05:12:
00:05:12:Entering M.D.
00:05:18:FahCore returned: INTERRUPTED (102 = 0x66)
00:06:02:Starting
00:06:02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:06:02:Started FahCore on PID 5267
00:06:02:Core PID:5271
00:06:02:FahCore 0xa4 started
00:06:02:
00:06:02:*------------------------------*
00:06:02:Folding@Home Gromacs GB Core
00:06:02:Version 2.27 (Dec. 15, 2010)
00:06:02:
00:06:02:Preparing to commence simulation
00:06:02:- Ensuring status. Please wait.
00:06:11:- Looking at optimizations...
00:06:11:- Working with standard loops on this execution.
00:06:11:- Previous termination of core was improper.
00:06:11:- Going to use standard loops.
00:06:11:- Files status OK
00:06:12:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:06:12:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:06:12:- Digital signature verified
00:06:12:
00:06:12:Project: 9017 (Run 548, Clone 11, Gen 0)
00:06:12:
00:06:12:Entering M.D.
00:06:18:FahCore returned: INTERRUPTED (102 = 0x66)
00:07:02:Starting
00:07:02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:07:02:Started FahCore on PID 5276
00:07:02:Core PID:5280
00:07:02:FahCore 0xa4 started
00:07:03:
00:07:03:*------------------------------*
00:07:03:Folding@Home Gromacs GB Core
00:07:03:Version 2.27 (Dec. 15, 2010)
00:07:03:
00:07:03:Preparing to commence simulation
00:07:03:- Ensuring status. Please wait.
00:07:12:- Looking at optimizations...
00:07:12:- Working with standard loops on this execution.
00:07:12:- Previous termination of core was improper.
00:07:12:- Going to use standard loops.
00:07:12:- Files status OK
00:07:12:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:07:12:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:07:12:- Digital signature verified
00:07:12:
00:07:12:Project: 9017 (Run 548, Clone 11, Gen 0)
00:07:12:
00:07:12:Entering M.D.
00:07:18:FahCore returned: INTERRUPTED (102 = 0x66)
00:08:02:Starting
00:08:02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:08:02:Started FahCore on PID 5285
00:08:02:Core PID:5289
00:08:02:FahCore 0xa4 started
00:08:03:
00:08:03:*------------------------------*
00:08:03:Folding@Home Gromacs GB Core
00:08:03:Version 2.27 (Dec. 15, 2010)
00:08:03:
00:08:03:Preparing to commence simulation
00:08:03:- Ensuring status. Please wait.
00:08:12:- Looking at optimizations...
00:08:12:- Working with standard loops on this execution.
00:08:12:- Previous termination of core was improper.
00:08:12:- Going to use standard loops.
00:08:12:- Files status OK
00:08:12:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:08:12:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:08:12:- Digital signature verified
00:08:12:
00:08:12:Project: 9017 (Run 548, Clone 11, Gen 0)
00:08:12:
00:08:12:Entering M.D.
00:08:18:FahCore returned: INTERRUPTED (102 = 0x66)
00:09:02:Starting
00:09:02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:09:02:Started FahCore on PID 5294
00:09:02:Core PID:5298
00:09:02:FahCore 0xa4 started
00:09:03:
00:09:03:*------------------------------*
00:09:03:Folding@Home Gromacs GB Core
00:09:03:Version 2.27 (Dec. 15, 2010)
00:09:03:
00:09:03:Preparing to commence simulation
00:09:03:- Ensuring status. Please wait.
00:09:12:- Looking at optimizations...
00:09:12:- Working with standard loops on this execution.
00:09:12:Examination of work files indicates 8 consecutive improper terminations of core.
00:09:12:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:09:12:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:09:12:- Digital signature verified
00:09:12:
00:09:12:Project: 9017 (Run 548, Clone 11, Gen 0)
00:09:12:
00:09:12:Entering M.D.
00:09:18:FahCore returned: INTERRUPTED (102 = 0x66)
00:10:02:Starting
00:10:02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:10:02:Started FahCore on PID 5303
00:10:02:Core PID:5307
00:10:02:FahCore 0xa4 started
00:10:03:
00:10:03:*------------------------------*
00:10:03:Folding@Home Gromacs GB Core
00:10:03:Version 2.27 (Dec. 15, 2010)
00:10:03:
00:10:03:Preparing to commence simulation
00:10:03:- Ensuring status. Please wait.
00:10:12:- Looking at optimizations...
00:10:12:- Working with standard loops on this execution.
00:10:12:Examination of work files indicates 8 consecutive improper terminations of core.
00:10:12:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:10:12:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:10:12:- Digital signature verified
00:10:12:
00:10:12:Project: 9017 (Run 548, Clone 11, Gen 0)
00:10:12:
00:10:12:Entering M.D.
00:10:18:FahCore returned: INTERRUPTED (102 = 0x66)
00:11:02:Starting
00:11:02:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:11:02:Started FahCore on PID 5312
00:11:02:Core PID:5316
00:11:02:FahCore 0xa4 started
00:11:03:
00:11:03:*------------------------------*
00:11:03:Folding@Home Gromacs GB Core
00:11:03:Version 2.27 (Dec. 15, 2010)
00:11:03:
00:11:03:Preparing to commence simulation
00:11:03:- Ensuring status. Please wait.
00:11:12:- Looking at optimizations...
00:11:12:- Working with standard loops on this execution.
00:11:12:Examination of work files indicates 8 consecutive improper terminations of core.
00:11:12:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:11:12:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:11:12:- Digital signature verified
00:11:12:
00:11:12:Project: 9017 (Run 548, Clone 11, Gen 0)
00:11:12:
00:11:12:Entering M.D.
00:11:18:FahCore returned: INTERRUPTED (102 = 0x66)
00:11:48:Paused
00:12:24:Saving configuration to /etc/fahclient/config.xml
00:12:24:<config>
00:12:24: <!-- Folding Slot Configuration -->
00:12:24: <client-type v='beta'/>
00:12:24: <gpu v='false'/>
00:12:24:
00:12:24: <!-- Network -->
00:12:24: <proxy v=':8080'/>
00:12:24:
00:12:24: <!-- Slot Control -->
00:12:24: <pause-on-start v='true'/>
00:12:24: <power v='full'/>
00:12:24:
00:12:24: <!-- User Information -->
00:12:24: <team v='33'/>
00:12:24: <user v='ChristianVirtual'/>
00:12:24:
00:12:24: <!-- Folding Slots -->
00:12:24: <slot id='0' type='CPU'>
00:12:24: <paused v='true'/>
00:12:24: </slot>
00:12:24:</config>
00:12:38:Unpaused
00:12:38:Starting
00:12:38:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:12:38:Started FahCore on PID 5321
00:12:38:Core PID:5325
00:12:38:FahCore 0xa4 started
00:12:38:
00:12:38:*------------------------------*
00:12:38:Folding@Home Gromacs GB Core
00:12:38:Version 2.27 (Dec. 15, 2010)
00:12:38:
00:12:38:Preparing to commence simulation
00:12:38:- Ensuring status. Please wait.
00:12:47:- Looking at optimizations...
00:12:47:- Working with standard loops on this execution.
00:12:47:Examination of work files indicates 8 consecutive improper terminations of core.
00:12:48:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:12:48:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:12:48:- Digital signature verified
00:12:48:
00:12:48:Project: 9017 (Run 548, Clone 11, Gen 0)
00:12:48:
00:12:48:Entering M.D.
00:12:54:FahCore returned: INTERRUPTED (102 = 0x66)
00:13:25:Saving configuration to /etc/fahclient/config.xml
00:13:25:<config>
00:13:25: <!-- Folding Slot Configuration -->
00:13:25: <client-type v='beta'/>
00:13:25: <gpu v='false'/>
00:13:25:
00:13:25: <!-- Network -->
00:13:25: <proxy v=':8080'/>
00:13:25:
00:13:25:
00:13:25: <!-- Slot Control -->
00:13:25: <pause-on-start v='true'/>
00:13:25: <power v='full'/>
00:13:25:
00:13:25: <!-- User Information -->
00:13:25: <team v='33'/>
00:13:25: <user v='ChristianVirtual'/>
00:13:25:
00:13:25: <!-- Folding Slots -->
00:13:25: <slot id='0' type='CPU'/>
00:13:25:</config>
00:13:38:Starting
00:13:38:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/beta/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 704 -lifeline 1742 -checkpoint 15 -np 16
00:13:38:Started FahCore on PID 5330
00:13:38:Core PID:5334
00:13:38:FahCore 0xa4 started
00:13:38:
00:13:38:*------------------------------*
00:13:38:Folding@Home Gromacs GB Core
00:13:38:Version 2.27 (Dec. 15, 2010)
00:13:38:
00:13:38:Preparing to commence simulation
00:13:38:- Ensuring status. Please wait.
00:13:47:- Looking at optimizations...
00:13:47:- Working with standard loops on this execution.
00:13:47:Examination of work files indicates 8 consecutive improper terminations of core.
00:13:48:- Expanded 56596 -> 266240 (decompressed 470.4 percent)
00:13:48:Called DecompressByteArray: compressed_data_size=56596 data_size=266240, decompressed_data_size=266240 diff=0
00:13:48:- Digital signature verified
00:13:48:
00:13:48:Project: 9017 (Run 548, Clone 11, Gen 0)
00:13:48:
00:13:48:Entering M.D.
00:13:54:FahCore returned: INTERRUPTED (102 = 0x66)
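A restart loop like the one above can be spotted mechanically by counting how often the core returns INTERRUPTED (0x66) for the same WU. The following is a minimal Python sketch, assuming the exact message wording seen in this log (`Project: N (Run N, Clone N, Gen N)` and `FahCore returned: INTERRUPTED (102 = 0x66)`). The function names and the default threshold of 3 are hypothetical and are not part of FAHClient.

```python
import re
from collections import Counter

# Patterns copied from the log lines above; other client/core versions
# may word these messages differently (assumption).
WU_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
INTERRUPTED = "FahCore returned: INTERRUPTED (102 = 0x66)"


def count_interrupts(lines):
    """Map (project, run, clone, gen) to the number of INTERRUPTED (0x66)
    returns seen while that WU was the most recently announced one."""
    counts = Counter()
    current = None
    for line in lines:
        match = WU_RE.search(line)
        if match:
            # The core prints the WU identity on every (re)start.
            current = tuple(int(g) for g in match.groups())
        elif INTERRUPTED in line and current is not None:
            counts[current] += 1
    return counts


def looping_units(lines, threshold=3):
    """Return WUs whose interrupt count reaches a (hypothetical) threshold."""
    return sorted(wu for wu, n in count_interrupts(lines).items() if n >= threshold)
```

To use it, pass the lines of a saved client log, for example `looping_units(open(path))` where `path` is wherever your log lives. Counting per WU rather than in total keeps one bad unit from being confused with a core or machine problem that affects every WU.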
Please contribute your logs to http://ppd.fahmm.net
Joe_H
Site Admin
Posts: 7870
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Project: 9017 (R 548, C 11, G 0) endless restart

Post by Joe_H »

No reports in the database so far for this WU. The download size for this WU is small though in comparison to ones my systems have received from this project.
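The size figures Joe_H compares can be read directly from the client log. Here is a minimal Python sketch, assuming the `Downloading ...KiB` and `Expanded A -> B (decompressed P percent)` wording shown in the log above. The helper names are hypothetical. For this WU, 266240 / 56596 ≈ 4.704, which matches the reported 470.4 percent, and the 56596-byte compressed payload is about 55.27 KiB.

```python
import re

# Wording taken from the log above (assumption: stable across core versions).
EXPANDED_RE = re.compile(r"Expanded (\d+) -> (\d+) \(decompressed ([\d.]+) percent\)")
DOWNLOAD_RE = re.compile(r"Downloading ([\d.]+)KiB")


def expansion(line):
    """Parse an 'Expanded A -> B (decompressed P percent)' line.

    Returns (compressed, expanded, reported_percent, recomputed_percent),
    or None if the line does not match.
    """
    m = EXPANDED_RE.search(line)
    if m is None:
        return None
    compressed, expanded = int(m.group(1)), int(m.group(2))
    reported = float(m.group(3))
    # Recompute the ratio so it can be checked against the reported value.
    recomputed = round(100.0 * expanded / compressed, 1)
    return compressed, expanded, reported, recomputed


def download_bytes(line):
    """Convert a 'Downloading N KiB' size to bytes (1 KiB = 1024 bytes)."""
    m = DOWNLOAD_RE.search(line)
    return None if m is None else round(float(m.group(1)) * 1024)
```

Logging these two numbers for each WU would make an unusually small download stand out next to other WUs from the same project.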

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
kkemball
Posts: 3
Joined: Fri Dec 07, 2012 5:10 pm

Re: Project: 9017 (R 548, C 11, G 0) endless restart

Post by kkemball »

I've been getting exactly the same thing for more than 24 hours now on this machine (iMac 11.2), but with a different WU: Project: 9014 (Run 110, Clone 7, Gen 79).

This Friday afternoon I started getting the same thing on a MacBook Pro, with a different WU altogether!

So now both machines are looping, and the last WU completed was on Friday at 3 PM.

Any ideas?

Thanks,
Kevin
ChristianVirtual
Posts: 1596
Joined: Tue May 28, 2013 12:14 pm
Location: Tokyo

Re: Project: 9017 (R 548, C 11, G 0) endless restart

Post by ChristianVirtual »

What I did was delete the slot, let FAH dump the WU in a controlled manner, then recreate the slot and continue working. There's not much more we can do in that situation beyond reporting it, which you've also done. My guess: a bad WU.
ChristianVirtual
Posts: 1596
Joined: Tue May 28, 2013 12:14 pm
Location: Tokyo

Re: Project: 9017 (R 548, C 11, G 0) endless restart

Post by ChristianVirtual »

Additional info on the download size of the WU:


00:00:53:WU01:FS00:Connecting to 171.67.108.200:8080
00:00:54:WU01:FS00:Assigned to work server 171.64.65.124
00:00:54:WU01:FS00:Requesting new work unit for slot 00: RUNNING cpu:16 from 171.64.65.124
00:00:54:WU01:FS00:Connecting to 171.64.65.124:8080
00:00:54:WU01:FS00:Downloading 55.77KiB
00:00:54:WU01:FS00:Download complete
00:00:54:WU01:FS00:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9017 run:548 clone:11 gen:0 core:0xa4 unit:0x00000000ab40417c55b26f88a3d3a766
00:01:39:WU00:FS00:0xa4:Completed 250000 out of 250000 steps (100%)
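In this excerpt each line carries a `WUnn:FSnn:` prefix, so messages from the finishing WU00 and the newly downloaded WU01 are interleaved. A minimal Python sketch for separating them is shown below. It assumes the prefix layout visible above (time, `WU` id, `FS` slot id, and an optional core tag such as `0xa4`). The names `LogLine`, `parse` and `messages_for` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Prefix layout as seen above: HH:MM:SS:WUnn:FSnn:[0xNN:]message (assumption).
PREFIX_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU(\d+):FS(\d+):(?:(0x[0-9a-f]+):)?(.*)$"
)


class LogLine(NamedTuple):
    time: str
    wu: int
    slot: int
    core: Optional[str]  # e.g. '0xa4' when the core itself emitted the line
    message: str


def parse(line: str) -> Optional[LogLine]:
    """Split a prefixed log line into its fields; None if unprefixed."""
    m = PREFIX_RE.match(line.rstrip("\n"))
    if m is None:
        return None  # e.g. lines from a log without WU/FS prefixes
    time, wu, slot, core, message = m.groups()
    return LogLine(time, int(wu), int(slot), core, message)


def messages_for(lines, wu: int):
    """Yield only the messages that belong to one work-unit id."""
    for line in lines:
        parsed = parse(line)
        if parsed is not None and parsed.wu == wu:
            yield parsed.message
```

Filtering on the WU id rather than on timestamps matters here, because WU00 reported its final step (00:01:39) after WU01 had already been downloaded (00:00:54).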
sryckbos
Pande Group Member
Posts: 116
Joined: Wed Jun 26, 2013 10:23 pm

Re: Project: 9017 (R 548, C 11, G 0) endless restart

Post by sryckbos »

Thanks for the heads up. I believe we've gotten to the source of this problem (vs the temporary fixes of the last few weeks). Sorry about all the headache!

Steven