8649 (Run 238, Clone 0, Gen 39) repeated fails, dumped


parkut
Posts: 364
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA


Post by parkut

Found one of my Linux (CentOS 7.3/Q22/Q6600) machines stuck in a loop trying to start Project 8649 (Run 238, Clone 0, Gen 39); the core failed immediately on every attempt. This repeated 322 times over a five-hour period, roughly once a minute.

The issue was resolved by stopping FAHClient and deleting the work directory. On restart, the machine was immediately assigned a new WU and went back to folding as normal.

Code:

02:48:20:WU00:FS00:Starting
02:48:20:WU00:FS00:Removing old file './work/00/logfile_01-20180115-021720.txt'
02:48:20:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 00 -suffix 01 -version 704 -lifeline 28424 -checkpoint 15 -np 4
02:48:20:WU00:FS00:Started FahCore on PID 29484
02:48:20:WU00:FS00:Core PID:29488
02:48:20:WU00:FS00:FahCore 0xa4 started
02:48:20:WU00:FS00:0xa4:
02:48:20:WU00:FS00:0xa4:*------------------------------*
02:48:20:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
02:48:20:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
02:48:20:WU00:FS00:0xa4:
02:48:20:WU00:FS00:0xa4:Preparing to commence simulation
02:48:20:WU00:FS00:0xa4:- Ensuring status. Please wait.
02:48:30:WU00:FS00:0xa4:- Looking at optimizations...
02:48:30:WU00:FS00:0xa4:- Working with standard loops on this execution.
02:48:30:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
02:48:30:WU00:FS00:0xa4:- Expanded 29720 -> 535948 (decompressed 1803.3 percent)
02:48:30:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29720 data_size=535948, decompressed_data_size=535948 diff=0
02:48:30:WU00:FS00:0xa4:- Digital signature verified
02:48:30:WU00:FS00:0xa4:
02:48:30:WU00:FS00:0xa4:Project: 8649 (Run 238, Clone 0, Gen 39)
02:48:30:WU00:FS00:0xa4:
02:48:30:WU00:FS00:0xa4:Entering M.D.
02:48:36:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
02:49:20:WU00:FS00:Starting
02:49:20:WU00:FS00:Removing old file './work/00/logfile_01-20180115-021820.txt'
02:49:20:WU00:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/fahwebx.stanford.edu/cores/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 00 -suffix 01 -version 704 -lifeline 28424 -checkpoint 15 -np 4
02:49:20:WU00:FS00:Started FahCore on PID 29496
02:49:20:WU00:FS00:Core PID:29500
02:49:20:WU00:FS00:FahCore 0xa4 started
02:49:20:WU00:FS00:0xa4:
02:49:20:WU00:FS00:0xa4:*------------------------------*
02:49:20:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
02:49:20:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
02:49:20:WU00:FS00:0xa4:
02:49:20:WU00:FS00:0xa4:Preparing to commence simulation
02:49:20:WU00:FS00:0xa4:- Ensuring status. Please wait.
02:49:30:WU00:FS00:0xa4:- Looking at optimizations...
02:49:30:WU00:FS00:0xa4:- Working with standard loops on this execution.
02:49:30:WU00:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
02:49:30:WU00:FS00:0xa4:- Expanded 29720 -> 535948 (decompressed 1803.3 percent)
02:49:30:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=29720 data_size=535948, decompressed_data_size=535948 diff=0
02:49:30:WU00:FS00:0xa4:- Digital signature verified
02:49:30:WU00:FS00:0xa4:
02:49:30:WU00:FS00:0xa4:Project: 8649 (Run 238, Clone 0, Gen 39)
02:49:30:WU00:FS00:0xa4:
02:49:30:WU00:FS00:0xa4:Entering M.D.
02:49:36:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
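To spot a loop like this without scrolling through hundreds of near-identical entries, a short script can tally how many times the core exited with a given status for each project. This is a minimal Python sketch, not an official FAH tool: the function name `count_failed_starts` and both regular expressions are hypothetical, and they assume the line layout shown in the excerpt (`HH:MM:SS:WUxx:FSxx:...`). Other client versions may format their logs differently.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical pattern for the core's project line, e.g.
# "02:48:30:WU00:FS00:0xa4:Project: 8649 (Run 238, Clone 0, Gen 39)"
PROJECT_RE = re.compile(
    r":(WU\d+):FS\d+:0x[0-9a-f]+:Project: (\d+ \(Run \d+, Clone \d+, Gen \d+\))"
)
# Hypothetical pattern for the client's report of how the core exited, e.g.
# "02:48:36:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)"
RETURN_RE = re.compile(r":(WU\d+):FS\d+:FahCore returned: (\w+)")


def count_failed_starts(lines: Iterable[str], status: str = "INTERRUPTED") -> Counter:
    """Count core exits with `status`, keyed by the last project seen in each WU slot."""
    current = {}          # WU slot (e.g. "WU00") -> last project string seen there
    failures = Counter()  # project string -> number of exits with `status`
    for line in lines:
        m = PROJECT_RE.search(line)
        if m:
            current[m.group(1)] = m.group(2)
            continue
        m = RETURN_RE.search(line)
        if m and m.group(2) == status:
            # Exits seen before any project line are attributed to "unknown".
            failures[current.get(m.group(1), "unknown")] += 1
    return failures


# Demo on the relevant lines from the two iterations in the excerpt.
sample = [
    "02:48:30:WU00:FS00:0xa4:Project: 8649 (Run 238, Clone 0, Gen 39)",
    "02:48:36:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    "02:49:30:WU00:FS00:0xa4:Project: 8649 (Run 238, Clone 0, Gen 39)",
    "02:49:36:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
]
print(count_failed_starts(sample))
# Counter({'8649 (Run 238, Clone 0, Gen 39)': 2})
```

Fed a whole client log (any iterable of lines works, including an open file), a large count against a single project, like the 322 restarts here, would point to a WU worth dumping and reporting.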
toTOW
Site Moderator
Posts: 6296
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 8649 (Run 238, Clone 0, Gen 39) repeated fails, dumped

Post by toTOW

Yes, something is probably wrong with this trajectory: Gen 38 was completed on January 15th, and I see only one failure report, on the 21st... at least someone was able to report something to the server.

I marked the WU as bad.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.