Core 0x18 - 9152 WU Stuck Deserializing State.

sgstair
Posts: 6
Joined: Thu Feb 05, 2009 5:52 pm
Location: Seattle, Washington

Post by sgstair »

Here's the detailed log for my experience with Project 9152 (Run 17, Clone 3, Gen 125).
It was stuck without making progress for a few days before I noticed it. Restarting it had no effect.
I renamed the work folder and another WU has started folding successfully, so I think there's a problem with this specific WU.

Portion of the detailed log:

Code:

[ Entering Init ]
  Launch time: 2016.03.27  10:5:26
  Arguments passed: -dir 01 -suffix 01 -version 704 -lifeline 146528 -checkpoint 15 -gpu 0 -gpu-vendor nvidia 
[ Leaving  Init ]
[ Entering Main ]
  Reading core settings...
  Total number of steps: 2500000
  XTC write frequency: 100000
[ Initializing Core Contexts ]
  Using platform OpenCL
  Looking for vendor: nvidia...found on platformId 1
setting Context Property: OpenCLDeviceIndex to 0
setting Context Property: OpenCLPlatformIndex to 1
  Deserializing System...
  Setting up Force Groups:
    Group 0: Everything Else
    Group 1: Nonbonded Direct Space
    Group 2: Nonbonded Reciprocal Space
  Found MonteCarloBarostat @ 1 (default) Bar, 300 Kelvin, 25 pressure change frequency.
    Found: 33142 atoms, 6 forces.
  Deserializing State...WARNING:Console control signal 1 on PID 148212

Exiting, please wait. . .

Project: 9152 (Run 17, Clone 3, Gen 125)

Unit: 0x000000b9ab436c9f56623c606c67a23a

CPU: 0x00000000000000000000000000000000

Machine: 1

Reading tar file core.xml

Reading tar file integrator.xml

Reading tar file state.xml

Reading tar file system.xml

Digital signatures verified

**************************** Zeta Folding@home Core ****************************
       Type: 24
       Core: Zeta
    Website: http://folding.stanford.edu/
  Copyright: (c) 2009-2014 Stanford University
     Author: Yutong Zhao <yutong.zhao@stanford.edu>
       Args: -dir 01 -suffix 01 -version 704 -lifeline 151168 -checkpoint 15
             -gpu 0 -gpu-vendor nvidia
     Config: <none>
************************************ Build *************************************
    Version: 0.0.4
       Date: Feb 4 2015
       Time: 10:43:15
    SVN Rev: Unknown
     Branch: Unknown
   Compiler: Visual C++ 2008
    Options: $( /TP $) /nologo /EHa /wd4297 /wd4103 /Ox -arch:SSE2 /MT
   Platform: win32 7
       Bits: 32
       Mode: Release
************************************ System ************************************
        CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
     CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
       CPUs: 8
     Memory: 7.90GiB
Free Memory: 1.98GiB
    Threads: WINDOWS_THREADS
 OS Version: 6.2
Has Battery: false
 On Battery: false
 UTC Offset: -7
        PID: 150656
        CWD: C:\Users\fah\AppData\Roaming\FAHClient\work
         OS: Windows 8.1 Pro
    OS Arch: AMD64
       GPUs: 0
       CUDA: Not detected
********************************************************************************
Folding@home GPU core18

Version 0.0.4

[2] compatible platform(s):
  -- 0 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 1.2 
  NAME = Intel(R) OpenCL
  VENDOR = Intel(R) Corporation
  -- 1 --
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 1.2 CUDA 7.5.0
  NAME = NVIDIA CUDA
  VENDOR = NVIDIA Corporation

(1) device(s) found on platform 0:
  -- 0 --
  DEVICE_NAME =         Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  DEVICE_VENDOR = Intel(R) Corporation
  DEVICE_VERSION = OpenCL 1.2 (Build 75658)

(2) device(s) found on platform 1:
  -- 0 --
  DEVICE_NAME = GeForce GTX 970
  DEVICE_VENDOR = NVIDIA Corporation
  DEVICE_VERSION = OpenCL 1.2 CUDA

  -- 1 --
  DEVICE_NAME = GeForce GTX 970
  DEVICE_VENDOR = NVIDIA Corporation
  DEVICE_VERSION = OpenCL 1.2 CUDA

[ Entering Init ]
  Launch time: 2016.03.28  11:20:6
  Arguments passed: -dir 01 -suffix 01 -version 704 -lifeline 151168 -checkpoint 15 -gpu 0 -gpu-vendor nvidia 
[ Leaving  Init ]
[ Entering Main ]
  Reading core settings...
  Total number of steps: 2500000
  XTC write frequency: 100000
[ Initializing Core Contexts ]
  Using platform OpenCL
  Looking for vendor: nvidia...found on platformId 1
setting Context Property: OpenCLDeviceIndex to 0
setting Context Property: OpenCLPlatformIndex to 1
  Deserializing System...
  Setting up Force Groups:
    Group 0: Everything Else
    Group 1: Nonbonded Direct Space
    Group 2: Nonbonded Reciprocal Space
  Found MonteCarloBarostat @ 1 (default) Bar, 300 Kelvin, 25 pressure change frequency.
    Found: 33142 atoms, 6 forces.
  Deserializing State...

Logs shown in FAHControl:

Code:

07:20:12:WU01:FS01:Connecting to 171.67.108.45:80
07:20:12:WU01:FS01:Assigned to work server 171.67.108.159
07:20:12:WU01:FS01:Requesting new work unit for slot 01: RUNNING gpu:0:GM204 [GeForce GTX 970] from 171.67.108.159
07:20:12:WU01:FS01:Connecting to 171.67.108.159:8080
07:20:13:WU01:FS01:Downloading 18.25MiB
07:20:17:WU01:FS01:Download complete
07:20:17:WU01:FS01:Received Unit: id:01 state:DOWNLOAD error:NO_ERROR project:9152 run:17 clone:3 gen:125 core:0x18 unit:0x000000b9ab436c9f56623c606c67a23a

Code:

******************************* Date: 2016-04-28 *******************************
18:18:56:FS01:Paused
18:18:56:FS01:Shutting core down
18:18:56:WU01:FS01:0x18:WARNING:Console control signal 1 on PID 148212
18:18:56:WU01:FS01:0x18:Exiting, please wait. . .
18:19:57:WARNING:FS01:Killing WU01
18:19:57:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
18:20:05:FS01:Unpaused
18:20:05:WU01:FS01:Starting
18:20:05:WU01:FS01:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/fah/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/NVIDIA/Fermi/Core_18.fah/FahCore_18.exe -dir 01 -suffix 01 -version 704 -lifeline 9968 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
18:20:05:WU01:FS01:Started FahCore on PID 151168
18:20:05:WU01:FS01:Core PID:150656
18:20:05:WU01:FS01:FahCore 0x18 started
18:20:05:WU01:FS01:0x18:*********************** Log Started 2016-04-28T18:20:05Z ***********************
18:20:05:WU01:FS01:0x18:Project: 9152 (Run 17, Clone 3, Gen 125)
18:20:05:WU01:FS01:0x18:Unit: 0x000000b9ab436c9f56623c606c67a23a
18:20:05:WU01:FS01:0x18:CPU: 0x00000000000000000000000000000000
18:20:05:WU01:FS01:0x18:Machine: 1
18:20:05:WU01:FS01:0x18:Reading tar file core.xml
18:20:05:WU01:FS01:0x18:Reading tar file integrator.xml
18:20:05:WU01:FS01:0x18:Reading tar file state.xml
18:20:05:WU01:FS01:0x18:Reading tar file system.xml
18:20:06:WU01:FS01:0x18:Digital signatures verified
18:20:06:WU01:FS01:0x18:Folding@home GPU core18
18:20:06:WU01:FS01:0x18:Version 0.0.4
18:42:49:FS01:Paused
18:42:49:FS01:Shutting core down
18:42:49:WU01:FS01:0x18:WARNING:Console control signal 1 on PID 150656
18:42:49:WU01:FS01:0x18:Exiting, please wait. . .
18:43:50:WARNING:FS01:Killing WU01
18:43:50:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
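
A stall like the one in the log above can also be spotted automatically. The following is a minimal Python sketch, not part of FAHClient: it assumes log lines shaped like `HH:MM:SS:WUnn:FSnn:0x18:message`, as in the excerpt above, and the name `longest_core_silence` is hypothetical. It reports the longest gap between consecutive core messages.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   18:20:06:WU01:FS01:0x18:Version 0.0.4
CORE_LINE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}):WU\d+:FS\d+:0x18:(.*)$")

def longest_core_silence(lines):
    """Return (gap_seconds, last_message_before_gap) over core output lines.

    Lines that do not come from the core (for example "FS01:Paused") are ignored.
    Timestamps carry no date, so gaps are taken modulo 24 hours; a silence
    longer than one day cannot be measured this way.
    """
    prev_time = None
    prev_msg = None
    best = (0, None)
    for line in lines:
        match = CORE_LINE.match(line.strip())
        if not match:
            continue
        hours, minutes, seconds, msg = match.groups()
        now = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        if prev_time is not None:
            gap = (now - prev_time) % 86400  # tolerate a midnight rollover
            if gap > best[0]:
                best = (gap, prev_msg)
        prev_time, prev_msg = now, msg
    return best

sample = [
    "18:20:06:WU01:FS01:0x18:Folding@home GPU core18",
    "18:20:06:WU01:FS01:0x18:Version 0.0.4",
    "18:42:49:FS01:Paused",
    "18:42:49:WU01:FS01:0x18:WARNING:Console control signal 1 on PID 150656",
]
print(longest_core_silence(sample))
```

On these sample lines the core is silent for about 22 minutes after `Version 0.0.4`, which matches the point where the detailed log stops at "Deserializing State...".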
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Post by Joe_H »

Yes, it might be a bad WU. There are several failures recorded in the database, and no successful completions.
iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
stiv88
Posts: 5
Joined: Mon Aug 08, 2016 3:59 pm

Post by stiv88 »

I got the same WU.

How can I cancel/delete it? It shows 6.45 days left and an estimated PPD of 633.

EDIT: Removing and re-adding the GPU slot helped.
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Post by Joe_H »

stiv88 wrote:
I got the same WU.

How can I cancel/delete it? It shows 6.45 days left and an estimated PPD of 633.

EDIT: Removing and re-adding the GPU slot helped.

Did you get this exact WU - Project: 9152 (Run 17, Clone 3, Gen 125)? Or are you talking about another WU from Project 9152?

The first might point to a problem on the WS: this WU was last assigned last April and probably should not have been assigned to anyone. The second situation should be reported as a problem with that exact WU, not piggybacked onto a report on a different WU.
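
One way to check for an exact match is to pull the Project/Run/Clone/Gen values out of the log text and compare them. This is a minimal Python sketch, assuming the `Project: 9152 (Run 17, Clone 3, Gen 125)` line format seen in the logs in this thread; the helper name `extract_prcg` is hypothetical and not part of any Folding@home tool.

```python
import re

# Matches lines such as "Project: 9152 (Run 17, Clone 3, Gen 125)".
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

def extract_prcg(log_text):
    """Return the set of (project, run, clone, gen) tuples found in log_text."""
    return {tuple(int(n) for n in m.groups()) for m in PRCG_RE.finditer(log_text)}

# The WU reported at the top of this thread.
reported = (9152, 17, 3, 125)

sample_log = "18:20:05:WU01:FS01:0x18:Project: 9152 (Run 17, Clone 3, Gen 125)"
print(reported in extract_prcg(sample_log))
```

A WU from the same project with a different run, clone, or gen would not match, which is the case that belongs in a separate report.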