WU has reset overnight


Redamancy
Posts: 3
Joined: Mon Mar 27, 2017 6:24 am

WU has reset overnight

Post by Redamancy »

Hello!
I recently started folding on a decent Acer laptop that I wasn't using for anything else. Folding went well until yesterday, when I paused it as usual and shut down the computer. Normally, when I start the laptop the next morning, the WU resumes where it stopped and keeps folding. But when I started it today, the entire WU had reset. It's a shame, since this was a big WU worth 3300 points (for this laptop, that's big :P) and it was about 50% done. Is this a common thing? I would like to know if there's any solution.
The log is posted below:


*********************** Log Started 2017-03-27T06:18:12Z ***********************
06:18:12:************************* Folding@home Client *************************
06:18:12:      Website: http://folding.stanford.edu/
06:18:12:    Copyright: (c) 2009-2014 Stanford University
06:18:12:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
06:18:12:         Args: 
06:18:12:       Config: C:/Users/Jakob/AppData/Roaming/FAHClient/config.xml
06:18:12:******************************** Build ********************************
06:18:12:      Version: 7.4.4
06:18:12:         Date: Mar 4 2014
06:18:12:         Time: 20:26:54
06:18:12:      SVN Rev: 4130
06:18:12:       Branch: fah/trunk/client
06:18:12:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
06:18:12:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
06:18:12:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
06:18:12:     Platform: win32 XP
06:18:12:         Bits: 32
06:18:12:         Mode: Release
06:18:12:******************************* System ********************************
06:18:12:          CPU: Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz
06:18:12:       CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
06:18:12:         CPUs: 4
06:18:12:       Memory: 3.82GiB
06:18:12:  Free Memory: 2.56GiB
06:18:12:      Threads: WINDOWS_THREADS
06:18:12:   OS Version: 6.1
06:18:12:  Has Battery: true
06:18:12:   On Battery: true
06:18:12:   UTC Offset: 2
06:18:12:          PID: 3564
06:18:12:          CWD: C:/Users/Jakob/AppData/Roaming/FAHClient
06:18:12:           OS: Windows 7 Professional
06:18:12:      OS Arch: AMD64
06:18:12:         GPUs: 0
06:18:12:         CUDA: Not detected
06:18:12:Win32 Service: false
06:18:12:***********************************************************************
06:18:12:<config>
06:18:12:  <!-- Network -->
06:18:12:  <proxy v=':8080'/>
06:18:12:
06:18:12:  <!-- Slot Control -->
06:18:12:  <pause-on-battery v='false'/>
06:18:12:  <power v='full'/>
06:18:12:
06:18:12:  <!-- User Information -->
06:18:12:  <team v='143016'/>
06:18:12:  <user v='Redamancy'/>
06:18:12:
06:18:12:  <!-- Folding Slots -->
06:18:12:  <slot id='0' type='CPU'/>
06:18:12:</config>
06:18:12:Trying to access database...
06:18:15:Successfully acquired database lock
06:18:15:Enabled folding slot 00: READY cpu:4
06:18:15:WU00:FS00:Starting
06:18:15:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Jakob/AppData/Roaming/FAHClient/cores/fahwebx.stanford.edu/cores/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 704 -lifeline 3564 -checkpoint 15 -np 4
06:18:21:WU00:FS00:Started FahCore on PID 4776
06:18:21:WU00:FS00:Core PID:4788
06:18:21:WU00:FS00:FahCore 0xa4 started
06:18:23:WU00:FS00:0xa4:
06:18:23:WU00:FS00:0xa4:*------------------------------*
06:18:23:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
06:18:23:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
06:18:23:WU00:FS00:0xa4:
06:18:23:WU00:FS00:0xa4:Preparing to commence simulation
06:18:23:WU00:FS00:0xa4:- Looking at optimizations...
06:18:23:WU00:FS00:0xa4:- Files status OK
06:18:24:WU00:FS00:0xa4:- Expanded 1948126 -> 6261824 (decompressed 321.4 percent)
06:18:24:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=1948126 data_size=6261824, decompressed_data_size=6261824 diff=0
06:18:24:WU00:FS00:0xa4:- Digital signature verified
06:18:24:WU00:FS00:0xa4:
06:18:24:WU00:FS00:0xa4:Project: 11660 (Run 91, Clone 3, Gen 93)
06:18:24:WU00:FS00:0xa4:
06:18:24:WU00:FS00:0xa4:Assembly optimizations on if available.
06:18:24:WU00:FS00:0xa4:Entering M.D.
06:18:30:WU00:FS00:0xa4:Using Gromacs checkpoints
06:18:30:WU00:FS00:0xa4:Mapping NT from 4 to 4 
06:18:36:WU00:FS00:0xa4:Resuming from checkpoint
06:18:36:WU00:FS00:0xa4:Verified 00/wudata_01.log
06:18:37:WU00:FS00:0xa4:Verified 00/wudata_01.trr
06:18:38:WU00:FS00:0xa4:File 00/wudata_01.xtc has changed since last checkpoint
06:18:38:WU00:FS00:0xa4:mdrun returned 3
06:18:40:WU00:FS00:0xa4:Gromacs detected an invalid checkpoint.  Restarting...
06:18:40:WU00:FS00:0xa4:Folding@home Core Shutdown: UNKNOWN_ERROR
06:18:41:WARNING:WU00:FS00:FahCore returned: CORE_RESTART (98 = 0x62)
06:18:41:WU00:FS00:Starting
06:18:41:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Jakob/AppData/Roaming/FAHClient/cores/fahwebx.stanford.edu/cores/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 704 -lifeline 3564 -checkpoint 15 -np 4
06:18:41:WU00:FS00:Started FahCore on PID 3732
06:18:41:WU00:FS00:Core PID:2688
06:18:41:WU00:FS00:FahCore 0xa4 started
06:18:41:WU00:FS00:0xa4:
06:18:41:WU00:FS00:0xa4:*------------------------------*
06:18:41:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
06:18:41:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
06:18:41:WU00:FS00:0xa4:
06:18:41:WU00:FS00:0xa4:Preparing to commence simulation
06:18:41:WU00:FS00:0xa4:- Looking at optimizations...
06:18:42:WU00:FS00:0xa4:- Created dyn
06:18:42:WU00:FS00:0xa4:- Files status OK
06:18:42:WU00:FS00:0xa4:- Expanded 1948126 -> 6261824 (decompressed 321.4 percent)
06:18:42:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=1948126 data_size=6261824, decompressed_data_size=6261824 diff=0
06:18:42:WU00:FS00:0xa4:- Digital signature verified
06:18:42:WU00:FS00:0xa4:
06:18:42:WU00:FS00:0xa4:Project: 11660 (Run 91, Clone 3, Gen 93)
06:18:42:WU00:FS00:0xa4:
06:18:42:WU00:FS00:0xa4:Assembly optimizations on if available.
06:18:42:WU00:FS00:0xa4:Entering M.D.
06:18:48:WU00:FS00:0xa4:Mapping NT from 4 to 4 
06:18:53:WU00:FS00:0xa4:Completed 0 out of 1250000 steps  (0%)
Redamancy
Posts: 3
Joined: Mon Mar 27, 2017 6:24 am

Re: WU has reset overnight

Post by Redamancy »

I noticed this after enabling the errors and warnings filter in my log.

*********************** Log Started 2017-03-27T06:18:12Z ***********************
06:18:41:WARNING:WU00:FS00:FahCore returned: CORE_RESTART (98 = 0x62)

That is probably what reset the core. Does anyone know what the cause could be?
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: WU has reset overnight

Post by Joe_H »

Welcome to the folding support forum.

The important part of the log is a couple of lines earlier:


06:18:38:WU00:FS00:0xa4:File 00/wudata_01.xtc has changed since last checkpoint
06:18:38:WU00:FS00:0xa4:mdrun returned 3
06:18:40:WU00:FS00:0xa4:Gromacs detected an invalid checkpoint.  Restarting...
06:18:40:WU00:FS00:0xa4:Folding@home Core Shutdown: UNKNOWN_ERROR
Something corrupted part of the checkpoint written before you shut down the laptop. It could be as simple as not allowing enough time between pausing folding and shutting down, so that the checkpoint files were not completely written to disk first. Windows is supposed to wait long enough for this to happen during a shutdown, but from personal experience that does not always happen. In this case, without a valid checkpoint to start from, the client restarted from the beginning.

There are other possible causes of file corruption, for example a failing drive. The best way to avoid this in the future is to pause folding and wait a minute or two before shutting down.
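
For anyone who wants to check their own log for the same symptom, here is a minimal sketch in Python. It assumes the `HH:MM:SS:` line prefix and the two message strings seen in the excerpt above; the function name `find_checkpoint_failures` is made up for illustration and is not part of any FAH tool.

```python
import re

# Assumed line format, taken from the log excerpt in this thread: "HH:MM:SS:<message>"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}):(.*)$")

# Message fragments that appeared in this thread when the checkpoint was rejected.
MARKERS = (
    "has changed since last checkpoint",
    "detected an invalid checkpoint",
)

def find_checkpoint_failures(log_text):
    """Return (timestamp, message) pairs for lines that report a rejected checkpoint."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and any(marker in match.group(2) for marker in MARKERS):
            hits.append((match.group(1), match.group(2)))
    return hits

SAMPLE = """\
06:18:37:WU00:FS00:0xa4:Verified 00/wudata_01.trr
06:18:38:WU00:FS00:0xa4:File 00/wudata_01.xtc has changed since last checkpoint
06:18:38:WU00:FS00:0xa4:mdrun returned 3
06:18:40:WU00:FS00:0xa4:Gromacs detected an invalid checkpoint.  Restarting...
"""

for timestamp, message in find_checkpoint_failures(SAMPLE):
    print(timestamp, message)
```

To run it against a real log, read the file into a string and pass that in place of `SAMPLE`.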

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Redamancy
Posts: 3
Joined: Mon Mar 27, 2017 6:24 am

Re: WU has reset overnight

Post by Redamancy »

Drats, I figured the cause was something like this. I was tired and may have shut off the laptop too quickly.
Thanks for the reply.
SteveWillis
Posts: 409
Joined: Fri Apr 15, 2016 12:42 am
Hardware configuration: PC 1:
Linux Mint 17.3
three gtx 1080 GPUs One on a powered header
Motherboard = [MB-AM3-AS-SB-990FXR2] qty 1 Asus Sabertooth 990FX(+59.99)
CPU = [CPU-AM3-FX-8320BR] qty 1 AMD FX 8320 Eight Core 3.5GHz(+41.99)

PC2:
Linux Mint 18
Open air case
Motherboard: ASUS Crosshair V Formula-Z AM3+ AMD 990FX SATA 6Gb/s USB 3.0 ATX AMD
AMD FD6300WMHKBOX FX-6300 6-Core Processor Black Edition with Cooler Master Hyper 212 EVO - CPU Cooler with 120mm PWM Fan
three gtx 1080,
one gtx 1080 TI on a powered header

Re: WU has reset overnight

Post by SteveWillis »

Does FAHControl also write a checkpoint when paused? I was under the impression that there was no way to force a checkpoint.

1080 and 1080TI GPUs on Linux Mint
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: WU has reset overnight

Post by Joe_H »

With the GPU folding cores currently in use, that is correct: they write a checkpoint every so many steps. For the CPU cores, the checkpoint is done at the time interval set through FAHControl, but that checkpoint will be done as soon as the current iteration completes. However, with bad timing, a shutdown that happens while the checkpoint is being created can leave it partially written and therefore corrupted.
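
The "has changed since last checkpoint" message in the log suggests the core compares its output files against what it recorded at checkpoint time. The sketch below shows that general idea in Python. It is only an illustration under my own assumptions: the use of file size plus SHA-256, and the names `fingerprint`, `take_checkpoint`, and `changed_since`, are hypothetical and not how FahCore_a4 or Gromacs actually implement it.

```python
import hashlib
import os
import tempfile

def fingerprint(path):
    """Return (size in bytes, SHA-256 hex digest) for one file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()

def take_checkpoint(paths):
    """Record the state of every output file at checkpoint time."""
    return {path: fingerprint(path) for path in paths}

def changed_since(manifest):
    """List the files whose current state no longer matches the manifest."""
    return [path for path, recorded in manifest.items() if fingerprint(path) != recorded]

with tempfile.TemporaryDirectory() as workdir:
    trajectory = os.path.join(workdir, "wudata_01.xtc")
    with open(trajectory, "wb") as f:
        f.write(b"frame-1")

    manifest = take_checkpoint([trajectory])
    print(changed_since(manifest))   # empty list: file still matches the checkpoint

    # Simulate data written after the checkpoint but cut short by a shutdown.
    with open(trajectory, "ab") as f:
        f.write(b"partial")
    print(changed_since(manifest))   # the trajectory file is now reported as changed
```

A resume step built this way would refuse to continue whenever `changed_since` returns anything, which matches the behaviour seen in the log: one mismatched file was enough to reject the checkpoint.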
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: WU has reset overnight

Post by bruce »

SteveWillis wrote:Does FAHControl also write a checkpoint when paused? I was under the impression that there was no way to force a checkpoint.
As Joe suggested, FAHControl does not initiate a checkpoint when paused, but it should complete one if the checkpoint process has already started. Even once the checkpoint has been written, the data may still be in cache memory, and it's up to the OS to complete the process of storing it permanently on disk.

And, no, there is no way to force a checkpoint, so upon restart, FAH will have to reprocess whatever work has been done since the last checkpoint was written.
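
To illustrate the point about cache memory: a program can ask the OS to flush a file to disk before treating it as saved. The Python sketch below shows one common general-purpose pattern (write to a temporary file, flush, `os.fsync`, then rename over the old file). It is not a description of what the FAH cores do, and `write_durably` and the `.tmp` suffix are names I made up for the example.

```python
import os
import tempfile

def write_durably(path, data):
    """Write data so an interrupted write does not damage the existing file at path."""
    tmp_path = path + ".tmp"              # hypothetical temporary name
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()                         # move Python's buffer into the OS cache
        os.fsync(f.fileno())              # ask the OS to push its cache to the disk
    os.replace(tmp_path, path)            # swap the complete file into place

with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "state.cpt")
    write_durably(checkpoint, b"step 1000")
    write_durably(checkpoint, b"step 2000")
    with open(checkpoint, "rb") as f:
        print(f.read())
```

Even with this pattern, a shutdown between checkpoints still loses the work done since the last one, which is why pausing and waiting a minute or two remains the practical advice.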