70XX project didn't support checkpoint parameter

Moderators: Site Moderators, FAHC Science Team

vmzy
Posts: 136
Joined: Wed Apr 16, 2008 6:25 am

70XX project didn't support checkpoint parameter

Post by vmzy »

I have set the checkpoint interval to 3 minutes.

Code:

<!-- FahCore Control -->
<checkpoint v='3'/>

<!-- Folding Slot Configuration -->
<extra-core-args v='-forceasm'/>
But 70xx projects seem to checkpoint only at each whole percent. They don't use the checkpoint interval setting.

Code:

15:43:51:WU01:FS00:0xa4:Completed 2200000 out of 10000000 steps  (22%)
15:53:23:Server connection id=1 ended
15:53:25:Lost lifeline PID 2148, exiting
15:53:26:FS00:Shutting core down
15:53:36:WU01:FS00:0xa4:Client no longer detected. Shutting down core 
15:53:36:WU01:FS00:0xa4:
15:53:36:WU01:FS00:0xa4:Folding@home Core Shutdown: CLIENT_DIED
15:53:36:Clean exit
You can see from the shutdown log that I quit v7 about 10 minutes after the 22% mark was reached. It should have checkpointed 3 times by then.
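
A rough way to check that count is the short Python sketch below. It assumes a timed checkpoint would be written once per full interval of wall-clock time after the last progress line, which is an assumption about the intended behavior and not confirmed core logic. The helper name `expected_checkpoints` is made up for illustration; the two timestamps are the ones from the log above.

```python
from datetime import datetime

def expected_checkpoints(last_progress: str, shutdown: str, interval_min: int) -> int:
    """Count full checkpoint intervals between two HH:MM:SS log timestamps.

    Assumes both timestamps fall on the same day and that one timed
    checkpoint is written per full interval (an assumption, see above).
    """
    fmt = "%H:%M:%S"
    elapsed = datetime.strptime(shutdown, fmt) - datetime.strptime(last_progress, fmt)
    return int(elapsed.total_seconds() // (interval_min * 60))

# 22% was logged at 15:43:51; the server connection ended at 15:53:23.
print(expected_checkpoints("15:43:51", "15:53:23", 3))  # prints 3
```

The gap is 9 minutes 32 seconds, so three full 3-minute intervals fit into it.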

Code:

01:08:43:WU01:FS00:Starting
01:08:43:WU01:FS00:Running FahCore: "D:\Program Files\FAHClient/FAHCoreWrapper.exe" D:/ProgramData/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe -dir 01 -suffix 01 -version 701 -lifeline 2520 -checkpoint 3 -np 4 -forceasm
01:08:43:WU01:FS00:Started FahCore on PID 2756
01:08:43:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
01:08:43:WU01:FS00:Core PID:2772
01:08:43:WU01:FS00:FahCore 0xa4 started
01:08:43:WU01:FS00:0xa4:
01:08:43:WU01:FS00:0xa4:*------------------------------*
01:08:43:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
01:08:44:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
01:08:44:WU01:FS00:0xa4:
01:08:44:WU01:FS00:0xa4:Preparing to commence simulation
01:08:44:WU01:FS00:0xa4:- Assembly optimizations manually forced on.
01:08:44:WU01:FS00:0xa4:- Not checking prior termination.
01:08:44:WU01:FS00:0xa4:- Expanded 49744 -> 192928 (decompressed 387.8 percent)
01:08:44:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=49744 data_size=192928, decompressed_data_size=192928 diff=0
01:08:44:WU01:FS00:0xa4:- Digital signature verified
01:08:44:WU01:FS00:0xa4:
01:08:44:WU01:FS00:0xa4:Project: 7027 (Run 1, Clone 448, Gen 14)
01:08:44:WU01:FS00:0xa4:
01:08:44:WU01:FS00:0xa4:Assembly optimizations on if available.
01:08:44:WU01:FS00:0xa4:Entering M.D.
01:08:49:WU01:FS00:0xa4:Using Gromacs checkpoints
01:08:50:WU01:FS00:0xa4:Mapping NT from 4 to 4 
01:08:50:WU01:FS00:0xa4:Resuming from checkpoint
01:08:50:WU01:FS00:0xa4:Verified 01/wudata_01.log
01:08:50:WU01:FS00:0xa4:Verified 01/wudata_01.trr
01:08:50:WU01:FS00:0xa4:Verified 01/wudata_01.xtc
01:08:50:WU01:FS00:0xa4:Verified 01/wudata_01.edr
01:08:50:WU01:FS00:0xa4:Completed 2200001 out of 10000000 steps  (22%)
But when I restarted v7, it just continued from the percent checkpoint, wasting 10 minutes of calculation. The checkpoint interval setting hadn't taken effect. Please check it.
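
The same comparison can be made mechanically. The sketch below is only an illustration: it pulls the step count out of the two "Completed ... steps" lines quoted in the logs above and compares them. The regular expression and the helper name `progress_steps` are assumptions based on the log lines shown here, not a documented log format.

```python
import re

# Matches progress lines such as
# "15:43:51:WU01:FS00:0xa4:Completed 2200000 out of 10000000 steps  (22%)"
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps\s+\((\d+)%\)")

def progress_steps(log_lines):
    """Return the completed-step count from every progress line, in order."""
    steps = []
    for line in log_lines:
        match = PROGRESS.search(line)
        if match:
            steps.append(int(match.group(1)))
    return steps

before_shutdown = [
    "15:43:51:WU01:FS00:0xa4:Completed 2200000 out of 10000000 steps  (22%)",
    "15:53:23:Server connection id=1 ended",
]
after_restart = [
    "01:08:50:WU01:FS00:0xa4:Resuming from checkpoint",
    "01:08:50:WU01:FS00:0xa4:Completed 2200001 out of 10000000 steps  (22%)",
]

last_logged = progress_steps(before_shutdown)[-1]
resumed_at = progress_steps(after_restart)[0]
# A difference of one step means the core resumed from the 22% checkpoint
# rather than from a later timed checkpoint.
print(resumed_at - last_logged)  # prints 1
```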


Furthermore, 80xx projects don't seem to support the next-unit-percentage parameter. The client starts downloading a new WU when the current one is 100% finished rather than at 99%.
compdewd
Posts: 165
Joined: Sat Jun 09, 2012 6:56 am
Hardware configuration: [1] Debian 8 64-bit: EVGA NVIDIA GTX 650 Ti, MSI NVIDIA GTX 460, AMD FX-8120
[2] Windows 7 64-bit: MSI NVIDIA GTX 460, AMD Phenom II X4
Location: Cincinnati, Ohio, USA

Re: 70XX project didn't support checkpoint parameter

Post by compdewd »

vmzy wrote: Furthermore, 80xx projects don't seem to support the next-unit-percentage parameter. The client starts downloading a new WU when the current one is 100% finished rather than at 99%.
I do not think this is project specific; I have also had this happen. You can think of it this way: if next-unit-percentage is set to 99, the next work unit download starts only once the 99th percent is complete, not when the display first reaches 99%. If you want the next work unit to be downloaded once the unit reaches 99%, you must set next-unit-percentage to 98. I also thought that next-unit-percentage was not working when I first saw this, but after changing the parameter to other values, I saw how it worked.
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: 70XX project didn't support checkpoint parameter

Post by 7im »

Various fahcores follow the checkpoint request differently. Some fahcores save a checkpoint at every %, so they don't save any extra checkpoints on a timed basis. Others work very much in line with timed checkpoints. YMMV.
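
To make the difference concrete, here is a minimal sketch of how much work can be lost under each behavior. It is a simplified model and not how any particular fahcore is implemented: the policy names, the function `worst_case_loss_min`, and the 12-minutes-per-percent figure are all hypothetical.

```python
def worst_case_loss_min(policy: str, minutes_per_percent: float, interval_min: float) -> float:
    """Upper bound on minutes of work lost if the core stops unexpectedly.

    Hypothetical model:
      "percent": the core checkpoints only at each whole percent.
      "timed":   the core checkpoints every interval_min minutes.
    """
    if policy == "percent":
        return minutes_per_percent
    if policy == "timed":
        return interval_min
    raise ValueError(f"unknown policy: {policy}")

# Illustrative numbers only: a WU that needs 12 minutes per percent,
# with the checkpoint interval set to 3 minutes.
print(worst_case_loss_min("percent", 12, 3))  # prints 12
print(worst_case_loss_min("timed", 12, 3))    # prints 3
```

On a percent-only core the interval setting has no effect on the result, which matches what the original post describes.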

As to the 99% vs. 100%, that's a similar fahcore-related issue with V7, and it has been documented.
Nathan_P
Posts: 1180
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, , 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: 70XX project didn't support checkpoint parameter

Post by Nathan_P »

There's no need to set your checkpoint interval so low. If your machine crashes or loses power whilst writing the checkpoint, the WU will be corrupted and lost. 15 minutes is the default, and I run mine at 30.

Regarding next-unit-percentage: I have noticed that 80xx projects are so quick that if you set it to 99, the current WU will be finished before the download is. Nothing is wrong; it's just normal. On a 764x project you will have a good 10 minutes to finish the download before you hit 100%.
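
As a back-of-the-envelope sketch of that trade-off: the lead time is the remaining percentage multiplied by the time per percent. The snippet assumes steady progress and that the download starts when progress reaches the configured percentage. Both function names are made up, and apart from the roughly 10 minutes per percent mentioned for 764x, the numbers are illustrative guesses.

```python
def download_lead_time_min(minutes_per_percent: float, next_unit_percentage: int) -> float:
    """Minutes between the download trigger and 100%, assuming steady progress."""
    return (100 - next_unit_percentage) * minutes_per_percent

def download_finishes_first(minutes_per_percent: float,
                            next_unit_percentage: int,
                            download_min: float) -> bool:
    """True if the next WU is downloaded before the current one completes."""
    return download_min <= download_lead_time_min(minutes_per_percent, next_unit_percentage)

# 764x-style WU: about 10 minutes per percent, hypothetical 2-minute download.
print(download_finishes_first(10, 99, 2))    # prints True
# Fast WU: hypothetical 0.5 minutes per percent, same 2-minute download.
print(download_finishes_first(0.5, 99, 2))   # prints False
# Lowering the setting to 95 gives the fast WU 2.5 minutes of lead time.
print(download_finishes_first(0.5, 95, 2))   # prints True
```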