Checkpoint didn't work properly.
I have set the checkpoint interval to 3 minutes, but it has no effect. I will paste the restart logs below; you can see that the step it resumed from was wrong.
2012-04-04 shutdown log
15:42:25:WU00:FS00:0xa4:Completed 400000 out of 10000000 steps (4%)
15:50:32:Lost lifeline PID 5608, exiting
15:50:33:Server connection id=1 ended
15:50:33:FS00:Shutting core down
15:50:38:WU00:FS00:0xa4:Client no longer detected. Shutting down core
15:50:38:WU00:FS00:0xa4:
15:50:38:WU00:FS00:0xa4:Folding@home Core Shutdown: CLIENT_DIED
15:50:38:Clean exit
2012-04-05 startup log (no checkpoint was written between 15:42:25 and 15:50:32)
01:11:15:WU00:FS00:0xa4:Project: 7008 (Run 2, Clone 13, Gen 40)
01:11:15:WU00:FS00:0xa4:
01:11:15:WU00:FS00:0xa4:Assembly optimizations on if available.
01:11:15:WU00:FS00:0xa4:Entering M.D.
01:11:21:WU00:FS00:0xa4:Using Gromacs checkpoints
01:11:21:WU00:FS00:0xa4:Mapping NT from 4 to 4
01:11:22:WU00:FS00:0xa4:Resuming from checkpoint
01:11:22:WU00:FS00:0xa4:Verified 00/wudata_01.log
01:11:22:WU00:FS00:0xa4:Verified 00/wudata_01.trr
01:11:22:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
01:11:22:WU00:FS00:0xa4:Verified 00/wudata_01.edr
01:11:22:WU00:FS00:0xa4:Completed 400001 out of 10000000 steps (4%)
2012-04-05 shutdown log
15:43:53:WU00:FS00:0xa4:Completed 8300000 out of 10000000 steps (83%)
15:51:52:Server connection id=1 ended
15:51:52:Lost lifeline PID 1152, exiting
15:51:53:FS00:Shutting core down
15:51:56:WU00:FS00:0xa4:Client no longer detected. Shutting down core
15:51:56:WU00:FS00:0xa4:
15:51:56:WU00:FS00:0xa4:Folding@home Core Shutdown: CLIENT_DIED
15:51:56:Clean exit
2012-04-06 startup log (no checkpoint was written between 15:43:53 and 15:51:52)
01:09:46:WU00:FS00:0xa4:Project: 7008 (Run 2, Clone 13, Gen 40)
01:09:46:WU00:FS00:0xa4:
01:09:46:WU00:FS00:0xa4:Assembly optimizations on if available.
01:09:46:WU00:FS00:0xa4:Entering M.D.
01:09:49:Server connection id=1 on 0.0.0.0:36330 from 127.0.0.1
01:09:52:WU00:FS00:0xa4:Using Gromacs checkpoints
01:09:52:WU00:FS00:0xa4:Mapping NT from 4 to 4
01:09:52:WU00:FS00:0xa4:Resuming from checkpoint
01:09:52:WU00:FS00:0xa4:Verified 00/wudata_01.log
01:09:53:WU00:FS00:0xa4:Verified 00/wudata_01.trr
01:09:53:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
01:09:53:WU00:FS00:0xa4:Verified 00/wudata_01.edr
01:09:53:WU00:FS00:0xa4:Completed 8300001 out of 10000000 steps (83%)
2012-04-06 shutdown log
16:20:10:WU00:FS00:0xa4:Completed 45000 out of 1500000 steps (3%)
16:33:17:Lost lifeline PID 1692, exiting
16:33:18:FS00:Shutting core down
16:33:18:Server connection id=1 ended
16:33:22:Clean exit
16:33:22:WU00:FS00:0xa4:Client no longer detected. Shutting down core
16:33:22:WU00:FS00:0xa4:
16:33:22:WU00:FS00:0xa4:Folding@home Core Shutdown: CLIENT_DIED
2012-04-07 startup log (checkpoint seems correct)
01:41:55:WU00:FS00:0xa4:Project: 7809 (Run 8, Clone 205, Gen 51)
01:41:55:WU00:FS00:0xa4:
01:41:55:WU00:FS00:0xa4:Assembly optimizations on if available.
01:41:55:WU00:FS00:0xa4:Entering M.D.
01:42:01:WU00:FS00:0xa4:Using Gromacs checkpoints
01:42:01:WU00:FS00:0xa4:Mapping NT from 4 to 4
01:42:03:WU00:FS00:0xa4:Resuming from checkpoint
01:42:03:WU00:FS00:0xa4:Verified 00/wudata_01.log
01:42:03:WU00:FS00:0xa4:Verified 00/wudata_01.trr
01:42:03:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
01:42:03:WU00:FS00:0xa4:Verified 00/wudata_01.edr
01:42:04:WU00:FS00:0xa4:Completed 49890 out of 1500000 steps (3%)
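As you can see:
Project: 7008 (Run 2, Clone 13, Gen 40) logs "Shutting down core" first and then "Clean exit", and its checkpoint was wrong. It only checkpoints when a percent is finished: after running about 8 minutes past the last "Completed" line, it resumed just 1 step beyond it.
Project: 7809 (Run 8, Clone 205, Gen 51) logs "Clean exit" first and then "Shutting down core", and its checkpoint was correct: it resumed at step 49890, past the last logged step of 45000.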
BTW: I added the -forceasm parameter to my V7 client. Could that be causing this bug?
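To make the comparison in the logs above concrete, here is a small Python sketch that measures how many steps each restart recovered beyond the last "Completed" line logged before shutdown. It is only an illustration: the helper names (`completed_steps`, `steps_recovered`) are my own and not part of the client, and the log lines are copied from the excerpts above.

```python
import re

# Matches the progress lines the core writes, e.g.
# "Completed 400000 out of 10000000 steps (4%)"
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")


def completed_steps(log_text):
    """Return (done, total) for every 'Completed N out of M steps' line, in order."""
    return [(int(m.group(1)), int(m.group(2))) for m in STEP_RE.finditer(log_text)]


def steps_recovered(shutdown_log, startup_log):
    """Step the core resumed at, minus the last step logged before shutdown."""
    last_done, _ = completed_steps(shutdown_log)[-1]
    resumed_at, _ = completed_steps(startup_log)[0]
    return resumed_at - last_done


# Lines copied from the logs in this post.
p7008_shutdown = "15:42:25:WU00:FS00:0xa4:Completed 400000 out of 10000000 steps (4%)"
p7008_startup = "01:11:22:WU00:FS00:0xa4:Completed 400001 out of 10000000 steps (4%)"
p7809_shutdown = "16:20:10:WU00:FS00:0xa4:Completed 45000 out of 1500000 steps (3%)"
p7809_startup = "01:42:04:WU00:FS00:0xa4:Completed 49890 out of 1500000 steps (3%)"

print(steps_recovered(p7008_shutdown, p7008_startup))  # 1
print(steps_recovered(p7809_shutdown, p7809_startup))  # 4890
```

For Project 7008 the difference is 1 step even though the core ran for about 8 minutes after the last progress line, which is what I would expect if no timed checkpoint was written. For Project 7809 the difference is 4890 steps, so a checkpoint was written in between.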