GPU core doesn't support checkpoint setting?

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

vmzy
Posts: 136
Joined: Wed Apr 16, 2008 6:25 am

GPU core doesn't support checkpoint setting?

Post by vmzy »

I have set 'checkpointing frequency' to 3 minutes. The CPU core has no problem with this setting.
But the GPU core saves a checkpoint only every 53 minutes (project 11713, with a TPF of about 15 minutes). Each time it saves a checkpoint, the message 'WARNING:FS01:Size of positions 2579 does not match topology 2577' shows up in the log.
Can the GPU core honor the checkpoint setting, or save a checkpoint at each percent? I think 53 minutes is too long.

06:45:13:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
06:45:13:WU00:FS01:0x21:Version 0.0.18
06:45:27:WU00:FS01:0x21:Completed 0 out of 7500000 steps (0%)
06:45:27:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
07:01:22:WU00:FS01:0x21:Completed 75000 out of 7500000 steps (1%)
07:17:16:WU00:FS01:0x21:Completed 150000 out of 7500000 steps (2%)
07:33:10:WU00:FS01:0x21:Completed 225000 out of 7500000 steps (3%)
07:38:37:WARNING:FS01:Size of positions 2579 does not match topology 2577
07:49:12:WU00:FS01:0x21:Completed 300000 out of 7500000 steps (4%)
07:49:56:FS01:Paused
07:49:56:FS01:Shutting core down
07:49:56:WU00:FS01:0x21:WARNING:Console control signal 1 on PID 14160
07:49:56:WU00:FS01:0x21:Exiting, please wait. . .
07:49:57:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
07:50:18:FS01:Unpaused
07:50:18:WU00:FS01:Starting
07:50:18:WU00:FS01:Running FahCore: "D:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "D:\Program Files (x86)\FAHData\cores/cores.foldingathome.org/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe" -dir 00 -suffix 01 -version 705 -lifeline 7772 -checkpoint 3 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0 -forceasm
07:50:18:WU00:FS01:Started FahCore on PID 6928
07:50:18:WU00:FS01:Core PID:13716
07:50:18:WU00:FS01:FahCore 0x21 started
07:50:19:WARNING:FS01:Size of positions 2579 does not match topology 2577
07:50:19:WU00:FS01:0x21:*********************** Log Started 2018-07-31T07:50:18Z ***********************
07:50:19:WU00:FS01:0x21:Project: 11713 (Run 20, Clone 56, Gen 262)
07:50:19:WU00:FS01:0x21:Unit: 0x0000013f8ca304e75a5a52288b1dbcac
07:50:19:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
07:50:19:WU00:FS01:0x21:Machine: 1
07:50:19:WU00:FS01:0x21:Digital signatures verified
07:50:19:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
07:50:19:WU00:FS01:0x21:Version 0.0.18
07:50:19:WU00:FS01:0x21:  Found a checkpoint file
07:50:24:WU00:FS01:0x21:Completed 250000 out of 7500000 steps (3%)
07:50:24:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
08:01:05:WU00:FS01:0x21:Completed 300000 out of 7500000 steps (4%)
08:17:06:WU00:FS01:0x21:Completed 375000 out of 7500000 steps (5%)
08:33:07:WU00:FS01:0x21:Completed 450000 out of 7500000 steps (6%)
08:43:51:WARNING:FS01:Size of positions 2579 does not match topology 2577
08:49:06:WU00:FS01:0x21:Completed 525000 out of 7500000 steps (7%)
09:05:03:WU00:FS01:0x21:Completed 600000 out of 7500000 steps (8%)
09:20:59:WU00:FS01:0x21:Completed 675000 out of 7500000 steps (9%)
09:37:04:WU00:FS01:0x21:Completed 750000 out of 7500000 steps (10%)
09:37:11:WARNING:FS01:Size of positions 2579 does not match topology 2577
09:53:03:WU00:FS01:0x21:Completed 825000 out of 7500000 steps (11%)
10:08:58:WU00:FS01:0x21:Completed 900000 out of 7500000 steps (12%)
10:25:08:WU00:FS01:0x21:Completed 975000 out of 7500000 steps (13%)
10:30:45:WARNING:FS01:Size of positions 2579 does not match topology 2577
10:41:18:WU00:FS01:0x21:Completed 1050000 out of 7500000 steps (14%)
10:57:18:WU00:FS01:0x21:Completed 1125000 out of 7500000 steps (15%)
******************************* Date: 2018-07-31 *******************************
12:00:52:WARNING:WU00:FS01:Detected clock skew (51 mins 14 secs), I/O delay, laptop hibernation or other slowdown noted, adjusting time estimates
12:00:54:WARNING:WU00:FS01:FahCore returned an unknown error code which probably indicates that it crashed
12:00:54:WARNING:WU00:FS01:FahCore returned: WU_STALLED (127 = 0x7f)
12:00:54:WU00:FS01:Starting
12:00:54:WU00:FS01:Running FahCore: "D:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "D:\Program Files (x86)\FAHData\cores/cores.foldingathome.org/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe" -dir 00 -suffix 01 -version 705 -lifeline 7772 -checkpoint 3 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0 -forceasm
12:00:54:WU00:FS01:Started FahCore on PID 1520
12:00:54:WU00:FS01:Core PID:1420
12:00:54:WU00:FS01:FahCore 0x21 started
12:00:55:WARNING:FS01:Size of positions 2579 does not match topology 2577
12:00:55:WARNING:FS01:Size of positions 2579 does not match topology 2577
12:00:55:WARNING:FS01:Size of positions 2579 does not match topology 2577
12:00:55:WARNING:FS01:Size of positions 2579 does not match topology 2577
12:00:55:WU00:FS01:0x21:*********************** Log Started 2018-07-31T12:00:55Z ***********************
12:00:55:WU00:FS01:0x21:Project: 11713 (Run 20, Clone 56, Gen 262)
12:00:55:WU00:FS01:0x21:Unit: 0x0000013f8ca304e75a5a52288b1dbcac
12:00:55:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
12:00:55:WU00:FS01:0x21:Machine: 1
12:00:55:WU00:FS01:0x21:Digital signatures verified
12:00:55:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
12:00:55:WU00:FS01:0x21:Version 0.0.18
12:00:55:WU00:FS01:0x21:  Found a checkpoint file
12:01:05:WU00:FS01:0x21:Completed 1000000 out of 7500000 steps (13%)
12:01:05:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
12:11:41:WU00:FS01:0x21:Completed 1050000 out of 7500000 steps (14%)
12:27:36:WU00:FS01:0x21:Completed 1125000 out of 7500000 steps (15%)
12:43:44:WU00:FS01:0x21:Completed 1200000 out of 7500000 steps (16%)
12:54:40:WARNING:FS01:Size of positions 2579 does not match topology 2577
12:59:55:WU00:FS01:0x21:Completed 1275000 out of 7500000 steps (17%)
13:15:51:WU00:FS01:0x21:Completed 1350000 out of 7500000 steps (18%)
13:31:47:WU00:FS01:0x21:Completed 1425000 out of 7500000 steps (19%)
13:47:51:WU00:FS01:0x21:Completed 1500000 out of 7500000 steps (20%)
13:47:59:WARNING:FS01:Size of positions 2579 does not match topology 2577
14:03:49:WU00:FS01:0x21:Completed 1575000 out of 7500000 steps (21%)
14:19:45:WU00:FS01:0x21:Completed 1650000 out of 7500000 steps (22%)
14:35:41:WU00:FS01:0x21:Completed 1725000 out of 7500000 steps (23%)
14:41:07:WARNING:FS01:Size of positions 2579 does not match topology 2577
14:51:38:WU00:FS01:0x21:Completed 1800000 out of 7500000 steps (24%)
15:07:39:WU00:FS01:0x21:Completed 1875000 out of 7500000 steps (25%)
15:07:55:FS01:Paused
15:07:55:FS01:Shutting core down
15:07:55:WU00:FS01:0x21:WARNING:Console control signal 1 on PID 1420
15:07:55:WU00:FS01:0x21:Exiting, please wait. . .
15:07:55:WU00:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
******************************* Date: 2018-08-01 *******************************
01:01:02:FS01:Unpaused
01:01:02:WU00:FS01:Starting
01:01:02:WU00:FS01:Running FahCore: "D:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" "D:\Program Files (x86)\FAHData\cores/cores.foldingathome.org/Win32/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21.exe" -dir 00 -suffix 01 -version 705 -lifeline 7772 -checkpoint 3 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0 -forceasm
01:01:02:WU00:FS01:Started FahCore on PID 16056
01:01:02:WU00:FS01:Core PID:15380
01:01:02:WU00:FS01:FahCore 0x21 started
01:01:03:WARNING:FS01:Size of positions 2579 does not match topology 2577
01:01:03:WU00:FS01:0x21:*********************** Log Started 2018-08-01T01:01:02Z ***********************
01:01:03:WU00:FS01:0x21:Project: 11713 (Run 20, Clone 56, Gen 262)
01:01:03:WU00:FS01:0x21:Unit: 0x0000013f8ca304e75a5a52288b1dbcac
01:01:03:WU00:FS01:0x21:CPU: 0x00000000000000000000000000000000
01:01:03:WU00:FS01:0x21:Machine: 1
01:01:03:WU00:FS01:0x21:Digital signatures verified
01:01:03:WU00:FS01:0x21:Folding@home GPU Core21 Folding@home Core
01:01:03:WU00:FS01:0x21:Version 0.0.18
01:01:03:WARNING:FS01:Size of positions 2579 does not match topology 2577
01:01:03:WARNING:FS01:Size of positions 2579 does not match topology 2577
01:01:03:WARNING:FS01:Size of positions 2579 does not match topology 2577
01:01:03:WARNING:FS01:Size of positions 2579 does not match topology 2577
01:01:03:WARNING:FS01:Size of positions 2579 does not match topology 2577
01:01:03:WARNING:FS01:Size of positions 2579 does not match topology 2577
01:01:03:WU00:FS01:0x21:  Found a checkpoint file
01:01:12:WU00:FS01:0x21:Completed 1750000 out of 7500000 steps (23%)
01:01:12:WU00:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
01:11:58:WU00:FS01:0x21:Completed 1800000 out of 7500000 steps (24%)
01:28:02:WU00:FS01:0x21:Completed 1875000 out of 7500000 steps (25%)
01:43:58:WU00:FS01:0x21:Completed 1950000 out of 7500000 steps (26%)
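
The 53-minute figure can be checked from the log itself. Below is a minimal Python sketch that measures the gaps between the topology warnings in one uninterrupted stretch of the run. It assumes, as the post suggests, that each mid-run warning coincides with a checkpoint write; the function name `warning_gaps_minutes` is made up for this illustration.

```python
from datetime import datetime

# Excerpt copied from the log above (one stretch with no pause or restart).
LOG = """\
08:33:07:WU00:FS01:0x21:Completed 450000 out of 7500000 steps (6%)
08:43:51:WARNING:FS01:Size of positions 2579 does not match topology 2577
08:49:06:WU00:FS01:0x21:Completed 525000 out of 7500000 steps (7%)
09:37:04:WU00:FS01:0x21:Completed 750000 out of 7500000 steps (10%)
09:37:11:WARNING:FS01:Size of positions 2579 does not match topology 2577
10:25:08:WU00:FS01:0x21:Completed 975000 out of 7500000 steps (13%)
10:30:45:WARNING:FS01:Size of positions 2579 does not match topology 2577
"""


def warning_gaps_minutes(log_text):
    """Return the gaps, in minutes, between consecutive topology warnings."""
    times = [
        datetime.strptime(line[:8], "%H:%M:%S")  # leading HH:MM:SS timestamp
        for line in log_text.splitlines()
        if "Size of positions" in line
    ]
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


gaps = warning_gaps_minutes(LOG)
print([round(g, 1) for g in gaps])
```

Both gaps in this excerpt come out a little over 53 minutes. The sketch ignores midnight rollover and the bursts of warnings printed at core startup, so it should only be fed a stretch of log without restarts.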
bollix47
Posts: 2941
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: GPU core doesn't support checkpoint setting?

Post by bollix47 »

The GPU checkpoint frequency is set by the Project Leader, and the FAHClient setting has no effect on GPU work.
The Topology messages are cosmetic and can be ignored. They will be eliminated in a future release.
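
The log in the first post looks consistent with a fixed, step-based checkpoint interval. As a rough Python sketch: each restart resumes at the step count printed right after "Found a checkpoint file", and the greatest common divisor of those counts gives the largest interval that fits all of them. This is an inference from the log, not a documented project setting, and the real interval could be any divisor of that value.

```python
from functools import reduce
from math import gcd

# Step counts reported right after each "Found a checkpoint file" in the log.
resume_steps = [250_000, 1_000_000, 1_750_000]

steps_per_frame = 75_000      # 1% of 7,500,000 steps
frame_seconds = 15 * 60 + 54  # 07:01:22 -> 07:17:16 in the log

# Largest checkpoint interval (in steps) consistent with every resume point.
checkpoint_steps = reduce(gcd, resume_steps)

# Wall-clock time needed to advance that many steps at the observed frame time.
checkpoint_minutes = checkpoint_steps / steps_per_frame * frame_seconds / 60

print(checkpoint_steps, round(checkpoint_minutes, 1))
```

This gives 250,000 steps, or about 53 minutes at the observed frame time, which matches the spacing of the warnings and is independent of the `-checkpoint 3` argument passed to the core.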
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: GPU core doesn't support checkpoint setting?

Post by bruce »

Manually setting checkpoint frequency only works on CPU slots.
(I'm not sure why that was never implemented for FAHCores that use the GPU.)
Joe_H
Site Admin
Posts: 7854
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: GPU core doesn't support checkpoint setting?

Post by Joe_H »

In addition, setting the checkpoint interval to the lowest possible value of 3 minutes has been tested and shown to slow down CPU processing by several percent. How much depends on how powerful the CPU is, but time spent writing a checkpoint is time not spent processing a WU.
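
To get a feel for the trade-off, here is a back-of-the-envelope Python sketch. It assumes the core does no useful work while a checkpoint is being written, and the 5-second write time is a made-up number for illustration, not a measurement from any FAHCore.

```python
def checkpoint_overhead(interval_s, write_s):
    """Fraction of wall time spent writing checkpoints.

    Assumes computation stalls for write_s seconds after every
    interval_s seconds of useful work.
    """
    return write_s / (interval_s + write_s)


WRITE_S = 5.0  # hypothetical checkpoint write time in seconds

for minutes in (3, 15, 30):
    frac = checkpoint_overhead(minutes * 60, WRITE_S)
    print(f"{minutes:>2} min interval: {frac:.1%} overhead")
```

Under these assumptions a 3-minute interval costs roughly 2.7% of wall time, and the cost shrinks as the interval grows. Real numbers depend on the disk, the WU size, and the core.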

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3