Work Unit (PRCG): 9201 (995, 4, 508)

Moderators: Site Moderators, FAHC Science Team

dkapetansky
Posts: 9
Joined: Tue Jul 07, 2015 12:05 am

Work Unit (PRCG): 9201 (995, 4, 508)

Post by dkapetansky »

This work unit ran on my machine until it reached 99% completion, at which point it stalled and would not advance further.

What approaches might be used to drive the WU to completion?

Thank you, and I apologize if this topic has already been covered on this forum.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by bruce »

If you check the log, you'll most likely find that it stopped well before 99%, but the completion estimator just keeps progressing. A Pause / Fold will resume from the last checkpoint, and the WU will very likely be completed.
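The log, not the progress bar, is what to check. Below is a minimal Python sketch that pulls out the last real progress report. It assumes the progress-line format seen in the logs in this thread (`HH:MM:SS:WUxx:FSxx:0x17:Completed N out of M steps (P%)`). `last_progress` is a hypothetical helper, not part of the client.

```python
import re
from typing import Optional, Tuple

# Matches FahCore progress lines in the format seen in this thread's logs, e.g.
# 09:39:03:WU00:FS00:0x17:Completed 3250000 out of 5000000 steps (65%)
PROGRESS_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:0x[0-9a-fA-F]+:"
    r"Completed (\d+) out of (\d+) steps"
)


def last_progress(log_text: str) -> Optional[Tuple[str, int, int, float]]:
    """Return (time, steps_done, steps_total, percent) for the last progress line.

    Returns None if the log has no progress lines. Other lines
    (web connections, config dumps, warnings) are ignored.
    """
    last = None
    for line in log_text.splitlines():
        m = PROGRESS_RE.match(line.strip())
        if m:
            stamp, done, total = m.group(1), int(m.group(2)), int(m.group(3))
            last = (stamp, done, total, 100.0 * done / total)
    return last


if __name__ == "__main__":
    sample = """\
09:27:15:WU00:FS00:0x17:Completed 3200000 out of 5000000 steps (64%)
09:31:09:15:127.0.0.1:New Web connection
09:39:03:WU00:FS00:0x17:Completed 3250000 out of 5000000 steps (65%)
11:06:40:66:127.0.0.1:New Web connection
"""
    print(last_progress(sample))  # ('09:39:03', 3250000, 5000000, 65.0)
```

If the percentage it returns is well below what the progress bar shows, the core has stopped, and the bar is only extrapolating.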

There are a variety of possible causes, but the most frequent seem to be overclocking / overheating.
dkapetansky
Posts: 9
Joined: Tue Jul 07, 2015 12:05 am

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by dkapetansky »

Thank you! I am trying this out now.
dkapetansky
Posts: 9
Joined: Tue Jul 07, 2015 12:05 am

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by dkapetansky »

OK, here is the follow-up on this.
My original observation was that the progress bar for the work unit showed 99.99%, but the log showed only 60% complete. So I paused and then restarted the WU, and the progress bar advanced to 99.99% and stalled again. (The log then showed 65% complete.)

Here are the actual log entries:

*********************** Log Started 2015-07-07T09:26:25Z ***********************
09:26:25:************************* Folding@home Client *************************
09:26:25:      Website: http://folding.stanford.edu/
09:26:25:    Copyright: (c) 2009-2014 Stanford University
09:26:25:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
09:26:25:         Args: 
09:26:25:       Config: C:/Users/Kap/AppData/Roaming/FAHClient/config.xml
09:26:25:******************************** Build ********************************
09:26:25:      Version: 7.4.4
09:26:25:         Date: Mar 4 2014
09:26:25:         Time: 20:26:54
09:26:25:      SVN Rev: 4130
09:26:25:       Branch: fah/trunk/client
09:26:25:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
09:26:25:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
09:26:25:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
09:26:25:     Platform: win32 XP
09:26:25:         Bits: 32
09:26:25:         Mode: Release
09:26:25:******************************* System ********************************
09:26:25:          CPU: Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz
09:26:25:       CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
09:26:25:         CPUs: 8
09:26:25:       Memory: 15.95GiB
09:26:25:  Free Memory: 13.81GiB
09:26:25:      Threads: WINDOWS_THREADS
09:26:25:   OS Version: 6.1
09:26:25:  Has Battery: true
09:26:25:   On Battery: false
09:26:25:   UTC Offset: -4
09:26:25:          PID: 7496
09:26:25:          CWD: C:/Users/Kap/AppData/Roaming/FAHClient
09:26:25:           OS: Windows 7 Home Premium
09:26:25:      OS Arch: AMD64
09:26:25:         GPUs: 1
09:26:25:        GPU 0: ATI:5 Wimbledon XT [Radeon HD 7970M]
09:26:25:         CUDA: Not detected
09:26:25:Win32 Service: false
09:26:25:***********************************************************************
09:26:25:<config>
09:26:25:  <!-- Folding Core -->
09:26:25:  <checkpoint v='6'/>
09:26:25:  <core-priority v='low'/>
09:26:25:
09:26:25:  <!-- Network -->
09:26:25:  <proxy v=':8080'/>
09:26:25:
09:26:25:  <!-- Slot Control -->
09:26:25:  <pause-on-battery v='false'/>
09:26:25:  <power v='full'/>
09:26:25:
09:26:25:  <!-- User Information -->
09:26:25:  <passkey v='********************************'/>
09:26:25:  <user v='dkapetansky'/>
09:26:25:
09:26:25:  <!-- Folding Slots -->
09:26:25:  <slot id='0' type='GPU'/>
09:26:25:</config>
09:26:25:Trying to access database...
09:26:25:Successfully acquired database lock
09:26:25:Enabled folding slot 00: READY gpu:0:Wimbledon XT [Radeon HD 7970M]
09:26:25:WU00:FS00:Starting
09:26:25:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Kap/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe -dir 00 -suffix 01 -version 704 -lifeline 7496 -checkpoint 6 -gpu 0 -gpu-vendor ati
09:26:25:WU00:FS00:Started FahCore on PID 7700
09:26:27:WU00:FS00:Core PID:9012
09:26:27:WU00:FS00:FahCore 0x17 started
09:26:29:WU00:FS00:0x17:*********************** Log Started 2015-07-07T09:26:29Z ***********************
09:26:29:WU00:FS00:0x17:Project: 9201 (Run 995, Clone 4, Gen 508)
09:26:29:WU00:FS00:0x17:Unit: 0x000002c46652edc45399fd2f0659b15a
09:26:29:WU00:FS00:0x17:CPU: 0x00000000000000000000000000000000
09:26:29:WU00:FS00:0x17:Machine: 0
09:26:29:WU00:FS00:0x17:Digital signatures verified
09:26:29:WU00:FS00:0x17:Folding@home GPU core17
09:26:29:WU00:FS00:0x17:Version 0.0.52
09:26:30:WU00:FS00:0x17:  Found a checkpoint file
09:27:15:WU00:FS00:0x17:Completed 3200000 out of 5000000 steps (64%)
09:27:15:WU00:FS00:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
09:31:09:15:127.0.0.1:New Web connection
09:39:03:WU00:FS00:0x17:Completed 3250000 out of 5000000 steps (65%)
11:06:40:66:127.0.0.1:New Web connection
13:38:29:109:127.0.0.1:New Web connection
******************************* Date: 2015-07-07 *******************************
15:27:44:140:127.0.0.1:New Web connection
For the record, the GPU utilization was 90% and the GPU temperature was 79 degrees C. This is not an overclocked system.

Any additional suggestions for investigation would be welcome. I would like to make a good contribution to a worthy cause.

Mod edit: added Code tags around log file
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by 7im »

What AMD driver version?
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
dkapetansky
Posts: 9
Joined: Tue Jul 07, 2015 12:05 am

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by dkapetansky »

This machine has an AMD Radeon HD 7970M GPU with driver version 14.501.1003.0

Um, perhaps the FAH project isn't rated to run on this brand of hardware?
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Location: Arizona

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by 7im »

As long as you used the driver from AMD and not the laptop maker, it looks like the driver is up to date.

FAH is rated to run on this hardware. If it were not, the GPU: line would say unsupported instead of [Radeon HD 7970M].

Next I would try the bottle cap trick: place a couple of bottle caps or pencils under the laptop to increase the airflow and see if the client runs better.
dkapetansky
Posts: 9
Joined: Tue Jul 07, 2015 12:05 am

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by dkapetansky »

The driver I am using was grabbed by the AMD Catalyst Control Center; it appears to be legitimate and up-to-date.

So now I am in the process of putting this machine on a laptop base with a fan, to improve ventilation and bring the operating temperature down. I will report the results as soon as I have them in hand.

Thank you!
dkapetansky
Posts: 9
Joined: Tue Jul 07, 2015 12:05 am

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by dkapetansky »

OK, I finally have results for this. Improving the ventilation and reducing the operating temperature as much as practical had no observable effect on processing behavior. The work unit progresses to some maximum value less than 100% and then stalls. If I pause and then resume, the reported progress drops back to 40-50%, then moves forward steadily to 99.99%, and stalls again.

Eventually, the work unit reaches its expiration date and is canceled by the system, and a new work unit is issued. Often this next WU processes and completes without complication. Whether a particular WU will complete has proven entirely unpredictable.

The log snapshot looks like this:

*********************** Log Started 2015-07-11T00:41:52Z ***********************
00:41:52:************************* Folding@home Client *************************
00:41:52:      Website: http://folding.stanford.edu/
00:41:52:    Copyright: (c) 2009-2014 Stanford University
00:41:52:       Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:41:52:         Args: 
00:41:52:       Config: C:/Users/Kap/AppData/Roaming/FAHClient/config.xml
00:41:52:******************************** Build ********************************
00:41:52:      Version: 7.4.4
00:41:52:         Date: Mar 4 2014
00:41:52:         Time: 20:26:54
00:41:52:      SVN Rev: 4130
00:41:52:       Branch: fah/trunk/client
00:41:52:     Compiler: Intel(R) C++ MSVC 1500 mode 1200
00:41:52:      Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE
00:41:52:               /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT /Qmkl
00:41:52:     Platform: win32 XP
00:41:52:         Bits: 32
00:41:52:         Mode: Release
00:41:52:******************************* System ********************************
00:41:52:          CPU: Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz
00:41:52:       CPU ID: GenuineIntel Family 6 Model 58 Stepping 9
00:41:52:         CPUs: 8
00:41:52:       Memory: 15.95GiB
00:41:52:  Free Memory: 14.41GiB
00:41:52:      Threads: WINDOWS_THREADS
00:41:52:   OS Version: 6.1
00:41:52:  Has Battery: true
00:41:52:   On Battery: false
00:41:52:   UTC Offset: -4
00:41:52:          PID: 4632
00:41:52:          CWD: C:/Users/Kap/AppData/Roaming/FAHClient
00:41:52:           OS: Windows 7 Home Premium
00:41:52:      OS Arch: AMD64
00:41:52:         GPUs: 1
00:41:52:        GPU 0: ATI:5 Wimbledon XT [Radeon HD 7970M]
00:41:52:         CUDA: Not detected
00:41:52:Win32 Service: false
00:41:52:***********************************************************************
00:41:52:<config>
00:41:52:  <!-- Folding Core -->
00:41:52:  <checkpoint v='6'/>
00:41:52:  <core-priority v='low'/>
00:41:52:
00:41:52:  <!-- Network -->
00:41:52:  <proxy v=':8080'/>
00:41:52:
00:41:52:  <!-- Slot Control -->
00:41:52:  <pause-on-battery v='false'/>
00:41:52:  <power v='full'/>
00:41:52:
00:41:52:  <!-- User Information -->
00:41:52:  <passkey v='********************************'/>
00:41:52:  <user v='dkapetansky'/>
00:41:52:
00:41:52:  <!-- Folding Slots -->
00:41:52:  <slot id='0' type='GPU'/>
00:41:52:</config>
00:41:52:Trying to access database...
00:41:52:Successfully acquired database lock
00:41:52:Enabled folding slot 00: READY gpu:0:Wimbledon XT [Radeon HD 7970M]
00:41:54:WU00:FS00:Starting
00:41:54:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Kap/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe -dir 00 -suffix 01 -version 704 -lifeline 4632 -checkpoint 6 -gpu 0 -gpu-vendor ati
00:41:54:WU00:FS00:Started FahCore on PID 4540
00:41:59:WU00:FS00:Core PID:8496
00:41:59:WU00:FS00:FahCore 0x17 started
00:41:59:WU00:FS00:0x17:*********************** Log Started 2015-07-11T00:41:59Z ***********************
00:41:59:WU00:FS00:0x17:Project: 9201 (Run 981, Clone 1, Gen 507)
00:41:59:WU00:FS00:0x17:Unit: 0x000002e26652edc45399fc9aa2217c06
00:41:59:WU00:FS00:0x17:CPU: 0x00000000000000000000000000000000
00:41:59:WU00:FS00:0x17:Machine: 0
00:41:59:WU00:FS00:0x17:Digital signatures verified
00:41:59:WU00:FS00:0x17:Folding@home GPU core17
00:41:59:WU00:FS00:0x17:Version 0.0.52
00:42:01:WU00:FS00:0x17:  Found a checkpoint file
00:43:09:WU00:FS00:0x17:Completed 500000 out of 5000000 steps (10%)
00:43:09:WU00:FS00:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
00:53:28:WU00:FS00:0x17:Completed 550000 out of 5000000 steps (11%)
00:58:14:13:127.0.0.1:New Web connection
01:03:54:WU00:FS00:0x17:Completed 600000 out of 5000000 steps (12%)
01:14:14:WU00:FS00:0x17:Completed 650000 out of 5000000 steps (13%)
01:24:39:WU00:FS00:0x17:Completed 700000 out of 5000000 steps (14%)
03:18:44:69:127.0.0.1:New Web connection
06:35:07:99:127.0.0.1:New Web connection
******************************* Date: 2015-07-11 *******************************
11:00:23:125:127.0.0.1:New Web connection
******************************* Date: 2015-07-11 *******************************
22:27:05:FS00:Paused
22:27:05:FS00:Shutting core down
22:27:06:WU00:FS00:0x17:WARNING:Console control signal 1 on PID 8496
22:27:06:WU00:FS00:0x17:Exiting, please wait. . .
22:27:16:Removing old file 'configs/config-20150628-175217.xml'
22:27:16:Saving configuration to config.xml
22:27:16:<config>
22:27:16:  <!-- Folding Core -->
22:27:16:  <checkpoint v='6'/>
22:27:16:  <core-priority v='low'/>
22:27:16:
22:27:16:  <!-- Network -->
22:27:16:  <proxy v=':8080'/>
22:27:16:
22:27:16:  <!-- Slot Control -->
22:27:16:  <pause-on-battery v='false'/>
22:27:16:  <power v='full'/>
22:27:16:
22:27:16:  <!-- User Information -->
22:27:16:  <passkey v='********************************'/>
22:27:16:  <user v='dkapetansky'/>
22:27:16:
22:27:16:  <!-- Folding Slots -->
22:27:16:  <slot id='0' type='GPU'>
22:27:16:    <paused v='true'/>
22:27:16:  </slot>
22:27:16:</config>
22:28:06:WARNING:FS00:Killing WU00
22:28:06:WU00:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
22:28:31:155:127.0.0.1:New Web connection
22:28:32:164:127.0.0.1:New Web connection
22:28:45:FS00:Unpaused
22:28:45:WU00:FS00:Starting
22:28:45:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Kap/AppData/Roaming/FAHClient/cores/web.stanford.edu/~pande/Win32/AMD64/ATI/R600/Core_17.fah/FahCore_17.exe -dir 00 -suffix 01 -version 704 -lifeline 4632 -checkpoint 6 -gpu 0 -gpu-vendor ati
22:28:45:WU00:FS00:Started FahCore on PID 297468
22:28:45:WU00:FS00:Core PID:270836
22:28:45:WU00:FS00:FahCore 0x17 started
22:28:45:WU00:FS00:0x17:*********************** Log Started 2015-07-11T22:28:45Z ***********************
22:28:45:WU00:FS00:0x17:Project: 9201 (Run 981, Clone 1, Gen 507)
22:28:45:WU00:FS00:0x17:Unit: 0x000002e26652edc45399fc9aa2217c06
22:28:45:WU00:FS00:0x17:CPU: 0x00000000000000000000000000000000
22:28:45:WU00:FS00:0x17:Machine: 0
22:28:45:WU00:FS00:0x17:Digital signatures verified
22:28:45:WU00:FS00:0x17:Folding@home GPU core17
22:28:45:WU00:FS00:0x17:Version 0.0.52
22:28:46:WU00:FS00:0x17:  Found a checkpoint file
22:29:18:Removing old file 'configs/config-20150628-175520.xml'
22:29:18:Saving configuration to config.xml
22:29:18:<config>
22:29:18:  <!-- Folding Core -->
22:29:18:  <checkpoint v='6'/>
22:29:18:  <core-priority v='low'/>
22:29:18:
22:29:18:  <!-- Network -->
22:29:18:  <proxy v=':8080'/>
22:29:18:
22:29:18:  <!-- Slot Control -->
22:29:18:  <pause-on-battery v='false'/>
22:29:18:  <power v='full'/>
22:29:18:
22:29:18:  <!-- User Information -->
22:29:18:  <passkey v='********************************'/>
22:29:18:  <user v='dkapetansky'/>
22:29:18:
22:29:18:  <!-- Folding Slots -->
22:29:18:  <slot id='0' type='GPU'/>
22:29:18:</config>
22:29:30:WU00:FS00:0x17:Completed 700000 out of 5000000 steps (14%)
22:29:30:WU00:FS00:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
22:35:31:WU00:FS00:0x17:Completed 750000 out of 5000000 steps (15%)
22:42:07:WU00:FS00:0x17:Completed 800000 out of 5000000 steps (16%)
22:52:45:WU00:FS00:0x17:Completed 850000 out of 5000000 steps (17%)
23:03:09:WU00:FS00:0x17:Completed 900000 out of 5000000 steps (18%)
23:13:22:WU00:FS00:0x17:Completed 950000 out of 5000000 steps (19%)
23:23:41:WU00:FS00:0x17:Completed 1000000 out of 5000000 steps (20%)
23:33:59:WU00:FS00:0x17:Completed 1050000 out of 5000000 steps (21%)
23:44:26:WU00:FS00:0x17:Completed 1100000 out of 5000000 steps (22%)
23:54:59:WU00:FS00:0x17:Completed 1150000 out of 5000000 steps (23%)
00:05:21:WU00:FS00:0x17:Completed 1200000 out of 5000000 steps (24%)
00:15:46:WU00:FS00:0x17:Completed 1250000 out of 5000000 steps (25%)
00:26:04:WU00:FS00:0x17:Completed 1300000 out of 5000000 steps (26%)
00:36:31:WU00:FS00:0x17:Completed 1350000 out of 5000000 steps (27%)
00:46:53:WU00:FS00:0x17:Completed 1400000 out of 5000000 steps (28%)
00:57:29:WU00:FS00:0x17:Completed 1450000 out of 5000000 steps (29%)
01:07:59:WU00:FS00:0x17:Completed 1500000 out of 5000000 steps (30%)
01:18:24:WU00:FS00:0x17:Completed 1550000 out of 5000000 steps (31%)
01:28:49:WU00:FS00:0x17:Completed 1600000 out of 5000000 steps (32%)
01:39:01:216:127.0.0.1:New Web connection
01:39:21:WU00:FS00:0x17:Completed 1650000 out of 5000000 steps (33%)
01:49:43:WU00:FS00:0x17:Completed 1700000 out of 5000000 steps (34%)
02:00:10:WU00:FS00:0x17:Completed 1750000 out of 5000000 steps (35%)
02:10:43:WU00:FS00:0x17:Completed 1800000 out of 5000000 steps (36%)
02:21:04:WU00:FS00:0x17:Completed 1850000 out of 5000000 steps (37%)
02:31:39:WU00:FS00:0x17:Completed 1900000 out of 5000000 steps (38%)
02:42:10:WU00:FS00:0x17:Completed 1950000 out of 5000000 steps (39%)
02:52:46:WU00:FS00:0x17:Completed 2000000 out of 5000000 steps (40%)
03:03:26:WU00:FS00:0x17:Completed 2050000 out of 5000000 steps (41%)
03:13:56:WU00:FS00:0x17:Completed 2100000 out of 5000000 steps (42%)
03:24:31:WU00:FS00:0x17:Completed 2150000 out of 5000000 steps (43%)
03:35:13:WU00:FS00:0x17:Completed 2200000 out of 5000000 steps (44%)
03:45:47:WU00:FS00:0x17:Completed 2250000 out of 5000000 steps (45%)
******************************* Date: 2015-07-12 *******************************
08:22:35:253:127.0.0.1:New Web connection
08:22:37:261:127.0.0.1:New Web connection
11:51:10:289:127.0.0.1:New Web connection
******************************* Date: 2015-07-12 *******************************
17:30:06:320:127.0.0.1:New Web connection
18:33:08:348:127.0.0.1:New Web connection
21:02:32:378:127.0.0.1:New Web connection
22:25:42:408:127.0.0.1:New Web connection
Mod edit: added Code tags to log
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by bruce »

According to your log, the WU Project: 9201 (Run 981, Clone 1, Gen 507)
had already reached about 10% when it resumed from its checkpoint at 00:43:09 GMT.
It then progressed to about 14% by 01:24:39, at which point it apparently hung or entered sleep mode.
You paused it at 22:27:05 and unpaused it less than a minute later.
It restarted from 14% at 22:29:30 and reached 45% at 03:45:47 the next day (July 12), after which nothing more happened until at least 22:25:42.

Yes, it is hanging for no apparent reason, and whenever you pause it and resume folding, it makes some progress before hanging again. Note that it is NOT progressing to 99.99% even though that's what the web client is telling you.
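The same reading of the log can be automated. Below is a rough Python sketch that lists every gap between consecutive progress lines that is longer than expected. `progress_gaps` is a hypothetical helper, and the 20-minute default threshold is an illustrative assumption (roughly twice the ~10-minute frame time visible in this log). Each log line carries only a time of day, so the sketch assumes that when the time goes backwards the clock has rolled past midnight, and that no single gap exceeds 24 hours. A stall after the last progress line will not show up here; detecting it requires comparing against the current time.

```python
import re
from datetime import datetime, timedelta
from typing import List, Tuple

# Same progress-line format as the logs in this thread, e.g.
# 01:24:39:WU00:FS00:0x17:Completed 700000 out of 5000000 steps (14%)
PROGRESS_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):WU\d+:FS\d+:0x[0-9a-fA-F]+:Completed \d+ out of \d+ steps"
)


def progress_gaps(log_lines: List[str], threshold_min: float = 20.0) -> List[Tuple[str, str, float]]:
    """Return (from_time, to_time, minutes) for each gap between progress lines above threshold_min.

    Only the time of day is logged, so a timestamp earlier than the previous one
    is treated as a rollover to the next day. A gap longer than 24 hours would
    therefore be under-reported.
    """
    gaps = []
    prev_stamp, prev_time = None, None
    day_offset = timedelta(0)
    for line in log_lines:
        m = PROGRESS_RE.match(line.strip())
        if not m:
            continue  # not a progress line
        stamp = m.group(1)
        t = datetime.strptime(stamp, "%H:%M:%S") + day_offset
        if prev_time is not None and t < prev_time:
            day_offset += timedelta(days=1)  # assume we crossed midnight
            t += timedelta(days=1)
        if prev_time is not None:
            minutes = (t - prev_time).total_seconds() / 60.0
            if minutes > threshold_min:
                gaps.append((prev_stamp, stamp, minutes))
        prev_stamp, prev_time = stamp, t
    return gaps


if __name__ == "__main__":
    excerpt = [
        "01:14:14:WU00:FS00:0x17:Completed 650000 out of 5000000 steps (13%)",
        "01:24:39:WU00:FS00:0x17:Completed 700000 out of 5000000 steps (14%)",
        "22:28:45:WU00:FS00:0x17:Version 0.0.52",
        "22:29:30:WU00:FS00:0x17:Completed 700000 out of 5000000 steps (14%)",
        "22:35:31:WU00:FS00:0x17:Completed 750000 out of 5000000 steps (15%)",
    ]
    for start, end, minutes in progress_gaps(excerpt):
        print(f"no progress from {start} to {end} ({minutes:.0f} min)")
    # no progress from 01:24:39 to 22:29:30 (1265 min)
```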

What are your power-saving features set to? Perhaps pausing and unpausing resumes work only because it wakes the machine from a CPU sleep condition.
dkapetansky
Posts: 9
Joined: Tue Jul 07, 2015 12:05 am

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by dkapetansky »

Your analysis seems to be spot-on, and you've had a number of good ideas.

The power setting for this machine has been set to "High Performance," which means "Dim the display- Never," "Turn off the display- Never," "Put the computer to sleep- Never," and "Adjust plan brightness- Maximum."

If this WU does not complete by 07/14/2015 @ 8:00, then it will expire, and a new WU will be issued to this machine, which might very well complete successfully. So, there is a sort of limping progress, in that processing will never hang forever, but I'm still interested in boosting the efficiency by sidestepping stalls.
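Whether a stalled WU can still beat its expiration is simple arithmetic. Below is a sketch that assumes the frame rate seen in the log above (50,000 steps, i.e. 1%, roughly every 10 to 10.5 minutes) holds and that the core does not stall again. `estimate_finish` is a hypothetical helper. The client log is in UTC (the machine reports a UTC offset of -4), so the projected time and the expiration must be expressed in the same time zone before comparing them.

```python
from datetime import datetime, timedelta


def estimate_finish(done_steps: int, total_steps: int, steps_per_frame: int,
                    minutes_per_frame: float, now: datetime) -> datetime:
    """Projected completion time, assuming the observed frame rate holds and no further stalls."""
    frames_left = (total_steps - done_steps) / steps_per_frame
    return now + timedelta(minutes=frames_left * minutes_per_frame)


if __name__ == "__main__":
    # Figures from the second log: 45% (2,250,000 of 5,000,000 steps) at 03:45:47 UTC
    # on 2015-07-12, with 50,000-step frames taking roughly 10.5 minutes each.
    finish = estimate_finish(2_250_000, 5_000_000, 50_000, 10.5,
                             datetime(2015, 7, 12, 3, 45, 47))
    print(finish)  # 2015-07-12 13:23:17
```

With about 9.5 hours of work left and the expiration roughly two days away, speed is not the problem; the WU expires because of the repeated stalls.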

Do you think I should try to report this behavior to a project tech lead?
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by bruce »

Do you think I should try to report this behavior to a project tech lead?
No. Whatever is going on is nothing they could fix. It's something in your computer, whether heat related, power related, or OS settings. I have no doubt that the same WU will complete successfully on another computer.

Until somebody correctly guesses what to change on your system, the only thing I can suggest is to pause/resume whenever it takes more than the typical ~10 minutes (for that project) between progress reports.
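The rule of thumb above (pause/resume when a progress report is overdue) can be written as a small check. This sketch makes several assumptions. The progress timestamps must already be full UTC datetimes, because the log lines carry only a time of day. "Typical" is taken as the median of recent frame intervals. The 2x margin is an arbitrary choice meant to avoid reacting to one slow frame. `needs_pause_resume` is a hypothetical name, and the pause and resume themselves would still be done from FAHControl or the web control page.

```python
from datetime import datetime
from statistics import median
from typing import List


def needs_pause_resume(progress_times: List[datetime], now: datetime, factor: float = 2.0) -> bool:
    """True if the time since the last progress report exceeds factor x the median frame interval."""
    if len(progress_times) < 2:
        return False  # not enough history to know what a normal frame time is
    intervals = [(b - a).total_seconds() for a, b in zip(progress_times, progress_times[1:])]
    typical = median(intervals)
    since_last = (now - progress_times[-1]).total_seconds()
    return since_last > factor * typical


if __name__ == "__main__":
    # Last five progress reports from the second log (UTC, 2015-07-12).
    times = [datetime(2015, 7, 12, 3, 3, 26), datetime(2015, 7, 12, 3, 13, 56),
             datetime(2015, 7, 12, 3, 24, 31), datetime(2015, 7, 12, 3, 35, 13),
             datetime(2015, 7, 12, 3, 45, 47)]
    print(needs_pause_resume(times, datetime(2015, 7, 12, 3, 58, 0)))   # False: ~12 min, within margin
    print(needs_pause_resume(times, datetime(2015, 7, 12, 8, 22, 35)))  # True: hours overdue
```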

Do you have a way to monitor GPU temperature?
dkapetansky
Posts: 9
Joined: Tue Jul 07, 2015 12:05 am

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by dkapetansky »

I have been using "GPU Shark" to monitor my GPU temperature. Since this machine was last rebooted a couple of days ago, the GPU temperature has ranged from 71-79 degrees Celsius. Currently, it is at 78 degrees, and is being used at 98% of maximum. The CPU is not being used for this project.

OK, thank you for your insight and comments.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by bruce »

All WUs use the CPU, at least a little, but that's not particularly important unless you're doing some very heavy work with your CPU on something else.
dkapetansky
Posts: 9
Joined: Tue Jul 07, 2015 12:05 am

Re: Work Unit (PRCG): 9201 (995, 4, 508)

Post by dkapetansky »

Interesting. OK, for the record, the CPU on this machine is being used concurrently for work on IBM's World Community Grid. The CPU utilization is 100%, and its temperature ranges from 92-101 degrees Celsius. The World Community Grid processes run in the background, at the lowest process priority.

Would this be expected to interfere with Folding@home processing? Once, when a FAH WU was stalled, I ran a test by pausing the World Community Grid processing, but the FAH WU still did not advance further. Well, it was worth a try.