PRCG: 10467 (0, 0, 163) very slow (about 25% of normal)

Moderators: Site Moderators, FAHC Science Team

petem
Posts: 14
Joined: Wed Nov 18, 2015 4:57 pm
Hardware configuration: i5-2400, Asus P8Z77V-Pro, 2x4GB mem, 80GB HD, 2xGTX 970 (EVGA SSC), GTX 660ti (PNY), Rosewill Quark 750W PSU, Costco grape tray case ; )


Post by petem »

I've been folding successfully for about three weeks on my dedicated Linux rig (2 weeks since I added another video card) without issues.
Earlier today, I noticed a WU on the recently added card with an unusually poor ETA (about 25% of the normal PPD) and researched it without any luck. Details follow:

First the system:
  • Mobo: Asus P8Z77V-Pro
  • CPU: i5-2300
  • Video:
    • 2 x EVGA GeForce GTX 970 04G-P4-3975-KR 4GB SSC GAMING w/ACX 2.0+ (factory overclocked 13% to 1190MHz)
    • 1 x PNY GTX 660 ti XLR8 (stock clocks)
  • PSU: 750W platinum rated (system pulls 580W max)
  • OS: Ubuntu 14.04.3 LTS
  • Drivers: NVidia 355.11
  • Other Software: just enough to set the system up for folding: Updates, etc.

Except for 1 WU with the "slow bug", I have had pretty consistent results so far.
The log (excerpted, from the end of the previous WU to the current one):


21:32:13:WU03:FS03:0x18:Completed 15840000 out of 16000000 steps (99%)
21:32:14:WU02:FS03:Connecting to 171.67.108.45:80
21:32:15:WU02:FS03:Assigned to work server 140.163.4.233
21:32:15:WU02:FS03:Requesting new work unit for slot 03: RUNNING gpu:2:GK104 [GeForce GTX 660 Ti] from 140.163.4.233
21:32:15:WU02:FS03:Connecting to 140.163.4.233:8080
21:32:16:WU02:FS03:Downloading 4.28MiB
21:32:17:WU02:FS03:Download complete
21:32:17:WU02:FS03:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:10467 run:0 clone:0 gen:163 core:0x17 unit:0x0000012a538b3db95388ce27a2896c8a
21:32:18:WU02:FS03:Downloading core from http://web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah
21:32:18:WU02:FS03:Connecting to web.stanford.edu:80
21:32:18:WU02:FS03:FahCore 17: Downloading 3.01MiB
21:32:20:WU02:FS03:FahCore 17: Download complete
21:32:20:WU02:FS03:Valid core signature
21:32:20:WU02:FS03:Unpacked 8.16MiB to cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17
21:45:47:WU03:FS03:0x18:Completed 16000000 out of 16000000 steps (100%)
21:45:49:WU03:FS03:0x18:Saving result file logfile_01.txt
21:45:49:WU03:FS03:0x18:Saving result file checkpointState.xml
21:45:50:WU03:FS03:0x18:Saving result file checkpt.crc
21:45:50:WU03:FS03:0x18:Saving result file log.txt
21:45:50:WU03:FS03:0x18:Saving result file positions.xtc
21:45:54:WU03:FS03:0x18:Folding@home Core Shutdown: FINISHED_UNIT
21:45:55:WU03:FS03:FahCore returned: FINISHED_UNIT (100 = 0x64)
21:45:55:WU03:FS03:Sending unit results: id:03 state:SEND error:NO_ERROR project:9430 run:38 clone:5 gen:149 core:0x18 unit:0x000000b0ab40413855474c47c6951f9f
21:45:55:WU03:FS03:Uploading 24.04MiB to 171.64.65.56
21:45:55:WU03:FS03:Connecting to 171.64.65.56:8080
21:45:55:WU02:FS03:Starting
21:45:55:WU02:FS03:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 02 -suffix 01 -version 704 -lifeline 1202 -checkpoint 15 -gpu 2 -gpu-vendor nvidia
21:45:55:WU02:FS03:Started FahCore on PID 13583
21:45:55:WU02:FS03:Core PID:13587
21:45:55:WU02:FS03:FahCore 0x17 started
21:45:55:WU02:FS03:0x17:*********************** Log Started 2015-11-24T21:45:55Z ***********************
21:45:55:WU02:FS03:0x17:Project: 10467 (Run 0, Clone 0, Gen 163)
21:45:55:WU02:FS03:0x17:Unit: 0x0000012a538b3db95388ce27a2896c8a
21:45:55:WU02:FS03:0x17:CPU: 0x00000000000000000000000000000000
21:45:55:WU02:FS03:0x17:Machine: 3
21:45:55:WU02:FS03:0x17:Reading tar file state.xml
21:45:56:WU02:FS03:0x17:Reading tar file system.xml
21:45:56:WU02:FS03:0x17:Reading tar file integrator.xml
21:45:56:WU02:FS03:0x17:Reading tar file core.xml
21:45:56:WU02:FS03:0x17:Digital signatures verified
21:46:01:WU03:FS03:Upload 7.54%
21:46:03:WU00:FS00:0xa4:Completed 237500 out of 250000 steps  (95%)
21:46:07:WU03:FS03:Upload 13.26%
21:46:13:WU03:FS03:Upload 18.72%
21:46:19:WU03:FS03:Upload 24.44%
21:46:25:WU03:FS03:Upload 29.90%
21:46:31:WU03:FS03:Upload 35.36%
21:46:37:WU03:FS03:Upload 41.08%
21:46:43:WU03:FS03:Upload 46.54%
21:46:49:WU03:FS03:Upload 52.52%
21:46:55:WU03:FS03:Upload 58.24%
21:47:01:WU03:FS03:Upload 63.70%
21:47:07:WU03:FS03:Upload 69.16%
21:47:13:WU03:FS03:Upload 74.88%
21:47:19:WU03:FS03:Upload 80.86%
21:47:25:WU03:FS03:Upload 86.59%
21:47:31:WU03:FS03:Upload 91.79%
21:47:37:WU03:FS03:Upload 97.77%
21:47:47:WU03:FS03:Upload complete
21:47:47:WU03:FS03:Server responded WORK_ACK (400)
21:47:47:WU03:FS03:Final credit estimate, 71374.00 points
21:47:47:WU03:FS03:Cleaning up
21:47:51:WU02:FS03:0x17:Completed 0 out of 5000000 steps (0%)
22:12:00:WU02:FS03:0x17:Completed 50000 out of 5000000 steps (1%)
22:36:05:WU02:FS03:0x17:Completed 100000 out of 5000000 steps (2%)
23:00:16:WU02:FS03:0x17:Completed 150000 out of 5000000 steps (3%)
23:24:21:WU02:FS03:0x17:Completed 200000 out of 5000000 steps (4%)

... <paused> ...

00:03:36:FS03:Unpaused
00:03:36:WU02:FS03:Starting
00:03:36:WU02:FS03:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_17.fah/FahCore_17 -dir 02 -suffix 01 -version 704 -lifeline 1202 -checkpoint 15 -gpu 2 -gpu-vendor nvidia
00:03:36:WU02:FS03:Started FahCore on PID 24044
00:03:36:WU02:FS03:Core PID:24048
00:03:36:WU02:FS03:FahCore 0x17 started
00:03:37:WU02:FS03:0x17:*********************** Log Started 2015-11-26T00:03:37Z ***********************
00:03:37:WU02:FS03:0x17:Project: 10467 (Run 0, Clone 0, Gen 163)
00:03:37:WU02:FS03:0x17:Unit: 0x0000012a538b3db95388ce27a2896c8a
00:03:37:WU02:FS03:0x17:CPU: 0x00000000000000000000000000000000
00:03:37:WU02:FS03:0x17:Machine: 3
00:03:37:WU02:FS03:0x17:Digital signatures verified
00:03:37:WU02:FS03:0x17:  Found a checkpoint file
00:05:29:WU02:FS03:0x17:Completed 2250000 out of 5000000 steps (45%)
00:29:39:WU02:FS03:0x17:Completed 2300000 out of 5000000 steps (46%)
...
02:54:21:WU02:FS03:0x17:Completed 2600000 out of 5000000 steps (52%)
03:18:32:WU02:FS03:0x17:Completed 2650000 out of 5000000 steps (53%)
03:42:37:WU02:FS03:0x17:Completed 2700000 out of 5000000 steps (54%)
04:06:42:WU02:FS03:0x17:Completed 2750000 out of 5000000 steps (55%)
Issue/Discussion:
The problem is on Slot 3, a GTX 660 ti working on Core 0x17, PRCG: 10467 (0, 0, 163)
Its TPF for this WU has been about 24 minutes the whole time, regardless of my attempts to reset it.
It is currently at 56.59%, with a TPF of 24:08 (m:s) and an ETA of 17 hrs 28 mins.
Base credit: 14755, Estimated: 29946, PPD: 17860

Typical PPD for the GTX 660 ti is 60k-70k.
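For reference, TPF (time per frame) is the time to complete 1% of the WU, here 50,000 of the 5,000,000 steps, and the ETA and PPD follow from it. Below is a minimal Python sketch, assuming progress lines in exactly the format of the log excerpt above, that averages TPF over consecutive "Completed" lines for one slot and derives ETA and PPD. The helper names (`parse_progress`, `time_per_frame`, `eta_seconds`, `ppd`, `fmt_hm`) are hypothetical, not part of FAHClient, and the PPD formula ignores how the client's credit estimate drifts over time, so it only roughly matches the figure the client displays.

```python
import re

# Matches progress lines such as:
# 22:12:00:WU02:FS03:0x17:Completed 50000 out of 5000000 steps (1%)
PROGRESS_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}):\S*Completed (\d+) out of (\d+) steps")


def parse_progress(lines, slot):
    """Return (seconds since midnight, steps done, total steps) for one slot's progress lines."""
    points = []
    for line in lines:
        if f":{slot}:" not in line:
            continue  # skip other slots, e.g. the CPU slot FS00
        m = PROGRESS_RE.match(line.strip())
        if m:
            h, mi, s, done, total = (int(g) for g in m.groups())
            points.append((h * 3600 + mi * 60 + s, done, total))
    return points


def time_per_frame(points):
    """Average seconds per frame (1% of the WU) across consecutive progress points."""
    elapsed = 0
    frames = 0.0
    for (t0, s0, total), (t1, s1, _) in zip(points, points[1:]):
        dt = t1 - t0
        if dt < 0:  # the log's clock wrapped past midnight
            dt += 24 * 3600
        elapsed += dt
        frames += (s1 - s0) / (total / 100)
    return elapsed / frames


def eta_seconds(percent_done, tpf):
    """Remaining time, assuming every remaining frame takes `tpf` seconds."""
    return (100 - percent_done) * tpf


def ppd(credit, tpf):
    """Points per day: credit per WU times WUs finished per day at this TPF."""
    return credit * 86400 / (100 * tpf)


def fmt_hm(seconds):
    """Format a duration as 'H hrs M mins', rounded to the nearest minute."""
    minutes = round(seconds / 60)
    return f"{minutes // 60} hrs {minutes % 60} mins"


if __name__ == "__main__":
    # Lines copied from the log excerpt above; the FS00 line is filtered out.
    excerpt = [
        "21:46:03:WU00:FS00:0xa4:Completed 237500 out of 250000 steps  (95%)",
        "22:12:00:WU02:FS03:0x17:Completed 50000 out of 5000000 steps (1%)",
        "22:36:05:WU02:FS03:0x17:Completed 100000 out of 5000000 steps (2%)",
        "23:00:16:WU02:FS03:0x17:Completed 150000 out of 5000000 steps (3%)",
        "23:24:21:WU02:FS03:0x17:Completed 200000 out of 5000000 steps (4%)",
    ]
    tpf = time_per_frame(parse_progress(excerpt, "FS03"))
    print("TPF %d:%02d" % divmod(round(tpf), 60))      # TPF 24:07
    print(fmt_hm(eta_seconds(56.59, 24 * 60 + 8)))     # 17 hrs 28 mins
    print(round(ppd(29946, 24 * 60 + 8)))              # 17868
```

At 24:08 per frame, 100 frames take about 40 hours, which is why the PPD lands near a quarter of the 60k-70k this card normally makes.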

All other slots (1 CPU (only 1 core in use out of 4, lol), and 2 GTX 970s) are behaving normally.

I paused it and, after a long wait (I got called away), started it again.
I have rebooted the PC.
I downgraded the slot to "On idle" to get it to stop, and then started it again (after about a 5 minute wait).

All to no avail.

I have found a few posts with issues on Project 10467, but they haven't had this particular issue.
No overheating, no errors, nothing out of the ordinary - just slow.

The 660 ti is not overclocked (it's at stock reference clocks), and its temps and power usage are normal.

Any information or steps to fix it (if it is indeed abnormal) would be appreciated.

Thank you
- Pete
toTOW
Site Moderator
Posts: 6309
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: PRCG: 10467 (0, 0, 163) very slow (about 25% of normal)

Post by toTOW »

Does nvidia-smi show a fully loaded GPU? Are temperatures consistent with a FAH load while running this WU?
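One way to check both on a minimal Linux install is to poll `nvidia-smi`, which ships with the NVIDIA driver. Below is a minimal Python sketch, assuming `nvidia-smi` is on the PATH and supports the `--query-gpu` and `--format=csv,noheader,nounits` options; fields a card does not report come back as text such as `[Not Supported]`, which the parser maps to `None`. The helper names and the 90% "fully loaded" cutoff are hypothetical choices for illustration.

```python
import csv
import io
import subprocess

# Standard nvidia-smi --query-gpu property names.
FIELDS = ["index", "name", "utilization.gpu", "temperature.gpu", "power.draw"]


def query_gpus():
    """Run nvidia-smi once and return its CSV output (needs the NVIDIA driver)."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def _number(field):
    """Convert a numeric field, or return None for values like '[Not Supported]'."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_gpus(csv_text):
    """Turn nvidia-smi CSV rows into one dict per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        index, name, util, temp, power = (field.strip() for field in row)
        gpus.append({
            "index": int(index),
            "name": name,
            "util_pct": _number(util),
            "temp_c": _number(temp),
            "power_w": _number(power),
        })
    return gpus


def underloaded(gpus, threshold=90.0):
    """Indices of GPUs reporting utilization below `threshold` percent (hypothetical cutoff)."""
    return [g["index"] for g in gpus
            if g["util_pct"] is not None and g["util_pct"] < threshold]


if __name__ == "__main__":
    gpus = parse_gpus(query_gpus())
    for gpu in gpus:
        print(gpu)
    print("below cutoff:", underloaded(gpus))
```

Running it a few times while the slow WU is active would show whether the 660 Ti is actually busy and at normal folding temperatures.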

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
bollix47
Posts: 2942
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: PRCG: 10467 (0, 0, 163) very slow (about 25% of normal)

Post by bollix47 »

Hi Pete,

Unfortunately Fermi and Maxwell GPUs don't co-exist well with each other in the folding world because they have different driver requirements for optimum PPD. My 660 ti works well with 304.128 but if I try the latest drivers the PPD drops dramatically. You will probably need later drivers to get your 970s folding so mixing the two types does not work well. You may find a driver that works well with both but I'm not aware of one. I purposely keep the two types on separate computers so that I can use the driver that works best for each.
petem

Re: PRCG: 10467 (0, 0, 163) very slow (about 25% of normal)

Post by petem »

Sorry 'bout the delayed response - Black Friday was a riot, and I'm recuperating :D
toTOW wrote:Does nvidia-smi show a fully loaded GPU ? Are temperature consistent with a FAH load while running this WU ?
I was looking at wattages - they were within the normal range. I'm running a minimal Linux install and am not familiar with temperature utilities.
bollix47 wrote:Hi Pete,

Unfortunately Fermi and Maxwell GPUs don't co-exist well with each other in the folding world because they have different driver requirements for optimum PPD. My 660 ti works well with 304.128 but if I try the latest drivers the PPD drops dramatically. You will probably need later drivers to get your 970s folding so mixing the two types does not work well. You may find a driver that works well with both but I'm not aware of one. I purposely keep the two types on separate computers so that I can use the driver that works best for each.
Until now, they've been doing fine (using the 355.11 NVidia drivers, as noted above).

Anyway, I just toughed it out, and everything returned to "normal", with the 660 ti pulling 60-70k PPD again. Thanks for the info! :)