RTX 2080 lags

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

Azmodes
Posts: 37
Joined: Wed Jan 09, 2019 10:38 am
Location: Ob der Enns

RTX 2080 lags

Post by Azmodes »

Hello. I've been crunching on BOINC since 2016, but their GPUGrid project doesn't yet support Turing, hence me giving this a try now.

Everything appears to be working well enough, except for one thing: Whenever my RTX 2080 is folding, there are noticeable lags about every 10-20 seconds, especially (only?) when I'm running Google Chrome. While I'm also running various BOINC projects on my other GPUs and the CPU (see specs below), this is definitely F@H-related, since I've tried it with those turned off and the problem persists (and I never had any such issues with them before I started folding). I have tried searching this forum, but unsurprisingly problems with lag usually involve the GPU being connected to a screen and the RTX is not (my two screens are connected to a GTX 1060). This is not really ideal, since a) this is my main PC I use every day and b) I specifically intended the RTX 2080 as a pure compute card and hence didn't connect any screens to it, to avoid this very problem.

I have tried increasing/decreasing core priority, turning off hardware acceleration in Chrome, changing task priority in the Task Manager, and lowering folding power; none of it has had any impact.

Any help appreciated.

My system:
Win 10, 64-bit
ASUS Prime X399-A mobo
Threadripper 1950X
2x GTX 1060 3GB (doing BOINC, two screens connected to one of those)
1x RTX 2080 (doing F@H)
32 GB of DDR4 RAM running at 2666 MHz

Log:


01:27:51:******************************** Build ********************************
01:27:51:        Version: 7.5.1
01:27:51:           Date: May 11 2018
01:27:51:           Time: 13:06:32
01:27:51:     Repository: Git
01:27:51:       Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
01:27:51:         Branch: master
01:27:51:       Compiler: Visual C++ 2008
01:27:51:        Options: /TP /nologo /EHa /wd4297 /wd4103 /Ox /MT
01:27:51:       Platform: win32 10
01:27:51:           Bits: 32
01:27:51:           Mode: Release
01:27:51:******************************* System ********************************
01:27:51:            CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
01:27:51:         CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
01:27:51:           CPUs: 32
01:27:51:         Memory: 31.87GiB
01:27:51:    Free Memory: 9.79GiB
01:27:51:        Threads: WINDOWS_THREADS
01:27:51:     OS Version: 6.2
01:27:51:    Has Battery: false
01:27:51:     On Battery: false
01:27:51:     UTC Offset: 1
01:27:51:            PID: 13788
01:27:51:            CWD: C:\Users\Dani\AppData\Roaming\FAHClient
01:27:51:             OS: Windows 10 Enterprise
01:27:51:        OS Arch: AMD64
01:27:51:           GPUs: 3
01:27:51:          GPU 0: Bus:65 Slot:0 Func:0 NVIDIA:7 GP106 [GeForce GTX 1060 3GB] 3935
01:27:51:          GPU 1: Bus:9 Slot:0 Func:0 NVIDIA:7 GP106 [GeForce GTX 1060 3GB] 3935
01:27:51:          GPU 2: Bus:66 Slot:0 Func:0 NVIDIA:7 TU104 [GeForce RTX 2080 Rev. A]
01:27:51:                 10068
01:27:51:  CUDA Device 0: Platform:0 Device:0 Bus:66 Slot:0 Compute:7.5 Driver:10.0
01:27:51:  CUDA Device 1: Platform:0 Device:1 Bus:9 Slot:0 Compute:6.1 Driver:10.0
01:27:51:  CUDA Device 2: Platform:0 Device:2 Bus:65 Slot:0 Compute:6.1 Driver:10.0
01:27:51:OpenCL Device 0: Platform:0 Device:0 Bus:66 Slot:0 Compute:1.2 Driver:416.94
01:27:51:OpenCL Device 1: Platform:0 Device:1 Bus:9 Slot:0 Compute:1.2 Driver:416.94
01:27:51:OpenCL Device 2: Platform:0 Device:2 Bus:65 Slot:0 Compute:1.2 Driver:416.94
01:27:51:  Win32 Service: false
01:27:51:***********************************************************************
01:27:51:<config>
01:27:51:  <!-- Network -->
01:27:51:  <proxy v=':8080'/>
01:27:51:
01:27:51:  <!-- Slot Control -->
01:27:51:  <power v='full'/>
01:27:51:
01:27:51:  <!-- User Information -->
01:27:51:  <passkey v='********************************'/>
01:27:51:  <team v='1604'/>
01:27:51:  <user v='Azmodes'/>
01:27:51:
01:27:51:  <!-- Folding Slots -->
01:27:51:  <slot id='3' type='GPU'>
01:27:51:    <gpu-index v='2'/>
01:27:51:  </slot>
01:27:51:</config>
ProDigit
Posts: 242
Joined: Sun Dec 09, 2018 10:23 pm

Re: RTX 2080 lags

Post by ProDigit »

The Chrome browser is running the desktop client. You don't need that one. Just have FAHControl open or minimized. The Chrome browser tries to refresh the connection every so often.
It also goes into sleep mode if not used, and comes out of it when you focus (e.g. click) on the browser.
On low-powered machines like dual cores, keeping the browser open can reduce PPD by almost 10%.
The number is smaller for GPU folding and on powerful machines, but Chrome uses memory, and even the green icon that tells you you're folding consumes GPU time (you can see this by opening Task Manager in Windows and watching how 3D work takes away from compute_0).

I would close Chrome, minimize FAHControl, and close explorer in Task Manager (this closes your Windows desktop, which can be re-enabled by starting a new explorer task from Task Manager, <CTRL>+<SHIFT>+<ESC>). Close Cortana, Photos, Windows Search indexing, and any other task which is suspended (green leaf in Windows 10).
Close everything else you don't need: browsers, foreground and background programs, and services such as update services, background services, print spoolers, event managers, etc.
Also switch the background picture to a solid color (or, if your OS doesn't support that, create a 1-pixel background of any color you like and stretch it).
Remove screen saver, revert to blank screen saver.
All the above give minimal optimization that is not measurable in FAH, due to greater fluctuations in work units, but still is there.
In the long run, leaving your system running as barebones as possible, does add up to about 10% of extra PPD.

I do leave the firewall running, but turn off active virus scanning, and set the FAH GPU folding program's priority in Task Manager to High and the CPU folding program's to Above normal.
Since each WU takes a good 12 - 24 hours to complete, I don't have to worry too much about thread priority once set.
Joe_H
Site Admin
Posts: 7854
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: RTX 2080 lags

Post by Joe_H »

ProDigit's advice is only good if you are going for maximum points and do not use your system for other things at the same time.

As for Chrome, there used to be an option to turn off hardware (GPU) acceleration which would use your CPU for rendering instead. I don't know if that option is still available.

Unless you are actively using FAHControl, we generally recommend closing it when not needed.

One final thing, have you checked that the GPU folding core is running on the 2080 when active? We have run into issues in the past where the GPU index numbers have not properly matched up after an install, so the GPU core ends up running on a different GPU than shown in FAHControl.
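One way to cross-check the numbering is to match devices by PCI bus number in the FAHClient log header. The following is only a sketch: it assumes the log line layout shown in the first post, and the function name and regular expressions are illustrative, not part of any FAH tool.

```python
import re

# Lines copied from the system section of the log in the first post.
LOG = """\
01:27:51:          GPU 0: Bus:65 Slot:0 Func:0 NVIDIA:7 GP106 [GeForce GTX 1060 3GB] 3935
01:27:51:          GPU 1: Bus:9 Slot:0 Func:0 NVIDIA:7 GP106 [GeForce GTX 1060 3GB] 3935
01:27:51:          GPU 2: Bus:66 Slot:0 Func:0 NVIDIA:7 TU104 [GeForce RTX 2080 Rev. A]
01:27:51:  CUDA Device 0: Platform:0 Device:0 Bus:66 Slot:0 Compute:7.5 Driver:10.0
01:27:51:  CUDA Device 1: Platform:0 Device:1 Bus:9 Slot:0 Compute:6.1 Driver:10.0
01:27:51:  CUDA Device 2: Platform:0 Device:2 Bus:65 Slot:0 Compute:6.1 Driver:10.0
"""

GPU_RE = re.compile(r"\bGPU (\d+): Bus:(\d+)")
CUDA_RE = re.compile(r"CUDA Device (\d+): .*?Bus:(\d+)")


def map_gpu_to_cuda(log_text):
    """Return {FAH GPU index: CUDA device index}, matched by PCI bus number."""
    gpu_bus = {}      # FAH GPU index -> bus number
    cuda_by_bus = {}  # bus number -> CUDA device index
    for line in log_text.splitlines():
        m = GPU_RE.search(line)
        if m:
            gpu_bus[int(m.group(1))] = int(m.group(2))
            continue
        m = CUDA_RE.search(line)
        if m:
            cuda_by_bus[int(m.group(2))] = int(m.group(1))
    # A GPU with no CUDA entry on the same bus maps to None.
    return {gpu: cuda_by_bus.get(bus) for gpu, bus in gpu_bus.items()}


mapping = map_gpu_to_cuda(LOG)
print(mapping)  # {0: 2, 1: 1, 2: 0}
```

In the posted log the slot's gpu-index 2 is the RTX 2080 on bus 66, which CUDA lists as device 0, so the two numberings really do differ on this machine.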

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Azmodes
Posts: 37
Joined: Wed Jan 09, 2019 10:38 am
Location: Ob der Enns

Re: RTX 2080 lags

Post by Azmodes »

Joe_H wrote:ProDigit's advice is only good if you are going for maximum points and do not use your system for other things at the same time.
Yeah, as I said, this is my main computer I also use for other things. I'm not planning to strip it bare in order to optimise crunching performance. :)
Joe_H wrote:As for Chrome, there used to be an option to turn off hardware (GPU) acceleration which would use your CPU for rendering instead. I don't know if that option is still available.
I have that switched off by default.

Also, I'm not using F@H in the browser. What I meant was that there's a lag while browsing in general. Well, umm, there was, because now it's gone again. Possibly related to me switching to another BOINC CPU project, because the previous one also seemed to mess with Chrome for some reason. It's weird, though, because I tried completely turning off BOINC and the problem persisted then.
Joe_H wrote:Unless you are actively using FAHControl, we generally recommend closing it when not needed.
Okay, thanks.
Joe_H wrote:One final thing, have you checked that the GPU folding core is running on the 2080 when active? We have run into issues in the past where the GPU index numbers have not properly matched up after an install, so the GPU core ends up running on a different GPU than shown in FAHControl.
Yes, it's definitely using it. I have monitoring software which shows 80-85% core load on the RTX and when I pause folding, load goes to 0. Using the correct card for sure.

Anyway, since this seems to have more or less fixed itself (for now?), another question. Is GPU load like that (<90%) considered normal? Certain BOINC projects have the option to run more than one task per GPU to improve throughput. Does something similar apply to F@H? Can I simply create another slot for the same GPU? Would this benefit my PPD?

EDIT: oh and is there an official Discord or IRC channel?
bollix47
Posts: 2941
Joined: Sun Dec 02, 2007 5:04 am
Location: Canada

Re: RTX 2080 lags

Post by bollix47 »

You may find additional resources @ https://www.reddit.com/r/foldingathome/

Related to the lag: are you leaving 1 or 2 CPU cores free? Your GPU slot will require at least one free core to feed the GPU and another one for your browsing etc.
Above you said you turned off BOINC and the lag continued. That does point to the hardware acceleration setting in Chrome's Advanced Settings. The normal default is On, so you may want to check it in case it has somehow returned to default; if it is On, turn it off and restart Chrome.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: RTX 2080 lags

Post by bruce »

Does the PCIe utilization show spikes during those 10-20 second moments?

I suspect that you've overcommitted your CPU threads. FAH expects that there is at least one CPU that's DEDICATED to FAHCore_21 for each FAH GPU. When FAHClient configures itself, it will restrict the CPU slots from using those CPUs, but of course that only works when you don't run some other heavy CPU application such as BOINC or FAH's NaCl client. The amount of processing used by FAHCore_21 will vary but that's difficult to detect because the driver uses a spin-wait which registers as CPU-busy. Periodically, it does other processing, too, and I don't know how often -- possibly every 10-20 seconds.
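If you can log PCIe throughput (or any other counter) once per second, a rough check for periodic spikes could look like the sketch below. The readings are made up for illustration, the threshold factor is an arbitrary assumption, and where the samples come from depends on whatever monitoring tool you have.

```python
from statistics import median


def find_spikes(samples, factor=3.0):
    """Return the timestamps whose value exceeds factor * median of all values.

    samples: list of (seconds, value) pairs, e.g. one reading per second.
    """
    baseline = median(v for _, v in samples)
    return [t for t, v in samples if v > factor * baseline]


def spike_intervals(spike_times):
    """Gaps in seconds between consecutive spikes."""
    return [b - a for a, b in zip(spike_times, spike_times[1:])]


# Made-up readings: a flat baseline of 100 with a burst every 15 seconds.
samples = [(t, 900 if t % 15 == 0 and t > 0 else 100) for t in range(61)]

spikes = find_spikes(samples)
print(spikes)                   # [15, 30, 45, 60]
print(spike_intervals(spikes))  # [15, 15, 15]
```

Evenly spaced gaps in the 10-20 second range would suggest the lag lines up with something periodic on the bus rather than with random load.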

Balancing FAH with BOINC is a very challenging process. Both are designed to use unused CPU resources, but neither can be convinced to share nicely with the other. My traditional recommendation is to run one or the other -- unless you choose to work out affinity settings that are reassigned when processes are restarted.
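For anyone who does want to try affinity settings, the masks themselves are simple bit arithmetic. This is a sketch only: which logical CPUs to reserve is an assumed choice, and it relies on the common Windows layout where the two threads of one core are numbered next to each other.

```python
def affinity_mask(cpus):
    """Bitmask with bit n set for each logical CPU n in cpus."""
    mask = 0
    for n in cpus:
        mask |= 1 << n
    return mask


TOTAL_THREADS = 32    # "CPUs: 32" in the posted log
fah_cpus = [30, 31]   # assumed choice: both threads of the last core
other_cpus = [n for n in range(TOTAL_THREADS) if n not in fah_cpus]

fah_mask = affinity_mask(fah_cpus)
other_mask = affinity_mask(other_cpus)

print(f"{fah_mask:X}")    # C0000000
print(f"{other_mask:X}")  # 3FFFFFFF
```

The hexadecimal form is what cmd.exe's `start /affinity` expects. The masks do not survive a process restart, which is the reassignment chore mentioned above.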
ProDigit
Posts: 242
Joined: Sun Dec 09, 2018 10:23 pm

Re: RTX 2080 lags

Post by ProDigit »

On my cards I get 95-98% GPU utilization, more on the slower cards than on the faster ones.
In your case the remaining 10% can go to graphics output for your monitor (3D) and to transfers between RAM and the GPU.
90% is normal for faster cards.
Ideally you'd want compute_0 to hit 95+%, but that would not be possible on a dedicated graphics card that's also feeding a monitor.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: RTX 2080 lags

Post by bruce »

Pay attention. I am NOT asking about GPU Utilization, I am asking about variations in PCIe utilization.
ProDigit
Posts: 242
Joined: Sun Dec 09, 2018 10:23 pm

Re: RTX 2080 lags

Post by ProDigit »

Bruce, My answer was to Azmodes, who asked:
....another question. Is GPU load like that (<90%) considered normal?
Azmodes
Posts: 37
Joined: Wed Jan 09, 2019 10:38 am
Location: Ob der Enns

Re: RTX 2080 lags

Post by Azmodes »

bollix47 wrote:You may find additional resources @ https://www.reddit.com/r/foldingathome/.
Thanks, I already subscribed to it before registering here. :wink:
bollix47 wrote:Related to lag are you leaving 1 or 2 free CPU cores? Your GPU slot will require at least one free core to feed the GPU and another one for your browsing etc.
Above you said you turned off Boinc and the lag continued ... that does point to the hardware acceleration setting in Advanced Settings ... the normal default is On so you may want to check it in case it has somehow returned to default, if it is On turn it off and restart Chrome.
Yup, I read about having to keep a CPU core free for each F@H GPU slot; it's generally advisable for BOINC projects as well. Currently, BOINC is using 28 out of 32 threads (including two GPU tasks), so that's two physical cores left over. One for F@H and one free, if you will. Too tight? Keep in mind that the issue persisted even after I turned BOINC off.
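Spelled out, the arithmetic is the following. It is a back-of-the-envelope sketch using only the numbers from this thread; the variable names are made up, and the count says nothing about which cores the scheduler actually leaves idle.

```python
TOTAL_THREADS = 32     # Threadripper 1950X: 16 cores x 2 threads
THREADS_PER_CORE = 2
boinc_threads = 28     # as currently configured in BOINC
fah_gpu_slots = 1      # the RTX 2080

free_threads = TOTAL_THREADS - boinc_threads
free_cores = free_threads // THREADS_PER_CORE
spare_cores = free_cores - fah_gpu_slots  # left after one core per FAH GPU slot

print(free_threads, free_cores, spare_cores)  # 4 2 1
```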

Trust me, hw acceleration in Chrome is and has always been turned off.
bruce wrote:Does the PCIe utilization show spikes during those 10-20 second moments?
I didn't check, sorry, and now the issue appears to have vanished.
bruce wrote:Balancing FAH with BOINC is a very challenging process. Both are designed to use unused CPU resources, but neither can be convinced to share nicely with the other. My traditional recommendation is to run one or the other -- unless you choose to work out affinity settings that are reassigned when processes are restarted.
Yeah, mixing wasn't my first choice and I probably won't keep the RTX folding for much longer (hopefully GPUGrid is going to support it soon), but I wanted to try.
ProDigit wrote:on my cards I get a 95-98% GPU utilization. More on the slower cards than on the faster ones.
The remaining 10% in your case can be dedicated graphics to your monitor (3D), and transactions between RAM and GPU.
90% is normal for faster cards.
Ideally you'd want compute_0 to hit 95+%, but that would not be possible on a dedicated graphics card that's also feeding a monitor data.
Again, this particular card is not connected to any screen.

I checked in the task manager and compute_0 is at 93-96%. Turning off BOINC does seem to increase that value slightly, but not by much.
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: RTX 2080 lags

Post by foldy »

Do you have a HW monitoring tool running, like MSI Afterburner? If so, can you close it to see if that helps?

Another wild guess: Nvidia had problems with PLX switch chips and recommends this tool as a workaround:
https://github.com/CHEF-KOCH/MSI-utility/releases
Maybe that can help you, but create a Windows system restore point first, in case anything goes wrong.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: RTX 2080 lags

Post by bruce »

Azmodes wrote:Certain BOINC projects have the option to run more than one task per GPU to improve throughput. Does something similar apply to F@H? Can I simply create another slot for the same GPU? Would this benefit my PPD?
No, and No.

Assuming [BOINC or FAH] is running a GPU at 85% and you are able to convince your system to run [2 FAH or 2 BOINC or (1 FAH plus 1 BOINC)] on a single GPU, you're talking about adding 15% of that GPU to some project. For BOINC, this is a good idea. 50% of each of two projects (or whatever mixture your system decides to allocate) will get more work done than 85% + 0% and BOINC will happily count each project as completed whenever they finish -- up until the deadline -- and BOINC will do its best to complete both of them.

FAH doesn't simply count completed projects ... it adds an increasing number of bonus points when you finish a project earlier. Reducing the effective speed of a FAH assignment from 85% to ~50% will have a significant detrimental effect on your PPD. If either another FAH project or a BOINC project takes resources away from a FAH project, you will NOT be happy with the PPD. As I said earlier, balancing the resources can be quite a challenge unless you decide to run FAH one week and BOINC the next week, even after effectively draining the project(s) being suspended.

If I have not explained this clearly, please ask because finishing a FAH project just before the deadline is a good way to minimize your PPD. BOINC doesn't care.
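To put rough numbers on that, here is a sketch that assumes the quick-return bonus takes the form final_points = base_points * max(1, sqrt(k * deadline / elapsed)). The formula is an assumption here, and the work unit values (base points, k, deadline, run time) are hypothetical, not taken from any real project.

```python
import math


def final_points(base_points, k, deadline_days, elapsed_days):
    """Credit for one work unit under the assumed bonus formula."""
    bonus = math.sqrt(k * deadline_days / elapsed_days)
    return base_points * max(1.0, bonus)


def ppd(base_points, k, deadline_days, days_at_full_speed, gpu_share):
    """Points per day when the work unit only gets gpu_share of the GPU."""
    elapsed = days_at_full_speed / gpu_share
    return final_points(base_points, k, deadline_days, elapsed) / elapsed


# Hypothetical work unit; none of these numbers come from a real project.
args = dict(base_points=10000, k=0.75, deadline_days=3.0, days_at_full_speed=0.1)

alone = ppd(gpu_share=0.85, **args)   # one task at 85% of the GPU
shared = ppd(gpu_share=0.50, **args)  # the same task at 50%

print(round(shared / alone, 3))      # 0.451
print(round(2 * shared / alone, 3))  # 0.902
```

Under these assumptions PPD scales with speed to the power 1.5 while the bonus applies, so a task cut from 85% to 50% earns less than half as much per day, and even two such tasks together come out below the single task.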