PCI-e bandwidth/capacity limitations

rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: PCI-e bandwidth/capacity limitations

Post by rwh202 »

Aurum wrote:rwh202, so if you moved a single 1080 between slots, does the second x16 slot (PCI_E4) run at x4 even when used alone? I see from the photo that MSI labels the first x16 slot (PCI_E2) as PCI-E3.0, but there's no label on PCI_E4, just a different lock style.
Yeah, the top slot is a fixed x16 connected to the CPU, and the lower x16 is actually x4 electrical and connected to the PCH. On more sophisticated motherboards, both slots are often connected to the 16 lanes from the CPU with switches that automatically reconfigure to an x8/x8 config when 2 cards are fitted (or even x0/x16 when just the lower slot is used).
In Linux, the driver helpfully reports the PCIe link info for each connected card: the width (x1, x4, etc.) and the speed (which changes dynamically to save power, but sat at 5 GT/s for Gen2 and 8 GT/s for Gen3 during the tests), so I've been able to confirm the link state while running the tests.
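For anyone wanting to check their own link state, the same info the driver reports can be pulled from nvidia-smi. A minimal sketch, assuming the nvidia-smi CLI that ships with the Nvidia driver is on the PATH:

Code: Select all

# Query the live PCIe generation and link width per GPU.
# These are standard nvidia-smi query fields; note the reported speed
# may drop below the slot maximum while the card idles to save power.
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,pcie.link.gen.current,pcie.link.width.current",
     "--format=csv"],
    capture_output=True, text=True, check=True)
print(out.stdout)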
foldy wrote:So on Linux with a GTX 1080, Gen3 x16 vs Gen3 x1, you lose only 4%, and another 4% when going down to Gen2 x1.
But on your particular mainboard you lose another 10% when using the MB (PCH) connection instead of the CPU connection.
Yep, that about sums it up! This experimentation has been informative for me, because in future I'll look for motherboards that allow both GPUs to be connected to the CPU (in an x8/x8 config).
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: PCI-e bandwidth/capacity limitations

Post by bruce »

That's really good work, guys.
rwh202 wrote:... but I just moved the card between slots and used a 1x riser to drop to 1x.
You can also get a 4x riser.
rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: PCI-e bandwidth/capacity limitations

Post by rwh202 »

bruce wrote:You can also get a 4x riser.
Thanks - powered x4 risers were a little harder to find, but I've just ordered one to try. Shipping from Hong Kong, so it'll be a little while...
yalexey
Posts: 14
Joined: Sun Oct 30, 2016 5:10 pm

Re: PCI-e bandwidth/capacity limitations

Post by yalexey »

Recent data in this thread suggest that performance is influenced much more strongly by latency than by bus width.
That is why I want to repeat the question: is it possible to eliminate this bottleneck by relaxing the latency requirements? For example, the CPU could build a queue of GPU jobs ahead of time, or several jobs for one GPU could be processed virtually in parallel, so that while one of them waits for data from the CPU, the other keeps the GPU's compute units loaded (a rough sketch of the idea is below).

This work could be done once and save thousands of dollars for people around the world. It would improve the performance of every system where the GPU sits on the Northbridge (PCH) lanes.
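For illustration only (this is not how the FAH core is implemented): a rough sketch of the idea using two CUDA streams via numba, assuming a CUDA-capable GPU. Two independent jobs are queued on separate streams so that one job's PCIe transfers can overlap the other's computation:

Code: Select all

# Sketch of latency hiding with two independent jobs on one GPU.
# Pinned host memory is needed for copies to truly overlap kernels.
import numpy as np
from numba import cuda

@cuda.jit
def step(x):
    i = cuda.grid(1)
    if i < x.size:
        x[i] = x[i] * 0.5 + 1.0  # stand-in for a real force kernel

n = 1 << 20
a = cuda.pinned_array(n, dtype=np.float32); a[:] = 1.0
b = cuda.pinned_array(n, dtype=np.float32); b[:] = 2.0

s1, s2 = cuda.stream(), cuda.stream()
blocks = (n + 255) // 256

# While one stream waits on a PCIe transfer, the other can keep the
# GPU's compute units busy.
d_a = cuda.to_device(a, stream=s1)
d_b = cuda.to_device(b, stream=s2)
step[blocks, 256, s1](d_a)
step[blocks, 256, s2](d_b)
d_a.copy_to_host(a, stream=s1)
d_b.copy_to_host(b, stream=s2)
cuda.synchronize()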
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: PCI-e bandwidth/capacity limitations

Post by Aurum »

What is the fastest Nvidia card that can run with minimal performance loss on an x1 slot?
In Science We Trust
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: PCI-e bandwidth/capacity limitations

Post by foldy »

There is a report on the AnandTech forum where an R9-280X (HD7970) loses only 9% performance at PCIe 2.0 x1 vs x16 on Windows.

Code: Select all

I quickly downloaded FAHBench, ran the GUI, and ran 180-second tests, one at PCIe 2.0 x16 and the other at x1 using this contraption (link).
My test bed is a BIOSTAR H81S2 with a 3.0 GHz Pentium dual core (G3220, 22 nm). The GPU is a lowly Radeon R9-280X (HD7970).
At x1, FAHBench reported these results (Tahiti, OpenCL, accuracy check enabled, 180 seconds, dhfr task, single precision, NaN check disabled): 38.5408, 23558 atoms.
At x16 / PCIe 2.0: 42.3566, 23558 atoms.
x1 was only 91% of x16, losing 9% of its potential.
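As a quick sanity check, the quoted 9% follows directly from the two scores:

Code: Select all

# Ratio of the x1 score to the x16 score from the report above.
x1, x16 = 38.5408, 42.3566
print(f"{x1 / x16:.1%}")  # -> 91.0%, i.e. a ~9% loss at PCIe 2.0 x1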
rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: PCI-e bandwidth/capacity limitations

Post by rwh202 »

Some more numbers, same system as before, but now with an x4 riser to play with:
Linux Mint 17.3, driver 367.44.
GPU: EVGA GTX 1080 FTW
MB: MSI Z87-G41
CPU: Pentium G3258 @ 3.2 GHz

Code: Select all

x16 Gen3 (CPU - no riser) 1% bus usage. Score: 150.476 (100%)
x4  Gen3 (CPU - w. riser) 4% bus usage. Score: 148.624 (98.8%)
x4  Gen2 (CPU - w. riser) 7% bus usage. Score: 146.809 (97.6%)
x4  Gen2 (MB  - w. riser) 5% bus usage. Score: 135.345 (89.9%)
(The last run was only to check whether the cheap riser itself degrades performance beyond the drop in PCIe speed; it's very close to the earlier MB run without a riser, so I assume it doesn't.)
Previous results for comparison:

Code: Select all

x16 Gen3 (CPU)  1% bus usage. Score: 149.455 (100%)
x16 Gen2 (CPU)  2% bus usage. Score: 148.494 (99.4%)
x4  Gen2 (MB)   5% bus usage. Score: 135.417 (90.6%)
x1  Gen3 (CPU) 13% bus usage. Score: 143.917 (96.3%)
x1  Gen2 (CPU) 23% bus usage. Score: 137.669 (92.1%)
x1  Gen2 (MB)  17% bus usage. Score: 123.570 (82.6%)
So, x4 Gen3 loses only 1%, and dropping to Gen2 loses another 1%, provided you're connected to the CPU, not the PCH.
This is broadly in line with the 4% loss going to x1 Gen3 and the further 4% loss dropping to Gen2 identified earlier.
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: PCI-e bandwidth/capacity limitations

Post by foldy »

So we can say that PCIe x4 has only a small performance loss of about 2% compared to x16, except on your mainboard (PCH) lanes, which lose an additional 8%.

Since you use Linux, even on PCIe x1 there is only a small performance loss: 4% going from PCIe Gen3 x16 to x1 and another 4% going to Gen2 x1, again except on your mainboard (PCH) lanes, which lose an additional 10%. This is in contrast to the Windows results, where PCIe x1 cut the performance in half.

We have only seen results for fast Nvidia GPUs; I guess fast AMD GPUs like the R9 Fury behave similarly, but that was not tested.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: PCI-e bandwidth/capacity limitations

Post by bruce »

That also suggests that a slow GPU will see very little degradation, no matter how it's connected.
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: PCI-e bandwidth/capacity limitations

Post by foldy »

Posting a chat with Nathan_P about mainboards with a PLX PCIe switch chip:

Do you have numbers for a mainboard with a PLX switch chip which supports x8/x8/x8/x8 from only a 16-lane CPU? I guess it should work great for folding, because the PCIe bus is not used permanently by folding work units, only every x milliseconds, so 4 GPUs can alternate on the CPU lanes. But it could also be that the GPUs disturb each other and the effective speed becomes x4 because all 16 CPU lanes are permanently busy?
Nathan_P: Those numbers I quoted are from a PLX-equipped motherboard, the Asus Z87-WS: 2 slots run at x16, 3 at x16/x8/x8, and 4 at quad x8, but you can switch the speeds in the BIOS between PCIe 1/2/3, giving PCIe 3.0 x16, x8 & x4, PCIe 2.0 x16 & x8, and PCIe 1.0 x16.

Bear in mind that PCIe 3.0 x8 has the same bandwidth as 2.0 x16, 2.0 x8 is the same as 1.0 x16, and 1.0 x16 is the same as 3.0 x4 (see the sketch after this chat).
Ah I see, so with PLX quad x8 there is no performance loss?
Nathan_P: Very small, around 1% from PCIe 3.0 x16 to PCIe 3.0 x8 or PCIe 2.0 x16. Some WUs gave a greater drop but others less, so the average is around 1%.
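Nathan_P's equivalences follow from the per-lane transfer rates and line encodings (8b/10b for Gen1/Gen2, 128b/130b for Gen3). A quick check of the arithmetic, ignoring packet overhead:

Code: Select all

# Approximate one-direction PCIe bandwidth per link, in GB/s.
GTS = {1: 2.5, 2: 5.0, 3: 8.0}              # GT/s per lane
ENC = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130}  # line-code efficiency

def bw(gen, lanes):
    return GTS[gen] * ENC[gen] * lanes / 8  # bits -> bytes

print(bw(3, 8), bw(2, 16))  # ~7.88 vs 8.0 GB/s -> 3.0 x8 ~= 2.0 x16
print(bw(2, 8), bw(1, 16))  # 4.0 vs 4.0 GB/s   -> 2.0 x8  = 1.0 x16
print(bw(3, 4), bw(1, 16))  # ~3.94 vs 4.0 GB/s -> 3.0 x4 ~= 1.0 x16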
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: PCI-e bandwidth/capacity limitations

Post by Aurum »

I had a dream that Gigabyte made a motherboard just for folding. It did not have any bells and whistles: no PCI slots, no audio, no serial or parallel ports, no legacy USB. It did have 8-pin CPU power, auxiliary Molex and SATA power connectors, an M.2 SSD socket, and PCIe slots on both sides of the CPU, all PCIe 3.0 x16 and spaced to fit double-wide graphics cards.

Then I woke up and it was snowing hard.
In Science We Trust
boristsybin
Posts: 50
Joined: Mon Jan 16, 2017 11:40 am
Hardware configuration: 4x1080Ti + 2x1050Ti
Location: Russia, Moscow

Re: PCI-e bandwidth/capacity limitations

Post by boristsybin »

Can anyone tell me whether PCIe 3.0 x8 is enough to feed a Titan X Pascal with folding work at maximum output?
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: PCI-e bandwidth/capacity limitations

Post by foldy »

Yes that's fine.
PS3EdOlkkola
Posts: 184
Joined: Tue Aug 26, 2014 9:48 pm
Hardware configuration: 10 SMP folding slots on Intel Phi "Knights Landing" system, configured as 24 CPUs/slot
9 AMD GPU folding slots
31 Nvidia GPU folding slots
50 total folding slots
Average PPD/slot = 459,500
Location: Dallas, TX

Re: PCI-e bandwidth/capacity limitations

Post by PS3EdOlkkola »

@boristsybin I've got a sample size of 12 Titan X Pascal GPUs, with about half on PCIe 3.0 x8 and the other half on PCIe 3.0 x16 interfaces. There is no discernible difference in frame time or PPD between the two. All of them are in socket 2011-v3 motherboards with 40-lane CPUs.
Hardware config viewtopic.php?f=66&t=17997&p=277235#p277235
hiigaran
Posts: 134
Joined: Thu Nov 17, 2011 6:01 pm

Re: PCI-e bandwidth/capacity limitations

Post by hiigaran »

This thread should be more than enough motivation for the devs to come up with an F@H equivalent of BOINC's WUProp!