PCI-e bandwidth/capacity limitations

A forum for discussing FAH-related hardware choices and info on actual products (not speculation).

Moderator: Site Moderators

Forum rules
Please read the forum rules before posting.
yalexey
Posts: 14
Joined: Sun Oct 30, 2016 5:10 pm

Re: PCI-e bandwidth/capacity limitations

Post by yalexey »

Is it possible to organize some kind of queue in OpenCL and preload data, perhaps by processing two or more threads on a single GPU? The cost of a multi-GPU system depends heavily on the required bus bandwidth.
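Something like that is possible in principle with two OpenCL command queues. Here is a minimal double-buffering sketch in Python with pyopencl (illustrative only: the kernel, buffer sizes, and chunk count are made up, and the FAH cores don't expose anything like this):

Code:

# Overlap host-to-device uploads on one command queue with kernel
# execution on another, ping-ponging between two device buffers.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
xfer_q = cl.CommandQueue(ctx)  # queue for data uploads
exec_q = cl.CommandQueue(ctx)  # queue for kernel launches

prg = cl.Program(ctx, """
__kernel void scale(__global float *buf) {
    int i = get_global_id(0);
    buf[i] *= 2.0f;
}
""").build()

n = 1 << 20
chunks = [np.random.rand(n).astype(np.float32) for _ in range(4)]
bufs = [cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=chunks[0].nbytes)
        for _ in range(2)]

# Preload the first chunk, then overlap: while the kernel works on one
# buffer, the next chunk is uploaded to the other.
upload = cl.enqueue_copy(xfer_q, bufs[0], chunks[0], is_blocking=False)
for i in range(len(chunks)):
    run = prg.scale(exec_q, (n,), None, bufs[i % 2], wait_for=[upload])
    if i + 1 < len(chunks):
        upload = cl.enqueue_copy(xfer_q, bufs[(i + 1) % 2],
                                 chunks[i + 1], is_blocking=False)
    run.wait()  # a real pipeline would also read results back

Note that even if this worked for FAH, the hidden transfers still consume the same bus bandwidth; they just overlap with compute instead of stalling it.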

The other day I saw a description and photo of a 12-Radeon-GPU system on a single Supermicro board. It really works for mining tasks, but it is not suitable for our calculations because of the issues discussed in this thread.
https://i.gyazo.com/cc8ca224dd86317f4fc ... b89e36.jpg

EDIT by Mod:
Replaced the large image of that system with a link to that image.
(Images are prohibited to save bandwidth.)
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slot)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: PCI-e bandwidth/capacity limitations

Post by foldy »

My understanding is that the bus usage limit on fast GPUs is a problem with PCIe 3.0 x1 on Windows but not on Linux.

12 GPUs is a little heavy, but 8 should be possible. If each GPU needs PCIe 3.0 x4 as a minimum, then with 8 GPUs you need 8 x 4 = 32 lanes. And for Nvidia GPUs you need one CPU core each to feed them. An Intel Core i7-6900K has 8 real cores and 40 PCIe lanes - that matches. But the CPU and mainboard are expensive.
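The lane arithmetic is easy to sanity-check in a few lines (a quick sketch; the 40-lane figure is the i7-6900K's lane count mentioned above):

Code:

# PCIe lane budget check (numbers from the post above).
cpu_lanes = 40            # Intel Core i7-6900K: 40 PCIe 3.0 lanes
gpus, lanes_per_gpu = 8, 4

needed = gpus * lanes_per_gpu
print(f"{gpus} GPUs at x{lanes_per_gpu} need {needed} of {cpu_lanes} lanes:",
      "fits" if needed <= cpu_lanes else "does not fit")
# -> 8 GPUs at x4 need 32 of 40 lanes: fits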

Another alternative may be mainboards with a PEX switch chip, where the PCIe lanes are allocated dynamically. This mainboard, for example, offers 7 PCIe 3.0 x8 slots:
https://www.asus.com/de/Motherboards/X9 ... fications/

I don't know whether 16 GPUs at PCIe 3.0 x2 each would be possible with this board using splitters?

Most users find it cheaper and easier to just build several dual-GPU systems.
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: PCI-e bandwidth/capacity limitations

Post by Aurum »

This MSI Z87-G45 GAMING motherboard (https://us.msi.com/Motherboard/Z87-G45- ... cification) has 3x PCIe 3.0 x16 slots with operating modes x16/x0/x0, x8/x8/x0, or x8/x4/x4. So with only two cards it runs at x8/x8. I may be able to add an RX 480 in the third slot and test it.

FAHbench, January 6, 2017
Common settings: Intel Core i7 4771 @ 3.50 GHz; RX 470 (Ellesmere, 4 GB, 2048 shaders); OpenCL, single precision; dhfr WU (23,558 atoms); Accuracy Check enabled, NaN Check disabled. Scaled score equals score for this WU.

Brand  GPU Clock  Memory Clock  Run Length  Score    Note
ASUS   1250       1650          1000 s      66.7551  in tandem
MSI    1230       1650          1000 s      66.5858  in tandem
ASUS   1250       1650          120 s       66.1621  alone
MSI    1230       1650          120 s       65.2190  alone
Last edited by Aurum on Fri Jan 06, 2017 9:13 pm, edited 1 time in total.
In Science We Trust
7im
Posts: 10189
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: PCI-e bandwidth/capacity limitations

Post by 7im »

foldy wrote:My understanding is that the bus usage limit on fast GPUs is a problem with PCIe 3.0 x1 on Windows but not on Linux.
Understand this how, please?
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: PCI-e bandwidth/capacity limitations

Post by Aurum »

foldy wrote:Another alternative may be mainboards with a PEX switch chip, where the PCIe lanes are allocated dynamically. This mainboard, for example, offers 7 PCIe 3.0 x8 slots:
https://www.asus.com/de/Motherboards/X9 ... fications/
I love the board, and look, it's only $510 :shock: :shock: :shock:

Amazon just told me they cancelled the MB they sold me because they don't actually have it, so I'm looking for another, hopefully under $200.
In Science We Trust
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slot)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: PCI-e bandwidth/capacity limitations

Post by foldy »

7im wrote:
foldy wrote:My understanding is that the bus usage limit on fast GPUs is a problem with PCIe 3.0 x1 on Windows but not on Linux.
Understand this how, please?
That is what I have read in the forum threads.
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: PCI-e bandwidth/capacity limitations

Post by Aurum »

Card        Score  Note
GTX 1070    87.1   alone, x16 slot
GTX 1070    54.9   alone, x1 slot
GTX 980 Ti  81.7   alone, x16 slot
GTX 980 Ti  41.1   alone, x1 slot
GTX 1070    89.1   in tandem, x16 slot
GTX 1070    49.0   in tandem, x1 slot
GTX 980 Ti  79.1   in tandem, x16 slot
GTX 980 Ti  39.7   in tandem, x1 slot

Single precision, dhfr WU (23,558 atoms), Accuracy Check enabled, NaN Check 10 steps, Run Length 60 s alone or 120 s in tandem
Intel Core i3-4130T @ 2.9 GHz, Windows 7 64-bit, 8 GB RAM, 250 GB SATA-III SSD, Corsair AX1200
Nvidia ForceWare 376.48, FAH 7.4.15, FAHbench 2.2.5
EVGA GTX 1070, GP104, 8 GB, GPU Clock 1595 MHz, Memory 2002 MHz, 1920 shaders
EVGA GTX 980 Ti, GM200, 6 GB, GPU Clock 1102 MHz, Memory 1753 MHz, 2816 shaders
ASRock H81 Pro BTC: 1xPCIe 2.0 x16 + 5xPCIe 2.0 x1
Last edited by Aurum on Sat Jan 07, 2017 3:02 pm, edited 1 time in total.
In Science We Trust
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slot)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: PCI-e bandwidth/capacity limitations

Post by foldy »

To summarize:
GTX 1070: x16 ≈ 90 ns/day, x1 ≈ 50 ns/day

GTX 980 Ti: x16 ≈ 80 ns/day, x1 ≈ 40 ns/day

That is up to a 50% performance loss for fast GPUs on x1.
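Worked out from Aurum's standalone scores above:

Code:

# x16 -> x1 performance loss from the standalone FAHbench scores.
scores = {"GTX 1070": (87.1, 54.9), "GTX 980 Ti": (81.7, 41.1)}
for card, (x16, x1) in scores.items():
    print(f"{card}: {(1 - x1 / x16) * 100:.0f}% slower at x1")
# -> GTX 1070: 37% slower at x1
# -> GTX 980 Ti: 50% slower at x1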

Did you measure this on Windows or Linux?

Can you also measure x4? Both cards in tandem, just one measurement.
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: PCI-e bandwidth/capacity limitations

Post by Aurum »

foldy wrote:Did you measure this on Windows or Linux?
I revised the post (see above) to make it easier to read and more complete. Win7-64.
foldy wrote:Can you also measure x4? Both in tandem only one measurement.
Not on this cheap MB; it was on my bench getting a frame built to mount 1x HD 5830 + 5x HD 5970. I will on another MB.

So the Score is some timed event in nanoseconds??? I have yet to see documentation that explains what FAHbench is doing. E.g., the final score seems to be the last recorded value and not an average of all measurements. This is a problem when running a tandem test, because one card always finishes first and the second-place GPU takes a jump up at the end.

What's the difference between DHFR (23,558 atoms) and DHFR-implicit (2,489 atoms)??? Single versus double precision???
NAV is small so I wonder if it's useful to run.
In Science We Trust
rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: PCI-e bandwidth/capacity limitations

Post by rwh202 »

Some numbers for FAHBench on Linux Mint 17.3 using driver 367.44.
GPU: EVGA GTX 1080 FTW
MB: MSI Z87-G41
CPU: Pentium G3258 @ 3.2 GHz

Code:

x16 Gen3 (CPU)  1% bus usage. Score: 149.455 (100%)
x16 Gen2 (CPU)  2% bus usage. Score: 148.494 (99.4%)
x4  Gen2 (MB)   5% bus usage. Score: 135.417 (90.6%)
x1  Gen3 (CPU) 13% bus usage. Score: 143.917 (96.3%)
x1  Gen2 (CPU) 23% bus usage. Score: 137.669 (92.1%)
x1  Gen2 (MB)  17% bus usage. Score: 123.570 (82.6%)
So, the PCIe bus does have an effect, but the connection (via MB chipset or direct to CPU) seems to have a greater effect than the nominal link speed.

However, on Linux the performance drop-off appears to be smaller than what is being reported on Windows.
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: PCI-e bandwidth/capacity limitations

Post by Aurum »

rwh202, How do you control whether you route via MB chipset or direct to CPU??? How do you monitor bus usage, a Linux feature???
In Science We Trust
rwh202
Posts: 425
Joined: Mon Nov 15, 2010 8:51 pm
Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
Location: South Coast, UK

Re: PCI-e bandwidth/capacity limitations

Post by rwh202 »

Aurum wrote:rwh202, How do you control whether you route via MB chipset or direct to CPU??? How do you monitor bus usage, a Linux feature???
The slots on my motherboard are hard-wired to either the PCH or the CPU, so I don't have any control over it. Some MBs have BIOS options for how the lanes are allocated and shared between slots, but I just moved the card between slots and used a x1 riser to drop to x1.

Bus usage is reported by the driver to the NVIDIA X Server Settings app in Linux and to the nvidia-smi interface - I think it's the same number reported by GPU-Z and other utilities in Windows.
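If you want to log that figure over time on Linux, it can be polled from a script. A sketch, assuming nvidia-settings is installed, an X session is running, and the driver exposes the GPUUtilization attribute with its PCIe= field:

Code:

# Poll the PCIe utilisation percentage the NVIDIA driver reports.
import re
import subprocess
import time

while True:
    out = subprocess.run(
        ["nvidia-settings", "-q", "[gpu:0]/GPUUtilization", "-t"],
        capture_output=True, text=True).stdout
    match = re.search(r"PCIe=(\d+)", out)
    if match:
        print(f"PCIe bus usage: {match.group(1)}%")
    time.sleep(2)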
Aurum
Posts: 296
Joined: Sat Oct 03, 2015 3:15 pm
Location: The Great Basin

Re: PCI-e bandwidth/capacity limitations

Post by Aurum »

rwh202, so if you moved a single 1080 to different slots, does the second x16 slot (PCI_E4) run at x4 when used alone??? I see from the photo that MSI labels the first x16 slot (PCI_E2) as PCI-E3.0, but there is no label on PCI_E4, just a different lock style.
The MSI web page spec for your MB says:
• 1 x PCIe 3.0 x16 slot
• 1 x PCIe 2.0 x16 slot
- PCI_E4 supports up to PCIe 2.0 x4 speed
• 2 x PCIe 2.0 x1 slots
TIA, just trying to learn this stuff as I've never thought about it before and would like to get the most out of my multi-GPU rigs.
Thanks for the GPU-Z tip; I see the Sensors tab has some interesting monitors. While folding, my x1 slot with the 980 Ti shows a Bus Interface Load of 74%, and my PCIe 2.0 x16 slot with the 1070 shows 52%. It even tells me why GPU performance is capped.
In Science We Trust
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slot)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: PCI-e bandwidth/capacity limitations

Post by foldy »

You could also load a real work unit into FAHBench: just copy one from the FAHClient work folder to FAHBench\share\fahbench\workunits and rename it accordingly.
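A sketch of that copy step in Python (both paths are examples, and the *.xml* pattern is a guess at what a GPU slot's work folder holds - check yours, and rename the files to match FAHBench's bundled WU folders as described above):

Code:

# Stage a real work unit for FAHBench. Adjust src to your FAHClient
# work sub-folder and dst to your FAHBench workunits directory.
import shutil
from pathlib import Path

src = Path(r"C:\ProgramData\FAHClient\work\01")                # example
dst = Path(r"C:\Program Files\FAHBench\share\fahbench\workunits\real")

dst.mkdir(parents=True, exist_ok=True)
for f in src.glob("*.xml*"):   # the WU's XML files (possibly .bz2)
    shutil.copy(f, dst / f.name)
print("copied:", sorted(p.name for p in dst.iterdir()))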
On my GTX 970 (Windows 7 64-bit, PCIe 2.0 x8), the default FAHBench dhfr WU shows 38% bus usage, while a real work unit in FAHBench uses 60% bus usage, the same as in FAHClient.
I always run FAHBench at default settings, except for using a real work unit for the bus usage test.
http://www.filedropper.com/real_2
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: PCI-e bandwidth/capacity limitations

Post by foldy »

So on Linux with a GTX 1080, going from Gen3 x16 to Gen3 x1 you lose only about 4%, and another 4% going down to Gen2 x1.
But on your particular mainboard you lose another 10% when using the MB chipset connection instead of the CPU connection.