
Mining Motherboard for FAH (with many pcie x1 slots)

Posted: Tue Feb 06, 2018 2:29 am
by v00d00
Slightly OT, but is anyone using one of these boards for folding? I'm coming to the point where I will need to upgrade my AMD system, possibly to a Ryzen, and these boards caught my eye. While the whole PCIe x1 vs. x16 question might be a big deal on 1080s, how does it scale on, say, a 1050 or 1030?

This is highly speculative, by the way, but the thought of running a primary card on x16 and some lesser cards, say 1050s, on x1 slots does interest me. If anyone is running one, what sort of numbers are you getting from the x1 slots on the cards you are using? As I understand it, most of these boards only give you 3-4 PCIe 3.0 slots, the rest being 2.0 slots. That would be fine for me, since I have no interest in deploying 8+ cards from one board initially, but given money and upgrades, I may do so down the line.

Re: Mining Motherboard for FAH (with many pcie x1 slots)

Posted: Tue Feb 06, 2018 3:30 am
by bruce
When people ask about PCIe speed, they're generally trying to get 100% GPU utilization with no measurable throughput reduction due to data movement. That's not what you're looking for. There's no doubt that a GPU in an x1 slot will spend a portion of its time waiting for data and a portion of its time waiting for the shaders to complete the calculations on that data ... but let's look at it from the big-picture perspective. The objective should be to get more total work done, whether the GPU is waiting on the slot or on the calculations. One GPU in an x16 slot will get less work done than 2 similar GPUs in x8 slots, even if those two spend more time waiting for data. For a very fast GPU, the general rule of thumb is that an x4 slot will see some loss in performance, but not much. As you've obviously figured out, a 1050 will see a smaller loss in performance than, say, a 1080.

I have successfully used "slower" GPUs in x1 slots, but I didn't explicitly compare a family of various "slower" GPUs, and I don't remember any reports from others.

Also, the terms 1x/2x/4x/8x/16x are not really meaningful without a version number (PCI Express 1.0, 2.0, 3.0, or 4.0), so I'm being a bit sloppy in my earlier statements.
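To put numbers on why the version matters, here is a small sketch (the function and table names are made up for illustration) that computes approximate one-direction bandwidth from each generation's published signalling rate and line encoding: 8b/10b for 1.0 and 2.0, 128b/130b for 3.0 and 4.0. It ignores packet and protocol overhead, so real transfer rates will be somewhat lower.

```python
# Approximate usable one-direction bandwidth of a PCIe link, derived from
# each generation's signalling rate and line encoding. Packet headers and
# flow-control overhead are ignored, so real-world rates are lower.

# generation -> (gigatransfers/s per lane, payload bits, line bits)
PCIE_GENERATIONS = {
    "1.0": (2.5, 8, 10),      # 8b/10b encoding
    "2.0": (5.0, 8, 10),      # 8b/10b encoding
    "3.0": (8.0, 128, 130),   # 128b/130b encoding
    "4.0": (16.0, 128, 130),  # 128b/130b encoding
}


def link_bandwidth_mb_s(generation: str, lanes: int) -> float:
    """Return approximate one-direction bandwidth in MB/s for a link."""
    gt_per_s, payload_bits, line_bits = PCIE_GENERATIONS[generation]
    # One bit per lane per transfer; the encoding spends part of it.
    bits_per_s = gt_per_s * 1e9 * payload_bits / line_bits
    return bits_per_s / 8 / 1e6 * lanes


if __name__ == "__main__":
    for gen in PCIE_GENERATIONS:
        for lanes in (1, 4, 16):
            mb_s = link_bandwidth_mb_s(gen, lanes)
            print(f"PCIe {gen} x{lanes}: {mb_s:8.0f} MB/s")
```

For example, PCIe 3.0 x1 (about 985 MB/s) is roughly equivalent to PCIe 1.0 x4 (1000 MB/s), while a PCIe 2.0 x1 slot offers about 500 MB/s.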

The actual results will also depend on the size of the WU being processed ... a protein with lots of atoms means more data needs to be moved than one with fewer atoms.

Anyway, a 1050 connected at PCIe 1.0 x1 is probably going to be severely throttled, so I don't recommend it. PCIe 3.0 x1 is probably plenty to keep a 1030 quite busy. I wish I could give you more accurate data.


Re: Mining Motherboard for FAH (with many pcie x1 slots)

Posted: Tue Feb 06, 2018 7:37 am
by rwh202
Just to add to the above: Linux is a better option than Windows at the moment for PCIe bandwidth.
A 1050 won't be throttled to any meaningful extent, and a 1080 Ti will still pull 1 million PPD (vs. 1.2 million) on PCIe 2.0 x1.

Re: Mining Motherboard for FAH (with many pcie x1 slots)

Posted: Tue Feb 06, 2018 10:27 am
by foldy
Yes, you need to use Linux, as Windows bottlenecks much more on PCIe.

Re: Mining Motherboard for FAH (with many pcie x1 slots)

Posted: Tue Feb 06, 2018 2:54 pm
by v00d00
It's fine; I haven't used Windows for anything serious in over 13 years. It's just an OS for gaming nowadays.

Would probably use RHEL or Devuan and customise it beyond that.

Re: Mining Motherboard for FAH (with many pcie x1 slots)

Posted: Wed Feb 14, 2018 4:30 am
by FldngForGrandparents
You need a physical core per GPU to maintain performance. No hyperthreading, etc. That will limit you on boards with lots of PCIe slots. The best I have done is 7 cards on x16 slots. I have moved to a maximum of 5 on mixed x16 to x4 slots for stability and performance.

Re: Mining Motherboard for FAH (with many pcie x1 slots)

Posted: Wed Feb 14, 2018 8:00 am
by Nathan_P
On those mining mobos it might be an x16 slot, but it's only running at x1.

Re: Mining Motherboard for FAH (with many pcie x1 slots)

Posted: Wed Feb 14, 2018 10:00 am
by foldy
@FldngForGrandparents: Do you use Windows or Linux? Which CPU do you use? How much was your gain from using physical cores instead of hyperthreading?

Re: Mining Motherboard for FAH (with many pcie x1 slots)

Posted: Thu Feb 15, 2018 1:03 am
by v00d00
On an AMD platform, one core per instance isn't that big a deal if you loaded it with an 8-core FX or, since it's AM4, an 8-core Ryzen. On Intel it might be a bit more involved, requiring a Xeon CPU solution.

By my reckoning, if it's AMD-based, I don't see any issue with maybe 12 instances, i.e., 12x 1050 Tis or similar: bind 8 of them to the physical cores and give 2 logical CPUs per process to the rest. That's based on a Ryzen 1700. If you work on the assumption that a hyperthread or similar is worth 50% of a physical core, assigning 4 pairs of logical CPUs to 4 processes might suffice. But it's theoretical. On Linux I would simply unhook all the physical cores using isolcpus, taskset each one manually for the first 8 instances, then taskset the rest while leaving them on the io_scheduler. It would be slightly complicated but fairly hardy once built and configured. It would also be completely headless, with the most streamlined base system and nothing that isn't needed.
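A rough sketch of that CPU layout, assuming (hypothetically) that logical CPU n and n + 8 are SMT siblings on the Ryzen 1700; check the real topology before relying on it. `plan_affinity` and `pin` are illustrative names. The isolcpus= kernel boot parameter keeps ordinary tasks off the listed CPUs, so anything meant to run there has to be pinned explicitly, either with taskset or with Python's Linux-only `os.sched_setaffinity`, which `pin` wraps.

```python
import os
from typing import List, Set


def plan_affinity(physical_cores: int, instances: int) -> List[Set[int]]:
    """Return the logical CPU ids each instance should be pinned to.

    The first `physical_cores` instances each get the primary logical CPU
    of their own core. Any further instances get a pair of the remaining
    (SMT) logical CPUs; those sit on cores already used by the first group,
    so the two groups compete for the same hardware.

    ASSUMPTION: logical CPU n and n + physical_cores are SMT siblings.
    This is common on Linux but not guaranteed.
    """
    primaries = list(range(physical_cores))
    siblings = [c + physical_cores for c in primaries]
    plan = [{c} for c in primaries[:instances]]
    leftover = instances - len(plan)
    if leftover * 2 > len(siblings):
        raise ValueError("not enough logical CPUs for this many instances")
    for i in range(leftover):
        plan.append({siblings[2 * i], siblings[2 * i + 1]})
    return plan


def pin(pid: int, cpus: Set[int]) -> None:
    """Pin an existing process (Linux only), like `taskset -cp <cpus> <pid>`."""
    os.sched_setaffinity(pid, cpus)


if __name__ == "__main__":
    # Ryzen 1700 (8 cores / 16 threads) feeding 12 hypothetical GPU instances.
    for n, cpus in enumerate(plan_affinity(8, 12)):
        cpu_list = ",".join(str(c) for c in sorted(cpus))
        print(f"instance {n}: taskset -c {cpu_list} <command>")
```

Under that numbering assumption, instances 0-7 land on CPUs 0-7, and instances 8-11 get {8, 9}, {10, 11}, {12, 13} and {14, 15}.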

Also, I don't have a Hyper-Threading/SMT-capable CPU, so I can't test dedicating 2 hyperthreads to a process and comparing that against one that uses a single physical core. I'm sure someone has done those numbers already.

Re: Mining Motherboard for FAH (with many pcie x1 slots)

Posted: Thu Feb 15, 2018 6:43 am
by bruce
v00d00 wrote: If you work on the fact that a hyperthread or similar is worth 50% of a physical core ...
That's not a very good assumption. It really depends on what workload is assigned to the threads.

A pair of threads share the same floating-point registers (etc.), so you'll get slightly more than 50% out of each thread. That means that running a pair of threads from FAHCore_a4 or _a7, you'll get maybe 55-60% throughput on each one, compared to what you'd get if the other "half" were idle.
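Put as arithmetic (the symbols f and T_pair are introduced here for illustration): if each thread of an SMT pair delivers a fraction f of what it gets with its sibling idle, the core as a whole delivers 2f. The flat 50% assumption means SMT adds nothing, while the 55-60% estimate above works out to roughly 10-20% more per core.

```latex
% f      : per-thread throughput in an SMT pair, relative to running alone
% T_pair : combined throughput of the core, relative to one thread alone
T_{\mathrm{pair}} = 2f
\qquad
f = 0.50 \;\Rightarrow\; T_{\mathrm{pair}} = 1.00
\qquad
f \in [0.55,\, 0.60] \;\Rightarrow\; T_{\mathrm{pair}} \in [1.10,\, 1.20]
```

These inputs are rough, unmeasured estimates, so the result is only as good as the guess for f.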

For code containing non-floating-point instructions, there are extra fixed-point registers, so each thread has (almost) dedicated hardware. Since FAHCore_a1's primary function is moving data to/from the PCIe bus, I'd expect that if you pair up two copies of the code that drives the GPU, you'd get very little degradation.

This is theory, though, and I've never measured it. How about you actually do some measurements and report your findings?

Re: Mining Motherboard for FAH (with many pcie x1 slots)

Posted: Sat Feb 17, 2018 9:58 pm
by v00d00
It's a good idea. At present, though, the only machine I have that is capable of utilising hyperthreads is an i3, which runs Windows. Try as I might, I haven't been able to get it to fold on the GPU. The drivers are all installed correctly and other OpenCL apps work fine, but every time I assign the slot to GPU, the client deletes the slot. And yes, it is installed to a directory not within Program Files or any other UAC-controlled directory. The CPU client, on the other hand, works well.

Also, yes, I know the whole 50% thing is a hack value and I haven't tested it fully. I just know from running other programs and games that it doesn't quite measure up to a regular core. Sometimes I watch CPU utilisation in Open Hardware Monitor while I'm gaming or rendering, and there are two cores that are always at 100% and two that fluctuate. So I ran 7zip while creating an archive and bound it to each core in turn to find out which ones were the hyperthreads. The ones that topped out at 100% while gaming were the hyperthreads, while the other two, which ran at around 70-80%, were the physical cores.
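On Linux there is no need to infer sibling pairs from load: the kernel lists them per logical CPU in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, as strings like "0,8" or "0-1". A minimal sketch that groups logical CPUs by physical core, assuming that standard sysfs layout (the function names are illustrative):

```python
import glob
import os
from typing import List, Set


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a kernel CPU list such as '0,8' or '0-1' into a set of ids."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def sibling_groups(sysfs_root: str = "/sys/devices/system/cpu") -> List[Set[int]]:
    """Return one set of logical CPU ids per physical core (Linux only)."""
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "topology",
                           "thread_siblings_list")
    groups: List[Set[int]] = []
    for path in glob.glob(pattern):
        with open(path) as f:
            group = parse_cpu_list(f.read())
        # Every CPU in a core reports the same list; keep each core once.
        if group and group not in groups:
            groups.append(group)
    return sorted(groups, key=min)


if __name__ == "__main__":
    # A core with SMT enabled shows two ids; one without shows a single id.
    for group in sibling_groups():
        print(sorted(group))
```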

Anyway, I will try to find some way to test it.

Re: Mining Motherboard for FAH (with many pcie x1 slots)

Posted: Sun Feb 18, 2018 6:54 am
by bruce
Well, let's see if we can diagnose "the client deletes the slot" problem.

Post the first ~100 lines of your log per the instructions in the signature block of my first post (above) and the segment showing you adding the slot and the slot being removed.

Does FAHBench run on the GPU?