Is there going to be new cores to take advantage of new GPUs

If you think it might be a driver problem, see viewforum.php?f=79

Moderators: Site Moderators, FAHC Science Team

Post Reply
scott@bjorn3d
Posts: 80
Joined: Tue Dec 19, 2017 12:19 pm

Is there going to be new cores to take advantage of new GPUs

Post by scott@bjorn3d »

Three of us, including myself, are cancer survivors, and we have been big into Folding@Home since 2005. As I keep upgrading GPUs when new tech comes out, it seems that Folding@Home is not really taking advantage of their power. Sure, PPD goes up some with each new card, but I have four Pascal-based cards folding right now, two of them 1080 Tis, and it just seems like they should be doing a lot more work efficiently. And of course, in my interest to get more work done faster, I will be upgrading again when the next round of NVIDIA GPUs comes out.

I just have to ask: are there any new work units or clients that will let these cards really work?
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Is there going to be new cores to take advantage of new

Post by bruce »

I don't understand what evidence you have that your system(s) are not accomplishing as much as your hardware/software is capable of doing.

I contend that they ARE "really working".

---------------------

There is a new FAHCore being developed (and, as always, there's no predicted release date other than "soon"©). It will contain some new scientific options and will probably have some new optimizations, but there's no way to predict how significant those will be. (FAH is already highly optimized.)

It's a fact that FAH's computational requirements are challenging and it's pretty much up to the hardware designers to design hardware that can do challenging computations faster.
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: Is there going to be new cores to take advantage of new

Post by foldy »

Currently Linux is much faster than Windows for NVIDIA GPU folding, maybe because of the drivers.
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: Is there going to be new cores to take advantage of new

Post by JimboPalmer »

[This is all wild ass guessing, I have no inside information]

The more atoms in the protein, the more shaders the program will use. Folding too large a protein fails to complete on time; folding too small a protein does not use all the shaders.

I do not think the server has (or uses) much information about the capability of the client. It sends the same-complexity proteins to a GTX 1080 Ti and a GT 1030, as they both look the same to the server. (It may send the same proteins to a GT 430.) A large protein may take weeks on a GT 430, days on a GT 1030, and minutes on a GTX 1080 Ti. This is bad for everyone. The poor GT 430 may never see a protein it can complete in time, while the GTX 1080 Ti may spend a significant portion of its time doing setup and hardly any time doing science. If the GT 430 got only small proteins and the GTX 1080 Ti only large proteins, both owners would feel their contribution is more valued.

On both the AMD side and the Nvidia side, the actual instruction set has been stable for some time: Vega does not seem to offer new features, just faster performance. Volta adds a new style of shader, but my guess is that its accuracy is not enough to help F@H, so it will act as a faster Pascal.

If the client could report either the model number, the number of cores, or the run time of the last WU back to the server, the server could better guess how large a protein to assign next. Model number would cause a great deal of updates and frustrate donors whose model was not already included in the latest update. One core is not equal to another across cards, but still "more is better". If the client reported back "I did this size protein in this time", the server would (slowly) make better and better guesses.
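A toy sketch of that feedback idea (Python, purely illustrative; the class, the numbers, and the "atoms per hour" metric are all made up and have nothing to do with the real assignment servers):

Code: Select all

class ClientEstimate:
    """Keeps a running guess of one client's throughput (atoms per hour)."""

    def __init__(self, initial_atoms_per_hour=50_000):
        self.atoms_per_hour = initial_atoms_per_hour

    def report_result(self, atoms, hours):
        # Exponentially weighted average: each returned WU nudges the guess,
        # so the server (slowly) converges on the card's real speed.
        observed = atoms / hours
        self.atoms_per_hour = 0.8 * self.atoms_per_hour + 0.2 * observed

    def pick_project_size(self, deadline_hours):
        # Assign the largest protein this client can plausibly finish in time,
        # with a 2x safety margin for donors who don't run 24/7.
        return int(self.atoms_per_hour * deadline_hours / 2)


client = ClientEstimate()
client.report_result(atoms=120_000, hours=1.5)  # a fast card reports back
print(client.pick_project_size(deadline_hours=24))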

None of that would help the donor who does not run 24/7, though. (The last option does the best.)
Last edited by JimboPalmer on Wed Jan 10, 2018 9:49 pm, edited 1 time in total.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: Is there going to be new cores to take advantage of new

Post by foldy »

rwh202 wrote:It seems like either the projects or routines just don't scale across the sheer number of cores - has anyone tried or indeed succeeded to run multiple WUs simultaneously on the same card? Even then, it'll be a trade-off between two tasks at best case 50% performance vs a single one getting 70% performance.
I tried that, but it only works by adding one slot, starting it, and then adding a second slot for the same GPU. If you reboot the PC, that setup will fail because both cores start simultaneously and get a lock exception. Performance is bad because both work units on the same GPU only run at 50%, as you said, and because of the quick return bonus the PPD goes down: one quickly returned WU is better for PPD than two slowly returned WUs.
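A rough back-of-the-envelope comparison (Python; it follows the published QRB formula, final credit = base × sqrt(k × deadline / elapsed), but the base points, k, and deadline values here are invented for illustration, not real project data):

Code: Select all

import math

def credit(base_points, k, deadline_days, elapsed_days):
    # Quick return bonus: credit grows as the WU is returned faster.
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus

BASE, K, DEADLINE = 10_000, 0.75, 3.0

# One WU at a time: full speed, 0.25 days each, processed back to back.
sequential_ppd = 2 * credit(BASE, K, DEADLINE, 0.25) / (2 * 0.25)

# Two WUs sharing the GPU: each runs at ~50% and takes twice as long.
parallel_ppd = 2 * credit(BASE, K, DEADLINE, 0.50) / 0.50

print(f"one WU at a time: {sequential_ppd:,.0f} PPD")
print(f"two WUs in parallel: {parallel_ppd:,.0f} PPD")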
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Is there going to be new cores to take advantage of new

Post by bruce »

rwh202 wrote:Yeah, Linux performs better but there's still room for improvement - there it only uses 1-2% of the PCIe bus, and with 10 GB/sec available, that shouldn't be a limiting factor (if it is, something should be optimised in the code)
Folding uses only a few hundred MB of RAM and VRAM, and with 10 GB available, there are resources to use and abuse if it improves performance.

It seems like either the projects or routines just don't scale across the sheer number of cores - has anyone tried or indeed succeeded to run multiple WUs simultaneously on the same card? Even then, it'll be a trade-off between two tasks at best case 50% performance vs a single one getting 70% performance.

I know that some other distributed computing projects effectively interweave 2 or more WUs that get processed at the same time as part of a super-WU, ensuring that the GPU is always busy (one task shuffling data or checkpointing on the CPU whilst the other is crunching).
1) OK. The PCIe bus isn't a limiting factor.
2) Extra RAM won't help when the primary limitation is the GFLOPS of the folding device.
3) Running multiple WUs on the same device slows down both of them. If one WU uses ~90% of the GFLOPS, running two of them might make use of the other 10%, but each of the WUs would get ~55% of the original 90%, so each one will take roughly 80% longer to finish. As far as FAH is concerned, faster results are ALWAYS better than MORE results if those results are delayed.

The scientific results are maximized when each individual WU is processed at the maximum rate. The time spent check-pointing and shuffling data is insignificant compared to the time spent crunching.
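For the record, a quick check of that utilisation arithmetic (illustrative numbers only):

Code: Select all

alone = 0.90    # one WU keeps ~90% of the GPU's GFLOPS busy
shared = 0.50   # two concurrent WUs split the whole GPU roughly evenly
factor = alone / shared
print(f"each WU runs {factor:.1f}x as long (~{(factor - 1) * 100:.0f}% longer)")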
_r2w_ben
Posts: 285
Joined: Wed Apr 23, 2008 3:11 pm

Re: Is there going to be new cores to take advantage of new

Post by _r2w_ben »

This is the latest official post I could find about the upcoming FahCore 22. Unfortunately it's from October 2016. Since then OpenMM has released 7.1 and 7.1.1. The current FahCore 21 is based on OpenMM 6.2 or 6.3 IIRC.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Is there going to be new cores to take advantage of new

Post by bruce »

The teams working on FAHCore_22 and OpenMM have been seeking improvements from each other. (Why do you suppose 7.1.1 has replaced 7.1?)
foldy
Posts: 2061
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: Is there going to be new cores to take advantage of new

Post by foldy »

I guess they are waiting for OpenMM 7.2, which is currently in beta, because it supports the new CUDA 9, which may be able to use GPUs more efficiently.
scott@bjorn3d
Posts: 80
Joined: Tue Dec 19, 2017 12:19 pm

Re: Is there going to be new cores to take advantage of new

Post by scott@bjorn3d »

I think they just need to find a way for the client to report the GPU when it goes to pull a work unit; the higher the GPU's power, the bigger the work unit it could pull. I do fold to help fight illness, but I also like the competition side of things. I mean, look at my tiny little team. Only 3 of us and we keep slugging it out to stay in the top 50 teams. https://folding.extremeoverclocking.com ... s=&t=41608
ChristianVirtual
Posts: 1596
Joined: Tue May 28, 2013 12:14 pm
Location: Tokyo

Re: Is there going to be new cores to take advantage of new

Post by ChristianVirtual »

JimboPalmer wrote:[This is all wild ass guessing, I have no inside information]

The more atoms in the protein, the more shaders the program will use. Folding too large a protein fails to complete on time; folding too small a protein does not use all the shaders.

I do not think the server has (or uses) much information about the capability of the client. ...
I'm sure there would be ways via GPU information sent back to the server (e.g. name, PCI ID); if not enough information is sent back, at least regular config parameters like max-packet-size might be possible to "abuse", I mean utilize, to steer better-fitting WUs to cards.
I suggested that some time back, but it never made it into reality. :|
Please contribute your logs to http://ppd.fahmm.net
v00d00
Posts: 396
Joined: Sun Dec 02, 2007 4:53 am
Hardware configuration: FX8320e (6 cores enabled) @ stock,
- 16GB DDR3,
- Zotac GTX 1050Ti @ Stock.
- Gigabyte GTX 970 @ Stock
Debian 9.

Running GPU since it came out, CPU since client version 3.
Folding since Folding began (~2000) and ran Genome@Home for a while too.
Ran Seti@Home prior to that.
Location: UK
Contact:

Re: Is there going to be new cores to take advantage of new

Post by v00d00 »

Is the OP maybe suggesting something for GPUs that would be the equivalent of the BigWU option? A set of projects that would be impossible to complete on any hardware less than, say, a 1080, maybe with a different points scheme that reflects the complexity and size of the projects? Instead of these projects focusing on getting it done quick and cheap, the focus could be on doing more interesting science at the cost of longer completion times and the use of larger amounts of resources (by this I'm talking about proteins that require 6 GB+ of video RAM and large amounts of shaders, possibly taking 3-4 days to complete each work unit but also giving points worthy of the sacrifice).

Such a program could be beneficial to certain researchers who find themselves limited by what they can achieve at present. But whether the software is at the point where the above would be possible, and whether there is a proliferation of cards and folders willing to undertake such research, is debatable.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Is there going to be new cores to take advantage of new

Post by bruce »

Bigadv was established because GROMACS had problems supporting high CPU counts. At the same time, new code was developed to enable GROMACS to handle many new values of CPU count while introducing the problems with "large primes" -- which later had to be solved with changes to FAH. Unfortunately, there was no rational way to establish an appropriate PPD for bigadv, which led to other problems, including the discontinuation of bigadv. The current version of GROMACS supports large numbers of CPUs, but the largest actual hardware available is dwarfed by the number of shaders in modern GPUs.

I'm not aware of any real scientific limitations for the current generations of non-commercial GPUs that prevent new projects from being designed.

Nevertheless, there ARE PPD limitations which have not been adequately diagnosed. All current projects CAN be processed by the current GPUs, but many, if enabled, would provide reduced PPD. In other words, if FAH were redesigned to provide an option similar to bigadv, it would allow you to receive assignments that you'd complain about.

The FAH point and bonus system is based on awarding the SAME points and bonuses whenever you produce the same science. Older generations of GPUs happen to be more efficient than newer generations. I know of no method of solving this issue other than to allow the PI to limit assignments to the less-efficient Pascal GPUs.

There's a similar discussion elsewhere asking why the TitanV isn't more powerful than the GTX 1080Ti. (It's less of an issue because the performance is about equal.) I can't predict whether either or both of these issues can be solved -- whether by changes to OpenMM or to the drivers or if the hardware designers need to change something.

In all cases, FAH makes use of whatever resources you provide, even if you're not happy with the results. I have no doubt that (behind the scenes) the experts are continuing to improve throughput for everybody, but since it's a complicated problem, it's not something that'll be solved quickly.
Luscious
Posts: 49
Joined: Sat Oct 13, 2012 6:38 am

Re: Is there going to be new cores to take advantage of new

Post by Luscious »

The current version of GROMACS supports large numbers of CPUs
So what would be the optimum thread count for CPUs running a4 currently? I read somewhere that 8 worked but 10 didn't because of division. If that's true, would configuring 24 threads on a modern Xeon be workable (since 24 can be divided by 8, 6, 4, 3, 2, 1)?
Joe_H
Site Admin
Posts: 7854
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: Is there going to be new cores to take advantage of new

Post by Joe_H »

Currently multiples of 2, 3, and sometimes 5 will work. Not all projects have problems with decompositions involving 5 and its multiples, but the supply of available WUs can vary. There are/were some older projects that will assign to multiples of 7, but those projects were almost done the last time I checked.

So for a system that has one or more Xeon processors and a large enough core/thread count, 24 is workable most of the time. Even larger thread counts can sometimes find work. The limits at this point are the number of atoms in the simulated protein system, and the testing needed on enough larger computer systems prior to release to determine the usable thread counts beyond about 32 threads.
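A small script along the lines of that rule of thumb (Python, illustrative only; it just checks prime factors and does not reflect any actual assignment-server logic):

Code: Select all

def prime_factors(n):
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def usable(threads, allow_five=True):
    # Factors of 2 and 3 are safe, 5 works for many projects, and larger
    # primes are the troublesome "large primes" mentioned earlier in the thread.
    allowed = {2, 3, 5} if allow_five else {2, 3}
    return all(f in allowed for f in prime_factors(threads))

for t in (8, 10, 21, 24, 28, 32, 36):
    print(t, "should find work" if usable(t) else "may need adjusting")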

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Post Reply