FP16, VS INT8 VS INT4?

Moderators: Site Moderators, FAHC Science Team

Post Reply
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

FP16, VS INT8 VS INT4?

Post by MeeLee »

I see some websites quote different performance figures for cards, like in the picture below:
[image: table comparing FP16, INT8, and INT4 throughput per card]

Can FAH use INT4 or INT8 on its WUs?
If so, would it increase folding performance?
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: FP16, VS INT8 VS INT4?

Post by JimboPalmer »

If F@H could use FP16, Int8 or Int4, it would indeed speed up the simulation.

Sadly, even FP32 is 'too small' and sometimes FP64 is needed. Always using FP64 would be ideal for accuracy, but it is just too slow. (Some cards run FP64 at 1/32 the speed of FP32.)
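A quick way to see why FP32 is sometimes 'too small': NumPy's float32 and float64 types behave like the GPU formats, so a tiny sketch (illustrative only, not FAH code, with a made-up increment value) shows a small correction vanishing at single precision:

```python
import numpy as np

# A tiny correction added to a unit-scale quantity (hypothetical value).
small = 1e-8

# float32 keeps ~7 significant decimal digits, so the increment is lost.
print(np.float32(1.0) + np.float32(small) == np.float32(1.0))  # True

# float64 keeps ~16 digits, so the increment survives.
print(1.0 + small == 1.0)  # False
```

The same arithmetic that is exact at double precision silently rounds away at single precision, which is why the simulation falls back to FP64 for the sensitive steps.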

As the simulation programs (mostly OpenMM for GPUs) get updated with Volta and Turing in mind, I would expect the developers to use the lower-precision formats in scenarios where the errors do not accumulate. I have my doubts there are any such subroutines in OpenMM.

As an example, here is a Wikipedia page on Nvidia GPUs; I started with Pascal, but you can scroll up for older micro-architectures. Wikipedia calls FP32 Single Precision, FP64 Double Precision, and FP16 Half Precision.

https://en.wikipedia.org/wiki/List_of_N ... _10_series

You will notice that Pascal supports Half Precision, but very slowly, so it would not be useful to modify code for Pascal. Volta is very fast at both Double Precision and Half Precision; it would make a great F@H micro-architecture (because its FP64 is very fast) but is VERY expensive. Turing does Half Precision very rapidly, but not Double Precision. (Even the slowest Volta is 10 times as fast as the fastest Turing at Double Precision.)

FP16 is going to be most useful when you never plug the results of one equation into the inputs of the next equation. Modeling proteins does a great deal of plugging the results of one time step into the inputs of the next time step.
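To illustrate that accumulation problem, here is a toy sketch (not OpenMM code; the timestep value is made up): repeatedly adding a small increment in half precision stalls long before the true total, because each intermediate result is rounded back to FP16:

```python
import numpy as np

dt = np.float16(0.0001)   # a small "timestep" (hypothetical value)
t16 = np.float16(0.0)     # running total kept in half precision
t64 = 0.0                 # same total kept in double precision

for _ in range(10_000):
    t16 = np.float16(t16 + dt)  # result rounded back to FP16 every step
    t64 += float(dt)

print(t16)  # stalls around 0.25: dt is below half an FP16 ulp there
print(t64)  # ~1.0, the correct total
```

Once the running total grows large enough, each FP16 addition rounds straight back to the old value, so the simulation would simply stop advancing. That is exactly the "plugging results into the next time step" failure mode.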

Again, I have no reason to suspect F@H can use Half Precision, I suspect it would cause rounding errors that would overwhelm the simulation.
Last edited by JimboPalmer on Tue Mar 26, 2019 7:41 am, edited 1 time in total.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Theodore
Posts: 118
Joined: Sun Feb 10, 2019 2:07 pm

Re: FP16, VS INT8 VS INT4?

Post by Theodore »

I believe the Nvidia Quadro line had a Volta chipset, but it was reported to be quite slow compared to the much cheaper RTX 2060.
Am I wrong about this?
The Tesla V's performance was reportedly about on par with two RTX 2060s.
Last edited by Theodore on Tue Mar 26, 2019 4:39 pm, edited 1 time in total.
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: FP16, VS INT8 VS INT4?

Post by Joe_H »

Reported where? The Quadro GV100, based on the Volta chipset, is rated at ~12,000 GFLOPS for single precision FP operations, even higher when running at boost clocks. An RTX 2060 gets a rating that is half that, ~6,000 GFLOPS. So you appear to have gotten the information backwards.

The comparison is even worse if you look at double precision FP operations, the 2060 is rated at less than 200 GFLOPS, the Quadro GV100 is at about 6,000 GFLOPS.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
Theodore
Posts: 118
Joined: Sun Feb 10, 2019 2:07 pm

Re: FP16, VS INT8 VS INT4?

Post by Theodore »

Perhaps, but from what I read online, people running FAH on those cards have reported very low PPDs.
Joe_H
Site Admin
Posts: 7856
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Pro 2.8 quad 12 GB smp4
MacBook Pro 2.9 i7 8 GB smp2
Location: W. MA

Re: FP16, VS INT8 VS INT4?

Post by Joe_H »

You still have not said where. Details on which projects they were seeing this on and which drivers they used are important.

In particular, that may be an issue with F@h, but not with the GPGPU workloads these cards were designed for. No mainstream GeForce cards are based on Volta. Most of the Volta-based cards support NVLink for high-speed interconnections between multiple cards and the CPUs.

iMac 2.8 i7 12 GB smp8, Mac Pro 2.8 quad 12 GB smp6
MacBook Pro 2.9 i7 8 GB smp3
MeeLee
Posts: 1375
Joined: Tue Feb 19, 2019 10:16 pm

Re: FP16, VS INT8 VS INT4?

Post by MeeLee »

I can see that the Nvidia Quadro cards based on Pascal don't have high PPD.
But they're entirely different from those based on Volta.
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: FP16, VS INT8 VS INT4?

Post by JimboPalmer »

From the above link at Wikipedia:
Quadro GV100: 14,800 GFLOPS Single Precision (FP32), 7,400 GFLOPS Double Precision (FP64)
GeForce RTX 2060: 6,451.2 GFLOPS Single Precision (FP32), 201.6 GFLOPS Double Precision (FP64)

F@H uses Single Precision when it can, and Double Precision only when necessary.
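Working the FP64:FP32 ratios out from the figures quoted above (plain arithmetic, nothing FAH-specific) makes the architectural difference concrete:

```python
# GFLOPS figures quoted above from Wikipedia
gv100_fp32, gv100_fp64 = 14_800, 7_400
rtx2060_fp32, rtx2060_fp64 = 6_451.2, 201.6

print(gv100_fp64 / gv100_fp32)      # 0.5     -> FP64 at 1/2 of FP32 speed
print(rtx2060_fp64 / rtx2060_fp32)  # 0.03125 -> FP64 at 1/32 of FP32 speed
```

So the GV100 gives up only half its throughput at Double Precision, while the GeForce card gives up 31/32 of it, which is why mostly-FP32 workloads like F@H favor the cheaper card per dollar.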
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Theodore
Posts: 118
Joined: Sun Feb 10, 2019 2:07 pm

Re: FP16, VS INT8 VS INT4?

Post by Theodore »

Well, if you browse around a bit, you'll find PPD ratings in the range of a GTX 1070 Ti.
Most certainly not commensurate with the price disparity.
bruce
Posts: 20910
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: FP16, VS INT8 VS INT4?

Post by bruce »

Theodore wrote:... you'll find PPD ratings in the likes of a GTX1070 ti. Most certainly not equivalent to their price disparity.
Quadro is intended for the "professional" marketplace whereas FAH is designed to work on home computers. The professional marketplace emphasizes performance in Double Precision. As Jimbo says, its use is minimized for FAH by design.

The RTX series is aimed at the AI market and the gaming market, both of which can use features which FAH doesn't need.

I make it a point to stick to GPUs intended for the home market (GeForce, etc.), which helps me avoid buying features that I don't need.

FAH's design goals will not be changed to use other types of arithmetic, because that would limit its use to more expensive hardware with almost no change in performance for scientific work, and it will not use more FP64 unless that is required for better science. It uses the maximum amount of Single Precision math (FP32) and whatever integer operations are needed to support the code.
JimboPalmer
Posts: 2573
Joined: Mon Feb 16, 2009 4:12 am
Location: Greenwood MS USA

Re: FP16, VS INT8 VS INT4?

Post by JimboPalmer »

Your GPU must support double precision (FP64), but its use is so minimized that the speed of double precision does not slow F@h.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Post Reply