Is BFLOAT16 feasible for folding?
From digging around the inets, I found:
Bfloat16
Fp16 has the drawback for scientific computing of having a limited range, its largest positive number being $6.55 \times 10^4$. This has led to the development of an alternative 16-bit format that trades precision for range. The bfloat16 format is used by Google in its tensor processing units. Intel, which plans to support bfloat16 in its forthcoming Nervana Neural Network Processor, has recently (November 2018) published a white paper that gives a precise definition of the format.
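To make the range gap concrete, the largest finite values can be worked out from the bit widths (this derivation is mine, using the standard IEEE 754 binary16 and bfloat16 definitions, not something from the quoted post):

$\text{fp16:}\quad (2 - 2^{-10}) \times 2^{15} = 65504 \approx 6.55 \times 10^4$

$\text{bfloat16:}\quad (2 - 2^{-7}) \times 2^{127} \approx 3.39 \times 10^{38}$

So bfloat16 covers essentially the same range as fp32, at the cost of far fewer significand bits.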
The allocation of bits to the exponent and significand for bfloat16, fp16, and fp32 is shown in this table, where the implicit leading bit of a normalized number is counted in the significand.
Format      Significand   Exponent
bfloat16    8 bits        8 bits
fp16        11 bits       5 bits
fp32        24 bits       8 bits
Bfloat16 has three fewer bits in the significand than fp16, but three more in the exponent. And it has the same exponent size as fp32. Consequently, converting from fp32 to bfloat16 is easy: the exponent is kept the same and the significand is rounded or truncated from 24 bits to 8; hence overflow and underflow are not possible in the conversion.
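As a minimal sketch of that conversion in C, assuming the usual representation of bfloat16 as the upper 16 bits of an fp32 bit pattern (the function names are my own, not from Intel's white paper):

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Reinterpret the bits of a float as a uint32_t (well-defined via memcpy). */
static uint32_t f32_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Truncating conversion: keep sign + 8-bit exponent + top 7 fraction bits.
   Because bfloat16 shares fp32's exponent field, this can never overflow
   or underflow. */
static uint16_t f32_to_bf16_trunc(float f) {
    return (uint16_t)(f32_bits(f) >> 16);
}

/* Round-to-nearest-even variant: add a rounding term before dropping the
   low 16 bits (a common trick; NaNs would need separate handling, since
   the carry could turn a NaN into an infinity). */
static uint16_t f32_to_bf16_rne(float f) {
    uint32_t u = f32_bits(f);
    uint32_t rounding = 0x7FFFu + ((u >> 16) & 1u);  /* ties go to even */
    return (uint16_t)((u + rounding) >> 16);
}

/* Widening back to fp32 is exact: place the 16 bits in the high half. */
static float bf16_to_f32(uint16_t b) {
    uint32_t u = (uint32_t)b << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

int main(void) {
    float x = 65504.0f;  /* fp16's largest finite value: no problem in bfloat16 */
    printf("%g -> %g (trunc), %g (rne)\n",
           x,
           bf16_to_f32(f32_to_bf16_trunc(x)),
           bf16_to_f32(f32_to_bf16_rne(x)));
    return 0;
}
```

Run on 65504.0f (fp16's maximum), truncation gives 65280 and round-to-nearest gives 65536, both comfortably representable in bfloat16, which illustrates why the fp32-to-bfloat16 conversion cannot overflow.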