Is BFLOAT16 feasible for folding?
From digging around the inets, I found:
Bfloat16
Fp16 has the drawback for scientific computing of having a limited range, its largest positive number being $6.55 \times 10^4$. This has led to the development of an alternative 16-bit format that trades precision for range. The bfloat16 format is used by Google in its tensor processing units. Intel, which plans to support bfloat16 in its forthcoming Nervana Neural Network Processor, has recently (November 2018) published a white paper that gives a precise definition of the format.
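To make the range gap concrete, the largest finite values can be worked out from the bit widths (this derivation is mine, using the standard IEEE 754 binary16 and bfloat16 definitions, not something from the quoted post):

$\text{fp16:}\quad (2 - 2^{-10}) \times 2^{15} = 65504 \approx 6.55 \times 10^4$

$\text{bfloat16:}\quad (2 - 2^{-7}) \times 2^{127} \approx 3.39 \times 10^{38}$

So bfloat16 covers essentially the same range as fp32, at the cost of far fewer significand bits.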
The allocation of bits to the exponent and significand for bfloat16, fp16, and fp32 is shown in this table, where the implicit leading bit of a normalized number is counted in the significand.
Format      Significand   Exponent
bfloat16    8 bits        8 bits
fp16        11 bits       5 bits
fp32        24 bits       8 bits
Bfloat16 has three fewer bits in the significand than fp16, but three more in the exponent. And it has the same exponent size as fp32. Consequently, converting from fp32 to bfloat16 is easy: the exponent is kept the same and the significand is rounded or truncated from 24 bits to 8; hence overflow and underflow are not possible in the conversion.
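As a minimal sketch of that conversion in C, assuming the usual representation of bfloat16 as the upper 16 bits of an fp32 bit pattern (the function names are my own, not from Intel's white paper):

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Reinterpret the bits of a float as a uint32_t (well-defined via memcpy). */
static uint32_t f32_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Truncating conversion: keep sign + 8-bit exponent + top 7 fraction bits.
   Because bfloat16 shares fp32's exponent field, this can never overflow
   or underflow. */
static uint16_t f32_to_bf16_trunc(float f) {
    return (uint16_t)(f32_bits(f) >> 16);
}

/* Round-to-nearest-even variant: add a rounding term before dropping the
   low 16 bits (a common trick; NaNs would need separate handling, since
   the carry could turn a NaN into an infinity). */
static uint16_t f32_to_bf16_rne(float f) {
    uint32_t u = f32_bits(f);
    uint32_t rounding = 0x7FFFu + ((u >> 16) & 1u);  /* ties go to even */
    return (uint16_t)((u + rounding) >> 16);
}

/* Widening back to fp32 is exact: place the 16 bits in the high half. */
static float bf16_to_f32(uint16_t b) {
    uint32_t u = (uint32_t)b << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

int main(void) {
    float x = 65504.0f;  /* fp16's largest finite value: no problem in bfloat16 */
    printf("%g -> %g (trunc), %g (rne)\n",
           x,
           bf16_to_f32(f32_to_bf16_trunc(x)),
           bf16_to_f32(f32_to_bf16_rne(x)));
    return 0;
}
```

Run on 65504.0f (fp16's maximum), truncation gives 65280 and round-to-nearest gives 65536, both comfortably representable in bfloat16, which illustrates why the fp32-to-bfloat16 conversion cannot overflow.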