Why is F16 used to represent MXFP4?
#14
by czl - opened
May I know the rationale for calling the MXFP4_MOE quant F16?
- I understand the MX formats from the Microscaling paper; it shows that MXFP6 performance is comparable to FP32.
- A comparison in NVIDIA's blog pits MXFP4 against FP8 and points out that MXFP4 may have a noticeable accuracy drop versus FP8.
- In that case, wouldn't FP8 be more suitable?
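For context, an MXFP4 block per the OCP Microscaling (MX) spec pairs 32 four-bit E2M1 elements with one shared E8M0 power-of-two scale. A minimal decoding sketch (function and variable names here are illustrative, not from any library):

```python
# All 16 E2M1 code points, indexed by the 4-bit code
# (sign bit, 2 exponent bits with bias 1, 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
               -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]

def decode_e8m0(scale_byte: int) -> float:
    """E8M0 scale: an 8-bit biased exponent (bias 127), no mantissa.
    (The spec reserves 0xFF for NaN; omitted here for brevity.)"""
    return 2.0 ** (scale_byte - 127)

def decode_mxfp4_block(scale_byte: int, codes: list[int]) -> list[float]:
    """Decode one MX block: up to 32 four-bit codes sharing one scale."""
    scale = decode_e8m0(scale_byte)
    return [E2M1_VALUES[c & 0xF] * scale for c in codes]

# Example: scale byte 128 means 2^(128-127) = 2, applied to three codes.
block = decode_mxfp4_block(128, [0b0001, 0b0111, 0b1010])
# 0.5*2 = 1.0, 6.0*2 = 12.0, -1.0*2 = -2.0
```

The coarse 8-value magnitude grid of E2M1 is why MXFP4 can show a noticeable accuracy drop relative to FP8, which keeps per-element exponent and mantissa bits.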
Because OpenAI originally released the model in this format, F16 is technically the model's 'original' precision. Only the MoE layers are quantized; if you want all layers unquantized, go for the B32 version.
Thanks for clarifying.
czl changed discussion status to closed