What impact has quantization had on model performance / ability?
#4
by spanspek - opened
The README for this quantized model still shows the benchmark results of the full, un-quantized model.
Generally speaking, models quantized to FP8 lose almost no accuracy, but beyond that point we know performance starts to degrade.
Are benchmarks available for this model at this specific level of quantization?
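For context on why FP8 tends to be nearly lossless: with 3 mantissa bits (e4m3) the worst-case relative rounding error is about 6%, and the average is much lower. Here is a minimal sketch that simulates per-tensor-scaled e4m3 rounding on Gaussian weights (this is a toy illustration, not this model's actual quantization recipe):

```python
import numpy as np

def quantize_e4m3(x):
    """Round-to-nearest simulation of FP8 e4m3 (4 exponent bits, 3 mantissa bits)."""
    x = np.clip(x, -448.0, 448.0)          # e4m3 max normal value is 448
    out = np.zeros_like(x)
    nz = x != 0
    sign = np.sign(x[nz])
    mag = np.abs(x[nz])
    e = np.clip(np.floor(np.log2(mag)), -6, 8)  # exponent range; 2^-6 = min normal
    scale = 2.0 ** e
    frac = mag / scale                      # in [1, 2) for normals
    q = np.round(frac * 8) / 8              # keep 3 mantissa bits
    out[nz] = sign * q * scale
    return out

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=10_000)      # typical weight-scale Gaussian
s = 448.0 / np.abs(w).max()                 # per-tensor scale, as FP8 schemes use
wq = quantize_e4m3(w * s) / s
rel_err = np.abs(wq - w) / np.maximum(np.abs(w), 1e-12)
print(f"mean relative error: {rel_err.mean():.4f}")
```

The mean relative error comes out around a few percent, which is usually below the noise floor of downstream benchmarks; lower-bit formats (INT4, etc.) have much coarser steps, which is where measurable degradation shows up. That's why per-quantization-level benchmarks would be useful here.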
Thank you.