What impact has quantization had on model performance / ability?
#4
by spanspek - opened
The README for this quantized model still shows the benchmark results of the full, un-quantized model.
Generally speaking, models quantized to FP8 lose almost no accuracy, but beyond that point we know performance starts to degrade.
Are benchmarks available for this model at this specific level of quantization?
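For context on why FP8 tends to be nearly lossless: with 3 mantissa bits (e4m3) the worst-case relative rounding error is about 6%, and the average is much lower. Here is a minimal sketch that simulates per-tensor-scaled e4m3 rounding on Gaussian weights (this is a toy illustration, not this model's actual quantization recipe):

```python
import numpy as np

def quantize_e4m3(x):
    """Round-to-nearest simulation of FP8 e4m3 (4 exponent bits, 3 mantissa bits)."""
    x = np.clip(x, -448.0, 448.0)          # e4m3 max normal value is 448
    out = np.zeros_like(x)
    nz = x != 0
    sign = np.sign(x[nz])
    mag = np.abs(x[nz])
    e = np.clip(np.floor(np.log2(mag)), -6, 8)  # exponent range; 2^-6 = min normal
    scale = 2.0 ** e
    frac = mag / scale                      # in [1, 2) for normals
    q = np.round(frac * 8) / 8              # keep 3 mantissa bits
    out[nz] = sign * q * scale
    return out

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=10_000)      # typical weight-scale Gaussian
s = 448.0 / np.abs(w).max()                 # per-tensor scale, as FP8 schemes use
wq = quantize_e4m3(w * s) / s
rel_err = np.abs(wq - w) / np.maximum(np.abs(w), 1e-12)
print(f"mean relative error: {rel_err.mean():.4f}")
```

The mean relative error comes out around a few percent, which is usually below the noise floor of downstream benchmarks; lower-bit formats (INT4, etc.) have much coarser steps, which is where measurable degradation shows up. That's why per-quantization-level benchmarks would be useful here.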
Thank you.