nemotron-speech-streaming-en-0.6b-int8

Quantized ONNX model for streaming speech recognition, derived from altunenes/parakeet-rs (nemotron-speech-streaming-en-0.6b).

Quantization Method

Dynamic int8 quantization (onnxruntime quantize_dynamic, QInt8 weights)

Files

File                 Description
encoder.onnx         Quantized encoder (stateful, cache-aware streaming)
decoder_joint.onnx   Quantized decoder + joint network
tokenizer.model      SentencePiece tokenizer (unchanged from source)

Usage

These models are designed for use with parakeet-rs or compatible ONNX Runtime inference pipelines. The encoder is stateful: it carries cache tensors (cache_last_channel, cache_last_time, cache_last_channel_len) across chunks for cache-aware streaming inference.

Source

Quantized from the ONNX models in the nemotron-speech-streaming-en-0.6b/ subdirectory of altunenes/parakeet-rs.
