Parakeet TDT ASR - VI (20260125)
Exported from the NVIDIA base model, with the encoder optimized as a TensorRT FP32 engine for production deployment.
Model Details
- Base Model: parakeet-tdt-0.6b-v3-vi
- Export Date: 20260125
- TensorRT Version: 10.x
- Precision: FP32
- Batch Configuration: 1-8-16 (min-opt-max)
- Sequence Configuration: 64-512-3000 frames (min-opt-max)
- Target Platform: NVIDIA Triton Inference Server
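The min-opt-max values above describe the range of dynamic shapes the engine accepts. As a rough sketch, a request can be checked against that range before it is sent. The names `ShapeRange` and `fits_profile` are illustrative assumptions and are not part of the exported package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeRange:
    """Inclusive min/opt/max bounds for one dynamic dimension."""
    minimum: int
    optimum: int
    maximum: int

    def contains(self, value: int) -> bool:
        return self.minimum <= value <= self.maximum


# Values copied from the Model Details list above.
BATCH = ShapeRange(1, 8, 16)
FRAMES = ShapeRange(64, 512, 3000)


def fits_profile(batch_size: int, num_frames: int) -> bool:
    """Return True if (batch_size, num_frames) lies inside both ranges."""
    return BATCH.contains(batch_size) and FRAMES.contains(num_frames)
```

Requests outside the range (for example a batch of 17, or fewer than 64 frames) would need to be split or padded by the caller before reaching the encoder.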
Files
- model.nemo: NeMo checkpoint containing decoder, jointer, and tokenizer
- tensorrt/l4/model.plan: TensorRT FP32 engine for encoder (optimized for L4 GPU)
- onnx/: ONNX models folder (portable, CPU/GPU compatible)
  - encoder-*.onnx: ONNX encoder model
  - decoder_joint-*.onnx: ONNX decoder and joint model
Architecture
This model uses a two-stage inference approach:
- Encoder (TensorRT): Fast GPU-accelerated feature extraction
- Decoder + Jointer (PyTorch): RNNT decoding with beam search
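To illustrate the second stage, here is a minimal sketch of a decoding loop over encoder frames. It uses greedy decoding rather than the beam search mentioned above, purely for brevity, and it follows the token-and-duration (TDT) idea of predicting how many frames to skip after each step. The `joint_step` callable, the blank id, and the per-frame symbol limit are hypothetical stand-ins, not the interface of the shipped checkpoint.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: (encoder_frame, tokens_so_far) -> (token_id, duration)
JointStep = Callable[[object, List[int]], Tuple[int, int]]


def greedy_tdt_decode(
    encoder_frames: Sequence[object],
    joint_step: JointStep,
    blank_id: int = 0,               # assumed blank token id
    max_symbols_per_frame: int = 10, # assumed safety limit
) -> List[int]:
    """Greedy token-and-duration decoding over precomputed encoder frames."""
    tokens: List[int] = []
    t = 0
    emitted_at_t = 0
    while t < len(encoder_frames):
        token, duration = joint_step(encoder_frames[t], tokens)
        if token != blank_id:
            tokens.append(token)
            emitted_at_t += 1
        # Guarantee progress: a blank with duration 0, or too many symbols
        # emitted on one frame, forces a move to the next frame.
        if duration == 0 and (token == blank_id or emitted_at_t >= max_symbols_per_frame):
            duration = 1
        if duration > 0:
            t += duration
            emitted_at_t = 0
    return tokens
```

In a real deployment the frames would come from the TensorRT encoder and `joint_step` would wrap the decoder and jointer networks; beam search keeps several candidate token sequences instead of the single one tracked here.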
Usage with Triton
# In the Triton model repository (/models below), create:
# - parakeet_asr_vi/1/model.nemo
# - parakeet_encoder_vi/1/model.plan (TensorRT - recommended for best performance)
# OR
# - parakeet_encoder_vi/1/encoder-temp_rnnt.onnx (ONNX - portable alternative)
# Start Triton server
tritonserver --model-repository=/models
# Make an inference request (Python gRPC client; 8001 is Triton's default gRPC port)
import tritonclient.grpc as grpcclient
client = grpcclient.InferenceServerClient("localhost:8001")
result = client.infer(model_name="parakeet_asr_vi", inputs=[...])
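The repository layout listed at the start of this section can be staged with a short script. The sketch below only copies files into `<model>/<version>/` directories using the model names and version `1` from the comments above. The function name and the placeholder files in the demo are assumptions, and any `config.pbtxt` or backend code a deployment needs is out of scope here.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def stage_repository(repo_root: Path, nemo_file: Path, encoder_file: Path) -> List[Path]:
    """Copy the NeMo checkpoint and one encoder artifact into the repository layout."""
    targets = [
        (nemo_file, repo_root / "parakeet_asr_vi" / "1" / "model.nemo"),
        # The encoder keeps its own file name (model.plan or an .onnx file).
        (encoder_file, repo_root / "parakeet_encoder_vi" / "1" / encoder_file.name),
    ]
    staged = []
    for source, destination in targets:
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, destination)
        staged.append(destination)
    return staged


if __name__ == "__main__":
    # Demo with empty placeholder files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        nemo = tmp_path / "model.nemo"
        plan = tmp_path / "model.plan"
        nemo.touch()
        plan.touch()
        for path in stage_repository(tmp_path / "models", nemo, plan):
            print(path.relative_to(tmp_path))
```

Passing the TensorRT plan or the ONNX encoder as `encoder_file` selects between the two alternatives named in the comments.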
Usage with ONNX Runtime (Portable)
import onnxruntime as ort
# Load encoder and decoder from onnx folder
encoder_session = ort.InferenceSession("onnx/encoder-temp_rnnt.onnx")
decoder_session = ort.InferenceSession("onnx/decoder_joint-temp_rnnt.onnx")
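# Run inference. audio_features is the preprocessed feature array prepared by the caller.
# The input names 'audio' and 'encoder_output' must match the exported graphs;
# check them with encoder_session.get_inputs() and decoder_session.get_inputs().
encoder_out = encoder_session.run(None, {'audio': audio_features})
decoder_out = decoder_session.run(None, {'encoder_output': encoder_out[0]})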
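Because the feed dictionaries above must use the tensor names stored in the exported graphs, it can help to list those names first. The helper below is a small sketch; `describe_io` is a hypothetical name, and it relies only on the `get_inputs()` and `get_outputs()` methods of an ONNX Runtime session.

```python
from typing import Any, Dict, List


def describe_io(session: Any) -> Dict[str, List[str]]:
    """List input and output tensors as 'name: type shape' strings.

    `session` is expected to be an onnxruntime.InferenceSession; only its
    get_inputs() and get_outputs() methods are used.
    """
    def fmt(node: Any) -> str:
        return f"{node.name}: {node.type} {node.shape}"

    return {
        "inputs": [fmt(node) for node in session.get_inputs()],
        "outputs": [fmt(node) for node in session.get_outputs()],
    }


# Example (requires onnxruntime and the exported files):
# import onnxruntime as ort
# print(describe_io(ort.InferenceSession("onnx/encoder-temp_rnnt.onnx")))
```

Running it on both the encoder and the decoder/joint file shows which inputs each graph expects, including any length or state tensors that the two-line example above does not pass.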
Performance
- Latency: ~50-100ms for typical audio (optimized batch size 8)
- Throughput: up to 16 requests per batch (the maximum batch size)
- Max Audio Duration: ~30 seconds (3000 frames at 100 frames per second)
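The duration limit follows directly from the frame limit: 3000 frames / 100 frames per second = 30 seconds. For longer recordings, one option is to split the feature sequence into pieces that fit. The sketch below computes non-overlapping chunk boundaries; the function names are illustrative, and splitting without overlap is a simplifying assumption rather than a method described by this model card.

```python
from typing import Iterator, Tuple

MAX_FRAMES = 3000        # upper bound from the sequence configuration
FRAMES_PER_SECOND = 100  # frame rate quoted in the Performance list


def max_duration_seconds() -> float:
    """Longest audio, in seconds, that fits in one encoder call."""
    return MAX_FRAMES / FRAMES_PER_SECOND


def chunk_bounds(total_frames: int, chunk_frames: int = MAX_FRAMES) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame indices that cover total_frames without overlap."""
    if chunk_frames <= 0:
        raise ValueError("chunk_frames must be positive")
    for start in range(0, total_frames, chunk_frames):
        yield start, min(start + chunk_frames, total_frames)
```

A 70-second input (7000 frames) yields three chunks, the last one 1000 frames long. A final chunk shorter than the 64-frame minimum would need padding, and cutting at fixed positions can split a word across two chunks.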
Model Card
For deployment instructions and examples, see:
- actableai/parakeet-tdt-0.6b-v3-vi (reference deployment)
- Triton Inference Server Documentation
Citation
@misc{parakeet-tdt-vi-20260125,
title={Parakeet TDT ASR - VI},
author={NVIDIA and ActableAI},
year={2026},
url={https://huggingface.co/actableai/parakeet-tdt-0.6b-v3-vi-20260125}
}
License
This model is released under the CC-BY-4.0 license.