Any plan to release the streaming model so it can run without vLLM?

by nenad1002 - opened 6 days ago

Currently, Qwen3-ASR "streaming" is available only through vLLM, while the HF / Torch interface is offline-only.
Is the streaming variant a different trained model, or the same weights with a different runtime graph?

Also, is there any plan to release the streaming model so that it can be used with non-vLLM runtimes (e.g. directly in PyTorch through a standalone API)?
