ONNX Support
It seems that this model does not support ONNX, which is required by the Transformers.js npm library:
```js
import { pipeline } from "@huggingface/transformers";

const pipe = await pipeline(
  "text-generation",
  "ibm-granite/granite-4.0-h-small",
);

const output = await pipe("Once upon a time, there was", {
  temperature: 0,
  max_new_tokens: 20,
});
```
Error:

```
Error: Could not locate file: "https://huggingface.co/ibm-granite/granite-4.0-h-small/resolve/main/onnx/model.onnx".
    at handleError (webpack://huggingface/transformers/src/utils/hub.js:282:1)
    at getModelFile (webpack://huggingface/transformers/src/utils/hub.js:555:1)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async getSession (webpack://huggingface/transformers/src/models.js:336:1)
    at async <anonymous> (webpack://huggingface/transformers/src/models.js:353:71)
    at async Promise.all (index 0)
    at async constructSessions (webpack://huggingface/transformers/src/models.js:351:1)
    at async Promise.all (index 0)
    at async GraniteMoeHybridForCausalLM.from_pretrained (webpack://huggingface/transformers/src/models.js:1145:1)
    at async AutoModelForCausalLM.from_pretrained (webpack://huggingface/transformers/src/models.js:7903:1)
```
Hi @jose-biescas! You're correct that this model does not have native ONNX support. That's true for two reasons:
1. The models in this org each ship in only a single format, so putting `model.onnx` inside this repo would double the download size unless the user knows to download only the `safetensors` or `model.onnx` files, depending on their use case.
2. These hybrid-recurrent models rely on core operations that are not yet fully implemented in ONNX.
Reason (1) will stick around, but once (2) is solved, we'll have ONNX versions available in different model repos. We keep most of our other format conversions in the Granite Quantized Models collection, though that isn't a particularly accurate name for non-quantized format conversions and we don't currently have official ONNX conversions. Currently, the best place to find ONNX conversions is in onnx-community: https://huggingface.co/models?search=onnx-community/granite.
As for the status of (2), there has been some significant progress made by @Xenova . Last I checked, he had most of the core operations working, but there was a WebGPU bug preventing some of them from running on the GPU. There is a converted version of the smallest hybrid model available here: https://huggingface.co/onnx-community/granite-4.0-h-350m-ONNX. You can experiment with this to ensure that the architecture runs for you.
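To experiment with that converted repo, a minimal sketch is to point the same `pipeline` call at the onnx-community model id. The `dtype` option shown below is an assumption; check the repo's files for the quantizations actually published.

```js
import { pipeline } from "@huggingface/transformers";

// Sketch: same text-generation pipeline, but using the ONNX
// conversion of the smallest hybrid model instead of the
// safetensors-only ibm-granite repo.
const pipe = await pipeline(
  "text-generation",
  "onnx-community/granite-4.0-h-350m-ONNX",
  { dtype: "q4" }, // assumed quantization; omit to use the default
);

const output = await pipe("Once upon a time, there was", {
  max_new_tokens: 20,
});
console.log(output);
```

The first run downloads and caches the ONNX weights, so expect some startup latency before generation begins.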
Hey! Thank you for the thorough explanation! I appreciate the alternative and I’ll definitely give it a try. And good luck, love the work that’s being put into Granite.
@jose-biescas I touched base with @Xenova and he's now got two more ONNX-converted models up with smaller sizes that should fit well with Transformers.js:
Just note that you'd need to use Transformers.js v4 to run the models efficiently on WebGPU! https://github.com/huggingface/transformers.js/pull/1382
(You can install via this PR/source, or wait a couple of weeks until we put out our first dev NPM release.)
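Installing from the PR source might look like the sketch below. The local branch name and build command are assumptions; follow the build instructions in the transformers.js repo if they differ.

```shell
# Fetch the v4 PR (#1382) into a local branch and build the package.
git clone https://github.com/huggingface/transformers.js.git
cd transformers.js
git fetch origin pull/1382/head:transformersjs-v4
git checkout transformersjs-v4
npm install
npm run build    # assumed build script; see the repo's README

# Then, in your own project, install the local checkout:
npm install /path/to/transformers.js
```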