ONNX Support
It seems that this model does not support ONNX, which is required by the Transformers.js npm library:
```js
import { pipeline } from "@huggingface/transformers";

const pipe = await pipeline(
  "text-generation",
  "ibm-granite/granite-4.0-h-small",
);

const output = await pipe("Once upon a time, there was", {
  temperature: 0,
  max_new_tokens: 20,
});
```
Error:

```
Error: Could not locate file: "https://huggingface.co/ibm-granite/granite-4.0-h-small/resolve/main/onnx/model.onnx".
    at handleError (webpack://huggingface/transformers/src/utils/hub.js:282:1)
    at getModelFile (webpack://huggingface/transformers/src/utils/hub.js:555:1)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async getSession (webpack://huggingface/transformers/src/models.js:336:1)
    at async <anonymous> (webpack://huggingface/transformers/src/models.js:353:71)
    at async Promise.all (index 0)
    at async constructSessions (webpack://huggingface/transformers/src/models.js:351:1)
    at async Promise.all (index 0)
    at async GraniteMoeHybridForCausalLM.from_pretrained (webpack://huggingface/transformers/src/models.js:1145:1)
    at async AutoModelForCausalLM.from_pretrained (webpack://huggingface/transformers/src/models.js:7903:1)
```
Hi @jose-biescas! You're correct that this model does not have native ONNX support. That's true for two reasons:
1. The models in this org each ship in only a single format, so putting `model.onnx` inside this repo would double the download size unless the user knows to download only the `safetensors` or `model.onnx` files, depending on their use case.
2. These hybrid-recurrent models rely on core operations that are not yet fully implemented in ONNX.
Reason (1) will stick around, but once (2) is solved, we'll have ONNX versions available in different model repos. We keep most of our other format conversions in the Granite Quantized Models collection, though that isn't a particularly accurate name for non-quantized format conversions and we don't currently have official ONNX conversions. Currently, the best place to find ONNX conversions is in onnx-community: https://huggingface.co/models?search=onnx-community/granite.
As for the status of (2), there has been some significant progress made by @Xenova . Last I checked, he had most of the core operations working, but there was a WebGPU bug preventing some of them from running on the GPU. There is a converted version of the smallest hybrid model available here: https://huggingface.co/onnx-community/granite-4.0-h-350m-ONNX. You can experiment with this to ensure that the architecture runs for you.
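To experiment with that converted repo, a minimal sketch is to point the same `pipeline` call at the onnx-community model id. The `dtype` option shown below is an assumption; check the repo's files for the quantizations actually published.

```js
import { pipeline } from "@huggingface/transformers";

// Sketch: same text-generation pipeline, but using the ONNX
// conversion of the smallest hybrid model instead of the
// safetensors-only ibm-granite repo.
const pipe = await pipeline(
  "text-generation",
  "onnx-community/granite-4.0-h-350m-ONNX",
  { dtype: "q4" }, // assumed quantization; omit to use the default
);

const output = await pipe("Once upon a time, there was", {
  max_new_tokens: 20,
});
console.log(output);
```

The first run downloads and caches the ONNX weights, so expect some startup latency before generation begins.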
Hey! Thank you for the thorough explanation! I appreciate the alternative and I’ll definitely give it a try. And good luck, love the work that’s being put into Granite.
@jose-biescas I touched base with @Xenova and he's now got two more ONNX-converted models up with smaller sizes that should fit well with Transformers.js:
Just note that you'd need to use Transformers.js v4 to run the models efficiently on WebGPU! https://github.com/huggingface/transformers.js/pull/1382
(You can install via this PR/source, or wait a couple of weeks until we put out our first dev NPM release.)
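Installing from the PR source might look like the sketch below. The local branch name and build command are assumptions; follow the build instructions in the transformers.js repo if they differ.

```shell
# Fetch the v4 PR (#1382) into a local branch and build the package.
git clone https://github.com/huggingface/transformers.js.git
cd transformers.js
git fetch origin pull/1382/head:transformersjs-v4
git checkout transformersjs-v4
npm install
npm run build    # assumed build script; see the repo's README

# Then, in your own project, install the local checkout:
npm install /path/to/transformers.js
```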