Exporters
New export backends can be added to Transformers by subclassing HfExporter.
Learn how to use the built-in exporters in the Exporters guide.
AutoHfExporter
The Auto-HF exporter class that automatically instantiates the correct HfExporter for a given ExportConfig.
Load an exporter instance from a pretrained model/checkpoint that ships an export config.
Not implemented yet — placeholder for a first-class “export recipe” workflow.
The idea: model owners publish an export_config.json (or an export_config field in config.json) alongside their weights on the Hub. That file captures the settings the
owner has already validated for their architecture — the target format (dynamo / onnx / executorch), exact dynamic-shape specs (e.g. text_ids dynamic to 4096,
image tiles fixed at 448, batch=1 for edge deployment), strict flag, ONNX opset,
prefill vs. decode layout, ExecuTorch backend choice, and any other knob that today lives
as tribal knowledge in a README or a private notebook.
Consumers then get the owner-validated export in one call:
    exporter = AutoHfExporter.from_pretrained("org/model-name")
    program = exporter.export(model, inputs)

This composes with the register_export_input_preparer registry: the owner supplies the
shape spec via export_config.json, transformers supplies the data-dependent
precomputations (cu_seqlens, vision position ids, window indices, …) for that
architecture. Together they cover the two hard parts of exporting new models — knowing
the right shape contract and preparing the right inputs — so downstream users don’t
re-derive either from scratch (and don’t break in production when they get it wrong).
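Because this workflow is not implemented, no file schema exists yet. The sketch below is purely hypothetical: it shows one way such a recipe could be written down and round-tripped through JSON. The field names export_format, strict, opset_version and dynamic_shapes mirror the config classes documented on this page; the nested encoding of the shape bounds and the opset value are assumptions made for illustration.

```python
import json

# Hypothetical recipe. This is NOT a published schema: the nested layout of
# "dynamic_shapes" and the opset value are illustrative assumptions.
recipe = {
    "export_format": "onnx",  # one of: dynamo / onnx / executorch
    "strict": False,
    "opset_version": 18,  # assumed value, for illustration only
    "dynamic_shapes": {
        # assumed encoding of "text_ids dynamic up to 4096" on dimension 1
        "text_ids": {"1": {"min": 1, "max": 4096}},
    },
}

# Serialize the recipe the way an owner might ship it next to the weights.
serialized = json.dumps(recipe, indent=2, sort_keys=True)

# A consumer reads it back and recovers the same settings.
loaded = json.loads(serialized)
print(loaded["export_format"], loaded["dynamic_shapes"]["text_ids"]["1"]["max"])
```

Dimension indices are written as strings here because JSON object keys are always strings; a real implementation would have to convert them back before building shape specs.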
Return True if the provided dict describes an export_format that has both a
registered config class and a registered exporter class. Warns with an actionable message
when the format is missing entirely, unknown, or only half-registered.
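The three warning cases described above (format missing, unknown, or half-registered) can be sketched as follows. This is not the library's implementation: the function name, both registries and the exporter class names are hypothetical stand-ins so the snippet runs without Transformers.

```python
import warnings

# Hypothetical stand-in registries; the real ones live inside Transformers.
CONFIG_REGISTRY = {"dynamo": "DynamoConfig", "onnx": "OnnxConfig", "executorch": "ExecutorchConfig"}
# "executorch" is deliberately left out to demonstrate the half-registered case.
EXPORTER_REGISTRY = {"dynamo": "DynamoExporterSketch", "onnx": "OnnxExporterSketch"}


def supports_export_config(config_dict: dict) -> bool:
    """Return True only if the format has both a config class and an exporter class."""
    fmt = config_dict.get("export_format")
    if fmt is None:
        warnings.warn("Export config has no 'export_format' key; add one, e.g. 'dynamo'.")
        return False

    has_config = fmt in CONFIG_REGISTRY
    has_exporter = fmt in EXPORTER_REGISTRY
    if not has_config and not has_exporter:
        warnings.warn(f"Unknown export_format {fmt!r}; known formats: {sorted(CONFIG_REGISTRY)}.")
        return False
    if has_config != has_exporter:
        missing = "exporter" if has_config else "config"
        warnings.warn(f"export_format {fmt!r} has no registered {missing} class.")
        return False
    return True
```

Returning False with a warning, rather than raising, lets a caller fall back to a default path when a checkpoint ships a config it cannot use.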
AutoExportConfig
The Auto-HF export config class that automatically dispatches to the correct export config class, given an export config stored as a dictionary.
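A minimal sketch of dictionary-based dispatch, assuming the export_format key selects the class. The two dataclasses are simplified stand-ins that copy a few fields from the signatures documented further down this page; the registry and the config_from_dict helper are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class DynamoConfigSketch:
    """Stand-in with a subset of the documented DynamoConfig fields."""
    export_format: str = "dynamo"
    dynamic: bool = False
    strict: bool = False


@dataclass
class ExecutorchConfigSketch(DynamoConfigSketch):
    """Inherits the fields above and adds the documented 'backend' field."""
    export_format: str = "executorch"
    backend: str = "xnnpack"


# Hypothetical registry mapping the format string to a config class.
CONFIG_CLASSES = {"dynamo": DynamoConfigSketch, "executorch": ExecutorchConfigSketch}


def config_from_dict(config_dict: dict):
    """Pick the config class from 'export_format' and build an instance."""
    fmt = config_dict.get("export_format")
    if fmt not in CONFIG_CLASSES:
        raise ValueError(f"Unsupported export_format: {fmt!r}")
    cls = CONFIG_CLASSES[fmt]
    unknown = set(config_dict) - {f.name for f in fields(cls)}
    if unknown:
        raise ValueError(f"Unknown fields for {cls.__name__}: {sorted(unknown)}")
    return cls(**config_dict)


config = config_from_dict({"export_format": "executorch", "strict": True})
print(type(config).__name__, config.backend, config.strict)
```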
HfExporter
Abstract base class for all Transformers exporters.
Subclass it and implement HfExporter.export to add a new export backend.
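The subclassing contract can be illustrated with a toy backend. To keep the snippet runnable without Transformers, ExporterBase below is a hypothetical stand-in for HfExporter whose export signature follows the one documented on this page; SignatureExporter is an invented example backend, not part of the library.

```python
from abc import ABC, abstractmethod
from typing import Any, Mapping


class ExporterBase(ABC):
    """Hypothetical stand-in for HfExporter, so this sketch runs on its own."""

    @abstractmethod
    def export(self, model: Any, sample_inputs: Mapping[str, Any], config: Any) -> Any:
        """Export the model and return a backend-specific program object."""


class SignatureExporter(ExporterBase):
    """Toy backend: 'exports' a model by recording the inputs it was traced with."""

    def export(self, model: Any, sample_inputs: Mapping[str, Any], config: Any) -> Any:
        return {
            "model": type(model).__name__,
            # Record each input's shape if it has one (tensors do, lists do not).
            "inputs": {name: getattr(value, "shape", None) for name, value in sample_inputs.items()},
            "config": config,
        }


artifact = SignatureExporter().export(object(), {"input_ids": [1, 2, 3]}, config=None)
print(artifact)
```

Because export is abstract, instantiating the base class directly raises TypeError, which is what forces each backend to supply its own implementation.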
export
( model: PreTrainedModel, sample_inputs: MutableMapping[str, torch.Tensor | Cache], config: ExportConfigMixin )
Parameters
- model (PreTrainedModel): The model to export.
- sample_inputs (dict[str, torch.Tensor | Cache]): Forward kwargs, i.e. what you’d pass to model(**sample_inputs). These are used directly as the example inputs during tracing. For an autoregressive decode-step export, this means you need to include past_key_values, cache_position, etc. If you only have generation-style inputs, use HfExporter.export_for_generation instead; it runs model.generate for you and exports each stage.
- config (ExportConfigMixin): Backend-specific configuration.
Export the model and return the backend-specific program object.
export_for_generation
( model: PreTrainedModel, sample_inputs: MutableMapping[str, torch.Tensor | Cache], config: ExportConfigMixin | dict[str, ExportConfigMixin] ) → dict[str, Any]
Parameters
- model (PreTrainedModel): The generative model to export. Must support model.generate(**sample_inputs).
- sample_inputs (dict[str, torch.Tensor | Cache]): Generate kwargs, i.e. what you’d pass to model.generate(**sample_inputs) (typically input_ids + attention_mask, plus any modality inputs like pixel_values / input_features for multi-modal models). Per-stage forward kwargs are captured internally.
- config (ExportConfigMixin or dict[str, ExportConfigMixin]): Backend-specific configuration. Pass a single config to apply to every component, or a dict keyed by component name (e.g. "image_encoder", "language_model", "lm_head", "decode") to override per component. When a dict is passed, all component names must be present in it.
Returns
dict[str, Any]
{component_name: backend_specific_artifact}, with the same keys as
decompose_for_generation(). Values are whatever
HfExporter.export returns for the concrete backend (ExportedProgram,
ONNXProgram, ExecutorchProgramManager).
Decompose a generative model and export each component independently.
Thin wrapper around decompose_for_generation() that calls HfExporter.export on every returned (submodel, forward_inputs) pair. If you need
the intermediate (submodel, forward_inputs) pairs (for verification, custom inputs,
skipping a stage, …), call decompose_for_generation() directly.
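The rule for the config argument (one config for every component, or a dict that must name every component) can be sketched as below. The helper name resolve_component_configs is hypothetical, and raising KeyError on a missing component is an assumption about how the check might surface; the component names are the examples given in the parameter description.

```python
from typing import Any, Mapping


def resolve_component_configs(config: Any, component_names: list[str]) -> dict[str, Any]:
    """Expand a single config, or validate a per-component dict, into one config per component."""
    if isinstance(config, Mapping):
        missing = [name for name in component_names if name not in config]
        if missing:
            # Assumed behaviour: fail loudly rather than silently fall back to a default.
            raise KeyError(f"No export config for component(s): {missing}")
        return {name: config[name] for name in component_names}
    # A single config object is shared by every component.
    return {name: config for name in component_names}


components = ["image_encoder", "language_model", "lm_head", "decode"]
shared = resolve_component_configs("shared-config", components)
print(shared)
```

The strings used as configs here are placeholders; in practice each value would be an ExportConfigMixin instance.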
Check that required_packages are installed and warn on version drift from tested_versions.
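A rough sketch of such an environment check using only the standard library. The function name, its parameters' types, and the choice to raise ImportError for a missing package are assumptions; "drift" is reduced here to a plain string inequality, which is cruder than real version comparison.

```python
import warnings
from importlib import metadata
from typing import Callable, Iterable, Mapping


def check_environment(
    required_packages: Iterable[str],
    tested_versions: Mapping[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> None:
    """Raise if a required package is missing; warn if its version differs from the tested one."""
    for package in required_packages:
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError as err:
            # Assumed behaviour: a missing dependency is a hard error.
            raise ImportError(f"{package} is required for this exporter but is not installed.") from err

        tested = tested_versions.get(package)
        if tested is not None and installed != tested:
            warnings.warn(
                f"{package}=={installed} differs from the tested version {tested}; "
                "the export may behave differently."
            )
```

The get_version parameter exists only so the check can be exercised without installing anything; by default it queries the installed distributions through importlib.metadata.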
DynamoConfig
class transformers.exporters.DynamoConfig
( export_format: ExportFormat = <ExportFormat.DYNAMO: 'dynamo'>, dynamic: bool = False, strict: bool = False, dynamic_shapes: dict[str, typing.Any] | None = None, prefer_deferred_runtime_asserts_over_guards: bool = False )
Parameters
- dynamic (bool, optional, defaults to False) — Whether to export with dynamic (symbolic) shapes. When True and dynamic_shapes is not set, all tensor dimensions are set to Dim.AUTO automatically.
- strict (bool, optional, defaults to False) — Whether to enable strict mode in torch.export. Runs the full symbolic trace and catches more errors, but is slower and more likely to fail on complex models.
- dynamic_shapes (dict[str, Any], optional) — Explicit per-input dynamic shape specifications passed to torch.export. Takes precedence over dynamic.
- prefer_deferred_runtime_asserts_over_guards (bool, optional, defaults to False): When True, data-dependent shape guards are emitted as runtime asserts in the exported graph instead of failing the export at trace time when a guard wouldn’t hold across the full symbolic shape range. Most transformer LLMs need this set to True when using fine-grained Dim(min=, max=) bounds. Not needed with dynamic=True / Dim.AUTO, where torch.export infers shape relations instead of verifying them against the user-stated bounds.
Configuration class for exporting models via torch.export.
OnnxConfig
class transformers.exporters.OnnxConfig
( export_format: ExportFormat = <ExportFormat.ONNX: 'onnx'>, dynamic: bool = False, strict: bool = False, dynamic_shapes: dict[str, typing.Any] | None = None, prefer_deferred_runtime_asserts_over_guards: bool = False, output_path: str | os.PathLike | None = None, opset_version: int | None = None, external_data: bool = True, optimize: bool = True, export_params: bool = True, keep_initializers_as_inputs: bool = False )
Parameters
- output_path (str or PathLike, optional): Output path for the .onnx file. When None (default) the exported model is kept in memory as an ONNXProgram and not written to disk.
- opset_version (int, optional): ONNX opset version to target. Defaults to the latest opset supported by the installed onnxscript version.
- external_data (bool, optional, defaults to True): Store large weight tensors in a separate .onnx_data sidecar file instead of embedding them in the protobuf. Required for models whose weights exceed the 2 GB protobuf limit.
- optimize (bool, optional, defaults to True): Run onnxscript optimisation passes (constant folding, dead-code elimination, …) on the exported graph. Disable for models that hit upstream onnxscript optimiser bugs.
- export_params (bool, optional, defaults to True): Embed model weights in the ONNX graph. Set to False to export a weight-free graph (weights must be supplied at runtime).
- keep_initializers_as_inputs (bool, optional, defaults to False): Expose weight initializers as explicit graph inputs. Required by some older ONNX runtimes (opset < 9).
Configuration class for exporting models to ONNX via torch.onnx.export.
Inherits all fields from DynamoConfig (dynamic, strict, dynamic_shapes, prefer_deferred_runtime_asserts_over_guards).
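To get a feel for when the 2 GB protobuf limit mentioned under external_data comes into play, a back-of-the-envelope estimate from parameter count and dtype width is usually enough. The helper below is a hypothetical illustration, not part of the library; it counts weights only and ignores graph metadata, so treat results near the threshold with caution.

```python
# Approximate protobuf size ceiling, taken here as 2 GiB.
PROTOBUF_LIMIT_BYTES = 2 * 1024**3


def needs_external_data(num_parameters: int, bytes_per_parameter: int) -> bool:
    """Estimate whether the weights alone would exceed the protobuf limit."""
    return num_parameters * bytes_per_parameter > PROTOBUF_LIMIT_BYTES


# A 7B-parameter model in float16 (2 bytes per weight) is far above the limit.
print(needs_external_data(7_000_000_000, 2))
# A 125M-parameter model in float32 (4 bytes per weight) fits comfortably.
print(needs_external_data(125_000_000, 4))
```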
ExecutorchConfig
class transformers.exporters.ExecutorchConfig
( export_format: ExportFormat = <ExportFormat.EXECUTORCH: 'executorch'>, dynamic: bool = False, strict: bool = False, dynamic_shapes: dict[str, typing.Any] | None = None, prefer_deferred_runtime_asserts_over_guards: bool = False, backend: str = 'xnnpack' )
Configuration class for exporting models to ExecuTorch format.
Inherits all fields from DynamoConfig (dynamic, strict, dynamic_shapes, prefer_deferred_runtime_asserts_over_guards).