doc-extractor-vl
Document data extraction model based on Qwen2.5-VL-7B-Instruct, configured for structured JSON output from document images (invoices, forms, receipts, etc.).
Key Features
- Cyrillic-free output: Includes a pre-computed logit bias file that blocks all 4129 Cyrillic tokens, preventing the Cyrillic/Latin script confusion common in multilingual VL models
- Structured JSON output: System prompt enforces JSON-only responses
- Multilingual: Optimized for Slovenian, English, German, Croatian and other Latin-script languages
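Since the system prompt asks for JSON-only responses, a client can usually pass the reply straight to a JSON parser. The sketch below is one defensive way to do that. The helper name `parse_json_response` is hypothetical, and the fence-stripping step is only a precaution, on the assumption that a model may occasionally wrap its reply in a Markdown code fence.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def parse_json_response(text: str) -> dict:
    """Parse a model reply that is expected to hold a single JSON object."""
    cleaned = text.strip()
    # Defensive assumption: drop a surrounding Markdown fence if one is present.
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()
        if lines[-1].strip() == FENCE:
            lines = lines[:-1]
        cleaned = "\n".join(lines[1:]).strip()
    data = json.loads(cleaned)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


reply = '{"issuer": "ACME d.o.o.", "total": "12.50"}'  # illustrative reply text
print(parse_json_response(reply)["total"])
```

Raising on a non-object reply keeps extraction failures visible instead of passing partial data downstream.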
Files
| File | Description |
|---|---|
| `cyrillic_logit_bias.json` | 4129 token IDs with bias -100 to block Cyrillic generation |
| `system_prompt.txt` | System prompt template for document extraction |
| `serving_config.yaml` | Recommended vLLM serving parameters |
| `generate_cyrillic_bias.py` | Script to regenerate the logit bias file |
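The contents of `generate_cyrillic_bias.py` are not shown here, so the following is only a sketch of how such a bias map could be built, not the repository's script. It assumes each token ID has already been decoded to text (the toy vocabulary and its IDs are hypothetical) and that the output file maps token-ID strings to a bias value.

```python
import json
import unicodedata


def contains_cyrillic(text: str) -> bool:
    """Return True if any character's Unicode name marks it as Cyrillic."""
    return any(unicodedata.name(ch, "").startswith("CYRILLIC") for ch in text)


def build_logit_bias(decoded_vocab: dict[int, str], bias: int = -100) -> dict[str, int]:
    """Map the ID of every token whose text contains Cyrillic to `bias`."""
    return {
        str(token_id): bias
        for token_id, text in sorted(decoded_vocab.items())
        if contains_cyrillic(text)
    }


# Toy vocabulary standing in for a real tokenizer (hypothetical IDs).
toy_vocab = {
    0: "a",            # Latin a
    1: "\u0430",       # Cyrillic a, visually identical to Latin a
    2: " total",
    3: "\u0434\u0430", # Cyrillic "da"
    4: "\u010d",       # Latin c with caron, must stay allowed
}
bias_map = build_logit_bias(toy_vocab)
print(json.dumps(bias_map))
```

With a real tokenizer, the decoded text of each ID would replace the toy dictionary. Byte-level tokens that hold only part of a character would need separate handling, which this sketch does not attempt.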
Usage with vLLM
Serving
vllm serve mikrografija/doc-extractor-vl --max-model-len 4096
Request with Cyrillic blocking
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

# Load Cyrillic logit bias
with open("cyrillic_logit_bias.json") as f:
    cyrillic_bias = {int(k): v for k, v in json.load(f).items()}

# Load system prompt
with open("system_prompt.txt") as f:
    system_prompt = f.read()

response = client.chat.completions.create(
    model="mikrografija/doc-extractor-vl",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
            {"type": "text", "text": "Extract data into this JSON schema: {\"issuer\": \"\", \"date\": \"\", \"total\": \"\", \"items\": []}"}
        ]}
    ],
    logit_bias=cyrillic_bias,
    temperature=0.0,
    max_tokens=4096,
)
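The request above passes the image as a base64 data URL. A minimal sketch of building one from a local file follows. The helper name `image_to_data_url` and the example file name are hypothetical, and the MIME type is guessed from the file extension rather than from the file contents.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for an `image_url` content part."""
    mime, _ = mimetypes.guess_type(path)  # guessed from the extension only
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

For example, `image_to_data_url("invoice.png")` would return a string beginning with `data:image/png;base64,` that can be used as the `url` value in the request.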
Why Cyrillic Blocking?
Qwen2.5-VL models are trained on multilingual data including Cyrillic scripts. When processing Latin-script documents (especially Slovenian, Croatian, or other languages with diacritics), the model occasionally substitutes Latin characters with visually similar Cyrillic characters (e.g., Latin "a" → Cyrillic "а"). The logit bias blocks this at the decoding level: a bias of -100 on every Cyrillic token ID effectively prevents the model from generating those tokens.
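Because these substitutions are visually indistinguishable, they are easy to miss by eye. The sketch below shows one way to detect them in extracted text, for example to confirm that the blocking is active. The function name `find_cyrillic` and the sample string are illustrative assumptions, not part of this repository.

```python
import unicodedata


def find_cyrillic(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each Cyrillic character."""
    hits = []
    for index, ch in enumerate(text):
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            hits.append((index, ch, name))
    return hits


# "Račun" with a Cyrillic a (U+0430) in place of the Latin a.
sample = "R\u0430\u010dun"
for index, ch, name in find_cyrillic(sample):
    print(index, f"U+{ord(ch):04X}", name)
# prints: 1 U+0430 CYRILLIC SMALL LETTER A
```

The Latin "č" (U+010D) is not flagged, so legitimate Slovenian and Croatian diacritics pass the check.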
Base Model
This model uses unmodified Qwen2.5-VL-7B-Instruct weights. No fine-tuning was applied. The configuration files provide the Cyrillic blocking and structured output enforcement.