DocOracle-v1
DocOracle-v1 is a compact Hugging Face transformer router for synthetic business-document workflows. It classifies synthetic invoices, inbox requests, and RFQs into operational decisions used by an agentic back-office benchmark.
The model is part of PerimeterReasoner-AgentBench, a public-safe benchmark for local-first business-document agents. The benchmark combines deterministic extraction, relational memory, fake ERP actions, human-review routing, audit traces, exact metrics, and this optional learned routing layer.
Labels
DocOracle-v1 predicts one of four workflow labels:
| Label | Meaning |
|---|---|
| auto_approve | Route a valid invoice to automatic approval. |
| human_review | Route a risky, incomplete, unsupported, unknown, or ambiguous case to human review. |
| draft_created | Route a resolved inbox/order request to fake ERP order-draft creation. |
| draft_quote | Route a resolved RFQ request to quote drafting. |
Label Examples
auto_approve
The model should predict auto_approve when an invoice is complete, uses a supported currency, includes a purchase order, and is below the review threshold.
Invoice INV-000123
Vendor: Alpine Robotics AG
Total: 1250.00 EUR
VAT: 250.00 EUR
Due date: 2026-06-15
PO number: PO-1001
Payment terms: NET 30
human_review
The model should predict human_review when the case is risky, incomplete, unsupported, unknown, or ambiguous.
Invoice INV-000124
Vendor: Marinello & Co
Total: 8750.00 EUR
VAT: 1750.00 EUR
Due date: 2026-06-25
PO number: PO-1006
Payment terms: NET 45
Reason: the amount is above the automatic approval threshold.
Another example:
From: buyer@unknown.example
Subject: New order
Please send 20 boxes of our usual premium filters next Tuesday.
Reason: the sender/customer cannot be resolved from memory.
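The two invoice examples in this section differ mainly in their total amount. The sketch below shows how a deterministic rule of this kind might look. The threshold value, the supported-currency set, and the function name `route_invoice` are assumptions made for illustration; the source states only that a review threshold exists, not its value.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical policy values, chosen only so the two examples above
# land on opposite sides of the threshold.
SUPPORTED_CURRENCIES = {"EUR"}
REVIEW_THRESHOLD = Decimal("5000.00")


def route_invoice(total: Decimal, currency: str, po_number: Optional[str]) -> str:
    """Return a workflow label for an already-extracted invoice."""
    if not po_number:
        return "human_review"  # missing purchase order
    if currency not in SUPPORTED_CURRENCIES:
        return "human_review"  # unsupported currency
    if total >= REVIEW_THRESHOLD:
        return "human_review"  # amount too high for automatic approval
    return "auto_approve"


small = route_invoice(Decimal("1250.00"), "EUR", "PO-1001")  # INV-000123
large = route_invoice(Decimal("8750.00"), "EUR", "PO-1006")  # INV-000124
print(small)  # auto_approve
print(large)  # human_review
```

Amounts are held as `Decimal` rather than `float` so that comparisons against the threshold are exact.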
draft_created
The model should predict draft_created when an inbox/order request has enough information to create a fake ERP order draft.
From: purchasing@marinello.example
Subject: Repeat order
Please send 20 boxes of our usual premium filters next Tuesday.
Use our normal shipping method.
Reason: the customer, product alias, quantity, and shipping preference can be resolved.
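The difference between this request and the unresolved-sender example earlier comes down to whether the sender and the product alias can be looked up in memory. A minimal sketch of that check follows. The dictionaries, the SKU `FILTER-PREMIUM`, and the function name `route_order_request` are hypothetical stand-ins, not the benchmark's relational memory.

```python
from typing import Optional

# Hypothetical memory contents for illustration only.
CUSTOMERS = {"purchasing@marinello.example": "Marinello & Co"}
PRODUCT_ALIASES = {("Marinello & Co", "usual premium filters"): "FILTER-PREMIUM"}


def route_order_request(sender: str, product_phrase: str, quantity: Optional[int]) -> str:
    """Return draft_created only when every field resolves against memory."""
    customer = CUSTOMERS.get(sender)
    if customer is None:
        return "human_review"  # sender cannot be resolved
    sku = PRODUCT_ALIASES.get((customer, product_phrase))
    if sku is None or quantity is None or quantity <= 0:
        return "human_review"  # product alias or quantity is missing
    return "draft_created"


known = route_order_request("purchasing@marinello.example", "usual premium filters", 20)
unknown = route_order_request("buyer@unknown.example", "usual premium filters", 20)
print(known)    # draft_created
print(unknown)  # human_review
```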
draft_quote
The model should predict draft_quote when a customer is asking for pricing and the RFQ has enough information to prepare a quote draft.
Customer asks for a quote for 100 industrial sensors, delivery in Zurich, payment terms NET 45.
Reason: the product exists, the quantity is present, and the payment terms are supported.
Technical Details
- Architecture: BertForSequenceClassification
- Base checkpoint: google/bert_uncased_L-2_H-128_A-2
- Library: Hugging Face transformers
- Checkpoint format: model.safetensors
- Task: synthetic workflow routing / text classification
- Training mode: full fine-tuning for the compact CPU-friendly checkpoint
- Optional repo support: PEFT/LoRA training path and ModernBERT LoRA showcase path
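To give a sense of how compact this architecture is, the parameter count can be estimated by hand. The sketch below assumes the standard BERT configuration for the base checkpoint (vocabulary of 30522, 512 positions, 2 token types, feed-forward size of 4 × hidden) plus a four-way classification head. These values are assumptions about the configuration, not figures taken from this model card.

```python
# Assumed configuration for a 2-layer, 128-hidden BERT with 4 output labels.
VOCAB, MAX_POS, TYPE_VOCAB = 30522, 512, 2
HIDDEN, LAYERS, INTERMEDIATE, NUM_LABELS = 128, 2, 4 * 128, 4


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and shift vectors

embeddings = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN + layer_norm
attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm  # Q, K, V, output
feed_forward = linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN) + layer_norm
encoder = LAYERS * (attention + feed_forward)
pooler = linear(HIDDEN, HIDDEN)
classifier = linear(HIDDEN, NUM_LABELS)

total = embeddings + encoder + pooler + classifier
print(total)  # 4386436
```

Under these assumptions the model has roughly 4.4 million parameters, most of them in the token embedding table, which is consistent with the CPU-friendly description above.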
Training Data
The model was trained only on synthetic examples generated by the benchmark. The data covers three task families:
- invoice validation
- inbox-to-ERP order drafting
- RFQ triage
The generated split is balanced across the four workflow labels.
| Split | Examples |
|---|---|
| Train | 576 |
| Test | 144 |
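The table corresponds to 720 examples split 80/20. If each split is exactly balanced, that is 144 training and 36 test examples per label. The sketch below reproduces those sizes with a simple stratified split. It is not the benchmark's generator; the placeholder texts, the seed, and the function name `stratified_split` are assumptions.

```python
import random
from collections import Counter, defaultdict

LABELS = ["auto_approve", "human_review", "draft_created", "draft_quote"]


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (text, label) pairs so each label keeps the same test fraction."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Placeholder data: 180 dummy examples per label, 720 in total.
examples = [(f"synthetic example {i}", label) for label in LABELS for i in range(180)]
train, test = stratified_split(examples)
test_counts = Counter(label for _, label in test)
print(len(train), len(test))  # 576 144
print(test_counts["auto_approve"])  # 36
```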
No customer documents, real invoices, private prompts, credentials, or production code are included.
Evaluation
Evaluation was run on the synthetic test split.
| Metric | Value |
|---|---|
| Accuracy | 0.9792 |
| Macro F1 | 0.9791 |
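An accuracy of 0.9792 on 144 test examples is consistent with 141 correct predictions (141/144 ≈ 0.9792), although the per-label counts are not reported here. For reference, both metrics can be computed as in the sketch below. The toy label lists are invented for illustration and are not benchmark outputs.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


labels = ["auto_approve", "human_review", "draft_created", "draft_quote"]
# Toy data: one auto_approve case is misrouted to human_review.
y_true = ["auto_approve", "auto_approve", "human_review", "human_review",
          "draft_created", "draft_created", "draft_quote", "draft_quote"]
y_pred = ["auto_approve", "human_review", "human_review", "human_review",
          "draft_created", "draft_created", "draft_quote", "draft_quote"]

print(accuracy(y_true, y_pred))                    # 0.875
print(round(macro_f1(y_true, y_pred, labels), 4))  # 0.8667
```

Because the split is balanced, accuracy and macro F1 will usually sit close together, as they do in the table.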
The deterministic rule-based benchmark remains the reference baseline. DocOracle-v1 is useful for comparing learned routing behavior against exact, reproducible rules.
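One simple way to make that comparison is to measure how often the learned router agrees with the rule-based label and to tally the disagreements by type. The following is a sketch only; `compare_routes` and the sample label lists are hypothetical and do not come from the benchmark.

```python
from collections import Counter


def compare_routes(rule_labels, model_labels):
    """Return the agreement rate and a count of (rule, model) disagreements."""
    if len(rule_labels) != len(model_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(rule_labels, model_labels))
    matches = sum(r == m for r, m in pairs)
    disagreements = Counter((r, m) for r, m in pairs if r != m)
    return matches / len(pairs), disagreements


# Invented labels for five cases; the last one is where the two routers differ.
rule_labels = ["auto_approve", "human_review", "draft_created", "draft_quote", "human_review"]
model_labels = ["auto_approve", "human_review", "draft_created", "draft_quote", "auto_approve"]

agreement, disagreements = compare_routes(rule_labels, model_labels)
print(agreement)            # 0.8
print(dict(disagreements))  # {('human_review', 'auto_approve'): 1}
```

Disagreements where the rules say human_review but the model says auto_approve are the ones most worth inspecting, since they correspond to a skipped review.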
Usage
from transformers import pipeline
classifier = pipeline(
"text-classification",
model="YOUR_USERNAME/doc-oracle-v1",
)
text = """Invoice INV-000123
Vendor: Alpine Robotics AG
Total: 1250.00 EUR
VAT: 250.00 EUR
Due date: 2026-06-15
PO number: PO-1001
Payment terms: NET 30"""
print(classifier(text))
Expected label:
auto_approve
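The text-classification pipeline returns a list of dictionaries with `label` and `score` keys. The sketch below shows one way to turn that output into a routing decision, falling back to human_review when the score is low or the label is unexpected. The `min_score` value and the function name `decide` are assumptions, and the sketch also assumes the checkpoint's configuration maps class ids to the four label names; the mocked output stands in for a real classifier call.

```python
LABELS = {"auto_approve", "human_review", "draft_created", "draft_quote"}


def decide(pipeline_output, min_score=0.80, fallback="human_review"):
    """Pick the top-scoring label, or the fallback when confidence is low."""
    top = max(pipeline_output, key=lambda result: result["score"])
    if top["label"] not in LABELS or top["score"] < min_score:
        return fallback
    return top["label"]


# Mocked outputs in the shape returned by classifier(text).
confident = [{"label": "auto_approve", "score": 0.97}]
uncertain = [{"label": "auto_approve", "score": 0.55}]

print(decide(confident))  # auto_approve
print(decide(uncertain))  # human_review
```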
Example Inputs
Invoice:
Invoice INV-000123
Vendor: Alpine Robotics AG
Total: 1250.00 EUR
VAT: 250.00 EUR
Due date: 2026-06-15
PO number: PO-1001
Payment terms: NET 30
Inbox request:
From: purchasing@marinello.example
Subject: Repeat order
Please send 20 boxes of our usual premium filters next Tuesday.
Use our normal shipping method.
RFQ:
Customer asks for a quote for 100 industrial sensors, delivery in Zurich, payment terms NET 45.
Intended Use
Use DocOracle-v1 for:
- synthetic benchmark demos
- workflow-routing experiments
- comparing learned routing against deterministic business rules
- interview or portfolio demonstrations of document AI and agent evaluation
It is not intended for real invoice approval, financial automation, or production business decisions.
Limitations
- The model is trained only on synthetic data.
- It does not understand real customer contracts, business policies, or ERP systems.
- It should not be used for real approvals without real validation, monitoring, governance, and human review controls.
- The high metric values reflect the controlled synthetic benchmark, not real-world production performance.
Safety And Privacy
This model and its benchmark data are public-safe:
- no customer data
- no real invoices
- no private prompts
- no credentials
- no LuxoAI production code
- no private infrastructure details
Project Context
DocOracle-v1 is the learned router component of PerimeterReasoner-AgentBench. The broader benchmark includes:
- deterministic invoice extraction and policy validation
- synthetic email and RFQ tasks
- relational customer/product memory
- fake ERP state-machine actions
- human-review escalation
- audit traces
- exact metrics and deterministic judge scoring
- FastAPI and Docker surfaces
- optional PEFT/LoRA and ModernBERT training paths