DocOracle-v1

DocOracle-v1 is a compact Hugging Face transformer router for synthetic business-document workflows. It classifies synthetic invoices, inbox requests, and requests for quotation (RFQs) into operational decisions used by an agentic back-office benchmark.

The model is part of PerimeterReasoner-AgentBench, a public-safe benchmark for local-first business-document agents. The benchmark combines deterministic extraction, relational memory, fake ERP actions, human-review routing, audit traces, exact metrics, and this optional learned routing layer.

Labels

DocOracle-v1 predicts one of four workflow labels:

Label          Meaning
auto_approve   Route a valid invoice to automatic approval.
human_review   Route a risky, incomplete, unsupported, unknown, or ambiguous case to human review.
draft_created  Route a resolved inbox/order request to fake ERP order-draft creation.
draft_quote    Route a resolved RFQ request to quote drafting.
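
For fine-tuning or decoding experiments, the four labels can be held in a pair of lookup tables. This is a minimal sketch: the index order below is an assumption, and the checkpoint's own config.json remains the authoritative source for the mapping.

```python
# The four workflow labels from the table above. The index order is an
# assumption; read id2label from the checkpoint config for the real mapping.
LABELS = ["auto_approve", "human_review", "draft_created", "draft_quote"]

# Index -> label name, as a sequence-classification head would decode it.
id2label = dict(enumerate(LABELS))

# Label name -> index, as used when encoding training targets.
label2id = {name: idx for idx, name in id2label.items()}

print(id2label)
print(label2id["human_review"])
```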

Label Examples

auto_approve

The model should predict auto_approve when an invoice is complete, uses a supported currency, includes a purchase order, and is below the review threshold.

Invoice INV-000123
Vendor: Alpine Robotics AG
Total: 1250.00 EUR
VAT: 250.00 EUR
Due date: 2026-06-15
PO number: PO-1001
Payment terms: NET 30
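
The conditions above can be sketched as a small deterministic rule. This is an illustration only, not the benchmark's actual validator: the threshold value, the supported-currency set, and the function name are hypothetical, and only the total and PO fields are checked for brevity.

```python
import re
from decimal import Decimal

# Hypothetical policy values: the source does not state the review
# threshold or the list of supported currencies.
REVIEW_THRESHOLD = Decimal("5000.00")
SUPPORTED_CURRENCIES = {"EUR"}


def route_invoice(text: str) -> str:
    """Return auto_approve only when every checked condition holds."""
    total = re.search(r"^Total:\s*(\d+(?:\.\d+)?)\s+([A-Z]{3})\s*$", text, re.MULTILINE)
    po = re.search(r"^PO number:\s*(\S+)\s*$", text, re.MULTILINE)
    if total is None or po is None:
        return "human_review"  # incomplete invoice
    amount, currency = Decimal(total.group(1)), total.group(2)
    if currency not in SUPPORTED_CURRENCIES:
        return "human_review"  # unsupported currency
    if amount >= REVIEW_THRESHOLD:
        return "human_review"  # at or above the review threshold
    return "auto_approve"


invoice = """Invoice INV-000123
Vendor: Alpine Robotics AG
Total: 1250.00 EUR
VAT: 250.00 EUR
Due date: 2026-06-15
PO number: PO-1001
Payment terms: NET 30"""

print(route_invoice(invoice))
```

Under these assumed values, the same invoice with a total of 8750.00 EUR would route to human_review.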

human_review

The model should predict human_review when the case is risky, incomplete, unsupported, unknown, or ambiguous.

Invoice INV-000124
Vendor: Marinello & Co
Total: 8750.00 EUR
VAT: 1750.00 EUR
Due date: 2026-06-25
PO number: PO-1006
Payment terms: NET 45

Reason: the amount is above the automatic approval threshold.

Another example:

From: buyer@unknown.example
Subject: New order

Please send 20 boxes of our usual premium filters next Tuesday.

Reason: the sender/customer cannot be resolved from memory.

draft_created

The model should predict draft_created when an inbox/order request has enough information to create a fake ERP order draft.

From: purchasing@marinello.example
Subject: Repeat order

Please send 20 boxes of our usual premium filters next Tuesday.
Use our normal shipping method.

Reason: the customer, product alias, quantity, and shipping preference can be resolved.
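
The difference between this request and the unresolved one under human_review comes down to memory lookups. A rough sketch follows, assuming a dictionary stands in for the benchmark's relational memory; the customer ID, SKU, table layout, and function name are all hypothetical.

```python
import re

# Hypothetical memory tables; the benchmark's real schema is not shown
# in this card.
CUSTOMERS = {"purchasing@marinello.example": "CUST-001"}
PRODUCT_ALIASES = {("CUST-001", "usual premium filters"): "SKU-FILTER-PREMIUM"}


def route_inbox_request(sender: str, body: str) -> str:
    """Route to draft_created only if sender, alias, and quantity resolve."""
    customer_id = CUSTOMERS.get(sender.strip().lower())
    if customer_id is None:
        return "human_review"  # sender cannot be resolved from memory
    quantity = re.search(r"\b(\d+)\s+boxes\b", body)
    alias_known = any(
        cust == customer_id and alias in body.lower()
        for cust, alias in PRODUCT_ALIASES
    )
    if quantity is None or not alias_known:
        return "human_review"  # missing quantity or unknown product alias
    return "draft_created"


body = "Please send 20 boxes of our usual premium filters next Tuesday."
print(route_inbox_request("purchasing@marinello.example", body))
print(route_inbox_request("buyer@unknown.example", body))
```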

draft_quote

The model should predict draft_quote when a customer is asking for pricing and the RFQ has enough information to prepare a quote draft.

Customer asks for a quote for 100 industrial sensors, delivery in Zurich, payment terms NET 45.

Reason: the product exists, the quantity is present, and the payment terms are supported.

Technical Details

  • Architecture: BertForSequenceClassification
  • Base checkpoint: google/bert_uncased_L-2_H-128_A-2
  • Library: Hugging Face transformers
  • Checkpoint format: model.safetensors
  • Task: synthetic workflow routing / text classification
  • Training mode: full fine-tuning for the compact CPU-friendly checkpoint
  • Optional repo support: PEFT/LoRA training path and ModernBERT LoRA showcase path
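
As a sanity check on the architecture, the parameter count can be estimated by hand. The sketch below assumes the usual configuration of the 2-layer, 128-hidden BERT miniature (vocabulary of 30522, 512 positions, feed-forward width of 512) plus a 4-way classification head. These values are assumptions to confirm against the checkpoint's config.json. The result is consistent with the 4.39M parameters reported on the model page.

```python
# Assumed configuration for google/bert_uncased_L-2_H-128_A-2 (standard
# BERT miniature values; confirm against the checkpoint's config.json).
VOCAB, MAX_POS, TYPE_VOCAB = 30522, 512, 2
HIDDEN, LAYERS, INTERMEDIATE, NUM_LABELS = 128, 2, 512, 4


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and shift vectors

# Word, position, and token-type embeddings followed by a LayerNorm.
embeddings = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN + layer_norm

# Query, key, value, and output projections followed by a LayerNorm.
attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm

# Two dense layers of the feed-forward block followed by a LayerNorm.
feed_forward = linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN) + layer_norm

pooler = linear(HIDDEN, HIDDEN)
classifier = linear(HIDDEN, NUM_LABELS)

total = embeddings + LAYERS * (attention + feed_forward) + pooler + classifier
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
# prints: 4,386,436 parameters (~4.39M)
```

Most of the parameters sit in the word-embedding table, which is typical for a model this small.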

Training Data

The model was trained only on synthetic examples generated by the benchmark. The data covers three task families:

  • invoice validation
  • inbox-to-ERP order drafting
  • RFQ triage

The generated split is balanced across the four workflow labels.

Split  Examples
Train  576
Test   144

No customer documents, real invoices, private prompts, credentials, or production code are included.
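
The split sizes imply the following arithmetic. The per-label counts assume the balance is exact within each split, which the card suggests but does not spell out.

```python
TRAIN, TEST, NUM_LABELS = 576, 144, 4

total_examples = TRAIN + TEST
print(total_examples)  # 720

# Examples per label, assuming each split is exactly balanced.
print(TRAIN // NUM_LABELS, TEST // NUM_LABELS)  # 144 36

# Share of all generated examples held out for testing.
print(TEST / total_examples)  # 0.2
```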

Evaluation

Evaluation was run on the synthetic test split.

Metric    Value
Accuracy  0.9792
Macro F1  0.9791

The deterministic rule-based benchmark remains the reference baseline. DocOracle-v1 is useful for comparing learned routing behavior against exact, reproducible rules.
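
For readers who want to reproduce the two metrics on their own predictions, here is a dependency-free sketch. It follows the standard definitions of accuracy and macro-averaged F1, not necessarily the benchmark's own scoring code, and the toy labels are illustrative only. An accuracy of 0.9792 on 144 examples is consistent with 141 correct predictions. That count is inferred from the numbers, not stated by the source.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


labels = ["auto_approve", "human_review", "draft_created", "draft_quote"]

# Toy data: the last request is wrongly escalated to human review.
y_true = ["auto_approve", "human_review", "draft_created", "draft_quote"]
y_pred = ["auto_approve", "human_review", "draft_created", "human_review"]

print(accuracy(y_true, y_pred))          # 0.75
print(macro_f1(y_true, y_pred, labels))  # mean of 1, 2/3, 1, 0

# Consistency check on the reported accuracy (inferred, not from the source).
print(round(141 / 144, 4))  # 0.9792
```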

Usage

from transformers import pipeline

# Replace YOUR_USERNAME with the Hub account that hosts the checkpoint.
classifier = pipeline(
    "text-classification",
    model="YOUR_USERNAME/doc-oracle-v1",
)

text = """Invoice INV-000123
Vendor: Alpine Robotics AG
Total: 1250.00 EUR
VAT: 250.00 EUR
Due date: 2026-06-15
PO number: PO-1001
Payment terms: NET 30"""

# Prints a list with one dict holding the predicted label and its score.
print(classifier(text))

Expected label:

auto_approve
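
In an experiment, the pipeline output can be reduced to a single routing decision. The sketch below is one possible post-processing step, not part of the published model or benchmark: the confidence floor, the helper name, and the sample scores are hypothetical. It escalates low-confidence predictions to human_review.

```python
# Hypothetical confidence floor; not part of the published model or benchmark.
CONFIDENCE_FLOOR = 0.80


def decide(outputs, floor=CONFIDENCE_FLOOR):
    """Pick the top-scoring label, escalating when confidence is low.

    `outputs` is a list of {"label": ..., "score": ...} dicts, the shape a
    text-classification pipeline returns for one input string.
    """
    best = max(outputs, key=lambda item: item["score"])
    if best["score"] < floor:
        return "human_review"
    return best["label"]


# Illustrative values only, not real model output.
sample_output = [{"label": "auto_approve", "score": 0.97}]
uncertain_output = [{"label": "draft_quote", "score": 0.41}]

print(decide(sample_output))
print(decide(uncertain_output))
```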

Example Inputs

Invoice:

Invoice INV-000123
Vendor: Alpine Robotics AG
Total: 1250.00 EUR
VAT: 250.00 EUR
Due date: 2026-06-15
PO number: PO-1001
Payment terms: NET 30

Inbox request:

From: purchasing@marinello.example
Subject: Repeat order

Please send 20 boxes of our usual premium filters next Tuesday.
Use our normal shipping method.

RFQ:

Customer asks for a quote for 100 industrial sensors, delivery in Zurich, payment terms NET 45.

Intended Use

Use DocOracle-v1 for:

  • synthetic benchmark demos
  • workflow-routing experiments
  • comparing learned routing against deterministic business rules
  • interview or portfolio demonstrations of document AI and agent evaluation

It is not intended for real invoice approval, financial automation, or production business decisions.

Limitations

  • The model is trained only on synthetic data.
  • It does not understand real customer contracts, business policies, or ERP systems.
  • It should not be used for real approvals without validation on real data, monitoring, governance, and human-review controls.
  • The high metric values reflect the controlled synthetic benchmark, not real-world production performance.

Safety And Privacy

This model and its benchmark data are public-safe:

  • no customer data
  • no real invoices
  • no private prompts
  • no credentials
  • no LuxoAI production code
  • no private infrastructure details

Project Context

DocOracle-v1 is the learned router component of PerimeterReasoner-AgentBench. The broader benchmark includes:

  • deterministic invoice extraction and policy validation
  • synthetic email and RFQ tasks
  • relational customer/product memory
  • fake ERP state-machine actions
  • human-review escalation
  • audit traces
  • exact metrics and deterministic judge scoring
  • FastAPI and Docker surfaces
  • optional PEFT/LoRA and ModernBERT training paths

Model size: 4.39M params (Safetensors, tensor type F32)