German Legal NER - ONNX INT4 Quantized

4-bit quantized ONNX version of elenanereiss/bert-german-ler for Named Entity Recognition in German legal texts.

Model Details

- **Base model:** bert-base-german-cased, fine-tuned on German LER
- **Source model:** elenanereiss/bert-german-ler
- **Format:** ONNX with 4-bit weight quantization (MatMulNBits, block_size=128, symmetric)
- **Model size:** 134 MB (down from 415 MB fp32)
- **Max sequence length:** 512 tokens
- **License:** CC-BY-4.0

Performance

Metrics from the source model evaluated on the German LER test set:

| | Precision | Recall | F1 |
|---|---|---|---|
| Micro avg | 0.945 | 0.964 | 0.955 |
| Macro avg | 0.89 | 0.89 | 0.89 |

Per-entity F1 (test set)

| Entity | Code | F1 |
|---|---|---|
| Law | GS | 0.98 |
| Court | GRT | 0.98 |
| Court decision | RS | 0.97 |
| Judge | RR | 0.97 |
| Contract | VT | 0.96 |
| Country | LD | 0.96 |
| Legal literature | LIT | 0.96 |
| Institution | INN | 0.95 |
| EU norm | EUN | 0.95 |
| Lawyer | AN | 0.94 |
| Person | PER | 0.94 |
| Brand | MRK | 0.93 |
| Company | UN | 0.92 |
| Organization | ORG | 0.91 |
| Ordinance | VO | 0.90 |
| Regulation | VS | 0.86 |
| City | ST | 0.85 |
| Street | STR | 0.77 |
| Landscape | LDS | 0.61 |
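
Micro averaging pools true/false positives over all entity mentions, so frequent classes such as GS dominate the score; macro averaging computes F1 per class first and takes the unweighted mean, which is why rare, harder classes such as STR and LDS pull the macro figure below the micro one. A pure-Python sketch with toy counts (illustrative numbers, not from this evaluation):

```python
# Toy per-class counts: (true positives, false positives, false negatives).
# Illustrative only -- not taken from the German LER test set.
counts = {
    "GS":  (950, 30, 20),   # frequent class, high F1
    "LDS": (6, 4, 6),       # rare class, low F1
}

def f1(tp, fp, fn):
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

# Micro: pool counts across classes, then compute one F1.
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())
micro_f1 = f1(tp, fp, fn)

# Macro: compute F1 per class, then take the unweighted mean.
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

print(f"micro F1 = {micro_f1:.3f}, macro F1 = {macro_f1:.3f}")
```

With these toy counts the micro score (~0.97) sits well above the macro score (~0.76), mirroring the gap between the two rows in the table above.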

Entity Types (19 classes)

| Code | German | English | Share in dataset |
|---|---|---|---|
| GS | Gesetz | Law / Statute | 34.53% |
| RS | Rechtsprechung | Court decision | 23.46% |
| GRT | Gericht | Court | 5.99% |
| LIT | Literatur | Legal literature | 5.60% |
| VT | Vertrag | Contract / Treaty | 5.34% |
| INN | Institution | Institution | 4.09% |
| PER | Person | Person | 3.26% |
| RR | Richter | Judge | 2.83% |
| EUN | EU-Norm | EU legal norm | 2.79% |
| LD | Land | Country / State | 2.66% |
| ORG | Organisation | Organization | 2.17% |
| UN | Unternehmen | Company | 1.97% |
| VO | Verordnung | Ordinance | 1.49% |
| ST | Stadt | City | 1.31% |
| VS | Vorschrift | Regulation | 1.13% |
| MRK | Marke | Brand | 0.53% |
| LDS | Landschaft | Landscape / Region | 0.37% |
| STR | Straße | Street | 0.25% |
| AN | Anwalt | Lawyer | 0.21% |

Usage

Requirements

```bash
pip install onnxruntime transformers numpy
```

Inference

```python
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np
import json

# Load tokenizer and model
# (model_int4.onnx and config.json are the files from this repository,
# present in the local working directory)
tokenizer = AutoTokenizer.from_pretrained("mayflowergmbh/bert-german-ler-onnx-int4")
session = ort.InferenceSession("model_int4.onnx", providers=["CPUExecutionProvider"])

# Load label mapping
with open("config.json") as f:
    config = json.load(f)
id2label = config["id2label"]

# Tokenize input
text = "Herr Müller verstieß gegen § 36 Abs. 7 IfSG und wurde vom Bundesgerichtshof verurteilt."
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)

# Run inference
outputs = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
    "token_type_ids": inputs["token_type_ids"].astype(np.int64),
})

# Decode predictions
predictions = np.argmax(outputs[0], axis=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

for token, pred_id in zip(tokens, predictions):
    label = id2label[str(pred_id)]
    if token not in ("[PAD]", "[CLS]", "[SEP]") and label != "O":
        print(f"{token:20s} {label}")
```

Output:

```
Müller               B-PER
§                    B-GS
36                   I-GS
Abs                  I-GS
.                    I-GS
7                    I-GS
I                    I-GS
##f                  I-GS
##SG                 I-GS
Bundes               B-GRT
##gerichtshof        I-GRT
```

Entity Extraction Helper

```python
def extract_entities(text, tokenizer, session, id2label):
    inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)
    outputs = session.run(None, {
        "input_ids": inputs["input_ids"].astype(np.int64),
        "attention_mask": inputs["attention_mask"].astype(np.int64),
        "token_type_ids": inputs["token_type_ids"].astype(np.int64),
    })
    predictions = np.argmax(outputs[0], axis=-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    entities = []
    current_entity = None
    current_tokens = []

    for token, pred_id in zip(tokens, predictions):
        if token in ("[PAD]", "[CLS]", "[SEP]"):
            continue
        label = id2label[str(pred_id)]

        if label.startswith("B-"):
            if current_entity:
                entities.append({
                    "entity": current_entity,
                    "text": tokenizer.convert_tokens_to_string(current_tokens).strip()
                })
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith("I-") and current_entity == label[2:]:
            current_tokens.append(token)
        else:
            if current_entity:
                entities.append({
                    "entity": current_entity,
                    "text": tokenizer.convert_tokens_to_string(current_tokens).strip()
                })
                current_entity = None
                current_tokens = []

    if current_entity:
        entities.append({
            "entity": current_entity,
            "text": tokenizer.convert_tokens_to_string(current_tokens).strip()
        })

    return entities


entities = extract_entities(
    "Das Urteil des BGH vom 12.03.2021 (Az. III ZR 5/20) stützt sich auf § 280 Abs. 1 BGB.",
    tokenizer, session, id2label
)
for e in entities:
    print(f"[{e['entity']:>3s}] {e['text']}")
```

Output:

```
[GRT] BGH
[ RS] Az. III ZR 5 / 20
[ GS] § 280 Abs. 1 BGB
```

Batch Inference

```python
texts = [
    "Der Kläger berief sich auf Art. 6 EMRK.",
    "Die Richterin Dr. Schmidt verwies auf das BVerfG-Urteil.",
]

inputs = tokenizer(texts, return_tensors="np", padding=True, truncation=True)
outputs = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
    "token_type_ids": inputs["token_type_ids"].astype(np.int64),
})

for i, text in enumerate(texts):
    predictions = np.argmax(outputs[0][i], axis=-1)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][i])
    # ... process as above
```
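
The per-sequence step can be sketched on a toy logits array, using the attention mask to drop padding positions; the shapes and the three-label set here are illustrative and no model run is required:

```python
import numpy as np

# Toy batch: 2 sequences, 5 positions, 3 labels (O, B-GS, I-GS).
# Values are made up -- they stand in for outputs[0] from the session.
id2label = {0: "O", 1: "B-GS", 2: "I-GS"}
logits = np.array([
    [[4.0, 0.1, 0.1], [0.1, 5.0, 0.2], [0.1, 0.3, 5.0], [4.0, 0.1, 0.1], [4.0, 0.1, 0.1]],
    [[4.0, 0.1, 0.1], [0.2, 6.0, 0.1], [4.0, 0.1, 0.1], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
])
attention_mask = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],   # last two positions are padding
])

batch_labels = []
for i in range(logits.shape[0]):
    pred_ids = np.argmax(logits[i], axis=-1)
    # Keep only real (non-padding) positions.
    real = attention_mask[i].astype(bool)
    batch_labels.append([id2label[int(p)] for p in pred_ids[real]])

print(batch_labels)
```

Masking with `attention_mask` matters in batches because padded positions still produce logits, and their argmax is meaningless.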

Quantization Details

The model was quantized from the original fp32 ONNX export using ONNX Runtime's MatMulNBitsQuantizer:

```python
from onnxruntime.quantization.matmul_nbits_quantizer import MatMulNBitsQuantizer
import onnx

model = onnx.load("model.onnx")
quant = MatMulNBitsQuantizer(
    model=model,
    block_size=128,
    is_symmetric=True,
    accuracy_level=4,
    bits=4,
)
quant.process()
# Persist the quantized graph (quant.model wraps the processed ONNX model)
quant.model.save_model_to_file("model_int4.onnx")
```

| | fp32 ONNX | INT4 ONNX |
|---|---|---|
| Size | 415 MB | 134 MB |
| Compression | 1x | ~3.1x |
| Quantization | - | 4-bit symmetric, block_size=128 |
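
The effect of block-wise symmetric quantization can be illustrated with a small numpy round trip: each 128-weight block gets one fp32 scale, and weights are rounded to signed 4-bit integers. This is a didactic sketch of the scheme, not the actual MatMulNBits packing or kernel:

```python
import numpy as np

def quantize_blockwise_int4(w, block_size=128):
    """Symmetric per-block 4-bit quantization of a 1-D weight vector (illustrative)."""
    pad = (-len(w)) % block_size
    blocks = np.pad(w, (0, pad)).reshape(-1, block_size)
    # One scale per block: map the largest magnitude in the block to 7.
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero on all-zero blocks
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_blockwise_int4(q, scales, length):
    return (q.astype(np.float32) * scales).reshape(-1)[:length]

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=768).astype(np.float32)
q, s = quantize_blockwise_int4(w)
w_hat = dequantize_blockwise_int4(q, s, len(w))
print("max abs round-trip error:", float(np.abs(w - w_hat).max()))
```

The round-trip error per weight is bounded by half a scale step (scale = max |w| in the block / 7), which is why smaller blocks trade more scale storage for lower error.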

Citation

If you use this model, please cite the original LER dataset paper:

```bibtex
@inproceedings{leitner2020dataset,
  title={A Dataset of German Legal Documents for Named Entity Recognition},
  author={Leitner, Elena and Rehm, Georg and Moreno-Schneider, Juli{\'a}n},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={4886--4893},
  year={2020},
  url={https://arxiv.org/abs/2003.13016}
}
```
