# PIBot Joint BERT

Multi-head Joint BERT model for intent classification and slot filling, specialized in queries about macroeconomic indicators from the Central Bank of Chile (Banco Central de Chile).

## Architecture

| Component | Detail |
| --- | --- |
| Base | microsoft/mdeberta-v3-base |
| Task | pibimacecv3 |
| Intent heads | 5 (activity, calc_mode, investment, region, req_form) |
| Slot labels | 15 (BIO) |
| Custom code | modeling_jointbert.py, module.py |

## Intent Heads

| Head | Classes | Values |
| --- | --- | --- |
| activity | 3 | none, specific, general |
| calc_mode | 4 | original, prev_period, yoy, contribution |
| investment | 3 | none, specific, general |
| region | 3 | none, specific, general |
| req_form | 3 | latest, point, range |

## Slot Entities (BIO)

Extracted entities: activity, frequency, indicator, investment, period, region, seasonality

Full BIO scheme: 15 labels (O, B-*, I-*).
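The count of 15 follows mechanically from the schema: one O tag plus a B-/I- pair for each of the seven entities. A quick sketch reproducing the list (the exact ordering inside labels/slot_label.txt may differ):

```python
# Build the full BIO tag set from the seven slot entities listed above.
entities = ["activity", "frequency", "indicator", "investment",
            "period", "region", "seasonality"]

slot_labels = ["O"]
for ent in entities:
    slot_labels += [f"B-{ent}", f"I-{ent}"]

print(len(slot_labels))  # 1 + 7 * 2 = 15
```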

## Usage

### Installation

```bash
pip install torch transformers huggingface_hub
```

### Loading the Model

```python
import torch
from transformers import AutoTokenizer, AutoConfig

# Load tokenizer and config
tokenizer = AutoTokenizer.from_pretrained("BCCh/pibert", trust_remote_code=True)
config = AutoConfig.from_pretrained("BCCh/pibert", trust_remote_code=True)

# Download the label files from the repo
from huggingface_hub import hf_hub_download
import os

label_dir = os.path.dirname(hf_hub_download("BCCh/pibert", "labels/slot_label.txt"))

# Read intent and slot labels
def read_labels(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

slot_labels = read_labels(os.path.join(label_dir, "slot_label.txt"))

# Build intent_label_lst, one label list per head
intent_label_lst = []
for head in ["activity", "calc_mode", "investment", "region", "req_form"]:
    intent_label_lst.append(read_labels(os.path.join(label_dir, f"{head}_label.txt")))

# The custom architecture ships with the repo; fetch the module files so the
# import below resolves (assumes the current directory is on the Python path)
for fname in ["modeling_jointbert.py", "module.py"]:
    hf_hub_download("BCCh/pibert", fname, local_dir=".")

from modeling_jointbert import JointBERT

model = JointBERT.from_pretrained(
    "BCCh/pibert",
    config=config,
    intent_label_lst=intent_label_lst,
    slot_label_lst=slot_labels,
)
model.eval()
```

### Prediction

```python
text = "cuál fue el imacec de agosto 2024"
tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**tokens)
    # outputs contains intent_logits (one per head) and slot_logits
```
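To turn raw logits into labels, each intent head is decoded independently by argmax, and each token's slot logits map to one BIO tag. The sketch below is illustrative pure Python; the actual tensor shapes and output field names (`intent_logits`, `slot_logits`) depend on modeling_jointbert.py:

```python
# Illustrative decoding sketch (plain Python). Assumes one score list per
# intent head and one score list per token for slots; the real outputs are
# torch tensors with equivalent structure.
HEADS = ["activity", "calc_mode", "investment", "region", "req_form"]

def argmax(scores):
    # Index of the highest score
    return max(range(len(scores)), key=scores.__getitem__)

def decode(intent_logits, slot_logits, intent_label_lst, slot_labels):
    intents = {
        head: labels[argmax(scores)]
        for head, labels, scores in zip(HEADS, intent_label_lst, intent_logits)
    }
    tags = [slot_labels[argmax(token_scores)] for token_scores in slot_logits]
    return intents, tags
```

With torch tensors you would use `logits.argmax(dim=-1)` instead, and subword-level tags need to be merged back to words (e.g. via the tokenizer's offset mapping) before reporting entities.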

## Package Structure

```
model_package/
├── config.json              # BERT + task configuration
├── model.safetensors        # Model weights
├── tokenizer.json           # Tokenizer
├── tokenizer_config.json
├── special_tokens_map.json
├── vocab.txt
├── modeling_jointbert.py    # JointBERT architecture (custom)
├── module.py                # CRF and auxiliary modules
├── __init__.py
├── README.md                # This file
└── labels/
    ├── slot_label.txt
    ├── activity_label.txt
    ├── calc_mode_label.txt
    ├── investment_label.txt
    ├── region_label.txt
    └── req_form_label.txt
```

## Training Data

Trained on queries about Chilean macroeconomic indicators:

- IMACEC (Monthly Economic Activity Indicator)
- PIB (Gross Domestic Product)
- Economic sectors, frequencies, periods, regions

## Limitations

- Specialized in macroeconomic queries about Central Bank of Chile indicators
- Performs best on short queries (< 50 tokens)
- Requires trust_remote_code=True because of the custom architecture

## Citation

```bibtex
@misc{pibot-jointbert,
  author = {Banco Central de Chile},
  title = {PIBot Joint BERT - Multi-head Intent + Slot Filling},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BCCh/pibert}}
}
```

## License

MIT License
