AMIS Commodity Classifier

This model repository contains the artifacts of one AMIS commodity-relevance classifier training run for soybeans: the fine-tuned Transformer model, the TF-IDF and sentence-embedding baselines trained alongside it, their prediction files, and the training report.

  • Dataset: faodl/amis-agri-soybeans
  • Dataset subset: none specified
  • Text column: chunk_text
  • Label column: label
  • Transformer: distilbert/distilbert-base-multilingual-cased
  • Generated at: 2026-05-19T20:13:44.207534+00:00

Dataset Summary

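| Split | Rows | Label 0 (NOT_RELEVANT) | Label 1 (RELEVANT) | Unique groups | Mean text length |
|---|---|---|---|---|---|
| train | 4745 | 3860 | 885 | 2244 | 702.4 |
| validation | 1034 | 782 | 252 | 481 | 710.3 |
| test | 1074 | 889 | 185 | 482 | 708.6 |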
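Label 1 (RELEVANT) is the minority class in every split, so accuracy alone says little: always predicting label 0 would already be right on most test rows. The short sketch below computes the class balance directly from the counts in the table. The dictionary layout and the function name are illustrative and are not taken from the training code.

```python
# Label counts copied from the Dataset Summary table.
SPLITS = {
    "train": {"label_0": 3860, "label_1": 885},
    "validation": {"label_0": 782, "label_1": 252},
    "test": {"label_0": 889, "label_1": 185},
}


def positive_rate(counts: dict[str, int]) -> float:
    """Share of rows in a split that carry label 1 (RELEVANT)."""
    return counts["label_1"] / (counts["label_0"] + counts["label_1"])


for split, counts in SPLITS.items():
    total = counts["label_0"] + counts["label_1"]
    print(f"{split}: {total} rows, {positive_rate(counts):.1%} label 1")

# Accuracy of a trivial model that always predicts label 0 on the test split.
majority_accuracy = SPLITS["test"]["label_0"] / sum(SPLITS["test"].values())
print(f"test accuracy of always predicting label 0: {majority_accuracy:.3f}")
```

This trivial baseline reaches about 0.83 test accuracy. That figure is a useful floor when you read the Accuracy column below.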

Threshold Comparison on Test Split

Each model has two rows. The first uses the default threshold of 0.500, and the second uses the threshold tuned on the validation split.

| Model | Threshold | Accuracy | Precision | Recall | F1 | ROC AUC | Average precision |
|---|---|---|---|---|---|---|---|
| logistic_tfidf | 0.500 | 0.944 | 0.805 | 0.892 | 0.846 | 0.967 | 0.914 |
| logistic_tfidf | 0.454 | 0.941 | 0.785 | 0.908 | 0.842 | 0.967 | 0.914 |
| xgboost_tfidf | 0.500 | 0.954 | 0.895 | 0.832 | 0.863 | 0.964 | 0.896 |
| xgboost_tfidf | 0.549 | 0.955 | 0.905 | 0.827 | 0.864 | 0.964 | 0.896 |
| embedding-logistic_sentence_embeddings | 0.500 | 0.939 | 0.753 | 0.957 | 0.843 | 0.988 | 0.951 |
| embedding-logistic_sentence_embeddings | 0.647 | 0.954 | 0.837 | 0.914 | 0.873 | 0.988 | 0.951 |
| embedding-svm_sentence_embeddings | 0.500 | 0.957 | 0.884 | 0.865 | 0.874 | 0.988 | 0.949 |
| embedding-svm_sentence_embeddings | 0.379 | 0.955 | 0.848 | 0.903 | 0.874 | 0.988 | 0.949 |
| embedding-lightgbm_sentence_embeddings | 0.500 | 0.959 | 0.894 | 0.865 | 0.879 | 0.985 | 0.950 |
| embedding-lightgbm_sentence_embeddings | 0.429 | 0.959 | 0.890 | 0.870 | 0.880 | 0.985 | 0.950 |
| transformer | 0.500 | 0.954 | 0.882 | 0.849 | 0.865 | 0.976 | 0.929 |
| transformer | 0.493 | 0.955 | 0.883 | 0.854 | 0.868 | 0.976 | 0.929 |

Confusion Matrices on Test Split

Rows are true labels and columns are predicted labels.
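Each 2x2 matrix can be built from positive-class probabilities and a threshold, using the same `probability >= threshold` rule as the inference examples. The sketch below shows one way to do this. The function name and the toy data are illustrative, and the code that actually produced these matrices is not included in this card.

```python
from typing import Sequence

LABELS = ("NOT_RELEVANT", "RELEVANT")  # index 0 and 1, matching the row/column order


def confusion_matrix_at(
    y_true: Sequence[int], probs: Sequence[float], threshold: float
) -> list[list[int]]:
    """Return [[TN, FP], [FN, TP]]: rows are true labels, columns are predictions."""
    matrix = [[0, 0], [0, 0]]
    for truth, probability in zip(y_true, probs):
        predicted = int(probability >= threshold)  # 1 means RELEVANT
        matrix[truth][predicted] += 1
    return matrix


# Toy labels and probabilities (made up for illustration).
y_true = [0, 0, 1, 1, 1]
probs = [0.10, 0.60, 0.45, 0.80, 0.95]
print(confusion_matrix_at(y_true, probs, 0.5))
print(confusion_matrix_at(y_true, probs, 0.4))
```

Lowering the threshold from 0.5 to 0.4 turns the false negative at probability 0.45 into a true positive. The same shift appears in the logistic_tfidf matrices below: tuning to 0.454 moves three RELEVANT rows from FN to TP, at the cost of six extra false positives.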

logistic_tfidf at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 849 | 40 |
| RELEVANT | 20 | 165 |

logistic_tfidf at threshold 0.454

| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 843 | 46 |
| RELEVANT | 17 | 168 |

xgboost_tfidf at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 871 | 18 |
| RELEVANT | 31 | 154 |

xgboost_tfidf at threshold 0.549

| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 873 | 16 |
| RELEVANT | 32 | 153 |

embedding-logistic_sentence_embeddings at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 831 | 58 |
| RELEVANT | 8 | 177 |

embedding-logistic_sentence_embeddings at threshold 0.647

| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 856 | 33 |
| RELEVANT | 16 | 169 |

embedding-svm_sentence_embeddings at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 868 | 21 |
| RELEVANT | 25 | 160 |

embedding-svm_sentence_embeddings at threshold 0.379

| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 859 | 30 |
| RELEVANT | 18 | 167 |

embedding-lightgbm_sentence_embeddings at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 870 | 19 |
| RELEVANT | 25 | 160 |

embedding-lightgbm_sentence_embeddings at threshold 0.429

| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 869 | 20 |
| RELEVANT | 24 | 161 |

transformer at threshold 0.500

| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 868 | 21 |
| RELEVANT | 28 | 157 |

transformer at threshold 0.493

| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 868 | 21 |
| RELEVANT | 27 | 158 |
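The Accuracy, Precision, Recall, and F1 columns of the threshold comparison table follow directly from these counts, with RELEVANT as the positive class. ROC AUC and average precision are different: they depend on the full ranking of probabilities, so a single matrix cannot recover them. This is also why those two columns stay the same across thresholds.

Below is a minimal pure-Python check against logistic_tfidf at threshold 0.500. `metrics_from_confusion` is an illustrative helper, not part of the training code.

```python
def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for the positive (RELEVANT) class."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # 2TP / (2TP + FP + FN) is algebraically the harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# logistic_tfidf at threshold 0.500: TN=849, FP=40, FN=20, TP=165
for name, value in metrics_from_confusion(849, 40, 20, 165).items():
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, this prints 0.944, 0.805, 0.892, and 0.846, which match the first row of the table.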

Validation-Tuned Thresholds

For each model, this section lists the threshold selected on the validation split and the validation F1 at that threshold. It also gives the resulting change in test F1 relative to the default threshold of 0.500:

  • logistic_tfidf: threshold 0.454 (validation F1 0.870); test F1 change vs 0.500: -0.004.
  • xgboost_tfidf: threshold 0.549 (validation F1 0.900); test F1 change vs 0.500: +0.002.
  • embedding-logistic_sentence_embeddings: threshold 0.647 (validation F1 0.851); test F1 change vs 0.500: +0.031.
  • embedding-svm_sentence_embeddings: threshold 0.379 (validation F1 0.840); test F1 change vs 0.500: +0.000.
  • embedding-lightgbm_sentence_embeddings: threshold 0.429 (validation F1 0.847); test F1 change vs 0.500: +0.001.
  • transformer: threshold 0.493 (validation F1 0.924); test F1 change vs 0.500: +0.003.
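The report records the selected thresholds but not the search procedure. A common approach, shown here as an assumption rather than as this run's code, works in two steps:

  • Score each distinct validation probability as a candidate threshold.
  • Keep the candidate with the highest F1.

The actual candidate grid and tie rule used in this run are not stated. Because the tuned threshold is then applied unchanged to the test split, the test F1 change can be small or even negative, as with logistic_tfidf (-0.004).

```python
from typing import Sequence


def f1_at(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """F1 for label 1 when predicting 1 iff probability >= threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, probs):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tune_threshold(y_true: Sequence[int], probs: Sequence[float]) -> tuple[float, float]:
    """Return (threshold, F1) with the highest validation F1.

    Candidates are the distinct predicted probabilities. The strict '>' keeps
    the lowest threshold on ties; this tie rule is an illustrative choice.
    """
    best_threshold, best_f1 = 0.5, -1.0
    for candidate in sorted(set(probs)):
        score = f1_at(y_true, probs, candidate)
        if score > best_f1:
            best_threshold, best_f1 = candidate, score
    return best_threshold, best_f1


# Toy validation labels and positive-class probabilities (made up).
val_y = [0, 0, 0, 1, 1, 1]
val_p = [0.05, 0.40, 0.55, 0.42, 0.70, 0.90]
threshold, val_f1 = tune_threshold(val_y, val_p)
print(f"tuned threshold {threshold:.2f}, validation F1 {val_f1:.3f}")
print(f"F1 at 0.5: {f1_at(val_y, val_p, 0.5):.3f}")
```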

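Artifacts

These are the output directories recorded in the training environment. In this repository, the same artifacts are stored under transformer/ and baselines/ (for example, baselines/logistic).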

  • logistic_tfidf: /content/agri-soybeans-classifier/baselines/logistic
  • xgboost_tfidf: /content/agri-soybeans-classifier/baselines/xgboost
  • embedding-logistic_sentence_embeddings: /content/agri-soybeans-classifier/baselines/embedding-logistic
  • embedding-svm_sentence_embeddings: /content/agri-soybeans-classifier/baselines/embedding-svm
  • embedding-lightgbm_sentence_embeddings: /content/agri-soybeans-classifier/baselines/embedding-lightgbm
  • transformer: /content/agri-soybeans-classifier/transformer

Inference

Install the runtime dependencies:

```shell
pip install transformers torch huggingface_hub pandas joblib scikit-learn xgboost sentence-transformers lightgbm
```

Transformer

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "faodl/agri-soybeans-classifier"

texts = [
    "Rice export prices increased after new procurement rules were announced.",
    "The finance ministry released its monthly fuel tax bulletin.",
]

# The fine-tuned tokenizer and model live in the "transformer" subfolder of the repo.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, subfolder="transformer")
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, subfolder="transformer")
model.eval()
# Use the decision threshold stored in the model config if present, otherwise 0.5.
threshold = float(getattr(model.config, "threshold", 0.5))

encoded = tokenizer(
    texts,
    truncation=True,
    padding=True,
    max_length=256,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**encoded).logits
    # Column 1 of the softmax is the probability of the positive (RELEVANT) class.
    probabilities = torch.softmax(logits, dim=-1)[:, 1].tolist()

for text, probability in zip(texts, probabilities):
    # id2label maps 0/1 to the class names saved with the model.
    label = model.config.id2label[int(probability >= threshold)]
    print({"text": text, "probability_positive": probability, "label": label})
```
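With two output logits, the softmax probability of class 1 equals the sigmoid of the logit difference. So comparing that probability with `threshold` is equivalent to comparing `logits[:, 1] - logits[:, 0]` with log(t / (1 - t)). This can be handy if you want to threshold raw logits, for example in an exported model.

Here is a small pure-Python check of the identity. The logit values are made up.

```python
import math


def positive_probability(logit_neg: float, logit_pos: float) -> float:
    """Softmax probability of class 1 for a pair of two-class logits."""
    m = max(logit_neg, logit_pos)  # subtract the max for numerical stability
    e_neg = math.exp(logit_neg - m)
    e_pos = math.exp(logit_pos - m)
    return e_pos / (e_neg + e_pos)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


logit_neg, logit_pos = -1.2, 0.8  # made-up logits for one text
print(positive_probability(logit_neg, logit_pos), sigmoid(logit_pos - logit_neg))

# A probability threshold t is equivalent to the logit-margin threshold log(t / (1 - t)).
threshold = 0.493  # the transformer's validation-tuned threshold from this report
margin = math.log(threshold / (1 - threshold))
print(f"margin threshold: {margin:.4f}")
```

For the tuned threshold of 0.493, the equivalent margin is slightly negative. In other words, a text counts as RELEVANT even when its RELEVANT logit is marginally below the NOT_RELEVANT logit.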

TF-IDF Baselines

Available baseline names in this run: "logistic", "xgboost".

```python
import json
import joblib
from huggingface_hub import hf_hub_download

MODEL_ID = "faodl/agri-soybeans-classifier"
BASELINE = "logistic"  # one of "logistic", "xgboost"

texts = [
    "Maize production forecasts were revised after delayed rains.",
    "The central bank published new exchange rate statistics.",
]

# Download the fitted TF-IDF pipeline and the machine-readable run report.
model_path = hf_hub_download(
    repo_id=MODEL_ID,
    repo_type="model",
    filename=f"baselines/{BASELINE}/{BASELINE}_tfidf.joblib",
)
report_path = hf_hub_download(
    repo_id=MODEL_ID,
    repo_type="model",
    filename="report.json",
)

pipeline = joblib.load(model_path)
with open(report_path, encoding="utf-8") as handle:
    report = json.load(handle)

# Look up the validation-tuned threshold recorded for this baseline.
threshold = next(
    result["validation_best_threshold"]["threshold"]
    for result in report["results"]
    if result["model_type"] == f"{BASELINE}_tfidf"
)

# The pipeline accepts raw texts; column 1 of predict_proba is the RELEVANT probability.
probabilities = pipeline.predict_proba(texts)[:, 1]
for text, probability in zip(texts, probabilities):
    label = "RELEVANT" if probability >= threshold else "NOT_RELEVANT"
    print({"text": text, "probability_positive": float(probability), "label": label})
```
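If `BASELINE` is misspelled, the `next(...)` call above fails with a bare `StopIteration`. The sketch below is a small helper that reads the same keys (`results`, `model_type`, `validation_best_threshold`, `threshold`) and, on failure, reports which model types are available. `validation_threshold` is a hypothetical name, and the toy dictionary only imitates the parts of report.json that the example above already uses.

```python
from typing import Any


def validation_threshold(report: dict[str, Any], model_type: str) -> float:
    """Return the validation-tuned threshold recorded for `model_type`.

    Raises KeyError listing the available model types instead of a bare
    StopIteration when `model_type` is not in the report.
    """
    results = report.get("results", [])
    for result in results:
        if result.get("model_type") == model_type:
            return float(result["validation_best_threshold"]["threshold"])
    available = [result.get("model_type") for result in results]
    raise KeyError(f"{model_type!r} not in report; available: {available}")


# Toy report with the same keys the example reads from report.json.
toy_report = {
    "results": [
        {"model_type": "logistic_tfidf", "validation_best_threshold": {"threshold": 0.454}},
        {"model_type": "xgboost_tfidf", "validation_best_threshold": {"threshold": 0.549}},
    ]
}
print(validation_threshold(toy_report, "xgboost_tfidf"))
```

With the real report, call `validation_threshold(report, f"{BASELINE}_tfidf")` in place of the `next(...)` expression.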

Sentence-Embedding Baselines

Available embedding baseline names in this run: "embedding-logistic", "embedding-svm", "embedding-lightgbm".

```python
import joblib
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer

MODEL_ID = "faodl/agri-soybeans-classifier"
BASELINE = "embedding-logistic"  # one of "embedding-logistic", "embedding-svm", "embedding-lightgbm"

texts = [
    "Wheat export inspections rose as demand from importers increased.",
    "The sports ministry announced a new stadium renovation plan.",
]

# The joblib artifact bundles the classifier, the embedding model name,
# the encoding settings, and the validation-tuned threshold.
model_path = hf_hub_download(
    repo_id=MODEL_ID,
    repo_type="model",
    filename=f"baselines/{BASELINE}/{BASELINE}.joblib",
)
artifact = joblib.load(model_path)

# Re-encode the texts with the same sentence-embedding model and settings.
embedding_model = SentenceTransformer(artifact["embedding_model_name"])
embeddings = embedding_model.encode(
    texts,
    batch_size=artifact.get("embedding_batch_size", 64),
    convert_to_numpy=True,
    normalize_embeddings=artifact.get("normalize_embeddings", True),
)
probabilities = artifact["classifier"].predict_proba(embeddings)[:, 1]
threshold = artifact["validation_best_threshold"]["threshold"]

for text, probability in zip(texts, probabilities):
    label = "RELEVANT" if probability >= threshold else "NOT_RELEVANT"
    print({"text": text, "probability_positive": float(probability), "label": label})
```
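The example passes the stored `normalize_embeddings` flag (default True) to `SentenceTransformer.encode`, which rescales each embedding to unit Euclidean length. The artifact stores this flag, so the same setting is presumably what the classifier saw during training, and it should be kept at inference time. After normalization, the dot product of two embeddings equals their cosine similarity.

Below is a pure-Python sketch of that normalization. The helper names are illustrative and are not the sentence-transformers implementation.

```python
import math
from typing import Sequence


def l2_normalize(vector: Sequence[float]) -> list[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product; equals cosine similarity once both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


u = l2_normalize([3.0, 4.0])
v = l2_normalize([4.0, 3.0])
print(u)  # [0.6, 0.8]
print(dot(u, v))
```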

Files

  • REPORT.md: Markdown report for this training run.
  • report.json: Machine-readable report containing metrics and thresholds.
  • transformer/: Fine-tuned Transformer artifacts, when Transformer training is enabled.
  • baselines/: TF-IDF and sentence-embedding baseline artifacts, when baseline training is enabled.
  • */validation_predictions.csv and */test_predictions.csv: Predictions on the validation and test splits, stored in each model's directory.