---
language: en
tags:
- text-classification
- onnx
- job-classification
- it
license: mit
base_model: intfloat/e5-base-v2
---

IT vs Non-IT Job Title Classifier

Binary classifier that determines whether a job title belongs to an IT/tech role or not. Built on intfloat/e5-base-v2 embeddings with a logistic regression head. The head is exported to ONNX, so the classification step itself is fast and needs only ONNX Runtime; the encoder is loaded separately.

Repository contents

File                    Description
e5_it_classifier.onnx   Logistic regression classifier head (ONNX)

The encoder (intfloat/e5-base-v2) is loaded separately at inference time. It is not bundled here since it is a public model.

How it works

  1. The job title is prefixed with "query: ", as required by the e5-v2 input format
  2. The prefixed title is encoded by intfloat/e5-base-v2 with mean pooling and L2 normalization, producing a 768-dim embedding
  3. The embedding is passed through the logistic regression ONNX model
  4. The output is a pair of probabilities, one for class 0 (Non-IT) and one for class 1 (IT); the larger of the two determines the label
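
Steps 2 to 4 can be sketched with plain NumPy. This is an illustrative sketch, not the shipped code: the token embeddings, attention mask, weights, and bias are made-up toy values, and mean_pool, l2_normalize, and it_probability are hypothetical helper names. In the real pipeline the encoder library performs the pooling and the ONNX file holds the learned coefficients.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions (mask == 0)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def it_probability(embedding: np.ndarray, weights: np.ndarray, bias: float) -> np.ndarray:
    """Binary logistic regression: sigmoid(w . x + b) is P(class 1 = IT)."""
    logits = embedding @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))

# Toy input: 1 title, 3 token slots (the last is padding), 4 dims instead of 768.
tokens = np.array([[[1.0, 0.0, 2.0, 0.0],
                    [3.0, 0.0, 2.0, 0.0],
                    [9.0, 9.0, 9.0, 9.0]]])
mask = np.array([[1, 1, 0]])

pooled = mean_pool(tokens, mask)      # the padding row does not contribute
embedding = l2_normalize(pooled)      # unit-length vector
p_it = it_probability(embedding, weights=np.array([0.5, -1.0, 0.25, 0.0]), bias=0.1)
probs = np.stack([1.0 - p_it, p_it], axis=1)  # columns: [Non-IT, IT]
print(probs)
```

The two columns always sum to 1, which is why comparing them is equivalent to checking whether the IT probability exceeds 0.5.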

Training

  • Encoder: intfloat/e5-base-v2 via sentence-transformers, embeddings L2-normalized
  • Classifier: sklearn.linear_model.LogisticRegression(C=1.0, max_iter=1000, class_weight='balanced')
  • Input: job title only
  • Labels: 1 = IT role, 0 = Non-IT role
  • Class balancing: enabled via class_weight='balanced' to handle uneven label distribution
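
To make the class balancing concrete: scikit-learn documents the 'balanced' mode as weighting each class by n_samples / (n_classes * np.bincount(y)). The sketch below reproduces that formula on made-up labels. The 8-to-2 split is an assumption for illustration only, not the real training distribution, and balanced_class_weights is a hypothetical helper name.

```python
import numpy as np

def balanced_class_weights(y: np.ndarray) -> dict:
    """Per-class weights following the formula documented for class_weight='balanced'."""
    counts = np.bincount(y)              # samples per class; assumes every class occurs
    n_classes = len(counts)
    weights = len(y) / (n_classes * counts)
    return {cls: float(w) for cls, w in enumerate(weights)}

# Hypothetical label distribution: 8 Non-IT titles (0) and 2 IT titles (1).
labels = np.array([0] * 8 + [1] * 2)
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.625, 1: 2.5}
```

The rarer class receives the larger weight, so errors on it cost more during fitting.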

Inference

Python

from sentence_transformers import SentenceTransformer
import onnxruntime as ort
import numpy as np

encoder = SentenceTransformer("intfloat/e5-base-v2")
sess = ort.InferenceSession("e5_it_classifier.onnx")

def classify(title: str) -> dict:
    emb = encoder.encode(["query: " + title], normalize_embeddings=True)
    probs = sess.run(["probabilities"], {"input": emb.astype(np.float32)})[0]
    return {
        "label": "IT" if probs[0][1] > probs[0][0] else "Non-IT",
        "it_probability": float(probs[0][1]),
    }

print(classify("Senior Software Engineer"))  # IT
print(classify("Regional Sales Manager"))    # Non-IT

JavaScript / TypeScript (Bun or Node)

import { pipeline } from "@huggingface/transformers";
import * as ort from "onnxruntime-node";

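// Transformers.js v3 (@huggingface/transformers) selects precision with `dtype`; `quantized` was the v2 option.
const extractor = await pipeline("feature-extraction", "intfloat/e5-base-v2", { dtype: "fp32" });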
const session = await ort.InferenceSession.create("./e5_it_classifier.onnx");

async function classify(title: string) {
  const output = await extractor("query: " + title, { pooling: "mean", normalize: true });

  const results = await session.run({
    input: new ort.Tensor("float32", output.data as Float32Array, [1, 768]),
  });

  const probs = results.probabilities.data as Float32Array;
  return {
    label: probs[1] > probs[0] ? "IT" : "Non-IT",
    it_probability: probs[1],
  };
}

console.log(await classify("Senior Software Engineer")); // IT
console.log(await classify("Regional Sales Manager"));   // Non-IT

Install the dependencies:

bun add @huggingface/transformers onnxruntime-node
# or
npm install @huggingface/transformers onnxruntime-node

Intended use

Designed for automated job pipeline filtering: quickly classifying job titles as IT or non-IT before downstream enrichment or processing steps. Because it needs only a job title and no description, it works well as a lightweight pre-filter.
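
A pre-filter along these lines might look like the sketch below. It is an assumption-laden illustration: prefilter_it_jobs and fake_score are hypothetical names, the 0.5 threshold is a default chosen for the example, and the scorer is a stand-in so the snippet runs without downloading any model. In practice the scorer would wrap the real classifier and return its IT probability.

```python
from typing import Callable, Iterable, List, Tuple

def prefilter_it_jobs(
    titles: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Split titles into (kept, dropped) by their IT probability."""
    kept: List[str] = []
    dropped: List[str] = []
    for title in titles:
        # Keep the title when the scorer's IT probability reaches the threshold.
        (kept if score(title) >= threshold else dropped).append(title)
    return kept, dropped

def fake_score(title: str) -> float:
    """Stand-in scorer (NOT the real model): keyword match only."""
    return 0.9 if "engineer" in title.lower() else 0.1

kept, dropped = prefilter_it_jobs(
    ["Senior Software Engineer", "Regional Sales Manager"], fake_score
)
print(kept)     # ['Senior Software Engineer']
print(dropped)  # ['Regional Sales Manager']
```

Passing the scorer in as a callable keeps the filtering logic independent of how the embedding and ONNX session are loaded.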

Limitations

  • Trained and evaluated on job title text only; unusual or highly abbreviated titles may score less reliably
  • English job titles only
  • Edge cases like hybrid roles (e.g. "IT Sales Manager") may produce probabilities close to 0.5
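
One way to handle near-0.5 scores is to route them to manual review instead of forcing a label. The sketch below assumes a review band of 0.4 to 0.6; both bounds and the route_title name are hypothetical and should be tuned on your own data.

```python
def route_title(it_probability: float, low: float = 0.4, high: float = 0.6) -> str:
    """Map an IT probability to "IT", "Non-IT", or "review" for borderline scores."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    if it_probability >= high:
        return "IT"
    if it_probability <= low:
        return "Non-IT"
    return "review"  # too close to 0.5 to trust either label

print(route_title(0.95))  # IT
print(route_title(0.05))  # Non-IT
print(route_title(0.52))  # review
```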