---
language: en
tags:
- text-classification
- onnx
- job-classification
- it
license: mit
base_model: intfloat/e5-base-v2
---

IT vs Non-IT Job Title Classifier

Binary classifier that determines whether a job title belongs to an IT/tech role. It pairs intfloat/e5-base-v2 embeddings with a logistic regression head. The head is exported to ONNX, so the classification step runs on ONNX Runtime and does not need scikit-learn at inference time; the encoder is still loaded separately.

Repository contents

File	Description
`e5_it_classifier.onnx`	Logistic regression classifier head (ONNX)

The encoder (intfloat/e5-base-v2) is loaded separately at inference time — it is not bundled here since it is a public model.

How it works

1. The job title is prefixed with "query: ", as required by the E5 input format.
2. The prefixed title is encoded by intfloat/e5-base-v2 with mean pooling and L2 normalization, producing a 768-dimensional embedding.
3. The embedding is passed through the logistic regression ONNX model.
4. The model outputs two probabilities: one for class 0 (Non-IT) and one for class 1 (IT).
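
The pooling, normalization, and logistic steps above can be sketched in plain Python. This is an illustration only: the token vectors, weights, and bias are made-up toy values (3 dimensions instead of 768), and the function names `mean_pool`, `l2_normalize`, and `logistic_head` are hypothetical. In practice the encoder and the ONNX head perform these operations internally.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask is 1, so padding is ignored."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def logistic_head(embedding, weights, bias):
    """Return [P(Non-IT), P(IT)] for a binary logistic regression."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    p_it = 1.0 / (1.0 + math.exp(-z))
    return [1.0 - p_it, p_it]

# Toy input (assumed values): three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 2.0], [3.0, 2.0, 4.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
embedding = l2_normalize(pooled)
probs = logistic_head(embedding, weights=[0.5, -0.25, 1.0], bias=0.0)
print(pooled, embedding, probs)
```

Because the embedding has unit length, the logit depends only on the direction of the embedding relative to the weight vector, not on its magnitude.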

Training

Encoder: intfloat/e5-base-v2 via sentence-transformers, embeddings L2-normalized
Classifier: sklearn.linear_model.LogisticRegression(C=1.0, max_iter=1000, class_weight='balanced')
Input: job title only
Labels: 1 = IT role, 0 = Non-IT role
Class balancing: enabled via class_weight='balanced' to handle uneven label distribution
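
To make the effect of class_weight='balanced' concrete, scikit-learn documents this mode as weighting each class by n_samples / (n_classes * class_count). The sketch below reproduces that formula in plain Python. The label counts are hypothetical, since this card does not state the real class distribution, and `balanced_class_weights` is an illustrative helper, not part of scikit-learn.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

# Hypothetical imbalance: 2 IT titles (label 1) and 8 Non-IT titles (label 0).
labels = [1] * 2 + [0] * 8
weights = balanced_class_weights(labels)
print(weights)
```

The minority class receives the larger weight, so errors on it contribute more to the training loss.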

Inference

Python

from sentence_transformers import SentenceTransformer
import onnxruntime as ort
import numpy as np

encoder = SentenceTransformer("intfloat/e5-base-v2")
sess = ort.InferenceSession("e5_it_classifier.onnx")

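def classify(title: str) -> dict:
    # Prefix with "query: " and L2-normalize, matching the training setup.
    emb = encoder.encode(["query: " + title], normalize_embeddings=True)
    # probs[0] holds the two class probabilities: index 0 = Non-IT, index 1 = IT.
    probs = sess.run(["probabilities"], {"input": emb.astype(np.float32)})[0]
    return {
        "label": "IT" if probs[0][1] > probs[0][0] else "Non-IT",
        "it_probability": float(probs[0][1]),
    }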

print(classify("Senior Software Engineer"))  # IT
print(classify("Regional Sales Manager"))    # Non-IT

JavaScript / TypeScript (Bun or Node)

import { pipeline } from "@huggingface/transformers";
import * as ort from "onnxruntime-node";

const extractor = await pipeline("feature-extraction", "intfloat/e5-base-v2", { dtype: "fp32" });
const session = await ort.InferenceSession.create("./e5_it_classifier.onnx");

async function classify(title: string) {
  const output = await extractor("query: " + title, { pooling: "mean", normalize: true });

  const results = await session.run({
    input: new ort.Tensor("float32", output.data as Float32Array, [1, 768]),
  });

  const probs = results.probabilities.data as Float32Array;
  return {
    label: probs[1] > probs[0] ? "IT" : "Non-IT",
    it_probability: probs[1],
  };
}

console.log(await classify("Senior Software Engineer")); // IT
console.log(await classify("Regional Sales Manager"));   // Non-IT

bun add @huggingface/transformers onnxruntime-node
# or
npm install @huggingface/transformers onnxruntime-node

Intended use

Designed for automated job pipeline filtering: quickly classifying job titles as IT or non-IT before downstream enrichment or processing steps. Because it needs only the job title and no description, it works well as a lightweight pre-filter.
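
A pre-filter step of this kind might look like the sketch below. The names `prefilter_it_jobs` and `stub_classify` and the shape of the job records are hypothetical. The stub stands in for the `classify` function from the Python inference section so that the sketch runs without downloading the encoder.

```python
def prefilter_it_jobs(jobs, classify, title_key="title"):
    """Keep only jobs whose title is labelled IT, attaching the probability."""
    kept = []
    for job in jobs:
        result = classify(job[title_key])
        if result["label"] == "IT":
            kept.append({**job, "it_probability": result["it_probability"]})
    return kept

def stub_classify(title):
    """Stand-in classifier (assumption): a keyword rule, NOT the real model."""
    is_it = "engineer" in title.lower()
    return {
        "label": "IT" if is_it else "Non-IT",
        "it_probability": 0.9 if is_it else 0.1,
    }

jobs = [
    {"id": 1, "title": "Senior Software Engineer"},
    {"id": 2, "title": "Regional Sales Manager"},
]
it_jobs = prefilter_it_jobs(jobs, stub_classify)
print(it_jobs)
```

Passing the classifier in as an argument keeps the filter independent of how the model is loaded, which also makes it easy to test.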

Limitations

Trained and evaluated on job title text only — unusual or highly abbreviated titles may score less reliably
English job titles only
Edge cases like hybrid roles (e.g. "IT Sales Manager") may produce probabilities close to 0.5
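
One way to handle scores near 0.5 is to route them to manual review instead of forcing a label. The sketch below assumes an uncertainty band of 0.4 to 0.6. These thresholds are illustrative and were not tuned on this model, and the `decide` helper is hypothetical.

```python
def decide(it_probability, lower=0.4, upper=0.6):
    """Map a probability to "IT", "Non-IT", or "uncertain" (assumed thresholds)."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    if it_probability >= upper:
        return "IT"
    if it_probability <= lower:
        return "Non-IT"
    return "uncertain"

# Example probabilities are made up for illustration.
for p in (0.93, 0.52, 0.08):
    print(p, decide(p))
```

Widening the band sends more titles to review, while narrowing it trusts the classifier on more borderline cases.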
