---
language: en
tags:
- text-classification
- onnx
- job-classification
- it
license: mit
base_model: intfloat/e5-base-v2
---
IT vs Non-IT Job Title Classifier
Binary classifier that determines whether a job title belongs to an IT/tech role. It uses `intfloat/e5-base-v2` embeddings with a logistic regression head. The head is exported to ONNX, so the classification step itself needs only ONNX Runtime; the e5 encoder is still required at inference time to produce the embeddings.
Repository contents
| File | Description |
|---|---|
| `e5_it_classifier.onnx` | Logistic regression classifier head (ONNX) |
The encoder (`intfloat/e5-base-v2`) is not bundled here because it is a public model; it is loaded separately at inference time.
How it works
- The job title is prefixed with `"query: "`, as required by the e5-v2 input format
- The prefixed title is encoded by `intfloat/e5-base-v2` with mean pooling and L2 normalization, producing a 768-dimensional embedding
- The embedding is passed through the logistic regression ONNX model
- The output is a probability for class `1` (IT) and class `0` (Non-IT)
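The steps above can be sketched in plain NumPy to show what happens to the embedding between the encoder and the ONNX head. This is an illustration only: the token embeddings and the head parameters `w`/`b` below are random, hypothetical stand-ins, not the trained model, and in practice `sentence-transformers` performs the pooling and normalization for you. A binary logistic regression computes P(IT) = sigmoid(w·x + b) and P(Non-IT) = 1 - P(IT).

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions (mask == 0)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid divide-by-zero
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def logistic_head(emb: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Return [P(Non-IT), P(IT)] per row, matching class order 0, 1."""
    p_it = 1.0 / (1.0 + np.exp(-(emb @ w + b)))
    return np.stack([1.0 - p_it, p_it], axis=1)

# Hypothetical stand-in values: random "token embeddings" for one 5-token title
# (the last two positions are padding) and random head parameters.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(1, 5, 768)).astype(np.float32)
mask = np.array([[1, 1, 1, 0, 0]])
w = rng.normal(size=768).astype(np.float32)
b = 0.0

emb = l2_normalize(mean_pool(tokens, mask))  # shape (1, 768), unit norm
probs = logistic_head(emb, w, b)             # shape (1, 2), each row sums to 1
print(emb.shape, probs.shape)
```

Because the embedding is L2-normalized, the head only sees the direction of the vector, which is why the same normalization must be applied at training and inference time.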
Training
- Encoder: `intfloat/e5-base-v2` via `sentence-transformers`, embeddings L2-normalized
- Classifier: `sklearn.linear_model.LogisticRegression(C=1.0, max_iter=1000, class_weight='balanced')`
- Input: job title only
- Labels: `1` = IT role, `0` = Non-IT role
- Class balancing: enabled via `class_weight='balanced'` to handle an uneven label distribution
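With `class_weight='balanced'`, scikit-learn weights each class inversely to its frequency, using `n_samples / (n_classes * np.bincount(y))`. A small sketch with a hypothetical label array (not the actual training data) makes the effect concrete:

```python
import numpy as np

def balanced_class_weights(y: np.ndarray) -> dict[int, float]:
    """Reproduce scikit-learn's 'balanced' heuristic: n_samples / (n_classes * count_c)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(wt) for c, wt in zip(classes, weights)}

# Hypothetical labels: 80 Non-IT titles (0) and 20 IT titles (1).
y = np.array([0] * 80 + [1] * 20)
weights = balanced_class_weights(y)
print(weights)  # {0: 0.625, 1: 2.5}
```

Each IT example then counts four times as much in the loss as a Non-IT example, so both classes carry the same total weight (80 × 0.625 = 20 × 2.5 = 50).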
Inference
Python
```python
from sentence_transformers import SentenceTransformer
import onnxruntime as ort
import numpy as np

# e5 encoder (downloaded from the Hub) and the ONNX logistic regression head from this repo
encoder = SentenceTransformer("intfloat/e5-base-v2")
sess = ort.InferenceSession("e5_it_classifier.onnx")

def classify(title: str) -> dict:
    # e5-v2 expects the "query: " prefix; embeddings are L2-normalized as in training
    emb = encoder.encode(["query: " + title], normalize_embeddings=True)
    # "probabilities" has shape (1, 2): [P(Non-IT), P(IT)]
    probs = sess.run(["probabilities"], {"input": emb.astype(np.float32)})[0]
    return {
        "label": "IT" if probs[0][1] > probs[0][0] else "Non-IT",
        "it_probability": float(probs[0][1]),
    }

print(classify("Senior Software Engineer"))  # IT
print(classify("Regional Sales Manager"))    # Non-IT
```
JavaScript / TypeScript (Bun or Node)
```typescript
import { pipeline } from "@huggingface/transformers";
import * as ort from "onnxruntime-node";

// Full-precision (fp32) encoder weights; @huggingface/transformers v3 selects precision via `dtype`
const extractor = await pipeline("feature-extraction", "intfloat/e5-base-v2", { dtype: "fp32" });
const session = await ort.InferenceSession.create("./e5_it_classifier.onnx");

async function classify(title: string) {
  // Same "query: " prefix, mean pooling, and L2 normalization as the Python example
  const output = await extractor("query: " + title, { pooling: "mean", normalize: true });
  const results = await session.run({
    input: new ort.Tensor("float32", output.data as Float32Array, [1, 768]),
  });
  const probs = results.probabilities.data as Float32Array; // [P(Non-IT), P(IT)]
  return {
    label: probs[1] > probs[0] ? "IT" : "Non-IT",
    it_probability: probs[1],
  };
}

console.log(await classify("Senior Software Engineer")); // IT
console.log(await classify("Regional Sales Manager")); // Non-IT
```
Install the JavaScript dependencies with either package manager:

```bash
bun add @huggingface/transformers onnxruntime-node
# or
npm install @huggingface/transformers onnxruntime-node
```
Intended use
Designed for filtering in automated job-processing pipelines: it quickly classifies job titles as IT or non-IT before downstream enrichment or processing steps. Because it needs only a job title (no description), it works well as a lightweight pre-filter.
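As a pre-filter, the classifier only has to split incoming records by title. A minimal sketch, assuming each job is a dict with a `title` key and that `classify` is a function returning the `{"label", "it_probability"}` dict produced by the Python inference example (both assumptions). The `split_by_it` helper and the stub classifier are hypothetical; the stub exists only so the snippet runs on its own.

```python
from typing import Callable, Iterable

def split_by_it(
    jobs: Iterable[dict],
    classify: Callable[[str], dict],
    threshold: float = 0.5,
) -> tuple[list[dict], list[dict]]:
    """Route jobs into (it_jobs, non_it_jobs) using P(IT) from `classify`."""
    it_jobs, non_it_jobs = [], []
    for job in jobs:
        p_it = classify(job["title"])["it_probability"]
        # Strict ">" matches the card's rule: IT only if P(IT) > P(Non-IT)
        (it_jobs if p_it > threshold else non_it_jobs).append(job)
    return it_jobs, non_it_jobs

# Hypothetical stub standing in for the real ONNX-backed classify().
def stub_classify(title: str) -> dict:
    p = 0.9 if "engineer" in title.lower() else 0.1
    return {"label": "IT" if p > 0.5 else "Non-IT", "it_probability": p}

jobs = [{"id": 1, "title": "Senior Software Engineer"},
        {"id": 2, "title": "Regional Sales Manager"}]
it_jobs, other_jobs = split_by_it(jobs, stub_classify)
print([j["id"] for j in it_jobs], [j["id"] for j in other_jobs])  # [1] [2]
```

Passing the classifier in as a callable keeps the routing logic testable without loading the encoder.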
Limitations
- Trained and evaluated on job title text only; unusual or highly abbreviated titles may be classified less reliably
- English job titles only
- Edge cases like hybrid roles (e.g. "IT Sales Manager") may produce probabilities close to 0.5
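Because hybrid titles can land near 0.5, one option is to treat a band around 0.5 as "uncertain" and route those titles to manual review instead of forcing a label. This is a hypothetical policy, not part of the model, and the band edges `low=0.35` / `high=0.65` are arbitrary placeholders that would need tuning on labeled data:

```python
def decide(it_probability: float, low: float = 0.35, high: float = 0.65) -> str:
    """Map P(IT) to 'IT', 'Non-IT', or 'Uncertain' (hypothetical band edges)."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    if it_probability >= high:
        return "IT"
    if it_probability <= low:
        return "Non-IT"
    return "Uncertain"

for p in (0.92, 0.51, 0.08):
    print(p, decide(p))
```

Setting `low == high == 0.5` recovers a plain two-way decision, apart from ties at exactly 0.5.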