IssueSpec V5 — App-Review Classifier (RoBERTa, 7-class)

This is the production Stage-1 classifier from the CIKM 2026 paper "IssueSpec: A Framework for Structured Review-to-Issue Translation". It is a fine-tuned RoBERTa model with a classification head that assigns each app-store review to one of the seven classes of the Maalej-Nabil taxonomy.

Classes

bug_report, feature_request, performance, usability, compatibility, praise, other

Performance

On the 490-review expert gold standard:

Cohen's κ = 0.592 (moderate; up from the V2 LLM baseline of 0.163)
Accuracy = 65.0%, macro F1 = 0.653
Recovers minority classes the LLM was blind to: compatibility F1 0.83, performance F1 0.77
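
For readers who want to reproduce the agreement figures on their own labels, the following is a minimal sketch of how Cohen's κ and accuracy are computed from two label lists. The function names and the toy data are illustrative assumptions, not the paper's evaluation code, and the numbers it produces are unrelated to the results reported above.

```python
from collections import Counter


def cohen_kappa(gold, pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    # Observed agreement: fraction of items on which both label lists agree.
    p_o = sum(g == p for g, p in zip(gold, pred)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    p_e = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case (one identical label everywhere); returning 1.0
        # is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def accuracy(gold, pred):
    """Fraction of items where the predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Toy data (made up for illustration; not drawn from the gold standard).
gold = ["bug_report", "bug_report", "praise", "praise", "other", "other"]
pred = ["bug_report", "praise", "praise", "praise", "other", "bug_report"]

print(f"kappa = {cohen_kappa(gold, pred):.3f}")
print(f"accuracy = {accuracy(gold, pred):.3f}")
```

Because κ subtracts the agreement expected by chance, it is lower than raw accuracy whenever the label distribution makes accidental agreement likely, which is why the card reports both.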

V5 is trained on V2-corrected labels plus verified-anchor correction (5,230 expert-labeled reviews) and targeted compatibility augmentation (200 synthetic + 100 mined). See the paper §3.1 and §5.1 for full details.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub.
tok = AutoTokenizer.from_pretrained("<ANON>/issuespec-v5-classifier")
model = AutoModelForSequenceClassification.from_pretrained("<ANON>/issuespec-v5-classifier")

text = "App crashes when opening ads on Samsung Galaxy S21 (Android 13)."
# Tokenize a single review, truncating it to at most 256 tokens.
inputs = tok(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():  # gradients are not needed for inference
    logits = model(**inputs).logits  # shape: (1, num_labels)
# Map the highest-scoring class index to its label name.
pred = logits.argmax(-1).item()
print(model.config.id2label[pred])
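
The snippet above prints only the top label. If class probabilities or a ranked shortlist are useful, the logits can be passed through a softmax. The sketch below does this in plain Python so it runs without downloading the model. The helper names, the example logits, and in particular the index order of `ID2LABEL` are assumptions made for illustration; with the real model, take the mapping from `model.config.id2label` and the scores from `logits[0].tolist()`.

```python
import math

# Class names from this card. The index ORDER is an assumption for this
# sketch; in real use, read it from model.config.id2label.
ID2LABEL = {
    0: "bug_report",
    1: "feature_request",
    2: "performance",
    3: "usability",
    4: "compatibility",
    5: "praise",
    6: "other",
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in order[:k]]


# Made-up logits standing in for logits[0].tolist() from the model.
example_logits = [2.0, 0.1, 0.3, -1.0, 1.5, -2.0, -0.5]
ranked = top_k(example_logits, ID2LABEL, k=2)
for label, prob in ranked:
    print(f"{label}: {prob:.3f}")
```

A ranked shortlist can help with reviews that plausibly belong to two classes, such as a crash that occurs only on one device model.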

Cross-protocol generalization

Evaluated zero-shot against Maalej's 5,008 labels (an out-of-distribution taxonomy), the classifier reaches macro F1 = 0.676, weighted F1 = 0.730, and accuracy = 72.0%. This indicates that it performs concept recognition rather than template memorization.
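
Macro and weighted F1 differ only in how per-class scores are averaged: macro F1 gives every class equal weight, while weighted F1 weights each class by its number of gold examples. The sketch below illustrates the difference on toy data. The function names and labels are illustrative assumptions; this is not the paper's evaluation script, and it does not reproduce the figures above.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """F1 per label, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(gold, pred):
    """Mean of per-class F1 weighted by each class's gold support."""
    scores = per_class_f1(gold, pred)
    support = Counter(gold)
    return sum(scores[c] * support[c] for c in scores) / len(gold)


# Toy, imbalanced data (made up): four bug reports, two feature requests.
gold = ["bug_report"] * 4 + ["feature_request"] * 2
pred = ["bug_report", "bug_report", "bug_report", "feature_request",
        "feature_request", "bug_report"]

print(f"macro F1 = {macro_f1(gold, pred):.3f}")
print(f"weighted F1 = {weighted_f1(gold, pred):.3f}")
```

When the frequent classes are also the better-predicted ones, weighted F1 comes out above macro F1, which is the same direction as the gap between the two figures reported in this section.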

Citation

Please cite the CIKM 2026 paper. Code, data, and the full V1–V5 checkpoint series are linked from the project repository's SETUP_GUIDE.md.

License

MIT

Safetensors; model size: 0.1B params; tensor type: F32