IssueSpec V5 — App-Review Classifier (RoBERTa, 7-class)
This is the production Stage-1 classifier from the CIKM 2026 paper IssueSpec: A Framework for Structured Review-to-Issue Translation. It is a fine-tuned RoBERTa sequence classifier that assigns each app-store review to one class of the seven-class Maalej-Nabil taxonomy.
Classes
bug_report, feature_request, performance, usability, compatibility, praise, other
Performance
On the 490-review expert gold standard:
- Cohen's κ = 0.592 (moderate; up from the V2 LLM baseline of 0.163)
- Accuracy = 65.0%, macro F1 = 0.653
- Recovers minority classes the LLM was blind to: compatibility F1 0.83, performance F1 0.77
V5 is trained on V2-corrected labels plus verified-anchor correction (5,230 expert-labeled reviews) and targeted compatibility augmentation (200 synthetic + 100 mined). See §3.1 and §5.1 of the paper for full details.
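For readers who want to reproduce the agreement figure on their own predictions, the following is a minimal sketch of Cohen's κ in plain Python. It is not the paper's evaluation script; the function names and the toy labels are illustrative assumptions. The last line only rearranges the κ formula to show which chance-agreement level the reported accuracy (0.650) and κ (0.592) jointly imply.

```python
from collections import Counter


def cohens_kappa(gold, pred):
    """Cohen's kappa between two equal-length label sequences."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    n = len(gold)
    # Observed agreement: fraction of items where both labels match.
    p_o = sum(g == p for g, p in zip(gold, pred)) / n
    # Chance agreement: product of the two marginal label distributions.
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    p_e = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if p_e == 1.0:
        # Both sequences use one identical label; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def implied_chance_agreement(p_o, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e."""
    return (p_o - kappa) / (1 - kappa)


# Toy labels (hypothetical, not taken from the gold standard).
gold = ["bug_report", "bug_report", "praise", "praise", "other", "other"]
pred = ["bug_report", "praise", "praise", "praise", "other", "bug_report"]
toy_kappa = cohens_kappa(gold, pred)

# Chance agreement implied by the reported accuracy and kappa.
reported_p_e = implied_chance_agreement(0.650, 0.592)
print(round(toy_kappa, 3), round(reported_p_e, 3))
```

The implied chance agreement comes out near 1/7, which would be consistent with a roughly class-balanced gold set. That is an inference from the two reported numbers, not something the card states.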
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub.
tok = AutoTokenizer.from_pretrained("<ANON>/issuespec-v5-classifier")
model = AutoModelForSequenceClassification.from_pretrained("<ANON>/issuespec-v5-classifier")

text = "App crashes when opening ads on Samsung Galaxy S21 (Android 13)."
# Reviews longer than 256 tokens are truncated.
inputs = tok(text, return_tensors="pt", truncation=True, max_length=256)

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index back to its label name.
pred = logits.argmax(-1).item()
print(model.config.id2label[pred])
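The snippet above prints only the top label. If a ranked list with probabilities is useful (for example, to inspect the runner-up class), a minimal sketch in plain Python follows. The `ID2LABEL` order and the example logits are hypothetical; the real index order should be read from `model.config.id2label`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ranked_labels(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    pairs = [(id2label[i], p) for i, p in enumerate(probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical index order, for illustration only.
ID2LABEL = {
    0: "bug_report",
    1: "feature_request",
    2: "performance",
    3: "usability",
    4: "compatibility",
    5: "praise",
    6: "other",
}

# Made-up logits standing in for a real model output.
example_logits = [2.0, 0.1, 0.3, -0.5, 1.2, -1.0, -0.8]
ranking = ranked_labels(example_logits, ID2LABEL)
for label, prob in ranking[:3]:
    print(f"{label}: {prob:.3f}")
```

With the real model, the same helper should work on `logits[0].tolist()` and `model.config.id2label` from the usage snippet.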
Cross-protocol generalization
Evaluated zero-shot against Maalej's 5,008 labels (an out-of-distribution taxonomy), the classifier reaches macro F1 = 0.676, weighted F1 = 0.730, and accuracy = 72.0%. This suggests that it performs concept recognition rather than template memorization.
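The gap between macro and weighted F1 above is the pattern one would expect when frequent classes are scored better than rare ones, since weighted F1 scales each class by its support. The sketch below illustrates the two averages on toy data; it is not the paper's evaluation code, and the function names and labels are assumptions.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """F1 for every label that appears in gold or pred."""
    scores = {}
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(gold, pred):
    """Mean of per-class F1 weighted by each class's gold support."""
    scores = per_class_f1(gold, pred)
    support = Counter(gold)
    return sum(scores[c] * support[c] for c in scores) / len(gold)


# Toy data: a frequent class predicted well, a rare class predicted poorly.
gold = ["bug_report"] * 8 + ["feature_request"] * 2
pred = ["bug_report"] * 8 + ["feature_request", "bug_report"]

toy_macro = macro_f1(gold, pred)
toy_weighted = weighted_f1(gold, pred)
print(round(toy_macro, 3), round(toy_weighted, 3))
```

In this toy case the weighted score is higher than the macro score because the well-predicted class dominates the support.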
Citation
Please cite the CIKM 2026 paper. Code, data, and the full V1–V5 checkpoint
series are linked from the project repository's SETUP_GUIDE.md.
License
MIT