qwen3-0.6B-holmes-nt (WIP)

Qwen3-0.6B SFT'd on text→text_type pairs from pangram/editlens_iclr to output one of three labels: human_written, ai_edited, or ai_generated.

No thinking mode, direct label output only.
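Since the model emits the label directly, post-processing is minimal. A hypothetical normalization helper is sketched below; the three labels come from this card, but the fallback matching is an assumption, not part of the released benchmark script:

```python
# Hypothetical post-processing for the model's direct label output.
# The label set comes from the model card; the normalization/fallback
# logic is an assumption, not the official benchmark code.
LABELS = ("human_written", "ai_edited", "ai_generated")

def parse_label(raw: str) -> str:
    """Map a raw generation to one of the three labels."""
    token = raw.strip().lower()
    if token in LABELS:
        return token
    # Fallback: tolerate minor decoration around the label.
    for label in LABELS:
        if label in token:
            return label
    raise ValueError(f"unrecognized label: {raw!r}")

print(parse_label(" ai_edited\n"))    # ai_edited
print(parse_label("Human_Written"))   # human_written
```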

This is a work in progress and requires review. See the benchmark script for methodology. If you find any mistakes, please let me know. Refer to the official EditLens repo and their paper for more context.

Training (via Unsloth Studio)

| Parameter | Value |
| --- | --- |
| Base model | unsloth/qwen3-0.6b-unsloth-bnb-4bit |
| Epochs | 1 (1875 steps) |
| Learning rate | 5e-5 (linear schedule) |
| Warmup steps | 50 |
| Optimizer | AdamW 8-bit |
| Per-device batch size | 2 |
| Gradient accumulation | 16 |
| Effective batch size | 32 |
| Max sequence length | 2048 |
| LoRA rank / alpha | 16 / 16 |
| Seed | 3407 |

Trained with Unsloth and HuggingFace TRL.
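The batch-size and step figures in the table are internally consistent, as a quick arithmetic check shows (the per-epoch example count is derived from the table, not stated elsewhere on this card):

```python
# Sanity-check the training table: the effective batch size and the number
# of examples seen per epoch follow from the listed hyperparameters.
per_device_batch = 2
grad_accum = 16
steps = 1875

effective_batch = per_device_batch * grad_accum  # 2 * 16 = 32
examples_per_epoch = effective_batch * steps     # 32 * 1875 = 60000

print(effective_batch)     # 32
print(examples_per_epoch)  # 60000
```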

Ternary Classification

In-Domain (test split) — All Models

Paper baselines from EditLens (ICLR 2026, Table 2). Dark-edged bars = local run (Open Source).

Ternary In-Domain

Out-of-Domain (test_llama split) — All Models

Ternary Out-of-Domain Llama

Out-of-Domain (test_enron split) — All Models

Ternary Out-of-Domain Enron

All Pooled Splits (test+test_enron+test_llama) — Open Source Only

Ternary All Pooled Splits

Confusion Matrices (All Pooled Splits — Open Source Only)

Ternary Confusion Matrices All Pooled Splits

Generalization Across Splits — Open Source Only

Ternary Generalization Across Splits

Confusion Matrices (Across Splits — Open Source Only)

Ternary Confusion Matrices Across Splits

Binary Classification

Paper baselines from Table 1, Section 4.2.

Human vs. Any AI — In-Domain, All Models (test split, collapsed ternary for Holmes)

Human vs. Any AI — In-Domain, All Models

Fully AI vs. AI-Edited + Human — In-Domain, All Models (test split, collapsed ternary for Holmes)

Fully AI vs. AI-Edited + Human — In-Domain, All Models

Human vs. Any AI — All Pooled Splits, Open Source Only (test+test_enron+test_llama, collapsed ternary for Holmes)

Human vs. Any AI — All Pooled Splits, Open Source Only

Fully AI vs. AI-Edited + Human — All Pooled Splits, Open Source Only (test+test_enron+test_llama, collapsed ternary for Holmes)

Fully AI vs. AI-Edited + Human — All Pooled Splits, Open Source Only

Human vs. Fully AI (no AI-Edited) — Across Splits, All Models

Human vs. Fully AI (no AI-Edited) — Across Splits, All Models
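The binary tasks above collapse Holmes's ternary output. A hedged sketch of the mappings, following the task headings literally (function names are hypothetical; the benchmark script may differ in detail):

```python
# Sketch of collapsing the ternary label into the binary tasks above.
# Function names are hypothetical; the "drop ai_edited" choice for the
# third task follows its heading literally.

def human_vs_any_ai(label: str) -> str:
    # "Human vs. Any AI": both AI labels collapse to "ai".
    return "human" if label == "human_written" else "ai"

def fully_ai_vs_rest(label: str) -> str:
    # "Fully AI vs. AI-Edited + Human": only ai_generated counts as fully AI.
    return "fully_ai" if label == "ai_generated" else "not_fully_ai"

def human_vs_fully_ai(label: str):
    # "Human vs. Fully AI (no AI-Edited)": ai_edited examples are excluded.
    if label == "ai_edited":
        return None
    return "human" if label == "human_written" else "fully_ai"

print(human_vs_any_ai("ai_edited"))    # ai
print(fully_ai_vs_rest("ai_edited"))   # not_fully_ai
print(human_vs_fully_ai("ai_edited"))  # None
```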

Third-party benchmarks

Nonnative English (Liang et al., 2023) — 91 human texts

Nonnative English (Liang et al., 2023) — 91 human texts

Human Detectors (Russell et al., 2024) — High Quality Articles (150 human and 150 AI)

Human Detectors (Russell et al., 2024) — 150 human and 150 AI texts

Benchmark Scripts

Full benchmark pipeline and raw predictions: revalidate/
