Qwen2.5-Coder-7B-Instruct — Dataset Generator V2 Fine-tune

Fine-tuned version of Qwen2.5-Coder-7B-Instruct trained on Dataset Generator V2 — synthetic coding dataset generated with Dataset Generator.

Benchmark Results

Model HumanEval HumanEval+
Base Qwen2.5-Coder-7B-Instruct 55.5% (±2.1) 49.0% (±1.9)
This model (FT V2) 60.0% (±0.9) 54.0% (±1.8)

+4.5pp on HumanEval, +5.0pp on HumanEval+ vs base — error bars don't overlap, statistically significant improvement (5 runs averaged).

Benchmark

Training

  • Method: QLoRA fine-tuning via Unsloth
  • Base model: Qwen2.5-Coder-7B-Instruct
  • Dataset: Dataset Generator V2 (1,135 multi-turn examples)
  • Hardware: RTX 4070 Ti 12GB
  • Quantization: Q4_K_M GGUF (quantized by Unsloth)
  • Chat template: ChatML (embedded in GGUF)
  • Context length: 32,768 tokens
  • Evaluation: 5 runs on HumanEval/HumanEval+ at temp 0.2

Training logs and exact hyperparameters were not preserved — this was an exploratory fine-tune.

Training Data

Trained on Dataset Generator V2 — 1,135 multi-turn conversations across 8 coding categories:

  • Code Generation & Debugging
  • API, DevOps & Infrastructure
  • Architecture, Testing & Refactoring
  • Terminal, CLI & Tooling
  • Algorithms & Data Manipulation
  • Data Processing & Transformation
  • Code Reasoning & Review
  • Practical Multi-step Problem Solving

See the dataset card for full details including generation models and methodology.

Limitations

  • Optimized for algorithmic coding and reasoning — shows measurable improvement on HumanEval/HumanEval+
  • Not optimized for library-heavy workflows (pandas, numpy, requests) — for those use cases, train on a dataset with library-focused categories using Dataset Generator
  • Multi-turn conversational style — produces explanations alongside code

Support

If this helped you:

  • Ko-fi: https://ko-fi.com/arondaron
  • ETH: 0xA6910bDa2a89ee38cA42883e365BB2DdFba3C2A1
  • BTC: bc1qamarkursch3x8399qaly4md32ck5xgthnr9jpl
  • SOL: 797jTzFRm9dd4joHPqvUjryeXi5rPbMwG6Rqj3wJrgMt

License

Apache-2.0 — inherited from base model Qwen2.5-Coder-7B-Instruct.

Built with Dataset Generator.

Downloads last month
151
GGUF
Model size
8B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AronDaron/Qwen2.5-Coder-7B-Instruct-DatasetGen-v2

Base model

Qwen/Qwen2.5-7B
Quantized
(189)
this model

Dataset used to train AronDaron/Qwen2.5-Coder-7B-Instruct-DatasetGen-v2