SFT-tuned Qwen3-0.6B Model (Intermediate Artifact)

This model is an intermediate artifact from a ReMax alignment pipeline. It is the result of performing Supervised Fine-Tuning (SFT) on the base Qwen/Qwen3-0.6B-Base model.

Training Details

  • Dataset: A subset of 30,000 'chosen' examples from Anthropic/hh-rlhf.
  • Epochs: 1
  • Purpose: This model serves as the initial policy (π_ref) for the ReMax alignment stage in the full training script.
  • Other uses: It can also be used in any pipeline that requires an SFT model.
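
The details above can be sketched in Python as follows. This is a minimal illustration, not the original training script: it assumes the `transformers` and `datasets` libraries are installed, uses the repository ID sorakritt/qwen3-0.6b-sft-hh, and introduces the hypothetical helpers `take_chosen`, `load_chosen_subset`, and `load_sft_policy`. The card does not say how the 30,000 examples were selected, so taking the first rows of the train split is an assumption made only for the example.

```python
from typing import Dict, Iterable, List

# Identifiers taken from this model card.
SFT_MODEL_ID = "sorakritt/qwen3-0.6b-sft-hh"
DATASET_ID = "Anthropic/hh-rlhf"
NUM_EXAMPLES = 30_000


def take_chosen(rows: Iterable[Dict[str, str]], limit: int = NUM_EXAMPLES) -> List[str]:
    """Collect the 'chosen' transcript from up to `limit` preference pairs."""
    texts: List[str] = []
    for row in rows:
        if len(texts) >= limit:
            break
        texts.append(row["chosen"])
    return texts


def load_chosen_subset(limit: int = NUM_EXAMPLES) -> List[str]:
    """Stream the train split and keep the first `limit` 'chosen' texts.

    Assumption: the card does not state how the subset was drawn, so
    taking the first rows is purely illustrative.
    """
    from datasets import load_dataset  # lazy import; needs the `datasets` package

    stream = load_dataset(DATASET_ID, split="train", streaming=True)
    return take_chosen(stream, limit)


def load_sft_policy(model_id: str = SFT_MODEL_ID):
    """Load the SFT checkpoint and its tokenizer from the Hugging Face Hub."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return model, tokenizer
```

Calling `load_sft_policy()` downloads the checkpoint and returns a `(model, tokenizer)` pair. A pipeline that needs a frozen reference policy would typically call `model.eval()` and disable gradients on it, though whether the ReMax stage does so is not stated here.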

Model Details

  • Repository: sorakritt/qwen3-0.6b-sft-hh
  • Format: Safetensors
  • Model size: 0.6B params
  • Tensor type: F32