SFT-tuned Qwen3-0.6B Model (Intermediate Artifact)
This model is an intermediate artifact from a ReMax alignment pipeline. It is the result of performing Supervised Fine-Tuning (SFT) on the base Qwen/Qwen3-0.6B-Base model.
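As a rough illustration of how this checkpoint might be loaded and queried, here is a minimal sketch using the Hugging Face `transformers` library. The repository ID is taken from this page's model tree. The `\n\nHuman: ... \n\nAssistant:` prompt layout is an assumption based on the raw Anthropic/hh-rlhf transcript format; the card does not state which template was used during SFT. The helper names `build_prompt` and `generate_reply` are hypothetical.

```python
MODEL_ID = "sorakritt/qwen3-0.6b-sft-hh"  # repo ID as listed in the model tree


def build_prompt(user_message: str) -> str:
    # Assumption: the model saw raw hh-rlhf style transcripts during SFT,
    # so a single-turn prompt mirrors that layout.
    return f"\n\nHuman: {user_message.strip()}\n\nAssistant:"


def generate_reply(user_message: str, max_new_tokens: int = 128) -> str:
    # Imported lazily so the prompt helper can be used without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated continuation.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate_reply("How do I boil an egg?")` downloads the weights on first use. Because this is an SFT-only intermediate checkpoint, the model may keep generating further `Human:` turns, so truncating the reply at the next `\n\nHuman:` marker may be worth considering.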
Training Details
- Dataset: A subset of 30,000 'chosen' examples from Anthropic/hh-rlhf.
- Epochs: 1
- Purpose: This model serves as the initial policy (π_ref) for the ReMax alignment stage in the full training script.
- Can be used in any pipeline where an SFT model is required.
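To make the dataset bullet more concrete, the sketch below shows one way the 'chosen' transcripts could be turned into prompt/response pairs for SFT. It is not the author's preprocessing, which this card does not describe. It assumes each row is a dict with a `chosen` string in the hh-rlhf layout (alternating `\n\nHuman:` and `\n\nAssistant:` turns) and treats the final assistant turn as the training target. `split_chosen` and `take_subset` are hypothetical helper names.

```python
ASSISTANT_TAG = "\n\nAssistant:"


def split_chosen(transcript: str) -> tuple[str, str]:
    """Split a transcript into (prompt, final assistant response)."""
    idx = transcript.rfind(ASSISTANT_TAG)
    if idx == -1:
        raise ValueError("transcript has no assistant turn")
    cut = idx + len(ASSISTANT_TAG)
    # The prompt keeps everything up to and including the last "Assistant:" tag.
    return transcript[:cut], transcript[cut:].strip()


def take_subset(rows, limit: int = 30000):
    """Collect up to `limit` (prompt, response) pairs from an iterable of rows."""
    pairs = []
    for row in rows:
        if len(pairs) >= limit:
            break
        pairs.append(split_chosen(row["chosen"]))
    return pairs
```

The `rows` argument could be, for example, the train split returned by `datasets.load_dataset("Anthropic/hh-rlhf")`, though any iterable of dicts with a `chosen` key works. Whether the loss is computed on the full transcript or only on the response is a separate design choice that this sketch leaves open.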
Model tree for sorakritt/qwen3-0.6b-sft-hh
Base model
Qwen/Qwen3-0.6B-Base