sorakritt/qwen3-0.6b-sft-hh

Tags: intermediate-artifact


SFT-tuned Qwen3-0.6B Model (Intermediate Artifact)

This model is an intermediate artifact from a ReMax alignment pipeline. It was produced by supervised fine-tuning (SFT) of the base model Qwen/Qwen3-0.6B-Base.
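The card does not describe the ReMax stage itself. For orientation only, here is a minimal sketch of the core ReMax idea: the policy π_θ starts from this SFT checkpoint, and for each prompt the reward of a greedy-decoded response serves as the baseline for a sampled response. The resulting advantage scales the sampled response's log-probability. This is an assumption-laden illustration, not the full training script: it uses plain Python floats instead of a tensor library, omits any KL penalty toward π_ref that the script may apply, and the names `remax_advantage`, `remax_surrogate_loss`, and `Rollout` are hypothetical.

```python
from typing import Sequence, Tuple

# One rollout (hypothetical layout): (reward of the sampled response,
# reward of the greedy response, per-token log-probs of the sampled
# response under the current policy).
Rollout = Tuple[float, float, Sequence[float]]


def remax_advantage(reward_sampled: float, reward_greedy: float) -> float:
    """ReMax baseline: subtract the greedy response's reward from the sampled one's."""
    return reward_sampled - reward_greedy


def remax_surrogate_loss(batch: Sequence[Rollout]) -> float:
    """Mean over the batch of -A * sum_t log pi_theta(y_t | x, y_<t).

    In an autodiff framework the advantage A is treated as a constant, so the
    gradient of this value is a REINFORCE-with-baseline estimate. Plain floats
    are used here only to show the arithmetic; no KL term is included.
    """
    if not batch:
        raise ValueError("empty batch")
    total = 0.0
    for reward_sampled, reward_greedy, token_logps in batch:
        adv = remax_advantage(reward_sampled, reward_greedy)
        total += -adv * sum(token_logps)
    return total / len(batch)


if __name__ == "__main__":
    batch = [(1.5, 0.5, [-1.0, -2.0]), (0.0, 1.0, [-0.5])]
    print(remax_surrogate_loss(batch))
```

A sampled reply that scores above the greedy reply (positive advantage) has its probability pushed up, and one that scores below is pushed down. Because the baseline comes from one extra greedy rollout, ReMax does not need a learned value model.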

Training Details

Dataset: A subset of 30,000 'chosen' examples from Anthropic/hh-rlhf.
Epochs: 1
Purpose: This model serves as the initial policy (π_ref) for the ReMax alignment stage of the full training script. It can also be used in any pipeline that requires an SFT model.
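The card does not say how the 'chosen' transcripts were turned into SFT examples. One common approach, assumed here rather than taken from the training script, splits each hh-rlhf transcript at its final `\n\nAssistant:` turn: the earlier dialogue becomes the prompt and the last assistant reply becomes the training target. The functions `split_chosen` and `to_sft_example` and the `prompt`/`completion` keys are hypothetical.

```python
ASSISTANT_TAG = "\n\nAssistant:"


def split_chosen(chosen: str) -> tuple[str, str]:
    """Split an hh-rlhf 'chosen' transcript into (prompt, response).

    The prompt keeps everything up to and including the last ASSISTANT_TAG;
    the response is the remainder, including its leading space, so
    prompt + response reproduces the original text exactly.
    """
    idx = chosen.rfind(ASSISTANT_TAG)
    if idx == -1:
        raise ValueError("transcript has no Assistant turn")
    cut = idx + len(ASSISTANT_TAG)
    return chosen[:cut], chosen[cut:]


def to_sft_example(chosen: str) -> dict[str, str]:
    # Hypothetical prompt/completion record; an SFT trainer would often
    # compute the loss only on the completion tokens.
    prompt, response = split_chosen(chosen)
    return {"prompt": prompt, "completion": response}


if __name__ == "__main__":
    text = ("\n\nHuman: Hi\n\nAssistant: Hello!"
            "\n\nHuman: How are you?\n\nAssistant: Fine, thanks.")
    example = to_sft_example(text)
    print(repr(example["completion"]))
```

Under the same assumption, prompting the model at inference time in the `\n\nHuman: ...\n\nAssistant:` layout would match the format it saw during fine-tuning.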


Model size: 0.6B params
Tensor type: F32 (Safetensors)
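Because the weights are stored as F32, each parameter takes 4 bytes. A rough, weights-only estimate of the memory needed to load the checkpoint follows; it uses the card's rounded 0.6B figure and ignores activations, optimizer state, KV cache, and framework overhead. The helper name `weight_memory_gb` is hypothetical.

```python
# Bytes per parameter for common storage dtypes.
BYTES_PER_PARAM = {"F32": 4, "BF16": 2, "F16": 2}


def weight_memory_gb(num_params: int, dtype: str = "F32") -> float:
    """Approximate size of the raw weights in decimal gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 600_000_000  # the card's rounded "0.6B params" figure
    print(f"F32:  {weight_memory_gb(n, 'F32'):.1f} GB")
    print(f"BF16: {weight_memory_gb(n, 'BF16'):.1f} GB")
```

This gives about 2.4 GB in F32 and about 1.2 GB if the weights are cast to a 16-bit dtype at load time.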



Model tree for sorakritt/qwen3-0.6b-sft-hh

Base model: Qwen/Qwen3-0.6B-Base
This model: fine-tuned from the base model
