SQLStorm GRPO checkpoint (Llama 3.2 3B)

An RL checkpoint from GRPO training (rl_finetune_grpo.py) on StackOverflow-style SQL questions, continuing from abharadwaj123/llama3-sql2plan.

Contents: merged causal LM weights (model.safetensors), tokenizer, and config. Optimizer state was omitted to save space.

Load:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace "REPO_ID" with this repository's id on the Hub.
m = AutoModelForCausalLM.from_pretrained("REPO_ID", torch_dtype="auto", device_map="auto")
tok = AutoTokenizer.from_pretrained("REPO_ID")
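A minimal generation sketch for asking the checkpoint a SQL question. This is assumed usage, not taken from the training script: the `build_prompt` helper, the system instruction, and the example schema are all illustrative, and "REPO_ID" is a placeholder as above.

```python
def build_prompt(question: str, schema: str) -> list:
    # Chat-style messages; the system instruction here is a hypothetical example,
    # not the prompt used during GRPO training.
    return [
        {"role": "system",
         "content": "You are a SQL assistant. Answer with a single SQL query."},
        {"role": "user",
         "content": f"Schema:\n{schema}\n\nQuestion: {question}"},
    ]


if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    m = AutoModelForCausalLM.from_pretrained("REPO_ID", torch_dtype="auto", device_map="auto")
    tok = AutoTokenizer.from_pretrained("REPO_ID")

    messages = build_prompt(
        "How many users posted more than 10 answers?",
        "users(id, name); answers(id, user_id, body)",  # illustrative schema
    )
    # Llama 3.2 ships a chat template, so apply_chat_template formats the turns.
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(m.device)
    out = m.generate(inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens.
    print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (do_sample=False) is used here since SQL answers are usually wanted deterministic; adjust generation parameters to taste.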
Format: Safetensors · Model size: 3B params · Tensor type: BF16