Qwen3-4B PokerGPT O3 SFT (GGUF)

GGUF quantized versions of YiPz/qwen3-4b-pokergpt-o3-sft-lora.

Model Description

Qwen3-4B-thinking-2507 fine-tuned on high-quality O3 reasoning traces. This model provides expert-level poker analysis with step-by-step reasoning.

Available Files

| File | Size | Description |
|------|------|-------------|
| qwen3-4b-pokergpt-o3-q4_k_m.gguf | ~2.5 GB | Recommended - good quality/size balance |
| qwen3-4b-pokergpt-o3-q8_0.gguf | ~4.5 GB | Higher quality |

Usage with Ollama

# Download
huggingface-cli download YiPz/qwen3-4b-pokergpt-o3-sft-gguf \
    qwen3-4b-pokergpt-o3-q4_k_m.gguf --local-dir ./

# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./qwen3-4b-pokergpt-o3-q4_k_m.gguf

PARAMETER temperature 0.1
PARAMETER num_ctx 3072
PARAMETER stop "<|im_end|>"

SYSTEM "You are an expert poker coach who explains optimal plays with step-by-step reasoning."

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
EOF

# Create and run
ollama create pokergpt -f Modelfile
ollama run pokergpt "I have AKo on the button. What should I do?"
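
Once created, the model can also be queried programmatically through Ollama's REST API (`ollama serve` listens on `http://localhost:11434` by default). A minimal standard-library sketch; the model name `pokergpt` follows the `ollama create` step above:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default ollama serve endpoint

def build_request(prompt, model="pokergpt", temperature=0.1):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one complete response instead of a token stream
        "options": {"temperature": temperature},
    }).encode("utf-8")

def ask(prompt):
    """Send a prompt to a locally running Ollama server and return its reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running `ollama serve` with the model created as shown above:
# ask("I have AKo on the button. What should I do?")
```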

Output Format

The model outputs structured reasoning:

<think>
1. Position analysis: We are on the button...
2. Hand strength: AKo is a premium hand...
3. Stack considerations: With 100bb effective...
4. Action recommendation: We should 3-bet...
</think>

<action>cbr 9</action>
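
Downstream code can separate the reasoning trace from the final action with a small parser. A minimal sketch (the helper name is ours, not part of the model card):

```python
import re

def parse_response(text):
    """Split model output into its <think> reasoning and <action> parts.

    Returns (reasoning, action); either is None if the tag is missing.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    action = re.search(r"<action>(.*?)</action>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        action.group(1).strip() if action else None,
    )

sample = "<think>1. Position analysis...</think>\n\n<action>cbr 9</action>"
reasoning, action = parse_response(sample)
# reasoning == "1. Position analysis..."; action == "cbr 9"
```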

Training Details

  • Base Model: Qwen/Qwen3-4B-thinking-2507
  • Training Data: O3 reasoning distillations on PokerBench
  • Method: Full-precision LoRA fine-tuning (r=64)
  • Training Steps: 5,000
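
For reference, the stated setup maps onto a PEFT `LoraConfig` roughly as follows. Only the rank (r=64) and base model are given above; the alpha, dropout, and target modules here are illustrative assumptions, not the actual training recipe:

```python
from peft import LoraConfig

# Hypothetical sketch -- only r=64 is stated in this card.
lora_config = LoraConfig(
    r=64,                       # stated LoRA rank
    lora_alpha=128,             # assumed (commonly 2x the rank)
    lora_dropout=0.05,          # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```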

License

Apache 2.0
