Qwen3-4B PokerGPT O3 SFT (GGUF)

GGUF quantized versions of YiPz/qwen3-4b-pokergpt-o3-sft-lora.

Model Description

Qwen3-4B-thinking-2507 fine-tuned on high-quality O3 reasoning traces. This model provides expert-level poker analysis with step-by-step reasoning.

Available Files

| File | Size | Description |
|------|------|-------------|
| qwen3-4b-pokergpt-o3-q4_k_m.gguf | ~2.5 GB | Recommended - good quality/size balance |
| qwen3-4b-pokergpt-o3-q8_0.gguf | ~4.5 GB | Higher quality |

Usage with Ollama

# Download
huggingface-cli download YiPz/qwen3-4b-pokergpt-o3-sft-gguf \
    qwen3-4b-pokergpt-o3-q4_k_m.gguf --local-dir ./

# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./qwen3-4b-pokergpt-o3-q4_k_m.gguf

PARAMETER temperature 0.1
PARAMETER num_ctx 3072
PARAMETER stop "<|im_end|>"

SYSTEM "You are an expert poker coach who explains optimal plays with step-by-step reasoning."

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
EOF

# Create and run
ollama create pokergpt -f Modelfile
ollama run pokergpt "I have AKo on the button. What should I do?"
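
Once created, the model can also be queried programmatically through Ollama's REST API (`ollama serve` listens on `http://localhost:11434` by default). A minimal standard-library sketch; the model name `pokergpt` follows the `ollama create` step above:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default ollama serve endpoint

def build_request(prompt, model="pokergpt", temperature=0.1):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one complete response instead of a token stream
        "options": {"temperature": temperature},
    }).encode("utf-8")

def ask(prompt):
    """Send a prompt to a locally running Ollama server and return its reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running `ollama serve` with the model created as shown above:
# ask("I have AKo on the button. What should I do?")
```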

Output Format

The model outputs structured reasoning:

<think>
1. Position analysis: We are on the button...
2. Hand strength: AKo is a premium hand...
3. Stack considerations: With 100bb effective...
4. Action recommendation: We should 3-bet...
</think>

<action>cbr 9</action>
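
Downstream code can separate the reasoning trace from the final action with a small parser. A minimal sketch (the helper name is ours, not part of the model card):

```python
import re

def parse_response(text):
    """Split model output into its <think> reasoning and <action> parts.

    Returns (reasoning, action); either is None if the tag is missing.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    action = re.search(r"<action>(.*?)</action>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        action.group(1).strip() if action else None,
    )

sample = "<think>1. Position analysis...</think>\n\n<action>cbr 9</action>"
reasoning, action = parse_response(sample)
# reasoning == "1. Position analysis..."; action == "cbr 9"
```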

Training Details

  • Base Model: Qwen/Qwen3-4B-thinking-2507
  • Training Data: O3 reasoning distillations on PokerBench
  • Method: Full-precision LoRA fine-tuning (r=64)
  • Training Steps: 5,000
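
For reference, the stated setup maps onto a PEFT `LoraConfig` roughly as follows. Only the rank (r=64) and base model are given above; the alpha, dropout, and target modules here are illustrative assumptions, not the actual training recipe:

```python
from peft import LoraConfig

# Hypothetical sketch -- only r=64 is stated in this card.
lora_config = LoraConfig(
    r=64,                       # stated LoRA rank
    lora_alpha=128,             # assumed (commonly 2x the rank)
    lora_dropout=0.05,          # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```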

License

Apache 2.0
