Qwen3-4B PokerGPT O3 SFT (GGUF)
GGUF quantized versions of YiPz/qwen3-4b-pokergpt-o3-sft-lora.
Model Description
Qwen3-4B-Thinking-2507 fine-tuned on high-quality reasoning traces (O3 distillations on PokerBench). The model analyzes poker decisions with step-by-step reasoning before recommending an action.
Available Files
| File | Size | Description |
|---|---|---|
| qwen3-4b-pokergpt-o3-q4_k_m.gguf | ~2.5 GB | Recommended - good quality/size balance |
| qwen3-4b-pokergpt-o3-q8_0.gguf | ~4.5 GB | Higher quality |
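The file sizes above can be sanity-checked with a back-of-the-envelope estimate. The sketch below is illustrative only: it assumes roughly 4.0 billion parameters for the base model, 8.5 bits per weight for Q8_0, and about 4.85 bits per weight for Q4_K_M (an approximate average, since that format mixes quantization types across tensors). None of these figures come from this model card, and `estimated_size_gb` is a hypothetical helper.

```python
def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough on-disk size in decimal gigabytes, ignoring metadata overhead."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: approximate parameter count of the 4B base model.
N_PARAMS = 4.0e9

# Assumption: approximate average bits per weight for each quantization.
BITS_PER_WEIGHT = {"q4_k_m": 4.85, "q8_0": 8.5}

for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{estimated_size_gb(N_PARAMS, bpw):.2f} GB")
```

Under these assumptions the estimates land slightly below the listed sizes, which is expected because the estimate ignores file metadata and the listed sizes are themselves approximate.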
Usage with Ollama
# Download
huggingface-cli download YiPz/qwen3-4b-pokergpt-o3-sft-gguf \
qwen3-4b-pokergpt-o3-q4_k_m.gguf --local-dir ./
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./qwen3-4b-pokergpt-o3-q4_k_m.gguf
PARAMETER temperature 0.1
PARAMETER num_ctx 3072
PARAMETER stop "<|im_end|>"
SYSTEM "You are an expert poker coach who explains optimal plays with step-by-step reasoning."
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
EOF
# Create and run
ollama create pokergpt -f Modelfile
ollama run pokergpt "I have AKo on the button. What should I do?"
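Once the model has been created, it can also be queried programmatically. The following is a minimal sketch, assuming an Ollama server is running at its default local address (`http://localhost:11434`) and the model was created under the name `pokergpt` as above. The helper names `build_chat_payload` and `ask` are hypothetical and not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local port.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_payload(question: str, model: str = "pokergpt") -> dict:
    """Build the JSON body for a single-turn, non-streaming chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,  # ask for one complete JSON response
    }


def ask(question: str, model: str = "pokergpt", timeout: float = 300.0) -> str:
    """Send the question to the local Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(question, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    return reply["message"]["content"]
```

With the server running, `ask("I have AKo on the button. What should I do?")` should return the same kind of text that `ollama run` prints. The system prompt, temperature, and context size come from the Modelfile, so the request does not repeat them.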
Output Format
The model outputs structured reasoning:
<think>
1. Position analysis: We are on the button...
2. Hand strength: AKo is a premium hand...
3. Stack considerations: With 100bb effective...
4. Action recommendation: We should 3-bet...
</think>
<action>cbr 9</action>
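To use the output in a program, the reasoning and the action can be separated by their tags. The sketch below assumes the reply follows the format shown above; `parse_reply` and `PokerReply` are hypothetical names. As a defensive measure it also accepts replies where the opening `<think>` tag is missing, which is an assumption about how some runtimes may present the text, not documented behavior of this model.

```python
import re
from typing import NamedTuple, Optional


class PokerReply(NamedTuple):
    reasoning: str          # text inside the think block, or "" if absent
    action: Optional[str]   # text inside <action>...</action>, or None


# DOTALL lets "." match newlines in case the action spans several lines.
_ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL)


def parse_reply(text: str) -> PokerReply:
    """Split a model reply into its reasoning and its action string."""
    reasoning = ""
    end = text.find("</think>")
    if end != -1:
        head = text[:end]
        start = head.find("<think>")
        # Tolerate a missing opening tag by taking everything before </think>.
        reasoning = head[start + len("<think>"):] if start != -1 else head
    match = _ACTION_RE.search(text)
    action = match.group(1).strip() if match else None
    return PokerReply(reasoning=reasoning.strip(), action=action)


sample = """<think>
1. Position analysis: We are on the button...
</think>
<action>cbr 9</action>"""

reply = parse_reply(sample)
print(reply.action)  # prints: cbr 9
```

Returning `None` for a missing action lets the caller distinguish a truncated reply (for example, one cut off by the context limit) from a valid one.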
Training Details
- Base Model: Qwen/Qwen3-4B-Thinking-2507
- Training Data: O3 reasoning distillations on PokerBench
- Method: Full-precision LoRA fine-tuning (r=64)
- Training Steps: 5,000
License
Apache 2.0