# Qwen3-8B QLoRA Fine-Tuning Guide

> **Model name:** `bluejude10/Smoothie-Qwen3-8B-DTRO-Edition`
> **Date:** 2026-01-11
> **Author:** bluejude10

---

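## 📋 Table of Contents

1. [Overview](#1-overview)
2. [Environment Setup](#2-environment-setup)
3. [Training Configuration](#3-training-configuration)
4. [Dataset Format](#4-dataset-format)
5. [Running Training](#5-running-training)
6. [GGUF Conversion](#6-gguf-conversion)
7. [Ollama Deployment](#7-ollama-deployment)
8. [Troubleshooting](#8-troubleshooting)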

---

## 1. Overview

### 1.1 Project Goals
- **Base model:** `dnotitia/Smoothie-Qwen3-8B`
- **Goal:** Tune the model to respond in pure Hangul, without Hanja (Chinese characters)
- **Method:** QLoRA (4-bit Quantized Low-Rank Adaptation)
- **Hardware:** NVIDIA RTX 3080 Ti (12GB VRAM)

### 1.2 Final Deliverables
| File | Description |
|--------|------|
| `final_model_qlora/` | LoRA adapter (for vLLM/HuggingFace) |
| `Smoothie-Qwen3-8B-DTRO-Edition-Q4_K_M.gguf` | Fine-tuned GGUF (for Ollama) |
| `Smoothie-Qwen3-8B-Original-Q4_K_M.gguf` | Original GGUF (for comparison) |

---

## 2. Environment Setup

### 2.1 Required Packages

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install transformers accelerate datasets peft bitsandbytes
pip install "trl>=0.26.0"
```

### 2.2 Verified Versions (as of 2026-01-11)

| Package | Version |
|--------|------|
| Python | 3.13 |
| PyTorch | 2.8.0+cu128 |
| Transformers | 4.57.3 |
| TRL | 0.26.2 |
| PEFT | latest |
| BitsAndBytes | latest |

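### 2.3 Notes
- **TRL 0.26.x API change:** The `tokenizer` argument of `SFTTrainer` has been replaced by `processing_class`
- **max_seq_length → max_length:** The parameter was renamed in `SFTConfig`
- **Windows limitation:** Unsloth cannot be used because `triton` is not supported on Windows, so this guide uses standard TRL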

---

## 3. Training Configuration

### 3.1 QLoRA Settings

```python
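class Config:
    # Model
    MODEL_NAME = "dnotitia/Smoothie-Qwen3-8B"
    MAX_SEQ_LENGTH = 1024
    OUTPUT_DIR = "outputs_qlora"  # checkpoint directory (see section 5.3)
    SEED = 42                     # assumed value; the seed is not stated in this guide

    # LoRA hyperparameters
    LORA_R = 16              # LoRA rank
    LORA_ALPHA = 32          # LoRA alpha (usually 2x the rank)
    LORA_DROPOUT = 0.05      # dropout
    LORA_TARGET_MODULES = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ]

    # Training hyperparameters
    BATCH_SIZE = 1
    GRADIENT_ACCUMULATION = 16  # effective batch size: 1 x 16 = 16
    NUM_EPOCHS = 20
    LEARNING_RATE = 2e-4
    WARMUP_STEPS = 50
    WEIGHT_DECAY = 0.01


config = Config()  # referenced as config.* in section 3.3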
```
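
As a rough sanity check on these LoRA settings, the adapter size can be estimated by hand: each adapted linear layer of shape `in × out` adds `r × (in + out)` parameters. The sketch below assumes the published Qwen3-8B dimensions (hidden size 4096, MLP width 12288, 36 layers, 32 query heads and 8 key/value heads of dimension 128). Those numbers are not stated in this guide, so check them against the model's `config.json`. The helper name `lora_param_count` is made up for illustration.

```python
# Assumed Qwen3-8B dimensions (not stated in this guide; verify in config.json).
HIDDEN = 4096
INTERMEDIATE = 12288
NUM_LAYERS = 36
Q_DIM = 32 * 128   # query heads x head_dim
KV_DIM = 8 * 128   # key/value heads x head_dim

# (in_features, out_features) of each LoRA target module in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) for every adapted layer."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


print(f"{lora_param_count(16):,}")  # 43,646,976
```

Under these assumptions the estimate agrees with the `trainable params` figure reported in section 5.2.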

### 3.2 4-bit Quantization Settings

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat4
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,      # double quantization
)
```
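
The guide does not show how the quantization config and the LoRA settings are attached to the model, so the following is only one plausible way to wire them together with `transformers` and `peft`, not necessarily what `finetune_qlora.py` does. The function names `lora_kwargs` and `load_qlora_model`, `bias="none"`, and `device_map="auto"` are assumptions for illustration.

```python
def lora_kwargs() -> dict:
    """LoRA settings from section 3.1 as keyword arguments for peft.LoraConfig."""
    return {
        "r": 16,
        "lora_alpha": 32,
        "lora_dropout": 0.05,
        "target_modules": [
            "q_proj", "k_proj", "v_proj", "o_proj",
            "gate_proj", "up_proj", "down_proj",
        ],
        "bias": "none",            # assumption: bias terms are not trained
        "task_type": "CAUSAL_LM",
    }


def load_qlora_model(model_name: str = "dnotitia/Smoothie-Qwen3-8B"):
    """Load the base model in 4-bit and wrap it with LoRA adapters."""
    # Heavy imports stay local so lora_kwargs() is usable without a GPU stack.
    import torch
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",         # assumption: let accelerate place the layers
    )
    # Prepare the quantized model for training before adding adapters.
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, LoraConfig(**lora_kwargs()))
    model.print_trainable_parameters()
    return model, tokenizer
```

Calling `load_qlora_model()` downloads the base model and needs a CUDA GPU with `bitsandbytes` installed; `lora_kwargs()` alone can be inspected anywhere.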

### 3.3 TRL 0.26.x SFTConfig

```python
import torch
from trl import SFTTrainer, SFTConfig

# model, tokenizer, and dataset are assumed to be prepared earlier in the script
training_args = SFTConfig(
    output_dir=config.OUTPUT_DIR,
    per_device_train_batch_size=config.BATCH_SIZE,
    gradient_accumulation_steps=config.GRADIENT_ACCUMULATION,
    warmup_steps=config.WARMUP_STEPS,
    num_train_epochs=config.NUM_EPOCHS,
    learning_rate=config.LEARNING_RATE,
    fp16=False,
    bf16=torch.cuda.is_bf16_supported(),
    logging_steps=10,
    optim="paged_adamw_8bit",
    weight_decay=config.WEIGHT_DECAY,
    lr_scheduler_type="cosine",
    seed=config.SEED,
    save_strategy="epoch",
    save_total_limit=2,
    report_to="none",
    gradient_checkpointing=True,
    dataset_text_field="text",       # ⚠️ moved into SFTConfig
    max_length=config.MAX_SEQ_LENGTH, # ⚠️ max_seq_length → max_length
)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # ⚠️ tokenizer → processing_class
    train_dataset=dataset,
    args=training_args,
)
```

---

## 4. Dataset Format

### 4.1 Input Format (Alpaca Format)

```json
[
    {
        "instruction": "question or instruction",
        "input": "additional context (optional)",
        "output": "response"
    }
]
```

### 4.2 Format After Conversion

```
### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}
```
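
The conversion from an Alpaca record to the prompt layout above can be sketched as follows. This is a minimal illustration, not the code from `finetune_qlora.py`: the function names are hypothetical, and dropping the `### Input:` block when `input` is empty is an assumption borrowed from the usual Alpaca convention.

```python
def format_alpaca(record: dict) -> str:
    """Render one Alpaca-style record in the Instruction/Input/Response layout."""
    instruction = record["instruction"].strip()
    context = (record.get("input") or "").strip()
    output = record["output"].strip()

    parts = [f"### Instruction:\n{instruction}"]
    if context:  # assumption: omit the Input block when there is no context
        parts.append(f"### Input:\n{context}")
    parts.append(f"### Response:\n{output}")
    return "\n\n".join(parts)


def to_text_rows(records: list, eos_token: str = "") -> list:
    """Build rows with a single "text" field, optionally ending in an EOS token."""
    return [{"text": format_alpaca(r) + eos_token} for r in records]


sample = {"instruction": "Greet the user.", "input": "", "output": "안녕하세요!"}
print(format_alpaca(sample))
```

The `text` key matches `dataset_text_field="text"` in section 3.3, so the resulting rows can be passed to `datasets.Dataset.from_list` before training.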

### 4.3 Training Data
- **File:** `merged_dataset.json`
- **Number of entries:** 1,273
- **Content:** Q&A data for learning to respond in Hangul

---

## 5. Running Training

### 5.1 Run Command

```bash
cd "e:\qwen3 8b"
python finetune_qlora.py
```

### 5.2 Training Results

```
trainable params: 43,646,976 || all params: 8,234,382,336 || trainable%: 0.5301%
Total steps: 1,600 (ceil(1,273 samples ÷ 16 effective batch) = 80 steps per epoch × 20 epochs)
Elapsed time: about 4 hours (on an RTX 3080 Ti)
```
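
The step count can be reproduced with a few lines of arithmetic. The sketch assumes a single GPU and that the last, partially filled accumulation group of each epoch still triggers an optimizer step; that assumption is consistent with the `checkpoint-1520` and `checkpoint-1600` directories listed in the appendix (80 steps per epoch). The function name is hypothetical.

```python
import math


def total_optimizer_steps(num_samples: int, per_device_batch: int,
                          grad_accum: int, epochs: int) -> int:
    """Optimizer steps for single-GPU training with gradient accumulation."""
    micro_batches = math.ceil(num_samples / per_device_batch)  # batches per epoch
    steps_per_epoch = math.ceil(micro_batches / grad_accum)    # partial group counts
    return steps_per_epoch * epochs


print(total_optimizer_steps(1273, 1, 16, 20))  # 1600
```

Note that dividing 1,273 × 20 by 16 directly gives about 1,591; the difference comes from rounding up once per epoch.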

### 5.3 Output Directories

| Path | Description |
|------|------|
| `outputs_qlora/` | Checkpoints (saved every epoch) |
| `final_model_qlora/` | Final LoRA adapter |

---

## 6. GGUF Conversion

### 6.1 Prerequisites

```bash
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Install dependencies
pip install -r requirements.txt
```

### 6.2 LoRA Merge + GGUF Conversion

**Step 1: Merge the LoRA adapter into the base model**

```python
# merge_lora.py
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the base model (FP16)
base_model = AutoModelForCausalLM.from_pretrained(
    "dnotitia/Smoothie-Qwen3-8B",
    torch_dtype=torch.float16,
    device_map="cpu",
    trust_remote_code=True,
)

# Load the LoRA adapter and merge it into the base weights
model = PeftModel.from_pretrained(base_model, "e:/qwen3 8b/final_model_qlora")
model = model.merge_and_unload()

# Save the merged model together with the tokenizer
model.save_pretrained("e:/qwen3 8b/merged_model_fp16")
tokenizer = AutoTokenizer.from_pretrained("dnotitia/Smoothie-Qwen3-8B")
tokenizer.save_pretrained("e:/qwen3 8b/merged_model_fp16")
```

**Step 2: Convert to GGUF (Q4_K_M)**

```bash
# Run from the llama.cpp directory
python convert_hf_to_gguf.py "e:/qwen3 8b/merged_model_fp16" \
    --outfile "e:/qwen3 8b/Smoothie-Qwen3-8B-DTRO-Edition-f16.gguf" \
    --outtype f16

# Quantize
./llama-quantize "e:/qwen3 8b/Smoothie-Qwen3-8B-DTRO-Edition-f16.gguf" \
    "e:/qwen3 8b/Smoothie-Qwen3-8B-DTRO-Edition-Q4_K_M.gguf" Q4_K_M
```

### 6.3 GGUF Conversion of the Original Model (for Comparison)

```bash
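# convert_hf_to_gguf.py expects a local model directory, so download the base model first if this path does not exist locally
python convert_hf_to_gguf.py "dnotitia/Smoothie-Qwen3-8B" \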
    --outfile "e:/qwen3 8b/Smoothie-Qwen3-8B-Original-f16.gguf" \
    --outtype f16

./llama-quantize "e:/qwen3 8b/Smoothie-Qwen3-8B-Original-f16.gguf" \
    "e:/qwen3 8b/Smoothie-Qwen3-8B-Original-Q4_K_M.gguf" Q4_K_M
```

---

## 7. Ollama Deployment

### 7.1 Writing the Modelfile

**Fine-tuned version (Modelfile.dtro)**

```dockerfile
FROM ./Smoothie-Qwen3-8B-DTRO-Edition-Q4_K_M.gguf

TEMPLATE """{{- if .System }}{{ .System }}
{{ end }}{{ if .Prompt }}### Instruction:
{{ .Prompt }}

### Response:
{{ end }}{{ .Response }}"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 4096
PARAMETER stop "### Instruction:"
PARAMETER stop "### Response:"

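# System prompt, kept in Korean on purpose: "You are a kind and accurate AI assistant. All responses are written in pure Hangul, and Hanja is not used."
SYSTEM "당신은 친절하고 정확한 AI 어시스턴트입니다. 모든 응답은 순수 한글로 작성하며, 한자를 사용하지 않습니다."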
```

**Original version (Modelfile.original)**

```dockerfile
FROM ./Smoothie-Qwen3-8B-Original-Q4_K_M.gguf

TEMPLATE """{{- if .System }}{{ .System }}
{{ end }}{{ if .Prompt }}### Instruction:
{{ .Prompt }}

### Response:
{{ end }}{{ .Response }}"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
```

### 7.2 Registering with Ollama

```bash
# Fine-tuned version
cd "e:\qwen3 8b"
ollama create bluejude10/smoothie-qwen3-8b-dtro -f Modelfile.dtro

# Original version
ollama create bluejude10/smoothie-qwen3-8b-original -f Modelfile.original
```

### 7.3 Testing

```bash
# Test the fine-tuned version (prompt: "Hello, please introduce yourself.")
ollama run bluejude10/smoothie-qwen3-8b-dtro "안녕하세요, 자기소개 해주세요."

# Compare with the original
ollama run bluejude10/smoothie-qwen3-8b-original "안녕하세요, 자기소개 해주세요."
```
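
Since the goal is Hanja-free output, the comparison can also be automated instead of checked by eye. The sketch below sends the same prompt to both models through Ollama's local REST endpoint (`/api/generate` on the default port 11434) and scans each answer for CJK ideographs. The helper names and the chosen Unicode ranges are assumptions for illustration; the ranges cover the common Hanja blocks but are not exhaustive.

```python
import json
import re
import urllib.request

# CJK Unified Ideographs, Extension A, and CJK Compatibility Ideographs.
HANJA_RE = re.compile(r"[\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff]")


def find_hanja(text: str) -> list:
    """Return every Hanja character found in the text (empty list if none)."""
    return HANJA_RE.findall(text)


def ask_ollama(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Send one non-streaming generation request to a running Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def compare_models(prompt: str = "안녕하세요, 자기소개 해주세요.") -> None:
    """Print the Hanja found in each model's answer to the same prompt."""
    for name in ("bluejude10/smoothie-qwen3-8b-dtro",
                 "bluejude10/smoothie-qwen3-8b-original"):
        answer = ask_ollama(name, prompt)
        print(name, "->", find_hanja(answer) or "no Hanja found")
```

Run `compare_models()` while the Ollama server is up; a single prompt is only a spot check, so a larger prompt set gives a more meaningful comparison.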

---

## 8. Troubleshooting

### 8.1 TRL 0.26.x API Error

**Symptom:**
```
TypeError: SFTTrainer.__init__() got an unexpected keyword argument 'tokenizer'
```

**Fix:**
```python
# Before (TRL 0.8.x)
trainer = SFTTrainer(
    tokenizer=tokenizer,
    max_seq_length=1024,
    dataset_text_field="text",
)

# After (TRL 0.26.x)
training_args = SFTConfig(
    max_length=1024,          # max_seq_length → max_length
    dataset_text_field="text",
)
trainer = SFTTrainer(
    processing_class=tokenizer,  # tokenizer → processing_class
    args=training_args,
)
```

### 8.2 CUDA Out of Memory

**Symptom:**
```
torch.cuda.OutOfMemoryError
```

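**Fix:**
1. Keep `BATCH_SIZE` at 1
2. Set `GRADIENT_ACCUMULATION` to 16 or higher so the effective batch size is preserved while the per-device batch stays at 1
3. Reduce `MAX_SEQ_LENGTH` to 1024 or below
4. Enable `gradient_checkpointing=True`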

### 8.3 llama.cpp Conversion Failure

**Symptom:**
```
KeyError: 'model.embed_tokens.weight'
```

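**Fix:**
- Use the latest llama.cpp version (Qwen3 support is required)
- Keep using `convert_hf_to_gguf.py`; the legacy `convert.py` script is no longer shipped under that name in recent llama.cpp versions, so it is unlikely to work as a fallback for Qwen3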

---

## 📎 Appendix

### A. Directory Structure

```
e:\qwen3 8b\
├── finetune_qlora.py          # training script
├── merged_dataset.json        # training data (1,273 entries)
├── FINETUNE_GUIDE.md          # this document
│
├── outputs_qlora/             # checkpoints
│   ├── checkpoint-1520/
│   └── checkpoint-1600/
│
├── final_model_qlora/         # final LoRA adapter
│   ├── adapter_config.json
│   └── adapter_model.safetensors
│
├── merged_model_fp16/         # merged model (for GGUF conversion)
│
└── *.gguf                     # GGUF models (f16 and quantized)
```

### B. Model Naming Conventions

| Use | Name |
|------|------|
| HuggingFace/vLLM | `bluejude10/Smoothie-Qwen3-8B-DTRO-Edition` |
| Ollama (fine-tuned) | `bluejude10/smoothie-qwen3-8b-dtro` |
| Ollama (original) | `bluejude10/smoothie-qwen3-8b-original` |
| GGUF file | `Smoothie-Qwen3-8B-DTRO-Edition-Q4_K_M.gguf` |

---

**🎉 Fine-tuning complete!**