Doc-to-LoRA: Learning to Instantly Internalize Contexts
Paper • 2602.15902
A 144M-parameter Perceiver hypernetwork trained on Qwen/Qwen3-1.7B. It reads a document once, outputs LoRA weight deltas, and lets the base LLM answer questions about that document without the document ever appearing in the context window.
Based on Doc-to-LoRA (Charakorn et al., 2026).
| Metric | Value |
|---|---|
| Base model | Qwen/Qwen3-1.7B |
| Perceiver params | 144 M |
| LoRA rank / alpha | 8 / 8.0 |
| Target module | down_proj |
| Training steps | 8,000 |
| Final CE loss | 0.2165 |
| Exact-match accuracy (NIAH) | 80.0% |
| Training ctx length | 32–256 tokens |
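The table above gives rank 8, alpha 8.0, and `down_proj` as the sole target module. A minimal sketch of how a rank-8 LoRA delta is merged into a `down_proj` weight, assuming the standard LoRA update `W + (alpha/rank) * B @ A`; the tensor names and the Qwen3-1.7B dimensions (intermediate 6144, hidden 2048) are illustrative assumptions, not read from the checkpoint:

```python
import torch

rank, alpha = 8, 8.0
d_in, d_out = 6144, 2048                 # assumed down_proj: intermediate -> hidden

W = torch.randn(d_out, d_in)             # frozen base weight (stand-in)
A = torch.randn(rank, d_in) * 0.01       # low-rank factor produced by the hypernetwork
B = torch.randn(d_out, rank) * 0.01     # low-rank factor produced by the hypernetwork

# Merge the rank-8 delta into the base weight, scaled by alpha / rank.
W_adapted = W + (alpha / rank) * (B @ A)
assert W_adapted.shape == W.shape        # the merged weight keeps the base shape
```

With alpha equal to rank, the scaling factor is 1.0, so the delta is added unscaled.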
| File | Description |
|---|---|
| `hypernet.pt` | Perceiver weights + full config to rebuild the class |
| `inference_example.py` | Self-contained script (download and run) |
| `training_config.json` | Training hyperparameters |
| `curves.png` | Loss and accuracy curves |
```bash
pip install "transformers>=4.51.0" huggingface_hub torch
```
```python
from huggingface_hub import hf_hub_download
import torch

# weights_only=False is required because the checkpoint stores the full config
# object alongside the weights. Use map_location="cpu" if no GPU is available.
ckpt = torch.load(
    hf_hub_download("farpluto/doc-to-lora-niah", "hypernet.pt"),
    map_location="cuda",
    weights_only=False,
)
# See inference_example.py for the complete working script.
```
Chain-of-thought reasoning is suppressed by appending `/no_think` to every query, and any residual `<think>` tokens are stripped from the generated output. Both techniques are harmless no-ops on non-Qwen3 models.
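The stripping step can be done with a small regex pass; this is a sketch, not the repo's actual implementation, and `strip_think` is a hypothetical helper name:

```python
import re

def strip_think(text: str) -> str:
    """Remove residual <think>...</think> blocks and any stray unpaired tags."""
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    return text.replace("<think>", "").replace("</think>", "").strip()

# Qwen3 with /no_think typically emits an empty think block before the answer.
print(strip_think("<think>\n\n</think>\n\nThe secret number is 7."))
# -> The secret number is 7.
```

On models that never emit `<think>` tags, the function returns its input unchanged (modulo surrounding whitespace), which is why the cleanup is a safe no-op.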