Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging
Abstract
MergePipe treats large language model merging as an expert access-set problem, using budget-aware execution and deterministic planning to limit how many expert weights must be read.
Weight-space model merging is usually formulated as an algebraic operation on checkpoints, yet at LLM scale the limiting resource is often the set of expert weights that must be read. We introduce MergePipe, a budget-aware execution layer that casts LLM merging as an expert access-set problem: given a merge operator and a checkpoint family in a shared weight coordinate system, choose which expert delta blocks to access under an explicit I/O budget. MergePipe indexes parameter blocks, builds deterministic access plans, and executes the induced budgeted merge with replayable manifests. The plan is budget-sound by construction and recovers the full-read merge at full budget; for fixed-coefficient additive operators, the omitted-update error is bounded by the norm of omitted deltas. Across Qwen and Llama merging workloads, MergePipe reduces expert-read I/O by up to an order of magnitude and achieves up to 11× speedups. Representative budget sweeps show O(10⁻³) parameter deviation from full-read merges and no monotonic degradation on downstream benchmarks.
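The access-set formulation above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not MergePipe's actual interface: a hypothetical block index that stores a byte size and a delta norm per (expert, block) pair, a greedy norm-per-byte selection policy, and a fixed-coefficient additive merge of the form base + Σ αₑ·Δₑ. Under those assumptions, the plan never exceeds the budget, selects every block when the budget covers all of them, and the error from skipped blocks is bounded by Σ |αₑ|·‖Δₑ,ᵦ‖ over the omitted pairs (triangle inequality).

```python
import math

def build_index(deltas, bytes_per_value=8):
    """Hypothetical block index: byte size and L2 norm per (expert, block)."""
    return {
        (expert, block): {
            "bytes": bytes_per_value * len(vals),
            "norm": math.sqrt(sum(v * v for v in vals)),
        }
        for expert, blocks in deltas.items()
        for block, vals in blocks.items()
    }

def plan_access(index, budget_bytes):
    """Greedy, deterministic plan: highest norm-per-byte first, ties by key.

    Assumed policy for illustration; a block is added only if it still fits,
    so the returned plan never exceeds budget_bytes.
    """
    order = sorted(index, key=lambda k: (-index[k]["norm"] / index[k]["bytes"], k))
    plan, used = [], 0
    for key in order:
        cost = index[key]["bytes"]
        if used + cost <= budget_bytes:
            plan.append(key)
            used += cost
    return sorted(plan), used

def merge(base, deltas, coeffs, plan):
    """Fixed-coefficient additive merge restricted to the planned blocks."""
    merged = {block: list(vals) for block, vals in base.items()}
    for expert, block in plan:
        a = coeffs[expert]
        merged[block] = [w + a * d for w, d in zip(merged[block], deltas[expert][block])]
    return merged

def omitted_bound(index, coeffs, plan):
    """Upper bound on the error: sum of |coeff| * norm over skipped blocks."""
    read = set(plan)
    return sum(abs(coeffs[e]) * index[(e, b)]["norm"] for (e, b) in index if (e, b) not in read)

def l2_distance(x, y):
    """L2 distance between two merged models stored as {block: [values]}."""
    return math.sqrt(sum((p - q) ** 2 for b in x for p, q in zip(x[b], y[b])))

# Toy data: two experts, two blocks each (values are made up).
base = {"attn": [0.0, 0.0], "mlp": [1.0, 1.0]}
deltas = {
    "e1": {"attn": [3.0, 4.0], "mlp": [0.0, 0.1]},
    "e2": {"attn": [0.0, 0.0], "mlp": [0.6, 0.8]},
}
coeffs = {"e1": 0.5, "e2": 0.5}

index = build_index(deltas)
plan, used = plan_access(index, budget_bytes=32)   # half of the 64 bytes available
partial = merge(base, deltas, coeffs, plan)
full = merge(base, deltas, coeffs, sorted(index))  # full-read reference
print(plan, used, l2_distance(full, partial), omitted_bound(index, coeffs, plan))
```

Sorting the plan before execution makes the read order independent of the scoring order, which is one simple way to keep a plan replayable from a manifest.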
Community
MergePipe is the first parameter management and execution system for large-scale LLM merging.
It reframes LLM merging as a data management problem rather than a one-off script execution by:
explicitly modeling expert parameter reads as a budgeted resource,
explicitly planning merges before execution, and
materializing merged checkpoints with atomic publish and immutable manifests.
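The last point, atomic publish with immutable manifests, can be sketched as follows. This is an assumed design, not MergePipe's implementation: the function name `publish_checkpoint`, the manifest fields, and the file layout are all hypothetical. The sketch writes each file to a temporary path in the target directory, renames it into place with `os.replace`, writes the manifest last so that its presence marks a completed publish, and refuses to overwrite an existing manifest.

```python
import hashlib
import json
import os
import tempfile

def _atomic_write(path, data):
    """Write bytes to a temp file in the same directory, then rename into place."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # make sure the bytes reach disk before the rename
        os.replace(tmp, path)      # atomic on POSIX when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise

def publish_checkpoint(out_dir, name, payload, plan):
    """Publish a merged checkpoint plus a manifest (hypothetical layout).

    payload: serialized checkpoint bytes; plan: list of (expert, block) pairs read.
    """
    os.makedirs(out_dir, exist_ok=True)
    ckpt_path = os.path.join(out_dir, name)
    manifest_path = ckpt_path + ".manifest.json"
    if os.path.exists(manifest_path):
        # Treat manifests as immutable: never overwrite a published one.
        raise FileExistsError(manifest_path)
    manifest = {
        "checkpoint": name,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "plan": [list(key) for key in plan],
    }
    _atomic_write(ckpt_path, payload)
    # The manifest goes last, so readers that see it can rely on the checkpoint.
    _atomic_write(manifest_path, json.dumps(manifest, sort_keys=True).encode("utf-8"))
    return manifest
```

A reader can then check the stored `sha256` against the checkpoint bytes and replay the recorded `plan` to reproduce the same merge.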
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- MinT: Managed Infrastructure for Training and Serving Millions of LLMs (2026)
- RW-TTT: Batched Serving for Request-Owned Test-Time Training State (2026)
- BatchWeave: A Consistent Object-Store-Native Data Plane for Large Foundation Model Training (2026)
- When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression (2026)
- COREY: Entropy-Guided Runtime Chunk Scheduling for Selective Scan Kernels (2026)
- ArborKV: Structure-Aware KV Cache Management for Scaling Tree-based LLM Reasoning (2026)
- PRISM: Fast Online LLM Serving via Scheduling-Memory Co-design (2026)