SigmaScale: LLM Compression with SVD-based Low-Rank Decomposition and Learned Scaling Matrices
Abstract
SigmaScale learns auxiliary scaling matrices to improve truncated SVD-based LLM compression by adapting to individual weight structures through activation-aware transformations.
We present SigmaScale, a method for learning auxiliary scaling matrices S that aid truncated Singular Value Decomposition (SVD)-based compression of Large Language Models (LLMs). Instead of deriving the scaling matrices analytically, SigmaScale optimizes two sets of vectors that define diagonal row and column scaling transformations under an activation-aware compression loss. We show that learned scaling lowers the effective intrinsic rank of weight matrices, as reflected by reductions in effective-rank entropy, and that this reduction is strongly correlated with compression loss. Experiments on Llama 3.1 8B Instruct and Qwen3-8B show that SigmaScale is competitive with closely related state-of-the-art SVD-based compression methods on perplexity and zero-shot benchmarks. Because its activation-aware transformations are learned, SigmaScale adapts to the structure of individual model weights and offers a more flexible route to low-rank LLM compression. The advantage observed on specific tasks makes the approach a viable option for applications that need lower LLM inference cost.
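To make the ingredients named in the abstract concrete, here is a minimal NumPy sketch of truncated SVD performed in a diagonally scaled space, together with an activation-aware reconstruction loss and an entropy-based effective rank. Everything in it is an assumption for illustration: the function names, the convention that W has shape (out, in) and activations X have shape (in, samples), the squared Frobenius loss, and the effective-rank definition (exponential of the entropy of the normalized singular values) are not taken from the paper. The scaling vectors are not learned here; the column vector is set to a per-channel activation RMS as a heuristic stand-in.

```python
import numpy as np


def effective_rank(M):
    """Exponential of the Shannon entropy of the normalized singular values.

    One common definition of effective rank; assumed here, not taken from the paper.
    """
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so log() is defined
    return float(np.exp(-(p * np.log(p)).sum()))


def scaled_truncated_svd(W, r, c, k):
    """Rank-k approximation of W computed in the space diag(r) @ W @ diag(c).

    r scales rows (output channels), c scales columns (input channels).
    Both must be nonzero so the scaling can be undone.
    """
    Ws = (r[:, None] * W) * c[None, :]
    U, s, Vt = np.linalg.svd(Ws, full_matrices=False)
    Ws_k = (U[:, :k] * s[:k]) @ Vt[:k]       # keep the top-k singular triplets
    return (Ws_k / r[:, None]) / c[None, :]  # map back to the original space


def activation_loss(W, W_hat, X):
    """Squared Frobenius error of the layer output on calibration activations X."""
    return float(np.linalg.norm((W - W_hat) @ X) ** 2)


# Hypothetical toy data: a 16x32 weight and 64 calibration samples whose
# input channels have very different magnitudes.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))
X = rng.standard_normal((32, 64)) * rng.uniform(0.1, 5.0, size=(32, 1))

k = 4
r = np.ones(16)                      # no row scaling in this sketch
c = np.sqrt((X ** 2).mean(axis=1))   # heuristic stand-in for a learned vector

plain = scaled_truncated_svd(W, np.ones(16), np.ones(32), k)
scaled = scaled_truncated_svd(W, r, c, k)

print("loss, plain SVD :", activation_loss(W, plain, X))
print("loss, scaled SVD:", activation_loss(W, scaled, X))
print("effective rank, W       :", effective_rank(W))
print("effective rank, scaled W:", effective_rank((r[:, None] * W) * c[None, :]))
```

In a learned variant, r and c would be trainable parameters updated by gradient descent on the activation loss, which requires an autodiff framework; that optimization step is deliberately left out of this sketch.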
Community
Method for learning scaling matrices to aid SVD-based LLM compression