Differential Transformer V2
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark
Synthetic Computers at Scale for Long-Horizon Productivity Simulation
hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage! hf-mem now splits MoE memory into base model weights, routed experts, and KV cache.