heng (liuheng)

upvoted a paper 7 months ago

Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs

Paper • 2506.17080 • Published Jun 20, 2025 • 8

upvoted 3 articles 10 months ago

Article

KV Cache from scratch in nanoVLM

+3

ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 120

Article

🕳️ Attention Sinks in LLMs for endless fluency

tomaarsen • Oct 9, 2023 • 37

Article

Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers?

Kseniase • Apr 4, 2025 • 16

upvoted an article 12 months ago

Article

Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO)

ariG23498 • Jan 19, 2025 • 53

upvoted a collection over 1 year ago

Qwen3

Collection

84 items • Updated Dec 31, 2025 • 1.83k

upvoted 2 articles over 1 year ago

Article

🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It?

Kseniase • Mar 17, 2025 • 360

Article

Open-source DeepResearch – Freeing our search agents

+3

m-ric, albertvillanova, merve, thomwolf, clefourrier • Feb 4, 2025 • 1.32k

upvoted a paper over 1 year ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28, 2025 • 125

upvoted a paper almost 3 years ago

Extending Context Window of Large Language Models via Positional Interpolation

Paper • 2306.15595 • Published Jun 27, 2023 • 54
