arxiv:2505.02707
Jaward Sesay
Jaward
AI & ML interests
Building Lectūra Labs | CS Grad Student @BIT | AI/ML Research: Autonomous Agents, LLMs | Building The Cursor for Learning | Role Model Karpathy
Recent Activity
updated a dataset about 18 hours ago
Jaward/lectura-agents-data
posted an update 1 day ago
Been wrapping my head around this theoretical banger. Bro just casually answered a fundamental question: why does weight decay work? In other words, why should penalizing weight magnitude help a model perform better on unseen data?
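For anyone who wants to see the penalty itself, here is a toy sketch (mine, not from the paper) of weight decay on a linear model in plain Python. The data, the helper names `train_linear`, `l2_norm`, and `make_data`, and all hyperparameters are made up for illustration. It only shows that an L2 penalty shrinks the weight norm; it says nothing about the Kolmogorov complexity argument.

```python
import random


def train_linear(xs, ys, weight_decay=0.0, lr=0.05, steps=2000):
    """Fit y ~ w.x by gradient descent on MSE + weight_decay * ||w||^2."""
    dim = len(xs[0])
    n = len(xs)
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(dim):
                grad[j] += 2.0 * err * x[j] / n
        # Weight decay contributes the gradient of the penalty: 2 * weight_decay * w.
        w = [wi - lr * (g + 2.0 * weight_decay * wi) for wi, g in zip(w, grad)]
    return w


def l2_norm(w):
    return sum(wi * wi for wi in w) ** 0.5


def make_data(seed=0, n=20, dim=10):
    """Hypothetical noisy data: only the first feature carries signal."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    ys = [x[0] + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


xs, ys = make_data()
w_plain = train_linear(xs, ys, weight_decay=0.0)
w_decay = train_linear(xs, ys, weight_decay=0.1)
print("norm without decay:", l2_norm(w_plain))
print("norm with decay:   ", l2_norm(w_decay))
```

The decayed run should end with the smaller norm. Whether that smaller norm also means a simpler solution in the Kolmogorov sense is exactly what the paper is about.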
He argues that the minimum neural weight norm required to represent a target dataset is closely related to the Kolmogorov complexity of that dataset, i.e. smaller weight norms correspond to simpler solutions (lower Kolmogorov complexity), and simpler solutions tend to generalize better. This would explain why bigger models generalize well on noisy data: there's enough room to accommodate the optimal Kolmogorov complexity (KC). So the question hinges not on parameter count but on how much information is encoded in those parameters. And if norm is related to complexity, researchers can design regularizers that control complexity more directly, cool! It holds for fixed precision only tho, and he explains clearly why.
posted an update 26 days ago
Anthropic’s new read introduces a new autoencoder (NLA) that enables an LLM to reason in natural language (words) instead of activations (numbers). They trained Claude (with NLA) to translate its activations into human-readable text. NLA has two parameterized models: an activation verbalizer that converts activations to text, and an activation reconstructor that tries to recreate the original activations from that text. While this is cool, it took GRPO to get here lol, which shows how cutting-edge we can get when research is open-sourced. Very useful for work on interpretability and alignment btw