Prediction Bottlenecks Don't Discover Causal Structure (But Here's What They Actually Do)
Abstract
The claim that a Mamba state-space model recovers Granger-causal structure through a simple readout was tested across synthetic and real datasets with interventions; the method-level claim does not hold once confounding factors and baseline approaches are accounted for.
A Mamba state-space model trained only for next-step prediction appears to recover Granger-causal structure through a simple readout S = |W_{out} W_{in}|, with early experiments suggesting the phenomenon generalized across architectures and benefited from interventional data at p < 10^{-5}. We package the protocol used to test that claim -- standardized synthetic generators (VAR/Lorenz/CauseMe-style), three intervention semantics (do(X=c), soft-noise, random-forcing), edge-provenance cards on three real datasets, and size-matched control arms -- as a reusable falsification benchmark, and walk the claim through it in five stages. The method-level claim does not survive: (i) a plain linear bottleneck does as well or better; (ii) tuned Lasso beats the bottleneck on synthetic CauseMe-style benchmarks, and on Lorenz-96 (the only real benchmark with unambiguous ground truth) classical PCMCI and Granger lead a tight cluster in which the bottleneck trails; (iii) the headline intervention advantage is roughly 60% a sample-size confound, and the residual disappears under standard do(X=c) interventions, surviving only under a non-standard random-forcing scheme; (iv) even that residual reproduces, with a larger effect, in classical bivariate Granger -- the effect is method-agnostic. What survives is a narrow characterization result; the benchmark is the lasting artifact, and each stage above is one of its control arms.
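The readout under test can be made concrete with a small sketch: fit a rank-constrained linear next-step predictor on a toy VAR(1) system and score candidate edges via S = |W_out W_in|. The readout formula is the one named in the abstract; the reduced-rank-regression fit, the toy system, and its coefficients are illustrative assumptions, not the paper's actual training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse VAR(1) ground truth: x_{t+1} = A x_t + noise.
d, T, r = 6, 5000, 4
A = np.zeros((d, d))
A[np.diag_indices(d)] = 0.5
A[1, 0] = 0.4; A[3, 2] = 0.4; A[5, 4] = -0.4   # true causal edges
X = np.zeros((T, d))
for t in range(T - 1):
    X[t + 1] = X[t] @ A.T + 0.1 * rng.standard_normal(d)

# Linear bottleneck predictor x_{t+1} ~ W_out (W_in x_t) with rank r,
# fit here by reduced-rank regression: full least squares, then
# rank-r truncation via SVD (an assumed stand-in for gradient training).
Xt, Xn = X[:-1], X[1:]
W_full = np.linalg.lstsq(Xt, Xn, rcond=None)[0].T   # x_{t+1} ~ W_full x_t
U, s, Vt = np.linalg.svd(W_full)
W_out = U[:, :r] * s[:r]    # d x r
W_in = Vt[:r]               # r x d

# The readout from the claim: S_{ij} = |W_out W_in|_{ij} scores edge j -> i.
S = np.abs(W_out @ W_in)
true_edges = np.abs(A) > 0
# Sanity check: true edges should score higher than absent ones on average.
print(S[true_edges].mean() > S[~true_edges].mean())
```

On a near-linear system like this the readout looks causal almost for free, which is exactly the point of the paper's linear-bottleneck control arm: matching the score is low-rank regression, not causal discovery.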
Community
This paper falsifies the claim that next-step prediction bottlenecks (especially Mamba/SSM weight projections) recover causal structure, showing instead that their apparent gains are mostly low-rank regression, sample-size confounds, intervention-semantics artifacts, and target-corruption robustness, with the main durable contribution being a reusable falsification benchmark.
➡️ Key Highlights of their Prediction-as-Causal-Discovery Falsification Framework: ✨
🧪 Reusable Five-Stage Falsification Benchmark: Introduces a control-heavy benchmark spanning VAR, Lorenz-96, CauseMe-style generators, real datasets with edge-provenance cards, matched-capacity architectures, size-matched observational controls, and multiple intervention semantics to stress-test claims that prediction models implicitly recover causal graphs.
🧩 Weight-Projection Causality Does Not Survive Controls: Tests the extraction rule (S = |W_{out}W_{in}|) for bottleneck predictors and shows that linear bottlenecks match or beat Mamba SSMs, tuned Lasso dominates on synthetic graph recovery, and classical PCMCI/Granger-style methods outperform the bottleneck on clean Lorenz-96 ground truth.
🧠 Intervention Gains Are Confounds, Not Causal Evidence: Demonstrates that the reported interventional advantage mostly comes from extra sample size and a non-standard per-step random-forcing intervention; under proper do(X_i=c) interventions the effect nearly vanishes, while the residual appears even more strongly in classical bivariate Granger, indicating method-agnostic target-corruption robustness rather than learned causal discovery.
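The gap between the three intervention semantics is easy to see on a toy simulator. The sketch below (the VAR system, coefficients, and `simulate` helper are illustrative assumptions, not the benchmark's actual generators) implements hard do(X_i=c), soft-noise, and per-step random forcing:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy VAR(1) simulator with the three intervention semantics named in the
# paper: do(X_i = c), soft-noise, and per-step random forcing.
def simulate(A, T, intervene=None, i=0, c=1.0, sigma=0.1):
    d = A.shape[0]
    x = np.zeros(d)
    out = np.zeros((T, d))
    for t in range(T):
        x = A @ x + sigma * rng.standard_normal(d)
        if intervene == "do":       # hard: clamp node i to a constant
            x[i] = c
        elif intervene == "soft":   # soft: extra noise on node i's mechanism
            x[i] += 0.5 * rng.standard_normal()
        elif intervene == "force":  # overwrite node i with fresh noise each step
            x[i] = rng.standard_normal()
        out[t] = x
    return out

A = np.array([[0.5, 0.0], [0.4, 0.5]])   # single true edge x0 -> x1
obs = simulate(A, 2000)
hard = simulate(A, 2000, intervene="do", i=0)
forced = simulate(A, 2000, intervene="force", i=0)

# Under do(X_0 = c) the intervened node carries no variance, so it can tell a
# downstream-prediction method nothing; under random forcing it keeps driving
# x1 with full variance. This is the semantics gap behind the residual
# "interventional advantage" the paper dissects.
print(hard[:, 0].std(), forced[:, 0].std())
```

The design point: any method that benefits from the forcing regime but not from do(X_i=c) is exploiting injected target variance, not intervention-specific causal information, which is why the benchmark treats the two semantics as separate control arms.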
Get this paper in your agent:
hf papers read 2605.09169