arxiv:2605.13740

Learning POMDP World Models from Observations with Language-Model Priors

Published on May 13 · Submitted by Valentin Six on May 18

Abstract

Pinductor uses language model priors to efficiently learn POMDP models from limited observation-action data, matching performance of methods with privileged hidden state access while outperforming traditional tabular approaches.

AI-generated summary

Whether navigating a building, operating a robot, or playing a game, an agent that acts effectively in an environment must first learn an internal model of how that environment works. Partially observable Markov decision processes (POMDPs) provide a flexible modeling class for such internal world models, but learning them from observation-action trajectories alone is challenging and typically requires extensive environment interaction. We ask whether language-model priors can reduce costly interaction by leveraging prior knowledge, and introduce Pinductor (POMDP-inductor): an LLM proposes candidate POMDP models from a few observation-action trajectories and iteratively refines them to optimize a belief-based likelihood score. Despite using strictly less information, Pinductor matches the performance and sample efficiency of LLM-based POMDP learning methods that assume privileged access to the hidden state, while significantly surpassing the sample efficiency of tabular POMDP baselines. Further results show that performance scales with LLM capability and degrades gracefully as semantic information about the environment is withheld. Together, these results position language-model priors as a practical tool for sample-efficient world-model learning under partial observability, and a step toward generalist agents in real-world environments. Code is available at https://github.com/atomresearch/pinductor.
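
To make the phrase "belief-based likelihood score" concrete, here is a minimal sketch of one standard way to score a candidate POMDP against an observation-action trajectory without ever seeing the hidden state: the forward (Bayes) filter, which accumulates the log-probability of each observation under the current belief. This is an illustration under stated assumptions, not the scoring code from the paper. It assumes a small tabular model, and the function name `belief_log_likelihood` and the array layouts `T[a, s, s2]` and `O[a, s2, o]` are hypothetical choices made for this example.

```python
import numpy as np


def belief_log_likelihood(T, O, b0, actions, observations):
    """Log-probability of `observations` given `actions` under a tabular POMDP.

    Assumed (hypothetical) layout:
      T[a, s, s2] = P(s2 | s, a)   transition model
      O[a, s2, o] = P(o | s2, a)   observation model
      b0[s]       = P(s0 = s)      initial belief
    """
    belief = np.asarray(b0, dtype=float)
    total = 0.0
    for a, o in zip(actions, observations):
        predicted = belief @ T[a]        # predict: push the belief through the transition
        joint = predicted * O[a][:, o]   # weight each next state by how well it explains o
        evidence = joint.sum()           # P(o_t | past observations and actions)
        if evidence == 0.0:
            return -np.inf               # the model rules this trajectory out entirely
        total += np.log(evidence)
        belief = joint / evidence        # correct: normalised posterior belief
    return total


# Two candidate models for a two-state world with one action and a noisy sensor.
O_noisy = np.array([[[0.9, 0.1], [0.1, 0.9]]])
T_sticky = np.array([[[1.0, 0.0], [0.0, 1.0]]])   # the hidden state never changes
T_mixing = np.array([[[0.5, 0.5], [0.5, 0.5]]])   # the hidden state is resampled each step
b0 = np.array([0.5, 0.5])

for name, T in [("sticky", T_sticky), ("mixing", T_mixing)]:
    print(name, belief_log_likelihood(T, O_noisy, b0, [0, 0], [0, 0]))
```

On a trajectory that reports the same observation twice, the sticky model receives the higher score, which is the kind of signal a refinement loop can use to prefer one candidate over another.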

Community

We introduce Pinductor: a method for learning executable POMDP world models from partially observed trajectories using LLM priors.

Unlike prior LLM-based approaches, Pinductor never accesses the hidden state, not even post hoc. Instead, an LLM proposes candidate transition, observation, and reward programs, which are iteratively refined through belief-based filtering and interaction with the environment.
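
The propose-and-refine pattern described above can be sketched as a generic loop. This is only an outline under loud assumptions, not the Pinductor implementation: `induce_model`, `propose`, and `score` are hypothetical names, the proposer here is a deterministic toy standing in for an LLM call, and a single number (a sensor noise level) stands in for a full set of transition, observation, and reward programs.

```python
from typing import Callable, List, Tuple, TypeVar

M = TypeVar("M")  # a candidate world model; in the paper's setting, a set of programs


def induce_model(
    propose: Callable[[List[Tuple[M, float]]], List[M]],
    score: Callable[[M], float],
    rounds: int = 5,
) -> Tuple[M, float]:
    """Repeatedly ask `propose` for candidates, score them, and keep the best.

    `propose` sees every (candidate, score) pair so far, best first, so it can
    refine earlier attempts. `score` is any trajectory-based score where higher
    is better, for example a belief-based log-likelihood.
    """
    history: List[Tuple[M, float]] = []
    for _ in range(rounds):
        for candidate in propose(history):
            history.append((candidate, score(candidate)))
        history.sort(key=lambda pair: pair[1], reverse=True)
    if not history:
        raise ValueError("the proposer never returned a candidate")
    return history[0]


# Toy stand-ins (assumptions for illustration only).
def toy_propose(history: List[Tuple[float, float]]) -> List[float]:
    if not history:
        return [0.5]                      # initial guess
    best = history[0][0]
    return [best - 0.1, best + 0.1]       # local edits to the current best candidate


def toy_score(noise: float) -> float:
    return -(noise - 0.2) ** 2            # pretend the data are best explained by 0.2


best_model, best_score = induce_model(toy_propose, toy_score, rounds=5)
print(best_model, best_score)
```

Passing the full scored history back to the proposer is one plausible design choice: it gives the proposer feedback on which earlier candidates explained the data well, without exposing any hidden-state information.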

Across MiniGrid POMDP tasks, Pinductor:
• matches privileged-state LLM baselines while using strictly less information,
• significantly outperforms tabular POMDP baselines,
• learns useful belief states from only a few trajectories.

The broader question behind this work is:
Can language models act as strong priors for world-model induction under partial observability?

Paper: https://arxiv.org/abs/2605.13740
Code: https://github.com/atomresearch/pinductor
