Can LLM-based models do model-based planning?

post by jylin04 · 2025-04-16T12:38:00.793Z · LW · GW · 1 comment

This is a link post for https://docs.google.com/document/d/1eYkXycZEu93KOdwh6sOm0hHXr-yZ61lRivWtL5yohRc/edit?usp=sharing


I recently spent a few months thinking about whether LLM-based models can do model-based planning, and wrote a ~40-page report on it: "Report on LLMs and model-based planning".  The doc is still a bit rough around the edges - most notably, the concepts of "efficient planning" and "sufficiently convoluted" tasks in section 1 are incompletely defined - but I thought I would share it in its current form, in case others find the framework or early conclusions useful.

The summary is as follows: 

In this report, I investigate the question of whether current LLM-based models have the cognitive capability of model-based planning (MBP). 

As motivation, model-based planning is a frequently mentioned crux in the ongoing debate about whether LLM-based models are on track to scale to AGI.  However, the debate is often underspecified, with different people taking e.g. “world-model” to mean different things. In this report, I first operationalize model-based planning in a way that can, in principle, be compared against information-processing patterns in trained LLMs, and explain why I think the operationalization may be necessary for AIs to achieve certain consequential tasks in the real world. I then explore whether current LLM-based models can do it, based mainly on their performance on related benchmarks.

My main findings are as follows:

  • Whether or not LLM-based models can do model-based planning may be bottlenecked by the complexity of the states required by the world-model: in particular, by whether those states can be compressed into a compact representation in token form.
     
  • Current “pure LLMs” through GPT-4 probably cannot do model-based planning as defined here for a nontrivial number of planning steps, over world-models with even very simple states, based on their performance on existing benchmarks. 
     
  • Current reasoning models through o1 probably cannot do model-based planning as defined here for a nontrivial number of planning steps, over world-models with relatively simple states, based on their preliminary performance on existing benchmarks. 
     
  • Existing benchmarks may imperfectly track model-based planning under this operationalization, and I suggest an idea for a new benchmark to fill in the gaps.


  • Looking ahead, I think LLM-based architectures at scale could plausibly support model-based planning inefficiently, with one possible bottleneck being the need to encode intermediate states in the chain of thought. The main open questions are whether they could support efficient forms of model-based planning, and to what extent non-token state representations and efficient planning algorithms are needed to achieve consequential tasks in the real world.
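
To make concrete what "planning over a world-model with simple states" can mean, here is a minimal sketch of one standard form of model-based planning: breadth-first search over an explicit transition function, where the planner simulates rollouts internally rather than acting in the environment. This is only an illustrative toy (the integer states, `"inc"`/`"double"` actions, and function names are my own invention, not from the report), but it shows the pattern of encoding and expanding intermediate states that the bullet above points to:

```python
from collections import deque

def model_based_plan(start, goal, transition, actions, max_steps=10):
    """Breadth-first search using an explicit world-model.

    `transition(state, action)` returns the *predicted* next state;
    all rollouts happen inside the planner's "head", never in the
    real environment.
    """
    frontier = deque([(start, [])])  # (state, action sequence so far)
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        if len(plan) >= max_steps:
            continue
        for a in actions:
            nxt = transition(state, a)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [a]))
    return None  # no plan within max_steps

# Toy world-model: states are integers; actions increment or double.
plan = model_based_plan(
    start=1, goal=6,
    transition=lambda s, a: s + 1 if a == "inc" else s * 2,
    actions=["inc", "double"],
)
print(plan)  # shortest plan: ["inc", "inc", "double"] (1 -> 2 -> 3 -> 6)
```

Note that the frontier must explicitly store every intermediate state; if a transformer has to serialize this bookkeeping into its chain of thought token by token, the cost of planning grows with both state complexity and search breadth, which is the inefficiency the bullet above gestures at.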

This report is organized as follows. 

In section 1, I define and motivate the notion of model-based planning that I’ll use in the report.

In section 2, I review the LLM-based architectures that I’ll consider.

In section 3, I review the strategies with which we can try to get insight into whether current models and/or their architectures at scale can do model-based planning.

In section 4, I discuss whether current LLM-based models can do model-based planning over world-models with relatively simple states, based mostly on collecting results from existing benchmarks.

In section 5, I discuss ideas for future work.

1 comment

Comments sorted by top scores.

comment by Davidmanheim · 2025-04-16T15:16:56.479Z · LW(p) · GW(p)

Very interesting work. One question I've had about this is whether humans can do such planning 'natively', i.e. in our heads, or if we're using tools in ways that are essentially the same as doing "model-based planning inefficiently, with... bottleneck being a potential need to encode intermediate states."