Ideas for CoT Models: A Geometric Perspective on Latent Space Reasoning

post by Rohan Ganapavarapu (rohan-ganapavarapu) · 2025-01-24T19:01:47.339Z

This is a link post for https://rohan.ga/blog/llm_cot/


Disclaimer: These ideas are untested and only come from my intuition. I don’t have the resources to explore them any further.

intro

CoT and test-time compute have emerged as the direction language models are headed, for better or for worse. o1 and DeepSeek-R1 demonstrate a step-function jump in model capability, and Coconut provides a way for this reasoning to occur in latent space rather than in explicit tokens. I have been thinking about the geometric structure of the latent space where this reasoning can occur.

I want to propose a different geometric perspective on how we structure the latent reasoning space. What if, instead of treating all reasoning steps uniformly, we designed the latent space to mirror how complex problem-solving naturally progresses, from broad exploration to precise refinement?

The intuition is that early reasoning steps require a rich space for exploring multiple potential paths, while later steps need precision to nail down the exact solution. This suggests structuring the latent reasoning space as a progressive funnel: starting with high-dimensional, low-precision representations that gradually transform into lower-dimensional, high-precision ones.

how to structure latent space

Concretely, the latent reasoning space forms a progressive funnel: early steps live in high-dimensional, low-precision representations that are gradually transformed into lower-dimensional, high-precision ones.

Early reasoning steps would operate in a vast but coarse-grained space. This creates a rich geometric landscape where many potential reasoning paths can coexist “orthogonally” without interfering with each other. We have many rough directions to explore simultaneously.

As reasoning progresses, we’d project into increasingly focused spaces with higher precision per dimension. While we lose some of that initial expressiveness, we gain the ability to make more precise distinctions, which is exactly what’s needed for refining the final steps of a logical deduction or mathematical calculation.
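To make the funnel concrete, here is a minimal sketch of how the stage-by-stage narrowing could look, assuming a Coconut-style loop where a latent “thought” vector is fed back into the model between steps. Everything here (the `FunnelReasoner` name, the stage sizes, the dtype choices) is a made-up illustration, not a worked-out architecture; the only point is the shape of the computation: each step reduces dimension, and the final steps switch to a higher-precision dtype.

```python
import torch
import torch.nn as nn

# (dimension, dtype) per reasoning stage: wide and coarse early, narrow and precise late.
# These specific sizes and dtypes are invented for illustration.
STAGES = [
    (8192, torch.bfloat16),   # broad exploration, low numeric precision
    (2048, torch.bfloat16),
    (512,  torch.float32),    # refinement, high numeric precision
]

class FunnelReasoner(nn.Module):
    """Maps a wide, coarse latent 'thought' down to a narrow, precise one."""

    def __init__(self, stages=STAGES):
        super().__init__()
        self.stages = stages
        self.proj = nn.ModuleList()
        for (d_in, _), (d_out, dtype_out) in zip(stages[:-1], stages[1:]):
            # Keep each projection's weights in the dtype of the stage it produces.
            self.proj.append(nn.Linear(d_in, d_out).to(dtype_out))

    def forward(self, z):
        # z: latent thought at the widest stage, shape (batch, stages[0][0])
        for layer, (_, dtype_out) in zip(self.proj, self.stages[1:]):
            z = torch.tanh(layer(z.to(dtype_out)))  # cast, narrow, squash
        return z  # compact, high-precision latent, e.g. ready to decode back to text

model = FunnelReasoner()
thought = torch.randn(4, STAGES[0][0], dtype=STAGES[0][1])  # 4 coarse "thoughts"
refined = model(thought)                                    # shape (4, 512), float32
```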

geometric intuition

In the early high-dimensional space, the “concentration of measure” phenomenon helps keep different partial solutions naturally separated: randomly chosen directions are nearly orthogonal, so partial solutions barely interfere. The landscape has room for many local peaks and valleys, allowing the model to maintain multiple hypotheses in superposition.
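A toy experiment (mine, not from the post) that illustrates the superposition claim: store several random unit “hypothesis” vectors as a single sum and try to read each one back with a dot product. The readout interference shrinks as the dimension grows, which is the sense in which a wide early space can hold many rough candidates at once.

```python
import torch

def readout_error(dim, n_hypotheses=8, trials=100):
    """Average error when reading hypotheses back out of one summed vector."""
    errs = []
    for _ in range(trials):
        h = torch.nn.functional.normalize(torch.randn(n_hypotheses, dim), dim=1)
        s = h.sum(dim=0)        # one vector holding all hypotheses in superposition
        scores = h @ s          # perfect separation would give exactly 1.0 for each
        errs.append((scores - 1.0).abs().mean().item())
    return sum(errs) / len(errs)

for d in (32, 256, 4096):
    print(f"dim={d:5d}  mean readout error ~ {readout_error(d):.3f}")
# Cross-talk between stored hypotheses scales roughly like sqrt(n)/sqrt(dim),
# so the wider the space, the more rough candidates coexist almost independently.
```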

As we funnel down to lower dimensions, we’re essentially performing a learned form of dimensionality reduction that preserves the most promising reasoning pathways while discarding irrelevant directions. The manifold becomes smoother and more precise, ideal for refining the final logical steps.

why it could be good

Current approaches often force models to commit to specific reasoning paths too early. By starting in a high-dimensional space, we allow the model to maintain multiple partial solutions in parallel, only gradually pruning away less promising directions as confidence increases.
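One hedged sketch of what “gradually pruning” could mean mechanically, assuming the latent is approximately a weighted sum of hypothesis directions as in the toy example above: score each direction against the current latent and rebuild it from only the top-k survivors. `prune_latent` and the dot-product scoring rule are my own placeholders, not anything from Coconut or the post.

```python
import torch

def prune_latent(latent, hypothesis_dirs, k):
    """Keep only the k most promising hypothesis directions in a superposed latent.

    latent:          (dim,) vector, roughly a weighted sum of hypothesis directions
    hypothesis_dirs: (n, dim) unit vectors, one per candidate reasoning path
    """
    scores = hypothesis_dirs @ latent         # dot product as a confidence proxy
    keep = scores.topk(k).indices             # indices of the most promising paths
    survivors = hypothesis_dirs[keep] * scores[keep, None]
    return survivors.sum(dim=0)               # narrower latent: fewer live hypotheses
```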

This mirrors how human experts often reason: starting with broad intuitive leaps and gradually refining them into precise logical arguments. The initial high-dimensional space provides room for that kind of intuitive exploration, while the final high-precision space ensures rigorous conclusions.

The manifold perspective also suggests why this might be computationally efficient: early broad exploration happens in a coarse space where precise computation isn’t needed, while expensive high-precision operations only occur in the reduced dimensional space where they matter most.
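To sanity-check the efficiency intuition, here is the back-of-the-envelope arithmetic, assuming the per-step cost is dominated by a dense dim × dim update and scaling it by bytes per element for each stage’s precision. The stage sizes are the same invented ones as above; the numbers are illustrative, not measurements.

```python
# (dimension, bytes per element) per stage; e.g. 2 bytes for bf16, 4 for fp32.
stages = [(8192, 2), (2048, 2), (512, 4)]

for dim, nbytes in stages:
    macs = dim * dim                         # multiply-accumulates, one dense update
    traffic_mb = dim * dim * nbytes / 1e6    # parameter bytes touched per step
    print(f"dim={dim:5d}  ~{macs / 1e6:6.1f} M MACs  ~{traffic_mb:6.1f} MB moved")
# The wide stages still dominate raw work, so the hoped-for savings rest on the
# assumption (also flagged in the limitations below) that low-precision arithmetic
# there is cheap; the high-precision stage is small enough not to matter.
```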

Of course we are anthropomorphizing somewhat, but the intuition here is as well founded as anything else we go on: “attention” isn’t really the model paying attention to each token either.

limitations

I, of course, have no idea how we would implement this at the architecture level. Changing dimensions and precisions mid-stream is really weird once you consider how it would affect the other parts of the model. We would be predicting the next vector, but it is unclear exactly how we would choose the vector’s dimension, when we would start narrowing, and how we would generate vectors that are “translatable” back into human text. There is also a lack of training data: we would have to AlphaGo it and RL from literally nothing, since no CoT in this weird vector format exists.

I am also assuming that the low precision of the high-dimensional stages lowers the compute cost enough to be comparable to current models. This is such a departure from what is known to work that it may not make sense to explore (training stability may be really hard).

The only reason I write this is for the 1e-9 chance someone reads this and is inspired to develop ASI.
