Posts

How Logic "Really" Works: An Engineering Perspective · 2025-04-16
FlexChunk: Enabling 100M×100M Out-of-Core SpMV (~1.8 min, ~1.7 GB RAM) with Near-Linear Scaling · 2025-04-06
Formal Proof: O(n) Is a Cognitive Illusion · 2025-03-28

Comments

Comment by Daniil Strizhov (mila-dolontaeva) on Tracing the Thoughts of a Large Language Model · 2025-03-28

The poetry case really stuck with me. Claude is clearly planning rhymes ahead, which already cracks the "just next-token" intuition about autoregressive models. But maybe it's more than a neat trick. What if this kind of forward planning is a core capability: the model isn't just unrolling a string, it's navigating a conceptual space toward a target. One could test this by checking how often similar planning circuits show up in multi-step reasoning tasks. If the model is building a rough "mental map" of where it wants to land, that might explain why larger context windows boost reasoning so much: not just more data, but more room to plan. Has anyone tried prompting or tracing for this directly?