Comments
Some errata:
The bat thing might have just been Thomas Nagel, I can't find the source I thought I remembered.
At one point I said LLMs forget everything they thought previously between predicting (say) token six and seven and have to work from scratch. Because of the way the attention mechanism works, it is actually a little more complicated (see the top comment from hmys). What I said is (I believe) still overall right, but I would state that detail less strongly.
Hofstadter apparently was the one who said a human-level chess AI would rather talk about poetry.
Did a podcast interview with Ayush Prakash on the AIXI model (and modern AI), very introductory/non-technical:
How do you square that with Algorithm 10 here: https://arxiv.org/pdf/2207.09238? See Appendix B for the list of notation; that should save some time if you don't want to read the whole thing.
(Nice resource by the way, the only place I have seen anyone write down a proper pseudo-code algorithm for transformers)
Seems to match the diagram from @hmys in that the entire row to the left of a position goes into its multiheaded attention operation - NOT the original input tokens.
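For concreteness, here is a minimal single-head sketch in numpy (my own illustration, not the paper's Algorithm 10 verbatim, and omitting multi-head splitting, layer norm, etc.): the queries, keys, and values all come from the current layer's residual-stream activations X, and the causal mask lets position t mix in values from every earlier position - never the raw input tokens.

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head causal attention sketch. X is the current layer's
    residual stream, shape (seq_len, d_model), one row per position;
    Wq, Wk, Wv are (d_model, d_head) projection matrices."""
    Q = X @ Wq  # queries, one per position
    K = X @ Wk  # keys come from the SAME residual stream...
    V = X @ Wv  # ...not from the original input tokens
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (seq_len, seq_len)
    # causal mask: position t may only attend to positions <= t
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each row mixes values from all earlier positions
```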
Sounds awesome, but also clearly dual use.
It seems that the relevant thing is not so much how many values you have tested as the domain size of the function. A function with a large domain cannot be explicitly represented with a small lookup table. But this means you also have to consider how the black box behaves when you feed it something outside of its domain, right? If it has some default "missing" value, that doesn't complicate the conclusions much. But what if it is a lookup table, but the index is taken mod the length of the table? Or more generally, what if a hash function is applied to the index first? It seems that the number of unique output values taken by the function more robustly lower-bounds its size as a (generalized) lookup table?
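A toy sketch of that last point (entirely my own illustrative example): a tiny table can serve a huge apparent domain once the index is reduced first, so the number of inputs you have tested says little, while the number of distinct outputs you observe can never exceed the number of table entries.

```python
# Toy illustration (my own example): an 8-entry table behind a mod/hash index.
table = [3, 1, 4, 1, 5, 9, 2, 6]

def black_box(x: int) -> int:
    # index is hashed and reduced mod the table length, so every integer
    # "works" even though the underlying table is tiny
    return table[hash(x) % len(table)]

queries = range(10_000)                     # far more inputs than table entries
outputs = {black_box(x) for x in queries}

# len(outputs) <= len(table) always holds, so counting distinct outputs
# lower-bounds the size of any (generalized) lookup table implementing the box
print(len(outputs), "distinct outputs from a", len(table), "entry table")
```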
Yes - assuming what you described is a fixed algorithm T, the complexity of A is just a constant, and the universal distribution samples input A for T a constant fraction of the time, meaning that this still dominates the average case runtime of T.
More generally: the algorithm has to be fixed (uniform), it can't be parameterized by the input size. The results of the paper are asymptotic.
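A minimal way to write out the dominance step (my notation: m is the universal distribution over inputs, t_T(x) the runtime of the fixed algorithm T on input x, and A the adversarial input described above):

```latex
\[
\mathbb{E}_{x \sim m}\!\big[\, t_T(x) \,\big]
\;=\; \sum_x m(x)\, t_T(x)
\;\ge\; m(A)\, t_T(A)
\;=\; 2^{-K(A) + O(1)}\, t_T(A).
\]
% Since T is fixed, K(A) is a constant, so this single input contributes a
% constant fraction of the average-case runtime.
```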
I already explained one reason why we should experience time passing - we have memories of the past but not the future. This is because of the arrow of time. Cognition is a computational process that runs forward in time; the explanation is probably related to the fact that computers create heat, which means increasing entropy, and the forward direction of time is the direction in which entropy increases - but I think Aram has a better explanation. I am aware that this will not address the objection as it exists in your mind - you’re imagining that all of our qualia should somehow exist outside of time at the same instant - but I think this is just confused. How would you know if they did? What would that mean? You certainly can’t experience the future as you experience the past in any causally detectable way. Actually, I suppose that such a strange state of affairs is discussed in “Story of Your Life,” the inspiration for the movie Arrival.
I don’t have a complete theory of qualia, but this seems like an unreasonable demand from the level 4 multiverse theory in itself. The level 4 multiverse explains why thinking beings like us could find themselves in our situation. Why that “feels like” something in a first person way is a problem for any materialist theory, and the discussion of that problem is not new. Instead of getting into this, I addressed directly what the post actually claims, which is that the level 4 multiverse theory does not explain why pleasure and suffering have different valences, when they should be symmetric - the flaw in that reasoning is that there is no need for them to be symmetric.
Both objections seem confused, particularly the second but probably the first as well.
Certainly we as conscious beings must exist within a universe that can support conscious thought. Possibly this requires an arrow of time. Possibly we are just in one of the mathematical universes that happens to have an arrow of time - the arrow seems to arise from fairly simple assumptions, mainly an initial condition and coarse graining; see the recent paper “Causal Multi-baker maps and the arrow of time” from Aram Ebtekar. Perhaps more to the point, the experience of moving forward in time just exists at a different level of abstraction than “the universe existing all at once.” We feel that we move forward in time because we only have memories of the past, which is related to the second law of thermodynamics. It’s a fact about physics and cognition, not the metaphysics of our universe’s abstract existence.
The valence of pleasure and pain is not just a sign change, they serve vastly different psychological functions and evolved for distinct evolutionary reasons. There is no symmetry here.
Also, I believe it’s usually called the level 4 multiverse, as opposed to universe?
A fun illustration of survivorship/selection bias is that nearly every time I find myself reading an older paper, I find it insightful, cogent, and clearly written.
What does the Chinchilla scaling laws paper (overtraining small models) have to do with distilling larger models? It’s about optimizing the performance of your best model, not inference costs. The compute-optimal small model would presumably be a better thing to distill, since the final quality is higher.
It might go that way, but I don't see strong reasons to expect it.
Yeah, that sentence may have been too strong.
Yes, I agree with almost all of that, particularly:
Deep learning AI research isn't concerned with inventing AIs, it's concerned with inventing AI training processes.
Technically, deep learning research is concerned with inventing AIs, but lately it does so by inventing AI training processes.
The only part I either don't understand or don't agree with is:
long reasoning traces are probably sufficient to bootstrap general intelligence, given the right model weights
Though a simple training process can certainly find diverse functions, I don't think the current paradigm will get all of the ones needed for AGI.
I'm not prepared to throw out my metaphysics to explain that sometimes research takes a few decades.
Yes, transhumanists used to say 2045 and it was considered a bit aggressive, times have changed!
IMO, my latest dates are from 2040-2050, and if it doesn't happen by then, then I'll consider AI to likely never reach what people on LW thought.
What? I have a good 20-25% on AGI a few decades after we understand the brain, and the former could easily be 100-250 years out. Probably other stuff accelerates a lot by then but who knows!
You're totally right - I knew all of the things that should have let me reach this conclusion, but I was still thinking about the residual stream in the upwards direction on your diagram as doing all of the work from scratch, just sort of glancing back at previous tokens through attention, when it can also look at all the previous residual streams.
This does invalidate a fairly load-bearing part of my model, in that I now see that LLMs have a meaningful ability to "consider" a sequence in greater and greater depth as its length grows - so they should be able to fit more thinking in when the context is long (without hacks like chain of thought, etc.).
Other parts of my mental model still hold up though. While this proves that LLMs should be better at figuring out how to predict sequences than I previously thought (possibly even inventing sophisticated mental representations of the sequences), I still don't expect methods of deliberation based on sampling long chains of reasoning from an LLM to work - that isn't directly tied to sequence prediction accuracy; it would require a type of goal-directed thinking which I suspect we do not know how to train effectively. That is, the additional thinking time is still spent on token prediction, potentially of later tokens, but not on choosing to produce tokens that are useful for further reasoning about the given task (the actual user query), except insofar as that is useful for next-token prediction. RLHF changes the story, but as discussed I do not expect it to be a silver bullet.
Cool!
I endorse this take wholeheartedly.
I also wrote something related (but probably not as good) very recently: https://www.lesswrong.com/posts/vvgND6aLjuDR6QzDF/my-model-of-what-is-going-on-with-llms
Our intuitions here should be informed by the historical difficulty of RL.
Yes, that is what I think.
Edit: The class of tasks doesn't include autonomously doing important things such as making discoveries. It does include becoming a better coding assistant.
I think things will be "interesting" by 2045 in one way or another - so it sounds like our disagreement is small on a log scale :)
I see - I mean, clearly AlexNet didn't just invent all the algorithms it relied on; I believe the main novel contribution was to train on GPUs and get it working well enough to blow everything else out of the water?
The fact that it took decades of research to go from the Perceptron to great image classification indicates to me that there might be further decades of research between holding an intelligent-ish conversation and being a human-level agent. This seems like the natural expectation given the story so far, no?
You seem overly anchored on COT as the only scaffolding system in the near-mid future (2-5 years). While I'm uncertain what specific architectures will emerge, the space of possible augmentations (memory systems, tool use, multi-agent interactions, etc.) seems vastly larger than current COT implementations.
COT (and particularly the extension tree of thoughts) seems like the strongest of those to me, probably because I can see an analogy to Solomonoff induction -> AIXI. I am curious whether you have some particular more sophisticated memory system in mind?
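To spell out the analogy I have in mind (standard definitions, written loosely from memory): Solomonoff induction is pure sequence prediction, and AIXI is an expectimax search wrapped around that same predictor - just as CoT/tree of thoughts is a search wrapper around a sequence-predicting LLM.

```latex
% Solomonoff's prior: a mixture over all programs p for a universal
% monotone machine U,
\[
M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)} .
\]
% AIXI: expectimax planning on top of the same predictor (written loosely),
\[
a_k \;=\; \arg\max_{a_k} \sum_{e_k} \cdots \max_{a_m} \sum_{e_m}
\big[\, r_k + \cdots + r_m \,\big]\; M(e_{k:m} \mid a_{1:m},\, e_{<k}) .
\]
```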
My point is that these are all things that might work, but there is no strong reason to think they will - particularly to the extent of being all that we need. AI progress is usually on the scale of decades and often comes from unexpected places (though for the main line, ~always involving a neural net in some capacity).
Something like that seems like it would be an MVP of "actually try and get an LLM to come up with something significantly economically valuable." I expect that the lack of this type of experiment is because major AI labs feel like that would be choosing to exploit while there are still many gains to be made from exploring further architectural and scaffolding-esque improvements.
I find this kind of hard to swallow - a huge number of people are using and researching LLMs, I suspect that if something like this "just works" we would know by now. I mean, it would certainly win a lot of acclaim for the first group to pull it off, so the incentives seem sufficient - and it doesn't seem that hard to pursue this in parallel to basic research on LLMs. Plus, the two investments are synergistic; for example, one would probably learn about the limitations of current models by pursuing this line. Maybe Anthropic is too small and focused to try it, but GDM could easily spin off a team.
Where you say "Certainly LLMs should be useful tools for coding, but perhaps not in a qualitatively different way than the internet is a useful tool for coding, and the internet didn't rapidly set off a singularity in coding speed.", I find this to be untrue both in terms of the impact of the internet (while it did not cause a short takeoff, it did dramatically increase the number of new programmers and the effective transfer of information between them; I expect without it we would see computers having <20% of their current economic impact), and in terms of the current and expected future impact of LLMs (LLMs simply are widely used by smart/capable programmers, and I trust them to evaluate whether it is noticeably better than StackOverflow/the rest of the internet).
I expect LLMs to offer significant advantages above the internet. I am simply pointing out that not every positive feedback loop is a singularity. I expect great coding assistants (essentially excellent autocomplete) but not drop-in replacements for software engineers any time soon. This is one factor that will increase the pace of AI research somewhat, but also Moore's law is running out, which will definitely slow the pace. Not sure which one wins out directionally.
I think the central claim is plausible, and would very much like to find out I'm in a world where AGI is decades away instead of years. We might be ready by then.
Me too!
If I am reading this correctly, there are two specific tests you mention:
1) GPT-5 level models come out on schedule (as @Julian Bradshaw noted, we are still well within the expected timeframe based on trends to this point)
See my response to his comment - I think it's not so clear that projecting those trends invalidates my model, but it really depends on whether GPT-5 is actually a qualitative upgrade comparable to the previous steps, which we do not know yet.
2) LLMs or agents built on LLMs do something "important" in some field of science, math, or writing
I would add, on test 2, that almost all humans fail it as well. We don't have a clear explanation for why some humans have much more of this capability than others, and yet all the human brains are running on similar hardware and software. This suggests the number of additional insights needed to boost us from "can't do novel important things" to "can do" may be as small as zero, though I don't think it is actually zero. In any case, I am hesitant to embrace a test for AGI that a large majority of humans fail.
This seems about right, but there are two points to keep in mind.
a) It is more surprising that LLMs can't do anything important because their knowledge far surpasses any human's, which indicates that there is some kind of cognitive function qualitatively missing.
b) I think that about the bottom 30% (very rough estimate) of humans in developed nations are essentially un-agentic. The kind of major discoveries and creations I pointed to mostly come from the top 1%. However, I think that in the middle of that range there are still plenty of people capable of knowledge work. I don't see LLMs managing the sort of project that would take a mediocre mid-level employee a week or month. So there's a gap here, even between LLMs and ordinary humans. I am not as certain about this as I am about the stronger test, but it lines up with my experience with DeepResearch - I asked it for a literature review of my field and it had pretty serious problems that would have made it unusable, despite requiring ~no knowledge creation (I can email you an annotated copy if you're interested).
In practical terms, suppose this summer OpenAI releases GPT-5-o4, and by winter it's the lead author on a theoretical physics or pure math paper (or at least the main contributor - legal considerations about personhood and IP might stop people from calling AI the author). How would that affect your thinking?
Assuming the results of the paper are true (everyone would check) and at least somewhat novel/interesting (~sufficient for the journal to be credible) this would completely change my mind. As I said, it is a crux.
Wow, crazy timing for the GPT-5 announcement! I'll come back to that, but first the dates that you helpfully collected:
It's not clear to me that this timeline points in the direction you are arguing. Exponentially increasing time between "step" improvements in models would mean that progress rapidly slows to the scale of decades. In practice this would probably look like a new paradigm with more low-hanging fruit overtaking or extending transformers.
I think your point is valid in the sense that things were already slowing down by GPT-3 -> GPT-4, which makes my original statement at least potentially misleading. However, research and compute investment have also been ramping up drastically - I don't know by exactly how much, but I would guess nearly an order of magnitude? So the wait times here may not really be comparable.
Anyway, this whole speculative discussion will soon (?) be washed out when we actually see GPT-5. The announcement is perhaps a weak update against my position, but really the thing to watch is whether it is a qualitative improvement on the scale of previous GPT-N -> GPT-(N+1). If it is, then you are right that progress has not slowed down much. My standard is whether it starts doing anything important.
For what it’s worth, I couldn’t parse it but didn’t vote.
I don’t believe that the type of thing the left means by diversity makes institutions smarter.
I also don’t believe that the right has generalized its anti-diversity stance to include all forms of variety which may actually be useful.
How are you interpreting this fact?
Sam Altman's power, money, and status all rely on people believing that GPT-(T+1) is going to be smarter than them. Altman doesn't have a good track record of being honest and sincere when it comes to protecting his power, money, and status.
I don't think I understood this.
Here is an excellent philosophical analysis of de Finetti's theorem from I.J. Good: https://www.jstor.org/stable/20114666
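For anyone who hasn't seen the statement, the binary case (as I recall it) is below; the philosophical interest is that exchangeable subjective beliefs behave exactly as if there were an unknown chance parameter.

```latex
% de Finetti's theorem, binary case (stated from memory): an infinite
% exchangeable sequence of 0/1 random variables is a mixture of i.i.d.
% Bernoulli sequences, i.e. there exists a measure \mu on [0,1] with
\[
P(X_1 = x_1, \ldots, X_n = x_n)
\;=\; \int_0^1 \theta^{k} (1-\theta)^{\,n-k} \, d\mu(\theta),
\qquad k = \textstyle\sum_{i=1}^{n} x_i .
\]
```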
Fortunately, this first mover problem should be resolved by my username being my actual name :)
I guess it also depends on how obsessively the others present scroll LessWrong… I would definitely appreciate knowing you are who you are if I met you.
- 2024: AIs can fairly reliably do ML engineering tasks that take humans ~30 minutes, and 2-to-4-hour tasks with strong elicitation.
- 2025: AIs can reliably do 2-to-4-hour ML engineering tasks and sometimes medium-quality incremental research (e.g. conference workshop paper) with strong elicitation.
- 2026: AIs can reliably do 8-hour ML-engineering tasks and sometimes do high-quality novel research (e.g. autonomous research that would get accepted at a top-tier ML conference) with strong elicitation.
I don't believe any of these summaries has been, is, or will be correct. AIs still can't reliably do ML engineering tasks that take me ~30 minutes, at least not when I actually try it in practice. I will be shocked if an AI produces a medium-quality conference paper this year. General-purpose AIs will not do high quality novel research in 2026 (narrow AI along the lines of AlphaFold probably will to some extent).
He’s right that arguments for short timelines are essentially vibes-based but he completely ignores the value of technical A.I. safety research, which is pretty much the central justification for our case.
Garry Kasparov would beat me at chess in some way I can't predict in advance. However, if the game starts with half his pieces removed from the board, I will beat him by playing very carefully. The first above-human-level A.G.I. seems overwhelmingly likely to be down a lot of material - massively outnumbered, running on our infrastructure, starting with access to pretty crap/low-bandwidth actuators in the physical world and no legal protections (yes, this actually matters when you're not as smart as ALL of humanity - it's a disadvantage relative to even the average human). If we exercise even a modicum of competence it will also be even tougher (e.g. an air gap, dedicated slightly weaker controllers, exposed thoughts at some granularity). If the chess metaphor holds we should expect the first such A.G.I. not to beat us - but it may well attempt to escape under many incentive structures. Does this mean we should expect to have many tries to solve alignment?
If you think not, it's probably because of some dis-analogy with chess. For instance, the search space in the real world is much richer, and maybe there are always some "killer moves" available if you're smart enough to see them e.g. invent nanotech. This seems to tie in with people's intuitions about A) how fragile the world is and B) how g-loaded the game of life is. Personally I'm highly uncertain about both, but I suspect the answers are "somewhat."
I would guess that an A.G.I. that only wants to end the world might be able to pull it off with slightly superhuman intelligence, which is very scary to me. But I think it would actually be very hard to bootstrap all singularity-level infrastructure from a post-apocalyptic wasteland, so perhaps this is actually not a convergent instrumental subgoal at this level of intelligence.
Is life actually much more g-loaded than chess? In terms of how far you can in principle multiply your material, unequivocally yes. However, life is also more stochastic - I will never beat Garry Kasparov in a fair game, but if Jeff Bezos and I started over with ~0 dollars and no name recognition / average connections today, I think there's a good >1% chance I'm richer in a year. It's not immediately clear to me which view is more relevant here.
Seems to be a restatement of Paul's, which I did respond to.
I view that as more of an interesting discussion than entirely a criticism. I just gave it a reread - he raises a lot of good points, but there's not exactly a central argument distinct from the ones I addressed as far as I can tell? He is mainly focused on digging into embeddedness issues, particularly discussing things I'd classify as "pain sensors" to prevent AIXI from destroying itself. My solution to this here is a little more thorough than the one that the pro-AIXI speaker comes up with.
The discussion of death is somewhat incorrect because it doesn't consider Turing machines which (while never halting) produce only a finite percept sequence and then "hang" or loop indefinitely. This can be viewed as death and may be considered likely in some cases. Here is a paper on it.
The other criticism is that AIXI doesn't self-improve - I mean, it learns of course, but doesn't edit its own source code. There may be hacky ways around this but basically I agree - that's just not the point of the AIXI model. It's a specification for optimal intelligence, and an optimal intelligence does not need to self-improve. Perhaps self-improvement is better viewed as a method of bootstrapping a weak AIXI approximation into a better one using external conceptual tools. It's probably not a necessary ingredient up to human level though; certainly modern LLMs do not self-improve (yet), and since they are pretty much black boxes it's not clear that they will be able to until well past the point where they are smart enough to be dangerous.
The standard AIXI does consider terminating histories because (even if halting machines are not included) some machines loop indefinitely without producing further output. The probability of this happening does eventually approach zero, but there is no reason it can't be the most plausible hypothesis in some cases.
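In semimeasure terms (my phrasing of the same point): because some machines print a percept history and then loop without ever extending it, the universal mixture is defective, and the deficit is exactly the probability of the history terminating there.

```latex
\[
\sum_{e}\, M(xe) \;<\; M(x) \quad \text{in general},
\]
% where the gap M(x) - \sum_e M(xe) is the probability mass on histories
% that produce no further percepts after x.
```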
I suppose that's right - long fingernails on exactly one hand could be easier to observe.
But overall I find the signs of string instruments very hard to notice. Hands are generally hard to observe because they're usually in motion, often clasped or hidden (e.g. gloved) etc.
Check out my research program:
https://www.lesswrong.com/s/sLqCreBi2EXNME57o
Particularly the open problems post (once you know what AIXI is).
For a balance between theory and implementation, I think Michael Cohen’s work on AIXI-like agents is promising.
Also look into Alex Altair’s selection theorems, John Wentworth’s natural abstractions, Vanessa Kosoy’s infra-Bayesianism (and more generally learning theoretic agenda which I suppose I’m part of), and Abram Demski’s trust tiling.
If you want to connect with alignment researchers you could attend the agent foundations conference at CMU, apply by tomorrow: https://www.lesswrong.com/posts/cuf4oMFHEQNKMXRvr/agent-foundations-2025-at-cmu
This seems like a pretty clear and convincing framing to me, not sure I've seen it expressed this way before. Good job!
A random player against a good player is exactly what we’re looking for right? If all transcripts with one random player had two random players then LLMs should play randomly when their opponents play randomly, but if most transcripts with a random player have it getting stomped by a superior algorithm that’s what we’d expect from base models (and we should be able to elicit it more reliably with careful prompting).
I see no reason that transformers can’t learn to play chess (or any other reasonable game) if they’re carefully trained on board state evaluations etc. This is essentially policy distillation (from a glance at the abstract). What I’m interested in is whether LLMs have absorbed enough general reasoning ability that they can learn to play chess the hard way, like humans do - by understanding the rules and thinking it through zero-shot. Or at least transfer some of that generality to performing better at chess than would be expected (since they in fact have the advantage of absorbing many games during training and don’t have to learn entirely in context). I’m trying to get at that question by investigating how LLMs do at chess - the performance of custom trained transformers isn’t exactly a crux, though it is somewhat interesting.
I don't know, I almost instantly found a transcript of a human stomping a random agent on reddit:
https://www.reddit.com/r/chess/comments/2rv7fr/randomness_vs_strategy/
This sort of thing probably would have been scraped?
I was thinking that plenty would appear as the only baseline a teenage amateur RL enthusiast might beat before getting bored, but I haven't found any examples of anyone actually posting such transcripts after a few minutes of effort so maybe you're right.
Which, unlike random move transcripts, is what you would predict, since the Superalignment paper says the GPT chess PGN dataset was filtered for Elo, in standard behavior-cloning fashion.
Chess-specific training sets won't contain a lot of random play.
I am more interested in any direct evidence that makes you suspect LLMs are good at chess when prompted appropriately?
Great, looking forward to it, thanks for putting this on.
This seems more plausible post hoc. There should be plenty of transcripts of random algorithms as baseline versus effective chess algorithms in the training set, and the prompt suggests strong play.
Are paper submissions exclusive?
What do you mean by "geometry of program synthesis"?
This seems possible - according to this article almost every model got crushed by the easiest Stockfish: https://dynomight.net/chess/
But at the end he links to his second attempt which experimented with fine tuning and prompting, eventually getting decent performance against weak Stockfish. Actually he notes that lists of legal moves are actively harmful, which may partially explain the original example with random agents.
A cursory glance at publications on the topic seems to indicate that LLMs can make valid moves and somehow represent the board state (which seems to follow), but are still weak players even after significant effort designing prompts.
Can you share any more definitive evidence?
10x was probably too strong, but his posts are very clear that he thinks it's a large productivity multiplier. I'll try to remember to link the next instance I see.
The basic idea that the utility should be learned is right, though perhaps one can still build a wrapper for it at some efficiency cost, rather than integrating the two modules.
However, I think the post spirals into unfounded optimism about the consequences of this observation. When the utility function "breaks," that probably looks like collapse to preferring something we didn't intend, perhaps because the planner has pushed the utility function out of distribution as you described. At least under the wrapper design, it should never look like incoherent action - Bayes-optimal decisions with respect to ANY utility function are coherent - so the agent would presumably continue to function, but seeking some bizarre goal. Perhaps some utility functions are sufficiently discontinuous that the agent really does start flipping out like you suggest, but this need not be the case. As an existence proof, a broken utility function that collapses to depending directly on the sensor inputs instead of their correlates in the world is perfectly tractable, and there is a consistent, stable agent which optimizes for it, at least in theory: namely AIXI (with an appropriate reward function).
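To make the "coherent under ANY utility function" claim concrete (my own formalization, sketched): for any utility function u over histories, the Bayes-optimal policy with respect to the agent's mixture over environments is a well-defined expected-utility maximizer, so even a "broken" u that rewards raw sensor readings yields coherent (if misdirected) behavior; AIXI with that reward channel is the existence proof.

```latex
\[
\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}^{\pi}_{\xi}\big[\, u(h_{1:m}) \,\big],
\]
% where \xi is the agent's mixture over environments and h_{1:m} the
% interaction history up to the horizon m.
```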