post by [deleted]


comment by abramdemski · 2025-01-24T16:34:17.325Z · LW(p) · GW(p)

> Evolutionary mutations are produced randomly, and have an entire lifetime to contribute to an animal's fitness and thereby get naturally selected. By contrast, neural network updates are generated by deciding which weight-changes would certainly be effective for improving performance on single training examples, and then averaging those changes together for a large batch of training data.

Per my judgement, this makes it sound like evolution has a much stronger incentive to produce inner algorithms which do something like general-purpose optimization (e.g. human intelligence). We can roughly analogize an LLM's prompt to human sense data; and although it's hard to neatly carve sense data into a certain number of "training examples" per lifetime, the fact that human cortical neurons seem to get used roughly 240 million times in a person's 50-year window of having reproductive potential,[4] whereas LLM neurons fire just once per training example, should give some sense for how much harder evolution selects for general-purpose algorithms such as human intelligence.
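For a rough sense of scale, here is one back-of-the-envelope way the 240-million figure could come out (this is my own reconstruction, not the derivation in the footnote; the ~0.15 Hz average "use rate" is an assumption I'm introducing):

```python
# Order-of-magnitude check only, not the footnote's actual derivation.
seconds_per_year = 365.25 * 24 * 3600
window_seconds = 50 * seconds_per_year      # ~1.58e9 seconds of reproductive window
assumed_use_rate_hz = 0.15                  # ASSUMED average uses per neuron per second
uses_per_lifetime = assumed_use_rate_hz * window_seconds
print(f"{uses_per_lifetime:.2e}")           # ~2.4e8, i.e. roughly 240 million uses

# By contrast, a unit in an LLM is activated once per training example, so each
# example exerts selection pressure on a single activation rather than millions.
```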

By this argument, it sounds like you should agree with my conclusion that o1 and similar models are particularly dangerous and a move in the wrong direction [LW · GW], because the "test-time compute" approach grows the size of a "single training example" much larger, so that single neurons are firing many more times.

I think the possibility of o1 models creating mesa-optimizers seems particularly concrete and easy to reason about. Pre-trained base models can already spin up "simulacra" which feel relatively agentic when you talk to them (ie coherent over short spans, mildly clever). Why not expect o1-style training to amplify these?

(I would agree that there are two sides to this argument -- I am selectively arguing for one side, not presenting a balanced view, in the hopes of soliciting your response wrt the other side.)

I think it quite plausible that o1-style training increases agenticness significantly by reinforcing agentic patterns of thinking, while only encouraging adequate alignment to get high scores on the training examples. We have already seen o1 do things like spontaneously cheat at chess. What, if anything, is unconvincing about that example, in your view?

comment by abramdemski · 2025-01-24T16:40:11.099Z · LW(p) · GW(p)

This one was a little bit of a face-palm for me the first time I noticed it. If we're being pedantic about it, we might point out that the term "optimization algorithm" does not just refer to AIXI-like programs, which optimize over expected future world histories. Optimization algorithms include all algorithms that search over some possibility space, and select a possibility according to some evaluation criterion. For example, gradient descent is an algorithm which optimizes over neuron configuration, not future world-histories.

This distinction is what I was trying to get at with selection vs control. [LW · GW]
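To make that distinction concrete, here is a minimal sketch (the toy functions and names are mine, not anything from the linked post): both routines below are "optimization algorithms" in the broad sense of searching a possibility space against an evaluation criterion, but the first searches over weight configurations while the second searches over imagined future trajectories.

```python
import itertools
import numpy as np

# (1) Selection over parameter space: gradient descent searches weight vectors,
#     scoring each candidate by training loss on fixed data.
def fit_weights(X, y, steps=500, lr=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

# (2) Control over future world-histories: a planner searches action sequences,
#     scoring each imagined future with a world model and keeping the best one.
def plan(world_model, state, actions, horizon=3):
    best_score, best_seq = float("-inf"), None
    for seq in itertools.product(actions, repeat=horizon):
        s, score = state, 0.0
        for a in seq:
            s, r = world_model(s, a)            # imagined next state and reward
            score += r
        if score > best_score:
            best_score, best_seq = score, seq
    return best_seq
```

Both fit the broad definition (a possibility space plus an evaluation criterion); they differ only in what the possibilities are.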

comment by Lun · 2025-01-24T10:13:33.594Z · LW(p) · GW(p)

> Gradient descent generates updates by suggesting algorithmic improvements for single training examples, thereby exerting much less pressure for generality than evolution does.

A recent technique, Gradient Agreement Filtering, filters out gradients that disagree between samples; if I'm understanding correctly, this intentionally breaks this crux and pushes for more generalization / less memorization of specific samples.

Vague intuition that with typical LM pretraining, which involves batches of far more than one sample plus optimizers with momentum, this might already not hold: non-generalizing / noisy / disagreeing updates wash out over the training run, while the generalizing, agreeing updates stick around.
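A minimal sketch of the contrast being discussed, assuming PyTorch and a per-coordinate sign-agreement rule as a stand-in for the actual Gradient Agreement Filtering procedure (the published method may use a different agreement criterion, e.g. comparing micro-batch gradients, so treat this purely as an illustration):

```python
import torch

def batched_update(model, loss_fn, samples, lr=1e-3, filter_disagreement=False):
    # Compute one gradient per training example, as in the quoted description.
    per_sample_grads = []
    for x, y in samples:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        per_sample_grads.append([p.grad.detach().clone() for p in model.parameters()])

    with torch.no_grad():
        for i, p in enumerate(model.parameters()):
            grads = torch.stack([g[i] for g in per_sample_grads])
            avg = grads.mean(dim=0)  # ordinary mini-batch averaging
            if filter_disagreement:
                signs = torch.sign(grads)
                unanimous = (signs == signs[0]).all(dim=0)  # same sign across samples
                avg = avg * unanimous  # zero out coordinates the samples disagree on
            p -= lr * avg
```

With `filter_disagreement=False` this is just averaged per-example SGD; turning it on keeps only directions every example "votes" for, which is one way to operationalize pushing the update toward what is shared across examples rather than what any single example wants.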

comment by Fiora Sunshine (Fiora from Rosebloom) · 2025-01-24T00:55:36.591Z · LW(p) · GW(p)

one obviously true consideration i failed to raise was that neural networks change lots of their weights per update. this is in contrast to natural selection, which can only change one thing at a time. this means gradient descent lacks evolution's property that every change to the structure in question needs to be useful in its own right in order to spread through the population. therefore, deep learning systems could build complex algorithms requiring multiple computational steps before becoming useful, in a way that evolution couldn't. this probably gives them access to a broader class of implementable algorithms, potentially including dangerous mesa-optimizers.

i still think llm updates not being generated for generality is a significant reason for hope though

Replies from: datawitch
comment by datawitch · 2025-01-24T02:01:17.467Z · LW(p) · GW(p)

I'm not sure that's quite right. A genetic mutation is "one thing" but it can easily have many different effects, especially once you consider that it's active for an entire lifetime.

And doesn't gradient descent also demand that each weight update is beneficial, at a much finer grain than evolution does? ...then again, I guess there's grokking, so I'm not sure.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-24T14:12:28.795Z · LW(p) · GW(p)

> I'm not sure that's quite right. A genetic mutation is "one thing" but it can easily have many different effects, especially once you consider that it's active for an entire lifetime.

While this can happen, empirically speaking (at least for eukaryotes) genetic mutations are mostly much more modular and limited to one specific thing by default, rather than affecting everything else in a tangled way. This comes from genetics research finding that genetic effects are mostly linear and compositional, in the sense that the best way to predict what happens if you add two genes together is to sum their effects, not to model a nonlinear interaction between them.
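To make "linear and compositional" concrete, here is a toy additive model (illustration only, with made-up effect sizes, not a model from any particular genetics paper):

```python
import numpy as np

# Toy additive model: each variant contributes a fixed effect size, and the
# predicted trait is just the sum of the effects of the variants carried.
effect_sizes = np.array([0.3, -0.1, 0.05])   # made-up per-variant effects
genotype_a = np.array([1, 0, 0])             # carries variant 1 only
genotype_b = np.array([0, 1, 1])             # carries variants 2 and 3

predict = lambda g: effect_sizes @ g
# Under additivity, combining the two sets of variants just sums their effects:
assert np.isclose(predict(genotype_a + genotype_b),
                  predict(genotype_a) + predict(genotype_b))
```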