Comments
What about quickly launching a missile along the probe's trajectory using the same technology? The probe eventually needs to slow down to survive impact and the missile doesn't, so preventing Von Neumann probes seems fairly straightforward to me. My understanding is that tracking objects in space is very easy unless they've had time to cool to near absolute zero.
On the other hand, this requires that a misaligned AI was able to build such a probe and get it onto a rocket it built or commandeered without being detected or stopped. That rules out safety via monitoring (and related approaches), so we would need to rely on the AI being essentially aligned anyway (such as via the "natural generalizations" Holden mentioned).
Interesting; this seems quite similar to the idea that human intelligence sits near some critical threshold for scientific understanding and reasoning. I'm skeptical that it's useful to think of this as "culture" (except insofar as AIs hear about the scientific method and mindset from training data, which applies to anything trained on Common Crawl), but the broader point does seem to be a major factor in whether there is a "sharp left turn".
I don't think AIs acquiring scientific understanding and reasoning is really a crux for a sharp left turn: moderately intelligent humans who understand the importance of scientific understanding and reasoning, and are actively trying to use it, seem (to me) able to use it very well when biases aren't getting in the way. Very high-g humans can outsmart them trivially in some domains (like pure math) and to a limited extent in others (like social manipulation). But even if you would describe those capability gains as dramatic, it doesn't seem like you can attribute them to greater awareness of abstract and scientific reasoning. Unless you think an AI that's only barely able to grok these concepts might be an existential threat, or that there are further levels of understanding beyond what very smart humans have, I don't think there's a good reason to be worried about a jump to superhuman capabilities from gaining capabilities like P₂B.
On the other hand, you raise a concern that AIs would be able to figure out which parts of their training data are high quality or low quality (and presumably adjust their own weights to remove the influence of the low-quality data). I agree with you that this is a potential cause of a discontinuity, though if foundation model developers start using their previous AIs to pre-filter (or generate) training data in this way, then I don't think we should see a big discontinuity due to the training data.
Like you, I am not sure what this implies about whether a sharp gain in capability would likely be accompanied by a major change in alignment (compared to the same capability arriving gradually).