I think this post shifts the burden of proof onto risk-concerned folks and justifies that shift only with its own poor analogies. 7/10 of the analogy-makers you cite here have p(doom) ≤ 50%, so they are only aiming for plausibility. You admit that the analogies point to the "logical possibility" of stark misalignment, but one person's logical possibility is another person's plausibility.
To give an example, Golden Retrievers are much more cherry-picked than Cotra's lion/chimpanzee examples. Of all the species on Earth, the ones we've successfully domesticated are a tiny, tiny minority. Maybe you'd say we have a high success rate when we try to domesticate a species, but each success took a long time and is still meaningfully incomplete. I think the 7/10 are advocating for taking that kind of time with AIs, and they would be right to say we shouldn't expect e.g. lion domestication to happen overnight, even though we're likely to succeed eventually.
The presentation of Quentin's alternate evolution argument seems plausible, but not clearly more convincing than the more common version one might hear from risk-concerned folks. Training fixes model weights to some degree, after which you can make weaker adjustments with things like fine-tuning, RLHF, and (in the behavioral sense, maybe) prompt-engineering. Our genes seem like the most meaningfully fixed thing about us, and those are ~entirely a product of our ancestors' reproductive performance, which is heavily weighted toward the more stable pre-agricultural and pre-industrial human environments.
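To make "weaker adjustments" concrete, here's a minimal sketch in Python with PyTorch (the toy model, sizes, and learning rate are all hypothetical, not anything from the post): pretraining sets the bulk of the weights, and fine-tuning typically updates only a subset of them, at a much smaller learning rate.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained network (sizes are arbitrary).
model = nn.Sequential(
    nn.Linear(128, 128),  # "pretrained" body
    nn.ReLU(),
    nn.Linear(128, 10),   # small head we still allow to move
)

# Pretraining fixed the body; fine-tuning freezes it and nudges only the head.
for param in model[0].parameters():
    param.requires_grad = False

# A much smaller learning rate than pretraining makes the adjustment weaker still.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)

# One illustrative update: gradients flow only into the head's weights.
x, target = torch.randn(4, 128), torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(x), target)
loss.backward()
optimizer.step()
```

On this mapping, the frozen body is playing the role our genes play for us: set by the original optimization process, with everything afterward working around it.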