Posts

Comments

Comment by joetey on A Rocket–Interpretability Analogy · 2024-10-22T20:24:23.403Z · LW · GW

I personally don't think that working on better capabilities and working on the alignment problem are two separate problems. For example, if you're able to create better, more precise control mechanisms and intervention techniques, you can better align human & AI intent. Alignment feels as much like a technical, control-interface problem as merely a question of "what should we align to?". A minimal sketch of one such intervention technique is below.
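
To make "intervention techniques" concrete, here is a minimal sketch (not from the original post) of one such technique, activation steering: adding a fixed direction to an intermediate activation via a PyTorch forward hook and comparing the model's output before and after. The toy model, the choice of layer, and the steering_vector are all illustrative assumptions; in practice the direction might come from interpretability work, e.g. a difference of mean activations between two sets of prompts.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyModel(nn.Module):
    """Tiny two-layer network standing in for a real model."""
    def __init__(self, d: int = 8):
        super().__init__()
        self.layer1 = nn.Linear(d, d)
        self.layer2 = nn.Linear(d, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer2(torch.relu(self.layer1(x)))

model = ToyModel()
x = torch.randn(1, 8)

# Hypothetical steering direction (an assumption for illustration).
steering_vector = 0.5 * torch.randn(8)

def add_steering(module, inputs, output):
    # Returning a value from a forward hook replaces the module's output,
    # so this adds the steering direction to the intermediate activation.
    return output + steering_vector

baseline = model(x)

handle = model.layer1.register_forward_hook(add_steering)
steered = model(x)
handle.remove()

print("change in output norm:", (steered - baseline).norm().item())
```

The point of the sketch is only that finer-grained control interfaces (here, a hook on an internal activation) are the same machinery one would use to steer a model toward intended behavior.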