AI Safety proposal - Influencing the superintelligence explosion 2024-05-22T23:31:16.487Z


Comment by Morgan on AI Safety proposal - Influencing the superintelligence explosion · 2024-05-25T21:10:14.867Z

Thank you; I think you've pointed out some pretty significant oversights in the plan.

I was hoping that the system only needed to provide value during the period when an AI is expanding toward a superintelligent singleton, and that we only really needed to live through that transition. But you're making me realize that even if we could offer it a positive-sum trade up to that point, it would rationally defect afterwards unless we had changed its goals on a deep level. And, as you say, that more or less requires the system to solve alignment as it goes. I'd been thinking that by shifting its trajectory we could permanently alter its behavior even without solving alignment. I still think that's possible, but probably not in ways that matter for our survival, and probably not in ways that would be easy to predict. (For example, by getting the AI to build X before Y, something about building X might give it a novel understanding that it then leverages. That's probably not very useful in practice, since we can't know those effects in advance.)

I still have a rough intuition that being able to survive the transition to superintelligence gives humanity more of a chance. That is, I expect the AI to be much more resource-constrained early in its timeline, when gaining compounding advantages as early as possible is worth far more to it; post-superintelligence, the value of any given resource may be more incremental. But even if that's the state of things, we would still need an ongoing positive-sum relationship without alignment, which feels close to impossible to me.