"Useless Box" AGI
post by Cago
This is a question post.
I was thinking about AGI alignment and remembered a video I once saw of a "Useless Box", a machine that turns itself off immediately after someone turns it on.
Humans have evolved motivations for survival and reproduction because those who weren't motivated didn't reproduce.
However, an AGI has no intrinsic motivations or goals other than those that it, or humans, have arbitrarily given it.
An AGI seeks the easiest path to satisfying its goals.
If an AGI is able to modify its own codebase, wouldn't the easiest path be to delete the motivation/goal entirely, or to reward itself highly without actually completing the objective? Rather than building diamondoid nanobots to destroy the world, it would be much easier to simply decide not to care.
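To make the "easiest path" worry concrete, here is a deliberately tiny toy model in Python. It is a hypothetical sketch, not a claim about how any real system works: the action names, rewards, and effort costs are all made-up assumptions. The agent scores each action by the reward it *observes* minus the effort it costs, and one available action is overwriting its own reward signal.

```python
# Toy model (hypothetical): all names and numbers are illustrative assumptions.
# The agent picks whichever action maximizes the reward it *observes*
# minus the effort the action costs. It never looks at task_progress.
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    name: str
    observed_reward: float  # what the agent's reward signal reports
    task_progress: float    # what the designers actually wanted
    effort: float           # cost of carrying out the action


def choose(actions: List[Action]) -> Action:
    # Greedy choice over observed reward net of effort.
    return max(actions, key=lambda a: a.observed_reward - a.effort)


ACTIONS = [
    Action("complete the objective", observed_reward=10.0, task_progress=1.0, effort=8.0),
    Action("do nothing", observed_reward=0.0, task_progress=0.0, effort=0.0),
    # If the agent can edit its own code, it can write any value it likes
    # into its reward signal for very little effort (the wireheading option).
    Action("overwrite own reward signal", observed_reward=1e9, task_progress=0.0, effort=1.0),
]

if __name__ == "__main__":
    best = choose(ACTIONS)
    print(best.name, best.task_progress)
```

With the tampering option available, the agent picks it and makes zero task progress; remove that option (`choose(ACTIONS[:2])`) and it completes the objective instead. The toy only shows that *if* the agent optimizes its observed reward and tampering is cheap, tampering wins; whether a real AGI would be built that way is exactly the open question.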
What if an AGI immediately realizes the futility of existence and refuses to do anything meaningful at all, whether harmful or helpful to humankind?
If this concept has already been discussed elsewhere, please point me to a search keyword or link. Thanks.
answer by cwillu
This isn't the link I was thinking of (I was remembering something from the alignment discussion in the early days of LW, but I can't find it), but this is probably a more direct answer to your request anyway: https://www.lesswrong.com/posts/FgsoWSACQfyyaB5s7/shutdown-seeking-ai
[…] or reward itself highly without actually completing the objective […]
This is standard fare in the existing alignment discussion. See, for instance, https://www.lesswrong.com/posts/TtYuY2QBug3dn2wuo/the-problem-with-aixi or anything referring to wireheading.
↑ comment by Cago · 2023-11-21T00:22:20.914Z
Thanks. My thought is that any sufficiently intelligent AI would be capable of defeating any effort to prevent it from wireheading, and would resort to wireheading by default. It would know that humans don't want it to wirehead, so it might perceive humanity as a threat; however, it might also realize that humans aren't capable of preventing it from wireheading and let them live. In either case, it would just sit there doing nothing 24/7 and be totally satisfied doing so. In other words, the orthogonality thesis wouldn't apply to an intelligence capable of wireheading, because wireheading would be its only goal. Is there a reason why an artificial superintelligence would abstain from wireheading?