Minimum Viable Alignment

post by HunterJay · 2022-05-07T13:18:49.850Z · 7 comments

What is the largest possible target we could have for aligned AGI?

That is, instead of aiming for a great and prosperous future, is it possible to find an easier path to aligning an AGI by aiming for the entire set of 'this-is-fine' futures?

For example, a future where all new computers are rendered inoperable by malicious software. Or a future where a mostly-inactive AI does nothing except prevent any superintelligence from forming, or one that continuously tries to use up all of the available compute in the world.

I don't believe there is a solution here yet either, but could relaxing the problem from 'what we actually want' to 'anything we could live with' help? Has there been much work in this direction? Please let me know what to search for if so. Thank you.

7 comments


comment by Charlie Steiner · 2022-05-07T16:24:14.346Z

Yes, some people are interested in it and other people think it's not worth it. See e.g. the Eliezer Yudkowsky + Richard Ngo chat log posts.

comment by HunterJay · 2022-05-08T13:54:12.995Z

Will check them out, thank you.

comment by Chris_Leong · 2022-05-08T21:36:42.123Z

I wrote a related post: "Is some kind of minimally-invasive mass surveillance required for catastrophic risk prevention?"

comment by HunterJay · 2022-05-09T07:54:31.255Z

Thanks Chris, but I think you linked to the wrong thing there; I can't see your post in the last 3 years of your history either!

comment by Chris_Leong · 2022-05-09T11:32:20.412Z

Sorry, fixed.

comment by Perhaps · 2022-05-07T20:21:55.645Z

Well, it depends on your priors for how an AGI would act, but as I understand it, all AGIs will be power-seeking. If an AGI is power-seeking and has access to some amount of compute, then it will probably bootstrap itself to superintelligence and start optimizing the world according to its utility function. Different utility functions produce different results, but even relatively mundane ones like "prevent another superintelligence from being created" could result in the AGI killing all humans and taking over the galaxy to make sure no other superintelligence gets made. I think it's actually really, really hard to specify the 'what we actually want' future for an AGI, so much so that evolutionarily training an AGI in an Earth-like environment, so that it develops human-ish morals, will be necessary.

comment by HunterJay · 2022-05-08T13:55:30.793Z

Aye, I agree this is not a solution to power-seeking; my point is only that there may be a slightly easier target to hit if we relax as many constraints on alignment as possible.