Two clarifications about "Strategic Background"

post by Rob Bensinger (RobbBB) · 2018-04-12T02:11:46.034Z · score: 76 (22 votes) · LW · GW · 3 comments

I've talked to a few people who misunderstood important parts of the "strategic background" discussion in https://intelligence.org/2017/12/01/miris-2017-fundraiser/#3.

First, at least two people thought the 1-8 numbered list was "MIRI's organizational plan" rather than "what we'd be least surprised to see happen in the world, conditional on good outcomes." MIRI is trying to de-confuse itself about step 8 and to help put future AGI developers in a better position to select alignment-conducive AGI designs; MIRI is not trying to develop AGI itself.

Second, at least two other people misread "minimal aligned AGI" as "minimally aligned AGI". They either thought MIRI was saying that developers should do the bare minimum of alignment work and then deploy immediately, or they saw that we were recommending building "systems with the bare minimum of capabilities for ending the acute risk period" and took this as an alternative to working really hard to achieve highly reliable and robust systems.

The MIRI view isn't "rather than making alignment your top priority and working really hard to over-engineer your system for safety, try to build a system with the bare minimum of capabilities". It's: "in addition to making alignment your top priority and working really hard to over-engineer your system for safety, also build the system to have the bare minimum of capabilities".

The idea isn't that you can get away with cutting corners on safety by keeping the system weak; per Eliezer's security mindset posts, a good plan should work (or fail safely) even if the system ends up being a lot smarter than intended. Instead, the idea is that shooting for the bare minimum of capabilities adds a lot of value when your fundamentals are really good. Every additional capability a developer needs to align adds extra difficulty and more potential points of failure, so developers should target minimality in addition to alignment.

3 comments

comment by William_S · 2018-04-15T17:01:39.498Z · score: 8 · LW · GW

Would you think that the following approach would fit within "in addition to making alignment your top priority and working really hard to over-engineer your system for safety, also build the system to have the bare minimum of capabilities" and possibly work, or would you think that it would be hopelessly doomed?

  • Work hard on designing the system to be safe
  • But there's some problem left over that you haven't been able to fully solve, and that you think will manifest at a certain scale (level of intelligence/optimization power/capabilities)
  • Run the system, but limit its scale to stay well within the range where you expect it to behave well

comment by Rob Bensinger (RobbBB) · 2018-04-15T22:33:00.955Z · score: 7 · LW · GW

I think you're probably in a really bad state if you have to lean very much on that with your first AGI system. You want to build the system to not optimize any harder than absolutely necessary, but you also want the system to fail safely if it does optimize a lot harder than you were expecting.

The kind of AGI approach that seems qualitatively like "oh, this could actually work" to me involves more "the system won't even try to run searches for solutions to problems you don't want solved" and less "the system tries to find those solutions but fails because of roadblocks you put in the way (e.g., you didn't give it enough hardware)".

comment by Donald Hobson (donald-hobson) · 2018-12-24T21:24:14.926Z · score: 1 · LW · GW

Imagine that you hadn't figured out FDT, but you did have CDT and EDT. Would building an AI that defers to humans whenever the two decision theories disagree be an example of minimal but aligned?

If we take artificial addition too seriously, it's hard to imagine what a "minimal arithmetician" looks like. If you understand arithmetic, you can make a perfect system; if you don't, the system will be hopeless. I would not be surprised if there were some simple "algorithm of maximally efficient intelligence" and we built it. No foom; the AI starts at the top. All the ideas about rates of intelligence growth are nonsense. We built a linear-time AIXI.