Iterated Distillation and Amplification

post by Jacobian · 2018-05-07T04:05:48.899Z · score: 11 (2 votes)

We will discuss Iterated Distillation and Amplification, Paul Christiano’s proposed scheme for training machine learning systems that can be robustly aligned to complex and fuzzy values. A summary of the ideas can be found here. There’s a lot more written about this on his site, but I ask that everyone who’s coming to this at least read that summary.
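For concreteness, the core loop of the scheme can be sketched in code. This is a toy illustration only, not Christiano's actual proposal in any detail: the summing task, the "human" decomposition policy, and the lookup-table "distillation" are all illustrative stand-ins for a human overseer and a learned model.

```python
# Toy, self-contained sketch of the Iterated Distillation and Amplification
# (IDA) loop. The task (summing a list), the "human" policies, and the
# memorisation-based "distillation" are illustrative assumptions.

def human_decompose(xs):
    # The "human" splits a hard question into two easier halves.
    mid = len(xs) // 2
    return [xs[:mid], xs[mid:]]

def human_combine(subanswers):
    # The "human" combines subanswers into an answer to the original question.
    return sum(subanswers)

def amplify(agent):
    """Amplification: a human plus copies of the current agent form a
    slower but more capable composite system."""
    def amplified(xs):
        if len(xs) <= 1:
            return sum(xs)  # easy enough for the human to answer directly
        return human_combine([agent(sub) for sub in human_decompose(xs)])
    return amplified

def distill(amplified_system, training_questions):
    """Distillation: train a fast agent to imitate the amplified system.
    Here "training" is just memorisation; in practice it would be
    supervised learning on (question, answer) pairs."""
    table = {tuple(q): amplified_system(q) for q in training_questions}
    def agent(xs):
        return table.get(tuple(xs), 0)  # 0 stands in for "don't know"
    return agent

def ida(training_questions, iterations):
    agent = lambda xs: 0  # weak initial agent
    for _ in range(iterations):
        agent = distill(amplify(agent), training_questions)
    return agent
```

Each iteration, the distilled agent absorbs one more level of the human's decomposition, so capability grows while every answer remains traceable to steps a human endorsed.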

[Human enhancement] approaches have a common fundamental drawback: they only have as much foresight as the user. In some sense this is why they are robust.
In order for these systems to behave wisely, the user has to actually be wise. Roughly, the users need to be intellectual peers of the AI systems they are using.
This may sound quite demanding. But after making a few observations, I think it may be a realistic goal:
    •    The user can draw upon every technology at their disposal — including other act-based agents. (This is discussed more precisely here under the heading of “efficacy.”)
    •    The user doesn’t need to be quite as smart as the AI systems they are using; they merely need to be within striking distance. For example, it seems fine if it takes a human a few days to make a decision, or to understand and evaluate a decision, that an AI can make in a few seconds.
    •    The user can delegate this responsibility to other humans whom they are willing to trust (e.g. Google engineers), just like they do today.
In this story the capabilities of humans grow in parallel with the capabilities of AI systems, driven by close interaction between the two. AI systems do not pursue explicitly defined goals, but instead help the humans do whatever the humans want to do at any given time. The entire process remains necessarily comprehensible to humans — if humans can’t understand how an action helps them achieve their goals, then that action doesn’t get taken.
In speculations about the long-term future of AI, I think this may be the most common positive vision. But I don’t think there has been much serious thinking about what this situation actually looks like, and certainly not much thinking about how to actually realize such a vision.

Address and more info on our Google Group (or message me).
