Concept extrapolation: key posts

post by Stuart_Armstrong · 2022-04-19T10:01:24.988Z · LW · GW · 2 comments

Concept extrapolation is the skill of taking a concept, a feature, or a goal that is defined in a narrow training situation... and extrapolating it safely to a more general situation. This more general situation might be very extreme, and the original concept might not make much sense there (e.g. defining "human beings" in terms of quantum fields).

Nevertheless, since training data is always insufficient, key concepts must be extrapolated. And doing so successfully is a skill that humans have to a certain degree, and that an aligned AI would need to possess to a higher extent.

This sequence collects the key posts on concept extrapolation. They need not be read in this order; different readers will find different posts useful.

2 comments


comment by Quintin Pope (quintin-pope) · 2022-04-20T05:36:59.342Z · LW(p) · GW(p)

I have a comment here [LW · GW] that argues many patterns in human values and our generalizations of values emerge from an inner alignment failure in the brain. I’d be interested in hearing your perspective on it and whether it tracks with your own thinking on concept extrapolation.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2022-04-20T12:32:15.549Z · LW(p) · GW(p)

Thanks for that link. It does seem to correspond intuitively to a lot of the human condition. Though it doesn't really explain value extrapolation so much as the starting point from which humans can extrapolate values. Still a fascinating read, thanks!