Potential Research Topic: Vingean Reflection, Value Alignment and Aspiration

post by Ikaxas · 2020-02-06T01:09:05.384Z · score: 15 (4 votes) · LW · GW · 3 comments

Epistemic Status: Potential research idea. Time-limited, so not as clear as it could have been.

Vingean reflection is the process of trying to anticipate how an agent smarter than you might think, in order to ensure that it will be aligned with your values. This is hard, because "if [an agent] could predict [a smarter agent's] actions in detail, it would already be as smart as them." Value Learning is the problem of trying to use machine learning to train an AI to care about what humans care about.

I haven't read much about these problems, but they struck me as related to a concept introduced by philosopher Agnes Callard: "aspiration." Her idea is that, sometimes, we come to care about things that we didn't care about before, and, in particular, that (1) this doesn't happen all at once, and (2) we play an active role in the process. She argues in her book (which I haven't read yet, but see her interview with Robert Wright) that in several different areas of philosophy (decision theory, moral psychology, and moral responsibility) the prevailing theories make assumptions that would render this process paradoxical or impossible.

To see what aspiration looks like, consider some value that you didn't have before, but now do. Since I don't know you, I'll give a generic example, but substitute in whatever actually applies to you. Suppose you are now a gourmand, though you didn't care much about good food when you were younger (this apparently happened to a friend of Callard's). How did you get from there to here? Perhaps there was a moment where you first got excited about food (in the case of Callard's friend, she took a trip to Osaka, Japan). But this probably isn't the whole story, at least not in many cases. This lucky, random encounter provided the first shove to get you onto the path towards being a gourmand, but it didn't take you all the way. You got an inkling of the value of good food by having some in Osaka, but you had to choose to cultivate this interest. But how is it possible to move yourself further along this path, without already knowing how a gourmand would value good food? It seems like if you care enough to want to get better at valuing good food, then you must already be the kind of person who cares about good food. And how can you critique your own taste without already having the sort of trained palate that future-you will (might) have? How can you improve, without being able to fully see the end of the path? And if you could fully see the end of the path, wouldn't you already be there? (If this description seems unclear, it probably is, and I unfortunately don't have the time to put into making it clearer; please go watch the Robert Wright interview to actually understand what's going on.)

Some clarifying points from the interview:

Wright: The paradox is: until you have a value, you don't value it. So how does one get from the place of not valuing it at all, to suddenly valuing it?

...

Callard [clarifying]: The way I think about it is, how do you go from caring about it very little, to caring about it a little more; how do you increase your caring for something?

...

Wright: So you're interested in the dynamics of the process itself---what sustains the transition and the progress?

...

Callard [later]: I'm saying there's such a thing as self-creation [because your values are part of yourself, so if you have a hand in creating your values, then you have a hand in creating yourself].

This sounds a lot like Vingean reflection: if an agent could predict how future-them would act, they would already be future-them. It also sounds a lot like value learning; in a sense it is a type of value learning, namely learning the values that you want yourself to have, or the values that your potential future self has. There are obvious differences, but I think the similarities should also be apparent (especially if you've also watched the interview).

One of the prevailing methods of doing philosophy on LessWrong is this: for any philosophical concept, ask, "How would you build an AI that does that?" And I think that asking "How would you build an AI that could do aspiration?" sounds a lot like the problems of Vingean reflection and value learning (or perhaps some combination of the two: learning to predict how a future version of you with better values would act, and emulating them in order to become them). I think an interesting research project would be to investigate to what extent Callard's work on aspiration is relevant to solving Vingean reflection and the value learning problem. Unfortunately, I'm not in a position to do this myself right now, but I wanted to advertise that this was a possible research question, either for my future self (heh) or some other person. The main tasks would be to read Callard's book and the literatures on Vingean reflection and the value learning problem, and see what fruitful connections can be made, if any. Again, apologies that I can't lay out the research question more clearly; if I were in a position to do that (time-wise and expertise-wise) I would probably also be in a position to actually do the project, but I'm not.

Note that this need not be a long, protracted project: reading Callard's book and the relevant literature could probably be done in a week of full-time work, give or take a few days depending on how much literature there is, and at that point one would be in a position to evaluate whether there were any fruitful connections to be drawn. And if one is already familiar with the Vingean reflection and value learning literatures, it might just take the time of reading Callard's book and writing up relevant findings, if any.

(Side note: I actually think the concept of aspiration may also have relevance to value drift, and movement growth, on both a personal and movement level: learning how to change one's values may also provide insight on how to keep them stable, and learning how value change is possible may provide insight on how to shape other people's values to be more aligned with EA. But I think Callard's book doesn't talk as much about the nitty-gritty of how aspiration works, but rather more about the philosophical problems it poses. My suggestion is that these problems seem very similar to the problems posed by Vingean Reflection/Value Learning, and that looking at her solutions may provide new insight on these alignment problems. The movement-growth stuff would take more extrapolation from her book, I think).

(Also, if this would be better posted as a question, I'd be happy to repost it as one or have the mods do so.)

3 comments


comment by Charlie Steiner · 2020-02-06T09:26:10.534Z · score: 5 (3 votes) · LW(p) · GW(p)

Thanks! This is an interesting recommendation.

I was definitely struck by the resemblance between her notion of "normative dependence" and the ideas behind the CIRL framework. And I think that the fix for the problem of an AI reasoning about something more intelligent is more or less the same thing humans do, which is that we abstract away the planning and replace it with some "power" to do something. Like if I imagine playing Magnus Carlsen in chess, I don't simulate a chess game at all; I compare the imaginary chess-winning powers I attribute to each of us in my abstracted mental representation.

But as for the philosophical problems she mentions in the interview, I felt like they fell into pretty standard orthodox philosophical failure modes. For the sake of clarity, I guess I should say I mean the obsession with history, and the default assumption that questions have one right answer - things I think are boondoggles have to be addressed just because they're historical, and there's too much worry about what humans "really" are like as opposed to consideration of models of humans.
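One way to make the chess example in the comment above concrete is the Elo rating system, which predicts the outcome of a game from two ratings without modeling any moves. The sketch below is only an illustration of that kind of abstraction, not something proposed in the thread: the function name `expected_score` and both ratings are hypothetical, and the formula is the standard Elo expected-score formula.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B.

    This predicts the result from two summary numbers (the players'
    "powers") and never looks at a single move of the game.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings, chosen only for illustration.
club_player = 1500
world_champion = 2850

score = expected_score(club_player, world_champion)
print(f"Expected score for the club player: {score:.5f}")
```

Expected score counts a draw as half a win, so it is not quite a win probability. The point is just that the weaker player can confidently predict the outcome from a one-number summary of each side, while being unable to predict any of the stronger player's moves.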

comment by G Gordon Worley III (gworley) · 2020-02-07T02:02:23.277Z · score: 3 (2 votes) · LW(p) · GW(p)

This is an interesting way to frame things. I have plenty of experience with what you're calling aspiration here via deliberative practices over the past 5 years or so that have caused me to transform in ways I wanted to while also not understanding how to get there. For example, when I started zen practice I had some vague idea of what I was there to do or get (get "enlightened", be more present, be more capable, act more naturally, etc.), but I didn't really understand how to do it or even what it was I was really going for. After all, if I really did understand it, I would have already been doing it. It's only through a very slow process of experimenting, trying, being nudged in directions, and making very short moves towards nearby attractors that I've over time come to better understand some of these things, or understand why I was confused and what the thing I thought I wanted really was without being skewed by my previous perceptions of it.

I think much of the problem with the kind of approach you are proposing is figuring out how to turn this into something a machine can do. That is, right now it's understood and explained at a level that makes sense for humans, but how do we take those notions and turn them into something mathematically precise enough that we could instruct a machine to do them and then evaluate whether or not what it did was in fact what we intended? I realize you are just pointing out the idea and not claiming to have it all solved, so this is only to say that I expect much of the hard work here is figuring out what the core, natural feature of aspiration is, such that it can be used to design an AI that can do that.

comment by Ikaxas · 2020-02-07T20:00:19.895Z · score: 1 (1 votes) · LW(p) · GW(p)

"how do we take those notions and turn them into something mathematically precise enough that we could instruct a machine to do them and then evaluate whether or not what it did was in fact what we intended"

Yep, that's the project! I think the main utility of Callard's work here is (1) pointing out the phenomenon (a phenomenon that is strikingly similar to some of the abilities we want AIs to have), and (2) noticing that the most prominent theories of decision theory, moral psychology, and moral responsibility make assumptions that we have to break if we want to allow room for aspiration (assumptions that we who are trying to build safe AI are probably also accidentally making insofar as we take over those standard theories). IDK whether she provides alternate assumptions to make instead, but if she does these might also be useful. But the main point is just noticing that we need different theories of these things.

Once we've noticed the phenomenon of aspiration, and that it requires breaking some of these assumptions, I agree that the hard bit is coming up with a mathematical theory of aspiration (or the AI equivalent).