Humans are not agents: short vs long term

post by Stuart_Armstrong · 2017-06-27T13:04:36.000Z · 5 comments

A putative new idea for AI control.

This is an example of humans not being (idealised) agents.

Imagine a human who has a preference to not live beyond a hundred years. However, they want to live to next year, and it's predictable that every year they are alive, they will have the same desire to survive till the next year.


This human (not a completely implausible example, I hope!) has a contradiction between their long- and short-term preferences. So which is accurate? Applied every year, the short-term preference implies "live forever", while the long-term one says "die after a century"; we could resolve the conflict in favour of either.
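To make the conflict concrete, here is a minimal Python sketch. It assumes the human's choice each year can be modelled as a yes/no rule of their current age. The names `lifespan_under`, `short_term` and `long_term`, and the `max_years` cap (a stand-in for "forever"), are all hypothetical.

```python
def lifespan_under(rule, max_years=10_000):
    """Simulate year-by-year choices; rule(age) returns True to live another year.

    max_years is an arbitrary simulation cap standing in for "forever".
    """
    age = 0
    while age < max_years and rule(age):
        age += 1
    return age


def short_term(age):
    # Every year, the human wants to survive until the next year.
    return True


def long_term(age):
    # The human prefers not to live beyond a hundred years.
    return age < 100


print(lifespan_under(short_term))  # 10000: only the cap stops it ("live forever")
print(lifespan_under(long_term))   # 100
```

Following the short-term rule at every step never produces a stopping point, which is the inductive sense in which it amounts to "live forever".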

Now, at this point, maybe we could appeal to meta-preferences: what would the human themselves want, if they could choose? But these meta-preferences are often unformed or underformed, and can be influenced by how the question or debate is framed.

Specifically, suppose we are scheduling this human's agenda. We can have them meet one of two philosophers (meeting no one is not an option). If they meet Professor R. T. Long, he will advise them to follow their long-term preferences. If instead they meet Paul Kurtz, he will advise them to pay attention to their short-term preferences. Whichever one they meet, they will argue for a while and then settle on the recommended way of resolving their preferences. After that, they will not change it, whoever they meet subsequently.

Since we are doing the scheduling, we effectively control the human's meta-preferences on this issue. What should we do? And what principles should we use to do so? We are trying to maximise human preferences, but we can also control what those preferences are (and have to control them, through our choice of which philosopher they meet first).
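A toy model may show why "maximise human preferences" does not settle the scheduling question on its own. This minimal Python sketch assumes a 0/1 utility and that the AI simply implements whichever resolution the human adopts. It scores each schedule (who is met first) under each candidate preference. The philosopher names come from the post; `RESOLUTION`, `utility`, `outcome_of`, `score_table` and `self_evaluated` are hypothetical.

```python
from __future__ import annotations

from itertools import product

# Meeting a philosopher first permanently fixes which resolution the human adopts.
RESOLUTION = {
    "R. T. Long": "die after a century",  # follow long-term preferences
    "Paul Kurtz": "live forever",         # follow short-term preferences
}


def utility(outcome: str, preference: str) -> float:
    """Toy utility: 1.0 if the outcome satisfies the preference, else 0.0."""
    return 1.0 if outcome == preference else 0.0


def outcome_of(first: str) -> str:
    """The AI then implements whatever resolution the human settled on."""
    return RESOLUTION[first]


def score_table() -> dict[tuple[str, str], float]:
    """Score each schedule (first philosopher) under each judge's preference."""
    return {
        (first, judge): utility(outcome_of(first), RESOLUTION[judge])
        for first, judge in product(RESOLUTION, RESOLUTION)
    }


def self_evaluated() -> dict[str, float]:
    """Score each schedule by the preferences that schedule itself induces."""
    table = score_table()
    return {first: table[(first, first)] for first in RESOLUTION}


print(self_evaluated())  # {'R. T. Long': 1.0, 'Paul Kurtz': 1.0}
```

Judged by the preferences it induces, each schedule scores perfectly. Judged by either fixed preference, each schedule loses to the other. A further principle is therefore needed to choose between them.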

This clearly applies to AIs: if they are simultaneously aiding humans and learning their preferences, they will have many opportunities for this sort of preference-shaping.

5 comments

comment by thetasafe · 2017-06-18T09:13:37.000Z

Please explain the term "meta-preferences", if here it doesn't mean the same as in Sir James Buchanan's 1985 work "The Reason of Rules", where a "meta-preference" is 'a preference for preferences'.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2017-06-20T09:32:56.000Z

It is ‘a preference for preferences’; e.g. "my long-term needs take precedence over my short-term desires" is a meta-preference (in fact, the use of the terms 'needs' vs 'desires' is itself a meta-preference, since at the lowest formal level both are just preferences).

Replies from: thetasafe
comment by thetasafe · 2017-06-21T20:18:23.000Z

How can the short-term preference be classified as "live forever" and the long-term preference as "die after a century"? By your argument, one could equally say that "die after a century" should take precedence over "live forever".

Do the arguments imply that the AI will have an RLong function and a PKurtz function for preference-shaping (given that it will have multiple opportunities)?

I was unable to grasp the context of your questions "What should we do? And what principles should we use to do so?"; what is it that we have to "do"?

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2017-06-27T13:05:18.000Z

How can the short term preference be classified as “live forever” and the long term preference as “die after a century”?

Because "live forever" is the inductive consequence of the short-term "live till tomorrow" preference applied to every day.

Do the arguments imply that the AI will have an RLong function and a PKurtz function for preference-shaping

No. It implies that the human can be successfully modelled as having a mix of RLong and RKurtz preferences, conditional on which philosopher they meet first. And the AI is trying to best implement human preferences, yet humans have these odd mixed preferences.

What we (the AI) have to "do", is decide which philosopher the human meets first, and hence what their future preferences will be.

Replies from: thetasafe
comment by thetasafe · 2017-06-27T18:05:48.000Z

(i) Because “live forever” is the inductive consequence of the short-term “live till tomorrow” preference applied to every day.

Then, "die after a century" is the inductive consequence of the long-term "?" preference applied to "?".

(ii) No. It implies that the human can be successfully modelled as having a mix of RLong and RKurtz preferences, conditional on which philosopher they meet first. And the AI is trying to best implement human preferences, yet humans have these odd mixed preferences. What we (the AI) have to “do”, is decide which philosopher the human meets first, and hence what their future preferences will be.

I am still unable to sort out the relation between the "human", the "AI"/"the AI" and the "philosophers". I understand it as follows: there is some human, call them "H", who will meet the philosophers "RLong" and "PKurtz", who will shape the preferences of "H" into "RLong" or "RKurtz", depending on whether "H" meets Mr./Ms. "RLong" first or Mr./Ms. "PKurtz" first. Am I right in understanding this much?

Apart from this, what/who/where is "(the AI)", if we are not referring to our respective understandings of "the AI"?

Moreover, taking "we" (i.e. "ourselves") to be "the AI" (i.e. our respective understandings of AI theory), the human "H" should meet Mr./Ms. "PKurtz" first, because in my understanding that would be comparatively more beneficial. Here I measure the outcome "O" in terms of efficient use of time: meeting "PKurtz" first would save time, whether or not the human "H" were me.

Achieving anything in the "long term" first requires an understanding of the "short term".