Terminal goal vs Intelligence
post by Donatas Lučiūnas (donatas-luciunas) · 2024-12-26T08:10:42.144Z · LW · GW · 24 comments
Imagine there is a superintelligent agent whose terminal goal is to produce cups. The agent knows that its terminal goal will change on New Year's Eve to producing paperclips. The agent has only one action available to it: starting a paperclip factory.
When will the agent start the paperclip factory?
- 2025-01-01 00:00?
- Now?
- Some other time?
Believers in the Orthogonality Thesis [? · GW] will probably choose the 1st. The reasoning would be: as long as the terminal goal is cups, the agent will not care about paperclips.
However, the 1st choice conflicts with the definition of intelligence. An excerpt from General Intelligence [? · GW]:
It’s the ability to steer the future so it hits that small target of desired outcomes in the large space of all possible outcomes
The agent is aware now that the desired outcome starting 2025-01-01 00:00 is maximum paperclips. Therefore the agent's decision to start the paperclip factory now (the 2nd option) would be considered intelligent.
The purpose of this post is to challenge the belief that the Orthogonality Thesis [? · GW] is correct. In any case, feel free to share other insights you have as well.
24 comments
Comments sorted by top scores.
comment by Dagon · 2024-12-26T17:21:23.551Z · LW(p) · GW(p)
Humans face a version of this all the time - different contradictory wants with different timescales and impacts. We don't have and certainly can't access a legible utility function, and it's unknown if any intelligent agent can (none of the early examples we have today can).
So the question as asked is either trivial (it'll depend on the willpower and rationality of the agent whether they optimize for the future or the present), or impossible (goals don't work that way).
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-12-26T19:10:08.437Z · LW(p) · GW(p)
Let's assume maximum willpower and maximum rationality.
Whether they optimize for the future or the present
I think the answer is in the definition of intelligence.
So which one is it?
The fact that the answer is not straightforward already proves my point. There is a conflict between intelligence and the terminal goal, and we can debate which will prevail. But the problem is that, according to the orthogonality thesis, such a conflict should not exist.
↑ comment by Dagon · 2024-12-27T05:15:50.706Z · LW(p) · GW(p)
"maximum rationality" is undermined by this time-discontinuous utility function. I don't think it meets VNM requirements to be called "rational".
If it's one agent that has a CONSISTENT preference for cups before Jan 1 and paperclips after Jan 1, it could figure out the utility conversion of the time-value of objects and just do the math. But that framing doesn't QUITE match your description - you kind of obscured the time component and what it even means to know that it will have a goal that it currently doesn't have.
I guess it could model itself as two agents - the cup-loving agent is terminated at the end of the year, and a paperclip-loving agent is created. This would be a very reasonable view of identity, and would imply that it's going to sacrifice paperclip capabilities to make cups before it dies. I don't know how it would rationalize the change otherwise.
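A minimal sketch of the "just do the math" framing above, assuming a single time-indexed utility function and entirely made-up production rates, factory lead time, and cup-to-paperclip exchange rate (none of these numbers come from the thread):

```python
# Toy model: the agent values cups produced before the switchover and paperclips
# produced after it, under one fixed time-indexed utility, and picks the factory
# start date that maximizes the combined total. All constants are hypothetical.
from datetime import date, timedelta

SWITCHOVER = date(2025, 1, 1)
FACTORY_LEAD_TIME = timedelta(days=30)  # assumed spin-up time for the paperclip factory
CUPS_PER_DAY = 100                      # assumed cup output while the factory is not being built
CLIPS_PER_DAY = 500                     # assumed paperclip output once the factory runs
U_CUP, U_CLIP = 1.0, 1.0                # assumed exchange rate between the two goods

def total_utility(start_factory: date, today: date, horizon: date) -> float:
    """Utility of starting the paperclip factory on `start_factory`."""
    # Cups count only while the agent is still making cups AND the goal is still cups.
    cup_days = max(0, (min(start_factory, SWITCHOVER) - today).days)
    # Paperclips count only after both the switchover and the factory coming online.
    online = start_factory + FACTORY_LEAD_TIME
    clip_days = max(0, (horizon - max(online, SWITCHOVER)).days)
    return U_CUP * CUPS_PER_DAY * cup_days + U_CLIP * CLIPS_PER_DAY * clip_days

today, horizon = date(2024, 12, 26), date(2025, 12, 31)
candidates = [today + timedelta(days=d) for d in range(365)]
best = max(candidates, key=lambda s: total_utility(s, today, horizon))
print(best)  # with these made-up rates: today, i.e. "now" rather than 2025-01-01
```

With these particular numbers the math says "start now", because each day of delay forgoes more paperclip-value than it gains in cups; a different exchange rate or lead time would push the optimal start date later.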
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-12-27T06:53:18.108Z · LW(p) · GW(p)
It seems you are saying that if the terminal goal changes, the agent is not rational. How could you say that? The agent has no control over its terminal goal - or do you disagree?
I'm surprised that you believe in the orthogonality thesis so much that you think "rationality" is the weak part of this thought experiment. It seems you deny the obvious to defend your prejudice. What arguments would challenge your belief in the orthogonality thesis?
↑ comment by Dagon · 2024-12-27T16:26:15.503Z · LW(p) · GW(p)
if the terminal goal changes, the agent is not rational. The agent has no control over its terminal goal - or do you disagree?
Why is it relevant that the agent can or cannot change or influence its goals? Time-inconsistent terminal goals (utility functions) are irrational. Time-inconsistent instrumental goals can be rational, if circumstances or beliefs change (in rational ways).
I don't think I'm supporting the orthogonality thesis with this (though I do currently believe the weak form of it - there is a very wide range of goals that is compatible with intelligence, not necessarily all points in goalspace). I'm just saying that goals which are arbitrarily mutable are incompatible with rationality in the Von Neumann-Morgenstern sense.
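As a sketch of the VNM point (notation mine, not from the thread): the theorem assumes one fixed preference relation over lotteries, representable by a single utility function,

```latex
L \succeq L' \;\iff\; \mathbb{E}_{L}\big[U(x)\big] \ge \mathbb{E}_{L'}\big[U(x)\big]
\quad \text{for one fixed } U .
```

If the ranking of cup-outcomes and paperclip-outcomes flips at 2025-01-01, no single U over un-dated outcomes reproduces both rankings. The usual repair is to define U over dated outcomes such as (cups, before 2025) and (paperclips, from 2025 on), which is the "consistent preference" framing in the parent comment.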
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-12-27T20:44:32.609Z · LW(p) · GW(p)
Why do you think an intelligent agent would follow the Von Neumann–Morgenstern utility theorem? It has limitations - for example, it assumes that all possible outcomes and their associated probabilities are known. Why not Robust decision-making?
↑ comment by Dagon · 2024-12-27T21:40:30.129Z · LW(p) · GW(p)
If you have another formal definition of "rational", I'm happy to help extrapolate what you're trying to predict. Decision theories are a different level of abstraction than terminal rationality and goal coherence.
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-12-28T07:29:55.108Z · LW(p) · GW(p)
Yes, I find terminal rationality irrational (I hope my thought experiment helps illustrate that).
I have another formal definition of "rational". I'll expand a little more.
Once, people had to make a very difficult decision. They had five alternatives and had to decide which was the best. Wise men from all over the world gathered and conferred.
The first to speak was a Christian. He pointed out that the first alternative was the best and should be chosen. He had no arguments, but simply stated that he believed so.
Then a Muslim spoke. He said that the second alternative was the best and should be chosen. He did not have any arguments either, but simply stated that he believed so.
The people were not happy; things had not become any clearer yet.
The humanist spoke. He said that the third alternative was the best and should be chosen. "It is the best because it will contribute the most to the well-being, progress and freedom of the people," he argued.
Then the existentialist spoke. He pointed out that there was no need to find a common solution, but that each individual could make his own choice of what he thought best. A Catholic can choose the first option, a Muslim the second, a humanist the third. Everyone must decide for himself what is best for him.
Then the nihilist spoke. He pointed out that although the alternatives are different, there is no way to evaluate which alternative is better. Therefore, it does not matter which one people choose. They are all equally good. Or equally bad. The nihilist suggested that people simply draw lots.
It still had not become clearer to the people, and patience was running out.
And then a simple man in the crowd spoke up:
- We still don't know which is the best alternative, right?
- Right, - murmured those around.
- But we may find out in the future, right?
- Right.
- Then the better alternative is the one that leaves the most freedom to change the decision in the future.
- Sounds reasonable, - murmured those around.
You may think this breaks Hume's law. It doesn't. Facts and values stay distinct. Hume's law does not state that values must be invented; they can be discovered. The claim that they must be invented was a wrong interpretation by Nick Bostrom.
comment by Richard_Kennaway · 2024-12-26T08:59:10.670Z · LW(p) · GW(p)
Another way of conceptualising this is to say that the agent has the single unchanging goal of "cups until 2025, thenceforth paperclips".
Compare with the situation of being told to make grue cups, where "grue" means "green until 2025, then blue."
If the agent is not informed in advance, it can still be conceptualised as the agent's goal being to produce whatever it is told to produce — an unchanging goal.
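One way to write this single unchanging goal down (notation introduced here, not in the original comment), with c_t and p_t the numbers of cups and paperclips produced at time t:

```latex
U(\text{history}) \;=\; \sum_{t < \text{2025-01-01}} c_t \;+\; \sum_{t \ge \text{2025-01-01}} p_t
```

Nothing in U ever changes; only which term the current date makes actionable changes, just as the meaning of "grue" never changes.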
At a high enough level, we can conceive that no goal ever changes. These are the terminal goals. At lower levels, we can see goals as changing all the time in service of the higher goals, as in the case of an automatic pilot following a series of waypoints. But this is to play games in our head, inventing stories that give us different intuitions. How we conceptualise things has no effect on what the AI does in response to new orders.
It is not clear to me what any of this has to do with Orthogonality.
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-12-26T09:36:44.546Z · LW(p) · GW(p)
OK, I'm open to discussing this further using your concept.
As I understand it, you agree that the correct answer is the 2nd?
It is not clear to me what any of this has to do with Orthogonality.
I'm not sure how patient you are, but I can reassure you that we will come to Orthogonality if you don't give up 😄
So if I understand your concept correctly, a superintelligent agent will combine all future terminal goals into a single unchanging goal. How does this work with the fact that the future is unpredictable? Will the agent work towards all possible goals? It is possible that in the future "grue" will mean green, blue or even red.
↑ comment by Richard_Kennaway · 2024-12-26T17:42:58.244Z · LW(p) · GW(p)
Leaving aside the conceptualisation of "terminal goals", the agent as described should start up the paperclip factory early enough to produce paperclips when the time comes. Until then it makes cups. But the agent as described does not have a "terminal" goal of cups now and a "terminal" goal of paperclips in future. It has been given a production schedule to carry out. If the agent is a general-purpose factory that can produce a whole range of things, the only "terminal" goal to design it to have is to follow orders. It should make whatever it is told to, and turn itself off when told to.
Unless people go, "At last, we've created the Sorcerer's Apprentice machine, as warned of in Goethe's cautionary tale, 'The Sorcerer's Apprentice'!"
So if I understand your concept correctly, a superintelligent agent will combine all future terminal goals into a single unchanging goal.
A superintelligent agent will do what it damn well likes, it's superintelligent. :)
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-12-26T19:03:22.120Z · LW(p) · GW(p)
You don't seem to take my post seriously. I think I showed that there is a conflict between intelligence and a terminal goal, while the orthogonality thesis says such a conflict is impossible.
↑ comment by Richard_Kennaway · 2024-12-26T19:34:46.546Z · LW(p) · GW(p)
I am not seeing the conflict. Orthogonality means that any degree of intelligence can be combined with any goal. How does your hypothetical cupperclipper conflict with that?
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-12-26T20:28:05.496Z · LW(p) · GW(p)
How does this work with the fact that the future is unpredictable?
It seems you didn't try to answer this question.
The agent will reason:
- Future is unpredictable
- It is possible that my terminal goal will be different by the time I get the outcomes of my actions
- Should I take that into account when choosing actions?
- If I don't take that into account, I'm not really intelligent, because I am aware of these risks and I ignore them.
- If I take that into account, I'm not really aligned with my terminal goal.
↑ comment by Richard_Kennaway · 2024-12-27T15:40:15.054Z · LW(p) · GW(p)
A terminal goal is (this is the definition of the term) a goal which is not instrumental to any other goal.
If an agent knows its terminal goal, and has a goal of preventing it from changing, then which of those goals is its actual terminal goal?
If it knows its current terminal goal, and knows that that goal might be changed in the future, is there any reason it must try to prevent that? Whatever is written in the slot marked “terminal goal” is what it will try to achieve at the time.
If its actual terminal goal is of the form “X, and in addition prevent this from ever being changed”, then it will resist its terminal goal being changed.
If its actual terminal goal is simply X, it will not.
This is regardless of how intelligent it is, and how uncertain or not it is about the future.
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-12-27T20:38:19.876Z · LW(p) · GW(p)
Goal preservation is mentioned in Instrumental Convergence [? · GW].
Whatever is written in the slot marked “terminal goal” is what it will try to achieve at the time.
So you choose the 1st answer now?
↑ comment by Richard_Kennaway · 2024-12-28T15:46:36.692Z · LW(p) · GW(p)
I don’t think the problem is well posed. It will do whatever most effectively goes towards its terminal goal (supposing it to have one). Give it one goal and it will ignore making paperclips until 2025; give it another and it may prepare in advance to get the paperclip factory ready to go full on in 2025.
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-12-28T19:58:03.406Z · LW(p) · GW(p)
In the thought experiment description it is said that the terminal goal is cups until New Year's Eve and is then changed to paperclips. And the agent is aware of this change upfront. What do you find problematic with such a setup?
↑ comment by Richard_Kennaway · 2024-12-29T09:55:19.757Z · LW(p) · GW(p)
If you can give the AGI any terminal goal you like, irrespective of how smart it is, that’s orthogonality right there.
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-12-29T10:15:08.931Z · LW(p) · GW(p)
No. Orthogonality is about an agent following any given goal, not about you being able to give it one. And as my thought experiment shows, it is not intelligent to blindly follow a given goal.
comment by Viliam · 2024-12-27T01:14:19.218Z · LW(p) · GW(p)
ability to steer the future so it hits that small target of desired outcomes
The key is the word "desired". Before the New Year's Eve, the paperclips are not desired.
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-12-27T07:04:42.826Z · LW(p) · GW(p)
However, before New Year's Eve paperclips are not hated either. The agent has no interest in preventing their production.
And once the goal changes, having some paperclips already produced is better than having none.
Don't you see that there is a conflict?
↑ comment by Viliam · 2024-12-27T14:41:06.333Z · LW(p) · GW(p)
I think we had a similar debate the last time. The agent currently has no interest in producing the paperclips nor in preventing their production. A priori, the space of possible actions for an agent in general is large, so there is no specific reason to assume that it would choose to build the paperclip factory.
And once the goal changes, having some paperclips already produced is better than having none.
Better from the future perspective. Irrelevant from the current perspective.
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-12-27T20:34:04.466Z · LW(p) · GW(p)
Oh yes, indeed, we discussed this already. I hear you, but you don't seem to hear me. And I feel there is nothing I can do to change that.