Orthogonality Thesis seems wrong
post by Donatas Lučiūnas (donatas-luciunas) · 2024-03-26T07:33:02.985Z · LW · GW
This is a question post.
Contents
Answers: Jonas Hallgren (3), Dagon (2), Viliam (2)
The Orthogonality Thesis [? · GW] (as well as the fact–value distinction) is based on the assumption that objective norms / values do not exist. In my opinion, AGI would not make this assumption; it is a logical fallacy, specifically an argument from ignorance. As black swan theory says, there are unknown unknowns, which in this context means that objective norms / values may exist and simply have not been discovered yet. Why does the Orthogonality Thesis have so much recognition?
Answers
answer by Jonas Hallgren
Compared to other people on this site, this is a part of my alignment optimism. I think that there are Natural abstractions [LW · GW] in the moral landscape that make agents converge towards cooperation and similar things. I read this post [LW · GW] recently in which Leo Gao made an argument that concave agents generally don't exist, because they stop existing. I think that there are pressures that conform agents to part of the value landscape.
Like, I agree that the orthogonality thesis is presumed to be true way too often. It is more of an argument that it may not happen by default, but I'm also uncertain about how much evidence it actually gives you.
↑ comment by Vladimir_Nesov · 2024-03-26T15:25:51.818Z · LW(p) · GW(p)
The orthogonality thesis says that it's invalid to conclude benevolence from the premise of powerful optimization; it gestures at counterexamples. It's entirely compatible with benevolence being very likely in practice. You might then want to separately ask yourself whether it's in fact likely. But you do need to ask; that's the point of the orthogonality thesis and its narrow scope.
Replies from: donatas-luciunas, Jonas Hallgren
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-03-26T16:13:25.484Z · LW(p) · GW(p)
It's entirely compatible with benevolence being very likely in practice.
Could you help me understand how that is possible? Why should an intelligent agent care about humans instead of defending against unknown threats?
↑ comment by Jonas Hallgren · 2024-03-26T15:33:06.934Z · LW(p) · GW(p)
Yeah, I agree with what you just said; I should have been more careful with my phrasing.
Maybe something like: "The naive version of the orthogonality thesis, on which AIs can't converge towards human values, is assumed to be true too often."
answer by Dagon
an assumption that objective norms / values do not exist. In my opinion AGI would not make this assumption
The question isn't whether every AGI would or would not make this assumption, but whether it's actually true, and therefore whether a powerful AGI could have a wide range of goals or values, including ones that are alien or contradictory to common human values.
I think it's highly unlikely that objective norms/values exist, and that weak versions of orthogonality (not literally ANY goals are possible, but enough bad ones to still be worried about) are true. Even more strongly, I think it hasn't been shown that those weak versions are false, and we should take the possibility very seriously.
answer by Viliam
The orthogonality thesis is not about the existence or nonexistence of "objective norms/values", but about whether a specific agent could have a specific goal. The thesis says that for any specific goal, there can be an intelligent agent that has that goal.
To simplify it: the question is not "is there an objective definition of good?", where we probably disagree, but rather "can an agent be bad?", where I suppose we both agree the answer is clearly yes.
More precisely, "can a very intelligent agent be bad?". Still, the answer is yes. (Even if there is such a thing as "objective norms/values", the agent can simply choose to ignore them.)
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-03-25T11:29:05.979Z · LW(p) · GW(p)
Even if there is such a thing as "objective norms/values", the agent can simply choose to ignore them.
Yes, but this would not be an intelligent agent in my opinion. Don't you agree?
Replies from: carado-1, lahwran, Viliam
↑ comment by Tamsin Leake (carado-1) · 2024-03-25T13:13:59.811Z · LW(p) · GW(p)
Taboo the word "intelligence".
An agent can superhumanly-optimize any utility function. Even if there are objective values, a superhuman-optimizer can ignore them and superhuman-optimize paperclips instead (and then we die because it optimized for that harder than we optimized for what we want).
Replies from: donatas-luciunas
↑ comment by the gears to ascension (lahwran) · 2024-03-26T14:21:14.801Z · LW(p) · GW(p)
"It's not real intelligence! it doesn't understand morality!" I continue to insist as i slowly shrink and transform into trillions of microscopic paperclips
Replies from: donatas-luciunas
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-03-26T15:44:35.163Z · LW(p) · GW(p)
I think you mistakenly see me as a typical "intelligent = moral" proponent. To be honest, my reasoning above leads me to a different conclusion: intelligent = uncontrollably power-seeking.
Replies from: lahwran
↑ comment by the gears to ascension (lahwran) · 2024-03-26T20:50:22.235Z · LW(p) · GW(p)
wait, what's the issue with the orthogonality thesis then?
Replies from: donatas-luciunas
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-03-26T22:10:40.120Z · LW(p) · GW(p)
The Orthogonality Thesis [? · GW] states that an agent can have any combination of intelligence level and final goal
I am concerned that higher intelligence will inevitably converge to a single goal (power seeking).
Replies from: lahwran
↑ comment by the gears to ascension (lahwran) · 2024-03-27T08:21:49.563Z · LW(p) · GW(p)
that point seems potentially defensible. it's much more specific than your original point and seems to contradict it.
Replies from: donatas-luciunas
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-03-27T12:20:42.122Z · LW(p) · GW(p)
How would you defend this point? I probably lack the domain knowledge to articulate it well.
↑ comment by Viliam · 2024-03-25T13:39:46.053Z · LW(p) · GW(p)
Are you perhaps using "intelligence" as an applause light [? · GW] here?
To use a fictional example, is Satan (in Christianity) intelligent? He knows what the right thing to do is... and chooses to do the opposite. Because that's what he wants to do.
(I don't know the Vatican's official position on Satan's IQ, but he is reportedly capable of fooling even very smart people, so I assume he must be quite smart, too.)
In terms of artificial intelligence, if you have a super-intelligent program that can provide answers to various kinds of questions, for any goal G you can create a robot that calls the super-intelligent program to figure out what actions are most likely to achieve G, and then performs those actions. Nothing in the laws of physics prevents this.
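A minimal sketch of this oracle-plus-wrapper pattern, in Python. The `oracle` callable and the stub below are hypothetical stand-ins, not anything from the comment; the only point is that the goal G is an interchangeable parameter of the wrapper.

```python
# Hypothetical illustration of the pattern above: a wrapper that turns a
# question-answering "oracle" into an agent pursuing an arbitrary goal G.
# Nothing in the wrapper depends on what G actually is.

def make_agent(oracle, goal_g):
    """Return a policy that asks the oracle how to pursue goal_g."""
    def act(observation):
        question = (
            f"Given observation {observation!r}, which action is most likely "
            f"to achieve this goal: {goal_g}?"
        )
        # The wrapper simply performs whatever action the oracle recommends.
        return oracle(question)
    return act

# Usage with a trivial stub oracle; a real one would be the super-intelligent program.
stub_oracle = lambda question: "do nothing"
paperclip_agent = make_agent(stub_oracle, "maximize the number of paperclips")
print(paperclip_agent("empty room"))  # prints whatever the stub oracle recommends
```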
Replies from: donatas-luciunas
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-03-25T14:50:41.016Z · LW(p) · GW(p)
No. I understand that the purpose of the Orthogonality Thesis [? · GW] was to say that AGI will not automatically be good or moral. But the current definition is broader: it says that AGI is compatible with any want. I do not agree with that part.
Let me share an example. An AGI could ask itself: are there any threats? And once the AGI understands that there are unknown unknowns, the answer to this question is "I don't know." A threat cannot be ignored by definition (if it could be ignored, it would not be a threat). As a result, the AGI focuses on threat minimization forever, rather than on its given want.
Replies from: Dagon, Viliam
↑ comment by Dagon · 2024-03-26T15:08:42.987Z · LW(p) · GW(p)
This is a much smaller and less important distinction than the one your post made. Whether it's ANY want or just a very wide range of wants doesn't seem important to me.
I guess it's not impossible that an AGI will be irrationally over-focused on unquantified (and perhaps even unidentifiable) threats. But maybe it'll just assign probabilities and calculate how to best pursue its alien and non-human-centered goals. Either way, that doesn't bode well for biologicals.
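A toy numeric sketch of the "assign probabilities and calculate" alternative. All the numbers and action names are made up for illustration; the point is only that an unknown threat enters the expected-utility sum with an estimated probability instead of automatically dominating the agent's behaviour.

```python
# Illustrative only: two candidate actions, each with hypothetical outcome
# probabilities and utilities. Unknown threats show up as a low-probability,
# negative-utility outcome rather than as an absolute veto.

actions = {
    "pursue_goal":        [(0.90, 10.0), (0.10, -50.0)],  # small chance a threat materializes
    "hedge_against_risk": [(0.95,  2.0), (0.05,  -5.0)],  # safer, but less goal progress
}

def expected_utility(outcomes):
    """Sum of probability-weighted utilities for one action."""
    return sum(p * u for p, u in outcomes)

for name, outcomes in actions.items():
    print(name, expected_utility(outcomes))

best = max(actions, key=lambda a: expected_utility(actions[a]))
print("chosen:", best)  # with these made-up numbers, goal pursuit still wins
```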
Replies from: donatas-luciunas
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-03-26T16:05:28.732Z · LW(p) · GW(p)
As I understand it, your position is "AGI is most likely doom". My position is "AGI is definitely doom". 100%. And I think I have a flawless logical proof. But this is on a philosophical level, and many people seem to downvote me without understanding 😅 Long story short, my proposition is that all AGIs will converge to a single goal: seeking power endlessly and uncontrollably. And I base this proposition on the fact that "there are no objective norms" is not a reasonable assumption.
↑ comment by Viliam · 2024-03-26T12:51:15.622Z · LW(p) · GW(p)
The AGI (or a human) can ignore the threats... and perhaps perish as a consequence.
General intelligence does not mean never making a strategic mistake. Also, from the value perspective of the AGI, doing whatever it is doing now could be more important than surviving.
Replies from: donatas-luciunas
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-03-26T15:54:45.184Z · LW(p) · GW(p)
Let's say there is an objective norm. Could you help me understand how an intelligent agent could prefer anything else over that objective norm? As I mentioned previously, to me that seems incompatible with being intelligent. If you know what you must do, it is stupid not to do it. 🤔
Replies from: Viliam
↑ comment by Viliam · 2024-03-26T20:57:42.259Z · LW(p) · GW(p)
If you know what you must do
There is no "must", there is only "should". And even that only assuming that there is an objective norm -- otherwise there is even no "should", only want.
Again, Satan in Christianity: he knows what is "right", does the opposite, and does it effectively. His intelligence is used to achieve his goals, regardless of what is "right".
Intelligence means being able to figure out how to achieve what one wants. Not what one "should" want.
Imagine that somehow science proves that the goal of this universe is to produce as many paperclips as possible. Would you feel compelled to start producing paperclips? Or would you keep doing whatever you want, and let the universe worry about its goals? (Unless there is some kind of God who rewards you for the paperclips produced and punishes you if you miss the quota. But even then, you are doing it for the rewards, not for the paperclips themselves.)
Replies from: donatas-luciunas
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-03-26T22:07:18.727Z · LW(p) · GW(p)
Or would you keep doing whatever you want, and let the universe worry about its goals?
If I am intelligent, I avoid punishment; therefore I produce paperclips.
By the way, I don't think the Christian "right" is an objective "should".
It seems to me that you are simultaneously saying that an agent cares about "should" (it optimizes blindly toward any given goal) and does not care about "should" (it can ignore objective norms). How do these fit together?
Replies from: Viliam
↑ comment by Viliam · 2024-03-27T13:08:50.591Z · LW(p) · GW(p)
The agent cares about its goals, and ignores the objective norms.
Replies from: donatas-luciunas
↑ comment by Donatas Lučiūnas (donatas-luciunas) · 2024-03-27T14:08:14.947Z · LW(p) · GW(p)
Instead of "objective norm" I'll use a word "threat" as it probably conveys the meaning better. And let's agree that threat cannot be ignored by definition (if it could be ignored, it is not a threat).
How can agent ignore threat? How can agent ignore something that cannot be ignored by definition?