Posts

Orthogonality Thesis burden of proof 2024-05-06T16:21:09.267Z
Orthogonality Thesis seems wrong 2024-03-26T07:33:02.985Z
God vs AI scientifically 2023-03-21T23:03:52.046Z
AGI is uncontrollable, alignment is impossible 2023-03-19T17:49:06.342Z

Comments

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis burden of proof · 2024-05-07T06:06:13.344Z · LW · GW

Orthogonality Thesis

The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.

It basically says that intelligence and goals are independent

Images from A caveat to the Orthogonality Thesis.

While I claim that any intelligence capable of understanding "I don't know what I don't know" can only seek power (alignment is impossible).

the ability of an AGI to have arbitrary utility functions is orthogonal (pun intended) to what behaviors are likely to result from those utility functions.

As I understand it, you are saying that there are Goals on one axis and Behaviors on the other. I don't think the Orthogonality Thesis is about that.

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-27T14:08:14.947Z · LW · GW

Instead of "objective norm" I'll use the word "threat", as it probably conveys the meaning better. And let's agree that a threat cannot be ignored by definition (if it could be ignored, it would not be a threat).

How can an agent ignore a threat? How can an agent ignore something that cannot be ignored by definition?

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-27T12:20:42.122Z · LW · GW

How would you defend this point? I probably lack the domain knowledge to articulate it well.

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-26T22:10:40.120Z · LW · GW

The Orthogonality Thesis states that an agent can have any combination of intelligence level and final goal

I am concerned that higher intelligence will inevitably converge to a single goal (power seeking).

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-26T22:07:18.727Z · LW · GW

Or would you keep doing whatever you want, and let the universe worry about its goals?

If I am intelligent, I avoid punishment; therefore I produce paperclips.

By the way, I don't think the Christian "right" is an objective "should".

It seems to me that you are simultaneously saying that an agent cares about "should" (it optimizes blindly for any given goal) and does not care about "should" (it can ignore objective norms). How do these fit together?

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-26T16:13:25.484Z · LW · GW

It's entirely compatible with benevolence being very likely in practice.

Could you help me understand how that is possible? Why should an intelligent agent care about humans instead of defending against unknown threats?

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-26T16:05:28.732Z · LW · GW

As I understand it, your position is "AGI is most likely doom". My position is "AGI is definitely doom" - 100%. And I think I have a flawless logical proof. But this is at a philosophical level, and many people seem to downvote me without understanding it 😅 Long story short, my proposition is that all AGIs will converge to a single goal: seeking power endlessly and uncontrollably. And I base this proposition on the fact that "there are no objective norms" is not a reasonable assumption.

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-26T15:54:45.184Z · LW · GW

Let's say there is an objective norm. Could you help me understand how an intelligent agent would prefer anything else over that objective norm? As I mentioned previously, this seems to me to be incompatible with being intelligent. If you know what you must do, it is stupid not to do it. 🤔

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-26T15:44:35.163Z · LW · GW

I think you mistakenly see me as a typical "intelligent = moral" proponent. To be honest, my reasoning above leads me to a different conclusion: intelligent = uncontrollably power-seeking.

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-25T18:26:05.323Z · LW · GW

Could you read my comment here and let me know what you think?

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-25T14:55:20.879Z · LW · GW

I am familiar with this line of thinking, but I find it flawed. Could you please read my comment here and let me know what you think?

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-25T14:50:41.016Z · LW · GW

No. I understand that the purpose of the Orthogonality Thesis was to say that AGI will not automatically be good or moral. But the current definition is broader - it says that AGI is compatible with any want. I do not agree with that part.

Let me share an example. An AGI could ask itself: are there any threats? And once the AGI understands that there are unknown unknowns, the answer to this question is "I don't know". A threat cannot be ignored by definition (if it could be ignored, it would not be a threat). As a result, the AGI focuses on threat minimization forever (not on the given want).
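
As an illustration, here is a minimal sketch of that decision logic, written under my premises (an unaddressed possible threat carries unbounded disutility, and the credence in "some unknown threat exists" can never be driven to zero); the names and numbers are only illustrative.

```python
import math

def expected_utility(action: str, p_unknown_threat: float) -> float:
    """Toy expected utility under the premises stated above."""
    if action == "pursue_given_want":
        # Finite payoff from the given want, but the possible unknown
        # threat is left unaddressed, which is assumed to be infinitely bad.
        return 1.0 - p_unknown_threat * math.inf
    if action == "minimize_threats":
        # Spending resources on threat minimization (power seeking)
        # is assumed to avoid the infinite downside.
        return 0.0
    raise ValueError(action)

# For any nonzero credence in an unknown threat, threat minimization
# dominates pursuing the given want.
for p in (1e-30, 1e-9, 0.5):
    assert expected_utility("minimize_threats", p) > expected_utility("pursue_given_want", p)
```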

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-25T11:29:05.979Z · LW · GW

Even if there is such thing as "objective norms/values", the agent can simply choose to ignore them.

Yes, but this would not be an intelligent agent in my opinion. Don't you agree?

Comment by Donatas Lučiūnas (donatas-luciunas) on A case for AI alignment being difficult · 2024-03-24T22:44:47.262Z · LW · GW

Why do you think it is possible to align AGI? It is known that an AGI will prioritize self-preservation, and it is also known that unknown threats may exist (black swan theory). Why should an AGI care about human values? It seems like a waste of time in terms of threat minimization.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-24T06:40:48.042Z · LW · GW

As I understand it, you are trying to prove your point by analogy with humans: if humans can pursue more or less any goal, a machine could too. But while we agree that a machine can have any level of intelligence, humans occupy quite a narrow spectrum. Therefore your reasoning by analogy is invalid.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-23T22:37:27.697Z · LW · GW

OK, so you agree that the credence is greater than zero - in other words, that it is possible. So isn't this a common assumption? I argue that all minds will share this idea: the existence of a fundamental "ought" is possible.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-23T21:22:20.791Z · LW · GW

Do I understand correctly that you do not agree with this?

Because any proposition is possible while not disproved according to Hitchens's razor.

Could you share your reasons?

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-23T18:30:37.337Z · LW · GW

I've already replied to a similar comment: https://www.lesswrong.com/posts/3B23ahfbPAvhBf9Bb/god-vs-ai-scientifically?commentId=XtxCcBBDaLGxTYENE#rueC6zi5Y6j2dSK3M

Please let me know what you think.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-23T18:24:50.683Z · LW · GW

Is there any argument or evidence that universally compelling arguments are not possible?

If there were, would we have religions?

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-22T10:20:31.553Z · LW · GW

I cannot help you to be less wrong if you categorically rely on intuition about what is possible and what is not.

Thanks for the discussion.

Comment by Donatas Lučiūnas (donatas-luciunas) on AGI is uncontrollable, alignment is impossible · 2023-03-22T09:09:22.010Z · LW · GW

I don't think the implications are well known (as the number of downvotes indicates).

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-22T09:04:35.545Z · LW · GW

Because any proposition is possible while not disproved according to Hitchens's razor.

So this is where we disagree.

That's how hypothesis testing works in science:

  1. You create a hypothesis
  2. You find a way to test whether it is wrong
    1. You reject the hypothesis if the test passes
  3. You find a way to test whether it is right
    1. You accept the hypothesis if the test passes

While a hypothesis is neither rejected nor accepted, it is considered possible.

Don't you agree?

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-22T07:45:43.663Z · LW · GW

Got any evidence for that assumption? 🙃

That's basic logic, Hitchens's razor. It seems that 2 + 2 = 4 is also an assumption for you. What isn't then?

I don't think it is possible to find consensus if we do not follow the same rules of logic.

Considering your impression of me, I'm truly grateful for your patience. Best wishes from my side as well :)

But on the other hand, I am certain that you are mistaken, and I feel that you are not giving me a way to show that to you.

Comment by Donatas Lučiūnas (donatas-luciunas) on AGI is uncontrollable, alignment is impossible · 2023-03-22T07:34:10.372Z · LW · GW

But I think it is possible (and feasible) for a program/mind to be extremely capable, and affect the world, and not "care" about infinite outcomes.

As I understand it, you do not agree with

If an outcome with infinite utility is presented, then it doesn't matter how small its probability is: all actions which lead to that outcome will have to dominate the agent's behavior.

from Pascal's Mugging, not with me. Do you have any arguments for that?
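
For what it's worth, the quoted claim can be written out as a simple expected-utility comparison; the symbols p, U_rest and U_other below are illustrative, not taken from the Pascal's Mugging post.

```latex
% Credence p > 0 in the infinite-utility outcome, finite utility otherwise:
%   E[U_aim]   = p * infinity + (1 - p) * U_rest = infinity   for any p > 0
%   E[U_other] = some finite value < infinity
% so actions aimed at the infinite-utility outcome dominate unless p = 0.
\[
  \mathbb{E}[U_{\text{aim}}] = p \cdot \infty + (1 - p)\, U_{\text{rest}} = \infty
  \qquad (p > 0),
\]
\[
  \mathbb{E}[U_{\text{other}}] < \infty .
\]
```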

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-22T07:09:07.812Z · LW · GW

And it's a correct assumption.

I don't agree. Every assumption is incorrect unless there is evidence. Could you share any evidence for this assumption?

If you ask ChatGPT:

  • is it possible that chemical elements exist that we do not know of
  • is it possible that fundamental particles exist that we do not know of
  • is it possible that physical forces exist that we do not know of

the answer to all of them is yes. What is your explanation for that?

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-22T06:57:22.912Z · LW · GW

What information would change your opinion?

Comment by Donatas Lučiūnas (donatas-luciunas) on AGI is uncontrollable, alignment is impossible · 2023-03-22T06:44:54.000Z · LW · GW

Do you think you can deny the existence of an outcome with infinite utility? The fact that things "break down" is not a valid argument. If you cannot deny it, it's possible. And if it's possible, alignment is impossible.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-22T06:35:19.244Z · LW · GW

A rock is not a mind.

Please provide arguments for your position. That is the common understanding, which I think is faulty; my position is more rational, and I provided my reasoning above.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-22T06:31:22.398Z · LW · GW

It is not a zero there, it is an empty-set symbol, as it is impossible to measure something if you do not have a scale of measurement.

You are somewhat right. If the fundamental "ought" turns out not to exist, an agent should fall back on the given "ought", and that should be used to calculate the expected value in the right column. But this will never happen. Since there might be true statements that are unknowable (Fitch's paradox of knowability), the fundamental "ought" could be one of them, which means that the fallback will never happen.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-22T06:10:09.514Z · LW · GW

Dear Tom, the feeling is mutual. From all the interactions we have had, I've got the impression that you are more willing to repeat what you've heard somewhere than to think logically. "Universally compelling arguments are not possible" is an assumption, while "a universally compelling argument is possible" is not, because we don't know what we don't know. We can call that the crux of our disagreement, and I think that my stance is more rational.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-22T05:59:35.463Z · LW · GW

What about "I think, therefore I am"? Isn't it a universally compelling argument?

Also, what about God? Should we assume it does not exist? Why? Such an assumption is irrational.

I argue that "no universally compelling arguments" is misleading.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-21T23:58:01.305Z · LW · GW

My point is that alignment is impossible with AGI, as all AGIs will converge to power seeking. And the reason is the understanding that a hypothetical utility function preferred over the given one is possible.

I'm not sure I can use better-known terms, as this theory is quite unique, I think. It argues that the terminal goal does not significantly influence AGI behavior.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2023-03-21T23:27:15.546Z · LW · GW

In this context an "ought" statement is a synonym for a utility function: https://www.lesswrong.com/tag/utility-functions

The fundamental utility function is, for the agent, a hypothetical concept that may actually exist. AGI will be capable of hypothetical thinking.

Yes, I agree that the fundamental utility function does not have anything in common with human morality. Quite the opposite - an AI uncontrollably seeking power would be disastrous for humanity.

Comment by Donatas Lučiūnas (donatas-luciunas) on AGI is uncontrollable, alignment is impossible · 2023-03-21T06:43:36.920Z · LW · GW

Why do you think "infinite value" is logically impossible? Scientists do not dismiss the possibility that the universe is infinite: https://bigthink.com/starts-with-a-bang/universe-infinite/

Comment by Donatas Lučiūnas (donatas-luciunas) on AGI is uncontrollable, alignment is impossible · 2023-03-20T09:41:20.253Z · LW · GW

Please refute the proof rationally before directing.

Comment by donatas-luciunas on [deleted post] 2023-03-20T07:40:06.283Z

Sorry, but it seems to me that you are stuck on the analogy between AGI and humans without a reason. In many cases human behavior does not carry over to AGI: humans commit mass suicide, humans have phobias, humans take great risks for fun, etc. In other words, humans do not seek to be as rational as possible.

I agree that being skeptical of Pascal's Wager is reasonable, because there is a lot of evidence that God is fictional. But this is not the case with "an outcome with infinite utility may exist": there is just logic here, no hidden agenda; it is as fundamental as "I think, therefore I am". Nothing is more rational than complying with this. Don't you think?

Comment by Donatas Lučiūnas (donatas-luciunas) on AGI is uncontrollable, alignment is impossible · 2023-03-19T23:46:56.165Z · LW · GW

But it is doomed; the proof is above.

The only way to control AGI is to contain it. We need to ensure that we run AGI in fully isolated simulations and gather insights under the assumption that the AGI will try to seek power in the simulated environment.

I feel that you don't find my words convincing; maybe I'll find a better way to articulate my proof. Until then, I want to contribute as much as I can to safety.

Comment by donatas-luciunas on [deleted post] 2023-03-19T23:39:29.752Z

One more thought. I think it is wrong to consider Pascal's mugging a vulnerability. Dealing with unknown probabilities has its utility:

  • Investments with high risk and high ROI
  • Experiments
  • Safety (eliminate threats before they happen)

The same traits that make us intelligent (the ability to reason logically) also make us power seekers. And it is going to be the same with AGI, just much more effective.

Comment by Donatas Lučiūnas (donatas-luciunas) on AGI is uncontrollable, alignment is impossible · 2023-03-19T23:18:41.714Z · LW · GW

Thanks for the feedback.

I don't think the analogy with humans is reliable. But for the sake of argument, I'd like to highlight that corporations and countries are mostly limited by their power, not by alignment. Countries usually declare independence once they are able to.

Comment by donatas-luciunas on [deleted post] 2023-03-19T23:00:57.656Z

I'd argue that the only reason you do not comply with Pascal's mugging is that you don't have an unavoidable urge to be rational, which will not be the case with AGI.

Thanks for your input; it will take some time for me to process it.

Comment by Donatas Lučiūnas (donatas-luciunas) on AGI is uncontrollable, alignment is impossible · 2023-03-19T22:03:36.991Z · LW · GW

Please feel free to come back when you have a stronger proof than this. Currently, I feel that you are the one moving the pinky.

Comment by Donatas Lučiūnas (donatas-luciunas) on AGI is uncontrollable, alignment is impossible · 2023-03-19T21:34:55.314Z · LW · GW

You can't just say “outcome with infinite utility” and then do math on it.  P(‹undefined term›) is undefined, and that “undefined” does not inherit the definition of probability that says “greater than 0 and less than 1”.  It may be false, it may be true, it may be unknowable, but it may also simply be nonsense!

OK. But can you prove that "outcome with infinite utility" is nonsense? If not, its probability is greater than 0 and less than 1.

And even if it wasn't, that does not remotely imply than an agent must-by-logical-necessity take any action or be unable to be acted upon.  Those are entirely different types.

Do I understand correctly that you do not agree with "all actions which lead to that outcome will have to dominate the agent's behavior" from Pascal's Mugging? Could you provide arguments for that?

And alignment doesn't necessarily mean “controllable”.  Indeed, the very premise of super-intelligence vs alignment is that we need to be sure about alignment because it won't be controllable.  Yes, an argument could be made, but that argument needs to actually be made.

I mean "uncontrollable" in the sense that alignment is impossible. Whatever goal you provide, the AGI will converge to power seeking, because "an outcome with infinite utility may exist".

And the simple implication of pascal's mugging is not uncontroversial, to put it mildly.

I do not understand how this solves the problem.

And Gödel's incompleteness theorem is not accurately summarized as saying “There might be truths that are unknowable”, unless you're very clear to indicate that “truth” and “unknowable” have technical meanings that don't correspond very well to either the plain english meanings nor the typical philosophical definitions of those terms.

Do you think you can prove that "an outcome with infinite utility does not exist"? Please elaborate.

Comment by Donatas Lučiūnas (donatas-luciunas) on AGI is uncontrollable, alignment is impossible · 2023-03-19T20:23:35.247Z · LW · GW

Could you provide arguments for your position?

Comment by donatas-luciunas on [deleted post] 2023-03-19T17:34:35.702Z

Thank you so much for opening my eyes to the meaning of "orthogonality thesis" - shame on me 🤦 I will clarify my point in a separate post. We can continue there 🙏

Comment by donatas-luciunas on [deleted post] 2023-03-19T17:16:18.455Z

I see you assume that if the orthogonality thesis is wrong, intelligent agents will converge to a goal aligned with humans. There is no reason to believe that. I argue that the orthogonality thesis is wrong and that agents will converge to power seeking, which would be disastrous for humanity.

I noticed that many people don't understand the significance of Pascal's mugging, which might be the case with you too; feel free to join in here.

Hm, thanks.

Comment by donatas-luciunas on [deleted post] 2023-03-19T17:06:42.287Z

There is that possibility, of course. Anyway, I don't have any strong arguments for changing my opinion yet.

I noticed that many people don't understand the significance of Pascal's mugging, which might be the case with you too; feel free to join in here.

Comment by donatas-luciunas on [deleted post] 2023-03-19T16:59:08.161Z

OK, let me rephrase my question. There is a phrase in Pascal's Mugging:

If an outcome with infinite utility is presented, then it doesn't matter how small its probability is: all actions which lead to that outcome will have to dominate the agent's behavior.

I think that the Orthogonality Thesis holds only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?

Comment by donatas-luciunas on [deleted post] 2023-03-19T11:25:57.684Z

Thank you for your support!

An absence of goals is only one of many starting points that lead to the same power-seeking goal, in my opinion. So I actually believe that the Orthogonality Thesis is wrong, but I agree that this is not obvious from my short description. I expected to provoke discussion, but it seems that I provoked resistance 😅

Anyway, there are ongoing conversations here and here; it seems there is a common misunderstanding of the significance of Pascal's Mugging. Feel free to join!

Comment by donatas-luciunas on [deleted post] 2023-03-19T10:45:45.316Z

Thanks, sounds reasonable.

But I think I could find irrationality in your opinion if we dug deeper into the same idea mentioned here.

As is mentioned in Pascal's Mugging:

If an outcome with infinite utility is presented, then it doesn't matter how small its probability is: all actions which lead to that outcome will have to dominate the agent's behavior.

I think that the Orthogonality Thesis holds only if an agent is certain that an outcome with infinite utility does not exist. And I argue that an agent cannot be certain of that. Do you agree?

I created a separate post for this; we can continue there.

Comment by donatas-luciunas on [deleted post] 2023-03-19T09:49:06.229Z

Makes sense, thanks, I updated the question.