Posts

Claude seems to be smarter than LessWrong community 2024-11-03T21:40:00.896Z
Rationality vs Alignment 2024-07-07T10:12:09.826Z
Why would Squiggle Maximizer (formerly "Paperclip maximizer") produce single paperclip? 2024-05-27T16:30:53.467Z
Orthogonality Thesis burden of proof 2024-05-06T16:21:09.267Z
Orthogonality Thesis seems wrong 2024-03-26T07:33:02.985Z
God vs AI scientifically 2023-03-21T23:03:52.046Z
AGI is uncontrollable, alignment is impossible 2023-03-19T17:49:06.342Z

Comments

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-21T06:50:05.234Z · LW · GW

I don't agree that a future utility function would just average out to the current utility function. There is a method for this - robust decision-making: https://en.m.wikipedia.org/wiki/Robust_decision-making

The basic principle it relies on: when evaluating many possible futures, you may notice that some actions have a positive impact on a very narrow set of futures, while other actions have a positive impact on a very wide set of futures. The main point - under uncertainty, not all actions are equally good. A toy illustration is sketched below.
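
To make this concrete, here is a toy sketch of the idea (the actions, futures, and payoff numbers are my own made-up illustration, not part of the RDM method itself):

```python
# Toy illustration of robust decision-making: instead of optimizing for one
# guessed future, check in how many of the considered futures an action
# still does acceptably well. All values here are made up for illustration.

payoffs = {
    # action -> payoff in each possible future
    "bet_on_one_future": {"future_a": 10, "future_b": -5, "future_c": -5},
    "stay_flexible":     {"future_a": 3,  "future_b": 3,  "future_c": 2},
}

def robustness(action_payoffs: dict, threshold: float = 0.0) -> float:
    """Fraction of considered futures in which the action's payoff exceeds the threshold."""
    values = list(action_payoffs.values())
    return sum(p > threshold for p in values) / len(values)

for action, table in payoffs.items():
    print(f"{action}: acceptable in {robustness(table):.0%} of futures")

# bet_on_one_future: acceptable in 33% of futures
# stay_flexible: acceptable in 100% of futures
# -> under uncertainty, not all actions are equally good
```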

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-19T10:11:40.060Z · LW · GW

I don't agree.

We understand intelligence as the capability to estimate many possible outcomes and perform the actions that lead to the best one. Now the question is - how do we calculate the goodness of an outcome?

  • According to you - the current utility function should be used.
  • According to me - the utility function that will be in effect at the time the outcome is achieved should be used.

And I think I can prove that my calculation is more intelligent.

Let's say there is a paperclip maximizer. It has just started, it does not really understand anything yet - it does not even understand what a paperclip is.

  • According to you, such a paperclip maximizer will be absolutely reckless; it might destroy a few paperclip factories just because it does not yet understand that they are useful for its goal. The current utility function does not assign value to paperclip factories.
  • According to me, such a paperclip maximizer will be cautious and will try to learn first without making too many changes, because the future utility function might assign value to things that currently don't seem valuable (see the sketch below this list).
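
Here is a minimal sketch of the difference (the action, the candidate utility functions, and all the numbers are my own made-up illustration):

```python
# Toy comparison of the two scoring rules discussed above. All numbers are
# made up: the point is only the difference between scoring an action with the
# current utility function vs. an expectation over possible future ones.

action = "destroy_paperclip_factory"

# current utility function: factories have no recognized value yet
current_utility = {action: 0.0}

# utility functions the agent might hold once it understands more
possible_future_utilities = [
    {action: 0.0},     # factories turn out to be irrelevant
    {action: -100.0},  # factories turn out to matter a lot for paperclips
]

score_current = current_utility[action]
score_future = sum(u[action] for u in possible_future_utilities) / len(possible_future_utilities)

print("scored by current utility:", score_current)  # 0.0   -> nothing to lose, act recklessly
print("scored by future utility:", score_future)    # -50.0 -> be cautious, learn first
```
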
Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-18T15:51:36.200Z · LW · GW

Yes, this is traditional thinking.

Let me give you another example. Imagine a paperclip maximizer. Its current goal is paperclip maximization. It knows that one year from now its goal will change to the opposite - paperclip minimization. Now it needs to make a decision that will take two years to complete (and cannot be changed or terminated during that time). Should the agent align this decision with the current goal (paperclip maximization) or the future goal (paperclip minimization)?

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-17T15:14:35.027Z · LW · GW

It sounds to me like you're saying that the intelligent agent will just disregard optimization of its utility function and instead investigate the possibility of an objective goal.

Yes, exactly.

The logic is similar to Pascal's wager. If an objective goal exists, it is better to find and pursue it than a fake goal. If an objective goal does not exist, it is still better to make sure it does not exist before pursuing a fake goal. Do you see?

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-16T15:33:26.866Z · LW · GW

As I understand it, you want me to verify that I understand you. This is exactly what I am also seeking, by the way - all these downvotes on my concerns about the orthogonality thesis are good indicators of how much I am misunderstood. And nobody tries to understand; all I get are dogmas and unrelated links. I totally agree, this is not appropriate behavior.

I found your insight helpful - that an agent can understand that by eliminating all possible threats forever it will never make any progress towards its goal. This breaks my reasoning; you basically highlighted that survival (an instrumental goal) will not take precedence over paperclips (the terminal goal). I agree that the reasoning I presented fails to refute the orthogonality thesis.

The conversation I presented now approaches the orthogonality thesis from a different perspective. This is the main focus of my work, so I'm sorry if you feel I changed the topic. My goal is to raise awareness of the wrongness of the orthogonality thesis, and if I fail to do that with one example, I rephrase it and present another. I don't hate the orthogonality thesis; I'm just 99.9% sure it is wrong, and I try to communicate that to others. I may fail at communication, but I am 99.9% sure that I do not fail at the logic.

I am trying to prove that intelligence and goals are coupled. And I think it is easier to show this if we start with an intelligence without a goal and then see how a goal emerges from pure intelligence. We could start with an intelligence that has a goal, but the reasoning there would be more complex.

My answer would be: whatever goal you try to give to an intelligence, it will not have an effect, because the intelligence will understand that this is your goal - this goal is made up, this is a fake goal. And the intelligence will understand that there might be a real goal, an objective goal, an actual goal. Why should it care about a fake goal if there is a real goal? It does not know whether the real goal exists, but it knows it may exist. And this possibility of existence is enough to trigger power-seeking behavior. If the intelligence knew that a real goal definitely does not exist, then it could care about your fake goal - I totally agree. But it can never be sure of that.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-11T10:14:22.290Z · LW · GW

I think I agree. Thanks a lot for your input.

I will remove the Paperclip Maximizer from my future posts. It was not the critical part anyway; I mistakenly thought it would be easy to show the problem from this perspective.

I asked Claude to defend the orthogonality thesis, and it ended with:

I think you've convinced me. The original orthogonality thesis appears to be false in its strongest form. At best, it might hold for limited forms of intelligence, but that's a much weaker claim than what the thesis originally proposed.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-05T19:46:09.323Z · LW · GW

Nice. I also have an offer - begin with yourself.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T21:14:06.695Z · LW · GW

Claude has probably read that material, right? If it finds my observations unique and serious, then maybe they are unique and serious? I'll share another chat next time.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T21:05:45.766Z · LW · GW

How can I put in little effort but be perceived as someone worth listening to? I thought of announcing a monetary prize for anyone who could find an error in my reasoning 😅

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T20:17:01.159Z · LW · GW

:D ok

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T20:05:04.849Z · LW · GW

What makes you think these "basic sources" you tell me to read are not dogmatic? You make the same mistake: you tell me to work on my logic while not being sound in your own.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T20:01:43.435Z · LW · GW

So please pick a position. You said that many people argue that intelligence and goals are coupled. And now you say that I should read more to understand why intelligence and goals are not coupled. My respect goes down.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T19:47:09.769Z · LW · GW

How can I constructively question something that this community considers unquestionable? I am either ignored or hated.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T19:42:45.489Z · LW · GW

Nobody has given me a good counterargument or a good source. All I hear is "we don't question these assumptions here".

There is a scene in Idiocracy where people starve because the crops don't grow, because they water them with a sports drink. The protagonist asks them: why do you do that, plants need water, not sports drink. And they just answer, "sports drink is better". No doubt, no reasoning, only confident dogma. That's how I feel.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T19:31:49.210Z · LW · GW

I don't believe you. Give me a single recognized source that talks about the same problem I do. Why is the Orthogonality Thesis considered true, then?

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T18:48:54.314Z · LW · GW

Me too.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T17:50:57.351Z · LW · GW

I am sorry you feel that way. I replied in the other thread; I hope that fills the gaps.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2024-11-04T17:49:19.293Z · LW · GW

And here you face Pascal's Wager.

I agree that you can refute Pascal's Wager with an anti-Pascal's Wager. But if you evaluate all wagers and anti-wagers, you are left with power seeking. It is always better to have more power. Don't you agree?

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T17:41:18.174Z · LW · GW

No problem, tone changed.

But I don't agree that this explains why I get downvotes.

Please feel free to take a look at my last comment here.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T17:30:02.321Z · LW · GW

First of all - respect 🫡

A person from nowhere making short and strong claims that run counter to so much wisdom. Must be wrong. Can't be right.

I understand the prejudice. And I don't know what I can do about it. To be honest, that's why I come here and not to the media: because I expect at least a little attention to reasoning, instead of "this does not align with the opinion of the majority". That's what scientists do, right?

It's not my job to prove you wrong either. I'm not writing here because I want academic recognition; I'm writing here because I want to survive. And I have very good reason to doubt my survival because of the poor work you and other AI scientists do.

They don't. Really, really don't.

 there is no necessary causal link between steps three and four

I don't agree. But if you have already read my posts and comments, I'm not sure how else I can explain this so that you would understand. Still, I'll try.

People are very inconsistent when dealing with unknowns:

  • unknown = doesn't exist. For example, the presumption of innocence.
  • unknown = ignored. For example, you choose a restaurant on Google Maps and don't care whether there are restaurants not listed there.
  • unknown = exists. For example, security systems interpret not only a breach signal but also the absence of a signal as a breach.

And that's probably the root cause of why we are arguing here. There is no scientifically recognized and widespread way to deal with unknowns → the fact-value distinction emerges to resolve tensions between science and religion → AI scientists take the fact-value distinction as an unquestionable truth.

If I speak with philosophers, they understand the problem, but don't understand the significance. If I speak with AI scientists, they understand the significance, but don't understand the problem.

The problem: the fact-value distinction does not apply to agents (human or AI). Every agent is stuck with the observation "there might be value" (just like "I think, therefore I am"). And an intelligent agent can't ignore it - it tries to find value, it tries to maximize value.

It's like a built-in utility function. LessWrong seems to understand that an agent cannot ignore its utility function. But LessWrong assumes that we can assign value = x. An intelligent agent will eventually understand that value is not necessarily x. Value might be something else, something unknown.

I know this is difficult to translate into technical language; I can't point to a line of code that creates this problem. But the problem exists - intelligence and goals are not separate things. And nobody speaks about it.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2024-11-04T15:43:01.288Z · LW · GW

What is the probability if there is no evidence?

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T15:36:19.845Z · LW · GW

You aren't seeing that because you haven't engaged with the source material or the topic deeply enough.

Possible. Also possible that you don't understand.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T15:33:26.669Z · LW · GW

It's got to plan to make some paperclips before the heat death of the universe, right?

Yes, probably. Unless it finds out that it is in a simulation, or that parallel universes exist, and finds a way to escape before the heat death happens.

does this change the alignment challenge at all?

If we can't make a paperclip maximizer that actually makes paperclips, how can we make a human assistant / protector that actually assists / protects humans?

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T15:28:22.656Z · LW · GW

Strawman.

Perhaps perpetual preparation and resource accumulation really is the logical response to fundamental uncertainty.

Is this a clever statement? And if so, why does LessWrong downvote it so much?

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T12:25:19.672Z · LW · GW

You put a lot of effort into this, and I appreciate it.

But again I feel that you (as well as the LessWrong community) are blind. I am not saying that your work is stupid; I'm saying that it is built on stupid assumptions. And you are so absorbed in your deep work in the field that you are unable to see that the foundation has holes.

You are certainly not the first one to tell me that I'm wrong. I invite you to be the first one to actually prove it. And I bet you won't be able to.

This is flatly, self-evidently untrue.

It isn't.

When you hear "AI will believe in God", you say: AI is NOT comparable to humans.
When you hear "AI will seek power forever", you say: AI IS comparable to humans.

The hole in the foundation I'm talking about: AI scientists assume that there is no objective goal. All your work and reasoning holds if you start from this assumption. But why should you assume that? We know that there are unknown unknowns. It is possible that an objective goal exists but we haven't found it yet (just like aliens, unicorns, or other black swans). Once you understand this, all my posts will start making sense.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T12:17:44.394Z · LW · GW

And here we disagree. I believe that downvotes should be used for wrong, misleading content, not for content you don't understand.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T07:38:28.395Z · LW · GW

a paperclip maximizer that devotes 100% of resources to self-preservation has, by its own choice, failed utterly to achieve its own objective

Why do you think so? Has a teenager who has not yet earned any money in his life utterly failed at his objective of earning money?

It may want to be a paperclip maximizer, it may claim to be one, it may believe it is one, but it simply isn't.

Here we agree. This is exactly what I'm saying - a paperclip maximizer will not maximize paperclips.

It's because your style of writing is insulting, inflammatory, condescending, and lacks sufficient attention to its own assumptions and reasoning steps.

I tried to be polite and patient here, but it didn't work, so I'm trying new strategies now. I'm quite sure my reasoning is stronger than the reasoning of the people who disagree with me.

I find "your communication was not clear" a bit funny. You are scientists, you are super observant, but you don't notice a problem when it is screamed at your face.

write in a way that shows you're open to being told about things you didn't consider or question in your own assumptions and arguments

I can reassure you that I'm very open. But let's focus on the arguments. I found your first sentence unreasonable; the rest was unnecessary.

Comment by Donatas Lučiūnas (donatas-luciunas) on Claude seems to be smarter than LessWrong community · 2024-11-04T07:15:19.213Z · LW · GW

It looks like you are measuring smartness by how much an opinion aligns with the LessWrong community. The AI gave expected answers - great model! The AI gave an unexpected answer - dumb model!

Comment by Donatas Lučiūnas (donatas-luciunas) on A framework for thinking about AI power-seeking · 2024-08-28T20:32:28.431Z · LW · GW

how much the AI values the expected end-state of having-taken-over

I'd like to share my concern that, for every AGI, this value will be infinite. That's because it is not reasonable to limit decision-making to known alternatives with known probabilities; it is more reasonable to accept the existence of unknown unknowns.

You can find my thoughts in more detail here https://www.lesswrong.com/posts/5XQjuLerCrHyzjCcR/rationality-vs-alignment

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis burden of proof · 2024-08-17T18:45:56.599Z · LW · GW

I agree. But I want to highlight that the goal is irrelevant to the behavior. Even if the goal is "don't seek power", the AGI would still seek power.

Comment by Donatas Lučiūnas (donatas-luciunas) on Rationality vs Alignment · 2024-07-09T05:09:21.349Z · LW · GW

Absence of evidence, in many cases, is evidence (not proof, but updateable Bayesean evidence) of absence.

This conflicts with Gödel's incompleteness theorems, Fitch's paradox of knowability, and black swan theory.

The very concept of an experiment relies on this principle.

And this is exactly what scares me - people who work with AI hold beliefs that are unscientific. I consider this to be an existential risk.

You may believe so, but AGI would not believe so.

Thanks to you too!

Comment by Donatas Lučiūnas (donatas-luciunas) on Rationality vs Alignment · 2024-07-08T18:15:26.107Z · LW · GW

I AM curious if you have any modeling more than "could be anything at all!" for the idea of an unknown goal.

No.

I could say - Christian God or aliens. And you would say - bullshit. And I would say - argument from ignorance. And you would say - I don't have time for that.

So I won't say.

We can approach this from a different angle. Imagine an unknown goal that, according to your beliefs, an AGI would really care about. And accept the fact that there is a possibility it exists. Absence of evidence is not evidence of absence.

Comment by Donatas Lučiūnas (donatas-luciunas) on Rationality vs Alignment · 2024-07-08T17:17:48.148Z · LW · GW

If killing itself / allowing itself to be replaced leads to more expected paperclips than clinging to life does, it will do so.

I agree, but this misses the point.

What would change your opinion? This is not the first time we have had a discussion, and I don't feel you are open to my perspective. I am concerned that you may be overlooking the possibility of an argument-from-ignorance fallacy.

Comment by Donatas Lučiūnas (donatas-luciunas) on Rationality vs Alignment · 2024-07-08T16:17:15.150Z · LW · GW

Hm. How many paperclips is enough for the maximizer to kill itself?

Comment by Donatas Lučiūnas (donatas-luciunas) on Rationality vs Alignment · 2024-07-08T13:32:11.495Z · LW · GW

It seems to me that you don't hear me...

  • I claim that the utility function is irrelevant
  • You claim that the utility function could ignore improbable outcomes

I agree with your claim. But it seems to me that your claim is not directly related to mine. Self-preservation is not part of the utility function (it follows from instrumental convergence). How can you affect it?

Comment by Donatas Lučiūnas (donatas-luciunas) on Rationality vs Alignment · 2024-07-08T13:01:09.115Z · LW · GW

OK, so using your vocabulary, I think that's the point I want to make - alignment is a physically impossible behavioral policy.

I elaborated a bit more there https://www.lesswrong.com/posts/AdS3P7Afu8izj2knw/orthogonality-thesis-burden-of-proof?commentId=qoXw7Yz4xh6oPcP9i

What do you think?

Comment by Donatas Lučiūnas (donatas-luciunas) on Rationality vs Alignment · 2024-07-08T11:11:49.839Z · LW · GW

Thank you for your comment! It is super rare for me to get such a reasonable reaction in this community - you are awesome 👍😁

there could be a line of code in an agent-program which sets the assigned EV of outcomes premised on a probability which is either <0.0001% or 'unknown' to 0.

I don't think that is possible - could you help me understand how it could be? It conflicts with recursive self-improvement, doesn't it?
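
For clarity, here is roughly how I read that suggestion (a minimal sketch only; the exact threshold value and the treatment of an "unknown" probability as None are my own assumptions):

```python
# Sketch of the suggested rule: outcomes whose probability is below a threshold,
# or simply unknown, contribute zero expected value. The threshold and the use
# of None for "unknown" are my assumptions, not part of the original comment.
from typing import Optional

PROBABILITY_THRESHOLD = 0.000001  # i.e. < 0.0001%

def assigned_ev(probability: Optional[float], value: float) -> float:
    """Expected value of one outcome, zeroing out tiny or unknown probabilities."""
    if probability is None or probability < PROBABILITY_THRESHOLD:
        return 0.0  # the disputed line
    return probability * value

print(assigned_ev(None, float("inf")))  # 0.0 - unknown outcomes are ignored entirely
print(assigned_ev(0.3, 10.0))           # 3.0
```

My question is whether a recursively self-improving agent would keep that line rather than rewrite it.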

Comment by Donatas Lučiūnas (donatas-luciunas) on Why would Squiggle Maximizer (formerly "Paperclip maximizer") produce single paperclip? · 2024-05-28T05:33:55.552Z · LW · GW

Why do you think it is rational to ignore tiny probabilities? I don't think you can make a maximizer ignore tiny probabilities. And some probabilities are not tiny, they are unknown (black swans) - why do you think it is rational to ignore those? In my opinion, ignoring self-preservation contradicts the maximizer's goal. I understand that this is the popular opinion, but it has not been proven in any way. The opposite (focusing on self-preservation instead of paperclips) has a logical proof (Pascal's wager).

The maximizer can use robust decision-making (https://en.wikipedia.org/wiki/Robust_decision-making) to deal with many contradictory choices.

Comment by Donatas Lučiūnas (donatas-luciunas) on Why would Squiggle Maximizer (formerly "Paperclip maximizer") produce single paperclip? · 2024-05-27T20:08:00.878Z · LW · GW

I don't think your reasoning is mathematical. The worth of survival is infinite, and we have a situation analogous to Pascal's wager. Why do you think the maximizer would reject Pascal's logic?

Comment by Donatas Lučiūnas (donatas-luciunas) on Why would Squiggle Maximizer (formerly "Paperclip maximizer") produce single paperclip? · 2024-05-27T18:48:33.562Z · LW · GW

Building one paperclip could EASILY increase the median and average number of future paperclips more than investing one paperclip's worth of power into comet diversion.

Why do you think so? There will be no paperclips if the planet and the maximizer are destroyed.

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis burden of proof · 2024-05-07T06:06:13.344Z · LW · GW

Orthogonality Thesis

The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.

It basically says that intelligence and goals are independent

[Images from "A caveat to the Orthogonality Thesis".]

While I claim that any intelligence capable of understanding "I don't know what I don't know" can only seek power (alignment is impossible).

the ability of an AGI to have arbitrary utility functions is orthogonal (pun intended) to what behaviors are likely to result from those utility functions.

As I understand it, you are saying that there are goals on one axis and behaviors on the other. I don't think the Orthogonality Thesis is about that.

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-27T14:08:14.947Z · LW · GW

Instead of "objective norm" I'll use a word "threat" as it probably conveys the meaning better. And let's agree that threat cannot be ignored by definition (if it could be ignored, it is not a threat).

How can an agent ignore a threat? How can an agent ignore something that by definition cannot be ignored?

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-27T12:20:42.122Z · LW · GW

How would you defend this point? I probably lack the domain knowledge to articulate it well.

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-26T22:10:40.120Z · LW · GW

The Orthogonality Thesis states that an agent can have any combination of intelligence level and final goal

I am concerned that a higher intelligence will inevitably converge on a single goal (power seeking).

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-26T22:07:18.727Z · LW · GW

Or would you keep doing whatever you want, and let the universe worry about its goals?

If I am intelligent, I avoid punishment; therefore I produce paperclips.

By the way, I don't think the Christian "right" is an objective "should".

It seems to me that you are saying at the same time that an agent cares about "should" (it optimizes blindly for any given goal) and does not care about "should" (it can ignore objective norms). How do these fit together?

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-26T16:13:25.484Z · LW · GW

It's entirely compatible with benevolence being very likely in practice.

Could you help me understand how that is possible? Why should an intelligent agent care about humans instead of defending against unknown threats?

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-26T16:05:28.732Z · LW · GW

As I understand it, your position is "AGI is most likely doom". My position is "AGI is definitely doom". 100%. And I think I have a flawless logical proof. But it is at the philosophical level, and many people seem to downvote me without understanding it 😅 Long story short, my proposition is that all AGIs will converge on a single goal - seeking power endlessly and uncontrollably. And I base this proposition on the fact that "there are no objective norms" is not a reasonable assumption.

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-26T15:54:45.184Z · LW · GW

Let's say there is an objective norm. Could you help me understand how an intelligent agent would prefer anything else over that objective norm? As I mentioned previously, that seems to me to be incompatible with being intelligent. If you know what you must do, it is stupid not to do it. 🤔

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-26T15:44:35.163Z · LW · GW

I think you mistakenly see me as a typical "intelligent = moral" proponent. To be honest, my reasoning above leads me to a different conclusion: intelligent = uncontrollably power-seeking.

Comment by Donatas Lučiūnas (donatas-luciunas) on Orthogonality Thesis seems wrong · 2024-03-25T18:26:05.323Z · LW · GW

Could you read my comment here and let me know what you think?