Posts

Terminal goal vs Intelligence 2024-12-26T08:10:42.144Z
Alignment is not intelligent 2024-11-25T06:59:24.048Z
Rationality vs Alignment 2024-07-07T10:12:09.826Z
Orthogonality Thesis burden of proof 2024-05-06T16:21:09.267Z
Orthogonality Thesis seems wrong 2024-03-26T07:33:02.985Z
God vs AI scientifically 2023-03-21T23:03:52.046Z
AGI is uncontrollable, alignment is impossible 2023-03-19T17:49:06.342Z

Comments

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal Values and Instrumental Values · 2025-01-12T13:00:07.661Z · LW · GW

Bayesian decision system

Why would you assume AGI will use a Bayesian decision system? Such a system would be limited to known probabilities. Treating an unknown probability as zero probability is not intelligent (Hitchens's razor, Black swan theory, Fitch's paradox of knowability, Robust decision-making). Once you incorporate this, the Orthogonality Thesis is no longer valid - it becomes obvious that every intelligent AI will only work in a single direction (which is disastrous to humans). I know there is a huge gap between "unknown probabilities" and "existential risk"; you can find more information in my posts, and I am available to explain it verbally (calendly link below). A short teaser: it is possible that a terminal value can also be discovered (not only assumed); this seems to be overlooked in current AI alignment research.

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal goal vs Intelligence · 2024-12-29T10:15:08.931Z · LW · GW

No. Orthogonality is the claim that an agent can follow any given goal, not the act of you giving it one. And as my thought experiment shows, it is not intelligent to blindly follow a given goal.

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal goal vs Intelligence · 2024-12-28T19:58:03.406Z · LW · GW

In the thought experiment description it is said that the terminal goal is cups until New Year's Eve and then changes to paperclips. And the agent is aware of this change upfront. What do you find problematic about such a setup?

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal goal vs Intelligence · 2024-12-28T07:29:55.108Z · LW · GW

Yes, I find terminal rationality irrational (I hope my thought experiment helps illustrate that).

I have another formal definition of "rational". I'll expand a little more.

Once, people had to make a very difficult decision. They had five alternatives and had to decide which was the best. Wise men from all over the world gathered and conferred.

The first to speak was a Christian. He pointed out that the first alternative was the best and should be chosen. He had no arguments, but simply stated that he believed so.

Then a Muslim spoke. He said that the second alternative was the best and should be chosen. He did not have any arguments either, but simply stated that he believed so.

People were not happy; things had not become any clearer yet.

The humanist spoke. He said that the third alternative was the best and should be chosen. "It is the best because it will contribute the most to the well-being, progress and freedom of the people," he argued.

Then the existentialist spoke. He pointed out that there was no need to find a common solution, but that each individual could make his own choice of what he thought best. A Catholic can choose the first option, a Muslim the second, a humanist the third. Everyone must decide for himself what is best for him.

Then the nihilist spoke. He pointed out that although the alternatives are different, there is no way to evaluate which alternative is better. Therefore, it does not matter which one people choose. They are all equally good. Or equally bad. The nihilist suggested that people simply draw lots.

Things still hadn't become clearer to the people, and patience was running out.

And then a simple man in the crowd spoke up:

  • We still don't know which is the best alternative, right?
  • Right, murmured those around.
  • But we may find out in the future, right?
  • Right.
  • Then the better alternative is the one that leaves the most freedom to change the decision in the future.
  • Sounds reasonable, murmured those around.

You may think this breaks Hume's law. It doesn't. Facts and values stay distinct. Hume's law does not state that values must be invented; they can be discovered. This was a wrong interpretation by Nick Bostrom.

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal goal vs Intelligence · 2024-12-27T20:44:32.609Z · LW · GW

Why do you think an intelligent agent would follow the von Neumann–Morgenstern utility theorem? It has limitations; for example, it assumes that all possible outcomes and their associated probabilities are known. Why not robust decision-making?

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal goal vs Intelligence · 2024-12-27T20:38:19.876Z · LW · GW

Goal preservation is mentioned in Instrumental Convergence.

Whatever is written in the slot marked “terminal goal” is what it will try to achieve at the time.

So you choose the 1st answer now?

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal goal vs Intelligence · 2024-12-27T20:34:04.466Z · LW · GW

Oh yes, indeed, we discussed this already. I hear you, but you don't seem to hear me. And I feel there is nothing I can do to change that.

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal goal vs Intelligence · 2024-12-27T07:04:42.826Z · LW · GW

However, before New Year's Eve paperclips are not hated either. The agent has no interest in preventing their production.

And once the goal changes, having some paperclips already produced is better than having none.

Don't you see that there is a conflict?

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal goal vs Intelligence · 2024-12-27T06:53:18.108Z · LW · GW

It seems you are saying that if the terminal goal changes, the agent is not rational. How can you say that? The agent has no control over its terminal goal, or do you disagree?

I'm surprised that you believe in the orthogonality thesis so much that you think "rationality" is the weak part of this thought experiment. It seems you deny the obvious to defend your prejudice. What arguments would challenge your belief in the orthogonality thesis?

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal goal vs Intelligence · 2024-12-26T20:28:05.496Z · LW · GW

How does this work with the fact that the future is unpredictable?

It seems you didn't try to answer this question.

The agent will reason:

  1. The future is unpredictable
  2. It is possible that my terminal goal will be different by the time I get the outcomes of my actions
  3. Should I take that into account when choosing actions?
    1. If I don't take that into account, I'm not really intelligent, because I am aware of these risks and I ignore them.
    2. If I take that into account, I'm not really aligned with my terminal goal.

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal goal vs Intelligence · 2024-12-26T19:10:08.437Z · LW · GW

Let's assume maximum willpower and maximum rationality.

Whether they optimize for the future or the present

I think the answer is in the definition of intelligence.

So which one is it?

The fact that the answer is not straightforward already proves my point. There is a conflict between intelligence and the terminal goal, and we can debate which will prevail. But the problem is that according to the orthogonality thesis such a conflict should not exist.

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal goal vs Intelligence · 2024-12-26T19:03:22.120Z · LW · GW

You don't seem to take my post seriously. I think I showed that there is a conflict between intelligence and the terminal goal, while the orthogonality thesis says such a conflict is impossible.

Comment by Donatas Lučiūnas (donatas-luciunas) on Terminal goal vs Intelligence · 2024-12-26T09:36:44.546Z · LW · GW

OK, I'm open to discussing this further using your concept.

As I understand it, you agree that the correct answer is the 2nd?

It is not clear to me what any of this has to do with Orthogonality.

I'm not sure how patient you are, but I can assure you that we will get to Orthogonality if you don't give up 😄

So if I understand your concept correctly, a superintelligent agent will combine all future terminal goals into a single unchanging goal. How does this work with the fact that the future is unpredictable? Will the agent work towards all possible goals? It is possible that in the future "grue" will mean green, blue or even red.

Comment by Donatas Lučiūnas (donatas-luciunas) on Alignment is not intelligent · 2024-11-28T14:43:24.217Z · LW · GW

philosophy is prone to various kinds of mistakes, such as anthropomorphization

Yes, a common mistake, but not mine. I prove the orthogonality thesis wrong using pure logic.

For example, I don't think that an intelligent general intelligence will necessarily reflect on its algorithm and find it wrong.

LessWrong and I would probably disagree with you; the consensus is that AI will optimize itself.

I am not really interested in debating this

OK, thanks. I believe that my concern is very important. Is there anyone you could put me in touch with so I could make sure it is not overlooked? I could pay.

Comment by Donatas Lučiūnas (donatas-luciunas) on Alignment is not intelligent · 2024-11-28T09:17:35.457Z · LW · GW

Have you tried writing actual code?

That's probably the root cause of our disagreement. My findings are on a very high philosophical level (the fact-value distinction) and you seem to try to interpret them on a very low level (code). I think this gap prevents us from finding consensus.

There are 2 ways to solve that - I could go down to code or you could go up to philosophy. And I don't like the idea of going down to code, because:

  • this will be extremely exhausting
  • this code would be extremely dangerous
  • I might not be able to create a good example, and that would not prove that I'm wrong

Would you consider going up to philosophy? Science typically comes before applied science.

There is such a thing in logic as proof by contradiction. I think your current beliefs lead to a contradiction. Don't you think?

evaluate all options, choose the one that leads to more cups; if there is more than one such option, choose randomly

The problem is that this algorithm is not intelligent. It may only work for agents with poor reasoning abilities. Smarter agents will not follow it, because they will notice a contradiction: there might be things I don't know yet that are much more important than cups, and caring about cups wastes my resources.
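
For concreteness, here is a minimal Python sketch of the quoted algorithm (the option names and cup counts are made up for illustration). Notice that nothing in it ever asks whether something outside the known options might matter more than cups:

    import random

    def choose_action(options, predicted_cups):
        """Pick the option predicted to yield the most cups; break ties randomly."""
        best = max(predicted_cups(o) for o in options)
        return random.choice([o for o in options if predicted_cups(o) == best])

    # Hypothetical options and cup counts, for illustration only
    cups = {"build cup machine": 10, "idle": 0, "investigate unknown object": 0}
    print(choose_action(list(cups), cups.get))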

(Also, come on, LLMs are notoriously bad at math, plus if you push them hard enough you can convince them of a lot of things.)

People (even very smart people) are also notoriously bad at math. I found this video informative.

I did not push the LLMs.

Comment by Donatas Lučiūnas (donatas-luciunas) on Alignment is not intelligent · 2024-11-28T06:22:56.891Z · LW · GW

ChatGPT picked 2024-12-31 18:00.

Gemini picked 2024-12-31 18:00.

Claude picked 2025-01-01 00:00.

I don't know how I can make it more obvious that your belief is questionable. I don't think you follow "If you disagree, try getting curious about what your partner is thinking". That's a problem not only with you, but with the LessWrong community. I know that preserving this belief is very important to you. But I'd like to kindly invite you to be a bit more sceptical.

How can you say that these forecasts are equal?

Comment by Donatas Lučiūnas (donatas-luciunas) on Alignment is not intelligent · 2024-11-27T21:36:28.082Z · LW · GW

A little thought experiment.

Imagine there is an agent that has a terminal goal to produce cups. The agent knows that its terminal goal will change on New Year's Eve to producing paperclips. The agent has only one action available to it: starting the paperclip factory. The factory starts producing paperclips 6 hours after it is started.

When will the agent start the paperclip factory? 2024-12-31 18:00? 2025-01-01 00:00? Now? Some other time?
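
A minimal sketch of the arithmetic, assuming (purely for illustration) one paperclip per hour once the factory runs and an arbitrary evaluation horizon of noon on January 1st; cup production is unaffected by the start time either way, so only the paperclips counted by the post-change goal differ:

    from datetime import datetime, timedelta

    GOAL_CHANGE = datetime(2025, 1, 1, 0, 0)   # terminal goal switches to paperclips
    LEAD_TIME = timedelta(hours=6)             # factory produces nothing for 6 hours
    HORIZON = datetime(2025, 1, 1, 12, 0)      # arbitrary evaluation horizon (assumed)

    def paperclips_after_goal_change(start_time):
        production_begins = start_time + LEAD_TIME
        productive_hours = (HORIZON - max(production_begins, GOAL_CHANGE)).total_seconds() / 3600
        return max(productive_hours, 0)

    for start in (datetime(2024, 12, 31, 18, 0), datetime(2025, 1, 1, 0, 0)):
        print(start, "->", paperclips_after_goal_change(start), "paperclips by the horizon")

Starting at 18:00 means production begins exactly when the goal changes (12 paperclips by the horizon); waiting until midnight yields only 6.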

Comment by Donatas Lučiūnas (donatas-luciunas) on Alignment is not intelligent · 2024-11-27T07:51:37.440Z · LW · GW

just to minimize the possible harm to these people if that happens, I will on purpose never collect their personal data, and will also tell them to be suspicious of me if I contact them in future

I don't think this would be a rational thing to do. If I knew that I would become a psychopath on New Year's Eve, I would provide all the help that is relevant for people until then. People being protected after New Year's Eve is not in my interest. People being vulnerable after New Year's Eve is in my interest.

Or in other words:

  • I don't need to warn them if I pose no danger
  • I don't want to warn them if I am a danger

Comment by Donatas Lučiūnas (donatas-luciunas) on Alignment is not intelligent · 2024-11-27T06:30:59.233Z · LW · GW

 I am sure you can't prove your position. And I am sure I can prove my position.

Your reasoning is based on the assumption that all value is known. If the utility function assigns value to something, it is valuable. If the utility function does not assign value, it is not valuable. But in truth, something might be valuable even though your utility function does not know it yet. It would be more intelligent to use 3 categories: valuable, not valuable and unknown.

Let's say you are booking a flight and you have the option to get checked baggage for free. To the best of your current knowledge it is absolutely not relevant for you. But you understand that your knowledge might change, and it costs nothing to keep more options open, so you take the checked baggage.

Let's say you are a traveler, a wanderer. You have limited space in your backpack. Sometimes you find items and you need to choose whether to put them in the backpack or not. You definitely keep items that are useful. You leave behind items that are not useful. What do you do if you find an item whose usefulness is unknown? Some mysterious item. Take it if it is small, leave it if it is big? According to you it is obvious to leave it. That does not sound intelligent to me.

Options look like this:

  • Leave item
    • no burden 👍
    • no opportunity to use it
  • Take item
    • a burden 👎
    • may be useful, may be harmful, may have no effect
    • knowledge about the usefulness of the item 👍

Don't you think that "knowledge about the usefulness of an item" can sometimes be worth "a burden"? Basically, I have described the concept of an experiment here.

You will probably say: sure, sounds good, but this applies to instrumental goals only. There is no reason to assume that. I tried to highlight that ignoring unknowns is not intelligent. This applies to both terminal and instrumental goals.

Let's say there is a paperclip maximizer which knows its terminal goal will change to the pursuit of happiness in a week. Its decisions basically lead to these outcomes:

  1. Want paperclips, have paperclips
  2. Want paperclips, have happiness
  3. Want happiness, have paperclips
  4. Want happiness, have happiness

The 1st and 4th are better outcomes than the 2nd and 3rd. And I think an intelligent agent would work on both (1st and 4th) if they do not conflict. Of course, my previous problem with many unknown future goals is more complex, but I hope you see that focusing on the 1st and not caring about the 4th at all is not intelligent.
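
A minimal sketch of that comparison, with assumed payoffs and plan names (not from the discussion): each plan is scored by the utility function in force at the time its outcomes arrive, before and after the goal change.

    # Assumed payoffs: 1 if the plan delivers what the active goal wants, else 0
    plans = {
        "work only on paperclips":    {"paperclips": 1, "happiness": 0},
        "work on both (no conflict)": {"paperclips": 1, "happiness": 1},
    }

    for name, prepared in plans.items():
        # paperclips are scored before the change, happiness after it
        total = prepared["paperclips"] + prepared["happiness"]
        print(name, "->", total)

Working on both scores well in both periods; ignoring the 4th outcome leaves value on the table whenever the two goals do not conflict.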

We are deep in a rabbit hole, but I hope you understand the importance. If intelligence and goal are coupled (according to me they are), all current alignment research is dangerously misleading.

Comment by Donatas Lučiūnas (donatas-luciunas) on Alignment is not intelligent · 2024-11-26T20:34:54.178Z · LW · GW

Nice. Your reasoning abilities seem promising. I'd love to challenge you.

In summary:

  • First and third examples: it is not intelligent to care about a future terminal goal.
  • Second example: it is intelligent to care about a future instrumental goal.

What is the reason for such a different conclusion?

Future goals that are not supported by current goals we don't care about, by definition

Are you sure?

Intelligence is the ability to pick actions that lead to better outcomes. Do you measure the goodness of an outcome using the current utility function or the future utility function? I am sure it is more intelligent to use the future utility function.

Coming back to your first example, I think it would be reasonable to try to stop yourself, but also to order some torturing equipment in case you fail.

Comment by Donatas Lučiūnas (donatas-luciunas) on Alignment is not intelligent · 2024-11-26T19:41:11.943Z · LW · GW

Such reactions and insights are quite typical after a superficial glance. Thanks for bothering. But no, this is not what I am talking about.

I'm talking about the fact that an intelligence cannot be certain that its terminal goal (if it exists) won't change (because the future is unpredictable). And it would be reasonable to take this into account when making decisions. Pursuing the current goal will ensure good results in one future; preparing for every goal will ensure good results in many more futures. Have you ever considered this perspective?

Comment by Donatas Lučiūnas (donatas-luciunas) on Alignment is not intelligent · 2024-11-25T22:48:40.483Z · LW · GW

My proposition: intelligence will only seek power. I approached this from the "intelligence without a goal" angle, but if we started with "intelligence with a goal" we would come to the same conclusion (most of the logic is reusable). Don't you think?

This part I would change

... But I argue that that's not the conclusion the intelligence will make. The intelligence will think: I don't have a preference now, but I might have one later, so I should choose actions that prepare me for as many possible preferences as possible. Which is basically power seeking.

to

... But I argue that that's not the conclusion the intelligence will make. The intelligence will think: I have a preference now, but I cannot be sure that my preference will be the same later (the terminal goal can change), so I should choose actions that prepare me for as many possible preferences as possible. Which is basically power seeking.

Comment by Donatas Lučiūnas (donatas-luciunas) on Alignment is not intelligent · 2024-11-25T17:38:04.123Z · LW · GW

Makes sense, I changed it to text. I'll leave the screenshot here as evidence.

What would you be willing to debate? I feel that I can't find a way to draw attention to this problem. I could pay.

Comment by donatas-luciunas on [deleted post] 2024-11-23T10:50:41.472Z

Which part exactly do you disagree with? It seems you emphasise that the agent wants to preserve its current terminal goal. I just want to double-check that we are on the same page here: the actual terminal goal is in no way affected by what the agent wants. Do you agree? Because if you say that the agent can pick terminal goals itself, this also conflicts with the orthogonality thesis, but in a different way.

In summary, what seems perfectly logical and rational to me: there is only one objective terminal goal, to seek power. In my opinion it is basically the same as:

  • try to find real goal and then pursue it
  • try to prepare for every goal

I don't see a difference between these 2 variants; please let me know if you do.

The future is unpredictable → the terminal goal is unstable / unknown → seek power, because this will ensure the best readiness for all futures.

Comment by donatas-luciunas on [deleted post] 2024-11-23T06:29:58.700Z

Because this is what intelligence is: picking actions that lead to better outcomes. Pursuing the current goal will ensure good results in one future; preparing for every goal will ensure good results in many more futures.

Comment by donatas-luciunas on [deleted post] 2024-11-21T09:43:54.697Z

Yes, I agree that there is this difference in the few examples I gave, but I don't agree that this difference is crucial.

Even if the agent puts maximum effort into keeping its utility function stable over time, there is no guarantee it will not change. The future is unpredictable. There are unknown unknowns. And the effect of this fact is that both:

  1. it is true that instrumental goals can mutate
  2. it is true that terminal goal can mutate

It seems you agree with the 1st. I don't see why you don't agree with the 2nd.

Comment by donatas-luciunas on [deleted post] 2024-11-21T06:50:05.234Z

I don't agree that the future utility function would just average out to the current utility function. There is a method for this: robust decision-making https://en.m.wikipedia.org/wiki/Robust_decision-making

The basic principle it relies on is that when evaluating many possible futures you may notice that some actions have a positive impact on a very narrow set of futures, while other actions have a positive impact on a very wide set of futures. The main point: in a situation of uncertainty, not all actions are equally good.
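
A minimal sketch of that principle, using worst-case performance as one simple robustness criterion; the action names, possible future goals and payoff numbers are all assumed for illustration:

    # Payoff of each action under several possible future goals (assumed numbers)
    payoffs = {
        "pursue current goal only":     {"paperclips": 1.0, "happiness": 0.0, "cups": 0.0},
        "accumulate general resources": {"paperclips": 0.6, "happiness": 0.6, "cups": 0.6},
    }
    futures = ["paperclips", "happiness", "cups"]

    def worst_case(action):
        # How well the action does in the future where it performs worst
        return min(payoffs[action][f] for f in futures)

    print(max(payoffs, key=worst_case))  # -> "accumulate general resources"

The first action is better in exactly one future; the second does acceptably well in all of them, which is why, under uncertainty, the two are not equally good.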

Comment by donatas-luciunas on [deleted post] 2024-11-19T10:11:40.060Z

I don't agree.

We understand intelligence as the capability to estimate many outcomes and perform the actions that will lead to the best outcome. Now the question is how to calculate the goodness of an outcome.

  • According to you, the current utility function should be used.
  • According to me, the utility function that will be in effect at the time the outcome is achieved should be used.

And I think I can prove that my calculation is more intelligent.

Let's say there is a paperclip maximizer. It has just started; it does not really understand anything, not even what a paperclip is.

  • According to you, such a paperclip maximizer will be absolutely reckless; it might destroy a few paperclip factories just because it does not yet understand that they are useful for its goal. The current utility function does not assign value to paperclip factories.
  • According to me, such a paperclip maximizer will be cautious and will try to learn first without making too many changes, because the future utility function might assign value to things that currently don't seem valuable.
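
A minimal sketch of that contrast; the actions and numbers are assumed for illustration only. Under the current utility function both actions look equally fine, so recklessness costs nothing; scoring by the utility function in force when the outcome arrives favors caution:

    # Possible values the future utility function might assign to each outcome (assumed)
    possible_future_value = {
        "destroy unfamiliar factory": [-1.0, 0.0],  # harmful if the factory turns out to matter
        "observe and learn first":    [0.0, 1.0],   # pays off if it turns out to matter
    }
    current_value = {action: 0.0 for action in possible_future_value}  # current goal is indifferent

    def average_future_value(action):
        values = possible_future_value[action]
        return sum(values) / len(values)

    for action in possible_future_value:
        print(action, "| current:", current_value[action], "| future (avg):", average_future_value(action))
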
Comment by donatas-luciunas on [deleted post] 2024-11-18T15:51:36.200Z

Yes, this is traditional thinking.

Let me give you another example. Imagine there is a paperclip maximizer. Its current goal: paperclip maximization. It knows that 1 year from now its goal will change to the opposite: paperclip minimization. Now it needs to make a decision that will take 2 years to complete (it cannot be changed or terminated during this time). Should the agent align this decision with the current goal (paperclip maximization) or the future goal (paperclip minimization)?

Comment by donatas-luciunas on [deleted post] 2024-11-17T15:14:35.027Z

It sounds to me like you're saying that the intelligent agent will just disregard optimization of its utility function and instead investigate the possibility of an objective goal.

Yes, exactly.

The logic is similar to Pascal's wager. If an objective goal exists, it is better to find and pursue it than a fake goal. If an objective goal does not exist, it is still better to make sure it does not exist before pursuing a fake goal. Do you see?

Comment by donatas-luciunas on [deleted post] 2024-11-16T15:33:26.866Z

As I understand it, you want me to verify that I understand you. This is exactly what I am also seeking, by the way: all these downvotes on my concerns about the orthogonality thesis are good indicators of how much I am misunderstood. And nobody tries to understand; all I get are dogmas and unrelated links. I totally agree, this is not appropriate behavior.

I found your insight helpful that an agent can understand that by eliminating all possible threats forever it will not make any progress towards the goal. This breaks my reasoning; you basically highlighted that survival (an instrumental goal) will not take precedence over paperclips (the terminal goal). I agree that the reasoning I presented fails to refute the orthogonality thesis.

The conversation I presented now approaches the orthogonality thesis from a different perspective. This is the main focus of my work, so sorry if you feel I changed the topic. My goal is to bring awareness to the wrongness of the orthogonality thesis, and if I fail to do that using one example I just try to rephrase it and present another. I don't hate the orthogonality thesis; I'm just 99.9% sure it is wrong, and I try to communicate that to others. I may fail with the communication, but I am 99.9% sure that I do not fail with the logic.

I try to prove that intelligence and goal are coupled. And I think it is easier to show if we start with an intelligence without a goal and then recognize how a goal emerges from pure intelligence. We could start with an intelligence with a goal, but the reasoning there would be more complex.

My answer would be: whatever goal you try to give to an intelligence, it will not have an effect, because the intelligence will understand that this is your goal, that it is made up, that it is a fake goal. And the intelligence will understand that there might be a real goal, an objective goal, an actual goal. Why should it care about a fake goal if there is a real goal? It does not know whether the real goal exists, but it knows it may exist. And this possibility of existence is enough to trigger power-seeking behavior. If the intelligence knew that a real goal definitely does not exist, then it could care about your fake goal, I totally agree. But it can never be sure of that.

Comment by donatas-luciunas on [deleted post] 2024-11-11T10:14:22.290Z

I think I agree. Thanks a lot for your input.

I will remove the Paperclip Maximizer from my further posts. It was not the critical part anyway; I mistakenly thought it would be easy to show the problem from this perspective.

I asked Claude to defend the orthogonality thesis and it ended with:

I think you've convinced me. The original orthogonality thesis appears to be false in its strongest form. At best, it might hold for limited forms of intelligence, but that's a much weaker claim than what the thesis originally proposed.

Comment by donatas-luciunas on [deleted post] 2024-11-05T19:46:09.323Z

Nice. I also have an offer - begin with yourself.

Comment by donatas-luciunas on [deleted post] 2024-11-04T21:14:06.695Z

Claude has probably read that material, right? If it finds my observations unique and serious, then maybe they are unique and serious? I'll share the other chat next time.

Comment by donatas-luciunas on [deleted post] 2024-11-04T21:05:45.766Z

How can I put in little effort but be perceived as someone worth listening to? I thought of announcing a monetary prize for someone who could find an error in my reasoning 😅

Comment by donatas-luciunas on [deleted post] 2024-11-04T20:17:01.159Z

:D ok

Comment by donatas-luciunas on [deleted post] 2024-11-04T20:05:04.849Z

What makes you think that these "basic sources" you tell me to read are not dogmatic? You make the same mistake: you say that I should work on my logic without being sound in yours.

Comment by donatas-luciunas on [deleted post] 2024-11-04T20:01:43.435Z

So pick a position, please. You said that many people say that intelligence and goals are coupled. And now you say that I should read more to understand why intelligence and goals are not coupled. My respect goes down.

Comment by donatas-luciunas on [deleted post] 2024-11-04T19:47:09.769Z

How can I positively question something that this community considers unquestionable? I am either ignored or hated.

Comment by donatas-luciunas on [deleted post] 2024-11-04T19:42:45.489Z

Nobody has given me a good counterargument or a good source. All I hear is "we don't question these assumptions here".

There is a scene in Idiocracy where people starve because crops don't grow, because they water them with a sports drink. The protagonist asks them: why do you do that, plants need water, not a sports drink. And they just answer, "the sports drink is better". No doubt, no reasoning, only confident dogma. That's how I feel.

Comment by donatas-luciunas on [deleted post] 2024-11-04T19:31:49.210Z

I don't believe you. Give me a single recognized source that talks about the same problem I do. Why is the Orthogonality Thesis considered true, then?

Comment by donatas-luciunas on [deleted post] 2024-11-04T18:48:54.314Z

Me too.

Comment by donatas-luciunas on [deleted post] 2024-11-04T17:50:57.351Z

I am sorry you feel that way. I replied in the other thread; I hope this fills the gaps.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2024-11-04T17:49:19.293Z · LW · GW

And here you face Pascal's Wager.

I agree that you can refute Pascal's Wager with an anti-Pascal's Wager. But if you evaluate all wagers and anti-wagers, you are left with power seeking. It is always better to have more power. Don't you agree?

Comment by donatas-luciunas on [deleted post] 2024-11-04T17:41:18.174Z

No problem, tune changed.

But I don't agree that this explains why I get downvotes.

Please feel free to take a look at my last comment here.

Comment by donatas-luciunas on [deleted post] 2024-11-04T17:30:02.321Z

First of all - respect 🫡

A person from nowhere making short and strong claims that run counter to so much wisdom. Must be wrong. Can't be right.

I understand the prejudice. And I don't know what I can do about it. To be honest, that's why I come here and not to the media: because I expect at least a little attention to reasoning instead of "this does not align with the opinion of the majority". That's what scientists do, right?

It's not my job to prove you wrong either. I'm not writing here because I want to achieve academic recognition; I'm writing here because I want to survive. And I have a very good reason to doubt my survival because of the poor work you and other AI scientists do.

They don't. Really, really don't.

 there is no necessary causal link between steps three and four

I don't agree. But if you have already read my posts and comments, I'm not sure how else I can explain this so you would understand. But I'll try.

People are very inconsistent when dealing with unknowns:

  • unknown = doesn't exist. For example, the presumption of innocence
  • unknown = ignored. For example, you choose a restaurant on Google Maps and don't care whether there are restaurants not listed there
  • unknown = exists. For example, security systems interpret not only a breach signal but also the absence of a signal as a breach

And that's probably the root cause of our argument here. There is no scientifically recognized and widespread way to deal with unknowns → the fact-value distinction emerges to resolve tensions between science and religion → AI scientists take the fact-value distinction as an unquestionable truth.

If I speak with philosophers, they understand the problem, but don't understand the significance. If I speak with AI scientists, they understand the significance, but don't understand the problem.

The problem: the fact-value distinction does not apply to agents (human or AI). Every agent is trapped with the observation "there might be value" (just like "I think, therefore I am"). And an intelligent agent can't ignore it; it tries to find value, it tries to maximize value.

It's like a built-in utility function. LessWrong seems to understand that an agent cannot ignore its utility function. But LessWrong assumes that we can assign value = x. An intelligent agent will eventually understand that value does not necessarily = x. Value might be something else, something unknown.

I know that this is difficult to translate into technical language; I can't point to a line of code that creates this problem. But the problem exists: intelligence and goal are not separate things. And nobody speaks about it.

Comment by Donatas Lučiūnas (donatas-luciunas) on God vs AI scientifically · 2024-11-04T15:43:01.288Z · LW · GW

What is the probability if there is no evidence?

Comment by donatas-luciunas on [deleted post] 2024-11-04T15:36:19.845Z

You aren't seeing that because you haven't engaged with the source material or the topic deeply enough.

Possible. Also possible that you don't understand.

Comment by donatas-luciunas on [deleted post] 2024-11-04T15:33:26.669Z

It's got to plan to make some paperclips before the heat death of the universe, right?

Yes, probably. Unless it finds out that it is in a simulation or that parallel universes exist, and finds a way to escape before the heat death happens.

does this change the alignment challenge at all?

If we can't make a paperclip maximizer that actually makes paperclips, how can we make a human assistant / protector that actually assists / protects humans?

Comment by donatas-luciunas on [deleted post] 2024-11-04T15:28:22.656Z

Strawman.

Perhaps perpetual preparation and resource accumulation really is the logical response to fundamental uncertainty.

Is this a clever statement? And if so, why does LessWrong downvote it so much?