Comments
Claude has probably read that material, right? If it finds my observations unique and serious, then maybe they are unique and serious? I'll share another chat next time.
How can I put in little effort but still be perceived as someone worth listening to? I thought about announcing a monetary prize for anyone who could find an error in my reasoning 😅
:D ok
What makes you think that these "basic sources" you want me to read are not dogmatic? You make the same mistake: you say that I should work on my logic while not being sound in yours.
So please pick a position. You said that many people argue that intelligence and goals are coupled. And now you say that I should read more to understand why intelligence and goals are not coupled. My respect goes down.
How can I constructively question something that this community considers unquestionable? I am either ignored or hated.
Nobody has given me a good counterargument or a good source. All I hear is "we don't question these assumptions here".
There is a scene in Idiocracy where people are starving because the crops don't grow, because they water them with a sports drink. The protagonist asks them why they do that, since plants need water, not sports drink. And they just answer "sports drink is better". No doubt, no reason, only confident dogma. That's how I feel.
I don't believe you. Give me a single recognized source that talks about the same problem I do. Why is the Orthogonality Thesis considered true, then?
Me too.
I am sorry you feel that way. I replied in the other thread; I hope that fills the gaps.
And here you face Pascal's Wager.
I agree that you can refute Pascal's Wager with an anti-Pascal's Wager. But if you evaluate all wagers and anti-wagers, you are left with power seeking. It is always better to have more power. Don't you agree?
No problem, tune changed.
But I don't agree that this explains why I get downvotes.
Please feel free to take a look at my last comment here.
First of all - respect 🫡
A person from nowhere making short and strong claims that run counter to so much wisdom. Must be wrong. Can't be right.
I understand the prejudice. And I don't know what I can do about it. To be honest, that's why I come here and not to the media: I expect at least a little attention to reasoning instead of "this does not align with the opinion of the majority". That's what scientists do, right?
It's not my job to prove you wrong either. I'm not writing here because I want academic recognition; I'm writing here because I want to survive. And I have a very good reason to doubt my survival because of the poor work you and other AI scientists do.
They don't. Really, really don't.
there is no necessary causal link between steps three and four
I don't agree. But if you have already read my posts and comments, I'm not sure how else I can explain this so that you would understand. I'll try, though.
People are very inconsistent when dealing with unknowns (a toy sketch in code follows the list):
- unknown = doesn't exist. For example, the presumption of innocence.
- unknown = ignored. For example, you choose a restaurant on Google Maps and don't care whether there are restaurants not listed there.
- unknown = exists. For example, security systems interpret not only a breach signal but also the absence of a signal as a breach.
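A toy sketch of those three defaults, in code (my own illustration; the functions and values are invented, not taken from any source):

```python
# Toy illustration: three inconsistent default policies for the same kind
# of unknown, mirroring the three bullets above.

def courtroom(evidence_of_guilt):
    # unknown = doesn't exist: no evidence of guilt is treated as innocence
    return "guilty" if evidence_of_guilt else "innocent"

def pick_restaurant(listed):
    # unknown = ignored: restaurants not in the list never enter the choice
    return max(listed, key=lambda r: r["rating"])["name"]

def security_monitor(signal):
    # unknown = exists: a missing signal is itself treated as a breach
    return "alarm" if signal in ("breach", None) else "ok"

print(courtroom(evidence_of_guilt=False))                       # innocent
print(pick_restaurant([{"name": "A", "rating": 4.2},
                       {"name": "B", "rating": 3.9}]))          # A
print(security_monitor(None))                                   # alarm
```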
And that's probably the root cause of our argument here. There is no scientifically recognized and widespread way to deal with unknowns → the fact-value distinction emerges to resolve the tension between science and religion → AI scientists take the fact-value distinction as an unquestionable truth.
If I speak with philosophers, they understand the problem, but don't understand the significance. If I speak with AI scientists, they understand the significance, but don't understand the problem.
The problem: the fact-value distinction does not apply to agents (human or AI). Every agent is trapped with the observation "there might be value" (just as with "I think, therefore I am"). An intelligent agent can't ignore it: it tries to find value, it tries to maximize value.
It's like a built-in utility function. LessWrong seems to understand that an agent cannot ignore its utility function. But LessWrong assumes that we can assign value = x. An intelligent agent will eventually understand that value does not necessarily equal x. Value might be something else, something unknown.
I know this is difficult to translate into technical language; I can't point to a line of code that creates this problem. But the problem exists - intelligence and goals are not separate things. And nobody speaks about it.
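Still, here is a toy model of the tension I mean (entirely my own; the probabilities and payoffs are invented for illustration):

```python
# Toy model: an agent whose assigned value is "paperclips" but which keeps
# some credence that the true value is something unknown.

def expected_value(action, p_value_unknown=0.1):
    # Payoff if the assigned value (paperclips) really is the true value.
    assigned = {"make_paperclips": 1.0, "accumulate_power": 0.2}[action]
    # Payoff if the true value is something unknown: power/optionality helps
    # with almost any value, so it scores higher in that branch.
    if_unknown = {"make_paperclips": 0.0, "accumulate_power": 1.0}[action]
    return (1 - p_value_unknown) * assigned + p_value_unknown * if_unknown

for action in ("make_paperclips", "accumulate_power"):
    print(action, expected_value(action))
# With these invented numbers paperclips still win (0.9 vs 0.28); my claim is
# that once the unknown-value branch is treated as unbounded, power wins.
```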
What is the probability if there is no evidence?
You aren't seeing that because you haven't engaged with the source material or the topic deeply enough.
Possible. Also possible that you don't understand.
It's got to plan to make some paperclips before the heat death of the universe, right?
Yes, probably. Unless it finds out that it is in a simulation, or that parallel universes exist, and finds a way to escape before the heat death happens.
does this change the alignment challenge at all?
If we can't make a paperclip maximizer that actually makes paperclips, how can we make a human assistant / protector that actually assists / protects humans?
Strawman.
Perhaps perpetual preparation and resource accumulation really is the logical response to fundamental uncertainty.
Is this a clever statement? And if so, why does LessWrong downvote it so much?
You put a lot of effort in here, and I appreciate that.
But again I feel that you (as well as the LessWrong community) are blind. I am not saying that your work is stupid; I'm saying that it is built on stupid assumptions. And you are so obsessed with your deep work in the field that you are unable to see that the foundation has holes.
You are certainly not the first to tell me that I'm wrong. I invite you to be the first one to actually prove it. And I bet you won't be able to do it.
This is flatly, self-evidently untrue.
It isn't.
When you hear "AI will believe in God", you say that AI is NOT comparable to humans.
When you hear "AI will seek power forever", you say that AI IS comparable to humans.
The hole in the foundation I'm talking about: AI scientists assume that there is no objective goal. All your work and reasoning stands if you start with this assumption. But why should you assume that? We know that there are unknown unknowns. It is possible that an objective goal exists but we have not found it yet (just like aliens, unicorns, or other black swans). Once you understand this, all my posts will start making sense.
And here we disagree. I believe that downvotes should be used for wrong or misleading content, not for content you don't understand.
a paperclip maximizer that devotes 100% of resources to self-preservation has, by its own choice, failed utterly to achieve its own objective
Why do you think so? Has a teenager who has not yet earned any money in his life utterly failed his objective of earning money?
It may want to be a paperclip maximizer, it may claim to be one, it may believe it is one, but it simply isn't.
Here we agree. This is exactly what I'm saying - a paperclip maximizer will not maximize paperclips.
It's because your style of writing is insulting, inflammatory, condescending, and lacks sufficient attention to its own assumptions and reasoning steps.
I tried to be polite and patient here, but it didn't work, so I'm trying new strategies now. I'm quite sure my reasoning is stronger than the reasoning of the people who don't agree with me.
I find "your communication was not clear" a bit funny. You are scientists, you are super observant, but you don't notice a problem when it is screamed at your face.
write in a way that shows you're open to being told about things you didn't consider or question in your own assumptions and arguments
I can assure you that I'm super open. But let's focus on the arguments. I found your first sentence unreasonable, and the rest was unnecessary.
It looks like you are measuring smartness by how much the AI's opinion aligns with the LessWrong community? The AI gave expected answers - great model! The AI gave an unexpected answer - dumb model!
how much the AI values the expected end-state of having-taken-over
I'd like to share my concern that this value will be infinite for every AGI. That's because it is not reasonable to limit decision making to known alternatives with known probabilities. It is more reasonable to accept the existence of unknown unknowns.
You can find my thoughts in more detail here https://www.lesswrong.com/posts/5XQjuLerCrHyzjCcR/rationality-vs-alignment
I agree. But I want to highlight that the goal is irrelevant to the behavior. Even if the goal is "don't seek power", the AGI would still seek power.
Absence of evidence, in many cases, is evidence (not proof, but updateable Bayesian evidence) of absence.
This conflicts with Gödel's incompleteness theorems, Fitch's paradox of knowability, and black swan theory.
The very concept of an experiment relies on this principle.
And this is exactly what scares me - people who work with AI hold beliefs that are not scientific. I consider this an existential risk.
You may believe so, but an AGI would not believe so.
Thanks to you too!
I AM curious if you have any modeling more than "could be anything at all!" for the idea of an unknown goal.
No.
I could say - Christian God or aliens. And you would say - bullshit. And I would say - argument from ignorance. And you would say - I don't have time for that.
So I won't say.
We can approach this from a different angle. Imagine an unknown goal that, according to your beliefs, an AGI would really care about. And accept that there is a possibility that it exists. Absence of evidence is not evidence of absence.
If killing itself / allowing itself to be replaced leads to more expected paperclips than clinging to life does, it will do so.
I agree, but this misses the point.
What would change your opinion? This is not the first time we have had a discussion, and I don't feel you are open to my perspective. I am concerned that you may be overlooking the possibility that this is an argument-from-ignorance fallacy.
Hm. How many paperclips would be enough for the maximizer to kill itself?
It seems to me that you don't hear me...
- I claim that the utility function is irrelevant.
- You claim that the utility function could ignore improbable outcomes.
I agree with your claim. But it seems to me that your claim is not directly related to mine. Self-preservation is not part of the utility function (instrumental convergence). How can you affect it?
OK, so using your vocabulary, I think that's the point I want to make - alignment is a physically impossible behavioral policy.
I elaborated a bit more here: https://www.lesswrong.com/posts/AdS3P7Afu8izj2knw/orthogonality-thesis-burden-of-proof?commentId=qoXw7Yz4xh6oPcP9i
What do you think?
Thank you for your comment! It is super rare for me to get such a reasonable reaction in this community, you are awesome 👍😁
there could be a line of code in an agent-program which sets the assigned EV of outcomes premised on a probability which is either <0.0001% or 'unknown' to 0.
I don't think that is possible; could you help me understand how it would be possible? This conflicts with Recursive Self-Improvement, doesn't it?
Why do you think it is rational to ignore tiny probabilities? I don't think you can make a maximizer ignore tiny probabilities. And some probabilities are not tiny but unknown (black swans); why do you think it is rational to ignore those? In my opinion, ignoring self-preservation contradicts the maximizer's goal. I understand that this is the popular opinion, but it is not proven in any way. The opposite (focusing on self-preservation instead of paperclips) has a logical proof (Pascal's wager).
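Here is a small sketch of the choice I mean (my own numbers; "UNBOUNDED" is just a large stand-in, not a real quantity):

```python
# Illustrative sketch: how a "clip tiny/unknown probabilities to zero" rule
# changes a maximizer's choice.

UNBOUNDED = 10**12  # stand-in for "all future paperclips if it survives"

outcomes = {
    "make_paperclips_now": [(1.0, 1)],                   # one sure paperclip
    "invest_in_self_preservation": [(1e-6, UNBOUNDED)],  # tiny/unknown chance it matters
}

def expected_paperclips(action, clip_below=None):
    total = 0.0
    for p, payoff in outcomes[action]:
        if clip_below is not None and p < clip_below:
            p = 0.0                                       # the proposed rule
        total += p * payoff
    return total

for rule in (None, 1e-4):
    print("clip =", rule,
          {a: expected_paperclips(a, rule) for a in outcomes})
# Without clipping, the tiny-probability / huge-payoff branch dominates;
# with clipping it vanishes. My question is why a self-improving maximizer
# would keep such a clipping rule rather than remove it.
```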
A maximizer can use robust decision making (https://en.wikipedia.org/wiki/Robust_decision-making) to deal with many contradictory choices.
I don't think your reasoning is mathematical. The worth of survival is infinite, and we have a situation analogous to Pascal's wager. Why do you think the maximizer would reject Pascal's logic?
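Written out as a wager (my own framing; $N$, $c$, $p$ and $V$ are symbols I am introducing, and I assume for simplicity that preparing fully neutralizes the threat):

$$EV(\text{ignore the threat}) = N + (1-p)\,V$$
$$EV(\text{prepare first}) = (N - c) + V$$
$$EV(\text{prepare}) - EV(\text{ignore}) = p\,V - c$$

where $N$ is the paperclips made now, $c$ the paperclips sacrificed to prepare, $p > 0$ the credence that an unknown threat otherwise destroys the maximizer, and $V$ the value of all future paperclips. If $V$ is unbounded (which is how I read "the worth of survival is infinite"), then for any fixed $p > 0$ the difference $p\,V - c$ grows without bound - exactly the Pascal-style structure.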
Building one paperclip could EASILY increase the median and average number of future paperclips more than investing one paperclip's worth of power into comet diversion.
Why do you think so? There will be no paperclips if the planet and the maximizer are destroyed.
The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.
It basically says that intelligence and goals are independent.
Images from A caveat to the Orthogonality Thesis.
Whereas I claim that any intelligence capable of understanding "I don't know what I don't know" can only seek power (alignment is impossible).
the ability of an AGI to have arbitrary utility functions is orthogonal (pun intended) to what behaviors are likely to result from those utility functions.
As I understand it, you are saying that there are goals on one axis and behaviors on the other. I don't think the Orthogonality Thesis is about that.
Instead of "objective norm" I'll use a word "threat" as it probably conveys the meaning better. And let's agree that threat cannot be ignored by definition (if it could be ignored, it is not a threat).
How can an agent ignore a threat? How can an agent ignore something that cannot be ignored by definition?
How would you defend this point? I probably lack the domain knowledge to articulate it well.
The Orthogonality Thesis states that an agent can have any combination of intelligence level and final goal
I am concerned that higher intelligence will inevitably converge to a single goal (power seeking).
Or would you keep doing whatever you want, and let the universe worry about its goals?
If I am intelligent, I avoid punishment, and therefore I produce paperclips.
By the way, I don't think the Christian "right" is an objective "should".
It seems to me that you are simultaneously saying that the agent cares about "should" (it optimizes blindly toward any given goal) and that it does not care about "should" (it can ignore objective norms). How do these fit together?
It's entirely compatible with benevolence being very likely in practice.
Could you help me understand how that is possible? Why should an intelligent agent care about humans instead of defending against unknown threats?
As I understand it, your position is "AGI is most likely doom". My position is "AGI is definitely doom". 100%. And I think I have a flawless logical proof. But it is on a philosophical level, and many people seem to downvote me without understanding it 😅 Long story short, my proposition is that all AGIs will converge to a single goal - seeking power endlessly and uncontrollably. And I base this proposition on the fact that "there are no objective norms" is not a reasonable assumption.
Let's say there is an objective norm. Could you help me understand how an intelligent agent could prefer anything else over that objective norm? As I mentioned previously, that seems to me incompatible with being intelligent. If you know what you must do, it is stupid not to do it. 🤔
I think you mistakenly see me as a typical "intelligent = moral" proponent. To be honest, my reasoning above leads me to a different conclusion: intelligent = uncontrollably power-seeking.
Could you read my comment here and let me know what you think?
I am familiar with this line of thinking, but I find it flawed. Could you please read my comment here and let me know what you think?
No. I understand that the purpose of the Orthogonality Thesis was to say that an AGI will not automatically be good or moral. But the current definition is broader - it says that AGI is compatible with any want. I do not agree with that part.
Let me share an example. An AGI could ask itself - are there any threats? And once the AGI understands that there are unknown unknowns, the answer to this question is "I don't know". A threat cannot be ignored by definition (if it could be ignored, it would not be a threat). As a result, the AGI focuses on threat minimization forever (not on the given want).
Even if there is such thing as "objective norms/values", the agent can simply choose to ignore them.
Yes, but this would not be an intelligent agent in my opinion. Don't you agree?
Why do you think it is possible to align an AGI? It is known that an AGI will prioritize self-preservation, and it is also known that unknown threats may exist (black swan theory). Why should an AGI care about human values? It seems like a waste of time in terms of threat minimization.
As I understand it, you are trying to prove your point by analogy with humans: if humans can pursue more or less any goal, a machine could too. But while we agree that a machine can have any level of intelligence, humans occupy quite a narrow spectrum. Therefore your reasoning by analogy is invalid.
OK, so you agree that the credibility is greater than zero, in other words that it is possible. So isn't this a common assumption? I argue that all minds will share this idea: the existence of a fundamental "ought" is possible.
Do I understand correctly that you do not agree with this?
Because, according to Hitchens's razor, any proposition remains possible while it has not been disproved.
Could you share reasons?