I'm a transhumanist. I believe in morphological freedom. If someone wants to change sex, that's a valid desire that Society should try to accommodate as much as feasible given currently existing technology. In that sense, anyone can choose to become trans.
The problem is that the public narrative of trans rights doesn't seem to be about making a principled case for morphological freedom, or engaging with the complicated policy question of what accommodations are feasible given the imperfections of currently existing technology. Instead, we're told that everyone has an internal sense of their own gender, which for some people (who "are trans") does not match their assigned sex at birth.
Okay, but what does that mean? Are the things about me that I've been attributing to autogynephilia actually an internal gender identity, or did I get it right the first time? How could I tell? No one seems interested in clarifying!
actually growing up in Seattle my experience has been that people's narratives of trans rights are in fact making a pretty principled case for both morphological freedom and some kind of more abstract self-labelling freedom. which you can see in how big like, nonbinary and agender self-identifications are, and also a heavy overlap between online trans communities and eg DID and furry communities. so maybe this is just a problem with your generation, or something?
Yes, the point of the proof isn't that the sane pure bets condition and the weak indifference condition are the be-all and end-all of corrigibility. But using the proof's result, I can notice that your AI will be happy to bet a million dollars against one cent that the shutdown button won't be pressed, which doesn't seem desirable. It's effectively willing to burn arbitrary amounts of utility, if we present it with the right bets.
Ideally, a successful solution to the shutdown problem should violate one or both of these conditions in clear, limited ways which don’t result in unsafe behavior, or which result in suboptimal behavior whose suboptimality falls within well-defined bounds. Rather than guessing-and-checking potential solutions and being surprised when they fail to satisfy both conditions, we should look specifically for non-sane-pure-betters and non-intuitively-indifferent-agents which nevertheless behave corrigibly and desirably.
If we implement your example, the AI is willing to bet at arbitrarily poor odds that the on switch will be on, thus violating the sane pure bets condition.
You can have particular decision problems or action spaces that don't have the circular property of the Northland-Southland problem, but the fact remains that if an AI fulfills the weak indifference condition reliably, it must violate the sane pure bets condition in some circumstances. There must be insane bets that it's willing to take, even if no such bets are available in a particular situation.
Basically, rather than thinking about an AI in a particular scenario, the proof is talking about conditions that it's impossible for an AI to fulfill in all scenarios.
I could construct a trivial decision problem where the AI only has one action it can take, and then the sane pure bets condition and weak indifference condition are both irrelevant to that decision problem. But when we place the same AI in different scenarios, there must exist some scenarios where it violates at least one of the conditions.
After hearing the problem, the question I asked myself was: at what odds would I bet that the coin came up heads? And the answer is that I would have a neutral expected return betting at 2:1 odds. This lines up with the Bayesian answer of P(heads) = 1/3.
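A minimal sketch of that arithmetic, assuming the usual Sleeping Beauty setup (a fair coin, one awakening on heads, two on tails), which the comment does not spell out. The function names and the unit stake are illustrative choices, not part of the original argument.

```python
from fractions import Fraction

# Assumed setup (not stated in the comment): a fair coin,
# one awakening on heads, two awakenings on tails.
P_HEADS = Fraction(1, 2)
AWAKENINGS = {"heads": 1, "tails": 2}

def heads_fraction_of_awakenings():
    """Expected share of awakenings that follow a heads flip."""
    heads = P_HEADS * AWAKENINGS["heads"]
    tails = (1 - P_HEADS) * AWAKENINGS["tails"]
    return heads / (heads + tails)

def expected_return(payout_on_heads, stake=1):
    """Expected profit per experiment when staking `stake` on heads at every awakening.

    A win pays `payout_on_heads` per unit staked; a loss forfeits the stake.
    """
    win = P_HEADS * AWAKENINGS["heads"] * payout_on_heads * stake
    loss = (1 - P_HEADS) * AWAKENINGS["tails"] * stake
    return win - loss

print(heads_fraction_of_awakenings())  # 1/3
print(expected_return(2))              # 0
```

Under these assumptions, a 2:1 payout on heads is exactly break-even, which matches P(heads) = 1/3 per awakening.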
I strongly disagree that this was the point of this in TWC and would be highly surprised if Eliezer agreed with you. For one thing, the parties involved in nonconsensual sex in TWC seem to be having a perfectly fine time. I also wouldn't be surprised if someone raping an Ancient such that they have a terrible awful no-good time would fall under some other crime and still get the perpetrator arrested.
Conjure an IQ test and take it, obviously. My IQ when dreaming ranges from greenish-purple to twelveteen o'clock.
-Eliezer Yudkowsky trims his beard using Solomonoff Induction.
-Eliezer Yudkowsky, and only Eliezer Yudkowsky, possesses quantum immortality.
-Eliezer Yudkowsky once persuaded a superintelligence to stay inside of its box.
"Different Minds (You're Concepts Formed Differently From Mine)" should probably be "Different Minds (Your Concepts Formed Differently From Mine)."
The Philosopher's Polar North (can also be translated as The Philosopher's Apex) [Nhato Remix]
https://www.youtube.com/watch?v=pc4zZ43R9o0
Turn on captions to see the lyrics and their English translation. The song says a lot about searching for truth and knowledge that I find powerful.
Some excerpts from the translated lyrics:
Accepting even those facts I have denied, <before time melts my memories away>
I absentmindedly lift “truth” from an uneven distribution <as a god might guide fate>
Even my current knowledge is still uncertain, swaying,
so with my current knowledge I should be able to keep seeking
When I break out of this shell I’ll reach out over everything
Everything in this world is in my mind; everything, is being drawn in
Everything that has been represented with a model has form,
in this infinite space inside my mind, can move freely
Everything is recorded in the observer’s eyes
Even the deficiencies in my imagination can be immediately supplemented by literature,
and so my descent continues –
Until the endless simulations converge into one
if there must exist “things that cannot be proven”,
I’ll just know everything there is
I think this might be a decent example of "rationalist" music. In particular, the lyrics communicate the value of seeking knowledge and discerning truth. There are parts that I disagree with, but overall I think it's pretty great for a Touhou soundtrack cover made by non-rationalists. Turn on captions to see the lyrics and their English translation.
The Philosopher's Polar North [Nhato Remix]
https://www.youtube.com/watch?v=pc4zZ43R9o0
I'll paste the translated lyrics here:
Accepting even those facts I have denied, <before time melts my memories away>
I absentmindedly lift “truth” from an uneven distribution <as a god might guide fate>
Even if all these impractical theories only represent my disbelief of the status quo,
there is no “normal” that stays certain or consistent over time
Deciding based on experience is setting yourself into a foolish worldly restriction,
even with careful consideration you cannot break the thick walls of your narrow views
Even my current knowledge is still uncertain, swaying,
so with my current knowledge I should be able to keep seeking
When I break out of this shell I’ll reach out over everything
Everything in this world is in my mind; everything, is being drawn in
Everything that has been represented with a model has form,
in this infinite space inside my mind, can move freely
Everything is recorded in the observer’s eyes
Even the deficiencies in my imagination can be immediately supplemented by literature,
and so my descent continues –
Until the endless simulations converge into one
If the “end of times” is but a definition, I’ll keep chasing after it forever
Negating even established models, <before I am overcome with the consciousness of sin>
I can go wherever I please, even as I await retaliation <as when a god guarantees freedom>
With my current knowledge, I will not deny anything
if with my current knowledge I can still keep seeking
The observer will cover over everything
Everything in this world is in my heart; everything, I can touch
No longer will this be contained in just the replicas that have been created,
this infinite space inside my mind is already “another heaven”
The love of the observer will surely reach all
Everything I find wrong with the real world is immediately supplemented with literature,
and so my descent continues –
Until my endless ideals converge with reality
If there must exist “things that cannot be proven”,
I’ll just know everything there is
Roads without roads, questions without answers, turning the page the next is blank
Eventually, questions without questions, time without flow, just by walking towards a wall
There is only, impulse without intention, grasping at sand, it’s just another “Myth of Sisyphus”
That is, meaning without meaning, a predetermined time, zero degrees at the polar north
the tip of knowledge’s
Roads without roads, questions without answers, turning the page the next is gone
Eventually, questions without questions, time without flow, running into a wall, standing still
There is only, impulse without intention, grasping at sand, it’s just another “Myth of Sisyphus”
That is, meaning without meaning, a predetermined time, even if it is
zero degrees at the polar north
I agree that in many examples, like the simple risk/reward decisions shown here, certainty does not give an option higher utility. However, there are situations in which it might be advantageous to make a decision that has a worse expected outcome, but is more certain. The example that comes to mind is complex plans that involve many decisions which affect each other. There is a computational cost associated with uncertainty, in that multiple possible outcomes must be considered in the plan; the plan "branches." Certainty simplifies things. For an agent with limited computing power in a situation where there is a cost associated with spending time on planning, this might be significant.
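A toy illustration of the branching point, with entirely made-up numbers. It assumes each uncertain decision has a fixed number of possible outcomes and that planning costs a fixed amount per scenario considered; both the cost model and the payoffs are hypothetical.

```python
def scenarios_to_consider(outcomes_per_decision, num_decisions):
    """Number of distinct outcome sequences a planner must consider."""
    return outcomes_per_decision ** num_decisions

def net_value(expected_payoff, outcomes_per_decision, num_decisions, cost_per_scenario):
    """Expected payoff minus a (hypothetical) planning cost per scenario."""
    scenarios = scenarios_to_consider(outcomes_per_decision, num_decisions)
    return expected_payoff - cost_per_scenario * scenarios

# Hypothetical numbers: the uncertain plan has the higher expected payoff,
# but each of its 5 decisions can go 2 ways, so it branches into 32 scenarios.
uncertain = net_value(expected_payoff=100, outcomes_per_decision=2,
                      num_decisions=5, cost_per_scenario=1)
certain = net_value(expected_payoff=90, outcomes_per_decision=1,
                    num_decisions=5, cost_per_scenario=1)

print(uncertain)  # 68
print(certain)    # 89
```

With these numbers the certain plan comes out ahead once planning cost is counted, even though its expected payoff is lower. Whether that happens in practice depends on how expensive planning really is.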