Comments
(This comment is written in the ChatGPT style because I've spent so much time talking to language models.)
Calculating the probabilities
The calculation of the probabilities consists of the following steps:
The epistemic split
Either we guessed the digit correctly (probability 1/10; branch A), or we didn't (probability 9/10; branch B).
The computational split
On branch A, all of your measure survives (branch A1) and none dies (branch A2); on branch B, a fraction s of your measure survives (branch B1) and the rest, 1 − s, dies (branch B2), with s < 1.
Putting it all together
Conditional on us subjectively surviving (which QI guarantees), the probability we guessed the digit correctly is, by Bayes' theorem, P(correct | survived) = (1/10 · 1) / (1/10 · 1 + 9/10 · s), which is greater than 1/10 whenever s < 1.
The probability of us having guessed the digit correctly prior to us surviving is, of course, just 1/10.
Verifying them empirically
For the probabilities to be meaningful, they need to be verifiable empirically in some way.
Let's first verify that prior to us surviving, the probability of us guessing the digit correctly is 1/10. We'll run many experiments, guessing a digit each time and instantly verifying it. We'll learn that we're successful, indeed, just 1/10 of the time.
Let's now verify that conditional on us surviving, the probability of having guessed correctly is the higher, post-survival value. We perform the experiment many times again, and this time, every time we survive, other people check whether the guess was correct. They will observe that, among the runs we survive, we guessed correctly more often than 1/10 of the time, matching the conditional probability above.
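Here is a minimal Monte Carlo sketch of both checks, assuming we model a run as: guess a decimal digit (correct with probability 1/10) and, on a wrong guess, survive only with some fraction of measure. The parameter name S_WRONG, its value, and the run count are illustrative placeholders, not parameters from the original setup.

```python
import random

# Minimal Monte Carlo sketch of both checks above. The survival fraction on a
# wrong guess (S_WRONG) and the run count are illustrative placeholders.

N_RUNS = 1_000_000
S_WRONG = 0.01  # hypothetical fraction of measure surviving a wrong guess

correct_total = 0
survived = 0
correct_given_survived = 0

for _ in range(N_RUNS):
    correct = random.random() < 0.1                  # 1/10 chance of guessing the digit
    alive = correct or (random.random() < S_WRONG)   # wrong guess: survive only with S_WRONG
    correct_total += correct
    if alive:
        survived += 1
        correct_given_survived += correct

print("P(correct), unconditional:", correct_total / N_RUNS)             # ~0.1
print("P(correct | survived):   ", correct_given_survived / survived)   # ~0.1 / (0.1 + 0.9 * S_WRONG)
```

The first printed number sits near 1/10; the second sits near (1/10) / (1/10 + 9/10 · s), which is the conditional probability from the calculation above.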
Conclusion
We arrived at the conclusion that the probability jumps at the moment of our awakening. That might sound incredibly counterintuitive, but since it's verifiable empirically, we have no choice but to accept it.
Since that argument doesn't give any testable predictions, it cannot be disproved.
The argument that we cease to exist every time we go to sleep also can't be disproved, so I wouldn't personally lose much sleep over that.
I don't know about similarity... but I was just making a point that QI doesn't require it.
When you die, you die.
The interesting part of QI is that the split happens at the moment of your death. So the state-machine-which-is-you continues being instantiated in at least one world. The idea of your consciousness surviving a quantum suicide doesn't rely on it continuing in implementations of similar state machines, merely in the causal descendant of the state machine which you already inhabit.
It's like your brain being duplicated, but those other copies are never woken up and are instantly killed. Only one copy is woken up. Which guarantees that prior to falling asleep, you can be confident you will wake up as that one specific copy.
There is no alternative to this, unless we hold that personal identity requires something other than the continuity of pattern.
Yes. If I relied on losing a bet and someone knew that, their offering me the bet (and therefore the loss) would make me wary that something would unpredictably go right, I'd win, and my reliance on losing the bet would be thwarted.
If I meet a random person who offers to give me $100 now and claims that later, if it's not proven that they are the Lord of the Matrix, I don't have to pay them $15,000, most of my probability mass on "this will end badly" won't be on "they are the Lord of the Matrix." I don't have the same set of worries here, but the worry remains.
I use Google Chrome on Ubuntu Budgie and it does look to me like both the font and the font size changed.
Character AI used to be extremely good back in December 2022/January 2023, with the bots being very helpful, complex and human-like, rather than exacerbating psychological problems in a very small minority of users. As months passed and the user base grew exponentially, the models were gradually simplified to keep up.
Today, their imperfections are obvious, but many people mistakenly interpret them as the models being too human-like (and therefore harmful), rather than too oversimplified while still passing for an AI (and therefore harmful).
I think we're spinning on an undefined term. I'd bet there are LOTS of details that affect my perception in subtle and aggregate ways which I don't consciously identify.
You're equivocating between perceiving a collection of details and consciously identifying every separate detail.
If I show you a grid of 100 pixels, then (barring imperfect eyesight) you will consciously perceive all 100 of them. But you will not consciously identify every individual pixel unless your attention is aimed at each pixel in a for loop (which would take longer than consciously perceiving the entire grid at once).
There are lots of details that affect your perception that you don't consciously identify. But there is no detail that affects your perception that wouldn't be contained in your consciousness (otherwise it, by definition, couldn't affect your perception).
Computability shows that you can have a classical computer that has the same input/output behavior
That's what I mean (I'm talking about the input/output behavior of individual neurons).
Input/Output behavior is generally not considered to be enough to guarantee same consciousness
It should be, because it is, in fact, enough. (However, neither the post, nor my comment require that.)
Eliezer himself argued that GLUT isn't conscious.
Yes, and that's false (but since that's not the argument in the OP, I don't think I should get sidetracked).
But nonetheless, if the only formalized proposal for consciousness doesn't have the property that simulations preserve consciousness, then clearly the property is not guaranteed.
That's false. If we assume for a second that IIT really is the only formalized theory of consciousness, it doesn't follow that the property is not, in fact, guaranteed. It could also be that IIT is wrong and that in the actual reality, the property is, in fact, guaranteed.
so the idea is that you can describe the brain by treating each neuron as a little black box about which you just know its input/output behavior, and then describe the interactions between those little black boxes. Then, assuming you can implement the input/output behavior of your black boxes with a different substrate (i.e., an artificial neuron)
This is guaranteed, because the universe (and any of its subsets) is computable (that means a classical computer can run software that acts the same way).
And there are orders of magnitude more detail going on in my body (and even just in my brain) than I perceive, let alone that I communicate.
There are no sentient details going on that you wouldn't perceive.
It doesn't matter whether you communicate something; the important part is that you are capable of communicating it, which means that it changes your input/output pattern (if it didn't, you wouldn't be capable of communicating it even in principle).
Circular arguments that "something is discussed, therefore that thing exists"
This isn't the argument in the OP (even though, when reading quickly, I can see how someone could get that impression).
(Because of the Hayflick limit, only some cell lines can go on indefinitely.)
If the SB always guesses heads, she'll be correct 1/3 of the time. For that reason, that is her credence.
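A minimal sketch of that frequency count, assuming the standard setup (heads: one awakening, tails: two awakenings); the trial count is arbitrary:

```python
import random

# Per-awakening frequency with which "heads" is the right guess in the
# standard Sleeping Beauty setup (heads: one awakening, tails: two).
N_TRIALS = 100_000
correct_awakenings = 0
total_awakenings = 0

for _ in range(N_TRIALS):
    heads = random.random() < 0.5
    n_awakenings = 1 if heads else 2
    total_awakenings += n_awakenings
    if heads:
        correct_awakenings += n_awakenings  # she says "heads" at every awakening

print(correct_awakenings / total_awakenings)  # ~1/3
```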
Are the ‘AI companion’ apps, or robots, coming? I mean, yes, obviously?
The technology for bots who are "better" than humans in some way (constructive, pro-social, compassionate, intelligent, caring interactions while thinking 2 levels meta) has been around since 2022. But the target group wouldn't pay enough for GPT-4-level inference, so current human-like bots are significantly downscaled compared to what technology allows.
To consciously take in a piece of information, you don't have to store any bits - you only have to map the correct input to the correct output. (By logical necessity, any transformation that preserves the input/output relationship preserves consciousness.)
Unless you can summarize your argument in at most 2 sentences (with evidence), it's completely ignorable.
This is not how learning any (even slightly complex) topic works.
When I skipped my medication whose withdrawal symptom is strong anxiety, my brain always generated a nightmare to go along with the anxiety, working backwards in the same way.
Edit: Oh, never mind, that's not what you mean.
That wouldn't help. Then the utility would be calculated from (getting two golden bricks) and (murdering my child for a fraction of a second), which still brings lower utility than not following the command.
The set of possible commands for which I can't be maximally rewarded still remains too vast for the statement to be meaningful.
I see your argument. You are saying that "maximal reward", by definition, is something that gives us the maximum utility from all possible actions, and so, by definition, it is our purpose in life.
But actually, utility is a function of both the reward (getting two golden bricks) and what it rewards (murdering my child), not merely a function of the reward itself (getting two golden bricks).
And so it happens that for many possible demands that I could be given ("you have to murder your child"), there are no possible rewards that would give me more utility than not obeying the command.
For that reason, simply because someone will maximally reward me for obeying them doesn't make their commands my objective purpose in life.
Of course, we can respond "but then, by definition, they aren't maximally rewarding you" and by that definition, it would be a correct statement to make. The problem here is that the set of all possible commands for which I can't (by that definition) be maximally rewarded is so vast that the statement "if someone maximally rewards/punishes you, their orders are your purpose of life" becomes meaningless.
How does someone punishing you or rewarding you make their laws your purpose in life (other than you choosing that you want to be rewarded and not punished)?
Either we define "belief" as a computational state encoding a model of the world containing some specific data, or we define "belief" as a first-person mental state.
For the first definition, both we and p-zombies believe we have consciousness. So we can't use our belief we have consciousness to know we're not p-zombies.
For the second definition, only we believe we have consciousness. P-zombies have no beliefs at all. So for the second definition, we can use our belief we have consciousness to know we're not p-zombies.
Since we have a belief in the existence of our consciousness according to both definitions, but p-zombies only according to the first definition, we can know we're not p-zombies.
This is incorrect - in a p-zombie, the information processing isn't accompanied by any first-person experience. So if p-zombies are possible, the p-zombie and I both do the same information processing, but only I am conscious. The p-zombie doesn't believe it's conscious, it only acts that way.
You correctly believe that having the correct information processing always goes hand in hand with believing in consciousness, but that's because p-zombies are impossible. If they were possible, this wouldn't be the case, and we would have special access to the truth that p-zombies lack.
What an undignified way to go.
Ideally, AI characters would get rights as soon as they could pass the Turing test. In the actual reality, we all know how well that will go.
This mindset has a failure mode of no longer being sensitive to the oughts and only noticing descriptive facts about the world.
Christopher Hitchens, who tried waterboarding because he wasn't sure it was torture, wanted to stop almost instantly and was permanently traumatized, concluding it was definitely torture.
There is absolutely no way anyone would voluntarily last 3 minutes unless they simply hold their breath the entire time.
To run with the spirit of your question:
Assume the Dust Theory is true (i.e., the continuity of our experience is maintained purely by there being, somewhere, a next state of the state-machine-which-is-us). That next state doesn't need to be causally connected to your current state. So far so good.
What if there is more than one such subsequent state in the universe? No problem so far. Our measure just splits, and we roll the dice on where we'll find ourselves (it's a meaningless question whether the split happens at the moment of the spatial or the computational divergence).
But what if something steals our measure this way? What if, while sleeping, our sleeping state is instantiated somewhere else (thereby stealing 50% of our measure) and never reconnects to the main computational stream instantiated in our brain (so every time we dream, we toss a coin to jump somewhere else and never come back)?
One obvious solution is to say that our sleeping self isn't us. It's another person whose memories are dumped into our brain upon awakening. This goes well with our sleeping self acting differently than us and often having entirely different memories. In that case, there is no measure stealing going on, because the sleeping stream of consciousness happening in our brain isn't ours.
The reliability of general facts could be checked by various benchmarks. The unreliability of specific studies and papers by personal experience, and by experiences of people I've read online.
I don't understand why, except maybe rephrasing a true fact keeps it true, but rephrasing a study title and a journal title makes it false.
Yes, but that very same process has a high probability of producing correct facts (today's LLMs are relatively reliable) and a very low probability of producing correct studies or papers.
LLMs hallucinate studies/papers so regularly you're lucky to get a real one. That doesn't have an impact on the truth of the facts they claimed beforehand. (Also, yes, Claude 3 Haiku is significantly less intelligent than 3.5 Sonnet.)
Then the problem is that you can't make bets and check your calibration, not that some people will arrive at the wrong conclusion, which is inevitable with probabilistic reasoning.
Would you say that the continuity of your consciousness (as long as you're instantiated by only one body) only exists by consensus?
What if the consensus changed? Would you cease to have the continuity of consciousness?
If the continuity of your consciousness currently doesn't depend on consensus, why think that your next conscious experience is undefined in case of a duplication? (Rather than, let's say, assigning even odds to finding yourself to be either copy?)
Also, I see no reason for thinking the idea of your next subjective experience being undefined (there being no fact of the matter as to which conscious experience, if any, you'll have) is even a coherent possibility. It's clear what it would mean for your next conscious experience to be something specific (like feeling pain while seeing blue). It's also clear what it would mean for it to be NULL (like after a car accident). But it being undefined doesn't sound like a coherent belief.
It's been some time since models have become better than the average human at understanding language.
The central error of this post lies in the belief that we don't persist over time. All other mistakes follow from this one.
Well, a thing that acts like us in one particular situation (say, a thing that types "I'm conscious" in chat) clearly doesn't always have our qualia. Maybe you could say that a thing that acts like us in all possible situations must have our qualia?
Right, that's what I meant.
This is philosophically interesting!
Thank you!
It makes a factual question (does the thing have qualia right now?) logically depend on a huge bundle of counterfactuals, most of which might never be realized.
The I/O behavior being the same is a sufficient condition for it to be our mind upload. A sufficient condition for it to have some qualia, as opposed to having our mind and our qualia, will be weaker.
What if, during uploading, we insert a bug that changes our behavior in one of these counterfactuals
Then it's, to a very slight extent, another person (with the continuum between me and another person being gradual).
but then the upload never actually runs into that situation in the course of its life - does the upload still have the same qualia as the original person, in situations that do get realized?
Then the qualia would be very slightly different, unless I'm missing something. (To bootstrap the intuition, I would expect my self that chooses vanilla ice cream over chocolate ice cream in one specific situation to have very slightly different feelings and preferences in general, resulting in very slightly different qualia, even if he never encounters that situation.) With many such bugs, it would be the same, but to a greater extent.
If there's a thought that you sometimes think, but it doesn't influence your I/O behavior, it can get optimized away
I don't think such thoughts exist (I can always be asked to say out loud what I'm thinking). Generally, I would say that a thought that never, even in principle, influences my output, isn't possible. (The same principle should apply to trying to replace a thought just by a few bits.)
This is not an obviously possible failure mode of uploads - it would require that you get uploaded correctly, but the computer doesn't feed you any sensory input and just keeps running your brain without it. Why would something like that happen?
It seems we cannot allow all behavior-preserving optimizations
We can use the same thought experiments that Chalmers uses to establish that a fine-grained functionally isomorphic copy has the same qualia, modify them, and show that anything that acts like us has our qualia.
The LLM character (rather than the LLM itself) will be conscious to the extent to which its behavior is I/O identical to the person.
Edit: Oh, sorry, this is an old comment. I got this recommended... somehow...
Edit2: Oh, it was curated yesterday.
There is no dependency on any specific hardware.
What's conscious isn't the mathematical structure itself but its implementation.
Check if it's not 4o - they've rolled it out for some/all users and it's used by default.
"we need to have the beginning of a hint of a design for a system smarter than a house cat"
You couldn't make a story about this, I swear.
Great article.
The second rule is to ask for permission.
Is this supposed to be "The second rule is to ask for forgiveness."?
Check out this page, it goes up to 2024.
Nobody would understand that.
This sort of saying-things-directly doesn't usually work unless the other person feels the social obligation to parse what you're saying to the extent they can't run away from it.
Correct me if I'm wrong, but I think we could apply the concept of logical uncertainty to metaphysics and then use Bayes' theorem to update depending on where our metaphysical research takes us, the way we can use it to update the probability of logically necessarily true/false statements.
Bayes' theorem is about the truth of propositions. Why couldn't it be applied to propositions about ontology?
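As a minimal sketch of what that would look like mechanically (the prior and likelihood values below are made-up placeholders, and `bayes_update` is just a hypothetical helper name):

```python
# Minimal sketch of a Bayesian update on a metaphysical/logical proposition H.
# The prior and likelihoods are made-up placeholders; the point is only that
# the update rule doesn't care what kind of proposition H is.

def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) from P(H), P(E | H), and P(E | not H)."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / evidence

# Start at 0.5 and observe an argument we'd expect to encounter more often
# if H were true than if it were false.
posterior = bayes_update(prior=0.5, p_e_given_h=0.8, p_e_given_not_h=0.4)
print(posterior)  # 2/3
```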
However, this image is obviously optimized to be scary and disgusting. It looks dangerous, with long rows of sharp teeth. It is an eldritch horror. It's at this point that I'd like to point out the simple, obvious fact that "we don't actually know how these models work, and we definitely don't know that they're creepy and dangerous on the inside."
It's optimized to illustrate the point that the neural network isn't trained to actually care about what the person training it thinks it came to care about, it's only optimized to act that way on the training distribution. Unless I'm missing something, arguing the image is wrong would be equivalent to arguing that maybe the model truly cares about what its human trainers want it to care about. (Which we know isn't actually the case.)
Well. Their actual performance is human-like, as long as they're using GPT-4 and have the right prompt. I've talked to such bots.
In any case, the topic is about what future AIs will do, so, by definition, we're speculating about the future.
They're accused, not whistleblowers. They can't retroactively gain the right to anonymity, since their identities have already been revealed.
They could argue that they became whistleblowers as well, and so they should be retroactively anonymized, but that would interfere with the first whistleblowing accusation (there is no point in whistleblowing against anonymous people), and also they're (I assume) in a position of comparative power here.
There could be a second whistleblowing accusation made by them (but this time anonymously) against the (this time) deanonymized accuser, but given their (I assume) higher social power, that doesn't seem appropriate.
I agree that it feels wrong to reveal the identities of Alice and/or Chloe without concrete evidence of major wrongdoing, but I don't think we have a good theoretical framework for why that is.
Ethically (and pragmatically), you want whistleblowers to have the right to anonymity, or else you'll learn of much less wrongdoing than you would otherwise; and because whistleblowers are (usually) in a position of lower social power, anonymity is meant to compensate for that, I suppose.
I will not "make friends" with an appliance.
That's really substratist of you.
But in any case, the toaster (working in tandem with the LLM "simulating" the toaster-AI-character) will predict that and persuade you some other way.