Well, a thing that acts like us in one particular situation (say, a thing that types "I'm conscious" in chat) clearly doesn't always have our qualia. Maybe you could say that a thing that acts like us in all possible situations must have our qualia?
Right, that's what I meant.
This is philosophically interesting!
Thank you!
It makes a factual question (does the thing have qualia right now?) logically depend on a huge bundle of counterfactuals, most of which might never be realized.
The I/O behavior being the same is a sufficient condition for it to be our mind upload. A sufficient condition for it to have some qualia, as opposed to having our mind and our qualia, will be weaker.
What if, during uploading, we insert a bug that changes our behavior in one of these counterfactuals
Then it's, to a very slight extent, another person (with the continuum between me and another person being gradual).
but then the upload never actually runs into that situation in the course of its life - does the upload still have the same qualia as the original person, in situations that do get realized?
Then the qualia would be very slightly different, unless I'm missing something. (To bootstrap the intuition, I would expect my self that chooses vanilla ice cream over chocolate ice cream in one specific situation to have very slightly different feelings and preferences in general, resulting in very slightly different qualia, even if he never encounters that situation.) With many such bugs, it would be the same, but to a greater extent.
If there's a thought that you sometimes think, but it doesn't influence your I/O behavior, it can get optimized away
I don't think such thoughts exist (I can always be asked to say out loud what I'm thinking). Generally, I would say that a thought that never, even in principle, influences my output, isn't possible. (The same principle should apply to trying to replace a thought just by a few bits.)
This is not an obviously possible failure mode of uploads - it would require that you get uploaded correctly, but the computer doesn't feed you any sensory input and just keeps running your brain without it. Why would something like that happen?
It seems we cannot allow all behavior-preserving optimizations
We can use the same thought experiments that Chalmers uses to establish that a fine-grained, functionally isomorphic copy has the same qualia, modify them, and show that anything that acts like us has our qualia.
The LLM character (rather than the LLM itself) will be conscious to the extent to which its behavior is I/O identical to the person.
Edit: Oh, sorry, this is an old comment. I got this recommended... somehow...
Edit2: Oh, it was curated yesterday.
There is no dependency on any specific hardware.
What's conscious isn't the mathematical structure itself but its implementation.
Check if it's not 4o - they've rolled it out for some/all users and it's used by default.
"we need to have the beginning of a hint of a design for a system smarter than a house cat"
You couldn't make a story about this, I swear.
Great article.
The second rule is to ask for permission.
Is this supposed to be "The second rule is to ask for forgiveness."?
Check out this page, it goes up to 2024.
Nobody would understand that.
This sort of saying-things-directly doesn't usually work unless the other person feels the social obligation to parse what you're saying to the extent they can't run away from it.
Correct me if I'm wrong, but I think we could apply the concept of logical uncertainty to metaphysics and then use Bayes' theorem to update depending on where our metaphysical research takes us, the way we can use it to update the probability of logically necessarily true/false statements.
Bayes' theorem is about the truth of propositions. Why couldn't it be applied to propositions about ontology?
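As a minimal sketch of what such an update would look like (all the numbers here are hypothetical, purely to show the mechanics of treating a metaphysical proposition like any other uncertain proposition):

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | E) for hypothesis H given evidence E."""
    joint_true = prior * likelihood_if_true
    joint_false = (1 - prior) * likelihood_if_false
    return joint_true / (joint_true + joint_false)

# H = "some metaphysical thesis", E = "our research turned up an argument
# we judge twice as likely to exist if H is true than if it's false".
posterior = bayes_update(prior=0.5, likelihood_if_true=0.8, likelihood_if_false=0.4)
# posterior = 0.4 / (0.4 + 0.2) = 2/3
```

Nothing in the arithmetic cares whether H is empirical, logical, or ontological; all that's needed is a prior and a judgment of how likely the evidence is under each alternative.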
However, this image is obviously optimized to be scary and disgusting. It looks dangerous, with long rows of sharp teeth. It is an eldritch horror. It's at this point that I'd like to point out the simple, obvious fact that "we don't actually know how these models work, and we definitely don't know that they're creepy and dangerous on the inside."
It's optimized to illustrate the point that the neural network isn't trained to actually care about what the person training it thinks it came to care about, it's only optimized to act that way on the training distribution. Unless I'm missing something, arguing the image is wrong would be equivalent to arguing that maybe the model truly cares about what its human trainers want it to care about. (Which we know isn't actually the case.)
Well. Their actual performance is human-like, as long as they're using GPT-4 and have the right prompt. I've talked to such bots.
In any case, the topic is about what future AIs will do, so, by definition, we're speculating about the future.
They're accused, not whistleblowers. They can't retroactively gain the right to anonymity, since their identities have already been revealed.
They could argue that they became whistleblowers as well, and so they should be retroactively anonymized, but that would interfere with the first whistleblowing accusation (there is no point in whistleblowing against anonymous people), and also they're (I assume) in a position of comparative power here.
There could be a second whistleblowing accusation made by them (but this time anonymously) against the (this time) deanonymized accuser, but given their (I assume) higher social power, that doesn't seem appropriate.
I agree that it feels wrong to reveal the identities of Alice and/or Chloe without concrete evidence of major wrongdoing, but I don't think we have a good theoretical framework for why that is.
Ethically (and pragmatically), you want whistleblowers to have the right to anonymity, or else you'll learn of much less wrongdoing than you would otherwise. And because whistleblowers are (usually) in a position of lower social power, anonymity is meant to compensate for that, I suppose.
I will not "make friends" with an appliance.
That's really substratist of you.
But in any case, the toaster (working in tandem with the LLM "simulating" the toaster-AI-character) will predict that and persuade you some other way.
Of course, it would first make friends with you, so that you'd feel comfortable leaving up to it the preparation of your breakfast, and you'll even feel happy that you have such a thoughtful friend.
If you were to break the toaster for that, it would predict that and simply do it in a way that would actually work.
Unless you precommit to breaking all your AIs that will do anything differently from what you tell them to, no matter the circumstances and no matter how you feel about it in that moment.
Can you show what priors you used, how you calculated the posteriors, what numbers you got and where the input numbers came from? I highly doubt that hypothesis has a higher posterior probability.
The wifi hacking also immediately struck me as reminiscent of paranoid psychosis.
How hard is it to hack somebody's wifi?
Also, a traumatized person attributing a seemingly hacked wifi to their serious abuser doesn't need to mean any mental illness.
That's irrelevant. To see why one-boxing is important, we need to realize the general principle - that we can only impose a boundary condition on all computations-which-are-us (i.e. we can choose how both us and all perfect predictions of us choose, and both us and all the predictions have to choose the same). We can't impose a boundary condition only on our brain (i.e. we can't only choose how our brain decides while keeping everything else the same). This is necessarily true.
Without seeing this (and therefore knowing we should one-box), or even while being unaware of this principle altogether, there is no point in trying to have a "debate" about it.
What do I get out of any of this?
If Bob asked this question, it would show he's misunderstanding the point of Alice's critique - unless I'm missing something, she claims he should, morally speaking, act differently.
Responding "What do I get out of any of this?" to that kind of critique is either a misunderstanding, or a rejection of morality ("I don't care if I should be, morally speaking, doing something else, because I prefer to maximize my own utility.").
Edit: Or also, possibly, a rejection of Alice ("You are so annoying that I'll pretend this conversation is about something else to make you go away.").
The author shares how terrible it feels that X is true, without bringing arguments for X being true in the first place (based on me skimming the post). That can bypass the reader's fact-check (because why would he write about how bad it made him feel that X is true if it wasn't?).
It feels to me like he's trying to combine an emotional exposition (no facts, talking about his feelings) with an expository blogpost (explaining a topic), while trying to grab the best of both worlds (the persuasiveness and emotions of the former and the social status of the latter) without the substance to back it up.
omnizoid's post as an example of where not to take EY's side was poorly chosen. He two-boxes on Newcomb's problem and any confident statements he makes about rationality or decision theory should be, for that reason, ignored entirely.
Of course, you go meta, without claiming that he's object-level right, but I'm not sure using an obviously wrong post to take his side on the meta level is a good idea.
To understand why FDT is true, it's best to start with Newcomb's problem. Since you believe you should two-box, it might be best to debate Newcomb's problem with somebody first. Debating FDT at this stage seems like a waste of time for both parties.
(The visual fidelity is a very small fraction of what we actually think it is - the brain lies to us about how much we perceive.)
CharacterAI bots don't show as public until some condition is fulfilled, which I don't remember right now.
ChatGPT-3.5 (when prompted the perfect way) outperforms CharacterAI, I assume GPT-4 with the right prompt would as well, you might check those options out. (I haven't tried a therapist specifically, though.)
That depends on how we define "information" - for one definition of information, qualia are information (and also everything else is, since we can only recognize something by the pattern it presents to us).
But for another definition of information, there is a conceptual difference - for example, morphine users report knowing they are in pain, but not feeling the quale of pain.
The "purpose" of most martial arts is to defeat other martial artists of roughly the same skill level, within the rules of the given martial art.
This is false - the reason they were created was self-defense. That you can have people of similar weight and belt color spar/fight each other in contests is only a side effect of that.
"Beginner's luck" is a thing in almost all games. It's usually what happens when someone tries a strategy so weird that the better player doesn't immediately understand what's going on.
That doesn't work in chess if the difference in skill is large enough - if it did, anyone could simply make up strategies weird enough and, without any skill, win any title or even the World Chess Championship (winning however many games that requires).
If you're saying it works as a matter of random fluctuations - i.e. a player without skill could win, let's say, a few games against Magnus Carlsen, because these strategies (supposedly) almost never work but sometimes do - that wouldn't be useful against an AI, because it would still almost certainly win (or, more realistically, I think, simply model us well enough to know when we'd try the weird strategy).
Two points of order, without going into any specific accusations or their absence:
- The post is transphobic, which anticorrelates with being correct/truthful/objective.
- It seems optimized for smoothness/persuasion, which, based on my experience, also anticorrelates with both truth and objectivity.
- That's seemingly quite a convincing reason why you can't be born too early. But what occurs to me now is that the problem can be about where you are, temporally, in relation to other people. (So you were still born on the same day, but depending on the entire size of the civilization, N, the probability of having n people precede you is n/N.)
- Depending on how "anthropic problem" is defined, that could potentially be true either for all, or for some anthropic problems.
How does one tell if they "are trans"
If introspection doesn't help, maybe a specialized therapist would. I can't offer any rationalist-level advice on how to find out if you're transgender that you couldn't google yourself. Good luck finding out, however.
Edit: It seems that whoever downvoted this completely correct comment has been psychologically empowered by the barrage of the transphobic crackpot comments in this thread, but it seems such is the price for having an anonymous upvote/downvote system.
Autogynephilia and being transgender are two distinct phenomena, the latter being caused by the brain of the opposite gender.
Experiencing the former doesn't mean the latter is also secretly the former.
Surely you're not saying that the point of arresting him is to prevent him from winning an election. Surely you're not saying that.
There are many points to arresting criminals. Making it harder for them to amass power is one of them, of which winning an election is a subset.
Do you think such a taboo is likely to increase or decrease the risk from dictators taking over?
Increase, for the reasons I enumerated.
Maybe you could claim that would-be dictators are more likely than good candidates to have committed crimes
That's one factor, yes.
On the other hand, if there is no such taboo, then a dictator who has already been elected is more likely to appoint cronies who will prosecute his political opponents for whatever might stick to them—even if they don't stick, the prosecution itself can be damaging and onerous.
That doesn't work for the reasons I gave (to very briefly repeat them - the dictator will not respect any informal taboos (or formal ones, for that matter)).
Aside from not respecting informal taboos, he will be helped by other Republicans in other branches of the government to get away with both overstepping his authority and committing outright crimes.
The idea you're describing is the exact opposite of how social interaction and the system of power work, and has been generated and released into the wild by bad-faith actors who are invested in people falsely believing in them (another one is "we have to give the Babyeaters a platform and host their hate speech on our servers, so that people can see how terrible they are and stop supporting them," which also works, in reality, the other way around).
Do you simultaneously know what it's like when something looks red, and also believe that you don't have qualia?
Are you taking into account the simulated exams? It doesn't look like it mostly generates false facts?
It seemed to be key that I fight well by some metrics
That couldn't be the case - that would leave you, even after having a black belt, vulnerable towards people who can't fight, which would defeat the purpose of martial arts. Whichever technique you use, you use when responding to what the other person is currently doing. You don't simply execute a technique that depends on the person fighting well by some metrics, and then get defeated when it turns out that they are, in fact, only in the 0.001st percentile of fighting well by any metrics we can imagine.
(That said, I'm really happy for your victories - maybe they weren't quite as well-trained.)
This has me wonder whether an AI would have significant difficulties winning against humans who act inconsistently and suboptimally in some ways, without acting like utter idiots randomly all the time
I'm thinking the AI would predict the way in which the other person would act inconsistently and suboptimally.
If there were multiple paths to victory for the human and the AI could block only one (thereby seemingly giving the human the option to out-random the AI by picking one of the unguarded paths to victory), the AI would be better at predicting the human than the human would be at randomizing.
People are terrible at being unpredictable. I remember a 10+ year-old rock-paper-scissors predictor that guessed a human's "random" choices over a series of games. The humans had no chance.
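I don't remember the exact program, but the idea can be sketched as a small order-n frequency predictor (the class name, window length, and every detail below are my own illustration, not the original program):

```python
import random
from collections import Counter, defaultdict

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class RPSPredictor:
    """Counters a human's next move by checking which move most often
    followed the opponent's recent history (an order-n frequency model)."""

    def __init__(self, order=1):
        self.order = order                   # how many past moves form the context
        self.history = []                    # opponent's past moves
        self.counts = defaultdict(Counter)   # context -> frequencies of the next move

    def play(self):
        context = tuple(self.history[-self.order:])
        if self.counts[context]:
            predicted = self.counts[context].most_common(1)[0][0]
        else:
            predicted = random.choice(list(BEATS))   # no data yet: guess
        return BEATS[predicted]                      # play the counter-move

    def observe(self, opponent_move):
        context = tuple(self.history[-self.order:])
        if len(context) == self.order:
            self.counts[context][opponent_move] += 1
        self.history.append(opponent_move)
```

Against truly uniform random play this wins only a third of the time, but humans aren't uniform: any conditional bias (e.g. tending to switch after a loss) shows up in the context counts within a handful of rounds, and the predictor starts countering it.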
I found two statements in the article that I think are well-defined enough and go into your argument:
1. "The birth rank discussion isn't about if I am born slightly earlier or later."
How do you know? I think it's exactly about that. I have probability x of being born within the first fraction x of all humans (assuming all humans are the correct reference class - if they're not, the problem isn't in considering ourselves a random person from a reference class, but choosing the wrong reference class).
2. "Nobody can be born more than a few months away from their actual birthday."
When reasoning probabilistically, we can imagine other possible worlds. We're not talking about something being the case while at the same time not being the case. We imagine other possible worlds (created by the same sampling process that created our world) and compare them to ours. In some of those possible worlds, we were born sooner or later.
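The claim that a uniformly sampled observer lands within the first fraction x of all humans with probability x can be checked with a toy sampling sketch (the population size and fraction are made-up numbers, and the uniform-sampling framing is assumed):

```python
import random

random.seed(0)
N = 100_000       # toy total number of humans who will ever live
trials = 10_000
x = 0.05          # "the first 5% of all humans"

# Sample "my" birth rank uniformly from 1..N and count how often it
# falls within the first fraction x of everyone who ever lives.
hits = sum(random.randint(1, N) <= x * N for _ in range(trials))
frequency = hits / trials    # should come out close to x
```

The simulation is trivially circular, of course - it just restates what "random member of the reference class" means - which is the point: once the reference class is fixed, the probability claim has nothing mysterious in it.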
That's true, but the definition of probability isn't inapplicable to everything. From that, in conjunction with us being able to make probabilistic predictions about ourselves, it follows that we are a random member of at least one reference class, which means that our soul has been selected at random from all possible souls from a specific reference class (if that's what you meant by that).
By definition of probability, we can consider ourselves a random member of some reference class. (Otherwise, we couldn't make probabilistic predictions about ourselves.) The question is picking the right reference class.
Back in the 2000s, the official version was that it's enough to ingest a pepper-grain-sized amount of the infected tissue to be infected with BSE, so maybe the decomposition of the proteins isn't perfect (in the sense that some molecules might not be taken apart). The ingestion of the tissue is still the official mode of transmission.
The separations here are not going to impress someone who thinks the Democrats are using the system to attack the enemy they hate.
We can't withdraw from arresting criminals and putting them on trial because conspiracy theorists will invent a different story in their minds. Unless...
I wouldn't extend this to all public figures, just those who are serious candidates for an election for leader of the country.
...Oh, so you meant like 5-10 people tops, in the country of 300M. I see.
On the surface, it's a very consequentialistic reasoning. Why arrest one criminal, if it can cause a revolution?
Of course, Trump has already been arrested, and the revolution hasn't happened, so this isn't the case here, apparently.
There are also three other problems:
1. Local consequentialism - locally (both spatially and temporally) optimizing can have disastrous global effects. Today, we can't arrest an aspiring dictator. And so he, in 5 years, wins the election. Now he's in power, and we have the problem we hoped to avoid. If we arrested him, there would be localized violent disturbances, but his ascension to power would have been slower, if he were ever elected in the first place. (The general pattern is that the less you oppose evil to avoid undesirable externalities, the faster and the more power the evil gains.)
2. Incentivizing self-modification against your values - if we reward people willing to invent conspiracy theories in their heads by not arresting their ideological leader, they are, both consciously and subconsciously, motivated to do just that, because they know you will back down.
3. The peaceful evolution, enabled by democracy, only refers to the kind of enabling where gaining power is lawful. If I'm elected because you are too scared to arrest me, that's a (non-violent) revolution, rather than a peaceful transition of power. Of course, you could say that doesn't matter because the goal here is to avoid violence. But that brings us back to problems (1) and (2). Also, the violence (a bloody revolution is very unlikely) will happen in both branches of the future. The difference is that in the no-arrest branch, you signaled that you would rather let evil win by inaction (and then be victimized) than arrest the would-be dictator (and be victimized afterwards anyway). You won't end up being actually better off. Rather, you'll end up being worse off, since you've now set the precedent that you will ignore the law under the threat of violence. (Which is sometimes a good idea, but in this case it's not.)
there should be such a strong taboo against arresting political opponents that one just doesn't do it
Nobody arrested their political opponent (the politicians aren't the ones doing the arresting).
Also, why not? Shouldn't it be more important if a crime was committed, rather than if someone is someone else's political opponent? Why should public figures have immunity from being arrested unless >80% of the population agrees?
Yes, but the game is very easy, so a lot of different strategies get you close to the cap.
I've been thinking about it, and I'm not sure if this is the case in the sense you mean it - expected money maximization doesn't reflect human values at all, while the Kelly criterion mostly does, so if we make our assumptions more realistic, it should move us away from expected money maximization and towards the Kelly criterion, as opposed to moving us the other way.
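A toy simulation of the contrast (my own made-up numbers: an even-money bet that wins 60% of the time, for which the Kelly fraction is p - q = 0.2):

```python
import random

def simulate(fraction, p=0.6, rounds=200, bankroll=1.0):
    """Bet `fraction` of the current bankroll each round on an
    even-money gamble that wins with probability p."""
    for _ in range(rounds):
        stake = fraction * bankroll
        bankroll += stake if random.random() < p else -stake
    return bankroll

random.seed(42)
trials = 1000
# Betting everything each round maximizes expected money (expected value
# grows by a factor of 2p = 1.2 per round), but a single loss ends the game.
all_in = [simulate(1.0) for _ in range(trials)]
# Kelly fraction for an even-money bet: f* = p - (1 - p) = 0.2.
kelly = [simulate(0.2) for _ in range(trials)]

bust_rate = sum(b == 0.0 for b in all_in) / trials
median_kelly = sorted(kelly)[trials // 2]
```

The all-in strategy has the higher expected final bankroll (a huge payoff in the astronomically unlikely all-wins branch), yet essentially every run goes bust, while the typical Kelly run grows steadily - which is the sense in which Kelly, not expected-money maximization, tracks what humans actually want.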
I'd consider that to be exploitation. In addition to that, too-easy-to-win bets make me wary of something unpredictable going wrong.
I'm sure that this time around, it's definitely real aliens. Or, barring that, magic or time travel.
nonhumans
(You might want to exclude advanced/experimental AI models from that, to capture the spirit of the bet better.)
People will judge this question, like many others, based on their feelings. The AI person, summoned into existence by the language model, will have to be sufficiently psychologically and emotionally similar to a human, while also having above-average-human-level intelligence (so that people can look up to the character instead of merely tolerating it).
Leaving aside the question whether the technology for creating such an AI character already exists or not, these, I think, will ultimately be the criteria that will be used by people of somewhat-above-average intelligence and zero technical and philosophical knowledge (i.e. our lawmakers) to grant AIs rights.
Haven't whistleblowers talking about how the government has alien spaceships always been a thing?
It would bring on an enormous amount of new evidence, since the position of the orthogonality thesis is so strong (rather than arguing from some vague and visibly false philosophical assumptions).