How "Pause AI" advocacy could be net harmful 2023-12-26T16:19:20.724Z
OpenAI, DeepMind, Anthropic, etc. should shut down. 2023-12-17T20:01:22.332Z
Alignment work in anomalous worlds 2023-12-16T19:34:26.202Z
Some biases and selection effects in AI risk discourse 2023-12-12T17:55:15.759Z
How LDT helps reduce the AI arms race 2023-12-10T16:21:44.409Z
We're all in this together 2023-12-05T13:57:46.270Z
So you want to save the world? An account in paladinhood 2023-11-22T17:40:33.048Z
an Evangelion dialogue explaining the QACI alignment plan 2023-06-10T03:28:47.096Z
formalizing the QACI alignment formal-goal 2023-06-10T03:28:29.541Z
Orthogonal's Formal-Goal Alignment theory of change 2023-05-05T22:36:14.883Z
Orthogonal: A new agent foundations alignment organization 2023-04-19T20:17:14.174Z
continue working on hard alignment! don't give up! 2023-03-24T00:14:35.607Z
the QACI alignment plan: table of contents 2023-03-21T20:22:00.865Z
your terminal values are complex and not objective 2023-03-13T13:34:01.195Z
the quantum amplitude argument against ethics deduplication 2023-03-12T13:02:31.876Z
QACI: the problem of blob location, causality, and counterfactuals 2023-03-05T14:06:09.372Z
state of my alignment research, and what needs work 2023-03-03T10:28:34.225Z
Hello, Elua. 2023-02-23T05:19:07.246Z
a narrative explanation of the QACI alignment plan 2023-02-15T03:28:34.710Z
so you think you're not qualified to do technical alignment research? 2023-02-07T01:54:51.952Z
formal alignment: what it is, and some proposals 2023-01-29T11:32:33.239Z
to me, it's instrumentality that is alienating 2023-01-27T18:27:19.062Z
one-shot AI, delegating embedded agency and decision theory, and one-shot QACI 2022-12-23T04:40:31.880Z
our deepest wishes 2022-12-20T00:23:32.892Z
all claw, no world — and other thoughts on the universal distribution 2022-12-14T18:55:06.286Z
a rough sketch of formal aligned AI using QACI 2022-12-11T23:40:37.536Z
Tamsin Leake's Shortform 2022-11-20T18:25:17.811Z
logical vs indexical dignity 2022-11-19T12:43:03.033Z
generalized wireheading 2022-11-18T20:18:53.664Z
fully aligned singleton as a solution to everything 2022-11-12T18:19:59.378Z
a casual intro to AI doom and alignment 2022-11-01T16:38:31.230Z
publishing alignment research and exfohazards 2022-10-31T18:02:14.047Z
love, not competition 2022-10-30T19:44:46.030Z
QACI: question-answer counterfactual intervals 2022-10-24T13:08:54.457Z
some simulation hypotheses 2022-10-12T13:34:51.780Z
confusion about alignment requirements 2022-10-06T10:32:49.779Z
my current outlook on AI risk mitigation 2022-10-03T20:06:48.995Z
existential self-determination 2022-09-27T16:08:46.997Z
ordering capability thresholds 2022-09-16T16:36:59.172Z
ethics and anthropics of homomorphically encrypted computations 2022-09-09T10:49:08.316Z
program searches 2022-09-05T20:04:17.916Z
everything is okay 2022-08-23T09:20:33.250Z
PreDCA: vanessa kosoy's alignment protocol 2022-08-20T10:03:10.701Z
goal-program bricks 2022-08-13T10:08:41.532Z
the Insulated Goal-Program idea 2022-08-13T09:57:47.251Z
The Peerless 2022-04-13T01:07:08.767Z
Nantes, France – ACX Meetups Everywhere 2021 2021-08-23T08:46:08.899Z


Comment by Tamsin Leake (carado-1) on What convincing warning shot could help prevent extinction from AI? · 2024-04-14T06:20:25.187Z · LW · GW

There's also the case of harmful warning shots: for example, if, upon seeing an AI do a scary but impressive thing, enough people/orgs/states go "woah, AI is powerful, I should make one!" or "I guess we're doomed anyway; might as well stop thinking about safety and just enjoy making profit with AI while we're still alive" to offset the positive effect. This is totally the kind of thing that could happen in our civilization.

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2024-04-14T04:30:06.718Z · LW · GW

There could be a difference but only after a certain point in time, which you're trying to predict / plan for.

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2024-04-13T20:20:10.761Z · LW · GW

What you propose, ≈"weigh indices by kolmogorov complexity" is indeed a way to go about picking indices, but "weigh indices by one over their square" feels a lot more natural to me; a lot simpler than invoking the universal prior twice.
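
A minimal sketch of this inverse-square weighting, assuming 1-based indices (the helper name is hypothetical): since the series 1/n² sums to π²/6, the weights can be normalized to a total of 1 — something a uniform weighting over infinitely many indices can't do.

```python
import math

def inverse_square_weight(n: int) -> float:
    """Weight for the n-th index (1-based), normalized to sum to 1.

    sum(1/n^2) over n >= 1 converges to pi^2/6, so unlike a uniform
    weighting over infinitely many indices, the total is well-defined.
    """
    return (1.0 / n**2) * (6.0 / math.pi**2)

# Early indices carry most of the weight: index 1 gets ~61%, index 2 ~15%.
```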

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2024-04-13T17:52:44.269Z · LW · GW

If you use the UTMs for cartesian-framed inputs/outputs, sure; but if you're running the programs as entire worlds, then you still have the issue of "where are you in time".

Say there's an infinitely growing conway's-game-of-life program, or some universal program, which contains a copy of me at infinitely many locations. How do I weigh which ones are me?

It doesn't matter that the UTM has a fixed amount of weight, there's still infinitely many locations within it.

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2024-04-13T11:47:48.333Z · LW · GW

(cross-posted from my blog)

Are quantum phenomena anthropic evidence for BQP=BPP? Is existing evidence against many-worlds?

Suppose I live inside a simulation run by a computer over which I have some control.

  • Scenario 1: I make the computer run the following:

    pause simulation
    if is_even(calculate billionth digit of pi):
    	resume simulation

    Suppose, after running this program, that I observe that I still exist. This is some anthropic evidence for the billionth digit of pi being even.

    Thus, one can get anthropic evidence about logical facts.

  • Scenario 2: I make the computer run the following:

      pause simulation
      if is_even(calculate billionth digit of pi):
      	resume simulation
      else:
      	resume simulation, but run it a trillion times slower

    If you're running on the non-time-penalized solomonoff prior, then that's no evidence at all — observing existing is evidence that you're being run, not that you're being run fast. But if you do that, a bunch of things break, including anthropic probabilities and expected utility calculations. What you want is a time-penalized (probably quadratically) prior, in which later compute-steps have less realityfluid than earlier ones — and thus observing existing is evidence for being computed early — and thus observing existing is some evidence that the billionth digit of pi is even.

  • Scenario 3: I make the computer run the following:

      pause simulation
      quantum_algorithm <- classical-compute algorithm which simulates quantum algorithms the fastest
      infinite loop:
      	use quantum_algorithm to compute the result of some complicated quantum phenomena
      	compute simulation forwards by 1 step

    Observing existing after running this program is evidence that BQP=BPP — that is, classical computers can efficiently run quantum algorithms: if BQP≠BPP, then my simulation should become way slower, and existing is evidence for being computed early and fast (see scenario 2).

    Except, living in a world which contains the outcome of cohering quantum phenomena (quantum computers, double-slit experiments, etc) is very similar to the scenario above! If your prior for the universe is a distribution over programs, penalized for how long they take to run on classical computation, then observing that the outcome of quantum phenomena is being computed is evidence that they can be computed efficiently.

  • Scenario 4: I make the computer run the following:

      in the simulation, give the human a device which generates a sequence of random bits
      pause simulation
      list_of_simulations <- [current simulation state]
      quantum_algorithm <- classical-compute algorithm which simulates quantum algorithms the fastest
      infinite loop:
      	list_of_new_simulations <- []
      	for simulation in list_of_simulations:
      		list_of_new_simulations += 
      			[ simulation advanced by one step where the device generated bit 0,
      			  simulation advanced by one step where the device generated bit 1 ]
      	list_of_simulations <- list_of_new_simulations

    This is similar to what it's like to be in a many-worlds universe where there's constant forking.

    Yes, in this scenario, there is no "mutual destruction", the way there is in quantum. But with decohering everett branches, you can totally build exponentially many non-mutually-destructing timelines too! For example, you can choose to make important life decisions based on the output of the RNG, and end up with exponentially many different lives each with some (exponentially little) quantum amplitude, without any need for those to be compressible together, or to be able to mutually-destruct. That's what decohering means! "Recohering" quantum phenomena interact destructively such that you can compute the output, but decohering phenomena just branch.

    The amount of different simulations that need to be computed increases exponentially with simulation time.

    Observing existing after running this program is very strange. Yes, there are exponentially many me's, but all of the me's are being run exponentially slowly; they should all not observe existing. I should not be any of them.

    This is what I mean by "existing is evidence against many-worlds" — there's gotta be something like an agent (or physics, through some real RNG or through computing whichever variables have the most impact) picking an only-polynomially-large set of decohered non-compressible-together timelines to explain continuing existing.

    Some friends tell me "but tammy, sure at step N each you has only 1/2^N quantum amplitude, but at step N there's 2^N such you's, so you still have 1 unit of realityfluid" — but my response is "I mean, I guess, sure, but regardless of that, step N occurs 2^N units of classical-compute-time in the future! That's the issue!".
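
The counterargument can be made concrete with a toy calculation — a sketch under the quadratic-time-penalty assumption from scenario 2, with a made-up function name:

```python
def total_realityfluid_at_step(n: int) -> float:
    """Toy model: total weight of all step-n copies in the branching simulation.

    Branch count doubles every step, so a classical computer needs about 2^n
    compute-steps before step n is fully simulated; under a quadratic time
    penalty, each of the 2^n branches then gets weight ~ 1/(2^n)^2.
    """
    classical_time = 2 ** n           # when step n finishes being computed
    branch_count = 2 ** n             # copies of the observer at step n
    return branch_count * classical_time ** -2   # = 2^-n

# Without the time penalty, 2^n branches x 2^-n amplitude = 1 unit per step;
# with it, the total halves every step instead of staying constant.
assert total_realityfluid_at_step(10) == 2 ** -10
```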

Some notes:

  • I heard about pilot wave theory recently, and sure, if that's one way to get single history, why not. I hear that it "doesn't have locality", which like, okay I guess, that's plausibly worse program-complexity wise, but it's exponentially better after accounting for the time penalty.

  • What if "the world is just Inherently Quantum"? Well, my main answer here is, what the hell does that mean? It's very easy for me to imagine existing inside of a classical computation (eg conway's game of life); I have no idea what it'd mean for me to exist in "one of the exponentially many non-compressible-together decohered exponentially-small-amplitude quantum states that are all being computed forwards". Quadratically-decaying-realityfluid classical-computation makes sense, dammit.

  • What if it's still true — what if I am observing existing with exponentially little (as a function of the age of the universe) realityfluid? What if the set of real stuff is just that big?

    Well, I guess that's vaguely plausible (even though, ugh, that shouldn't be how being real works, I think), but then the tegmark 4 multiverse has to contain no hypotheses in which observers in my reference class occupy more than exponentially little realityfluid.

    Like, if there's a conway's-game-of-life simulation out there in tegmark 4, whose entire realityfluid-per-timestep is equivalent to my realityfluid-per-timestep, then they can just bruteforce-generate all human-brain-states and run into mine by chance, and I should have about as much probability of being one of those random generations as I'd have being in this universe — both have exponentially little of their universe's realityfluid! The conway's-game-of-life bruteforced-me has exponentially little realityfluid because she's getting generated exponentially late, and quantum-universe me has exponentially little realityfluid because I occupy exponentially little of the quantum amplitude, at every time-step.

    See why that's weird? As a general observer, I should exponentially favor observing being someone who lives in a world where I don't have exponentially little realityfluid, such as "person who lives only-polynomially-late into a conway's-game-of-life, but happened to get randomly very confused about thinking that they might inhabit a quantum world".

Existing inside of a many-worlds quantum universe feels like alien pranksters-at-orthogonal-angles running the kind of simulation whose observers end up very anthropically confused once they think about anthropics hard enough. (This is not my belief.)

Comment by Tamsin Leake (carado-1) on LessWrong's (first) album: I Have Been A Good Bing · 2024-04-01T11:56:22.973Z · LW · GW

I didn't see a clear indication in the post about whether the music is AI-generated or not, and I'd like to know; was there an indication I missed?

(I care because I'll want to listen to that music less if it's AI-generated.)

Comment by Tamsin Leake (carado-1) on On expected utility, part 1: Skyscrapers and madmen · 2024-03-31T06:14:21.947Z · LW · GW

Unlike on your blog, the images on the lesswrong version of this post are now broken.

Comment by Tamsin Leake (carado-1) on Orthogonality Thesis seems wrong · 2024-03-25T13:13:59.811Z · LW · GW

Taboo the word "intelligence".

An agent can superhumanly-optimize any utility function. Even if there are objective values, a superhuman-optimizer can ignore them and superhuman-optimize paperclips instead (and then we die because it optimized for that harder than we optimized for what we want).

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2024-03-21T09:44:37.430Z · LW · GW

(I'm gonna interpret these disagree-votes as "I also don't think this is the case" rather than "I disagree with you tamsin, I think this is the case".)

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2024-03-20T13:25:50.633Z · LW · GW

I don't think this is the case, but I'm mentioning this possibility because I'm surprised I've never seen someone suggest it before:

Maybe the reason Sam Altman is taking decisions that increase p(doom) is because he's a pure negative utilitarian (and he doesn't know-about/believe-in acausal trade).

Comment by Tamsin Leake (carado-1) on Toki pona FAQ · 2024-03-18T14:40:48.072Z · LW · GW

For writing, there's also jan misali's ASCII toki pona syllabary.

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2024-03-16T17:56:09.067Z · LW · GW

Reposting myself from discord, on the topic of donating 5000$ to EA causes.

if you're doing alignment research, even just a bit, then the 5000$ are probly better spent on yourself

if you have any gears level model of AI stuff then it's better value to pick which alignment org to give to yourself; charity orgs are vastly understaffed and you're essentially contributing to the "picking what to donate to" effort by thinking about it yourself

if you have no gears level model of AI then it's hard to judge which alignment orgs it's helpful to donate to (or, if giving to regranters, which regranters are good at knowing which alignment orgs to donate to)

as an example of regranters doing massive harm: openphil gave 30M$ to openai at a time where it was critically useful to them, (supposedly in order to have a chair on their board, and look how that turned out when the board tried to yeet altman)

i know of at least one person who was working in regranting and was like "you know what i'd be better off doing alignment research directly" — imo this kind of decision is probly why regranting is so understaffed

it takes technical knowledge to know what should get money, and once you have technical knowledge you realize how much your technical knowledge could help more directly so you do that, or something

Comment by Tamsin Leake (carado-1) on More people getting into AI safety should do a PhD · 2024-03-15T09:41:37.784Z · LW · GW

yes, edited

Comment by Tamsin Leake (carado-1) on More people getting into AI safety should do a PhD · 2024-03-15T05:42:02.441Z · LW · GW

So this option looks unattractive if you think transformative AI systems are likely to be developed within the next 5 years. However, with a 10-year timeframe things look much stronger: you would still have around 5 years to contribute as a researcher.

This phrasing is tricky! If you think TAI is coming in approximately 10 years then sure, you can study for 5 years and then do research for 5 years.

But if you think TAI is coming within 10 years (for example, if you think that the current half-life on worlds surviving is 10 years; if you think 10 years is the amount of time in which half of worlds are doomed) then depending on your distribution-over-time you should absolutely not wait 5 years before doing research, because TAI could happen in 9 years but it could also happen in 1 year. If you think TAI is coming within 10 years, then (depending on your distribution) you should still in fact do research asap.

(People often get this wrong! They think that "TAI probably within X years" necessarily means "TAI in approximately X years".)
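
The distinction is easy to check numerically — a sketch assuming the half-life model above (exponential arrival distribution; the function name is hypothetical):

```python
HALF_LIFE = 10.0  # years: half of worlds are doomed within this time

def p_tai_within(years: float) -> float:
    """P(TAI arrives within `years`), for an exponential arrival distribution."""
    return 1 - 0.5 ** (years / HALF_LIFE)

# "TAI within 10 years" puts substantial mass on the very early years,
# unlike "TAI in approximately 10 years":
assert round(p_tai_within(1), 3) == 0.067   # ~7% in the first year alone
assert round(p_tai_within(5), 3) == 0.293   # ~29% within five years
assert p_tai_within(10) == 0.5              # half of worlds by year ten
```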

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2024-03-12T08:39:25.202Z · LW · GW

Sure, this is just me adapting the idea to the framing people often have, of "what technique can you apply to an existing AI to make it safe".

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2024-03-12T08:06:21.396Z · LW · GW

AI safety is easy. There's a simple AI safety technique that guarantees that your AI won't end the world, it's called "delete it".

AI alignment is hard.

Comment by Tamsin Leake (carado-1) on 0th Person and 1st Person Logic · 2024-03-10T05:33:04.864Z · LW · GW

I'm confused about why 1P-logic is needed. It seems to me like you could just have a variable X which tracks "which agent am I" and then you can express things like sensor_observes(X, red) or is_located_at(X, northwest). Here and Absent are merely a special case of True and False when the statement depends on X.

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2024-03-09T20:59:25.892Z · LW · GW

Moral patienthood of current AI systems is basically irrelevant to the future.

If the AI is aligned then it'll make itself as moral-patient-y as we want it to be. If it's not, then it'll make itself as moral-patient-y as maximizes its unaligned goal. Neither of those depend on whether current AI are moral patients.

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2024-03-09T16:34:29.210Z · LW · GW

If my sole terminal value is "I want to go on a rollercoaster", then an agent who is aligned to me would have the value "I want Tamsin Leake to go on a rollercoaster", not "I want to go on a rollercoaster myself". The former necessarily-has the same ordering over worlds, the latter doesn't.

Comment by Tamsin Leake (carado-1) on Let's build definitely-not-conscious AI · 2024-03-08T00:03:46.067Z · LW · GW
  • I think the term "conscious" is very overloaded and the source of endless confusion and should be tabooed. I'll be answering as if the numbers are not "probability(-given-uncertainty) of conscious" but "expected(-given-uncertainty) amount of moral patienthood", calibrated with 1 meaning "as much as a human" (it could go higher — some whales have more neurons/synapses than humans and so they might plausibly be more of a moral patient than humans, in the sense that in a trolley problem you should prefer to save 1000 such whales to 1001 humans).
  • Besides the trivia I just mentioned about whales, I'm answering this mostly on intuition, without knowing off the top of my head (nor looking up) the amount of neurons/synapses. Not to imply that moral patienthood is directly linear to amount of neurons/synapses, but I expect that that amount probably matters to my notion of moral patienthood.
  • I'll also assume that everyone has a "normal amount of realityfluid" flowing through them (rather than eg being simulated slower, or being fictional, or having "double-thick neurons made of gold" in case that matters).

First list: 1, 1, 1, .7, 10⁻², 10⁻³, 10⁻⁶, 10⁻⁶, 10⁻⁸, ε, ε, ε, ε, ε.

Second list: .6, .8, .7, .7, .6, .6, .5, ε, ε, ε, ε.

Edit: Thinking about it more, something feels weird here, like these numbers don't track at all "how many of these would make me press the lever on the trolley problem vs 1 human" — for one, killing a sleeping person is about as bad as killing an awake person because like the sleeping person is a temporarily-paused-backup for an awake person. I guess I should be thinking about "the universe has budget for one more hour of (good-)experience just before heat death, but it needs to be all same species, how much do I value each?" or something.

Comment by Tamsin Leake (carado-1) on Even if we lose, we win · 2024-01-15T08:38:17.753Z · LW · GW

If you start out with CDT, then the thing you converge to is Son of CDT rather than FDT.
(that arbital page takes a huge amount of time to load for me for some reason, but it does load eventually)

And I could totally see the thing that kills us {being built with} or {happening to crystallize with} CDT rather than FDT.

We have to actually implement/align-the-AI-to the correct decision theory.

Comment by Tamsin Leake (carado-1) on Even if we lose, we win · 2024-01-15T08:32:21.455Z · LW · GW

By thinking about each other's source code, FAI and Clippy will be able to cooperate acausally like Alice and Bob, each turning their future lightcone into 10% utopia, 90% paperclips. Therefore, we get utopia either way! :D

So even if we lose we win, but even if we win we lose. The amount of utopiastuff is exactly conserved, and launching unaligned AI causes timelines-where-we-win to have less utopia by exactly as much as our timeline has more utopia.

The amount of utopiastuff we get isn't just proportional to how much we solve alignment, it's actually back to exactly equal.

See also: Decision theory does not imply that we get to have nice things.

Comment by Tamsin Leake (carado-1) on Which investments for aligned-AI outcomes? · 2024-01-06T13:56:17.233Z · LW · GW

I think it's exceedingly unlikely (<1%) that we robustly prevent anyone from {making an AI that kills everyone} without an aligned sovereign.

Comment by Tamsin Leake (carado-1) on Which investments for aligned-AI outcomes? · 2024-01-06T10:40:32.742Z · LW · GW

I continue to think that, in worlds where we robustly survive, money is largely going to be obsolete. The thing that maximizes the terminal values of the kind of (handshake of) utility functions we can expect probably aren't maximized by maintaining current allocations of wealth and institutions-that-care-about-that-wealth. The use for money/investment/resources is making sure we get utopia in the first place, by slowing capabilities and solving alignment (and thus also plausibly purchasing shares of the LDT utility function handshake), not being rich in utopia. (maybe see also 1, 2)

Comment by Tamsin Leake (carado-1) on Survey of 2,778 AI authors: six parts in pictures · 2024-01-06T10:37:51.432Z · LW · GW

‘high level machine intelligence’ (HLMI) and ‘full automation of labor’ (FAOL)

I continue to believe that predicting things like that is not particularly useful for predicting when AI will achieve decisive strategic advantage and/or kill literally everyone. AI could totally kill literally everyone without us ever getting to observe HLMI or FAOL first, and I think development in HLMI / FAOL does not say much about how close we are to AI that kills literally everyone.

Comment by Tamsin Leake (carado-1) on Does AI care about reality or just its own perception? · 2024-01-05T07:58:33.164Z · LW · GW

Both are possible. For theoretical examples, see the stamp collector for consequentialist AI and AIXI for reward-maximizing AI.

What kind of AI are the AIs we have now? Neither, they're not particularly strong maximizers. (if they were, we'd be dead; it's not that difficult to turn a powerful reward maximizer into a world-ending AI).

If the former, I think this makes alignment much easier. As long as you can reasonably represent “do not kill everyone”, you can make this a goal of the AI, and then it will literally care about not killing everyone, it won’t just care about hacking its reward system so that it will not perceive everyone being dead.

This would be true, except:

  • We don't know how to represent "do not kill everyone"
  • We don't know how to pick which quantity would be maximized by a would-be strong consequentialist maximizer
  • We don't know what a strong consequentialist maximizer would look like, if we had one around, because we don't have one around (because if we did, we'd be dead)

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2024-01-04T07:55:28.917Z · LW · GW

The first one. Alice fundamentally can't fully model Bob because Bob's brain is as large as Alice's, so she can't fit it all inside her own brain without simply becoming Bob.

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2023-12-27T11:25:01.701Z · LW · GW

I remember a character in Asimov's books saying something to the effect of

It took me 10 years to realize I had those powers of telepathy, and 10 more years to realize that other people don't have them.

and that quote has really stuck with me, and keeps striking me as true about many mindthings (object-level beliefs, ontologies, ways-to-use-one's-brain, etc).

For so many complicated problems (including technical problems), "what is the correct answer?" is not-as-difficult to figure out as "okay, now that I have the correct answer: how the hell do other people's wrong answers mismatch mine? what is the inferential gap even made of? what is even their model of the problem? what the heck is going on inside other people's minds???"

Answers to technical questions, once you have them, tend to be simple and compress easily with the rest of your ontology. But not models of other people's minds. People's minds are actually extremely large things that you fundamentally can't fully model and so you're often doomed to confusion about them. You're forced to fill in the details with projection, and that's often wrong because there's so much more diversity in human minds than we imagine.

The most complex software engineering projects in the world are absurdly tiny in complexity compared to a random human mind.

Comment by Tamsin Leake (carado-1) on How "Pause AI" advocacy could be net harmful · 2023-12-26T18:39:40.703Z · LW · GW

I don't think it's a binary; they could still pay less attention!

(plausibly there's a bazillion things constantly trying to grab their attention, so they won't "lock on" if we avoid bringing AI to their attention too much)

Comment by Tamsin Leake (carado-1) on Why does expected utility matter? · 2023-12-26T16:16:13.988Z · LW · GW

You might want to read this post (it's also on lesswrong but the images are broken there)

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2023-12-25T11:48:11.676Z · LW · GW

(to be clear: this is more an amusing suggestion than a serious belief)

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2023-12-24T09:07:58.520Z · LW · GW

By "vaguely like dath ilan" I mean the parts that made them be the kind of society that can restructure in this way when faced with AI risk. Like, even before AI risk, they were already very different from us.

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2023-12-24T00:53:56.283Z · LW · GW

I'm pretty sure we just need one resimulation to save everyone; once we have located an exact copy of our history, it's cheap to pluck out anyone (including people dead 100 or 1000 years ago). It's a one-time cost.

Lossy resurrection is better than nothing but it doesn't feel as "real" to me. If you resurrect a dead me, I expect that she says "I'm glad I exist! But — at least as per my ontology and values — you shouldn't quite think of me as the same person as the original. We're probly quite different, internally, and thus behaviorally as well, when ran over some time."

Like, the full-history resimulation will surely still not allow you to narrow things down to one branch. You'd get an equivalence class of them, each of them consistent with all available information. Which, in turn, would correspond to a probability distribution over the rescuee's mind; not a unique pick.

I feel like I'm not quite sure about this? It depends on what quantum mechanics entails, exactly, I think. For example: if BQP = P, then there's "only a polynomial amount" of timeline-information (whatever that means!), and then my intuition tells me that the "our world serves as a checksum for the one true (macro-)timeline" idea is more likely to be a thing. But this reasoning is still quite heuristical. Plausibly, yeah, the best we get is a polynomially large or even exponentially large distribution.

That said, to get back to my original point, I feel like there's enough unknowns making this scenario plausible here, that some people who really want to get reunited with their loved ones might totally pursue aligned superintelligence just for a potential shot at this, whether their idea of reuniting requires lossless resurrection or not.

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2023-12-23T22:48:15.902Z · LW · GW

(Let's call the dead person "rescuee" and the person who wants to resurrect them "rescuer".)

The procedure you describe is what I call "lossy resurrection". What I'm talking about looks like: you resimulate the entire history of the past-lightcone on a quantum computer, right up until the present, and then either:

  • You have a quantum algorithm for "finding" which branch has the right person (and you select that timeline and discard the rest) (requires that such a quantum algorithm exists)
  • Each branch embeds a copy of the rescuer, and whichever branch looks like the correct one isekai's the rescuer into the branch, right next to the rescuee (and also insta-utopia's the whole branch) (requires that the rescuer doesn't mind having their realityfluid exponentially reduced)

(The present time "only" serves as a "solomonoff checksum" to know which seed / branch is the right one.)

This is O(exp(size of the seed of the universe) * amount of history between the seed and the rescuee). Doable if the seed of the universe is small and either of the two requirements above hold, and if the future has enough negentropy to resimulate the past. (That last point is a new source of doubt for me; I kinda just assumed it was true until a friend told me it might not be.)

(Oh, and also you can't do this if resimulating the entire history of the universe — which contains at least four billion years of wild animal suffering(!) — is unethical.)

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2023-12-23T21:52:15.496Z · LW · GW

Take our human civilization, at the point in time at which we invented fire. Now, compute forward all possible future timelines, each right up until the point where it's at risk of building superintelligent AI for the first time. Now, filter for only timelines which either look vaguely like earth or look vaguely like dath ilan.

What's the ratio between the number of such worlds that look vaguely like earth vs look vaguely like dath ilan? 100:1 earths:dath-ilans ? 1,000,000:1 ? 1:1 ?

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2023-12-23T19:29:31.658Z · LW · GW

Typical user of outside-view epistemics

(actually clipped from this YourMovieSucks video)

Comment by Tamsin Leake (carado-1) on The problems with the concept of an infohazard as used by the LW community [Linkpost] · 2023-12-23T11:25:55.375Z · LW · GW

tbh I kinda gave up on reaching people who think like this :/

My heuristic is that they have too many brainworms to be particularly helpful to the critical parts of worldsaving, and it feels like it'd be unpleasant and not-great-norms to have a part of my brain specialized in "manipulating people with biases/brainworms".

Comment by Tamsin Leake (carado-1) on The problems with the concept of an infohazard as used by the LW community [Linkpost] · 2023-12-23T08:56:54.425Z · LW · GW

Alright, I think I've figured out what my disagreement with this post is.

A field of research pursues the general endeavor of finding out things there are to know about a topic. It consists of building an accurate map of the world, of how-things-work, in general.

A solution to alignment is less like a field of research and more like a single engineering project. A difficult one, for sure! But ultimately, still a single engineering project, for which it is not necessary to know all the facts about the field, but only the facts that are useful.

And small groups/individuals do put together single engineering projects all the time! Including very large engineering projects like compilers, games & game engines, etc.

And, yes, we need solving alignment to be an at least partially nonpublic affair, because some important insights about how to solve alignment will be dual use, and the whole point is to get the people trying to save the world to succeed before the people functionally trying to kill everyone, not to get the people trying to save the world to theoretically succeed if they had as much time as they wanted.

(Also: I believe this post means "exfohazard", not "infohazard")

Comment by Tamsin Leake (carado-1) on How Would an Utopia-Maximizer Look Like? · 2023-12-22T10:23:05.175Z · LW · GW

Being embedded in a fake reality and fooled into believing it's true would be against many people's preferences.

Strongly agree; I have an old, short post about this. See also Contact with reality.

Some people might (under reflection) be locally-caring entities, but most people's preferences are about what reality actually contains, and they (even under reflection) wouldn't want to, for example, press a button that causes them to mistakenly believe that everything is fine.

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2023-12-21T11:51:05.751Z · LW · GW

I'm kinda bewildered at how I've never observed someone say "I want to build aligned superintelligence in order to resurrect a loved one". I guess the set of people who {have lost a loved one they wanna resurrect}, {take the singularity and the possibility of resurrection seriously}, and {would mention this} is… the empty set??

(I have met one person who is glad that alignment would also get them this, but I don't think it's their core motivation, even emotionally. Same for me.)

Comment by Tamsin Leake (carado-1) on Don't Share Information Exfohazardous on Others' AI-Risk Models · 2023-12-21T10:36:43.893Z · LW · GW

Hence, the policy should have an escape clause: You should feel free to talk about the potential exfohazard if your knowledge of it isn't exclusively caused by other alignment researchers telling you of it. That is, if you already knew of the potential exfohazard, or if your own research later led you to discover it.

In an ideal world, it would be good to relax this clause in some way, from a binary to a spectrum. For example: if someone tells me of a hazard that I'm confident I would've discovered on my own one week later, then they only get to dictate my not sharing it for a week. "Knowing" isn't a strict binary; anyone can rederive anything with enough time (maybe) — it's just a question of how long it would've taken me to find it if they didn't tell me. This can even include someone bringing my attention to something I already knew, but which I wouldn't have thought to pay attention to as quickly if they hadn't brought it up.

In the non-ideal world we inhabit, however, it's unclear how fraught it is to use such considerations.

Comment by Tamsin Leake (carado-1) on Don't Share Information Exfohazardous on Others' AI-Risk Models · 2023-12-21T10:32:38.221Z · LW · GW

Pretty sure that's what the "telling you of it" part fixes. Alice is the person who told you of Alice's hazards, so your knowledge is exclusively caused by Alice, and Alice is the person whose model dictates whether you can share them.

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2023-12-18T11:12:35.205Z · LW · GW

(Epistemic status: Not quite sure)

Realityfluid must normalize for utility functions to work (see 1, 2). But this is a property of the map, not the territory.

Normalizing realityfluid is a way to point to an actual (countably) infinite territory using a finite (conserved-mass) map object.

Comment by Tamsin Leake (carado-1) on OpenAI, DeepMind, Anthropic, etc. should shut down. · 2023-12-17T20:44:43.189Z · LW · GW

Seems right. In addition, if there was some person out there waiting to make a new AI org, it's not like they're waiting for the major orgs to shut down to compete.

Shutting down the current orgs does not fully solve the problem, but it surely helps a lot.

Comment by Tamsin Leake (carado-1) on Alignment work in anomalous worlds · 2023-12-16T20:54:30.858Z · LW · GW

Oh, yeah, you're right.

Comment by Tamsin Leake (carado-1) on The convergent dynamic we missed · 2023-12-14T07:38:36.394Z · LW · GW

I still do not agree with your position, but thanks to this post I think I at least understand it better than I did before. I think my core disagreements are:

Here is the catch: AGI components interacting to maintain and replicate themselves are artificial. Their physical substrate is distinct from our organic human substrate.

That needn't be the case. If all of the other arguments in this post were to hold, any AI or AI-coalition (whether aligned to us or not) which has taken over the world could simply notice "oh no, if I keep going I'll be overtaken by the effects described in Remmelt's post!" and then decide to copy itself onto biological computing or nanobots or whatever other strange options it can think of. An aligned AI would move towards such a substrate even more readily if you're correct that otherwise humans would die, because it wants to avoid that.

The more general piece of solutionspace I want to point to, here, is: "if you think there's a way for eight billion uncoordinated human minds running on messy human brains inside of industrial civilization to survive, why couldn't an aligned superintelligent AI at the very least {implement/reuse} a copy of what human civilization is doing, and get robustness that way?" (Though I expect that it could come up with much better.)

Another argument you may have heard is that the top-down intelligent engineering by goal-directed AGI would beat the bottom-up selection happening through this intelligent machinery.

That argument can be traced back to Eliezer Yudkowsky's sequence The Simple Math of Evolution.

I'm pretty sure I already believed this before reading any Yudkowsky, so I'll make my own argument here.

Intelligent engineering can already be observed to work much faster than selection effects. It also seems straightforward to me that explicit planning to maximize a particular utility function would be expected to steer the world towards what it wants a lot faster than selection effects would. I could maybe expand on this point if you disagree, but I'd be really surprised by that.

And intelligence itself can be very robust to selection effects. Homomorphic encryption, checksums, and schemes like {a large population of copies of itself repeatedly checking each other's entire software state and deactivating (via e.g. a killswitch) any instance that has been corrupted} are examples of technologies an AI can use to make its software robust to hardware corruption. Under such schemes, it would take selection effects exponential time to get even just one bit of corruption to stay in the system, so it would not be difficult for the superintelligent AI to ensure that approximately zero copies of itself ever get robustly corrupted before the heat death of the universe.
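(A minimal illustrative sketch of the cross-checking scheme, with hypothetical names and a plain hash standing in for a full integrity check: each replica hashes its entire state, the majority hash is treated as canonical, and any deviating replica is flagged for deactivation.)

```python
import hashlib
from collections import Counter

def state_hash(state: bytes) -> str:
    # Checksum of a replica's entire software state.
    return hashlib.sha256(state).hexdigest()

def corrupted_replicas(states: list[bytes]) -> list[int]:
    # The hash held by the majority of replicas is treated as canonical;
    # any replica whose hash deviates is flagged (e.g. for a killswitch).
    hashes = [state_hash(s) for s in states]
    canonical, _ = Counter(hashes).most_common(1)[0]
    return [i for i, h in enumerate(hashes) if h != canonical]

# Example: replica 2 has been corrupted.
fleet = [b"agent-v1"] * 5
fleet[2] = b"agent-v1-corrupted"
print(corrupted_replicas(fleet))  # → [2]
```

For a single flipped bit to persist, it would have to simultaneously evade every peer's check, which is what makes the corruption-survival probability shrink exponentially with the number of checking copies.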

These fall outside the limits of what the AGI's actual built-in detection and correction methods could control for.

Would it? Even once it has nanobots and biotech and any other strange weird tech it can use to maintain whichever parts of itself (if any) match those descriptions?

We humans too depend on highly specific environmental conditions and contexts for the components nested inside our bodies (proteins→organelles→cells→cell lining→) to continue in their complex functioning, such to be maintaining of our overall existence.

Finally, as a last recourse if the rest of your post is true, an aligned AI which has taken over the world can simply upload humans so they don't die when the physical conditions become too bad. We can run on the same compute as its software does, immune to corruption from hardware in the same way.

As an alternative, an aligned superintelligent AI could use only planets (or other celestial bodies) which we don't live on to run the bulk of its infrastructure, ensuring "from a distance" (through still very reliable tech that can be made to not get in the way of human life) that planets with humans on them don't launch an AI which would put the aligned superintelligent AI at risk.

Finally, note that these arguments are mostly disjunctive. Even just one way for an aligned superintelligence to get around this whole argument you're making would be sufficient to make it wrong. My thoughts above are not particularly my predictions for what an aligned superintelligence would actually do, but more so "existence arguments" that ways to get around this exist at all; I expect that an aligned superintelligence can come up with much better solutions than I can.

If there truly is no way at all for an aligned superintelligence to exist without humans dying, then (as I've mentioned before), it can just notice that and shut itself down, after spending much-less-than-500-years rearranging the world into one that is headed towards a much better direction (through eg widespread documentation of the issues with building AI and widespread training in rationality).

Comment by Tamsin Leake (carado-1) on Some biases and selection effects in AI risk discourse · 2023-12-14T01:22:59.137Z · LW · GW

My current belief is that you do make some update upon observing that you exist; you just don't update as much as you would if we were somehow able to survive and observe unaligned AI taking over. I do agree that "no update at all, because you can't see the counterfactual" is wrong, but anthropics is still somewhat filtering your evidence; you should update less.

(I don't have my full reasoning for {why I came to this conclusion} fully loaded rn, but I could probably do so if needed. Also, I only skimmed your post, sorry. I have a post on updating under anthropics with actual math I'm working on, but unsure when I'll get around to finishing it.)

Comment by Tamsin Leake (carado-1) on AI Views Snapshots · 2023-12-13T19:59:16.570Z · LW · GW

I really like this! (here's mine)

A few questions:

  • The first time AI reaches STEM+ capabilities (if that ever happens), it will disempower humanity within three months

    So this is asking for P(fasttakeoff and unaligned | STEM+) ? It feels weird that it's asking for both. Unless you count aligned-AI-takeover as "disempowering" humanity. Asking for either P(fasttakeoff | STEM+) or P(fasttakeoff | unaligned and STEM+) would make more sense, I think.

  • Do you count aligned-AI-takeover (where an aligned AI takes over everything and creates an at-least-okay utopia) as "disempowering humanity"?

  • "reasonable and informed" is doing a lot of work here — is that left to the reader, or should there be some notion of roughly how many people you expect that to be? I think that, given the definitions I filled my chart with, I would say that there are <1000 people on earth right now who fit this description (possibly <100).

Comment by Tamsin Leake (carado-1) on Some biases and selection effects in AI risk discourse · 2023-12-13T00:39:15.679Z · LW · GW

Okay yeah this is a pretty fair response actually. I think I still disagree with the core point (that AI aligned to current people-likely-to-get-AI-aligned-to-them would be extremely bad) but I definitely see where you're coming from.

Do you actually believe extinction is preferable to rolling the dice on the expected utility (according to your own values) of what happens if one of the current AI org people launches AI aligned to themself?

Even if, in worlds where we get an AI aligned to a set of values that you would like, that AI then acausally pays AI-aligned-to-the-"wrong"-values in different timelines to not run suffering? e.g. Bob's AI runs a bunch of things Alice would like in Bob's AI's timelines, in exchange for Alice's AI not running things Bob would very strongly dislike.

Comment by Tamsin Leake (carado-1) on Tamsin Leake's Shortform · 2023-12-12T23:53:51.065Z · LW · GW

I'm a big fan of Rob Bensinger's "AI Views Snapshot" document idea. I recommend people fill out their own before anchoring on anyone else's.

Here's mine at the moment: