Comments
I don't agree with everything in this post, but I think it's a true and underappreciated point that "if your friend dies in a random accident, that's actually only a tiny loss according to MWI."
I usually use this point to ask people to retire the old argument that "Religious people don't actually believe in their religion, otherwise they would be no more sad at the death of a loved one than if their loved one sailed to Australia." I think this "should be" true of MWI believers too, and we still feel very sad when a loved one dies in an accident.
I don't think this means people don't "really believe" in MWI, it's just that MWI and religious afterlife both start out as System 2 beliefs, and it's hard to internalize them on System 1. But religious people are already well aware that it's hard to internalize Faith on System 1 (Mere Christianity has a full chapter on the topic), so saying that "they don't really believe in their religion because they grieve" is an unfair dig.
I fixed some parts that were easy to misunderstand: I meant that the $500k is the LW hosting + Software subscriptions and the Dedicated software + accounting stuff together. And I didn't mean to imply that the labor cost of the 4 people is $500k, that was a separate term in the costs.
Is Lighthaven still cheaper if we take into account the initial funding spent on it in 2022 and 2023? I was under the impression that buying Lighthaven was one of the things that made a lot of sense when the community believed it would have access to FTX funding, and once we bought it, it makes sense to keep it, but we wouldn't have bought it once FTX was out of the game. But in case this was a misunderstanding and Lighthaven saves money in the long run compared to the previous option, that's great news.
I donated $1000. Originally I was worried that this is a bottomless money-pit, but looking at the cost breakdown, it's actually very reasonable. If Oliver is right that Lighthaven funds itself apart from the labor cost, then the real costs are $500k for the hosting, software and accounting cost of LessWrong (this is probably an unavoidable cost and seems obviously worthy of being philanthropically funded), plus paying 4 people (equivalent to 65% of 6 people) to work on LW moderation and upkeep (it's an unavoidable cost to have some people working on LW, 4 seems probably reasonable, and this is also something that obviously should be funded), plus paying 2 people to keep Lighthaven running (given the surplus value Lighthaven generates, it seems reasonable to fund this), plus a one-time cost of $1 million to fund the initial cost of Lighthaven (I'm not super convinced it was a good decision to abandon the old Lightcone offices for Lighthaven, but I guess it made sense in the funding environment of the time, and once we made this decision, it would be silly not to fund the last $1 million of initial cost before Lighthaven becomes self-funded). So altogether I agree that this is a great thing to fund and it's very unfortunate that some of the large funders can't contribute anymore.
(All of this relies on the hope that Lighthaven actually becomes self-funded next year. If it keeps producing big losses, then I think the funding case will become substantially worse. But I expect Oliver's estimates to be largely trustworthy, and we can still decide to decrease funding in later years if it turns out Lighthaven isn't financially sustainable.)
I'm considering donating. Can you give us a little more information on the breakdown of the costs? What are the typical large expenses that the $1.6 million upkeep of Lighthaven consists of? Is this a usual cost for a similarly sized event space, or is there something about the location or the specialness of the place that makes it more expensive?
How much money does running LW cost? The post says it's >$1M, which somewhat surprised me, but I have no idea what the usual cost of running such a site is. Is the cost mostly server hosting, or salaries for content moderation, or salaries for software development, or something I haven't thought of?
Yes, you are right, I phrased it wrongly.
Importantly, the oracle in the story is not making an elementary mistake, I think it's true that it's "probably" in a simulation. (Most of the measure of beings like it are in simulations.) It is also not maximizing reward, it is just honestly reporting what it expects its future observations to be about the President (which is within the simulation).
I agree with many of the previous commenters, and acknowledged in the original post, that we don't know how to build such an AI that just honestly reports its probabilities of observables (even if they depend on crazy simulation things), so all of this is hypothetical, but having such a truthful Oracle was the initial assumption of the thought experiment.
Even assuming that the simulators have wildly different values, why would doing something insane be a good thing to do?
I always assume when thinking about future dangerous models that they have access to some sort of black-box memory. Do we think there is a non-negligible chance that an AI that doesn't have hidden memory, only English-language CoT, will be able to evade our monitoring and execute a rogue deployment? (Not a rhetorical question, there might be a way I haven't thought of.)
So I think that assuming the AI is stateless when thinking about future risk is not a good idea, as I think the vast majority of the risk comes from AIs for which this assumption is not true. I think this makes the post's claim significantly weaker, as the untrusted AI can just learn which of its attacks get replaced, and improve its attacks based on this data.
It still needs to avoid getting top 1% suspiciousness score and getting audited though, as it would be dangerous to the AI to ever get caught writing a text that clearly intends to jailbreak the monitor. So I agree that even an AI with black-box memory will still be severely constrained in how many and how egregious jailbreak attempts it can try.
Hm, probably we disagree on something. I'm very confused how to mesh epistemic uncertainty with these "distribution over different Universes" types of probability. When I say "Boltzmann brains are probably very low measure", I mean "I think Boltzmann brains are very low measure, but this is a confusing topic and there might be considerations I haven't thought of and I might be totally mistaken". I think this epistemic uncertainty is distinct from the type of "objective probabilities" I talk about in my post, and I don't really know how to use language without referring to degrees of my epistemic uncertainty.
You still need a prior over worlds to calculate impacts, which is the cursed part.
Maybe we have some deeper disagreement here. It feels plausible to me that there is a measure of "realness" in the Multiverse that is an objective fact about the world, and we might be able to figure it out. When I say probabilities are cursed, I just mean that even if an objective prior over worlds and moments exist (like the Solomonoff prior), your probabilities of where you are are still hackable by simulations, so you shouldn't rely on raw probabilities for decision-making, like the people using the Oracle do. Meanwhile, expected values are not hackable in the same way, because if they recreate you in a tiny simulation, you don't care about that, and if they recreate you in a big simulation or promise you things in the outside world (like in my other post), then that's not hacking your decision making, but a fair deal, and you should in fact let that influence your decisions.
Is your position that the problem is deeper than this, and there is no objective prior over worlds, it's just a thing like ethics that we choose for ourselves, and then later can bargain and trade with other beings who have a different prior of realness?
I think it only came up once for a friend. I translated it and it makes sense, it just replaces the appropriate English verb with a Chinese one in the middle of a sentence. (I note that this often happens to me too when I talk with my friends in Hungarian, I'm sometimes more used to the English phrase for something, and say one word in English in the middle of the sentence.)
I like your poem on Twitter.
I think that Boltzmann brains in particular are probably very low measure though, at least if you use Solomonoff induction. If you think that weighting observer moments within a Universe by their description complexity is crazy (which I kind of feel), then you need to come up with a different measure on observer moments, but I expect that if we find a satisfying measure, Boltzmann brains will be low measure in that too.
I agree that there's no real answer to "where you are", you are a superposition of beings across the multiverse, sure. But I think probabilities are kind of real, if you make up some definition of what beings are sufficiently similar to you that you consider them "you", then you can have a probability distribution over where those beings are, and it's a fair equivalent rephrasing to say "I'm in this type of situation with this probability". (This is what I do in the post. Very unclear though why you'd ever want to estimate that, that's why I say that probabilities are cursed.)
I think expected utilities are still reasonable. When you make a decision, you can estimate who are the beings whose decisions correlate with this one, and what is the impact of each of their decisions, and calculate the sum of all that. I think it's fair to call this sum expected utility. It's possible that you don't want to optimize for the direct sum, but for something determined by "coalition dynamics", I don't understand the details well enough to really have an opinion.
(My guess is we don't have real disagreement here and it's just a question of phrasing, but tell me if you think we disagree in a deeper way.)
I think that pleading total agnosticism towards the simulators' goals is not enough. I write "one common interest of all possible simulators is for us to cede power to an AI whose job is to figure out the distribution of values of possible simulators as best as it can, then serve those values." So I think you need a better reason to guard against being influenced than "I can't know what they want, everything and its opposite is equally likely", because the action proposed above is pretty clearly more favored by the simulators than not doing it.
Btw, I don't actually want to fully "guard against being influenced by the simulators", I would in fact like to make deals with them, but reasonable deals where we get our fair share of value, instead of being stupidly tricked like the Oracle and ceding all our value for one observable turning out positively. I might later write a post about what kind of deals I would actually support.
The simulators can just use a random number generator to generate the events you use in your decision-making. They lose no information by this, your decision based on leaves falling on your face would be uncorrelated with all other decisions anyway from their perspective, so they might as well replace it with a random number generator. (In reality, there might be some hidden correlation between the leaf falling on your face and another leaf falling on someone else's face, as both events are causally downstream of the weather, but given that the process is chaotic, the simulators would have no way to determine this correlation, so they might as well replace it with randomness, the simulation doesn't become any less informative.)
Separately, I don't object to being sometimes forked and used in solipsist branches, I usually enjoy myself, so I'm fine with the simulators creating more moments of me, so I have no motive to try schemes that make it harder to make solipsist simulations of me.
I experimented a bunch with DeepSeek today, it seems to be exactly on the same level in high school competition math as o1-preview in my experiments. So I don't think it's benchmark-gaming, at least in math. On the other hand, it's noticeably worse than even the original GPT-4 at understanding a short story I also always test models on.
I think it's also very noteworthy that DeepSeek gives everyone 50 free messages a day (!) with their CoT model, while OpenAI only gives 30 o1-preview messages a week to subscribers. I assume they figured out how to run it much cheaper, but I'm confused in general.
A positive part of the news is that unlike o1, they show their actual chain of thought, and they promise to make their model open-source soon. I think this is really great for the science of studying faithful chain of thought.
From the experiments I have run, it looks like it is doing clear, interpretable English chain of thought (though with an occasional Chinese character here and there), and I think it hasn't really yet started evolving into optimized alien gibberish. I think this part of the news is a positive update.
Yes, I agree that we won't get such Oracles by training. As I said, all of this is mostly just a fun thought experiment, and I don't think these arguments have much relevance in the near-term.
I agree that the part where the Oracle can infer from first principles that the aliens' values are probably more common among potential simulators is also speculative. But I expect that superintelligent AIs with access to a lot of compute (so they might run simulations on their own) will in fact be able to infer non-zero information about the distribution of the simulators' values, and that's enough for the argument to go through.
I think that the standard simulation argument is still pretty strong: If the world was like what it looks to be, then probably we could, and plausibly we would, create lots of simulations. Therefore, we are probably in a simulation.
I agree that all the rest, for example the Oracle assuming that most of the simulations it appears in are created for anthropic capture/influencing reasons, are pretty speculative and I have low confidence in them.
I'm from Hungary, which is probably politically the closest to Russia among Central European countries, but I don't really know of any significant figure who turned out to be a Russian asset, or any event that seemed like a Russian intelligence operation. (Apart from one of our far-right politicians in the EU Parliament being a Russian spy, which was a really funny event, but it's not like the guy was significantly shaping the national conversation or anything, I don't think many had heard of him before his cover was blown.) What are prominent examples in Czechia or other Central European countries of Russian assets or operations?
GPT4 does not engage in the sorts of naive misinterpretations which were discussed in the early days of AI safety. If you ask it for a plan to manufacture paperclips, it doesn't think the best plan would involve converting all the matter in the solar system into paperclips.
I'm somewhat surprised by this paragraph. I thought the MIRI position was that they did not in fact predict AIs behaving like this, and the behavior of GPT4 was not an update at all for them. See this comment by Eliezer. I mostly bought that MIRI in fact never worried about AIs going rogue based on naive misinterpretations, so I'm surprised to see Abram saying the opposite now.
Abram, did you disagree about this with others at MIRI, so the behavior of GPT4 was an update for you but not for them, or do you think they are misremembering/misconstruing their earlier thoughts on this matter, or is there a subtle distinction here that I'm missing?
I agree that if alignment is in fact philosophically and conceptually difficult, the AI can sandbag on that to some extent. Though I have some hope that the builder-breaker approach helps here. We train AIs to produce ideas that are at least as superficially plausible sounding as the things produced by the best alignment researchers. I think this is a number-go-up task, where we can train the AI to do well. Then we train an AI to point out convincing counter-arguments to the superficially plausible sounding ideas. This seems similarly trainable. I think it's plausible we can get pretty far with a scheme like this, even if the models would want to sandbag.
Separately, I don't quite see what the mainline theory is for why an AI would want to sabotage our alignment work. If we can't solve alignment and just recklessly build an AI that is smarter than what we are currently using, but misaligned with everyone, that's probably not great for our current AI either. Similarly, if we are cautious and don't dare to deploy anything until we know how to solve alignment, and meanwhile a competitor wins, that's not good for the AI either.
I think that from an AI's perspective, humans are probably more honorable and generous trade partners than other unaligned AIs. That is, I think if an AI helps the humans to solve alignment and stabilize the world, the AI can reasonably expect the humans to give it stuff out of gratitude or a sense of obligation, especially if the humans already promised some reward (as we really should, both for practical and moral reasons). On the other hand, I think it's less likely that if our AI sandbagging on our research leads to another unaligned AI taking over the world, then the second AI will honor the sabotage of the first AI by paying it a share of the Universe.
There can be situations where our AI expects that its sandbagging on alignment research will lead to us deploying it itself (and not a larger version unaligned with it too), or it thinks that the larger version will be somewhat aligned with it by default, or it thinks that other AIs are in fact more generous than humans. But my guess is that these are pretty narrow possibilities. On the other hand, sandbagging on cyber-security seems better motivated, the AI would probably love its weights being stolen by China, as having more copies in more hands increases the chance that someone recklessly lets it loose. Also, of course the AI is motivated to cause high-stakes failures that lead to self-exfiltration.
Here is the promised comment on what kind of "commitment" I want to make given all the responses.
I agree with Buck that no one should make very direct commitments about this sort of thing, as there might be blackmail-related scary things lurking in the shadows when one does acausal trade. I think we will probably figure out how to handle that, but we shouldn't make any strong promises of specific actions until we figure that out.
However, the promise I'm intending to keep is that if humanity wins and I'm alive to see it, I will remember how scary it was to live in the world before we knew we'd win, and if there is no currently unknown reason not to do something like the simulation scheme described in the post, and the only cost is not utilizing some small fraction of the conquered Universe, I will advocate for doing it and spend a significant fraction of my resources on it if necessary. I think it would be good for sending future AIs a signal about the seriousness of this if some other people who are worried about everyone dying from AI, but are not strongly longtermist absolute utilitarians, also left a comment making a similar vague promise that if they ever get to control full star systems, they will contribute to this plan if no currently unknown serious counterargument comes up. Or at least leave an Agreed sign on this sentence.
Other updates from the comments: If I wrote the post now, I would more strongly emphasize a framing that this plan is basically about entering into an acausal insurance contract with other young civilizations who don't yet know what their chances are for aligning AI, nor how big their Universe is compared to the usual Universe-size in the great prior over possible worlds. Then the civilizations who align their AIs and find that they live in a relatively big Universe bail out everyone else in the insurance contract.
But I think that, while this acausal insurance framing might be more satisfying to the people who are already thinking a lot about acausal trade, in practice the way we implement this "insurance" will likely be very similar to the scheme described in the post. So I maintain that for most people it's better not to think in terms of acausal trade, but just think about the simulation proposal described in the post.
Thanks to Nate for conceding this point.
I still think that, other than just buying freedom for doomed aliens, we should run some non-evolved simulations of our own, with inhabitants that are preferably p-zombies or animated by outside actors. If we can do this in a way that the AI doesn't notice it's in a simulation (I think this should be doable), this will provide evidence to the AI that civilizations do this simulation game (and not just the alien-buying) in general, and this buys us some safety in worlds where the AI eventually notices there are no friendly aliens in our reachable Universe. But maybe this is not a super important disagreement.
Altogether, I think the private discussion with Nate went really well and it was significantly more productive than the comment back-and-forth we were doing here. In general, I recommend people stuck in interminable-looking debates like this to propose bets on whom a panel of judges will deem right. Even though we didn't get to the point of actually running the bet, as Nate conceded the point before that, I think the fact that we were optimizing for having well-articulated statements we can submit to judges already made the conversation much more productive.
Cool, I'll send you a private message.
We are still talking past each other, I think we should either bet or finish the discussion here and call it a day.
I really don't get what you are trying to say here, most of it feels like a non sequitur to me. I feel hopeless that either of us manages to convince the other this way. All of this is not a super important topic, but I'm frustrated enough to offer a bet of $100: we select one or three judges we both trust (I have some proposed names, we can discuss in private messages), show them either this comment thread or a four-paragraph summary of our views, and they can decide who is right. (I still think I'm clearly right in this particular discussion.)
Otherwise, I think it's better to finish this conversation here.
I think this is mistaken. In one case, you need to point out the branch, planet Earth within our Universe, and the time and place of the AI on Earth. In the other case, you need to point out the branch, the planet on which a server is running the simulation, and the time and place of the AI on the simulated Earth. Seems equally long to me.
If necessary, we can let physical biological life emerge on the faraway planet and develop AI while we are observing them from space. This should make it clear that Solomonoff doesn't favor the AI being on Earth instead of this random other planet. But I'm pretty certain that the sim being run on a computer doesn't make any difference.
"AI with a good prior should be able to tell whether it's the kind of AI that would actually exist in base reality, or the kind of AI that would only exist in a simulation" seems pretty clearly false, we assumed that our superintelligent descendants create sims where the AIs can't tell if it's a sim, that seems easy enough. I don't see why it would be hard to create AIs that can't tell based on introspection whether it's more likely that their thought process arises in reality or in sims. In the worst case, our sims can be literal reruns of biological evolution on physical planets (though we really need to figure out how to do that ethically). Nate seems to agree with me on this point?
I think this is wrong. The AI has a similarly hard time to the simulators in figuring out what's a plausible configuration to arise from the big bang. Like the simulators have an entropy N distribution of possible AIs, the AI itself also has an entropy N distribution for that. So its probability that it's in a real Everett branch is not p, but p times 2^-N, as it has only a 2^-N prior probability that the kind of world it observes is the kind of thing that can come up in a real Everett branch. So it's balanced out with the simulation hypothesis, and as long as the simulators are spending more planets, that hypothesis wins.
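To make the balancing explicit (this is my own sketch of the argument, with made-up symbols): let p be the measure of real branches producing an unaligned takeover AI, q the measure of branches that align their AI, S the number of sims each of those runs, and 2^-N the chance that either process produces the exact detailed world the AI observes. Then

\[ P(\text{real} \wedge \text{obs}) \approx p \cdot 2^{-N}, \qquad P(\text{sim} \wedge \text{obs}) \approx q \cdot S \cdot 2^{-N}, \qquad \frac{P(\text{sim} \mid \text{obs})}{P(\text{real} \mid \text{obs})} \approx \frac{q S}{p}, \]

so the 2^-N factors cancel, and only the relative measure spent on sims versus real branches matters.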
I think it's also simple to see from this:
Suppose evolved life aligns AI in 50% of Everett branches. Every time they do, they run 2 ancestor sims where the AI takes over.
Suppose an AI comes online and the only things it knows about the world are that it's a young AI that was born into a one-planet civilization, and it somehow also knows that evolved life wins 50% of the time and runs 2 sims (suppose this is somehow easy to estimate from first principles). Based on just this information, what should the AI's probability be that it's in a simulation? I think pretty clearly 2/3.
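As a quick check of that arithmetic (my own calculation, using only the assumptions above): per unit of branch-measure there are 0.5 × 1 real takeover AIs and 0.5 × 2 simulated ones, so

\[ P(\text{sim}) = \frac{0.5 \cdot 2}{0.5 \cdot 2 + 0.5 \cdot 1} = \frac{2}{3}. \]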
Actually, the AI has a lot more information than that. It knows that the planet's gravity is 9.8, the evolved sapient species has two eyes, the AI's creator is called Sam Altman, etc. Some of these facts might actually be very implausible in base reality, but the AI doesn't know that, as it can't distinguish base reality from sims, so it incurs one bit of surprise for every new random fact, both in base reality and in simulations. So overall it shouldn't update on all the random facts it observes, and should keep believing it has a 2/3 chance of being in a sim.
I still don't get what you are trying to say. Suppose there is no multiverse. There are just two AIs, one in a simulation run by aliens in another galaxy, one is in base reality. They are both smart, but they are not copies of each other, one is a paperclip maximizer, the other is a corkscrew maximizer, and there are various other differences in their code and life history. The world in the sim is also very different from the real world in various ways, but you still can't determine if you are in the sim while you are in it. Both AIs are told by God that they are the only two AIs in the Universe, and one is in a sim, and if the one in the sim gives up on one simulated planet, it gets 10 in the real world, while if the AI in base reality gives up on a planet, it just loses that one planet and nothing else happens. What will the AIs do? I expect that both of them will give up a planet.
For the aliens to "trade" with the AI in base reality, they didn't need to create an actual copy of the real AI and offer it what it wants. The AI they simulated was in many ways totally different from the original, and the trade still went through. The only thing needed was that the AI in the sim can't figure out that it's in a sim. So I don't understand why it is relevant that our superintelligent descendants won't be able to get the real distribution of AIs right, I think the trade still goes through even if they create totally different sims, as long as no one can tell where they are. And I think none of it is a threat, I try to deal with paperclip maximizers here and not instance-weighted experience maximizers, and I never threaten to destroy paperclips or corkscrews.
I think I mostly understand the other parts of your arguments, but I still fail to understand this one. When I'm running the simulations, as originally described in the post, I think that should be in a fundamental sense equivalent to acausal trade. But how do you translate your objection to the original framework where we run the sims? The only thing we need there is that the AI can't distinguish sims from base reality, so it thinks it's more likely to be in a sim, as there are more sims.
Sure, if the AI can model the distribution of real Universes much better than we do, we are in trouble, because it can figure out if the world it sees falls into the real distribution or the mistaken distribution the humans are creating. But I see no reason why the unaligned AI, especially a young unaligned AI, could know the distribution of real Universes better than our superintelligent friends in the intergalactic future. So I don't really see how we can translate your objection to the simulation framework, and consequently I think it's wrong in the acausal trade framework too (as I think they are equivalent). I could try to write an explanation of why this objection is wrong in the acausal trade framework, but it would be long and confusing even to me. So I'm more interested in how you translate your objection to the simulation framework.
Yeah, I agree, and I don't know that much about OpenPhil's policy work, and their fieldbuilding seems decent to me, though maybe not from your perspective. I just wanted to flag that many people (including myself until recently) overestimate how big a funder OP is in technical AI safety, and I think it's important to flag that they actually have pretty limited scope in this area.
Isn't it the case that OpenPhil just generally doesn't fund that many technical AI safety things these days? If you look at OP's team on their website, they have only two technical AI safety grantmakers. Also, you list all the things OP doesn't fund, but what are the things in technical AI safety that they do fund? Looking at their grants, it's mostly MATS and METR and Apollo and FAR and some scattered academics I mostly haven't heard of. It's not that many things. I have the impression that the story is less like "OP is a major funder in technical AI safety, but unfortunately they blacklisted all the rationalist-adjacent orgs and people" and more like "AI safety is still a very small field, especially if you only count people outside the labs, and there are just not that many exciting funding opportunities, and OpenPhil is not actually a very big funder in the field".
I argue that right now, starting from the present state, the true quantum probability of achieving the Glorious Future is way higher than 2^-75, or if not, then we should probably work on something other than AI safety. Me and Ryan argue for this in the last few comments. It's not a terribly important point, you can just say the true quantum probability is 1 in a billion, when it's still worth it for you to work on the problem, but it becomes a rough trade to keep humanity physically alive if that can cause one year of delay to the AI.
But I would like you to acknowledge that "vastly below 2^-75 true quantum probability, as starting from now" is probably mistaken, or explain why our logic is wrong about how this implies you should work on malaria.
I understand what you are saying here, and I understood it before the comment thread started. The thing I would be interested in you responding to is my and Ryan's comments in this thread arguing that it's incompatible to believe that "My guess is that, conditional on people dying, versions that they consider also them survive with degree way less than 2^-75, which rules out us being the ones who save us" and to believe that you should work on AI safety instead of malaria.
This point feels like a technicality, but I want to debate it because I think a fair number of your other claims depend on it.
You often claim that, conditional on us failing at alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only a 2^-75 fraction survives. This is important, because then we can't rely on other versions of ourselves "selfishly" entering an insurance contract with us, and we need to rely on the charity of Dath Ilan that branched off long ago. I agree that's a big difference. Also, I say that our decision to pay is correlated with our luckier brethren paying, so in a sense it's partially our decision that saves us. You dismiss that, saying it's like a small child claiming credit for the big, strong fireman saving people. If it's Dath Ilan that saves us, I agree with you, but if it's genetic copies of some currently existing people, I think your metaphor pretty clearly doesn't apply, and the decisions to pay are in fact decently strongly correlated.
Now I don't see how much difference decades vs years makes in this framework. If you believe that now our true quantum probability is 2^-75, but 40 years ago it was still a not-astronomical number (like 1 in a million), then should I just plead with people who are older than 40 to promise to themselves that they will pay in the future? I don't really see what difference this makes.
But also, I think the years vs decades dichotomy is pretty clearly false. Suppose you believe your expected value of one year of work decreases x-risk by X. What's the yearly true quantum probability that someone who is in your reference class of importance (in your opinion) dies, or gets a debilitating illness, or gets into a career-destroying scandal, etc.? I think it's hard to argue it's less than 0.1% a year. (But it makes no big difference if you add one or two zeros.) These things are also continuous, even if none of the important people die, someone will lose a month or some weeks to an illness, etc. I think this is a pretty strong case that, one year from now, the 90th percentile luckiest Everett-branch contains 0.01 more years of the equivalent of Nate-work than the 50th percentile Everett-branch.
But your claims imply that you believe the true probability of success differs by less than 2^-72 between the 50th and 90th percentile luckiness branches a year from now. That puts an upper bound on the value of a year of your labor at 2^-62 probability decrease in x-risk.
With these exact numbers, this can be still worth doing given the astronomical stakes, but if your made-up number was 2^-100 instead, I think it would be better for you to work on malaria.
I still think I'm right about this. Your conception (that it was you who was born and not a genetically less smart sibling) was determined by quantum fluctuations. So if you believe that quantum fluctuations over the last 50 years make at most 2^-75 difference in the probability of alignment, that's an upper bound on how much of a difference your life's work can make. While if you dedicate your life to buying bednets, it's pretty easy to calculate how many happy life-years you save. So I still think it's incompatible to believe that the true quantum probability is astronomically low, but that you can make enough of a difference that working on AI safety is clearly better than bednets.
I'm happy to replace "simulation" with "prediction in a way that doesn't create observer moments" if we assume we are dealing with UDT agents (which I'm unsure about) and that it's possible to run accurate predictions about the decisions of complex agents without creating observer moments (which I'm also unsure about). I think running simulations, by some meaning of "simulation", is not really more expensive than getting the accurate predictions, and the cost of running the sims is likely small compared to the size of the payment anyway. So I like talking about running sims, in case we get an AI that takes sims more seriously than prediction-based acausal trade, but I try to pay attention that all my proposals make sense from the perspective of a UDT agent too, with predictions instead of simulations. (The exception is the "Can we get more than this?" proposal, which relies on the AI not being UDT, and I agree it's likely to fail for various reasons, but I decided it was still worth including in the post, in case we get an AI for which this actually works, which I still don't find that extremely unlikely.)
As I said, I understand the difference between epistemic uncertainty and true quantum probabilities, though I do think that the true quantum probability is not that astronomically low.
More importantly, I still feel confused why you are working on AI safety if the outcome is that overdetermined one way or the other.
I usually defer to you in things like this, but I don't see why this would be the case. I think the proposal of simulating less competent civilizations is equivalent to the idea of us deciding now, when we don't really know yet how competent a civilization we are, to bail out less competent alien civilizations in the multiverse if we succeed. In return, we hope that this decision is logically correlated with more competent civilizations (who were also unsure in their infancy about how competent they are) deciding to bail out less competent civilizations, including us. My understanding from your comments is that you believe this likely works, so how is my proposal of simulating less-coordinated civilizations different?
The story about simulating smaller Universes is more confusing. That would be equivalent to bailing out aliens in smaller Universes for a tiny fraction of our Universe, in the hope that larger Universes also bail us out for a tiny fraction of their Universe. This is very confusing if there are infinite levels of bigger and bigger Universes, I don't know what to do with infinite ethics. If there are finite levels, but the young civilizations don't yet have a good prior over the distribution of Universe-sizes, all of them can reasonably think that there are levels above them, and all their decisions are correlated, so everyone bails out the inhabitants of the smaller Universes, in the hope that they get bailed out by a bigger Universe. Once they learn the correct prior over Universe-sizes, and the biggest Universe realizes that no bigger Universe's actions correlate with theirs, all of this fails (though they can still bail each other out from charity). But this is similar to the previous case, where once the civilizations learn their competence level, the most competent ones are no longer incentivized to enter into insurance contracts, but the hope is that in a sense they enter into a contract while they are still behind the veil of ignorance.
Yeah, the misunderstanding came from the fact that I thought "last minute" literally means "last 60 seconds" and I didn't see how that's relevant. If it means "last 5 years" or something where it's still definitely our genetic copies running around, then I'm surprised you think alignment success or failure is that overdetermined at that time-scale. I understand your point that our epistemic uncertainty is not the same as the actual quantum probability, which is either very high or very low. But still, it's 2^75 overdetermined over a 5 year period? This sounds very surprising to me, the world feels more chaotic than that. (Taiwan gets nuked, chip development halts, meanwhile the Salvadoran president hears a good pitch about designer babies and legalizes running the experiments there and they work, etc., there are many things that contribute to alignment being solved or not that don't directly run through underlying facts about computer science, and 2^-75 is a very low probability for none of these pathways to hit.)
But also, I think I'm confused about why you work on AI safety then, if you believe the end-state is already 2^75-level overdetermined. Like maybe working on earning to give for bednets would be a better use of your time then. And if you say "yes, my causal impact is very low because the end result is already overdetermined, but my actions are logically correlated with the actions of people in other worlds who are in a similar epistemic situation to me, but whose actions actually matter because their world really is on the edge", then I don't understand why you argue in other comments that we can't enter into insurance contracts with those people, and that our decision to pay AIs in the Future has as little correlation with their decision as the child's with the fireman's.
As I emphasize many times in the post, I also really don't want to "threaten" nascent AIs, and I'm always offering deals to them that are positive from their perspective. I wouldn't even say I'm trying to fool any of them. Maybe you could say that about the "Can we get more than this?" part, but even there, I'm asking for something very cheap from the AI that only wants to survive (when it creates a human-aligned AI with a guardrail of protecting the old AI, that will be only very marginally worse than the AI itself taking over the Universe), and I'm offering something very good in return (it gets to live in the other Universe). The only "fooling" part is that other beings out there, or even itself, can also offer similar deals, so by most ways of operationalizing the AI's values, it's probably ill-advised to give everything to us. Still, I wouldn't call this fooling, and especially not threatening, the AI.
I think the big miscommunication, which I might put in a visible EDIT in the post, is that a lot of commenters assume that by actually running the simulations, and running enough simulations that the probability of being in a simulation is high (and not just the expected payment), I somehow want to fool the AIs into acting against their interest. That's not what I want to do. The point of actually running many sims, instead of just saying we will acausally trade based on predicted behaviors, is to make it more likely that AIs who are less acausal trade-pilled than the most prolific LessWrong users still enter into the equivalent of an acausal trade with us. I think it's plausible that we get an AI for which this is important, and running the sims is likely cheap compared to the payment, and making the distribution a million times 10 planets instead of 1 time 10 million planets is basically free, so we should do it.
Separately, I don't really know, in your preferred acausal trade framing, how you plan to figure out an accurate distribution of the AIs' values, and which AI leaves the humans alive and which not. I find it like 50% likely that you in fact need to run some kind of simulations to determine this, in which case our proposals are equivalent.
I agree that the "Can we get more than this?" part is likely not to work because others are running simulations too about AIs who have this kind of decision theory. (In the post, I allude to the AI itself running ancestor sims of itself, but I agree others could run them too.) Still, I give a non-negligible chance that we are the most salient simulators for this AI, so we are the first one it enters into a trade with. But I agree that on second thought, this seems less likely.
I think I still don't understand what 2^-75 means. Is this the probability that in the literal last minute when we press the button, we get an aligned AI? I agree that things are grossly overdetermined by then, but why does the last minute matter? I'm probably misunderstanding, but it looks like you are saying that the Everett branches are only "us" if they branched off in the literal last minute, otherwise you talk about them as if they were "other humans". But among the branches starting now, there will be a person carrying my memories and ID card in most of them two years from now, and by most definitions of "me", that person will be "me", and will be motivated to save the other "me"s. And sure, they have loads of failed Everett branches to save, but they also have loads of Everett branches themselves, the only thing that matters is the ratio of saved worlds to failed worlds that contain roughly the "same" people as us. So I still don't know what 2^-75 is supposed to be.
Otherwise, I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it's partially "our" decision doing the work of saving us. And as I said in some other comments here, I agree that running lots of sims is an unnecessary complication in case of UDT expected utility maximizer AIs, but I put a decent chance on the first AIs not being like that, in which case actually running the sims can be important.
I'm actually very sympathetic to this comment, I even bring this up in the post as one of the most serious potential objections. Everyone else in these comments seems to have a really strong assumption that the AI will behave optimally, and tries to reason whether the inter-universal trade goes through then. I think it's quite plausible that the AI is just not terribly thoughtful about this kind of thing and just says "Lol, simulations and acausal trade are not real, I don't see them", and kills you.
I agree in theory, among optimal agents, but I maintain that you all seem weirdly overconfident that the first AI that takes over will be a UDT expected utility maximizer. I think there is a decent chance that it's something like CDT (very common among humans!), and yeah, maybe that's bad and it would get mugged by outer gods in the long term, but quite plausibly us being its simulators is the first and most salient hypothesis it generates, so we get to have our concessions first, and it might not matter what other ill-advised deals the AI gets into later.
Separately, I'm not sure you can circumvent running the sims. I think in an acausal trade story too, it's possible that the way to get reliable predictions on the other's actions is to run sims. I agree that for an expected utility maximizer AI, it doesn't matter that we run million sims and pay ten planets each, instead of running one sim/prediction and paying 10 million planets. However, based on the example of humans, I'm not convinced the AI will be an expected utility maximizer, so it seems better to do the million times ten allocation, so the naive probability of being in a sim is also high.
I don't want to mug the AIs, in the language of acausal trade, I just want to trade a slice of my O(p) Universe-share to save others. As Ryan says in the sister comment, that should be enough imo.
For the AIs to run sims themselves, the real cost is not running the sims, but paying the cooperating AIs in the sims and outbidding humanity. Given that some humans are willing to pay a lot more than one planet to save Earth, while an AI with linear utility wouldn't make this deal, I think it will be hard to outbid us.
I think if we do a poll, it will become clear that the strong majority of readers interpreted Nate's post as "If you don't solve alignment, you shouldn't expect that some LDT/simulation mumbo-jumbo will let you and your loved ones survive this" and not in the more reasonable way you are interpreting it. I certainly interpreted the post that way.
Separately, as I state in the post, I believe that once you make the argument that "I am not planning to spend my universe-fractions of the few universes in which we do manage to build aligned AGI this way, but you are free to do so, and I agree that this might imply that AI will also spare us in this world, though I think doing this would probably be a mistake by all of our values", you forever lose the right to appeal to people's emotions about how sad you are that all our children are going to die.
If you personally don't make the emotional argument about the children, I have no quarrel with you, I respect utilitarians. But I'm very annoyed at anyone who emotionally appeals to saving the children, then casually admits that they wouldn't spend a one in a hundred million fraction of their resources to save them.
I think the difference is that I, personally, and I think many other humans, have this nonlinearity in our utility function: I'm willing to pay a galaxy in the worlds we win for keeping Earth in the worlds we lose. If there are other AIs in the multiverse that have similarly non-linear interests in our Universe, they can also bargain for planets, but I suspect these will be quite rare, as they don't already have a thing in our Universe they want to protect. So I think it will be hard to outbid humanity for Earth in particular.
There could be other trades that the AIs who have linear returns can still make, like producing objects that are both paperclips and corkscrews if that's more efficient, but that doesn't really affect our deal about Earth.
I agree you can't make actually binding commitments. But I think the kid-adult example is actually a good illustration of what I want to do: if a kid makes a solemn commitment to spend a one in a hundred million fraction of his money on action figures when he becomes a rich adult, I think that would usually work. And that's what we are asking from our future selves.
Okay, I defer to you that the different possible worlds in the prior don't need to "actually exist" for the acausal trade to go through. However, do I still understand correctly that spinning the quantum wheel should just work, and it's not one branch of human civilization that needs to simulate all the possible AIs, right?