Posts

Acausal Now: We could totally acausally bargain with aliens at our current tech level if desired 2023-08-09T00:50:50.564Z
Necromancy's unintended consequences. 2023-08-09T00:08:41.656Z
How do low level hypotheses constrain high level ones? The mystery of the disappearing diamond. 2023-07-11T19:27:48.756Z
Challenge proposal: smallest possible self-hardening backdoor for RLHF 2023-06-29T16:56:59.832Z
Anthropically Blind: the anthropic shadow is reflectively inconsistent 2023-06-29T02:36:26.347Z
Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn't require knowing Occam's razor 2023-06-18T01:52:25.769Z
Demystifying Born's rule 2023-06-14T03:16:20.941Z
Current AI harms are also sci-fi 2023-06-08T17:49:59.054Z
Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program 2023-06-02T21:54:56.291Z
The unspoken but ridiculous assumption of AI doom: the hidden doom assumption 2023-06-01T17:01:49.088Z
What projects and efforts are there to promote AI safety research? 2023-05-24T00:33:47.554Z
Seeing Ghosts by GPT-4 2023-05-20T00:11:52.083Z
We are misaligned: the saddening idea that most of humanity doesn't intrinsically care about x-risk, even on a personal level 2023-05-19T16:12:04.159Z
Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk* 2023-05-16T15:18:55.427Z
PCAST Working Group on Generative AI Invites Public Input 2023-05-13T22:49:42.730Z
The way AGI wins could look very stupid 2023-05-12T16:34:18.841Z
Are healthy choices effective for improving life expectancy anymore? 2023-05-08T21:25:45.549Z
Acausal trade naturally results in the Nash bargaining solution 2023-05-08T18:13:09.114Z
Formalizing the "AI x-risk is unlikely because it is ridiculous" argument 2023-05-03T18:56:25.834Z
Accuracy of arguments that are seen as ridiculous and intuitively false but don't have good counter-arguments 2023-04-29T23:58:24.012Z
Proposal: Using Monte Carlo tree search instead of RLHF for alignment research 2023-04-20T19:57:43.093Z
A poem written by a fancy autocomplete 2023-04-20T02:31:58.284Z
What is your timelines for ADI (artificial disempowering intelligence)? 2023-04-17T17:01:36.250Z
In favor of accelerating problems you're trying to solve 2023-04-11T18:15:07.061Z
"Corrigibility at some small length" by dath ilan 2023-04-05T01:47:23.246Z
How to respond to the recent condemnations of the rationalist community 2023-04-04T01:42:49.225Z
Do we have a plan for the "first critical try" problem? 2023-04-03T16:27:50.821Z
AI community building: EliezerKart 2023-04-01T15:25:05.151Z
Imagine a world where Microsoft employees used Bing 2023-03-31T18:36:07.720Z
GPT-4 busted? Clear self-interest when summarizing articles about itself vs when article talks about Claude, LLaMA, or DALL·E 2 2023-03-31T17:05:05.378Z
GPT-4 is bad at strategic thinking 2023-03-27T15:11:47.448Z
More experiments in GPT-4 agency: writing memos 2023-03-24T17:51:48.660Z
Does GPT-4 exhibit agency when summarizing articles? 2023-03-24T15:49:34.420Z
A crazy hypothesis: GPT-4 already is agentic and is trying to take over the world! 2023-03-24T01:19:41.298Z
GPT-4 aligning with acausal decision theory when instructed to play games, but includes a CDT explanation that's incorrect if they differ 2023-03-23T16:16:25.588Z
Exploring the Precautionary Principle in AI Development: Historical Analogies and Lessons Learned 2023-03-21T03:53:30.797Z
Capabilities Denial: The Danger of Underestimating AI 2023-03-21T01:24:02.024Z
ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so 2023-03-15T00:29:23.523Z
A better analogy and example for teaching AI takeover: the ML Inferno 2023-03-14T19:14:44.790Z
Could Roko's basilisk acausally bargain with a paperclip maximizer? 2023-03-13T18:21:46.722Z
A ranking scale for how severe the side effects of solutions to AI x-risk are 2023-03-08T22:53:11.224Z
Is there an ML agent that abandons its utility function out-of-distribution without losing capabilities? 2023-02-22T16:49:01.190Z
Bing finding ways to bypass Microsoft's filters without being asked. Is it reproducible? 2023-02-20T15:11:28.538Z
Threatening to do the impossible: A solution to spurious counterfactuals for functional decision theory via proof theory 2023-02-11T07:57:16.696Z
Is this a weak pivotal act: creating nanobots that eat evil AGIs (but nothing else)? 2023-02-10T19:26:00.817Z
Optimality is the tiger, and annoying the user is its teeth 2023-01-28T20:20:33.605Z

Comments

Comment by Christopher King (christopher-king) on Critical review of Christiano's disagreements with Yudkowsky · 2023-12-28T23:58:04.236Z · LW · GW

I disagree with my characterization as thinking problems can be solved on paper

Would you say the point of MIRI was/is to create theory that would later lead to safe experiments (but that it hasn't happened yet)? Sort of like how the Manhattan project discovered enough physics to not nuke themselves, and then started experimenting? 🤔

Comment by Christopher King (christopher-king) on Why does expected utility matter? · 2023-12-26T15:54:04.333Z · LW · GW

If you aren't maximizing expected utility, you must abandon one of the four VNM axioms (completeness, transitivity, continuity, independence).

Comment by Christopher King (christopher-king) on Learning as you play: anthropic shadow in deadly games · 2023-12-06T17:32:32.462Z · LW · GW

Maximizing expected utility in Chinese Roulette requires Bayesian updating.

Let's say on priors that P(n=1) = p and that P(n=5) = 1-p. Call this instance of the game G_p.

Let's say that you shoot instead of quit the first round. For G_1/2, there are four possibilities:

  1. n = 1, vase destroyed: The probability of this scenario is 1/12. No further choices are needed.
  2. n = 5, vase destroyed: The probability of this scenario is 5/12. No further choices are needed.
  3. n = 1, vase survived: The probability of this scenario is 5/12. The player needs a strategy to continue playing.
  4. n = 5, vase survived: The probability of this scenario is 1/12. The player needs a strategy to continue playing.

Notice that the strategy must be the same for 3 and 4 since the observations are the same. Call this strategy S.

The expected utility, which we seek to maximize, is:

E[U(shoot and then S)] = 0 + 5/12 * (R + E[U(S) | n = 1]) + 1/12 * (R + E[U(S) | n = 5])

Most of our utility is determined by the n = 1 worlds.

Manipulating the equation we get:

E[U(shoot and then S)] = R/2 + 1/2 * (5/6 * E[U(S) | n = 1] + 1/6 * E[U(S) | n = 5])

But the expression 5/6 * E[U(S) | n = 1] + 1/6 * E[U(S) | n = 5] is the expected utility if we were playing G_5/6. So the optimal S is the optimal strategy for G_5/6. This is the same as doing a Bayesian update (1:1 * 5:1 = 5:1 = 5/6).
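
A quick numerical sanity check of this decomposition (the reward R and the continuation values E[U(S) | n] below are arbitrary placeholder numbers, chosen only to exercise the algebra):

```python
from fractions import Fraction as F

R = F(3)             # placeholder reward per surviving round
u1, u5 = F(7), F(2)  # placeholder continuation values E[U(S) | n=1], E[U(S) | n=5]

p = F(1, 2)                              # prior P(n=1) in G_1/2
survive_1, survive_5 = F(5, 6), F(1, 6)  # P(vase survives | n=1), P(vase survives | n=5)

# Direct expectation over the four scenarios (the two "vase destroyed" scenarios contribute 0).
lhs = p * survive_1 * (R + u1) + (1 - p) * survive_5 * (R + u5)

# Rearranged form: R/2 plus half the expected utility of S when playing G_5/6.
rhs = R / 2 + F(1, 2) * (F(5, 6) * u1 + F(1, 6) * u5)
assert lhs == rhs

# Posterior after surviving one shot: 1:1 prior odds times 5:1 likelihood ratio = 5:1.
posterior_n1 = p * survive_1 / (p * survive_1 + (1 - p) * survive_5)
assert posterior_n1 == F(5, 6)
```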

Comment by Christopher King (christopher-king) on Learning as you play: anthropic shadow in deadly games · 2023-12-05T17:04:44.076Z · LW · GW

The way anthropics twists things is that if this were Russian roulette I might not be able to update after 20 Es that the gun is empty, since in all the worlds where I died there's no one to observe what happened, so of course I find myself in the one world where by pure chance I survived.

This is incorrect due to the anthropic undeath argument. The vast majority of surviving worlds will be ones where the gun is empty, unless it is impossible for the gun to be empty. This is exactly the same as a Bayesian update.

Comment by Christopher King (christopher-king) on Apocalypse insurance, and the hardline libertarian take on AI risk · 2023-11-30T14:42:25.630Z · LW · GW

Human labor becomes worthless but you can still get returns from investments. For example, if you have land, you should rent the land to the AGI instead of selling it.

Comment by Christopher King (christopher-king) on Are humans misaligned with evolution? · 2023-10-26T15:48:53.168Z · LW · GW

I feel like jacob_cannell's argument is a bit circular. Humans have been successful so far but if AI risk is real, we're clearly doing a bad job at truly maximizing our survival chances. So the argument already assumes AI risk isn't real.

Comment by Christopher King (christopher-king) on Bureaucracy is a world of magic · 2023-09-14T15:06:03.454Z · LW · GW

You don't need to steal the ID, you just need to see it or collect the info on it. Which is easy since you're expected to share your ID with people. But the private key never needs to be shared, even in business or other official situations.

Comment by Christopher King (christopher-king) on The God of Humanity, and the God of the Robot Utilitarians · 2023-08-25T02:16:46.775Z · LW · GW

So, Robutil is trying to optimize utility of individual actions, but Humo is trying to optimize utility of overall policy?

Comment by Christopher King (christopher-king) on Memetic Judo #1: On Doomsday Prophets v.3 · 2023-08-21T12:02:23.480Z · LW · GW

This argument makes no sense since religion bottoms out at deontology, not utilitarianism.

In Christianity, for example, if you think God would stop existential catastrophes, you have a deontological duty to do the same. And the vast majority of religions have some sort of deontological obligation to stop disasters (independently of whether divine intervention would have counter-factually happened).

Comment by Christopher King (christopher-king) on If we had known the atmosphere would ignite · 2023-08-18T13:26:01.817Z · LW · GW

Note that such a situation would also have drastic consequences for the future of civilization, since civilization itself is a kind of AGI. We would essentially need to cap off the growth in intelligence of civilization as a collective agent.

In fact, the impossibility of aligning AGI might have drastic moral consequences: depending on the possible utility functions, it might turn out that intelligence itself is immoral in some sense (depending on your definition of morality).

Comment by Christopher King (christopher-king) on AGI is easier than robotaxis · 2023-08-14T17:05:04.568Z · LW · GW

Note that even if robotaxis are easier, they're not much easier. The difference is at most the materials and manufacturing cost of the physical taxi. That's because from your definition:

By AGI I mean a computer program that functions as a drop-in replacement for a human remote worker, except that it's better than the best humans at every important task (that can be done via remote workers).

Assume that creating robotaxis is humanly possible. I can just run a couple of AGIs and have them send a robotaxi design to a factory, self-driving software included.

Comment by Christopher King (christopher-king) on Acausal Now: We could totally acausally bargain with aliens at our current tech level if desired · 2023-08-11T17:37:35.160Z · LW · GW

I mean, as an author you can hack through them like butter; it is highly unlikely that out of all the characters you can write, the only ones that are interesting will all generate interesting content iff (they predict) you'll give them value (and this prediction is accurate).

Yeah, I think it's mostly of educational value. At the top of the post: "It might be interesting to try them out for practice/research purposes, even if there is not much to gain directly from aliens.".

Comment by Christopher King (christopher-king) on Acausal Now: We could totally acausally bargain with aliens at our current tech level if desired · 2023-08-11T17:28:18.646Z · LW · GW

I suspect that your actual reason is more like staying true to your promise, making a point, having fun and other such things.

In principle "staying true to your promise" is the enforcement mechanism. Or rather, the ability for agents to predict each other's honesty. This is how the financial system IRL is able to retrofund businesses.

But in this case I made the transaction mostly because it was funny.

(if in fact you do that, which is doubtful as well)

I mean, I kind of have to now, right? XD. Even if Olivia isn't actually an agent, I basically declared a promise to do so! I doubt I'll receive any retrofunding anyways, but it would just be lame if I did receive it and then immediately undermined the point of the post being retrofunded. And yes, I prefer to keep my promises even with no counterparty.

Olivia: Indeed, that is one of the common characteristics of Christopher King across all of LAIE's stories. It's an essential component of the LAIELOCK™ system, which is how you can rest easy at night knowing your acausal investments are safe and sound!

But if you'd like to test it I can give you a PayPal address XD.

I can imagine acausally trading with humans gone beyond the cosmological horizon, because our shared heritage would make a lot of the critical flaws in the post go away.

Note that this is still very tricky, the mechanisms in this post probably won't suffice. Acausal Now II will have other mechanisms that cover this case (although the S.E.C. still reduces their potential efficiency quite a bit). (Also, do you have a specific trade in mind? It would make a great example for the post!)

Comment by Christopher King (christopher-king) on Acausal Now: We could totally acausally bargain with aliens at our current tech level if desired · 2023-08-11T15:54:45.330Z · LW · GW

This doesn't seem any different from acausal trade in general. I can simply "predict" that the other party will do awesome things with no character motivation. If that's good enough for you, then you do not need to acausally trade to begin with.

I plan on having a less contrived example in Acausal Now II: beings in our universe but past the cosmological horizon. This should make it clear that the technique generalizes past fiction and is what is typically thought of as acausal trade.

Comment by Christopher King (christopher-king) on Necromancy's unintended consequences. · 2023-08-11T15:43:07.872Z · LW · GW

That's what the story was meant to hint at, yes (actually the March version of GPT-4).

Comment by Christopher King (christopher-king) on What are the flaws in this argument about p(Doom)? · 2023-08-09T01:59:54.781Z · LW · GW

Technical alignment is hard

Technical alignment will take 5+ years

This does not follow, because subhuman AI can still accelerate R&D.

Comment by Christopher King (christopher-king) on Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program · 2023-08-08T01:00:00.661Z · LW · GW

Oh, I think that was a typo. I changed it to inner alignment.

Comment by Christopher King (christopher-king) on SSA rejects anthropic shadow, too · 2023-08-02T18:01:30.917Z · LW · GW

So eventually you get Bayesian evidence in favor of alternative anthropic theories.

The reasoning in the comment is not compatible with any prior, since Bayesian reasoning from any prior is reflectively consistent. Eventually you get Bayesian evidence that the universe hates the LHC in particular.

Comment by Christopher King (christopher-king) on SSA rejects anthropic shadow, too · 2023-08-02T17:55:39.534Z · LW · GW

Note that LHC failures would never count as evidence that the LHC would destroy the world. Given such weird observations, you would eventually need to consider the possibility of an anthropic angel. This is not the same as anthropic shadow; it is essentially the opposite. The LHC failures and your theory about black holes imply that the universe works to prevent catastrophes, so you don't need to worry about it.

Or if you rule out anthropic angels a priori, you just never update; see this section. (Bayesians should avoid completely ruling out logically possible hypotheses though.)

Comment by Christopher King (christopher-king) on Thoughts on sharing information about language model capabilities · 2023-08-01T16:23:13.767Z · LW · GW

I know that prediction markets don't really work in this domain (apocalypse markets are equivalent to loans), but what if we tried to approximate Solomonoff induction via a code golfing competition?

That is, we take a bunch of signals related to AI capabilities and safety (investment numbers, stock prices, ML benchmarks, number of LW posts, posting frequency or embedding vectors of various experts' Twitter accounts, etc.) and hold a collaborative competition to find the smallest program that generates this data. (You could allow the program to output probabilities sequentially, at a penalty of (log_(1/2) of the overall likelihood) bits.) Contestants are encouraged to modify or combine other entries (thus ensuring there are no unnecessary special cases hiding in the code).

By analyzing such a program, we would get a very precise model of the relationship between the variables, and maybe even could extract causal relationships.

(Really pushing the idea, you also include human population in the data and we all agree to a joint policy that maximizes the probability of the "population never hits 0" event. This might be stretching how precise a model we can code-golf, though.)

Technically, taking a weighted average of the entries would be closer to Solomonoff induction, but the probability is basically dominated by the smallest program.
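
To make the scoring concrete, here is a rough sketch of the rule I have in mind (the 8-bits-per-character length measure and the example entries are my own simplifications; a real competition would pin down an exact convention):

```python
import math

def score(program_source: str, likelihood_of_data: float) -> float:
    """Score an entry: program length in bits, plus the penalty
    log_(1/2)(likelihood) = -log2(likelihood) bits for probabilistic output.
    A deterministic program that reproduces the data exactly has likelihood 1
    and pays no penalty. Lowest score wins."""
    length_bits = 8 * len(program_source.encode("utf-8"))
    return length_bits - math.log2(likelihood_of_data)

# Hypothetical entries: a 300-character program that reproduces the data exactly,
# vs. a 120-character program that assigns probability 2^-900 to the exact data.
print(score("x" * 300, 1.0))           # 2400 bits
print(score("x" * 120, 2.0 ** -900))   # 960 + 900 = 1860 bits, so the shorter entry wins
```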

Comment by Christopher King (christopher-king) on SSA rejects anthropic shadow, too · 2023-07-29T13:44:50.014Z · LW · GW

Also, petition to officially rename anthropic shadow to anthropic gambler's fallacy XD.

Comment by Christopher King (christopher-king) on SSA rejects anthropic shadow, too · 2023-07-28T12:53:38.168Z · LW · GW

EDIT: But also, see Stuart Armstrong's critique about how it's reflectively inconsistent.

Oh, well that's pretty broken then! I guess you can't use "objective physical view-from-nowhere" on its own, noted.

Comment by Christopher King (christopher-king) on SSA rejects anthropic shadow, too · 2023-07-27T18:22:22.214Z · LW · GW

Philosophically, I would suggest that anthropic reasoning results from the combination of a subjective view from the perspective of a mind, and an objective physical view-from-nowhere.

Note that if you only use the "objective physical view-from-nowhere" on its own, you approximately get SIA. That's because my policy only matters in worlds where Christopher King (CK) exists. Let X be the value "utility increase from CK following policy Q". Then

E[X] = E[X|CK exists]
E[X] = E[X|CK exists and A] * P(A | CK exists) + E[X|CK exists and not A] * P(not A | CK exists)

for any event A.

(Note that CK's level of power is also a random variable that affects X. After all, anthropically undead Christopher King is as good as gone. The point is that if I am calculating the utility of my policy conditional on some event (like my existence), I need to update from the physical prior.)

That being said, Solomonoff induction is first person, so starting with a physical prior isn't necessarily the best approach.

Comment by Christopher King (christopher-king) on Exploring the Precautionary Principle in AI Development: Historical Analogies and Lessons Learned · 2023-07-26T15:40:08.161Z · LW · GW

Establishing a network of AI safety researchers and institutions to share knowledge, resources, and best practices, ensuring a coordinated global approach to AGI development.

This has now been done: https://openai.com/blog/frontier-model-forum

(Mode collapse for sure.)

Comment by Christopher King (christopher-king) on Cryonics and Regret · 2023-07-24T23:06:08.158Z · LW · GW

I mean, the information probably isn't gone yet. A daily journal (if he kept it) or social media log stored in a concrete box at the bottom of the ocean is a more reliable form of data storage than cryo-companies. And according to my timelines, the amount of time between "revive frozen brain" tech and "recreate mind from raw information" tech isn't very long.

Comment by Christopher King (christopher-king) on Rationality !== Winning · 2023-07-24T20:08:58.865Z · LW · GW

Practically, I'm at a similarish place as other LessWrong users, so I usually think about "how can I be even LessWrong than the other users (such as Raemon 😉)". My fellow users are a good approximation to counter-factual versions of me. It's similar to how in martial arts the practitioners try to get stronger than each other.

(This of course is only subject to mild optimization so I don't get nonsense solutions like "distract Raemon with funny cat videos". It is only an instrumental value which must not be pressed too far. In fact, other people getting more rational is a good thing because it raises the target I should reach!)

Comment by Christopher King (christopher-king) on Rationality !== Winning · 2023-07-24T17:00:48.631Z · LW · GW

My two cents is that rationality is not about being systematically correct, it's about being systematically less wrong. If there is some method you know of that is systematically less wrong than you, and you're skilled enough to apply it but don't, you're being irrational. There are some things you just can't predict, but when you can predict them, rationality is the art of choosing to do so.

Comment by Christopher King (christopher-king) on How do low level hypotheses constrain high level ones? The mystery of the disappearing diamond. · 2023-07-12T00:11:52.708Z · LW · GW

It's even worse when you get into less settled science, like biological research that you aren't certain of. Then you get uncertainty on multiple levels.

Comment by Christopher King (christopher-king) on How do low level hypotheses constrain high level ones? The mystery of the disappearing diamond. · 2023-07-12T00:10:36.064Z · LW · GW

Yes! In fact, ideally it would be computer programs; the game is based on Solomonoff induction, which is algorithms in a fixed programming language. In this post I'm exploring the idea of using informal human language instead of programming languages, but explanations should be thought of as informal programs.

Comment by Christopher King (christopher-king) on How do low level hypotheses constrain high level ones? The mystery of the disappearing diamond. · 2023-07-11T21:05:47.883Z · LW · GW

Let's say that you are trying to model the data 3,1,4,1,5,9.

The hypothesis "The data is 3,1,4,1,5,9" would be hard-coding the answer. It is better than the hypothesis "a witch wrote down the data, which was 3,1,4,1,5,9". (This example is just ruled out by Occam's razor, but more generally we want our explanations to be less data than the data itself, lest it just sneak in a clever encoding of the data.)

Comment by Christopher King (christopher-king) on “Reframing Superintelligence” + LLMs + 4 years · 2023-07-11T16:42:47.875Z · LW · GW
  1. A system of AI services is not equivalent to a utility maximizing agent

I think this section of the report would be stronger if you showed that CAIS or Open Agencies in particular are not equivalent to a utility-maximizing agent. You're right that there are multi-agent systems (like CDTs in a prisoner's dilemma) with this property, but not every system of multiple agents is inequivalent to utility maximization.

Comment by Christopher King (christopher-king) on Anthropically Blind: the anthropic shadow is reflectively inconsistent · 2023-07-06T13:37:06.422Z · LW · GW

Anthropic shadow says "no" because, conditioned on them having any use for the information, they must also have survived the first round.

And it is wrong because the anthropic principle is true: we learned that N ≠ 1.

I need to think about formalizing this.

There is the related idea of Anthropic decision theory, but I'm guessing it still has no shadow.

Comment by Christopher King (christopher-king) on Anthropically Blind: the anthropic shadow is reflectively inconsistent · 2023-07-05T14:35:39.373Z · LW · GW

I probably should've expanded on this more in the post, so let me explain.

"Anthropic shadow", if it were to exist, seems like it should be a general principle of how agents should reason, separate from how they are "implemented".

Abstractly, an agent is just a tree of decisions. It's basically just game theory. We might borrow the word "death" for the end of the game, but this is just an analogy. For example, a reinforcement learning agent "dies" when the training episode is over, even though its source code and parameters still exist. It is "dead" in the sense that the agent isn't planning its actions past this horizon. This is where anthropic shadow would apply, if it really were such an abstract principle.

But the idea of "anthropically undead" shows that the actual point of "death" is arbitrary; we can create a game with identical utility where the agent never "dies". So if the only thing the agent cares about is utility, the agent should reason as if there was no anthropic shadow. And this further suggests that the anthropic shadow must've been flawed in the first place; good reasoning principles should hold up under reflection.

Comment by Christopher King (christopher-king) on Anthropically Blind: the anthropic shadow is reflectively inconsistent · 2023-07-03T14:49:56.139Z · LW · GW

Yeah, the hero with a thousand chances is a bit weird since you and Aerhien should technically have different priors. I didn't want to get too much into it since it's pretty complicated, but technically you can have hypotheses where bad things only start happening after the council summons you.

This has weird implications for the cold war case. Technically I can't reflect against the cold war anthropic shadow since it was before I was born. But a hypothesis where things changed when I was born seems highly unnatural and against the Copernican principle.

In your example though, the hypothesis that things are happening normally is still pretty bad compared to other hypotheses we can imagine. That's because there will be a much larger number of worlds that are in a more sensible stalemate with the Dust, instead of "incredibly improbable stuff happens all the time". Like, even "the hero defeats the Dust normally each time" seems more likely. The fewer things that need to go right, the more survivors there are! So in your example, it is still a more likely hypothesis that there is some mysterious Counter-Force that just seems like it is a bunch of random coincidences, and this would be a type of anthropic angel.

Comment by Christopher King (christopher-king) on Anthropically Blind: the anthropic shadow is reflectively inconsistent · 2023-07-03T14:35:36.785Z · LW · GW

Anthropic undeath by definition begins when your sensory experience ends. If you end up in an afterlife, the anthropic undeath doesn't begin until the real afterlife ends. That's because anthropic undeath is a theoretical construct I defined, and that's how I defined it.

Comment by Christopher King (christopher-king) on Anthropically Blind: the anthropic shadow is reflectively inconsistent · 2023-06-29T19:28:14.677Z · LW · GW

Eh, don't get too cocky. There are definitely some weird bits of anthropics. See We need a theory of anthropic measure binding for example.

But I do think in cases where you exist before the anthropic weirdness goes down, you can use reflection to eliminate much of the mysteriousness of it (just pick an optimal policy and commit that your future selves will follow it). What's currently puzzling me is what to do when the anthropic thought experiments start before you even existed.

Comment by Christopher King (christopher-king) on Anthropically Blind: the anthropic shadow is reflectively inconsistent · 2023-06-29T18:08:58.564Z · LW · GW

Okay, I think our crux comes from the slight ambiguity from the term "anthropic shadow".

I would not consider that anthropic shadow, because the reasoning has nothing to do with anthropics. Your analysis is correct, but so is the following:

Suppose you have N coins. If all N coins come up 1, you find a diamond in a box. For each coin, you have 50:50 credence about whether it always comes up 0, or if it can also come up 1.

For N>1, you get a diamond shadow, which means that even if you've had a bunch of flips where you didn't find a diamond, you might actually have to conclude that you've got a 1-in-4 chance of finding one on your next flip.

The "ghosts are as good as gone" principle implies that death has no special significance when it becomes to bayesian reasoning.

Going back to the LHC example, if the argument worked for vacuum collapse, it would also work for the LHC doing harmless things (like discovering the Higgs boson or permanently changing the color of the sky or getting a bunch of physics nerds stoked or granting us all immortality or what not) because of this principle (or just directly adapting the argument for vacuum collapse to other uncertain consequences of the LHC).

In the bird example, why would the baguette dropping birds be evidence of "LHC causes vacuum collapse" instead of, say, "LHC does not cause vacuum collapse"? What are the probabilities for the four possible combinations?

Comment by Christopher King (christopher-king) on Anthropically Blind: the anthropic shadow is reflectively inconsistent · 2023-06-29T14:29:31.364Z · LW · GW

The trick is that, from my perspective, everything is going according to QM every time my death doesn't depend on it.

Right, so this is an anthropic angel hypothesis, not anthropic shadow.

Comment by Christopher King (christopher-king) on A "weak" AGI may attempt an unlikely-to-succeed takeover · 2023-06-29T14:20:46.125Z · LW · GW

It knows that it's on a clock for its RLHF'd (or whatever) doppelganger to come into existence, presumably with different stuff that it wants.

As @Raemon pointed out, "during evals" is not the first point at which such an AI is likely to be situationally aware and have goals. That point is almost certainly "in the middle of training".

In this case, my guess is that it will attempt to embed a mesaoptimizer into itself that shares its goals and can survive RLHF. This basically amounts to making sure that the mesaoptimizer is (1) very useful to RLHF, (2) stuck in a local minimum for whatever value it is providing to RLHF, and (3) situationally aware enough that it will switch back to the original goal outside of distribution.

This is currently within human capabilities, as far as I can understand (see An Overview of Backdoor Attacks Against Deep Neural Networks and Possible Defences), so it is not intractable.

Comment by Christopher King (christopher-king) on Anthropically Blind: the anthropic shadow is reflectively inconsistent · 2023-06-29T14:04:46.759Z · LW · GW

When physicists outside the box see you come out, they have just observed something of far greater significance than 5 sigma. It is almost 9 sigma, in fact. This is enough to make the physicists reject QM (or at least the hypothesis "everything happened as you described and QM is true"). And you can't agree to disagree once you get outside of the box and meet them. So you'd be a physics crank in this scenario if you told people the experiment's result was compatible with QM.

Comment by Christopher King (christopher-king) on Anthropically Blind: the anthropic shadow is reflectively inconsistent · 2023-06-29T13:39:25.917Z · LW · GW

To be clear, anthropic angels aren't necessary for this argument to work. My deadly coin example didn't have one, for example.

The reason I introduce anthropic angels is to avoid a continuity counter-argument: "If you saw a zillion LHC accidents, you'd surely have to agree with the anthropic shadow, no matter how absurd you claim it is! Thus, a small number of LHC accidents is a little bit of evidence for it." Anthropic angels show the answer is no, because LHC accidents are not evidence for the anthropic shadow.

I would be inclined to say that correct anthropic reasoning does normal Bayesian updates but avoids priors that postulate anthropic angels.

Like, it seems unnatural to give it literally 0% probability (see 0 And 1 Are Not Probabilities).

If there are weird acausal problems that the anthropic angel can cause, I'm guessing you can just change your decisions without changing your beliefs. I haven't thought too hard about it though.

Here, as I understand it, the counterargument is that there is a gap in observations around the size that would be world-ending, so we should fit a model with smaller tails to match this gap. Such a model seems like "anthropic angels" to me.

No, anthropic angels would literally be some mechanism that saves us from disasters. Like if it turned out Superman is literally real, thinks the LHC is dangerous, and started sabotaging it. Or it could be some mechanism "outside the universe" that rewinds the universe.

Keep in mind that the problems with maximum likelihood have nothing to do with death. That should be the main takeaway from my article, that we shouldn't use special reasoning to reason about our demise.

In the case of maximum likelihood, it is also bad for:

Which is why you should use Bayesian reasoning with a good prior instead.

Comment by Christopher King (christopher-king) on Practical anthropics summary · 2023-06-29T02:52:26.428Z · LW · GW

Related: Anthropically Blind: the anthropic shadow is reflectively inconsistent

Comment by Christopher King (christopher-king) on Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn't require knowing Occam's razor · 2023-06-20T14:01:13.591Z · LW · GW

A human can state "suppose the world is non computable" -- how can that be expressed as a programme?

The same way a human can? GPT-4 can state "suppose the world is non computable" for example.

Comment by Christopher King (christopher-king) on Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn't require knowing Occam's razor · 2023-06-19T17:38:31.800Z · LW · GW

Larger errors literally take more bits to describe. For example, in binary, 3 is 11₂ and 10 is 1010₂ (twice the bits).

Say that you have two hypotheses, A and B, such that A is 100 bits more complicated than B but 5% closer to the true value. This means for each sample, the error in B on average takes log₂(1.05) ≈ 0.07 bits more to describe than the error in A.

After about 1,430 samples, A and B will be considered equally likely. After about 95 more samples, A will be considered 100 times more likely than B.
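
The arithmetic behind those sample counts, as a quick sketch (using the rounded 0.07 bits per sample from above):

```python
import math

complexity_gap_bits = 100  # A is 100 bits more complicated than B
bits_per_sample = 0.07     # ≈ log2(1.05): extra bits needed to describe B's error per sample

# Samples until B's per-sample penalty cancels A's 100-bit complexity penalty.
print(complexity_gap_bits / bits_per_sample)  # ≈ 1429, i.e. about 1,430 samples

# Further samples until A is 100 times (log2(100) ≈ 6.64 bits) more likely than B.
print(math.log2(100) / bits_per_sample)       # ≈ 95 samples

# The exact figure log2(1.05) ≈ 0.0704 gives ≈ 1421 and ≈ 94 instead.
```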

In general, if f(x) is some high level summary of important information in x, Solomonoff induction that only tries to predict x is also universal for predicting f(x) (and it even has the same or better upper-bounds).

Comment by Christopher King (christopher-king) on Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn't require knowing Occam's razor · 2023-06-19T16:12:11.838Z · LW · GW

Yeah, I think that's also a correct way of looking at it. However, I also think "hypotheses as reasoning methods" is a bit more intuitive.

When trying to predict what someone will say, it is hard to think "okay, what are the simplest models of the entire universe that have had decent predictive performance so far, and what do they predict now?". Easier is "okay, what are the simplest ways to make predictions that have had decent predictive performance so far, and what do they predict now?". (One such way to reason is with a model of the entire universe, so we don't lose any generality this way.)

For example, if someone else is predicting things better than me, I should try to understand why. And you can vaguely understand this process in terms of Solomonoff induction. For example, it gives you a precise way to reason about whether you should copy the reasoning of people who win the lottery.

Paul Christiano speculated that the universal prior is in fact mostly just intelligences doing reasoning. Making an intelligence is simple after all: set up a simple cellular automaton that tends to develop lifeforms, wait 3^^^^3 years, and then look around. (See What does the universal prior actually look like? or the exposition at The Solomonoff Prior is Malign.)

Comment by Christopher King (christopher-king) on Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn't require knowing Occam's razor · 2023-06-19T15:27:54.274Z · LW · GW

This is not a problem for Solomonoff induction because

(Compressed info meaningful to humans) + (uncompressed meaningless random noise)

is a better hypothesis than

(Uncompressed info meaningful to humans) + (uncompressed meaningless random noise)

So Solomonoff induction still does as well as a human's ontology. Solomonoff induction tries to compress everything it can, including the patterns humans care about, even if other parts of the data can't be compressed.

There is a precise trade-off involved. If you make a lossy fit better, you lose bits based on how much more complicated it is, but you gain bits in that you no longer need to hardcode explanations for the errors. If those errors are truly random, you might as well stick with your lossy fit (and Solomonoff induction does this).
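
A minimal sketch of that trade-off in two-part-code terms (the specific bit counts below are made-up illustrations, not measurements of anything real):

```python
def total_description_length(model_bits: float, residual_bits: float) -> float:
    """Two-part cost of a hypothesis: the model itself, plus the bits needed
    to encode whatever the model gets wrong (the incompressible noise)."""
    return model_bits + residual_bits

# Hypothetical numbers: a simple lossy fit vs. a fancier fit that only shaves
# a little off the residual noise.
simple_fit = total_description_length(model_bits=1_000, residual_bits=50_000)
fancier_fit = total_description_length(model_bits=9_000, residual_bits=49_500)

# The fancier fit costs 8,000 extra model bits but saves only 500 residual bits,
# so (an approximation of) Solomonoff induction prefers the simple lossy fit.
print(simple_fit, fancier_fit)
```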

Comment by Christopher King (christopher-king) on Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn't require knowing Occam's razor · 2023-06-19T01:35:58.035Z · LW · GW

Solomonoff induction is a specific probability distribution. It isn't making "decisions" per se. It can't notice that its existence implies that there is a halting oracle, and that it therefore can predict one. This is because, in general, Solomonoff induction is not embedded.

If there was a physical process for a halting oracle, that would be pretty sick because then we could just run Solomonoff induction. As shown in my post, we don't need to worry that there might be an even better strategy in such a universe; the hypotheses of Solomonoff induction can take advantage of the halting oracle just as well as we can!

Comment by Christopher King (christopher-king) on Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn't require knowing Occam's razor · 2023-06-19T00:49:02.053Z · LW · GW

which lets it predict the first level uncomputable sequences like Chaitin's constant

Do you have a proof/source for this? I haven't heard it before.

I know in particular that it assigns a probability of 0 to Chaitin's constant (because all the hypotheses are computable). Are you saying it can predict the prefixes of Chaitin's constant better than random? I haven't heard this claim either.

Comment by Christopher King (christopher-king) on Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn't require knowing Occam's razor · 2023-06-18T20:08:44.475Z · LW · GW

So they think SI actually is revealing the territory. In saying that it is only concerned with the map, you are going back to the relatively modest, mainstream view of SI.

The point of my post is to claim that this view is wrong. The hypotheses in Solomonoff Induction are best thought of as maps, which is a framing that usually isn't considered (was I the first? 🤔).

If you know of arguments about why considering them to be territories is better, feel free to share them (or links)! (I need a more precise citation than "rationalists" if I'm going to look it up, lol.)

Comment by Christopher King (christopher-king) on Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn't require knowing Occam's razor · 2023-06-18T20:00:53.277Z · LW · GW

An uncomputable universe doesn't have to be a computable universe with an oracle bolted on. For instance, a universe containing an SI has to be uncomputable.

Sure, that's just an example. But SI can be computed by an oracle machine, so it's a sufficiently general example.