utilistrutil's Shortform
post by utilistrutil · 2023-11-23T20:11:49.028Z · LW · GW · 9 comments
Comments sorted by top scores.
comment by utilistrutil · 2024-07-01T20:18:48.511Z · LW(p) · GW(p)
Scott Alexander says:
Suppose I notice I am a human on Earth in America. I consider two hypotheses. One is that everything is as it seems. The other is that there is a vast conspiracy to hide the fact that America is much bigger than I think - it actually contains one trillion trillion people. It seems like SIA should prefer the conspiracy theory (if the conspiracy is too implausible, just increase the posited number of people until it cancels out).
I am often confused by the kind of reasoning at play in the text I bolded (the parenthetical about increasing the posited number of people until it cancels out). Maybe someone can help sort me out. As I increase the number of people in the conspiracy world, my prior on that world also decreases. If my prior falls faster than the number of people in the considered world grows, I will not be able to construct a conspiracy-world that allows the thought experiment to bite.
Consider the situation where I arrive at the airport, where I will wait in line at security. Wouldn't I be more likely to discover a line 1,000 people long than 100 people long? I am 10x more likely to exist in the longer line. The problem is that our prior on 1,000-person security lines might be very low. The reasoning on display in the above passage would invite us to simply crank up the length of the line, say, to 1 million people. I suspect that SIA proponents don't show up at the airport expecting lines this long. Why? Because the prior on a million-person line is more than ten thousand times lower than the prior on a 100-person line, which more than offsets the 10,000x observer weighting.
This also applies to some presentations of Pascal's mugging.
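A minimal numerical sketch of this point (the line lengths and priors below are toy numbers of my own, not anything from the quoted passage): under SIA, each hypothesis is weighted by its prior times the number of observers it contains, so the long line only wins if its prior falls more slowly than its length grows.

```python
# Toy SIA-style comparison: posterior weight for a line of N people is
# proportional to prior(N) * N, since an observer is N times as likely to
# find themselves in a line of N people. All numbers here are illustrative.

def posterior_weights(priors):
    weights = {n: p * n for n, p in priors.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

# A prior whose tail falls faster than 1/N: the million-person line stays negligible.
thin_tail = {100: 0.95, 1_000: 0.0499, 1_000_000: 1e-9}
print(posterior_weights(thin_tail))

# A prior proportional to 1/N: the observer weighting exactly cancels the prior,
# so cranking N up really does keep buying posterior weight.
one_over_n = {100: 1 / 100, 1_000: 1 / 1_000, 1_000_000: 1 / 1_000_000}
print(posterior_weights(one_over_n))
```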
Replies from: cubefox, JBlack, Zane
↑ comment by cubefox · 2024-07-06T11:29:27.220Z · LW(p) · GW(p)
This point was recently elaborated on here: Pascal's Mugging and the Order of Quantification [LW · GW]
↑ comment by JBlack · 2024-07-02T05:53:09.295Z · LW(p) · GW(p)
There's no principle that says the prior probability of a population exceeding some size N must decrease more quickly than 1/N asymptotically, nor any analogous principle for other properties of a system. Some priors will have this property, some won't.
My prior for real-world security lines does have this property, though this cheats a little by being largely founded in real-world experience already. Does my prior for population of hypothetical worlds involving Truman Show style conspiracies (or worse!) have this property? I don't know - maybe not?
Does it even make sense to have a prior over these? After all, a prior still requires some sort of model that you can use to expect things or not, and I have no reasonable models at all for such worlds. A mathematical "universal" prior like Solomonoff is useless since it's theoretically uncomputable, and also in a more practical sense utterly disconnected from the domain of properties such as "America's population".
On the whole though, your point is quite correct that for many priors you can't "integrate the extreme tails" to get a significant effect. The tails of some priors are just too thin.
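To make "the tails are just too thin" concrete, here is a rough numerical check under assumptions of my own (power-law toy priors, truncated at a finite n_max; nothing here comes from the comment itself): the SIA-relevant question is whether the sum of N·p(N) stays bounded, which for a power law p(N) proportional to N^(-a) happens exactly when a > 2.

```python
# Toy check: does the SIA-weighted sum of N * p(N) stay bounded as the tail
# is extended? The power-law priors are my own stand-ins, truncated at n_max.

def sia_expected_population(prior, n_max):
    ns = range(1, n_max + 1)
    z = sum(prior(n) for n in ns)               # normalize the truncated prior
    return sum(n * prior(n) for n in ns) / z

for n_max in (10**4, 10**5, 10**6):
    thin = sia_expected_population(lambda n: n ** -2.5, n_max)  # tail thinner than 1/N^2
    fat = sia_expected_population(lambda n: n ** -1.5, n_max)   # tail fatter than 1/N^2
    print(f"n_max={n_max:>9}: thin-tailed ≈ {thin:.2f}, fat-tailed ≈ {fat:.2f}")
# The thin-tailed value settles down; the fat-tailed one keeps growing with n_max,
# so "integrating the extreme tails" only produces a large effect for
# sufficiently fat-tailed priors.
```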
↑ comment by Zane · 2024-07-02T08:01:47.413Z · LW(p) · GW(p)
While you're quite right about numbers on the scale of billions or trillions, I don't think it makes sense in the limit for the prior probability of X people existing in the world to fall faster than X grows in size.
Certain series of large numbers grow larger much faster than they grow in complexity. A program that returns 10^(10^(10^10)) takes fewer bits to specify (relative to most reasonable systems of specifying programs) than a program that returns 32758932523657923658936180532035892630581608956901628906849561908236520958326051861018956109328631298061259863298326379326013327851098368965026592086190862390125670192358031278018273063587236832763053870032004364702101004310417647840155719238569120561329853619283561298215693286953190539832693826325980569123856910536312892639082369382562039635910965389032698312569023865938615338298392306583192365981036198536932862390326919328369856390218365991836501590931685390659103658916392090356835906398269120625190856983206532903618936398561980569325698312650389253839527983752938579283589237325987329382571092301928* - even though 10^(10^(10^10)) is by far the larger number. And it only takes a linear increase in complexity to make it 10^(10^(10^(10^(10^(10^10))))) instead.
*I produced this number via keyboard-mashing; it's not anything special.
Consider the proposition "A superpowered entity capable of creating unlimited numbers of people ran a program that output the result of a random program out of all possible programs (with their outputs rendered as integers), weighted by the complexity of those programs, and then created that many people."
If this happened, the probability that their program outputs at least X would fall much more slowly than X rises, in the limit. The sum doesn't converge at all; the expected number of people created would be literally infinite.
So as long as you assign greater than literally zero probability to that proposition - and there's no such thing as zero probability - there must exist some number X such that you assign greater than 1/X probability to X people existing. In fact, there must exist some number X such that you assign greater than 1/X probability to X million people existing, or X billion, or so on.
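A quick sketch of why that expectation diverges, under a toy model of my own (not Zane's exact construction): suppose the program that prints a power tower of k tens has description length roughly c + k bits for some constant c, so a complexity-weighted prior gives it probability on the order of 2^-(c+k). The k-th contribution to the expected population is then (10^^k) · 2^-(c+k), which grows without bound because the tower grows tower-exponentially while the penalty grows only linearly.

```python
from math import log10

# Toy divergence check (my own framing): term_k = (10^^k) * 2^-(c + k),
# where 10^^k is a power tower of k tens and c is an assumed constant
# description-length overhead. Work in log10 space to avoid overflow.

c = 100  # assumed overhead in bits; the exact value doesn't change the conclusion

def log10_tower(k):
    """log10 of a power tower of k tens, i.e. 10^^(k-1)."""
    v = 1.0                 # 10^^0 = 1
    for _ in range(k - 1):
        v = 10 ** v         # only the first few k fit in a float
    return v

for k in (1, 2, 3):
    log_term = log10_tower(k) - (c + k) * log10(2)
    print(f"k={k}: log10(term_k) ≈ {log_term:.1f}")
# The terms blow up (the k=3 term already has about 10^10 digits), so summing
# over k gives an infinite expected population, as claimed.
```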
(btw, I don't think that the sort of SIA-based reasoning here is actually valid - but if it was, then yeah, it implies that there are infinite people.)
Replies from: JBlack
↑ comment by JBlack · 2024-07-03T05:21:22.978Z · LW(p) · GW(p)
I think when you get to any class of hypotheses like "capable of creating unlimited numbers of people" with nonzero probability, you run into multiple paradoxes of infinity.
For example, there is no uniform distribution over any countable set (if each of countably many outcomes had the same probability, the probabilities would sum to zero or to infinity, never to one), and that includes the set of all halting programs. Every non-uniform distribution this hypothetical superbeing may have used over such programs is a different prior hypothesis. The set of these has no suitable uniform distribution either, since they can be partitioned into countably many equivalence classes under natural transformations.
It doesn't take much study of this before you're digging into pathologies of measure theory such as Vitali sets and similar.
You can of course arbitrarily pick any of these weightings to be your "chosen" prior, but that's just equivalent to choosing a prior over population directly so it doesn't help at all.
Probability theory can't adequately deal with such hypothesis families, and so if you're considering Bayesian reasoning you must discard them from your prior distribution. Perhaps there is some extension or replacement for probability that can handle them, but we don't have one.
comment by utilistrutil · 2024-03-18T03:06:19.109Z · LW(p) · GW(p)
I just came across this word from John Koenig's Dictionary of Obscure Sorrows, which nicely captures the thesis of All Debates Are Bravery Debates.
redesis n. a feeling of queasiness while offering someone advice, knowing they might well face a totally different set of constraints and capabilities, any of which might propel them to a wildly different outcome—which makes you wonder if all of your hard-earned wisdom's fundamentally nontransferable, like handing someone a gift card in your name that probably expired years ago.
comment by utilistrutil · 2024-07-22T18:30:00.381Z · LW(p) · GW(p)
I would really like to see a post from someone in AI policy on "Grading Possible Comprehensive AI Legislation." The post would lay out what kind of safety stipulations would earn a bill an "A-" vs a "B+", for example.
I'm imagining a situation where, in the next couple years, a big omnibus AI bill gets passed that contains some safety-relevant components. I don't want to be left wondering "did the safety lobby get everything it asked for, or did it get shafted?" and trying to construct an answer ex-post.
comment by utilistrutil · 2024-07-12T21:36:47.393Z · LW(p) · GW(p)
File under 'noticing the start of an exponential': A.I. Helped to Find a Vast Source of the Copper That A.I. Needs to Thrive
comment by utilistrutil · 2023-11-23T20:11:49.117Z · LW(p) · GW(p)
Today I am thankful that Bayes' Rule is unintuitive.
Much ink has been spilled complaining that Bayes' Rule can yield surprising results. As anyone who has taken an introductory statistics class knows, it is difficult to solve a problem that requires an application of Bayes' Rule without plugging values into the formula, at least for a beginner. Eventually, the student of Bayes may gain an intuition for the Rule (perhaps in odds form), but at that point they can be trusted to wield their intuition responsibly because it was won through disciplined practice.
This unintuitiveness is a feature, not a bug, because it discourages motivated reasoning. If Bayes' Rule were more intuitive, it would be simple to back out [? · GW] what P(A), P(B), and P(B|A) must be to justify your preferred posterior belief, and then argue for those quantities. It would also be simple to work backwards to select your prediction A from a favorable hypothesis space [LW · GW]. Because Bayes' Rule is unintuitive, these are challenging moves, and formally updating your beliefs is less vulnerable to motivated reasoning.
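For concreteness, here is a minimal sketch of the forward direction in odds form, using a stock disease-test example with numbers of my own choosing (nothing here is from the shortform itself):

```python
# Bayes' Rule in odds form: posterior odds = prior odds * likelihood ratio.
# Hypothetical numbers: 1% base rate, 95% sensitivity, 5% false-positive rate.
prior_odds = 0.01 / 0.99
likelihood_ratio = 0.95 / 0.05
posterior_odds = prior_odds * likelihood_ratio
posterior_prob = posterior_odds / (1 + posterior_odds)
print(f"P(disease | positive test) ≈ {posterior_prob:.0%}")  # ≈ 16%, far below the intuitive 95%
```

Running the rule forward like this is mechanical; running it backward to find a prior and likelihood that justify a pre-chosen posterior takes deliberate effort, which is the point above.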
Happy Thanksgiving!