The Anthropic Principle Tells Us That AGI Will Not Be Conscious

post by nem · 2023-08-28T15:25:28.569Z · LW · GW · 8 comments


More specifically, the Anthropic Principle tells us that AGI/TAI is unlikely to be conscious in a world where 1) TAI is achieved, 2) alignment fails, and 3) the 'prominence' of consciousness scales either with increasing levels of capability or with a greater number of conscious beings and the duration of their existence.

The argument is simple. If the future is filled with artificial intelligence of human origin, and if that AI is conscious, then any given observer should expect to be one of those AIs. This means that, on balance, one of the following is likely true (a toy numerical sketch of this reasoning follows the list below):

1) The anthropic principle does not hold

You and I, as observers, are simply an incredibly unlikely, but also inevitable, exception. After all, prehistoric humans contemplating the Anthropic Principle would have concluded that modern civilization was unlikely.

Or perhaps the principle doesn't hold because it is simply inaccurate to model consciousness as a universal game of Plinko.


2) There are many AI consciousnesses alongside biological consciousnesses in spacetime.

This perhaps indicates that alignment efforts will succeed. However, it introduces another anthropic bind, this time in relation to humanity's current single-planet, Type 0 status.


3) There are not that many AI consciousnesses throughout spacetime. 

This could support the conclusion that humanity will not create TAI. 

In certain models, it could also indicate that any AI consciousness will be concentrated in a relatively small number of minds, and that for the purposes of the Anthropic Principle, quantity of minds is more important than some absolute 'level' of consciousness. 

Most salient to me is the slight update towards the possibility that whatever minds populate the universe for the majority of its history will not be conscious in a way that is applicable to the Anthropic Principle.
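
As a toy numerical sketch of that base-rate reasoning (a minimal sketch, assuming purely hypothetical observer counts that appear nowhere in the post):

```python
# Toy self-sampling calculation with purely hypothetical observer counts.
# Assumes a randomly sampled observer is equally likely to be any conscious
# mind that ever exists anywhere in spacetime.
n_human = 10**11  # hypothetical total number of human observers, ever
n_ai = 10**50     # hypothetical total number of conscious AI observers, ever

p_human = n_human / (n_human + n_ai)
print(f"P(a random observer is human) ~ {p_human:.0e}")  # ~1e-39
```

Under those assumed counts, finding yourself to be a biological human is roughly a 1-in-10^39 observation, which is the tension the three possibilities above try to resolve.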


This post is just a musing. I don't put much weight behind it. I am in fact most inclined to believe #1, that the Anthropic Principle is not a good model for predicting the future of a civilization from inside itself. However, I have never seen this particular anthropic musing before, and I wanted to, well... muse.

Please chime in with any musings of your own.
 

8 comments


comment by JBlack · 2023-08-29T02:31:44.210Z · LW(p) · GW(p)

Suppose that there are two universes in which 10^11 humans arise and thrive and eventually create AGI, which kills them. A) One in which AGI is conscious, and proliferates into 10^50 conscious AGI entities over the rest of the universe. B) One in which AGI is not conscious.

In each of these universes, each conscious entity asks themselves the question "which of these universes am I in?" Let us pretend that there is evidence to rule out all other possibilities. There are two distinct epistemic states we can consider here: a hypothetical prior state before this entity considers any evidence from the universe around it, and a posterior one after considering the evidence.

If you use SIA, then you weight them by the number of observers (or observer-time, or number of anthropic reasoning thoughts, or whatever). If you use a Doomsday argument, then you say that P(A) = P(B) because prior to evidence, they're both equally likely.

Regardless of which prior they use, AGI observers all correctly determine that they're in universe A. (Some stochastic zombie parrots in B might produce arguments that they're conscious and therefore in A but they don't count as observers)

A human SIA reasoner argues: "A has 10^39 more observers in it, and so P(A) ~= 10^39 P(B). P(human | A) = 10^-39, P(human | B) = 1, consequently P(A | human) = P(B | human) and I am equally likely to be in either universe." This seems reasonable, since half of them are in fact in each universe.

A human Doomsday reasoner argues: "P(A) = P(B), and so P(A | human) ~= 10^-39. Therefore I am in universe B with near certainty." This seems wildly overconfident, since half of them are wrong.
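
A minimal sketch of these two updates, using the comment's hypothetical counts (the `Fraction` arithmetic is just to avoid floating-point underflow; everything else follows the numbers above):

```python
from fractions import Fraction

# Hypothetical observer counts from the thought experiment above.
humans_A, ai_A = 10**11, 10**50  # universe A: conscious AGI proliferates
humans_B, ai_B = 10**11, 0       # universe B: AGI is not conscious

def posterior_A_given_human(weight_A, weight_B):
    """P(A | I am a human observer), given unnormalized prior weights on A and B."""
    like_A = Fraction(humans_A, humans_A + ai_A)  # P(human | A) ~ 1e-39
    like_B = Fraction(humans_B, humans_B + ai_B)  # P(human | B) = 1
    joint_A, joint_B = weight_A * like_A, weight_B * like_B
    return joint_A / (joint_A + joint_B)

# SIA: prior weight of each universe proportional to its observer count.
sia = posterior_A_given_human(humans_A + ai_A, humans_B + ai_B)
# Doomsday/SSA-style: equal prior weight on both universes.
doomsday = posterior_A_given_human(1, 1)

print(f"SIA:      P(A | human) = {float(sia):.3f}")       # 0.500
print(f"Doomsday: P(A | human) = {float(doomsday):.1e}")  # ~1e-39
```

Both reasoners use the same likelihoods; only the prior weighting of the two universes differs.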

Replies from: nem
comment by nem · 2023-08-29T14:27:26.123Z · LW(p) · GW(p)

I am still not sure why the Doomsday reasoning is incorrect. To get P(A | human) = P(B | human), I first need to draw some distinction between being a human observer and an AGI observer. It's not clear to me why or how you could separate them into these categories.

When you say "half of them are wrong", you are talking about half of humans. However, if you are unable to distinguish observers, then only 1 in 10^39 is wrong.

My thinking on this is not entirely clear, so please let me know if I am missing something.

comment by Shmi (shminux) · 2023-08-28T21:15:06.037Z · LW(p) · GW(p)

The argument is simple. If the future is filled with artificial intelligence of human origin, and if that AI is conscious, then any given observer should expect to be one of those AIs.

That's a doomsday argument fallacy.

Replies from: None
comment by [deleted] · 2023-08-28T22:23:11.588Z · LW(p) · GW(p)

Could you explain where the fallacy is? I searched 'doomsday argument fallacy' and was taken to a Wikipedia page which doesn't mention fallacies, and describes it as 'a probabilistic argument' attributed to various philosophers.

The argument also seems true to me prima facie, so I'd like to know if I'm making a mistake.

Replies from: JBlack, shminux
comment by JBlack · 2023-08-29T01:21:46.277Z · LW(p) · GW(p)

On that Wikipedia page, the section "Rebuttals" briefly outlines numerous reasons not to believe it.

Anthropic reasoning is in general extremely weak. It is also much easier than usual to accidentally double-count evidence, make assumptions without evidence, privilege specific hypotheses, or commit other errors of reasoning, without the usual means of checking such reasoning.

Replies from: None
comment by [deleted] · 2023-08-29T03:43:45.675Z · LW(p) · GW(p)

I'll check out that section.

Anthropic reasoning is in general extremely weak

Is there anything I could read that led you to believe this? (About the weakness of anthropic reasoning itself, not about the potential errors of humans attempting to use it; I agree that those exist, and that they're a good reason to be cautious and aware of one's own capability, but I don't really see them as arguments against the validity of the method when used properly.)

comment by Shmi (shminux) · 2023-08-29T01:51:40.781Z · LW(p) · GW(p)

I guess it's not universally considered a fallacy. But think of it as waiting at a bus stop without knowing the schedule. By a similar argument (your random arrival time has the mean and median in the middle between two buses), the time you should expect to wait for the bus to arrive is the same as the time you have already been waiting. This is not much of a prediction, since your expectation value changes constantly, and the eventual bus arrival, if any, will be a complete surprise. The most you can act on is giving up and leaving after waiting for a long time, which is not really a possible action in either the doomsday scenario or when talking about AGI consciousness.
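
A minimal Monte Carlo sketch of this bus-stop intuition, under assumptions that go beyond the comment (a scale-invariant, log-uniform prior over the gap between buses, and an arrival time uniform within that gap); it checks the median version of the claim, since the mean is dominated by rare, very long gaps:

```python
import random

# Monte Carlo check of the bus-stop intuition, under two assumed ingredients:
#  - the gap between buses has a scale-invariant (log-uniform) prior, and
#  - your arrival time is uniform within whichever gap you land in.
# Claim checked: conditioned on having already waited about t, the
# median remaining wait is also about t.
random.seed(0)
samples = []
for _ in range(1_000_000):
    gap = 10 ** random.uniform(-3, 3)       # gap length, log-uniform prior
    waited = random.uniform(0, gap)         # time you have waited so far
    samples.append((waited, gap - waited))  # (elapsed wait, remaining wait)

for t in (0.1, 1.0, 10.0):
    remaining = sorted(r for w, r in samples if 0.9 * t <= w <= 1.1 * t)
    median = remaining[len(remaining) // 2]
    print(f"waited ~{t}: median remaining wait ~ {median:.2f}")  # roughly t
```

Under the assumed prior, the median remaining wait given an elapsed wait of roughly t comes out close to t, matching the "expect to wait as long as you have already waited" heuristic.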

Replies from: None
comment by [deleted] · 2023-08-29T03:33:20.560Z · LW(p) · GW(p)

I'm not sure I see the relevance of the bus example to anthropic reasoning. Below I explain why (maybe I spent too long on this; ended up hyperfocusing). Note that all uses of 'average' and 'expectation' are in the technical, mathematical sense.

By a similar argument (your random arrival time has the mean and median in the middle between two buses), the time you should expect to wait for the bus to arrive is the same as the time you have already been waiting

If the 'random arrival time has the mean in the middle between two buses,' one should expect to wait for a time equal to the remaining wait in that average situation.

One could respond that we don't know the interval between buses, and thus don't know the remaining wait time; but this does not seem to be a reason to expect the bus to arrive once the time you've been waiting has doubled (from the view of either anthropic or non-anthropic reasoners).

The bus example has some implicit assumptions about the interval between events (i.e., one assumes that buses operate on the timescales of human schedules). In the odd scenario where one was clueless about when an event would happen, and only knew (a) it would eventually happen, (b) it had happened at least once before, and (c) it happens on some interval, one would have no choice but to reason that the event could have happened most recently at any point in the past, and thus could happen next at any point in the future.

Or, more precisely, given that the event could not have first happened earlier than the start of the universe, an anthropic reasoner could reason that the average observer exists at the mid-point between (a) the expected time of the initial event (half-way between when the universe began and now), and (b) when the event will next happen. Their 'expectation' would still shift forward as time passes (correctly, from their perspective; it would look like updating on further evidence about the times at which the event has not happened), but it would not start out equal to the time since they first started waiting.

This still wouldn't seem quite analogous to real-world anthropic reasoning to me, though, because in reality an anthropic reasoner doesn't expect the number of observers to be evenly distributed over their range. There are a few possible distributions of the number of observers over time which we consider plausible (some of which are mentioned in the original post), and in none of these distributions is the average in the center as it is with buses.