Do anthropic considerations undercut the evolution anchor from the Bio Anchors report?
post by Ege Erdil (ege-erdil) · 2022-10-01T20:02:48.464Z · LW · GW · 7 comments
This is a question post.
The Bio Anchors report [LW · GW] by Ajeya Cotra tries to forecast AGI timelines by getting estimates of how much compute might be needed to build or train a model that has general intelligence capabilities. This estimate is based on various biological comparisons, and for the purposes of this question I want to focus on the most conservative estimate from the report, that coming from how much computation was done by evolution in the history of life.
It turns out we can bound this in several different ways by a number on the order of 10^45 FLOP. One argument could be this: the Landauer limit puts a ceiling on how many bit erasures per year can be performed on Earth given how much energy the Earth is able to radiate, and we can cut this ceiling down by many orders of magnitude due to various considerations: brains operate far from the Landauer limit at the bit level (though maybe not if you use a larger scale Landauer limit that conditions on irreversible information transmission via electrons, see this post [LW · GW]), evolution is only responsible for a small fraction of the Earth's radiation, et cetera. Overall I think something on the order of 10^45 FLOP for the total compute done by evolution seems sufficiently big to be safe. The Bio Anchors report uses a different method but arrives at a number that's not very different from this one.
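To make the flavor of this bound concrete, here is a minimal back-of-the-envelope sketch. The specific inputs (Earth's absorbed solar power, a ~300 K radiating temperature, ~4 billion years of life, and the ~10 OOM of discounts at the end) are illustrative assumptions of mine, not figures from the report:

```python
import math

# Physical constants and illustrative assumptions (not figures from the report)
k_B = 1.380649e-23                    # Boltzmann constant, J/K
T = 300.0                             # assumed radiating temperature, K
landauer_j = k_B * T * math.log(2)    # ~2.9e-21 J per bit erasure at the limit

earth_power_w = 1.2e17                # assumed absorbed/re-radiated solar power, W
seconds_per_year = 3.15e7
years_of_life = 4e9                   # assumed rough duration of life on Earth

erasures_per_year = earth_power_w * seconds_per_year / landauer_j
total_erasures = erasures_per_year * years_of_life
print(f"Landauer ceiling: ~1e{math.log10(erasures_per_year):.0f} bit erasures/year")
print(f"Over the history of life: ~1e{math.log10(total_erasures):.0f} bit erasures")

# Knock off ~10 OOM (an assumed discount) for the considerations in the text:
# brains far from the bit-level limit, evolution using only a small fraction of
# the Earth's energy budget, the loose bit-erasure-to-FLOP conversion, etc.
adjusted = total_erasures / 1e10
print(f"After ~10 OOM of discounts: ~1e{math.log10(adjusted):.0f} FLOP")
```

With these inputs the discounted total lands in the vicinity of the 10^45 FLOP figure above.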
There is, however, a potential problem with this: we can imagine that evolution is an optimization process running on many different planets, and only on a select few of them does evolution end up producing general intelligence. If this is true, producing general intelligence can be a much harder problem than this calculation gives it credit for, because anthropic considerations mean we would only be asking how difficult general intelligence is to produce in worlds where general intelligence was actually produced. We should therefore multiply our compute upper bound by the reciprocal of the probability that this amount of compute is actually sufficient to produce general intelligence, and anthropic considerations make it very difficult to put any nontrivial bounds on this probability, threatening to make the entire estimate worthless.
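In symbols, the correction being proposed is roughly the following, where $C_{\mathrm{evo}}$ is the $\sim 10^{45}$ FLOP bound above and $p$ is exactly the probability that anthropic selection makes hard to pin down:

$$C_{\text{adjusted}} \;\approx\; \frac{C_{\mathrm{evo}}}{p}, \qquad p \;=\; P\big(\text{general intelligence arises} \,\big|\, C_{\mathrm{evo}} \text{ of evolutionary search}\big).$$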
One argument I see on the other side comes from Robin Hanson's grabby aliens model. He points out that if evolution happens in a sequence of steps, some of which are "hard" in the sense that their expected completion time is much greater than the lifespan of the Earth, and the completion time of each step follows an exponential distribution with unknown mean, then, conditioning on all of the steps being completed in time to give rise to intelligent life, these steps will all look like they have roughly the same average length in the history of life on Earth; moreover, this length will also on average equal the time remaining before Earth becomes uninhabitable.
As Earth still has many hundreds of millions of years before it becomes uninhabitable due to natural processes, this suggests that anything in evolutionary history that happened in much less time than this could not have been a hard step, and in particular the whole transition from apes to humans happened too quickly in evolutionary time for it to count as a hard step. So if we think the "juice" of general intelligence is in the last few million years, this argument could suggest anthropic considerations are not important after all, but I remain unsure about this.
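One way to see why conditioning makes very "hard" steps look short and roughly equal in length is a small Monte Carlo simulation: draw exponential step times with very different means, keep only the histories where everything finishes within the habitable window, and look at the average completed durations. The window length and step means below are arbitrary illustrative choices, not estimates from Hanson's model:

```python
import random

def conditional_step_lengths(step_means, window, trials=2_000_000):
    """Average duration of each step, conditional on all steps finishing within `window`."""
    sums = [0.0] * len(step_means)
    kept = 0
    for _ in range(trials):
        times = [random.expovariate(1.0 / m) for m in step_means]
        if sum(times) <= window:
            kept += 1
            for i, t in enumerate(times):
                sums[i] += t
    return kept, [s / kept for s in sums]

# Three hard steps whose unconditional expected durations are 5x, 10x and 20x
# the habitable window (normalized to 1).
kept, avgs = conditional_step_lengths([5.0, 10.0, 20.0], window=1.0)
print(f"{kept} successful histories out of 2,000,000")
print("average conditional step durations:", [round(a, 2) for a in avgs])
# Despite the very different unconditional means, the conditional averages all
# come out near window / (n_steps + 1) = 0.25, and the expected leftover time
# before the deadline is about the same size -- Hanson's "equal hard steps" result.
```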
What do you think about this problem? I'm curious to hear people's thoughts.
Answers
answer by John Schulman
I think that doing N independent parallel computations and selecting one of them is way less useful than doing an N times longer serial computation. This kind of selection only helps you guess something that is impossible to deduce in any other way. So if anthropics is tacitly selecting the Earth out of N other worlds, that doesn't contribute a factor of N to the total computation; it's a much smaller factor.
EDIT: intended to write a comment rather than an answer.
↑ comment by Ege Erdil (ege-erdil) · 2022-10-01T22:47:15.880Z · LW(p) · GW(p)
I agree in general with this point, but in this context we have a problem even if the scaling is something extremely favorable like ~ log log N instead of ~ N, at least under the assumption that the universe is infinite.
The problem is really that we don't know N and it could be arbitrarily large in an infinite universe, so while I agree linear scaling is too pessimistic of an assumption and the right scaling is probably more like a power law with smaller exponent, I don't see how any scaling that's not bounded from above in N is going to get around the anthropic problems.
I'm just generally confused by anthropics and I made this post to get the opinions of other people on whether this is actually a problem or not.
Replies from: ege-erdil
↑ comment by Ege Erdil (ege-erdil) · 2022-10-02T18:44:17.730Z · LW(p) · GW(p)
Continuing this thread because I had a thought that seems interesting: Robin Hanson's grabby aliens model actually makes predictions about how useful serial compute is versus parallel compute.
Specifically, if intelligent life evolves in N hard steps, then the probability of intelligent life evolving in time T on a given planet scales as T^N when T is small. So doubling the available time scales the probability by 2^N. Hanson's point estimate of N is around 10, based on the evidence I discuss in the body of the question and some other sources of evidence he considers.
Furthermore, Hanson's ~ 1 Gly estimate of the distance between grabby alien origins also suggests that even in the presence of "quiet aliens", an upper bound on the typical volume you need to get a grabby civilization origin within ~ 10 Gyr is around 1 Gly^3, which is a volume that contains around 10^20 planets in orbit around stars. So life probably had a 10^(-20) chance of evolving in ~ 10 Gyr, which means the chances go to roughly even odds once we scale the time up to ~ 1000 Gyr. Therefore this whole story might only contribute a factor of 100 to the total compute we need, which is relatively insignificant. The scaling with N of the correction we get from anthropics is around ~ 20/N in OOM space (after dropping some terms scaling with log N): if N = 1 we get 20 OOM of correction in serial compute, if N = 2 we get 10 OOM, et cetera.
So at least on this model my power law intuition was correct and the exponent of the power law is the number of hard steps we need in evolution. If N is big, the anthropic shadow is actually quite small in serial compute space.
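A minimal sketch of that arithmetic, taking the ~10^(-20) per-planet success probability and the T^N scaling from the previous comment as given:

```python
import math

p_baseline = 1e-20   # estimated chance of intelligent life per planet in ~10 Gyr (from above)
p_target = 0.5       # "roughly even odds"

for n_hard_steps in (1, 2, 5, 10, 20):
    # P(success within time T) ~ T^N for small T, so boosting the probability
    # from p_baseline to p_target requires scaling T by (p_target/p_baseline)^(1/N).
    time_factor = (p_target / p_baseline) ** (1.0 / n_hard_steps)
    print(f"N = {n_hard_steps:2d}: scale serial time by ~10^{math.log10(time_factor):.1f}")
```

For N = 10 this reproduces the factor of ~100 above, and more generally the ~20/N OOM correction (up to the dropped log terms).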
answer by jacob_cannell
This estimate is based on various biological comparisons, and for the purposes of this question I want to focus on the most conservative estimate from the report, that coming from how much computation was done by evolution in the history of life.
There is being reasonably conservative, and then there is that. There is a singularly correct way to generate predictive theories (bayesianism/solomonoff/etc), and it always involves the product (or sum in logprob bits) of theory prior complexity and theory posterior postdiction power. The theory based on anchoring to the sum "computation done by evolution in the history of life" completely fails to postdict the success of deep learning, and thus is so wildly improbable as to not be worth any further consideration, and seriously undermines the entire report.
There is a much simpler model which tightly postdicts the evidence to date: AI progress is straightforwardly predictable by availability of compute equivalent to that required to train some relevant set of functionally similar minimal-sized brain modules.
brains operate far from the Landauer limit at the bit level (though maybe not if you condition on information transmission via electrons, see this post [LW · GW]),
The point of that post was to disprove/prevent that common claim about brain energy efficiency, as it's based on naive oversimplified layman models (which any good semiconductor engineer would already scoff at, but the wisdom from that field is not widely enough distributed). Perhaps I didn't communicate the simplest form of the argument, but it's something like this:
First, you can't compare complex high-level analog synaptic ops or floating-point ops (which are used to simulate synaptic ops) to low-level binary switch ops: each of the former complex elements is built out of orders of magnitude more of the simpler elements.
Secondly, any real computer/brain is ultimately built out of minimal atomic elements which themselves are also computers to which the Landauer limit applies. So a computer could only operate directly at the (error rate corrected) thermodynamic limit implied by counting only its switching elements (transistors) if it didn't require wires/interconnect. But in actuality any real computer/brain built out of such minimal elements ends up being mostly wires in order to be useful. Worse yet, the thermal output of a maximally miniaturized machine (to minimize interconnect cost) would be far beyond practical cooling capacity, and worse still, the required cooling pipes (for the otherwise optimal 3d layout, as used in brains) would then inflate the volume and thus the interconnect. Or you go 2d (as in silicon chips) and can cool using a heatsink in the vertical dimension, but then you have much worse interconnect scaling.
Or to put it another way: the latest '4 nm' foundry tech is now within about 10x or so of minimal feature scaling, but energy efficiency scaling is already slowing down, and we still haven't matched the brain there. The probability that biological evolution (which quickly evolved optimal nanotech bots operating right at the Landauer limit) and the vast intelligence of our semiconductor research industry (humanity's greatest technological achievement by landslides) just happened to reach about the same physical limits, without those limits being fundamental, is very small.
Case in point, the new RTX 4090 built on latest TSMC 4N process has 7.6e10 transistors switching at 2.2e9 hz while using 450 watts, or roughly 1.6e20 transistor bit erasures in 450J, or 3.7e-17 J/erasure, which is only about 200x larger than the Landauer bound of 1eV [LW(p) · GW(p)] (1.6e-19 J) for reliable digital switching - if we wrongly assumed non-dissipative wires. If we assume more realistically that wire/interconnect dissipation uses 90% of the energy, then we are already only around 20x from hard limits.
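For readers who want to redo this arithmetic, here is a small sketch using the transistor count, clock, and power figures from the comment. Note that 450 J spread over 1.6e20 switch events works out to roughly 3e-18 J each, about 20x the 1 eV bound, so the ~200x figure appears to assume that only a fraction of transistors actually switch each cycle; that activity factor is an assumption added here for illustration, not something stated above:

```python
transistors = 7.6e10            # RTX 4090 transistor count (from the comment)
clock_hz = 2.2e9                # clock frequency (from the comment)
power_w = 450.0                 # board power (from the comment)
reliable_switch_j = 1.6e-19     # ~1 eV bound for reliable switching cited above

# activity = assumed fraction of transistors that switch each cycle
for activity in (1.0, 0.1):
    switches_per_s = transistors * clock_hz * activity
    j_per_switch = power_w / switches_per_s
    print(f"activity {activity:4.1f}: {j_per_switch:.1e} J/switch, "
          f"~{j_per_switch / reliable_switch_j:.0f}x the 1 eV bound")
```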
This is why future planned process nodes continue to offer die shrinks but increasingly disappointing energy efficiency improvements. It's also why GPUs increasingly have more flops than can even be usefully shuffled to registers, and why GPU energy use keeps growing with further shrinkage.
If this is true, producing general intelligence can be a much harder problem than this calculation gives it credit for, because anthropic considerations mean we would only be asking the question of how difficult general intelligence is to produce in worlds where general intelligence was actually produced.
If one starts doing anthropic adjustments, you don't just stop there. The vast majority of our measure will be in various simulations, which dramatically shifts everything around to favor histories with life leading to singularities.
↑ comment by Ege Erdil (ege-erdil) · 2022-10-02T00:17:02.300Z · LW(p) · GW(p)
The theory based on anchoring to the sum "computation done by evolution in the history of life" completely fails to postdict the success of deep learning, and thus is so wildly improbable as to not be worth any further consideration, and seriously undermines the entire report.
I don't think the point is that you need that much compute; rather, it's an upper bound on how much compute you might need. So I don't understand your argument; it's not like the report takes this as its central estimate. I don't think the scaling in performance we've seen in the past 10 years, in which training compute got scaled up by 6-7 OOM in total, is strong evidence against training requirements for AGI being around 10^40 FLOP. That question just looks mostly uncertain to me.
The point of that post was to disprove statements like that as they are based on naive oversimplified layman models (which any good semiconductor engineer would already scoff at, but the wisdom from that field is not widely enough distributed).
Again, I don't think this is particularly relevant to the post. I agree with you that the Landauer limit bound is very loose, that's the entire reason I cited your post to begin with. I'm not sure why you somehow felt that your message had not been properly communicated. I've edited this part of the question to clarify what I actually meant.
However, it's much easier to justify this bound for a physical computation you don't understand very well than to justify something that's tighter, and all you get after correcting for that is probably ~ 5 OOM of difference in the final answer, which I already incorporate in my 10^45 FLOP figure and which is also immaterial to the question I'm trying to ask here.
If one starts doing anthropic adjustments, you don't just stop there. The vast majority of our measure will be in various simulations, which dramatically shifts everything around to favor histories with life leading to singularities.
I strongly disagree with this way of applying anthropic adjustments. I think these should not in principle be different from Bayesian updates: you start with some prior (could be something like a simplicity prior) over all subjective universes you could have observed and update based on what you actually observe. In that case there's a trivial sense in which the simulation hypothesis is true, because you could always have a simulator that simulates every possible program that halts or something like this, but that doesn't help you actually reduce the entropy of your own observations or to predict anything about the future, so it is not functional.
I think for this to go through you need to do anthropics using SIA or something similar and I don't think that's justifiable, so I also think this whole argument is illegitimate.
Replies from: jacob_cannell
↑ comment by jacob_cannell · 2022-10-02T00:51:59.228Z · LW(p) · GW(p)
So I don't understand your argument; it's not like the report takes this as its central estimate. I don't think the scaling in performance we've seen in the past 10 years, in which training compute got scaled up by 6-7 OOM in total, is strong evidence against training requirements for AGI being around 10^40 FLOP.
My argument is that the report does not use the correct procedure, where the correct procedure is to develop one or a few simple models that best postdict the relevant observed history. Most of the (correct) report would then be comparing postdictions of the simple models to the relevant history (AI progress), to adjust hyperparams and do model selection.
However, it's much easier to justify this bound for a physical computation you don't understand very well than to justify something that's tighter, and all you get after correcting for that is probably ~ 5 OOM of difference in the final answer, which is immaterial to the question I'm trying to ask here.
Fair.
If one starts doing anthropic adjustments, you don't just stop there. The vast majority of our measure will be in various simulations, which dramatically shifts everything around to favor histories with life leading to singularities.
I strongly disagree with this way of applying anthropic adjustments. I think these should not in principle be different from Bayesian updates: you start with some prior (could be something like a simplicity prior) over all subjective universes you could have observed and update based on what you actually observe. In that case there's a trivial sense in which the simulation hypothesis is true, because you could always have a simulator that simulates every possible program that halts or something like this, but that doesn't help you actually reduce the entropy of your own observations or to predict anything about the future, so it is not functional.
The optimal inference procedure (solomonoff in binary logic form, equivalent to full bayesianism) is basically what you describe: form a predictive distribution from all computable theories ranked by total entropy (posterior fit + complexity prior). I agree that probably does lead to accepting the simulation hypothesis, because most of the high fit submodels based on extensive physics sims will likely locate observers in simulations rather than root realities.
The anthropic update is then updating from approximate predictive models which don't feature future sims to those that do.
I don't understand what you mean by "doesn't help you actually reduce the entropy of your own observations", as that's irrelevant. The anthropic update to include sim hypothesis is not an update to the core ideal predictive models themselves (as that is physics), it's an update to the approximations we naturally must use to predict the far future.
I think for this to go through you need to do anthropics using SIA or something similar
I don't see the connection to SIA, and regardless of that philosophical confusion there is only one known inference method that is universally correct in the limit: so the question is always just, what would a computationally unbounded Solomonoff inducer infer?
7 comments
Comments sorted by top scores.
comment by steven0461 · 2022-10-01T20:11:52.954Z · LW(p) · GW(p)
How Hard is Artificial Intelligence? Evolutionary Arguments and Selection Effects
Replies from: ege-erdil
↑ comment by Ege Erdil (ege-erdil) · 2022-10-01T20:47:30.034Z · LW(p) · GW(p)
Nice reference, thanks for the link.
comment by Linch · 2022-10-01T22:21:35.540Z · LW(p) · GW(p)
I briefly discussed this here [EA(p) · GW(p)], thanks for having a better reference!
Replies from: ege-erdil
↑ comment by Ege Erdil (ege-erdil) · 2022-10-02T00:36:48.083Z · LW(p) · GW(p)
Did you end up getting a good answer to this question? I see no responses to you on the EA forum but you may have received a response through other channels, I suppose.
Replies from: Linch
↑ comment by Linch · 2022-10-10T21:57:18.658Z · LW(p) · GW(p)
I got an okay answer in person but I'm still not fully convinced. The argument was roughly that fairly high levels of intelligence developed independently multiple times (octopodes, humans, birds, etc.). So you might expect that getting to neurons is hard, but after you have neurons, evolution "got lucky" multiple times.
That said this was only briefly discussed in person and I might be misunderstanding their position drastically.
comment by Peter Wildeford (peter_hurford) · 2022-10-01T22:51:45.317Z · LW(p) · GW(p)
Does the chance that evolution got really lucky cancel out with the chance that evolution got really unlucky? So maybe this doesn't change the mean but does increase the variance? As for how much to increase the variance, maybe something like an additional +/- 1 OOM tacked on to the existing evolution anchor?
I'm kinda thinking there's like a 10% chance you'd have to increase it by 10x and a 10% chance you'd have to decrease it by 10x. But maybe I'm not thinking about this right?
↑ comment by Ege Erdil (ege-erdil) · 2022-10-01T23:03:38.482Z · LW(p) · GW(p)
The problem with the "evolution got really unlucky" assumption is the Fermi paradox. It seems like to resolve the Fermi paradox we basically have to assume that evolution got really lucky at least at some point if we assume the entire Great Filter is already behind us. Of course in principle it's possible all of this luck was concentrated in an early step like abiogenesis which AI capabilities research has already achieved the equivalent of, and there was no special luck that was needed after that.
The important question seems to be whether we're already past "the Great Filter" in what makes intelligence difficult to evolve naturally or not. If the difficulty is concentrated in earlier steps then we're likely already past it and it won't pose a problem, but e.g. if the apes -> humans transition was particularly difficult then it means building AGI might take far more compute than we'll have at our disposal, or at least that evolutionary arguments cannot put a good bound on how much compute it would take.
The counterargument I give is that Hanson's model implies that if the apes -> humans transition was particularly hard then the number of hard steps in evolution has to be on the order of 100, and that seems inconsistent with both details of evolutionary history (such as how long it took to get multicellular life from unicellular life, for example) and what we think we know about Earth's remaining habitability lifespan. So the number of hard steps was probably small and that is inconsistent with the apes -> humans transition being a hard step.
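For concreteness, a rough version of that counting argument (the habitable window and candidate transition durations below are my own illustrative assumptions): if hard steps conditionally look about equally long, then treating the apes -> humans transition as a typical hard step implies a total number of hard steps of roughly the habitable window divided by that transition's duration, which comes out at ~100 or more, far above Hanson's ~10 point estimate.

```python
habitable_window_yr = 5e9                 # assumed: total habitable lifespan of Earth
for transition_yr in (6e6, 2e7, 5e7):     # assumed durations for the apes -> humans transition
    implied_steps = habitable_window_yr / transition_yr
    print(f"transition of {transition_yr:.0e} yr -> ~{implied_steps:.0f} implied hard steps")
```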