[Prediction] We are in an Algorithmic Overhang

post by lsusr · 2021-09-29T23:40:15.011Z · LW · GW · 14 comments

The primary purpose of this blog post is to create a public record. Some technical insights are deliberately omitted.

I think we're in an algorithmic AI overhang.

  1. Today's neural networks (including GPT) are so data-inefficient that they will not strictly outcompete human brains' performance in all domains (excluding robotics, which is hardware-limited), no matter how much data we shove into them or how big we scale them.
  2. The human brain uses a radically different core learning algorithm that scales much better as a function of its training data size.
  3. The core learning algorithm of human beings could be written up in a handful of scientific papers comparable in length and complexity to Einstein's Annus Mirabilis papers.
  4. Once the mathematics behind the human brain's learning algorithms is made public, those algorithms will be running at scale on silicon computers in less than 10 years.
  5. Within 10 years of getting these algorithms to scale, they will be cheap enough for a venture-backed startup to run them at a scale outstripping the smartest human alive―assuming civilization lasts that long. (A world war could destroy the world's semiconductor fabricators.)

I've been thinking about this idea for a long time. What finally pushed me to publish were two fire alarms in sequence. First, a well-respected industry leader in AI stated in a private conversation that he believed we were algorithmically limited. Second, Steven Byrnes wrote this post [LW · GW]. The basilisk is out of Pandora's Box.


Part 2 here [LW · GW]

14 comments

comment by gjm · 2021-09-30T01:48:05.415Z · LW(p) · GW(p)

The linked post by Steven Byrnes uses the term "hardware overhang" to describe the situation where we have hardware much more powerful than would be needed but algorithms aren't good enough. Your post here uses the term "algorithmic overhang". One could justify either terminology, but clearly not both. (I think SB's is better. Maybe something more explicit like "hardware-ahead-of-algorithms overhang" would be better than either, since evidently different people have different intuitions about which of two opposite situations an "X overhang" describes.)

Replies from: lsusr
comment by lsusr · 2021-09-30T02:15:51.512Z · LW(p) · GW(p)

The linked post by Steven Byrnes uses the term "hardware overhang" to describe the situation where we have hardware much more powerful than would be needed but algorithms aren't good enough. Your post here uses the term "algorithmic overhang". One could justify either terminology, but clearly not both.

Thank you for pointing this out. I think we are in a hardware-ahead-of-algorithms overhang.

Replies from: p.b.
comment by p.b. · 2021-10-17T16:16:41.725Z · LW(p) · GW(p)

Maybe worth pointing out that "hardware overhang" is a pretty old (>10 years) and well-known term that afaik was not coined by Steven Byrnes. So your title must be confusing to quite a lot of people.

comment by gjm · 2021-09-30T01:44:36.122Z · LW(p) · GW(p)

It is not clear whether, for each of your points 1-5, you are saying

  • "This is definitely true and I have conclusive evidence for it"
  • "I am confident that this is true but I don't have evidence that would reliably convince others"
  • "I think this is probably true"
  • "This might be true, which given the potential consequences is already enough to be interesting"
  • something else.

(I'm guessing probably "I think this is probably true"?)

Replies from: lsusr
comment by lsusr · 2021-09-30T02:14:35.957Z · LW(p) · GW(p)

I'm not sure. I think it's at least 10%.

I don't really think in terms of beliefs. I think in terms of bets. My confidence is high enough to stake some reputation on a public prediction.

comment by Pattern · 2021-09-30T02:24:07.958Z · LW(p) · GW(p)

The core learning algorithm [of] human beings could be written

Replies from: lsusr
comment by lsusr · 2021-09-30T02:25:50.675Z · LW(p) · GW(p)

Fixed. Thanks.

comment by ShardPhoenix · 2021-09-30T01:10:24.341Z · LW(p) · GW(p)

Can you expand on #3? Do you mean that this is happening now, that it could happen now (ie the data is already out there and just needs to be interpreted), or that it might happen in the near future (eg if brain science gets a bit better)? How sure are you of this?

Replies from: lsusr
comment by lsusr · 2021-09-30T01:37:35.871Z · LW(p) · GW(p)

Can you expand on #3? Do you mean that this is happening now, that it could happen now (ie the data is already out there and just needs to be interpreted), or that it might happen in the near future (eg if brain science gets a bit better)?

Might have happened already. Probably hasn't yet. Einstein's papers were an anomaly. I wouldn't be surprised if some of the insights can be found here and there in the haystack, but they haven't been completed yet.

I don't think the timeline has much to do with practical brain science. Math tends to advance faster than biology. Brain science just puts an upper bound on how long it'll take to get there. In the slowest possible world, we crack the basic biological cortical algorithm by observing brains with high fidelity in real time.

How sure are you of this?

At least 10%.

comment by Bernhard · 2021-10-14T20:22:43.534Z · LW(p) · GW(p)

First of all, kudos to you for making this public prediction.

To keep this brief: 1 (95%), 2 (60%), 3 (75%), 4 (<<5%), 5 (<<1%)

I don't think we are in a hardware overhang, and my argument is the following:

Our brains are composed of ~10^11 neurons, and our computers of just as many transistors, so to a first approximation, we should already be there.

However, our brains have approximately 10^3 to 10^5 synapses per cell, while transistors have far fewer connections each (I would guess maybe 10 on average?).

Even assuming that 1 transistor is "worth" one neuron, we come up short.
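
A minimal back-of-envelope version of that comparison, using only the rough order-of-magnitude figures above (nothing here is a measurement):

```python
# Back-of-envelope comparison of brain connectivity vs. a single large chip.
# All figures are rough order-of-magnitude estimates from the text above.

neurons = 1e11                      # ~10^11 neurons in a human brain
synapses_per_neuron = (1e3, 1e5)    # ~10^3 to 10^5 synapses per neuron
transistors = 1e11                  # ~10^11 transistors, roughly a large modern chip

synapses_low = neurons * synapses_per_neuron[0]   # ~10^14 synapses
synapses_high = neurons * synapses_per_neuron[1]  # ~10^16 synapses

print(f"synapses:    {synapses_low:.0e} to {synapses_high:.0e}")
print(f"transistors: {transistors:.0e}")
print(f"shortfall:   {synapses_low / transistors:.0e}x to {synapses_high / transistors:.0e}x")
```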

I remember learning that a perceptron with a single hidden layer of arbitrary width can approximate any function, and thereby any perceptron of finite width but with more hidden layers. (I think this is called the "universal approximation theorem"?)

After reading your post, I kept trying to find some numbers of how many neurons are equivalent to an additional layer, but came up empty.

I think the problem is basically that each additional layer contributes superlinearly to "complexity" (however you care to measure that). Please correct me if I'm wrong; I would say this point is my crux. If we are indeed in a territory where we have available transistor counts comparable to a "single-hidden-layer-perceptron-brain-equivalent", then I would have to revise my opinion.
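
As a very rough illustration of the width-vs-depth question (the layer sizes below are arbitrary choices of mine, not brain-derived), here is how raw parameter counts compare for one huge hidden layer versus a stack of narrow ones. This only counts parameters; it says nothing about how much "complexity" each extra layer actually buys, which is exactly the number I couldn't find:

```python
# Parameter count (weights + biases) of a fully connected MLP, to compare
# one very wide hidden layer against many narrower ones. Illustrative only;
# the layer sizes are arbitrary.

def mlp_params(layer_sizes):
    """layer_sizes = [input, hidden..., output]; total number of weights and biases."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

wide_shallow = [1000, 1_000_000, 1000]         # one hidden layer of width 10^6
deep_narrow  = [1000] + [1000] * 100 + [1000]  # 100 hidden layers of width 10^3

print(f"wide & shallow: {mlp_params(wide_shallow):.2e} parameters")  # ~2.0e+09
print(f"deep & narrow:  {mlp_params(deep_narrow):.2e} parameters")   # ~1.0e+08
```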

I'm personally very interested in this highly parallel brain architecture, and if I could, I would work on ways to investigate/build/invent similar structures. However, besides self-assembly (as in living growing things), I don't yet see how we could build things of a similar complexity in a controlled way.

Replies from: Bernhard, lsusr
comment by Bernhard · 2021-11-01T20:21:48.220Z · LW(p) · GW(p)

Just for completeness, I found [this paper](http://dx.doi.org/10.1016/j.neuron.2021.07.002), where they try to simulate the output of a specific type of neuron, and for best results require a DNN of 5-8 layers (with widths of ~128).
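
For a rough sense of scale, treating those figures as a plain fully connected MLP (a simplification of mine; the actual model in the paper may be structured differently), the per-neuron emulation cost looks like this:

```python
# Rough parameter count of a "5-8 layers, width ~128" network, treated here as a
# plain fully connected MLP for simplicity (the paper's actual architecture may
# differ) -- i.e. the cost of standing in for one biological neuron.

def mlp_params(layer_sizes):
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

for n_hidden in (5, 8):
    # width-128 input, n_hidden hidden layers of width 128, scalar output
    sizes = [128] + [128] * n_hidden + [1]
    print(f"{n_hidden} hidden layers of width 128: "
          f"~{mlp_params(sizes):,} parameters per neuron")
```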

comment by lsusr · 2021-10-14T20:50:09.220Z · LW(p) · GW(p)

I think the problem is basically that each additional layer contributes superlinearly to "complexity" (however you care to measure that). Please correct me if I'm wrong; I would say this point is my crux.

Do you mean that each additional layer contributes too much to hypothesis space entropy [LW · GW]?

Replies from: Bernhard
comment by Bernhard · 2021-10-16T17:35:25.104Z · LW(p) · GW(p)

I guess that would be one way to frame it. I think a simpler way to think of it (or a way that my simpler mind thinks of it) is that for a given number of parameters (neurons), more complex wiring allows for more complex results. The "state-space" is larger, if you will.

3+2, 3×2, and 3² are simply not the same.

From my limited (undergraduate-level) CS knowledge, I seem to remember that typical deep neural networks use a rather small number of hidden layers (maybe 10? certainly less than 100? Please correct me if I am wrong). I think this choice is rationalized with "This already does everything we need, and requires less compute."

To me this somewhat resembles a Chesterton's fence (or rather its inverse). If we were to use neural nets of sufficient depth (>10e3 layers), then we might encounter new things, but before we get there, we will certainly realize that we still have a ways to go in terms of raw compute.

comment by [deleted] · 2021-10-11T14:18:51.018Z · LW(p) · GW(p)

I agree, although I doubt the brain algorithm will end up at the core of AGI.