Reinterpreting "AI and Compute"

post by habryka (habryka4) · 2018-12-25T21:12:11.236Z · LW · GW · 9 comments

This is a link post for https://aiimpacts.org/reinterpreting-ai-and-compute/


Some arguments that the recent evidence about how quickly compute has been increasing, and about how much of the rapid progress in machine learning it has been responsible for, might mean that we should be less worried about short timelines, not more.

[...] Overall, it seems pretty common to interpret the OpenAI data as evidence that we should expect extremely capable systems sooner than we otherwise would.
However, I think it’s important to note that the data can also easily be interpreted in the opposite direction. The opposite interpretation goes like this:
1. If we were previously underestimating the rate at which computing power was increasing, this means we were overestimating the returns on it.
2. In addition, if we were previously underestimating the rate at which computing power was increasing, this means that we were overestimating how sustainable its growth is.
3. Let’s suppose, as the original post does, that increasing computing power is currently one of the main drivers of progress in creating more capable systems. Then — barring any major changes to the status quo — it seems like we should expect progress to slow down pretty soon and we should expect to be underwhelmed by how far along we are when the slowdown hits.

9 comments

Comments sorted by top scores.

comment by Vaniver · 2018-12-26T19:49:29.045Z · LW(p) · GW(p)

I am amused that the footnotes are as long as the actual post.

Footnote 3 includes a rather salient point:

However, if you instead think that something like the typical amount of computing power available to talented researchers is what’s most important — or if you simply think that looking at the amount of computing power available to various groups can’t tell us much at all — then the OpenAI data seems to imply relatively little about future progress.

Especially in the light of this news item from Import AI #126:

The paper obtained state-of-the-art scores on lipreading, significantly exceeding prior SOTAs. It achieved this via a lot of large-scale infrastructure, combined with some elegant algorithmic tricks. But ultimately it was rejected from ICLR, with a comment from a meta-reviewer saying ‘Excellent engineering work, but it’s hard to see how others can build on it’, among other things.

It's possible that we will see more divergence between 'big compute' and 'small compute' worlds in a way that one might expect will slow down progress (because the two worlds aren't getting the same gains from trade that they used to).

Replies from: TheWakalix, jacobjacob
comment by TheWakalix · 2019-02-19T16:45:06.143Z · LW(p) · GW(p)

(For posterity: the above link is the homepage, and this is the article Vaniver referred to.)

Replies from: Vaniver
comment by Vaniver · 2019-02-19T19:53:51.896Z · LW(p) · GW(p)

Link fixed, thanks!

comment by jacobjacob · 2018-12-27T16:17:15.242Z · LW(p) · GW(p)

I'm confused. Do you mean "worlds" as in "future trajectories of the world" or as in "subcommunities of AI researchers"? And what's a concrete example of gains from trade between worlds?

Replies from: Vaniver
comment by Vaniver · 2018-12-27T18:56:16.590Z · LW(p) · GW(p)

Subcommunities of AI researchers. A simple concrete example of gains from trade is when everyone uses the same library or conceptual methodology, and someone finds a bug. The primary ones of interest are algorithmic gains; the new thing used to do better lipreading can also be used by other researchers to do better on other tasks (or to further enhance this approach and push it further for lipreading).

comment by avturchin · 2018-12-26T10:25:53.518Z · LW(p) · GW(p)

One possible criticism of this article is that it assumes the "price of flops" is a variable independent of the "amount of investment". However, there are two ways in which demand affects the price of computation:

1) Economies of scale. Mass manufacturing of some type of hardware for AI will dilute the cost of its development. I have heard that an order-of-magnitude increase in scale results, in general, in roughly a 2x decrease in price per unit (see the sketch below). If a company produces hardware for itself, as Google does, it also skips marketing costs.

2) The rise of AI-specialised hardware, like the TPU and the Graphcore IPU.
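
A rough, purely illustrative sketch of that rule of thumb in Python (not from the comment): if a 10x increase in production volume halves the unit price, then unit price scales roughly as volume to the power of minus log10(2), about 0.3.

```python
import math

# Illustrative only: the rule of thumb "10x the production volume roughly
# halves the price per unit" corresponds to unit_price ~ volume ** -log10(2).
def unit_price(base_price: float, scale_factor: float) -> float:
    """Estimated price per unit after scaling production volume by `scale_factor`."""
    return base_price * scale_factor ** -math.log10(2)

print(unit_price(1.0, 10))   # ~0.50: ten times the volume, half the unit price
print(unit_price(1.0, 100))  # ~0.25: a hundred times the volume, a quarter of the price
```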

comment by avturchin · 2018-12-26T11:07:35.451Z · LW(p) · GW(p)

One interesting quote from the article is:

"We can then attempt to construct an argument where: (a) we estimate this minimum quantity of computing power (using evidence unrelated to the present rate of return on computing power), (b) predict that the quantity will become available before growth trends hit their wall, and (c) argue that having it available would be nearly sufficient to rapidly train systems that can do a large portion of the things humans can do. In this case, the OpenAI data would be evidence that we should expect the computational threshold to be reached slightly earlier than we would otherwise have expected to reach it. For example, it might take only five years to reach the threshold rather than ten. However, my view is that it’s very difficult to construct an argument where parts (a)-(c) are all sufficiently compelling. In any case, it doesn’t seem like the OpenAI data alone should substantially increase the probability anyone assigns to “near-term AGI” (rather than just shifting forward their conditional probability estimates of how “near-term” “near-term AGI” would be)."

It would be interesting to have Fermi estimates.

For example, if we assume that human-level brain simulation requires 1 exaflops (the median estimate according to AI Impacts) and that the DGX-2 is used to do the work (it provides 2 petaflops in tensor operations and costs 400K USD), then one needs 500 such systems, which will cost 200 mln USD. The price of the datacenter to host all this, plus electricity, connections, etc., will at least double the cost.

So, the initial investment to study human-brain-level AI is around 400 mln USD, which seems acceptable for many large IT companies, but not for startups.
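
A minimal sketch of this first estimate in Python; every number below is one of the commenter's assumptions (1 exaflops for human-level brain simulation, a DGX-2 delivering 2 petaflops of tensor operations at roughly 400K USD), not an established fact:

```python
# Assumptions from the comment above, not measured facts.
BRAIN_FLOPS = 1e18          # assumed compute for human-level brain simulation (1 exaflops)
DGX2_FLOPS = 2e15           # DGX-2 throughput in tensor operations (2 petaflops)
DGX2_COST_USD = 400_000     # approximate price of one DGX-2

systems_needed = BRAIN_FLOPS / DGX2_FLOPS        # 500 systems
hardware_cost = systems_needed * DGX2_COST_USD   # 200 mln USD
total_cost = 2 * hardware_cost                   # ~400 mln USD with datacenter, power, networking

print(f"{systems_needed:.0f} systems, ~{total_cost / 1e6:.0f} mln USD in total")
```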

However, even having human-level hardware is not enough. One needs to train it and to account for trial and error. We could assume, by analogy with the training of a human brain in childhood, that training one model of a human mind requires at least 1 year of training time (if the computer runs at the same speed as a human mind). Also, at least 1000 trials will be needed to get something workable. (I am sure there will be a large demand for such work, as there will be attempts to create "home robots" which act in the real world and speak human language, which is close to human-level capability.)

So, to successfully train human-level models, at least 1000 years of human-level hardware time is needed. If a company wants to have it all now and finish in 1 year, that is unrealistic, as it would cost 400 billion USD upfront.

However, assuming that the price of flops falls by an order of magnitude over the next decade, this investment will be only 40 bln USD 10 years from now, and it could be spread over many years. In other words, to have human-level AI in 10 years, a company needs to spend at least 4 billion USD a year on such research, which is still large, but more acceptable for the largest IT companies.
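
Continuing the sketch under the same assumptions (about 1000 trials, 1 year of real-time training per trial, one hardware-year costing the ~400 mln USD estimated above, and an order-of-magnitude fall in the price of flops over the next decade):

```python
# Assumptions from the comment above: 1000 trials, 1 year per trial,
# and one "hardware-year" costing the ~400 mln USD estimated earlier.
TRIALS = 1000
YEARS_PER_TRIAL = 1
HARDWARE_YEAR_COST_USD = 400e6

upfront_cost = TRIALS * YEARS_PER_TRIAL * HARDWARE_YEAR_COST_USD   # ~400 bln USD today
cost_after_price_drop = upfront_cost / 10                          # ~40 bln USD a decade from now
cost_per_year = cost_after_price_drop / 10                         # ~4 bln USD per year over 10 years

print(f"upfront: {upfront_cost / 1e9:.0f} bln USD; "
      f"spread over a decade: {cost_per_year / 1e9:.0f} bln USD/year")
```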

In the end, we would get a robot that acts in its environment and speaks human language, but it would not be a superintelligence, and it is a separate question whether such a form of embodied cognition is a useful, or the closest, step towards superintelligence.

If the exponential growth of compute described by OpenAI continues and results in even more dramatic growth of available computation, such a robot might be trained not in 10 years, but maybe in 5. The quote above says exactly this, and also assumes that there is no practical difference between the 5-year estimate and the 10-year estimate, as we are not ready for either.

Replies from: None
comment by [deleted] · 2018-12-26T23:01:06.720Z · LW(p) · GW(p)

Replies from: avturchin
comment by avturchin · 2018-12-26T23:44:08.944Z · LW(p) · GW(p)

I think that the experimenters will find ways to compress the training process, maybe by skipping part of the dreamless sleep and the periods of passivity, and they will also use some algorithmic tricks.

Replies from: None
comment by [deleted] · 2018-12-27T00:06:39.173Z · LW(p) · GW(p)