[Linkpost] Growth in FLOPS used to train ML models

post by Derek M. Jones (Derek-Jones) · 2022-03-14T11:28:33.418Z · LW · GW · 3 comments

This is a linkpost for https://shape-of-code.com/2022/03/13/growth-in-flops-used-to-train-ml-models/

Given the ongoing history of continually increasing compute power, what is the maximum compute power that might be available to train ML models in the coming years?


comment by gwern · 2022-03-14T15:57:27.954Z · LW(p) · GW(p)

Speaking of compute and experience curves, Karpathy just posted about replicating Le Cun's 1989 pre-MNIST digit classifying results and what difference compute & methods make: https://karpathy.github.io/2022/03/14/lecun1989/

comment by Derek M. Jones (Derek-Jones) · 2022-03-14T17:24:55.351Z · LW(p) · GW(p)

Thanks, an interesting read until the author peers into the future.  Moore's law is on its last legs, so the historical speed-ups will soon be just that, something that once happened.  There are some performance improvements still to come from special-purpose CPUs, and half-precision floating-point will reduce memory traffic (which can then be traded for CPU performance).
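
To make the memory-traffic point concrete, here is a minimal sketch (my own illustration, not anything from the linked post, using NumPy): the same tensor stored in half precision occupies half the bytes of single precision, so a computation bound by memory bandwidth has to move half as much data.

```python
# Sketch only: half-precision storage halves the bytes that must move
# between memory and the processor for the same number of values.
import numpy as np

n = 1_000_000
weights_fp32 = np.random.rand(n).astype(np.float32)  # 4 bytes per value
weights_fp16 = weights_fp32.astype(np.float16)       # 2 bytes per value

print(f"float32: {weights_fp32.nbytes / 1e6:.1f} MB")  # ~4.0 MB
print(f"float16: {weights_fp16.nbytes / 1e6:.1f} MB")  # ~2.0 MB
```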

comment by wunan · 2022-03-14T15:20:01.928Z · LW(p) · GW(p)

Thanks for writing! I don't see an actual answer to the question asked in the beginning -- "Given the ongoing history of continually increasing compute power, what is the maximum compute power that might be available to train ML models in the coming years?" Did I miss it?