Assessing Kurzweil predictions about 2019: the results

post by Stuart_Armstrong · 2020-05-06T13:36:18.788Z · LW · GW · 21 comments

Contents

  Methods and thanks
  The results
  Interesting details
    Predictor agreement
    Most agreement/falsest predictions
    Most accurate prediction
    Least agreement
    Most "Cannot Decide"
    A question of timeline?
  In conclusion

EDIT: Mean and standard deviation of individual predictions can be found here [LW · GW].

Thanks to all my brave assessors, I now have the data about Kurzweil's 1999 predictions about 2019.

This was a follow-up to a previous assessment of his predictions about 2009 [LW · GW], which showed a mixed bag, roughly evenly divided between right and wrong; I found that pretty good for ten-year predictions:

So, did more time allow for trends to overcome noise or more ways to go wrong? Pause for a moment to calibrate your expectations.

Methods and thanks

So, for the 2019 predictions, I divided them into 105 separate statements and put out a call for volunteers, with instructions here [LW · GW]; the main relevant point was that I wanted their assessment for 2019, not for the (possibly transient) current situation. I got 46 volunteers with valid email addresses, of whom 34 returned their assessments. So many thanks, in reverse alphabetical order, to Zvi Mowshowitz, Zhengdong Wang, Yann Riviere, Uriel Fiori, orthonormal, Nuño Sempere, Nathan Armishaw, Koen Holtman, Keller Scholl, Jaime Sevilla, Gareth McCaughan, Eli Rose and Dillon Plunkett, Daniel Kokotajlo, Anna Gardiner... and others who have chosen to remain anonymous.

The results

Enough background; what did the assessors find? Well, of the 34 assessors, 24 went the whole hog and assessed all 105 predictions; on average, each person assessed 91 predictions, for a total of 3078 individual assessments[1].

So, did more time allow for more perspective or more ways to go wrong? Well, Kurzweil's predictions for 2019 were considerably worse than those for 2009, with more than half strongly wrong:

Interesting details

The (anonymised) data can be found here[2], and I encourage people to download and assess it themselves. But some interesting results stood out to me:

Predictor agreement

Taking a single prediction, for instance the first one:

Then we can compute the standard deviation of the predictors' answers for that prediction. This gives an impression of how much disagreement there was between predictors; in this case, it was 0.84.

Perfect agreement would be a standard deviation of 0; maximum disagreement (half find "1", half find "5") would be a standard deviation of 2. Perfect spread - equal numbers of 1s, 2s, 3s, 4s, and 5s - would have a standard deviation of 1.4.

Across the 105 predictions, the maximum standard deviation was 1.7, the minimum was 0 (perfect agreement), and the average was 0.97. So the predictors had a medium tendency to agree with each other.
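
For concreteness, here is a minimal Python sketch of these statistics. It assumes the data layout described in footnote 2 (each column an individual predictor, each row an individual prediction) and a hypothetical filename for the downloaded data:

```python
import numpy as np

# Benchmark values from the text (population standard deviation):
print(np.std([1, 1, 5, 5]))      # maximum disagreement -> 2.0
print(np.std([1, 2, 3, 4, 5]))   # perfect spread       -> ~1.4
print(np.std([3, 3, 3]))         # perfect agreement    -> 0.0

# Hypothetical filename; per footnote 2, rows are predictions and
# columns are predictors. Gaps left by assessors are read in as NaN.
data = np.genfromtxt("kurzweil_2019_assessments.csv", delimiter=",")

# Per-prediction disagreement, ignoring the gaps:
sd = np.nanstd(data, axis=1)
print(sd.max(), sd.min(), sd.mean())  # reported above: 1.7, 0, 0.97
```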

Most agreement/falsest predictions

There was perfect agreement on five predictions; on all of these, the unanimous verdict was "5": "False".

These predictions were:

As you can see, Kurzweil suffered a lot from his VR predictions. This seems a perennial thing: Hollywood is always convinced that mass 3D is just around the corner; technologists are always convinced that VR is imminent.

Most accurate prediction

With a mean score of 1.3, the prediction deemed most accurate was:

Now this might seem a trivial prediction, especially in retrospect, but I want to defend Kurzweil here - it was not at all certain in 1999, with many utopian changes foreseen and expected, that this would still be an issue.

The prediction deemed next most accurate (mean of 1.4) is:

This is truly non-trivial for 1999, and I do give Kurzweil credit for that.

Least agreement

With a standard deviation of 1.7, the predictors disagreed the most on this prediction:

This may have to do with different judgement over the extent of "everywhere" and "rarely an issue", or over who might or might not find this to be an issue.

The prediction with the next most disagreement (standard deviation 1.6) is:

It's possible that "fully" was a problem here, but I see this prediction as being just false.

Most "Cannot Decide"

This prediction had the most predictors choosing "Cannot Decide":

Maybe the ambiguity in "fully recognized" made this hard to assess. Or maybe, as suggested in the comments [LW(p) · GW(p)], it's because this doesn't look much like a "prediction" so much as an obviously true statement.

A question of timeline?

It's been suggested that Kurzweil's predictions for 2009 are mostly correct in 2019. If this is the case - Kurzweil gets the facts right, but the timeline wrong - it would be interesting to revisit these predictions in 2029 (if he is a decade optimistic) and 2039 (if he expected things to go twice as fast). Many of his predictions, though, seem to be of the type "once true, always true", so his score should rise with time, assuming continued technological advance and no disasters.

In conclusion

Again, thanks to all the volunteers who assessed these predictions and thanks to Kurzweil who, unlike most prognosticators, had the guts and the courtesy to write down his predictions and give them a date.

I strongly suspect that most people's 1999 predictions about 2019 would have been a lot worse.


  1. Five of the 3078 assessments were returned as question marks; I replaced these with "3" ("Cannot Decide"). Four of the 34 assessors left gaps in their assessments instead of working through all of the randomly ordered predictions; to two significant figures, excluding these four didn't change the results, so I included them all. ↩︎

  2. Each column is an individual predictor, each row an individual prediction. ↩︎

21 comments

Comments sorted by top scores.

comment by Czynski (JacobKopczynski) · 2020-05-06T17:09:39.602Z · LW(p) · GW(p)

prediction after seeing the 2009 graph:

15-20% True
8-12% Weakly True
8-12% Undecided
20-25% Weakly False
35-45% False

This was basically taking the 2009 graph and skewing it to the right, pivoting on Undecided. It was still too optimistic.

True: 15-20% vs 12%
Weakly True: 8-12% vs 12%
Undecided: 8-12% vs 10%
Weakly False: 20-25% vs 15%
False: 35-45% vs 50%

Looking at the discrepancy, it doesn't seem like any systematic skewing adjustment, i.e. adjusting for overconfidence, would have gotten good results. (The closest would be one that pivoted on 'Weakly False'.) A better model would be assuming that all predictions had a modest chance of being other-than-totally-false which was uniformly distributed over degree of truth, and most were totally false.
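
A quick numerical sketch of that model, fitting its single parameter to the observed 50% "False" share (an illustration only):

```python
# Model from the paragraph above: each prediction is totally false with
# probability q; otherwise its degree of truth is uniformly distributed
# over the four remaining categories.
q = 0.50                    # fit to the observed share of "False"
other = (1 - q) / 4         # implied share of each remaining category
print(f"{other:.1%} each")  # -> 12.5%, vs. observed 12%/12%/10%/15%
```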

Therefore, I predict that if these predictions are examined again in 10 or 20 years' time, they will still have this uniform-distribution-over-degree-of-truth property, though presumably a higher chance of not being totally false.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2020-05-06T17:16:14.732Z · LW(p) · GW(p)

Strong upvote for writing your own predictions before seeing the 2019 graph.

comment by Raemon · 2020-06-25T02:56:16.450Z · LW(p) · GW(p)

Curated.

I think "futurism with good epistemics" is pretty hard, and pretty important. The LessWrong zeitgeist is sort of "Post Kurzweil" – his predictions aren't the ones that we'll personally be graded on. But, I think the act of methodically looking over his predictions helps us orient on the predictions we're making. 

I think a) it offers a cautionary tale of mistakes we might be making, and b) I think the act of having a strong tradition of evaluating long-past predictions (hopefully?) helps ward off bullshit. (i.e. many pundits make predictions that skew towards sounding exciting and impressive in the moment, because they don't expect to be called on them later)

It's also interesting to note how much disagreement there was over some predictions.

One question I came away with:

It's been suggested that Kurzweil's predictions for 2009 are mostly correct in 2019.

Is this well established? Is there a previous writeup that argues this, or just a general feel? I'd be interested in applying the same methodology to the old 2009 predictions and checking whether they hold up.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2020-06-25T12:12:15.810Z · LW(p) · GW(p)

https://www.futuretimeline.net/forum/topic/17903-kurzweils-2009-is-our-2019/ , forwarded to me by Daniel Kokotajlo (I added a link in the post as well).

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2020-05-06T13:48:44.387Z · LW(p) · GW(p)

The hypothesis that Kurzweil is basically right but off by 10 years (and thus, that these predictions will be mostly true in 2029) seems less plausible to me than the hypothesis that Kurzweil is basically right but thinks everything will happen twice as fast as it does (and thus, that these predictions will be mostly true in 2039). I'd give the first hypothesis about 15% credence and the second about 25%.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2020-05-06T14:12:46.904Z · LW(p) · GW(p)

Edited my post to reflect this possibility.

Personally, I don't think there will be anything consistent like that; just some predictions right, some premature, some wrong. I note that most of the predictions seem to be of the type "once true, always true".

comment by Arenamontanus · 2020-05-06T16:19:03.606Z · LW(p) · GW(p)

I tried doing a PCA of the judgments, to see if there was any pattern in how the predictions were judged. However, the variance of the principal components did not decline fast. The first component explains just 14% of the variance, the next ones 11%, 9%, 8%... It is not as if there is some very dominant low-dimensional or clustering explanation for the pattern of good and bad predictions.

No clear patterns when I plotted the predictions in PCA-space: https://www.dropbox.com/s/1jvhzcn6ngsw67a/kurzweilpredict2019.png?dl=0 (In this plot colour denotes mean assessor view of correctness, with red being incorrect, and size the standard deviation of assessor views, with large corresponding to more agreement). Some higher order components may correspond to particular correlated batches of questions like the VR ones.

(Or maybe I used the Matlab PCA routine wrong).
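
For reference, a rough Python equivalent of that analysis, assuming the same hypothetical file layout as the sketch in the post:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical filename; rows = 105 predictions, columns = assessors.
data = np.genfromtxt("kurzweil_2019_assessments.csv", delimiter=",")

# Fill each assessor's gaps with that assessor's mean judgment.
col_means = np.nanmean(data, axis=0)
filled = np.where(np.isnan(data), col_means, data)

pca = PCA().fit(filled)
# Per the comment, the leading components explain ~14%, 11%, 9%, 8%:
print(pca.explained_variance_ratio_[:4])
```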

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2020-05-06T16:34:37.193Z · LW(p) · GW(p)

Plot visualised:

comment by cozy · 2020-07-02T14:21:26.697Z · LW(p) · GW(p)

I predicted the graph would be similar, and I was indeed much too optimistic. In fact, I was so wrong it reminded me of this exact issue: prediction becomes more and more impossibly difficult with every additional decade out you go. It may also be getting even more difficult given the pace of our collective progress, but the underlying humanistic predictions may still well be possible, since that variable never quite leaves us.

If anyone has read 'Where To?' by Robert Heinlein: he made predictions in 1950 and then updated them in 1965 and 1980. Here is an excerpt on an alternative site, since I don't have a direct link to the book itself; if one is interested in the predictive ability of a well-educated individual, it is a very intriguing read, and archiving perspectives lets us understand just how absurd one idea could be, and yet how much further than it we got in a much shorter time frame than assumed. It focuses on technology and the potential of certain science-fiction possibilities, some of which were undershot, but most overshot. On average, he was almost entirely wrong, but only by some degrees of relevancy. Since the excerpt only extends to 1980, you can insert your own realizations of how the predictions went for 2000, and now 2020. There are many optimistic predictions and a few pessimistic ones. Being an aeronautical engineer and author, Heinlein possibly saw too much potential in our ability to progress that science; the same flaw would probably be present in any predictions, since we know our favored subjects best and want to believe in their potential.

I recall Alan Watts made some very interesting comments that essentially predicted a lot of our current access to phones, internet, information, etc. It isn't difficult to imagine how far we may come or how easy it may be when progress is not impeded. What is likely absurdly difficult to predict is how Google loading in four seconds instead of two can make someone upset to the point that they seethe or clench their fists. And yet, I have done that more than once, or seen that small red x icon and gotten similarly upset in private. The entire world is available to me, and two seconds or so of inconvenience, or possibly a modem reset, being an antagonizing factor is only context-driven. At what point do the time savings become unnoticeable? Perhaps we shouldn't consider the exponential growth of some factor of our lives, or its understanding and availability, but the brand-new frustrations and reasons for emotion it brings. Road rage is a common example, and that comes from the novelty of driving wearing off into monotony. It was great to drive the first few weeks, and the freedom was liberating, until that freedom was stripped because I'm sitting on the I-5 and no one is dying; it's just 5PM and everyone is going home.

An optimistic prediction would be made to 'ease' that frustration (flying cars, better infrastructure, etc.), but what about the new frustrations those bring? Having to deal with the FAA instead of the police every time you want to go out to eat? The city becoming sprawling and difficult to navigate, but very streamlined and without 'stops', or even simply every trip becoming much longer as the concept of never slowing down is pushed? Though it sounds contradictory, going faster while taking a longer route is not always favorable, but to an impatient crowd it may solve the more pressing issue: no one will be able to get out of the car to yell at the guy behind them!

comment by Kaj_Sotala · 2021-01-02T11:28:34.537Z · LW(p) · GW(p)

Another review of Kurzweil's 2019 predictions: [1, 2, 3, 4].

comment by Steven Byrnes (steve2152) · 2020-05-06T14:14:06.088Z · LW(p) · GW(p)

"It is now fully recognized that the brain comprises many specialized regions, each with its own topology and architecture of interneuronal connections."

Is this really a prediction? I would call it "A blindingly obvious fact." This page says "Herophilus not only distinguished the cerebrum and the cerebellum, but provided the first clear description of the ventricles", and the putamen and corpus callosum were discovered in the 16th century, etc. etc. Sorry if I'm misunderstanding, I don't know the context.

ETA: Maybe I should be more specific and nuanced. I think it's uncontroversial and known for hundreds if not thousands of years that the brain comprises many regions which look different—for example, the cerebellum, the putamen, etc. I think it's also widely agreed for 100+ years that each is "specialized", at least in the sense that different regions have different functions, although the term is kinda vague. The idea that "each [has] its own topology and architecture of interneuronal connections" is I think the default assumption ... if they had the same topology and architecture, why would they look different? And now that we know what neurons are and have good microscopes, this is no longer just a default assumption, but an (I think) uncontroversial observation.

Replies from: RobbBB, Stuart_Armstrong
comment by Rob Bensinger (RobbBB) · 2020-05-06T14:58:48.415Z · LW(p) · GW(p)

Here's the context:

[...] Rotating memories and other electromechanical computing devices have been fully replaced with electronic devices. Three-dimensional nanotube lattices are now a prevalent form of computing circuitry.

The majority of "computes" of computers are now devoted to massively parallel neural nets and genetic algorithms.

Significant progress has been made in the scanning-based reverse engineering of the human brain. It is now fully recognized that the brain comprises many specialized regions, each with its own topology and architecture of interneuronal connections. The massively parallel algorithms are beginning to be understood, and these results have been applied to the design of machine-based neural nets. It is recognized that the human genetic code does not specify the precise interneuronal wiring of any of the regions, but rather sets up a rapid evolutionary process in which connections are established and fight for survival. The standard process for wiring machine-based neural nets uses a similar genetic evolutionary algorithm. [...]

Replies from: steve2152
comment by Steven Byrnes (steve2152) · 2020-05-06T15:25:44.257Z · LW(p) · GW(p)

Thanks, that actually helps a lot; I didn't get that it was from the voice of someone in the future. I still don't see any way to make sense of that as a "prediction", i.e. something that is true but was not fully recognized in 1999.

The closest thing I can think of that would make sense is if he were claiming that the neocortex comprises many specialized regions, each with its own topology and architecture of interneuronal connections (cf zhukeepa's post [LW · GW] a couple days ago). But that's not it. Not only would Kurzweil be unlikely to say "brain" when he meant "neocortex", but I also happen to know that Kurzweil is a strong advocate against the idea that the neocortex comprises many architecturally-different regions. Well, at least he advocated for cortical uniformity in his 2012 book, and when I read that I also got the impression that he had believed the same thing for a long time before that.

I think he put that in and phrased it as a prediction just for narrative flow, while setting up the subsequent sentences ... like if he had written

"It is now fully recognized that every object is fundamentally made out of just a few dozen types of atoms. Therefore, molecular assemblers with the right feedstock can make any object on demand..."

or whatever. The first sentence here is phrased as a prediction but it isn't really.

comment by Stuart_Armstrong · 2020-05-06T15:02:56.396Z · LW(p) · GW(p)

I didn't judge whether it was plausible or trivial; I just took out everything that was formulated as a prediction for the future.

comment by Multicore (KaynanK) · 2020-05-07T15:56:55.903Z · LW(p) · GW(p)

It looks like two of the predictions, that the majority of teacher-student interactions would be remote and that the majority of meetings would be remote, have flipped from false to true between 2019 and 2020, but because of a global pandemic rather than directly from advancements in technology.

Replies from: Vaniver
comment by Vaniver · 2020-05-07T18:40:47.288Z · LW(p) · GW(p)

One of the things I find really hard about tech forecasting is that most of tech adoption is driven by market forces / comparative economics ("is solar cheaper than coal?"), but raw possibility / distance in the tech tree is easier to predict ("could more than half of schools be online?"). For about the last ten years we could have had the majority of meetings and classes online if we wanted to, but we didn't want to--until recently. Similarly, people correctly called that the Internet would enable remote work, in a way that could make 'towns' the winners and 'big cities' the losers--but they incorrectly called that people would prefer remote work to in-person work, and towns to big cities. 

[A similar thing happened to me with music-generation AI; for years I think we've been in a state where people could have taken off-the-shelf method A and done something interesting with it on a huge music dataset, but I think everyone with a huge music dataset cares more about their relationship with music producers than they do about making the next step of algorithmic music.]

Replies from: gwern, Vaniver
comment by gwern · 2020-06-13T01:07:39.844Z · LW(p) · GW(p)

for years I think we've been in a state where people could have taken off-the-shelf method A and done something interesting with it on a huge music dataset

Absolutely. I got decent enough results just tinkering with GPT-2, and OpenAI's Jukebox could have been done at smaller scale years ago, and OA could presumably do a lot better right now if they had a few million to spare (Jukebox has only ~7b parameters, while GPT-3 has 175b, and Jukebox is pretty close to human-level so just another 10x seems like it'd make it an extremely useful tool commercially).

comment by Vaniver · 2022-12-15T20:44:50.096Z · LW(p) · GW(p)

Riffusion is basically exactly what I was thinking of here (in terms of input-output behavior; the internals are more sophisticated than what I was thinking of at the time).

comment by NunoSempere (Radamantis) · 2020-05-13T07:41:57.638Z · LW(p) · GW(p)

Browsing Wikipedia, I found a similar effort: the 1985 book Tools for Thought (available here), though I haven't read it.