How does OpenAI's language model affect our AI timeline estimates?

post by jimrandomh · 2019-02-15T03:11:51.779Z · score: 51 (16 votes) · LW · GW · 2 comments

This is a question post.



OpenAI recently announced progress in NLP, using a large transformer-based language model to tackle a variety of tasks and breaking performance records in many of them. It also generates synthetic short stories, which are surprisingly good.

How surprising are these results, given past models of how difficult language learning was and how far AI had progressed? Should we be significantly updating our estimates of AI timelines?


answer by orthonormal · 2019-02-15T22:29:03.024Z · score: 17 (6 votes) · LW · GW

It doesn't move much probability mass to the very near term (i.e., 1 year or less), because neither this nor AlphaStar is really doing consequentialist reasoning; they're just able to get surprising performance out of simpler tricks (the very Markovian nature of human writing, a good position evaluation function) given a whole lot of compute.

However, it does shift my probabilities forward in time, in the sense that one new weird trick to do deductive or consequentialist reasoning, plus a lot of compute, might get you there really quickly.

answer by Darmani · 2019-02-15T07:43:00.017Z · score: 10 (3 votes) · LW · GW

Something you learn pretty quickly in academia: don't trust the demos. Systems never work as well when you select the inputs freely (and, if they do, expect thorough proof). So, I wouldn't read too deeply into this yet; we don't know how good it actually is.

comment by james_t · 2019-02-15T15:17:31.764Z · score: 24 (3 votes) · LW · GW

Vis-à-vis selecting inputs freely: OpenAI also included a large dump of unconditioned text generation in their GitHub repo.

comment by Vanessa Kosoy (vanessa-kosoy) · 2019-02-15T17:29:48.433Z · score: 19 (6 votes) · LW · GW

They claim to have beaten records on a range of standard tests (such as the Winograd schema), which is not something you can game by cherry-picking, assuming they are honest about the results.

comment by wizzwizz4 · 2019-10-20T18:32:40.187Z · score: 2 (2 votes) · LW · GW

[link missing] is a nice demonstration of GPT2 that allows you to select the inputs freely.


Comments sorted by top scores.

comment by avturchin · 2019-02-15T09:51:01.373Z · score: 1 (4 votes) · LW · GW

It shortens expected AI timelines, not only because it is such a great achievement, but also because it demonstrates that a large part of human thinking could be just generating plausible continuations of the input text.

comment by Pattern · 2019-02-15T17:26:37.106Z · score: 0 (4 votes) · LW · GW

OpenAI's "safety" move (not releasing the model) reduces the scrutiny it can receive, so its impact on forecasts depends on how good you think the model is without having seen it.