When did Eliezer Yudkowsky change his mind about neural networks?

post by [deactivated] (Yarrow Bouchard) · 2023-11-14T21:24:00.000Z · LW · GW · 2 comments

This is a question post.

Contents

  Answers
    11 Michael Tontchev
    6 Thomas Kwa
2 comments

In 2008, Eliezer Yudkowsky was strongly critical of neural networks. From his post "Logical or Connectionist AI? [LW · GW]":

Not to mention that neural networks have also been "failing" (i.e., not yet succeeding) to produce real AI for 30 years now.  I don't think this particular raw fact licenses any conclusions in particular.  But at least don't tell me it's still the new revolutionary idea in AI.

This is the original example I used when I talked about the "Outside the Box" box - people think of "amazing new AI idea" and return their first cache hit, which is "neural networks" due to a successful marketing campaign thirty goddamned years ago.  I mean, not every old idea is bad - but to still be marketing it as the new defiant revolution?  Give me a break.

By contrast, in Yudkowsky's 2023 TED Talk, he said:

Nobody understands how modern AI systems do what they do. They are giant, inscrutable matrices of floating point numbers that we nudge in the direction of better performance until they inexplicably start working. At some point, the companies rushing headlong to scale AI will cough out something that's smarter than humanity. Nobody knows how to calculate when that will happen. My wild guess is that it will happen after zero to two more breakthroughs the size of transformers.

Sometime between 2014 and 2017, I remember reading a discussion in a Facebook group where Yudkowsky expressed skepticism toward neural networks. (Unfortunately, I don't remember what the group was.)

As I recall, he said that while the deep learning revolution was a Bayesian update, he still didn't believe neural networks were the royal road to AGI. I think he said that he leaned more towards GOFAI/symbolic AI (but I remember this less clearly). 

I've combed a bit through Yudkowsky's published writing, but I have a hard time tracking when, how, and why he changed his view on neural networks. Can anyone help me out?


This post exists only for archival purposes.

Answers

answer by Michael Tontchev · 2023-11-15T09:35:05.970Z · LW(p) · GW(p)

Am I the only one reading the first passage as him being critical of the advertising of NNs, rather than of NNs themselves?

comment by titotal (lombertini) · 2023-11-15T12:03:20.128Z · LW(p) · GW(p)

Partially, but it is still true that Eliezer was critical of NNs at the time; see the comment on the post:

I'm no fan of neurons; this may be clearer from other posts.

Replies from: gwern
comment by gwern · 2023-11-15T19:12:43.716Z · LW(p) · GW(p)

Eliezer has never denied that neural nets can work (and he provides examples in that linked post of NNs working). Eliezer's principal objection was that NNs were inscrutable black boxes which would be insanely difficult to make safe enough to entrust humanity-level power to compared to systems designed to be more mathematically tractable from the start. (If I may quip: "The 'I', 'R', & 'S' in the acronym 'DL' stand for 'Interpretable, Reliable, and Safe'.")

This remains true - for all the good work on NN interpretability, assisted by the surprising levels of linearity inside them, NNs remain inscrutable. To quote Neel Nanda the other day (who has overseen quite a lot of the interpretability research that anyone replying to this comment might be tempted to cite):

Oh man, I do AI interpretability research, and we do not know what deep learning neural networks do. An fMRI scan style thing is nowhere near knowing how it works.


What Eliezer (and I, and pretty much every other LWer at the time who spent any time looking at neural nets) got wrong about neural nets, and has admitted as much, is the timing. (Aside from that, Ms Lincoln...)

To expand a bit on the backstory I also discussed in my scaling hypothesis essay: neural nets seemed like they were a colossally long way away. I don't know how to convey how universal a sentiment this was, or how astonishingly unimpressive neural nets were in 2008 when he was writing that. I was really interested in NNs at that time because the basic argument of 'humans are neural nets; therefore, neural nets must work for AGI' is so obviously correct, but even Schmidhuber, hyping his lab's work to the skies, had nothing better to show than 'we can win a contest about some simple handwritten digits'. Oh wow. So amazing, much nets, very win. Truly the AI paradigm of the future... the distant future.

Everyone except Shane Legg was wrong about DL prospects & timing, and even Legg was wrong about important things - if you look at his early writings, he's convinced that DL will take off and reach human-level in the mid-2020s half because of classic Moravec/Kurzweil/Turing-style projections from Moore's law, yes, but also half because he's super enthusiastic about all the wonderful neuroscientific discoveries in the mid-2000s which finally show How The Brain Works (For Real This Time*)™. So DeepMind simply needed to surf the compute wave to snap together all the neuroscience & reinforcement learning modules into something like Agent57, and hey presto - AGI! But most of that DM neuroscience-inspired research is long since forgotten or abandoned, leading-edge DRL archs look nothing like a brain, and the current Transformer architecture owes even less than most to neurobiological inspiration, and it's unclear how much the Transformer arch matters at all compared to simple scale. DeepMind is now (still) on the back foot, and has suffered an ignominious shotgun wedding-merging to Google Brain.

(Google Brain itself is now dissolved as penalty for failing the scaling test. Nor is it the only lab to suffer for failing to call scaling - Microsoft Research is increasingly moribund, and FAIR has apparently suffered major changes too. Maybe that's why LeCun is so shrill on Twitter and adamantly denying that LLMs have any agentic properties whatsoever, never mind that he's the cherry-on-top guy... Moravec? Pretty good, but seems to have downplayed the role of training, overestimated robotics progress, and broadly tended to expect too-early dates. Dario Amodei? A relative late-comer who has published little, and while 'big blob of compute' aged well, other claims don't seem to - for example, in 2013, he seems to think that neural nets will not tend to have any particular goals or if they do, it'll be easy to align them and confine them to simply answering questions and it'll be easy to have neural nets which just are looking things up in databases and doing transparent symbolic-logical manipulations on the data. So that 'tool AI' perspective has not aged well, and makes Anthropic ironic indeed.)

2023 doesn't look like anyone expected until recently. The current timeline is a surprising place.

* Hinton's daughter reading this: "Oh dad - not again!"

Replies from: wassname, Fj_, Ppau
comment by wassname · 2023-11-28T12:44:50.578Z · LW(p) · GW(p)

he seems to think that neural nets will not tend to have any particular goals or if they do, it'll be easy to align them and confine them to simply answering questions and it'll be easy to have neural nets which just are looking things up in databases and doing transparent symbolic-logical manipulations on the data. So that 'tool AI' perspective has not aged well, and makes Anthropic ironic indeed.

I mean, isn't that what we have? It seems to me that, at least relative to what we expected, LLMs have turned out more human-like, more oracle-like than we imagined?

Maybe that will change once we use RL to add planning.

Replies from: gwern
comment by gwern · 2023-11-28T16:34:12.583Z · LW(p) · GW(p)

LLMs have turned out more human-like, more oracle-like than we imagined?

They have turned out far more human-like than Amodei suggested, which means they are not even remotely oracle-like. There is nothing in an LLM which is remotely like 'looking things up in a database and doing transparent symbolic-logical manipulations'. That's about the last thing that describes humans too - it takes decades of training to get us to LARP as an 'oracle', and we still do it badly. Even the stuff LLMs do, like inner-monologue, which seem to be transparent, are actually just more Bayesian meta-RL agentic behavior, where the inner-monologue is a mish-mash of amortized computation and task location where the model is flexibly using the roleplay as hints rather than what everyone seems to think it does, which is turn into a little Turing machine mindlessly executing instructions (hence eg. the ability to distill inner-monologue into the forward pass, or insert errors into few-shot examples or the monologue and still get correct answers).

Replies from: wassname
comment by wassname · 2023-12-29T06:28:32.969Z · LW(p) · GW(p)

I see what you mean, but I meant oracle-like in the sense of my recollection of Nick Bostrom's usage in Superintelligence: that is, an AI that only answers questions and does not act. In some sense, it's how much it's not an agent.

It does seem to me that pretrained LLMs are not very agent-like by default. They are by default currently constrained to question answering, although that's changing fast with things like Toolformer.

Even the stuff LLMs do, like inner-monologue, which seem to be transparent, are actually just more Bayesian meta-RL agentic behavior, where the inner-monologue is a mish-mash of amortized computation and task location where the model is flexibly using the roleplay as hints rather than what everyone seems to think it does, which is turn into a little Turing machine mindlessly executing instructions (hence eg. the ability to distill inner-monologue into the forward pass, or insert errors into few-shot examples or the monologue and still get correct answers).

It kind of sounds like you are saying that they have a lot of agentic capability, but they are hampered by the lack of memory/planning. If your description here is predictive, then it seems there may be a lot of low-hanging agentic behaviour that can be unlocked fairly easily. Like many other things with LLMs, we just need to "ask it properly". Perhaps using some standard RL techniques like world models.

Do you see the properties/danger of LLMs changing once we start using RL to make them into proper agents (not just the few-step chat)?

comment by Fj_ · 2024-01-19T06:39:40.875Z · LW(p) · GW(p)

* Hinton's daughter reading this: "Oh dad - not again!"

I don't know what this was a reference to, but amusingly I just noticed that the video I wanted to link was a 2007 lecture at Google by him (if it's the same Geoffrey Hinton): https://www.youtube.com/watch?v=AyzOUbkUf3M

In it he explained a novel approach to handwriting recognition: stack a bunch of increasingly small layers on top of each other until you have just a few dozen neurons, then an inverted pyramid on top of this bottleneck, and train the network by feeding it a lot of handwritten characters using some sort of modified gradient descent to train it to reproduce the input image in the topmost layer as accurately as possible. After the network is trained, use supervised learning with labeled data to train a usual small NN to interpret/map the bottleneck layer activations to characters. And it worked!

I find it interesting, especially in context of your comment, because:

  • Unsupervised learning meant that you could feed it a LOT of data.
  • So you could force it to learn patterns in the data without overfitting.
  • It appeared uninspired by any natural neuronal organization.
  • It was a precursor to Deep Dream - you could run the network in reverse and see what it imagines when prompted with a specific digit.
  • It actually worked, and basically solved handwriting recognition, as far as I understand.

And so it felt like the first qualitative leap in the technology in decades, and a very impressive one at that, innovating in weird and unexpected ways in several aspects. Sure, it would be another ten years until GPT-2, but some promise was definitely there, I think.
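For anyone who wants to poke at the bottleneck idea described above, here is a minimal sketch, and emphatically not the method from the lecture. It assumes a single linear bottleneck layer instead of a deep stack, plain full-batch gradient descent instead of whatever modified procedure was used, and four made-up 4-pixel "images". The names `train_autoencoder`, `matvec` and `DATA` are invented for illustration.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def train_autoencoder(data, n_hidden, steps=3000, lr=0.02, seed=0):
    """Full-batch gradient descent on squared reconstruction error.

    Illustrative assumptions: linear units and one bottleneck layer.
    Returns (encoder weights, decoder weights, mean loss per step).
    """
    rng = random.Random(seed)
    n_in = len(data[0])
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
    losses = []
    for _ in range(steps):
        g_enc = [[0.0] * n_in for _ in range(n_hidden)]
        g_dec = [[0.0] * n_hidden for _ in range(n_in)]
        total = 0.0
        for x in data:
            h = matvec(enc, x)  # bottleneck code
            y = matvec(dec, h)  # reconstruction of the input
            err = [yi - xi for yi, xi in zip(y, x)]
            total += sum(e * e for e in err)
            # Backpropagate the reconstruction error to the bottleneck.
            dh = [sum(2 * err[i] * dec[i][j] for i in range(n_in))
                  for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    g_dec[i][j] += 2 * err[i] * h[j]
            for j in range(n_hidden):
                for k in range(n_in):
                    g_enc[j][k] += dh[j] * x[k]
        scale = lr / len(data)
        for i in range(n_in):
            for j in range(n_hidden):
                dec[i][j] -= scale * g_dec[i][j]
        for j in range(n_hidden):
            for k in range(n_in):
                enc[j][k] -= scale * g_enc[j][k]
        losses.append(total / len(data))
    return enc, dec, losses

# Toy "images": 4 pixels, but only 2 underlying degrees of freedom.
DATA = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]

enc, dec, losses = train_autoencoder(DATA, n_hidden=2)
# The bottleneck codes are what a later, small supervised classifier would consume.
codes = [matvec(enc, x) for x in DATA]
print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

No labels are used anywhere in the training loop, which is the point of the first bullet above: the network only ever has to reproduce its own input, so unlabeled data is enough. The second stage (a small labeled classifier on top of `codes`) is left out to keep the sketch short.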

Replies from: gwern
comment by gwern · 2024-01-20T01:36:32.982Z · LW(p) · GW(p)

I don't know what this was a reference to

I guess the joke is not as well-known as I thought: https://twitter.com/pmddomingos/status/632685510201241600 (There's a better page of Hinton stories somewhere but I can't immediately refind it.)

comment by Ppau · 2023-11-15T20:32:03.890Z · LW(p) · GW(p)

Those statist AI doomers never miss a chance to bring I, R, and S into everything...

More seriously, thanks for the history lesson!

Replies from: gwern
comment by gwern · 2023-11-15T21:22:11.145Z · LW(p) · GW(p)

Death, taxes, and war, you know - you may not be interested in I, R, or S, but they are interested in you.

answer by Thomas Kwa · 2023-11-14T06:06:24.741Z · LW(p) · GW(p)

My guess is AlphaGo: I once heard someone who worked at MIRI say that they watched the event and Eliezer was surprised by it.

comment by AnthonyC · 2023-11-14T14:30:44.632Z · LW(p) · GW(p)

Yes, I seem to remember him writing about it at the time, too. Not big posts, more public comments and short posts, not sure exactly where.

The invention of transformers circa 2017 would be the next time I remember a similar shift.

comment by Dirichlet-to-Neumann · 2023-11-14T09:27:34.858Z · LW(p) · GW(p)

In retrospect, AlphaZero was really the wake-up call for me, not because it was so strong at chess but because it looked so human playing chess.

2 comments

Comments sorted by top scores.

comment by Unnamed · 2023-11-15T23:43:22.194Z · LW(p) · GW(p)

Here is Yudkowsky (2008), "Artificial Intelligence as a Positive and Negative Factor in Global Risk":

Friendly AI is not a module you can instantly invent at the exact moment when it is first needed, and then bolt on to an existing, polished design which is otherwise completely unchanged.

The field of AI has techniques, such as neural networks and evolutionary programming, which have grown in power with the slow tweaking of decades. But neural networks are opaque—the user has no idea how the neural net is making its decisions—and cannot easily be rendered unopaque; the people who invented and polished neural networks were not thinking about the long-term problems of Friendly AI. Evolutionary programming (EP) is stochastic, and does not precisely preserve the optimization target in the generated code; EP gives you code that does what you ask, most of the time, under the tested circumstances, but the code may also do something else on the side. EP is a powerful, still maturing technique that is intrinsically unsuited to the demands of Friendly AI. Friendly AI, as I have proposed it, requires repeated cycles of recursive self-improvement that precisely preserve a stable optimization target.

The most powerful current AI techniques, as they were developed and then polished and improved over time, have basic incompatibilities with the requirements of Friendly AI as I currently see them. The Y2K problem—which proved very expensive to fix, though not global-catastrophic—analogously arose from failing to foresee tomorrow’s design requirements. The nightmare scenario is that we find ourselves stuck with a catalog of mature, powerful, publicly available AI techniques which combine to yield non-Friendly AI, but which cannot be used to build Friendly AI without redoing the last three decades of AI work from scratch.

comment by Vladimir_Nesov · 2023-11-14T15:10:53.324Z · LW(p) · GW(p)

Another 15 years didn't make the idea any newer. Being critical of invalid perception or presentation of an idea is more specific and different from "being critical" of the idea. Pointing out that the idea doesn't clarify specific confusions [LW · GW] about why some processes work is different [LW(p) · GW(p)] from the idea not referring to machines that make those processes work.

Similarly, forecasting that it won't work in some timeframe is more specific, and there does seem to have been a change of mind on that (as facts on the ground demand). But the linked post [LW · GW] doesn't seem particularly relevant: there don't appear to be claims to that effect there, other than on the level of vibes.