Shulman and Yudkowsky on AI progress 2021-12-03T20:05:22.552Z
Biology-Inspired AGI Timelines: The Trick That Never Works 2021-12-01T22:35:28.379Z
Soares, Tallinn, and Yudkowsky discuss AGI cognition 2021-11-29T19:26:33.232Z
Christiano, Cotra, and Yudkowsky on AI progress 2021-11-25T16:45:32.482Z
Yudkowsky and Christiano discuss "Takeoff Speeds" 2021-11-22T19:35:27.657Z
Ngo and Yudkowsky on AI capability gains 2021-11-18T22:19:05.913Z
Ngo and Yudkowsky on alignment difficulty 2021-11-15T20:31:34.135Z
Discussion with Eliezer Yudkowsky on AGI interventions 2021-11-11T03:01:11.208Z
Self-Integrity and the Drowning Child 2021-10-24T20:57:01.742Z
The Point of Trade 2021-06-22T17:56:44.088Z
I'm from a parallel Earth with much higher coordination: AMA 2021-04-05T22:09:24.033Z
A Semitechnical Introductory Dialogue on Solomonoff Induction 2021-03-04T17:27:35.591Z
Your Cheerful Price 2021-02-13T05:41:53.511Z
Movable Housing for Scalable Cities 2020-05-15T21:21:05.395Z
Coherent decisions imply consistent utilities 2019-05-12T21:33:57.982Z
Should ethicists be inside or outside a profession? 2018-12-12T01:40:13.298Z
Transhumanists Don't Need Special Dispositions 2018-12-07T22:24:17.072Z
Transhumanism as Simplified Humanism 2018-12-05T20:12:13.114Z
Is Clickbait Destroying Our General Intelligence? 2018-11-16T23:06:29.506Z
On Doing the Improbable 2018-10-28T20:09:32.056Z
The Rocket Alignment Problem 2018-10-04T00:38:58.795Z
Toolbox-thinking and Law-thinking 2018-05-31T21:28:19.354Z
Meta-Honesty: Firming Up Honesty Around Its Edge-Cases 2018-05-29T00:59:22.084Z
Challenges to Christiano’s capability amplification proposal 2018-05-19T18:18:55.332Z
Local Validity as a Key to Sanity and Civilization 2018-04-07T04:25:46.134Z
Security Mindset and the Logistic Success Curve 2017-11-26T15:58:23.127Z
Security Mindset and Ordinary Paranoia 2017-11-25T17:53:18.049Z
Hero Licensing 2017-11-21T21:13:36.019Z
Against Shooting Yourself in the Foot 2017-11-16T20:13:35.529Z
Status Regulation and Anxious Underconfidence 2017-11-16T19:35:00.533Z
Against Modest Epistemology 2017-11-14T20:40:52.681Z
Blind Empiricism 2017-11-12T22:07:54.934Z
Living in an Inadequate World 2017-11-09T21:23:25.451Z
Moloch's Toolbox (2/2) 2017-11-07T01:58:37.315Z
Moloch's Toolbox (1/2) 2017-11-04T21:46:32.597Z
An Equilibrium of No Free Energy 2017-10-31T21:27:00.232Z
Frequently Asked Questions for Central Banks Undershooting Their Inflation Target 2017-10-29T23:36:22.256Z
Inadequacy and Modesty 2017-10-28T21:51:01.339Z
AlphaGo Zero and the Foom Debate 2017-10-21T02:18:50.130Z
There's No Fire Alarm for Artificial General Intelligence 2017-10-13T21:38:16.797Z
Catalonia and the Overton Window 2017-10-02T20:23:37.937Z
Can we hybridize Absent-Minded Driver with Death in Damascus? 2016-08-01T21:43:06.000Z
Zombies Redacted 2016-07-02T20:16:33.687Z
Chapter 84: Taboo Tradeoffs, Aftermath 2 2015-03-14T19:00:59.813Z
Chapter 119: Something to Protect: Albus Dumbledore 2015-03-14T19:00:59.687Z
Chapter 32: Interlude: Personal Financial Management 2015-03-14T19:00:59.231Z
Chapter 46: Humanism, Pt 4 2015-03-14T19:00:58.847Z
Chapter 105: The Truth, Pt 2 2015-03-14T19:00:57.357Z
Chapter 19: Delayed Gratification 2015-03-14T19:00:56.265Z
Chapter 99: Roles, Aftermath 2015-03-14T19:00:56.252Z


Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Biology-Inspired AGI Timelines: The Trick That Never Works · 2021-12-03T03:52:18.830Z · LW · GW

A lot of the advantage of human technology is due to human technology figuring out how to use covalent bonds and metallic bonds, where biology sticks to ionic bonds and proteins held together by van der Waals forces (static cling, basically).  This doesn't fit into your paradigm; it's just biology mucking around in a part of the design space easily accessible to mutation error, while humans work in a much more powerful design space because they can move around using abstract cognition.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on "Infohazard" is a predominantly conflict-theoretic concept · 2021-12-03T03:48:44.255Z · LW · GW

Nope.  You're evaluating their strategies using your utility function.  Infohazards occur when individuals or groups create strategies using their own utility functions and then do worse under their own utility functions when knowledge of true facts is added to them.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on "Infohazard" is a predominantly conflict-theoretic concept · 2021-12-03T03:47:31.334Z · LW · GW

The idea of Transfiguring antimatter (assuming it works) is something that collectively harms all wizards if all wizards know it; it's a group infohazard.  The group infohazards seem worth distinguishing from the individual infohazards, but both seem much more worth distinguishing from secrets.  Secrets exist among rational agents; individual and group infohazards only exist among causal decision theorists, humans, and other such weird creatures.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on "Infohazard" is a predominantly conflict-theoretic concept · 2021-12-02T19:41:57.928Z · LW · GW

We already have a word for information that agent A would rather have B not know, because B's knowledge of it benefits B but harms A; that word is 'secret'.

As this is a very common and ordinary state of affairs, we need a larger and more technical word to describe that rarer and more interesting case where B's veridical knowledge of a true fact X harms B, or when a group's collective knowledge of a true fact X harms the group collectively.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Biology-Inspired AGI Timelines: The Trick That Never Works · 2021-12-02T00:04:31.311Z · LW · GW

It does fit well there, but I think it was more inspired by the person I met who thought I was being way too arrogant by not updating in the direction of OpenPhil's timeline estimates to the extent I was uncertain.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Visible Thoughts Project and Bounty Announcement · 2021-12-01T22:42:07.869Z · LW · GW

I initially tried doing post-hoc annotation and found it much more difficult than thinking my own actual thoughts, putting them down, and writing the prompt that resulted.  Most of the work is in writing the thoughts, not the prompts, so adding pregenerated prompts at the expense of making the thoughts more difficult is a loss.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Visible Thoughts Project and Bounty Announcement · 2021-12-01T21:49:40.028Z · LW · GW

<non-binding handwave, ask again and more formally if serious>I'd say we'd pay $2000/each for the first 50, but after that we might also want 5 longer runs to train on in order to have the option of training for longer-range coherence too.  I suppose if somebody has a system to produce only 100-step runs, and nobody offers us 1000-step runs, we'd take what we could get.</non-binding>

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Visible Thoughts Project and Bounty Announcement · 2021-12-01T05:12:28.468Z · LW · GW

My coauthor and I generated the sample run by taking turns on Action, Thought, Prompt.  That is, I wrote an Action, she wrote a Thought, I wrote a Prompt, she wrote an Action, I wrote a Thought, she wrote a Prompt.  This also made it show up immediately when a Thought underspecified a Prompt, because the Thought and Prompt were never written by the same person.

More coherent overall plot is better - that current systems are terrible at this is all the more reason to try to show a dataset of it being done better.  There doesn't necessarily need to be an advance-planned endpoint which gets foreshadowed; that is demanding a bit much of the author when they're dealing with somebody else's Actions or when people are taking turns on the Thoughts.
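The turn-taking scheme described above can be sketched in a few lines of Python (a toy illustration of the scheme, not MIRI's actual tooling; the role and author labels are my own):

```python
# Toy sketch of the turn-taking scheme: items cycle through
# Action, Thought, Prompt, while two authors alternate items.
# Because 2 and 3 are coprime, each author ends up writing every
# role; and since a step's Thought and Prompt are adjacent items,
# they are always written by different authors.

ROLES = ["Action", "Thought", "Prompt"]

def assign_turns(n_steps, authors=("A", "B")):
    """Return a list of (step, role, author) tuples for n_steps steps."""
    items = []
    for k in range(n_steps * len(ROLES)):
        step = k // len(ROLES)
        role = ROLES[k % len(ROLES)]
        author = authors[k % len(authors)]
        items.append((step, role, author))
    return items

# The property the comment relies on: within every step, the Thought
# and the Prompt come from different authors.
turns = assign_turns(4)
for step in range(4):
    thought_author = next(a for s, r, a in turns if s == step and r == "Thought")
    prompt_author = next(a for s, r, a in turns if s == step and r == "Prompt")
    assert thought_author != prompt_author
```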

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Visible Thoughts Project and Bounty Announcement · 2021-11-30T23:57:23.039Z · LW · GW

I state: we'd be happy, nay, ecstatic, to get nice coherent complete shorter runs, thereby disproving my concern that short runs won't be possible to complete, and to pay for them proportionally.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Visible Thoughts Project and Bounty Announcement · 2021-11-30T20:29:02.725Z · LW · GW

We pay out $20,000 per run for the first 10 runs, as quality runs are received, not necessarily all to one group.  If more than one group demonstrates the ability to scale, we might ask more than one group to contribute to the $1M 100-run dataset.  Them cooperating with each other would hardly be a problem.  That said, a lot of the purpose of the 10-run trial is exactly to locate executives or groups that can scale - and maybe be employed by us again, after the prize ends - so everybody getting together to produce the first 10 runs, and then disbanding, in a process that doesn't scale to produce 100 runs, is not quite what we are hoping for here!

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Visible Thoughts Project and Bounty Announcement · 2021-11-30T20:25:14.633Z · LW · GW
  • 1:  I expect that it's easier for authors to write longer thoughtful things that make sense;
  • 2:  MIRI doesn't just target the AI we have, it targets the AI we're afraid we'll get;
  • 3:  Present-day use-cases for dungeons are a long-range problem even if they're currently addressed with short-range technology.

Answer 1:  Longer is easier to write per-step.

Fitting a coherent story with interesting stuff going on into 100 steps is something I expect to be much harder for a human author than fitting that story into 1000 steps.  Novels are famously easier to write on a page-level basis than short stories.

If you take zombies attacking a magical academy for 1000 steps, you might get something that looks like a coherent quest.  If you take zombies attacking a magical academy for 100 steps, I think you get something that looks like a quest that was just getting started when the dataset ran out... unless the author has somehow carefully figured out a plot that will, given unknown user actions, get somewhere interesting within 100 steps, which sounds much harder for the author, I imagine; they can't just pick a premise, run with it, and make stuff up as they go along.  This, indeed, is why I didn't produce a nice complete shorter run to show everyone as an example - because that would have been much harder.

Yes, producing a longer run may take somebody a month or two - though not the same amount of time it should take to produce a carefully crafted novel or short story of the same length.  But I would expect it to be harder and more stressful to ask them to produce 10x the runs that are 1/10 the length.  Especially if we asked authors to produce in medias res fragments taken from the middles or ends of imaginary longer quests not shown, so that the dataset contained windows into the middles and ends of quests, not just beginnings of quests.

I think Answer 1 is the actual dominant consideration in my reasoning.  If I believed it was much easier per data element to ask authors to produce shorter outtakes from imaginary longer quests, I would at least be asking for 5 long runs and 50 short fragments, not 10 long runs, despite answers 2 and 3.

Answer 3:  The real use-case is for long-range coherence.

If this avenue into transparency turns out to go anywhere on a larger strategic scale, it will be because the transparency-inspired tech was useful enough that other developers piled on to it.  This, no offense to the heroic Chris Olah, is one of the major concerns I have about transparency via microscopes - that it doesn't pay off in easy immediate rewards for the usual run of researchers that follow only immediate trails of sweetness in their easily-visible environment.

The present-day use-case for AI dungeons that inspires some user enthusiasm is fundamentally a long-range problem, being addressed with short-range technology, which produces corresponding weirdness.  (In the dataset we're asking for, I baked in an approach that I'm guessing might be helpful: asking the human authors to write long-range notes to themselves, in hopes that an AI can be trained to write long-range notes to itself.)  If this stuff takes off, I'm guessing, it takes off because somebody figured out something that works for the actual use-case of the longer-range coherence challenge.  I don't want to freeze into the dataset the weird limitations of our current technology, and make it be useful only for training dungeons that are weird the same way 2021 dungeons are weird.

If you're a user happy with incoherent dungeon runs, the present-day tech is great for you, but maybe your demand for internal reasoning isn't as strong either.

Answer 2:  It won't be 2021 forever.

MIRI (to some degree) targets the AI we're afraid we'll get, not the AI we have today.  An AI with a modern-short attention span is less worrisome than if somebody gets TransformerXL or axial transformers or whatevs to really start working.  It's longer-range cognition and longer-range thinking that we want to align.  A system that can read through a book is scarier than one which can think about one page.  At least to me, it seems not clear that the key phenomena to be explored will necessarily appear in the page case rather than the book case.  You would also expect scarier systems to have an easier time learning without overnarrowing from 100 big examples instead of 10,000 small examples.  If it turns out nobody can target our dataset today, we can toss it on the table as a challenge and leave it there for longer.  We've been around for 21 years; we can afford to spend at least some of our budget on longer-term planning.  I'm not very much of a gradualist, but I do mostly expect that we see AIs that can read more than a page, and learn from less diverse samples, before the world ends.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Visible Thoughts Project and Bounty Announcement · 2021-11-30T04:23:01.940Z · LW · GW

We're guessing 1000 steps per reasonably-completed run (more or less, doesn't have to be exact) and guessing maybe 300 words per step, mostly 'thought'.  Where 'thoughts' can be relatively stream-of-consciousness once accustomed (we hope) and the dungeon run doesn't have to be Hugo quality in its plotting, so it's not like we're asking for a 300,000-word edited novel.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Ngo and Yudkowsky on alignment difficulty · 2021-11-29T06:06:02.289Z · LW · GW

Singapore probably looks a lot less attractive to threaten if it's allied with another world power that can find and melt arbitrary objects.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Ngo and Yudkowsky on alignment difficulty · 2021-11-28T05:33:01.544Z · LW · GW

"Melt all GPUs" is indeed an unrealistic pivotal act - which is why I talk about it, since like any pivotal act it is outside the Overton Window, and then if any children get indignant about the prospect of doing something other than letting the world end miserably, I get to explain the child-reassuring reasons why you would never do the particular thing of "melt all GPUs" in real life.  In this case, the reassuring reason is that deploying open-air nanomachines to operate over Earth is a huge alignment problem, that is, relatively huger than the least difficult pivotal act I can currently see.

That said, if unreasonably-hypothetically you can give your AI enough of a utility function and have it deploy enough intelligence to create nanomachines that safely move through the open-ended environment of Earth's surface, avoiding bacteria and not damaging any humans or vital infrastructure, in order to surveil all of Earth and find the GPU farms and then melt them all, it's probably not very much harder to tell those nanomachines to melt other things, or demonstrate the credibly threatening ability to do so.

That said, I indeed don't see how we sociologically get into this position in a realistic way, in anything like the current world, even assuming away the alignment problem.  Unless Demis Hassabis suddenly executes an emergency pact with the Singaporean government, or something else I have trouble visualizing?  I don't see any of the current owners or local governments of the big AI labs knowingly going along with any pivotal act executed deliberately (though I expect them to think it's just fine to keep cranking up the dial on an AI until it destroys the world, so long as it looks like it's not being done on purpose).

It is indeed the case that, conditional on the alignment problem being solvable, there's a further sociological problem - which looks a lot less impossible, but which I do not actually know how to solve - wherein you then have to do something pivotal, and there's no grownups in government in charge who would understand why that was something necessary to do.  But it's definitely a lot easier to imagine Demis forming a siloed team or executing an emergency pact with Singapore, than it is to see how you would safely align the AI that does it.  And yes, the difficulty of any pivotal act to stabilize the Earth includes the difficulty of what you had to do, before or after you had sufficiently powerful AGI, in order to execute that act and then prevent things from falling over immediately afterwards.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-27T00:38:20.434Z · LW · GW

Maybe another way of phrasing this - how much warning do you expect to get, how far out does your Nope Vision extend?  Do you expect to be able to say "We're now in the 'for all I know the IMO challenge could be won in 4 years' regime" more than 4 years before it happens, in general?  Would it be fair to ask you again at the end of 2022 and every year thereafter if we've entered the 'for all I know, within 4 years' regime?

Added:  This question fits into a larger concern I have about AI soberskeptics in general (not you, the soberskeptics would not consider you one of their own) where they saunter around saying "X will not occur in the next 5 / 10 / 20 years" and they're often right for the next couple of years, because there's only one year where X shows up for any particular definition of that, and most years are not that year; but also they're saying exactly the same thing up until 2 years before X shows up, if there's any early warning on X at all.  It seems to me that 2 years is about as far as Nope Vision extends in real life, for any case that isn't completely slam-dunk; when I called upon those gathered AI luminaries to say the least impressive thing that definitely couldn't be done in 2 years, and they all fell silent, and then a single one of them named Winograd schemas, they were right that Winograd schemas at the stated level didn't fall within 2 years, but very barely so (they fell the year after).  So part of what I'm flailingly asking here, is whether you think you have reliable and sensitive Nope Vision that extends out beyond 2 years, in general, such that you can go on saying "Not for 4 years" up until we are actually within 6 years of the thing, and then, you think, your Nope Vision will actually flash an alert and you will change your tune, before you are actually within 4 years of the thing.  Or maybe you think you've got Nope Vision extending out 6 years?  10 years?  Or maybe theorem-proving is just a special case and usually your Nope Vision would be limited to 2 years or 3 years?

This is all an extremely Yudkowskian frame on things, of course, so feel free to reframe.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Christiano, Cotra, and Yudkowsky on AI progress · 2021-11-26T09:01:31.822Z · LW · GW

I also think human brains are better than elephant brains at most things - what did I say that sounded otherwise?

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-26T07:32:23.207Z · LW · GW

Okay, then we've got at least one Eliezerverse item, because I've said below that I think I'm at least 16% for IMO theorem-proving by end of 2025.  The drastic difference here causes me to feel nervous, and my second-order estimate has probably shifted some in your direction just from hearing you put 1% on 2024, but that's irrelevant because it's first-order estimates we should be comparing here.

So we've got huge GDP increases for before-End-days signs of Paulverse and quick IMO proving for before-End-days signs of Eliezerverse?  Pretty bare portfolio but it's at least a start in both directions.  If we say 5% instead of 1%, how much further would you extend the time limit out beyond 2024?

I also don't know at all what part of your model forbids theorem-proving to fall in a shocking headline followed by another headline a year later - it doesn't sound like it's from looking at a graph - and I think that explaining reasons behind our predictions in advance, not just making quantitative predictions in advance, will help others a lot here.

EDIT: Though the formal IMO challenge has a barnacle about the AI being open-sourced, which is a separate sociological prediction I'm not taking on.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-26T01:30:10.523Z · LW · GW

I expect it to be hella difficult to pick anything where I'm at 75% that it happens in the next 5 years and Paul is at 25%.  Heck, it's not easy to find things where I'm at over 75% that aren't just obvious slam dunks; the Future isn't that easy to predict.  Let's get up to a nice crawl first, and then maybe a small portfolio of crawlings, before we start trying to make single runs that pierce the sound barrier.

I frame no prediction about whether Paul is under 16%.  That's a separate matter.  I think a little progress is made toward eventual epistemic virtue if you hand me a Metaculus forecast and I'm like "lol wut" and double their probability, even if it turns out that Paul agrees with me about it.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-26T00:20:17.662Z · LW · GW

Ha!  Okay then.  My probability is at least 16%, though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more.  Paul?

EDIT:  I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists.  I'll stand by a >16% probability of the technical capability existing by end of 2025, as reported on eg solving a non-trained/heldout dataset of past IMO problems, conditional on such a dataset being available; I frame no separate sociological prediction about whether somebody is willing to open-source the AI model that does it.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Christiano, Cotra, and Yudkowsky on AI progress · 2021-11-26T00:13:02.571Z · LW · GW

Mostly, I think the Future is not very predictable in some ways, and this extends to, for example, it being possible that 2022 is the year where we start Final Descent and by 2024 it's over, because it so happened that although all the warning signs were Very Obvious In Retrospect they were not obvious in prospect and so stuff just started happening one day.  The places where I dare to extend out small tendrils of prediction are the rare exception to this rule; other times, people go about saying, "Oh, no, it definitely couldn't start in 2022" and then I say "Starting in 2022 would not surprise me" by way of making an antiprediction that contradicts them.  It may sound bold and startling to them, but from my own perspective I'm just expressing my ignorance.  That's one reason why I keep saying, if you think the world more orderly than that, why not opine on it yourself to get the Bayes points for it - why wait for me to ask you?

If you ask me to extend out a rare tendril of guessing, I might guess, for example, that it seems to me that GPT-3's current text prediction-hence-production capabilities are sufficiently good that it seems like somewhere inside GPT-3 must be represented a level of understanding which seems like it should also suffice to, for example, translate Chinese to English or vice-versa in a way that comes out sounding like a native speaker, and being recognized as basically faithful to the original meaning.  We haven't figured out how to train this input-output behavior using loss functions, but gradient descent on stacked layers the size of GPT-3 seems to me like it ought to be able to find that functional behavior in the search space, if we knew how to apply the amounts of compute we've already applied using the right loss functions.

So there's a qualitative guess at a surface capability we might see soon - but when is "soon"?  I don't know; history suggests that even what predictably happens later is extremely hard to time.  There are subpredictions of the Yudkowskian imagery that you could extract from here, including such minor and perhaps-wrong but still suggestive implications like, "170B weights is probably enough for this first amazing translator, rather than it being a matter of somebody deciding to expend 1.7T (non-MoE) weights, once they figure out the underlying setup and how to apply the gradient descent" and "the architecture can potentially look like somebody Stacked More Layers and like it didn't need key architectural changes like Yudkowsky suspects may be needed to go beyond GPT-3 in other ways" and "once things are sufficiently well understood, it will look clear in retrospect that we could've gotten this translation ability in 2020 if we'd spent compute the right way".

It is, alas, nowhere written in this prophecy that we must see even more un-Paul-ish phenomena, like translation capabilities taking a sudden jump without intermediates.  Nothing rules out a long wandering road to the destination of good translation in which people figure out lots of little things before they figure out a big thing, maybe to the point of nobody figuring out until 20 years later the simple trick that would've gotten it done in 2020, a la ReLUs vs sigmoids.  Nor can I say that such a thing will happen in 2022 or 2025, because I don't know how long it takes to figure out how to do what you clearly ought to be able to do.

I invite you to express a different take on machine translation; if it is narrower, more quantitative, more falsifiable, and doesn't achieve this just by narrowing its focus to metrics whose connection to the further real-world consequences is itself unclear, and then it comes true, you don't need to have explicitly bet against me to have gained more virtue points.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Christiano, Cotra, and Yudkowsky on AI progress · 2021-11-25T23:39:41.658Z · LW · GW

If they've found some way to put a lot more compute into GPT-4 without making the model bigger, that's a very different - and unnerving - development.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-25T23:30:19.108Z · LW · GW

(I'm currently slightly hopeful about the theorem-proving thread, elsewhere and upthread.)

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-25T23:27:47.213Z · LW · GW

I have a sense that there's a lot of latent potential for theorem-proving to advance if more energy gets thrown at it, in part because current algorithms seem a bit weird to me - that we are waiting on the equivalent of neural MCTS as an enabler for AlphaGo, not just a bigger investment, though of course the key trick could already have been published in any of a thousand papers I haven't read.  I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024 - though of course, as events like this lie in the Future, they are very hard to predict.

Can you say more about why or whether you would, in this case, say that this was an un-Paulian set of events?  As I have trouble manipulating my Paul model, it does not exclude Paul saying, "Ah, yes, well, they were using 700M models in that paper, so if you jump to 70B, of course the IMO grand challenge could fall; there wasn't a lot of money there."  Though I haven't even glanced at any metrics here, let alone metrics that the IMO grand challenge could be plotted on, so if smooth metrics rule out IMO in 5yrs, I am more interested yet - it legit decrements my belief, but not nearly as much as I imagine it would decrement yours.

(Edit:  Also, on the meta-level, is this, like, anywhere at all near the sort of thing you were hoping to hear from me?  Am I now being a better epistemic citizen, if maybe not a good one by your lights?)

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-25T22:55:28.391Z · LW · GW

I kind of want to see you fight this out with Gwern (not least for social reasons, so that people would perhaps see that it wasn't just me, if it wasn't just me).

But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life."  We want to know when humans land on the moon, not whether their brain sizes continued on a smooth trend extrapolated over the last million years.

I think there's a very real sense in which, yes, what we're interested in are milestones, and often milestones that aren't easy to define even after the fact.  GPT-2 was shocking, and then GPT-3 carried that shock further in that direction, but how do you talk about that with somebody who thinks that perplexity loss is smooth?  I can handwave statements like "GPT-3 started to be useful without retraining via just prompt engineering" but qualitative statements like those aren't good for betting, and it's much much harder to come up with the right milestone like that in advance, instead of looking back in your rearview mirror afterwards.

But you say - I think? - that you were less shocked by this sort of thing than I am.  So, I mean, can you prophesy to us about milestones and headlines in the next five years?  I think I kept thinking this during our dialogue, but never saying it, because it seemed like such an unfair demand to make!  But it's also part of the whole point that AGI and superintelligence and the world ending are all qualitative milestones like that.  Whereas such trend points as Moravec was readily able to forecast correctly - like 10 teraops / plausibly-human-equivalent-computation being available in a $10 million supercomputer around 2010 - are really entirely unanchored from AGI, at least relative to our current knowledge about AGI.  (They would be anchored if we'd seen other planets go through this, but we haven't.)
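The kind of trend point Moravec could forecast is easy to formalize as an exponential price-performance extrapolation; here's a minimal sketch (the baseline numbers below are illustrative assumptions of mine, not Moravec's actual figures):

```python
import math

def years_until_affordable(target_ops, budget_dollars,
                           ops_per_dollar_now, doubling_time_years):
    """Years until `budget_dollars` buys `target_ops` of compute,
    assuming price-performance doubles every `doubling_time_years`."""
    ops_now = ops_per_dollar_now * budget_dollars
    if ops_now >= target_ops:
        return 0.0
    # Solve target_ops = ops_now * 2**(t / doubling_time) for t.
    return doubling_time_years * math.log2(target_ops / ops_now)

# Illustrative: if a $10M budget buys ~0.1 teraops at the baseline
# year and price-performance doubles every ~1.5 years, then
# 10 teraops arrives in ~10 years -- the shape of the Moravec-style
# forecast, with made-up baseline numbers.
t = years_until_affordable(10e12, 10e6, 1e4, 1.5)
```

The point of the sketch is that such extrapolations need only a trend line and a budget; nothing in them is anchored to what the resulting compute can actually do.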

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Christiano, Cotra, and Yudkowsky on AI progress · 2021-11-25T22:45:42.059Z · LW · GW

I don't necessarily expect GPT-4 to do better on perplexity than would be predicted by a linear model fit to neuron count plus algorithmic progress over time; my guess for why they're not scaling it bigger would be that Stack More Layers just basically stopped scaling in real output quality at the GPT-3 level.  They can afford to scale up an OOM to 1.75 trillion weights, easily, given their funding, so if they're not doing that, an obvious guess is that it's because they're not getting a big win from that.  As for their ability to then make algorithmic progress, depends on how good their researchers are, I expect; most algorithmic tricks you try in ML won't work, but maybe they've got enough people trying things to find some?  But it's hard to outpace a field that way without supergeniuses, and the modern world has forgotten how to rear those.
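The "linear model fit to neuron count plus algorithmic progress" baseline mentioned above can be sketched as an ordinary least-squares fit of log-loss against log-parameter-count and calendar year (the data and coefficients below are synthetic, purely to show the functional form being assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical functional form: log(loss) = a*log(params) + b*year + c.
# We generate synthetic data from known coefficients to show that this
# baseline is just a plane fit in (log params, year) space.
a_true, b_true, c_true = -0.07, -0.02, 42.0
log_params = rng.uniform(18, 27, size=50)   # log of ~1e8 .. ~5e11 params
years = rng.uniform(2018, 2022, size=50)
log_loss = a_true * log_params + b_true * years + c_true

X = np.column_stack([log_params, years, np.ones_like(years)])
coef, *_ = np.linalg.lstsq(X, log_loss, rcond=None)

def predicted_log_loss(params, year):
    """Baseline prediction for a model of `params` weights in `year`."""
    return coef[0] * np.log(params) + coef[1] * year + coef[2]
```

A model "doing better than predicted" on this baseline just means its measured log-loss falls below the fitted plane.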

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Christiano, Cotra, and Yudkowsky on AI progress · 2021-11-25T22:37:31.386Z · LW · GW

My memory of the past is not great in general, but considering that I bet sums of my own money and advised others to do so, I am surprised that my memory here would be that bad, if it was.

Neither GJO nor Metaculus is restricted only to past superforecasters, as I understand it; and my recollection is that superforecasters in particular, not all participants at GJO or Metaculus, were saying in the range of 20%.  Here's an example of one such, which I have a potentially false memory of having maybe read at the time:

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Christiano, Cotra, and Yudkowsky on AI progress · 2021-11-25T21:14:57.682Z · LW · GW

Somebody tries to measure the human brain using instruments that can only detect numbers of neurons and energy expenditure, but not detect any difference of how the fine circuitry is wired; and concludes the human brain is remarkable only in its size and not in its algorithms.  You see the problem here?  The failure of large dinosaurs to quickly scale is a measuring instrument that detects how their algorithms scaled with more compute (namely: poorly), while measuring the number of neurons in a human brain tells you nothing about that at all.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-25T21:10:20.259Z · LW · GW

I find it valuable to know what impressions other people had themselves; it only becomes tone-policing when you worry loudly about what impressions other people 'might' have.  (If one is worried about how it looks to say so publicly, one could always just DM me (though I might not respond).)

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Christiano, Cotra, and Yudkowsky on AI progress · 2021-11-25T20:40:24.735Z · LW · GW

Furthermore, 2/3 doom is straightforwardly the wrong thing to infer from the 1:1 betting odds, even taking those at face value and even before taking interest rates into account; Bryan gave me $100, which gets returned as $200 later.
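To spell out the arithmetic as a sketch, using only the bet terms as stated above ($100 paid now, $200 returned later if the world survives; interest rates ignored):

```python
# Eliezer's side of the bet: receive $100 up front, return $200 later
# if the world has not ended.
received_now = 100
returned_later = 200

win = received_now                    # net gain on doom: keep the $100
loss = returned_later - received_now  # net loss on survival: $100

# Break-even doom probability p solves p*win = (1-p)*loss.
p_implied = loss / (win + loss)
print(p_implied)  # 0.5 -- i.e. 1:1 odds, not 2/3

# The mistaken 2/3 comes from misreading $100-vs-$200 as 2:1 stakes:
p_mistaken = returned_later / (received_now + returned_later)
print(round(p_mistaken, 3))  # 0.667
```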

(I do consider this a noteworthy example of 'People seem systematically to make the mistake in the direction that interprets Eliezer's stuff as more weird and extreme' because it's a clear arithmetical error and because I saw a recorded transcript of it apparently passing the notice of several people I considered usually epistemically strong.)

(Though it's also easier than people expect to just not notice things; I didn't realize at the time that Ajeya was talking about a misinterpretation of the implied odds from the Caplan bet, and thought she was just guessing my own odds at 2/3, and I didn't want to argue about that because I don't think it valuable to the world or maybe even to myself to go about arguing those exact numbers.)

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Christiano, Cotra, and Yudkowsky on AI progress · 2021-11-25T18:18:27.534Z · LW · GW

I feel like the biggest subjective thing is that I don't feel like there is a "core of generality" that GPT-3 is missing

I just expect it to gracefully glide up to a human-level foom-ing intelligence

This is a place where I suspect we have a large difference of underlying models.  What sort of surface-level capabilities do you, Paul, predict that we might get (or should not get) in the next 5 years from Stack More Layers?  Particularly if you have an answer to anything that sounds like it's in the style of Gwern's questions, because I think those are the things that actually matter and which are hard to predict from trendlines and which ought to depend on somebody's model of "what kind of generality makes it into GPT-3's successors".

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-25T18:15:24.781Z · LW · GW

The crazy part is someone spending $1B and then generating $100B/year in revenue (much less $100M and then taking over the world).

Would you say that this is a good description of Suddenly Hominids but you don't expect that to happen again, or that this is a bad description of hominids?

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-25T18:11:54.658Z · LW · GW

Thanks for continuing to try on this!  Without having spent a lot of labor myself on looking into self-driving cars, I think my sheer impression would be that we'll get $1B/yr waifutech before we get AI freedom-of-the-road; though I do note again that current self-driving tech would be more than sufficient for $10B/yr revenue if people built new cities around the AI tech level, so I worry a bit about some restricted use-case of self-driving tech that is basically possible with current tech finding some less regulated niche worth a trivial $10B/yr.  I also remark that I wouldn't be surprised to hear that waifutech is already past $1B/yr in China, but I haven't looked into things there.  I don't expect the waifutech to transcend my own standards for mediocrity, but something has to be pretty good before I call it more than mediocre; do you think there's particular things that waifutech won't be able to do?

My model permits large jumps in ML translation adoption; it is much less clear about whether anyone will be able to build a market moat and charge big prices for it.  Do you have a similar intuition about # of users increasing gradually, not just revenue increasing gradually?

I think we're still at the level of just drawing images about the future, so that anybody who came back in 5 years could try to figure out who sounded right, at all, rather than assembling a decent portfolio of bets; but I also think that just having images versus no images is a lot of progress.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-25T16:43:19.889Z · LW · GW

Once you can buy a self-driving car, the thing that Paul predicts with surety and that I shrug about has already happened. If it does happen, my model says very little about remaining timeline from there one way or another. It shrugs again and says, "Guess that's how difficult the AI problem and regulatory problem were."

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-25T05:19:36.051Z · LW · GW

I think you are underconfident about the fact that almost all AI profits will come from areas that had almost-as-much profit in recent years. So we could bet about where AI profits are in the near term, or try to generalize this.

I wouldn't be especially surprised by waifutechnology or machine translation jumping to newly accessible domains (the thing I care about and you shrug about (until the world ends)), but is that likely to exhibit a visible economic discontinuity in profits (which you care about and I shrug about (until the world ends))?  There's apparently already mass-scale deployment of waifutech in China to forlorn male teenagers, so maybe you'll say the profits were already precedented.  Google offers machine translation now, even though they don't make much obvious measurable profit on that, but maybe you'll want to say that however much Google spends on that, they must rationally anticipate at least that much added revenue.  Or perhaps you want to say that "almost all AI profits" will come from robotics over the same period.  Or maybe I misunderstand your viewpoint, and if you said something concrete about the stuff you care about, I would manage to disagree with that; or maybe you think that waifutech suddenly getting much more charming with the next generation of text transformers is something you already know enough to rule out; or maybe you think that 2024's waifutech should definitely be able to do some observable surface-level thing it can't do now.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-25T05:10:12.307Z · LW · GW

And to say it also explicitly, I think this is part of why I have trouble betting with Paul.  I have a lot of ? marks on the questions that the Gwern voice is asking above, regarding them as potentially important breaks from trend that just get dumped into my generalized inbox one day.  If a gradualist thinks that there ought to be a smooth graph of perplexity with respect to computing power spent, in the future, that's something I don't care very much about except insofar as it relates in any known way whatsoever to questions like those the Gwern voice is asking.  What does it even mean to be a gradualist about any of the important questions like those of the Gwern-voice, when they don't relate in known ways to the trend lines that are smooth?  Isn't this sort of a shell game where our surface capabilities do weird jumpy things, we can point to some trend lines that were nonetheless smooth, and then the shells are swapped and we're told to expect gradualist AGI surface stuff?  This is part of the idea that I'm referring to when I say that, even as the world ends, maybe there'll be a bunch of smooth trendlines underneath it that somebody could look back and point out.  (Which you could in fact have used to predict all the key jumpy surface thresholds, if you'd watched it all happen on a few other planets and had any idea of where jumpy surface events were located on the smooth trendlines - but we haven't watched it happen on other planets so the trends don't tell us much we want to know.)
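As a toy illustration of the "shell game" point - a perfectly smooth underlying trendline coexisting with jumpy surface capabilities - consider a smooth power-law loss curve where each downstream capability only switches on once loss crosses some unknown threshold. All numbers here are invented:

```python
import numpy as np

# A smooth power-law loss trend over compute (arbitrary units).
compute = np.logspace(0, 6, 200)
loss = 5.0 * compute ** -0.1

# Hypothetical capability thresholds: each capability appears only
# once loss drops below its (unknown-in-advance) threshold.
thresholds = [3.0, 2.2, 1.6]
capabilities = np.array([(loss < t).astype(int) for t in thresholds]).sum(axis=0)

# The loss declines smoothly at every step...
assert np.all(np.diff(loss) < 0)
# ...but the count of surface capabilities jumps discretely from 0 to 3.
print(capabilities.min(), capabilities.max())  # 0 3
```

The smooth curve tells you nothing about where the jumps land unless you independently know the thresholds - which, per the point about other planets, we don't.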

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-25T05:03:44.557Z · LW · GW

I predict that people will explicitly collect much larger datasets of human behavior as the economic stakes rise. This is in contrast to e.g. theorem-proving working well, although I think that theorem-proving may end up being an important bellwether because it allows you to assess the capabilities of large models without multi-billion-dollar investments in training infrastructure.

Well, it sounds like I might be more bullish than you on theorem-proving, possibly.  Not on it being useful or profitable, but in terms of underlying technology making progress on non-profitable amazing demo feats, maybe I'm more bullish on theorem-proving than you are?  Is there anything you think it shouldn't be able to do in the next 5 years?

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-25T00:31:17.228Z · LW · GW

Well put / endorsed / +1.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-25T00:16:48.043Z · LW · GW

I feel a bit confused about where you think we meta-disagree here, meta-policy-wise.  If you have a thesis about the sort of things I'm liable to disagree with you about, because you think you're more familiar with the facts on the ground, can't you write up Paul's View of the Next Five Years and then if I disagree with it better yet, but if not, you still get to be right and collect Bayes points for the Next Five Years?

I mean, it feels to me like this should be a case similar to where, for example, I think I know more about macroeconomics than your typical EA; so if I wanted to expend the time/stamina points, I could say a bunch of things I consider obvious and that contradict hot takes on Twitter and many EAs would go "whoa wait really" and then I could collect Bayes points later and have performed a public service, even if nobody showed up to disagree with me about that.  (The reason I don't actually do this... is that I tried; I keep trying to write a book about basic macro, only it's the correct version explained correctly, and have a bunch of isolated chapters and unfinished drafts.)  I'm also trying to write up my version of The Next Five Years assuming the world starts to end in 2025, since this is not excluded by my model; but writing in long-form requires stamina and I've been tired of late which is part of why I've been having Discord conversations instead.

I think you think there's a particular thing I said which implies that the ball should be in my court to already know a topic where I make a different prediction from what you do, and so I should be able to state my own prediction about that topic and bet with you about that; or, alternatively, that I should retract some thing I said recently which implies that.  And so, you shouldn't need to have to do all the work to write up your forecasts generally, and it's unfair that I'm trying to make you do all that work.  Check?  If so, I don't yet see the derivation chain on this meta-level point.

I think the Hansonian viewpoint - which I consider another gradualist viewpoint, and whose effects were influential on early EA and I think are still lingering around there - seemed surprised by AlphaGo and AlphaZero, when you contrast what it said in advance with what actually happened.  Inevitably, you can go back afterwards and claim it wasn't really a surprise in terms of the abstractions that seem so clear and obvious now, but I think it was surprised then; and I also think that "there's always a smooth abstraction in hindsight, so what, there'll be one of those when the world ends too" is a huge deal in practice with respect to the future being unpredictable.  From this, you seem to derive that I should already know what to bet with you about, and are annoyed by how I'm playing coy; because if I don't bet with you right now, I should retract the statement that I think gradualists were surprised; but I'm not following the sequitur there.

Or maybe I'm just entirely misinterpreting the flow of your thoughts here.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-24T23:31:49.083Z · LW · GW

I wish to acknowledge this frustration, and state generally that I think Paul Christiano occupies a distinct and more clueful class than a lot of, like, early EAs who mm-hmmmed along with Robin Hanson on AI - I wouldn't put, eg, Dario Amodei in that class either, though we disagree about other things.

But again, Paul, it's not enough to say that you weren't surprised by GPT-2/3 in retrospect, it kinda is important to say it in advance, ideally where other people can see?  Dario picks up some credit for GPT-2/3 because he clearly called it in advance.  You don't need to find exact disagreements with me to start going on the record as a forecaster, if you think the course of the future is generally narrower than my own guesses - if you think that trends stay on course, where I shrug and say that they might stay on course or break.  (Except that of course in hindsight somebody will always be able to draw a straight-line graph, once they know which graph to draw, so my statement "it might stay on trend or maybe break" applies only to graphs extrapolating into what is currently the future.)

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-24T23:25:14.327Z · LW · GW

I do wish to note that we spent a fair amount of time on Discord trying to nail down what earlier points we might disagree on, before the world started to end, and these Discord logs should be going up later.

From my perspective, the basic problem is that Eliezer's story looks a lot like "business as usual until the world starts to end sharply", and Paul's story looks like "things continue smoothly until their smooth growth ends the world smoothly", and both of us have heard of superforecasting and both of us are liable to predict near-term initial segments by extrapolating straight lines while those are available.  Another basic problem, as I'd see it, is that we tend to tell stories about very different subject matters - I care a lot less than Paul about the quantitative monetary amount invested into Intel, to the point of not really trying to develop expertise about that.

I claim that I came off better than Robin Hanson in our FOOM debate compared to the way that history went.  I'd claim that my early judgments of the probable importance of AGI, at all, stood up generally better than early non-Yudkowskian EA talking about that.  Other people I've noticed ever making correct bold predictions in this field include Demis Hassabis, for predicting that deep learning would work at all, and then for predicting that he could take the field of Go and taking it; and Dario Amodei, for predicting that brute-forcing stacking more layers would be able to produce GPT-2 and GPT-3 instead of just asymptoting and petering out.  I think Paul doesn't need to bet against me to start producing a track record like this; I think he can already start to accumulate reputation by saying what he thinks is bold and predictable about the next 5 years; and if it overlaps "things that interest Eliezer" enough for me to disagree with some of it, better yet.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-24T23:07:01.277Z · LW · GW

That was a pretty good Eliezer model; for a second I was trying to remember if and where I'd said that.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-24T23:05:37.969Z · LW · GW

Apology accepted.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-24T23:04:58.769Z · LW · GW

The "weirdly uncharitable" part is saying that it "seemed like" I hadn't read it vs. asking.  Uncertainty is one thing, leaping to the wrong guess another.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2021-11-23T23:34:27.863Z · LW · GW

I read "Takeoff Speeds" at the time.  I did not liveblog my reaction to it at the time.  I've read the first two other items.

I flag your weirdly uncharitable inference.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Ngo and Yudkowsky on alignment difficulty · 2021-11-23T17:37:42.720Z · LW · GW

My reply to your distinction between 'consequentialists' and 'outcome pumps' would be, "Please forget entirely about any such thing as a 'consequentialist' as you defined it; I would now like to talk entirely about powerful outcome pumps.  All understanding begins there, and we should only introduce the notion of how outcomes are pumped later in the game.  Understand the work before understanding the engines; nearly every key concept here is implicit in the notion of work rather than in the notion of a particular kind of engine."

(Modulo that lots of times people here are like "Well but a human at a particular intelligence level in a particular complicated circumstance once did this kind of work without the thing happening that it sounds like you say happens with powerful outcome pumps"; and then you have to look at the human engine and its circumstances to understand why outcome pumping could specialize down to that exact place and fashion, which will not be reduplicated in more general outcome pumps that have their dice re-rolled.)

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Ngo and Yudkowsky on AI capability gains · 2021-11-20T20:45:22.059Z · LW · GW

I think some of your confusion may be that you're putting "probability theory" and "Newtonian gravity" into the same bucket.  You've been raised to believe that powerful theories ought to meet certain standards, like successful bold advance experimental predictions, such as the one Newtonian gravity made about the existence of Neptune (quite a while after the theory was first put forth, though).  "Probability theory" also sounds like a powerful theory, and the people around you believe it, so you think you ought to be able to produce a powerful advance prediction it made; but it is for some reason hard to come up with an example like the discovery of Neptune, so you cast about a bit and think of the central limit theorem.  That theorem is widely used and praised, so it's "powerful", and it wasn't invented before probability theory, so it's "advance", right?  So we can go on putting probability theory in the same bucket as Newtonian gravity?

They're actually just very different kinds of ideas, ontologically speaking, and the standards to which we hold them are properly different ones.  It seems like the sort of thing that would take a subsequence I don't have time to write, expanding beyond the underlying obvious ontological difference between validities and empirical-truths, to cover the way in which "How do we trust this, when" differs between "I have the following new empirical theory about the underlying model of gravity" and "I think that the logical notion of 'arithmetic' is a good tool to use to organize our current understanding of this little-observed phenomenon, and it appears within making the following empirical predictions..."  But at least step one could be saying, "Wait, do these two kinds of ideas actually go into the same bucket at all?"

In particular it seems to me that you want properly to be asking "How do we know this empirical thing ends up looking like it's close to the abstraction?" and not "Can you show me that this abstraction is a very powerful one?"  Like, imagine that instead of asking Newton about planetary movements and how we know that the particular bits of calculus he used were empirically true about the planets in particular, you instead started asking Newton for proof that calculus is a very powerful piece of mathematics worthy to predict the planets themselves - but in a way where you wanted to see some highly valuable material object that calculus had produced, like earlier praiseworthy achievements in alchemy.  I think this would reflect confusion and a wrongly directed inquiry; you would have lost sight of the particular reasoning steps that made ontological sense, in the course of trying to figure out whether calculus was praiseworthy under the standards of praiseworthiness that you'd been previously raised to believe in as universal standards about all ideas.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Ngo and Yudkowsky on alignment difficulty · 2021-11-19T22:38:38.875Z · LW · GW

To Rob's reply, I'll add that my own first reaction to your question was that it seems like a map-territory / perspective issue as appears in eg thermodynamics?  Like, this has a similar flavor to asking "What does it mean to say that a classical system is in a state of high entropy when it actually only has one particular system state?"  Adding this now in case I don't have time to expand on it later; maybe just saying that much will help at all, possibly.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on Discussion with Eliezer Yudkowsky on AGI interventions · 2021-11-19T06:40:54.737Z · LW · GW

are you interested in Redwood's research because it might plausibly generate alignment issues and problems that are analogous to the real problem within the safer regime and technology we have now?

It potentially sheds light on small subpieces of things that are particular aspects that contribute to the Real Problem, like "What actually went into the nonviolence predicate instead of just nonviolence?"  Much of the Real Meta-Problem is that you do not get things analogous to the full Real Problem until you are just about ready to die.

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on A positive case for how we might succeed at prosaic AI alignment · 2021-11-19T06:35:51.626Z · LW · GW

All the really basic concerns—e.g. it tries to get more compute so it can simulate better—can be solved by having a robust Cartesian boundary and having an agent that optimizes an objective defined on actions through the boundary

I'm confused from several directions here.  What is a "robust" Cartesian boundary, why do you think this stops an agent from trying to get more compute, and when you postulate "an agent that optimizes an objective" are you imagining something much more like an old chess-playing system with a known objective than a modern ML system with a loss function?

Comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) on A positive case for how we might succeed at prosaic AI alignment · 2021-11-19T06:31:34.937Z · LW · GW

Closer, yeah.  In the limit of doing insanely complicated things with Bob you will start to break him even if he is faithfully simulated, you will be doing things that would break the actual Bob; but I think HCH schemes fail long before they get to that point.