The Rocket Alignment Problem

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2018-10-04T00:38:58.795Z · LW · GW · 41 comments

The following is a fictional dialogue building off of AI Alignment: Why It’s Hard, and Where to Start.


(Somewhere in a not-very-near neighboring world, where science took a very different course…)


ALFONSO:  Hello, Beth. I’ve noticed a lot of speculations lately about “spaceplanes” being used to attack cities, or possibly becoming infused with malevolent spirits that inhabit the celestial realms so that they turn on their own engineers.

I’m rather skeptical of these speculations. Indeed, I’m a bit skeptical that airplanes will be able to even rise as high as stratospheric weather balloons anytime in the next century. But I understand that your institute wants to address the potential problem of malevolent or dangerous spaceplanes, and that you think this is an important present-day cause.

BETH:  That’s… really not how we at the Mathematics of Intentional Rocketry Institute would phrase things.

The problem of malevolent celestial spirits is what all the news articles are focusing on, but we think the real problem is something entirely different. We’re worried that there’s a difficult, theoretically challenging problem which modern-day rocket punditry is mostly overlooking. We’re worried that if you aim a rocket at where the Moon is in the sky, and press the launch button, the rocket may not actually end up at the Moon.

ALFONSO:  I understand that it’s very important to design fins that can stabilize a spaceplane’s flight in heavy winds. That’s important spaceplane safety research and someone needs to do it.

But if you were working on that sort of safety research, I’d expect you to be collaborating tightly with modern airplane engineers to test out your fin designs, to demonstrate that they are actually useful.

BETH:  Aerodynamic designs are important features of any safe rocket, and we’re quite glad that rocket scientists are working on these problems and taking safety seriously. That’s not the sort of problem that we at MIRI focus on, though.

ALFONSO:  What’s the concern, then? Do you fear that spaceplanes may be developed by ill-intentioned people?

BETH:  That’s not the failure mode we’re worried about right now. We’re more worried that right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination. Whether Google or the US Government or North Korea is the one to launch the rocket won’t make a pragmatic difference to the probability of a successful Moon landing from our perspective, because right now nobody knows how to aim any kind of rocket anywhere.

ALFONSO:  I’m not sure I understand.

BETH:  We’re worried that even if you aim a rocket at the Moon, such that the nose of the rocket is clearly lined up with the Moon in the sky, the rocket won’t go to the Moon. We’re not sure what a realistic path from the Earth to the moon looks like, but we suspect it might not be a very straight path, and it may not involve pointing the nose of the rocket at the moon at all. We think the most important thing to do next is to advance our understanding of rocket trajectories until we have a better, deeper understanding of what we’ve started calling the “rocket alignment problem”. There are other safety problems, but this rocket alignment problem will probably take the most total time to work on, so it’s the most urgent.

ALFONSO:  Hmm, that sounds like a bold claim to me. Do you have a reason to think that there are invisible barriers between here and the moon that the spaceplane might hit? Are you saying that it might get very very windy between here and the moon, more so than on Earth? Both eventualities could be worth preparing for, I suppose, but neither seem likely.

BETH:  We don’t think it’s particularly likely that there are invisible barriers, no. And we don’t think it’s going to be especially windy in the celestial reaches — quite the opposite, in fact. The problem is just that we don’t yet know how to plot any trajectory that a vehicle could realistically take to get from Earth to the moon.

ALFONSO:  Of course we can’t plot an actual trajectory; wind and weather are too unpredictable. But your claim still seems too strong to me. Just aim the spaceplane at the moon, go up, and have the pilot adjust as necessary. Why wouldn’t that work? Can you prove that a spaceplane aimed at the moon won’t go there?

BETH:  We don’t think we can prove anything of that sort, no. Part of the problem is that realistic calculations are extremely hard to do in this area, after you take into account all the atmospheric friction and the movements of other celestial bodies and such. We’ve been trying to solve some drastically simplified problems in this area, on the order of assuming that there is no atmosphere and that all rockets move in perfectly straight lines. Even those unrealistic calculations strongly suggest that, in the much more complicated real world, just pointing your rocket’s nose at the Moon also won’t make your rocket end up at the Moon. I mean, the fact that the real world is more complicated doesn’t exactly make it any easier to get to the Moon.

ALFONSO:  Okay, let me take a look at this “understanding” work you say you’re doing…

Huh. Based on what I’ve read about the math you’re trying to do, I can’t say I understand what it has to do with the Moon. Shouldn’t helping spaceplane pilots exactly target the Moon involve looking through lunar telescopes and studying exactly what the Moon looks like, so that the spaceplane pilots can identify particular features of the landscape to land on?

BETH:  We think our present stage of understanding is much too crude for a detailed Moon map to be our next research target. We haven’t yet advanced to the point of targeting one crater or another for our landing. We can’t target anything at this point. It’s more along the lines of “figure out how to talk mathematically about curved rocket trajectories, instead of rockets that move in straight lines”. Not even realistically curved trajectories, right now, we’re just trying to get past straight lines at all –

ALFONSO:  But planes on Earth move in curved lines all the time, because the Earth itself is curved. It seems reasonable to expect that future spaceplanes will also have the capability to move in curved lines. If your worry is that spaceplanes will only move in straight lines and miss the Moon, and you want to advise rocket engineers to build rockets that move in curved lines, well, that doesn’t seem to me like a great use of anyone’s time.

BETH:  You’re trying to draw much too direct of a line between the math we’re working on right now, and actual rocket designs that might exist in the future. It’s not that current rocket ideas are almost right, and we just need to solve one or two more problems to make them work. The conceptual distance that separates anyone from solving the rocket alignment problem is much greater than that.

Right now everyone is confused about rocket trajectories, and we’re trying to become less confused. That’s what we need to do next, not run out and advise rocket engineers to build their rockets the way that our current math papers are talking about. Not until we stop being confused about extremely basic questions like why the Earth doesn’t fall into the Sun.

ALFONSO:  I don’t think the Earth is going to collide with the Sun anytime soon. The Sun has been steadily circling the Earth for a long time now.

BETH:  I’m not saying that our goal is to address the risk of the Earth falling into the Sun. What I’m trying to say is that if humanity’s present knowledge can’t answer questions like “Why doesn’t the Earth fall into the Sun?” then we don’t know very much about celestial mechanics and we won’t be able to aim a rocket through the celestial reaches in a way that lands softly on the Moon.

As an example of work we’re presently doing that’s aimed at improving our understanding, there’s what we call the “tiling positions” problem. The tiling positions problem is how to fire a cannonball from a cannon in such a way that the cannonball circumnavigates the earth over and over again, “tiling” its initial coordinates like repeating tiles on a tessellated floor –

ALFONSO:  I read a little bit about your work on that topic. I have to say, it’s hard for me to see what firing things from cannons has to do with getting to the Moon. Frankly, it sounds an awful lot like Good Old-Fashioned Space Travel, which everyone knows doesn’t work. Maybe Jules Verne thought it was possible to travel around the earth by firing capsules out of cannons, but the modern study of high-altitude planes has completely abandoned the notion of firing things out of cannons. The fact that you go around talking about firing things out of cannons suggests to me that you haven’t kept up with all the innovations in airplane design over the last century, and that your spaceplane designs will be completely unrealistic.

BETH:  We know that rockets will not actually be fired out of cannons. We really, really know that. We’re intimately familiar with the reasons why nothing fired out of a modern cannon is ever going to reach escape velocity. I’ve previously written several sequences of articles in which I describe why cannon-based space travel doesn’t work.

ALFONSO:  But your current work is all about firing something out a cannon in such a way that it circles the earth over and over. What could that have to do with any realistic advice that you could give to a spaceplane pilot about how to travel to the Moon?

BETH:  Again, you’re trying to draw much too straight a line between the math we’re doing right now, and direct advice to future rocket engineers.

We think that if we could find an angle and firing speed such that an ideal cannon, firing an ideal cannonball at that speed, on a perfectly spherical Earth with no atmosphere, would lead to that cannonball entering what we would call a “stable orbit” without hitting the ground, then… we might have understood something really fundamental and important about celestial mechanics.

Or maybe not! It’s hard to know in advance which questions are important and which research avenues will pan out. All you can do is figure out the next tractable-looking problem that confuses you, and try to come up with a solution, and hope that you’ll be less confused after that.

ALFONSO:  You’re talking about the cannonball hitting the ground as a problem, and how you want to avoid that and just have the cannonball keep going forever, right? But real spaceplanes aren’t going to be aimed at the ground in the first place, and lots of regular airplanes manage to not hit the ground. It seems to me that this “being fired out of a cannon and hitting the ground” scenario that you’re trying to avoid in this “tiling positions problem” of yours just isn’t a failure mode that real spaceplane designers would need to worry about.

BETH:  We are not worried about real rockets being fired out of cannons and hitting the ground. That is not why we’re working on the tiling positions problem. In a way, you’re being far too optimistic about how much of rocket alignment theory is already solved! We’re not so close to understanding how to aim rockets that the kind of designs people are talking about now would work if only we solved a particular set of remaining difficulties like not firing the rocket into the ground. You need to go more meta on understanding the kind of progress we’re trying to make.

We’re working on the tiling positions problem because we think that being able to fire a cannonball at a certain instantaneous velocity such that it enters a stable orbit… is the sort of problem that somebody who could really actually launch a rocket through space and have it move in a particular curve that really actually ended with softly landing on the Moon would be able to solve easily. So the fact that we can’t solve it is alarming. If we can figure out how to solve this much simpler, much more crisply stated “tiling positions problem” with imaginary cannonballs on a perfectly spherical earth with no atmosphere, which is a lot easier to analyze than a Moon launch, we might thereby take one more incremental step towards eventually becoming the sort of people who could plot out a Moon launch.

ALFONSO:  If you don’t think that Jules-Verne-style space cannons are the wave of the future, I don’t understand why you keep talking about cannons in particular.

BETH:  Because there’s a lot of sophisticated mathematical machinery already developed for aiming cannons. People have been aiming cannons and plotting cannonball trajectories since the sixteenth century. We can take advantage of that existing mathematics to say exactly how, if we fired an ideal cannonball in a certain direction, it would plow into the ground. If we try talking about rockets with realistically varying acceleration, we can’t even manage to prove that a rocket like that won’t travel around the Earth in a perfect square, because with all that realistically varying acceleration and realistic air friction it’s impossible to make any sort of definite statement one way or another. Our present understanding isn’t up to it.
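
(A brief gloss for readers in our own world, where Newton’s laws and calculus are known: the “tiling positions problem” Beth describes is recognizable as Newton’s cannonball. The sketch below uses our world’s values for surface gravity and Earth’s radius, numbers the dialogue’s characters of course don’t have, to show the idealized calculation they are groping toward.)

```python
import math

# Idealized "tiling positions" setup: a perfectly spherical, airless Earth and a
# cannonball fired horizontally at the surface. For the ball to circle the Earth
# and "tile" its starting coordinates forever, gravity must supply exactly the
# needed centripetal acceleration: v**2 / R = g.

g = 9.81       # surface gravity in m/s^2 (our-world value, assumed)
R = 6.371e6    # Earth's radius in m (our-world value, assumed)

v_orbit = math.sqrt(g * R)           # required firing speed for a grazing circular orbit
period = 2 * math.pi * R / v_orbit   # time for the cannonball to come back around once

print(f"required firing speed: {v_orbit / 1000:.1f} km/s")    # about 7.9 km/s
print(f"time to circle the Earth: {period / 60:.0f} minutes") # about 84 minutes
```

(In the dialogue’s world, nobody can yet write down v**2 / R = g; that gap is the point of the analogy.)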

ALFONSO:  Okay, another question in the same vein. Why is MIRI sponsoring work on adding up lots of tiny vectors? I don’t even see what that has to do with rockets in the first place; it seems like this weird side problem in abstract math.

BETH:  It’s more like… at several points in our investigation so far, we’ve run into the problem of going from a function about time-varying accelerations to a function about time-varying positions. We kept running into this problem as a blocking point in our math, in several places, so we branched off and started trying to analyze it explicitly. Since it’s about the pure mathematics of points that don’t move in discrete intervals, we call it the “logical undiscreteness” problem. Some of the ways of investigating this problem involve trying to add up lots of tiny, varying vectors to get a big vector. Then we talk about how that sum seems to change more and more slowly, approaching a limit, as the vectors get tinier and tinier and we add up more and more of them… or at least that’s one avenue of approach.
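
(Another our-world gloss: the “adding up lots of tiny, varying vectors” Beth describes is, in our terms, a Riemann sum converging to an integral. A minimal sketch, assuming a made-up acceleration profile a(t) = 2t over one unit of time, for which calculus gives an exact velocity change of 1.0:)

```python
# Sum tiny "velocity vectors" a(t) * dt for a(t) = 2 * t over the interval [0, 1].
# As the slices get tinier and we add up more of them, the sum changes more and
# more slowly and approaches a limit (the exact answer, 1.0).

def summed_velocity_change(n_steps):
    dt = 1.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        t = i * dt
        total += 2 * t * dt  # one tiny vector: acceleration times a tiny slice of time
    return total

for n in (10, 100, 1000, 10000):
    print(n, summed_velocity_change(n))
# prints roughly 0.9, 0.99, 0.999, 0.9999 -- creeping toward the limit 1.0
```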

ALFONSO:  I just find it hard to imagine people in future spaceplane rockets staring out their viewports and going, “Oh, no, we don’t have tiny enough vectors with which to correct our course! If only there was some way of adding up even more vectors that are even smaller!” I’d expect future calculating machines to do a pretty good job of that already.

BETH:  Again, you’re trying to draw much too straight a line between the work we’re doing now, and the implications for future rocket designs. It’s not like we think a rocket design will almost work, but the pilot won’t be able to add up lots of tiny vectors fast enough, so we just need a faster algorithm and then the rocket will get to the Moon. This is foundational mathematical work that we think might play a role in multiple basic concepts for understanding celestial trajectories. When we try to plot out a trajectory that goes all the way to a soft landing on a moving Moon, we feel confused and blocked. We think part of the confusion comes from not being able to go from acceleration functions to position functions, so we’re trying to resolve our confusion.

ALFONSO:  This sounds suspiciously like a philosophy-of-mathematics problem, and I don’t think that it’s possible to progress on spaceplane design by doing philosophical research. The field of philosophy is a stagnant quagmire. Some philosophers still believe that going to the moon is impossible; they say that the celestial plane is fundamentally separate from the earthly plane and therefore inaccessible, which is clearly silly. Spaceplane design is an engineering problem, and progress will be made by engineers.

BETH:  I agree that rocket design will be carried out by engineers rather than philosophers. I also share some of your frustration with philosophy in general. For that reason, we stick to well-defined mathematical questions that are likely to have actual answers, such as questions about how to fire a cannonball on a perfectly spherical planet with no atmosphere such that it winds up in a stable orbit.

This often requires developing new mathematical frameworks. For example, in the case of the logical undiscreteness problem, we’re developing methods for translating between time-varying accelerations and time-varying positions. You can call the development of new mathematical frameworks “philosophical” if you’d like — but if you do, remember that it’s a very different kind of philosophy than the “speculate about the heavenly and earthly planes” sort, and that we’re always pushing to develop new mathematical frameworks or tools.

ALFONSO:  So from the perspective of the public good, what’s a good thing that might happen if you solved this logical undiscreteness problem?

BETH:  Mainly, we’d be less confused and our research wouldn’t be blocked and humanity could actually land on the Moon someday. To try and make it more concrete – though it’s hard to do that without actually knowing the concrete solution – we might be able to talk about incrementally more realistic rocket trajectories, because our mathematics would no longer break down as soon as we stopped assuming that rockets moved in straight lines. Our math would be able to talk about exact curves, instead of a series of straight lines that approximate the curve.

ALFONSO:  An exact curve that a rocket follows? This gets me into the main problem I have with your project in general. I just don’t believe that any future rocket design will be the sort of thing that can be analyzed with absolute, perfect precision so that you can get the rocket to the Moon based on an absolutely plotted trajectory with no need to steer. That seems to me like a bunch of mathematicians who have no clue how things work in the real world, wanting everything to be perfectly calculated. Look at the way Venus moves in the sky; usually it travels in one direction, but sometimes it goes retrograde in the other direction. We’ll just have to steer as we go.

BETH:  That’s not what I meant by talking about exact curves… Look, even if we can invent logical undiscreteness, I agree that it’s futile to try to predict, in advance, the precise trajectories of all of the winds that will strike a rocket on its way off the ground. Though I’ll mention parenthetically that things might actually become calmer and easier to predict, once a rocket gets sufficiently high up –

ALFONSO:  Why?

BETH:  Let’s just leave that aside for now, since we both agree that rocket positions are hard to predict exactly during the atmospheric part of the trajectory, due to winds and such. And yes, if you can’t exactly predict the initial trajectory, you can’t exactly predict the later trajectory. So, indeed, the proposal is definitely not to have a rocket design so perfect that you can fire it at exactly the right angle and then walk away without the pilot doing any further steering. The point of doing rocket math isn’t that you want to predict the rocket’s exact position at every microsecond, in advance.

ALFONSO:  Then why obsess over pure math that’s too simple to describe the rich, complicated real universe where sometimes it rains?

BETH:  It’s true that a real rocket isn’t a simple equation on a board. It’s true that there are all sorts of aspects of a real rocket’s shape and internal plumbing that aren’t going to have a mathematically compact characterization. What MIRI is doing isn’t the right degree of mathematization for all rocket engineers for all time; it’s the mathematics for us to be using right now (or so we hope).

To build up the field’s understanding incrementally, we need to talk about ideas whose consequences can be pinpointed precisely enough that people can analyze scenarios in a shared framework. We need enough precision that someone can say, “I think in scenario X, design Y does Z”, and someone else can say, “No, in scenario X, Y actually does W”, and the first person responds, “Darn, you’re right. Well, is there some way to change Y so that it would do Z?”

If you try to make things realistically complicated at this stage of research, all you’re left with is verbal fantasies. When we try to talk to someone with an enormous flowchart of all the gears and steering rudders they think should go into a rocket design, and we try to explain why a rocket pointed at the Moon doesn’t necessarily end up at the Moon, they just reply, “Oh, my rocket won’t do that.” Their ideas have enough vagueness and flex and underspecification that they’ve achieved the safety of nobody being able to prove to them that they’re wrong. It’s impossible to incrementally build up a body of collective knowledge that way.

The goal is to start building up a library of tools and ideas we can use to discuss trajectories formally. Some of the key tools for formalizing and analyzing intuitively plausible-seeming trajectories haven’t yet been expressed using math, and we can live with that for now. We still try to find ways to represent the key ideas in mathematically crisp ways whenever we can. That’s not because math is so neat or so prestigious; it’s part of an ongoing project to have arguments about rocketry that go beyond “Does not!” vs. “Does so!”

ALFONSO:  I still get the impression that you’re reaching for the warm, comforting blanket of mathematical reassurance in a realm where mathematical reassurance doesn’t apply. We can’t obtain a mathematical certainty of our spaceplanes being absolutely sure to reach the Moon with nothing going wrong. That being the case, there’s no point in trying to pretend that we can use mathematics to get absolute guarantees about spaceplanes.

BETH:  Trust me, I am not going to feel “reassured” about rocketry no matter what math MIRI comes up with. But, yes, of course you can’t obtain a mathematical assurance of any physical proposition, nor assign probability 1 to any empirical statement.

ALFONSO:  Yet you talk about proving theorems – proving that a cannonball will go in circles around the earth indefinitely, for example.

BETH:  Proving a theorem about a rocket’s trajectory won’t ever let us feel comfortingly certain about where the rocket is actually going to end up. But if you can prove a theorem which says that your rocket would go to the Moon if it launched in a perfect vacuum, maybe you can attach some steering jets to the rocket and then have it actually go to the Moon in real life. Not with 100% probability, but with probability greater than zero.

The point of our work isn’t to take current ideas about rocket aiming from a 99% probability of success to a 100% chance of success. It’s to get past an approximately 0% chance of success, which is where we are now.

ALFONSO:  Zero percent?!

BETH:  Modulo Cromwell’s Rule, yes, zero percent. If you point a rocket’s nose at the Moon and launch it, it does not go to the Moon.

ALFONSO:  I don’t think future spaceplane engineers will actually be that silly, if direct Moon-aiming isn’t a method that works. They’ll lead the Moon’s current motion in the sky, and aim at the part of the sky where the Moon will appear on the day the spaceplane is a Moon’s distance away. I’m a bit worried that you’ve been talking about this problem so long without considering such an obvious idea.

BETH:  We considered that idea very early on, and we’re pretty sure that it still doesn’t get us to the Moon.

ALFONSO:  What if I add steering fins so that the rocket moves in a more curved trajectory? Can you prove that no version of that class of rocket designs will go to the Moon, no matter what I try?

BETH:  Can you sketch out the trajectory that you think your rocket will follow?

ALFONSO:  It goes from the Earth to the Moon.

BETH:  In a bit more detail, maybe?

ALFONSO:  No, because in the real world there are always variable wind speeds, we don’t have infinite fuel, and our spaceplanes don’t move in perfectly straight lines.

BETH:  Can you sketch out a trajectory that you think a simplified version of your rocket will follow, so we can examine the assumptions your idea requires?

ALFONSO:  I just don’t believe in the general methodology you’re proposing for spaceplane designs. We’ll put on some steering fins, turn the wheel as we go, and keep the Moon in our viewports. If we’re off course, we’ll steer back.

BETH:  … We’re actually a bit concerned that standard steering fins may stop working once the rocket gets high enough, so you won’t actually find yourself able to correct course by much once you’re in the celestial reaches – like, if you’re already on a good course, you can correct it, but if you screwed up, you won’t just be able to turn around like you could turn around an airplane –

ALFONSO:  Why not?

BETH:  We can go into that topic too; but even given a simplified model of a rocket that you could steer, a walkthrough of the steps along the path that simplified rocket would take to the Moon would be an important step in moving this discussion forward. Celestial rocketry is a domain that we expect to be unusually difficult – even compared to building rockets on Earth, which is already a famously hard problem because they usually just explode. It’s not that everything has to be neat and mathematical. But the overall difficulty is such that, in a proposal like “lead the moon in the sky,” if the core ideas don’t have a certain amount of solidity about them, it would be equivalent to firing your rocket randomly into the void.

If it feels like you don’t know for sure whether your idea works, but that it might work; if your idea has many plausible-sounding elements, and to you it feels like nobody has been able to convincingly explain to you how it would fail; then, in real life, that proposal has a roughly 0% chance of steering a rocket to the Moon.

If it seems like an idea is extremely solid and clearly well-understood, if it feels like this proposal should definitely take a rocket to the Moon without fail in good conditions, then maybe under the best-case conditions we should assign an 85% subjective credence in success, or something in that vicinity.

ALFONSO:  So uncertainty automatically means failure? This is starting to sound a bit paranoid, honestly.

BETH:  The idea I’m trying to communicate is something along the lines of, “If you can reason rigorously about why a rocket should definitely work in principle, it might work in real life, but if you have anything less than that, then it definitely won’t work in real life.”

I’m not asking you to give me an absolute mathematical proof of empirical success. I’m asking you to give me something more like a sketch for how a simplified version of your rocket could move, that’s sufficiently determined in its meaning that you can’t just come back and say “Oh, I didn’t mean that” every time someone tries to figure out what it actually does or pinpoint a failure mode.

This isn’t an unreasonable demand that I’m imposing to make it impossible for any ideas to pass my filters. It’s the primary bar all of us have to pass to contribute to collective progress in this field. And a rocket design which can’t even pass that conceptual bar has roughly a 0% chance of landing softly on the Moon.

41 comments


comment by orthonormal · 2018-10-05T18:56:26.687Z · LW(p) · GW(p)

I expect that some otherwise convinceable readers are not going to realize that in this fictional world, people haven't discovered Newton's physics or calculus, and those readers are therefore going to miss the analogy of "this is how MIRI would talk about the situation if they didn't already know the fundamental concepts but had reasons for searching in the right direction". (I'm not thinking of readers incapable of handling that counterfactual, but of readers who aren't great at inferring implicit background facts from a written dialogue. Such readers might get very confused at the unexpected turns of the dialogue and quit rather than figure out what they're baffled by.)

I'd suggest adding to the preamble something like "In a weird world where people had figured out workable aeronautics and basic rocket propulsion by trial and error, but hadn't figured out Newton's laws or calculus".

Replies from: RobbBB, Jayson_Virissimo
comment by Rob Bensinger (RobbBB) · 2018-10-08T23:34:39.256Z · LW(p) · GW(p)

The opening's updated now to try to better hint at this, with: "Somewhere in a not-very-near neighboring world, where science took a very different course…"

comment by Jayson_Virissimo · 2018-10-05T22:37:17.658Z · LW(p) · GW(p)

That would break the rule of "show, don't tell" of fiction writing, but working that into the story more explicitly would help, I agree.

comment by Ben Pace (Benito) · 2018-10-08T23:07:54.803Z · LW(p) · GW(p)

I've curated this post. Here's the reasons that were salient to me:

  • The post clearly communicated (to me) some subtle intuitions about what useful research looks like when you're deeply confused about basic principles.
    • The question of "How would I invent calculus if I didn't know what calculus was and I was trying to build a plane", and the search for very specific, basic problems that cut to the heart of what we'd be confused about in that situation (like "how to fire a cannonball such that it forever orbits the earth"), point to a particular type of thinking that I can see being applied to understanding intelligence, and potentially coming up with incredibly valuable insights.
    • I had previously been much more confused about why folks around here have been asking many questions about logical uncertainty and such, and this makes it much clearer to me.
  • As usual, well-written dialogues like this are very easy and fun to read (which for me trades off massively with the length of the post).

Quick thoughts on further work that could be useful:

The post gives me a category of 'is missing some fundamental math like calculus' that applies to rocket alignment and AI alignment. I would be interested in some examples of how to look at a problem and write down a simple story [LW · GW] of how it works, including

  • some positive and negative examples - situations where it works, situations where it doesn't
  • what steps in particular seem to fail on the latter
  • what signs say that the math should exist (e.g. some things are just complicated with few simple models predicting them - I expect a bunch of microbiology looks more like this than, say, physics).

I would also be interested to read historical case studies of this thing happening and what work led to progress here - Newton, Shannon, others.

comment by Ben Pace (Benito) · 2018-10-08T23:11:38.945Z · LW(p) · GW(p)

Meta: Eliezer previously made a quick attempt to summarise similar points in this SSC comment 2 years ago.

comment by Isnasene · 2020-01-12T23:23:59.038Z · LW(p) · GW(p)

[Disclaimer: I'm reading this post for the first time now, as of 1/11/2020. I also already have a broad understanding of the importance of AI safety. While I am skeptical about MIRI's approach to things, I am also a fan of MIRI. Where this puts me relative to the target demographic of this post, I cannot say.]

Overall Summary

I think this post is pretty good. It's a solid and well-written introduction to some of the intuitions behind AI alignment and the fundamental research that MIRI does. At the same time, the use of analogy made the post more difficult for me to parse and hid some important considerations about AI alignment from view. Though it may be good (but not optimal) for introducing some people to the problem of AI alignment and a subset of MIRI's work, it did not raise or lower my opinion of MIRI as someone who already understood AGI safety to be important.

To be clear, I do not consider any of these weaknesses serious because I believe them to be partially irrelevant to the audience of people who don't appreciate the importance of AI-Safety. Still, they are relevant to the audience of people who give AI-Safety the appropriate scrutiny but remain skeptical of MIRI. And I think this latter audience is important enough to assign this article a "pretty good" instead of a "great".

I hope a future post directly explores the merits of MIRI's work in the context of AI alignment without the use of analogy.

Below is an overview of my likes and dislikes in this post. I will go into more detail about them in the next section, "Evaluating Analogies."

Things I liked:

  • It's a solid introduction to AI-alignment, covering a broad range of topics including:
    • Why we shouldn't expect aligned AGI by default
    • How modern conversation about AGI behavior is problematically underspecified
    • Why fundamental deconfusion research is necessary for solving AI-alignment
  • It directly explains the value/motivation of particular pieces of MIRI work via analogy -- which is especially nice given that it's hard for the layman to actually appreciate the mathematically complex stuff MIRI is doing
  • On the whole, the analogy is elegant

Things I disliked:

  • Analogizing AI alignment to rocket alignment created a framing that hid important aspects of AI alignment from view and (unintentionally) stacked the deck in favor of MIRI.
    • A criticism of rocket alignment research with a plausible AI alignment analog was neglected (and could only be addressed by breaking the analogy).
    • An argument in favor of MIRI for rocket alignment had an AI analog that was much less convincing when considered in the context of facts unique to AI alignment.
  • Mapping the rocket alignment problem to the AI alignment problem took more cognitive effort than just directly reading justifications of AI alignment and MIRI would have
  • The world-building wasn't great
    • The actual world of the dialogue is counterintuitive -- imagine a situation where planes and rockets exist (or don't exist, but are being theorized about), but no one knows calculus (despite modeling cannonballs pretty well) or how centripetal force+gravity works. It's hard for me to parse the exact epistemic meaning of any given statement relative to the world
    • The world-building wasn't particularly clear -- it took me a while to completely parse that calculus hadn't been invented.
  • There are a lot of asides where Beth (a stand-in for a member of MIRI) makes nontrivial scientific claims that we know to be true. While this is technically justified (MIRI does math and is unlikely to make claims that are wrong; and Eliezer has been right about a lot of stuff and does deserve credit), it probably just feels smug and irritating to people who are MIRI-skeptics, aka this post's probable target.

Evaluating Analogies

Since this post is intended as an analogy to AI alignment, evaluating its insights requires two steps. First, one must re-interpret the post in the context of AI alignment. Second, one must take that re-interpretation and see whether it holds up. This means that if I criticize the content of this post, my criticism might be directly in error or my interpretation could be in error.

1. The Alignment Problem Analogy:

Overall, I think the analogy between the Rocket Alignment Problem and the AI Alignment Problem is pretty good. Structurally speaking, they're identical and I can convert one to the other by swapping words around:

Rocket Alignment: "We know the conditions rockets fly under on Earth but, as we make our rockets fly higher and higher, we have reasons to expect those conditions to break down. Things like wind and weather conditions will stop being relevant and other weird conditions (like whatever keeps the Earth moving around the sun) will take hold! If we don't understand those, we'll never get to the moon!"

AI Alignment: "We know the conditions that modern AI performs under right now, but as we make our AI solve more and more complex problems, we have reason to expect those conditions to break down. Things like model overfitting and sample-size limitations will stop being relevant and other weird conditions (like noticing problems so subtle and possible decisions so clever that you as a human can't reason about them) will take hold! If we don't understand those, we'll never make an AI that does what we want!"

1a. Flaws In the Alignment Problem Analogy:

While the alignment problem analogy is pretty good, it leaves out the key and fundamentally important fact that failed AI Alignment will end the world. While it's often not a big deal when an analogy isn't completely accurate, missing this fact leaves MIRI-skeptics with a pretty strong counter-argument that can only exist outside of the analogy:

In Rocket Alignment terms -- "Why bother thinking about all this stuff now? If conditions are different in space, we'll learn about that when we start launching things into space and seeing what happens to them. This sounds more efficient than worrying about cannonballs."

In AI Alignment terms -- "Why bother thinking about all this stuff now? If conditions are different when AI start getting clever, we'll learn about those differences once we start making actual AI that are clever enough to behave like agents. This sounds more efficient than navel-gazing about mathematical constructs."

If you explore this counter-argument and its counter-counter-argument deeper, the conversation gets pretty interesting:

MIRI-Skeptic: Fine, okay. The analogy breaks down there. We can't empirically study a superintelligent AI safely. But we can make AI that are slightly smarter than us and put security mechanisms around them that only AI extremely smarter than us would be expected to break. Then we can learn experimentally from the behavior of those AI about how to make clever AI safe. Again, easier than navel-gazing about mathematical constructs, and we might expect this to happen because of slow take-off.

MIRI-Defender: First of all, there's no theoretical reason we would expect to be able to extrapolate the behavior of slightly clever AI to the behavior of extremely clever AI. Second, we have empirical reasons for thinking your empirical approach won't work. We already did a test-run of your experiment proposal with a slightly clever being; we put Eliezer Yudkowsky in an inescapable box armed with only a communication tool and the guard let him out (twice!).

MIRI-Skeptic: Fair enough but... [Author's Note: There are further replies to MIRI-Defender but this is a dialogue for another day]

Given that this post is supposed to address MIRI skeptics and that the aforementioned conversation is extremely relevant to judging the benefits of MIRI, I consider the inability to address this argument to be a flaw -- despite it being an understandable flaw in the context of the analogy used.

2. The Understanding Intractably Complicated Things with Simple Things Analogy:

I think that this is a cool insight (with parallels to inverse-inverse problems [LW · GW]) and the above post captures it very well. Explicitly, the analogy is this: "Rocket Alignment is to Cannonballs as AI Alignment is to tiling agents." Structurally speaking, they're identical and I can convert one to the other by swapping words around:

Rocket Modeling: "We can't think about rocket trajectories using actual real rockets under actual real conditions because there are so many factors and complications that can affect them. But, per the rocket alignment problem, we need to understand the weird conditions that rockets need to deal with when they're really high up and these conditions should apply to a lot of things that are way simpler than rockets. So instead of dealing with the incredibly hard problem of modeling rockets, let's try really simple problems using other high-up fast-moving objects like cannonballs."

AI Alignment: "We can't think about AI behavior using actual AI under actual real conditions because there are so many factors and complications that can affect them. But, per the AI alignment problem, we need to understand the weird conditions that AI need to deal with when they're extremely intelligent and these conditions should apply to a lot of things that are way simpler than modern AI. So instead of dealing with the incredibly hard problem of modeling AI, let's try really simple problems using other intelligent decision-making things like tiling agents."

3. The "We Need Better Mathematics to Know What We're Talking About" Analogy

I really like just how perfect this analogy is. The way that AI "trajectory" and literal physical rocket trajectory line-up feels nice.

Rocket Alignment: "There's a lot of trouble figuring out exactly where a rocket will go at any given moment as it's going higher and higher. We need calculus to make claims about this."

AI alignment: "There's a lot of trouble figuring out exactly what an AI will do at any given moment as it gets smarter and smarter (ie self-modification but also just in general). We need to understand how to model logical uncertainty to even say anything about its decisions."

4. The "Mathematics Won't Give Us Accurate Models But It Will Give Us the Ability to Talk Intelligently" Analogy

This analogy basically works...

Rocket Alignment: "We can't use math to accurately predict rockets in real life, but we need some of it so we can even reason about what rockets might do. Also, we expect our math to get more accurate when the rockets get higher up."

AI alignment: "We can't use math to accurately predict AGI in real life, but we need some of it so we can even reason about what AGI might do. Also, we expect our math to get more accurate when the AGI gets way smarter."

I also enjoy the way this discussion lightly captures the frustration that the AI Safety community has felt. Many skeptics have claimed their AGIs won't become misaligned but never specify the details of why that wouldn't happen. And when AI Safety proponents produce situations where the AGI does become misaligned, the skeptics move the goalposts.

4a. Flaws in the "Mathematics Won't Give Us Accurate Models But It Will Give Us the Ability to Talk Intelligently" Analogy

On a cursory glance, the above analogy seems to make sense. But, again, this analogy breaks down on the object level. I'd expect being able to talk precisely about what conditions affect movement in space to help us make better claims about how a rocket would go to the moon, because getting to the moon is just moving in space in a particular way. The research (if successful) completes the set of knowledge needed to reach the goal.

But being able to talk precisely about the trajectory of an AGI doesn't really help us talk precisely about getting to the "destination" of friendly AGI for a couple reasons:

  • For rocket trajectories, there are clear control parameters that can be used to exploit the predictions made by a good understanding of how trajectories work. But for AI alignment, I'm not sure what would constitute a control parameter that would exploit a hypothetical good understanding of what strategies superintelligent beings use to make decisions.
  • For rocket trajectories, the knowledge set of how to get a rocket into a point in outer-space and how to predict the trajectories of objects in outer-space basically encompasses the things one would need to know to get that rocket to the moon. For AGI trajectories, the trajectories depend on three things: its decision theory (a la logical uncertainty, tiling agents, decision theory...), the actual state of the world that the AGI perceives (which is fundamentally unknowable to us humans, since the AGI will be much more perceptive than us), and its goals (which are well-known to be orthogonal to the AGI's actual strategy algorithms).
  • Given the above, we know scenarios where we understand agent foundations but not the goals of our agents won't work. But, if we do figure out the goals of our agents, it's not obvious that controlling those superintelligent agents' rationality skills will be a good use of our time. After all, they'll come up with better strategies than we would.
    • Like I guess you could argue that we can view our goals as the initial conditions and then use our agent foundations to reason about the AGI behavior given those goals and decide if we like its choices... But again, the AGI is more perceptive than us. I'm not sure if we could capably design toy circumstances for an AGI to behave under that would reflect the circumstances of reality in a meaningful way
    • Also, to be fair, MIRI does work on goal-oriented stuff in addition to agent-oriented stuff. Corrigibility, which the post later links to, is an example of this. But, frankly, my expectation that this kind of thing will pan out is pretty low.

In principle, the rocket alignment analogy could've been written in a way that captured the above concerns. For instance, instead of asking the question "How do we get this rocket to the moon when we don't understand how things move in outer-space?", we could ask "How do we get this rocket to the moon when we don't understand how things move in outer-space, we have a high amount of uncertainty about what exactly is up there in outer-space, and we don't have specifics about what exactly the moon is?"

But that would make this a much different, and much more epistemologically labyrinthian post.

Minor Comments

1. I appreciate the analogizing of an awesome thing (landing on the moon) to another awesome thing (making a friendly AGI). The AI safety community is quite rationally focused mostly on how bad a misaligned AI would be but I always enjoy spending some time thinking about the positives.

2. I noticed that Alfonso keeps using the term "spaceplanes" and Beth never does. I might be reading into it, but my understanding is that this is done to capture how deeply frustrating it is when people talk about the thing you're studying (AGI) like it's something superficially similar but fundamentally different (modern machine-learning but like, with better data).

However, coming into this dialogue without any background on the world involved, the apparent interchangeability of spaceplane and rocket just felt confusing.

3.

As an example of work we’re presently doing that’s aimed at improving our understanding, there’s what we call the “tiling positions” problem. The tiling positions problem is how to fire a cannonball from a cannon in such a way that the cannonball circumnavigates the earth over and over again, “tiling” its initial coordinates like repeating tiles on a tessellated floor –

Because of the deliberate choice to analogize tiling agents and tiling positions, I spent probably five minutes trying to figure out exactly what the relationship between tiling positions and rocket alignment meant about tiling agents and AI alignment. It seems to me tiling isn't clearly necessary in the former (understanding any kind of trajectory should do the job) while it is in the latter (understanding how AI can guarantee similar behavior in agents it creates seems fundamentally important).

My impression now is that this was just a conceptual pun on the idea of tiling. I appreciate that, but I'm not sure it's good for this post. The reason I thought so hard about this was also because the Logical Undiscreteness/Logical Uncertainty analogy seemed deeper.


comment by lexande · 2018-10-11T00:30:22.826Z · LW(p) · GW(p)

Next time I see somebody say "shoot for the moon so if you miss you'll land among the stars" I'm going to link them here.

comment by An_Amazing_Login · 2018-10-14T19:16:02.665Z · LW(p) · GW(p)

Ouch.

This post is very frightening to me in that I had not viscerally understood just how far off we are from solving FAI. Are we really (probably) at the stage of "before solving the problem, invent calculus"? Is that hyperbole? Or worse? The post also gives me hope that the problem of FAI is solvable (hopefully tractably!). When I was first shown the problems that arise from just trying something, it felt like no one had any attack whatsoever. Now I think I understand better that no one really knows anything, but they do have some sort of an attack on it. I'd feel pretty safe if one could formalise a solution in math without paradox.

This might constitute an artist's complaint, but since I believe your goal is to effectively persuade/argue/show, I think the post suffered from stewing on the exact same material all the way through. Maybe contrast the Beth/Alfonso dialogue with a Dunce/Alfonso interaction? Show how immediately jumping on a solution is rather silly directly, not only indirectly?

Anyways, my sympathies if conversations like this are part of your daily (weekly) work. It hurt just to read.

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2018-10-14T20:10:59.742Z · LW(p) · GW(p)

I think Eliezer's goal was mainly to illustrate the kind of difficulty FAI is, rather than the size of the difficulty. But they aren't totally unrelated; basic conceptual progress and coming up with new formal approaches often requires a fair amount of serial time (especially where one insight is needed before you can even start working toward a second insight), and progress is often sporadic compared to more applied/well-understood technical goals.

It would usually be extremely tough to estimate how much work was left if you were actually in the "rocket alignment" hypothetical—e.g., to tell with confidence whether you were 4 years or 20 years away from solving "logical undiscreteness". In the real world, similarly, I don't think anyone knows how hard the AI alignment problem is. If we can change the character of the problem from "we're confused about how to do this in principle" to "we fundamentally get how one could align an AGI in the real world, but we haven't found code solutions for all the snags that come with implementation", then it would be much less weird to me if you could predict how much work was still left.

comment by drethelin · 2018-10-10T16:19:18.538Z · LW(p) · GW(p)

I liked this at first, but about 6-7 paragraphs of back-and-forth in I got really tired, started skimming, and then realized it went on for way more pages.

I get that your internal experience of trying to talk to people about AI risk feels like going on and on in this sort of "humble and correct person argues with smugly incorrect person forever" way but I don't think that's a good format for actually convincing people.

comment by ryan_b · 2018-10-09T16:37:34.147Z · LW(p) · GW(p)

Is it safe for me to assume that all the relevant sub-analogies within the dialogue have been linked already? By that I mean, wherever the specific example is an analogy for something actually from MIRI, that information is linked?

This made me want a State of MIRI Research post for the layperson, in the style of "the way hard problems are handled" instead of situating projects in their respective fields or within the objective of alignment. I feel like this post crystallized the meta-research level in a way I have not encountered before, and that might be an important contribution in its own right.

comment by shminux · 2018-10-04T07:07:01.053Z · LW(p) · GW(p)

ALFONSO:  So the first few rockets might not hit the Moon, I get that. Might fall back down and hurt someone even. But that's how technology advances in general, isn't it? The first planes crashed often, and it took a long time for all the details to be worked out both theoretically and experimentally, both helping each other. Wouldn't Moon Rocketry follow the standard pattern?

BETH:  We are very much worried that this is not what would happen at all! Moon Rocketry is special! The very first rocket that goes very high but misses the Moon will have enough power to destroy both the Earth and the Moon. And the Sun. And there is nothing we can do about it once it's fired. So we better get the first rocket as right as we possibly can, the stakes are just too high.

ALFONSO:  This sounds alarmist and super far-fetched! But humor me: what reasons do you have for your suspicion?

BETH: Well, we suspect that to get really high up, a rocket will need to keep gaining power (the higher up, the more power), and if it misses the Moon, the fallout from its engines will be so bad that it may end up causing worldwide destruction.

ALFONSO: I think your analogy between propulsion and information is getting a bit strained.

Replies from: Benito, vanessa-kosoy
comment by Ben Pace (Benito) · 2018-10-04T07:40:12.962Z · LW(p) · GW(p)

I mean, analogies don't have to be similar in all respects to be useful explanations, just in the few respects that you're using the analogy for. OP isn't arguing that AI alignment is important because rocket alignment is important, it's only using the analogy to describe the type of work that it thinks needs to be done to align AGI - which I'm guessing has been difficult to describe before writing this post. Arguments that AGI needs to be built right the first time have been discussed elsewhere, and you're right that this post doesn't make that arg.

(On this side-topic of whether AGI needs to be built precisely right the first time, and counter to your point that we-always-get-stuff-wrong-a-bunch-at-first-and-that's-fine, I liked Max Tegmark's story of how we're building technologies that increasingly have less affordance for error - fire, nukes, AGI. For some of these, a few mistakes meant small damage, then big damage, and in principle we may hit tech where initial mistakes are existential in nature. I think there are some sane args that make AGI seem like a plausible instance of this.

For discussion of the AI details I'd point elsewhere, to things like Gwern on "Why Tool AIs Want to be Agent AIs", Paul Christiano discussing arguments for fast-takeoff speeds [LW · GW], the paper Intelligence Explosion Microeconomics, and of course Bostrom's book.)

(edited a few lines to be clearer/shorter)

Replies from: shminux
comment by shminux · 2018-10-04T22:57:28.837Z · LW(p) · GW(p)

My point, clearly not well expressed, is that the main reason why AI alignment has to be figured out in advance is not even mentioned in the OP's dialogue:

We think the most important thing to do next is to advance our understanding of rocket trajectories until we have a better, deeper understanding of what we’ve started calling the “rocket alignment problem”. There are other safety problems, but this rocket alignment problem will probably take the most total time to work on, so it’s the most urgent.

... why? So what if this problem remains after the other problems are solved and the rockets are flying every which way? I have tried to answer that, since Eliezer hasn't in this post, despite this being the main impetus of MIRI's work.

Replies from: Lunawarrior
comment by Lunawarrior · 2018-10-09T19:07:01.294Z · LW(p) · GW(p)

I feel like the post is trying to convince the reader that AI alignment needs to be solved AT ALL. You can worry about arguing about when it needs to be solved after the other person is convinced there is a problem to solve in the first place.

comment by Vanessa Kosoy (vanessa-kosoy) · 2018-10-04T13:00:05.762Z · LW(p) · GW(p)

I agree with Ben, and also, humanity successfully sent a spaceship to the moon surface on the second attempt and successfully sent people (higher stakes) to the moon surface on the first attempt. This shows that difficult technological problems can be solved without extensive trial and error. (Obviously some trial and error on easier problems was done to get to the point of landing on the moon, and no doubt the same will be true of AGI. But, there is hope that the actual AGI can be constructed without trial and error, or at least without the sort of trial and error where error is potentially catastrophic.)

Replies from: None
comment by [deleted] · 2018-10-04T15:59:33.692Z · LW(p) · GW(p)

The trouble with this comparison is that the rocket used for this was a system of welded and bolted-together parts. The functions and rules of each system remained the same throughout the flight, and thus it was possible to model. With self-improving AI, it would be like using the rocket exhaust from the Saturn V to melt metal for other parts of the rocket during the flight to the Moon.

I can see a way to do self-improving AI: separate modular subsystems, each being evaluated by some connection, direct or indirect, to the real world. But in that case, while each subsystem may be a "black box" that is ever-evolving, the function basically remains the same. Like you might have a box that re-renders scenes from a camera without shadows. And there's feedback and ways it can get better at its job. And there's a meta-system that can gut the architecture of that box and replace it with a new internal way to do this task. But, all of the time, the box is still just subtracting shadows; it never does anything else.

Replies from: TheWakalix
comment by TheWakalix · 2018-10-10T04:09:22.770Z · LW(p) · GW(p)

I don't think we need to explicitly plan for the later stages. If we have a sufficiently advanced AI that we know is aligned and capable of intelligently self-modifying without becoming unaligned, we can probably put more confidence in the seed AI's ability to construct the final stages than in our ability to shape the seed AI to better construct the final stages.

Edit: that's insufficient. What I mean is that once you make the seed AI I described, any change you make to it that's explicitly for the purpose of guiding its takeoff will be practically useless and possibly harmful, given the AI's advantage. I think we may reach a point where we can trust the seed AI to do the job well more than we can trust ourselves to.

comment by gjm · 2018-10-04T20:18:27.356Z · LW(p) · GW(p)

I have a sense of déjà vu when reading this. Has Eliezer used the rocket-alignment analogy before somewhere?

... Aha, found it (or at least something that might be it): this LW article from 2015 [LW · GW].

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2018-10-04T20:55:03.367Z · LW(p) · GW(p)

Yeah, that article was originally an attempt to "essay-ify" an earlier draft of this very dialogue. But I don't think the essay version succeeded at communicating the idea very well.

The dialogue is at least better, I think, if you have the relevant context ("MIRI is a math research group that works on AI safety and likes silly analogies") and know what the dialogue is trying to do ("better pinpoint the way MIRI thinks of our current understanding of AGI alignment, and the way MIRI thinks of its research as relevant to improving our understanding, without trying to argue for those models").

comment by romeostevensit · 2019-11-22T05:05:02.031Z · LW(p) · GW(p)

Good for loading intuitions in disagreements about the complexity of philosophical progress.

comment by Richard_Ngo (ricraz) · 2019-11-21T14:08:22.821Z · LW(p) · GW(p)

It's been very helpful for understanding the motivations behind MIRI's "deconfusion" research, in particular through linking it to another hard technical problem.

comment by Chris_Leong · 2019-11-24T00:05:41.736Z · LW(p) · GW(p)

This is a very important post. It provides a justification for why agent foundations research might be important, something that was always unclear to me.

comment by Zvi · 2020-01-16T13:20:09.601Z · LW(p) · GW(p)

Big fan of this but, like most of us, I knew all this already. What I want to know is, how effective is/was this when not preaching to the choir? What happens when someone who doesn't understand MIRI's mission starts to read this? I'd like to think it helps them grok what is going on reasonably often, but I could be fooling myself, and that question is ultimately the test of how vital this really is.

Replies from: Benito, liam-donovan
comment by Ben Pace (Benito) · 2020-01-16T17:32:29.644Z · LW(p) · GW(p)

Huh, I’m surprised to hear you say you already knew it. I did not know this already. This is the post where I properly understood that Eliezer et al are interested in decision theory and tiling agents and so on, not because they’re direct failures that they expect of future systems, but because they highlight confusions that are in want of basic theory to describe them, and that this basic theory will hopefully help make AGI alignable. Like I think I’d heard the words once or twice before then, but I didn’t really get it.

(It's important that Embedded Agency came out too, which was entirely framed around this “list of confusions in want of better concepts / basic theory”, so I had some more concrete things to pin this to.)

Replies from: Raemon
comment by Raemon · 2020-01-16T19:48:32.630Z · LW(p) · GW(p)

FYI I also didn’t learn much from this post. (But, the places I did learn it from were random comments buried in threads that didn’t make it easy for people to learn)

Replies from: Benito
comment by Ben Pace (Benito) · 2020-01-16T19:59:57.122Z · LW(p) · GW(p)

Fair, but I expect I've also read those comments buried in random threads. Like, Nate said it here [EA(p) · GW(p)] three years ago on the EA Forum.

The main case for [the problems we tackle in MIRI's agent foundations research] is that we expect them to help in a gestalt way with many different known failure modes (and, plausibly, unknown ones). E.g., 'developing a basic understanding of counterfactual reasoning improves our ability to understand the first AGI systems in a general way, and if we understand AGI better it's likelier we can build systems to address deception, edge instantiation, goal instability, and a number of other problems'.

I have a mental model of directly working on problems. But before Eliezer's post, I didn't have an alternative mental model to move probability mass toward. I just funnelled probability mass away from "MIRI is working on direct problems they foresee in AI systems" to "I don't understand why MIRI is doing what it's doing". Nowadays I have a clearer pointer to what technical research looks like when you're trying to get less confused and get better concepts.

This sounds weirdly dumb to say in retrospect, because 'get less confused and get better concepts' is one of the primary ways I think about trying to understand the world these days. I guess the general concepts have permeated a lot of LW/rationality discussion. But at the time I guess I had a concept-shaped hole in my discussion of AI alignment research, and after reading this post I had a much clearer sense of that concept.

comment by Liam Donovan (liam-donovan) · 2020-01-20T20:38:24.483Z · LW(p) · GW(p)

For what it's worth, I was just learning about the basics of MIRI's research when this came out, and reading it made me less convinced of the value of MIRI's research agenda. That's not necessarily a major problem, since the expected change in belief after encountering a given post should be 0, and I already had a lot of trust in MIRI. However, I found this post [LW · GW] by Jessica Taylor vastly clearer and more persuasive (it was written before "Rocket Alignment", but I read "Rocket Alignment" first). In particular, I would expect AI researchers to be much more competent than the portrayal of spaceplane engineers in the post, and it wasn't clear to me why the analogy should be strong Bayesian evidence for MIRI being correct.

comment by Eli Tyre (elityre) · 2019-12-02T04:41:37.715Z · LW(p) · GW(p)

Clarified what MIRI is about.

comment by Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2019-11-29T20:18:09.273Z · LW(p) · GW(p)

Used for research for my EA/rationality novel project, this is a really good analogy!

comment by Vanessa Kosoy (vanessa-kosoy) · 2019-11-27T19:36:16.241Z · LW(p) · GW(p)

I think this post is a good and memorable explanation-by-analogy of what kind of research MIRI is doing to solve AI risk and a good and memorable response to some common criticism of or confusion about the former.

comment by Raemon · 2019-12-30T22:54:53.010Z · LW(p) · GW(p)

I just wanted to flag that this post hasn't been reviewed [LW · GW] yet, despite being one of the most nominated posts. (And most of the nominations here are quite short.)

The most obvious sort of review that'd be good to see is from people who were in this post's target demographic (i.e. people who hadn't understood, or weren't persuaded about, what sort of problem MIRI is trying to solve), about whether this post actually helped them understand that.

I'd also be interested in reviews that grapple a bit more with "how well exactly does this metaphor hold up?", although I'm not sure how to go about that.

comment by Peter Smythe (peter-smythe) · 2019-04-27T02:17:58.172Z · LW(p) · GW(p)

The funny thing is, once you are in orbit, pointing at Earth's moon as it comes up over the horizon DOES get you there. This fact was used by the Apollo astronauts to eyeball their maneuver.

comment by Peter Smythe (peter-smythe) · 2019-04-27T01:59:16.985Z · LW(p) · GW(p)

In a world in which somehow we don't have Calculus, Gravity, or a Heliocentric model of the solar system, but the US government, Google, and North Korea are all valid references and have the same connotations they do in our universe.

comment by [deleted] · 2018-10-04T15:32:21.844Z · LW(p) · GW(p)

I hate to say this but I'm taking the side of the Spaceplane designers. Perhaps it's because it's what I know.

That is, I can't really extrapolate to fully sentient, free to act AI. What I see as plausible is we build a vast set of shared libraries used in automated systems all over the economy. And each of these is essentially a hierarchical stack of agents. So a robotic arm might have agents that use reinforcement learning to classify what's in range of the arm. And then internally there's an agent that evaluates possible paths and the estimated reward from moving the arm to do something. And then there's control algorithms that generate the exact motor control sequences of commands. And a higher level system above this that is commanding various robotic systems to do things, with a goal of making paperclips or whatever.

And meta-systems that can try possible architectural changes to pieces of this system and determine if the changes are 'better', using Bayesian calculations of the distribution of performance.

So, ok, what stops the script that orders the robotic systems to do things to make paperclips from ordering up killbots so it can kill people who are blocking the system from making more paperclips?

Well the obvious thing to do would be to add more heuristics to your paperclip maker. Instead of just having a goal to make the number of paperclips bigger, add some more goals. Add a goal to keep energy consumption under a certain level. And factory space taken under a certain level. And put a value on each person you statistically kill, so you consider killing a person (by, say, having your robotic actuators impinge on the probability cloud where you think a human worker is) pretty bad (but not infinitely bad, or your actuators won't move at all, because the probability a human is in the way is never exactly zero).
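
As a toy sketch of what those extra heuristics might look like rolled into a single reward function - all the weights, budgets, and variable names here are invented purely for illustration:

```python
def paperclip_reward(paperclips_made: float,
                     energy_used_kwh: float,
                     floor_space_m2: float,
                     expected_humans_harmed: float) -> float:
    """Toy multi-objective reward with made-up penalty weights."""
    ENERGY_BUDGET_KWH = 1_000.0
    SPACE_BUDGET_M2 = 500.0
    HARM_PENALTY = 1e9   # very bad, but finite, so the arm can still move
                         # when the probability of harming someone is merely nonzero

    reward = paperclips_made
    reward -= 10.0 * max(0.0, energy_used_kwh - ENERGY_BUDGET_KWH)   # energy cap
    reward -= 50.0 * max(0.0, floor_space_m2 - SPACE_BUDGET_M2)      # factory-space cap
    reward -= HARM_PENALTY * expected_humans_harmed                  # statistical harm
    return reward
```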

So that's the "guidance fins" of this analogy. And the truth is, once you start getting spaceplanes into orbit and notice that heading right for the moon isn't making progress, you could probably get together some mathematicians and scrum together a rough model of orbital mechanics in time for the next launch.

And making a spaceplane so powerful it wrecks the planet if it crashes into it, when you don't know what you are doing...seems implausible to me. I agree we can make an AI that powerful but I think we would need to know what we are doing. Nobody made fission bombs work by slamming radioactive rocks together, it took a set of millions of deliberate actions in a row, by an army of people, to get to the first nuclear weapon.

A nuclear weapon is a system of tightly interconnected components. Some parts of it are extremely pure substances that took enormous energy (and human effort) to obtain.

I think a sentient AI is the same way. Nobody's inventing one at a terminal in their basement. You'll need libraries, trained on millions of working robot systems in the real world. Probably some new programming languages meant for the purpose that we haven't invented yet. A vast network of server-clouds that provide the compute power to test and iterate on your models. Your actual underlying machine learning techniques will most likely have been machine-optimized.

Once you have all these pieces available to parties with sufficient budgets, it would be like having a way to order highly enriched plutonium from Granger. Then it would be possible to build a closed-loop, self improving system.

Replies from: RobbBB, Vaniver
comment by Rob Bensinger (RobbBB) · 2018-10-04T19:17:37.305Z · LW(p) · GW(p)
I hate to say this but I'm taking the side of the Spaceplane designers. Perhaps it's because it's what I know.

Three things I think it's important to note explicitly here:

1. Eliezer's essay above is just trying to state where he thinks humanity's understanding of AI alignment is, and where he thinks it ultimately needs to be. The point of the fictional example is to make this view more concrete by explaining it in terms of concepts that we already understand well (rockets, calculus, etc.). None of this is an argument for Eliezer's view "our understanding of AI alignment is relevantly analogous to the fictional rocket example", just an attempt to be clearer about what the view even is.

2. "Don't worry about developing calculus, questioning the geocentric model of the solar system, etc." is the wrong decision in the fictional example Eliezer provided. You suggest, "once you start getting spaceplanes into orbit and notice that heading right for the moon isn't making progress, you could probably get together some mathematicians and scrum together a rough model of orbital mechanics in time for the next launch". I don't think this is a realistic model of how basic research works. Possibly this is a crux [LW · GW] between our models?

3. The value of the rocket analogy is that it describes a concrete "way the world could be" with respect to AI. Once this is added to the set of hypotheses under consideration, the important thing is to try to assess the evidence for which possible world we're in. "I choose to act as though this other hypothesis is true because it's what I know" should set off alarm bells in that context, as should any impulse to take the side of Team Don't-Try-To-Understand-Calculus in the contrived fictional example, because this suggests that your models and choices might be insensitive to whether you're actually in the kind of world where you're missing an important tool like calculus.

It's 100% fine to disagree about whether we are in fact in that world, but any indication that we should unconditionally act as though we're not in that world -- e.g., for reasons other than Bayesian evidence about our environment, or for reasons so strong they're insensitive even to things as important as "we're trying to get to the Moon and we haven't figured out calculus yet" -- should set off major alarms.

And making a spaceplane so powerful it wrecks the planet if it crashes into it, when you don't know what you are doing...seems implausible to me.

Eliezer means the rocket analogy to illustrate his views on 'how well do we understand AI alignment, and what kind of understanding is missing?', not 'how big a deal is it if we mess up?' AI systems aren't rockets, so there's no reason to extend the analogy further. (If we do want to compare flying machines and scientific-reasoning machines on this dimension, I'd call it relevant that flying organs have evolved many times in Nature, and never become globally dominant; whereas scientific-reasoning organs evolved just once, and took over the world very quickly.)

A relevant argument that's nearby in conceptspace is 'technologies are rarely that impactful, full stop; so we should have a strong prior that AGI won't be that impactful either'.

I agree we can make an AI that powerful but I think we would need to know what we are doing. Nobody made fission bombs work by slamming radioactive rocks together, it took a set of millions of deliberate actions in a row, by an army of people, to get to the first nuclear weapon.

Eliezer doesn't mean to argue that we'll get to AGI by pure brute force, just more brute force than is needed for safety / robustness / precise targeting. "Build a system that's really good at scientific reasoning, and only solves the kinds of problems we want it to" is a much more constrained problem than "Build a system that's really good at scientific reasoning", and it's generally hard to achieve much robustness / predictability / deep understanding of very novel software, even when that software isn't as complex or opaque as a deep net.

It sounds to me like key disagreements might include "how much better at science are the first AGI systems built for science likely to be, compared to humans (who weren't evolved to do science at all, but accidented into being capable of such)?" and "how many developers are likely to have the insights and other resources needed to design/train/deploy AGI in the first few years?" Your view makes more sense in my head when I imagine a world where AGI yields smaller capability gains, and where there aren't a bunch of major players who can all deploy AGI within a few years of each other.

Replies from: None, None
comment by [deleted] · 2018-10-04T19:56:45.629Z · LW(p) · GW(p)
2. "Don't worry about developing calculus, questioning the geocentric model of the solar system, etc." is the wrong decision in the fictional example Eliezer provided. You suggest, "once you start getting spaceplanes into orbit and notice that heading right for the moon isn't making progress, you could probably get together some mathematicians and scrum together a rough model of orbital mechanics in time for the next launch". I don't think this is a realistic model of how basic research works. Possibly this is a crux [LW · GW] between our models?

The theoretical framework behind current AI research is essentially "here's what we are regressing between, X and Y", or "here's some input data X, output responses Y, and a reward R". Percent correct, or the biggest R, is the objective. And for more complex reasons that I'm going to compress here, you also care about the distribution of the responses.
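
In caricature, the objectives I mean are just these (the `predict` callable, reward stream, and response list are hypothetical stand-ins for a real training setup):

```python
import numpy as np

# Supervised framing: given inputs X and targets Y, the objective is % correct.
def percent_correct(predict, X: np.ndarray, Y: np.ndarray) -> float:
    return float(np.mean(predict(X) == Y))

# Reinforcement framing: given the stream of rewards R the environment hands back,
# the objective is simply the biggest cumulative R.
def cumulative_reward(rewards) -> float:
    return float(sum(rewards))

# "You also care about the distribution of the responses": e.g. their spread, not just the mean.
def response_spread(responses) -> float:
    return float(np.std(responses))
```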

This is something we can run with. We can iteratively deploy an overall framework - a massive AI platform that is supported by a consortium of companies and offers the best and most consistent performance - that supports ever more sophisticated agent architectures. That is, at first the supported architectures are for problems where the feedback is immediate and the environment the system operates in is very Markovian and clean of dirt; later we will be able to solve more abstract problems.

With this basic idea we can replace most current jobs on earth and develop fully autonomous manufacturing, resource gathering, and construction.

Automating scientific research - there's a way to extend this kind of platform to design experiments autonomously. Essentially you build upon a lower-level predictive model by predicting the outcomes of composite experiments that use multiple phenomena at once, and you conduct more experiments where the variance is high. It's difficult to explain and I don't have it fully mapped out, but I think developing a systematic model for how macroscale mechanical physical systems work could be done autonomously. And then the same idea could be scaled to how low-level subatomic systems work, used to iteratively engineer nanotechnology, and maybe applied to work through cell biology in a similar way.

Umm, maybe the big picture will explain it better: you have hundred-story-plus megaliths of robotic test cells, where the robotic cells were made in an automated factory. And for cracking problems like nanotechnology or cell biology, each test cell is conducting an experiment at some level of integration to address unreliable parts. For example, if you have nanoscale gears and motors working well, but not switches, each test cell is exhaustively searching possible variants of a switch - not the entire grid, but using search trees to guess where a successful switch design might be - to get that piece to work.

And you have a simulator - a system using both learnable weights and some structure - that predicts which switch designs won't work. You feed into the simulator the error between what it predicted would happen and what the actual robotic test waldos are finding in reality. This update to the simulation model makes the next piece of the long process toward nanoscale self-replicating factories more likely to succeed.
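
Roughly the loop I'm picturing, sketched under the assumption that the simulator is a surrogate model with a `predict`/`fit` interface (already given some initial fit) and that `run_experiment` stands in for a real test cell - all of these names are placeholders:

```python
import numpy as np

def run_design_loop(candidate_designs, run_experiment, surrogate, n_rounds=10, batch=8):
    """Toy surrogate-guided search: spend real test-cell time on the designs the
    simulator is least sure about, then refit it on its prediction errors."""
    tried, results = [], []
    for _ in range(n_rounds):
        remaining = [d for d in candidate_designs if d not in tried]
        # Hypothetical interface: predict() returns (mean outcomes, variances).
        means, variances = surrogate.predict(remaining)
        # Run experiments where the simulator's variance is highest.
        picks = [remaining[i] for i in np.argsort(variances)[-batch:]]
        for design in picks:
            measured = run_experiment(design)              # what the waldos actually find
            predicted = surrogate.predict([design])[0][0]
            results.append((design, measured, measured - predicted))  # feed the error back
            tried.append(design)
        # Refit the simulator on everything measured so far.
        surrogate.fit([r[0] for r in results], [r[1] for r in results])
    return results
```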

And a mix of human scientists/engineers and scripts that call on machine learning models decide what to do next once a particular piece of the problem is reliably solved.

There are humans involved - it would not be a hands-off system - and the robotic system operating in each test cell uses a well-known and rigidly designed architecture that can be understood, even if you don't know the details of how each module functions, since they are weighted combinations of multiple machine learning algorithms, some of which were in turn developed by other algorithms.

I have a pet theory that even if you could build a self improving AI, you would need to give it access to such megaliths (a cube of modular rooms as wide on each side as it is tall, where each room was made in a factory and trucked onto the site and installed by robots) to generate the clean information needed to do the kinds of magical things we think superintelligent AIs could do.

Robotic systems are the way to get that information, because each step they take is replicable. And you subtract what happens without intervention by the robotic arm from what happens when you do intervene, giving you clean data that only has the intervention in it, plus whatever variance the system you are analyzing has inherently. I have a theory that things like nanotechnology, or the kind of real medicine that could reverse human biological age and turn off all possible tumors, or all the other things we know the laws of physics permit but we cannot yet do, can't be found in a vacuum. If you could build an AI "deity", it couldn't come up with this solution from just what humans have published (whether it be all scientific journals ever written or every written word and recorded image), because far too much uncertainty would remain. You still wouldn't know, even with all information analyzed, exactly what arrangements of nanoscale gears will do in a vacuum chamber. Or what the optimal drug regimen to prevent Ms. Smith from developing another myocardial infarction was. You could probably get closer than humans ever have - but you would need to manipulate the environment to find out what you needed to do.
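
The subtraction step might look like this in miniature - `measure` and `intervene_and_measure` are hypothetical stand-ins for real instrument readings:

```python
import numpy as np

def intervention_effect(measure, intervene_and_measure, n_trials=100):
    """Difference paired control/intervention runs to isolate what the arm actually did,
    along with the system's inherent variance."""
    baseline = np.array([measure() for _ in range(n_trials)])               # system left alone
    treated  = np.array([intervene_and_measure() for _ in range(n_trials)]) # with the intervention
    diffs = treated - baseline
    return diffs.mean(), diffs.std(ddof=1)   # estimated effect, residual spread
```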

This is the concrete reason for my assessment that out-of-control AGI is probably not as big a risk as we think. If such machines can't find the information needed to kill us all without systematically looking into it with a large amount of infrastructure, and the host hardware for such a system is specialized and not just freely available on unsecured systems on the internet, and we haven't actually designed these systems with anything like self-reflection, much less awareness, it seems pretty implausible.

But I could be wrong. Having a detailed model of how I think such things would really work, based upon my previous work with present day AI, doesn't necessarily make me correct. But I certainly feel more correct.

comment by [deleted] · 2018-10-04T22:09:28.062Z · LW(p) · GW(p)
I don't think this is a realistic model of how basic research works. Possibly this is a crux [LW · GW] between our models?

I'm responding to this statement directly in this post. No, this isn't how basic research works. But just because centuries of inertia cause basic research to be structured a certain way doesn't mean it has to be that way, or that my original statement is wrong.

You could assemble a quick-and-dirty model using curve fitting that would approximately tell you the relationship between the position of the Moon in the sky and a rocket's thrust vector. It wouldn't need to be a complete theory of gravitation, the theory that was developed over centuries. And it would work: approximate models are very often good enough.
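
For instance, a quick-and-dirty polynomial fit over hypothetical flight logs (the numbers and variable names below are invented purely for illustration):

```python
import numpy as np

# Hypothetical flight logs: the Moon's apparent elevation at launch (degrees)
# and the thrust-vector offset (degrees) that got closest on that flight.
moon_elevation = np.array([18.0, 25.0, 33.0, 41.0, 52.0, 60.0])
best_offset    = np.array([12.4,  9.8,  7.1,  5.0,  3.2,  2.1])

# Fit a quadratic: no theory of gravitation, just an approximate empirical curve.
coeffs = np.polyfit(moon_elevation, best_offset, deg=2)
aim_correction = np.poly1d(coeffs)

print(aim_correction(45.0))   # predicted offset to use when the Moon sits at 45 degrees
```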

comment by Vaniver · 2018-10-04T17:07:06.011Z · LW(p) · GW(p)
Well the obvious thing to do would be to add more heuristics to your paperclip maker.

I agree this is obvious. But do you have any reason to believe it will work? One of the core arguments here is that trying to constrain optimization processes is trying to constrain an intelligent opponent, because the optimization is performing search through a space of solutions much like an intelligent opponent is. This sort of 'patch-as-you-go' solution is highly inadequate, because the adversary always gets the first move and because the underlying problem making the search process an adversary hasn't been fixed, so it will just seek out the next hole in the specification. See Security Mindset and Ordinary Paranoia.

Once you have all these pieces available to parties with sufficient budgets, it would be like having a way to order highly enriched plutonium from Granger. Then it would be possible to build a closed-loop, self improving system.

What is the word 'then' doing in this paragraph? I'm reading you as saying "yes, highly advanced artificial intelligence would be a major problem, but we aren't there now or soon." But then there are two responses:

1) How long will it take to do the alignment research? As mentioned in the dialogue, it seems like it may be the longest part of the process, such that waiting to start would be a mistake that delays the whole process and introduces significant risk. As a subquestion, is the alignment research something that happens by default as part of constructing capabilities? It seems to me like it's quite easy to be able to build rockets without knowing how orbital mechanics work. Historically, orbital mechanics were earlier in the tech tree*, but I don't think they were a prerequisite for rocket-building.

2) When will we know that it's time? See There's No Fire Alarm for Artificial General Intelligence.

*Here I mean 'rockets that could escape Earth's gravity well,' since other rockets were made much earlier.

Replies from: None
comment by [deleted] · 2018-10-04T17:47:46.827Z · LW(p) · GW(p)

If the paperclip maker's architecture is a set of constrained boxes, where each box does a tiny, well-defined part of the problem of making paperclips, and is being evaluated by other boxes that ultimately trace their goals and outputs to human-defined goals and sensor data, it's not going anywhere. It's not even sentient, in that there's no memory in the system for anything like self-reflection. Every piece of memory is specific to the needs of a component. You have to build reliable real-time systems like this; other architectures would fail so often as to be economically infeasible (because paperclips have very low value, while robotic waldos and human lives are expensive).

This is what I mean when I say I'm on the side of the spaceplane designers. I don't know how another, more flexible architecture would even function, in the same way that, in this story, they don't know how to build a vehicle that doesn't depend on air.