Thomas Kwa's MIRI research experience

post by Thomas Kwa (thomas-kwa), peterbarnett, Vivek Hebbar (Vivek), Jeremy Gillen (jeremy-gillen), jacobjacob, Raemon · 2023-10-02T16:42:37.886Z · LW · GW · 52 comments


Comments sorted by top scores.

comment by aysja · 2023-10-05T06:00:51.152Z · LW(p) · GW(p)

Meta: I don’t want this comment to be taken as “I disagree with everything you (Thomas) said.” I do think the question of what to do when you have an opaque, potentially intractable problem is not obvious, and I don’t want to come across as saying that I have the definitive answer, or anything like that. It’s tricky to know what to do, here, and I certainly think it makes sense to focus on more concrete problems if deconfusion work didn’t seem that useful to you. 

That said, at a high-level I feel pretty strongly about investing in early-stage deconfusion work, and I disagree with many of the object-level claims you made suggesting otherwise. For instance: 

The neuroscientists I've talked to say that a new scanning technology that could measure individual neurons would revolutionize neuroscience, much more than a theoretical breakthrough. But in interpretability we already have this, and we're just missing the software.

It seems to me like the history of neuroscience should inspire the opposite conclusion: a hundred years of increasingly much data collection at finer and finer resolution, and yet, we still have a field that even many neuroscientists agree barely understands anything. I did undergrad and grad school in neuroscience and can at the very least say that this was also my conclusion. The main problem, in my opinion, is that theory usually tells us which facts to collect. Without it—without even a proto-theory or a rough guess, as with “model-free data collection” approaches—you are basically just taking shots in the dark and hoping that if you collect a truly massive amount of data, and somehow search over it for regularities, that theory will emerge. This seems pretty hopeless to me, and entirely backwards from how science has historically progressed. 

It seems similarly pretty hopeless to me to expect a “revolution” out of tabulating features of the brain at low-enough resolution. Like, I certainly buy that it gets us some cool insights, much like every other imaging advance has gotten us some cool insights. But I don’t think the history of neuroscience really predicts a “revolution,” here. Aside from the computational costs of “understanding” an object in such a way, I just don’t really buy that you’re guaranteed to find all the relevant regularities. You can never collect *all* the data, you have to make choices and tradeoffs when you measure the world, and without a theory to tell you which features are meaningfully irrelevant and can be ignored, it’s hard to know that you’re ultimately looking at the right thing. 

I ran into this problem, for instance, when I was researching cortical uniformity. Academia has amassed a truly gargantuan collection of papers on the structural properties of the human neocortex. What on Earth do any of these papers say about how algorithmically uniform the brain is? As far as I can tell, pretty much close to zero, because we have no idea how the structural properties of the cortex relate to the functional ones, and so who’s to say that “neuron subtype A is more dense in the frontal cortex relative to the visual cortex” is a meaningful finding or not? I worry that other “shot in the dark” data collection methods will suffer similar setbacks.

Eliezer has written about how Einstein cleverly used very limited data to discover relativity. But we could have discovered relativity easily if we observed not only the precession of Mercury, but also the drifting of GPS clocks, gravitational lensing of distant galaxies, gravitational waves, etc.

It’s of course difficult to say how science might have progressed counterfactually, but I find it pretty hard to believe that relativity would have been “discovered easily” were we to have had a bunch of data staring us in the face. In general, I think it’s very easy to underestimate how difficult it is to come up with new concepts. I felt this way when I was reading about Darwin and how it took him over a year to go from realizing that “artificial selection is the means by which breeders introduce changes,” to realizing that “natural selection is the means by which changes are introduced in the wild.” But then I spent a long time in his shoes, so to speak, operating from within the concepts he had available to him at the time, and I became more humbled. For instance, among other things, it seems like a leap to go from “a human uses their intellect to actively select” to “nature ends up acting like a selector, in the sense that its conditions favor some traits for survival over others.” These feel like quite different “types” of things, in some ways. 

In general, I suspect it’s easy to take the concepts we already have, look over past data, and assume it would have been obvious. But I think the history of science again speaks to the contrary: scientific breakthroughs are rare, and I don’t think it’s usually the case that they’re rare because of a lack of data, but because they require looking at that data differently. Perhaps data on gravitational lensing may have roused scientists to notice that there were anomalies, and may have eventually led to general relativity. But the actual process of taking the anomalies and turning that into a theory is, I think, really hard. Theories don’t just pop out wholesale when you have enough data, they take serious work.

I heard David Bau say something interesting at the ICML safety workshop: in the 1940s and 1950s lots of people were trying to unlock the basic mysteries of life from first principles. How was hereditary information transmitted? Von Neumann designed a universal constructor in a cellular automaton, and even managed to reason that hereditary information was transmitted digitally for error correction, but didn't get further. But it was Crick, Franklin, and Watson who used crystallography data to discover the structure of DNA, unraveling far more mysteries. Since then basically all advances in biochemistry have been empirical. Biochemistry is a case study where philosophy and theory failed to solve the problems but empirical work succeeded, and maybe interpretability and intelligence are similar.

This story misses some pretty important pieces. For instance, Schrödinger predicted basic features about DNA—that it was an aperiodic crystal—using first principles in his book What if Life? published in 1944. The basic reasoning is that in order to stably encode genetic information, the molecule should itself be stable, i.e., a crystal. But to encode a variety of information, rather than the same thing repeated indefinitely, it needs to be aperiodic. An aperiodic crystal is a molecule that can use a few primitives to encode near infinite possibilities, in a stable way. His book was very influential, and Francis and Crick both credited Schrödinger with the theoretical ideas that guided their search. I also suspect their search went much faster than it would have otherwise; many biologists at the time thought that the hereditary molecule was a protein, of which there are tens of millions in a typical cell. 

But, more importantly, I would certainly not say that biochemistry is an area where empirical work has succeeded to nearly the extent that we might hope it to. Like, we still can’t cure cancer, or aging, or any of the myriad medical problems people have to endure; we still can’t even define “life” in a reasonable way, or answer basic questions like “why do arms come out basically the same size?” The discovery of DNA was certainly huge, and helpful, but I would say that we’re still quite far from a major success story with biology.

My guess is that it is precisely because we lack theory that we are unable to answer these basic questions, and to advance medicine as much as we want. Certainly the “tabulate indefinitely” approach will continue pushing the needle on biological research, but I doubt it is going to get us anywhere near the gains that, e.g., “the hereditary molecule is an aperiodic crystal” did.

And while it’s certainly possible that biology, intelligence, agency and so on are just not amenable to the cleave-reality-at-its-joints type of clarity one gets from scientific inquiry, I’m pretty skeptical that this the world we in fact live in, for a few reasons. 

For one, it seems to me that practically no one is trying to find theories in biology. It is common for biologists (even bright-eyed, young PhDs at elite universities) to say things like (and in some cases this exact sentence): “there are no general theories in biology because biology is just chemistry which is just physics.” These are people at the beginning of their careers, throwing in the towel before they’ve even started! Needless to say, this take is clearly not true in all generality, because it would anti-predict natural selection. It would also, I think, anti-predict Newtonian mechanics (“there are no general theories of motion because motion is just the motion of chemicals which is just the motion of particles which is just physics”). 

Secondly, I think that practically all scientific disciplines look messy, ad-hoc, and empirical before we get theories that tie it together, and that this does not on its own suggest biology is a theoretically bankrupt field. E.g., we had steam engines before we knew about thermodynamics, but they were kind of ad-hoc, messy contraptions, because we didn’t really understand what variables were causing the “work.” Likewise, naturalism before Darwin was largely compendiums upon compendiums of people being like “I saw this [animal/fossil/plant/rock] here, doing this!” Science before theory often looks like this, I think. 

Third: I’m just like, look guys, I don’t really know what to tell you, but when I look at the world and I see intelligences doing stuff, I sense deep principles. It’s a hunch, to be sure, and kind of hard to justify, but it feels very obvious to me. And if there are deep principles to be had, then I sure as hell want to find them. Because it’s embarrassing that at this point we don’t even know what intelligence is, nor agency, nor abstractions: how to measure any of it, predict when it will increase or not. These are the gears that are going to move our world, for better or for worse, and I at least want my hands on the steering wheel when they do.

I think that sometimes people don’t really know what to envision with theoretical work on alignment, or “agent foundations”-style work. My own vision is quite simple: I want to do great science, as great science has historically been done, and to figure out what in god’s name any of these phenomena are. I want to be able to measure that which threatens our existence, such that we may learn to control it. And even though I am of course not certain this approach is workable, it feels very important to me to try. I think there is a strong case for there being a shot, here, and I want us to take it. 

Replies from: jacobjacob, TekhneMakre, thomas-kwa, jsd, Morpheus
comment by jacobjacob · 2023-10-05T06:08:01.178Z · LW(p) · GW(p)

I did undergrad and grad school in neuroscience and can at the very least say that this was also my conclusion.

I remember the introductory lecture for the Cognitive Neuroscience course I took at Oxford. I won't mention the professor's name, because he's got his own lab and is all senior and stuff, and might not want his blunt view to be public -- but his take was "this field is 95% nonsense. I'll try to talk about the 5% that isn't". Here's a lecture slide:

Replies from: Simon Skade
comment by Towards_Keeperhood (Simon Skade) · 2023-10-05T20:44:13.223Z · LW(p) · GW(p)

Lol possibly someone should try to make this professor work for Steven Byrnes / on his agenda.

comment by TekhneMakre · 2023-10-05T13:01:57.709Z · LW(p) · GW(p)

Top level blog post, do it.

comment by Thomas Kwa (thomas-kwa) · 2023-10-05T22:18:43.988Z · LW(p) · GW(p)

Thanks, I really like this comment. Here are some points about metascience I agree and disagree with, and how this fits into my framework for thinking about deconfusion vs data collection in AI alignment.

  • I tentatively think you're right about relativity, though I also feel out of my depth here. [1]
  • David Bau must have mentioned the Schrödinger book but I forgot about it, thanks for the correction. The fact that ideas like this told Watson and Crick where to look definitely seems important.
  • Overall, I agree that a good theoretical understanding guides further experiment and discovery early on in a field.
  • However, I don't think curing cancer or defining life are bottlenecked on deconfusion. [2]
    • For curing cancer, we know the basic mechanisms behind cancer and understand that they're varied and complex. We have categorized dozens of oncogenes of about 7 different types, and equally many ways that organisms defend against cancer. It seems unlikely that the the cure for cancer will depend on some unified theory of cancer, and much more plausible that it'll be due to investments in experiment and engineering. It was mostly engineering that gave us mRNA vaccines, and a mix of all three that allowed CRISPR.
    • For defining life, we already have edge cases like viruses and endosymbiotic organisms, and understand pretty well which things can maintain homeostasis, reproduce, etc. in what circumstances. It also seems unlikely that someone will draw a much sharper boundary around life, especially without lots more useful data.

My model of the basic process of science currently looks like this:

Note that I distinguish deconfusion (what you can invest in) from theory (the output). At any time, there are various returns to investing in data collection tech, experiments, and deconfusion, and returns diminish with the amount invested. I claim that in both physics and biology, we went through an early phase where the bottleneck was theory and returns to deconfusion were high, and currently the fields are relatively mature, such that the bottlenecks have shifted to experiment and engineering, but with theory still playing a role.

In AI, we're in a weird situation:

  • We feel pretty confused about basic concepts like agency, suggesting that we're early and deconfusion is valuable.
  • Machine learning is a huge field. If there are diminishing returns to deconfusion, this means experiment and data collection are more valuable than deconfusion.
  • Machine learning is already doing impressive things primarily on the back of engineering, without much reliance on the type of theory that deconfusion generates alone (deep, simple relationships between things).
    • But even if engineering alone is enough to build AGI, we need theory for alignment.
  • In biology, we know cancer is complex and unlikely to be understood by a deep simple theory, but in AI, we don't know whether intelligence is complex.

I'm not sure what to make of all this, and this comment is too long already, but hopefully I've laid out a frame that we can roughly agree on.

[1] When writing the dialogue I thought the hard part of special relativity was discovering the Lorentz transformations (which GPS clock drift observations would make obvious), but Lorentz did this between 1892-1904 and it took until 1905 for Einstein to discover the theory of special relativity. I missed the point about theory guiding experiment earlier, and without relativity we would not have built gravitational wave detectors. I'm not sure whether this also applies to gravitational lensing or redshift.

[2] I also disagree with the idea that "practically no one is trying to find theories in biology". Theoretical biology seems like a decently large field-- probably much larger than it was in 1950-- and biologists use mathematical models all the time.

comment by jsd · 2023-10-06T03:41:28.717Z · LW(p) · GW(p)

I don't have the energy to contribute actual thoughts, but here are a few links that may be relevant to this conversation:

comment by Morpheus · 2023-10-07T07:54:20.586Z · LW(p) · GW(p)

we still can’t even define “life” in a reasonable way, or answer basic questions like “why do arms come out basically the same size?”

Such a definition seems futile [? · GW] (I recommend the rest of the word sequence also). Biology already does a great job explaining what and why some things are alive. We are not going around thinking a rock is "alive". Or what exactly did you have in mind there?

Replies from: Ilio
comment by Ilio · 2023-10-07T14:01:18.387Z · LW(p) · GW(p)

Same quote, emphasis on the basic question.

What’s wrong with « Left and right limbs come out basically the same size because it’s the same construction plan. »?

Replies from: thomas-kwa, Morpheus
comment by Thomas Kwa (thomas-kwa) · 2023-10-07T21:47:56.808Z · LW(p) · GW(p)

A sufficiently mechanistic answer lets us engineer useful things, e.g. constructing an animal with left limbs 2 inches longer than its right limbs.

comment by Morpheus · 2023-10-07T15:48:48.617Z · LW(p) · GW(p)

Ups. Yeah I forgot to address that one. I was just astonished to hear no one knows the answer to that one.

comment by Thomas Kwa (thomas-kwa) · 2023-10-07T06:22:34.704Z · LW(p) · GW(p)

I wrote this dialogue to stimulate object-level discussion on things like how infohazard policies slow down research, how new researchers can stop flailing around so much, whether deconfusion is a bottleneck to alignment, or the sharp left turn. I’m a bit sad that most of the comments are about exactly how much emotional damage Nate tends to cause people and whether this is normal/acceptable: it seems like a worthwhile conversation to happen somewhere, but now that many people have weighed in with experiences from other contexts, including non-research colleagues, a romantic partner, etc., I think all the drama distracts from the discussions I wanted to have. The LW team will be moving many comments to an escape-valve post [LW · GW]; after this I’ll have a pretty low bar for downvoting comments I think are off-topic here.

comment by Muireall · 2023-10-03T18:31:17.442Z · LW(p) · GW(p)

The model was something like: Nate and Eliezer have a mindset that's good for both capabilities and alignment, and so if we talk to other alignment researchers about our work, the mindset will diffuse into the alignment community, and thence to OpenAI, where it would speed up capabilities. I think we didn't have enough evidence to believe this, and should have shared more.

What evidence were you working off of? This is an extraordinary thing to believe.

Replies from: thomas-kwa, Raemon
comment by Thomas Kwa (thomas-kwa) · 2023-10-03T23:09:19.666Z · LW(p) · GW(p)

First I should note that Nate is the one who most believed this; that we not share ideas that come from Nate was a precondition of working with him. [edit: this wasn't demanded by Nate except in a couple of cases, but in practice we preferred to get Nate's input because his models were different from ours.]

With that out of the way, it doesn't seem super implausible to us that the mindset is useful, given that MIRI had previously invented out of the box things like logical induction and logical decision theory, and that many of us feel like we learned a lot over the past year. On inside view I have much better ideas than I did a year ago, although it's unclear how much to attribute to the Nate mindset. To estimate this it's super unclear how to make a reference class-- I'd maybe guess at the base rate of mindsets transferring from niche fields to large fields and adjust from there. We spent way too much time discussing how. Let's say there's a 15%* chance of usefulness.

As for whether the mindset would diffuse conditional on it being useful, this seems pretty plausible, maybe 15% if we're careful and 50% if we talk to lots of people but don't publish? Scientific fields are pretty good at spreading useful ideas.

So I think the whole proposition is unlikely but not "extraordinary", maybe like 2.5-7.5%. Previously I was more confident in some of the methods so I would have given 45% for useful, making p(danger) 7%-22.5%. The scenario we were worried about is if our team had a low probability of big success (e.g. due to inexperience), but sharing ideas would cause a fixed 15% chance of big capabilities externalities regardless of success. The project could easily become -EV this way.

Some other thoughts:

  • Nate trusts his inside view more than any of our attempts to construct an argument legible to me which I think distorted our attempts to discuss this.
  • It's hard to tell 10% from 1% chances for propositions like this, which is also one of the problems in working on long-shot, hopefully high EV projects.
  • Part of why I wanted the project to become less private as it went on is that we generated actual research directions and would only have to share object level to get feedback on our ideas.

* Every number in this comment is very rough

Replies from: Muireall
comment by Muireall · 2023-10-03T23:12:18.631Z · LW(p) · GW(p)

Interesting, thanks!

comment by Raemon · 2023-10-03T21:46:17.659Z · LW(p) · GW(p)

This isn't quite how I'd frame the question.

[edit: My understanding is that] Eliezer and Nate believe this. I think it's quite reasonable for other people to be skeptical of it. 

Nate and Eliezer can choose to only work closely/mentor people who opt into some kind of confidentiality clause about it. People who are skeptical or don't think it's worth the costs can choose not to opt into it.

I have heard a few people talk about MIRI confidentiality norms being harmful to them in various ways, so I do also think it's quite reasonable for people to be more cautious about opting into working with Nate or Eliezer if they don't think it's worth the cost.

Presumably, Nate/Eliezer aren't willing to talk much about this precisely because they think it'd leak capabilities. You might think they're wrong, or that they haven't justified that, but, like, the people who have a stake in this are the people who are deciding whether to work with them. (I think there's also a question of "should Eliezer/Nate have a reputation as people who have a mindset that's good for alignment and capabilities that'd be bad to leak?", and I'd say the answer should be "not any moreso than you can detect from their public writings, and whatever your personal chains of trust with people who have worked closely with them that you've talked to.")

I do think this leaves some problems. I have heard about the MIRI confidentiality norms being fairly paralyzing for some people in important ways. But something about the Muireall's comment felt like a wrong frame to me.

Replies from: So8res, Muireall
comment by So8res · 2023-10-04T18:29:31.539Z · LW(p) · GW(p)

(I am pretty uncomfortable with all the "Nate / Eliezer" going on here. Let's at least let people's misunderstandings of me be limited to me personally, and not bleed over into Eliezer!)

(In terms of the allegedly-extraordinary belief, I recommend keeping in mind jimrandomh's note on Fork Hazards [LW · GW]. I have probability mass on the hypothesis that I have ideas that could speed up capabilities if I put my mind to it, as is a very different state of affairs from being confident that any of my ideas works. Most ideas don't work!)

(Separately, the infosharing agreement that I set up with Vivek--as was perhaps not successfully relayed to the rest of the team, though I tried to express this to the whole team on various occasions--was one where they owe their privacy obligations to Vivek and his own best judgements, not to me.)

Replies from: Raemon
comment by Raemon · 2023-10-04T19:16:48.578Z · LW(p) · GW(p)

(Separately, the infosharing agreement that I set up with Vivek--as was perhaps not successfully relayed to the rest of the team, though I tried to express this to the whole team on various occasions--was one where they owe their privacy obligations to Vivek and his own best judgements, not to me.)

That's useful additional information, thanks.

I made a slight edit to my previous comment to make my epistemic state more clear. 

Fwiw, I feel like I have a pretty crisp sense of "Nate and Eliezers communication styles are actually pretty different" (I noticed myself writing out a similar comment about communication styles under the Turntrout thread that initially said "Nate and Eliezer" a lot, and then decided that comment didn't make sense to publish as-is), but I don't actually have much of a sense of the difference between Nate, Eliezer, and MIRI-as-a-whole with regards to "the mindset" and "confidentiality norms".

comment by Muireall · 2023-10-03T22:38:58.107Z · LW(p) · GW(p)

Sure. I only meant to use Thomas's frame, where it sounds like Thomas did originally accept Nate's model on some evidence, but now feels it wasn't enough evidence. What was originally persuasive enough to opt in? I haven't followed all Nate's or Eliezer's public writing, so I'd be plenty interested in an answer that draws only from what someone can detect from their public writing. I don't mean to demand evidence from behind the confidentiality screen, even if that's the main kind of evidence that exists.

Separately, I am skeptical and a little confused as to what this could even look like, but that's not what I meant to express in my comment.

comment by jacobjacob · 2023-10-03T05:56:45.042Z · LW(p) · GW(p)

Flagging inconvenient acronym clash between SLT used for Sharp Left Turn and Singular Learning Theory (have seem both)!

Replies from: bideup
comment by bideup · 2023-10-03T22:06:27.305Z · LW(p) · GW(p)

I vote singular learning theory gets priority (if there was ever a situation where one needed to get priority). I intuitively feel like research agendas or communities need an acronym more than concepts. Possibly because in the former case the meaning of the phrase becomes more detached from the individual meaning of the words than it does in the latter.

comment by Vaniver · 2023-10-02T21:32:35.210Z · LW(p) · GW(p)

I’m also much less pessimistic about the possibility of communicating with people than Nate/MIRI seem to be.

FWIW this is also my take (I have done comms work for MIRI in the past, and am currently doing it part-time) but I'm not sure I have any more communication successes than Nate/MIRI have had, so it's not clear to me this is skill talking on my part instead of inexperience.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-10-02T17:03:31.762Z · LW(p) · GW(p)

Thanks for sharing! I was wondering what happened with that project & found this helpful (and would have found it even more helpful if I didn't already know and talk with some of you).

I'd love to see y'all write more, if you feel like it. E.g. here's a prompt:

You said:

Solving the full problem despite reflection / metacognition seems pretty out of reach for now. In the worst case, if an agent reflects, it can be taken over by subagents, refactor all of its concepts into a more efficient language, invent a new branch of moral philosophy that changes its priorities, or a dozen other things. There's just way too much to worry about, and the ability to do these things is -- at least in humans -- possibly tightly connected to why we're good at science. 

Can you elaborate? I'd love to see a complete list of all the problems you know of (the dozen things!). I'd also love to see it annotated with commentary about the extent to which you expect these problems to arise in practice vs. probably only if we get unlucky. Another nice thing to include would be a definition of reflection/metacognition. Finally it would be great to say some words about why ability to do reflection/metacognition might be tightly connected to ability to do science.

Replies from: thomas-kwa
comment by Thomas Kwa (thomas-kwa) · 2023-10-02T20:44:47.101Z · LW(p) · GW(p)

I'm less concerned about the fact that there might be a dozen different problems (and therefore don't have an explicit list), and more concerned about the fact that we don't understand the mathematical structure behind metacognition (if there even is something to find), and therefore can't yet characterize it or engineer it to be safe. We were trying to make a big list early on, but gradually shifted to resolving confusions and trying to make concrete models of the systems and how these problems arise.

Off the top of my head, by metacognition I mean something like: reasoning that chains through models of yourself, or applying your planning process to your own planning process.

On why reflection/metacognition might be connected to general science ability, I don't like to speculate on capabilities, but just imagine that the scientists in the Apollo program were unable to examine and refine their research processes-- I think they would likely fail.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-10-03T00:55:00.488Z · LW(p) · GW(p)

I agree that just because we've thought hard and made a big list, doesn't mean the list is exhaustive. Indeed the longer the list we CAN find, the higher the probability that there are additional things we haven't found yet...

But I still think having a list would be pretty helpful. If we are trying to grok the shape of the problem, it helps to have many diverse examples.

Re: metacognition: OK, that's a pretty broad definition I guess. Makes the "why is this important for doing science" question easy to answer. Arguably GPT4 already does metacognition to some extent, at least in ARC Evals and when in an AutoGPT harness, and probably not very skillfully.
ETA: so, to be clear, I'm not saying you were wrong to move from the draft list to making models; I'm saying if you have time & energy to write up the list, that would help me along in my own journey towards making models & generally understanding the problem better. And probably other readers besides me also.

comment by DanielFilan · 2023-10-03T16:48:29.854Z · LW(p) · GW(p)

The issues you describe seem fairly similar to Academia, where you get a PhD advisor but don't talk to them very often, in large part because they're busy.

FWIW this depends on the advisor, sometimes you do get to talk to the advisor relatively often.

comment by Thomas Kwa (thomas-kwa) · 2023-09-27T04:18:10.929Z · LW(p) · GW(p)

[1]: To clarify what I mean by "pairs of intuitions", here are two that seem more live to me:

  • Is it viable to rely on incomplete [LW · GW] preferences for corrigibility? Sami Petersen showed incomplete agents don't have to violate some consistency properties. But it's still super unclear how one would keep an agent's preferences naturally coming into conflict with yours, if the agent can build a moon rocket ten times faster than you.
  • What's going on with low-molecular-weight exhaust and rocket engine efficiency? There's a claim on the internet that since rocket engine efficiency is proportional to velocity of exhaust molecules, and since , lighter exhaust molecules give you higher velocity for a given amount of energy. This is validated by the fact that in hydrogen-oxygen rockets, the optimum  is achieved when using an excess of lighter hydrogen molecules. But this doesn't make sense, because energy is limited and you can't just double the number of molecules while keeping energy per molecule fixed. (I might write this up)

[2]: Note I'm not endorsing logical decision theory over CDT/EDT because there seem to be some problems and also LDT is not formalized.

Replies from: Connor_Flexman
comment by Connor_Flexman · 2023-10-06T08:00:39.352Z · LW(p) · GW(p)

Re rockets, I might be misunderstanding, but I’m not sure why you’re imagining doubling the number of molecules. Isn’t the idea that you hold molecules constant and covalent energy constant, then reduce mass to increase velocity? Might be worth disambiguating your comparator here: I imagine we agree that light hydrogen would be better than heavy hydrogen, but perhaps you’re wondering about kerosene?

Replies from: thomas-kwa
comment by Thomas Kwa (thomas-kwa) · 2023-10-06T21:12:03.730Z · LW(p) · GW(p)

The phenomenon I'm confused about is that changing the mixture ratio can cause the total energy per unit mass released by the fuel and oxidizer to decrease, but the  to increase.

Replies from: kyle-ibrahim
comment by zrezzed (kyle-ibrahim) · 2023-10-09T04:06:01.431Z · LW(p) · GW(p)

Fun nerd snipe! I gave it a quick go and was mostly able to deconfuse myself, though I'm still unsure of the specifics. I would still love to hear an expert take.

First, what exactly is the confusion?

For an LOX/LH2 rocket, the most energy efficient fuel ratio is stoichiometric, at 8:1 by mass. However, real rockets apparently use ratios with an excess of hydrogen to boost [1] -- somewhere around 4:1[2] seems to provide the best overall performance. This is confusing, as my intuition is telling me: for the same mass of propellant, a non-stoichiometric fuel ratio is less energetic. Less energy being put into a gas with more mols should mean lower-enough temperatures that the exhaust velocity should be also be lower, thus lower thrust and .

So, where is my intuition wrong?

The total fuel energy per unit mass is indeed lower, nothing tricky going on there. There's less loss than I expected though. Moving from an 8:1 -> 4:1 ratio only results in a theoretical 10% energy loss[3], but an 80% increase in products (by mol).

However, assuming lower energy implies lower temperatures was at least partially wrong. Given a large enough difference in specific heat, less energy could result in a higher temperature. In our case though, the excess hydrogen actually increases the heat capacity of the product, meaning a stoichiometric ratio will always produce the higher temperature[4].

But as it turns out, a stoichiometric ratio of LOX/LH2 burns so hot that a portion of the H2O dissociates into various species, significantly reducing efficiency. A naive calculation of the stoichiometric flame temperature is around 5,800K, vs ~3,700K when taking into account these details[5]. Additionally, this inefficiency drops off quickly as temperatures lower, meaning a 4:1 ratio is much more efficient and can generate temperatures over 3,000K. 

This seems to be the primary mechanism behind the improved : a 4:1 fuel ratio is able to generate combustion temperatures close enough to the stoichiometric ratio in a gas with a higher enough heat capacity to generate a higher exhaust velocity. And indeed, plugging in rough numbers to the exhaust velocity equation[6], this bears out.

The differences in molecular weight and heat capacities also contribute to how efficiently a real rocket nozzle can convert the heat energy into kinetic energy, which is what the other terms from the exhaust velocity help correct for. But as far as I can tell, this is not the dominant effect and actually reduces exhaust velocity for the 4:1 mixture (though I'm very uncertain about this). 

The internet is full of 1) poor, boldly incorrect and angry explainers on this topic, and 2) and incredibly high-quality rocket science resources and tools (this was some of the most disconsonant non-CW discourse I've had to wade through). With all the great resources that do exist though, I was surprised I couldn't find any existing intuitive explanations! I seemed to find either muddled thinking around this specific example, or clear thinking about the math in abstract. 

... or who knows, maybe my reading comprehension is just poor!

  1. ^

    Intense heat and the dangers of un-reacted, highly oxidizing O2  in the exhaust also motivates excess hydrogen ratios. 

  2. ^

    The Space Shuttle Main Engine used a 6.03:1 ratio, in part because a 4:1 ratio would require a much, much larger LH2 tank. 

  3. ^

    20H2 + 10O2 → 20H2O: ΔH ~= -5716kJ  (vs)  36H2 + 9O2 → 18H2O + 18H2: ΔH ~= -5144.4kJ

  4. ^

    For the fuel rich mixture, if we were somehow able to only heat the water product, the temperature would equal the stoichiometric flame temp. Then when considering the excess H2 temperatures would be necessarily lower. Charts showing showing the flame temp of various fuel ratios support this: 

  5. ^

    See the SSME example here: Ironically, they incorrectly use a stoichiometric ratio in their calculations. But as they show, the reaction inefficiencies explain the vast majority of the temperature discrepancy.

  6. ^
Replies from: waterlubber, zfurman, thomas-kwa
comment by waterlubber · 2023-10-09T04:33:03.986Z · LW(p) · GW(p)

You've got the nail on the head here. Aside from the practical limits of high temperature combustion (running at a lower chamber temperature allows for lighter combustion chambers, or just practical ones at all) the various advantages of a lighter exhaust most than make up for the slightly lower combustion energy. the practical limits are often important: if your max chamber temperature is limited, it makes a ton of sense to run fuel rich to bring it to an acceptable range.

One other thing to mention is that the speed of sound of the exhaust matters quite a lot. Given the same area ratio nozzle and same gamma in the gas, the exhaust mach number is constant; a higher speed of sound thus yields a higher exhaust velocity.

The effects of dissociation vary depending on application. It's less of an issue with vacuum nozzles, where their large area ratio and low exhaust temperature allow some recombination. For atmospheric engines, nozzles are quite short; there's little time for gases to recombine.

I'd recommend playing around with CEA (, which allows you to really play with a lot of combinations quickly.

I'd also like to mention that some coefficients in nozzle design might make things easier to reason about. Thrust coefficient and characteristic velocity are the big ones; see an explanation here

Note that exhaust velocity is proportional to the square root of (T_0/MW), where T_0 is chamber temperature.

Thrust coefficient, which describes the effectiveness of a nozzle, purely depends on area ratio, back pressure, and the specific heat ratio for the gas.

You're right about intuitive explanations of this being few and far between. I couldn't even get one out of my professor when I covered this in class.

To summarize:

  1. Only gamma, molecular weight, chamber temp T0, and nozzle pressures affect ideal exhaust velocity.
  2. Given a chamber pressure, gamma, and back pressure, (chamber pressure is engineering limited), a perfect nozzle will expand your exhaust to a fixed mach number, regardless of original temperature.
  3. Lower molecular weight gases have more exhaust velocity at the same mach number.
  4. Dissociation effects make it more efficient to avoid maximizing temperature in favor of lowering molecular weight.

This effect is incredibly strong for nuclear engines: since they run at a fixed, relatively low engineering limited temperature, they have enormous specific impulse gains by using as light a propellant as possible.

Replies from: kyle-ibrahim
comment by zrezzed (kyle-ibrahim) · 2023-10-09T22:02:50.101Z · LW(p) · GW(p)

One other thing to mention is that the speed of sound of the exhaust matters quite a lot. Given the same area ratio nozzle and same gamma in the gas, the exhaust mach number is constant; a higher speed of sound thus yields a higher exhaust velocity.


My understanding is this effect is a re-framing of what I described: for a similar temperature and gamma, a lower molecular weight (or specific heat) will result in a higher speed of sound (or exit velocity).

However, I feel like this framing fails to provide a good intuition for the underlying mechanism. At the limit,  anyways, so it's harder (for me at least) to understand how gas properties relate to sonic properties. Yes, holding other things constant, a lower molecular weight increases the speed of sound. But crucially, it means there's more kinetic energy to be extracted to start with.

Is that not right?

Replies from: waterlubber
comment by waterlubber · 2023-10-10T03:52:45.084Z · LW(p) · GW(p)

Aside from dissociation/bond energy, nearly all of the energy in the combustion chamber is kinetic. Hill's Mechanics and Thermodynamics of Propulsion gives us this very useful figure for the energy balance:

A good deal of the energy in the exhaust is still locked up in various high-energy states; these states are primarily related to the degrees of freedom of the gas (and thus gamma) and are more strongly occupied at higher temperatures. I think that the lighter molecular weight gasses have equivalently less energy here, but I'm not entirely sure. This might be something to look into.

Posting this graph has got me confused as well, though. I was going to write about how there's more energy tied up in the enthalpy of the gas in the exhaust, but that wouldn't make sense - lower MW propellants have a higher specific heat per unit mass, and thus would retain more energy at the same temperature. 

I ran the numbers in Desmos for perfect combustion, an infinite nozzle, and no dissociation, and the result was still there, but quite small:

The one thing to note: the ideal occurs where the gas has the highest speed of sound. I really can't think of any intuitive way to write this other than "nozzles are marginally more efficient at converting the energy of lighter molecular weight gases from thermal-kinetic to macroscopic kinetic."

comment by Zach Furman (zfurman) · 2023-10-09T23:28:21.845Z · LW(p) · GW(p)

I wish I had a more short-form reference here, but for anyone who wants to learn more about this, Rocket Propulsion Elements is the gold standard intro textbook. We used in my university rocketry group, and it's a common reference to see in industry. Fairly well written, and you should only need to know high school physics and calculus.

comment by Thomas Kwa (thomas-kwa) · 2023-10-26T21:46:00.172Z · LW(p) · GW(p)

This is an impressive piece of deconfusion, probably better than I could have done. I'd be excited about seeing a post with a log of your thought process and what sources you consulted in what order.

comment by [deleted] · 2023-10-03T02:07:38.206Z · LW(p) · GW(p)

Some of these comments about in-person communication with Nate seem like maybe they could be addressed by communicating over text instead. E.g., over text, Nate could choose not to convey an initial distressed reaction to someone not understanding.

Replies from: thomas-kwa, Raemon
comment by Thomas Kwa (thomas-kwa) · 2023-10-03T05:50:26.161Z · LW(p) · GW(p)

It's a good thought to try different media, but empirically, we had a text channel open and it wasn't as useful.

comment by Raemon · 2023-10-03T04:58:30.364Z · LW(p) · GW(p)

Seems maybe true, although I also think there's generally a lot lower bandwidth over text, and often text is more time-intensive to write. (Not saying it's necessarily not worth the tradeoff overall, but I'd probably bet against it being obviously better)

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-10-03T20:51:26.805Z · LW(p) · GW(p)

Also, I figure this would be a good time to say: I have tremendous respect for all of you (the authors), thanks & keep up the good work! <3

Replies from: thomas-kwa
comment by Thomas Kwa (thomas-kwa) · 2023-10-04T07:53:00.636Z · LW(p) · GW(p)

I really have to thank Jacob and Ray for spending several hours each facilitating this dialogue. I had been meaning to write something like this up ever since the project ended last month, and couldn't really put pen to paper for some reason. It's possible that without them this post would never have happened, or been much worse.

comment by Gurkenglas · 2023-10-03T17:39:34.714Z · LW(p) · GW(p)

infohazards could be limited by being public until we start producing impressive results, then going private.

That is not how information theory works!

Replies from: thomas-kwa
comment by Thomas Kwa (thomas-kwa) · 2023-10-03T18:24:27.935Z · LW(p) · GW(p)

Theoretically yes, but the field of machine learning is not steered by what can be theoretically deduced from all information. My guess is that not publishing the first paper with impressive results is pretty important, and there are ways to get feedback short of that, like talking to people, or writing an Alignment Forum post for less dangerous ideas like we did once [LW · GW].

The full quote is also "object level infohazards could be limited"; subtle mindset infohazards likely present a trickier balance to strike and we'd have to think about them more.

comment by jacobjacob · 2023-10-05T17:45:56.150Z · LW(p) · GW(p)

Curated! (And here's the late curation notice) 

I don’t really know what I think about retrospectives in general, and I don’t always find them that helpful, because causal attribution is hard. Nonetheless, here are some reasons why I wanted to curate this:

  • I like that it both covers kind of a broad spectrum of stuff that influences a research project, and also manages to go into some interesting detail: high-level research direction and its feasibility, concrete sub-problems attempted and the outcome, particular cognitive and problem-solving strategies that were tried, as well as motivational and emotional aspects of the project. Hearing about what it was like when the various agent confusions collided with the various humans involved, was quite interesting and I feel like it actually gave me a somewhat richer view of both
  • It discusses some threads that seem important and that I’ve heard people talk about offline, but that I’ve seen less discussed online recently (the impact of infohazard policies, ambiguous lines between safety and capabilities research and how different inside views might cause one to pursue one rather than the other, people’s experience of interfacing with the MIRI_Nate way of doing research and communicating)
  • It holds different perspectives from the people involved in the research group, and I like how the dialogues feature kind of allows each to coexist without feeling a need to squeeze them into a unified narrative (the way things might feel if one were to coauthor a post or paper).

(Note: I chose to curate this, and I am also listed as a coauthor. I think this is fine because ultimately the impetus for writing up this content came from Thomas. Me and Raemon mostly just served as facilitators and interlocutors helping him get this stuff into writing.)

comment by ryan_b · 2023-10-04T13:12:21.535Z · LW(p) · GW(p)

Question: why is a set of ideas about alignment being adjacent to capabilities only a one-way relationship? More directly, why can't this mindset be used to pull alignment gains out of capability research?

Replies from: nathaniel-monson, thomas-kwa
comment by Nathaniel Monson (nathaniel-monson) · 2023-10-04T17:28:27.590Z · LW(p) · GW(p)

I think it is 2-way, which is why many (almost all?) Alignment researchers have spent a significant amount of time looking at ML models and capabilities, and have guesses about where those are going.

comment by Thomas Kwa (thomas-kwa) · 2023-10-05T00:57:28.923Z · LW(p) · GW(p)

Not sure exactly what the question is, but research styles from ML capabilities have definitely been useful for alignment, e.g. the idea of publishing benchmarks.

comment by leogao · 2023-10-03T18:00:37.254Z · LW(p) · GW(p)

I agree with this sentiment ("having lots of data is useful for deconfusion") and think this is probably the most promising avenue for alignment research. In particular, I think we should prioritize the kinds of research that give us lots of bits [LW · GW] about things that could matter. Though from my perspective actually most empirical alignment work basically fails this check, so this isn't just a "empirical good" take.

Replies from: jacobjacob
comment by jacobjacob · 2023-10-03T18:11:29.186Z · LW(p) · GW(p)

Reacted with "examples", but curious about examples/papers/etc both of things you think give lots of bits and things that don't. 

comment by Algon · 2023-10-06T16:02:12.730Z · LW(p) · GW(p)

So is this ability related to Nate's supposed ability to "keep reframing the problem ever-so-slightly until the solution seems obvious"? And does he, in fact, have that later ability? I want that ability. It would speed things up so much. But I've got few leads for how to acquire it.

comment by Chris_Leong · 2023-10-03T09:58:49.087Z · LW(p) · GW(p)

But to me, this felt like a confusion about the definition of the word "could"

Could you explain what you see as the confusion?

Replies from: thomas-kwa
comment by Thomas Kwa (thomas-kwa) · 2023-10-07T05:45:19.983Z · LW(p) · GW(p)

Suppose that you're pondering free will, and get to a mental state where you say "it seems like Asher could have driven off the cliff, but it also seems like Asher could not have driven off the cliff", where you have two tensions in conflict.

Here you might try to clarify things by saying could_A = "Asher has the feeling of ability to drive off the cliff" and could_B = "Asher will drive off the cliff if you run the deterministic universe forward".

If you let two sides of you argue about which side is right, pick a definition of "could", conclude from this definition that side A or side B is right, and drop it there, you've lost one of your intuitions and thus lost the path to more insights. If you instead define could_A and could_B, then conclude that side A is right under definition A, but side B is right under definition B, then drop it there, you've also lost the intuitions. Either way it seems like you're just internally arguing about whether a taco is a sandwich, and you're not on track to inventing functional decision theory.

I think there's some mindset to inhabit where you can stay in touch with both intuitions long enough to actually lay out the assumptions behind each poorly-defined position, and examine them until you get to Eliezer's position on free will, and then go invent FDT. This seems harder to do in agent foundations than in the average scientific field, but Nate claims it's possible. The hardest part for me was having the intuitions in the first place.

Replies from: Chris_Leong, TAG
comment by Chris_Leong · 2023-10-07T11:41:14.477Z · LW(p) · GW(p)

I claim that you also need a could_C "Asher counterfactually could drive off the cliff" unless you want to be eliminativist about counterfactuals.

I've written about this here [LW · GW]. Eliezer seems to have found the same solution that I did for the student and exam problem: there's a distinction between a) being fixed independently of your action b) being fixed given your action.

comment by TAG · 2023-10-07T10:09:30.232Z · LW(p) · GW(p)

One exercise we worked through was resolving free will, which is apparently an ancient rationalist tradition [LW · GW]. Suppose that Asher drives by a cliff on their way to work but didn’t swerve. The conflicting intuitions are “Asher felt like they could have chosen to swerve off the cliff”, and “the universe is deterministic, so Asher could not have swerved off the cliff”. But to me, this felt like a confusion about the definition of the word “could”, and not some exciting conflict—it’s probably only exciting when you’re in a certain mindset. [edit: I elaborate in a comment [LW(p) · GW(p)]]

Becoming deconfused doesn't have to mean a) finding the one right answer. It can also mean b) there is more than on the right answer c) there are no right answers.

EYs "dissolution" of free will is very much an a)type: the universe is deterministic , so thefeeling of free will is illusory.

An actual deconfusion would notice that you can't tell whether the universe is deterministic by armchair reflection. And that "Asher could have swerved because the universe isn't deterministic" is another consistent solution.( Only being able to see one so!union feels like lack of confusion, but isnt).

EY pushes an a) type solution without disproving a b) type (dis)solution. The correct approach includes an element of "more research is needed" as well as an element of "depends on what you mean by".

The situation with decision theory is similar ... there needs to be, but there isnt, a debate on whether a single DT can work for every possible agent in every possible universe.