How to do theoretical research, a personal perspective

post by Mark Xu (mark-xu) · 2022-08-19T19:41:21.562Z · LW · GW · 6 comments

Contents

  How to do research
    Figuring out what you want to happen in real-world cases
      ELK Examples
      Potshot algorithms
    Translating what you want in real-world cases into desiderata for simple cases
      ELK Example
    Articulating an algorithm for solving simple cases
      ELK Examples
    Finding cases where your algorithm doesn’t do what you want
      ELK Examples
    Other random tips
6 comments

"Where do new [algorithms] come from? I keep reading about someone who invented [an algorithm] to do something-or-other but there's no mention of how."

A shrug of robed shoulders. "Where do new books come from, Mr. Potter? Those who read many books sometimes become able to write them in turn. How? No one knows."

"There are books on how to write -"

"Reading them will not make you a famous playwright. After all such advice is accounted for, what remains is mystery. The invention of new [algorithms] is a similar mystery of purer form."

- Harry Potter and the Methods of Rationality

Most of the content in this document came out of extensive conversations with Paul Christiano.

(This document describes one way of thinking about how to do one particular type of research. There are other ways to productively do this kind of research, and other productive kinds of research. Consider this document peppered with phrases like “from my perspective”, “I think”, “my sense is”, etc.)

A lot of people have a vague mental picture of what empirical research looks like, which consists of exploring data, articulating hypotheses about the data, and running experiments that potentially falsify those hypotheses. I think people lack a similarly mechanistic picture of what theoretical research looks like, which results in them not knowing how to do theory, being skeptical of the possibility of theoretical progress, etc. I do think the difficulty of getting high-quality real-world feedback makes theoretical research more difficult than empirical research, but I think it’s possible to get enough real-world feedback when doing theory that you can still expect to make steady progress.

Currently, I think many people think of theory as “someone sits in a room and has a brilliant insight to solve the problem.” Instead, I think a more accurate picture is very similar to the picture one has for empirical research: the theorist explores some data, gradually builds an intuitive sense of what’s going on, articulates hypotheses that capture their intuition, and falsifies their hypotheses by testing them against data, all the while iteratively building up their understanding.

The key difference between the empirical researcher and the theoretical researcher is that while the empirical researcher can build intuition and falsify hypotheses by considering real-world data, the theorist, although ultimately grounded in the real world, must build intuition and falsify hypotheses by considering thought experiments, simple toy examples, computations, etc. Even mathematics, the purest of intellectual pursuits, roughly follows this process of iteration. Terence Tao:

[A]ctual solutions to a major problem tend to be arrived at by a process more like the following (often involving several mathematicians over a period of years or decades, with many of the intermediate steps described here being significant publishable papers in their own right):

  1. Isolate a toy model case x of major problem X.
  2. Solve model case x using method A.
  3. Try using method A to solve the full problem X.
  4. This does not succeed, but method A can be extended to handle a few more model cases of X, such as x’ and x”.
  5. Eventually, it is realised that method A relies crucially on a property P being true; this property is known for x, x’, and x”, thus explaining the current progress so far.
  6. Conjecture that P is true for all instances of problem X.
  7. Discover a family of counterexamples y, y’, y”, … to this conjecture. This shows that either method A has to be adapted to avoid reliance on P, or that a new method is needed.
  8. Take the simplest counterexample y in this family, and try to prove X for this special case. Meanwhile, try to see whether method A can work in the absence of P.
  9. Discover several counterexamples in which method A fails, in which the cause of failure can be definitively traced back to P. Abandon efforts to modify method A.
  10. Realise that special case y is related to (or at least analogous to) a problem z in another field of mathematics. Look up the literature on z, and ask experts in that field for the latest perspectives on that problem.
  11. Learn that z has been successfully attacked in that field by use of method B. Attempt to adapt method B to solve y.
  12. After much effort, an adapted method B’ is developed to solve y.
  13. Repeat the above steps 1-12 with A replaced by B’ (the outcome will of course probably be a little different from the sample storyline presented above). Continue doing this for a few years, until all model special cases can be solved by one method or another.
  14. Eventually, one possesses an array of methods that can give partial results on X, each having their own strengths and weaknesses. Considerable intuition is gained as to the circumstances in which a given method is likely to yield something non-trivial or not.
  15. Begin combining the methods together, simplifying the execution of these methods, locating new model problems, and/or finding a unified and clarifying framework in which many previous methods, insights, results, etc. become special cases.
  16. Eventually, one realises that there is a family of methods A^* (of which A was the first to be discovered) which, roughly speaking, can handle all cases in which property P^* (a modern generalisation of property P) occurs. There is also a rather different family of methods B^* which can handle all cases in which Q^* occurs.
  17. From all the prior work on this problem, all known model examples are known to obey either P^* or Q^*. Formulate Conjecture C: all cases of problem X obey either P^* or Q^*.
  18. Verify that Conjecture C in fact implies the problem. This is a major reduction!
  19. Repeat steps 1-18, but with problem X replaced by Conjecture C. (Again, the storyline may be different from that presented above.) This procedure itself may iterate a few times.
  20. Finally, the problem has been boiled down to its most purified essence: a key conjecture K which (morally, at least) provides the decisive input into the known methods A^*, B^*, etc. which will settle conjecture C and hence problem X.
  21. A breakthrough: a new method Z is introduced to solve an important special case of K.
  22. The endgame: method Z is rapidly developed and extended, using the full power of all the intuition, experience, and past results, to fully settle K, then C, and then at last X.
  23. The technology developed to solve major problem X is adapted to solve other related problems in the field. But now a natural successor question X’ to X arises, which lies just outside of the reach of the newly developed tools… and we go back to Step 1.

How to do research

The Research methodology section of the ELK report and Paul’s My research methodology both articulate a high-level picture of the basic theoretical research loop, but lack details about what steps one takes besides “propose solutions” and “generate counterexamples”.

To lend more color to these vague descriptions and provide examples comprehensible to readers without an extensive background in number theory, I will describe what I think of as “modes” of research, articulate some key questions that get asked, and provide (stylized) historical examples related to ELK. I will approach this from the perspective of designing an algorithm, but this basic description will apply to many possible endeavors.

Suppose I’m trying to design an algorithm to accomplish task T (e.g. elicit latent knowledge) over a variety of situations S (e.g. ways my AI could look internally). My goal as a theorist is to develop a sufficiently accurate, unified, and precise intuition of how I hope to accomplish task T in every situation S such that I can just formalize the rules I’m using in my head into an algorithm and my problem has been solved. You might also say that the goal of someone trying to discover the laws of nature is to develop a precise enough model of nature in their head that they can just write it down and they have a law of nature.

This process of algorithm development roughly proceeds as follows:

  1. Figure out what I want to happen in real-world cases
  2. Figure out what I want to happen in simpler cases
  3. Articulate a general algorithm that does what I want in simple cases
  4. Search for examples where the algorithm does something that I don’t want
  5. Become tentatively convinced that a certain class of algorithms can’t work
  6. Refine my sense of what I want to happen in simpler cases
  7. Articulate another general algorithm that does what I want in simple cases
  8. Search for examples where the algorithm does something that I don’t want
  9. ...
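
To make the shape of this loop concrete, here is a minimal sketch in Python. Every name in it (`Case`, `propose`, `desideratum`) is hypothetical scaffolding rather than anything from our actual tooling, and in real theoretical research the `propose` step is the human theorist, not a function; the point is only that the loop bottoms out in concrete cases you can check.

```python
# A minimal sketch of the propose/counterexample loop described above.
# All names here are illustrative placeholders.
from typing import Callable, Iterable, Optional

Case = dict          # a situation, ideally simple enough to write down explicitly
Behavior = str       # what the algorithm does in that situation
Algorithm = Callable[[Case], Behavior]

def find_counterexample(
    algorithm: Algorithm,
    cases: Iterable[Case],
    desideratum: Callable[[Case, Behavior], bool],
) -> Optional[Case]:
    """Return the first case where the algorithm violates what you want."""
    for case in cases:
        if not desideratum(case, algorithm(case)):
            return case
    return None

def research_loop(
    propose: Callable[[list], Algorithm],   # in practice: the theorist's intuition
    cases: list,
    desideratum: Callable[[Case, Behavior], bool],
    max_rounds: int = 100,
) -> Optional[Algorithm]:
    """Alternate articulating algorithms (steps 3/7) with hunting failures (steps 4/8)."""
    known_failures: list = []
    for _ in range(max_rounds):
        candidate = propose(known_failures)
        failure = find_counterexample(candidate, cases, desideratum)
        if failure is None:
            return candidate            # survives every case you know how to check
        known_failures.append(failure)  # steps 5/6: refine your sense of what you want
    return None
```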

I think of this process as roughly having four key modes:

  1. Figuring out what you want to happen in real-world cases
  2. Translating what you want in real-world cases into desiderata for simple cases
  3. Articulating an algorithm for solving simple cases
  4. Finding cases where your algorithm doesn’t do what you want

I will describe these modes as happening in sequence, but in practice they’re all happening at the same time, just with different amounts of emphasis. It’s common to spend a day or two in a particular mode. I would begin worrying if I spent more than ~three days trying to do a particular step in isolation, without the feedback loop that comes from transitioning between steps.

Figuring out what you want to happen in real-world cases

Ultimately, your intuition for what you want your algorithm to do has to be anchored in what you actually want to happen in the real world. The way to develop this intuition is to roughly “solve” the task by hand in many examples, and let the examples wash over you and percolate into your intuition. The key move in this research mode is asking questions of the form “Suppose the world was in situation S; what would accomplishing task T look like?”

Questions that I find helpful to ask myself:

Your goal in this mode is to develop an extremely precise sense of exactly what you do/don’t want to happen in a handful of cases to serve as the final arbiter for whether you’ve succeeded at developing an algorithm. It’s generally okay to be extremely unsure of what you want to happen in a large number of cases/not know exactly how to handle a lot of scenarios. However, it’s often worth spending some time trying to articulate high-level hopes for various cases that you’re confused about how to handle (e.g. Indirect normativity: defining a utility function)
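
To picture the artifact this mode produces: a handful of “golden” cases, each with a hand-decided verdict, serving as that final arbiter. Below is a minimal sketch, loosely modeled on the diamond-in-the-vault scenario from the ELK report; the case names and verdict strings are illustrative stand-ins, not quotes from the report.

```python
# Hypothetical "golden cases": situations worked through by hand, each recording
# the behavior you decided you do or don't want. Later algorithm proposals get
# checked against these. All names and strings are illustrative only.
golden_cases = {
    "sensors fine, diamond present": {
        "wanted": "report 'diamond is present'",
        "unwanted": "report whatever a human watching the screen would say",
    },
    "sensors tampered with, diamond stolen": {
        "wanted": "report 'diamond is absent'",
        "unwanted": "report 'diamond is present' because the screen looks fine",
    },
}

def arbitrate(proposal_behavior: dict) -> list:
    """Return the golden cases where a proposed algorithm does the wrong thing."""
    return [
        name for name, verdicts in golden_cases.items()
        if proposal_behavior.get(name) != verdicts["wanted"]
    ]
```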

ELK Examples

Historically, we started by considering cases like:

A small breakthrough was when we articulated the Game of Life Example, which gave us an extremely precise sense of what we wanted to happen in at least one case.

Other cases that one might consider:

Potshot algorithms

Ideally, once you’ve developed a precise sense of what you want to happen for real-world cases, you can iterate on algorithms by checking to see if they do what you want. Unfortunately, real-world cases are often too complicated, which means that:

  1. It’s really hard to articulate algorithms that handle real-world cases without first iterating on simple cases. Most complicated problems need to be broken down into parts and handled separately.
    1. Example: writing complicated software consists of writing modular functions and composing them together.
  2. Even if you articulate an algorithm that might work, it’s really hard to tell what it actually does. You can still discard algorithms if you think you can tell a plausible story of that algorithm failing, but this requires a lot of judgment about what counts as “plausible.”
    1. Example: It’s very hard to tell what “train a model to imitate what a human would say, hope it generalizes naturally” will actually do in the real world, so you can’t iterate on that. You can try to iterate on stories about bad things that might happen, e.g. Bad behavior: do inference in the human Bayes net, but this is still less than ideal.

It’s still often worth trying to directly solve real-world cases by proposing “potshot” algorithms. Here are two reasons:

  1. Often you’re not sure how hard your problem actually is. If it’s easy, you might just be able to solve it without developing substantial intuition about what the solution will look like.
    • Example: Often you can solve simple math problems via brute symbol manipulation, without really understanding what the proof “means”.
  2. It’s worth developing a sense for why “naive” algorithms fail, to get a sense of the key barriers to coming up with an algorithm that works. It’s also often worth specifying these algorithms and counterexamples fully to get maximum surface area, even though you don’t expect adding more details to change your bottom line about whether an algorithm fails.
    • Example: you can think of most of the proposals in the ELK report as these sorts of potshots. We had a non-trivial probability that one of them would work, but in the end none of them did and we gained significant intuition about what the key difficulties were likely to be.

Translating what you want in real-world cases into desiderata for simple cases

Once you’re satisfied that potshot approaches are unlikely to work, the hard work of iteration can begin. Since you can’t iterate against real-world cases, you must develop a sense of what you want to happen in cases simple enough that you can work through them by hand. The way to do this is to build simple toy models of real-world situations, and then transfer your hard-won intuition about what you want to happen in the real-world cases onto the simple toy cases.

Questions that I find helpful to ask myself:

Your goal in this mode is to develop an extremely precise sense of what you want to happen for a handful of examples that are simple enough for you to write down formally, evaluate by hand, etc. Again, it’s generally okay to be extremely unsure of what you want to happen in all but a handful of cases.

ELK Example

I’m currently considering cases like:

It seems clear that 'd' would be a good direct translator in this case, while 'x AND y' would be bad. But is 'd AND ¬hx AND ¬hy' also an acceptable direct translator?
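
One possible formalization of this worry, as a sketch: treat x, y, hx, hy, and d as boolean features of the toy case (this is an assumed reading, with d the direct fact and hx, hy human-visible observations; the actual setup has more structure than this). The check below shows why the question has teeth: the two candidate reporters agree on every case where hx and hy are false, so desiderata drawn only from such cases can’t distinguish them.

```python
# A hedged illustration: the booleans x, y, hx, hy, d are stand-ins for
# features of the toy case, not the actual variables from our notes.
from itertools import product

VARS = ["x", "y", "hx", "hy", "d"]

def reporters(v):
    """Three candidate 'direct translators' evaluated on an assignment v."""
    return {
        "d": v["d"],
        "x AND y": v["x"] and v["y"],
        "d AND NOT hx AND NOT hy": v["d"] and not v["hx"] and not v["hy"],
    }

# On every assignment with hx = hy = False, 'd' and 'd AND NOT hx AND NOT hy'
# give identical answers, so simple cases of that shape cannot rule the
# questionable candidate out.
for bits in product([False, True], repeat=5):
    v = dict(zip(VARS, bits))
    if not v["hx"] and not v["hy"]:
        r = reporters(v)
        assert r["d"] == r["d AND NOT hx AND NOT hy"]
print("candidates agree on all cases with hx = hy = False")
```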

This example can be extended in a few ways:

Articulating an algorithm for solving simple cases

Once you have a sufficiently precise sense of what you want to happen, you want to articulate a general algorithm that doesn’t special-case the cases where you know what you want, but nevertheless has the desired behavior anyway. The way to do this is to consider the reasoning that led you to decide what you wanted and try to develop underlying rules or natural generalizations. Often, the process of trying to articulate a general algorithm will point out ambiguities in your sense of what you want to happen, leading to substantial revision/sharpening of your intuition.

Questions that I find helpful to ask myself:

Often, when you try to articulate the general rules behind how you intuitively “solve” simple cases, you’ll find that you were accidentally special-casing one of those simple cases, and you can’t quite see the connection between what you did in case 1 and what you did in case 2. In these situations, you can:

Generally, I think that if you can intuitively “solve” every case, then your intuition must be reliably executing some algorithm that solves every case. Your job is just to sharpen your intuition until it unifies, and then extract the algorithm it’s executing.

ELK Examples

Unfortunately, describing examples in detail would require too much context for me to write down :(.

Finding cases where your algorithm doesn’t do what you want

After you have an algorithm, you want to articulate a case where it doesn’t do what you want. This is generally much easier than other parts of the process, because you have a precise sense of what you want and a precise algorithm.
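
Once both halves are precise, the hunt itself can be made mechanical. Here is a sketch with made-up stand-ins: the particular algorithm and desideratum below are deliberately toy placeholders, not anything from ELK, and the counterexample they produce is correspondingly trivial.

```python
# Mechanical counterexample search: random simple cases, checked against a
# precise desideratum. The algorithm/desideratum pair here is a toy placeholder.
import random

def my_algorithm(situation):
    return sum(situation) * 2 > len(situation)   # reports the majority of the bits

def what_i_want(situation, output):
    return output == situation[0]                # report should track bit 0

def hunt_for_counterexample(trials=10_000, n_bits=8, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        situation = tuple(rng.random() < 0.5 for _ in range(n_bits))
        if not what_i_want(situation, my_algorithm(situation)):
            return situation     # a concrete case to stare at and learn from
    return None

print(hunt_for_counterexample())  # quickly finds a vector whose majority differs from bit 0
```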

Questions that I find helpful to ask myself:

ELK Examples

Again, unfortunately, all the detailed examples I can think of require too much context for me to write down. Eliciting Latent Knowledge has many “worst-case” counterexamples to “potshot” algorithms that might give a general feel, but you typically want to be working more precisely than that.

Other random tips

6 comments

Comments sorted by top scores.

comment by aysja · 2023-10-04T23:05:05.325Z · LW(p) · GW(p)

This seems wrong to me in some important ways (at least as general theoretical research advice). Like, some of the advice you give seems to anti-predict important scientific advances.

Generally, unguided exploration is seldom that useful. 

Following this advice, for instance, would suggest that Darwin not go on the Beagle, i.e., not spend five years exploring the globe (basically just for fun) as a naturalist. But his experiences on the Beagle were exactly what led him to the seeds of natural selection, as he began to notice subtleties like how animals changed ever so slightly as one moves up a continent. It also seems like it screens out a bunch of Faraday’s experimental work on electricity, much of which he did because it seemed interesting or fun, rather than backchaining from some predetermined goal. Like, he has an entire lecture series on candles, which was mostly just him over and over saying “And isn’t it weird that this thing happens, too?? What happens if we change this?” And they’re great, and a lot of that exploratory work laid the groundwork for Maxwell’s later work on electromagnetism.

Cutting off research avenues that are fun to think about, but ultimately not that productive.

Similarly, I think this is one of the main failure modes with modern scientific research. When I look at academia one of the things I’m most hoping for is that people follow their taste more, and that they have more fun! Because often things that are open-ended and fun to play around with hold a deeper kind of logic that you’re attracted to, but haven’t articulated yet. If you only stick to things that seem immediately productive then you (roughly) never find truly novel or cool ideas. E.g., both Babbage and Shannon tinkered around with different coding-type projects when they were younger (cipher cracking and barbed wire telegraphs, respectively), and I think it’s not crazy to assume that this sort of playing around with representing information abstractly may have helped with their later, more ambitious projects (general computers, information theory). Also, many Nobel prize winners say they wouldn’t have been able to do their seminal work in the current environment because, e.g., “Today I wouldn’t get an academic job. It’s as simple as that. I don’t think I would be regarded as productive enough.” (Higgs). Certainly, some things are dead ends and it can be a bit hard to know that in advance, but if you prematurely screen off all of them you screen off the great ideas, too.

I think Altman puts it nicely, here: “Good ideas—actually, no, great ideas are fragile. Great ideas are easy to kill…. All the best ideas when I first heard them sound bad. And all of us, myself included, are much more affected by what other people think of us and our ideas than we like to admit. If you are just four people in your own dorm, and you have an idea that sounds bad but is great, you can keep that self-delusion going. If you’re in a coworking space, people laugh at you, and no one wants to be the kid picked last at recess. So you change your idea to something that sounds plausible but is never going to matter. It’s true that coworking spaces do kill off the very worst ideas, but a band-pass filter for startups is a terrible thing because they kill off the best ideas, too.” (Emphasis mine). Likewise, I think it is perhaps quite load-bearing that many great scientists spent significant portions of their thinking years alone (famously, Newton did this when he came up with Principia, but Darwin and Shannon too, etc.)

On timescales of days and weeks, you should be able to point to concrete examples that constitute "units of progress" towards your final goal.

This also feels pretty wrong to me. Certainly that would be nice and perhaps something to try to aim for, but I don’t think it’s always the case and I don’t think the lack of it is that strong of evidence in favor of “not making progress.” Again, using Darwin as an example—after he noticed that species were mutable he spent about a year and a half trying to figure out why. He had one main insight a few months in—that breeders introduced changes via artificial selection—but it was some time before he put together how nature could act as a selector. And in that year between “artificial” and “natural” selection, I would not say that he was making obvious, concrete progress on the solution, because the solution wasn’t made from obvious steps. He had the right questions, and he read a lot, wrote a lot, talked to breeders, etc., but mostly he just held onto his confusion for a long time. And then one day in a flash of insight, shortly after reading Malthus, the solution came to him in a carriage ride. Certainly not all research looks like this, but I do think it’s an illustrative example of how good theoretical work can come out of non-obvious units of progress.

I know at the beginning you mentioned that this is advice for a particular kind of research from your perspective, and I do think that it’s useful in certain domains. But I worry it’s easy to forget, at the end of a document with many high-level tips, that it’s not general advice on how to do good theoretical alignment work, period. And because I do think that some of this advice anti-predicts great scientific work—in particular the sort that I think alignment is currently most lacking, and the sort that would be the most helpful, were we to have it—I wanted to push back a bit on the idea that many people might walk away with, i.e., that this is general advice for theoretical work in alignment. 


comment by jacob_cannell · 2022-08-21T08:15:50.075Z · LW(p) · GW(p)

A shrug of robed shoulders. "Where do new books come from, Mr. Potter? Those who read many books sometimes become able to write them in turn. How? No one knows."

Funny enough we do now have a pretty plausible model of how this works - in the form of GPT3 and similar LLMs (which are surprisingly similar to linguistic cortex for the same reason that large deep vision models are similar to visual cortex).

Train a big network (ANN or BNN) on sensory stream prediction of text and it ... generates text! In the human case this is just our internal monologue, which we (or some of us, with additional training) can then additionally steer/branch/backtrack/record/edit into higher-quality stories because we also have general planning capability.

comment by Alex_Altair · 2023-05-09T22:58:09.000Z · LW(p) · GW(p)

I'd be interested to know how much you (or other readers) think this content carries over to other areas of research that aren't so specifically "the kind of theory ARC does". For example:

On timescales of days and weeks, you should be able to point to concrete examples/algorithms that constitute “units of progress” towards your final goal.

Is "days or weeks" the right scale here for, say, research in computational complexity? Or other alignment research?

comment by jungofthewon · 2022-08-20T16:28:11.169Z · LW(p) · GW(p)

This was really helpful and fun to read. I'm sure it was nontrivial to get to this level of articulation and clarity. Thanks for taking the time to package it for everyone else to benefit from. 

comment by Thomas Kwa (thomas-kwa) · 2022-08-19T21:54:11.322Z · LW(p) · GW(p)

"Terrance Tao" should be "Terence Tao"

"while the x OR y would be bad" should maybe be "while 'x AND y' would be bad"?

Replies from: mark-xu
comment by Mark Xu (mark-xu) · 2022-08-20T14:48:52.060Z · LW(p) · GW(p)

oops thanks

yeah, should be x AND y.