A model of research skill

post by L Rudolf L (LRudL) · 2024-01-08T00:13:12.755Z · 6 comments

This is a link post for https://www.strataoftheworld.com/2024/01/a-model-of-research-skill.html

Contents

  Two failure modes
  Methodology, except “methodology” is too fancy a word
  The Big Three
    Taste
    Paranoia
    Communication
  Other points
    Good things to read on research skill
6 comments

Doing research means answering questions no one yet knows the answer to. Lots of impactful projects are downstream of being good at this. A good first step is to have a model for what the hard parts of research skill are.

Two failure modes

There are two opposing failure modes you can fall into when thinking about research skill.

The first is the deferential one. Research skill is this amorphous, complicated thing, so the only way to be sure you have it is to spend years developing it within some ossified ancient bureaucracy and then have someone in a funny hat hand you a piece of paper (bonus points for Latin being involved).

The second is the hubristic one. You want to do, say, AI alignment research. This involves thinking hard, maybe writing some code, maybe doing some maths, and then writing up your results. You’re good at thinking - after all, you read the Sequences, like, 1.5 times. You can code. You did a STEM undergrad. And writing? Pffft, you’ve been doing that since kindergarten!

I think there’s a lot to be said for hubris. Skills can often be learned well by colliding hard with reality in unstructured ways. Good coders are famously often self-taught. The venture capitalists who thought that management experience and a solid business background were needed to build a billion-dollar company are now mostly extinct.

It’s less clear that research works like this, though. I’ve often heard it said that it’s rare for a researcher to do great work without having been mentored by someone who was themselves a great researcher. Exceptions exist and I’m sceptical that any good statistics exist on this point. However, this is the sort of hearsay an aspiring researcher should pay attention to. It also seems like the feedback signal in research is worse than in programming or startups, which makes it harder to learn.

Methodology, except “methodology” is too fancy a word

To figure out what the hard parts of research skill are, and to steer between deferential confusion and hubristic over-simplicity, I interviewed people who had done good research to try to understand their models of research skill. I also read a lot of blog posts. Specifically, I wanted to understand what about research a bright, agentic, technical person trying to learn at high speed would likely fail at, and either not realise or not be able to fix quickly.

I did structured interviews with Neel Nanda (Google DeepMind; grokking), Lauro Langosco (Krueger Lab; goal misgeneralisation), and one other. I also learned a lot from unstructured conversations with Ferenc Huszar, Dmitrii Krasheninnikov, Sören Mindermann, Owain Evans, and several others. I then procrastinated on this project for 6 months (or rather, touched grass and formed inside views by doing the MATS research program under the mentorship of Owain Evans). I owe a lot to the people I spoke to and their willingness to give their time and takes, but my interpretation and model should not be taken as one they would necessarily endorse.

My own first-hand research experience consists mainly of a research-oriented CS (i.e. ML) master’s degree, followed by working as a full-time researcher for 6 months and counting. There are many who have better inside views than I do on this topic.

The Big Three

In summary:

  1. There are a lot of ways reality could be (i.e. hypotheses), and a lot of possible experiment designs. You want to avoid brute-forcing your way through these large spaces as much as possible, and instead be good at picking likely-true hypotheses or informative experiments. Being good at this is called research taste, and it’s largely an intuitive thing that develops over a lot of time spent engaging with a field.
  2. Once you have some bits of evidence from your experiment, it’s easy to over-interpret them (perhaps you interpret them as more bits than they actually are, or perhaps you were failing to consider how large the hypothesis space is to begin with). To counteract this, you need sufficient paranoia about your results, which mainly just takes careful and creative thought, and good epistemics.
  3. Finally, you need to communicate your results to transfer those bits of evidence into other people’s heads, because we live in a society.
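As a rough illustration of what “bits of evidence” means in the second point, the sketch below uses the standard definition: an observation E carries log2(P(E | H) / P(E | not-H)) bits in favour of a hypothesis H. The probabilities are made-up illustrative numbers and the helper names are hypothetical. It shows that a result only twice as likely under your hypothesis is worth just one bit, and that if your hypothesis started as one of 1,024 equally plausible options (10 bits of entropy), one bit barely moves it.

```python
import math


def bits_of_evidence(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bits of evidence that observation E gives for H over not-H (log2 likelihood ratio)."""
    return math.log2(p_e_given_h / p_e_given_not_h)


def update_odds(prior_odds: float, bits: float) -> float:
    """Each bit of evidence doubles the odds in favour of H."""
    return prior_odds * 2 ** bits


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of H into a probability of H."""
    return odds / (1 + odds)


# Made-up numbers: the result is 60% likely if H is true, but still 30% likely
# if H is false (say, because of a confounder you had not thought of).
b = bits_of_evidence(0.6, 0.3)
print(f"{b:.2f} bits")  # 1.00 bits

# Suppose H is one of 1,024 equally plausible hypotheses: prior odds are 1 : 1023.
prior_odds = 1 / 1023
posterior = odds_to_probability(update_odds(prior_odds, b))
print(f"P(H) goes from {1 / 1024:.4f} to {posterior:.4f}")
```

Overestimating P(E | H), or underestimating P(E | not-H) by missing an alternative explanation, inflates the bit count; forgetting how many alternatives exist inflates the prior.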

Taste

Empirically, it seems that a lot of the value of senior researchers lies in a better sense of which questions are important to tackle, and better judgement for what angles of attack will work. For example, good PhD students often say that even if they’re generally as technically competent as their adviser and read a lot of papers, their adviser has much better quick judgements about whether something is a promising direction.

When I was working on my master’s thesis, I had several moments where I was working through some maths and got stuck. I’d go to one of my supervisors, a PhD student, and they’d have some ideas on angles of attack that I hadn’t thought of. We’d work on it for an hour and make more progress than I had in several hours on my own. Then I’d go to another one of my supervisors, a professor, and in fifteen minutes they’d have tried something that worked. Part of this is experience making you faster at crunching through derivations, and knowing things like helpful identities or methods. But the biggest difference seemed to be a good gut feeling for what the most promising angle or next step is.

I think the fundamental driver of this effect is dealing with large spaces: there are many possible ways reality could be (John Wentworth talks about this here), and many possible things you could try, and even being slightly better at homing in on the right things helps a lot. Let’s say you’re trying to prove a theorem that takes 4 steps to prove. If you have an 80% chance of picking the right move at each step, you’ll have a 41% chance of success per attempt. If that chance is 60%, you’ll have a 13% chance – over 3 times less. If you’re trying to find the right hypothesis within some hypothesis space, and you’ve already managed to cut down the entropy of your probability distribution over hypotheses to 10 bits, you’ll be able to narrow down to the correct hypothesis faster and with fewer bits than someone whose entropy is 15 bits (and whose search space is therefore effectively 2^5 = 32 times as large). Of course, you’re rarely chasing down just a single hypothesis in a defined hypothesis class. But if you’re constantly 5 bits of evidence ahead of someone in what you’ve incorporated into your beliefs, you’ll make weirdly accurate guesses from their perspective.
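The arithmetic in the paragraph above can be checked in a few lines. This is a sketch under the same simplifying assumptions as the text: each proof step is an independent guess with a fixed chance of being right, and the “effective search space” is taken to be 2 raised to the entropy in bits. The function names are illustrative, not from any library.

```python
def chance_all_steps_right(p_step: float, n_steps: int) -> float:
    """Probability an attempt succeeds if every one of n independent steps must be right."""
    return p_step ** n_steps


def search_space_ratio(my_entropy_bits: float, their_entropy_bits: float) -> float:
    """How many times larger the other person's effective hypothesis space is."""
    return 2 ** (their_entropy_bits - my_entropy_bits)


strong = chance_all_steps_right(0.8, 4)  # 0.8^4 = 0.4096
weak = chance_all_steps_right(0.6, 4)    # 0.6^4 = 0.1296
print(f"{strong:.0%} vs {weak:.0%}, ratio {strong / weak:.1f}x")  # 41% vs 13%, ratio 3.2x
print(search_space_ratio(10, 15))  # 32
```

Because per-step success probabilities multiply, a modest edge at each step compounds into a large difference in overall success rate.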

Why does research taste seem to correlate so strongly with experience? I think it’s because the bottleneck is seeing and integrating evidence into your (both explicit and intuitive) world models. No one is close to having integrated all empirical evidence that exists, and new evidence keeps accumulating, so the returns from reading and seeing more don’t run out. (In addition to literal experiments, I count things like “doing a thousand maths problems in this area of maths” as “empirical” evidence for your intuitions about which approaches work; I assume this gets distilled into half-conscious intuitions that your brain can then use when faced with similar problems in the future.)

This suggests that the way to speed-run getting research taste is to see lots of evidence about research ideas failing or succeeding. To do this, you could:

  1. Have your own research ideas, and run experiments to test them. The feedback quality is theoretically ideal, since reality does not lie (but may be constrained by what experiments you can realistically run, and a lack of the paranoia that I talk about next). The main disadvantage is that this is often slow and/or expensive.
  2. Read papers to see whether other people’s research ideas succeeded or failed. This is prone to several problems:
    1. Biases: in theory, published papers are drawn from the set of ideas that ended up working, so you might not see negative samples (which is bad for learning). In practice, paper creation and selection processes are imperfect, so you might see lots of bad or poorly-communicated ones.
    2. Passivity: it’s easy to fool yourself into thinking you would’ve guessed the paper ideas beforehand. Active reading strategies could help; for example, read only the paper’s motivation section and write down what experiment you’d design to test it, and then read only the methodology section and write down a guess about the results.
  3. Ask someone more experienced than you to rate your ideas. A mentor’s feedback is not as good as reality’s, but you can get it a lot faster (at least in theory). The speedup is huge: a big ML experiment might take a month to set up and run, but you can probably get detailed feedback on 10 ideas in an hour of conversation. This is a ~7000x speedup. I suspect a lot of the value of research mentoring lies here: an enormous number of predictable failures or inefficiently targeted ideas can be skipped or honed into better ones, before you spend time running the expensive test of actually checking with reality. (If true, this would imply that the value of research mentorship is higher whenever feedback loops are worse.)
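The ~7000x figure in the last point can be reproduced under one assumption the text leaves implicit: the month is counted as wall-clock time (30 days of 24 hours, so 720 hours per experiment), compared with 6 minutes per idea in a one-hour conversation covering 10 ideas. The constant and function names below are hypothetical.

```python
WALL_CLOCK_HOURS_PER_MONTH = 30 * 24  # assumption: a month of wall-clock time = 720 hours


def mentor_feedback_speedup(hours_per_experiment: float,
                            ideas_discussed: int,
                            conversation_hours: float) -> float:
    """Time to test one idea against reality divided by time to get one idea reviewed."""
    # hours_per_experiment / (conversation_hours / ideas_discussed), rearranged
    return hours_per_experiment * ideas_discussed / conversation_hours


print(mentor_feedback_speedup(WALL_CLOCK_HOURS_PER_MONTH, ideas_discussed=10, conversation_hours=1))  # 7200.0
# Counting only ~160 working hours in the month instead:
print(mentor_feedback_speedup(160, ideas_discussed=10, conversation_hours=1))  # 1600.0
```

Either way the ratio is in the thousands; the exact number depends on how you count the cost of an experiment.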

Chris Olah has a list of suggestions for research taste exercises (number 1 is essentially the last point on my list above).

Research taste takes the most time to develop, and seems to explain the largest part of the performance gap between junior and senior researchers. It is therefore the single most important thing to focus on developing.

(If taste is so important, why does research output not increase monotonically with age in STEM fields? The scary biological explanation is that fluid intelligence (or energy or …) starts dropping at some age, and this decreases your ability to execute on maths/code, even assuming your research taste is constant or improving. Alternatively, hours used on deep technical work might tend to decline with advanced career stages.)

Paranoia

I heard several people saying that junior researchers will sometimes jump to conclusions, or interpret their evidence as saying more than it actually does. My instinctive reaction to this is: “wait, but surely if you just creatively brainstorm the ways the evidence might be misleading, and take these into account in making your conclusions (or are industrious about running additional experiments to check them), you can just avoid this failure mode?” The average answer I got was that yes, this seems true, and indeed many people either only need one peer review cycle to internalise this mindset, or pretty much get it from the start. Therefore, I’m almost tempted to chuck this category off this list, and onto the list of less crucial things where “be generally competent and strategic” will sort you out in a reasonable amount of time. However, two things hold me back.

First, confirmation bias is a strong thing, and it seems helpful to wave a big red sign saying “WARNING: you may be about to experience confirmation bias”.

Second, I think this is one of the cases where the level of paranoia required is sometimes more than you expect, even after you expect it will be high. John Wentworth puts this best in You Are Not Measuring What You Think You Are Measuring, which you should go read right now. There are more confounders and weird effects than are dreamt of in your philosophies.

A few people mentioned going through the peer review process as being a particularly helpful thing for developing paranoia.

Communication

I started out sceptical about the difficulty of research-specific communication, above and beyond general good writing. However, I was eventually persuaded that yes, research-specific communication skills exist and are important.

First, if research has impact, it is through communication. Rob Miles once said (at a talk) something along the lines of: “if you’re trying to ensure positive AGI outcomes through technical work, and you think that you are not going to be one of the people who literally writes the code for it or is in the room when it’s turned on, your path to impact lies through telling other people about your technical ideas.” (This generalises: if you want to drive good policy through your research and you’re not literally writing it …, etc.) So you should expect good communication to be a force multiplier applied on top of everything else, and therefore very important.

Second, research is often not communicated well. On the smaller scale, Steven Pinker moans endlessly – and with good reason – about academic prose (my particular pet peeve is the endemic utilisation of the word “utilise” in ML papers). On the larger scale, entire research agendas can get ignored because the key ideas aren’t communicated in a sufficiently clear and legible way.

I don’t know the best way to speed-run getting good at research communication. Maybe read Pinker to make sure you’re not making predictable mistakes in general writing. I’ve heard that experienced researchers are often good at writing papers, so maybe seek feedback from any you know (but don’t internalise the things they say that are about goodharting for paper acceptance). For papers specifically, understand how papers are read. Some sources of research-specific communication difficulty I can see are (a) the unusually high need for precision (especially in papers), and (b) communicating the intuitive, high-context, and often unverbalised-by-default world models that guide your research taste (especially when talking about research agendas).

Other points

Good things to read on research skill

(I have already linked to some of these above.)

6 comments


comment by Alex_Altair · 2024-01-10T22:23:05.541Z

I had a side project to get better at research in 2023. I found very few resources that were actually helpful to me. But here are some that I liked.

  • A few posts by Holden Karnofsky on Cold Takes, especially Useful Vices for Wicked Problems and Learning By Writing.
  • Diving into deliberate practice. Most easily read is the popsci book Peak. This book emphasizes "mental representations", which I find the most useful part of the method, though I think it's also the least supported by the science.
  • The popsci book Grit.
  • The book Ultralearning. Extremely skimmable, large collection of heuristics that I find essential for the "lean" style of research.
  • Reading a scattering of historical accounts of how researchers did their research, and how it came to be useful. (E.g. Newton, Einstein, Erdős, Shannon, Kolmogorov, and a long tail of less big names.)

(Many resources were not helpful for me for reasons that might not apply to others; I was already doing what they advised, or they were about how to succeed inside academia, or they were about emotional problems like lack of confidence or burnout. But, I think mostly I failed to find good resources because no one knows how to do good research.)

Replies from: aysja
comment by aysja · 2024-01-25T00:03:44.195Z

Seconded! I love Holden's posts on wicked problems, I revisit them like once a week or whenever I'm feeling down about my work :p

I've also found it incredibly useful to read historical accounts of great scientists. There's just all kinds of great thinking tips scattered among biographies, many of which I've encountered on LessWrong before, but somehow seeing them in the context of one particular intellectual journey is very helpful. 

Reading Einstein's biography (by Walter Isaacson) was by far my favorite. I felt like I got a really good handle for his style of thinking (e.g., how obsessed he was with unity—like how he felt it “unbearable” that there should be an essential difference between a magnet moving through a conducting coil and a coil moving around a magnet, although the theories at the time posited such a difference; his insistence on figuring out the physical meaning of things—with special relativity, this was the operationalization of "time," with quanta this was giving meaning to an otherwise mathematical curiosity that Planck had discovered; his specific style of thought experiments; and just a sense of how wonderful and visceral his curiosity about the world was, like how as a very young child his father brought him a compass, and as he watched the needle align due to some apparently hidden force field he trembled and grew cold at the prospect of non-mechanical causes). He's so cool! 

Replies from: mattmacdermott
comment by mattmacdermott · 2024-01-25T11:20:40.010Z

Any other biography suggestions?

comment by domenicrosati · 2024-01-08T14:14:37.862Z

I think people underestimate formal study of research methods like reading texts / taking a course on research methodology for improving research abilities.

There are many concepts within control, experimental design, validity, and reliability, like construct validity or conclusion validity, that you would learn from a research methods textbook and that are super helpful for improving the quality of research. I think many researchers implicitly learn these things without ever knowing exactly what they are, but usually through trial and error (peer review rejections and embarrassment), which can be avoided by looking into research methods texts.

Of course this doesn't help with the discovery aspect of research which I think your article is good at outlining but at some point research questions need to be investigated and understanding research design makes it really obvious the kinds of work you need to do in order to have a high quality investigation.

Replies from: LRudL
comment by L Rudolf L (LRudL) · 2024-01-08T14:49:44.533Z

Do you have a recommendation for a good research methods textbook / other text?

comment by Jonas Hallgren · 2024-01-08T16:42:47.182Z

Great post!

I wanted to mention something cool I learnt the other day, which is that Buddhism was actually created with a lot of the cultural baggage already there. (This is a relevant point, let me cook)

The Buddha actually only came up with the new invention of "Dependent Origination". This led to a view of the inherent emptiness (read underdeterminedness) of phenomenology. Yet it was only one invention on top of the rest that led to a view that in my opinion reduces a lot of suffering.

Similarly, human evolution to where we are today is largely a process of cultural evolution, as described in The Secret Of Our Success.

What I want to say is that ideas are built on other ideas and that Great Artists Steal. (Also a book)

A final statistic: interdisciplinary researchers generally have more influential papers than specialised researchers.

So what is the take away for me? Well by sampling from independent sources of information you gain a lot more richness in your models. I therefore am trying to slap together dynamical systems, Active Inference and Boundaries at the moment as they seem to have a lot in common that seems relevant for embedded agents.

(Extra note is that GPT is actually really good at generating leads in between different areas of study. Especially biology + ML.)