Evaluations (of new AI Safety researchers) can be noisy

post by LawrenceC (LawChan) · 2023-02-05T04:15:02.117Z · LW · GW · 10 comments

Contents

  Introduction: evaluating skill is hard, and most evaluations are done via proxies
    My personal experience
  Why exactly are common evaluations so noisy?
    Bootcamp/Funding/Job Applications
    First impressions at parties/conferences/workshops
    Job Performance
  Yes, this includes your evaluations as well. 
    On anxious underconfidence and self-handicapping
  What does this mean you should do?
  Acknowledgments
  Appendix: testimonials from other researchers
    Addendum from Beth Barnes
    Addendum from Scott Emmons
    Addendum from anonymous senior AGI safety researcher

TL;DR: Evaluating whether or not someone will do well at a job is hard, and evaluating whether or not someone has the potential to be a great AI safety researcher is even harder. This applies to evaluations from other people (e.g. job interviews, first impressions at conferences) but especially to self-evaluations. Performance is also often idiosyncratic: people who do poorly in one role may do well in others, even superficially similar ones. As a result, I think people should not take rejections or low self-confidence so seriously, and instead try more things and be more ambitious in general. 

Related work: Hero Licensing [LW · GW], Modest Epistemology [? · GW], The Alignment Community is Culturally Broken [LW · GW], Status Regulation and Anxious Underconfidence [LW · GW], Touch reality as soon as possible [LW · GW], and many more. 

Epistemic status: This is another experiment in writing fast as opposed to carefully. (Total time spent: ~4 hours.) Please don’t injure yourself using this advice.[1] 


Introduction: evaluating skill is hard, and most evaluations are done via proxies

I think people in the LessWrong/Alignment Forum space tend to take negative or null evaluations of themselves too seriously.[2] For example, I’ve spoken to a few people who gave up on AI Safety after being rejected from SERI MATS and REMIX; I’ve also spoken to far too many people who are too scared to apply for any position in technical research after having a single negative interaction with a top researcher at a conference. While I think people should be free to give up whenever they want, my guess is that most people internalize negative evaluations too much, and would do better if they did less fretting and more touching reality. 

Fundamentally, this is because evaluations of new researchers are noisier than you think. Interviews and applications are not always indicative of the applicant’s current skill. First impressions, even from top researchers, do not always reflect reality. People can perform significantly differently in different work environments, so failing at a single job does not mean that you are incompetent. Most importantly, people can and do improve over time with effort. 

In my experience, a lot of this over-updating on negative evaluations comes from something like anxious underconfidence as opposed to reasoned arguments. It’s always tempting to confirm your own negative evaluations of yourself. And if you’re looking for reasons why you’re not “good enough” in order to handicap yourself, being convinced that one particular negative evaluation is not the end of the world will just make you overupdate on a different negative evaluation. Accordingly, I think it’s important to take things a little less seriously, be willing to try more things, and let your emotions more accurately reflect your situation. 

Of course, that’s not to say that you should respond to any negative sign by pushing yourself even harder; it’s okay to take time to recover when things don’t go well. But I strongly believe that people in the community give up a bit too easily, and are a bit too scared to apply to jobs and opportunities. In some cases, people give up even before the first external negative evaluation: they simply evaluate themselves negatively in their head, and then give up. Instead of doing this, you should try your best and put yourself out there, and let reality be the judge. 

My personal experience

I’m always pretty hesitant to use myself as an example, both because I’m not sure I’m “good enough” to qualify, and also because I think people should aspire to do better than I have. That being said, in my case:

I’m currently a researcher on the Alignment Research Center’s Evaluations team, was previously at Redwood Research and a PhD student at CHAI, have received offers from other AI labs, am on the board of FAR, and have been involved in 5+ papers I’m pretty proud of in the past year.

In the past, I’ve had a bunch of negative signs and setbacks:

I also don’t think my case (or Beth’s case below) [LW · GW] was particularly unusual; several other AI safety researchers have had similar experiences. So empirically, it’s definitely not the case that a few negative evaluations mean that you cannot ever become an AI safety researcher. 

Why exactly are common evaluations so noisy?

Previously, I mentioned three common evaluation methods (interviews/job applications, first impressions from senior researchers, and jobs/work trial tasks) and claimed that they tend to be noisy. Here, I’ll expand on why each evaluation method can be noisy in detail, even in cases where all parties are acting in good faith. 

This section is pretty long and rambly; feel free to skip to the next header if you feel like you’ve got the point already.

Bootcamp/Funding/Job Applications

By far the most common negative evaluation that most people receive is being rejected from a job or bootcamp, or having a funding application denied. While this is pretty disheartening, there are a few reasons why a rejection may not be as informative as you might expect:

At the end of the day, not every denied application will come with a clearly stated reason. I’d strongly recommend against immediately slapping on “the reason is because I’m bad” to every rejection. 

First impressions at parties/conferences/workshops

Insofar as applications don’t accurately reflect your skill or abilities, first impressions in social settings such as parties, conferences, and workshops are even worse. 

Yes, having negative social interactions always sucks. But a few negative interactions, even with famous or senior researchers, are not a particularly strong sign that you’re not cut out to be an AI researcher. 

Job Performance

It’s definitely true that poor job performance at a research-y job (or even a long work trial) is more of a signal than a rejection or a negative first impression. That being said, I don’t think it’s necessarily that strong of a signal, for the following reasons:

In my case, I think all four of the reasons applied to some extent for the last two years of my PhD: my skills were not super suited to academia, I was depressed in part due to COVID, I had significantly worse executive function, and I don’t think I enjoyed the academic culture at Berkeley very much. Again, while being let go from a job (or leaving due to poor performance) is definitely a negative sign, I think it’s nowhere near fatal for one’s research ambitions in itself. 

Yes, this includes your evaluations as well. 

In practice, people seem more hampered by their own self-assessments than by any external negative evaluations. I think a significant fraction of people I’ve met in this community have suffered from some form or another of imposter syndrome. I’ve also consistently been surprised by how often people fail to apply for jobs that they’re clearly qualified for, at places that would like to hire them. 

It’s certainly true that you have significantly more insight into yourself than any external evaluator. But empirically, I think that new researchers tend to be pretty poorly calibrated about how well they’d do in research later on, often underperforming even simple outside view heuristics. 

Why might self-assessments also suffer from significant noise?

Of course, I think people should aspire to have good models of themselves. But especially if you’re just starting out as a researcher, my guess is your model of your own abilities is probably relatively bad, and I would not update too much off of your self-assessments. 

On anxious underconfidence and self-handicapping

More speculatively, I think the tendency for people to over-update on noisy negative evaluations is caused in large part by a combination of anxiety and a desire to self-handicap. AI safety research is often quite difficult, and it’s understandable to feel scared or underconfident when starting your research journey.[4] And if you believe that such research is important and also feel daunted about whether or not you can contribute at all, it can be tempting to avoid touching reality or even to self-handicap to get an excuse for failure. After all, if your expectations are sufficiently low, you won’t ever be disappointed. 

I don’t think this dynamic happens at a conscious level for most people. Instead, my guess is that most people develop it due to status regulation [LW · GW] or due to small flinches from uncomfortable events. That being said, I do think it’s worth consciously pushing back against this!

What does this mean you should do?

You should touch reality as soon as possible [LW · GW], and try to get evidence on the precise concern or question you have. Instead of worrying about whether or not you can do something, or trying to extract the most out of the few bits of evidence you have, go gather more evidence! Try to learn the skills you think you don’t have, try to apply for some jobs or programs you think definitely won’t take you, and try to do the research you think you can’t do. 

I also find that I spend way more time encouraging people to be more ambitious than the other way around. So on average, I’d probably also recommend trying hard on the project that interests you, and being more willing to take risks with your career. 

That being said, I want to end this piece by reiterating the law of equal and opposite advice. While I suspect the majority of people should push themselves a bit harder to do ambitious things, this advice is precisely the opposite of what many people need to hear. There are many other valuable things you could be doing. If you’re currently doing an impactful job that you really enjoy, you should probably stick to it. And if you find that you’re already pushing yourself quite hard, and additional effort in this direction will hurt you, please stop. It’s okay to take it easy. It’s okay to rest. It’s okay to do what you need to do to be happy. Please don’t injure yourself using this advice.[5]


Acknowledgments

Thanks to Beth Barnes for inspiring this post and contributing her experiences in the appendix, and to Adrià Garriga-Alonso, Erik Jenner, Rachel Freedman, and Adam Gleave for feedback. 


Appendix: testimonials from other researchers

After writing the post, several other researchers reached out with additional evidence that they've given me permission to post:

Addendum from Beth Barnes

Soon after writing the post, Beth Barnes reached out and gave me permission to post about her experiences:

I feel like I have a lot of examples of getting negative signals:

  • I [Beth] found undergraduate CS felt very hard and I was quite depressed especially in the second year of university. I felt like I wasn't understanding much of the content, and was barely scraping by.
  • I did a research internship that was highly unsuccessful - I had no idea how my supervisor's code worked and I spent most of the summer stuck on what I thought was an algorithmic problem but was actually a dumb bug I'd introduced at the beginning. After this I concluded I wasn't a good fit for technical research.
  • I felt like I 'didn't actually know how to code' and was actually not very smart and a total impostor, to the extent that I almost had a panic attack when a friend gave me a mock coding interview.
  • I never even got to interview stage with any big tech company internships I applied to.
  • I had a fixed-term role with an AI lab that I was hoping to extend or turn into a permanent position, but they decided not to continue the role and instead offered me a very junior operations assistant role.
  • There was discussion of firing me at an AI lab because people weren't excited about my work.
  • There were two incidents that I consider quite close to being fired, in that a manager had the choice to continue working with me or not, and chose not to.

Despite that, 

  • I currently run the evaluations project at ARC, which various people I respect think is pretty promising.
  • I've also produced some more standard technical alignment work I'm somewhat happy with.
  • In the past I was concerned that Paul had been saddled with me (after my previous manager left) and I was wasting his time, but he chose to hire me to ARC in the end.
  • I feel much better about my ability to code, mostly based on two key moments:
    • Realizing that trying to use high-level libraries you don't understand makes things much harder to debug, and it's much better for learning and overall faster to work with simple tools you understand well, even if that means writing significantly more code. Recognizing when I'm in a mode of 'randomly changing things I don't understand and hoping it will work', and trying to avoid that as much as possible.
    • Pair coding during MLAB (after again almost having a panic attack doing the coding test, and probably failing to meet the standard admission threshold on the test) and realizing that I wasn't actually that slow compared to various other people who were certified Good At Coding And ML (TM).

As a manager now, I've had to make various decisions about hiring, with different levels of involvement from skimming CVs to extended work trials. I've felt very uncertain in most cases. In particular, even with extended work trials, there's a lot of uncertainty because:

  • People have different starting skills/knowledge, but usually what we're actually interested in is growth rate, which is even harder to assess.
  • Various people I chose not to continue with had significantly better technical skills than (I think) I did at their age, which feels confusing.
  • In various cases it felt like how well different people were doing was quite heavily influenced by extraneous factors, like whether they were working from the office, and how much energy and attention I had put into managing them. Ideally I would trial everyone in their optimal circumstances, after putting a decent amount of effort into thinking about what exactly they needed from me to maximally grow and flourish. But given limited resources this is often not what trials looked like.

I can also confirm direct knowledge of at least one case of a good candidate who was ultimately hired getting rejected for totally spurious clerical reasons. I wouldn't be surprised if this has happened various times without anyone even finding out.

Addendum from Scott Emmons

Scott Emmons, a PhD student at UC Berkeley's CHAI, gave me permission to share the following:

I'm happy with you mentioning the example of my getting rejected from the CHAI internship!

I was also rejected from the final round of Jane Street's trading internship interview process. I'm happy for you to mention that too if you think it's relevant.

My perspective on both these rejections is that I don't shine in on-the-spot problem solving interviews.

Addendum from anonymous senior AGI safety researcher

Finally, a senior AGI safety researcher (who wishes to remain anonymous) sent me the following:

I listed people who I had had meetings with before 2021, and the meeting was at a time when they were either junior or new to the field (usually both, note I might also have been junior at the time). I then guessed how promising I thought they were at the time, and then said how promising I thought they were now (often this involved some Googling to figure out what they had done in the time since our meeting). I’ll focus here on the n=60 subgroup of “junior people already motivated by AI safety when I talked to them”.
 

  • For “promise now minus promise at time of meeting”, the mean is 0.05 and the stddev is 1.37 (on a 10-point scale where in practice most of my numbers were in the 5-8 range). So overall my initial impressions seem calibrated but non-trivially noisy. (Though I don’t take this too seriously since I’m guessing “promise at time of meeting” retroactively.)
  • The people who I rated highest by promise during an initial meeting usually stayed promising or decreased slightly (this is just optimizer’s curse). For those who stayed the same, the level of promise here is “more likely than not they will be hired by an existing AI safety org (including OpenAI + DeepMind) or achieve something similarly good” but not “probably a top-tier researcher”.
  • [Re-reading this now, I suspect that I’ve raised my estimate for the bar for getting hired by an AI safety org, and so the level of promise is actually lower than “more likely than not to get hired by an AI safety org”.]
  • The people who I rated low on promise had much more variance, though tended to increase in promise (again, optimizer’s curse / regression to the mean).
  • The two top people according to “promise now” had scores of “somewhat above average” and “below average” for “promise at time of meeting”. In general it seems like I’m pretty bad at identifying great people from a single meeting when they are junior.
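The regression-to-the-mean effect described in this addendum can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the researcher's data or method: the hidden "true promise" distribution, the noise level, and the function names `simulate_ratings` and `summarize` are all hypothetical, and true promise is assumed not to change between the two ratings. It shows that when two ratings of the same person are independently noisy, the people rated highest at first tend to drop and the people rated lowest tend to rise, even though the average change is close to zero.

```python
import random
import statistics


def simulate_ratings(n=60, noise_sd=1.0, seed=0):
    """Return (first, later): two independently noisy ratings per person.

    Assumption: each person has a fixed hidden "true promise" on a
    10-point scale (centered at 6.5 here, a made-up value), and each
    rating is that value plus independent Gaussian noise.
    """
    rng = random.Random(seed)
    true_promise = [rng.gauss(6.5, 1.0) for _ in range(n)]
    first = [t + rng.gauss(0.0, noise_sd) for t in true_promise]
    later = [t + rng.gauss(0.0, noise_sd) for t in true_promise]
    return first, later


def summarize(first, later, k):
    """Summarize 'later minus first' overall and for the k lowest/highest first ratings."""
    # Indices sorted from lowest to highest first impression.
    order = sorted(range(len(first)), key=lambda i: first[i])
    changes = [b - a for a, b in zip(first, later)]
    return {
        "overall_mean": statistics.mean(changes),
        "overall_sd": statistics.stdev(changes),
        "bottom_k_mean": statistics.mean([changes[i] for i in order[:k]]),
        "top_k_mean": statistics.mean([changes[i] for i in order[-k:]]),
    }


if __name__ == "__main__":
    first, later = simulate_ratings(n=60)
    for name, value in summarize(first, later, k=10).items():
        print(f"{name}: {value:+.2f}")
```

With a small sample such as n=60 the group averages are themselves noisy, so the pattern is clearer with a larger `n`. The point is only qualitative: selecting people by a noisy first rating is enough to produce apparent "declines" at the top and "improvements" at the bottom.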


 

  1. ^

    I think this probably also applies in general, but I’m much less sure than in the case of AI research. As always, the law of equal and opposite advice applies. It’s okay to take it easy, and to do what you need to do to recover. I also don’t think that everyone should aim to be an AI safety researcher – my focus is on this field because it’s what I’m most familiar with. If you’ve found something else you’re good at, you probably should keep doing it.

  2. ^

     I also think there’s a separate problem, where people take positive evaluations of their peers way too seriously. E.g. people seem to noticeably change in attitude if you mention you’ve worked with a high status person at some point in your life. I claim that this is also very bad, but it’s not the focus of the post.

  3. ^

    This also happens to a comical extent with papers at conferences. E.g. Neel Nanda's grokking work [LW · GW] was rejected twice from arXiv (!) but an updated version got a spotlight at ICLR. Redwood's adversarial training paper got a 3, a 5, and a 9 for its initial reviews. In fact, I know of several papers that got orals at conferences, that were rejected entirely from the previous conference. 

  4. ^

    I also feel like this is exacerbated by several social dynamics in the Bay Area, which I might eventually write a post about. 

  5. ^

    If there’s significant interest or if I feel like people are taking this advice too far, I’ll write a followup post giving the opposite advice.

10 comments

Comments sorted by top scores.

comment by Buck · 2023-02-05T19:50:16.558Z · LW(p) · GW(p)

I agree with a lot of this post.

Relatedly: in my experience, junior people wildly overestimate the extent to which senior people form confident and sticky negative evaluations of them. I basically never form a confident negative impression of someone's competence from a single interaction with them, and I place pretty substantial probability on people changing substantially over the course of a year or two.

I think that many people perform very differently in different job situations. When someone performs poorly in a job, I usually only update mildly against them performing well in a different role.

comment by Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-02-05T16:30:14.591Z · LW(p) · GW(p)

Thanks for this post Lawrence! I agree with it substantially, perhaps entirely.

One other thing that I think interacts with the difficulty of evaluation in some ways is the fact that many AI safety researchers think that most of the work done by some other researchers is approximately useless, or even net-negative in terms of reducing existential risk. I think it's pretty easy to wrap an evaluation of a research direction or agenda and an evaluation of a particular researcher together. I think this is actually pretty justified for more senior researchers, since presumably an important skill is "research taste", but I think it's also important to acknowledge that this is pretty subjective and that there's substantial disagreement about the utility of different research directions among senior safety researchers. It seems probably good to try and disentangle this when evaluating junior researchers, as much as is possible, and instead try to focus on "core competencies" that are likely to be valuable across a wide range of safety research directions, though even then the evaluation of this can be difficult and noisy, as the OP argues.

comment by Neel Nanda (neel-nanda-1) · 2023-02-05T12:49:58.781Z · LW(p) · GW(p)

I appreciate this post, and vibe a lot!

Different jobs require different skills.

Very strongly agreed, I did 3 different AI Safety internships in different areas, where I think I was fairly mediocre in each, before I found that mech interp was a good fit.

Also strongly agreed on the self-evaluation point, I'm still not sure I really internally believe that I'm good at mech interp, despite having pretty solid confirmation from my research output at this point - I can't really imagine having it before completing my first real project!

comment by Jozdien · 2023-02-05T09:54:44.867Z · LW(p) · GW(p)

I think this post is valuable, thank you for writing it. I especially liked the parts where you (and Beth) talk about historical negative signals. To a certain kind of person, I think that can serve better than anything else as stronger grounding to push back against unjustified updating.

A factor that I think pulls more weight in alignment relative to other domains is the prevalence of low-bandwidth communication channels, given the number of new researchers whose sole interface with the field is online and asynchronous, textual or few-and-far-between calls. Effects from updating too hard on negative evals are probably amplified a lot when those form the bulk of the reinforcing feedback you get at all. To the point where at times for me it's felt like True Bayesian Updating from the inside even as you acknowledge the noisiness of those channels, because there's little counterweight to it.

My experience here probably isn't super standard given that most of the people I've mentored coming into this field aren't located near the Bay Area or London or anywhere else with other alignment researchers, but their sole point of interface to the rest of the field being a sparse opaque section of text has definitely discouraged some far more than anything else.

comment by Ben Pace (Benito) · 2023-02-05T19:59:51.634Z · LW(p) · GW(p)

As part of my work at Lightcone I manage an office space with an application for visiting or becoming a member, and indeed many of these points commonly apply to rejection emails I send to people, especially "Most applications just don’t contain that much information" and "Not all relevant skills show up on paper".

I try to include some similar things to the post in the rejection emails we send. In case it's of interest or you have any thoughts, here's the standard paragraph that I include:

Our application process is fairly lightweight and so I don't think a no is a strong judgment about a person's work. If you end up in the future working on new projects that you think are a good fit for Lightcone Offices, you're welcome to apply again. Also if you're ever collaborating on a project with a member of the Lightcone Offices, you can visit with them to work together. Good luck in finding traction on improving the trajectory of human civilization.

comment by DragonGod · 2023-02-07T12:46:20.782Z · LW(p) · GW(p)

At what point do you consider yourself a researcher and not just a noob, or someone who wants to one day become a researcher?

 

[This is actually a very important question for my self narrative; for how I relate to my AI safety writing, for what standards I expect of myself (is my AI safety writing currently a hobby that I hope to later turn into a job/or should I treat it as a volunteer job?), etc. I don't really have an answer, but I had mostly been thinking of myself as "someone who wants to one day become an AI safety researcher" (2022 shortened my timelines (suddenly I no longer had a decade to learn all the maths and CS before making useful contributions to alignment theory), and so I brought "one day" sooner, but I'm still at best "aspiring" to be one. 

Learning that an actual researcher™ I respected was younger than me was a massive slap to my face/wakeup call (we discovered LW at the same age/stage in our lives, so there's a sense in which I have a: "what was I doing with my life all this time?"/felt like I've fallen behind [yeah, I am status brainkilled]).]

comment by Akash (akash-wasil) · 2023-02-06T14:43:45.414Z · LW(p) · GW(p)

Great post. I expect to recommend it at least 10 times this year. 

Semi-related point: I often hear people get discouraged when they don't have "good ideas" or "ideas that they believe in" or "ideas that they are confident would actually reduce x-risk." (These are often people who see the technical alignment problem as Hard or Very Hard).

I'll sometimes ask "how many other research agendas do you think meet your bar for "an idea you believe in" or "an idea that you are confident would actually reduce x-risk?" Often, when considering the entire field of technical alignment, their answer is <5 or <10. 

While reality doesn't grade on a curve [LW · GW], I think it has sometimes been helpful for people to reframe "I have no good ideas" --> "I believe the problem we are facing is Hard or Very Hard. Among the hundreds of researchers who are thinking about this, I think only a few of them have met the bar that I sometimes apply to myself & my ideas."

(This is especially useful when people are using a harsher bar to evaluate themselves than when they evaluate others, which I think is common).

comment by Review Bot · 2024-03-12T12:34:19.445Z · LW(p) · GW(p)

The LessWrong Review [? · GW] runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

comment by sudo · 2023-02-07T09:05:30.925Z · LW(p) · GW(p)

I’m a fan of this post, and I’m very glad you wrote it.

comment by Dennis Akar (British_Potato) · 2023-02-06T17:53:04.716Z · LW(p) · GW(p)

I have been feeling extremely impostery lately and do agree on the critical self-evaluation tendency. For the last month or so I felt entirely stuck, with even the idea of an application giving me severe anxiety. Have been overcoming this slightly lately but I think this post and the conversations it caused have made me feel better. Thank you.