Which evals resources would be good?

post by Marius Hobbhahn (marius-hobbhahn) · 2024-11-16


I want to make a serious effort to create a bigger evals field. I’m very interested in which resources you think would be most helpful. I’m also looking for others to contribute and potentially some funding. 

  1. Which resources would be most helpful? Suggestions include:
    1. Open Problems in Evals list: A long list of open relevant problems & projects in evals.
    2. More Inspect Tutorials/Examples: Specifically, examples with deep explanations, examples of complex agent setups, and more detailed examples of the components around the evals.
    3. Evals playbook: A detailed guide on how to build evals with detailed examples for agentic and non-agentic evals.
    4. Salient demos: I don’t even necessarily want to build scary demos, I just want normal people to be less hit-by-a-truck when they learn about LM agent capabilities in the near future.
  5. More “day-to-day” evals process resources: A lot of evals expertise is latent knowledge that professional evaluators have accumulated over time; resources that make this tacit knowledge explicit would lower the barrier to entry.
    6. An evals Slack
    7. Walkthroughs of the most important evals papers
    8. Evals workshops and conferences
  2. Are you interested in contributing to any of these efforts?
    1. Right now I think the following types of contributions would be helpful:
      1. People who want to contribute more detailed Inspect tutorials/examples as described here. I think this is a great opportunity for people who are considering working full-time on evals and want to test their fit / make a name for themselves. 
      2. Someone who wants to help me coordinate these efforts part-time or full-time. This would be a mix of playing around with evals yourself, coordinating with other actors in evals, and probably a decent amount of writing.
    2. I’m very interested in other contributions as well. 
    3. If you’re interested, fill out this form.
  3. Is anyone interested in funding these efforts?
    1. I’d be happy to function as a regrantor. I wouldn’t take a cut myself. I imagine most grants would be between $5k and $15k, to pay for someone building resources full-time for a few months. 
    2. I’m also happy to be more of a coordinator / filter and forward people to funders if a funder is willing to do fast-turnaround grants. 
    3. I think the total amount of funding that would be helpful to get this off the ground is $50k-100k for the next year.

I wrote a more detailed post here (relevant parts copied below) but I’m primarily interested in other people’s ideas and preferences.

##########################################################

Future plans and missing resources 

Note: the following is largely “copy-paste what Neel Nanda did for the mechanistic interpretability community but for evals”. I think his work was great for the field, so why not give it a shot?

Broadly speaking, I want to make it easy and attractive for people to get involved with evals.

  1. Easy: Ideally, getting into evals should be as simple as logging into Netflix: one click, and you’re instantly part of a curated experience. There should be plenty of resources that feel enjoyable and neither too easy nor too hard for your level of experience.
  2. Attractive: When I first read through Neel Nanda’s A Mechanistic Interpretability Analysis of Grokking [LW · GW], I was hooked. I just needed to understand why the network was grokking. I couldn’t help myself and spent a few days replicating the results and answering some additional questions about grokking. I didn’t do it because I had carefully reasoned that this was the most effective use of my time; I did it because I was intrigued. I think there are many similar situations in evals, e.g. when an LM agent unexpectedly solves a hard eval and you feel like you need to understand what else it is capable of, when it solves the eval in a clever way you hadn’t anticipated, or when you find a new powerful jailbreak. I think the evals field has so far not done a good job of conveying these aspects, and that should change.

The resources I think would be good are the ones listed at the top of this post.

If you’re keen to be involved in any of the above, please reach out. If you want to spend a few months producing evals materials, I might be able to find funding for it (but no promises). If you’re a funder and want to support these kinds of efforts, please contact me. I would only serve as a regrantor and not take any cut myself.
