AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0

james-fox

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0

post by James Fox, Chloe Li (chloe-li-1), JamesH (AtlasOfCharts), Gracie Green, CallumMcDougall (TheMcDouglas) · 2024-07-06T11:34:57.227Z · LW · GW · 7 comments

  TL;DR
    Summary
  Outline of Content
    Chapter 0 - Fundamentals
    Chapter 1 - Transformers & Interpretability
    Chapter 2 - Reinforcement Learning
    Chapter 3 - Model Evaluation
    Chapter 4 - Capstone Project
  Call for Staff
  FAQ
      Q: Who is this program suitable for?
      Q: What will an average day in this program look like?
      Q: How many participants will there be?
      Q: Will there be prerequisite materials?
      Q: When is the application deadline?
      Q: What will the application process look like?
      Q: Can I join for some sections but not others?
      Q: Will you pay stipends to participants?
      Q: Which costs will you be covering for the in-person programme?
      Q: I'm interested in trialling some of the material or recommending material to be added. Is there a way I can do this?
  Link to Apply
None
7 comments

TL;DR

We are excited to announce the fourth iteration of ARENA (Alignment Research Engineer Accelerator), a 4-5 week ML bootcamp with a focus on AI safety! ARENA’s mission is to provide talented individuals with the skills, tools, and environment necessary for upskilling in ML engineering, for the purpose of contributing directly to AI alignment in technical roles. ARENA will be running in-person from LISA from 2nd September - 4th October (the first week is an optional review of the fundamentals of neural networks).

Apply here before 23:59 July 20th anywhere on Earth!

Summary

ARENA has been successfully run three times, with alumni going on to become MATS scholars and LASR participants; AI safety engineers at Apollo Research, Anthropic, METR, and OpenAI; and even starting their own AI safety organisations!

This iteration will run from 2nd September - 4th October (the first week is an optional review of the fundamentals of neural networks) at the London Initiative for Safe AI [LW · GW] (LISA) in Old Street, London. LISA houses small organisations (e.g., Apollo Research, BlueDot Impact), several other AI safety researcher development programmes (e.g., LASR Labs, MATS extension, PIBBS, Pivotal), and many individual researchers (independent and externally affiliated). Being situated at LISA, therefore, brings several benefits, e.g. facilitating productive discussions about AI safety & different agendas, allowing participants to form a better picture of what working on AI safety can look like in practice, and offering chances for research collaborations post-ARENA.

The main goals of ARENA are to:

Help participants skill up in ML relevant for AI alignment.
Produce researchers and engineers who want to work in alignment and help them make concrete next career steps.
Help participants develop inside views about AI safety and the paths to impact of different agendas.

The programme's structure will remain broadly the same as ARENA 3.0 (see below); however, we are also adding an additional week on evaluations.

For more information, see our website.

Also, note that we have a Slack group designed to support the independent study of the material (join link here).

Outline of Content

The 4-5 week program will be structured as follows:

Chapter 0 - Fundamentals

Before getting into more advanced topics, we first cover the basics of deep learning, including basic machine learning terminology, what neural networks are, and how to train them. We will also cover some subjects we expect to be useful going forward, e.g. using GPT-3 and 4 to streamline your learning, good coding practices, and version control.

Note: Participants can optionally skip the program this week and join us at the start of Chapter 1 if they'd prefer this option and if we're confident that they are already comfortable with the material in this chapter.

Topics include:

PyTorch basics
CNNs, Residual Neural Networks
Optimization (SGD, Adam, etc)
Backpropagation
Hyperparameter search with Weights and Biases
GANs & VAEs

Chapter 1 - Transformers & Interpretability

In this chapter, you will learn all about transformers and build and train your own. You'll also study LLM interpretability, a field which has been advanced by Anthropic’s Transformer Circuits sequence, and open-source work by Neel Nanda. This chapter will also branch into areas more accurately classed as "model internals" than interpretability, e.g. recent work on steering vectors.

Topics include:

GPT models (building your own GPT-2)
Training and sampling from transformers
TransformerLens
In-context Learning and Induction Heads
Indirect Object Identification [LW · GW]
Superposition
Steering Vectors [LW · GW]

Chapter 2 - Reinforcement Learning

In this chapter, you will learn about some of the fundamentals of RL and work with OpenAI’s Gym environment to run their own experiments.

Topics include:

Fundamentals of RL
Vanilla Policy Gradient
Proximal Policy Gradient
RLHF (& finetuning LLMs with RLHF)
Gym & Gymnasium environments

Chapter 3 - Model Evaluation

In this chapter, you will learn how to evaluate models. We'll take you through the process of building a multiple-choice benchmark of your own and using this to evaluate current models. We'll then move on to study LM agents: how to build them and how to evaluate them.

Topics include:

Constructing benchmarks for models
Using models to develop safety evaluations
Building pipelines to automate model evaluation
Building and evaluating LM agents

Chapter 4 - Capstone Project

We will conclude this program with a Capstone Project, where participants will receive guidance and mentorship to undertake a 1-week research project building on materials taught in this course. This should draw on the skills and knowledge that participants have developed from previous weeks and our paper replication tutorials.

Here is some sample material from the course on how to replicate the Indirect Object Identification paper (from the chapter on Transformers & Mechanistic Interpretability). An example Capstone Project might be to apply this method to interpret other circuits, or to improve the method of path patching.

Call for Staff

ARENA has been successful because we had some of the best in the field TA-ing with us and consulting with us on curriculum design. If you have particular expertise in topics in our curriculum and want to apply to be a TA, use this form to apply. TAs will be well compensated for their time. Please contact info@arena.education with any more questions.

FAQ

Q: Who is this program suitable for?

A: We welcome applications from people who fit most or all of the following criteria:

Care about AI safety and making future development of AI go well
Relatively strong maths skills (e.g. about one year's worth of university-level applied maths)
Strong programmers (e.g. have a CS degree/work experience in SWE or have worked on personal projects involving a lot of coding)
Have experience coding in Python
Would be able to travel to London for 4-5 weeks, starting September 2nd (or September 9th if skipping the intro week)
We are open to people of all levels of experience, whether they are still in school or have already graduated.

Note - these criteria are mainly intended as guidelines. If you're uncertain whether you meet these criteria, or you don't meet some of them but still think you might be a good fit for the program, please do apply! You can also reach out to us directly at info@arena.education.

Q: What will an average day in this program look like?

At the start of the program, most days will involve pair programming, working through structured exercises designed to cover all the essential material in a particular chapter. The purpose is to get you more familiar with the material in a hands-on way. There will also usually be a short selection of required readings designed to inform the coding exercises.

As we move through the course, some chapters will transition into more open-ended material. For example, in the Transformers & Interpretability chapter, after you complete the core exercises, you'll be able to choose from a large set of different exercises, covering topics as broad as model editing, superposition, circuit discovery, grokking, discovering latent knowledge, and more. In the last week, you'll choose a research paper related to the content we've covered so far & replicate its results (possibly even extend them!). There will still be TA supervision during these sections, but the goal is for you to develop your own research & implementation skills. Although we strongly encourage paper replication during this chapter, we would also be willing to support well-scoped projects if participants are excited about them.

Q: How many participants will there be?

We're expecting roughly 20-25 participants in the in-person program.

Q: Will there be prerequisite materials?

A: Yes, we will send you prerequisite reading & exercises covering material such as PyTorch, einops and some linear algebra (this will be in the form of a Colab notebook) a few weeks before the start of the program.

Q: When is the application deadline?

A: The deadline for submitting applications is July 20th, 11:59 pm anywhere on Earth.

Q: What will the application process look like?

A: There will be three steps:

Fill out the application form (this is designed to take <1 hour).
Perform a coding assessment.
Interview virtually with one of us, so we can find out more about your background and interests in this course.

Q: Can I join for some sections but not others?

A: Participants will be expected to attend the entire programme. The material is interconnected, so missing content would lead to a disjointed experience. We have limited space and, therefore, are more excited about offering spots to participants who can attend the entirety of the programme.

The exception to this is the first week, which participants can choose to opt in or out of based on their level of prior experience.

Q: Will you pay stipends to participants?

A: Unfortunately, we won't be able to pay stipends to participants. However, we will be providing housing & travel assistance to in-person participants (see below).

Q: Which costs will you be covering for the in-person programme?

A: We will cover all reasonable travel expenses (which will vary depending on where the participant is from) and visa assistance, where needed. Accommodation, meals, and drinks & snacks will also all be included.

Q: I'm interested in trialling some of the material or recommending material to be added. Is there a way I can do this?

A: If either of these is the case, please feel free to reach out directly via an EAForum/LessWrong message (or email info@arena.education) - we'd love to hear from you!

Link to Apply

Here is the link to apply as a participant. You should spend no more than one hour on it.

Here is the link to apply as staff. You shouldn’t spend longer than 30 minutes on it.

We look forward to receiving your application!

7 comments

Comments sorted by top scores.

comment by Gabe M (gabe-mukobi) · 2024-07-06T12:49:14.736Z · LW(p) · GW(p)

Congrats! Could you say more about why you decided to add evaluations in particular as a new week?

Replies from: chloe-li-1

↑ comment by Chloe Li (chloe-li-1) · 2024-07-06T22:26:01.192Z · LW(p) · GW(p)

It’s a fast-growing and important field right now - there is an urgency to make progress on eval, and a rapid increase in both technical safety eval roles at AI labs and governance roles. This need and capacity for safety evals make eval skills valuable for people who want to contribute to safety now. There are many methods that have been developed and relevant engineering skills to improve, but also a lot of minefields for producing false or misleading results. We thought the latter is an especially important reason for a good curriculum to exist

comment by Adrià Garriga-alonso (rhaps0dy) · 2024-07-09T14:57:18.485Z · LW(p) · GW(p)

Asking for an acquaintance. If I know some graduate-level machine learning, and have read ~most of the recent mechanistic interpretability literature, and have made good progress understanding a small-ish neural network in the last few months.

Is ARENA for me, or will it teach things I mostly already know?

(I advised this person that they already have ARENA-graduate level, but I want to check in case I'm wrong.)

Replies from: AtlasOfCharts

↑ comment by JamesH (AtlasOfCharts) · 2024-07-10T09:08:58.997Z · LW(p) · GW(p)

ARENA might end up teaching this person some mech-interp methods they haven't seen before, although it sounds like they would be more than capable of self-teaching any mech-interp. The other potential value-add for your acquaintance would be if they wanted to improve their RL or Evals skills, and have a week to conduct a capstone project with advisors. If they were mostly aiming to improve their mech-interp ability by doing ARENA, there would probably be better ways to spend their time.

comment by akshayaurora · 2024-07-29T22:04:19.746Z · LW(p) · GW(p)

ARENA has been successful because we had some of the best in the field TA-ing with us and consulting with us on curriculum design.

I personally love the ARENA curriculum, it has probably the single greatest resource that has helped me learning about current state of AI. I've also done couple of specializations on Coursera, but found the exercises lot easier - which also meant I didn't use everything I learnt in videos. On the contrary, all the ARENA exercises are challenging - but you also learn a lot more and it has definitely been more satisfying and rewarding journey for me.

comment by akshayaurora · 2024-07-29T19:42:01.636Z · LW(p) · GW(p)

Curious, if there are success stories to share from previous runs of ARENA? Example, maybe people publishing safety research or joining safety research labs.

Replies from: James Fox

↑ comment by James Fox · 2024-09-05T23:02:27.293Z · LW(p) · GW(p)

Sorry for not seeing this. Hopefully, the first paragraph of the summary answers this question. We're excited about running more ARENA iterations exactly because its track record has been pretty strong.

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0

Contents

TL;DR

Summary

Outline of Content

Chapter 0 - Fundamentals

Chapter 1 - Transformers & Interpretability

Chapter 2 - Reinforcement Learning

Chapter 3 - Model Evaluation

Chapter 4 - Capstone Project

Call for Staff

FAQ

Q: Who is this program suitable for?

Q: What will an average day in this program look like?

Q: How many participants will there be?

Q: Will there be prerequisite materials?

Q: When is the application deadline?

Q: What will the application process look like?

Q: Can I join for some sections but not others?

Q: Will you pay stipends to participants?

Q: Which costs will you be covering for the in-person programme?

Q: I'm interested in trialling some of the material or recommending material to be added. Is there a way I can do this?

Link to Apply

7 comments