Preparing for AI-assisted alignment research: we need data!

post by CBiddulph (caleb-biddulph) · 2023-01-17T03:28:29.778Z · LW · GW · 3 comments


comment by jacquesthibs (jacques-thibodeau) · 2023-01-17T16:56:25.873Z · LW(p) · GW(p)

Heads up, we are starting to work on stuff like this in a Discord server (DM for link), and I’ll be working on it full-time from February to the end of April (if not longer). We’ve talked about data collection a bit over the past year, but have yet to take the time to do anything serious (besides the alignment text dataset). In order to make this work, we’ll have to make it insanely easy for the people generating the data. It’s just not going to happen by default. Some people might take the time to set this up for themselves, but very few will.

Glad to see others take interest in this idea! I think this kind of stuff has a very low barrier to entry for software engineers who want to contribute to alignment, but might want to focus on using their software engineering skills rather than trying to become a full-on researcher. It opens up the door for engineering work that is useful for independent researchers, not just the orgs.

And as I said in the survey results post:

We are looking to build tools now rather than later because it allows us to learn what’s useful before we have access to even more powerful models. Once GPT-(N-1) arrives, we want to be able to use it to generate extremely high-quality alignment work right out of the gate. This work involves both augmenting alignment researchers and using AI to generate alignment research. Both of these approaches fall under the “accelerating alignment” umbrella.

Ideally, we want these kinds of tools to be used disproportionately for alignment work in the first six months after GPT-(N-1)’s release. We hope the tools are useful before that time but, at the very least, we hope to have pre-existing code for interfaces and a data pipeline, and engineers already set to hit the ground running.

comment by A_Posthuman · 2023-01-17T05:02:36.672Z · LW(p) · GW(p)

I think this is a good idea, and as someone who has recorded themselves 16 hrs/day for 10+ years now, I can say that recording yourself becomes very routine and easy.

comment by plex (ete) · 2023-01-19T18:37:19.406Z · LW(p) · GW(p)

Yes, this is a robustly good intervention on the critical path. I've had it on the Alignment Ecosystem Development ideas list for most of a year now.

Some approaches to solving alignment go through teaching ML systems about alignment and getting research assistance from them. Training ML systems needs data, but we might not have enough alignment research to sufficiently fine-tune our models, and we might miss out on many concepts which have not been written up. Furthermore, training on the final outputs (AF posts, papers, etc.) might be worse at capturing the thought processes that go into hashing out an idea or poking holes in proposals, which are exactly the skills that would be most useful for a research assistant to have.

It might be significantly beneficial to capture many of the conversations between researchers and use them to expand our dataset of alignment content to train models on. Additionally, some researchers may be fine with having some of their conversations made available to the public, in case people want to do a deep dive into their models and research approaches.

The system I'm currently imagining to address this has two parts:

  1. An email address to which audio files can be sent, automatically run through Whisper, and added to the alignment dataset GitHub repo (see the sketch after this list).
  2. Clear instructions for setting up a tool which captures audio from calls automatically (either a general tool or platform-specific advice), and makes it as easy as possible to send the right calls to the dataset platform.
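
Below is a minimal sketch of what the transcription-and-commit step in part 1 could look like, assuming the open-source `whisper` Python package and a local clone of the dataset repo; the repo path, filename convention, and git workflow are placeholder assumptions, and the email-ingestion hook that would call this function is left out.

```python
# Sketch: transcribe a received audio file with Whisper and stage the
# transcript in the alignment dataset repo. The repo path, filename
# convention, and git workflow are illustrative assumptions.
import subprocess
from pathlib import Path

import whisper  # pip install openai-whisper

DATASET_REPO = Path("alignment-research-dataset")  # hypothetical local clone


def transcribe_and_commit(audio_path: str, speaker: str) -> Path:
    # Load a small model; "medium" or "large" is more accurate but slower.
    model = whisper.load_model("base")
    result = model.transcribe(audio_path)

    # Write the transcript into the dataset repo.
    out_file = DATASET_REPO / "transcripts" / f"{speaker}_{Path(audio_path).stem}.txt"
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(result["text"], encoding="utf-8")

    # Commit and push so the transcript lands in the shared dataset.
    repo = str(DATASET_REPO)
    subprocess.run(["git", "-C", repo, "add", str(out_file.resolve())], check=True)
    subprocess.run(["git", "-C", repo, "commit", "-m", f"Add transcript: {out_file.name}"], check=True)
    subprocess.run(["git", "-C", repo, "push"], check=True)
    return out_file
```

The email hook could be as simple as a script that polls an inbox over IMAP and calls this function on each audio attachment, though a bot or web form would serve the same purpose.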