Announcing the Distillation for Alignment Practicum (DAP)

post by Jonas Hallgren, CallumMcDougall (TheMcDouglas) · 2022-08-18T19:50:31.371Z · LW · GW · 3 comments

Contents

  Practicum structure
  Give us feedback, please
  Links to curriculum and application
  More detail on the structure
  Thanks for the help

Hey, fellow LessWrongers!

Around the time of John Wentworth's post A Call for Distillers [LW · GW], we started to work on a distillation course, which has now turned into a practicum. (A practicum is closer to a study circle than a course and is a lot more open-ended.)

Five months later, and timed to coincide with the launch of the CAIS competition [LW · GW], we're launching the trial run of DAP (Distillation for Alignment Practicum), starting the week of September 16th. The course is six weeks long and structured similarly to AGISF, with core readings and exercises during and between sessions.

The difference is that we're going to have slightly smaller cohorts of 3-4 fellows (along with one facilitator), and we're also focusing on letting you develop your own distillations. Through this process, we hope you will explore your views on what distillation is helpful for and which techniques might be practical when creating one.

Practicum structure

Below, you can see our proposed loop for how we think of improving your distilling skills. It is essentially the scientific method. The first two weeks of the course are about forming a hypothesis about what makes a distillation helpful and which techniques you could use to put the values you identify in the first week into practice. You then pick a distillation and apply the methodology you came up with. Lastly, in week six, you review how it went and whether your models could have been improved for the specific distillation you were doing. (For more detail, check out the end of the post.)

The hope is that this gives you a new meta-methodology for improving your writing. As this is a practicum, we aim to provide an open space for discussing what makes a distillation great, as we're not sure of it ourselves.

We hope this will provide a space for people to continuously grow and become great distillers!

Give us feedback, please

This is the first iteration of the distillation practicum, so if you can think of any ways to improve it, we're very open to feedback! We're specifically looking for you to do one or more of the following:

  1. Roast the living crap out of the current exercises (Trial by fire)
  2. Come up with new exercises that would be better than the existing ones 
  3. Suggest things which might be missing in the general research loop
  4. Come up with improvements to exercises

You can send this to us privately on LW or do it through whatever means you feel like, for example, while partaking in the practicum! We're looking for around 3-5 cohorts in the trial run, and we might expand it after, depending on how it goes. 

Links to curriculum and application

Here's the link to the current version of the curriculum (as we're still in the alpha phase, the curriculum will most likely change before the beginning of September): DAP curriculum

Here's the application form: https://airtable.com/shrckaJjpnp3RZGTQ

In general, we're looking for people who have finished AGISF or have equivalent knowledge, and who are interested in distilling a work.

More detail on the structure

Week 1 will focus on why distillations are valuable and what they contribute to the broader alignment community. We hope that by the end of the week, you will have a better idea of how to assess the impact of a distillation.

Week 2 will look at how you structure and write a distillation. We will read about the different research & writing processes of successful people in the EA and AI safety communities. We will also examine specific foundational papers in various technical fields, and explore what made them so compelling - from the writing techniques to the overall structure.

The next three weeks will mainly focus on writing distillations. You will put the lessons from weeks 1 & 2 into practice while you write, and the discussion sessions will be an opportunity to get feedback and share thoughts.

Week 3 focuses on understanding the material in the first place. We will discuss skills such as performing literature reviews, breaking down and understanding complex concepts, forming gears-level models, etc.

Weeks 4 & 5 will be mainly devoted to writing your distillation. We hope that you will have a minimum viable product by the end of week 4, and it will be fleshed out into a full post by the end of week 5.


Week 6 will be an opportunity to update your models of distillation and consider how you might improve them in the future. We will also reflect on the next steps you can take towards doing distillation work for a living in the most impactful way for the alignment field.

Thanks for the help

(Personal note from Jonas): Lastly, I want to thank everyone who helped us develop the course and gave me the support and ideas needed to get it off the ground. A special thanks to Jamie Bernardi for being a chill dude and supporting this shit in its conception stages. Thanks to Kay Kozaronek, Rudolf Laine, and Oliver Zhang for reviewing the curriculum and giving great support along the way (you're superb). Thanks to John Wentworth for telling us to rebuild the practicum from scratch; it would have been a lot worse without your input. Huge thanks to Callum McDougall for being a great co-lead in turning this practicum into something tangible; I seriously couldn't have done it without you! (Thanks man, you're great.) Finally, thanks to everyone else who gave feedback and supported DAP along the way; you're simply fantastic.

We hope you enjoy this practicum!

3 comments

Comments sorted by top scores.

comment by TW123 (ThomasWoodside) · 2022-08-18T22:47:51.281Z · LW(p) · GW(p)

I think the term "distillation" is used differently by different people, and this confuses me. Originally (based on John Wentworth's post) I thought that "distillation" meant:

"Take existing ideas from a single existing work (or from a particular person) and present them in a way that is more understandable or more concise."

That's also the definition seemingly used by the Distillation Contest [LW · GW].

But then you give these examples in your curriculum:

Bushwackers:

  • Explain what a sharp left turn would look like in current ML paradigms
  • Explain the connection between Agent Foundations and ELK

Rosetta Scribes

  • Interpretability research -> Chaos Theory -> Interpretability research
  • Content extrapolation -> Causality theory (causal inference)
  • etc. - open-ended and loose format; which field to translate to is probably very dependent on the problem

Field Mapping

  • Map out the timelines to AGI, identify the intersections, and state the arguments for why we will go down one road vs. the other at each intersection
  • Systematically investigate the field with a set of assumptions about the road ahead and look at which research methodologies pass the test (like Nate Soares did for MIRI's arguments about a sharp left turn)
  • Other research methodology that elucidates where we should be going

Propagators

Trailblazers

  • Explain every concept in AI alignment using QCD?
  • Come up with new ways of doing distillations here?

It seems like you mean something more like:

"Write something understandable that presents ideas in an intuitive way and possibly draws from many different works"

But in that case, I am not sure how this is different from "conceptual research where you try hard to present your work in an understandable way." In which case, the meaning of "distillation" has become hopelessly stretched.

Could you include a clear definition of "distillation," such that it includes clear examples of what is and isn't considered a distillation? I would ask you to write a distillation of what a distillation is, but I don't know if I'd be using the term distillation correctly.

Replies from: Jonas Hallgren
comment by Jonas Hallgren · 2022-08-18T23:51:23.220Z · LW(p) · GW(p)

First and foremost, my confidence in the descriptions of different distillation methods is pretty low. It is a framework I've thrown together from discussions on what an optimal science communication landscape would look like. It is in its initial phases and will most likely be imperfect for quite some time as finding the optimal communication landscape is a difficult problem. 

Secondly, great point! I think of it as a "reinterpretation of existing research." The basic way of doing this is rewriting a post for greater clarity, which is the classical view of what a distillation is.

I think there are more ways of doing this and that the space is underexplored. In terms of the terminology proposed in the course, a "classic" distillation is some combination of what I would describe as propagating and bushwacking.

Bushwacking would be more something like asking, "what the f*ck is going on here?" which might be relevant for things such as infra-bayesianism (I want to learn infra-bayesianism can someone please bushwack this). 

Propagating would be more of what Rob Miles is doing. 

So what is distillation? What is the superclass of all of these? 

I would phrase it as follows: "A distillation is a work that takes existing research and reinterprets it in a new light."
 

Finally, a meta point in defence of introducing new jargon. I think the term distillation is confusing in itself, as it can mean a lot of things. If you instead say, "I'm bushwhacking this post," you get the idea that "ah, this person is cutting down the weeds of what is a confusing post". I hope to introduce new methodology so it is easier to understand what type of distillation someone is doing. (I don't think this terminology is optimal, but it's a start in the right direction IMO.)

Replies from: jskatt
comment by JakubK (jskatt) · 2022-08-19T20:07:11.237Z · LW(p) · GW(p)

(I want to learn infra-bayesianism can someone please bushwack this). 

You might be interested in this post [LW · GW].

comment by zeshen · 2022-08-18T22:44:38.950Z · LW(p) · GW(p)