Appendices to the live agendas

post by technicalities, Stag · 2023-11-27T11:10:32.187Z · LW · GW · 4 comments

Contents

  Appendix: Prior enumerations
  Appendix: Graveyard
  Appendix: Biology for AI alignment
    Human enhancement 
    Merging 
    As alignment aid 
  Appendix: Research support orgs
  Appendix: Meta, mysteries, more
4 comments

Lists cut from our main post [LW · GW], in a token gesture toward readability.

We list past reviews of alignment work, ideas which seem to be dead, the cool but neglected neuroscience / biology approach, various orgs which don't seem to follow any single agenda, and a bunch of things which don't fit elsewhere.


Appendix: Prior enumerations

Appendix: Graveyard

Appendix: Biology for AI alignment

Lots of agendas, but it's not clear that anyone besides Byrnes and Thiergart is actively turning the crank. Seems like it would need a billion dollars.

Human enhancement 

Merging 

As alignment aid 


Appendix: Research support orgs

One slightly confusing class of org is exemplified by the sample {CAIF, FLI}. These are often run by active researchers with serious alignment experience, but they usually don't follow an obvious agenda of their own: they delegate a basket of strategies to grantees and do field-building work like NeurIPS workshops and summer schools.

CAIF 

AISC

See also:

Appendix: Meta, mysteries, more

4 comments


comment by Alex_Altair · 2023-11-28T18:09:57.659Z · LW(p) · GW(p)

Honestly this isn't that long; I might say to re-merge it with the main post. Normally I'm a huge proponent of breaking posts up smaller, but yours is literally trying to be an index, so breaking a piece off makes it harder to use.

Replies from: technicalities
comment by technicalities · 2023-11-29T09:31:29.689Z · LW(p) · GW(p)

yeah you're right

comment by Steven Byrnes (steve2152) · 2023-11-27T13:34:24.808Z · LW(p) · GW(p)

For what it’s worth, I am not doing (and have never done) any research remotely similar to your text “maybe we can get really high-quality alignment labels from brain data, maybe we can steer models by training humans to do activation engineering fast and intuitively”.

I have a concise and self-contained summary of my main research project here (Section 2) [LW · GW].

Replies from: technicalities
comment by technicalities · 2023-11-27T13:37:13.115Z · LW(p) · GW(p)

I care a lot! Will probably make a section for this in the main post under "Getting the model to learn what we want". Thanks for the correction.