EIS XII: Summary

post by scasper · 2023-02-23T17:45:55.973Z

Contents

    TAISIC = “the AI safety interpretability community”
    MI = “mechanistic interpretability” 
  A Prequel: Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks (Räuker et al., 2022)
  EIS I: Intro
  EIS II: What is “Interpretability”?
  EIS III: Broad Critiques of Interpretability Research
  EIS IV: A Spotlight on Feature Attribution/Saliency
  EIS V: Blind Spots In AI Safety Interpretability Research
  EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety
  EIS VII: A Challenge for Mechanists
  EIS VIII: An Engineer’s Understanding of Deceptive Alignment
  EIS IX: Interpretability and Adversaries
  EIS X: Continual Learning, Modularity, Compression, and Biological Brains
  EIS XI: Moving Forward
  Thanks

Part 12 of 12 in the Engineer’s Interpretability Sequence.

TAISIC = “the AI safety interpretability community”

MI = “mechanistic interpretability” 

There might be some addenda later, but for now, this is the final post in The Engineer’s Interpretability Sequence. I hope you have found it interesting and have gotten some useful ideas. I will always be happy to talk to people about the topics from this sequence in the comments or via email. For now, the last thing I will do is offer a summary of key points post by post :)

A Prequel: Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks (Räuker et al., 2022)

EIS I: Intro

EIS II: What is “Interpretability”?

EIS III: Broad Critiques of Interpretability Research

EIS IV: A Spotlight on Feature Attribution/Saliency

EIS V: Blind Spots In AI Safety Interpretability Research

EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety

EIS VII: A Challenge for Mechanists

EIS VIII: An Engineer’s Understanding of Deceptive Alignment

EIS IX: Interpretability and Adversaries

EIS X: Continual Learning, Modularity, Compression, and Biological Brains

EIS XI: Moving Forward

Thanks

I hope you enjoyed this sequence and found some useful ideas in it. Let me know if you would ever like to talk about interpretability, adversaries, or related topics.

I am very grateful to TAISIC and to others in the AI safety space for doing important and interesting work. For me personally, TAISIC members have been excellent sources of inspiration and collaboration, and I’m glad to be a part of this community.


 
