OC ACX/LW: AI Interpretability Breakthrough from Anthropic, 11/11/23

Post by Michael Michalchik (michael-michalchik) · 2023-11-08


Hello Folks!

We are excited to announce the 48th Orange County ACX/LW meetup, happening this Saturday and most Saturdays thereafter.

Host: Michael Michalchik

Email: michaelmichalchik@gmail.com (for questions or requests)

Location: 1970 Port Laurent Place

Phone: (949) 375-2045

Date: Saturday, Nov 11, 2023

Time: 2 PM

Conversation Starters:

Is this the first concrete step toward AI alignment and safety, and toward making AI highly useful?!

Journal club video, "Community Paper Reading: Decomposing Language Models Into Understandable Components":
https://youtu.be/hlCxSqWS6Rw?si=ONEGDAE3QiCPkWi-

Short paper walkthrough, "Anthropic Solved Interpretability?":
https://youtu.be/HAxd8DoZaW4?si=X8kt9pRKHKKv1kBK

The paper itself, "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning": https://transformer-circuits.pub/2023/monosemantic-features/index.html

Zvi Mowshowitz reports on the paper:

https://thezvi.substack.com/i/137705383/cool-new-interpretability-paper

Zvi Mowshowitz reports on reactions to the paper:

https://thezvi.substack.com/i/137705383/so-what-do-we-all-think-of-the-cool-paper

A ChatGPT-generated glossary and brief overview of the ideas:

https://chat.openai.com/share/d5465786-5d11-4aa7-9d00-ed7d5b1ef94b