OC ACX/LW: AI Interpretability Breakthrough from Anthropic 11/11/23
post by Michael Michalchik · 2023-11-08
Hello Folks!
We are excited to announce the 48th Orange County ACX/LW meetup, happening this Saturday and most Saturdays thereafter.
Host: Michael Michalchik
Email: michaelmichalchik@gmail.com (For questions or requests)
Location: 1970 Port Laurent Place
Phone: (949) 375-2045
Date: Saturday, Nov 11, 2023
Time: 2 PM
Conversation Starters:
The first concrete step toward AI alignment and safety, and toward making AI highly useful?!
Journal club video:
https://youtu.be/hlCxSqWS6Rw?si=ONEGDAE3QiCPkWi-
Community Paper Reading: Decomposing Language Models Into Understandable Components
Short paper walkthrough:
https://youtu.be/HAxd8DoZaW4?si=X8kt9pRKHKKv1kBK
Anthropic Solved Interpretability?
The paper itself: https://transformer-circuits.pub/2023/monosemantic-features/index.html
Zvi Mowshowitz's report on the paper:
https://thezvi.substack.com/i/137705383/cool-new-interpretability-paper
Zvi Mowshowitz's report on reactions to the paper:
https://thezvi.substack.com/i/137705383/so-what-do-we-all-think-of-the-cool-paper
A ChatGPT-generated glossary and brief overview of the ideas:
https://chat.openai.com/share/d5465786-5d11-4aa7-9d00-ed7d5b1ef94b
- Walk & Talk: We usually have an hour-long walk and talk after the meeting starts. Two mini-malls with hot takeout food are readily accessible nearby. Search for Gelson's or Pavilions in the zip code 92660.
- Share a Surprise: Tell the group about something unexpected that changed your perspective on the universe.
- Future Direction Ideas: Contribute ideas for the group's future direction, including topics, meeting types, activities, etc.