Comments
Alignment approaches at different abstraction levels (e.g., macro-level interpretability, scaffolding/module-level AI system safety, systems-level theoretic process analysis for safety) are something I have been hoping to see more of. I am thrilled by this meta-level red-teaming work and excited to see the announcement of the new team.
Hey, great stuff -- thank you for sharing! I found this especially useful as somebody who has been "out" of alignment for 6 months and is looking to set up a new research agenda.