Posts
Mistral Large 2 (123B) exhibits alignment faking
2025-03-27T15:39:02.176Z
Reducing LLM deception at scale with self-other overlap fine-tuning
2025-03-13T19:09:43.620Z
Science advances one funeral at a time
2024-11-01T23:06:19.381Z
Self-prediction acts as an emergent regularizer
2024-10-23T22:27:03.664Z
The case for a negative alignment tax
2024-09-18T18:33:18.491Z
Self-Other Overlap: A Neglected Approach to AI Alignment
2024-07-30T16:22:29.561Z
Video Intro to Guaranteed Safe AI
2024-07-11T17:53:47.630Z
AE Studio @ SXSW: We need more AI consciousness research (and further resources)
2024-03-26T20:59:09.129Z