Posts

Mistral Large 2 (123B) exhibits alignment faking 2025-03-27T15:39:02.176Z
Reducing LLM deception at scale with self-other overlap fine-tuning 2025-03-13T19:09:43.620Z
Science advances one funeral at a time 2024-11-01T23:06:19.381Z
Self-prediction acts as an emergent regularizer 2024-10-23T22:27:03.664Z
The case for a negative alignment tax 2024-09-18T18:33:18.491Z
Self-Other Overlap: A Neglected Approach to AI Alignment 2024-07-30T16:22:29.561Z
Video Intro to Guaranteed Safe AI 2024-07-11T17:53:47.630Z
AE Studio @ SXSW: We need more AI consciousness research (and further resources) 2024-03-26T20:59:09.129Z

Comments