diogo-de-lucena

Posts

Mistral Large 2 (123B) exhibits alignment faking 2025-03-27T15:39:02.176Z

Reducing LLM deception at scale with self-other overlap fine-tuning 2025-03-13T19:09:43.620Z

Science advances one funeral at a time 2024-11-01T23:06:19.381Z

Self-prediction acts as an emergent regularizer 2024-10-23T22:27:03.664Z

The case for a negative alignment tax 2024-09-18T18:33:18.491Z

Self-Other Overlap: A Neglected Approach to AI Alignment 2024-07-30T16:22:29.561Z

Video Intro to Guaranteed Safe AI 2024-07-11T17:53:47.630Z

AE Studio @ SXSW: We need more AI consciousness research (and further resources) 2024-03-26T20:59:09.129Z

Comments