Posts

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google (2025-02-07)
GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning (2024-11-01)
Even Superhuman Go AIs Have Surprising Failure Modes (2023-07-20)

Comments