Posts
Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
2025-02-07T03:57:30.904Z
GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
2024-11-01T00:10:50.718Z
Even Superhuman Go AIs Have Surprising Failure Modes
2023-07-20T17:31:35.814Z