Posts

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback (2024-11-07T15:39:06.854Z)
GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning (2024-11-01T00:10:50.718Z)

Comments