Posts

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback 2024-11-07T15:39:06.854Z

Comments

Comment by Constantin Weisser (constantin-weisser) on Implications of the inference scaling paradigm for AI safety · 2025-01-14T10:41:08.819Z
Comment by Constantin Weisser (constantin-weisser) on Implications of the inference scaling paradigm for AI safety · 2025-01-14T10:40:25.573Z