Posts
Forecasting Frontier Language Model Agent Capabilities
2025-02-24T16:51:32.022Z
Do models know when they are being evaluated?
2025-02-17T23:13:22.017Z
Current safety training techniques do not fully transfer to the agent setting
2024-11-03T19:24:51.537Z
~80 Interesting Questions about Foundation Model Agent Safety
2024-10-28T16:37:04.713Z
Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
2024-07-22T16:17:07.665Z