Posts

Forecasting Frontier Language Model Agent Capabilities (2025-02-24)
Do models know when they are being evaluated? (2025-02-17)
Current safety training techniques do not fully transfer to the agent setting (2024-11-03)
~80 Interesting Questions about Foundation Model Agent Safety (2024-10-28)
Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities (2024-07-22)