Comment by Lukas Petersson (lukas-petersson-1) on Which evals resources would be good? · 2024-11-16T22:13:00.354Z
This is probably not the first barrier to getting into evals, but I run an AI safety startup that designs evals, and we don't have the capacity to also do good elicitation. I think we lose a lot of signal from our evals because our agent is too weak to explore properly. We currently use Inspect's basic_agent. METR's modular_public is better, but otherwise we prefer Inspect over Vivaria. I think open-sourcing a better agent would benefit the evals community without contributing to capabilities.