AI Safety Institute's Inspect hello world example for AI evals
post by TheManxLoiner · 2024-05-16T20:47:44.346Z
This is a link post for https://lovkush.medium.com/evaluating-llms-using-uk-ai-safety-institutes-inspect-framework-96435c9352f3
Sharing my detailed walk-through of using the UK AI Safety Institute's new open-source package Inspect for AI evals.
Main points:
- The package, released in early May 2024, is here: https://github.com/UKGovernmentBEIS/inspect_ai
- It seems easy to use and removes boilerplate code. I am new to evals, so I do not know what experienced researchers would look for in such a tool. I am curious to know what others think of it!
- There is one unusual behaviour concerning whether what Inspect calls a 'scorer' should be independent of what it calls a 'plan'. I raised an issue about this on GitHub and would be very interested to hear what others in the AI safety community think of this detail.
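To make the 'plan' versus 'scorer' distinction concrete, here is a toy, library-free sketch of the two roles. It does not use Inspect's API: every name below (`Sample`, `State`, `add_prefix`, `call_model`, `fake_model`, `exact_match`, `run_eval`) is hypothetical, and the "model" is a deterministic stand-in so the sketch runs offline. In this toy a plan is a list of steps that transform a state and produce the model's answer, and a scorer grades only the final answer against the sample's target. That makes the scorer independent of the plan by construction; this is one possible design and is not a description of how Inspect itself behaves.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-ins for eval concepts; NOT Inspect's real API.

@dataclass
class Sample:
    input: str   # prompt shown to the model
    target: str  # expected answer used by the scorer

@dataclass
class State:
    messages: List[str] = field(default_factory=list)  # conversation so far
    output: str = ""                                   # latest model completion

Step = Callable[[State], State]        # one step of a plan
Scorer = Callable[[State, str], bool]  # grades a final state against a target

def fake_model(prompt: str) -> str:
    # Hypothetical deterministic "model" so the sketch runs without an API key.
    return "Hello World" if "Hello World" in prompt else "I don't know"

def add_prefix(text: str) -> Step:
    # Plan step: put an instruction at the start of the conversation.
    def step(state: State) -> State:
        state.messages.insert(0, text)
        return state
    return step

def call_model() -> Step:
    # Plan step: send the whole conversation to the model and record its reply.
    def step(state: State) -> State:
        state.output = fake_model("\n".join(state.messages))
        state.messages.append(state.output)
        return state
    return step

def exact_match(state: State, target: str) -> bool:
    # Scorer: sees only the final output and the target, not the plan's steps.
    return state.output.strip() == target

def run_eval(samples: List[Sample], plan: List[Step], scorer: Scorer) -> float:
    # Run every sample through the plan, then score it; return accuracy.
    correct = 0
    for sample in samples:
        state = State(messages=[sample.input])
        for step in plan:
            state = step(state)
        correct += scorer(state, sample.target)
    return correct / len(samples)

dataset = [
    Sample(input="Just reply with Hello World", target="Hello World"),
    Sample(input="What is 2 + 2?", target="4"),
]
plan = [add_prefix("You are a helpful assistant."), call_model()]
accuracy = run_eval(dataset, plan, exact_match)
print(f"accuracy = {accuracy}")  # prints "accuracy = 0.5"
```

Because `exact_match` only receives the final state and the target, the plan can be swapped (for example, dropping `add_prefix`) without touching the scorer. The alternative design, where a scorer relies on something a particular plan step produced, couples the two, and that coupling is the kind of trade-off the GitHub issue asks about.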