Posts

Comments

Comment by Vasilios Mavroudis (vasilios-mavroudis) on Recent AI model progress feels mostly like bullshit · 2025-04-07T12:56:42.730Z · LW · GW

Interesting article! Here is a benchmark that tests real-world exploitation (indeed, CTFs don't cut it anymore): https://github.com/alan-turing-institute/HonestCyberEval