comment by Owain_Evans · 2024-01-07T16:48:25.999Z
(Paper author.) The benchmark came out in September 2021. We published results for newer models in 2022. There are also results for GPT-4 and other models, some of which you can find on Papers with Code's leaderboard (https://paperswithcode.com/sota/question-answering-on-truthfulqa).
comment by Bruce W. Lee (bruce-lee) · 2024-01-07T17:16:09.050Z
Thanks, Owain, for pointing this out. I will make two changes as time allows: 1. make it clearer in all posts when each benchmark paper was released, and 2. for this post, append the additional results and point readers to them.