comment by Owain_Evans · 2024-01-07T16:48:25.999Z

(Paper author.) The benchmark came out in September 2021. Since then, we published some results for new models here [LW · GW] in 2022. There are also results for GPT-4 and other models, some of which you can find on the Papers with Code leaderboard (https://paperswithcode.com/sota/question-answering-on-truthfulqa).

↑ comment by Bruce W. Lee (bruce-lee) · 2024-01-07T17:16:09.050Z

Thanks, Owain, for pointing this out. I will make two changes as time allows: (1) make it clearer in all posts when the benchmark paper was released, and (2) for this post, append the additional results and point readers to them.