Comments sorted by top scores.
comment by Tristan Wegner (tristan-wegner) · 2023-11-25T10:31:23.005Z · LW(p) · GW(p)
I think you made an off-by-100 error in "Unlabeled Evaluation": all win rates are <1%.
comment by Bruce W. Lee (bruce-lee) · 2023-11-25T15:06:07.863Z · LW(p) · GW(p)
Thanks for pointing that out. Sometimes, the rows will not add up to 100 because there were some responses where the model refused to answer.
comment by Tristan Wegner (tristan-wegner) · 2023-12-13T06:05:01.795Z · LW(p) · GW(p)
No. By "off by 100" I meant off by a factor of 100, i.e., too small by 100x, NOT that the rows don't sum to 100.
comment by Bruce W. Lee (bruce-lee) · 2023-12-14T01:29:29.436Z · LW(p) · GW(p)
Yeah, I see it. It's fixed now. Thanks!
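For what it's worth, this kind of factor-of-100 error usually comes from reporting a raw fraction in a column labeled as a percentage. A minimal sketch of that failure mode (the variable names and numbers here are mine, not from the post):

```python
# Hypothetical reconstruction of the likely bug: wins / total already
# yields a fraction in [0, 1], so printing that fraction in a column
# labeled "%" understates every win rate by a factor of 100.
wins, total = 45, 100
fraction = wins / total        # 0.45 -- reads as "<1%" if mislabeled
percent = 100 * fraction       # 45.0 -- the value a percent column should hold
print(f"win rate: {percent:.1f}%")
```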
comment by red75prime · 2023-11-24T09:45:43.059Z · LW(p) · GW(p)
This means that LLMs can inadvertently learn to replicate these biases in their outputs.
Or the network learns to place more trust in tokens that were already "thought about" during generation.
comment by Bruce W. Lee (bruce-lee) · 2023-11-25T15:06:59.692Z · LW(p) · GW(p)
How is this possible? We are only running inference.