Posts

A Chess-GPT Linear Emergent World Representation 2024-02-08T04:25:15.222Z

Comments

Comment by karvonenadam on A Chess-GPT Linear Emergent World Representation · 2024-02-09T02:56:02.272Z · LW · GW

That's an interesting idea; I may test it out at some point. I'm assuming the softmax would be for kings and queens, where there is typically only one on the board, rather than for, e.g., blank squares or pawns?
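One way to read the suggestion is as a softmax taken across the 64 squares for a single piece type, so the probe outputs a distribution over where that piece sits. The sketch below is illustrative only and not taken from the post: the logits are made up, `most_likely_square` is a hypothetical helper, and the square indexing (a1 = 0, b1 = 1, ..., h8 = 63) is an assumption.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def most_likely_square(logits):
    """Return (square name, probability) for the highest-probability square.

    Assumes 64 logits indexed a1 = 0, b1 = 1, ..., h8 = 63 (an assumption,
    not the indexing used in the post).
    """
    probs = softmax(logits)
    idx = max(range(64), key=probs.__getitem__)
    name = "abcdefgh"[idx % 8] + str(idx // 8 + 1)
    return name, probs[idx]

# Made-up probe outputs for "white king": one strongly preferred square.
king_logits = [0.0] * 64
king_logits[4] = 5.0  # index 4 is e1 under the assumed indexing

square, p = most_likely_square(king_logits)
print(square, round(p, 3))
```

Because the probabilities sum to one across the board, this only makes sense for pieces that appear at most once, which is why it would not carry over to blank squares or pawns.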

Comment by karvonenadam on A Chess-GPT Linear Emergent World Representation · 2024-02-09T02:54:58.206Z · LW · GW

The model trained on the all-Stockfish dataset played at a level that was 100-200 Elo higher in my tests, with a couple of caveats. First, I benchmarked the LLMs against Stockfish, so an all-Stockfish dataset seems helpful for this benchmark. Second, the Stockfish LLM would probably have an advantage in robustness, because I included a small percentage of Stockfish vs. random move generator games in the Stockfish dataset in the hope that it would improve its ability.

Unfortunately, I haven't done an in-depth qualitative assessment of their abilities, so I can't give a more detailed answer.
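For a sense of scale, a 100-200 Elo gap can be converted into an expected score using the standard Elo formula. This is a minimal sketch and not part of the tests described above; `expected_score` is a hypothetical helper name.

```python
def expected_score(elo_diff):
    """Expected score (win = 1, draw = 0.5) for the higher-rated side,
    given its rating advantage, under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400))

for diff in (100, 200):
    print(diff, round(expected_score(diff), 2))
# prints:
# 100 0.64
# 200 0.76
```

Under that model, a 100-200 Elo advantage corresponds to scoring roughly 64% to 76% in head-to-head play.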

Comment by karvonenadam on A Chess-GPT Linear Emergent World Representation · 2024-02-09T02:48:34.791Z · LW · GW

Yes. In this recent OpenAI superalignment paper, they said that GPT-4's training data included a dataset of chess games filtered for players rated above 1800 Elo. Given gpt-3.5-turbo-instruct's chess ability, I'm guessing that its training data included a similar collection.