Unsupervised Methods for Concept Discovery in AlphaZero

post by aogara (Aidan O'Gara) · 2023-10-26T19:05:57.897Z

This is a link post for https://arxiv.org/abs/2310.16410

Using contrast pairs, the authors extract linear directions in AlphaZero's activation space that correspond to concepts. By studying AlphaZero's play in positions that involve these concepts, human grandmasters can improve their own play.

This is related to the following recent research:

Collin Burns has argued that unsupervised methods for concept discovery should scale to superhuman systems, offering an empirical average-case approach to eliciting latent knowledge (ELK).

AlphaZero has superhuman performance and may not use the same ontology as human players. This creates an opportunity to study the problem of ontology mismatch empirically.

Section 4.1 describes the method for constructing contrast pairs and finding linear directions representing concepts. The full paper can be found at https://arxiv.org/abs/2310.16410.
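To make the idea of "contrast pairs plus a linear direction" concrete, here is a minimal sketch in Python. It is not the paper's procedure (Section 4.1 is the reference for that). It swaps in a generic difference-of-means estimate and runs on synthetic vectors rather than real AlphaZero activations. The function names `concept_direction` and `make_synthetic_pairs`, the dimensions, and the noise levels are all assumptions made for illustration.

```python
import numpy as np


def concept_direction(pos_acts, neg_acts):
    """Unit vector pointing from the mean negative to the mean positive activation.

    Generic difference-of-means estimate, used here only as an illustration.
    """
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def make_synthetic_pairs(n_pairs=200, dim=32, seed=0):
    """Hypothetical stand-in for activations on contrast pairs.

    Each pair shares a random 'base' vector; the positive example is shifted
    along a hidden direction and the negative example is shifted against it.
    """
    rng = np.random.default_rng(seed)
    true_dir = rng.normal(size=dim)
    true_dir /= np.linalg.norm(true_dir)
    base = rng.normal(size=(n_pairs, dim))
    pos = base + 2.0 * true_dir + 0.1 * rng.normal(size=(n_pairs, dim))
    neg = base - 2.0 * true_dir + 0.1 * rng.normal(size=(n_pairs, dim))
    return pos, neg, true_dir


pos, neg, true_dir = make_synthetic_pairs()
direction = concept_direction(pos, neg)

# Cosine similarity between the recovered and the hidden direction.
cosine = float(direction @ true_dir)

# Projection of each activation onto the recovered direction.
pos_scores = pos @ direction
neg_scores = neg @ direction

print("cosine with hidden direction:", cosine)
print("mean projection gap:", float((pos_scores - neg_scores).mean()))
```

Because each pair shares the same base vector, the within-pair difference cancels everything except the concept shift and a little noise. That is the intuition for why contrast pairs can isolate a direction without labels beyond the pairing itself.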
