Posts
Comments
The code is currently not public. We intend to make it public once we have finished a few more projects with the same codebase. One of the things we would like to look at is varying the amount of noise. I don't have great intuitions for what the loss landscape of a model trained on a finite random dataset will look like.
As to the translational symmetry of the circuits, the measure just sums the absolute difference between adjacent elements parallel to the diagonal, does the same for elements perpendicular to the diagonal and takes the difference of the two sums. The intuition behind this is that if the circuit has translational symmetry, the relationship between vocabulary element i and j would be the same as the relationship i+1 and j+1. We subtract the lines perpendicular to the diagonal to avoid our measure becoming very large for a circuit that is just very uniform in all directions. We expect the circuits to have translational symmetry because we expect the sorting to work the same across all the vocabulary (except for the first and the last vocabulary). If you compare two numbers a and b for the purpose of sorting, the only thing that should matter is the difference between a and b, not their absolute scale. When a circuit for instance does something like "vocabulary elements attends to the smallest number larger than itself", that should only depend on the relationship difference between itself and all the numbers, not on their overall magnitude. I do agree that our translational symmetry measure is somewhat arbitrary, and that we instead could have looked at the standard deviation of lines parallel and perpendicular to the diagonal, or something like that. I expect that the outcome would have been largely the same.
As to how to interpret the circuits, Callum goes into some more detail on interpreting the final form of the baseline 2-head model here (select [October] Solutions in the menu on the left).
Thanks for the suggestion! You can access the still images that have been used to generate the gifs here. We have also added the link to the still images to the post!