Minor interpretability exploration #1: Grokking of modular addition, subtraction, and multiplication for different activation functions

post by Rareș Baron · 2025-02-26T11:35:56.610Z · LW · GW · 2 comments

Contents

  Introduction
    TL;DR results
  Methods
  Results
    Operations
    Activation functions
  Discussion
  Conclusion
  Acknowledgements
2 comments

Epistemic status: a small exploration without prior predictions; results are low-stakes and likely correct.

Introduction

As a personal exercise for building research taste and experience in AI safety, and specifically in interpretability, I have done four minor projects, all building on previously written code. They were done without previously formulated hypotheses or expectations, merely to check low-hanging fruit for anything interesting. In the end, they have not yielded major insights, but I hope they will be of some use and interest to people working in these domains.

This is the first project: extending Neel Nanda’s modular addition network [LW · GW], built for studying grokking, to subtraction and multiplication, as well as to all six activation functions supported by TransformerLens (ReLU, the three GELU variants, SiLU, and SoLU with LayerNorm).

The modular addition grokking results were reproduced with the original code, then re-run while varying the operation (subtraction, multiplication) and the activation function.

TL;DR results

Methods

The basis for these findings is Neel Nanda’s grokking notebook. All modifications are straightforward.
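As a rough illustration of the kind of change involved, the sketch below shows how the operation and the activation function might be swapped in a setup like the original notebook's. The specific hyperparameters, token layout, and variable names here are assumptions for the sake of a self-contained example, not a transcription of the actual code.

```python
import torch
from transformer_lens import HookedTransformer, HookedTransformerConfig

p = 113  # modulus used in the original grokking notebook (assumed here)

# Dataset: all pairs (a, b) followed by an "=" token, labelled by the chosen operation.
a = torch.arange(p).repeat_interleave(p)
b = torch.arange(p).repeat(p)
equals = torch.full_like(a, p)            # index p serves as the "=" token
dataset = torch.stack([a, b, equals], dim=1)

# Swapping the operation only changes the label function.
labels = {
    "add": (a + b) % p,
    "sub": (a - b) % p,
    "mul": (a * b) % p,
}["mul"]

# Swapping the activation function only changes `act_fn` in the config.
cfg = HookedTransformerConfig(
    n_layers=1, n_heads=4, d_model=128, d_head=32, d_mlp=512,
    d_vocab=p + 1, d_vocab_out=p, n_ctx=3,
    act_fn="gelu",             # "relu", "gelu", "gelu_new", "gelu_fast", "silu", "solu_ln"
    normalization_type=None,   # "solu_ln" brings its own LayerNorm inside the MLP
)
model = HookedTransformer(cfg)
```

Training then proceeds exactly as in the original notebook; only the label function and `act_fn` vary between runs.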

All resulting notebooks, extracted graphs, and Word files with clean tabular comparisons can be found here.

Results

Operations

General observations for the three operations[1]:

Activation functions

Specific observations for the activation functions (ReLU, the three GELUs, SiLU, SoLU with LayerNorm); they apply to all operations unless otherwise specified:

Discussion

Checking whether the Fourier-based modular addition algorithm also appears for other simple arithmetic operations is a small test of universality; extending the work to other activation functions used in transformers serves the same purpose.
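For reference, here is a minimal sketch of the kind of Fourier-basis check used in this line of work: project the learned token embeddings onto the Fourier basis over Z_p and see how concentrated the norm is across frequencies. The function names and normalization details are assumptions, not the notebook's exact code.

```python
import torch

def fourier_basis(p: int) -> torch.Tensor:
    """Orthonormal Fourier basis over Z_p: a constant row plus cos/sin rows per frequency."""
    x = torch.arange(p).float()
    rows = [torch.ones(p)]
    for k in range(1, p // 2 + 1):
        rows.append(torch.cos(2 * torch.pi * k * x / p))
        rows.append(torch.sin(2 * torch.pi * k * x / p))
    basis = torch.stack(rows[:p])
    return basis / basis.norm(dim=-1, keepdim=True)

def fourier_norms(W_E: torch.Tensor, p: int) -> torch.Tensor:
    """Norm of the embedding matrix along each Fourier direction.

    W_E: (d_vocab, d_model) learned embedding; W_E[:p] drops the "=" token row.
    A spectrum dominated by a handful of frequencies indicates the sparse Fourier
    representation; a diffuse spectrum suggests the model learned something else.
    """
    return (fourier_basis(p) @ W_E[:p]).norm(dim=-1)
```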

GELU hastens grokking, though it makes the process messier. The function's smoothness, approximately quadratic shape near zero, and negative-valued region appear to be a large help in forming circuits.

That multiplication does not use the Fourier algorithm is a challenge for universality, though since neural networks can represent multiplication natively, this is likely neither a real problem nor strong evidence against the hypothesis.

Other interpretability hypotheses are untouched by this. We also have yet another confirmation that LayerNorm hinders interpretability.

Conclusion

Multiplication does not use a Fourier algorithm, and GELU helps grokking. More research may be needed to identify the algorithm the network learns for multiplication.

Acknowledgements

I would like to thank the wonderful Neel Nanda et al. for starting this research direction, establishing its methods, and writing the relevant code.
 

  1. ^

     The graphs are for ReLU, though these observations apply to all functions. Graphs are, in order: addition, subtraction, multiplication.

2 comments


comment by Gurkenglas · 2025-02-26T13:59:54.325Z · LW(p) · GW(p)

Some of these plots look like they ought to be higher resolution, especially when Epoch is on the x axis. Consider drawing dots instead of lines to make this clearer.

comment by Rareș Baron · 2025-02-26T14:12:41.086Z · LW(p) · GW(p)

I will keep that in mind for the future. Thank you!
I have put all high-quality .pngs of the plots in the linked Drive folder.