Comment by Kevin Slagle (kevin-slagle) on Addendum: More Efficient FFNs via Attention · 2023-06-19T22:51:53.009Z
This paper looks relevant. They also show that you can get rid of the FFN by slightly modifying the attention:
https://arxiv.org/abs/1907.01470
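For a rough idea of what "modifying the attention slightly" could look like, here is a minimal sketch. It assumes the modification is to append a set of learned, input-independent key/value vectors to the usual context keys and values, so that a single attention layer also covers the role of the FFN. The function names, the single-head setup, and the toy dimensions are all made up for illustration and are not taken from the paper's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def attend_with_memory(query, ctx_keys, ctx_values, mem_keys, mem_values):
    """Attention over context tokens plus extra 'memory' slots.

    mem_keys / mem_values stand in for learned parameters that do not
    depend on the input (an assumption of this sketch). They are simply
    concatenated with the context keys/values before the softmax.
    """
    return attend(query, ctx_keys + mem_keys, ctx_values + mem_values)


# Toy example: one context token and one memory slot with identical keys,
# so the softmax splits its weight evenly between them.
out = attend_with_memory(
    query=[1.0, 0.0],
    ctx_keys=[[1.0, 0.0]],
    ctx_values=[[1.0, 0.0]],
    mem_keys=[[1.0, 0.0]],
    mem_values=[[0.0, 1.0]],
)
```

With no memory slots this reduces to ordinary attention, which is why the change can be described as slight: the only new parameters are the extra key/value vectors.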