Comments

Comment by Kevin Slagle (kevin-slagle) on Addendum: More Efficient FFNs via Attention · 2023-06-19T22:51:53.009Z · LW · GW

This paper looks relevant. They also show that you can get rid of the FFN by modifying the attention slightly:

https://arxiv.org/abs/1907.01470