Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models

post by morganism · 2016-11-12T21:33:03.380Z · LW · GW · Legacy · 1 comments

This is a link post for https://explosion.ai/blog/deep-learning-formula-nlp


comment by morganism · 2016-11-12T21:36:17.228Z · LW(p) · GW(p)

"The main factor that drives the model's accuracy is the bidirectional LSTM encoder, to create the position-sensitive features. The authors demonstrate this by swapping the attention mechanism out for average pooling. With average pooling, the model still outperforms the previous state-of-the-art on all benchmarks. However, the attention mechanism improves performance further on all evaluations. I find this especially interesting. The implications are quite general — there are after all plenty of situations where you want to reduce a matrix to a vector for further prediction, without reference to any particular external context.