Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models
post by morganism · 2016-11-12T21:33:03.380Z · LW · GW · Legacy · 1 comment
This is a link post for https://explosion.ai/blog/deep-learning-formula-nlp
1 comment
Comments sorted by top scores.
comment by morganism · 2016-11-12T21:36:17.228Z · LW(p) · GW(p)
"The main factor that drives the model's accuracy is the bidirectional LSTM encoder, to create the position-sensitive features. The authors demonstrate this by swapping the attention mechanism out for average pooling. With average pooling, the model still outperforms the previous state-of-the-art on all benchmarks. However, the attention mechanism improves performance further on all evaluations. I find this especially interesting. The implications are quite general — there are after all plenty of situations where you want to reduce a matrix to a vector for further prediction, without reference to any particular external context."
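The contrast the quote describes — average pooling versus an attention-weighted sum, both reducing a matrix of encoder states to a single vector — can be sketched in a few lines of numpy. This is only an illustration of the general idea, not the paper's implementation; `w` stands in for a hypothetical learned attention parameter, and the matrix `H` for a stack of BiLSTM output states.

```python
import numpy as np

def average_pool(H):
    # Reduce a (seq_len, dim) matrix to a (dim,) vector by averaging rows.
    return H.mean(axis=0)

def attention_pool(H, w):
    # "Inner" attention without external context: score each row against a
    # learned vector w, softmax the scores, and return the weighted sum of rows.
    scores = H @ w                       # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over positions
    return weights @ H                   # (dim,)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))   # e.g. 5 encoder states of width 4
w = rng.normal(size=4)        # hypothetical learned attention vector

v_avg = average_pool(H)
v_att = attention_pool(H, w)
assert v_avg.shape == v_att.shape == (4,)
```

Both reductions yield a fixed-size vector a downstream classifier can consume; attention simply lets the model learn which positions to weight instead of treating them all equally.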