The single mechanism that reshaped deep learning.
Queries, keys, values — derived and animated.
Parallel attention heads specializing in different patterns.
Attention + MLP + norms + residuals — one layer.
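
As a rough illustration of how the pieces named above fit together, here is a minimal sketch in plain Python (standard library only). It is not taken from the source. The pre-norm ordering, the ReLU activation, the absence of biases and learned norm parameters, and every name in it (`attention`, `multi_head_attention`, `transformer_layer`, `init_params`, the `wq`/`wk`/`wv`/`wo`/`w1`/`w2` keys) are assumptions chosen for brevity.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


def split_heads(x, n_heads):
    """Slice the feature dimension into n_heads equal chunks."""
    d_head = len(x[0]) // n_heads
    return [[row[h * d_head:(h + 1) * d_head] for row in x] for h in range(n_heads)]


def multi_head_attention(x, p, n_heads):
    """Project to Q, K, V, attend per head in parallel, concatenate, project out."""
    q, k, v = matmul(x, p["wq"]), matmul(x, p["wk"]), matmul(x, p["wv"])
    heads = [
        attention(qh, kh, vh)[0]
        for qh, kh, vh in zip(split_heads(q, n_heads),
                              split_heads(k, n_heads),
                              split_heads(v, n_heads))
    ]
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    return matmul(concat, p["wo"])


def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean, unit variance (no learned gain/bias)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def mlp(x, p):
    """Two-layer feed-forward network with ReLU (assumed activation)."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, p["w1"])]
    return matmul(hidden, p["w2"])


def add(a, b):
    """Element-wise sum, used for the residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transformer_layer(x, p, n_heads):
    """One layer: attention and MLP sublayers, each with norm and residual (pre-norm assumed)."""
    x = add(x, multi_head_attention(layer_norm(x), p, n_heads))
    x = add(x, mlp(layer_norm(x), p))
    return x


def init_params(d_model, d_ff, seed=0):
    """Hypothetical helper: small random weights for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"wq": mat(d_model, d_model), "wk": mat(d_model, d_model),
            "wv": mat(d_model, d_model), "wo": mat(d_model, d_model),
            "w1": mat(d_model, d_ff), "w2": mat(d_ff, d_model)}


if __name__ == "__main__":
    params = init_params(d_model=4, d_ff=8)
    tokens = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2], [1.0, 0.0, -1.0, 0.5]]
    out = transformer_layer(tokens, params, n_heads=2)
    print(len(out), len(out[0]))  # 3 4: same shape in, same shape out
```

The layer returns a sequence of the same shape it receives, which is what lets such layers be stacked. Real implementations would typically use a tensor library, learned normalization parameters, and masking, all of which are omitted here.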