Sequence modeling before attention — and the problems that motivated it
Hidden state, shared weights, sequential processing.
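To make the three ideas concrete, here is a minimal sketch of a vanilla RNN step in NumPy. The dimensions (hidden size 4, input size 3) and the tanh activation are illustrative assumptions; the essential points are that one hidden vector `h` summarizes the past, the same weights are reused at every time step, and the sequence is consumed one element at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(4, 4))  # hidden-to-hidden weights, shared across steps
W_x = rng.normal(scale=0.1, size=(4, 3))  # input-to-hidden weights, shared across steps
b = np.zeros(4)

def rnn_step(h, x):
    """One recurrence: new hidden state from the previous state and current input."""
    return np.tanh(W_h @ h + W_x @ x + b)

h = np.zeros(4)                        # initial hidden state
for x in rng.normal(size=(5, 3)):      # sequential processing of a length-5 sequence
    h = rnn_step(h, x)
print(h.shape)                         # the state stays the same size at every step
```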
Unroll the loop to compute gradients across a sequence.
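The unrolling idea can be shown end to end on a scalar RNN, `h_t = tanh(w*h_{t-1} + u*x_t)` with loss `0.5*(h_T - y)^2`; this toy setup is an assumption for illustration. The forward pass stores every hidden state, then a reverse loop over the same steps accumulates the gradient of the shared weight `w`, and a finite-difference check confirms the unrolled gradient.

```python
import numpy as np

def bptt_scalar(w, u, xs, y):
    """Backpropagation through time for a scalar RNN, by explicit unrolling."""
    hs = [0.0]
    for x in xs:                         # forward: store h_t for every step
        hs.append(np.tanh(w * hs[-1] + u * x))
    dL_dh = hs[-1] - y                   # gradient of the loss at the final state
    dw = 0.0
    for t in range(len(xs), 0, -1):      # backward over the unrolled steps
        pre = w * hs[t - 1] + u * xs[t - 1]
        dpre = dL_dh * (1 - np.tanh(pre) ** 2)
        dw += dpre * hs[t - 1]           # w is shared, so contributions sum over t
        dL_dh = dpre * w                 # pass the gradient back one time step
    return dw

xs, y, w, u = [0.5, -0.2, 0.8], 0.3, 0.9, 0.4

def loss(w_):
    h = 0.0
    for x in xs:
        h = np.tanh(w_ * h + u * x)
    return 0.5 * (h - y) ** 2

eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(abs(bptt_scalar(w, u, xs, y) - numeric) < 1e-5)  # True
```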
Why long sequences break plain RNNs: an analytic look at vanishing and exploding gradients.
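The analysis can be checked numerically. For the scalar RNN assumed here, the gradient of the last state with respect to the first is a product of one Jacobian factor per step, each bounded in magnitude by `|w|` (since `tanh' <= 1`); with `|w| < 1` the product shrinks geometrically with sequence length, which is the vanishing-gradient problem, while `|w| > 1` risks explosion instead.

```python
import numpy as np

def grad_wrt_h0(w, xs, u=0.5):
    """d h_T / d h_0 for h_t = tanh(w*h_{t-1} + u*x_t): one chain-rule factor per step."""
    h, grad = 0.0, 1.0
    for x in xs:
        pre = w * h + u * x
        h = np.tanh(pre)
        grad *= w * (1 - np.tanh(pre) ** 2)  # each factor has magnitude <= |w|
    return grad

xs = np.random.default_rng(1).normal(size=50)
short = abs(grad_wrt_h0(0.9, xs[:5]))   # 5 steps: gradient still usable
long = abs(grad_wrt_h0(0.9, xs))        # 50 steps: geometrically smaller
print(short, long)                      # the long-sequence gradient is far smaller
```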
Gates, cell state, and the first real fix for long memory.
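A single LSTM step can be sketched as follows (hypothetical sizes, hidden 4 and input 3, with all four gate projections stacked into one matrix for brevity). The key point is the cell state `c`: its update is additive and gated, so gradients can flow back through many steps via the forget gate instead of being squashed by repeated matrix products.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H, X = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, H + X))  # forget, input, candidate, output stacked
b = np.zeros(4 * H)

def lstm_step(h, c, x):
    z = W @ np.concatenate([h, x]) + b
    f = sigmoid(z[0:H])            # forget gate: how much old memory to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new content to write
    g = np.tanh(z[2 * H:3 * H])    # candidate cell update
    o = sigmoid(z[3 * H:4 * H])    # output gate: what the hidden state exposes
    c = f * c + i * g              # additive memory update, the core of the fix
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(6, X)):
    h, c = lstm_step(h, c, x)
print(h.shape, c.shape)
```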
A lighter alternative to the LSTM that often matches its performance.