Sequence modeling before attention — and the problems that motivated it
Hidden state, shared weights, sequential processing.
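To make the three ideas concrete, here is a minimal sketch of a vanilla RNN step in NumPy. The dimensions (hidden size 4, input size 3) and the tanh activation are illustrative assumptions; the essential points are that one hidden vector `h` summarizes the past, the same weights are reused at every time step, and the sequence is consumed one element at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(4, 4))  # hidden-to-hidden weights, shared across steps
W_x = rng.normal(scale=0.1, size=(4, 3))  # input-to-hidden weights, shared across steps
b = np.zeros(4)

def rnn_step(h, x):
    """One recurrence: new hidden state from the previous state and current input."""
    return np.tanh(W_h @ h + W_x @ x + b)

h = np.zeros(4)                        # initial hidden state
for x in rng.normal(size=(5, 3)):      # sequential processing of a length-5 sequence
    h = rnn_step(h, x)
print(h.shape)                         # the state stays the same size at every step
```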
Unroll the loop to compute gradients across a sequence.
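The unrolling idea can be shown end to end on a scalar RNN, `h_t = tanh(w*h_{t-1} + u*x_t)` with loss `0.5*(h_T - y)^2`; this toy setup is an assumption for illustration. The forward pass stores every hidden state, then a reverse loop over the same steps accumulates the gradient of the shared weight `w`, and a finite-difference check confirms the unrolled gradient.

```python
import numpy as np

def bptt_scalar(w, u, xs, y):
    """Backpropagation through time for a scalar RNN, by explicit unrolling."""
    hs = [0.0]
    for x in xs:                         # forward: store h_t for every step
        hs.append(np.tanh(w * hs[-1] + u * x))
    dL_dh = hs[-1] - y                   # gradient of the loss at the final state
    dw = 0.0
    for t in range(len(xs), 0, -1):      # backward over the unrolled steps
        pre = w * hs[t - 1] + u * xs[t - 1]
        dpre = dL_dh * (1 - np.tanh(pre) ** 2)
        dw += dpre * hs[t - 1]           # w is shared, so contributions sum over t
        dL_dh = dpre * w                 # pass the gradient back one time step
    return dw

xs, y, w, u = [0.5, -0.2, 0.8], 0.3, 0.9, 0.4

def loss(w_):
    h = 0.0
    for x in xs:
        h = np.tanh(w_ * h + u * x)
    return 0.5 * (h - y) ** 2

eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(abs(bptt_scalar(w, u, xs, y) - numeric) < 1e-5)  # True
```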
Why long sequences break plain RNNs: an analytic look at vanishing and exploding gradients.
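The analysis can be checked numerically. For the scalar RNN assumed here, the gradient of the last state with respect to the first is a product of one Jacobian factor per step, each bounded in magnitude by `|w|` (since `tanh' <= 1`); with `|w| < 1` the product shrinks geometrically with sequence length, which is the vanishing-gradient problem, while `|w| > 1` risks explosion instead.

```python
import numpy as np

def grad_wrt_h0(w, xs, u=0.5):
    """d h_T / d h_0 for h_t = tanh(w*h_{t-1} + u*x_t): one chain-rule factor per step."""
    h, grad = 0.0, 1.0
    for x in xs:
        pre = w * h + u * x
        h = np.tanh(pre)
        grad *= w * (1 - np.tanh(pre) ** 2)  # each factor has magnitude <= |w|
    return grad

xs = np.random.default_rng(1).normal(size=50)
short = abs(grad_wrt_h0(0.9, xs[:5]))   # 5 steps: gradient still usable
long = abs(grad_wrt_h0(0.9, xs))        # 50 steps: geometrically smaller
print(short, long)                      # the long-sequence gradient is far smaller
```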
Gates, cell state, and the first real fix for long memory.
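A single LSTM step can be sketched as follows (hypothetical sizes, hidden 4 and input 3, with all four gate projections stacked into one matrix for brevity). The key point is the cell state `c`: its update is additive and gated, so gradients can flow back through many steps via the forget gate instead of being squashed by repeated matrix products.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H, X = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, H + X))  # forget, input, candidate, output stacked
b = np.zeros(4 * H)

def lstm_step(h, c, x):
    z = W @ np.concatenate([h, x]) + b
    f = sigmoid(z[0:H])            # forget gate: how much old memory to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new content to write
    g = np.tanh(z[2 * H:3 * H])    # candidate cell update
    o = sigmoid(z[3 * H:4 * H])    # output gate: what the hidden state exposes
    c = f * c + i * g              # additive memory update, the core of the fix
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(6, X)):
    h, c = lstm_step(h, c, x)
print(h.shape, c.shape)
```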
A lighter alternative to the LSTM that often matches its performance.