From gradient descent to GPT — every equation proven, every line of code written from zero, every research nuance unpacked down to the original paper.
hover the rows · toggle causal masking · drag the temperature
See the full derivation — Self Attention · 168 more widgets inside
Fourteen sections, each a self-contained arc. Pick a topic — the section page opens with every lesson in order, blurbed and ranked by difficulty.
The calculus and linear algebra behind every neural net
Neurons, layers, and backprop — wired by hand
Swap NumPy for autograd and GPUs
The training loop, the diagnostics, the first real model
Filters, feature maps, and the architectures that taught machines to see
Sequence modeling before attention — and the problems that motivated it
From bag-of-words to dense meaning vectors
The single mechanism that reshaped deep learning
A working GPT, built lesson by lesson
From a base model to an aligned, instruction-following assistant
Sparse activation — the next axis of scale
Generate images by learning to reverse noise
Learn from reward signals — the algorithms behind AlphaGo and RLHF
Ship the model — make it fast, cheap, and production-ready