From gradient descent to GPT — every equation proven, every line of code written from zero, every research nuance unpacked down to the original paper.
hover the rows · toggle causal masking · drag the temperature
See the full derivation — Self Attention · 168 more widgets inside
Fourteen sections, each a self-contained arc. Pick a topic — the section page opens with every lesson in order, blurbed and ranked by difficulty.
The calculus and linear algebra behind every neural net
Neurons, layers, and backprop — wired by hand
Swap NumPy for autograd and GPUs
The training loop, the diagnostics, the first real model
Filters, feature maps, and the architectures that taught machines to see
Sequence modeling before attention — and the problems that motivated it
From bag-of-words to dense meaning vectors
The single mechanism that reshaped deep learning
A working GPT, built lesson by lesson
From a base model to an aligned, instruction-following assistant
Sparse activation — the next axis of scale
Generate images by learning to reverse noise
Learn from reward signals — the algorithms behind AlphaGo and RLHF
Ship the model — make it fast, cheap, and production-ready