v1 · 75 lessons · 14 sections · 168+ widgets

Derive it yourself.

From gradient descent to GPT — every equation proven, every line of code written from zero, every research nuance unpacked to the paper.

Math first: every gradient derived
NumPy → PyTorch: three-layer code pattern
Live widgets: poke it until it breaks



The Full Roadmap

Fourteen sections, each a self-contained arc. Pick a topic: its section page lists every lesson in order, each with a short description and a difficulty rating.

01 / 14 · 6 lessons

Math Foundations

The calculus and linear algebra behind every neural net

4 easy · 2 medium
  • Gradient Descent
  • Sigmoid & ReLU
  • + 4 more
02 / 14 · 6 lessons

Build a Neural Net

Neurons, layers, and backprop, wired by hand

1 easy · 3 medium · 2 hard
  • Single Neuron
  • Backpropagation
  • + 4 more
03 / 14 · 4 lessons

PyTorch

Swap NumPy for autograd and GPUs

1 easy · 3 medium
  • PyTorch Basics
  • Layer Normalization
  • + 2 more
04 / 14 · 4 lessons

Training

The loop, the diagnostics, the first real model

1 easy · 3 medium
  • Training Loop
  • Training Diagnostics
  • + 2 more
05 / 14 · 6 lessons

CNNs & Vision

Filters, feature maps, and the architectures that taught machines to see

2 easy · 2 medium · 2 hard
  • Convolution Operation
  • Pooling
  • + 4 more
06 / 14 · 5 lessons

RNN & LSTM

Sequence modeling before attention, and the problems that motivated it

2 medium · 3 hard
  • Recurrent Neural Network
  • Backprop Through Time
  • + 3 more
07 / 14 · 4 lessons

NLP

From bag-of-words to dense meaning vectors

1 easy · 2 medium · 1 hard
  • Word Embeddings
  • Intro to Natural Language Processing
  • + 2 more
08 / 14 · 3 lessons

Attention & Transformers

The single mechanism that reshaped deep learning

3 hard
  • Self Attention
  • Multi Headed Self Attention
  • + 1 more
09 / 14 · 10 lessons

Build GPT

A working GPT, built lesson by lesson

5 medium · 5 hard
  • Tokenizer (Byte Pair Encoding)
  • Build Vocabulary
  • + 8 more
10 / 14 · 6 lessons

Fine-Tuning & RLHF

From a base model to an aligned, instruction-following assistant

1 medium · 5 hard
  • Supervised Fine-Tuning
  • LoRA
  • + 4 more
11 / 14 · 4 lessons

Mixture of Experts

Sparse activation: the next axis of scale

1 medium · 3 hard
  • MoE Fundamentals
  • Top-k Routing
  • + 2 more
12 / 14 · 6 lessons

Diffusion Models

Generate images by learning to reverse noise

1 easy · 1 medium · 4 hard
  • Denoising Intuition
  • Forward & Reverse Diffusion
  • + 4 more
13 / 14 · 6 lessons

Reinforcement Learning

Learn from reward signals: the algorithms behind AlphaGo and RLHF

2 medium · 4 hard
  • Markov Decision Processes
  • Q-Learning
  • + 4 more
14 / 14 · 5 lessons

Inference & Serving

Ship the model: make it fast, cheap, and production-ready

1 medium · 4 hard
  • Quantization Basics
  • INT8 & INT4 Quantization
  • + 3 more