A working GPT, built lesson by lesson
BPE from scratch — merges, vocabs, edge cases.
Train a BPE vocab on real text.
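The heart of these BPE lessons is a small loop: count adjacent token pairs, merge the most frequent pair into a new token, repeat. A minimal byte-level sketch (illustrative, not the course's exact code):

```python
from collections import Counter

def get_pair_counts(ids):
    """Count adjacent token-id pairs in a sequence."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` merge rules from the raw bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (id, id) -> new token id
    for n in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + n                    # the byte vocab occupies 0..255
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges, ids
```

On the classic toy string `"aaabdaaabac"`, two merges first fuse the `aa` pair, then the resulting `aa`+`a` pair, exactly as in the textbook BPE walkthrough.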
Whitespace, Unicode, emoji, and the quirks of tiktoken.
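Many of those quirks come from the pre-tokenizer: before any merges run, text is split by a regex that decides where leading spaces, contractions, and symbol runs begin. A simplified sketch of the GPT-2-style pattern, using ASCII classes with the stdlib `re` module (the real pattern used by tiktoken's `gpt2` encoding needs the third-party `regex` module for Unicode categories like `\p{L}`):

```python
import re

# Simplified GPT-2-style pre-tokenizer. ASCII classes stand in for the
# real pattern's Unicode categories; emoji fall into the symbol branch.
PAT = re.compile(
    r"'s|'t|'re|'ve|'m|'ll|'d"   # common English contractions
    r"| ?[A-Za-z]+"              # a word, optionally with its leading space
    r"| ?[0-9]+"                 # a run of digits
    r"| ?[^\sA-Za-z0-9]+"        # punctuation / symbol runs
    r"|\s+(?!\S)"                # trailing whitespace
    r"|\s+"                      # any other whitespace
)

def pretokenize(text):
    """Split text into the chunks BPE merges are applied within."""
    return PAT.findall(text)
```

Note how the leading space sticks to the following word, which is why `" world"` and `"world"` tokenize differently.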
Streaming tokens into the model efficiently.
Context windows, next-token targets, packing.
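The data-loading lessons boil down to one invariant: the target sequence is the input shifted left by one, so the model at position t predicts the token at position t+1. A minimal batch sampler over one long token stream (an illustrative sketch; names and signature are mine, not the course's):

```python
import random

def get_batch(tokens, block_size, batch_size, seed=None):
    """Sample (x, y) training pairs from one long token stream.

    y is x shifted left by one position: the model at position t
    is trained to predict the token at position t + 1.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(batch_size):
        i = rng.randrange(len(tokens) - block_size)
        xs.append(tokens[i : i + block_size])
        ys.append(tokens[i + 1 : i + 1 + block_size])
    return xs, ys
```

Packing many short documents into fixed-length blocks like this keeps every position in every batch contributing a training signal.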
Assemble the full GPT architecture.
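The core operation inside every block of that architecture is causal self-attention. A single-head NumPy sketch (a real GPT block adds multiple heads, an output projection, an MLP, residuals, and LayerNorm):

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a (T, d) sequence."""
    T, _ = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (T, T) attention logits
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)       # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the past
    return weights @ v
```

The mask is what makes this a language model: position 0 can only attend to itself, so its output is exactly its own value vector.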
AdamW, warmup, cosine decay — the real recipe.
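The schedule part of that recipe fits in a few lines: ramp the learning rate linearly during warmup, then decay it along a half cosine to a floor. A sketch with illustrative hyperparameter values (the course's actual settings may differ):

```python
import math

def lr_at(step, max_lr=3e-4, min_lr=3e-5, warmup=100, total=1000):
    """Linear warmup to max_lr, then cosine decay to min_lr."""
    if step < warmup:
        return max_lr * (step + 1) / warmup           # linear warmup
    if step >= total:
        return min_lr                                 # hold the floor
    progress = (step - warmup) / (total - warmup)     # 0 -> 1 over the decay
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)
```

In a training loop you would set this value on the optimizer's param groups each step, rather than relying on a fixed learning rate.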
Sampling: temperature, top-k, nucleus.
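All three sampling knobs compose naturally: temperature rescales the logits, top-k keeps only the k most likely tokens, and nucleus (top-p) keeps the smallest set of tokens whose probability mass reaches p. A pure-Python sketch of one sampling step:

```python
import math
import random

def sample(logits, temperature=1.0, top_k=None, top_p=None, seed=None):
    """Sample a token id from raw logits with temperature, top-k,
    and nucleus (top-p) filtering."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    probs = [math.exp(s - m) for s in scaled]      # stable softmax
    total = sum(probs)
    probs = [p / total for p in probs]
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k is not None:
        order = order[:top_k]                      # keep k most likely
    if top_p is not None:
        kept, cum = [], 0.0
        for i in order:                            # smallest set covering p mass
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        order = kept
    kept_total = sum(probs[i] for i in order)
    rng = random.Random(seed)
    r = rng.random() * kept_total                  # draw within kept mass
    for i in order:
        r -= probs[i]
        if r <= 0:
            return i
    return order[-1]
```

With `top_k=1` this degenerates to greedy decoding; with high temperature and no filtering it approaches uniform sampling.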
KV caching: the single trick behind fast inference.
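That trick is the KV cache: instead of recomputing keys and values for the whole prefix at every generation step, append one new row per step and attend over the cache. A NumPy sketch showing the cached path agrees with a full recompute:

```python
import numpy as np

def attend(q, K, V):
    """One query row attending over all cached keys/values."""
    scores = (q @ K.T) / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

# Toy single-head setup (dimensions are illustrative).
rng = np.random.default_rng(0)
d = 8
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
xs = rng.normal(size=(6, d))                 # one embedding per step

K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
cached_out = []
for x in xs:
    K_cache = np.vstack([K_cache, x @ w_k])  # append this step's key...
    V_cache = np.vstack([V_cache, x @ w_v])  # ...and value, once
    cached_out.append(attend(x @ w_q, K_cache, V_cache))
```

Each step now costs O(T) attention work instead of O(T^2) recomputation, which is why autoregressive decoding is feasible at all.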
Llama-style attention (grouped-query attention): memory savings without accuracy loss.
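Reading "Llama-style attention" as grouped-query attention (GQA): several query heads share each key/value head, so the KV cache shrinks by the grouping factor. A shape-level NumPy sketch (sizes are illustrative, in the spirit of Llama-family configs):

```python
import numpy as np

def repeat_kv(kv, n_rep):
    """Expand (n_kv_heads, T, head_dim) keys/values so each kv head
    serves n_rep consecutive query heads, as in grouped-query attention."""
    return np.repeat(kv, n_rep, axis=0)

# 32 query heads sharing 8 kv heads: a 4:1 grouping.
n_heads, n_kv_heads, T, head_dim = 32, 8, 16, 64
k_cache = np.zeros((n_kv_heads, T, head_dim))          # what is stored
k_expanded = repeat_kv(k_cache, n_heads // n_kv_heads)  # what attention sees
```

The cache stores only `n_kv_heads` heads, so at this grouping it is a quarter the size of a full multi-head cache, while the attention math still runs over all 32 query heads.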