ml › section 05 of 14

CNNs & Vision

Filters, feature maps, and the architectures that taught machines to see

6 lessons · 2 easy, 2 medium, 2 hard

Lessons

in order

The kernel that slides, multiplies, sums.

Max, average, and why we downsample.

A LeNet-style conv net, one layer at a time in NumPy.

CIFAR-10 end-to-end — augmentation, training, evaluation.

Residuals, identity paths, and why depth stopped hurting.

Treating image patches as tokens — the bridge to attention.