Filters, feature maps, and the architectures that taught machines to see
The kernel that slides, multiplies, sums.
Max, average, and why we downsample.
A LeNet-style conv net, one layer at a time in NumPy.
CIFAR-10 end-to-end — augmentation, training, evaluation.
Residuals, identity paths, and why depth stopped hurting.
Treating image patches as tokens — the bridge to attention.