The calculus and linear algebra behind every neural net.
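To make that concrete, here is a minimal sketch of the chain rule at work on a tiny one-hidden-layer network (all shapes and values below are made up for illustration), with a finite-difference check that the hand-derived gradient is right:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)           # input vector (hypothetical size)
W1 = rng.standard_normal((4, 3))     # hidden-layer weights
w2 = rng.standard_normal(4)          # output weights

def forward(W1):
    h = np.maximum(W1 @ x, 0.0)      # ReLU hidden layer
    return w2 @ h                    # scalar output

# Chain rule by hand: dy/dW1[i, j] = w2[i] * relu'(z[i]) * x[j]
z = W1 @ x
grad = np.outer(w2 * (z > 0), x)

# Finite-difference check on a single entry
eps = 1e-6
W1_bumped = W1.copy()
W1_bumped[0, 0] += eps
numeric = (forward(W1_bumped) - forward(W1)) / eps
print(grad[0, 0], numeric)           # the two should agree to ~6 digits
```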
The workhorse optimizer — derive, implement, and visualize it.
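Assuming the workhorse in question is plain gradient descent, a minimal sketch on a one-dimensional quadratic shows the whole algorithm; the learning rate of 0.1 is an arbitrary choice:

```python
# Gradient descent on f(x) = (x - 3)^2, whose gradient is 2*(x - 3).
lr = 0.1        # learning rate (step size)
x = 0.0         # starting point
for step in range(25):
    grad = 2.0 * (x - 3.0)   # analytic gradient of f at x
    x -= lr * grad           # step opposite the gradient
    if step % 5 == 0:
        print(f"step {step:2d}  x = {x:.5f}")
print("converged near x =", round(x, 4))  # the minimum is at x = 3
```

Each iterate contracts toward the minimum by a constant factor, which is exactly the behavior a convergence plot would visualize.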
Activation functions, their derivatives, and why ReLU won.
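A sketch of three common activations' derivatives (using the usual subgradient convention of 0 at the ReLU kink); the printout hints at why ReLU won: sigmoid and tanh gradients vanish for large inputs, while ReLU's stays exactly 1 on the positive side:

```python
import math

def d_sigmoid(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)            # peaks at 0.25, vanishes for large |x|

def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2  # peaks at 1, also vanishes

def d_relu(x):
    return 1.0 if x > 0 else 0.0    # constant 1 on the positive side

for x in (0.0, 2.0, 10.0):
    print(f"x={x:5.1f}  sigmoid'={d_sigmoid(x):.2e}  "
          f"tanh'={d_tanh(x):.2e}  relu'={d_relu(x):.0f}")
```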
Turn raw logits into normalized probabilities.
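Presumably this refers to softmax; here is a minimal, numerically stable sketch (subtracting the max prevents exp from overflowing on large logits and cancels in the ratio):

```python
import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability; the shift
    # cancels when the exponentials are normalized.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
p = softmax(logits)
print(p)           # e.g. [0.659 0.242 0.099]
print(p.sum())     # a valid distribution: sums to 1.0
```

Note that softmax outputs sum to 1 but are not automatically well calibrated; post-hoc techniques such as temperature scaling are typically used for that.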
The canonical classification loss — from KL divergence down.
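The derivation, compressed into a numeric check with made-up distributions: KL(p‖q) = H(p, q) − H(p), so with the target p fixed, minimizing cross-entropy minimizes the KL divergence:

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # target distribution (made up)
q = np.array([0.5, 0.3, 0.2])   # model distribution (made up)

cross_entropy = -(p * np.log(q)).sum()      # H(p, q)
entropy       = -(p * np.log(p)).sum()      # H(p)
kl            =  (p * np.log(p / q)).sum()  # KL(p || q)

# KL(p||q) = H(p,q) - H(p): same minimizer in q, since H(p) is constant
print(np.isclose(kl, cross_entropy - entropy))  # True
```

With a one-hot target, H(p) = 0 and the two quantities coincide exactly.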
Predictions as matrix multiplications.
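A sketch with arbitrary shapes: stack the inputs as the rows of X, and a whole batch of linear predictions collapses into a single matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # 5 examples, 3 features each
w = rng.standard_normal(3)        # one weight per feature
b = 0.5                           # bias

y_hat = X @ w + b                 # all 5 predictions in one matmul
print(y_hat.shape)                # (5,)

# Same result as predicting one example at a time
assert np.allclose(y_hat, [row @ w + b for row in X])
```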
Closed-form vs iterative — when each wins.
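A sketch comparing the two on least squares with synthetic data (the hyperparameters below are arbitrary): the closed-form normal equations give the exact answer in one solve but scale poorly with feature count, while gradient descent walks to the same weights in many cheap steps:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(100)

# Closed form: solve the normal equations X^T X w = X^T y
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Iterative: gradient descent on mean squared error
w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = 2.0 / len(y) * X.T @ (X @ w - y)
    w -= lr * grad

print(w_closed)
print(w)          # both land near w_true = [1, -2, 0.5]
```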