title: "Gradient Descent & Numerical Optimization"
description: "The soul of ML training! From Gradient Descent to Adam Optimizer, implement various optimizers with Vibe Coding."
duration: "120 minutes"
difficulty: "Advanced"

Gradient Descent & Numerical Optimization

Training a machine learning model is fundamentally an optimization problem: find the parameters that minimize a loss function.

This course starts from basic gradient descent and progressively implements Momentum, RMSProp, Adam, and other modern optimizers.
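As a preview of that starting point, here is a minimal sketch of basic gradient descent. The loss f(x) = (x - 3)^2, the helper name `gradient_descent`, and the learning rates are assumptions chosen for illustration, not taken from the course material.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


# Hypothetical loss f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
def grad_f(x):
    return 2.0 * (x - 3.0)


x_min = gradient_descent(grad_f, x0=0.0)
print(x_min)  # close to 3.0, the minimizer of f

# With too large a learning rate, each step overshoots and the iterates diverge.
x_bad = gradient_descent(grad_f, x0=0.0, lr=1.1, steps=50)
print(abs(x_bad - 3.0))  # far from the minimizer
```

On this particular loss, each step multiplies the distance to the minimizer by (1 - 2 * lr), which is why lr = 0.1 converges and lr = 1.1 blows up.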

Course Outline

  1. Gradient Descent Fundamentals - intuitive calculus, basic GD algorithm
  2. Momentum & Adam - adding momentum and adaptive learning rates
  3. SGD & Mini-Batch - stochastic optimization for large-scale data
  4. Automatic Differentiation - building an autograd engine from scratch
  5. Hands-On: Linear Regression - integrate all optimizers
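Outline items 3 and 5 can be previewed together with a small sketch: mini-batch SGD fitting a one-variable linear regression. The function name `minibatch_sgd`, the synthetic data, and all hyperparameters are hypothetical choices for this example, and it uses only the Python standard library.

```python
import random


def minibatch_sgd(xs, ys, lr=0.1, batch_size=8, epochs=50, seed=0):
    """Fit y ~ w * x + b by mini-batch SGD on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # draw new random batches every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the batch MSE: a cheap estimate of the full-data gradient.
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            grad_w = 2.0 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2.0 * sum(errors) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noise-free data on the line y = 2x + 1.
xs = [i / 10 for i in range(-20, 21)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = minibatch_sgd(xs, ys)
print(w, b)  # should land near w = 2, b = 1
```

Each update touches only `batch_size` examples, so its cost does not grow with the dataset size. That is the property that makes this approach practical for large-scale data.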

Key Takeaways

  • Gradient descent is the core optimization algorithm in machine learning
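  • The gradient points in the direction of steepest increase, so gradient descent steps in the opposite direction
  • The learning rate scales the step size: too small converges slowly, too large can overshoot or diverge
  • SGD approximates the full gradient using randomly sampled examples or mini-batches
  • Momentum accumulates past gradients, which accelerates convergence and damps oscillation
  • Adam combines Momentum with RMSProp-style adaptive learning rates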
  • Understanding optimizer variants is crucial for deep learning training
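To make the takeaways about Momentum and Adam concrete, the sketch below writes plain gradient descent, Momentum, and Adam as interchangeable update steps and runs each on the same toy loss. The loss f(x, y) = 0.5 * (x^2 + 25 * y^2), the function names, the starting point, and the learning rates are assumptions for illustration. The Adam defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are the commonly used values, and the Momentum form shown is one of several conventions.

```python
import math


def grad(p):
    """Gradient of the hypothetical loss f(x, y) = 0.5 * (x**2 + 25 * y**2)."""
    x, y = p
    return [x, 25.0 * y]


def loss(p):
    x, y = p
    return 0.5 * (x * x + 25.0 * y * y)


def gd_step(p, g, state, lr=0.03):
    """Plain gradient descent: step against the current gradient."""
    return [pi - lr * gi for pi, gi in zip(p, g)]


def momentum_step(p, g, state, lr=0.03, beta=0.9):
    """Momentum: step along a decaying running sum of past gradients."""
    v = state.setdefault("v", [0.0] * len(p))
    for i, gi in enumerate(g):
        v[i] = beta * v[i] + gi
    return [pi - lr * vi for pi, vi in zip(p, v)]


def adam_step(p, g, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum-style first moment, RMSProp-style second moment."""
    m = state.setdefault("m", [0.0] * len(p))
    s = state.setdefault("s", [0.0] * len(p))
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    new_p = []
    for i, gi in enumerate(g):
        m[i] = beta1 * m[i] + (1.0 - beta1) * gi       # running mean of gradients
        s[i] = beta2 * s[i] + (1.0 - beta2) * gi * gi  # running mean of squared gradients
        m_hat = m[i] / (1.0 - beta1 ** t)              # bias correction for zero init
        s_hat = s[i] / (1.0 - beta2 ** t)
        new_p.append(p[i] - lr * m_hat / (math.sqrt(s_hat) + eps))
    return new_p


def run(step, steps=300):
    """Apply one optimizer step function repeatedly from a fixed start."""
    p, state = [5.0, 1.0], {}
    for _ in range(steps):
        p = step(p, grad(p), state)
    return p


for name, step in [("gd", gd_step), ("momentum", momentum_step), ("adam", adam_step)]:
    print(name, loss(run(step)))
```

On this ill-conditioned loss, plain gradient descent is held back by the flat x direction, while Momentum at the same learning rate reaches a much lower loss in the same number of steps. The comparison holds for this toy setup only and is not a general ranking of the optimizers.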