Momentum and Adam Optimizers

🔥 Vibe Prompt

"Compare SGD, Momentum, and Adam on f(x,y)=x²+10y². Plot the parameter trajectories."

Overview of Optimization Algorithms

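| Algorithm | Key idea | Update rule | Typical use |
|-----------|----------|-------------|-------------|
| SGD | Plain gradient descent | w = w - α·g | Simple problems |
| Momentum | Adds a velocity term | v = β·v + (1-β)·g, w = w - α·v | Damping oscillations |
| RMSProp | Adaptive learning rate | v = β·v + (1-β)·g², w = w - α·g/(√v + ε) | Sparse features |
| Adam | Momentum + RMSProp | m = β₁·m + (1-β₁)·g, v = β₂·v + (1-β₂)·g², w = w - α·m_hat/(√v_hat + ε) | General-purpose default |

Here g is the gradient, α the learning rate, and m_hat = m/(1-β₁ᵗ), v_hat = v/(1-β₂ᵗ) are the bias-corrected moments at step t. The Momentum row uses the exponential-moving-average form that the implementation below also uses; the classical form v = β·v + g is the same method with the learning rate rescaled to α·(1-β).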
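The table lists RMSProp, so a minimal sketch of it may help for comparison with the other update rules. The function name `rmsprop`, its defaults (`lr=0.01`, `beta=0.9`, `eps=1e-8`), and the helper `quad_grad` are assumptions chosen for illustration; they are not taken from any particular library.

```python
import numpy as np

def rmsprop(grad, w0, lr=0.01, beta=0.9, eps=1e-8, steps=100):
    """Hypothetical RMSProp sketch: divide each step by a running RMS of the gradient."""
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)  # running mean of squared gradients
    history = [w.copy()]
    for _ in range(steps):
        g = grad(w)
        v = beta * v + (1 - beta) * g * g
        w = w - lr * g / (np.sqrt(v) + eps)  # per-parameter step size
        history.append(w.copy())
    return w, history

def quad_grad(w):
    # Gradient of f(x, y) = x² + 10y²
    return np.array([2 * w[0], 20 * w[1]])

w_final, hist = rmsprop(quad_grad, np.array([5.0, 5.0]), lr=0.05, steps=300)
print("start distance:", np.linalg.norm(hist[0]))
print("final distance:", np.linalg.norm(w_final))
```

Because each coordinate is divided by its own running RMS, the x and y directions move at a similar pace even though their curvatures differ by a factor of ten.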

Implementation

import numpy as np

def sgd(grad, w0, lr=0.1, steps=100):
    w = w0.copy()
    history = [w.copy()]
    for _ in range(steps):
        w = w - lr * grad(w)
        history.append(w.copy())
    return w, history

def momentum(grad, w0, lr=0.1, beta=0.9, steps=100):
    w = w0.copy()
    v = np.zeros_like(w)
    history = [w.copy()]
    for _ in range(steps):
        g = grad(w)
        v = beta * v + (1 - beta) * g
        w = w - lr * v
        history.append(w.copy())
    return w, history

def adam(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    w = w0.copy()
    m = np.zeros_like(w)  # first moment (running mean of gradients)
    v = np.zeros_like(w)  # second moment (running mean of squared gradients)
    history = [w.copy()]
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        history.append(w.copy())
    return w, history

# Comparison on f(x,y) = x² + 10y²
def grad(w):
    return np.array([2 * w[0], 20 * w[1]])

w0 = np.array([5.0, 5.0])

_, h_sgd = sgd(grad, w0, lr=0.05, steps=100)
_, h_mom = momentum(grad, w0, lr=0.05, steps=100)
_, h_adam = adam(grad, w0, lr=0.05, steps=100)

print(f"SGD final: {h_sgd[-1]}")
print(f"Momentum final: {h_mom[-1]}")
print(f"Adam final: {h_adam[-1]}")

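# Compare convergence speed: distance from the minimum at (0, 0)
print("\nDistance to the optimum (closer to 0 means faster convergence):")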
for step in [0, 10, 20, 50, 100]:
    s = np.linalg.norm(h_sgd[step])
    m = np.linalg.norm(h_mom[step])
    a = np.linalg.norm(h_adam[step])
    print(f"Step {step:3d}: SGD={s:.3f} Momentum={m:.3f} Adam={a:.3f}")
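Because f(x,y)=x²+10y² is a separable quadratic, the plain gradient-descent numbers can be checked by hand: each step multiplies x by (1 - 2α) and y by (1 - 20α), so the iteration is stable only when α < 0.1. The sketch below compares that closed form against explicit iteration; the helper names `sgd_closed_form` and `sgd_iterate` are hypothetical and exist only for this illustration.

```python
import math

def sgd_closed_form(w0, lr, k):
    """Hypothetical helper: position after k gradient steps on f(x, y) = x² + 10y²."""
    x0, y0 = w0
    # The gradient is (2x, 20y), so each coordinate shrinks by a fixed factor per step.
    return (x0 * (1 - 2 * lr) ** k, y0 * (1 - 20 * lr) ** k)

def sgd_iterate(w0, lr, k):
    """Run the same k steps explicitly, for comparison with the closed form."""
    x, y = w0
    for _ in range(k):
        x, y = x - lr * 2 * x, y - lr * 20 * y
    return (x, y)

for k in [0, 10, 50]:
    cx, cy = sgd_closed_form((5.0, 5.0), 0.05, k)
    ix, iy = sgd_iterate((5.0, 5.0), 0.05, k)
    print(f"k={k:2d}: closed form={math.hypot(cx, cy):.6f} iterated={math.hypot(ix, iy):.6f}")
```

With α=0.05 the y factor is 1 - 20·0.05 = 0, so y lands on the minimum after a single step, while x shrinks by a factor of 0.9 per step. Any α above 0.1 makes |1 - 20α| exceed 1 and the y coordinate diverges.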

Why Adam Works Well

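| Feature | Benefit |
|---------|---------|
| Adaptive learning rate | Each parameter gets its own effective step size |
| Momentum | Smooths out oscillations in the gradient |
| Bias correction | Moment estimates are not biased toward zero during the first steps |
| Robust to the learning rate | The default α=0.001 is a reasonable starting point for many problems |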
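The bias-correction row can be checked numerically. With m and v initialized to zero, the raw first-moment estimate after one step is only (1-β₁)·g, and dividing by (1-β₁ᵗ) restores it to g. The following is a small sketch, assuming a constant scalar gradient g=4.0 picked purely for illustration:

```python
import math

beta1, beta2, lr, eps = 0.9, 0.999, 0.001, 1e-8
g = 4.0  # assumed constant gradient, for illustration only

m = v = 0.0
for t in range(1, 4):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    step = lr * m_hat / (math.sqrt(v_hat) + eps)
    print(f"t={t}: m={m:.4f} m_hat={m_hat:.4f} step={step:.6f}")
```

For a constant gradient the corrected estimates equal g and g² at every step, so the update size stays close to α regardless of the gradient's scale. Without the correction, the uncorrected m would start at one tenth of g and only approach it gradually.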

Summary

| Algorithm | Adaptive LR? | Momentum? | Default α |
|-----------|--------------|-----------|-----------|
| SGD | No | No | 0.01 |
| Momentum | No | Yes | 0.01 |
| RMSProp | Yes | No | 0.001 |
| Adam | Yes | Yes | 0.001 |

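Chapter Summary

Understand the core concepts and principles
Learn the implementation methods and techniques
Become familiar with common problems and their solutions
Apply the material to real projects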
