Influence of Hyperparameters on the Convergence of Adam Under the Polyak-Łojasiewicz Inequality
Massei, Stefano
2025-01-01
Abstract
Adaptive moment estimation (Adam) is one of the most commonly used optimizers in the training of neural networks. Existing convergence studies focus on demonstrating that the limit point is stationary, i.e., $\lim_{k\to\infty}\nabla f(x^{(k)})=0$, or on showing that the ratio between the algorithm's regret and the number of iteration steps goes to zero. In this work, we show that under the Polyak-Łojasiewicz inequality, the sequence of objective function values associated with a run of Adam converges linearly up to a neighbourhood of the optimal value. Moreover, our analysis sheds light on the influence of the various hyperparameters on the convergence of the Adam optimizer. Numerical tests are conducted to assess the convergence speed and accuracy achieved by Adam when varying the configuration of the hyperparameters during the training of a multinomial logistic regression model for image classification.
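For context, the Polyak-Łojasiewicz (PL) inequality and the general shape of a "linear convergence up to a neighbourhood" bound can be written as follows; the constants $\mu$, $\rho$, and $C$ are illustrative placeholders and not the specific quantities derived in the paper.

\|\nabla f(x)\|^2 \ge 2\mu \bigl( f(x) - f^* \bigr) \quad \text{for all } x, \qquad \mu > 0,

f(x^{(k)}) - f^* \le \rho^{k} \bigl( f(x^{(0)}) - f^* \bigr) + C, \qquad \rho \in (0,1),

where $f^*$ denotes the optimal value, $\rho$ is a contraction factor, and $C$ determines the size of the neighbourhood of the optimal value that the objective values eventually reach.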

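To make explicit which hyperparameters are meant, the sketch below implements one iteration of the standard Adam update (Kingma and Ba, 2015): step size alpha, exponential decay rates beta1 and beta2, and the stabilizing constant eps. This is a generic illustration, not the code or the specific hyperparameter configurations used in the paper's experiments.

import numpy as np

def adam_step(x, grad, m, v, k, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One iteration of the standard Adam update.

    x    : current iterate x^(k)
    grad : (stochastic) gradient of f at x
    m, v : first- and second-moment estimates from the previous iteration
    k    : iteration counter starting at 1 (used for bias correction)
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2     # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1**k)                # bias-corrected first moment
    v_hat = v / (1 - beta2**k)                # bias-corrected second moment
    x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v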

