Distributed Stochastic Gradient Descent and Convergence to Local Minima

Brian Swenson; Ryan Murray; Soummya Kar; H. Vincent Poor

Abstract


In centralized settings, it is well known that stochastic gradient descent (SGD) avoids saddle points. However, similar guarantees are lacking for distributed first-order algorithms in nonconvex optimization. The paper studies distributed stochastic gradient descent (D-SGD), a simple network-based implementation of SGD, and the conditions under which it converges to local minima. In particular, it is shown that, for each fixed initialization, with probability 1: (i) D-SGD converges to critical points of the objective, and (ii) D-SGD avoids nondegenerate saddle points. To prove these results, we use ODE-based stochastic approximation techniques: the algorithm is approximated by a continuous-time ODE that is easier to study than the discrete-time algorithm. Results are first derived for the continuous-time process and then extended to the discrete-time algorithm. Consequently, the paper studies continuous-time distributed gradient descent (DGD) alongside D-SGD. Because the continuous-time process is easier to analyze, this approach allows for simpler proofs and builds important intuition that is obscured when studying the discrete-time process alone.
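The abstract does not spell out the D-SGD update, so the following is only a minimal sketch under assumptions. It uses the common "consensus + local noisy gradient" form x_i(k+1) = x_i(k) - β_k Σ_{j∈N_i} (x_i(k) - x_j(k)) - α_k (∇f_i(x_i(k)) + ξ_i(k)), where agent i sees only its own objective f_i and its neighbors N_i. The ring graph, the hypothetical local objectives (whose sum has a saddle at the origin and local minima at (±1, 0)), the step-size exponents and the Gaussian noise are all illustrative choices, not taken from the paper. Every agent is started exactly at the saddle to illustrate point (ii) informally on a single run.

```python
import numpy as np


def ring_laplacian(n):
    """Graph Laplacian L = D - A of an undirected ring with n >= 3 agents (illustrative topology)."""
    L = 2.0 * np.eye(n)
    for i in range(n):
        L[i, (i + 1) % n] -= 1.0
        L[i, (i - 1) % n] -= 1.0
    return L


def local_grads(X, c):
    """Row i is grad f_i at agent i's iterate, for the HYPOTHETICAL local objective
    f_i(x, y) = (x**2 - 1)**2 / 4 + y**2 / 2 + c_i * y.
    Because sum(c) == 0, the network objective sum_i f_i equals N * f with
    f(x, y) = (x**2 - 1)**2 / 4 + y**2 / 2: a saddle at (0, 0), minima at (+-1, 0)."""
    G = np.empty_like(X)
    G[:, 0] = X[:, 0] ** 3 - X[:, 0]
    G[:, 1] = X[:, 1] + c
    return G


def d_sgd(n_agents=5, iters=30000, a=0.1, b=0.2, noise=0.5, seed=0):
    """Run the assumed D-SGD iteration; returns an (n_agents, 2) array of final iterates."""
    rng = np.random.default_rng(seed)
    L = ring_laplacian(n_agents)
    c = np.linspace(-1.0, 1.0, n_agents)  # heterogeneous linear terms that sum to zero
    X = np.zeros((n_agents, 2))           # every agent starts exactly at the saddle point
    for k in range(iters):
        alpha = a / (k + 1) ** 0.6        # gradient step: sum diverges, sum of squares converges
        beta = b / (k + 1) ** 0.3         # consensus step decays more slowly than alpha
        xi = noise * rng.standard_normal(X.shape)  # zero-mean, independent gradient noise
        X = X - beta * (L @ X) - alpha * (local_grads(X, c) + xi)
    return X


X = d_sgd()
print("network average:", X.mean(axis=0))
print("max disagreement:", np.abs(X - X.mean(axis=0)).max())
```

In this toy run the gradient noise pushes the agents off the saddle, and the consensus term (weighted by the slower-decaying β_k) keeps them together, so the network average is expected to settle near one of the local minima (±1, 0) rather than at the origin. Dropping the noise term (noise=0) keeps the x-coordinates at 0 forever, which shows why stochasticity matters here.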

