SMAC Talk
Accelerated Stochastic Optimization: Optimism, Coordinate Descent and Beyond
Fri, Feb 05, 2021, 10:10 AM to 11:00 AM
Presented By
Anant Raj (Max Planck Institute for Intelligent Systems, Tuebingen, Germany)
Event Series: SMAC Talks

Recently, there have been several attempts to extend Nesterov’s accelerated algorithm to smooth stochastic and variance-reduced optimization. We show that there is a simpler approach to acceleration: applying optimistic online learning algorithms and querying the gradient oracle at the online average of the intermediate optimization iterates. In particular, we tighten a recent result of Cutkosky (2019) to demonstrate theoretically that online iterate averaging reduces the optimization gap, independently of the algorithm involved. We show that carefully combining this technique with existing generic optimistic online learning algorithms yields the optimal accelerated rates for optimizing strongly-convex and non-strongly-convex, possibly composite, objectives, with deterministic as well as stochastic first-order oracles. We further extend this idea to variance-reduced optimization. Finally, we also provide “universal” algorithms that achieve the optimal rate for smooth and non-smooth composite objectives.
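
To make the averaging idea concrete, the following is a minimal Python sketch, under illustrative assumptions, of the anytime online-to-batch mechanism of Cutkosky (2019): an online learner plays iterates, the gradient oracle is queried at their running average, and the average is what converges. The optimistic learner here is plain optimistic online gradient descent with the last gradient as hint; the constant step size, unweighted average, and test problem are our own choices, so this sketch shows the mechanism rather than the accelerated rate.

```python
import numpy as np

def anytime_online_to_batch(grad, w0, lr=0.01, n_steps=500):
    """Hypothetical sketch: optimistic online gradient descent, with the
    gradient oracle queried at the running average of the played iterates."""
    y = w0.copy()   # base online-gradient-descent state
    x = w0.copy()   # running average of played iterates; the oracle point
    t = 1           # x currently averages the first iterate w_1 = w0
    for _ in range(n_steps):
        g = grad(x)                 # query the gradient oracle at the average
        y = y - lr * g              # online gradient descent step
        w = y - lr * g              # optimistic step with hint = last gradient
        x = (t * x + w) / (t + 1)   # fold the new iterate into the average
        t += 1
    return x

# Illustrative smooth problem: f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x_hat = anytime_online_to_batch(lambda x: A.T @ (A @ x - b), np.zeros(5))
print(0.5 * np.linalg.norm(A @ x_hat - b) ** 2)  # objective decreases toward the minimum
```

In the accelerated variants discussed in the talk, the uniform average and constant step would be replaced by weighted averages and adaptive optimistic learners; the sketch only illustrates where the oracle is queried.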

Moving further, we consider (accelerated) stochastic gradient methods in the interpolation regime, where a perfect fit can be obtained (minimum loss at each observation). While previous work has highlighted the implicit regularization of such algorithms, we consider an explicit regularization framework, formulated as a minimum-Bregman-divergence convex feasibility problem. Using convex duality, we propose randomized Dykstra-style algorithms based on randomized dual coordinate ascent. For non-accelerated coordinate descent, we obtain an algorithm that bears strong similarities to (non-averaged) stochastic mirror descent on specific functions: it is equivalent for quadratic objectives, and equivalent in the early iterations for more general objectives. It comes with the benefit of an explicit convergence theorem to a minimum-norm solution. For accelerated coordinate descent, we obtain a new algorithm with better convergence properties than existing stochastic gradient methods in the interpolation regime.
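
To illustrate the quadratic special case, here is a minimal Python sketch, with made-up data, of randomized dual coordinate ascent for the minimum-norm interpolation problem min 0.5*||w||^2 subject to x_i . w = y_i for all i. Exactly maximizing the dual over one random coordinate yields the randomized Kaczmarz projection, which coincides with a (non-averaged) SGD step on the squared loss, matching the equivalence stated above; the data sizes and step count are illustrative assumptions.

```python
import numpy as np

def dual_coordinate_ascent_interpolation(X, y, n_steps=2000, seed=0):
    """Hypothetical sketch: randomized dual coordinate ascent for
    min 0.5*||w||^2 s.t. X @ w = y, maintaining w = X.T @ lam."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    lam = np.zeros(n)          # one dual variable per constraint
    w = X.T @ lam              # primal iterate induced by the dual
    for _ in range(n_steps):
        i = rng.integers(n)
        # Exact coordinate maximization of the dual: choose delta so that
        # constraint i holds exactly -- the randomized Kaczmarz projection,
        # which is also an SGD step on 0.5*(X[i] @ w - y[i])^2.
        delta = (y[i] - X[i] @ w) / (X[i] @ X[i])
        lam[i] += delta
        w += delta * X[i]
    return w

# Interpolation regime: underdetermined system, so a perfect fit exists.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 100))
y = rng.standard_normal(30)
w = dual_coordinate_ascent_interpolation(X, y)
print(np.max(np.abs(X @ w - y)))  # residual -> 0: every observation is fit
print(np.linalg.norm(w))          # norm of the (near) minimum-norm solution
```

Because the primal iterate is always w = X.T @ lam, it stays in the row space of X, which is why the method converges to the minimum-norm interpolant rather than an arbitrary one.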