Ahn, K., Bubeck, S., Chewi, S., Lee, Y. T., Suarez, F., & Zhang, Y. (2022). Learning threshold neurons via the "edge of stability". arXiv preprint arXiv:2212.07469.
@unpublished{ahn2022learning,
title = {Learning threshold neurons via the ``edge of stability''},
author = {Ahn, Kwangjun and Bubeck, S{\'e}bastien and Chewi, Sinho and Lee, Yin Tat and Suarez, Felipe and Zhang, Yi},
note = {arXiv preprint arXiv:2212.07469},
doi = {10.48550/arXiv.2212.07469},
year = {2022}
}
Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This lies in stark contrast to practical wisdom and empirical studies, such as the work of J. Cohen et al. (ICLR 2021), which exhibit startling new phenomena (the "edge of stability" or "unstable convergence") and potential benefits for generalization in the large learning rate regime. Despite a flurry of recent works on this topic, however, the latter effect is still poorly understood. In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a non-zero first-layer bias). This elucidates one possible mechanism by which the edge of stability can in fact lead to better generalization, as threshold neurons are basic building blocks with useful inductive bias for many tasks.
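A minimal sketch (not the paper's simplified models or exact setup) of the edge-of-stability measurement in the style of Cohen et al.: run full-batch gradient descent on a small two-layer network with a deliberately large step size and track the sharpness, i.e. the top Hessian eigenvalue. In runs exhibiting progressive sharpening, the sharpness climbs to and then hovers near the stability threshold 2/lr. The architecture, data, and step size below are illustrative choices.

import torch

torch.manual_seed(0)
X = torch.randn(64, 10)          # toy regression data, chosen for illustration
y = torch.randn(64, 1)
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 1))
params = list(model.parameters())
lr = 0.08                        # deliberately large; 2/lr is the classical stability threshold

def loss_fn():
    return torch.nn.functional.mse_loss(model(X), y)

def sharpness(n_iter=30):
    # Top Hessian eigenvalue via power iteration on Hessian-vector products.
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat)
    for _ in range(n_iter):
        v = v / v.norm()
        hv = torch.autograd.grad(flat @ v, params, retain_graph=True)
        v = torch.cat([h.reshape(-1) for h in hv])
    return v.norm().item()       # ||Hv|| for unit v approximates |lambda_max|

for step in range(1, 401):
    grads = torch.autograd.grad(loss_fn(), params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g          # full-batch gradient descent step
    if step % 100 == 0:
        print(f"step {step:4d}: loss {loss_fn().item():.4f}  "
              f"sharpness {sharpness():.2f}  (2/lr = {2 / lr:.1f})")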
Clancy, J., & Suarez, F. (2022). Wasserstein-Fisher-Rao splines. arXiv preprint arXiv:2203.15728.
@unpublished{clancy2022wasserstein,
title = {Wasserstein-Fisher-Rao Splines},
author = {Clancy, Julien and Suarez, Felipe},
note = {arXiv preprint arXiv:2203.15728},
doi = {10.48550/arXiv.2203.15728},
year = {2022}
}
We study interpolating splines on the Wasserstein-Fisher-Rao (WFR) space of measures with differing total masses. To achieve this, we derive the covariant derivative and the curvature of an absolutely continuous curve in the WFR space. We prove that this geometric notion of curvature is equivalent to a Lagrangian notion of curvature in terms of particles on the cone. Finally, we propose a practical algorithm for computing splines extending the work of arXiv:2010.12101.
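A hedged sketch of the Lagrangian cone picture the abstract alludes to, restricted to a single Dirac particle: a weighted Dirac m·delta_x lifts to a cone point with radius sqrt(m) and, in one common half-angle convention (scalings differ across the literature), angle x/2. The cone over the line is flat, so Euclidean cubics in the lifted plane are a natural spline candidate. This is not the paper's algorithm, which treats general measures; it only illustrates how mass and location interpolate jointly on the cone.

import numpy as np
from scipy.interpolate import CubicSpline

def lift(x, m):
    # Lift the weighted Dirac m*delta_x to the flat cone plane:
    # radius sqrt(m), angle x/2 (one convention among several).
    r, theta = np.sqrt(m), x / 2.0
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def unlift(z):
    # Map a cone-plane point back to (location, mass); mass = radius^2.
    return 2.0 * np.arctan2(z[1], z[0]), float(z @ z)

# Keyframes (time, location, mass) of a single Dirac measure; valid while
# the half-angles stay inside (-pi, pi).
knots = [(0.0, 0.0, 1.0), (0.5, 1.0, 4.0), (1.0, 2.0, 0.5)]
ts = np.array([k[0] for k in knots])
Z = np.array([lift(x, m) for _, x, m in knots])

spline = CubicSpline(ts, Z)      # Euclidean cubic in the flat cone plane
for t in np.linspace(0.0, 1.0, 5):
    x, m = unlift(spline(t))
    print(f"t={t:.2f}: location {x:.3f}, mass {m:.3f}")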
Ahn, K., & Suarez, F. (2021). Riemannian perspective on matrix factorization. arXiv preprint arXiv:2102.00937.
@unpublished{ahn2021riemannian,
title = {Riemannian perspective on matrix factorization},
author = {Ahn, Kwangjun and Suarez, Felipe},
note = {arXiv preprint arXiv:2102.00937},
doi = {10.48550/arXiv.2102.00937},
year = {2021}
}
We study the non-convex matrix factorization approach to matrix completion via Riemannian geometry. Based on an optimization formulation over a Grassmannian manifold, we characterize the landscape in terms of the principal angles between subspaces. For the fully observed case, our results show that there is a region in which the cost is geodesically convex, and outside of which all critical points are strictly saddle. We empirically study the partially observed case based on our findings.
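A small illustration (not the paper's Grassmannian formulation) of the two ingredients in the abstract: plain gradient descent on the factorized objective (1/4)||UU^T - M||_F^2 for a fully observed PSD matrix, together with the principal angles between span(U) and the ground-truth subspace that organize the landscape analysis. All names and constants below are illustrative.

import numpy as np

def principal_angles(U, V):
    # Principal angles between span(U) and span(V), in radians,
    # from the singular values of Qu^T Qv after orthonormalization.
    Qu, _ = np.linalg.qr(U)
    Qv, _ = np.linalg.qr(V)
    sigma = np.linalg.svd(Qu.T @ Qv, compute_uv=False)
    return np.arccos(np.clip(sigma, -1.0, 1.0))

rng = np.random.default_rng(0)
n, r = 50, 3
Ustar = rng.standard_normal((n, r))   # ground-truth factor
M = Ustar @ Ustar.T                   # fully observed PSD target
U = rng.standard_normal((n, r))       # random initialization
eta = 1e-3
for _ in range(2000):
    G = (U @ U.T - M) @ U             # gradient of (1/4)||UU^T - M||_F^2
    U -= eta * G
print("principal angles to ground truth:", principal_angles(U, Ustar))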