Back

G-2024-64

A proximal modified quasi-Newton method for nonsmooth regularized optimization

, , and

BibTeX reference

We develop R2N, a modified quasi-Newton method for minimizing the sum of a \(\mathcal{C}^1\) function \(f\) and a lower semi-continuous prox-bounded \(h\). Both \(f\) and \(h\) may be nonconvex. At each iteration, our method computes a step by minimizing the sum of a quadratic model of \(f\), a model of \(h\), and an adaptive quadratic regularization term. A step may be computed by way of a variant of the proximal-gradient method. An advantage of R2N over competing trust-region methods is that proximal operators do not involve an extra trust-region indicator. We also develop the variant R2DH, in which the model Hessian is diagonal, which allows us to compute a step without relying on a subproblem solver when \(h\) is separable. R2DH can be used as standalone solver, but also as subproblem solver inside R2N. We describe non-monotone variants of both R2N and R2DH. Global convergence of a first-order stationarity measure to zero holds without relying on local Lipschitz continuity of \(\nabla f\), while allowing model Hessians to grow unbounded, an assumption particularly relevant to quasi-Newton models. Under Lipschitz-continuity of \(\nabla f\), we establish a tight worst-case evaluation complexity bound of \(O(1 / \epsilon^{2/(1 - p)})\) to bring said measure below \(\epsilon > 0\), where \(0 \leq p < 1\) controls the growth of model Hessians. Specifically, the latter must not diverge faster than \(|\mathcal{S}_k|^p\), where \(\mathcal{S}_k\) is the set of successful iterations up to iteration \(k\). When \(p = 1\), we establish the tight exponential complexity bound \(O(\exp(c \epsilon^{-2}))\) where \(c > 0\) is a constant. We describe our Julia implementation and report numerical experience on a classic basis-pursuit problem, an image denoising problem, a minimum-rank matrix completion problem, and a nonlinear support vector machine. In particular, the minimum-rank problem cannot be solved directly at this time by a trust-region approach as corresponding proximal operators are not known analytically.

, 24 pages

Research Axis

Research application

Document

G2464.pdf (600 KB)