Paper Title
Large-Scale Methods for Distributionally Robust Optimization
Paper Authors
Paper Abstract
We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $χ^2$ divergence uncertainty sets. We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters, making them suitable for large-scale applications. For $χ^2$ uncertainty sets these are the first such guarantees in the literature, and for CVaR our guarantees scale linearly in the uncertainty level rather than quadratically as in previous work. We also provide lower bounds proving the worst-case optimality of our algorithms for CVaR and a penalized version of the $χ^2$ problem. Our primary technical contributions are novel bounds on the bias of batch robust risk estimation and the variance of a multilevel Monte Carlo gradient estimator due to [Blanchet & Glynn, 2015]. Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
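For reference, the robust objectives the abstract refers to have the following standard form, written here in one common convention (the paper's exact normalization of the $\chi^2$ divergence and the uncertainty radius may differ): with per-example losses $\ell_1(x),\dots,\ell_n(x)$ and the probability simplex $\Delta^n$,

$$
\min_{x \in \mathcal{X}} \; \max_{q \in \mathcal{U}} \; \sum_{i=1}^{n} q_i\,\ell_i(x),
\qquad
\mathcal{U}_{\mathrm{CVaR}(\alpha)} = \Big\{ q \in \Delta^n : q_i \le \tfrac{1}{\alpha n} \Big\},
\qquad
\mathcal{U}_{\chi^2(\rho)} = \Big\{ q \in \Delta^n : \tfrac{n}{2} \textstyle\sum_{i}\big(q_i - \tfrac{1}{n}\big)^2 \le \rho \Big\}.
$$

The multilevel Monte Carlo gradient estimator mentioned above can be illustrated with a minimal sketch of the randomized-level debiasing idea of Blanchet & Glynn (2015): a small-batch baseline plus an importance-weighted correction computed at a randomly drawn batch size. The interface (`batch_estimator`, `sample_losses`) and the geometric level distribution are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def mlmc_estimate(batch_estimator, sample_losses, n0=2, max_level=10, rng=None):
    """Randomized multilevel Monte Carlo in the spirit of Blanchet & Glynn (2015).

    Debiases a batch estimator (e.g. the gradient of a batch robust loss)
    whose bias shrinks as the batch size grows.  Both arguments are
    hypothetical callables used only for illustration:
      batch_estimator(samples) -> scalar or np.ndarray estimate
      sample_losses(n)         -> np.ndarray of n i.i.d. samples
    """
    rng = np.random.default_rng() if rng is None else rng

    # Geometric level distribution P(J = j) proportional to 2^{-j}, truncated
    # at max_level (truncation trades a little residual bias for bounded cost).
    probs = 2.0 ** -np.arange(1, max_level + 1)
    probs /= probs.sum()
    j = rng.choice(max_level, p=probs)

    # Baseline estimate at the smallest batch size n0.
    base = batch_estimator(sample_losses(n0))

    # Correction at level j: the estimate on n0 * 2^(j+1) samples minus the
    # average of the estimates on its two halves (sharing samples keeps the
    # variance of the difference small).
    big = sample_losses(n0 * 2 ** (j + 1))
    half = len(big) // 2
    delta = batch_estimator(big) - 0.5 * (
        batch_estimator(big[:half]) + batch_estimator(big[half:])
    )

    # Importance-weight the randomly drawn level so the corrections telescope
    # in expectation toward the large-batch (low-bias) limit.
    return base + delta / probs[j]
```

In expectation the level corrections telescope, so the estimator targets the large-batch limit while its expected cost remains comparable to a small batch, which is the kind of size-independent scaling the abstract's guarantees describe.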