Paper title
High Probability Lower Bounds for the Total Variation Distance
Paper authors
Paper abstract
The statistics and machine learning communities have recently seen a growing interest in classification-based approaches to two-sample testing. The outcome of a classification-based two-sample test remains a rejection decision, which is not always informative since the null hypothesis is seldom strictly true. Therefore, when a test rejects, it would be beneficial to provide an additional quantity serving as a refined measure of distributional difference. In this work, we introduce a framework for the construction of high-probability lower bounds on the total variation distance. These bounds are based on a one-dimensional projection, such as a classification or regression method, and can be interpreted as the minimal fraction of samples pointing towards a distributional difference. We further derive asymptotic power and detection rates of two proposed estimators and discuss potential uses through an application to a reanalysis climate dataset.
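To illustrate how a one-dimensional projection can yield a high-probability lower bound on the total variation distance, here is a minimal sketch. It is not either of the two estimators proposed in the paper; it swaps in a simple Hoeffding argument instead. It relies on the standard fact that TV(P, Q) >= P(A) - Q(A) for any set A, taking A = {score > threshold}. The function name `tv_lower_bound`, its parameters, and the Gaussian toy data are all hypothetical, and the threshold is assumed to be fixed independently of the evaluation samples (for example, chosen on a separate training split).

```python
import math
import random


def tv_lower_bound(scores_p, scores_q, threshold, alpha=0.05):
    """Hoeffding-based lower bound on TV(P, Q) from one-dimensional scores.

    Illustrative only (not the paper's estimators). Assumes i.i.d. scores
    and a threshold chosen independently of the samples passed in here.
    The returned value is below TV(P, Q) with probability >= 1 - alpha.
    """
    m, n = len(scores_p), len(scores_q)
    # Empirical mass of A = {score > threshold} in each sample.
    p_hat = sum(s > threshold for s in scores_p) / m
    q_hat = sum(s > threshold for s in scores_q) / n
    # One-sided Hoeffding deviations, each failing with probability <= alpha/2;
    # a union bound over the two events gives overall level alpha.
    eps_p = math.sqrt(math.log(2.0 / alpha) / (2.0 * m))
    eps_q = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    return max(0.0, p_hat - q_hat - eps_p - eps_q)


# Toy data (assumption): P = N(1, 1), Q = N(0, 1), projection = identity.
rng = random.Random(0)
sample_p = [rng.gauss(1.0, 1.0) for _ in range(2000)]
sample_q = [rng.gauss(0.0, 1.0) for _ in range(2000)]

lb = tv_lower_bound(sample_p, sample_q, threshold=0.5, alpha=0.05)
# True TV between N(1, 1) and N(0, 1) is 2 * Phi(0.5) - 1 = erf(0.5 / sqrt(2)).
true_tv = math.erf(0.5 / math.sqrt(2.0))
print(f"lower bound: {lb:.3f}, true TV: {true_tv:.3f}")
```

A bound of this kind can be read in the spirit of the abstract: it is a conservative estimate of the fraction of samples that point towards a distributional difference, and it is zero when the projection cannot separate the two samples beyond sampling noise.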