Paper title
Sequential Kernelized Independence Testing
Paper authors
Abstract
Independence testing is a classical statistical problem that has been extensively studied in the batch setting, where one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of the problem at hand instead of setting the sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. Classical batch tests are not tailored for streaming data: valid inference after data peeking requires correcting for multiple testing, which results in low power. Following the principle of testing by betting, we design sequential kernelized independence tests that overcome such shortcomings. We exemplify our broad framework using bets inspired by kernelized dependence measures, e.g., the Hilbert-Schmidt independence criterion. Our test is also valid under non-i.i.d., time-varying settings. We demonstrate the power of our approaches on both simulated and real data.
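To make the testing-by-betting idea concrete, here is a minimal Python sketch of a sequential independence test driven by a kernel-based payoff. It is an illustration under stated assumptions, not the procedure from the paper: the abstract does not specify the payoff, the kernels, or the betting rule, so the Gaussian kernels, the plug-in witness function, the moment-ratio betting fraction, and every name (`gaussian_kernel`, `payoff`, `sequential_independence_test`, `warmup_pairs`, `max_bet`) are hypothetical choices made here. The sketch assumes scalar observations that arrive as i.i.d. pairs. Under the null, each payoff has conditional mean zero and lies in [-1, 1], so the wealth is a nonnegative martingale and, by Ville's inequality, reaches 1/alpha with probability at most alpha.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel with values in (0, 1]; a is a 1-D array, b is a scalar."""
    return np.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def payoff(past_x, past_y, x1, y1, x2, y2):
    """Payoff in [-1, 1] for one new pair of observations.

    Uses a plug-in witness g(x, y) = mean_i k(x_i, x) l(y_i, y)
    - mean_i k(x_i, x) * mean_j l(y_j, y), estimated on past data only,
    and returns 0.5 * [g(x1, y1) + g(x2, y2) - g(x1, y2) - g(x2, y1)].
    """
    dk = gaussian_kernel(past_x, x1) - gaussian_kernel(past_x, x2)
    dl = gaussian_kernel(past_y, y1) - gaussian_kernel(past_y, y2)
    return 0.5 * (np.mean(dk * dl) - np.mean(dk) * np.mean(dl))


def sequential_independence_test(x, y, alpha=0.05, warmup_pairs=5, max_bet=0.9):
    """Return (stopping_time, wealth); stopping_time is None if H0 is never rejected.

    stopping_time counts the observations consumed when wealth first reaches 1/alpha.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    wealth = 1.0
    payoffs = []
    for t in range(len(x) // 2):
        i, j = 2 * t, 2 * t + 1
        if t < warmup_pairs:
            continue  # only collect data until the witness can be estimated
        f = payoff(x[:i], y[:i], x[i], y[i], x[j], y[j])
        # Betting fraction depends on past payoffs only (predictable), in [0, max_bet].
        lam = 0.0
        if payoffs:
            m1 = np.mean(payoffs)
            m2 = np.mean(np.square(payoffs))
            if m2 > 0:
                lam = float(np.clip(m1 / m2, 0.0, max_bet))
        wealth *= 1.0 + lam * f  # stays positive because lam < 1 and |f| <= 1
        payoffs.append(f)
        if wealth >= 1.0 / alpha:
            return j + 1, wealth
    return None, wealth


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=2000)
    y = x + 0.1 * rng.normal(size=2000)  # strongly dependent stream
    print(sequential_independence_test(x, y))
```

Because the witness and the betting fraction at each round are computed from earlier observations only, the data can be monitored after every new pair without a multiple-testing correction. On strongly dependent streams the wealth tends to grow quickly and the test stops early, while on independent streams it tends to stay near its initial value.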