Paper Title
MixBoost: Synthetic Oversampling with Boosted Mixup for Handling Extreme Imbalance
Authors
Abstract
Training a classification model on a dataset in which instances of one class outnumber those of the other class is a challenging problem. Such imbalanced datasets are common in real-world settings such as fraud detection, medical diagnosis, and computational advertising. We propose MixBoost, an iterative data augmentation method that intelligently selects (Boost) and then combines (Mix) instances from the majority and minority classes to generate synthetic hybrid instances with characteristics of both classes. We evaluate MixBoost on 20 benchmark datasets, show that it outperforms existing approaches, and test its efficacy through significance testing. We also present ablation studies that analyze the impact of MixBoost's individual components.
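To make the "select, then mix" idea concrete, here is a minimal sketch in plain Python. It assumes the Mix step is a mixup-style convex combination: a hybrid is x_new = lam * x_min + (1 - lam) * x_maj with soft label lam (minority = 1, majority = 0), and lam is drawn from Beta(alpha, alpha) as in the original mixup formulation. The abstract does not state how the Boost step "intelligently" chooses instances, so the helper select_pairs is hypothetical and simply pairs instances uniformly at random. The function names mix_pair, select_pairs, and mixboost_sketch and the parameters rounds, per_round, alpha, and seed are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[Vector, float]  # (features, soft label = weight on the minority class)


def mix_pair(x_maj: Vector, x_min: Vector, lam: float) -> Sample:
    """Convex combination of a majority (label 0) and a minority (label 1) instance.

    x_new = lam * x_min + (1 - lam) * x_maj
    y_new = lam * 1 + (1 - lam) * 0 = lam
    """
    if len(x_maj) != len(x_min):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x_new = [lam * b + (1.0 - lam) * a for a, b in zip(x_maj, x_min)]
    return x_new, lam


def select_pairs(n_maj: int, n_min: int, k: int, rng: random.Random) -> List[Tuple[int, int]]:
    """HYPOTHETICAL stand-in for the Boost step: uniform random pairing.

    MixBoost selects instances 'intelligently'; that criterion is not
    reproduced here.
    """
    return [(rng.randrange(n_maj), rng.randrange(n_min)) for _ in range(k)]


def mixboost_sketch(
    majority: List[Vector],
    minority: List[Vector],
    rounds: int = 3,
    per_round: int = 10,
    alpha: float = 1.0,
    seed: int = 0,
) -> List[Sample]:
    """Iteratively generate synthetic hybrid instances (illustrative only)."""
    rng = random.Random(seed)
    synthetic: List[Sample] = []
    for _ in range(rounds):
        # This sketch uses the same random selection in every round;
        # the paper's Boost criterion would go here instead.
        for i, j in select_pairs(len(majority), len(minority), per_round, rng):
            lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
            synthetic.append(mix_pair(majority[i], minority[j], lam))
    return synthetic


if __name__ == "__main__":
    maj = [[0.0, 0.0], [1.0, 0.0]]
    mino = [[4.0, 4.0]]
    for x, y in mixboost_sketch(maj, mino, rounds=2, per_round=3):
        print(x, round(y, 3))
```

Each hybrid's soft label equals its mixing weight, so hybrids with lam near 1 lie close to a minority instance and carry a mostly-minority label. The synthetic samples would then be added to the training set alongside the original data.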