Paper Title

Beyond Images: Label Noise Transition Matrix Estimation for Tasks with Lower-Quality Features

Authors

Zhaowei Zhu, Jialu Wang, Yang Liu

Abstract

The label noise transition matrix, which gives the probabilities of transitioning from clean labels to noisy labels, is crucial for designing statistically robust solutions. Existing estimators of the noise transition matrix, e.g., those relying on anchor points or clusterability, focus on computer vision tasks, for which high-quality representations are relatively easy to obtain. We observe that tasks with lower-quality features fail to meet the anchor-point or clusterability condition because uninformative and informative representations coexist. To handle this issue, we propose a generic and practical information-theoretic approach that down-weights the less informative parts of the lower-quality features. This improvement is crucial to identifying and estimating the label noise transition matrix. The salient technical challenge is to compute the relevant information-theoretic metrics using only noisy labels instead of clean ones. We prove that the celebrated $f$-mutual information measure can often preserve the order when calculated using noisy labels. We then build our transition matrix estimator using this distilled version of the features. The necessity and effectiveness of the proposed method are demonstrated by evaluating the estimation error on a varied set of tabular data and text classification tasks with lower-quality features. Code is available at github.com/UCSC-REAL/BeyondImages.
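
To make the central object of the abstract concrete, the following is a minimal sketch of what a label noise transition matrix is, not the estimator proposed in the paper. Entry `T[i][j]` is the probability that a clean label `i` is observed as noisy label `j`. The function names, the two-class matrix `T_TRUE`, and the sample size are all hypothetical choices for illustration. Note that this sketch recovers the matrix from clean and noisy labels together, which is exactly the information that is unavailable in the setting the abstract describes, where only noisy labels are observed.

```python
import random


def flip_labels(clean_labels, transition, rng):
    """Draw one noisy label per clean label y from row transition[y]."""
    classes = list(range(len(transition)))
    return [rng.choices(classes, weights=transition[y])[0] for y in clean_labels]


def empirical_transition_matrix(clean_labels, noisy_labels, num_classes):
    """Row-normalized counts: entry [i][j] approximates P(noisy = j | clean = i).

    Illustration only: it needs clean labels, which a real estimator
    working from noisy labels alone does not have.
    """
    counts = [[0] * num_classes for _ in range(num_classes)]
    for y, y_noisy in zip(clean_labels, noisy_labels):
        counts[y][y_noisy] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical two-class noise model: each row sums to 1.
T_TRUE = [[0.8, 0.2],
          [0.3, 0.7]]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
clean = [rng.randrange(2) for _ in range(20000)]
noisy = flip_labels(clean, T_TRUE, rng)
T_HAT = empirical_transition_matrix(clean, noisy, num_classes=2)

for row in T_HAT:
    print([round(p, 3) for p in row])
```

With enough samples, the printed rows should be close to the rows of `T_TRUE`. The difficulty the abstract addresses is obtaining a comparable estimate without access to `clean`.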
