Paper Title
Federated Visual Classification with Real-World Data Distribution
Paper Authors
Paper Abstract
Federated Learning enables visual models to be trained on-device, bringing advantages for user privacy (data need never leave the device), but challenges in terms of data diversity and quality. Whilst typical models in the datacenter are trained using data that are independent and identically distributed (IID), data at source are typically far from IID. Furthermore, differing quantities of data are typically available at each device (imbalance). In this work, we characterize the effect these real-world data distributions have on distributed learning, using as a benchmark the standard Federated Averaging (FedAvg) algorithm. To do so, we introduce two new large-scale datasets for species and landmark classification, with realistic per-user data splits that simulate real-world edge learning scenarios. We also develop two new algorithms (FedVC, FedIR) that intelligently resample and reweight over the client pool, bringing large improvements in accuracy and stability in training. The datasets are made available online.
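The abstract uses Federated Averaging (FedAvg) as its benchmark. As a rough illustration only, the server-side aggregation step of FedAvg as it is commonly described can be sketched as an average of client parameters weighted by each client's local example count. The function name `fedavg_aggregate` and the flat-list parameter representation are assumptions made for this sketch. It is not the authors' implementation, and it does not cover FedVC or FedIR.

```python
from typing import List, Sequence


def fedavg_aggregate(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local example counts.

    Hypothetical helper for illustration: each client's model is assumed
    to be flattened into a list of floats of equal length.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one non-empty weight vector per client size")

    total_examples = sum(client_sizes)
    if total_examples <= 0:
        raise ValueError("total number of examples must be positive")

    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        # A client holding more data contributes proportionally more.
        share = size / total_examples
        for i, value in enumerate(weights):
            aggregated[i] += share * value
    return aggregated


if __name__ == "__main__":
    # Two clients with imbalanced data: 1 example versus 3 examples.
    print(fedavg_aggregate([[1.0, 0.0], [3.0, 4.0]], [1, 3]))
```

With imbalanced clients, as in the usage lines, the client holding three of the four examples dominates the average. This is the sense in which differing per-device data quantities affect the aggregated model.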