Comparing Synthetic Tabular Data Generation Between a Probabilistic Model and a Deep Learning Model for Education Use Cases

Paper Title

Comparing Synthetic Tabular Data Generation Between a Probabilistic Model and a Deep Learning Model for Education Use Cases

Authors

Herkulaas MvE Combrink, Vukosi Marivate, Benjamin Rosman

Abstract

The ability to generate synthetic data has a variety of use cases across different domains. In education research, there is a growing need for access to synthetic data to test certain concepts and ideas. In recent years, several deep learning architectures have been used to aid in the generation of synthetic data, but with varying results. In the education context, the sophistication of implementing different models that require large datasets is becoming very important. This study aims to compare the application of synthetic tabular data generation between a probabilistic model, specifically a Bayesian Network, and a deep learning model, specifically a Generative Adversarial Network, using a classification task. The results of this study indicate that, in the education context, synthetic tabular data generation is better suited to probabilistic models (overall accuracy of 75%) than to deep learning architectures (overall accuracy of 38%) because of probabilistic interdependence. Lastly, we recommend that other data types be explored and evaluated for their application in generating synthetic data for education use cases.
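To make the idea of a probabilistic generator for tabular data more concrete, here is a minimal sketch of ancestral sampling from a small Bayesian network. The network structure, the variable names (`attendance`, `prior_grade`, `passed`), and every probability in it are hypothetical and invented for illustration. They are not taken from the paper, and this is not the authors' implementation. Each synthetic row is drawn parents-first, so the dependence of `passed` on its parent variables carries over into the generated table. This is one way to see how a probabilistic model can preserve interdependencies between columns.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy Bayesian network (illustrative only, not the paper's model):
#   attendance -> passed <- prior_grade
# Root-node marginals: P(node = "high")
ROOT_PROBS: Dict[str, float] = {
    "attendance": 0.7,
    "prior_grade": 0.5,
}

# Hypothetical conditional probability table:
# P(passed = "yes" | attendance, prior_grade)
PASS_CPT: Dict[Tuple[str, str], float] = {
    ("high", "high"): 0.9,
    ("high", "low"): 0.6,
    ("low", "high"): 0.5,
    ("low", "low"): 0.2,
}


def sample_row(rng: random.Random) -> Dict[str, str]:
    """Draw one synthetic row by ancestral sampling (parents before children)."""
    row: Dict[str, str] = {}
    # Sample the root nodes independently from their marginals.
    for node, p_high in ROOT_PROBS.items():
        row[node] = "high" if rng.random() < p_high else "low"
    # Sample the child node conditioned on the sampled parent values.
    p_pass = PASS_CPT[(row["attendance"], row["prior_grade"])]
    row["passed"] = "yes" if rng.random() < p_pass else "no"
    return row


def generate(n: int, seed: int = 0) -> List[Dict[str, str]]:
    """Generate n synthetic rows with a seeded RNG for reproducibility."""
    rng = random.Random(seed)
    return [sample_row(rng) for _ in range(n)]


if __name__ == "__main__":
    data = generate(10_000)
    # The empirical P(passed=yes | high, high) should be close to the CPT value 0.9.
    subset = [r for r in data if r["attendance"] == "high" and r["prior_grade"] == "high"]
    rate = sum(r["passed"] == "yes" for r in subset) / len(subset)
    print(f"{len(data)} rows; P(pass | high, high) ~ {rate:.2f}")
```

In practice, the structure and conditional probability tables would be estimated from real data rather than fixed by hand. The sketch only shows the sampling step.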
