Paper title
Learning Bayesian Networks from Ordinal Data
Paper authors
Paper abstract
Bayesian networks are a powerful framework for studying the dependency structure of variables in a complex system. The problem of learning Bayesian networks is tightly associated with the given data type. Ordinal data, such as stages of cancer, rating scale survey questions, and letter grades for exams, are ubiquitous in applied research. However, existing solutions are mainly for continuous and nominal data. In this work, we propose an iterative score-and-search method, called the Ordinal Structural EM (OSEM) algorithm, for learning Bayesian networks from ordinal data. Unlike traditional approaches designed for nominal data, we explicitly respect the ordering among the categories. More precisely, we assume that the ordinal variables originate from marginally discretizing a set of Gaussian variables, whose structural dependence in the latent space follows a directed acyclic graph. Then, we adopt the Structural EM algorithm and derive closed-form scoring functions for efficient graph search. Through simulation studies, we illustrate the superior performance of the OSEM algorithm compared to the alternatives and analyze various factors that may influence the learning accuracy. Finally, we demonstrate the practicality of our method with a real-world application on psychological survey data from 408 patients with co-morbid symptoms of obsessive-compulsive disorder and depression.
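
The modeling assumption in the abstract (ordinal variables arising from marginal discretization of Gaussian variables whose dependence follows a directed acyclic graph) can be illustrated with a small simulation. The following is a minimal sketch of that generative assumption only, not of the OSEM algorithm itself. The function name `sample_ordinal_from_dag`, the edge-weight matrix, the unit-variance noise, and the cutpoint values are all hypothetical choices made for illustration; the paper's exact parametrization may differ.

```python
import numpy as np


def sample_ordinal_from_dag(weights, cutpoints, n_samples, seed=0):
    """Sample latent Gaussian data from a linear DAG, then discretize each margin.

    Assumptions (illustrative, not taken from the paper):
    - weights[i, j] != 0 encodes an edge i -> j, and the variables are already
      in topological order, so `weights` is strictly upper triangular.
    - each latent variable is a linear combination of its parents plus
      independent standard normal noise.
    - cutpoints[j] is an increasing list of thresholds for variable j.
    """
    rng = np.random.default_rng(seed)
    n_vars = weights.shape[0]
    latent = np.zeros((n_samples, n_vars))
    for j in range(n_vars):
        noise = rng.standard_normal(n_samples)
        # Parents of j are among variables 0..j-1 because of the ordering.
        latent[:, j] = latent[:, :j] @ weights[:j, j] + noise
    # Marginal discretization: level k means the latent value lies between
    # the k-th and (k+1)-th threshold, so the category order is preserved.
    ordinal = np.column_stack(
        [np.digitize(latent[:, j], cutpoints[j]) for j in range(n_vars)]
    )
    return latent, ordinal


# Hypothetical three-variable chain: X0 -> X1 -> X2.
weights = np.array(
    [
        [0.0, 0.8, 0.0],
        [0.0, 0.0, -0.6],
        [0.0, 0.0, 0.0],
    ]
)
cutpoints = [[-0.5, 0.5], [0.0], [-1.0, 0.0, 1.0]]  # 3, 2, and 4 levels
latent, ordinal = sample_ordinal_from_dag(weights, cutpoints, n_samples=1000)
```

Only the `ordinal` array would be observed in practice; the latent Gaussian values and the graph are what a structure-learning method has to recover. Because each variable is cut by its own thresholds, a larger latent value never maps to a lower category, which is the sense in which the ordering among categories is respected.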