Paper title
Probabilistic Learning on Manifolds
Paper authors
Paper abstract
This paper presents mathematical results supporting the probabilistic learning on manifolds (PLoM) methodology recently introduced by the authors, which has been used successfully to analyze complex engineering systems. PLoM considers a given initial dataset consisting of a small number of points in a Euclidean space, interpreted as independent realizations of a vector-valued random variable whose non-Gaussian probability measure is unknown but is, \textit{a priori}, concentrated in an unknown subset of the Euclidean space. The objective is to construct a learned dataset consisting of additional realizations that allow converged statistics to be evaluated. The probability measure estimated from the initial dataset is transported through a linear transformation constructed using a reduced-order diffusion-maps basis. In this paper, it is proven that this transported measure is a marginal distribution of the invariant measure of a reduced-order Itô stochastic differential equation that corresponds to a dissipative Hamiltonian dynamical system. This construction preserves the concentration of the probability measure. This property is shown by analyzing a distance between the random matrix constructed with PLoM and the matrix representing the initial dataset, as a function of the dimension of the basis. It is further proven that this distance has a minimum for a dimension of the reduced-order diffusion-maps basis that is strictly smaller than the number of points in the initial dataset. Finally, a brief numerical application illustrates the mathematical results.
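As a rough illustration of the reduced-order diffusion-maps basis mentioned in the abstract, the following minimal sketch builds such a basis from a small dataset and uses it for a linear reduced representation. It is not the authors' implementation. The function names, the Gaussian kernel scaling `exp(-d^2 / (4 * epsilon))`, the parameter `kappa`, and the toy data are all assumptions made for illustration. The sketch covers only the deterministic projection step; it does not sample the Itô stochastic differential equation, and it does not compute the distance analyzed in the paper.

```python
import numpy as np

def diffusion_maps_basis(eta, epsilon, m, kappa=1):
    """Hypothetical helper: eta is (nu, N), one data point per column.
    Returns an (N, m) diffusion-maps basis and its m leading eigenvalues."""
    # Pairwise squared Euclidean distances between the N columns.
    sq = np.sum(eta**2, axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (eta.T @ eta), 0.0)
    # Gaussian kernel (the 4*epsilon scaling is an assumed convention).
    K = np.exp(-d2 / (4.0 * epsilon))
    b = K.sum(axis=1)
    # Symmetric matrix similar to the transition matrix P = diag(b)^-1 K.
    Ps = K / np.sqrt(np.outer(b, b))
    lam, phi = np.linalg.eigh(Ps)
    order = np.argsort(lam)[::-1]          # sort eigenvalues in decreasing order
    lam, phi = lam[order], phi[:, order]
    psi = phi / np.sqrt(b)[:, None]        # right eigenvectors of P
    g = psi[:, :m] * lam[:m] ** kappa      # scale each vector by lambda^kappa
    return g, lam[:m]

def reduce_and_reconstruct(eta, g):
    """Hypothetical helper: reduced coordinates z = eta a with
    a = g (g^T g)^-1, and the reconstruction z g^T."""
    a = np.linalg.solve(g.T @ g, g.T).T    # (N, m)
    z = eta @ a                            # (nu, m)
    return z, z @ g.T

# Toy dataset (assumed): 30 noisy points near the unit circle in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 30)
eta = np.vstack([np.cos(theta), np.sin(theta)])
eta = eta + 0.05 * rng.standard_normal((2, 30))

g, lam = diffusion_maps_basis(eta, epsilon=0.5, m=3)
z, eta_m = reduce_and_reconstruct(eta, g)
```

Here `eta_m` is the projection of the data matrix onto the span of the `m` basis vectors, so the reduced coordinates `z` have `m` columns instead of `N`. The leading eigenvalue of the transition matrix is 1 and its eigenvector is constant, which is a quick sanity check on the construction.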