Paper Title
Data Dimension Reduction Makes ML Algorithms Efficient
Paper Authors
Paper Abstract
Data dimension reduction (DDR) maps data from a high-dimensional space to a lower-dimensional one. Various DDR techniques are used for image dimension reduction, such as Random Projections, Principal Component Analysis (PCA), the Variance approach, LSA-Transform, the Combined and Direct approaches, and the New Random Approach; auto-encoders (AE) are used to learn end-to-end mappings. In this paper, we demonstrate that such pre-processing not only speeds up the algorithms but also improves accuracy in both supervised and unsupervised learning. For DDR pre-processing, PCA-based DDR is first used for supervised learning, and then AE-based DDR is explored for unsupervised learning. In PCA-based DDR, we compare the accuracy and running time of supervised learning algorithms before and after applying PCA. Similarly, in AE-based DDR, we compare the accuracy and running time of an unsupervised learning algorithm before and after AE representation learning. The supervised learning algorithms are support-vector machines (SVM), a Decision Tree with the Gini index, a Decision Tree with entropy, and a Stochastic Gradient Descent classifier (SGDC); the unsupervised learning algorithm is K-means clustering, used for classification purposes. We used two datasets, MNIST and FashionMNIST. Our experiments show substantial improvements in accuracy and reductions in running time after pre-processing in both supervised and unsupervised learning.
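The abstract describes the PCA-based half of the pipeline as "reduce first, then train, then compare accuracy and time." The following is a minimal sketch of that workflow, not the authors' code: it assumes scikit-learn, uses its small bundled digits dataset as a stand-in for MNIST, picks SVM as the classifier, and fixes an arbitrary 16 PCA components for illustration.

# Minimal sketch (assumptions: scikit-learn, digits data as an MNIST stand-in,
# SVM classifier, 16 PCA components chosen arbitrarily).
import time
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)          # 8x8 images flattened to 64 features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

def fit_and_score(train_X, test_X):
    """Train an SVM and return (test accuracy, training time in seconds)."""
    clf = SVC(kernel="rbf")
    start = time.time()
    clf.fit(train_X, y_tr)
    elapsed = time.time() - start
    return accuracy_score(y_te, clf.predict(test_X)), elapsed

# Baseline: full-dimensional data
acc_raw, t_raw = fit_and_score(X_tr, X_te)

# PCA-based DDR: project onto a lower-dimensional subspace, then train again
pca = PCA(n_components=16).fit(X_tr)
acc_pca, t_pca = fit_and_score(pca.transform(X_tr), pca.transform(X_te))

print(f"raw: accuracy={acc_raw:.3f}, fit time={t_raw:.2f}s")
print(f"PCA: accuracy={acc_pca:.3f}, fit time={t_pca:.2f}s")

The AE-based, unsupervised half of the study would follow the same before/after pattern, with an auto-encoder's learned representation in place of the PCA projection and K-means clustering in place of the classifier.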