Paper title
Information retrieval in single cell chromatin analysis using TF-IDF transformation methods
Authors
Abstract
Single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) assesses genome-wide chromatin accessibility in thousands of cells to reveal regulatory landscapes at high resolution. However, the analysis is challenging because the data are high-dimensional and sparse. Several methods have been developed, including transformation techniques such as term frequency-inverse document frequency (TF-IDF) and dimension reduction methods such as singular value decomposition (SVD), factor analysis, and autoencoders. Yet no comprehensive study of these methods has been performed, and the best practice for analyzing scATAC-seq data remains unclear. We compared several scenarios for transformation and dimension reduction, as well as SVD-based feature analysis, to investigate potential enhancements in scATAC-seq information retrieval. Additionally, we investigated whether autoencoders benefit from the TF-IDF transformation. Our results reveal that the TF-IDF transformation generally leads to improved clustering and biologically relevant feature extraction.
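To make the TF-IDF and SVD steps named in the abstract concrete, the following is a minimal sketch on a toy cell-by-peak matrix. It assumes one common TF-IDF variant (term frequency as each cell's counts divided by its total, inverse document frequency as log(1 + number of cells / number of cells in which the peak is accessible)); the abstract does not say which variants the paper compares, so this weighting, the helper names `tfidf` and `svd_embed`, and the random toy data are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def tfidf(counts):
    """Apply one common TF-IDF variant to a cells x peaks matrix.

    Assumption: TF = count / total counts in the cell,
    IDF = log(1 + n_cells / n_cells_with_peak). Other variants exist.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]

    # Term frequency: normalise each cell (row) by its total signal.
    cell_totals = counts.sum(axis=1, keepdims=True)
    cell_totals[cell_totals == 0] = 1.0  # avoid division by zero for empty cells
    tf = counts / cell_totals

    # Inverse document frequency: down-weight peaks open in many cells.
    peak_freq = (counts > 0).sum(axis=0)
    idf = np.log(1.0 + n_cells / np.maximum(peak_freq, 1))

    return tf * idf


def svd_embed(matrix, n_components):
    """Return a low-dimensional cell embedding and the peak loadings."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    embedding = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components]
    return embedding, loadings


# Toy data (hypothetical): 20 cells, 50 peaks, about 10% of entries accessible.
rng = np.random.default_rng(0)
counts = (rng.random((20, 50)) < 0.1).astype(int)

weighted = tfidf(counts)
embedding, loadings = svd_embed(weighted, n_components=5)
```

The rows of `embedding` could then be passed to any clustering method, and the rows of `loadings` indicate which peaks contribute most to each component. For real scATAC-seq matrices a sparse representation and a truncated SVD solver would be the more practical choice; dense NumPy arrays are used here only to keep the sketch short.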