Paper title

Putative cell type discovery from single-cell gene expression data

Authors

Zhichao Miao, Pablo Moreno, Ni Huang, Irene Papatheodorou, Alvis Brazma, Sarah A. Teichmann

Abstract

We present a novel method for automated identification of putative cell types from single-cell RNA-seq (scRNA-seq) data. By iteratively applying a machine learning approach to an initial clustering of the gene expression profiles of a given set of cells, we simultaneously identify distinct cell groups and a weighted list of feature genes for each group. The feature genes, which are differentially expressed in the particular cell group, jointly discriminate that group from all other cells. Each such group of cells corresponds to a putative cell type or state, characterised by the feature genes as markers. To benchmark this approach, we use expert-annotated scRNA-seq datasets from a range of experiments and compare against existing cell annotation methods, all of which rely on a pre-existing reference. We show that our method automatically identifies the 'ground truth' cell assignments with high accuracy. Moreover, our method, the Single Cell Clustering Assessment Framework (SCCAF), predicts new putative, biologically meaningful cell states in published data on haematopoiesis and the human cortex. SCCAF is available as an open-source software package on GitHub (https://github.com/SCCAF/sccaf) and on the Python Package Index, and has also been implemented as a Galaxy tool in the Human Cell Atlas.
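The iterative idea in the abstract can be illustrated with a small, self-contained sketch. It is not the SCCAF implementation and does not use the SCCAF package API. It assumes that "iteratively applying a machine learning approach to an initial clustering" means: train a classifier on part of the cells, test it on the rest, and merge the pair of clusters the classifier confuses most, repeating until the remaining clusters can be told apart. A nearest-centroid classifier in pure Python stands in for whatever model the authors use, the weighted feature-gene lists are omitted, and every function name, the toy data, and the 0.95 accuracy threshold are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Return {label: mean expression vector} for the training cells."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(centroids, row):
    """Assign a cell to the label whose centroid is closest."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))

def self_projection(X, y, rng):
    """Train on a random half, test on the other half.

    Returns the test accuracy and a count of errors per unordered
    label pair (true label, predicted label).
    """
    idx = list(range(len(X)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
    correct, confusion = 0, {}
    for i in test:
        pred = predict(centroids, X[i])
        if pred == y[i]:
            correct += 1
        else:
            pair = tuple(sorted((y[i], pred)))
            confusion[pair] = confusion.get(pair, 0) + 1
    return correct / len(test), confusion

def merge_until_separable(X, labels, min_accuracy=0.95, seed=0):
    """Merge the most-confused cluster pair until accuracy is high enough."""
    rng = random.Random(seed)
    labels = list(labels)
    while len(set(labels)) > 1:
        accuracy, confusion = self_projection(X, labels, rng)
        if accuracy >= min_accuracy or not confusion:
            break
        keep, drop = max(confusion, key=confusion.get)
        labels = [keep if lab == drop else lab for lab in labels]
    return labels

def make_toy_cells(seed=1, n=100):
    """Two well-separated groups; the first is arbitrarily over-split into 0/1."""
    rng = random.Random(seed)
    X, labels = [], []
    for i in range(n):
        X.append([rng.gauss(0, 1), rng.gauss(0, 1)])
        labels.append(i % 2)  # artificial split with no expression difference
    for _ in range(n):
        X.append([rng.gauss(10, 1), rng.gauss(10, 1)])
        labels.append(2)
    return X, labels

X, start = make_toy_cells()
final = merge_until_separable(X, start)
print(sorted(set(start)), "->", sorted(set(final)))
```

In this toy setting the two artificial halves of the first group cannot be distinguished by the classifier, so they are merged, while the genuinely distinct second group keeps its own label. With real scRNA-seq data the rows of `X` would be normalised expression profiles and the starting labels would come from an over-clustering step.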
