Paper Title

Multiple Instance Neural Networks Based on Sparse Attention for Cancer Detection using T-cell Receptor Sequences

Authors

Younghoon Kim, Tao Wang, Danyi Xiong, Xinlei Wang, Seongoh Park

Abstract

Early detection of cancer has been explored extensively because of its paramount importance in biomedicine. Among the different types of data used to address this problem, T cell receptors (TCRs) have recently come into the spotlight, owing to the growing appreciation of the role of the host immune system in tumor biology. However, the one-to-many correspondence between a patient and multiple TCR sequences prevents researchers from simply adopting classical statistical or machine learning methods. Recent work has attempted to model this type of data in the context of multiple instance learning (MIL). Although applying MIL to cancer detection using TCR sequences is novel and has shown adequate performance in several tumor types, there is still room for improvement, especially for certain cancer types. Furthermore, explainable neural network models have not been fully investigated for this application. In this article, we propose multiple instance neural networks based on sparse attention (MINN-SA) to enhance both the performance and the explainability of cancer detection. The sparse attention structure drops out uninformative instances in each bag and, in combination with the skip connection, achieves both interpretability and better predictive performance. Our experiments show that MINN-SA yields the highest average area under the ROC curve (AUC) across 10 different types of cancer, compared to existing MIL approaches. Moreover, we observe from the estimated attention weights that MINN-SA can identify the TCRs that are specific to tumor antigens within the same T cell repertoire.
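
The idea of attention that "drops out uninformative instances in each bag" can be illustrated with a small sketch. The abstract does not say which sparse operator MINN-SA uses, so the code below assumes sparsemax (Martins and Astudillo, 2016), one common way to obtain attention weights that are exactly zero for low-scoring instances. The function names, the tanh scoring layer, and the parameters `V` and `w` are hypothetical and chosen only for illustration. The skip connection mentioned in the abstract is left out because its wiring is not described here.

```python
import numpy as np


def sparsemax(z):
    """Project a score vector onto the probability simplex (sparsemax).

    Unlike softmax, the result can contain exact zeros.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum      # which sorted entries stay nonzero
    k_max = k[support][-1]                   # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max    # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)


def sparse_attention_pool(H, V, w):
    """Pool a bag of instance embeddings H (n x d) into one vector.

    Assumption: scores come from a one-hidden-layer tanh network, as in
    standard attention-based MIL pooling; V (d x h) and w (h,) are
    hypothetical learnable parameters.
    """
    scores = np.tanh(H @ V) @ w              # one score per instance
    attn = sparsemax(scores)                 # sparse weights that sum to 1
    bag = attn @ H                           # weighted sum of instances
    return bag, attn


# Toy bag: 6 instances (e.g. embedded TCR sequences), 4 features each.
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))
V = rng.normal(size=(4, 3))
w = rng.normal(size=3)

bag, attn = sparse_attention_pool(H, V, w)
print("attention weights:", attn)
print("bag embedding shape:", bag.shape)
```

In this sketch, instances whose attention weight is exactly zero contribute nothing to the bag embedding, which is what makes the nonzero weights readable as a shortlist of the instances driving the prediction.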
