Prediction of Hemolysis Tendency of Peptides using a Reliable Evaluation Method

Paper Title

Prediction of Hemolysis Tendency of Peptides using a Reliable Evaluation Method

Authors

Raza, Ali; Arshad, Hafiz Saud

Abstract

Numerous peptides discovered over the past decades exhibit antimicrobial and anticancer activity, which makes them promising therapeutic candidates. Some peptides, however, suffer from low metabolic stability, high toxicity, and high hemolytic activity. This highlights the importance of evaluating the hemolytic tendency and toxicity of peptides before using them as therapeutics. Traditional methods for evaluating peptide toxicity can be time-consuming and costly. In this study, we extract a peptide dataset (Hemo-DB) from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP) based on certain hemolysis criteria, and we present a machine-learning-based method for predicting the hemolytic tendency of peptides (i.e., hemolytic or non-hemolytic). Our model offers a significant improvement on hemolysis prediction benchmarks. We also propose a reliable clustering-based train-test splitting method, which ensures that no peptide in the train set is more than 40% similar to any peptide in the test set. Using this split, we can obtain reliable estimates of expected model performance on unseen data distributions or newly discovered peptides. Our model achieves 0.9986 AUC-ROC (area under the receiver operating characteristic curve) and 97.79% accuracy on the Hemo-DB test set using the traditional random train-test split, and an AUC-ROC of 0.997 and accuracy of 97.58% using the clustering-based train-test split. Furthermore, we evaluate our model on an unseen data distribution (Hemo-PI 3) and record 0.8726 AUC-ROC and 79.5% accuracy. Using the proposed method, potential therapeutic peptides can be screened, which may advance therapeutic development, and reliable predictions can be obtained for peptides with unseen amino acid distributions and for newly discovered peptides.
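The clustering-based split described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or clustering tool the authors used, so everything below is an assumption for illustration only: `similarity` is a stand-in based on Python's `difflib`, and `cluster_split` is a hypothetical helper that groups peptides by single-linkage clustering at the 40% threshold and then assigns whole clusters to either the train or the test set.

```python
from difflib import SequenceMatcher
import random


def similarity(a: str, b: str) -> float:
    """Stand-in sequence similarity in [0, 1] (assumption, not the paper's measure)."""
    a, b = sorted((a, b))  # fix the argument order so the score is symmetric
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_split(peptides, threshold=0.4, test_fraction=0.2, seed=0):
    """Split peptides so that no train/test pair is more similar than `threshold`."""
    parent = list(range(len(peptides)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single-linkage clustering: merge every pair whose similarity exceeds the threshold.
    for i in range(len(peptides)):
        for j in range(i + 1, len(peptides)):
            if similarity(peptides[i], peptides[j]) > threshold:
                parent[find(i)] = find(j)

    # Collect the members of each cluster.
    clusters = {}
    for i, peptide in enumerate(peptides):
        clusters.setdefault(find(i), []).append(peptide)

    # Assign whole clusters to the test set until it reaches the target size.
    groups = list(clusters.values())
    random.Random(seed).shuffle(groups)
    train, test = [], []
    target = test_fraction * len(peptides)
    for group in groups:
        (test if len(test) < target else train).extend(group)
    return train, test
```

Because any two peptides above the threshold end up in the same cluster, and clusters are never divided, every train/test pair in this sketch has similarity at or below the threshold. The test set size is only approximate, since whole clusters are moved at once, and the pairwise loop is quadratic, so a dedicated sequence-clustering tool would be needed for large datasets.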
