Qingdao University of Science and Technology
Wang Minghui

Faculty name (pinyin): wangminghui


SulSite-GTB: identification of protein S-sulfenylation sites by fusing multiple feature information and gradient tree boosting

Keywords: Classification (of information); Amino acids; Cell signaling; Feature extraction; Forecasting; Forestry; Nearest neighbor search; Proteins; Regression analysis

Abstract: Protein cysteine S-sulfenylation is an essential, reversible post-translational modification that plays a crucial role in transcriptional regulation, stress response, cell signaling and protein function. Studies have shown that S-sulfenylation is involved in many human diseases, such as cancer, diabetes and arteriosclerosis. However, experimental identification of protein S-sulfenylation sites is generally expensive and time-consuming. In this study, we propose SulSite-GTB, a new method for predicting protein S-sulfenylation sites. First, amino acid composition, dipeptide composition, encoding based on grouped weight, K nearest neighbors, position-specific amino acid propensity, position-weighted amino acid composition and pseudo position-specific scoring matrix features are extracted and fused to form the initial feature space. Second, the synthetic minority oversampling technique (SMOTE) is used to handle the class imbalance in the data, and the least absolute shrinkage and selection operator (LASSO) is employed to remove redundant and irrelevant features. Finally, the optimal feature subset is fed into a gradient tree boosting classifier to predict S-sulfenylation sites, and prediction performance is evaluated with five-fold cross-validation and an independent test set. The overall prediction accuracy is 92.86% on the training set and 88.53% on the independent test set, with AUC values of 0.9706 and 0.9425, respectively. Compared with other prediction methods, the results show that SulSite-GTB performs significantly better than other state-of-the-art methods, and it offers a new approach to predicting other types of protein post-translational modification sites. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/SulSite-GTB/.

© 2020, Springer-Verlag London Ltd., part of Springer Nature.
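The first two feature types named in the abstract, amino acid composition (AAC) and dipeptide composition (DPC), are plain frequency counts over a peptide window around a candidate cysteine. The Python sketch below shows one common way to compute them; it is not the SulSite-GTB code. The residue ordering in `AMINO_ACIDS`, the example window and the helper names `aac`, `dpc` and `encode` are illustrative assumptions, and non-standard characters (for example a padding symbol) are simply skipped. The other encodings listed in the abstract are not shown.

```python
from itertools import product

# Assumed fixed order of the 20 standard amino acids (any fixed order works).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in the same fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue (20 values)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq:
        if residue in counts:  # skip padding / non-standard symbols
            counts[residue] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: fraction of each adjacent residue pair (400 values)."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i : i + 2]
        if pair in counts:  # pairs containing non-standard symbols are ignored
            counts[pair] += 1
    total = sum(counts.values()) or 1
    return [counts[d] / total for d in DIPEPTIDES]


def encode(window: str) -> list[float]:
    """Hypothetical helper: concatenate AAC and DPC into one 420-dim vector."""
    return aac(window) + dpc(window)
```

For a hypothetical window such as `"KACDCAK"`, `encode` returns 420 numbers: 20 AAC values followed by 400 DPC values. Fusing further encodings would amount to concatenating more vectors in the same way.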
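SMOTE balances the classes by creating synthetic minority-class samples on the line segment between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal pure-Python sketch of that idea; the function name `smote`, its parameters and the default `k=5` are assumptions for illustration, not the settings used by SulSite-GTB.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical SMOTE sketch: return n_new synthetic minority samples.

    minority: list of feature vectors (lists of floats) from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # indices of the other minority samples, nearest (Euclidean) first
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        # new point lies on the segment from base towards the chosen neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. In practice a maintained implementation, such as the `SMOTE` class of the imbalanced-learn package with its `fit_resample` method, would normally be preferred.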
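LASSO removes features by driving their coefficients exactly to zero. The sketch below minimises (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, the same objective documented for scikit-learn's `Lasso`, by cyclic coordinate descent with soft-thresholding, and keeps the features whose weights are non-zero. Treating the 0/1 class labels as a regression target, centring the data instead of fitting an intercept, the fixed number of sweeps and the value of `alpha` are all assumptions; the abstract does not state how SulSite-GTB configures LASSO, and `lasso_cd` and `select_features` are hypothetical names.

```python
def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha; values inside [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Hypothetical LASSO by cyclic coordinate descent (fixed number of sweeps).

    X is a list of rows, y a list of targets. Both are centred internally,
    so no intercept is fitted (an assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    w = [0.0] * p
    residual = [v - y_mean for v in y]  # y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # constant feature: its weight stays 0
            # correlation of feature j with the residual that excludes feature j
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            if delta:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta  # keep residual = y - Xw
                w[j] = new_wj
    return w


def select_features(X, y, alpha):
    """Indices of features whose LASSO coefficient is non-zero."""
    return [j for j, wj in enumerate(lasso_cd(X, y, alpha)) if wj != 0.0]
```

Larger `alpha` values zero out more coefficients, so `alpha` directly controls how small the retained feature subset is.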
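The abstract reports accuracy and AUC under five-fold cross-validation and on an independent test set. The helpers below sketch how such quantities are commonly computed: a shuffled (not stratified) five-fold index split, accuracy as the fraction of correct predictions, and AUC through its rank (Mann-Whitney) interpretation. The function names and the seed are assumptions, and the sketch does not reproduce the reported figures.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split of n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def accuracy(truth, preds):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(truth, preds)) / len(truth)


def roc_auc(truth, scores):
    """AUC = P(a random positive scores higher than a random negative).

    Labels are 1 (site) or 0 (non-site); ties count as one half.
    """
    pos = [s for s, t in zip(scores, truth) if t == 1]
    neg = [s for s, t in zip(scores, truth) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])` gives 0.75, because three of the four positive/negative pairs are ranked correctly.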

Volume: 32

Issue: 17

