Keywords: NEURAL-NETWORK; PREDICTION; INFORMATION; SELECTION; SETS
Abstract: Phosphorylation, an essential post-translational modification, is closely associated with a wide range of biological activities. Developing effective computational methods that accurately identify phosphorylation sites is important for an in-depth understanding of many physiological phenomena. However, identifying phosphorylation sites experimentally is time-consuming and laborious, which makes it difficult to meet the processing demands of today's big data. This study proposes a novel model, Res-GCN, to identify the phosphorylation sites of SARS-CoV-2. First, eight feature extraction strategies encode protein sequences numerically from multiple perspectives: amino acid property encodings (AAindex), pseudo-amino acid composition (PseAAC), adapted normal distribution bi-profile Bayes (ANBPB), dipeptide composition (DC), binary encoding (BE), enhanced amino acid composition (EAAC), Word2Vec, and BLOSUM62 matrices. Second, an elastic net removes redundant features from the fused feature matrix. Finally, a combination of a graph convolutional network (GCN) and a residual network (ResNet) classifies the phosphorylation sites, and a fully connected (FC) layer outputs the predictions. Res-GCN was evaluated with 5-fold cross-validation and independent testing and achieved excellent results on both the S/T and Y datasets. These results demonstrate that Res-GCN has exceptional predictive performance and generalizability.
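The classification stage named in the abstract (GCN layers combined with ResNet-style skip connections, followed by a fully connected output layer) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the abstract does not say how a sequence window is turned into a graph, so the chain graph (each residue linked to its immediate neighbours), the window length of 33, the feature width of 64, the random weights, and the helper names `chain_graph`, `residual_gcn_layer`, and `predict` are all hypothetical. The graph convolution uses the standard symmetric normalization D^{-1/2}(A + I)D^{-1/2}, and the residual connection adds each layer's input to its output.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1)                  # every degree >= 1 thanks to self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def chain_graph(n: int) -> np.ndarray:
    """Hypothetical graph: residue i is linked to residue i + 1 in the window."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj


def residual_gcn_layer(h: np.ndarray, a_norm: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution with ReLU plus an identity skip connection.

    The skip connection requires w to be square (input width == output width).
    """
    return np.maximum(a_norm @ h @ w, 0.0) + h


def predict(x, adj, weights, w_fc, b_fc):
    """Stack residual GCN layers, mean-pool over nodes, then apply an FC layer."""
    a_norm = normalized_adjacency(adj)
    h = x
    for w in weights:
        h = residual_gcn_layer(h, a_norm, w)
    pooled = h.mean(axis=0)                  # one vector per sequence window
    logit = pooled @ w_fc + b_fc             # fully connected output layer
    return 1.0 / (1.0 + np.exp(-logit))      # probability of a phosphorylation site


# Usage with hypothetical sizes and untrained random weights.
rng = np.random.default_rng(0)
n_nodes, n_feat = 33, 64                     # window length, feature width (assumed)
x = rng.normal(size=(n_nodes, n_feat))       # stand-in for the selected features
weights = [rng.normal(scale=0.1, size=(n_feat, n_feat)) for _ in range(2)]
w_fc = rng.normal(scale=0.1, size=n_feat)
p = predict(x, chain_graph(n_nodes), weights, w_fc, 0.0)
print(f"predicted probability: {p:.3f}")
```

The identity skip connection is why each weight matrix here is square; a trained model would learn these weights (and the FC layer) from labelled sites rather than drawing them at random.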
Volume: 112
Issue:
Translated: No