A novel model to predict O-glycosylation sites using a highly unbalanced dataset | |
Zhou, Kun1,2; Ai, Chunzhi1; Dong, Peipei3; Fan, Xuran1; Yang, Ling1 | |
Journal | Glycoconjugate Journal |
2012-10-01 | |
Volume | 29; Issue: 7; Pages: 551-564 |
Keywords | Protein glycosylation prediction Amino acid index Feature selection PP-LDA |
Corresponding author | Yang, Ling (杨凌) |
Institutional rights ranking | 1,1 |
Abstract | In silico approaches have become an alternative method to study O-glycosylation. In this paper, we developed a linear, interpretable model for O-glycosylation prediction based on an unbalanced dataset, and analyzed the underlying biological knowledge of glycosylation. A training set of 4446 sites, comprising 468 positive sites and 3978 negative sites, was assembled for this study. The sites were encoded using the amino acid index (AAindex), and a forward stepwise procedure was used for feature selection. Linear discriminant analysis with an equal a priori probability (PP-LDA) was employed to develop the interpretable model. Performance of the model was verified using both internal leave-one-out cross-validation and external validation. Two non-linear algorithms, the supervised support vector machine and the unsupervised self-organizing competitive neural network, were used for comparison. The PP-LDA model gave better classification results, with an accuracy of 82.1% in cross-validation and 80.3% in external prediction. Further analysis of this linear model indicated that the properties at position r-1 and the properties related to hydrophobicity contributed most to the glycosylation prediction. However, the alpha and turn propensities at the C-terminal, together with physicochemical properties at the N-terminal, are also related to the glycosylation activity. This model is not only capable of predicting the possibility of glycosylation using an unbalanced dataset, but is also helpful for understanding the underlying biological mechanisms of glycosylation. To make our prediction model publicly accessible, a downloadable program is provided in our supplementary materials. |
Subject | Physical chemistry |
WOS headings | Science & Technology ; Life Sciences & Biomedicine |
Category [WOS] | Biochemistry & Molecular Biology |
Research area [WOS] | Biochemistry & Molecular Biology |
Keywords [WOS] | polypeptide N-acetylgalactosaminyltransferase ; amino-acid-sequence ; mammalian proteins ; GalNAc-transferase ; posttranslational modifications ; neural-network ; UDP-GalNAc ; in-vitro ; specificity ; selection |
Indexed in | SCI |
Language | English |
WOS accession number | WOS:000308356000009 |
Date available | 2013-10-11 |
Content type | Journal article |
Source URL | [http://159.226.238.44/handle/321008/118136] |
Collection | Dalian Institute of Chemical Physics_Dalian Institute of Chemical Physics, Chinese Academy of Sciences |
Author affiliations | 1. Chinese Acad Sci, Dalian Inst Chem Phys, Lab Pharmaceut Resource Discovery, Dalian 116023, Peoples R China; 2. Chinese Acad Sci, Grad Sch, Beijing 100049, Peoples R China; 3. Western Med Dalian Med Univ, Res Inst Integrated Tradit, Dalian 116044, Peoples R China |
Recommended citation (GB/T 7714) | Zhou, Kun, Ai, Chunzhi, Dong, Peipei, et al. A novel model to predict O-glycosylation sites using a highly unbalanced dataset[J]. Glycoconjugate Journal, 2012, 29(7): 551-564. |
APA | Zhou, Kun, Ai, Chunzhi, Dong, Peipei, Fan, Xuran, & Yang, Ling. (2012). A novel model to predict O-glycosylation sites using a highly unbalanced dataset. Glycoconjugate Journal, 29(7), 551-564. |
MLA | Zhou, Kun, et al. "A novel model to predict O-glycosylation sites using a highly unbalanced dataset". Glycoconjugate Journal 29.7 (2012): 551-564. |
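
The abstract describes linear discriminant analysis with equal a priori probabilities (PP-LDA) applied to a set with roughly one positive site for every 8.5 negative sites. The following is a minimal sketch of why the choice of prior matters on such data, not the authors' implementation: it fits a one-feature, two-class LDA in plain Python on made-up toy values. The function names `fit_lda_1d` and `predict`, the data, and the single-feature setup are all assumptions for illustration; the paper's model uses AAindex-encoded features chosen by forward stepwise selection.

```python
import math

def fit_lda_1d(xs, ys):
    """Fit a one-feature, two-class LDA: class means, pooled variance, class frequencies."""
    groups = {k: [x for x, y in zip(xs, ys) if y == k] for k in (0, 1)}
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    # Pooled within-class variance (LDA assumes one variance shared by both classes).
    ss = sum((x - means[k]) ** 2 for k, v in groups.items() for x in v)
    var = ss / (len(xs) - 2)
    empirical = {k: len(v) / len(xs) for k, v in groups.items()}
    return means, var, empirical

def predict(x, means, var, priors):
    """Return the class with the larger linear discriminant score."""
    def score(k):
        return x * means[k] / var - means[k] ** 2 / (2 * var) + math.log(priors[k])
    return 1 if score(1) > score(0) else 0

# Toy data (made up, not from the paper): 18 negatives around 0, 2 positives around 2.
negatives = [-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0] * 2
positives = [1.5, 2.5]
xs = negatives + positives
ys = [0] * len(negatives) + [1] * len(positives)

means, var, empirical = fit_lda_1d(xs, ys)
equal = {0: 0.5, 1: 0.5}  # equal a priori probabilities, as in PP-LDA

x_new = 1.2  # a borderline value between the two class means
print("equal priors:", predict(x_new, means, var, equal))
print("empirical priors:", predict(x_new, means, var, empirical))
```

In this toy setting the two priors disagree on the borderline point: the empirical class frequencies add a log-prior penalty that pushes the decision boundary toward the minority class, while equal priors place it midway between the class means. That is one plausible reading of why an equal prior helps on a highly unbalanced dataset.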