A novel model to predict O-glycosylation sites using a highly unbalanced dataset
Zhou, Kun1,2; Ai, Chunzhi1; Dong, Peipei3; Fan, Xuran1; Yang, Ling1
Journal: Glycoconjugate Journal
2012-10-01
Volume: 29; Issue: 7; Pages: 551-564
Keywords: Protein glycosylation prediction; Amino acid index; Feature selection; PP-LDA
Corresponding author: Yang, Ling (杨凌)
Ownership ranking: 1,1
Abstract: In silico approaches have become an alternative method for studying O-glycosylation. In this paper, we developed a linear, interpretable model for O-glycosylation prediction based on an unbalanced dataset and analyzed the underlying biological knowledge of glycosylation. A training set of 4446 sites, comprising 468 positive sites and 3978 negative sites, was developed during this research. The sites were encoded using the amino acid index (AAindex), and a forward stepwise procedure was used for feature selection. Linear discriminant analysis with equal a priori probabilities (PP-LDA) was employed to develop the interpretable model. Performance of the model was verified using both internal leave-one-out cross-validation and external validation. Two non-linear algorithms, the supervised support vector machine and the unsupervised self-organizing competitive neural network, were used for comparison. The PP-LDA model exhibited improved classification results, with an accuracy of 82.1% for cross-validation and 80.3% for external prediction. Further analysis of this linear model indicated that the properties at position R-1 and the properties related to hydrophobicity contributed more to the glycosylation prediction. However, the alpha and turn propensities at the C-terminal, together with physicochemical properties at the N-terminal, are also related to the glycosylation activity. This model is not only capable of predicting the possibility of glycosylation using an unbalanced dataset, but is also helpful for understanding the underlying biological mechanisms of glycosylation. To make our prediction model publicly accessible, a downloadable program is provided in our supplementary materials.
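The effect of fixing equal prior probabilities in a two-class linear discriminant analysis on unbalanced data can be illustrated with a minimal sketch. This is not the authors' program: it uses a single made-up feature instead of AAindex encodings, the toy data are invented, and the names `fit_lda_1d` and `predict_lda_1d` are hypothetical helpers introduced only for this illustration.

```python
import math

def fit_lda_1d(xs, ys, priors=None):
    """Fit a two-class LDA on one feature.

    Returns (class means, pooled variance, class priors).
    Hypothetical helper for illustration only.
    """
    classes = (0, 1)
    groups = {k: [x for x, y in zip(xs, ys) if y == k] for k in classes}
    means = {k: sum(g) / len(g) for k, g in groups.items()}
    # Pooled within-class variance: LDA assumes both classes share it.
    ss = sum((x - means[k]) ** 2 for k, g in groups.items() for x in g)
    pooled_var = ss / (len(xs) - len(classes))
    if priors is None:
        # Default: empirical priors, i.e. class frequencies in the data.
        priors = {k: len(groups[k]) / len(xs) for k in classes}
    return means, pooled_var, priors

def predict_lda_1d(model, x):
    """Return the class with the larger linear discriminant score."""
    means, var, priors = model

    def score(k):
        return (x * means[k] / var
                - means[k] ** 2 / (2 * var)
                + math.log(priors[k]))

    return max((0, 1), key=score)

# Toy unbalanced data (invented): 9 negatives around 0, 2 positives around 2.
xs = [-1, 0, 1] * 3 + [1, 3]
ys = [0] * 9 + [1] * 2

empirical = fit_lda_1d(xs, ys)                       # priors 9/11 and 2/11
equal = fit_lda_1d(xs, ys, priors={0: 0.5, 1: 0.5})  # equal priors

query = 1.5
print(predict_lda_1d(empirical, query), predict_lda_1d(equal, query))
```

In this toy setting the empirical priors push the decision boundary toward the minority class, so the query point 1.5 is labelled negative, whereas with equal priors the boundary sits midway between the class means and the same point is labelled positive.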
Subject: Physical chemistry
WOS headings: Science & Technology; Life Sciences & Biomedicine
WOS category: Biochemistry & Molecular Biology
WOS research area: Biochemistry & Molecular Biology
WOS keywords: polypeptide N-acetylgalactosaminyltransferase; amino-acid-sequence; mammalian proteins; GalNAc-transferase; posttranslational modifications; neural-network; UDP-GalNAc; in-vitro; specificity; selection
Indexed in: SCI
Language: English
WOS accession number: WOS:000308356000009
Date made public: 2013-10-11
Content type: Journal article
Source URL: [http://159.226.238.44/handle/321008/118136]
Collection: Dalian Institute of Chemical Physics_Dalian Institute of Chemical Physics, Chinese Academy of Sciences
Author affiliations:
1. Chinese Acad Sci, Dalian Inst Chem Phys, Lab Pharmaceut Resource Discovery, Dalian 116023, Peoples R China
2. Chinese Acad Sci, Grad Sch, Beijing 100049, Peoples R China
3. Dalian Med Univ, Res Inst Integrated Tradit & Western Med, Dalian 116044, Peoples R China
Recommended citation:
GB/T 7714: Zhou, Kun, Ai, Chunzhi, Dong, Peipei, et al. A novel model to predict O-glycosylation sites using a highly unbalanced dataset[J]. Glycoconjugate Journal, 2012, 29(7): 551-564.
APA: Zhou, Kun, Ai, Chunzhi, Dong, Peipei, Fan, Xuran, & Yang, Ling. (2012). A novel model to predict O-glycosylation sites using a highly unbalanced dataset. Glycoconjugate Journal, 29(7), 551-564.
MLA: Zhou, Kun, et al. "A novel model to predict O-glycosylation sites using a highly unbalanced dataset." Glycoconjugate Journal 29.7 (2012): 551-564.