Acta Chimica Sinica ›› 2004, Vol. 62 ›› Issue (19): 1968-1972.

Research Note

Prediction of Transmembrane Segments Based on Fuzzy Cluster Analysis of Amino Acids

DENG Yong1, LIU Qi2, LI Yi-Xue2   

  1. School of Electronics & Information Technology, Shanghai Jiaotong University, Shanghai 200030;
  2. Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031
  • Received: 2004-03-11 Revised: 2004-06-09 Published: 2014-02-17
  • Corresponding author: DENG Yong, E-mail: dengyong@sjtu.edu.cn
  • Supported by:
    the 863 Program (No. 2001AA2311) and the Natural Science Foundation of Shanghai (No. 03ZR14065).

Transmembrane protein sequences are poorly conserved during evolution; even homologous proteins share a low level of sequence identity. Consequently, selecting training sequences by sequence identity, as transmembrane segment prediction algorithms commonly do, cannot effectively eliminate overfitting of the predictions to the training set. This paper presents a prediction algorithm based on fuzzy cluster analysis of amino acids. The amino acids are clustered into groups according to the similarity of their distributions across the different regions, and transmembrane segments are then predicted from the distribution properties of each group rather than those of each individual amino acid. The results show that the method reduces, to some extent, the influence of training-set selection on the test results and improves the accuracy of transmembrane protein topology prediction, particularly for transmembrane proteins about which little is currently known.

Key words: amino acid, transmembrane protein, fuzzy cluster, transmembrane segment prediction
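
The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which fuzzy clustering procedure, similarity measure, or region definitions were used. The sketch assumes the classical fuzzy-equivalence approach (max-min transitive closure followed by a λ-cut), an L1-based similarity, and three regions. The frequency table and all function names are hypothetical and invented for illustration.

```python
"""Toy sketch: fuzzy clustering of amino acids by regional distribution."""

# HYPOTHETICAL relative frequencies of six amino acids in three regions
# (cytoplasmic loop, membrane, extracellular loop). Invented numbers,
# not data from the paper.
REGION_FREQ = {
    "L": (0.15, 0.70, 0.15),
    "I": (0.17, 0.68, 0.15),
    "V": (0.20, 0.62, 0.18),
    "K": (0.60, 0.05, 0.35),
    "R": (0.62, 0.06, 0.32),
    "D": (0.35, 0.05, 0.60),
}


def similarity(p, q):
    """Fuzzy similarity in [0, 1]: one minus half the L1 distance (assumed measure)."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def similarity_matrix(names):
    """Reflexive, symmetric fuzzy relation between the named amino acids."""
    return [[similarity(REGION_FREQ[a], REGION_FREQ[b]) for b in names]
            for a in names]


def max_min_square(r):
    """Max-min composition of a fuzzy relation with itself."""
    n = len(r)
    return [[max(min(r[i][k], r[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square the relation until it stops changing (fuzzy equivalence relation)."""
    while True:
        squared = max_min_square(r)
        if squared == r:
            return r
        r = squared


def lambda_cut_groups(names, closure, lam):
    """Equivalence classes of the closure at threshold lam."""
    groups, assigned = [], set()
    for i, name in enumerate(names):
        if name in assigned:
            continue
        group = [names[j] for j in range(len(names)) if closure[i][j] >= lam]
        assigned.update(group)
        groups.append(group)
    return groups


def group_profile(group):
    """Average regional distribution of a group, used in place of per-residue values."""
    n = len(group)
    return tuple(sum(REGION_FREQ[a][k] for a in group) / n for k in range(3))


names = list(REGION_FREQ)
closure = transitive_closure(similarity_matrix(names))
for lam in (0.9, 0.7):
    groups = lambda_cut_groups(names, closure, lam)
    print(lam, [(g, group_profile(g)) for g in groups])
```

With these invented numbers, a threshold of 0.9 keeps L, I and V together, pairs K with R, and leaves D on its own; lowering the threshold to 0.7 merges D into the K/R group. A predictor built on such groups would look up the group profile rather than the individual residue's distribution, which is the idea the abstract credits with reducing dependence on the training set.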