Acta Chimica Sinica ›› 2010, Vol. 68 ›› Issue (11): 1137-1142.

Special Topic

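Identification of Cell-Penetrating Peptides by Support Vector Machine and Linear Discriminant Analysis

Chen Guohua1,2, Xia Zhining*,1, Lu Yao3

  1. (1 College of Bioengineering, Chongqing University, Chongqing 400030)
    (2 School of Chemistry and Pharmaceutical Engineering, Sichuan University of Science and Engineering, Zigong 643000)
    (3 Department of Materials and Chemical Engineering, Sichuan University of Science and Engineering, Zigong 643000)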
  • Submitted: 2009-11-26 Revised: 2010-01-28 Published: 2010-02-11
  • Corresponding author: Chen Guohua E-mail: chgh29@163.com
  • Funding:

    National Natural Science Foundation of China (No. 20775096)

Prediction of Cell-Penetrating Peptides Using both Support Vector Machine and Linear Discriminant Analysis

Chen Guohua1,2, Xia Zhining*,1, Lu Yao3

  1. (1 College of Bioengineering, Chongqing University, Chongqing 400030)
    (2 School of Chemistry and Pharmaceutical Engineering, Sichuan University of Science and Engineering, Zigong 643000)
    (3 College of Materials and Chemical Engineering, Sichuan University of Science and Engineering, Zigong 643000)
  • Received: 2009-11-26 Revised: 2010-01-28 Published: 2010-02-11
  • Contact: Guohua Chen E-mail: chgh29@163.com

To identify new potential cell-penetrating peptides (CPPs), two methods, Fisher's linear discriminant analysis (LDA) and support vector machine (SVM), were used to construct two classifiers. We identified 123 known natural CPPs from the literature and used them to construct two data sets: a training set with 25 CPPs and 16 non-CPPs, and a test set with 61 CPPs and 21 non-CPPs. Each amino acid was described by its principal properties (z-scales), and the resulting auto cross covariances (ACCs) and their principal components were each used to construct classifiers. The models obtained with Fisher's LDA classified only 57.3% of the test set correctly, although they showed high classification rates on the training set in both training and cross-validation. With SVM, the classification rates on the training set in training (leave-one-out cross-validation) were 100% (75.6%) using 72 ACCs and 85.4% (80.5%) using their principal components. The best SVM result on the test set was 74.4%, obtained with the 72 ACCs. These results suggest that SVM can capture minor changes in the variables and that the SVM model outperforms the LDA model.
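
The ACC descriptors mentioned in the abstract turn peptides of different lengths into fixed-length feature vectors. A minimal sketch of one common form of the calculation follows. It is not the authors' code, and it rests on several assumptions: three descriptors per residue, lags 1 to 8 (which would give 3 × 3 × 8 = 72 terms, matching the count in the abstract, although the abstract does not state the lag range), and normalization by the number of residue pairs. The name `acc_features` is hypothetical, and the values in `TOY_Z` are made-up placeholders, not published z-scale values.

```python
from itertools import product

# Placeholder descriptors for four residues. These numbers are invented for
# illustration only and are NOT the published z-scale values.
TOY_Z = {
    "A": (0.0, -1.0, 0.5),
    "R": (2.0, 1.0, -1.0),
    "K": (1.5, 0.5, -0.5),
    "L": (-2.0, 0.5, 1.0),
}


def acc_features(seq, scales, max_lag):
    """Return auto cross covariance terms for one peptide sequence.

    For each lag and each ordered descriptor pair (j, k), the term is the
    mean of z_j(i) * z_k(i + lag) over all positions i where both residues
    exist. Assumed normalization: division by (n - lag).
    """
    n_desc = len(next(iter(scales.values())))
    z = [scales[aa] for aa in seq]  # one descriptor tuple per residue
    n = len(z)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")

    feats = []
    for lag in range(1, max_lag + 1):
        for j, k in product(range(n_desc), repeat=2):
            total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
            feats.append(total / (n - lag))
    return feats


if __name__ == "__main__":
    vector = acc_features("ARKLARKLAR", TOY_Z, max_lag=8)
    print(len(vector))  # number of ACC terms for this peptide
```

Because the output length depends only on the number of descriptors and the maximum lag, vectors from peptides of different lengths can be stacked into one matrix and passed to a classifier such as LDA or SVM.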

Key words: cell-penetrating peptide, support vector machine, linear discriminant analysis, z-scale, QSAR