Acta Chimica Sinica, 2007, Vol. 65, Issue (3): 197-202.

Research Article

Activity Prediction of HIV-1 Protease Inhibitors Using Support Vector Machine

RAO Han-Bing1; LI Ze-Rong*,1; CHEN Xiao-Mei1; LI Xiang-Yuan*,2   

  1 College of Chemistry, Sichuan University, Chengdu 610064
  2 College of Chemical Engineering, Sichuan University, Chengdu 610065
  • Received: 2006-06-12 Revised: 2006-09-23 Published: 2007-02-14
  • Corresponding author: LI Xiang-Yuan

To predict the activity of HIV-1 protease inhibitors, 462 constitutional and topological molecular descriptors were calculated to characterize the structural and physicochemical properties of each molecule under study. The Kennard-Stone method and a random method were used to design the training and test sets. Monte Carlo simulated annealing was applied to variable selection. Four machine learning methods, namely support vector machine (SVM), artificial neural network, logistic regression, and k-nearest neighbor, were used to build the inhibitor models. The SVM outperformed the other three methods, and the best SVM model reached a final prediction accuracy of 98.24%.

Key words: protease inhibitor, molecular descriptor, machine-learning method, variable selection
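
The abstract names the Kennard-Stone method for training/test set design without giving details. As a rough illustration only, the sketch below shows the standard Kennard-Stone selection rule in plain Python: seed the training set with the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbor is farthest away. The function name `kennard_stone`, the Euclidean distance, and the toy 2-D vectors are assumptions made for this example; the descriptors, distance metric, and split sizes actually used in the paper are not stated here.

```python
import math


def kennard_stone(points, n_train):
    """Return indices of n_train points picked by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")

    # Full pairwise Euclidean distance matrix (assumed metric).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed with the two mutually most distant points.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}

    # min_dist[k]: distance from candidate k to its nearest selected point.
    min_dist = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    while len(selected) < n_train:
        # Pick the candidate farthest from everything selected so far
        # (sorted() makes tie-breaking deterministic).
        nxt = max(sorted(remaining), key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_dist[k] = min(min_dist[k], dist[k][nxt])
    return selected


# Toy 2-D "descriptor" vectors, made up purely for illustration.
toy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (5.0, 5.0), (0.5, 0.5)]
train_idx = kennard_stone(toy, 3)
test_idx = [i for i in range(len(toy)) if i not in train_idx]
```

Samples that are not selected form the test set. Because the rule always reaches for the most remote remaining sample, the training set tends to span the descriptor space, which is the usual reason for preferring it over a purely random split.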