Acta Chimica Sinica, 2011, Vol. 69, Issue (10): 1232-1238.

Full Papers

PLS Variable Selection Procedure in QSAR Study on the Performance of Organic Compounds Through Polyethylene Membrane

Zhang Yonghong1,2,3, Liu Shushen*,2, Xiao Qianfen2, Qin Litang2, Xia Zhining*,3

    (1 College of Pharmaceutical Sciences, Chongqing Medical University, Chongqing 400016)
    (2 Key Laboratory of Yangtze River Water Environment, Ministry of Education, College of Environmental Science and Engineering, Tongji University, Shanghai 200092)
    (3 College of Bioengineering, Chongqing University, Chongqing 400030)
  • Received: 2010-07-26 Revised: 2010-12-30 Published: 2011-01-17
  • Contact: Shu-Shen LIU E-mail: ssliuhl@263.net
  • Supported by: Integration and Demonstration of Technologies for Optimized Allocation of Regional Drinking Water Sources and Water Quality Improvement

With the large number of descriptors now used in QSAR/QSPR studies, choosing a descriptor set that yields a stable and predictive model has become a bottleneck. In this work, the partial least squares (PLS) method was used to screen for important descriptors: 42 molecular descriptors were selected from an original pool of 1664 descriptors of 63 organic compounds. A PLS regression model relating these 42 descriptors to the logarithm of the permeability coefficients of the organic compounds through low-density polyethylene was developed and validated using the variable selection and modeling based on prediction (VSMP) technique. The PLS regression model is of good quality, with r² = 0.9647 and q² = 0.8364 for the training set of 43 compounds and a value of 0.9306 for the test set of 20 compounds. The PLS variable selection procedure thus makes it possible to select, rapidly and effectively, the important variables closely related to the activity of the compounds and to construct a model with good stability and predictability.
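For readers less familiar with the two training-set statistics quoted in the abstract, their conventional definitions can be sketched as follows. This assumes the common QSAR convention in which q² is obtained by leave-one-out cross-validation; the abstract itself does not state which cross-validation scheme was used. Here y_i is the observed log permeability coefficient of compound i, ŷ_i its fitted value, ŷ_(i) its prediction from a model built without compound i, ȳ the training-set mean, and n the number of training compounds (43 here).

```latex
\[
r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
q^2 = 1 - \frac{\sum_{i=1}^{n} \bigl(y_i - \hat{y}_{(i)}\bigr)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\]
```

Under these definitions q² can never exceed r² by construction of the residuals in typical cases, which is consistent with the reported values of 0.8364 and 0.9647.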

Key words: variable selection, partial least squares (PLS), variable importance in projection (VIP), QSAR, permeability coefficient
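To make the variable importance in projection (VIP) idea concrete, the following is a minimal sketch of PLS-based descriptor screening in plain Python. It is not the authors' implementation: the function name `pls1_vip`, the toy data, the choice of two components, and the VIP > 1 cutoff (a common rule of thumb) are all assumptions made for illustration. The sketch fits a single-response PLS model with the NIPALS algorithm and scores each descriptor by how strongly its weights contribute to the explained variance of the response.

```python
import math


def pls1_vip(X, y, n_components):
    """Fit PLS1 by NIPALS and return one VIP score per descriptor (illustrative)."""
    n, p = len(X), len(X[0])
    # Autoscale each descriptor column (mean 0, unit variance) and center y.
    E = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1)) or 1.0
        for i in range(n):
            E[i][j] = (col[i] - mean) / sd
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]

    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: covariance of each descriptor with the y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        # Scores, then the X loadings and the y loading of this component.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        for i in range(n):
            for j in range(p):
                E[i][j] -= t[i] * load[j]
            f[i] -= q * t[i]
        weights.append(w)
        explained.append(q * q * tt)  # y sum of squares explained by this component

    total = sum(explained)
    # VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a)), with unit-length w_a.
    return [
        math.sqrt(p * sum(ss * w[j] ** 2 for ss, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


# Toy data (made up): 6 compounds, 3 descriptors; the third is unrelated to y.
X = [
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 0.1],
    [3.0, 4.0, 0.9],
    [4.0, 3.0, 0.3],
    [5.0, 6.0, 0.7],
    [6.0, 5.0, 0.2],
]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

vip = pls1_vip(X, y, n_components=2)
# Assumed screening rule: keep descriptors whose VIP exceeds 1.
selected = [j for j, v in enumerate(vip) if v > 1.0]
print(vip, selected)
```

Because each weight vector has unit length, the squared VIP scores sum to the number of descriptors, so a score above 1 marks a descriptor that contributes more than average. In a study of the scale described in the abstract, the same screening would be applied to the full descriptor pool before the final model is built and validated.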