Acta Chimica Sinica ›› 2008, Vol. 66 ›› Issue (19): 2093-2098.

Research Article

Research on the Bioconcentration Factors of Organic Pollutants with Topological Indices and Artificial Neural Network

FENG, Chang-Jun *,a; MU, Lai-Long b; YANG, Wei-Hua b; CAI, Ke-Ying a

  1. (a School of Chemistry & Chemical Engineering, Xuzhou Institute of Technology, Xuzhou 221008)
    (b School of Chemistry & Chemical Engineering, Xuzhou Normal University, Xuzhou 221116)

  • Received: 2008-01-20 Revised: 2008-05-14 Published: 2008-10-14
  • Corresponding author: FENG, Chang-Jun

Building on a revision of Randić's molecular connectivity index and the connection (adjacency) matrix, a novel molecular connectivity index (ᵐF) was defined and calculated for 239 organic pollutants. A one-parameter QSAR model of the bioconcentration factor (lgBCF) of these 239 compounds was built from ¹F; its coefficient of determination (R²) and leave-one-out (LOO) cross-validation coefficient (Q²) were 0.747 and 0.742, respectively. A five-parameter QSAR model built from ¹F and four electronegativity distance vectors (Mₖ) gave R² = 0.829 and Q² = 0.819. Statistically, these results indicate that the model is highly stable and has good predictive ability. The model suggests that the main factors influencing the BCF of organic pollutants are the molecular structural fragments -C-, >C-, -O-, -S-, and -X, together with spatial factors such as molecular flexibility and degree of folding. The five structural parameters were then used as the input nodes of an artificial neural network with a 5:26:1 architecture trained by the back-propagation (BP) algorithm. The resulting QSAR model was satisfactory, with R² = 0.987 and a standard error s of 0.157, indicating a good nonlinear relationship between lgBCF and the five parameters. Overall, the new connectivity index ¹F and the electronegativity distance vectors correlate well with the bioconcentration factors of organic compounds and may find wide application in quantitative structure-property/activity relationship studies.
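The two statistics quoted in the abstract, R² and the leave-one-out Q², can be illustrated with a minimal Python sketch. It is not the authors' code: the descriptor matrix and responses are synthetic random numbers, the model is ordinary least squares (the abstract does not state the regression method), and the helper names `fit_ols`, `predict_ols`, `r_squared`, and `q_squared_loo` are hypothetical. Q² is computed here as 1 − PRESS/SS_total, with the total sum of squares taken about the full-sample mean; this is one common convention and an assumption about the definition used in the paper.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r_squared(X, y):
    """Coefficient of determination of the model fitted on all samples."""
    resid = y - predict_ols(fit_ols(X, y), X)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def q_squared_loo(X, y):
    """Leave-one-out Q^2: refit without sample i, then predict sample i."""
    n = len(y)
    press = 0.0  # predictive residual sum of squares
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])
        pred = predict_ols(coef, X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in for five structural descriptors and lgBCF values
# (assumed data, NOT the 239 compounds from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
true_coef = np.array([1.0, -0.5, 0.3, 0.0, 0.8])
y = X @ true_coef + rng.normal(scale=0.5, size=60)

r2 = r_squared(X, y)
q2 = q_squared_loo(X, y)
print(f"R^2 = {r2:.3f}, Q^2(LOO) = {q2:.3f}")
```

For a least-squares fit, each left-out residual is at least as large in magnitude as the corresponding fitted residual, so under this convention Q² cannot exceed R². A small gap between the two, as in the 0.829 versus 0.819 reported for the five-parameter model, is the pattern usually read as a sign of stability.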

Key words: organic pollutant, bioconcentration factor, novel molecular connectivity index, electronegativity distance vector, quantitative structure-activity relationship
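The 5:26:1 back-propagation network described in the abstract can be sketched in NumPy as follows. This is an illustration only: the tanh hidden activation, linear output, mean-squared-error loss, full-batch gradient descent, learning rate, epoch count, and synthetic training data are all assumptions, since the abstract gives only the layer sizes and the use of the BP algorithm. The names `init_params`, `forward`, and `train` are hypothetical.

```python
import numpy as np

def init_params(rng, n_in=5, n_hidden=26, n_out=1):
    """Random weights for an n_in:n_hidden:n_out network (5:26:1 by default)."""
    return {
        "W1": rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=1.0 / np.sqrt(n_hidden), size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Return hidden activations (assumed tanh) and the linear output."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    output = hidden @ params["W2"] + params["b2"]
    return hidden, output

def train(params, X, y, lr=0.01, epochs=2000):
    """Full-batch gradient descent on MSE; gradients come from backpropagation."""
    y = y.reshape(-1, 1)
    n = len(X)
    losses = []
    for _ in range(epochs):
        hidden, output = forward(params, X)
        err = output - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: output layer first, then hidden layer.
        grad_out = 2.0 * err / n
        grad_W2 = hidden.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_hidden = (grad_out @ params["W2"].T) * (1.0 - hidden ** 2)
        grad_W1 = X.T @ grad_hidden
        grad_b1 = grad_hidden.sum(axis=0)
        # Parameter update (in place).
        params["W1"] -= lr * grad_W1
        params["b1"] -= lr * grad_b1
        params["W2"] -= lr * grad_W2
        params["b2"] -= lr * grad_b2
    return losses

# Synthetic nonlinear target on five inputs (assumed data, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]

params = init_params(rng)
losses = train(params, X, y)
n_params = sum(v.size for v in params.values())
print(f"initial MSE {losses[0]:.3f}, final MSE {losses[-1]:.3f}, parameters {n_params}")
```

With these layer sizes the network has 5×26 + 26 + 26×1 + 1 = 183 adjustable weights and biases, which is the quantity `n_params` reports.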