Acta Chimica Sinica

Research Article

AI Big-Data Mining Empowered by MOFid for High-Performance Chemical Warfare Agent Adsorbents

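Weng, Huiqiong (a,#); Huang, He (a,#); Wang, Wenfei (a); Li, Heguo (b); Li, Xiaopeng (b); Zhang, Shouxin (b); Li, Shuhua (a); Zhao, Yue (*,b); Wu, Yufang (*,a); Qiao, Zhiwei (*,a)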

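  1. a Institute of Energy and Catalysis, College of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, Guangdong, China
    b State Key Laboratory of Chemistry for NBC Hazards Protection, Beijing 102205, China
  • Submitted: 2025-08-08
  • Corresponding author: *E-mail: zqiao@gzhu.edu.cn
  • Funding:
    This work was supported by the National Natural Science Foundation of China (22478085, 22308069, 21978058) and the Natural Science Foundation of Guangdong Province (2023A1515240076, 2022A1515011446).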

AI Big-Data Mining Empowered by MOFid for High-Performance Chemical Warfare Agent Adsorbents

Weng, Huiqiong (a,#); Huang, He (a,#); Wang, Wenfei (a); Li, Heguo (b); Li, Xiaopeng (b); Zhang, Shouxin (b); Li, Shuhua (a); Zhao, Yue (*,b); Wu, Yufang (*,a); Qiao, Zhiwei (*,a)

  1. a Guangzhou Key Laboratory for New Energy and Green Catalysis, College of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, Guangdong, China.
    b State Key Laboratory of Chemistry for NBC Hazards Protection, Beijing 102205, China.
  • Received: 2025-08-08
  • About author: #H. Weng and H. Huang contributed equally to this work.
  • Supported by:
    National Natural Science Foundation of China (22478085, 22308069, 21978058) and the Natural Science Foundation of Guangdong Province (2023A1515240076, 2022A1515011446).

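To efficiently capture low-concentration chemical warfare agents and their simulants, which pose severe threats to human health and the environment, this study used an AI big-data-driven high-throughput computational screening strategy to systematically analyze a database of tens of thousands of metal-organic framework (MOF) adsorbents and to accurately evaluate their performance in separating and capturing low-concentration chemical warfare agents from air. A trade-off score between selectivity and adsorption capacity (TSN) was introduced as a composite metric, and predictive models were built with six algorithms (decision tree, random forest, gradient boosting regression tree, extreme gradient boosting (XGB), backpropagation neural network, and light gradient boosting machine); XGB gave the best predictions (R² up to 0.923). XGB was then combined with the MOFid standardized identifier, and big-data mining was used to analyze the structural commonalities of the top 1% of high-performance MOFs. The analysis found that the synergy between open transition-metal sites and rigid organic linkers helps strengthen the adsorption affinity for chemical warfare agents, while specific topologies that occur at high frequency were shown to enhance adsorption by forming effective pore structures. Through MOFid-enabled AI big-data mining, this study provides key guidance for optimizing MOF adsorption performance and screening efficient materials, supporting the efficient capture of trace chemical warfare agents in air.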
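The R² quoted for the six models is the coefficient of determination, and ranking models by it can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: the function names `r2_score` and `best_model` are hypothetical helpers, and the TSN values and per-model predictions are invented numbers, not results from the paper.

```python
from typing import Dict, Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares (model error) and total sum of squares (data spread).
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def best_model(y_true: Sequence[float],
               predictions: Dict[str, Sequence[float]]) -> str:
    """Return the name of the model whose predictions give the highest R^2."""
    scores = {name: r2_score(y_true, pred) for name, pred in predictions.items()}
    return max(scores, key=scores.get)


# Invented TSN values for four MOFs (illustrative only, not data from the study).
tsn_simulated = [1.0, 2.0, 3.0, 4.0]
# Invented predictions from two hypothetical fitted models.
tsn_predicted = {
    "XGB": [1.1, 1.9, 3.2, 3.8],
    "DT": [2.0, 2.0, 3.0, 3.0],
}

for name, pred in tsn_predicted.items():
    print(name, round(r2_score(tsn_simulated, pred), 3))
print("best:", best_model(tsn_simulated, tsn_predicted))
```

In practice the score would be computed on a held-out test set rather than on the data used for fitting, so that the comparison reflects predictive rather than memorized performance.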

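Key words: Big-data mining, MOFid, High-throughput screening, Chemical warfare agents, Machine learning, Molecular simulation, Metal-organic frameworks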

To efficiently capture low-concentration chemical warfare agents (CWAs), which pose severe threats to human health and the environment, this study employed an AI big-data-driven high-throughput computational screening strategy to systematically analyze and evaluate the capture and separation performance of tens of thousands of metal-organic framework (MOF) adsorbents for trace CWAs in air. Grand Canonical Monte Carlo (GCMC) simulations were used to evaluate the CWA uptake of 15,333 computation-ready experimental MOFs (CoRE-MOFs) from a gas mixture containing N₂, O₂, and toxic gas at 298 K and 101.325 kPa. A trade-off score (TSN) combining selectivity and adsorption capacity was introduced as a comprehensive performance metric. Predictive models were constructed using six algorithms (Decision Tree, Random Forest, Gradient Boosting Regression Tree, Extreme Gradient Boosting (XGB), Backpropagation Neural Network, and Light Gradient Boosting Machine) based on seven key descriptors of MOFs, namely Largest Cavity Diameter (LCD), density of MOF (ρMOF), Pore-Limiting Diameter (PLD), porosity (φ), Volumetric Surface Area (VSA), Henry Coefficient (K), and heat of adsorption (Q⁰st). The XGB algorithm yielded the best predictive performance, with R² up to 0.923. SHAP analysis revealed that K was the most critical descriptor for CWA adsorption, and that the optimal φ and VSA ranges for maximal sarin and soman adsorption were 0.7-0.8 and 2000-2500 m²/cm³, respectively. The structural commonalities of the top 1% high-performance MOFs were then analyzed by integrating the XGB algorithm with the MOFid standardized identifier. The results reveal that the synergistic effect between open transition metal sites (particularly Cr, Nb, Al, In, Li) and rigid organic linkers of MOFs enhances the adsorption affinity towards CWAs.
Meanwhile, the high occurrence probability of specific topological structures (such as sql and kgm) indicates that they are conducive to forming favorable pore structures, thus enhancing adsorption. Guided by these insights, 26 novel MOFs were rationally designed for enhanced CWA capture. This study, employing MOFid-driven AI big-data mining, provides critical guidance for optimizing MOF adsorption performance and screening highly efficient materials, thereby facilitating the efficient capture of trace chemical warfare agents in air and advancing the development of protective equipment.
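The step of mining structural commonalities from the top-ranked MOFs can be illustrated with a small sketch. It assumes the commonly described MOFid layout, `<SMILES building blocks joined by '.'> MOFid-v1.<topology>.cat<N>;<name>`; that layout, the helper names (`parse_mofid`, `top_fraction`, `topology_counts`), and all identifiers and scores below are assumptions made for illustration, not data or code from the study.

```python
from collections import Counter
from typing import Dict, List


def parse_mofid(mofid: str) -> Dict[str, object]:
    """Split a MOFid-like string into building blocks, topology, and catenation.

    Assumed layout: '<SMILES.SMILES...> MOFid-v1.<topology>.cat<N>;<name>'.
    """
    body, _, name = mofid.partition(";")
    smiles_part, _, tag = body.strip().partition(" ")
    fields = tag.split(".")
    if len(fields) != 3 or not fields[0].startswith("MOFid"):
        raise ValueError(f"unrecognised MOFid string: {mofid!r}")
    return {
        "building_blocks": smiles_part.split("."),
        "topology": fields[1],
        "catenation": fields[2],
        "name": name,
    }


def top_fraction(scores: Dict[str, float], fraction: float = 0.01) -> List[str]:
    """Names of the highest-scoring entries; always keeps at least one."""
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


def topology_counts(mofids: Dict[str, str], selected: List[str]) -> Counter:
    """Count how often each topology code occurs among the selected MOFs."""
    return Counter(parse_mofid(mofids[name])["topology"] for name in selected)


# Made-up identifiers and scores for four hypothetical MOFs.
mofids = {
    "mof_A": "[O-]C(=O)c1ccc(cc1)C(=O)[O-].[Cu][Cu] MOFid-v1.sql.cat0;mof_A",
    "mof_B": "[O-]C(=O)c1cc(cc(c1)C(=O)[O-])C(=O)[O-].[Cu][Cu] MOFid-v1.kgm.cat0;mof_B",
    "mof_C": "[O-]C(=O)c1ccc(cc1)C(=O)[O-].[Zn][O]([Zn])([Zn])[Zn] MOFid-v1.pcu.cat0;mof_C",
    "mof_D": "[O-]C(=O)c1ccc(cc1)C(=O)[O-].[Cu][Cu] MOFid-v1.sql.cat0;mof_D",
}
scores = {"mof_A": 9.1, "mof_B": 7.4, "mof_C": 1.2, "mof_D": 8.3}

# A 50% cut is used only because the toy set is tiny; the study reports a top 1% cut.
selected = top_fraction(scores, fraction=0.5)
print(selected)
print(topology_counts(mofids, selected))
```

The same counting pattern extends to metal nodes or linkers by tallying entries of `building_blocks` instead of `topology`.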

Key words: AI big-data mining, MOFid, High-throughput screening, Chemical warfare agents, Machine learning, Molecular simulation, Metal-organic frameworks