Acta Chimica Sinica, 2024, Vol. 82, Issue (4): 387-395. DOI: 10.6023/A23100473

Research Article

Predicting Band Gaps of Boron-containing Materials with Machine Learning Methods

李珺卿, 宋千禧, 刘子义, 王东琪*

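  1. State Key Laboratory of Fine Chemistry, Liaoning Key Laboratory of Catalytic Conversion of Carbon Resources, School of Chemistry, School of Chemical Engineering, Dalian University of Technology, Dalian 116024
  • Received: 2023-10-27 Published: 2024-01-05
  • Supported by:
    Key Research and Development Program of the Ministry of Science and Technology (2021YFA1500301); Fundamental Research Funds for the Central Universities (DUT20RC(3)081); Liaoning Xingliao Talents Program (XLYC2002015)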

Machine Learning for Predicting Band Gap in Boron-containing Materials

Junqing Li, Qianxi Song, Ziyi Liu, Dongqi Wang*

  1. State Key Laboratory of Fine Chemistry, Key Laboratory of Catalytic Conversion of Carbon Resources, School of Chemistry, School of Chemical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Received: 2023-10-27 Published: 2024-01-05
  • Contact: * E-mail: wangdq@dlut.edu.cn
  • Supported by:
    State Key Research and Development Program (2021YFA1500301); Fundamental Research Funds for the Central Universities (DUT20RC(3)081); LiaoNing Revitalization Talents Program (XLYC2002015)

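In recent years, boron-containing materials have received growing attention in fields such as new energy and catalysis; however, the development of high-value-added boron-containing materials still faces high technical barriers. There is therefore an urgent need to study in depth the relationships among the microscopic properties of boron-containing materials in order to advance the development of high-end boron-containing materials. Responding to the shift in materials research from traditional trial-and-error methods to a data-driven research paradigm, this work explores the application of several important machine learning algorithms to predicting the band gaps of boron-containing materials, using feature selection, grid-search optimization, and feature importance analysis. The results show that the band gap prediction model based on the random forest algorithm reaches a coefficient of determination (R²) of 0.84, and that the total magnetization of boron-containing materials is significantly negatively correlated with the band gap: the smaller a material's total magnetization, the larger its band gap. This work shows that machine learning methods can be used for the targeted design of boron-containing materials with specific band gaps. The results also show that random forest, as an ensemble learning model, has good learning ability and stable predictive performance. It can be applied to predicting the band gaps of other types of material systems as well as other material properties, accelerating the design and optimization of material properties, and is of significant scientific value for the rapid screening and high-performance prediction of new functional materials.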
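The abstract reports model quality as a coefficient of determination, R² = 1 − SS_res / SS_tot, that is, the fraction of the variance in the reference band gaps that the predictions explain. It is not a classification accuracy: R² = 1 means perfect predictions, and R² = 0 means no better than always predicting the mean. The minimal Python sketch below computes R² from scratch; the band gap values are made-up illustrative numbers, not data from this work.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: what the model fails to explain.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the targets around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot

# Hypothetical reference and predicted band gaps in eV (illustrative only).
true_gaps = [0.0, 1.2, 2.5, 3.1, 4.8]
pred_gaps = [0.3, 1.0, 2.2, 3.6, 4.5]
print(round(r2_score(true_gaps, pred_gaps), 3))  # prints 0.958
```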

Key words: boron-containing materials, band gap, total magnetization, machine learning

New materials are an important driving force for social development. In recent years, boron-containing materials have attracted growing attention in the fields of new energy and catalysis, creating a compelling need for in-depth study of their structure-property relationships to support the research and development of boron-containing materials. In this work, in view of the shift in materials research from the traditional trial-and-error paradigm to a data-driven paradigm, we explored the application of ten important machine learning algorithms to predicting the band gaps of boron-containing materials through feature selection (Pearson correlation analysis), grid search-based optimization (of the model parameters), and feature importance analysis (interpretability analysis of the models). The results show that the band gap prediction model using the Random Forest algorithm outperforms the other models, reaching a coefficient of determination (R²) of 0.84, and that the total magnetization of boron-containing materials is significantly negatively correlated with the band gap, i.e., the smaller the total magnetization of a material, the larger its band gap. The advantage of the Random Forest algorithm over the other models is its better ability to capture correlations between features. For example, the linear model fails to detect the importance of total magnetization among the material features, which leads to lower predictive performance. This work shows that machine learning methods can be used to guide the design of boron-containing materials with specific band gaps.
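Pearson correlation analysis can serve as a simple feature filter. The abstract does not say exactly how it was applied (for example, screening each feature against the band gap, or pruning features that are strongly correlated with one another), so the sketch below assumes the first option. The feature names, the toy values, and the 0.3 cut-off are all hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)

def screen_features(features, target, threshold=0.3):
    """Keep features whose |r| with the target is at least `threshold`.

    `features` maps a (hypothetical) feature name to its column of values.
    """
    scores = {name: pearson_r(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return scores, kept

# Hypothetical toy data: band gaps in eV and two candidate features.
band_gap = [0.0, 0.4, 1.5, 2.8, 4.1, 5.2]
features = {
    "total_magnetization": [4.0, 3.1, 2.0, 1.0, 0.0, 0.0],
    "volume": [120.0, 95.0, 130.0, 100.0, 125.0, 110.0],
}
scores, kept = screen_features(features, band_gap)
for name, r in scores.items():
    print(f"{name}: r = {r:+.2f}")
print("kept:", kept)
```

Because Pearson's r measures only linear association, a feature dropped by this filter may still matter to a nonlinear model, and a strongly negative r (as for the toy magnetization column here) is kept just like a strongly positive one.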
Meanwhile, the results also show that, as an ensemble learning model, Random Forest has good learning ability and stable predictive performance. It can be applied to predicting the band gaps of other types of material systems as well as other material properties, accelerating the design and optimization of material properties, and is of great scientific significance for the rapid screening and high-performance prediction of new functional materials.
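To illustrate what "ensemble" means here, the sketch below builds a deliberately tiny random-forest-style regressor in pure Python: each tree is a depth-1 regression stump fitted on a bootstrap resample using a randomly chosen subset of the features, and the forest averages the trees' predictions. This is a simplified illustration under those assumptions, not the authors' model; practical work would typically use a library implementation such as scikit-learn's `RandomForestRegressor`. The helper names, the feature layout (total magnetization, volume), and the values are hypothetical.

```python
import random

def fit_stump(rows, targets, feature_ids):
    """Fit a depth-1 regression tree: the single split on one of
    `feature_ids` that minimises the summed squared error."""
    best = None  # (sse, feature index, threshold, left mean, right mean)
    for f in feature_ids:
        values = sorted(set(r[f] for r in rows))
        # Thresholds lie midway between consecutive distinct values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:
        # No split possible (the resample has one distinct value): predict the mean.
        mean = sum(targets) / len(targets)
        return lambda row: mean
    _, f, thr, lm, rm = best
    return lambda row: lm if row[f] <= thr else rm

def fit_forest(rows, targets, n_trees=50, max_features=1, seed=0):
    """Bagged stumps, each restricted to a random subset of the features."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample with replacement
        feats = rng.sample(range(n_features), max_features)  # random feature subset
        trees.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats))
    # The ensemble prediction is the average over all trees.
    return lambda row: sum(tree(row) for tree in trees) / len(trees)

# Hypothetical toy data: (total magnetization, volume) -> band gap in eV.
rows = [(4.0, 120.0), (3.1, 95.0), (2.0, 130.0), (1.0, 100.0), (0.0, 125.0), (0.0, 110.0)]
gaps = [0.0, 0.4, 1.5, 2.8, 4.1, 5.2]
model = fit_forest(rows, gaps)
print(model((0.0, 110.0)), model((4.0, 110.0)))
```

With a single split per stump, sampling features per tree is the same as sampling them per split, as a full random forest does. Averaging many trees trained on different resamples is the usual explanation for why such ensembles give more stable predictions than any single tree.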

Key words: boron-containing materials, band gap, total magnetization, machine learning