Acta Chimica Sinica, 2024, Vol. 82, Issue 4: 387-395. DOI: 10.6023/A23100473

Original article

Machine Learning for Predicting Band Gap in Boron-containing Materials

Junqing Li, Qianxi Song, Ziyi Liu, Dongqi Wang*

  1. State Key Laboratory of Fine Chemistry, Key Laboratory of Catalytic Conversion of Carbon Resources, School of Chemistry, School of Chemical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Received: 2023-10-27 Published: 2024-01-05
  • Contact: * E-mail: wangdq@dlut.edu.cn
  • Supported by:
    State Key Research and Development Program (2021YFA1500301); Fundamental Research Funds for the Central Universities (DUT20RC(3)081); LiaoNing Revitalization Talents Program (XLYC2002015)

New materials are an important driving force for social development. In recent years, boron-containing materials have attracted growing attention in new energy and catalysis, which calls for an in-depth study of their structure-property relationships to support their research and development. In this work, mindful of the shift in materials research from the traditional trial-and-error paradigm to a data-driven one, we explored the application of ten important machine learning algorithms to predicting the band gaps of boron-containing materials, using feature selection (Pearson correlation analysis), grid-search-based optimization (to find the optimal model parameters), and feature importance analysis (to interpret the model). The results show that the band gap prediction model built with the Random Forest algorithm outperforms the other models, with a prediction accuracy of 84%, and that the total magnetization of boron-containing materials shows a significant negative correlation with the band gap, i.e., the smaller the total magnetization of a material, the larger its band gap. The advantage of the Random Forest algorithm over the other models is that it better captures correlations between features. The linear model, for example, fails to detect the importance of the total magnetization among the material features, which lowers its prediction performance. This work shows that machine learning methods can guide the design of boron-containing materials with a specific band gap.
The results also show that Random Forest, as an ensemble learning model, has good learning ability and stable prediction performance. It can be applied to predicting the band gap and other properties of other types of material systems, accelerating the design and optimization of material properties, and is of scientific significance for the rapid screening and high-performance prediction of new functional materials.
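
The abstract names Pearson correlation analysis as the feature selection step but does not give the procedure. The sketch below shows one common way such a filter is applied: a feature is dropped when it is almost linearly redundant with a feature already kept. This is an assumption for illustration, not the authors' confirmed method. The descriptor names, the toy values, and the 0.9 threshold are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear correlation
    return cov / sqrt(var_x * var_y)


def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy descriptors for five materials (illustrative values only).
features = {
    "density": [2.1, 2.5, 3.0, 3.4, 4.0],
    "volume": [50.0, 44.0, 37.0, 33.0, 26.0],  # almost a linear function of density
    "total_magnetization": [0.0, 1.2, 0.0, 0.3, 2.0],
}

selected = drop_redundant(features)
print(selected)
```

In this toy set, `volume` is dropped because it tracks `density` almost exactly, while `total_magnetization` is kept. Which member of a correlated pair survives depends on the iteration order, so in practice the order is usually chosen deliberately (for example, by domain relevance).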

Key words: boron-containing materials, band gap, total magnetization, machine learning
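
The grid search, Random Forest, and feature importance steps described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the feature names are placeholders, the parameter grid is arbitrary, and the nonlinear dependence of the target on total magnetization is invented to mimic the qualitative trend reported (smaller magnetization, larger band gap).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n_samples = 400

# Synthetic stand-in data, NOT the paper's dataset.
total_magnetization = rng.uniform(0.0, 3.0, n_samples)
density = rng.uniform(1.0, 5.0, n_samples)
noise_feature = rng.normal(size=n_samples)
X = np.column_stack([total_magnetization, density, noise_feature])
feature_names = ["total_magnetization", "density", "noise_feature"]

# Assumed toy relationship: the gap shrinks nonlinearly as magnetization grows.
y = (
    4.0 * np.exp(-2.0 * total_magnetization)
    + 0.2 * density
    + rng.normal(scale=0.1, size=n_samples)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Grid search over a small, arbitrary hyperparameter grid with 3-fold CV.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=3, scoring="r2"
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
test_r2 = best_model.score(X_test, y_test)  # coefficient of determination
importances = dict(zip(feature_names, best_model.feature_importances_))

print("best parameters:", search.best_params_)
print("test R^2:", round(test_r2, 3))
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

The impurity-based `feature_importances_` used here is only one interpretability measure. The abstract does not say which measure the authors used, so permutation importance or another method may be closer to their analysis.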