Acta Chimica Sinica, 2023, Vol. 81, Issue (8): 912-919. DOI: 10.6023/A23040113

Special Issue: Collection Celebrating the 90th Anniversary of Acta Chimica Sinica

Research Article

A Time-Series Signal Classification Algorithm and Its Application to Nanopore Ionic Current Signal Identification

Xue Ni^a, Kaili Xin^b, Zhengli Hu^b,*, Cuiling Jiang^a,*, Yongjing Wan^a, Yi-Lun Ying^b, Yi-Tao Long^b

  a School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237
  b School of Chemistry and Chemical Engineering, Molecular Sensing and Imaging Center (MSIC), Nanjing University, Nanjing 210023
  • Received: 2023-04-03 Published: 2023-09-14
  • Contact: *E-mail: zhenglihu@nju.edu.cn; cuilingjiang@ecust.edu.cn
  • About author:
    Dedicated to the 90th anniversary of Acta Chimica Sinica.
    † These authors contributed equally to this work.
  • Supported by:
    Ministry of Science and Technology Key R&D Program of China (2022YFA1304604); National Natural Science Foundation of China (22106066); National Natural Science Foundation of China (22027806)

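Nanopore single-molecule analysis typically identifies analytes from time-domain features of the blockade current. For substances with similar structures and molecular weights, however, these time-domain features overlap, so traditional nanopore recognition methods struggle to distinguish them accurately. To fully exploit discriminative deep features and improve the recognition accuracy of nanopore ionic current signals, a time-series signal classification algorithm is proposed. The raw signal is split into frames with an overlapping sliding window, and each frame is processed with a continuous wavelet transform to accurately capture shallow time-frequency features of single-molecule events. On this basis, a multi-branch inter-layer feature fusion network extracts deep features. A credible statistical prediction strategy then aggregates the classification probabilities of the sub-signals. With this strategy, the algorithm identifies the nanopore ionic current signals of peptides that differ by a single amino acid with an accuracy as high as 99.00%, significantly improving the ability of nanopores to sense single molecules with similar or even identical molecular weights.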
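The framing and wavelet step described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The frame length (256 samples), hop (128 samples, i.e. 50% overlap), Ricker (Mexican-hat) mother wavelet and width set are hypothetical choices, because the abstract does not specify them. The function names `frame_signal`, `ricker`, `cwt_ricker` and `shallow_features` are illustrative only.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D current trace into overlapping frames (overlap = frame_len - hop)."""
    x = np.asarray(x, dtype=float)
    if hop <= 0 or len(x) < frame_len:
        raise ValueError("need hop > 0 and a signal at least one frame long")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row r holds samples [r*hop, r*hop + frame_len); leftover samples that do not fill a frame are dropped.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]


def ricker(points, a):
    """Ricker (Mexican-hat) wavelet of width a, sampled at `points` positions centred on zero."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))


def cwt_ricker(frame, widths):
    """Continuous wavelet transform of one frame: one row of coefficients per wavelet width."""
    out = np.empty((len(widths), len(frame)))
    for i, a in enumerate(widths):
        # Support of about 10 widths, capped at the frame length so mode="same" keeps len(frame) samples.
        n = max(1, min(int(10 * a), len(frame)))
        out[i] = np.convolve(frame, ricker(n, a), mode="same")
    return out


def shallow_features(x, frame_len=256, hop=128, widths=(1, 2, 4, 8, 16)):
    """Frame the trace, then turn each frame into a |CWT| scalogram.

    Returns an array of shape (n_frames, len(widths), frame_len).
    """
    frames = frame_signal(x, frame_len, hop)
    return np.stack([np.abs(cwt_ricker(f, widths)) for f in frames])
```

With these defaults, a 2048-sample trace yields 1 + (2048 - 256) // 128 = 15 frames, so `shallow_features` returns an array of shape (15, 5, 256). Each scalogram would then serve as one sub-signal's input to the deep network.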


Nanopore-based single-molecule analysis usually recognizes events from time-domain features of the blockade current, such as time-current scatter plots. However, when analytes have very similar molecular structures, these time-domain features overlap, and traditional nanopore recognition methods cannot discriminate them accurately. Differences in deep feature representations therefore need to be fully exploited to obtain credible recognition results and improve the recognition accuracy of nanopore ionic current signals. This paper proposes a time-series signal classification algorithm. First, the original signal is framed with overlapping sliding windows to generate sub-signals, and shallow time-frequency features are extracted from each one by a continuous wavelet transform. Second, a multi-branch inter-layer feature fusion model for deep feature extraction is built on ECAPA-TDNN (Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network). The model's multi-branch, multi-level attention module, the RepVGG-SE-Res2Block (RSR-Block), obtains multi-scale features by constructing a feature pyramid within each residual block and uses structural reparameterization to speed up inference without compromising model performance. Adaptively Spatial Feature Fusion (ASFF) is introduced to fuse features from different layers of the network. Finally, a credible statistical prediction strategy aggregates the classification probabilities of the sub-signals to produce reliable classification results.
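The inference speed-up attributed to the RSR-Block comes from structural reparameterization, the RepVGG idea of training with parallel branches and folding them into a single convolution for inference. The NumPy sketch below shows that folding for 1-D convolutions, since TDNN layers are 1-D. It assumes a RepVGG-style layout of a kernel-3 branch, a kernel-1 branch and an identity branch, each followed by batch normalization, with equal input and output channels. The actual branch layout inside the RSR-Block and its SE and Res2 components are not given in the abstract and are not modeled here. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv1d(x, w, b):
    """Stride-1, 'same'-padded 1-D convolution (cross-correlation, as in deep-learning libraries).

    x: (C_in, T); w: (C_out, C_in, K) with odd K; b: (C_out,). Returns (C_out, T).
    """
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    win = sliding_window_view(xp, k, axis=1)  # (C_in, T, K), win[i, t, j] = xp[i, t + j]
    return np.einsum("oik,itk->ot", w, win) + b[:, None]


def batch_norm(y, bn, eps=1e-5):
    """Inference-mode batch norm; bn = (gamma, beta, running_mean, running_var), one value per channel."""
    gamma, beta, mean, var = bn
    return (y - mean[:, None]) / np.sqrt(var[:, None] + eps) * gamma[:, None] + beta[:, None]


def train_forward(x, w3, bn3, w1, bn1, bn_id):
    """Training-time block: BN(conv_k3(x)) + BN(conv_k1(x)) + BN(x). Requires C_in == C_out."""
    no_bias = np.zeros(w3.shape[0])
    return (batch_norm(conv1d(x, w3, no_bias), bn3)
            + batch_norm(conv1d(x, w1, no_bias), bn1)
            + batch_norm(x, bn_id))


def fuse_bn(w, bn, eps=1e-5):
    """Fold a bias-free conv followed by BN into one conv weight and bias."""
    gamma, beta, mean, var = bn
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None, None], beta - mean * scale


def reparameterize(w3, bn3, w1, bn1, bn_id):
    """Merge the three branches into a single kernel-3 convolution (weight, bias) for inference."""
    c = w3.shape[0]
    k3, b3 = fuse_bn(w3, bn3)
    # A kernel-1 conv equals a kernel-3 conv whose only non-zero tap is the centre one.
    w1_pad = np.zeros_like(w3)
    w1_pad[:, :, 1] = w1[:, :, 0]
    k1, b1 = fuse_bn(w1_pad, bn1)
    # The identity branch is a kernel-3 conv with a 1 at the centre tap of each channel's own filter.
    w_id = np.zeros_like(w3)
    w_id[np.arange(c), np.arange(c), 1] = 1.0
    kid, bid = fuse_bn(w_id, bn_id)
    return k3 + k1 + kid, b3 + b1 + bid
```

Convolution is linear in its kernel, and inference-mode batch normalization is a per-channel affine map. The three branches therefore collapse into one kernel-3 convolution whose output matches the training-time sum up to floating-point error, and the inference graph runs one branch instead of three.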
Experimental results show that for the peptides N'-DDFFIFFDD-C' (DF_I) and N'-DDFFLFFDD-C' (DF_L), which differ only by the isomeric amino acids isoleucine (I) and leucine (L), the algorithm achieves a recognition accuracy of 99.00%. This markedly improves the ability of nanopores to sense single molecules with similar or even identical molecular weights.
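The credible statistical prediction step turns many sub-signal outputs into one decision per signal. The abstract does not give its exact rule, so the following NumPy sketch assumes a simple one. Sub-signals whose top class probability falls below a hypothetical `threshold` are discarded, the remaining ones vote, and ties are broken by summed probability. `credible_predict` and its default threshold are illustrative only.

```python
import numpy as np


def credible_predict(sub_probs, threshold=0.8):
    """Combine per-sub-signal class probabilities into one label for the whole signal.

    sub_probs: (n_sub, n_classes) array, each row a softmax output.
    Hypothetical rule: only sub-signals whose top probability >= threshold vote
    (all of them if none qualify); most votes wins, ties go to the larger summed probability.
    Returns (label, votes per class, number of sub-signals that voted).
    """
    p = np.asarray(sub_probs, dtype=float)
    confident = p.max(axis=1) >= threshold
    if not confident.any():
        confident[:] = True  # fall back to every sub-signal rather than returning no decision
    kept = p[confident]
    votes = np.bincount(kept.argmax(axis=1), minlength=p.shape[1])
    # Each class's summed probability is at most len(kept), so this term is < 1 and only breaks ties.
    score = votes + kept.sum(axis=0) / (len(kept) + 1.0)
    return int(np.argmax(score)), votes, int(confident.sum())
```

For instance, with `threshold=0.8`, the sub-signal probabilities [[0.9, 0.1], [0.85, 0.15], [0.4, 0.6], [0.18, 0.82]] leave three confident votes split 2 to 1, so the signal is assigned class 0. Discarding low-confidence frames is one plausible way to make the pooled decision "credible", not necessarily the authors' choice.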

Key words: nanopore analysis, time-series signal, deep feature extraction, inter-layer feature fusion, credible statistical predictions