Acta Chimica Sinica, 2021, Vol. 79, Issue (5): 663-669. DOI: 10.6023/A21010025

Article

Full-length Protein Sequencing Based on Continuous Digestion Using Non-specific Proteases

Chao Yang a,b, Yi-Chu Shan a,*, Wei-Jie Zhang a,b, Zhong-Peng Dai a, Li-Hua Zhang a,*, Yu-Kui Zhang a

  a CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  b University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-01-27 Published: 2021-03-30
  • Contact: Yi-Chu Shan, Li-Hua Zhang
  • Supported by:
    National Key R&D Program of China, Ministry of Science and Technology of China (2017YFF0205404, 2017YFA0505004); National Natural Science Foundation of China (21675153, 21725506)

Determining the complete sequence of a protein helps to elucidate its structure and reveal its biological function. In the traditional “bottom-up” proteomic strategy, database searching is used to identify the sequences of peptides and proteins analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Proteins with unknown sequences cannot be identified through database searching, so de novo sequencing is essential for their characterization. To increase the accuracy and coverage of protein sequencing, a de novo protein sequencing method based on continuous digestion with multiple non-specific proteases was developed. A continuous digestion device was constructed, and a variety of non-specific proteases were used to digest the protein continuously. By exploiting the non-specific cleavage sites of these proteases and the complementarity of the peptides produced at different digestion times and by different proteases, the diversity and degree of overlap of the digested peptides were increased. The sequence coverage of the peptides obtained after continuous digestion with each protease can reach 100%. A sequence assembly algorithm was then developed to assemble the peptides obtained by de novo sequencing. First, the candidate peptide sequences were split into sequence tags of 7 amino acids, and the most frequently occurring tag was chosen as the seed sequence. The seed sequence was then extended, automatically or manually, toward the N-terminus and the C-terminus according to the scores of the sequence tags. In this way, the complete protein sequence was assembled. The method was applied to the de novo sequencing of bovine serum albumin (BSA) and the monoclonal antibody Herceptin. Excluding leucine and isoleucine, full-length de novo sequencing was achieved with 100% accuracy for BSA and the Herceptin light chain, and the accuracy for the Herceptin heavy chain was 99.7%.
The de novo sequencing strategy based on continuous digestion of proteins using non-specific proteases can be applied to de novo sequencing of proteins with unknown sequences or quality control of monoclonal antibody drugs.
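
The tag-based assembly described in the abstract (split peptides into 7-residue tags, pick the most frequent tag as the seed, extend toward both termini) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the tag score, so the sketch assumes the score is simply how often a tag occurs. It also assumes that extension stops when no unused tag overlaps the current end. Leucine and isoleucine are isobaric, so the sketch writes both as `L`, which is likewise an assumption. The names `normalize`, `count_tags`, `extend`, and `assemble` are hypothetical.

```python
from collections import Counter

TAG_LEN = 7  # tag length stated in the abstract
RESIDUES = "ACDEFGHKLMNPQRSTVWY"  # 19 letters: I is written as L (assumption)


def normalize(peptide):
    """Upper-case a peptide and collapse isobaric Ile into Leu."""
    return peptide.upper().replace("I", "L")


def count_tags(peptides, k=TAG_LEN):
    """Count every k-residue window (sequence tag) over all peptides."""
    counts = Counter()
    for pep in peptides:
        pep = normalize(pep)
        for i in range(len(pep) - k + 1):
            counts[pep[i:i + k]] += 1
    return counts


def extend(seq, counts, k, toward_c):
    """Grow seq one residue at a time using the highest-count overlapping tag.

    Assumed scoring: a tag's score is its occurrence count. Extension stops
    when no overlapping tag exists or the best tag was already used, which
    prevents endless loops on repeated regions.
    """
    used = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    while True:
        if toward_c:
            overlap = seq[-(k - 1):]
            candidates = [overlap + aa for aa in RESIDUES]
        else:
            overlap = seq[:k - 1]
            candidates = [aa + overlap for aa in RESIDUES]
        best = max(candidates, key=lambda tag: counts.get(tag, 0))
        if counts.get(best, 0) == 0 or best in used:
            return seq
        used.add(best)
        seq = seq + best[-1] if toward_c else best[0] + seq


def assemble(peptides, k=TAG_LEN):
    """Seed with the most frequent tag, then extend to the C- and N-terminus."""
    counts = count_tags(peptides, k)
    if not counts:
        return ""
    seed = max(counts, key=counts.get)
    seq = extend(seed, counts, k, toward_c=True)
    return extend(seq, counts, k, toward_c=False)
```

For a toy 18-residue sequence `MKWVTFDSAGHNPQRECY` cut into overlapping 10-residue peptides starting at every second position, `assemble` returns the original sequence. Real de novo data contain sequencing errors and repeated regions. When several candidate tags tie or conflict, this greedy sketch simply takes the first one, and those are plausibly the cases where the manual extension mentioned in the abstract would be needed.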

Key words: non-specific protease, continuous digestion, sequence assembly, full-length sequencing