Acta Chimica Sinica ›› 2003, Vol. 61 ›› Issue (2): 273-278.

Research Article

Study on DNA Sequences' Fractal Characteristics by Mexican Hat Wavelet

Chen Xiaoyan; Bao Lunjun; Mo Jinyuan; Cai Peixiang

  1. College of Chemistry and Chemical Engineering, Zhongshan University; Guangzhou Entry-Exit Inspection & Quarantine Bureau of China
  • Published: 2003-02-15

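Combining wavelet analysis with fractal theory, a mathematical model based on fractional Brownian motion (FBM) was built to study the self-similarity of deoxyribonucleic acid (DNA) sequences. The DNA sequence was expressed as a numerical signal by means of a DNA walk, with the different scales of the wavelet transform corresponding to different characteristic lengths in bases. The Mexican hat wavelet was chosen as the mother wavelet for the experimental investigation. The plots of the wavelet coefficients were found to look similar at many scales: a large scale corresponds to more bases (a wavelet transform scale of 2^7 corresponds to 512 bases) and shows the overall outline, while a small scale corresponds to fewer bases (a scale of 2^3 corresponds to 32 bases) and shows the details. This indicates that a fractal structure exists in the DNA sequence, which can be described quantitatively by the fractal dimension. The algorithm offers one possible route for further study of the gene-evolution information related to the self-similar structure of gene sequences.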

The experiments were designed to explore the complexity of genome sequences using the wavelet transform and fractal geometry. A mathematical model based on fractional Brownian motion (FBM) was set up to study the self-similarity of deoxyribonucleic acid (DNA) sequences. The DNA sequence was first expressed as a numerical signal by the DNA walk method, for which there are three representations: the AG walk, the AC walk and the AT walk. The signal was then analyzed by the continuous Mexican hat wavelet transform, each scale of which corresponds to a certain characteristic length in nucleotides. The fractal dimension can then be calculated and used to describe the self-similarity of the DNA sequences. Three horizontal cuts of the wavelet coefficients were shown at three scales, a_1 = 2^3, a_2 = 2^5 and a_3 = 2^7, which correspond to looking at the fluctuations of the DNA walk over characteristic lengths of the order of 32, 128 and 512 nucleotides, respectively. The smaller scales reveal the details, and the larger scales display the overall outline. This approach offers one option for further research on the evolutionary information of genes that is related to the self-similar structure of gene sequences.
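
The procedure described in the abstract (DNA walk, then Mexican hat wavelet coefficients at scales 2^3, 2^5 and 2^7) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The +1/-1 step rule, the 1/sqrt(a) normalization, the truncation of the wavelet at |t| = 5, the random test sequence and the function names `dna_walk`, `mexican_hat` and `cwt_mexican_hat` are all assumptions made for the example. Note that the characteristic lengths quoted in the abstract (32, 128, 512) are four times the corresponding scales (8, 32, 128).

```python
import numpy as np

def dna_walk(seq, plus="AG"):
    """Cumulative +1/-1 walk: bases in `plus` step up, all others step down.

    plus="AG", "AC" or "AT" gives the AG, AC or AT walk (assumed step rule).
    """
    steps = np.array([1 if base in plus else -1 for base in seq.upper()])
    return np.cumsum(steps)

def mexican_hat(t):
    """Mexican hat wavelet, written without a normalizing constant."""
    return (1.0 - t**2) * np.exp(-0.5 * t**2)

def cwt_mexican_hat(signal, scale):
    """Wavelet coefficients of `signal` at a single scale, by direct convolution."""
    half = 5 * scale                      # assumed cutoff: wavelet is tiny beyond |t| = 5
    t = np.arange(-half, half + 1) / scale
    kernel = mexican_hat(t) / np.sqrt(scale)   # assumed 1/sqrt(a) normalization
    return np.convolve(signal, kernel, mode="same")

# Hypothetical input: a random 4096-base sequence standing in for real data.
rng = np.random.default_rng(0)
seq = "".join(rng.choice(list("ACGT"), size=4096))

walk = dna_walk(seq, plus="AG")
coeffs = {a: cwt_mexican_hat(walk, a) for a in (2**3, 2**5, 2**7)}
```

Each entry of `coeffs` is one horizontal cut of the wavelet coefficients along the sequence. Plotting the three cuts side by side is one way to compare the fine detail at a = 2^3 with the coarser outline at a = 2^7.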

Key words: DNA, SEQUENCE ANALYSIS, MATHEMATICAL MODELS
