CpG island identification with higher order and variable order Markov models

Zhenqiu Liu, Dechang Chen, Xue Wen Chen

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

1 Scopus citation


Identifying the location and function of human genes in a long genomic sequence is difficult because little information about the genes is available in advance. Experimental evidence suggests a strong correlation between CpG islands and the genes immediately following them. Much research has been done on identifying CpG islands in a DNA sequence using various models. In this chapter, we introduce two alternative models based on higher-order and variable-order Markov chains. Compared with popular models such as the first-order Markov chain, the hidden Markov model (HMM), and HMT, these two models are much easier to compute and have higher identification accuracy. One unsolved problem with Markov models is that they provide no way to decide the exact boundary point between CpG islands and non-CpG regions. In this chapter, we provide a novel tool that decides the boundary points using the sequential probability ratio test (SPRT). Sequence data from GenBank are used for the experiments in this chapter.
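The combination described in the abstract, a higher-order Markov chain for scoring and a sequential test for deciding where a region starts or ends, can be illustrated with a minimal sketch. This is not the chapter's implementation: the function names `train_markov` and `sprt`, the add-one smoothing, the error rates `alpha` and `beta`, and the toy training strings are all assumptions made for illustration, and only a fixed-order chain is shown (not the variable-order probability suffix tree model). The test uses Wald's standard thresholds, log((1 - beta) / alpha) and log(beta / (1 - alpha)).

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(seqs, order, pseudocount=1.0):
    """Return a function logp(context, base) for a fixed-order Markov chain.

    Hypothetical helper: counts (context -> next base) transitions and
    applies additive smoothing so unseen transitions keep nonzero mass.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def logp(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        return math.log((row.get(base, 0.0) + pseudocount) / total)

    return logp

def sprt(seq, logp_island, logp_background, order, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test over one sequence.

    Accumulates the log-likelihood ratio (island vs. background) base by
    base and stops at the first position where a threshold is crossed.
    Returns (decision, index of the last base examined).
    """
    upper = math.log((1.0 - beta) / alpha)   # crossing it favors "island"
    lower = math.log(beta / (1.0 - alpha))   # crossing it favors "background"
    llr = 0.0
    for i in range(order, len(seq)):
        ctx, base = seq[i - order:i], seq[i]
        llr += logp_island(ctx, base) - logp_background(ctx, base)
        if llr >= upper:
            return "island", i
        if llr <= lower:
            return "background", i
    return "undecided", len(seq) - 1

# Toy training data (an assumption, not real GenBank sequences).
island_model = train_markov(["CG" * 20], order=2)
background_model = train_markov(["AT" * 20], order=2)
decision, position = sprt("CG" * 10, island_model, background_model, order=2)
print(decision, position)
```

In this sketch the stopping position returned by `sprt` plays the role of a candidate boundary point: the test ends as soon as the accumulated evidence is strong enough for one hypothesis, so short inputs can legitimately come back as "undecided".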

Original language: English
Title of host publication: Springer Optimization and Its Applications
Publisher: Springer International Publishing
Number of pages: 10
State: Published - 2007
Externally published: Yes

Publication series

Name: Springer Optimization and Its Applications
ISSN (Print): 1931-6828
ISSN (Electronic): 1931-6836


  • CpG islands
  • DNA sequences
  • Markov models
  • Probability Suffix Trees (PST)
  • Sequential probability ratio test (SPRT)


