
title: Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics
author: Ren, Jie; Song, Kai; Deng, Minghua; Reinert, Gesine; Cannon, Charles H.; Sun, Fengzhu
Issued Date: 2016
Abstract: Motivation: Next-generation sequencing (NGS) technologies generate large amounts of short read data for many different organisms. The fact that NGS reads are generally short makes it challenging to assemble the reads and reconstruct the original genome sequence. For clustering genomes using such NGS data, word-count based alignment-free sequence comparison is a promising approach, but for this approach, the underlying expected word counts are essential. 

A plausible model for this underlying distribution of word counts is given through modeling the DNA sequence as a Markov chain (MC). For single long sequences, efficient statistics are available to estimate the order of MCs and the transition probability matrix for the sequences. As NGS data do not provide a single long sequence, inference methods on Markovian properties of sequences based on single long sequences cannot be directly used for NGS short read data. 
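
As an illustration of the Markov chain model described above, the following minimal sketch pools word counts over short reads and forms plug-in estimates of the transition probabilities. It is a sketch under stated assumptions, not the estimator used in the paper: the function names `count_words` and `transition_probs` are hypothetical, reads are taken to be error-free strings over A, C, G, T, and reverse-complement strands are ignored.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def count_words(reads, k):
    """Count length-k words pooled over all reads.

    Words are counted within each read only, so no word spans two reads.
    Words containing symbols outside ACGT (e.g. N) are skipped.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if set(word) <= set(ALPHABET):
                counts[word] += 1
    return counts

def transition_probs(reads, order):
    """Plug-in estimate of P(next base | previous `order` bases).

    Assumption: a simple ratio of pooled (order+1)-word counts; contexts
    never observed in the reads are left out of the result.
    """
    counts = count_words(reads, order + 1)
    probs = {}
    for ctx in map("".join, product(ALPHABET, repeat=order)):
        total = sum(counts[ctx + b] for b in ALPHABET)
        if total > 0:
            probs[ctx] = {b: counts[ctx + b] / total for b in ALPHABET}
    return probs

# Toy reads, made up for illustration only.
reads = ["ACGTAC", "GTACGT", "TACGTA"]
probs = transition_probs(reads, order=1)
```

Each row of `probs` sums to 1 over the four bases. Real NGS reads overlap one another, so the pooled counts are not independent draws from one long sequence, which is the complication the abstract points to.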

Results: Here we derive a normal approximation for such word counts. We also show that the traditional Chi-square statistic has an approximate gamma distribution, using the Lander-Waterman model for physical mapping. We propose several methods to estimate the order of the MC based on NGS reads and evaluate those using simulations. We illustrate the applications of our results by clustering genomic sequences of several vertebrate and tree species based on NGS reads using alignment-free sequence dissimilarity measures. We find that the estimated order of the MC has a considerable effect on the clustering results, and that the clustering results that use an MC of the estimated order give a plausible clustering of the species.
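
The abstract does not spell out the order-estimation methods it proposes. As a generic point of reference only, the sketch below selects a Markov chain order with the textbook Bayesian information criterion (BIC) applied to counts pooled over reads. This is a swapped-in standard technique, not one of the paper's methods: it treats reads as independent and ignores the overlap among reads that, according to the abstract, changes the distribution of the classical statistics. The names `log_likelihood` and `bic_order` are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(reads, order, max_order):
    """Maximized log-likelihood of an order-`order` Markov chain.

    Every read is conditioned on its first `max_order` bases, so all
    candidate orders are scored on the same set of positions.
    """
    counts = Counter()
    for read in reads:
        for i in range(max_order, len(read)):
            counts[(read[i - order:i], read[i])] += 1
    ctx_totals = Counter()
    for (ctx, _), n in counts.items():
        ctx_totals[ctx] += n
    ll = sum(n * math.log(n / ctx_totals[ctx]) for (ctx, _), n in counts.items())
    return ll, sum(counts.values())

def bic_order(reads, max_order):
    """Return (best_order, scores) minimizing BIC over orders 0..max_order."""
    scores = {}
    for r in range(max_order + 1):
        ll, n_obs = log_likelihood(reads, r, max_order)
        n_params = 3 * 4 ** r  # 3 free probabilities per length-r context
        scores[r] = -2.0 * ll + n_params * math.log(n_obs)
    return min(scores, key=scores.get), scores

# Toy reads from a strictly periodic sequence, made up for illustration.
reads = ["ACGTACGTACGTACGTACGT"] * 5
best, scores = bic_order(reads, max_order=2)
```

Conditioning on the first `max_order` bases of each read is a design choice that keeps the likelihoods comparable across orders, at the cost of discarding a few positions per read.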
Appears in Collections: Other_Journal Articles

Files in This Item:

File: Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics.pdf
Size: 1927 Kb
Format: Adobe PDF

Full-text license: Creative Commons Attribution-NonCommercial-ShareAlike 3.0

Recommended Citation:
Ren, Jie, Song, Kai, Deng, Minghua, et al. Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics[J]. Bioinformatics, 2016, 32(7): 993-1000.

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.


