Fast Comparison of Microbial Genomes Using the Chaos Games Representation for Metagenomic Applications

Research output: Contribution to journal › Article › peer-review

Abstract

Genome sequencing technology is generating large databases of sequence at such a rate that advances in computer hardware alone are not adequate to handle them: more efficient algorithms are needed. Here an alignment-free method of sequence comparison and visualisation based on the Chaos Games Representation (CGR) and multifractal analysis is explored as an approach to search and filter through a data set of over 1500 microbial genomes. Whereas BLAST takes 25 hours to search this data set with large sequence fragments (e.g. 100 Kb), the method introduced here can reduce this data set by 95% (from 1550 target species to just 50) in about 15 minutes, and it is able to predict the exact species correctly in 67% of cases. The results presented here demonstrate that CGR is worth further investigation as a fast method to perform genome sequence comparison on large data sets, and various ways to further develop the method are discussed.
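The core idea behind the CGR can be illustrated with a minimal Python sketch. It is not the authors' implementation. The corner assignment, the grid resolution `k`, the skipping of ambiguous bases, and the helper names `cgr_points`, `cgr_grid` and `grid_distance` are all assumptions made for illustration. In particular, the plain Euclidean distance between normalised grids is a simple stand-in for the multifractal analysis used in the paper.

```python
import math

# Assumed corner convention (a common choice; the paper may use another).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory of a DNA sequence in the unit square.

    Starting at the centre, each base moves the current point halfway
    towards that base's corner. Characters other than A, C, G, T are
    skipped (an assumption; other treatments are possible).
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


def cgr_grid(seq, k=3):
    """Count CGR points in a 2^k x 2^k grid (rows index y, columns index x)."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    for x, y in cgr_points(seq):
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        grid[row][col] += 1
    return grid


def grid_distance(a, b):
    """Euclidean distance between two grids after normalising each to sum 1.

    This is a hypothetical placeholder metric, not the paper's method.
    """
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    total_a = sum(flat_a) or 1  # avoid division by zero for empty input
    total_b = sum(flat_b) or 1
    return math.sqrt(
        sum((p / total_a - q / total_b) ** 2 for p, q in zip(flat_a, flat_b))
    )
```

Under these assumptions, a query fragment could be filtered against a collection by computing `cgr_grid` once per genome and ranking the genomes by `grid_distance` to the query grid. No alignment is involved, so the cost per comparison depends on the grid size rather than on the sequence length.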
Original language: English
Pages (from-to): 1372-1381
Number of pages: 9
Journal: Procedia Computer Science
Volume: 18
Publication status: Published - 01 Jun 2013
Event: 2013 International Conference on Computational Science - Barcelona, Spain
Duration: 05 Jun 2013 - 07 Jun 2013

Keywords

  • genomics
  • visualisation
  • biological sequence analysis
  • large data sets
  • data mining
