Abstract
Genome sequencing technology is generating large databases of sequence at such a rate that advances in computer hardware alone are not adequate to handle them: more efficient algorithms are needed. Here an alignment-free method of sequence comparison and visualisation based on the Chaos Games Representation (CGR) and multifractal analysis is explored as an approach to search and filter through a data set of over 1500 microbial genomes. Whereas BLAST takes 25 hours to search this data set with large sequence fragments (e.g. 100 Kb), the method introduced here can reduce this data set by 95% (from 1550 target species to just 50) in about 15 minutes, and it is able to predict the exact species correctly in 67% of cases. The results presented here demonstrate that CGR is worth further investigation as a fast method to perform genome sequence comparison on large data sets, and various ways to further develop the method are discussed.
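The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general Chaos Game Representation idea, not the authors' implementation. It assumes a common corner convention (A, C, G, T at the corners of the unit square) and replaces the multifractal analysis described in the paper with a plain Euclidean distance between binned CGR grids. The function names `cgr_points`, `cgr_grid`, `grid_distance` and `shortlist` and the parameters `k` and `keep` are hypothetical and chosen for illustration.

```python
import math

# Assumed corner convention (a common choice, not taken from the paper).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory: each base moves the point halfway to its corner."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip ambiguous symbols such as N
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points


def cgr_grid(seq, k=3):
    """Bin the CGR points into a 2^k x 2^k grid of relative frequencies."""
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    points = cgr_points(seq)
    for x, y in points:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1.0
    total = len(points)
    if total:
        grid = [[cell / total for cell in row] for row in grid]
    return grid


def grid_distance(g1, g2):
    """Euclidean distance between two grids (a stand-in for the paper's analysis)."""
    return math.sqrt(
        sum((a - b) ** 2 for r1, r2 in zip(g1, g2) for a, b in zip(r1, r2))
    )


def shortlist(query, targets, keep=2, k=3):
    """Return the names of the `keep` targets whose grids are closest to the query."""
    q = cgr_grid(query, k)
    ranked = sorted(
        targets, key=lambda name: grid_distance(q, cgr_grid(targets[name], k))
    )
    return ranked[:keep]
```

In a filtering setting like the one the abstract describes, the target grids would presumably be computed once and stored, so that each query only requires one pass over the query sequence plus a cheap comparison per target; the sketch recomputes them on every call for brevity.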
Original language | English |
---|---|
Pages (from-to) | 1372-1381 |
Number of pages | 9 |
Journal | Procedia Computer Science |
Volume | 18 |
Publication status | Published - 01 Jun 2013 |
Event | 2013 International Conference on Computational Science - Barcelona, Spain; Duration: 05 Jun 2013 → 07 Jun 2013 |
Keywords
- genomics
- visualisation
- biological sequence analysis
- large data sets
- data mining
Fingerprint
Dive into the research topics of 'Fast Comparison of Microbial Genomes Using the Chaos Games Representation for Metagenomic Applications'. Together they form a unique fingerprint.
Datasets
- Microbial genome sequences and taxonomic information based on the Genometa 2012 data set
  Swain, M., Prifysgol Aberystwyth | Aberystwyth University, 18 Jul 2019
  DOI: 10.20391/e6974906-f30f-4976-90fb-ea1679eedef0