Replacing Sanger with Next Generation Sequencing to improve coverage and quality of reference DNA barcodes for plants

Michael Wilkinson, Claudio Szabo, Caroline Ford, Yuval Yarom, Adam Croxford, Amanda Camp, Paul Gooding

Research output: Contribution to journal, Article, peer-review

Abstract

We estimate that the global BOLD Systems database holds core DNA barcodes (rbcL + matK) for about 15% of land plant species and that comprehensive species coverage is still many decades away. Interim performance of the resource is compromised by variable sequence overlap and modest information content within each barcode. Our model predicts that the proportion of species-unique barcodes reduces as the database grows and that ‘false’ species-unique barcodes remain >5% until the database is almost complete. We conclude the current rbcL + matK barcode is unfit for purpose. Genome skimming and supplementary barcodes could improve diagnostic power but would slow new barcode acquisition. We therefore present two novel Next Generation Sequencing protocols (with freeware) capable of accurate, massively parallel de novo assembly of high-quality DNA barcodes of >1400 bp. We explore how these capabilities could enhance species diagnosis in the coming decades.
Original language: English
Article number: 46040
Journal: Scientific Reports
Volume: 7
Publication status: Published - 12 Apr 2017

Keywords

  • computational biology and bioinformatics
  • plant science
