Background: Metabolomics experiments using Mass Spectrometry (MS) technology measure the mass to charge ratio (m/z) and intensity of ionised molecules in crude extracts of complex biological samples to generate high dimensional metabolite 'fingerprint' or metabolite 'profile' data. High resolution MS instruments perform routinely with a mass accuracy of < 5 ppm (parts per million) thus providing potentially a direct method for signal putative annotation using databases containing metabolite mass information. Most database interfaces support only simple queries with the default assumption that molecules either gain or lose a single proton when ionised. In reality the annotation process is confounded by the fact that many ionisation products will be not only molecular isotopes but also salt/solvent adducts and neutral loss fragments of original metabolites. This report describes an annotation strategy that will allow searching based on all potential ionisation products predicted to form during electrospray ionisation (ESI). Results: Metabolite 'structures' harvested from publicly accessible databases were converted into a common format to generate a comprehensive archive in MZedDB. 'Rules' were derived from chemical information that allowed MZedDB to generate a list of adducts and neutral loss fragments putatively able to form for each structure and calculate, on the fly, the exact molecular weight of every potential ionisation product to provide targets for annotation searches based on accurate mass. We demonstrate that data matrices representing populations of ionisation products generated from different biological matrices contain a large proportion (sometimes > 50%) of molecular isotopes, salt adducts and neutral loss fragments. Correlation analysis of ESI-MS data features confirmed the predicted relationships of m/z signals. An integrated isotope enumerator in MZedDB allowed verification of exact isotopic pattern distributions to corroborate experimental data. Conclusion: We conclude that although ultra-high accurate mass instruments provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample improves greatly both the frequency and accuracy of potential annotation 'hits' in ESI-MS data.
- Computational Biology/methods
- Databases, Factual
- Spectrometry, Mass, Electrospray Ionization/instrumentation