Abstract
This work presents a scientific data mining process model for metabolomics that provides a systematic and formalised framework for guiding and performing metabolomics data analysis in a justifiable and traceable manner. The process model is designed to help metabolomics investigations achieve their analytical objectives and to ensure the validity, interpretability and reproducibility of their results. It satisfies the requirements of metabolomics data mining, focuses on the contextual meaning of metabolomics knowledge, and addresses the shortcomings of existing data mining process models, while also attending to the practical aspects of metabolomics investigations and other desirable features. Developing the process model involved investigating the ontologies and standards of science, data mining and metabolomics, and its design was based on principles, best practices and inspirations from Process Engineering, Software Engineering, Scientific Methodology and Machine Learning. A software environment was built to realise and automate the execution of the process model. It was then applied to a number of metabolomics datasets to demonstrate and evaluate the model's applicability to different metabolomics investigations, approaches and data acquisition instruments on the one hand, and to different data mining approaches, goals, tasks and techniques on the other. The process model satisfied the requirements of metabolomics data mining and can be generalised to perform data mining in other scientific disciplines.
Original language | English |
---|---|
Pages (from-to) | 209964 - 210005 |
Number of pages | 42 |
Journal | IEEE Access |
Volume | 8 |
Publication status | Published - 18 Nov 2020 |
Keywords
- Analytical models
- Bioinformatics
- Computational Biology
- Data analysis
- Data Mining
- Data models
- Knowledge discovery
- Machine Learning
- Metabolomics
- Metabolomics Data Analysis
- Process Engineering
- Software
- Software Engineering