Abstract
This thesis presents a novel knowledge discovery and data mining process model for metabolomics, which was successfully developed, implemented and applied to a number of metabolomics applications.
The process model provides a formalised framework and a methodology for conducting justifiable and traceable data mining in metabolomics. It promotes the achievement of metabolomics analytical objectives and contributes towards the reproducibility of its results.
The process model was designed to satisfy the requirements of data mining in metabolomics and to be consistent with the scientific nature of metabolomics investigations. It considers the practical aspects of the data mining process, covering management, human interaction, quality assurance and standards, in addition to other desired features such as visualisation, data exploration, knowledge presentation and automation.
The development of the process model involved investigating data mining concepts, approaches and techniques, as well as the popular data mining process models, which were critically analysed in order to adopt their stronger features and overcome their shortcomings. The process model also draws on process engineering, software engineering, machine learning and scientific methodology, along with the existing ontologies of scientific experiments and data mining.
The process model was designed to support both data-driven and hypothesis-driven data mining. It provides a mechanism for defining the analytical objectives of metabolomics data mining, considering their achievability, feasibility, measurability and success criteria. The process model also provides a novel strategy for the justifiable selection of data mining techniques, which takes into account the analytical objectives of the process, the nature and quality of the metabolomics data, and the requirements and feasibility of the candidate techniques. The model ensures the validity and reproducibility of the outcomes by defining traceability and assessment mechanisms, which cover all the procedures applied and the deliverables generated throughout the process. The process also defines evaluation mechanisms, which cover not only the technical aspects of the data mining model, but also the contextual aspects of the acquired knowledge.
The process model was implemented using a software environment, and was applied to four real-world metabolomics applications. The applications demonstrated the proposed process model's applicability to various data mining approaches, goals, tasks, and techniques. They also confirmed the process's applicability to various metabolomics investigations and approaches using data generated by a variety of data acquisition instruments. The outcomes of the process execution in these applications were used in evaluating the process model's design and its satisfaction of the requirements of metabolomics data mining.
Supervisor: Nigel Hardy