Abstract
Clinical data comes in many forms, and an abundance of data models and standards coexist, each specifying a different way of capturing the same data. These standards also tend to prescribe how structured data should be stored and require adherence to a strict data model. In practice, data is often less rigid and more varied, or comes in simpler formats such as spreadsheets and other tabular files. Integrating multiple sources of such data is challenging, not only because of differing data formats and schemas but also because of inconsistent terminology arising from the use of multiple languages, dated terms, shorthand, or human error.
This thesis aims to address this issue with the development of a framework that utilises ontologies for clinical data integration and enhanced querying.
Within this framework, a web-based interface and APIs have been implemented to simplify the process of integrating heterogeneous clinical data using ontologies.
An approach to generating RDF triples from tabular data using an extensible pipeline has been implemented, with a filter system capable of applying concept extraction and linking to free-text input fields.
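As a rough illustration of the triple-generation idea above, the following is a minimal Python sketch, not the framework's actual implementation. It assumes CSV input, a hypothetical `http://example.org/` namespace, and hypothetical function names (`rows_to_triples`, `concept_filter`); a toy hand-written lexicon stands in for real ontology-based concept recognition. Each non-empty cell becomes a literal-valued triple, and per-column filters can append extra triples, here linking terms found in a free-text column (including shorthand and a non-English synonym) to concept IRIs.

```python
import csv
import io
import re
from typing import Callable, Dict, List, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]
# A filter receives (subject IRI, column name, cell value) and returns extra triples.
Filter = Callable[[str, str, str], List[Triple]]

BASE = "http://example.org/"  # hypothetical namespace, not from the thesis

# Hypothetical toy lexicon: surface forms (full terms, shorthand, other
# languages) mapped to made-up ontology concept IRIs.
LEXICON: Dict[str, str] = {
    "myocardial infarction": BASE + "onto/MyocardialInfarction",
    "heart attack": BASE + "onto/MyocardialInfarction",
    "mi": BASE + "onto/MyocardialInfarction",
    "diabetes": BASE + "onto/Diabetes",
    "diabète": BASE + "onto/Diabetes",
}


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    # Escape characters that N-Triples requires to be escaped in string literals.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def concept_filter(subject: str, column: str, value: str) -> List[Triple]:
    """Link whole-word lexicon matches in free text to concept IRIs."""
    text = value.lower()
    concepts: List[str] = []
    for term, concept in LEXICON.items():
        if re.search(r"\b" + re.escape(term) + r"\b", text) and concept not in concepts:
            concepts.append(concept)
    return [(iri(subject), iri(BASE + "vocab/mentions"), iri(c)) for c in concepts]


def rows_to_triples(csv_text: str, id_column: str,
                    filters: Dict[str, List[Filter]]) -> List[str]:
    """Turn each CSV row into N-Triples lines, running per-column filters."""
    lines: List[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = BASE + "record/" + quote(row[id_column])
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the ID becomes the subject; empty cells are skipped
            triples = [(iri(subject), iri(BASE + "vocab/" + quote(column)), literal(value))]
            for extra in filters.get(column, []):
                triples.extend(extra(subject, column, value))
            lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return lines


if __name__ == "__main__":
    data = 'patient_id,age,notes\np1,64,"Pt had MI, hx diabetes"\n'
    for line in rows_to_triples(data, "patient_id", {"notes": [concept_filter]}):
        print(line)
```

Keeping filters as plain functions keyed by column name is one simple way to make such a pipeline extensible: adding a new normalisation or linking step means registering another function rather than changing the core conversion loop.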
The application of this framework and its efficacy have been demonstrated through two use cases: annotation of real-world experimental data, and cohort identification using a common clinical challenge dataset.
| Date of Award | 2023 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Sponsors | Llywodraeth Cymru KESS-2 |
| Supervisor | Chuan Lu (Supervisor) & Luis Mur (Supervisor) |