Abstract
A major challenge in epidemiology is performing data pre-processing, specifically feature selection, on large-scale data with a high-dimensional feature set. In this paper, this challenge is tackled by using a recently established distributed and scalable version of Rough Set Theory (RST). The paper considers epidemiological data collected from three international institutions for the purpose of cancer incidence prediction. The data set used aggregates about 5495 risk factors (features), spanning 32 years and 38 countries. Detailed experiments demonstrate that RST is relevant to real-world big data applications: it can offer insights about the selected risk factors, speed up the learning process, maintain the performance of the cancer incidence prediction model without significant information loss, and simplify the learned model for epidemiologists. Code related to this paper is available at: https://github.com/zeinebchelly/Sp-RST.
Original language | English |
---|---|
Title of host publication | Machine Learning and Knowledge Discovery in Databases |
Subtitle of host publication | ECML PKDD 2018 |
Editors | Ulf Brefeld, Edward Curry, Elizabeth Daly, Brian MacNamee, Alice Marascu, Fabio Pinelli, Michele Berlingerio, Neil Hurley |
Publisher | Springer Nature |
Pages | 440-455 |
Number of pages | 16 |
ISBN (Electronic) | 978-3-030-10997-4 |
ISBN (Print) | 978-3-030-10996-7 |
Publication status | Published - 18 Jan 2019 |
Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Croke Park Conference Centre, Dublin, Ireland. Duration: 10 Sept 2018 → 14 Sept 2018. http://www.ecmlpkdd2018.org |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Volume | 11053 |
Conference
Conference | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases |
---|---|
Abbreviated title | ECML-PKDD |
Country/Territory | Ireland |
City | Dublin |
Period | 10 Sept 2018 → 14 Sept 2018 |
Internet address | http://www.ecmlpkdd2018.org |
Keywords
- Big Data
- Rough Set Theory
- Feature Selection
- Epidemiology
- Cancer Incidence Prediction
Fingerprint
Dive into the research topics of 'Rough Set Theory as a Data Mining Technique: A Case Study in Epidemiology and Cancer Incidence Prediction'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Optimised Framework based on Rough Set Theory for Big Data Pre-processing in Certain and Imprecise Contexts - RoSTBiDFramework
01 Mar 2017 → 28 Feb 2019
Project: Externally funded research