Abstract
A major challenge in epidemiology is data pre-processing, specifically feature selection, on large-scale data sets with a high-dimensional feature set. In this paper, this challenge is tackled using a recently established distributed and scalable version of Rough Set Theory (RST). The paper considers epidemiological data collected from three international institutions for the purpose of cancer incidence prediction. The concrete data set used aggregates about 5 495 risk factors (features), spanning 32 years and 38 countries. Detailed experiments demonstrate that RST is relevant to real-world big data applications, as it can offer insights into the selected risk factors, speed up the learning process, ensure the performance of the cancer incidence prediction model without huge information loss, and simplify the learned model for epidemiologists. Code related to this paper is available at: https://github.com/zeinebchelly/Sp-RST.
Original language | English |
---|---|
Title of host publication | Machine Learning and Knowledge Discovery in Databases |
Subtitle of host publication | European Conference, ECML PKDD 2018, Proceedings |
Editors | Ulf Brefeld, Alice Marascu, Fabio Pinelli, Edward Curry, Brian MacNamee, Neil Hurley, Elizabeth Daly, Michele Berlingerio |
Publisher | Springer Nature |
Pages | 440-455 |
Number of pages | 16 |
ISBN (Electronic) | 978-3-030-10997-4 |
ISBN (Print) | 978-3-030-10996-7 |
DOIs | |
Publication status | Published - 18 Jan 2019 |
Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Croke Park Conference Centre, Dublin, Ireland. Duration: 10 Sept 2018 → 14 Sept 2018. http://www.ecmlpkdd2018.org |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11053 LNAI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases |
---|---|
Abbreviated title | ECML-PKDD |
Country/Territory | Ireland |
City | Dublin |
Period | 10 Sept 2018 → 14 Sept 2018 |
Internet address |
Keywords
- Big Data
- Rough Set Theory
- Feature Selection
- Epidemiology
- Cancer Incidence Prediction
- Application
Fingerprint
Dive into the research topics of 'Rough Set Theory as a Data Mining Technique: A Case Study in Epidemiology and Cancer Incidence Prediction'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Optimised Framework based on Rough Set Theory for Big Data Pre-processing in Certain and Imprecise Contexts - RoSTBiDFramework
Zarges, C. (PI)
01 Mar 2017 → 28 Feb 2019
Project: Externally funded research