Rough Set Theory as a Data Mining Technique: A Case Study in Epidemiology and Cancer Incidence Prediction

Zaineb Chelly Dagdia, Christine Zarges, Benjamin Schannes, Martin Micalef, Lino Galiana, Benoît Rolland, Olivier de Fresnoye, Mehdi Benchoufi

Research output: Chapter in Book/Report/Conference proceeding > Conference Proceeding (Non-Journal item)

5 Citations (Scopus)
347 Downloads (Pure)

Abstract

A major challenge in epidemiology is data pre-processing, specifically feature selection, on large-scale data sets with a high-dimensional feature set. In this paper, this challenge is tackled by using a recently established distributed and scalable version of Rough Set Theory (RST). The paper considers epidemiological data collected from three international institutions for the purpose of cancer incidence prediction. The concrete data set aggregates about 5 495 risk factors (features), spanning 32 years and 38 countries. Detailed experiments demonstrate that RST is relevant to real-world big data applications, as it can offer insights into the selected risk factors, speed up the learning process, ensure the performance of the cancer incidence prediction model without huge information loss, and simplify the learned model for epidemiologists. Code related to this paper is available at: https://github.com/zeinebchelly/Sp-RST.

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases
Subtitle of host publication: European Conference, ECML PKDD 2018, Proceedings
Editors: Ulf Brefeld, Alice Marascu, Fabio Pinelli, Edward Curry, Brian MacNamee, Neil Hurley, Elizabeth Daly, Michele Berlingerio
Publisher: Springer Nature
Pages: 440-455
Number of pages: 16
ISBN (Electronic): 978-3-030-10997-4
ISBN (Print): 978-3-030-10996-7
Publication status: Published - 18 Jan 2019
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Croke Park Conference Centre, Dublin, Ireland
Duration: 10 Sept 2018 to 14 Sept 2018
http://www.ecmlpkdd2018.org

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11053 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Abbreviated title: ECML-PKDD
Country/Territory: Ireland
City: Dublin
Period: 10 Sept 2018 to 14 Sept 2018

Keywords

  • Big Data
  • Rough Set Theory
  • Feature Selection
  • Epidemiology
  • Cancer Incidence Prediction
  • Application
