Rough Set Theory as a Data Mining Technique: A Case Study in Epidemiology and Cancer Incidence Prediction

Zaineb Chelly Dagdia, Christine Zarges, Benjamin Schannes, Martin Micalef, Lino Galiana, Benoît Rolland, Olivier de Fresnoye, Mehdi Benchoufi

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding (Non-Journal item)

Abstract

A major challenge in epidemiology is performing data pre-processing, specifically feature selection, on large-scale data with a high-dimensional feature set. This paper tackles that challenge using a recently established distributed and scalable version of Rough Set Theory (RST). It considers epidemiological data collected from three international institutions for the purpose of cancer incidence prediction. The data set aggregates about 5495 risk factors (features), spanning 32 years and 38 countries. Detailed experiments demonstrate that RST is relevant to real-world big data applications: it can offer insights into the selected risk factors, speed up the learning process, maintain the performance of the cancer incidence prediction model without significant information loss, and simplify the learned model for epidemiologists. Code related to this paper is available at: https://github.com/zeinebchelly/Sp-RST.
Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases
Subtitle of host publication: ECML PKDD 2018
Editors: Ulf Brefeld, Edward Curry, Elizabeth Daly, Brian MacNamee, Alice Marascu, Fabio Pinelli, Michele Berlingerio, Neil Hurley
Publisher: Springer Nature
Pages: 440-455
Number of pages: 16
ISBN (Electronic): 978-3-030-10997-4
ISBN (Print): 978-3-030-10996-7
Publication status: Published - 18 Jan 2019
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Croke Park Conference Centre, Dublin, Ireland
Duration: 10 Sept 2018 to 14 Sept 2018
http://www.ecmlpkdd2018.org

Publication series

Name: Lecture Notes in Computer Science
Volume: 11053

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Abbreviated title: ECML-PKDD
Country/Territory: Ireland
City: Dublin
Period: 10 Sept 2018 to 14 Sept 2018

Keywords

  • Big Data
  • Rough Set Theory
  • Feature Selection
  • Epidemiology
  • Cancer Incidence Prediction
