A Distributed Rough Set Theory based Algorithm for an Efficient Big Data Pre-processing under the Spark Framework

Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Mustapha Lebbah

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding (Non-Journal item)


Abstract

Big Data reduction is a major point of interest across a wide variety of fields. Interest in it grew as it became difficult to quickly extract the most useful information from the huge amounts of data at hand. Several state-of-the-art methods have been proposed for data reduction, and specifically for feature selection. However, most of them require additional information about the given data for thresholding, require noise levels to be specified, or need a feature ranking procedure. A more adequate feature selection technique is therefore needed, one that selects features using only the information contained within the dataset. Rough Set Theory (RST) can serve as such a technique: it discovers data dependencies and reduces the number of features in a dataset using the data alone, requiring no additional information. However, despite being a powerful feature selection technique, RST is computationally expensive and only practical for small datasets. In this paper, we therefore present a novel, efficient, distributed RST-based algorithm for large-scale data pre-processing under the Spark framework. Our experimental results show that our RST solution can be applied efficiently to Big Data without any significant information loss.
Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Publisher: IEEE Press
Pages: 911-916
Number of pages: 6
ISBN (Electronic): 9781538627143
Publication status: Published - 15 Jan 2018
Event: 2017 IEEE International Conference on Big Data (BigData 2017) - Boston, United States of America
Duration: 11 Dec 2017 to 14 Dec 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Volume: 2018-January

Conference

Conference: 2017 IEEE International Conference on Big Data (BigData 2017)
Country/Territory: United States of America
City: Boston
Period: 11 Dec 2017 to 14 Dec 2017

Keywords

  • Big Data Pre-processing
  • Distributed Processing
  • Feature Selection
  • Rough Set Theory
  • Scalability
