A Distributed Rough Set Theory Algorithm based on Locality Sensitive Hashing for an Efficient Big Data Pre-processing

Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Hanene Azzag, Mustapha Lebbah

Research output: Chapter in Book/Report/Conference proceeding; Conference Proceeding (Non-Journal item)

Abstract

A major challenge in the knowledge discovery process is big data pre-processing, specifically feature selection. To handle this challenge, Rough Set Theory (RST) has been considered one of the most powerful techniques, as it has much to offer for feature selection. To extend its applicability to big data, a distributed version of RST was developed. However, one of its key challenges is partitioning the feature search space in the distributed environment while guaranteeing data dependency. In this paper, we propose a new distributed version of RST based on Locality Sensitive Hashing (LSH), named LSH-dRST, for big data pre-processing. LSH-dRST uses LSH to match similar features into the same bucket and maps the generated buckets into partitions, enabling a more appropriate splitting of the universe. We compare LSH-dRST to the standard distributed RST technique, which is based on a random partitioning of the universe, and demonstrate that LSH-dRST is not only scalable but also more reliable for feature selection, making it more relevant to big data pre-processing. We also demonstrate that LSH-dRST partitions the high-dimensional feature search space in a more reliable way; hence, it guarantees data dependency in the distributed environment and ensures a lower computational cost.
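
The bucketing idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state which hash family LSH-dRST uses or how buckets are assigned to partitions, so everything below is an assumption made for illustration: the sign-random-projection hash, the greedy bucket-to-partition mapping, and the hypothetical names `signature` and `lsh_partition_features` are not taken from the paper.

```python
import random
from collections import defaultdict


def signature(column, hyperplanes):
    """Sign-random-projection signature of one feature column (assumed hash family)."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * w for v, w in zip(column, plane))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


def lsh_partition_features(columns, n_bits=4, n_partitions=2, seed=0):
    """Group feature columns into LSH buckets, then map buckets to partitions.

    columns: dict mapping a feature name to its list of numeric values,
             one value per object in the universe.
    Returns a list of n_partitions lists of feature names.
    """
    rng = random.Random(seed)
    n_objects = len(next(iter(columns.values())))
    # One random hyperplane per signature bit.
    hyperplanes = [
        [rng.gauss(0.0, 1.0) for _ in range(n_objects)] for _ in range(n_bits)
    ]

    # Features with the same signature fall into the same bucket.
    buckets = defaultdict(list)
    for name, col in columns.items():
        buckets[signature(col, hyperplanes)].append(name)

    # Assumed mapping: place each bucket, largest first, into the partition
    # that currently holds the fewest features. A bucket is never split.
    partitions = [[] for _ in range(n_partitions)]
    for key in sorted(buckets, key=lambda k: (-len(buckets[k]), k)):
        target = min(range(n_partitions), key=lambda i: len(partitions[i]))
        partitions[target].extend(buckets[key])
    return partitions


if __name__ == "__main__":
    toy = {
        "f1": [1.0, 2.0, 3.0, 4.0],
        "f2": [2.0, 4.0, 6.0, 8.0],   # same direction as f1
        "f3": [4.0, -1.0, 0.5, -3.0],
        "f4": [-1.0, -2.0, -3.0, -4.0],
    }
    for i, part in enumerate(lsh_partition_features(toy)):
        print(f"partition {i}: {part}")
```

Because a bucket is never split in this sketch, features that hash together always end up in the same partition, which is the property the abstract contrasts with random partitioning. In a distributed setting each partition would then be handed to a separate worker for the RST computation; that step is outside this sketch.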
Original language: English
Title of host publication: 2018 IEEE International Conference on BIG DATA
Publisher: IEEE Press
Publication status: Published - 2018
Event: 2018 IEEE International Conference on BIG DATA - The Westin Seattle, Seattle, United States of America
Duration: 10 Dec 2018 to 13 Dec 2018

Conference

Conference: 2018 IEEE International Conference on BIG DATA
Country/Territory: United States of America
City: Seattle
Period: 10 Dec 2018 to 13 Dec 2018

Keywords

  • big data pre-processing
  • feature selection
  • rough set theory
  • locality sensitive hashing
  • distributed processing
