A Distributed Rough Set Theory Algorithm based on Locality Sensitive Hashing for an Efficient Big Data Pre-processing

Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Hanene Azzag, Mustapha Lebbah

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding (non-journal item)


Abstract

A major challenge in the knowledge discovery process is big data pre-processing, specifically feature selection. To address this challenge, Rough Set Theory (RST) has been considered one of the most powerful techniques, as it has much to offer for feature selection. To extend its applicability to big data, a distributed version of RST was developed. However, one of its key challenges is partitioning the feature search space in the distributed environment while guaranteeing data dependency. In this paper, we propose a new distributed version of RST based on Locality Sensitive Hashing (LSH), named LSH-dRST, for big data pre-processing. LSH-dRST uses LSH to match similar features into the same bucket and maps the generated buckets into partitions, enabling a more appropriate splitting of the universe. We compare LSH-dRST to the standard distributed RST technique, which is based on a random partitioning of the universe, and demonstrate that LSH-dRST is not only scalable but also more reliable for feature selection, making it more relevant to big data pre-processing. We also demonstrate that LSH-dRST partitions the high-dimensional feature search space more reliably; hence, it guarantees data dependency in the distributed environment and ensures a lower computational cost.
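
The bucket-then-partition idea described in the abstract can be illustrated with a small single-machine sketch. This is not the authors' implementation: the abstract does not say which LSH family is used, so sign random projection (which groups features with high cosine similarity) is an assumption here, and the function names `lsh_signature` and `partition_features`, the parameters `n_bits` and `n_partitions`, and the round-robin mapping of buckets to partitions are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def lsh_signature(column, hyperplanes):
    """Sign-random-projection signature: one bit per random hyperplane."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, column)) >= 0 else 0
        for plane in hyperplanes
    )


def partition_features(columns, n_bits=4, n_partitions=2, seed=0):
    """Group feature columns by LSH bucket, then map buckets to partitions.

    columns: dict mapping feature name -> list of numeric values
             (one value per object in the universe).
    Returns a dict mapping partition index -> list of feature names.
    """
    rng = random.Random(seed)
    n_objects = len(next(iter(columns.values())))
    # Assumed hash family: random Gaussian hyperplanes over the object space.
    hyperplanes = [
        [rng.gauss(0, 1) for _ in range(n_objects)] for _ in range(n_bits)
    ]

    # Features with the same signature fall into the same bucket.
    buckets = defaultdict(list)
    for name, values in columns.items():
        buckets[lsh_signature(values, hyperplanes)].append(name)

    # Assumed mapping: whole buckets are assigned to partitions round-robin,
    # so features that share a bucket are never split across partitions.
    partitions = defaultdict(list)
    for i, signature in enumerate(sorted(buckets)):
        partitions[i % n_partitions].extend(buckets[signature])
    return dict(partitions)


features = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 4.0, 6.0],    # same direction as "a", so same signature
    "c": [-1.0, 0.5, -3.0],
}
partitions = partition_features(features)
```

In this sketch, features "a" and "b" point in the same direction and therefore always share a bucket and a partition, whereas a random split could separate them. In a distributed setting each partition would then be processed by a separate worker.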
Original language: English
Title: 2018 IEEE International Conference on BIG DATA
Publisher: IEEE Press
Status: Published - 2018
Event: 2018 IEEE International Conference on BIG DATA - The Westin Seattle, Seattle, United States of America
Duration: 10 Dec 2018 to 13 Dec 2018

Conference

Conference: 2018 IEEE International Conference on BIG DATA
Country/Territory: United States of America
City: Seattle
Period: 10 Dec 2018 to 13 Dec 2018
