A Distributed Rough Set Theory based Algorithm for an Efficient Big Data Pre-processing under the Spark Framework

Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Mustapha Lebbah

Research output: Chapter in Book/Report/Conference proceeding > Conference Proceeding (Non-Journal item)

14 Citations (SciVal)
242 Downloads (Pure)


Big Data reduction is a major point of interest across a wide variety of fields. Interest in this domain grew with the difficulty of quickly extracting the most useful information from the huge amounts of data at hand. Several state-of-the-art methods have been proposed for data reduction, specifically feature selection. However, most of them require additional information about the given data for thresholding, require noise levels to be specified, or even need a feature ranking procedure. A more adequate feature selection technique is therefore needed, one that can select features using only the information contained within the dataset. Rough Set Theory (RST) can serve as such a technique: it discovers data dependencies and reduces the number of features in a dataset using the data alone, requiring no additional information. However, despite being a powerful feature selection technique, RST is computationally expensive and only practical for small datasets. Therefore, in this paper, we present a novel, efficient, distributed RST-based algorithm for large-scale data pre-processing under the Spark framework. Our experimental results show that our RST solution can be applied efficiently to Big Data without any significant information loss.
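To make the idea of "discovering data dependencies using the data alone" concrete, the following is a minimal sketch of the classical, sequential rough-set dependency degree and a greedy, QuickReduct-style feature search. It is not the distributed Spark algorithm presented in the paper; the function names `dependency_degree` and `greedy_reduct`, the toy table, and the choice of a greedy search are all assumptions made for illustration, and the input table is assumed to be non-empty.

```python
from collections import defaultdict


def dependency_degree(rows, features, decision):
    """Fraction of rows whose decision value is fully determined by `features`.

    Rows that agree on every chosen feature form one equivalence class.
    A class belongs to the positive region when all its rows share a
    single decision value. Assumes `rows` is a non-empty list of dicts.
    """
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in features)
        classes[key].append(row[decision])
    positive = sum(
        len(labels) for labels in classes.values() if len(set(labels)) == 1
    )
    return positive / len(rows)


def greedy_reduct(rows, features, decision):
    """Hypothetical greedy search: add the feature that raises the
    dependency degree most, until it matches that of the full feature set."""
    target = dependency_degree(rows, features, decision)
    selected = []
    while dependency_degree(rows, selected, decision) < target:
        best = max(
            (f for f in features if f not in selected),
            key=lambda f: dependency_degree(rows, selected + [f], decision),
        )
        selected.append(best)
    return selected


# Toy decision table (illustrative data only): "d" depends on "b" alone.
rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
print(greedy_reduct(rows, ["a", "b", "c"], "d"))
```

Every candidate subset requires a fresh pass over all rows, which gives a rough sense of why the sequential form becomes impractical on large datasets and why a distributed formulation is attractive.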
Original language: English
Title of host publication: 2017 IEEE International Conference on Big Data (Big Data)
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghumath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Publisher: IEEE Press
ISBN (Electronic): 978-1-5386-2715-0
Digital Object Identifiers (DOIs):
Status: Published - 15 Jan 2018
Event: 2017 IEEE International Conference on Big Data (BigData 2017) - Boston, United States of America
Duration: 11 Dec 2017 to 14 Dec 2017


Conference: 2017 IEEE International Conference on Big Data (BigData 2017)
Country/Territory: United States of America
Period: 11 Dec 2017 to 14 Dec 2017

