Abstract
Big Data reduction is a main point of interest across a wide variety of fields. Interest in this domain grew with the difficulty of quickly extracting the most useful information from the huge amounts of data at hand. To achieve data reduction, and feature selection in particular, several state-of-the-art methods have been proposed. However, most of them require additional information about the given data for thresholding, require noise levels to be specified, or need a feature ranking procedure. A more adequate feature selection technique is therefore needed, one that can extract features using only the information contained within the dataset. Rough Set Theory (RST) can serve as such a technique: it discovers data dependencies and reduces the number of features in a dataset using the data alone, requiring no additional information. However, despite being a powerful feature selection technique, RST is computationally expensive and only practical for small datasets. Therefore, in this paper, we present a novel, efficient distributed algorithm based on Rough Set Theory for large-scale data pre-processing under the Spark framework. Our experimental results show that our RST solution can be applied efficiently to Big Data without any significant information loss.
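To make the idea of selecting features "using the data alone" concrete, the following is a minimal single-machine sketch of the classical rough set dependency degree and an exhaustive search for the smallest feature subsets that preserve it. It is not the distributed Spark algorithm presented in the paper; the function names (`dependency_degree`, `smallest_reducts`) and the toy table are assumptions introduced only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def dependency_degree(rows, features, decision):
    """Fraction of rows whose decision is determined by `features` alone."""
    # Group decision values by the rows' values on the chosen features;
    # each group is one equivalence class of the indiscernibility relation.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in features)
        classes[key].append(row[decision])
    # A class belongs to the positive region when all its rows share one decision.
    consistent = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return consistent / len(rows)


def smallest_reducts(rows, features, decision):
    """Exhaustively find the smallest subsets that keep the full dependency degree."""
    target = dependency_degree(rows, features, decision)
    for size in range(1, len(features) + 1):
        found = [list(subset) for subset in combinations(features, size)
                 if dependency_degree(rows, subset, decision) == target]
        if found:
            return found
    return [list(features)]


# Hypothetical toy decision table: features a, b, c and decision d.
rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
print(smallest_reducts(rows, ["a", "b", "c"], "d"))
```

In this toy table the decision depends only on feature `b`, so the search keeps `b` and drops `a` and `c`. The search enumerates feature subsets, so its cost grows exponentially with the number of features, which is one way to see why plain RST is practical only for small datasets.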
Original language | English |
---|---|
Title of host publication | Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017 |
Editors | Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda |
Publisher | IEEE Press |
Pages | 911-916 |
Number of pages | 6 |
ISBN (Electronic) | 9781538627143 |
Publication status | Published - 15 Jan 2018 |
Event | 2017 IEEE International Conference on Big Data (BigData 2017) - Boston, United States of America. Duration: 11 Dec 2017 → 14 Dec 2017 |
Publication series
Name | Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017 |
---|---|
Volume | 2018-January |
Conference
Conference | 2017 IEEE International Conference on Big Data (BigData 2017) |
---|---|
Country/Territory | United States of America |
City | Boston |
Period | 11 Dec 2017 → 14 Dec 2017 |
Keywords
- Big Data Pre-processing
- Distributed Processing
- Feature Selection
- Rough Set Theory
- Scalability
Fingerprint
Dive into the research topics of 'A Distributed Rough Set Theory based Algorithm for an Efficient Big Data Pre-processing under the Spark Framework'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Optimised Framework based on Rough Set Theory for Big Data Pre-processing in Certain and Imprecise Contexts - RoSTBiDFramework
  Zarges, C. (PI)
  01 Mar 2017 → 28 Feb 2019
  Project: Externally funded research