Abstract
Data dimensionality has become a pervasive problem in many areas that require the learning of interpretable models. The problem has become particularly pronounced in recent years with the seemingly relentless growth in the size of datasets. Indeed, as the number of dimensions increases, the number of data instances required to generate accurate models increases exponentially. Feature selection has therefore become not merely a useful step in the process of model learning, but an increasingly necessary one. Rough set and fuzzy-rough set theory have been used as dataset pre-processors for this purpose with much success; however, the underlying time and space complexity of the subset evaluation metric is an obstacle to the processing of very large data. This paper proposes a general approach to this problem that employs a novel feature grouping step to alleviate the processing overhead for large datasets. The approach is framed within the context of (and applied to) fuzzy-rough sets, although it can be used with other subset evaluation techniques. The experimental evaluation demonstrates that considerable computational effort can be avoided, and that efficiency improves substantially for larger datasets as a result.
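The abstract does not say how the grouping step is carried out, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes that features are grouped by absolute Pearson correlation, that the first member of each group acts as its representative, and that subsets are scored with a common fuzzy-rough dependency degree (minimum t-norm, crisp class labels). The names `group_features`, `dependency`, `grouped_forward_selection` and the `threshold` parameter are hypothetical. The point it illustrates is that a greedy search which scores one candidate per group makes fewer calls to the costly subset evaluation than one which scores every remaining feature.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def group_features(rows, threshold=0.8):
    """Assumed grouping rule: a seed feature absorbs every unassigned feature
    whose absolute correlation with it is at least `threshold`."""
    cols = list(zip(*rows))
    unassigned = list(range(len(cols)))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        group = [seed] + [j for j in unassigned
                          if abs(pearson(cols[seed], cols[j])) >= threshold]
        unassigned = [j for j in unassigned if j not in group]
        groups.append(group)
    return groups

def dependency(rows, labels, subset):
    """Fuzzy-rough dependency degree of crisp labels on a feature subset.
    Similarity per feature is 1 - |difference| / range, combined with min.
    The pairwise loop over objects is the quadratic cost that grouping tries to avoid repeating."""
    if not subset:
        return 0.0
    cols = list(zip(*rows))
    spans = {a: (max(cols[a]) - min(cols[a])) or 1.0 for a in subset}
    total = 0.0
    for i, xi in enumerate(rows):
        membership = 1.0
        for j, xj in enumerate(rows):
            if labels[i] != labels[j]:
                sim = min(1.0 - abs(xi[a] - xj[a]) / spans[a] for a in subset)
                membership = min(membership, 1.0 - sim)
        total += membership
    return total / len(rows)

def grouped_forward_selection(rows, labels, threshold=0.8):
    """Greedy forward search that scores only one representative per group."""
    remaining = group_features(rows, threshold)
    selected, best = [], 0.0
    while remaining:
        scores = [dependency(rows, labels, selected + [g[0]]) for g in remaining]
        i = max(range(len(scores)), key=scores.__getitem__)
        if scores[i] <= best:  # no representative improves the subset
            break
        best = scores[i]
        selected.append(remaining.pop(i)[0])
    return selected, best

# Toy data (made up for illustration): feature 1 is exactly twice feature 0.
rows = [[0.0, 0.0, 0.3], [0.1, 0.2, 0.9], [0.9, 1.8, 0.1], [1.0, 2.0, 0.7]]
labels = [0, 0, 1, 1]
print(group_features(rows))                        # [[0, 1], [2]]
print(grouped_forward_selection(rows, labels)[0])  # [0, 2]
```

In this toy run the redundant feature 1 is never scored on its own, because it falls into the same group as feature 0. A full implementation would also need a policy for choosing representatives and for revisiting other members of a group, which this sketch leaves out.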
Original language | English |
---|---|
Title of host publication | 2014 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) |
Place of Publication | New York |
Publisher | IEEE Press |
Pages | 1488-1495 |
Number of pages | 8 |
Publication status | Published - 2014 |
Event | Fuzzy Systems - Beijing, China; Duration: 06 Jul 2014 → 11 Jul 2014; Conference number: 23 |
Publication series
Name | IEEE International Fuzzy Systems Conference Proceedings |
---|---|
Publisher | IEEE |
ISSN (Print) | 1544-5615 |
Conference
Conference | Fuzzy Systems |
---|---|
Abbreviated title | FUZZ-IEEE-2014 |
Country/Territory | China |
City | Beijing |
Period | 06 Jul 2014 → 11 Jul 2014 |
Keywords
- fuzzy-rough sets
- feature selection
- feature grouping