Unsupervised fuzzy-rough set-based dimensionality reduction

Research output: Contribution to journal › Article › peer-review

73 Citations (Scopus)

Abstract

Each year, more and more data is collected worldwide. In fact, it is estimated that the amount of data collected and stored at least doubles every two years. A large percentage of this data is unlabelled, or has labels that are incomplete or missing. Because the data is so large, it is very difficult for humans to assign labels to data objects manually. Additionally, many real-world application datasets, such as those in gene expression analysis and text classification, are also of high dimensionality. This further frustrates label assignment for domain experts, as not all of the features are relevant or necessary for assigning a given label. Hence, unsupervised feature selection (FS) is required. For supervised learning, feature selection algorithms attempt to maximise a given function of predictive accuracy. This function typically considers the ability of feature vectors to reflect decision class labels. For the unsupervised learning task, however, decision class labels are not provided, which raises the question of which features should be retained, given that some features are irrelevant, redundant or noisy. In this paper, several unsupervised FS approaches based on fuzzy-rough sets are presented. These approaches require no thresholding information, are domain-independent, and can operate on real-valued data without the need for discretisation. They offer a significant reduction in dimensionality whilst retaining the semantics of the data, and can even yield feature subsets that are supersets of those selected by the supervised fuzzy-rough approaches. The approaches are compared with some supervised techniques and are shown to retain useful features.
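
To make the idea of threshold-free, unsupervised fuzzy-rough selection on real-valued data more concrete, the following is a minimal sketch and not the algorithm from the paper, whose details the abstract does not give. It assumes a range-normalised similarity relation per feature, the minimum t-norm to combine features, the Łukasiewicz implicator for the fuzzy lower approximation, and a simple backward elimination that drops a feature whenever the remaining features fully determine it. All function names (`feature_similarity`, `subset_similarity`, `dependency`, `unsupervised_select`) are hypothetical, and the `tol` parameter exists only to absorb floating-point rounding.

```python
def feature_similarity(X, a):
    """Pairwise fuzzy similarity for feature a: 1 - |a(x) - a(y)| / range(a)."""
    col = [row[a] for row in X]
    span = (max(col) - min(col)) or 1.0  # constant feature: every pair fully similar
    return [[1.0 - abs(u - v) / span for v in col] for u in col]


def subset_similarity(X, features):
    """Combine per-feature similarities with the minimum t-norm (an assumption)."""
    mats = [feature_similarity(X, a) for a in features]
    n = len(X)
    return [[min(m[i][j] for m in mats) for j in range(n)] for i in range(n)]


def dependency(X, P, Q):
    """Fuzzy-rough dependency of feature subset Q on feature subset P, in [0, 1]."""
    RP, RQ = subset_similarity(X, P), subset_similarity(X, Q)
    n = len(X)
    total = 0.0
    for x in range(n):
        # Positive-region membership of x:
        # sup over z of inf over y of I(RP(x, y), RQ(y, z)),
        # with the Lukasiewicz implicator I(p, q) = min(1, 1 - p + q).
        total += max(
            min(min(1.0, 1.0 - RP[x][y] + RQ[y][z]) for y in range(n))
            for z in range(n)
        )
    return total / n


def unsupervised_select(X, tol=1e-9):
    """Backward elimination: drop a feature if the kept ones fully determine it."""
    kept = list(range(len(X[0])))
    for a in range(len(X[0])):
        rest = [f for f in kept if f != a]
        if rest and dependency(X, rest, [a]) >= 1.0 - tol:
            kept = rest
    return kept


# Toy data: column 2 duplicates column 0, so one of the two is redundant.
X = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
selected = unsupervised_select(X)
```

No class labels are used at any point: each feature is treated in turn as the thing to be predicted from the others. The result depends on the order in which features are visited, and the triple loop in `dependency` is cubic in the number of objects, so this sketch is only suitable for small examples.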

Original language: English
Pages (from-to): 106-121
Number of pages: 16
Journal: Information Sciences
Volume: 229
Early online date: 13 Dec 2012
Publication status: Published - 20 Apr 2013

Keywords

  • Unsupervised learning
  • Unsupervised feature selection
  • Feature selection
  • Attribute reduction
  • Fuzzy set
  • Rough set
