Title of host publication: Encyclopedia of Data Warehousing and Mining - 2nd Edition
Published: 2008
Data reduction is an important step in knowledge discovery from data. The high dimensionality of databases can be reduced using suitable techniques, depending on the requirements of the data mining processes. These techniques fall into one of two categories: those that transform the underlying meaning of the data features and those that are semantics-preserving. Feature selection (FS) methods belong to the latter category: a smaller set of the original features is chosen according to a subset evaluation function. The process aims to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. In knowledge discovery, feature selection methods are particularly desirable because they facilitate the interpretability of the resulting knowledge. Rough set theory has been used as such a tool with much success, enabling the discovery of data dependencies and the reduction of the number of features contained in a dataset using the data alone, requiring no additional information.
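To make the idea of discovering dependencies "using the data alone" concrete, the following is a minimal sketch of rough-set feature selection in Python. It is an illustration under stated assumptions, not the method of any particular system: the dependency degree is taken to be the fraction of objects whose equivalence class (under the chosen features) maps to a single decision value, and the search is a simple greedy forward selection in the spirit of QuickReduct-style algorithms. The function names (`partition`, `dependency`, `greedy_reduct`) and the toy dataset are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, attrs, decision):
    """Dependency degree: share of rows lying in the positive region, i.e. in
    equivalence classes whose members all carry the same decision value."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def greedy_reduct(rows, cond_attrs, decision):
    """Add the feature with the highest dependency degree at each step until
    the subset matches the dependency degree of the full feature set.
    Greedy search is an assumption here; it does not guarantee minimality."""
    target = dependency(rows, cond_attrs, decision)
    selected = []
    best = dependency(rows, selected, decision)
    while best < target:
        candidates = [a for a in cond_attrs if a not in selected]
        scores = {a: dependency(rows, selected + [a], decision) for a in candidates}
        choice = max(candidates, key=lambda a: scores[a])
        selected.append(choice)
        best = scores[choice]
    return selected


# Hypothetical toy dataset: the decision "d" is fully determined by feature "b".
rows = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]

reduct = greedy_reduct(rows, ["a", "b", "c"], "d")
print(reduct)
```

The selected features are a subset of the original ones rather than a transformation of them, which is what makes the result semantics-preserving. No thresholds, priors, or other external parameters are supplied; every quantity is computed from the data table itself.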