TY - JOUR
T1 - A Distance Measure Approach to Exploring the Rough Set Boundary Region for Attribute Reduction
AU - Mac Parthaláin, Neil Seosamh
AU - Shen, Qiang
AU - Jensen, Richard
N1 - N. Mac Parthaláin, Q. Shen, and R. Jensen. A Distance Measure Approach to Exploring the Rough Set Boundary Region for Attribute Reduction. IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 3, pp. 306-317, 2010.
PY - 2010/3/1
Y1 - 2010/3/1
N2 - Feature Selection (FS) or Attribute Reduction techniques are employed for dimensionality reduction and
aim to select a subset of the original features of a dataset which
are rich in the most useful information. The benefits of employing FS techniques include improved
data visualisation and transparency, reduced training and utilisation
times and, potentially, improved prediction performance.
Many approaches based on rough set theory have, up to now, employed the
dependency function, which is based on lower approximations, as an evaluation step in the FS process. However, by examining
only the information that is considered certain and ignoring the boundary region, or region of
uncertainty, these approaches lose much useful information.
This paper examines a rough set FS technique which uses
the information gathered from both the lower approximation dependency value and a distance
metric which considers the number of objects in the boundary region and
the distance of those objects from the lower approximation.
The use of this measure in rough set feature selection can result in
smaller subset sizes than those obtained using the dependency function
alone. This demonstrates that there is much valuable information to be extracted
from the boundary region. Experimental results are presented for both crisp
and real-valued data and compared with two other FS techniques in terms of subset size,
runtimes and classification accuracy.
U2 - 10.1109/TKDE.2009.119
DO - 10.1109/TKDE.2009.119
M3 - Article
SN - 1041-4347
VL - 22
SP - 306
EP - 317
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 3
ER -