Graph clustering-based discretization of splitting and merging methods (GraphS and GraphM)

Kittakorn Sriwanna, Tossapon Boongoen, Natthakan Iam-On

Research output: Contribution to journal › Article › peer-review

Abstract

Discretization plays a major role as a data preprocessing technique in machine learning and data mining. Recent studies have focused on multivariate discretization, which considers relations among attributes. The general goal of this method is to obtain discrete data that preserve most of the semantics exhibited by the original continuous data. However, many techniques generate final discrete data that may be less useful because the natural groups in the data are not maintained. This paper presents a novel graph clustering-based discretization algorithm that encodes different similarity measures into a graph representation of the examined data. This allows more refined data-wise relations to be obtained and used with an effective graph clustering technique, based on normalized association, to discover natural groups accurately. The effectiveness of this approach is empirically demonstrated on 30 standard datasets and 20 imbalanced datasets, in comparison with 11 well-known discretization algorithms using 4 classifiers. The results suggest that the new approach is able to preserve the natural groups and usually achieves better classifier performance and a more desirable number of intervals than the comparative methods.
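As a rough illustration of the normalized association criterion named in the abstract, the following Python sketch builds a similarity graph over one attribute's sorted values, scores every contiguous binary split with Nassoc(A, B) = assoc(A, A)/assoc(A, V) + assoc(B, B)/assoc(B, V), and keeps the split with the highest score. It is not the authors' GraphS or GraphM algorithm: the single Gaussian similarity (the paper combines several measures), the `sigma` parameter, the exhaustive search over cut points (used here in place of a spectral solver), and all function names are hypothetical choices made for illustration. For symmetric weights, Nassoc(A, B) = 2 - Ncut(A, B), so maximizing normalized association is equivalent to minimizing the normalized cut.

```python
import math


def similarity_matrix(values, sigma=1.0):
    """Gaussian (RBF) similarity between every pair of values.

    Hypothetical stand-in: a single kernel replaces the several
    similarity measures that the paper encodes into its graph.
    """
    n = len(values)
    return [
        [math.exp(-((values[i] - values[j]) ** 2) / (2 * sigma ** 2)) for j in range(n)]
        for i in range(n)
    ]


def normalized_association(w, a, b):
    """Nassoc(A, B) = assoc(A, A)/assoc(A, V) + assoc(B, B)/assoc(B, V),
    where V = A + B and assoc(S, T) sums edge weights from S to T."""
    v = a + b

    def assoc(s, t):
        return sum(w[i][j] for i in s for j in t)

    return assoc(a, a) / assoc(a, v) + assoc(b, b) / assoc(b, v)


def best_cut_point(values, sigma=1.0):
    """Return (cut, score) for the contiguous split of the sorted values
    that maximizes normalized association; the cut is the midpoint
    between the last value of the left part and the first of the right."""
    xs = sorted(values)
    w = similarity_matrix(xs, sigma)
    idx = list(range(len(xs)))
    best_score, best_cut = -1.0, None
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue  # no interval boundary between identical values
        score = normalized_association(w, idx[:k], idx[k:])
        if score > best_score:
            best_score, best_cut = score, (xs[k - 1] + xs[k]) / 2
    return best_cut, best_score
```

For example, `best_cut_point([1.0, 1.2, 1.1, 5.0, 5.3, 4.9])` places the cut between 1.2 and 4.9, separating the two natural groups, with a score close to the maximum of 2. A splitting method could apply such a cut recursively to each side until a stopping rule is met; the repeated pairwise sums make this sketch O(n^3) per split, so it suits only small inputs.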

Original language: English
Article number: 21
Number of pages: 39
Journal: Human-centric Computing and Information Sciences
Volume: 7
Issue number: 1
Early online date: 03 Aug 2017
DOIs:
Publication status: Published - 01 Dec 2017
Externally published: Yes

Keywords

  • Data mining
  • Graph clustering
  • Multivariate discretization
  • Normalized association
  • Normalized cuts
