Graph clustering-based discretization approach to microarray data

Kittakorn Sriwanna*, Tossapon Boongoen, Natthakan Iam-On

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Several data mining techniques require discrete data. Moreover, learning on discrete domains often performs better than learning on continuous data. Multivariate discretization transforms continuous data into discrete data while taking the correlations among attributes into account. Given the benefit of this idea, many multivariate discretization algorithms have been proposed. However, only a few discretization algorithms apply directly to microarray (gene expression) data, which is high-dimensional and imbalanced. More notably, no multivariate method has been put forward for microarray data analysis. According to recently published research, the graph clustering-based discretization methods based on splitting and merging (GraphS and GraphM) usually achieve superior results compared to many well-known discretization algorithms. In this paper, GraphS and GraphM are extended with an alpha parameter, which is the ratio between the similarity of gene expressions (distance) and the similarity of the class label. Moreover, the extensions consider three similarity measures (cosine similarity, Euclidean distance, and Pearson correlation) in order to determine the proper pairwise similarity measure. An evaluation on 20 real microarray datasets with 4 classifiers suggests that the two proposed methods based on cosine similarity, GraphM(C) and GraphS(C), outperform 9 state-of-the-art discretization algorithms on three classification performance measures (ACC, AUC, and Kappa) and on running time.
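The role of the alpha parameter can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the convex combination `alpha * expression_similarity + (1 - alpha) * class_similarity`, the 0/1 class-label similarity, the `1 / (1 + d)` conversion of Euclidean distance into a similarity, and all function names are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two expression vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # undefined for zero vectors; treated as "no similarity"
    return dot / (norm_x * norm_y)


def euclidean_similarity(x, y):
    """Euclidean distance mapped into (0, 1] via 1 / (1 + d) (assumed mapping)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + dist)


def pearson_similarity(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # undefined for constant vectors
    return cov / (dev_x * dev_y)


def combined_similarity(x, y, label_x, label_y, alpha=0.5,
                        measure=cosine_similarity):
    """Blend expression similarity and class-label similarity (assumed form).

    alpha = 1 uses only the expression similarity;
    alpha = 0 uses only the class-label agreement.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    class_sim = 1.0 if label_x == label_y else 0.0
    return alpha * measure(x, y) + (1.0 - alpha) * class_sim
```

Under these assumptions, calling `combined_similarity(sample_a, sample_b, label_a, label_b, alpha=0.7, measure=pearson_similarity)` would weight the Pearson correlation of two samples' expression profiles at 0.7 and their class agreement at 0.3. The weights produced this way could then serve as edge weights in a similarity graph, which is the kind of input a graph clustering step needs.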

Original language: English
Pages (from-to): 879-906
Number of pages: 28
Journal: Knowledge and Information Systems
Volume: 60
Issue number: 2
Early online date: 05 Sept 2018
Publication status: Published - 01 Aug 2019
Externally published: Yes

Keywords

  • Data mining
  • Graph clustering
  • High-dimensional data
  • Microarray data
  • Multivariate discretization
