Pairwise similarity for cluster ensemble problem: Link-based and approximate approaches

Natthakan Iam-On*, Tossapon Boongoen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter

5 Citations (Scopus)

Abstract

Cluster ensemble methods have emerged as powerful techniques that aggregate several input data clusterings to generate a single output clustering with improved robustness and stability. In particular, link-based similarity techniques have recently been introduced, with superior performance to the conventional co-association method. Their potential and applicability are, however, limited by their underlying time complexity. In light of this shortcoming, this paper presents two approximate approaches that mitigate the problem of time complexity: the approximate algorithm approach (Approximate SimRank Based Similarity matrix) and the approximate data approach (Prototype-based cluster ensemble model). The first approach decreases the computational requirement of the existing link-based technique; the second reduces the size of the problem by finding a smaller, representative, approximate dataset, derived by a density-biased sampling technique. The advantages of both approximate approaches are empirically demonstrated over 22 datasets (both artificial and real data), with statistical comparisons of performance (at the 95% confidence level) using three well-known validity criteria. Results obtained from these experiments suggest that approximate techniques can efficiently help scale up the application of link-based similarity methods to a wider range of data sizes.
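
For orientation, the conventional co-association method that the abstract uses as its baseline can be sketched as follows: the similarity of two data points is the fraction of base clusterings that place them in the same cluster. This is a minimal illustration only, not the link-based or approximate methods proposed in the chapter; the function name `co_association` and the toy ensemble are assumptions made for this example.

```python
def co_association(labelings):
    """Pairwise similarity as the fraction of base clusterings
    in which two points share a cluster label.

    labelings: list of M label lists, each of length N
    (hypothetical input layout chosen for this sketch).
    """
    m = len(labelings)
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalise co-occurrence counts by the ensemble size.
    return [[c / m for c in row] for row in counts]


# Toy ensemble: three base clusterings of four data points.
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
S = co_association(ensemble)
```

Cluster labels are compared only within a single base clustering, so the labels need not correspond across clusterings. Building the full N x N matrix is quadratic in the number of data points, which is one reason reducing the data size (as in the prototype-based approach described above) can help with scalability.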

Original language: English
Title of host publication: Transactions on Large-Scale Data- and Knowledge-Centered Systems IX
Editors: Abdelkader Hameurlain, Josef Küng, Roland Wagner
Publisher: Springer Nature
Pages: 95-122
Number of pages: 28
ISBN (Print): 9783642400681
DOIs
Publication status: Published - 2013
Externally published: Yes

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7980
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • cluster ensembles
  • cluster relation
  • clustering
  • data prototype
  • link analysis
  • pairwise similarity matrix
