Decision Trees for Hierarchical Multilabel Classification: A Case Study in Functional Genomics

Sašo Džeroski, Hendrik Blockeel, Amanda Clare, Leander Schietgat, Jan Struyf

Research output: Chapter in Book/Report/Conference proceeding > Conference Proceeding (Non-Journal item)

92 Citations (Scopus)

Abstract

Hierarchical multilabel classification (HMC) is a variant of classification where instances may belong to multiple classes organized in a hierarchy. The task is relevant for several application domains. This paper presents an empirical study of decision tree approaches to HMC in the area of functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to learning a set of regular classification trees (one for each class). Interestingly, on all 12 datasets we use, the HMC tree wins on all fronts: it is faster to learn and to apply, easier to interpret, and has similar or better predictive performance than the set of regular trees. It turns out that HMC tree learning is more robust to overfitting than regular tree learning.
Original language: English
Title of host publication: Knowledge Discovery in Databases
Subtitle of host publication: PKDD 2006 - 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, Proceedings
Publisher: Springer Nature
Pages: 18-29
Number of pages: 12
ISBN (Print): 3540453741, 9783540453741
Publication status: Published - 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4213 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
