Consensus clustering-based undersampling for improved classification of transient events in time-domain astronomy surveys

Tossapon Boongoen, Natthakan Iam-On*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Astronomical data analytics has expanded rapidly with advances in data handling techniques and computing systems. The race to discover new events depends on acquiring and digesting the high volume of data from sky surveys efficiently yet accurately. This holds for many modern astronomy projects, which face the issue of big data storage on the one hand and effective data analysis on the other. This research deals with the latter by focusing on the classification of potential transient events initially detected in time-domain astronomical surveys. Most of these candidate transients are false positives that result from hardware faults or from errors in data collection and/or data pre-processing. Hence, the ability to filter them out is much needed to avoid laborious manual assessment down the line. The problem investigated here is that the training data can be highly imbalanced. As a first attempt, coupling oversampling methods with several classifiers provides an improvement but generally leads to overfitting. As a solution, this paper presents a novel application of consensus clustering to undersample majority-class instances instead. It not only helps to overcome the aforementioned drawback but also strengthens the recent approach that exploits a single clustering to guide the selection of representative samples.
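The general idea of consensus clustering-based undersampling can be illustrated with a minimal sketch. The abstract does not specify the base clusterings, the consensus function, or the rule for selecting representatives, so everything below is an assumption made for illustration rather than the authors' method: a plain k-means run with several values of k, a co-association matrix thresholded at 0.5 to form consensus clusters, and selection of the instances nearest each consensus-cluster centroid. The names `kmeans_labels`, `coassociation`, and `consensus_undersample` are hypothetical.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means (assumed base clusterer); one label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def coassociation(X, ks, rng):
    """Fraction of base clusterings in which each pair of rows shares a cluster."""
    n = len(X)
    co = np.zeros((n, n))
    for k in ks:
        labels = kmeans_labels(X, k, rng)
        co += labels[:, None] == labels[None, :]
    return co / len(ks)

def consensus_undersample(X_major, n_keep, ks=(2, 3, 4), threshold=0.5, seed=0):
    """Return sorted indices of majority-class rows to keep (illustrative only)."""
    rng = np.random.default_rng(seed)
    co = coassociation(X_major, ks, rng)
    n = len(X_major)

    # Assumed consensus function: connected components of the graph that
    # links two rows when their co-association reaches the threshold.
    consensus = -np.ones(n, dtype=int)
    n_clusters = 0
    for start in range(n):
        if consensus[start] >= 0:
            continue
        consensus[start] = n_clusters
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((co[i] >= threshold) & (consensus < 0)):
                consensus[j] = n_clusters
                stack.append(j)
        n_clusters += 1

    # Assumed selection rule: each consensus cluster contributes a share of
    # n_keep proportional to its size (at least one row), taking the rows
    # nearest its centroid. The total is therefore only approximately n_keep.
    keep = []
    for c in range(n_clusters):
        idx = np.flatnonzero(consensus == c)
        quota = max(1, round(n_keep * len(idx) / n))
        centroid = X_major[idx].mean(axis=0)
        order = np.argsort(((X_major[idx] - centroid) ** 2).sum(axis=1))
        keep.extend(idx[order[:quota]])
    return np.array(sorted(keep))
```

In use, the returned indices would select a reduced majority-class subset, which is then combined with all minority-class instances before training a classifier. Because several base clusterings vote on which instances belong together, the selected representatives depend less on any single clustering run.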
Original language: English
Article number: 37382
Number of pages: 18
Journal: Scientific Reports
Volume: 15
Publication status: Published - 27 Oct 2025
