Estimation of missing values in astronomical survey data: An improved local approach using cluster directed neighbor selection

Phimmarin Keerin, Tossapon Boongoen*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

The work presented in this paper aims to develop new imputation methods that better handle missing values encountered in astronomical data analysis, especially the classification of transient events in a sky survey from the Gravitational wave Optical Transient Observatory (GOTO) project. In particular, the framework of cluster-directed selection of neighbors, which has proven effective for the benchmark local imputation techniques KNNimpute and LLSimpute, is extended to new multi-stage models. The proposed models, namely Iterative-CKNN and Iterative-CLLS, are novel, with an original application to the analysis of sky survey data. They exploit the advantages of both local approaches, in which estimates are summarized from neighbors in the same data cluster, within an iterative process that refines previous guesses. Based on experiments with simulated datasets corresponding to different survey sizes and missing ratios between 1% and 20%, they usually outperform the baseline models and Bayesian Principal Component Analysis (BPCA), a well-known global technique. For instance, at a 10% missing rate, Iterative-CLLS appears to be the most accurate with an NRMSE score of 0.190, while BPCA and the best of its baseline models reach 0.351 and 0.249, respectively. In terms of practical implications, these methods have proven effective for classifying transients using common algorithms such as KNN, Naive Bayes and Random Forest.
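The general idea described in the abstract (restrict neighbor search to the same data cluster, then repeat the estimate to refine earlier guesses) can be illustrated with a minimal sketch. This is not the authors' Iterative-CKNN implementation: the function names, the mean-based initial guess, the unweighted neighbor average, the fixed cluster labels supplied by the caller, and the NRMSE definition (RMSE divided by the standard deviation of the true values, one common convention) are all assumptions made for illustration.

```python
import math

def nrmse(true_vals, est_vals):
    """RMSE divided by the standard deviation of the true values (assumed convention)."""
    n = len(true_vals)
    mean_true = sum(true_vals) / n
    mse = sum((t - e) ** 2 for t, e in zip(true_vals, est_vals)) / n
    var = sum((t - mean_true) ** 2 for t in true_vals) / n
    return math.sqrt(mse / var)

def iterative_cluster_knn_impute(rows, labels, k=2, n_iter=3):
    """Hypothetical helper. rows: list of lists with None marking a missing entry.
    labels: one cluster id per row, assumed to come from any clustering step.
    Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    missing = [[v is None for v in row] for row in rows]

    # Initial guess: fill each missing entry with the column mean of observed values.
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(sum(observed) / len(observed))
    filled = [[col_means[j] if v is None else v for j, v in enumerate(row)]
              for row in rows]

    for _ in range(n_iter):
        updated = [row[:] for row in filled]
        for i, row in enumerate(filled):
            if not any(missing[i]):
                continue
            # Candidate neighbors are limited to rows in the same cluster.
            candidates = [r for r in range(len(filled))
                          if r != i and labels[r] == labels[i]]
            if not candidates:
                continue  # keep the previous guess if the cluster has no other rows
            # Distance uses only the columns that were observed in row i.
            obs_cols = [j for j in range(n_cols) if not missing[i][j]]

            def dist(r):
                return math.sqrt(sum((row[j] - filled[r][j]) ** 2 for j in obs_cols))

            nearest = sorted(candidates, key=dist)[:k]
            # Refine each missing entry as the plain average over the nearest rows.
            for j in range(n_cols):
                if missing[i][j]:
                    updated[i][j] = sum(filled[r][j] for r in nearest) / len(nearest)
        filled = updated
    return filled

# Toy data: two well-separated clusters, one missing value in each.
rows = [[1.0, 2.0], [1.2, None], [0.9, 1.8],
        [10.0, 20.0], [10.5, None], [9.5, 19.0]]
labels = [0, 0, 0, 1, 1, 1]
completed = iterative_cluster_knn_impute(rows, labels, k=2, n_iter=3)
```

In this toy example each missing entry is estimated only from rows in its own cluster, so the estimate for the second cluster is not pulled toward the much smaller values of the first, as a global column mean would be. A locally weighted or least-squares estimate (in the spirit of LLSimpute) could replace the plain average, but that variant is not shown here.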
Original language: English
Article number: 102881
Journal: Information Processing and Management
Volume: 59
Issue number: 2
Early online date: 03 Feb 2022
Publication status: Published - 01 Mar 2022

Keywords

  • Astronomy
  • Clustering
  • Imputation
  • Missing value
  • Sky survey
