Abstract
Hyperspectral Image (HSI) cross-scene classification is a challenging task in remote sensing, particularly when Target Domain (TD) HSI must be processed in real time and the data cannot be reused for training. While deep learning methods have shown promising results, the generalization ability of HSI representations remains limited, mainly due to class label imbalance. This paper introduces a dual-stage learning framework based on transfer learning to enhance classification accuracy in the TD. The framework includes a self-supervised learning stage and a supervised fine-tuning stage. The self-supervised stage focuses on learning robust representations by leveraging inherent structures within HSI data, while the fine-tuning stage uses training labels to extract semantic information. A masked diffusion model predicts masked tokens from unmasked ones, capturing both high-level structures and fine details in HSI data. An efficient spatiospectral Transformer, which removes self-attention from the decoder, is proposed to enhance the self-supervised process. This design allows mask tokens to obtain information from visible tokens without interacting with each other, reducing sequence length and computational costs. Because each mask token is decoded conditionally independently, only a subset of masked tokens needs to be processed. Extensive experiments on two public HSI datasets demonstrate that the proposed method outperforms state-of-the-art techniques.
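The decoder idea described in the abstract (mask tokens attend to visible tokens but not to each other) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes single-head attention with no learned projections, uses random vectors in place of real HSI token embeddings, and the names `cross_attend` and `decode_masked` are hypothetical. It only shows why decoding a subset of mask tokens gives the same result for those tokens as decoding all of them.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, visible):
    """Single-head cross-attention of one mask-token query over visible tokens.

    Assumption: visible tokens serve directly as keys and values
    (no learned projection matrices), purely for illustration.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, tok)) / math.sqrt(d)
        for tok in visible
    ]
    weights = softmax(scores)
    return [sum(w * tok[i] for w, tok in zip(weights, visible)) for i in range(d)]


def decode_masked(mask_queries, visible):
    """Decode each mask token independently; there is no self-attention
    among mask tokens, so no query ever sees another query."""
    return [cross_attend(q, visible) for q in mask_queries]


rng = random.Random(0)
dim = 4
# Hypothetical stand-ins for encoded visible tokens and mask-token queries.
visible = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
mask_queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]

# Decode every mask token, then only a subset of them.
full = decode_masked(mask_queries, visible)
subset_idx = [0, 3]
partial = decode_masked([mask_queries[i] for i in subset_idx], visible)
```

In this sketch the cost of decoding scales with the number of mask tokens actually decoded times the number of visible tokens, which is the intuition behind processing only a subset of masked tokens.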
Original language | English |
---|---|
Title of host publication | 2025 IEEE International Conference on Acoustics, Speech and Signal Processing |
Publisher | IEEE Press |
Publication status | Accepted/In press - 20 Dec 2024 |
Event | 2025 IEEE International Conference on Acoustics, Speech and Signal Processing - Hyderabad, India. Duration: 06 Apr 2025 → 11 Apr 2025 |
Conference
Conference | 2025 IEEE International Conference on Acoustics, Speech and Signal Processing |
---|---|
Country/Territory | India |
City | Hyderabad |
Period | 06 Apr 2025 → 11 Apr 2025 |