HyperDiff: Masked Diffusion Model with High-efficient Transformer for Hyperspectral Image Cross-Scene Classification

Pei Zhang, Dong Wang, Chanyue Wu, Jing Yang, Lei Kang, Zongwen Bai, Ying Li, Qiang Shen

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding (Non-Journal item)

Abstract

Hyperspectral Image (HSI) cross-scene classification is a challenging task in remote sensing, particularly when real-time processing of Target Domain (TD) HSI is required and the data cannot be reused for training. While deep learning methods have shown promising results, the generalization ability of HSI representations remains limited, mainly due to class label imbalance. This paper introduces a dual-stage learning framework based on transfer learning to enhance classification accuracy in the TD. The framework includes a self-supervised learning stage and a supervised fine-tuning stage. The self-supervised stage focuses on learning robust representations by leveraging inherent structures within HSI data, while the fine-tuning stage uses training labels to extract semantic information. A masked diffusion model predicts masked tokens from unmasked ones, capturing both high-level structures and fine details in HSI data. An efficient spatiospectral Transformer, which removes self-attention from the decoder, is proposed to enhance the self-supervised process. This design allows mask tokens to obtain information from visible tokens without interacting with each other, reducing sequence length and computational costs. Because each mask token is decoded conditionally independently, only a subset of masked tokens needs to be processed. Extensive experiments on two public HSI datasets demonstrate that the proposed method outperforms state-of-the-art techniques.
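
The decoder property described in the abstract (mask tokens read from visible tokens but not from each other) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head with identity query, key, and value projections, and the names `cross_attention_decode`, `visible`, `queries`, and `subset_ids` are hypothetical. It only shows why decoding a subset of mask tokens gives the same outputs for those tokens as decoding all of them.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_decode(mask_queries, visible_tokens):
    """Decode each mask query by attending to visible tokens only.

    Assumption: identity projections and a single head, for illustration.
    No attention is computed among mask queries, so each output depends
    only on its own query and on the visible tokens.
    """
    dim = len(visible_tokens[0])
    outputs = []
    for query in mask_queries:
        # Scaled dot-product score between this query and every visible token.
        scores = [
            sum(q * k for q, k in zip(query, token)) / math.sqrt(dim)
            for token in visible_tokens
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible tokens.
        outputs.append([
            sum(w * token[d] for w, token in zip(weights, visible_tokens))
            for d in range(dim)
        ])
    return outputs


# Hypothetical toy sizes: 5 visible tokens, 12 mask tokens, 8 features each.
rng = random.Random(0)
dim, num_visible, num_masked = 8, 5, 12
visible = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_visible)]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_masked)]

# Decode every mask token, then decode only a chosen subset.
decoded_all = cross_attention_decode(queries, visible)
subset_ids = [1, 4, 9]
decoded_subset = cross_attention_decode([queries[i] for i in subset_ids], visible)
```

In this sketch the cost of decoding scales with the number of mask queries actually processed times the number of visible tokens, which is the intuition behind the reduced sequence length mentioned in the abstract.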
Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech and Signal Processing
Publisher: IEEE Press
Publication status: Accepted/In press - 20 Dec 2024
Event: 2025 IEEE International Conference on Acoustics, Speech and Signal Processing - Hyderabad, India
Duration: 06 Apr 2025 → 11 Apr 2025

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech and Signal Processing
Country/Territory: India
City: Hyderabad
Period: 06 Apr 2025 → 11 Apr 2025
