Mitigating Modality Discrepancies for RGB-T Semantic Segmentation

Shenlu Zhao, Yichen Liu, Qiang Jiao, Qiang Zhang, Jungong Han

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)


Semantic segmentation models gain robustness against adverse illumination conditions by taking advantage of complementary information from visible and thermal infrared (RGB-T) images. Despite the importance of multimodal fusion, most existing RGB-T semantic segmentation models directly adopt primitive fusion strategies, such as elementwise summation, to integrate multimodal features. Such strategies, unfortunately, overlook the modality discrepancies caused by inconsistent unimodal features obtained from two independent feature extractors, thus hindering the exploitation of cross-modal complementary information within the multimodal data. To this end, we propose a novel network for RGB-T semantic segmentation, i.e., MDRNet+, an improved version of our previous work ABMDRNet. The core of MDRNet+ is a brand-new idea, termed the strategy of bridging-then-fusing, which mitigates modality discrepancies before cross-modal feature fusion. Concretely, an improved Modality Discrepancy Reduction (MDR+) subnetwork is designed, which first extracts unimodal features and then reduces their modality discrepancies. Afterward, discriminative multimodal features for RGB-T semantic segmentation are adaptively selected and integrated via several channel-weighted fusion (CWF) modules. Furthermore, a multiscale spatial context (MSC) module and a multiscale channel context (MCC) module are presented to effectively capture contextual information. Finally, we assemble a challenging RGB-T semantic segmentation dataset, i.e., RTSS, for urban scene understanding, mitigating the lack of well-annotated training data. Comprehensive experiments demonstrate that our proposed model remarkably surpasses other state-of-the-art models on the MFNet, PST900, and RTSS datasets.
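The channel-weighted fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the specific weighting scheme shown here (global average pooling per channel, followed by a softmax across the two modalities) and the function name `channel_weighted_fusion` are assumptions made for illustration only.

```python
import numpy as np

def channel_weighted_fusion(feat_rgb, feat_thermal):
    """Fuse RGB and thermal feature maps of shape (C, H, W) with
    per-channel weights.

    Hypothetical sketch: global average pooling yields one descriptor
    per channel and modality; a softmax across the two modalities turns
    those descriptors into fusion weights that sum to 1 per channel.
    """
    # Global average pooling: one scalar per channel, shape (C,)
    gap_rgb = feat_rgb.mean(axis=(1, 2))
    gap_thermal = feat_thermal.mean(axis=(1, 2))

    # Numerically stable softmax over the two modalities, per channel
    stacked = np.stack([gap_rgb, gap_thermal])   # shape (2, C)
    exp = np.exp(stacked - stacked.max(axis=0))
    w = exp / exp.sum(axis=0)                    # weights, shape (2, C)

    # Weighted sum, broadcasting channel weights over spatial dims
    return (w[0][:, None, None] * feat_rgb
            + w[1][:, None, None] * feat_thermal)
```

When both modalities carry identical features, the weights reduce to 0.5 each and the fusion returns the input unchanged; when one modality's channel response dominates, its weight grows accordingly.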

Original language: English
Number of pages: 15
Journal: IEEE Transactions on Neural Networks and Learning Systems
Early online date: 06 Jan 2023
Publication status: E-pub ahead of print, 06 Jan 2023


  • Bridging-then-fusing
  • contextual information
  • Data mining
  • dataset
  • Feature extraction
  • Lighting
  • modality discrepancy reduction
  • RGB-T semantic segmentation
  • Semantic segmentation
  • Semantics
  • Sensor phenomena and characterization
  • Training
  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Software


