Joint Cross-Modal and Unimodal Features for RGB-D Salient Object Detection

Nianchang Huang, Yi Liu, Qiang Zhang, Jungong Han

Research output: Contribution to journal › Article › peer-review



RGB-D salient object detection is one of the basic tasks in computer vision. Most existing models focus on investigating efficient ways of fusing the complementary information from RGB and depth images for better saliency detection. However, in many real-life cases, where one of the input images has poor visual quality or one modality alone already contains abundant saliency cues, fusing cross-modal features does not improve detection accuracy compared to using unimodal features only. In view of this, a novel RGB-D salient object detection model is proposed that simultaneously exploits the cross-modal features from the RGB-D image pairs and the unimodal features from the input RGB and depth images for saliency detection. To this end, a Multi-branch Feature Fusion Module is presented to effectively capture the cross-level and cross-modal complementary information between RGB-D images, as well as the cross-level unimodal features from the RGB images and the depth images separately. On top of that, a Feature Selection Module is designed to adaptively select the most discriminative features for the final saliency prediction from the fused cross-modal features and the unimodal features. Extensive evaluations on four benchmark datasets demonstrate that the proposed model outperforms state-of-the-art approaches by a large margin.
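The abstract does not specify how the Feature Selection Module weighs the unimodal branches against the fused branch. As a minimal, hypothetical sketch of the general idea — scoring each branch's features and softly gating between them — the following NumPy snippet uses global-average scores with a softmax gate; the function name, scoring rule, and tensor shapes are illustrative assumptions, and the actual module presumably learns its selection weights end to end.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def select_features(f_rgb, f_depth, f_fused):
    """Hypothetical feature-selection sketch (not the paper's module).

    Each input is a (C, H, W) feature map: unimodal RGB, unimodal depth,
    and fused cross-modal features. Each branch is reduced to a scalar
    score by global average pooling, the scores are softmax-normalized,
    and the output is the weight-blended feature map.
    """
    branches = np.stack([f_rgb, f_depth, f_fused])   # (3, C, H, W)
    scores = branches.mean(axis=(1, 2, 3))           # one crude score per branch
    weights = softmax(scores)                        # branch weights sum to 1
    out = (weights[:, None, None, None] * branches).sum(axis=0)
    return out, weights
```

In this toy form a branch with larger average activation dominates the blend; a learned module would instead predict the gate from the features themselves, but the gating structure is the same.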
Original language: English
Article number: 9147051
Pages (from-to): 2428-2441
Number of pages: 14
Journal: IEEE Transactions on Multimedia
Publication status: Published - 24 Jul 2020


Keywords:
  • RGB-D
  • multi-branch feature fusion and feature selection
  • saliency detection


