TY - JOUR
T1 - Employing Bilinear Fusion and Saliency Prior Information for RGB-D Salient Object Detection
AU - Huang, Nianchang
AU - Yang, Yang
AU - Zhang, Dingwen
AU - Zhang, Qiang
AU - Han, Jungong
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - Multi-modal feature fusion and saliency reasoning are two core sub-tasks of RGB-D salient object detection. However, most existing models employ linear fusion strategies (e.g., concatenation) for multi-modal feature fusion and use a simple coarse-to-fine structure for saliency reasoning. Despite their simplicity, such designs can neither fully capture the cross-modal complementary information nor exploit the multi-level complementary information among the cross-modal features at different levels. To address these issues, a novel RGB-D salient object detection model is presented, with special attention paid to the two aforementioned sub-tasks. Concretely, a multi-modal feature interaction module is first presented to explore richer interactions between the unimodal RGB and depth features; it captures their cross-modal complementary information by jointly employing simple linear fusion strategies and bilinear ones. Then, a saliency prior information guided fusion module is presented to exploit the multi-level complementary information among the fused cross-modal features at different levels. Finally, instead of employing a simple convolutional layer for the final saliency prediction, a saliency refinement and prediction module is designed to better exploit the extracted multi-level cross-modal information for RGB-D saliency detection. Experimental results on several benchmark datasets verify the effectiveness and superiority of the proposed framework over state-of-the-art methods.
AB - Multi-modal feature fusion and saliency reasoning are two core sub-tasks of RGB-D salient object detection. However, most existing models employ linear fusion strategies (e.g., concatenation) for multi-modal feature fusion and use a simple coarse-to-fine structure for saliency reasoning. Despite their simplicity, such designs can neither fully capture the cross-modal complementary information nor exploit the multi-level complementary information among the cross-modal features at different levels. To address these issues, a novel RGB-D salient object detection model is presented, with special attention paid to the two aforementioned sub-tasks. Concretely, a multi-modal feature interaction module is first presented to explore richer interactions between the unimodal RGB and depth features; it captures their cross-modal complementary information by jointly employing simple linear fusion strategies and bilinear ones. Then, a saliency prior information guided fusion module is presented to exploit the multi-level complementary information among the fused cross-modal features at different levels. Finally, instead of employing a simple convolutional layer for the final saliency prediction, a saliency refinement and prediction module is designed to better exploit the extracted multi-level cross-modal information for RGB-D saliency detection. Experimental results on several benchmark datasets verify the effectiveness and superiority of the proposed framework over state-of-the-art methods.
KW - Bilinear fusion strategy
KW - RGB-D salient object detection
KW - saliency prior information guided fusion
KW - saliency refinement and prediction
UR - http://www.scopus.com/inward/record.url?scp=85103782143&partnerID=8YFLogxK
U2 - 10.1109/TMM.2021.3069297
DO - 10.1109/TMM.2021.3069297
M3 - Article
AN - SCOPUS:85103782143
SN - 1520-9210
VL - 24
SP - 1651
EP - 1664
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -
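
The abstract's headline technique is pairing a linear fusion strategy (concatenation) with a bilinear one to fuse unimodal RGB and depth features. Below is a minimal PyTorch sketch of that general idea, not the authors' implementation: the module name, layer choices, and channel sizes (MultiModalFusionSketch, linear_fuse, bilinear_fuse) are all illustrative assumptions.

import torch
import torch.nn as nn

class MultiModalFusionSketch(nn.Module):
    # Sketch of jointly using linear (concatenation) and bilinear fusion
    # of RGB and depth features; layer sizes are assumptions for illustration.
    def __init__(self, channels: int):
        super().__init__()
        # Linear fusion: concatenate the two modalities, project back to C channels.
        self.linear_fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Bilinear fusion: model second-order (multiplicative) interactions
        # between RGB and depth features at each spatial location.
        self.bilinear_fuse = nn.Bilinear(channels, channels, channels)
        # Combine both fusion results into one cross-modal feature map.
        self.out = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, depth_feat: (B, C, H, W) unimodal features from a backbone.
        lin = self.linear_fuse(torch.cat([rgb_feat, depth_feat], dim=1))
        # nn.Bilinear operates on the last dimension, so flatten spatial positions.
        b, c, h, w = rgb_feat.shape
        r = rgb_feat.permute(0, 2, 3, 1).reshape(-1, c)
        d = depth_feat.permute(0, 2, 3, 1).reshape(-1, c)
        bil = self.bilinear_fuse(r, d).reshape(b, h, w, c).permute(0, 3, 1, 2)
        # Jointly exploit the linear and bilinear cues.
        return self.out(torch.cat([lin, bil], dim=1))

if __name__ == "__main__":
    fuse = MultiModalFusionSketch(channels=32)
    rgb = torch.randn(2, 32, 16, 16)
    depth = torch.randn(2, 32, 16, 16)
    print(fuse(rgb, depth).shape)  # torch.Size([2, 32, 16, 16])

The design point the sketch illustrates is the one the abstract argues: concatenation alone captures only additive (first-order) combinations of the two modalities, while the bilinear term adds multiplicative cross-modal interactions, so using both yields a richer cross-modal complementary feature.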