Transformer-based hierarchical dynamic decoders for salient object detection

Qingping Zheng, Ling Zheng, Jiankang Deng, Ying Li*, Changjing Shang, Qiang Shen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)
19 Downloads (Pure)

Abstract

Global context and global contrast are crucial cues for Salient Object Detection (SOD) in images. Most advanced SOD methods exploit CNN-based architectures and achieve impressive results. However, these methods are intrinsically limited in capturing long-range global information, since a CNN extracts features within local sliding windows. In contrast, transformers use a self-attention mechanism to extract features, which gives them a powerful capability for learning global cues. Nonetheless, a pure transformer-based network incurs a large computational overhead and easily suffers from attention collapse as it goes deeper. To address these issues, we propose a Transformer-based Hierarchical Dynamic Decoder (T-HDDNet) for image salient object detection. Specifically, T-HDDNet employs a transformer to encode each image patch into multi-level, multi-resolution features based on the long-range dependencies among pixels. To obtain an accurate, high-resolution saliency map, we develop a dynamic dual upsampling mechanism that enlarges the spatial size of features in a data-driven manner, together with a dynamic feature fusion unit. Finally, hierarchical dynamic decoders built from these two units produce the final saliency map progressively. Extensive experimental results show that the proposed method achieves the best performance on all benchmarks in comparison with state-of-the-art methods.
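The contrast the abstract draws between local sliding windows and self-attention can be illustrated with a minimal sketch. It is not the T-HDDNet encoder: it assumes a single attention head with identity query/key/value projections (no learned weights), and the names `softmax`, `self_attention`, and `patches` are hypothetical, chosen only for this illustration. It shows that every patch receives a non-zero weight from every other patch, so two distant but similar patches can influence each other directly.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections, so queries, keys and values are the
    input tokens themselves. `tokens` is a list of equal-length feature
    vectors, one per image patch. Returns (outputs, weights), where
    weights[i][j] is how strongly patch i attends to patch j.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Compare this patch against every patch in the image, near or far.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all patch features.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        )
    return outputs, weights


# Four toy patch features: patches 0 and 3 are far apart but similar.
patches = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(patches)
```

In this toy input, patch 0 attends to the distant patch 3 exactly as strongly as to itself, and more strongly than to its neighbour patch 1; a convolution with a small window would never relate patches 0 and 3 in a single layer.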

Original language: English
Article number: 111075
Number of pages: 11
Journal: Knowledge-Based Systems
Volume: 282
Early online date: 30 Oct 2023
Publication status: Published - 20 Dec 2023
