Abstract
Recent years have witnessed the prevalence of Siamese networks in RGB-T tracking, owing to their remarkable success in RGB object tracking. Despite running faster than real time, existing RGB-T Siamese trackers suffer from low accuracy and poor robustness compared with other state-of-the-art RGB-T trackers. To address these issues, this paper develops a new complementarity- and distractor-aware RGB-T tracker based on the Siamese network, referred to as SiamCDA. To this end, several modules are presented. First, a feature pyramid network (FPN) is incorporated into the Siamese network to capture cross-level information within the unimodal features extracted from the RGB or thermal images. Next, a complementarity-aware multi-modal feature fusion module (CA-MF) is specially designed to capture cross-modal information between RGB features and thermal features. Finally, in the bounding-box selection phase, a distractor-aware region proposal selection module (DAS) further enhances the robustness of the tracker. On top of these technical modules, we also build a large-scale, diverse synthetic RGB-T tracking dataset containing more than 4831 pairs of synthetic RGB-T videos and 12K synthetic RGB-T images. Extensive experiments on three RGB-T tracking benchmark datasets demonstrate the outstanding performance of the proposed tracker at a tracking speed of over 37 frames per second (FPS).
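The core idea behind complementarity-aware fusion is to weight each modality by how reliable its features are before combining them, so the stronger modality dominates without discarding the weaker one. The sketch below illustrates that idea in a minimal, hypothetical form: the reliability scores here are simple activation-energy statistics, whereas the paper's CA-MF learns its weights with convolutional layers; the function name and statistic are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def complementarity_aware_fusion(feat_rgb, feat_t):
    """Fuse RGB and thermal feature maps with per-modality weights.

    Illustrative sketch only: the weights are derived from a simple
    mean-squared-activation statistic, not the learned CA-MF module.
    """
    # Per-modality "reliability" score: mean squared activation.
    e_rgb = np.mean(feat_rgb ** 2)
    e_t = np.mean(feat_t ** 2)
    # Softmax over the two scores yields complementary weights summing to 1.
    scores = np.array([e_rgb, e_t])
    w = np.exp(scores) / np.sum(np.exp(scores))
    # Weighted sum: the stronger modality dominates, but neither is discarded.
    return w[0] * feat_rgb + w[1] * feat_t

# Toy feature maps (channels, height, width): informative RGB, flat thermal.
rgb = np.ones((4, 8, 8))
thermal = np.zeros((4, 8, 8))
fused = complementarity_aware_fusion(rgb, thermal)
print(fused.shape)  # (4, 8, 8)
```

In this toy case the RGB map carries all the energy, so the fused map stays close to the RGB features while still retaining a small contribution from the thermal branch.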
Original language | English
---|---
Pages (from-to) | 1403-1417
Number of pages | 15
Journal | IEEE Transactions on Circuits and Systems for Video Technology
Volume | 32
Issue number | 3
Early online date | 09 Apr 2021
DOIs |
Publication status | Published - 01 Mar 2022
Keywords
- RGB-T tracking
- complementarity-aware fusion
- distractor-aware region proposal selection
- large-scale synthetic dataset
- Siamese network