Semantic-Aware Real-Time Correlation Tracking Framework for UAV Videos

Xizhe Xue, Ying Li, Xiaoyue Yin, Changjing Shang, Taoxin Peng, Qiang Shen

Research output: Contribution to journal › Article › peer-review

9 Citations (SciVal)
247 Downloads (Pure)


Discriminative correlation filter (DCF) trackers have contributed tremendously to object tracking thanks to their high computational efficiency. However, their performance degrades in unmanned aerial vehicle (UAV) tracking. This article presents a novel semantic-aware real-time correlation tracking framework (SARCT) for UAV videos that enhances the performance of DCF trackers without incurring excessive computing cost. Specifically, SARCT first constructs an additional detection module to generate region-of-interest (ROI) proposals and to filter out any response from target-irrelevant areas. Then, a novel semantic segmentation module, based on semantic template generation and semantic coefficient prediction, is introduced to capture semantic information and provide a precise ROI mask, thereby effectively suppressing background interference in the ROI proposals. By sharing features and specific network layers between object detection and semantic segmentation, SARCT reduces parameter redundancy and attains sufficient speed for real-time applications. Systematic experiments are conducted on three typical aerial datasets to evaluate the performance of the proposed SARCT. The results demonstrate that SARCT significantly improves the accuracy of conventional DCF-based trackers, outperforming state-of-the-art deep trackers.
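
The two suppression steps described in the abstract can be illustrated with a toy example. This is a minimal sketch and not the authors' implementation: the function names `suppress_response` and `peak`, the half-open box convention, the binary mask, and all numbers are assumptions made for illustration only. It assumes a correlation response map is already available, zeroes every response outside a detected ROI box, and then weights the remaining responses by a semantic mask so that background pixels inside the box no longer compete with the target.

```python
from typing import List, Tuple

Grid = List[List[float]]


def suppress_response(response: Grid, roi: Tuple[int, int, int, int], mask: Grid) -> Grid:
    """Zero the response outside the ROI box and weight it by the mask inside.

    roi is a hypothetical (top, left, bottom, right) box, half-open:
    rows [top, bottom) and columns [left, right).
    """
    top, left, bottom, right = roi
    out = []
    for r, row in enumerate(response):
        out_row = []
        for c, value in enumerate(row):
            inside = top <= r < bottom and left <= c < right
            out_row.append(value * mask[r][c] if inside else 0.0)
        out.append(out_row)
    return out


def peak(response: Grid) -> Tuple[int, int]:
    """Return (row, col) of the largest response value."""
    best = max((value, r, c) for r, row in enumerate(response) for c, value in enumerate(row))
    return best[1], best[2]


# Made-up 4x4 response map: a distractor at (0, 3) outscores the target.
response = [
    [0.1, 0.2, 0.1, 0.9],
    [0.1, 0.6, 0.8, 0.2],
    [0.1, 0.7, 0.5, 0.1],
    [0.0, 0.1, 0.1, 0.1],
]
roi = (1, 1, 3, 3)  # assumed detection box covering rows 1-2, columns 1-2
mask = [            # assumed semantic mask: 1.0 = target pixel, 0.0 = background
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

filtered = suppress_response(response, roi, mask)
print(peak(response))  # (0, 3): the distractor wins on the raw map
print(peak(filtered))  # (2, 1): the peak moves onto the masked target
```

In this toy case the ROI box alone would remove the distractor at (0, 3) but leave the background response at (1, 2) as the peak; the mask is what moves the peak onto a target pixel.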
Original language: English
Pages (from-to): 2418-2429
Number of pages: 12
Journal: IEEE Transactions on Cybernetics
Issue number: 4
Early online date: 23 Jul 2020
Publication status: Published - 01 Apr 2022


  • Detection proposals
  • Discriminative correlation filter (DCF)
  • Semantic information
  • Unmanned aerial vehicle (UAV) tracking

