Progressively real-time video salient object detection via cascaded fully convolutional networks with motion attention

Qingping Zheng, Ying Li, Ling Zheng, Qiang Shen

Research output: Contribution to journal, Article, peer-review

12 Citations (SciVal)
111 Downloads (Pure)


Semantics and motion are two essential cues for video salient object detection. Most existing deep-learning-based approaches extract semantic features with a single fully convolutional network built from simply stacked encoders. They model the motion patterns of video objects by feeding two consecutive frames simultaneously into a convolutional LSTM network or a weight-sharing fully convolutional network. However, such approaches either produce a coarse saliency map or incur significant computational overhead. In this paper, we present a novel approach based on cascaded fully convolutional networks with motion attention (abbreviated as CFCN-MA) to achieve real-time saliency detection in videos. Our key idea is to construct two cascaded fully convolutional networks that derive a saliency map from coarse to fine. We devise an optical flow-based motion attention mechanism to improve the prediction accuracy of the initial fully convolutional network, using the popular FlowNet2-SD model, which is efficient and effective at recognising the motion patterns of distinctive objects in videos. In this way, the method obtains a fine saliency map with a refined region of interest. Moreover, we propose an attention-guided intersection-over-union loss (abbreviated as AIoU) to supervise the CFCN-MA model in learning saliency maps with both clear edges and complete structure. Our approach is evaluated on three popular benchmark datasets, namely DAVIS, ViSal and FBMS. Experimental results demonstrate that our method outperforms many state-of-the-art techniques while meeting the real-time demand at 27 fps.
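The abstract does not give the formula for the AIoU loss, so the following is only an illustrative sketch of the general idea of an attention-weighted soft IoU loss, not the authors' definition. The function names, the `alpha` parameter, and the weighting rule `1 + alpha * attention` are all assumptions introduced here; saliency maps are represented as flat lists of values in [0, 1] to keep the example dependency-free.

```python
def weighted_soft_iou_loss(pred, target, weight=None, eps=1e-7):
    """Return 1 - weighted soft IoU for flat lists of values in [0, 1].

    Generic soft IoU, not the paper's AIoU: the intersection is the sum of
    p * t and the union is the sum of p + t - p * t, each scaled per pixel.
    """
    if weight is None:
        weight = [1.0] * len(pred)
    if not (len(pred) == len(target) == len(weight)):
        raise ValueError("pred, target and weight must have the same length")
    inter = sum(w * p * t for p, t, w in zip(pred, target, weight))
    union = sum(w * (p + t - p * t) for p, t, w in zip(pred, target, weight))
    return 1.0 - (inter + eps) / (union + eps)


def attention_guided_iou_loss(pred, target, attention, alpha=1.0):
    """Hypothetical attention-guided variant (an assumption, see lead-in).

    Pixels with higher attention get a larger weight, 1 + alpha * attention,
    so errors inside attended regions are penalised more heavily.
    """
    weight = [1.0 + alpha * a for a in attention]
    return weighted_soft_iou_loss(pred, target, weight)


# Toy 4-pixel example: pixel 1 is a missed salient pixel, pixel 3 a false alarm.
pred = [1.0, 0.0, 1.0, 1.0]
target = [1.0, 1.0, 1.0, 0.0]

uniform = weighted_soft_iou_loss(pred, target)
# Attention placed on the missed pixel raises the loss.
attended = attention_guided_iou_loss(pred, target, attention=[0.0, 1.0, 0.0, 0.0])
print(uniform, attended)
```

In this toy case, putting attention on the missed pixel increases the loss relative to uniform weighting, which is the qualitative behaviour one would expect from a loss meant to encourage complete structure in attended regions.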

Original language: English
Pages (from-to): 465-475
Number of pages: 11
Early online date: 19 Oct 2021
Publication status: Published - 07 Jan 2022


  • Cascaded fully convolutional networks
  • Motion attention
  • Optical flow
  • Video salient object detection

