Temporal Saliency Query Network for Efficient Video Recognition

Boyang Xia, Zhihao Wang, Wenhao Wu, Haoran Wang, Jungong Han

Allbwn ymchwil: Pennod mewn Llyfr/Adroddiad/Trafodion CynhadleddTrafodion Cynhadledd (Nid-Cyfnodolyn fathau)

13 Dyfyniadau (Scopus)

Crynodeb

Efficient video recognition is a hot-spot research topic with the explosive growth of multimedia data on the Internet and mobile devices. Most existing methods select the salient frames without awareness of the class-specific saliency scores, which neglect the implicit association between the saliency of frames and its belonging category. To alleviate this issue, we devise a novel Temporal Saliency Query (TSQ) mechanism, which introduces class-specific information to provide fine-grained cues for saliency measurement. Specifically, we model the class-specific saliency measuring process as a query-response task. For each category, the common pattern of it is employed as a query and the most salient frames are responded to it. Then, the calculated similarities are adopted as the frame saliency scores. To achieve it, we propose a Temporal Saliency Query Network (TSQNet) that includes two instantiations of the TSQ mechanism based on visual appearance similarities and textual event-object relations. Afterward, cross-modality interactions are imposed to promote the information exchange between them. Finally, we use the class-specific saliencies of the most confident categories generated by two modalities to perform the selection of salient frames. Extensive experiments demonstrate the effectiveness of our method by achieving state-of-the-art results on ActivityNet, FCVID and Mini-Kinetics datasets. Our project page is at https://lawrencexia2008.github.io/projects/tsqnet.

Iaith wreiddiolSaesneg
TeitlComputer Vision – ECCV 2022 - 17th European Conference, Proceedings
Is-deitl17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXIV
GolygyddionShai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Tudalennau741-759
Nifer y tudalennau19
Dynodwyr Gwrthrych Digidol (DOIs)
StatwsCyhoeddwyd - 22 Hyd 2022

Cyfres gyhoeddiadau

EnwLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Cyfrol13694 LNCS
ISSN (Argraffiad)0302-9743
ISSN (Electronig)1611-3349

Ôl bys

Gweld gwybodaeth am bynciau ymchwil 'Temporal Saliency Query Network for Efficient Video Recognition'. Gyda’i gilydd, maen nhw’n ffurfio ôl bys unigryw.

Dyfynnu hyn