DGIG-Net: Dynamic Graph-in-Graph Networks for Few-Shot Human-Object Interaction

Xiyao Liu, Zhong Ji, Yanwei Pang, Jungong Han, Xuelong Li

Research output: Contribution to journal › Article › peer-review

19 Citations (Scopus)
210 Downloads (Pure)

Abstract

Few-shot learning (FSL) for human-object interaction (HOI) aims to recognize various relationships between human actions and surrounding objects from only a few samples. It is a challenging vision task: the diversity and interactivity of human actions make it difficult to learn an adaptive classifier that captures ambiguous interclass information, so traditional FSL methods usually perform unsatisfactorily in complex HOI scenes. To this end, we propose the dynamic graph-in-graph network (DGIG-Net), a novel graph-prototype framework that learns a dynamic metric space by embedding a visual subgraph into a task-oriented cross-modal graph for few-shot HOI. Specifically, we first build a knowledge reconstruction graph that learns latent representations for HOI categories by reconstructing the relationships among visual features, generating visual representations under the category distribution of each task. A dynamic relation graph then integrates the reconstructible visual nodes with dynamic task-oriented semantic information to explore a graph metric space for HOI class prototypes, exploiting discriminative information from the similarities among actions or objects. We validate DGIG-Net on multiple benchmark datasets, on which it largely outperforms existing FSL approaches and achieves state-of-the-art results.
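To make the general idea concrete, the following is a minimal sketch (not the paper's actual DGIG-Net architecture) of a graph-based prototype metric for few-shot classification: support and query features are connected by a similarity graph, refined with one GCN-style propagation step, and queries are assigned to the nearest class-mean prototype. All function names and the cosine-similarity adjacency are illustrative assumptions.

```python
import numpy as np

def cosine_adjacency(feats):
    # Hypothetical choice: pairwise cosine similarity as edge weights.
    norm = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    adj = norm @ norm.T
    np.fill_diagonal(adj, 1.0)  # self-loops
    return adj

def gcn_propagate(feats, adj):
    # One symmetric-normalized propagation step: D^{-1/2} A D^{-1/2} X.
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return norm_adj @ feats

def classify(support, support_labels, query):
    # Refine support and query features jointly on one graph, then
    # match each query to the nearest class-mean prototype.
    feats = np.vstack([support, query])
    refined = gcn_propagate(feats, cosine_adjacency(feats))
    sup_ref, qry_ref = refined[: len(support)], refined[len(support):]
    classes = np.unique(support_labels)
    protos = np.stack([sup_ref[support_labels == c].mean(0) for c in classes])
    dists = ((qry_ref[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return classes[dists.argmin(1)]

# 2-way 1-shot toy episode with well-separated classes.
support = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
labels = np.array([0, 1])
query = np.array([[0.9, 0.1, 0.0], [0.1, 0.95, 0.0]])
pred = classify(support, labels, query)
```

The sketch omits what DGIG-Net actually contributes (the knowledge reconstruction graph and the task-oriented semantic nodes); it only illustrates the shared backbone idea of computing class prototypes in a graph-refined metric space.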

Original language: English
Pages (from-to): 7852-7864
Number of pages: 13
Journal: IEEE Transactions on Cybernetics
Volume: 52
Issue number: 8
Early online date: 10 Feb 2021
DOIs
Publication status: Published - 01 Aug 2022

Keywords

  • Dynamic graph
  • few-shot learning (FSL)
  • graph convolutional network (GCN)
  • human-object interaction (HOI)
  • metalearning
