TY - JOUR
T1 - DGIG-Net: Dynamic Graph-in-Graph Networks for Few-Shot Human-Object Interaction
AU - Liu, Xiyao
AU - Ji, Zhong
AU - Pang, Yanwei
AU - Han, Jungong
AU - Li, Xuelong
N1 - This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61771329 and 61632018, and the Natural Science Foundation of Tianjin under Grant 19JCYBJC16000. X. Liu, Z. Ji*(corresponding author), and Y. Pang are with the School of Electrical and Information Engineering, Tianjin University, and Tianjin Key Laboratory of Brain-inspired Intelligence Technology, Tianjin 300072, China (e-mails: [email protected]; [email protected]; [email protected]).
N1 - Publisher Copyright: IEEE
PY - 2022/8/1
Y1 - 2022/8/1
AB - Few-shot learning (FSL) for human-object interaction (HOI) aims to recognize various relationships between human actions and surrounding objects from only a few samples. It is a challenging vision task: the diversity and interactivity of human actions make it difficult to learn an adaptive classifier that captures ambiguous interclass information, so traditional FSL methods usually perform unsatisfactorily in complex HOI scenes. To this end, we propose dynamic graph-in-graph networks (DGIG-Net), a novel graph-prototype framework that learns a dynamic metric space for few-shot HOI by embedding a visual subgraph into a task-oriented cross-modal graph. Specifically, we first build a knowledge reconstruction graph that learns latent representations for HOI categories by reconstructing the relationships among visual features, generating visual representations under the category distribution of each task. A dynamic relation graph then integrates the reconstructed visual nodes with dynamic task-oriented semantic information to explore a graph metric space for HOI class prototypes, exploiting discriminative information from the similarities among actions or objects. We validate DGIG-Net on multiple benchmark datasets, on which it substantially outperforms existing FSL approaches and achieves state-of-the-art results.
KW - Dynamic graph
KW - few-shot learning (FSL)
KW - graph convolutional network (GCN)
KW - human-object interaction (HOI)
KW - metalearning
UR - http://www.scopus.com/inward/record.url?scp=85100846515&partnerID=8YFLogxK
U2 - 10.1109/TCYB.2021.3049537
DO - 10.1109/TCYB.2021.3049537
M3 - Article
C2 - 33566778
AN - SCOPUS:85100846515
SN - 2168-2267
VL - 52
SP - 7852
EP - 7864
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
IS - 8
ER -