Attentional Composition Networks for Long-Tailed Human Action Recognition
Wang, Haoran [5]; Wang, Yajie [5]; Yu, Baosheng [4]; Zhan, Yibing [3]; Yuan, Chunfeng [2]; Yang, Wankou [1]
Journal: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
2024
Volume: 20; Issue: 1; Pages: 18
Keywords: Compositional learning; long tail; few-shot; zero-shot; action recognition
ISSN: 1551-6857
DOI: 10.1145/3603253
Corresponding author: Wang, Haoran (wanghaoran@ise.neu.edu.cn)
Abstract: Long-tailed visual recognition has been receiving increasing research attention. However, the long-tailed distribution problem remains underexplored for video-based visual recognition. To address this issue, in this article we propose a compositional learning based solution for video-based human action recognition. Our method, named Attentional Composition Networks (ACN), first learns verb-like and preposition-like components, then shuffles these components in the feature space to generate additional samples for the tail classes. Specifically, during training, we represent each action video by a graph that captures the spatial-temporal relations (edges) among detected human/object instances (nodes). Then, ACN uses the position information to decompose each action into a set of verb and preposition representations based on the edge features in the graph. After that, the verb and preposition features from different videos are combined via an attention structure to synthesize feature representations for tail classes. In this way, we enrich the data for the tail classes and consequently improve action recognition for these classes. To evaluate compositional human action recognition, we further contribute a new human action recognition dataset, namely NEU-Interaction (NEU-I). Experimental results on both Something-Something V2 and the proposed NEU-I demonstrate the effectiveness of the proposed method for long-tailed, few-shot, and zero-shot problems in human action recognition. Source code and the NEU-I dataset are available at https://github.com/YajieW99/ACN.
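Illustrative sketch (not the authors' implementation): the composition step described in the abstract, where verb and preposition features from different videos are combined through attention into a synthetic tail-class feature, might look roughly like the following. The function name `compose`, the query vector `tail_query`, and the toy feature values are all hypothetical and were chosen only for illustration. The abstract does not specify the attention structure, so plain scaled dot-product attention is assumed here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def compose(verb_feats, prep_feats, query):
    """Fuse verb and preposition components with scaled dot-product attention.

    Assumption: `query` is a hypothetical embedding standing in for the
    target tail class; the real ACN attention structure may differ.
    Returns the fused feature and the attention weights.
    """
    components = verb_feats + prep_feats
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, c) / scale for c in components])
    fused = [
        sum(w * c[i] for w, c in zip(weights, components))
        for i in range(len(query))
    ]
    return fused, weights


# Toy edge features: verb components from one video, a preposition
# component from another (values are made up for illustration).
verb_from_video_a = [[1.0, 0.0, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0]]
prep_from_video_b = [[0.0, 0.0, 1.0, 0.0]]
tail_query = [1.0, 0.0, 1.0, 0.0]

synthetic, weights = compose(verb_from_video_a, prep_from_video_b, tail_query)
```

In this sketch the synthetic feature is a convex combination of the shuffled components, so it stays within the span of features actually observed in training videos. For the method itself, refer to the released code at https://github.com/YajieW99/ACN.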
Funding projects: Major Science and Technology Innovation 2030 New Generation Artificial Intelligence key project [2021ZD0111700]; Fundamental Research Funds for the Central Universities of China [N2304012]; National Nature Science Foundation of China [61773117]; National Nature Science Foundation of China [61972397]; National Nature Science Foundation of China [62276061]; National Nature Science Foundation of China [62002090]
WOS research area: Computer Science
Language: English
Publisher: ASSOC COMPUTING MACHINERY
WOS accession number: WOS:001080441800008
Funding organizations: Major Science and Technology Innovation 2030 New Generation Artificial Intelligence key project; Fundamental Research Funds for the Central Universities of China; National Nature Science Foundation of China
Content type: Journal article
Source URL: http://ir.ia.ac.cn/handle/173211/52978
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding author: Wang, Haoran
Author affiliations:
1.Southeast Univ, Sch Automat, Nanjing, Peoples R China
2.Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
3.JD Explore Acad, Beijing 100176, Peoples R China
4.Univ Sydney, Sch Comp Sci, Fac Engn, Darlington, NSW 2008, Australia
5.Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Peoples R China
Recommended citation
GB/T 7714
Wang, Haoran,Wang, Yajie,Yu, Baosheng,et al. Attentional Composition Networks for Long-Tailed Human Action Recognition[J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS,2024,20(1):18.
APA Wang, Haoran,Wang, Yajie,Yu, Baosheng,Zhan, Yibing,Yuan, Chunfeng,&Yang, Wankou.(2024).Attentional Composition Networks for Long-Tailed Human Action Recognition.ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS,20(1),18.
MLA Wang, Haoran,et al."Attentional Composition Networks for Long-Tailed Human Action Recognition".ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS 20.1(2024):18.