Improving Description-Based Person Re-Identification by Multi-Granularity Image-Text Alignments
Niu, Kai1,2; Huang, Yan1,2; Ouyang, Wanli3; Wang, Liang1,2,4,5
Journal: IEEE Transactions on Image Processing
Year: 2020
Volume: 29, Issue: 1, Pages: 15
Keywords: Description-based person re-identification; Multi-granularity image-text alignments; Step training strategy
Abstract (English)

Description-based person re-identification (Re-id) is an important task in video surveillance that requires discriminative cross-modal representations to distinguish different people. It is difficult to directly measure the similarity between images and descriptions due to the modality heterogeneity (the cross-modal problem). Moreover, since all samples belong to a single category, i.e., person (the fine-grained problem), this task is even harder than the conventional image-description matching task. In this paper, we propose a Multi-granularity Image-text Alignments (MIA) model to alleviate the cross-modal fine-grained problem and enable better similarity evaluation in description-based person Re-id. Specifically, three alignments of different granularities, i.e., global-global, global-local and local-local, are carried out hierarchically. Firstly, the global-global alignment in the Global Contrast (GC) module matches the global contexts of images and descriptions. Secondly, the global-local alignment in the Relation-guided Global-local Alignment (RGA) module exploits the relations between local components and global contexts to adaptively highlight the distinguishable components while suppressing the uninvolved ones. Thirdly, for the local-local alignment, the Bi-directional Fine-grained Matching (BFM) module matches visual human parts with noun phrases. The whole network combining multiple granularities can be trained end-to-end without complex pre-processing. To address the difficulties in training the combination of multiple granularities, an effective step training strategy is proposed to train these granularities step-by-step. Extensive experiments and analysis show that our method obtains state-of-the-art performance on the CUHK-PEDES dataset and outperforms previous methods by a significant margin.
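The abstract describes combining alignment scores at several granularities. The following is a minimal, hypothetical sketch of that idea (not the authors' implementation): a global-global cosine similarity plus a local-local score that matches each noun-phrase feature to its best-matching visual part feature, then fuses the two with assumed weights. All function names and the fusion scheme are illustrative assumptions.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two 1-D feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def multi_granularity_similarity(img_global, txt_global,
                                 img_parts, txt_phrases,
                                 weights=(1.0, 1.0)):
    """Hypothetical fusion of global-global and local-local alignments.

    img_parts:   list of visual part feature vectors (e.g., human body parts)
    txt_phrases: list of noun-phrase feature vectors
    weights:     assumed fusion weights (w_global, w_local)
    """
    # Global-global alignment: one score for the whole image vs. description.
    s_global = cosine(img_global, txt_global)
    # Local-local alignment: each phrase keeps its best-matching visual part,
    # and the per-phrase maxima are averaged into one local score.
    s_local = float(np.mean([max(cosine(p, ph) for p in img_parts)
                             for ph in txt_phrases]))
    wg, wl = weights
    return (wg * s_global + wl * s_local) / (wg + wl)
```

In the paper's full model these scores come from learned modules (GC and BFM) and a relation-guided global-local term is added as well; this sketch only shows how multiple granularities can be fused into a single image-description similarity.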
 

Language: English
Content Type: Journal Article
Source URL: [http://ir.ia.ac.cn/handle/173211/40563]
Collection: Institute of Automation - Center for Research on Intelligent Perception and Computing
Corresponding Author: Wang, Liang
Author Affiliations: 1.Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing 100190, China
2.School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS), Beijing 100049, China
3.School of Electrical and Information Engineering, The University of Sydney, Sydney, NSW 2006, Australia
4.Center for Excellence in Brain Science and Intelligence Technology (CEBSIT), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing 100190, China
5.CAS Artificial Intelligence Research (CAS-AIR), Qingdao 266061, China
Recommended Citation
GB/T 7714
Niu, Kai, Huang, Yan, Ouyang, Wanli, et al. Improving Description-Based Person Re-Identification by Multi-Granularity Image-Text Alignments[J]. IEEE Transactions on Image Processing, 2020, 29(1): 15.
APA: Niu, Kai, Huang, Yan, Ouyang, Wanli, & Wang, Liang. (2020). Improving Description-Based Person Re-Identification by Multi-Granularity Image-Text Alignments. IEEE Transactions on Image Processing, 29(1), 15.
MLA: Niu, Kai, et al. "Improving Description-Based Person Re-Identification by Multi-Granularity Image-Text Alignments". IEEE Transactions on Image Processing 29.1 (2020): 15.
 
