Robust Video-Text Retrieval Via Noisy Pair Calibration

doi:10.1109/TMM.2023.3239183

Title	Robust Video-Text Retrieval Via Noisy Pair Calibration
Authors	Zhang, Huaiwen 1,2,3; Yang, Yang 1,2,3; Qi, Fan 4; Qian, Shengsheng 5,6; Xu, Changsheng 5,6
Journal	IEEE TRANSACTIONS ON MULTIMEDIA
Year	2023
Volume	25, pages 8632-8645
Keywords	Noise calibration; uncertainty; video text retrieval
ISSN	1520-9210
DOI	10.1109/TMM.2023.3239183
Corresponding author	Qian, Shengsheng (shengsheng.qian@nlpr.ia.ac.cn)
Abstract	Video-text retrieval is a fundamental task in managing the emerging massive amounts of video data. The main challenge focuses on learning a common representation space for videos and queries where the similarity measurement can reflect the semantic closeness. However, existing video-text retrieval models may suffer from the following noise in the common space learning procedure: First, the video-text correspondences in positive pairs may not be exact matches. The crowdsourcing annotation for existing datasets leads to inevitable tagging noise for non-expert annotators. Second, the learning of video-text representation is based on the negative samples randomly sampled. Instances that are semantically similar to the query may be incorrectly categorized as negative samples. To alleviate the adverse impact of these noisy pairs, we propose a novel robust video-text retrieval method that protects the model from noisy positive and negative pairs by identifying and calibrating noisy pairs with their uncertainty score. In particular, we propose a noisy pair identifier, which divides the training dataset into noisy and clean subsets based on the estimated uncertainty of each pair. Then, with the help of uncertainties, we calibrate the two types of noisy pairs with an adaptive margin triplet loss and a weighted triplet loss function, respectively. To verify the effectiveness of our methods, we conduct extensive experiments on three widely used datasets. Experimental results show that the proposed robust video-text retrieval methods successfully identify and calibrate the noisy pairs and improve retrieval performance.
Funding project	National Natural Science Foundation of China
WOS research area	Computer Science ; Telecommunications
Language	English
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS accession number	WOS:001125902000070
Funding agency	National Natural Science Foundation of China
Content type	Journal article
Source URL	http://ir.ia.ac.cn/handle/173211/54883
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems
Author affiliations	1. Inner Mongolia Univ, Coll Comp Sci, Mongolia 010031, Peoples R China; 2. Natl & Local Joint Engn Res Ctr Intelligent Inform, Mongolia 010031, Peoples R China; 3. Inner Mongolia Key Lab Mongolian Informat Proc Tec, Hohhot 010021, Peoples R China; 4. Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China; 5. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China; 6. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Recommended citation (GB/T 7714)	Zhang, Huaiwen, Yang, Yang, Qi, Fan, et al. Robust Video-Text Retrieval Via Noisy Pair Calibration[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25: 8632-8645.
APA	Zhang, Huaiwen, Yang, Yang, Qi, Fan, Qian, Shengsheng, & Xu, Changsheng. (2023). Robust Video-Text Retrieval Via Noisy Pair Calibration. IEEE TRANSACTIONS ON MULTIMEDIA, 25, 8632-8645.
MLA	Zhang, Huaiwen, et al. "Robust Video-Text Retrieval Via Noisy Pair Calibration". IEEE TRANSACTIONS ON MULTIMEDIA 25 (2023): 8632-8645.
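
Illustrative note: the abstract describes splitting training pairs into clean and noisy subsets by estimated uncertainty, then calibrating noisy positives with an adaptive margin triplet loss and noisy negatives with a weighted triplet loss. The sketch below is one possible reading of that description, not the paper's implementation. The threshold rule, the linear margin schedule, the `1 - uncertainty` weighting, and all function names are assumptions introduced here for illustration; the record does not specify any of them.

```python
# Hypothetical sketch of uncertainty-based noisy pair calibration.
# None of these formulas come from the record; they are assumptions.

def split_by_uncertainty(uncertainties, threshold=0.5):
    """Return (clean_indices, noisy_indices) using a fixed threshold (assumed rule)."""
    clean = [i for i, u in enumerate(uncertainties) if u <= threshold]
    noisy = [i for i, u in enumerate(uncertainties) if u > threshold]
    return clean, noisy


def triplet_loss(pos_sim, neg_sim, margin=0.2, weight=1.0):
    """Standard hinge-style triplet loss on similarities, scaled by a weight."""
    return weight * max(0.0, margin + neg_sim - pos_sim)


def adaptive_margin(base_margin, uncertainty):
    """Assumed schedule: shrink the margin linearly as pair uncertainty grows."""
    return base_margin * (1.0 - uncertainty)


def calibrated_loss(pos_sim, neg_sim, pos_uncertainty, neg_uncertainty, base_margin=0.2):
    """Combine both calibrations for one (query, positive, negative) triplet.

    An uncertain positive gets a smaller margin; an uncertain negative
    (possibly a false negative) gets a smaller weight. Both are assumptions.
    """
    margin = adaptive_margin(base_margin, pos_uncertainty)
    weight = 1.0 - neg_uncertainty
    return triplet_loss(pos_sim, neg_sim, margin=margin, weight=weight)


if __name__ == "__main__":
    clean, noisy = split_by_uncertainty([0.1, 0.9, 0.4])
    print("clean:", clean, "noisy:", noisy)
    print("loss:", calibrated_loss(0.5, 0.6, pos_uncertainty=0.5, neg_uncertainty=0.5))
```

With both uncertainties at zero, `calibrated_loss` reduces to the ordinary fixed-margin triplet loss, so under these assumptions the calibration only changes the contribution of pairs flagged as uncertain.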