Robust Video-Text Retrieval Via Noisy Pair Calibration
Zhang, Huaiwen [1,2,3]; Yang, Yang [1,2,3]; Qi, Fan [4]; Qian, Shengsheng [5,6]; Xu, Changsheng [5,6]
Journal: IEEE TRANSACTIONS ON MULTIMEDIA
Year: 2023
Volume: 25; Pages: 8632-8645
Keywords: Noise calibration; uncertainty; video text retrieval
ISSN: 1520-9210
DOI: 10.1109/TMM.2023.3239183
Corresponding author: Qian, Shengsheng (shengsheng.qian@nlpr.ia.ac.cn)
Abstract: Video-text retrieval is a fundamental task in managing the rapidly growing volume of video data. The main challenge is learning a common representation space for videos and queries in which similarity reflects semantic closeness. However, existing video-text retrieval models may suffer from two kinds of noise when learning this common space. First, the video-text correspondences in positive pairs may not be exact matches: crowdsourced annotation of existing datasets inevitably introduces tagging noise from non-expert annotators. Second, video-text representations are learned from randomly sampled negatives, so instances that are semantically similar to the query may be incorrectly treated as negative samples. To alleviate the adverse impact of these noisy pairs, we propose a robust video-text retrieval method that protects the model from noisy positive and negative pairs by identifying and calibrating them using their uncertainty scores. In particular, we propose a noisy pair identifier that divides the training dataset into noisy and clean subsets based on the estimated uncertainty of each pair. Then, guided by these uncertainties, we calibrate the two types of noisy pairs with an adaptive margin triplet loss and a weighted triplet loss, respectively. To verify the effectiveness of our method, we conduct extensive experiments on three widely used datasets. The results show that the proposed method successfully identifies and calibrates noisy pairs and improves retrieval performance.
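The pipeline outlined in the abstract (split pairs by uncertainty, then apply an adaptive-margin triplet loss to noisy positives and a weighted triplet loss to noisy negatives) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration rather than the authors' method: the function names, the fixed-threshold split, the `1 - uncertainty` scaling, the default margin of 0.2, and the convention that uncertainties lie in [0, 1].

```python
from typing import List, Tuple


def split_by_uncertainty(
    uncertainties: List[float], threshold: float
) -> Tuple[List[int], List[int]]:
    """Return (clean_indices, noisy_indices).

    Assumption: a pair counts as noisy when its uncertainty exceeds
    a fixed threshold. The paper's identifier may work differently.
    """
    clean = [i for i, u in enumerate(uncertainties) if u <= threshold]
    noisy = [i for i, u in enumerate(uncertainties) if u > threshold]
    return clean, noisy


def adaptive_margin_triplet(
    sim_pos: float, sim_neg: float, uncertainty: float, base_margin: float = 0.2
) -> float:
    """Hinge triplet loss with an uncertainty-dependent margin.

    Assumption: the margin shrinks linearly as the positive pair's
    uncertainty grows, so doubtful positives are enforced less strictly.
    """
    margin = base_margin * (1.0 - uncertainty)
    return max(0.0, margin - sim_pos + sim_neg)


def weighted_triplet(
    sim_pos: float, sim_neg: float, uncertainty: float, margin: float = 0.2
) -> float:
    """Hinge triplet loss scaled by an uncertainty-dependent weight.

    Assumption: a negative that may in fact match the query (high
    uncertainty) contributes less to the loss.
    """
    weight = 1.0 - uncertainty
    return weight * max(0.0, margin - sim_pos + sim_neg)


if __name__ == "__main__":
    clean, noisy = split_by_uncertainty([0.1, 0.9, 0.5], threshold=0.5)
    print("clean:", clean, "noisy:", noisy)
    print(adaptive_margin_triplet(sim_pos=0.6, sim_neg=0.5, uncertainty=0.0))
    print(weighted_triplet(sim_pos=0.6, sim_neg=0.5, uncertainty=0.5))
```

In this sketch both losses reduce to the standard hinge triplet loss when the uncertainty is 0, which makes the effect of the calibration easy to compare against an uncalibrated baseline.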
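Funding project: National Natural Science Foundation of China
WOS research areas: Computer Science; Telecommunications
Language: English
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS accession number: WOS:001125902000070
Funding organization: National Natural Science Foundation of China
Content type: Journal article
Source URL: http://ir.ia.ac.cn/handle/173211/54883
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding author: Qian, Shengsheng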
Author affiliations:
1. Inner Mongolia Univ, Coll Comp Sci, Mongolia 010031, Peoples R China
2. Natl & Local Joint Engn Res Ctr Intelligent Inform, Mongolia 010031, Peoples R China
3. Inner Mongolia Key Lab Mongolian Informat Proc Tec, Hohhot 010021, Peoples R China
4. Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China
5. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
6. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Recommended citation:
GB/T 7714: Zhang, Huaiwen, Yang, Yang, Qi, Fan, et al. Robust Video-Text Retrieval Via Noisy Pair Calibration[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25: 8632-8645.
APA: Zhang, Huaiwen, Yang, Yang, Qi, Fan, Qian, Shengsheng, & Xu, Changsheng. (2023). Robust Video-Text Retrieval Via Noisy Pair Calibration. IEEE TRANSACTIONS ON MULTIMEDIA, 25, 8632-8645.
MLA: Zhang, Huaiwen, et al. "Robust Video-Text Retrieval Via Noisy Pair Calibration". IEEE TRANSACTIONS ON MULTIMEDIA 25 (2023): 8632-8645.