Deep cross-modal retrieval for remote sensing image and audio

doi:10.1109/PRRS.2018.8486338


	Deep cross-modal retrieval for remote sensing image and audio
	Mao, Gou1,2; Yuan, Yuan1; Xiaoqiang, Lu1
	2018-10-08
Conference date	2018-08-19
Conference location	Beijing, China
DOI	10.1109/PRRS.2018.8486338
Abstract	Remote sensing image retrieval has many important applications in civilian and military fields, such as disaster monitoring and target detection. However, existing research on image retrieval, which mainly follows two directions, text-based and content-based, cannot meet the need for speed and convenience in some special applications and emergency scenes. Text-based retrieval is limited by keyboard input, which is inefficient in urgent situations, and content-based retrieval needs an example image as a reference, which usually does not exist. Speech, as a direct, natural and efficient means of human-machine interaction, can make up for these shortcomings. Hence, a novel cross-modal retrieval method for remote sensing images and spoken audio is proposed in this paper. We first build a large-scale remote sensing image dataset with plenty of manually annotated spoken audio captions for the cross-modal retrieval task. Then a Deep Visual-Audio Network is designed to directly learn the correspondence between image and audio. This model integrates feature extraction and multi-modal learning into the same network. Experiments on the proposed dataset verify the effectiveness of our approach and show that it is feasible for speech-to-image retrieval. © 2018 IEEE.
Ownership ranking	1
Proceedings	2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018
Proceedings publisher	Institute of Electrical and Electronics Engineers Inc.
Language	English
ISBN	9781538684795
Content type	Conference paper
Source URL	http://ir.opt.ac.cn/handle/181661/30867
Collection	Xi'an Institute of Optics and Precision Mechanics_Center for OPTical IMagery Analysis and Learning
Author affiliations	1. Chinese Academy of Sciences, Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi'an Institute of Optics and Precision Mechanics, Xi'an, Shaanxi 710119, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China
Recommended citation (GB/T 7714)	Mao, Gou; Yuan, Yuan; Xiaoqiang, Lu. Deep cross-modal retrieval for remote sensing image and audio[C]. In: 2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018. Beijing, China. 2018-08-19.