Deep cross-modal retrieval for remote sensing image and audio | |
Mao, Gou1,2; Yuan, Yuan1; Xiaoqiang, Lu1 | |
2018-10-08 | |
Conference date | 2018-08-19 |
Conference location | Beijing, China |
DOI | 10.1109/PRRS.2018.8486338 |
Abstract | Remote sensing image retrieval has many important applications in civilian and military fields, such as disaster monitoring and target detection. However, existing research on image retrieval, which mainly follows two directions, text-based and content-based, cannot meet the need for speed and convenience in some special applications and emergency scenes. Text-based retrieval is limited by keyboard input, which is inefficient in urgent situations, and content-based retrieval needs an example image as a reference, which usually does not exist. Speech, as a direct, natural and efficient means of human-machine interaction, can make up for these shortcomings. Hence, a novel cross-modal retrieval method for remote sensing images and spoken audio is proposed in this paper. We first build a large-scale remote sensing image dataset with many manually annotated spoken audio captions for the cross-modal retrieval task. Then a Deep Visual-Audio Network is designed to directly learn the correspondence between image and audio. This model integrates feature extraction and multi-modal learning into the same network. Experiments on the proposed dataset verify the effectiveness of our approach and prove that it is feasible for speech-to-image retrieval. © 2018 IEEE. |
Ownership rank | 1 |
Proceedings | 2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018 |
Proceedings publisher | Institute of Electrical and Electronics Engineers Inc. |
Language | English |
ISBN | 9781538684795 |
Content type | Conference paper |
Source URL | [http://ir.opt.ac.cn/handle/181661/30867] |
Collection | Xi'an Institute of Optics and Precision Mechanics_Center for OPTical IMagery Analysis and Learning |
Author affiliations | 1.Chinese Academy of Sciences, Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi'an Institute of Optics and Precision Mechanics, Xi'an, Shaanxi; 710119, China; 2.University of Chinese Academy of Sciences, Beijing; 100049, China |
Recommended citation (GB/T 7714) | Mao, Gou,Yuan, Yuan,Xiaoqiang, Lu. Deep cross-modal retrieval for remote sensing image and audio[C]. In:. Beijing, China. 2018-08-19. |