Title | Attention-Guided Network for Semantic Video Segmentation
Authors | Li, Jiangyun (1,4); Zhao, Yikai (1,4); Fu, Jun (2); Wu, Jiajia (3); Liu, Jing (2)
Journal | IEEE ACCESS
Year | 2019
Volume | 7
Pages | 140680-140689
Keywords | Semantics ; Image segmentation ; Feature extraction ; Active appearance model ; Optical imaging ; Context modeling ; Task analysis ; Semantic video segmentation ; attention ; convolutional neural networks
ISSN | 2169-3536
DOI | 10.1109/ACCESS.2019.2943365 |
Corresponding Author | Li, Jiangyun (leejy@ustb.edu.cn)
Abstract | Deep convolutional neural network (CNN) models have achieved remarkable success in semantic image segmentation. However, most segmentation models are built on classification networks, which tend to learn image-level features and lose abundant spatial information through repeated pooling and downsampling operations; moreover, CNN-based methods are not robust to their inputs. Directly applying existing segmentation methods to semantic video segmentation therefore yields predictions that are spatially inconsecutive within one instance and temporally inconsistent for the same objects across adjacent frames. To tackle this challenge, we propose an Attention-Guided Network (AGNet) that adaptively strengthens inter-frame and intra-frame features for more precise segmentation predictions. Specifically, we append an adjacent attention module (AAM) and a spatial attention module (SAM) on top of a dilated fully convolutional network (FCN); these modules model feature correlations in the temporal and spatial dimensions, respectively. The AAM selectively enhances the inter-frame features of the same objects across adjacent frames for temporally consistent predictions, while the SAM selectively aggregates the intra-frame features within one instance for spatially consecutive predictions. Finally, we sum the outputs of the two attention modules to further improve the feature representations, which contributes to more precise segmentation predictions across the temporal and spatial dimensions simultaneously. Extensive experiments demonstrate the effectiveness of the proposed method, which obtains a state-of-the-art mean intersection over union (mIoU) of 75.22 on the CamVid dataset.
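The abstract describes the SAM as aggregating intra-frame features so that positions belonging to the same instance reinforce each other. The exact formulation is not given in this record, but the idea of position-affinity self-attention over a feature map can be sketched as follows (an illustrative simplification in NumPy; the function names and the residual form are our assumptions, not the paper's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat):
    """Sketch of spatial self-attention over a C x H x W feature map.

    Each spatial position is re-expressed as a similarity-weighted sum
    over all positions, so features within one instance aggregate.
    Hypothetical simplification: real modules typically project the
    features with learned query/key/value transforms first.
    """
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)          # C x N, with N spatial positions
    attn = softmax(x.T @ x, axis=-1)    # N x N position-affinity map
    out = x @ attn.T                    # aggregate features per position
    return out.reshape(C, H, W) + feat  # residual connection

feat = np.random.rand(8, 4, 4).astype(np.float32)
out = spatial_attention(feat)
assert out.shape == feat.shape
```

The AAM would apply the same affinity-weighted aggregation across features of adjacent frames rather than within one frame; summing the two module outputs then combines the temporal and spatial refinements, as the abstract states.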
Funding Project | National Natural Science Foundation of China [61671054] ; Beijing Natural Science Foundation [4182038]
WOS Keywords | DEEP ; DECODER
WOS Research Areas | Computer Science ; Engineering ; Telecommunications
Language | English
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS Accession Number | WOS:000497156000044
Funding Organization | National Natural Science Foundation of China ; Beijing Natural Science Foundation
Document Type | Journal Article
Source URL | http://ir.ia.ac.cn/handle/173211/29337
Collection | Institute of Automation, National Laboratory of Pattern Recognition, Image and Video Analysis Group
Author Affiliations | 1. Minist Educ, Key Lab Knowledge Automat Ind Proc, Beijing 100083, Peoples R China ; 2. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China ; 3. Beijing Technol & Business Univ, Sch Comp & Informat Engn, Beijing 102488, Peoples R China ; 4. Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
Recommended Citation (GB/T 7714) | Li, Jiangyun, Zhao, Yikai, Fu, Jun, et al. Attention-Guided Network for Semantic Video Segmentation[J]. IEEE ACCESS, 2019, 7: 140680-140689.
APA | Li, Jiangyun, Zhao, Yikai, Fu, Jun, Wu, Jiajia, & Liu, Jing. (2019). Attention-Guided Network for Semantic Video Segmentation. IEEE ACCESS, 7, 140680-140689.
MLA | Li, Jiangyun, et al. "Attention-Guided Network for Semantic Video Segmentation". IEEE ACCESS 7 (2019): 140680-140689.