Video captioning based on vision transformer and reinforcement learning | |
Zhao, Hong¹; Chen, Zhiwen¹; Guo, Lan¹; Han, Zeyu² | |
Journal | PEERJ COMPUTER SCIENCE |
2022-03-16 | |
Volume | 8 |
Keywords | Video captioning; Vision transformer; Reinforcement learning; Long short-term memory network; Computer vision; Natural language processing; Attention mechanism; Encode-decode; Deep learning |
DOI | 10.7717/peerj-cs.916 |
Abstract | Global encoding of visual features is important for improving description accuracy in video captioning. In this paper, we propose a video captioning method that combines the Vision Transformer (ViT) with reinforcement learning. First, ResNet-152 and ResNeXt-101 are used to extract features from the videos. Second, the encoder block of the ViT network is applied to encode these video features. Third, the encoded features are fed into a Long Short-Term Memory (LSTM) network to generate a description of the video content. Finally, the accuracy of the description is further improved by fine-tuning with reinforcement learning. We conducted experiments on MSR-VTT, a benchmark dataset for video captioning. The results show that, compared with current mainstream methods, the proposed model improves by 2.9%, 1.4%, 0.9% and 4.8% on the four evaluation metrics BLEU-4, METEOR, ROUGE-L and CIDEr-D, respectively. |
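The abstract does not say which reinforcement-learning algorithm is used for the final fine-tuning step. A widely used choice for caption models is self-critical sequence training (SCST), in which the sentence-level reward of a sampled caption (often CIDEr-D) is compared against the reward of the greedy-decoded caption, and that difference scales a policy-gradient loss. The following is a minimal sketch of that loss under this assumption only; the function name `scst_loss` and its inputs are hypothetical, and the rewards are assumed to be computed elsewhere.

```python
from typing import Sequence


def scst_loss(sample_logprobs: Sequence[Sequence[float]],
              sample_rewards: Sequence[float],
              greedy_rewards: Sequence[float]) -> float:
    """Self-critical policy-gradient loss averaged over a batch (assumed SCST).

    sample_logprobs[i][t]: log-probability of the t-th word of the i-th sampled
        caption (assumed already truncated at the end-of-sentence token).
    sample_rewards[i]: sentence-level reward (e.g. CIDEr-D) of the sampled caption.
    greedy_rewards[i]: reward of the greedy-decoded caption for the same video,
        used as the baseline.
    """
    if not (len(sample_logprobs) == len(sample_rewards) == len(greedy_rewards)):
        raise ValueError("batch sizes must match")
    total = 0.0
    for logps, r_s, r_g in zip(sample_logprobs, sample_rewards, greedy_rewards):
        # Positive when the sampled caption scores better than the greedy baseline.
        advantage = r_s - r_g
        # Minimizing this term raises the log-probability of better-than-baseline
        # samples and lowers it for worse-than-baseline samples.
        total += -advantage * sum(logps)
    return total / len(sample_logprobs)


# Usage: two videos, the first sample beats its baseline, the second does not.
loss = scst_loss([[-0.5, -1.0], [-0.2, -0.3, -0.5]], [0.8, 0.4], [0.5, 0.6])
print(loss)  # approximately 0.125
```

Using the greedy caption as the baseline avoids training a separate value network, which is one reason this scheme is popular for fine-tuning LSTM caption decoders; whether the paper uses it cannot be confirmed from this record.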
WOS research area | Computer Science |
Language | English |
Publisher | PEERJ INC |
WOS accession number | WOS:000773302200003 |
Content type | Journal article |
Source URL | [http://ir.lut.edu.cn/handle/2XXMBERH/158092] |
Collection | Lanzhou University of Technology |
Author affiliations | 1.Lanzhou Univ Technol, Sch Comp & Commun, Lanzhou, Gansu, Peoples R China; 2.Lanzhou Univ Technol, Network & Informat Ctr, Lanzhou, Gansu, Peoples R China |
Recommended citation (GB/T 7714) | Zhao, Hong,Chen, Zhiwen,Guo, Lan,et al. Video captioning based on vision transformer and reinforcement learning[J]. PEERJ COMPUTER SCIENCE,2022,8. |
APA | Zhao, Hong, Chen, Zhiwen, Guo, Lan, & Han, Zeyu. (2022). Video captioning based on vision transformer and reinforcement learning. PEERJ COMPUTER SCIENCE, 8. |
MLA | Zhao, Hong, et al. "Video captioning based on vision transformer and reinforcement learning." PEERJ COMPUTER SCIENCE 8 (2022). |