A unified framework and models for integrating translation memory into phrase-based statistical machine translation

doi:10.1016/j.csl.2018.09.006

Authors	Liu, Yang (1); Wang, Kun (1); Zong, Chengqing (1); Su, Keh-Yih (2)
Journal	COMPUTER SPEECH AND LANGUAGE
Publication date	2019-03-01
Volume	54; Pages: 176-206
Keywords	Phrase-based machine translation; Translation memory
ISSN	0885-2308
DOI	10.1016/j.csl.2018.09.006
Corresponding author	Liu, Yang (yang.liu2013@nlpr.ia.ac.cn)
Abstract	Since statistical machine translation (SMT) and translation memory (TM) complement each other in TM matched and unmatched regions, a unified framework for integrating TM into phrase-based SMT is proposed in this paper. Unlike previous two-stage pipeline approaches, which directly merge TM results into the input sentences and then let the SMT translate only the unmatched regions, the proposed framework refers to the TM information associated with each phrase during SMT decoding. Under this unified framework, several integrated models are proposed to incorporate different types of information extracted from TM to guide the SMT decoding. We thus let SMT implicitly and indirectly utilize global context with a local dependency model. Furthermore, the SMT phrase table is dynamically enhanced with TM phrase pairs when the TM database and the SMT training set are different. On a Chinese-English TM database, our experiments show that the proposed Model-I significantly improves over both SMT and TM when the SMT training set is also adopted as the TM database and when the fuzzy match score is over 0.4 (an overall improvement of 3.5 BLEU points and a reduction of 2.6 TER points). In addition, the proposed Model-II is significantly better than the TM and the SMT systems when the SMT training set and the TM database are different. Furthermore, the proposed Model-III outperforms both the TM and the SMT systems even when the SMT training set and the TM database are from different domains. Additionally, the proposed Model-IV achieves further significant improvements with the help of Top-N TM sentence pairs. Lastly, all our models significantly outperform state-of-the-art approaches under all test conditions. © 2018 Elsevier Ltd. All rights reserved.
WOS research area	Computer Science
Language	English
Publisher	ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
WOS accession number	WOS:000451046000012
Content type	Journal article
Source URL	[http://ir.ia.ac.cn/handle/173211/25711]
Collection	Institute of Automation, Chinese Academy of Sciences
Corresponding author	Liu, Yang
Author affiliations	1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China; 2. Acad Sinica, Inst Informat Sci, Taipei, Taiwan
Recommended citation (GB/T 7714)	Liu, Yang, Wang, Kun, Zong, Chengqing, et al. A unified framework and models for integrating translation memory into phrase-based statistical machine translation[J]. COMPUTER SPEECH AND LANGUAGE, 2019, 54: 176-206.
APA	Liu, Yang, Wang, Kun, Zong, Chengqing, & Su, Keh-Yih. (2019). A unified framework and models for integrating translation memory into phrase-based statistical machine translation. COMPUTER SPEECH AND LANGUAGE, 54, 176-206.
MLA	Liu, Yang, et al. "A unified framework and models for integrating translation memory into phrase-based statistical machine translation". COMPUTER SPEECH AND LANGUAGE 54 (2019): 176-206.