An End-to-End Text-Independent Speaker Identification System on Short Utterances

	Ji RF (吉瑞芳) 1,2; Cai XY (蔡新元) 1; Xu B (徐波) 1
	2018-09
Conference date	2018-09-02 to 2018-09-06
Conference location	Hyderabad, India
Abstract	In speaker recognition, text-independent speaker identification on short utterances remains a challenging task, because it is difficult to extract robust and discriminative speaker features under short-duration conditions. This paper explores an end-to-end speaker identification system that maps utterances to a speaker identity subspace in which the similarity of speakers can be measured by Euclidean distance. Specifically, we apply GRU architectures to extract utterance-level features. We then assume that one’s various utterances can be viewed as transformations of a single object in an ideal speaker identity subspace. Based on this assumption, the ResCNN architecture is used to model the transformation, and the whole system is jointly optimized by the speaker identity subspace loss. Experimental results demonstrate the effectiveness of the proposed system and its superiority over previous methods. For example, the GRU-learned feature reduces the equal error rate by 27.53% relative, and the speaker identity subspace loss brings a further 7.22% relative reduction compared with softmax loss.
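Content type	Conference paper
Source URL	[http://ir.ia.ac.cn/handle/173211/23545]
Collection	Digital Content Technology and Services Research Center_Auditory Models and Cognitive Computing
Author affiliations	1. Institute of Automation, Chinese Academy of Sciences 2. University of Chinese Academy of Sciences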
Recommended citation (GB/T 7714)	Ji RF, Cai XY, Xu B. An End-to-End Text-Independent Speaker Identification System on Short Utterances[C]. In: . Hyderabad, India. 2018-09-02 to 2018-09-06.
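
The abstract relies on three quantities: Euclidean distance between speaker embeddings, the equal error rate (EER), and relative reduction in EER. The following is a minimal sketch of how these could be computed. It is not the authors' code; the function names and all numeric values are hypothetical toy data chosen only for illustration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def equal_error_rate(genuine, impostor):
    """Approximate EER from distance scores (smaller = more similar).

    genuine:  distances between utterances of the same speaker
    impostor: distances between utterances of different speakers
    The threshold is swept over every observed distance, and the point
    where false acceptance and false rejection rates are closest is used.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(d > t for d in genuine) / len(genuine)     # same speaker rejected
        far = sum(d <= t for d in impostor) / len(impostor)  # different speaker accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_reduction(baseline, improved):
    """Relative reduction in percent: (baseline - improved) / baseline * 100."""
    return (baseline - improved) / baseline * 100


# Toy values only, not taken from the paper.
genuine_scores = [0.1, 0.2, 0.3, 0.9]
impostor_scores = [0.4, 0.8, 1.0, 1.2]
toy_eer = equal_error_rate(genuine_scores, impostor_scores)
```

A "relative" reduction is measured against the baseline value, so a hypothetical drop in EER from 10.0% to 8.0% would be a 20% relative reduction, even though the absolute difference is 2 percentage points.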