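Title: 人机交互应用中麦克风阵列语音增强的研究
Author: He Chenglin (何成林)
Degree type: Doctoral
Defense date: 2005
Degree-granting institution: Institute of Acoustics, Chinese Academy of Sciences
Place of conferral: Institute of Acoustics, Chinese Academy of Sciences
Keywords: human-machine interaction; microphone array; speech enhancement; speech detection; array signal processing; integrated Wiener filtering; post-filtering
Alternative title: Research on Microphone Array Speech Enhancement for Human-machine Interaction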
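Chinese abstract: In practical human-machine speech interaction, the signal processed by the computer contains not only the target speech but often also noise, interfering speech, or both, which sharply reduces the usability of speech recognition. This thesis studies front-end processing for speech recognition tailored to practical human-machine speech interaction scenarios, enhancing the target speech relative to noise and interfering speech so as to improve the usability of speech recognition in those scenarios. The original contributions of this thesis are as follows: 1. The noise-reduction performance of the basic microphone array speech enhancement techniques is analyzed systematically, including the classical delay-and-sum beamformer, adaptive beamformers, and post-filtering; several recent microphone array speech enhancement algorithms are also analyzed, such as the near-field superdirective beamformer, the generalized singular value decomposition structure, and the transfer function generalized sidelobe canceller, and the characteristics of these algorithms and structures and their limitations in practical applications are summarized. 2. Exploiting the spatial distribution of the target and interfering sources in practical human-machine speech interaction, a microphone array speech detection method combining Wiener post-filtering and spatial filtering is proposed, which handles speech detection at low signal-to-noise ratio and in the presence of interfering speech reasonably well. When the target and interfering sources are fixed, or move somewhat relative to each other, for array signals with a signal-to-noise ratio (SNR) of -5 dB and an interference-to-noise ratio (INR) of -5 dB, the detection accuracy is 87.3% for target speech and 82.2% for interfering speech; when interfering and target speech are present simultaneously (SNR = 0 dB, SIR = -5 dB), the detection accuracy is 89.9%. 3. A robust microphone array speech enhancement structure with integrated Wiener filtering (RGSC-IW) is proposed, in which an effective adaptive mode controller (AMC) is constructed to control the adaptation of the generalized sidelobe canceller (GSC). Experimental results show that, when the target and interfering sources are fixed or move somewhat relative to each other, RGSC-IW achieves noise and interference cancellation comparable to that of a manually adapted GSC with Wiener post-filter (GSC-PW), while the speech enhanced by RGSC-IW is less distorted.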
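The classical delay-and-sum beamformer named in the first contribution can be illustrated with a minimal sketch. This is not code from the thesis: the function names, the integer sample delays, the one-sample-per-microphone geometry, and the sine wave standing in for speech are all assumptions made for the example. It only shows the basic idea that aligning the channels on the target and averaging them keeps the target while reducing uncorrelated noise.

```python
import math
import random

def delay_and_sum(channels, delays):
    """Align each channel on the target and average across microphones.

    channels: list of equal-length sample lists, one per microphone.
    delays:   delays[m] is how many samples later the target reaches
              microphone m than the reference (assumed known, >= 0).
    """
    out_len = len(channels[0]) - max(delays)
    num = len(channels)
    # Advancing channel m by delays[m] lines the target up across microphones.
    return [sum(ch[t + d] for ch, d in zip(channels, delays)) / num
            for t in range(out_len)]

def mean_power(x):
    return sum(v * v for v in x) / len(x)

def demo(num_mics=4, n=4000, seed=0):
    rng = random.Random(seed)
    # A sine wave stands in for the target speech (illustrative assumption).
    target = [math.sin(2 * math.pi * 0.01 * t) for t in range(n)]
    delays = list(range(num_mics))  # hypothetical geometry: one sample per mic
    channels = []
    for d in delays:
        # Each microphone hears the delayed target plus independent noise.
        channels.append([(target[t - d] if t >= d else 0.0) + rng.gauss(0.0, 1.0)
                         for t in range(n)])
    out = delay_and_sum(channels, delays)
    noise_in = mean_power([channels[0][t] - target[t] for t in range(n)])
    noise_out = mean_power([out[t] - target[t] for t in range(len(out))])
    return noise_in, noise_out

if __name__ == "__main__":
    noise_in, noise_out = demo()
    print(f"noise power at one microphone: {noise_in:.3f}")
    print(f"noise power after delay-and-sum: {noise_out:.3f}")
```

With independent noise at each of the four simulated microphones, the residual noise power after averaging should fall to roughly one quarter of the single-microphone value. Real arrays face correlated noise, reverberation, and fractional delays, none of which this sketch models.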
English abstract: In practical applications of human-machine speech interaction, the signal received by the computer comprises not only the target speech but also noise and interfering speech, which severely degrades the usability of speech recognition. Based on practical human-machine speech interaction scenarios, this thesis studies front-end processing for speech recognition, i.e. enhancing the target speech and suppressing the interfering speech and the noise, in order to improve the performance of the speech recognizer. The original contributions of this thesis are threefold: 1. Some basic techniques for microphone array speech enhancement, such as the Delay-and-Sum Beamformer, the Adaptive Beamformer and the Post-filter, are explained, and their noise reduction properties are analyzed. The mechanisms of some of the most advanced techniques for microphone array speech enhancement, such as the Near-field Superdirective Beamformer, Generalized Singular Value Decomposition and the Transfer Function Generalized Sidelobe Canceller, are also presented. The properties of these algorithms and the acoustic environments in which they are appropriate are also analyzed. 2. Based on the spatial characteristics of the target and interfering speakers, a new speech detector based on a Wiener post-filter and a spatial filter is proposed, which is suitable for speech detection when the signal-to-noise ratio is low and interfering speech exists. When the target and interfering speakers are fixed or moving within a small area, the detector detects the target speech with an accuracy of 87.3% (SNR=-5dB) and the interfering speech with an accuracy of 82.2% (INR=-5dB), and the accuracy is 89.9% when the target and interfering speech occur at the same time (SNR=0dB, SIR=-5dB). 3. A new structure (RGSC-IW) for microphone array speech enhancement is proposed, in which an efficient adaptive mode controller (AMC) is constructed to control the adaptation of the GSC. When the target and interfering speakers are fixed or moving within a small area, practical tests show that RGSC-IW achieves nearly the same noise and interference reduction as the combination of a GSC with a Wiener post-filter (GSC-PW) whose adaptation is controlled manually, and the enhanced speech of the former is less distorted than that of the latter.
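The idea of controlling when a generalized sidelobe canceller adapts can be sketched in a few lines. The code below is not the RGSC-IW structure and does not reproduce the thesis's adaptive mode controller or its Wiener filtering: it is a toy two-microphone GSC with an NLMS canceller, where an oracle flag `adapt` (a stand-in assumed for illustration) allows weight updates only while the target is silent. The signal model (interferer attenuated and delayed by one sample at the second microphone, a sine wave as the target) and all names are hypothetical.

```python
import math
import random

def gsc_gated_nlms(x1, x2, adapt, taps=8, mu=0.5, eps=1e-8):
    """Two-microphone GSC sketch whose NLMS canceller updates only when allowed.

    x1, x2: microphone signals, assumed already time-aligned on the target.
    adapt:  adapt[t] is True when the canceller may update its weights
            (here an oracle flag, standing in for a mode controller).
    """
    w = [0.0] * taps        # adaptive canceller weights
    u_hist = [0.0] * taps   # recent blocking-branch samples, newest first
    out = []
    for a, b, allow in zip(x1, x2, adapt):
        fixed = 0.5 * (a + b)            # fixed beamformer: keeps the target
        u_hist = [a - b] + u_hist[:-1]   # blocking branch: cancels the target
        est = sum(wi * ui for wi, ui in zip(w, u_hist))
        e = fixed - est                  # enhanced output sample
        if allow:
            norm = sum(ui * ui for ui in u_hist) + eps
            w = [wi + mu * e * ui / norm for wi, ui in zip(w, u_hist)]
        out.append(e)
    return out

def demo(n=4000, seed=1):
    rng = random.Random(seed)
    half = n // 2
    # The target is active only in the second half (a sine stands in for speech).
    s = [0.0] * half + [math.sin(2 * math.pi * 0.02 * t) for t in range(half)]
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # interference source
    x1 = [s[t] + v[t] for t in range(n)]
    # Hypothetical geometry: the interferer reaches mic 2 attenuated and one sample late.
    x2 = [s[t] + (0.5 * v[t - 1] if t > 0 else 0.0) for t in range(n)]
    adapt = [t < half for t in range(n)]         # adapt only while the target is silent
    out = gsc_gated_nlms(x1, x2, adapt)
    fixed = [0.5 * (a + b) for a, b in zip(x1, x2)]

    def mse(y):
        # Error against the clean target over the target-active half.
        return sum((y[t] - s[t]) ** 2 for t in range(half, n)) / half

    return mse(fixed), mse(out)

if __name__ == "__main__":
    before, after = demo()
    print(f"residual power, fixed beamformer only: {before:.4f}")
    print(f"residual power, gated GSC output:      {after:.6f}")
```

Freezing the weights while the target speaks keeps the canceller from treating the target as something to remove. In practice the flag has to be estimated from the array signals, which is the problem a mode controller addresses, and the blocking branch leaks some target energy, which this toy model ignores.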
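Language: Chinese
Date available: 2011-05-07
Pages: 108
Content type: Thesis
Source URL: [http://159.226.59.140/handle/311008/1054]
Collection: Institute of Acoustics_IOA Doctoral and Master's Theses_1981-2009 Doctoral and Master's Theses
Recommended citation
GB/T 7714
He Chenglin. Research on Microphone Array Speech Enhancement for Human-machine Interaction [D]. Institute of Acoustics, Chinese Academy of Sciences. Institute of Acoustics, Chinese Academy of Sciences. 2005.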