Application of Several Improved MFCC Feature Extraction Methods to Speaker Recognition
Xu Xin (许鑫) ; Su Kai-na (苏开娜) ; Hu Qi-xiu (胡起秀)
2010-07-15
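Conference: Proceedings of the First Joint Conference on Harmonious Human Machine Environment (HHME2005) ; Kunming, China ; CNKI ; China Computer Federation, China Society of Image and Graphics, ACM SIGCHI China Chapter, Department of Computer Science and Technology, Tsinghua University
Keywords: MFCC ; speaker recognition ; feature extraction ; robust ; TN912.34
Alternative title: A Comparative Study of Some Improved MFCC Algorithms for Speaker Recognition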
Abstract (translated from the Chinese): Mel-frequency cepstral coefficients (MFCC) characterize human auditory perception. Several effective improved MFCC algorithms have been proposed in China and abroad that increase the robustness of speech feature extraction. This paper introduces several improved Mel-cepstrum extraction algorithms that have had some success in speech recognition, applies them to text-independent speaker recognition, and on that basis proposes four improved methods. Same-channel and cross-channel experiments on telephone speech corpora of 100 and 200 speakers show that the recognition rate improves to varying degrees, most markedly in the cross-channel case. The combination of frequency masking filtering with the ExpoLog scale performs best: with models trained on landline speech and tested on mobile-phone speech, the recognition rate rises from 16.327% for the baseline system to 38.776%; with models trained on mobile-phone speech and tested on landline speech, it rises from 8% to 40%. The proposed improvements are therefore highly effective.

Abstract (English): MFCC reflects properties of the human auditory system and is a key feature parameter in both speaker recognition and speech recognition. Researchers have proposed several improved algorithms for MFCC feature extraction that have succeeded in speech recognition in some cases. The algorithms we introduce to text-independent speaker recognition are Frequency Masking Filtering (FMF), Weighted Filter Bank Analysis (WFBA), the Half Raised-Sine Function (HRSF), and the ExpoLog Frequency Scale (EFS). To exploit their respective advantages, we combine them into four methods: WFBA with HRSF, FMF with EFS, EFS with HRSF, and FMF with HRSF. WFBA with HRSF reduces the influence of noise and emphasizes the important middle MFCC terms; FMF with EFS makes MFCC better suited to the human auditory mechanism; EFS with HRSF emphasizes the more important mid-frequencies and lifters the more useful coefficients; FMF with HRSF mimics the human masking mechanism to obtain more robust features. The speaker recognition system is based on Vector Quantization models, and the speech database consists of telephone speech. Experiments are carried out with matched and with mismatched handset types. We train models for 100 speakers and for 200 speakers, and test with speech from 50 speakers. The proposed methods show high robustness, especially across different handset types. Of the four proposed methods, FMF (linear interpolation) with EFS and EFS with HRSF are more robust than the others for both matched and mismatched handsets. In the mismatched-handset tests, FMF (linear interpolation) with EFS achieves correct recognition rates of 38.776% and 40%, compared with 16.327% and 8% for the baseline system. Although FMF (linear interpolation) with EFS gives the best result, it requires more computation than EFS with HRSF. The appropriate method for a given speaker recognition system can therefore be chosen by trading off recognition speed against recognition rate.
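
The abstract states only that the recognizer is built on Vector Quantization models; it does not give the codebook size, the training rule, or the distance measure. As a rough illustration of how such a system typically makes its decision, the sketch below trains one codebook per speaker with plain k-means and picks the speaker whose codebook gives the lowest average quantization distortion. The codebook size, the squared Euclidean distance, the function names, and the toy Gaussian vectors standing in for MFCC frames are all assumptions made for this example, not details taken from the paper.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(frames, size=4, iters=20, seed=0):
    """Build a VQ codebook with plain k-means (an assumed training rule)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(frames, size)]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in frames:
            nearest = min(range(size), key=lambda k: sq_dist(v, codebook[k]))
            cells[nearest].append(v)
        for k, cell in enumerate(cells):
            if cell:  # keep the old codeword if no frame was assigned to it
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

def distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in frames) / len(frames)

def identify(frames, models):
    """Return the speaker whose codebook quantizes the frames with least distortion."""
    return min(models, key=lambda name: distortion(frames, models[name]))

def toy_frames(center, n, rng):
    """Hypothetical stand-in for MFCC frames: Gaussian noise around a center."""
    return [[c + rng.gauss(0, 0.3) for c in center] for _ in range(n)]

rng = random.Random(1)
models = {
    "speaker_a": train_codebook(toy_frames([0.0, 0.0, 0.0], 60, rng)),
    "speaker_b": train_codebook(toy_frames([3.0, 3.0, 3.0], 60, rng)),
}
test = toy_frames([3.0, 3.0, 3.0], 20, rng)
print(identify(test, models))
```

In a real system the toy vectors would be replaced by per-frame feature vectors from whichever MFCC variant is being evaluated; the decision rule itself does not depend on how the features are extracted.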
Language: Chinese
Content type: Conference paper
Source URL: [http://hdl.handle.net/123456789/69978]
Collection: Tsinghua University
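Recommended citation
GB/T 7714
Xu Xin, Su Kai-na, Hu Qi-xiu, et al. Application of Several Improved MFCC Feature Extraction Methods to Speaker Recognition[C]. In: Proceedings of the First Joint Conference on Harmonious Human Machine Environment (HHME2005), First Joint Conference on Harmonious Human Machine Environment (HHME2005), Kunming, China, CNKI, China Computer Federation, China Society of Image and Graphics, ACM SIGCHI China Chapter, Department of Computer Science and Technology, Tsinghua University.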