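Title: 手写数字、汉字识别研究与应用 (Research and Application of Handwritten Digit and Chinese Character Recognition)
Author: Huang Lei (黄磊)
Degree: Doctor of Engineering
Defense date: 2003-06-01
Degree-granting institution: Graduate School of the Chinese Academy of Sciences
Place of award: Institute of Automation, Chinese Academy of Sciences
Supervisor: Liu Changping (刘昌平)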
Keywords: large-scale pattern recognition; handwritten Chinese character recognition; handwritten digit recognition; mail sorting
Alternative title: Research and Application of Handwritten Digit and Chinese Character Recognition
Major: Pattern Recognition and Intelligent Systems
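Abstract (Chinese, translated): To improve large-scale handwriting recognition, this thesis studies feature extraction, dimensionality reduction, clustering, training methods, classifier design, classifier combination, and post-recognition processing. Building on this work, handwritten Chinese character recognition and digit recognition are applied to real-world problems. The main contributions are:
1. The directional element feature commonly used in Chinese character recognition is analyzed in detail, and grid division is replaced by a hierarchical scanning method. This raises the recognition rate of the directional element feature by 3.9% relative to the 8×8+7×7 two-level division method. The relationship between feature post-processing and recognition rate is then studied.
2. Geodesic paths are combined with non-parametric dimensionality reduction to obtain an optimized dimensionality reduction method. For large-scale pattern recognition, a simplification strategy for the algorithm is also given, so that the geodesic-distance-optimized non-parametric dimensionality reduction trains much faster while preserving the recognition rate and can be applied to both Chinese character and digit recognition. The method raises the Chinese character recognition rate by 1.5%.
3. An overlapping clustering algorithm over multiple features is proposed, which fuses the clustering results obtained with different features. It gives good results in digit recognition.
4. Because misclassified samples strongly influence classification, a hierarchical learning method is proposed; it reduces the digit recognition error rate by 34.6%. For Chinese character recognition, a non-parametric mean classifier is proposed. For large sample sets of handwritten digits, a simplified parametric classifier based on Gaussian mixture distributions is proposed, together with a corresponding competitive learning algorithm.
5. An adaptive weighted multi-classifier combination method integrated with voting is proposed and used to design a handwritten Chinese character recognition system and a handwritten digit recognition system. The digit recognition rate is 99.15%, and the Chinese character recognition system achieves 90.32% on the 1998 863 Program test set.
6. Two-class classification is used to distinguish confusable Chinese characters, and the errors on confusable characters are analyzed in detail. The Bayesian probabilistic subspace method is then improved, raising the Chinese character recognition rate to 90.55%.
7. Handwritten digit and Chinese character recognition are applied in practice: a handwritten mail sorting software system that combines postal code recognition with address recognition was studied and designed, and it is now being deployed.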
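The abstract does not say how the geodesic distances are computed. A common approach, used in Isomap, approximates the geodesic distance between samples by the shortest path through a k-nearest-neighbour graph. The following is a minimal sketch of that generic technique in plain Python; the function names `knn_graph` and `geodesic_distances`, the parameter `k`, and the use of Dijkstra's algorithm are assumptions for illustration, not the thesis's algorithm or its simplification strategy.

```python
import heapq
import math


def knn_graph(points, k):
    """Build an undirected k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            # Add the edge in both directions so the graph is undirected.
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(points, k):
    """Approximate geodesic distances as shortest paths in the k-NN graph.

    Hypothetical helper: runs Dijkstra from every point; unreachable pairs stay inf.
    """
    graph = knn_graph(points, k)
    n = len(points)
    result = [[math.inf] * n for _ in range(n)]
    for src in range(n):
        dist = result[src]
        dist[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry, a shorter path was already found
            for v, w in graph[u].items():
                nd = d + w
                if nd < dist[v]:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
    return result


if __name__ == "__main__":
    # Points along an L-shaped curve.
    pts = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    g = geodesic_distances(pts, k=2)
    print(g[0][4], math.dist(pts[0], pts[4]))
```

On this L-shaped point set, the graph distance between the two ends follows the bend (4.0), while the straight-line Euclidean distance cuts the corner (about 2.83). Running Dijkstra from every sample grows quickly with the number of samples, which illustrates why some simplification is needed at the scale of handwritten character data.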
Abstract (English): To improve the performance of handwriting recognition, this thesis studies several key components of statistical pattern recognition, including feature extraction, dimensionality reduction, clustering, classifier design, and classifier combination. The research results have also been integrated into a practical application system. The main contributions of this thesis are as follows:
1. The direction element feature (DEF) is analyzed, and the traditional grid-based feature extraction method is replaced by a hierarchical scanning method. Compared with the 8×8+7×7 statistical method, accuracy is improved by 3.9%.
2. An optimized dimensionality reduction algorithm is proposed that combines geodesic paths with non-parametric dimensionality reduction. A simplified algorithm is also proposed for large-scale pattern recognition problems. The proposed method improves accuracy by 1.5%.
3. An overlapping clustering algorithm is proposed to combine clustering results obtained with different features.
4. Because misclassified training samples strongly affect classification performance, a hierarchical training method is proposed, which reduces the error rate of digit recognition by 34.6%. In addition, a non-parametric mean-based classifier is proposed for handwritten Chinese character recognition, and a parametric classifier based on a Gaussian mixture model is proposed for handwritten digit recognition, together with a competitive learning algorithm that further improves accuracy.
5. A novel classifier combination method that integrates voting with adaptive weighting is presented. Using it, a handwritten digit recognition system and a handwritten Chinese character recognition system are implemented, with accuracies of 99.15% and 90.32%, respectively.
6. A two-class classification method is introduced to distinguish between pairs of confusable characters. After analyzing these confusable characters, an improved Bayesian probabilistic subspace method is proposed, which raises the accuracy of handwritten Chinese character recognition to 90.55%.
7. Based on these handwriting recognition techniques, an automatic mail sorting system has been designed and released in China.
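The record does not specify how voting and adaptive weighting are combined. One plausible scheme, given here only as a hypothetical sketch: accept a label when a strict majority of classifiers agree on it, and otherwise fuse per-label confidences weighted by each classifier's accuracy on a validation set. The names `combine` and `accuracy_weights` and the accuracy-based weighting rule are assumptions, not the thesis's method.

```python
from collections import Counter


def combine(predictions, scores, weights):
    """Hypothetical combiner: majority vote first, weighted score fusion on disagreement.

    predictions: top-1 label from each classifier.
    scores: per-classifier dict mapping label -> confidence.
    weights: per-classifier weight (assumed here to come from accuracy_weights).
    """
    label, count = Counter(predictions).most_common(1)[0]
    if count > len(predictions) / 2:
        return label  # a strict majority of classifiers agrees
    # No majority: sum weighted confidences over all candidate labels.
    fused = Counter()
    for w, s in zip(weights, scores):
        for lab, conf in s.items():
            fused[lab] += w * conf
    return fused.most_common(1)[0][0]


def accuracy_weights(val_correct):
    """Assumed adaptive weighting: validation accuracy per classifier, normalised to sum to 1.

    val_correct: per classifier, a list of 1 (correct) / 0 (wrong) validation outcomes.
    """
    accs = [sum(c) / len(c) for c in val_correct]
    total = sum(accs)
    return [a / total for a in accs]


if __name__ == "__main__":
    w = accuracy_weights([[1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 0, 0]])
    print(combine(["3", "3", "8"], [{}, {}, {}], w))
    print(combine(
        ["1", "7", "4"],
        [{"1": 0.6, "7": 0.4}, {"7": 0.9, "1": 0.1}, {"4": 0.5, "1": 0.5}],
        w,
    ))
```

With two of three classifiers voting "3", the vote decides directly. When all three disagree, the most accurate classifier (weight 0.5) and the partial support from the others make "1" win the weighted fusion.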
Language: Chinese
Other identifier: 739
Content type: Thesis
Source URL: http://ir.ia.ac.cn/handle/173211/5761
Collection: Graduates / Doctoral Dissertations
Recommended citation (GB/T 7714):
Huang Lei. 手写数字、汉字识别研究与应用[D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences. 2003.
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.