An Automatic Chinese Text Classification System Based on a Rough Set Reduction Algorithm (基于Rough集约简算法的中文文本自动分类系统)
Sheng Xiao-wei (盛晓炜) ; Jiang Ming-hu (江铭虎)
2010-06-07
Keywords: Automatic classification, Rough set, Decision table, Reduction algorithm
Classification number: TP391.1
Alternative title: Automatic Classification of Chinese Documents Based on Rough Set and Improved Quick-Reduce Algorithm
Abstract: Existing automatic text classification (TC) methods depend on constructing document vectors whose components correspond to the feature terms in a document. Such vectors typically have thousands or even tens of thousands of dimensions, which makes computation expensive, so the vectors need to be reduced. Traditional frequency-based threshold filtering often discards useful information and lowers classification accuracy. This paper introduces rough set theory into automatic classification and proposes a new document vector reduction algorithm. Experiments show that the algorithm not only effectively reduces the size of the document vectors, but also loses less information and achieves higher accuracy than the traditional threshold method.
Funding: Ministry of Education Excellent Young Teachers Program; Ministry of Education Start-up Fund for Returned Overseas Scholars; Open Fund of the National Laboratory of Pattern Recognition; Tsinghua University Basic Research Fund.
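As a rough illustration of the kind of reduction the abstract describes, the sketch below applies the standard greedy Quick-Reduce heuristic from rough set theory to a tiny decision table. This is not the authors' improved algorithm, which this record does not describe. The function names (`partition`, `dependency`, `quick_reduce`), the toy terms, and the class labels are all hypothetical and chosen only for illustration; document vectors are assumed to have been discretized to 0/1 term presence.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Dependency degree: fraction of rows in the positive region, i.e. rows
    whose equivalence class carries exactly one decision label."""
    if not rows:
        return 0.0
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)

def quick_reduce(rows, labels, attrs):
    """Greedy Quick-Reduce: repeatedly add the attribute that raises the
    dependency degree the most, until it matches that of the full set."""
    target = dependency(rows, labels, attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max(
            (a for a in attrs if a not in reduct),
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return reduct

# Hypothetical decision table: one row per document, one column per term.
attrs = ["goal", "match", "stock", "the"]
rows = [
    {"goal": 1, "match": 1, "stock": 0, "the": 1},
    {"goal": 1, "match": 0, "stock": 0, "the": 1},
    {"goal": 0, "match": 0, "stock": 1, "the": 1},
    {"goal": 0, "match": 1, "stock": 1, "the": 1},
]
labels = ["sports", "sports", "finance", "finance"]

reduct = quick_reduce(rows, labels, attrs)
print(reduct)  # ['goal']
```

In this toy table a single term already separates the two classes, so the four-dimensional vectors shrink to one dimension without losing any ability to distinguish the decision labels. A frequency threshold, by contrast, would rank the uninformative term "the" highest because it appears in every document.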
Language: Chinese
Content type: Journal article
Source URL: [http://hdl.handle.net/123456789/44770]
Collection: Tsinghua University
Recommended citation
GB/T 7714: Sheng Xiao-wei, Jiang Ming-hu. 基于Rough集约简算法的中文文本自动分类系统[J]. 2010.
APA: Sheng Xiao-wei, & Jiang Ming-hu. (2010). 基于Rough集约简算法的中文文本自动分类系统.
MLA: Sheng Xiao-wei, et al. "基于Rough集约简算法的中文文本自动分类系统." (2010).