Identification of embedded mathematical formulas in PDF documents using SVM
Lin, Xiaoyan ; Gao, Liangcai ; Tang, Zhi ; Hu, Xuan ; Lin, Xiaofan
2012
Keywords: Mathematical formula recognition; formula identification; embedded formulas; PDF documents; EXPRESSIONS
Abstract: With the tremendous popularity of the PDF format, recognizing mathematical formulas in PDF documents has become a new and important problem in the field of document analysis. In this paper, we present a method for identifying embedded mathematical formulas in PDF documents, based on a Support Vector Machine (SVM). The method first segments text lines into words and then classifies each word into one of two classes: formula or ordinary text. Various features of embedded formulas, including geometric layout, character content, and context content, are used to build a robust and adaptable SVM classifier. Embedded formulas are then extracted by merging the words labeled as formulas. Experimental results show good performance of the proposed method. Furthermore, the method has been successfully incorporated into a commercial software package for large-scale e-Book production.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Optics; EI; CPCI-S(ISTP); 0
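The three stages named in the abstract (segment a text line into words, label each word as formula or ordinary text, merge adjacent formula words) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the names `word_features`, `rule_classifier`, `merge_formula_words`, and `identify_embedded_formulas` are hypothetical, the features are toy character-level ones (the paper also uses geometric layout and context features, which plain text does not carry), and a hand-written rule stands in for the trained SVM.

```python
from typing import Callable, List, Tuple

# Characters treated as "mathematical" in this toy example (an assumption).
MATH_CHARS = set("=+-*/^_<>()[]{}|∑∫√≤≥≠±×÷∞αβγδθλμπσφω")

Features = Tuple[float, float, float]


def word_features(word: str) -> Features:
    """Toy character-level features for one word (illustrative only)."""
    n = max(len(word), 1)  # guard against empty strings
    math_ratio = sum(ch in MATH_CHARS for ch in word) / n
    digit_ratio = sum(ch.isdigit() for ch in word) / n
    is_single_letter = 1.0 if len(word) == 1 and word.isalpha() else 0.0
    return (math_ratio, digit_ratio, is_single_letter)


def rule_classifier(features: Features) -> bool:
    """Stand-in for the SVM: True means the word is labeled as formula.

    A trained binary classifier with the same input/output shape
    could be plugged in here instead.
    """
    math_ratio, _digit_ratio, is_single_letter = features
    return math_ratio > 0 or is_single_letter == 1.0


def merge_formula_words(words: List[str], labels: List[bool]) -> List[str]:
    """Merge each maximal run of consecutive formula-labeled words."""
    formulas: List[str] = []
    current: List[str] = []
    for word, is_formula in zip(words, labels):
        if is_formula:
            current.append(word)
        elif current:
            formulas.append(" ".join(current))
            current = []
    if current:  # flush a run that reaches the end of the line
        formulas.append(" ".join(current))
    return formulas


def identify_embedded_formulas(
    line: str, classify: Callable[[Features], bool] = rule_classifier
) -> List[str]:
    """Segment a line into words, classify each word, merge formula runs."""
    words = line.split()
    labels = [classify(word_features(w)) for w in words]
    return merge_formula_words(words, labels)


print(identify_embedded_formulas("The energy is E = m c^2 for any mass"))
```

The toy rule mislabels ordinary one-letter words such as the article "a" and hyphenated words, which is the kind of ambiguity that layout and context features combined with a trained classifier are meant to resolve.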
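Language: English
DOI: 10.1117/12.912445
Content type: Other
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/321209
Collection: 信息科学技术学院 (School of Information Science and Technology), 北京大学 (Peking University)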
Recommended citation (GB/T 7714):
Lin, Xiaoyan; Gao, Liangcai; Tang, Zhi; et al. Identification of embedded mathematical formulas in PDF documents using SVM. 2012-01-01.