Integration of Text Information and Graphic Composite for PDF Document Analysis
Xu, Canhui ; Tang, Zhi ; Tao, Xin ; Shi, Cao
2012
Keywords: PDF document; graphic segmentation; graph based method; text clustering; SEGMENTATION; RECOGNITION; EXTRACTION
Abstract: The trend toward large-scale digitization has strongly motivated research on processing PDF documents that carry little structural information. Challenging problems, such as segmenting graphics together with their associated text, remain unsolved and stand in the way of practical PDF layout analysis. To cope with such documents, a hybrid method that combines text information with graphic composites is proposed to segment pages that are difficult for traditional methods to handle. Specifically, the text information is derived accurately from born-digital documents, which embed low-level structure elements in explicit form. The text elements of a page are then clustered with a graph based method according to proximity and feature similarity. Meanwhile, the graphic components are extracted by means of texture and morphological analysis. By integrating the clustered text elements with the image-based graphic components, the graphics are segmented for layout analysis. Experimental results on pages of PDF books show satisfactory performance.
Web of Science record: http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000315974300002&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=8e1609b174ce4e31116a60747a720701
Subject categories: Computer Science, Artificial Intelligence; Computer Science, Theory & Methods
Indexed in: EI; CPCI-S(ISTP)
Language: English
DOI: 10.1007/978-3-642-34456-5_2
Content type: Other
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/321194
Collection: School of Electronics Engineering and Computer Science
Recommended citation
GB/T 7714
Xu, Canhui, Tang, Zhi, Tao, Xin, et al. Integration of Text Information and Graphic Composite for PDF Document Analysis. 2012-01-01.
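
The abstract describes clustering page text elements with a graph based method according to proximity and feature similarity. The following is only a minimal sketch of that general idea, not the authors' algorithm: it assumes each text element is a bounding box with a font size, links two elements when they are close and similarly sized, and takes the connected components of the resulting graph as clusters. The `TextBox` structure, the `cluster_text_boxes` function, and both thresholds are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TextBox:
    """A text element with a bounding box and font size (hypothetical structure)."""
    x0: float
    y0: float
    x1: float
    y1: float
    font_size: float


def gap(a: TextBox, b: TextBox) -> float:
    """Euclidean gap between two axis-aligned boxes (0 if they touch or overlap)."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def cluster_text_boxes(boxes, max_gap=12.0, max_font_diff=0.5):
    """Group boxes that are close together and share a similar font size.

    Both thresholds are illustrative assumptions, not values from the paper.
    Returns a sorted list of clusters, each a list of box indices.
    """
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add a graph edge (merge components) for every pair that is both
    # spatially close and similar in the chosen feature.
    for i, j in combinations(range(len(boxes)), 2):
        a, b = boxes[i], boxes[j]
        if gap(a, b) <= max_gap and abs(a.font_size - b.font_size) <= max_font_diff:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(boxes)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    page = [
        TextBox(0, 0, 100, 10, 10.0),     # body line
        TextBox(0, 14, 100, 24, 10.0),    # next body line, same font
        TextBox(0, 28, 100, 36, 8.0),     # nearby caption in a smaller font
        TextBox(0, 200, 100, 210, 10.0),  # distant line, same font as the body
    ]
    print(cluster_text_boxes(page))
```

In this sketch the caption stays separate from the body text despite being adjacent, because the font-size feature differs; a fuller treatment would likely weight several features and combine the result with the image-based graphic components mentioned in the abstract.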