CORC  > 北京大学  > 信息科学技术学院
PARADISE based search engine at TREC 2009 Web track
Shan, Dongdong ; Zhao, Dongsheng ; He, Jing ; Yan, Hongfei
2009
英文摘要In this paper, we introduce the PARADISE search engine in TREC09 Web track. PARADISE is the abbreviation for Platform for Applying, Research and Developing Intelligent Search Engine, which is a search engine platform developed by SEWM group, Peking University. The system is designed to support both English and Chinese information retrieval. This system preprocessed and indexed the five hundred million web pages for this year's Web Track. In the preprocessing stage, the templates were removed, the encoding were identified and unified, and the anchor texts and InLink information are extracted with the mapreduce framework (using Hadoop in this system). In retrieval, our runs used an extension of BM25. This model distinguishes terms from different fields and integrated both term counts and position information. Furthermore, some web based features are also considered.; EI; 0
语种英语
内容类型其他
源URL[http://ir.pku.edu.cn/handle/20.500.11897/327522]  
专题信息科学技术学院
推荐引用方式
GB/T 7714
Shan, Dongdong,Zhao, Dongsheng,He, Jing,et al. PARADISE based search engine at TREC 2009 Web track. 2009-01-01.
个性服务
查看访问统计
相关权益政策
暂无数据
收藏/分享
所有评论 (0)
暂无评论
 

除非特别说明,本系统中所有内容都受版权保护,并保留所有权利。


©版权所有 ©2017 CSpace - Powered by CSpace