Algorithm of estimating index sizes of resource collections in distributed search
Wu Sheng ; Li Xing
2010-10-12 ; 2010-10-12
Keywords: Practical; Theoretical or Mathematical / indexing; Internet; maximum likelihood estimation; probability; query processing; sampling methods / index size estimation; resource collection; distributed search; deep Web; high frequent resample algorithm; random query; heterogeneous capture algorithm; probability; logistic function; conditional maximum likelihood method / C7250R Information retrieval techniques; C7240 Information analysis and indexing; C1140Z Other topics in statistics; C7210N Information networks
Abstract: Distributed search is an effective way to search the deep Web, and collection size is an important feature for collection representation and selection in distributed search. To estimate collection size in uncooperative environments, this paper proposes two novel algorithms. The high frequent resample algorithm first samples each collection with random queries, then resamples it using high-frequency queries taken from the sample set. The heterogeneous capture algorithm assumes that documents have different capture probabilities, and combines logistic functions with the conditional maximum likelihood method. Experimental results show that the proposed algorithms outperform both the sample-resample and the capture-recapture algorithms.
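To illustrate the resampling step described in the abstract, the sketch below applies the standard sample-resample ratio (a term's document frequency in the sample divided by the sample size approximates its reported hit count divided by the collection size) to the k most frequent terms of the sample and averages the resulting estimates. This is a minimal sketch under stated assumptions, not the paper's exact estimator: the function name `high_frequent_resample`, the parameter `k`, the averaging step, and the `hit_count` callable (standing in for the result count a search interface reports for a one-term query) are all hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Callable


def high_frequent_resample(
    sample_docs: list[set[str]],
    hit_count: Callable[[str], int],
    k: int = 5,
) -> float:
    """Hypothetical sketch of a resample-style collection-size estimate.

    sample_docs: documents retrieved by random-query sampling, each a set of terms.
    hit_count:   hypothetical callable returning the number of matches the remote
                 collection reports for a one-term query.
    k:           hypothetical parameter: how many of the most frequent sample terms
                 are sent back to the collection as resample queries.
    """
    if not sample_docs:
        raise ValueError("sample must contain at least one document")

    # Document frequency of every term inside the sample (each doc is a set,
    # so a term is counted at most once per document).
    sample_df: Counter = Counter()
    for doc in sample_docs:
        sample_df.update(doc)

    estimates = []
    for term, sdf in sample_df.most_common(k):
        # Sample-resample ratio: sdf / |S| ~ cdf / N, hence N ~ cdf * |S| / sdf.
        cdf = hit_count(term)
        estimates.append(cdf * len(sample_docs) / sdf)

    # Averaging over several high-frequency terms is an assumption of this sketch.
    return mean(estimates)


# Toy collection of 1,000 documents: "a" occurs everywhere, "b" in every third one.
collection = [{"a", "b"} if i % 3 == 0 else {"a"} for i in range(1000)]
sample = collection[::10]  # deterministic stand-in for a 100-document random-query sample
print(high_frequent_resample(sample, lambda t: sum(t in d for d in collection), k=2))
```

With `k=1` only the term "a" is used and the toy estimate is exactly 1,000; adding the rarer term "b" moves the average slightly away from the true size, which shows how the estimate depends on which terms are chosen for resampling.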
Language: Chinese
Publisher: Science Press; China
Content type: Journal article
Source URL: http://hdl.handle.net/123456789/82758
Collection: Tsinghua University
Recommended citation
GB/T 7714: Wu Sheng, Li Xing. Algorithm of estimating index sizes of resource collections in distributed search[J]. 2010, 2010.
APA: Wu Sheng, & Li Xing. (2010). Algorithm of estimating index sizes of resource collections in distributed search.
MLA: Wu Sheng, et al. "Algorithm of estimating index sizes of resource collections in distributed search." (2010).