Algorithm of estimating index sizes of resource collections in distributed search | |
Wu Sheng ; Li Xing | |
2010-10-12 ; 2010-10-12 | |
Keywords | Practical Theoretical or Mathematical/ indexing Internet maximum likelihood estimation probability query processing sampling methods/ index size estimation resource collection distributed search deep Web high frequent resample algorithm random query heterogeneous capture algorithm probability logistic function conditional maximum likelihood method/ C7250R Information retrieval techniques C7240 Information analysis and indexing C1140Z Other topics in statistics C7210N Information networks |
Abstract | Distributed search is an effective way to search the deep Web, and collection size is an important feature for collection representation and selection in distributed search. To estimate collection size in uncooperative environments, this paper proposes two novel algorithms. The high frequent resample algorithm first samples collections with random queries, then resamples with high-frequency queries from the sample set. The heterogeneous capture algorithm, based on the assumption that capture probabilities differ among documents, uses logistic functions and the conditional maximum likelihood method. Experimental results show that the algorithms outperform both the sample-resample and capture-recapture algorithms. |
Language | Chinese |
Publisher | Science Press ; China |
Content type | Journal article |
Source URL | [http://hdl.handle.net/123456789/82758] |
Collection | Tsinghua University |
Recommended citation (GB/T 7714) | Wu Sheng,Li Xing. Algorithm of estimating index sizes of resource collections in distributed search[J],2010, 2010. |
APA | Wu Sheng,&Li Xing.(2010).Algorithm of estimating index sizes of resource collections in distributed search.. |
MLA | Wu Sheng,et al."Algorithm of estimating index sizes of resource collections in distributed search".(2010). |