A scalable crawler framework for FLOSS data
Zhang, Lingxiao ; Zou, Yanzhen ; Xie, Bing
2013
Abstract (English): Free / Libre / Open Source Software (FLOSS) data, such as bug reports, mailing lists and related webpages, contain valuable information for reusing open source software projects. Before conducting further experiments on FLOSS data, researchers often need to download these data into a local storage system. We refer to this preprocessing step as FLOSS data retrieval, which in many cases can be a challenging task. In this paper, we propose a crawler framework to ease the process of FLOSS data retrieval. To cope with the various types of FLOSS data scattered across the Internet, we designed the framework in a scalable manner, so that a crawler program can be easily plugged into the system to extend its functionality. Researchers can perform the retrieval process on datasets of various types and sources simply by adding new configurations to the system. We have implemented the framework and provide its basic functions via web-based interfaces. We present the usage of the system through a detailed case study in which we retrieved various types of datasets related to the Apache Lucene project using our framework.; EI; 0
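Language: English
DOI: 10.1145/2532443.2532454
Content type: Other
Source URL: [http://ir.pku.edu.cn/handle/20.500.11897/294350]
Collection: School of Information Science and Technology (信息科学技术学院)
Recommended citation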
GB/T 7714
Zhang, Lingxiao,Zou, Yanzhen,Xie, Bing. A scalable crawler framework for FLOSS data. 2013-01-01.