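Title: User Behavior Data Collection and Preprocessing for E-Government (面向电子政务的用户行为数据收集与预处理)
Author: Liu Yasi (刘雅思)
Degree: Master's
Defense date: 2016-05-29
Degree-granting institution: University of Chinese Academy of Sciences
Place of conferral: Beijing
Supervisor: Li Xiao (李晓)
Keywords: e-government; government websites; Web log mining; Web usage mining; data analysis
Major: Computer Application Technology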
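Abstract (Chinese, translated): With the rapid development of the Internet, the ways people communicate and obtain information have changed greatly, and the Internet has become people's main source of information. As the core of e-government, government websites have gradually become the mainstream platform for governments to publish policies, laws, and information. The way the public uses these websites has changed accordingly: the public expects to communicate with the relevant government departments through the website and to offer supervisory feedback. The Chinese government strongly supports the development of e-government. After years of effort, government websites at all levels have steadily improved and have accumulated massive volumes of log files. How effectively the usage data of government-website users is collected and preprocessed directly affects the results of mining the patterns latent in that data, which makes it a topic well worth studying. This thesis studies user behavior data collection and preprocessing for e-government. First, it surveys the history and current state of e-government in China. It analyzes the functions, characteristics, and user experience of government websites and identifies problems in their development. To address these problems, it weighs how difficult each user behavior data collection method is against the data granularity actually required, and adopts a collection method based on server logs. To make data cleaning more efficient in practice, it proposes an improved version of the SNM (Sorted Neighborhood Method) algorithm. The improved version adds length filtering and handling of missing attributes, which raises both the accuracy and the efficiency of data cleaning. Tailored to the characteristics of government-website user behavior, it implements heuristic algorithms for user identification, session identification, and path completion, and validates the effectiveness of the user identification step. Finally, it deploys and runs a user behavior data collection and preprocessing platform for e-government. It analyzes actual operational log data from government websites, reports the resulting analysis, and evaluates the platform's performance.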
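The abstract names the two additions to SNM, length filtering and tolerance of missing attributes, but does not give the algorithm itself. The Python sketch below shows one way those ideas can fit together under stated assumptions; it is not the thesis's LF-SNM.

The following are all illustrative assumptions:

- the record fields;
- the normalized edit-distance similarity;
- the rule that every field present in both records must reach `threshold`;
- the parameters `window`, `threshold`, and `min_common`.

The length filter relies on a simple bound. Edit distance is at least the difference in string lengths, so normalized similarity can never exceed `min(len)/max(len)`. A pair whose length ratio is already below the threshold can therefore be rejected without computing the edit distance.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]  # hypothetical: one cleaned log record per dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def field_similarity(a: str, b: str, threshold: float) -> float:
    """Normalized edit similarity with a length filter (assumed measure).

    Since edit_distance(a, b) >= |len(a) - len(b)|, similarity is at most
    min(len)/max(len); if that bound is below threshold, skip the DP.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    if min(len(a), len(b)) / longest < threshold:  # length filter
        return 0.0
    return 1.0 - edit_distance(a, b) / longest


def is_duplicate(r1: Record, r2: Record, fields: List[str],
                 threshold: float = 0.8, min_common: int = 2) -> bool:
    """Assumed rule: every field present in BOTH records must be similar.

    A field missing on either side is skipped instead of counted as a
    mismatch; at least min_common fields must actually be compared.
    """
    compared = 0
    for f in fields:
        a, b = r1.get(f), r2.get(f)
        if not a or not b:  # missing attribute: tolerate, do not penalize
            continue
        compared += 1
        if field_similarity(a, b, threshold) < threshold:
            return False
    return compared >= min_common


def lf_snm(records: List[Record], key_fields: List[str], fields: List[str],
           window: int = 3, **kw) -> List[Tuple[int, int]]:
    """Sorted-neighborhood pass: sort by a key, compare each record only with
    the previous window - 1 records, and return duplicate index pairs."""
    def sort_key(i: int) -> str:
        return "|".join((records[i].get(f) or "") for f in key_fields)

    order = sorted(range(len(records)), key=sort_key)
    pairs: List[Tuple[int, int]] = []
    for pos, i in enumerate(order):
        for j in order[max(0, pos - window + 1):pos]:
            if is_duplicate(records[i], records[j], fields, **kw):
                pairs.append((min(i, j), max(i, j)))
    return pairs


# Hypothetical log records; the second lacks a User-Agent value.
logs = [
    {"ip": "10.0.0.1", "url": "/zwgk/index.html", "agent": "Mozilla/5.0"},
    {"ip": "10.0.0.1", "url": "/zwgk/index.htm", "agent": None},
    {"ip": "10.0.0.2", "url": "/bsfw/list.html", "agent": "Mozilla/5.0"},
]
print(lf_snm(logs, key_fields=["ip", "url"], fields=["ip", "url", "agent"]))  # [(0, 1)]
```

The window keeps the number of comparisons roughly linear in the number of records after sorting. The length filter then cuts the cost of each comparison that cannot succeed. Skipping missing fields, instead of scoring them as zero, keeps an incomplete record from being treated as distinct only because one attribute was not logged.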
Abstract (English): The rapid development of the Internet has greatly changed how people communicate and obtain information, and the Internet has become people's main source of information. As the core of e-government, government websites have become the platform for publishing relevant policies, laws, and information, and the way people use them has changed as well. The public hopes to communicate with the relevant officials through government websites, to exercise supervision, and to offer their own advice. The Chinese government strongly supports the development of e-government. After years of effort, government websites at all levels have steadily improved, and vast amounts of log data have accumulated. How to effectively collect and preprocess the usage data of government-website users is a subject worth studying, because it directly affects the results of subsequent data mining. This thesis studies user behavior data collection and preprocessing for e-government. First, we investigate the history and current status of e-government in China, analyze the functions, features, and user experience of government websites, and point out their shortcomings. To address these problems, we weigh the advantages and disadvantages of different collection methods against the required granularity of user behavior data, and collect data from server logs. To improve data cleaning efficiency, we propose an improved SNM (Sorted-Neighborhood Method) algorithm based on length filtering and dynamic fault tolerance (LF-SNM), which improves the accuracy and efficiency of data cleaning. Considering the characteristics of e-government website users, we propose heuristic rules for user identification, session identification, and path completion. Finally, we successfully run a user behavior data collection and preprocessing platform for e-government, which analyzes e-government website log data and produces result reports.
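The abstract does not spell out the thesis's own user identification, session identification, and path completion rules. The sketch below stands in with heuristics that are common in the Web usage mining literature:

- each distinct (IP, User-Agent) pair is treated as one user;
- a gap of more than 30 minutes between requests starts a new session;
- a referrer pointing to an earlier page, not the previous one, is read as Back navigation served from the browser cache, so the pages walked back through are re-inserted.

The following are hypothetical:

- the `Hit` record;
- the `SESSION_TIMEOUT` value;
- the function names;
- the assumption that referrers are already normalized to the same path form as requested URLs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass
class Hit:
    """One parsed server-log request (hypothetical structure)."""
    ip: str
    agent: str
    time: datetime
    url: str
    referrer: Optional[str]  # None or "-" when no referrer was sent


SESSION_TIMEOUT = timedelta(minutes=30)  # assumed session gap threshold


def identify_users(hits: List[Hit]) -> Dict[Tuple[str, str], List[Hit]]:
    """Heuristic: one user per distinct (IP, User-Agent) pair, hits in time order."""
    users: Dict[Tuple[str, str], List[Hit]] = {}
    for h in sorted(hits, key=lambda x: x.time):
        users.setdefault((h.ip, h.agent), []).append(h)
    return users


def split_sessions(user_hits: List[Hit],
                   timeout: timedelta = SESSION_TIMEOUT) -> List[List[Hit]]:
    """Start a new session when consecutive requests are more than timeout apart."""
    sessions: List[List[Hit]] = []
    for h in user_hits:
        if not sessions or h.time - sessions[-1][-1].time > timeout:
            sessions.append([h])
        else:
            sessions[-1].append(h)
    return sessions


def complete_path(session: List[Hit]) -> List[str]:
    """Path completion: if the referrer is an earlier page rather than the
    previous one, assume Back was pressed (cached pages never reach the
    server log) and re-insert the pages walked back through."""
    path: List[str] = []
    for h in session:
        ref = h.referrer
        if path and ref and ref != "-" and ref != path[-1] and ref in path:
            k = len(path) - 1 - path[::-1].index(ref)  # latest occurrence of ref
            path.extend(reversed(path[k:-1]))           # the Back steps
        path.append(h.url)
    return path


t0 = datetime(2016, 1, 4, 9, 0)  # hypothetical timestamps and URLs
hits = [
    Hit("10.0.0.1", "Mozilla/5.0", t0, "/index.html", None),
    Hit("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=1), "/zwgk/", "/index.html"),
    Hit("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=2), "/zwgk/a.html", "/zwgk/"),
    Hit("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=3), "/bsfw/", "/index.html"),
    Hit("10.0.0.1", "Mozilla/5.0", t0 + timedelta(hours=2), "/index.html", None),
    Hit("10.0.0.2", "Mozilla/5.0", t0, "/index.html", None),
]
for user, user_hits in identify_users(hits).items():
    for s in split_sessions(user_hits):
        print(user, complete_path(s))
```

In this example, the first user's request for `/bsfw/` carries `/index.html` as its referrer while the previous page was `/zwgk/a.html`. The completed path therefore inserts `/zwgk/` and `/index.html` before `/bsfw/`. The request two hours later falls outside the 30-minute window and opens a second session.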
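Content type: Degree thesis
Source URL: http://ir.xjipc.cas.cn/handle/365002/4559
Collection: Xinjiang Technical Institute of Physics and Chemistry, Multilingual Information Technology Laboratory
Author affiliation: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences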
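Recommended citation (GB/T 7714):
Liu Yasi. User Behavior Data Collection and Preprocessing for E-Government[D]. Beijing: University of Chinese Academy of Sciences, 2016.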