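Title: Design and Implementation of a Visualized ETL Tool in OnceDI (OnceDI中可视化ETL工具的设计与实现)
Author: Zhao Di (赵迪)
Degree type: Doctoral
Defense date: 2008-06-05
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place conferred: Institute of Software
Keywords: data integration; ETL; data transformation; middleware
Alternative title: Design and Implementation of Visualized ETL System in OnceDI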
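Abstract (translated from Chinese): With the rapid development of network technology and the deepening of enterprise informatization, the data, information, and knowledge distributed across an enterprise have become more diverse and more complex, and enterprise information systems have become more open. How to integrate and share this data, information, and knowledge has become a key problem. Data integration technology addresses exactly this need, enabling dynamic, flexible, real-time integration and sharing of distributed, heterogeneous, and complex data, information, and knowledge. OnceDI 2.0 solves the problem of interoperation among heterogeneous data sources at the data level well. It meets different data integration requirements, works across platforms and across many kinds of data sources, offers practical mechanisms such as incremental transfer and conflict resolution, and provides comprehensive security and management tools. It also has shortcomings, however: the receiving data source can only be defined from the data block that has been received, by which point the sending process is already complete; and the field correspondences between the sending and receiving data sources must be built entirely by hand. The goal of data integration is to give users a unified application interface for accessing multiple distributed, independent, heterogeneous data sources. Visual configuration of the ETL (Extract-Transform-Load) process raises questions such as how to help users understand the ETL process better and how to let them configure, manage, and execute it more effectively and easily. Building on a study of the characteristics of the data integration process, and centering on the problems of the visualized ETL process, the thesis sets data transformation and data filtering as its research directions. For data transformation, it works from two angles: schema matching and instance transformation. For schema matching, the thesis proposes an ontology-assisted automatic schema matching algorithm with three parts: computing attribute-name matches by combining decision tree learning with the WordNet lexical ontology; defining an attribute data type ontology to handle the matching of typed attributes; and using a domain ontology to build indirect mappings between attributes, which resolves one-to-many semantic matches. This approach makes the visual data transformation process simpler to operate and yields automatic matching results that users find more satisfactory. For instance transformation, the thesis proposes a design for an instance transformation tool with a friendlier interface that, more importantly, makes instance-level transformation operations clearer and simpler for users. For data filtering, starting from the characteristics of setting data quality control conditions, the thesis proposes a design for a tool that sets such conditions. Finally, based on the data integration model of OnceDI 3.0 and its three-tier architecture of client, control center, and DI server, the thesis designs and implements a visualized ETL tool for data integration, applying design patterns in the design to improve the extensibility of the system.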
English abstract: With the rapid development of network technologies and enterprise information technologies, the data, information, and knowledge distributed across enterprises are becoming more diverse and complex, while enterprise information systems are becoming more open. The distributed, heterogeneous, and complex data, information, and knowledge within enterprises need to be integrated and shared. Data integration technology enables dynamic, flexible, real-time integration and sharing of such data, information, and knowledge. OnceDI 2.0 solves many of these problems well: it supports interoperation of heterogeneous data sources at the data level, meets a variety of integration requirements, supports different platforms and data sources, provides incremental delivery and conflict resolution, and offers comprehensive tools for security and management. However, it has some flaws. The receiving data source can only be defined from the received data block, by which point the sending process has already finished. The column matching between the sending data source and the receiving data source must be done entirely by hand. The goal of data integration is to provide a unified application interface for users accessing distributed, heterogeneous, and complex data sources. A visualized ETL process raises problems such as how to help users understand the ETL process and how to let them configure, manage, and execute it more efficiently and easily. After studying the characteristics of data integration and focusing on the issues in its ETL process, the thesis sets data transformation and data filtering as its research directions. For data transformation, the thesis proposes a combined strategy of schema matching and instance transformation.
For schema matching, the thesis proposes a schema matching algorithm that makes data transformation in the ETL process easier and more user-friendly. For instance transformation, the thesis presents a solution that makes instance-based transformation clearer and easier. Furthermore, the thesis proposes two kinds of tools to address the data filtering problem. For the implementation, based on the data integration models above and the 3-tier architecture of OnceDI 3.0, the thesis presents the design and implementation of the visualized ETL tool in OnceDI 3.0. A series of design patterns is applied in the design to enhance the extensibility of the system.
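Language: Chinese
Date made public: 2011-03-17
Pages: 81
Content type: Dissertation
Source URL: http://124.16.136.157/handle/311060/6664
Collection: Institute of Software_Software Engineering Technology Research and Development Center_Dissertations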
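Recommended citation
GB/T 7714
Zhao Di (赵迪). OnceDI中可视化ETL工具的设计与实现 [Design and Implementation of a Visualized ETL Tool in OnceDI][D]. Institute of Software. Institute of Software, Chinese Academy of Sciences. 2008.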