Machine Learning on Structural Sample Sets


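Title	Machine Learning on Structural Sample Sets (带结构样本集上的机器学习)
Author	Han Yanjun (韩彦军)
Degree	Doctor of Engineering
Defense date	2011-05-28
Degree-granting institution	Graduate University of Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Advisor	Wang Jue (王珏)
Keywords	structural sample set; inductive logic programming; rule ensemble; multiple instance learning; recommendation systems; community detection
Other title	Learning On Structural Sample Set
Major	Pattern Recognition and Intelligent Systems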
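Chinese abstract	Learning on structural sample sets comprises three learning paradigms of different scales, whose objects of study range from the most microscopic structure, inside a single sample, to the most macroscopic structure, over the whole sample set. In the first paradigm the structure lies inside each sample, i.e., rule learning. Here each sample is a set of fragments connected by relations and cannot be represented as a vector in Euclidean space. We use classic rule learning algorithms such as Progol and FOIL to combine fragments and relations into rules, and then integrate the generated rules with multiple kernel learning. It can be proved that, when multiple kernel learning is performed with kernel matrices generated from logical rules, all other kernels can be equivalently converted into linear kernels. On this basis, we iteratively generate rules with a modified FOIL algorithm, construct the corresponding linear kernels, and perform multiple kernel optimization, which yields a linear classifier on the feature space induced by the rules. The algorithm is "bi-sparse": it obtains support vectors and support rules simultaneously. In addition, we design a rule complexity penalty based on ℓ1 regularization, so that the generated rules are both fewer and simpler; it can be proved that if a locally optimal rule is generated at each iteration, the resulting rule set is globally optimal. Comparative experiments show a clear improvement in generalization performance over other methods. In the second paradigm the structure lies among subsets of samples, i.e., multiple instance learning. The difficulty of this problem is that the label information is insufficient, in that all instances in a positive bag are labeled positive. There are two kinds of prediction error in this problem: false negatives and false positives. Most current methods concentrate on avoiding false negatives. We make full use of the geometric distribution of the instances inside positive bags, which allows both false negatives and false positives to be avoided. Based on kernel principal component analysis, we design a projection constraint for each positive bag; the constraint keeps the instances of the bag as far as possible from the separating hyperplane and places positive and negative instances on opposite sides of it. The resulting optimization problem can be solved with the Constrained Concave-Convex Procedure. We also propose the idea of latent sign consistency and design a specific loss function to realize it. The resulting problem can be solved efficiently with the Constrained Concave-Convex Procedure; its dual contains support vectors that correspond to instances as well as support vectors that correspond to bags, and we call it the α/β support vector machine. Our method in effect carries out a kernel learning process that automatically adjusts the weights of the instances in positive bags during the iterations. Comparative experiments show that our algorithms generalize better. In the third paradigm the structure lies over the whole sample set. This paradigm contains two independent problems. The aim of the first is to provide information to every node in the structure, i.e., recommender systems. We use support vector machines to compute each user's information value and information acceptance threshold, and on this basis design a recommender system that automatically revises user values and thresholds according to user feedback. The method can detect changes in user preference and is not easily affected by shills. The aim of the second problem is to discover substructures hidden in the global structure, i.e., community detection. We treat popular nodes as labeled data and the global structure as an adjacency graph, and then apply semi-supervised learning to discover the substructures. Compared with the most recent methods in the field, the advantage of our method is that it comes with a generalization guarantee. Experimental results show that our method improves on the test data sets.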
English abstract	Learning on structural sample sets consists of three learning schemes with structures of different scales, ranging from the most microscopic structure inside each sample to the most macroscopic structure over the whole sample set. The first learning scheme deals with the structure inside each sample, i.e., rule learning. In this situation, samples are given in the form of fragments connected by relations and cannot be represented by vectors in Euclidean space. We employ classic rule learning algorithms, such as Progol and FOIL, to form single rules from the fragments and relations inside samples, and we use ℓ1 regularization to produce sparse rule combinations. Furthermore, we design a rule complexity penalty to encourage rules with fewer literals. It is proved that if a locally optimal rule is generated at each iteration, the final rule set will be globally optimal. In addition, we propose a multiple kernel learning approach for learning rules. We prove that for multiple kernel learning with kernels induced by logical rules, it suffices to use the linear kernel. Based on this, by iteratively constructing rules with a modified FOIL algorithm and performing the corresponding multiple kernel optimization, the proposed approach realizes an additive model on the feature space induced by the obtained rules. The algorithm is characterized by its "bi-sparsity", i.e., support rules and support vectors are obtained simultaneously. Experimental results demonstrate that our approach has better prediction accuracy than previous approaches. Moreover, the output classifier has a straightforward interpretation and relies on a smaller number of rules. The second learning scheme deals with the structure among subsets of samples, i.e., multiple instance learning. The difficulty in this situation is that the label information for positive samples is incomplete, in that the instances in a given positive bag are all labeled positive. There are two kinds of prediction failure, i.e., false negatives and false positives. Current research mainly focuses on avoiding the former. We attempt to use the geometric distribution of instances inside positive bags to avoid both. Based on kernel principal component analysis, we define a projection constraint for each positive bag that classifies its constituent instances far away from the separating hyperplane while placing positive instances and negative instances on opposite sides. We ap...
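Language	Chinese
Other identifier	200518014628081
Content type	Dissertation
Source URL	[http://ir.ia.ac.cn/handle/173211/6365]
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Han Yanjun. Machine Learning on Structural Sample Sets (带结构样本集上的机器学习) [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences. 2011.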
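
Illustrative sketch (not taken from the thesis): the abstract states that, for multiple kernel learning with kernels induced by logical rules, the linear kernel suffices. One elementary identity behind this kind of statement can be checked numerically. If each rule j is treated as a binary feature (1 when the rule covers a sample, 0 otherwise) and induces the rank-one kernel K_j = phi_j phi_j^T, then any non-negative combination of the K_j equals a single linear kernel on rule features rescaled by the square roots of the weights. The toy samples, the three rules, the weights, and the helper names `rule_features` and `combined_rule_kernel` below are all hypothetical and chosen only for illustration; the thesis's modified FOIL rule generation and its multiple kernel optimization are not reproduced here.

```python
import numpy as np

# Hypothetical toy data: each sample is a set of relational facts
# (relation, arg1, arg2), i.e. fragments connected by relations.
samples = [
    {("bond", "c", "o"), ("bond", "c", "h")},
    {("bond", "c", "h")},
    {("bond", "n", "o"), ("bond", "c", "o")},
    {("bond", "n", "h")},
]

# Hypothetical rules: each rule maps a sample to True (covered) or False.
rules = [
    lambda s: ("bond", "c", "o") in s,       # contains a c-o bond
    lambda s: any(f[1] == "n" for f in s),   # some bond starts at an n atom
    lambda s: len(s) >= 2,                   # has at least two facts
]


def rule_features(samples, rules):
    """Binary matrix Phi with Phi[i, j] = 1.0 if rule j covers sample i."""
    return np.array([[float(r(s)) for r in rules] for s in samples])


def combined_rule_kernel(Phi, d):
    """Weighted sum of the rank-one kernels K_j = phi_j phi_j^T."""
    return sum(d_j * np.outer(Phi[:, j], Phi[:, j]) for j, d_j in enumerate(d))


Phi = rule_features(samples, rules)

# Hypothetical non-negative kernel weights; a zero weight drops a rule,
# which is how sparsity over rules would show up in this picture.
d = np.array([0.5, 0.0, 2.0])

K_mkl = combined_rule_kernel(Phi, d)

# The same Gram matrix as one linear kernel on rescaled rule features.
Phi_scaled = Phi * np.sqrt(d)
K_linear = Phi_scaled @ Phi_scaled.T

print(np.allclose(K_mkl, K_linear))
```

In this picture, learning the kernel weights amounts to learning per-rule scalings of a linear model, so a sparse weight vector selects a small set of rules while a support vector machine trained on the combined kernel selects support vectors. This is only a reading aid for the "bi-sparsity" mentioned in the abstract, under the assumptions stated above.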
