A Bidirectional Hierarchical Skip-Gram Model for Text Topic Embedding
Suncong Zheng; Hongyun Bao; Jiaming Xu; Yuexing Hao; Zhenyu Qi; Hongwei Hao
2016
Conference date: 2016
Conference location: Canada
Abstract

Exploiting large-scale web corpora to mine the topics within texts effectively and efficiently is an essential problem in the era of big data. We focus on learning text topic embeddings in an unsupervised manner, an approach that is efficient and scalable. Text topic embedding represents words and documents in a semantic topic space, in which words and documents with similar topics are embedded close to each other. Compared with conventional topic models, which implicitly capture document-level word co-occurrence patterns, text topic embedding alleviates the data sparsity problem and captures the semantic relevance between different words and documents. To model text topic embedding, we propose a Bidirectional Hierarchical Skip-Gram model (BHSG) based on the skip-gram model. BHSG includes two components: a semantic generation module, which learns the semantic relevance between texts, and a topic enhance module, which produces the text topic embedding from the text embedding learned in the former module. We evaluate our method on two kinds of topic-related tasks: text classification and information retrieval. The experimental results on four public datasets and one dataset we provide all demonstrate that our proposed method achieves better performance.
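
For readers unfamiliar with the skip-gram model that BHSG builds on, the following is a minimal sketch of how generic skip-gram training pairs are formed from a token sequence. It illustrates only the standard word-level skip-gram windowing, not the bidirectional hierarchical model itself; the function name `skipgram_pairs`, the `window` parameter, and the sample tokens are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs using a symmetric context window.

    Hypothetical helper: this is the generic skip-gram pairing step,
    not the BHSG procedure described in the abstract.
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window to the sequence boundaries.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is never its own context
                pairs.append((center, tokens[j]))
    return pairs


# Illustrative input only; any tokenized text would do.
pairs = skipgram_pairs(["topic", "models", "embed", "documents"], window=1)
```

In a skip-gram model, each such pair becomes one training example in which the center token's embedding is used to predict the context token.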

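Proceedings publisher: IEEE
Proceedings place of publication: Canada
Content type: Conference paper
Source URL: [http://ir.ia.ac.cn/handle/173211/40650]
Collection: Research Center for Digital Content Technology and Services_Auditory Models and Cognitive Computing
Author affiliation: CASIA
Recommended citation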
GB/T 7714
Suncong Zheng, Hongyun Bao, Jiaming Xu, et al. A Bidirectional Hierarchical Skip-Gram Model for Text Topic Embedding[C]. In: . Canada. 2016.