Understanding the limiting factors of topic modeling via posterior contraction analysis
Tang, Jian; Meng, Zhaoshi; Nguyen, Xuan Long; Mei, Qiaozhu; Zhang, Ming
2014
Abstract: Topic models such as the latent Dirichlet allocation (LDA) have become a standard staple in the modeling toolbox of machine learning. They have been applied to a vast variety of data sets, contexts, and tasks with varying degrees of success. However, to date there is almost no formal theory explicating the LDA's behavior, and despite its familiarity there is very little systematic analysis of and guidance on the properties of the data that affect the inferential performance of the model. This paper seeks to address this gap by providing a systematic analysis of factors which characterize the LDA's performance. We present theorems elucidating the posterior contraction rates of the topics as the amount of data increases, and a thorough supporting empirical study using synthetic and real data sets, including news and web-based articles and tweet messages. Based on these results we provide practical guidance on how to identify suitable data sets for topic models, and how to specify particular model parameters.
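Language: English
Content type: Other
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/412763
Collection: School of Electronics Engineering and Computer Science (信息科学技术学院)
Recommended citation format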
GB/T 7714
Tang, Jian,Meng, Zhaoshi,Nguyen, Xuan Long,et al. Understanding the limiting factors of topic modeling via posterior contraction analysis. 2014-01-01.