Research on Key Technologies for Psychological Analysis of Ancient People Based on a Large-Scale Classical Chinese Corpus

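Title	Research on Key Technologies for Psychological Analysis of Ancient People Based on a Large-Scale Classical Chinese Corpus
Author	邢付贵
Defense date	2021-07
Document subtype	Master's
Degree-granting institution	Institute of Psychology, Chinese Academy of Sciences
Place of conferral	Institute of Psychology, Chinese Academy of Sciences
Other contributor	朱廷劭
Keywords	psychological semantics; ancient Chinese word segmentation; big data; Big Five personality; transfer learning
Degree	Master of Science (equivalent academic qualification)
Alternative title	The research on the key technologies of psychological analysis Based on large-scale ancient Chinese Corpus
Major	Applied Psychology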
Abstract	Studying the psychology of ancient Chinese people is of great significance. The psychology of contemporary Chinese people derives from the psychological thought of the ancients, so studying it can advance research on contemporary Chinese people and reveal the deeper character of Chinese psychology. Ancient Chinese thought has been tested by both practice and history, and its psychological ideas are worth learning from. Studying them can not only compensate for gaps in Western psychological thought but also provide a strong foundation for Chinese psychological science.
Previous scholars have done much work on the psychological analysis of the ancients, but few studies address psychological semantics based on the ancients' own self-expression texts; most are limited to quantitative research or qualitative analysis based on others' evaluations. In addition, traditional methods use small-scale materials and rely mostly on expert rating and manual reading of the literature, which makes them hard to apply to group studies spanning long periods. Using big data and artificial intelligence techniques, this thesis automatically analyzes the psychology of the ancients from a large-scale corpus, which overcomes the limitations of traditional methods and is better suited to group studies over long time spans. We take the ancients' self-expression texts as research material and perform psychological semantic analysis on them, which is more objective than other methods. We also introduce the Big Five personality model of modern psychology into the personality analysis of the ancients, so that their psychology can be better explained with modern psychological theory.
Word segmentation is the basis for big-data analysis of ancient Chinese text. This thesis first proposes a method for building a tokenizer based on the segmentation dictionary CCIDict. The growing body of online ancient Chinese data offers new opportunities for studying ancient Chinese corpora. We collect, process, and transform these data with big-data methods to obtain a basic dictionary of 331,516 entries. We then propose a new-word discovery method based on multi-feature fusion, which combines n-gram word frequency, mutual information, information entropy, and positional word-formation probability to extract new words from the large-scale corpus and form a candidate dictionary. Finally, we merge the basic dictionary and the candidate dictionary into a combined dictionary that covers essentially all CC-LIWC words. A comparison shows that the forward maximum matching algorithm using the combined dictionary CCIDict achieves the highest accuracy; compared with the open-source Jiayan (甲言) tokenizer, the F value of our tokenizer is 14% higher. This lays a foundation for using big data to analyze the psychological semantics of ancient Chinese texts.
Personality analysis is an important part of psychological research on the ancients. Because of resource constraints, the traditional methods of psychobiography are limited when it comes to large-scale longitudinal studies of ancient Chinese historical figures. With the development of artificial intelligence, transfer learning offers a new approach. Microblog posts are modern people's self-expression on the Internet and can reflect an individual's personality tendencies. Among ancient texts, autobiographies, letters, and imperial edicts are likewise forms of self-expression, and by analyzing them we can learn about the personality traits of historical figures in different roles. We use modern microblog data as the source domain and ancient autobiographies as the target domain, and train transfer learning models to predict the Big Five personality traits of ancient historical figures. We propose two models, UERDA and SERDA. UERDA adds information entropy to the distribution adaptation process to achieve unsupervised domain adaptation for regression tasks. SERDA builds on UERDA by adding a small number of labeled samples, which improves the accuracy of the transfer model on the regression task. Compared with the traditional algorithm PCA, the RMSE of UERDA improves by 2.23 on average, and the RMSE of SERDA improves by 3.16 on average. The results show that transfer learning performs well in predicting the Big Five personality traits of ancient Chinese figures, which opens a new quantitative route to studying social change over long time spans.
To demonstrate the validity of these methods, we carried out psychological analyses of ancient emperors and of a famous historical figure. An imperial edict is, in a sense, the emperor's self-expression text and can reflect the emperor's psychological characteristics. We combine our tokenizer with CC-LIWC to analyze the psychological semantic features of imperial edicts and then interpret the psychological regularities in social change. The results show that power has a significant effect on a dynasty's duration and population; power explains 38.3% of dynasty duration, but there is no statistically significant correlation between power and territorial area. Among famous historical figures, Zhuge Liang was Chancellor of Shu Han, presided over court affairs for Liu Bei, and is widely known to the public. We chose Zhuge Liang as the object of analysis and analyzed his Big Five personality traits. The results show that the SERDA prediction is close to the ratings given by human evaluators and also matches people's subjective impressions.
From the perspective of big data and artificial intelligence, this thesis proposes a series of methods for studying the psychology of the ancients and carries out psychological analysis of ancient Chinese individuals and groups. The results indicate that these methods apply well to psychological research on the ancients, and their advantages are most pronounced in research on social change over long time spans.
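Language	Chinese
Content type	Thesis
Source URL	[http://ir.psych.ac.cn/handle/311026/41575]
Collection	Institute of Psychology_Laboratory of Social and Engineering Psychology
Recommended citation (GB/T 7714)	邢付贵. Research on Key Technologies for Psychological Analysis of Ancient People Based on a Large-Scale Classical Chinese Corpus[D]. Institute of Psychology, Chinese Academy of Sciences. Institute of Psychology, Chinese Academy of Sciences. 2021.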
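
Illustrative note: the abstract reports that forward maximum matching over the combined dictionary CCIDict gave the highest segmentation accuracy. The sketch below shows the general forward maximum matching technique only; it is not the thesis's implementation. The function name `forward_max_match` and the three-word toy dictionary are assumptions made for illustration and are not taken from CCIDict.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation.

    At each position, take the longest substring found in `dictionary`;
    if nothing matches, emit a single character and move on.
    """
    if max_len is None:
        # Longest dictionary entry bounds the candidate window.
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
toy_dict = {"诸葛亮", "丞相", "蜀汉"}
print(forward_max_match("诸葛亮为蜀汉丞相", toy_dict))
# → ['诸葛亮', '为', '蜀汉', '丞相']
```

Because the match is greedy, segmentation quality depends heavily on dictionary coverage, which is consistent with the abstract's emphasis on building a large basic dictionary and adding discovered new words.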
