Towards identifying and reducing the bias of disease information extracted from search engine data
Huang, Da-Cang1,2,3; Wang, Jin-Feng1,2; Huang, Ji-Xia4; Sui, Daniel Z.5; Zhang, Hong-Yan6; Hu, Mao-Gui1,2; Xu, Cheng-Dong1,2
Journal: PLOS Computational Biology
Date: 2016-06-01
Volume: 12; Issue: 6; Pages: 16
ISSN: 1553-734X
DOI: 10.1371/journal.pcbi.1004876
Corresponding author: Wang, Jin-Feng (wangjf@lreis.ac.cn)
Abstract: The estimation of disease prevalence from online search engine data (e.g., Google Flu Trends (GFT)) has received considerable scholarly and public attention in recent years. While the utility of search engine data for disease surveillance has been demonstrated, the scientific community still seeks ways to identify and reduce the biases embedded in such data. The primary goal of this study is to explore new ways of improving the accuracy of disease prevalence estimates by combining traditional disease data with search engine data. A novel method, Biased Sentinel Hospital-based Area Disease Estimation (B-SHADE), is introduced to reduce search engine data bias from a geographical perspective. To monitor search trends for hand, foot and mouth disease (HFMD) in Guangdong Province, China, we tested our approach by selecting 11 keywords from the Baidu Index platform, a Chinese big data analytics service similar to GFT. The correlation between the number of real cases and the composite index was 0.8. After decomposing the composite index at the city level, we found that only 10 cities presented a correlation of close to 0.8 or higher. These cities were found to be more stable with respect to search volume, and they were selected as sample cities to estimate the search volume of the entire province. After the estimation, the correlation improved from 0.8 to 0.864. After fitting the revised search volume with historical cases, the mean absolute error was 11.19% lower than when the original search volume and historical cases were combined. To our knowledge, this is the first study to reduce search engine data bias through the use of rigorous spatial sampling strategies.
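The city-screening step described in the abstract can be illustrated with a minimal sketch. It is not the B-SHADE estimator itself, which the abstract does not specify. It only shows one plausible way to keep cities whose search index correlates with reported cases at roughly 0.8 or higher, plus a plain mean absolute error helper. The function names, the assumption that each city is compared against its own case series, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def select_stable_cities(search_index, cases, threshold=0.8):
    """Return (sorted) the cities whose search index tracks reported cases.

    Assumption: each city's index is compared with that city's own case
    series; the 0.8 default mirrors the figure quoted in the abstract.
    """
    return sorted(
        city
        for city, series in search_index.items()
        if pearson(series, cases[city]) >= threshold
    )


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Toy, made-up series for two hypothetical cities (not data from the study).
search_index = {
    "city_a": [10, 20, 30, 40],
    "city_b": [40, 10, 35, 5],
}
cases = {
    "city_a": [12, 19, 33, 41],
    "city_b": [5, 10, 15, 20],
}

stable = select_stable_cities(search_index, cases)
print(stable)
```

In this toy input only `city_a` passes the screen, because the `city_b` index moves against its case counts. In the study, the retained cities then serve as the sample from which the province-wide search volume is estimated.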
WOS keywords: BIG DATA-ANALYSIS ; INFLUENZA-A H7N9 ; MOUTH-DISEASE ; DIGITAL EPIDEMIOLOGY ; INTERNET SEARCHES ; SURVEILLANCE ; HAND ; FOOT ; INTELLIGENCE ; HEALTHMAP
WOS research areas: Biochemistry & Molecular Biology ; Mathematical & Computational Biology
WOS categories: Biochemical Research Methods ; Mathematical & Computational Biology
Language: English
Publisher: PUBLIC LIBRARY SCIENCE
WOS accession number: WOS:000379349700012
Content type: Journal article
URI: http://www.corc.org.cn/handle/1471x/2374747
Collection: University of Chinese Academy of Sciences
Corresponding author: Wang, Jin-Feng
Author affiliations:
1. Chinese Acad Sci, Inst Geog Sci & Nat Resource Res, State Key Lab Resources & Environm Informat Syst, Beijing, Peoples R China
2. Chinese Ctr Dis Control & Prevent, Key Lab Surveillance & Early Warning Infect Dis, Beijing, Peoples R China
3. Univ Chinese Acad Sci, Beijing, Peoples R China
4. Beijing Forestry Univ, Coll Forestry, Beijing, Peoples R China
5. Ohio State Univ, Dept Geog, Columbus, OH 43210 USA
6. Northeast Normal Univ, Sch Geog Sci, Changchun, Peoples R China
Recommended citation
GB/T 7714: Huang, Da-Cang, Wang, Jin-Feng, Huang, Ji-Xia, et al. Towards identifying and reducing the bias of disease information extracted from search engine data[J]. PLOS Computational Biology, 2016, 12(6): 16.
APA: Huang, Da-Cang, Wang, Jin-Feng, Huang, Ji-Xia, Sui, Daniel Z., Zhang, Hong-Yan, ... & Xu, Cheng-Dong. (2016). Towards identifying and reducing the bias of disease information extracted from search engine data. PLOS Computational Biology, 12(6), 16.
MLA: Huang, Da-Cang, et al. "Towards identifying and reducing the bias of disease information extracted from search engine data." PLOS Computational Biology 12.6 (2016): 16.