A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data
Zhang, Lei (3,5); Yang, Lin (3,4); Ma, Tianwu (1,2,5); Shen, Feixue (3); Cai, Yanyan (3); Zhou, Chenghu (3,4)
Journal: GEODERMA
2021-02-15
Volume: 384; Pages: 10
Keywords: Digital soil sampling; Machine learning; Semi-supervised learning; Self-training; Predictive mapping
ISSN: 0016-7061
DOI: 10.1016/j.geoderma.2020.114809
Corresponding author: Yang, Lin (yanglin@nju.edu.cn)
Abstract: Numerous machine learning models have been developed to model the relationship between soil classes or properties and their environmental covariates in digital soil mapping (DSM). Most machine learning models are trained with a supervised learning (SL) method based on training samples. However, the collected sample data are often limited in practice because field sampling is expensive and time-consuming. Insufficient samples may greatly limit the learning ability of the model. Semi-supervised machine learning, a new machine learning paradigm that makes use of both unsampled data and a small amount of sampled data in the learning process, is a potentially effective method for DSM. In this study, we present a self-training semi-supervised learning (SSL) method for DSM. Unlike the SL method, the SSL method uses not only the sampled locations but also the abundant environmental covariate information at the unvisited locations. Its basic idea is to iteratively enlarge the training data set by adding unsampled points with high prediction confidence from the unvisited locations until a stopping criterion is reached. The proposed SSL method was applied to machine learning models for predicting soil classes in Heshan Farm, Nenjiang County, Heilongjiang Province, China. Three machine learning models, multinomial logistic regression (MLR), k-nearest neighbor (KNN), and random forest (RF), were selected to evaluate the efficiency of the SSL method. The entropy threshold is an important parameter in the SSL method, and a sensitivity analysis of this parameter was conducted using a series of entropy thresholds. The SSL method was compared with the SL method for the three machine learning models for soil prediction. Cross-validation was used to evaluate the accuracy of the predicted soil class maps generated by each method.
The results showed that the prediction accuracies (the proportion of correctly predicted samples over the total number of validation samples) of the SSL method were higher than those of the SL method for MLR, KNN, and RF by 5.9%, 12.2%, and 6.0%, respectively. RF-SSL was the most accurate model in the study area, followed by KNN-SSL. The self-training SSL method produced the largest improvement for the KNN model compared with the other two models. Furthermore, the soil maps predicted with the SSL method showed a more reasonable spatial variation pattern of soil classes. In the study area, a suitable range for the entropy threshold was 0.8 to 1.0. We concluded that the SSL method improved soil prediction accuracy compared with the SL method when applying machine learning models for DSM, and thus is a potentially efficient method for DSM with limited sample data.
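The self-training loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`knn_proba`, `entropy`, `self_train`), the use of Shannon entropy with the natural logarithm over KNN vote proportions, and the stopping rule (no confident points left, or a hypothetical `max_iter` reached) are all assumptions, since the record does not specify them.

```python
import math
from collections import Counter

def knn_proba(train_X, train_y, x, classes, k=3):
    """Vote proportions per class among the k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return [votes[c] / len(nearest) for c in classes]

def entropy(probs):
    """Shannon entropy (natural log); low entropy means a confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def self_train(X_lab, y_lab, X_unlab, threshold, k=3, max_iter=10):
    """Iteratively move confidently predicted unlabeled points into the training set.

    Assumed stopping criterion: no point falls at or below the entropy
    threshold, or max_iter iterations have run.
    """
    train_X, train_y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    classes = sorted(set(y_lab))
    for _ in range(max_iter):
        confident, remaining = [], []
        for x in pool:
            # Predictions in one iteration all use the same training set.
            probs = knn_proba(train_X, train_y, x, classes, k)
            if entropy(probs) <= threshold:
                confident.append((x, classes[probs.index(max(probs))]))
            else:
                remaining.append(x)
        if not confident:
            break
        for x, label in confident:
            train_X.append(x)
            train_y.append(label)
        pool = remaining
    return train_X, train_y, pool

# Toy covariate vectors for two hypothetical soil classes "A" and "B".
X_lab = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
y_lab = ["A", "A", "A", "B", "B", "B"]
X_unlab = [(0.5, 0.5), (10.5, 10.5), (5, 5.2)]
new_X, new_y, leftover = self_train(X_lab, y_lab, X_unlab, threshold=0.5, max_iter=1)
print(len(new_X), leftover)
```

In this toy run, one iteration adds the two points that sit inside a labeled cluster and leaves the in-between point in the pool, because its neighbor votes are split. A lower threshold admits fewer but more reliable pseudo-labels, which is the trade-off the sensitivity analysis in the abstract examines. Any classifier that outputs class probabilities (such as MLR or RF) could stand in for the KNN voter here.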
Funding projects: National Natural Science Foundation of China [41971054]; National Natural Science Foundation of China [41530749]; National Natural Science Foundation of China [41871300]
WOS keywords: SPATIAL PREDICTION; RANDOM FORESTS; REGRESSION; CLASSIFICATION; RESOLUTION; LANDSCAPE; REGION; STOCKS; MAP
WOS research area: Agriculture
Language: English
Publisher: ELSEVIER
WOS accession number: WOS:000594244300014
Funding organization: National Natural Science Foundation of China
Content type: Journal article
Source URL: http://ir.igsnrr.ac.cn/handle/311030/136524
Collection: Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences
Corresponding author: Yang, Lin
Author affiliations:
1. Nanjing Normal Univ, Sch Geog, Nanjing 210023, Peoples R China
2. Nanjing Normal Univ, Minist Educ, Key Lab Virtual Geog Environm, Nanjing 210023, Peoples R China
3. Nanjing Univ, Sch Geog & Ocean Sci, Nanjing 210023, Peoples R China
4. Chinese Acad Sci, Inst Geog Sci & Nat Resources Res, State Key Lab Resources & Environm Informat Syst, Beijing 100101, Peoples R China
5. Jiangsu Ctr Collaborat Innovat Geog Informat Reso, Nanjing 210023, Peoples R China
Recommended citation
GB/T 7714
Zhang, Lei,Yang, Lin,Ma, Tianwu,et al. A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data[J]. GEODERMA,2021,384:10.
APA Zhang, Lei,Yang, Lin,Ma, Tianwu,Shen, Feixue,Cai, Yanyan,&Zhou, Chenghu.(2021).A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data.GEODERMA,384,10.
MLA Zhang, Lei,et al."A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data".GEODERMA 384(2021):10.