Filling Method for Missing Data of Forest Resource Sampling Investigation

doi:10.13466/j.cnki.lyzygl.2018.06.021

Forest Resources Management, 2018(6): 130-137


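LIU Fei, LI Mingyang, LIU Yanan, JIANG Yifan, WANG Zi

College of Forestry, Nanjing Forestry University, Nanjing, Jiangsu 210037, China

Received: 2018-09-17 Revised: 2018-12-10 Online: 2018-12-28 Published: 2020-09-27
Corresponding author: LI Mingyang
About the first author: LIU Fei (born 1994), female, from Chuzhou, Anhui Province; master's student working on applications of 3S technology (remote sensing, GIS and GPS). Email: 121126082@qq.com
Funding: National Natural Science Foundation of China project "Research on long-term management planning methods for collective forests in southern China based on scenario analysis and multi-objective decision-making" (31770679)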


Abstract:

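Missing data are common in forest resource sampling surveys, so methods for filling them need to be studied to make data analysis more accurate. Using a 1996 Landsat-5 TM image of Lin'an City, Zhejiang Province, and permanent sample plot data from the county-level continuous forest resource inventory of the same period as the main information sources, and taking the mean diameter at breast height (DBH) of trees in each plot as the variable with missing values, we first analyzed the spatial autocorrelation of plot mean DBH. We then filled the missing values with spatial methods, non-spatial methods and remote sensing estimation models, and evaluated filling accuracy with 10-fold cross-validation. The results show that: 1) the Moran's I coefficient of plot mean DBH in the study area is 0.21, so its spatial distribution shows relatively strong spatial autocorrelation; 2) the K-nearest neighbor (KNN) algorithm, a remote sensing estimation model, has the highest filling accuracy, followed by random forest and then kriging interpolation (a spatial method), while the expectation-maximization (EM) algorithm (a non-spatial method) has the lowest; 3) among the four theoretical semivariogram models used for kriging, the spherical model gives the highest filling accuracy, with the highest correlation coefficient (0.6325) and the lowest mean absolute error (2.0493 cm) and root mean square error (3.8093 cm); 4) ranked by filling accuracy from high to low, the four best-performing methods are KNN > random forest > kriging > inverse distance weighting (IDW). In Lin'an, where the terrain is complex and elevation varies widely, KNN is the most suitable method for filling missing values of plot mean DBH.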

Key words: forest resource sampling investigation, DBH, missing data, filling methods, Lin'an City

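Global Moran's I, the statistic behind result 1), compares each plot's deviation from the mean DBH with the deviations of nearby plots. The abstract does not state which spatial weights the study used, so the minimal sketch below assumes inverse-distance weights w_ij = 1/d_ij computed from plot coordinates; values clearly above the no-autocorrelation expectation of -1/(n - 1) indicate positive spatial autocorrelation. The function name morans_i and the example plots are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def morans_i(coords: Sequence[Point], values: Sequence[float]) -> float:
    """Global Moran's I with inverse-distance weights w_ij = 1/d_ij (w_ii = 0).

    The weighting scheme is an assumption for illustration only.
    """
    n = len(values)
    if n < 2 or len(coords) != n:
        raise ValueError("need at least two plots, one coordinate pair per value")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    ss = sum(d * d for d in dev)  # total sum of squared deviations
    if ss == 0:
        raise ValueError("all values are equal; Moran's I is undefined")
    cross = 0.0  # sum of w_ij * dev_i * dev_j over all i != j
    w_total = 0.0  # sum of all weights, W
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if d == 0:
                raise ValueError("two plots share the same coordinates")
            w = 1.0 / d
            w_total += w
            cross += w * dev[i] * dev[j]
    return (n / w_total) * (cross / ss)


# Hypothetical plots on a transect (coordinates in metres, mean DBH in cm).
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(round(morans_i(coords, [10.0, 11.0, 20.0, 21.0]), 3))  # 0.652
```

Two tight clusters of similar DBH give a clearly positive I here; an alternating high-low pattern along a line would give a negative I instead.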

CLC number: S757.2


Cite this article: LIU Fei, LI Mingyang, LIU Yanan, JIANG Yifan, WANG Zi. Filling Method for Missing Data of Forest Resource Sampling Investigation[J]. Forest Resources Management, 2018(6): 130-137.

Figures/tables: 4 (Tables 1-4)


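Kriging semivariogram models: model parameters and filling accuracy
Semivariogram model	Nugget	Sill	Range/m	Nugget/sill	Correlation coefficient	Mean absolute error/cm	Root mean square error/cm
Spherical	0.856	1.121	8290	0.236	0.6325	2.0493	3.8093
Exponential	0.327	0.997	3000	0.672	0.5901	2.3147	3.8877
Linear	0.278	0.977	5100	0.721	0.3212	3.8925	4.9475
Gaussian	0.314	0.997	3900	0.685	0.4119	3.9730	4.6904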
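The semivariogram models in the table above can be written as functions of the lag distance h, and ordinary kriging then predicts a plot's missing DBH as a weighted mean of the other plots, with weights solved from those semivariances. This is a minimal sketch, not the study's implementation: it assumes the tabulated sill is the total sill (nugget plus partial sill), the range is the practical range (hence the factor 3 in the exponential and Gaussian models), the linear model levels off at the sill, and distances are in metres. make_variogram, solve and ordinary_kriging are hypothetical helper names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Variogram = Callable[[float], float]

MODELS = ("spherical", "exponential", "gaussian", "linear")


def make_variogram(model: str, nugget: float, sill: float, rng: float) -> Variogram:
    """Return gamma(h); `sill` is assumed to be the total sill, `rng` the practical range."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    psill = sill - nugget  # partial (structured) sill

    def gamma(h: float) -> float:
        if h == 0.0:
            return 0.0  # semivariance at zero lag is zero by definition
        r = h / rng
        if model == "spherical":
            s = 1.0 if r >= 1.0 else 1.5 * r - 0.5 * r ** 3
        elif model == "exponential":
            s = 1.0 - math.exp(-3.0 * r)
        elif model == "gaussian":
            s = 1.0 - math.exp(-3.0 * r * r)
        else:  # linear, capped at the sill beyond the range
            s = min(r, 1.0)
        return nugget + psill * s

    return gamma


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular kriging system (duplicate plot locations?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(coords: Sequence[Point], values: Sequence[float],
                     target: Point, gamma: Variogram) -> float:
    """Ordinary kriging estimate at `target` from the observed plots."""
    n = len(coords)
    # Semivariances between plots, bordered by the unbiasedness constraint
    # (weights sum to 1) through a Lagrange multiplier.
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in coords] + [1.0]
    weights = solve(a, b)[:n]  # last entry is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values))


# Spherical model parameters taken from the table above.
spherical = make_variogram("spherical", nugget=0.856, sill=1.121, rng=8290.0)
# Hypothetical plots (metres) with mean DBH (cm); estimate midway between them.
plots = [(-1000.0, 0.0), (1000.0, 0.0)]
print(ordinary_kriging(plots, [10.0, 30.0], (0.0, 0.0), spherical))  # symmetric layout, close to 20
```

Because the semivariance is zero at zero lag, this kriging is an exact interpolator: predicting at an observed plot returns that plot's value.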

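Filling accuracy of non-spatial and spatial methods
Accuracy metric	Regression	EM algorithm	Inverse distance weighting	Spline	Kriging
Correlation coefficient	0.1373	0.1327	0.3981	0.1496	0.6325
Mean absolute error/cm	4.8925	4.9730	3.8936	4.3168	2.0493
Root mean square error/cm	5.9475	6.1904	4.7262	5.6554	3.8093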
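The accuracy metrics in the tables (correlation coefficient, mean absolute error and root mean square error) come from 10-fold cross-validation: the plots are split into ten folds, each fold's DBH is treated as missing and predicted from the other nine folds, and the pooled predictions are compared with the observations. The sketch below pairs that loop with inverse distance weighting, one of the spatial methods in the table above. The power of 2, the random fold assignment, the synthetic grid and the helper names idw_predict, accuracy and k_fold_cv are assumptions for illustration, not the study's settings.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Predictor = Callable[[Sequence[Point], Sequence[float], Point], float]


def idw_predict(coords: Sequence[Point], values: Sequence[float],
                target: Point, power: float = 2.0) -> float:
    """Inverse distance weighting: each plot is weighted by d ** -power."""
    num = den = 0.0
    for p, z in zip(coords, values):
        d = math.dist(p, target)
        if d == 0.0:
            return z  # target coincides with an observed plot
        w = d ** -power
        num += w * z
        den += w
    return num / den


def accuracy(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Pearson correlation coefficient, mean absolute error and RMSE."""
    n = len(obs)
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    r = cov / (so * sp) if so > 0 and sp > 0 else float("nan")
    return {"r": r, "MAE": mae, "RMSE": rmse}


def k_fold_cv(coords: Sequence[Point], values: Sequence[float],
              predict: Predictor, k: int = 10, seed: int = 0) -> Dict[str, float]:
    """Predict every plot exactly once, using only the plots outside its fold."""
    n = len(values)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of plots")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]  # k roughly equal folds
    pred: List[float] = [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        train_xy = [coords[i] for i in train]
        train_z = [values[i] for i in train]
        for i in fold:
            pred[i] = predict(train_xy, train_z, coords[i])
    return accuracy(values, pred)


# Hypothetical 10 x 5 grid of plots 500 m apart, DBH rising from west to east.
grid = [(500.0 * x, 500.0 * y) for x in range(10) for y in range(5)]
dbh = [10.0 + 1.5 * x / 500.0 for x, _ in grid]
print(k_fold_cv(grid, dbh, idw_predict))
```

Any predictor with the same (coords, values, target) signature, such as a kriging or regression predictor, can be passed to k_fold_cv, so all methods are scored on identical folds.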

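Filling accuracy of remote sensing estimation models
Accuracy metric	Multiple linear regression	Multilayer perceptron	K-nearest neighbor	Random forest
Correlation coefficient	0.1521	0.1701	0.9017	0.8726
Mean absolute error/cm	4.3126	4.2088	1.4552	1.7012
Root mean square error/cm	5.5554	5.3995	2.0369	2.5854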
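KNN, the best-performing method in the table above, fills a plot's missing DBH with the mean DBH of the k plots whose remote sensing predictors are most similar. The abstract does not state which TM-derived variables, which k or which distance metric were used, so the sketch below assumes a per-plot feature vector (for example band reflectances), z-score standardization so that no variable dominates the Euclidean distance, and an unweighted mean of the k neighbors. knn_impute, standardize and the example reflectances are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each feature column so all predictors share one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = []
    for c, m in zip(cols, means):
        sd = math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        sds.append(sd if sd > 0 else 1.0)  # constant column: leave unscaled
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def knn_impute(features: Sequence[Sequence[float]],
               dbh: Sequence[Optional[float]], k: int = 5) -> List[float]:
    """Replace each None in `dbh` with the mean DBH of its k nearest plots."""
    z = standardize(features)
    known = [i for i, v in enumerate(dbh) if v is not None]
    if len(known) < k:
        raise ValueError("fewer plots with observed DBH than k")
    filled: List[float] = []
    for i, v in enumerate(dbh):
        if v is not None:
            filled.append(v)
            continue
        # Nearest neighbors in standardized feature space, not in geographic space.
        nearest = sorted(known, key=lambda j: math.dist(z[i], z[j]))[:k]
        filled.append(sum(dbh[j] for j in nearest) / k)
    return filled


# Hypothetical plots: (red reflectance, near-infrared reflectance) and mean DBH/cm.
features = [[0.10, 0.50], [0.11, 0.52], [0.90, 0.20], [0.92, 0.21], [0.10, 0.51]]
dbh = [10.0, 12.0, 25.0, 27.0, None]
print(knn_impute(features, dbh, k=2))  # [10.0, 12.0, 25.0, 27.0, 11.0]
```

Unlike kriging or IDW, the neighbors here are chosen by spectral similarity rather than map distance, which is one plausible reason such a model can cope with Lin'an's complex terrain.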
