Optimized k-NN regression for imputing and mapping land prices in Dar es Salaam city
Abstract
Urban land prices provide essential information about urban land markets, investment direction, resource allocation and socioeconomic status, and may facilitate an understanding of urban spatial growth and its implications for the environment. Complete and up-to-date land price information is thus crucial for supporting the sustainability of urban environments in rapidly urbanizing African cities. However, land price inventories in countries like Tanzania are characterized by missing data, which hinders an understanding of the spatial distribution and temporal change of land prices. This paper therefore demonstrates the potential of machine learning for imputing missing residential land prices in Dar es Salaam city. It first maps the available residential land prices for the 2014-2020 period and identifies locations with missing data. The First Law of Geography and optimized k-Nearest Neighbor regression analysis with a grid-search cross-validation method are then applied to impute the missing values. The optimized k-Nearest Neighbor regression model with the Euclidean distance metric was more accurate than models using other distance metrics for all studied years, achieving lower error rates of 4%-30% with fewer nearest neighbors. This optimized model can help reduce the challenges that stakeholders face in development planning and decision making when data are incomplete across spatial and temporal scales, for data types with characteristics similar to the land price data.
Keywords: Land price, imputation, k-Nearest Neighbor, missing data, grid-search cross-validation
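The imputation approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic coordinates and prices, the candidate values of k, the Manhattan metric used for comparison, the five-fold split, and the use of mean absolute percentage error as the error rate are all assumptions made for illustration. The sketch selects k and the distance metric by grid search with cross-validation, then imputes prices at locations where they are missing.

```python
import math
import random
from itertools import product


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


# Candidate distance metrics (assumed; the abstract names only Euclidean).
METRICS = {"euclidean": euclidean, "manhattan": manhattan}


def knn_predict(train_xy, train_price, query, k, metric):
    """Average the prices of the k training locations nearest to query."""
    dist = METRICS[metric]
    order = sorted(range(len(train_xy)), key=lambda i: dist(train_xy[i], query))
    nearest = order[:k]
    return sum(train_price[i] for i in nearest) / len(nearest)


def cv_error(xy, price, k, metric, n_folds=5):
    """Mean absolute percentage error under n_folds-fold cross-validation."""
    errors = []
    for fold in range(n_folds):
        train = [i for i in range(len(xy)) if i % n_folds != fold]
        test = [i for i in range(len(xy)) if i % n_folds == fold]
        train_xy = [xy[i] for i in train]
        train_price = [price[i] for i in train]
        for i in test:
            pred = knn_predict(train_xy, train_price, xy[i], k, metric)
            errors.append(abs(pred - price[i]) / price[i])
    return sum(errors) / len(errors)


def grid_search(xy, price, ks=(1, 3, 5, 7, 9), metrics=("euclidean", "manhattan")):
    """Score every (k, metric) pair and return the pair with the lowest error."""
    scores = {(k, m): cv_error(xy, price, k, m) for k, m in product(ks, metrics)}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic stand-in for locations with known prices (hypothetical data).
rng = random.Random(0)
known_xy = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
known_price = [100.0 + 20.0 * x + 10.0 * y + rng.gauss(0, 5) for x, y in known_xy]

(best_k, best_metric), scores = grid_search(known_xy, known_price)

# Impute prices at locations where they are missing (hypothetical coordinates).
missing_xy = [(2.5, 7.0), (8.0, 1.5)]
imputed = [knn_predict(known_xy, known_price, q, best_k, best_metric) for q in missing_xy]
print(best_k, best_metric, imputed)
```

Because prediction averages the prices of nearby locations, the sketch relies on the same premise as the First Law of Geography: nearby locations tend to have more similar prices than distant ones.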
License
Copyright (c) 2023 Marandu Gideon, Beatrice Tarimo, Mushi Vianey

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.