Paper details

Title: Spatial Disaggregation of Population Subgroups Leveraging Self-Trained Multi-Output Gradient Boosted Regression Trees

Authors: Marina Georgati, João Monteiro, Bruno Martins, Carsten Keßler

Abstract: Obtained from CrossRef

Accurate and consistent estimations on the present and future population distribution, at fine spatial resolution, are fundamental to support a variety of activities. However, the sampling regime, sample size, and methods used to collect census data are heterogeneous across temporal periods and/or geographic regions. Moreover, the data is usually only made available in aggregated form, to ensure privacy. In an attempt to address these issues, several previous initiatives have addressed the use of spatial disaggregation methods to produce high-resolution gridded datasets describing the human population distribution, although these projects have usually not addressed specific population subgroups. This paper describes a spatial disaggregation method based on self-training regression models, innovating over previous studies in the simultaneous prediction of disaggregated counts for multiple inter-related variables, by leveraging multi-output models based on gradient tree boosting. We report on experiments for two case studies, using high-resolution data (i.e., counts for different subgroups available at a resolution of 100 meters) for the municipality of Amsterdam and the region of Greater Copenhagen. Results show that the proposed approach can capture spatial heterogeneity and the dependency on local factors, outperforming alternatives (e.g., seminal disaggregation algorithms, or approaches leveraging individual regression models for each variable) in terms of averaged error metrics, and also upon visual inspection of spatial variation in the resulting maps.

Codecheck details

Certificate identifier: 2022-005

Codechecker name: Frank O. Ostermann

Time of codecheck: 2022-07-09 12:00:00

Repository: https://osf.io/CDFAH

Codecheck report: https://doi.org/10.17605/osf.io/cdfah

Summary:

The paper presents an extensive quantitative study consisting of numerous (pre-)processing steps that involve multiple input data sets of different types (e.g., CSV, remotely sensed imagery, and geographic vector data). The input data is not provided, but it is documented well enough to be recreated or retrieved from other sources. Despite great support from the corresponding author, the time constraints of this review, coupled with the need to organize the data and the complexity of the workflow, allowed only a partial reproduction of the processing pipeline: the initial dasymetric mapping that generates the first inputs of disaggregated population density, and one of the multiple analyses of that data for the city of Amsterdam. However, a careful evaluation of the available code led this reviewer to conclude that, with more time, a successful reproduction of the entire workflow is highly likely. In any case, there is sufficient information to replicate the study for a different geographic area or with different methods or parameters.
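
For readers unfamiliar with the dasymetric mapping step named in the summary, the general idea can be sketched as follows. This is a minimal illustration of the generic technique and not the authors' code: the function name `dasymetric_disaggregate`, the toy grid, and the use of a single ancillary weight layer (for example, a built-up fraction per cell) are all assumptions made for this example.

```python
import numpy as np


def dasymetric_disaggregate(zone_ids, weights, zone_totals):
    """Spread each zone's known total over its grid cells.

    Hypothetical helper for illustration only.
    zone_ids    -- array giving the source zone of every grid cell
    weights     -- array of non-negative ancillary weights per cell
    zone_totals -- dict mapping zone id to the aggregated count
    """
    zone_ids = np.asarray(zone_ids)
    weights = np.asarray(weights, dtype=float)
    estimates = np.zeros_like(weights)

    for zone, total in zone_totals.items():
        mask = zone_ids == zone
        if not mask.any():
            continue  # zone has no cells in this grid
        zone_weights = weights[mask]
        weight_sum = zone_weights.sum()
        if weight_sum > 0:
            # Allocate proportionally to the ancillary weight.
            estimates[mask] = total * zone_weights / weight_sum
        else:
            # No ancillary signal: fall back to a uniform split.
            estimates[mask] = total / mask.sum()
    return estimates


# Toy 2x2 grid: the top row is zone 1, the bottom row is zone 2.
zone_ids = np.array([[1, 1], [2, 2]])
weights = np.array([[1.0, 3.0], [0.0, 0.0]])
zone_totals = {1: 100.0, 2: 50.0}

grid = dasymetric_disaggregate(zone_ids, weights, zone_totals)
print(grid)
```

The property this sketch preserves is that the cell estimates within each zone sum back to that zone's aggregated count, which is what makes such a surface usable as a first input for later refinement.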



© Stephen Eglen & Daniel Nüst

Published under CC BY-SA 4.0


CODECHECK is a process for independent execution of computations underlying scholarly research articles.