Nowcasting Global Poverty.

B-Tier
Journal: World Bank Economic Review
Year: 2022
Volume: 36
Issue: 4
Pages: 835-856

Authors (3)

Daniel Gerszon Mahler (not in RePEc)
R Andrés Castañeda Aguilar (not in RePEc)
David Newhouse (World Bank Group)

Score contribution per author:

0.670 = (α=2.01 / 3 authors) × 1.0x B-tier

α: calibrated so average coauthorship-adjusted count equals average raw count
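
The per-author figure above can be reproduced with a few lines of arithmetic. This is a minimal sketch that only restates the formula shown in this record; the function name and parameter names are hypothetical and are not taken from the database itself.

```python
def score_per_author(alpha: float, n_authors: int, tier_multiplier: float) -> float:
    """Split alpha evenly across coauthors, then apply the journal-tier multiplier."""
    return alpha / n_authors * tier_multiplier


# Values from this record: alpha = 2.01, 3 authors, 1.0x multiplier for B-tier.
score = score_per_author(alpha=2.01, n_authors=3, tier_multiplier=1.0)
print(f"{score:.3f}")
```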

Abstract

This paper evaluates different methods for nowcasting country-level poverty rates, including methods that apply statistical learning to large-scale country-level data obtained from the World Development Indicators and Google Earth Engine. The methods are evaluated by withholding measured poverty rates and determining how accurately the methods predict the held-out data. A simple approach that scales the last observed welfare distribution by a fraction of real GDP per capita growth performs nearly as well as models using statistical learning on 1,000+ variables. This GDP-based approach outperforms all models that predict poverty rates directly, even when the last survey is up to five years old. The results indicate that in this context, the additional complexity introduced by applying statistical learning techniques to a large set of variables yields only marginal improvements in accuracy.
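
The GDP-based approach and the hold-out evaluation described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the welfare values, the poverty line, the growth figure, and the pass-through fraction are all made-up numbers, and every function name is hypothetical. The abstract does not say which error metric the paper uses, so mean absolute error is an assumption here.

```python
import numpy as np


def nowcast_poverty_rate(welfare, gdp_growth, passthrough, poverty_line):
    """Scale every welfare value by a fraction of GDP per capita growth,
    then return the share of the scaled distribution below the poverty line.

    welfare      : last observed welfare distribution (e.g. daily consumption)
    gdp_growth   : cumulative real GDP per capita growth since the last survey
    passthrough  : fraction of growth passed on to welfare (hypothetical value)
    poverty_line : threshold in the same units as welfare
    """
    welfare = np.asarray(welfare, dtype=float)
    scaled = welfare * (1.0 + passthrough * gdp_growth)
    return float(np.mean(scaled < poverty_line))


def mean_absolute_error(predicted, observed):
    """Average absolute gap between predicted and held-out poverty rates."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(predicted - observed)))


# Made-up welfare distribution from a hypothetical last survey.
last_survey = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0, 10.0]

# Poverty rate at survey time (no growth applied) and the nowcast after growth.
rate_at_survey = nowcast_poverty_rate(last_survey, 0.0, 0.5, 2.15)
rate_nowcast = nowcast_poverty_rate(last_survey, 0.20, 0.5, 2.15)

# Compare nowcasts with withheld poverty rates (both lists are made up).
error = mean_absolute_error([0.25, 0.40], [0.30, 0.35])
```

Here the whole distribution shifts by the same factor, so the shape of the last observed distribution is preserved and only its scale changes.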

Technical Details

RePEc Handle
repec:oup:wbecrv:v:36:y:2022:i:4:p:835-856.
Journal Field
Development
Author Count
3
Added to Database
2026-01-26