Food prices have consistently been one of the leading contributors to Colombia’s inflation rate. They are particularly sensitive to exogenous factors such as extreme weather events, supply chain disruptions, and global commodity price shocks, often resulting in sharp and unpredictable price fluctuations. This document pursues two main objectives. First, it aims to estimate and evaluate methods for forecasting 33 homogeneous food inflation baskets, which together constitute the total food Consumer Price Index (Food CPI), offering tools that can assist policymakers in anticipating the drivers of future inflation. This includes both traditional time series models and modern machine learning approaches. Second, it seeks to enhance the interpretability of model predictions through explainable AI techniques. To achieve this, we propose a variable lag selection algorithm to identify optimal feature-lag pairs, and employ SHAP (SHapley Additive exPlanations) values to quantify the contribution of each feature to the model’s forecast. Our findings indicate that machine learning models outperform traditional approaches in forecasting food inflation, delivering improved accuracy across most individual baskets as well as for aggregated food inflation.
Approach
This document develops statistical models to forecast monthly inflation over the next 12 months for each of the 33 baskets that make up the food Consumer Price Index (Food CPI). To this end, both traditional time series models and machine learning approaches are employed. Each basket is modeled independently, incorporating four groups of explanatory variables relevant to food supply: climate variables, the nominal exchange rate, commodity prices, and transportation and energy costs. Commodity and energy prices influence food inflation by affecting production, transportation, and processing costs, while climate affects agricultural production by altering crop growth, soil quality, and livestock health. The document also seeks to interpret the forecasts using SHAP (SHapley Additive exPlanations) values, a widely used tool for explaining machine learning model predictions.
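As an illustration of how a 12-month-ahead forecasting problem for a single basket can be set up, the sketch below builds a supervised dataset for one forecast horizon from lagged inflation and lagged explanatory variables. It assumes a direct (one model per horizon) design; the source does not state whether direct or recursive forecasts are used, and the function name `make_direct_dataset` and its arguments are hypothetical.

```python
import numpy as np

def make_direct_dataset(y, X, lags, horizon):
    """Build (features, target) pairs for a direct h-step-ahead forecast.

    y       : 1-D array of monthly basket inflation, length T
    X       : 2-D array of shape (T, k) with explanatory variables
    lags    : list of non-negative integers applied to y and to every column of X
    horizon : number of months ahead (for example 1 to 12)

    The row for month t holds values observed at t - lag and is paired
    with the target y[t + horizon]. This layout is an assumption made
    for illustration, not the authors' documented design.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T = len(y)
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, T - horizon):
        feats = [y[t - lag] for lag in lags]      # lagged inflation
        for lag in lags:
            feats.extend(X[t - lag])              # lagged explanatory variables
        rows.append(feats)
        targets.append(y[t + horizon])
    return np.array(rows), np.array(targets)

# Synthetic example: 30 months, 2 explanatory variables
y_demo = np.arange(30, dtype=float)
X_demo = np.column_stack([np.arange(30) * 10.0, np.arange(30) * 100.0])
F_demo, target_demo = make_direct_dataset(y_demo, X_demo, lags=[0, 1, 2], horizon=3)
```

Under this design, one such dataset would be built per basket and per horizon, and any regressor (linear or tree-based) could then be fitted to each.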
Contribution
This research contributes to the literature on food inflation forecasting by demonstrating that tree-based models such as XGBoost produce more accurate forecasts than linear time series models. Moreover, we show that it is possible to break down the predictions of this model into contributions from fundamental explanatory variables. The dynamics of each food basket respond to variables such as climate, commodity prices, transportation costs, and exchange rate behavior.
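To make the idea of breaking a prediction into contributions concrete, the following sketch computes exact Shapley values for a single prediction of a small toy model by enumerating all feature coalitions. It is a minimal illustration of the additive property behind SHAP, not the authors' pipeline: the toy model, its coefficients, the feature names, and the choice of replacing absent features with a fixed baseline are all assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction, by enumerating coalitions.

    Features outside a coalition are set to their baseline value
    (a simplifying assumption for this sketch).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy model; names and coefficients are illustrative only
def toy_model(z):
    rain, fx, lagged_inflation = z
    return 0.5 * lagged_inflation + 0.2 * fx + 0.1 * rain * fx

x = [2.0, 1.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Contributions add up to the gap between the prediction and the baseline
gap = toy_model(x) - toy_model(baseline)
```

The contributions sum to the difference between the prediction and the baseline prediction, which is what allows a forecast to be attributed to individual drivers. Exhaustive enumeration is only feasible for a handful of features; tree-based models are usually explained with specialized, faster algorithms.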
The interpretation of these model predictions is achieved through an exhaustive process of selecting explanatory variables and an algorithm we propose to select the optimal lags of these variables. This algorithm reduces the number of variables, simplifying interpretation and lowering computational costs.
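The lag selection algorithm itself is not described in this summary. As a rough stand-in that conveys the general idea of keeping a single feature-lag pair per variable, the sketch below picks, for each explanatory variable, the lag with the highest absolute correlation with the target. The selection criterion, the function name `best_lag_per_feature`, and the synthetic data are assumptions and should not be read as the proposed algorithm.

```python
import numpy as np

def best_lag_per_feature(y, features, max_lag):
    """For each feature, return (lag, score) with the highest |correlation|.

    y        : 1-D array with the target series
    features : dict mapping a feature name to a 1-D array of the same length
    max_lag  : largest lag (in months) to consider
    """
    y = np.asarray(y, dtype=float)
    selected = {}
    for name, series in features.items():
        x = np.asarray(series, dtype=float)
        best_lag, best_score = None, -1.0
        for lag in range(1, max_lag + 1):
            # Align x[t - lag] with y[t]
            score = abs(np.corrcoef(x[:-lag], y[lag:])[0, 1])
            if score > best_score:
                best_lag, best_score = lag, score
        selected[name] = (best_lag, best_score)
    return selected

# Synthetic data: the target follows "rain" with a 3-month delay
rng = np.random.default_rng(0)
rain = rng.normal(size=200)
noise = rng.normal(size=200)
target = np.zeros(200)
target[3:] = rain[:-3] + 0.1 * rng.normal(size=197)

selected = best_lag_per_feature(target, {"rain": rain, "noise": noise}, max_lag=6)
```

Keeping one lag per variable shrinks the feature set from (variables × lags) columns to one column per variable, which is the kind of reduction that simplifies interpretation and lowers computational cost.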
Results
XGBoost models are more accurate in forecasting food inflation than linear models for most of the 33 baskets studied, especially for longer forecast horizons. Forecast errors from the XGBoost model were between 5% and 60% lower than those of linear models, depending on the basket and forecast horizon, and for the aggregate food inflation basket, errors were on average 25% lower.
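A statement such as "errors were 25% lower" corresponds to a relative reduction in a forecast error metric. The sketch below shows the arithmetic, assuming root mean squared error (RMSE) as the metric; the summary does not name the metric used, and the numbers are made up for illustration.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))

def error_reduction_pct(actual, benchmark_forecast, candidate_forecast):
    """Percentage by which the candidate's RMSE is below the benchmark's."""
    base = rmse(actual, benchmark_forecast)
    cand = rmse(actual, candidate_forecast)
    return 100.0 * (base - cand) / base

# Illustrative values only (not results from the paper)
actual = [1.0, 2.0, 3.0, 4.0]
linear_forecast = [1.4, 1.6, 3.4, 3.6]    # every error is 0.4 in absolute value
boosted_forecast = [1.3, 1.7, 3.3, 3.7]   # every error is 0.3 in absolute value

reduction = error_reduction_pct(actual, linear_forecast, boosted_forecast)
```

In this toy case the benchmark RMSE is 0.4 and the candidate RMSE is 0.3, so the reduction is about 25%.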
The factors that explain the forecasts vary considerably across baskets. For certain food groups, such as perishables, climate and inflation persistence are the relevant factors. In contrast, other foods, such as industrial products, are mainly explained by commodity costs and inflation persistence.

Cesar Anzola-Bravo,
Poveda-Olarte Paola