The conditional value of machine learning in revenue forecasting.
Description
This manuscript evaluates whether pooled tree-based methods improve one-year-ahead revenue forecasts when models use transparent, publicly observable information available at the forecast origin. It studies large-cap U.S. firms from 2013 to 2025 and evaluates out-of-sample forecasts from 2022 through 2025 under an expanding-window design.
Random Forest and XGBoost are compared with naive growth, linear trend, AR(1), and multivariate OLS benchmarks. Across the full sample, AR(1) remains the benchmark to beat, while the tree-based methods show their clearest value among firms with more volatile revenue, where model flexibility matters more than it does for stable revenue series.
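The expanding-window design and the simpler benchmarks described above can be illustrated with a minimal sketch. It is not the manuscript's implementation: the function names, the per-firm fit on revenue levels, and the synthetic revenue series are all assumptions made for illustration. The point it shows is that each forecast for a target year is fit only on data observed before that year.

```python
import numpy as np

# Illustrative sketch only. Function names, the per-firm fit on revenue
# levels, and the synthetic data are assumptions, not the paper's code.

def naive_growth_forecast(history):
    """Carry the most recent year-over-year growth rate forward one year."""
    return history[-1] * (history[-1] / history[-2])

def linear_trend_forecast(history):
    """Fit a straight line through the history and extend it one step."""
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, 1)
    return slope * len(history) + intercept

def ar1_forecast(history):
    """Least-squares fit of y_t = a + b * y_{t-1}, then one step ahead."""
    b, a = np.polyfit(history[:-1], history[1:], 1)
    return a + b * history[-1]

def expanding_window_forecasts(years, revenue, first_test_year, forecaster):
    """One-year-ahead forecasts, each using only data before the target year."""
    forecasts = {}
    for i, year in enumerate(years):
        if year < first_test_year:
            continue
        history = revenue[:i]  # the window grows by one year per test year
        forecasts[int(year)] = float(forecaster(history))
    return forecasts

# Synthetic series: 5% annual growth, 2013 through 2025 (made-up numbers).
years = np.arange(2013, 2026)
revenue = 100.0 * 1.05 ** np.arange(len(years))

naive = expanding_window_forecasts(years, revenue, 2022, naive_growth_forecast)
trend = expanding_window_forecasts(years, revenue, 2022, linear_trend_forecast)
ar1 = expanding_window_forecasts(years, revenue, 2022, ar1_forecast)
```

On this synthetic constant-growth series, the naive growth and AR(1) forecasts reproduce the actual values, while the linear trend under-forecasts. Real revenue series would not behave this cleanly.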
What this paper contributes.
Uses strong simple comparators.
The paper compares machine learning models against naive, trend, autoregressive, and OLS benchmarks rather than only weak baselines.
Limits inputs to observable information.
Forecasts use data available at the forecast origin, keeping the design closer to practical forecasting conditions.
Shows machine learning value is conditional.
Tree-based methods are most useful for firms with volatile revenues, while simple autoregressive models remain competitive for stable firms.
Clarifies when complexity is warranted.
The results argue against adopting machine learning by default and support model choice based on revenue behavior.
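A conditional comparison of this kind can be sketched as follows. The sketch makes several assumptions that the summary above does not confirm: volatility is measured as the standard deviation of year-over-year revenue growth, firms are split at the median, and accuracy is measured by mean absolute percentage error. All function names and firm data are hypothetical.

```python
import numpy as np

# Illustrative sketch only. The volatility measure, the median split, the
# error metric, and all data below are assumptions, not the paper's method.

def growth_volatility(revenue):
    """Sample standard deviation of year-over-year revenue growth."""
    revenue = np.asarray(revenue, dtype=float)
    growth = np.diff(revenue) / revenue[:-1]
    return float(np.std(growth, ddof=1))

def split_by_volatility(revenue_by_firm):
    """Label each firm 'volatile' or 'stable' using a median cutoff."""
    vols = {firm: growth_volatility(r) for firm, r in revenue_by_firm.items()}
    cutoff = float(np.median(list(vols.values())))
    return {firm: "volatile" if v > cutoff else "stable" for firm, v in vols.items()}

def mape_by_group(groups, actual, forecast):
    """Mean absolute percentage error of one model within each group."""
    errors = {}
    for firm, group in groups.items():
        ape = abs(forecast[firm] - actual[firm]) / abs(actual[firm])
        errors.setdefault(group, []).append(ape)
    return {group: float(np.mean(vals)) for group, vals in errors.items()}

# Hypothetical revenue histories for four made-up firms.
revenue_by_firm = {
    "A": [100.0, 105.0, 110.0, 116.0],
    "B": [100.0, 140.0, 90.0, 150.0],
    "C": [100.0, 102.0, 104.0, 107.0],
    "D": [50.0, 80.0, 40.0, 90.0],
}
groups = split_by_volatility(revenue_by_firm)
```

Running `mape_by_group` once per model on the same groups would give a per-segment error table, which is the kind of evidence that supports choosing a model based on revenue behavior.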