Under review · 2025

The conditional value of machine learning in revenue forecasting.

Jodonnis Rodriguez · Charles Teague III · Yu Zhang · Faizan Ali

Description

This manuscript evaluates whether pooled tree-based methods improve one-year-ahead revenue forecasts when models use transparent, publicly observable information available at the forecast origin. It studies large-cap U.S. firms from 2013 to 2025 and evaluates out-of-sample forecasts from 2022 through 2025 under an expanding-window design.
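As a rough illustration of the expanding-window design, the sketch below generates the training and target years for each forecast origin. It assumes annual (fiscal-year) granularity and that 2022 through 2025 are the target years being forecast; the function name `expanding_window_splits` is hypothetical and not taken from the manuscript.

```python
def expanding_window_splits(years, first_test_year):
    """Yield (train_years, test_year) pairs for an expanding-window evaluation.

    Assumption: each target year is forecast using all earlier years only,
    so the training window grows by one year at every step.
    """
    ordered = sorted(set(years))
    for test_year in ordered:
        if test_year < first_test_year:
            continue
        # Only years strictly before the target year are visible to the model.
        train_years = [y for y in ordered if y < test_year]
        if train_years:
            yield train_years, test_year


# Sample period 2013-2025, out-of-sample target years 2022-2025.
splits = list(expanding_window_splits(range(2013, 2026), first_test_year=2022))
for train_years, test_year in splits:
    print(f"train {train_years[0]}-{train_years[-1]} -> forecast {test_year}")
```

Under these assumptions, the first forecast is trained on 2013 through 2021 and the last on 2013 through 2024, so no target-year information enters the training set.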

Random Forest and XGBoost are compared with naive growth, linear trend, AR(1), and multivariate OLS benchmarks. In the full sample, AR(1) remains the benchmark to beat, while the tree-based methods show their clearest value among firms with more volatile revenue, where model flexibility matters more than it does for stable revenue series.
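For readers unfamiliar with the simple benchmarks, the following is a minimal sketch of three of them for a single firm's annual revenue history. The exact specifications are assumptions, not the manuscript's: naive growth carries the last observed growth rate forward, the linear trend is an OLS line in time, and AR(1) is fit per firm on revenue levels with an intercept. All function names are hypothetical.

```python
def _ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def naive_growth_forecast(revenue):
    """Assume next year's growth rate equals the most recent one."""
    return revenue[-1] * (revenue[-1] / revenue[-2])


def linear_trend_forecast(revenue):
    """Fit revenue on a time index and extrapolate one step ahead."""
    t = list(range(len(revenue)))
    intercept, slope = _ols_line(t, revenue)
    return intercept + slope * len(revenue)


def ar1_forecast(revenue):
    """Regress revenue on its own one-year lag and iterate once."""
    intercept, slope = _ols_line(revenue[:-1], revenue[1:])
    return intercept + slope * revenue[-1]


# Illustrative (made-up) revenue history, oldest year first.
history = [10.0, 20.0, 30.0, 40.0]
trend = linear_trend_forecast(history)
ar1 = ar1_forecast(history)
naive = naive_growth_forecast(history)
```

Each function sees only the history up to the forecast origin, which matches the observable-information constraint described above.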

What this paper contributes.

01 · Benchmark

Uses strong simple comparators.

The paper compares machine learning models against naive, trend, autoregressive, and OLS benchmarks rather than against weak baselines alone.

02 · Design

Limits inputs to observable information.

Forecasts use data available at the forecast origin, keeping the design closer to practical forecasting conditions.

03 · Finding

Shows machine learning value is conditional.

Tree-based methods are most useful for firms with volatile revenues, while simple autoregressive models remain competitive for stable firms.

04 · Practice

Clarifies when complexity is warranted.

The results argue against adopting machine learning by default and support model choice based on revenue behavior.
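One way to operationalize model choice based on revenue behavior is to group firms by how volatile their revenue growth has been. The sketch below is illustrative only: it assumes volatility is measured as the sample standard deviation of year-over-year growth and that firms are split at the median, neither of which is confirmed by the manuscript. The firm labels and figures are made up.

```python
from statistics import median, stdev


def growth_volatility(revenue):
    """Sample standard deviation of year-over-year revenue growth."""
    growth = [curr / prev - 1.0 for prev, curr in zip(revenue, revenue[1:])]
    return stdev(growth)


def split_by_volatility(histories):
    """Split firms into (stable, volatile) groups at the median volatility."""
    vol = {firm: growth_volatility(rev) for firm, rev in histories.items()}
    cutoff = median(vol.values())
    stable = sorted(f for f, v in vol.items() if v <= cutoff)
    volatile = sorted(f for f, v in vol.items() if v > cutoff)
    return stable, volatile


# Hypothetical revenue histories, oldest year first.
histories = {
    "A": [100.0, 105.0, 110.0, 116.0],
    "B": [100.0, 150.0, 90.0, 160.0],
    "C": [100.0, 102.0, 104.0, 106.0],
    "D": [100.0, 60.0, 130.0, 80.0],
}
stable, volatile = split_by_volatility(histories)
```

Forecast errors could then be compared within each group, with the simpler autoregressive model as the default for the stable group and tree-based methods considered for the volatile one.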