| Title | Research on improved HyperGBM machine learning sales forecasting model based on weather factors |
| Author | FENG Gong; LI Jianbin; SHEN Boyu; LI Xinhong; GUAN Mengcheng; MEI Qihuang |
| Abstract | Volatile demand and high requirements for forecast accuracy have made sales forecasting a critical issue in both academia and industry. Forecast accuracy has a significant impact on a company's production activities and overall revenue. This paper studies sales forecasting in the beverage retail industry and proposes an improved HyperGBM model that incorporates weather factors. We compare the proposed model with the classic ARIMA (AutoRegressive Integrated Moving Average) model, the SARIMA (Seasonal ARIMA) model, and the Prophet model developed by Facebook, and analyze the influence of weather factors on forecasting performance. Based on the sales records and weather data of 72 SKUs (Stock Keeping Units) in Xi'an and Kunming, the study finds that the HyperGBM model outperforms traditional methods for most SKUs (63 out of 71), with average forecasting accuracy improved by 22.9%. Including weather factors further improves the forecasting accuracy of the HyperGBM model, by up to 30%. First, drawing on the literature review, this study explores how the periodicity and stability of historical sales data affect forecasting accuracy. From the perspective of seasonal cycles, the influence of external weather on product sales is taken into account: for instance, demand for cold beverages rises in hot weather and demand for hot beverages rises in cold weather. From the perspective of stability, the ability to handle abnormal fluctuations in time series data is a key indicator of a model's robustness. We therefore divide the sample data into four groups along these two dimensions. STL (Seasonal-Trend Decomposition using Loess) is used to test for periodic fluctuations in SKU sales, the ADF (Augmented Dickey-Fuller) test is used to evaluate the stability (stationarity) of the time series, and the IQR (interquartile range) test is applied to measure data dispersion and identify outliers. Second, in terms of model selection, this paper considers not only the traditional ARIMA and SARIMA models and the Prophet model developed by Facebook, but also the machine-learning-based HyperGBM model. An improved HyperGBM model that integrates external weather parameters is proposed. By feeding additional weather data into the original HyperGBM to assist forecasting, the improved model achieves a maximum improvement of 30% and an average improvement of 13.8% in forecasting accuracy over the original HyperGBM model. Third, regarding experiments and conclusions, we used 80% of the data for training and the remaining 20% for testing. Comparing the prediction results of the different models under the MSE (Mean Squared Error), RMSE (Root Mean Squared Error), and SMAPE (Symmetric Mean Absolute Percentage Error) metrics shows that the HyperGBM model outperformed traditional methods for most SKUs (63 out of 71), with an average improvement of 22.9% in RMSE. The study further explored the impact of weather factors and found that the HyperGBM model integrated with weather data improved prediction accuracy by up to 30% and by 13.8% on average. Different types of SKUs suit different prediction methods: for example, SKUs with strong periodicity are better suited to the SARIMA model, while SKUs with high stability are better suited to the HyperGBM algorithm. The Prophet model handles outliers well and, compared with other models (especially machine learning models), performs particularly well when there are sudden jumps in the dataset. In conclusion, this paper explores sales forecasting in the highly competitive beverage market. By employing six different sales forecasting methods, we investigate which forecasting algorithms are suitable for beverages with different sales characteristics. This paper also integrates the impact of weather factors on beverage sales into machine learning forecasting algorithms to explore how weather factors influence the forecasting accuracy of these algorithms, thereby providing practical guidance and references for sales forecasting in the beverage industry. |
| Keywords | Sales forecasting; machine learning; weather factors; HyperGBM. |
| Issue | Vol. 40, No. 2, 2026 |
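
The evaluation protocol described in the abstract (an 80/20 train/test split, scoring with MSE, RMSE and SMAPE, and IQR-based outlier screening) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the chronological (unshuffled) split, the SMAPE variant (mean of |a - p| / ((|a| + |p|) / 2), in percent), the 1.5 x IQR fence, and the toy sales figures are all assumptions that the abstract does not specify.

```python
import math
import statistics


def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence without shuffling (assumed protocol)."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def smape(actual, predicted):
    """Symmetric MAPE in percent; pairs where both values are 0 contribute 0."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        if denom > 0:
            total += abs(a - p) / denom
    return 100 * total / len(actual)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical daily sales for one SKU (illustrative numbers only).
    sales = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]
    train, test = chronological_split(sales)
    # Naive baseline: repeat the last training value for every test step.
    forecast = [train[-1]] * len(test)
    print("RMSE:", rmse(test, forecast))
    print("SMAPE (%):", smape(test, forecast))
    print("Outliers:", iqr_outliers(sales))
```

Any of the compared models could stand in for the naive baseline here; the point is only that every model is scored on the same held-out 20% with the same metrics.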