
Journal of Applied Sciences

Year: 2009 | Volume: 9 | Issue: 3 | Page No.: 521-527
DOI: 10.3923/jas.2009.521.527
Supply Chain Demand Forecasting; A Comparison of Machine Learning Techniques and Traditional Methods
J. Shahrabi, S. S. Mousavi and M. Heydar

Abstract: In this study, supply chain demand is forecasted with different methods and their results are compared. Traditional time series forecasting methods, including moving average, exponential smoothing and exponential smoothing with trend, are applied first, followed by two machine learning techniques, Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), to forecast the long-term demand of the supply chain. The research is implemented using the data set of a component supplier of the biggest Iranian car company. The comparison reveals that the results produced by the machine learning techniques are more accurate and much closer to the actual data than those of the traditional forecasting methods.


How to cite this article
J. Shahrabi, S. S. Mousavi and M. Heydar, 2009. Supply Chain Demand Forecasting; A Comparison of Machine Learning Techniques and Traditional Methods. Journal of Applied Sciences, 9: 521-527.

Keywords: Artificial neural networks, support vector machine, demand forecasting and bullwhip effect

INTRODUCTION

This study is concerned with forecasting the demand of an Iranian car-components supplier company, a time series with trend and seasonal patterns.

Some machine learning techniques, including artificial neural networks and support vector machines, are compared to the more traditional time series forecasting methods, including moving average and exponential smoothing with and without trend, using the MAPE (Mean Absolute Percentage Error) index. These methods are chosen because of their ability to model the trend and seasonal fluctuations present in suppliers' data. The objectives of this study are two-fold: (1) to show how to forecast wholesaler sales using ANNs and SVMs and (2) to show how various time series forecasting methods compare in their accuracy when forecasting wholesaler sales.

The reasons for this study are both theoretical and practical. Theoretically speaking, how to improve the quality of forecasts is still an outstanding question (Granger, 1996). For data containing trend and/or seasonal patterns, failure to account for these patterns may result in poor forecasts. Over the last few decades, several methods such as moving average, Holt's method, Winters exponential smoothing, the Box-Jenkins ARIMA model and multivariate regression have been proposed and widely used to account for these patterns. The ANN is a new contender in forecasting sophisticated trend and seasonal data. Franses and Draisma (1997) suggested that ANNs be used to investigate how and when seasonal patterns change over time.

Industry forecasts are especially useful to wholesalers who may have a greater market share. For supply chains, Peterson (1993) showed that larger retailers or wholesalers are more likely to use time-series methods and prepare industry forecasts, while smaller retailers emphasize judgmental methods and company forecasts. Better forecasts of wholesaler sales can improve the forecasts of individual retailers because changes in their sales levels are often systematic. For example, around the New Year, the sales of most retailers increase. Moreover, models for forecasting individual store sales will often include assumptions about industry-wide sales and market share.

Indeed, accurate forecasts of wholesalers’ sales have the potential to improve individual stores’ sales forecasts, especially for larger retailers who may have a significant market share.

In the past decade, ANNs have emerged as a technology with great promise for identifying and modeling data patterns that are not easily discernible by traditional statistical methods in many fields of science, such as computer science, electrical engineering and finance. Qi and Maddala (1999) and Qi (1999) showed that many studies in the finance literature evidencing predictability of stock returns by means of linear regression can be improved by a neural network. Qi (1996) provided a comprehensive survey of ANN applications in finance.

ANNs have also been increasingly used in management, marketing and retailing. The reader is referred to Krycha and Wagner (1999) for a comprehensive survey of ANN applications in management and marketing. Zhang et al. (1998) provided a comprehensive review of ANNs’ usage in forecasting.

Jeong et al. (2002) presented a computerized system for implementing the forecasting activities required in SCM. To build a generic forecasting model applicable to SCM, a linear causal forecasting model was proposed and its coefficients were efficiently determined using the proposed genetic algorithms (GA): canonical GA and guided GA (GGA). Compared to canonical GA, GGA adopts a fitness function with penalty operators and uses a Population Diversity Index (PDI) to overcome premature convergence of the algorithm. The results obtained from two case studies show that the proposed GGA provides the best forecasting accuracy and greatly outperforms the regression analysis and canonical GA methods.

Willemain et al. (2004) forecasted the cumulative distribution of intermittent (or irregular) demand, i.e., random demand with a large proportion of zero values, over a fixed lead time using a new type of time series bootstrap. To assess accuracy in forecasting an entire distribution, they adapted the probability integral transformation to intermittent demand. Using nine large industrial datasets, they showed that the bootstrapping method produces more accurate forecasts of the distribution of demand over a fixed lead time than exponential smoothing and Croston's method (Croston, 1972).

Hilas et al. (2006) used forecasting models for the monthly outgoing telephone calls on a university campus. In their study, three different forecasting methods, namely seasonal decomposition, exponential smoothing and the SARIMA method, were applied. Forecasts with 95% confidence intervals were calculated for each method and compared with the actual data.

Hill et al. (1996) showed that ANNs significantly outperform traditional methods when forecasting quarterly and monthly data. Although, theoretically speaking, ANNs may improve on traditional time-series methods in forecasting a series with trend and seasonal patterns, Nelson et al. (1994) found that ANNs do not model the seasonal fluctuations in the data very well. Foster et al. (1992) found that exponential smoothing is superior to ANNs in forecasting yearly data and comparable in forecasting quarterly data; monthly data were not used in their study. The Winters exponential smoothing model, in particular, has been found to provide superior forecasts in a variety of contexts. Dugan et al. (1994) showed that the Winters model can outperform both the Census X-11 and random walk models in predicting a variety of income statement items (i.e., sales, earnings before interest and taxes, interest expenses, earnings before taxes, tax expenses and earnings before extraordinary items). This result was derived from a 15-year sample (1971-1985) of 127 manufacturing and retailing firms. Alon et al. (2001) compared artificial neural networks with traditional methods, including Winters exponential smoothing, the Box-Jenkins ARIMA model and multivariate regression. The results indicated that, on average, ANNs fare favorably in relation to the more traditional statistical methods, followed by the Box-Jenkins model. Derivative analysis showed that the neural network model is able to capture the dynamic nonlinear trend and seasonal patterns, as well as the interactions between them.

In the 1980s, the overall impression was that for immediate and short-term forecasts ARIMA models provide more accurate forecasts than other econometric models (O’Donovan, 1983). This perception was reaffirmed more recently, when Dugan et al. (1994) showed that the ARIMA model forecasted income statement items more accurately than the Census X-11 and random walk models.

For time series with a long history, Box-Jenkins and ANNs have provided comparable results (Sharda and Patil, 1992; Tang et al., 1990). The ARIMA model is the same as or superior to ANNs in terms of MAPE in a variety of applications. Despite its old tradition, the Box-Jenkins approach remains a formidable competitor in the forecasting area.

Support Vector Machines (SVM), a more recent learning algorithm that has been developed from statistical learning theory (Vapnik, 1995; Vapnik et al., 1997), has a very strong mathematical foundation and has been previously applied to time series analysis (Mukherjee et al., 1997; Ruping and Morik, 2003).

We use two machine learning techniques, Artificial Neural Networks (ANN) and Support Vector Machines (SVM), for forecasting long-term demand. As benchmarks for comparison with the machine learning techniques, we forecast the same data with traditional time series forecasting methods, including moving average, exponential smoothing and exponential smoothing with trend.

At the end of this study, the robustness of the presented forecasting methods is examined using a new data set. It is shown that machine learning techniques can forecast more accurately than traditional time series forecasting methods.

ARTIFICIAL NEURAL NETWORKS

Artificial Neural Networks (ANNs) are a class of generalized nonlinear nonparametric models inspired by studies of the brain and nervous system. The comparative advantage of ANNs over more conventional econometric models is that they can model complex, possibly nonlinear relationships without any prior assumptions about the underlying data-generating process (Hornik et al., 1989, 1990; White, 1990). The data-driven nature of ANNs makes them appealing in time series modeling and forecasting. ANN models overcome the limitations of traditional forecasting methods, including misspecification, biased outliers, the assumption of linearity and re-estimation (Hill et al., 1996). ANNs have been shown to be universal approximators, a property which makes them attractive in most forecasting applications. In addition, ANNs are more parsimonious than linear subspace methods such as polynomial and trigonometric series in approximating unknown functions.

Despite the many desirable features of ANNs, constructing a good network for a particular application is a non-trivial task. It involves choosing an appropriate architecture (the number of layers, the number of units in each layer and the connections among units), selecting the transfer functions of the middle and output units, designing a training algorithm, choosing initial weights and specifying the stopping rule.

It is widely accepted that a three-layer feed-forward network with an identity transfer function in the output unit and logistic functions in the middle-layer units can approximate any continuous function arbitrarily well, given a sufficient number of middle-layer units (White, 1990). Thus, the network used in this research is the three-layer feed-forward one shown in Fig. 1.

The inputs (similar to the regressors used in the multivariate regression model) are connected to the output (similar to the regressand) via a middle layer. The network model can be specified as:

$$S_t = \alpha_t' F(\beta_t X)$$

where, $S_t$ is the wholesaler sales at time t; X is a vector of regressors that are exactly the same as the ones used in the multivariate regressions; n is the number of units in the middle layer; F is a logistic transfer function applied elementwise, $F(a) = 1/(1 + \exp(-a))$; $\alpha_t$ represents a vector of coefficients (or weights) from the middle to output-layer units and $\beta_t$ represents a matrix of coefficients from the input to middle-layer units at time t. The details of the specification and estimation of our ANN model are given below:

Fig. 1: A three-layer feedforward neural network

Fig. 2: MAPE index change trend graph

Initial parameter values: The initial values of αt and βt are generated using a uniform distribution. Because the data set contains a large number of records, it is not critical to use a sophisticated method to generate the initial values of αt and βt
Training algorithm: The ANN network is trained using the Levenberg-Marquardt’s algorithm which has been found to be the fastest method for training moderate-sized feed-forward neural networks of up to several hundred weights (Demuth and Beale, 1997)
Number of middle-layer units

Although Bayesian regularization could be used in the training algorithm to measure how many network parameters are being effectively used by the network, regardless of the total number of parameters (MacKay, 1992), we instead experimented with several different numbers of middle-layer units and numbers of layers and found that the best configuration sets both the number of middle-layer units and the number of layers to three. The MAPE index for each case is shown in Fig. 2.
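As an illustrative sketch only (the study's own implementation used Levenberg-Marquardt training, which scikit-learn does not offer, so the quasi-Newton LBFGS solver stands in), a comparable three-layer network with three logistic middle-layer units can be fit as follows; the synthetic demand series and the lagged-sales regressors are assumptions for demonstration:

```python
# Illustrative sketch of the three-layer feed-forward ANN described above.
# Assumptions: a synthetic monthly demand series and lagged sales as regressors.
# scikit-learn's MLPRegressor (here with the LBFGS solver) stands in for the
# paper's Levenberg-Marquardt training, which scikit-learn does not provide.
import numpy as np
from sklearn.neural_network import MLPRegressor

def mape(actual, forecast):
    """Mean Absolute Percentage Error, the accuracy index used in this study."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

rng = np.random.default_rng(0)
t = np.arange(120)
# Synthetic demand with trend and yearly seasonality (assumption for the demo).
demand = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, t.size)

# Build lagged regressors X_t = (S_{t-12}, ..., S_{t-1}) to predict S_t.
lags = 12
X = np.column_stack([demand[i:i + t.size - lags] for i in range(lags)])
y = demand[lags:]
X_train, X_test, y_train, y_test = X[:-24], X[-24:], y[:-24], y[-24:]

# Three-layer network: inputs, one middle layer of 3 logistic units and an
# identity output unit, matching the architecture chosen in the paper.
ann = MLPRegressor(hidden_layer_sizes=(3,), activation="logistic",
                   solver="lbfgs", max_iter=5000, random_state=0)
ann.fit(X_train, y_train)
print("ANN MAPE on hold-out: %.2f%%" % mape(y_test, ann.predict(X_test)))
```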

SUPPORT VECTOR MACHINES

Support Vector Machines (SVMs) are a newer type of universal function approximator based on the structural risk minimization principle from statistical learning theory (Vapnik, 1995), as opposed to the empirical risk minimization principle on which neural networks and linear regression, to name a few, are based. The objective of structural risk minimization is to reduce the true error on unseen, randomly selected test examples, whereas NN and MLR minimize the error on the currently seen examples. Support vector machines project the data into a higher-dimensional space and maximize the margins between classes or minimize the error margin for regression. Margins are soft, meaning that a solution can be found even if there are contradictory examples in the training set. A complexity parameter permits a trade-off between the number of errors and model complexity, and different kernels, such as the Radial Basis Function (RBF) kernel, can be used to permit nonlinear mapping into the higher-dimensional space.

The technique corresponds to minimization of the following objective function:

$$\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{\ell} |y_i - f(x_i)|_{\varepsilon}$$

where:

$$|y - f(x)|_{\varepsilon} = \max\big(0,\, |y - f(x)| - \varepsilon\big)$$

is Vapnik’s ε-Insensitive Loss Function (ILF). An important point is that this function assigns zero loss to errors less than ε, thus safeguarding against over-fitting. In other words, this function does not fit a crisp value but instead fits a tube with radius ε to the data. This is similar to a fuzzy description of the function. The other important aspect of this loss function is that it minimizes a least modulus rather than least squares. The ε parameter also plays an important role by providing a sparse representation of the data, as we shall see later.
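To make the loss concrete, a minimal implementation follows; the numbers are arbitrary illustrative values:

```python
import numpy as np

def eps_insensitive_loss(y, f, eps):
    """Vapnik's epsilon-insensitive loss: zero inside the tube of radius eps,
    linear (least modulus, not least squares) outside it."""
    return np.maximum(0.0, np.abs(np.asarray(y) - np.asarray(f)) - eps)

# Errors of 0.5 and 0.8 fall inside an eps = 1.0 tube and cost nothing;
# an error of 3.0 costs 3.0 - 1.0 = 2.0.
print(eps_insensitive_loss([10.5, 9.2, 13.0], [10.0, 10.0, 10.0], eps=1.0))
```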

Vapnik (1998) showed that the minimizer of the objective function under very general conditions can be written as:

$$f(x) = \sum_{i=1}^{\ell} c_i K(x, x_i) + b$$

where the $c_i$ are the solution of a quadratic problem and $K(x, x_i)$ is the so-called kernel function that defines the generalized inner product; it is a commonly used tool to perform nonlinear mapping. Several choices for the kernel function are available, such as Gaussian, sigmoid, polynomial and splines. The only quantities determined by the data are the coefficients $c_i$, which are obtained by maximizing the following quadratic form:

$$W(c) = \sum_{i=1}^{\ell} y_i c_i - \varepsilon \sum_{i=1}^{\ell} |c_i| - \frac{1}{2}\sum_{i=1}^{\ell}\sum_{j=1}^{\ell} c_i c_j K(x_i, x_j)$$

subject to the constraints:

$$\sum_{i=1}^{\ell} c_i = 0, \qquad -C \le c_i \le C, \quad i = 1, \ldots, \ell$$

where, parameters C and ε are regularization parameters that control the flexibility or complexity of the SVM. These parameters should be selected by the user using resampling or other standard techniques. It should be emphasized, however, that in contrast to the classical regularization techniques, a clear theoretical understanding of C and ε is still missing and remains a subject of theoretical as well as experimental efforts.
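As a minimal sketch of such a resampling-based selection (the grid values, synthetic data and use of scikit-learn's SVR are illustrative assumptions, not the study's procedure):

```python
# Illustrative sketch of selecting C and epsilon by resampling, as suggested
# above. The grid values, synthetic data and scikit-learn SVR are assumptions.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0, 0.1, 100)

search = GridSearchCV(
    SVR(kernel="linear"),
    param_grid={"C": [0.1, 1, 10, 100], "epsilon": [0.01, 0.1, 0.5]},
    cv=TimeSeriesSplit(n_splits=5),  # resampling that respects temporal order
    scoring="neg_mean_absolute_percentage_error",
)
search.fit(X, y)
print("best C and epsilon:", search.best_params_)
```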

Here, an appropriate kernel function is sought so that the SVM can forecast with the least error. To this end, four different kernel functions, namely linear, polynomial, RBF and sigmoid, are examined. The errors of these forecasts are measured using the MAPE index.

Based on the results presented in Fig. 3, the best type of kernel function for this kind of data is linear. The outputs of the different kernel functions and their solution times are shown in Fig. 3 and 4, respectively.
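A minimal sketch of such a kernel comparison is shown below; the synthetic data, the fixed C and ε values, and the feature scaling are assumptions for illustration:

```python
# Illustrative sketch of comparing the four kernels by hold-out MAPE.
# The synthetic data, fixed C and epsilon, and feature scaling are assumptions.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
y = 50 + X @ np.array([3.0, -1.0, 2.0, 0.5]) + rng.normal(0, 0.5, 150)
X_train, X_test, y_train, y_test = X[:120], X[120:], y[:120], y[120:]

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel, C=10.0, epsilon=0.1))
    model.fit(X_train, y_train)
    err = 100 * mean_absolute_percentage_error(y_test, model.predict(X_test))
    print(f"{kernel:8s} MAPE = {err:6.2f}%")
```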

Fig. 3: Output of different kernel functions

Fig. 4: Solution time of different kernel functions

BENCHMARK STATISTICAL METHODS

Moving average: This method takes the average of the n previous periods as the forecast for the next period. The problem is determining the optimum value of n. In this research, we considered a range of values for n and then computed the MAPE index in order to select the best value of n.

We found n = 150 to be the optimum value, yielding MAPE = 167.7526. The result of using this method is shown in Table 1.
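A minimal sketch of this scan over n, on an assumed synthetic demand series, might look as follows:

```python
# Illustrative sketch of the n-period moving-average forecast and the MAPE
# scan used to pick n; the demand series here is a synthetic assumption.
import numpy as np

def moving_average_mape(demand, n):
    """Forecast each period as the mean of the previous n periods; return MAPE."""
    demand = np.asarray(demand, dtype=float)
    forecasts = np.array([demand[t - n:t].mean() for t in range(n, demand.size)])
    actual = demand[n:]
    return 100.0 * np.mean(np.abs((actual - forecasts) / actual))

rng = np.random.default_rng(3)
t = np.arange(400)
demand = 200 + 0.4 * t + 15 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, t.size)

candidates = range(5, 200, 5)
best_n = min(candidates, key=lambda n: moving_average_mape(demand, n))
print("best n:", best_n, "MAPE: %.3f" % moving_average_mape(demand, best_n))
```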

Exponential smoothing: These models use a weighted average of past values, in which the weights decline geometrically over time to suppress short-term fluctuations in the data. The following formula is used to forecast:

$$F_{t+1} = \alpha A_t + (1 - \alpha) F_t$$

Where:
F_{t+1} = Forecasted demand at time t+1
A_t = Real demand at time t
α = Smoothing constant, 0 ≤ α ≤ 1

Applying this method to the data set, the demand was forecasted and the best value of α was found to be 0.010.

With this value, the forecasting error was measured, giving a MAPE of 167.797 for the best combination (Table 2).
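A minimal sketch of this smoothing recursion and the scan over α, on an assumed synthetic series, follows:

```python
# Illustrative sketch of simple exponential smoothing with a grid scan over
# alpha. The synthetic demand series is an assumption for demonstration.
import numpy as np

def ses_mape(demand, alpha):
    """One-step forecasts via F_{t+1} = alpha*A_t + (1 - alpha)*F_t; returns MAPE."""
    demand = np.asarray(demand, dtype=float)
    f = demand[0]  # initialize the forecast with the first observation
    forecasts = []
    for a in demand[:-1]:
        f = alpha * a + (1 - alpha) * f
        forecasts.append(f)
    actual = demand[1:]
    return 100.0 * np.mean(np.abs((actual - np.array(forecasts)) / actual))

rng = np.random.default_rng(4)
t = np.arange(300)
demand = 150 + 0.2 * t + rng.normal(0, 8, t.size)

alphas = np.arange(0.01, 1.0, 0.01)
best = min(alphas, key=lambda a: ses_mape(demand, a))
print("best alpha: %.2f, MAPE: %.3f" % (best, ses_mape(demand, best)))
```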

Table 1: Result of using moving average technique


Table 2: Result of using exponential smoothing technique

Exponential smoothing with trend: The simple exponential smoothing method may provide an adequate forecast if no trend, seasonal or cyclical effects exist. If a trend is present, the method can be extended to adjust two variables: the average level and the trend level. Assuming the existence of a trend in the data series, the following formulas are used and the MAPE index is then measured in order to determine the optimal α and β:

$$F_{t+1} = FIT_t + \alpha (A_t - FIT_t)$$

$$T_{t+1} = T_t + \beta (F_{t+1} - FIT_t)$$

$$FIT_{t+1} = F_{t+1} + T_{t+1}$$

Where:
T_{t+1} = Demand trend at period t+1
F_{t+1} = Forecasted demand at time t+1
A_t = Real demand at time t
FIT_{t+1} = Forecasted demand with trend consideration at period t+1

Using this method, the best values for α and β were found to be 0.02 and 0.05, respectively. With this combination, the forecasting error was 170.081 as measured by the MAPE index (Table 3).
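A minimal sketch of these update equations, on an assumed synthetic trending series, follows:

```python
# Illustrative sketch of exponential smoothing with trend, following the
# update equations above. Initial level and trend values are assumptions.
import numpy as np

def holt_mape(demand, alpha, beta):
    """One-step trend-adjusted forecasts; returns the MAPE over the series."""
    demand = np.asarray(demand, dtype=float)
    f, trend = demand[0], 0.0
    fit = f + trend                           # FIT_t = F_t + T_t
    forecasts = []
    for a in demand[:-1]:
        f_new = fit + alpha * (a - fit)       # F_{t+1} = FIT_t + alpha*(A_t - FIT_t)
        trend = trend + beta * (f_new - fit)  # T_{t+1} = T_t + beta*(F_{t+1} - FIT_t)
        fit = f_new + trend                   # FIT_{t+1} = F_{t+1} + T_{t+1}
        forecasts.append(fit)
    actual = demand[1:]
    return 100.0 * np.mean(np.abs((actual - np.array(forecasts)) / actual))

rng = np.random.default_rng(5)
t = np.arange(300)
demand = 120 + 0.8 * t + rng.normal(0, 10, t.size)
print("MAPE (alpha=0.02, beta=0.05): %.3f" % holt_mape(demand, 0.02, 0.05))
```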

Comparisons and model validity: In order to assess the stability of the proposed methods, they are tested on a new raw data set and the MAPE index is calculated for their results again. The lowest MAPE obtained by each method is shown in Table 4. The results show that the ANN proposed in this research can be applied more efficiently than the classic methods for forecasting demand in a supply chain.

Table 3: Result of using exponential smoothing with trend technique

Table 4: Comparison and models validity by MAPE index

CONCLUSIONS

In this study, machine learning techniques, namely artificial neural networks and support vector machines, are used to forecast demand in a supply chain. The process consisted of two steps. In the first step, an artificial neural network with three layers and three middle-layer units was trained using sensitivity analysis; four different kernel functions were then tested to find the best kernel function and parameter combination for the SVM algorithm; finally, three traditional forecasting methods were applied to the same data. Forecasting errors were measured using the MAPE index for all methods. The results showed that the artificial neural network forecasts more precisely than the other methods. The best parameter combination of each method was then used for comparison and model validation in the second step, in which the efficiency of the proposed models was measured on raw data. The results again showed that the artificial neural network forecasts more precisely than the traditional forecasting methods and the SVM.

The rank of each method in both the training and testing steps shows that the results of the SVM method improved, so that it ranks second, after the ANN method. This finding suggests that the structure of the data in the testing set has probably changed, so that the linear kernel function could forecast better than before.

REFERENCES

  • Alon, I., Q. Min and R. Sadowski, 2001. Forecasting aggregate retail sales: A comparison of artificial neural networks and traditional method. J. Retailing Consumer Serv., 8: 147-156.
    Direct Link    


  • Croston, J.D., 1972. Forecasting and stock control for intermittent demands. Operat. Res. Q., 23: 289-303.
    CrossRef    Direct Link    


  • Demuth, H. and M. Beale, 1997. Neural Network Toolbox User's Guide, Version 3.0. The MathWorks Inc., pp: 5-35.


  • Dugan, M., K.A. Shriver and A. Peter, 1994. How to forecast income statement items for auditing purposes. J. Business Forecast., 13: 22-26.
    Direct Link    


  • Foster, B., F. Collopy and L. Ungar, 1992. Neural network forecasting of short, noisy time series. Comput. Chemical Eng., 16: 293-297.
    CrossRef    Direct Link    


  • Franses, P.H. and G. Draisma, 1997. Recognizing changing seasonal patterns using artificial neural networks. J. Econometrics, 81: 273-280.
    Direct Link    


  • Granger, C.W.J., 1996. Can we improve the perceived quality of economic forecasts. J. Applied Econometrics, 11: 455-473.
    Direct Link    


  • Hilas, C.S., S.K. Goudos and J.N. Sahalos, 2006. Seasonal decomposition and forecasting of telecommunication data: A comparative case study. Technological Forecast. Social Change, 73: 495-509.
    Direct Link    


  • Hill, T., M. O'Connor and W. Remus, 1996. Neural network models for time series forecasts. Manage. Sci., 42: 1082-1092.
    Direct Link    


  • Hornik, K., M. Stinchcombe and H. White, 1989. Multilayer feedforward networks are universal approximators. Neural Networks, 2: 359-366.
    CrossRef    


  • Hornik, K., M. Stinchcombe and H. White, 1990. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Network, 3: 551-560.
    Direct Link    


  • Jeong, B., H. Jung and N. Park, 2002. A computerized causal forecasting system using genetic algorithms in supply chain management. J. Syst. Software, 60: 233-237.
    Direct Link    


  • Krycha, K.A. and U. Wagner, 1999. Applications of artificial neural networks in management science: A survey. J. Retailing Consumer Serv., 6: 185-203.
    CrossRef    Direct Link    


  • MacKay, D.J.C., 1992. Bayesian interpolation. Neural Computation, 4: 415-447.
    Direct Link    


  • Mukherjee, S., E. Osuna and F. Girosi, 1997. Nonlinear prediction of chaotic time series using support vector machines. Paper presented at the Proceedings of IEEE NNSP.


  • Nelson, M., T. Hill, B. Remus and M. O'Connor, 1994. Can neural networks applied to time series forecasting learn seasonal patterns: An empirical investigation. Proceedings of the 27th Annual Hawaii International Conference on System Sciences, January 4-7, 1994, Information Systems: Decision Support and Knowledge-Based Systems, pp: 649-655.


  • O'Donovan, T.M., 1983. Short Term Forecasting: An Introduction to the Box-Jenkins Approach. 1st Edn., Wiley, New York


  • Peterson, R.T., 1993. Forecasting practices in the retail industry. J. Business Forecast., 12: 11-14.
    Direct Link    


  • Qi, M., 1996. Financial Applications of Artificial Neural Networks. In: Handbook of Statistics, Statistical Methods in Finance, Maddala, G.S., Rao, C.R. (Eds.). North-Holland, Elsevier Science Publishers, Amsterdam, pp: 529-552


  • Qi, M., 1999. Nonlinear predictability of stock returns using financial and economic variables. J. Bus. Econ. Statist., 17: 419-429.
    Direct Link    


  • Qi, M. and G.S. Maddala, 1999. Economic factors and the stock market: A new perspective. J. Forecast., 18: 151-166.
    CrossRef    


  • Ruping, S. and K. Morik, 2003. Support vector machines and learning about time. Paper presented at the Proceedings of ICASSP.


  • Sharda, R. and R. Patil, 1992. Connectionist approach to time series prediction: an empirical test. J. Intelligent Manufact., 3: 317-323.
    Direct Link    


  • Tang, Z., C. De Almeida and P. Fishwick, 1990. Time series forecasting using neural networks vs. Box-Jenkins methodology. Simulation, 57: 303-310.
    Direct Link    


  • Vapnik, V.N., 1995. The Nature of Statistical Learning Theory. 1st Edn., Springer-Verlag, New York, USA


  • Vapnik, V.N., 1998. Statistical Learning Theory. 1st Edn., John Wiley and Sons, New York


  • Vapnik, V., S.E. Golowich and A. Smola, 1997. Support vector method for function approximation, regression estimation, and signal processing. Adv. Neural Inform. Process. Syst., 9: 287-291.
    Direct Link    


  • White, H., 1990. Connectionist nonparametric regression: Multilayer feedforward networks can learn arbitrary mappings. Neural Networks, 3: 535-549.
    CrossRef    Direct Link    


  • Willemain, T.R., C.N. Smart and H.F. Schwarz, 2004. A new approach to forecasting intermittent demand for service parts inventories. Int. J. Forecast., 20: 375-387.
    CrossRef    Direct Link    


  • Zhang, G., B.E. Patuwo and M.Y. Hu, 1998. Forecasting with artificial neural networks: The state of the art. Int. J. Forecast., 14: 35-62.
    CrossRef    Direct Link    
