**Yong Cao**

Institute of Computer Science and Engineering, University of Electronic Science and Technology of China

Qing-Xin Zhu

Institute of Computer Science and Engineering, University of Electronic Science and Technology of China

Institute of Computer Science and Engineering, University of Electronic Science and Technology of China

Qing-Xin Zhu

Institute of Computer Science and Engineering, University of Electronic Science and Technology of China

A forecasting method of software failure using fractals is proposed in this study. The empirical failure data (three data sets of Musa’s and one of NTDS) are used to demonstrate the performance of the reliability forecasting. Compared with other method, our method is very effective. It should be noticed that the analyses and research methods in this study are differ from the conventional methods in the past and a new idea for the research of the software failure mechanism is presented.

PDF Abstract XML References Citation

Yong Cao and Qing-Xin Zhu, 2010. The Software Reliability Forecasting Method Using Fractals. *Information Technology Journal, 9: 331-336.*

**DOI:** 10.3923/itj.2010.331.336

**URL:** https://scialert.net/abstract/?doi=itj.2010.331.336

Software reliability, namely the capability that a given component or system within a specified environment will operate correctly for a specified period of time, has been one of the most important qualities. Software-reliability models are used for the assessment and improvement of reliability in software systems and failure analysis is an important part of the research of software reliability. The important problem of the software reliability model is to calculate and predict the next failure time in advance. It was treated as random and statistical problem in the past and the modeling methods are usually based on probability statistics. The famous models are Jelinsky-Moranda based on time measurement, Littlewood and Goel-Okumoto etc.

A few papers have used time-series analysis in the construction of Software reliability models, by treating reliability growth data as an auto-regressive process. In literature (Lyu, 1996) a time series of software reliability data was assumed to arise from a power-law process

(1) |

where, T_{i} is the time to the ith failure, θ(i) is an unknown constant and δ_{i} is a random coefficient. By assuming that the T_{i}s and δ_{i}s are lognormally distributed, we can take the logarithm of both sides and obtain a first-order auto-regressive process.

A more complex time-series model based on an auto-regressive integrated moving average process is presented in literature (Khoshgoftaar and Szabo, 1995). This study also stands out because a small number of software complexity metrics are integrated into the model along with reliability growth data.

Recently, along with the development of statistical theory, Bayesian approach has been applied in forecasting software reliability. El-Aroui and Soler (1996) first employed Bayesian statistical model to predict software reliability. They assumed the successive times between software failures follow the exponential distribution. Pham and Zhang (2003) proposed a software reliability forecasting model based on NHPP approaches. The experimental results indicate that proposed models have better goodness-of-fit than other exist NHPP models. Su and Huang (2006) showed how to apply **neural networks** to predict software reliability. Further they made use of the **neural network** approach to build a Dynamic Weighted Combinational Model (DWCM). The study of Kanmani *et al*. (2007) showed the **neural network** models which were applied for predicting faults in object-oriented software to be performing much better than the statistical methods.

The term fractal, which is a geometrical concept and means broken or irregular fragments, was originally coined by Mandelbrot to describe a family of complex shapes that possess an inherent self-similarity or self-affinity in their geometrical structure. It was generally believed by mathematicians and scientists that such complex natural phenomena were almost beyond rigorous description. Fractals are mathematical or natural objects that are made of parts similar to the whole in some ways. A fractal has a self-similar structure that occurs at different scales. For example, a small branch of a tree looks like the whole tree due to the existence of branching structures.

The word Fractal describes a new system of mathematics and Fractal objects are Self-Similar under some change in scale, either strictly, or statistically. For Strictly Self-Similar Fractals do not change their appearance significantly when viewed under a microscope of arbitrary magnifying power, whereas for statistically self-similar fractals, when a small portion of it is magnified, results into a fractal, which is seemingly but not exactly similar to the original fractal itself. Fractal objects, by definition, contain infinite datai1 i.e., they contain the same degree of detail in each part as is contained in the entire object, no matter how many times sections of it are enlarged.

Fractals can be applicable in depicting natural complexities, which have been applied to biology, geophysics, solid physics, chemistry and astronomy etc. It is also well known that fractals have been applied to analyze and prediction of some random affairs like earthquakes. They are also used in random movements like random walk or Brownian motion so that we can discover the character of fractional dimension. Qin *et al*. (2006) proposed a burst detection algorithm over data streams through building monotonic search space based on fractals. Theoretical analysis and experimental results show that this algorithm can achieve a higher precision with less space and time complexity as compared with the existing methods. Durga *et al*. (2008) proposed a filter-based feature selection method based on Correlation Fractal Dimension (CFD) discrimination measure and successfully applied the proposed algorithm on a challenging classification problem in bioinformatics.

Software failures can also be treated as random events and fractal can be applied to software failure prediction. We may investigate the fractal relationship between the cumulative time of software failure and the accumulative number of software failure in time series.

**POWER LAW IN FRACTALS**

A power law is a relationship between two scalar variables x and y, which can be written as follows:

(2) |

where, C is the constant of proportionality and k is the exponent of the power law. Such a power law relationship shows as a straight line on a log-log plot since, taking logs of both sides, the above equation is equivalent to

(3) |

which has the same form as the equation for a straight line

(4) |

The equation f (x) = Cx^{k} has a property that relative scale change f (sx)/f(x) = s^{k} is independent of x. In this sense, f (x) is scale invariant or lacks a characteristic scale. Consequently, f (x) can be related to fractals because of its scale invariance. The k is called fractal dimension. For strictly self-similar fractals, two scalar variables x and y fit power law strictly described as Eq. 2, for statistically self-similar fractals, the x and y fit power law statistically and fractal dimension k is slope of linear regression on a log-log plot described as Eq. 2.

**THE SOFTWARE RELIABILITY FORECASTING METHOD USING FRACTALS**

Time series are able to characterize the prediction of software systems. The ith software failure can be described as pair (the ith failure, the ith failure time), namely the accumulative number of software failure and the cumulative time of corresponding software failure.

Fractals can be measured by dimensional measures, such as the Hausdorff dimension etc. The fractal dimension is often noninteger and smaller than the embedding topological dimension. Previously we always adopt scale variation method to analyze fractal dimension of data such as earthquakes etc. We select time section t as scale ε, divide the time section into some time subsections and take count of N (ε) which is the number of subsections the earthquake happen. Then we change the time section ε to obtain a new N (ε) and we repeat the above steps to obtain a series of ε-N (ε). We regard a ε-N (ε) pair of the series as a point in log-log coordinate and draw a log ε-log N (ε) graph to analyze the data. This method is inapplicable to analyzing the data of software failure time for its key is prediction of next failure time whereas the next failure time is non- determinate. There will make a lot of waste if we adopt scale variation method to analyze the data of software failure. According to Eq. 3 let y = N(r) = i which is the accumulative number of software failure and x = T_{i} which is the cumulative time of the ith software failure. The formula will be described as:

(5) |

Let d = 1/k. The Eq. 5 can be transformed into:

(6) |

(7) |

where, let s = C^{-d}. When software failures happen, if there exists fractal relationship between the cumulative time of software failure and the accumulative number of software failure, the time fractal dimension will be computed.

Fig. 1: | The double log coordinates of the cumulative time of software failure and the accumulative number of software failure log t -log i of Musa’s failure data set 1 |

(8) |

where, i_{j}, i_{k} denote the ordinal number of software failure and t_{ij}, t_{ik} denote the corresponding failure time.

At first, we compute double log coordinates of the cumulative time of software failure and the accumulative number of software failure log t-log i in Musa’s data set 1 (Musa *et al*., 1987), namely Appendix 1 (The results shown Fig. 1). The slope d of each beeline connects point pairs (3, 20), (3, 21), (3, 22),… (3, 80) is between 1.52-0.07 and 1.52+0.07. All points are almost in a beeline, where, d = 1/k and k is fractal dimension. It is observed easily that the there exists fractal relationship between the cumulative time of software failure and the accumulative number of software failure in part. Whether the relationship always exists need further experiments.

Prediction of scalar time-series {x (n)} refers to the task of finding an estimate (n+1)of the next future sample x (n + 1) based on the knowledge of the history of the time-series. Introducing a general nonlinear function f (.): R^{N}→R applied to the vector X (n) = [x (n), x (n-1),..., x (N-1))]^{T} of past samples, we arrive at the nonlinear forecasting approach:

(9) |

In this way, we adopt method of sliding window to compute. The prediction algorithm of sliding-window fractal model of software failure will be described as follow:

**Algorithm 1:**

Begin

Initialization: suppose the size of slide window m, k=1 and A is a array of the number of failure corresponding failure time;

Repeat for i=k to m+k-1 { | |

B(i)=log(A(i));/*the logarithm of practical failure time | |

in the slide window.*/ | |

C(i)=log(i);/*the logarithm of failure number in the | |

slide window.*/ | |

} |

• | According to Eq. 5 and method of linear regression, compute the slope of linear regression in the slide window b = d = 1/k and constant a = log(s) = -dlog (C) |

• | Make a prediction of next point out of the slide window using the above coefficients b and a |

• | Add the practical failure time of the next point to A; k++; /*the slide window move backwards.*/ |

Until test over

End

After software failure happens, maintenance personnel will repair software system, correct mistake and reliability of software will be changed. The fractal dimension k will change and d will also change. Apparently in the sliding window the double log coordinates of the cumulate time of software failure and the accumulative number of software failure log t-log i preserve linearity well. Although with the sliding of window the slope that we use linear regression to obtain will change, the linearity still exists. When window slides, the points in the anterior window and the points in the posterior window may not be in a beeline approximately, but the points in same window are in a beeline approximately. We can see that forecasting data almost fits actual data in Fig. 2. Next we will compare our method with other methods through experiments.

The forecasting algorithm and a one-step-ahead forecasting policy are applied in four examples. The software failure data are obtained from Musa’s data set 1 (Appendix 1), Musa’s data set 2 (Appendix 2), Musa’s data set 3 (Appendix 3) and NTDS data set (Appendix 4) (Musa *et al*., 1987; Singpurwalla and Soyer, 1985; Jelinsky and Moranda, 1972). The performance of the proposed model is compared with normal distribution (Singpurwalla and Soyer, 1985), Kalman filter (Singpurwalla and Soyer, 1985), adaptive Kalman filter (Singpurwalla and Soyer, 1985) and ARIMA (Khoshgoftaar and Szabo, 1995; Abdullah *et al*., 2008) forecasting methods.

Fig. 2: | Forecasting results of different models (ARIMA and Fractal) of Musa’s data set 1 (Appendix 1) of software failure (Sliding window size m = 5) |

Fig. 3: | Forecasting error of different models (Adaptive Kalman, ARIMA and Fractal) of Musa’s data set 1 (Appendix 2) of software failure (FFE stands for Fractal Forecasting Error, AKFE stands for Adaptive Kalman Forecasting Error and AFE stands for ARIMA Forecasting Error) |

The experimental results are shown in Fig. 2-5, Table 1 and 2. In Fig. 2 the forecasting data using fractals fit actual data better than ARIMA; in Fig. 3 85% of the forecasting error using fractals is less than 3%, which is better than ARIMA (58%) and Adaptive Kalman filter (73%); in Fig. 4, 82% of the forecasting error using fractals is less than 6%, which is better than ARIMA (62%) and Adaptive Kalman filter (69%); in Fig. 5, 79% of the forecasting error using fractals is less than 8%, which is better than ARIMA (48%) and Adaptive Kalman filter (65%). Obviously our method is effective. In the investigation, the values of mean absolute error:

Fig. 4: | Forecasting error of different models (Adaptive Kalman, ARIMA and Fractal) of Musa’s data set 2 (Appendix 3) of software failure (Sliding window size m = 3. FFE stands for Fractal Forecasting Error, AKFE stands for Adaptive Kalman Forecasting Error and AFE stands for ARIMA Forecasting Error) |

Fig. 5: | Forecasting error of different models (Adaptive Kalman, ARIMA and Fractal) of NTDS data set (Appendix 4) of software failure (sliding window size m = 3) |

Table 1: | Forecasting results of different models of Musa’s data set 1, 2 and NTDS data set |

Table 2: | Frocasting results of different models of Musa’s data set 3. Model I stands for normal distribution, Model II stands for Kalman filter and Model III stands for adaptive Kalman filter |

(10) |

Normal root mean square error

(11) |

where, T_{i} is the ith actual failure time and T_{i} is prediction time (Table 1). Similarly, mean absolute error of interval time

(12) |

where, n is the number of prediction periods, d_{i} is actual value of period i (actual interval time between the ith failure and (i+1)th failure) and is prediction value (Table 2).

Reliability is one of the most important qualities of software and the important problem of the software reliability model is to calculate and predict the next failure time in advance. This study analyzes the empirical failure data, discovers the fractal relationship between the cumulative time of software failure and the accumulative number of software failure and proposes the software reliability method using fractals to predict the next software failure time which almost fit the actual failure time. The analyses of empirical data prove our method effective in comparison with other methods. We will research the mechanism behind fractals further in future research.

The study described in this research was supported by a grant from the National Science Foundation of China (NSFC) (60671033) and the Doctoral Foundation of the Ministry of Education of China (2006061405).

**APPENDIX**

Appendix 1: | The Musa’s data set 1 of failure time series and from left to right the time in each cell denotes the cumulative time of the ith software failure, i = 1, 2, ……. |

Appendix 2: | The Musa’s data set 2 of software failure time series, and from left to right time in each cell denotes interval time between the ith failure and the (i+1)th failure, i = 1, 2, ……. |

Appendix 3: | The Musa’s data set 3 of software failure time series, and from left to right time in each cell denotes interval time between the ith failure and the (i+1)th failure, i = 1, 2, …… . |

Appendix 4: | The NTDS data set of software failure time series and from left to right the time in each cell denotes the cumulative time of the ith software failure, i = 1, 2, ……. |

- El-Aroui, M.A. and J.L. Soler, 1996. A Bayes nonparametric framework for software-reliability analysis. IEEE Trans. Reliab., 45: 652-660.

CrossRef - Singpurwalla, N.D. and R. Soyer, 1985. Assessing (software) reliability growth using a random coefficient autoregressive process and its ramifications. IEEE Trans. Softw. Eng., SE-11: 1456-1464.

CrossRef - Pham, H. and X. Zhang, 2003. NHPP software reliability and cost models with testing coverage. Eur. J. Opert. Res., 145: 443-454.

CrossRefDirect Link - Khoshgoftaar, T.M. and R.M. Szabo, 1995. Investigating ARIMA models of software system quality. Softw. Qual. J., 4: 33-48.

CrossRef - Su, Y.S. and C.Y. Huang, 2006. Neural-network-based approaches for software reliability estimation using dynamic weighted combinational models. J. Inform. Softw. Technol., 80: 606-615.

CrossRef - Qin, S.K., W.N. Qian and A.Y. Zhou, 2006. Fractal based algorithms for burst detection over data streams. J. Softw., 17: 1969-1979.

CrossRefDirect Link - Abdullah, S., M.D. Ibrahim, A. Zaharim and Z.M. Nopiah, 2008. Statistical analysis of a nonstationary fatigue data using the ARIMA approach. WSEAS Trans. Math., 7: 59-66.

Direct Link - Durga, B.S., Sobha, T. Rani and R.S. Bapi, 2008. Feature selection using correlation fractal dimension: Issues and applications in binary classification problems. J. Applied Soft Comput., 8: 555-563.

CrossRef - Kanmani, S., V.R. Uthariaraj, V. Sankaranarayanan and P. Thambidurai, 2006. Object-oriented software fault prediction using neural networks. Inform. Software Technol., 49: 483-492.

CrossRefDirect Link