
Information Technology Journal

Year: 2009 | Volume: 8 | Issue: 5 | Page No.: 735-742
DOI: 10.3923/itj.2009.735.742
Modeling of Software Fault Detection and Correction Processes Based on the Correction Lag
Yanjun Shu, Hongwei Liu, Zhibo Wu and Xiaozong Yang

Abstract: This study presents a software reliability growth model integrating the fault detection process with the fault correction process. Although a few research projects have been devoted to the modeling of these two processes, most of them studied the correction lag from a theoretical viewpoint of time delay. In this study, the correction lag is characterized by the remaining uncorrected faults, which can be clearly observed from the actual data. Through analyzing its varying trend, the Gamma curve is found to be appropriate for representing the correction lag function. Then, the proposed model is derived. Two real data sets of software testing are used to evaluate the models. Experimental results indicate that the proposed model not only performs better than other models on both fault detection and correction processes, but also describes the correction lag better. Finally, a revised software cost model is presented based on the proposed model. An analysis of the determination of software release time shows that the new cost model is more practical than the traditional approach.


How to cite this article
Yanjun Shu, Hongwei Liu, Zhibo Wu and Xiaozong Yang, 2009. Modeling of Software Fault Detection and Correction Processes Based on the Correction Lag. Information Technology Journal, 8: 735-742.

Keywords: Fault detection process, correction lag, software reliability growth model, fault correction process and software release time

INTRODUCTION

As one of the most important aspects of software quality, software reliability is often defined as the probability of failure-free software operation for a specified period of time in a specified environment (Musa et al., 1987; Lyu, 1996; Gokhale et al., 2004; Williams, 2006). Many Software Reliability Growth Models (SRGMs) have been developed to describe the error-detection process and estimate the reliability growth of products during the software development process (Musa et al., 1987; Lyu, 1996; Pham and Zhang, 1999; Zhao et al., 2006). Most existing SRGMs consider only the software fault detection process in the testing stage, assuming immediate fault correction (Schneidewind, 2001; Xie et al., 2007; Huang and Huang, 2008). However, in practice, each detected fault is reported, located and then corrected. The software fault correction process plays a significant role in improving software reliability and ensuring software quality (Schneidewind, 2003; Huang and Lin, 2006). Thus, the fault correction process cannot be simply treated as an immediately achieved process. In addition to predicting and assessing software failure occurrence, it is also meaningful to analyze and model the software fault correction process.

Recently, some researchers have emphasized the great importance of fault correction process modeling. Since the fault correction process usually lags behind the fault detection process, the correction lag is a common phenomenon in software testing. Schneidewind (2001, 2003) first modeled the fault correction process as a fault detection process with a constant delay. Within this framework, some extensions have been proposed by substituting various time-dependent delay functions for the constant delay (Huang and Lin, 2006; Lo, 2007; Xie et al., 2007). The introduction of time-delay functions is very useful for separating the fault correction from the fault detection and can help people find more useful models. Wu et al. (2007) summarized several kinds of time-delay distribution functions for constructing fault correction process models. Following the same idea of time delay, some other attempts have also been made to model the fault detection and correction processes. For instance, Huang and Huang (2008) incorporated server queuing models into fault correction modeling by considering imperfect debugging; Gokhale et al. (2004, 2006) modeled the fault detection and correction processes with a non-homogeneous Markov chain. Lo and Huang (2006) proposed a framework to integrate the two processes by simultaneous differential equations. Kupar et al. (2008) indicated that there are two types of detected faults in the software, which need different testing effort for correction. They also used simultaneous differential equations to integrate the two processes.

In the work mentioned earlier, the correction lag has been treated as a time delay between the fault detection and correction processes and represented by many kinds of time-delay functions. However, these time-delay functions have different forms because they are established under different assumptions. In reality, the time to remove a fault depends on many factors such as the number of detected faults, the complexity of the software structure and the skill of the correction personnel. It therefore seems difficult to choose an accurate time-delay function to describe the correction lag in various testing circumstances. On the other hand, as a fault must be detected before it is corrected and removed, the number of corrected faults naturally depends on the number of detected faults. Since the correction lag cannot be ignored, some detected but not corrected faults exist in the software. These latent faults are caused by the correction lag and reflect the relationship between the fault detection and correction processes. We can use them to integrate the two processes.

In this study, the correction lag is defined as the difference between the number of detected faults and the number of corrected faults. We are going to model both fault detection and correction processes based on this lag.

THE CORRECTION LAG

The purpose of software testing is to find and correct faults before the software product is delivered to the customers. When a failure is detected, the fault correction personnel need a period of time to locate the fault and modify some code to remove it. The correction lag is therefore a common phenomenon in software testing: some detected but not corrected faults exist in the software. We call them the remaining uncorrected faults in this study. These faults directly represent the correction lag between the fault detection and correction processes. The number of remaining uncorrected faults is the difference between the number of detected faults and the number of corrected faults, which is φ(t) = md(t) - mc(t).
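As a minimal sketch of this definition, the remaining uncorrected faults can be computed directly from two cumulative count series. The function name and the sample counts below are hypothetical and are not taken from either data set used in this study.

```python
from typing import List, Sequence


def remaining_uncorrected(detected: Sequence[int],
                          corrected: Sequence[int]) -> List[int]:
    """Return the lag series: cumulative detected minus cumulative corrected."""
    if len(detected) != len(corrected):
        raise ValueError("both series must cover the same observation times")
    lag = []
    for d, c in zip(detected, corrected):
        if c > d:
            # A fault cannot be corrected before it has been detected.
            raise ValueError("corrected count exceeds detected count")
        lag.append(d - c)
    return lag


# Hypothetical cumulative weekly counts, for illustration only.
detected_counts = [5, 12, 20, 26, 30, 32, 33]
corrected_counts = [2, 6, 12, 19, 26, 30, 33]
print(remaining_uncorrected(detected_counts, corrected_counts))
```

In this made-up series the lag rises, peaks and then falls back to zero, which is the shape discussed in the rest of this section.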

Xie et al. (2007) used an actual data set from the testing of a middle size software project to study the fault detection and correction processes with several kinds of time-delay distribution functions. Figure 1 plots the number of detected, corrected and remaining uncorrected faults of this data set. The number of detected faults varies in the same way as the number of corrected faults, which means that the two are dependent on each other. Moreover, the number of remaining uncorrected faults is not constant: it increases from the beginning of software testing and decreases to a minimum value at the later stage. Musa et al. (1987) reported a data set of System T1, which includes the number of detected and corrected faults. System T1 was used for a real-time command and control application.

Fig. 1: The number of detected and corrected faults of a middle size software project

Fig. 2: The number of detected and corrected faults of system T1

Figure 2 depicts the number of detected and corrected faults of System T1. It shows that the number of remaining uncorrected faults first increases and then decreases. Next, we discuss why the correction lag has such a varying trend.

At the beginning of software testing, there are many faults in the software and they are easy to detect. At this time, however, effective debugging is not easy: the testing personnel need time to read and analyze the failure reports to find the root cause of each failure (Huang and Lin, 2006). Thus, the correction process gradually falls far behind the fault detection process and the correction lag increases. As testing proceeds, Fig. 1 and 2 show that the correction lag decreases. Although the number of remaining uncorrected faults may not decrease monotonically with time, it is generally reduced at the later stage of testing. During this time, the faults in the software have almost all been detected and the number of detected faults increases very slowly. Simultaneously, the fault correction personnel are working to correct each detected fault.

Thus, the difference between the number of detected faults and the number of corrected faults decreases. Generally speaking, although the correction of some non-critical detected faults might be postponed until the next release, most detected faults should be corrected before software release. A large number of remaining uncorrected faults is unacceptable to both software developers and users. It is natural, then, that the number of remaining uncorrected faults decreases to its minimum value at the end of the testing process. From the analysis above, we can conclude that the number of remaining uncorrected faults first increases and then decreases. This feature of the correction lag is also confirmed by the experimental results from Huang's server queuing models (Huang and Huang, 2008).

To characterize the varying trend of the remaining uncorrected faults, we fit the Gamma curve and other candidate curves (such as the Weibull, Rayleigh and Lognormal curves) as the correction lag function on these two data sets. Through simulation, we find that the Sum of Squares Errors (SSE) values of the Gamma curve on these data sets are 93.89 and 169.48, which are the lowest SSE values among the compared curves. As the Gamma curve has the best goodness of fit, we select it as the correction lag function:

(1)
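The curve-selection step can be sketched as follows. The three-parameter form c·t^α·e^(-βt) used here is an assumed Gamma-type parameterization chosen for illustration and is not necessarily the exact form of Eq. 1; the lag observations are hypothetical and the coarse grid search merely stands in for whatever fitting procedure was actually used.

```python
import math


def gamma_curve(t, c, alpha, beta):
    """Assumed Gamma-type lag curve: c * t**alpha * exp(-beta * t)."""
    return c * t ** alpha * math.exp(-beta * t)


def sse(observed, fitted):
    """Sum of squared errors between observed and fitted values."""
    return sum((y - f) ** 2 for y, f in zip(observed, fitted))


def fit_by_grid(times, observed):
    """Coarse grid search for the (c, alpha, beta) triple with the lowest SSE."""
    best_params, best_sse = None, float("inf")
    for ci in range(1, 41):              # c = 0.5, 1.0, ..., 20.0
        for ai in range(1, 9):           # alpha = 0.5, 1.0, ..., 4.0
            for bi in range(1, 21):      # beta = 0.1, 0.2, ..., 2.0
                params = (ci * 0.5, ai * 0.5, bi * 0.1)
                fitted = [gamma_curve(t, *params) for t in times]
                err = sse(observed, fitted)
                if err < best_sse:
                    best_params, best_sse = params, err
    return best_params, best_sse


# Hypothetical lag observations that rise and then fall.
times = [1, 2, 3, 4, 5, 6, 7]
observed_lag = [3, 6, 8, 7, 4, 2, 0]
params, err = fit_by_grid(times, observed_lag)
print("best (c, alpha, beta):", params, "SSE:", round(err, 2))
```

A curve of this form starts at zero, peaks at t = α/β and then decays, which is why it can follow the increasing and then decreasing trend. Fitting the other candidate curves with the same `sse` function would allow the comparison described above.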

SOFTWARE RELIABILITY GROWTH MODELING

Here, the fault detection and correction processes are modeled based on the Non-Homogeneous Poisson Process (NHPP). The correction lag between the fault detection and correction processes is characterized by the remaining uncorrected faults. As studied in the previous section, the number of remaining uncorrected faults first increases and then decreases and the Gamma curve is selected to represent it. The following are the explicit assumptions for both the fault detection and correction process models:

1. The failure observation/fault correction phenomenon is modeled by NHPP
2. The software system is subject to failures during execution caused by faults remaining in the software
3. All faults are independent and equally detectable
4. The software failure rate at any time is affected by the number of faults remaining in the software at that time
5. Each time a failure occurs, the fault that caused it is not immediately removed because the correction lag φ exists in the debugging process

According to these assumptions, we have:

(2)

and

$$\varphi(t) = m_d(t) - m_c(t) \qquad (3)$$

To solve the simultaneous differential equations simply and easily, we assume a(t), b(t) are constant, which means no fault is introduced and the failure rate is not changing during the software testing. So, Eq. 2 is changed into:

(4)

Equation 3 is directly deduced from assumption 5. It can be transformed as:

$$m_c(t) = m_d(t) - \varphi(t) \qquad (5)$$

Substituting Eq. 5 into 4, we obtain a new differential equation as:

(6)

When c = 0, the correction lag function vanishes, φ(t) = 0. This means that no correction lag exists in the software testing and the fault correction model equals the fault detection model, md(t) = mc(t). In this case, solving Eq. 6 with the boundary conditions md(0) = 0 and md(+∞) = a gives md(t) = mc(t) = a(1 - e^(-bt)), so the model reduces to the Goel-Okumoto (G-O) model, which is a fundamental SRGM. Otherwise, when c ≠ 0, solving Eq. 6 yields a new SRGM for the fault detection process:
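The special case without correction lag can be checked numerically. The sketch below covers only the G-O case: it integrates the standard G-O differential equation dm/dt = b[a - m(t)] with a forward Euler scheme and compares the result with the closed form a(1 - e^(-bt)). The parameter values are hypothetical.

```python
import math


def goel_okumoto(t, a, b):
    """Closed-form G-O mean value function a * (1 - exp(-b * t))."""
    return a * (1.0 - math.exp(-b * t))


def integrate_go(t_end, a, b, steps=100_000):
    """Forward Euler integration of dm/dt = b * (a - m) with m(0) = 0."""
    h = t_end / steps
    m = 0.0
    for _ in range(steps):
        m += h * b * (a - m)
    return m


a, b = 100.0, 0.2          # hypothetical fault content and detection rate
numeric = integrate_go(10.0, a, b)
closed = goel_okumoto(10.0, a, b)
print(f"Euler: {numeric:.4f}, closed form: {closed:.4f}")
```

The two values agree closely, and the closed form satisfies both boundary conditions: it is zero at t = 0 and approaches a as t grows.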

(7)

From Eq. 5 and 7, we can get the fault correction model which is paired with the fault detection model as:

(8)

MODEL EVALUATION

Here, we evaluate the proposed model for both fault detection and correction processes by using two real data sets. The first data set (DS1) is from a middle size software project reported by Xie et al. (2007) and the second data set (DS2) is from System T1 of the Rome Air Development Center project (Musa et al., 1987). Table 1 summarizes some existing SRGMs which model both fault detection and correction processes. These models are from Schneidewind (2001, 2003), Xie et al. (2007), Lo (2007) and Huang and Huang (2008).

Parameter estimation: With a model that combines the fault detection process with the fault correction process, software testing can be described more practically than with conventional SRGMs. The unknown parameters of the model can be estimated by the method of least squares. Given observations of the fault detection and correction processes, the parameters of the new model are estimated by minimizing the Sum of Square Residual (SSR) (Xie et al., 2007), which is defined as follows:

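$$\mathrm{SSR} = \sum_{k=1}^{n} \left[m_d(t_k) - y_{dk}\right]^2 + \sum_{k=1}^{n} \left[m_c(t_k) - y_{ck}\right]^2 \qquad (9)$$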

where, ydk and yck denote the cumulative number of detected and corrected faults collected until tk, respectively and tk, k = 1, 2, …, n, are the running times from the beginning of testing.
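A minimal sketch of this objective is given below, assuming that the SSR simply adds the squared residuals of the detection and correction curves at each observation time. The two mean value functions are stand-ins rather than the proposed model: a G-O curve for detection and a constant-delay copy of it for correction, in the spirit of Schneidewind's approach. All parameter values and observations are hypothetical.

```python
import math


def sum_of_squared_residuals(times, y_detected, y_corrected, m_d, m_c):
    """Add squared residuals of both processes over all observation times."""
    total = 0.0
    for t, yd, yc in zip(times, y_detected, y_corrected):
        total += (m_d(t) - yd) ** 2 + (m_c(t) - yc) ** 2
    return total


def m_d(t, a=35.0, b=0.4):
    """Stand-in detection curve (G-O form) with hypothetical parameters."""
    return a * (1.0 - math.exp(-b * max(t, 0.0)))


def m_c(t, delay=1.0):
    """Stand-in correction curve: detection curve shifted by a constant delay."""
    return m_d(t - delay)


times = [1, 2, 3, 4, 5, 6, 7]
y_detected = [5, 12, 20, 26, 30, 32, 33]     # hypothetical observations
y_corrected = [2, 6, 12, 19, 26, 30, 33]     # hypothetical observations
print(round(sum_of_squared_residuals(times, y_detected, y_corrected, m_d, m_c), 2))
```

Because one scalar covers both curves, a parameter set that fits detection well but correction poorly is still penalized.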

Substituting md(t) and mc(t) into Eq. 9, together with a data set that records both the fault detection and correction processes, gives the SSR as a function of the model parameters. To estimate the parameters of the proposed model on the data set, the proposed model is substituted into Eq. 9 and the minimum of SSR is found by setting the partial derivative of SSR with respect to each parameter to zero and solving the resulting equations.

The solutions of these equations are the parameter estimates. Commonly, numerical procedures have to be developed in order to obtain the Least Squares Estimates (LSEs).

Criteria for model comparison: In the following comparison criteria expressions, every variable or function is used for both fault detection and correction processes:

Table 1: Summarization of models

The goodness of fit of the curve is measured by the Sum of Squares Errors (SSE). SSE sums the squares of the residuals between the actual data and the mean value function m(t) of each model in terms of the number of actual faults at any time. The SSE function can be expressed as follows:

$$\mathrm{SSE} = \sum_{k=1}^{n} \left[y_k - \hat{m}(t_k)\right]^2 \qquad (10)$$

where, yk is the total number of faults observed at time tk, m̂(tk) is the estimated cumulative number of faults at time tk obtained from the fitted mean value function and k = 1, 2, …, n. Therefore, the lower the SSE value, the better the model performs.
After the proposed model is fitted to the actual observed data, the deviation between observed and fitted values can also be evaluated by using the Kolmogorov-Smirnov (K-S) test. The Kolmogorov Distance (KD) is defined as:

(11)

where, k is the sample size, F*(x) is the normalized observed cumulative distribution at x-point and F(x) is the expected cumulative distribution at x-point, based on the model.
The capability of the model to predict failure behavior from present and past failure behavior is called predictive validity, which can be represented by computing the Mean value of Relative Errors (MRE) for a data set:

(12)

Suppose Z failures have been observed by the end of testing time tz; we use the failure data up to time ti (ti ≤ tz) to estimate the parameters of m(t). A lower value of MRE indicates that the model performs better on the real software data.
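The three criteria can be sketched in a few lines. The exact expressions of Eq. 11 and 12 are not reproduced here, so the KD and MRE functions below are assumptions: KD is taken as the largest absolute gap between the two cumulative series after each is normalized by its final value and MRE is taken as the plain average of absolute relative errors, without the repeated re-estimation on truncated data described above.

```python
def sse(observed, fitted):
    """Sum of squared errors between observed and fitted cumulative counts."""
    return sum((y - m) ** 2 for y, m in zip(observed, fitted))


def kolmogorov_distance(observed, fitted):
    """Assumed KD: max gap between the two normalized cumulative series."""
    f_obs = [y / observed[-1] for y in observed]
    f_fit = [m / fitted[-1] for m in fitted]
    return max(abs(p - q) for p, q in zip(f_obs, f_fit))


def mean_relative_error(observed, predicted):
    """Assumed MRE: average of |predicted - observed| / observed."""
    errors = [abs((m - y) / y) for y, m in zip(observed, predicted)]
    return sum(errors) / len(errors)


# Hypothetical observed and fitted cumulative fault counts.
observed = [5, 12, 20, 26, 30, 32, 33]
fitted = [6, 13, 19, 25, 29, 32, 34]
print(sse(observed, fitted))
print(round(kolmogorov_distance(observed, fitted), 4))
print(round(mean_relative_error(observed, fitted), 4))
```

For all three criteria a smaller value indicates a better model, so they can be tabulated side by side for the detection and correction curves.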

Experiments on DS1: We estimate the parameters of the selected models and the proposed model by the least squares method on DS1. The estimated parameters of all models are listed in Table 2. Table 3 gives the performance comparisons in terms of SSE, KD and MRE. Table 3 shows that the proposed model provides the lowest value on all criteria except the KD value on the detection process. On average, the proposed model gives a better fit and prediction in both detection and correction processes than the other models. As can be seen from Table 3, the proposed model, the Schneidewind model and the Lo model fit this data set well on both fault detection and correction processes. Figures 3 and 4 depict the data estimated by these three models. All three models perform well on DS1.

We also calculate the estimated number of remaining uncorrected faults by using the proposed model, the Schneidewind model and the Lo model. Figure 5 graphically shows the actual and estimated number of remaining uncorrected faults versus time. The difference between detected and corrected faults is zero at the beginning of software testing and increases rapidly as testing proceeds. This phenomenon means that effective debugging is difficult: the testing personnel need time to analyze the failure reports and find the root cause of each failure. The correction process falls far behind the fault detection process and the number of remaining uncorrected faults increases. After reaching its maximum value, the number of remaining uncorrected faults decreases. At this time, the faults in the software have almost all been detected and the number of detected faults increases very slowly. On the other hand, the testing personnel understand the whole software system better than before, so the detected faults can be corrected in time. So, the number of corrected faults gradually approaches the number of detected faults and the remaining uncorrected faults decrease quickly to the minimum value. All three models can characterize the increasing and then decreasing feature of the remaining uncorrected faults. Furthermore, the proposed model is the best one for representing the correction lag and it also has the best goodness of fit on DS1. Therefore, we conclude that using the remaining uncorrected faults is a better way to integrate the fault detection and correction processes than using time delay functions.

Table 2: Parameter estimation of different SRGMs on DS1

Table 3: Comparison results on DS1

Fig. 3: Detected fault data on DS1: fitted versus actual

Fig. 4: Corrected fault data on DS1: fitted versus actual

Experiments on DS2: Similarly, we estimated the parameters of the proposed model and the other selected SRGMs by the least squares method on DS2. The estimated parameters are listed in Table 4. Table 5 lists the comparison results in terms of SSE, KD and MRE. The proposed model has the lowest values of SSE and MRE on both fault detection and correction processes.

Fig. 5: Actual/estimated remaining uncorrected fault for DS1

Fig. 6: Detected and corrected fault data on DS2: fitted versus actual

Fig. 7: Actual/estimated remaining uncorrected fault on DS2

Although the proposed model does not provide the lowest value of KD on the two processes, its KD values are very close to the lowest ones. Figure 6 shows the data estimated by the proposed model versus the actual data set. The proposed model fits the data set very well in both detection and correction processes.

Table 4: Parameter estimation of different SRGMs on DS2

Table 5: Comparison results on DS2

Overall, the proposed model gives a better fit and prediction on DS2.

As can be seen from Table 5, the proposed model, the Xie model and the Lo model fit DS2 well. Figure 7 graphically shows the actual and estimated number of remaining uncorrected faults versus time. The proposed model characterizes the actual remaining uncorrected faults better than the other two models. Table 5 also shows that the Xie model fits DS2 very well and provides an SSE value very close to that of the proposed model. However, Fig. 7 shows that the Xie model cannot characterize the correction lag as well as the proposed model, which means that it does not fit DS2 as precisely as the proposed model does. Hence, the relationship between the fault detection and correction processes can be more accurately described by the proposed model from the viewpoint of fault amount.

SOFTWARE RELEASE PROBLEM

Although testing is an efficient way to find and eliminate the faults in software systems, exhaustive testing is impractical. Software developers need to decide when to stop testing and release the software to customers. As software reliability growth models can capture the quantitative aspects of testing, they are used to determine a reasonable software release time. Some researchers have studied the release time problem from the cost-benefit viewpoint and suggested that the optimal release time T* can be obtained by minimizing the expected total software system cost (Lyu, 1996; Pham and Zhang, 1999, 2003; Huang and Lyu, 2005). For instance, Pham and Zhang (1999) proposed a generalized software cost model including removal cost, warranty cost and risk cost, which is:

(13)

where, c1 is the software test cost per unit time, c2 is the cost of correcting a fault per unit time during the testing period, c3 is the cost of correcting a fault per unit time during the warranty period, c4 is the loss due to software failure, C(T) is the expected total cost of the software system at time T, α is the discount rate of the testing cost, uy is the expected time to correct a fault during the testing period, uw is the expected time to correct a fault during the warranty period, Tw is the warranty period and R(x/T) is the reliability function of the software at time T for a mission time x, which can be calculated from the SRGMs and is expressed as:

$$R(x/T) = e^{-[m(T+x) - m(T)]}$$
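The reliability expression translates directly into code. In this sketch the mean value function is passed in as a callable; the G-O curve and its parameter values are hypothetical placeholders for whichever SRGM is being used.

```python
import math


def reliability(x, T, m):
    """R(x/T) = exp(-[m(T + x) - m(T)]) for a mean value function m."""
    return math.exp(-(m(T + x) - m(T)))


def go_curve(t, a=100.0, b=0.1):
    """Hypothetical G-O mean value function used only for illustration."""
    return a * (1.0 - math.exp(-b * t))


# Reliability for a mission time of 0.05 at several candidate release times.
for release_time in (10.0, 20.0, 30.0, 40.0):
    print(release_time, round(reliability(0.05, release_time, go_curve), 4))
```

Since the expected number of new failures in (T, T+x) shrinks as testing continues, the printed reliability grows with the release time.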

The conventional SRGMs, m(T), only describe the fault detection process, which is represented by md(T) in this study. As the proposed SRGM models both fault detection and correction processes, the traditional cost function C(T) can be revised to be more practical by incorporating the fault correction process model mc(T):

(14)

where, mc(T) is the expected number of corrected faults at the release time T and md(T+Tw)-mc(T) is the expected number of faults corrected during the warranty period, which includes two components: the undetected faults md(T+Tw)-md(T) and the remaining uncorrected faults md(T)-mc(T).
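The split of the warranty-period faults into two components follows from adding and subtracting md(T). Written out in LaTeX, with m_d and m_c standing for md and mc:

```latex
m_d(T+T_w) - m_c(T)
  = \underbrace{\bigl[m_d(T+T_w) - m_d(T)\bigr]}_{\text{detected during the warranty period}}
  + \underbrace{\bigl[m_d(T) - m_c(T)\bigr]}_{\text{detected but uncorrected at release}}
```

The second bracket is the correction lag φ(T) evaluated at the release time, so it vanishes only when every detected fault has been corrected before release.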

The latter component represents the correction lag between the fault detection and correction processes, which has been specifically discussed in this study and exists in most software testing. These detected but uncorrected faults remain in the software even after release for reasons such as limited testing time or testing resources, or because the detected fault is not critical. Correcting these detected faults costs more effort in the operation period. We can conclude that the revised software cost model C’(T) incorporates both the fault detection and correction processes and reflects the correction lag phenomenon, which makes it more practical.

Pham and Zhang (1999) used DS2 to show how to determine the optimal release time by their generalized software cost model C(T) with the conventional SRGMs. They considered the coefficients as c0 = 50, c1 = 700, c2 = 60, c3 = 3,600, c4 = 50,000, Tw = 20, uy = 0.1, uw = 0.5, x = 0.05, α = 0.95. Substituting the proposed fault detection process model md(t) to Eq. 13 with the estimated parameters on DS2, = 138.65, =0.1351 c = 5.44, =8.65, we get that T* is 30.1 h and the lowest cost is $22,760 by minimizing C(T).

Fig. 8: Plot of the total cost

The corresponding cost function is plotted as a dashed line in Fig. 8.

To show the difference made by the new software cost model, we also substitute the proposed models for the fault detection and fault correction processes, md(t) and mc(t), into the revised cost function C’(T). With the same coefficients and estimated parameter values, the optimal release time is obtained as T’* = 33.5 h with the lowest cost of $24,538. The revised cost function is plotted as a solid line in Fig. 8. From the numbers and the figure, we find that the revised cost model gives a later optimal release time and a larger total cost. This result shows the effect of the correction lag. In reality, people need more time and resources to perform the fault correction activity. As the traditional cost model does not consider the correction lag, it is optimistic in determining the software release time.
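The release-time comparison can be illustrated with a simple grid search. Everything beyond the cost coefficients quoted above is an assumption: the cost expression is an assumed form built from the terms described for the two cost models, the detection curve is a G-O stand-in with hypothetical a and b and the correction curve is a hypothetical constant-delay copy of it. The numbers it prints are therefore not expected to match the values reported for DS2.

```python
import math

# Cost coefficients quoted in the text (Pham and Zhang, 1999).
COEF = {"c0": 50.0, "c1": 700.0, "c2": 60.0, "c3": 3600.0, "c4": 50000.0,
        "Tw": 20.0, "uy": 0.1, "uw": 0.5, "x": 0.05, "alpha": 0.95}


def detected(t, a=100.0, b=0.1):
    """G-O stand-in for md(t); a and b are hypothetical."""
    return a * (1.0 - math.exp(-b * max(t, 0.0)))


def corrected(t, delay=2.0):
    """Hypothetical mc(t): the detection curve shifted by a constant delay."""
    return detected(t - delay)


def total_cost(T, m_d, m_c, k=COEF):
    """Assumed cost form; pass m_c = m_d to get the traditional variant."""
    risk = 1.0 - math.exp(-(m_d(T + k["x"]) - m_d(T)))
    return (k["c0"]
            + k["c1"] * T ** k["alpha"]                          # testing cost
            + k["c2"] * k["uy"] * m_c(T)                         # fixes during testing
            + k["c3"] * k["uw"] * (m_d(T + k["Tw"]) - m_c(T))    # fixes in warranty
            + k["c4"] * risk)                                    # risk cost


def optimal_release(m_d, m_c, t_max=100.0, step=0.1):
    """Return the grid point with the lowest cost and that cost."""
    grid = [i * step for i in range(1, int(round(t_max / step)) + 1)]
    best = min(grid, key=lambda T: total_cost(T, m_d, m_c))
    return best, total_cost(best, m_d, m_c)


t_trad, cost_trad = optimal_release(detected, detected)
t_rev, cost_rev = optimal_release(detected, corrected)
print(f"traditional: T* = {t_trad:.1f}, cost = {cost_trad:.0f}")
print(f"revised:     T* = {t_rev:.1f}, cost = {cost_rev:.0f}")
```

With these stand-in curves the revised cost can never be lower than the traditional one at any T, because each fault left uncorrected at release is charged at the warranty price c3·uw instead of the testing price c2·uy.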

CONCLUSIONS

The fault correction process plays a significant role in ensuring the software quality and it always lags behind the fault detection process. In this study, the correction lag between the fault detection and correction processes was directly studied from a novel viewpoint of fault number. We investigated the number of remaining uncorrected faults on real data sets and found that it had an increasing and then decreasing feature. The Gamma curve was appropriate in characterizing the remaining uncorrected faults. Then we modeled the fault detection and correction processes based on the correction lag. Numerical examples showed that the proposed model successfully described the two processes and the comparison results also indicated that using the remaining uncorrected faults to integrate the two processes was more effective than using the time delay functions. Furthermore, we studied the software release problem. The proposed SRGM was applied to extend the software cost model and a more reasonable software release time was achieved by considering the effect of the correction lag.

ACKNOWLEDGMENTS

The work described in this study is jointly supported by the Hi-Tech Research and Development Program of China (2008AA01A201) and the National Natural Science Foundation of China (60503015).

NOTATIONS

MVF : Mean value function
SSE : Sum of squared errors
MRE : Mean of relative errors
NHPP : Non-homogeneous Poisson process
SRGMs : Software reliability growth models
md(t) : The expected number of faults detected in time (0, t)
mc(t) : The expected number of faults corrected in time (0, t)
φ(t) : The number of remaining uncorrected faults at time t
a : Expected number of initial faults
b : Fault detection rate
yk : No. of actual faults observed at time tk

REFERENCES

  • Gokhale, S., M.R. Lyu and K.S. Trivedi, 2004. Analysis of software fault removal policies using a non-homogeneous continuous time Markov chain. Software Qual. J., 12: 211-230.
  • Gokhale, S., M.R. Lyu and K.S. Trivedi, 2006. Incorporating fault debugging activities into software reliability models: A simulation approach. IEEE Trans. Reliab., 55: 281-292.
  • Huang, C. and M.R. Lyu, 2005. Optimal release time for software systems considering cost, testing-effort and test efficiency. IEEE Trans. Reliab., 54: 583-591.
  • Huang, C. and C. Lin, 2006. Software reliability analysis by considering fault dependency and debugging time lag. IEEE Trans. Reliab., 55: 436-450.
  • Huang, C.Y. and W.C. Huang, 2008. Software reliability analysis and measurement using finite and infinite queuing model. IEEE Trans. Reliab., 57: 192-203.
  • Kupar, P.K., D.N. Goswami, A. Badham and O. Singh, 2008. Flexible software reliability growth model with testing effort dependent learning process. Applied Math. Model., 32: 1298-1307.
  • Lo, J.H., 2007. Effect of the delay time in fixing a fault on software error models. Proceedings of the 31st Annual International Computer Software and Applications Conference, July 23-27, 2007, Beijing, China, pp: 711-716.
  • Lo, J.H. and C.Y. Huang, 2006. An integration of fault detection and correction processes in software reliability analysis. J. Syst. Software, 79: 1312-1323.
  • Lyu, M.R., 1996. Handbook of Software Reliability Engineering. McGraw-Hill and IEEE Computer Society, New York, ISBN: 0-07-039400-8.
  • Musa, J.D., A. Iannino and K. Okumoto, 1987. Software Reliability Measurement, Prediction and Application. McGraw-Hill, New York, ISBN: 0-07-044093-X.
  • Pham, H. and X. Zhang, 1999. A software cost model with warranty and risk costs. IEEE Trans. Comput., 48: 71-75.
  • Pham, H. and X. Zhang, 2003. NHPP software reliability and cost models with testing coverage. Eur. J. Oper. Res., 145: 443-454.
  • Schneidewind, N.F., 2001. Modeling the fault correction process. Proceedings of the 12th International Symposium on Software Reliability Engineering-ISSRE, November 28-30, 2001, Hong Kong, China, pp: 185-190.
  • Schneidewind, N.F., 2003. Fault correction profiles. Proceedings of the 14th International Symposium on Software Reliability Engineering-ISSRE, November 28-30, 2003, Denver, USA, pp: 257-267.
  • Williams, D.R.P., 2006. Prediction capability analysis of two and three parameters software reliability growth models. Inform. Technol. J., 5: 1048-1052.
  • Wu, Y.P., Q.P. Hu, M. Xie and S.H. Ng, 2007. Modeling and analysis of software fault detection and correction process by considering time dependency. IEEE Trans. Reliab., 56: 629-642.
  • Xie, M., Q.P. Hu, Y.P. Wu and S.H. Ng, 2007. A study of the modeling and analysis of software fault-detection and fault-correction processes. Qual. Reliab. Eng. Int., 23: 459-470.
  • Zhao, J., H.W. Liu, G. Cui and X.Z. Yang, 2006. Software reliability growth model with change-point and environment function. J. Syst. Software, 79: 1578-1587.
