Research Article
 

A Methodology for Analyzing the Transient Availability and Survivability of a System with Every Combination of Components by Using Fault Tree



Maghsoud Amiri, Farhad Ghassemi-Tari, Mohsen Rahimi Mazrae Shahi, Jamshid Salehi Sadaghiani and Ali Mahtasshami
 
ABSTRACT

The main purpose of this study is to offer a new method for the transient analysis of the availability and survivability of a system with identical components and one repairman, for any arrangement of the components: either a standard system (for example series, parallel, stand-by and k-out-of-n systems) or a complex system. The method also serves as a technique for fault tree evaluation. The system is assumed to consist of n components, and it fails when any one of certain combinations of component failures occurs. Fault trees, Markov models, eigenvectors and eigenvalues are employed to analyze the transient availability and survivability of the system. Because it uses fault tree analysis, the new method is useful for large systems where high reliability is required and where the design incorporates many layers of protection, such as nuclear reactor systems. The method is implemented through an algorithm which is tested in the MATLAB programming environment. The new method has a stronger mathematical foundation and offers more flexibility for analyzing the transient availability and survivability of the system.


 
  How to cite this article:

Maghsoud Amiri, Farhad Ghassemi-Tari, Mohsen Rahimi Mazrae Shahi, Jamshid Salehi Sadaghiani and Ali Mahtasshami, 2009. A Methodology for Analyzing the Transient Availability and Survivability of a System with Every Combination of Components by Using Fault Tree. Journal of Applied Sciences, 9: 1074-1081.

DOI: 10.3923/jas.2009.1074.1081

URL: https://scialert.net/abstract/?doi=jas.2009.1074.1081
 

INTRODUCTION

Loss prevention is in large part the application of probabilistic methods to the problem of failure in the process industries. The discipline concerned with the probabilistic treatment of failure in systems in general is reliability engineering. Reliability has long been a major concern for system designers. Many systems consist of components having various failure modes, and several authors have considered k-out-of-n systems subject to two failure modes. Among those, Moustafa (1996) presented Markov models for analyzing the transient reliability of k-out-of-n: G systems subject to two failure modes and proposed a procedure for obtaining closed-form expressions for the transient probabilities and the reliability of non-repairable systems. Moustafa (1998) extended this work by providing a set of simultaneous linear differential equations for the repairable and non-repairable cases of two different k-out-of-n: G systems subject to M failure modes. That paper discussed numerical solutions for the reliability of the repairable systems and presented closed-form solutions for the reliability of the non-repairable systems. Pham and Pham (1991) considered [k, n-k+1]-out-of-n: F systems subject to two failure modes. Shao and Lamberson (1991) presented a model for a k-out-of-n: G system with load sharing.

Zhang et al. (2000) presented a circular consecutive-2-out-of-n repairable system with one repairman. They determined the rate of occurrence of failures, the mean time between failures, the reliability and the mean time to first failure. Li et al. (2006) presented a k-out-of-n system with independent exponential components. They assumed that some working components are suspended as soon as the system is down, that repair starts immediately when a component fails and that repair times are independent and exponentially distributed. They also determined the mean time between failures, the mean working time in a failure-repair cycle and the mean down time in a failure-repair cycle.

Another attempt is the study conducted by Sarhan and Abouammoh (2001), who applied the concept of a shock model to derive the reliability function of a k-out-of-n non-repairable system with non-independent and non-identical components. Later, Gohary and Sarhan (2005) extended the work of Sarhan and Abouammoh (2001) by proposing a Bayes estimator for the parameters of a series system of three non-independent and non-identical components subject to four sources of fatal shocks. They supported their estimation method with a simulation study and showed how the theoretical results obtained in their study can be used.

Azaron et al. (2006) introduced a new methodology, using continuous-time Markov processes and a shortest path technique, for the reliability evaluation of an L-dissimilar-unit non-repairable cold-standby redundant system. Amiri and Ghassemi (2007a) introduced a method for transient analysis of the availability and survivability of a system with repairable components using Markov models, eigenvalues and eigenvectors. The system was assumed to consist of n identical components and k repairmen, with the components arranged in series, in k-out-of-n or in parallel. They proposed a methodology for obtaining the availability, survivability and MTTFs (mean time to system failure) of the system and for calculating the time the system takes to reach its steady state. Amiri and Ghassemi (2007b) introduced a method for analyzing the transient reliability of systems with identical components and identical repairmen using Markov models, eigenvalues and eigenvectors. They assumed that the components can have two distinct configurations, namely series or parallel, and also considered a third case in which the system is up (good) if k out of n components are good. For all three cases they proposed a procedure for calculating the transient probability of system availability and the time the system takes to reach the steady state. Amiri et al. (2008) introduced a methodology for analyzing the transient availability and survivability of a system with standby components in two cases: identical components and non-identical components. They assumed that in a standby system not all components are in use at the same time: at each moment only one component operates and, as soon as the operating component fails, the system switches to another good component.

In most real systems the arrangement of components is more complex, and there are combinations of component failures which, if they all occur, cause the main undesired event to occur.

A Markov model is a model of the probabilities of the different states of a system as a function of time. It therefore has two variables, state and time. The state of a system can generally be defined in a number of ways. For a system with two pieces of equipment, 1 and 2, one set of states is: no equipment failed, one piece of equipment (either 1 or 2) failed, both failed. Another set of states is: no equipment failed, equipment 1 failed, equipment 2 failed, both failed. The transitions in the system may be in the forward direction only, or in both forward and backward directions. The transition rates are determined by the states chosen. It is generally desirable to choose the states so that the transition rates correspond to known quantities, such as failure and repair rates. The transition rates of a Markov model are constants (Bolch et al., 2006).

A fault tree is a logic diagram depicting certain events that must occur in order for other events to occur. The events are termed faults if they are initiated by other events and failures if they are the basic initiating events. The fault tree interrelates events (faults to faults or faults to failures) and certain symbols are used to depict the various relationships. The basic symbol is the gate, and each gate has inputs and an output. The two basic gate categories are the OR-gate and the AND-gate. Because these gates relate events in exactly the same way as the Boolean operations, there is a one-to-one correspondence between the Boolean algebraic representation and the fault tree representation (Vesely et al., 1980). Fault tree analysis is used for large systems where high reliability is required and where the design incorporates many layers of protection, such as nuclear reactor systems. A fault tree is a graphical representation of the fault paths and logic of a system and is of value as such. There are, however, a number of methods of fault tree evaluation, for example the minimal cut sets method, the gate-by-gate method and the Monte Carlo method (Lees, 1996).

In this study we present a method for transient analysis of the availability and survivability of a system with repairable components; the method is also a technique for fault tree evaluation. It uses the concepts of fault trees, Markov models, eigenvalues and eigenvectors. The system is assumed to consist of n identical components and one repairman. We propose a methodology for obtaining the availability, survivability and MTTFs (mean time to system failure) of the system and for calculating the time the system takes to reach its steady state. The method is useful both for standard systems (series, parallel and stand-by systems) and for complex systems.

NOMENCLATURE AND DEFINITIONS

X(t) = State of the system at time t, i.e., the possible combination of failed components at time t
(1)

pn(t) = Probability that the system is in the nth state at time t; pn(t) = P(X(t) = n)
(2)

A(t) = Probability that the system is up (good) at time t, regardless of its history of component failures and/or repairs.
A(∞) = Long-run system availability (or system reliability).
Rs(t) = Survivability function: the probability that the system does not leave the set B of functioning states during the time interval (0, t];

$R_s(t) = P\big(X(s) \in B \ \text{for all}\ s \in (0, t]\big)$
(3)

MTTFs: Mean time to system failure;

$MTTF_s = \int_0^{\infty} R_s(t)\, dt$
(4)

Definition 1: If Q is the state transition rate matrix and P(t) is the state transition probability matrix of a continuous-time exponential Markov chain, then we have:

$P'(t) = P(t) \cdot Q, \qquad P_n(t) = P_n(0) \cdot P(t)$
(5)

in which Q and P(t) are square matrices and Pn(t) and Pn(0) are row vectors.

THE MODEL

The aim of this study is to determine the availability, survivability function and MTTFs of a system under the following assumptions:

The system consists of identical and independent components
The arrangement of the components can be simple and standard (series systems, parallel systems, …) or very complex
The components of the system are repairable
The system has one repairman
The lifetime of each component is exponentially distributed with parameter λ
The service time of each component by the repairman is exponentially distributed with parameter μ

THE PROPOSED METHODOLOGY

This section describes the proposed methodology for analyzing the system's transient reliability and survivability and the time until the system reaches its steady state. Consider a system having n identical components and one repairman. The components can be arranged in any structure, so the methodology can analyze many distinct cases. First we must define the main undesired event (top event) of the system; the system is then analyzed in the context of its environment and operation to find all credible ways in which the undesired event can occur. The fault tree (FT) is a graphic model of the various parallel and sequential combinations of faults that will result in the occurrence of the predefined undesired event. The faults can be events associated with component hardware failures, human errors or any other pertinent events which can lead to the top event. An FT thus depicts the logical interrelationships of the basic events that lead to the undesired event, which is the top event of the fault tree. It is important to point out that an FT is not in itself a quantitative model. It is a qualitative model that can be evaluated quantitatively, and often is. In the next phase we must determine the minimal cut sets of the FT. A minimal cut set is a smallest combination of component failures which, if they all occur, will cause the top event to occur. A minimal cut set is thus a combination of primary events sufficient for the top event. If one of the failures in the cut set does not occur, then the top event will not occur (by this combination).

It is assumed that the time between two component failures is a random variable having the exponential distribution with parameter λ. It is also assumed that there is one repairman providing service to the system. The service time of a component is also an exponentially distributed random variable, with parameter μ. A system with n components has 2^n states, some of which cannot occur, because we assume that once the system has failed through some combination of failed components, no further component fails: components can fail only while operating, and no failure occurs in stand-by mode. Some of the possible combinations of failed components are terminal combinations; a terminal combination is a possible combination of failed components that includes at least one minimal cut set of the system.

Set of possible combinations of failed components = Fp = {A1, A2, A3, …, Ak}, k ≤ 2^n
Set of terminal combinations of failed components = FT = {A1, A2, A3, …, Am}, m ≤ k

Let X(t) denote the state, i.e., the combination of failed components, at time t. As an example, for a system with 3 components (A, B, C) whose minimal cut sets are C and A.B, the related Markov model is represented in Fig. 1.

Fig. 1: State transition diagram of the system with 3 components and minimal cut sets C and A.B

According to the above explanation, FP and FT in this case are:

Set of possible combinations of failed components = Fp = {0, A, B, C, AB, AC, BC}
Set of terminal combinations of failed components = FT = {C, AB, AC, BC}
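The construction of Fp and FT can be sketched in code. The following is a minimal illustration, not the authors' program: the function names `enumerate_states` and `label` are hypothetical, and the sketch assumes that a combination is possible exactly when it can be reached by single failures that pass only through system-up combinations.

```python
def enumerate_states(components, minimal_cut_sets):
    """Return (possible, terminal) combinations of failed components.

    Assumption of this sketch: a combination is terminal if it contains
    at least one minimal cut set, and no component fails once the
    system is down (i.e. once a terminal combination is reached).
    """
    cuts = [frozenset(c) for c in minimal_cut_sets]

    def is_terminal(state):
        return any(cut <= state for cut in cuts)

    possible = {frozenset()}      # start with no failed component
    frontier = [frozenset()]
    while frontier:
        state = frontier.pop()
        if is_terminal(state):
            continue              # system is down: no further failures
        for comp in components:
            if comp not in state:
                nxt = state | {comp}
                if nxt not in possible:
                    possible.add(nxt)
                    frontier.append(nxt)
    terminal = {s for s in possible if is_terminal(s)}
    return possible, terminal


def label(state):
    """Render a combination as a string, using '0' for 'nothing failed'."""
    return "".join(sorted(state)) or "0"


# Three components A, B, C with minimal cut sets C and A.B
possible, terminal = enumerate_states("ABC", [{"C"}, {"A", "B"}])
print(sorted(map(label, possible), key=lambda s: (len(s), s)))
print(sorted(map(label, terminal), key=lambda s: (len(s), s)))
```

For the three-component example this yields the seven possible combinations and four terminal combinations listed above; the combination ABC is never generated because the system is already down before a third failure could occur.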

The proposed methodology for obtaining the system availability and the transient probabilities is based on several theorems, which provide its underlying theory. We first present these theorems:

Theorem 1: Consider a continuous-time exponential Markov chain in which $P'(t) = P(t) \cdot Q$. Then we have:

$P(t) = e^{Q t}$
(6)

$P_n(t) = P_n(0) \cdot e^{Q t}$

Proof

$P'(t) = P(t) \cdot Q \Rightarrow \ln P(t) = Q t + C \cdot I \Rightarrow e^{\ln P(t)} = e^{Q t} \cdot e^{C \cdot I} \Rightarrow P(t) = e^{Q t} \cdot e^{C \cdot I}$

where I is an identity matrix. Since P(0) = I, we have $P(t) = e^{Q t}$. By Definition 1 we then have:

$P_n(t) = P_n(0) \cdot e^{Q t}$.
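Theorem 1 can be checked numerically on a small chain. The sketch below is illustrative only: it evaluates the matrix exponential with a truncated Taylor series plus scaling and squaring instead of the eigen-decomposition used in this paper, and the two-state chain (one component, failure rate 2, repair rate 10) and the helper names `mat_mul` and `mat_exp` are assumptions made for the example.

```python
import math


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(q, t, terms=20, squarings=10):
    """Approximate exp(q * t) by a Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = mat_mul(term, m)                      # m^k / (k-1)!
        term = [[x / k for x in row] for row in term]  # m^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):                       # undo the scaling
        result = mat_mul(result, result)
    return result


# Hypothetical two-state chain: state 0 = up, state 1 = down
lam, mu = 2.0, 10.0
Q = [[-lam, lam], [mu, -mu]]
P = mat_exp(Q, 0.1)

# Row vector of state probabilities, starting in the up state
p0 = [1.0, 0.0]
p_t = [sum(p0[i] * P[i][j] for i in range(2)) for j in range(2)]

# Known closed form for the two-state chain, used as a cross-check
closed_form = mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * 0.1)
print(p_t[0], closed_form)
```

Each row of the computed P(t) sums to one, as expected for a matrix of transition probabilities, and the first entry agrees with the closed form for the two-state chain.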

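Consider the following result. Let Q be an n×n square matrix which has n distinct eigenvalues. Then we have:

$e^{Q t} = V \cdot e^{d t} \cdot V^{-1}$
(7)

where t represents time, V is the matrix of eigenvectors of Q, $V^{-1}$ is the inverse of V and d is the diagonal matrix of the eigenvalues of Q, defined as follows:

$d = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$

and the matrix $e^{d t}$ is as follows:

$e^{d t} = \mathrm{diag}(e^{\lambda_1 t}, e^{\lambda_2 t}, \ldots, e^{\lambda_n t})$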

Theorem 2: Consider $P(t) = e^{Q t}$, in which Q is the transition matrix. One of the eigenvalues of Q is zero and the remaining eigenvalues are complex numbers with negative real parts.

Proof: Since the elements of every row of the transition matrix sum to zero, one of the eigenvalues of Q is zero. By Theorem 1 and Eq. 7, we have

(8)

where λk is the kth eigenvalue, the αijk are constants and πj is the limiting probability.

Arguing by contradiction, if we assume that one of the eigenvalues of Q is a complex number with positive real part, then we have:

which is a contradiction; therefore the nonzero eigenvalues of Q are complex numbers with negative real parts.
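As a worked illustration of Eq. 7 and Theorem 2, consider a two-state chain (one repairable component with failure rate λ and repair rate μ). This chain is an assumption introduced here for illustration and is not one of the systems analyzed in the paper.

```latex
Q = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix},
\qquad
\text{eigenvalues: } 0 \ \text{and} \ -(\lambda+\mu),
\qquad
V = \begin{pmatrix} 1 & \lambda \\ 1 & -\mu \end{pmatrix},
\qquad
V^{-1} = \frac{1}{\lambda+\mu}\begin{pmatrix} \mu & \lambda \\ 1 & -1 \end{pmatrix}

e^{Qt} = V \begin{pmatrix} 1 & 0 \\ 0 & e^{-(\lambda+\mu)t} \end{pmatrix} V^{-1}
= \frac{1}{\lambda+\mu}
\begin{pmatrix}
\mu + \lambda e^{-(\lambda+\mu)t} & \lambda - \lambda e^{-(\lambda+\mu)t} \\
\mu - \mu e^{-(\lambda+\mu)t} & \lambda + \mu e^{-(\lambda+\mu)t}
\end{pmatrix}
```

One eigenvalue is zero and the other is negative, each entry is a constant plus a decaying exponential, and as t grows every row tends to the limiting probabilities (μ/(λ+μ), λ/(λ+μ)). At t = 0 the expression reduces to the identity matrix.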

Theorem 3: Consider $P(t) = e^{Q t}$, in which Q is the transition matrix. The time that elapses until the system reaches the steady state (P(t) = Π) can be calculated by the following formula:

(9)

where ε is a very small number (e.g., ε = 0.0001), Sr is the largest real part among the nonzero eigenvalues of matrix Q and Π is a square matrix representing the limiting probabilities. The elements of the matrices P(t) and Π are shown as follows:

Proof

By Theorem 2 all Sm are negative, and πj, αkjm, Sm and Cm are constants. Now suppose Sr is greater than every other Sm; then for large values of t we have:

Based on the proof of these theorems, an algorithmic procedure is proposed for calculating the availability of the system.
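The idea behind Theorem 3 can be sketched as follows. Because the original Eq. 9 is not reproduced in this text, the expression used here, t = ln(ε)/Sr, is an assumption: it is the time at which the slowest-decaying term exp(Sr·t) falls to ε. The function name `settling_time` and the two-state eigenvalues are likewise hypothetical.

```python
import math


def settling_time(eigenvalues, eps=1e-4, tol=1e-9):
    """Time at which exp(S_r * t) drops to eps (assumed form of Eq. 9).

    S_r is the largest real part among the nonzero eigenvalues of Q.
    """
    real_parts = [complex(ev).real for ev in eigenvalues
                  if abs(complex(ev)) > tol]       # drop the zero eigenvalue
    if not real_parts:
        raise ValueError("no nonzero eigenvalues supplied")
    s_r = max(real_parts)
    if s_r >= 0:
        raise ValueError("nonzero eigenvalues must have negative real parts")
    return math.log(eps) / s_r


# Hypothetical two-state chain with rates 2 and 10: eigenvalues 0 and -12
t_steady = settling_time([0.0, -12.0])
print(round(t_steady, 4))
```

The eigenvalue closest to zero governs the result, so a chain with slow repair or rare failures takes longer to settle under this reading.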

Algorithm

Step 1: Let i = 0
Step 2: Draw the fault tree of the system
Step 3: Determine the minimal cut sets of the fault tree
Step 4: Determine the set of possible combinations of failed components and the set of terminal combinations of failed components
Step 5: Determine the transition matrix Q
Step 6: Determine the eigenvalues and eigenvectors of the matrix Q and let i = i + 1
Step 7: Determine $P(t) = V \cdot e^{d t} \cdot V^{-1}$
Step 8: Determine Pn(t) = Pn(0)·P(t); if i = 1 go to step 9 and if i = 2 go to step 10
Step 9: Determine the availability of the system by:

$A(t) = \sum_{i \notin F_T} p_i(t)$

where FT = set of terminal combinations of failed components.

Then delete the rows and columns of the Q matrix that correspond to the members of the FT set and go to step 6

Step 10: Determine the survivability and MTTFs of the system as follows:

(10)

A NUMERICAL EXAMPLE: WATER PUMPING SYSTEM

As a very simple example, consider the pumping system shown in Fig. 2.

Assume that the undesired event is no flow of water to the pad deluge nozzles. There is one repairman for repairing this system.

Fig. 2: Water pumping system

Fig. 3: Basic fault tree for the water pumping system example. T: no flow of water to pad deluge nozzles, C: valve V fails closed, A: pump 1 fails to run, B: pump 2 fails to run

It is assumed that the time to failure of each component is a random variable with an exponential distribution with a mean of 1/2 h, so that λ = 2 per hour. The repair time is also an exponentially distributed random variable, with a mean of 1/10 h, so that μ = 10 per hour. We would like to calculate the availability of the system at any given time.

Solution: There are 3 basic components (valve V, pump 1 and pump 2) in this system. To determine the system availability we proceed as follows:

Stage 1: Draw the fault tree of the system: Ignoring the contribution of the pipes, we can model this system by the fault tree of Fig. 3.

Stage 2: Determine the minimal cut sets of the fault tree: The minimal cut sets of this tree are C and A.B. This tells us that the undesired event will occur if either valve V fails closed or both pumps fail to run. In this simple case the cut sets do not really provide any insights that are not already quite obvious from the system diagram. In more complex systems, however, where the system failure modes are not so obvious, the minimal cut set computation provides the analyst with a thorough and systematic method for identifying the basic combinations of component failures which can cause an undesired event. From the basic fault tree we can express the top event as a Boolean function of the primary input events:

T = C + A·B

This expression of the top event in terms of the basic inputs to the tree is the Boolean algebraic equivalent of the tree itself.

In this example, we have found two minimal cut sets; one single and one double:

Mcs: {{C}, {A· B}}

Each of these defines an event or series of events whose existence or joint existence will initiate the top event of the tree.
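The correspondence between the Boolean form of the top event and the minimal cut sets can be illustrated with a brute-force sketch. The function names `top_event` and `minimal_cut_sets` are hypothetical, and exhaustive enumeration is used only because the example has three components; it is not the cut set method of the cited handbook.

```python
from itertools import product


def top_event(a, b, c):
    """Top event T of the pumping example: valve fails OR both pumps fail."""
    return c or (a and b)


def minimal_cut_sets(names, structure):
    """Find minimal cut sets by enumerating every failure combination."""
    cut_sets = []
    for bits in product([False, True], repeat=len(names)):
        if structure(*bits):
            failed = frozenset(n for n, bit in zip(names, bits) if bit)
            cut_sets.append(failed)
    # keep only the cut sets that have no smaller cut set inside them
    return [c for c in cut_sets if not any(other < c for other in cut_sets)]


mcs = minimal_cut_sets("ABC", top_event)
print([sorted(c) for c in mcs])
```

Enumeration grows as 2^n, so for larger trees a dedicated cut set algorithm would be needed.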

Stage 3: Determine the set of possible combinations of failed components: From the minimal cut sets of the example we can determine the set of possible combinations of failed components. We have 3 components and 2 minimal cut sets in this system, hence 2^3 = 8 combinations of failed components. Some of these are not possible, because we assume that a component may fail only in operation mode. If the failed components of a state contain more than one minimal cut set of the system, the state must be eliminated, as it is not a possible combination of failed components. Checking all states by hand for systems with more than 4 components is time-consuming, complex and hard to follow, so a dedicated computer program should be used to determine the set of possible combinations of failed components. As described earlier, FP and FT in this example are:

Set of possible combinations of failed components = Fp = {0, A, B, C, AB, AC, BC}
Set of terminal combinations of failed components = FT = {C, AB, AC, BC}

Stage 4: Determine the transition matrix Q: The transition matrix of this example is:

The graphical Markov model of this example is represented in Fig. 1; for more complex systems such a diagram is hard to draw and is not necessary.
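One way to assemble a transition matrix for this example is sketched below. It is not the matrix from the paper, which is not reproduced in this text. The sketch uses λ = 2 and μ = 10 per hour from the example and assumes that each operating component fails at rate λ only while the system is up. It further assumes, purely for illustration, that when two components are down the single repairman is equally likely to be working on either one, so each is restored at rate μ/2. The helper names are hypothetical.

```python
LAM, MU = 2.0, 10.0   # failure and repair rates per hour (from the example)
COMPONENTS = "ABC"
STATES = ["0", "A", "B", "C", "AB", "AC", "BC"]   # Fp
TERMINAL = {"C", "AB", "AC", "BC"}                # FT


def failed(state):
    """Set of failed components encoded by a state label."""
    return frozenset() if state == "0" else frozenset(state)


def name(failed_set):
    """State label for a set of failed components."""
    return "".join(sorted(failed_set)) or "0"


def build_q(states, terminal, lam, mu):
    """Build the transition rate matrix under the assumptions stated above."""
    idx = {s: i for i, s in enumerate(states)}
    q = [[0.0] * len(states) for _ in states]
    for s in states:
        down = failed(s)
        row = q[idx[s]]
        if s not in terminal:
            # system is up: every operating component may fail
            for c in COMPONENTS:
                if c not in down:
                    row[idx[name(down | {c})]] += lam
        # ASSUMPTION: repair effort is split evenly over failed components
        for c in down:
            row[idx[name(down - {c})]] += mu / len(down)
        row[idx[s]] = -sum(row)   # diagonal makes the row sum to zero
    return q


Q = build_q(STATES, TERMINAL, LAM, MU)
for state, row in zip(STATES, Q):
    print(state.ljust(3), row)
```

A different repair discipline, for example always repairing the component that failed first, would change the rows of the two-failure states and would require states that record the order of failures.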

Stage 5: Determine the eigenvalues and eigenvectors of the matrix Q.

Stage 6: Determine $P(t) = V \cdot e^{d t} \cdot V^{-1}$.

Stage 7: Determine Pn(t) = Pn(0)·P(t).

Stages 5, 6 and 7 are solved using MATLAB. The final solution is:

Pn(0) = [1,0,0,0,0,0,0]

Stage 8: Determine the availability of the system

We can calculate the time that elapses until the system reaches the steady state. Table 1 gives the probability that the system is up (good) at time t, for different values of t.

Table 1: Time elapsed until the system reaches the steady state

The time that elapses until the system reaches the steady state can also be calculated by the following closed-form formula.

As can be seen from Table 1 and the closed-form formula, the system reaches the steady state after 0.6 time units.

The limiting probability can also be calculated as follows:

Stage 9: Determine the survivability and MTTFs of the system:

To determine the system survivability and MTTFs, according to the algorithm we have the reduced matrix as follows:

And

Now we can calculate the system survivability and MTTFs as follows:
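As a cross-check on the last stage, the mean time to system failure can also be obtained without eigenvalues. The sketch below is an alternative route, not the procedure of this paper: it assumes the reduced matrix keeps only the up states 0, A and B with λ = 2 and μ = 10 per hour, and it uses the standard result that the vector m of expected times to absorption satisfies Q_reduced · m = −1. The function `solve` is a hypothetical helper implementing Gauss-Jordan elimination.

```python
def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


LAM, MU = 2.0, 10.0
# Rows and columns for the up states 0, A, B only (terminal states deleted).
# Diagonal entries keep the full exit rate, including exits to deleted states.
q_reduced = [
    [-3 * LAM, LAM, LAM],
    [MU, -(MU + 2 * LAM), 0.0],
    [MU, 0.0, -(MU + 2 * LAM)],
]

# Expected time to first system failure from each up state
mean_times = solve(q_reduced, [-1.0, -1.0, -1.0])
mttf_from_all_up = mean_times[0]
print(mttf_from_all_up)
```

Under these assumptions the value from the all-up state equals 9/22 h; it is a property of this sketch and should not be read as the figure reported in the paper. Because single-failure states have only one component under repair, the result does not depend on how the repairman chooses between two failed components.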

CONCLUSION

The main purpose of this study was to offer a methodology for analyzing the transient availability of a system with identical components and one repairman. Fault trees, Markov models, eigenvectors and eigenvalues have been employed to develop the methodology for the transient reliability of such systems. Because it applies fault tree analysis, the proposed methodology is suited to large systems where high reliability is required and where the design incorporates many layers of protection, such as nuclear reactor systems. It is also more effective in the sense that it can be applied to a wide variety of systems. The method is a sophisticated form of reliability assessment and requires considerable time and effort by skilled analysts. The use of the fault tree in this method is of major value in directing the analyst to ferret out failures deductively, pointing out the aspects of the system that are important with respect to the failure of interest, providing a graphical aid that gives visibility to those in system management who are removed from system design changes, allowing the analyst to concentrate on one particular system failure at a time and providing the analyst with genuine insight into system behavior. Although the fault tree is the best tool available for a comprehensive analysis, it is not foolproof and, in particular, it does not of itself assure detection of all failures, especially common cause failures.

REFERENCES
1:  Moustafa, M.S., 1998. Transient analysis of reliability with and without repair for k-out-of-N: G systems with M failure modes. Reliabil. Eng. Syst. Safety, 59: 317-320.

2:  Pham, H. and M. Pham, 1991. Optimal designs of {k, n-k+1}-out-of-n: F systems (subject to 2 failure modes). Microelect. Reliability, 40: 559-562.

3:  Shao, J. and L.R. Lamberson, 1991. Modeling a shared-load k-out-of-n: G system. Microelect. Reliability, 40: 202-208.

4:  Zhang, Y.L., M.J. Zuo and R.C.M. Yam, 2000. Reliability analysis of a circular consecutive-2-out-of-n: F repairable system with priority in repair. Reliabil. Eng. Syst. Safety, 68: 113-120.

5:  Li, X., M.J. Zuo and R.C.M. Yam, 2006. Reliability analysis of a repairable k-out-of-n system with some components being suspended when the system is down. Reliabil. Eng. Syst. Safety, 91: 305-310.

6:  Sarhan, A. and A.M. Abouammoh, 2001. Reliability of k-out-of-n non-repairable systems with non-independent components subjected to common shocks. Microelect. Reliabil., 41: 617-621.

7:  Gohary, A.I.E. and A.M. Sarhan, 2005. Estimation of the parameters in three non independent component series system subjected to sources of shocks. Applied Math. Comput., 160: 29-40.

8:  Azaron, A., H. Katagiri, K. Kato and M. Sakawa, 2006. Reliability evaluation of multi-component cold-standby redundant systems. Applied Math. Comput., 173: 137-149.

9:  Amiri, M. and F. Ghassemi-Tari, 2007. A methodology for analyzing the transient availability and survivability of a system with repairable components. Applied Math. Comput., 184: 300-307.

10:  Amiri, M. and F. Ghassemi, 2007. A methodology for analyzing the transient reliability of systems with identical components and identical repairmen. Int. J. Sci. Technol. Sci. Iranica, 14: 72-77.

11:  Amiri, M., F. Ghassemi, A. Mohtashami and J. Salehi Sadaghiani, 2008. A Methodology for analyzing the transient availability and survivability of a system with the standby components in two cases: The identical components and the non-identical components. J. Applied Sci.

12:  Vesely, W.E., F.F. Goldberg, N.H. Roberts and D.F. Haasl, 1980. Fault Tree Handbook. 1st Edn., US Nuclear Regulatory Commission, Washington DC., USA.

13:  Bolch, G., S. Greiner, H. de Meer and K.S. Trivedi, 2006. Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. 2nd Edn., Wiley Inter-Science, New York, ISBN-13: 9780471565253, Pages: 896.

14:  Lees, F.P., 1996. Loss Prevention in the Process Industries. 2nd Edn., Butterworth-Heinemann, Oxford, ISBN: 0-7506-1547-8.


©  2021 Science Alert. All Rights Reserved