INTRODUCTION
Loss prevention is in large part the application of probabilistic methods to
the problem of failure in the process industries. The discipline concerned
with the probabilistic treatment of failure in systems in general is reliability
engineering. Reliability has long been a major concern for system designers.
Many systems consist of components having various failure modes. Several authors
have considered a k-out-of-n system subject to two failure modes. Among those,
Moustafa (1996) presented Markov models for analyzing
the transient reliability of k-out-of-n: G systems subject to two failure modes.
He proposed a procedure for obtaining closed-form expressions for the transient probabilities
and the reliability of nonrepairable systems. Moustafa
(1998) extended this work by providing a set of simultaneous linear differential
equations for two different repairable and nonrepairable k-out-of-n: G systems
subject to M failure modes. That paper discussed numerical solutions for the reliability
of the repairable systems and presented closed-form solutions for
the reliability of the nonrepairable systems. Another research
effort is the work of Pham and Pham (1991), which
considered [k, n-k+1]-out-of-n: F systems subject to two failure modes. Shao
and Lamberson (1991) presented a model for a k-out-of-n: G system with load
sharing.
Zhang et al. (2000) presented a circular consecutive
2-out-of-n repairable system with one repairman. They determined the rate of occurrence
of failure, mean time between failures, reliability and mean time to first failure.
Li et al. (2006) presented a k-out-of-n system
with independent exponential components. They assumed that some working components
are suspended as soon as the system is down, that repair starts immediately when
a component fails and that repair times are independent and exponentially distributed.
They also determined the mean time between failures, the mean working time in a
failure-repair cycle and the mean down time in a failure-repair cycle.
Another attempt is the study conducted by Sarhan and Abouammoh
(2001), who applied the concept of a shock model to derive the reliability
function of a k-out-of-n nonrepairable system with nonindependent and nonidentical
components. Later, Gohary and Sarhan (2005) extended the work of Sarhan
and Abouammoh (2001) by proposing a Bayes estimator for a series system of three
nonindependent and nonidentical components under the condition
of four sources of fatal shock. They supported their estimation method with
a simulation study and showed how the theoretical results obtained
in their study can be used.
Azaron et al. (2006) introduced a new methodology,
using continuous-time Markov processes and the shortest path technique, for the
reliability evaluation of an L-dissimilar-unit nonrepairable cold-standby redundant
system. Amiri and Ghassemi (2007a) introduced a method
for transient analysis of the availability and survivability of a system with repairable
components using Markov models, eigenvalues and eigenvectors. The system considered
consists of n identical components and k repairmen, with the
components arranged in series, in k-out-of-n or in parallel. They proposed
a methodology for obtaining the availability, survivability and MTTF_{s} (mean
time to system failure) of the system and for calculating the time the system
takes to reach its steady state. Amiri and Ghassemi (2007b)
introduced a method for analyzing the transient reliability of systems with
identical components and identical repairmen using Markov models, eigenvalues
and eigenvectors. They assumed that the components of the systems under consideration
can have two distinct configurations: they can be arranged in series
or in parallel. They also considered a third case in which the system is up (good)
if k out of n components are good. For all three cases they proposed a procedure
for calculating the transient probability of the system availability and the
time the system takes to reach the steady state. Amiri et
al. (2008) introduced a methodology for analyzing the transient availability
and survivability of a system with standby components in two cases:
identical components and nonidentical components. They assumed that in
standby systems not all components are in use at the same time: at any
moment just one component is operating
and, as soon as the operating component fails, the system switches to
another good component.
In most actual systems the composition of components is more complex
and there are combinations of component failures which, if they all occur,
will cause the main undesired event to occur.
A Markov model is a model of the probabilities of the different states of a system
as a function of time. It therefore has two variables, state and time. The state
of a system can generally be defined in a number of ways. Thus for a system with
two items of equipment, 1 and 2, one set of states is as follows: no equipment
failed, one item (either 1 or 2) failed, both items failed. Another
set of states is as follows: no equipment failed, equipment 1 failed, equipment
2 failed, both items failed. The transitions in the system may be in the
forward direction only or in both the forward and backward directions.
The transition rates are defined by the states chosen. It is generally desirable
to choose the states so that the transition rates correspond to quantities which
are known, such as failure and repair rates. The transition rates of a Markov
model are constant (Bolch et al., 2006).
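To make the second state set above concrete, the following minimal Python sketch builds a transition rate matrix for two items of equipment. The rate names (lam1, lam2, mu1, mu2) and the assumption that each failed item is repaired independently are illustrative choices, not taken from the paper.

```python
# The finer state set described above for two items of equipment.
STATES = ["none failed", "1 failed", "2 failed", "both failed"]


def two_equipment_generator(lam1, lam2, mu1, mu2):
    """Return the 4x4 transition rate matrix Q as a list of rows.

    Hypothetical parameters: lam1, lam2 are constant failure rates and
    mu1, mu2 are constant repair rates. Assumption: every failed item is
    repaired independently (no shared repairman in this small sketch).
    """
    q = [[0.0] * 4 for _ in range(4)]
    # Forward (failure) transitions.
    q[0][1] = lam1  # none failed -> 1 failed
    q[0][2] = lam2  # none failed -> 2 failed
    q[1][3] = lam2  # 1 failed -> both failed
    q[2][3] = lam1  # 2 failed -> both failed
    # Backward (repair) transitions; set mu1 = mu2 = 0 for forward-only.
    q[1][0] = mu1   # 1 failed -> none failed
    q[2][0] = mu2   # 2 failed -> none failed
    q[3][2] = mu1   # both failed -> only 2 failed (item 1 repaired)
    q[3][1] = mu2   # both failed -> only 1 failed (item 2 repaired)
    # Each diagonal entry makes its row sum to zero.
    for i in range(4):
        q[i][i] = -sum(q[i])
    return q


Q = two_equipment_generator(lam1=2.0, lam2=2.0, mu1=10.0, mu2=10.0)
```

Choosing the finer state set keeps every off-diagonal entry equal to a single known failure or repair rate, which is the property the paragraph above recommends.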
A fault tree is a logic diagram depicting certain events that must occur in
order for other events to occur. The events are termed faults if they are initiated
by other events and failures if they are the basic initiating events.
The fault tree interrelates events (faults to faults or faults to failures)
and certain symbols are used to depict the various relationships. The basic
symbol is the gate and each gate has inputs and an output. The two basic gate
categories are the OR-gate and the AND-gate. Because these gates relate events in
exactly the same way as the Boolean operations, there is a one-to-one correspondence
between the Boolean algebraic representation and the fault tree representation
(Vesely et al., 1980). Fault tree analysis is used
for large systems where high reliability is required and where the design
incorporates many layers of protection, such as in nuclear reactor systems.
A fault tree is a graphical representation of the fault paths and logic of a
system and is of value as such. There are, however, a number of methods of fault
tree evaluation, for example the minimal cut sets method, the gate-by-gate method
and the Monte Carlo method (Frank Lees, 1996).
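The correspondence between gates and Boolean operations can be illustrated with a small Python sketch. The tree used here, TOP = E1 OR (E2 AND E3), and the event names are hypothetical; the sketch only shows that evaluating the gates and evaluating the Boolean expression give the same truth table.

```python
from itertools import product


def or_gate(*inputs):
    """Output occurs if at least one input event occurs."""
    return any(inputs)


def and_gate(*inputs):
    """Output occurs only if all input events occur."""
    return all(inputs)


def top_event(e1, e2, e3):
    """Hypothetical fault tree: TOP = E1 OR (E2 AND E3)."""
    return or_gate(e1, and_gate(e2, e3))


def boolean_expression(e1, e2, e3):
    """The same tree written directly as a Boolean expression."""
    return e1 or (e2 and e3)


# Truth table over all combinations of the three basic events.
truth_table = {
    combo: top_event(*combo) for combo in product([False, True], repeat=3)
}
```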
In this study we present a method for transient analysis of the availability
and survivability of a system with repairable components; the method
is also a technique for fault tree evaluation. It uses the
concepts of fault trees, Markov models, eigenvalues and eigenvectors.
The system considered consists of n identical components
and one repairman. We propose a methodology for obtaining the availability,
survivability and MTTF_{s} (mean time to system failure) of the system
and for calculating the time the system takes to reach its steady state.
The method is useful for standard systems (series systems, parallel
systems and standby systems) as well as complex systems.
NOMENCLATURE AND DEFINITIONS
X(t) = State (possible combination of failed components) at time t (1)
p_{n}(t) = Probability of being in the nth state at time t; p_{n}(t) = P(X(t) = n) (2)
A(t) = Probability that the system is up (good) at time t, regardless of its history of component failures and/or repairs
A(∞) = Long-run system availability or system reliability
R_{s}(t) = Survivability function: the probability that the system does not leave the set B of functioning states during the time interval (0, t]
MTTF_{s} = Mean time to system failure
Definition 1: If Q is the state transition rate matrix
and P(t) is the state transition probability matrix of a continuous-time
exponential Markov chain, then we have:
P'(t) = P(t)·Q
P_{n}(t) = P_{n}(0)·P(t)
in which Q and P(t) are square matrices and P_{n}(t) and
P_{n}(0) are row vectors.
THE MODEL
The aim of this study is to determine the availability, survivor function
and MTTF_{s} of a system under the following assumptions:
• The system consists of identical and independent components
• The arrangement of the components can be simple and standard (series system, parallel system, …) or very complex
• The components of the system are repairable
• The system has one repairman
• The lifetime of each component is exponentially distributed with parameter λ
• The service time of each component by the repairman is exponentially distributed with parameter μ
THE PROPOSED METHODOLOGY
This section describes the proposed methodology for analyzing the system’s transient
reliability, the survivability of the system and the time until the system
reaches its steady state. Consider a system having n identical components
and one repairman. The components can be arranged in any structure, so the
methodology can analyze many distinct cases. First we must define the main
undesired event (top event) of the system; the system is then analyzed
in the context of its environment and operation to find all credible ways
in which the undesired event can occur. The fault tree (FT) is a graphic model
of the various parallel and sequential combinations of faults that will
result in the occurrence of the predefined undesired event. The faults
can be events that are associated with component hardware failures, human
errors or any other pertinent events which can lead to the top event.
An FT thus depicts the logical interrelationships of basic events that
lead to the undesired event, which is the top event of the fault tree.
It is important to point out that an FT is not in itself a quantitative
model. It is a qualitative model that can be evaluated quantitatively
and often is. In the next phase we must determine the minimal cut sets of
the FT. A minimal cut set is a smallest combination of component failures
which, if they all occur, will cause the top event to occur. A minimal cut
set is thus a combination of primary events sufficient for the top event.
If one of the failures in the cut set does not occur, then the top event
will not occur (by this combination).
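The definition of a minimal cut set can be turned into a small brute-force search. The Python sketch below is an illustration only, not the paper's program: the function names are hypothetical, the top event is supplied as a Boolean function of the set of failed components, and the tree is assumed to be monotone (adding failures never repairs the system). It checks subsets in order of increasing size, so the cost grows as 2^n and it suits only small trees.

```python
from itertools import combinations


def minimal_cut_sets(components, top_event):
    """Return the minimal cut sets of a monotone fault tree.

    top_event(failed) must return True when the given set of failed
    components causes the top event. Subsets are examined by increasing
    size, so any subset that contains an already found cut set is not
    minimal and is skipped.
    """
    found = []
    for size in range(1, len(components) + 1):
        for combo in combinations(components, size):
            failed = frozenset(combo)
            if any(cut <= failed for cut in found):
                continue  # contains a smaller cut set
            if top_event(failed):
                found.append(failed)
    return found


def example_top(failed):
    """Three-component tree with TOP = C OR (A AND B)."""
    return "C" in failed or {"A", "B"} <= failed


cut_sets = minimal_cut_sets(["A", "B", "C"], example_top)
```

For this three-component tree the search returns the single cut set {C} and the double cut set {A, B}.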
It is assumed that the time between two component failures is a random
variable having the exponential distribution with parameter λ.
It is also assumed that there is one repairman providing service to the
system. The service time of a component is also an exponentially distributed
random variable, with parameter μ. A system with n components has
2^{n} states, some of which cannot occur because
we assume that once the system has failed through a combination of failed components,
no further component fails: components can fail only in operation mode and
no failure occurs in standby mode. Some of the possible combinations of
failed components are terminal combinations; a terminal combination is
a possible combination of failed components that certainly includes one minimal
cut set of the system.
Set of possible combinations of failed components = F_{P} = {A_{1}, A_{2}, A_{3}, …, A_{k}}, k ≤ 2^{n}
Set of terminal combinations of failed components = F_{T} = {A_{1}, A_{2}, A_{3}, …, A_{m}}, m ≤ k
Let X(t) denote the state (the possible combination of failed components)
at time t. As an example, for a system with 3 components (A, B, C)
whose minimal cut sets are C and A.B, the related Markov model
is represented in Fig. 1.
Fig. 1: State transition diagram of the system with 3 components and minimal cut sets C and A.B
According to the above explanation, F_{P} and F_{T} in
this case are:
Set of possible combinations of failed components = F_{P} =
{0, A, B, C, AB, AC, BC}
Set of terminal combinations of failed components = F_{T} =
{C, AB, AC, BC}
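The two sets can be generated mechanically. The Python sketch below rests on one reading of the rule above, which is an assumption of this sketch: starting from the state with no failures, any working component may fail as long as the current state contains no minimal cut set, and a state that contains a minimal cut set is terminal and admits no further failures. The function names are hypothetical.

```python
def is_terminal(state, cut_sets):
    """A state is terminal if it contains at least one minimal cut set."""
    return any(cut <= state for cut in cut_sets)


def enumerate_states(components, cut_sets):
    """Return (possible, terminal) sets of failed-component combinations.

    States are frozensets of failed components; the empty frozenset is
    the state written as 0 in the text.
    """
    cut_sets = [frozenset(cut) for cut in cut_sets]
    start = frozenset()
    possible = {start}
    frontier = [start]
    while frontier:
        state = frontier.pop()
        if is_terminal(state, cut_sets):
            continue  # system is down: no further failures
        for comp in components:
            if comp not in state:
                nxt = state | {comp}
                if nxt not in possible:
                    possible.add(nxt)
                    frontier.append(nxt)
    terminal = {s for s in possible if is_terminal(s, cut_sets)}
    return possible, terminal


F_P, F_T = enumerate_states(["A", "B", "C"], [{"C"}, {"A", "B"}])
```

For the three-component example this search reproduces the seven possible states and four terminal states listed above; the combination ABC is never reached because both of its predecessors AB, AC and BC are already terminal.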
The proposed methodology for obtaining the system availability and the
transient probabilities is based on several theorems, which
provide the underlying theory of the methodology. We
first present these theorems:
Theorem 1: Consider a continuous-time exponential Markov
chain in which P'(t) = P(t)·Q. Then we have:
P_{n}(t) = P_{n}(0)·e^{Q·t}
Proof
P'(t) = P(t)·Q ⇒ ln P(t) = Q·t + C·I ⇒ e^{ln P(t)} = e^{Q·t}·e^{C·I} ⇒ P(t) = e^{Q·t}·e^{C·I}
where I is an identity matrix. Since P(0) = I, we have P(t) = e^{Q·t}.
By Definition 1 we then have:
P_{n}(t) = P_{n}(0)·e^{Q·t}.
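The logarithm step in the proof treats the matrices like scalars. The same result can be confirmed with the usual power-series definition of the matrix exponential, a textbook fact rather than something stated in this paper, by differentiating the series term by term:

```latex
e^{Qt} = \sum_{k=0}^{\infty} \frac{(Qt)^{k}}{k!}
\quad\Longrightarrow\quad
\frac{d}{dt}\, e^{Qt}
  = \sum_{k=1}^{\infty} \frac{Q^{k}\, t^{k-1}}{(k-1)!}
  = e^{Qt}\, Q ,
\qquad
e^{Q \cdot 0} = I .
```

Hence P(t) = e^{Q·t} satisfies P'(t) = P(t)·Q with P(0) = I, and multiplying by the row vector P_{n}(0) gives P_{n}(t) = P_{n}(0)·e^{Q·t}.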
Consider the following theorem.
Let Q be an n×n square matrix which has n nonrepeating eigenvalues;
then we have:
e^{Q·t} = V·e^{d·t}·V^{-1} (7)
where t represents time, V is the matrix of eigenvectors
of Q, V^{-1} is the inverse of V and d is the diagonal matrix of the eigenvalues
of Q, defined as follows:
d = diag(λ_{1}, λ_{2}, …, λ_{n})
and the matrix e^{d·t} is as follows:
e^{d·t} = diag(e^{λ_{1}·t}, e^{λ_{2}·t}, …, e^{λ_{n}·t})
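Eq. 7 can be checked numerically. The sketch below uses NumPy, which is an assumption of this example (the paper itself mentions only MATLAB), and a hypothetical two-state chain with failure rate 2 and repair rate 10. It computes e^{Q·t} from the eigenvalues and eigenvectors of Q and compares the result with a truncated power series.

```python
import numpy as np


def expm_eig(Q, t):
    """Compute e^{Q t} as V e^{d t} V^{-1}, assuming distinct eigenvalues."""
    eigvals, V = np.linalg.eig(Q)
    exp_dt = np.diag(np.exp(eigvals * t))
    # The imaginary parts cancel for a real Q; keep the real part only.
    return (V @ exp_dt @ np.linalg.inv(V)).real


def expm_series(Q, t, terms=60):
    """Reference value from the truncated series sum_k (Q t)^k / k!."""
    result = np.eye(Q.shape[0])
    term = np.eye(Q.shape[0])
    for k in range(1, terms):
        term = term @ (Q * t) / k
        result = result + term
    return result


# Hypothetical two-state chain: state 0 = up, state 1 = down.
lam, mu = 2.0, 10.0
Q = np.array([[-lam, lam],
              [mu, -mu]])
P = expm_eig(Q, 0.1)
```

Each row of P is a probability distribution, so the rows sum to one, and for this two-state chain the entry P[0, 0] agrees with the well-known closed form μ/(λ+μ) + λ/(λ+μ)·e^{-(λ+μ)·t}.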
Theorem 2: Consider P(t) = e^{Q·t}, in which Q
is the transition matrix. One of the eigenvalues of Q is
zero and the remaining eigenvalues are complex numbers with negative
real parts.
Proof: Since the elements of every row of the transition matrix sum
to zero, one eigenvalue of the matrix
Q is zero. By Theorem 1 and Eq. 7, we have
p_{ij}(t) = π_{j} + Σ_{k} α_{ijk}·e^{λ_{k}·t}
where the sum runs over the nonzero eigenvalues, λ_{k} is the kth eigenvalue, the α_{ijk}
are constants and π_{j} is the limiting probability.
Arguing by contradiction, if we assume that one of the eigenvalues
of Q is a complex number with positive real part, then the corresponding
term e^{λ_{k}·t} grows without bound as t → ∞,
which contradicts the fact that each p_{ij}(t) is a probability and cannot exceed 1.
Therefore the nonzero eigenvalues of Q are complex numbers with negative
real parts.
Theorem 3: Consider P(t) = e^{Q·t}, in which Q is
the transition matrix. The time elapsed until the system reaches the steady
state (P(t) = Π) can be calculated by the following formula:
where ε is a very small number (e.g., ε = 0.0001), S_{r}
is the largest real part of the eigenvalues of matrix Q, excluding the zero eigenvalue,
and Π is a square matrix representing the limiting
probabilities. The elements of the matrices P(t) and Π are shown as follows:
Proof
By Theorem 2 all S_{m} are negative and
(π_{j}, α_{kjm}, S_{m} and C_{m}
are constants). Now suppose S_{r} is greater than S_{m};
then for large values of t we have:
Based on the proofs of these theorems, an algorithmic procedure is proposed
for calculating the availability of the system.
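The formula of Theorem 3 is not reproduced in this text, so the following Python sketch rests on a stated assumption rather than on the paper's own expression: the steady state is taken to be reached once the slowest decaying mode e^{S_{r}·t} has dropped to ε, which gives t = ln(ε)/S_{r}. The function name and the two-state example chain are hypothetical, and NumPy is assumed to be available.

```python
import math

import numpy as np


def time_to_steady_state(Q, eps=1e-4, zero_tol=1e-9):
    """Estimate the settling time under the assumed rule e^{S_r t} = eps.

    S_r is the largest real part among the nonzero eigenvalues of Q;
    it is negative, so ln(eps) / S_r is positive.
    """
    eigvals = np.linalg.eigvals(Q)
    nonzero = eigvals[np.abs(eigvals) > zero_tol]
    s_r = float(np.max(nonzero.real))
    return math.log(eps) / s_r


# Hypothetical two-state chain with failure rate 2 and repair rate 10.
lam, mu = 2.0, 10.0
Q = np.array([[-lam, lam],
              [mu, -mu]])
t_star = time_to_steady_state(Q)
```

For this chain the only nonzero eigenvalue is -(λ+μ) = -12, so the estimate equals ln(0.0001)/(-12). Whether this rule coincides with the paper's formula cannot be confirmed from the text.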
Algorithm
Step 1: Let i = 0
Step 2: Draw the fault tree of the system
Step 3: Determine the minimal cut sets of the fault tree
Step 4: Determine the set of possible combinations of failed components and the set of terminal combinations of failed components
Step 5: Determine the transition matrix Q
Step 6: Determine the eigenvalues and eigenvectors of the matrix Q and let i = i + 1
Step 7: Determine P(t) = V·e^{d·t}·V^{-1}
Step 8: Determine P_{n}(t) = P_{n}(0)·P(t); if i = 1 go to step 9 and if i = 2 go to step 10
Step 9: Determine the availability of the system by:
A(t) = 1 - Σ_{n ∈ F_{T}} p_{n}(t)
where F_{T} = set of terminal combinations of failed components.
Then delete the rows and columns of the Q matrix that relate to members of the F_{T} set and go to step 6
Step 10: Determine the survivability and MTTF_{s} of the system as follows:
R_{s}(t) = Σ_{n} p_{n}(t), summed over the states of the reduced matrix
MTTF_{s} = ∫_{0}^{∞} R_{s}(t) dt
A NUMERICAL EXAMPLE: WATER PUMPING SYSTEM
As a very simple example, consider the pumping system shown in Fig. 2.
Assume that our undesired event is no flow of water to the pad deluge nozzles.
There is one repairman for repairing this system.
Fig. 2: Water pumping system
Fig. 3: Basic fault tree for the water pumping system example. T: no flow of water to pad deluge nozzles, C: valve V fails closed, A: pump 1 fails to run, B: pump 2 fails to run
It is assumed that the time
to failure of a repaired component is a random variable with an exponential
distribution with a mean of 1/2 h. The repair time is also
considered to be an exponentially distributed random variable with a
mean of 1/10 h. We would like to calculate the availability of the system
at any given time.
Solution: There are 3 basic components (valve V, pump 1 and pump
2) in this system. To determine the system availability we proceed as follows:
Stage 1: Draw the fault tree of the system: Ignoring the contribution
of the pipes, we can model this system by the fault tree of Fig. 3.
Stage 2: Determine the minimal cut sets of the fault tree: The
minimal cut sets of this tree are C and A.B. This tells us that our undesired
event will occur if either valve V fails closed or both pumps fail to
run. In this simple case the cut sets do not really provide any insights
that are not already quite obvious from the system diagram. In more complex
systems, however, where the system failure modes are not so obvious, the
minimal cut set computation provides the analyst with a thorough and systematic
method for identifying the basic combinations of component failures which
can cause an undesired event. From the basic fault tree we can express
the top event as a Boolean function of the primary input events:
T = C + A.B
This expression of the top event in terms of the basic inputs to the tree
is the Boolean algebraic equivalent of the tree itself.
In this example, we have found two minimal cut sets, one single and one
double:
C and A.B
Each of these defines an event or series of events whose existence or
joint existence will initiate the top event of the tree.
Stage 3: Determine the set of possible combinations of failed components:
From the minimal cut sets of the example we can determine the set of possible
combinations of failed components. We have 3 components and 2 minimal cut
sets in this system, so there are 2^{3} = 8 combinations of
failed components. Some of these are not possible, because we assume
that a component can fail only in operation mode. If the components of a
state include more than one minimal cut set of the system, that state must be
eliminated; it is not a possible combination of failed components. Surveying
all states by hand for systems with more than 4 components is time consuming,
complex and hard to follow. Accordingly, a special computer
program must be used to determine the set of possible combinations of failed components.
As described earlier, F_{P} and F_{T} in this example
are:
Set of possible combinations of failed components = F_{P} =
{0, A, B, C, AB, AC, BC}
Set of terminal combinations of failed components = F_{T} =
{C, AB, AC, BC}
Stage 4: Determine the transition matrix Q: The transition matrix
of this example is:
The graphical Markov model of this example is represented in Fig.
1, but for more complex systems such a diagram is not easy to draw, nor is
it necessary.
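The transition matrix itself is not reproduced in this text. The Python sketch below shows how such a matrix could be assembled from F_{P} and F_{T} with λ = 2 per hour and μ = 10 per hour (the reciprocals of the stated means). One point is an assumption of this sketch and not taken from the paper: when two components are failed, the single repairman is taken to work on either one with equal chance, so each repair transition has rate μ/2. Rows for the states 0, A, B and C do not depend on that assumption.

```python
STATES = ["0", "A", "B", "C", "AB", "AC", "BC"]  # F_P from the text
TERMINAL = {"C", "AB", "AC", "BC"}               # F_T from the text
LAM, MU = 2.0, 10.0  # per hour: means of 1/2 h and 1/10 h


def label(failed):
    """Name of a state given its set of failed components."""
    return "".join(sorted(failed)) or "0"


def build_q(states=STATES, terminal=TERMINAL, lam=LAM, mu=MU):
    """Assemble the transition rate matrix Q as a list of rows.

    Assumption (not from the paper): with two failed components the
    repairman picks either with equal chance, giving rate mu / 2 each.
    """
    index = {s: i for i, s in enumerate(states)}
    q = [[0.0] * len(states) for _ in states]
    for s in states:
        failed = set() if s == "0" else set(s)
        row = q[index[s]]
        if s not in terminal:
            # Failures are possible only while the system is up.
            for comp in "ABC":
                if comp not in failed:
                    row[index[label(failed | {comp})]] += lam
        for comp in failed:
            row[index[label(failed - {comp})]] += mu / len(failed)
        row[index[s]] = -sum(row)  # diagonal makes the row sum to zero
    return q


Q = build_q()
```

With the state order 0, A, B, C, AB, AC, BC, the first row is [-6, 2, 2, 2, 0, 0, 0]: from the all-good state each of the three components fails at rate 2.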
Stage 5: Determine the eigenvalues and eigenvectors of the matrix
Q.
Stage 6: Determine P(t) = V·e^{d·t}·V^{-1}.
Stage 7: Determine P_{n}(t) = P_{n}(0)·P(t).
Solving stages 5, 6 and 7 requires MATLAB. The final solution is:
P_{n}(0) = [1,0,0,0,0,0,0]
Stage 8: Determine the availability of the system
We can calculate the time elapsed until the system reaches the steady state.
Table 1 gives the probability that the system is
up (good) at time t, for different values of t.
Table 1: Time elapsed until the system reaches the steady state
By the following closed-form formula, we can also calculate the time
elapsed until the system reaches the steady state.
As can be seen from Table 1 and the closed-form
formula, the system reaches the steady state after 0.6 time units.
The limiting probability can also be calculated as follows:
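Since the table and formulas are not reproduced in this text, the following Python sketch only illustrates how the availability curve and the limiting probabilities could be computed with NumPy (an assumed tool; the paper used MATLAB). It rebuilds Q for the example under the same labeled assumption as before, namely that with two failed components the repairman serves either one with equal chance (rate μ/2 each), so its numbers need not match the paper's Table 1.

```python
import numpy as np

STATES = ["0", "A", "B", "C", "AB", "AC", "BC"]  # F_P from the text
TERMINAL = {"C", "AB", "AC", "BC"}               # F_T from the text
LAM, MU = 2.0, 10.0  # per hour


def build_q():
    """Transition rate matrix; the mu / 2 repair split is an assumption."""
    idx = {s: i for i, s in enumerate(STATES)}
    name = lambda failed: "".join(sorted(failed)) or "0"
    Q = np.zeros((len(STATES), len(STATES)))
    for s in STATES:
        failed = set() if s == "0" else set(s)
        i = idx[s]
        if s not in TERMINAL:
            for comp in "ABC":
                if comp not in failed:
                    Q[i, idx[name(failed | {comp})]] += LAM
        for comp in failed:
            Q[i, idx[name(failed - {comp})]] += MU / len(failed)
        Q[i, i] = -Q[i].sum()
    return Q


def availability(Q, t):
    """A(t) = 1 - sum of p_n(t) over F_T, starting from state 0."""
    eigvals, V = np.linalg.eig(Q)
    P = (V @ np.diag(np.exp(eigvals * t)) @ np.linalg.inv(V)).real
    p_t = P[0]  # P_n(0) = [1,0,0,0,0,0,0] selects the first row
    return 1.0 - sum(p_t[STATES.index(s)] for s in TERMINAL)


def limiting_probabilities(Q):
    """Solve pi Q = 0 together with sum(pi) = 1 by least squares."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


Q = build_q()
pi = limiting_probabilities(Q)
a_inf = 1.0 - sum(pi[STATES.index(s)] for s in TERMINAL)
curve = [(t, availability(Q, t)) for t in (0.0, 0.1, 0.2, 0.4, 0.6, 1.0)]
```

The list curve plays the role of Table 1: it starts at A(0) = 1 and settles toward a_inf as t grows.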
Stage 9: Determine the survivability and MTTF_{s} of the system:
To determine the system survivability and MTTF_{s}, according
to the algorithm we have a reduced matrix as follows:
And
Now we can calculate the system survivability and MTTF_{s} as
follows:
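The reduced matrix and the resulting expressions are not reproduced in this text. As a hedged illustration, the Python sketch below rebuilds a reduced matrix from the stated model by keeping only the non-terminal states 0, A and B, with λ = 2 and μ = 10 per hour; it is a reconstruction, not a copy of the paper's matrix. It then evaluates R_{s}(t) by the eigenvalue method and obtains MTTF_{s} from the standard identity that the integral of e^{Q_B·t} over (0, ∞) equals -Q_B^{-1}. NumPy is an assumed tool.

```python
import numpy as np

LAM, MU = 2.0, 10.0  # per hour

# Reduced matrix over the non-terminal states 0, A, B. The diagonal keeps
# the full exit rates, including the rates into the deleted terminal states.
Q_B = np.array([
    [-3 * LAM, LAM, LAM],
    [MU, -(MU + 2 * LAM), 0.0],
    [MU, 0.0, -(MU + 2 * LAM)],
])
p0 = np.array([1.0, 0.0, 0.0])  # the system starts with no failures


def survivability(t):
    """R_s(t): probability of still being in a non-terminal state at t."""
    eigvals, V = np.linalg.eig(Q_B)
    P = (V @ np.diag(np.exp(eigvals * t)) @ np.linalg.inv(V)).real
    return float(p0 @ P @ np.ones(3))


# MTTF_s = integral of R_s(t) dt = -p0 · Q_B^{-1} · 1
mttf = float(-p0 @ np.linalg.solve(Q_B, np.ones(3)))
```

Under these inputs the linear solve gives MTTF_{s} = 9/22 h, about 0.41 h. Because at most one component is failed in the states 0, A and B, this value does not depend on how the repairman chooses between two failed components.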
CONCLUSION
The main purpose of this study was to offer a methodology for analyzing
the transient availability of systems with identical components and one repairman.
The concepts of fault trees, Markov models, eigenvectors and eigenvalues
were employed to develop the methodology for the transient reliability
of such systems. Because the proposed methodology applies fault trees,
it can be used for large systems where high reliability is required and where
the design incorporates many layers of protection, such as nuclear
reactor systems. The proposed methodology is also effective
in the sense that it can be applied to a wide variety
of systems. The method is a sophisticated form of reliability assessment
and requires considerable time and effort by skilled analysts. The use of
fault trees in this method is of major value in directing the analyst
to ferret out failures deductively, pointing out the aspects of the system
that are important with respect to the failure of interest, providing a graphical
aid giving visibility to those in system management who are removed from
system design changes, allowing the analyst to concentrate on one particular
system failure at a time and providing the analyst with genuine insight
into system behavior. Although the fault tree is the best tool available for
a comprehensive analysis, it is not foolproof and, in particular, it does
not of itself assure detection of all failures, especially common cause
failures.