Systematic Approach to Maintainability Analysis at Operational Phase
Fakhruldin Mohd Hashim
Ainul Akmar Mokhtar
Maintainability analysis provides a quantifiable assessment
of the performance and effectiveness of the maintenance and support system so
that further improvement actions can be taken. This study presents a systematic
and practical approach for conducting maintainability analysis at the operational
phase. The proposed approach is demonstrated via a case study of a gas compression
train system. The results indicate that the approach is effective in identifying
key contributors to system downtime and in estimating maintainability measures
for future maintenance system improvement and planning.
Received: September 18, 2012;
Accepted: October 22, 2012;
Published: January 10, 2013
Maintainability analysis of plant maintenance field data is not as widely applied
among industrial practitioners as reliability analysis. Nevertheless, maintainability
analysis is critical to ongoing efforts to reduce operations and maintenance
costs and should therefore be considered in every phase of a system life cycle
(Blanchard et al., 1995). At the operation and support phase, ongoing maintainability
analysis provides a quantifiable assessment of the performance and effectiveness
of the maintenance and support system, identifies the equipment, systems and
processes that drive high cost and downtime, and supports the evaluation and
prediction of maintainability measures. Operation, maintenance and design personnel
can then use the results to make the maintenance system more effective, plan
logistic support requirements (i.e., workers, tools and materials), carry out
improvement actions to reduce operating costs and achieve current system performance
targets, which keep changing as plant profit margins decrease and operating
costs escalate.
Only a limited number of studies in the literature apply maintainability analysis
at the operation and support phase; examples can be found in Thangamani et al.
(1995), Alvi (1997) and Hajeeh and Chaudhuri (2000). Most of these studies,
however, are not specifically on maintainability per se but are part of larger
studies such as availability and RAM analysis. Consequently, the analyses are
generally not comprehensive and lack detail in the methodology used. Moreover,
many assume a constant maintenance downtime or repair rate in the analysis model,
mainly for ease of calculation. Some analyses, nevertheless, adopt the assumption
only after statistical analysis of the data supports it (Barabady and Kumar,
2008; Elevli et al., 2008). In real-world applications, however, certain maintenance
data do not exhibit such a steady state condition, so any prediction based on
the constant repair rate assumption is likely to produce incorrect results.
A trend in the data can be due to deterioration or improvement in the maintenance
system. An improvement trend in maintainability is generally a direct result
of effective improvement actions carried out in the maintenance and support
system to reduce downtime. In some systems, such as an offshore system, the
improvement trend may become prominent only a few years after the start of system
operation, once the learning curve and the period needed to achieve stable operation
are taken into account. Many studies on system maintainability fail to consider
this data pattern and instead use all the data acquired since the beginning
of system operation to estimate maintainability measures.
This study aims to present a general framework for a practical, systematic and
detailed approach to analyzing the maintainability of a system at the operational
phase. The proposed approach is demonstrated by a case study of a gas compression
train system at an offshore platform. This study also addresses the issue of
maintenance data that exhibit an improvement trend and proposes a simple method
for estimating their maintainability measures more effectively. The scope of
the study is the analysis of Corrective Maintenance (CM) downtime.
MATERIALS AND METHODS
The proposed approach to maintainability analysis of a plant system at the operation
and support phase is illustrated by the generic framework in Fig. 1. In general,
it involves six major steps:
||Setting objectives: The most important factor in a successful
maintainability study is a clear definition of the specific purpose
to be achieved at the end of the analysis (Denson, 2006).
Only by setting unambiguous objectives at the beginning and adhering to them
consistently throughout the analysis can a proper and effective
analysis be accomplished (Ansell and Phillips, 1989).
The objective of the maintainability study strongly influences the approach
and method of modeling and analysis used (Aven and Jensen, 1999)
||Definition of system, failure and downtime: The system under study,
its boundary and operating states, and the failure events and modes
need to be clearly specified to put the subsequent analysis steps in the
right perspective and to minimize uncertainties associated with the data.
A distinct system boundary identifies which components are within the
system and which are excluded from it. The boundary also defines what data
are to be collected. Other system information, such as its description,
applications, operating mode and environmental conditions, also has to be
clearly specified. At this stage, it is also important to state all assumptions
made in the maintainability model and to determine the hierarchical level (system,
subsystem, component, etc.) at which the data will be collected and the analysis
conducted
||Data gathering: The quality and accuracy of a maintainability analysis
are highly correlated with the quality of the data collected. Attributes of
high-quality data include completeness, compliance with data formats
and reliable sources (ISO 14224, 1999). The
primary source of data in this research is in-house plant maintenance
data. Data gathering is usually the most time- and effort-consuming
activity owing to the nature of the data and their sources. Many data are
available in a plant, such as those from maintenance, engineering, vendor
reports, SAP (CMMS), etc. They also exist in various forms, so choosing
the relevant ones and translating them into distribution and failure statistics
can be challenging and normally requires considerable engineering
judgment. To overcome these issues, good cooperation and constant feedback
from plant personnel are essential
||Exploratory analysis: Exploratory data analysis, first introduced
by Tukey (1977), is the process of using statistical
tools and techniques to investigate data sets in order to gain insight into
the data, understand their important characteristics, identify outliers
or errors, disclose underlying structure and extract important factors (NIST/SEMATECH,
2003) and assist in model formulation (Chatfield,
1985). Because of this significance, many researchers propose
the use of exploratory analysis at the beginning of any plant reliability
data analysis process (Ansell et al., 1994;
Blischke and Murthy, 2000; Andrews
and Moss, 2002; O'Connor et al., 2002;
Todinov, 2005; Muhammad and Majid,
2011). Common tools include simple plots such as the histogram,
stem-and-leaf plot, box-and-whisker plot, Pareto chart, scatter diagram and
time series trend. These methods are useful for getting a feel for the data
and for identifying possible errors in the data and key factors affecting
system downtime performance
||Inferential analysis: The purpose of this step is to determine
the statistical model that best represents the data. According to Knezevic
(2009), two commonly used methods for analyzing empirical downtime
data are the parametric and distribution approaches. In the parametric approach,
the main interest is the mean downtime, which is computed by dividing
the sum of all downtime hours by the total number of downtime events. In
the distribution approach, the downtime is expressed in terms of a probability
distribution: the downtime is treated as a random variable because failure
events result in different downtime durations owing to different
failure modes, failed components and skill levels of maintenance personnel
(Ebeling, 1996). The distribution approach therefore
offers more information than the parametric approach (Knezevic,
2009) and is the preferred method for evaluating maintainability measures.
The most commonly used probability distributions for describing maintenance
downtime are the exponential, normal and log-normal (Blanchard
et al., 1995). For downtime data of non-repairable items, the
assumption that the data are Independent and Identically Distributed (IID)
generally holds, so the data can be modeled directly by a statistical distribution.
For repairable data of a single piece of equipment or a single system, however,
the data should first be tested against the IID assumption before they are
fitted to any distribution. The importance of ensuring that the data are IID
before they are used in a prediction model cannot be emphasized enough. The
existence of a trend shows that the data are not in a steady state and thus
cannot justifiably be fitted to a statistical probability distribution. When
the data have a monotonically increasing trend, a more suitable model is the
non-homogeneous Poisson process (NHPP). To estimate the best parameters for
the statistical distribution, a method such as the Maximum Likelihood Estimator
(MLE) method can be employed. Subsequently, an analytical test such as the
one-sample Kolmogorov-Smirnov (KS) test can be used
to determine the best-fit distribution for the data
||Estimation of maintainability measures: Based on the appropriate
distribution selected and its associated parameters, the maintainability
measures can be determined. These include maintainability function, Mean
Downtime (MDT), Mean Time to Repair (MTTR) and percentage restoration time.
The obtained measures are then to be interpreted accordingly to provide
a basis for suitable recommendations for system improvement (e.g., which
equipment is critical, hence, should be focused on by management).
Fig. 1: Proposed generic methodology for maintainability analysis at operational phase
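The parametric approach described in the inferential analysis step, and the maintainability function mentioned in the final step, can be illustrated with a short sketch. It is only an illustration under stated assumptions: the downtime values are hypothetical (not the case-study data), the function names are invented for this example, and the maintainability function is taken here as the empirical fraction of repairs completed within t hours rather than a fitted distribution.

```python
from bisect import bisect_right


def mean_downtime(downtimes):
    """Parametric approach: total downtime hours divided by the number of events."""
    return sum(downtimes) / len(downtimes)


def empirical_maintainability(downtimes):
    """Return M(t), the fraction of observed repairs completed within t hours."""
    ordered = sorted(downtimes)
    n = len(ordered)

    def M(t):
        # bisect_right counts how many observed durations are <= t.
        return bisect_right(ordered, t) / n

    return M


# Hypothetical CM downtime durations in hours (assumption, not the case-study data).
sample = [4.0, 7.5, 12.0, 18.0, 25.0, 31.0, 48.0, 96.0]

mdt = mean_downtime(sample)
M = empirical_maintainability(sample)
print(f"mean downtime = {mdt:.1f} h")
print(f"M(24 h) = {M(24.0):.2f}")
```

The single mean value hides the long right tail of the sample, which is the reason the text gives for preferring the distribution approach.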
CASE STUDY: A GAS COMPRESSION TRAIN SYSTEM
The system under study is a parallel Gas Compression Train (GCT) consisting
of two trains, part of a gas compression system on an offshore installation.
In this system, raw gas from the well undergoes various treatment processes and
is then compressed to a higher pressure by a centrifugal compressor driven by
a gas turbine before it is transferred to onshore facilities via pipeline. The
main objectives of this case study are as follows:
||To demonstrate the application of the proposed approach for
effective maintainability analysis
||To identify critical factors/subsystems affecting the system CM downtime
||To assess the maintainability measures of the system which are useful
for predicting future maintenance system and resources requirements
Plant maintenance data: System downtime for the GCT is caused by
external events (emergency shutdown (ESD), plant shutdown, turnaround and system
standby) and internal events (corrective maintenance and planned preventive
maintenance). All downtime data should be distinctly identified and categorized
so that the appropriate data are captured and used for the analysis.
The maintenance data can be categorized into 10 subsystems as described in
Table 1. The data for the study were collected for the period
from 2002, when the offshore platform was first commissioned, until 2008. Table
2 shows the CM downtime data for both trains which are combined and arranged
Exploratory analysis: The availability of the system since it began
operation is shown in Fig. 2. The plot indicates a deterioration
in system performance in 2003 and 2004, before it rebounds in 2005 and maintains
a good trend from 2006 onwards. To understand what causes this variation in availability,
one needs to look at the rate of occurrence of failures (ROCOF) and the downtime
duration, since the availability of the system is the function of these two
A closer look at the plot of ROCOF and downtime duration per CM event (Fig.
3) shows that the availability performance is highly influenced by the variation
in downtime duration rather than by ROCOF, because ROCOF varies little over
the period.
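The relationship used in this argument, that unavailability is the product of the number of events and the downtime per event, can be sketched as follows. The yearly figures are hypothetical, the function name is invented for the illustration, and only CM downtime is counted, whereas the availability in Fig. 2 may also reflect other downtime categories.

```python
HOURS_PER_YEAR = 8760.0


def yearly_indicators(n_events, downtime_hours, period_hours=HOURS_PER_YEAR):
    """Return (ROCOF per year, mean downtime per event, availability)."""
    rocof = n_events / (period_hours / HOURS_PER_YEAR)
    downtime_per_event = downtime_hours / n_events
    # Unavailability = (number of events x downtime per event) / period length.
    availability = 1.0 - (n_events * downtime_per_event) / period_hours
    return rocof, downtime_per_event, availability


# Hypothetical records: (year, CM events, total CM downtime hours).
# The event counts are nearly constant while the downtime hours fall.
records = [
    (2003, 20, 1800.0),
    (2004, 22, 2100.0),
    (2005, 19, 900.0),
    (2006, 21, 500.0),
]

for year, n_events, hours in records:
    rocof, per_event, avail = yearly_indicators(n_events, hours)
    print(f"{year}: ROCOF = {rocof:.0f}/yr, "
          f"downtime/event = {per_event:.1f} h, availability = {avail:.3f}")
```

With a nearly constant event count, the availability in this sketch moves almost entirely with the downtime per event, which mirrors the pattern described for Fig. 3.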
Table 1: Subsystems of gas compression train system
Table 2: Downtime data of gas compression train system
Fig. 2: GCT annual availability
Fig. 3: ROCOF and downtime per CM event trend
The improvement in the system availability since 2005 is hence mainly due to
a significant reduction in downtime duration. Many factors contribute
to this trend, but the most influential is the set of improvement actions
carried out by the engineering, maintenance and production teams in the plant.
Based on the inputs from engineering personnel, those important initiatives
Spare parts and technician logistics enhancement: Many critical spares
that were previously stored at onshore warehouses and supplier bases or with
OEM vendors overseas have been placed on site. Turbo-machinery technicians are
also stationed at the platform to advise material personnel on spare part requirements.
A "pit crew" concept, which focuses on team effort, early planning
and streamlined work during shutdown, was also implemented (Hasnan
et al., 2004).
Engine and compressor change-out policy: The turbine engine and gas
compressor failures that caused high downtime during the 2003-2005 period are
suspected to have been caused by over-utilization of the equipment. A prudent
approach has been taken to ensure that equipment change-out is carried out
effectively, in accordance with standard industrial practice.
Supplier contract procedure improvement: A Long Term Service Agreement
(LTSA) with major OEM suppliers was implemented to replace the old bidding process,
which resulted in improved maintenance services and part delivery by the suppliers.
Pareto analysis: Based on a Pareto analysis of total CM downtime during
the studied period for all subsystems, as shown in Fig. 4, the
major downtime contributors are the gas compressor (63.0%), gas turbine (27.7%),
starter system (5.0%) and lube oil system (3.0%). Further investigation
across the seven years of operation shows that high gas compressor downtime
occurred in 2003 and 2004 but has since fallen drastically, indicating that
the improvement activities carried out by the team paid off. Nevertheless, downtime
due to the lube oil system shows an increasing trend towards the end of the
observation period. This is one of the areas that management needs to investigate
and focus on.
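A Pareto ranking of this kind can be sketched in a few lines. The downtime hours below are hypothetical: they are chosen only so that the shares match the percentages quoted above, with the remaining 1.3% lumped into an assumed "other subsystems" entry. The actual hours are in Table 2.

```python
def pareto(downtime_by_subsystem):
    """Return (name, hours, share %, cumulative %) rows, largest contributor first."""
    total = sum(downtime_by_subsystem.values())
    rows = []
    cumulative = 0.0
    ranked = sorted(downtime_by_subsystem.items(), key=lambda kv: kv[1], reverse=True)
    for name, hours in ranked:
        share = 100.0 * hours / total
        cumulative += share
        rows.append((name, hours, share, cumulative))
    return rows


# Hypothetical CM downtime hours per subsystem (assumption, scaled to a total of 1000 h).
hypothetical_hours = {
    "gas turbine": 277.0,
    "lube oil system": 30.0,
    "gas compressor": 630.0,
    "starter system": 50.0,
    "other subsystems": 13.0,
}

rows = pareto(hypothetical_hours)
for name, hours, share, cumulative in rows:
    print(f"{name:<18} {hours:7.1f} h  {share:5.1f}%  cumulative {cumulative:5.1f}%")
```

Running the same ranking per year, rather than over the whole period, is one way to expose a late-rising contributor such as the lube oil system.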
Trend analysis: The plot of the cumulative number of downtime events against
cumulative downtime hours shows an obvious improvement trend since 2006,
as indicated by its concave-up shape. The calculated Laplace test value,
U = 6.04, is larger than the critical value of 1.95 at the 95% confidence
level, which also confirms that the downtime is on an improving trend. The serial
correlation test, however, indicates that the data are independent, since the
plotted points are randomly scattered.
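The text does not state which form of the Laplace test was used, so the sketch below is an assumption: it applies the event-truncated form of the statistic on the cumulative downtime hours axis, so that downtimes that shorten over time give a positive U, consistent with the interpretation above. A lag-1 correlation coefficient is added as a numerical companion to the serial correlation plot. The durations are hypothetical and the function names are invented for the illustration.

```python
from itertools import accumulate
from math import sqrt


def laplace_statistic(durations):
    """Laplace trend statistic on the cumulative-duration axis (event-truncated form)."""
    arrivals = list(accumulate(durations))
    total = arrivals[-1]
    m = len(arrivals) - 1  # the last event marks the end of the observation window
    mean_arrival = sum(arrivals[:m]) / m
    return (mean_arrival - total / 2.0) / (total * sqrt(1.0 / (12.0 * m)))


def lag1_correlation(durations):
    """Sample correlation between each duration and the one before it."""
    n = len(durations)
    mean = sum(durations) / n
    numerator = sum(
        (durations[i] - mean) * (durations[i - 1] - mean) for i in range(1, n)
    )
    denominator = sum((d - mean) ** 2 for d in durations)
    return numerator / denominator


# Hypothetical downtime durations in hours, in chronological order (assumption).
shrinking = [120.0, 100.0, 90.0, 70.0, 60.0, 40.0, 30.0, 20.0, 15.0, 10.0]

u = laplace_statistic(shrinking)
r1 = lag1_correlation(shrinking)
print(f"Laplace U = {u:.2f}, lag-1 correlation = {r1:.2f}")
```

For this steadily shrinking sample the statistic is positive, and reversing the order of the durations flips its sign. A U close to zero is what a steady state data set would be expected to give.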
Fig. 4: Pareto of downtime by subsystems
Fig. 5: Steady state region in the data plot
Table 3: KS goodness-of-fit test
Significance <0.005 indicates not a good fit
Table 4: GCT CM maintainability measures
The trend test indicates that there is a trend; the existing data are therefore
not in a steady state (IID) and are not appropriate for the next analysis step,
whether with the distribution or the parametric approach. The trend, however,
is not monotonic. A closer look at the cumulative plot shows that in the last
four years of operation the data seem to level off (Fig. 5).
This steady state region can be demonstrated by fitting a simple linear
regression line to those data using the least-squares method. The resulting
line has a large coefficient of determination, R² = 0.903, which
indicates that the regression line fits the data well.
To test whether the relationship is significant, an F test can be used
(Anderson et al., 2002), with
the null hypothesis that there is no significant relationship between the two variables.
The F test gives an F value of 300, which is greater than the
critical value of 7.5 for a Type I error of α = 0.01; the null hypothesis
can thus be rejected. Given this statistically significant relationship,
the data from the most recent four years of operation can be assumed with confidence
to represent the actual current downtime
performance and can be used as the basis for evaluating maintainability or downtime
measures. The constant downtime rate predicted from the slope of the regression
line is 24.2 h per downtime.
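The least-squares line, R² and F statistic described here can be computed as in the sketch below. Two assumptions are made: the data are hypothetical, and cumulative downtime hours are regressed on the cumulative event count so that the slope reads directly in hours per downtime event (the axis orientation of Fig. 5 may be the reverse).

```python
from itertools import accumulate


def linear_fit(x, y):
    """Least-squares line y = a + b*x; returns (slope, intercept, R^2, F statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - sse / sst
    # F = (regression sum of squares / 1) / (residual sum of squares / (n - 2)).
    f_statistic = (sst - sse) / (sse / (n - 2))
    return slope, intercept, r_squared, f_statistic


# Hypothetical downtimes (hours) from a levelled-off period (assumption).
durations = [20.0, 30.0, 18.0, 26.0, 22.0, 28.0, 24.0, 25.0]
event_count = list(range(1, len(durations) + 1))
cumulative_hours = list(accumulate(durations))

slope, intercept, r_squared, f_statistic = linear_fit(event_count, cumulative_hours)
print(f"slope = {slope:.1f} h per event, R^2 = {r_squared:.3f}, F = {f_statistic:.0f}")
```

The slope for this sample is about 24 h per event. The F value is then compared with the tabulated critical value for 1 and n - 2 degrees of freedom at the chosen significance level; because cumulative observations are not independent of each other, the test is best read as a descriptive check.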
Inferential analysis: Three commonly used statistical probability distributions
(exponential, normal and log-normal) are chosen to model the downtime data of
the steady state region. Table 3 shows the
distribution parameters estimated
using MLE and the KS test values. The MLE and KS test calculations are done
using the statistical software Weibull++7 and SPSS. Based on the results, the
log-normal distribution is found to be the best-fit distribution.
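The study performs these calculations in Weibull++7 and SPSS. As a rough, software-independent illustration, the sketch below fits the three candidate distributions by their closed-form maximum likelihood estimates and computes the one-sample KS statistic for each. The downtime sample and the function names are hypothetical. Only the KS statistic D is computed, not the significance value reported in Table 3, and critical values for D need adjusting when the parameters are estimated from the same data.

```python
from math import exp, log, sqrt
from statistics import NormalDist, fmean


def fit_candidates(data):
    """Return {name: cdf} for MLE-fitted exponential, normal and log-normal models."""
    n = len(data)
    mean = fmean(data)
    sd = sqrt(sum((x - mean) ** 2 for x in data) / n)  # MLE divides by n, not n - 1
    logs = [log(x) for x in data]
    mu = fmean(logs)
    sigma = sqrt(sum((v - mu) ** 2 for v in logs) / n)
    normal = NormalDist(mean, sd)
    log_normal = NormalDist(mu, sigma)
    return {
        "exponential": lambda t: 1.0 - exp(-t / mean) if t > 0 else 0.0,
        "normal": normal.cdf,
        "log-normal": lambda t: log_normal.cdf(log(t)) if t > 0 else 0.0,
    }


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF and the fitted CDF."""
    ordered = sorted(data)
    n = len(ordered)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(ordered)
    )


# Hypothetical CM downtimes in hours (assumption, not the case-study data).
sample = [3.0, 5.0, 6.5, 8.0, 10.0, 12.0, 15.0, 19.0, 24.0, 30.0, 41.0, 58.0, 95.0]

results = {name: ks_statistic(sample, cdf) for name, cdf in fit_candidates(sample).items()}
for name, d in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:<12} D = {d:.3f}")
```

The candidate with the smallest D is the closest fit in the KS sense.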
Maintainability measures analysis: Table 4 lists the
maintainability measures of the GCT based on the steady state region and the
log-normal distribution. Besides the mean downtime, the downtime within which
10, 50 and 90% of maintenance tasks are expected to be completed can also
be determined. This information helps management in maintenance
system planning and in determining costing, maintenance scheduling, technical
and non-technical manpower planning and availability projection. The mean downtime
is 28.7 h and the estimated downtime at the 90% maintenance task completion rate
is 59.3 h. In contrast, fitting all seven years of operational data (for which
the log-normal is also the best-fit distribution) gives 89.1 h for the mean and
150.6 h for the 90% completion rate, a rather pessimistic result. For comparison,
a set of available downtime data for 2009 is examined;
based on the log-normal distribution (the best-fit distribution), the mean downtime
is 6.4 h with a standard deviation of 12.75 h. This estimate is closer
to the steady state region result than to the all-data result, which indicates
that the proposed method is a practical way to establish a proper downtime
distribution. The estimation
using the NHPP model, on the other hand, results in a higher mean downtime of 120
h, due to poor data fitting.
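For a log-normal downtime model with log-scale parameters mu and sigma, the mean is exp(mu + sigma^2/2) and the time by which a fraction p of maintenance tasks is completed is exp(mu + z_p*sigma), where z_p is the standard normal quantile. The sketch below evaluates these expressions. The parameter values are assumptions: they were back-calculated for this illustration so that the mean and 90% values land close to the 28.7 h and 59.3 h reported above, and they are not the fitted parameters of Table 3.

```python
from math import exp, sqrt
from statistics import NormalDist


def lognormal_measures(mu, sigma, probabilities=(0.10, 0.50, 0.90)):
    """Return (mean, standard deviation, {p: time by which fraction p is completed})."""
    mean = exp(mu + sigma ** 2 / 2.0)
    std = mean * sqrt(exp(sigma ** 2) - 1.0)
    standard_normal = NormalDist()
    percentiles = {
        p: exp(mu + sigma * standard_normal.inv_cdf(p)) for p in probabilities
    }
    return mean, std, percentiles


# Assumed log-scale parameters (back-calculated for illustration, not from Table 3).
MU, SIGMA = 3.0, 0.845

mean, std, percentiles = lognormal_measures(MU, SIGMA)
print(f"mean downtime = {mean:.1f} h, standard deviation = {std:.1f} h")
for p, hours in percentiles.items():
    print(f"{p:.0%} of tasks completed within {hours:.1f} h")
```

Because the log-normal is right-skewed, the mean sits above the 50% completion time, which is why percentile values are more informative for planning than the mean alone.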
This study has demonstrated, through a case study, a systematic approach to maintainability
analysis at the operational phase. The analysis is found to be highly critical
because it provides insight into the current performance of the system; it should
therefore be performed comprehensively by plant management on a regular basis.
The proposed approach is found to be effective in highlighting various factors
affecting system downtime that require further feedback and improvement actions.
In the case study presented, the availability of the system is highly influenced
by the downtime performance. The major contributors to downtime are gas compressor
and turbine failures. While both have shown significant reductions due
to effective improvement actions, downtime due to the lube oil system has shown
an increasing trend. The proposed steady state condition approach is shown to
be a practical way to estimate the maintainability measures of a system having
a non-monotonic downtime improvement trend.
The authors wish to thank Universiti Teknologi PETRONAS for providing the necessary
support for this research.
1: Alvi, J.S., 1997. Availability analysis of integrated gasification combined cycle power plants with backup fuel and NOMx reduction. Reliabil. Eng. Syst. Safety, 55: 85-94.
2: Anderson, D.R., D.J. Sweeney and T.A. Williams, 2002. Statistics for Business and Economics. 8th Edn., Thomson South-Western, Australia
3: Andrews, J.D. and T.R. Moss, 2002. Reliability and Risk Assessment. 2nd Edn., Professional Engineering Publishing, New York, ISBN-13: 9781860582905, Pages: 540
4: Ansell, J.I. and M.J. Phillips, 1989. Practical problems in the statistical analysis of reliability data. Applied Stat., 38: 205-247.
5: Ansell, J.I., M.J. Phillips and J.I. Ansell, 1994. Practical Methods for Reliability Data Analysis. Oxford University Press, Oxford, UK., ISBN-13: 9780198536642, Pages: 240
6: Aven, T. and U. Jensen, 1999. Stochastic Models in Reliability. Springer-Verlag, New York
7: Barabady, J. and U. Kumar, 2008. Reliability analysis of mining equipment: A case study of a crushing plant at Jajarm Bauxite Mine in Iran. Reliabil. Eng. Syst. Safety, 93: 647-653.
8: Blanchard, B.S., D.C. Verma and E.L. Peterson, 1995. Maintainability: A Key to Effective Serviceability and Maintenance Management. John Wiley and Sons, New York, ISBN-13: 9780471591320, Pages: 560
9: Blischke, W.R. and D.N.P. Murthy, 2000. Reliability: Modeling, Prediction and Optimization. John Wiley and Sons, New York
10: Chatfield, C., 1985. Exploratory data analysis. Eur. J. Oper. Res., 23: 5-13.
11: Denson, W., 2006. Reliability Modeling: The RIAC Guide to Reliability Prediction, Assessment and Estimation. RIAC, New York, ISBN-13: 9781933904177, Pages: 412
12: Ebeling, C.E., 1996. An Introduction to Reliability and Maintainability Engineering. McGraw Hill, New York
13: Hajeeh, M. and D. Chaudhuri, 2000. Reliability and availability assessment of reverse osmosis. Desalination, 130: 185-192.
14: Hasnan, N., S. Nazim and Y. Yusof, 2004. Challenges to improve turbomachinery availability. Proceedings of the Conference on SPE Asia Pacific Oil and Gas, October 18-20, 2004, Perth, Australia
15: ISO 14224, 1999. Petroleum and Natural Gas Industries: Collection and Exchange of Reliability and Maintenance Data for Equipment. 1st Edn., International Organization for Standardization, Geneva, Switzerland, Pages: 71
16: Knezevic, J., 2009. Maintainability and System Effectiveness. In: Handbook of Maintenance Management and Engineering, Ben-Daya, M., S.O. Duffuaa, A. Raouf, J. Knezevic and D. Ait-Kadi (Eds.). Springer, New York
17: NIST/SEMATECH, 2003. e-Handbook of statistical methods. http://www.itl.nist.gov/div898/handbook/.
18: Muhammad, M. and M.A.A. Majid, 2011. Reliability and availability evaluation for a multi-state system subject to minimal repair. J. Applied Sci., 11: 2036-2041.
19: O'Connor, P.D.T., D. Newton and R. Bromley, 2002. Practical Reliability Engineering. 4th Edn., John Wiley and Sons, New York
20: Elevli, S., N. Uzgoren and M. Taksuk, 2008. Maintainability analysis of mechanical systems of electric cable shovels. J. Sci. Ind. Res., 67: 267-271.
21: Thangamani, G., T.T. Narendran and R. Subramanian, 1995. Assessment of availability of a Fluid Catalytic Cracking Unit through simulation. Reliabil. Eng. Syst. Safety, 47: 207-220.
22: Todinov, M., 2005. Reliability and Risk Models: Setting Reliability Requirements. John Wiley and Sons, New York, ISBN-13: 9780470094884, Pages: 340
23: Tukey, J.W., 1977. Exploratory Data Analysis. Addison-Wesley, Reading, MA