INTRODUCTION
Survival analysis is a very popular tool in analyzing failure data such as
death of biological organisms or failure of mechanical systems. It plays an
important role in medicine, epidemiology, biology and demography. In engineering,
it is often referred to as reliability analysis. A standard approach in survival
analysis is to assume that the population is homogeneous, that is, every individual
has the same risk of death or failure. For this purpose, the Weibull distribution
has been used extensively to analyze survival data. As is well known, the Weibull
distribution generalizes the exponential distribution since it can incorporate
increasing, decreasing and constant hazard rates (Lee and
Wang, 2003) whereas, the hazard rate of the exponential distribution is
constant. Some recent examples of the application of the Weibull distribution
to survival analysis can be found by Mudholkar et al.
(1996) and Saat and Jemain (2009). Many authors
have proposed new distributions based on the traditional Weibull distribution
(Pham and Lai, 2007).
The assumption of a homogeneous population in most literature on survival analysis
is not always realistic (Vaupel et al., 1979;
Hougaard, 1984, 2000). In general,
the population consists of individuals that differ in their susceptibility to
causes of death or failure, responses to treatments and influences of various
risk factors. Furthermore, it is not always possible to include all risk factors
or covariates into the model. In addition, there are always unknown and unobservable
risk factors which can cause further heterogeneity within the population. To
overcome these problems, a random variable incorporating these unknown or unobservable
risk factors is included in the models for analysis of survival data. This random
variable, known as frailty, was introduced by Vaupel et
al. (1979) and acts multiplicatively on the hazard function. The resulting
modified hazard model, where the frailty variable is multiplied by the baseline
hazard function, is called a frailty model (Hougaard, 1984;
Aalen, 1988).
The underlying distribution most often used in association with the frailty
model is the gamma distribution due to its mathematical tractability. Other
choices are the inverse Gaussian (Hougaard, 1986; Keiding
et al., 1997; Kheiri et al., 2007)
and the positive stable distribution (Qiou et al.,
1999). An extended family of frailty distribution called the Power Variance
Function (PVF), which includes the gamma, inverse Gaussian and positive stable
distribution as special cases, was introduced by Hougaard
(1986). Subsequently, the PVF distribution was reparameterized by Aalen
(1992) and applied under various contexts by Hanagal
(2008, 2009). The distribution provided additional
flexibility when applying frailty models to real data.
In this study, we introduce the mixed Weibull hazard function with the frailty
term and PVF distribution to accommodate the problem of a heterogeneous population.
Since the proposed model includes four unknown parameters, we discuss the use
of Maximum Likelihood Estimation (MLE) for estimating these parameters. Finally,
the proposed model is applied to a real data set involving sufferers of AIDS
(Acquired Immunodeficiency Syndrome) to evaluate its usefulness.
A MIXTURE WEIBULL HAZARD RATE WITH PVF FRAILTY
Let T be an arbitrary continuous nonnegative random variable and Z be an nonnegative
random variable, called the frailty or mixing random variable. The frailty model
is the product of a random effect Z and baseline hazard rate. The conditional
hazard rate and survival function, respectively are:
and
where, h0(t) is called baseline hazard function and
is the cumulative baseline hazard function (Vaupel et
al., 1979).
If Z be a frailty random variable with probability density function (pdf) f(z)
then the marginal survival function or mixture survival function can be derived
by using the Laplace transform of Z (Hougaard, 2000). The
marginal survival function or mixture survival function, denoted SM(t)
is given by:
SM(t) = [e-ZH0(t)] = LZ[H0(t)] |
Where:
is the Laplace transform of Z. The mixture hazard rate, denoted hM(t)
is given by:
Weibull distribution: Let T be a random variable distributed as Weibull
distribution with parameters η and γ then the pdf denoted by Weibull
(η, γ) is given by:
The survival function of the Weibull distribution is:
The corresponding hazard function and cumulative hazard function, respectively
are:
and
PVF frailty distribution: Let Z be non-negative frailty random variable
from the PVF distribution. The pdf of Z is given by:
where, μ>0, δ>0 and 0<v<1 (Aalen,
1992; Duchateau and Janssen, 2008). The corresponding
Laplace transform is:
where, E(Z) = μ and Var(Z) = δμ2 are mean and variance,
respectively. In order to avoid an unidentifiable scale factor, in the frailty
model, it is common practice to take the mean of the frailty distribution as
equal to 1. If the parameter μ = 1 then E(Z) = 1 and Var(Z) = δ. The
Laplace transform under the mean equal to 1 is:
with δ = 0 and 0<v<1.
Under baseline cumulative hazard function in Eq. 4 and the
Laplace transform in Eq. 5, we can obtain the marginal survival
of the mixture Weibull with the PVF distribution:
and the corresponding mixture hazard function is:
where, η>0, γ>0, δ>0 and 0<v<1.
Theorem 1: Let T be a random variable of the mixture Weibull hazard
rate with PVF frailty distribution, denoted by T~Weibull-PVF (η, γ,
δ and v). The pdf of T is given by:
where, η>0, γ>0, δ>0 and 0<v<1.
Proof: Since:
We substitute hM(t) in Eq. 8 with 7 and SM(t)
in Eq. 8 with 6, we obtain:
with η>0, γ>0, δ>0 and 0<v<1.
The pdf plot of some values of the parameters is shown in Fig.
1 and its survival function and hazard function are presented, respectively
in Fig. 2 and 3.
The Weibull-PVF distribution includes some mixture hazard function, i.e.:
• |
If v = 1/2 then the Weibull-PVF can be reduced to the mixture
Weibull hazard rate with the inverse Gaussian frailty distribution (Weibull-Gamma)
which its hazard function given by: |
and the survival function associated with Eq. 9 is:
This is related to the inverse Gaussian frailty distribution (Duchateau
and Janssen, 2008).
• |
If v = 1 then the Weibull-PVF can be reduced to the mixture
Weibull hazard rate with the gamma frailty distribution (Weibull-IG) which
its hazard function given by: |
and the survival function associated with Eq. 10 is:
|
Fig. 1(a-f): |
The pdf plots of Weibull-PVF (η, γ, δ, v)
(a) 1, 1, 1, 0.5, (b) 0.5, 1, 1, 0.5, (c) 0.5, 3, 0.5, 0.2, (d) 2, 2, 3,
0.7 (e) 1, 2, 1, 0.9 and (f) 0.5, 2, 0.5, 0.5, f(t): Probability density
function |
|
Fig. 2(a-f): |
Some plots of Weibull-PVF (η, γ, δ, v) (a)
1, 1, 1, 0.5, (b) 0.5, 1, 1, 0.5, (c) 0.5, 3, 0.5, 0.2, (d) 2, 2, 3, 0.7,
(e) 1, 2, 1, 0.9 and (f) 0.5, 2, 0.5, 0.5, S(t): Survival function |
|
Fig. 3(a-f): |
The hazard shapes under Weibull-PVF (η, γ, δ,
v) (a) 1, 1, 1, 0.5, (b) 0.5, 1, 1, 0.5, (c) 0.5, 3, 0.5, 0.2, (d) 2, 2,
3, 0.7, (e) 1, 2, 1, 0.9 and (f) 0.5, 2, 0.5, 0.5, h(t): Hazard function |
It is the Weibull-Gamma frailty model (Molenberghs and
Verbeke, 2011).
Moreover, the Weibull-PVF distribution is also related to well known distributions,
e.g., Weibull and Lomax distribution as shown in the following Corollary 1 and
2.
Corollary 1: If δ→0, δ>0 then the Weibull-PVF (η,
γ, δ, v) distribution reduces to the Weibull (η, γ).
Proof: First, we consider the hazard function of Weibull-PVF (η,
γ, δ, v) if δ→0, δ>0, then:
Which is equivalent to Eq. 3. The survival function of Weibull-PVF
(η, γ, δ, v) when δ→0, δ>0 can be derived as
follows:
Since the limit leads to 0/0, we then apply the LHospitals rule:
Hence:
Consequently, we will obtain the pdf of Weibull-PVF (η, γ, δ,
v) when δ→0 based on Eq. 11 and 12
is:
fM(t) = hM(t).SM(t)
= ηγtγ-1 exp(-ηtγ) |
Corollary 2: If v→1, 0<v<1 and γ is equal to 1 then
the Weibull-PVF (η, γ, δ, v) distribution reduces to the Lomax
distribution (Lomax, 1954), also called the Pareto type
II distribution with parameters
which pdf given by:
Proof: By substituting v = 1 and γ = 1 in Eq. 7,
we obtain:
and the survival function in Eq. 6, if v→1 and γ
= 1, then:
Since the limit leads to 0/0, we then apply the LHospitals rule:
Therefore:
We then obtain the pdf of the Lomax distribution as:
ESTIMATION OF PARAMETERS
The MLE is the most popular method of estimation and inference for parametric
models. Recall that, if t1, t2,
, tn are
random samples from a population with a pdf
where
is a vector of parameters, then the likelihood function is defined by:
The likelihood function for the Weibull-PVF distribution where
,
can be written as:
and let L (·) = log L (·) which is in the form:
In real life problem of survival data analysis, we often found with the right
censored data. This paper derives the likelihood function and the log-likelihood
function for the Weibull-PVF distribution under right censored data.
The likelihood function for right censored data is defined as:
where, di is the censoring indicator, taking the value one if the
event is occurred, otherwise di take value zero. The log likelihood
function for the Weibull-PVF distribution with right censored data where
,
can be written as:
We can find the MLE by using a numerical method to optimize the equation:
Since the first derivatives of the log likelihood function are so complicated
and it is hardly to solve this system of equations. We then use the Newton-Raphson
procedure as the alternate iterative algorithm for estimating the parameters
under MLE.
APPLICATION TO AIDS DATA SET
Selvin (2008) presented data from the San Francisco
Mens Health Study (SFMHS) which was conducted in 1983 as a population-based
study of the epidemiology and natural history of the newly emerging AIDS. The
data explored here are a small subset consisting of the 174 individuals who
entered the study during the first year. Survival time is defined as the number
of months from diagnosis of AIDS to death with right censoring. In this study
we omit the covariates.
We fit the AIDS data set under the Weibull, Weibull-Gamma, Weibull-IG and Weibull-PVF.
The results of the estimated parameters by using MLE and the log-likelihood
values are shown in Table 1. This study utilizes the survival
function obtained by using the Kaplan-Meier estimate. The obtained survival
curves based on Weibull, Weibull-Gamma, Weibull-IG and Weibull-PVF are in Fig.
4. Moreover, the histograms of the AIDS data with fitted curves of Weibull,
Weibull-Gamma, Weibull-IG and Weibull-PVF are shown in Fig. 5.
The log-likelihood value of Weibull-PVF is-632.2820, which is better than that
of Weibull, Weibull-Gamma and Weibull-IG.
|
Fig. 4(a-d): |
The estimated survival curves using Kaplan-Meier based on
(a) Weibull, (b) Weibull-Gamma, (c) Weibull-IG and (d) Weibull-PVF model,
S(t): Survival function |
|
Fig. 5(a-d): |
The histograms of the AIDS data with fitted (a) Weibull,
(b) Weibull-Gamma, (c) Weibull-IG and (d) Weibull-PVF model |
Table 1: |
Estimated parameters and log-likelihood values of different
models for the AIDS data set |
 |
That is, the Weibull-PVF model is more effectively fit to this data set than
Weibull, Weibull-Gamma and Weibull-IG which are shown Fig. 4
and 5.
In addition, the log-likelihood values which are shown that the mixture Weibull
model provides a better fit compare to the standard Weibull model for this data
set when omits covariates or risk factors. However, the Weibull-PVF is more
flexible than Weibull Gamma and Weibull-IG since it includes the both models
as a special cases.
CONCLUSION
In this study, we introduced the four-parameters Weibull-PVF distribution.
This model included Weibull-Gamma and Weibull-IG as sub-models and related to
Weibull and Lomax distribution. Figure 1-3
showed the pdf plots, survival curves and hazard shape of Weibull-PVF for some
values of parameter, respectively. The MLE was used to estimate model parameters.
It was also included the likelihood function for right censored data, which
often occurs in survival analysis. Furthermore, the AIDS data with right censoring
from the SFMHS is applied to model fitting. The data was fitted to the Weibull,
Weibull-Gamma, Weibull-IG and Weibull-PVF models, we found that the Weibull-PVF
model gave the best fit. Thus, we can say that the Weibull-PVF distribution
is an alternative model for heterogeneous populations in survival analysis.
ACKNOWLEDGMENTS
We are grateful to the Commission on Higher Education, Thailand, for financial
support through a grant fund under the Strategic Scholarships Fellowships Frontier
Research Networks.