Research Article
An Analytical Approach on Non-parametric Estimation of Cure Rate Based on Uncensored Data
A. Sen
M.S. Noor
M.N. Islam
Z.I. Chowdhury
When analyzing survival data from clinical trials, it is sometimes clear that a non-zero proportion of patients can be considered cured. Cantor and Shuster (1992) give a constructive discussion of parametric and non-parametric methods for estimating the cure rate from censored survival data. They use the Kaplan-Meier (1958) method for non-parametric estimation of the cure proportion. For parametric estimation of cure rates, on the other hand, they assume a survival function S(t) for which S(∞) > 0, i.e., in a proportion of patients the event never occurs; using maximum likelihood estimation (MLE), one can then estimate S(∞), which is taken as the cure rate. Since in many cases the assumed distributional form clearly does not fit the survival data well, the use of non-parametric methods is attractive. However, the Kaplan-Meier method is not free of problems and limitations. Miller (1983) shows that the asymptotic efficiency of the Kaplan-Meier estimator tends to be low relative to the MLE for a given distribution.
Tsodikov (2001) studied the estimation of the survival function based on a proportional hazards model with cure. He proposed an algorithm to fit the proportional hazards model restricted by fixed survival rates at the end of the observation period, and used a parametric cure model to estimate the proportion of long-term survivors. To exploit the stability of the parametric method, the survival function is estimated non-parametrically, conditional on the cure rates provided by the parametric analysis. Peng and Carriere (2002) proposed parametric and semi-parametric cure models, in the form of mixture models, for cure rate estimation. In their study, several parametric and semi-parametric models are compared and their estimation methods are discussed within the framework of the EM algorithm. They showed that semi-parametric cure models can achieve efficiency levels similar to those of parametric cure models, provided that the failure time distribution is well specified and non-cured patients have an increasing hazard rate. They also recommended the semi-parametric model as a viable alternative to parametric cure models. The main objective of the present analysis is to develop a non-parametric estimating equation for the cure parameter of the model.
Cure model: Cure models were first proposed 50 years ago (Boag, 1949; Berkson et al., 1952; Haybittle, 1959) and have since received regular attention in the statistical literature (Haybittle, 1965; Mould et al., 1975; O'Neill, 1979; Farewell, 1982; Goldman, 1984; Sposto and Sather, 1985; Farewell, 1986; Halpern, 1987; Goldman, 1991; Sposto et al., 1992; Sposto, 2002; Kuk et al., 1992; Maller et al., 1992; Cantor et al., 1992; Ghitany et al., 1994; Yakovlev, 1994; Cantor et al., 1994; Ghitany et al., 1995; Lee et al., 1995; Zhou et al., 1995; Laska et al., 1992; Tsodikov et al., 1998; Peng et al., 1998; Gieser et al., 1998; Tsodikov, 1998; Hauck et al., 1997; De Angelis et al., 1999; Sy and Taylor, 2000). They have not attained wide use or acceptance in the medical research literature, perhaps in part because of their reliance on particular parametric forms. Nevertheless, parametric cure models provide a good empirical description of outcome in paediatric cancer data (Sposto and Sather, 1985; Sposto et al., 1992). Most importantly, they provide a single analytic method within which the effect of treatments and prognostic factors on the proportion cured can be assessed separately from their effect on the time to failure.
Frailty model: Chen et al. (1999) have proposed a different type of cure rate model, which is considered as a frailty model. The frailty model can be derived as follows:
Suppose that, for an individual in the population, N denotes the total number of carcinogenic cells (often called clonogens) left active after the initial treatment, and assume that N has a Poisson distribution with mean θ. Let Zi denote the random time for the ith clonogenic cell to produce a detectable cancer mass; that is, Zi can be viewed as an incubation time for the ith clonogenic cell. The variables Zi, i = 1, 2, ..., are assumed to be independent and identically distributed (i.i.d.) with a common distribution function F(t) = 1 - S(t), where S(t) is the survival function, and N is independent of the sequence Z1, Z2, .... The time to relapse of cancer is defined by the random variable T = min{Zi, 0 ≤ i ≤ N}, where P(Z0 = ∞) = 1. The survival function for T, and hence the survival function for the population, is given by
$$S_P(t) = P(T > t) = \exp(-\theta F(t)) \tag{1}$$
Model (1) is not a proper survival function, because SP(∞) = exp(-θ) > 0; note that F(0) = 0 and F(∞) = 1. Model (1) shows explicitly the contribution to the failure time of two distinct characteristics of tumor growth: the initial number of carcinogenic cells and the rate of their progression. Thus the model incorporates parameters bearing clear biological meaning. Model (1) is suitable for any type of failure data that has a surviving fraction (cure rate), and can therefore be useful for modeling various types of failure time data, including time to relapse, time to death, time to first infection and so forth.
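The closed form of model (1) follows from conditioning on N. The derivation below is a sketch under the assumptions already stated (N is Poisson with mean θ; the Zi are i.i.d. with survival function S(t) = 1 - F(t) and independent of N); it is offered as a worked step, not as a quotation from Chen et al. (1999).

```latex
\begin{align*}
S_P(t) &= P(T > t) \\
       &= P(N = 0) + \sum_{k=1}^{\infty} P(Z_1 > t, \ldots, Z_k > t \mid N = k)\, P(N = k) \\
       &= e^{-\theta} + \sum_{k=1}^{\infty} S(t)^{k}\, \frac{\theta^{k} e^{-\theta}}{k!} \\
       &= e^{-\theta} \sum_{k=0}^{\infty} \frac{\bigl(\theta S(t)\bigr)^{k}}{k!} \\
       &= \exp\bigl(-\theta + \theta S(t)\bigr) = \exp\bigl(-\theta F(t)\bigr).
\end{align*}
```

Letting t → ∞ gives F(t) → 1, so the limiting value of the population survival function is exp(-θ).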
We also observe that the cure rate π is given by
$$\pi = S_P(\infty) = \exp(-\theta) \tag{2}$$
As θ → ∞, the cure rate tends to 0, whereas as θ → 0, the cure rate tends to 1; i.e., the cure rate lies between 0 and 1. Taking the first derivative of (1), we get
$$S_P'(t) = -\theta F'(t) \exp(-\theta F(t))$$
where S_P'(t) and F'(t) denote the first derivatives of SP(t) and F(t), respectively, and F'(t) = f(t).
Or,
$$-S_P'(t) = \theta f(t) \exp(-\theta F(t))$$
Since -S_P'(t) = fP(t), the density function corresponding to model (1) is given by
$$f_P(t) = \theta f(t) \exp(-\theta F(t)) \tag{3}$$
We observe that SP(t) is not a proper survival function, because SP(∞) ≠ 0. Therefore fP(t) is not a proper probability density function. But f(t) in model (3) is a proper density function.
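The impropriety of fP(t) can be checked numerically. The sketch below assumes, purely for illustration, a unit-rate exponential baseline F(t) = 1 - exp(-t); the function names and the value θ = 1.5 are hypothetical choices, not taken from the paper. It integrates fP(t) = θ f(t) exp(-θF(t)) over a long interval and compares the total mass with 1 - exp(-θ).

```python
import math


def exp_cdf(t):
    """Assumed baseline distribution function F(t) = 1 - exp(-t)."""
    return 1.0 - math.exp(-t)


def exp_pdf(t):
    """Assumed baseline density f(t) = exp(-t)."""
    return math.exp(-t)


def population_survival(t, theta):
    """S_P(t) = exp(-theta * F(t)), model (1)."""
    return math.exp(-theta * exp_cdf(t))


def population_density(t, theta):
    """f_P(t) = theta * f(t) * exp(-theta * F(t)), the derivative of 1 - S_P(t)."""
    return theta * exp_pdf(t) * math.exp(-theta * exp_cdf(t))


def trapezoid(func, lower, upper, steps):
    """Composite trapezoidal rule on [lower, upper] with equal-width panels."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * width)
    return total * width


theta = 1.5  # illustrative value only
cure_rate = math.exp(-theta)

# Total mass of f_P on [0, 50]; the tail beyond 50 is negligible here.
total_mass = trapezoid(lambda t: population_density(t, theta), 0.0, 50.0, 200_000)

print("cure rate exp(-theta):", cure_rate)
print("mass of f_P          :", total_mass)
print("1 - cure rate        :", 1.0 - cure_rate)
```

Under these assumptions the mass of fP falls short of 1 by exactly the cure rate, which is the sense in which fP is not a proper density.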
We estimate the cure parameter using the Non-Parametric Maximum Likelihood Estimation (NPMLE) method, which is described as follows:
Non-parametric maximum likelihood method: Suppose that X is a random variable with probability density function f(x;θ), where θ is to be estimated, and that x1, x2, ..., xn is a random sample of size n. The joint probability density function of the random variables comprising the sample is called the likelihood function of the sample and is given by
$$L(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta) \tag{4}$$
If there exists a value θ* such that L(x1, x2, ..., xn; θ*) ≥ L(x1, x2, ..., xn; θ) for all possible choices of θ, then θ* maximizes Eq. (4) and is called the maximum likelihood estimate for the given set of sample values x1, x2, ..., xn. Choosing the value of θ that makes the observed data most likely is certainly a reasonable approach. Since the likelihood function is a function of the parameter θ given the sample information (x1, x2, ..., xn), the maximum likelihood estimate will be a function of the sample values. If the same functional relationship between the estimate and the sample information (x1, x2, ..., xn) holds for all possible choices of the xi, then that functional relationship can be taken as an estimation rule, and the result is an estimator of θ, known as the maximum likelihood estimator or the maximum likelihood filter.
In Non-parametric maximum likelihood method, we write the non-parametric likelihood function as
$$L(F) = \prod_{i=1}^{n} \Delta F(x_i) \tag{5}$$
where F(.) is the common distribution function of Xi, 1 ≤ i ≤ n, and ΔF(xi) = F(xi) - F(xi-) is the jump of F(.) at xi, 1 ≤ i ≤ n. Putting ΔF(xi) = pi in (5), we can write
$$L = \prod_{i=1}^{n} p_i \tag{6}$$
Now we maximize (6) subject to the condition
$$\sum_{i=1}^{n} p_i = 1 \tag{7}$$
The log-likelihood function becomes
$$\log L = \sum_{i=1}^{n} \log p_i \tag{8}$$
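We can maximize (8) by the Lagrange multiplier method. Adding a Lagrange multiplier λ for the constraint (7), (8) becomes
$$\ell^{*} = \sum_{i=1}^{n} \log p_i - \lambda \left( \sum_{i=1}^{n} p_i - 1 \right) \tag{9}$$
Thus, the non-parametric maximum likelihood estimator of pi is obtained by the solution of the following equations
$$\frac{\partial \ell^{*}}{\partial p_i} = 0, \quad i = 1, 2, \ldots, n, \qquad \frac{\partial \ell^{*}}{\partial \lambda} = 0 \tag{10}$$
$$\frac{1}{p_i} - \lambda = 0, \quad i = 1, 2, \ldots, n \tag{11}$$
$$\sum_{i=1}^{n} p_i - 1 = 0 \tag{12}$$
From (11), we can write
$$p_i = \frac{1}{\lambda} \tag{13}$$
and, summing over i and using (12),
$$\frac{n}{\lambda} = 1, \quad \text{i.e.,} \quad \lambda = n \tag{14}$$
Therefore, the non-parametric maximum likelihood estimator of pi is
$$\hat{p}_i = \frac{1}{n}, \quad i = 1, 2, \ldots, n$$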
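The Lagrange argument places equal mass 1/n on each observation. As an informal numerical check (a sketch only; the sample size n = 8, the seed, and the helper names are hypothetical), the following compares the log-likelihood Σ log pi at the uniform weights with its value at many randomly drawn probability vectors satisfying the same sum-to-one constraint.

```python
import math
import random


def log_likelihood(weights):
    """Non-parametric log-likelihood: sum of log p_i over the sample points."""
    return sum(math.log(w) for w in weights)


def random_probability_vector(size, rng):
    """Draw positive weights and normalise them so that they sum to one."""
    raw = [rng.random() + 1e-12 for _ in range(size)]
    total = sum(raw)
    return [r / total for r in raw]


n = 8  # illustrative sample size
uniform_weights = [1.0 / n] * n
best = log_likelihood(uniform_weights)

rng = random.Random(0)
competitors = [random_probability_vector(n, rng) for _ in range(1000)]

# Smallest amount by which the uniform weights beat a competitor.
worst_gap = min(best - log_likelihood(p) for p in competitors)

print("log-likelihood at p_i = 1/n:", best)
print("smallest gap to a competitor:", worst_gap)
```

No competitor attains a larger value, in line with the closed-form maximiser; the check is illustrative and does not replace the derivation.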
Suppose that X is the lifetime of a patient. Then P(X = ∞) = exp(-θ), which is considered as the cure rate. On the other hand, P(X < ∞) = 1 - exp(-θ), 0 ≤ t ≤ ∞, which is the probability of not being cured.
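These two probabilities can be illustrated by simulating the clonogen construction directly. The sketch below is a minimal illustration, assuming unit-rate exponential incubation times and θ = 1.5 (both hypothetical choices); `sample_poisson` and `simulate_patient` are illustrative helper names. A patient with no surviving clonogens is recorded with an infinite lifetime and indicator 0.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate by Knuth's multiplication method (small means)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_patient(theta, rng):
    """Return (lifetime, indicator): indicator is 1 if not cured, 0 if cured."""
    clonogens = sample_poisson(theta, rng)
    if clonogens == 0:
        return math.inf, 0  # no clonogens left: the event never occurs
    # Relapse time is the shortest incubation time among the clonogens.
    lifetime = min(rng.expovariate(1.0) for _ in range(clonogens))
    return lifetime, 1


rng = random.Random(2024)
theta = 1.5  # illustrative value only
patients = [simulate_patient(theta, rng) for _ in range(20000)]

cured_fraction = sum(1 for _, eps in patients if eps == 0) / len(patients)

print("simulated cured fraction:", cured_fraction)
print("exp(-theta)             :", math.exp(-theta))
```

The simulated cured fraction should lie close to exp(-θ), up to Monte Carlo error.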
By using Non-Parametric Maximum Likelihood Estimation (NPMLE) method, we can estimate the cure parameter. For uncensored data we consider the following cases:
Case-(a): F0(.), f0(.) and θ are unknown. Here we observe both the cured and the non-cured groups.
Suppose that we have the data in the form (xi, εi), i = 1, 2, ..., n, where xi denotes the survival time of the ith patient and εi is the cure indicator, equal to 1 if the ith patient is not cured and 0 otherwise, i.e., εi = 1{xi < ∞}.
(15) |
Therefore, the non-parametric likelihood function is given by
(16) |
Where, ΔF(xi) = jump of F(.) at xi
(17) |
Where,
Therefore, the above likelihood function (16) can be written as
(18) |
We want to maximize (18) subject to the condition
$$\sum_{i=1}^{n} p_i = 1 \tag{19}$$
The log-likelihood function becomes
(20) |
By using Lagrange multiplier method we can maximize (20). Adding Lagrange multiplier λ, we can write (20) as follows
(21) |
Therefore, the non-parametric maximum likelihood estimators of θ and Pi are obtained by the solution of the following equations
(22) |
Now, setting the partial derivative of (21) with respect to θ equal to zero gives,
(23) |
Similarly, setting the partial derivative of (21) with respect to pi equal to zero gives, for i = 1, 2, ..., n-1,
(24) |
and setting the partial derivative of (21) with respect to pn equal to zero gives,
(25) |
Multiplying (24) by pi and summing over i from 1 to n, we obtain the following equation
(26) |
Since $\sum_{i=1}^{n} p_i = 1$, equation (26) becomes
(27) |
(28) |
Now using the estimate of λ in (24), we get
(29) |
Therefore,
(30) |
Comment: Eq. (30) can be regarded as an estimating equation for pi. It cannot be solved analytically but may be solved numerically, and the solution of this equation is the desired estimate of pi.
Again, using (28) in (25), we may obtain the estimate of pn.
Comment: The resulting equation also cannot be solved analytically but may be solved numerically.
Finally the estimate of θ may be obtained from the numerical solution of Eq. (23).
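An estimating equation in a single unknown such as θ can be solved by a bracketing method. The sketch below uses bisection on a stand-in equation, θ / (1 - exp(-θ)) = c, which has no closed-form solution in θ; it is chosen only as an illustration and is not Eq. (23). The target c = 2.0, the bracket, and the function names are hypothetical.

```python
import math


def estimating_function(theta, target):
    """Stand-in estimating function: theta / (1 - exp(-theta)) - target.

    The first term is the mean of a zero-truncated Poisson distribution;
    it is used here only as an example of an equation without a closed form.
    """
    return theta / (-math.expm1(-theta)) - target


def bisect_root(func, lower, upper, tolerance=1e-12, max_iterations=200):
    """Find a root of func in [lower, upper] by repeated interval halving."""
    f_lower = func(lower)
    f_upper = func(upper)
    if f_lower * f_upper > 0:
        raise ValueError("root is not bracketed by the given interval")
    for _ in range(max_iterations):
        middle = 0.5 * (lower + upper)
        f_middle = func(middle)
        if f_middle == 0.0 or (upper - lower) < tolerance:
            return middle
        if f_lower * f_middle < 0:
            upper = middle  # sign change lies in the left half
        else:
            lower, f_lower = middle, f_middle  # sign change lies in the right half
    return 0.5 * (lower + upper)


theta_hat = bisect_root(lambda th: estimating_function(th, 2.0), 1e-6, 20.0)

print("numerical solution:", theta_hat)
print("residual          :", estimating_function(theta_hat, 2.0))
```

Bisection is used here because it needs only a sign change over the bracket; a Newton-type iteration would be an alternative when derivatives of the estimating function are available.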
Case-(b): F0(.), f0(.) and θ are unknown and only the non-cured group is observed. Suppose that we have the data in the form (xi, εi), i = 1, 2, ..., n, where xi denotes the survival time of the ith patient and εi is the cure indicator, equal to 1 if the ith patient is not cured and 0 otherwise, i.e., εi = 1{xi < ∞}. The non-parametric likelihood function can be written as
The log-likelihood function is given by
(31) |
We want to maximize (31) subject to the condition $\sum_{i=1}^{n} p_i = 1$. We can maximize (31) by the Lagrange multiplier method. Adding a Lagrange multiplier λ in (31), we can write
(32) |
Therefore, the non-parametric maximum likelihood estimators of θ and pi are obtained by the solution of following equations
(33) |
Now, setting the partial derivative of (32) with respect to θ equal to zero gives
(34) |
and setting the partial derivatives of (32) with respect to pi equal to zero gives, for i = 1, 2, ..., n-1,
(35) |
(36) |
Multiplying (35) by pi and summing over i from 1 to n, we obtain
(37) |
Since $\sum_{i=1}^{n} p_i = 1$, the above equation becomes
(38) |
Multiplying Eq. (34) by θ and subtracting from (38), we obtain
Therefore,
(39) |
Using the estimate of λ in (35), we obtain
Thus,
(40) |
Comment: This is an estimating equation for pi, i = 1, 2, ..., n-1. It cannot be solved analytically but may be solved numerically, and the solution of this equation is the desired estimate of pi.
Again, using (39) in (36), we obtain
(41) |
We observe that the estimate of pn may be obtained from the numerical solution of Eq. (41).
Finally, the estimate of θ can be obtained from the numerical solution of Eq. (34).
Considering both the non-cured and the cured groups, and assuming that f0(.) and F0(.) are unknown, we obtained a non-parametric estimating equation for θ. We could not find an explicit solution for θ, but this estimating equation may be solved numerically by choosing an appropriate numerical method. We obtained the same kind of result when only the non-cured group is considered. That is, in both cases we have found a non-parametric estimating equation for θ.