In the present study, we extend previously published work on statistical modeling of breast cancer data from patients receiving one of two treatments: radiation plus tamoxifen, or tamoxifen alone. We compare the results for the two treatments and identify the attributable variables, along with their interactions, that relate to reoccurrence of breast cancer, based on the consistent results obtained from accelerated failure time models and the Cox proportional hazard model with interactions of the attributable variables included. The Cox proportional hazard model is also chosen over the accelerated failure time models to calculate the survival curves of relapse time for patients in the two treatment groups. In addition, we develop and apply a cure rate model, which gives better insight into the cure rate of the two breast cancer treatments from a different perspective.
Many factors may contribute to, or be associated with, the reoccurrence of breast cancer, and a significant number of studies have been published (Freedman et al., 2009; Brawley, 2009; Eleni and Gabriel, 2008; Habibi et al., 2008) to determine which factors or attributable variables contribute significantly to predicting the relapse time and to breast cancer therapy. The purpose of this study is to identify the significant attributable variables as well as all possible interactions among them, and to construct a statistical model that predicts the relapse time as a function of those variables and interactions. Given the values of the attributable variables for a specific patient, such a model lets us estimate how much time will pass before the reoccurrence of breast cancer. Once the relapse time is obtained from the model for treatment with tamoxifen alone and from the model for combined treatment with tamoxifen and radiation, it can provide important guidance on which treatment to choose to prolong the time to reoccurrence (Hancke et al., 2009; Kimmick et al., 2009; Throckmorton and Esserman, 2009; Hershman et al., 2008; Schnitt and Harris, 2008; Trent et al., 2007). Three kinds of statistical models are presented: a parametric regression model, which assumes a certain probability distribution for the error term; a semi-parametric model, the Cox proportional hazard model, which assumes proportional hazards; and a cure rate model, which takes into account the cure rate, i.e., the fraction of patients who will never experience reoccurrence of breast cancer. For those patients who are subject to reoccurrence, various parametric models are constructed to predict the relapse time.
MATERIALS AND METHODS
Between December 1992 and June 2000, a total of 769 women were enrolled and randomized (Fyles et al., 2004), of whom 386 received combined radiation (Taylor and Kim, 1993) and tamoxifen (RT+Tam) and the remaining 383 were assigned to the tamoxifen-alone arm (Tam). The last follow-up was conducted in the summer of 2002. The present study includes only the 641 patients who were enrolled at the Princess Margaret Hospital: 320 were treated with radiation and tamoxifen and 321 were treated with tamoxifen only. A summary of the actual data used in the present study is shown in Fig. 1.
The proposed statistical models in the present study are constructed separately for patients in the RT+Tam group and the Tam group. The potential prognostic factors (attributable variables) are: pathsize (size of tumor in cm); hist (Histology: DUC = Ductal, LOB = Lobular, MED = Medullar, MIX = Mixed, OTH = Other); hrlevel (Hormone receptor level: NEG = Negative, POS = Positive); hgb (Hemoglobin in g/L); nodediss (Whether axillary node dissection was done: Y = Yes, N = No); age (Age of the patient in years).
Fig. 1: Patients treatment data
The dependent variable or response variable is the relapse time (in years) of a given patient.
One important question that we address is which of these attributable variables contribute significantly to the response variable, the relapse time. In addition, we identify all possible interactions that contribute to relapse time.
ACCELERATED FAILURE TIME (AFT) MODEL AND COX PROPORTIONAL HAZARD (COX-PH) MODEL
AFT model: When covariates are considered, we assume that the relapse time has an explicit relationship with the covariates. Furthermore, when a parametric model (Lawless, 2003) is considered, we assume that the relapse time follows a given theoretical probability distribution and has an explicit relationship with the covariates.
Let T denote a continuous non-negative random variable representing the survival time (relapse time in this case), with probability density function (pdf) f(t) and cumulative distribution function (cdf) F(t) = Pr(T≤t). We will focus on the survival function S(t) = Pr(T>t), the probability of being alive at time t (reoccurrence-free in this case). In this model, we start from a random variable W with a standard distribution on (-∞, +∞) and generate a family of survival distributions by introducing location and scale parameters to relate W to the relapse time as follows:
$$\log T = \alpha + \sigma W$$
where α and σ are the location and scale parameters, respectively.
Adding covariates into the location parameter, we have:
$$\log T = x'\beta + \sigma W$$
where x is the vector of covariates, β is the vector of regression coefficients and the error term W has a suitable probability distribution, e.g., extreme value, normal or logistic. This transformation leads to the Weibull, log-normal and log-logistic models for T. Statistical models of this type are also called Accelerated Failure Time (AFT) models. More information about AFT models can be found in Yamaguchi (1992), Shang and Jeremy (2002) and Lawless (2003).
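As a rough illustration of how the choice of error distribution shapes the fitted model, the sketch below evaluates the survival function implied by the log-linear form log T = x'β + σW, namely S(t | x) = S_W((log t − x'β)/σ). The function name, the coefficient values and the covariate vector are hypothetical and are not taken from the breast cancer data.

```python
import math

def aft_survival(t, x, beta, sigma, error="extreme_value"):
    """S(t | x) = S_W((log t - x'beta) / sigma) for a log-linear AFT model."""
    if t <= 0:
        return 1.0
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    w = (math.log(t) - linear_predictor) / sigma
    if error == "extreme_value":   # gives a Weibull model for T
        return math.exp(-math.exp(w))
    if error == "normal":          # gives a log-normal model for T
        return 0.5 * math.erfc(w / math.sqrt(2.0))
    if error == "logistic":        # gives a log-logistic model for T
        return 1.0 / (1.0 + math.exp(w))
    raise ValueError(f"unknown error distribution: {error}")

# Hypothetical coefficients: an intercept plus one covariate.
beta = [2.0, -0.3]
x = [1.0, 1.5]
for name in ("extreme_value", "normal", "logistic"):
    print(name, round(aft_survival(5.0, x, beta, sigma=0.8, error=name), 4))
```

With the extreme value error and σ = 1, the same function reduces to the exponential model, which is why the exponential AFT model can be viewed as a special case of the Weibull AFT model.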
Cox-PH model: An alternative approach to modeling survival data is to assume that the effect of the covariates is to increase or decrease the hazard function by a proportionate amount at all durations. Thus:
$$\lambda(t \mid x) = \lambda_0(t)\,\exp(x'\beta)$$
where λ0(t) is the baseline hazard function, i.e., the hazard for an individual with covariate values 0, and exp(x'β) is the relative risk associated with the covariate values x. Subsequently, for the survival functions:
$$S(t \mid x) = S_0(t)^{\exp(x'\beta)}$$
Hence, the survival function for covariates x is the baseline survival function raised to a power.
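The power relation between the two survival functions follows from the cumulative hazard. A short derivation, assuming the proportional hazard form λ(t | x) = λ0(t) exp(x'β) and writing Λ0(t) for the cumulative baseline hazard (a symbol introduced here only for illustration):

```latex
\begin{aligned}
S(t \mid x) &= \exp\left\{-\int_0^t \lambda_0(u)\, e^{x'\beta}\, du\right\}
             = \exp\left\{-\Lambda_0(t)\, e^{x'\beta}\right\} \\
            &= \left[\exp\{-\Lambda_0(t)\}\right]^{e^{x'\beta}}
             = S_0(t)^{\exp(x'\beta)} .
\end{aligned}
```

A relative risk above 1 therefore lowers the survival curve at every time point, and a relative risk below 1 raises it.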
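Parameter estimates in the Cox-PH model are obtained by maximizing the partial likelihood rather than the full likelihood. The partial likelihood is given by:
$$L(\beta) = \prod_{i \in D} \frac{\exp(x_i'\beta)}{\sum_{j \in R(t_i)} \exp(x_j'\beta)}$$
where D is the set of patients with an observed reoccurrence and R(t_i) is the risk set, i.e., the patients still under observation just prior to time t_i. The log partial likelihood is given by:
$$\ell(\beta) = \sum_{i \in D} \left[ x_i'\beta - \log \sum_{j \in R(t_i)} \exp(x_j'\beta) \right]$$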
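To make the structure of the log partial likelihood concrete, the following minimal sketch evaluates it for a small data set, assuming no tied event times. Each observed reoccurrence contributes its own linear predictor minus the log of the summed relative risks over the patients still at risk. The function name and the toy data are hypothetical.

```python
import math

def cox_log_partial_likelihood(times, events, covariates, beta):
    """Log partial likelihood, assuming no tied event times.

    times      -- observed relapse or censoring times
    events     -- 1 if reoccurrence was observed, 0 if censored
    covariates -- one list of covariate values per patient
    beta       -- regression coefficients
    """
    scores = [sum(x * b for x, b in zip(row, beta)) for row in covariates]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation just before t_i.
        risk = [math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i]
        total += scores[i] - math.log(sum(risk))
    return total

# Hypothetical toy data: four patients, one covariate.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
covariates = [[1.0], [0.0], [1.0], [0.0]]
print(cox_log_partial_likelihood(times, events, covariates, beta=[0.0]))
```

In practice this function would be maximized over beta numerically; the baseline hazard never appears, which is what makes the model semi-parametric.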
In application of the Cox-PH model, we also included the interactions of the attributable variables.
The major objective of applying these models is to identify which of the six attributable variables contribute significantly to the relapse time of breast cancer patients receiving different treatments. The six explanatory variables used in the models are those defined above: pathsize, hist, hrlevel, hgb, nodediss and age.
The most commonly used AFT models, namely the exponential, Weibull and log-normal AFT models, and the Cox-PH model are applied. After fitting each model with all covariates and the interactions between covariates, the number of parameters is reduced by stepwise regression based on the Akaike Information Criterion (AIC) (Akaike, 1974), a measure of the goodness of fit of an estimated statistical model.
Table 1: Significant factors in parametric regression models for RT+Tam group
The stars (*) in the table indicate that the variable is significant at the 0.05 significance level
Table 2: Significant factors in parametric regression models for Tam group
AIC trades off the complexity of an estimated model against how well the model fits the data. It is given by:
AIC = -2 log(likelihood) + 2(p+k)
where p is the number of regression parameters and k is the number of parameters in the distribution. Statistical models with lower AIC are preferred. Table 1 shows the covariates and interactions in the statistical models chosen using the AIC, as well as their corresponding p-values, for the breast cancer patients who were treated with both radiation and tamoxifen.
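The AIC comparison can be sketched in a few lines. The example below applies the formula AIC = -2 log(likelihood) + 2(p+k) to three candidate models and selects the one with the lowest value. The model labels, log-likelihoods and parameter counts are hypothetical placeholders, not results from this study.

```python
def aic(log_likelihood, n_regression_params, n_distribution_params):
    """AIC = -2 log(likelihood) + 2 (p + k)."""
    return -2.0 * log_likelihood + 2.0 * (n_regression_params + n_distribution_params)

# Hypothetical fitted models: (maximized log-likelihood, p, k).
candidates = {
    "model_a": (-412.6, 6, 1),
    "model_b": (-409.8, 6, 2),
    "model_c": (-410.9, 6, 2),
}
scores = {name: aic(*fit) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
for name, value in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {value:.1f}")
print("preferred:", best)
```

In a stepwise search, the same comparison is repeated each time a covariate or interaction is added or dropped, and the change is kept only if it lowers the AIC.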
As can be seen from the table, age, pathsize, nodediss, hrlevel, the interaction between age and nodediss and the interaction between nodediss and hrlevel are significant with respect to relapse time for breast cancer patients who received radiation and tamoxifen. The interaction between pathsize and hrlevel is significant only in the Weibull AFT model.
For patients who received tamoxifen only, nodediss and hrlevel are the only single attributable variables that are significant with respect to relapse time. It is worth noting that although age by itself does not contribute significantly to relapse time, the interaction between age and nodediss is significant. hgb is found to be significant only in the lognormal AFT model.
Within each treatment group, the three AFT models give almost the same results at the 0.05 significance level. The significant prognostic factors for relapse time of breast cancer patients who received combined treatment with radiation and tamoxifen are age, pathsize, nodediss, hrlevel, age:nodediss and nodediss:hrlevel, which are statistically significant in all of the lognormal, exponential and Weibull regression models. Only in the Weibull regression model does pathsize:hrlevel contribute significantly. For patients in the Tam arm, all three models show that nodediss, hrlevel and age:nodediss contribute significantly; hgb is significant only in the lognormal regression model.
Furthermore, the significant prognostic factors identified using the Cox-PH model confirm our conclusion. There are six significantly contributing variables for the RT+Tam arm, two of which are interactions, and three significantly contributing variables for the Tam arm, one of which is an interaction.
Next, the predicted survival curves of the three AFT models and the Cox-PH model for each arm are compared with the Kaplan-Meier survival curve and its 95% confidence band to determine the best model for predicting relapse time. The results are shown and discussed below.
Kaplan-Meier vs. Parametric survival analysis: From the above four models, we identified the significant attributable variables, and the interactions between them, that contribute to the relapse time of breast cancer patients in the two treatment groups. A graphical comparison is a useful tool for investigating which model gives the best fit to the relapse time in those two groups. In this study, the Kaplan-Meier curve, a commonly used nonparametric survival curve, and its 95% confidence limits are plotted against the survival curves obtained from the four models discussed above to see which model gives the curve closest to the Kaplan-Meier survival curve.
The Kaplan-Meier estimator generalizes the empirical survival function to censored data and reduces to it when there is no censoring. The survival function is estimated by:
$$\hat{S}(t) = \prod_{t_i \le t} \frac{n_i - d_i}{n_i}$$
where S(t) is the probability that an individual remains free of reoccurrence of breast cancer beyond time t; t1≤t2≤t3≤...≤tn are the observed times until reoccurrence for a sample of size n; ni is the number of patients still at risk just prior to time ti and di is the number of reoccurrences at time ti.
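The product-limit calculation can be sketched as follows: at each distinct event time the running estimate is multiplied by (n_i − d_i)/n_i, where n_i counts the patients still at risk and d_i the reoccurrences at that time. The function name and the toy sample are hypothetical, and the sketch omits the confidence band used in the figures.

```python
def kaplan_meier(times, events):
    """Return [(t_i, S_hat(t_i))] at each distinct event time.

    times  -- observed relapse or censoring times
    events -- 1 if reoccurrence was observed, 0 if censored
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    survival = 1.0
    curve = []
    for t_i in event_times:
        n_i = sum(1 for t in times if t >= t_i)  # at risk just before t_i
        d_i = sum(1 for t, d in zip(times, events) if t == t_i and d == 1)
        survival *= (n_i - d_i) / n_i
        curve.append((t_i, survival))
    return curve

# Hypothetical toy sample of six patients (times in years).
times = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 1, 0, 1, 0, 1]
for t_i, s in kaplan_meier(times, events):
    print(f"S({t_i}) = {s:.3f}")
```

Censored patients never trigger a drop in the curve, but they do shrink the risk set for later event times, which is how the estimator uses the partial information they carry.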
As can be seen from Fig. 2, in the second year, the third year and around the sixth year, the survival curve from the lognormal AFT model falls outside the 95% confidence band of the Kaplan-Meier curve.
For the exponential AFT model, the same graphical representation is given in Fig. 3. The survival curve estimated from the exponential AFT model falls outside the 95% confidence band from year 1 to year 4 and from year 5 to year 6.
Figure 4 shows the survival curve obtained from the Weibull AFT model. It deviates from the 95% confidence band of the Kaplan-Meier curve in a pattern similar to that of the exponential AFT model.
Fig. 2: Survival curve from lognormal regression model vs. Kaplan-Meier survival curve and its 95% confidence interval for RT+Tam
Fig. 3: Survival curve from exponential regression model vs. Kaplan-Meier survival curve and its 95% confidence interval for RT+Tam
However, Fig. 5, which shows the survival curve obtained from the Cox-PH model, makes clear that this curve lies within the 95% confidence band of the Kaplan-Meier curve most of the time.
Thus, we can conclude from the above analysis that the Cox-PH model with interactions gives a better prediction of the relapse probability of breast cancer patients in the RT+Tam arm than the three AFT models.
Similarly, we perform a survival analysis of the relapse time for the patients who were treated with tamoxifen only. Figures 6-8 show the survival curves obtained from the lognormal, exponential and Weibull AFT models. These survival curves fall outside the 95% confidence limits of the Kaplan-Meier curve most of the time. However, in Fig. 9, which shows the survival curve obtained from the Cox-PH model with interactions, the survival curve lies within the 95% confidence band.
Fig. 4: Survival curve from Weibull regression model vs. Kaplan-Meier survival curve and its 95% confidence interval for RT+Tam
Fig. 5: Survival curve from Cox-PH model vs. Kaplan-Meier survival curve and its 95% confidence interval for RT+Tam
Fig. 6: Survival curve from lognormal regression model vs. Kaplan-Meier survival curve and its 95% confidence interval for Tam
Fig. 7: Survival curve from exponential regression model vs. Kaplan-Meier survival curve and its 95% confidence interval for Tam
Fig. 8: Survival curve from Weibull regression model vs. Kaplan-Meier survival curve and its 95% confidence interval for Tam
Fig. 9: Survival curve from Cox-PH model vs. Kaplan-Meier survival curve and its 95% confidence interval for Tam
Therefore, we can conclude that for patients who received tamoxifen only, the Cox-PH model with interactions gives a more precise prediction of the relapse time than the AFT models.
Since the Cox-PH model gives a better prediction of relapse probability than the AFT models for both groups, we recommend the Cox-PH model for estimating the probability of remaining reoccurrence-free at 2, 5 and 8 years. The results are shown in Table 3.
Although the models are consistent in identifying the significant prognostic factors for reoccurrence of breast cancer, the six AFT graphs above show that a parametric regression model might not be a good choice for prediction. The Cox-PH models with interactions have better predictive power than the parametric regression models. It is therefore advisable to use the Cox-PH model with interactions to predict the relapse time of a breast cancer patient, given the values of all the attributable variables. As can be seen from the reoccurrence-free table, patients who received the combined treatment have a higher probability of remaining free of reoccurrence than those who received the single treatment. More information on comparing survival models can be found in Nardi and Schemper (2003) and Orbe et al. (2002).
CURE RATE STATISTICAL MODEL
Model introduction: Any clinical trial involves a heterogeneous group of patients that can be divided into two groups. Those who respond favorably to the treatment and become insusceptible to the disease are called cured. The others, who do not respond to the treatment, remain uncured. It is of interest to determine the proportion of cured patients and to study the causes of treatment failure or reoccurrence of the disease. Unlike the parametric survival regression models and the semi-parametric Cox-PH model with interactions discussed above, which assume that every patient is susceptible to treatment failure or reoccurrence, cure rate statistical models are survival models consisting of cured and uncured fractions. These models are widely used in analyzing cancer data from clinical trials. The first model to estimate the cure fraction was developed by Boag (1949) and is called the mixture model or standard cure rate model. Further developments of this model can be found in Peng et al. (1998), Goldman (1984), Farewell (1982), Ghitany et al. (1992), Theodora et al. (2008), Fabien and Pierre (2007) and Uddin et al. (2006).
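Let π denote the proportion of cured patients, so that 1-π is the proportion of uncured patients. The survival function for the whole group is then given by:
$$S(t) = \pi + (1-\pi)\,S_u(t)$$
where Su(t) is the survival function of the uncured group. It follows that the density function is given by:
$$f(t) = (1-\pi)\,f_u(t)$$
where fu(t) is the density function of the uncured group.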
For uncured patients, we assume that the failure time or relapse time T follows a classical probability distribution, and we can add the effect of covariates to the model using the parametric survival regression models studied in the previous section. The cure rate π can either be assumed constant or be made dependent on covariates through a logistic model, that is:
$$\pi(x) = \frac{\exp(x'\gamma)}{1 + \exp(x'\gamma)}$$
where γ is a vector of regression coefficients for the cure rate.
Thus, covariates may be used either in cure rate or in the failure time probability distribution of the uncured patients. These different conditions will be considered in developing the modeling process.
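Estimates of the parameters in the model can be obtained by maximizing the overall likelihood function given by:
$$L = \prod_{i=1}^{n} \left[(1-\pi)\,f_u(t_i)\right]^{\delta_i} \left[\pi + (1-\pi)\,S_u(t_i)\right]^{1-\delta_i}$$
where ti is the observed relapse time and δi is the censoring indicator, with δi = 1 if ti is uncensored and δi = 0 otherwise.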
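A minimal sketch of the mixture cure likelihood is given below, assuming a constant cure rate and, purely for illustration, a Weibull distribution without covariates for the relapse time of uncured patients. An uncensored patient contributes (1 − π) f_u(t), since a patient who relapsed cannot be cured, and a censored patient contributes π + (1 − π) S_u(t). The function names, parameter values and toy data are hypothetical.

```python
import math

def weibull_survival(t, shape, scale):
    return math.exp(-((t / scale) ** shape))

def weibull_density(t, shape, scale):
    return (shape / scale) * (t / scale) ** (shape - 1) * weibull_survival(t, shape, scale)

def cure_survival(t, cure_rate, shape, scale):
    """Population survival: S(t) = pi + (1 - pi) * S_u(t)."""
    return cure_rate + (1.0 - cure_rate) * weibull_survival(t, shape, scale)

def cure_log_likelihood(times, events, cure_rate, shape, scale):
    """Log-likelihood of the mixture cure model with a constant cure rate."""
    total = 0.0
    for t, d in zip(times, events):
        if d == 1:  # observed reoccurrence: the patient must be uncured
            total += math.log((1.0 - cure_rate) * weibull_density(t, shape, scale))
        else:       # censored: either cured, or uncured and not yet relapsed
            total += math.log(cure_survival(t, cure_rate, shape, scale))
    return total

# Hypothetical toy data (times in years) and parameter values.
times = [1.2, 3.5, 4.0, 6.1, 8.0]
events = [1, 1, 0, 1, 0]
print(cure_log_likelihood(times, events, cure_rate=0.1, shape=1.5, scale=4.0))
```

As t grows, the population survival curve levels off at the cure rate instead of falling to zero, which is the feature that distinguishes the cure rate model from the AFT and Cox-PH models.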
Model results for the breast cancer data: For the survival regression part, Weibull, lognormal (Lnormal), Gamma, generalized log-logistic (GLL), log-logistic (Llogistic), generalized F (GF), extended generalized gamma (EGG) and Rayleigh parametric regression models are used. The analysis is organized into the following four cases.
Case 1: No covariates in π and Su(t): When both the cure rate and the survival curve of the uncured group are independent of covariates, we obtain very different cure rates from different distributions, which suggests that the model is very sensitive to the assumed distribution of the failure time of uncured patients.
Case 2: No covariates in π, six single covariates in Su(t): When we include covariates in the survival function of the uncured group, Table 4 shows that the cure rate is fairly consistent across the different distributional assumptions.
Case 3: Six single covariates in π, six single covariates in Su(t): Here we include covariates in both the cure rate and the survival function of the uncured group. Although six covariates are added to the cure rate, there is little improvement in the likelihood and sometimes the likelihood is even lower, which suggests that the cure rate might not depend on those covariates; instead, we can treat it as a constant.
Case 4: No covariates in π, six single covariates and their interactions in Su(t): Since there is no significant difference in maximum likelihood between cases 2 and 3, including covariates in the cure rate does not improve the model much. Thus, in the following analysis, we treat the cure rate as a constant, i.e., independent of those covariates. Table 5 shows that the cure rates are uniform across the different parametric regression models.
Table 4: Likelihood and cure rate with covariates in uncured survival function
Table 5: Likelihood and cure rate with covariates and interactions in uncured survival function
After computing the AIC of the above models for each group, the smallest value for RT+Tam is obtained with the Gamma model and the smallest value for Tam with the EGG model. Hence, we choose the mixture cure model with Gamma regression for the uncured RT+Tam group and with EGG regression for the uncured Tam group. Of the patients who received radiation and tamoxifen, 10% will be cured of breast cancer and not be subject to reoccurrence. However, of those who received tamoxifen alone, only 7.48% will be cured, which suggests that giving radiation to breast cancer patients who take tamoxifen could decrease the probability of reoccurrence of breast cancer.
Thus, the cure rate model is useful in identifying the cure rate of breast cancer in each treatment group. The percentage of cured patients in the tamoxifen group, 7.48%, is lower than the 10% in the combined treatment group. This not only provides insight into treatment selection with respect to cure rate, but also gives a credible estimate of the percentage of cured patients.
By applying the AFT and Cox-PH models, we identified the significant factors and interactions that contribute to the relapse time of breast cancer patients receiving different treatments, and the AFT and Cox-PH models give consistent results. With respect to predicting the survival curve, the Cox-PH model gives a better fit than the AFT models. Thus, given the covariate information of a breast cancer patient, the Cox-PH model with interactions can be applied to determine the time before reoccurrence of breast cancer. From a different perspective, the cure rate model takes into consideration the fact that some of the patients are cured and will never experience reoccurrence. The cure rates for the RT+Tam and Tam groups are both found to be independent of the covariates, and they differ between the groups. For the RT+Tam group, the cure rate is 0.1, which is higher than that of the Tam group, 0.0748. Thus, using the cure rate statistical model, we conclude that patients who received combined treatment with radiation and tamoxifen are more likely to be cured of breast cancer and less susceptible to its reoccurrence than those who received the single treatment.
The author would like to acknowledge the useful suggestions of Dr. James Kepner, Vice President, American Cancer Society, on the subject study.
- Fabien, C. and J. Pierre, 2007. A SAS macro for parametric and semiparametric mixture models. Comput. Meth. Programs Biomed., 85: 173-180.
- Fyles, A.W., D.R. McCready, L.A. Manchul, M.E. Trudeau and P. Merante, 2004. Tamoxifen with or without breast irradiation in women 50 years of age or older with early breast cancer. NEJM., 351: 963-970.
- Kimmick, G., R. Anderson, F. Camacho, M. Bhosle, W. Hwang and R. Balkrishnan, 2009. Adjuvant hormonal therapy use among insured low-income women with breast cancer. JCO., 27: 3445-3451.
- Shang, L.C. and T.M.G. Jeremy, 2002. A semi-parametric accelerated failure time cure model. Stat. Med., 21: 3235-3247.
- Orbe, J., E. Ferreira and V.N. Anton, 2002. Comparing proportional hazards and accelerated failure time models for survival analysis. Stat. Med., 21: 3493-3510.
- Peng, Y., K.B. Dear and J.W. Denham, 1998. A generalized F mixture model for cure rate estimation. Stat. Med., 17: 813-830.
- Schnitt, S.J. and J.R. Harris, 2008. Evolution of breast-conserving therapy for localized breast cancer. JCO., 26: 1395-1396.
- Throckmorton, A.D. and L.J. Esserman, 2009. When informed all women do not prefer breast conservation. JCO., 27: 484-486.
- Trent, S., C. Yang, C. Li, M. Lynch and E.V. Schmidt, 2007. Heat shock protein B8, a cyclin-dependent kinase independent cyclin D1 target gene contributes to its effects on radiation sensitivity. Cancer Res., 67: 10774-10781.
- Yamaguchi, K., 1992. Accelerated failure-time regression model with a regression model of surviving fraction: An application to the analysis of permanent employment in Japan. JASA., 83: 222-230.