Generalized Estimating Equations (GEE) are useful for analyzing correlated data with categorical or continuous responses [1,2]. Parameter estimation is conducted through estimating equations which converge to a sum of mean zero random variables if the mean structure is correctly specified. There is no need to specify a joint distribution for the responses. However, assessing model fit is further complicated with GEE than for models assuming independence because no likelihood is available and the residuals are correlated within a cluster. Some methods are available for assessing the fit of GEE regression models with binary responses. Horton developed a goodness-of-fit test for assessing such model fit by extending Hosmer goodness-of-fit statistic for ordinary logistic regression. Their proposed statistic has an approximate chi-squared distribution when the model is specified correctly. Barnhart also propose a goodness-of-fit statistic for assessing the fit of GEE binary regression models. They extend Tsiatis method for assessing the fit of ordinary logistic regression models. This approach involves partitioning the space of covariates into distinct regions and forming score statistics that are asymptotically distributed as chi-square random variables with the appropriate degrees of freedom. Barnharts approach is best employed in the situation when there are only discrete covariates available because then there is no need to partition the covariates. Pan has proposed goodness-of-fit tests for GEE with correlated binary data. Pans two tests result in the Pearson chi-square and an unweighted sum of residual squares, both of which are based on the residuals. These two tests can only be used when there is at least one continuous covariate available. If the possibility of influential observations is of concern to the data analyst, Preisser have proposed deletion diagnostics for generalized estimating equations. The diagnostics consider leverage and residuals to measure the influence of a subset of observations on the fitted regression parameters. Preisser also generalize the GEE procedure to produce parameter estimates and fitted values that are resistant to influential data. Here, the concern is assessing the fit of GEE categorical response models by determining how well the covariates predict the subjects responses. We present the kappa-like classification statistic, which indicates how well the proposed model predicts the categorical response.
GENERALIZED ESTIMATING EQUATION (GEE)
We outline Lipsitz method as follows. Assume the response of
interest is a categorical outcome with K categories denoted zit =
k if the tth subunit from the ith cluster falls in the kth category, for i =
1, 2,......,N; t = 1, 2,.....,Ti(T = max(Ti)∀i
= 1, 2,.....,N) and k = 1, 2,.........,K. For simplicity, we will assume that
the data are balanced, i.e., Ti = T for i = 1, 2,......,N. The following
method will still be applicable in the case of unbalanced data. The T(K-1)x1
response vector Yt for cluster I consists of the binary random variables
Yitk, where, Yitk = 1 if Zit = k.
Typically one models the marginal cumulative probabilities of response, vitk
= Pr (Zit≤ k) for k = 1, 2,.......,K-1. The marginal probabilities
are denoted by πitk = Pr (Zit = k) = Pr (Yitk
= 1) = E (Yitk) = vitk - vit,k-1 and will comprise
the T(K-1)x1 vector πi. The vectors Yi and πi
require only T(K-1) elements instead of TK elements because
for i = 1, 2,........,K-1 and t = 1,2,........,T. Let Xit be the
p x 1 covariate vector for the tth subunit of the ith cluster.
The cumulative marginal response probabilities will be related to the covariates via ling function g, the kth intercept λk and the px 1 marginal parameter vector β, g (vitk) = λk + Xitβ.
The intercept are in increasing order: λ1≤λ2≤........≤λK-1.
For an ordinal response, the function g may be any link function such as the
logit function, probit function (φ-1), or the elementary log-log
function. Lipsitz suggest that one estimates β with the
following set of generalized estimating equations:
where, Di = Di (β) = dπt (β)|dβ, Vi = Vi (β, α) ≈ var (Yi) is a working covariance matrix of Yi[1,2] and α is a q x 1 vector of correlation parameters. The parameters α are associated with the correlation between the elements of the vectors Yis and Yit.
We used a kappa-like statistic to assess model fit for GEE categorical response
models. Historically, the kappa coefficient has been used to determine the agreement
of binary and categorical outcomes between raters.
Kappa corrects the percentage of agreement between raters by taking into account
the proportion of agreement expected by chance. Kappa has been used as a measure
of reproducibility in many epidemiologic settings, such as studies involving
twin similarity  and control-informant agreement collected from
case-control studies. The general expression for the kappa statistic is:
where, P0 is the observed proportion of agreement and Pe is the proportion of agreement expected by chance alone. A value of 0 for k indicates no agreement beyond chance and a value of 1 indicates perfect agreement, among many of ks desirable properties. Thus, larger values of k indicate greater agreement between the outcomes.
Here we use k as a measure of agreement between the predicted and observed
categorical responses to assess the fit of the GEE model. We estimate k in a
second set of estimating equations, similar to Lipsitz, Klar
and Williamson. With Lipsitzs method, we
estimate the probability of the response falling in each of the K categories.
Denote this estimated probability for the kth category, tth subunit, of the
ith cluster as
. We do not have a straightforward predicted response as with linear regression.
However, if we did have a predicted response for the tth subunit of the ith
, it is natural to assume that
would equal k (k = 1,2,AAA,K) with probability
. Let P0it denote the probability that the predicted response from
the model is equal to the observed response, i.e.,
= Zit. A natural estimate of P0it is obtained by using Zit,
the estimated probability from the fitted model that the response falls into
the observed category for the tth subunit of the ith cluster.
We define kit as the agreement between the predicted and observed
responses for the tth subunit of the ith cluster as follows:
where, P0it is defined above and Pe is the probability of correct prediction expected by chance alone.
As an estimate of Pe, we fit an intercept-only model. Cox
and Nagelkerke proposed using an intercept-only model as a baseline
model when generalizing the coefficient of determination for assessing the fit
of a logistic regression model. Thus, we will fit a model with just the intercepts,
the λk parameters for k = 1,2, AAA,K-1, and no covariates. This
baseline model will provide a good starting point from which to compare the
proposed model. The estimated category probabilities from the intercept-only
model will be the same for all clusters and subunits and will be denoted:
where, nk is the sum of observations in category k, i.e.,
for k = 1, . . .,K. All nk observations with response category k
will each be correctly predicted with probability
; accordingly, the estimate of Pe is:
The agreement between two raters for assessing a categorical outcome with K
categories can be depicted in a K x K contingency table. The
row and column total probabilities for the kth outcome category are pk.
and p.k, the marginal probabilities that the two raters assess the
outcome in the kth category. The estimate of the probability expected by chance
is calculated assuming independence between the rows and columns in the contingency
table and is ,
which is similar to the estimate above. We will estimate an overall k to ascertain
the fit of the model (k = kit for i = 1, . . .,N and t = 1, . . .
, T). By noting that P0it = Pe+k(1.0-Pe), we
use a second set of estimating equations as follows. Let P0i and
Ui denote the Tx1 vectors
The second set of estimating equations are, thus:
W i≈ var (Ui) is the T xT working covariance matrix
ofUi. To compute
one can use a Fisher-scoring-type algorithm such as:
where, m denotes the iteration. We use Liangs empirically
corrected variance estimate of
and Prentices empirically corrected variance estimate of
The second set of estimating equations can be solved non-iteratively if we choose
the T x T identity matrix for Wi:
is the average predicted probability corresponding to the observed responses.
If the fitted model predicts the categorical response perfectly, i.e., Zit
= 1.0 . If the fitted model predicts the responses no better than an intercept-only
model, thenThis kappa-like classification statistic should be interpreted as
the average probability of predicting the observed responses above and beyond
the prediction by the intercept-only model.
DATA AND VARIABLES
In present study the diabetes mellitus data was used to carry out the analysis. Here the follow up data on 528 patients registered at BIRDEM (Bangladesh Institute of Research and Rehabilitation in Diabetes, Endocrine and Metabolic disorders) in 1984-94 are used to identify the risk factors responsible for the transitions from controlled diabetic to confirmed diabetic state as well as confirm diabetic to controlled stage of diabetes. The response variable is defined in terms of the observed glucose level two hours of 75 g glucose load for each follow-up visit. The cut-off point for the blood glucose level is 11.1 mM L-1. If the observed response is less than 11.1, then the patient is defined as non diabetic (categorized as 0) if the response is greater than or equal to 11.1 then the patient is said to be diabetic (categorized as 1) according to WHO (1985) criteria. We include six independent variables, age, sex, education level, area of residence, family history of father and mother and time in our study. Out of these variables, age represents the age of the respondents at each visit. Time represents the length of time of the consecutive visits. These two variables are continuous variables and used directly in the analysis. Other variables are dichotomous variable.
RESULTS AND DISCUSSION
The kappa-like statistic takes on a value of 0.0 for the intercept-only model
and a value of 1.0 for the saturated model. An advantage of the statistic is
that no decisions need be made when calculating it, unlike methods based on
covariate partitioning (where to partition, how many partitioned categories),
Hosmer and Lemeshows approach, rank correlation methods and classification
tables. Interpretation of the kappa statistic is not always straightforward;
see Fleiss and Landis for details. Similar to
Landiss labeling of kappa values, Williamson
suggest interpreting the values of kappa for this classification index as follows:
First we fit a GEE logistic regression model with main effects terms only.
We then examined various interaction and quadratic variables for entry into
the regression model at the significance level of 0.05. The quadratic terms
for age and time were significant and entered into the final model. The kappa-like
classification index for this final model increased indicating a better fit.
|| GEE logistic regression models without quadratic terms assuming
the various correlation structures for diabetic mellitus study
||GEE logistic regression models with quadratic terms assuming
the various correlation structures for diabetic mellitus study
With the inclusion of the two quadratic terms, the model fit the data quite
well according to Barnharts test.
From Table 1, it can be seen that the kappa-like statistic is a fair predictor for the GEE model of exchangeable correlation structure, a good predictor for the autoregressive correlation structure and poor predictor for pairwise correlation structure. Though the likelihood ratio test shows significant effect of covariates in the model but this test does not indicate how well the model predicts the observed responses.
Table 2 shows that when we include the two quadratic terms the kappa-like statistic is good for GEE model of exchangeable correlation structure, a excellent predictor for the autoregressive correlation structure and fair predictor for pairwise correlation structure.
The kappa-like classification statistic is a more appropriate indicator of how well the model predicts the observed responses at the cluster level (e.g., an individual) as opposed to how well the model fits the data at the group level (e.g., treatment category). Often a model can fit the data well in terms of predicting the proportion of positive responses for a group of individuals, but is not necessarily useful for predicting a particular individuals response. The kappa-like classification index is an intuitive measure for assessing model fit in that it estimates the probability of an observation being correctly predicted by the fitted model. Then, this probability is corrected for chance by comparing it to the probability that an intercept-only model would have correctly predicted the observation. For the diabetes mellitus study, we would recommend that the kappa value for the GEE model with quadratic terms indicated better prediction than the GEE models without quadratic terms. That is, the GEE models with quadratic terms are fitter well.
The authors thank Dr. Kalipada Sen, Professor, Department of Statistics, University
of Dhaka, Bangladesh for helpful discussions and manuscript review.