INTRODUCTION
Generalized Estimating Equations (GEE) are useful for analyzing correlated data with categorical or continuous responses^{ [1,2]}. Parameter estimation is conducted through estimating equations which converge to a sum of mean zero random variables if the mean structure is correctly specified. There is no need to specify a joint distribution for the responses. However, assessing model fit is further complicated with GEE than for models assuming independence because no likelihood is available and the residuals are correlated within a cluster. Some methods are available for assessing the fit of GEE regression models with binary responses. Horton^{[3]} developed a goodnessoffit test for assessing such model fit by extending Hosmer^{[4]} goodnessoffit statistic for ordinary logistic regression. Their proposed statistic has an approximate chisquared distribution when the model is specified correctly. Barnhart^{[5]} also propose a goodnessoffit statistic for assessing the fit of GEE binary regression models. They extend Tsiatis’ method^{[6]} for assessing the fit of ordinary logistic regression models. This approach involves partitioning the space of covariates into distinct regions and forming score statistics that are asymptotically distributed as chisquare random variables with the appropriate degrees of freedom. Barnhart’s^{[5]} approach is best employed in the situation when there are only discrete covariates available because then there is no need to partition the covariates. Pan^{[7]} has proposed goodnessoffit tests for GEE with correlated binary data. Pan’s two tests result in the Pearson chisquare and an unweighted sum of residual squares, both of which are based on the residuals. These two tests can only be used when there is at least one continuous covariate available. If the possibility of influential observations is of concern to the data analyst, Preisser^{[8]} have proposed deletion diagnostics for generalized estimating equations. The diagnostics consider leverage and residuals to measure the influence of a subset of observations on the fitted regression parameters. Preisser^{[9]} also generalize the GEE procedure to produce parameter estimates and fitted values that are resistant to influential data. Here, the concern is assessing the fit of GEE categorical response models by determining how well the covariates predict the subject’s responses. We present the kappalike classification statistic, which indicates how well the proposed model predicts the categorical response.
GENERALIZED ESTIMATING EQUATION (GEE)
We outline Lipsitz^{[10]} method as follows. Assume the response of
interest is a categorical outcome with K categories denoted z_{it} =
k if the tth subunit from the ith cluster falls in the kth category, for i =
1, 2,......,N; t = 1, 2,.....,T_{i}(T = max(T_{i})∀i
= 1, 2,.....,N) and k = 1, 2,.........,K. For simplicity, we will assume that
the data are balanced, i.e., T_{i} = T for i = 1, 2,......,N. The following
method will still be applicable in the case of unbalanced data. The T(K1)x1
response vector Y_{t} for cluster I consists of the binary random variables
Y_{itk}, where, Y_{itk} = 1 if Z_{it} = k.
Typically one models the marginal cumulative probabilities of response, v_{itk}
= Pr (Z_{it}≤ k) for k = 1, 2,.......,K1. The marginal probabilities
are denoted by π_{itk} = Pr (Z_{it} = k) = Pr (Y_{itk}
= 1) = E (Y_{itk}) = v_{itk}  v_{it,k1} and will comprise
the T(K1)x1 vector π_{i}. The vectors Y_{i} and π_{i}
require only T(K1) elements instead of TK elements because
for i = 1, 2,........,K1 and t = 1,2,........,T. Let X_{it} be the
p x 1 covariate vector for the tth subunit of the ith cluster.
The cumulative marginal response probabilities will be related to the covariates via ling function g, the kth intercept λ_{k} and the px 1 marginal parameter vector β, g (v_{itk}) = λ_{k} + X’_{it}β.
The intercept are in increasing order: λ_{1}≤λ_{2}≤........≤λ_{K1}.
For an ordinal response, the function g may be any link function such as the
logit function, probit function (φ^{1}), or the elementary loglog
function. Lipsitz^{[10]} suggest that one estimates β with the
following set of generalized estimating equations:
where, D_{i} = D_{i} (β) = dπ_{t} (β)dβ, Vi = Vi (β, α) ≈ var (Y_{i}) is a working covariance matrix of Y_{i}^{[1,2]} and α is a q x 1 vector of correlation parameters. The parameters α are associated with the correlation between the elements of the vectors Y_{is} and Y_{it}.
KAPPALIKE STATISTIC
We used a kappalike statistic to assess model fit for GEE categorical response
models. Historically, the kappa coefficient has been used to determine the agreement
of binary^{[11]} and categorical^{[12]} outcomes between raters.
Kappa corrects the percentage of agreement between raters by taking into account
the proportion of agreement expected by chance. Kappa has been used as a measure
of reproducibility in many epidemiologic settings, such as studies involving
twin similarity^{ [13]} and controlinformant agreement collected from
casecontrol studies. The general expression for the kappa statistic is:
where, P_{0} is the observed proportion of agreement and Pe is the proportion of agreement expected by chance alone^{[12]}. A value of 0 for k indicates no agreement beyond chance and a value of 1 indicates perfect agreement, among many of k’s desirable properties^{[14]}. Thus, larger values of k indicate greater agreement between the outcomes.
Here we use k as a measure of agreement between the predicted and observed
categorical responses to assess the fit of the GEE model. We estimate k in a
second set of estimating equations, similar to Lipsitz^{[15]}, Klar^{[13]}
and Williamson^{[16]}. With Lipsitz’s^{[10]} method, we
estimate the probability of the response falling in each of the K categories.
Denote this estimated probability for the kth category, tth subunit, of the
ith cluster as
. We do not have a straightforward predicted response as with linear regression.
However, if we did have a predicted response for the tth subunit of the ith
cluster, denoted
, it is natural to assume that
would equal k (k = 1,2,AAA,K) with probability
. Let P_{0it} denote the probability that the predicted response from
the model is equal to the observed response, i.e.,
= Zit. A natural estimate of P_{0it} is obtained by using Z_{it},
the estimated probability from the fitted model that the response falls into
the observed category for the tth subunit of the ith cluster.
We define k_{it} as the agreement between the predicted and observed
responses for the tth subunit of the ith cluster as follows:
where, P_{0it} is defined above and P_{e} is the probability of correct prediction expected by chance alone.
As an estimate of P_{e}, we fit an interceptonly model. Cox^{[17]}
and Nagelkerke^{[18]} proposed using an interceptonly model as a baseline
model when generalizing the coefficient of determination for assessing the fit
of a logistic regression model. Thus, we will fit a model with just the intercepts,
the λ_{k} parameters for k = 1,2, AAA,K1, and no covariates. This
baseline model will provide a good starting point from which to compare the
proposed model. The estimated category probabilities from the interceptonly
model will be the same for all clusters and subunits and will be denoted:
where, n_{k} is the sum of observations in category k, i.e.,
for k = 1, . . .,K. All n_{k} observations with response category k
will each be correctly predicted with probability
; accordingly, the estimate of P_{e} is:
The agreement between two raters for assessing a categorical outcome with K
categories can be depicted in a K x K contingency table^{[14]}. The
row and column total probabilities for the kth outcome category are p_{k.}
and p_{.k}, the marginal probabilities that the two raters assess the
outcome in the kth category. The estimate of the probability expected by chance
is calculated assuming independence between the rows and columns in the contingency
table and is ,
which is similar to the estimate above. We will estimate an overall k to ascertain
the fit of the model (k = k_{it} for i = 1, . . .,N and t = 1, . . .
, T). By noting that P_{0it} = P_{e}+k(1.0P_{e}), we
use a second set of estimating equations as follows. Let P_{0i} and
U_{i} denote the Tx1 vectors
and .
The second set of estimating equations are, thus:
where, and
W _{i}≈ var (U_{i}) is the T xT working covariance matrix
ofU_{i}. To compute
one can use a Fisherscoringtype algorithm such as:
and
where, m denotes the iteration. We use Liang^{[1]}’s empirically
corrected variance estimate of
and Prentice’s^{[19]} empirically corrected variance estimate of
.
The second set of estimating equations can be solved noniteratively if we choose
the T x T identity matrix for W_{i}:
The term
is the average predicted probability corresponding to the observed responses.
If the fitted model predicts the categorical response perfectly, i.e., Z_{it}
then
= 1.0 . If the fitted model predicts the responses no better than an interceptonly
model, thenThis kappalike classification statistic should be interpreted as
the average probability of predicting the observed responses above and beyond
the prediction by the interceptonly model.
DATA AND VARIABLES
In present study the diabetes mellitus data was used to carry out the analysis. Here the follow up data on 528 patients registered at BIRDEM (Bangladesh Institute of Research and Rehabilitation in Diabetes, Endocrine and Metabolic disorders) in 198494 are used to identify the risk factors responsible for the transitions from controlled diabetic to confirmed diabetic state as well as confirm diabetic to controlled stage of diabetes. The response variable is defined in terms of the observed glucose level two hours of 75 g glucose load for each followup visit. The cutoff point for the blood glucose level is 11.1 mM L^{1}. If the observed response is less than 11.1, then the patient is defined as non diabetic (categorized as 0) if the response is greater than or equal to 11.1 then the patient is said to be diabetic (categorized as 1) according to WHO (1985) criteria. We include six independent variables, age, sex, education level, area of residence, family history of father and mother and time in our study. Out of these variables, age represents the age of the respondents at each visit. Time represents the length of time of the consecutive visits. These two variables are continuous variables and used directly in the analysis. Other variables are dichotomous variable.
RESULTS AND DISCUSSION
The kappalike statistic takes on a value of 0.0 for the interceptonly model
and a value of 1.0 for the saturated model. An advantage of the statistic is
that no decisions need be made when calculating it, unlike methods based on
covariate partitioning (where to partition, how many partitioned categories),
Hosmer and Lemeshow’s approach, rank correlation methods and classification
tables. Interpretation of the kappa statistic is not always straightforward;
see Fleiss^{[12]} and Landis^{[20]} for details. Similar to
Landis’s^{[20]} labeling of kappa values, Williamson^{[21]}
suggest interpreting the values of kappa for this classification index as follows:
First we fit a GEE logistic regression model with main effects terms only.
We then examined various interaction and quadratic variables for entry into
the regression model at the significance level of 0.05. The quadratic terms
for age and time were significant and entered into the final model. The kappalike
classification index for this final model increased indicating a better fit.
Table 1: 
GEE logistic regression models without quadratic terms assuming
the various correlation structures for diabetic mellitus study 

Table 2: 
GEE logistic regression models with quadratic terms assuming
the various correlation structures for diabetic mellitus study 

With the inclusion of the two quadratic terms, the model fit the data quite
well according to Barnhart’s^{[5]} test.
From Table 1, it can be seen that the kappalike statistic is a fair predictor for the GEE model of exchangeable correlation structure, a good predictor for the autoregressive correlation structure and poor predictor for pairwise correlation structure. Though the likelihood ratio test shows significant effect of covariates in the model but this test does not indicate how well the model predicts the observed responses.
Table 2 shows that when we include the two quadratic terms the kappalike statistic is good for GEE model of exchangeable correlation structure, a excellent predictor for the autoregressive correlation structure and fair predictor for pairwise correlation structure.
CONCLUSIONS
The kappalike classification statistic is a more appropriate indicator of how well the model predicts the observed responses at the cluster level (e.g., an individual) as opposed to how well the model fits the data at the group level (e.g., treatment category). Often a model can fit the data well in terms of predicting the proportion of positive responses for a group of individuals, but is not necessarily useful for predicting a particular individual’s response. The kappalike classification index is an intuitive measure for assessing model fit in that it estimates the probability of an observation being correctly predicted by the fitted model. Then, this probability is corrected for chance by comparing it to the probability that an interceptonly model would have correctly predicted the observation. For the diabetes mellitus study, we would recommend that the kappa value for the GEE model with quadratic terms indicated better prediction than the GEE models without quadratic terms. That is, the GEE models with quadratic terms are fitter well.
ACKNOWLEDGMENTS
The authors thank Dr. Kalipada Sen, Professor, Department of Statistics, University
of Dhaka, Bangladesh for helpful discussions and manuscript review.