INTRODUCTION
Discussion on binary response modeling was reported by Sarkar
and Midi (2010), Zadkarami (2008), Sarkar
et al. (2011) and the models usually adopted probit model and logit
model (Alpu and Fidan, 2004; Damisa
et al., 2007). Generally for panel data observations towards same
subjects and variables are conducted repeatedly. The model was constructed based
on approach of fixed effect, random and dynamic models (Lechner
et al., 2005). Simulation study of an efficient estimation of a mean
true effect using a fixed effects model and a random effects model in order
to establish appropriate confidence interval estimation have been reported (Antoine
et al., 2007). Method for estimating parameter on panel data modeling
used was Maximum Likelihood Estimating (MLE), moment method and Generalized
Estimating Equation (GEE) (Diggle et al., 1996).
The simplest model is independent model which the assumption taken based on
the response on the same and inter subjects is independent. Based on this assumption,
combined probability is result of multiplication of its marginal probability.
Liang and Zeger (1986) suggest that logistic and probit
analysis that are ignoring the correlation and produce consistence estimation
parameter but still have close relation becomes inefficient. Chaganty
and Joe (2004) have proposed the use of equations of the correlation estimator
for probit models in GEE.
In many cases data are multivariate or correlated (e.g., due to repeated observations
on a study subject or for subjects within centers) and it is appealing to have
a model that maintains a marginal logistic regression interpretation for the
individual outcomes. An alternative is to use a marginal analysis that avoids
complete specification of the likelihood (Lipsitz et
al., 1990). Prentice (1988) demonstrate modeling
strategic to obtain a consistent and normal asymptotic regression coefficient
estimator by using GEE. Double regression was not needed in this calculation.
Development of model by controlling unobserved and homogeneous individual characteristic
towards replication was reported. However, for heterogeneous individual characteristic,
problem would be found is that estimation parameter becomes bias (Greene,
2005). Double integral was needed in probit model. From several simulations,
Hajivassiliou et al. (1996) reported that GewekeHajivasilouKeane
(GHK) was the best simulation. Geweke et al. (1997)
also find double integral calculation by using Monte Carlo simulation known
as unbias and consistence GHK.
Eventually, several different independent variables are observed simultaneously.
This observation produces multivariate data. Nugraha (2000)
examined properties of parameter estimators on bivariate logistic regression
by using MLE and GEE method. These method produce a consistence estimator. Nugraha
et al. (2006) reported that for logistic model on multivariate binary,
smaller variance estimator gained by GEE and compared to independent assumption
approach. Nugraha et al. (2009a) evaluated the
comparison of GEE and MLE method for multivariate binary logistic. It was concluded
that GEE model has smaller bias. However, the estimator for correlation parameter
is always underestimate. Akanda et al. (2005)
proposed an alternative procedure for goodness of fit test based on GEE where,
the correlation between two responses was considered.
Nugraha et al. (2008) presented that analytically,
both first and second derivation of loglikelihood binary multivariate model
can be calculated. Based on its first derivation and second derivation, estimation
on the parameter can be executed by NewtonRaphson iteration procedure. Nugraha
et al. (2009b) obtained the estimator of binary multivariate probit
model by Simulated Maximum Likelihood Estimator (SMLE) based on GHK simulation.
During the last decade, practitioners and researchers are willing to move from
the multinomial and nested logit models, that were the standard until only a
few years ago, towards these more general models, specifically towards Mixed
Logit (Munizaga and AlvarezDaziano, 2004). In this present
study, correlation effect on mixed logit model towards binary multivariate data
is discussed. Estimation of parameter is conducted by MLE and GEE method and
comparative study for simulation data is subjected. Both simulation data and
estimation of parameter were performed utilizing R.2.10.0. program. (www.Rproject.org).
MULTIVARIATE BINARY RESPONSE
It is assumed that Y_{it} is binary response, Y_{it} = 1 as the subject i at the response of t choosing the alternative 1 and Y_{it} = 0 if the subject of i at the response of t choosing the alternative of 2. Each individual has covariate X_{i} as individual characteristics i and covariate Z_{ijt} as characteristic of choice/alternative j at the individu of i.
Utility of subject i selecting the alternative of j on response t is:
with:
V_{ijt } = α_{jt}+β_{jt} X_{i}+γ_{t}Z_{ijt}
By assuming that decision maker select the alternative based on the maximum utility value, the model can be expressed in the form of difference utility:
with:
V_{it} = (V_{i1t}–V_{i0t})
= (α_{1t}α_{0t})+(β_{1t}β_{0t})
X_{i}+γ_{t }(Z_{i1t}Z_{i0t}) = α_{t}+β_{t
}X_{i}+γ_{t}Z_{it} 
and ε_{it} = (ε_{it}ε_{i0t}). The probability of subject of i selecting (y_{i1} = 1, ...., y_{iT} = 1) is:
With ε_{i} = (ε_{i1}, ..., ε_{iT})'.
Value of probability is calculated by multiple integral T depend on the parameter
θ = (θ_{1}', θ_{2}', ..., θ_{T}')
as well as distribution of ε (Train, 2003).
GEE ON MODEL LOGIT
GEE on multivariate binary response can be expressed in the following form :
W_{i} is called as observation matrix:
and Δ_{i} = diag (π_{il} (1π_{il})
... π_{iT} (1π_{iT})) S_{i} is matrix TxT
and can be expressed as:
It is defined that and
R_{i} is “working” matrix with the correlation Y_{i}
in the size of TxT.
Paired correlation coefficient between response of s and response of r, ρ_{isr}= corr (Y_{is}, Y_{ir}) for i = 1, 2, ..n and s, r = 1, 2, …T. In order to estimate R_{i}, empirical correlation, r_{i} is defined as vector with the size of T (T1)/2 with the elements:
Empirical correlation r_{ist} that is contained parameter π_{is}
and π_{it} is unbias estimator for ρ_{ist} (Liang
and Zeger, 1986). As it is assumed that ρ_{ist} = ρ_{st}
for all of i, the estimator is:
Equation 4 and 5 can be solved at the same
time to get the parameter estimator θ and ρ.
Theorem 1: If is
GEE estimator that is solution of Eq. 4, so is
normal multivariate distributed:
where,
MIXED LOGIT MODEL
Mixed logit model is combination of extreme value distribution and normal distribution assumption. Based on equation of utility difference (2), the individual effect δ_{it} is added:
ε_{it} has extreme value type I distribution. δ_{it} is ith individual effect of tth response and is independent respect to ε_{it}. It is assumed that δ_{i} = (δ_{i1}, ...., δ_{iT}) has multivariate normal, δ_{it} ~ N (0, 1) and δ_{i}~N (0, Σ). Parameter that will be estimated on this mixed Logit model are the parameter of regression coefficient (α_{t}, β_{t}, γ_{t}) and Σ. Parameter Σ are estimated by the Cholesky factor, C.
Individual effect δ_{i} can be can be modified in the following form:
where, C is matrix of Cholesky factor of Σ:
CC’ = Σ and ξ_{i} ~ N (0, 1)
I is identity matrix with the size of TxT. Generally, for t = 1, …, T, the utility function can be constructed:
where,
extreme
value type I, c_{tl} is an element of cholesky matrix. Based on Eq.
8 conditional logit model is derived with expression:
Marginal probability (for t and i) in the mixed logit model is:
φ (ξ_{it}) is standard normal density.
The joint conditional probability is:
where, c = (c_{11}, c_{12}, c_{13}, c_{22}, c_{23}, c_{33}) and ξ_{i} (ξ_{il}, ..., ξ_{iT}). The joint probability is:
φ (ξ_{i}) is normal standard density.
Theorem 2: If it is known the utility model as represented in Eq.
8 that is fit to regularity condition, so MLE for parameter of θ_{t}
= (α_{t} ,β_{t}, γ_{t})' and is
the solution of estimator equation:
where, l = 1, ..., T. with:
and
can
be obtained by using iteration process of methods BHHH ((Berndt, Hall, Hall,
Hausman) or BFGS (Broyden, Fletcher, Goldfarb dan Shanno) (Chong
and Zak, 1996).
SIMULATION DATA
Simulation data was simulated by taking case T = 3 and n = 1000. Utility equation
is:
U_{i0t} = α_{0t}+β_{0t}
X_{i}+γ_{t} Z_{i0t}+ε_{i01}
dan U_{i1t} = α_{1t}+β_{1t} X_{i}+γ_{t}
Z_{i1t}+ε_{i1t} 
(13) 
For i = 1, ..., n; t = 1, 2, 3 and j = 0, 1. Equation of utility (13) can be transformed in the difference of utility: U_{it} = U_{i1t}–U_{i0t} and the corresponding model is:
U_{i1} = V_{i1}+ε_{i1}; U_{i2} = V_{i2}+ε_{i2}; U_{i3} = V_{i3}+ε_{i3}
With V_{it} = (V_{i1t}–V_{i0t}) = α_{t}+β_{t} X_{i}+γ_{t} Z_{it}; Z_{it} = (Z_{i1t }–Z_{i0t}); β_{t} = β_{0t}β_{1t}; α_{t }= (α_{0t}α_{1t}). Data was generated on parameter value of α_{t} = 1, β_{t} = 0.5 and γ_{t} = 0.3. Structure of correlation/covariances to be examined are r_{12} = ρ and r_{13} = r_{32} = 0:
Utility for t = 1 is in correlation with the utility for t = 2 and the correlation
order ρ = 0, 0.1, 0.2,..., 0.9. The value of observation variable X_{i}
and Z_{ijt} taken from normal distribution is:
Generated data with the assumption of ε_{ijt} is extreme value distributed and δ_{i}~N (0, σ^{2}). Estimation towards parameter was conducted by utilizing GEE and MLE for mixed logit model.
In order to estimate parameter using MLE, double integration was required and
for this purpose, Halton sequence can be implemented. Simulation program for
Halton can be adopted from library randtoolbox. For optimizing loglikelihood
function, optim function by available BFGS method in R.2.10 program was used
(R Development Core Team, 2009).
RESULT OF SIMULATION
Figure 121 depicted the effect of correlation
value to the each estimator bias. From results, it can be concluded that if
assumption by using GEE does not appropriate, the utilities that are correlated
each other will be bias. Utility 1 (U_{i1}) is in correlation with utility
2 (U_{i2}) and both are not correlated to utility 3 (U_{i3}).
Therefore, value of correlation only influences on parameters within U_{i1}
and U_{i2}. On both utilities, the estimator bias appreciable and comparable
respect to the correlation values (Fig. 19).
• 
Estimation by using MLE is more accurate compared to as in
using GEE. At bigger correlation between utilities, estimator on GEE will
has higher deviation (Fig. 19) and
this is caused misassumption. Model Logit model estimated by using GEE is
based on the assumption that error has variance of π^{2}/3
while for data, the variance is (σ^{2}+π^{2}/3).
As the assumption fulfilled, Mixed Logit model can estimate all parameter
appropriately (Fig. 1120) 
• 
Correlation parameter estimator for MLE is better (smallet bias) compared
to as or GEE (Fig. 20, 21) more over,
Correlation parameter estimator for GEE tends to be underestimate (Fig.
10) 
• 
In low correlation (less than 0.5), MLE is robust respect to assumption
deviation and can estimate precisely. Deviation of assumption in GEE method
produce in bias estimator 

Fig. 1: 
Box plot α_{1} (t = 1) in GEE 

Fig. 2: 
Box plot α_{2 }(t = 2) in GEE 

Fig. 3: 
Box plot α_{3} (t = 3) in GEE 

Fig. 4: 
Box plot β_{1} (t = 1) in GEE 

Fig. 5: 
Box plot β_{2} (t = 2) in GEE 

Fig. 6: 
Box plot β_{3} (t = 3) in GEE 

Fig. 7: 
Box plot γ_{1} (t = 1) in GEE 

Fig. 8: 
Box plot γ_{2} (t = 2) in GEE 

Fig. 9: 
Box plot γ_{3} (t = 3) in GEE 

Fig. 10: 
Box plot ρ in GEE 

Fig. 11: 
Box plot α_{1} (t = 1) in MLE 

Fig. 12: 
Box plot α_{2 }(t = 2) in MLE 

Fig. 13: 
Box plot α_{3} (t = 3) in MLE 

Fig. 14: 
Box plot β_{1} (t = 1) in MLE 

Fig. 15: 
Box plot β_{2} (t = 2) in MLE 

Fig. 16: 
Box plot β_{3} (t = 3) in MLE 

Fig. 17: 
Box plot γ_{1} (t = 1) in MLE 

Fig. 18: 
Box plot γ_{2} (t = 2) in MLE 

Fig. 19: 
Box plot γ_{3} (t = 3) in MLE 

Fig. 20: 
Box plot ρ in MLE 

Fig. 21: 
Graph of a bias ρ in GEE and MLE 
CASE STUDY
Data are obtained from library Ecdat in R.2.10.0 which are data Heating. Data are resulted from survey of room heater household in California. From 850 families, there are several variables observed. Y_{i }is heater system utilized, that are gc (gas central), gr (gas room), ec (electric central) and er (electric room). X_{1i} is income of each family (income). X_{2i} is age of familyhed (Agehed). X_{3i} is amount of room in house (rooms) where, I = 1, ..., 850. Furthermore, multivariate approach with two responses was utilized. Y_{i1} is source of energy (g = 1) and electricity (e = 0). Y_{i2} is application type, central controled (c = 1) and room controled (r = 0). Characteristic variable for each response is defined in Table 1. From those data, tabulated data between energy source (Y_{1}) and application type (Y_{2}) (Table 2).
Based in Chisquared Pearson’s test, obtained value χ^{2} = 95.8792 (pvalue<2.2e16), means that energy source choosen by respondents and application type are not independent.
If there is correlation among responses, parameter estimator will be bias. On this condition, Mixed Logit model is more accurate in estimating parameter compared with as in discrete choice model. For three approximations used; univariate, GEE, mixed logit model, instalation cost and operational cost are the significant variables influence to the Y variable. In contrast, effect of individual charactersitics of X_{1i}, X_{2i} and X_{3i} to the response is not significant.
Equation of marginal probability on independent model and GEE for energy source is as follow:
and the probability equation for application type is:
Estimator value for each parameter is presented in Table 3.
Table 1: 
Characteristic variabel of election 

Table 2: 
Crossed tabulation of energy source and application type 

Table 3: 
Estimator for three models 

Values in blanket is statistic Wald 
Equation of marginal probability for heater system on Logit model is:
with:
ξ_{i} has normal distribution with density φ (ξ_{i}) and:
Equation of marginal probability for the variable of application type is:
With:
Equation of combined probability on mixed logit model is:
If Y_{i1} = g, so y_{i1} = 1 and if Y_{i1} = e so y_{i1} = 0. As Y_{i2} = c, y_{i2} = 1 and as Y_{i2} = r y_{i2} = 0.
By using mixed logit model, correlation value between energy source and application source is 0.628 otherwise by using GEE correlation value is 0.313. From simulation result, it is noted that corelation estimator for GEE tend to be underestimate, therefore, estimator correlation on Mixed Logit model is more trusted.
CONCLUSION
Mixed Logit model is based on assumption that error component has extreme value distribution and parameter has normal distribution. Based on data simulation, if there is a violation to the assumption (specifically related to variance) GEE estimator will be bias. Estimator bias will be higher at higher variance value. Estimator bias in GEE can be reduced by MLE method. The correlation of parameter estimator in GEE tends to be underestimate but MLE can be used for estimating correlation between parameters accurately.