Discussion on binary response modeling was reported by Sarkar
and Midi (2010), Zadkarami (2008), Sarkar
et al. (2011) and the models usually adopted probit model and logit
model (Alpu and Fidan, 2004; Damisa
et al., 2007). Generally for panel data observations towards same
subjects and variables are conducted repeatedly. The model was constructed based
on approach of fixed effect, random and dynamic models (Lechner
et al., 2005). Simulation study of an efficient estimation of a mean
true effect using a fixed effects model and a random effects model in order
to establish appropriate confidence interval estimation have been reported (Antoine
et al., 2007). Method for estimating parameter on panel data modeling
used was Maximum Likelihood Estimating (MLE), moment method and Generalized
Estimating Equation (GEE) (Diggle et al., 1996).
The simplest model is independent model which the assumption taken based on
the response on the same and inter subjects is independent. Based on this assumption,
combined probability is result of multiplication of its marginal probability.
Liang and Zeger (1986) suggest that logistic and probit
analysis that are ignoring the correlation and produce consistence estimation
parameter but still have close relation becomes inefficient. Chaganty
and Joe (2004) have proposed the use of equations of the correlation estimator
for probit models in GEE.
In many cases data are multivariate or correlated (e.g., due to repeated observations
on a study subject or for subjects within centers) and it is appealing to have
a model that maintains a marginal logistic regression interpretation for the
individual outcomes. An alternative is to use a marginal analysis that avoids
complete specification of the likelihood (Lipsitz et
al., 1990). Prentice (1988) demonstrate modeling
strategic to obtain a consistent and normal asymptotic regression coefficient
estimator by using GEE. Double regression was not needed in this calculation.
Development of model by controlling unobserved and homogeneous individual characteristic
towards replication was reported. However, for heterogeneous individual characteristic,
problem would be found is that estimation parameter becomes bias (Greene,
2005). Double integral was needed in probit model. From several simulations,
Hajivassiliou et al. (1996) reported that Geweke-Hajivasilou-Keane
(GHK) was the best simulation. Geweke et al. (1997)
also find double integral calculation by using Monte Carlo simulation known
as unbias and consistence GHK.
Eventually, several different independent variables are observed simultaneously.
This observation produces multivariate data. Nugraha (2000)
examined properties of parameter estimators on bivariate logistic regression
by using MLE and GEE method. These method produce a consistence estimator. Nugraha
et al. (2006) reported that for logistic model on multivariate binary,
smaller variance estimator gained by GEE and compared to independent assumption
approach. Nugraha et al. (2009a) evaluated the
comparison of GEE and MLE method for multivariate binary logistic. It was concluded
that GEE model has smaller bias. However, the estimator for correlation parameter
is always underestimate. Akanda et al. (2005)
proposed an alternative procedure for goodness of fit test based on GEE where,
the correlation between two responses was considered.
Nugraha et al. (2008) presented that analytically,
both first and second derivation of log-likelihood binary multivariate model
can be calculated. Based on its first derivation and second derivation, estimation
on the parameter can be executed by Newton-Raphson iteration procedure. Nugraha
et al. (2009b) obtained the estimator of binary multivariate probit
model by Simulated Maximum Likelihood Estimator (SMLE) based on GHK simulation.
During the last decade, practitioners and researchers are willing to move from
the multinomial and nested logit models, that were the standard until only a
few years ago, towards these more general models, specifically towards Mixed
Logit (Munizaga and Alvarez-Daziano, 2004). In this present
study, correlation effect on mixed logit model towards binary multivariate data
is discussed. Estimation of parameter is conducted by MLE and GEE method and
comparative study for simulation data is subjected. Both simulation data and
estimation of parameter were performed utilizing R.2.10.0. program. (www.R-project.org).
MULTIVARIATE BINARY RESPONSE
It is assumed that Yit is binary response, Yit = 1 as the subject i at the response of t choosing the alternative 1 and Yit = 0 if the subject of i at the response of t choosing the alternative of 2. Each individual has covariate Xi as individual characteristics i and covariate Zijt as characteristic of choice/alternative j at the individu of i.
Utility of subject i selecting the alternative of j on response t is:
Vijt = αjt+βjt Xi+γtZijt
By assuming that decision maker select the alternative based on the maximum utility value, the model can be expressed in the form of difference utility:
Vit = (Vi1tVi0t)
Xi+γt (Zi1t-Zi0t) = αt+βt
and εit = (εit-εi0t). The probability of subject of i selecting (yi1 = 1, ...., yiT = 1) is:
With εi = (εi1, ..., εiT)'.
Value of probability is calculated by multiple integral T depend on the parameter
θ = (θ1', θ2', ..., θT')
as well as distribution of ε (Train, 2003).
GEE ON MODEL LOGIT
GEE on multivariate binary response can be expressed in the following form :
Wi is called as observation matrix:
and Δi = diag (πil (1-πil)
... πiT (1-πiT)) Si is matrix TxT
and can be expressed as:
It is defined that and
Ri is working matrix with the correlation Yi
in the size of TxT.
Paired correlation coefficient between response of s and response of r, ρisr= corr (Yis, Yir) for i = 1, 2, ..n and s, r = 1, 2,
T. In order to estimate Ri, empirical correlation, ri is defined as vector with the size of T (T-1)/2 with the elements:
Empirical correlation rist that is contained parameter πis
and πit is unbias estimator for ρist (Liang
and Zeger, 1986). As it is assumed that ρist = ρst
for all of i, the estimator is:
Equation 4 and 5 can be solved at the same
time to get the parameter estimator θ and ρ.
Theorem 1: If is
GEE estimator that is solution of Eq. 4, so is
normal multivariate distributed:
MIXED LOGIT MODEL
Mixed logit model is combination of extreme value distribution and normal distribution assumption. Based on equation of utility difference (2), the individual effect δit is added:
εit has extreme value type I distribution. δit is ith individual effect of tth response and is independent respect to εit. It is assumed that δi = (δi1, ...., δiT) has multivariate normal, δit ~ N (0, 1) and δi~N (0, Σ). Parameter that will be estimated on this mixed Logit model are the parameter of regression coefficient (αt, βt, γt) and Σ. Parameter Σ are estimated by the Cholesky factor, C.
Individual effect δi can be can be modified in the following form:
where, C is matrix of Cholesky factor of Σ:
CC = Σ and ξi ~ N (0, 1)
I is identity matrix with the size of TxT. Generally, for t = 1,
, T, the utility function can be constructed:
value type I, ctl is an element of cholesky matrix. Based on Eq.
8 conditional logit model is derived with expression:
Marginal probability (for t and i) in the mixed logit model is:
φ (ξit) is standard normal density.
The joint conditional probability is:
where, c = (c11, c12, c13, c22, c23, c33) and ξi (ξil, ..., ξiT). The joint probability is:
φ (ξi) is normal standard density.
Theorem 2: If it is known the utility model as represented in Eq.
8 that is fit to regularity condition, so MLE for parameter of θt
= (αt ,βt, γt)' and is
the solution of estimator equation:
where, l = 1, ..., T. with:
be obtained by using iteration process of methods BHHH ((Berndt, Hall, Hall,
Hausman) or BFGS (Broyden, Fletcher, Goldfarb dan Shanno) (Chong
and Zak, 1996).
Simulation data was simulated by taking case T = 3 and n = 1000. Utility equation
Ui0t = α0t+β0t
dan Ui1t = α1t+β1t Xi+γt
For i = 1, ..., n; t = 1, 2, 3 and j = 0, 1. Equation of utility (13) can be transformed in the difference of utility: Uit = Ui1tUi0t and the corresponding model is:
Ui1 = Vi1+εi1; Ui2 = Vi2+εi2; Ui3 = Vi3+εi3
With Vit = (Vi1tVi0t) = αt+βt Xi+γt Zit; Zit = (Zi1t Zi0t); βt = β0t-β1t; αt = (α0t-α1t). Data was generated on parameter value of αt = -1, βt = 0.5 and γt = 0.3. Structure of correlation/covariances to be examined are r12 = ρ and r13 = r32 = 0:
Utility for t = 1 is in correlation with the utility for t = 2 and the correlation
order ρ = 0, 0.1, 0.2,..., 0.9. The value of observation variable Xi
and Zijt taken from normal distribution is:
Generated data with the assumption of εijt is extreme value distributed and δi~N (0, σ2). Estimation towards parameter was conducted by utilizing GEE and MLE for mixed logit model.
In order to estimate parameter using MLE, double integration was required and
for this purpose, Halton sequence can be implemented. Simulation program for
Halton can be adopted from library randtoolbox. For optimizing log-likelihood
function, optim function by available BFGS method in R.2.10 program was used
(R Development Core Team, 2009).
RESULT OF SIMULATION
Figure 1-21 depicted the effect of correlation
value to the each estimator bias. From results, it can be concluded that if
assumption by using GEE does not appropriate, the utilities that are correlated
each other will be bias. Utility 1 (Ui1) is in correlation with utility
2 (Ui2) and both are not correlated to utility 3 (Ui3).
Therefore, value of correlation only influences on parameters within Ui1
and Ui2. On both utilities, the estimator bias appreciable and comparable
respect to the correlation values (Fig. 1-9).
||Estimation by using MLE is more accurate compared to as in
using GEE. At bigger correlation between utilities, estimator on GEE will
has higher deviation (Fig. 1-9) and
this is caused misassumption. Model Logit model estimated by using GEE is
based on the assumption that error has variance of π2/3
while for data, the variance is (σ2+π2/3).
As the assumption fulfilled, Mixed Logit model can estimate all parameter
appropriately (Fig. 11-20)
||Correlation parameter estimator for MLE is better (smallet bias) compared
to as or GEE (Fig. 20, 21) more over,
Correlation parameter estimator for GEE tends to be underestimate (Fig.
||In low correlation (less than 0.5), MLE is robust respect to assumption
deviation and can estimate precisely. Deviation of assumption in GEE method
produce in bias estimator
||Box plot α1 (t = 1) in GEE
|| Box plot α2 (t = 2) in GEE
|| Box plot α3 (t = 3) in GEE
|| Box plot β1 (t = 1) in GEE
|| Box plot β2 (t = 2) in GEE
|| Box plot β3 (t = 3) in GEE
|| Box plot γ1 (t = 1) in GEE
|| Box plot γ2 (t = 2) in GEE
|| Box plot γ3 (t = 3) in GEE
|| Box plot ρ in GEE
|| Box plot α1 (t = 1) in MLE
|| Box plot α2 (t = 2) in MLE
|| Box plot α3 (t = 3) in MLE
|| Box plot β1 (t = 1) in MLE
||Box plot β2 (t = 2) in MLE
||Box plot β3 (t = 3) in MLE
|| Box plot γ1 (t = 1) in MLE
|| Box plot γ2 (t = 2) in MLE
|| Box plot γ3 (t = 3) in MLE
|| Box plot ρ in MLE
|| Graph of a bias ρ in GEE and MLE
Data are obtained from library Ecdat in R.2.10.0 which are data Heating. Data are resulted from survey of room heater household in California. From 850 families, there are several variables observed. Yi is heater system utilized, that are gc (gas central), gr (gas room), ec (electric central) and er (electric room). X1i is income of each family (income). X2i is age of familyhed (Agehed). X3i is amount of room in house (rooms) where, I = 1, ..., 850. Furthermore, multivariate approach with two responses was utilized. Yi1 is source of energy (g = 1) and electricity (e = 0). Yi2 is application type, central controled (c = 1) and room controled (r = 0). Characteristic variable for each response is defined in Table 1. From those data, tabulated data between energy source (Y1) and application type (Y2) (Table 2).
Based in Chi-squared Pearsons test, obtained value χ2 = 95.8792 (p-value<2.2e-16), means that energy source choosen by respondents and application type are not independent.
If there is correlation among responses, parameter estimator will be bias. On this condition, Mixed Logit model is more accurate in estimating parameter compared with as in discrete choice model. For three approximations used; univariate, GEE, mixed logit model, instalation cost and operational cost are the significant variables influence to the Y variable. In contrast, effect of individual charactersitics of X1i, X2i and X3i to the response is not significant.
Equation of marginal probability on independent model and GEE for energy source is as follow:
and the probability equation for application type is:
Estimator value for each parameter is presented in Table 3.
|| Characteristic variabel of election
||Crossed tabulation of energy source and application type
|| Estimator for three models
|Values in blanket is statistic Wald
Equation of marginal probability for heater system on Logit model is:
ξi has normal distribution with density φ (ξi) and:
Equation of marginal probability for the variable of application type is:
Equation of combined probability on mixed logit model is:
If Yi1 = g, so yi1 = 1 and if Yi1 = e so yi1 = 0. As Yi2 = c, yi2 = 1 and as Yi2 = r yi2 = 0.
By using mixed logit model, correlation value between energy source and application source is 0.628 otherwise by using GEE correlation value is 0.313. From simulation result, it is noted that corelation estimator for GEE tend to be underestimate, therefore, estimator correlation on Mixed Logit model is more trusted.
Mixed Logit model is based on assumption that error component has extreme value distribution and parameter has normal distribution. Based on data simulation, if there is a violation to the assumption (specifically related to variance) GEE estimator will be bias. Estimator bias will be higher at higher variance value. Estimator bias in GEE can be reduced by MLE method. The correlation of parameter estimator in GEE tends to be underestimate but MLE can be used for estimating correlation between parameters accurately.