INTRODUCTION
Multilevel linear models, or random coefficients models, are a type of mixed
model for hierarchical data in which each group at the higher level is
assumed to have its own regression slope as well as its own intercept
for predicting an individual's level of the dependent variable. The random
coefficients model is illustrated by Bryk and Raudenbush (1992),
Goldstein (1987), Langford (1987)
and Raudenbush et al. (2005). The two-level model
can be expressed in two equations, one for level 1 and one for level 2:
Level 1
y_{ij} = B_{0j} + B_{1j}X_{ij} + r_{ij},  i = 1, 2, ..., n_{j};  j = 1, 2, ..., J    (1)
where i refers to the level 1 unit and j refers to the level 2 unit, y_{ij}
is the response variable for level 1 unit i within level 2 unit j, B_{0j}
represents the random intercept for level 2 unit j, B_{1j} represents
the random slope of the variable X_{ij} for unit j and r_{ij} represents
the residual for unit i within unit j. Also, J is the number of level 2 units
and n_{j} is the sample size within the jth level 2 unit.
Level 2
B_{0j} = γ_{00} + γ_{01}W_{j} + U_{0j}
B_{1j} = γ_{10} + γ_{11}W_{j} + U_{1j}    (2)
In level 2 the parameters γ_{00} and γ_{10} are intercepts, while γ_{01} and γ_{11} are slopes predicting B_{0j} and B_{1j}, respectively, from W_{j}, an explanatory variable of level 2. Note that, in matrix form, W_{j} involves a (J+1) row vector of predictors arranged in a block diagonal fashion. Moreover, U_{0j} and U_{1j} are level 2 random errors (random effects) that are assumed to have zero means and an arbitrary variance covariance matrix.
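As an illustration of Eq. 1 and 2, the following Python sketch generates data from a two level model. The function name `simulate_two_level`, its argument names and the choice of uniform, exponential and normal distributions are assumptions made for the example; they are not part of the model definition.

```python
import numpy as np

def simulate_two_level(J, n_j, gamma, tau, sigma, rng):
    """Draw a balanced sample from the two level model (illustrative sketch).

    gamma : (g00, g01, g10, g11), the level 2 coefficients
    tau   : 2 x 2 covariance matrix of the random effects (U_0j, U_1j)
    sigma : standard deviation of the level 1 residual r_ij
    """
    g00, g01, g10, g11 = gamma
    W = rng.uniform(size=J)                              # level 2 predictor W_j (assumed U(0, 1))
    U = rng.multivariate_normal(np.zeros(2), tau, size=J)
    B0 = g00 + g01 * W + U[:, 0]                         # random intercepts, Eq. 2
    B1 = g10 + g11 * W + U[:, 1]                         # random slopes, Eq. 2
    group = np.repeat(np.arange(J), n_j)                 # index j for every level 1 unit
    X = rng.exponential(size=J * n_j)                    # level 1 predictor (assumed Exp(1))
    r = rng.normal(scale=sigma, size=J * n_j)            # level 1 residuals
    y = B0[group] + B1[group] * X + r                    # Eq. 1
    return y, X, W, group, B0, B1
```

Each level 1 unit inherits the intercept and slope of its group through the `group` index, which is the defining feature of the random coefficients model.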
The traditional method used to estimate the parameters of the model
given in Eq. 1 and 2 is the iterative generalized
least squares method, which is a sequential refinement procedure based on Ordinary
Least Squares (OLS) estimation. The method has been described in detail by Goldstein
(1986). For a known variance covariance matrix (V) of the level 1 residuals,
the GLS estimator of the coefficients in level 1 is:
The GLS analysis to estimate the level 2 parameters is:
Given that V^{*} = V ⊗ V,
where vec is the vector operator and ⊗ is
the Kronecker product.
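For a known V, the textbook GLS estimator is (X'V^{-1}X)^{-1}X'V^{-1}y. The sketch below computes that standard form with NumPy; the function name `gls_estimate` is an assumption, and the code covers only a single GLS step, not the full iterative procedure of Goldstein (1986).

```python
import numpy as np

def gls_estimate(y, X, V):
    """Textbook GLS step for a known covariance matrix V (illustrative sketch)."""
    Vinv_X = np.linalg.solve(V, X)          # V^{-1} X without forming V^{-1}
    Vinv_y = np.linalg.solve(V, y)          # V^{-1} y
    XtVinvX = X.T @ Vinv_X
    beta = np.linalg.solve(XtVinvX, X.T @ Vinv_y)
    cov_beta = np.linalg.inv(XtVinvX)       # covariance of the estimator when V is correct
    return beta, cov_beta
```

When V is the identity matrix this reduces to OLS, which is why the iterative scheme can start from an OLS fit.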
GENERALIZED MAXIMUM ENTROPY
The traditional maximum entropy formulation is based on the entropy-information
measure, which reflects the uncertainty about the occurrence of a collection
of events. Shannon (1948) defined the entropy of the
distribution (discrete events {x_{1}, x_{2}, ..., x_{k}} whose probabilities
of occurrence are p_{1}, p_{2}, ..., p_{k}) as the
average of self-information:
H(p) = -Σ_{i=1}^{k} p_{i} ln(p_{i})
where 0 ln(0) = 0.
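A small Python sketch of the entropy measure, using the convention 0 ln(0) = 0, may help fix ideas. The function name `shannon_entropy` is an assumption made for the example.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy -sum p_i ln(p_i), treating 0 ln(0) as 0."""
    p = np.asarray(p, dtype=float)
    positive = p > 0                      # zero probabilities contribute nothing
    return -np.sum(p[positive] * np.log(p[positive]))
```

The entropy is largest for the uniform distribution, where it equals ln(k), and it is zero when one event is certain.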
Since the 1990s, many attempts have been made to apply the method of
maximum entropy in the area of linear models. Golan et
al. (1996) proposed an estimator based on the maximum entropy formalism
of Jaynes (1957) that they called the Generalized Maximum
Entropy (GME) estimator. The idea underlying the GME approach in the general
linear model can be clarified by considering the following linear relationship:
y_{i} = x'_{i}β + ε_{i},  i = 1, 2, ..., n
where β = (β_{1}, β_{2}, ..., β_{K})'
is the vector of parameters to be estimated, the regressors x_{i},
i = 1, 2, ..., n are K-dimensional vectors whose values are assumed known and ε_{i},
i = 1, 2, ..., n is the random error.
In GME, the unknown parameters are reparameterized as β = ZP, where Z is a (K × KR) matrix and P is a KR-vector of weights such that p_{k} > 0 and p'_{k}1_{R} = 1 for each k. Simply, each β_{k}, k = 1, 2, ..., K can be defined by a set of equally spaced discrete points Z'_{k} = [z_{k1}, z_{k2}, z_{k3}, ..., z_{kR}], where R ≥ 2, with corresponding probabilities P'_{k} = [p_{k1}, p_{k2}, p_{k3}, ..., p_{kR}]. That is:
β_{k} = Σ_{r=1}^{R} z_{kr}p_{kr},  k = 1, 2, ..., K
In similar fashion, the disturbance term may be rewritten as ε = VW,
where V is an (n × nJ1) matrix and W is an nJ1-dimensional vector of weights;
that is to say:
ε_{i} = Σ_{j=1}^{J1} v_{ij}w_{ij},  i = 1, 2, ..., n
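The reparameterization β = ZP can be sketched numerically as follows. The helper name `support_matrix` and the use of one common support vector for every coefficient are assumptions made for illustration.

```python
import numpy as np

def support_matrix(z, K):
    """Build the (K x KR) block matrix Z with the same support z for each coefficient."""
    z = np.asarray(z, dtype=float).reshape(1, -1)
    return np.kron(np.eye(K), z)          # row k holds z in columns kR .. kR + R - 1

# Two coefficients, three support points each (illustrative values).
Z = support_matrix([-10.0, 0.0, 10.0], K=2)
P = np.array([0.1, 0.3, 0.6,              # weights for beta_1, summing to one
              1/3, 1/3, 1/3])             # uniform weights for beta_2
beta = Z @ P                              # beta_k = sum_r z_kr * p_kr
```

Uniform weights place the coefficient at the centre of its support, so any departure from zero has to be supported by the data constraints.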
The support Z should be chosen uniformly and symmetrically around zero with equally
spaced discrete points, for example Z = (-c, 0, c) for a large value c. On
the other hand, the actual bounds for v_{i} depend on the observed sample
as well as any conceptual or empirical information about the underlying error.
If no such conceptual or empirical information exists, then v_{i}
may be specified to be uniformly and symmetrically distributed around zero.
Chebychev's inequality may be used as a conservative means of specifying
sets of error bounds. For any random variable X such that E(X) = 0 and Var(X)
= σ^{2}, the inequality gives P(|X| < dσ) ≥ 1 - 1/d^{2},
d > 0; the Chebychev error bounds are then v_{1} = -dσ and
v_{n} = dσ. One can use the 3σ rule. The number of support
points for each parameter, R, and for the disturbance, J1, may be increased to
reflect higher moments or more refined prior knowledge about β and ε.
Based on Al-Nasser (2003), Al-Nasser
(2005), Ciavolino and Al-Nasser (2009) and Golan
(2008), it appears that the greatest improvement in precision comes from using
R = J1 = 5 support points.
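The error bounds described above can be sketched in Python as follows. The function names are assumptions, and using the sample standard deviation of y in place of the unknown σ is an empirical shortcut adopted here for illustration.

```python
import numpy as np

def chebyshev_coverage(d):
    """Lower bound on P(|X| < d * sigma) given by Chebychev's inequality."""
    return 1.0 - 1.0 / d**2

def error_support(y, d=3.0, points=3):
    """Equally spaced error support on [-d*s, d*s], with s the sample sd of y."""
    s = np.std(y, ddof=1)                  # assumption: sd of y stands in for sigma
    return np.linspace(-d * s, d * s, points)
```

With d = 3 the bound guarantees coverage of at least 8/9 for any distribution with finite variance, which is why the 3σ rule is regarded as conservative. Setting `points=5` gives the five-point support mentioned in the text.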
Now, using the reparameterized unknowns β = ZP and ε = VW,
we rewrite the general linear model as follows:
Y = XZP + VW
The maximum entropy principle may then be stated in scalar summations with two
nonnegative probability components, and the GME estimators can be obtained by
solving the following nonlinear programming problem:
Subject to:
Note that ⊗ is the Kronecker product, 1_{K} is a K-dimensional vector of ones and ln(P) = (ln(p_{11}), ln(p_{12}), ..., ln(p_{KR})). The GME system in Eq. 3 is a nonlinear programming system that can be solved by applying the Lagrangian method, in which, after forming the Lagrangian function, the first order conditions are solved.
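To make the estimation step concrete, here is a hedged Python sketch of GME for the linear model. It assumes the usual statement of the problem: maximize -P'ln(P) - W'ln(W) subject to Y = XZP + VW and the adding-up constraints on P and W. The first order conditions give weights proportional to exponentials in the Lagrange multipliers, and the code finds the multipliers with a damped Newton iteration on the data constraint. That numerical scheme, the function name `gme_linear` and the use of one common support `z` for all coefficients and one common support `v` for all errors are assumptions of this sketch, not the procedure used in the study.

```python
import numpy as np

def _softmax_rows(scores):
    """Row-wise normalised exponentials, computed stably."""
    expo = np.exp(scores - scores.max(axis=1, keepdims=True))
    return expo / expo.sum(axis=1, keepdims=True)

def gme_linear(y, X, z, v, max_iter=200, tol=1e-9):
    """GME estimate of beta in y = X beta + e (illustrative sketch)."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    z, v = np.asarray(z, float), np.asarray(v, float)

    def evaluate(lam):
        # First order conditions: p_kr ~ exp(-z_r x_k'lam), w_ij ~ exp(-v_j lam_i)
        P = _softmax_rows(-np.outer(X.T @ lam, z))   # (K, R) parameter weights
        W = _softmax_rows(-np.outer(lam, v))         # (n, J1) error weights
        beta, e = P @ z, W @ v
        return y - X @ beta - e, beta, e, P, W       # first item: constraint residual

    lam = np.zeros(len(y))                           # one multiplier per observation
    resid, beta, e, P, W = evaluate(lam)
    for _ in range(max_iter):
        if np.max(np.abs(resid)) < tol:
            break
        var_beta = P @ z**2 - beta**2                # variance of each parameter support
        var_e = W @ v**2 - e**2                      # variance of each error support
        jac = (X * var_beta) @ X.T + np.diag(var_e)  # Jacobian of the residual in lam
        step = -np.linalg.solve(jac, resid)
        t = 1.0
        trial = evaluate(lam + t * step)
        # Halve the step until the constraint residual shrinks.
        while np.linalg.norm(trial[0]) >= np.linalg.norm(resid) and t > 1e-10:
            t *= 0.5
            trial = evaluate(lam + t * step)
        lam = lam + t * step
        resid, beta, e, P, W = trial
    return beta, e, P, W
```

The returned `beta` equals the support points weighted by the estimated probabilities, and `X @ beta + e` reproduces `y` up to the tolerance, which is the data constraint of the GME system.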
GENERALIZED MAXIMUM ENTROPY TO RANDOM COEFFICIENT MODEL
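In order to estimate the two level random coefficient model using the GME method, we combine Eq. 1 and 2 into one equation:
y_{ij} = γ_{00} + γ_{01}W_{j} + γ_{10}X_{ij} + γ_{11}W_{j}X_{ij} + U_{0j} + U_{1j}X_{ij} + r_{ij}    (4)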
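Substituting the level 2 equations into the level 1 equation gives a fixed part with regressors 1, W_{j}, X_{ij} and W_{j}X_{ij}, plus a composite error built from U_{0j}, U_{1j} and r_{ij}. The sketch below assembles both parts; the function names and argument layout are assumptions made for illustration.

```python
import numpy as np

def combined_design(X, W, group):
    """Fixed-part design matrix with columns 1, W_j, X_ij, W_j * X_ij."""
    X = np.asarray(X, dtype=float)
    W_ij = np.asarray(W, dtype=float)[group]        # expand W_j to level 1 units
    return np.column_stack([np.ones_like(X), W_ij, X, W_ij * X])

def composite_error(X, U, r, group):
    """Composite error U_0j + U_1j * X_ij + r_ij of the combined model."""
    U = np.asarray(U, dtype=float)
    return U[group, 0] + U[group, 1] * np.asarray(X, float) + np.asarray(r, float)
```

Multiplying the design matrix by (γ_{00}, γ_{01}, γ_{10}, γ_{11}) and adding the composite error reproduces y_{ij}, which shows why the combined model has four fixed parameters and three error terms.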
In the general model of Eq. 4 there are four unknown parameters,
which should be reparametrized following the GME principles; that is,
each parameter is rewritten as a convex combination of the support points of a discrete random
variable, in the following matrix form:
Also, this model has three error terms, which should be reparametrized
in a similar manner:
Using these reparameterization expressions, the model can be rewritten
as:
Therefore, the GME nonlinear programming system is:
Maximize:
Subject to:
Note that ⊗
is the Kronecker product. This nonlinear programming system can be solved
numerically, and the final estimators are obtained by the following
formulas:
SIMULATION STUDY
A simple Monte Carlo simulation study is used to examine the performance of parameter estimation using GME. For the purposes of the simulation study we considered the following balanced random slope model:
Then the compound model is:
Table 1: Monte Carlo comparisons between OLS and GME for the random coefficient model
The simulation study was then performed under the following assumptions:
• Generate 1000 random samples of size n = 10, 20, 30 and 50, with number of intercepts J = 2
• The error ~ N(0, 1), X ~ Exp(1) and set β_{1} = β_{2} = 1
• The error u ~ N(0, 1), W ~ U(0, 1) and set γ_{0} = 1, γ_{1} = 1.5
• For the GME estimator, we use three support values for each parameter, {-10, 0, 10}, and three support values for the error term, {-3S, 0, 3S}, where S is the standard deviation of the dependent variable y
• The simulation results for estimating the fixed effect parameters are given in Table 1
The simulated bias and the efficiency are computed based on the following formulas:
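A common way to compute these quantities from Monte Carlo replicates is sketched below. The definitions used here (bias as the mean estimate minus the true value, and efficiency as a ratio of mean squared errors) are assumptions and may differ in detail from the formulas used in the study; the function names are also illustrative.

```python
import numpy as np

def bias_and_mse(estimates, true_value):
    """Simulated bias and mean squared error over Monte Carlo replicates."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean(axis=0) - true_value
    mse = ((est - true_value) ** 2).mean(axis=0)
    return bias, mse

def relative_efficiency(mse_reference, mse_candidate):
    """Values above 1 mean the candidate estimator has the smaller MSE."""
    return mse_reference / mse_candidate
```

For instance, passing the 1000 OLS estimates and the 1000 GME estimates of one coefficient to `bias_and_mse` and then to `relative_efficiency` gives a single number summarising which estimator is more efficient at a given sample size.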
Under the simulation assumptions, the results in Table 1
indicate the superiority of the GME estimation method over OLS. For all
sample sizes, the GME estimators are more accurate and more efficient
than their counterparts based on the OLS estimation method.
AN APPLICATION TO REAL DATA
The data of the High School and Beyond (HSB) study are used. These data consist of a total sample of 7,185 students nested within 160 schools, 90 public and 70 private. Between 14 and 67 students were assessed from each school, with a median of 47 students. The outcome of interest is a student-level measure of math achievement. The first predictor is a continuous measure of student socioeconomic status (SES). The second predictor is a dichotomous measure of school sector, in which a value of 0 reflects a public school and a value of 1 reflects a private school (70 of the 160 schools, about 44%, were private). The final predictor is a continuous measure of the disciplinary climate of the school, in which higher values reflect greater disciplinary problems. Since we need the average intercept and average slope, each school has its own regression model, which means that the intercept and the slope vary across schools.
The Model
The random coefficient model with two levels is used to represent the relationships
between the variables, where level 1 is the student level and consists
of two variables:
• SES: Socioeconomic status
• Mathach: Math achievement
The student-level model can be written as:
y_{ij} = β_{0j} + β_{1j}x_{ij} + r_{ij}    (5)
where y_{ij} is the outcome score for student i in school j and x_{ij} is the SES value for student i in school j. Each school's distribution of math achievement is characterized by two parameters: the intercept β_{0j} and the slope β_{1j}.
Fig. 1: Math achievement model
The intercept and slope parameters, β_{0j} and β_{1j},
vary across schools in the level 2 model, which consists of two variables:
• Sector: 1 = Private, 0 = Public
• Mean SES: Mean of the SES values for the students in this school who are included in level 1
Then the school-level model can be written as:
β_{0j} = γ_{00} + γ_{01}(Mean SES)_{j} + γ_{02}(Sector)_{j} + u_{0j}
β_{1j} = γ_{10} + γ_{11}(Mean SES)_{j} + γ_{12}(Sector)_{j} + u_{1j}    (6)
Where:
γ_{00} = Overall intercept
γ_{01} = The main effect of Mean SES
γ_{02} = The main effect of sector
γ_{10} = The main effect of SES
γ_{12} = Cross-level interaction of sector with student SES
γ_{11} = Cross-level interaction of Mean SES with student SES
u_{0j} and u_{1j} = Random errors
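The path diagram for the model is shown in Fig. 1.
The unified model, which combines Eq. 5 and 6, can be written as:
y_{ij} = γ_{00} + γ_{01}(Mean SES)_{j} + γ_{02}(Sector)_{j} + γ_{10}(SES)_{ij} + γ_{11}(Mean SES)_{j}(SES)_{ij} + γ_{12}(Sector)_{j}(SES)_{ij} + u_{0j} + u_{1j}(SES)_{ij} + r_{ij}    (7)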
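The regressors of the unified model can be assembled from student-level records as in the following sketch. The function name `hsb_design`, the argument names and the assumption that `sector` is supplied once per student are illustrative; Mean SES is computed as the school average of the student SES values, as defined in the level 2 model.

```python
import numpy as np

def hsb_design(ses, sector, school):
    """Design matrix with columns ordered as g00, g01, g02, g10, g11, g12."""
    ses = np.asarray(ses, dtype=float)
    sector = np.asarray(sector, dtype=float)         # one value per student (assumed)
    _, idx = np.unique(np.asarray(school), return_inverse=True)
    mean_ses = (np.bincount(idx, weights=ses) / np.bincount(idx))[idx]
    return np.column_stack([
        np.ones_like(ses),     # overall intercept
        mean_ses,              # main effect of Mean SES
        sector,                # main effect of sector
        ses,                   # main effect of SES
        mean_ses * ses,        # Mean SES by student SES interaction
        sector * ses,          # sector by student SES interaction
    ])
```

The two product columns carry the cross-level interactions, so their coefficients describe how the within-school SES slope changes with school characteristics.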
Table 2: Results for the educational research model: OLS and GME estimates
We estimate γ_{01} to study whether high-SES schools differ from low-SES schools in mean achievement (controlling for sector). Similarly, we estimate γ_{02} to learn whether private schools differ from public schools in mean achievement once Mean SES is controlled. These two estimates clarify whether Mean SES and sector, respectively, significantly predict the school intercept. By estimating γ_{11} we discover whether high-SES schools differ from low-SES schools in the strength of association between student SES and achievement within them (controlling for sector). Also, we estimate γ_{12} to examine whether private schools differ from public schools in the strength of association between student SES and achievement.
Now, in order to estimate the parameters using the GME estimation method, we need to reparametrize the unknowns and the error terms in Eq. 7. The GME system is then a nonlinear programming problem given as follows:
Maximize:
Subject to:
Hereafter, we use IMSL/Fortran to estimate the unknown parameters.
The estimation results are shown in Table 2.
The estimation results indicate that both methods give broadly similar effects
for the predictors, but the GME estimators have smaller standard errors. In general,
Mean SES is positively related to school mean math achievement: the estimate of
γ_{01} is 5.33 (0.37) (GLS) and 5.08 (0.24) (GME). Also, private schools have higher mean
achievement than public schools, controlling for the effect of Mean SES: the estimate of
γ_{02} is 1.23 (GLS) and 3.11 (GME). With regard to the slopes, there is a tendency for
schools of high Mean SES to have larger slopes than schools with low Mean
SES: the estimate of γ_{11} is 1.03 (GLS) and 2.95 (GME). The only difference between
the two methods is that the GLS estimate indicates that private schools have weaker
SES slopes (γ_{12} estimated as -1.64), while the GME estimate indicates that private
schools have positive SES slopes (γ_{12} estimated as 0.81).
CONCLUSION
This study proposed the GME estimation method for parameter estimation in random coefficient models. Comparing the GME estimates with their counterparts based on Goldstein's estimators, the simulation results showed that the GME estimates are often closer to the true parameter values than the GLS estimates. For all sample sizes used in the simulation study, there is an advantage in using the GME estimator. The real data analysis also supports the GME estimation method, since the GME estimators have smaller standard errors than the GLS estimators. The analysis of the given example suggests that SES is positively related to math achievement and that private schools have better achievement than public schools. Moreover, although the GLS method has advantages from the computational and interpretational points of view, the GME method was found to give more precise estimates. Consequently, the GME estimator can be recommended as an alternative method for estimating the parameters of the two level random coefficients model.
ACKNOWLEDGMENTS
The author would like to thank Luigi D'Ambra and Enrico Ciavolino, the MTISD2008 Program Committee and a referee for their valuable suggestions, which improved the contents of this study.