Research Article

Multilevel Linear Models Analysis using Generalized Maximum Entropy

A.D. Al-Nasser, O.M. Eidous and L.M. Mohaidat

The aim of this study is to introduce general multilevel models and to discuss the Generalized Maximum Entropy (GME) method for fitting them. The proposed procedure is applied to a two-level model, and the GME estimates are compared with Goldstein's generalized least squares (GLS) estimates using two criteria: bias and efficiency. We find that the estimates of the two-level model obtained with Goldstein's GLS approach were substantially and significantly biased, whereas the GME estimates were unbiased and consistent. We conclude that GME is a recommended procedure for fitting multilevel models. An application to real data from education is also discussed.


  How to cite this article:

A.D. Al-Nasser, O.M. Eidous and L.M. Mohaidat, 2010. Multilevel Linear Models Analysis using Generalized Maximum Entropy. Asian Journal of Mathematics & Statistics, 3: 111-118.

DOI: 10.3923/ajms.2010.111.118



Multilevel linear models, or random coefficient models, are a type of mixed model for hierarchical data in which each group at the higher level is allowed to have its own regression slope as well as its own intercept for predicting the individual-level dependent variable. Random coefficient models are described by Bryk and Raudenbush (1992), Goldstein (1987), Langford (1987) and Raudenbush et al. (2005). The two-level model can be expressed by two equations, one for level 1 and one for level 2:

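Level 1

$$y_{ij} = \beta_{0j} + \beta_{1j}X_{ij} + r_{ij}, \qquad i = 1, \dots, n_j;\ j = 1, \dots, J \qquad (1)$$
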
where i indexes the level-1 units and j the level-2 units; yij is the response for level-1 unit i within level-2 unit j; β0j is the random intercept of level-2 unit j; β1j is the random slope of the predictor Xij in unit j; and rij is the residual for unit i within unit j. J is the number of level-2 units and nj is the sample size within the jth level-2 unit.

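Level 2

$$\beta_{0j} = \gamma_{00} + \gamma_{01}W_j + U_{0j}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11}W_j + U_{1j} \qquad (2)$$
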
In level 2, γ00 and γ10 are intercepts, and γ01 and γ11 are slopes that predict β0j and β1j, respectively, from the level-2 explanatory variable Wj. In matrix form, Wj is arranged as a (J+1) row vector of predictors in block-diagonal fashion. U0j and U1j are level-2 random errors (random effects), assumed to have zero means and an arbitrary variance-covariance matrix.

The traditional method for estimating the parameters of the model in Eq. 1 and 2 is iterative generalized least squares, a sequential refinement procedure based on Ordinary Least Squares (OLS) estimation, described in detail by Goldstein (1986). For a known variance-covariance matrix V of the level-1 residuals, the GLS estimator of the level-1 coefficients is:

$$\hat{\beta} = (X^{\top}V^{-1}X)^{-1}X^{\top}V^{-1}Y$$
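As a numerical check of the GLS formula, the sketch below computes (X'V⁻¹X)⁻¹X'V⁻¹Y for a small, known V in plain Python. It covers only the single GLS step for a fixed V, not Goldstein's full iterative procedure, and the helper names (`solve`, `gls`, ...) are hypothetical. Because the toy response lies exactly on the line 1 + 2x, any positive-definite V returns those coefficients; with V = I the estimator reduces to OLS.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve A X = B for X by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [[x / m[r][r] for x in m[r][n:]] for r in range(n)]


def gls(X, y, V):
    """GLS estimate (X' V^-1 X)^-1 X' V^-1 y for a known covariance matrix V."""
    Xt = transpose(X)
    Vinv_X = solve(V, X)                      # V^-1 X without forming V^-1
    Vinv_y = solve(V, [[yi] for yi in y])     # V^-1 y as a column
    beta = solve(matmul(Xt, Vinv_X), matmul(Xt, Vinv_y))
    return [row[0] for row in beta]


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]                      # exactly 1 + 2x
V = [[2.0, 0.5, 0.0, 0.0],
     [0.5, 2.0, 0.5, 0.0],
     [0.0, 0.5, 2.0, 0.5],
     [0.0, 0.0, 0.5, 2.0]]
beta_hat = gls(X, y, V)
print(beta_hat)                               # approximately [1.0, 2.0]
```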
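The GLS estimator of the level-2 (random) parameters θ is:

$$\hat{\theta} = (Z^{*\top}V^{*-1}Z^{*})^{-1}Z^{*\top}V^{*-1}Y^{**}$$

where $Y^{**} = \operatorname{vec}(\tilde{Y}\tilde{Y}^{\top})$ with $\tilde{Y} = Y - X\hat{\beta}$, $V^{*} = V \otimes V$, $Z^{*}$ is the design matrix of the random parameters, vec is the vector operator and ⊗ is the Kronecker product.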


The traditional maximum entropy formulation is based on the entropy-information measure, which reflects the uncertainty about the occurrence of a collection of events. Shannon (1948) defined the entropy of a discrete distribution over events {x1, x2, ..., xk} with probabilities p1, p2, ..., pk as the average self-information:

$$H(p) = -\sum_{i=1}^{k} p_i \ln p_i$$

where 0 ln(0) = 0.
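A direct translation of Shannon's definition, assuming natural logarithms and skipping zero-probability terms to respect the convention 0 ln(0) = 0 (the function name is illustrative):

```python
import math


def shannon_entropy(p):
    """H(p) = -sum_i p_i ln p_i, with the convention 0 ln 0 = 0."""
    if any(pi < 0 for pi in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability vector")
    return sum(-pi * math.log(pi) for pi in p if pi > 0)


uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # ln 4, about 1.386
skewed = shannon_entropy([0.7, 0.2, 0.1, 0.0])        # smaller: less uncertainty
certain = shannon_entropy([1.0, 0.0, 0.0, 0.0])       # 0.0: no uncertainty
print(uniform, skewed, certain)
```

The uniform distribution attains the maximum, ln k; in GME the weights therefore stay uniform unless the data constraints pull them away.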

Since the 1990s, many attempts have been made to apply the maximum entropy method to linear models. Golan et al. (1996) proposed an estimator based on the maximum entropy formalism of Jaynes (1957), which they called the Generalized Maximum Entropy (GME) estimator. The idea underlying the GME approach can be illustrated with the general linear model:

$$y_i = x_i^{\top}\beta + \varepsilon_i, \qquad i = 1, 2, \dots, n$$

where β = (β1, β2, ..., βK)' is the vector of parameters to be estimated, the regressors xi, i = 1, 2, ..., n, are known K-dimensional vectors and εi, i = 1, 2, ..., n, is the random error.

In GME, the unknown parameters are reparameterized as β = ZP, where Z is a (K × KR) matrix and P is a KR-vector of weights such that pk > 0 and pk'1R = 1 for each k. That is, each βk, k = 1, 2, ..., K, is defined over a set of R ≥ 2 equally spaced discrete support points zk' = [zk1, zk2, ..., zkR] with corresponding probabilities pk' = [pk1, pk2, ..., pkR]:

$$\beta_k = \sum_{r=1}^{R} z_{kr}\,p_{kr}, \qquad k = 1, 2, \dots, K$$
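In a similar fashion, the disturbance term is rewritten as ε = VW, where V is an (n × nJ1) matrix and W is an nJ1-dimensional vector of weights; that is, each error is a convex combination of J1 ≥ 2 support points v1, ..., vJ1:

$$\varepsilon_i = \sum_{j=1}^{J_1} v_j\,w_{ij}, \qquad \sum_{j=1}^{J_1} w_{ij} = 1, \qquad i = 1, 2, \dots, n$$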
The support points in Z should be equally spaced and placed uniformly and symmetrically around zero, for example z = (-c, 0, c) with c a large value. The bounds for the error support points, on the other hand, depend on the observed sample as well as on any conceptual or empirical information about the underlying error. If no such information exists, the error supports may be specified to be uniformly and symmetrically distributed around zero. Chebyshev's inequality may be used as a conservative means of specifying the error bounds: for any random variable X with E(X) = 0 and Var(X) = σ², P(|X| < dσ) ≥ 1 - 1/d² for d > 0, so the Chebyshev error bounds are v1 = -dσ and vJ1 = dσ. A common choice is the 3σ rule (d = 3). The number of support points for each parameter, R, and for the disturbance, J1, may be increased to reflect higher moments or more refined prior knowledge about β and ε; based on Al-Nasser (2003), Al-Nasser (2005), Ciavolino and Al-Nasser (2009) and Golan (2008), it appears that the greatest improvement in precision comes from using R = J1 = 5 support points.
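The support choices described above can be sketched as follows, assuming equally spaced supports and the 3σ rule with σ replaced by the sample standard deviation of y (as in the simulation study reported later). The function names and the small sample are hypothetical.

```python
import statistics


def symmetric_support(c, points=5):
    """`points` equally spaced support values on [-c, c], symmetric around zero."""
    if points < 2 or c <= 0:
        raise ValueError("need at least 2 points and c > 0")
    step = 2.0 * c / (points - 1)
    return [-c + k * step for k in range(points)]


def chebyshev_error_support(sigma, d=3.0, points=5):
    """Error support with outer bounds v_1 = -d*sigma and v_J1 = d*sigma."""
    return symmetric_support(d * sigma, points)


def chebyshev_coverage(d):
    """Lower bound 1 - 1/d^2 on P(|X| < d*sigma) from Chebyshev's inequality."""
    return 1.0 - 1.0 / d ** 2


z = symmetric_support(10.0, points=5)          # [-10.0, -5.0, 0.0, 5.0, 10.0]
y = [2.1, 3.4, 1.8, 4.0, 2.7]                  # hypothetical sample of the response
v = chebyshev_error_support(statistics.stdev(y), d=3.0, points=5)
print(z, v, chebyshev_coverage(3.0))           # the 3-sigma rule guarantees at least 8/9
```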

Using the reparameterized unknowns β = ZP and ε = VW, the general linear model can be rewritten as:

$$y = XZP + VW$$
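The matrix form relies on Z being block diagonal, with one row of support points per parameter. Assuming a common support z for all K parameters, Z = I_K ⊗ z'. The sketch below (hypothetical helper names and weights) shows that ZP recovers each βk as a weighted average of its support points, that (I_K ⊗ 1_R')P returns the per-parameter weight sums, and that XZP equals Xβ.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def identity(k):
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


K, support = 2, [-10.0, 0.0, 10.0]              # common support z' for both parameters
Z = kron(identity(K), [support])                # (K x KR) matrix I_K (x) z'
P = [0.2, 0.3, 0.5,                             # p_1 (hypothetical weights)
     0.5, 0.5, 0.0]                             # p_2
beta = matvec(Z, P)                             # beta = Z P -> [3.0, -5.0]
adding_up = matvec(kron(identity(K), [[1.0] * 3]), P)   # (I_K (x) 1_R') P
X = [[1.0, 0.5], [1.0, 2.0]]
print(beta, adding_up, matvec(X, beta))         # X Z P is just X beta
```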
The maximum entropy principle can then be stated in terms of the two nonnegative probability components P and W, and the GME estimates are obtained by solving the following nonlinear programming problem:

$$\max_{P,\,W}\; H(P, W) = -P^{\top}\ln(P) - W^{\top}\ln(W) \qquad (3)$$

Subject to:

$$y = XZP + VW$$
$$(I_K \otimes 1_R^{\top})\,P = 1_K$$
$$(I_n \otimes 1_{J_1}^{\top})\,W = 1_n$$

where ⊗ is the Kronecker product, 1K is a K-dimensional vector of ones and ln(P) = (ln(p11), ln(p12), ..., ln(pKR))'. The GME system in Eq. 3 is a nonlinear programming problem that can be solved with the Lagrangian method: after forming the Lagrangian function, its first-order conditions are solved.
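The Lagrangian route can be sketched numerically. The first-order conditions give weights of the form p_kr ∝ exp(-z_r (X'λ)_k) and w_ij ∝ exp(-v_j λ_i), and substituting them back leaves an unconstrained convex dual in the n Lagrange multipliers λ, which the sketch below minimizes with damped Newton steps. This is a minimal pure-Python illustration assuming one common parameter support z and one common error support v; it is not the authors' IMSL/Fortran implementation, and all names (`gme`, `_solve`, ...) are hypothetical. At the optimum the dual gradient y - Xβ - ε is zero, so the model constraint of Eq. 3 holds, and the adding-up constraints hold by construction.

```python
import math
import statistics


def _lse(a):
    """Numerically stable log(sum(exp(a)))."""
    m = max(a)
    return m + math.log(sum(math.exp(x - m) for x in a))


def _weights(support, t):
    """Maximum-entropy weights proportional to exp(-support_r * t)."""
    a = [-s * t for s in support]
    m = max(a)
    e = [math.exp(x - m) for x in a]
    total = sum(e)
    return [x / total for x in e]


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gme(X, y, z, v, tol=1e-9, max_iter=200):
    """GME for y = X beta + e: damped Newton on the convex dual in lambda."""
    n, K = len(X), len(X[0])

    def probs(lam):
        t = [sum(X[i][k] * lam[i] for i in range(n)) for k in range(K)]  # X' lambda
        return t, [_weights(z, tk) for tk in t], [_weights(v, li) for li in lam]

    def dual(lam):
        t, _, _ = probs(lam)
        return (sum(yi * li for yi, li in zip(y, lam))
                + sum(_lse([-s * tk for s in z]) for tk in t)
                + sum(_lse([-s * li for s in v]) for li in lam))

    lam = [0.0] * n
    for _ in range(max_iter):
        _, P, W = probs(lam)
        beta = [sum(s * p for s, p in zip(z, pk)) for pk in P]    # beta_k = z' p_k
        eps = [sum(s * w for s, w in zip(v, wi)) for wi in W]     # e_i = v' w_i
        var_b = [sum(s * s * p for s, p in zip(z, pk)) - b * b for pk, b in zip(P, beta)]
        var_e = [sum(s * s * w for s, w in zip(v, wi)) - e * e for wi, e in zip(W, eps)]
        # Gradient of the dual is the model residual y - X beta - e
        g = [y[i] - sum(X[i][k] * beta[k] for k in range(K)) - eps[i] for i in range(n)]
        if max(abs(gi) for gi in g) < tol:
            break
        # Hessian of the dual: X diag(var_b) X' + diag(var_e)
        H = [[sum(X[i][k] * X[l][k] * var_b[k] for k in range(K))
              + (var_e[i] if i == l else 0.0) for l in range(n)] for i in range(n)]
        step = _solve(H, g)
        f0 = dual(lam)
        slope = sum(gi * si for gi, si in zip(g, step))
        alpha = 1.0
        while alpha > 1e-12:                      # backtracking line search
            cand = [li - alpha * si for li, si in zip(lam, step)]
            if dual(cand) <= f0 - 1e-4 * alpha * slope:
                break
            alpha /= 2
        lam = cand
    return beta, eps, P, W


xs = [0.5 * i for i in range(12)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1, 0.2, -0.2]  # illustrative
y = [1.0 + 2.0 * x + e for x, e in zip(xs, noise)]
X = [[1.0, x] for x in xs]
S = statistics.stdev(y)
beta, eps, P, W = gme(X, y, z=[-10.0, 0.0, 10.0], v=[-3 * S, 0.0, 3 * S])
print(beta)
```

Working in the dual keeps the problem to n unknowns instead of KR + nJ1 constrained probabilities; the point estimates are then recovered as β̂k = zk'p̂k.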


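To estimate the two-level random coefficient model by GME, we combine Eq. 1 and 2 into a single equation:

$$y_{ij} = \gamma_{00} + \gamma_{01}W_j + \gamma_{10}X_{ij} + \gamma_{11}W_jX_{ij} + U_{0j} + U_{1j}X_{ij} + r_{ij} \qquad (4)$$
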
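Eq. 4 follows by substituting the level-2 equations into the level-1 equation. The sketch below checks this numerically: it computes y level by level and from the combined form, using hypothetical γ values and random draws for W, X and the error terms (the function names are illustrative).

```python
import random


def level_wise(x, w, g, u0, u1, r):
    """y from Eq. 1 and 2: unit-specific coefficients first, then the level-1 equation."""
    b0 = g["00"] + g["01"] * w + u0
    b1 = g["10"] + g["11"] * w + u1
    return b0 + b1 * x + r


def combined(x, w, g, u0, u1, r):
    """y from Eq. 4: fixed part on (1, W_j, X_ij, W_j X_ij) plus composite error."""
    fixed = g["00"] + g["01"] * w + g["10"] * x + g["11"] * w * x
    return fixed + (u0 + u1 * x + r)


rng = random.Random(1)
gammas = {"00": 1.0, "01": 0.5, "10": 1.5, "11": -0.3}   # hypothetical values
max_gap = 0.0
for j in range(5):                        # level-2 units
    w, u0, u1 = rng.random(), rng.gauss(0, 1), rng.gauss(0, 1)
    for i in range(4):                    # level-1 units within unit j
        x, r = rng.expovariate(1.0), rng.gauss(0, 1)
        gap = abs(level_wise(x, w, gammas, u0, u1, r) - combined(x, w, gammas, u0, u1, r))
        max_gap = max(max_gap, gap)
print(max_gap)                            # only floating-point rounding remains
```

The combined form also makes visible that the composite error U0j + U1jXij + rij is shared within each level-2 unit and has a variance that changes with Xij.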
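The combined model in Eq. 4 has four unknown parameters, which are reparameterized following the GME principle: each parameter is written as a convex combination of R support points,

$$\gamma_{ab} = z_{ab}^{\top}p_{ab} = \sum_{r=1}^{R} z_{ab,r}\,p_{ab,r}, \qquad ab \in \{00, 01, 10, 11\}$$

The model also has three error terms, which are reparameterized in the same manner over J1 support points:

$$U_{0j} = v_0^{\top}w_{0j}, \qquad U_{1j} = v_1^{\top}w_{1j}, \qquad r_{ij} = v_2^{\top}w_{2ij}$$

Using these reparameterizations, the model can be rewritten as:

$$y_{ij} = z_{00}^{\top}p_{00} + (z_{01}^{\top}p_{01})W_j + (z_{10}^{\top}p_{10})X_{ij} + (z_{11}^{\top}p_{11})W_jX_{ij} + v_0^{\top}w_{0j} + (v_1^{\top}w_{1j})X_{ij} + v_2^{\top}w_{2ij}$$

The GME nonlinear programming system is therefore:

$$\max_{p,\,w}\; H = -\sum_{ab} p_{ab}^{\top}\ln(p_{ab}) - w_0^{\top}\ln(w_0) - w_1^{\top}\ln(w_1) - w_2^{\top}\ln(w_2)$$

Subject to the rewritten model above for every i and j, together with the adding-up constraints

$$(I_4 \otimes 1_R^{\top})\,p = 1_4, \qquad (I_J \otimes 1_{J_1}^{\top})\,w_0 = 1_J, \qquad (I_J \otimes 1_{J_1}^{\top})\,w_1 = 1_J, \qquad (I_N \otimes 1_{J_1}^{\top})\,w_2 = 1_N$$

where p stacks p00, p01, p10 and p11; w0 and w1 stack the level-2 weight vectors w0j and w1j over j = 1, ..., J; w2 stacks the level-1 weight vectors w2ij; N = n1 + ... + nJ; and ⊗ is the Kronecker product. This nonlinear programming system can be solved numerically, and the final estimators are obtained from the estimated probabilities as:

$$\hat{\gamma}_{ab} = z_{ab}^{\top}\hat{p}_{ab}, \qquad \hat{U}_{0j} = v_0^{\top}\hat{w}_{0j}, \qquad \hat{U}_{1j} = v_1^{\top}\hat{w}_{1j}, \qquad \hat{r}_{ij} = v_2^{\top}\hat{w}_{2ij}$$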

A simple Monte Carlo simulation study was carried out to examine the performance of GME parameter estimation. For this purpose we considered the following balanced random slope model:

The corresponding combined model is:

Table 1: Monte Carlo comparisons between OLS and GME for the random coefficient model

The simulation study was performed under the following assumptions:

1000 random samples were generated for each sample size n = 10, 20, 30 and 50, with number of intercepts J = 2
The level-1 error follows N(0, 1), X ~ Exp(1), and β1 = β2 = 1
The level-2 error u ~ N(0, 1), W ~ U(0, 1), and γ0 = 1, γ1 = 1.5
For the GME estimator, three support points {-10, 0, 10} were used for each parameter and three support points {-3S, 0, 3S} for the error term, where S is the standard deviation of the dependent variable y
The simulation results for the fixed-effect parameters are given in Table 1

The simulated bias and the efficiency are computed based on the following formulas:

Under the simulation assumptions, the results in Table 1 indicate that the GME estimation method outperforms OLS: for all sample sizes, the GME estimators are more accurate and more efficient than their OLS counterparts.
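The exact bias and efficiency formulas behind Table 1 are not given here, so the sketch below assumes the usual Monte Carlo definitions: bias as the mean estimate minus the true value, efficiency through the mean squared error (MSE), and relative efficiency as MSE(OLS)/MSE(GME), so that values above 1 favour GME. The replicate estimates are hypothetical numbers, not values from Table 1, and the function names are illustrative.

```python
def mc_bias_mse(estimates, true_value):
    """Simulated bias (mean estimate minus truth) and mean squared error."""
    m = len(estimates)
    bias = sum(estimates) / m - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / m
    return bias, mse


def relative_efficiency(est_a, est_b, true_value):
    """MSE(a) / MSE(b); values above 1 mean b is more efficient than a."""
    return mc_bias_mse(est_a, true_value)[1] / mc_bias_mse(est_b, true_value)[1]


# Hypothetical replicate estimates of a parameter whose true value is 1.0
ols_est = [0.7, 1.4, 0.9, 1.3]
gme_est = [0.9, 1.1, 1.0, 1.2]
bias, mse = mc_bias_mse(gme_est, 1.0)
re = relative_efficiency(ols_est, gme_est, 1.0)
print(bias, mse, re)
```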


The data come from the High School and Beyond (HSB) study. They consist of a total sample of 7,185 students nested within 160 schools, 90 public and 70 private. Between 14 and 67 students were assessed in each school, with a median of 47. The outcome of interest is a student-level measure of math achievement. The first predictor is a continuous measure of student socioeconomic status (SES). The second predictor is a dichotomous measure of school sector, coded 0 for public and 1 for private schools (49% of schools were private). The final predictor is a continuous measure of the disciplinary climate of the school, in which higher values reflect greater disciplinary problems. Because each school has its own regression model, the intercept and the slope vary across schools, and the quantities of interest are the average intercept and the average slope.

The Model
A two-level random coefficient model is used to represent the relationships between the variables. Level 1 is the student level and consists of two variables:

SES: Socio-economic status
Mathach: Math achievement

Fig. 1: Math achievement model

$$y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + r_{ij} \qquad (5)$$

where yij is the math achievement score of student i in school j, xij is the SES value of student i in school j and rij is the level-1 residual. Each school's distribution of math achievement is characterized by two parameters: the intercept β0j and the slope β1j.

The intercept and slope parameters β0j and β1j vary across schools according to the level-2 model, which consists of two variables:

Sector: 1 = Private, 0 = Public
Mean SES: Mean of the SES values of the students in this school who are included in level 1

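The school-level model can then be written as:

$$\beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{MeanSES}_j + \gamma_{02}\,\text{Sector}_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11}\,\text{MeanSES}_j + \gamma_{12}\,\text{Sector}_j + u_{1j} \qquad (6)$$
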
γ00 = Overall intercept
γ01 = Main effect of Mean SES
γ02 = Main effect of sector
γ10 = Main effect of student SES
γ12 = Cross-level interaction between sector and student SES
γ11 = Cross-level interaction between Mean SES and student SES
u0j and u1j = Level-2 random errors

The path diagram for the model is shown in Fig. 1.

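The unified model, which combines Eq. 5 and 6, can be written as:

$$y_{ij} = \gamma_{00} + \gamma_{01}\,\text{MeanSES}_j + \gamma_{02}\,\text{Sector}_j + \gamma_{10}x_{ij} + \gamma_{11}\,\text{MeanSES}_j\,x_{ij} + \gamma_{12}\,\text{Sector}_j\,x_{ij} + u_{0j} + u_{1j}x_{ij} + r_{ij} \qquad (7)$$
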
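Building the regressors of Eq. 7 from student records can be sketched as follows: Mean SES is computed per school, and each student row holds (1, MeanSESj, Sectorj, xij, MeanSESj·xij, Sectorj·xij), in the order of γ00, γ01, γ02, γ10, γ11, γ12. The records are a hypothetical toy sample, not the HSB data, and the function name is illustrative; SES is used as recorded (uncentered), since the text defines xij as the student's SES value, and disciplinary climate is omitted because Eq. 7 does not use it.

```python
from collections import defaultdict


def design_rows(records):
    """Rows (1, MeanSES_j, Sector_j, x_ij, MeanSES_j*x_ij, Sector_j*x_ij) for Eq. 7.

    records: list of (school_id, sector, ses, mathach) tuples.
    """
    ses_by_school = defaultdict(list)
    for school, _, ses, _ in records:
        ses_by_school[school].append(ses)
    mean_ses = {s: sum(v) / len(v) for s, v in ses_by_school.items()}
    X, y = [], []
    for school, sector, ses, mathach in records:
        m = mean_ses[school]
        X.append([1.0, m, float(sector), ses, m * ses, sector * ses])
        y.append(mathach)
    return X, y, mean_ses


toy = [("A", 1, 0.5, 14.0), ("A", 1, 1.5, 18.0),
       ("B", 0, -1.0, 8.0), ("B", 0, 0.0, 11.0), ("B", 0, 1.0, 13.0)]
X, y, mean_ses = design_rows(toy)
print(mean_ses)   # {'A': 1.0, 'B': 0.0}
print(X[1])       # [1.0, 1.0, 1.0, 1.5, 1.5, 1.5]
```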
Table 2: Results for educational research model: OLS and GME estimates

We estimate γ01 to study whether high-SES schools differ from low-SES schools in mean achievement (controlling for sector). Similarly, we estimate γ02 to learn whether private schools differ from public schools in mean achievement once Mean SES is controlled. These two estimates show whether Mean SES and sector, respectively, significantly predict the school intercept. By estimating γ11 we examine whether high-SES schools differ from low-SES schools in the strength of the association between student SES and achievement within them (controlling for sector). Finally, we estimate γ12 to examine whether private schools differ from public schools in the strength of the association between student SES and achievement.

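To estimate the parameters by GME, we reparameterize the unknowns and the error terms in Eq. 7: each γab = zab'pab is a convex combination of R support points, and u0j = v0'w0j, u1j = v1'w1j and rij = v2'w2ij are convex combinations of J1 support points. The GME system is then the nonlinear programming problem:

$$\max_{p,\,w}\; H = -\sum_{ab} p_{ab}^{\top}\ln(p_{ab}) - w_0^{\top}\ln(w_0) - w_1^{\top}\ln(w_1) - w_2^{\top}\ln(w_2), \qquad ab \in \{00, 01, 02, 10, 11, 12\}$$

Subject to:

$$y_{ij} = z_{00}^{\top}p_{00} + (z_{01}^{\top}p_{01})\,\text{MeanSES}_j + (z_{02}^{\top}p_{02})\,\text{Sector}_j + (z_{10}^{\top}p_{10})\,x_{ij} + (z_{11}^{\top}p_{11})\,\text{MeanSES}_j\,x_{ij} + (z_{12}^{\top}p_{12})\,\text{Sector}_j\,x_{ij} + v_0^{\top}w_{0j} + (v_1^{\top}w_{1j})\,x_{ij} + v_2^{\top}w_{2ij}$$

for every student i and school j, with every weight vector nonnegative and summing to one.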
We used IMSL Fortran routines to estimate the unknown parameters. The estimation results are shown in Table 2.

The estimation results indicate that both methods give broadly similar effects for the predictors, but the GME estimators have smaller standard errors. Mean SES is positively related to school mean math achievement: γ̂01 = 5.33 (0.37) (GLS) and 5.08 (0.24) (GME). Private schools have higher mean achievement than public schools after controlling for Mean SES: γ̂02 = 1.23 (GLS) and 3.11 (GME). With regard to the slopes, schools with high Mean SES tend to have larger SES slopes than schools with low Mean SES: γ̂11 = 1.03 (GLS) and 2.95 (GME). The only qualitative difference between the two methods concerns sector: the GLS estimate indicates that private schools have weaker SES slopes (γ̂12 = -1.64), while the GME estimate is positive (γ̂12 = 0.81), indicating stronger SES slopes in private schools.


This study proposed the GME method for estimating the parameters of random coefficient models. In comparisons with Goldstein's estimators, the simulation results showed that the GME estimates were superior and often closer to the true parameter values than the GLS estimates, with an advantage for GME at every sample size considered. The real-data analysis also supports the robustness of the GME method, since the GME estimators have smaller standard errors than the GLS estimators. The analysis of the example suggests that SES is positively related to math achievement and that private schools have higher achievement than public schools. Although the GLS method has advantages from a computational and interpretational point of view, the GME method was found to give more precise estimates. Consequently, the GME estimator can be recommended as an alternative method for estimating the parameters of two-level random coefficient models.


The authors would like to thank Luigi D'Ambra and Enrico Ciavolino, the MTISD2008 Program Committee and a referee for their valuable suggestions, which improved the content of this study.

Al-Nasser, A.D., 2003. Customer satisfaction measurement models: Generalized maximum entropy approach. Pak. J. Stat., 19: 213-226.

Al-Nasser, A.D., 2005. Entropy type estimator to simple linear measurement error models. Aust. J. Stat., 34: 283-294.

Bryk, A.S. and S.W. Raudenbush, 1992. Hierarchical Linear Models: Applications and Data Analysis Methods. Sage Publications, Newbury Park, CA.

Ciavolino, E. and A.D. Al-Nasser, 2009. Comparing generalized maximum entropy and partial least squares methods for structural equation models. J. Nonparam. Stat., 21: 1017-1036.

Golan, A., 2008. Information and entropy econometrics: A review and synthesis. Found. Trends Econometrics, 2: 1-145.

Golan, A., G.G. Judge and D. Miller, 1996. Maximum Entropy Econometrics: Robust Estimation with Limited Data. Wiley, New York, USA., ISBN-13: 978-0471953111, Pages: 324.

Goldstein, H.I., 1987. Multilevel Models in Educational and Social Research. Oxford University Press, London.

Goldstein, H.I., 1986. Multilevel mixed linear model analysis using iterative generalized least squares. Biometrika, 73: 43-56.

Jaynes, E.T., 1957. Information theory and statistical mechanics (I, II). Phys. Rev., 108: 171-190.

Langford, N.T., 1987. A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects. Biometrika, 74: 817-827.

Raudenbush, S., A. Bryk and R. Congdon, 2005. HLM: Hierarchical Linear and Non-linear Modeling. Scientific Software International, Chicago, IL.

Shannon, C.E., 1948. A mathematical theory of communication. Bell Syst. Tech. J., 27: 379-423.
