Research Article

# Estimating Multinomial Logit Model with Multicollinear Data

ABSTRACT

The multinomial logit model is used to study the dependence relationship between a categorical response variable with more than two categories and a set of explanatory variables. In the presence of multicollinearity, the estimation of the multinomial logit model parameters becomes inaccurate. To solve this problem, we develop an extension of principal component logistic regression. Finally, a simulation study illustrates the advantages of the method.


 How to cite this article: I. Camminatiello and A. Lucadamo, 2010. Estimating Multinomial Logit Model with Multicollinear Data. Asian Journal of Mathematics & Statistics, 3: 93-101. DOI: 10.3923/ajms.2010.93.101 URL: https://scialert.net/abstract/?doi=ajms.2010.93.101

INTRODUCTION

Regression analysis studies the statistical dependence of one or more dependent variables, Y, on one or more explanatory variables, X. All procedures used and conclusions drawn in a regression analysis depend on the assumptions of the regression model. The most used model is the classic linear regression model and the most used method for estimating its parameters is the method of Ordinary Least Squares (OLS).

Under the classic assumptions this method has some attractive statistical properties that have made it one of the most powerful and popular methods of regression analysis. However, OLS is not appropriate when strong correlation among predictors (multicollinearity) exists, so alternative methods to OLS have been developed.

The presence of multicollinearity may indicate that some explanatory variables are linear combinations of the others. Consequently, they do not improve the explanatory power of the model and could be dropped from it. In some situations, however, it is not feasible to use variable selection to reduce the number of explanatory variables, or it is not desirable to do so.

An alternative way of dealing with unpleasant consequences of multicollinearity lies in biased estimation: we can sacrifice a small bias for a significant reduction in variance of an estimator. This observation motivates a whole class of biased estimators called shrinkage estimators.

There are various shrinkage methods that perform well under multicollinearity and that can possibly act as variable selection tools as well: Ridge Regression (RR) (Hoerl and Kennard, 1970) and its modifications, Partial Least Squares (PLS) regression (Wold, 1966, 1985) and Principal Components Regression (PCR) (Massy, 1965).

Frank et al. (1993) compared the methods mentioned above. Although the results are conditional on the simulation design used in the study, the researchers indicate that RR, PCR and PLS have similar properties, give similar performance and are highly preferable to variable selection.

When the response is measured on nominal scale with more than two categories, the multinomial logit model (Luce, 1959; McFadden, 1974) is applied.

In this study, we show that, in the presence of multicollinearity, the estimation of the multinomial model parameters becomes inaccurate because of the need to invert near-singular, ill-conditioned information matrices. To provide an accurate estimation of the model parameters, we propose a new strategy based on PCR. Then, the performance of the proposed model is analyzed through a simulation study. In order to validate the results obtained from each simulation we apply a bootstrap procedure.

A SOLUTION TO MULTICOLLINEARITY: PRINCIPAL COMPONENT MULTINOMIAL REGRESSION

The binary logit model is used to predict a binary response variable in terms of a set of explicative ones. When the response is measured on nominal scale with more than two categories, the multinomial logit model is applied. Both the models become unstable when there is multicollinearity among predictors (Ryan, 1997).

To improve the estimation of the binary logit model parameters, Marx (1992) introduced the iteratively reweighted partial least squares algorithm, Bastien et al. (2005) proposed partial least squares logit regression, Aguilera et al. (2006) presented Principal Component Logistic Regression (PCLR) and Vago and Kemeny (2006) developed ridge logistic regression.

Following the approach proposed by Aguilera et al. (2006), we propose to use as covariates of the multinomial logit model a set of orthogonal variables, linear combinations of the original ones, in order to provide an accurate estimation of the parameters.

Here, we describe Principal Component Analysis (PCA) and multinomial logit regression, then we illustrate our idea.

Principal Component Analysis
Principal Component Analysis (PCA) is a multivariate technique that transforms a number of correlated variables into a (smaller) number of uncorrelated variables with maximum variance, called Principal Components (PC). The first principal component accounts for as much of the variability in the data as possible and each succeeding component accounts for as much of the remaining variability as possible.

Let X be a set of p quantitative independent variables and y a categorical response variable with more than two categories. The aim of PCA is to find a set of uncorrelated latent variables that are linear combinations of the original variables, Z = XV. The weight matrix V is built from the eigenvectors of the covariance or correlation matrix, both of which can be calculated from the data matrix X. The covariance matrix contains scaled sums of squares and cross products; the correlation matrix is similar, but the variables (i.e., the columns) are standardized first. For reasons beyond the scope of this paper, it is often preferable to perform the analysis on the correlation matrix R, whose elements are the correlation coefficients among the independent variables. The basic properties of the analysis are:

• The PC’s are orthogonal
• The weights used to determine the PC’s maximize the variance among the x variables, so the first a components account for the largest possible share of the total variability
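The construction just described can be sketched with NumPy; the toy data below are hypothetical, with one nearly collinear column added to mimic the multicollinear setting of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                    # hypothetical data matrix, n = 100, p = 4
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=100)  # induce multicollinearity

# Correlation matrix R of the columns of X
R = np.corrcoef(X, rowvar=False)

# Eigendecomposition: the columns of V are the weights, sorted by decreasing eigenvalue
eigvals, V = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Principal components Z = XV (on standardized data): uncorrelated, variances = eigenvalues
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
Z = X_std @ V
```

The near-collinear fourth column shows up as a very small last eigenvalue, which is exactly what makes the information matrix ill-conditioned later on.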

Multinomial Logit Regression
Multinomial logit regression is the simplest model in discrete choice analysis when more than two alternatives are in a choice set. It is derived from utility-maximizing theory that states that consumer chooses the alternative which maximizes his utility. Obviously not all the attributes of the alternatives will be observed and for this reason the utility is divided in two parts:

• Dib is the systematic part of the utility that the individual i receives from a generic alternative b
• εib is the random part and summarizes the contribution of unobserved variables (Ben-Akiva and Lerman, 1985)

The probability that the individual i selects a specific alternative c is then:

$$P_i(c) = \Pr\left(D_{ic} + \varepsilon_{ic} > D_{ib} + \varepsilon_{ib},\ \forall\, b \neq c\right) \qquad (1)$$

where Dic is the systematic part of the utility that the individual i receives from the alternative c, εic is the disturbance and s is the number of alternatives.

If we assume that the disturbances are independent and identically extreme value distributed (Marschak, 1960), we obtain the multinomial logit model. The probability can then be expressed as follows:

$$P_i(c) = \frac{e^{\mu D_{ic}}}{\sum_{b=1}^{s} e^{\mu D_{ib}}} \qquad (2)$$

The term μ is a scale parameter and can be normalized to 1; furthermore, if the systematic part of the utility is linear in the parameters, we have (Train, 2003):

$$D_{ib} = \sum_{j=1}^{p} \beta_{jb}\, x_{ij} \qquad (3)$$

where xij (i = 1,...,n; j = 1,...,p) are the elements of the X matrix and βjb are the parameters to be estimated.
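The probability of Eq. 2 with the linear utilities of Eq. 3 amounts to a softmax over linear scores; a minimal sketch, with μ normalized to 1 and entirely hypothetical data and parameter values:

```python
import numpy as np

def mnl_probabilities(X, beta):
    """Choice probabilities of Eq. 2, with linear systematic utilities D = X @ beta (Eq. 3).

    X    : (n, p) matrix of regressors
    beta : (p, s) matrix of parameters, one column per alternative
    """
    D = X @ beta                                     # systematic utilities D_ib
    expD = np.exp(D - D.max(axis=1, keepdims=True))  # shift for numerical stability
    return expD / expD.sum(axis=1, keepdims=True)

# Hypothetical example: n = 3 individuals, p = 2 regressors, s = 3 alternatives;
# for identification the last alternative's parameters are fixed at zero
X = np.array([[1.0, 0.5], [0.2, -1.0], [0.0, 0.0]])
beta = np.array([[0.5, -0.3, 0.0],
                 [1.0,  0.2, 0.0]])
P = mnl_probabilities(X, beta)
```

Fixing one alternative's coefficients at zero mirrors the "fixed alternative" used later in the simulation tables.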

Principal Component Multinomial Regression
At this point, we can present the new approach: Principal Component Multinomial Regression (PCMR). At first step, PCMR creates the PC’s of the regressors as described above. At second step the multinomial model is carried out on the set of p PC’s. The probability, for the individual i, to choose the alternative c can be expressed in terms of all PC’s as: (4)

where:
• zik (i = 1,...,n; k = 1,...,p) are the elements of the PC matrix Z
• vkj (j = 1,...,p) are the elements of the transposed matrix VT
• γkb are the coefficients to be estimated
• βjb are the parameters expressed in function of the original variables
• s is the number of alternatives of the data set

At the third step, the number of PC’s a < p to be retained in the model is chosen; the next paragraph discusses the different tools for selecting it. At the fourth step, the multinomial model is fitted on the subset of a < p PC’s. The probability that the individual i chooses the alternative c can be expressed in terms of the a PC’s as:

$$P_i(c) = \frac{e^{\sum_{k=1}^{a} \gamma^{(a)}_{kc}\, z_{ik}}}{\sum_{b=1}^{s} e^{\sum_{k=1}^{a} \gamma^{(a)}_{kb}\, z_{ik}}} \qquad (5)$$

where:
• γ(a)kb are the coefficients to be estimated on the subset of a PC’s
• β(a)jb are the PCMR parameters obtained after the extraction of the a components

Finally, the multinomial model parameters can be expressed in function of the original variables (X matrix):

$$P_i(c) = \frac{e^{\sum_{j=1}^{p} \beta^{(a)}_{jc}\, x_{ij}}}{\sum_{b=1}^{s} e^{\sum_{j=1}^{p} \beta^{(a)}_{jb}\, x_{ij}}} \qquad (6)$$

where β(a) = V(a)γ(a); Z(a) is the matrix of the a PC’s; γ(a) is the matrix of parameters on the a PC’s for the s alternatives; V(a) is the matrix of the a eigenvectors; β(a) is the matrix of parameters expressed in function of the original variables. An interesting result is that β(p) = β: if we retain all the PC’s in the model, the matrix of parameters expressed in function of the original variables, β(p), is equal to the matrix of classical multinomial parameters, β.
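The back-transformation β(a) = V(a)γ(a) is a single matrix product; a sketch with hypothetical dimensions (p = 5 regressors, a = 2 retained components, s = 3 alternatives), which also checks the β(p) = β identity numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
p, a, s = 5, 2, 3

V = np.linalg.qr(rng.normal(size=(p, p)))[0]  # hypothetical orthonormal eigenvector matrix
V_a = V[:, :a]                                # the a retained eigenvectors
gamma_a = rng.normal(size=(a, s))             # coefficients fitted on the a PCs

# Parameters expressed in terms of the original variables: beta(a) = V(a) gamma(a)
beta_a = V_a @ gamma_a                        # shape (p, s)

# Retaining all p components changes nothing: gamma(p) = V^T beta, so V gamma(p) = beta
beta_full = rng.normal(size=(p, s))           # hypothetical classical multinomial parameters
gamma_p = V.T @ beta_full
recovered = V @ gamma_p
```

The recovery in the last lines holds because V is orthonormal (V V^T = I), which is exactly why β(p) = β.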

However, the most important result is that PCMR leads to lower-variance estimates of the model parameters compared to the classical multinomial model. We calculate the variance of the estimated parameters of the multinomial model by bootstrap resampling. Let β̂*lj be the bootstrap estimate of the j-th parameter for the l-th sample and β̂j the estimated parameter; the bootstrap estimate of the variance of β̂j is the empirical estimate calculated from the m bootstrap values:

$$\widehat{Var}\left(\hat\beta_j\right) = \frac{1}{m-1} \sum_{l=1}^{m} \left(\hat\beta_j^{*l} - \bar\beta_j^{*}\right)^2 \qquad (7)$$

where β̄*j is the bootstrap mean of the estimates of the j-th parameter.
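Eq. 7 is the usual empirical variance over the m bootstrap estimates; a minimal sketch, assuming the per-sample estimates have already been collected into an array:

```python
import numpy as np

def bootstrap_variance(beta_boot):
    """Empirical bootstrap variance of Eq. 7.

    beta_boot : (m, p) array; row l holds the estimates from the l-th bootstrap sample
    """
    m = beta_boot.shape[0]
    beta_bar = beta_boot.mean(axis=0)                       # bootstrap mean per parameter
    return ((beta_boot - beta_bar) ** 2).sum(axis=0) / (m - 1)

# Hypothetical check: this matches NumPy's sample variance with ddof = 1
rng = np.random.default_rng(2)
b = rng.normal(size=(100, 4))                               # m = 100 bootstrap samples, p = 4
v = bootstrap_variance(b)
```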

Model Calibration and Validation
The number of PC’s, a, is bounded from above by p, the number of x variables. Hence, the number of components should be chosen in the range 1≤a≤p. The number of PC’s, a, to be retained in the model can be selected according to different tools. The first possibility is to retain all the components, but the most used criteria are:

• To consider the PC’s in their natural order and stop when the explained variability is about 75%
• To consider the PC’s that correspond to eigenvalues bigger than one

However, these criteria do not take into account the dependence relationship between the response and the predictor variables. For this reason we propose the criterion of considering in the model all the PC’s that influence the response variable in a statistically significant manner. A forward stepwise procedure is applied for selecting the significant components.
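A forward stepwise selection over the columns of the PC matrix can be sketched generically; the scoring function below (negative residual sum of squares from a least-squares fit) is a hypothetical stand-in for the significance test of the paper, and the stopping threshold is likewise illustrative:

```python
import numpy as np

def forward_select(Z, score, threshold=1e-3):
    """Greedy forward selection over the columns of the PC matrix Z.

    score     : callable taking a list of column indices and returning a fit score
                (higher is better); a stand-in for a significance criterion
    threshold : minimum improvement required to keep adding components
    """
    selected, remaining = [], list(range(Z.shape[1]))
    best = score(selected)
    while remaining:
        new_best, k = max((score(selected + [k]), k) for k in remaining)
        if new_best - best < threshold:
            break                         # no component improves the fit enough
        selected.append(k)
        remaining.remove(k)
        best = new_best
    return selected

# Toy usage: y is driven almost entirely by the second component
rng = np.random.default_rng(5)
Z = rng.normal(size=(50, 4))
y = 2.0 * Z[:, 1] + rng.normal(scale=0.1, size=50)

def rss_score(cols):
    # Negative residual sum of squares of an OLS fit on the chosen columns
    if not cols:
        return -np.sum(y ** 2)
    beta, *_ = np.linalg.lstsq(Z[:, cols], y, rcond=None)
    return -np.sum((y - Z[:, cols] @ beta) ** 2)

chosen = forward_select(Z, rss_score, threshold=1.0)
```

With this toy data the procedure picks the informative component and stops, since the remaining components only fit noise.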

To assess the goodness of the different criteria, we develop a bootstrap procedure and use the bootstrap samples to estimate the parameters, both for the original matrix and for the PC matrix. We propose two accuracy measures:

• The Root Mean Squared Error (RMSE) of the bootstrap estimates of each parameter
• The BIAS of each parameter, calculated as the difference, in absolute value, between the bootstrap mean of the parameter estimates and the true value of the parameter

They are defined as follows:

$$RMSE_j = \sqrt{\frac{1}{m}\sum_{l=1}^{m}\left(\hat\beta_j^{*l} - \beta_j\right)^2}, \qquad BIAS_j = \left|\bar\beta_j^{*} - \beta_j\right| \qquad (8)$$
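Both accuracy measures can be computed directly from the bootstrap estimates; a sketch assuming the true parameters are known, as they are in the simulation study:

```python
import numpy as np

def accuracy_measures(beta_boot, beta_true):
    """RMSE and absolute BIAS of bootstrap estimates against known true parameters.

    beta_boot : (m, p) array of bootstrap estimates
    beta_true : (p,) true parameter vector used to simulate the data
    """
    rmse = np.sqrt(((beta_boot - beta_true) ** 2).mean(axis=0))
    bias = np.abs(beta_boot.mean(axis=0) - beta_true)
    return rmse, bias

# Hypothetical example: 200 bootstrap estimates scattered around the true values
rng = np.random.default_rng(3)
beta_true = np.array([1.0, -0.5, 2.0])
beta_boot = beta_true + rng.normal(scale=0.1, size=(200, 3))
rmse, bias = accuracy_measures(beta_boot, beta_true)
```

Since RMSE² decomposes into squared bias plus variance, RMSE is always at least as large as the absolute bias.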

The simulation study and the accuracy measures show that the best results are obtained when the criterion of significant components is used.

A SIMULATION STUDY

In order to illustrate the performance of the proposed approach, we develop a simulation study according to the scheme proposed by Hosmer et al. (1997).

The first step in the simulation process is to obtain a set of p explicative variables with a known correlation framework. For this purpose, we apply the Cholesky decomposition. The second step is to fix a vector of real parameters β and compute the real probabilities. Finally, each value of the response is simulated from a multinomial distribution. After each data simulation, we fit the multinomial model. As we will see, the estimated parameters are always very different from the real ones due to multicollinearity.
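The first step (correlated regressors via the Cholesky decomposition) can be sketched as follows; the target correlation matrix below is hypothetical, with entries in the 0.4 to 0.9 range used in the study:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 3

# Hypothetical target correlation matrix (positive definite)
R = np.array([[1.0, 0.8, 0.6],
              [0.8, 1.0, 0.7],
              [0.6, 0.7, 1.0]])

# Cholesky factor L with R = L L^T; multiplying i.i.d. standard normals
# by L^T gives rows whose population correlation matrix is R
L = np.linalg.cholesky(R)
X = rng.standard_normal((n, p)) @ L.T
```

The same trick works for the uniform-regressor design, though there the induced correlations only approximate the target.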

As stated in the previous sections, the PCA of the regressors helps to improve this inaccurate estimation of the parameters. Once the PC’s of the simulated covariates are computed, we fit the PCMR(a) models with different sets of a PC’s. Then, for all fitted PCMR(a) models, we compute the estimated parameters in terms of the original variables and their variance as defined in Eq. 7, to test the improvement in the parameter estimation.

This simulation design is carried out for three different numbers of regressors (p = 10, 12 and 15), two different sample sizes (n = 100 and 200), three different numbers of alternatives (s = 3, 4 and 5) and two different distributions of the regressors (standard normal and uniform). The number of performed simulations is 360. Different criteria are considered to decide the number of components to retain in the model.

We present two specific simulation studies:

• Simulation 1 with n = 100, p = 10 and s = 4, correlations among the regressors from 0.4 to 0.9 and regressors with standard normal distribution
• Simulation 2 with n = 100, p = 10 and s = 4, correlations among the regressors from 0.4 to 0.9 and regressors with uniform distribution

Tables 1 to 5 show the results for simulation 1 and their validation. Table 6 displays the results for simulation 2.

Let us focus on simulation 1. The first column of Table 1 contains the labels of the parameters; then we have the real parameters (Real), the parameters estimated with all the components (All PC's), the parameters calculated on the data set of PC’s with eigenvalue bigger than one (PCMR(1)) and the parameters estimated on the PC’s that influence the dependent variable in a significant manner (PCMR(sign)). In all cases the parameters are calculated for the three possible alternatives (the fourth is the fixed alternative). As β(p) = β, we do not insert the column of parameters obtained through the classical multinomial model.

It is easy to see (in bold) that the parameters estimated with all the components have many sign discordances (15 discordant signs) and are always very different from the real ones.

Table 1: Estimated parameters with the different methods for all the alternatives (simulation 1). Bold values indicate that the parameters estimated with all components have many sign discordances

Table 2: Differences in absolute value between the real parameters and the estimated ones for all the alternatives (simulation 1)
Table 3: RMSE of estimated parameters for the different methods for all the alternatives (simulation 1)
Table 4: BIAS: differences between the mean of the parameter estimations and the true values for all the alternatives (simulation 1)
Table 5: Bootstrap estimates of variance of the parameters for the different methods for all the alternatives (simulation 1)
Table 6: Estimated parameters with the different methods for all alternatives (simulation 2). Bold values indicate that the parameters estimated with all components have many sign discordances

The situation improves if we consider the parameters calculated with the other two methods, and we have the best results for PCMR(sign): 9 discordant signs and parameters more similar to the real ones.

For ease of reading, in Table 2 we calculate the differences, in absolute value, between the real parameters and the estimated ones, for the different methods. It is possible to observe that such differences are high for the classical multinomial method. As previously stated, these results must be caused by multicollinearity.

In order to validate the results obtained from each simulation, we apply bootstrap resampling. Tables 3 and 4 show the RMSE and the BIAS for the different methods, parameters and alternatives. The number of bootstrap samples is 100. In Table 3, we can note that the RMSE computed considering all the principal components is very high for the first two alternatives, sometimes assuming values higher than 1,000,000. For the third alternative the situation is a little better, but the results are still poor compared to those obtained with the techniques based on PCMR. In fact, from Table 3 we see that the values of the RMSE are always lower than before (not higher than 4) and, for PCMR(sign), they are very low.

Turning to Table 4, the calculated BIAS is very high for the classical method, which means that the averages of the parameter estimations are very far from the real values of the coefficients. Looking at the results for PCMR(1) and PCMR(sign), one can observe that they are better; in particular, the values for the last method show that the bootstrap means of the estimates are very near to the real parameters.

Finally, in Table 5, we calculate the bootstrap estimates of the variance of the estimated parameters using Eq. 7. It is interesting to observe that the variance of the PCMR estimates is lower than the classical multinomial one. In particular, the variance of the classical multinomial estimates varies from 4.228807 to 1049110, while that of PCMR(sign) varies from 0.019927 to 1.944938. So, the aim of obtaining a considerable reduction in the variability of the estimates is reached. The positive effect of this result on confidence intervals, hypothesis tests, etc., is remarkable.

Table 6 shows the results for simulation 2. They confirm that the parameters estimated with all the components have many sign discordances (14 discordant signs) and are always very different from the real ones.

The situation improves if we consider the parameters calculated with the other two methods, and we have the best results for PCMR(sign): 8 discordant signs and parameters more similar to the real ones.

CONCLUSION

In this study, we showed that the estimation of the multinomial logit model parameters presents a high variance when there is multicollinearity among the regressors. To solve the problem, we proposed using as covariates of the multinomial model a reduced number of PC’s of the predictor variables. An extensive simulation study showed that the proposed approach is a valid alternative in the presence of multicollinearity.

In order to select the optimum PCMR(a) model, we considered and compared different methods for including PC’s in the model. The method that considers in the model all the PC’s that influence the response variable in a statistically significant manner yielded more stable, lower-variance estimates of the model parameters. This is a considerable advantage that will allow us to focus on the inferential aspects of the proposed approach.

Finally, generalizing the other models proposed in the literature for solving the problem of high-dimensional multicollinear data in the binary logit model could be interesting.

REFERENCES
Aguilera, A.M., M. Escabias and M.J. Valderrama, 2006. Using principal components for estimating logistic regression with high-dimensional multicollinear data. Comput. Stat. Data Anal., 50: 1905-1924.

Bastien, P., V. Esposito and M. Tenenhaus, 2005. PLS generalised linear regression. Comput. Stat. Data Anal., 48: 17-46.

Ben-Akiva, M. and S. Lerman, 1985. Discrete Choice Analysis. MIT Press, Cambridge, ISBN: 0262022176, pp: 390.

Frank, I.E., J.H. Friedman, S. Wold, T. Hastie and C. Mallows, 1993. A statistical view of some chemometrics regression tools. Technometrics, 35: 109-148.

Hoerl, A.E. and R.W. Kennard, 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12: 55-67.

Hosmer, D.W., T. Hosmer, S. Le Cessie and S. Lemeshow, 1997. A comparison of goodness-of-fit tests for the logistic regression model. Stat. Med., 16: 965-980.

Luce, R.D., 1959. Individual Choice Behaviour: A Theoretical Analysis. John Wiley and Sons, New York.

Marschak, J., 1960. Binary Choice Constraints on Random Utility Indicators. In: Stanford Symposium on Mathematical Methods in the Social Scienses, Arrow, K. (Ed.). Stanford University Press, Stanford CA., USA.

Marx, B.D., 1992. A continuum of principal component generalized linear regression. Comput. Stat. Data Anal., 13: 385-393.

Massy, W.F., 1965. Principal components regression in exploratory statistical research. J. Am. Stat. Assoc., 60: 234-256.

McFadden, D., 1974. Conditional Logit Analysis of Qualitative Choice Behavior. In: Frontiers in Econometrics, Zarembka, P. (Ed.). Academic Press, New York, pp: 105-142.

Ryan, T.P., 1997. Modern Regression Methods. John Wiley and Sons, Inc., New York pp: 255-313.

Train, K.E., 2003. Discrete Choice Methods with Simulation. 1st Edn. Cambridge University Press, Cambridge.

Vago, E. and S. Kemeny, 2006. Logistic ridge regression for clinical data analysis (a case study). Applied Ecol. Environ. Res., 4: 171-179.

Wold, H., 1966. Estimation of Principal Components and Related Models by Iterative Least Squares. In: Multivariate Analysis, Krishnaiaah, P.R. (Eds.). Academic Press, New York, pp: 391-420.

Wold, H., 1985. Partial Least Squares. In: Encyclopedia of Statistical Sciences, Kotz, S. and N.L. Johnson (Eds.). Vol. 6, Wiley, New York, USA., pp: 581-591.