Research Article
 

The Application of Robust Multicollinearity Diagnostic Method Based on Robust Coefficient Determination to a Non-Collinear Data



H. Midi, A. Bagheri and A.H.M.R. Imon
 
ABSTRACT

In this study, we propose Robust Variance Inflation Factors (RVIFs) for the detection of multicollinearity caused by high leverage points, that is, extreme outliers in the X-direction. The computation of the RVIFs is based on robust coefficients of determination, which we call RR2 (MM) and RR2 (GM (DRGP)). The RR2 (MM) is the coefficient of determination of the high breakdown point and efficient MM-estimator, whereas the RR2 (GM (DRGP)) is defined through an improved GM-estimator. The GM (DRGP) is a GM-estimator whose main aim is to downweight high leverage points with large residuals. It is constructed by employing S-estimators as initial values and the Diagnostic Robust Generalized Potential based on MVE (DRGP (MVE)) as the initial weight function, with Iteratively Reweighted Least Squares (IRLS) used as the convergence method. The numerical results and a Monte Carlo simulation study indicate that the proposed RVIFs, especially RR2 (GM (DRGP)), are very resistant to high leverage points and therefore do not detect multicollinearity in the data. Hence, this indicates that the high leverage points are the source of the multicollinearity.


 
  How to cite this article:

H. Midi, A. Bagheri and A.H.M.R. Imon, 2010. The Application of Robust Multicollinearity Diagnostic Method Based on Robust Coefficient Determination to a Non-Collinear Data. Journal of Applied Sciences, 10: 611-619.

DOI: 10.3923/jas.2010.611.619

URL: https://scialert.net/abstract/?doi=jas.2010.611.619

INTRODUCTION

When two or more independent variables of a linear regression are highly correlated with each other, multicollinearity is said to exist. Multicollinearity produces unexpectedly large standard errors of the Ordinary Least Squares (OLS) estimates. The presence of multicollinearity in a data set is suggested by non-significant results in individual tests on the regression coefficients of important explanatory variables that are in fact significant. Since multicollinearity causes major interpretive problems in regression analysis, it is crucial to investigate and detect its presence in order to reduce its destructive effects on the regression estimates. There are several primary sources of multicollinearity, such as the data collection method employed, constraints on the model or in the population being sampled, model specification and an overdetermined model (Montgomery et al., 2001). It is now evident that high leverage points, which fall far from the majority of the explanatory variables, are another source of multicollinearity (Kamruzzaman and Imon, 2002). These points are considered good or bad leverage points according to whether or not they follow the same regression line as the rest of the data. Furthermore, collinearity-influential observations are observations that can change the collinearity pattern of the data; they may enhance or reduce collinearity in the data set. Not all high leverage points are collinearity-influential observations, and vice versa (Hadi, 1988). However, high leverage points of large magnitude that occur in more than one explanatory variable may be collinearity-enhancing observations and may induce collinearity among the explanatory variables of a non-collinear data set. Among the multicollinearity diagnostics that exist in the literature, the Variance Inflation Factor (VIF) is the most commonly used method (Montgomery et al., 2001; Belsley et al., 1980), and it is sensitive to the presence of high leverage points. It is worth mentioning that the finding of Kamruzzaman and Imon (2002), that high leverage points are a new source of multicollinearity in non-collinear data sets, is based on classical diagnostic methods. It is important to note that the estimation of the regression parameters remains unbiased in the presence of multicollinearity (Montgomery et al., 2001; Belsley et al., 1980). However, when high leverage points cause collinearity in the data set, the coefficient estimates become biased, which is not desirable (Bagheri and Midi, 2009). Thus, it is imperative to investigate the source of collinearity in the data set, since the remedial measures for the problem of multicollinearity depend heavily on understanding the differences among these sources (Montgomery et al., 2001). The use of classical diagnostic methods for the detection of multicollinearity caused by high leverage points may produce misleading conclusions. In collinear and non-collinear data sets, these points affect the classical diagnostic methods differently: in the presence of high leverage points, the classical methods sometimes indicate non-collinearity for a collinear data set or collinearity for a non-collinear data set. Therefore, if high leverage points are the only source of multicollinearity in the data set, robust methods can be employed to remedy this problem.
In this situation, other remedial measures for the multicollinearity problem, such as Principal Component Regression (PCR) (Jolliffe, 1982), are not necessary. Unfortunately, little work has explored the effect of high leverage points on the classical multicollinearity diagnostic methods. In the presence of high leverage collinearity-influential observations, classical multicollinearity diagnostics such as the VIF detect the existence of collinearity in the data set but do not reveal its source. Making the VIF resistant to these high leverage points guides us to the source of multicollinearity in the data set: high leverage points are the only source of multicollinearity if the robust VIF does not diagnose multicollinearity in their presence while, on the contrary, the classical VIF suggests that multicollinearity exists, that is, the RVIF is unable to detect the existence of multicollinearity. For a thorough overview of robust methods, one can refer to Rousseeuw and Leroy (2003), Wilcox (2005), Maronna et al. (2006) and Andersen (2008). It is important to point out that the formulation of the VIF is based on the coefficient of determination of the fitted regression line when each explanatory variable is regressed on the remaining predictor variables using the Ordinary Least Squares (OLS) method. Nevertheless, it is now evident that outliers in the X- or Y-direction have an undue effect on the OLS estimates (Midi et al., 2009) and subsequently on the classical VIF. This situation has motivated us to develop new robust VIFs based on robust coefficients of determination (RR2). Hence, the robust coefficient of determination (RR2) has to be formulated first, prior to proposing any robust VIF. Several robust coefficients of determination exist in the robust methods literature, such as those of Rousseeuw and Hubert (1997) and the S-PLUS 6 Robust Library User's Guide (2001). The Generalized M-estimator (GM-estimator) is one of the robust methods with desirable properties that attempts to downweight high leverage points as well as large residuals. In this study, prior to developing a robust VIF, a new GM-estimator is proposed. The proposed method incorporates the S-estimator defined by Rousseeuw and Yohai (1984) and the DRGP (MVE) introduced by Midi et al. (2009). A robust coefficient of determination is then proposed based on applying the new GM-estimator to fit the regression model, and the robust VIF is developed from it. Moreover, another robust VIF is proposed by employing a robust coefficient of determination based on the MM-estimator introduced by Yohai (1987). Our new proposed robust VIFs will be applied to a well-known non-collinear data set. To compare the performance of these robust multicollinearity diagnostic methods, a Monte Carlo simulation study will also be carried out.

MATERIALS AND METHODS

New Proposed robust estimator: The multiple regression model can be expressed as:

Y = Xβ + ε                                                  (1)

where, Y is an n×1 vector of responses (dependent variable), X is an n×p (n > p) matrix of predictors (explanatory variables), β is a p×1 vector of unknown finite parameters to be estimated and ε is an n×1 vector of random errors. When the Ordinary Least Squares (OLS) method is employed to estimate the regression parameters, we obtain:

β̂_OLS = (X^T X)^{-1} X^T Y                                  (2)
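As a minimal illustration (ours, not the authors' code) of Eq. 2, the OLS coefficients can be computed directly with a least-squares solver; the function name ols_beta is only for this sketch.

```python
# Minimal sketch of Eq. 2: OLS coefficients for design matrix X and response y.
# A least-squares solver is used instead of forming (X^T X)^{-1} explicitly.
import numpy as np

def ols_beta(X, y):
    """Return the OLS estimate beta_hat minimizing ||y - X beta||^2."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat
```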

In the presence of outliers in the X- or Y-direction, the OLS estimates are not reliable and robust methods become necessary. By downweighting the outliers, robust estimators aim to fit the model to the majority of the data. One of the robust methods that is resistant to high leverage points is the GM-estimator (Hill, 1977), introduced by Schweppe. The major aim of this method is to downweight high leverage points that have large residuals, that is, bad leverage points. These estimators have high efficiency and bounded influence, but achieve only a moderate breakdown point equal to 1/p (Simpson, 1995). The GM-estimator is the solution of the normal equation:

Σ_{i=1}^{n} π_i ψ(r_i / (π_i s)) x_i = 0                     (3)

where, r_i = y_i − x_i^T β is the i-th residual, π_i is defined to downweight high leverage points with large residuals, s is a robust scale estimate and the ψ-function may be a monotonic ψ-function such as Huber's ψ-function, which is defined as:

ψ(u) = u  if |u| ≤ k,    ψ(u) = k·sign(u)  if |u| > k        (4)

It is noticeable that k = 1.345 has been chosen to achieve 95% efficiency under a normal error distribution. Iteratively Reweighted Least Squares (IRLS) may be used to solve Eq. 3. At convergence, the GM-estimator is written as follows:

β̂_GM = (X^T W X)^{-1} X^T W Y                                (5)

where, in this case the diagonal elements of W are the weights wi defined as:

w_i = ψ(r_i / (π_i s)) / (r_i / (π_i s))                     (6)
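The following sketch (ours) implements Huber's ψ of Eq. 4 with k = 1.345 and the IRLS weights of Eq. 6, assuming the π-weights and the robust scale s are supplied by the caller.

```python
import numpy as np

def huber_psi(u, k=1.345):
    """Huber's psi-function (Eq. 4): identity on [-k, k], clipped to +/-k outside."""
    return np.clip(u, -k, k)

def irls_weights(residuals, pi, scale, k=1.345):
    """IRLS weights w_i = psi(u_i)/u_i with u_i = r_i/(pi_i * s), as in Eq. 6."""
    u = residuals / (pi * scale)
    w = np.ones_like(u)
    nonzero = np.abs(u) > 1e-12          # w_i -> 1 as u_i -> 0
    w[nonzero] = huber_psi(u[nonzero], k) / u[nonzero]
    return w
```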

Multi-stage GM-estimators have been developed in order to overcome the weakness of GM-estimators, that is, their low breakdown point. These estimators can attain a high breakdown point if appropriate initial estimators are used. One of the first GM-estimators with high efficiency, a high breakdown point (50%) and bounded influence was proposed by Coakley and Hettmansperger (1993) (see also Wilcox, 2005). This method incorporates two of the most practical high breakdown robust estimators, namely the Least Median of Squares (LMS) and the Least Trimmed Squares (LTS). The LTS estimator is used as the initial estimator and the LMS estimator is integrated in the development of the scale estimate (Rousseeuw, 1984). The Robust Mahalanobis Distance (RMD) based on the Minimum Volume Ellipsoid (MVE) estimator (Rousseeuw and van Zomeren, 1990), denoted RMD-MVE, is used as the leverage estimate. Rousseeuw (1985) defined the RMD as:

RMD_i = [(x_i − T_R(X))^T C_R(X)^{-1} (x_i − T_R(X))]^{1/2}   (7)

where, T_R(X) and C_R(X) are the robust location and shape estimates of the MVE. In this estimator the π-weight is the ratio of the χ2 cutoff value to the squared Robust Mahalanobis Distance, and a one-step Newton-Raphson procedure is used as the convergence approach (for more details, one can refer to Wilcox (2005)).
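A small sketch (ours) of Eq. 7 is given below; the MVE fit itself is not implemented here, so the robust location T_R and shape C_R are assumed to be precomputed and passed in.

```python
import numpy as np

def robust_mahalanobis(X, T_R, C_R):
    """RMD_i of Eq. 7 for each row x_i of X, given robust location T_R and shape C_R."""
    diff = X - T_R                                       # center at the robust location
    C_inv = np.linalg.inv(C_R)
    d2 = np.einsum("ij,jk,ik->i", diff, C_inv, diff)     # (x_i - T_R)' C_R^{-1} (x_i - T_R)
    return np.sqrt(d2)
```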

One of the drawbacks of this estimator lies in the definition of the π-weight, which depends on the RMD-MVE. The RMD-MVE tends to swamp some low leverage points even though it can identify high leverage points correctly; thus it also assigns low weights to some of the good leverage points (Midi et al., 2009; Imon et al., 2009). Improving the precision of these estimators requires an effective diagnostic method that identifies an appropriate π-weight. Simpson (1995) discussed several types of multi-stage GM-estimators and provided a comparative evaluation of the existing robust estimators. In this study, a new GM-estimator whose major aim is to downweight high leverage points with large residuals is utilized in developing our proposed method. The new GM-estimator, which we call the GM (DRGP)-estimator, is proposed by employing the S-estimator as the initial estimator instead of the LTS estimator in the GM-estimator algorithm of Coakley and Hettmansperger (1993). It has been verified that the asymptotic efficiency of this estimator is high; see Andersen (2008) for more on the S-estimator and its properties. To overcome the shortcoming of the π-weight in their algorithm, we propose to employ the Diagnostic Robust Generalized Potential based on MVE (DRGP (MVE)) proposed by Midi et al. (2009). This recent diagnostic method for high leverage points has the attractive feature of being able to identify the exact number of high leverage points that exist in the data set. The DRGP (MVE) can be defined as:

p_ii* = x_i^T (X_R^T X_R)^{-1} x_i  for i ∈ D,    p_ii* = h_ii^(-D) / (1 − h_ii^(-D))  for i ∈ R,    with h_ii^(-D) = x_i^T (X_R^T X_R)^{-1} x_i    (8)

where, R = {i : RMD_i^2 < MAD-cutoff(RMD^2)} and the MAD-cutoff is a non-parametric cutoff point defined as MAD-cutoff(θ) = Median(θ) + K·MAD(θ), where K is set to the constant value 2 or 3. The RMD_i^2 is introduced in Eq. 7. Furthermore, D is the complement of the set R. The merit of this method is that it swamps fewer good leverage points as high leverage points compared with the RMD-MVE.
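The sketch below (ours) shows the MAD-cutoff of the text and the resulting split of the observations into the retained set R and the deleted set D, taking the squared robust Mahalanobis distances as input.

```python
import numpy as np

def mad_cutoff(theta, K=3.0):
    """MAD-cutoff(theta) = Median(theta) + K * MAD(theta), with K = 2 or 3."""
    med = np.median(theta)
    mad = np.median(np.abs(theta - med))     # unscaled median absolute deviation
    return med + K * mad

def split_R_D(rmd_squared, K=3.0):
    """Split observations into the retained set R and the deleted set D used by Eq. 8."""
    in_R = rmd_squared < mad_cutoff(rmd_squared, K)
    return in_R, ~in_R                       # boolean masks for R and D
```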

Hence, the proposed algorithm for finding the GM (DRGP)-estimator is as follows. To compute the GM-estimator, begin by setting k = 0 and computing the S-estimate of the intercept and slope parameters, then proceed through the following steps (a schematic implementation of the resulting iteration is sketched after this list):

Step 1: Compute the residuals of the S-estimator, used as the initial estimator, and scale the residuals by applying:

Step 2: Form the π-function as:

where p_ii is defined in Eq. 8.

Step 3: Define the initial weights as:

for i = 1, …, n, where Huber's ψ-function, introduced in Eq. 4, is applied.

Step 4: Use these weights to obtain a weighted least squares estimate. Increase k by 1.

Step 5: Repeat Steps 1-4 until convergence, that is, iterate until the change in the estimated parameters is small.
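The following compact sketch (our paraphrase, not the authors' code) shows the iteration of Steps 1-5. The initial S-estimate, the π-weights built from Eq. 8 and the robust scale are assumed to be supplied by the caller, and X is assumed to include an intercept column.

```python
import numpy as np

def gm_drgp_irls(X, y, beta_init, pi, scale, k=1.345, tol=1e-6, max_iter=100):
    """Schematic IRLS loop for a GM(DRGP)-type estimator (Steps 1-5)."""
    beta = np.asarray(beta_init, dtype=float).copy()
    for _ in range(max_iter):
        r = y - X @ beta                              # Step 1: residuals
        u = r / (pi * scale)                          # scaled residuals
        psi_u = np.clip(u, -k, k)                     # Huber's psi (Eq. 4)
        w = np.ones_like(u)
        nz = np.abs(u) > 1e-12
        w[nz] = psi_u[nz] / u[nz]                     # Step 3: weights
        W = np.diag(w)
        beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # Step 4: weighted LS
        if np.max(np.abs(beta_new - beta)) < tol:     # Step 5: convergence check
            return beta_new
        beta = beta_new
    return beta
```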

Another high breakdown point estimator, defined by Rousseeuw and Yohai (1984), is the S-estimator. These estimators are the solution that yields the smallest possible dispersion of the residuals:

β̂_S = argmin_β ŝ(e_1(β), …, e_n(β))                          (9)

Instead of minimizing the variance of the residuals, this robust S-estimation minimizes a robust M-estimate of the residual scale ŝ, defined as the solution of:

(1/n) Σ_{i=1}^{n} ρ(e_i / ŝ) = b                             (10)

where, b is a constant defined as b = E_Φ[ρ(e)] and Φ represents the standard normal distribution. Differentiating with respect to β and setting the derivative to zero yields:

Σ_{i=1}^{n} ψ(e_i / ŝ) x_i = 0                               (11)

where, ψ is an appropriate weight function. The S-estimates have higher efficiency than the LTS estimator relative to OLS (Croux et al., 1994) and have a high breakdown point of 50%.
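A brief sketch (ours) of the M-scale in Eq. 10 follows. The paper does not state which ρ is used, so Tukey's bisquare with c ≈ 1.547 and b = 0.5 is assumed here, which gives the 50% breakdown point mentioned above; the scale equation is solved by a standard fixed-point iteration.

```python
import numpy as np

def bisquare_rho(u, c=1.547):
    """Tukey's bisquare rho, normalized so that rho(u) = 1 for |u| >= c."""
    inside = 1.0 - (1.0 - (u / c) ** 2) ** 3
    return np.where(np.abs(u) <= c, inside, 1.0)

def m_scale(e, b=0.5, c=1.547, tol=1e-8, max_iter=200):
    """Solve (1/n) * sum rho(e_i / s) = b for s (Eq. 10) by fixed-point iteration."""
    s = np.median(np.abs(e)) / 0.6745          # MAD-based starting value
    if s <= 0:
        return 0.0
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(bisquare_rho(e / s, c)) / b)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s
```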

It is important to note that the MM-estimator is one of the most important robust estimators and was first proposed by Yohai (1987). These estimators combine high breakdown value estimators (50%) with M-estimators of high efficiency (approximately 95% relative to OLS under the Gauss-Markov assumptions). The name MM refers to the fact that more than one M-estimation procedure is used to calculate the final estimates.

Robust Variance Inflation Factor: Marquardt (1970) proposed the most popular diagnostic tool of multicollinearity, namely the Variance Inflation Factor (VIF), which measures how much the variance of an estimated regression coefficient is inflated compared with when the predictor variables are not linearly related. Thus, it is defined as follows:

VIF_j = 1 / (1 − R_j^2),   j = 1, …, p                       (12)

where, R_j^2 is the coefficient of determination obtained when the j-th explanatory variable is regressed on the other explanatory variables in an ordinary regression model using the OLS method. Moderate collinearity exists in the data set when the VIF is between 5 and 10. Any VIF value that exceeds the cutoff point of 10 indicates that the associated regression coefficient is poorly estimated because of severe multicollinearity.
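The sketch below (ours) computes the classical VIF of Eq. 12 exactly as described: each column of the predictor matrix is regressed on the remaining columns by OLS and 1/(1 − R_j^2) is returned.

```python
import numpy as np

def classical_vif(X):
    """Return VIF_j for each column of the n x p predictor matrix X (Eq. 12)."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y_j = X[:, j]
        X_others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(X_others, y_j, rcond=None)
        resid = y_j - X_others @ beta
        r2 = 1.0 - resid @ resid / np.sum((y_j - y_j.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs
```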

This multicollinearity diagnostic method is highly sensitive to the presence of high leverage points because of the effect of these points on R2; consequently, misleading conclusions are obtained from the classical VIF. In this respect, it is imperative to formulate a robust diagnostic method to avoid making wrong conclusions. Since the computation of the VIF is highly dependent on the calculation of R2, a robust version of the VIF can be defined from RR2. In this study, we develop RVIFs that are based on two robust coefficients of determination, namely the RR2 (MM) and the RR2 (GM (DRGP)).

The RR2 (MM) is one of the handiest robust coefficients of determination (RR2) and can be obtained from the Robust Library of the S-PLUS software. It can be calculated as follows:

If the corresponding coefficient estimates are the initial S-estimates and an intercept term is included in the model, then the RR2 (MM) is defined as:

RR2 (MM) = (s_y^2 − s_e^2) / s_y^2                           (13)

where, s_e is the minimized robust scale estimate of the residuals of the full model and s_y is the minimized robust scale for a regression model containing only an intercept term with parameter μ. If the corresponding coefficient estimates are the final M-estimates and an intercept term μ is included in the model, then the RR2 (MM) is defined as:

RR2 (MM) = [Σ_i ρ((y_i − μ̂)/s_e) − Σ_i ρ(r_i/s_e)] / Σ_i ρ((y_i − μ̂)/s_e)    (14)

where, μ̂ is the location M-estimate corresponding to the local minimum of Σ_i ρ((y_i − μ)/s_e), chosen such that Σ_i ρ((y_i − μ̂)/s_e) ≤ Σ_i ρ((y_i − μ*)/s_e), where μ* is the sample median estimate. The RR2 (MM) can also be obtained from the coefficient of determination of the MM-estimator algorithm. The RVIF defined by replacing R2 in Eq. 12 with RR2 (MM) is called the RVIF (MM). The other proposed robust coefficient of determination, RR2 (GM (DRGP)), may be defined as follows:

RR2 (GM (DRGP)) = 1 − Σ_i w_i^(GM) r_i^2 / Σ_i w_i^(GM) (y_i − ȳ_w)^2    (15)

where, r_i and w_i^(GM) are the residuals and weights after the algorithm has converged and ȳ_w is the corresponding weighted mean of the response. Subsequently, the RVIF (GM (DRGP)) can be formulated by substituting the R2 in Eq. 12 with the RR2 (GM (DRGP)).
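The sketch below (ours, mirroring the reconstructions of Eq. 14 and 15 above rather than a published implementation) computes the two robust coefficients of determination from quantities returned by the corresponding robust fits and then forms an RVIF by substituting them into Eq. 12; the function names are assumptions.

```python
import numpy as np

def rr2_mm_final(y, resid, mu_hat, s, rho):
    """Final-M robust R^2 of Eq. 14 (as reconstructed above): rho-based explained variation."""
    total = np.sum(rho((y - mu_hat) / s))     # intercept-only "total" term
    unexplained = np.sum(rho(resid / s))      # residual term of the full fit
    return (total - unexplained) / total

def rr2_gm_drgp(resid, weights, y):
    """Weighted robust R^2 of Eq. 15 from the converged GM(DRGP) residuals and weights."""
    y_bar_w = np.sum(weights * y) / np.sum(weights)
    ss_res = np.sum(weights * resid ** 2)
    ss_tot = np.sum(weights * (y - y_bar_w) ** 2)
    return 1.0 - ss_res / ss_tot

def robust_vif(rr2_j):
    """RVIF_j = 1 / (1 - RR^2_j), i.e. Eq. 12 with R^2 replaced by a robust RR^2."""
    return 1.0 / (1.0 - rr2_j)
```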

RESULTS

Commercial properties data: In order to investigate the effect of high leverage points on the multicollinearity pattern of the data, a non-collinear data set introduced by Kutner et al. (2004) is considered. The Commercial Properties data, containing 81 observations, were taken from suburban commercial properties. The response variable, rental rates, is regressed on age (X1), operating expenses and taxes (X2) and vacancy rates (X3). This data set contains high leverage points, but these high leverage points do not cause collinearity in the data set (Bagheri and Midi, 2009). According to Bagheri and Midi (2009), adding high leverage points of large magnitude at the same observations to more than two explanatory variables causes a multicollinearity problem in a non-collinear data set. Hence, this data set has been modified to contain high leverage collinearity-enhancing observations: the first observation of the first two explanatory variables is replaced with a large high leverage value (equal to 300). Figure 1a and b present the scatter plots of the original and the modified Commercial Properties data sets. A sketch of this modification is given below.
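The following sketch (ours) reproduces the modification described in the text, assuming the explanatory variables are held in a numpy array X whose columns are age (X1), operating expenses and taxes (X2) and vacancy rate (X3).

```python
import numpy as np

def add_leverage_point(X, value=300.0):
    """Return a copy of X with a collinearity-enhancing high leverage point in row 1."""
    X_mod = np.array(X, dtype=float, copy=True)
    X_mod[0, 0] = value   # first observation of X1 (age)
    X_mod[0, 1] = value   # first observation of X2 (operating expenses and taxes)
    return X_mod
```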

The coefficient estimates, standard deviations and t-values of the original and the modified Commercial Properties data sets are presented in Table 1 and 2, respectively. The bootstrap standard deviations of the MM- and our proposed GM (DRGP)-estimates are also computed (Andersen, 2008).

The classical and the Robust VIFs for the original and the modified Commercial Properties data set are exhibited in Table 3.

To verify the merit of our new robust VIF method, a Monte Carlo simulation will be carried out.

Fig. 1: The scatter plot of (a) original and (b) modified commercial properties data set

Table 1: Parameter estimations, standard deviations and t-values of original commercial properties data set

Table 2: Parameter estimation and standard deviation of modified commercial properties data set

Table 3: Classical and robust VIF for original and modified commercial properties data set

Monte Carlo simulation study: A simulation study was conducted to further assess the performance of our newly proposed robust VIFs. Three explanatory variables were considered, each generated from N (0, 1); we refer to these as the clean independent variables. In order to create collinearity-enhancing observations, a certain percentage of the clean data is replaced by high leverage points, with the level of high leverage points varied from 0 to 25%. We considered a moderate sample size of 100 and 10000 replications in each simulation run. The Magnitude of Contamination (MC) in the X-direction was varied over 20, 50, 100 and 300. In order to obtain collinearity-enhancing observations, the high leverage points were placed in all three explanatory variables. The average values of the classical and robust VIFs were computed over the 10000 simulation runs. To be certain of the results of the simulation study, the percentage of errors in indicating the degree of multicollinearity, whether severe or moderate, is calculated for the classical and the robust diagnostic methods. A sketch of one replicate of this design is given below.
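The sketch below (ours) generates one replicate of the design just described: three N(0, 1) predictors with n = 100, a given percentage of observations replaced in all three columns by the contamination magnitude MC, and the classical VIF computed on the contaminated data via the standard identity that VIF_j is the j-th diagonal element of the inverse correlation matrix of the predictors. The robust VIFs would be obtained analogously from the robust R^2 sketches given earlier.

```python
import numpy as np

def vif_from_corr(X):
    """Classical VIFs as the diagonal of the inverse correlation matrix of X."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def one_replicate(n=100, pct_hl=0.10, mc=100.0, rng=None):
    """One simulation replicate: clean N(0,1) predictors plus leverage contamination."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.standard_normal((n, 3))        # clean explanatory variables
    n_hl = int(round(pct_hl * n))
    X[:n_hl, :] = mc                       # collinearity-enhancing leverage points
    return vif_from_corr(X)
```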

Table 4 shows the performance of our new proposed robust methods, that is, RVIF (MM) and RVIF (GM (DRGP)), when there are no high leverage points in the data set and MC is equal to 100.

Table 5 and 6 exhibit the performance of the classical and robust VIFs for small to moderate magnitudes of contamination (20 and 50) and moderate to large magnitudes of contamination (100 and 300), respectively.

Table 4: The performance of classical and robust VIF methods in the non-collinear data set

Table 5: The effect of different percentage and magnitude of high leverage points equal to 20 and 50 on classical and robust VIF when n=100
#%HL: Percentage of high leverage points, MC: Magnitude of contamination

Table 7 displays the percentage of error of each multicollinearity diagnostic method in identifying the degree of collinearity when the percentage of high leverage points is 25 and the magnitude of the high leverage points varies.

Table 6: The effect of different percentage and magnitude of high leverage points equal to 100 and 300 on classical and robust VIF when n = 100
#%HL: Percentage of high leverage points, MC: Magnitude of contamination

Table 7: The error percentage of classical and Robust VIF for diagnosing collinearity pattern of the data set in 10000 replications for 25 percent high leverage points

DISCUSSION

At first we discuss the results of the numerical examples. According to Fig. 1a, none of the explanatory variables in the original data set are collinear (Kutner et al., 2004). It is interesting to see that as soon as the data are modified, the added high leverage points in X1 and X2 pull the regression line toward themselves and change the collinearity pattern of the data, as can clearly be seen from Fig. 1b.

It is worth mentioning that for the original data set the F-test is significant (F-value = 17.53 and p-value = 0.0), which indicates that a linear relationship exists between the variables in the model. From Table 1, it can be seen that the OLS parameter estimates are significant at the 5% significance level when we compare the t-values with t (0.975, 77) = 1.99. The coefficient estimates of the two robust methods are also significant. Thus the classical and the robust methods confirm that none of the coefficients is zero. After modifying the data set, the F-test is still significant (F-value = 14.75 and p-value = 0.0) while β3 for the OLS is not significant (t-value = 0.1777). This behavior is an indicator of the existence of multicollinearity in the data set (Montgomery et al., 2001; Kutner et al., 2004; Chatterjee and Hadi, 2006). However, the results of Table 2 suggest that the coefficient estimates of both robust methods are significant. This indicates that the robust methods fit the model to the majority of the data and are resistant to the high leverage points (Maronna et al., 2006; Andersen, 2008); consequently, these points cannot cause any change in the parameter estimates.

The results of Table 3 indicate that when this data set does not contain any collinearity-enhancing observations, the classical VIF, RVIF (MM) and RVIF (GM (DRGP)) do not exceed their cutoff points, confirming that this is a non-collinear data set. However, after the data are modified by creating high leverage points which cause collinearity in the data set, the classical VIF indicates severe multicollinearity (Midi et al., 2009; Imon et al., 2009). In contrast, our proposed RVIF (GM (DRGP)) and RVIF (MM) are resistant to these added high leverage points and do not show collinearity for the data, unlike the CVIF, which indicates severe multicollinearity. It is evident from these results that the high leverage points are the source of multicollinearity in the data set. The results also reveal that the claim of Kamruzzaman and Imon (2002), that high leverage points cause multicollinearity in a non-collinear data set, is based on the classical VIF, which is not resistant to these points.

Finally, we discuss the results obtained from the simulations. According to Table 4, it is important to note that in normal situations the values of the RVIF (MM) and RVIF (GM (DRGP)) are close to the classical VIF, which indicates that they are as good as the CVIF in correctly diagnosing the collinearity pattern of the data.

As expected, when there are no high leverage points in the data set, the CVIF confirms that the data set is not collinear (Table 4) (Hair et al., 2006). It can be observed from Table 5 and 6 that increasing the percentage and magnitude of the high leverage points makes the value of the CVIF larger; hence, the CVIF is very sensitive to the presence of the high leverage points added to the data set. However, the RVIF (MM) increases for small magnitudes of high leverage points such as 20, while for large magnitudes such as 300 it does not change drastically (Table 6). Hence, the RVIF (MM) is not as resistant for small magnitudes of high leverage points as it is for large ones.

It is evident from the results that the RVIF (GM (DRGP)) is not affected by the increase in the percentage and magnitude of the high leverage points.

It is important to note that a method with a very small percentage of error detects the degree of multicollinearity correctly, whereas a method with a high percentage of error does not. Inevitably, the robust methods have a high percentage of error due to their resistance to the high leverage points. It can be observed from Table 7 that the percentage of errors of the CVIF is zero for the different magnitudes of high leverage points considered in this study: the CVIF detects severe multicollinearity because of its sensitivity to the high leverage points added to the data sets. On the other hand, the RVIF (MM) detects a moderate degree of multicollinearity for small magnitudes of high leverage points, which shows that this method is not robust against small magnitudes of high leverage points. However, as the magnitude of the high leverage points increases up to 300, it becomes resistant to these points. The results also point out that the RVIF (GM (DRGP)) is always robust against high leverage collinearity-enhancing observations.

CONCLUSIONS

In this study, we proposed the robust RVIF (MM) and RVIF (GM (DRGP)) for detecting the source of multicollinearity caused by high leverage points in the data set. Recently, high leverage points have become known as another source of multicollinearity. The results of the study signify that high leverage points have an undue effect on the classical multicollinearity diagnostics, specifically the VIF; in this situation the classical VIF sometimes conveys a misleading interpretation of the collinearity pattern of the data. The results of the real data and the simulation study reveal that the high leverage points are the source of the multicollinearity, as evidenced by the failure of the RVIFs to diagnose multicollinearity, in contrast to the CVIF, which indicates the existence of multicollinearity in the data set. Another important conclusion is that the RVIF (GM (DRGP)) is the most resistant diagnostic measure to high leverage points, followed by the RVIF (MM).

REFERENCES
Andersen, R., 2008. Modern Methods for Robust Regression. SAGE Publications, USA., ISBN: 9781412940726.

Bagheri, A. and H. Midi, 2009. Robust estimations as a remedy for multicollinearity caused by multiple high leverage points. J. Math. Statist., 5: 311-321.

Bagheri, A., H. Midi and A.H.M.R. Imon, 2009. Two-step robust diagnostic method for identification of multiple high leverage points. J. Math. Statist., 5: 97-106.

Belsley, D.A., E. Kuh and R.E. Welsch, 1980. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley and Sons Inc., New York.

Chatterjee, S. and A.S. Hadi, 2006. Regression Analysis by Example. 4th Edn., Wiley, New York, ISBN-10: 0471746966.

Coakley, C.W. and T.P. Hettmansperger, 1993. A bounded-influence, high-breakdown, efficient regression estimator. J. Am. Statist. Assoc., 88: 872-880.

Croux, C., P.J. Rousseeuw and O. Hossjer, 1994. Generalized S-estimators. J. Am. Statist. Assoc., 89: 1271-1281.

Habshah, M., M.R. Norazan and A.H.M.R. Imon, 2009. The performance of diagnostic-robust generalized potentials for the identification of multiple high leverage points in linear regression. J. Applied Statist., 36: 507-520.

Hadi, A.S., 1988. Diagnosing collinearity-influential observations. Comput. Statist. Data Anal., 7: 143-159.

Hair, J.F., R.E. Anderson, R.L. Tatham and W.C. Black, 2006. Multivariate Data Analysis. 5th Edn., Prentice Hall, Upper Saddle River, New Jersey, USA.

Hill, R.W., 1977. Robust regression when there are outliers in the carriers. Ph.D. Thesis, Harvard University, Boston, MA.

Jolliffe, I.T., 1982. A note on the use of principal components in regression. Applied Stat., 31: 300-303.

Kamruzzaman, M. and A.H.M.R. Imon, 2002. High leverage point: Another source of multicollinearity. Pak. J. Statist., 18: 435-448.

Kutner, M.H., C.J. Nachtsheim and J. Neter, 2004. Applied Linear Regression Models. 4th Edn., McGraw Hill, New York, ISBN: 978-0256086010.

Maronna, R.A., R.D. Martin and V.J. Yohai, 2006. Robust Statistics: Theory and Methods. John Wiley and Sons, New York, ISBN-10: 0470010924.

Marquardt, D.W., 1970. Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation. Technometrics, 12: 591-612.

Montgomery, D.C., E.A. Peck and G.G. Vining, 2001. Introduction to Linear Regression Analysis. 3rd Edn., John Wiley and Sons, New York, USA., ISBN-13: 978-0471315650, Pages: 672.

Rousseeuw, P.J. and A.M. Leroy, 2003. Robust Regression and Outlier Detection. John Wiley, New York, ISBN-10: 0471852333.

Rousseeuw, P.J. and B.C. van Zomeren, 1990. Unmasking multivariate outliers and leverage points. J. Am. Stat. Assoc., 85: 633-639.

Rousseeuw, P.J. and M. Hubert, 1997. Recent Development in PROGRESS. In: L1-Statistical Procedures and Related Topics, Dodge, Y. (Ed.). Vol. 31, Institute of Mathematical Statistics, Hayward, California, pp: 201-214.

Rousseeuw, P.J. and V.J. Yohai, 1984. Robust Regression by Means of S-Estimators. In: Robust and Nonlinear Time Series Analysis, Franke, J., W. Härdle and R.D. Martin (Eds.). Springer-Verlag, New York, USA., pp: 256-272.

Rousseeuw, P.J., 1984. Least median of squares regression. J. Am. Stat. Assoc., 79: 871-880.

Rousseeuw, P.J., 1985. Multivariate estimation with high breakdown point. Math. Statist. Appl., 13: 283-297.

Simpson, J.R., 1995. New methods and comparative evaluations for robust and biased-robust regression estimation. Ph.D. Thesis, Arizona State University.

Wilcox, R.R., 2005. Introduction to Robust Estimation and Hypothesis Testing. 2nd Edn., Elsevier Academic Press, USA., ISBN: 0-12-751542-9.

Yohai, V.J., 1987. High breakdown-point and high efficiency robust estimates for regression. Ann. Stat., 15: 642-656.
