INTRODUCTION
Multiple Regression Analysis (MRA) is widely used across scientific fields. As with other analysis techniques, MRA requires certain assumptions to hold for reliable parameter estimation: the expected value of the residual terms should be zero; the residual terms should be normally distributed; the residual terms should be independent of each other; the number of observations should exceed the number of parameters; and there should be no multicollinearity among the independent variables^{[1,2]}.
The aim of MRA is to find the best set of independent variables for explaining the dependent variable, provided that the assumptions are met^{[1,3]}. Diagnostics are analysis techniques that indicate the extent of unfavorable conditions, such as lack of fit of the model and heterogeneity of variances, that may be encountered in a data set^{[1,3,4]}.
This paper examines such problems using the diagnostics described below. The aim of the study was therefore to obtain useful information from different diagnostics in MRA.
MATERIALS AND METHODS
The material of this study consisted of 18 single-born male lambs randomly selected from Hamdani lambs raised in Van province, Turkey. Body weights of the lambs at different ages (birth weight and body weights at the 45th, 60th and 75th days) were recorded. The data set was analyzed using the SAS program^{[5]}.
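MRA is used to explain the effects of independent variables on a dependent variable. The MRA model can be written as follows:
Y_{i} = β_{0} + β_{1}X_{i1} + β_{2}X_{i2} + ... + β_{k}X_{ik} + ε_{i}    (1)
where Y is the dependent variable; X_{1}, X_{2}...X_{k} are the independent variables; β_{0}, β_{1}, β_{2}...β_{k} are the regression coefficients (β_{0} being the intercept and the others the slopes); and ε_{i} is the random error.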
Equation 1 can be rewritten as Y = Xβ + ε in matrix notation, where X is the design matrix, β is the vector of regression coefficients and ε is the vector of random errors. The regression coefficients can be estimated by the Ordinary Least Squares (OLS) method. The method is based on minimizing the sum of squared differences between the observed and predicted Y values, Σ(Y_{i} − Ŷ_{i})^{2}; solving this minimization problem yields the estimates of β_{0}, β_{1}, β_{2}...β_{k}^{[2]}.
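The OLS step described above can be illustrated with a minimal Python sketch, assuming NumPy is available. The function name `fit_ols` and the simulated data are hypothetical illustrations; they are not the lamb data or the SAS procedure used in this study.

```python
import numpy as np

def fit_ols(x_vars, y):
    """Return the design matrix X (with intercept column) and the OLS estimates."""
    x_vars = np.asarray(x_vars, dtype=float)
    y = np.asarray(y, dtype=float)
    # The first column of 1's corresponds to the intercept beta_0
    X = np.column_stack([np.ones(len(y)), x_vars])
    # Least-squares solution of min ||y - X beta||^2
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X, beta_hat

# Simulated data (hypothetical): 18 observations, 3 independent variables
rng = np.random.default_rng(0)
x_vars = rng.normal(size=(18, 3))
true_beta = np.array([2.0, 1.0, -0.5, 0.25])
y = true_beta[0] + x_vars @ true_beta[1:] + rng.normal(scale=0.1, size=18)

X, beta_hat = fit_ols(x_vars, y)
residuals = y - X @ beta_hat
print("estimated coefficients:", beta_hat)
```

At the OLS solution the residuals are orthogonal to every column of X, which gives a simple check on any implementation.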
Diagnostics: Regression diagnostics are statistics used to detect problems encountered in the model or the data set^{[1,3]}. The diagnostics are examined in turn below.
Leverage point diagnostics: These diagnostics comprise residual analysis, standardized residuals, studentized residuals and the hat matrix.
Residual analysis: The residual, denoted by e_{i}, is the difference between the observed and predicted Y values and is obtained from Eq. 2:
e_{i} = Y_{i} − Ŷ_{i}    (2)
where Ŷ_{i} is the predicted value for the i^{th} observation. MRA assumes that the error terms have constant variance and zero expected value, denoted by var (e) = σ^{2}I and E (e) = 0^{[2,4,6]}.
Standardized residuals: This diagnostic, denoted by r_{i}, is the ratio of each residual to its estimated standard deviation^{[1-4,6]} and can be written as follows:
r_{i} = e_{i} / (s √(1 − h_{ii}))    (3)
where e_{i} is the residual, s is the square root of the error mean square and h_{ii} is the i^{th} diagonal element of the hat matrix.
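Studentized residuals: Each residual is standardized with a standard deviation calculated after the corresponding observation has been left out of the calculation^{[1-4,6]}. After the i^{th} observation is removed from the data set, the error variance estimated from the remaining observations is denoted by s^{2}_{(i)}, which can be calculated as follows:
s^{2}_{(i)} = [(n − p’) s^{2} − e_{i}^{2} / (1 − h_{ii})] / (n − p’ − 1)    (4)
where n is the number of observations and p’ is the number of estimated parameters, including the intercept.
The residual is thus converted to a Student’s t value, denoted by r_{i}^{*}, which can be written as follows:
r_{i}^{*} = e_{i} / (s_{(i)} √(1 − h_{ii}))    (5)
In application^{[1]}, Eq. 5 can be expressed as Eq. 6:
r_{i}^{*} = r_{i} √[(n − p’ − 1) / (n − p’ − r_{i}^{2})]    (6)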
The i^{th} observation is considered an outlier if |r_{i}^{*}| > 1.96 or, as a rounded rule, |r_{i}^{*}| > 2. The e_{i}, r_{i} and r_{i}^{*} values are used in studies on the adequacy of model estimation^{[3,4]}. For most observations, these three values give similar results. Studentized residuals have been reported to be an appropriate criterion for judging the size of residuals^{[1,2]}.
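The three residual types can be computed together, as in the following sketch. It assumes NumPy, the usual textbook definitions (each residual divided by s√(1 − h_{ii}), with s replaced by the leave-one-out estimate s_{(i)} for the studentized version), and simulated data; the function name `residual_diagnostics` and the planted outlier are hypothetical.

```python
import numpy as np

def residual_diagnostics(X, y):
    """Return raw (e), standardized (r) and studentized (r_star) residuals."""
    n, p_prime = X.shape                       # p' counts the intercept column
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_hat                       # observed minus predicted
    Q, _ = np.linalg.qr(X)                     # thin QR, so that H = Q Q^T
    h = np.sum(Q**2, axis=1)                   # diagonal elements h_ii
    s2 = e @ e / (n - p_prime)                 # error mean square
    r = e / np.sqrt(s2 * (1.0 - h))            # standardized residuals
    # Error variance with the i-th observation left out (no refitting needed)
    s2_del = ((n - p_prime) * s2 - e**2 / (1.0 - h)) / (n - p_prime - 1)
    r_star = e / np.sqrt(s2_del * (1.0 - h))   # studentized residuals
    return e, r, r_star

# Simulated data (hypothetical), with one unusually large response planted
rng = np.random.default_rng(1)
n = 18
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([2.0, 1.0, -0.5, 0.25]) + rng.normal(scale=0.5, size=n)
y[4] += 4.0

e, r, r_star = residual_diagnostics(X, y)
outliers = np.flatnonzero(np.abs(r_star) > 2.0)
print("flagged observations (0-based):", outliers)
```

The leave-one-out variance is obtained from the full fit rather than by refitting n times, which is why the studentized residual can also be written as a function of r_{i} alone.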
Hat matrix: Consider a matrix H:
H = X(X^{T}X)^{-1}X^{T}    (7)
where X is the data matrix containing the independent variables. The first column of X consists only of 1’s, corresponding to the intercept, and X^{T} is the transpose of X. The matrix in Eq. 7 is called the hat matrix, and its diagonal elements are denoted by h_{ii}.
The h_{ii} value indicates the leverage of the i^{th} observation, that is, its distance from the centre of the space of the X variables (X_{1}, X_{2}…X_{k}). The value ranges from 0 to 1^{[2,4]}; according to another author, it lies between 1/n and 1/r, where n is the number of observations and r is the number of replicates of the i^{th} observation, and it shows whether the i^{th} observation is an outlier in the space of the X variables^{[6]}.
The critical (cutoff) value for this statistic is 2 p’/n, where p’ is the number of estimated parameters, i.e. the number of independent variables, p, plus 1 for the regression constant (intercept). For instance, with 3 independent variables in a model, the number of estimated parameters is p’ = 3+1 = 4. Observations whose h_{ii} values are larger than 2 p’/n can be regarded as outliers in the space of the X variables^{[1-4]}.
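As a rough illustration of the leverage cutoff, the sketch below computes the diagonal of the hat matrix, assumed here to be H = X(X^{T}X)^{-1}X^{T}, and flags values above 2 p’/n. It assumes NumPy; the function name `leverages`, the simulated data and the planted extreme point are hypothetical.

```python
import numpy as np

def leverages(X):
    """Diagonal elements h_ii of the hat matrix H = X (X^T X)^{-1} X^T."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H)

# Simulated data (hypothetical): 18 observations, 3 independent variables
rng = np.random.default_rng(2)
n, p = 18, 3
x_vars = rng.normal(size=(n, p))
x_vars[1] = [6.0, 6.0, 6.0]           # planted point far from the centre of X
X = np.column_stack([np.ones(n), x_vars])

h = leverages(X)
p_prime = p + 1                        # independent variables plus intercept
cutoff = 2.0 * p_prime / n
high_leverage = np.flatnonzero(h > cutoff)
print("cutoff:", round(cutoff, 3), "flagged (0-based):", high_leverage)
```

The h_{ii} values sum to p’, so their average is p’/n and the cutoff corresponds to twice the average leverage.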
Durbin-Watson: This statistic ranges from 0 to 4, with values close to 2 indicating uncorrelated residuals. It is used to detect serial correlation among residuals^{[1-4,6,7]} and is calculated as:
d = Σ_{i=2}^{n} (e_{i} − e_{i-1})^{2} / Σ_{i=1}^{n} e_{i}^{2}    (8)
The calculated value of the Durbin-Watson statistic is compared with the tabulated cutoff values for the statistic^{[4]}.
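The behaviour of the Durbin-Watson statistic can be seen in a short sketch, assuming NumPy and the usual definition (sum of squared successive differences of the residuals divided by their sum of squares). The function name `durbin_watson` and the three input sequences are hypothetical and stand in for regression residuals.

```python
import numpy as np

def durbin_watson(e):
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residual sequences illustrating the range of the statistic
alternating = np.array([1.0, -1.0, 1.0, -1.0])   # strong negative correlation
constant = np.array([1.0, 1.0, 1.0, 1.0])        # strong positive correlation
rng = np.random.default_rng(3)
independent = rng.normal(size=2000)              # no serial correlation

d_alt = durbin_watson(alternating)
d_const = durbin_watson(constant)
d_random = durbin_watson(independent)
print(d_alt, d_const, round(d_random, 2))
```

Values well below 2 point to positive serial correlation and values well above 2 to negative serial correlation; the formal decision still relies on the tabulated bounds.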
Influence statistics: The influence diagnostics comprise Cook’s distance, differences between the fits (DFFITS), differences between the betas (DFBETAS) and the covariance ratio (COVRATIO).
Cook’s distance: Cook’s distance shows the combined effect of the i^{th} observation on all regression coefficients. Observations whose values are larger than the cutoff value for Cook’s distance, 4/n, can be regarded as influential observations and are said to affect the estimated coefficient vector β̂. The statistic can be calculated by Eq. 9:
D_{i} = (β̂_{(i)} − β̂)^{T} X^{T}X (β̂_{(i)} − β̂) / (p’ s^{2})    (9)
where β̂_{(i)} is calculated with the i^{th} observation deleted and β̂ is calculated from all observations^{[1-4]}.
Differences between the fits (DFFITS): This statistic gives the change in the predicted value Ŷ_{i} when the i^{th} observation is omitted and can be written as follows:
DFFITS_{i} = (Ŷ_{i} − Ŷ_{i(i)}) / (s_{(i)} √h_{ii})    (10)
where Ŷ_{i(i)} is the predicted value for the i^{th} observation obtained without that observation. The cutoff value for the statistic is 2√(p’/n); if the absolute DFFITS value of the i^{th} observation is larger than the cutoff value, the observation can be said to influence Ŷ_{i}.
As there is a close association between this statistic and Cook’s distance, the results of both statistics are similar^{[1-4,6]}.
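Both statistics can be obtained literally from their deletion definitions by refitting the model without each observation in turn. The sketch below does this with NumPy on simulated data. It assumes the standard forms, Cook’s distance as the scaled squared distance between the coefficient vectors with and without the i^{th} observation, and DFFITS as the change in the i^{th} fitted value divided by s_{(i)}√h_{ii}. The function name `cooks_and_dffits` is hypothetical, and 2√(p’/n) is the commonly used DFFITS cutoff, taken here as an assumption.

```python
import numpy as np

def cooks_and_dffits(X, y):
    """Cook's distance and DFFITS by explicit leave-one-out refitting."""
    n, p_prime = X.shape
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)
    e = y - X @ beta
    s2 = e @ e / (n - p_prime)
    # h_ii = x_i^T (X^T X)^{-1} x_i for every row x_i
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(XtX), X)
    cooks = np.empty(n)
    dffits = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta_i = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)   # fit without obs. i
        res_i = yi - Xi @ beta_i
        s2_i = res_i @ res_i / (n - 1 - p_prime)
        d = beta_i - beta
        cooks[i] = d @ XtX @ d / (p_prime * s2)
        dffits[i] = (X[i] @ beta - X[i] @ beta_i) / np.sqrt(s2_i * h[i])
    return cooks, dffits

# Simulated data (hypothetical)
rng = np.random.default_rng(4)
n = 18
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([2.0, 1.0, -0.5, 0.25]) + rng.normal(scale=0.5, size=n)

cooks_d, dffits = cooks_and_dffits(X, y)
p_prime = X.shape[1]
flag_cook = np.flatnonzero(cooks_d > 4.0 / n)
flag_dffits = np.flatnonzero(np.abs(dffits) > 2.0 * np.sqrt(p_prime / n))
print("Cook's D flags:", flag_cook, "DFFITS flags:", flag_dffits)
```

Refitting n times is affordable for a data set of this size; for larger data sets the same values follow from the residuals and h_{ii} of the full fit, which is also why the two statistics tend to flag the same observations.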
Differences between the betas (DFBETAS): This statistic measures the influence of the i^{th} observation on each regression coefficient and is obtained from the standardized difference between β̂_{j} and β̂_{j(i)}:
DFBETAS_{j(i)} = (β̂_{j} − β̂_{j(i)}) / (s_{(i)} √c_{jj})    (11)
where β̂_{j(i)} is obtained with the i^{th} observation omitted and c_{jj} is the j^{th} diagonal element of (X^{T}X)^{-1}.
The cutoff value for DFBETAS is 2/√n; if the absolute DFBETAS value of the i^{th} observation is larger than this, the i^{th} observation can be said to influence the j^{th} regression coefficient^{[1-4,6]}.
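A minimal sketch of DFBETAS by leave-one-out refitting follows, assuming NumPy and the standard scaling by s_{(i)} times the square root of the j^{th} diagonal element of (X^{T}X)^{-1}. The function name `dfbetas` and the simulated data are hypothetical, and 2/√n is the commonly used cutoff, taken here as an assumption.

```python
import numpy as np

def dfbetas(X, y):
    """n x p' matrix: standardized change in each coefficient when obs. i is dropped."""
    n, p_prime = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    c = np.diag(XtX_inv)                       # c_jj, one per coefficient
    out = np.empty((n, p_prime))
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta_i, *_ = np.linalg.lstsq(Xi, yi, rcond=None)   # fit without obs. i
        res_i = yi - Xi @ beta_i
        s_i = np.sqrt(res_i @ res_i / (n - 1 - p_prime))
        out[i] = (beta - beta_i) / (s_i * np.sqrt(c))
    return out

# Simulated data (hypothetical)
rng = np.random.default_rng(5)
n = 18
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([2.0, 1.0, -0.5, 0.25]) + rng.normal(scale=0.5, size=n)

db = dfbetas(X, y)
rows, cols = np.nonzero(np.abs(db) > 2.0 / np.sqrt(n))
print("flagged (observation, coefficient) pairs:", list(zip(rows, cols)))
```

Each row of the result belongs to one observation and each column to one coefficient, with column 0 corresponding to the intercept.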
Covariance ratio (COVRATIO): This statistic is the ratio of the determinant of the variance-covariance matrix calculated when the i^{th} observation is omitted to the determinant of the variance-covariance matrix calculated when all observations are considered:
COVRATIO_{i} = det[s^{2}_{(i)} (X_{(i)}^{T}X_{(i)})^{-1}] / det[s^{2} (X^{T}X)^{-1}]    (12)
where X_{(i)} is the matrix X without its i^{th} row.
If the ratio is close to 1, the influence of the i^{th} observation on the regression coefficients is small. The further the ratio departs from 1, the larger the influence.
The cutoff values for COVRATIO are expressed as COVRATIO_{i} ≥ 1 + 3 p’/n or COVRATIO_{i} ≤ 1 − 3 p’/n^{[1-3,6]}.
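The ratio of determinants described above can be computed directly, as in this sketch. It assumes NumPy and takes the variance-covariance matrix of the coefficients to be s^{2}(X^{T}X)^{-1}; the function name `covratio` and the simulated data are hypothetical.

```python
import numpy as np

def covratio(X, y):
    """Determinant ratio of the coefficient covariance without / with obs. i."""
    n, p_prime = X.shape
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)
    e = y - X @ beta
    s2 = e @ e / (n - p_prime)
    det_full = np.linalg.det(s2 * np.linalg.inv(XtX))
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        XtX_i = Xi.T @ Xi
        beta_i = np.linalg.solve(XtX_i, Xi.T @ yi)     # fit without obs. i
        res_i = yi - Xi @ beta_i
        s2_i = res_i @ res_i / (n - 1 - p_prime)
        out[i] = np.linalg.det(s2_i * np.linalg.inv(XtX_i)) / det_full
    return out

# Simulated data (hypothetical)
rng = np.random.default_rng(6)
n = 18
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([2.0, 1.0, -0.5, 0.25]) + rng.normal(scale=0.5, size=n)

cr = covratio(X, y)
p_prime = X.shape[1]
flagged = np.flatnonzero(np.abs(cr - 1.0) >= 3.0 * p_prime / n)
print("flagged observations (0-based):", flagged)
```

With n = 18 and p’ = 4, as in a model with three independent variables, the band 1 ± 3 p’/n is wide (about 0.33 to 1.67), so this statistic should be read together with the other influence measures.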
RESULTS AND DISCUSSION
Descriptive statistics of live weights at different ages for the 18 randomly selected single-born male Hamdani lambs, born in early March 2001, are presented in Table 1.
As seen in Table 2, the correlations between pairs of independent variables were high and significant, which is evidence of multicollinearity^{[1-4]}.
As shown in Table 3, the coefficient of determination of the model with all independent variables included was 0.9186 (91.86%). For a model to be reliable, even when its coefficient of determination is high, the assumptions (homogeneity of variance, zero expected value of the error) should be met^{[1-4]}. For these reasons, diagnostics should be taken into account in MRA. Only the effect of live weight at the 60th day, as an independent variable, on live weight at the 75th day was significant. Likewise, in the stepwise elimination method used to determine the best set of independent variables, only the effect of live weight at the 60th day on live weight at the 75th day was significant.
The statistics of residual analysis, such as e_{i}, r_{i} and r^{*}_{i}, are used to detect problems encountered in the data set and the model^{[1,3,4,8]}.
As seen in Table 4, observations 2 and 16 are outliers with respect to these statistics. Only these two observations exceed the cutoff values of ±2.
Although these two observations had unfavorable effects on the assumptions mentioned above, it would not be correct to remove them from the data set^{[1,3,4]}.
With respect to the h_{ii} statistic, only the 2nd observation can potentially affect the regression analysis in terms of its X values. According to Cook’s D and DFFITS, only the 2nd and 16th observations can influence the results for all regression coefficients. Cook’s D and DFFITS gave similar results, consistent with those reported by other authors^{[3,4]}.
Table 1: Descriptive statistics of live weights at different periods
Table 2: Correlation between all pair of variables
* : p < 0.05, **: p < 0.01
Table 3: Results of regression analysis related to estimation of parameters
Model R^{2} value : 0.9186 Model (%CV) : 5.81
Table 4: Results of residuals analysis concerning each observation
Table 5: The cut off formulas and their values of influence statistic
Table 6: Values of potential effective observation in point of influence statistics
The cutoff formulas for the statistics and their values are presented in Table 5. Based on the cutoff values in Table 5, the values of potentially influential observations with respect to the influence statistics are given in Table 6.
According to the COVRATIO statistic, six observations (2, 4, 8, 10, 14 and 16) were potentially influential on the fitted (predicted) values.
According to the DFBETAS statistic, the 2nd and 6th observations were potentially influential on the intercept and all regression coefficients.
The Durbin-Watson value for the data set was 2.31, which indicates that there was no autocorrelation among the residuals.
In conclusion, if observations with large leverage (outliers) are influential or potentially influential, those observations (here observations 2 and 16) should be carefully examined by the researcher^{[4]}.
CONCLUSIONS
The aim of MRA is to determine the set of independent variables that most efficiently explains the variation in the dependent variable, which depends on meeting the assumptions of MRA mentioned in the introduction. Diagnostics indicate whether the basic assumptions are met and whether the results of MRA are reliable.
The most important results of this study can be summarized as follows:
A plot of the residuals e_{i} versus the fitted values indicates whether the assumptions of normality and homogeneity of variance of the error terms are met. In other words, each standardized or studentized residual should lie within the interval ±2 for the assumptions to hold. Otherwise, a transformation suited to the scatter pattern of the residuals e_{i} versus the fitted values should be applied to the dependent variable Y.
Serial correlation among residuals means that the residuals are not independent of each other. For the residuals to be regarded as independent, the Durbin-Watson statistic should be close to 2 on its scale from 0 to 4.
Consequently, for the reasons discussed above, employing diagnostics alongside MRA is recommended to make its results more reliable.