Number of Parameters Counting in a Hierarchically Multiple Regression Model

Zainodin, H.J.; Abdullah, Noraini; Yap, S.J.

ABSTRACT

Background: When a dependent variable is affected by a large number of independent variables, the number of parameters helps researchers determine the number of independent variables to be considered in an analysis. However, when there are many parameters to be estimated in a model, manual counting is tedious and time consuming. Thus, this study derives a method to determine the number of parameters in a model systematically. Methods: The model-building procedure in this study involves removing multicollinearity source variables and insignificant variables. Eventually, a selected model containing only significant variables is obtained. Results: The findings of this study enable researchers to count the number of parameters in the resulting (selected) model with ease and speed. In addition, only models that fulfil the model assumptions are considered in the statistical analysis, and human errors caused by manual counting can be minimised by implementing the proposed procedure. Conclusion: These findings will help researchers save time when their analyses involve complex iterations.


INTRODUCTION

According to Ramanathan¹, in a linear regression model the regression coefficients are the unknown parameters to be estimated. In a simple linear regression, only two unknown parameters have to be estimated. Problems arise in multiple linear regression, where three or more unknown parameters are to be estimated and the number of parameters in the model can be large. Further challenges arise when a computer programme has to be written and complex iterations have to be performed against a given criterion. Thus, the exact number of parameters involved should be known in order to prepare enough data to suit such a large and complex model. It is also important to note that, for the estimated parameters to have a unique solution, the number of estimated parameters must be less than the total number of observations, according to the assumptions of the multiple regression model stated by Gujarati and Porter².

According to Zainodin et al.³ and Yahaya et al.⁴, a multiple linear regression analysis has four phases for getting the best model, namely: listing all possible models, getting the selected models, getting the best model and checking the goodness-of-fit. In phase 1, the number of parameters for a possible model is denoted by NP. Before continuing to phase 2, the number of parameters in each model must be less than the sample size, n; any model that fails this initial criterion is discarded. In phase 2, the multicollinearity test and the coefficient test are conducted on the possible models to get the selected models. In the multicollinearity test, multicollinearity source variables are removed from each of the possible models. Then, the coefficient test is conducted on the possible models that are free from the multicollinearity problem, in order to eliminate insignificant variables from each of them. The detailed procedure of this phase is explained in Zainodin et al.³.

For a general model Ma.b.c, "a" is the parent model number, the number of variables removed due to the multicollinearity problem is denoted by b, the number of variables eliminated due to insignificance is denoted by c and the resulting number of parameters for a selected model is represented by (k+1). In most cases, if the number of unknown parameters to be estimated for a possible model is large, then the number of parameters for a selected model will most probably be large too⁵,⁶. In these cases, manual counting of the large number of parameters is time consuming. Furthermore, some of the parameters might be missed due to human error in manual counting. Thus, the objective of this study is to propose a method to count the number of parameters for a selected model, (k+1). The values of NP, b and c are useful in getting the number of parameters for a selected model, (k+1).

According to Gujarati and Porter², a simple linear regression model without any interaction variable can be written as follows:

$$Y = \beta_0 + \beta_1 X_1 + u \tag{1}$$

where Y is the dependent variable, β₀ and β₁ are regression coefficients and they are the unknown parameters to be estimated, X₁ is a single quantitative independent variable and u is the error term. So, it can be observed that there are two unknown parameters to be estimated in Eq. 1. This equation can also be written in the following form:

Next, a hierarchically multiple linear regression model⁷ with an interaction variable can be written as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_{12} + u \tag{2}$$

where Y is the dependent variable, β₀, β₁, β₂ and β₁₂ are regression coefficients and they are the unknown parameters to be estimated, X₁ and X₂ are single quantitative independent variables, X₁₂ is a first-order interaction variable and u is the error term. Thus, it can be seen that there are four unknown parameters to be estimated in Eq. 2. This equation can also be written in the following form:

Next, an example of a linear regression model with interaction variables and dummy variables is shown as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_{12} + \beta_D D + \beta_{1D} X_1 D + \beta_{2D} X_2 D + u \tag{3}$$

where Y is the dependent variable, β₀, β₁, β₂, β₁₂, β_D, β_1D and β_2D are regression coefficients and they are the unknown parameters to be estimated, X₁ and X₂ are single quantitative independent variables, X₁₂ is a first-order interaction variable, D is a single independent dummy variable, X₁D is the first-order interaction variable of X₁ and D, X₂D is the first-order interaction variable of X₂ and D and u is the error term. Therefore, it can be observed that there are seven unknown parameters to be estimated in Eq. 3. This equation can also be written as follows:

Equations 1-3 can be written as a general model in the form:

$$Y = \Omega_0 + \Omega_1 W_1 + \Omega_2 W_2 + \cdots + \Omega_k W_k + u \tag{4}$$

where Ω₀ is the intercept and Ω_j is the jth partial regression coefficient of the corresponding independent variable W_j for j = 1, 2,..., k.

According to Zainodin et al.³, the independent variables W_j include single independent variables, interaction variables, generated variables, dummy variables and transformed variables. In this study, (k+1) denotes the number of parameters for a selected model. The labels that relate Eq. 3 to the general model in Eq. 4 are shown in Table 1.

From Table 1, it is known that β₀ represents Ω₀ in the general model, β₁ represents Ω₁ and the same goes for the other estimated parameters in Table 1. Variable X₁ in Eq. 3 represents W₁ in the general model and the same goes for the other variables in Table 1.

Instead of counting the number of parameters one by one as above, an equation to count the number of parameters in a model without interaction variables and in a model with interaction variables is proposed in this study. Eq. 5 is presented as follows:

(5)

where NP is the number of parameters for a possible model, g is the number of single independent quantitative variables, h is the number of single independent dummy variables and v is the highest order of interaction (between single independent quantitative variables) in the model.


Here, v = 0 denotes a model without interaction variables (or a model with zero-order interaction variables) and v = 1, 2,… denotes a model with first or higher order interaction variable(s). Hence, Eq. 5 can now be tested in the following instances to check its validity in counting the number of parameters in a hierarchically multiple regression model.
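Since the expression for Eq. 5 is not reproduced in this text, the sketch below uses an assumed form that agrees with every count stated in this paper: g + h + 1 when v = 0, and otherwise the quantitative terms up to order v, plus the g × h first-order interactions between quantitative and dummy variables, plus the h dummy variables and the constant. The function name `count_parameters` and the validity check on v are illustrative assumptions, not the authors' notation.

```python
from math import comb


def count_parameters(g: int, h: int, v: int) -> int:
    """Number of parameters NP for a possible model (assumed form of Eq. 5).

    g: number of single quantitative independent variables
    h: number of single dummy variables
    v: highest order of interaction among the quantitative variables
    """
    if g < 0 or h < 0 or v < 0:
        raise ValueError("g, h and v must be non-negative")
    # A v-th order interaction involves v + 1 quantitative variables.
    if v > max(g - 1, 0):
        raise ValueError("v cannot exceed g - 1")
    if v == 0:
        # Stated in the text: g + h + 1 for a model without interaction.
        return g + h + 1
    # Single quantitative variables (j = 1) and their interactions (j >= 2).
    quantitative_terms = sum(comb(g, j) for j in range(1, v + 2))
    # Assumption: every quantitative variable interacts with every dummy.
    return quantitative_terms + g * h + h + 1


print(count_parameters(2, 1, 1))  # model of Eq. 3 (g = 2, h = 1, v = 1)
```

Under this assumption, the model in Eq. 3 gives seven parameters, which matches the count stated for that equation.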

MATERIALS AND METHODS

This section illustrates the application of the equation introduced in the previous section. To achieve a model free from multicollinearity effects and insignificant effects, the following four-phase model-building procedure is implemented (details can be found in Zainodin et al.³ and Yahaya et al.⁴):

•	Phase 1: All possible models
•	Phase 2: Selected model: multicollinearity test and coefficient test (including the Near Perfect Multicollinearity (NPM) test and the Near Perfect Collinearity (NPC) test)
•	Phase 3: Best model
•	Phase 4: Goodness-of-fit: randomness test and normality test

This study also revealed that the parameters of the independent variables measure the multicollinearity effect between the independent-variable parameters and the structural parameters. As discussed in the earlier section, the number of parameters is very important before arriving at a selected and best model. Thus, some illustrations follow:

Models without interaction variable: Here, models without interaction variables (or with zero-order interaction variables) are considered. As pointed out earlier in Eq. 5, the number of parameters in a model without interaction variables (v = 0) can be computed using g+h+1. For instance, consider Eq. 6, a model with v = 0, one single quantitative independent variable (g = 1) and five single dummy variables (h = 5), as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_D D + \beta_B B + \beta_R R + \beta_A A + \beta_G G + u \tag{6}$$

where Y is the dependent variable, X₁ is a single quantitative independent variable (g = 1) and D, B, R, A and G are 5 single dummy variables (h = 5). Then, the total number of parameters involved is NP = g + h + 1 = 1 + 5 + 1 = 7.

Consider another example of calculating the number of parameters in a model without interaction variables. Eq. 7 is a model with v = 0, eight single quantitative independent variables (g = 8) and ten single dummy variables (h = 10), as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \beta_7 X_7 + \beta_8 X_8 + \beta_B B + \beta_C C + \beta_L L + \beta_E E + \beta_W W + \beta_K K + \beta_A A + \beta_G G + \beta_H H + \beta_S S + u \tag{7}$$

where X₁, X₂, X₃, X₄, X₅, X₆, X₇ and X₈ are single quantitative independent variables and B, C, L, E, W, K, A, G, H and S are 10 single dummy variables. Then, NP = g + h + 1 = 8 + 10 + 1 = 19 is obtained.

Models with interaction variable: Now, consider models with interaction variables (i.e., v = 1, 2,…). According to Eq. 5, the number of parameters in a model with interaction variables is calculated in a different way from that in a model without interaction variables. A few examples are presented to provide a better understanding of this equation. For instance, a model with interaction variables up to first order (v = 1), two single quantitative independent variables (g = 2) and five single dummy variables (h = 5) is presented in Eq. 8:

$$Y = \beta_0 + \beta_2 X_2 + \beta_4 X_4 + \beta_{24} X_{24} + \beta_D D + \beta_B B + \beta_R R + \beta_A A + \beta_G G + \beta_{2D} X_2 D + \beta_{2B} X_2 B + \beta_{2R} X_2 R + \beta_{2A} X_2 A + \beta_{2G} X_2 G + \beta_{4D} X_4 D + \beta_{4B} X_4 B + \beta_{4R} X_4 R + \beta_{4A} X_4 A + \beta_{4G} X_4 G + u \tag{8}$$

where X₂ and X₄ are single quantitative independent variables, D, B, R, A and G are single dummy variables and X₂₄, X₂D, X₂B, X₂R, X₂A, X₂G, X₄D, X₄B, X₄R, X₄A and X₄G are first-order interaction variables. Counting the constant, the two single quantitative variables, the five single dummy variables and the 11 first-order interaction variables, this leads to NP = 1 + 2 + 5 + 11 = 19.

Next, consider a larger model with a higher order of interaction, for instance, a model with fifth-order interaction (v = 5), six single quantitative independent variables (g = 6) and five single dummy variables (h = 5), as presented in Eq. 9:

(9)

In Eq. 9, X₁, X₂, X₃, X₄, X₅ and X₆ are single quantitative independent variables, X₁₂, X₁₃, X₁₄, X₁₅, X₁₆, X₂₃, X₂₄, X₂₅, X₂₆, X₃₄, X₃₅, X₃₆, X₄₅, X₄₆, X₅₆, X₁D, X₁B, X₁R, X₁A, X₁G, X₂D, X₂B, X₂R, X₂A, X₂G, X₃D, X₃B, X₃R, X₃A, X₃G, X₄D, X₄B, X₄R, X₄A, X₄G, X₅D, X₅B, X₅R, X₅A, X₅G, X₆D, X₆B, X₆R, X₆A and X₆G are first-order interaction variables, X₁₂₃, X₁₂₄, X₁₂₅, X₁₂₆, X₁₃₄, X₁₃₅, X₁₃₆, X₁₄₅, X₁₄₆, X₁₅₆, X₂₃₄, X₂₃₅, X₂₃₆, X₂₄₅, X₂₄₆, X₂₅₆, X₃₄₅, X₃₄₆, X₃₅₆ and X₄₅₆ are second-order interaction variables, X₁₂₃₄, X₁₂₃₅, X₁₂₃₆, X₁₂₄₅, X₁₂₄₆, X₁₂₅₆, X₁₃₄₅, X₁₃₄₆, X₁₃₅₆, X₁₄₅₆, X₂₃₄₅, X₂₃₄₆, X₂₃₅₆, X₂₄₅₆ and X₃₄₅₆ are third-order interaction variables, X₁₂₃₄₅, X₁₂₃₄₆, X₁₂₃₅₆, X₁₂₄₅₆, X₁₃₄₅₆ and X₂₃₄₅₆ are fourth-order interaction variables and X₁₂₃₄₅₆ is the fifth-order interaction variable. Counting the constant, the six single quantitative variables, the five single dummy variables and the 45, 20, 15, 6 and 1 interaction variables of first to fifth order, the total number of parameters is NP = 1 + 6 + 5 + 45 + 20 + 15 + 6 + 1 = 99.
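The manual listing for Eq. 9 can be cross-checked by generating the variable names programmatically. The following is a minimal sketch, assuming (as in the list above) that interactions among quantitative variables go up to order v and that each quantitative variable has a first-order interaction with each dummy variable; the function name `list_terms` is a hypothetical helper introduced only for illustration.

```python
from itertools import combinations


def list_terms(indices, dummies, v):
    """Group variable names by interaction order (0 = single variables)."""
    by_order = {0: [f"X{i}" for i in indices] + list(dummies)}
    for order in range(1, v + 1):
        # An interaction of a given order combines order + 1 quantitative variables.
        by_order[order] = [
            "X" + "".join(combo) for combo in combinations(indices, order + 1)
        ]
    if v >= 1:
        # First-order interactions between quantitative and dummy variables.
        by_order[1] += [f"X{i}{d}" for i in indices for d in dummies]
    return by_order


# Model of Eq. 9: g = 6 quantitative variables, h = 5 dummies, v = 5.
terms = list_terms(["1", "2", "3", "4", "5", "6"], ["D", "B", "R", "A", "G"], 5)
counts = {order: len(names) for order, names in terms.items()}
total = 1 + sum(counts.values())  # the extra 1 is the constant term
print(counts, total)
```

Generating the names rather than typing them makes it harder to miss a term, which is the kind of human error the paper is concerned with.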

Lastly, consider another example of a larger model which has interaction variables up to 6th order and nine single dummy variables. Consider Eq. 10, a model with highest order of interaction v = 6, seven single quantitative independent variables (g = 7) and nine single dummy variables (h = 9).

(10)

Here, X₁₂₃₄₅₆, X₁₂₃₄₅₇, X₁₂₃₄₆₇, X₁₂₃₅₆₇, X₁₂₄₅₆₇, X₁₃₄₅₆₇ and X₂₃₄₅₆₇ are 5th order interaction variables and X₁₂₃₄₅₆₇ is a 6th order interaction variable. Then, total number of parameters:

(11)

As can be seen from the illustrations, the numbers of parameters calculated using the derived equation tally with those obtained by manual counting.
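For the largest example, Eq. 10 (g = 7, h = 9, v = 6), the count can be worked out as below. This is a sketch under an assumption: the general expression in the first line is not taken from the text but is a form consistent with the variable lists given for Eq. 8 and 9, and Eq. 10 is assumed to contain the same kinds of terms (all interactions among quantitative variables up to order v, plus one first-order interaction between each quantitative variable and each dummy variable).

```latex
\begin{align*}
NP &= \sum_{j=1}^{v+1}\binom{g}{j} + gh + h + 1 \\
   &= \sum_{j=1}^{7}\binom{7}{j} + (7)(9) + 9 + 1 \\
   &= (2^{7} - 1) + 63 + 9 + 1 \\
   &= 127 + 63 + 9 + 1 = 200
\end{align*}
```

The sum collapses to 2⁷ − 1 because v + 1 = g here, so every non-empty subset of the seven quantitative variables contributes one term.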

RESULTS AND DISCUSSION

The proposed equation defined in Eq. 5 is especially useful for counterchecking the variables when listing all of the possible models in an analysis, because some of the variables might be missed when a possible model involves a large number of parameters.

Table 2 shows all the possible models for an analysis that has two single quantitative independent variables (X₁ and X₂) and one single dummy variable (D).

In Table 2, with the information on g, h and v, the NP for possible models M1-M12 can be computed by using Eq. 5. Then, the number of parameters for each of the possible models can be counterchecked by using the computed NP values. For simplicity, each of the 12 models can be written in general form as in Eq. 4.

After introducing the equation for counting the number of parameters for a possible model, the way of getting the number of parameters for a selected model, (k+1), is presented⁸,⁹. Model M32, which was given earlier in Eq. 3, is used as an illustration. First, the multicollinearity test is conducted on the possible model M32. The removal of multicollinearity source variables from this model is shown in Tables 3-5. This study uses the modified method of removing multicollinearity source variables (Excel command: COUNTIF()).

In Table 3, all of the multicollinearity source variables (variables with absolute correlation coefficient values greater than or equal to 0.9500) are circled. It is found that variables X₁₂, D, X₁D and X₂D each have a frequency of 2. So, according to Zainodin et al.³, model M32 belongs to case B. To avoid confusion between dummy variable B and case B, case B is represented by case 2 in this study.

Similarly, case C is represented by case 3 in this study. Based on the removal steps for case 2, variable X₁₂, which has the weakest absolute correlation coefficient with the dependent variable Y compared with variables D, X₁D and X₂D, is removed from model M32. The same removal steps are carried out on the reduced model, M32.1, as presented in Table 4.

After removing variable X₂D from model M32.1 in Table 4, the detailed correlation coefficients of the reduced model M32.2 are shown in Table 5. It is observed that variables D and X₁D each have the highest frequency of one. So, model M32.2 belongs to case 3. Therefore, variable X₁D, which has the weaker absolute correlation coefficient with the dependent variable Y, is removed from this model. Thus, the resulting model free from multicollinearity is M32.3.

Table 6 shows that model M32.3 is free from multicollinearity source variables because all the absolute correlation coefficient values between all the independent variables are less than 0.9500 (except the diagonal values).
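The removal steps walked through in Tables 3-6 can be sketched as a loop. This is a simplified illustration, not the full case-by-case procedure of Zainodin et al.³: it assumes a single rule in which, among the variables with the highest frequency of high correlations (|r| ≥ 0.95), the one with the weakest absolute correlation with Y is removed, and the step is repeated until no high correlation remains. The function name, the data layout and the correlation values are hypothetical and made up for illustration; they are not the values in the paper's tables.

```python
def remove_multicollinearity(pair_corr, corr_with_y, threshold=0.95):
    """Return (remaining, removed) lists of independent variables.

    pair_corr   maps frozenset({v, w}) to the correlation between two
                independent variables (missing pairs are treated as 0).
    corr_with_y maps each independent variable to its correlation with Y.
    """
    remaining = list(corr_with_y)
    removed = []
    while True:
        # Frequency: how many other remaining variables each variable is
        # highly correlated with (the tally a COUNTIF() would give).
        freq = {
            v: sum(
                1
                for w in remaining
                if w != v
                and abs(pair_corr.get(frozenset((v, w)), 0.0)) >= threshold
            )
            for v in remaining
        }
        top = max(freq.values(), default=0)
        if top == 0:
            return remaining, removed  # free from multicollinearity
        candidates = [v for v in remaining if freq[v] == top]
        # Among tied variables, drop the one least correlated with Y.
        weakest = min(candidates, key=lambda v: abs(corr_with_y[v]))
        remaining.remove(weakest)
        removed.append(weakest)


# Hypothetical correlations, for illustration only (not from the paper).
pair_corr = {
    frozenset(("P", "Q")): 0.97,
    frozenset(("P", "R")): 0.96,
    frozenset(("Q", "R")): 0.98,
    frozenset(("P", "S")): 0.10,
}
corr_with_y = {"P": 0.50, "Q": -0.30, "R": 0.60, "S": 0.20}
remaining, removed = remove_multicollinearity(pair_corr, corr_with_y)
b = len(removed)  # number of variables removed due to multicollinearity
print(remaining, removed, b)
```

The length of the `removed` list plays the role of b in the model name Ma.b.c.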

From the model name M32.3, the number of variables removed due to the multicollinearity problem is 3. Therefore, b for model M32.3 is 3. More details on the definition of the model name can be found elsewhere³,⁴,⁶,¹⁰. The resulting model, M32.3, is free from multicollinearity effects, so the coefficient test is conducted on it. The task then is to eliminate insignificant variables from model M32.3.

In Table 7, variable X₂ has the highest p-value among the independent variables, and this p-value is greater than 0.05. (Since the number of single quantitative independent variables is greater than 5 and the coefficient test is two-tailed, the level of significance is set at 10%, based on published recommendations¹¹.) Thus, variable X₂ is eliminated from model M32.3 and the reduced model is called model M32.3.1, as 1 variable is eliminated due to insignificance. Details on the coefficient test can be found elsewhere⁸,¹²,¹³.

Table 8 shows that model M32.3.1 is free from insignificant variables because the p-values of both variables X₁ and D are less than 0.05. From Table 8, 3 parameters (the constant of the model and the coefficients of variables X₁ and D) are left in model M32.3.1; in other words, (k+1) equals 3. From the model name, M32.3.1, it is noticed that 1 variable is eliminated in the coefficient test, so c equals 1. As mentioned earlier for Eq. 3, there are seven unknown parameters to be estimated for model M32, so NP equals 7. Thus, by knowing NP, b and c for model M32.3.1 (i.e., Ma.b.c), the number of parameters (k+1) for model M32.3.1 can be counterchecked using the equation proposed in this study as:

$$(k+1) = NP - b - c = 7 - 3 - 1 = 3 \tag{12}$$

Therefore, Table 8 shows that the number of parameters left in the selected model M32.3.1 is the same as the value obtained from the proposed equation in Eq. 12.
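The countercheck above can be automated by reading b and c directly from the model name. The sketch below assumes the relation implied by the text, (k+1) = NP − b − c, and assumes that a name such as "M32" or "M32.3" simply has the missing parts equal to zero; the function names are hypothetical helpers introduced for illustration.

```python
import re


def parse_model_name(name):
    """Split a model name of the form Ma, Ma.b or Ma.b.c into (a, b, c)."""
    match = re.fullmatch(r"M(\d+)(?:\.(\d+))?(?:\.(\d+))?", name)
    if match is None:
        raise ValueError(f"unrecognised model name: {name!r}")
    a, b, c = match.groups()
    # Missing b or c means no variable was removed at that stage (assumption).
    return int(a), int(b or 0), int(c or 0)


def selected_parameter_count(np_possible, name):
    """Return (k+1), assuming (k+1) = NP - b - c."""
    _, b, c = parse_model_name(name)
    k_plus_1 = np_possible - b - c
    if k_plus_1 < 0:
        raise ValueError("b + c cannot exceed NP")
    return k_plus_1


print(selected_parameter_count(7, "M32.3.1"))  # NP = 7 for model M32
```

For model M32.3.1 with NP = 7 this reproduces the three parameters reported for Table 8.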

In line with the above discussion, other researchers have also highlighted the importance of parameter counting. They ranked the parameters of a model (by importance, significance, dependency, etc.) based on the magnitude of the coefficients¹¹,¹⁴,¹⁵.

CONCLUSION

This study is new and groundbreaking. It has succeeded in proposing an equation for counting the number of parameters for each of the possible models (details can be found in phase 1 and in Zainodin et al.³ and Yahaya et al.⁴). This equation helps to countercheck the number of parameters left in the selected model, which is free from multicollinearity and from insignificant variables. The study presented an equation to calculate the number of parameters and demonstrated its application to models with and without interaction variables.

As can be seen from the previous section, calculating the number of parameters in a model takes a long time, especially for bigger models such as Eq. 9 and 10. Instead of counting the parameters one by one manually, the equation established in this study allows researchers to obtain the number of parameters in an easier, faster yet accurate way. Besides, human errors caused by manual counting, which have happened during model development, can also be minimised. The proposed equation also helps to save a tremendous amount of time where analyses involve complex iterations or repeated tasks, especially in software development.

REFERENCES

1. Ramanathan, R., 2002. Introductory Econometrics with Applications. 5th Edn., Harcourt College Publishers, Ohio, USA, ISBN-13: 9780030341861, Pages: 688.
2. Gujarati, D.N. and D.C. Porter, 2009. Basic Econometrics. 5th Edn., McGraw-Hill Irwin, New York, USA, ISBN: 9780071276252, Pages: 922.
3. Zainodin, H.J., A. Noraini and S.J. Yap, 2011. An alternative multicollinearity approach in solving multiple regression problem. Trends Applied Sci. Res., 6: 1241-1255.
4. Yahaya, A.H., N. Abdullah and H.J. Zainodin, 2012. Multiple regression models up to first-order interaction on hydrochemistry properties. Asian J. Math. Stat., 5: 121-131.
5. Anastasopoulos, P.C. and F.L. Mannering, 2009. A note on modeling vehicle accident frequencies with random-parameters count models. Accid. Anal. Prev., 41: 153-159.
6. Mazzocchetti, D., A.M. Berti, G. Lucchetti, R. Sartini, F. Colle, P. Iudicone and L. Pierelli, 2012. Evaluation of volume and total nucleated cell count as cord blood selection parameters. Am. J. Clin. Pathol., 138: 308-309.
7. Jaccard, J., 2001. Interaction Effects in Logistic Regression. SAGE Publications, London, Pages: 70.
8. Lind, D.A., W.G. Marchal and S.A. Wathen, 2012. Statistical Techniques in Business and Economics. 15th Edn., McGraw-Hill, New York.
9. Chittawatanarat, K., S. Pruenglampoo, V. Trakulhoon, W. Ungpinitpong and J. Patumanond, 2012. Development of gender-and age group-specific equations for estimating body weight from anthropometric measurement in Thai adults. Int. J. Gen. Med., 5: 65-80.
10. Zeileis, A., 2004. Econometric computing with HC and HAC covariance matrix estimators. J. Statist. Software, 11: 1-17.
11. Ansorge, H.L., S. Adams, A.F. Jawad, D.E. Birk and L.J. Soslowsky, 2012. Mechanical property changes during neonatal development and healing using a multiple regression model. J. Biomechan., 45: 1288-1292.
12. Emuchay, C.I., S.O. Okeniyi and J.O. Okeniyi, 2014. Correlation between total lymphocyte count, hemoglobin, hematocrit and CD4 count in HIV patients in Nigeria. Pak. J. Biol. Sci., 174: 57-573.
13. Furnstahl, R.J. and B.D. Serot, 2000. Parameter counting in relativistic mean-field models. Nucl. Phys. A, 671: 447-460.
14. Robinson, P.S., T.W. Lin, A.F. Jawad, R.V. Iozzo and L.J. Soslowsky, 2004. Investigating tendon fascicle structure-function relationships in a transgenic-age mouse model using multiple regression models. Ann. Biomed. Eng., 32: 924-931.
15. Festing, M.F., 2006. Design and statistical methods in studies using animal models of development. ILAR J., 47: 5-14.

Science International

Research Article
