Research Article
 

Number of Parameters Counting in a Hierarchically Multiple Regression Model



H.J. Zainodin, Noraini Abdullah and S.J. Yap
 
ABSTRACT

Background: When a dependent variable is affected by a large number of independent variables, the number of parameters helps researchers determine how many independent variables can be considered in an analysis. However, when a model contains many parameters to be estimated, counting them manually is tedious and time consuming. This study therefore derives a method to determine the number of parameters in a model systematically. Methods: The model-building procedure in this study involves removing variables due to multicollinearity and eliminating insignificant variables, so that a selected model containing only significant variables is eventually obtained. Results: The findings enable researchers to count the number of parameters in the resulting (selected) model easily and quickly. In addition, only models that fulfil the assumptions are considered in the statistical analysis, and human errors caused by manual counting can be minimised or avoided by implementing the proposed procedure. Conclusion: These findings will help researchers save time when their analyses involve complex iterations.


 
  How to cite this article:

H.J. Zainodin, Noraini Abdullah and S.J. Yap, 2014. Number of Parameters Counting in a Hierarchically Multiple Regression Model. Science International, 2: 37-43.

URL: https://scialert.net/abstract/?doi=sciintl.2014.37.43

INTRODUCTION

According to Ramanathan1, in a linear regression model the regression coefficients are the unknown parameters to be estimated. In a simple linear regression, only two unknown parameters have to be estimated. Problems arise in multiple linear regression, however, when the model is large and more complex and three or more unknown parameters are to be estimated. Challenges arise when a computer programme has to be written and complex iterations have to be performed under certain criteria. The exact number of parameters involved should therefore be known in order to prepare the amount of data needed to suit such a large and complex model. It is also important to note that, according to the assumptions of the multiple regression model stated by Gujarati and Porter2, the number of estimated parameters must be less than the total number of observations in order for the estimated parameters to have a unique solution.

According to Zainodin et al.3 and Yahaya et al.4, a multiple linear regression analysis involves four phases in getting the best model, namely: listing out all possible models, getting the selected models, getting the best model and checking the goodness-of-fit. In phase 1, the number of parameters of a possible model is denoted by NP. To get the selected models, after listing out all of the possible models in phase 1, the multicollinearity test and the coefficient test are conducted on the possible models in phase 2. Before continuing to phase 2, the number of parameters in each model must be less than the sample size, n; any model that fails this initial criterion is discarded. In the multicollinearity test, multicollinearity source variables are removed from each of the possible models. The coefficient test is then conducted on the possible models that are free from the multicollinearity problem, in order to eliminate insignificant variables from each of them. A detailed procedure of this phase is explained in Zainodin et al.3.

For a general model Ma.b.c with parent model number “a”, the number of variables removed due to the multicollinearity problem is denoted by b, the number of variables eliminated due to insignificance is denoted by c and the resulting number of parameters of the selected model is represented by (k+1). In most cases, if the number of unknown parameters to be estimated for a possible model is large, then the number of parameters of the selected model will most probably be large too5,6. In these cases, manually counting the large number of parameters is time consuming and some parameters might be missed out due to human error. Thus, the objective of this study is to propose a method to count the number of parameters of a selected model, (k+1). The information on NP, b and c is useful in obtaining this number.

According to Gujarati and Porter2, a simple linear regression model without any interaction variable can be written as follows:

Y = β0 + β1X1 + u    (1)

where Y is the dependent variable, β0 and β1 are the regression coefficients (the unknown parameters to be estimated), X1 is a single quantitative independent variable and u is the error term. It can be observed that there are two unknown parameters to be estimated in Eq. 1. This equation can also be written in the following form:

Y = Ω0 + Ω1W1 + u, where W1 = X1

Next, a hierarchically multiple linear regression model7 with an interaction variable can be written as follows:

Y = β0 + β1X1 + β2X2 + β12X12 + u    (2)

where Y is the dependent variable, β0, β1, β2 and β12 are the regression coefficients (the unknown parameters to be estimated), X1 and X2 are single quantitative independent variables, X12 is a first-order interaction variable and u is the error term. Thus, there are four unknown parameters to be estimated in Eq. 2. This equation can also be written in the following form:

Y = Ω0 + Ω1W1 + Ω2W2 + Ω3W3 + u, where W1 = X1, W2 = X2 and W3 = X12

Next, an example of a linear regression model with interaction variables and dummy variables is shown as follows:

Y = β0 + β1X1 + β2X2 + β12X12 + βDD + β1DX1D + β2DX2D + u    (3)

where Y is the dependent variable, β0, β1, β2, β12, βD, β1D and β2D are the regression coefficients (the unknown parameters to be estimated), X1 and X2 are single quantitative independent variables, X12 is a first-order interaction variable, D is a single independent dummy variable, X1D is the first-order interaction variable of X1 and D, X2D is the first-order interaction variable of X2 and D and u is the error term. Therefore, there are seven unknown parameters to be estimated in Eq. 3. This equation can also be written as follows:

Y = Ω0 + Ω1W1 + Ω2W2 + Ω3W3 + Ω4W4 + Ω5W5 + Ω6W6 + u, where W1 = X1, W2 = X2, W3 = X12, W4 = D, W5 = X1D and W6 = X2D
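To make the structure of Eq. 3 concrete, the short Python sketch below builds the first-order interaction variables X12, X1D and X2D from two quantitative columns and one dummy column. The data values and column names are hypothetical and are used for illustration only.

```python
import pandas as pd

# Hypothetical data: two quantitative variables and one dummy variable
# (values are made up purely for illustration).
data = pd.DataFrame({
    "X1": [2.0, 3.5, 1.2, 4.8],
    "X2": [7.1, 6.4, 5.9, 8.3],
    "D":  [0, 1, 0, 1],          # single dummy variable
})

# First-order interaction variables appearing in Eq. 3.
data["X12"] = data["X1"] * data["X2"]   # quantitative x quantitative
data["X1D"] = data["X1"] * data["D"]    # quantitative x dummy
data["X2D"] = data["X2"] * data["D"]    # quantitative x dummy

# Including the intercept, Eq. 3 therefore has 7 parameters to estimate.
print(data.columns.tolist())   # ['X1', 'X2', 'D', 'X12', 'X1D', 'X2D']
print(1 + len(data.columns))   # 7
```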

Equations 1-3 can be written in the general model form:

Y = Ω0 + Ω1W1 + Ω2W2 + … + ΩkWk + u    (4)

where Ω0 is the intercept and Ωj is the jth partial regression coefficient of the corresponding independent variable Wj, for j = 1, 2,…, k.

According to Zainodin et al.3, the independent variables Wj include the single independent variables, interaction variables, generated variables, dummy variables and transformed variables. In this study, (k+1) denotes the number of parameters of a selected model. The labels of Eq. 3 corresponding to the general model of Eq. 4 are shown in Table 1.

From Table 1, β0 in Eq. 3 corresponds to Ω0 in the general model, β1 corresponds to Ω1 and the same applies to the other estimated parameters in Table 1. Similarly, variable X1 in Eq. 3 corresponds to W1 in the general model and the same applies to the other variables in Table 1.

Instead of counting the number of parameters one by one as above, an equation for counting the number of parameters in models without an interaction variable and in models with interaction variables is proposed in this study. Equation 5 is presented as follows:

NP = g + h + 1, for v = 0
NP = [C(g, 1) + C(g, 2) + … + C(g, v+1)] + h(g + 1) + 1, for v = 1, 2, …    (5)

where NP is the number of parameters of a possible model, g is the number of single quantitative independent variables, h is the number of single dummy independent variables, v is the highest order of interaction (between the single quantitative independent variables) in the model and C(g, j) denotes the number of combinations of j variables chosen from the g single quantitative variables.

[Table 1: Corresponding labels of the parameters and variables of Eq. 3 in the general model of Eq. 4]

Here, v = 0 denotes a model without an interaction variable (or a model with zero-order interaction) and v = 1, 2,… denotes a model with first- or higher-order interaction variable(s). Equation 5 can now be tested in the following instances to verify that it correctly counts the number of parameters in a hierarchically multiple regression model. Recall that the aim is to count the number of parameters of a selected model, (k+1), using the information on NP, b and c.
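The counting rule can be collected into a small helper function. The Python sketch below renders Eq. 5 as reconstructed from the worked examples that follow (g + h + 1 when v = 0; the binomial sums of quantitative interaction terms, plus h(g + 1) dummy-related terms, plus the intercept when v ≥ 1). The function name count_parameters is an assumption, not part of the original study.

```python
from math import comb

def count_parameters(g: int, h: int, v: int) -> int:
    """Number of parameters NP of a possible model (a sketch of Eq. 5).

    g: number of single quantitative independent variables
    h: number of single dummy independent variables
    v: highest order of interaction among the quantitative variables
       (v = 0 means the model has no interaction variables)
    """
    if v == 0:
        return g + h + 1                  # intercept + singles + dummies
    # quantitative terms: the g singles plus interaction products up to order v
    quantitative_terms = sum(comb(g, j) for j in range(1, v + 2))
    # h single dummies plus g*h first-order quantitative-dummy interactions
    dummy_terms = h * (g + 1)
    return quantitative_terms + dummy_terms + 1   # +1 for the intercept
```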

MATERIALS AND METHODS

The following illustrations help in understanding the application of the equation introduced in the previous section. In order to achieve a model that is free from multicollinearity effects and insignificant variables, the following four-phase model-building procedure is implemented (details can be found in3,4):

Phase 1: All possible models
Phase 2: Selected model
Multicollinearity test and coefficient test (including the Near Perfect Multicollinearity (NPM) test and the Near Perfect Collinearity (NPC) test)
Phase 3: Best model
Phase 4: Goodness-of-fit

Randomness test and normality test: This study also distinguishes the parameters of the independent variables, which are examined for multicollinearity effects between the independent variables, from the structural parameters. As discussed in the earlier section, the number of parameters is very important before arriving at a selected and best model. Thus, some illustrations follow:

Models without interaction variable: Here, models without an interaction variable (or with a zero-order interaction variable) are considered. As pointed out in Eq. 5, the number of parameters of a model without an interaction variable (v = 0) can be computed as g+h+1. For instance, consider Eq. 6, a model with a zero-order interaction variable (v = 0), one single quantitative independent variable (g = 1) and five single dummy variables (h = 5):

Y = β0 + β1X1 + βDD + βBB + βRR + βAA + βGG + u    (6)

where Y is the dependent variable, X1 is a single quantitative independent variable (g = 1) and D, B, R, A and G are five single dummy variables (h = 5). The total number of parameters involved is then:

NP = g + h + 1 = 1 + 5 + 1 = 7

Consider another example of calculating the number of parameters in a model without an interaction variable. Equation 7 is a model with a zero-order interaction variable (v = 0), eight single quantitative independent variables (g = 8) and ten single dummy variables (h = 10):

Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8 + βBB + βCC + βLL + βEE + βWW + βKK + βAA + βGG + βHH + βSS + u    (7)

where X1, X2, X3, X4, X5, X6, X7 and X8 are single quantitative independent variables and B, C, L, E, W, K, A, G, H and S are 10 single dummy variables. The following is then obtained:

NP = g + h + 1 = 8 + 10 + 1 = 19
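Using the count_parameters helper sketched earlier (an assumed name, not the authors' software), both zero-interaction examples can be checked in one line each:

```python
print(count_parameters(g=1, h=5, v=0))    # 7, as in Eq. 6
print(count_parameters(g=8, h=10, v=0))   # 19, as in Eq. 7
```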

Models with interaction variable: Now consider models with interaction variables (i.e., v = 1, 2,…). According to Eq. 5, the number of parameters in a model with interaction variables is calculated differently from that of a model without an interaction variable. A few examples are presented to provide a better understanding of the equation. For instance, a model with interaction variables up to the first order (v = 1), two single quantitative independent variables (g = 2) and five single dummy variables (h = 5) is presented in Eq. 8:

Y = β0 + β2X2 + β4X4 + βDD + βBB + βRR + βAA + βGG + β24X24 + β2DX2D + β2BX2B + β2RX2R + β2AX2A + β2GX2G + β4DX4D + β4BX4B + β4RX4R + β4AX4A + β4GX4G + u    (8)

where X2 and X4 are single quantitative independent variables, D, B, R, A and G are single dummy variables and X24, X2D, X2B, X2R, X2A, X2G, X4D, X4B, X4R, X4A and X4G are first-order interaction variables. This leads to:

NP = [C(2, 1) + C(2, 2)] + 5(2 + 1) + 1 = 3 + 15 + 1 = 19

Next, consider a larger model with a higher order of interaction, for instance a model with fifth-order interaction (v = 5), six single quantitative independent variables (g = 6) and five single dummy variables (h = 5), as presented in Eq. 9:

Y = β0 + β1X1 + β2X2 + … + β6X6 + βDD + βBB + βRR + βAA + βGG + β12X12 + β13X13 + … + β123456X123456 + u    (9)

In Eq. 9, X1, X2, X3, X4, X5 and X6 are single quantitative independent variables, X12, X13, X14, X15, X16, X23, X24, X25, X26, X34, X35, X36, X45, X46, X56, X1D, X1B, X1R, X1A, X1G, X2D, X2B, X2R, X2A, X2G, X3D, X3B, X3R, X3A, X3G, X4D, X4B, X4R, X4A, X4G, X5D, X5B, X5R, X5A, X5G, X6D, X6B, X6R, X6A and X6G are first-order interaction variables, X123, X124, X125, X126, X134, X135, X136, X145, X146, X156, X234, X235, X236, X245, X246, X256, X345, X346, X356 and X456 are second-order interaction variables, X1234, X1235, X1236, X1245, X1246, X1256, X1345, X1346, X1356, X1456, X2345, X2346, X2356, X2456 and X3456 are third-order interaction variables, X12345, X12346, X12356, X12456, X13456 and X23456 are fourth-order interaction variables and X123456 is the fifth-order interaction variable. The total number of parameters is then:

NP = [C(6, 1) + C(6, 2) + … + C(6, 6)] + 5(6 + 1) + 1 = 63 + 35 + 1 = 99

Lastly, consider another example of a larger model, which has interaction variables up to the sixth order and nine single dummy variables. Consider Eq. 10, a model with the highest order of interaction v = 6, seven single quantitative independent variables (g = 7) and nine single dummy variables (h = 9):

[Eq. 10: a model with seven single quantitative variables X1, …, X7, nine single dummy variables and all interaction variables up to the sixth order, ending with X1234567]    (10)

Here, X123456, X123457, X123467, X123567, X124567, X134567 and X234567 are fifth-order interaction variables and X1234567 is the sixth-order interaction variable (the single variables and the lower-order interaction variables are defined analogously to those in Eq. 9). The total number of parameters is then:

NP = [C(7, 1) + C(7, 2) + … + C(7, 7)] + 9(7 + 1) + 1 = 127 + 72 + 1 = 200    (11)

As can be seen from these illustrations, the number of parameters calculated using the derived equation tallies with that obtained by manual counting.
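The same tallies can be cross-checked by enumerating the variables directly, mirroring the manual listings above. The Python sketch below generates the single variables, the quantitative interaction products up to order v and the first-order quantitative-dummy interactions (dummy-dummy interactions and dummy interactions with higher-order products are not included, consistent with the models above). The function name and, for Eq. 10, the dummy variable labels are hypothetical.

```python
from itertools import combinations

def enumerate_variables(quantitative, dummies, v):
    """List every independent variable of a hierarchical model (a sketch).

    quantitative: names of the single quantitative variables
    dummies: names of the single dummy variables
    v: highest order of interaction among the quantitative variables
    """
    terms = list(quantitative) + list(dummies)
    # products of 2..v+1 quantitative variables, i.e. interactions of order 1..v;
    # a concatenated label such as 'X1X2' stands for the product X1*X2
    for size in range(2, v + 2):
        terms += ["".join(c) for c in combinations(quantitative, size)]
    # first-order interactions between each quantitative and each dummy variable
    terms += [q + d for q in quantitative for d in dummies]
    return terms

# Eq. 9: g = 6, h = 5, v = 5 -> 98 variables, i.e. 99 parameters with the intercept
variables = enumerate_variables(["X1", "X2", "X3", "X4", "X5", "X6"],
                                ["D", "B", "R", "A", "G"], v=5)
print(len(variables) + 1)   # 99

# Eq. 10: g = 7, h = 9, v = 6 -> 200 parameters (dummy labels here are hypothetical)
variables = enumerate_variables([f"X{i}" for i in range(1, 8)],
                                list("BCLEWKAGH"), v=6)
print(len(variables) + 1)   # 200
```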

RESULTS AND DISCUSSION

The proposed equation defined in Eq. 5 is especially useful for counterchecking the variables when listing all of the possible models in an analysis, because some of the variables might be missed out when a possible model involves a large number of parameters.

[Table 2: All possible models for two single quantitative independent variables (X1, X2) and one single dummy variable (D), with the corresponding g, h, v and NP values]

Table 2 shows all the possible models for an analysis that has two single quantitative independent variables (X1 and X2) and one single dummy variable (D).

In Table 2, with the information on g, h and v, the NP of each possible model M1-M12 can be computed using Eq. 5. The number of parameters of each possible model can then be counterchecked against the computed NP values. For simplicity, each of the 12 models can be written in the general form of Eq. 4.

After introducing the equation for counting the number of parameters of a possible model, the way of getting the number of parameters of a selected model, (k+1), is presented8,9. Model M32, which was mentioned earlier in Eq. 3, is used as an illustration. The multicollinearity test is first conducted on the possible model M32. The removal of the multicollinearity source variables from this model is shown in Tables 3-5. This study uses the modified method in removing multicollinearity source variables (Excel command: COUNTIF()).

In Table 3, all of the multicollinearity source variables (variables with absolute correlation coefficient values greater than or equal to 0.9500) are circled. It is found that variables X12, D, X1D and X2D each have a frequency of 2, so, according to Zainodin et al.3, model M32 belongs to case B. To avoid confusion between the dummy variable B and case B, case B is referred to as case 2 in this study.

Similarly, case C is referred to as case 3 in this study. Based on the removal steps for case 2, variable X12, which has the weakest absolute correlation coefficient with the dependent variable Y compared with variables D, X1D and X2D, is removed from model M32. The same removal steps are carried out on the reduced model M32.1, as presented in Table 4.

After removing variable X2D from model M32.1 in Table 4, the detailed correlation coefficients of the reduced model M32.2 are shown in Table 5. It is observed that variables D and X1D each have the highest frequency of one, so model M32.2 belongs to case 3. Therefore, variable X1D, which has the weaker absolute correlation coefficient with the dependent variable Y, is removed from this model. The resulting model free from multicollinearity is M32.3.

Table 6 shows that model M32.3 is free from multicollinearity source variables, because all of the absolute correlation coefficient values between the independent variables are less than 0.9500 (except the diagonal values).
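A simplified sketch of the removal steps described above is given below: variables are paired off whenever their absolute correlation reaches 0.9500, the frequency of each variable in such pairs is counted (the role played by the Excel COUNTIF() mentioned earlier) and, among the most frequent variables, the one with the weakest absolute correlation with Y is dropped; the process repeats until no high-correlation pair remains. This is an illustration of the idea only, not the authors' exact case-2/case-3 procedure, and the function name is an assumption.

```python
import pandas as pd

def remove_multicollinearity(data: pd.DataFrame, y: str, threshold: float = 0.95):
    """Iteratively drop multicollinearity source variables.

    A simplified sketch of the removal steps described above, not the
    authors' exact case-2/case-3 procedure.
    """
    data = data.copy()
    while True:
        x_cols = [c for c in data.columns if c != y]
        corr = data[x_cols].corr().abs()
        # count how often each variable appears in a pair with |r| >= threshold
        freq = {c: 0 for c in x_cols}
        for i, a in enumerate(x_cols):
            for b in x_cols[i + 1:]:
                if corr.loc[a, b] >= threshold:
                    freq[a] += 1
                    freq[b] += 1
        if max(freq.values(), default=0) == 0:
            return data                  # no multicollinearity source variables left
        top = max(freq.values())
        candidates = [c for c in x_cols if freq[c] == top]
        # among the most frequent variables, drop the one with the weakest
        # absolute correlation with the dependent variable
        weakest = min(candidates, key=lambda c: abs(data[c].corr(data[y])))
        data = data.drop(columns=weakest)
```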

From model M32.3, it can be seen that the number of variables removed due to the multicollinearity problem is 3; therefore, b for model M32.3 is 3. More details on the definition of the model name can be found in3,4,6,10. The resulting model M32.3 is thus free from multicollinearity effects and the coefficient test is conducted on it, in order to eliminate insignificant variables from model M32.3.

In Table 7, it is found that variable X2 has the highest p-value among the independent variables and that this p-value is greater than 0.05 (since the number of single quantitative independent variables is greater than 5 and the coefficient test is two-tailed, the level of significance is set at 10%; this is based on the recommendation of11). Thus, variable X2 is eliminated from model M32.3 and the reduced model is called model M32.3.1, as one variable is eliminated due to insignificance. Details of the coefficient test can be found in8,12,13.
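The coefficient test can be sketched as backward elimination by p-value, assuming statsmodels is available. The function name, the DataFrame inputs and the default 5% cut-off are illustrative choices (the study itself adjusts the significance level as noted above), so this is not the authors' exact software.

```python
import statsmodels.api as sm

def eliminate_insignificant(X, y, alpha=0.05):
    """Repeatedly drop the variable with the largest p-value above alpha
    (a sketch of the coefficient test). X is a DataFrame of independent
    variables and y is the dependent variable."""
    X = X.copy()
    while True:
        fit = sm.OLS(y, sm.add_constant(X)).fit()
        pvals = fit.pvalues.drop("const")   # p-value of each coefficient
        worst = pvals.idxmax()              # least significant variable
        if pvals[worst] <= alpha:
            return fit                      # all remaining variables are significant
        X = X.drop(columns=worst)           # eliminate and refit
```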

Table 8 shows that model M32.3.1 is free from insignificant variables because the p-values of both variable X1 and variable D are less than 0.05. From Table 8, it is found that 3 parameters (the constant of the model and the coefficients of variables X1 and D) are left in model M32.3.1; in other words, (k+1) equals 3. From the model name M32.3.1, it can be seen that one variable was eliminated in the coefficient test, so c equals 1. As mentioned earlier for Eq. 3, there are seven unknown parameters to be estimated for model M32, so NP equals 7. Thus, by knowing NP, b and c for model M32.3.1 (i.e., Ma.b.c), the number of parameters (k+1) of model M32.3.1 can be counterchecked using the equation proposed in this study as:

(k + 1) = NP - b - c = 7 - 3 - 1 = 3    (12)

Therefore, Table 8 shows that the number of parameters left in the selected model M32.3.1 is the same as the value obtained from the proposed equation in Eq. 12.
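The countercheck in Eq. 12 is simply NP minus the variables removed in the two tests. A one-line helper (the name is an assumption) makes the M32.3.1 example concrete:

```python
def selected_model_parameters(NP: int, b: int, c: int) -> int:
    """Parameters (k+1) left in a selected model Ma.b.c: NP parameters in the
    possible model, minus b variables removed for multicollinearity,
    minus c variables eliminated as insignificant."""
    return NP - b - c

print(selected_model_parameters(NP=7, b=3, c=1))   # 3, as in Eq. 12 for model M32.3.1
```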

[Table 3: Correlation coefficients of model M32, with multicollinearity source variables (|r| ≥ 0.9500) circled]

[Table 4: Correlation coefficients of the reduced model M32.1]

[Table 5: Correlation coefficients of the reduced model M32.2]

[Table 6: Correlation coefficients of model M32.3, free of multicollinearity source variables]

[Table 7: Coefficient test on model M32.3]

[Table 8: Coefficient test on the selected model M32.3.1]

In line with the above discussion, other researchers have also highlighted the importance of parameter counting. They ranked the parameters of a model (by importance, significance, dependency, etc.) based on the magnitude of the coefficients11,14,15.

CONCLUSION

This study has proposed a new equation for counting the number of parameters of each of the possible models (details can be found in phase 1 and in3,4). The equation also helps to countercheck the number of parameters left in the selected model, which is free from multicollinearity and from insignificant variables. Its application was demonstrated on models with and without interaction variables.

As can be seen from the previous section, it takes a lengthy time to calculate the number of parameters in a model manually, especially for bigger models such as those in Eq. 9-10. Instead of counting the parameters one by one, the equation established in this study allows researchers to obtain the number of parameters in an easier, faster yet accurate way. Besides, human errors caused by manual counting, which have happened during model development, can be minimised or avoided. The proposed equation also helps to save a tremendous amount of time where analyses involve complex iterations or repeated tasks, especially in software development.

REFERENCES

1. Ramanathan, R., 2002. Introductory Econometrics with Applications. 5th Edn., Harcourt College Publishers, Ohio, USA, ISBN-13: 9780030341861, Pages: 688.

2. Gujarati, D.N. and D.C. Porter, 2009. Basic Econometrics. 5th Edn., McGraw-Hill Irwin, New York, USA, ISBN: 9780071276252, Pages: 922.

3. Zainodin, H.J., A. Noraini and S.J. Yap, 2011. An alternative multicollinearity approach in solving multiple regression problem. Trends Applied Sci. Res., 6: 1241-1255.

4. Yahaya, A.H., N. Abdullah and H.J. Zainodin, 2012. Multiple regression models up to first-order interaction on hydrochemistry properties. Asian J. Math. Stat., 5: 121-131.

5. Anastasopoulos, P.C. and F.L. Mannering, 2009. A note on modeling vehicle accident frequencies with random-parameters count models. Accid. Anal. Prev., 41: 153-159.

6. Mazzocchetti, D., A.M. Berti, G. Lucchetti, R. Sartini, F. Colle, P. Iudicone and L. Pierelli, 2012. Evaluation of volume and total nucleated cell count as cord blood selection parameters. Am. J. Clin. Pathol., 138: 308-309.

7. Jaccard, J., 2001. Interaction Effects in Logistic Regression. SAGE Publications, London, Pages: 70.

8. Lind, D.A., W.G. Marchal and S.A. Wathen, 2012. Statistical Techniques in Business and Economics. 15th Edn., McGraw-Hill, New York.

9. Chittawatanarat, K., S. Pruenglampoo, V. Trakulhoon, W. Ungpinitpong and J. Patumanond, 2012. Development of gender- and age group-specific equations for estimating body weight from anthropometric measurement in Thai adults. Int. J. Gen. Med., 5: 65-80.

10. Zeileis, A., 2004. Econometric computing with HC and HAC covariance matrix estimators. J. Statist. Software, 11: 1-17.

11. Ansorge, H.L., S. Adams, A.F. Jawad, D.E. Birk and L.J. Soslowsky, 2012. Mechanical property changes during neonatal development and healing using a multiple regression model. J. Biomechan., 45: 1288-1292.

12. Emuchay, C.I., S.O. Okeniyi and J.O. Okeniyi, 2014. Correlation between total lymphocyte count, hemoglobin, hematocrit and CD4 count in HIV patients in Nigeria. Pak. J. Biol. Sci., 174: 57-573.

13. Furnstahl, R.J. and B.D. Serot, 2000. Parameter counting in relativistic mean-field models. Nucl. Phys. A, 671: 447-460.

14. Robinson, P.S., T.W. Lin, A.F. Jawad, R.V. Iozzo and L.J. Soslowsky, 2004. Investigating tendon fascicle structure-function relationships in a transgenic-age mouse model using multiple regression models. Ann. Biomed. Eng., 32: 924-931.

15. Festing, M.F., 2006. Design and statistical methods in studies using animal models of development. ILAR J., 47: 5-14.

