
Journal of Applied Sciences

Year: 2011 | Volume: 11 | Issue: 16 | Page No.: 3015-3021
DOI: 10.3923/jas.2011.3015.3021
Multicollinearity Problem in Cobb-Douglas Production Function
Maryouma Enaami, Sazelli Abdul Ghani and Zulkifley Mohamed

Abstract: Cobb-Douglas Production Functions (CDPF) are among the best-known production functions used in applied production analysis. The estimation of production functions in general, and of the CDPF in particular, presents many problems. Multicollinearity arising in least squares estimation of the CDPF is not new; it is a problem that emerged with the model itself. In this study, an estimation method for CDPF parameters based on partial least squares path modeling (PLS-PM) is developed to address the attendant multicollinearity problem. The newly developed method is then applied to agricultural production data obtained from the Al-Kufra Agricultural Production Project, Libya. Measures such as composite reliability and goodness-of-fit strongly suggest that the indicators represent their respective latent constructs well. Consequently, a further investigation of the model is pursued and an analysis of the PLS-PM is performed.


How to cite this article
Maryouma Enaami, Sazelli Abdul Ghani and Zulkifley Mohamed, 2011. Multicollinearity Problem in Cobb-Douglas Production Function. Journal of Applied Sciences, 11: 3015-3021.

Keywords: ordinary least squares, partial least squares-path modeling, multicollinearity, Cobb-Douglas production function, structural equation models, wheat inputs

INTRODUCTION

Linear regression based on Ordinary Least Squares (OLS) is a simple and feasible method for analyzing linear relationships but is inappropriate when relationships are non-linear. However, a non-linear relationship between an independent variable and the dependent one can sometimes be converted into a linear relationship by a numerical transformation of the variables (D’Ambra and Sarnacchiaro, 2010). Exponential relationships, which are quite common in rational theories of economics, such as the CDPF, can be turned into linear relationships by taking logs of the separate variables (Pennings et al., 2006). In applied work, most researchers in economics commence by estimating the CDPF using OLS, hoping to obtain estimates of the labor and capital output elasticities that look plausible and are interpretable from a theoretical point of view (Armagan and Ozden, 2007). But the OLS method has problems such as multicollinearity, which often exists between economic factors and may greatly affect parameter estimation (Barrios and Vargas, 2007).

Multicollinearity seriously affects the results: it inflates the variance of the OLS estimates, reduces the reliability of the model and compromises its stability (El-Salam, 2011). When the regressors are nearly linearly dependent, the determinant of their cross-product matrix is close to zero, so the diagonal elements of the covariance matrix of the estimates become very large. Equivalently, the Variance Inflation Factor (VIF) becomes very large and is infinite under exact collinearity. Moreover, when the parameters are estimated from different samples of the data, multicollinearity leads to fluctuating, unstable estimates (D’Ambra and Sarnacchiaro, 2010). All these implications greatly decrease the accuracy of the OLS estimates and mask the model’s significance. In addition, the OLS method needs a large amount of sample data to ascertain its statistical validity (Shang and Zhang, 2009). However, given the macroeconomic situation in the agricultural sector, long data series with a steady trend are often unavailable (Diao et al., 2007). The PLS method is adopted here for analyzing agricultural data to avoid the preceding limitations of OLS. PLS is especially good at dealing with small samples, many variables and multicollinearity, and it can greatly improve the reliability and precision of the model (Shang and Zhang, 2009).

Economic problems underlie many events or problems that seem hard to explain and solve (Webster, 2003). Efforts towards economic development have become increasingly important (Okafor and Eiya, 2011). As described above, OLS is not the best method for this purpose. To address these issues, the present study proposes estimating the CDPF parameters using Structural Equation Modeling (SEM) implemented with a PLS-PM method, applied to Libyan agricultural sector data. A PLS-PM is described by two models: (1) a measurement model relating the Manifest Variables (MVs) to their own Latent Variables (LVs) and (2) a structural model relating some endogenous LVs to other LVs. The structural model is also called the inner model and the measurement model the outer model (Tenenhaus et al., 2005; Vinzi et al., 2009). According to the PLS-PM structure, each part of the model needs to be validated: the measurement model, the structural model and the overall model. The multicollinearity problem was first checked by means of the VIF and the suggested PLS-PM method was then applied to the CDPF to avoid the preceding limitations of OLS. The PLS-PM-CD model yields very good results and provides important new insights. This suggests that the approach may solve estimation problems not only in the agricultural sector but may also offer an effective path for macroeconomic development analysis. The strength of this study lies in the development of a new model based on the CDPF, with PLS-PM used for parameter estimation, which attempts to solve the multicollinearity problem.

Cobb-Douglas production function (CDPF): Production functions are a basic component of all economics domains. As such, the estimation of production functions has a long history in applied economics, starting as early as the early 1800s. Unfortunately, this history does not guarantee unequivocal success, as many of the econometric problems that hampered early estimation are still an issue today (Ackerberg et al., 2006). In economics, the CD form of the production function is widely used to represent the relationship of an output to inputs. It was proposed by Knut Wicksell (1851-1926) and tested against statistical evidence by Charles Cobb and Paul Douglas during the period 1900-1928 (Cobb and Douglas, 1928).

Partial least squares-path modeling (PLS-PM): Generally speaking, PLS-PM is a statistical approach for modeling complex multivariable relationships between observed and latent variables (Vinzi et al., 2009). In the past few years, such approaches have enjoyed increasing popularity in several sciences. Structural Equation Models (SEMs) include statistical methodologies that allow us to estimate the causal relationships linking two or more latent complex concepts, each measured through a number of observable indicators (Chin et al., 2009). From the viewpoint of structural equation modeling, PLS-PM is a component-based approach in which the concept of causality is formulated in terms of linear conditional expectation. As an alternative to the classical covariance-based approach, PLS-PM is claimed to seek optimal linear predictive relationships rather than causal mechanisms, thus privileging a discovery process oriented towards predictive relevance over the statistical testing of causal hypotheses. PLS-PM is a component-based estimation method. It is an iterative algorithm that solves the blocks of the measurement model separately and then, in a second step, estimates the path coefficients in the structural model (Tenenhaus, 2008). According to the PLS-PM structure, each part of the model needs to be validated: the measurement model, the structural model and the overall model.
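The two-step procedure described above can be illustrated with a minimal sketch. It assumes reflective blocks (Mode A outer weights) and the centroid inner scheme, neither of which is stated in the text, and it runs on simulated data with hypothetical block sizes and path values. The function names `standardize` and `pls_pm` are illustrative and do not belong to any PLS package.

```python
import numpy as np

def standardize(a):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def pls_pm(blocks, paths, tol=1e-8, max_iter=500):
    """Minimal PLS-PM sketch (Mode A, centroid scheme).

    blocks: list of (n, k_j) indicator arrays, one per latent variable.
    paths:  (J, J) 0/1 array, paths[i, j] = 1 if LV i points to LV j.
    Assumes every LV is linked to at least one other LV.
    """
    X = [standardize(np.asarray(b, dtype=float)) for b in blocks]
    paths = np.asarray(paths)
    n, J = X[0].shape[0], len(X)
    linked = (paths + paths.T) > 0              # neighbours in the inner model
    w = [np.ones(x.shape[1]) for x in X]        # starting outer weights
    for _ in range(max_iter):
        # Outer estimate: each LV score is a weighted sum of its own indicators.
        scores = np.column_stack([standardize(x @ wj) for x, wj in zip(X, w)])
        # Inner estimate: signed sum of the scores of neighbouring LVs.
        signs = np.sign(np.corrcoef(scores, rowvar=False)) * linked
        inner = scores @ signs
        # Mode A update: covariance of each indicator with the inner estimate.
        w_new = [x.T @ inner[:, j] / n for j, x in enumerate(X)]
        change = max(np.max(np.abs(a - b)) for a, b in zip(w_new, w))
        w = w_new
        if change < tol:
            break
    scores = np.column_stack([standardize(x @ wj) for x, wj in zip(X, w)])
    # Second step: path coefficients by OLS on the latent variable scores.
    coefs = np.zeros((J, J))
    for j in range(J):
        pred = np.flatnonzero(paths[:, j])
        if pred.size:
            beta, *_ = np.linalg.lstsq(scores[:, pred], scores[:, j], rcond=None)
            coefs[pred, j] = beta
    return scores, w, coefs

# Simulated example: two input blocks pointing to a single-indicator output.
rng = np.random.default_rng(3)
n = 200
f1, f2 = rng.normal(size=n), rng.normal(size=n)
out = 0.6 * f1 + 0.3 * f2 + 0.3 * rng.normal(size=n)
block1 = np.column_stack([f1 + 0.2 * rng.normal(size=n) for _ in range(2)])
block2 = np.column_stack([f2 + 0.2 * rng.normal(size=n) for _ in range(3)])
block3 = out[:, None]
paths = np.array([[0, 0, 1],
                  [0, 0, 1],
                  [0, 0, 0]])
scores, weights, coefs = pls_pm([block1, block2, block3], paths)
print(coefs[:, 2].round(3))   # standardized path coefficients into the output
```

The sketch leaves out sign normalization of the scores, bootstrap inference and the other inner weighting schemes that full implementations offer.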

DATA SOURCES

Agriculture is still one of the most important sectors in many economies and agricultural activities provide developing countries with food and revenue (World Bank, 2007). The agricultural sector is indispensable to the process of economic development (Kamat et al., 2007). Recognizing this, planners in Libya have emphasized the development of agriculture and allied sectors from the very beginning of the country's economic planning process. This study collected and analyzed data from the wheat sector of the Al-Kufra Agricultural Production Project for the period from 1960 to 2010. These data cover almost all important input economic activities. All reference data were collected from Libyan governmental sources (mainly government reports such as the Economic Survey of Libya), including the Agricultural Research Center, the General Directorate of Agricultural Projects Productivity and the Project to Support National Capacities for Data Collection and Analysis Farm.

RESEARCH METHODOLOGY

The variables: The following variables have been used to estimate the model:

Output items: (output) Wheat production (ton ha-1). We used the wheat production data because wheat is the most important crop among all cereal crops
Input data: Data on land, capital and other important inputs needed for this analysis include the following:

(LW) Land data requirements which include: land (thousand ha of land farmed with wheat) and water use (Million cubic meters of water ha-1)
(AR) Agricultural inputs, which comprise the per-hectare averages of seeds (seed ton ha-1), chemical fertilizers (fertilizer ton ha-1) and pesticides (pesticides Balter ha-1)
(HW) Hours of operation (h day-1); labor use, where the labor input was measured in person-year equivalents of workers directly engaged in farm production; and wages (average cost per season of wages and salaries, social security, camping costs and administrative costs)
(OP) Operation and maintenance; average cost per season for electricity, fuel, spare parts and oil and lubrications

THE COBB-DOUGLAS FUNCTION

In a recent study, Enaami et al. (2011) outline a methodology in which the production function is specified in Cobb-Douglas form:

(1)

where, Y is wheat crop output, the coefficient α0 is the total factor efficiency parameter for composite primary factor inputs in sector i, the parameters α1, α2, α3, α4, α5, α6, α7, α8, α9 and α10 are production elasticities and x1 = water, x2 = land, x3 = seeds, x4 = chemical fertilizer, x5 = pesticide, x6 = hours of operation, x7 = wages, x8 = spare parts, x9 = fuel, x10 = oil and lubrications and x11 = electricity. Equation 1 demonstrates that the relationship between output and inputs is nonlinear. However, log-transforming the variables yields a model that is linear in the parameters (Eq. 2).
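For reference, a Cobb-Douglas function with n inputs and its log-linear form can be written as follows. This is a sketch of the general textbook form in the notation defined above (Y, α0, αi, xi), not a reproduction of the authors' Eq. 1 and 2. The multiplicative disturbance e^ε is an assumption added here so that the log form has an additive error term.

```latex
Y = \alpha_0 \prod_{i=1}^{n} x_i^{\alpha_i} \, e^{\varepsilon}
\quad\Longrightarrow\quad
\ln Y = \ln \alpha_0 + \sum_{i=1}^{n} \alpha_i \ln x_i + \varepsilon
```

In the log form each αi is the elasticity of output with respect to input xi, which is why the model can be fitted by linear methods.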

Let us now make the main aims of the work more precise. Since the CDPF models mentioned above have been estimated only by least squares, it is proposed that the log-linear model (Eq. 2) instead be estimated with the PLS estimator, which first provides an estimate of the measurement model and then describes the structural relationships between the latent variables through PLS path modeling.

(2)

Partial Least Squares (PLS) analysis: assessment of the measurement and structural models: Evaluation of a research model using PLS analysis consists of two distinct steps. The first step is the assessment of the measurement model, which evaluates the characteristics of the latent variables and of the measurement items that represent them. The second step is the assessment of the structural model, which evaluates the relationships between the latent variables as specified by the research model. The results of the PLS analysis, which was conducted using the LVPLS package, are presented in that order. According to Enaami et al. (2011), the external model consists of five measurement models: land and water (LW) (η1); seeds, chemical fertilizer and pesticide (AR) (η2); hours of operation and wages (HW) (η3); fuel, spare parts, oil and lubrications and electricity (OP) (η4); and wheat crop (Output) (η5). The internal model consists of five structural models, one for each of the same latent variables: LW (η1), AR (η2), HW (η3), OP (η4) and Output (η5). In addition to the usual equation systems for the measurement and structural models, a path diagram links (i) the exogenous and endogenous variables and (ii) the indicator variables with the exogenous and endogenous latent variables. The suggested methodology is illustrated by applying it to data on an important product (wheat) in the Libyan agricultural sector. In the theoretical study the following additive model was included:

(3)

where, the coefficient Λ0 is the total factor efficiency parameter for composite primary factor inputs in sector i and the parameters Λ15, Λ25, Λ35 and Λ45 are production elasticities.

The overall structural model and the measurement models are shown in Fig. 1. The measurement model describes the latent variables LW, AR, HW, OP and Output, denoted η1, η2, η3, η4 and η5. The yi (i = 1, ..., 12) are the indicator variables (water, land, seeds, chemical fertilizer, pesticide, hours of operation, wages, spare parts, fuel, oil and lubrications and electricity), the λi (i = 1, ..., 12) are the correlation coefficients between the indicator variables and their latent variables, the δi (i = 1, ..., 12) are the measurement errors and Λ15, Λ25, Λ21, Λ31, Λ32, Λ35, Λ41, Λ42 and Λ45 are the regression coefficients relating the exogenous latent variables to the endogenous latent variables. The structural model describes the relationships among the latent variables LW, AR, HW, OP and Output. The model was formulated following Kherallah et al. (2000), Mahagayu et al. (2007) and Carver (2009), who reported a correlation between Output and each of LW, AR, OP and HW.

RESULTS

In this part of the study, the working database has been extended and improved. As Eq. 1 shows, the relationship between output and inputs is non-linear. However, after log-transforming the variables, we obtain the linear model of Eq. 2, with the following results:

(4)

The negative coefficients in Eq. 4 indicate that water, seeds, fertilizer, pesticide, hours of operation and oil and lubrications have a negative relationship with output, a result that contradicts both economic and statistical logic. In light of this, the researchers sought a better way of dealing with the problem.
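One way to see why collinear inputs can produce implausible, even wrongly signed, elasticities is to compare OLS on two simulated log-linear designs that share the same true coefficients. The sketch below uses synthetic data only. The elasticities 0.4 and 0.3, the sample size and the noise levels are hypothetical values chosen for illustration, not figures from the study.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and standard errors (an intercept is added)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    cov = sigma2 * np.linalg.inv(A.T @ A)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 50
ln_x1 = rng.normal(size=n)
ln_x2_indep = rng.normal(size=n)                  # unrelated second input
ln_x2_coll = ln_x1 + 0.01 * rng.normal(size=n)    # nearly a copy of ln_x1
eps = 0.1 * rng.normal(size=n)

# Same (hypothetical) true elasticities in both designs.
y_indep = 1.0 + 0.4 * ln_x1 + 0.3 * ln_x2_indep + eps
y_coll = 1.0 + 0.4 * ln_x1 + 0.3 * ln_x2_coll + eps

beta_indep, se_indep = ols(np.column_stack([ln_x1, ln_x2_indep]), y_indep)
beta_coll, se_coll = ols(np.column_stack([ln_x1, ln_x2_coll]), y_coll)
print("independent inputs:", beta_indep.round(3), se_indep.round(3))
print("collinear inputs:  ", beta_coll.round(3), se_coll.round(3))
```

In the collinear design the sum of the two slopes is still estimated well, but the individual slopes have very large standard errors, so either one can easily come out negative.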

Multicollinearity detection: This study shows that in this type of research multicollinearity may be present and lead to unstable OLS regressions. There are many methods for detecting multicollinearity. One is to compute the correlation matrix of the predictor variables and analyze the results: a very high correlation coefficient between any two variables may indicate that they are collinear. This method is easy but it cannot produce a clear estimate of the degree of multicollinearity. The VIF is another approach to testing for multicollinearity. Generally, when VIF>10, it is assumed that high multicollinearity exists between the tested variables (Adnan et al., 2006). All the indicator variables of wheat yield had VIF values greater than 10 except the variable wages. Thus, the VIF shows that there is a problem of multicollinearity among the variables. Based on these popular methods of testing, it can be concluded that there was a problem of multicollinearity in the data. Therefore, the suggested PLS-PM-CD model was applied to address the multicollinearity problem.
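The VIF screening described above can be sketched as follows, using the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The data are simulated and the variable names `land`, `water` and `wages` are only labels borrowed from the text. No values from the study are reproduced.

```python
import numpy as np

def vif(X):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(1)
n = 51
land = rng.normal(size=n)
water = land + 0.1 * rng.normal(size=n)   # tracks land closely
wages = rng.normal(size=n)                # unrelated to the other two
X = np.column_stack([land, water, wages])

vifs = vif(X)
flagged = vifs > 10                       # rule of thumb cited in the text
print(vifs.round(2), flagged)
```

The same function can be applied to latent variable scores instead of raw indicators, which is how the check after PLS-PM estimation is described later in the text.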

Modeling partial least squares-path modeling for Cobb-Douglas production function (PLS-PM-CD): The CD parameters are estimated by PLS-PM. The model can be divided into two parts. The measurement model is the part that relates measured variables to latent variables. The structural model is the part that relates latent variables to one another. The software used in this work is LVPLS (Lohmoller). The package fits the data to the specified model and produces results that include overall model fit statistics and parameter estimates (Fig. 1). The relationship between the latent variables is illustrated in Table 1.

Fig. 1: Path diagram of the PLS-PM-CD model

The correlation for the model is more than 0.75, which implies a fairly strong positive relationship. The correlation matrix of the latent variables is also summarized in Table 1.

Assessment of the measurement model: In order for a model to pass the test of composite reliability assessment, it must have the following properties.

It should have an internal consistency (composite reliability) above 0.7 (Nunnally, 1978)
Each construct must meet the minimum reliability of 0.60 (Bagozzi and Yi, 1988)
The variance shared by each construct and its measures (Average Variance Extracted, AVE) should be greater than 0.5 (Fornell and Larcker, 1981)

In Table 3, the results of the assessment show that the composite reliability is between 0.85 and 1.00, AVE>0.7 and Cronbach’s Alpha>0.70. These results demonstrate that the research model passed the test of composite reliability. The results of the assessment of the reliability of the individual measures are also provided: the individual loadings of all items are greater than 0.87. This indicates that the proposed model satisfies the reliability criteria for the individual items as well.
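The reliability criteria above can be computed from standardized loadings with the usual formulas: composite reliability = (Σλ)² / ((Σλ)² + Σ(1 - λ²)) and AVE = mean of λ². The sketch below uses three hypothetical loadings, not values from Table 3.

```python
import numpy as np

def composite_reliability(loadings):
    """Composite reliability from standardized loadings of one block."""
    lam = np.asarray(loadings, dtype=float)
    num = lam.sum() ** 2
    return num / (num + np.sum(1.0 - lam ** 2))

def average_variance_extracted(loadings):
    """AVE: mean squared standardized loading of one block."""
    lam = np.asarray(loadings, dtype=float)
    return np.mean(lam ** 2)

loadings = [0.90, 0.88, 0.92]          # hypothetical standardized loadings
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)

print(f"CR = {cr:.3f}, passes 0.7 threshold: {cr > 0.7}")
print(f"AVE = {ave:.3f}, passes 0.5 threshold: {ave > 0.5}")
```

Both quantities depend only on the loadings of a single block, so they are evaluated block by block.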

The measure of internal consistency is commonly used for assessing the convergent validity of the measures (Fornell and Larcker, 1981). The evaluation includes an estimate of the size and significance of the Student's t values for the loadings of each individual item (t-values are obtained by running a bootstrapping procedure), as well as an assessment of the loadings of the measures on their own constructs. The results show that the proposed model passed the first test of convergent validity and that the t-values for all measures are significant (p<0.05). The proposed model therefore proves reliable in the foregoing tests and the analysis proceeds to the assessment of the structural model.

Assessment of the structural model: Assessment of the structural model includes testing the significance of the hypothesized relationships between the research model constructs. Once the path coefficient between two constructs in the model has been calculated, the significance level of the path can be evaluated. In VPLS, the t-values are obtained by running a bootstrapping procedure while the significance level of the path is established by using a two-tailed t-test (Vinzi et al., 2009). Table 2 summarizes the results of the bootstrap re-sampling procedure with different numbers of re-samples.
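A bootstrap t-value for a single path can be sketched as the estimate divided by the standard deviation of its bootstrap replicates. The example below treats one path between two standardized latent variable scores, where the coefficient equals their correlation. The scores are simulated, and the 200 re-samples simply echo the number mentioned in the text. This is a simplified illustration, not the procedure as implemented in VPLS.

```python
import numpy as np

def path_coefficient(x, y):
    """Slope of y on x for standardized scores (equals their correlation)."""
    return np.corrcoef(x, y)[0, 1]

def bootstrap_t(x, y, n_resamples=200, seed=0):
    """Return (estimate, bootstrap standard error, t-value) for one path."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimate = path_coefficient(x, y)
    draws = np.empty(n_resamples)
    for b in range(n_resamples):
        idx = rng.integers(0, n, size=n)      # resample cases with replacement
        draws[b] = path_coefficient(x[idx], y[idx])
    se = draws.std(ddof=1)
    return estimate, se, estimate / se

rng = np.random.default_rng(2)
n = 51
lv_in = rng.normal(size=n)                        # simulated input LV scores
lv_out = 0.8 * lv_in + 0.6 * rng.normal(size=n)   # simulated output LV scores

est, se, t = bootstrap_t(lv_in, lv_out, n_resamples=200)
print(f"path = {est:.3f}, bootstrap SE = {se:.3f}, t = {t:.2f}")
```

Repeating the call with different values of `n_resamples` gives a quick check of how stable the t-value is with respect to the number of re-samples.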

Table 1: Coefficients of the correlations between the latent variables

Table 2: Structural model PLS-PM-CD bootstrap
* indicates significance at the 0.10 level of significance while non-labeled entries are significant at the 0.05 level of significance

Table 3: The model results

Results are very stable with respect to the number of re-samples. The results in the case of 200 re-samples point out that the model works well.

As discussed earlier, there is no overall fit index in PLS-PM. Nevertheless, a global criterion of goodness of fit (the GoF index) has been proposed by Tenenhaus et al. (2005). The index was developed to take into account model performance in both the measurement and the structural components and thus provides a single measure of the model’s overall prediction performance. The GoF index is the geometric mean of the average communality index and the average R2 value, that is, the square root of their product. The result (GoF = 0.6882) shows that the model works well (Table 3). Discriminant validity can also be assessed by examining the Average Variance Extracted (AVE), which should be greater than the squared correlation estimate (Fornell and Larcker, 1981), and the correlation between the variables in the confirmatory model should not be higher than 0.8 (Bagozzi and Heatherton, 1994). The suggested model meets these criteria, which confirms the discriminant validity of the constructs. From Table 3, AvCommun (AVE) = 0.8728, which is compared with the squared correlations in Table 1. This indicates that the research model fares well with regard to the AVE and the squared correlations of the individual items. Finally, the latent variable scores were checked to determine whether the multicollinearity problem had been solved. The latent variables can be treated just like classical variables: the scores produced at the end of the PLS-PM (factor scores of the latent variables) can be checked for multicollinearity with classical tools (e.g., the VIF) by referring to the structural equation in the path model. By doing so, it was found that the VIF values are less than 5; that is, the new model shows no multicollinearity problem.
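The GoF index of Tenenhaus et al. (2005) and the Fornell-Larcker comparison can be sketched in a few lines. The two reported values (GoF = 0.6882, average communality = 0.8728) are taken from the text. The average R2 is not reported here, so the code only backs out the value implied by those two numbers (about 0.54), which is a derived figure rather than a result from the study. The inputs to the discriminant validity check are hypothetical.

```python
import math

def gof_index(avg_communality, avg_r2):
    """GoF = sqrt(average communality * average R^2)."""
    return math.sqrt(avg_communality * avg_r2)

def implied_avg_r2(gof, avg_communality):
    """Average R^2 implied by a reported GoF and average communality."""
    return gof ** 2 / avg_communality

def fornell_larcker_ok(ave_a, ave_b, corr_ab):
    """True if both AVEs exceed the squared correlation between the constructs."""
    return min(ave_a, ave_b) > corr_ab ** 2

avg_comm = 0.8728      # reported average communality (AVE)
gof = 0.6882           # reported GoF
r2_implied = implied_avg_r2(gof, avg_comm)
print(f"implied average R2 = {r2_implied:.4f}")

# Hypothetical AVEs and latent correlation, for illustration only.
print(fornell_larcker_ok(0.87, 0.87, 0.80))   # 0.87 > 0.64
print(fornell_larcker_ok(0.50, 0.60, 0.80))   # 0.50 < 0.64
```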

The overall structural production model developed in this study is shown in Fig. 1.

CONCLUSION

This study is largely theoretical and should be seen as proposing a new way to estimate the Cobb-Douglas production function. The authors propose estimating the Cobb-Douglas production function parameters through the partial least squares path modeling (PLS-PM) method, applied to Libyan agricultural sector data. Measures such as composite reliability, Cronbach’s Alpha, internal consistency and goodness-of-fit strongly suggest that the indicators represent their respective latent constructs well. Consequently, a further investigation of the model is pursued and an analysis of the PLS-PM is performed. Each block consists of a latent variable and one or more strongly related indicator variables. In other words, the model yields very good results and provides important new insights. This suggests that the approach may solve estimation problems not only in the agricultural sector but may also offer an effective path for macroeconomic development analysis.

REFERENCES

  • Ackerberg, D., K. Caves and G. Frazer, 2006. Structural identification of production functions. December 28, 2006. http://folk.uio.no/rnymoen/Ackerberg_Caves_Frazer.pdf.


  • Adnan, N., M.H. Ahmad and R. Adnan, 2006. A comparative study on some methods for handling multicollinearity problems. Matematika, 22: 109-119.


  • D'Ambra, A. and P. Sarnacchiaro, 2010. Some data reduction methods to analyze the dependence with highly collinear variables: A simulation study. Asian J. Math. Stat., 3: 69-81.


  • Armagan, G. and A. Ozden, 2007. Determinations of total factor productivity with cobb-douglas production function in agriculture: The case of aydin-Turkey. J. Applied Sci., 7: 499-502.


  • Barrios, E. and G. Vargas, 2007. Forecasting from an additive model in the presence of multicollinearity. Proceedings of the 10th National Convention in Statistics (NCS), Oct. 1-2, EDSA Shangri-La Hotel, pp: 1-11.


  • Bagozzi, R.P. and Y. Yi, 1988. On the evaluation of structural equation models. J. Acad. Market. Sci., 16: 74-94.


  • Bagozzi, R.P. and T.F. Heatherton, 1994. A general approach to representing multifaceted personality constructs: Application to state self-esteem. Struct. Equ. Model.: Multidisciplinary J., 1: 35-67.


  • Carver, B.F., 2009. Wheat: Science and Trade. John Wiley and Sons, USA., pp: 569


  • Chin, W., C. Saunders and G. Marcoulides, 2009. Foreword: A critical look at partial least squares modeling. Manage. Inform. Syst. Quart., 33: 171-175.


  • Cobb, C.W. and P.H. Douglas, 1928. A theory of production. Am. Econ. Rev., 18: 139-165.


  • Diao, X., P. Hazell, D. Resnick and J. Thurlow, 2007. The role of agriculture in development: Implications for Sub-Saharan Africa. International Food Policy Research Institute, Research Report 153. http://www.ifpri.org/sites/default/files/publications/rr153.pdf.


  • Abd El-Salam, M.E.F., 2011. An efficient estimation procedure for determining ridge regression parameter. Asian J. Math. Stat., 4: 90-97.


  • Enaami, M., A.S. Ghani and Z. Mohamed, 2011. Theoretical estimation of Cobb-Douglas production function parameter through a robust partial least squares. J. Sci. Math., Vol. 2, No. 2.


  • Fornell, C. and D.F. Larcker, 1981. Evaluating structural equation models with unobservable variables and measurement error. J. Market. Res., 18: 39-50.


  • Kamat, M., S. Tupe and M. Kamat, 2007. Indian agriculture in the new economic regime, 1971-2003: Empirics based on the Cobb Douglas production function. MPRA Paper No. 6150, December 2007. http://mpra.ub.uni-muenchen.de/6150/1/MPRA_paper_6150.pdf.


  • Kherallah, M., H. Lofgren, P. Gruhn and M.M. Reeder, 2000. Wheat policy reform in Egypt: Adjustment of local markets and options for future reforms. Research Reports No. 115, International Food Policy Research Institute, pp: 170. http://ideas.repec.org/p/fpr/resrep/115.html.


  • Mahagayu, M.C., J. Kamwaga, A. Cndiema, J. Kamundia and P. Gamba, 2007. Wheat productivity, constraints associated in the Eastern parts of Kenya Timau division. Afr. Crop Sci. Conference Proc., 8: 1211-1214.


  • Nunnally, J.C., 1978. Psychometric Theory. 2nd Edn., McGraw-Hill, New York, United States, ISBN-13: 9780070474659, Pages: 701


  • Okafor, C. and O. Eiya, 2011. Determinants of growth in government expenditure: An empirical analysis of Nigeria. Res. J. Bus. Manage., 5: 44-50.


  • Pennings, P., H. Keman and J. Kleinnijenhuis, 2006. Doing Research in Political Science. Sage, USA., pp: 324


  • Shang, W. and Y. Zhang, 2009. The relationship between rural infrastructure and economic growth based on partial least-squares regression. Networking Digital Soc. Int. Conf., 22: 127-130.


  • Vinzi, V.E., W.W. Chin, J. Henseler and H. Wang, 2009. Handbook of Partial Least Squares: Concepts, Methods and Applications. 1st Edn., Springer, Berlin, ISBN-13: 9783540328254


  • Tenenhaus, M., V.E. Vinzi, Y.M. Chatelin and C. Lauro, 2005. PLS path modeling. Comput. Stat. Data Anal., 48: 159-205.


  • Webster, T.J., 2003. Managerial Economics: Theory and Practice. Emerald Group Publishing, New Delhi, India, ISBN-13: 9780127408521, pp: 739


  • World Bank, 2007. World Development Indicators 2007. The World Bank, Washington DC., ISBN-13: 9780821369593


  • Tenenhaus, M., 2008. Component-based structural equation modelling. Total Qual. Manage. Bus. Excellence, 19: 871-886.
