**ABSTRACT**

In this study, modified estimation methods based on the maximum entropy median are proposed for fitting the unreplicated functional model to data. Two types of modification are considered: modified Wald-type estimators and modified Theil-type estimators. A coefficient of determination that estimates the proportion of explained variation when both variables are measured with error is also proposed. A Monte Carlo experiment was used to study the performance of the proposed estimators; the simulation results showed that they are more efficient than the classical estimators when the data contain outliers. A real data analysis fitting the relationship between the unemployment rate and the human development index in the Arab states is also included. The data analysis showed a negative relationship between the two variables and suggested that the Arab states should change their economic, educational and social policies to improve the quality of life in these states.


**Received:** November 14, 2011;

**Accepted:** January 21, 2012;

**Published:** March 21, 2012

**How to cite this article**

*Journal of Applied Sciences, 12: 326-335.*

**DOI:** 10.3923/jas.2012.326.335

**URL:** https://scialert.net/abstract/?doi=jas.2012.326.335

**INTRODUCTION**

The classical method for fitting a simple linear relationship between two variables, y and x, is Ordinary Least Squares (OLS), which minimizes the sum of squares of either the vertical distances (regression of y on x) or the horizontal distances (regression of x on y) of the observations from the straight line, assuming that only one variable is measured with error. In fact, most analytical methods are subject to measurement error. If such error is ignored, not only is the slope attenuated but the data are also noisier, with an increased error about the line (Carroll *et al*., 2004). Therefore, we should account for the measurement error in x by distinguishing between the measured value (x) and the true value, that is, by considering the simple linear Measurement Error Model (MEM) instead of simple linear regression. For a sample of size n, the two unobserved latent variables ξ and η of the simple MEM are assumed to satisfy the following relationship:

$$\eta_i = \alpha + \beta\,\xi_i, \qquad i = 1, 2, \ldots, n \tag{1}$$

To measure such latent variables with additive error, one uses the manifest variables Y and X, so that the observed data are:

$$X_i = \xi_i + \delta_i, \qquad Y_i = \eta_i + \varepsilon_i \tag{2}$$

where δ_{i} and ε_{i} are assumed to be mutually independent with mean (0, 0) and unknown variances (σ^{2}_{δ}, σ^{2}_{ε}). There are (n+4) parameters to be estimated in (1) and (2): α, β, σ^{2}_{δ}, σ^{2}_{ε} and the latent variables, the so-called “incidental” parameters ξ_{1}, ξ_{2},…, ξ_{n} (or equivalently η_{1}, η_{2},…, η_{n}). The presence of the incidental parameters makes the estimators inconsistent. To overcome this inconsistency, one additional piece of information about the MEM is required (Kendall and Stuart, 1979; Fuller, 1987; Cheng and Van Ness, 1999; Al-Nasser *et al*., 2005; Carroll *et al*., 2009; Fah *et al*., 2007; O’Driscoll and Ramirez, 2011). In model (1) and (2), the vertical and horizontal distances should be minimized simultaneously in order to estimate the unknown parameters; this can be done by minimizing the orthogonal distances (Fig. 1), the so-called squared perpendicular error:

Fig. 1: Orthogonal distances from the data to the line

Under normality assumptions, the distances in Fig. 1 are given as follows:

where,

The errors could then be estimated by the classical OLS or maximum likelihood estimation methods when λ is known. When λ is unknown, Jonathan and Terence (2009) proposed an estimator of it based on the fourth moments:

(3) |

where,

The difficulties that arise with the OLS and maximum likelihood estimation methods are good reasons to use alternative methods. The most efficient and simplest methods in the literature are the Wald-type estimators (Wald, 1940; Al-Radaideh *et al*., 2010; Huwang and Yang, 2000; Hussin *et al*., 2009; Al-Nasser *et al*., 2005) and the Theil-type estimators based on repeated median methods (Theil, 1950; Sen, 1968; Siegel, 1982).

In this study, modified Wald-type and Theil-type estimators based on the maximum entropy median are proposed in order to obtain new robust estimators. Moreover, a coefficient of determination for the unreplicated functional model is proposed.

**ESTIMATION METHODS OF UNREPLICATED FUNCTIONAL MODEL (URFM)**

Various methods have been suggested for estimating the unknown parameters of the URFM (Hussin *et al*., 2009; Al-Nasser *et al*., 2005). Two types of estimation methods are considered in this article: mean-based methods, i.e., Maximum Likelihood Estimation (MLE) and grouping estimation methods, and median-based methods, i.e., Theil-type estimation methods.

**Maximum likelihood estimators:** The classical MLE of model (1) and (2) can be obtained by minimizing the Mahalanobis distance:

Under the normality assumption; the log likelihood function of the URFM:

which can be rearranged in the following form:

Solari (1969) showed that there is no MLE solution for the slope if λ is unknown. Therefore, a piece of information about λ (or knowledge of both error variances) is essential to obtain a consistent estimator (Kendall and Stuart, 1979). Then, based on the given information, the MLE solution will be:

(4) |

and:

Then the corrected perpendicular distance (Cheng and Van Ness, 1999) can be obtained by:

It should be noted that when λ = 1, the URFM fit coincides with orthogonal regression, a special case of Deming regression. Also, the URFM estimate of the slope in Eq. 4 can be expressed as a combination of the regressions of y on x and x on y, which can be estimated by:

where,

Note that the two slopes entering this combination are the regression estimates of y on x and of x on y, respectively.
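As a minimal sketch of a fit with known λ, the snippet below uses the standard closed-form Deming slope. It assumes that λ is the ratio of the error variance of y to the error variance of x (σ^{2}_{ε}/σ^{2}_{δ}); that convention, the intercept taken as the line through the sample means, and the function name `deming_fit` are assumptions made for illustration.

```python
import math

def deming_fit(x, y, lam=1.0):
    """Fit a line when both variables carry error and lam is known.

    ASSUMPTION: lam = var(error in y) / var(error in x).
    lam = 1 gives the orthogonal-regression fit.
    Returns (slope, intercept).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("sxy is zero; the slope is not identified")
    d = syy - lam * sxx
    slope = (d + math.sqrt(d * d + 4.0 * lam * sxy * sxy)) / (2.0 * sxy)
    # ASSUMPTION: the fitted line passes through the sample means.
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

As λ grows (little error in x), this slope approaches the OLS slope of y on x, which matches the view of the estimate as a compromise between the two regressions.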

**Wald estimators:** One of the most popular alternatives to MLE in the literature is the grouping method, whose estimators are well known as Wald-type estimators. The method consists of ordering the observed pairs (x_{i}, y_{i}) by the magnitude of the x_{i} and then dividing the observations into groups. Wald (1940) suggested two groups, while Nair and Shrivastava (1942) and Bartlett (1949) proposed more efficient estimators by dividing the observations into three groups, such that the estimate is based on the first and last groups while the middle group is omitted. In general, the grouping method has the following procedure (Al-Nasser *et al*., 2005):

• Order the pairs (x_{i}, y_{i}) by the magnitude of x_{i}:

• Select proportions P_{1} and P_{2} such that P_{1}+P_{2}≤1, place the first nP_{1} pairs in group one (G_{1}) and the last nP_{2} pairs in another group (G_{3}), discarding the middle group of observations (G_{2}), that is to say:

where x_{pi} is the p_{i}th percentile

• Then the slope and the intercept of the URFM can be estimated, respectively, by:

where {X_{np(I)}, I = 1, 2,…, n} are the order statistics of X and {Y_{np(I)}, I = 1, 2,…, n} are the associated values of Y. When P_{1} = P_{2} = 1/2, the grouping method is called the two-group method, and when P_{1} = P_{2} = 1/3, it is called the three-group method.
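The grouping procedure can be sketched as follows. This is an illustration under stated assumptions rather than the exact estimator of the article: the slope is the ratio of the differences between the outer-group means, the intercept is taken from the overall means (the usual Wald choice), and the name `wald_grouping_fit` is hypothetical.

```python
import math

def wald_grouping_fit(x, y, p1=0.5, p2=0.5):
    """Mean-based grouping fit; returns (slope, intercept).

    p1 = p2 = 1/2 gives the two-group method and p1 = p2 = 1/3 the
    three-group method (the middle group is discarded).
    """
    if p1 <= 0 or p2 <= 0 or p1 + p2 > 1 + 1e-12:
        raise ValueError("need p1 > 0, p2 > 0 and p1 + p2 <= 1")
    # Step 1: order the pairs by the magnitude of x.
    pairs = sorted(zip(x, y), key=lambda pair: pair[0])
    n = len(pairs)
    # Step 2: group sizes; the small offset guards against round-off
    # in products such as n * (1/3).
    n1 = math.floor(n * p1 + 1e-9)
    n3 = math.floor(n * p2 + 1e-9)
    if n1 == 0 or n3 == 0:
        raise ValueError("each outer group needs at least one pair")
    low, high = pairs[:n1], pairs[n - n3:]

    def mean(values):
        return sum(values) / len(values)

    x_low, y_low = mean([p[0] for p in low]), mean([p[1] for p in low])
    x_high, y_high = mean([p[0] for p in high]), mean([p[1] for p in high])
    if x_high == x_low:
        raise ValueError("outer groups have the same mean x")
    # Step 3: slope from the outer-group means.
    slope = (y_high - y_low) / (x_high - x_low)
    # ASSUMPTION: intercept from the overall means of y and x.
    intercept = mean(list(y)) - slope * mean(list(x))
    return slope, intercept
```

Calling `wald_grouping_fit(x, y, 1/3, 1/3)` drops the middle third of the ordered pairs before the slope is computed.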

The advantages of these methods are their computational simplicity and the fact that they avoid the normality assumption. However, the method loses precision if the data contain outliers, because its estimators are mean-based. To avoid this problem, it is preferable to use the median as the center of gravity of each subgroup; the median-based Wald-type (two-group) estimators are then:

(5) |

where med(y_{i}) and med(x_{i}) denote the medians of y and x in subgroup j, i = 1, 2,…, n_{j}, j = 1, 2.

A similar modification can be made for the three-group method.

**Robust estimators:** Theil (1950) proposed a robust method for making inference about the slope of the unreplicated functional model given in Eq. 1 and 2. The method is based on the median of the slopes computed from all possible pairs of observations, assuming that all x_{i} are distinct:

$$B_{ij} = \frac{y_j - y_i}{x_j - x_i}, \qquad 1 \le i < j \le n \tag{6}$$

Sen (1968) extended the estimator in Eq. 6 to the case in which two observations have the same x-coordinate by considering only the slope values with distinct x-coordinates. Then the URFM parameters can be estimated by:

(7) |

(8) |

Maritz (1979) extended Theil’s idea by proposing a point estimate for the intercept term, taking the median of all pairwise intercept estimates:

Hence:

(9) |

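Moreover, Siegel (1982) introduced another estimate based on the repeated median method. After evaluating the Theil statistics B_{ij} given in Eq. 6 (with B_{ji} = B_{ij}), the slope of the URFM is estimated in two stages:

• First stage:

$$\hat{\beta}_i = \operatorname{med}_{j \neq i} B_{ij}, \qquad i = 1, 2, \ldots, n \tag{10}$$

• Second stage:

$$\hat{\beta}_{RM} = \operatorname{med}_{i} \hat{\beta}_i \tag{11}$$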

Then, the intercept can be estimated by using the same formula given in Eq. 8 or 9.
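The median-based estimators of this subsection can be sketched in a few lines. The slope computations follow the pairwise-slope and repeated-median definitions; the intercept is taken here as the median of y_{i} minus slope times x_{i}, which is one common choice and an assumption of this sketch, and all function names are hypothetical.

```python
from statistics import median

def pairwise_slopes(x, y):
    """Slopes B_ij for all pairs i < j with distinct x-coordinates."""
    n = len(x)
    return [(y[j] - y[i]) / (x[j] - x[i])
            for i in range(n) for j in range(i + 1, n)
            if x[j] != x[i]]

def theil_sen_fit(x, y):
    """Median of the pairwise slopes, then a median-based intercept."""
    slope = median(pairwise_slopes(x, y))
    # ASSUMPTION: intercept = median of the residuals y_i - slope * x_i.
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept

def repeated_median_slope(x, y):
    """Siegel-type slope: median over i of the median over j != i of B_ij."""
    n = len(x)
    first_stage = []
    for i in range(n):
        slopes_i = [(y[j] - y[i]) / (x[j] - x[i])
                    for j in range(n) if j != i and x[j] != x[i]]
        if slopes_i:  # skip points that share x with every other point
            first_stage.append(median(slopes_i))
    return median(first_stage)
```

In this sketch a single wild observation changes only a minority of the pairwise slopes, so the median of the slopes is unaffected as long as most pairs remain clean.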

**PROPOSED ESTIMATORS**

Two types of estimators are proposed for the URFM. The first consists of modified Wald-type and Theil-type parameter estimators based on the maximum entropy (ME) median, and the second is a coefficient of determination that estimates the proportion of variation in y that is explained by x when both variables are measured with error.

**Modified parameter estimators for URFM:** As an alternative to the sample median; Theil and O’Brien (1980) proposed the ME median which is defined as:

For an even sample size (n), the ME median is equal to the sample median. For an odd sample size, Bani-Mustafa and Al-Nasser (2008) showed that the ME median is more robust than the sample median for small samples and performs almost as well as the sample median for large samples. These results are good evidence for using the ME median instead of the sample median to estimate the URFM parameters. Accordingly, two modifications are proposed. The first uses the ME median in the Wald-type estimators, so that the estimators given in Eq. 5 become:

and:

The second modification concerns the Theil-type estimators and is divided into two groups:

• Group 1: Modification of Eq. 7 and 8:

• Group 2: Modification of the Siegel estimators given in Eq. 10 and 11:

• First stage:

• Second stage:

Then, the intercept can be estimated by using the modified formula of Eq. 8 or 9, where, B_{ij} is given in Eq. 6.
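To make the modification concrete, the sketch below swaps the sample median for an ME median inside a Theil-type fit. It assumes the commonly quoted form of the Theil and O'Brien ME median for odd n, a weighted average of the three middle order statistics with weights 1/4, 1/2 and 1/4; that form, the residual-based intercept and the function names are assumptions of this sketch and should be checked against the original definitions.

```python
from statistics import median

def me_median(values):
    """Maximum entropy median.

    ASSUMPTION: for odd n the ME median is
    0.25 * x(m-1) + 0.5 * x(m) + 0.25 * x(m+1), with x(m) the middle
    order statistic; for even n it equals the sample median.
    """
    v = sorted(values)
    n = len(v)
    if n % 2 == 0 or n < 3:
        return median(v)
    m = n // 2  # zero-based index of the middle order statistic
    return 0.25 * v[m - 1] + 0.5 * v[m] + 0.25 * v[m + 1]

def me_theil_fit(x, y):
    """Theil-type fit with the ME median in place of the sample median."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n)
              if x[j] != x[i]]
    slope = me_median(slopes)
    # ASSUMPTION: intercept from the ME median of y_i - slope * x_i.
    intercept = me_median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept
```

The same `me_median` helper could replace the group medians in a Wald-type fit or both stages of a repeated median fit.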

**Proposed coefficient of determination for the URFM:** The coefficient of determination is a measure used in statistical model analysis to assess how well a model explains and predicts future outcomes on the basis of other related information. It is the proportion of variability in a data set that is accounted for by the statistical model and thus measures the accuracy of the model. In the context of simple linear regression, this coefficient is estimated as:

For the URFM there is no such measure in the literature, however, this measure can be derived based on the second sample moments which converge in probability to their expectations:

Therefore, when both variables are measured with errors; the population coefficient of determination is given by:

Alternatively, this coefficient can be estimated by the unexplained variance based on the relationships between the Sum of Squared Errors (SSE) and the regression sum of squares (SSR). The SSE in the URFM can be estimated by:

however, this estimator is not very reliable (Cheng and Van Ness, 1999). Instead, we can use the perpendicular error:

(12) |

Moreover, for the different estimation methods, the estimator given in Eq. 12 is inconsistent and should be adjusted for the degrees of freedom, i.e., n-2. The consistent estimator of SSE is then:

Consequently, the URFM regression sum of squares (SSR) can be derived from the differences between the total sum of squares (Syy) and SSE:

Hence, the coefficient of determination, i.e., the proportion of variation in y that is explained by x when both variables are measured with error, can be estimated by:
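One reading of this construction is sketched below under stated assumptions: the perpendicular error is taken as the squared orthogonal distance of each point from the fitted line (the λ = 1 case), SSR is Syy minus SSE, and the coefficient is SSR/Syy. The degrees-of-freedom adjustment described above is deliberately left out because its exact form is not assumed here, and the function name is hypothetical.

```python
def urfm_r_squared(x, y, slope, intercept):
    """Coefficient of determination from perpendicular residuals.

    ASSUMPTIONS: orthogonal distances (lambda = 1) and no
    degrees-of-freedom adjustment of SSE.
    """
    n = len(y)
    mean_y = sum(y) / n
    # Total sum of squares of y.
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if syy == 0:
        raise ValueError("y has no variation")
    # Sum of squared perpendicular distances to the fitted line.
    sse = sum((yi - intercept - slope * xi) ** 2
              for xi, yi in zip(x, y)) / (1.0 + slope ** 2)
    ssr = syy - sse
    return ssr / syy
```

Because the perpendicular residual is never larger than the vertical one, this value is at least as large as the ordinary regression R² computed for the same line.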

**Performance of the proposed parameter estimators:** Monte Carlo experiments were conducted to evaluate the performance of the proposed estimators based on the maximum entropy median. The experiments were carried out by generating 10,000 samples from the simple linear functional relationship (SLFR) model:

assuming that; ε~N (0, 1), δ~N (0, 1) and:

where, [*] is the largest number less than or equal to *.

In order to study the robustness of the proposed estimators, the sample sizes were set to 15, 35 and 95 to ensure odd subgroup sizes for the Wald-type estimators. Moreover, we considered the contaminated case, the so-called slippage-in-variance model: for each sample, exactly one of the observed data points x_{i} (or y_{i}) is drawn from N (0, 4^{2}) and the other n-1 observations are drawn from the actual distribution, i.e., N (0, 1^{2}).

The position of the outlier was selected in two ways: fixed, with the outlier placed at the last data point, and random, with the position chosen at random in the interval (1, n). For the MLE-based estimators, we assumed that λ = 1 is known (Deming regression). The performance of all estimators was then evaluated by measuring their efficiency:

where ρ could be the intercept or the slope and its estimate could come from any of the estimators considered in this article.
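The structure of such an experiment can be sketched as follows. Everything specific in the snippet is hypothetical and chosen only for illustration: the true values α = 1 and β = 1, the uniform latent design on (0, 10), the 200 replications, and the use of the mean squared error of the slope as the performance summary (the efficiency measure of the article is not reproduced).

```python
import random
from statistics import median

def simulate_sample(n, rng, alpha=1.0, beta=1.0, outlier_index=None,
                    outlier_sd=4.0):
    """One sample with N(0, 1) errors in both variables.

    ASSUMPTION: the latent values come from a uniform(0, 10) design.
    If outlier_index is given, the x-error at that position is drawn
    from N(0, outlier_sd^2) instead (slippage in variance).
    """
    latent = [rng.uniform(0.0, 10.0) for _ in range(n)]
    x = [t + rng.gauss(0.0, 1.0) for t in latent]
    y = [alpha + beta * t + rng.gauss(0.0, 1.0) for t in latent]
    if outlier_index is not None:
        x[outlier_index] = latent[outlier_index] + rng.gauss(0.0, outlier_sd)
    return x, y

def theil_slope(x, y):
    """Median of all pairwise slopes with distinct x-coordinates."""
    n = len(x)
    return median([(y[j] - y[i]) / (x[j] - x[i])
                   for i in range(n) for j in range(i + 1, n)
                   if x[j] != x[i]])

def mse_of_slope(estimator, n=15, reps=200, beta=1.0, seed=0,
                 contaminate=True):
    """Monte Carlo mean squared error of a slope estimator."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    total = 0.0
    for _ in range(reps):
        # Fixed-position case: the outlier is the last data point.
        index = n - 1 if contaminate else None
        x, y = simulate_sample(n, rng, beta=beta, outlier_index=index)
        total += (estimator(x, y) - beta) ** 2
    return total / reps
```

Passing a different slope function as `estimator` and comparing the returned values across methods gives a simple relative ranking under the same simulated samples, since the seed fixes the data.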

Table 1: Efficiency of the SLFR model with contamination in X

*2G: Two groups and 3G: Three groups

The simulation results reported in Tables 1 and 2 showed that the ME-median-based estimators are more efficient than the MLE, Wald-type or Theil-type estimators whenever the data contain an outlier. However, when the data are generated without outliers, the MLE-based estimators perform better.

**EFFECTS OF UNEMPLOYMENT RATE ON HUMAN DEVELOPMENT INDEX**

The Unemployment Rate (UR) is an indicator of employable people in a country’s workforce who are over the age of 16 and who have either lost their jobs or have unsuccessfully sought jobs in the last month and are still actively seeking work (Kayani and Ghani, 2003; Tan and Santhapparaj, 2007; Fasoranti, 2009).

Statistically, UR is computed as the ratio of unemployed workers to the total labor force. Unemployment has lately been considered one of the major problems in the Arab states, and only three countries (UAE, Qatar and Kuwait) out of the 22 member states of the Arab League (Appendix 1) have a UR of less than 2.5%.

Table 2: Efficiency of the SLFR model with contamination in Y

*2G: Two groups and 3G: Three groups

On the other hand, since we believe that people are the real wealth of a nation, the Human Development Index (HDI) was developed around the aim of creating an enabling environment for people to enjoy long, healthy and creative lives. The HDI was proposed (Sen, 2009; Eneh and Nkamnebe, 2011; UNDP, 1990) as a single statistic to serve as a frame of reference for both social and economic development by combining indicators of life expectancy (X1), literacy (X2) and (the log of) real GDP per capita (X3). The HDI is constructed in three steps:

• Step 1: Compute the deprivation indicator for the jth country with respect to the ith variable as:

$$I_{ij} = \frac{\max_{k} X_{ik} - X_{ij}}{\max_{k} X_{ik} - \min_{k} X_{ik}}, \qquad i = 1, 2, 3; \; j, k = 1, 2, \ldots, NOC$$

where NOC represents the number of countries included in the study

• Step 2: Define the average deprivation indicator:

$$I_j = \frac{1}{3}\sum_{i=1}^{3} I_{ij}$$

• Step 3: Define the HDI indicator as one minus the deprivation indicator:

HDI_{j} = 1-I_{j}; j = 1, 2,…, NOC
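The three steps can be sketched as below, assuming the 1990-style construction in which each deprivation indicator is the distance from the best observed value scaled by the observed range across countries. The function name and the input layout (one row of indicator values per country) are hypothetical.

```python
def hdi_from_indicators(rows):
    """HDI per country from rows of indicator values [X1, X2, X3].

    ASSUMPTION: deprivation = (max - value) / (max - min), with max and
    min taken over the countries in the study for each indicator.
    """
    n_vars = len(rows[0])
    columns = list(zip(*rows))  # one tuple of values per indicator
    highs = [max(col) for col in columns]
    lows = [min(col) for col in columns]
    for hi, lo in zip(highs, lows):
        if hi == lo:
            raise ValueError("an indicator has no variation across countries")
    hdi = []
    for row in rows:
        # Step 1: deprivation of this country on each indicator.
        deprivation = [(highs[i] - row[i]) / (highs[i] - lows[i])
                       for i in range(n_vars)]
        # Step 2: average deprivation; Step 3: HDI = 1 - average.
        hdi.append(1.0 - sum(deprivation) / n_vars)
    return hdi
```

Under this construction the index is relative: a country that is best on every indicator scores 1 and one that is worst on every indicator scores 0, whatever the absolute levels are.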

Fig. 2(a-b): Box plots of UR and HDI in the Arab states

**Arab states data:** The data on the 22 Arab states in Appendix 1 were collected in November 2011 from two official international databases: the CIA World Factbook and the UNDP web page. However, one state (Somalia) has incomplete data and is excluded from the analysis.

Table 3: Quadrant limits of HDI and UR

(1) Classification based on the UNDP (2011) report and (2) classification based on the 10th, 25th and 50th percentiles of the world’s UR

Moreover, the box plots in Fig. 2 show that Djibouti and the Palestinian Authority have extreme values of UR; therefore, a robust estimation method is required to avoid the effect of the outliers.

For better data description and country classification, we place the UR and HDI values in different quadrants (Table 3).

The classification results in Fig. 3 indicate that only two Arab countries have very high HDI and low UR (UAE and Qatar); on the other hand, Bahrain has a strong HDI but is weak in terms of UR. It is worth noting that five Arab states (Yemen, Comoros, Sudan, Mauritania and Djibouti) show very negative results in both HDI and UR.

As a general comment, comparing the Arab states with the world, it can be concluded that this region has a very high UR (17.571 on average) compared with the world rate (8.8) and a medium HDI (0.6503 on average) compared with the world HDI (0.683). This means that many of the economic, educational and social policies should be improved or totally changed by the governments of these states in order to improve their standing among the world's nations.

**Modeling UR and HDI data:** Since a smaller UR is expected to accompany a higher HDI, a negative linear relationship between HDI and UR is expected, as shown in the scatter plot (Fig. 4). Moreover, different international sources report different (though close) values of these indicators, and some of the indicators were estimated from historical data or rounded to three digits; therefore, the classical regression model is not suitable for these data. Instead, the model given in (1)-(2) is a good formulation of the relationship between UR (x) and HDI (y). Hence, the suggested URFM for HDI and UR can be formulated as:

HDI = α+β (UR-δ)+ε

where, α and β are the intercept and the slope, respectively and δ and ε are the error terms.

**Maximum entropy based estimators:** The relationship between HDI and UR is modeled by estimating the unknown parameters with the four modified estimators based on the maximum entropy median.

Fig. 3: Classification of the Arab states with respect to UR and HDI

Fig. 4: Scatter diagram showing the relationship between UR and HDI

Table 4: Estimated parameters of the HDI and UR model by the proposed methods

The estimation results, along with the estimated coefficient of determination and the mean sum of squared perpendicular residuals, are given in Table 4.

The results in Table 4 indicate that, in terms of both the mean squared perpendicular residual and the coefficient of determination, the repeated median estimator gave the best fit to the data. In general, all models suggested that whenever a country improves its citizens' lives by decreasing the unemployment rate, the human development index increases. Fig. 5 contains plots of the residuals against the predicted values for HDI and UR. None of these plots suggests that systematic deviations from the fitted response are present.

Fig. 5: Predicted values vs residuals for the proposed estimators

**CONCLUSION**

This study gives the researcher a more precise method for estimating the parameters of the URFM when the data contain outliers. The main idea of the proposed estimators is to use the maximum entropy median instead of the sample mean or the sample median in the Wald-type and Theil-type estimators. Also, the novel coefficient of determination for the case in which both variables are measured with error offers a new way of explaining the fitted URFM. The Monte Carlo simulations provide good evidence for the superiority of the proposed estimators over the classical MLE in terms of efficiency when the data contain outliers. The real data analysis studied the effect of the UR on the HDI with the proposed estimation methods and suggested using the modified repeated median method. All methods indicated a negative relationship between the unemployment rate in the Arab states and the human development index. The data analysis suggested that the Arab states should change their economic, educational and social policies in order to improve their standing among the world's nations.

**APPENDIX**

Appendix 1: HDI and UR data in the Arab states

(1) World Factbook; (2) UNDP and (3) the statistics are available for the Gaza Strip only; *** missing data

**REFERENCES**

- Al-Radaideh, A., A. Al-Nasser and E. Ciavolino, 2010. Efficient Wald type estimators for simple linear measurement error model. Asian J. Math. Stat., 3: 82-92.

- Bani-Mustafa, A. and A.D. Al-Nasser, 2008. On estimating the variance of maximum entropy median for odd sample size. Pak. J. Stat., 24: 1-10.

- Bartlett, M.S., 1949. Fitting straight line when both variables are subject to error. Biometrics, 5: 207-212.

- Carroll, R.J., A. Delaigle and P. Hall, 2009. Nonparametric prediction in measurement error models. J. Am. Stat. Assoc., 104: 993-1003.

- Carroll, R.J., D. Ruppert, C.M. Crainiceanu, T.D. Tosteson and M.R. Karagas, 2004. Nonlinear and nonparametric regression and instrumental variables. J. Am. Stat. Assoc., 99: 736-750.

- Fah, C.Y., A.G. Hussin and O.M. Rijal, 2007. An investigation of causation: The unreplicated linear functional relationship model. J. Applied Sci., 7: 20-26.

- Cheng, C.L. and J.W. Van Ness, 1999. Statistical Regression with Measurement Error. Arnold, New York, USA.

- Tan, C.H. and A.S. Santhapparaj, 2007. Macroeconomic determinants of skilled labour migration: The case of Malaysia. J. Applied Sci., 7: 3015-3022.

- Fasoranti, M.M., 2009. The influence of the deregulated telecommunication sector on urban employment generation in Nigeria. J. Econ. Theor., 3: 57-62.

- Jonathan, G. and I. Terence, 2009. Methods of fitting straight lines where both variables are subject to measurement error. Curr. Clin. Pharmacol., 4: 164-171.

- Hussin, A.G., N.R.J. Fieller and E.C. Stillman, 2009. Parameter estimation in the unreplicated functional model based on a grouping algorithm. Eur. J. Sci. Res., 33: 220-233.

- Huwang, L. and J. Yang, 2000. Trimmed estimation in the measurement error model when the covariate has replicated observations. Proc. Natl. Sci. Counc. Repub. China Part A Phys. Sci. Eng., 24: 405-412.

- Maritz, J.S., 1979. On Theil's method in distribution-free regression. Aust. J. Stat., 21: 30-35.

- Nair, R.K. and P.M. Shrivastava, 1942. On a simple method of curve fitting. Sankhya, 6: 121-132.

- O'Driscoll, D. and D.E. Ramirez, 2011. Geometric view of measurement errors. Commun. Stat. Simulat. Comput., 40: 1373-1382.

- Eneh, O.C. and A.D. Nkamnebe, 2011. Gender gap and sustainable human development in Nigeria: Issues and strategic choices. Asian J. Rural Devel., 1: 41-53.

- Sen, P.K., 1968. Estimates of the regression coefficient based on Kendall's Tau. J. Am. Stat. Assoc., 63: 1379-1389.

- Solari, M.E., 1969. The "maximum likelihood solution" of the problem of estimating a linear functional relationship. J. R. Statist. Soc. B, 31: 372-375.

- Theil, H. and P.C. O'Brien, 1980. The median of the maximum entropy distribution. Econ. Lett., 5: 345-347.

- Theil, H., 1950. A rank-invariant method of linear and polynomial regression analysis. III. Proc. Koninklijke Nederlandse Akademie van Wetenschappen A, 53: 1397-1412.

- Wald, A., 1940. The fitting of straight lines if both variables are subject to error. Ann. Math. Stat., 11: 284-300.

- Kayani, Z.I. and A.A. Ghani, 2003. Intercountry inequality in some measure of well-being: 1960-1992. J. Applied Sci., 3: 120-124.