Research Article
 

On the Kernel Estimation of the Conditional Mode



Raid Salha and Hazem El Shekh Ahmed
 
ABSTRACT

The estimator of the conditional mode obtained by maximizing the Nadaraya Watson (NW) kernel estimator of the conditional density function suffers from rather large bias and boundary effects. The aim of this study is to overcome these disadvantages by proposing a modified estimator of the conditional mode obtained by maximizing the Reweighted Nadaraya Watson (RNW) kernel estimator of the conditional density function. The asymptotic normality and consistency of the proposed estimator are established and its efficiency is examined in two applications, one using simulated data and one using real-life data.


 
  How to cite this article:

Raid Salha and Hazem El Shekh Ahmed, 2009. On the Kernel Estimation of the Conditional Mode. Asian Journal of Mathematics & Statistics, 2: 1-8.

DOI: 10.3923/ajms.2009.1.8

URL: https://scialert.net/abstract/?doi=ajms.2009.1.8

INTRODUCTION

The problem of estimating the mode of a probability density function (pdf) is a matter of both theoretical and practical interest. Parzen (1962) considered the problem of estimating the mode of a univariate pdf. Parzen (1962) and Nadaraya (1965) have shown that under regularity conditions the estimator of the population mode obtained by maximizing a kernel estimator of the pdf is strongly consistent and asymptotically normally distributed. Samanta (1973) has given multivariate versions of Parzen’s results. Samanta and Thavaneswaran (1990) considered the problem of estimating the mode of a conditional pdf and they have shown under regularity conditions that the estimator of the population conditional mode is strongly consistent and asymptotically normally distributed. Salha and Ioannides (2004) generalized these results by considering the conditional mode evaluated at distinct conditional points. Vieu (1996) presented and compared four mode estimation procedures. Recently, for random design models, Ziegler (2002) proposed a kernel estimator of the mode and its asymptotic normality has been shown by Ziegler (2003). In addition, Ziegler (2004) presented an adaptive kernel estimator for the mode.

Assume that (X1, Y1), …, (Xn, Yn) are i.i.d. random vectors with joint pdf f(x, y) and conditional pdf f(y|x) of Y1 given X1 = x. We assume that for each x, f(y|x) is uniformly continuous in y and possesses a unique conditional mode M(x) defined by:

$$ f(M(x)|x) = \max_{-\infty < y < \infty} f(y|x) $$

Samanta and Thavaneswaran (1990) estimated the conditional mode using the Nadaraya Watson (NW) estimator of the conditional density function, but this estimator suffers from rather large bias and boundary effects. To overcome these difficulties, Hall et al. (1999) proposed the Reweighted Nadaraya Watson (RNW) estimator, a weighted version of the NW estimator that combines the advantages of the Local Linear (LL) estimators, such as bias reduction and the absence of boundary effects, with the property of the NW estimator of always being a distribution function.

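Let τi(x) denote probability-like weights with the properties that τi(x) ≥ 0, $\sum_{i=1}^{n} \tau_i(x) = 1$ and

$$ \sum_{i=1}^{n} \tau_i(x)\,(X_i - x)\,K_h(X_i - x) = 0, $$

where $K_h(u) = K(u/h)/h$, K(.) is a kernel function and h = hn > 0 is the bandwidth. The role of τi(x) is to adjust the NW weights such that the resulting conditional density estimator resembles that from the LL estimators. The RNW conditional density estimator is defined as follows:

$$ f_n(y|x) = \frac{\sum_{i=1}^{n} \tau_i(x)\,K_h(X_i - x)\,K_h(Y_i - y)}{\sum_{i=1}^{n} \tau_i(x)\,K_h(X_i - x)} $$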

If K(u) is chosen such that K(u) tends to zero as u tends to ±∞, then for every sample sequence and for each x, fn(y|x) is a continuous function of y and tends to zero as y tends to ±∞. Consequently, there is a random variable Mn(x), called the sample conditional mode, such that:

$$ f_n(M_n(x)|x) = \max_{-\infty < y < \infty} f_n(y|x) $$

In this study, the conditional mode will be estimated using the RNW estimator of the conditional pdf and the asymptotic normality of this estimator will be proved and its performance will be examined by two applications.
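As a minimal sketch of the estimation idea, the following Python code evaluates a weighted kernel estimate of f(y|x) and approximates the sample conditional mode by maximizing it over a grid in y. It assumes a Gaussian kernel, a single bandwidth h for both variables, and an estimator of the form (sum of τi(x) Kh(Xi − x) Kh(Yi − y)) / (sum of τi(x) Kh(Xi − x)). The function names, the grid search and the test data are illustrative assumptions, not the authors' implementation. With `tau=None` the weights default to 1/n, which gives the plain NW estimator rather than the RNW one.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def conditional_density(y, x, xs, ys, h, tau=None):
    """Weighted kernel estimate of f(y|x) with a single bandwidth h.

    tau holds the weights tau_i(x). tau=None falls back to equal weights
    1/n, which yields the NW estimator instead of the RNW estimator.
    """
    n = len(xs)
    if tau is None:
        tau = [1.0 / n] * n
    num = 0.0
    den = 0.0
    for xi, yi, ti in zip(xs, ys, tau):
        wx = ti * gaussian_kernel((xi - x) / h) / h
        num += wx * gaussian_kernel((yi - y) / h) / h
        den += wx
    return num / den


def conditional_mode(x, xs, ys, h, tau=None, grid_size=401):
    """Approximate M_n(x) by maximizing the estimate over a grid in y."""
    lo, hi = min(ys), max(ys)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + k * step for k in range(grid_size)]
    return max(grid, key=lambda y: conditional_density(y, x, xs, ys, h, tau))


# Illustrative data (hypothetical): Y = 2X + small Gaussian noise.
random.seed(0)
xs = [random.random() for _ in range(300)]
ys = [2.0 * xi + random.gauss(0.0, 0.1) for xi in xs]
print(conditional_mode(0.5, xs, ys, h=0.1))
```

A grid search is used only because it is simple and cannot get stuck in a local maximum; a finer grid or a numerical optimizer started from the grid maximizer would refine the result.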

CONDITIONS

Consider the following conditions:

Condition 1
The kernel function K(u) is a symmetric and bounded probability density function such that

The first two derivatives of K(u), K(i)(u), i = 1, 2, are functions of bounded variation.

Condition 2
The marginal density g(x) is uniformly continuous and is bounded from below by a positive constant.

Condition 3
The partial derivatives exist and are bounded for 1 ≤ i + j ≤ 3.

Condition 4
The bandwidth h = hn satisfies the following:

MAIN RESULTS

Here, the two main theorems of this study, theorem 1 and theorem 2, are presented and proved. Their proofs require the following lemmas.

Lemma 1
Under the conditions 1, 2 and 4 (i), the following is true:

Where:

Proof
The proof of this lemma is part of the proof of theorem 1 in De Gooijer and Zerom (2003).

Let

Where:

Lemma 2
Under the conditions 1, 3 and 4, the following holds:

Proof
Using Taylor expansion and integration by parts,

Then,


(1)

Then,


(2)

From Eq. 1 and 2, we get:

This completes the proof of the lemma.

Now,

This implies that:

(3)

Lemma 3
Under the conditions 1, 3 and 4, the following holds:

Proof
Let . . Since, E(εi|x) = 0, then E(Δ) = 0 which implies that E(J1) = 0.

This implies that,

To show that, we will use Liapunov’s Theorem (Sen and Singer, 1993). It is sufficient to show that:

Since, . Therefore, the following holds:

Since,

Therefore, ρn→ ∞.

This implies that:

which leads to:

Since,

we get that:

Theorem 1
Under the conditions 1, 3 and 4 the following is true:

where:

Proof
A combination of lemma 3 and Eq. 3 completes the proof of theorem 1.

Now, using Taylor expansion:

This implies that:

where:

Therefore,

(4)

Lemma 4
Under the conditions 1-4, the following holds:

Proof
The proof uses the same techniques as lemma 4 of Samanta and Thavaneswaran (1990).

Theorem 2
Under the conditions 1-4, the following is true:

Proof
The proof of the theorem follows directly by using Eq. 4, lemma 4 and theorem 1.

Note that Bias(fn(M(x)|x)) → 0 if we assume that the second moment of the kernel function K vanishes, that is, $\int u^2 K(u)\,du = 0$, as in Samanta and Thavaneswaran (1990).

APPLICATIONS

The proposed RNW-based estimator is applied to find the conditional mode for different data sets. The standard normal kernel function is used and the weights τi(x) are calculated as described by De Gooijer and Zerom (2003) and Cai (2002).
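The weight calculation can be sketched as follows. This is one common construction, assuming the empirical-likelihood approach of Hall et al. (1999): the weights take the form τi(x) = 1 / (n (1 + λ ci)) with ci = (Xi − x) Kh(Xi − x), and λ is the root of the sum of ci / (1 + λ ci). The function name `rnw_weights`, the bisection solver and its tolerances are illustrative assumptions and may differ in detail from the procedure used in the cited papers.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def rnw_weights(x, xs, h, tol=1e-12, max_iter=200):
    """Weights tau_i(x) = 1 / (n * (1 + lam * c_i)), c_i = (X_i - x) K_h(X_i - x).

    lam solves sum_i c_i / (1 + lam * c_i) = 0 and is found by bisection.
    """
    n = len(xs)
    c = [(xi - x) * gaussian_kernel((xi - x) / h) / h for xi in xs]
    c_min, c_max = min(c), max(c)
    if not (c_min < 0.0 < c_max):
        raise ValueError("x needs sample points on both sides")

    def score(lam):
        return sum(ci / (1.0 + lam * ci) for ci in c)

    # For lam strictly inside (-1/c_max, -1/c_min) every 1 + lam * c_i is
    # positive and score is decreasing: large and positive near the left
    # end, large and negative near the right end.
    shrink = 1.0 - 1e-9
    lo, hi = -shrink / c_max, -shrink / c_min
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return [1.0 / (n * (1.0 + lam * ci)) for ci in c]


# Illustrative check of the three weight properties at an interior point.
random.seed(0)
xs = [random.random() for _ in range(200)]
x0, h = 0.3, 0.1
tau = rnw_weights(x0, xs, h)
moment = sum(t * (xi - x0) * gaussian_kernel((xi - x0) / h) / h
             for t, xi in zip(tau, xs))
print(min(tau) > 0.0, sum(tau), moment)
```

At the root of the score, the weights sum to one and the weighted first moment vanishes, so both constraints follow from a single one-dimensional root search.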

Example 1
This application uses simulated data. A sample of size 200 is simulated from the model y = sin(2π(1 − x)²) + xe, where x ∼ N(0,1) and e ∼ uniform[0,1]. A perfect smooth would recapture the original signal y = sin(2π(1 − x)²) exactly. For a direct comparison of the perfect smooth and the conditional mode estimate, a scatter plot of the original data, the perfect smooth and the estimated conditional mode curve is shown in Fig. 1. The performance of the estimator can be assessed using the coefficient of determination R²y,Ŷ between Ŷ, the predicted values, and y, the actual values:

$$ R^2_{y,\hat{Y}} = 1 - \frac{SSE}{SSTO} $$

where $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ denotes the error sum of squares, $SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2$ denotes the total sum of squares and $\bar{y}$ denotes the mean of the actual values. For the current data, SSE = 1.3958, which is small relative to SSTO = 15.3209, and R²y,Ŷ = 0.9089, which is close to 1 and indicates that the agreement between the actual and predicted values is very strong. This comparison indicates that the proposed estimator of the conditional mode is reasonably good.
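The reported figures are consistent with R² = 1 − SSE/SSTO, with SSE the sum of squared differences between actual and predicted values and SSTO the sum of squared deviations of the actual values from their mean. The short sketch below computes these quantities and checks the arithmetic for the values quoted above; the function name `r_squared` is an illustrative assumption.

```python
def r_squared(y, y_hat):
    """Return (R^2, SSE, SSTO) for actual values y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ssto = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - sse / ssto, sse, ssto


# Arithmetic check using the SSE and SSTO values reported in the text.
reported_r2 = 1.0 - 1.3958 / 15.3209
print(round(reported_r2, 4))  # 0.9089
```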

Fig. 1: Comparison between the mode estimation and the perfect curve

Fig. 2: Three different estimations for the ethanol data

Example 2
Consider the ethanol data, which describe the relationship between the predictor E (ethanol) and the response NOx (Nitric Oxide). Clearly, the relationship is not linear. The regression relation is estimated using three different estimators: the proposed estimator (mode estimator) and two estimators from the S-Plus program, the locally weighted regression (loess) estimator and the kernel estimator. A scatter plot of the data together with the graphs of the three estimators is shown in Fig. 2. It is clear that the proposed estimator is reasonably good.

REFERENCES
Cai, Z., 2002. Regression quantiles for time series. Econometric Theory, 18: 169-192.

De Gooijer, J. and D. Zerom, 2003. On conditional density estimation. Stat. Neerland., 57: 159-176.

Hall, P., R.C.L. Wolff and Q. Yao, 1999. Methods for estimating a conditional distribution function. J. Am. Stat. Associat., 94: 154-163.

Nadaraya, E.A., 1965. On non-parametric estimates of density functions and regression curves. Theory Probab. Applic., 10: 186-190.

Parzen, E., 1962. On estimation of a probability density function and mode. Ann. Math. Stat., 33: 1065-1076.

Sen, P.K. and J.M. Singer, 1993. Large Sample Methods in Statistics. 1st Edn., Chapman and Hall Inc., New York.

Salha, R. and D. Ioannides, 2004. Joint asymptotic distribution of the estimated conditional mode at a finite number of distinct points. Proceedings of the National Statistical Conference, April 14-18, 2004, Lefkada, Greece, pp: 587-594.

Samanta, M. and A. Thavaneswaran, 1990. Non-parametric estimation of the conditional mode. Commun. Stat.: Theory Methods, 19: 4515-4524.

Samanta, M., 1973. Nonparametric estimation of the mode of a multivariate density. S. Afr. Stat. J., 7: 109-117.

Schuster, E., 1972. Joint asymptotic distribution of the estimated regression function at a finite number of distinct points. Ann. Math. Stat., 43: 84-88.

Vieu, P., 1996. A note on density mode estimation. Stat. Probab. Lett., 26: 297-307.

Ziegler, K., 2002. On nonparametric kernel estimation of the mode of the regression function in the random design model. J. Nonparametric Stat., 14: 749-774.

Ziegler, K., 2003. On the asymptotic normality of kernel regression estimators of the mode in the nonparametric random design model. J. Stat. Plann. Inference, 115: 123-144.

Ziegler, K., 2004. Adaptive kernel estimation of the mode in the nonparametric random design regression model. J. Probabil. Math. Stat., 24: 213-235.
