Research Article
On the Kernel Estimation of the Conditional Mode
Department of Mathematics, The Islamic University of Gaza, Palestine
Hazem El Shekh Ahmed
Department of Mathematics, Al Quds Open University, Palestine
The problem of estimating the mode of a probability density function (pdf) is of both theoretical and practical interest. Parzen (1962) considered the problem of estimating the mode of a univariate pdf. Parzen (1962) and Nadaraya (1965) showed that, under regularity conditions, the estimator of the population mode obtained by maximizing a kernel estimator of the pdf is strongly consistent and asymptotically normally distributed. Samanta (1973) gave multivariate versions of Parzen's results. Samanta and Thavaneswaran (1990) considered the problem of estimating the mode of a conditional pdf and showed, under regularity conditions, that the estimator of the population conditional mode is strongly consistent and asymptotically normally distributed. Salha and Ioannides (2004) generalized these results by considering the conditional mode evaluated at distinct conditional points. Vieu (1996) presented and compared four mode estimation procedures. More recently, for random design models, Ziegler (2002) proposed a kernel estimator of the mode, whose asymptotic normality was shown by Ziegler (2003). In addition, Ziegler (2004) presented an adaptive kernel estimator for the mode.
Assume that (X1, Y1), ..., (Xn, Yn) are i.i.d. random vectors with joint pdf f(x, y) and conditional pdf f(y|x) of Y1 given X1 = x. We assume that, for each x, f(y|x) is uniformly continuous in y, so that f(y|x) possesses a unique conditional mode M(x) defined by:
$f(M(x) \mid x) = \max_{-\infty < y < \infty} f(y \mid x)$
Samanta and Thavaneswaran (1990) considered the problem of estimating the conditional mode using the Nadaraya-Watson (NW) estimator of the conditional density function, but this estimator has the disadvantages of producing rather large bias and boundary effects. To overcome these difficulties, Hall et al. (1999) proposed the Reweighted Nadaraya-Watson (RNW) estimator, a weighted version of the NW estimator. It combines the advantages of the Local Linear (LL) estimators, such as bias reduction and the absence of boundary effects, with the property of the NW estimator that the estimate is always a distribution function.
Let τi(x) denote probability-like weights with the properties that τi(x) ≥ 0, and , where K(.) is a kernel function and h = hn > 0 is the bandwidth. The role of τi(x) is to adjust the NW weights so that the resulting conditional density estimator resembles the one obtained from the LL estimators. The RNW conditional density estimator is defined as follows:
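As a hedged sketch, the constraints and the estimator are usually written in the cited literature (Hall et al., 1999; De Gooijer and Zerom, 2003) in the following form. The notation $K_h(u) = K(u/h)/h$ and the use of a single bandwidth $h$ for both coordinates are assumptions made here for illustration.

```latex
% Assumed standard form of the RNW weights and estimator (not verified
% against the expressions of this article).
\tau_i(x) \ge 0, \qquad
\sum_{i=1}^{n} \tau_i(x) = 1, \qquad
\sum_{i=1}^{n} \tau_i(x)\,(X_i - x)\,K_h(x - X_i) = 0,
\qquad K_h(u) = \frac{1}{h}\,K\!\left(\frac{u}{h}\right),

f_n(y \mid x) =
\frac{\sum_{i=1}^{n} \tau_i(x)\,K_h(x - X_i)\,K_h(y - Y_i)}
     {\sum_{i=1}^{n} \tau_i(x)\,K_h(x - X_i)} .
```

With equal weights τi(x) = 1/n the second expression reduces to an NW-type kernel estimator of the conditional density.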
If K(u) is chosen such that K(u) tends to zero as u tends to ±∞, then for every sample sequence and for each x, fn(y|x) is a continuous function of y and tends to zero as y tends to ±∞. Consequently, there is a random variable Mn(x), called the sample conditional mode, such that:
$f_n(M_n(x) \mid x) = \max_{-\infty < y < \infty} f_n(y \mid x)$
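In practice the sample conditional mode can be approximated by maximizing the estimated conditional density over a grid of y values. The following is a minimal sketch, not the authors' implementation: the function names, the grid search, the Gaussian kernel and the default of equal weights 1/n (which gives an NW-type estimator rather than the RNW one) are all assumptions made for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def conditional_density(y, x, xs, ys, h, weights=None):
    """Weighted kernel estimate of f(y|x) with one bandwidth h (assumed form).

    weights=None means equal weights 1/n, i.e. an NW-type estimator.
    """
    n = len(xs)
    if weights is None:
        weights = [1.0 / n] * n
    # The 1/h factor of the x-kernel cancels between numerator and denominator.
    kx = [w * gaussian_kernel((x - xi) / h) for w, xi in zip(weights, xs)]
    denom = sum(kx)
    if denom == 0.0:
        return 0.0
    num = sum(k * gaussian_kernel((y - yi) / h) / h for k, yi in zip(kx, ys))
    return num / denom


def sample_conditional_mode(x, xs, ys, h, grid_size=401, weights=None):
    """Approximate Mn(x) by a grid search over y (hypothetical helper)."""
    lo, hi = min(ys) - 3.0 * h, max(ys) + 3.0 * h
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + k * step for k in range(grid_size)]
    return max(grid, key=lambda y: conditional_density(y, x, xs, ys, h, weights))


if __name__ == "__main__":
    xs = [i / 50.0 for i in range(51)]
    ys = [2.0 * xi for xi in xs]  # noiseless toy relation y = 2x
    print(sample_conditional_mode(0.5, xs, ys, h=0.1))
```

The grid search only locates the mode up to the grid spacing; a finer grid or a local optimizer started at the grid maximizer would refine it.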
In this study, the conditional mode is estimated using the RNW estimator of the conditional pdf. The asymptotic normality of this estimator is proved and its performance is examined in two applications.
CONDITIONS
Consider the following conditions:
Condition 1
The kernel function K(u) is a symmetric and bounded probability density function such that
• The first two derivatives of K(u), K(i)(u) for i = 1, 2, are functions of bounded variation.
• | |
• |
Condition 2
The marginal density g(x) is uniformly continuous and is bounded from below by a positive constant.
Condition 3
The partial derivatives exist and are bounded for 1 ≤ i + j ≤ 3.
Condition 4
The bandwidth h = hn satisfies the following:
• | |
• |
MAIN RESULTS
Here, the two main theorems of this study, Theorems 1 and 2, are presented and proved. The following lemmas are required for the proofs.
Lemma 1
Under Conditions 1, 2 and 4(i), the following is true:
where:
Proof
The proof of this lemma is part of the proof of Theorem 1 of De Gooijer and Zerom (2003).
Let
where:
Lemma 2
Under the conditions 1, 3 and 4, the following holds:
• | |
• |
Proof
Using Taylor expansion and integration by parts,
Then,
(1) |
Then,
(2) |
• | |
• |
This completes the proof of the lemma.
Now,
This implies that:
(3) |
Lemma 3
Under the conditions 1, 3 and 4, the following holds:
Proof
Let . . Since E(εi|x) = 0, we have E(Δ) = 0, which implies that E(J1) = 0.
This implies that,
To show that, we will use Liapunov's theorem (Pranab and Julio Singer, 1993). It is sufficient to show that:
Since, . Therefore, the following holds:
Since,
Therefore, ρn→ ∞.
This implies that:
which leads to:
Since,
we get that:
Theorem 1
Under the conditions 1, 3 and 4 the following is true:
where:
Proof
Combining Lemma 3 with Eq. 3 completes the proof of Theorem 1.
Now, using Taylor expansion:
This implies that:
where:
Therefore,
(4) |
Lemma 4
Under the conditions 1-4, the following holds:
Proof
The proof uses the same techniques as Lemma 4 of Samanta and Thavaneswaran (1990).
Theorem 2
Under the conditions 1-4, the following is true:
Proof
The proof of the theorem follows directly by using Eq. 4, lemma 4 and theorem 1.
Note that Bias(fn(M(x)|x)) → 0 if we assume that the second moment of the kernel function K vanishes, that is, $\int u^2 K(u)\,du = 0$, as in Samanta and Thavaneswaran (1990).
APPLICATIONS
The proposed RNW-based estimator is applied to find the conditional mode for different data sets. The standard normal kernel is used, and the weights τi(x) are calculated as described by De Gooijer and Zerom (2003) and Cai (2002).
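One common way to obtain probability-like weights of this kind is an empirical likelihood construction: maximize Σ log τi(x) subject to Σ τi(x) = 1 and Σ τi(x) ci = 0, with ci = (Xi − x) Kh(Xi − x). The sketch below assumes this construction and these constraints; it is not taken from the article, and the function name `rnw_weights`, the bisection solver and the tolerances are hypothetical choices. The Lagrangian solution is τi(x) = 1 / (n (1 + λ ci)), where λ solves Σ ci / (1 + λ ci) = 0.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def rnw_weights(x, xs, h, tol=1e-12, max_iter=200):
    """Empirical-likelihood weights tau_i(x) (assumed construction).

    Solves sum_i c_i / (1 + lam * c_i) = 0 for lam by bisection, where
    c_i = (X_i - x) * K_h(X_i - x), and returns 1 / (n * (1 + lam * c_i)).
    """
    n = len(xs)
    c = [(xi - x) * gaussian_kernel((xi - x) / h) / h for xi in xs]
    c_min, c_max = min(c), max(c)
    if not (c_min < 0.0 < c_max):
        raise ValueError("x must lie strictly inside the range of the design points")

    # All 1 + lam * c_i stay positive only for lam in (-1/c_max, -1/c_min).
    lo, hi = -1.0 / c_max, -1.0 / c_min
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def score(lam):
        # Strictly decreasing in lam on the admissible interval.
        return sum(ci / (1.0 + lam * ci) for ci in c)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return [1.0 / (n * (1.0 + lam * ci)) for ci in c]


if __name__ == "__main__":
    xs = [i / 10.0 for i in range(11)]
    w = rnw_weights(0.15, xs, h=0.2)
    print(sum(w))  # should be close to 1
```

When x sits symmetrically among the design points, the constraint already holds with λ = 0 and the weights collapse to 1/n; the adjustment matters mainly near the boundary of the design.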
Example 1
This application is based on simulated data. A sample of size 200 is simulated from the model y = sin(2π(1−x)²) + xe, where x ∼ N(0, 1) and e ∼ uniform[0, 1]. A perfect smooth would recapture the original signal y = sin(2π(1−x)²) exactly. For a direct comparison of the perfect smooth and the conditional mode estimation, a scatter plot of the original data, the perfect smooth and the estimated conditional mode curve is shown in Fig. 1. The performance of the estimator can be tested using $R^2_{y,\hat{Y}}$ (the coefficient of determination, which measures the correlation between Ŷ, the predicted values, and y, the actual values):
$R^2_{y,\hat{Y}} = 1 - \mathrm{SSE}/\mathrm{SSTO}$
where SSE = $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ denotes the error sum of squares, SSTO = $\sum_{i=1}^{n} (y_i - \bar{y})^2$ denotes the total sum of squares and $\bar{y}$ denotes the mean of the actual values. For the current data, SSE = 1.3958, which is small relative to SSTO = 15.3209, and $R^2_{y,\hat{Y}}$ = 0.9089, which is close to 1 and indicates that the correlation between the actual and predicted values is very strong. This comparison indicates that the proposed estimator of the conditional mode is reasonably good.
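The reported figures are consistent with the usual definition R² = 1 − SSE/SSTO, which is assumed in the short sketch below. The helper name `r_squared` and the toy vectors are hypothetical; only the two sums of squares are taken from the text.

```python
def r_squared(actual, predicted):
    """Return (R^2, SSE, SSTO) assuming R^2 = 1 - SSE / SSTO."""
    mean_y = sum(actual) / len(actual)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ssto = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - sse / ssto, sse, ssto


if __name__ == "__main__":
    # Arithmetic check with the sums of squares reported for Example 1.
    print(round(1.0 - 1.3958 / 15.3209, 4))  # 0.9089

    # Toy illustration with made-up values.
    r2, sse, ssto = r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
    print(r2, sse, ssto)
```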
Fig. 1: Comparison between the mode estimation and the perfect curve
Fig. 2: Three different estimations for the ethanol data
Example 2
Consider the ethanol data, which describe the relationship between the predictor E (the equivalence ratio of the air-ethanol mixture) and the response NOx (the concentration of nitrogen oxides in the exhaust). The relationship is clearly nonlinear. The regression relation is estimated using three different estimators: the proposed estimator (mode estimator) and two estimators from the S-Plus program, the locally weighted regression (loess) estimator and the kernel estimator. A scatter plot of the data together with the graphs of the three estimators is shown in Fig. 2. The proposed estimator performs reasonably well in this comparison.