Abstract: This study presented a comparative study of the different smoothing parameter selection methods. The parameter selection problem has been examined in respect to a smoothing spline implementation in predicting nonparametric regression models. For this purpose, a simulation experiment has been carried out by using a program written in MATLAB 6.5. The simulation experiment provides a comparison of the six different selection methods. In this context, 500 replications have been carried out in simulation for sample sets with different sizes. Thus, the empirical performances of the six selection criteria have been investigated and the suitable selection criteria are obtained for an optimum parameter selection.
INTRODUCTION
Smoothing spline method can be used to estimate the nonparametric regression models. The main goal of this method is to estimate the function that minimizes the penalized residual sum of squares. Basis of the penalized least squares method is to add a penalty term to the residual sum of squares. In the light of this information, the estimation of the unknown function depends on smoothing parameter. Therefore, the determination of an optimum smoothing parameter in the interval (0, ∞) was found to be an underlying complication. In the literature, different selection methods are components of various studies for an appropriate smoothing parameter. Indeed, to a considerable extent (Craven and Wahba, 1979; Hardle, 1991; Hardle et al., 1988; Wahba, 1990; Hurvich et al., 1998; Eubank, 1999; Lee and Solo, 1999; Hastie and Tibshirani, 1990; Schimek, 2000; Cantoni and Ronchetti, 2001; Ruppert et al., 2003; Lee, 2003, 2004; Kou, 2003) supplement on the selection of the smoothing parameter.
The empirical performances of the selection methods used in selection of the smoothing parameter are compared in this paper. Selection methods used in simulation are an improved version of Akaike Information Criterion (AICc), Generalized Cross-Validation (GCV), Cross-Validation (CV), Mallows Cp criterion, risk estimation using classical pilots (RCP) and local risk estimation (LRS). A simulation experiment is achieved to find out which selection methods are the best in smoothing parameter selection. To throw light on this issue, the samples differing in small and large sizes are secured by means of the above mentioned simulation and moreover, six selection methods are evaluated.
This study is mainly concerned with the selection of smoothing parameter through Monte Carlo simulation. Smoothing parameters play a crucial role in this procedure. Krivobokova and Kauermann (2007) showed that using the REML to estimate smoothing parameter outperforms other methods such as (generalized) CV or Akaike criterion especially when the error correlation structure is misspecified. Krivobokova et al. (2008) formulated a hierarchical mixed model to estimate local smoothing parameter to achieve adaptive penalized spline smoothing. Cao et al. (2010) discussed different methods of choosing the important smoothing parameter and recommend GCV as the choice for penalized spline smoothing parameter selection for both computational efficiency and accuracy of the functional coefficient regression models. Aydin and Memmedli (2011) recommended GCV and REML as being good smoothing parameter selection criteria for small and medium sized samples.
SMOOTHING SPLINE METHOD
Suppose observed are n pairs of measurements (xi, yi), i = 1, 2,..., n, satisfying the model:
(1) |
where, fεC2 [a, b] is an unknown smooth function and (εi)ni = 1 are normal distributed random errors with zero mean and common variance σ2.
The basic aim of the nonparametric regression is to estimate unknown function
fεC2 [a, b] (the class of all functions f with continuous first
and second derivatives) in model 1. Smoothing spline estimate of the f function
appears as a solution to the following minimization problem: Find
(2) |
for pre-specified value λ>0. The first term in Eq. 2 denotes the Residual Sum of the Squares (RSS) and it penalizes the lack of fit. The second term which is weighted by λ denotes the roughness penalty and it imposes a penalty on roughness. In other words, it penalizes the curvature of the function f. The λ in Eq. 2 is recognized to be the smoothing parameter. As λ varies from 0 to +∞, the solution varies from interpolation to a linear model. As λ→+∞, the roughness penalty dominates in (2) and the spline estimate is compelled to be a constant. As λ→0, the roughness penalty disappears in (2) and the spline estimation interpolates the data. Thus, the smoothing parameter λ plays a key role in controlling the trade-off between the goodness of fit represented by:
and smoothness of the estimate measured by:
The solution based on smoothing spline for minimum problem in the Eq.
2 is known as a natural cubic spline with knots at x1,...,
xn. From this point of view, a special structured spline interpolation
which depends on a chosen value λ develops into a suitable approach of
function f in model 1. Let f = (F (x1),..., f (xn)) be
the vector of values of function f at the knot points x1,..., xn.
The smoothing spline estimate
(3) |
where,
SMOOTHING PARAMETER SELECTION METHODS
Although, smoothing spline estimator solves the problem of allowing fits with variable slope, a new dilemma emerges. In fact, it generates the determination of the appropriate value for the smoothing parameter λ for a given data set. The same value of λ is unlikely to work equally well with every data set. As such, the estimation methods have been introduced for the selection of smoothing parameter λ in Eq. 2. The positive value λ that minimizes any smoothing parameter selection methods is selected as an appropriate smoothing parameter.
Selection methods used in experiment: Various smoothing parameter selection methods are featured in the literature. Most of these suggested methods were implemented in our simulation study. Moreover, a selection criterion from previous studies in the literature to provide an effective performance was also introduced in this particular study. The selection criteria used in our simulation study are classified as:
Cross-validation: The basic idea of CV is to disregard one of the points
(4) |
where,
Using the approximations
(Hastie and Tibshirani, 1990).
Generalized cross-validation: GCV is a modified form of the CV which is a conventional method for choosing the smoothing parameter. The GCV score which is constructed by analogy to CV score can be obtained from the ordinary residuals by dividing by the factors 1-(Sλ)ii. The underlying design of GCV is to replace the factors 1-(Sλ)ii in Eq. 8 with the average score 1-n-1 tr (Sλ). Thus, by summing the squared corrected residual and factor 1-n-1 tr (Sλ)2, by the analogy ordinary cross-validation, the GCV score function can be procured as follow (Craven and Wahba, 1979; Wahba, 1990):
(5) |
Mallows CP criterion: In the literature, Cp criterion is referred to as an unbiased risk estimate (UBR). This type of estimate was suggested by Mallows (1973) in the regression case and applied to smoothing spline by Craven and Wahba (1979). If σ2 is recognized, an unbiased estimate of the residual sum of squares is provided by Cp criterion:
(6) |
Unless σ2 is known, in practice an estimation for σ2 can be given by:
(7) |
where,
Improved Akaike information criterion: An improved version of a criterion based on the classical Akaike Information Criterion (AIC), AICc criterion, is used for choosing the smoothing parameter for nonparametric smoothers (Hurvich et al., 1998). This improved criterion is defined as:
(8) |
As can be seen from the Eq. 8, this criterion is easy to apply for the selection of smoothing parameter.
Risk estimation using classical pilots: Risk function measures the distance
between the actual regression function (f) and its estimation (
(9) |
A clear-cut explanation shows that
(10) |
where,
Local risk estimation: The LRS method proposed by Lee
(2004), aims to select the
(11) |
In the above equation, (Sλf) (xi) is the ith element
of vector Sλf and Sλ (xi) is the
ith diagonal element of the square matrix
• | For a set of pre-selected smoothing parameters λ1<...<m,
calculate the corresponding set of smoothing spline estimates: |
• | Select the pilot value λp from AICc criterion in Eq. 8 by using the elements in F |
• | For λp, calculate the estimates |
• | Substitute the pilots |
• | For each xi, find the |
SIMULATION EXPERIMENT
This study was conducted to evaluate the performances of the nine selection methods mentioned above. The experimental setup in this paper is adopted from Professor Steve Marron. By using a program coded in MATLAB, we generated the samples sized n = 25, 50, 100, 150, 200, 350 and 400. The number of replication was 500 for each of the samples. For each simulated data sets, the Mean Squared-Errors (MSE) was used for evaluate the quality of any curve estimate
Experimental setup: The experimental setup applied at this stage was designed to study the effects of the following three factors which vary an independent and effective approach: noise level, spatial variation, variance function.
The setup specification is listed Table 1. Simulation study was performed according to MATLAB program and the experimental setup was designed in the following framework:
• | Totally three sets of numerical experiments were performed. For each set of experiments, only one of the above three experimental factors (e.g., noise level, degree of spatial variation and noise variance function) has been altered while the remaining two have been kept unchanged |
• | Within each set of experiments, the factor levels was modified four times (i.e., r = 1, 2, 3, 4) to detect the effects of any experimental factor in Table 1 |
• | To see the performance of the small and large samples of the selection methods. For each factor level r within each set of experiments, we generated 7 different samples with sample sizes n = 25, 50, 100, 150, 200, 350 and 400 |
• | The number of replications was 500 for each of the 84 numerical experiments |
• | We computed the appropriate smoothing spline estimators |
• | We used the MSE values to evaluate |
• | Paired Wilcoxon test was applied to test whether MSE values was considered as the performance measure of any two methods are significant or not |
• | The factor levels indicated as r was changed four times (i.e., r = 1, 2, 3, 4) in order to detect the effects of any factor from three factors in Table 1 |
• | By considering 3 factors, 4 factor levels and 7 samples, totally, 84 numerical experiments were conducted |
Evaluations of the experimental results: For each simulated data set
used in the experiments, the MSE values were used in order to evaluate the quality
of any curve estimate
In this simulation study, because totally 84 different configurations are made, it is not possible to display here all these configurations. Therefore, only 12 different configurations are given in Fig. 1-3 for n = 150.
Table 1: | Specification of the simulation setup |
n = 25, 50, 100, 150, 200, 350 and 400 (it was taken seven different sample size) |
Fig. 1(a-d): | Simulation results correspond to the noise level factor for n = 150, (a) 1 = 1, (b) 1 = 2, (c) 1 = 3 and (d) 1 = 4 |
Fig. 2(a-d): | Simulation results correspond to the spatial variation factor for n = 150, (a) 1 = 1, (b) 1 = 2, (c) 1 = 3 and (d) 1 = 4 |
Fig. 3(a-d): | Simulation results correspond to the variance factor for n = 150, (a) 1 = 1, (b) 1 = 2, (c) 1 = 3 and (d) 1 = 4 |
Table 2: | Averaged Wilcoxon test ranking values for the six selection methods |
*Selection method has the best ranking |
Table 3: | Averaged Wilcoxon test ranking values for the six selection methods in large sample sizes |
*Selection method has the best ranking |
The head row plots of the Fig. 1-3 display the true regression function together with all typical simulated data set. The bottom row plots display the boxplots of the logc MSE values for, from left to right, AICc, GCV, CV, Cp, RCP and LRS. The numbers below the boxplots are the paired Wilcoxon test rankings.
Table 4: | Means of the averaged Wilcoxon test ranking values for the six selection methods means for n = 25, 50, 100 and 150 |
*Selection method has the best ranking |
Table 5: | Means of the averaged Wilcoxon test ranking values for the six selection methods means for n = 200, 350 and 400 |
*Selection method has the best ranking |
For 84 different simulation experiments, the averaged ranking values of the selection methods according to Wilcoxon tests are tabulated in Table 2 and 3.
According to the results in Table 2, for small sized samples
(for n = 25, 50, 100 and 150), GCV has had the best empirical performance for
all factors. In accordance with the overall Wilcoxon test rankings in Table
2, GCV have also displayed a good performance. As shown Table
2, because of
According to Table 3, for large sized samples (for n = 200, 350 and 400), AICc criterion has had the best empirical performance. It is shown that AICc and GCV and Cp criteria have shared a better performance for n = 400. According to the overall Wilcoxon test rankings in Table 3, AICc, GCV, Cp and LRS criteria can be ranked in terms of the performance. As shown in Table 3, generally, Cp and GCV gave the same results. This can be interpreted to follow the accepted view that GCV is asymptotically equal to GCV (Hastie and Tibshirani, 1990). That is to say that, for large sized samples, because of the effect of the replication of simulation, performance of the GCV and Cp is approximately equal. RCP has also produced the worst performance for large samples.
The scores in Table 4 and 5 are obtained by taking the means of the averaged Wilcoxon test ranking values tie with each of the selection methods in Table 2-3, respectively.
As shown in Table 4, according to the means of the small samples for all factors, GCV criterion has had the best empirical performance. When it is compared to the other criteria, Cp (RCP), AICc criteria have resulted in the worst performance.
According to the means of the large samples for all factors in Table 5, generally it is shown that AICc. has had the best empirical performance. GCV (Cp) and LRS criteria can be ranked in terms of the performance after than AICc. However, CV has produced the worst result.
RECOMMENDATIONS
Finally, by considering the simulation results and evaluations given in the above, the following suggestions have to be taken into account:
• | For both large and small samples, GCV are recommended as being the best selection criterion |
• | For especially large samples, the use AICc would seem to be more appropriate. As for small samples, we propose the implementation of GCV criterion |
• | For large samples, the implementation of AICc criterion, in addition to and GCV (Cp) and LRS criteria would be more beneficial. For small samples, GCV criterion should prove fruitful |
Naturally, the above recommended suggestions have to be considered with a fair amount of caution as they are only an appraisal based on simulation results.