INTRODUCTION
Smoothing spline method can be used to estimate the nonparametric regression
models. The main goal of this method is to estimate the function that minimizes
the penalized residual sum of squares. Basis of the penalized least squares
method is to add a penalty term to the residual sum of squares. In the light
of this information, the estimation of the unknown function depends on smoothing
parameter. Therefore, the determination of an optimum smoothing parameter in
the interval (0, ∞) was found to be an underlying complication. In the
literature, different selection methods are components of various studies for
an appropriate smoothing parameter. Indeed, to a considerable extent (Craven
and Wahba, 1979; Hardle, 1991; Hardle
et al., 1988; Wahba, 1990; Hurvich
et al., 1998; Eubank, 1999; Lee
and Solo, 1999; Hastie and Tibshirani, 1990; Schimek,
2000; Cantoni and Ronchetti, 2001; Ruppert
et al., 2003; Lee, 2003, 2004;
Kou, 2003) supplement on the selection of the smoothing
parameter.
The empirical performances of the selection methods used in selection of the
smoothing parameter are compared in this paper. Selection methods used in simulation
are an improved version of Akaike Information Criterion (AIC_{c}), Generalized
CrossValidation (GCV), CrossValidation (CV), Mallows’ C_{p} criterion,
risk estimation using classical pilots (RCP) and local risk estimation (LRS).
A simulation experiment is achieved to find out which selection methods are
the best in smoothing parameter selection. To throw light on this issue, the
samples differing in small and large sizes are secured by means of the above
mentioned simulation and moreover, six selection methods are evaluated.
This study is mainly concerned with the selection of smoothing parameter through
Monte Carlo simulation. Smoothing parameters play a crucial role in this procedure.
Krivobokova and Kauermann (2007) showed that using the
REML to estimate smoothing parameter outperforms other methods such as (generalized)
CV or Akaike criterion especially when the error correlation structure is misspecified.
Krivobokova et al. (2008) formulated a hierarchical
mixed model to estimate local smoothing parameter to achieve adaptive penalized
spline smoothing. Cao et al. (2010) discussed
different methods of choosing the important smoothing parameter and recommend
GCV as the choice for penalized spline smoothing parameter selection for both
computational efficiency and accuracy of the functional coefficient regression
models. Aydin and Memmedli (2011) recommended GCV and
REML as being good smoothing parameter selection criteria for small and medium
sized samples.
SMOOTHING SPLINE METHOD
Suppose observed are n pairs of measurements (x_{i}, y_{i}), i = 1, 2,..., n, satisfying the model:
where, fεC^{2} [a, b] is an unknown smooth function and (ε_{i})^{n}_{i = 1} are normal distributed random errors with zero mean and common variance σ^{2}.
The basic aim of the nonparametric regression is to estimate unknown function
fεC^{2} [a, b] (the class of all functions f with continuous first
and second derivatives) in model 1. Smoothing spline estimate of the f function
appears as a solution to the following minimization problem: Find
that minimizes the penalized residual sum of squares:
for prespecified value λ>0. The first term in Eq. 2 denotes the Residual Sum of the Squares (RSS) and it penalizes the lack of fit. The second term which is weighted by λ denotes the roughness penalty and it imposes a penalty on roughness. In other words, it penalizes the curvature of the function f. The λ in Eq. 2 is recognized to be the smoothing parameter. As λ varies from 0 to +∞, the solution varies from interpolation to a linear model. As λ→+∞, the roughness penalty dominates in (2) and the spline estimate is compelled to be a constant. As λ→0, the roughness penalty disappears in (2) and the spline estimation interpolates the data. Thus, the smoothing parameter λ plays a key role in controlling the tradeoff between the goodness of fit represented by:
and smoothness of the estimate measured by:
The solution based on smoothing spline for minimum problem in the Eq.
2 is known as a “natural cubic spline” with knots at x_{1},...,
x_{n}. From this point of view, a special structured spline interpolation
which depends on a chosen value λ develops into a suitable approach of
function f in model 1. Let f = (F (x_{1}),..., f (x_{n})) be
the vector of values of function f at the knot points x_{1},..., x_{n}.
The smoothing spline estimate
of this vector or the fitted values for data y = (y_{1},..., y_{n})^{T}
are projected by:
where,
is a natural cubic spline with knots at x_{1},..., x_{n} for
a fixed λ>0 and S_{λ} is a wellknown positivedefinite
(symmetrical) smoother matrix which depends on λ and the knot points x_{1},...,
x_{n} but not on y. Function ,
the estimator of function f is obtained by cubic spline interpolation that rests
on condition .
To gain better perspective on smoothing spline (Eubank, 1999;
Green and Silverman, 1994; Wahba, 1990)
state studied opinions.
SMOOTHING PARAMETER SELECTION METHODS
Although, smoothing spline estimator solves the problem of allowing fits with variable slope, a new dilemma emerges. In fact, it generates the determination of the appropriate value for the smoothing parameter λ for a given data set. The same value of λ is unlikely to work equally well with every data set. As such, the estimation methods have been introduced for the selection of smoothing parameter λ in Eq. 2. The positive value λ that minimizes any smoothing parameter selection methods is selected as an appropriate smoothing parameter.
Selection methods used in experiment: Various smoothing parameter selection methods are featured in the literature. Most of these suggested methods were implemented in our simulation study. Moreover, a selection criterion from previous studies in the literature to provide an effective performance was also introduced in this particular study. The selection criteria used in our simulation study are classified as:
Crossvalidation: The basic idea of CV is to disregard one of the points
gradually, to select the smoothing parameter λ that minimizes the residual
sum of squares and to estimate the squared residual for a smooth function at
x_{i} based on the remaining (n1) points. The CV score can be translated
as:
where,
is the fit (spline smoother) for n pairs of measurements
with smoothing parameter λ and
is the fit calculated by leaving out the ith data point and (S_{λ})_{ii}
is the ith diagonal element of smoother matrix S_{λ} (Wahba,
1990; Green and Silverman, 1994).
Using the approximations
and
signifies that:
(Hastie and Tibshirani, 1990).
Generalized crossvalidation: GCV is a modified form of the CV which
is a conventional method for choosing the smoothing parameter. The GCV score
which is constructed by analogy to CV score can be obtained from the ordinary
residuals by dividing by the factors 1(S_{λ})_{ii}. The
underlying design of GCV is to replace the factors 1(S_{λ})_{ii}
in Eq. 8 with the average score 1n^{1} tr (S_{λ}).
Thus, by summing the squared corrected residual and factor 1n^{1}
tr (S_{λ})^{2}, by the analogy ordinary crossvalidation,
the GCV score function can be procured as follow (Craven
and Wahba, 1979; Wahba, 1990):
Mallows’ C_{P} criterion: In the literature, C_{p}
criterion is referred to as an unbiased risk estimate (UBR). This type of estimate
was suggested by Mallows (1973) in the regression case
and applied to smoothing spline by Craven and Wahba (1979).
If σ^{2} is recognized, an unbiased estimate of the residual sum
of squares is provided by C_{p} criterion:
Unless σ^{2} is known, in practice an estimation for σ^{2} can be given by:
where,
is prechosen with any of the CV, GCV or AIC_{C} criteria (
is an estimate of λ) (Lee, 2003, 2004;
Wahba, 1990). According to Hastie and
Tibshirani (1990) and Ruppert et al. (2003),
GCV is approximately equal to C_{p}.
Improved Akaike information criterion: An improved version of a criterion
based on the classical Akaike Information Criterion (AIC), AIC_{c} criterion,
is used for choosing the smoothing parameter for nonparametric smoothers (Hurvich
et al., 1998). This improved criterion is defined as:
As can be seen from the Eq. 8, this criterion is easy to apply for the selection of smoothing parameter.
Risk estimation using classical pilots: Risk function measures the distance
between the actual regression function (f) and its estimation ().
Needless to say that, a good estimate must contain minimum risk. A direct computation
leads to the biasvariance decomposition for :
A clearcut explanation shows that .
Because the risk
is an unknown quantity, socalled risk is now estimated by computable quantity
.
The obtained expression for
is:
where,
and
are the appropriate pilot estimates for σ^{2} and f, respectively.
The pilot λ_{p} selected by classical methods is used for computation
of the pilot estimates.
Local risk estimation: The LRS method proposed by Lee
(2004), aims to select the
that minimizes the local risk
for the each knot points x_{i}. A direct computation leads to the biasvariance
decomposition for R_{λ} (x_{i}):
In the above equation, (S_{λ}f) (x_{i}) is the ith element
of vector S_{λ}f and S_{λ} (x_{i}) is the
ith diagonal element of the square matrix .
An estimator for R_{λ} (x_{i}) is firstly computed and
the
is selected in order to minimizes it. This process is repeated for all x_{i}’s
and at the end of the process a final mixed estimate for f is derived. The LRS
method can be practically performed with the following five steps (Lee,
2003):
• 
For a set of preselected smoothing parameters λ_{1}<...<_{m},
calculate the corresponding set of smoothing spline estimates: 
• 
Select the pilot value λ_{p} from AIC_{c} criterion
in Eq. 8 by using the elements in F 
• 
For λ_{p}, calculate the estimates
and
by using the Eq. 7 
• 
Substitute the pilots
and
into the expression
and obtain the estimates 
• 
For each x_{i}, find the
from F which minimizes
and the final estimate accept the appropriate values
for f (x_{i}) 
SIMULATION EXPERIMENT
This study was conducted to evaluate the performances of the nine selection methods mentioned above. The experimental setup in this paper is adopted from Professor Steve Marron. By using a program coded in MATLAB, we generated the samples sized n = 25, 50, 100, 150, 200, 350 and 400. The number of replication was 500 for each of the samples. For each simulated data sets, the Mean SquaredErrors (MSE) was used for evaluate the quality of any curve estimate . To find out if the difference between the MSE median values of any two selection methods is significant or not, the paired Wilcoxon tests were assessed. In this way, methods which complement the best smoothing parameter were determined by evaluating socalled selection methods.
Experimental setup: The experimental setup applied at this stage was designed to study the effects of the following three factors which vary an independent and effective approach: noise level, spatial variation, variance function.
The setup specification is listed Table 1. Simulation study was performed according to MATLAB program and the experimental setup was designed in the following framework:
• 
Totally three sets of numerical experiments were performed.
For each set of experiments, only one of the above three experimental factors
(e.g., noise level, degree of spatial variation and noise variance function)
has been altered while the remaining two have been kept unchanged 
• 
Within each set of experiments, the factor levels was modified four times
(i.e., r = 1, 2, 3, 4) to detect the effects of any experimental factor
in Table 1 
• 
To see the performance of the small and large samples of the selection
methods. For each factor level r within each set of experiments, we generated
7 different samples with sample sizes n = 25, 50, 100, 150, 200, 350 and
400 
• 
The number of replications was 500 for each of the 84 numerical experiments 
• 
We computed the appropriate smoothing spline estimators
in equality (3) by selecting the smoothing parameter λ which minimizes
the selection methods 
• 
We used the MSE values to evaluate
computed according to each of the selection criterion: 

.
(where f (x_{i}) is value at knots x_{i} of the appropriate
function f defined in Table 1) 
• 
Paired Wilcoxon test was applied to test whether MSE values was considered
as the performance measure of any two methods are significant or not 
• 
The factor levels indicated as r was changed four times (i.e., r = 1,
2, 3, 4) in order to detect the effects of any factor from three factors
in Table 1 
• 
By considering 3 factors, 4 factor levels and 7 samples, totally, 84 numerical
experiments were conducted 
Evaluations of the experimental results: For each simulated data set
used in the experiments, the MSE values were used in order to evaluate the quality
of any curve estimate . Paired Wilcoxon
tests were applied to test whether the difference between the median MSE values
of any two methods is significant or not. The significance level used was 5%.
The selection methods were also ranked as follows: If median MSE value of a
method is significantly less than the remaining five, it will be assigned a
rank 1. If median MSE value of a method is significantly larger than one but
less than the remaining four, it will be assigned a rank 2 and similarly for
ranks 39. Methods having nonsignificantly different median values will share
the same averaged rank, on the other hand, method or methods having the smallest
rank will be superior.
In this simulation study, because totally 84 different configurations are made, it is not possible to display here all these configurations. Therefore, only 12 different configurations are given in Fig. 13 for n = 150.
Table 1: 
Specification of the simulation setup 

n = 25, 50, 100, 150, 200, 350 and 400 (it was taken seven
different sample size) 

Fig. 1(ad): 
Simulation results correspond to the noise level factor for
n = 150, (a) 1 = 1, (b) 1 = 2, (c) 1 = 3 and (d) 1 = 4 

Fig. 2(ad): 
Simulation results correspond to the spatial variation factor
for n = 150, (a) 1 = 1, (b) 1 = 2, (c) 1 = 3 and (d) 1 = 4 

Fig. 3(ad): 
Simulation results correspond to the variance factor for n
= 150, (a) 1 = 1, (b) 1 = 2, (c) 1 = 3 and (d) 1 = 4 
Table 2: 
Averaged Wilcoxon test ranking values for the six selection
methods 

*Selection method has the best ranking 
Table 3: 
Averaged Wilcoxon test ranking values for the six selection
methods in large sample sizes 

*Selection method has the best ranking 
The head row plots of the Fig. 13 display the true regression function together with all typical simulated data set. The bottom row plots display the boxplots of the log_{c} MSE values for, from left to right, AIC_{c}, GCV, CV, Cp, RCP and LRS. The numbers below the boxplots are the paired Wilcoxon test rankings.
Table 4: 
Means of the averaged Wilcoxon test ranking values for the
six selection methods means for n = 25, 50, 100 and 150 

*Selection method has the best ranking 
Table 5: 
Means of the averaged Wilcoxon test ranking values for the
six selection methods means for n = 200, 350 and 400 

*Selection method has the best ranking 
For 84 different simulation experiments, the averaged ranking values of the selection methods according to Wilcoxon tests are tabulated in Table 2 and 3.
According to the results in Table 2, for small sized samples
(for n = 25, 50, 100 and 150), GCV has had the best empirical performance for
all factors. In accordance with the overall Wilcoxon test rankings in Table
2, GCV have also displayed a good performance. As shown Table
2, because of ,
C_{p} and RCP methods produced the same results under all experimental
factors. In this situation, for small sample sizes, for which reason the effects
of the replication of simulation, C_{p} is approximately equal to its
E (C_{p}}. For small samples, it is observed that CV has produced the
worst performance.
According to Table 3, for large sized samples (for n = 200,
350 and 400), AIC_{c }criterion has had the best empirical performance.
It is shown that AIC_{c} and GCV and C_{p} criteria have shared
a better performance for n = 400. According to the overall Wilcoxon test rankings
in Table 3, AIC_{c}, GCV, C_{p} and LRS criteria
can be ranked in terms of the performance. As shown in Table 3,
generally, C_{p} and GCV gave the same results. This can be interpreted
to follow the accepted view that GCV is asymptotically equal to GCV (Hastie
and Tibshirani, 1990). That is to say that, for large sized samples, because
of the effect of the replication of simulation, performance of the GCV and C_{p}
is approximately equal. RCP has also produced the worst performance for large
samples.
The scores in Table 4 and 5 are obtained
by taking the means of the averaged Wilcoxon test ranking values tie with each
of the selection methods in Table 23, respectively.
As shown in Table 4, according to the means of the small samples for all factors, GCV criterion has had the best empirical performance. When it is compared to the other criteria, C_{p} (RCP), AIC_{c} criteria have resulted in the worst performance.
According to the means of the large samples for all factors in Table 5, generally it is shown that AIC_{c.} has had the best empirical performance. GCV (C_{p}) and LRS criteria can be ranked in terms of the performance after than AIC_{c. }However, CV has produced the worst result.
RECOMMENDATIONS
Finally, by considering the simulation results and evaluations given in the above, the following suggestions have to be taken into account:
• 
For both large and small samples, GCV are recommended as being
the best selection criterion 
• 
For especially large samples, the use AIC_{c }would seem to be
more appropriate. As for small samples, we propose the implementation of
GCV criterion 
• 
For large samples, the implementation of AIC_{c} criterion, in
addition to and GCV (C_{p}) and LRS criteria would be more beneficial.
For small samples, GCV criterion should prove fruitful 
Naturally, the above recommended suggestions have to be considered with a fair amount of caution as they are only an appraisal based on simulation results.