Subscribe Now Subscribe Today
Research Article

Smoothing Parameter Selection Problem in Nonparametric Regression Based on Smoothing Spline: A Simulation Study

Dursun Aydin and M. Seref Tuzemen
Facebook Twitter Digg Reddit Linkedin StumbleUpon E-mail

This study presented a comparative study of the different smoothing parameter selection methods. The parameter selection problem has been examined in respect to a smoothing spline implementation in predicting nonparametric regression models. For this purpose, a simulation experiment has been carried out by using a program written in MATLAB 6.5. The simulation experiment provides a comparison of the six different selection methods. In this context, 500 replications have been carried out in simulation for sample sets with different sizes. Thus, the empirical performances of the six selection criteria have been investigated and the suitable selection criteria are obtained for an optimum parameter selection.

Related Articles in ASCI
Similar Articles in this Journal
Search in Google Scholar
View Citation
Report Citation

  How to cite this article:

Dursun Aydin and M. Seref Tuzemen, 2012. Smoothing Parameter Selection Problem in Nonparametric Regression Based on Smoothing Spline: A Simulation Study. Journal of Applied Sciences, 12: 636-644.

DOI: 10.3923/jas.2012.636.644

Received: January 29, 2012; Accepted: April 03, 2012; Published: June 19, 2012


Smoothing spline method can be used to estimate the nonparametric regression models. The main goal of this method is to estimate the function that minimizes the penalized residual sum of squares. Basis of the penalized least squares method is to add a penalty term to the residual sum of squares. In the light of this information, the estimation of the unknown function depends on smoothing parameter. Therefore, the determination of an optimum smoothing parameter in the interval (0, ∞) was found to be an underlying complication. In the literature, different selection methods are components of various studies for an appropriate smoothing parameter. Indeed, to a considerable extent (Craven and Wahba, 1979; Hardle, 1991; Hardle et al., 1988; Wahba, 1990; Hurvich et al., 1998; Eubank, 1999; Lee and Solo, 1999; Hastie and Tibshirani, 1990; Schimek, 2000; Cantoni and Ronchetti, 2001; Ruppert et al., 2003; Lee, 2003, 2004; Kou, 2003) supplement on the selection of the smoothing parameter.

The empirical performances of the selection methods used in selection of the smoothing parameter are compared in this paper. Selection methods used in simulation are an improved version of Akaike Information Criterion (AICc), Generalized Cross-Validation (GCV), Cross-Validation (CV), Mallows’ Cp criterion, risk estimation using classical pilots (RCP) and local risk estimation (LRS). A simulation experiment is achieved to find out which selection methods are the best in smoothing parameter selection. To throw light on this issue, the samples differing in small and large sizes are secured by means of the above mentioned simulation and moreover, six selection methods are evaluated.

This study is mainly concerned with the selection of smoothing parameter through Monte Carlo simulation. Smoothing parameters play a crucial role in this procedure. Krivobokova and Kauermann (2007) showed that using the REML to estimate smoothing parameter outperforms other methods such as (generalized) CV or Akaike criterion especially when the error correlation structure is misspecified. Krivobokova et al. (2008) formulated a hierarchical mixed model to estimate local smoothing parameter to achieve adaptive penalized spline smoothing. Cao et al. (2010) discussed different methods of choosing the important smoothing parameter and recommend GCV as the choice for penalized spline smoothing parameter selection for both computational efficiency and accuracy of the functional coefficient regression models. Aydin and Memmedli (2011) recommended GCV and REML as being good smoothing parameter selection criteria for small and medium sized samples.


Suppose observed are n pairs of measurements (xi, yi), i = 1, 2,..., n, satisfying the model:


where, fεC2 [a, b] is an unknown smooth function and (εi)ni = 1 are normal distributed random errors with zero mean and common variance σ2.

The basic aim of the nonparametric regression is to estimate unknown function fεC2 [a, b] (the class of all functions f with continuous first and second derivatives) in model 1. Smoothing spline estimate of the f function appears as a solution to the following minimization problem: Find that minimizes the penalized residual sum of squares:


for pre-specified value λ>0. The first term in Eq. 2 denotes the Residual Sum of the Squares (RSS) and it penalizes the lack of fit. The second term which is weighted by λ denotes the roughness penalty and it imposes a penalty on roughness. In other words, it penalizes the curvature of the function f. The λ in Eq. 2 is recognized to be the smoothing parameter. As λ varies from 0 to +∞, the solution varies from interpolation to a linear model. As λ→+∞, the roughness penalty dominates in (2) and the spline estimate is compelled to be a constant. As λ→0, the roughness penalty disappears in (2) and the spline estimation interpolates the data. Thus, the smoothing parameter λ plays a key role in controlling the trade-off between the goodness of fit represented by:

and smoothness of the estimate measured by:

The solution based on smoothing spline for minimum problem in the Eq. 2 is known as a “natural cubic spline” with knots at x1,..., xn. From this point of view, a special structured spline interpolation which depends on a chosen value λ develops into a suitable approach of function f in model 1. Let f = (F (x1),..., f (xn)) be the vector of values of function f at the knot points x1,..., xn. The smoothing spline estimate of this vector or the fitted values for data y = (y1,..., yn)T are projected by:


where, is a natural cubic spline with knots at x1,..., xn for a fixed λ>0 and Sλ is a well-known positive-definite (symmetrical) smoother matrix which depends on λ and the knot points x1,..., xn but not on y. Function , the estimator of function f is obtained by cubic spline interpolation that rests on condition . To gain better perspective on smoothing spline (Eubank, 1999; Green and Silverman, 1994; Wahba, 1990) state studied opinions.


Although, smoothing spline estimator solves the problem of allowing fits with variable slope, a new dilemma emerges. In fact, it generates the determination of the appropriate value for the smoothing parameter λ for a given data set. The same value of λ is unlikely to work equally well with every data set. As such, the estimation methods have been introduced for the selection of smoothing parameter λ in Eq. 2. The positive value λ that minimizes any smoothing parameter selection methods is selected as an appropriate smoothing parameter.

Selection methods used in experiment: Various smoothing parameter selection methods are featured in the literature. Most of these suggested methods were implemented in our simulation study. Moreover, a selection criterion from previous studies in the literature to provide an effective performance was also introduced in this particular study. The selection criteria used in our simulation study are classified as:

Cross-validation: The basic idea of CV is to disregard one of the points gradually, to select the smoothing parameter λ that minimizes the residual sum of squares and to estimate the squared residual for a smooth function at xi based on the remaining (n-1) points. The CV score can be translated as:


where, is the fit (spline smoother) for n pairs of measurements with smoothing parameter λ and is the fit calculated by leaving out the ith data point and (Sλ)ii is the ith diagonal element of smoother matrix Sλ (Wahba, 1990; Green and Silverman, 1994).

Using the approximations and signifies that:

(Hastie and Tibshirani, 1990).

Generalized cross-validation: GCV is a modified form of the CV which is a conventional method for choosing the smoothing parameter. The GCV score which is constructed by analogy to CV score can be obtained from the ordinary residuals by dividing by the factors 1-(Sλ)ii. The underlying design of GCV is to replace the factors 1-(Sλ)ii in Eq. 8 with the average score 1-n-1 tr (Sλ). Thus, by summing the squared corrected residual and factor 1-n-1 tr (Sλ)2, by the analogy ordinary cross-validation, the GCV score function can be procured as follow (Craven and Wahba, 1979; Wahba, 1990):


Mallows’ CP criterion: In the literature, Cp criterion is referred to as an unbiased risk estimate (UBR). This type of estimate was suggested by Mallows (1973) in the regression case and applied to smoothing spline by Craven and Wahba (1979). If σ2 is recognized, an unbiased estimate of the residual sum of squares is provided by Cp criterion:


Unless σ2 is known, in practice an estimation for σ2 can be given by:


where, is pre-chosen with any of the CV, GCV or AICC criteria ( is an estimate of λ) (Lee, 2003, 2004; Wahba, 1990). According to Hastie and Tibshirani (1990) and Ruppert et al. (2003), GCV is approximately equal to Cp.

Improved Akaike information criterion: An improved version of a criterion based on the classical Akaike Information Criterion (AIC), AICc criterion, is used for choosing the smoothing parameter for nonparametric smoothers (Hurvich et al., 1998). This improved criterion is defined as:


As can be seen from the Eq. 8, this criterion is easy to apply for the selection of smoothing parameter.

Risk estimation using classical pilots: Risk function measures the distance between the actual regression function (f) and its estimation (). Needless to say that, a good estimate must contain minimum risk. A direct computation leads to the bias-variance decomposition for :


A clear-cut explanation shows that . Because the risk is an unknown quantity, so-called risk is now estimated by computable quantity . The obtained expression for is:


where, and are the appropriate pilot estimates for σ2 and f, respectively. The pilot λp selected by classical methods is used for computation of the pilot estimates.

Local risk estimation: The LRS method proposed by Lee (2004), aims to select the that minimizes the local risk for the each knot points xi. A direct computation leads to the bias-variance decomposition for Rλ (xi):


In the above equation, (Sλf) (xi) is the ith element of vector Sλf and Sλ (xi) is the ith diagonal element of the square matrix . An estimator for Rλ (xi) is firstly computed and the is selected in order to minimizes it. This process is repeated for all xi’s and at the end of the process a final mixed estimate for f is derived. The LRS method can be practically performed with the following five steps (Lee, 2003):

For a set of pre-selected smoothing parameters λ1<...<m, calculate the corresponding set of smoothing spline estimates:
Select the pilot value λp from AICc criterion in Eq. 8 by using the elements in F
For λp, calculate the estimates and by using the Eq. 7
Substitute the pilots and into the expression and obtain the estimates
For each xi, find the from F which minimizes and the final estimate accept the appropriate values for f (xi)


This study was conducted to evaluate the performances of the nine selection methods mentioned above. The experimental setup in this paper is adopted from Professor Steve Marron. By using a program coded in MATLAB, we generated the samples sized n = 25, 50, 100, 150, 200, 350 and 400. The number of replication was 500 for each of the samples. For each simulated data sets, the Mean Squared-Errors (MSE) was used for evaluate the quality of any curve estimate . To find out if the difference between the MSE median values of any two selection methods is significant or not, the paired Wilcoxon tests were assessed. In this way, methods which complement the best smoothing parameter were determined by evaluating so-called selection methods.

Experimental setup: The experimental setup applied at this stage was designed to study the effects of the following three factors which vary an independent and effective approach: noise level, spatial variation, variance function.

The setup specification is listed Table 1. Simulation study was performed according to MATLAB program and the experimental setup was designed in the following framework:

Totally three sets of numerical experiments were performed. For each set of experiments, only one of the above three experimental factors (e.g., noise level, degree of spatial variation and noise variance function) has been altered while the remaining two have been kept unchanged
Within each set of experiments, the factor levels was modified four times (i.e., r = 1, 2, 3, 4) to detect the effects of any experimental factor in Table 1
To see the performance of the small and large samples of the selection methods. For each factor level r within each set of experiments, we generated 7 different samples with sample sizes n = 25, 50, 100, 150, 200, 350 and 400
The number of replications was 500 for each of the 84 numerical experiments
We computed the appropriate smoothing spline estimators in equality (3) by selecting the smoothing parameter λ which minimizes the selection methods
We used the MSE values to evaluate computed according to each of the selection criterion:

  . (where f (xi) is value at knots xi of the appropriate function f defined in Table 1)
Paired Wilcoxon test was applied to test whether MSE values was considered as the performance measure of any two methods are significant or not
The factor levels indicated as r was changed four times (i.e., r = 1, 2, 3, 4) in order to detect the effects of any factor from three factors in Table 1
By considering 3 factors, 4 factor levels and 7 samples, totally, 84 numerical experiments were conducted

Evaluations of the experimental results: For each simulated data set used in the experiments, the MSE values were used in order to evaluate the quality of any curve estimate . Paired Wilcoxon tests were applied to test whether the difference between the median MSE values of any two methods is significant or not. The significance level used was 5%. The selection methods were also ranked as follows: If median MSE value of a method is significantly less than the remaining five, it will be assigned a rank 1. If median MSE value of a method is significantly larger than one but less than the remaining four, it will be assigned a rank 2 and similarly for ranks 3-9. Methods having non-significantly different median values will share the same averaged rank, on the other hand, method or methods having the smallest rank will be superior.

In this simulation study, because totally 84 different configurations are made, it is not possible to display here all these configurations. Therefore, only 12 different configurations are given in Fig. 1-3 for n = 150.

Table 1: Specification of the simulation setup
n = 25, 50, 100, 150, 200, 350 and 400 (it was taken seven different sample size)

Fig. 1(a-d): Simulation results correspond to the noise level factor for n = 150, (a) 1 = 1, (b) 1 = 2, (c) 1 = 3 and (d) 1 = 4

Fig. 2(a-d): Simulation results correspond to the spatial variation factor for n = 150, (a) 1 = 1, (b) 1 = 2, (c) 1 = 3 and (d) 1 = 4

Fig. 3(a-d): Simulation results correspond to the variance factor for n = 150, (a) 1 = 1, (b) 1 = 2, (c) 1 = 3 and (d) 1 = 4

Table 2: Averaged Wilcoxon test ranking values for the six selection methods
*Selection method has the best ranking

Table 3: Averaged Wilcoxon test ranking values for the six selection methods in large sample sizes
*Selection method has the best ranking

The head row plots of the Fig. 1-3 display the true regression function together with all typical simulated data set. The bottom row plots display the boxplots of the logc MSE values for, from left to right, AICc, GCV, CV, Cp, RCP and LRS. The numbers below the boxplots are the paired Wilcoxon test rankings.

Table 4: Means of the averaged Wilcoxon test ranking values for the six selection methods means for n = 25, 50, 100 and 150
*Selection method has the best ranking

Table 5: Means of the averaged Wilcoxon test ranking values for the six selection methods means for n = 200, 350 and 400
*Selection method has the best ranking

For 84 different simulation experiments, the averaged ranking values of the selection methods according to Wilcoxon tests are tabulated in Table 2 and 3.

According to the results in Table 2, for small sized samples (for n = 25, 50, 100 and 150), GCV has had the best empirical performance for all factors. In accordance with the overall Wilcoxon test rankings in Table 2, GCV have also displayed a good performance. As shown Table 2, because of , Cp and RCP methods produced the same results under all experimental factors. In this situation, for small sample sizes, for which reason the effects of the replication of simulation, Cp is approximately equal to its E (Cp}. For small samples, it is observed that CV has produced the worst performance.

According to Table 3, for large sized samples (for n = 200, 350 and 400), AICc criterion has had the best empirical performance. It is shown that AICc and GCV and Cp criteria have shared a better performance for n = 400. According to the overall Wilcoxon test rankings in Table 3, AICc, GCV, Cp and LRS criteria can be ranked in terms of the performance. As shown in Table 3, generally, Cp and GCV gave the same results. This can be interpreted to follow the accepted view that GCV is asymptotically equal to GCV (Hastie and Tibshirani, 1990). That is to say that, for large sized samples, because of the effect of the replication of simulation, performance of the GCV and Cp is approximately equal. RCP has also produced the worst performance for large samples.

The scores in Table 4 and 5 are obtained by taking the means of the averaged Wilcoxon test ranking values tie with each of the selection methods in Table 2-3, respectively.

As shown in Table 4, according to the means of the small samples for all factors, GCV criterion has had the best empirical performance. When it is compared to the other criteria, Cp (RCP), AICc criteria have resulted in the worst performance.

According to the means of the large samples for all factors in Table 5, generally it is shown that AICc. has had the best empirical performance. GCV (Cp) and LRS criteria can be ranked in terms of the performance after than AICc. However, CV has produced the worst result.


Finally, by considering the simulation results and evaluations given in the above, the following suggestions have to be taken into account:

For both large and small samples, GCV are recommended as being the best selection criterion
For especially large samples, the use AICc would seem to be more appropriate. As for small samples, we propose the implementation of GCV criterion
For large samples, the implementation of AICc criterion, in addition to and GCV (Cp) and LRS criteria would be more beneficial. For small samples, GCV criterion should prove fruitful

Naturally, the above recommended suggestions have to be considered with a fair amount of caution as they are only an appraisal based on simulation results.

Aydin, D. and M. Memmedli, 2011. Optimum smoothing parameter selection for penalized least squares in form of linear mixed effect models. Optimization: J. Math. Program. Oper. Res., 10.1080/02331934.2011.574698

Cantoni, E. and E. Ronchetti, 2001. Resistant selection of the smoothing parameter for smoothing splines. Stat. Comput., 11: 141-146.
CrossRef  |  Direct Link  |  

Cao, Y., H. Lin, T.Z. Wu and Y. Yu, 2010. Penalized spline estimation for functional coefficient regression models. Comput. Stat. Data Anal., 54: 891-905.
CrossRef  |  Direct Link  |  

Craven, P. and G. Wahba, 1979. Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math., 31: 377-403.
CrossRef  |  

Eubank, R.L., 1999. Nonparametric Regression and Smoothing Spline. 2nd Edn., Marcel Dekker Inc., New York, ISBN-13: 978-0824793371.

Green, P.J. and B.W. Silverman, 1994. Nonparametric Regression and Generalized Linear Models 1st Edn., Chapman and Hall, London.

Hardle, W., 1991. Applied Nonparametric Regression. 1st Edn., Cambridge University Press, Cambridge.

Hardle, W., P. Hall and J.S. Marron, 1988. How far are automatically chosen regression smoothing parameters from their optimum?. J. Am. Stat. Assoc., 83: 86-89.
Direct Link  |  

Hastie, T. and R.J. Tibshirani, 1990. Generalized Additive Models. Chapman and Hall, London.

Hurvich, C.M., J.S. Simonoff and C.L. Tsai, 1998. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J. R. Stat. Soc. Ser. B, 60: 271-293.
Direct Link  |  

Kou, S.C., 2003. On the efficiency of selection criteria in spline regression. Probab. Theory Related Fields, 127: 153-176.
CrossRef  |  Direct Link  |  

Krivobokova, T. and G. Kauermann, 2007. A note on penalized spline smoothing with correlated errors. J. Am. Stat. Assoc., 102: 1328-1337.
CrossRef  |  Direct Link  |  

Krivobokova, T., C.M. Crainiceanu and G. Kauermann, 2008. Fast adaptive penalized splines. J. Comput. Graphical Stat., 17: 1-20.
CrossRef  |  Direct Link  |  

Lee, T.C.M. and V. Solo, 1999. Bandwidth selection for local linear regression: A simulation study. Comput. Stat., 14: 515-532.
Direct Link  |  

Lee, T.C.M., 2003. Smoothing parameter selection for smoothing splines: A simulation study. Comput. Stat. Data Anal., 42: 139-148.
CrossRef  |  Direct Link  |  

Lee, T.C.M., 2004. Improved smoothing spline regression by combining estimates of different smoothness. Stat. Probab. Lett., 67: 133-140.
CrossRef  |  Direct Link  |  

Mallows, C.L., 1973. Some comments on Cp. Technometrics, 15: 661-676.
Direct Link  |  

Ruppert, D., M.P. Wand and R.J. Carrol, 2003. Semi-Parametric Regression. 1st Edn., Cambridge University Press, Cambridge.

Schimek, M.G., 2000. Smoothing and Regression: Approaches, Computation and Application. John Wiley and Sons Inc., USA., ISBN-13: 9780471179467, pp: 19-39.

Wahba, G., 1990. Spline Models for Observational Data. 1st Edn., SIAM, Philadelphia, Pennsylvania, USA., ISBN: 10: 0898712440, Pages: 180.

©  2020 Science Alert. All Rights Reserved