Smoothing Parameter Selection Problem in Nonparametric Regression Based on Smoothing Spline: A Simulation Study

Aydin, Dursun; Tuzemen, M. Seref

ABSTRACT

This study presented a comparative study of the different smoothing parameter selection methods. The parameter selection problem has been examined in respect to a smoothing spline implementation in predicting nonparametric regression models. For this purpose, a simulation experiment has been carried out by using a program written in MATLAB 6.5. The simulation experiment provides a comparison of the six different selection methods. In this context, 500 replications have been carried out in simulation for sample sets with different sizes. Thus, the empirical performances of the six selection criteria have been investigated and the suitable selection criteria are obtained for an optimum parameter selection.

PDF Abstract XML References Citation

INTRODUCTION

Smoothing spline method can be used to estimate the nonparametric regression models. The main goal of this method is to estimate the function that minimizes the penalized residual sum of squares. Basis of the penalized least squares method is to add a penalty term to the residual sum of squares. In the light of this information, the estimation of the unknown function depends on smoothing parameter. Therefore, the determination of an optimum smoothing parameter in the interval (0, ∞) was found to be an underlying complication. In the literature, different selection methods are components of various studies for an appropriate smoothing parameter. Indeed, to a considerable extent (Craven and Wahba, 1979; Hardle, 1991; Hardle et al., 1988; Wahba, 1990; Hurvich et al., 1998; Eubank, 1999; Lee and Solo, 1999; Hastie and Tibshirani, 1990; Schimek, 2000; Cantoni and Ronchetti, 2001; Ruppert et al., 2003; Lee, 2003, 2004; Kou, 2003) supplement on the selection of the smoothing parameter.

The empirical performances of the selection methods used in selection of the smoothing parameter are compared in this paper. Selection methods used in simulation are an improved version of Akaike Information Criterion (AIC_c), Generalized Cross-Validation (GCV), Cross-Validation (CV), Mallows’ C_p criterion, risk estimation using classical pilots (RCP) and local risk estimation (LRS). A simulation experiment is achieved to find out which selection methods are the best in smoothing parameter selection. To throw light on this issue, the samples differing in small and large sizes are secured by means of the above mentioned simulation and moreover, six selection methods are evaluated.

This study is mainly concerned with the selection of smoothing parameter through Monte Carlo simulation. Smoothing parameters play a crucial role in this procedure. Krivobokova and Kauermann (2007) showed that using the REML to estimate smoothing parameter outperforms other methods such as (generalized) CV or Akaike criterion especially when the error correlation structure is misspecified. Krivobokova et al. (2008) formulated a hierarchical mixed model to estimate local smoothing parameter to achieve adaptive penalized spline smoothing. Cao et al. (2010) discussed different methods of choosing the important smoothing parameter and recommend GCV as the choice for penalized spline smoothing parameter selection for both computational efficiency and accuracy of the functional coefficient regression models. Aydin and Memmedli (2011) recommended GCV and REML as being good smoothing parameter selection criteria for small and medium sized samples.

SMOOTHING SPLINE METHOD

Suppose observed are n pairs of measurements (x_i, y_i), i = 1, 2,..., n, satisfying the model:

(1)

where, fεC² [a, b] is an unknown smooth function and (ε_i)ⁿ_{i = 1} are normal distributed random errors with zero mean and common variance σ².

The basic aim of the nonparametric regression is to estimate unknown function fεC² [a, b] (the class of all functions f with continuous first and second derivatives) in model 1. Smoothing spline estimate of the f function appears as a solution to the following minimization problem: Find that minimizes the penalized residual sum of squares:

(2)

for pre-specified value λ>0. The first term in Eq. 2 denotes the Residual Sum of the Squares (RSS) and it penalizes the lack of fit. The second term which is weighted by λ denotes the roughness penalty and it imposes a penalty on roughness. In other words, it penalizes the curvature of the function f. The λ in Eq. 2 is recognized to be the smoothing parameter. As λ varies from 0 to +∞, the solution varies from interpolation to a linear model. As λ→+∞, the roughness penalty dominates in (2) and the spline estimate is compelled to be a constant. As λ→0, the roughness penalty disappears in (2) and the spline estimation interpolates the data. Thus, the smoothing parameter λ plays a key role in controlling the trade-off between the goodness of fit represented by:

and smoothness of the estimate measured by:

The solution based on smoothing spline for minimum problem in the Eq. 2 is known as a “natural cubic spline” with knots at x₁,..., x_n. From this point of view, a special structured spline interpolation which depends on a chosen value λ develops into a suitable approach of function f in model 1. Let f = (F (x₁),..., f (x_n)) be the vector of values of function f at the knot points x₁,..., x_n. The smoothing spline estimate of this vector or the fitted values for data y = (y₁,..., y_n)^T are projected by:

Image for - Smoothing Parameter Selection Problem in Nonparametric Regression Based on Smoothing Spline: A Simulation Study

(3)

where, is a natural cubic spline with knots at x₁,..., x_n for a fixed λ>0 and S_λ is a well-known positive-definite (symmetrical) smoother matrix which depends on λ and the knot points x₁,..., x_n but not on y. Function , the estimator of function f is obtained by cubic spline interpolation that rests on condition . To gain better perspective on smoothing spline (Eubank, 1999; Green and Silverman, 1994; Wahba, 1990) state studied opinions.

SMOOTHING PARAMETER SELECTION METHODS

Although, smoothing spline estimator solves the problem of allowing fits with variable slope, a new dilemma emerges. In fact, it generates the determination of the appropriate value for the smoothing parameter λ for a given data set. The same value of λ is unlikely to work equally well with every data set. As such, the estimation methods have been introduced for the selection of smoothing parameter λ in Eq. 2. The positive value λ that minimizes any smoothing parameter selection methods is selected as an appropriate smoothing parameter.

Selection methods used in experiment: Various smoothing parameter selection methods are featured in the literature. Most of these suggested methods were implemented in our simulation study. Moreover, a selection criterion from previous studies in the literature to provide an effective performance was also introduced in this particular study. The selection criteria used in our simulation study are classified as:

Cross-validation: The basic idea of CV is to disregard one of the points gradually, to select the smoothing parameter λ that minimizes the residual sum of squares and to estimate the squared residual for a smooth function at x_i based on the remaining (n-1) points. The CV score can be translated as:

(4)

where, is the fit (spline smoother) for n pairs of measurements with smoothing parameter λ and is the fit calculated by leaving out the ith data point and (S_λ)_ii is the ith diagonal element of smoother matrix S_λ (Wahba, 1990; Green and Silverman, 1994).

Using the approximations and signifies that:

(Hastie and Tibshirani, 1990).

Generalized cross-validation: GCV is a modified form of the CV which is a conventional method for choosing the smoothing parameter. The GCV score which is constructed by analogy to CV score can be obtained from the ordinary residuals by dividing by the factors 1-(S_λ)_ii. The underlying design of GCV is to replace the factors 1-(S_λ)_ii in Eq. 8 with the average score 1-n^-1 tr (S_λ). Thus, by summing the squared corrected residual and factor 1-n^-1 tr (S_λ)², by the analogy ordinary cross-validation, the GCV score function can be procured as follow (Craven and Wahba, 1979; Wahba, 1990):

(5)

Mallows’ C_P criterion: In the literature, C_p criterion is referred to as an unbiased risk estimate (UBR). This type of estimate was suggested by Mallows (1973) in the regression case and applied to smoothing spline by Craven and Wahba (1979). If σ² is recognized, an unbiased estimate of the residual sum of squares is provided by C_p criterion:

(6)

Unless σ² is known, in practice an estimation for σ² can be given by:

(7)

where, is pre-chosen with any of the CV, GCV or AIC_C criteria ( is an estimate of λ) (Lee, 2003, 2004; Wahba, 1990). According to Hastie and Tibshirani (1990) and Ruppert et al. (2003), GCV is approximately equal to C_p.

Improved Akaike information criterion: An improved version of a criterion based on the classical Akaike Information Criterion (AIC), AIC_c criterion, is used for choosing the smoothing parameter for nonparametric smoothers (Hurvich et al., 1998). This improved criterion is defined as:

(8)

As can be seen from the Eq. 8, this criterion is easy to apply for the selection of smoothing parameter.

Risk estimation using classical pilots: Risk function measures the distance between the actual regression function (f) and its estimation (). Needless to say that, a good estimate must contain minimum risk. A direct computation leads to the bias-variance decomposition for :

(9)

A clear-cut explanation shows that . Because the risk is an unknown quantity, so-called risk is now estimated by computable quantity . The obtained expression for is:

(10)

where, and are the appropriate pilot estimates for σ² and f, respectively. The pilot λ_p selected by classical methods is used for computation of the pilot estimates.

Local risk estimation: The LRS method proposed by Lee (2004), aims to select the that minimizes the local risk for the each knot points x_i. A direct computation leads to the bias-variance decomposition for R_λ (x_i):

(11)

In the above equation, (S_λf) (x_i) is the ith element of vector S_λf and S_λ (x_i) is the ith diagonal element of the square matrix . An estimator for R_λ (x_i) is firstly computed and the is selected in order to minimizes it. This process is repeated for all x_i’s and at the end of the process a final mixed estimate for f is derived. The LRS method can be practically performed with the following five steps (Lee, 2003):

•	For a set of pre-selected smoothing parameters λ₁<...<_m, calculate the corresponding set of smoothing spline estimates:
•	Select the pilot value λ_p from AIC_c criterion in Eq. 8 by using the elements in F
•	For λ_p, calculate the estimates and by using the Eq. 7
•	Substitute the pilots and into the expression and obtain the estimates
•	For each x_i, find the from F which minimizes and the final estimate accept the appropriate values for f (x_i)

SIMULATION EXPERIMENT

This study was conducted to evaluate the performances of the nine selection methods mentioned above. The experimental setup in this paper is adopted from Professor Steve Marron. By using a program coded in MATLAB, we generated the samples sized n = 25, 50, 100, 150, 200, 350 and 400. The number of replication was 500 for each of the samples. For each simulated data sets, the Mean Squared-Errors (MSE) was used for evaluate the quality of any curve estimate . To find out if the difference between the MSE median values of any two selection methods is significant or not, the paired Wilcoxon tests were assessed. In this way, methods which complement the best smoothing parameter were determined by evaluating so-called selection methods.

Experimental setup: The experimental setup applied at this stage was designed to study the effects of the following three factors which vary an independent and effective approach: noise level, spatial variation, variance function.

The setup specification is listed Table 1. Simulation study was performed according to MATLAB program and the experimental setup was designed in the following framework:

•	Totally three sets of numerical experiments were performed. For each set of experiments, only one of the above three experimental factors (e.g., noise level, degree of spatial variation and noise variance function) has been altered while the remaining two have been kept unchanged
•	Within each set of experiments, the factor levels was modified four times (i.e., r = 1, 2, 3, 4) to detect the effects of any experimental factor in Table 1
•	To see the performance of the small and large samples of the selection methods. For each factor level r within each set of experiments, we generated 7 different samples with sample sizes n = 25, 50, 100, 150, 200, 350 and 400
•	The number of replications was 500 for each of the 84 numerical experiments
•	We computed the appropriate smoothing spline estimators in equality (3) by selecting the smoothing parameter λ which minimizes the selection methods
•	We used the MSE values to evaluate computed according to each of the selection criterion:

	. (where f (x_i) is value at knots x_i of the appropriate function f defined in Table 1)
•	Paired Wilcoxon test was applied to test whether MSE values was considered as the performance measure of any two methods are significant or not
•	The factor levels indicated as r was changed four times (i.e., r = 1, 2, 3, 4) in order to detect the effects of any factor from three factors in Table 1
•	By considering 3 factors, 4 factor levels and 7 samples, totally, 84 numerical experiments were conducted

Evaluations of the experimental results: For each simulated data set used in the experiments, the MSE values were used in order to evaluate the quality of any curve estimate . Paired Wilcoxon tests were applied to test whether the difference between the median MSE values of any two methods is significant or not. The significance level used was 5%. The selection methods were also ranked as follows: If median MSE value of a method is significantly less than the remaining five, it will be assigned a rank 1. If median MSE value of a method is significantly larger than one but less than the remaining four, it will be assigned a rank 2 and similarly for ranks 3-9. Methods having non-significantly different median values will share the same averaged rank, on the other hand, method or methods having the smallest rank will be superior.

In this simulation study, because totally 84 different configurations are made, it is not possible to display here all these configurations. Therefore, only 12 different configurations are given in Fig. 1-3 for n = 150.

Table 1:	Specification of the simulation setup

n = 25, 50, 100, 150, 200, 350 and 400 (it was taken seven different sample size)


Fig. 1(a-d):	Simulation results correspond to the noise level factor for n = 150, (a) 1 = 1, (b) 1 = 2, (c) 1 = 3 and (d) 1 = 4


Fig. 2(a-d):	Simulation results correspond to the spatial variation factor for n = 150, (a) 1 = 1, (b) 1 = 2, (c) 1 = 3 and (d) 1 = 4


Fig. 3(a-d):	Simulation results correspond to the variance factor for n = 150, (a) 1 = 1, (b) 1 = 2, (c) 1 = 3 and (d) 1 = 4

Table 2:	Averaged Wilcoxon test ranking values for the six selection methods

*Selection method has the best ranking

Table 3:	Averaged Wilcoxon test ranking values for the six selection methods in large sample sizes

*Selection method has the best ranking

The head row plots of the Fig. 1-3 display the true regression function together with all typical simulated data set. The bottom row plots display the boxplots of the log_c MSE values for, from left to right, AIC_c, GCV, CV, Cp, RCP and LRS. The numbers below the boxplots are the paired Wilcoxon test rankings.

Table 4:	Means of the averaged Wilcoxon test ranking values for the six selection methods means for n = 25, 50, 100 and 150

*Selection method has the best ranking

Table 5:	Means of the averaged Wilcoxon test ranking values for the six selection methods means for n = 200, 350 and 400

*Selection method has the best ranking

For 84 different simulation experiments, the averaged ranking values of the selection methods according to Wilcoxon tests are tabulated in Table 2 and 3.

According to the results in Table 2, for small sized samples (for n = 25, 50, 100 and 150), GCV has had the best empirical performance for all factors. In accordance with the overall Wilcoxon test rankings in Table 2, GCV have also displayed a good performance. As shown Table 2, because of , C_p and RCP methods produced the same results under all experimental factors. In this situation, for small sample sizes, for which reason the effects of the replication of simulation, C_p is approximately equal to its E (C_p}. For small samples, it is observed that CV has produced the worst performance.

According to Table 3, for large sized samples (for n = 200, 350 and 400), AIC_ccriterion has had the best empirical performance. It is shown that AIC_c and GCV and C_p criteria have shared a better performance for n = 400. According to the overall Wilcoxon test rankings in Table 3, AIC_c, GCV, C_p and LRS criteria can be ranked in terms of the performance. As shown in Table 3, generally, C_p and GCV gave the same results. This can be interpreted to follow the accepted view that GCV is asymptotically equal to GCV (Hastie and Tibshirani, 1990). That is to say that, for large sized samples, because of the effect of the replication of simulation, performance of the GCV and C_p is approximately equal. RCP has also produced the worst performance for large samples.

The scores in Table 4 and 5 are obtained by taking the means of the averaged Wilcoxon test ranking values tie with each of the selection methods in Table 2-3, respectively.

As shown in Table 4, according to the means of the small samples for all factors, GCV criterion has had the best empirical performance. When it is compared to the other criteria, C_p (RCP), AIC_c criteria have resulted in the worst performance.

According to the means of the large samples for all factors in Table 5, generally it is shown that AIC_c. has had the best empirical performance. GCV (C_p) and LRS criteria can be ranked in terms of the performance after than AIC_c.However, CV has produced the worst result.

RECOMMENDATIONS

Finally, by considering the simulation results and evaluations given in the above, the following suggestions have to be taken into account:

•	For both large and small samples, GCV are recommended as being the best selection criterion
•	For especially large samples, the use AIC_cwould seem to be more appropriate. As for small samples, we propose the implementation of GCV criterion
•	For large samples, the implementation of AIC_c criterion, in addition to and GCV (C_p) and LRS criteria would be more beneficial. For small samples, GCV criterion should prove fruitful

Naturally, the above recommended suggestions have to be considered with a fair amount of caution as they are only an appraisal based on simulation results.

REFERENCES

Aydin, D. and M. Memmedli, 2011. Optimum smoothing parameter selection for penalized least squares in form of linear mixed effect models. Optimization: J. Math. Program. Oper. Res.
CrossRef Direct Link
Cantoni, E. and E. Ronchetti, 2001. Resistant selection of the smoothing parameter for smoothing splines. Stat. Comput., 11: 141-146.
CrossRef Direct Link
Craven, P. and G. Wahba, 1979. Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math., 31: 377-403.
CrossRef
Eubank, R.L., 1999. Nonparametric Regression and Smoothing Spline. 2nd Edn., Marcel Dekker Inc., New York, ISBN-13: 978-0824793371.
Green, P.J. and B.W. Silverman, 1994. Nonparametric Regression and Generalized Linear Models 1st Edn., Chapman and Hall, London.
Hardle, W., 1991. Applied Nonparametric Regression. 1st Edn., Cambridge University Press, Cambridge.
Hardle, W., P. Hall and J.S. Marron, 1988. How far are automatically chosen regression smoothing parameters from their optimum?. J. Am. Stat. Assoc., 83: 86-89.
Direct Link
Hastie, T. and R.J. Tibshirani, 1990. Generalized Additive Models. Chapman and Hall, London.
Direct Link
Hurvich, C.M., J.S. Simonoff and C.L. Tsai, 1998. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J. R. Stat. Soc. Ser. B, 60: 271-293.
Direct Link
Kou, S.C., 2003. On the efficiency of selection criteria in spline regression. Probab. Theory Related Fields, 127: 153-176.
CrossRef Direct Link
Krivobokova, T. and G. Kauermann, 2007. A note on penalized spline smoothing with correlated errors. J. Am. Stat. Assoc., 102: 1328-1337.
CrossRef Direct Link
Krivobokova, T., C.M. Crainiceanu and G. Kauermann, 2008. Fast adaptive penalized splines. J. Comput. Graphical Stat., 17: 1-20.
CrossRef Direct Link
Lee, T.C.M. and V. Solo, 1999. Bandwidth selection for local linear regression: A simulation study. Comput. Stat., 14: 515-532.
Direct Link
Lee, T.C.M., 2004. Improved smoothing spline regression by combining estimates of different smoothness. Stat. Probab. Lett., 67: 133-140.
CrossRef Direct Link
Lee, T.C.M., 2003. Smoothing parameter selection for smoothing splines: A simulation study. Comput. Stat. Data Anal., 42: 139-148.
CrossRef Direct Link
Mallows, C.L., 1973. Some comments on Cp. Technometrics, 15: 661-676.
Direct Link
Ruppert, D., M.P. Wand and R.J. Carrol, 2003. Semi-Parametric Regression. 1st Edn., Cambridge University Press, Cambridge.
Schimek, M.G., 2000. Smoothing and Regression: Approaches, Computation and Application. John Wiley and Sons Inc., USA., ISBN-13: 9780471179467, pp: 19-39.
Wahba, G., 1990. Spline Models for Observational Data. 1st Edn., SIAM, Philadelphia, Pennsylvania, USA., ISBN: 10: 0898712440, Pages: 180.
Cao, Y., H. Lin, T.Z. Wu and Y. Yu, 2010. Penalized spline estimation for functional coefficient regression models. Comput. Stat. Data Anal., 54: 891-905.
CrossRef Direct Link

Journal of Applied Sciences

Research Article

Smoothing Parameter Selection Problem in Nonparametric Regression Based on Smoothing Spline: A Simulation Study

ABSTRACT

How to cite this article

Search

INTRODUCTION

RECOMMENDATIONS

REFERENCES

Search

Leave a Comment