INTRODUCTION
Mathematical modeling of water flow in unsaturated natural soils leads to partial
differential equationRichard’s equation (Kutilek and Nielsen, 1994). To solve
this equation one needs the description of the relationships between soil moisture
content, hydraulic conductivity and capillary pressure. The moisture retention
characteristic (the relation between water content Θ and pressure head
h) can be measured for natural soils. A variety of parametric models
have been proposed to describe MRC. All these models are NLRmodels, which are
able to describe the Sshaped retention curve.
A well known MRC model was studied and moreover a now widely used class of functions for parametrizing measured soil water characteristics
is presented, where α, n and m are positive parameters defining the MRC’s shape (Van Genuchten, 1980; Vereecken et al., 1989). If we consider the saturated water content Θ_{s} and the residual water content Θ_{r} as parameters too, we will have a MRC model with five parameters, which have to be estimated from observed soil water retention data and this model is denoted by (VG1).
In King (1965), the following model with five real parameters Θ_{s}, h_{0}, b, ε and γ is suggested
where, b having negative values and this model denoted by (K1).
Table 1: 
Some variants of van Genuchten model (VG1) and King model
(K1). 

In ElShehawy (2008), some special cases of the models (VG1) and (K1) with fewer parameters are studied. Table 1 indicates to these special cases (VG2), (VG3), (VG4) and (VG5) of the model (VG1) and also the model (K2) as a special case of the model (K1).
The model selection in ElShehawy (2008) is executed by using the simulation study, which based on hundred sets of simulated data and the measures of nonlinearity (curvature, bias and skewness).
The results of analyzing experimental data using a parametric model may heavily depend on the chosen model. It is unknown whether a certain regression model describes the unknown true regression function sufficiently well.
In this study, we propose computational tools in SPlus for the adequate selection of NLRmodels based on crossvalidation and bootstrap.
The aim of the proposed technique in this study is to select the best parametric NLmodel, to compare the results of the used method of the selection with the results of other selection methods, to indicate the caused parameter of this nonlinearity and finally to avoid this defect.
DESCRIPTION OF RESAMPLING TECHNIQUE
Let us consider a general NLRmodel
where, y_{ij} is the values of the response variable Y at fixed values x_{i} of exploratory variables (nonrandom design points). Here, the real valued function f(., β) is expectation function and it is known up to a Pdimensional vector of parameters β = (β_{1}, β_{2}, ..., β_{p}) ε R^{P} and this function is twice continuously differentiable in β. Moreover, the random perturbations ε_{ij} (the measurement errors) are uncorrelated random variables with zero mean and unknown variance σ^{2}, which do not depend on x_{i}. Given the data
the estimation of the regression function f reduces to estimate its parameters.
If the object of the analysis is primarily the estimation of the regression function, that is, of the values of the regression function itself over a region U of interest, then the weighted crossvalidation criterion will be a convenient criterion characterizing the performance of the model f (x, β) (Bunke et al., 1998, 1999):
Here,
denotes the least squares estimator (LSE), which is calculated from the (n1)
observations left after deleting the observation (x_{i}, y_{ij}),
with x_{i} ∈ .
Thereby we assume that the (ordered) values x_{r}≤...≤x_{s
}of the independent variable are in the region.
If
= [a, b] is the interval of the interest with a≤x_{r}≤...≤x_{s}≤b,
then one will use weights
with and
If one would not like to specify the interval
= [a, b], then a = x_{1} and b = x_{2} should be the standard
values. The use of a criterion like (5) is justified by the fact, that it estimates
the weighted sum of the actual squared prediction errors. The crossvalidation
criteria (5) will be adequate, if prediction or estimation of the regression
function (here the MRC) is a primary objective of the data analysis.
Following Bunke et al. (1998, 1999), the model selection could be done by using the crossvalidation criteria (5) in three steps:
• 
List alternative models
corresponding to the theoretical or practical experience in the field of
application. 
• 
Select among these models
a model
with the smallest value of the chosen crossvalidation criteria. 
• 
The data analysis is then done with the estimated regression function
. 
As an alternative to the Ordinary Least Squares Estimator (OLSE) in view of
possibly heteroscedastic variances
a (nonhomogeneous) variance model could be chosen and a corresponding Weighted
Least Squares Estimator (WLSE) should be used. Furthermore, it is possible to
asses with a procedure in SPlus the accuracy of the parameter estimations using
resampling methods.
With respect to the goodness of fit, one can use the criteria ,
where
ACCURACY AND PROFILE
Estimates (for instance parameter estimates) will be only of value, if they are complemented by an assessment of their accuracy. Therefore, statistical packages offer the possibility to calculate characteristics like the standard deviation or confidence intervals. These calculations are often based on linear models (linearization) or asymptotic theories (Bates and Watts, 1988; Seber and Wild, 2005). In case of NLRmodels combined with typical small data sets these calculated results may differ significant from the true values. Especially strong nonlinear parameters may have confidence intervals which will be essentially differing from the calculated intervals using a linearization of the NLmodel. To overcome these difficulties it is possible to use resampling methods bootstrap or corrected versions of the confidence intervals based on nonlinear theories.
Accuracy of Parameter Estimates
The bootstrap is a resampling method that tries to mimic the (in general
unknown) distribution of the observations (measurement error) by generating
artificial independent bootstrap samples from the fitted model. The estimates
of the model parameters are repeatedly calculated for these bootstrap samples.
The accuracy of the original estimates in inferred from the empirical distribution
of the bootstrap estimates. In Bunke et al. (1998, 1999), a moment oriented
bootstrap procedure, which is based on a nonlinear transformation (BoxCoxtransformation)
to get nearly normally distributed residuals and on the estimation of variance
(second moment) of this distribution, is used. The bootstrap can be applied
when a regression and variance model were selected. Then it is possible to generate
bootstrap samples based on realizations of corresponding normally distributed
pseudo random numbers and to compute confidence intervals and some other measures
of accuracy (variance, bias and mean absolute relative error) based on this
simulation.
Profiles of Parameter Estimates
Another way to get exact confidence intervals for a single component of
the parameter vector is the use of so called profile t functions and plots (Bates
and Watts, 1988; Seber and Wild, 2005). Plots of the profile t function τ(β_{r})
provide exact likelihood confidence intervals of the parameters and in addition,
reveal how nonlinear the estimation situation for the corresponding parameter
is.
The profile t function represents the profile of the residual sum of square function as a function of a single component of the parameter vector in a neighborhood of the estimated parameter vector.
In case of a linear regression model (LRmodel) the squares root of this residual sum of square functionthe profile t function is a linear function δ(β_{r}) of the corresponding component β_{r} of the parameter vector β. Moreover, the plot of this function against the component would be a straight line.
For a nonlinear component of the parameter vector in a NLRmodel the plot of the profile t function will be curved and the amount of curvature gives some information about the nonlinearity of the model and of the component of the parameter vector. Using these profiles it is possible to construct exact confidence intervals for the components of the parameter vector of the model under consideration.
RESULTS
In this section we present the results of the model selection with resampling technique and assessment of accuracy of parameter estimates.
Model Selection with ReSampling
Now, we consider the model selection based on the weighted crossvalidation
criterion (5). Using this criterion we will look for a retention model, which
minimizes the weighted sum of the actual squared prediction errors. That means,
we look for a model, which is stably able to predict values of the retention
function based on the given data situation.
We consider the first simulated data set SIMUL_1 of sand (ElShehawy, 2008), which is given in Table 2.
Since the simulation uses a homogenous error variance, we consider only all combinations of retention models with a homogenous variance for the measurement errors.
Table 3 contains the ranking of the studied models with respect to the RSS criterion and the CV criterion. These results are obtained by using the SPLUS procedure in Appendix A.
We remark that the model (VG5) is not flexible enough and therefore convergence problems during the calculation of the CV value occur.
An application of the crossvalidation approach to our problem would lead to the selection of the model (VG4) with four parameters.
Table 2: 
The first simulated MRCdata for sand 

Table 3: 
Ranking of retention models with respect to CV and RSS for
the first simulated data 

Referring to ElShehawy (2008), if we are is interested in the model (VG4) with four parameters, we will try to improve the problematic qualities by the suggested reperarameterization (RVG4) of the model (VG4) through replacing the parameter n by .
Since the values of RSS and CV are independent of parameterization, then the model (RVG4) is identical with respect to these criteria with the model (VG4). The models (VG1) and (K1) with five parameters are very flexible and are having optimal values with respect to RSS.
These results are neither depending on the used simulated data set of a soil nor on the simulated type of soil. As a rule, we get optimal values with respect to the crossvalidation criterion for the model (VG4) and its reparametrization (RVG4). Moreover, the models (VG1) and (K1) are having the best fit with respect to the values of RSS.
If we consider data sets with a possibly inhomogeneous variance structure, we will obtain similar ranking of retention models in combinations with an inhomogeneous variance model. Referring to Bunke et al. (1999), variance modeling can improve the parameter estimation for a given model using the weighted least squares in place of the ordinary least squares.
Assessment of Accuracy of Parameter Estimates
The assessment of accuracy is a necessary part of the parameter estimation
for a given NLRmodel. We will consider accuracy statistics for some parameters
of the model (VG4) and its reparametrization (RVG4).
Table 4 gives such statistics for the parameters α,
n and
of the model (VG4) and (RVG4), respectively combined with the simulated data
SIMUL_1 of sand.
Table 4: 
Accuracy statistics for α, n and
of the models (VG4) and (RVG4) for SIMUL_1 


Fig. 1: 
Profile t function τ(α) and linear profile δ
(α) for the parameter α in the model (VG4) with respect to the
first simulated set of sand 

Fig. 2: 
Profile t function τ (n) and linear profile δ (n)for
the parameter n in the model (VG4) with respect to the first simulated set
of sand 
These statistics are obtained by using the SPLUS procedure in Appendix B.
This table contains the estimated values of the corresponding parameter of the
model and the used data set and several 95% confidence intervals CI, CI^{b}
and CI*.
We start with classical asymptotic results, which are based on a linearization of a NLmodel. The confidence interval CI with lower bound lb and upper bound ub shows these results. The calculations of these confidence intervals are based on the theory of LRmodels.
In case of a parameter, which has near linear behavior, this linear approximation of a NLRmodel provides adequate inferential results.
Unfortunately, in a nonlinear analysis and parameters with strong nonlinear
behavior combined with typical small data sets this approximation will be woefully
inadequate. For example, the parameter α shows weak nonlinear behavior,
the parameter n has stronger properties of nonlinearity with respect to its
estimation. Referring to ElShehawy (2008), the corresponding parameter
of the reparameterization (RVG4) has improved properties.
In the second attempt to improve the accuracy statistics based on a bootstrap simulation we calculated a 95% confidence interval CI^{b} of the estimated parameter. The calculation of this confidence interval is based on bootstrap sample of hundred data sets generated with NLRCHOICE (Bunke et al., 1998, 1999). These accuracy statistics are inferred from the empirical distribution of the estimated parameters for this bootstrap sample.
The third possibility is the calculation of an exact likelihood interval CI* with lower bound lb* and upper bound ub*. That means, if we are interested in this exact interval for the nonlinear parameter β_{r}, we will use the profile t function τ(β_{r}) to correct the asymptotic confidence interval CI.
In Fig. 1 and 2, which are obtained by using the SPLUS procedure in Appendix B, the curves (solid) show the profiles τ(α) and α(n), the dotted lines are the linear approximations δ(α) and δ(n) and the lines (dashed) describe the construction of 95% exact confidence intervals for the parameters α and n, respectively.
In case of the near linear parameter we get only a slight modification CI* of the classical asymptotic confidence interval CI, but in case of the parameter n with strong nonlinear estimation behavior the exact confidence interval CI* is completely different from CI. This exact confidence interval CI* is unbounded above. That means, for typical data sets frequently very large values of n will be estimated.
CONCLUSIONS
Model selection of NLRmodels depends on the main aim of the analysis. A retention model has to be able to describe the typical Sshaped retention curve.
If one is interested in the best fitting model, very flexible models with more parameters should be used. As a criterion ordinary or in case of heteroscedasticity weighted least squares can be used. In case of a retention curve the models (VG1) and (K1) with five parameters are optimal with respect to the goodness of fit.
If one is primary interested in the prediction of future values of the depending variable here water contenta crossvalidation criterion like will be used. This criterion describes the stiffness of the model that means the ability to predict values of the depending variable based on the given data. Optimal with respect to the crossvalidation criterion is the model (VG4) with m = 1.
The results of model selection are identical with the corresponding results which were obtained by the simulation studies or the calculation of measures of nonlinearity.
Using profile t functions, we note that:
• 
The selected model (VG4) has a strong nonlinear parameter
n and also it has a possible appropriate reparameterization (RVG4) with
respect to this parameter. 
• 
It is possible to correct classical confidence intervals of estimated
parameters, which having clear nonlinear behavior. 
APPENDIXES
A. SPLUS Procedure for Model Selection
B. SPLUS Procedure for Calculating the Exact Confidence Intervals and the
Profile of the Parameters