In tropical livestock systems, poor grassland productivity is one of the most important limitations due to the adaptability and persistence in these environments. However, new cultivars of the genus Brachiaria have been released to the market as options to overcome the problems observed in traditional forages, thus providing better fodder options1. Brachiaria brizantha is one of the superior introducing grasses that has been adapted and known by farmers. These grasses are compatible with the tropical climate and are tolerant of various types of soil, including acidic soils2. Knowledge of the distribution of production and quality of forage during the year is a tool to plan utilization3. Forage production in the tropics is related to the distribution of rainfall, which is generally asymmetrically distributed. This fact raises the need to use more complex models, especially when multiple comparisons are made and there are very large deviations from normality, the risk of error increases4,5.
Generalized Additive Models for Location, Scale and Shape (GAMLSS) are semi-parametric regression type models. They are parametric, in that they require a parametric distribution assumption for the response variable and “semi” in the sense that the modeling of the parameters of the distribution, as functions of explanatory variables, may involve using non-parametric smoothing functions. GAMLSS were introduced by researchers6 as a way of overcoming some of the limitations associated with the popular generalized linear models, GLM and generalized additive models, GAM7,8. In GAMLSS the exponential family distribution assumption for the response variable (y) is relaxed and replaced by a general distribution family, including highly skew and/or kurtotic continuous and discrete distributions9. The systematic part of the model is expanded to allow modeling not only of the mean (or location) but other parameters of the distribution of y as, linear and/or non-linear, parametric and/or additive non-parametric functions of explanatory variables and/or random effects6. Hence GAMLSS is especially suited to modeling a response variable that does not follow an exponential family distribution, (e.g., leptokurtic or platykurtic and/or positive or negative skew response data, or over-dispersed counts) or which exhibit heterogeneity, (e.g., where the scale or shape of the distribution of the response variable changes with explanatory variables(s))9.
In this paper, a GALMSS model for skewed data, where the exponential family assumption is relaxed and replaced by a very general distribution family is considered. Within this new framework, the systematic part of the model is expanded to allow not only the mean (or location) but all the parameters of the conditional distribution of y to be modeled as parametric and/or additive nonparametric (smooth) functions of explanatory variables and/or random-effects terms. The objective of this research is to fit a GAMLSS model on accumulated dry matter data from Brachiaria brizantha using a model selection algorithm.
MATERIALS AND METHODS
Study area: The study was carried out at a farm in Portuguesa state, Venezuela (Fig. 1) from November, 2016-October, 2017). The farm is located between 1008020-1004802 West Longitude and 425400-427420 North Latitude. The response variable ‘y’ is the accumulated dry matter (kg ha1). The explanatory variable ‘x’ is the interval between cutting (21, 28, 35 and 42 days).
Box-Cox power exponential distribution: The Box-Cox power exponential distribution (BCPE) is defined by (for details see11).
Let Y be a positive random variable having a Box-Cox power exponential distribution, denoted by , defined through the transformed random variable Z given by:
for 0<Y<∞ where y and where the random variable Z is assumed to follow a standard power exponential distribution with power parameter, τ>0, treated as a continuous parameter (Parameterization (1) assumes a standard normal distribution for Z)11.
The probability density function of Z, a standard power exponential variable, is given by:
for -∞<z<∞ y τ>0, where c2 = 22/τΓ(1/τ)[Γ(3/τ)]1. This parameterization ensures that Z has mean 0 and standard deviation 1 for all τ>0. Note that τ = 1 and 2 correspond to the Laplace (i.e., two-sided exponential) and normal distributions respectively, while the uniform distribution is the limiting distribution as τ→∞12.
|Fig. 1:||Relative location of the Portuguese state, Venezuela
From 2, the probability density function of Y, a BCPE (μ, σ, υ, τ), random variable, is given by:
Modeled data using GAMLSS: Let yT = (y1, y2,..., yn) be the vector of the response variable observations. Also, for k = 1,2,..., p, let gk(.) be a known monotonic link function relating θk to explanatory variables and random effects through an additive model given by:
where, θk and ηk are vectors of length n, e.g., θTk = (θ1k, θ2k,..., θnk), βTk = (β1k, β2k,..., βj’kk) is a parameter vector of length J’k, Xk is a known design matrix of order n×J’k, Zjk is a fixed known n×qjk design matrix and γjk is a qjk-dimensional random variable. Model (4) is called the GAMLSS.
The vectors γjk for j = 1,2,..., jk could be combined into a single vector with a single design matrix Zk.
If, for k = 1,2,..., p, Jk = 0 then model (4) reduces to a fully parametric model given by:
If Zjk = In, where In is an n×n identity matrix and γjk = hjk = hjk (xjk) for all combinations of j and k in a model (4), this gives:
where, xjk for j = 1,2,..., Jk and k = 1,2,..., p are vectors of length n. The function hjk is an unknown function of the explanatory variable xjk and hjk = hjk (xjk) is the vector that evaluates the function hjk at xjk. The explanatory vectors are assumed to be known. We call the model in equation (5) the semiparametric GAMLSS. Model (6) is an important special case of model (4). If Zjk = In and γjk = hjk = hjk (xjk) for specific combinations of j and k in a model (4), then the resulting model contains parametric, nonparametric and random-effects terms.
The first two population parameters θ1 and θ2 in a model (4) is usually characterized as location and scale parameters, denoted here by μ and σ, whereas the remaining parameter(s), if any, are characterized as shape parameters, although the model may be applied more generally to the parameters of any population distribution.
For many families of population distributions a maximum of two shape parameters ν (=) and τ (=θ3) suffice, giving the model:
The GAMLSS model (4) is more general in that the distribution of the dependent variable is not limited to the exponential family and all parameters (not just the mean) are modeled in terms of both fixed and random effects, for details see Rigby and Stasinopoulos11.
Model for the four parameters of the Box-Cox power exponential distribution: The parameters (μ, σ, ν, τ) of the Box-Cox power exponential distribution may be modeled as functions of many explanatory variables using the generalized additive model for location, scale and shape (denoted by GAMLSS) of Rigby and Stasinopoulos11. Here we consider a special case of the GAMLSS model where there is a single explanatory variate X. Given X = x; Y is modeled by a Box-Cox power exponential variable, BCPE (μ, σ, ν, τ), with probability density function fY(y), defined by (4) where the parameters μ, σ, ν and τ are modeled as smooth non-parametric functions of x, i.e. Y∼BCPE (μ, σ, ν, τ) where:
and for k = 1, 2, 3, 4, gk(.) are known monotonic link functions, usually the identity for μ and ν and the log for σ and τ and hk (x) are smooth non-parametric functions of x.
For i = 1,2,..., n, given X = xi, observations Yi are assumed to be independent BCPE (μi, σi, νi, τi) variables with probability density functions fYi(yi) obtained from (3) and parameters obtained from (8). This model is appropriate for independent observations of Y (e.g. cross-sectional data) rather than correlated observations (e.g. longitudinal data), for details see Rigby and Stasinopoulos11.
Model estimation and selection: The non-parametric functions hk for k = 1, 2, 3, 4 are estimated by maximizing the penalized log-likelihood function lp defined by:
where, h”k(u) is the second derivative of hk(u) to u and is the log-likelihood function of the data and li is the log-likelihood function of observation yi from a Box-Cox power exponential distribution, BCPE (μi, σi, νi, τi), obtained from (4).
The penalized log-likelihood function (9) is maximized iteratively using either the RS or CG algorithm of Rigby and Stasinopoulos11, which in turn uses a back-fitting algorithm to perform each step of the Fisher scoring procedure, requiring the log-likelihood of the data, Id and first and expected second derivatives to μ, σ, ν and τ.
A general criterion for model selection is the generalized Akaike Information Criterion (GAIC), obtained by adding to the fitted deviance a fixed penalty # for each effective degree of freedom used in a model, i.e. GAIC (#) = where df denotes the total effective degrees of freedom used in the model, denotes the fitted deviance and denotes the fitted log-likelihood.
The total effective degrees of freedom df combines the effective degrees of freedom used in the smooth functions h1 (x) to h4 (x) in (8), for modeling μ, σ, ν and τ, denoted by dfμ, dfσ, dfν and dfτ, respectively. Each effective degrees of freedom (e.g. dfμ) is defined by the trace of the corresponding smoothing matrix in the fitting algorithm, which is in turn directly related to the corresponding smoothing parameter (e.g. λ1), see Rigby and Stasinopoulos11 for more details. The model with the smallest value of the criterion GAIC (#) is then selected. The Akaike information criterion (AIC) and the Schwartz Bayesian criterion (SBC) are special cases of the GAIC (#) criterion corresponding to # = 2 and # = log (n), respectively. Let BCPE (dfμ, dfσ, dfν, dfτ, λ) represent the BCPE model (8), where the first four values inside the brackets denote the total effective degrees of freedom used in the smooth non-parametric functions h1 (x) to h4 (x) for modeling μ, σ, ν and τ, respectively and the fifth value λ denotes the power transform parameter in the transform x = x varλ, where xvar is the explanatory variate recorded in the data set.
For cumulated dry matter data set the three-step procedure for selecting a model of form (8) proposed by Rigby and Stasinopoulos11 is used:
|| Choose link functions gk (.) For k = 1, 2, 3, 4 in (4)
|| Choose an initial parameter value λ0 for λ in the transformation x = x varλ, assuming x var>0 for all observations. The initial value λ0 is chosen to provide an approximate constant absolute gradient between y and x. (A shifted (or offset) power transformation of x var, i.e., x = (x var+δ)λ could be considered if x var takes negative values)
|| Choose a single penalty # for each effective degree of freedom used in the models for μ, σ, ν and τ
|| From model BCPE (1, 1, 1, 1; λ0) (where the first four values dfμ = dfσ = dfν = dfτ = 1 indicate fitting a constant for parameters ), apply forward selection of to give the chosen value which minimizes criterion GAIC (#)
||In model BCPE (dfμ1, 1, 1, 1, λ), estimate λ, (e.g. using a grid search over values of λ) to give the value λ1 which minimizes GAIC (#)
||From model BCPE (1, 1, 1, 1; λ1), apply forward selection of dfσ to give chosen value dfσ1 which minimizes GAIC (#)
||From model BCPE (dfμ1, dfσ1, 1, 1, λ1) create a two-way table of values of GAIC (#) for combinations (dfν, dfτ ) and search for the combination (dfν1, dfτ1) which minimizes GAIC (#)
Fine-tuning of model BCPE (dfμ1, dfσ1, dfν1, dfτ1 λ1):
|| Fine-tuning of dfσ, by changing dfσ1, in steps of 1, if the value of GAIC (#) decreases
||Similar _ne tuning of dfμ
||Fine-tuning of λ by re-estimating λ as in step 2 (b)
The R code to use the algorithm described above is shown in the Appendix.
RESULTS AND DISCUSSION
Dry matter of Brachiaria brizantha: In Fig. 2a, the distribution of the dependent variable (accumulated dry matter) of B. brizantha showed asymmetry and kurtosis. Figure 2b the box plots show an asymmetric distribution of the accumulated dry matter data for intervals between cuts of 35 and 42 days. Figure 2c the densities associated with the accumulated dry matter data of B. brizantha show leptokurtic distributions for the 4 intervals between cuts. The data of Table 1 shows how the accumulated dry matter of B. brizantha increases as the interval between cuts increases. The data of
|Appendix: ||R code to fit a GAMLSS model on dry matter data in Brachiaria brizantha
||Dry matter distribution of Brachiaria brizantha
||(a) Histogram, (b) Box-plot and (c) Density at four intervals between cutting
Box-Cox Power Exponential Distribution (BCPE) fitted on Brachiaria brizantha dry matter
data at four intervals between cuttings
Yield (kg) of dry matter
of Brachiaria brizantha
at four intervals between cuttings
Assumptions on dry matter
data from Brachiaria brizantha
at four intervals between cutting
GAMLSS model adjusted on dry matter
data of Brachiaria brizantha
at four intervals between cutting
Table 2 shows the non-normality or asymmetry (p<0.05) of the accumulated dry matter distribution and serial autocorrelation of errors (p<0.05). From Fig. 3 and Table 3, the chosen BCPE model provides a substantially better fit than the previous models both in terms of the GAIC (865).
Box-Cox power exponential distribution (BCPE) fitted on Brachiaria brizantha dry matter
data at four intervals between cutting
Fitted values for the parameters (a) μ,(b) σ and (c) ν in the final chosen BCPE model on dry matter
data in Brachiaria brizantha
The fitted models for the μ, σ and in the chosen BCPE model have plotted in Fig. 4a-c, respectively. The estimated parameters for Box-Cox power exponential distribution (BCPE) fitted on Brachiaria brizantha dry matter data at four intervals between cuttings are shown in Table 4. The parameters μ, σ, ν and τ of the BCPE distribution may be interpreted as relating to the location (median), scale (coefficient of variation), skewness (power transformation to the symmetry) and kurtosis, respectively9. The fitted value for τ in Equation (13) indicates leptokurtosis as τ>3. From Table 5 for the different intervals between cutting, the distribution of dry matter from B. brizantha is asymmetric as vi>0. From Eq. 10 and Fig. 4a, it is observed that the longer the interval between cuts, the average yield of dry matter increases. Similar results are reported by Hare et al.13 in a study on the effect of cutting interval on yield and quality of three brachiaria hybrids in Thailand. From Fig. 4b it is observed that the longer the interval between cuts, the variability in the dry matter yield decreases and from Fig. 4c it is observed that for intervals between the cutting of 21-28 days the asymmetry of the distribution of dry matter yield decreases, while for intervals between the cutting of 35-42 days it increases.
This indicates that cutting stimulates regrowth of the plant and consequently its dry matter production since a significant increase in dry matter is observed when going from the first to the third cut. There is enough information in the literature to explain the effect of the cutting stimulus13 or grazing1,14 in the dry matter production of pastures. These variations in the average dry matter production between cuts of the genotypes as a whole were the product of the environmental effect (heterogeneous distribution of rainfall within the same dry season), rather than by effects of the genotype itself (p>0.05). In this regard, some studies show that environmental effects are often more important than the effects of the genotype on the productive traits of forage plants15,16. In this way, the modeling of the accumulated dry matter using the BCPE distribution allows to show the effect of the interval between cut on the accumulated dry matter and to make decisions in the management of this species that allow better yields.
|Table 5:||Estimated parameters from the GAMLSS model on the accumulated dry matter from B. brizantha at four intervals between cuttings
In the parameterization of the Box-Cox Power Exponential Distribution (BCPE) fitted on Brachiaria brizantha dry matter data at four intervals between cutting with parameters used in the GAMLSS methodology, its expectation is given by: E[Y] = μ and its variance by Var[Y] = σ2μ2:
GAMLSS allowed flexible modeling of both the distribution of the dry matter yield of B. brizantha and the dependence of all the parameters of the distribution on intervals between cutting. The BCPE distribution provides a flexible model for the dry matter yield of B. brizantha, exhibiting both skewness and leptokurtosis. For the dry matter yield from Brachiaria brizantha, exhibiting skewness and platykurtosis, the BCPE distribution, provided the best fit. Finally, the results of this research suggested that the interval between cutting has an effect that is reflected in the average yield of dry matter of B. brizantha, that is, the yield of dry matter increases as the interval between cutting increases. . Similarly, the interval between cuts affects the skewness and the kurtosis of the distribution.
This study uncovers the potential uses of GAMLSS models in the dynamics of dry matter production that may be beneficial to pasture and forage researchers. This study uncovers the potential uses of GAMLSS models in the dynamics of dry matter production that may be beneficial to pasture and forage researchers. This study will help the pasture and forage researcher to uncover critical areas of accumulated dry matter production of Brachiaria brizantha that many researchers were unable to explore. Therefore, a new theory can be reached on modeling the dynamics of accumulated dry matter production of B. brizantha.