Journal of Applied Sciences

Year: 2007 | Volume: 7 | Issue: 13 | Page No.: 1762-1767
DOI: 10.3923/jas.2007.1762.1767
The Application of Overdispersion and Generalized Estimating Equations in Repeated Categorical Data Related to the Sexual Behaviour Traits of Farm Animals
Abdullah Yesilova and Ayhan Yilmaz

Abstract: In this study, Poisson regression, negative binomial regression and generalized estimating equations (GEE) were applied to repeated count measurements of the sexual behaviors of ram lambs. Negative binomial regression was more effective at handling the overdispersion that biases parameter estimates in Poisson regression. Generalized estimating equations were used to analyze the repeated categorical data, and GEE estimates were obtained using an exchangeable working correlation. The GEE analyses showed that flehmen lip curl response, tail raising, mount duration, vocalization and weight of the ram lamb were statistically significant (p<0.05) for mount frequency, whereas anogenital sniffing was not significant.

How to cite this article
Abdullah Yesilova and Ayhan Yilmaz, 2007. The Application of Overdispersion and Generalized Estimating Equations in Repeated Categorical Data Related to the Sexual Behaviour Traits of Farm Animals. Journal of Applied Sciences, 7: 1762-1767.

Keywords: Poisson regression, negative binomial regression, overdispersion, generalized estimating equations

INTRODUCTION

When researchers gather data from an individual unit on multiple occasions (i.e., repeated measurements), such data are routinely analyzed with classical univariate methods (e.g., linear regression, repeated measures ANOVA). These procedures assume normality, independence of observations and equality of variance (or sphericity). Data obtained in many experimental settings (e.g., longitudinal studies) rarely satisfy these assumptions (Frome et al., 1973; Agresti, 1997; Gardner et al., 1995). The response variables used in this study were collected within subjects across time. Poisson or negative binomial regression within the framework of Generalized Linear Models (GLM) is widely accepted as an appropriate means of analysis when the dependent variable consists of count data (Gardner et al., 1995; Cameron and Trivedi, 1998).

The most significant property of Poisson Regression (PR) is the equality of the mean and the variance. In practice, however, this equality does not always hold. When the variance observed in the data is higher or lower than the mean implied by the model, the data are over- or underdispersed (Cox, 1983; Breslow, 1990; Stokes et al., 2000). In such cases ordinary PR produces biased estimates for the modeled response variable. Instead, quasi-likelihood methods (McCullagh and Nelder, 1989; Breslow, 1990), random effect models and mixture models are advised (Lawless, 1987; Wang et al., 1996, 1998; Dalrymple et al., 2003).

When analyzing count data, Negative Binomial Regression (NBR), one of the random effects models (Wang et al., 1996), should be specified when the dispersion is high. NBR uses the log link function, which links the linear structure of the explanatory variables to the expected value of the dependent variable, to model the data (Frome, 1983; McCullagh and Nelder, 1989; Dobson, 1990; Agresti, 1997; Cameron and Trivedi, 1998). The negative binomial model allows for extra-Poisson variation due to other variables not included in the model.

The Generalized Estimating Equations (GEE) approach was developed by Liang and Zeger (1986) to produce more efficient and unbiased regression estimates for longitudinal or repeated measures designs with non-normal response variables, including count data. GEE provides a semi-parametric approach to observations obtained repeatedly on individuals in longitudinal data (Stokes et al., 2000). The GEE for estimating the parameters is an extension of the independence estimating equation to correlated data. In contrast to full likelihood methods, the method of estimating equations requires only assumptions about the relevant moments, such as means, variances and correlations. GEEs permit specification of distributions from the exponential family, which includes the normal, inverse Gaussian, binomial, Poisson, negative binomial and gamma distributions. As in GLMs, the variance needs to be expressed as a function of the mean; this is then incorporated in the calculation of the covariance matrix by multiplying the components against an N×N matrix (W) with values Wi (i = 1,2,…,N) on the diagonal that are determined by the variance function (Davis, 2002; Stokes et al., 2000). GEE assumes that the relationship between the mean and the variance of the responses is known, but the remaining characteristics of the distribution need not be specified. Experimental units are assumed to be independent, but observations over time from a given unit are allowed to be correlated. This correlation is considered a nuisance to be adjusted for rather than a quantity of interest in itself (Okut et al., 1999).

MATERIALS AND METHODS

Materials: The animal material of this study consisted of Norduz ram lambs reared at the Investigation and Practicing Farm of the Agriculture Faculty of Yüzüncü Yil University in Van province, Turkey, in 2003. Observation of the sexual behaviors of the ram lambs began when they were six months old. Sexual behavior tests were conducted once a fortnight until the lambs were 12 months old, and a single test was carried out between 12 and 13 months. To evaluate sexual behavior, each lamb was tested individually for 15 min in a 5.40×5.00 m test area with 1-3 stimulus sheep. During the mating season, the stimulus sheep used in the test were identified in the evening by means of teaser rams; on the same evening they were taken to the test area and allowed to become accustomed to it. Outside the mating season, intravaginal sponges were applied to 6-9 sheep for 12-14 days. The sponges were removed in the early morning, after which 500 IU PMSG was injected intramuscularly into each sheep. Between 24 and 72 h after the PMSG injection, estrus detection was carried out in the morning and evening hours; sheep showing estrus symptoms were separated from the others and the sexual behaviors of the male lambs were tested immediately. To remove the effect of test time, the ram lambs were tested in random order. The same test area was used throughout. The sexual behaviors of the male lambs were determined as described by Price (1993).

Mount duration: The time that elapsed until the ram lamb identified the oestrous sheep and mounted it.

Mount frequency: The number of mounts performed by the ram lamb after identifying the oestrous sheep. Mounting attempts and completed mounts were counted together (Katz et al., 1988).

Flehmen lip curl: The upward curling of the upper lip performed by the ram lamb after sniffing the urine and genital zone of the sheep. In addition, anogenital sniffing, tail raising and the frequency of vocalization were recorded for the male lambs.

Methods:

Poisson regression: In PR, the dependent variable $y_i$ ($i = 1,\dots,n$), which is the number of occurrences of the event of interest, is assumed to follow a Poisson distribution given the independent variables $x_i$. In this case the logarithm of $\mu_i$, the mean of the Poisson distribution, is assumed to be a linear function of the independent variables (SAS, 2005). In GLM notation, the function is given as,

$$\log(\mu_i) = x_i'\beta$$

or

$$\mu_i = \exp(x_i'\beta) \qquad (1)$$
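As an illustration of the log-linear mean structure, the following minimal Python sketch fits log(μ_i) = b0 + b1·x_i by Newton-Raphson iterations on the Poisson likelihood. It is not the SAS procedure used in this study; the function name `fit_poisson`, the single covariate and the example counts are hypothetical and serve only to show the mechanics.

```python
import math

def fit_poisson(x, y, iters=50, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson (hypothetical helper)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector X'(y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information X' diag(mu) X (2 x 2, symmetric)
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical counts, for illustration only
x = [0, 1, 2, 3, 4]
y = [1, 3, 2, 6, 9]
b0, b1 = fit_poisson(x, y)
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
```

The exponentiated slope, exp(b1), can be read as the multiplicative change in the expected count for a one-unit increase in the covariate.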

By introducing a dispersion parameter that arises from the inequality of the mean and the variance, it is possible to account for the dispersion occurring in PR. When the dispersion parameter $\phi$ is taken into account, the relation between the mean and the variance is described as,

$$\mathrm{Var}(Y_i) = \phi\mu_i$$

When $\phi > 1$, the data are considered overdispersed (SAS, 2005).
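One common way to estimate the dispersion parameter is to divide the Pearson chi-square statistic by the residual degrees of freedom. The short Python sketch below assumes that fitted Poisson means are already available; the helper name `pearson_dispersion` and the example values are hypothetical.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Assumes the Poisson variance function V(mu) = mu.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Hypothetical counts with an intercept-only fit (fitted mean = sample mean = 4)
y = [0, 1, 9, 2, 8]
mu = [4.0] * len(y)
print(pearson_dispersion(y, mu, n_params=1))  # values well above 1 suggest overdispersion
```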

Negative binomial regression: In GLM, Poisson and negative binomial regression models share similar procedures for estimating the model parameters. The expected value of $y_i$ for given $x_i$, denoted by $\mu_i$, is

$$\mu(x_i) = g(x_i;\beta)$$

where $g(x;\beta)$ is a positive-valued function of $x$ and $\beta$ is the vector of regression parameters.

In the log-linear form, it is described as,

$$g(x_i;\beta) = \exp(x_i'\beta)$$

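The NBR model is given as,

$$P(Y_i = y_i \mid x_i) = \frac{\Gamma(y_i + a^{-1})}{\Gamma(y_i + 1)\,\Gamma(a^{-1})} \left(\frac{a\mu(x_i)}{1 + a\mu(x_i)}\right)^{y_i} \left(\frac{1}{1 + a\mu(x_i)}\right)^{a^{-1}}, \quad y_i = 0, 1, 2, \dots \qquad (2)$$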

In Eq. 2, $a$ is the dispersion parameter of the model. In the NBR model, the mean and the variance are given as,

$$E(Y_i) = \mu(x_i)$$

and,

$$\mathrm{Var}(Y_i) = \mu(x_i) + a\mu(x_i)^2$$

(Lawless, 1987; Yeşilova, 2003). The distribution of the response variable in the NBR model is

$$Y_i \sim \mathrm{NB}(\mu(x_i), a)$$

If $a = 0$, the model reduces to the Poisson model.

When $\phi$ is considered as the dispersion parameter, the variance of NBR can be rewritten as,

$$\mathrm{Var}(Y_i) = \phi(\mu + a\mu^2)$$

The negative binomial distribution contains a parameter $a$, called the negative binomial dispersion parameter. This is not the same as the GLM dispersion $\phi$, but it is an additional distribution parameter that must be estimated or set to a fixed value (SAS, 2005).
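The mean-variance relation of the negative binomial model can be checked numerically. The Python sketch below assumes the usual parameterization with mean μ and variance μ + aμ²; the function names `nb_pmf` and `nb_moments` are hypothetical, and the sums are truncated at a large upper limit rather than evaluated exactly.

```python
import math

def nb_pmf(y, mu, a):
    """Negative binomial probability of count y, mean mu, dispersion a > 0."""
    r = 1.0 / a
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + y * math.log(a * mu / (1.0 + a * mu))
             - r * math.log(1.0 + a * mu))
    return math.exp(log_p)

def nb_moments(mu, a, upper=2000):
    """Mean and variance obtained by summing the pmf over 0..upper-1."""
    probs = [nb_pmf(y, mu, a) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

mean, var = nb_moments(mu=3.0, a=0.5)
print(mean, var)  # compare with mu = 3 and mu + a * mu**2 = 7.5
```

As a approaches zero the probabilities approach those of a Poisson distribution with the same mean, which is the sense in which the model reduces to the Poisson model.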

Generalized estimating equations: The GEE approach of Liang and Zeger (1986) facilitates the analysis of data collected in longitudinal, nested or repeated measures designs. GEEs use the GLM framework to obtain more efficient and unbiased regression parameter estimates than ordinary least squares regression, in part because they permit specification of a working correlation matrix that accounts for the form of within-subject correlation of responses for dependent variables of many different distributions, including normal, binomial and Poisson (Davis, 2002; Stokes et al., 2000). Correlated data are modeled using the same link function and linear predictor setup (systematic component) as in the independence case. The random component is described by the same variance functions as in the independence case, but the covariance structure of the correlated measurements must also be modeled (Littell et al., 1996; SAS, 2005).

GEE uses quasi-likelihood methods for estimating parameters. To define a quasi-likelihood function, only the first two moments of the dependent variable need to be specified. The motivation for using quasi-likelihood methodology is clear: if the distributional assumption is violated, the parameter estimates or their variances obtained by maximum likelihood may be biased or may not have an asymptotic normal distribution (Okut et al., 1999). The quasi-likelihood estimate is no longer a likelihood estimate, since the quasi-likelihood may not be a likelihood function at all. Nevertheless, the quasi-likelihood estimate shares the main asymptotic properties of a likelihood estimate. In the quasi-likelihood regression method, both the marginal mean and the marginal variance are specified, although the marginal variances or their parameters are often not of interest in most regression problems (Breslow, 1990).

The GEE for the parameter vector $\beta$ is an extension of the independence estimating equation to correlated data (Littell et al., 1996; SAS, 2005). To estimate $\beta$, the GEE, similar to the GLM, can be written as a sum over the $K$ subjects,

$$\sum_{i=1}^{K} \frac{\partial \mu_i'}{\partial \beta}\, V_i^{-1}\,\bigl(Y_i - \mu_i(\beta)\bigr) = 0 \qquad (3)$$

In Eq. 3, $\mu_i = (\mu_{i1},\dots,\mu_{it_i})'$ is the vector of means, $Y_i = (Y_{i1},\dots,Y_{it_i})'$ is the observation vector and $V_i$ is an estimate of the covariance matrix of $Y_i$. These equations are similar to the GLM equations. Typically, the covariance $V_i$ is written as,

$$V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2} \qquad (4)$$

In Eq. 4, $\phi$ is the scale parameter, $A_i = \mathrm{diag}[V(\mu_{ij})]$ is a $t_i \times t_i$ diagonal matrix and $R_i(\alpha)$ is the $t_i \times t_i$ working correlation matrix, which is specified by the parameter vector $\alpha$ and estimated using the current value of the parameter vector $\beta$ to compute appropriate functions of the Pearson residual

$$r_{ij} = \frac{y_{ij} - \mu_{ij}}{\sqrt{V(\mu_{ij})}}$$

(Stokes et al., 2000; Davis, 2002).
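To make the structure of the working covariance concrete, the Python sketch below assembles it for one subject from the scale parameter, the variance function and a working correlation matrix. It assumes the Poisson variance function V(μ) = μ; the function name `working_covariance` and the numbers are hypothetical.

```python
import math

def working_covariance(mu, R, phi=1.0):
    """Return phi * A^(1/2) R A^(1/2), where A = diag(V(mu_j)) and V(mu) = mu."""
    s = [math.sqrt(m) for m in mu]  # square roots of the diagonal of A
    t = len(mu)
    return [[phi * s[j] * R[j][k] * s[k] for k in range(t)] for j in range(t)]

# Hypothetical subject with two repeated measurements
mu = [4.0, 9.0]
R = [[1.0, 0.5],
     [0.5, 1.0]]
for row in working_covariance(mu, R, phi=2.0):
    print(row)
```

The diagonal entries equal φ·V(μ_j), so the working correlation only changes the off-diagonal covariances.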

For each $y_i = (y_{i1},\dots,y_{it_i})$, a working correlation matrix $R_i(\alpha)$ for the repeated measurements on that individual is calculated. When $t_i = 1$, GEE reduces to GLM. The following steps are required to obtain the GEE estimates (Stokes et al., 2000).

Model the marginal mean with a GLM by specifying an appropriate link function, $g(E(y_{ij})) = x_{ij}'\beta$, where $g$ is the link function (logit, probit, identity, etc.) and $E(y_{ij}) = \mu_{ij}$ is the marginal mean (the marginal proportion for binomial data).
Specify the variance of $y_{ij}$, $\mathrm{Var}(y_{ij}) = V(\mu_{ij})\phi$. For binomial data, for example, $\mathrm{Var}(y) = pq$, so $V(\mu_{ij}) = p(1-p) = \mu_{ij}(1-\mu_{ij})$; $\phi$ is the scale parameter and $\phi = 1$ for binomial (logit) data.
Choose a working correlation, $R_i(\alpha)$.
Compute an initial estimate of $\beta$. This can be obtained by Ordinary Least Squares (OLS) assuming independence.
Compute the working correlation $R_i(\alpha)$ based on the standardized residuals. In the normal case, for example, the standardized residuals are $r_i = (y_i - \mu_i)/\sigma_i$. Available working correlation structures include fixed, m-dependent, exchangeable, unstructured and autoregressive AR(1).
Compute an estimate of the covariance matrix, $V_i$.
Estimate the regression parameters $\beta$ and $\mathrm{cov}(\beta_i, \beta_j)$ and update them with the iterative quasi-likelihood step

$$\beta_{t+1} = \beta_t + \left[\sum_i D_i' V_i^{-1} D_i\right]^{-1} \left[\sum_i D_i' V_i^{-1} (y_i - \mu_i)\right], \qquad D_i = \frac{\partial \mu_i}{\partial \beta}$$

where $V_i$ is the working covariance matrix for $y_i$ and the diagonal elements of $A_i$ are $V(\mu_{ij})$.
Update $\beta_{t+1}$ and iterate until convergence.
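The steps above can be sketched in a few lines of Python for a Poisson response with a log link, an intercept, one covariate and an exchangeable working correlation. This is a simplified illustration, not the SAS implementation used in this study: the function names `rinv_apply` and `gee_poisson`, the closed-form inverse of the exchangeable matrix, the moment update of α and its restriction to the interval [0, 0.95] are all assumptions made to keep the example short.

```python
import math

def rinv_apply(v, alpha):
    """Multiply vector v by the inverse of a t x t exchangeable correlation matrix."""
    t = len(v)
    c = alpha / (1.0 + (t - 1) * alpha)
    s = sum(v)
    return [(vj - c * s) / (1.0 - alpha) for vj in v]

def gee_poisson(subjects, iters=100, tol=1e-10):
    """subjects: list of (x, y) pairs, each holding one subject's repeated measures.

    Fits log(mu_ij) = b0 + b1 * x_ij and returns (b0, b1, alpha).
    """
    n_obs = sum(len(y) for _, y in subjects)
    n_pairs = sum(len(y) * (len(y) - 1) // 2 for _, y in subjects)
    b0 = math.log(sum(sum(y) for _, y in subjects) / n_obs)  # initial estimate
    b1, alpha = 0.0, 0.0                                     # start from independence
    for _ in range(iters):
        g = [0.0, 0.0]
        h = [[0.0, 0.0], [0.0, 0.0]]
        resid = []
        for x, y in subjects:
            mu = [math.exp(b0 + b1 * xj) for xj in x]
            e = [(yj - mj) / math.sqrt(mj) for yj, mj in zip(y, mu)]  # Pearson residuals
            resid.append(e)
            # Columns of A^(-1/2) D_i, with D_i the derivative of mu_i w.r.t. beta
            z = [[math.sqrt(mj) for mj in mu],
                 [math.sqrt(mj) * xj for mj, xj in zip(mu, x)]]
            re = rinv_apply(e, alpha)
            rz = [rinv_apply(col, alpha) for col in z]
            for p in range(2):
                g[p] += sum(a * b for a, b in zip(z[p], re))
                for q in range(2):
                    h[p][q] += sum(a * b for a, b in zip(z[p], rz[q]))
        # Solve the 2 x 2 system h * d = g (the scale parameter cancels in this step)
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        d0 = (h[1][1] * g[0] - h[0][1] * g[1]) / det
        d1 = (h[0][0] * g[1] - h[1][0] * g[0]) / det
        b0, b1 = b0 + d0, b1 + d1
        # Moment update of alpha from the residuals of this pass
        phi = sum(v * v for e in resid for v in e) / (n_obs - 2)
        cross = sum(e[j] * e[k] for e in resid
                    for j in range(len(e)) for k in range(j + 1, len(e)))
        alpha = cross / ((n_pairs - 2) * phi) if phi > 0 else 0.0
        alpha = min(max(alpha, 0.0), 0.95)  # simplification: keep R positive definite
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1, alpha

# Hypothetical data: three subjects, four occasions each
data = [([0, 1, 2, 3], [2, 3, 3, 5]),
        ([0, 1, 2, 3], [1, 1, 2, 4]),
        ([0, 1, 2, 3], [4, 5, 7, 9])]
print(gee_poisson(data))
```

The sketch returns only point estimates; standard errors (for example the robust sandwich estimator) would require additional code.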

The GEE parameter estimates are consistent as the number of clusters becomes large, even if the working correlation matrix is misspecified, as long as the mean model is correct. In this study an exchangeable correlation matrix was used. The exchangeable structure assumes that the correlation between any two measurements on the same subject is constant over time. The exchangeable correlation matrix is described as,

$$R(\alpha) = \begin{pmatrix} 1 & \alpha & \cdots & \alpha \\ \alpha & 1 & \cdots & \alpha \\ \vdots & \vdots & \ddots & \vdots \\ \alpha & \alpha & \cdots & 1 \end{pmatrix}$$

(Stokes et al., 2000; Davis, 2002).
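The exchangeable structure has a single parameter, which can be estimated from the cross-products of Pearson residuals within each subject. The Python sketch below builds the matrix and applies one commonly used moment estimator; whether this matches the exact formula used by the software in this study is an assumption, and the names `exchangeable` and `estimate_alpha` are hypothetical.

```python
def exchangeable(t, alpha):
    """t x t correlation matrix with 1 on the diagonal and alpha elsewhere."""
    return [[1.0 if j == k else alpha for k in range(t)] for j in range(t)]

def estimate_alpha(residuals, n_params=0):
    """Moment estimate of alpha from per-subject lists of Pearson residuals."""
    n_obs = sum(len(r) for r in residuals)
    phi = sum(e * e for r in residuals for e in r) / (n_obs - n_params)
    n_pairs = sum(len(r) * (len(r) - 1) // 2 for r in residuals)
    cross = sum(r[j] * r[k] for r in residuals
                for j in range(len(r)) for k in range(j + 1, len(r)))
    return cross / ((n_pairs - n_params) * phi)

# Hypothetical residuals for three subjects with three occasions each
res = [[0.8, 1.1, 0.6], [-0.9, -0.4, -1.2], [0.1, -0.2, 0.3]]
alpha = estimate_alpha(res)
print(round(alpha, 3))
for row in exchangeable(3, round(alpha, 3)):
    print(row)
```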

RESULTS

The descriptive statistics for the variables used in the model are presented in Table 1.

The criteria for assessing goodness of fit, displayed in Table 2, summarize the fit of the specified model. These statistics are used in judging the adequacy of a model and in comparing it with other models under consideration. The deviance value/DF for PR and NBR (4.5006 vs 1.0446) indicates that NBR fits the data reasonably well for the specified model, whereas PR is overdispersed. When the deviance value/DF approaches 1, there is no overdispersion.
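For reference, the deviance-based criterion can be computed as in the Python sketch below, which uses the standard Poisson deviance formula. The function names and the example values are hypothetical and are not the data of this study.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) taken as 0."""
    d = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d += 2.0 * (term - (yi - mi))
    return d

def deviance_per_df(y, mu, n_params):
    """Deviance divided by residual degrees of freedom (the value/DF criterion)."""
    return poisson_deviance(y, mu) / (len(y) - n_params)

# Hypothetical counts with an intercept-only fit (fitted mean = 4)
y = [0, 1, 9, 2, 8]
mu = [4.0] * len(y)
print(deviance_per_df(y, mu, n_params=1))  # values far above 1 point to overdispersion
```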

Table 3 displays the initial parameter estimates for the variables in the model. Entries in the chi-square column are likelihood ratio statistics for testing the significance of the effect added to the model containing all the preceding effects.

Table 1: Basic statistics for the data used

Table 2: Goodness of fit criteria for Poisson regression and negative binomial regression
Bold numbers are the value/DF of the deviance of the selected model.

Table 3: Initial parameter estimates
**p<0.01

Table 4: GEE parameter estimates
**p<0.01

The initial estimates showed that flehmen lip curl, weight of the ram, tail raising, vocalization and mount duration had significant effects on the number of mounts (p<0.01), whereas anogenital sniffing was not significant (p>0.01).

Fourteen separate tests were carried out on the 32 animals in the data set. GEE estimates were therefore obtained for the repeated measurements. The GEE parameter estimates are given in Table 4. Evaluation of the parameters showed that flehmen lip curl, weight of the ram, tail raising, vocalization and mount duration had significant effects on the number of mounts (p<0.01), whereas anogenital sniffing was not significant (p>0.01). Flehmen lip curl, weight of the ram and mount duration affected the number of mounts negatively, whereas vocalization and tail raising had a positive effect on the number of mounts.

The parameter estimates in Table 4 are in accordance with the initial parameter estimates. In addition, the exchangeable correlation matrix used as the working correlation was obtained as:

DISCUSSION

The results for flehmen lip curl and mount duration and their effects on the number of mounts are as expected for sexual behavior. In fact, when rams do not display mating behaviors they display courtship behaviors (Katz et al., 1988). This both increases the mount frequency and negatively affects the number of mating behaviors performed per unit time. Vocalization and tail raising, however, affected the number of mounts positively.

Negative Binomial Regression is widely used for investigating the overdispersion that occurs in Poisson regression (Wang et al., 1998; Dalrymple et al., 2003; Yeşilova, 2003). Analysis that ignores overdispersion leads to incorrect parameter estimates. This study shows that NBR is very effective in accounting for overdispersion. NBR is also quite effective in accounting for the latent heterogeneity of the population (Dalrymple et al., 2003; Yeşilova, 2003; Wang et al., 1996, 1998). Parameter estimates were also obtained using GEEs. Repeated measurements are widely used, especially in studies of cattle species. Some reproduction characteristics of domesticated animals are recorded categorically, and such data do not satisfy the normal distribution assumption (Tempelman and Gianola, 1993, 1996; Tempelman, 1998). GEE is widely used in the analysis of repeated data in which the variable of interest is categorical. GEE uses a working assumption about the form of the correlations. When the correlations are moderate, GEE gives similar results for all working correlations (Davis, 2002; Stokes et al., 2000). In this study the exchangeable structure was used as the working correlation. Analyses were also carried out with the other working correlations. In particular, with the unstructured correlation the GEE algorithm did not converge for mounting frequency, vocalization and tail raising.

REFERENCES

  • Agresti, A., 1997. Categorical Data Analysis. John Wiley and Sons, Inc., UK


  • Breslow, N., 1990. Tests of hypotheses in overdispersed poisson regression and other quasi-likelihood models. J. Am. Stat. Assoc., 85: 565-571.


  • Cameron, A.C. and P.K. Trivedi, 1998. Regression Analysis of Count Data. 2nd Edn., Cambridge University Press, New York, ISBN-13: 9780521635677


  • Cox, D.R., 1983. Some remarks on overdispersion. Biometrika, 70: 269-274.


  • Dalrymple, M.L., I.L. Hudson and R.P.K. Ford, 2003. Finite mixture, zero-inflated poisson and hurdle models with application to SIDS. Comput. Stat. Data Anal., 41: 491-504.


  • Davis, C.S., 2002. Statistical Methods for the Analysis of Repeated Measurements. 1st Edn. Springer-Verlag, Heidelberg, ISBN: 0-387-95370-1, pp: 125-167


  • Dobson, A.J., 1990. An Introduction to Generalized Linear Models. Chapman and Hall, New York, pp: 174


  • Frome, E.L., 1983. The analysis of rates using poisson regression models. Biometrics, 39: 665-674.


  • Gardner, W., E.P. Mulvey and E.C. Shaw, 1995. Regression analyses of counts and rates: Poisson overdispersed poisson and negative binomial models. Psychol. Bull., 118: 392-404.


  • Katz, L.S., E.O. Price, S.J.R. Wallach and J.J. Zenchak, 1988. Sexual performance of rams reared with or without females after weaning. J. Anim. Sci., 66: 1166-1173.


  • Lawless, J.F., 1987. Negative binomial and mixed poisson regression. Can. J. Stat., 15: 209-225.


  • Liang, K.Y. and S.L. Zeger, 1986. Longitudinal data analysis using generalized linear models. Biometrika, 73: 13-22.


  • Littell, R.C., G.A. Milliken, W.W. Stroup and R.D. Wolfinger, 1996. SAS System for Mixed Models. SAS Institute Inc., Cary, North Carolina, USA., pp: 633


  • McCullagh, P. and J.A. Nelder, 1989. Generalized Linear Models. 2nd Edn., Chapman and Hall, London, UK., Pages: 536


  • Okut, H., S.A. Gokdere and A. Yesilova, 1999. Application generalized linear mixed models. Proceedings of the 3rd National Conference of the Italian Biometric Society, Roma, 1999.


  • Price, E.O., 1993. Practical considerations in the measurement of sexual behavior. Meth. Neurosci., 14: 16-31.


  • SAS: SAS/STAT Software, 2005. Hangen and Enhanced. SAS, Institute Inc., USA


  • Stokes, M.E., C.S. Davis and G.G. Koch, 2000. Categorical Data Analysis Using the SAS System. John Wiley and Sons, Inc., USA., pp: 626


  • Tempelman, R.J. and D. Gianola, 1993. Genetic analysis of fertility in dairy cattle using negative binomial mixed models. J. Dairy Sci., 72: 1557-1568.


  • Tempelman, R.J. and D. Gianola, 1996. A mixed effects model for overdispersed count data in animal breeding. Biometrics, 52: 265-279.


  • Tempelman, R.J., 1998. Generalized linear mixed models in dairy cattle breeding. J. Dairy Sci., 81: 1428-1444.


  • Wang, P., M.L. Puterman, I.M. Cockburn and N. Le, 1996. Mixed poisson regression models with covariate dependent rates. Biometrics, 52: 381-400.


  • Wang, P., I.M. Cockburn and M.L. Puterman, 1998. Analysis of patent data: A mixed poisson regression model approach. J. Bus. Econ. Stat., 16: 27-41.


  • Yesilova, A., 2003. Using of poisson mixture regression models for categorical data in biology. Ph.D. Thesis, University of Yuzuncu Yil, Van, Turkey.

  • © Science Alert. All Rights Reserved