Abstract: In this study, simulation activities are suggested as a tool for understanding the properties of the estimators of the coefficients of a linear regression model when the variables are circular. These activities are suitable for undergraduate students who have learned simple linear regression theory and would like to extend the idea of regression to the case where the measurements are directions, as well as to the use of various approximations in parameter estimation.
INTRODUCTION
Simulation activities have been suggested as a complement to the teaching of statistical theory. The main objective is to show the probabilistic properties of certain estimators using the empirical results of simulations. Armero and Ferrandiz (2002) proposed a simulation activity to show empirically the probabilistic properties of the least squares estimators of the slope, intercept and residual variance of a simple linear regression model. In this research, we extend the activity to the circular regression model, that is, simple linear regression for directional data. The topic is chosen because of its similarity to simple linear regression theory and to motivate students to think about theory beyond their present knowledge. Appropriate programs in SPlus are provided as simulation tools.
CIRCULAR RANDOM VARIABLE
A circular random variable takes values on the circumference of a circle; that is, it is an angle in the range [0, 2π) radians or [0°, 360°). Regression involving a circular response variable is common in many areas of application, particularly in the biological, geological, astronomical, meteorological and economic sciences. Examples include the relationship between the direction an animal moves and the distance moved; the dependence of the strike of a fault plane on displacement; and the dependence of wind direction on wind speed.
THE VON MISES DISTRIBUTION
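A circular random variable θ is said to have a Circular Normal (CN) or von Mises distribution, denoted by VM(μ, κ), if it has the density function:
$$ f(\theta; \mu, \kappa) = \frac{1}{2\pi I_0(\kappa)} \exp\{\kappa \cos(\theta - \mu)\}, \qquad 0 \le \theta < 2\pi \tag{1} $$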
where 0 ≤ μ < 2π and κ ≥ 0 are parameters. Here I0(κ) in the normalizing constant is the modified Bessel function of the first kind and order zero, given by:
$$ I_0(\kappa) = \frac{1}{2\pi} \int_0^{2\pi} \exp\{\kappa \cos\theta\} \, d\theta \tag{2} $$
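As a quick numerical check of the von Mises density and its normalizing constant, the following Python sketch evaluates the modified Bessel function of order zero by its power series and confirms that the density integrates to one over the circle. The function names `bessel_i0`, `vm_density` and `integrate_density` are illustrative choices, not part of the SPlus programs in the appendices.

```python
import math

def bessel_i0(kappa, terms=60):
    """Modified Bessel function of the first kind, order zero (power series)."""
    total = 0.0
    term = 1.0  # k-th term is (kappa/2)^(2k) / (k!)^2, starting at k = 0
    for k in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def vm_density(theta, mu, kappa):
    """von Mises density VM(mu, kappa) evaluated at angle theta (radians)."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def integrate_density(mu, kappa, steps=2000):
    """Midpoint-rule integral of the density over [0, 2*pi)."""
    h = 2.0 * math.pi / steps
    return h * sum(vm_density((j + 0.5) * h, mu, kappa) for j in range(steps))

print(bessel_i0(3.0))               # normalizing constant for kappa = 3
print(integrate_density(1.0, 3.0))  # should be very close to 1
```

With κ = 0 the series reduces to I0(0) = 1 and the density becomes the uniform density 1/(2π) on the circle.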
This distribution is also known as the Circular Normal distribution, to emphasize its importance and its similarity to the Normal distribution on the real line. It was first introduced by von Mises and is discussed by Mardia (1972), who gives a thorough account of the distribution and some of its properties. An SPlus program for generating a sample from this distribution is given in Appendix A.
CIRCULAR REGRESSION MODEL
Let X and Y be a circular explanatory variable and a circular response variable, respectively. The circular variable X is usually assumed to be controllable by the experimenter, so the experiment is designed by choosing the values of X and observing the corresponding values of Y. Suppose the true relationship between Y and X is linear and that the observation Y at each fixed value x of X is a circular random variable. For each observation, the model is given by
$$ y_i = \alpha + \beta x_i + \varepsilon_i \pmod{2\pi}, \qquad i = 1, \dots, n \tag{3} $$
where ε is a circular random error having a von Mises distribution with circular mean 0 and concentration parameter κ, i.e., ε ~ VM(0, κ). The errors are also assumed to be uncorrelated with each other. For practical purposes we consider β ≈ 1, as is the case, for example, when wind direction is measured by two different techniques. Given ε and fixed values of x, the values of y can be generated using (3). The SPlus program CirDat given in Appendix B can be used to generate y for fixed values of x.
MAXIMUM LIKELIHOOD ESTIMATES OF α, β AND κ
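The maximum likelihood estimators of α and β, denoted by $\hat{\alpha}$ and $\hat{\beta}$, are given by
$$ \hat{\alpha} = \begin{cases} \tan^{-1}(S/C), & S > 0,\ C > 0 \\ \tan^{-1}(S/C) + \pi, & C < 0 \\ \tan^{-1}(S/C) + 2\pi, & S < 0,\ C > 0 \end{cases} \tag{4} $$
where
$$ S = \sum_{i=1}^{n} \sin(y_i - \hat{\beta} x_i), \qquad C = \sum_{i=1}^{n} \cos(y_i - \hat{\beta} x_i) \tag{5} $$
and $\hat{\beta}$ is obtained iteratively: given a current estimate $\hat{\beta}_0$, the updated estimate is
$$ \hat{\beta}_1 \approx \hat{\beta}_0 + \frac{\sum_{i=1}^{n} x_i \sin(y_i - \hat{\alpha} - \hat{\beta}_0 x_i)}{\sum_{i=1}^{n} x_i^2 \cos(y_i - \hat{\alpha} - \hat{\beta}_0 x_i)} \tag{6} $$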
Since x and y are both measurements of the same quantity (for example, wind direction), unity is a logical initial estimate of β, so a possible starting value for the iteration is β0 = 1.0. We can then update α and β in turn and proceed iteratively until the convergence criterion is satisfied.
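The iteration can be sketched in Python as follows. This is an illustration under stated assumptions, not the CirReg program of Appendix C: it assumes that α is updated as the direction of the summed sines and cosines of y_i − βx_i (via `atan2`, which handles the quadrant), and that β is updated by one Newton step on the score equation Σ x_i sin(y_i − α − βx_i) = 0. The name `fit_alpha_beta` and the tolerance values are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

def fit_alpha_beta(x, y, beta0=1.0, tol=1e-10, max_iter=200):
    """Alternate between updating alpha and beta until beta stops changing."""
    beta = beta0
    for _ in range(max_iter):
        # Update alpha for the current beta: direction of the resultant vector.
        s = sum(math.sin(yi - beta * xi) for xi, yi in zip(x, y))
        c = sum(math.cos(yi - beta * xi) for xi, yi in zip(x, y))
        alpha = math.atan2(s, c)
        # Newton step for beta with alpha held fixed.
        num = sum(xi * math.sin(yi - alpha - beta * xi) for xi, yi in zip(x, y))
        den = sum(xi * xi * math.cos(yi - alpha - beta * xi) for xi, yi in zip(x, y))
        if den <= 0.0:
            raise ValueError("Newton step undefined; try another starting value")
        step = num / den
        beta += step
        if abs(step) < tol:  # convergence criterion on the change in beta
            break
    # Recompute alpha at the final beta and report it in [0, 2*pi).
    s = sum(math.sin(yi - beta * xi) for xi, yi in zip(x, y))
    c = sum(math.cos(yi - beta * xi) for xi, yi in zip(x, y))
    return math.atan2(s, c) % TWO_PI, beta

# Noise-free check: x_i = 12i degrees, y_i = 4 + 1.02 x_i (mod 2*pi).
x = [math.radians(12.0 * i) for i in range(1, 31)]
y = [(4.0 + 1.02 * xi) % TWO_PI for xi in x]
print(fit_alpha_beta(x, y))
```

Because α and β are updated in turn rather than jointly, convergence is linear, so several dozen passes may be needed even for well-behaved data.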
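Using the final maximum likelihood estimates of α and β obtained above, the maximum likelihood estimate of κ is given by
$$ \hat{\kappa} = A^{-1}\!\left( \frac{1}{n} \sum_{i=1}^{n} \cos(y_i - \hat{\alpha} - \hat{\beta} x_i) \right) \tag{7} $$
where A(κ) = I1(κ)/I0(κ) and I1(κ) is the modified Bessel function of the first kind and order one.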
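A simple and reasonably accurate approximation to A⁻¹(w) was given by Best and Fisher (1981), which is
$$ A^{-1}(w) \approx \begin{cases} 2w + w^3 + \tfrac{5}{6} w^5, & w < 0.53 \\ -0.4 + 1.39w + \dfrac{0.43}{1 - w}, & 0.53 \le w < 0.85 \\ \dfrac{1}{w^3 - 4w^2 + 3w}, & w \ge 0.85 \end{cases} $$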
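The piecewise approximation commonly attributed to Best and Fisher (1981) is easy to code. The Python sketch below assumes the usual three-branch form with breakpoints at w = 0.53 and w = 0.85, and assumes that κ is estimated by applying A⁻¹ to the mean cosine of the fitted residuals; the function names `a_inverse` and `kappa_hat` are illustrative.

```python
import math

def a_inverse(w):
    """Approximate inverse of A(kappa) = I1(kappa) / I0(kappa) for 0 <= w < 1."""
    if not 0.0 <= w < 1.0:
        raise ValueError("w must lie in [0, 1)")
    if w < 0.53:
        return 2.0 * w + w ** 3 + 5.0 * w ** 5 / 6.0
    if w < 0.85:
        return -0.4 + 1.39 * w + 0.43 / (1.0 - w)
    return 1.0 / (w ** 3 - 4.0 * w ** 2 + 3.0 * w)

def kappa_hat(x, y, alpha, beta):
    """Concentration estimate from the mean cosine of the residuals."""
    w = sum(math.cos(yi - alpha - beta * xi) for xi, yi in zip(x, y)) / len(x)
    return a_inverse(w)

# A(3) is roughly 0.81, so the approximation should return a value near 3.
print(a_inverse(0.81))
```

A mean cosine close to 1 corresponds to tightly concentrated errors and a large κ, while a mean cosine near 0 corresponds to errors that are nearly uniform on the circle.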
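Further, the asymptotic variances of $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\kappa}$ are
$$ \mathrm{Var}(\hat{\alpha}) = \frac{1}{\kappa A(\kappa)} \cdot \frac{\sum x_i^2}{n \sum x_i^2 - \left(\sum x_i\right)^2} \tag{8} $$
$$ \mathrm{Var}(\hat{\beta}) = \frac{1}{\kappa A(\kappa)} \cdot \frac{n}{n \sum x_i^2 - \left(\sum x_i\right)^2} \tag{9} $$
$$ \mathrm{Var}(\hat{\kappa}) = \frac{1}{n A'(\kappa)} \tag{10} $$
where A(κ) = I1(κ)/I0(κ) is the ratio of modified Bessel functions of orders one and zero, and A'(κ) = 1 − A(κ)/κ − A(κ)². In practice these are evaluated at the estimate $\hat{\kappa}$.
For large n, the estimators $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\kappa}$ are approximately normally distributed with these variances.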
OBJECTIVES OF ACTIVITIES
• To show that, unlike simple linear regression, the circular regression model has no closed-form maximum likelihood estimators, so the estimates of α, β and κ must be obtained iteratively.
• To show that, for large sample sizes, the estimators $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\kappa}$ are approximately normally distributed.
SIMULATION ACTIVITIES
The activities can be arranged in four steps as follows.
Step 1: Simulating the Circular Data
The simulation is based on the model given by (3).
Let the student choose values of α, β and κ, say α = 4, β = 1 and κ = 3, and a sample size of n = 30. We can then generate εi from VM(0, κ = 3) using the VM SPlus procedure given in Appendix A by typing VM(0, 3, 30) in the command window. Consequently, a data set {(xi, yi), i = 1, 2, ..., 30} is generated, where the xi are fixed by the instructor as xi = 12i° and yi | xi = 4 + xi + εi (mod 2π). For example, with seed number 100, the data set given in Table 1 is obtained using the CirDat program given in Appendix B, by typing CirDat(0, 3, 30, 4, 1) in the command window.
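For readers without SPlus, the data-generation step can be approximated in Python with the standard library, whose `random.vonmisesvariate(mu, kappa)` draws from a von Mises distribution. This is a sketch, not a port of CirDat: the function name `simulate_circular_data` is hypothetical, angles are assumed to be held in radians with α = 4 read as radians, and a Python seed of 100 will not reproduce the SPlus data of Table 1.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def simulate_circular_data(alpha, beta, kappa, n, rng, step_deg=12.0):
    """Return (x, y) in radians with y_i = alpha + beta*x_i + eps_i (mod 2*pi)."""
    # Fixed design points x_i = step_deg * i degrees, converted to radians.
    x = [math.radians(step_deg * i) for i in range(1, n + 1)]
    # Errors drawn from VM(0, kappa); values fall in [0, 2*pi).
    eps = [rng.vonmisesvariate(0.0, kappa) for _ in range(n)]
    y = [(alpha + beta * xi + ei) % TWO_PI for xi, ei in zip(x, eps)]
    return x, y

rng = random.Random(100)  # seeded for repeatability only
x, y = simulate_circular_data(alpha=4.0, beta=1.0, kappa=3.0, n=30, rng=rng)
for xi, yi in list(zip(x, y))[:5]:
    print(f"x = {math.degrees(xi):6.1f} deg   y = {math.degrees(yi):6.1f} deg")
```

Passing an explicit `random.Random` instance keeps each simulated data set reproducible without touching the global random state.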
Step 2: Finding the Estimates of α, β and κ
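Using the program CirReg given in Appendix C, the estimates $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\kappa}$ and their estimated variances Var($\hat{\alpha}$), Var($\hat{\beta}$) and Var($\hat{\kappa}$) are obtained.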
The estimates are close to the true values of the parameters of α = 4, β = 1 and κ = 3, respectively.
Step 3: Replicating for m Times
Students repeat the simulation and estimation of Steps 1 and 2 m times, say m = 40, giving a list of 40 values of each of $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\kappa}$ (Table 2).
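The replication step can be sketched end to end in Python. Everything here is an assumption for illustration rather than the paper's SPlus code: data are simulated with `random.vonmisesvariate`, α and β are fitted by alternating an `atan2` update with a Newton step, and κ is obtained from the Best and Fisher style approximation applied to the mean resultant length of the residuals. The names `simulate`, `fit` and `replicate` are hypothetical.

```python
import math
import random
import statistics

TWO_PI = 2.0 * math.pi

def simulate(alpha, beta, kappa, n, rng):
    """One data set with x_i = 12i degrees and VM(0, kappa) errors."""
    x = [math.radians(12.0 * i) for i in range(1, n + 1)]
    y = [(alpha + beta * xi + rng.vonmisesvariate(0.0, kappa)) % TWO_PI for xi in x]
    return x, y

def a_inverse(w):
    """Piecewise approximation to the inverse of A(kappa) = I1/I0."""
    w = min(w, 1.0 - 1e-12)  # guard against a perfect fit
    if w < 0.53:
        return 2.0 * w + w ** 3 + 5.0 * w ** 5 / 6.0
    if w < 0.85:
        return -0.4 + 1.39 * w + 0.43 / (1.0 - w)
    return 1.0 / (w ** 3 - 4.0 * w ** 2 + 3.0 * w)

def fit(x, y, beta0=1.0, tol=1e-8, max_iter=500):
    """Return (alpha_hat, beta_hat, kappa_hat) for one data set."""
    beta = beta0
    for _ in range(max_iter):
        s = sum(math.sin(yi - beta * xi) for xi, yi in zip(x, y))
        c = sum(math.cos(yi - beta * xi) for xi, yi in zip(x, y))
        alpha = math.atan2(s, c)
        num = sum(xi * math.sin(yi - alpha - beta * xi) for xi, yi in zip(x, y))
        den = sum(xi * xi * math.cos(yi - alpha - beta * xi) for xi, yi in zip(x, y))
        if den <= 0.0:
            break  # Newton step unreliable; keep the current beta
        step = num / den
        beta += step
        if abs(step) < tol:
            break
    s = sum(math.sin(yi - beta * xi) for xi, yi in zip(x, y))
    c = sum(math.cos(yi - beta * xi) for xi, yi in zip(x, y))
    # At the fitted alpha, the mean cosine of the residuals equals hypot(s, c) / n.
    return math.atan2(s, c) % TWO_PI, beta, a_inverse(math.hypot(s, c) / len(x))

def replicate(m, alpha, beta, kappa, n, rng):
    """Simulate and fit m independent data sets."""
    return [fit(*simulate(alpha, beta, kappa, n, rng)) for _ in range(m)]

estimates = replicate(40, 4.0, 1.0, 3.0, 30, random.Random(100))
for name, values in zip(("alpha", "beta", "kappa"), zip(*estimates)):
    # Plain mean and median are adequate here because alpha = 4 is far from 0 and 2*pi.
    print(name, round(statistics.mean(values), 3), round(statistics.median(values), 3))
```

The printed mean and median of each column play the role of the summary statistics that students compare with the true parameter values.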
Step 4: Analysing the Parameter Estimates
Students can now evaluate the parameter estimates obtained from Step 3. First, students look at the accuracy of the estimates, as summarised in Table 3. The mean and median of each estimator can be compared with the true parameter values.
Table 1: Single simulated data set
Table 2: Values of estimates for 40 replications
Table 3: Statistics of estimates
Fig. 1: Histogram of estimates
Table 4: Results by Kolmogorov-Smirnov method
Fig. 2: The quantile-quantile plots of the estimator
Next, we want to show that the estimators follow a normal distribution. This can be shown using graphical tools such as the normal quantile-quantile plot, or a hypothesis test such as the Kolmogorov-Smirnov method (Montgomery, 1992). Figure 2 gives the quantile-quantile plots of the estimators, and Table 4 gives the results of the Kolmogorov-Smirnov method.
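A Kolmogorov-Smirnov check of normality can be sketched in Python using only the standard library. The sketch below assumes the reference normal distribution takes its mean and standard deviation from the sample itself; in that case the usual large-sample 5% critical value of about 1.36/√n is conservative, so the comparison is indicative rather than exact. The function name `ks_statistic_normal` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, v in enumerate(data, start=1):
        f = fitted.cdf(v)
        # The empirical CDF jumps from (i - 1)/n to i/n at the i-th ordered value.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Illustration on 40 evenly spaced normal quantiles (not simulation output).
sample = [NormalDist().inv_cdf((i - 0.5) / 40) for i in range(1, 41)]
d = ks_statistic_normal(sample)
print(d, 1.36 / math.sqrt(len(sample)))  # statistic and approximate 5% threshold
```

Applied to the 40 replicated values of one estimator, a statistic below the threshold gives no evidence against normality.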
CONCLUSION
The simulation activities suggested here are an example of statistical exercises that can be used to encourage students to investigate aspects of statistical theory using simulation and approximation. These activities can stimulate students' interest in extending the theory of statistics to other research areas that have practical applications.
Appendix A
Appendix B
Appendix C
Appendix D