INTRODUCTION
The simple circular regression model is one of the models proposed to represent the relationship between two circular variables, and it is used in many scientific fields. For example, studies of bird migration record both the wind direction and the flight direction of the birds, and medical studies record circular variables in the analysis of vector cardiograms. It is often of interest to predict one variable given the other, which can be done with a simple circular regression model. However, outliers can strongly affect the statistical analysis and the final conclusions, and real-life samples from any field might include noise or outliers. An outlier is an observation that appears inconsistent (extreme) with the other observations in the data and that affects the results. Methods for detecting outliers often suffer from two problems: ‘masking’ and ‘swamping’. Masking is the failure of a procedure to detect true outliers and swamping is the identification of inliers as outliers^{1}. Because outliers can lead to misleading conclusions, researchers are interested in improving the ways of detecting them. Many methods have been proposed to identify outliers in the linear regression model. However, only a few methods have been developed for detecting outliers in a simple circular regression model.
Jammalamadaka and Sarma^{2} proposed a regression model in which both the response and the explanatory variables are circular. Downs and Mardia^{3} suggested a regression model in which both the response and the explanatory variables are circular with means β and α, respectively. Hussin et al.^{4} extended the previous models and suggested a simple circular regression model for the case in which both the response and the explanatory variables are circular. Kato et al.^{5} suggested another regression model for two circular variables in which the angular error follows a wrapped Cauchy distribution. Hussin et al.^{6} extended the COVRATIO statistic, which is used to detect outliers in the linear regression model, to the detection of outliers in the functional relationship model. Abuzaid et al.^{7} suggested using the COVRATIO statistic to detect outliers in the response variable in a simple circular regression model by using a row deletion approach. Rambli^{8} developed procedures for identifying outliers in circular regression models by using COVRATIO and the mean circular error statistic DMCEs that was proposed by Abuzaid^{9}. Abuzaid et al.^{10} also proposed the mean circular error statistic DMCEc to identify outliers in the response variable of a simple circular regression model by using a row deletion approach. Later, Abuzaid^{11} compared the performance of the COVRATIO statistic for a Simple Circular (SC) regression model and a Complex Linear (CL) regression model and found that it performs better for the SC model than for the CL model. Hussin et al.^{12} proposed a complex linear regression model to fit circular data and used the complex residuals to detect possible outliers.
In this study, a new approach is proposed to identify outliers in the response variable in a simple circular regression model.
MATERIALS AND METHODS
Simple circular regression model: Hussin et al.^{4} proposed a simple circular regression model for the case in which both the response variable y and the explanatory variable x are circular and there is a linear relationship between them; their model is given in Eq. 1:
$$y = \alpha + \beta x + \varepsilon \pmod{2\pi} \qquad (1)$$
where, α and β are the parameters and ε is the circular random error, which follows the von Mises distribution with circular mean μ and concentration parameter k. Note that the angles ϑ and ϑ+2π give the same point on the circle. The von Mises distribution is often regarded as the circular analogue of the normal distribution. Let ϑ_{1}, ϑ_{2}, …, ϑ_{n} follow the von Mises distribution with mean direction μ and concentration parameter k, denoted by vM(μ, k); then the probability density function of the von Mises distribution is given in Eq. 2^{13}:
$$f(\vartheta; \mu, k) = \frac{1}{2\pi I_{0}(k)} \exp\{k \cos(\vartheta - \mu)\}, \quad 0 \le \vartheta < 2\pi,\; 0 \le \mu < 2\pi,\; k > 0 \qquad (2)$$
where, I_{0} denotes the modified Bessel function of the first kind and order zero, which can be defined as:
$$I_{0}(k) = \frac{1}{2\pi} \int_{0}^{2\pi} \exp(k \cos\vartheta)\, d\vartheta$$
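As an illustration of the density described above, the following minimal sketch evaluates the von Mises density using only the Python standard library. It assumes the standard form f(ϑ; μ, k) = exp{k cos(ϑ − μ)} / (2π I_{0}(k)) and computes I_{0} from its power series; the function names `bessel_i0`, `von_mises_pdf` and `integrate_pdf` are hypothetical and are not taken from the cited works.

```python
import math

def bessel_i0(k, terms=60):
    """Modified Bessel function of the first kind, order zero, via its power series."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        # ratio of consecutive series terms: (k/2)^2 / (m+1)^2
        term *= (k / 2.0) ** 2 / ((m + 1) ** 2)
    return total

def von_mises_pdf(theta, mu, k):
    """Density of vM(mu, k) at angle theta (radians), assuming the standard form."""
    return math.exp(k * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(k))

def integrate_pdf(mu, k, steps=2000):
    """Riemann sum of the density over one full turn; should be close to 1."""
    h = 2.0 * math.pi / steps
    return sum(von_mises_pdf(j * h, mu, k) for j in range(steps)) * h
```

A quick sanity check is that `integrate_pdf(mu, k)` is numerically close to one for any choice of μ and k > 0. The series is adequate for the moderate values of k considered in this study; much larger values would call for a scaled implementation.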
The maximum likelihood estimates of the model parameters are given in Eq. 3-6^{12}:
where,
where, A(k) = I_{1}(k)/I_{0}(k) is the ratio of the modified Bessel function of the first kind and of order one to that of the first kind and of order zero.
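Because the estimating equations are not reproduced here, the following is only a sketch of one way to obtain maximum likelihood estimates numerically, and it is not claimed to be the procedure of the cited authors. It assumes errors ε ~ vM(0, k), so that for fixed β the likelihood is maximised by α = atan2(S, C), with S = Σ sin(y_{i} − βx_{i}) and C = Σ cos(y_{i} − βx_{i}), and that k solves A(k) = (S² + C²)^{1/2}/n. The search bracket for β, the grid size and all function names are hypothetical choices.

```python
import math
import random

def bessel_ratio(k, terms=80):
    """A(k) = I1(k) / I0(k), summed from the power series of both functions."""
    q = (k / 2.0) ** 2
    i0 = i1 = 0.0
    t0, t1 = 1.0, k / 2.0
    for m in range(terms):
        i0 += t0
        i1 += t1
        t0 *= q / ((m + 1) ** 2)
        t1 *= q / ((m + 1) * (m + 2))
    return i1 / i0

def fit_circular_regression(x, y, beta_lo=0.5, beta_hi=1.5, grid=1001):
    """Return (alpha_hat, beta_hat, k_hat) by profiling the likelihood over beta."""
    best_r, best_beta, best_alpha = -1.0, beta_lo, 0.0
    for j in range(grid):
        beta = beta_lo + (beta_hi - beta_lo) * j / (grid - 1)
        s = sum(math.sin(yi - beta * xi) for xi, yi in zip(x, y))
        c = sum(math.cos(yi - beta * xi) for xi, yi in zip(x, y))
        r = math.hypot(s, c)  # maximum over alpha of sum cos(y - alpha - beta*x)
        if r > best_r:
            best_r, best_beta = r, beta
            best_alpha = math.atan2(s, c) % (2.0 * math.pi)
    # Solve A(k) = best_r / n by bisection; A is increasing in k.
    target = best_r / len(x)
    lo, hi = 1e-6, 50.0  # assumed search range for k
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if bessel_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    return best_alpha, best_beta, 0.5 * (lo + hi)

def simulate(n, alpha, beta, k, seed=0):
    """Generate (x, y) from the model; the distribution of x is an assumption."""
    rng = random.Random(seed)
    x = [rng.vonmisesvariate(math.pi, 2.0) for _ in range(n)]
    y = [(alpha + beta * xi + rng.vonmisesvariate(0.0, k)) % (2.0 * math.pi)
         for xi in x]
    return x, y
```

For example, `fit_circular_regression(*simulate(200, 0.5, 1.0, 10.0))` should return estimates close to the generating values. The profile likelihood in β can have several local maxima, which is why a bracketed grid search is used here rather than an unrestricted optimiser.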
Conventional methods for the detection of outliers in a simple circular regression model
COVRATIO statistic: Abuzaid et al.^{7} proposed the COVRATIO statistic to detect outliers in the response variable of a simple circular regression model. This statistic is based on the covariance matrix of the estimated coefficients of the model. The COVRATIO statistic is given in Eq. 7:
where, COV is the determinant of the covariance matrix of the coefficients for the full data set:
and COV_{(i)} is the determinant of the covariance matrix of the coefficients for the reduced data set formed by excluding the ith row:
The ith observation is identified as an outlier if |COVRATIO_{(i)} - 1| exceeds the cutoff point.
Mean circular error statistic: Abuzaid^{9} and Abuzaid et al.^{10} suggested two statistics, the DMCEs and DMCEc statistics, to identify outliers in the response variable y in a simple circular regression model.
DMCEs statistic: Abuzaid et al.^{10} proposed using the sine function as a measure of mean circular error to identify outliers, noting that sine is an increasing function on the interval [0, π/2]. The mean circular error is given as follows:
where, the distance term is the circular distance between y_{i} and its fitted value ŷ_{i}, and MCEs ∈ [0, 1]. The existence of outliers is expected to increase the value of MCEs and the removal of an outlier decreases it. Thus, the statistic to detect outliers is given in Eq. 8:
where, MCEs_{(i)} is MCEs with the ith observation removed. The cutoff point represents the maximum absolute difference between the value of the statistic for the full data set and for the reduced data set (formed by excluding the ith observation), which is shown in Eq. 9:
The ith observation is identified as an influential observation if DMCEs_{(i)} is greater than the cutoff point.
DMCEc statistic: Abuzaid^{9} proposed using the cosine function as an alternative measure of mean circular error. This statistic is given by:
where, MCEc ∈ [0, 2]. If y_{i} is an outlier, then the circular distance between y_{i} and ŷ_{i} is expected to be relatively large. Hence, the existence of an outlier in a data set will increase the value of MCEc and the removal of the outlier will decrease it. The statistic to identify outliers is given in Eq. 10:
where, MCEc_{(i)} is MCEc with the ith observation removed. The cutoff point is the maximum absolute difference between the value of the statistic for the full data set and for the reduced data sets, as obtained in Eq. 11:
If DMCEc_{(i)} is greater than the cutoff point, the ith observation is detected as an outlier.
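The row deletion idea behind the mean circular error statistics can be sketched as follows. The defining equations are not reproduced here, so the sketch assumes MCEc = 1 − (1/n) Σ cos(y_{i} − ŷ_{i}), a form that is consistent with the stated range [0, 2] but should be checked against the cited work, and it takes the deletion statistic for observation i to be |MCEc − MCEc_{(i)}|. To keep the example short, the model is refitted with a simplified stand-in (`fit_alpha_only`, which fixes β = 1 and estimates only α); all names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

def mce_cosine(y, y_hat):
    """Assumed cosine-based mean circular error: 1 - mean cos(y_i - yhat_i)."""
    return 1.0 - sum(math.cos(a - b) for a, b in zip(y, y_hat)) / len(y)

def fit_alpha_only(x, y):
    """Simplified stand-in fit: beta fixed at 1, alpha is the mean direction of y - x."""
    s = sum(math.sin(yi - xi) for xi, yi in zip(x, y))
    c = sum(math.cos(yi - xi) for xi, yi in zip(x, y))
    alpha = math.atan2(s, c)
    return lambda xi: (alpha + xi) % TWO_PI

def dmce_cosine(x, y, fit=fit_alpha_only):
    """|MCEc(full) - MCEc(without row i)| for every i, refitting after each deletion."""
    predict = fit(x, y)
    full = mce_cosine(y, [predict(v) for v in x])
    scores = []
    for i in range(len(x)):
        xr, yr = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        predict_r = fit(xr, yr)
        reduced = mce_cosine(yr, [predict_r(v) for v in xr])
        scores.append(abs(full - reduced))
    return scores

# Small deterministic data set with one shifted response at index 7.
x = [0.3 * j for j in range(20)]
y = [(xi + 0.4 + 0.05 * math.sin(3 * j)) % TWO_PI for j, xi in enumerate(x)]
y[7] = (y[7] + 0.8 * math.pi) % TWO_PI
scores = dmce_cosine(x, y)
suspect = max(range(len(scores)), key=scores.__getitem__)
```

In this example the largest score belongs to the shifted observation, because deleting it removes almost all of the mean circular error. In practice the `fit` argument would be replaced by a full fit of α and β.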
Proposed robust circular distance for Y, RCD_{y}: The distance between two circular observations differs from the distance between two linear observations because circular data are angles. The distance between two circular observations cannot exceed π. For example, if ϑ_{i} = 350° and ϑ_{j} = 10°, the linear difference between them is 340°, but this is not the true circular distance. The hypothetical example in Fig. 1 shows that ϑ_{i} = 350° and ϑ_{j} = 10° are not far from each other; their circular distance is 20°.
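It is important to mention that the von Mises distribution is symmetric around the circular mean. Jammalamadaka and SenGupta^{14} defined the circular distance (cd) in Eq. 12:
$$cd(\vartheta_i, \vartheta_j) = \pi - \left|\pi - |\vartheta_i - \vartheta_j|\right| \qquad (12)$$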
where, ϑ_{i} and ϑ_{j} are circular observations. Observations whose circular residuals lie far from the circular mean of the residuals can be considered as outliers. However, the circular mean is not robust against outliers, so the circular median is used instead. This motivates a new statistic for detecting outliers in the simple circular regression model that employs the circular median rather than the circular mean. We call this statistic the robust circular distance, denoted by RCD_{y}.
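The 350° and 10° example can be checked with a short sketch. It assumes the circular distance takes the form π − |π − |ϑ_{i} − ϑ_{j}|| for angles in radians reduced to [0, 2π); the function name `circular_distance` is hypothetical.

```python
import math

def circular_distance(a, b):
    """Shortest arc between two angles in radians; the result lies in [0, pi]."""
    two_pi = 2.0 * math.pi
    return math.pi - abs(math.pi - abs(a % two_pi - b % two_pi))

# The example from the text: 350 degrees and 10 degrees are 20 degrees apart.
d_deg = math.degrees(circular_distance(math.radians(350), math.radians(10)))
```

Here `d_deg` is 20 up to floating point error, whereas the linear difference of the same two angles is 340.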
The following steps are applied to compute the proposed robust circular distance (RCD_{y}). First, calculate the absolute values of the estimated circular residuals:
$$|\hat{e}_i| = \pi - \left|\pi - |y_i - \hat{y}_i|\right|, \quad i = 1, \ldots, n$$
Note that the difference between y_{i} and ŷ_{i} cannot be calculated directly, as is done in the linear case, because of the circular geometry. Second, compute the robust circular distance [RCD_{i}]_{y} between |ê_{i}| and the circular median of the absolute residuals:
$$[RCD_i]_y = \pi - \left|\pi - \left||\hat{e}_i| - \operatorname{median}(|\hat{e}|)\right|\right|$$
Then any y_{i} is suspected to be an outlier if its corresponding [RCD_{i}]_{y} is relatively large. This is because an outlying y_{i} affects the value of its circular residual, so that the circular distance between |ê_{i}| and the circular median of the absolute residuals is relatively large. Hence, the cutoff point should be based on the maximum value of [RCD_{i}]_{y}, defined in Eq. 13:
$$RCD_y = \max_{i}\, [RCD_i]_y \qquad (13)$$
Any observation whose [RCD_{i}]_{y} is greater than the cutoff point is declared an outlier.

Fig. 1:  A hypothetical figure for the distance between two circular observations
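The two steps of the proposed statistic can be sketched as follows, given observed responses and fitted values in radians. The sketch assumes that the absolute circular residual is the circular distance between y_{i} and ŷ_{i}, and it approximates the circular median by the sample value that minimises the total circular distance to all other values; this choice of median and all function names are assumptions made for illustration only.

```python
import math

TWO_PI = 2.0 * math.pi

def circular_distance(a, b):
    """Shortest arc between two angles in radians; the result lies in [0, pi]."""
    return math.pi - abs(math.pi - abs(a % TWO_PI - b % TWO_PI))

def circular_median(values):
    """Sample value minimising the total circular distance (an assumed definition)."""
    return min(values, key=lambda m: sum(circular_distance(v, m) for v in values))

def rcd_y(y, y_hat):
    """Robust circular distance of each absolute residual from their circular median."""
    abs_res = [circular_distance(a, b) for a, b in zip(y, y_hat)]
    med = circular_median(abs_res)
    return [circular_distance(r, med) for r in abs_res]

# Hypothetical fitted values and responses; the fifth response is far from its fit.
y_hat = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
offsets = [0.10, -0.05, 0.20, -0.15, 2.40, 0.05, -0.10]
y = [f + o for f, o in zip(y_hat, offsets)]
rcd = rcd_y(y, y_hat)
suspect = max(range(len(rcd)), key=rcd.__getitem__)
```

In this toy example the median absolute residual is 0.10, so the fifth observation has a robust circular distance of about 2.30 while all others are at most 0.10. An observation would be declared an outlier when its value exceeds the tabulated cutoff point for the relevant n and k.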
RESULTS AND DISCUSSION
Simulated cutoff point of the RCD_{y} statistic: A simulation study is designed to determine the cutoff points (percentage points) of the distribution of the proposed statistic under the null hypothesis of no outliers in the circular data. The same procedure was used by Pearson and Hartley^{15} and Collett^{16} to determine cutoff points in their studies. For any data set of sample size n and concentration parameter k, the null hypothesis is rejected if the computed value of the statistic is larger than the cutoff point, which suggests that there is an outlier in the data set. In this simulation study, first, 20 different sample sizes n = 10, 20, 30, …, 200 and 9 values of the concentration parameter k = 2, 3, 5, 6, 8, 10, 12, 15, 20 are considered. Second, generate a set of explanatory variables X of size n, such that:
Third, generate a set of circular random errors such that ε ~ vM(0, k) for each sample size n and concentration parameter k. Next, fix the values of the model parameters at α = 0 and β = 1. Then, calculate the values of the response variable Y and fit the model to the generated circular data. The RCD_{y} statistic is then computed and its maximum value is determined. This process is replicated 5000 times for each combination of sample size n and concentration parameter k. The 10 and 5% upper percentile values of the maximum RCD_{y} are shown in Tables 1 and 2, respectively. These values serve as cutoff points of the RCD_{y} statistic for identifying outliers in the simple circular regression model according to the sample size and the value of the concentration parameter.
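The simulation procedure can be sketched as below. This is a reduced illustration rather than a reproduction of the study: the distribution used to generate X is not recoverable from the text, so X ~ vM(π, 2) is assumed; the model is fitted by alternating a closed-form update of α with a Newton step for β, which is one possible numerical scheme; the circular median is approximated by the minimising sample value; and the number of replications is far smaller than 5000. All function names are hypothetical.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def circular_distance(a, b):
    return math.pi - abs(math.pi - abs(a % TWO_PI - b % TWO_PI))

def circular_median(values):
    # Assumed definition: sample value minimising the total circular distance.
    return min(values, key=lambda m: sum(circular_distance(v, m) for v in values))

def alpha_given_beta(x, y, beta):
    s = sum(math.sin(yi - beta * xi) for xi, yi in zip(x, y))
    c = sum(math.cos(yi - beta * xi) for xi, yi in zip(x, y))
    return math.atan2(s, c)

def fit(x, y, beta=1.0, iterations=20):
    """Alternate a closed-form alpha update with a Newton step for beta."""
    for _ in range(iterations):
        alpha = alpha_given_beta(x, y, beta)
        num = sum(xi * math.sin(yi - alpha - beta * xi) for xi, yi in zip(x, y))
        den = sum(xi * xi * math.cos(yi - alpha - beta * xi) for xi, yi in zip(x, y))
        if den <= 0.0:  # Newton step not usable; keep the current beta
            break
        beta += num / den
    return alpha_given_beta(x, y, beta), beta

def max_rcd(x, y):
    alpha, beta = fit(x, y)
    res = [circular_distance(yi, alpha + beta * xi) for xi, yi in zip(x, y)]
    med = circular_median(res)
    return max(circular_distance(r, med) for r in res)

def simulate_cutoffs(n, k, reps=500, seed=0):
    """Return the (10%, 5%) upper percentiles of max RCD under no contamination."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(reps):
        x = [rng.vonmisesvariate(math.pi, 2.0) for _ in range(n)]  # assumed design
        y = [(xi + rng.vonmisesvariate(0.0, k)) % TWO_PI for xi in x]  # alpha=0, beta=1
        maxima.append(max_rcd(x, y))
    maxima.sort()

    def upper(p):
        return maxima[min(reps - 1, math.ceil((1.0 - p) * reps) - 1)]

    return upper(0.10), upper(0.05)
```

Calling `simulate_cutoffs(n, k)` over a grid of n and k gives a table of the same layout as Tables 1 and 2. The values will not match the published tables exactly because of the assumptions listed above.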
It is noticeable from Table 1 and 2 that the cutoff point is an increasing function of the sample size n for any value of the concentration parameter k. This is reasonable, because when sample size is increased, the data will be more spread out. Consequently, the circular distance between them and the circular mean increases. Moreover, the cutoff point is a decreasing function of the concentration parameter k for any sample size n. This is because increasing the concentration parameter causes the concentration of the circular data around the circular mean to increase.
Performance of the RCD_{y} statistic: The performance of the RCD_{y}, COVRATIO, DMCEs and DMCEc statistics is examined by using Monte Carlo simulations.
Table 1:  Cutoff points of RCD_{y} with 10% upper percentile 

Table 2:  Cutoff points of RCD_{y} with 5% upper percentile 

Four different sample sizes are used, namely n = 10, 50, 100 and 150, and six concentration parameters, k = 2, 3, 5, 6, 8 and 10. The data are contaminated in the response variable Y according to the formula in Eq. 14:
$$y_i^{*} = y_i + \lambda\pi \pmod{2\pi} \qquad (14)$$
where, y_{i}^{*} is the contaminated value of y_{i} and λ is the degree of contamination, such that 0≤λ≤1. If λ = 0, there is no contamination. If λ = 1, the circular observation is located at the antimode of its initial location.
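A minimal sketch of the contamination step is given below. It assumes that a contaminated response is obtained by shifting the original value by λπ and reducing modulo 2π, which agrees with the statement that λ = 1 places the observation at the antimode of its initial location; the function name `contaminate` and its arguments are hypothetical.

```python
import math

def contaminate(y, lam, indices):
    """Shift the responses at the given indices by lam * pi, modulo 2*pi."""
    out = list(y)
    for i in indices:
        out[i] = (out[i] + lam * math.pi) % (2.0 * math.pi)
    return out

# Contaminate the first and third responses with degree 0.8.
y = [1.0, 2.0, 6.0, 4.0]
y_star = contaminate(y, 0.8, [0, 2])
```

The untouched positions are returned unchanged, and the third value wraps around the circle because 6.0 + 0.8π exceeds 2π.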
For all combinations of sample sizes and concentration parameters, data with 10 and 20% contamination are generated with λ = 0.8. To evaluate the performance of the statistics, three measures are considered, namely the proportion of outliers detected, the masking rate and the swamping rate. The process is replicated 5000 times for each combination of sample size n and concentration parameter k. In each replication, the number of true (generated) outliers that are detected is recorded. The proportion of outliers detected is then calculated as follows:
where, P is the percentage of contamination. Similarly, the rates of masking and swamping are calculated from the number of generated outliers classified as inliers (clean observations) and the number of inliers classified as outliers, respectively, as follows:
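The three performance measures can be illustrated for a single replication as follows. The exact formulas are not reproduced here, so the denominators are assumptions: detection and masking are expressed relative to the number of generated outliers, and swamping relative to the number of clean observations. The function name `detection_rates` is hypothetical, and at least one generated outlier and one inlier are assumed.

```python
def detection_rates(true_outliers, flagged, n):
    """Proportion detected, masking rate and swamping rate for one replication."""
    true_outliers, flagged = set(true_outliers), set(flagged)
    n_out = len(true_outliers)          # generated outliers (n * P)
    n_in = n - n_out                    # clean observations
    detected = len(true_outliers & flagged)
    masked = n_out - detected           # outliers classified as inliers
    swamped = len(flagged - true_outliers)  # inliers classified as outliers
    return {
        "detected": detected / n_out,
        "masking": masked / n_out,
        "swamping": swamped / n_in,
    }

# n = 50 with 10% contamination: 5 generated outliers, 4 observations flagged.
rates = detection_rates({3, 7, 9, 12, 15}, {3, 7, 9, 20}, 50)
```

In this example three of the five generated outliers are flagged, two are masked and one clean observation is swamped. Averaging these quantities over all replications gives the rates reported in the figures.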
A good method is one that has a high detection rate for outliers and low masking and swamping rates. Figures 2 and 3 show the proportion of outliers detected and the rates of masking and swamping with the 5% upper percentile for n = 50 and 100. The results in Fig. 2 and 3 show that the rates of swamping are zero or close to zero for all statistics. However, the rates of masking of the COVRATIO statistic are very high and its proportions of outliers detected are very low for all combinations of sample size, concentration parameter and percentage of outliers. The proportion of outliers detected by the DMCEs statistic is low when the concentration parameter is less than 6 with 10% contamination and this proportion decreases markedly with 20% contamination.

Fig. 2(a-f):  Proportion of outliers detected and rate of masking and swamping with 5% cutoff points and 10% contamination

Fig. 3(a-f):  Proportion of outliers detected and rate of masking and swamping with 5% cutoff points and 20% contamination

Fig. 4:  Scatter plot of the wind direction data 
Consequently, the DMCEs statistic has a high rate of masking. The DMCEc statistic has a relatively higher proportion of outliers detected than the DMCEs statistic and the proportion increases with the concentration parameter, but it is low at 20% contamination. The proportion of outliers detected by the proposed RCD_{y} statistic is relatively low for small values of k. This is acceptable because circular data are more spread around the circumference of the circle when the concentration parameter is low, which makes outliers very difficult to identify^{16}. Overall, the RCD_{y} statistic gives a greater proportion of outliers detected than the other statistics. The proportion is an increasing function of the concentration parameter and reaches 100% for values of the concentration parameter greater than 5. Correspondingly, the rate of masking is very low and is a decreasing function of the concentration parameter, falling to 0%.
In general, the proposed RCD_{y} statistic is very successful in detecting outliers because the circular median is a robust location parameter for circular data. For these reasons, the RCD_{y} statistic performs best among the four statistics compared: it has the highest proportion of outliers detected and the lowest rates of both masking and swamping. Due to space constraints, the results for n = 10 and 150 are not shown. However, the results are consistent.
Practical example: The wind direction data studied by Abuzaid et al.^{7} are considered. The data consist of n = 129 measurements recorded by an HF radar system and an anchored wave buoy. Figure 4 shows the scatter plot of the wind direction data, where X are the circular observations measured by the HF radar and Y are the circular observations measured by the anchored wave buoy. The concentration parameter is estimated from the fitted model and, by referring to Table 2 with the 5% upper percentile, the cutoff point is equal to 1.28. The RCD_{y} statistic is calculated and the results are plotted in Fig. 5.

Fig. 5:  [RCD_{i}]_{y} statistic of the wind direction data 
It can be seen that observations 38 and 111 exceed the cutoff point and are therefore considered outliers. These results agree with those of Abuzaid et al.^{7}, who pointed out that the two points at the top left of the plot in Fig. 4 are not outliers. They are consistent with the rest of the observations at the top right or bottom left because of the closed range property of the circular variable.
CONCLUSION
A new statistic is proposed to detect outliers in Y in a simple circular regression model. The proportion of outliers detected and the rates of masking and swamping are used to evaluate the performance of the proposed method. The results show that the proposed RCD_{y} statistic is very successful in identifying genuine outliers for different sample sizes, with very low rates of masking and swamping.