Research Article
 

An Investigation of Causation: The Unreplicated Linear Functional Relationship Model



Chang Yun Fah, Abdul Ghapor Hussin and Omar Mohd Rijal
 
ABSTRACT

This study investigates the Unreplicated Linear Functional Relationship (ULFR) model, in which a measurement error term, δ, is introduced for the independent variable. The coefficient of determination (COD) of the ULFR model, denoted by $R_F^2$, is proposed and its properties are investigated. When the introduction of δ significantly increases the COD, we say that the causation factor has been incorporated into the independent variable. The present results on Malaysian road accident data illustrate how the causal relationship between socio-biological factors and road accidents may be explained.


 
  How to cite this article:

Chang Yun Fah, Abdul Ghapor Hussin and Omar Mohd Rijal, 2007. An Investigation of Causation: The Unreplicated Linear Functional Relationship Model. Journal of Applied Sciences, 7: 20-26.

DOI: 10.3923/jas.2007.20.26

URL: https://scialert.net/abstract/?doi=jas.2007.20.26

INTRODUCTION

The seriousness of the road accident problem warrants a deeper investigation into its possible causes. In particular, the ULFR model is investigated. Studies of functional and structural relationships have focused on estimating the parameters of measurement error models since these relationships were first introduced by Adcock in 1877 (Sprent, 1990). Several estimation methods have been proposed, but the most widely used is maximum likelihood (Dolby, 1976; Fuller, 1987; Kendall and Stuart, 1979). An early check of the adequacy of the ULFR model is given by Fuller (1987), who suggests using the residual plot as an indication of nonlinearity, lack of homogeneity of the error variances, nonnormality of the errors and outlying observations. In this study we propose the COD as a measure for checking model adequacy and the assumption that the independent variable is measured exactly. Without loss of generality, the properties of this measure are investigated for the case λ = 1. A large value of the COD of the ULFR model relative to the COD of the ordinary Simple Linear (SL) regression model is indicative of causation.

As pointed out by Fuller (1987), the assumption that the independent variable can be measured exactly may not be realistic in many situations. The observed values of the independent variable may contain measurement error that arises from trying to quantify a variable that has no physical dimension. For example, in the road accident problem, it has been shown that male motorists are more prone to be involved in accidents. One possible explanation is that the performance of male drivers may vary with their state of mind. Since the state of mind cannot be measured directly, it is more reasonable to treat it as a measurement error than to introduce a new independent variable into the model.

UNREPLICATED LINEAR FUNCTIONAL RELATIONSHIP MODEL

Suppose that X and Y are two linearly related unobservable variables

$$Y_i = \alpha + \beta X_i, \quad i = 1, 2, \ldots, n \tag{1}$$

and the two corresponding random variables x and y are observed with errors δ and ε, respectively, as

$$x_i = X_i + \delta_i, \qquad y_i = Y_i + \varepsilon_i \tag{2}$$

The following conditions are assumed

$$E(\delta_i) = E(\varepsilon_i) = 0, \quad \mathrm{Var}(\delta_i) = \sigma_\delta^2, \quad \mathrm{Var}(\varepsilon_i) = \sigma_\varepsilon^2, \quad \mathrm{Cov}(\delta_i, \delta_j) = \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \ (i \ne j), \quad \mathrm{Cov}(\delta_i, \varepsilon_j) = 0 \ \text{for all } i, j \tag{3}$$

where δi and εi are mutually independent and normally distributed random variables. Equations 1 and 2 are known as the ULFR model when there is only one relationship between the two variables X and Y. When the ratio of the error variances is known, that is $\sigma_\varepsilon^2 / \sigma_\delta^2 = \lambda$, the maximum likelihood estimators of the parameters α, β, $\sigma_\delta^2$ and Xi are:

$$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} \tag{4}$$

$$\hat{\beta} = \frac{(S_{yy} - \lambda S_{xx}) + \sqrt{(S_{yy} - \lambda S_{xx})^2 + 4\lambda S_{xy}^2}}{2 S_{xy}} \tag{5}$$

$$\hat{\sigma}_\delta^2 = \frac{1}{n-2}\left[\sum_{i=1}^{n} (x_i - \hat{X}_i)^2 + \frac{1}{\lambda}\sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta}\hat{X}_i)^2\right] \tag{6}$$

and

$$\hat{X}_i = \frac{\lambda x_i + \hat{\beta}\,(y_i - \hat{\alpha})}{\lambda + \hat{\beta}^2} \tag{7}$$

where,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
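As a rough illustration of how the estimators for α, β and Xi might be computed, the following sketch assumes the standard closed forms for the known-λ case (slope from the positive root of the likelihood quadratic, intercept from the sample means, and fitted Xi as a weighted combination of xi and yi). The function name `ulfr_fit` and the toy data are hypothetical and are not taken from the article.

```python
import math


def ulfr_fit(x, y, lam=1.0):
    """Sketch of ULFR estimates for a known error-variance ratio lam.

    Assumes the standard closed-form expressions; not the authors' code.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xy == 0:
        raise ValueError("S_xy is zero; the slope estimator is undefined")

    # Slope: positive-root solution, which carries the sign of S_xy.
    d = s_yy - lam * s_xx
    beta = (d + math.sqrt(d * d + 4.0 * lam * s_xy ** 2)) / (2.0 * s_xy)
    # Intercept: the fitted line passes through the point of means.
    alpha = y_bar - beta * x_bar
    # Estimated true values X_i of the unobservable variable.
    x_hat = [(lam * xi + beta * (yi - alpha)) / (lam + beta ** 2)
             for xi, yi in zip(x, y)]
    return alpha, beta, x_hat


# Toy data lying exactly on y = 2 + 3x, so the fit should recover the line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * xi for xi in x]
alpha, beta, x_hat = ulfr_fit(x, y)
print(alpha, beta)
```

With error-free data the fitted values of Xi coincide with the observed xi, which gives a quick sanity check on the formulas.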

DERIVATION OF THE COEFFICIENT OF DETERMINATION (COD)

In an ordinary SL regression analysis, we look at the COD as a measure of the variability in y explained by the regression model. The construction of the COD is a good practice when fitting the ULFR model.

Equations 1 and 2 can be rewritten as

$$y_i = \alpha + \beta X_i + \varepsilon_i \tag{8}$$

If we substitute Xi by (xi - δi) in (8), then we have the expression

$$y_i = \alpha + \beta x_i + (\varepsilon_i - \beta\delta_i)$$

where the error of the model, Vi = (εi - βδi) = yi - (α + βxi) for i = 1, 2, …, n, is a normally distributed random variable with zero mean and variance $\sigma_\varepsilon^2 + \beta^2\sigma_\delta^2$. If $\hat{\alpha}$ and $\hat{\beta}$ are the estimates of α and β respectively, then

$$\hat{V}_i = y_i - (\hat{\alpha} + \hat{\beta} x_i) \tag{9}$$

will be the residual of the model. From Kendall and Stuart (1979) and Anderson (1984), the sum of squared distances of the observed points from the fitted line, or the residual sum of squares (SSE), is given as:

$$SSE = \frac{\sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2}{\lambda + \hat{\beta}^2} \tag{10}$$

We shall consider here the case in which the ratio of the error variances is equal to one (λ = 1). When λ ≠ 1, we can always reduce the problem to the case λ = 1 by dividing the observed values of y by $\lambda^{1/2}$ (Kendall and Stuart, 1979). Hence, we have:

$$SSE = \frac{\sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2}{1 + \hat{\beta}^2} \tag{11}$$
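The reduction to λ = 1 can be checked numerically. The sketch below assumes the standard known-λ slope estimator; the helper `ulfr_slope` and the data are hypothetical. It fits the slope directly with λ = 4, then again after dividing y by $\lambda^{1/2}$ and using λ = 1, and rescales the second slope by $\lambda^{1/2}$ to return to the original units of y.

```python
import math


def ulfr_slope(x, y, lam=1.0):
    """Slope estimate for a known variance ratio lam (assumed standard form)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    d = s_yy - lam * s_xx
    return (d + math.sqrt(d * d + 4.0 * lam * s_xy ** 2)) / (2.0 * s_xy)


# Hypothetical observations, used only to illustrate the rescaling.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.9, 4.1, 6.3, 7.8, 9.1]
lam = 4.0

beta_direct = ulfr_slope(x, y, lam)

# Divide y by sqrt(lam), fit with lam = 1, then undo the scaling of y.
y_scaled = [yi / math.sqrt(lam) for yi in y]
beta_rescaled = math.sqrt(lam) * ulfr_slope(x, y_scaled, 1.0)

print(beta_direct, beta_rescaled)
```

The two slopes agree up to rounding, which is why the properties of the COD can be studied for λ = 1 only.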

In the same way as in ordinary linear regression, we can now define the COD of the ULFR model ($R_F^2$) as the proportion of variation explained by the variable x, that is:

$$R_F^2 = \frac{SSR}{S_{yy}} \tag{12}$$

where SSR is the regression sum of squares, which can be derived as:

$$SSR = S_{yy} - SSE = S_{yy} - \frac{\sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2}{1 + \hat{\beta}^2} \tag{13}$$

We can summarise our proposed COD with the following results.

Result 1: Let the ratio of the error variances be known and equal to one, $\sigma_\varepsilon^2 / \sigma_\delta^2 = \lambda = 1$. Then the COD for the ULFR model is

$$R_F^2 = \frac{\hat{\beta} S_{xy}}{S_{yy}} \tag{14}$$

Proof: When λ = 1, $\hat{\beta}$ in Eq. 5 becomes

$$\hat{\beta} = \frac{(S_{yy} - S_{xx}) + \sqrt{(S_{yy} - S_{xx})^2 + 4 S_{xy}^2}}{2 S_{xy}}$$

and

[Equation image missing in source]

Notice that [expression missing in source]. From Eq. 13, we have

[Equation image missing in source]

From the R.H.S.

[Equation image missing in source]
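Since the intermediate steps of this proof are not legible above, one possible route to Result 1 is sketched here. It assumes the λ = 1 slope estimator is the positive root of the usual quadratic and that the SSE is the sum of squared perpendicular distances (Eq. 11); it is not necessarily the authors' own sequence of steps.

```latex
\begin{align*}
&\text{For } \lambda = 1,\ \hat\beta \text{ satisfies }
  S_{xy}\hat\beta^{2} - (S_{yy} - S_{xx})\hat\beta - S_{xy} = 0,
  \text{ hence }
  \hat\beta^{2} S_{xx} = \hat\beta^{2} S_{yy} + \hat\beta S_{xy} - \hat\beta^{3} S_{xy}. \\[4pt]
&\text{With } \hat\alpha = \bar y - \hat\beta \bar x:\quad
  \sum_{i=1}^{n} (y_i - \hat\alpha - \hat\beta x_i)^2
  = S_{yy} - 2\hat\beta S_{xy} + \hat\beta^{2} S_{xx}
  = (1 + \hat\beta^{2})\,(S_{yy} - \hat\beta S_{xy}). \\[4pt]
&SSE = \frac{(1 + \hat\beta^{2})(S_{yy} - \hat\beta S_{xy})}{1 + \hat\beta^{2}}
     = S_{yy} - \hat\beta S_{xy}, \qquad
  SSR = S_{yy} - SSE = \hat\beta S_{xy}, \qquad
  R_F^2 = \frac{SSR}{S_{yy}} = \frac{\hat\beta S_{xy}}{S_{yy}}.
\end{align*}
```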

Result 2: Let $\hat{\beta}_{SL}$ and $\hat{\beta}_F$ be the slope estimators for the simple linear regression and the ULFR model, respectively. The corresponding CODs are:

$$R^2 = \frac{\hat{\beta}_{SL} S_{xy}}{S_{yy}}, \qquad R_F^2 = \frac{\hat{\beta}_F S_{xy}}{S_{yy}}$$

Then $R_F^2 \ge R^2$.

Proof: From the regression sum of squares, we obtain

[Equation image missing in source]

We also know that simple linear regression has [expression missing in source].

We now prove $R_F^2 \ge R^2$ by contradiction. Assuming that

[Equation images missing in source]

Also from [expression missing in source], this is a contradiction. Hence, we have $R_F^2 \ge R^2$.
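As a complement to the proof of Result 2, the inequality can also be seen from a short geometric argument. The sketch below assumes that, for λ = 1, the ULFR fit coincides with orthogonal regression, that is, it minimises the sum of squared perpendicular distances over all lines; the symbols a, b and Q are introduced here for illustration only.

```latex
\begin{align*}
&Q(\alpha, \beta) = \frac{\sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2}{1 + \beta^2},
  \qquad SSE_F = \min_{\alpha, \beta} Q(\alpha, \beta). \\[4pt]
&\text{Let } (a, b) \text{ be the least-squares line, with }
  SSE_{SL} = \sum_{i=1}^{n} (y_i - a - b x_i)^2. \text{ Then} \\[4pt]
&SSE_F \le Q(a, b) = \frac{SSE_{SL}}{1 + b^2} \le SSE_{SL}
  \quad\Longrightarrow\quad
  R_F^2 = 1 - \frac{SSE_F}{S_{yy}} \ \ge\ 1 - \frac{SSE_{SL}}{S_{yy}} = R^2 .
\end{align*}
```

Equality requires b = 0 or a perfect linear fit, so in practice the ULFR COD is strictly larger.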

PROPERTIES OF $R_F^2$

Since $R_F^2$ is computed following the same method as $R^2$, some of the properties of $R^2$ may also hold for $R_F^2$. It is known that $R^2$ does not measure the appropriateness of the linear model (Montgomery and Peck, 1992). This holds for $R_F^2$ too. As an example, when a nonlinear model has a large $R^2$ value, $R_F^2$ will also be large (Result 2).

Secondly, the magnitude of $R_F^2$ depends on the range of variability in the variables X and Y. From Eq. 14, we have

[Equation image missing in source]

Let var(X) and var(Y) be the variances of X and Y, respectively; then var(X) = Sxx/(n - 1) and var(Y) = Syy/(n - 1). We consider three possible cases of the relationship between var(X) and var(Y) when Sxx > 0 and Syy > 0.

Case 1: If var(X) = var(Y) ⇔ Sxx = Syy, then

[Equation image missing in source]

when compared with Eq. 14, this holds when $\hat{\beta}$ = 1.

Case 2: If var(X) < var(Y) ⇔ Syy/k = Sxx < Syy for k > 1, then

[Equation image missing in source]

when compared with Eq. 14, this holds when $\hat{\beta}$ > 1.

Case 3: If var(X) > var(Y) ⇔ Sxx = kSyy > Syy for k > 1, then

[Equation image missing in source]

when compared with Eq. 14, this holds when $\hat{\beta}$ < 1.

We can summarize the above properties with the following result.

Result 3: Let Sxx > 0, Syy > 0 and Sxx = kSyy or var(x) = k var(y), then

[Equation image missing in source]
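The dependence of the ULFR COD on the ratio k can be made explicit with a short calculation. The following is a sketch derived here from the λ = 1 slope estimator and the form $R_F^2 = \hat{\beta} S_{xy} / S_{yy}$; it assumes $S_{xy} > 0$ and is not necessarily the expression printed in the original Result 3.

```latex
\begin{align*}
R_F^2 &= \frac{\hat\beta S_{xy}}{S_{yy}}
       = \frac{(S_{yy} - S_{xx}) + \sqrt{(S_{yy} - S_{xx})^2 + 4 S_{xy}^2}}{2 S_{yy}}. \\[4pt]
\text{With } S_{xx} = k S_{yy} \text{ and } R^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}}:\quad
R_F^2 &= \tfrac{1}{2}\left[(1 - k) + \sqrt{(1 - k)^2 + 4 k R^2}\right]. \\[4pt]
k = 1 &:\quad R_F^2 = \sqrt{R^2} = \frac{S_{xy}}{S_{yy}}, \qquad \hat\beta = 1, \\
k < 1 &:\quad R_F^2 > \frac{S_{xy}}{S_{yy}}, \qquad \hat\beta > 1, \\
k > 1 &:\quad R_F^2 < \frac{S_{xy}}{S_{yy}}, \qquad \hat\beta < 1 .
\end{align*}
```

In this form the COD of the ULFR model depends only on the variance ratio k and the ordinary $R^2$, which matches the three cases discussed above.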

A numerical example of the above results is given in Fig. 1a-1f. To illustrate the relationship between $R_F^2$ and $R^2$, the values of the CODs are compared for different sample sizes (n = 10, 50, 100, 1000, 5000 and 10000). For a given sample size, fifty samples were generated from a uniform distribution, each containing two random data sets (yi, xi). As an example, Fig. 1a shows fifty values of the COD derived from the SL and ULFR methods with sample size n = 10. Figures 1b-1f show the same plot with sample sizes n = 50, 100, 1000, 5000 and 10000, respectively. The plots show that $R_F^2$ is always greater than or equal to $R^2$. One interesting feature shown in Fig. 1 is that both $R_F^2$ and $R^2$ decrease as the sample size increases.
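A small simulation in the spirit of this numerical example can be sketched as follows. It assumes that x and y are drawn independently from a uniform distribution on (0, 1), which the text does not specify, and it uses the λ = 1 form of the ULFR COD; the function names, the seed and the reduced set of sample sizes are hypothetical choices.

```python
import math
import random


def cod_pair(x, y):
    """Return (R^2 of SL regression, R_F^2 of ULFR with lambda = 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r2_sl = s_xy ** 2 / (s_xx * s_yy)
    d = s_yy - s_xx
    # beta_F * S_xy simplifies to (d + sqrt(d^2 + 4 S_xy^2)) / 2,
    # so no division by S_xy is needed.
    r2_f = (d + math.sqrt(d * d + 4.0 * s_xy ** 2)) / (2.0 * s_yy)
    return r2_sl, r2_f


def simulate(n, n_samples=50, seed=0):
    """Generate n_samples independent uniform samples of size n."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        x = [rng.random() for _ in range(n)]
        y = [rng.random() for _ in range(n)]
        pairs.append(cod_pair(x, y))
    return pairs


results = {n: simulate(n) for n in (10, 50, 100, 1000)}
for n, pairs in results.items():
    mean_sl = sum(p[0] for p in pairs) / len(pairs)
    mean_f = sum(p[1] for p in pairs) / len(pairs)
    print(f"n={n:5d}  mean R^2={mean_sl:.4f}  mean R_F^2={mean_f:.4f}")
```

Because x and y are generated independently here, both CODs are expected to shrink toward zero as n grows, while the ULFR value stays at or above the SL value in every sample.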

APPLICATION IN ROAD ACCIDENT PROBLEM

So far, various road accident models have been proposed to explain Malaysian road accident phenomena (Radin et al., 1995; Radin, 1996; Razali and Dahalan, 1992). However, PDRM road statistics show that the number of road accidents remains high. One reason the Malaysian road accident problem has not been solved effectively is that the existing models are constructed for prediction rather than causal inference. The crucial factor ignored in the existing models is the driver's state of mind, which cannot be measured directly. The driver's state of mind includes the driver's attitude, ability to concentrate and tendency to take risks. It has been suggested that the driver's state of mind is an essential factor that causes road accidents (Leeming, 1969; Forbes, 1972).

In this example, we introduce the state of mind as a factor that causes variations in other physically measurable factors such as the driver's sex, age and race. This is formally expressed using the ULFR model, where the measurement error δ represents the state of mind. The difference between $R_F^2$ and $R^2$ is used to measure the effect of the state of mind on the road accident model. Since $R_F^2$ and $R^2$ have similar interpretations by construction, when $R_F^2$ is significantly larger than $R^2$ we say that the state of mind has been incorporated into the independent variable and the model now explains the causal relationship of road accidents.

Table 1 shows the socio-biological variables of the Malaysian road accident data for 1996, obtained from the Malaysian Royal Police (PDRM) headquarters. The Box-Cox power transformation (Box and Cox, 1964) is applied to assess normality, with estimated power parameters [0.25, 0.69, 1.25, 0.14, 0.23, 0.47]. Without loss of generality, we assume that the ratio of variances, λ, equals one by dividing the number of state accidents by the total number of accidents in Malaysia.

Fig. 1a: COD plots for SL and ULFR models with n = 10

Fig. 1b: COD plots for SL and ULFR models with n = 50

Fig. 1c: COD plots for SL and ULFR models with n = 100

Fig. 1d: COD plots for SL and ULFR models with n = 1000

Fig. 1e: COD plots for SL and ULFR models with n = 5000

Fig. 1f: COD plots for SL and ULFR models with n = 10000

Table 1: Socio-biological explanatory variables
[Table image missing in source]

Source: PDRM (Malaysian Royal Police, 1995-1998)

Note:

The number of vehicles involved in 1996 does not include accidents of type 'slightly injured' and 'property damaged only'
Y = number of road accidents ('000)
P1 = Age (ratio of drivers aged 16-30 years to total number of drivers involved in accidents)
P2 = Race (ratio of Chinese drivers to total number of drivers involved in accidents)
P3 = Number of serious and fatal accidents due to influence of alcohol ('00)
P4 = Experience (ratio of vehicles with drivers of less than 5 years' experience to total number of vehicles involved in accidents)
P5 = Sex (ratio of male drivers to total number of drivers involved in accidents)
The numbers given are ratios explained in (2). For example, 0.28 in column two, row six means that for every 100 cases of road accident in Selangor during 1996 about 28 cases involved Chinese drivers.

First, we estimate the parameters of the ULFR model in which the measurement error in the driver's age (P1) represents the state of mind. The estimates are

[Equation image missing in source]

therefore, the ULFR model is

[Equation image missing in source]

with the coefficient of determination $R_F^2$ = 0.4408 [expression missing in source] (see Result 3). Comparing with the SL model

[Equation image missing in source]

the result shows that the ULFR model explains the variability of Y better than the SL model ($R_F^2$ > $R^2$). The SL model explains 12.31% of the variance of the number of road accidents, but this increases to 44.08% when the 'state of mind' factor is taken into account. This suggests that the state of mind explains 31.77% (Ee = $R_F^2$ - $R^2$) of the variance of the number of road accidents.

Table 2: Estimated parameters, coefficient of determination for the ULFR and SL models and the effect of measurement error
[Table image missing in source]
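The effect of measurement error quoted for P1 is simply the difference of the two CODs. A minimal check, using only the two values stated in the text (the variable names are hypothetical):

```python
# CODs for driver's age (P1) as reported in the text.
r2_sl = 0.1231    # ordinary simple linear regression
r2_ulfr = 0.4408  # ULFR model with lambda = 1

# Effect of the measurement error ("state of mind"): E_e = R_F^2 - R^2.
effect = round(r2_ulfr - r2_sl, 4)
print(f"E_e = {effect:.2%}")
```

The same subtraction applies to the other explanatory variables once their CODs are available.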

Table 2 clearly shows that the ULFR model gives a better fit than the SL model, with a higher COD. The SL model indicates that variables P1, P3 and P5 do not contribute significantly to the road accident rates. This interpretation contradicts other studies (Arthur and Little, 1970; IATSS, 1988; OECD Road Research Group, 1978; OECD Scientific Expert Group, 1986), which found that age, gender and alcohol are important factors. However, when the state of mind is introduced, the three variables, especially P5, become important and may be used to explain the causal relationship of road accidents. For example, we say that younger drivers may be the cause of higher accident rates because of their driving attitude or tendency to take risks.

The introduction of the state of mind does not increase the COD of variables P2 and P4 significantly. This suggests that the driver's race and experience are two exact predictors of road accidents, that is, predictors measured without appreciable error. On the other hand, both $R_F^2$ and $R^2$ indicate that race and driving experience do not significantly explain road accidents. Hence, only variables P1, P3 and P5 are recommended for further analysis such as multiple regression.

CONCLUSION AND FURTHER WORK

In this study, the COD of the ULFR model is proposed and its properties are investigated. The proposed COD and the COD of the ordinary linear regression model are used to investigate the effect of measurement error and hence to try to explain the causal relationship in the model. The comparison is a useful way to check whether a variable can reasonably be assumed to be independent, that is, measured without error. This study of the Malaysian road accident problem demonstrates that the effect of the state of mind, indicated by the measurement error, is clear but differs across factors.

However, confirmation of the causal properties of the ULFR model may be further strengthened by studying the distribution of $R_F^2$. The multivariate version of the ULFR model may also be considered in further studies.

REFERENCES
1:  Anderson, T.W., 1984. The 1982 Wald memorial lectures: Estimating linear statistical relationships. Ann. Stat., 12: 1-18.

2:  Arthur, D. and I.N.C. Little, 1970. The State of the Art of Traffic Safety: A Comprehensive Review of Existing Information. Praeger Publishers, New York, pp: 38-42.

3:  Box, G.E.P. and D.R. Cox, 1964. An analysis of transformations. J. Roy. Stat. Soc., 26: 211-252.

4:  Dolby, G.R., 1976. The ultrastructural relation: A synthesis of the functional and structural relations. Biometrika, 63: 39-50.

5:  Forbes, T.W., 1972. Human Factors in Highway Traffic Safety Research. John Wiley and Sons, Canada.

6:  Fuller, W.A., 1987. Measurement Error Models. Wiley, New York.

7:  IATSS., 1988. Statistics '87/'88: Road accidents Japan, Tokyo. Traffic Bureau, National Police Agency.

8:  Kendall, M.G. and A. Stuart, 1979. The Advanced Theory of Statistics. 2nd Edn., Griffin, London.

9:  Leeming, J.J., 1969. Road Accident: Prevent or Punish? Cassell and Company Ltd., London.

10:  Montgomery, D.C. and E.A. Peck, 1992. Introduction to Linear Regression Analysis 2nd Edn., John Wiley and Sons Inc., New York.

11:  OECD Road Research Group, 1978. New research on the role of alcohol and drugs in road accident. A Report, Paris: OECD.

12:  OECD Scientific Expert Group, 1986. Guidelines for improving the safety of elderly road users. Paris: OECD.

13:  Polis Diraja Malaysia (PDRM), 1995-1998. Laporan tahunan cawangan traffik bukit aman. Kuala Lumpur: PDRM.

14:  Umar, R.S.R., M.G, Mackay and B.L, Hill, 1995. Preliminary analysis of exclusive motorcycle lanes along the federal highway FO2, Shah Alam, Malaysia. J. IATSS Res., 19: 93-98.

15:  Radin, U.R.S., 1996. Model kematian dan kecederaan di malaysia unjuran tahun 2000. Kertas Dasar Keselamatan Jalan Raya, Serdang: Kementerian Pengangkutan dan Universiti Putra Malaysia.

16:  Razali, A.M. and B.H. Dahalan, 1992. Analysis Data Kamalangan Jalan Raya. Jabatan Statistik Universiti Kebangsaan Malaysia, Bangi.

17:  Sprent, P., 1990. Some history of functional and structural relationships. Contemporary Math., 112: 3-15.
