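Consider a normally distributed random variable X with probability density function f(x) specified as
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-1)^2}{2\sigma^2}\right], \quad -\infty < x < \infty. \qquad (1)$$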
In many applications, the random variable X, which has an N(1, σ²) distribution, does not admit values less than or equal to zero. We must therefore disregard, or truncate, all values X ≤ 0 so as to restrict X to the admissible region X > 0.
Now, if values of X less than or equal to zero cannot be observed due to censoring or truncation, the resulting distribution is a left-truncated normal distribution with probability density function f*(x) given by Uche (2003) as
$$f^*(x) = \begin{cases} k\,f(x), & x > 0,\\ 0, & x \le 0. \end{cases} \qquad (2)$$
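The distribution (2) is said to be truncated to the left at zero; k is the normalizing constant, and its value is obtained by noting that
$$\int_0^\infty f^*(x)\,dx = 1, \qquad (3)$$
which gives k = 1/Φ(1/σ), since ∫₀^∞ f(x) dx = P(X > 0) = Φ(1/σ), where Φ denotes the standard normal c.d.f.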
Following Hald's conventions (Hald, 1952; Rohatgi, 1976), the distribution (2) that satisfies (3) can be written as
$$f^*(x) = \frac{f(x)}{\int_0^\infty f(t)\,dt}, \quad x > 0, \qquad (4)$$
where f(x) is as defined in (1). For purposes of generality, Eq. (4) can be restated in terms of the standard normal distribution (Cohen, 1991; Johnson and Kotz, 1970; Schneider, 1986). Note that Eqs. (2) and (4) are equivalent; all computations in this research are based on (2).
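As a quick numerical check of the restatement in terms of the standard normal, the sketch below writes the truncated density as f*(x) = φ((x − 1)/σ)/(σΦ(1/σ)) for x > 0, where φ and Φ are the standard normal p.d.f. and c.d.f.; this uses the fact that ∫₀^∞ f(x) dx = P(X > 0) = Φ(1/σ). It then integrates f*(x) with composite Simpson's rule. The function names and the upper cut-off 1 + 12σ are our own illustrative choices, not taken from the cited sources.

```python
import math

def std_normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z: float) -> float:
    """Standard normal c.d.f. Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def f_star(x: float, sigma: float) -> float:
    """N(1, sigma^2) density left-truncated at zero, written via the standard normal.

    The value at the single point x = 0 is the right-hand limit, i.e. f*(0).
    """
    if x < 0.0:
        return 0.0
    k = 1.0 / std_normal_cdf(1.0 / sigma)  # normalizing constant 1 / P(X > 0)
    return k * std_normal_pdf((x - 1.0) / sigma) / sigma

def integrate(g, a: float, b: float, n: int = 20_000) -> float:
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3.0

if __name__ == "__main__":
    for sigma in (0.1, 0.5, 1.0, 2.0):
        # The mass above 1 + 12*sigma is negligible, so it serves as "infinity".
        total = integrate(lambda x: f_star(x, sigma), 0.0, 1.0 + 12.0 * sigma)
        print(f"sigma = {sigma}: integral of f* over (0, inf) ~ {total:.8f}")
```

Each printed integral should be 1 to within the quadrature error, confirming that k = 1/Φ(1/σ) normalizes the truncated density for every σ.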
For a review of prior research on estimation procedures and algorithms for the parameters of the truncated normal distribution, see Johnson and Thomopoulos (2004). After examining the characteristics of the truncated population, Johnson and Thomopoulos (2004) also provided tables of the cumulative distribution function of the left-truncated normal distribution.
This research focuses on the characteristic parameters (mean and standard deviation) of the truncated population, with a view to obtaining positive values only. We investigate the restriction that must be placed on σ to achieve positive values. In descriptive time series analysis, the N(1, σ²) assumption is routinely placed on the residuals of multiplicative time series models, and the residuals are also assumed to be positive. The multiplicative model is linearized by taking a logarithmic transformation, so we also present a brief discussion of the logarithmic transformation of the left-truncated N(1, σ²) distribution.
MEAN AND VARIANCE OF THE TRUNCATED N(1, σ²) DISTRIBUTION
Using (2), we obtain
and Eq. (6) is a proper p.d.f. The expectation of X under (6) is
If we let ,
we obtain from (6) that
Using (7) and (10), we obtain
The following identities hold:
Using (13) and (14), we obtain
Using (8) and (11), we obtain
Finally, substituting (17) into (18), we obtain
Our derivation shows that the mean of the left-truncated N(1, σ²) distribution, Eq. (8), is expressed in terms of the cumulative distribution function (c.d.f.) of the standard normal, while the variance, Eq. (20), is expressed in terms of the c.d.f. of the standard normal and the c.d.f. of the chi-square distribution with one degree of freedom. Our result for the variance agrees with that of Barr and Sherrill (1999).
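For readers who want to check values numerically, the sketch below evaluates the truncated mean and variance with the standard closed form for a normal distribution truncated from below: with λ = φ(1/σ)/Φ(1/σ), E*(X) = 1 + σλ and var*(X) = σ²(1 − λ/σ − λ²). This inverse-Mills-ratio form is a swapped-in equivalent, not the chi-square-based expression of Eq. (20); the two should agree in value but not term by term. A Simpson's-rule integration of the truncated density is included as an independent cross-check. All function names are illustrative.

```python
import math

SQRT2PI = math.sqrt(2.0 * math.pi)

def std_normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / SQRT2PI

def std_normal_cdf(z: float) -> float:
    """Standard normal c.d.f. Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_mean_var(sigma: float) -> tuple[float, float]:
    """Closed-form mean and variance of N(1, sigma^2) left-truncated at zero."""
    a = 1.0 / sigma                               # distance from the cut to the mean, in sigma units
    lam = std_normal_pdf(a) / std_normal_cdf(a)   # inverse Mills ratio phi(1/sigma) / Phi(1/sigma)
    mean = 1.0 + sigma * lam
    var = sigma ** 2 * (1.0 - a * lam - lam ** 2)
    return mean, var

def truncated_mean_var_numeric(sigma: float, n: int = 20_000) -> tuple[float, float]:
    """Same quantities by Simpson's rule on f*(x) = phi((x - 1)/sigma) / (sigma * Phi(1/sigma))."""
    b = 1.0 + 12.0 * sigma                        # upper limit; the tail beyond is negligible
    h = b / n                                     # n is even, as Simpson's rule requires
    k = 1.0 / (sigma * std_normal_cdf(1.0 / sigma))
    m1 = m2 = 0.0
    for i in range(n + 1):
        x = i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)   # Simpson weights
        fx = k * std_normal_pdf((x - 1.0) / sigma)
        m1 += w * x * fx
        m2 += w * x * x * fx
    m1 *= h / 3.0
    m2 *= h / 3.0
    return m1, m2 - m1 ** 2

if __name__ == "__main__":
    for sigma in (0.3, 1.0, 2.0):
        print(sigma, truncated_mean_var(sigma), truncated_mean_var_numeric(sigma))
```

For σ = 1, for example, the closed form gives E*(X) ≈ 1.2876 and var*(X) ≈ 0.6297, and for small σ both quantities return to 1 and σ², which is the behavior examined in the next section.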
PROPERTIES OF THE MEAN AND VARIANCE OF THE TRUNCATED N(1, σ²) DISTRIBUTION
In most applications, the requirements are (i) f*(x) ≈ 0 for x ≤ 0, (ii) E*(X)
= E(X) and (iii) var*(X) = var(X). Our derivations have shown that all of these
depend on the parameter σ. To investigate these requirements, we look for the
values of σ for which (i) f*(0) ≈ 0, (ii) E*(X) - E(X) ≈ 0 and (iii) var(X)
- var*(X) ≈ 0.
Figure 1 gives the shape of the distribution curve for some
values of σ. For some values of σ, the curves for the original (non-truncated)
and the truncated populations coincide, while for other values of σ the two
curves are approximately the same except that the truncated curve admits only
values greater than zero. In general, the truncated values are always greater
than or equal to the non-truncated values for all values of σ. We would,
therefore, expect the two stochastic variables to behave alike in some interval
σ ∈ (0, b). The quantity b will be determined by use of the three requirements
stated above.
Fig. 1: The curve shapes of the left-truncated N(1, σ²) distribution
Table 1: Summary of results for f*(0), E*(X) - E(X) and var(X) - var*(X)
Our simulation results (all simulations in this paper were performed using
MINITAB) will help determine the level of accuracy required for the two stochastic
variables to behave alike in a prescribed interval.
Using Eqs. (5), (8) and (20), we computed these quantities for 0.01 ≤ σ ≤ 2.0; the results are summarized in Table 1.
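The three quantities reported in Table 1 all follow from the closed-form truncated-normal moments, so a grid of them can be regenerated with a short script. The sketch below is our own, not the MINITAB computation used in the paper. It assumes λ = φ(1/σ)/Φ(1/σ), so that f*(0) = λ/σ, E*(X) − E(X) = σλ and var(X) − var*(X) = σλ + σ²λ²; the function name `diagnostics` and the 0.05 grid step are illustrative.

```python
import math

def std_normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z: float) -> float:
    """Standard normal c.d.f. Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def diagnostics(sigma: float) -> tuple[float, float, float]:
    """Return f*(0), E*(X) - E(X) and var(X) - var*(X) for N(1, sigma^2) truncated at 0."""
    a = 1.0 / sigma
    lam = std_normal_pdf(a) / std_normal_cdf(a)   # phi(1/sigma) / Phi(1/sigma)
    f_star_0 = lam / sigma                        # k * f(0) = phi(1/sigma) / (sigma * Phi(1/sigma))
    mean_shift = sigma * lam                      # E*(X) - 1
    var_loss = sigma * lam + (sigma * lam) ** 2   # sigma^2 - var*(X)
    return f_star_0, mean_shift, var_loss

if __name__ == "__main__":
    print(f"{'sigma':>6} {'f*(0)':>12} {'E*-E':>12} {'var-var*':>12}")
    for k in range(5, 201, 5):                    # sigma = 0.05, 0.10, ..., 2.00
        sigma = k / 100
        f0, dm, dv = diagnostics(sigma)
        print(f"{sigma:6.2f} {f0:12.3e} {dm:12.3e} {dv:12.3e}")
```

All three columns are essentially zero for small σ and grow rapidly once σ approaches the σ < 0.30 region discussed below; the printed values are a cross-check, not a replacement for Table 1.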
From Table 1 and the three stated requirements, the desired interval is σ < 0.30. It follows from our analysis that the 0.001 probability limits may be used to give practical assurance that the truncated values from the N(1, σ²) distribution are all positive. The 0.001 probability limits (99.9% limits) are practically equivalent to 3.2905σ limits, and for the lower limit to be positive we require 1 - 3.2905σ > 0, that is, σ < 1/3.2905 ≈ 0.304. This implies σ < 0.30, with the truncated values lying between 0.01 and 1.99 (0.01 < X < 1.99). It is also clear from this analysis that the admissible region σ < 0.30 is needed to ensure that the mean and variance of the truncated distribution remain at the assumed values of 1 and σ², respectively.
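The arithmetic behind the σ < 0.30 bound can be reproduced with Python's standard library, as a minimal sketch: `statistics.NormalDist().inv_cdf` gives the two-sided 99.9% point z, and requiring the lower limit 1 − zσ to be positive gives σ < 1/z.

```python
from statistics import NormalDist

# Two-sided 0.001 probability limits: 0.0005 in each tail.
z = NormalDist().inv_cdf(1.0 - 0.001 / 2.0)
# The lower limit 1 - z*sigma must stay positive, so sigma < 1/z.
sigma_max = 1.0 / z

sigma = 0.30
lower, upper = 1.0 - z * sigma, 1.0 + z * sigma
print(f"z = {z:.4f}; sigma must be below {sigma_max:.4f}")
print(f"sigma = {sigma:.2f}: limits {lower:.2f} to {upper:.2f}")
```

This prints z = 3.2905 and a bound of about 0.3039; with σ = 0.30 the limits round to 0.01 and 1.99, matching the interval quoted in the text.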
TIME SERIES APPLICATION
A general multiplicative model for descriptive time series analysis would be of the form
$$X_t = T_t\,S_t\,C_t\,I_t, \qquad (21)$$
where, for time t, $X_t$ denotes the observed value of the series, $T_t$ the trend, $S_t$ the seasonal term, $C_t$ the cyclical term and $I_t$ the irregular or random component of the series.
Model (21) will be regarded as adequate when the irregular component is purely random. If only a short period of time is involved, the cyclical component is superimposed on the trend (Chatfield, 1980, p. 13), and we obtain a trend-cycle component denoted by $M_t$. For our purposes, the $I_t$, to be denoted by $e_t$, are independent, identically distributed normal errors with mean 1 and variance σ².
Model (21) can then be written as
$$X_t = M_t\,S_t\,e_t. \qquad (22)$$
Descriptive modeling of the multiplicative time series model (22) generally involves the logarithmic transformation. Upon linearization of (22), we obtain
$$\log_e X_t = \log_e M_t + \log_e S_t + \log_e e_t.$$
Note that (24) is the additive time series model. For model (24), the corresponding assumption on the random components $\log_e e_t$ is that they are independent, identically distributed normal errors with mean zero and variance σ².
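To make the linearization concrete, the sketch below builds a synthetic series from the multiplicative model X_t = M_t S_t e_t, using a hypothetical linear trend-cycle M_t, hypothetical quarterly seasonal factors S_t, and errors e_t drawn from N(1, σ²) with values ≤ 0 discarded. Taking logarithms recovers logₑ e_t exactly as the remainder logₑ X_t − logₑ M_t − logₑ S_t; the script then reports its sample mean and variance so they can be compared with zero and σ². The trend, seasonal factors, seed and σ = 0.05 are illustrative assumptions only.

```python
import math
import random
import statistics

def truncated_errors(n: int, sigma: float, rng: random.Random) -> list[float]:
    """Draw n errors from N(1, sigma^2), discarding any value <= 0 (left truncation at zero)."""
    out = []
    while len(out) < n:
        e = rng.gauss(1.0, sigma)
        if e > 0.0:
            out.append(e)
    return out

SEASONAL = [0.9, 1.1, 1.05, 0.95]                 # hypothetical quarterly factors S_t

def simulate_series(n: int = 4000, sigma: float = 0.05, seed: int = 7):
    """Hypothetical multiplicative series X_t = M_t * S_t * e_t."""
    rng = random.Random(seed)
    trend = [100.0 + 0.5 * t for t in range(n)]   # hypothetical trend-cycle M_t
    errors = truncated_errors(n, sigma, rng)
    x = [m * SEASONAL[t % 4] * e for t, (m, e) in enumerate(zip(trend, errors))]
    return x, trend, errors

def log_residuals(x: list[float], trend: list[float]) -> list[float]:
    """Additive form after logs: log X_t - log M_t - log S_t = log e_t."""
    return [math.log(xt) - math.log(m) - math.log(SEASONAL[t % 4])
            for t, (xt, m) in enumerate(zip(x, trend))]

if __name__ == "__main__":
    sigma = 0.05
    x, trend, errors = simulate_series(sigma=sigma)
    r = log_residuals(x, trend)
    print("sample mean of log e_t:", statistics.fmean(r))
    print("sample variance of log e_t:", statistics.pvariance(r), "vs sigma^2 =", sigma ** 2)
```

For a small σ such as 0.05, the mean of logₑ e_t is close to zero and its variance close to σ²; the question examined next is how large σ can be before this breaks down.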
It is evident that, for the left-truncated N(1, σ²) distribution given by (6), we need to investigate the characteristics of Y = logₑ X.
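The probability density function of Y is given by
$$f(y) = \frac{e^{y}}{\sigma\sqrt{2\pi}\,\Phi(1/\sigma)}\exp\left[-\frac{(e^{y}-1)^2}{2\sigma^2}\right], \quad -\infty < y < \infty, \qquad (26)$$
where Φ denotes the standard normal c.d.f.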
Equation (26) is a proper probability density function and its curve shapes for some values of σ are given in Fig. 2.
It is clear from Fig. 2 that the distribution of Y = logₑ X is skewed to the left, approaching a symmetric, bell-shaped form for small values of σ. For all values of σ, the curve f(y) has a single maximum point y0 with maximum value f(y0). To obtain the maximum point for a given value of σ, we solve f′(y) = 0. Now,
$$\frac{d}{dy}\log_e f(y) = 1 - \frac{e^{y}\,(e^{y}-1)}{\sigma^2},$$
so f′(y) = 0 exactly when this expression vanishes.
Fig. 2: Curve shapes of the logarithm of the left-truncated N(1, σ²) distribution
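If we let w = e^y, we obtain
$$w^2 - w - \sigma^2 = 0,$$
whose positive root w0 = [1 + √(1 + 4σ²)]/2 gives the maximum point y0 = logₑ w0.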
For our purposes (investigating bell-shaped conditions), we require y0 ≈ 0. Using the number of decimal places to set the level of accuracy required, we obtain the summary listed in the last column of Table 1. With an accuracy requirement of 2 decimal places, symmetric, bell-shaped conditions require σ ≤ 0.07; if the accuracy requirement is 1 decimal place, σ ≤ 0.23 suffices.
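The decimal-place criterion can be checked directly. Writing logₑ f(y) = y − (e^y − 1)²/(2σ²) + const, setting its derivative to zero and substituting w = e^y gives w² − w − σ² = 0, whose positive root yields y0 = logₑ{[1 + √(1 + 4σ²)]/2}. The sketch below evaluates y0 on a σ grid of step 0.01 (our assumption about the grid) and returns the largest σ for which y0 rounds to zero at a given number of decimal places; the function names are illustrative.

```python
import math

def mode_of_y(sigma: float) -> float:
    """Maximum point y0 of f(y): positive root of w^2 - w - sigma^2 = 0, with w = e^y."""
    w0 = (1.0 + math.sqrt(1.0 + 4.0 * sigma ** 2)) / 2.0
    return math.log(w0)

def largest_sigma(decimals: int) -> float:
    """Largest sigma on the grid 0.01, 0.02, ..., 2.00 for which y0 rounds to zero.

    y0 increases with sigma, so the admissible grid values form an initial run.
    """
    tol = 0.5 * 10.0 ** (-decimals)               # |y0| < tol  <=>  y0 rounds to zero
    grid = [k / 100 for k in range(1, 201)]
    return max(s for s in grid if abs(mode_of_y(s)) < tol)

if __name__ == "__main__":
    for d in (2, 1):
        print(f"{d} decimal place(s): sigma <= {largest_sigma(d):.2f}")
```

On this grid the sketch reports σ ≤ 0.07 for two decimal places (y0 ≈ 0.0049 at σ = 0.07) and σ ≤ 0.23 for one decimal place (y0 ≈ 0.049 at σ = 0.23), consistent with the values stated above.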
Bell-shaped frequency distributions will have
$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = 0 \quad\text{and}\quad \gamma_2 = \frac{\mu_4}{\mu_2^{2}} - 3 = 0, \qquad (33)$$
where $\mu_k = E[(Y-\mu_y)^k]$ is the kth moment about the mean μy of Y = logₑ X. Recall that γ1 is a measure of skewness and γ2 is a measure of kurtosis. Since (33) cannot easily be solved analytically, we resort to the Monte Carlo method.
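As a deterministic complement to the Monte Carlo approach (a swapped-in technique, not the method used in this paper), the population moments of Y can be approximated by numerically integrating the density f(y) = e^y φ((e^y − 1)/σ)/(σΦ(1/σ)) obtained from the change of variable y = logₑ x. The sketch below uses composite Simpson's rule; the integration limits −50 and logₑ(1 + 12σ), the grid size and all names are our own assumptions.

```python
import math

def density_y(y: float, sigma: float, norm: float) -> float:
    """Density of Y = log_e X for X ~ N(1, sigma^2) left-truncated at 0: e^y * f*(e^y)."""
    w = math.exp(y)
    z = (w - 1.0) / sigma
    return w * math.exp(-0.5 * z * z) / norm

def simpson(values: list[float], h: float) -> float:
    """Composite Simpson's rule for equally spaced samples (odd number of points)."""
    return h / 3.0 * (values[0] + values[-1]
                      + 4.0 * sum(values[1:-1:2]) + 2.0 * sum(values[2:-1:2]))

def moments_of_y(sigma: float, y_lo: float = -50.0, n: int = 100_000):
    """Return (mean, variance, gamma1, gamma2) of Y by numerical quadrature."""
    phi_at = 0.5 * (1.0 + math.erf(1.0 / (sigma * math.sqrt(2.0))))  # Phi(1/sigma) = P(X > 0)
    norm = sigma * math.sqrt(2.0 * math.pi) * phi_at
    y_hi = math.log(1.0 + 12.0 * sigma)           # upper tail beyond this is negligible
    h = (y_hi - y_lo) / n                         # n even -> n + 1 (odd) grid points
    ys = [y_lo + i * h for i in range(n + 1)]
    fs = [density_y(y, sigma, norm) for y in ys]
    total = simpson(fs, h)                        # close to 1 when the grid is adequate
    mean = simpson([y * f for y, f in zip(ys, fs)], h) / total
    mu2, mu3, mu4 = (simpson([(y - mean) ** k * f for y, f in zip(ys, fs)], h) / total
                     for k in (2, 3, 4))
    return mean, mu2, mu3 / mu2 ** 1.5, mu4 / mu2 ** 2 - 3.0

if __name__ == "__main__":
    for sigma in (0.07, 0.10, 0.23):
        mean, var, g1, g2 = moments_of_y(sigma)
        print(f"sigma={sigma:.2f}: mean={mean:+.5f} var/sigma^2={var / sigma**2:.4f} "
              f"gamma1={g1:+.3f} gamma2={g2:+.3f}")
```

The computed γ1 is negative and grows in magnitude with σ, reflecting the left skew seen in Fig. 2; such population values could serve as a reference when reading the sampled γ1 and γ2 in Tables 2 and 3.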
Sample estimates of the moments about the mean are given by
$$m_k = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^k,$$
where n is the sample size. Artificial data were generated from the N(1, σ²) density function for the variable X, and the X values obtained were then transformed by Y = logₑ X, for 0.07 ≤ σ ≤ 0.23. For each configuration (n = 100, 0.07 ≤ σ ≤ 0.23), 1000 replications were performed. For want of space, we show the results of the first 30 replications for two configurations: (n = 100, σ = 0.07) in Table 2 and (n = 100, σ = 0.23) in Table 3.
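The simulation design described above can be sketched in a few lines of plain Python. This is our own illustration, not the MINITAB procedure used in the paper: X is drawn from N(1, σ²) with values X ≤ 0 discarded, Y = logₑ X, and each replication reports the moment-ratio estimates g1 = m3/m2^{3/2} and g2 = m4/m2² − 3 built from the 1/n moments about the mean (an assumption about the estimator). The seed is arbitrary, so the replications will not reproduce Tables 2 and 3.

```python
import math
import random

def draw_truncated(rng: random.Random, sigma: float) -> float:
    """One draw from N(1, sigma^2) left-truncated at zero (reject X <= 0)."""
    while True:
        x = rng.gauss(1.0, sigma)
        if x > 0.0:
            return x

def skew_kurt(ys: list[float]) -> tuple[float, float]:
    """Moment-ratio estimates g1 = m3 / m2^1.5 and g2 = m4 / m2^2 - 3 (1/n moments)."""
    n = len(ys)
    ybar = sum(ys) / n
    m2 = sum((y - ybar) ** 2 for y in ys) / n
    m3 = sum((y - ybar) ** 3 for y in ys) / n
    m4 = sum((y - ybar) ** 4 for y in ys) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def simulate(sigma: float, n: int = 100, reps: int = 1000, seed: int = 12345):
    """Return a list of (g1, g2) pairs for Y = log_e X over `reps` replications."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        ys = [math.log(draw_truncated(rng, sigma)) for _ in range(n)]
        results.append(skew_kurt(ys))
    return results

if __name__ == "__main__":
    for sigma in (0.07, 0.23):
        res = simulate(sigma)
        mean_g1 = sum(g1 for g1, _ in res) / len(res)
        mean_g2 = sum(g2 for _, g2 in res) / len(res)
        print(f"sigma={sigma:.2f}: average g1={mean_g1:+.3f}, average g2={mean_g2:+.3f}")
        for g1, g2 in res[:3]:                    # first few replications
            print(f"    g1={g1:+.3f}  g2={g2:+.3f}")
```

Averaging over the 1000 replications smooths out sampling noise; individual replications, as in Tables 2 and 3, can still fall on either side of the critical points.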
We now consider the significance of skewness and kurtosis for samples of size
n = 100. Note that γ1 and γ2 measure nonnormality on the basis of the
skewness and kurtosis of the distribution, respectively. The 0.05 and 0.01
points of the sampling distributions of γ1 and γ2 are given in Duncan (1974)
as Tables F and G, respectively. If a sample value of γ1 or γ2 lies beyond
the 0.05 point for that statistic (a stricter test would use the 0.01 point),
the population may be deemed nonnormal. For samples of n = 100, the upper
0.05 point for γ1 is 0.389 and the upper 0.01 point is 0.567.
Table 2: Simulation results for skewness (γ1) and kurtosis (γ2) when σ = 0.07
Table 3: Simulation results for skewness (γ1) and kurtosis (γ2) when σ = 0.23
Similarly, the 0.05 points for γ2 are -0.65 for γ2 < 0 and 0.77 for
γ2 > 0, while the 0.01 points are -0.82 for γ2 < 0 and 1.39 for γ2 > 0.
Note that a population distribution can have γ1 = 0 and γ2 = 0 without
being normal (Shewhart, 1931). Sample values close to these normal values
therefore do not prove normality; they merely do not lead to rejection of
the normal hypothesis.
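The decision rule above can be written as a small function that takes the critical points for n = 100 quoted in the text. One assumption is ours: the γ1 point is applied symmetrically, comparing |γ1| with the upper point, so that left skew (γ1 < 0, as expected for Y = logₑ X) is also flagged. The table name and function name are hypothetical, and the example pairs in the usage block are made up rather than taken from Tables 2 or 3.

```python
# 0.05 and 0.01 points for n = 100 as quoted in the text (Duncan, 1974, Tables F and G).
CRITICAL_POINTS = {
    0.05: {"g1": 0.389, "g2_lower": -0.65, "g2_upper": 0.77},
    0.01: {"g1": 0.567, "g2_lower": -0.82, "g2_upper": 1.39},
}

def nonnormality_flags(g1: float, g2: float, level: float = 0.05) -> tuple[bool, bool]:
    """Return (skewness flag, kurtosis flag) for one sample of size n = 100.

    Assumption: |g1| is compared with the upper gamma1 point (symmetric use).
    A False flag means "no evidence against normality", not proof of normality.
    """
    c = CRITICAL_POINTS[level]
    skew_flag = abs(g1) > c["g1"]
    kurt_flag = g2 < c["g2_lower"] or g2 > c["g2_upper"]
    return skew_flag, kurt_flag

if __name__ == "__main__":
    # Hypothetical (g1, g2) pairs for illustration only.
    for g1, g2 in [(-0.20, 0.10), (-0.70, 0.40), (0.05, 1.00)]:
        print((g1, g2), "0.05:", nonnormality_flags(g1, g2),
              "0.01:", nonnormality_flags(g1, g2, 0.01))
```

The third example shows how the stricter 0.01 points can withdraw a flag raised at the 0.05 level.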
It is clear from Table 2 that when σ = 0.07, both X and Y = logₑ X can be assumed to be normally distributed, with var(X) ≈ var(Y). On the other hand, we infer from Table 3 that when σ = 0.23, Y = logₑ X cannot be assumed to be normal on the basis of the sample values of γ1 and/or γ2. Our search for normality of Y = logₑ X, for X ~ N(1, σ²) with 0.07 ≤ σ ≤ 0.23, showed that the sample value of γ1 indicates departure from normality from σ = 0.10 onward. We conclude that when X ~ N(1, σ²), the logarithmic transform Y = logₑ X can be assumed to be normally distributed with mean zero and variance σ², provided σ < 0.10.
In this research, we have examined some implications of truncating the N(1, σ²) distribution to the left at zero. We observed that the truncated values are always greater than or equal to the non-truncated values for all values of σ. However, the two stochastic variables behave alike in the interval σ < 0.30. It follows from our analysis that the 0.001 probability limits may be used to give practical assurance that the truncated (truncation at zero) values from the N(1, σ²) distribution are all positive. In the interval σ < 0.30, the truncated and the non-truncated variables have the same mean, equal to 1, and the same variance, equal to σ².
The most important implication of truncating the N(1, σ²) distribution to the left at zero is in descriptive modelling of time series data, where the logarithmic transform of the truncated distribution is likewise assumed to have mean zero and some finite variance. We found that, in the interval σ < 0.10, the logarithmic transform has mean zero and the same variance as both the original N(1, σ²) distribution and its truncated counterpart.
The results of our study suggest that the logarithmic transforms of the normal distribution are worthy of future research. Part of this research must include the left-truncated N(1, σ²) distribution, which has important applications in descriptive time series studies. Further work is needed to compute the moments about the mean of the logarithmic transform analytically. However, our results suggest that one should exercise extreme caution with respect to the value of σ when linearizing multiplicative models in descriptive time series analysis.