The Monte Carlo method provides approximate solutions to a variety of mathematical problems (Bauer, 1958). As is well known, the method has three components: the simulation of random variables with known distributions, the construction of probability models for real processes, and problems of statistical estimation theory (Rubenstein, 1981). The basic ideas behind the method are the law of large numbers and the central limit theorem (Ermakov, 1971; Bevrani, 2003; Gentle, 2004). In both cases, however, the required sample size is left unspecified. A frequent question is whether the available statistical data are sufficient for the inference drawn from them to be accurate and reliable, in other words, whether the available sample is representative. This is a rather general problem. The purpose of this article is therefore to estimate the sample size required by the Monte Carlo method.
THE MONTE-CARLO METHOD
Suppose we need to compute approximately a quantity I by the Monte Carlo method. We then need to find a random variable U whose mathematical expectation equals I: EU = I.
Consider (n+1) independent identically distributed random variables U, U1, U2, ..., Un with finite second moments. It then follows from the central limit theorem that, as n → ∞,

$$P\left(\frac{\sum_{i=1}^{n} U_i - nI}{\sqrt{n\,DU}} < x\right) \to \Phi(x), \tag{1}$$

where Φ(x) is the standard normal distribution function.
This relation means that, given a sufficiently large number of observations U1, U2, ..., Un, the required quantity can be approximated as follows:

$$I \approx I_n^* = \frac{1}{n}\sum_{i=1}^{n} U_i. \tag{2}$$

Thus, with probability close to 0.998, the error of this approximation does not exceed about $3\sqrt{DU/n}$.
It is easy to see that $E I_n^* = I$.
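As an illustration, the estimator $I_n^*$ can be sketched in a few lines of Python. The integrand, the sample size, and the function names below are assumptions chosen for the example and do not come from the article: the sketch estimates $I = \int_0^1 x^2\,dx = 1/3$ by taking $U = X^2$ with X uniform on (0, 1).

```python
import random


def monte_carlo_mean(sample_u, n, seed=0):
    """Return I*_n, the mean of n independent draws of U.

    sample_u is a function that takes a random.Random instance and
    returns one realization of U (so that E U = I).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return sum(sample_u(rng) for _ in range(n)) / n


def square_of_uniform(rng):
    """Illustrative choice of U: X**2 with X uniform on (0, 1), so E U = 1/3."""
    x = rng.random()
    return x * x


estimate = monte_carlo_mean(square_of_uniform, n=100_000)
print(f"I*_n = {estimate:.4f}, exact I = {1 / 3:.4f}")
```

Passing the sampler as a function keeps the averaging step independent of the particular problem, which mirrors the separation between simulating U and estimating I.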
ESTIMATION OF SAMPLE SIZE
Consider now the accuracy of the approximation $I_n^* \approx I$. Unlike deterministic (nonrandom) schemes, the analysis of random data requires more than one parameter to describe the accuracy, because the event $\{|I_n^* - I| < \varepsilon\}$ is random for any $\varepsilon \in (0, 1)$: it may occur for one sample and fail to occur for another. Therefore, alongside the parameter ε describing the accuracy, we introduce one more parameter $\gamma \in (0, 1)$, the confidence of the statistical inference. We require that the probability of the indicated event be not less than γ, that is,

$$P\left(|I_n^* - I| < \varepsilon\right) \ge \gamma. \tag{3}$$
Clearly, ε should be close to zero and γ should be close to one, since γ characterizes our confidence in the correctness of the inference.
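The role of the two parameters can be checked empirically. The following sketch estimates the probability of the event $|I_n^* - I| < \varepsilon$ by repeating the whole experiment many times. The choice of U (uniform on (0, 1), so I = 0.5), the number of repetitions, and the function name `coverage` are illustrative assumptions.

```python
import random


def coverage(n, eps, trials=500, seed=1):
    """Estimate P(|I*_n - I| < eps) for U uniform on (0, 1), where I = 0.5."""
    rng = random.Random(seed)
    true_value = 0.5
    hits = 0
    for _ in range(trials):
        # One independent Monte Carlo experiment with n observations.
        estimate = sum(rng.random() for _ in range(n)) / n
        if abs(estimate - true_value) < eps:
            hits += 1
    return hits / trials


for n in (10, 100, 1000):
    print(n, coverage(n, eps=0.05))
```

For a fixed ε, the observed frequency grows with n, and the sample-size problem is to find the n at which it reaches the prescribed γ.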
We now turn to the estimation of the sample size. We start with the traditional approaches, which use Chebyshev's inequality and the central limit theorem. We then consider more accurate estimates that take into account the error of the normal approximation. These estimates are based on the Berry-Esseen inequality and its sharper analogue for the case of smooth distributions.
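Solution based on Chebyshev's inequality: By Chebyshev's inequality,

$$P\left(|I_n^* - I| \ge \varepsilon\right) \le \frac{D I_n^*}{\varepsilon^2}. \tag{4}$$

Hence, condition (3) is satisfied if $D I_n^* / \varepsilon^2 \le 1 - \gamma$. Denote $DU = \sigma^2$; then $D I_n^* = \sigma^2/n$, and a lower bound for the number of observations is

$$n \ge \frac{\sigma^2}{(1-\gamma)\,\varepsilon^2}. \tag{5}$$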
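A minimal sketch of this computation, assuming the Chebyshev bound takes the usual form $n \ge \sigma^2 / ((1-\gamma)\varepsilon^2)$. The function name and the parameter values are illustrative.

```python
import math


def chebyshev_sample_size(sigma2, eps, gamma):
    """Smallest integer n with n >= sigma2 / ((1 - gamma) * eps**2)."""
    if not (0.0 < gamma < 1.0) or eps <= 0.0 or sigma2 < 0.0:
        raise ValueError("need 0 < gamma < 1, eps > 0 and sigma2 >= 0")
    return math.ceil(sigma2 / ((1.0 - gamma) * eps ** 2))


n = chebyshev_sample_size(sigma2=1.0, eps=0.01, gamma=0.95)
# Close to 200000; the last digit may be affected by floating-point rounding.
print(n)
```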
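Solution based on the central limit theorem: As is well known, Chebyshev's inequality is rather rough, so using the central limit theorem (CLT) instead gives hope that the estimates of the necessary sample size and the corresponding accuracy will be more optimistic. The CLT implies that, for sufficiently large n,

$$P\left(|I_n^* - I| \ge \varepsilon\right) \approx 2\left(1 - \Phi\left(\frac{\varepsilon\sqrt{n}}{\sigma}\right)\right). \tag{6}$$

Taking into account the requirement on the confidence of our inference, probability (6) should be no more than 1 − γ:

$$2\left(1 - \Phi\left(\frac{\varepsilon\sqrt{n}}{\sigma}\right)\right) \le 1 - \gamma,$$

whence, by the definition of the α-quantile $z_\alpha$ of the standard normal law, we obtain

$$n \ge \frac{z_{(1+\gamma)/2}^2\,\sigma^2}{\varepsilon^2}. \tag{7}$$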
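The CLT-based bound can be sketched in the same way, assuming it has the form $n \ge z_{(1+\gamma)/2}^2\,\sigma^2/\varepsilon^2$. The standard normal quantile is taken from Python's `statistics.NormalDist`; the function name and the parameter values are illustrative.

```python
import math
from statistics import NormalDist


def clt_sample_size(sigma2, eps, gamma):
    """Smallest integer n with n >= z**2 * sigma2 / eps**2, z = z_{(1+gamma)/2}."""
    if not (0.0 < gamma < 1.0) or eps <= 0.0 or sigma2 < 0.0:
        raise ValueError("need 0 < gamma < 1, eps > 0 and sigma2 >= 0")
    z = NormalDist().inv_cdf((1.0 + gamma) / 2.0)  # standard normal quantile
    return math.ceil(z ** 2 * sigma2 / eps ** 2)


n = clt_sample_size(sigma2=1.0, eps=0.01, gamma=0.95)
print(n)
```

Unlike the Chebyshev bound, this value rests on the normal approximation and is therefore not a guaranteed bound for finite n.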
It is easy to see that estimates (5) and (7) differ only in the factors $(1-\gamma)^{-1}$ and $z_{(1+\gamma)/2}^2$. For example, let γ = 0.95. Then condition (5) requires the ratio $n\varepsilon^2/\sigma^2$ to be not less than 20, while condition (7) requires only about 3.84 ($z_{0.975} = 1.96$), which is more than five times better. Thus the CLT yields more optimistic estimates. However, the apparent advantage of the CLT-based solution should not make us less vigilant. Chebyshev's inequality gives rough but absolutely correct, guaranteed estimates for the sample size and the accuracy. The CLT-based solution, in contrast, relies on the approximate equality (6), which itself introduces an error into the inference. In the following section we correct this shortcoming.
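The comparison of the two factors can be reproduced numerically. The sketch below evaluates $(1-\gamma)^{-1}$ and $z_{(1+\gamma)/2}^2$ for a few confidence levels. The levels other than 0.95 and the function name are illustrative assumptions.

```python
from statistics import NormalDist


def sample_size_factors(gamma):
    """Return the factors multiplying sigma**2 / eps**2 in the two bounds."""
    chebyshev = 1.0 / (1.0 - gamma)
    z = NormalDist().inv_cdf((1.0 + gamma) / 2.0)  # quantile z_{(1+gamma)/2}
    clt = z ** 2
    return chebyshev, clt


for gamma in (0.9, 0.95, 0.99):
    cheb, clt = sample_size_factors(gamma)
    print(f"gamma={gamma}: Chebyshev {cheb:.2f}, CLT {clt:.2f}, ratio {cheb / clt:.2f}")
```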
Solutions which take into account the accuracy of the normal approximation: The Berry-Esseen inequality, an estimate of the rate of convergence in the CLT, is well known in probability theory. It holds for an arbitrary distribution with a finite third moment.
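Assume that the random variable U has a finite third moment and denote $\beta_3 = E|U - I|^3$. Then, applying the Berry-Esseen inequality to estimate the accuracy of relation (6), we obtain

$$P\left(|I_n^* - I| \ge \varepsilon\right) \le 2\left(1 - \Phi\left(\frac{\varepsilon\sqrt{n}}{\sigma}\right)\right) + \frac{2 L_3}{\sqrt{n}}, \tag{8}$$

where $L_3 = C_0\beta_3/\sigma^3$ and $C_0$ is an absolute constant with the upper bound $C_0 < 0.7655$ (Shiganov, 1986; Korolev and Shevtsova, 2006). Thus, a more accurate estimate of the sample size is the smallest n satisfying

$$2\left(1 - \Phi\left(\frac{\varepsilon\sqrt{n}}{\sigma}\right)\right) + \frac{2 L_3}{\sqrt{n}} \le 1 - \gamma. \tag{9}$$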
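A corrected sample size of this kind has no closed form, but it can be found numerically. The sketch below assumes that the Berry-Esseen correction enters the tail bound as an additive term $2L_3/\sqrt{n}$, with $L_3 = C_0\beta_3/\sigma^3$; this form, the function names, and the parameter values are assumptions of the sketch rather than the article's stated procedure. Because the assumed bound decreases in n, the smallest admissible n can be located by doubling followed by bisection.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal distribution function


def tail_bound(n, eps, sigma, l3):
    """Assumed bound on P(|I*_n - I| >= eps): normal tail plus Berry-Esseen term."""
    normal_tail = 2.0 * (1.0 - _PHI(eps * math.sqrt(n) / sigma))
    return normal_tail + 2.0 * l3 / math.sqrt(n)


def berry_esseen_sample_size(eps, gamma, sigma=1.0, l3=0.7655):
    """Smallest n with tail_bound(n) <= 1 - gamma (the bound decreases in n)."""
    if not (0.0 < gamma < 1.0) or eps <= 0.0 or sigma <= 0.0 or l3 < 0.0:
        raise ValueError("need 0 < gamma < 1, eps > 0, sigma > 0 and l3 >= 0")
    target = 1.0 - gamma
    hi = 1
    while tail_bound(hi, eps, sigma, l3) > target:  # double until the bound holds
        hi *= 2
    lo = hi // 2  # the bound fails at lo (lo == 0 only when hi == 1)
    while hi - lo > 1:  # bisection on the integer n
        mid = (lo + hi) // 2
        if tail_bound(mid, eps, sigma, l3) <= target:
            hi = mid
        else:
            lo = mid
    return hi


print(berry_esseen_sample_size(eps=0.01, gamma=0.95))
```

Under these assumptions the result is never smaller than the plain CLT value for the same ε, γ and σ, since the extra term only adds to the normal tail.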
Results analysis: Let σ² = 1. Then the required sample size can easily be computed from relations (5), (7), (9) and (11). The results of these computations are shown in Tables 1 and 2 (Appendix).
Table 1 is constructed for ε = 0.001 and Table 2 for ε = 0.01. The top row contains the values of the confidence level γ; the second and third rows contain the sample sizes obtained using Chebyshev's inequality (Eq. 5) and the CLT (Eq. 7), respectively. The leftmost column contains the values of L3 (from 0.7655 to 2.1655 in steps of 0.1). We take this lower bound for L3 because, by Lyapunov's inequality, $\beta_3 \ge \sigma^3$ and therefore $L_3 \ge C_0$. The sample size is found at the intersection of the row with the appropriate value of L3 and the column with the required confidence level γ.
Table 1: Estimates of the sample size for ε = 0.001
Table 2: Estimates of the sample size for ε = 0.01
This research was supported by the Research Institute for Fundamental Sciences, Tabriz, Iran. The authors are grateful for this support.