INTRODUCTION
Meta-analysis can be a useful tool in research experiments where the data are directly affected by natural events. Examples include environmental research, forestry inventory control, seafood harvest management, resource-intensive or very costly natural experiments and other similar problems. For the purpose of our discussion, we use the example of environmental research.
Many investigators, including Cuthbertson^{[1]}, have highlighted the value of metadata and meta-analysis in environmental studies. For example, it is worth quoting from Hedges and Olkin^{[2]}: “Finally, some notice might be in order regarding the usefulness of meta-analysis in environmental research. Using meta-analysis a researcher can ascertain which issues are likely to reward additional inquiry. In addition, at the peer-review level, meta-analysis can be used to synthesize apparently conflicting results into a unified corpus of knowledge”. Meta-analysis basically deals with techniques for synthesizing statistical inferences from two different/independent studies, possibly separated spatially and/or carried out at two different points of time. One of the areas currently receiving intensive attention from researchers in this field is the handling of studies which might be dependent, possibly also with missing data^{[3,4]}.
From the metadata perspective, we would almost always have, at the meta-level, temporal and spatial gaps in the data. Because environmental system-parameters are changing fast in the present decade (by comparison with, say, the years of the preceding decade), time-series methods might not be adequate if we wish to have synthetic data for the relevant system variables. By contrast, using the preceding decade’s metadata through an ancillary/auxiliary-variable setup might well capture this fast-changing pattern more adeptly than time-series methods. In addition, there might be many spatial gaps: underdeveloped and developing countries often have insufficient resources to monitor many of the environmental system-variables, and for some variables data may not be available at all. To fill these gaps, small-scale surveys may be conducted by international agencies. However, some of the variables may continue to be missing for practical reasons, such as the lack of suitable infrastructure for the experimental work.
Therefore, the same strategy of using information on ancillary/auxiliary variables would be appropriate for those missing variables. It is necessary to devise an estimation strategy which exploits the high correlation (positive or negative) between the study variables and the ancillary/auxiliary variables. Two main facts should be noted in this context. Firstly, the usable data on such auxiliary/ancillary variables may not be extensive, because the available data might have high dispersion temporally and/or spatially. It might therefore not be advisable to use more than, say, ten to twenty years’ data, owing to the rapidly changing environmental system. Similarly, owing to the high spatial variability of the data, it could be equally undesirable to use data on the relevant auxiliary variable from more than ten, or possibly fewer, countries or sites in a similar environmental zone where such data are available. Secondly, owing to the system-dynamics of the environment and its rapidly changing patterns (spatially and temporally), the dispersion of the relevant metadata on such auxiliary/ancillary variables would be relatively very high. This has to be tackled statistically, inasmuch as the variances of the subgroups (spatial/temporal) of data on such variables should not differ highly significantly if they are to be synthesized in a metadata setup.
Data on environmental variables are usually collected using small samples, since larger samples would be very expensive. In some instances, if the study continues for a longer period, the information on certain environmental variables becomes even more expensive. Especially in developing countries, it is very common to face missing information (spatial and/or temporal) on environmental variables because the resources available to monitor them are limited. To overcome these difficulties, we might be better off if we tackle the gaps in the data using a metadata setup, by searching for highly correlated (but relatively dispersed) variables that can serve as auxiliary/ancillary information for the study variable.
EXAMPLE APPLICATION
To illustrate, survey results for 19 landslide-damaged sites are reported by Reddy and Singh^{[5]}. Eleven of these sites were in the Oak-zone and 8 were in Pine or mixed-Pine zones. Vegetation analysis was carried out for 10 one meter-squared quadrants. Biomass of the herb layer at the peak growth stage (i.e. in the last week of August) was determined for 5 soil monoliths (25 × 25 × 30 cm (deep)) excavated randomly. Soil samples were collected in triplicate from each site, from the 0-10 cm soil depth. Statistical analysis was done via the analysis of variance and nonlinear regressions. We note that all the data were collected using small samples. Here the variable X (age of site in years) was found to be a good auxiliary variable. The relative dispersion of X was high: C_{X} = 0.98 for the Pine-zone and C_{X} = 1.09 for the Oak-zone. This was almost double the relative dispersion of one of the important study variables, namely A:P (Annual to Perennial (Herb-species) Ratio), say Y, with C_{Y} = 0.48 for the Pine-zone and C_{Y} = 0.57 for the Oak-zone. The corresponding values of C_{Y} for the other study variables in these two zones were (0.18, 0.28), (0.10, 0.19), (0.50, 0.42) and (0.62, 0.61) when Y ≡ Species, Annuals, Perennial and Cover, respectively, and were as high as 0.81 and 0.90 when Y ≡ Density. Reddy and Singh^{[5]} reported many pairs of variables with a significantly high correlation: ±0.6 to as high as 0.972. For the pair relevant to our example, the coefficient of correlation between the study variable Y (i.e. A:P Ratio) and the auxiliary variable X (i.e. age of site) is fairly significant, as reported in the paper: 0.718 for the Pine-zone and –0.420 for the Oak-zone.
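The two summary quantities used in this example, the relative dispersion (coefficient of variation, C) and the coefficient of correlation (ρ), can be computed from a small sample as in the following minimal sketch. The function names and the data values are hypothetical and serve only as an illustration; they are not the Reddy and Singh data.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Relative dispersion C = sample standard deviation / sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


def pearson_correlation(xs, ys):
    """Sample coefficient of correlation between paired observations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for 8 sites (illustration only, not published data).
site_age = [1, 2, 4, 6, 13, 21, 40, 60]                      # X: age of site, years
ap_ratio = [0.90, 0.80, 0.85, 0.60, 0.50, 0.45, 0.30, 0.35]  # Y: A:P ratio

if __name__ == "__main__":
    print("C_X =", round(coefficient_of_variation(site_age), 3))
    print("C_Y =", round(coefficient_of_variation(ap_ratio), 3))
    print("rho =", round(pearson_correlation(site_age, ap_ratio), 3))
```

A widely dispersed X together with a sizeable |ρ| is the situation the text describes as favorable for using X as an auxiliary variable.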
Thus, in the context of environmental database management in the metadata setup, and in the light of the aforesaid facts, we need to devise (for small samples) an estimation strategy for transporting the ancillary information contained in an auxiliary variable (X). This variable X would have a significant correlation with the study variable (Y), whereas the X-data might well be much more dispersed than the Y-data.
THE ESTIMATORS
The estimation strategies proposed in this study are motivated by the mixing-type estimation of Vos^{[6]}. Sahai^{[7]} used auxiliary information efficiently in his mixing-type estimators. The same goal of efficiency is addressed here using auxiliary information (when the sample is small, i.e. n < 30) for the problem of estimating the population average of the study variable. Subsequently, the estimate can be used in the aggregation or disaggregation of metadata. In practice, it is possible to get hold of such an auxiliary variable whose population mean (e.g. the population average of the age (in years) of the Oak-zone/Pine-zone sites in the preceding illustration) is either known or can be obtained from past information.
With the above motivation, the following families of estimators are proposed. Each of these two families of estimators involves two non-stochastic design-parameters, namely α and β:
and
where x̄ and ȳ are the sample means of the auxiliary and study variables, respectively, and X̄ is the population mean of the auxiliary variable. The choice of appropriate values for the two design-parameters is governed by their roles in minimizing the first-order approximation (up to terms of O(n^{-1})) and the second-order approximation (up to terms of O(n^{-2})) to the standard error of the relevant estimators. Further, the use of estimators in these families is highly recommended provided the absolute value of the quantity within the square brackets of Eq. 1 and 2 (i.e., the perturbators for ȳ) does not deviate from unity, on either side, by more than 30 percent of the quantity (1 + 2ρ^{2}), where ρ is the coefficient of correlation between the two variables X and Y. We may note that the term (1 + 2ρ^{2}) is the leading multiplier of the second-order approximation of the mean square error (Eq. 10 and 11). The simulation study showed that the perturbators fall outside these bounds in only 3% to 9% of the cases (depending on the value of ρ).
It may be noted that the mixing estimators of Vos^{[6]} are:
and
wherein the component estimators are the well-known product and quotient estimators, respectively, and α is the design-parameter that minimizes the first-order approximation (O(n^{-1})) to the standard error of the estimator.
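For orientation, the two component estimators named above can be sketched in their standard textbook forms: the product estimator multiplies the sample mean of Y by x̄/X̄, and the quotient (ratio) estimator multiplies it by X̄/x̄. The function and argument names below are illustrative assumptions, and the sketch covers only these two classical components; it does not attempt the mixing formulas of Vos or the proposed families.

```python
def product_estimator(y_bar, x_bar, pop_mean_x):
    """Classical product estimator of the population mean of Y.

    Usually preferred when X and Y are negatively correlated.
    """
    return y_bar * x_bar / pop_mean_x


def quotient_estimator(y_bar, x_bar, pop_mean_x):
    """Classical quotient (ratio) estimator of the population mean of Y.

    Usually preferred when X and Y are positively correlated.
    Note: it is unstable when x_bar is close to zero.
    """
    return y_bar * pop_mean_x / x_bar


if __name__ == "__main__":
    # Hypothetical sample means and a hypothetical known population mean of X.
    y_bar, x_bar, pop_mean_x = 5.4, 6.6, 6.0
    print(product_estimator(y_bar, x_bar, pop_mean_x))
    print(quotient_estimator(y_bar, x_bar, pop_mean_x))
```

Both functions return the plain sample mean of Y when x̄ equals X̄, which is a quick sanity check on any implementation.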
On the other hand, structurally, the proposed families of estimators result from marrying the perspective of the one-parameter family of estimators ȳ_{sa}^{[7]} with that of ȳ_{sr}^{[8]}. These are, respectively:
where, in both cases, the value of the design-parameter α is chosen, as in the case of the estimators of Vos^{[6]}, so that it minimizes the first-order approximation (O(n^{-1})) to the standard error of the respective estimators. It is worth noting that the proposed families of estimators inherit the perspective of a version of ȳ_{sr}^{[8]} and that a fractional α is not computationally favorable to numerical exponential approximation.
Another such estimator with a similar perspective has been given by Reddy^{[9]}:
Note that the above estimator is embedded in the proposed family of estimators ȳ_{αβ}(1) with α = 0, i.e. it is a one-parameter subfamily without the facility of exploiting a second degree of freedom. The single parameter β then controls both the first- and second-order approximations to the mean square error.
Using results for the bivariate normal population^{[10]}, we obtain the following expressions for the first- and second-order approximations to the Mean Square Error (MSE) of the relevant estimators (MSE_{1}(.) and MSE_{2}(.), respectively):
and
where k = ρ(C_{y}/C_{x}). By minimizing MSE_{1}(.) and MSE_{2}(.) in Eqs. 8 to 11, we get:
To use these optimal values of the design-parameters, we are constrained by the lack of knowledge of ρ and k, where k is a function of ρ and C_{y}/C_{x} (Eq. 12 and 13). However, close guesses of ρ and C_{y}/C_{x} are usually available from past data or from long association with the experimental setup, and these yield a value for k.
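As a small worked illustration of obtaining a value for k from guessed quantities, the sketch below evaluates k = ρ(C_{y}/C_{x}) using the correlation and relative-dispersion figures quoted earlier for the A:P ratio (Y) and the age of site (X). Treating those published figures as the "guesses" is an assumption made here for illustration; the function name is hypothetical.

```python
def guess_k(rho, c_y, c_x):
    """k = rho * (C_y / C_x), from guessed rho and relative dispersions."""
    return rho * c_y / c_x


# Figures quoted in the example application (A:P ratio vs. age of site).
k_pine = guess_k(rho=0.718, c_y=0.48, c_x=0.98)    # Pine-zone
k_oak = guess_k(rho=-0.420, c_y=0.57, c_x=1.09)    # Oak-zone

if __name__ == "__main__":
    print(f"Pine-zone k = {k_pine:.4f}")   # prints 0.3517
    print(f"Oak-zone  k = {k_oak:.4f}")    # prints -0.2196
```

Because C_{X} is roughly twice C_{Y} in both zones, |k| stays well below |ρ|.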
An extensive simulation study was carried out to study the sensitivity (robustness) of the estimators in the proposed families to relative errors in guessing k, and to study their relative efficiencies, in order to discover the appropriate estimation strategy to recommend in a practical situation.
Table 1: Relative efficiency for different ratio-product estimators
Table 2: Relative efficiency for different ratio-product estimators
Table 3: Relative efficiency for different ratio-product estimators
Table 4: Relative efficiency for different ratio-product estimators
Table 5: Relative efficiency for different ratio-product estimators

RESULTS AND DISCUSSION
For the simulation study, we consider two independent normal populations with μ_{Y} = 6, σ_{Y} = 6 and μ_{X} = 6, σ_{X} = 12, so that C_{X} = 2C_{Y}. We generate the respective bivariate normal populations using ten different values of ρ: ±0.1, ±0.3, ±0.5, ±0.7 and ±0.9, for three different sample sizes: n = 5, 10 and 20.
Also, to examine the robustness of the estimators (1) and (2), we carry out a sensitivity analysis using nine cases of the relative error in guessing k by a value g, say:
REG(k) = [(g − k)/k] × 100%
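The relative error in guessing k, read here as the percentage deviation of the guess g from k, can be sketched as follows, together with the nine error levels (0, ±5%, ±10%, ±15%, ±20%) used in the sensitivity analysis. The helper names are hypothetical, and the value of k in the usage lines is an assumed example: with C_{X} = 2C_{Y}, as in the simulation setup, k = ρ/2.

```python
def reg_k(g, k):
    """Relative error (in percent) of the guess g for the true value k."""
    return (g - k) / k * 100.0


def guess_from_reg(k, reg_percent):
    """Guess g that corresponds to a given relative error (in percent)."""
    return k * (1.0 + reg_percent / 100.0)


# The nine relative-error cases used in the sensitivity analysis.
REG_LEVELS = [-20, -15, -10, -5, 0, 5, 10, 15, 20]

if __name__ == "__main__":
    k_true = 0.7 / 2.0  # assumed example: rho = 0.7 and C_X = 2 * C_Y
    for level in REG_LEVELS:
        g = guess_from_reg(k_true, level)
        print(f"REG(k) = {level:+d}%  ->  g = {g:.4f}")
```

Negative levels correspond to underguessing k and positive levels to overguessing it.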
We generated 5000 samples for each combination of n, ρ and REG(k) and calculated the actual MSE of all the estimators mentioned (Eqs. 1 to 7). These results are summarized in Tables 1-5. The tabulated values represent the relative efficiencies of the estimators.
Tables 1-5 are organized by the values of the coefficient of correlation (ρ), the sample size (n) and REG(k) = 0, ±5%, ±10%, ±15% and ±20%.
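The general shape of such a simulation can be sketched as below, under stated assumptions. The sketch uses the population parameters given above (μ_{Y} = 6, σ_{Y} = 6, μ_{X} = 6, σ_{X} = 12) and 5000 replications, but it does not implement the estimators of Eqs. 1 to 7: the sample mean and the classical product estimator stand in as examples. Defining relative efficiency as MSE(sample mean)/MSE(estimator) is also an assumption made here, and all function names are hypothetical.

```python
import math
import random

MU_Y, SIGMA_Y = 6.0, 6.0
MU_X, SIGMA_X = 6.0, 12.0   # so that C_X = 2 * C_Y


def draw_sample(n, rho, rng):
    """Draw n pairs (x, y) from a bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(MU_X + SIGMA_X * z1)
        ys.append(MU_Y + SIGMA_Y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return xs, ys


def sample_mean(y_bar, x_bar):
    """Baseline estimator: ignores the auxiliary variable."""
    return y_bar


def product_estimator(y_bar, x_bar):
    """Classical product estimator (stand-in, not one of Eqs. 1 to 7)."""
    return y_bar * x_bar / MU_X


def simulated_mse(estimator, n, rho, reps=5000, seed=1):
    """Average squared error of the estimator over `reps` simulated samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs, ys = draw_sample(n, rho, rng)
        estimate = estimator(sum(ys) / n, sum(xs) / n)
        total += (estimate - MU_Y) ** 2
    return total / reps


def relative_efficiency(estimator, n, rho, reps=5000, seed=1):
    """MSE(sample mean) / MSE(estimator), both on the same simulated samples."""
    baseline = simulated_mse(sample_mean, n, rho, reps, seed)
    return baseline / simulated_mse(estimator, n, rho, reps, seed)


if __name__ == "__main__":
    for n in (5, 10, 20):
        eff = relative_efficiency(product_estimator, n, rho=-0.9)
        print(f"n = {n:2d}, rho = -0.9, relative efficiency = {eff:.3f}")
```

Reusing the same seed for the baseline and for the estimator under study means both are evaluated on identical samples, which reduces the noise in the efficiency ratio. Any other estimator with the signature `(y_bar, x_bar)` can be passed in the same way.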
The results are encouraging for the proposed families of estimators. As expected, ȳ_{sa} and ȳ_{sr} did not perform as well as the estimators of the proposed families. The same was true of Vos’s^{[6]} ȳ_{M}(2) as compared to his ȳ_{M}(1). Hence we excluded these estimators when presenting the results.
The results also show that:
• For ρ ≥ 0.5, the proposed families of estimators are significantly more efficient than ȳ_{M}(1) and ȳ_{re}. Moreover, for larger values of ρ, these efficiencies become higher when the sample size is small.
• When the correlation is very low (say ρ < 0.3), the estimators ȳ_{M}(1) and ȳ_{re} performed better than the proposed estimators, even though the improvement over the sample mean ȳ was rather insignificant.
• In the case of an underguess (i.e. when REG(k) is negative), ȳ_{αβ}(1) is more favorable.
• In the case of an overguess (i.e. when REG(k) is positive), ȳ_{αβ}(2) performs better.
Therefore, in practice, it will be prudent to use the estimation strategy via
the simple mixing:
since we would not know whether we are underguessing or overguessing.