Sampling and sample size determination are among the major problems in health
research. Sample size determination is defined as the mathematical process of
deciding, before a study begins, how many subjects should be studied (Last,
1995). The most important question a health researcher asks when designing
and planning a study or survey is "How large a sample do we need?" The
answer depends on the objectives, nature and scope of the study and on
the expected results. Many health researchers, some of whom are not
academic staff, have difficulty determining sample size. Estimating
the power of a test is another important problem, particularly when the researcher
does not find a statistically significant difference. In addition, most
researchers who read results presented or published by others
are faced with summarized data and complex statistical
formulas, and often cannot tell whether they can rely on what they read.
Obtaining a fairly large set of random numbers, especially when the numbers
must fall within a given range, is another time-consuming, if not difficult, task.
There has therefore always been a need for sample size software packages that facilitate
the process. Several such packages have been designed and presented to date,
but nearly all of them require a user with good knowledge of biostatistics
and statistical tests. Based on several WHO publications, a software package has
been designed for health researchers without a strong background in
biostatistics, as well as for biostatisticians and epidemiologists. The designer
has named it Yasin sampling software. The aim of this paper
is to evaluate the first version of this software in comparison with some other available
sample size software packages.
MATERIALS AND METHODS
Visual Basic 6 was the programming language. Standard tables such as the Z table were implemented through user-defined functions and modules. SQL was used to extract data from an Access database by means of ADO. ActiveX controls, DLLs and the MSChart, PPT and Mplayer components were used wherever needed. The Rnd function, together with For-Next and Do-Until loops, was used to generate random numbers.
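The random number step can be illustrated with a minimal Python sketch. It is not the Visual Basic code of the software; the function name `draw_random_numbers` and its parameters are hypothetical and only show the idea of drawing a required count of integers between given lower and upper limits.

```python
import random


def draw_random_numbers(count, lower, upper, unique=False, seed=None):
    """Return `count` random integers in the closed range [lower, upper]."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    rng = random.Random(seed)  # passing a seed makes the draw reproducible
    if unique:
        # Sampling without replacement, e.g. picking subject IDs from a list.
        if count > upper - lower + 1:
            raise ValueError("range too small for the requested unique count")
        return rng.sample(range(lower, upper + 1), count)
    # Sampling with replacement: repeated values are possible.
    return [rng.randint(lower, upper) for _ in range(count)]
```

Calling `draw_random_numbers(10, 1, 500, unique=True)` would, for example, select 10 distinct subject numbers from a sampling frame numbered 1 to 500.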
The algorithm for sample size calculation was taken mainly from recommendations in various WHO publications, chiefly Lemeshow's Sample size determination in health studies. The algorithm was improved using recommendations from a professor of biostatistics and an epidemiologist, and its applicability was tested against specific examples of sample size calculation found by reviewing 5 peer-reviewed medical journals.
For the primary evaluation, 23 academic staff of Ardabil University of Medical Sciences and the School of Public Health of Tehran University of Medical Sciences were selected. The capabilities of the software were presented to them, and they then completed a questionnaire designed for its evaluation.
The collected data were analyzed with SPSS. The formulas used for sample size estimation and the key descriptions were mostly taken from the Lemeshow textbook.
Capabilities of software:
||Calculating sample size for a wide range of health studies
such as descriptive studies, cohort studies, case-control studies and clinical
trials (Fig. 1).
||Producing a complete printable report and sample size estimation
graphs for different values of power or significance level (Fig. 2).
||Calculating the power of study.
||Accomplishing major statistical tests on summarized data.
||Producing random numbers in required count and given upper
and lower limits.
||Presenting educational material about different aspects
of sampling in PowerPoint format.
||Giving some useful examples of calculating sample size in
different situations to help young researchers get acquainted with sampling.
All of the above capabilities are provided in four parts, the first of which is sample size determination. The algorithmic approach to the different sample size situations was adopted from WHO publications on research methodology, mostly from the sample size determination textbook edited by Lemeshow, on the recommendation of two professors of biostatistics and epidemiology. The validity of the calculations was tested by comparing the results with the worked examples given by Lemeshow.
The grouped situations for sample size calculation and the formulas used are as follows:
One-sample situations: Three components for sample size estimation were provided for the one-sample situations.
The first component calculates the sample size for estimating a proportion, or the prevalence of a disease, in a given population. The formula used was:
The second component estimates the mean of a quantitative measure
in a given population.
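Assuming the formula meant here is the standard one from Lwanga and Lemeshow (1991) for estimating a proportion P with absolute precision d, namely n = z^2 P(1 - P) / d^2 with z the standard normal quantile for the chosen confidence level, a minimal Python sketch is given below. The function name `n_for_proportion` is hypothetical and is not taken from the software.

```python
import math
from statistics import NormalDist


def n_for_proportion(p, d, confidence=0.95):
    """Sample size to estimate a proportion p within +/- d (absolute precision).

    Assumed formula: n = z^2 * p * (1 - p) / d^2, rounded up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. about 1.96 for 95%
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)
```

With an anticipated prevalence of 50%, a precision of 5 percentage points and 95% confidence, `n_for_proportion(0.5, 0.05)` gives 385 subjects.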
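For the mean, a commonly used expression is n = z^2 sigma^2 / d^2, where sigma is the anticipated standard deviation and d the required absolute precision. The sketch below assumes this is the formula applied; the name `n_for_mean` is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_mean(sigma, d, confidence=0.95):
    """Sample size to estimate a population mean within +/- d.

    Assumed formula: n = z^2 * sigma^2 / d^2, rounded up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * sigma ** 2 / d ** 2)
```

For example, with a standard deviation of 10 units and a precision of 2 units at 95% confidence, `n_for_mean(10, 2)` gives 97.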
Fig. 1: Interface view for the sample size calculation section
Fig. 2: Printable report and sample size estimation graphs for different
values of power and significance level
The third component relates to hypothesis testing for a population proportion. As stated by Lemeshow, this section applies to studies designed to test the hypothesis that the proportion of individuals in a population possessing a given characteristic is equal to a particular value.
The formulas used for one-sided and two-sided tests were:
Two-sample situations: Estimation of the difference between two population proportions with specified absolute precision. The formulas used were:
Hypothesis tests for two population proportions: This section is used for studies that test the hypothesis that two population proportions are equal.
The formulas used for one-sided and two-sided tests were:
Case-control studies: In these studies, sample size was estimated either from the exposure proportions in the control and case groups, or from the exposure proportion in the control group together with the odds ratio (OR). The power, the confidence level and the ratio of controls to cases made up the other required inputs.
Cohort and clinical trial studies: Sample size is calculated as for case-control studies, except that some specific inputs, such as the loss to follow-up ratio in cohort studies, are also entered.
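As a sketch, assuming the standard Lwanga and Lemeshow (1991) expression n = [z_a sqrt(P0(1 - P0)) + z_b sqrt(Pa(1 - Pa))]^2 / (P0 - Pa)^2, where P0 is the hypothesized proportion, Pa the anticipated proportion, z_b the quantile for the desired power, and z_a the quantile at 1 - alpha (one-sided) or 1 - alpha/2 (two-sided), the calculation can be written as follows. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_test_one_proportion(p0, pa, alpha=0.05, power=0.90, two_sided=False):
    """Sample size for testing H0: P = p0 against an anticipated value pa."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(pa * (1 - pa))) ** 2
    return math.ceil(numerator / (p0 - pa) ** 2)
```

For P0 = 0.30, Pa = 0.20, alpha = 0.05 and 90% power, the one-sided call `n_test_one_proportion(0.3, 0.2)` gives 161; the two-sided option always requires a larger sample for the same inputs.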
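Assuming the usual expression for this situation, n = z^2 [P1(1 - P1) + P2(1 - P2)] / d^2 per group, where d is the required absolute precision of the estimated difference, a short Python sketch is shown below. The name `n_for_difference` is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_difference(p1, p2, d, confidence=0.95):
    """Per-group sample size to estimate P1 - P2 within +/- d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * variance_sum / d ** 2)
```

With anticipated proportions of 0.40 and 0.32 and a precision of 5 percentage points at 95% confidence, `n_for_difference(0.40, 0.32, 0.05)` gives 704 subjects in each group.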
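A minimal sketch of this calculation, assuming the standard expression n = [z_a sqrt(2 Pbar(1 - Pbar)) + z_b sqrt(P1(1 - P1) + P2(1 - P2))]^2 / (P1 - P2)^2 with Pbar = (P1 + P2) / 2, is given below. The function name is hypothetical, and the result is the size of each group.

```python
import math
from statistics import NormalDist


def n_test_two_proportions(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Per-group sample size for testing H0: P1 = P2."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)
```

For P1 = 0.60 and P2 = 0.50 with a two-sided alpha of 0.05 and 80% power, `n_test_two_proportions(0.6, 0.5)` gives 388 per group.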
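The two input options can be linked through the relation P1 = OR x P2 / (OR x P2 + 1 - P2), which gives the exposure proportion among cases (P1) from the exposure proportion among controls (P2) and the odds ratio. The sketch below combines this with a widely used sample size expression that allows r controls per case; whether the software uses exactly this expression is an assumption, and both function names are hypothetical.

```python
import math
from statistics import NormalDist


def exposure_in_cases(p_controls, odds_ratio):
    """Exposure proportion among cases implied by the OR and control exposure."""
    return odds_ratio * p_controls / (odds_ratio * p_controls + 1 - p_controls)


def n_cases(p_controls, odds_ratio, alpha=0.05, power=0.80, controls_per_case=1.0):
    """Number of cases needed (two-sided test); controls = r * cases."""
    p2 = p_controls
    p1 = exposure_in_cases(p2, odds_ratio)
    r = controls_per_case
    p_bar = (p1 + r * p2) / (1 + r)  # pooled exposure proportion
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha / 2), z(power)
    numerator = (z_alpha * math.sqrt((1 + 1 / r) * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2) / r)) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)
```

With 30% exposure among controls and an odds ratio of 2, the implied exposure among cases is about 46%, and `n_cases(0.3, 2.0)` gives 141 cases with one control per case. Raising the number of controls per case reduces the number of cases required.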
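The text does not state how the loss to follow-up ratio enters the calculation. A common convention, assumed here, is to divide the calculated sample size by (1 - loss ratio) so that the expected number of subjects remaining at the end of follow-up still meets the requirement. The function name is hypothetical.

```python
import math


def inflate_for_loss(n, loss_ratio):
    """Inflate a calculated sample size for anticipated loss to follow-up."""
    if not 0 <= loss_ratio < 1:
        raise ValueError("loss_ratio must be in [0, 1)")
    return math.ceil(n / (1 - loss_ratio))
```

For example, a requirement of 388 subjects per group with 10% anticipated loss becomes `inflate_for_loss(388, 0.10)`, that is 432 per group.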
Lot quality assurance sampling: Accepting a population prevalence as not
exceeding a specified value. This section outlines how to determine the minimum
sample size that should be selected from a given population so that, if a particular
characteristic is found in no more than a specified number of sampled individuals,
the prevalence of the characteristic in the population can be accepted as not
exceeding a certain value. The formulas used were:
The value of n is obtained by solution of the inequality
where M = NP, for a finite population; or
for an infinite population.
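As a sketch of how such an inequality can be solved numerically, assume the usual lot quality assurance formulation: find the smallest n for which the probability of observing at most d* individuals with the characteristic is below alpha, using the hypergeometric distribution with M = NP for a finite population of size N and the binomial distribution for an infinite population. The symbols d* and alpha and the function names are assumptions introduced for this illustration.

```python
from math import comb


def acceptance_probability(n, d_star, p, population=None):
    """P(at most d_star sampled individuals have the characteristic)."""
    if population is None:
        # Infinite population: binomial distribution.
        return sum(comb(n, d) * p ** d * (1 - p) ** (n - d)
                   for d in range(d_star + 1))
    # Finite population: hypergeometric distribution with M = N * P.
    m = round(population * p)
    favourable = sum(comb(m, d) * comb(population - m, n - d)
                     for d in range(d_star + 1))
    return favourable / comb(population, n)


def min_lqas_sample_size(d_star, p, alpha=0.05, population=None, n_max=100000):
    """Smallest n with acceptance probability below alpha."""
    upper = population if population is not None else n_max
    for n in range(d_star + 1, upper + 1):
        if acceptance_probability(n, d_star, p, population) < alpha:
            return n
    raise ValueError("no sample size up to the limit satisfies the inequality")
```

With P = 0.20, d* = 0 and alpha = 0.05, the search returns 14 for an infinite population and 13 for a finite population of 100, which shows how the finite population reduces the required sample.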
RESULTS
Of the evaluation participants, 2 were associate professors, 7 were assistant professors and the others were university lecturers and health researchers. The participants' opinions on the quality of the graphical views are given in Table 1.
Regarding the use of music in the software, 61.9% agreed with it, 23.8% were against it and 14.3% had no opinion.
Regarding the application language, 42.9% of the evaluators preferred it to be available in both English and Farsi, while the others were split equally between English only and Farsi only.
Sample size estimation graphs were rated necessary or quite
necessary by 76% of participants, and only one participant considered them unnecessary.
The examples and Help available in the software were rated excellent by 19%,
and no one rated them weak or poor.
Fig. 3: Distribution of the overall ideas of participants in evaluating
Yasin health research software
Table 1: Ideas of evaluators about the quality of graphical views
All participants except one stated that presenting the formulas in the printable report is necessary. The power determination section of the application was rated very useful by 38.1% and useful by 57% (Table 1). Nearly half of the evaluators rated the random numbers section excellent; except for one person who had no opinion, the other half rated it good.
The distribution of the participants' overall opinions of Yasin health research software as a useful research tool is given in Fig. 3.
DISCUSSION
Several other software applications have been developed for sample size calculation (CDC/EPI Info 2002; software, 2002; Dupont and Plummer, 2003; Iwane and Plante, 1997; Luttke, 1991; Lwanga and Lemeshow, 1991; Arshi and Sadeghi, 2003). In this discussion we compare the capabilities of Yasin software with those of some other sample size software applications.
REPLI: A program written in elementary BASIC that calculates the approximate sample size required to detect a desired difference between any two group means in an experiment with n groups, for a given probability and at three significance levels of the difference between means. The program consists of three blocks that carry out (1) the reading of data for the t-value matrices, (2) the acceptance of parameters (coefficient of variation of means, difference to be detected, probability of detection, number of groups in the experiment and a pre-selected threshold) and (3) the calculation proper, the output on screen and the option to rerun the program immediately with new parameters.
EPI Info StatCalc: Both the DOS and the Windows versions provide the same capabilities for sample size calculation. The two main capabilities of this software are (1) estimation of sample size in population surveys, when the p-value, precision, confidence level and target population are specified, and (2) comparison of the difference between two ratios. A complete algorithmic approach and help are not provided by this software.
PS: This software is organized by type of statistical test and is very complete compared with Epi Info. A thorough algorithmic approach is used. It is a Windows application written in the Visual Basic programming language. However, PS users need to be well acquainted with statistical methods, and the software is not easy to use for many health researchers.
Nsurv: Nsurv is specifically designed for computing sample size and power for two-group studies with exponentially distributed time-to-event data. Sample size calculations are based on the method of Lachin and Foulkes. Nsurv is a companion package to N, which computes sample size and power for studies with normally distributed measurement data or binary data. For time-to-event data, Nsurv calculates the quantity of interest, such as sample size, when the user specifies a one-sided or two-sided test, an efficacy or equivalence study design, cumulative event rates for each treatment group at a specified time, the ratio of the sample sizes for the two treatment groups, the cumulative percent lost to follow-up (exponentially distributed) at a specified time, common to both groups, and the type I and type II error rates. The user must choose one of four configurations of accrual and follow-up periods, which then constrains the available options in later menus.
Clinical Trials Design Program (CTDP): CTDP can perform a wide array
of calculations for a variety of trial designs, including those with endpoints
which are time-to-event, binary, or normally distributed means, as well as some
epidemiological designs. We address survival designs only. The package provides
the user with three methods for time-to-event outcomes based on three different
statistical approaches. The input/output screens are similar for the different
methods, with available parameter options determined by the statistical approaches.
EGRET SIZ: Egret Siz is an extensive software package with several screens to navigate through when calculating sample size and power based on the Cox proportional hazards model. Power and sample size calculations follow the approach of Self, Mauritsen and Ohara (Self et al., 1992), which is based on a noncentral Chi-square approximation to the distribution of the Likelihood Ratio Test (LRT) statistic. Egret Siz will also calculate sample size and power for models that are prospective logistic, unmatched retrospective logistic, conditional logistic for matched sets and Poisson regression for subject-time data. Unlike the other software packages evaluated, Egret Siz allows more than two treatment groups and the specification of covariates.
Power: Power, a simple DOS prompt-driven package, calculates sample size, power, or the detectable difference for time-to-event data based on the formula in Schoenfeld and Richter (1982) for exponentially distributed event times. It also performs calculations for matched and unmatched study designs with binary or continuous outcome variables.
Power Analysis and Sample Size (PASS): PASS calculates power, sample size and other design parameters for a broad range of study designs for outcome variables that are continuous, proportions or time-to-event, and for an array of analyses including those based on analysis of variance (ANOVA), linear regression, correlation coefficients, logistic regression, matched and unmatched analyses and log-rank tests. As with Nsurv, PASS uses the method of Lachin and Foulkes for exponentially distributed time-to-event data.
Ex-Sample: Ex-Sample, like Power, calculates sample size using the method of Schoenfeld and Richter. The design parameters are entered into a single survival design screen, which is reached through two other screens.
Yasin software is a Windows application compatible with all Windows operating systems available up to 2006. Except for PS, all the other software applications discussed above are DOS applications. Although Windows applications offer rich graphical capabilities, in our pilot evaluation most participants wanted the graphical interface to be improved, and in face-to-face interviews some of them said that the opening screen of the software is somewhat complex, whereas they preferred a simple graphical view for scientific software like this.
Both Yasin and PS provide an estimation graph for different powers and significance
levels. Egret Siz, PASS and, to a more limited extent, Power can perform calculations
over a user-specified range of values for design parameters, e.g., power calculations
for various sample sizes or sample size calculations for various type I and
type II error rates. For cohort, case-control and clinical trial studies comparing
proportions, Yasin can estimate the power of the study if the proportions and sample
size are available. Future versions of the software should also include
power calculation for comparisons of means. The other packages return
only a single calculated sample size or power. Thus, a user who wants to consider
many design scenarios must conduct each in a separate computer run. Yasin offers
both one-sided and two-sided hypothesis options in calculating sample size. Most
of the software applications discussed have the same capability, but Nsurv's
two-sided option gives higher sample sizes than the other software packages
because the hazard difference under the two-sided alternative is conceptualized
in a different way (Frick, 1991). Nsurv makes conservative assumptions and the
sample size is larger since fewer events are predicted.
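Power estimation from known proportions and sample size can be sketched by inverting the usual two-proportion sample size expression: z_b = [|P1 - P2| sqrt(n) - z_a sqrt(2 Pbar(1 - Pbar))] / sqrt(P1(1 - P1) + P2(1 - P2)), with power equal to the standard normal probability below z_b. That the software uses this normal approximation is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05, two_sided=True):
    """Approximate power of a test of H0: P1 = P2 with n subjects per group."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2) if two_sided else nd.inv_cdf(1 - alpha)
    p_bar = (p1 + p2) / 2
    z_beta = ((abs(p1 - p2) * math.sqrt(n_per_group)
               - z_alpha * math.sqrt(2 * p_bar * (1 - p_bar)))
              / math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return nd.cdf(z_beta)
```

For proportions of 0.60 and 0.50 with 388 subjects per group and a two-sided alpha of 0.05, the estimated power is just above 80%, and it falls just below 80% with 387 per group.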
The help in Yasin is quite extensive and provides examples from WHO publications in a PowerPoint slide show format. The examples and explanations help the user learn not only how to use the software but also, in some cases, basic research methodology. The software designer is advised to also provide an extensive printable embedded manual or a complementary handbook. Nsurv provides such a handbook with its software package.
Suggestions for improvement of Yasin software:
||A separate section on survival studies should be included.
||Power determination should be expanded to cover studies comparing
means as well as proportions.
||To increase the validity of the software, at least 10 well-known
biostatisticians and epidemiologists can be invited to give recommendations
for improving the software and to discuss the methods used.
||Development of a web-based application is encouraged.
||An accompanying handbook should be made available.
||Sample size determination for clinical trials with more than
two comparison groups should be included.
||An online help system for answering users' questions
can be developed.
||A special section for entomology and toxicology studies can be added.
CONCLUSION
After some improvements, Yasin sampling software can be a very useful and easy-to-use scientific tool in the field of health research, especially for those without strong expertise in biostatistics.
ACKNOWLEDGMENTS
We thank Mrs. Amini and Mr. Amani for their kind help and recommendations.
Thanks also to Ardabil University of Medical Sciences for the financial support
provided. All authors declare that the patent rights to the software
belong only to the software designer, who is the first author of this paper,
or to any other person introduced by him as having helped in the software design. The designer
dedicates the software to Dr. Parvaneh Vosough, Professor of Pediatric Oncology,
who has saved the lives of many cancer patients including Yasin.