Abstract: The crux of this study is to consider the allocation problem for a stratified randomized response model in the presence of non-response and to minimize the variance subject to a cost constraint. The costs in the cost constraint (the measurement costs and the total budget of the survey) are assumed to be fuzzy numbers, in particular triangular and trapezoidal fuzzy numbers, owing to their ease of use. The problem is solved using the Lagrange multipliers technique, and the optimum allocation, obtained in the form of fuzzy numbers, is converted into crisp form using the α-cut method at a prescribed value of α. Numerical illustrations are given in support of the present study, and the results are obtained using LINGO.
INTRODUCTION
The social survey is one of the most important means of obtaining data pertaining to human populations. For measuring opinions, attitudes and behaviors that cover a wide band of interests, the social survey has proved tremendously practical. Surveys are conducted for many reasons, the most obvious being that certain facts or information are not available in archives. For instance, if one is interested in the crime rate, information about unseen crimes or unreported victimization is not available in formal crime records. Sometimes facts about the individuals in a population are inaccessible to investigators for legal reasons. Questionnaires, in particular those of social surveys, generally consist of many items. Some of these items may concern sensitive or high-risk behavior because of the social stigma it carries. One problem with research on high-risk behavior is that respondents may consciously or unconsciously provide incorrect information. In psychological surveys, social desirability bias has been observed to be a major cause of distortion in standardized personality measures. Survey researchers have similar concerns about the truthfulness of survey findings on topics such as drunk driving, marijuana use, tax evasion, illicit drug use, induced abortion, shoplifting, child abuse, family disturbances, cheating in exams, HIV/AIDS and sexual behavior. Thus, to obtain trustworthy data on such confidential, and especially sensitive, matters, alternatives to open surveys are required. One such alternative, known as the “randomized response technique (RRT)”, was first introduced by Warner1.
The randomized response (RR) technique was introduced by Warner1 mainly to reduce the risk of (1) a lowered response rate and (2) inflated response bias, both of which are experienced in direct or open surveys on sensitive issues. Warner1 himself pointed out how one may get a biased estimate in an open survey when a population consists of individuals bearing a stigmatizing character A or its complement $A^c$, which may or may not also be stigmatizing. The technique requires the interviewee to give a “Yes” or “No” answer either to the sensitive question or to its negation, depending on the outcome of a randomization device that is not revealed to the interviewer. Greenberg et al.2 derived results for Warner’s model in the case of less than completely truthful reporting. Later, several modifications of the RR technique were developed by various authors, including Fox and Tracy3, Chaudhuri and Mukerjee4, Singh and Tarray5-7, Tarray and Singh8, Singh and Tarray9, Tarray and Singh10-13 and Tarray14.
Stratified random sampling is generally carried out by dividing the population into non-overlapping groups called strata and selecting a simple random sample from each stratum. An RR technique using stratified random sampling provides the group characteristics related to each stratum estimator. Also, stratified sampling protects a researcher from the possibility of obtaining a poor sample15.
A STUDY OF RANDOMIZED RESPONSE TECHNIQUES
Descriptions of the models due to Singh16 and Kim and Warde15 are given below:
Singh model: Singh16 developed a randomized response technique named RRT1, described below:
RRT1: In this procedure, each interviewee in a simple random sample of size n drawn with replacement is provided with one randomization device. The device shows the statement “I belong to the sensitive group” with known probability P, exactly the same probability as used by Warner1, and the statement “Yes” with probability (1-P). If the sensitive statement appears, the interviewee reports “Yes” or “No” according to his/her actual status; otherwise, he/she simply reports the “Yes” statement shown by the device. The whole procedure is completed by the respondent, unobserved by the interviewer. Then θ1, the probability of a “Yes” answer in the population, is:
θ1 = PπS+(1-P)
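An unbiased estimator of πS due to Singh16 is given by:
$$\hat{\pi}_S = \frac{\hat{\theta}_1 - (1 - P)}{P},$$
where $\hat{\theta}_1$ is the observed proportion of “Yes” answers in the sample of size n.
The variance of the estimator $\hat{\pi}_S$ is:
$$V(\hat{\pi}_S) = \frac{\theta_1(1 - \theta_1)}{nP^2} = \frac{(1 - \pi_S)\left[1 - P(1 - \pi_S)\right]}{nP}.$$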
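A quick simulation helps check the estimator $\hat{\pi}_S = (\hat{\theta}_1 - (1-P))/P$ and its variance $\theta_1(1-\theta_1)/(nP^2)$. The Python sketch below is only an illustration: the values of πS, P and n are assumptions chosen for the example (not taken from the source), and respondents are assumed to answer truthfully whenever the device shows the sensitive statement.

```python
import random


def rrt1_estimate(yes_count: int, n: int, P: float) -> float:
    """Unbiased RRT1 estimator of pi_S: (theta_hat - (1 - P)) / P."""
    theta_hat = yes_count / n
    return (theta_hat - (1.0 - P)) / P


def rrt1_variance(pi_s: float, n: int, P: float) -> float:
    """Theoretical variance theta1 * (1 - theta1) / (n * P**2), theta1 = P*pi_S + (1 - P)."""
    theta1 = P * pi_s + (1.0 - P)
    return theta1 * (1.0 - theta1) / (n * P ** 2)


def simulate_rrt1(pi_s: float, n: int, P: float, rng: random.Random) -> int:
    """Simulate n independent respondents and return the number of 'Yes' answers."""
    yes = 0
    for _ in range(n):
        has_trait = rng.random() < pi_s        # respondent's true (hidden) status
        if rng.random() < P:
            yes += int(has_trait)              # device shows the sensitive statement
        else:
            yes += 1                           # device shows "Yes"
    return yes


if __name__ == "__main__":
    rng = random.Random(2024)
    pi_s, n, P, reps = 0.2, 500, 0.7, 1000     # illustrative values only
    est = [rrt1_estimate(simulate_rrt1(pi_s, n, P, rng), n, P) for _ in range(reps)]
    mean = sum(est) / reps
    emp_var = sum((e - mean) ** 2 for e in est) / (reps - 1)
    print(f"mean={mean:.4f}  empirical var={emp_var:.6f}  formula={rrt1_variance(pi_s, n, P):.6f}")
```

With these inputs the mean of the estimates should lie close to 0.2 and the empirical variance close to the formula value (about 0.001).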
Kim and Warde model: Kim and Warde15 suggested a stratified randomized response model based on Warner’s1 model. Suppose the population is partitioned into strata and a sample is selected by simple random sampling with replacement in each stratum. To get the full benefit from stratification, it is assumed that the number of units in each stratum is known. An individual respondent in the sample of stratum i is instructed to use the randomization device Ri, which consists of a sensitive question (S) card with probability Pi and its negative question (Sc) card with probability (1-Pi). The respondent answers “Yes” or “No” without reporting which question card he or she drew. Respondents in different strata use different randomization devices, each with its own pre-assigned probability. Let ni denote the number of units in the sample from stratum i and n the total number of units in the sample from all strata, so that $n = \sum_{i=1}^{k} n_i$. Then
Zi = PiπSi+(1-Pi)(1-πSi), for ( i =1, 2 ..., k)
where Zi is the proportion of “Yes” answers in stratum i, πSi is the proportion of respondents with the sensitive trait in stratum i and Pi is the probability that a respondent in the sample from stratum i draws the sensitive question (S) card.
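The maximum likelihood unbiased estimate of πSi is shown to be:
$$\hat{\pi}_{Si} = \frac{\hat{Z}_i - (1 - P_i)}{2P_i - 1}, \qquad P_i \neq 0.5,$$
where $\hat{Z}_i$ is the observed proportion of “Yes” answers in the sample from stratum i. The estimator of the overall proportion πS is
$$\hat{\pi}_S = \sum_{i=1}^{k} w_i \hat{\pi}_{Si},$$
where N is the number of units in the whole population, Ni is the total number of units in stratum i and wi = Ni/N for i = 1, 2, ..., k, so that $\sum_{i=1}^{k} w_i = 1$.
The variance of $\hat{\pi}_S$ is
$$V(\hat{\pi}_S) = \sum_{i=1}^{k} \frac{w_i^2}{n_i}\left[\pi_{Si}(1 - \pi_{Si}) + \frac{P_i(1 - P_i)}{(2P_i - 1)^2}\right].$$
The optimal (Neyman) allocation of n to n1, n2, ..., nk-1 and nk that minimizes the variance of $\hat{\pi}_S$ subject to $n = \sum_{i=1}^{k} n_i$ is
$$\frac{n_i}{n} = \frac{w_i\left[\pi_{Si}(1 - \pi_{Si}) + \frac{P_i(1 - P_i)}{(2P_i - 1)^2}\right]^{1/2}}{\sum_{j=1}^{k} w_j\left[\pi_{Sj}(1 - \pi_{Sj}) + \frac{P_j(1 - P_j)}{(2P_j - 1)^2}\right]^{1/2}}.$$
Thus the minimal variance of the estimator $\hat{\pi}_S$ is
$$V(\hat{\pi}_S) = \frac{1}{n}\left[\sum_{i=1}^{k} w_i\left\{\pi_{Si}(1 - \pi_{Si}) + \frac{P_i(1 - P_i)}{(2P_i - 1)^2}\right\}^{1/2}\right]^2.$$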
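The Neyman allocation for the Kim and Warde model can be checked numerically. Writing $S_i^2 = \pi_{Si}(1-\pi_{Si}) + P_i(1-P_i)/(2P_i-1)^2$, the allocation is $n_i = n\,w_i S_i / \sum_j w_j S_j$ and the minimum variance is $(\sum_i w_i S_i)^2/n$. The Python sketch below computes both; the stratum weights, proportions and device probabilities are hypothetical, and the allocation is continuous (no rounding).

```python
import math
from typing import List


def kw_spread(pi_si: float, p_i: float) -> float:
    """S_i = sqrt(pi_Si(1 - pi_Si) + P_i(1 - P_i)/(2P_i - 1)^2); Warner device needs P_i != 0.5."""
    if p_i == 0.5:
        raise ValueError("P_i must differ from 0.5 for a Warner-type device")
    return math.sqrt(pi_si * (1 - pi_si) + p_i * (1 - p_i) / (2 * p_i - 1) ** 2)


def neyman_allocation(n: int, w: List[float], pi_s: List[float], p: List[float]) -> List[float]:
    """Continuous Neyman allocation n_i = n * w_i S_i / sum_j w_j S_j."""
    ws = [wi * kw_spread(a, b) for wi, a, b in zip(w, pi_s, p)]
    total = sum(ws)
    return [n * x / total for x in ws]


def kw_variance(alloc: List[float], w: List[float], pi_s: List[float], p: List[float]) -> float:
    """V(pi_hat_S) = sum_i w_i^2 S_i^2 / n_i for any given allocation."""
    return sum(wi ** 2 * kw_spread(a, b) ** 2 / ni
               for ni, wi, a, b in zip(alloc, w, pi_s, p))


def kw_min_variance(n: int, w: List[float], pi_s: List[float], p: List[float]) -> float:
    """Closed-form minimum (1/n) * (sum_i w_i S_i)^2."""
    return sum(wi * kw_spread(a, b) for wi, a, b in zip(w, pi_s, p)) ** 2 / n


if __name__ == "__main__":
    w, pi_s, p, n = [0.4, 0.6], [0.1, 0.3], [0.7, 0.8], 400   # hypothetical inputs
    alloc = neyman_allocation(n, w, pi_s, p)
    print(alloc, kw_variance(alloc, w, pi_s, p), kw_min_variance(n, w, pi_s, p))
```

Evaluating `kw_variance` at the Neyman allocation reproduces the closed-form minimum, and any other allocation of the same n (for example proportional, n·wi) gives a variance at least as large.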
Ki et al.17 envisaged an RR technique that applied the same randomization device to every stratum. For the sake of completeness and convenience to the readers, the descriptions of fuzzy sets, fuzzy numbers, the triangular fuzzy number (TFN) and the trapezoidal fuzzy number (TrFN) are reproduced here from Bector and Chandra18, Mahapatra and Roy19, Hassanzadeh et al.20 and Aggarwal and Sharma21.
Fuzzy sets were introduced by Zadeh22 to represent/manipulate data and information possessing non-statistical uncertainties.
It was specifically designed to represent uncertainty and vagueness mathematically and to provide formalized tools for dealing with the imprecision intrinsic to many problems. However, the story of fuzzy logic started much earlier. To devise a concise theory of logic, and later mathematics, Aristotle posited the so-called “Laws of Thought”. One of these, the “Law of the Excluded Middle”, states that every proposition must be either True (T) or False (F). Even when Parmenides proposed the first version of this law (around 400 BC), there were strong and immediate objections. It was Plato who laid the foundation for what would become fuzzy logic, indicating that there was a third region (beyond T and F) where these opposites “tumbled about”. A systematic alternative to the bi-valued logic of Aristotle was later proposed in the form of a three-valued logic, along with the mathematics to accompany it. The third value can best be translated as the term “possible” and was assigned a numeric value between T and F. Eventually, an entire notation and axiomatic system was proposed, from which it was hoped that modern mathematics could be derived. Later, four-valued and five-valued logics were explored, and it was declared that in principle nothing prevented the derivation of an infinite-valued logic. Three- and infinite-valued logics were considered the most intriguing, but a four-valued logic was ultimately settled on because it seemed the most easily adaptable to Aristotelian logic.
The notion of an infinite-valued logic was introduced in Zadeh’s seminal work “Fuzzy Sets”, which described the mathematics of fuzzy set theory and, by extension, fuzzy logic. This theory proposed making the membership function (or the values F and T) operate over the range of real numbers [0, 1]. New operations for the calculus of logic were proposed and shown to be, at least in principle, a generalization of classical logic. Fuzzy logic provides an inference morphology that enables approximate human reasoning capabilities to be applied to knowledge-based systems. The theory of fuzzy logic provides the mathematical means to capture the uncertainties associated with human cognitive processes, such as thinking and reasoning. The conventional approaches to knowledge representation lack the means for representing the meaning of fuzzy concepts. As a consequence, approaches based on first-order logic and classical probability theory do not provide an appropriate conceptual framework for representing commonsense knowledge, since such knowledge is by its nature both lexically imprecise and non-categorical.
There are two main characteristics of fuzzy systems that give them better performance for specific applications:
- Fuzzy systems are suitable for uncertain or approximate reasoning, especially for systems whose mathematical model is difficult to derive.
- Fuzzy logic allows decision making with estimated values under incomplete or uncertain information.
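Fuzzy set: A fuzzy set $\tilde{A}$ in a universe of discourse X is defined as the following set of pairs:
$$\tilde{A} = \{(x, \mu_{\tilde{A}}(x)) : x \in X\},$$
where $\mu_{\tilde{A}} : X \to [0, 1]$ is the membership function of $\tilde{A}$.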
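α-Cut: The α-cut of a fuzzy set $\tilde{A}$ is the crisp set
$$A_\alpha = \{x \in X : \mu_{\tilde{A}}(x) \ge \alpha\}, \qquad \alpha \in (0, 1], \tag{1}$$
where X is the universal set.
Upper and lower bounds for any α-cut $A_\alpha$ are given by $\sup_{x \in A_\alpha} x$ and $\inf_{x \in A_\alpha} x$, respectively.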
Fuzzy number: A fuzzy set $\tilde{A}$ in R is called a fuzzy number if it satisfies the following conditions:
- $\tilde{A}$ is convex and normal
- $A_\alpha$ is a closed interval for every α ∈ (0, 1]
- The support of $\tilde{A}$ is bounded
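Triangular fuzzy number (TFN): A fuzzy number $\tilde{A} = (a_1, a_2, a_3)$ with $a_1 < a_2 < a_3$ is called a triangular fuzzy number if its membership function is given by
$$\mu_{\tilde{A}}(x) = \begin{cases} \dfrac{x - a_1}{a_2 - a_1}, & a_1 \le x \le a_2,\\[4pt] \dfrac{a_3 - x}{a_3 - a_2}, & a_2 \le x \le a_3,\\[4pt] 0, & \text{otherwise}. \end{cases} \tag{2}$$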
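Trapezoidal fuzzy number (TrFN): A fuzzy set $\tilde{A} = (a_1, a_2, a_3, a_4)$ with $a_1 < a_2 \le a_3 < a_4$ is called a trapezoidal fuzzy number if its membership function is given by
$$\mu_{\tilde{A}}(x) = \begin{cases} \dfrac{x - a_1}{a_2 - a_1}, & a_1 \le x \le a_2,\\[4pt] 1, & a_2 \le x \le a_3,\\[4pt] \dfrac{a_4 - x}{a_4 - a_3}, & a_3 \le x \le a_4,\\[4pt] 0, & \text{otherwise}. \end{cases} \tag{3}$$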
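The membership functions in Eq. 2 and 3 translate directly into code. The Python sketch below evaluates them; the strict orderings it checks (a1 < a2 < a3 for a TFN, a1 < a2 ≤ a3 < a4 for a TrFN) are assumptions that rule out degenerate, zero-width edges, and the example points use the budget TFN and TrFN from the numerical illustration.

```python
from typing import Tuple


def tfn_membership(x: float, a: Tuple[float, float, float]) -> float:
    """Membership of x in the TFN (a1, a2, a3), Eq. (2); assumes a1 < a2 < a3."""
    a1, a2, a3 = a
    if not a1 < a2 < a3:
        raise ValueError("expected a1 < a2 < a3")
    if a1 <= x <= a2:
        return (x - a1) / (a2 - a1)          # rising edge
    if a2 < x <= a3:
        return (a3 - x) / (a3 - a2)          # falling edge
    return 0.0


def trfn_membership(x: float, a: Tuple[float, float, float, float]) -> float:
    """Membership of x in the TrFN (a1, a2, a3, a4), Eq. (3); assumes a1 < a2 <= a3 < a4."""
    a1, a2, a3, a4 = a
    if not a1 < a2 <= a3 < a4:
        raise ValueError("expected a1 < a2 <= a3 < a4")
    if a1 <= x < a2:
        return (x - a1) / (a2 - a1)          # rising edge
    if a2 <= x <= a3:
        return 1.0                           # plateau (core)
    if a3 < x <= a4:
        return (a4 - x) / (a4 - a3)          # falling edge
    return 0.0


if __name__ == "__main__":
    budget_tfn = (3500.0, 4000.0, 4800.0)            # TFN budget of the illustration
    budget_trfn = (3500.0, 4000.0, 4400.0, 4600.0)   # TrFN budget of the illustration
    print(tfn_membership(3750.0, budget_tfn))        # 0.5
    print(trfn_membership(4500.0, budget_trfn))      # 0.5
```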
PROBLEM FORMULATION
Ki et al.17 suggested a stratified RR technique that applied the same randomization device to every stratum. Under the proportional sampling assumption of Ki et al.17, it may be easy to derive the variance of the proposed estimator; however, it may incur a high cost because of the difficulty of obtaining a proportional sample from some strata. To rectify this problem, Kim and Warde15 presented a stratified randomized response technique using an optimal allocation, which is more efficient than a stratified randomized response technique using a proportional allocation. Singh and Tarray7 developed a stratified randomized response model designated SRRT1, which is described below:
SRRT1: In this procedure, an individual respondent in the sample from each stratum is provided with one randomization device. The device shows the statement “I belong to the sensitive group” with known probability Pi, exactly the same probability as used by Kim and Warde15, and the statement “Yes” with probability (1-Pi). If the sensitive statement appears, the interviewee reports “Yes” or “No” according to his/her actual status; otherwise, he/she simply reports the “Yes” statement shown by the device. The whole procedure is completed by the respondent, unobserved by the interviewer. Respondents in different strata use different randomization devices, each with its own pre-assigned probability. The probability of a “Yes” answer in stratum i for this procedure is:
$$\theta_{1i} = P_i \pi_{Si} + (1 - P_i), \qquad i = 1, 2, \ldots, k \tag{4}$$
where θ1i is the proportion of “Yes” answers in stratum i, πSi is the proportion of respondents with the sensitive trait in stratum i and Pi is the probability that a respondent in the sample from stratum i is presented with the sensitive statement.
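The maximum likelihood estimate of πSi in this procedure is:
$$\hat{\pi}_{Si} = \frac{\hat{\theta}_{1i} - (1 - P_i)}{P_i}, \tag{5}$$
where $\hat{\theta}_{1i}$ is the observed proportion of “Yes” answers in the sample from stratum i. The estimator of πS is $\hat{\pi}_S = \sum_{i=1}^{k} w_i \hat{\pi}_{Si}$,
where N is the number of units in the whole population, Ni is the total number of units in stratum i and wi = Ni/N for i = 1, 2, ..., k, so that
$$\sum_{i=1}^{k} w_i = 1. \tag{6}$$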
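Since each estimator $\hat{\pi}_{Si}$ is unbiased for πSi, the estimator $\hat{\pi}_S$ is unbiased for $\pi_S = \sum_{i=1}^{k} w_i \pi_{Si}$:
$$E(\hat{\pi}_S) = \sum_{i=1}^{k} w_i E(\hat{\pi}_{Si}) = \sum_{i=1}^{k} w_i \pi_{Si} = \pi_S. \tag{7}$$
Because the samples are drawn independently in the different strata, the variance of $\hat{\pi}_S$ is
$$V(\hat{\pi}_S) = \sum_{i=1}^{k} w_i^2 V(\hat{\pi}_{Si}), \qquad V(\hat{\pi}_{Si}) = \frac{\theta_{1i}(1 - \theta_{1i})}{n_i P_i^2}, \tag{8}$$
which, using Eq. 4, becomes
$$V(\hat{\pi}_S) = \sum_{i=1}^{k} \frac{w_i^2 (1 - \pi_{Si})\left[1 - P_i(1 - \pi_{Si})\right]}{n_i P_i} \tag{9}$$
Or:
$$V(\hat{\pi}_S) = \sum_{i=1}^{k} \frac{A_i}{n_i}, \qquad A_i = \frac{w_i^2 (1 - \pi_{Si})\left[1 - P_i(1 - \pi_{Si})\right]}{P_i}. \tag{10}$$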
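As a sanity check, the compact form $\sum_i A_i/n_i$ should agree with the variance built stratum by stratum from $\theta_{1i}(1-\theta_{1i})/(n_i P_i^2)$. The Python sketch below assumes $A_i = w_i^2(1-\pi_{Si})[1 - P_i(1-\pi_{Si})]/P_i$, which follows from Eq. 4 and 5 under with-replacement sampling; all input values are hypothetical.

```python
from typing import List


def srrt1_A(w: List[float], pi_s: List[float], p: List[float]) -> List[float]:
    """A_i = w_i^2 (1 - pi_Si) [1 - P_i (1 - pi_Si)] / P_i."""
    return [wi ** 2 * (1 - a) * (1 - b * (1 - a)) / b for wi, a, b in zip(w, pi_s, p)]


def srrt1_variance(alloc: List[float], A: List[float]) -> float:
    """V(pi_hat_S) = sum_i A_i / n_i."""
    return sum(ai / ni for ai, ni in zip(A, alloc))


def srrt1_variance_direct(alloc: List[float], w: List[float],
                          pi_s: List[float], p: List[float]) -> float:
    """Same variance assembled from theta_1i (1 - theta_1i) / (n_i P_i^2) per stratum."""
    total = 0.0
    for ni, wi, a, b in zip(alloc, w, pi_s, p):
        theta = b * a + (1 - b)              # probability of "Yes", Eq. (4)
        total += wi ** 2 * theta * (1 - theta) / (ni * b ** 2)
    return total


if __name__ == "__main__":
    w, pi_s, p, alloc = [0.4, 0.6], [0.1, 0.3], [0.7, 0.8], [150.0, 250.0]  # hypothetical
    A = srrt1_A(w, pi_s, p)
    print(A, srrt1_variance(alloc, A), srrt1_variance_direct(alloc, w, pi_s, p))
```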
To find the optimum allocation, we either maximize the precision for a fixed budget or minimize the cost for a fixed precision. A linear cost function, which is an adequate approximation of the actual cost incurred, is:
$$C = C_0 + \sum_{i=1}^{k} c_i n_i, \tag{11}$$
where C0 is the overhead cost, ci is the per-unit cost of measurement in the ith stratum and C is the fixed budget available for the survey.
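In view of Eq. 10 and 11, the problem of optimum allocation for a fixed cost can be formulated as the following nonlinear programming problem (NLPP):
$$\begin{aligned} \text{Minimize } & V(\hat{\pi}_S) = \sum_{i=1}^{k} \frac{A_i}{n_i} \\ \text{subject to } & \sum_{i=1}^{k} c_i n_i \le C - C_0, \\ & 1 \le n_i \le N_i, \quad n_i \text{ integers}, \quad i = 1, 2, \ldots, k. \end{aligned} \tag{12}$$
The restrictions 1 ≤ ni and ni ≤ Ni are imposed to ensure that every stratum is represented in the sample and to avoid oversampling, respectively.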
FUZZY FORMULATION
Generally, real-world situations involve many parameters, such as cost and time, whose values are assigned by decision makers; in the conventional approach, decision makers are required to fix exact values for these parameters. However, decision makers frequently do not know the values of those parameters precisely. In such cases, it is better to treat those parameters or coefficients in the decision-making problem as fuzzy numbers. The mathematical modeling of fuzzy concepts was presented by Zadeh22. Accordingly, the fuzzy formulation of problem (12) with a fuzzy cost constraint is given by considering two cases of fuzzy numbers, namely the triangular fuzzy number (TFN) and the trapezoidal fuzzy number (TrFN).
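For triangular fuzzy numbers (TFN) we consider:
$$\begin{aligned} \text{Minimize } & \sum_{i=1}^{k} \frac{A_i}{n_i} \\ \text{subject to } & \sum_{i=1}^{k} \tilde{c}_i n_i \le \tilde{C} - C_0, \\ & 1 \le n_i \le N_i, \quad n_i \text{ integers}, \quad i = 1, 2, \ldots, k, \end{aligned} \tag{13}$$
Where:
$$\tilde{c}_i = (c_{i1}, c_{i2}, c_{i3}), \qquad \tilde{C} = (C_1, C_2, C_3) \tag{14}$$
And the membership function of $\tilde{c}_i$ is:
$$\mu_{\tilde{c}_i}(x) = \begin{cases} \dfrac{x - c_{i1}}{c_{i2} - c_{i1}}, & c_{i1} \le x \le c_{i2},\\[4pt] \dfrac{c_{i3} - x}{c_{i3} - c_{i2}}, & c_{i2} \le x \le c_{i3},\\[4pt] 0, & \text{otherwise}. \end{cases} \tag{15}$$
Similarly, the membership function for the available budget can be expressed as:
$$\mu_{\tilde{C}}(x) = \begin{cases} \dfrac{x - C_1}{C_2 - C_1}, & C_1 \le x \le C_2,\\[4pt] \dfrac{C_3 - x}{C_3 - C_2}, & C_2 \le x \le C_3,\\[4pt] 0, & \text{otherwise}. \end{cases} \tag{16}$$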
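and for trapezoidal fuzzy numbers (TrFN) we consider:
$$\begin{aligned} \text{Minimize } & \sum_{i=1}^{k} \frac{A_i}{n_i} \\ \text{subject to } & \sum_{i=1}^{k} \tilde{c}_i n_i \le \tilde{C} - C_0, \\ & 1 \le n_i \le N_i, \quad n_i \text{ integers}, \quad i = 1, 2, \ldots, k, \end{aligned} \tag{17}$$
Where:
$$\tilde{c}_i = (c_{i1}, c_{i2}, c_{i3}, c_{i4}), \qquad \tilde{C} = (C_1, C_2, C_3, C_4) \tag{18}$$
and the membership function of $\tilde{c}_i$ is:
$$\mu_{\tilde{c}_i}(x) = \begin{cases} \dfrac{x - c_{i1}}{c_{i2} - c_{i1}}, & c_{i1} \le x \le c_{i2},\\[4pt] 1, & c_{i2} \le x \le c_{i3},\\[4pt] \dfrac{c_{i4} - x}{c_{i4} - c_{i3}}, & c_{i3} \le x \le c_{i4},\\[4pt] 0, & \text{otherwise}. \end{cases} \tag{19}$$
Similarly, the membership function for the available budget can be expressed as:
$$\mu_{\tilde{C}}(x) = \begin{cases} \dfrac{x - C_1}{C_2 - C_1}, & C_1 \le x \le C_2,\\[4pt] 1, & C_2 \le x \le C_3,\\[4pt] \dfrac{C_4 - x}{C_4 - C_3}, & C_3 \le x \le C_4,\\[4pt] 0, & \text{otherwise}. \end{cases} \tag{20}$$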
LAGRANGE MULTIPLIERS FORMULATION
Let us now determine the solution of problem (13) by ignoring the upper and lower bounds and the integer requirements; the resulting NLPP with TFNs is solved by the Lagrange multipliers technique (LMT).
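The Lagrangian function may be written as:
$$L(n_1, \ldots, n_k, \lambda) = \sum_{i=1}^{k} \frac{A_i}{n_i} + \lambda\left(\sum_{i=1}^{k} \tilde{c}_i n_i - \tilde{C} + C_0\right) \tag{21}$$
Differentiating Eq. 21 with respect to ni and λ and equating to zero, we get the following sets of equations:
$$\frac{\partial L}{\partial n_i} = -\frac{A_i}{n_i^2} + \lambda \tilde{c}_i = 0, \qquad i = 1, 2, \ldots, k \tag{22}$$
Or:
$$n_i = \frac{1}{\sqrt{\lambda}}\sqrt{\frac{A_i}{\tilde{c}_i}} \tag{23}$$
Also:
$$\frac{\partial L}{\partial \lambda} = \sum_{i=1}^{k} \tilde{c}_i n_i - \tilde{C} + C_0 = 0 \tag{24}$$
Which gives:
$$C_0 + \sum_{i=1}^{k} \tilde{c}_i n_i = \tilde{C} \tag{25}$$
Or:
$$\sum_{i=1}^{k} \tilde{c}_i n_i = \tilde{C} - C_0 \tag{26}$$
Substituting Eq. 23 in Eq. 26, we have $\sqrt{\lambda} = \sum_{j=1}^{k} \sqrt{A_j \tilde{c}_j} \,/\, (\tilde{C} - C_0)$, and hence:
$$\tilde{n}_i = \frac{(\tilde{C} - C_0)\sqrt{A_i/\tilde{c}_i}}{\sum_{j=1}^{k} \sqrt{A_j \tilde{c}_j}}, \qquad i = 1, 2, \ldots, k \tag{27}$$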
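For fixed (crisp) costs and budget, the closed form of Eq. 27 reads $n_i = (C - C_0)\sqrt{A_i/c_i}\,/\sum_j \sqrt{A_j c_j}$, with minimum variance $(\sum_i \sqrt{A_i c_i})^2/(C - C_0)$. The Python sketch below evaluates this crisp analogue; the values of Ai, ci, C and C0 are hypothetical, and the fuzzy-to-crisp step (α-cut) is deliberately left out.

```python
import math
from typing import List


def lagrange_allocation(A: List[float], c: List[float], C: float, C0: float) -> List[float]:
    """Continuous optimum n_i = (C - C0) sqrt(A_i / c_i) / sum_j sqrt(A_j c_j)."""
    budget = C - C0                                  # budget left for measurement
    denom = sum(math.sqrt(a * ci) for a, ci in zip(A, c))
    return [budget * math.sqrt(a / ci) / denom for a, ci in zip(A, c)]


def min_variance(A: List[float], c: List[float], C: float, C0: float) -> float:
    """Value of sum_i A_i / n_i at the optimum: (sum_i sqrt(A_i c_i))^2 / (C - C0)."""
    return sum(math.sqrt(a * ci) for a, ci in zip(A, c)) ** 2 / (C - C0)


if __name__ == "__main__":
    A, c, C, C0 = [0.02, 0.05], [4.0, 6.0], 4000.0, 500.0   # hypothetical crisp values
    n = lagrange_allocation(A, c, C, C0)
    print(n, sum(ci * ni for ci, ni in zip(c, n)), min_variance(A, c, C, C0))
```

At this allocation the cost constraint holds with equality and the ratio Ai/(ci·ni²), which equals λ by Eq. 22, is the same for every stratum; shifting units between strata at equal cost only increases the variance.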
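In a similar manner, the optimum allocation for NLPP (17) with trapezoidal fuzzy numbers can be obtained as follows:
$$\tilde{n}_i = \frac{(\tilde{C} - C_0)\sqrt{A_i/\tilde{c}_i}}{\sum_{j=1}^{k} \sqrt{A_j \tilde{c}_j}}, \qquad \tilde{c}_i = (c_{i1}, c_{i2}, c_{i3}, c_{i4}),\ \tilde{C} = (C_1, C_2, C_3, C_4) \tag{28}$$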
The fuzzy allocations are then converted into crisp allocations by the α-cut method.
PROCEDURE FOR CONVERSION OF FUZZY NUMBERS
To convert the fuzzy allocations into crisp allocations by the α-cut method, let
and:
(29) |
where,
(30) |
Similarly, let
And:
(31) |
where,
(32) |
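From the membership functions in Eq. 2 and 3, the α-cut of a TFN (a1, a2, a3) is the closed interval [a1 + α(a2 − a1), a3 − α(a3 − a2)], and that of a TrFN (a1, a2, a3, a4) is [a1 + α(a2 − a1), a4 − α(a4 − a3)]. The Python sketch below computes only these intervals; it does not reproduce how Eq. 29 to 32 collapse an interval into a single crisp allocation. The example inputs are the budgets and α values used in the numerical illustration.

```python
from typing import Tuple


def tfn_alpha_cut(a: Tuple[float, float, float], alpha: float) -> Tuple[float, float]:
    """Closed interval [a1 + alpha(a2 - a1), a3 - alpha(a3 - a2)] of a TFN."""
    a1, a2, a3 = a
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return (a1 + alpha * (a2 - a1), a3 - alpha * (a3 - a2))


def trfn_alpha_cut(a: Tuple[float, float, float, float], alpha: float) -> Tuple[float, float]:
    """Closed interval [a1 + alpha(a2 - a1), a4 - alpha(a4 - a3)] of a TrFN."""
    a1, a2, a3, a4 = a
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return (a1 + alpha * (a2 - a1), a4 - alpha * (a4 - a3))


if __name__ == "__main__":
    print(tfn_alpha_cut((3500.0, 4000.0, 4800.0), 0.5))            # (3750.0, 4400.0)
    print(trfn_alpha_cut((3500.0, 4000.0, 4400.0, 4600.0), 0.55))   # approximately (3775, 4490)
```

At α = 1 the TFN interval shrinks to the single point a2, and the TrFN interval to the core [a2, a3].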
Fig. 1: Triangular fuzzy number with an α-cut
The allocations obtained from Eq. 30 and 32 provide the solutions to NLPPs (13) and (17) if they satisfy the restrictions 1 ≤ ni ≤ Ni, i = 1, 2, ..., k. The allocations obtained from Eq. 30 and 32 may not be integers; to obtain integer allocations, they are rounded off to the nearest integer values. After rounding off, one must recheck that the rounded values still satisfy the cost constraint. We now discuss equal and proportional allocations.
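The rounding step, followed by the recheck of the cost constraint, can be sketched as follows. The helper `round_with_cost_check` is hypothetical: after rounding each allocation to the nearest integer in [1, Ni], it repairs any cost overrun with a greedy rule (remove one unit from the stratum whose removal raises $\sum_i A_i/n_i$ least per unit of cost saved). This repair rule is an assumption for illustration, not the paper's method; `budget` stands for C − C0, and the example values are made up.

```python
from typing import List


def round_with_cost_check(n_cont: List[float], A: List[float], c: List[float],
                          budget: float, N: List[int]) -> List[int]:
    """Round to nearest integers within [1, N_i], then repair any cost overrun (greedy)."""
    # Note: Python's round() sends exact halves to the nearest even integer.
    n = [min(max(1, round(x)), Ni) for x, Ni in zip(n_cont, N)]

    def cost(m: List[int]) -> float:
        return sum(ci * mi for ci, mi in zip(c, m))

    while cost(n) > budget:
        candidates = [i for i in range(len(n)) if n[i] > 1]
        if not candidates:
            raise ValueError("cost constraint cannot be met with n_i >= 1")
        # variance increase per unit of cost saved when stratum j loses one unit
        i = min(candidates, key=lambda j: (A[j] / (n[j] - 1) - A[j] / n[j]) / c[j])
        n[i] -= 1
    return n


if __name__ == "__main__":
    # hypothetical inputs: plain rounding gives [151, 234], which costs 1004 > 1000
    print(round_with_cost_check([150.6, 233.7], [0.01, 0.02], [2.0, 3.0], 1000.0, [400, 400]))
```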
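Equal allocation: In this method, the total sample size is divided equally among all the strata, that is, for the ith stratum:
$$n_i = \frac{n}{k}, \tag{33}$$
where n can be obtained from the cost constraint equation as follows:
$$C_0 + \sum_{i=1}^{k} c_i n_i = C \tag{34}$$
Or:
$$\sum_{i=1}^{k} c_i n_i = C - C_0 \tag{35}$$
Now substituting the value of ni in Eq. 34, we get:
$$n = \frac{k\,(C - C_0)}{\sum_{i=1}^{k} c_i}. \tag{36}$$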
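Equal allocation under the cost equation reduces to two lines: $n = k(C - C_0)/\sum_i c_i$ and $n_i = n/k$. The sketch below uses crisp, hypothetical costs and returns continuous sizes that exhaust the budget C − C0.

```python
from typing import List


def equal_allocation(c: List[float], C: float, C0: float) -> List[float]:
    """n = k (C - C0) / sum_i c_i (Eq. 36), then n_i = n / k (Eq. 33)."""
    k = len(c)
    n = k * (C - C0) / sum(c)
    return [n / k] * k


if __name__ == "__main__":
    print(equal_allocation([2.0, 3.0], 1000.0, 0.0))   # [200.0, 200.0] (hypothetical costs)
```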
Fig. 2: Trapezoidal fuzzy number with an α-cut
Proportional allocation: This allocation was originally proposed by Bowley23. This procedure is very common in practice because of its simplicity. When no information other than Ni, the total number of units in the ith stratum, is available, the allocation of a given sample of size n to the different strata is made in proportion to their sizes, that is, in the ith stratum:
$$n_i = n\,\frac{N_i}{N} = n\,w_i. \tag{37}$$
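Proportional allocation can be sketched in the same way. The source does not state how n is fixed here; the sketch assumes, as for equal allocation, that n is determined by the cost equation (34), which gives $n = (C - C_0)/\sum_i w_i c_i$. The stratum sizes and costs are hypothetical.

```python
from typing import List


def proportional_allocation(N: List[int], c: List[float], C: float, C0: float) -> List[float]:
    """n_i = n N_i / N (Eq. 37); n is assumed fixed by the cost equation: (C - C0) / sum_i w_i c_i."""
    total = sum(N)
    w = [Ni / total for Ni in N]                     # stratum weights w_i = N_i / N
    n = (C - C0) / sum(wi * ci for wi, ci in zip(w, c))
    return [n * wi for wi in w]


if __name__ == "__main__":
    print(proportional_allocation([600, 400], [2.0, 3.0], 1000.0, 0.0))   # about [250, 166.7]
```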
NUMERICAL ILLUSTRATION
A hypothetical example is given to illustrate the computational details of the proposed problem. Suppose the population size is 1000 and the total available budget of the survey is (3500, 4000, 4800) units as a TFN and (3500, 4000, 4400, 4600) units as a TrFN. The other relevant information is given in Table 1. Using the values in Table 1, the values of Ai were computed; they are given in Table 2.
After substituting all the values from Tables 1 and 2 in Eq. 13, the required fuzzy nonlinear programming problem (FNLPP) is:
(38) |
The required optimum allocations for problem (13), obtained by substituting the values from Tables 1 and 2 in Eq. 30 at α = 0.5, are:
Similarly, the optimum allocations for problem (17), obtained by substituting the values from Tables 1 and 2 in Eq. 32 at α = 0.55, are:
The values of Xi, Ai and
Fig. 3: Various nodes of NLPP
Table 1: Stratified population with two strata
Table 2: Calculated values of Ai and
Table 3: Calculated values of optimum allocation and variance
Applying the α-cut and LMT, the optimum allocations are obtained and summarized in Table 3 for both cases, i.e., the TFN case and the TrFN case, together with the corresponding variances24-25:
Case-I:
Solving the above minimization problem, we get the optimal solution n1 = 300, n2 = 177.778 with optimal value V(S) = 0.0008320149.
Since n1 and n2 are required to be integers, we branch problem R1 into two subproblems R2 and R3 by introducing the constraints n2 ≤ 177 and n2 ≥ 178, respectively, which yield n1 = 300, n2 = 177 and n1 = 296, n2 = 178. Hence the solution is treated as optimal. The optimal solution is n1 = 296 and n2 = 178, with optimal value Minimize V (
Case-II:
Solving the above minimization problem, we get the optimal solution n1 = 240.86, n2 = 175.91 with optimal value Minimize V (
Since n1 and n2 are required to be integers, problem R1 is further branched into subproblems R2, R3, R4 and R5 with the additional constraints n1 ≤ 240, n1 ≥ 241, n2 ≤ 175 and n2 ≥ 176, respectively. Problems R2, R4 and R5 stand fathomed, as the optimal solution in each case is integral in n1 and n2. Problem R3 has been further branched into subproblems R6 and R7 with the additional constraints n1 ≤ 175 and n1 ≥ 176, respectively, which suggests that R6 is fathomed and R7 has no feasible solution. The optimal solution is n1 = 240 and n2 = 136, with optimal value Minimize V (
Case-III:
Solving the above minimization problem, we get the optimal solution n1 = 180.25, n2 = 170 with optimal value Minimize V (
Fig. 4: Various nodes of NLPP
Fig. 5: Various nodes of NLPP
Since n1 and n2 are required to be integers, problem R1 is further branched into subproblems R2 and R3 with the additional constraints n1 ≤ 180 and n1 ≥ 181, respectively. Problem R2 stands fathomed, as its optimal solution is integral in n1 and n2. Problem R3 has been further branched into subproblems R4 and R5 with the additional constraints n2 ≤ 169 and n2 ≥ 170, respectively. R4 is fathomed and R5 has no feasible solution. Hence the solution is treated as optimal. The optimal solution is n1 = 180.25 and n2 = 170, with optimal value Minimize V (
Across the three cases, the optimal solution is found to be n1 = 296 and n2 = 178, with optimal value Minimize V (
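For two strata, integer results from branch-and-bound or LINGO can be cross-checked by a simple exhaustive search. For a fixed n1 the objective $A_1/n_1 + A_2/n_2$ decreases in n2, so the best n2 is the largest one that fits the budget; scanning n1 therefore suffices. The Python sketch below is not the branch-and-bound procedure used above. It assumes crisp costs, a crisp measurement budget (C − C0) and hypothetical inputs, since the values in Tables 1 and 2 are not reproduced here.

```python
import math
from typing import List, Tuple


def best_integer_allocation_2strata(A: List[float], c: List[float], budget: float,
                                    N: List[int]) -> Tuple[int, int, float]:
    """Exhaustive search for min A1/n1 + A2/n2 s.t. c1 n1 + c2 n2 <= budget, 1 <= n_i <= N_i."""
    best = None
    for n1 in range(1, N[0] + 1):
        remaining = budget - c[0] * n1
        if remaining < c[1]:                     # not even n2 = 1 fits; larger n1 won't either
            break
        n2 = min(N[1], math.floor(remaining / c[1]))
        while n2 >= 1 and c[0] * n1 + c[1] * n2 > budget:   # guard against float round-off
            n2 -= 1
        if n2 < 1:
            continue
        v = A[0] / n1 + A[1] / n2
        if best is None or v < best[2]:
            best = (n1, n2, v)
    if best is None:
        raise ValueError("no feasible integer allocation")
    return best


if __name__ == "__main__":
    # hypothetical inputs: A_i, per-unit costs, measurement budget C - C0, stratum sizes N_i
    print(best_integer_allocation_2strata([0.01, 0.02], [2.0, 3.0], 100.0, [40, 40]))
```

The scan costs O(N1) evaluations; for more than two strata the same idea no longer applies directly, which is where a branch-and-bound solver becomes the practical choice.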
DISCUSSION
A stratified randomized response method helps to overcome a limitation of randomized response, namely the loss of the individual characteristics of the respondents. The optimum allocation problem for two-stage stratified random sampling based on the Singh and Tarray7 model with fuzzy costs is formulated as a fuzzy nonlinear programming problem. The problem is then solved using the Lagrange multipliers technique to obtain the optimum allocation. The optimum allocation, obtained in the form of fuzzy numbers, is converted into an equivalent crisp number using the α-cut method at a prescribed value of α.
For practical purposes we need integer sample sizes. Therefore, instead of rounding off the continuous solution, we formulated the problem as a fuzzy integer nonlinear programming problem and obtained the integer solution using the LINGO software.