INTRODUCTION
Collecting accurate information about the behavior and composition of hidden social groups, especially those at high risk of transmitted diseases such as HIV/AIDS, is a vital concern for policy makers around the world. Well-known examples of such populations are injection drug users (IDU)^{1-3}, sex workers^{4} and men who have sex with men (MSM)^{5}.
Conventional sampling methods cannot give a good picture of these populations, which influence the public health of society. These methods require a known probability of selection, which means having a sampling frame that covers all members of the population. This information mostly does not exist^{6}. One efficient way of collecting information about hidden populations is institutional sampling. Sampling the IDU population by this method is restricted, for example, to those attending a drug rehabilitation program. Consequently, inferences from the resulting sample are not statistically valid^{7}.
Two other common approaches to sampling these populations are targeted sampling and time-space sampling. Both treat the members of the hidden population as discrete units and do not use the population's network relationships to improve estimation. Targeted sampling, or street outreach, mostly cannot reach a large number of non-institutional members of the hidden population^{8}. The sample is not selected randomly and the selection probability is unknown. Time-space sampling was introduced to select samples with known probability at identified venues and thereby ease inference^{9}. Safety and cost concerns make some venues inaccessible to researchers, so the resulting sample is not representative of the target population^{10}. Stueve et al.^{10} noted that this coverage problem introduces an unknown bias into the hidden population estimates.
Another common approach is chain-referral sampling, which can successfully penetrate hidden populations and recruit their members, but the resulting estimates are statistically invalid. These methods have been called non-probability methods because the probability of sample selection is unknown^{11}. One of the most widely applied chain-referral methods is snowball sampling, which was first introduced by Goodman^{12}.
A newer sampling method, called Respondent Driven Sampling (RDS), was introduced by Heckathorn^{13,14}. It is a modified form of snowball sampling. In the two decades since RDS was introduced, many researchers in different fields of science, such as health and demography, have studied and applied this sampling method^{15-21}.

Fig. 1:  Recruitment process in RDS method 
The RDS sample is gathered in the same way as in any chain-referral sampling method. Figure 1 presents the sampling process of the RDS method. In this method, sampling starts from initial members, which are called seeds. Seeds are nonrandom members of the sample that are selected by the researchers from the hidden population of interest. They are usually members with larger social networks than the others. The seeds, acting as recruiters, recruit from their social networks and introduce new members (recruits) to the sample. These recruits become recruiters in turn and recruit new members, so that ties are formed. This procedure continues until the desired sample size is achieved.
The RDS is an indirect method of estimating population characteristics from a sample. The sample is used to estimate social network connections, and this information is then used to make asymptotically unbiased population proportion estimates^{6}. The bias of these estimates is on the order of the inverse of the sample size, so it is negligible for samples of meaningful size^{22}.
To find population estimators from RDS samples, there are different approaches for dichotomous variables (for example, HIV^{+} and HIV^{-}) and nondichotomous variables (for example, age categories such as 20, 30 and 30-40). The main aim of this study is to review the estimator for dichotomous variables and to describe the data smoothing estimator for nondichotomous variables.
MATERIALS AND METHODS
The RDS recruitment network reflects pre-existing social relationships that link recruiters and recruits. These relationships are reciprocal. Hence T_{ab}, the number of ties from group A to group B, equals T_{ba}, the number of ties from group B to group A, as shown in Eq. 1:
T_{ab} = T_{ba}  (1)
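The number of such cross-cutting ties depends on three factors, the number of members of group A (N_{a}), their mean degree or personal network size (D_{a}) and the proportion of their ties that lead to members of group B (S_{ab}), as in Eq. 2:
T_{ab} = N_{a}D_{a}S_{ab}  (2)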
The same result holds for T_{ba}. Substituting these two expressions into Eq. 1 and dividing both sides by the population size, N, gives Eq. 3:
P_{a}D_{a}S_{ab} = P_{b}D_{b}S_{ba}  (3) 
By substituting (1-P_{a}) for P_{b}, Eq. 3 yields an estimate of the proportional size of group A, P_{a}, based on the reciprocity model, given in Eq. 4:
P_{a} = D_{b}S_{ba}/(D_{a}S_{ab}+D_{b}S_{ba})  (4)
The same result can be derived for P_{b}. Equation 4 estimates the proportional size of a group in the hidden population from the transition probabilities, which are based on recruitment patterns, and the self-reported personal network sizes^{14,23}.
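The two-group estimator can be sketched in a few lines of Python. This is a minimal illustration, assuming the closed form P_a = D_b S_ba / (D_a S_ab + D_b S_ba) that follows from Eq. 3 once P_b is replaced by 1 - P_a. The function name and the numerical inputs are hypothetical and are not taken from the study.

```python
def reciprocity_estimate(d_a, d_b, s_ab, s_ba):
    """Return (P_a, P_b) for a two-group system under the reciprocity model.

    d_a, d_b   : estimated mean degrees of groups A and B
    s_ab, s_ba : estimated cross-group transition probabilities
    """
    denom = d_a * s_ab + d_b * s_ba
    if denom <= 0:
        # Without cross-group recruitment the proportions are not identified.
        raise ValueError("no cross-group ties: proportions are not identified")
    p_a = d_b * s_ba / denom
    return p_a, 1.0 - p_a


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
p_a, p_b = reciprocity_estimate(d_a=10.0, d_b=5.0, s_ab=0.3, s_ba=0.2)
print(p_a, p_b)
```

With these inputs both sides of Eq. 3 agree, since P_a D_a S_ab and P_b D_b S_ba take the same value, which is a quick consistency check on any pair of estimates.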
To obtain unbiased estimators of the population proportions, the transition probabilities and the personal network sizes must be estimated. A recruitment matrix, R, where R_{ab} is the number of recruitments of members of group B by members of group A, can be written with one row per recruiter group and one column per recruit group, as in Eq. 5:
R = [R_{aa}  R_{ab}; R_{ba}  R_{bb}]  (5)
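According to this matrix, the unbiased estimators of S_{ab} and S_{ba} can be computed as in Eq. 6:
S_{ab} = R_{ab}/RB_{a}, S_{ba} = R_{ba}/RB_{b}  (6)
where RB_{a} = R_{aa}+R_{ab} and RB_{b} = R_{bb}+R_{ba}. The second element of the RDS population proportion estimator is the mean degree of each group, which can be estimated following Salganik and Heckathorn^{6} as in Eq. 7:
D_{a} = n_{a}/Σ_{i=1}^{n_{a}}(1/d_{i})  (7)
where n_{a} is the number of sample members in group A with a reported degree and d_{i} is the self-reported degree of respondent i.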
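The estimators in Eq. 6 and 7 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation. The function names, the example counts and the example degrees are hypothetical, and the degree estimator is assumed to be the harmonic-mean form, n_a divided by the sum of reciprocal degrees, with missing degrees dropped.

```python
def transition_probabilities(r):
    """Row-normalise a recruitment matrix: S[a][b] = R[a][b] / RB[a]."""
    s = []
    for row in r:
        rb = sum(row)  # RB: recruitments made by this group
        if rb == 0:
            raise ValueError("a group made no recruitments")
        s.append([count / rb for count in row])
    return s


def mean_degree(degrees):
    """Harmonic-mean degree estimate; None marks a missing degree."""
    reported = [d for d in degrees if d is not None and d > 0]
    if not reported:
        raise ValueError("no usable degree data")
    return len(reported) / sum(1.0 / d for d in reported)


# Hypothetical two-group recruitment counts (rows: recruiter, columns: recruit).
r_example = [[6, 2],
             [3, 9]]
s_example = transition_probabilities(r_example)

# Hypothetical self-reported degrees with one missing value.
d_example = mean_degree([2, 4, 4, None])
print(s_example, d_example)
```

The harmonic form gives less weight to respondents with large networks, who are more likely to be recruited in the first place.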
The RDS is based on Markov chain theory, so the recruitment process in RDS has the following characteristics.
A memoryless process: The characteristics of each recruit depend only on those of his or her own recruiter. Heckathorn^{13} termed this a first-order Markov process.
An ergodic process: A recruiter with one set of characteristics can recruit a subject with the same or different characteristics, so that after one or more recruitment waves a recruit can have the same characteristics as an earlier recruiter.
To reduce the bias caused by the nonrandom selection of seeds, recruitment should continue until equilibrium is reached. To compute the equilibrium analytically, the law of large numbers for Markov chains can be applied. The equilibrium for a system of two groups, A and B, can be defined as in Eq. 8:
E_{a} = S_{aa}E_{a}+S_{ba}E_{b}
1 = E_{a}+E_{b}  (8)
Solving the system in Eq. 8 gives Eq. 9:
E_{a} = S_{ba}/(1-S_{aa}+S_{ba}) = S_{ba}/(S_{ab}+S_{ba})  (9)
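The equilibrium can also be found numerically by following the sample composition wave by wave until it stops changing. The Python sketch below is an illustration under stated assumptions: the transition matrix is hypothetical, the function name and its parameters are not from the source, and the chain is assumed to be ergodic and aperiodic so that the iteration converges.

```python
def equilibrium(s, start=None, tol=1e-12, max_waves=10_000):
    """Iterate E <- E S until the composition is stable.

    s     : transition probabilities, rows sum to one
    start : composition of the seeds (defaults to equal shares)
    """
    n = len(s)
    e = list(start) if start is not None else [1.0 / n] * n
    for _ in range(max_waves):
        nxt = [sum(e[a] * s[a][b] for a in range(n)) for b in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, e)) < tol:
            return nxt
        e = nxt
    raise RuntimeError("composition did not stabilise")


# Hypothetical transition probabilities for groups A and B.
s_two = [[0.8, 0.2],
         [0.4, 0.6]]
e_uniform = equilibrium(s_two)
e_seeded = equilibrium(s_two, start=[0.0, 1.0])  # all seeds drawn from group B
closed_form = s_two[1][0] / (s_two[0][1] + s_two[1][0])
print(e_uniform, e_seeded, closed_form)
```

Both starting compositions lead to the same equilibrium, which matches the two-group closed form S_ba / (S_ab + S_ba). This is why continuing recruitment over enough waves reduces the influence of the seeds.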
Population proportion estimator for nondichotomous variables: The reciprocity model for a system with N groups can be expressed as a system of equations, in which the first equation states that the population proportions of the groups sum to one. Each of the other equations expresses the reciprocity principle for one pair of groups, as shown in Eq. 10 and 11 for a four-group system of A, B, C and D:
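1 = P_{a}+P_{b}+P_{c}+P_{d}  (10)
P_{a}D_{a}S_{ab} = P_{b}D_{b}S_{ba}
P_{a}D_{a}S_{ac} = P_{c}D_{c}S_{ca}
P_{a}D_{a}S_{ad} = P_{d}D_{d}S_{da}
P_{b}D_{b}S_{bc} = P_{c}D_{c}S_{cb}
P_{b}D_{b}S_{bd} = P_{d}D_{d}S_{db}
P_{c}D_{c}S_{cd} = P_{d}D_{d}S_{dc}  (11)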
This system of equations is overdetermined because it contains more equations than unknowns. This problem arises for nondichotomous variables (any variable with three or more categories). There are two different approaches for dealing with it. Linear least squares is a standard method for solving such systems and follows the same logic as linear regression^{24}. The other approach, which is explained in this study, is an alternative method for calculating the population proportion estimates that is drawn from the same logic as the reciprocity model.
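To make the overdetermination concrete, the sketch below builds the sum-to-one equation and one reciprocity equation per pair of groups, then solves the system by ordinary least squares through the normal equations. It is one plausible unweighted formulation, assumed here for illustration, since the source does not specify how the least squares problem is set up. The function names and the three-group inputs are hypothetical, and the inputs were constructed to satisfy reciprocity exactly.

```python
from itertools import combinations


def solve_linear(m, v):
    """Solve the square system m x = v by Gauss-Jordan elimination with pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(n):
            if i != col:
                f = aug[i][col] / aug[col][col]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares_proportions(s, d):
    """Return (P, number of equations) for transition matrix s and mean degrees d."""
    n = len(d)
    rows, rhs = [[1.0] * n], [1.0]            # proportions sum to one
    for a, b in combinations(range(n), 2):    # one reciprocity equation per pair
        row = [0.0] * n
        row[a] = d[a] * s[a][b]
        row[b] = -d[b] * s[b][a]
        rows.append(row)
        rhs.append(0.0)
    # Normal equations: (A^T A) p = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(n)]
    return solve_linear(ata, atb), len(rows)


# Hypothetical three-group inputs that satisfy reciprocity exactly.
s_three = [[0.5, 0.3, 0.2],
           [0.4, 0.4, 0.2],
           [0.2, 0.15, 0.65]]
d_three = [4.0, 5.0, 10.0]
p_ls, n_equations = least_squares_proportions(s_three, d_three)
print(p_ls, n_equations)
```

With three groups there are four equations for three unknowns, and with four groups there are seven equations for four unknowns. With real data the pairwise equations generally conflict, and least squares returns a compromise that may need renormalising.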
Data smoothing approach: Data smoothing was introduced by Heckathorn^{14} as a solution to the problem of overdetermination in finding population proportion estimates for nondichotomous variables. The main idea, which comes from the reciprocity model, is that for each group the number of times its members are recruited, RO, equals the number of recruitments its members make, RB. If recruitment from personal networks is random, cross-group recruitments will be equal for each pair of groups. The best estimate of the number of cross-recruitments between each pair of groups is therefore the mean of the recruitments in each direction. In this way, the problem of overdetermination is solved by reducing the number of terms from which the population estimates are calculated. This approach results in more efficient estimates than the linear least squares approach^{25}. For dichotomous variables the point estimates are not affected by this approach, although it can affect the variance estimates^{25}. The approach consists of the following steps.
Ranking or demographic adjustment step: The recruitment matrix is transformed under two conditions, that the recruitment pattern does not change and that the row and column sums of the recruitment matrix are equal (for any group, RO = RB). Each element of the transformed recruitment matrix, R*_{ab}, is the product of three terms, the selection proportion, the equilibrium of the recruiter's group and the total number of recruitments (RB). For a system with N groups, the elements of the adjusted recruitment matrix R* are given in Eq. 12:
R*_{ab} = S_{ab}E_{a}RB  (12)
So, for any group such as A, the row sum and the column sum of R* are equal, RB*_{a} = RO*_{a} = E_{a}RB.
Smoothing step: The best estimates of the demographically adjusted recruitment counts are the smoothed counts, which are calculated as the mean of the adjusted counts in the two directions for each pair of groups. In this way, the smoothed matrix, R**, is found as in Eq. 13:
R**_{ab} = R**_{ba} = (R*_{ab}+R*_{ba})/2  (13)
The matrix R** becomes the basis for all the other calculations, and the smoothed population proportion estimates are computed from it.
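The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The recruitment counts and mean degrees are hypothetical, the equilibrium is obtained by iterating recruitment waves, and the final proportions use the closed form P_a proportional to RB**_a / D_a. That closed form is derived here from the pairwise reciprocity equations, which become mutually consistent once R** is symmetric, and it is not quoted from the source.

```python
def smooth_recruitment(r, degrees, tol=1e-12, max_waves=100_000):
    """Return (R*, R**, S**, P) for recruitment counts r and group mean degrees."""
    n = len(r)
    total = sum(sum(row) for row in r)               # total recruitments, RB
    s = [[c / sum(row) for c in row] for row in r]   # selection proportions S
    e = [1.0 / n] * n                                # equilibrium by wave iteration
    for _ in range(max_waves):
        nxt = [sum(e[a] * s[a][b] for a in range(n)) for b in range(n)]
        converged = max(abs(x - y) for x, y in zip(nxt, e)) < tol
        e = nxt
        if converged:
            break
    # Demographic adjustment: R*_ab = S_ab * E_a * RB
    r_adj = [[s[a][b] * e[a] * total for b in range(n)] for a in range(n)]
    # Smoothing: average the adjusted counts in the two directions
    r_smooth = [[(r_adj[a][b] + r_adj[b][a]) / 2 for b in range(n)]
                for a in range(n)]
    # Recalculate selection proportions from the smoothed matrix
    s_smooth = [[c / sum(row) for c in row] for row in r_smooth]
    # Assumed closed form: P_a proportional to (row sum of R**) / D_a
    weights = [sum(r_smooth[a]) / degrees[a] for a in range(n)]
    p = [w / sum(weights) for w in weights]
    return r_adj, r_smooth, s_smooth, p


# Hypothetical three-group counts (rows: recruiter, columns: recruit) and degrees.
counts = [[5, 3, 2],
          [2, 6, 2],
          [1, 4, 5]]
mean_degrees = [3.0, 5.0, 4.0]
r_adj, r_smooth, s_smooth, p = smooth_recruitment(counts, mean_degrees)
print(p)
```

After smoothing, every pairwise reciprocity equation holds with the same proportions, so the system is no longer overdetermined.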
RESULTS
In this study, the population proportion estimates are calculated for a hypothetical population made up of two groups, injection drug users (IDU) who are HIV positive (HIV^{+}), group A, and IDU who are HIV negative (HIV^{-}), group B. Table 1 presents this population, which consists of 30 cases. In Table 1, ID is the respondent's identification number, IDR is the identification number of the respondent's recruiter, D is the respondent's self-reported degree, V is a dichotomous variable (IDU with HIV^{+} or IDU with HIV^{-}) and U is a nondichotomous variable (educational level: (1) <Diploma, (2) Diploma and bachelor and (3) >Bachelor).
Respondent driven sampling estimates for dichotomous variables: The population proportion estimates for the two groups, IDU with HIV^{+} and IDU with HIV^{-}, are computed by the RDS estimators. According to Table 1, the first respondent is the seed and therefore has no recruiter. Note that the degree data for cases 2, 16 and 26 are missing.
Table 1:  Characteristics of hypothetical population 

Table 2:  Recruitment matrix of IDU with HIV^{+} and IDU with HIV^{-} groups 

Table 2 presents the recruitment matrix of the data in Table 1. As shown in Table 1, the first respondent (the seed), a member of the group IDU with HIV^{+}, recruited respondents 2, 3 and 4. Respondents 2 and 4 are members of the group IDU with HIV^{+} and respondent 3 is a member of the group IDU with HIV^{-}. The transition probabilities can be computed from this recruitment pattern. Table 2 also contains the recruitment proportions, which are computed from the recruitment counts.
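Tallying a recruitment matrix from records in the ID, IDR and V layout of Table 1 can be sketched as below. Only the first four records follow the recruitment pattern described in the text (seed 1 in group A recruiting respondents 2, 3 and 4). Records 5 and 6, the group labels "A" and "B" and the function name are hypothetical additions for illustration, so the resulting counts are not those of Table 2.

```python
def recruitment_matrix(records, groups):
    """Count recruitments by recruiter group (rows) and recruit group (columns).

    records : iterable of (ID, IDR, group); IDR is None for a seed
    groups  : ordered list of group labels
    """
    group_of = {rid: g for rid, _, g in records}
    index = {g: i for i, g in enumerate(groups)}
    r = [[0] * len(groups) for _ in groups]
    for rid, recruiter, g in records:
        if recruiter is None:  # seeds were not recruited by anyone
            continue
        r[index[group_of[recruiter]]][index[g]] += 1
    return r


records = [
    (1, None, "A"),  # seed, IDU with HIV+
    (2, 1, "A"),
    (3, 1, "B"),
    (4, 1, "A"),
    (5, 3, "B"),     # hypothetical record
    (6, 3, "A"),     # hypothetical record
]
matrix = recruitment_matrix(records, ["A", "B"])
proportions = [[c / sum(row) for c in row] for row in matrix]
print(matrix, proportions)
```

Dividing each row by its total gives the recruitment proportions, which serve as the estimated transition probabilities.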
The size of each group is found by excluding seeds and members with missing degree data, so n_{a} = 17-1-2 = 14 and n_{b} = 13-1 = 12. The estimated degrees for these two groups according to Eq. 7 are:
The population proportion estimates for these two groups are shown in Eq. 14:
Table 3:  Recruitment matrix of three educational levels 

Table 4:  Selection probabilities of three educational levels 

Table 5:  Equilibrium of three educational levels 

The equilibria for these two groups are calculated from the transition probabilities, as given in Eq. 15:
Respondent driven sampling estimates for nondichotomous variables: In this study, the data smoothing procedure is examined for deriving population proportion estimates for variables with three or more categories. Table 3 presents the recruitment matrix by educational level. Table 4 presents the selection probabilities for this variable.
According to the selection probabilities in Table 4, solving the following equations gives the equilibrium of the three levels of education, which is shown in Table 5:
Table 6:  Demographically adjusted matrix of three educational levels 

Table 7:  Data smoothed recruitment matrix of three educational levels 

Table 8:  Data smoothed selection probabilities of three educational levels 

To find the demographically adjusted matrix presented in Table 6, which corresponds to the adjusted matrix in Eq. 12, each cell is computed as the product of the selection probability in Table 4, the equilibrium in Table 5 and the total number of recruitments (RB).
Table 7 presents the smoothed recruitment matrix, which is obtained according to Eq. 13 by averaging the cross-recruitment counts. This matrix is then used to recalculate all the other terms, such as the data-smoothed selection probabilities in Table 8 and the estimated degrees:
From the estimated degrees and the smoothed selection probabilities, the following equations are obtained:
So, the population proportion estimates for three educational levels are equal to .
DISCUSSION
Most current studies of hidden populations yield samples that are collected with great effort but cannot be generalized to the populations of interest^{6,15-21,26,27}. Consequently, only descriptive statistics can be drawn from these samples and no statistical inference is possible^{13,14}. This can also lead researchers to misleading conclusions^{6}. The RDS method, which was introduced by Heckathorn^{13,14} and is reviewed in this study, can solve this problem. However, if the expected RDS procedure cannot be followed, it will not yield more efficient estimators than other chain-referral sampling methods^{28}. The RDS yields asymptotically unbiased population proportion estimates when the theoretical and analytical assumptions of the method are met^{6}. When nonsampling errors exist, such as nonrandom selection of seeds, failure to reach equilibrium on the most important variables of interest and homophily, which is nonrandom recruitment from social networks, RDS will not yield unbiased estimators^{6,13,14}. Moreover, under these conditions no statistical inferences can be drawn from RDS samples.
For estimating nondichotomous variables in the RDS method, the reciprocity approach introduced by Salganik and Heckathorn^{6} and Heckathorn^{14} results in an overdetermined system of equations, which can be solved by either least squares or data smoothing. The data smoothing approach has been reviewed and applied to hypothetical data in this study. This method has been claimed to be much more effective than least squares in calculating unbiased estimators^{6,13,14}.
CONCLUSION
The RDS sample can yield asymptotically unbiased estimates of population proportions by collecting information about social networks. By applying the theory behind the reciprocity model, RDS yields population proportion estimates for dichotomous and nondichotomous variables. The main purpose of this study was to review data smoothing, an effective method for overcoming the overdetermination problem in the nondichotomous case. Hypothetical data were used to demonstrate this method for finding population proportion estimates for nondichotomous variables.
ACKNOWLEDGMENTS
The authors would like to acknowledge that this article is extracted from a survey entitled "Respondent Driven Sampling Statistical Inferences to Estimate Demographical Parameters", which was supported by the National Population Studies and Comprehensive Management Institute in 2015 under registration number 20/18616.