INTRODUCTION
Developments in database and networking technologies in the past few decades led to advances in distributed database systems. A DDS is a collection of sites connected by a communication network, in which each site is a database system in its own right, but the sites have agreed to work together, so that a user at any site can access data anywhere in the network exactly as if the data were all stored at the user=s own site (Date, 1990; Özsu and Valduriez, 1991).
The primary concern of a DDS is to design the fragmentation and allocation of the underlying database. Fragmentation unit can be a file where allocation issue becomes the file allocation problem. File allocation problem is studied extensively in the literature, started by Chu (1969) and continued for nonreplicated and replicated models (Apers, 1988; Casey, 1972; Grapa and Belford, 1977; Mahmoud and Riordan 1976; Morgan and Levin, 1977; Ramamoorthy and Wah, 1983; Whitney, 1970). Some studies considered dynamic file allocation (Ames, 1977; Smith, 1981; Wah, 1979; Wang and Chen, 2005; Ahn and Kim, 2005).
Data allocation problem was introduced when Eswaran (1974) first proposed the
data fragmentation. Studies on vertical fragmentation (Babad, 1977; Ceri
et al., 1989; Hoffer, 1976; Navathe et al., 1984), horizontal fragmentation
(Ceri et al., 1983) and mixed fragmentation (Chang and Cheng, 1980; Cheng
et al., 2002; March, 1983; Sacca and Wieder hold, 1985; Sacco, 1986;
Zhang and Orlowska, 1994) were conducted. The allocation of the fragments is
also studied extensively (Ahmad et al., 2002; Apers, 1988; Bakker, 2000;
Chang, 2002; Kwok et al., 1996; So et al., 1999; Zhou et al.,
1999; Gorawski et al., 2005).
In these studies, data allocation has been proposed prior to the design of a database depending on some static data access patterns and/or static query patterns. In a static environment where the access probabilities of nodes to the fragments never change, a static allocation of fragments provides the best solution. However, in a dynamic environment where these probabilities change over time, the static allocation solution would degrade the database performance. Initial studies on dynamic data allocation give a framework for data redistribution (Wilson and Navathe, 1986) and demonstrate how to perform the redistribution process in minimum possible time (RiveraVega et al., 1990). In (Brunstrom et al., 1995), a dynamic data allocation algorithm for nonreplicated database systems is proposed, but no modeling is done to analyze the algorithm. Instead, the paper focused on load balancing issue.
This study proposes a new dynamic data allocation algorithm for nonreplicated distributed databases and analyzes the algorithm using a finitestate Markov chain. Present study is based on the research conducted by Ulus, (1999). In this study, horizontal, vertical or mixed fragmentation can be used. Allocation unit can even be as small as a record or an attribute.
THE THRESHOLD ALGORITHM
In some cases, due to extra storage space need, it could be very costly to
use the optimal algorithm (Ulus, 1999) in its original form. For a less costly
algorithm, the solution is to decrease the need for extra storage space. The
proposed threshold algorithm in this paper serves this purpose.
Let the number of nodes be n and let X_{s} denote the access probability of a node to a particular fragment. Suppose the fragment is stored in this particular node (i.e., it is the owner node). For the sake of simplicity, let X_{d} denote the access probability of all the other nodes to this particular fragment. The owner does local access, whereas the remaining nodes do remote access to the fragment.
The probability that the owner node does not access the fragment is (n1) X_{d}. The probability that the owner node does not perform two successive accesses is ((n1) X_{d})^{2}. Similarly, the probability that the owner node does not perform m successive accesses is ((n1) X_{d})^{m}. Therefore, the probability that the owner node performs at least one access of m successive accesses is 1((n1) X_{d})^{m}.
Table 1 shows the probabilities that the owner node performs at least one access out of m successive accesses, where x_{s} ranges from 0.1 through 0.9 and where m is 5, 10, 25, 50 and 100. The values in the table are truncated to five decimal digits.
According to the table, the probability that the owner node with the access
probability of 0.1 performs at least one access of ten successive accesses is
0.65132. It is trivial from the table that as the access probability of owner
node increases, so as the probability that at least one local access occurs
in m accesses.
Applying the same idea, a new threshold based algorithm (or threshold algorithm) can be proposed. In threshold algorithm, only one counter per fragment is stored. Figure 1 shows fragment I together with its counter. Comparing it to the optimal algorithm, this radically decreases the extra amount of storage space to just one value compared to an array of values in the optimal algorithm.
In the threshold algorithm, the initial value of the counter is zero. The counter
value is increased by one for each remote access to the fragment. It is reset
to zero for a local access. In other words, the counter always shows the number
of successive remote accesses.
Table 1: 
The probability that at least one local access occurs in m
accesses 


Fig. 1: 
Any fragment i in threshold algorithm 

Fig. 2: 
Threshold algorithm 
Whenever the counter exceeds a predetermined threshold value, the ownership
of the fragment is transferred to another node.
At this point, the critical question is which node will be the fragment's new owner. The algorithm gives very little information about the past accesses to the fragment. In fact, throughout the entire access history only the last node that accessed the fragment is known. So, there are two strategies to select the new owner. Either it is chosen randomly, or the last accessing node is chosen. In the former, the randomly chosen node could be one that has never accessed the fragment before. So picking the latter strategy is heuristically more reasonable.
Initially, all fragments are distributed to the nodes according to any method. A threshold value t is chosen. Afterwards, any node j, runs the threshold algorithm given in Fig. 2 for every fragment i, that it stores.
Threshold algorithm overcomes the volley of a fragment between two nodes provided that a threshold value greater than one is chosen. The algorithm guarantees the stay of the fragment for at least (t+1) accesses in the new node after a migration. In other words, it delays the migration of the fragment from any node for at least (t+1) accesses.
An important point in the algorithm is the choice of threshold value. This value will directly affect the mobility of the fragments. It is trivial that as the threshold value increases, the fragment will tend to stay more at a node; and as the threshold value decreases, the fragment will tend to visit more nodes.
Another point in the algorithm is the distribution of the access probabilities. If the access probabilities of all nodes for a particular fragment are equal, the fragment will visit all the nodes. The same applies for two nodes when there are two highest equal access probabilities.
MARKOV CHAIN MODEL OF THRESHOLD ALGORITHM
General Case: Let there be n nodes (n 0 Z^{+}), denoted by 0 through (n 1). Let the threshold value be t (t 0 Z c {0}). For simplicity, suppose the access probabilities of the nodes are discrete random variables. Assume the nodes have access probabilities X_{0} through X_{n1} for a particular fragment, subscripts showing the node index. The following is satisfied for the access probabilities where X_{i} 0 [0, 1] for all i = 0,…, n 1.
Figure 3 shows the finite state diagram of the system described.
In the diagram, two numbers determine the name of each state; first number denotes the node name where the fragment is currently stored, and the second number denotes the successive remote access counter. For example, when the system is in state 00, this means that the fragment is currently stored in node 0, and the current successive remote access counter is 0 (which implies that either last access performed on the fragment is local or the fragment has just migrated to node 0).
There are (t+1) states per node. In all these states, the fragment is stored in that particular node. These states correspond to the different values of successive remote access counter for the node.
The state transition probabilities are given next to each transition indicated
by the arrows. For example, for the state 00 there are several incoming and
outgoing transitions. One transition is both incoming and outgoing with a probability
of x_{0}. This transition implies that with a probability of x_{0},
node 0 accesses the fragment, and the counter, that is already zero, is reset
to zero and the fragment stays at node 0. As a result, the system does not change
a state. Besides this transition, there is only one outgoing transition to state
01 with a probability of (1x_{0}).

Fig. 3: 
Finitestate diagram of the system in general case 
This transition implies that with a probability of (1x_{0}) a remote
node accesses the fragment, and the counter is increased by one, to one. As
a result, the fragment still stays at node 0, but a state change from 00 to
01 takes place. Besides these two transitions, there are two groups of incoming
transitions all with a probability of x_{0}. One group of transitions
comes from the states 01 through 0t. A local access causes these transitions.
As a result of these transitions, the counter is reset to zero and the fragment
still stays in node 0, but it leads to a state change from the previous state
(01 through 0t) to 00. The other group of transitions comes from the states
0t through (n1)t. Before these transitions, the fragment is in a node other
than node 0 and the counter is t. The transition occurs when node 0 accesses
the fragment. As a result, the counter value exceeds the predetermined threshold
value and the fragment is transferred to the ownership of node 0. Hence a state
change from the previous state (0t through (n1)t) to 00 occurs.
Figure 3 shows a Markov chain due to its memory less property.
It is memory less because, for any state the system can enter, the next state
entered depends solely on the current state of the system. Furthermore, this
Markov chain has discretetime, finitestate, irreducible, aperiodic and recurrent
properties. It is discretetime, because the state transitions occur in discrete
times (when an access to the fragment is performed) and state transition duration
is negligible. It is finitestate, because the number of states is finite. It
is irreducible, because every state can be reached from every other state. This
Markov chain is aperiodic, because for every state, the entrance to the same
state is not periodic. This Markov chain is recurrent, because it is finitestate
and irreducible (Kleinrock, 1975).
Let π be a 1 by n probability vector whose elements π_{k}, show the steady state probability that the system is in state k.
Let P be the n by n state transition probability matrix whose elements p_{ij},
show the state transition probability from state i to state j.
Equation 1 defines the steady state of a discretetime, finitestate,
irreducible, aperiodic and recurrent Markov chain. Given the state transition
probability matrix P, the system determines the steadystate probability vector
π (Kleinrock, 1975).
Readjusting Eq. 1 and 2 is obtained.
are n equations and n unknowns. But since one of the equations is linearly
dependent on the others, one more equation is needed to solve the system (Kleinrock,
1975). Last equation is, the one that shows the summation of the steady state
probabilities, given by Eq. 3.
Replacing the first equation in Eq. 2 by Eq. 3 and 4 is obtained.
In Eq. 4, Q, π' and r are as follows.
These equations can be adapted to system in Fig. 3. For the
threshold algorithm model in general case of Fig. 3, let π_{g}
be the 1 by n probability vector and P_{g} be the n by n state transition
probability matrix. They are as follows.
Notice that π_{g} elements have two indices. First index is the
node name and the second index is the successive remote access counter. And
finally, Q_{g} is as follows.
After solving for π_{g} vector in Eq. 4, the
probabilities, that the fragment is in a particular node, are calculated as
follows for all i = 0,.., (n1) where O_{i} denotes the probability
that the fragment is in node i (here notice that the node names are used as
subscripts in calculation).
Since the number of unknowns, namely the equilibrium probabilities in the general case is very large, it is very hard to investigate the general case situation. For the sake of simplicity, a special case, that will decrease the number of unknowns to just two, will be examined.
Special case: Assume an n node DDS. Assume further that one particular node denoted by s has an access probability of x_{s} to a particular fragment of DDS. Suppose all the other nodes denoted by d_{1} through d_{n1} have the equal access probabilities of x_{d} to the same fragment. The following equation is satisfied for the access probabilities where xε[0,1] and x_{d}ε[0,1]
The finitestate diagram of this system is given in Fig. 6.
In the Fig. 4, states s0 through st corresponds to node s that has an access probability of x_{s} to the fragment. For the rest of the nodes d_{1} through d_{n1}, there are the states d_{i}0 through d_{i}t, (n1) of each.

Fig. 4: 
Finitestate diagram of the system in special case 
Lemma 1: For the system of Fig. 4, the steady state
probabilities of all nodes, except node s, corresponding to a particular threshold
value t, is equal. In other words,
where h shows any node index varying from 1 to (n1) and f shows any threshold
value varying from 0 to t.
Proof: (Ulus.,1999).
The finitestate diagram of the system after Lemma 1 is given in Fig. 5.
In Fig. 5, states s0 through st corresponds to node s that has an access probability of x_{s} to the fragment. For the rest of the nodes d_{1} through d_{n1}, there are the states d0 through dt as a corollary to Lemma 1.
For the threshold algorithm model of Fig. 7, let π_{m} be a 1 by n(t+1) steady state probability vector and let P_{m} be the n(t+1) by n(t+1) state transition probability matrix. They are as follows.
It can be easily seen that in π_{m} vector the elements π_{d0}
through π_{dt} repeat themselves (n1) times. The dimension of
the system can be decreased as shown in Lemma 2.
Lemma 2: Let π_{r} be a 1 by 2(t+1) steady state probability vector and let P_{r} be the 2(t+1) by 2(t+1) state transition probability matrix as shown below.
The system of equations given by π_{m}=π_{m} P_{m}
and π_{r}=π_{r}P_{r} are the same

Fig. 5: 
Simplified finitestate diagram of the system in special case 
Proof: (Ulus, 1999)
Theorem 1: Assume that the fragments of a DDS are allocated to n nodes,
denoted by 0 through (n1). Assume all nodes have equal access probability of
x_{d} to a particular fragment except node 0, which has a different
access probability of x_{s} where x_{s} ε [0, 1]
and x_{d} ε [0, 1]. When the threshold algorithm with a threshold
t is used, the fragment will be in node 0 with the probability O_{s}
given by
where x_{s}≠0, x_{d}≠0 and x_{s}≠1.
Proof: (Ulus, 1999)
Theorem 2: Assume that the fragments of a DDS are allocated to n nodes, denoted by 0 through (n1). Assume all nodes have equal access probability of x_{d} to a particular fragment except node 0, which has a different access probability of x_{s} where x_{s} ε [0, 1] and x_{d} ε [0, 1]. When the threshold algorithm with a threshold t is used, the fragment will be in the nodes other than node 0 with the probability O_{d} given by
where x_{s}≠0, x_{d}≠0 and x_{s}≠1.
Proof: (Ulus, 1999)
Equation 6 gives the probability that the fragment is in node 0, whereas Eq. 7 gives the probability that the fragment is in the other nodes. Since the fragment is either in node 0 or in a node other than node 0, the sum of O_{s} and O_{d} is 1.
RESULTS
Let us investigate Eq. 6 and 7. Since, O_{s}+O_{d}=1, investigating only O_{s} is sufficient.
In Eq. 6, the parameters are x_{s}, x_{d}
and t. In other words, the probability that the fragment is in node 0 is determined
by the access probability of node 0, the access probability of the other nodes
and the threshold value. Furthermore, the number of nodes, n, is another parameter,
since it specifies the relationship between x_{s} and x_{d}
with the following formula.
Now, let us find how a change in the access probabilities and the threshold
value effect the probability that the fragment is in any node.
Change in Access Probability: The relation between x_{s} and
x_{d} is given by the following equation.
Since x_{s} and x_{d} are access probabilities, the following
inequalities are satisfied.
When n is held constant, x_{s} and x_{d} are inversely proportional. So, it is sufficient to investigate only the change in x_{s} of O_{s}.
Lemma 3: When X_{s} = 1, O_{s} = 1.
Proof: When x_{s} = 1, all π_{sj} values of O_{s} given by Eq. 5 are 0 except π_{s0} value. π_{s0} value is 1 which makes O_{s} value 1 as well.
Lemma 4: When x_{s} = 0, O_{s} = 0.
Proof: When x_{s} = 0, all π_{sj} values of O_{s} given by Eq. 5 are 0 which makes O_{s} value 0 as well.
Lemma 5: O_{s} is strictly increasing with respect to x_{s} in the interval of (0,1).
Proof: Let us investigate the change in O_{s} with respect to x_{s}. The partial derivative of O_{s} with respect to x_{s} gives the change in O_{s} with respect to x_{s}. The partial derivative is as shown below where O_{1} and O_{2} are the nominator and the denominator of O_{s}, respectively.
It is obvious that the partial derivative is positive for all x_{s}ε[0,1].
Therefore, O_{s} is strictly increasing with respect to x_{s}
in (0,1).
Figure 6 shows the behaviour of O_{s} as a function of x_{s} in a fivenode system. Figure 6 is drawn for three different threshold values, 0, 3 and 10.
For the threshold of 0, O_{s} is a linear function of x_{s} with a slope of 1. This means that when the threshold is 0, the access probability of a node directly gives the steadystate probability that the fragment is in the corresponding node.
For threshold values of 3 and 10, notice the change in steepness of the curve.
Change in threshold value: Threshold t can take only nonnegative integer values. Let us investigate under which circumstances O_{s} is increasing or decreasing with respect to t.
Lemma 6: The following holds for the change in O_{s} with respect to t, provided that x_{s}≠0, x_{d}≠0 and x_{s}≠1:
• 
When x_{s} = x_{d}, O_{s} is constant
with respect to t. 
• 
When x_{s}>x_{d}, O_{s} is increasing with
respect to t. 
• 
When x_{s}<x_{d}, O_{s} is decreasing with
respect to t. 
Proof: To investigate the behaviour of O_{s} with respect to
the threshold, the partial derivative of O_{s} with respect to t should
be examined. But since O_{s} is defined only for nonnegative integer
values of t, it is not continuous for t. Therefore it is not possible to find
the partial derivative of O_{s} with respect to t. Instead, to investigate
the sign of the difference O_{s}(r + 1)  O_{s}(r), for any
positive integer r, would be sufficient. If the sign of this expression is positive,
the probability will be increasing. Otherwise it will be decreasing.
For simplicity, let us substitute a and b given by the equations a = 1x_{s} and b = 1x_{d} in Eq. 6. The difference will be as follows.

Fig. 6: 
O_{s} as a function of x_{s} in a fivenode
system for thresholds 0, 3 and 10. 
In this expression, all the terms except (ba+a^{r+2} (1b)b^{r+2}
(1a) in the nominator are positive provided that x_{s}≠0, x_{d}≠0
and x_{s}≠1. Only the sign of this term determines the sign of the
whole expression. Let D denote this term and let us substitute a and b expressions
back in. The result is as follows.
Let us multiply and divide D by x_{s}x_{d} and readjust it. The expression takes the following form.
Applying Eq. C.2, D is found as follows.
The sign of D depends on the relation between x_{s} and x_{d}.
According to this:
• 
If x_{s} = x_{d}, D is zero. Therefore, when
x_{s} = x_{d}, O_{s} is constant with respect to
t. 
• 
If x_{s}>x_{d}, D is positive. Therefore, when x_{s}>
x_{d}, O_{s} is increasing with respect to t. 
• 
If x_{s}<x_{d}, D is negative. Therefore when x_{s}<x_{d},
O_{s} is decreasing with respect to t. 
Lemma 7: provided that x_{d}≠0.
Proof: Readjusting Eq.6, the following formula is obtained:

Fig. 7: 
O_{s} as a function of t in a fivenode system for
x_{s} values of 0.28, 0.24, 0.16, 0.12 and 0.2 
Using this formula, provided that x_{d}≠ 0:
Figure 7 shows the behavour of O_{s} as a function
of t in a fivenode system. Figure 7 is drawn for five different
access probabilities x_{s} of 0.28, 0.24, 0.2, 0.16 and 0.12.
For 0.28 and 0.24, O_{s} converges to one. This is because x_{s}>x_{d}. Noticing the change in steepness of two curves, it converges faster for greater access probabilities.
For 0.2, O_{s} is constant at 0.2. This is because x_{s} = x_{d}. In this case, the access probability of a node directly gives the steadystate probability that the fragment is in the corresponding node.
For 0.16 and 0.12, O_{s} converges to zero. This is because x_{s}<x_{d}. Noticing the change in steepness of two curves, it converges faster for smaller access probabilities.
CONCLUSIONS
In this study, a new dynamic data allocation algorithm, namely threshold algorithm,
for nonreplicated DSSs is introduced. In the threshold algorithm, the fragments,
previously distributed over a DDS, are continuously reallocated according to
the changing data access patterns. The node in which a fragment is stored is
considered the owner of that particular fragment. When its owner in the past
few successive accesses, specified by the threshold value, never accesses a
fragment, the ownership of the fragment is transferred to another node.
The threshold algorithm is modeled using a finitestate Markov chain. To simplify the model, a special case where the access probabilities of the nodes are all equal except a single node is examined. The equilibrium probabilities for a fragment in any node are obtained in terms of access probabilities and the threshold value. The behavior of a fragment, in reaction to a change in access probabilities or to a change in threshold value, is investigated. It is shown that the fragment tends to stay at the node with higher access probability. As the access probability of the node increases, the tendency to remain at this node also increases. It is also shown that as the threshold value increases, the fragment will tend to stay more at the node with higher access probability.
Threshold algorithm can be used for dynamic data allocation to enhance the performance of nonreplicated DDSs. For further research, the algorithm can be extended to use on the replicated DSSs as in (Sistla et al., 1998; Wolfson et al., 1995, 1997).