Abstract: Novel data dissemination and collection algorithms for Wireless Sensor Networks (WSNs) were developed in which n sensor nodes are distributed randomly in a certain field to measure a physical phenomenon. Such sensors have limited energy, short covering range, band width and memory constraints. It is desired to disseminate the sensed data throughout the network such that a base station will be able to collect the sensed data with high probability by querying a small number of nodes. Two data Dissemination and Collection Algorithms (DCAs) were proposed to solve the data collection and dissemination problem. In particular, data dissemination was achieved through dynamical selection of some nodes. The selected nodes will be changed after a time slot t and will be repeated after a period T. The simulation and performance results met the developed theoretical bounds.
INTRODUCTION
Wireless Sensor Networks (WSNs) are expanding rapidly due to various applications and ease of development. However, WSNs encounter several challenges to be deployed efficiently in a given environment. Such challenges are limited source of energy, limited transmission bandwidth, short covering range, data dissemination, data persistence, redundancy of defective nodes and data security. A typical Wireless Sensor Network (WSN) can be used in many applications such as monitoring physical phenomenon from the surrounding environment like temperature, gases, humidity, volcanoes and tornados. Also, it can be used in tracking animals, forest fire detection and military applications such as detection of enemy intrusion.
Many techniques are used in data dissemination (Imran et al., 2010; Kokalj-Filipovic et al., 2007) and cluster head election (Buttyan and Holczer, 2009; Liu et al., 2007; Younis and Fahmy, 2004). Fountain codes and random walks have been used to disseminate data from k sources to a set of storage nodes n, (Kokalj-Filipovic et al., 2008, 2009). LEACH algorithm (Handy et al., 2002) is the most popular clustering algorithm. Lots of cluster head selection algorithms are based on LEACH architecture. The main drawback of the mentioned techniques is the requirement that all positions of all sensors must be known. Our algorithms dont use Fountain codes or random walks and independent on sensors positions.
This study considered a model for large-scale wireless sensor net- works with n identical sensing nodes distributed randomly and uniformly in a certain field. The nodes do not know the locations of the neighboring nodes as required (Dimakis et al., 2006) and they dont maintain routing tables. In this work, two algorithms were proposed for data dissemination and data collection in wireless sensor networks. The first algorithm is Pre-known Heads for data Dissemination and Collection Algorithm (PHDCA). The second algorithm is Random Heads for data Dissemination and Collection Algorithm (RHDCA). The main aim was to develop an efficient method to randomly distribute and collect information from n sensors by querying 10-20% of nodes for retrieving information about all network nodes with a high probability. The main advantages of the proposed algorithms are as follows:
• | No need for routing tables and the geographical positions of sensing nodes |
• | The possibility of controlling the amount of energy consumption in accordance with the desired application |
• | The algorithms are highly suitable for low data rates applications |
• | The base station can query 10-20% of the total nodes to retrieve information about all sensing nodes |
• | No synchronization is needed between network nodes |
This study was organized as follows:
• | Section 1: Background and short survey of the related work |
• | Section 2: Network model |
• | Section 3: Proposed DCAs algorithms |
• | Section 4: Some analysis for the DCAs algorithms |
• | Section 5: Simulation studies for the proposed algorithms. The conclusions were presented in Section IX |
NETWORK MODEL
Network model was presented and described. For example: Consider a set of n identical sensing nodes distributed randomly in a field F of dimensions A = L×W, where L and W are the length and the width of F, respectively. It was assumed that each node has at least one neighboring node, meaning that with probability P = 1 there are no isolated nodes (Fig. 1):
• | Definition 1: (Cluster head): The cluster head node (HN) is an arbitrary node among all network nodes N which exchanges its neighbors data with the other neighboring cluster head nodes |
• | Definition 2: (Node degree): The node degree dsi is the number of neighboring nodes to the node Si within its coverage range. The average mean degree of all nodes in N is given by: |
(1) |
The total period (T) is the period after which the sensed data has been disseminated in the network N and it is divided into equal time slots:
(2) |
for some integer number ε. The algorithm performance and simulation results confirm our theoretic bounds.
Fig. 1: | A model for WSNs with n nodes distributed randomly and uniformly among them are k cluster head nodes with a blue color to illustrate data dissemination. The base station query some node to retrieve network data |
The head nodes consume more energy than other nodes due to excess transmissions
needed for data dissemination and data collection. So, the head nodes are dynamically
selected to apply fairness in energy consumption on all nodes. Also, the dynamical
selection improves the performance of data dissemination in the network. The
head nodes will be changed every time slot t. The number of head nodes in the
network is k (where k/n
Assumptions: Let S = {S1, .Sn} be a set of n identical sensing nodes distributed randomly in a field F of dimensions A = L×W, where L and W are the length and the width of F, respectively.
Let H = {h1, .hk} be a set of k head nodes selected from the n sensing nodes to disseminate the data in the network and they will changed at each time slot t.
Let T be the period after which the sensed data has been disseminated in the network and it is divided into equal slots t = {t1, ..te}.
The nodes use flooding to know their neighbors, as each node will send a message
containing its
Each node in the network generates a packet
(3) |
where,
Each node has radio range coverage ri.
The node si will be considered as a neighbor of sj if
and only if
Initially, let the number of nodes n is known. Practically, the number of nodes in the network varies due to node energy depletion, failure nodes and added redundant nodes. Hence, it is important to estimate the number of nodes at each period T. The base station will consider the number of retrieved nodes when querying 10% of nodes as the total number of nodes n. The estimated number of nodes will be sent to the first survived head node (i.e., the first survived node from H) to be disseminated in the network. Figure 2 and 3 show the distribution of the head nodes at time slots t0, t1 in PHDCA and RHDCA, respectively.
DCAS ALGORITHM
The DCAs algorithms were described in detail as follows.
PHDCA algorithm: This algorithm dynamically selected the k cluster head nodes that disseminate the data in the network according to a pre-known manner. The algorithm can be classified into four phases as follows:
• | Initialization phase: In this phase, the head nodes
are initially selected from |
Fig. 2: | The distribution of head nodes at time slots t0, t1 in PHDCA |
Fig. 3: | The distribution of head nodes at time slots t0,t1 in RHDCA |
• | Flooding phase: In this phase, each sensor broadcasts a message
containing its |
(4) |
• | Sensing and data dissemination phase: In this phase, each sensor reads a new data, it will send this data to some of its neighboring nodes: |
(5) |
The neighboring head nodes will disseminate data in the network by exchanging different data between them. The head nodes will be changed at each time slot and repeated each period T as shown in Algorithm 1. Therefore, dynamical selection of the head nodes improves the performance of data dissemination in the network.
• | Data collection phase: In this phase, the base station can query small number of any nodes to retrieve the data sensed by the n sensing nodes and make estimation for n to send it to the first survived node |
RHDCA algorithm: In PHDCA algorithm, it was assumed that the selection of head nodes is pre-known at each time slot t and the head nodes are repeated each period T. The disadvantage of this algorithm is the topology dependence. The performance of PHDCA depends on the distribution of head nodes. The PHDCA was extended to obtain RHDCA that randomly selects k head nodes at each time slot t. The performance of RHDCA is topology independent due to the randomly selection of head nodes. The main difference between PHDCA and RHDCA algorithms is the sensing and dissemination phase as follows:
• | Sensing and data dissemination phase: In this phase, k head nodes are selected randomly at each time slot t. Each head node will have a status bit is set to 1 to indicate that the node is head node or to 0 otherwise: |
(6) |
Also, each sensor reads a new data which will be sent to its neighboring nodes. Neighboring head nodes will disseminate data in the network by exchanging the different data between them. Each current head node will select randomly one of its neighbors to be a head node at the next time slot t.
Algorithm 1: | PHDCA algorithm: Data dissemination algorithm for WSNs using dynamic deterministic cluster head nodes. |
Algorithm 2
RHDCA algorithm: Data dissemination algorithm for WSNs using dynamic
random cluster head nodes.
Proof: Number of ways in which the m nodes can be drawn from the total number of nodes n is:
Number of ways so that no head nodes exist in the set M is
Lemma 4: The probability that a set M of sensors has a set Z of cluster head nodes is given by:
(8) |
where, z = |Z| is the number of nodes in Z.
Proof: Number of ways in which the m nodes can be drawn from the n sensing
nodes is
Definition 5: (Head energy consumption (Eh)): is the energy consumption at the nodes n due to data dissemination in the network N when all nodes have the same coverage range and packet size.
Lemma 6: Let β be the probability that a node si has a set Z of neighboring head nodes. From Eq. 8, β can be given by:
where, zsi is the number of neighboring head nodes to node si
and
The total energy consumption Eh is given by:
(9) |
where, αi is the number of transmissions between the node si and its neighbors and pt, pr are the transmitted and received energy costs due to sending one packet.
Proof: Let σ be the received energy cost of nodes n due to data dissemination, so:
where, k/n is the probability that a node si is a head node. Let ξ be the transmitted energy cost of n nodes due to data dissemination, so:
Therefore, the total energy consumption due to data dissemination at time slots is given by:
Lemma 7: The total energy consumption at the sensing nodes due to sending the sensed data to their neighbors is given by:
(10) |
where all nodes have the same coverage range and packet size.
Proof: The energy consumption at nodes n due to sending its sensed data
is npt. The energy consumption at nodes n due to all received packets
is.
Lemma 8: The optimum number of head nodes that gives minimum energy consumption is given by:
(11) |
Where:
Proof: The optimum number of head nodes that gives minimum energy consumption can be driven by the differentiation of Eq. 9 as follows:
Lemma 9: The total number of nodes that achieves a certain value of the average mean degree of all nodes μ which indicates the network density, when they distributed randomly and uniformly in a certain field of region A = L×W is given by:
(12) |
where, πr2≤a≤(2r)2.
Proof: As the nodes are distributed uniformly in the field, an arbitrary area of dimensions πr2≤a≤(2r)2 will contain μ nodes. Hence, the total number of nodes is:
Therefore, Lemma 9 illustrates the relation between total number of nodes n in the network N, mean degree of graph μ, coverage area of sensors and the field area.
SIMULATION AND PERFORMANCE EVALUATIONS
In this section we will demonstrate some simulation results to illustrate the performance of the proposed algorithms.
Definition 10: Decoding Ratio (η) is the ratio between the number
of queried nodes
(13) |
Definition 11: Successful Decoding Probability (Ps) is the
probability that the n source packets are all recovered from the
Figure 4 and 6 show the relation between the successful decoding probability and the decoding ratio for different values of sensing nodes n in PHDCA and RHDCA algorithms. Increasing the number of network nodes n and fixing the covering radius r of all nodes will result in an improvement in the successful decoding probability as well. We can notice that as the number of nodes increases, the ratio of queried sensors will be decreased to recover data with a reasonable successful probability. Particularly, for n >500, we see that querying up to 10% will reveal about 85% of network data in PHDCA and about 92% of network data in RHDCA.
Figure 5 and 7 show the amount of energy consumption at each node after the dissemination of data in the network N in PHDCA and RHDCA algorithms. From these figures, it can be noticed that energy consumption in PHDCA algorithm is better than the obtained result in RHDCA algorithm. We assumed that energy consumption at the sensing node due to sensing the data itself is neglected and each sensor node is assumed to be of initial battery charge 5 Joule. The energy consumption was calculated according to Meghanathan et al. (2010).
Fig. 4: | Thes figure shows the relation between the successful decoding probability and the decoding ratio for n = 100, n = 200, n = 300, n =400, n = 500 when A = 100x100 and r = 10 in PHDCA |
Fig. 5: | Energy consumption at each node in network N in PHDCA after period T |
Fig. 6: | This figure shows relation between the successful decoding probability and the decoding ratio for n = 100, n = 200, n = 300, n =500, n = 1000 when A = 100 * 100 and r = 10 in RHDCA |
Fig. 7: | Energy consumption at each node in network N in RHDCA after period T |
They assumed that the energy lost at a sensor node si due to transmission of one packet is given by:
(14) |
and the energy lost at a sensor node si due to receiving of one packet is given by:
(15) |
where Csi is the packet size of node si.
Definition 12: Death Rate (DR) is the ratio between the number of dead
nodes
(16) |
Figure 8 and 9 illustrate the relation between the death rate and the total number of sensing nodes. Increasing the number of network nodes n and fixing the field area will result in an increasing in the death rate as well due to the excess transmissions needed for data dissemination.
Figure 10 and 11 show the relation between the performance of data collection and the elapsed time in DCAs algorithms. As the elapsed time increases, more nodes disappear from the network N (i.e., DR increases) due to energy depletion.
Fig. 8: | Relation between the death rate and number of nodes n in PHDCA |
Fig. 9: | Relation between the death rate and number of nodes n in RHDCA |
Fig. 10: | Performance of PHDCA algorithm along time |
Fig. 11: | Performance of RHDCA algorithm along time |
Hence, data dissemination and data collection performances will be negatively
affected by the disappeared nodes
COMPARISON
This section provided evaluation and comparison analysis between PHDCA and RHDCA algorithms. PHDCA has low energy consumption and fixed data dissemination performance over different periods on account of the periodic selection of cluster head nodes.
Fig. 12: | A comparison between the performance of PHDCA and RHDCA algorithms |
This is in contrast to RHDCA which consumes more energy and has non predictable data dissemination performance over different periods due to the random selection of cluster head nodes. PHDCA has frustrating data dissemination performance with regard to RHDCA on account of the dependence on the topology of the pre-known cluster head nodes. RHDCA is suitable for the applications that require high data dissemination performance and PHDCA is suitable for the applications that have energy limitations.
Figure 12 shows a comparison between the data dissemination performance of PHDCA and RHDCA. The figure shows that the data collection performance of RHDCA is better than the performance of PHDCA as the probability of successful de- coding in RHDCA is higher than the probability of successful decoding in PHDCA for the same decoding ratio.
RELATED WORK
In this section we will indicate the related work to our work.
• | The authors in Aly et al. (2011) proposed a distributed data collection algorithm to store and forward information obtained by wireless sensor networks. They used n-k storage nodes to collect the sensed data from the network, where k is the sensor nodes, n is the total number of nodes and (n-k)/n is 20% |
• | The authors in Aly et al. (2009), Kong et al. (2010), Aly et al. (2008) suggested two distributed storage algorithms for large-scale wireless sensor networks. They assigned a time-to-live counter to each node packet depending on its degree. According to this counter, each packet can travel to a certain number of hops. Each node chooses randomly one of its neighbors to send its data to another neighbor. The receiver node will decide with a random probability if it will accept the incoming message or not. The base station can query about 20%- 30% of the network nodes in order to retrieve the data collected by the n sensing nodes |
• | The authors in Dimakis et al. (2005) used a decentralized implementation of Fountain codes that uses geographic routing and every node has to know its location |
• | The authors in Kamra et al. (2006) proposed a novel technique called growth codes to increase data persistence in wireless sensor networks, i.e. increasing the amount of information that can be recovered at the sink |
• | The authors in Dimakis et al. (2010) presented a general theoretic framework that can determine the information that must be communicated to repair failures in encoded systems and identified a tradeoff between storage and repair bandwidth |
• | Authors in Heinzelman et al. (2002) used a central controller to select CH nodes throughout the network. The main drawbacks of this algorithm are non-automatic cluster-head selection and the requirement that the position of all sensors must be known |
• | Authors in Handy et al. (2002) extended LEACH stochastic algorithm with a deterministic cluster-head selection, which utilizes the remaining energy level of each node to determine the threshold |
• | The authors in Younis and Fahmy (2004) proposed a distributed clustering scheme HEED (Hybrid Energy- Efficient Distributed Clustering) in which CH nodes are picked from the deployed sensors. HEED considers a hybrid of energy and communication cost when selecting CHs |
• | The authors in Meghanathan et al. (2010), Eriksson et al. (2008) used mobile sinks to obtain potential energy savings for the sensors during data dissemination in wireless sensor networks. Each sink node is assigned a particular cluster of sensors to monitor and collect data. A sink node moves to the vicinity of the sensor nodes (within a few hops) to collect data. The collected data is exchanged with peer mobile sinks and can also be transferred to a control center through multi-hop sink-to-sink data propagation |
• | The authors in Bandyopadhyay and Coyle (2003) proposed a distributed, randomized clustering algorithm to organize the sensors in a wireless sensor network into clusters. They extended this algorithm to generate a hierarchy of cluster heads and observe that the energy savings increase with the number of levels in the hierarchy |
• | The authors in Buttyan and Holczer (2009) presented simple protocol suitable for both locations based and topology based clustering. Also, they proposed a useful extension to this basic protocol. They showed how to fine tune its parameters such that the amount of information leaked by the protocol about the identity of the cluster heads is minimized |
• | The authors in Lin et al. (2007a) considered cluster-based architecture and provided distributed clustering algorithms for mobile sensor nodes which minimize the energy dissipation for data-gathering in a wireless mobile sensor network |
• | The authors in Imran et al. (2010) proposed a gossip based protocol that consumes little resources. The proposed scheme aimed to keep the routing table size R as low as possible |
• | The authors in Lin et al. (2007b) proposed priority random linear codes, to maintain measurement data in different priorities, such that critical data have a higher opportunity to survive node failures than data of less importance |
• | The authors in Yu et al. (2009) proposed optimal data storage (ODS) algorithms that can produce global optimal data storage position in linear, grid, and mesh network topologies. To reduce the computation of ODS in the mesh network topology, they presented a near-optimal data storage (NDS) algorithm, which is an approximation algorithm and can obtain a local optimal position |
• | The authors in Kokalj-Filipovic et al. (2009) studied decentralized, Fountain and network-coding based strategies to allow for a reduced delay collection by a data collector who accesses the network at a random position and random time. Data dissemination is performed by a set of relays which form a circular route to exchange source packets |
Fig. 13: | Wireless sensor devices are deployed in Minna field for gas monitoring, temperature monitoring and crowd sensing. Approximately 50.000 camp tents are located in Mina to accommodate 3-5 million people for 4-8 days during pilgrimage, according to 2010 KSA statistics |
PRACTICAL ASPECTS
The proposed algorithms are suitable to be applied on the American-made camp tents in Minna and Arafat fields located in the east south of Makkah, KSA for monitoring and measuring certain phenomenon such as temperature, gases, pollution and crowd estimation. Approximately 50.000 camp tents are located in Minna to accommodate 3-5 million people for 4-8 days during pilgrimage, according to 2010 KSA statistics. Hence, a safety system is needed to protect the camp tents against fires and pollution. Wireless sensor devices can be scattered in Minna field to gather and collect the required data to be monitored at the base station to take the safety precautions at emergency cases as shown in Fig. 13. Also, The Wireless sensor devices can detect the crowded areas and inform the base station about the non-crowded areas to be exploited.
CONCLUSION
This study presented two algorithms for data dissemination and collection in wireless sensor networks. Given n sensing nodes with limited buffers. The study demonstrated schemes to disseminate sensed data throughout the network with less computational overhead. The proposed algorithms did not assume any pre-known of routing tables or nodes locations. In addition, the time factor T increases the network life time, as it can be selected to be suitable for the intended applications and minimizing energy consumption. Our future work will develop accurate practical algorithms to optimize energy consumptions in the sensor network.
ACKNOWLEDGMENTS
This study is funded by a grant from the Long-Term National Plan for Science, Technology and Innovation (LT- NPSTI), No. 11-INF1702-10, the King Abdulaziz City for Science and Technology (KACST), Kingdom of Saudi Arabia. We thank the Science and Technology Unit at Umm Al-Qura University for their continued logistics support.