**ABSTRACT**

In this study, we focus on the development of energy efficient and achievable load balancing mechanisms for wireless sensor networks. Due to resource constraint and tremendous amount of sensors, one possible way of achieving maximum lifetime of the network is applying data aggregation on sensor data. However, existing approaches introduce significant computation and control overheads that often not suitable for sensor networks applications. In view of this, we propose an in-network data aggregation scheme by exploiting the inherent spatial correlation of the sensed data. Each sensor multiplies its reading with a random coefficient and sends the product to the next hop to calculate a weighted sum of all the massage. Instead of receiving individual sensor readings, the sink will receive all weighted sum and restore the original data. By doing so, each node only performs one addition and one multiplication in order to compute one weighted sum and consume the same amount of energy. Simulation results demonstrate that the proposed scheme is efficient and outperforms existing schemes in terms of energy gain and network lifetime.

PDF Abstract XML References Citation

**Received:**June 06, 2010;

**Accepted:**August 16, 2010;

**Published:**November 12, 2010

####
**How to cite this article**

*Information Technology Journal, 10: 409-415.*

**DOI:**10.3923/itj.2011.409.415

**URL:**https://scialert.net/abstract/?doi=itj.2011.409.415

**INTRODUCTION**

Wireless sensor networks are usually composed of hundreds or thousands of inexpensive, low-powered sensing devices with limited memory, computational and communication resources (Akyildiz *et al*., 2002). Prior researches explored various aspects of Wireless Sensor Networks (WSNs), including system architecture (Gober *et al*., 2005), routing (Meng *et al*., 2008) data gathering (Yu *et al*., 2008), power efficiency (Wei *et al*., 2007) and so on. Energy is a primary constraint in the deployment of such WSNs, since sensor nodes are typically powered by small batteries, which cannot be generally changed or recharged (Pottie and Kaiser, 2000).

In-network data aggregation has been a promising solution to reduce the number of transmissions or the length of the packets to be transmitted. On the other hand, the number of sensor nodes deployed could be on the order of hundreds or thousands and data transmissions are accomplished through multi-hop routing from individual sensor nodes to the data sink (Yick and Ghosal, 2008). Successful deployment of such large-scale sensor networks faces two major challenges. First, reduce transmission costs and thereby save energy resources. Second, the need for energy consumption load balancing and extend the lifetime of the sensor networks.

The need for global communication cost reduction is obvious because such sensor networks typically are composed of tremendous amount of sensor, generating significant data to be delivered to data sink. It is very much desired to take full advantage of the correlations among the sensor data to reduce the cost of communication. Data aggregation as an important aspect of sensor networks has been widely studied to achieve energy saving. However, existing approaches adopt in-network data aggregation, such as entropy coding (Cristescu *et al*., 2006) or transform coding (Ciancil *et al*., 2006), to reduce global traffic. However, these approaches introduce significant computation and control overheads that often not suitable for sensor networks applications.

The need for energy consumption load balancing is also clear because of the required multi-hop data transmission for such large-scale sensor networks. Suppose N sensors form a multi-hop route to the sink.

Obviously, the sensor nodes closer to the sink will soon run out of energy and lifetime of sensor network will be significantly shortened.

This study presents a novel approach to apply compressive sampling theory (Candes and Wakin, 2008) to sensor data aggregation for large-scale wireless sensor networks, successfully addressing the two major challenges as outlined above. First, the proposed scheme is able to achieve substantial sensor data aggregation without introducing excessive computation and control overheads. Second, the proposed scheme is also able to disperse the communication costs to all sensor nodes along a given sensor data gathering route. This will result in a natural load balancing and extend the lifetime of the sensor network.

This study makes three main contributions. First, we take both energy saving and load balancing into consideration in deploying a large-scale sensor networks. Second, the proposed scheme achieves substantial sensor data aggregation without introducing excessive computation and control overheads. Third and more importantly, we take advantage of spatial correlations in sensor readings and distribute the traffic uniform on all sensor nodes, which significantly prolong the network lifetime.

**RELATED WORK**

The fundamental assumption of in-network data aggregation is that sensor nodes have spatial correlations in their readings. Cristescu *et al*. (2006) propose a joint entropy coding approach, where nodes use relayed data as side information to encode their readings. By doing so, jointly encoded messages cost less bits than independently encoded messages. However, the joint entropy coding relies heavily on how the routes are organized and performed complex computation in sensors. Ciancil *et al*. (2006) propose to compress piecewise smooth data through distributed wavelet transform. In doing so, even nodes first broadcast their readings. Upon receiving the readings from both sides, odd nodes compute the high pass coefficients h (•). Then, odd nodes transmit h (•) back and even nodes compute the low pass coefficients l (•). After the transform, nodes transmit coefficients to the sink. Although wavelet de-correlation can be performed for multiple levels, it is not suggested to do so in distributed processing because of the large amount of communication overhead.

Distributed source coding (Chou *et al*., 2003; Cristescu *et al*., 2004; Hua and Chen, 2008) intend to reduce complexity at sensor nodes and utilize correlation at the sink. They are based on the Slepian-Wolf coding theory (Slepian and Wolf, 1973), which claims that aggregation of correlated readings, when separately encoded, can achieve same efficiency as if they are jointly encoded, provide that messages are jointly decoded. This important conclusion not only eliminates data exchanges, but also decouples routing from aggregation. After encoding sensor readings independently, each node simply sends the compressed message along the shortest path to the sink. However, a prerequisite of Slepian-Wolf coding is that the global correlation structure needs to be known in order to allocate appropriate number of bits to be used by each node. In view of this, Yuen *et al*. (2008) adopts a localized Slepian-Wolf coding scheme. Based on the assumption that sensors outside immediate neighbourhood have weak correlation in their readings, a node may only consider its data correlation with one-hop neighbours when determining the size of encoded message.

With the emergence of compressive sampling theory (Baraniuk, 2007; Donoho, 2006; Candes and Wakin, 2008), researchers have seen a new avenue of research in the field of in-network data aggregation. Compressive Wireless Sensing (CWS) (Bajwa *et al*., 2006) appears to be able to reduce the latency of data gathering in a single-hop network by delivering linear projections of sensor readings through synchronized amplitude-modulated analogue transmissions. Due to the difficulties in analogue synchronization, CWS is less practical for large-scale sensor networks. Rabbat *et al*. (2006) leverages compressive sampling for data persistence, instead of data gathering, in a WSN. In an overview study, Haupt *et al*. (2008) also speculate the potential of using compressive sampling theory for data aggregation in a multi-hop WSN. However, no real scheme has been reported based on this initial idea.

**THE PROPOSED SCHEME**

The objective of compressive data gathering is two-fold: compress sensor readings to reduce global data traffic and distribute energy consumption evenly to prolong network lifetime. Similar to distributed source coding, the data correlation pattern shall be utilized on the decoder end. Besides, aggregation and routing are decoupled and therefore can be separately optimized.

**System model:** In wireless sensor network, a large number of sensor nodes collect application specific information from the environment and this information is transferred to a central base station where it is processed, analyzed and used by the application. Figure 1 shows a typical routing tree in which the sink has two children.

Fig. 1: | Tree-based data aggregation |

Each of them leads a sub-tree. Data aggregation and data restore are performed on the sub-tree basis.

To facilitate efficient aggregation, when a node chooses a parent node, it sends a subscribe notification to that node; when a node changes parent, it sends an unsubscribe notification to the old parent. After all nodes acquire their readings, leaf nodes initiate the transmission. Data transmissions are accomplished through multi-hop routing from lower level leaf nodes to the upper level data sink.

**Data aggregation:** We suppose N sensors, denoted as s_{1}, s_{2}, …, s_{N}. Let, d_{j} denote the readings obtained by node s_{j}. Consider Figure 1 again as an example, leaf node s_{1} generates a random number Φ_{i1}, computes Φ_{i1}d_{1} and transmits the value to s_{5}. The index i denotes the i-th weighted sum ranging from 1 to M. similarly, s_{2}, s_{3} and s_{4} transmits Φ_{i2}d_{2}, Φ_{i3}d_{3} and Φ_{i4}d_{4} to s_{5}. Once s_{5} receives the four values, it computes Φ_{i5}d_{5}, adds to the sum of relayed values and transmits:

to s_{10}, s_{10} also receives product from s_{9}, denotes as:

then s_{10} computes Φ_{i10}d_{10} and

finally the message containing the weighted sum of all readings in a sub-tree is forwarded to the sink.

Assume that there are N nodes in a particular tree and the sink intends to collect M measurements. Then all nodes send the same number of O (M) messages regardless of their hop distance to the sink. The overall message complexity is O (NM). When M<<N, our scheme transmits less messages than the conventional data collection whose worst-case message complexity is O (N^{2}). More importantly, the transmission load is spread out uniformly so that the lifetime of bottleneck sensors and the entire network is greatly extended.

The i-th weighted sum can be represented by:

(1) |

The sink obtains M weighted sums {y_{i}}, i=1, 2, …, M. Mathematically, we have:

(2) |

In Eq. 2, each column of {Φ_{ij}} contains the series of random numbers generated at a corresponding node. In order to avoid transmitting this random matrix from sensors to the sink, we adopt a simple strategy: before data transmission, the sink broadcasts a random seed to the entire network. Then each sensor generates its own seed using this global seed and its unique identification. With a pre-installed pseudo random number generator, each sensor is able to generate the corresponding series of coefficients. These coefficients can be reproduced at the sink given that the sink knows the identifications of all sensors.

Since, the random coefficients Φ_{ij} are irrelevant to sensor readings, we may treat d_{i} as a vector. The weighted sums y_{i} become vectors of the same dimension too. When M<N, solving a set of M linear equations with N unknown variables is an ill-posed problem. However, in most cases, the sensor field follows a certain structure because of the spatial or temporal correlations. Hence, the signal is sparse in a transform domain. Under this assumption, what requirement M should meet to solve them and how these equations can be solved. Fortunately, the compressive sampling theory has a positive answer to these questions.

**Data restore:** According to compressive sampling theory, a K-sparse signal can be reconstructed form a small number of measurements with a probability close to one. The weighted sums obtained in Eq. 2 are a typical type of measurements. Signal sparsity characterizes the correlations within a signal. An N-dimensional signal is considered as a K-sparse signal if there exists a domain in which this signal can be represented by K (K<<N) non-zero coefficients.

In a densely deployed sensor networks, sensors have spatial correlations in their readings. Let N sensor readings form a vector d= [d_{1} d_{2} … d_{N}]^{T}, then d is a K-sparse signal in a particular domain Ψ. Denote Ψ= [ψ_{1} ψ_{2} … ψ_{N} ] as the representation basis with vectors {ψ_{i}} as columns and x = [x_{1}, x_{2},…, x_{N}]^{T} are the corresponding coefficients. Then, d can be represented in the ψ domain as:

(3) |

Compressive sampling theory tells that a K-sparse signal can be reconstructed from M measurements if M satisfies the following conditions (Candes *et al*., 2006):

(4) |

where, c is a positive constant, Φ is the sampling matrix as defined in Eq. 2 and μ (Φ, Ψ) is the coherence between sampling basis Φ and representation basis Ψ. The coherence metric measures the largest correlation between any two elements of Φ and Ψ and is defined as:

(5) |

From Eq. 5, we can see that the smaller the coherence between Φ and Ψ is, the less measurements are needed to reconstruct the signal. In practice, using random measurement matrix is a convenient choice, since a random basis has been shown to largely incoherent with any fixed basis and M = 3K~4K is usually sufficient to satisfy Eq. 4.

With sufficient number of measurements, the sink is able to reconstruct sensor readings through solving an *l*_{1}-minization problem:

(6) |

Suppose x’ is the solution to this convex optimization problem, then the proposed reconstruction of the original signal is d’=Ψx’. It has been shown that the above *l*_{1}-minization problem can be solved with Linear Programming (LP) techniques (Donoho, 2006).

In Eq. 6, the Ψ matrix describes the correlation pattern among sensor readings. It utilizes only in data recovery process and is not required to be known to sensors. In this way, most of the computations are shifted from sensors to the sink. Such asymmetry of computation complexity makes it an appealing choice for WSNs.

**Capacity gain:** The previous subsection illustrated how to gather and recover sensor readings acquired in one time instance. This subsection will investigate the benefit of the proposed scheme from the viewpoint of network capacity, i.e., how it can achieve the capacity over baseline transmission. So, let us look at the baseline transmission first, in Fig. 1, nodes s_{1}, s_{2}, s_{3} and s_{4} transmit their readings d_{1}, d_{2}, d_{3} and d_{4} to s_{5}, s_{5} transmits both its reading d_{5} and the relayed readings to s_{10}. At the end of the route, s_{10} transmits all readings in its sub-tree to the sink.

**Definition 1:** We shall define that a rata ë is achievable in a data gathering sensor network, if there exists a time instance t_{0} and duration T such that [t_{0}, t_{0}+T] the sink receives ëT bits of data generated by each of the sensors s_{i}, i=1, 2, …, N. Then, network capacity C is defined as the supremum of the achievable rata, or C=sup {λ}.

Note that the traffic pattern in our study is many-to-one. We let all sensors generate data at the same rate and assume that sensor readings acquired at the same time instance are K-sparse. Denote λ_{1} as the network capacity of baseline transmission. It is achieved when every node is allowed to transmit once every n_{1} slots and traffic is evenly distributed among n_{2} one-hop neighbours of the sink. Then we have:

(7) |

where, W is the total transmission capacity of each sensor node. Denote λ_{2} as the network capacity of the proposed scheme. If the same transmission schedule and routing structure are adopted, we have:

(8) |

From Eq. 7 and 8, we can conclude that our scheme can achieve a capacity gain of N/M over baseline transmission.

**SIMULATION**

The goal of this simulation study is assessment of the achievable energy gain during the compress-with-transmission process. Unless specified otherwise, our simulations were set up with the following parameters: we adopt a grid topology contains N nodes, the network capacity gain keeps N/M = 5 in different circumstances.

Fig. 2: | Total energy consumption for BG scheme, CAG scheme and our scheme |

The distance between adjacent nodes is 15 m. The sink is in the middle of the network. The sub-trees contain similar number of sensor nodes, though not exactly the same. We run 10 times for each parameter setting and in each test run, ten messages per node are collected at given intervals. We compare our proposed scheme with chain-based gathering scheme (Chen and Wang, 2008) and clustered-based gathering scheme (Yoon and Shahabi, 2005).

The chain-based scheme (Chen and Wang, 2008) can treated as a Baseline Gathering (BG) scheme since it is expected to deploy in large-scale networks and data transmissions are accomplished through multi-hop routing from individual sensor nodes to the data sink. The clustered aggregation (CAG) technique (Yoon and Shahabi, 2005) forms clusters based on sensing values. By grouping sensors with similar readings, CAG only transmits one reading per group to achieve a predefined error threshold.

Figure 2 presents the total energy consumption as a function of the number of sensors nodes for the BG scheme, the CAG scheme and our scheme. Based on Fig. 2, we observed that the BG scheme has highest energy consumption among the three schemes and our scheme has least energy consumption, besides, our scheme gain more benefits as N increases. For example, when the number of sensors is 2000, the BG scheme and the CAG scheme consume 32.2 J and 19.5 J of total energy, respectively, approximately 5 times and 3 times more than our scheme, respectively, which consumes 6.3 J of total energy cost. While, in the BG scheme, the sensor readings are transmitted along a long-range multi-hop route to the data sink. The closer a sensor is to the sink, the more energy consumed.

Fig. 3: | Energy gain for BG scheme, CAG scheme and our scheme |

Thus, the sensor nodes closer to the data sink will soon run out of energy and lifetime of sensor network will be significantly shortened. Similarly, in the CAG scheme, cluster-head sensor nodes consumed more energy in performing complex computation and communication tasks, leading to the depletion of its energy quickly, as a result, this imbalance consumption drops down the system lifetime as well as individual sensor node, when the number of sensor nodes increases, the performance worsen. For our scheme, the energy saved from fewer communication costs and control overheads, leading to lower total energy consumption. We can conclude that the benefit of compressive sampling theory has major effects on total energy saving in our scheme and this verifies the effectiveness of our idea.

In order to gain more insight on the effectiveness of the proposed scheme, we have made a small modification to the above network settings, with 1089 nodes arrange in 33 rows by 33 columns, the sink has four sub-trees and data on each sub-tree can be reconstructed from 55 random measurements. We show energy gain as a function of time (t) for the BG scheme, the CAG scheme and the proposed scheme in Fig. 3. As showed in Fig. 3, our scheme has an increases energy gain graceful with the run time, while in both BG scheme and CAG scheme, the energy gain drop dramatically with the run time. We attribute this difference to apply different data gathering patterns in the three schemes. In our scheme, we applying compressive sampling theory to achieves communication efficient and load balance, leads to almost 50% energy gain when t = 50, while in the BG scheme and the CAG scheme, they are either introduce complex computation in sensors or require a large amount of data exchanges, resulting in consume significant energy and load imbalance. Thus, when t = 50, the energy gain of BG and CAG scheme is 0.5 and 7, respectively. Therefore, the results verify the above analysis and well confirm the effectiveness of the proposed scheme again.

**CONCLUSION**

In this study, we take the advantage of the inherent spatial correlation among the sensed data and developed novel techniques to aggregate sensed data in large-scale wireless sensor networks. The proposed scheme is able to achieve a capacity gain of N/M over baseline scheme, moreover, our scheme can achieve desired energy saving and load balancing. Compared with existing works, our scheme can significant extend the lifetime of the sensor networks as well as individual sensors. In our future work, we will explore different aggregation algorithms and consider other important metrics, such as data accuracy, latency, fault-tolerant and aggregation ratio.

**ACKNOWLEDGMENTS**

This study is supported by the Scientific Research Fund of Hunan Provincial Education Department under Grant No. 09C403 (2009.6-2012.6, Hunan University of Science and Technology), National Natural Science Foundation of Hunan Province and Xiangtan united Foundation under Grant No. 09JJ9006 (2009.6-2012.6, Hunan University of Science and Technology), Key Program of Scientific Research Fund of Hunan Provincial Education Department under Grant No. 09A027 (2009.6-2012.6, Hunan University of Science and Technology).

####
**REFERENCES**

- Akyildiz, I.F., W. Su, Y. Sankarasubramaniam and E. Cayirci, 2002. A survey on sensor networks. IEEE Commun. Mag., 40: 102-114.

CrossRefDirect Link - Meng, L., W. Fu, Z. Xu, J. Zhang and J. Hua, 2008. A novel Ad hoc routing protocol based on mobility prediction. Inform. Technol. J., 7: 537-540.

CrossRefDirect Link - Yu, Y., B. Krishnamachari and V.K. Prasanna, 2008. Data gathering with tunable compression in sensor networks. IEEE Trans. Parallel Distrib. Syst., 19: 276-287.

Direct Link - Wei, D., H.A. Chan and B. Silombela, 2007. Rectangular grids design to balance power consumption for homogeneous sensor networks with high node density. Inform. Technol. J., 6: 827-834.

CrossRefDirect Link - Pottie, G.J. and W.J. Kaiser, 2000. Wireless integrated networks sensors. Commun. ACM, 43: 51-58.

Direct Link - Yick, J., B. Mukherjee and D. Ghosal, 2008. Wireless Sensor Network Survey. J. Comput. Network, 52: 2292-2330.

CrossRefDirect Link - Cristescu, R., B. Beferull-Lozano, M. Vetterli and R. Wattenhofer, 2006. Network correlated data gathering with explicit communication: Np-completeness and algorithms. IEEE/ACM Trans. Network., 14: 41-54.

Direct Link - Ciancil, A., S. Pattem, A. Ortega and B. Krishnamachari, 2006. Energy-efficient data representation and routing for wireless sensor networks based on a distributed wavelet compression algorithm. Proceedings of the 5th International Conference on Information Processing in Sensor Networks, April 19-21, USA., pp: 309-316.

Direct Link - Candes, E.J. and M.B. Wakin, 2008. An introduction to compressive sampling. IEEE Signal Process. Magazine, 25: 21-30.

CrossRef - Chou, J., D. Petrovic and K. Ramchandran, 2003. A distributed and adaptive signal processing approach to reducing energy consumption inn sensor networks. Proceedings of the INFOCOM 22nd Annual Joint Conference of the IEEE Computer and Communications Societies, March 30-April 3, San Francisco, CA, USA., pp: 1054-1062.

Direct Link - Cristescu, R., B. Beferull-Lozano and M. Vetterli, 2004. On network correlated data gathering. Proc. IEEE INFOCOM, 4: 2571-2582.

Direct Link - Hua, G. and C.W. Chen, 2008. Correlated data gathering in wireless sensor networks based on distributed source coding. Int. J. Sensor Networks, 4: 13-22.

Direct Link - Slepian, D. and J.K. Wolf, 1973. Noiseless encoding of correlated information sources. IEEE Trans. Inform. Theory, 19: 471-480.

Direct Link - Yuen, K., B. Liang and L. Baochun, 2008. A distributed framework for correlated data gathering in sensor networks. IEEE Trans. Vehicular Technol., 57: 578-593.

CrossRef - Bajwa, W., J. Haupt, A. Saeed and R. Nowak, 2006. Compressive wireless sensing. Proceedings of the 5th International Conference on Information Processing in Sensor Networks, April 19-21, Nashville, Tennessee, USA., pp: 134-142.

Direct Link - Rabbat, M., J. Haupt, A. Singh and R. Nowak, 2006. Decentralized aggregation and pre-distribution via random gossiping. Proceedings of the 5th International Conference on Information Processing in Sensor Networks, April 19-21, Nashville, Tennessee, USA., pp: 51-59.

Direct Link - Haupt, J., W.U. Bajwa, M. Rabbat and R. Nowak, 2008. Compressed sensing for networked data. IEEE Signal Process. Mag., 25: 92-101.

Direct Link - Candes, E., J. Romberg and T. Tao, 2006. Roust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52: 489-509.

Direct Link - Chen, C.W. and Y. Wang, 2008. Chain-type wireless sensor network for monitoring long range infrastructures: Architecture and protocols. Int. J. Distributed Sensor Networks, 4: 287-314.

Direct Link

####
**Related Articles**

Rectangular Grids Design to Balance Power Consumption for Homogeneous Sensor Networks with High Node Density |

A Novel Ad hoc Routing Protocol Based on Mobility Prediction |