Energy Efficient Clustering Algorithm for Data Gathering in Wireless Sensor Networks


Wireless sensor networks are characterized by centralized data gathering, multi-hop communication and a many-to-one traffic pattern. These three characteristics may give rise to funneling effects that can lead to severe packet collision, network congestion, packet loss and even congestion collapse. They can also result in hotspots of energy consumption that may cause premature death of sensor nodes and even of the entire network. In this paper, we exploit the spatial correlation of nodes to form clusters of nodes sensing similar values, so that only the cluster head's sensor reading is transmitted to the sink, which can efficiently alleviate the funneling effects. A novel clustering algorithm is proposed that can greatly reduce the number of cluster heads. Experimental results validate the effectiveness of this approach.
Jutao Hao, Qingkui Chen, Huan Huo
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology
Shanghai, China
Jingjing Zhao
School of Electric Power and Automation Engineering, Shanghai University of Electric Power
Shanghai, China
Index Terms—wireless sensor networks, data gathering,
clustering algorithm, spatial correlations
In recent years, rapid technological advances in
micro-electro-mechanical systems, low power and highly
integrated digital electronics, small scale energy supplies,
tiny microprocessors, and low power radio technologies
have created low power, low cost and multifunctional
wireless sensor devices, which can observe and react to
changes in physical phenomena of their surrounding
environments. These sensor devices are equipped with a
small battery, a tiny microprocessor, a radio transceiver,
and a set of transducers that are used to acquire information
reflecting the changes in the surrounding environment of
the sensor node. The emergence of these low cost and
small size wireless sensor devices has motivated
intensive research in the last decade addressing the
potential of collaboration among sensors in data
gathering and processing, which led to the invention of
wireless sensor networks (WSNs) [1, 2].
A wireless sensor network (WSN) is a wireless
network consisting of spatially distributed autonomous
devices using sensors to cooperatively monitor physical
or environmental conditions, and report the collected data
through wireless interface to a center node (sink node).
The areas of applications of WSNs vary from civil,
healthcare, and environmental to military. Examples of
applications include target tracking in battlefields [3],
habitat monitoring [4], civil structure monitoring [5], and
forest fire detection [6].
Although WSNs resemble conventional ad hoc
networks [7] in many aspects, they have their own
specific features as follows.
Sensors are deployed with a large density in a
wider area compared with nodes in traditional ad
hoc networks.
Data communication in wireless sensor networks is
naturally multipoint-to-point (many-to-one).
Data samples sensed by sensors are spatio-
temporally correlated. This correlation has been
shown to have a great impact on protocol design
in WSNs.
Most applications in WSNs usually require
information about a specific region. Clearly,
addressing each individual sensor for the available
data leads to a large amount of overhead, which is not
desired in sensor networks.
One of the advantages of wireless sensor networks is
their ability to operate unattended in harsh environments
in which contemporary human-in-the-loop monitoring
schemes are risky, inefficient and sometimes infeasible.
Therefore, sensors are expected to be deployed randomly
in the area of interest by a relatively uncontrolled means,
e.g. dropped by a helicopter, and to collectively form a
network in an ad-hoc manner [8, 9].
Given the vast area to be covered, the short lifespan of
the battery-operated sensors and the possibility of having
damaged nodes during deployment, large populations of
sensors are expected in most WSN applications. It is
envisioned that hundreds or even thousands of sensor
nodes will be involved. Designing and operating such a
large network would require scalable architectural
and management strategies.
In addition, sensors in such environments are energy
constrained and their batteries cannot be recharged.
Therefore, the unique properties of sensor networks, such
as limited power, stringent bandwidth, dynamic topology
(due to node failures, adding/removing nodes, or even
physical mobility), high network density and large scale
deployments, pose many challenges in the design and
management of sensor networks. These challenges demand
energy awareness and robust protocol designs at all layers
of the networking protocol stack [10].
Figure 1. Sensor network architecture.
Since sensor nodes are energy-constrained, the
network lifetime is a major concern, especially for
applications of WSNs in harsh environments. There has
been significant interest in designing algorithms,
applications, and network protocols to reduce the energy
usage of sensors [11]. Generally, energy conservation is
dealt with on five different levels [12, 13]:
efficient scheduling of sensor states to alternate
between sleep and active modes;
efficient control of transmission power to ensure
an optimal tradeoff between energy consumption
and connectivity;
data compression (source coding) to reduce the
amount of uselessly transmitted data;
efficient channel access and packet retransmission
protocols on the Data Link Layer;
Energy-efficient routing, clustering and data
In this paper we refer mainly to the sensor
network model depicted in Figure 1, consisting of one
sink node (or base station) and a (large) number of sensor
nodes deployed over a large geographic area (sensing
Data are transferred from sensor nodes to the sink
through a multi-hop communication paradigm [14]. As
depicted in Figure 1, data collected by sensors is
transmitted to a special node equipped with higher energy
and processing capabilities called “Sink Node”. The sink
collects, filters and aggregates data sent by sensors in
order to extract useful information. Due to their energy
constraint, wireless sensors usually have a limited
transmission range making multi-hop data routing toward
the sink more energy-efficient than one-hop
transmissions wireless networks (cellular, WLAN, etc.),
The rest of the paper is organized as follows. Section II
discusses the related works on this topic. In Section III
we will briefly introduce CAG clustering technique for
aggregation. The improved algorithm will be illustrated
in Section IV. Section V presents five experiments to
validate the proposed algorithm. Finally, conclusions
and open issues are discussed in Section VI.
In this section, we provide a brief overview of some
related research work.
Grouping sensor nodes into clusters has been widely
pursued by the research community in order to achieve
the network scalability objective. In addition to
supporting network scalability, clustering has numerous
advantages. It can localize the route set up within the
cluster and thus reduce the size of the routing table stored
at the individual node [15]. Clustering can also conserve
communication bandwidth since it limits the scope of
inter-cluster interactions to CHs and avoids redundant
exchange of messages among sensor nodes [16].
Moreover, clustering can stabilize the network topology
at the level of sensors and thus cut down on topology
maintenance overhead.
Furthermore, energy-efficient clustering algorithms
for wireless sensor networks have been widely addressed
in literature. The main goal of clustering is to efficiently
maintain the energy consumption of sensor nodes by
involving them in multi-hop communication within a
particular cluster and by performing data aggregation and
fusion in order to decrease the number of transmitted
messages to the sink.
Every cluster would have a leader, often referred to as
the cluster-head (CH). A CH may be elected by the
sensors in a cluster or pre-assigned by the network
designer. A CH may also be just one of the sensors or a
node that is richer in resources. Cluster formation is
typically based on the energy reserve of sensors and
sensor’s proximity to the CH [17].
For instance, Low-Energy Adaptive Clustering
Hierarchy (LEACH) [18], one of the first clustering
algorithms proposed for sensor networks, is a distributed,
proactive, dynamic algorithm that forms clusters of
sensors based on the received signal strength and uses
local CHs as routers to the sink. Each node makes its own
decision whether to become CH based on how often and
the last time it has been CH but also on the optimal
percentage of CHs in the network (pre-determined value).
Transmissions are operated only by CHs which saves
energy. LEACH provides a balance of energy
consumption through a random rotation of CHs. However,
CHs transmit data directly to the sink, which can be
energy-consuming in large-scale sensor networks.
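As an illustration of the self-election step described above, the following minimal sketch assumes the threshold form commonly given in the LEACH literature, T(n) = P / (1 - P (r mod 1/P)) for nodes that have not yet served as CH in the current epoch, and 0 otherwise. The function names and the use of Python's `random` module are illustrative assumptions and are not taken from this paper.

```python
import random


def leach_threshold(p: float, round_no: int, was_ch_this_epoch: bool) -> float:
    """Threshold T(n) for a node in round `round_no`.

    p is the desired fraction of cluster heads (the pre-determined value).
    A node that already served as CH in the current epoch of 1/p rounds
    gets a threshold of 0 and therefore cannot elect itself again.
    """
    if was_ch_this_epoch:
        return 0.0
    epoch_len = round(1 / p)  # number of rounds in one epoch
    return p / (1 - p * (round_no % epoch_len))


def elects_itself(p, round_no, was_ch_this_epoch, rng=random):
    """A node becomes CH when a uniform draw falls below its threshold."""
    return rng.random() < leach_threshold(p, round_no, was_ch_this_epoch)
```

The threshold grows as the epoch advances, so every node that has not yet served is elected by the last round of the epoch, which is how the random rotation balances energy consumption.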
Power-efficient GAthering in Sensor Information
Systems (PEGASIS) [19] and its variation Hierarchical-
PEGASIS are two improvements of LEACH. Rather than
forming multiple clusters, PEGASIS forms chains of
sensor nodes so that each sensor transmits and receives
from a neighbor and only one node is selected from that
chain to convey data to the PN. Still, communication
between the elected CH and the PN is one-hop, which
may waste energy and prove to be unsuitable for large-
sized networks. Weighted Clustering Algorithm (WCA)
[20] is a reactive clustering algorithm where cluster
election is based on the evaluation, for every sensor, of a
score function called 'combined weight'. This function is
a weighted linear combination of the degree, the mobility
level, the transmission power and the residual energy of
the sensor. Every sensor broadcasts its combined weight
to its neighbors and the sensor having the lowest weight
is elected CH.
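The election rule described above can be sketched as follows. This is only a sketch under stated assumptions: the weighting factors, the field names, and the way each term is normalised (so that a lower value is better) are hypothetical and are not specified in this paper.

```python
from dataclasses import dataclass


@dataclass
class SensorState:
    node_id: int
    degree_term: float   # assumed: deviation from an ideal degree
    mobility: float      # assumed: normalised mobility level
    tx_power: float      # assumed: normalised transmission power
    energy_term: float   # assumed: fraction of energy already spent


# Hypothetical weighting factors for the linear combination.
WEIGHTS = (0.4, 0.2, 0.2, 0.2)


def combined_weight(s: SensorState, w=WEIGHTS) -> float:
    """Weighted linear combination of the four per-sensor terms."""
    return (w[0] * s.degree_term + w[1] * s.mobility
            + w[2] * s.tx_power + w[3] * s.energy_term)


def elect_cluster_head(neighbourhood):
    """Return the sensor with the lowest combined weight."""
    return min(neighbourhood, key=combined_weight)
```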
Hybrid Energy-Efficient Distributed Clustering (HEED)
[21] is a distributed clustering protocol that uses a hybrid
combination of the residual energy and the intra-cluster
communication cost as attribute for cluster head selection.
HEED ensures a uniform distribution of CHs across the
network and adjusts the probability of CH-selection to
ensure inter-CH connectivity. In its initialization phase,
HEED allows sensors to compute a probability of
becoming CH, proportional to its residual energy and to a
pre-determined percentage of CHs. Then, during a
repetition phase, sensors seek the best CH to connect to.
If no CH is found, the sensor doubles its probability to
become CH and broadcasts it again to its neighbors, and
so forth. This phase stops either when this probability
equals 1 (i.e., the sensor elects itself as CH) or when it
finds a CH to connect to.
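The probability-doubling behaviour of the repetition phase can be sketched as below. The function names are illustrative, and the lower bound `p_min` on the initial probability is an assumption borrowed from the usual description of HEED rather than from this paper.

```python
def heed_initial_probability(c_prob, e_residual, e_max, p_min=1e-4):
    """Initial CH probability, proportional to the residual energy and to
    the pre-determined CH percentage c_prob, floored at p_min (assumed)."""
    return max(c_prob * e_residual / e_max, p_min)


def iterations_until_self_election(ch_prob):
    """Count the doublings needed to reach probability 1 when the sensor
    never finds a CH to connect to."""
    steps = 0
    while ch_prob < 1.0:
        ch_prob = min(2.0 * ch_prob, 1.0)
        steps += 1
    return steps
```

Because the probability doubles in every iteration, the repetition phase ends after a number of steps that is logarithmic in the inverse of the initial probability, so the floor on that probability bounds the duration of the phase.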
Energy-efficient Strong Head clustering (EESH) [17]
is a recently published clustering protocol. In EESH,
nodes are promoted to CHs according to their respective
residual energies, their respective degrees and the
distance to and the residual energy of their neighbors. For
that, EESH evaluates a cost function for every sensor in
the network and iteratively elects the node having the
greatest cost as CH. This process terminates when all the
sensors in the network are connected to at least one CH.
EESH has been shown to outperform HEED and LEACH,
so we used it as a comparative base in our performance
Traditionally, the sensors are deployed in a redundant
fashion. Since sensor nodes might generate significant
redundant data, similar packets from multiple nodes can
be aggregated so that the number of transmissions would
be reduced. Data aggregation combines data from
different sources by using functions such as suppression
(eliminating duplicates), min, max and average [22].
Some of these functions can be performed either partially
or fully in each sensor node, by allowing sensor nodes to
conduct in-network data reduction. Recognizing that
computation would be less energy consuming than
communication, substantial energy savings can be
obtained through data aggregation.
Allowing an approximate result, rather than requiring
an exact answer, makes it possible to exploit the
correlations in the sensor data by selecting a small subset
of sensor nodes, called the representative set, whose
readings represent the whole sensor network
with sufficient accuracy.
This technique has been used to achieve energy
efficiency and traffic optimization in a number of routing
protocols. Yoon and Shahabi have proposed a
clustered aggregation (CAG) technique leveraging spatial
and temporal correlations in wireless sensor networks [23,
In-network query processing and data aggregation are
widely used to save energy, increase scalability, and
reduce computation in many monitoring applications of
WSN [25, 26].
TAG, the landmark in-network query processing
system, constructs a query routing tree and performs in-
network aggregation along the tree [27].
TAG operates as follows: users pose aggregation
queries from a powered, storage-rich base station.
Operators that implement the query are distributed into
the network by piggybacking on the existing ad hoc
networking protocol. Sensors route data back towards the
user through a routing tree rooted at the base station. As
data flows up this tree, it is aggregated according to an
aggregation function and value-based partitioning
specified in the query. As an example, consider a query
that counts the number of nodes in a network of
indeterminate size.
First, the request to count is injected into the network;
Then, each leaf node in the tree reports a count of 1 to
its parent;
Interior nodes sum the count of their children, add 1 to
it, and report that value to their parent;
Counts propagate up the tree in this manner, and flow
out at the root.
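The counting example above can be sketched in a few lines. The routing tree used here is hypothetical, and the recursive formulation is only a compact stand-in for the message exchange between children and parents.

```python
def count_nodes(children, node):
    """Count the nodes in the subtree rooted at `node`.

    `children` maps a node id to the list of its child ids in the routing
    tree. A leaf reports 1; an interior node sums the counts of its
    children and adds 1 for itself.
    """
    return 1 + sum(count_nodes(children, c) for c in children.get(node, []))


# Hypothetical routing tree rooted at the base station (node 0).
routing_tree = {0: [1, 2], 1: [3, 4], 2: [5], 5: [6]}
total = count_nodes(routing_tree, 0)  # the count that flows out at the root
```

Each node sends a single partial count to its parent, so the number of transmissions equals the number of tree edges regardless of the network size.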
CAG branches out from TAG for further energy saving
by using spatial correlation of data to improve existing
in-network aggregation mechanisms.
CAG forms clusters of the sensor nodes sensing similar
values and transmits only a single value per cluster as
opposed to a single value per node as in TAG-like
schemes. Thus, CAG can significantly reduce the number
of transmissions, which results in energy savings while
incurring a small error in the query result.
The CAG algorithm operates in two phases: query and
response. During the query phase, CAG forms clusters
when a TAG-like forwarding tree is built, using a user-
specified error threshold τ. In the response phase, CAG
transmits a single value per cluster. CAG is a lossy
clustering method; only the cluster heads contribute to the
A user-provided error threshold, τ, is used while
building clusters. Each node decides to join a cluster
based on the cluster head sensor reading (CR) and its own
local sensor reading (MR); if MR lies within CR ± CR × τ,
then the sensor is included in the same cluster.
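To make the clustering rule concrete, the following minimal sketch applies it along a forwarding tree. It is not the authors' pseudo code: the function names, the treatment of τ as a fraction, and the inclusive boundary are assumptions made for illustration.

```python
def within_threshold(mr: float, cr: float, tau: float) -> bool:
    """True when reading mr lies within cr +/- cr * tau (tau as a fraction)."""
    return abs(mr - cr) <= abs(cr) * tau


def cag_clusters(parent, reading, root, tau):
    """Assign a cluster head to every node while walking the forwarding tree.

    parent  : dict child -> parent, describing the TAG-like tree
    reading : dict node -> local sensor reading (MR)
    Returns a dict node -> cluster head id.
    """
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    head = {root: root}  # the root starts the first cluster
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            cr = reading[head[node]]  # CR forwarded by the parent
            if within_threshold(reading[child], cr, tau):
                head[child] = head[node]  # join the parent's cluster
            else:
                head[child] = child       # start a new cluster
            stack.append(child)
    return head
```

In this sketch a node only ever compares itself with the cluster of its tree parent, so two nodes with similar readings but different parents can both become cluster heads.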
The pseudo code of the CAG algorithm can be
summarized as follows:
Figure 2. An example of a CAG clustering result.
Figure 3. Illustration of the problem with the CAG clustering algorithm.
To make this more intuitive, an example of the result of the
CAG clustering algorithm is shown in Figure 2.
The CAG algorithm appears sound, but by carefully
studying its cluster formation process, we found that
some flaws still exist in the algorithm.
The ultimate goal of CAG is to divide the sensor
network into clusters and to select a representative
from each cluster, which is called the cluster
head. The cluster head is responsible for answering
the queries sent by the base station quickly. Therefore, CAG
should satisfy two requirements: first, the cluster
head must be able to represent the whole cluster; second, the
number of cluster heads should be as small as possible. The
problem is illustrated in Figure 3. The whole
monitoring region is divided into three classes labeled
with different colors. The assumption adopted is that the
readings of all sensors located in the same region are
equal. According to the idea of CAG, only one sensor node
should be selected as cluster head from a subgraph located
in the same region. However, as the clustering result in
Figure 3 shows, every node whose parent is located in a
different region is elected as a cluster head. The flaw of CAG
is therefore that too many cluster heads are selected to
represent the same region.
To overcome the disadvantages of CAG, we
propose the following improved clustering algorithm:
At the very beginning, each node sends a HELLO
message to build its neighborship table. Every node
maintains two routing items: one whose destination is
the root (base station) node, by which the node forwards
the messages received from its child nodes, and one whose
destination is its cluster head, through which the node
sends messages to the cluster head for aggregation.
When the root node prepares to collect data, it labels
itself as the root node, fills in a network initialization
message NET_INIT and broadcasts it. NET_INIT is
a seven-tuple <QueryID, Attribute, τ, ParentID, MyID,
level, CR>, where QueryID is the query number,
Attribute designates the query attribute for multi-sensor
nodes, τ is a user-specified error threshold, and level is
the depth of the current node in the forwarding tree.
Once an intermediate node receives NET_INIT for
the first time, it adds a routing item to its routing table and
forwards the NET_INIT message; otherwise it drops the
message.
On receiving a broadcast message for the first time, the node
judges, according to the clustering rule MR < CR ± CR × τ,
whether it belongs to the same cluster as its parent.
If the reading of the node satisfies the clustering rule, the
node labels itself as a cluster member and joins the cluster.
Otherwise, the node checks its neighbors for a cluster head.
If a neighbor has already become a cluster head, the node
joins that cluster; otherwise it labels itself as a cluster head.
This process continues until all of the nodes
have joined the routing tree.
Analyzing the above clustering process, we find that in
the improved algorithm, when a node receives the
NET_INIT message, it first checks whether one of its neighbors
has already become a cluster head. Joining its sibling's
cluster is its first choice; only when none of the conditions is
satisfied does the node label itself as a cluster head.
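The cluster formation process analyzed above can be sketched as follows. This is a simplified illustration rather than the authors' Java simulator: it assumes that NET_INIT is flooded in breadth-first order, that τ is given as a fraction with an inclusive boundary, and that a node joins a neighboring cluster head only if its reading also satisfies the clustering rule with respect to that head. All names are hypothetical.

```python
from collections import deque


def improved_clusters(neighbors, reading, root, tau):
    """Flood NET_INIT from `root` breadth-first and assign cluster heads.

    neighbors : dict node -> iterable of nodes within radio range
    reading   : dict node -> local sensor reading (MR)
    Returns (parent, head): the forwarding tree and node -> cluster head.
    """
    def similar(mr, cr):
        return abs(mr - cr) <= abs(cr) * tau

    parent = {root: None}
    head = {root: root}
    queue = deque([root])
    while queue:
        sender = queue.popleft()
        for node in neighbors[sender]:
            if node in parent:  # NET_INIT already received: drop it
                continue
            parent[node] = sender
            mr = reading[node]
            if similar(mr, reading[head[sender]]):
                head[node] = head[sender]  # same cluster as the parent
            else:
                # Look for a neighbor that is already a cluster head.
                nearby = [n for n in neighbors[node]
                          if head.get(n) == n and similar(mr, reading[n])]
                head[node] = nearby[0] if nearby else node
            queue.append(node)
    return parent, head
```

For two neighboring nodes that sense the same value but differ from their common parent, this sketch yields one new cluster head instead of two, which is the effect the improved algorithm aims for.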
So, based on the above theoretical analysis of our
proposed algorithm, we can conclude that our algorithm
will generate fewer cluster heads than CAG does. In the
next section, several experiments will be conducted to
validate the effectiveness of our algorithm.
In this section, five experiments are designed to
evaluate the performance of the improved clustering
algorithm. The simulation program was developed by our
team in Java. In the following experiments,
a total of 500 nodes were deployed in a 1000 m × 1000 m
rectangular area. The transmission range of every sensor
node is set to 100 m.
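A deployment with these parameters can be sketched as follows. The uniform random placement, the fixed seed and the function names are assumptions made for illustration; the paper does not state how the node positions were drawn.

```python
import math
import random


def deploy(n=500, side=1000.0, seed=0):
    """Place n nodes uniformly at random in a side x side square (metres)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]


def neighbor_table(positions, radio_range=100.0):
    """Map each node index to the indices of the nodes within radio range."""
    table = {i: [] for i in range(len(positions))}
    for i, (xi, yi) in enumerate(positions):
        for j in range(i + 1, len(positions)):
            xj, yj = positions[j]
            if math.hypot(xi - xj, yi - yj) <= radio_range:
                table[i].append(j)  # links are symmetric
                table[j].append(i)
    return table
```

The resulting table plays the role of the neighborship table that each node builds from HELLO messages.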
A. Experiment 1
The parameters in the first experiment are set as
follows: threshold
=; the base station is located in
the center of the monitoring area (the dark node in
Fig. 4(a)); the sensor readings are generated as follows: the
reading at the center is 50 units, and every other reading is
generated according to the rule $50 \times (1 - \mathit{Dist}/500)$,
where Dist denotes the distance to the center.
Fig. 4(b) shows the TAG routing tree. Fig. 4(c) and (d)
illustrate the clustering results of the CAG algorithm
and of our proposed algorithm, respectively. To
facilitate observation, the cluster heads and the
links between cluster heads and their parents are marked in
red. To give a clear impression of the clusters, each node is
linked to its cluster head, but this link does not represent
a real routing path. The number of red lines therefore equals
the number of clusters. In this experiment, CAG produces 227
cluster heads in the network, while our proposed algorithm
produces only 71.
B. Experiment 2
The only difference between this experiment and the first
experiment is the sensor reading generation
mechanism. In this experiment, 10 points were randomly
selected in the monitoring region as data centers. The
difference between center values is 10 units. The
reading of each sensor node follows
$V \times (1 - \mathit{Dist}/500)$, where V is the value of the nearest
data center, and Dist is the distance between them.
Fig. 5(a) and (b) give the clustering results using CAG
and our proposed algorithm; the numbers of cluster heads are 191
and 66, respectively.
C. Experiment 3
Parameter settings in this experiment are similar to
those of experiment 1; the difference is that the base station is
located in the upper left corner of the region. The
clustering results are shown in Figure 6. The numbers
of cluster heads obtained by CAG and our proposed
algorithm are 176 and 50, respectively.
D. Experiment 4
The base station was placed in the upper left corner of
the region and the node readings generation mechanism is
the same as in experiment 2. The clustering results are
shown in Figure 7. The numbers of cluster heads
obtained by CAG and our proposed algorithm are 166
and 59, respectively.
E. Experiment 5
This experiment is mainly used to test the effect of
is set to 2 and 10, respectively, The
results shown in Figure7.
Figure 4. Results of experiment 1. (a) Node distribution; (b) TAG routing tree; (c) CAG clustering result (cluster heads: 227); (d) clustering result using the proposed algorithm (cluster heads: 71).
Figure 5. Results of experiment 2. (a) CAG clustering result (cluster heads: 191); (b) clustering result using the proposed algorithm (cluster heads: 66).
Figure 6. Results of experiment 3. (a) Node deployment; (b) TAG routing tree; (c) CAG clustering result (cluster heads: 176); (d) clustering result using the proposed method (cluster heads: 50).
Figure 7. Results of experiment 4. (a) CAG clustering result (cluster heads: 166); (b) clustering result using the proposed method (cluster heads: 59).
Figure 8. Results of experiment 5. (a) CAG clustering result (τ = 10, cluster heads: 104); (b) clustering result using the proposed method (τ = 10, cluster heads: 66); (c) CAG clustering result (τ = 2, cluster heads: 283); (d) clustering result using the proposed method (τ = 2, cluster heads: 171).
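To put the reported cluster head counts side by side, the short sketch below computes the relative reduction achieved in each experiment. The counts are those reported in the experiments and figure captions; the dictionary layout and the function name are illustrative.

```python
# (CAG heads, proposed heads) as reported for each experiment.
reported = {
    "Experiment 1": (227, 71),
    "Experiment 2": (191, 66),
    "Experiment 3": (176, 50),
    "Experiment 4": (166, 59),
    "Experiment 5, tau=10": (104, 66),
    "Experiment 5, tau=2": (283, 171),
}


def reduction_percent(cag_heads: int, proposed_heads: int) -> float:
    """Relative reduction in the number of cluster heads, in percent."""
    return 100.0 * (cag_heads - proposed_heads) / cag_heads


summary = {name: round(reduction_percent(*pair), 1)
           for name, pair in reported.items()}
for name, pct in summary.items():
    print(f"{name}: {pct}% fewer cluster heads")
```

With the reported figures, the reduction ranges from about 36.5% (experiment 5, τ = 10) to about 71.6% (experiment 3).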
A wireless sensor network (WSN) is a wireless
network consisting of spatially distributed autonomous
devices using sensors to cooperatively monitor physical
or environmental conditions, such as temperature, sound,
vibration, pressure, motion or pollutants, at different
locations. In this paper, a novel clustering algorithm is
proposed that can greatly reduce the number of cluster
heads. It exploits the spatial correlation of nodes to form
clusters of nodes sensing similar values, so that only the
cluster head's sensor reading is transmitted to the sink, which
can efficiently alleviate the funneling effects. Experimental
results validate the effectiveness of this approach.
In future work, we will develop a prototype
system to further verify the validity of our approach and
to measure the exact energy consumption.
This work was supported by the National Natural
Science Foundation of China (Grant No.
60970012), the Shanghai Key Science and Technology
Project in Information Technology Field (Grant
No. 09511501000), the Shanghai Key Science and
Technology Project (Grant No. 09220502800), the Shanghai
Leading Academic Discipline Project (Grant No. S30501),
the Innovation Program of Shanghai Municipal Education
Commission (Grant No. 10YZ102) and the
Science Foundation for the Excellent Youth Scholars of
Shanghai of China (Grant No. slg08014).
