IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 10, OCTOBER 2010 5277
Distributed Adaptive Node-Specific Signal Estimation
in Fully Connected Sensor Networks—Part I:
Sequential Node Updating
Alexander Bertrand, Student Member, IEEE, and Marc Moonen, Fellow, IEEE
Abstract—We introduce a distributed adaptive algorithm for
linear minimum mean squared error (MMSE) estimation of
node-specific signals in a fully connected broadcasting sensor
network where the nodes collect multichannel sensor signal obser-
vations. We assume that the node-specific signals to be estimated
share a common latent signal subspace with a dimension that is
small compared to the number of available sensor channels at
each node. In this case, the algorithm can significantly reduce the
required communication bandwidth and still provide the same
optimal linear MMSE estimators as the centralized case. Further-
more, the computational load at each node is smaller than in a
centralized architecture in which all computations are performed
in a single fusion center. We consider the case where nodes update
their parameters in a sequential round robin fashion. Numerical
simulations support the theoretical results. Because of its adaptive
nature, the algorithm is suited for real-time signal estimation in
dynamic environments, such as speech enhancement with acoustic
sensor networks.
Index Terms—Adaptive estimation, distributed estimation, wire-
less sensor networks (WSNs).
I. INTRODUCTION
In a sensor network [1], a general objective is to utilize all
sensor signal observations available in the entire network to
perform a certain task, such as the estimation of a parameter or
signal. Gathering all observations in a fusion center to calculate
an optimal estimate may however require a large communica-
tion bandwidth and computational power. This approach is often
Manuscript received October 21, 2009; accepted March 21, 2010. Date of
publication June 10, 2010; date of current version September 15, 2010. The as-
sociate editor coordinating the review of this manuscript and approving it for
publication was Dr. Ta-Sung Lee. The work of A. Bertrand was supported by
a Ph.D. grant of the I.W.T. (Flemish Institute for the Promotion of Innovation
through Science and Technology). This work was carried out at the ESAT Labo-
ratory of Katholieke Universiteit Leuven, in the frame of K.U. Leuven Research
Council CoE EF/05/006 Optimization in Engineering (OPTEC), Concerted Re-
search Action GOA-AMBioRICS, Concerted Research Action GOA-MaNet,
the Belgian Programme on Interuniversity Attraction Poles initiated by the Bel-
gian Federal Science Policy Office IUAP P6/04 (DYSCO, “Dynamical sys-
tems, control and optimization,” 2007–2011), and Research Project FWO nr.
G.0600.08 (“Signal processing and network design for wireless acoustic sensor
networks”). The scientific responsibility is assumed by its authors.
The authors are with the Department of Electrical Engineering (ESAT-SCD/
SISTA), Katholieke Universiteit Leuven, B-3001 Leuven, Belgium (e-mail:
alexander.bertrand@esat.kuleuven.be; marc.moonen@esat.kuleuven.be).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSP.2010.2052612
referred to as centralized fusion or estimation. An alternative is
a distributed approach where each node has its own processing
unit and the estimation relies on distributed processing and co-
operation. This approach is preferred, especially when it is scalable in terms of its communication bandwidth requirement and computational complexity.
In many sensor network estimation frameworks, the sensor
signal observations are used to estimate a common network-wide desired parameter or signal, denoted here by $d$. This means that all nodes contribute to a common goal, i.e., the estimation of the globally defined variable $d$, which is the same for all nodes (see for example [2]–[8]). This can be viewed as a special case of the more general problem, which is considered here, where each node in the network estimates a different node-specific desired signal, i.e., node $k$ estimates the locally defined signal $d_k$.
This means that all nodes have a different local objective, which
they pursue through cooperation with other nodes. We describe
a distributed adaptive node-specific signal estimation (DANSE)
algorithm that operates in an ideal fully connected network. The
nodes broadcast compressed multichannel sensor signal obser-
vations that can be captured by all other nodes in the network,
possibly with the help of relay nodes. The computational load
is distributed over the different nodes in the network.
The DANSE algorithm is designed for the case where the
node-specific desired signals share a common (unknown) la-
tent signal subspace. If this signal space has a small dimension
compared to the number of available sensor channels at each
node, the DANSE algorithm exploits this common interest of
the nodes to significantly compress the data to be broadcast, and
yet converge to the optimal linear minimum mean squared error
(MMSE) estimators as if all sensor signal observations were
available at each node. Although the DANSE algorithm implic-
itly assumes a specific structure in the relationship between the
desired signals of the different nodes, it is noted that the actual
parameters of these latent dependencies are not assumed to be
known, i.e., nodes do not know how their desired signal is re-
lated to the desired signals of other nodes. The model that is
assumed in the DANSE algorithm naturally emerges in adap-
tive signal estimation problems in dynamic scenarios where the
target signal statistics and the transfer functions to the sensors
are not known and may change during operation of the algo-
rithm. Therefore, the original target signal cannot be recovered,
and so an option is then to let the nodes optimally estimate the
signal as it is observed locally by the node’s sensors. In this case,
the desired signals of the different nodes are differently filtered
versions of the same target signal, i.e., they share a common la-
tent signal subspace.
Because of its adaptive nature, the DANSE algorithm is
suited for real-time applications in dynamic environments.
Typical applications are vibration monitoring, wireless acoustic sensor networks (for surveillance, video conferencing, domotics, audio recording, etc.), and noise reduction in hearing aids
with external sensor nodes and/or cooperation between multiple
hearing aids [9], [10]. Node-specific estimation is particularly
important in applications where a target signal needs to be
estimated as it is observed at a specific sensor position. For
instance, in acoustic surveillance, it is often required to be able
to locate a sound source, so spatial information in the obser-
vations of different nodes must be retained in the estimation
process. In cooperating hearing aids, it is important to estimate
the signal as it impinges at the hearing aid itself, to preserve the
auditory cues for directional hearing [11], [12].
The DANSE algorithm is based on linear compression of
multichannel sensor signal observations. Linear compression
of sensor signal observations for data fusion has been the
topic of earlier work, e.g., [5]–[8]. The presented techniques,
however, assume prior knowledge of the intra- and intersensor
(cross-)correlation structure in the entire network. This must
be obtained by a priori training using all uncompressed sensor
signal observations, or must be derived from a specific data
model. Such assumptions make it difficult to apply the resulting
algorithms in adaptive networks or dynamic environments
where the statistics of the desired signals or sensor signals may
change. The DANSE algorithm can adapt to these changes
because nodes estimate and reestimate all required statistical
quantities on the compressed data during operation. For this,
we assume that each node can adaptively estimate the cross cor-
relation between its local sensor signals and its desired signal.
It is noted that the acquisition of these signal statistics is often
difficult or impossible, since the target signal is assumed to be
unknown. However, we will explain that in particular cases,
it is possible to estimate the required statistics, e.g., when the
target signal has an on-off behavior (such as speech signals),
or when the target source periodically transmits a priori known
training sequences. In cases where the local statistics cannot
be estimated adaptively, the DANSE algorithm can still be
used in a semi-adaptive context, i.e., scenarios with static noise
statistics but with changing target signal statistics or vice versa,
assuming that the static correlation structure is a priori known.
In [13], a batch-mode description of the DANSE algorithm
was briefly introduced. In this paper, we provide more details,
i.e., we include a convergence proof and introduce a truly adap-
tive version. In addition, we address implementation aspects,
and provide extensive simulation results, both in batch mode and
in a dynamic scenario. We only consider the case where nodes
update their parameters in a sequential round robin fashion. The
case where nodes update simultaneously or asynchronously is
treated in a companion paper [14]. In [10], a pruned version
of the DANSE algorithm has been used for microphone-array
based speech enhancement in binaural hearing aids, where it
was referred to as distributed multichannel Wiener filtering. In
this application, two hearing aids in a binaural configuration ex-
change a linear combination of their microphone signals to esti-
mate the target sound that is recorded by their reference micro-
phone. Convergence of the two-node system has been proven for
the special case where there is a single target speaker. The more
general DANSE algorithm provided in this paper allows for a
nontrivial extension to a scenario with multiple target speakers
and a network with more than two nodes. Using extra acoustic
sensor nodes that communicate with the hearing aids generally
improves the noise reduction performance, since the acoustic
sensors physically cover a larger area [9].
The paper is organized as follows. The problem formulation
and notation are presented in Section II. In Section III, we first
address the simple case in which the node-specific desired sig-
nals are scaled versions of each other and we prove conver-
gence of the DANSE algorithm to the optimal linear MMSE
estimators when nodes update their parameters sequentially. In
Section IV, this algorithm is generalized to the case in which the node-specific desired signals share a common latent $Q$-dimensional signal subspace. In Section V, we address some implementation details of DANSE and we study the complexity
of the algorithm. Finally, Section VI illustrates the convergence
results with numerical simulations. Conclusions are given in
Section VII.
II. PROBLEM FORMULATION AND NOTATION
A. Node-Specific Linear MMSE Estimation
We consider an ideal fully connected network with $J$ sensor nodes $\{1, \dots, J\}$, in which data broadcast by a node $k$ can be captured by all other nodes in the network through an ideal link. Node $k$ collects observations of a complex valued¹ $M_k$-channel signal $\mathbf{y}_k[t]$, where $t$ is the discrete time index, and where $\mathbf{y}_k[t]$ is an $M_k$-dimensional column vector. Each channel $y_{kq}$, $q \in \{1, \dots, M_k\}$, of the signal $\mathbf{y}_k$ corresponds to a sensor signal to which node $k$ has access. We assume that all signals are stationary and ergodic. In practice, the stationarity and ergodicity assumption can be relaxed to short-term stationarity and ergodicity, in which case the theory should be applied to finite signal segments that are assumed to be stationary and ergodic. For the sake of an easy exposition, we will omit the time index when referring to a signal, and we will only write the time index when referring to one specific observation, i.e., $\mathbf{y}_k[t]$ is the observation of the signal $\mathbf{y}_k$ at time $t$. We define $\mathbf{y}$ as the $M$-channel signal in which all $\mathbf{y}_k$ are stacked, where $M = \sum_{k=1}^{J} M_k$. This scenario is described in Fig. 1.
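As an illustrative sketch of this setup (the sizes below are hypothetical, not taken from the paper), the network-wide $M$-channel signal $\mathbf{y}$ is simply the stack of the per-node signals $\mathbf{y}_k$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: J = 3 nodes with M_k sensor channels each.
M_k = [4, 3, 5]
N = 1000                                  # number of observations

# y_k: M_k-channel signal at node k (rows = channels, columns = time).
y = [rng.standard_normal((m, N)) for m in M_k]

# Network-wide M-channel signal y stacks all y_k, with M = sum_k M_k.
y_stacked = np.vstack(y)

assert y_stacked.shape == (sum(M_k), N)   # here M = 12
```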
It is noted that this problem formulation also allows for hierarchical network architectures, in which the sensors are grouped in clusters. The sensors of a specific cluster $k$ then transmit their observations to a nearby fusion center, i.e., a "higher level" node. The fusion centers then correspond to the nodes $k$ in the above framework, and the collected observations in sensor cluster $k$ correspond to the $M_k$-channel signals $\mathbf{y}_k$ as explained above. Fig. 2 shows such a scenario for a network with three fusion centers.
We first consider the centralized estimation problem, i.e., we assume that each node has access to the observations of the entire $M$-channel signal $\mathbf{y}$. This corresponds to the case where
¹Throughout this paper, all signals are assumed to be complex valued to permit frequency-domain descriptions.
Fig. 1. Description of the scenario. The network contains $J$ sensor nodes, where node $k$ collects $M_k$-channel sensor signal observations and estimates a node-specific desired signal $\mathbf{d}_k$, which is a mixture of the channels of a common latent signal $\mathbf{d}$.
Fig. 2. A hierarchical architecture with three fusion centers, each one collecting sensor signals from nearby sensors.
nodes broadcast their uncompressed observations to all other nodes. In Sections III and IV, the general goal will be to compress the broadcast signals, while preserving the estimation performance of this centralized estimator. The objective for node $k$ is to estimate a complex valued node-specific signal $d_k$, referred to as the desired signal, from the observations of $\mathbf{y}$. We consider the general case where $d_k$ is not an observed signal, i.e., it is assumed to be unknown, as it is the case in signal enhancement (e.g., in speech enhancement, $d_k$ is the speech component in a noisy microphone signal). Node $k$ uses a linear estimator $\mathbf{w}_k$ to estimate $d_k$ as $\bar{d}_k = \mathbf{w}_k^H \mathbf{y}$, where $\mathbf{w}_k$ is a complex valued $M$-dimensional vector, and where superscript $H$ denotes the conjugate transpose operator. We assume that the $M$-channel signal $\mathbf{y}$ is correlated to the node-specific desired signals, but unlike [6], [8], we do not restrict ourselves to any data model generating the sensor signals, nor do we make any assumptions on the probability distributions of the involved signals. We consider linear MMSE estimation based on a node-specific estimator $\mathbf{w}_k$, i.e.,

$$\hat{\mathbf{w}}_k = \arg\min_{\mathbf{w}_k} E\left\{ \left| d_k - \mathbf{w}_k^H \mathbf{y} \right|^2 \right\} \qquad (1)$$

with $E\{\cdot\}$ the expected value operator. Assuming that the correlation matrix $R_{yy} = E\{\mathbf{y}\mathbf{y}^H\}$ has full rank,² the unique solution of (1) is [15]

$$\hat{\mathbf{w}}_k = R_{yy}^{-1}\, \mathbf{r}_{y d_k} \qquad (2)$$

with $\mathbf{r}_{y d_k} = E\{\mathbf{y}\, d_k^*\}$, where $d_k^*$ denotes the complex conjugate of $d_k$. Based on the assumption that the signals are ergodic, $R_{yy}$ and $\mathbf{r}_{y d_k}$ can be estimated by time averaging. The matrix $R_{yy}$ is directly estimated from the sensor signal observations. Since $d_k$ is assumed to be unknown, the estimation of the correlation vector $\mathbf{r}_{y d_k}$ has to be done indirectly, based on specific strategies, e.g., by exploiting the on-off behavior of the target signal (e.g., for speech enhancement [9], [10]), by using training sequences, or by using partial prior knowledge when the estimation is performed in a semi-adaptive context. We will provide more details on these strategies in Section V-A. In the sequel, we assume that $\mathbf{r}_{y d_k}$ can be estimated during operation of the algorithm.
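A minimal numerical sketch of the centralized estimator (2) follows; the sizes and noise level are illustrative assumptions, and the latent target $d$ is used directly to synthesize data and to form $\mathbf{r}_{yd_k}$, whereas in practice this vector must be estimated indirectly as discussed above:

```python
import numpy as np

rng = np.random.default_rng(1)

M, N = 6, 50000
# Complex latent target d and an (in practice unknown) mixing vector a.
d = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
noise = 0.3 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
y = np.outer(a, d) + noise                     # M-channel sensor signal

# Ergodicity: estimate R_yy = E{y y^H} and r_yd = E{y d^*} by time averaging.
R_yy = y @ y.conj().T / N
r_yd = y @ d.conj() / N

# Linear MMSE (Wiener) solution (2): w_hat = R_yy^{-1} r_yd.
w_hat = np.linalg.solve(R_yy, r_yd)
d_hat = w_hat.conj() @ y                       # estimate d as w^H y

mse = np.mean(np.abs(d - d_hat) ** 2)
assert mse < 0.1                               # far below the target power E{|d|^2} = 1
```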
In the above estimation procedure, temporal correlation appears to be ignored. However, differently delayed versions of one or more sensor signals at node $k$ can be added to the channels of $\mathbf{y}_k$, to also exploit the temporal information in the signals. For example, assume that node $k$ has access to 4 sensor signals. Then each of these signals is delayed with 1 up to $N$ sample delays, resulting in $4N$ extra (delayed) channels. In this case, the dimension of $\mathbf{y}_k$ is $M_k = 4(N+1)$.
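The delay-line construction described here can be sketched as follows (the sensor count and number of delays are example values; zero-padding at the segment start is an implementation choice):

```python
import numpy as np

rng = np.random.default_rng(2)

y_k = rng.standard_normal((4, 200))   # node k with 4 sensor signals
N_delays = 2                          # add 1..N_delays sample delays per sensor

rows = [y_k]
for tau in range(1, N_delays + 1):
    shifted = np.roll(y_k, tau, axis=1)
    shifted[:, :tau] = 0.0            # zero-pad instead of wrapping around
    rows.append(shifted)

# Augmented signal: dimension grows from 4 to 4 * (N_delays + 1) channels.
y_k_aug = np.vstack(rows)
assert y_k_aug.shape == (4 * (N_delays + 1), 200)
```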
It is noted that our problem statement differs from [2]–[4], where each node collects different spatio-temporal observations of two correlated signals $d$ and $\mathbf{u}$. The objective is then to find the best common linear fit between these observations, with a single set of coefficients $\mathbf{w}$, which is assumed to be the same for each node. Since the coefficients in $\mathbf{w}$ are of interest, only the locally estimated $\mathbf{w}$'s must be shared between nodes, whereas the sensor observations themselves are only used locally to update the estimate of $\mathbf{w}$. Since all nodes are assumed to estimate the same set of coefficients, incremental or diffusive averaging strategies can be used.
B. Common Latent Signal Subspace
In our problem statement, each node only collects observa-
tions of which corresponds to a subset of the channels of the
full signal . To find the optimal MMSE solution (2), each node
therefore in principle has to broadcast its observations of
to all other nodes in the network, which requires a large com-
munication bandwidth. One possibility to reduce the required
bandwidth is to broadcast only a few linear combinations of the
components of the observations instead of all compo-
nents. Finding the optimal linear compression is often a non-
trivial task, and in general this will not lead to the optimal solu-
tions (2). In many practical cases, however, the signals share
a common latent signal subspace, and then this can be exploited
in the compression. The most simple case is when all ,
i.e., the desired signal is the same for all nodes. We will first
handle the slightly more general case where all are scaled
versions of a common latent single-channel signal . For this
2This assumption is mostly satisfied in practice because of a noise component
at every sensor that is independent of other sensors, e.g., thermal noise. If not,
pseudoinverses should be used. A further comment on the rank-deficient case is
made in Section IV-C.
scenario, we will introduce the DANSE$_1$ algorithm, in which the data to be broadcast by each node $k$ is compressed by a factor $M_k$. Despite this compression, the algorithm converges to the optimal node-specific solution (2) at every node as if no compression were used for the broadcasts.
This scenario can then be extended to the more general case where the desired signals share a common $Q$-dimensional signal subspace, i.e.,

$$d_k = \mathbf{a}_k^T \mathbf{d} \qquad (3)$$

with $\mathbf{a}_k$ defining an unknown $Q$-dimensional complex vector, and $\mathbf{d}$ a latent complex valued $Q$-channel signal defining the $Q$-dimensional signal subspace that contains all $d_k$ signals. This model applies to situations where the desired signal is generated by multiple latent processes simultaneously (e.g., measuring vibrations when there are multiple exciters, or recording a conversation between multiple speakers [9]). Since the statistics of the latent signals as well as the propagation properties to the different sensors are generally unknown, the signal estimation procedure can only use statistics that can be obtained from the local sensor signal observations. The desired signal of each node is then the linear mixture of the latent target signals as locally observed by a reference sensor.

In the sequel, we consider the general case where node $k$ estimates a $Q$-channel desired signal

$$\mathbf{d}_k = A_k \mathbf{d} \qquad (4)$$

with $A_k$ a complex valued $Q \times Q$ matrix. This data model is depicted in Fig. 1. It is noted that the matrix $A_k$ and the latent signal $\mathbf{d}$ are assumed to be unknown, i.e., nodes do not know how their node-specific desired signals are related to each other. Since we also consider complex valued signals, (4) can correspond to a frequency domain description of a convolutive mixture in the time domain, as in [9], [10]. Expression (4) then defines a different estimation problem for each specific frequency. This yields frequency dependent estimators $\mathbf{w}_k$, which translate to multitap filters in the time domain.
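The data model (4) can be sketched numerically as follows; the dimensions and mixing matrices are illustrative assumptions, and the check confirms that every $\mathbf{d}_k$ lies in the same $Q$-dimensional latent subspace spanned by $\mathbf{d}$:

```python
import numpy as np

rng = np.random.default_rng(3)

Q, N, J = 2, 1000, 3
# Latent Q-channel signal d (unknown to the nodes in the actual algorithm).
d = rng.standard_normal((Q, N)) + 1j * rng.standard_normal((Q, N))

# Node-specific desired signals d_k = A_k d, with unknown Q x Q matrices A_k.
A = [rng.standard_normal((Q, Q)) + 1j * rng.standard_normal((Q, Q))
     for _ in range(J)]
d_nodes = [A_k @ d for A_k in A]

# Stacking d and any d_k gives 2Q rows but still only rank Q:
for d_k in d_nodes:
    assert np.linalg.matrix_rank(np.vstack([d, d_k])) == Q
```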
Notice that the desired signal $\mathbf{d}_k$ spans the complete signal subspace defined by the $Q$-channel signal $\mathbf{d}$, provided that the $Q \times Q$ matrix $A_k$ has full rank. If this holds for each node in the network, we will show that the data to be broadcast by node $k$ can be compressed by a factor $M_k / Q$. This means that node $k$ only needs to broadcast $Q$ linear combinations of the components of its observations of $\mathbf{y}_k$, while the optimal node-specific solution (2) is still obtained at all nodes. Notice that in practical applications, the actual signal(s) of interest can be a subset of the entries in $\mathbf{d}_k$, in which case the other entries should be seen as auxiliary channels to capture the latent $Q$-dimensional signal subspace that contains the $d_k$'s. For instance, consider the case where nodes estimate the target signal as observed by their reference sensor, i.e., node $k$ estimates the node-specific desired signal $d_k$ as in (3). Node $k$ then selects $Q - 1$ extra auxiliary reference sensors, and also estimates the target signal as it arrives on these sensors. The resulting $Q$-channel desired signal $\mathbf{d}_k$ then spans the complete signal subspace if $\mathrm{rank}(A_k) = Q$.
III. DANSE WITH SINGLE-CHANNEL BROADCAST SIGNALS
The algorithm introduced in this paper is an iterative scheme referred to as distributed adaptive node-specific signal estimation (DANSE), since its objective is to estimate a node-specific signal at each node in a distributed fashion. In the general scheme, each node broadcasts $K$-component compressed sensor signal observations. We will refer to this as DANSE$_K$, where the subscript $K$ refers to the number of channels of the broadcast signals. For the sake of an easy exposition, we first introduce the DANSE algorithm for the simple case where $K = 1$, and we will show that DANSE$_1$ converges to the optimal filters if $Q = 1$, i.e., if the single-channel desired signals $d_k$ are nonzero scaled versions of the same latent single-channel signal $d$. In Section IV we generalize this to the more general DANSE$_K$ algorithm, and we will show that this algorithm converges to the optimal filters if $K = Q$ and if all $A_k$ in (4) have rank $Q$.
A. Algorithm
The goal for each node $k$ is to estimate the signal $d_k$ with a linear estimator that uses all observations in the entire network, i.e., $\bar{d}_k = \mathbf{w}_k^H \mathbf{y}$. We aim to obtain the MMSE solutions (2), without the need for each node to broadcast all $M_k$ components of the observations. For this, we define a partitioning of the estimator $\mathbf{w}_k$ as $\mathbf{w}_k = [\mathbf{w}_{k1}^T\ \mathbf{w}_{k2}^T \cdots \mathbf{w}_{kJ}^T]^T$, with $\mathbf{w}_{kq}$ denoting the $M_q$-dimensional subvector of $\mathbf{w}_k$ that is applied to $\mathbf{y}_q$, and with superscript $T$ denoting the transpose operator. In this way, (1) is equivalent to

$$\hat{\mathbf{w}}_k = \arg\min_{\mathbf{w}_{k1}, \dots, \mathbf{w}_{kJ}} E\left\{ \left| d_k - \sum_{q=1}^{J} \mathbf{w}_{kq}^H \mathbf{y}_q \right|^2 \right\}. \qquad (5)$$

Since node $k$ only has access to the sensor signal observations of $\mathbf{y}_k$, it can only control a specific part of the estimator $\mathbf{w}_k$, namely $\mathbf{w}_{kk}$. In the DANSE$_1$ algorithm, each node $k$ broadcasts the output of this partial estimator, i.e., observations of the compressed signal $z_k = \mathbf{w}_{kk}^H \mathbf{y}_k$. This reduces the data to be broadcast by a factor $M_k$. It is noted that $\mathbf{w}_{kk}$ acts both as a compressor and as a part of the estimator $\mathbf{w}_k$, i.e., the observations of the compressed signal $z_k$ that is broadcast by node $k$ are also used in the estimation of $d_k$ at node $k$ itself.

A node $k$ now has access to $M_k + J - 1$ input channels, i.e., its own $M_k$ sensor signals and the $J - 1$ compressed signals $z_q$ that it receives from the other nodes. Node $k$ will compute the optimal linear combiner of these input channels to estimate $d_k$. The coefficient that is applied to the signal observations of $z_q$ at node $k$ is denoted by $g_{kq}$. A schematic illustration of this scheme (for $J = 3$) is shown in Fig. 3. Notice that there is no decompression involved, i.e., node $k$ does not expand the observations of the $z_q$ signal, but only scales these with a scaling
Fig. 3. The DANSE$_1$ scheme with three nodes ($J = 3$). Each node $k$ estimates a signal $d_k$ using its own $M_k$-channel sensor signal observations, and two single-channel signals broadcast by the other two nodes.
factor $g_{kq}$. As visualised in Fig. 3, the parametrization of the $\mathbf{w}_k$ now effectively applied at node $k$ is therefore

$$\tilde{\mathbf{w}}_k = \begin{bmatrix} g_{k1}\, \mathbf{w}_{11} \\ \vdots \\ g_{kJ}\, \mathbf{w}_{JJ} \end{bmatrix} \qquad (6)$$

i.e., each $\tilde{\mathbf{w}}_k$ is now defined by the set of $\mathbf{w}_{qq}$'s together with a vector $\mathbf{g}_k = [g_{k1} \cdots g_{kJ}]^T$, defining the scaling parameters. We use a tilde to indicate that the estimator is parametrized according to (6), which defines a solution space for $\tilde{\mathbf{w}}_k$ with a specific structure. In this parametrization, node $k$ can only manipulate the parameters $\mathbf{w}_{kk}$ and $\mathbf{g}_k$. In the sequel, we set $g_{kk} = 1$ to remove the ambiguity in (6) (hence $g_{kk}$ is omitted in Fig. 3). Notice that the solution space of $\tilde{\mathbf{w}}_k$ is $(M_k + J - 1)$-dimensional, which is smaller³ than the original $M$-dimensional solution space corresponding to the centralized algorithm, i.e., the solution space of the optimization problem (1). Still, the goal of the DANSE$_1$ algorithm is to iteratively update the parameters of (6) until $\tilde{\mathbf{w}}_k = \hat{\mathbf{w}}_k$.
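The parametrization (6) can be sketched as follows (the sizes and parameter values are illustrative, and real signals replace complex ones for brevity):

```python
import numpy as np

rng = np.random.default_rng(6)

M_k = [3, 2, 4]                       # hypothetical channel counts, J = 3 nodes
J = len(M_k)
w_qq = [rng.standard_normal(m) for m in M_k]   # local filters w_qq
g_k = np.array([1.0, -0.4, 2.0])               # scalings of node k = 0, g_kk = 1

# (6): the effective network-wide filter of node k stacks g_kq * w_qq.
tilde_w_k = np.concatenate([g_k[q] * w_qq[q] for q in range(J)])

assert tilde_w_k.shape == (sum(M_k),)
# Node k's own block enters unscaled because g_kk is fixed to 1.
assert np.allclose(tilde_w_k[:M_k[0]], w_qq[0])
```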
In the sequel, we will use the following notation and definitions. In general, we will use $x^i$ to denote $x$ at iteration $i$, where $x$ can be a signal or a parameter. The $J$-channel signal $\mathbf{z}^i$ is defined as $\mathbf{z}^i = [z_1^i \cdots z_J^i]^T$. We define $\mathbf{z}_{-k}^i$ as the vector $\mathbf{z}^i$ with entry $z_k^i$ omitted. Similarly, we define $\mathbf{g}_{k,-k}$ as the vector $\mathbf{g}_k$ with entry $g_{kk}$ omitted.

At every iteration $i$ in the DANSE$_1$ algorithm, one specific node $k$ will update its local parameters $\mathbf{w}_{kk}$ and $\mathbf{g}_{k,-k}$, by solving its local node-specific MMSE problem with respect to its input signals, consisting of its own sensor signal observations $\mathbf{y}_k$ and the compressed signal observations of $\mathbf{z}_{-k}^i$, i.e., it solves

$$\left[\mathbf{w}_{kk}^{i+1},\, \mathbf{g}_{k,-k}^{i+1}\right] = \arg\min_{\mathbf{w}_{kk},\, \mathbf{g}_{k,-k}} E\left\{ \left| d_k - \mathbf{w}_{kk}^H \mathbf{y}_k - \mathbf{g}_{k,-k}^H \mathbf{z}_{-k}^i \right|^2 \right\}. \qquad (7)$$

Let $\tilde{\mathbf{y}}_k^i$ denote the stacked version of the local input signals at node $k$, i.e.,

$$\tilde{\mathbf{y}}_k^i = \begin{bmatrix} \mathbf{y}_k \\ \mathbf{z}_{-k}^i \end{bmatrix}. \qquad (8)$$

Then the solution of (7) is

$$\begin{bmatrix} \mathbf{w}_{kk}^{i+1} \\ \mathbf{g}_{k,-k}^{i+1} \end{bmatrix} = \left( R_{\tilde{y}_k \tilde{y}_k}^i \right)^{-1} \mathbf{r}_{\tilde{y}_k d_k}^i \qquad (9)$$

with

$$R_{\tilde{y}_k \tilde{y}_k}^i = E\left\{ \tilde{\mathbf{y}}_k^i\, \tilde{\mathbf{y}}_k^{iH} \right\} \qquad (10)$$
$$\mathbf{r}_{\tilde{y}_k d_k}^i = E\left\{ \tilde{\mathbf{y}}_k^i\, d_k^* \right\}. \qquad (11)$$

Since there is no decompression involved, the local estimation problems (7) have a smaller dimension than the original network-wide estimation problems (1), i.e., the $(M_k + J - 1) \times (M_k + J - 1)$ matrix $R_{\tilde{y}_k \tilde{y}_k}$ is smaller than the $M \times M$ matrix $R_{yy}$ in (2).
³It is assumed here that $M_k + J - 1 < M$, i.e., $J > 1$, $M_q \geq 1$ $\forall q$, and there is at least one node $q \neq k$ for which $M_q > 1$.
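A single local update (8)–(11) can be sketched as follows; all signals are synthetic and real valued, and the desired signal is tied to the first local sensor purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

M_k, J_minus_1, N = 4, 2, 20000
y_k = rng.standard_normal((M_k, N))              # local sensor signals
z_minus_k = rng.standard_normal((J_minus_1, N))  # compressed signals from others
d_k = y_k[0] + 0.1 * rng.standard_normal(N)      # toy desired signal

# (8): stacked local input signal.
y_tilde = np.vstack([y_k, z_minus_k])

# (10)-(11): time-averaged local statistics.
R = y_tilde @ y_tilde.T / N                      # R_{y~k y~k}
r = y_tilde @ d_k / N                            # r_{y~k d_k}

# (9): joint update of w_kk (first M_k entries) and g_{k,-k} (last J-1 entries).
sol = np.linalg.solve(R, r)
w_kk_new, g_new = sol[:M_k], sol[M_k:]

assert w_kk_new.shape == (M_k,) and g_new.shape == (J_minus_1,)
assert w_kk_new[0] > 0.9     # the update latches onto the informative channel
```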
We define a block size $B$ which denotes the number of observations that the nodes collect in between two successive node updates, i.e., in between two increments of $i$. The DANSE$_1$ algorithm now consists of the following steps:
1) Initialize: $i \leftarrow 0$, $k \leftarrow 1$.
Initialize $\mathbf{w}_{qq}^0$ and $\mathbf{g}_{q,-q}^0$ with random vectors, $\forall q \in \{1, \dots, J\}$.
2) Each node $q \in \{1, \dots, J\}$ performs the following operation cycle:
- Collect the sensor observations $\mathbf{y}_q[iB + j]$, $j = 0, \dots, B-1$.
- Compress these $M_q$-dimensional observations to
$$z_q^i[iB + j] = \mathbf{w}_{qq}^{iH}\, \mathbf{y}_q[iB + j]. \qquad (12)$$
- Broadcast the compressed observations $z_q^i[iB + j]$, $j = 0, \dots, B-1$, to the other nodes.
- Collect the $(J-1)$-dimensional data vectors $\mathbf{z}_{-q}^i[iB + j]$, $j = 0, \dots, B-1$, which are stacked versions of the compressed observations received from the other nodes.
- Update the estimates of $R_{\tilde{y}_q \tilde{y}_q}$ and $\mathbf{r}_{\tilde{y}_q d_q}$, by including the newly collected data.⁴
- Update the node-specific parameters:
$$\begin{bmatrix} \mathbf{w}_{qq}^{i+1} \\ \mathbf{g}_{q,-q}^{i+1} \end{bmatrix} = \begin{cases} \left( R_{\tilde{y}_q \tilde{y}_q}^i \right)^{-1} \mathbf{r}_{\tilde{y}_q d_q}^i & \text{if } q = k \\ \begin{bmatrix} \mathbf{w}_{qq}^{i} \\ \mathbf{g}_{q,-q}^{i} \end{bmatrix} & \text{if } q \neq k. \end{cases} \qquad (13)$$
⁴In Section V-A, we will suggest some possible strategies to estimate these parameters.
- Compute the estimate of $d_q$, $\forall j \in \{0, \dots, B-1\}$, as
$$\bar{d}_q[iB + j] = \mathbf{w}_{qq}^{(i+1)H}\, \mathbf{y}_q[iB + j] + \mathbf{g}_{q,-q}^{(i+1)H}\, \mathbf{z}_{-q}^i[iB + j]. \qquad (14)$$
3) $k \leftarrow (k \bmod J) + 1$.
4) $i \leftarrow i + 1$.
5) Return to step 2)
Remark I: Notice that the different iterations are spread out over time. Therefore, the iterative characteristics of the algorithm do not have an impact on the amount of data that is transmitted, i.e., each sample is only broadcast once, since the time index in (12) and (14) shifts together with the iteration index $i$.
Remark II: In the above algorithm description, it is not mentioned how the correlation matrix $R_{\tilde{y}_k \tilde{y}_k}$ and the correlation vector $\mathbf{r}_{\tilde{y}_k d_k}$ should be estimated. This estimation process depends on the application and the signals involved. In Section V-A, we will suggest some possible strategies to estimate $R_{\tilde{y}_k \tilde{y}_k}$ and $\mathbf{r}_{\tilde{y}_k d_k}$.
Remark III: It is noted that, when a node $k$ updates its node-specific parameters $\mathbf{w}_{kk}$ and $\mathbf{g}_{k,-k}$, the signal statistics of $z_k$ change, i.e., $z_k^i$ changes to $z_k^{i+1}$. The next node to perform an update therefore needs a sufficient number of observations of $z_k^{i+1}$ to reliably estimate the correlation coefficients involving this signal. Hence, the block length $B$ should be chosen large enough.
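The full sequential DANSE$_1$ loop can be sketched end to end as below. All sizes, mixing vectors, and noise levels are illustrative assumptions; batch sample statistics over one long segment stand in for the adaptive estimates, and real signals replace complex ones for brevity. The check compares the converged effective filters (6) against the centralized solutions (2):

```python
import numpy as np

rng = np.random.default_rng(5)

J, N = 3, 50000
M_k = [3, 4, 2]
d = rng.standard_normal(N)                          # latent single-channel target
alpha = [1.0, -0.7, 2.3]                            # d_k = alpha_k * d  (Q = 1)
steer = [rng.standard_normal((m, 1)) for m in M_k]  # unknown mixing vectors
y = [s * d + 0.5 * rng.standard_normal((m, N)) for s, m in zip(steer, M_k)]
y_all = np.vstack(y)                                # network-wide signal

# Centralized MMSE estimators (2), one per node.
R_yy = y_all @ y_all.T / N
w_cent = [np.linalg.solve(R_yy, y_all @ (a * d) / N) for a in alpha]

# DANSE_1 state: local filters w_kk and scalings g[k, q] (g_kk fixed to 1).
w_kk = [rng.standard_normal(m) for m in M_k]
g = np.eye(J)

for it in range(120):                               # sequential round-robin updates
    k = it % J
    z = np.vstack([w_kk[q] @ y[q] for q in range(J)])   # broadcast signals (12)
    others = [q for q in range(J) if q != k]
    y_tilde = np.vstack([y[k], z[others]])          # local inputs (8)
    sol = np.linalg.solve(y_tilde @ y_tilde.T / N,  # local update (9)
                          y_tilde @ (alpha[k] * d) / N)
    w_kk[k], g[k, others] = sol[:M_k[k]], sol[M_k[k]:]

# Effective network-wide filters (6) should match the centralized solutions.
w_eff = [np.concatenate([g[k, q] * w_kk[q] for q in range(J)]) for k in range(J)]
for k in range(J):
    rel = np.linalg.norm(w_eff[k] - w_cent[k]) / np.linalg.norm(w_cent[k])
    assert rel < 1e-2
```

This illustrates the compression claim directly: each node broadcasts one channel instead of its $M_k$ sensor channels, yet the converged estimators coincide with the centralized ones.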
B. Convergence and Optimality of DANSE$_1$ if $Q = 1$ and Nonzero Desired Signals

We now assume that all $d_k$ are a nonzero scaled version of the same signal $d$, i.e., $d_k = \alpha_k d$, with $\alpha_k$ a nonzero complex scalar that is unknown to the individual nodes. Formula (2) shows that in this case, all $\hat{\mathbf{w}}_k$ are parallel, i.e.,

$$\hat{\mathbf{w}}_k = \alpha_k^*\, R_{yy}^{-1}\, \mathbf{r}_{yd} \qquad (15)$$

with $\mathbf{r}_{yd} = E\{\mathbf{y}\, d^*\}$. Therefore, the set $\{\hat{\mathbf{w}}_k \mid k \in \{1, \dots, J\}\}$ belongs to the solution space used by DANSE$_1$, as specified by (6), i.e., there exist parameters $\{\mathbf{w}_{qq}\}$ and $\{\mathbf{g}_k\}$ for which $\tilde{\mathbf{w}}_k = \hat{\mathbf{w}}_k$, $\forall k$.

In the theoretical convergence analysis in the sequel, we assume that the correlation matrices $R_{\tilde{y}_k \tilde{y}_k}$ and the correlation vectors $\mathbf{r}_{\tilde{y}_k d_k}$, $\forall k$, are perfectly estimated, i.e., as if they are computed over an infinite observation window. Under this assumption, the following theorem guarantees convergence and optimality of the DANSE$_1$ algorithm.

Theorem III.1: If the sensor signal correlation matrix $R_{yy}$ has full rank, and if $d_k = \alpha_k d$, $\forall k \in \{1, \dots, J\}$, with $d$ a complex valued single-channel signal and $\alpha_k \in \mathbb{C} \setminus \{0\}$, then the DANSE$_1$ algorithm converges for any initialization of its parameters to the MMSE solution (2) for all $k$.
Before proving this theorem, we introduce some additional notation. The vector $\mathbf{w}$ (without subscript) denotes the stacked vector of all $\mathbf{w}_{kk}$ vectors, i.e.,

$$\mathbf{w} = \begin{bmatrix} \mathbf{w}_{11} \\ \vdots \\ \mathbf{w}_{JJ} \end{bmatrix}. \qquad (16)$$

We also define the following MSE cost functions corresponding to node $k$:

$$J_k(\mathbf{w}_k) = E\left\{ \left| d_k - \mathbf{w}_k^H \mathbf{y} \right|^2 \right\} \qquad (17)$$
$$\tilde{J}_k(\mathbf{w}, \mathbf{g}_k) = J_k(\tilde{\mathbf{w}}_k) \qquad (18)$$

where $\tilde{\mathbf{w}}_k$ is defined from $\mathbf{w}$ and $\mathbf{g}_k$ as in (6). Notice that $\mathbf{g}_k$ contains the entry $g_{kk}$, which is a fictitious variable that is never actually computed by the algorithm. We define $F_k$ as the function that generates $[\mathbf{w}_{kk}^{(i+1)T}\ \mathbf{g}_{k,-k}^{(i+1)T}]^T$ according to (9), i.e.,

$$F_k(\mathbf{w}) = \left( R_{\tilde{y}_k \tilde{y}_k} \right)^{-1} \mathbf{r}_{\tilde{y}_k d_k} \qquad (19)$$

where $\tilde{\mathbf{y}}_k$ is built from $\mathbf{y}_k$ and $\mathbf{z}_{-k}$ as in (8), with $z_q = \mathbf{w}_{qq}^H \mathbf{y}_q$. It is noted that the right-hand side of (19) depends on all entries of the argument $\mathbf{w}$ through the signal $\mathbf{z}_{-k}$, which is not explicitly revealed in this expression.
The proof of Theorem III.1 provided here differs from the proof in [10], where a scheme similar to DANSE$_1$ with $J = 2$ has been proved to converge to the optimal solution. Unlike the proof in [10], our proof allows for a generalization to the case with $K > 1$, it allows $J > 2$, and it provides more insight in the convergence properties of the algorithm. We first prove the convergence statement of Theorem III.1, and then the optimality statement.
Proof of Convergence: We prove that the sequence $\{\mathbf{w}^i\}_{i \in \mathbb{N}}$ and the sequences $\{\mathbf{g}_k^i\}_{i \in \mathbb{N}}$, $\forall k$, converge to a limit point $\mathbf{w}^\infty$ and $\mathbf{g}_k^\infty$, respectively. When node $k$ performs an update of its variables $\mathbf{w}_{kk}$ and $\mathbf{g}_{k,-k}$ at iteration $i$, these are replaced by the solution of the local MMSE problem (7), repeated here for convenience:

$$\left[\mathbf{w}_{kk}^{i+1},\, \mathbf{g}_{k,-k}^{i+1}\right] = \arg\min_{\mathbf{w}_{kk},\, \mathbf{g}_{k,-k}} E\left\{ \left| d_k - \mathbf{w}_{kk}^H \mathbf{y}_k - \mathbf{g}_{k,-k}^H \mathbf{z}_{-k}^i \right|^2 \right\}. \qquad (20)$$

If another node $q$ were to optimize the variables $\mathbf{w}_{kk}$ and $\mathbf{g}_{q,-k}$ with respect to its own node-specific estimation problem, it would solve the problem

$$\left[\mathbf{w}_{kk}',\, \mathbf{g}_{q,-k}'\right] = \arg\min_{\mathbf{w}_{kk},\, \mathbf{g}_{q,-k}} E\left\{ \left| d_q - \mathbf{w}_{kk}^H \mathbf{y}_k - \mathbf{g}_{q,-k}^H \mathbf{z}_{-k}^i \right|^2 \right\}. \qquad (21)$$

Since $d_q = (\alpha_q/\alpha_k)\, d_k$ with $\alpha_q/\alpha_k \neq 0$, the solutions of (20) and (21) are identical up to the scalar $\alpha_q^*/\alpha_k^*$. This means that an update of $\mathbf{w}_{kk}$ and $\mathbf{g}_{k,-k}$ at node $k$, which is an optimization leading to a decrease of $\tilde{J}_k$, will also lead to a decrease of $\tilde{J}_q$ for any $q$ if node $q$ were allowed to also perform a responding optimization of its $\mathbf{g}_{q,-q}$. This shows that for any $k$ (independent of the selection of the node that actually performs an update at iteration $i$)

$$\tilde{J}_k\left(\mathbf{w}^{i+1}, \mathbf{g}_k^{i+1}\right) \leq \tilde{J}_k\left(\mathbf{w}^{i}, \mathbf{g}_k^{i}\right). \qquad (22)$$

Since all $\tilde{J}_k$ have a lower bound, each sequence $\{\tilde{J}_k(\mathbf{w}^i, \mathbf{g}_k^i)\}_{i \in \mathbb{N}}$ converges to a limit $\tilde{J}_k^\infty$, i.e.,

$$\lim_{i \to \infty} \tilde{J}_k\left(\mathbf{w}^{i}, \mathbf{g}_k^{i}\right) = \tilde{J}_k^\infty. \qquad (23)$$

If we again assume that node $k$ performs an update at iteration $i$, then because of the strict convexity of the cost function in (20), the following expression holds:

$$\lim_{i \to \infty} \left( \begin{bmatrix} \mathbf{w}_{kk}^{i+1} \\ \mathbf{g}_{k,-k}^{i+1} \end{bmatrix} - \beta^i \begin{bmatrix} \mathbf{w}_{kk}^{i} \\ \mathbf{g}_{k,-k}^{i} \end{bmatrix} \right) = \mathbf{0} \qquad (24)$$

with

$$\beta^i \in \mathbb{C} \setminus \{0\}. \qquad (25)$$

This shows that, after convergence of the sequences $\{\tilde{J}_k(\mathbf{w}^i, \mathbf{g}_k^i)\}_{i \in \mathbb{N}}$, $\forall k$, any update of a $\mathbf{w}_{kk}$ must correspond to a scaling. Notice however that

$$\left[ F_q\!\left( \begin{bmatrix} \mathbf{w}_{11} \\ \vdots \\ \beta\, \mathbf{w}_{kk} \\ \vdots \\ \mathbf{w}_{JJ} \end{bmatrix} \right) \right]_{1:M_q} = \left[ F_q\!\left( \begin{bmatrix} \mathbf{w}_{11} \\ \vdots \\ \mathbf{w}_{kk} \\ \vdots \\ \mathbf{w}_{JJ} \end{bmatrix} \right) \right]_{1:M_q}, \quad \forall\, \beta \in \mathbb{C} \setminus \{0\} \qquad (26)$$

where $[\cdot]_{1:M_q}$ selects the $\mathbf{w}_{qq}$-part of the update, i.e., a scaling of a $\mathbf{w}_{kk}$ in node $k$ does not change the update of $\mathbf{w}_{qq}$ in node $q$, since the scaling is implicitly compensated in $z_k$ by the parameter $g_{qk}$. This proves convergence of the sequence $\{\mathbf{w}^i\}_{i \in \mathbb{N}}$ to a limit point $\mathbf{w}^\infty$, and therefore also the sequences $\{\mathbf{g}_k^i\}_{i \in \mathbb{N}}$ must converge to a limit point $\mathbf{g}_k^\infty$, $\forall k$. Notice that after convergence, based on what was stated earlier,

$$\begin{bmatrix} \mathbf{w}_{kk}^{\infty} \\ \mathbf{g}_{k,-k}^{\infty} \end{bmatrix} = F_k(\mathbf{w}^\infty) \qquad (27)$$

or equivalently

$$\left[\mathbf{w}_{kk}^{\infty},\, \mathbf{g}_{k,-k}^{\infty}\right] = \arg\min_{\mathbf{w}_{kk},\, \mathbf{g}_{k,-k}} E\left\{ \left| d_k - \mathbf{w}_{kk}^H \mathbf{y}_k - \mathbf{g}_{k,-k}^H \mathbf{z}_{-k}^\infty \right|^2 \right\}. \qquad (28)$$
From the proof of convergence, one can also conclude that
convergence of the cost functions will be monotonic, when
sampled at the iteration steps in which node updates its
parameters. Indeed, whenever node optimizes its own local
MMSE problem, it also optimizes the corresponding MMSE
problem in node , at least when the latter is allowed to perform
a responding update of its parameter . This shows that the
algorithm is at least as fast as a centralized equivalent
that would use an alternating optimization (AO) technique
[16], which is often referred to as the nonlinear Gauss-Seidel
algorithm [17], with partitioning following directly from the
parameters and for each node.
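The monotone decrease under such block-wise exact minimization can be illustrated with a small numerical sketch (hypothetical data; the blocks A1 and A2 play the role of the per-node parameter partitions, which is an assumption for illustration only): alternating exact least-squares updates of two parameter blocks never increase the shared quadratic cost and approach the joint optimum.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
A1 = rng.standard_normal((N, 4))   # regressors "owned" by the first block
A2 = rng.standard_normal((N, 3))   # regressors "owned" by the second block
d = rng.standard_normal(N)
w1, w2 = np.zeros(4), np.zeros(3)

costs = []
for _ in range(6):
    # update block 1 by exact least squares, with block 2 held fixed
    w1, *_ = np.linalg.lstsq(A1, d - A2 @ w2, rcond=None)
    costs.append(np.sum((d - A1 @ w1 - A2 @ w2) ** 2))
    # update block 2 by exact least squares, with block 1 held fixed
    w2, *_ = np.linalg.lstsq(A2, d - A1 @ w1, rcond=None)
    costs.append(np.sum((d - A1 @ w1 - A2 @ w2) ** 2))

# jointly optimal least-squares cost, for comparison
A = np.column_stack([A1, A2])
w_opt, *_ = np.linalg.lstsq(A, d, rcond=None)
c_opt = np.sum((d - A @ w_opt) ** 2)
```

Each update solves a convex subproblem exactly, so the cost sequence is nonincreasing, mirroring the argument around (22).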
Proof of Optimality: We now prove that is the solution
of (1) for every node , which is equivalent to proving that the
gradient of is zero when evaluated at equilibrium, i.e.
(29)
Because the solution of (20) sets the partial gradient of with
respect to to zero, we find that
(30)
Since , we can show that
(31)
Combining (30) and (31) yields
(32)
Notice that (27) is equivalent to
(33)
Substituting (33) in (32) yields
(34)
which is equivalent to (29). This proves the theorem.
IV. DANSE WITH -CHANNEL BROADCAST SIGNALS
A. Algorithm
In the algorithm, each node broadcasts
-component compressed sensor signal obser-
vations to the other nodes. This compresses the data to be
sent by node by a factor of . We as-
sume that each node estimates a -channel desired signal
. Assuming that the desired signals
share a common -dimensional latent signal subspace, we
will show in Section IV-B that achieves the optimal
estimators if is chosen equal to . Notice that the actual
signal(s) of interest can be a subset of the vector , and the
other entries should then be seen as auxiliary channels to fully
capture the latent signal subspace, as explained in Section II-B.
Generally, these auxiliary channels are obtained by choosing
extra reference sensors at node .
Again, we use a linear estimator to estimate as
. The objective for node is to
find the linear MMSE estimator
(35)
The solution of (35) is
(36)
with . Again, we define a partitioning of the
estimator as with denoting
the submatrix of that is applied to . We wish
to obtain (36) without the need for each node to broadcast all
components of the observations. Instead each node
will broadcast observations of the -channel compressed signal
. Since the channels of will be highly corre-
lated, further joint compression is possible, but we will not take
this into consideration throughout this paper.
A node can transform the observations of that it receives
from node by a transformation matrix . Again,
it is noted that does not decompress the observations of
the signal , but makes new linear combinations of their
5284 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 10, OCTOBER 2010
components. The parametrization of the effectively applied
at node is then
(37)
which is a generalization of (6). Here, node can only optimize
the parameters and . We set
with denoting the identity matrix.
The -channel signal is a stacked ver-
sion of all the broadcast signals. Similarly to the notation in
Section III, we define the signal as the signal with
omitted, and we define as the matrix with the subma-
trix omitted. The MMSE problem that is solved at node ,
at iteration , is now
(38)
The solution of (38) is
(39)
with defined as in (10) and with
(40)
The algorithm consists of the following steps:
1) Initialize: , .
Initialize and with random matrices, .
2) Each node performs the following operation cycle:
Collect the sensor observations ,
.
Compress these -dimensional observations to
-dimensional vectors
(41)
Broadcast the compressed observations ,
, to the other nodes.
Collect the -dimensional data vectors
, , which are stacked
versions of the compressed observations received from
the other nodes.
Update the estimates of and , by including
the newly collected data.
Update the node-specific parameters:
if
if (42)
Compute the estimate of , ,
as
(43)
3) .
4) .
5) Return to step 2)
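The operation cycle above can be condensed into a simplified batch sketch (all variable names are hypothetical, the recursive estimation of the statistics is replaced by exact least squares on the full data record, and a well-conditioned random scenario satisfying the conditions of Theorem IV.1 is assumed): each node compresses its M-channel observations to Q channels, and the updating node solves its local LS problem on its own sensors stacked with the received compressed signals.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, Q, N = 3, 5, 2, 4000            # nodes, sensors per node, latent dim, samples
s = rng.standard_normal((Q, N))       # common Q-channel latent signal
d = [rng.standard_normal((Q, Q)) @ s for _ in range(K)]      # node-specific targets
y = [rng.standard_normal((M, Q)) @ s + 0.5 * rng.standard_normal((M, N))
     for _ in range(K)]               # local multichannel sensor observations
W = [rng.standard_normal((M, Q)) for _ in range(K)]          # local compressors

def local_solve(k):
    # stack node k's own sensors with the compressed broadcasts of the others
    z = np.vstack([W[j].T @ y[j] for j in range(K) if j != k])
    X = np.vstack([y[k], z])
    G, *_ = np.linalg.lstsq(X.T, d[k].T, rcond=None)
    return G, np.mean((d[k] - G.T @ X) ** 2)

for it in range(6 * K):               # sequential round-robin node updating
    k = it % K
    G, _ = local_solve(k)
    W[k] = G[:M, :]                   # node k refreshes its local compressor

# centralized LS estimator for node 1's problem, using all K*M sensor channels
Yall = np.vstack(y)
Wc, *_ = np.linalg.lstsq(Yall.T, d[0].T, rcond=None)
c_central = np.mean((d[0] - Wc.T @ Yall) ** 2)
c_danse = local_solve(0)[1]
```

In this sketch each broadcast carries Q channels instead of M, yet after a few update rounds the local cost matches the centralized LS cost, in line with the convergence claim for this setting.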
is a straightforward generalization of the
algorithm as explained in Section III-A, where all vector-vari-
ables are replaced by their matrix equivalent. Similarly, expres-
sions (16)–(19) can be straightforwardly generalized to their
matrix equivalent.
B. Convergence and Optimality of if and
Full Rank
We now assume that , , with a
matrix of rank and a complex valued -channel signal.
This means that all desired signals share the same -dimen-
sional latent signal subspace (i.e., ). Formula (36) shows
that in this case all have the same column space, i.e.
(44)
with . Therefore, the set be-
longs to the solution space used by , as specified by
(37), i.e., . The following
theorem generalizes Theorem III.1.
Theorem IV.1: If the sensor signal correlation matrix
has full rank, and if , , with a complex
valued -channel signal and a matrix of rank ,
then the algorithm converges for any initialization of
its parameters to the MMSE solution (36) for all .
Proof: The proof of Theorem III.1 can straightforwardly
be generalized to prove Theorem IV.1, by replacing every
and by its matrix version and .
In practice, the matrices should be well-conditioned to
obtain the optimal estimators, which is reflected in Theorem
IV.1 by the condition that has full rank. If the -channel
desired signal is defined as the target signal in reference
sensors at node , this matrix can be ill-conditioned if the refer-
ence sensors are close to each other. This problem is investigated
in [9], where the DANSE algorithm is used for noise reduction
in acoustic sensor networks, and a solution is proposed to tackle
this problem.
C. DANSE Under Rank Deficiency
Until now, we have avoided the case where does not
have full rank or when the parameter is overestimated, i.e.,
. Both cases can result in broadcast data for which the
correlation matrix is rank deficient.5 In this case, (38) becomes
ill-posed since singular correlation matrices are involved. The
algorithm can cope with these situations by adding
5In the case where , (44) has multiple solutions for since
, . Therefore, the correlation matrix of the broadcast
signal becomes singular, once the submatrix reaches this
rank deficiency.
a minimum-norm constraint to the local MMSE problems (38),
i.e., using the pseudo-inverse instead of a matrix inverse in the
computation of the solution of (38) [15]. Extensive simulations
have shown that with this modification, the algorithm
still converges to an MMSE solution for rank deficient estima-
tion problems (see Section VI).
However, if the matrix does not have full rank, the so-
lution of (1) is not unique. Simulations have shown that the
solutions obtained by the algorithm, although
leading to a minimal MSE cost at node , are generally different
from the solutions provided by the centralized minimum norm
version, i.e.
(45)
where superscript denotes the pseudoinverse.
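The role of the pseudoinverse here can be made concrete with a small sketch (hypothetical data): for a rank-deficient regressor, many estimators attain the same minimal residual, and the pseudoinverse selects the one with minimum norm, as used in (45).

```python
import numpy as np

rng = np.random.default_rng(1)
# regressor whose 6 channels span only a 3-dimensional subspace (rank deficient)
A = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 6))
d = rng.standard_normal(100)
w_mn = np.linalg.pinv(A) @ d          # minimum-norm least-squares solution
# adding any null-space direction of A leaves the residual unchanged
_, _, Vt = np.linalg.svd(A)
w_alt = w_mn + Vt[-1]                 # same fit, strictly larger norm
r_mn = np.linalg.norm(d - A @ w_mn)
r_alt = np.linalg.norm(d - A @ w_alt)
```

Both solutions yield the same fit, which illustrates why different nodes (or a centralized solver) may settle on different estimators with the same minimal MSE.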
V. IMPLEMENTATION ASPECTS
A. Estimation of the Signal Statistics
In the theoretical analysis of the algorithm, it is as-
sumed that the second order signal statistics, which are needed
to solve the MMSE problem (38) are perfectly known. How-
ever, in a practical application, the correlation matrices
and have to be estimated, based on the collected signal
observations. In this section, we will describe some strategies to
estimate these quantities.
Estimation of signal correlation matrices is typically done by
time averaging. This means that some assumptions are made on
short-term ergodicity and stationarity of the signals involved.
However, this stationarity assumption is not necessarily strict.
Even when the signals involved are nonstationary (such as in
speech processing), the algorithm can provide good
estimators. By using long-term correlation matrices, the influ-
ence of rapidly changing temporal statistics is smoothed out,
yielding estimators that mainly exploit the spatial coherence
between the sensors. Since spatial coherence typically changes
slowly, the algorithm is able to provide good estima-
tors, even when the signals themselves are highly nonstationary
(this is e.g., demonstrated by the multichannel speech enhance-
ment experiments in [9]).
We let denote the estimate of at time . Signal
correlation matrices are often estimated in practice by means of
a forgetting factor , i.e.
(46)
Notice that in the algorithm, the statistics change
every time a node updates its parameters. Therefore, (46) is not
suited to compute and , since it uses an infi-
nite time window. A better alternative is a simple time averaging
in a finite observation window, i.e.
(47)
where is the length of the observation window. The procedure
(46) puts more emphasis on the most recent samples, whereas
(47) applies an equal weight to all past samples in the obser-
vation window. The procedure (47) can be implemented recur-
sively by means of an updating and a downdating term, i.e.
(48)
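A minimal sketch of this finite-window estimator (hypothetical dimensions and data; the update/downdate recursion corresponds to (48) and is verified against the direct average of (47)):

```python
import numpy as np

rng = np.random.default_rng(2)
M, L, T = 4, 50, 200                  # channels, window length, total samples
y = rng.standard_normal((M, T))

# initial estimate over the first full window of L samples
R = sum(np.outer(y[:, t], y[:, t]) for t in range(L)) / L
for t in range(L, T):
    # recursive form: add the newest outer product (update),
    # drop the oldest one (downdate)
    R += (np.outer(y[:, t], y[:, t]) - np.outer(y[:, t - L], y[:, t - L])) / L

# direct finite-window average over the last L samples, for comparison
R_direct = y[:, T - L:] @ y[:, T - L:].T / L
```

A forgetting-factor variant in the style of (46) would instead compute R = lam * R + (1 - lam) * np.outer(y[:, t], y[:, t]), which weights recent samples more but never fully discards old ones.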
Notice that the window length introduces a trade-off between
tracking performance and estimation performance. Indeed, to
have a fast tracking, the statistics must be estimated from short
signal segments, yielding larger estimation errors in the correla-
tion matrices that are used to compute the estimators at the dif-
ferent nodes. However, as will be demonstrated in Section VI-B,
the algorithm is more robust to these errors, com-
pared to the equivalent centralized algorithm, due to the fact
that uses correlation matrices with smaller dimen-
sions than the network-wide estimation problem.
The estimation of is less straightforward since the
signal cannot be observed directly. However, depending on
the application and the signals involved, some strategies can be
developed to estimate , as explained in the following two
examples.
If the transmitting sources are controlled by the application
itself, as it is the case in a communications scheme, the source
signals that define the different channels in can be manipu-
lated directly. At periodic intervals, a deterministic training se-
quence can be broadcast by the transmitters. If the nodes have
knowledge about these training sequences, they can use this to
compute in a similar way as in (48), during the broad-
cast of these training sequences. After the broadcast, the esti-
mate is fixed until new training sequences are broadcast.
A different strategy can be applied if the desired signal has
an ON-OFF behavior.6 Assume that the sensor signals in consist
of a desired component and an additive noise component
, i.e., , where has an ON-OFF behavior, and where
then . In many practical applications, it can
also be assumed that and are independent, and therefore7
(49)
If there is a detection mechanism available that detects whether
the signal is present or not, one can estimate in
time segments where only noise is observed (“noise-only seg-
ments”). Since the noise is uncorrelated to the desired compo-
nent , we find that
(50)
with
(51)
where is the desired component in the signal . The se-
lection matrix is used to select the first columns corre-
6This is often used in speech enhancement applications, since a speech signal
typically contains a lot of silent pauses in between words or sentences.
7For the sake of an easy exposition, we assume that the signals and have
zero mean.
sponding to . Define the noise correlation
matrix
(52)
where denotes the noise component in the signal . With
(50), and similarly to (49), we readily find that
(53)
Using (53), one can compute as the difference between
and , where the latter is computed as in (48),
during noise-only periods.
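This subtraction strategy can be sketched as follows (a hypothetical single-source scenario with an ideal detection mechanism; the mixing vector and noise level are illustrative assumptions): the sensor correlation matrix is estimated during active segments, the noise correlation matrix during noise-only segments, and their difference recovers the desired-signal statistics.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 4, 20000
a = np.array([1.0, 0.5, -0.3, 0.8])   # hypothetical mixing (steering) vector
s = rng.standard_normal(N)
s[: N // 2] = 0.0                     # desired source is OFF in the first half
y = np.outer(a, s) + 0.5 * rng.standard_normal((M, N))
active = s != 0.0                     # ideal ON-OFF detection (perfect VAD)
R_yy = y[:, active] @ y[:, active].T / active.sum()       # signal-plus-noise
R_nn = y[:, ~active] @ y[:, ~active].T / (~active).sum()  # noise only
R_dd = R_yy - R_nn                    # estimated desired-signal correlation
R_true = np.outer(a, a) * s[active].var()
```

Because the desired component and the noise are uncorrelated, the subtraction isolates the rank-one desired-signal statistics up to estimation error.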
Notice that, even if the target signal does not have this ON-OFF
behavior, the above strategy can be used in a semi-adaptive con-
text, i.e., where the target signal statistics may change but the
noise statistics are static and a priori known (or vice versa). In-
deed, if is known, then (53) can be used to compute
the required statistics. Notice that in (53) is a compressed
version of , i.e., it depends on the current parameters
in . Therefore, each node has to broadcast the entries of
, which are needed in the other nodes to compress the cor-
responding submatrices in . Since these values change
only once for each observations that are collected by the
sensors, the resulting increase in bandwidth is negligible com-
pared to the transmission of the samples of .
B. Computational Complexity
The estimation of the correlation matrices and ,
and the inversion of the former, are the most computationally
expensive steps of the algorithm. From (48) it fol-
lows that an update of at node , has a computational
complexity of
(54)
i.e., it is quadratic in the number of nodes , the number of
channels in the broadcast signals, and the number of channels
of the signal . If node updates its parameters and
according to (39), it performs a matrix inversion, which is
computationally more expensive than (54). However, instead of
computing this inversion, node can directly update the inverse
of at each time by means of the matrix inversion
lemma [15], i.e.
(55)
(56)
This update also has computational complexity (54), and
therefore this is the overall complexity for a single node in the
algorithm.
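The recursive inverse update based on the matrix inversion lemma can be sketched as follows (hypothetical dimensions and forgetting factor; the Sherman-Morrison identity replaces the explicit inversion in (55)-(56)):

```python
import numpy as np

rng = np.random.default_rng(4)
M = 5
lam = 0.95                            # forgetting factor (assumed value)
R = np.eye(M)
P = np.linalg.inv(R)                  # running estimate of the inverse of R
for _ in range(100):
    v = rng.standard_normal((M, 1))   # new observation vector
    R = lam * R + v @ v.T             # exponentially weighted correlation update
    # Sherman-Morrison: inverse of (lam * R_old + v v^T) without a full inversion
    P = P / lam
    P = P - (P @ v) @ (v.T @ P) / (1.0 + (v.T @ P @ v).item())
```

Each rank-one inverse update costs O(M^2) instead of the O(M^3) of a fresh inversion, which is why the per-node complexity stays at the order of (54).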
VI. NUMERICAL SIMULATIONS
In this section, we provide simulation results to demonstrate
the behavior of the algorithm. In Section VI-A, we
perform batch mode simulations where the required statistics
are computed over the full length signals, and where the ’s are
available8 to compute . In the batch version of ,
all iterations are performed on the same set of signal observa-
tions. In Section VI-B, a more practical scenario with moving
sources is considered. The algorithm adapts to the
changes in the scenario, and each set of observations is only
broadcast once, i.e., subsequent iterations are performed over
different observation sets. Furthermore, a practical estimation
of the correlation matrices is used, where the ’s are assumed
to be unavailable.
A. Batch Mode Simulations
In this section, we simulate the algorithm in batch
mode. This means that all iterations are performed on the full
signal length. The network consists of four nodes , each
having 10 sensors . The dimension of the latent signal
subspace defined by is . All 3 channels of are uni-
formly distributed random processes on the interval [ ]
from which samples are generated. The coefficients
in are generated by a uniform random process on the unit in-
terval. The sensor signals in consist of the different random
mixtures of the latent -channel signal to which zero-mean
white noise is added with half the power of the channels of .
The initial values of all and are taken from a uniform
random distribution on the unit interval.
The batch mode performance of the algorithm as
well as the algorithm is simulated for this particular
scenario. All evaluations of the MSE cost functions are per-
formed on the equivalent least-squares (LS) cost functions, i.e.
(57)
Also, the correlation matrices are replaced by their least squares
equivalent, i.e., is replaced by where denotes
the sample matrix that contains samples of the variable
in its columns.
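The equivalence between the MMSE formulation with sample correlation matrices and the LS formulation used here can be verified with a short sketch (hypothetical data): solving the normal equations built from the sample statistics yields the same estimator as a direct least-squares fit on the sample matrix.

```python
import numpy as np

rng = np.random.default_rng(6)
M, N = 6, 1000
Y = rng.standard_normal((M, N))       # sample matrix: one channel per row
d = rng.standard_normal(N)            # samples of the desired signal
R_hat = Y @ Y.T / N                   # least-squares equivalent of R_yy
r_hat = Y @ d / N                     # least-squares equivalent of r_yd
w_wiener = np.linalg.solve(R_hat, r_hat)
# identical to solving the over-determined system Y^T w = d in the LS sense
w_ls, *_ = np.linalg.lstsq(Y.T, d, rcond=None)
```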
The results are illustrated in Fig. 4, showing the LS cost of
node 1 versus the iteration index . Node 1 is the first node
that performs an update. It is observed that the al-
gorithm converges to the optimal linear LS solution, whereas
the algorithm does not since in this case.
Downsampling the curve corresponding to by a factor
, keeping only the iterations in which node 1 updates its
parameters, results in a monotonically decreasing cost. This is
because of expression (22), showing that the cost indeed mono-
tonically decreases whenever a node optimizes its parame-
ters. If the curve corresponding to is downsampled
8This is similar to using a priori known training sequences.
Fig. 4. LS error of node 1 versus iteration for four different scenarios in a
network with nodes. Each node has 10 sensors.
Fig. 5. LS error of node 1 versus iteration for networks with ,
and nodes respectively. Each node has 10 sensors.
with the same factor, we do not obtain a monotonically de-
creasing cost, since expression (22) is not valid anymore for this
case.
In Fig. 5, we vary the number of nodes , keeping all other
parameters unchanged. All nodes again have 10 sensors. Not
surprisingly, the convergence time of increases lin-
early with since the effective number of updates per time unit
in node 1 is reduced. As soon as each node has updated its pa-
rameters three times, the cost is almost at its minimum at each
node.
In Fig. 6(a), we increase the value of while keeping
. Notice that this corresponds to the case where is
overestimated and hence communication bandwidth is used
inefficiently. The estimation problem becomes rank deficient in
this case, and so the algorithm should be modified by replacing
matrix inversions by pseudoinversions (see Section IV-C). The
algorithm still converges, and the optimal LS cost is again
reached after three iterations per node when is overesti-
mated. In Fig. 6(b), we increase the value of together with
, keeping . This is again observed to have a negligible
effect on convergence time.
As a general conclusion, we can state that for all settings
of the parameters , , , the algorithm approxi-
mately achieves convergence as soon as each node has updated
its parameters three times.
Simulation results with speech signals are provided in a
follow-up paper [9]. In this paper, a distributed speech enhance-
ment algorithm based on and its variations, is tested
in a simulated acoustic sensor network scenario.
B. Adaptive Implementation
In this section, we show simulation results of a practical
implementation of the algorithm in a scenario with
moving sources. The main difference with the batch mode
simulations is that subsequent iterations are now performed
on different signal segments, i.e., the same data is never used
twice. This yields larger estimation errors, since shorter signal
segments are used to estimate the statistics of the input signals.
Furthermore, we will use a practical estimation procedure to
estimate the correlation matrices and , yielding
larger estimation errors.
The scenario is depicted in Fig. 7. The network contains
nodes . Each node has a reference sensor at the node itself,
and can collect observations of five additional sensors that
are uniformly distributed within a 1.6-m radius around the node.
Eight localized white Gaussian noise sources are present.
Two target sources move back and forth over the indicated
straight lines at a speed of 1 m/s, and halt for 2 s at the end points
of these lines. The first source (moving on the vertical line)
transmits a low-pass filtered white noise signal with a cut-off
frequency of 1600 Hz. The other source transmits a band-pass
filtered white noise signal in the frequency range from 1600 to
3200 Hz. Both target sources have an ON-OFF behavior with a
period of 0.2 s and both are active 66% of the time. It is assumed
that at each time , all nodes can detect whether the sources are
active or not. The time between two consecutive updates is 0.4 s,
which corresponds to two ON-OFF cycles of the target sources.
This means that, every 0.4 s, the iteration index changes to
. The sensors observe their signals at a sampling frequency
of .
The target source signals have half the power of the noise
sources. In addition to the spatially correlated noise, indepen-
dent white Gaussian sensor noise is added to each sensor signal.
This noise component is 10% of the power of the localized
noise signals. The individual signals originating from the target
sources and the noise sources that are collected by a specific
sensor are attenuated in power and summed. The attenuation
factor of the signal power is , where denotes the distance
between the source and the sensor. We assume that there is no
time delay in the transmission path between the sources and the
Fig. 6. LS error of node 1 versus iteration in a network with nodes. Each node has 10 sensors. (a) Different values of , keeping and (b) different
values of .
Fig. 7. Description of the simulated scenario. The network contains four nodes
, each node collecting observations in a cluster of six sensors . One
sensor of each cluster is positioned at the node itself. Two target sources are
moving over the indicated straight lines. Eight noise sources are present .
sensors.9 Each node collects six sensor signal observations, and
uses five differently delayed versions of each of these signals in
its estimation process to exploit the temporal correlation in the
target source signals. This means that .
We let denote the signal that is collected at the reference
sensor of node . It consists of an unknown mixture of the
two target source signals, and a noise component , i.e.
(58)
9Since the time delays are the same for all sensors, the spatial information
is purely energy based in this case. Therefore, the nodes cannot perform any
beamforming towards specific locations by exploiting different delay paths be-
tween sources and sensors.
where is the two-channel signal containing the two target
source signals, and where denotes an unknown mixture
vector. The goal for node is to estimate the signal , i.e., the
target source component in its reference sensor. Since ,
the algorithm is used, and therefore an auxiliary de-
sired channel is used to obtain a two-channel desired signal at
every sensor. The auxiliary channel of consists of the target
source component in the signal that is collected by an-
other sensor of node . This component consists of another un-
known mixture of the target sources, so that the conditions of
Theorem IV.1 are satisfied.
The correlation matrix is computed according to
(53). The estimates and are computed sim-
ilarly to (48) with a window length of and
, respectively, which matches the time between two con-
secutive updates.
We will use the signal-to-error ratio (SER) as a measure to as-
sess the performance of the estimators. The instantaneous SER
for node at time and iteration is computed over 3200 sam-
ples, and is defined as
(59)
where denotes the first column of the estimator , as
defined in (37). Notice that this is the estimator that is of ac-
tual interest, since it estimates the desired component in
the reference sensor. The other column of is viewed as an
auxiliary estimator that is used for the generation of the second
channel of the broadcast signal .
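A minimal sketch of this SER computation (hypothetical signals; the desired component and its estimate are simulated directly, and the window length matches the 3200 samples used here):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 3200                              # samples per SER evaluation window
d = rng.standard_normal(N)            # desired component at the reference sensor
d_hat = d + 0.1 * rng.standard_normal(N)   # estimate with a small residual error
# SER in dB: desired-signal power over estimation-error power
ser_db = 10.0 * np.log10(np.sum(d ** 2) / np.sum((d - d_hat) ** 2))
```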
Fig. 8 shows the SER of the four nodes at different time in-
stants. Dashed vertical lines are plotted to indicate the points in
time where both sources start moving, and full vertical lines in-
dicate when they stop moving. The sources stand still in the time
intervals [0–4] s, [10–12] s, and [18–20] s. The performance is
Fig. 8. SER versus time at the four nodes depicted in Fig. 7. The centralized version is added as a reference. Window lengths are and .
compared to the centralized version, in which all sensor signals
are centralized in a single fusion center that computes the op-
timal estimators according to (2).
In the first 4 s, both sources stand still. The algo-
rithm needs some time to reach a good estimator at each node
(about 2 s), whereas the centralized algorithm converges much
faster. This is because the algorithm updates its nodes
one at a time, with 0.4 s in between two consecutive updates.
The centralized algorithm on the other hand, can update its es-
timators every time a new sample is collected. After a number
of iterations however, the algorithm converges to the
optimal estimators.
Not surprisingly, it is observed that the centralized algorithm
has better tracking capabilities than the algorithm.
This is again a consequence of the fact that the centralized
version computes a new estimator each time a new sample is
collected, yielding a much faster convergence. However, the
algorithm is able to react to changes in the scenario
and always regains optimality after a number of iterations.
Notice that, once the algorithm has converged, it
outperforms the centralized algorithm. This can be explained
by the fact that the algorithm uses correlation ma-
trices with smaller dimension compared to the correlation ma-
trices that are used by the centralized algorithm. Small ma-
trices are generally better conditioned and have a smaller es-
timation error than larger matrices. This performance increase
of compared to its centralized version is observed
to become more significant when the number of sensors in-
creases, yielding larger matrices, or when the window length
decreases, yielding larger estimation errors in the correla-
tion matrices. Fig. 9 shows the performance of and
its centralized version, now with window lengths
and , i.e., roughly half the sizes of the first ex-
periment. It is observed that the estimation performance of the
centralized algorithm significantly decreases compared to the
first experiment, whereas the algorithm is less influ-
enced by the short window length. This observation demon-
strates that is more robust to estimation errors in the
correlation matrices compared to its centralized equivalent. No-
tice that converges much faster in the second exper-
iment, since the time between two consecutive updates is now
0.2 s instead of 0.4 s, due to the shorter window lengths. As al-
ready mentioned in Section V, this faster tracking comes with
the drawback that the estimation performance decreases due to
larger errors in the estimation of the correlation matrices.
In [14], a modified algorithm is studied, where
an improved tracking performance is obtained, by letting nodes
update simultaneously.
VII. CONCLUSION
In this paper, we have introduced a distributed adaptive al-
gorithm for linear MMSE estimation of node-spe-
cific signals in a fully connected broadcasting sensor network,
Fig. 9. SER versus time at the four nodes depicted in Fig. 7. The centralized version is added as a reference. Window lengths are and .
where each sensor node collects multichannel sensor signal ob-
servations. The algorithm significantly compresses the data to
be broadcast, and the computational load is shared amongst
the nodes. It is shown that, if the node-specific desired sig-
nals share a common low-dimensional latent signal subspace,
converges and provides the optimal linear MMSE
estimator for every node-specific estimation problem, as if all
nodes have access to all the sensor signals in the network. Sim-
ulations demonstrate that the algorithm achieves the same per-
formance as a centralized algorithm. A practical adaptive imple-
mentation of the algorithm is described and simulated, demon-
strating the tracking capabilities of the algorithm in a dynamic
scenario. It is observed that the algorithm is more ro-
bust to estimation errors in the correlation matrices, compared to
its centralized equivalent. In this paper, we have only considered
the case where nodes update their parameters in a sequential
round robin fashion. A modified algorithm is studied
in a companion paper [14], where an improved tracking perfor-
mance is obtained, by letting nodes update simultaneously.
ACKNOWLEDGMENT
The authors would like to thank B. Cornelis and the anony-
mous reviewers for their valuable comments after proof-reading
this paper.
REFERENCES
[1] D. Estrin, L. Girod, G. Pottie, and M. Srivastava, “Instrumenting the
world with wireless sensor networks,” in Proc. 2001 IEEE Int. Conf.
Acoust., Speech, Signal Processing (ICASSP ’01), 2001, vol. 4, pp.
2033–2036.
[2] C. G. Lopes and A. H. Sayed, “Incremental adaptive strategies over
distributed networks,” IEEE Trans. Signal Processing, vol. 55, pp.
4064–4077, Aug. 2007.
[3] C. G. Lopes and A. H. Sayed, “Diffusion least-mean squares over adap-
tive networks: Formulation and performance analysis,” IEEE Trans.
Signal Processing, vol. 56, pp. 3122–3136, Jul. 2008.
[4] F. Cattivelli, C. G. Lopes, and A. H. Sayed, “Diffusion recursive least-
squares for distributed estimation over adaptive networks,” IEEE Trans.
Signal Processing, vol. 56, pp. 1865–1877, May 2008.
[5] I. Schizas, G. Giannakis, and Z.-Q. Luo, “Distributed estimation using
reduced-dimensionality sensor observations,” IEEE Trans. Signal Pro-
cessing, vol. 55, pp. 4284–4299, Aug. 2007.
[6] Z.-Q. Luo, G. Giannakis, and S. Zhang, “Optimal linear decentralized
estimation in a bandwidth constrained sensor network,” in Proc. 2005
Int. Symp. Inf. Theory (ISIT), Sept. 2005, pp. 1441–1445.
[7] K. Zhang, X. Li, P. Zhang, and H. Li, “Optimal linear estimation fu-
sion—Part VI: Sensor data compression,” in Proc. 2003 Sixth Int. Conf.
Inf. Fusion, 2003, vol. 1, pp. 221–228.
[8] Y. Zhu, E. Song, J. Zhou, and Z. You, “Optimal dimensionality re-
duction of sensor data in multisensor estimation fusion,” IEEE Trans.
Signal Processing, vol. 53, pp. 1631–1639, May 2005.
[9] A. Bertrand and M. Moonen, “Robust distributed noise reduction in
hearing aids with external acoustic sensor nodes,” EURASIP J. Adv.
Signal Process., vol. 2009, Article ID 530435, 2009,
doi:10.1155/2009/530435.
[10] S. Doclo, T. van den Bogaert, M. Moonen, and J. Wouters, “Reduced-
bandwidth and distributed MWF-based noise reduction algorithms for
binaural hearing aids,” IEEE Trans. Audio, Speech, Language Process.,
vol. 17, pp. 38–51, Jan. 2009.
[11] T. Klasen, T. Van den Bogaert, M. Moonen, and J. Wouters, “Binaural
noise reduction algorithms for hearing aids that preserve interaural time
delay cues,” IEEE Trans. Signal Processing, vol. 55, pp. 1579–1585,
April 2007.
[12] S. Doclo, T. Klasen, T. Van den Bogaert, J. Wouters, and M. Moonen,
“Theoretical analysis of binaural cue preservation using multi-channel
Wiener filtering and interaural transfer functions,” in Proc. Int. Work-
shop Acoust. Echo Noise Contr. (IWAENC), Paris, France, Sep. 2006.
[13] A. Bertrand and M. Moonen, “Distributed adaptive estimation of cor-
related node-specific signals in a fully connected sensor network,” in
Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP),
Apr. 2009, pp. 2053–2056.
[14] A. Bertrand and M. Moonen, “Distributed adaptive node-specific signal
estimation in fully connected sensor networks—Part II: Simultaneous
and asynchronous node updating,” IEEE Trans. Signal Process., vol.
58, no. 10, pp. 5292–5306, Oct. 2010.
[15] G. H. Golub and C. F. van Loan, Matrix Computations, 3rd ed. Bal-
timore, MD: The Johns Hopkins University Press, 1996.
[16] J. C. Bezdek and R. J. Hathaway, “Some notes on alternating optimiza-
tion,” in Advances in Soft Computing. Berlin, Germany: Springer,
2002, pp. 187–195.
[17] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Compu-
tation: Numerical Methods. Belmont, MA: Athena Scientific, 1997.
Alexander Bertrand (S’08) was born in Roeselare,
Belgium, in 1984. He received the M.Sc. degree in
electrical engineering from Katholieke Universiteit
Leuven, Belgium, in 2007.
He is currently pursuing the Ph.D. degree with
the Electrical Engineering Department (ESAT),
Katholieke Universiteit Leuven, and was supported
by a Ph.D. grant of the Institute for the Promotion
of Innovation through Science and Technology in
Flanders (IWT-Vlaanderen). His research interests
are in multichannel signal processing, ad hoc sensor
arrays, wireless sensor networks, distributed signal enhancement, speech
enhancement, and distributed estimation.
Marc Moonen (M’94–SM’06–F’07) received the
electrical engineering degree and the Ph.D. degree
in applied sciences from Katholieke Universiteit
Leuven, Belgium, in 1986 and 1990, respectively.
Since 2004, he has been a Full Professor with
the Electrical Engineering Department, Katholieke
Universiteit Leuven, where he heads a research
team working in the area of numerical algorithms
and signal processing for digital communications,
wireless communications, DSL, and audio signal
processing.
Dr. Moonen received the 1994 KU Leuven Research Council Award, the 1997
Alcatel Bell (Belgium) Award (with P. Vandaele), the 2004 Alcatel Bell (Bel-
gium) Award (with R. Cendrillon), and was a 1997 “Laureate of the Belgium
Royal Academy of Science.” He received a journal Best Paper award from the
IEEE TRANSACTIONS ON SIGNAL PROCESSING (with G. Leus) and from Elsevier
Signal Processing (with S. Doclo). He was chairman of the IEEE Benelux Signal
Processing Chapter (1998–2002), and is currently Past-President of the European Association for Signal Processing (EURASIP) and a member of the IEEE Signal Processing Society Technical Committee on Signal Processing for Communications. He served as Editor-in-Chief for the EURASIP Journal on Applied Signal Processing (2003–2005), and has been a member of the editorial board of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II (2002–2003), the IEEE SIGNAL PROCESSING MAGAZINE (2003–2005), and Integration, the VLSI Journal. He is currently a member of the editorial board of the EURASIP Journal on Advances in Signal Processing, the EURASIP Journal on Wireless Communications and Networking, and Signal Processing.
... The newly presented CE and RC methods will first be derived in a centralized setting, i.e., as if all antenna signals serving a certain UE are gathered in one point of the network. As this requires a large communication bandwidth and furthermore introduces a single point of failure, putting a heavy load on the backhaul network [35], a distributed algorithm will be presented, based on existing algorithms in wireless sensor networks [36], [37], [38], [39]. The channel estimates can then be obtained by processing the local antenna signals in each AP, together with compressed versions of all the other antenna signals of the APs serving the UE, and computing a few extra parameters using in-network sums. ...
... R_L-dimensional version of the antenna signals of the other APs. The algorithm is referred to as the rank-{R_B, R_L} GEVD-based distributed user-centric CE and RC (DUCERC) algorithm and can be related to the distributed algorithms referred to as DANSE, TI-DANSE and GEVD-DANSE of [36], [37], [38], designed for wireless sensor networks. The DUCERC algorithm has a slower adaptation and tracking speed compared to the centralized GEVD-based CE and RC, as it requires multiple iterations on the same data to converge to a stationary solution. ...
... In one iteration n, each AP compresses the received signals y_l^t during the pilot phase with F_kl^n ∈ C^(N×R_L) to obtain χ_kl^t as in (67). Using in-network sums and after subtracting its own compressed channel as in (68), it receives a new ... (Footnote 7: Instead of round-robin updates, asynchronous updating as discussed in [36] can also be performed, but this is not discussed here for ease of exposition.) ...
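The in-network summing step sketched in this excerpt — each AP compresses locally, the network computes one aggregate, and each AP subtracts its own contribution — can be illustrated with a small toy example. The names `chi`, `total`, and `others` are illustrative, not taken from the cited paper:

```python
import numpy as np

rng = np.random.default_rng(0)

K, R_L = 4, 2                        # number of APs, compressed dimension (assumed)
chi = rng.standard_normal((K, R_L))  # chi[k]: AP k's compressed local signal

# In-network sum: one aggregate is computed and shared with all APs,
# instead of every AP exchanging its signal with every other AP.
total = chi.sum(axis=0)

# Each AP recovers the sum of the *other* APs' compressed signals by
# subtracting its own contribution from the shared aggregate.
others = np.stack([total - chi[k] for k in range(K)])
```

This is why the communication cost scales with the aggregate size rather than with the number of APs.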
Article
Full-text available
Cell-free massive MIMO (CFmMIMO) is considered as one of the enablers to meet the demand for increasing data rates of next generation (6G) wireless communications. In user-centric CFmMIMO, each user equipment (UE) is served by a user-selected set of surrounding access points (APs), requiring efficient signal processing algorithms minimizing inter-AP communications, while still providing a good quality of service to all UEs. This paper provides algorithms for channel estimation (CE) and uplink (UL) receive combining (RC), designed for CFmMIMO channels using different assumptions on the structure of the channel covariances. Three different channel models are considered: line-of-sight (LoS) channels, non-LoS (NLoS) channels (the common Rayleigh fading model) and a combination of LoS and NLoS channels (the general Rician fading model). The LoS component introduces correlation between the channels at different APs that can be exploited to improve the CE and the RC. The channel estimates and receive combiners are obtained in each AP by processing the local antenna signals of the AP, together with compressed versions of all the other antenna signals of the APs serving the UE, during UL training. To make the proposed method scalable, the distributed user-centric channel estimation and receive combining (DUCERC) algorithm is presented that significantly reduces the necessary communications between the APs. The effectiveness of the proposed method and algorithm is demonstrated via numerical simulations.
... One of the main challenges is the need for a distributed strategy which does not rely on a fusion center as most of classic beamformers do. Distributed algorithms have been proposed for speech enhancement in ad-hoc microphone arrays [1,2,3,4] and recently, a deep neural network (DNN)-based distributed solution has also been introduced to combine the increased modelling capacity of DNNs with the flexibility of use of ad-hoc microphone arrays [5]. Besides, because the microphones embedded in different devices do not share the same hardware, they are acquired at different sampling rates (SRs) (even if the nominal SR is the same), causing a sampling rate offset (SRO), and triggered at different starting times, causing a sampling time offset (STO). ...
... The data used to train and evaluate our systems is extracted from the DISCO dataset. Room impulse responses in shoebox-shaped rooms are simulated. The rooms have a length, width and height randomly picked in the ranges [3, 8] m, [3, 5] m and [2, 3] m, respectively. ...
Preprint
Full-text available
Speech enhancement in ad-hoc microphone arrays is often hindered by the asynchronization of the devices composing the microphone array. Asynchronization comes from sampling time offset and sampling rate offset, which inevitably occur when the microphones are embedded in different hardware components. In this paper, we propose a deep neural network (DNN)-based speech enhancement solution that is suited for applications in ad-hoc microphone arrays because it is distributed and copes with asynchronization. We show that asynchronization has a limited impact on the spatial filtering and mostly affects the performance of the DNNs. Instead of resynchronizing the signals, which requires costly processing steps, we use an attention mechanism which makes the DNNs, and thus our whole pipeline, robust to asynchronization. We also show that the attention mechanism leads to the asynchronization parameters in an unsupervised manner.
... Smart devices are often equipped with one or more microphones [1], which enable advanced audio signal processing [2], such as spatial filtering [3,4], source localization [5][6][7], and distributed signal enhancement [2]. When several devices with independent sampling clocks are distributed in space and interconnected using wireless data transmission, the setup is referred to as a Wireless Acoustic Sensor Network (WASN) [1]. ...
Article
Full-text available
The network of distributed microphone arrays is usually established in an ad hoc manner; hence, network parameters such as the mutual positioning and rotation of the arrays, the positions of sources, and the synchronization of their recording onset times are initially unknown. In this article, we consider the problem of passively jointly self-calibrating and synchronizing distributed arrays in reverberant rooms. We use a typical two-step approach where, initially, the relative geometry of the network is estimated using Direction of Arrival (DoA) measurements. Subsequently, the absolute scale and synchronization parameters are estimated using Time Difference of Arrival (TDoA) measurements. This article presents methods to improve the robustness and accuracy of the estimation of the absolute geometric scaling and synchronization parameters in reverberant conditions, in which TDoA measurements do not follow a normal distribution and, furthermore, outliers often occur. To remedy these issues, we propose a Weighted Least Squares (WLS) estimator and a scheme for weighting the TDoA measurements to increase the estimation accuracy from heteroscedastic TDoA measurements. In addition, we propose an iterative reweighting algorithm with a binary weight to detect and reject TDoA outliers, which exploits the residuals of the parametric model in the least absolute value minimization. A numerical evaluation shows significant improvements of the proposed method over the state of the art in terms of the relative scaling error and the mean absolute value of the synchronization parameters.
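The binary-weight reweighting idea described in this abstract — fit, inspect residuals, give outliers weight 0, refit — can be sketched on a generic linear model. The model, thresholds, and variable names below are my own illustrative choices, not the authors' actual TDoA estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic measurements y = 2x + 1 plus small noise, with three gross
# outliers injected (illustrative stand-in for corrupted TDoA readings).
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)
y[[5, 20, 35]] += 15.0

A = np.column_stack([x, np.ones_like(x)])
w = np.ones(len(y))                # binary weights: 1 = keep, 0 = reject

for _ in range(5):
    # Least-squares fit restricted to the currently accepted measurements.
    theta, *_ = np.linalg.lstsq(A[w == 1], y[w == 1], rcond=None)
    # Reject measurements whose residual exceeds a robust (MAD-based)
    # threshold; rejected points get binary weight 0 in the next refit.
    r = np.abs(y - A @ theta)
    mad = np.median(np.abs(r - np.median(r)))
    w = (r < 3.0 * 1.4826 * mad + 1e-12).astype(float)
```

After a couple of iterations the injected outliers carry weight 0 and the fitted parameters return close to (2, 1).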
... Because of bandwidth or administrative reasons, merging them in a central computing node from those sources might not be feasible (Hu et al., 2019). In some applications, data come naturally feature-distributed, such as in wireless sensor networks (Bertrand and Moonen, 2010, 2015). ...
Preprint
Full-text available
Feature-distributed data, referred to data partitioned by features and stored across multiple computing nodes, are increasingly common in applications with a large number of features. This paper proposes a two-stage relaxed greedy algorithm (TSRGA) for applying multivariate linear regression to such data. The main advantage of TSRGA is that its communication complexity does not depend on the feature dimension, making it highly scalable to very large data sets. In addition, for multivariate response variables, TSRGA can be used to yield low-rank coefficient estimates. The fast convergence of TSRGA is validated by simulation experiments. Finally, we apply the proposed TSRGA in a financial application that leverages unstructured data from the 10-K reports, demonstrating its usefulness in applications with many dense large-dimensional matrices.
Article
In acoustic sensor networks (ASNs), the desired speech signal is commonly corrupted by reverberation and background noise. To solve this problem, a distributed Kalman filtering method for joint dereverberation and noise reduction is proposed in this paper. Specifically, the Kalman filter with dereverberation and noise reduction is first introduced for one node in ASNs, where a multi-channel linear prediction filter and a sidelobe-cancellation filter are employed and jointly estimated by a single Kalman filter. Then, the local distributed Kalman filter (localDKF) for joint dereverberation and noise reduction is presented in ASNs, where nodes exchange only their measurements, and every node obtains an estimate of the speech source based on its local observations and interaction with its neighbors. Finally, to enable nodes to communicate with their neighbors in an isotropic manner, the diffusion-based distributed Kalman filter (diffDKF) approach is proposed by fusing all available information among nodes. The proposed method jointly performs dereverberation and noise reduction in a fully distributed fashion by communicating only with neighboring nodes. Experimental results show the validity of the proposed method in noisy and reverberant ASNs.
Article
Full-text available
This paper presents two super-Gaussian-based multimicrophone maximum a posteriori (MAP) estimators which exploit both the amplitude and phase of the speech signal from noisy observations. It is well known that super-Gaussian distributions model the statistical properties of speech signals more accurately. Under the independent Gaussian statistical assumption for the noise signals, which is usually valid in wireless acoustic sensor networks, two joint multimicrophone estimators are derived while the speech signal is modeled by a super-Gaussian distribution. Since the microphones are distributed randomly and may also belong to different devices, the independence assumption on the noise signals is more reasonable in these networks. The performance of the proposed estimators is compared to that of four baseline estimators; the first is the multimicrophone minimum mean square error (MMSE) estimator, where both amplitude and phase are derived assuming Gaussian properties for the speech signal. The second baseline is the multimicrophone MAP-based amplitude estimator, which utilizes super-Gaussian statistics to obtain only the amplitude of the speech and keeps the phase unchanged. As the third one, we have considered a minimum variance distortionless response filter followed by a super-Gaussian MMSE estimator. We have also compared the performance of the proposed estimators with the centralized multichannel Wiener filter. The simulation experiments demonstrate the remarkable ability of the proposed estimators to enhance speech quality and intelligibility when the clean speech is degraded by a mixture of both point-source interference and additive noise in reverberant environments.
Article
Full-text available
In this paper a theoretical analysis of the binaural cue preservation of the multi-channel Wiener filter (MWF) is performed. We will prove that in the case of a single speech source the MWF perfectly preserves the binaural cues of the speech component, but changes the binaural cues of the noise component to the cues of the speech component. In addition, we show that by extending the MWF cost function with terms related to the interaural transfer function it is possible to preserve the binaural cues of both the speech and the noise component, without considerably reducing the noise reduction performance.
Conference Paper
Full-text available
Consider a bandwidth-constrained sensor network in which a set of distributed sensors and a fusion center (FC) collaborate to estimate an unknown vector. Due to power and cost limitations, each sensor must compress its data in order to minimize the amount of information that needs to be communicated to the FC. In this paper, we consider the design of a linear decentralized estimation scheme (DES) whereby each sensor transmits over a noisy channel to the FC a fixed number of real-valued messages which are linear functions of its observations, while the FC linearly combines the received messages to estimate the unknown parameter vector. Assuming each sensor collects data according to a local linear model, we propose to design optimal linear message functions and a linear fusion function according to the minimum mean squared error (MMSE) criterion. We show that the resulting design problem is nonconvex and NP-hard in general, and identify two special cases for which the optimal linear DES design problem can be efficiently solved either in closed form or by semi-definite programming (SDP).
Article
Full-text available
In a binaural hearing aid system, output signals need to be generated for the left and the right ear. Using the binaural multichannel Wiener filter (MWF), which exploits all microphone signals from both hearing aids, a significant reduction of background noise can be achieved. However, due to power and bandwidth limitations of the binaural link, it is typically not possible to transmit all microphone signals between the hearing aids. To limit the amount of transmitted information, this paper presents reduced-bandwidth MWF-based noise reduction algorithms, where a filtered combination of the contralateral microphone signals is transmitted. A first scheme uses a signal-independent beamformer, whereas a second scheme uses the output of a monaural MWF on the contralateral microphone signals and a third scheme involves an iterative distributed MWF (DB-MWF) procedure. It is shown that in the case of a rank-1 speech correlation matrix, corresponding to a single speech source, the DB-MWF procedure converges to the binaural MWF solution. Experimental results compare the noise reduction performance of the reduced-bandwidth algorithms with respect to the benchmark binaural MWF. It is shown that the best performance of the reduced-bandwidth algorithms is obtained by the DB-MWF procedure and that the performance of the DB-MWF procedure closely approaches the optimal performance of the binaural MWF.
Article
Full-text available
In this paper, we revisit an earlier introduced distributed adaptive node-specific signal estimation (DANSE) algorithm that operates in fully connected sensor networks. In the original algorithm, the nodes update their parameters in a sequential round-robin fashion, which may yield a slow convergence of the estimators, especially so when the number of nodes in the network is large. When all nodes update simultaneously, the algorithm adapts more swiftly, but convergence can no longer be guaranteed. Simulations show that the algorithm then often gets locked in a suboptimal limit cycle. We first provide an extension to the DANSE algorithm, in which we apply an additional relaxation in the updating process. The new algorithm is then proven to converge to the optimal estimators when nodes update simultaneously or asynchronously, be it that the computational load at each node increases in comparison with the algorithm with sequential updates. Finally, based on simulations it is demonstrated that a simplified version of the new algorithm, without any extra computational load, can also provide convergence to the optimal estimators.
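The effect of the relaxation described in this abstract can be illustrated on a scalar toy fixed-point iteration (my own example, not the DANSE update equations): the plain iteration gets locked in a limit cycle, while the relaxed update converges to the fixed point.

```python
# Toy fixed-point map T(x) = -x + 2 with fixed point x* = 1.
# Plain iteration x <- T(x) oscillates between two values (a limit
# cycle), while the relaxed update x <- (1 - a)*x + a*T(x) converges.
T = lambda x: -x + 2.0

x_plain = x_relax = 5.0
alpha = 0.3                       # relaxation factor in (0, 1), assumed
for _ in range(50):
    x_plain = T(x_plain)          # cycles: 5 -> -3 -> 5 -> -3 -> ...
    x_relax = (1 - alpha) * x_relax + alpha * T(x_relax)
```

The relaxed map here is x ↦ (1 − 2α)x + 2α, a contraction for α ∈ (0, 1), which is the same mechanism by which relaxation restores convergence for simultaneous updates.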
Conference Paper
Full-text available
Pervasive micro-sensing and actuation may revolutionize the way in which we understand and manage complex physical systems: from airplane wings to complex ecosystems. The capabilities for detailed physical monitoring and manipulation offer enormous opportunities for almost every scientific discipline, and it will alter the feasible granularity of engineering. We identify opportunities and challenges for distributed signal processing in networks of these sensing elements and investigate some of the architectural challenges posed by systems that are massively distributed, physically-coupled, wirelessly networked, and energy limited
Conference Paper
Full-text available
Let f : ℝ^s → ℝ be a real-valued scalar field, and let x = (x_1, …, x_s)^T ∈ ℝ^s be partitioned into t subsets of non-overlapping variables as x = (X_1, …, X_t)^T, with X_i ∈ ℝ^(p_i) for i = 1, …, t, and Σ_{i=1}^t p_i = s. Alternating optimization (AO) is an iterative procedure for minimizing (or maximizing) the function f(x) = f(X_1, X_2, …, X_t) jointly over all variables by alternating restricted minimizations over the individual subsets of variables X_1, …, X_t. AO is the basis for the c-means clustering algorithms (t = 2), many forms of vector quantization (t = 2, 3 and 4), and the expectation-maximization (EM) algorithm (t = 4) for normal mixture decomposition. First we review where and how AO fits into the overall optimization landscape. Then we discuss the important theoretical issues connected with the AO approach. Finally, we state (without proofs) two new theorems that give very general local and global convergence and rate of convergence results which hold for all partitionings of x.
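A minimal worked instance of alternating optimization with t = 2 blocks, on a toy jointly convex quadratic of my own choosing (not from the cited paper), where each restricted minimization has a closed form:

```python
# Alternating optimization on f(x, y) = (x - 2)^2 + (y - 3)^2 + 0.5*(x - y)^2.
# Each step minimizes f over one block with the other block held fixed.
x, y = 0.0, 0.0
for _ in range(100):
    x = (4.0 + y) / 3.0           # argmin_x f(x, y): solves 2(x-2) + (x-y) = 0
    y = (6.0 + x) / 3.0           # argmin_y f(x, y): solves 2(y-3) - (x-y) = 0
# The iterates converge to the joint minimizer (x, y) = (2.25, 2.75).
```

Each restricted step cannot increase f, which is the monotonicity property underlying the convergence results discussed in the abstract.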
Conference Paper
Full-text available
We introduce a distributed adaptive estimation algorithm operating in an ideal fully connected sensor network. The algorithm estimates node-specific signals at each node based on reduced-dimensionality sensor measurements of other nodes in the network. If the node-specific signals to be estimated are linearly dependent on a common latent process with a low dimension compared to the dimension of the sensor measurements, the algorithm can significantly reduce the required communication bandwidth and still provide the optimal linear estimator at each node as if all sensor measurements were available in every node. Because of its adaptive nature and fast convergence properties, the algorithm is suited for real-time applications in dynamic environments, such as speech enhancement in acoustic sensor networks.
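The bandwidth-reduction idea in this abstract — broadcast a low-dimensional compressed signal instead of the raw sensor channels when the latent signal subspace has low dimension — can be sketched on a toy rank-1 signal model with white sensor noise. The model, signal dimensions, and the choice of compression are my own illustrative assumptions, not the algorithm's actual update rules:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two 3-channel nodes observe a common 1-dimensional latent signal d.
N = 20000
d = rng.standard_normal(N)
a1 = np.array([1.0, 0.5, -0.3])   # steering vector at node 1 (assumed)
a2 = np.array([0.8, -0.6, 0.2])   # steering vector at node 2 (assumed)
y1 = np.outer(d, a1) + 0.3 * rng.standard_normal((N, 3))
y2 = np.outer(d, a2) + 0.3 * rng.standard_normal((N, 3))

def lmmse(Y, target):
    """Sample-based linear MMSE estimate of `target` from the columns of Y."""
    w = np.linalg.solve(Y.T @ Y / N, Y.T @ target / N)
    return Y @ w

# Centralized estimator: all six channels gathered in one place.
e_central = np.mean((lmmse(np.hstack([y1, y2]), d) - d) ** 2)

# Bandwidth-reduced estimator: node 2 broadcasts only one compressed
# channel (here its local MMSE output) instead of three raw channels.
z2 = lmmse(y2, d)
e_reduced = np.mean((lmmse(np.hstack([y1, z2[:, None]]), d) - d) ** 2)
```

For this rank-1, white-noise model the compressed channel is a sufficient statistic for d, so `e_reduced` essentially matches `e_central` despite the threefold reduction in broadcast channels.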
Article
An adaptive distributed strategy is developed based on incremental techniques. The proposed scheme addresses the problem of linear estimation in a cooperative fashion, in which nodes equipped with local computing abilities derive local estimates and share them with their predefined neighbors. The resulting algorithm is distributed, cooperative, and able to respond in real time to changes in the environment. Each node is allowed to communicate with its immediate neighbor in order to exploit the spatial dimension while limiting the communications burden at the same time. A spatial-temporal energy conservation argument is used to evaluate the steady-state performance of the individual nodes across the entire network. Computer simulations illustrate the results.
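The incremental strategy described here — one estimate circulating node-to-node along a predefined cycle, each node refining it with its local data — can be sketched with a plain LMS update. The network size, step size, and data model below are illustrative assumptions, not the cited paper's analysis setup:

```python
import numpy as np

rng = np.random.default_rng(3)

w_true = np.array([0.5, -1.0, 2.0])
K, mu = 5, 0.05                    # number of nodes, LMS step size (assumed)

# Each node k holds its own regressors U and noisy measurements dvec.
nodes = []
for _ in range(K):
    U = rng.standard_normal((100, 3))
    dvec = U @ w_true + 0.01 * rng.standard_normal(100)
    nodes.append((U, dvec))

w = np.zeros(3)                    # single estimate passed around the cycle
for _ in range(20):                # passes around the ring
    for U, dvec in nodes:          # visit nodes in a fixed cyclic order
        for u, d in zip(U, dvec):  # incremental LMS update at this node
            w = w + mu * u * (d - u @ w)
```

Only the current estimate crosses each link, which is the communication saving the incremental approach trades against the need for a predefined cycle.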
Article
We study the problem of distributed estimation over adaptive networks where a collection of nodes are required to estimate in a collaborative manner some parameter of interest from their measurements. The centralized solution to the problem uses a fusion center, thus, requiring a large amount of energy for communication. Incremental strategies that obtain the global solution have been proposed, but they require the definition of a cycle through the network. We propose a diffusion recursive least-squares algorithm where nodes need to communicate only with their closest neighbors. The algorithm has no topology constraints, and requires no transmission or inversion of matrices, therefore saving in communications and complexity. We show that the algorithm is stable and analyze its performance comparing it to the centralized global solution. We also show how to select the combination weights optimally.
Article
We formulate and study distributed estimation algorithms based on diffusion protocols to implement cooperation among individual adaptive nodes. The individual nodes are equipped with local learning abilities. They derive local estimates for the parameter of interest and share information with their neighbors only, giving rise to peer-to-peer protocols. The resulting algorithm is distributed, cooperative and able to respond in real time to changes in the environment. It improves performance in terms of transient and steady-state mean-square error, as compared with traditional noncooperative schemes. Closed-form expressions that describe the network performance in terms of mean-square error quantities are derived, presenting a very good match with simulations.