IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 10, OCTOBER 2010 5277
Distributed Adaptive Node-Specific Signal Estimation
in Fully Connected Sensor Networks—Part I:
Sequential Node Updating
Alexander Bertrand, Student Member, IEEE, and Marc Moonen, Fellow, IEEE
Abstract—We introduce a distributed adaptive algorithm for
linear minimum mean squared error (MMSE) estimation of
node-specific signals in a fully connected broadcasting sensor
network where the nodes collect multichannel sensor signal obser-
vations. We assume that the node-specific signals to be estimated
share a common latent signal subspace with a dimension that is
small compared to the number of available sensor channels at
each node. In this case, the algorithm can significantly reduce the
required communication bandwidth and still provide the same
optimal linear MMSE estimators as the centralized case. Further-
more, the computational load at each node is smaller than in a
centralized architecture in which all computations are performed
in a single fusion center. We consider the case where nodes update
their parameters in a sequential round robin fashion. Numerical
simulations support the theoretical results. Because of its adaptive
nature, the algorithm is suited for real-time signal estimation in
dynamic environments, such as speech enhancement with acoustic
sensor networks.
Index Terms—Adaptive estimation, distributed estimation, wire-
less sensor networks (WSNs).
I. INTRODUCTION
In a sensor network [1], a general objective is to utilize all
sensor signal observations available in the entire network to
perform a certain task, such as the estimation of a parameter or
signal. Gathering all observations in a fusion center to calculate
an optimal estimate may however require a large communica-
tion bandwidth and computational power. This approach is often
Manuscript received October 21, 2009; accepted March 21, 2010. Date of
publication June 10, 2010; date of current version September 15, 2010. The as-
sociate editor coordinating the review of this manuscript and approving it for
publication was Dr. Ta-Sung Lee. The work of A. Bertrand was supported by
a Ph.D. grant of the I.W.T. (Flemish Institute for the Promotion of Innovation
through Science and Technology). This work was carried out at the ESAT Labo-
ratory of Katholieke Universiteit Leuven, in the frame of K.U. Leuven Research
Council CoE EF/05/006 Optimization in Engineering (OPTEC), Concerted Re-
search Action GOA-AMBioRICS, Concerted Research Action GOA-MaNet,
the Belgian Programme on Interuniversity Attraction Poles initiated by the Bel-
gian Federal Science Policy Office IUAP P6/04 (DYSCO, “Dynamical sys-
tems, control and optimization,” 2007–2011), and Research Project FWO nr.
G.0600.08 (“Signal processing and network design for wireless acoustic sensor
networks”). The scientific responsibility is assumed by its authors.
The authors are with the Department of Electrical Engineering (ESAT-SCD/
SISTA), Katholieke Universiteit Leuven, B-3001 Leuven, Belgium (e-mail:
alexander.bertrand@esat.kuleuven.be; marc.moonen@esat.kuleuven.be).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSP.2010.2052612
referred to as centralized fusion or estimation. An alternative is
a distributed approach where each node has its own processing
unit and the estimation relies on distributed processing and co-
operation. This approach is preferred, especially when it is scalable in terms of its communication bandwidth requirement and computational complexity.
In many sensor network estimation frameworks, the sensor
signal observations are used to estimate a common network-wide desired parameter or signal, denoted here by $d$. This means that all nodes contribute to a common goal, i.e., the estimation of the globally defined variable $d$, which is the same for all nodes (see for example [2]–[8]). This can be viewed as a special case of the more general problem, which is considered here, where each node in the network estimates a different node-specific desired signal, i.e., node $k$ estimates the locally defined signal $d_k$.
This means that all nodes have a different local objective, which
they pursue through cooperation with other nodes. We describe
a distributed adaptive node-specific signal estimation (DANSE)
algorithm that operates in an ideal fully connected network. The
nodes broadcast compressed multichannel sensor signal obser-
vations that can be captured by all other nodes in the network,
possibly with the help of relay nodes. The computational load
is distributed over the different nodes in the network.
The DANSE algorithm is designed for the case where the
node-specific desired signals share a common (unknown) la-
tent signal subspace. If this signal space has a small dimension
compared to the number of available sensor channels at each
node, the DANSE algorithm exploits this common interest of
the nodes to significantly compress the data to be broadcast, and
yet converge to the optimal linear minimum mean squared error
(MMSE) estimators as if all sensor signal observations were
available at each node. Although the DANSE algorithm implic-
itly assumes a specific structure in the relationship between the
desired signals of the different nodes, it is noted that the actual
parameters of these latent dependencies are not assumed to be
known, i.e., nodes do not know how their desired signal is re-
lated to the desired signals of other nodes. The model that is
assumed in the DANSE algorithm naturally emerges in adap-
tive signal estimation problems in dynamic scenarios where the
target signal statistics and the transfer functions to the sensors
are not known and may change during operation of the algo-
rithm. Therefore, the original target signal cannot be recovered,
and so an option is then to let the nodes optimally estimate the
signal as it is observed locally by the node’s sensors. In this case,
the desired signals of the different nodes are differently filtered
versions of the same target signal, i.e., they share a common la-
tent signal subspace.
Because of its adaptive nature, the DANSE algorithm is
suited for real-time applications in dynamic environments.
Typical applications are vibration monitoring, wireless acoustic sensor networks (for surveillance, video conferencing, domotics, audio recording, etc.), and noise reduction in hearing aids
with external sensor nodes and/or cooperation between multiple
hearing aids [9], [10]. Node-specific estimation is particularly
important in applications where a target signal needs to be
estimated as it is observed at a specific sensor position. For
instance, in acoustic surveillance, it is often required to be able
to locate a sound source, so spatial information in the obser-
vations of different nodes must be retained in the estimation
process. In cooperating hearing aids, it is important to estimate
the signal as it impinges at the hearing aid itself, to preserve the
auditory cues for directional hearing [11], [12].
The DANSE algorithm is based on linear compression of
multichannel sensor signal observations. Linear compression
of sensor signal observations for data fusion has been the
topic of earlier work, e.g., [5]–[8]. The presented techniques,
however, assume prior knowledge of the intra- and intersensor
(cross-)correlation structure in the entire network. This must
be obtained by a priori training using all uncompressed sensor
signal observations, or must be derived from a specific data
model. Such assumptions make it difficult to apply the resulting
algorithms in adaptive networks or dynamic environments
where the statistics of the desired signals or sensor signals may
change. The DANSE algorithm can adapt to these changes
because nodes estimate and reestimate all required statistical
quantities on the compressed data during operation. For this,
we assume that each node can adaptively estimate the cross cor-
relation between its local sensor signals and its desired signal.
It is noted that the acquisition of these signal statistics is often
difficult or impossible, since the target signal is assumed to be
unknown. However, we will explain that in particular cases,
it is possible to estimate the required statistics, e.g., when the
target signal has an on-off behavior (such as speech signals),
or when the target source periodically transmits a priori known
training sequences. In cases where the local statistics cannot
be estimated adaptively, the DANSE algorithm can still be
used in a semi-adaptive context, i.e., scenarios with static noise
statistics but with changing target signal statistics or vice versa,
assuming that the static correlation structure is a priori known.
In [13], a batch-mode description of the DANSE algorithm
was briefly introduced. In this paper, we provide more details,
i.e., we include a convergence proof and introduce a truly adap-
tive version. In addition, we address implementation aspects,
and provide extensive simulation results, both in batch mode and
in a dynamic scenario. We only consider the case where nodes
update their parameters in a sequential round robin fashion. The
case where nodes update simultaneously or asynchronously is
treated in a companion paper [14]. In [10], a pruned version
of the DANSE algorithm has been used for microphone-array
based speech enhancement in binaural hearing aids, where it
was referred to as distributed multichannel Wiener filtering. In
this application, two hearing aids in a binaural configuration ex-
change a linear combination of their microphone signals to esti-
mate the target sound that is recorded by their reference micro-
phone. Convergence of the two-node system has been proven for
the special case where there is a single target speaker. The more
general DANSE algorithm provided in this paper allows for a
nontrivial extension to a scenario with multiple target speakers
and a network with more than two nodes. Using extra acoustic
sensor nodes that communicate with the hearing aids generally
improves the noise reduction performance, since the acoustic
sensors physically cover a larger area [9].
The paper is organized as follows. The problem formulation
and notation are presented in Section II. In Section III, we first
address the simple case in which the node-specific desired sig-
nals are scaled versions of each other and we prove conver-
gence of the DANSE algorithm to the optimal linear MMSE
estimators when nodes update their parameters sequentially. In
Section IV, this algorithm is generalized to the case in which the node-specific desired signals share a common latent $Q$-dimensional signal subspace. In Section V, we address some implementation details of DANSE and we study the complexity
of the algorithm. Finally, Section VI illustrates the convergence
results with numerical simulations. Conclusions are given in
Section VII.
II. PROBLEM FORMULATION AND NOTATION
A. Node-Specific Linear MMSE Estimation
We consider an ideal fully connected network with $J$ sensor nodes $\{1, \dots, J\}$, in which data broadcast by a node $k$ can be captured by all other nodes in the network through an ideal link. Node $k$ collects observations of a complex valued¹ $M_k$-channel signal $\mathbf{y}_k[t]$, where $t$ is the discrete time index, and where $\mathbf{y}_k[t]$ is an $M_k$-dimensional column vector. Each channel $y_{kq}$, $q \in \{1, \dots, M_k\}$, of the signal $\mathbf{y}_k$ corresponds to a sensor signal to which node $k$ has access. We assume that all signals are stationary and ergodic. In practice, the stationarity and ergodicity assumption can be relaxed to short-term stationarity and ergodicity, in which case the theory should be applied to finite signal segments that are assumed to be stationary and ergodic. For the sake of an easy exposition, we will omit the time index when referring to a signal, and we will only write the time index when referring to one specific observation, i.e., $\mathbf{y}_k[t]$ is the observation of the signal $\mathbf{y}_k$ at time $t$. We define $\mathbf{y}$ as the $M$-channel signal in which all $\mathbf{y}_k$ are stacked, where $M = \sum_{k=1}^{J} M_k$. This scenario is described in Fig. 1.
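As an illustrative sketch of this setup (the sizes below are hypothetical, not taken from the paper), the network-wide $M$-channel signal $\mathbf{y}$ is simply the stack of the per-node signals $\mathbf{y}_k$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: J = 3 nodes with M_k sensor channels each.
M_k = [4, 3, 5]
N = 1000                                  # number of observations

# y_k: M_k-channel signal at node k (rows = channels, columns = time).
y = [rng.standard_normal((m, N)) for m in M_k]

# Network-wide M-channel signal y stacks all y_k, with M = sum_k M_k.
y_stacked = np.vstack(y)

assert y_stacked.shape == (sum(M_k), N)   # here M = 12
```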
It is noted that this problem formulation also allows for hierarchical network architectures, in which the sensors are grouped in clusters. The sensors of a specific cluster $k$ then transmit their observations to a nearby fusion center, i.e., a "higher level" node. The fusion centers then correspond to the nodes $k$ in the above framework, and the collected observations in sensor cluster $k$ correspond to the $M_k$-channel signals $\mathbf{y}_k$ as explained above. Fig. 2 shows such a scenario for a network with three fusion centers.
We first consider the centralized estimation problem, i.e., we assume that each node has access to the observations of the entire $M$-channel signal $\mathbf{y}$. This corresponds to the case where
¹Throughout this paper, all signals are assumed to be complex valued to permit frequency-domain descriptions.
Fig. 1. Description of the scenario. The network contains $J$ sensor nodes, where node $k$ collects $M_k$-channel sensor signal observations and estimates a node-specific desired signal $\mathbf{d}_k$, which is a mixture of the channels of a common latent signal $\mathbf{d}$.
Fig. 2. A hierarchical architecture with three fusion centers, each one collecting sensor signals from nearby sensors.
nodes broadcast their uncompressed observations to all other nodes. In Sections III and IV, the general goal will be to compress the broadcast signals, while preserving the estimation performance of this centralized estimator. The objective for node $k$ is to estimate a complex valued node-specific signal $d_k$, referred to as the desired signal, from the observations of $\mathbf{y}$. We consider the general case where $d_k$ is not an observed signal, i.e., it is assumed to be unknown, as it is the case in signal enhancement (e.g., in speech enhancement, $d_k$ is the speech component in a noisy microphone signal). Node $k$ uses a linear estimator $\mathbf{w}_k$ to estimate $d_k$ as $\bar{d}_k = \mathbf{w}_k^H \mathbf{y}$, where $\mathbf{w}_k$ is a complex valued $M$-dimensional vector, and where superscript $H$ denotes the conjugate transpose operator. We assume that the $M$-channel signal $\mathbf{y}$ is correlated to the node-specific desired signals, but unlike [6], [8], we do not restrict ourselves to any data model generating the sensor signals, nor do we make any assumptions on the probability distributions of the involved signals. We consider linear MMSE estimation based on a node-specific estimator $\mathbf{w}_k$, i.e.,

$$\hat{\mathbf{w}}_k = \arg\min_{\mathbf{w}_k} E\left\{ \left| d_k - \mathbf{w}_k^H \mathbf{y} \right|^2 \right\} \qquad (1)$$

with $E\{\cdot\}$ the expected value operator. Assuming that the correlation matrix $R_{yy} = E\{\mathbf{y}\mathbf{y}^H\}$ has full rank,² the unique solution of (1) is [15]

$$\hat{\mathbf{w}}_k = R_{yy}^{-1}\, \mathbf{r}_{y d_k} \qquad (2)$$

with $\mathbf{r}_{y d_k} = E\{\mathbf{y}\, d_k^*\}$, where $d_k^*$ denotes the complex conjugate of $d_k$. Based on the assumption that the signals are ergodic, $R_{yy}$ and $\mathbf{r}_{y d_k}$ can be estimated by time averaging. The matrix $R_{yy}$ is directly estimated from the sensor signal observations. Since $d_k$ is assumed to be unknown, the estimation of the correlation vector $\mathbf{r}_{y d_k}$ has to be done indirectly, based on specific strategies, e.g., by exploiting the on-off behavior of the target signal (e.g., for speech enhancement [9], [10]), by using training sequences, or by using partial prior knowledge when the estimation is performed in a semi-adaptive context. We will provide more details on these strategies in Section V-A. In the sequel, we assume that $\mathbf{r}_{y d_k}$ can be estimated during operation of the algorithm.
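A minimal numerical sketch of the centralized estimator (2) follows; the sizes and noise level are illustrative assumptions, and the latent target $d$ is used directly to synthesize data and to form $\mathbf{r}_{yd_k}$, whereas in practice this vector must be estimated indirectly as discussed above:

```python
import numpy as np

rng = np.random.default_rng(1)

M, N = 6, 50000
# Complex latent target d and an (in practice unknown) mixing vector a.
d = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
noise = 0.3 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
y = np.outer(a, d) + noise                     # M-channel sensor signal

# Ergodicity: estimate R_yy = E{y y^H} and r_yd = E{y d^*} by time averaging.
R_yy = y @ y.conj().T / N
r_yd = y @ d.conj() / N

# Linear MMSE (Wiener) solution (2): w_hat = R_yy^{-1} r_yd.
w_hat = np.linalg.solve(R_yy, r_yd)
d_hat = w_hat.conj() @ y                       # estimate d as w^H y

mse = np.mean(np.abs(d - d_hat) ** 2)
assert mse < 0.1                               # far below the target power E{|d|^2} = 1
```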
In the above estimation procedure, temporal correlation appears to be ignored. However, differently delayed versions of one or more sensor signals at node $k$ can be added to the channels of $\mathbf{y}_k$, to also exploit the temporal information in the signals. For example, assume that node $k$ has access to 4 sensor signals. Then each of these signals is delayed with 1 up to $N$ sample delays, resulting in $4N$ extra (delayed) channels. In this case, the dimension of $\mathbf{y}_k$ is $M_k = 4(N+1)$.
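The delay-line construction described here can be sketched as follows (the sensor count and number of delays are example values; zero-padding at the segment start is an implementation choice):

```python
import numpy as np

rng = np.random.default_rng(2)

y_k = rng.standard_normal((4, 200))   # node k with 4 sensor signals
N_delays = 2                          # add 1..N_delays sample delays per sensor

rows = [y_k]
for tau in range(1, N_delays + 1):
    shifted = np.roll(y_k, tau, axis=1)
    shifted[:, :tau] = 0.0            # zero-pad instead of wrapping around
    rows.append(shifted)

# Augmented signal: dimension grows from 4 to 4 * (N_delays + 1) channels.
y_k_aug = np.vstack(rows)
assert y_k_aug.shape == (4 * (N_delays + 1), 200)
```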
It is noted that our problem statement differs from [2]–[4], where each node collects different spatio-temporal observations of two correlated signals $d$ and $\mathbf{u}$. The objective is then to find the best common linear fit between these observations, with a single set of coefficients $\mathbf{w}$, which is assumed to be the same for each node. Since the coefficients in $\mathbf{w}$ are of interest, only the locally estimated $\mathbf{w}$'s must be shared between nodes, whereas the sensor observations themselves are only used locally to update the estimate of $\mathbf{w}$. Since all nodes are assumed to estimate the same set of coefficients, incremental or diffusive averaging strategies can be used.
B. Common Latent Signal Subspace
In our problem statement, each node only collects observa-
tions of which corresponds to a subset of the channels of the
full signal . To find the optimal MMSE solution (2), each node
therefore in principle has to broadcast its observations of
to all other nodes in the network, which requires a large com-
munication bandwidth. One possibility to reduce the required
bandwidth is to broadcast only a few linear combinations of the
components of the observations instead of all compo-
nents. Finding the optimal linear compression is often a non-
trivial task, and in general this will not lead to the optimal solu-
tions (2). In many practical cases, however, the signals share
a common latent signal subspace, and then this can be exploited
in the compression. The most simple case is when all ,
i.e., the desired signal is the same for all nodes. We will first
handle the slightly more general case where all are scaled
versions of a common latent single-channel signal . For this
2This assumption is mostly satisfied in practice because of a noise component
at every sensor that is independent of other sensors, e.g., thermal noise. If not,
pseudoinverses should be used. A further comment on the rank-deficient case is
made in Section IV-C.
scenario, we will introduce the DANSE$_1$ algorithm, in which the data to be broadcast by each node $k$ is compressed by a factor $M_k$. Despite this compression, the algorithm converges to the optimal node-specific solution (2) at every node as if no compression were used for the broadcasts.
This scenario can then be extended to the more general case where the desired signals share a common $Q$-dimensional signal subspace, i.e.,

$$d_k = \mathbf{a}_k^T \mathbf{d} \qquad (3)$$

with $\mathbf{a}_k$ defining an unknown $Q$-dimensional complex vector, and $\mathbf{d}$ a latent complex valued $Q$-channel signal defining the $Q$-dimensional signal subspace that contains all $d_k$ signals. This model applies to situations where the desired signal is generated by multiple latent processes simultaneously (e.g., measuring vibrations when there are multiple exciters, or recording a conversation between multiple speakers [9]). Since the statistics of the latent signals as well as the propagation properties to the different sensors are generally unknown, the signal estimation procedure can only use statistics that can be obtained from the local sensor signal observations. The desired signal of each node is then the linear mixture of the latent target signals as locally observed by a reference sensor.

In the sequel, we consider the general case where node $k$ estimates a $Q$-channel desired signal

$$\mathbf{d}_k = A_k \mathbf{d} \qquad (4)$$

with $A_k$ a complex valued $Q \times Q$ matrix. This data model is depicted in Fig. 1. It is noted that the matrix $A_k$ and the latent signal $\mathbf{d}$ are assumed to be unknown, i.e., nodes do not know how their node-specific desired signals are related to each other. Since we also consider complex valued signals, (4) can correspond to a frequency domain description of a convolutive mixture in the time domain, as in [9], [10]. Expression (4) then defines a different estimation problem for each specific frequency. This yields frequency dependent estimators $\mathbf{w}_k$, which translate to multitap filters in the time domain.
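The data model (4) can be sketched numerically as follows; the dimensions and mixing matrices are illustrative assumptions, and the check confirms that every $\mathbf{d}_k$ lies in the same $Q$-dimensional latent subspace spanned by $\mathbf{d}$:

```python
import numpy as np

rng = np.random.default_rng(3)

Q, N, J = 2, 1000, 3
# Latent Q-channel signal d (unknown to the nodes in the actual algorithm).
d = rng.standard_normal((Q, N)) + 1j * rng.standard_normal((Q, N))

# Node-specific desired signals d_k = A_k d, with unknown Q x Q matrices A_k.
A = [rng.standard_normal((Q, Q)) + 1j * rng.standard_normal((Q, Q))
     for _ in range(J)]
d_nodes = [A_k @ d for A_k in A]

# Stacking d and any d_k gives 2Q rows but still only rank Q:
for d_k in d_nodes:
    assert np.linalg.matrix_rank(np.vstack([d, d_k])) == Q
```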
Notice that the desired signal $\mathbf{d}_k$ spans the complete signal subspace defined by the $Q$-channel signal $\mathbf{d}$, provided that the $Q \times Q$ matrix $A_k$ has full rank. If this holds for each node in the network, we will show that the data to be broadcast by node $k$ can be compressed by a factor $M_k / Q$. This means that node $k$ only needs to broadcast $Q$ linear combinations of the components of its observations of $\mathbf{y}_k$, while the optimal node-specific solution (2) is still obtained at all nodes. Notice that in practical applications, the actual signal(s) of interest can be a subset of the entries in $\mathbf{d}_k$, in which case the other entries should be seen as auxiliary channels to capture the latent $Q$-dimensional signal subspace that contains the $d_k$'s. For instance, consider the case where nodes estimate the target signal as observed by their reference sensor, i.e., node $k$ estimates the node-specific desired signal $d_k$ as in (3). Node $k$ then selects $Q - 1$ extra auxiliary reference sensors, and also estimates the target signal as it arrives on these sensors. The resulting $Q$-channel desired signal $\mathbf{d}_k$ then spans the complete signal subspace if $\mathrm{rank}(A_k) = Q$.
III. DANSE WITH SINGLE-CHANNEL BROADCAST SIGNALS
The algorithm introduced in this paper is an iterative scheme referred to as distributed adaptive node-specific signal estimation (DANSE), since its objective is to estimate a node-specific signal at each node in a distributed fashion. In the general scheme, each node broadcasts $K$-component compressed sensor signal observations. We will refer to this as DANSE$_K$, where the subscript $K$ refers to the number of channels of the broadcast signals. For the sake of an easy exposition, we first introduce the DANSE algorithm for the simple case where $K = 1$, and we will show that DANSE$_1$ converges to the optimal filters if $Q = 1$, i.e., if the single-channel desired signals $d_k$ are nonzero scaled versions of the same latent single-channel signal $d$. In Section IV we generalize this to the more general DANSE$_K$ algorithm, and we will show that this algorithm converges to the optimal filters if $K = Q$ and if all $A_k$ in (4) have rank $Q$.
A. Algorithm
The goal for each node $k$ is to estimate the signal $d_k$ with a linear estimator that uses all observations in the entire network, i.e., $\bar{d}_k = \mathbf{w}_k^H \mathbf{y}$. We aim to obtain the MMSE solutions (2), without the need for each node to broadcast all $M_k$ components of the observations. For this, we define a partitioning of the estimator $\mathbf{w}_k$ as $\mathbf{w}_k = [\mathbf{w}_{k1}^T\ \mathbf{w}_{k2}^T \cdots \mathbf{w}_{kJ}^T]^T$, with $\mathbf{w}_{kq}$ denoting the $M_q$-dimensional subvector of $\mathbf{w}_k$ that is applied to $\mathbf{y}_q$, and with superscript $T$ denoting the transpose operator. In this way, (1) is equivalent to

$$\hat{\mathbf{w}}_k = \arg\min_{\mathbf{w}_{k1}, \dots, \mathbf{w}_{kJ}} E\left\{ \left| d_k - \sum_{q=1}^{J} \mathbf{w}_{kq}^H \mathbf{y}_q \right|^2 \right\}. \qquad (5)$$

Since node $k$ only has access to the sensor signal observations of $\mathbf{y}_k$, it can only control a specific part of the estimator $\mathbf{w}_k$, namely $\mathbf{w}_{kk}$. In the DANSE$_1$ algorithm, each node $k$ broadcasts the output of this partial estimator, i.e., observations of the compressed signal $z_k = \mathbf{w}_{kk}^H \mathbf{y}_k$. This reduces the data to be broadcast by a factor $M_k$. It is noted that $\mathbf{w}_{kk}$ acts both as a compressor and as a part of the estimator $\mathbf{w}_k$, i.e., the observations of the compressed signal $z_k$ that is broadcast by node $k$ are also used in the estimation of $d_k$ at node $k$ itself.

A node $k$ now has access to $M_k + J - 1$ input channels, i.e., its own $M_k$ sensor signals and the $J - 1$ compressed signals $z_q$ that it receives from the other nodes. Node $k$ will compute the optimal linear combiner of these input channels to estimate $d_k$. The coefficient that is applied to the signal observations of $z_q$ at node $k$ is denoted by $g_{kq}$. A schematic illustration of this scheme (for $J = 3$) is shown in Fig. 3. Notice that there is no decompression involved, i.e., node $k$ does not expand the observations of the $z_q$ signal, but only scales these with a scaling
Fig. 3. The DANSE$_1$ scheme with three nodes ($J = 3$). Each node $k$ estimates a signal $d_k$ using its own $M_k$-channel sensor signal observations, and two single-channel signals broadcast by the other two nodes.
factor $g_{kq}$. As visualised in Fig. 3, the parametrization of the $\mathbf{w}_k$ now effectively applied at node $k$ is therefore

$$\tilde{\mathbf{w}}_k = \begin{bmatrix} g_{k1}\, \mathbf{w}_{11} \\ \vdots \\ g_{kJ}\, \mathbf{w}_{JJ} \end{bmatrix} \qquad (6)$$

i.e., each $\tilde{\mathbf{w}}_k$ is now defined by the set of $\mathbf{w}_{qq}$'s together with a vector $\mathbf{g}_k = [g_{k1} \cdots g_{kJ}]^T$, defining the scaling parameters. We use a tilde to indicate that the estimator is parametrized according to (6), which defines a solution space for $\tilde{\mathbf{w}}_k$ with a specific structure. In this parametrization, node $k$ can only manipulate the parameters $\mathbf{w}_{kk}$ and $\mathbf{g}_k$. In the sequel, we set $g_{kk} = 1$ to remove the ambiguity in (6) (hence $g_{kk}$ is omitted in Fig. 3). Notice that the solution space of $\tilde{\mathbf{w}}_k$ is $(M_k + J - 1)$-dimensional, which is smaller³ than the original $M$-dimensional solution space corresponding to the centralized algorithm, i.e., the solution space of the optimization problem (1). Still, the goal of the DANSE$_1$ algorithm is to iteratively update the parameters of (6) until $\tilde{\mathbf{w}}_k = \hat{\mathbf{w}}_k$.
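The parametrization (6) can be sketched as follows (the sizes and parameter values are illustrative, and real signals replace complex ones for brevity):

```python
import numpy as np

rng = np.random.default_rng(6)

M_k = [3, 2, 4]                       # hypothetical channel counts, J = 3 nodes
J = len(M_k)
w_qq = [rng.standard_normal(m) for m in M_k]   # local filters w_qq
g_k = np.array([1.0, -0.4, 2.0])               # scalings of node k = 0, g_kk = 1

# (6): the effective network-wide filter of node k stacks g_kq * w_qq.
tilde_w_k = np.concatenate([g_k[q] * w_qq[q] for q in range(J)])

assert tilde_w_k.shape == (sum(M_k),)
# Node k's own block enters unscaled because g_kk is fixed to 1.
assert np.allclose(tilde_w_k[:M_k[0]], w_qq[0])
```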
In the sequel, we will use the following notation and definitions. In general, we will use $x^i$ to denote $x$ at iteration $i$, where $x$ can be a signal or a parameter. The $J$-channel signal $\mathbf{z}^i$ is defined as $\mathbf{z}^i = [z_1^i \cdots z_J^i]^T$. We define $\mathbf{z}_{-k}^i$ as the vector $\mathbf{z}^i$ with entry $z_k^i$ omitted. Similarly, we define $\mathbf{g}_{k,-k}$ as the vector $\mathbf{g}_k$ with entry $g_{kk}$ omitted.

At every iteration $i$ in the DANSE$_1$ algorithm, one specific node $k$ will update its local parameters $\mathbf{w}_{kk}$ and $\mathbf{g}_{k,-k}$, by solving its local node-specific MMSE problem with respect to its input signals, consisting of its own sensor signal observations $\mathbf{y}_k$ and the compressed signal observations of $\mathbf{z}_{-k}^i$, i.e., it solves

$$\left[\mathbf{w}_{kk}^{i+1},\, \mathbf{g}_{k,-k}^{i+1}\right] = \arg\min_{\mathbf{w}_{kk},\, \mathbf{g}_{k,-k}} E\left\{ \left| d_k - \mathbf{w}_{kk}^H \mathbf{y}_k - \mathbf{g}_{k,-k}^H \mathbf{z}_{-k}^i \right|^2 \right\}. \qquad (7)$$

Let $\tilde{\mathbf{y}}_k^i$ denote the stacked version of the local input signals at node $k$, i.e.,

$$\tilde{\mathbf{y}}_k^i = \begin{bmatrix} \mathbf{y}_k \\ \mathbf{z}_{-k}^i \end{bmatrix}. \qquad (8)$$

Then the solution of (7) is

$$\begin{bmatrix} \mathbf{w}_{kk}^{i+1} \\ \mathbf{g}_{k,-k}^{i+1} \end{bmatrix} = \left( R_{\tilde{y}_k \tilde{y}_k}^i \right)^{-1} \mathbf{r}_{\tilde{y}_k d_k}^i \qquad (9)$$

with

$$R_{\tilde{y}_k \tilde{y}_k}^i = E\left\{ \tilde{\mathbf{y}}_k^i\, \tilde{\mathbf{y}}_k^{iH} \right\} \qquad (10)$$
$$\mathbf{r}_{\tilde{y}_k d_k}^i = E\left\{ \tilde{\mathbf{y}}_k^i\, d_k^* \right\}. \qquad (11)$$

Since there is no decompression involved, the local estimation problems (7) have a smaller dimension than the original network-wide estimation problems (1), i.e., the $(M_k + J - 1) \times (M_k + J - 1)$ matrix $R_{\tilde{y}_k \tilde{y}_k}$ is smaller than the $M \times M$ matrix $R_{yy}$ in (2).
³It is assumed here that $M_k + J - 1 < M$, i.e., $J > 1$, $M_q \geq 1$ $\forall q$, and there is at least one node $q \neq k$ for which $M_q > 1$.
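A single local update (8)–(11) can be sketched as follows; all signals are synthetic and real valued, and the desired signal is tied to the first local sensor purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

M_k, J_minus_1, N = 4, 2, 20000
y_k = rng.standard_normal((M_k, N))              # local sensor signals
z_minus_k = rng.standard_normal((J_minus_1, N))  # compressed signals from others
d_k = y_k[0] + 0.1 * rng.standard_normal(N)      # toy desired signal

# (8): stacked local input signal.
y_tilde = np.vstack([y_k, z_minus_k])

# (10)-(11): time-averaged local statistics.
R = y_tilde @ y_tilde.T / N                      # R_{y~k y~k}
r = y_tilde @ d_k / N                            # r_{y~k d_k}

# (9): joint update of w_kk (first M_k entries) and g_{k,-k} (last J-1 entries).
sol = np.linalg.solve(R, r)
w_kk_new, g_new = sol[:M_k], sol[M_k:]

assert w_kk_new.shape == (M_k,) and g_new.shape == (J_minus_1,)
assert w_kk_new[0] > 0.9     # the update latches onto the informative channel
```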
We define a block size $B$ which denotes the number of observations that the nodes collect in between two successive node updates, i.e., in between two increments of $i$. The DANSE$_1$ algorithm now consists of the following steps:
1) Initialize: $i \leftarrow 0$, $k \leftarrow 1$.
Initialize $\mathbf{w}_{qq}^0$ and $\mathbf{g}_{q,-q}^0$ with random vectors, $\forall q \in \{1, \dots, J\}$.
2) Each node $q \in \{1, \dots, J\}$ performs the following operation cycle:
- Collect the sensor observations $\mathbf{y}_q[iB + j]$, $j = 0, \dots, B-1$.
- Compress these $M_q$-dimensional observations to
$$z_q^i[iB + j] = \mathbf{w}_{qq}^{iH}\, \mathbf{y}_q[iB + j]. \qquad (12)$$
- Broadcast the compressed observations $z_q^i[iB + j]$, $j = 0, \dots, B-1$, to the other nodes.
- Collect the $(J-1)$-dimensional data vectors $\mathbf{z}_{-q}^i[iB + j]$, $j = 0, \dots, B-1$, which are stacked versions of the compressed observations received from the other nodes.
- Update the estimates of $R_{\tilde{y}_q \tilde{y}_q}$ and $\mathbf{r}_{\tilde{y}_q d_q}$, by including the newly collected data.⁴
- Update the node-specific parameters:
$$\begin{bmatrix} \mathbf{w}_{qq}^{i+1} \\ \mathbf{g}_{q,-q}^{i+1} \end{bmatrix} = \begin{cases} \left( R_{\tilde{y}_q \tilde{y}_q}^i \right)^{-1} \mathbf{r}_{\tilde{y}_q d_q}^i & \text{if } q = k \\ \begin{bmatrix} \mathbf{w}_{qq}^{i} \\ \mathbf{g}_{q,-q}^{i} \end{bmatrix} & \text{if } q \neq k. \end{cases} \qquad (13)$$
⁴In Section V-A, we will suggest some possible strategies to estimate these parameters.
- Compute the estimate of $d_q$, $\forall j \in \{0, \dots, B-1\}$, as
$$\bar{d}_q[iB + j] = \mathbf{w}_{qq}^{(i+1)H}\, \mathbf{y}_q[iB + j] + \mathbf{g}_{q,-q}^{(i+1)H}\, \mathbf{z}_{-q}^i[iB + j]. \qquad (14)$$
3) $k \leftarrow (k \bmod J) + 1$.
4) $i \leftarrow i + 1$.
5) Return to step 2)
Remark I: Notice that the different iterations are spread out over time. Therefore, the iterative characteristics of the algorithm do not have an impact on the amount of data that is transmitted, i.e., each sample is only broadcast once, since the time index in (12) and (14) shifts together with the iteration index $i$.
Remark II: In the above algorithm description, it is not mentioned how the correlation matrix $R_{\tilde{y}_k \tilde{y}_k}$ and the correlation vector $\mathbf{r}_{\tilde{y}_k d_k}$ should be estimated. This estimation process depends on the application and the signals involved. In Section V-A, we will suggest some possible strategies to estimate $R_{\tilde{y}_k \tilde{y}_k}$ and $\mathbf{r}_{\tilde{y}_k d_k}$.
Remark III: It is noted that, when a node $k$ updates its node-specific parameters $\mathbf{w}_{kk}$ and $\mathbf{g}_{k,-k}$, the signal statistics of $z_k$ change, i.e., $z_k^i$ changes to $z_k^{i+1}$. The next node to perform an update therefore needs a sufficient number of observations of $z_k^{i+1}$ to reliably estimate the correlation coefficients involving this signal. Hence, the block length $B$ should be chosen large enough.
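The full sequential DANSE$_1$ loop can be sketched end to end as below. All sizes, mixing vectors, and noise levels are illustrative assumptions; batch sample statistics over one long segment stand in for the adaptive estimates, and real signals replace complex ones for brevity. The check compares the converged effective filters (6) against the centralized solutions (2):

```python
import numpy as np

rng = np.random.default_rng(5)

J, N = 3, 50000
M_k = [3, 4, 2]
d = rng.standard_normal(N)                          # latent single-channel target
alpha = [1.0, -0.7, 2.3]                            # d_k = alpha_k * d  (Q = 1)
steer = [rng.standard_normal((m, 1)) for m in M_k]  # unknown mixing vectors
y = [s * d + 0.5 * rng.standard_normal((m, N)) for s, m in zip(steer, M_k)]
y_all = np.vstack(y)                                # network-wide signal

# Centralized MMSE estimators (2), one per node.
R_yy = y_all @ y_all.T / N
w_cent = [np.linalg.solve(R_yy, y_all @ (a * d) / N) for a in alpha]

# DANSE_1 state: local filters w_kk and scalings g[k, q] (g_kk fixed to 1).
w_kk = [rng.standard_normal(m) for m in M_k]
g = np.eye(J)

for it in range(120):                               # sequential round-robin updates
    k = it % J
    z = np.vstack([w_kk[q] @ y[q] for q in range(J)])   # broadcast signals (12)
    others = [q for q in range(J) if q != k]
    y_tilde = np.vstack([y[k], z[others]])          # local inputs (8)
    sol = np.linalg.solve(y_tilde @ y_tilde.T / N,  # local update (9)
                          y_tilde @ (alpha[k] * d) / N)
    w_kk[k], g[k, others] = sol[:M_k[k]], sol[M_k[k]:]

# Effective network-wide filters (6) should match the centralized solutions.
w_eff = [np.concatenate([g[k, q] * w_kk[q] for q in range(J)]) for k in range(J)]
for k in range(J):
    rel = np.linalg.norm(w_eff[k] - w_cent[k]) / np.linalg.norm(w_cent[k])
    assert rel < 1e-2
```

This illustrates the compression claim directly: each node broadcasts one channel instead of its $M_k$ sensor channels, yet the converged estimators coincide with the centralized ones.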
B. Convergence and Optimality of DANSE$_1$ if $Q = 1$ and Nonzero Desired Signals

We now assume that all $d_k$ are a nonzero scaled version of the same signal $d$, i.e., $d_k = \alpha_k d$, with $\alpha_k$ a nonzero complex scalar that is unknown to the individual nodes. Formula (2) shows that in this case, all $\hat{\mathbf{w}}_k$ are parallel, i.e.,

$$\hat{\mathbf{w}}_k = \alpha_k^*\, R_{yy}^{-1}\, \mathbf{r}_{yd} \qquad (15)$$

with $\mathbf{r}_{yd} = E\{\mathbf{y}\, d^*\}$. Therefore, the set $\{\hat{\mathbf{w}}_k \mid k \in \{1, \dots, J\}\}$ belongs to the solution space used by DANSE$_1$, as specified by (6), i.e., there exist parameters $\{\mathbf{w}_{qq}\}$ and $\{\mathbf{g}_k\}$ for which $\tilde{\mathbf{w}}_k = \hat{\mathbf{w}}_k$, $\forall k$.

In the theoretical convergence analysis in the sequel, we assume that the correlation matrices $R_{\tilde{y}_k \tilde{y}_k}$ and the correlation vectors $\mathbf{r}_{\tilde{y}_k d_k}$, $\forall k$, are perfectly estimated, i.e., as if they are computed over an infinite observation window. Under this assumption, the following theorem guarantees convergence and optimality of the DANSE$_1$ algorithm.

Theorem III.1: If the sensor signal correlation matrix $R_{yy}$ has full rank, and if $d_k = \alpha_k d$, $\forall k \in \{1, \dots, J\}$, with $d$ a complex valued single-channel signal and $\alpha_k \in \mathbb{C} \setminus \{0\}$, then the DANSE$_1$ algorithm converges for any initialization of its parameters to the MMSE solution (2) for all $k$.
Before proving this theorem, we introduce some additional notation. The vector $\mathbf{w}$ (without subscript) denotes the stacked vector of all $\mathbf{w}_{kk}$ vectors, i.e.,

$$\mathbf{w} = \begin{bmatrix} \mathbf{w}_{11} \\ \vdots \\ \mathbf{w}_{JJ} \end{bmatrix}. \qquad (16)$$

We also define the following MSE cost functions corresponding to node $k$:

$$J_k(\mathbf{w}_k) = E\left\{ \left| d_k - \mathbf{w}_k^H \mathbf{y} \right|^2 \right\} \qquad (17)$$
$$\tilde{J}_k(\mathbf{w}, \mathbf{g}_k) = J_k(\tilde{\mathbf{w}}_k) \qquad (18)$$

where $\tilde{\mathbf{w}}_k$ is defined from $\mathbf{w}$ and $\mathbf{g}_k$ as in (6). Notice that $\mathbf{g}_k$ contains the entry $g_{kk}$, which is a fictitious variable that is never actually computed by the algorithm. We define $F_k$ as the function that generates $[\mathbf{w}_{kk}^{(i+1)T}\ \mathbf{g}_{k,-k}^{(i+1)T}]^T$ according to (9), i.e.,

$$F_k(\mathbf{w}) = \left( R_{\tilde{y}_k \tilde{y}_k} \right)^{-1} \mathbf{r}_{\tilde{y}_k d_k} \qquad (19)$$

where $\tilde{\mathbf{y}}_k$ is built from $\mathbf{y}_k$ and $\mathbf{z}_{-k}$ as in (8), with $z_q = \mathbf{w}_{qq}^H \mathbf{y}_q$. It is noted that the right-hand side of (19) depends on all entries of the argument $\mathbf{w}$ through the signal $\mathbf{z}_{-k}$, which is not explicitly revealed in this expression.
The proof of Theorem III.1 provided here differs from the proof in [10], where a scheme similar to DANSE$_1$ with $J = 2$ has been proved to converge to the optimal solution. Unlike the proof in [10], our proof allows for a generalization to the case with $K > 1$, it allows $J > 2$, and it provides more insight in the convergence properties of the algorithm. We first prove the convergence statement of Theorem III.1, and then the optimality statement.
Proof of Convergence: We prove that the sequence $\{\mathbf{w}^i\}_{i \in \mathbb{N}}$ and the sequences $\{\mathbf{g}_k^i\}_{i \in \mathbb{N}}$, $\forall k$, converge to a limit point $\mathbf{w}^\infty$ and $\mathbf{g}_k^\infty$, respectively. When node $k$ performs an update of its variables $\mathbf{w}_{kk}$ and $\mathbf{g}_{k,-k}$ at iteration $i$, these are replaced by the solution of the local MMSE problem (7), repeated here for convenience:

$$\left[\mathbf{w}_{kk}^{i+1},\, \mathbf{g}_{k,-k}^{i+1}\right] = \arg\min_{\mathbf{w}_{kk},\, \mathbf{g}_{k,-k}} E\left\{ \left| d_k - \mathbf{w}_{kk}^H \mathbf{y}_k - \mathbf{g}_{k,-k}^H \mathbf{z}_{-k}^i \right|^2 \right\}. \qquad (20)$$

If another node $q$ were to optimize the variables $\mathbf{w}_{kk}$ and $\mathbf{g}_{q,-k}$ with respect to its own node-specific estimation problem, it would solve the problem

$$\left[\mathbf{w}_{kk}',\, \mathbf{g}_{q,-k}'\right] = \arg\min_{\mathbf{w}_{kk},\, \mathbf{g}_{q,-k}} E\left\{ \left| d_q - \mathbf{w}_{kk}^H \mathbf{y}_k - \mathbf{g}_{q,-k}^H \mathbf{z}_{-k}^i \right|^2 \right\}. \qquad (21)$$

Since $d_q = (\alpha_q/\alpha_k)\, d_k$ with $\alpha_q/\alpha_k \neq 0$, the solutions of (20) and (21) are identical up to the scalar $\alpha_q^*/\alpha_k^*$. This means that an update of $\mathbf{w}_{kk}$ and $\mathbf{g}_{k,-k}$ at node $k$, which is an optimization leading to a decrease of $\tilde{J}_k$, will also lead to a decrease of $\tilde{J}_q$ for any $q$ if node $q$ were allowed to also perform a responding optimization of its $\mathbf{g}_{q,-q}$. This shows that for any $k$ (independent of the selection of the node that actually performs an update at iteration $i$)

$$\tilde{J}_k\left(\mathbf{w}^{i+1}, \mathbf{g}_k^{i+1}\right) \leq \tilde{J}_k\left(\mathbf{w}^{i}, \mathbf{g}_k^{i}\right). \qquad (22)$$

Since all $\tilde{J}_k$ have a lower bound, each sequence $\{\tilde{J}_k(\mathbf{w}^i, \mathbf{g}_k^i)\}_{i \in \mathbb{N}}$ converges to a limit $\tilde{J}_k^\infty$, i.e.,

$$\lim_{i \to \infty} \tilde{J}_k\left(\mathbf{w}^{i}, \mathbf{g}_k^{i}\right) = \tilde{J}_k^\infty. \qquad (23)$$

If we again assume that node $k$ performs an update at iteration $i$, then because of the strict convexity of the cost function in (20), the following expression holds:

$$\lim_{i \to \infty} \left( \begin{bmatrix} \mathbf{w}_{kk}^{i+1} \\ \mathbf{g}_{k,-k}^{i+1} \end{bmatrix} - \beta^i \begin{bmatrix} \mathbf{w}_{kk}^{i} \\ \mathbf{g}_{k,-k}^{i} \end{bmatrix} \right) = \mathbf{0} \qquad (24)$$

with

$$\beta^i \in \mathbb{C} \setminus \{0\}. \qquad (25)$$

This shows that, after convergence of the sequences $\{\tilde{J}_k(\mathbf{w}^i, \mathbf{g}_k^i)\}_{i \in \mathbb{N}}$, $\forall k$, any update of a $\mathbf{w}_{kk}$ must correspond to a scaling. Notice however that

$$\left[ F_q\!\left( \begin{bmatrix} \mathbf{w}_{11} \\ \vdots \\ \beta\, \mathbf{w}_{kk} \\ \vdots \\ \mathbf{w}_{JJ} \end{bmatrix} \right) \right]_{1:M_q} = \left[ F_q\!\left( \begin{bmatrix} \mathbf{w}_{11} \\ \vdots \\ \mathbf{w}_{kk} \\ \vdots \\ \mathbf{w}_{JJ} \end{bmatrix} \right) \right]_{1:M_q}, \quad \forall\, \beta \in \mathbb{C} \setminus \{0\} \qquad (26)$$

where $[\cdot]_{1:M_q}$ selects the $\mathbf{w}_{qq}$-part of the update, i.e., a scaling of a $\mathbf{w}_{kk}$ in node $k$ does not change the update of $\mathbf{w}_{qq}$ in node $q$, since the scaling is implicitly compensated in $z_k$ by the parameter $g_{qk}$. This proves convergence of the sequence $\{\mathbf{w}^i\}_{i \in \mathbb{N}}$ to a limit point $\mathbf{w}^\infty$, and therefore also the sequences $\{\mathbf{g}_k^i\}_{i \in \mathbb{N}}$ must converge to a limit point $\mathbf{g}_k^\infty$, $\forall k$. Notice that after convergence, based on what was stated earlier,

$$\begin{bmatrix} \mathbf{w}_{kk}^{\infty} \\ \mathbf{g}_{k,-k}^{\infty} \end{bmatrix} = F_k(\mathbf{w}^\infty) \qquad (27)$$

or equivalently

$$\left[\mathbf{w}_{kk}^{\infty},\, \mathbf{g}_{k,-k}^{\infty}\right] = \arg\min_{\mathbf{w}_{kk},\, \mathbf{g}_{k,-k}} E\left\{ \left| d_k - \mathbf{w}_{kk}^H \mathbf{y}_k - \mathbf{g}_{k,-k}^H \mathbf{z}_{-k}^\infty \right|^2 \right\}. \qquad (28)$$
From the proof of convergence, one can also conclude that
convergence of the cost functions will be monotonic, when
sampled at the iteration steps in which node updates its
parameters. Indeed, whenever node optimizes its own local
MMSE problem, it also optimizes the corresponding MMSE
problem in node , at least when the latter is allowed to perform
a responding update of its parameter . This shows that the
algorithm is at least as fast as a centralized equivalent
that would use an alternating optimization (AO) technique
[16], which is often referred to as the nonlinear Gauss-Seidel
algorithm [17], with partitioning following directly from the
parameters and for each node.
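The monotone decrease under such block-wise exact minimization can be illustrated with a small numerical sketch (hypothetical data; the blocks A1 and A2 play the role of the per-node parameter partitions, which is an assumption for illustration only): alternating exact least-squares updates of two parameter blocks never increase the shared quadratic cost and approach the joint optimum.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
A1 = rng.standard_normal((N, 4))   # regressors "owned" by the first block
A2 = rng.standard_normal((N, 3))   # regressors "owned" by the second block
d = rng.standard_normal(N)
w1, w2 = np.zeros(4), np.zeros(3)

costs = []
for _ in range(6):
    # update block 1 by exact least squares, with block 2 held fixed
    w1, *_ = np.linalg.lstsq(A1, d - A2 @ w2, rcond=None)
    costs.append(np.sum((d - A1 @ w1 - A2 @ w2) ** 2))
    # update block 2 by exact least squares, with block 1 held fixed
    w2, *_ = np.linalg.lstsq(A2, d - A1 @ w1, rcond=None)
    costs.append(np.sum((d - A1 @ w1 - A2 @ w2) ** 2))

# jointly optimal least-squares cost, for comparison
A = np.column_stack([A1, A2])
w_opt, *_ = np.linalg.lstsq(A, d, rcond=None)
c_opt = np.sum((d - A @ w_opt) ** 2)
```

Each update solves a convex subproblem exactly, so the cost sequence is nonincreasing, mirroring the argument around (22).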
Proof of Optimality: We now prove that is the solution
of (1) for every node , which is equivalent to proving that the
gradient of is zero when evaluated at equilibrium, i.e.
(29)
Because the solution of (20) sets the partial gradient of with
respect to to zero, we find that
(30)
Since , we can show that
(31)
Combining (30) and (31) yields
(32)
Notice that (27) is equivalent to
(33)
Substituting (33) in (32) yields
(34)
which is equivalent to (29). This proves the theorem.
IV. DANSE WITH -CHANNEL BROADCAST SIGNALS
A. Algorithm
In the algorithm, each node broadcasts
-component compressed sensor signal obser-
vations to the other nodes. This compresses the data to be
sent by node by a factor of . We as-
sume that each node estimates a -channel desired signal
. Assuming that the desired signals
share a common -dimensional latent signal subspace, we
will show in Section IV-B that achieves the optimal
estimators if is chosen equal to . Notice that the actual
signal(s) of interest can be a subset of the vector , and the
other entries should then be seen as auxiliary channels to fully
capture the latent signal subspace, as explained in Section II-B.
Generally, these auxiliary channels are obtained by choosing
extra reference sensors at node .
Again, we use a linear estimator to estimate as
. The objective for node is to
find the linear MMSE estimator
(35)
The solution of (35) is
(36)
with . Again, we define a partitioning of the
estimator as with denoting
the submatrix of that is applied to . We wish
to obtain (36) without the need for each node to broadcast all
components of the observations. Instead each node
will broadcast observations of the -channel compressed signal
. Since the channels of will be highly corre-
lated, further joint compression is possible, but we will not take
this into consideration throughout this paper.
A node can transform the observations of that it receives
from node by a transformation matrix . Again,
it is noted that does not decompress the observations of
the signal , but makes new linear combinations of their
5284 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 10, OCTOBER 2010
components. The parametrization of the effectively applied
at node is then
(37)
which is a generalization of (6). Here, node can only optimize
the parameters and . We set
with denoting the identity matrix.
The -channel signal is a stacked ver-
sion of all the broadcast signals. Similarly to the notation in
Section III, we define the signal as the signal with
omitted, and we define as the matrix with the subma-
trix omitted. The MMSE problem that is solved at node ,
at iteration , is now
(38)
The solution of (38) is
(39)
with defined as in (10) and with
(40)
The algorithm consists of the following steps:
1) Initialize: , .
Initialize and with random matrices, .
2) Each node performs the following operation cycle:
Collect the sensor observations ,
.
Compress these -dimensional observations to
-dimensional vectors
(41)
Broadcast the compressed observations ,
, to the other nodes.
Collect the -dimensional data vectors
, , which are stacked
versions of the compressed observations received from
the other nodes.
Update the estimates of and , by including
the newly collected data.
Update the node-specific parameters:
if
if (42)
Compute the estimate of , ,
as
(43)
3) .
4) .
5) Return to step 2)
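The operation cycle above can be condensed into a simplified batch sketch (all variable names are hypothetical, the recursive estimation of the statistics is replaced by exact least squares on the full data record, and a well-conditioned random scenario satisfying the conditions of Theorem IV.1 is assumed): each node compresses its M-channel observations to Q channels, and the updating node solves its local LS problem on its own sensors stacked with the received compressed signals.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, Q, N = 3, 5, 2, 4000            # nodes, sensors per node, latent dim, samples
s = rng.standard_normal((Q, N))       # common Q-channel latent signal
d = [rng.standard_normal((Q, Q)) @ s for _ in range(K)]      # node-specific targets
y = [rng.standard_normal((M, Q)) @ s + 0.5 * rng.standard_normal((M, N))
     for _ in range(K)]               # local multichannel sensor observations
W = [rng.standard_normal((M, Q)) for _ in range(K)]          # local compressors

def local_solve(k):
    # stack node k's own sensors with the compressed broadcasts of the others
    z = np.vstack([W[j].T @ y[j] for j in range(K) if j != k])
    X = np.vstack([y[k], z])
    G, *_ = np.linalg.lstsq(X.T, d[k].T, rcond=None)
    return G, np.mean((d[k] - G.T @ X) ** 2)

for it in range(6 * K):               # sequential round-robin node updating
    k = it % K
    G, _ = local_solve(k)
    W[k] = G[:M, :]                   # node k refreshes its local compressor

# centralized LS estimator for node 1's problem, using all K*M sensor channels
Yall = np.vstack(y)
Wc, *_ = np.linalg.lstsq(Yall.T, d[0].T, rcond=None)
c_central = np.mean((d[0] - Wc.T @ Yall) ** 2)
c_danse = local_solve(0)[1]
```

In this sketch each broadcast carries Q channels instead of M, yet after a few update rounds the local cost matches the centralized LS cost, in line with the convergence claim for this setting.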
is a straightforward generalization of the
algorithm as explained in Section III-A, where all vector-vari-
ables are replaced by their matrix equivalent. Similarly, expres-
sions (16)–(19) can be straightforwardly generalized to their
matrix equivalent.
B. Convergence and Optimality of if and
Full Rank
We now assume that , , with a
matrix of rank and a complex valued -channel signal.
This means that all desired signals share the same -dimen-
sional latent signal subspace (i.e., ). Formula (36) shows
that in this case all have the same column space, i.e.
(44)
with . Therefore, the set be-
longs to the solution space used by , as specified by
(37), i.e., . The following
theorem generalizes Theorem III.1.
Theorem IV.1: If the sensor signal correlation matrix
has full rank, and if , , with a complex
valued -channel signal and a matrix of rank ,
then the algorithm converges for any initialization of
its parameters to the MMSE solution (36) for all .
Proof: The proof of Theorem III.1 can straightforwardly
be generalized to prove Theorem IV.1, by replacing every
and by its matrix version and .
In practice, the matrices should be well-conditioned to
obtain the optimal estimators, which is reflected in Theorem
IV.1 by the condition that has full rank. If the -channel
desired signal is defined as the target signal in reference
sensors at node , this matrix can be ill-conditioned if the refer-
ence sensors are close to each other. This problem is investigated
in [9], where the DANSE algorithm is used for noise reduction
in acoustic sensor networks, and a solution is proposed to tackle
this problem.
C. DANSE Under Rank Deficiency
Until now, we have avoided the case where does not
have full rank or when the parameter is overestimated, i.e.,
. Both cases can result in broadcast data for which the
correlation matrix is rank deficient.5 In this case, (38) becomes
ill-posed since singular correlation matrices are involved. The
algorithm can cope with these situations by adding
5In the case where , (44) has multiple solutions for since
, . Therefore, the correlation matrix of the broadcast
signal becomes singular, once the submatrix reaches this
rank deficiency.
a minimum-norm constraint to the local MMSE problems (38),
i.e., using the pseudo-inverse instead of a matrix inverse in the
computation of the solution of (38) [15]. Extensive simulations
have shown that with this modification, the algorithm
still converges to an MMSE solution for rank deficient estima-
tion problems (see Section VI).
However, if the matrix does not have full rank, the so-
lution of (1) is not unique. Simulations have shown that the
solutions obtained by the algorithm, although
leading to a minimal MSE cost at node , are generally different
from the solutions provided by the centralized minimum norm
version, i.e.
(45)
where superscript denotes the pseudoinverse.
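The role of the pseudoinverse here can be made concrete with a small sketch (hypothetical data): for a rank-deficient regressor, many estimators attain the same minimal residual, and the pseudoinverse selects the one with minimum norm, as used in (45).

```python
import numpy as np

rng = np.random.default_rng(1)
# regressor whose 6 channels span only a 3-dimensional subspace (rank deficient)
A = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 6))
d = rng.standard_normal(100)
w_mn = np.linalg.pinv(A) @ d          # minimum-norm least-squares solution
# adding any null-space direction of A leaves the residual unchanged
_, _, Vt = np.linalg.svd(A)
w_alt = w_mn + Vt[-1]                 # same fit, strictly larger norm
r_mn = np.linalg.norm(d - A @ w_mn)
r_alt = np.linalg.norm(d - A @ w_alt)
```

Both solutions yield the same fit, which illustrates why different nodes (or a centralized solver) may settle on different estimators with the same minimal MSE.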
V. IMPLEMENTATION ASPECTS
A. Estimation of the Signal Statistics
In the theoretical analysis of the algorithm, it is as-
sumed that the second order signal statistics, which are needed
to solve the MMSE problem (38) are perfectly known. How-
ever, in a practical application, the correlation matrices
and have to be estimated, based on the collected signal
observations. In this section, we will describe some strategies to
estimate these quantities.
Estimation of signal correlation matrices is typically done by
time averaging. This means that some assumptions are made on
short-term ergodicity and stationarity of the signals involved.
However, this stationarity assumption is not necessarily strict.
Even when the signals involved are nonstationary (such as in
speech processing), the algorithm can provide good
estimators. By using long-term correlation matrices, the influ-
ence of rapidly changing temporal statistics is smoothed out,
yielding estimators that mainly exploit the spatial coherence
between the sensors. Since spatial coherence typically changes
slowly, the algorithm is able to provide good estima-
tors, even when the signals themselves are highly nonstationary
(this is e.g., demonstrated by the multichannel speech enhance-
ment experiments in [9]).
We let denote the estimate of at time . Signal
correlation matrices are often estimated in practice by means of
a forgetting factor , i.e.
(46)
Notice that in the algorithm, the statistics change
every time a node updates its parameters. Therefore, (46) is not
suited to compute and , since it uses an infi-
nite time window. A better alternative is a simple time averaging
in a finite observation window, i.e.
(47)
where is the length of the observation window. The procedure
(46) puts more emphasis on the most recent samples, whereas
(47) applies an equal weight to all past samples in the obser-
vation window. The procedure (47) can be implemented recur-
sively by means of an updating and a downdating term, i.e.
(48)
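A minimal sketch of this finite-window estimator (hypothetical dimensions and data; the update/downdate recursion corresponds to (48) and is verified against the direct average of (47)):

```python
import numpy as np

rng = np.random.default_rng(2)
M, L, T = 4, 50, 200                  # channels, window length, total samples
y = rng.standard_normal((M, T))

# initial estimate over the first full window of L samples
R = sum(np.outer(y[:, t], y[:, t]) for t in range(L)) / L
for t in range(L, T):
    # recursive form: add the newest outer product (update),
    # drop the oldest one (downdate)
    R += (np.outer(y[:, t], y[:, t]) - np.outer(y[:, t - L], y[:, t - L])) / L

# direct finite-window average over the last L samples, for comparison
R_direct = y[:, T - L:] @ y[:, T - L:].T / L
```

A forgetting-factor variant in the style of (46) would instead compute R = lam * R + (1 - lam) * np.outer(y[:, t], y[:, t]), which weights recent samples more but never fully discards old ones.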
Notice that the window length introduces a trade-off between
tracking performance and estimation performance. Indeed, to
have a fast tracking, the statistics must be estimated from short
signal segments, yielding larger estimation errors in the correla-
tion matrices that are used to compute the estimators at the dif-
ferent nodes. However, as will be demonstrated in Section VI-B,
the algorithm is more robust to these errors, com-
pared to the equivalent centralized algorithm, due to the fact
that uses correlation matrices with smaller dimen-
sions than the network-wide estimation problem.
The estimation of is less straightforward since the
signal cannot be observed directly. However, depending on
the application and the signals involved, some strategies can be
developed to estimate , as explained in the following two
examples.
If the transmitting sources are controlled by the application
itself, as it is the case in a communications scheme, the source
signals that define the different channels in can be manipu-
lated directly. At periodic intervals, a deterministic training se-
quence can be broadcast by the transmitters. If the nodes have
knowledge about these training sequences, they can use this to
compute in a similar way as in (48), during the broad-
cast of these training sequences. After the broadcast, the esti-
mate is fixed until new training sequences are broadcast.
A different strategy can be applied if the desired signal has
an ON-OFF behavior.6 Assume that the sensor signals in consist
of a desired component and an additive noise component
, i.e., , where has an ON-OFF behavior, and where
then . In many practical applications, it can
also be assumed that and are independent, and therefore7
(49)
If there is a detection mechanism available that detects whether
the signal is present or not, one can estimate in
time segments where only noise is observed (“noise-only seg-
ments”). Since the noise is uncorrelated to the desired compo-
nent , we find that
(50)
with
(51)
where is the desired component in the signal . The se-
lection matrix is used to select the first columns corre-
6This is often used in speech enhancement applications, since a speech signal
typically contains a lot of silent pauses in between words or sentences.
7For the sake of an easy exposition, we assume that the signals and have
zero mean.
sponding to . Define the noise correlation
matrix
(52)
where denotes the noise component in the signal . With
(50), and similarly to (49), we readily find that
(53)
Using (53), one can compute as the difference between
and , where the latter is computed as in (48),
during noise-only periods.
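This subtraction strategy can be sketched as follows (a hypothetical single-source scenario with an ideal detection mechanism; the mixing vector and noise level are illustrative assumptions): the sensor correlation matrix is estimated during active segments, the noise correlation matrix during noise-only segments, and their difference recovers the desired-signal statistics.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 4, 20000
a = np.array([1.0, 0.5, -0.3, 0.8])   # hypothetical mixing (steering) vector
s = rng.standard_normal(N)
s[: N // 2] = 0.0                     # desired source is OFF in the first half
y = np.outer(a, s) + 0.5 * rng.standard_normal((M, N))
active = s != 0.0                     # ideal ON-OFF detection (perfect VAD)
R_yy = y[:, active] @ y[:, active].T / active.sum()       # signal-plus-noise
R_nn = y[:, ~active] @ y[:, ~active].T / (~active).sum()  # noise only
R_dd = R_yy - R_nn                    # estimated desired-signal correlation
R_true = np.outer(a, a) * s[active].var()
```

Because the desired component and the noise are uncorrelated, the subtraction isolates the rank-one desired-signal statistics up to estimation error.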
Notice that, even if the target signal does not have this ON-OFF
behavior, the above strategy can be used in a semi-adaptive con-
text, i.e., where the target signal statistics may change but the
noise statistics are static and a priori known (or vice versa). In-
deed, if is known, then (53) can be used to compute
the required statistics. Notice that in (53) is a compressed
version of , i.e., it depends on the current parameters
in . Therefore, each node has to broadcast the entries of
, which are needed in the other nodes to compress the cor-
responding submatrices in . Since these values change
only once for each observations that are collected by the
sensors, the resulting increase in bandwidth is negligible com-
pared to the transmission of the samples of .
B. Computational Complexity
The estimation of the correlation matrices and ,
and the inversion of the former, are the most computationally
expensive steps of the algorithm. From (48) it fol-
lows that an update of at node , has a computational
complexity of
(54)
i.e., it is quadratic in the number of nodes , the number of
channels in the broadcast signals, and the number of channels
of the signal . If node updates its parameters and
according to (39), it performs a matrix inversion, which is
computationally more expensive than (54). However, instead of
computing this inversion, node can directly update the inverse
of at each time by means of the matrix inversion
lemma [15], i.e.
(55)
(56)
This update also has computational complexity (54), and
therefore this is the overall complexity for a single node in the
algorithm.
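The recursive inverse update based on the matrix inversion lemma can be sketched as follows (hypothetical dimensions and forgetting factor; the Sherman-Morrison identity replaces the explicit inversion in (55)-(56)):

```python
import numpy as np

rng = np.random.default_rng(4)
M = 5
lam = 0.95                            # forgetting factor (assumed value)
R = np.eye(M)
P = np.linalg.inv(R)                  # running estimate of the inverse of R
for _ in range(100):
    v = rng.standard_normal((M, 1))   # new observation vector
    R = lam * R + v @ v.T             # exponentially weighted correlation update
    # Sherman-Morrison: inverse of (lam * R_old + v v^T) without a full inversion
    P = P / lam
    P = P - (P @ v) @ (v.T @ P) / (1.0 + (v.T @ P @ v).item())
```

Each rank-one inverse update costs O(M^2) instead of the O(M^3) of a fresh inversion, which is why the per-node complexity stays at the order of (54).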
VI. NUMERICAL SIMULATIONS
In this section, we provide simulation results to demonstrate
the behavior of the algorithm. In Section VI-A, we
perform batch mode simulations where the required statistics
are computed over the full length signals, and where the ’s are
available8 to compute . In the batch version of ,
all iterations are performed on the same set of signal observa-
tions. In Section VI-B, a more practical scenario with moving
sources is considered. The algorithm adapts to the
changes in the scenario, and each set of observations is only
broadcast once, i.e., subsequent iterations are performed over
different observation sets. Furthermore, a practical estimation
of the correlation matrices is used, where the ’s are assumed
to be unavailable.
A. Batch Mode Simulations
In this section, we simulate the algorithm in batch
mode. This means that all iterations are performed on the full
signal length. The network consists of four nodes , each
having 10 sensors . The dimension of the latent signal
subspace defined by is . All 3 channels of are uni-
formly distributed random processes on the interval [ ]
from which samples are generated. The coefficients
in are generated by a uniform random process on the unit in-
terval. The sensor signals in consist of the different random
mixtures of the latent -channel signal to which zero-mean
white noise is added with half the power of the channels of .
The initial values of all and are taken from a uniform
random distribution on the unit interval.
The batch mode performance of the algorithm as
well as the algorithm is simulated for this particular
scenario. All evaluations of the MSE cost functions are per-
formed on the equivalent least-squares (LS) cost functions, i.e.
(57)
Also, the correlation matrices are replaced by their least squares
equivalent, i.e., is replaced by where denotes
the sample matrix that contains samples of the variable
in its columns.
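The equivalence between the MMSE formulation with sample correlation matrices and the LS formulation used here can be verified with a short sketch (hypothetical data): solving the normal equations built from the sample statistics yields the same estimator as a direct least-squares fit on the sample matrix.

```python
import numpy as np

rng = np.random.default_rng(6)
M, N = 6, 1000
Y = rng.standard_normal((M, N))       # sample matrix: one channel per row
d = rng.standard_normal(N)            # samples of the desired signal
R_hat = Y @ Y.T / N                   # least-squares equivalent of R_yy
r_hat = Y @ d / N                     # least-squares equivalent of r_yd
w_wiener = np.linalg.solve(R_hat, r_hat)
# identical to solving the over-determined system Y^T w = d in the LS sense
w_ls, *_ = np.linalg.lstsq(Y.T, d, rcond=None)
```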
The results are illustrated in Fig. 4, showing the LS cost of
node 1 versus the iteration index . Node 1 is the first node
that performs an update. It is observed that the al-
gorithm converges to the optimal linear LS solution, whereas
the algorithm does not since in this case.
Downsampling the curve corresponding to by a factor
, keeping only the iterations in which node 1 updates its
parameters, results in a monotonically decreasing cost. This is
because of expression (22), showing that the cost indeed mono-
tonically decreases whenever a node optimizes its parame-
ters. If the curve corresponding to is downsampled
8This is similar to using a priori known training sequences.
Fig. 4. LS error of node 1 versus iteration for four different scenarios in a
network with nodes. Each node has 10 sensors.
Fig. 5. LS error of node 1 versus iteration for networks with ,
and nodes respectively. Each node has 10 sensors.
with the same factor, we do not obtain a monotonically de-
creasing cost, since expression (22) is not valid anymore for this
case.
In Fig. 5, we vary the number of nodes , keeping all other
parameters unchanged. All nodes again have 10 sensors. Not
surprisingly, the convergence time of increases lin-
early with since the effective number of updates per time unit
in node 1 is reduced. As soon as each node has updated its pa-
rameters three times, the cost is almost at its minimum at each
node.
In Fig. 6(a), we increase the value of while keeping
. Notice that this corresponds to the case where is
overestimated and hence communication bandwidth is used
inefficiently. The estimation problem becomes rank deficient in
this case, and so the algorithm should be modified by replacing
matrix inversions by pseudoinversions (see Section IV-C). The
algorithm still converges, and the optimal LS cost is again
reached after three iterations per node when is overesti-
mated. In Fig. 6(b), we increase the value of together with
, keeping . This is again observed to have a negligible
effect on convergence time.
As a general conclusion, we can state that for all settings
of the parameters , , , the algorithm approxi-
mately achieves convergence as soon as each node has updated
its parameters three times.
Simulation results with speech signals are provided in a
follow-up paper [9]. In this paper, a distributed speech enhance-
ment algorithm based on and its variations, is tested
in a simulated acoustic sensor network scenario.
B. Adaptive Implementation
In this section, we show simulation results of a practical
implementation of the algorithm in a scenario with
moving sources. The main difference with the batch mode
simulations is that subsequent iterations are now performed
on different signal segments, i.e., the same data is never used
twice. This yields larger estimation errors, since shorter signal
segments are used to estimate the statistics of the input signals.
Furthermore, we will use a practical estimation procedure to
estimate the correlation matrices and , yielding
larger estimation errors.
The scenario is depicted in Fig. 7. The network contains
nodes . Each node has a reference sensor at the node itself,
and can collect observations of five additional sensors that
are uniformly distributed within a 1.6-m radius around the node.
Eight localized white Gaussian noise sources are present.
Two target sources move back and forth over the indicated
straight lines at a speed of 1 m/s, and halt for 2 s at the end points
of these lines. The first source (moving on the vertical line)
transmits a low-pass filtered white noise signal with a cut-off
frequency of 1600 Hz. The other source transmits a band-pass
filtered white noise signal in the frequency range from 1600 to
3200 Hz. Both target sources have an ON-OFF behavior with a
period of 0.2 s and both are active 66% of the time. It is assumed
that at each time , all nodes can detect whether the sources are
active or not. The time between two consecutive updates is 0.4 s,
which corresponds to two ON-OFF cycles of the target sources.
This means that, every 0.4 s, the iteration index changes to
. The sensors observe their signals at a sampling frequency
of .
The target source signals have half the power of the noise
sources. In addition to the spatially correlated noise, indepen-
dent white Gaussian sensor noise is added to each sensor signal.
This noise component is 10% of the power of the localized
noise signals. The individual signals originating from the target
sources and the noise sources that are collected by a specific
sensor are attenuated in power and summed. The attenuation
factor of the signal power is , where denotes the distance
between the source and the sensor. We assume that there is no
time delay in the transmission path between the sources and the
Fig. 6. LS error of node 1 versus iteration in a network with nodes. Each node has 10 sensors. (a) Different values of , keeping and (b) different
values of .
Fig. 7. Description of the simulated scenario. The network contains four nodes
, each node collecting observations in a cluster of six sensors . One
sensor of each cluster is positioned at the node itself. Two target sources are
moving over the indicated straight lines. Eight noise sources are present .
sensors.9 Each node collects six sensor signal observations, and
uses five differently delayed versions of each of these signals in
its estimation process to exploit the temporal correlation in the
target source signals. This means that .
We let denote the signal that is collected at the reference
sensor of node . It consists of an unknown mixture of the
two target source signals, and a noise component , i.e.
(58)
9Since the time delays are the same for all sensors, the spatial information
is purely energy based in this case. Therefore, the nodes cannot perform any
beamforming towards specific locations by exploiting different delay paths be-
tween sources and sensors.
where is the two-channel signal containing the two target
source signals, and where denotes an unknown mixture
vector. The goal for node is to estimate the signal , i.e., the
target source component in its reference sensor. Since ,
the algorithm is used, and therefore an auxiliary de-
sired channel is used to obtain a two-channel desired signal at
every sensor. The auxiliary channel of consists of the target
source component in the signal that is collected by an-
other sensor of node . This component consists of another un-
known mixture of the target sources, so that the conditions of
Theorem IV.1 are satisfied.
The correlation matrix is computed according to
(53). The estimates and are computed sim-
ilarly to (48) with a window length of and
, respectively, which matches the time between two con-
secutive updates.
We will use the signal-to-error ratio (SER) as a measure to as-
sess the performance of the estimators. The instantaneous SER
for node at time and iteration is computed over 3200 sam-
ples, and is defined as
(59)
where denotes the first column of the estimator , as
defined in (37). Notice that this is the estimator that is of ac-
tual interest, since it estimates the desired component in
the reference sensor. The other column of is viewed as an
auxiliary estimator that is used for the generation of the second
channel of the broadcast signal .
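A minimal sketch of this SER computation (hypothetical signals; the desired component and its estimate are simulated directly, and the window length matches the 3200 samples used here):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 3200                              # samples per SER evaluation window
d = rng.standard_normal(N)            # desired component at the reference sensor
d_hat = d + 0.1 * rng.standard_normal(N)   # estimate with a small residual error
# SER in dB: desired-signal power over estimation-error power
ser_db = 10.0 * np.log10(np.sum(d ** 2) / np.sum((d - d_hat) ** 2))
```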
Fig. 8 shows the SER of the four nodes at different time in-
stants. Dashed vertical lines are plotted to indicate the points in
time where both sources start moving, and full vertical lines in-
dicate when they stop moving. The sources stand still in the time
intervals [0–4] s, [10–12] s, and [18–20] s. The performance is
Fig. 8. SER versus time at the four nodes depicted in Fig. 7. The centralized version is added as a reference. Window lengths are and .
compared to the centralized version, in which all sensor signals
are centralized in a single fusion center that computes the op-
timal estimators according to (2).
In the first 4 s, both sources stand still. The algo-
rithm needs some time to reach a good estimator at each node
(about 2 s), whereas the centralized algorithm converges much
faster. This is because the algorithm updates its nodes
one at a time, with 0.4 s in between two consecutive updates.
The centralized algorithm on the other hand, can update its es-
timators every time a new sample is collected. After a number
of iterations however, the algorithm converges to the
optimal estimators.
Not surprisingly, it is observed that the centralized algorithm
has better tracking capabilities than the algorithm.
This is again a consequence of the fact that the centralized
version computes a new estimator each time a new sample is
collected, yielding a much faster convergence. However, the
algorithm is able to react to changes in the scenario
and always regains optimality after a number of iterations.
Notice that, once the algorithm has converged, it
outperforms the centralized algorithm. This can be explained
by the fact that the algorithm uses correlation ma-
trices with smaller dimension compared to the correlation ma-
trices that are used by the centralized algorithm. Small ma-
trices are generally better conditioned and have a smaller es-
timation error than larger matrices. This performance increase
of compared to its centralized version is observed
to become more significant when the number of sensors in-
creases, yielding larger matrices, or when the window length
decreases, yielding larger estimation errors in the correla-
tion matrices. Fig. 9 shows the performance of and
its centralized version, now with window lengths
and , i.e., roughly half the sizes of the first ex-
periment. It is observed that the estimation performance of the
centralized algorithm significantly decreases compared to the
first experiment, whereas the algorithm is less influ-
enced by the short window length. This observation demon-
strates that is more robust to estimation errors in the
correlation matrices compared to its centralized equivalent. No-
tice that converges much faster in the second exper-
iment, since the time between two consecutive updates is now
0.2 s instead of 0.4 s, due to the shorter window lengths. As al-
ready mentioned in Section V, this faster tracking comes with
the drawback that the estimation performance decreases due to
larger errors in the estimation of the correlation matrices.
In [14], a modified algorithm is studied, where
an improved tracking performance is obtained, by letting nodes
update simultaneously.
VII. CONCLUSION
In this paper, we have introduced a distributed adaptive al-
gorithm for linear MMSE estimation of node-spe-
cific signals in a fully connected broadcasting sensor network,
Fig. 9. SER versus time at the four nodes depicted in Fig. 7. The centralized version is added as a reference. Window lengths are and .
where each sensor node collects multichannel sensor signal ob-
servations. The algorithm significantly compresses the data to
be broadcast, and the computational load is shared amongst
the nodes. It is shown that, if the node-specific desired sig-
nals share a common low-dimensional latent signal subspace,
converges and provides the optimal linear MMSE
estimator for every node-specific estimation problem, as if all
nodes have access to all the sensor signals in the network. Sim-
ulations demonstrate that the algorithm achieves the same per-
formance as a centralized algorithm. A practical adaptive imple-
mentation of the algorithm is described and simulated, demon-
strating the tracking capabilities of the algorithm in a dynamic
scenario. It is observed that the algorithm is more ro-
bust to estimation errors in the correlation matrices, compared to
its centralized equivalent. In this paper, we have only considered
the case where nodes update their parameters in a sequential
round robin fashion. A modified algorithm is studied
in a companion paper [14], where an improved tracking perfor-
mance is obtained, by letting nodes update simultaneously.
ACKNOWLEDGMENT
The authors would like to thank B. Cornelis and the anony-
mous reviewers for their valuable comments after proof-reading
this paper.
REFERENCES
[1] D. Estrin, L. Girod, G. Pottie, and M. Srivastava, “Instrumenting the
world with wireless sensor networks,” in Proc. 2001 IEEE Int. Conf.
Acoust., Speech, Signal Processing (ICASSP ’01), 2001, vol. 4, pp.
2033–2036.
[2] C. G. Lopes and A. H. Sayed, “Incremental adaptive strategies over
distributed networks,” IEEE Trans. Signal Processing, vol. 55, pp.
4064–4077, Aug. 2007.
[3] C. G. Lopes and A. H. Sayed, “Diffusion least-mean squares over adap-
tive networks: Formulation and performance analysis,” IEEE Trans.
Signal Processing, vol. 56, pp. 3122–3136, Jul. 2008.
[4] F. Cattivelli, C. G. Lopes, and A. H. Sayed, “Diffusion recursive least-
squares for distributed estimation over adaptive networks,” IEEE Trans.
Signal Processing, vol. 56, pp. 1865–1877, May 2008.
[5] I. Schizas, G. Giannakis, and Z.-Q. Luo, “Distributed estimation using
reduced-dimensionality sensor observations,” IEEE Trans. Signal Pro-
cessing, vol. 55, pp. 4284–4299, Aug. 2007.
[6] Z.-Q. Luo, G. Giannakis, and S. Zhang, “Optimal linear decentralized
estimation in a bandwidth constrained sensor network,” in Proc. 2005
Int. Symp. Inf. Theory (ISIT), Sept. 2005, pp. 1441–1445.
[7] K. Zhang, X. Li, P. Zhang, and H. Li, “Optimal linear estimation fu-
sion—Part VI: Sensor data compression,” in Proc. 2003 Sixth Int. Conf.
Inf. Fusion, 2003, vol. 1, pp. 221–228.
[8] Y. Zhu, E. Song, J. Zhou, and Z. You, “Optimal dimensionality re-
duction of sensor data in multisensor estimation fusion,” IEEE Trans.
Signal Processing, vol. 53, pp. 1631–1639, May 2005.
[9] A. Bertrand and M. Moonen, “Robust distributed noise reduction in
hearing aids with external acoustic sensor nodes,” EURASIP J. Adv.
Signal Process., vol. 2009, Article ID 530435, 2009,
doi:10.1155/2009/530435.
[10] S. Doclo, T. van den Bogaert, M. Moonen, and J. Wouters, “Reduced-
bandwidth and distributed MWF-based noise reduction algorithms for
binaural hearing aids,” IEEE Trans. Audio, Speech, Language Process.,
vol. 17, pp. 38–51, Jan. 2009.
[11] T. Klasen, T. Van den Bogaert, M. Moonen, and J. Wouters, “Binaural
noise reduction algorithms for hearing aids that preserve interaural time
delay cues,” IEEE Trans. Signal Processing, vol. 55, pp. 1579–1585,
April 2007.
[12] S. Doclo, T. Klasen, T. Van den Bogaert, J. Wouters, and M. Moonen,
“Theoretical analysis of binaural cue preservation using multi-channel
Wiener filtering and interaural transfer functions,” in Proc. Int. Work-
shop Acoust. Echo Noise Contr. (IWAENC), Paris, France, Sep. 2006.
[13] A. Bertrand and M. Moonen, “Distributed adaptive estimation of cor-
related node-specific signals in a fully connected sensor network,” in
Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP),
Apr. 2009, pp. 2053–2056.
[14] A. Bertrand and M. Moonen, “Distributed adaptive node-specific signal
estimation in fully connected sensor networks—Part II: Simultaneous
and asynchronous node updating,” IEEE Trans. Signal Process., vol.
58, no. 10, pp. 5292–5306, Oct. 2010.
[15] G. H. Golub and C. F. van Loan, Matrix Computations, 3rd ed. Bal-
timore, MD: The Johns Hopkins University Press, 1996.
[16] J. C. Bezdek and R. J. Hathaway, “Some notes on alternating optimiza-
tion,” in Advances in Soft Computing. Berlin, Germany: Springer,
2002, pp. 187–195.
[17] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Compu-
tation: Numerical Methods. Belmont, MA: Athena Scientific, 1997.
Alexander Bertrand (S’08) was born in Roeselare,
Belgium, in 1984. He received the M.Sc. degree in
electrical engineering from Katholieke Universiteit
Leuven, Belgium, in 2007.
He is currently pursuing the Ph.D. degree with
the Electrical Engineering Department (ESAT),
Katholieke Universiteit Leuven, and was supported
by a Ph.D. grant of the Institute for the Promotion
of Innovation through Science and Technology in
Flanders (IWT-Vlaanderen). His research interests
are in multichannel signal processing, ad hoc sensor
arrays, wireless sensor networks, distributed signal enhancement, speech
enhancement, and distributed estimation.
Marc Moonen (M’94–SM’06–F’07) received the
electrical engineering degree and the Ph.D. degree
in applied sciences from Katholieke Universiteit
Leuven, Belgium, in 1986 and 1990, respectively.
Since 2004, he has been a Full Professor with
the Electrical Engineering Department, Katholieke
Universiteit Leuven, where he heads a research
team working in the area of numerical algorithms
and signal processing for digital communications,
wireless communications, DSL, and audio signal
processing.
Dr. Moonen received the 1994 KU Leuven Research Council Award, the 1997
Alcatel Bell (Belgium) Award (with P. Vandaele), the 2004 Alcatel Bell (Bel-
gium) Award (with R. Cendrillon), and was a 1997 “Laureate of the Belgium
Royal Academy of Science.” He received a journal Best Paper award from the
IEEE TRANSACTIONS ON SIGNAL PROCESSING (with G. Leus) and from Elsevier
Signal Processing (with S. Doclo). He was chairman of the IEEE Benelux Signal
Processing Chapter (1998–2002), and is currently Past-President of the European Association for Signal Processing (EURASIP) and a member of the IEEE Signal Processing Society Technical Committee on Signal Processing for Communications. He served as Editor-in-Chief for the EURASIP Journal on Applied Signal Processing (2003–2005), and has been a member of the editorial board of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II (2002–2003), the IEEE SIGNAL PROCESSING MAGAZINE (2003–2005), and Integration, the VLSI Journal. He is currently a member of the editorial board of the EURASIP Journal on Advances in Signal Processing, the EURASIP Journal on Wireless Communications and Networking, and Signal Processing.
... The newly presented CE and RC methods will first be derived in a centralized setting, i.e., as if all antenna signals serving a certain UE are gathered in one point of the network. As this requires a large communication bandwidth and furthermore introduces a single point of failure, putting a heavy load on the backhaul network [35], a distributed algorithm will be presented, based on existing algorithms in wireless sensor networks [36], [37], [38], [39]. The channel estimates can then be obtained by processing the local antenna signals in each AP, together with compressed versions of all the other antenna signals of the APs serving the UE, and computing a few extra parameters using in-network sums. ...
... R_L-dimensional version of the antenna signals of the other APs. The algorithm is referred to as the rank-{R_B, R_L} GEVD-based distributed user-centric CE and RC (DUCERC) algorithm and can be related to the distributed algorithms referred to as DANSE, TI-DANSE and GEVD-DANSE of [36], [37], [38], designed for wireless sensor networks. The DUCERC algorithm has a slower adaptation and tracking speed compared to the centralized GEVD-based CE and RC, as it requires multiple iterations on the same data to converge to a stationary solution. ...
... In one iteration n, each AP compresses the received signals y_l^t during the pilot phase with F_kl^n ∈ C^(N×R_L) to obtain χ_kl^t as in (67). Using in-network sums and after subtracting its own compressed channel as in (68), it receives a new ... (Footnote 7: Instead of round-robin updates, asynchronous updating as discussed in [36] can also be performed, but this is not discussed here for ease of exposition.) ...
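The in-network summing step sketched in this excerpt — each AP compresses locally, the network computes one aggregate, and each AP subtracts its own contribution — can be illustrated with a small toy example. The names `chi`, `total`, and `others` are illustrative, not taken from the cited paper:

```python
import numpy as np

rng = np.random.default_rng(0)

K, R_L = 4, 2                        # number of APs, compressed dimension (assumed)
chi = rng.standard_normal((K, R_L))  # chi[k]: AP k's compressed local signal

# In-network sum: one aggregate is computed and shared with all APs,
# instead of every AP exchanging its signal with every other AP.
total = chi.sum(axis=0)

# Each AP recovers the sum of the *other* APs' compressed signals by
# subtracting its own contribution from the shared aggregate.
others = np.stack([total - chi[k] for k in range(K)])
```

This is why the communication cost scales with the aggregate size rather than with the number of APs.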
Article
Full-text available
Cell-free massive MIMO (CFmMIMO) is considered as one of the enablers to meet the demand for increasing data rates of next generation (6G) wireless communications. In user-centric CFmMIMO, each user equipment (UE) is served by a user-selected set of surrounding access points (APs), requiring efficient signal processing algorithms minimizing inter-AP communications, while still providing a good quality of service to all UEs. This paper provides algorithms for channel estimation (CE) and uplink (UL) receive combining (RC), designed for CFmMIMO channels using different assumptions on the structure of the channel covariances. Three different channel models are considered: line-of-sight (LoS) channels, non-LoS (NLoS) channels (the common Rayleigh fading model) and a combination of LoS and NLoS channels (the general Rician fading model). The LoS component introduces correlation between the channels at different APs that can be exploited to improve the CE and the RC. The channel estimates and receive combiners are obtained in each AP by processing the local antenna signals of the AP, together with compressed versions of all the other antenna signals of the APs serving the UE, during UL training. To make the proposed method scalable, the distributed user-centric channel estimation and receive combining (DUCERC) algorithm is presented that significantly reduces the necessary communications between the APs. The effectiveness of the proposed method and algorithm is demonstrated via numerical simulations.
... One of the main challenges is the need for a distributed strategy which does not rely on a fusion center as most of classic beamformers do. Distributed algorithms have been proposed for speech enhancement in ad-hoc microphone arrays [1,2,3,4] and recently, a deep neural network (DNN)-based distributed solution has also been introduced to combine the increased modelling capacity of DNNs with the flexibility of use of ad-hoc microphone arrays [5]. Besides, because the microphones embedded in different devices do not share the same hardware, they are acquired at different sampling rates (SRs) (even if the nominal SR is the same), causing a sampling rate offset (SRO), and triggered at different starting times, causing a sampling time offset (STO). ...
... The data used to train and evaluate our systems is extracted from the DISCO dataset. Room impulse responses in shoebox-shaped rooms are simulated. The rooms have a length, width and height randomly picked in the ranges [3, 8] m, [3, 5] m and [2, 3] m, respectively. ...
Preprint
Full-text available
Speech enhancement in ad-hoc microphone arrays is often hindered by the asynchronization of the devices composing the microphone array. Asynchronization comes from sampling time offset and sampling rate offset, which inevitably occur when the microphones are embedded in different hardware components. In this paper, we propose a deep neural network (DNN)-based speech enhancement solution that is suited for applications in ad-hoc microphone arrays because it is distributed and copes with asynchronization. We show that asynchronization has a limited impact on the spatial filtering and mostly affects the performance of the DNNs. Instead of resynchronizing the signals, which requires costly processing steps, we use an attention mechanism which makes the DNNs, and thus our whole pipeline, robust to asynchronization. We also show that the attention mechanism leads to the asynchronization parameters in an unsupervised manner.
... Smart devices are often equipped with one or more microphones [1], which enable advanced audio signal processing [2], such as spatial filtering [3,4], source localization [5][6][7], and distributed signal enhancement [2]. When several devices with independent sampling clocks are distributed in space and interconnected using wireless data transmission, the setup is referred to as a Wireless Acoustic Sensor Network (WASN) [1]. ...
Article
Full-text available
The network of distributed microphone arrays is usually established in an ad hoc manner; hence, network parameters such as the mutual positioning and rotation of the arrays, the positions of sources, and the synchronization of their recording onset times are initially unknown. In this article, we consider the problem of passively jointly self-calibrating and synchronizing distributed arrays in reverberant rooms. We use a typical two-step approach where, initially, the relative geometry of the network is estimated using Direction of Arrival (DoA) measurements. Subsequently, the absolute scale and synchronization parameters are estimated using Time Difference of Arrival (TDoA) measurements. This article presents methods to improve the robustness and accuracy of the estimation of the absolute geometric scaling and synchronization parameters in reverberant conditions, in which TDoA measurements do not follow a normal distribution and, furthermore, outliers often occur. To remedy these issues, we propose a Weighted Least Squares (WLS) estimator and a scheme for weighting the TDoA measurements to increase the estimation accuracy from heteroscedastic TDoA measurements. In addition, we propose an iterative reweighting algorithm with a binary weight to detect and reject TDoA outliers, which exploits the residuals of the parametric model in the least absolute value minimization. A numerical evaluation shows significant improvements of the proposed method over the state of the art in terms of the relative scaling error and the mean absolute value of the synchronization parameters.
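The binary-weight reweighting idea described in this abstract — fit, inspect residuals, give outliers weight 0, refit — can be sketched on a generic linear model. The model, thresholds, and variable names below are my own illustrative choices, not the authors' actual TDoA estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic measurements y = 2x + 1 plus small noise, with three gross
# outliers injected (illustrative stand-in for corrupted TDoA readings).
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)
y[[5, 20, 35]] += 15.0

A = np.column_stack([x, np.ones_like(x)])
w = np.ones(len(y))                # binary weights: 1 = keep, 0 = reject

for _ in range(5):
    # Least-squares fit restricted to the currently accepted measurements.
    theta, *_ = np.linalg.lstsq(A[w == 1], y[w == 1], rcond=None)
    # Reject measurements whose residual exceeds a robust (MAD-based)
    # threshold; rejected points get binary weight 0 in the next refit.
    r = np.abs(y - A @ theta)
    mad = np.median(np.abs(r - np.median(r)))
    w = (r < 3.0 * 1.4826 * mad + 1e-12).astype(float)
```

After a couple of iterations the injected outliers carry weight 0 and the fitted parameters return close to (2, 1).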
... Because of bandwidth or administrative reasons, merging them in a central computing node from those sources might not be feasible (Hu et al., 2019). In some applications, data come naturally feature-distributed, such as in wireless sensor networks (Bertrand and Moonen, 2010, 2015). ...
Preprint
Full-text available
Feature-distributed data, referred to data partitioned by features and stored across multiple computing nodes, are increasingly common in applications with a large number of features. This paper proposes a two-stage relaxed greedy algorithm (TSRGA) for applying multivariate linear regression to such data. The main advantage of TSRGA is that its communication complexity does not depend on the feature dimension, making it highly scalable to very large data sets. In addition, for multivariate response variables, TSRGA can be used to yield low-rank coefficient estimates. The fast convergence of TSRGA is validated by simulation experiments. Finally, we apply the proposed TSRGA in a financial application that leverages unstructured data from the 10-K reports, demonstrating its usefulness in applications with many dense large-dimensional matrices.
Article
In acoustic sensor networks (ASNs), the desired speech signal is commonly corrupted by reverberation and background noise. To solve this problem, a distributed Kalman filtering method for joint dereverberation and noise reduction is proposed in this paper. Specifically, the Kalman filter with dereverberation and noise reduction is first introduced for one node in ASNs, where a multi-channel linear prediction filter and a sidelobe-cancellation filter are employed and jointly estimated by a single Kalman filter. Then, the local distributed Kalman filter (localDKF) for joint dereverberation and noise reduction is presented in ASNs, where nodes exchange only their measurements, and every node obtains an estimate of the speech source based on its local observations and interaction with its neighbors. Finally, to enable nodes to communicate with their neighbors in an isotropic manner, the diffusion-based distributed Kalman filter (diffDKF) approach is proposed by fusing all available information among nodes. The proposed method jointly performs dereverberation and noise reduction in a fully distributed fashion by communicating only with neighboring nodes. Experimental results show the validity of the proposed method in noisy and reverberant ASNs.
Article
Full-text available
This paper presents two super-Gaussian-based multimicrophone maximum a posteriori (MAP) estimators which exploit both the amplitude and phase of the speech signal from noisy observations. It is well known that super-Gaussian distributions model the statistical properties of speech signals more accurately. Under the independent Gaussian statistical assumption for the noise signals, which is usually valid in wireless acoustic sensor networks, two joint multimicrophone estimators are derived while the speech signal is modeled by a super-Gaussian distribution. Since the microphones are distributed randomly and may also belong to different devices, the independence assumption on the noise signals is more reasonable in these networks. The performance of the proposed estimators is compared to that of four baseline estimators; the first is the multimicrophone minimum mean square error (MMSE) estimator, where both amplitude and phase are derived assuming Gaussian properties for the speech signal. The second baseline is the multimicrophone MAP-based amplitude estimator, which utilizes super-Gaussian statistics to obtain only the amplitude of the speech and keeps the phase unchanged. As the third one, we have considered a minimum variance distortionless response filter followed by a super-Gaussian MMSE estimator. We have also compared the performance of the proposed estimators with the centralized multichannel Wiener filter. The simulation experiments demonstrate the remarkable ability of the proposed estimators to enhance speech quality and intelligibility when the clean speech is degraded by a mixture of both point-source interference and additive noise in reverberant environments.
Article
Full-text available
In this paper a theoretical analysis of the binaural cue preservation of the multi-channel Wiener filter (MWF) is performed. We will prove that in the case of a single speech source the MWF perfectly preserves the binaural cues of the speech component, but changes the binaural cues of the noise component to the cues of the speech component. In addition, we show that by extending the MWF cost function with terms related to the interaural transfer function it is possible to preserve the binaural cues of both the speech and the noise component, without considerably reducing the noise reduction performance.
Conference Paper
Full-text available
Consider a bandwidth-constrained sensor network in which a set of distributed sensors and a fusion center (FC) collaborate to estimate an unknown vector. Due to power and cost limitations, each sensor must compress its data in order to minimize the amount of information that needs to be communicated to the FC. In this paper, we consider the design of a linear decentralized estimation scheme (DES) whereby each sensor transmits over a noisy channel to the FC a fixed number of real-valued messages which are linear functions of its observations, while the FC linearly combines the received messages to estimate the unknown parameter vector. Assuming each sensor collects data according to a local linear model, we propose to design optimal linear message functions and a linear fusion function according to the minimum mean squared error (MMSE) criterion. We show that the resulting design problem is nonconvex and NP-hard in general, and identify two special cases for which the optimal linear DES design problem can be efficiently solved either in closed form or by semi-definite programming (SDP).
Article
Full-text available
In a binaural hearing aid system, output signals need to be generated for the left and the right ear. Using the binaural multichannel Wiener filter (MWF), which exploits all microphone signals from both hearing aids, a significant reduction of background noise can be achieved. However, due to power and bandwidth limitations of the binaural link, it is typically not possible to transmit all microphone signals between the hearing aids. To limit the amount of transmitted information, this paper presents reduced-bandwidth MWF-based noise reduction algorithms, where a filtered combination of the contralateral microphone signals is transmitted. A first scheme uses a signal-independent beamformer, whereas a second scheme uses the output of a monaural MWF on the contralateral microphone signals and a third scheme involves an iterative distributed MWF (DB-MWF) procedure. It is shown that in the case of a rank-1 speech correlation matrix, corresponding to a single speech source, the DB-MWF procedure converges to the binaural MWF solution. Experimental results compare the noise reduction performance of the reduced-bandwidth algorithms with respect to the benchmark binaural MWF. It is shown that the best performance of the reduced-bandwidth algorithms is obtained by the DB-MWF procedure and that the performance of the DB-MWF procedure closely approaches the optimal performance of the binaural MWF.
Article
Full-text available
In this paper, we revisit an earlier introduced distributed adaptive node-specific signal estimation (DANSE) algorithm that operates in fully connected sensor networks. In the original algorithm, the nodes update their parameters in a sequential round-robin fashion, which may yield a slow convergence of the estimators, especially so when the number of nodes in the network is large. When all nodes update simultaneously, the algorithm adapts more swiftly, but convergence can no longer be guaranteed. Simulations show that the algorithm then often gets locked in a suboptimal limit cycle. We first provide an extension to the DANSE algorithm, in which we apply an additional relaxation in the updating process. The new algorithm is then proven to converge to the optimal estimators when nodes update simultaneously or asynchronously, be it that the computational load at each node increases in comparison with the algorithm with sequential updates. Finally, based on simulations it is demonstrated that a simplified version of the new algorithm, without any extra computational load, can also provide convergence to the optimal estimators.
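The effect of the relaxation described in this abstract can be illustrated on a scalar toy fixed-point iteration (my own example, not the DANSE update equations): the plain iteration gets locked in a limit cycle, while the relaxed update converges to the fixed point.

```python
# Toy fixed-point map T(x) = -x + 2 with fixed point x* = 1.
# Plain iteration x <- T(x) oscillates between two values (a limit
# cycle), while the relaxed update x <- (1 - a)*x + a*T(x) converges.
T = lambda x: -x + 2.0

x_plain = x_relax = 5.0
alpha = 0.3                       # relaxation factor in (0, 1), assumed
for _ in range(50):
    x_plain = T(x_plain)          # cycles: 5 -> -3 -> 5 -> -3 -> ...
    x_relax = (1 - alpha) * x_relax + alpha * T(x_relax)
```

The relaxed map here is x ↦ (1 − 2α)x + 2α, a contraction for α ∈ (0, 1), which is the same mechanism by which relaxation restores convergence for simultaneous updates.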
Conference Paper
Full-text available
Pervasive micro-sensing and actuation may revolutionize the way in which we understand and manage complex physical systems: from airplane wings to complex ecosystems. The capabilities for detailed physical monitoring and manipulation offer enormous opportunities for almost every scientific discipline, and it will alter the feasible granularity of engineering. We identify opportunities and challenges for distributed signal processing in networks of these sensing elements and investigate some of the architectural challenges posed by systems that are massively distributed, physically-coupled, wirelessly networked, and energy limited
Conference Paper
Full-text available
Let f : ℝ^s → ℝ be a real-valued scalar field, and let x = (x_1, …, x_s)^T ∈ ℝ^s be partitioned into t subsets of non-overlapping variables as x = (X_1, …, X_t)^T, with X_i ∈ ℝ^(p_i) for i = 1, …, t, and Σ_{i=1}^t p_i = s. Alternating optimization (AO) is an iterative procedure for minimizing (or maximizing) the function f(x) = f(X_1, X_2, …, X_t) jointly over all variables by alternating restricted minimizations over the individual subsets of variables X_1, …, X_t. AO is the basis for the c-means clustering algorithms (t = 2), many forms of vector quantization (t = 2, 3 and 4), and the expectation-maximization (EM) algorithm (t = 4) for normal mixture decomposition. First we review where and how AO fits into the overall optimization landscape. Then we discuss the important theoretical issues connected with the AO approach. Finally, we state (without proofs) two new theorems that give very general local and global convergence and rate of convergence results which hold for all partitionings of x.
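A minimal worked instance of alternating optimization with t = 2 blocks, on a toy jointly convex quadratic of my own choosing (not from the cited paper), where each restricted minimization has a closed form:

```python
# Alternating optimization on f(x, y) = (x - 2)^2 + (y - 3)^2 + 0.5*(x - y)^2.
# Each step minimizes f over one block with the other block held fixed.
x, y = 0.0, 0.0
for _ in range(100):
    x = (4.0 + y) / 3.0           # argmin_x f(x, y): solves 2(x-2) + (x-y) = 0
    y = (6.0 + x) / 3.0           # argmin_y f(x, y): solves 2(y-3) - (x-y) = 0
# The iterates converge to the joint minimizer (x, y) = (2.25, 2.75).
```

Each restricted step cannot increase f, which is the monotonicity property underlying the convergence results discussed in the abstract.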
Conference Paper
Full-text available
We introduce a distributed adaptive estimation algorithm operating in an ideal fully connected sensor network. The algorithm estimates node-specific signals at each node based on reduced-dimensionality sensor measurements of other nodes in the network. If the node-specific signals to be estimated are linearly dependent on a common latent process with a low dimension compared to the dimension of the sensor measurements, the algorithm can significantly reduce the required communication bandwidth and still provide the optimal linear estimator at each node as if all sensor measurements were available in every node. Because of its adaptive nature and fast convergence properties, the algorithm is suited for real-time applications in dynamic environments, such as speech enhancement in acoustic sensor networks.
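The bandwidth-reduction idea in this abstract — broadcast a low-dimensional compressed signal instead of the raw sensor channels when the latent signal subspace has low dimension — can be sketched on a toy rank-1 signal model with white sensor noise. The model, signal dimensions, and the choice of compression are my own illustrative assumptions, not the algorithm's actual update rules:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two 3-channel nodes observe a common 1-dimensional latent signal d.
N = 20000
d = rng.standard_normal(N)
a1 = np.array([1.0, 0.5, -0.3])   # steering vector at node 1 (assumed)
a2 = np.array([0.8, -0.6, 0.2])   # steering vector at node 2 (assumed)
y1 = np.outer(d, a1) + 0.3 * rng.standard_normal((N, 3))
y2 = np.outer(d, a2) + 0.3 * rng.standard_normal((N, 3))

def lmmse(Y, target):
    """Sample-based linear MMSE estimate of `target` from the columns of Y."""
    w = np.linalg.solve(Y.T @ Y / N, Y.T @ target / N)
    return Y @ w

# Centralized estimator: all six channels gathered in one place.
e_central = np.mean((lmmse(np.hstack([y1, y2]), d) - d) ** 2)

# Bandwidth-reduced estimator: node 2 broadcasts only one compressed
# channel (here its local MMSE output) instead of three raw channels.
z2 = lmmse(y2, d)
e_reduced = np.mean((lmmse(np.hstack([y1, z2[:, None]]), d) - d) ** 2)
```

For this rank-1, white-noise model the compressed channel is a sufficient statistic for d, so `e_reduced` essentially matches `e_central` despite the threefold reduction in broadcast channels.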
Article
An adaptive distributed strategy is developed based on incremental techniques. The proposed scheme addresses the problem of linear estimation in a cooperative fashion, in which nodes equipped with local computing abilities derive local estimates and share them with their predefined neighbors. The resulting algorithm is distributed, cooperative, and able to respond in real time to changes in the environment. Each node is allowed to communicate with its immediate neighbor in order to exploit the spatial dimension while limiting the communications burden at the same time. A spatial-temporal energy conservation argument is used to evaluate the steady-state performance of the individual nodes across the entire network. Computer simulations illustrate the results.
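The incremental strategy described here — one estimate circulating node-to-node along a predefined cycle, each node refining it with its local data — can be sketched with a plain LMS update. The network size, step size, and data model below are illustrative assumptions, not the cited paper's analysis setup:

```python
import numpy as np

rng = np.random.default_rng(3)

w_true = np.array([0.5, -1.0, 2.0])
K, mu = 5, 0.05                    # number of nodes, LMS step size (assumed)

# Each node k holds its own regressors U and noisy measurements dvec.
nodes = []
for _ in range(K):
    U = rng.standard_normal((100, 3))
    dvec = U @ w_true + 0.01 * rng.standard_normal(100)
    nodes.append((U, dvec))

w = np.zeros(3)                    # single estimate passed around the cycle
for _ in range(20):                # passes around the ring
    for U, dvec in nodes:          # visit nodes in a fixed cyclic order
        for u, d in zip(U, dvec):  # incremental LMS update at this node
            w = w + mu * u * (d - u @ w)
```

Only the current estimate crosses each link, which is the communication saving the incremental approach trades against the need for a predefined cycle.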
Article
We study the problem of distributed estimation over adaptive networks where a collection of nodes are required to estimate in a collaborative manner some parameter of interest from their measurements. The centralized solution to the problem uses a fusion center, thus, requiring a large amount of energy for communication. Incremental strategies that obtain the global solution have been proposed, but they require the definition of a cycle through the network. We propose a diffusion recursive least-squares algorithm where nodes need to communicate only with their closest neighbors. The algorithm has no topology constraints, and requires no transmission or inversion of matrices, therefore saving in communications and complexity. We show that the algorithm is stable and analyze its performance comparing it to the centralized global solution. We also show how to select the combination weights optimally.
Article
We formulate and study distributed estimation algorithms based on diffusion protocols to implement cooperation among individual adaptive nodes. The individual nodes are equipped with local learning abilities. They derive local estimates for the parameter of interest and share information with their neighbors only, giving rise to peer-to-peer protocols. The resulting algorithm is distributed, cooperative and able to respond in real time to changes in the environment. It improves performance in terms of transient and steady-state mean-square error, as compared with traditional noncooperative schemes. Closed-form expressions that describe the network performance in terms of mean-square error quantities are derived, presenting a very good match with simulations.