Urban Surveillance Systems: From the Laboratory to the Commercial World

November 2001
Proceedings of the IEEE 89(10):1478 - 1497

DOI:10.1109/5.959342

Source
IEEE Xplore

Authors:

Ioannis Pavlidis

University of Houston

Vassilios Morellas

University of Minnesota Twin Cities

Panagiotis Tsiamyrtzis

Athens University of Economics and Business

Urban Surveillance Systems: From the Laboratory to the Commercial World

IOANNIS PAVLIDIS, SENIOR MEMBER, IEEE, VASSILIOS MORELLAS, MEMBER, IEEE, PANAGIOTIS TSIAMYRTZIS, AND STEVE HARP

Invited Paper

Research in the surveillance domain was confined for years to the military domain. Recently, as military spending for this kind of research was reduced and the technology matured, the attention of the research and development community turned to commercial applications of surveillance. In this paper we describe a state-of-the-art monitoring system developed by a corporate R&D lab in cooperation with the corresponding security business units. It represents a sizable effort to transfer some of the best results produced by computer vision research into a viable commercial product. Our description spans both practical and technical issues. From the practical point of view we analyze the state of the commercial security market, typical cultural differences between the research team and the business team, and the perspective of the potential users of the technology. These are important issues that have to be dealt with or the surveillance technology will remain in the lab for a long time. From the technical point of view we analyze our algorithmic and implementation choices. We describe the improvements we introduced to the original algorithms reported in the literature in response to some problems that arose during field testing. We also provide extensive experimental results that highlight the strong points and some weaknesses of the prototype system.

Keywords: Multicamera fusion, object tracking, security market, security system, surveillance system, threat assessment.

I. INTRODUCTION

The current security infrastructure could be summarized as follows. 1) Security systems act locally and do not cooperate in an effective manner. 2) Very high value assets are inadequately protected by antiquated technology systems. 3) Threat detection and assessment rely on intensive human concentration.

Manuscript received October 16, 2000; revised June 7, 2001. The prototype DETER system was funded by two major Research Initiative awards from Honeywell Labs.

I. Pavlidis, V. Morellas, and S. Harp are with Honeywell Laboratories, Minneapolis, MN 55418 USA.

P. Tsiamyrtzis is with the School of Statistics, University of Minnesota, Minneapolis, MN 55455 USA.

Publisher Item Identifier S 0018-9219(01)08433-X.

Taking into account the above state of commercial art and the maturation of surveillance research, many R&D teams, such as ours, thought that the transfer of surveillance technology to production was not only warranted but also easy. In this context, our team undertook a major effort in coordination with the Honeywell security business units to field one of the first advanced urban surveillance products. In our endeavor we came to learn that good laboratory technology must be supported by deep knowledge of the business, market, and user realities to become a success story. Indeed, we can now corroborate that in certain cases technology transfer can be as challenging as the basic research that preceded it.

The result of our endeavor is Detection of Events for Threat Evaluation and Recognition (DETER), a prototype urban surveillance system aimed at the high end of the security market. DETER can be seen as an attempt to bridge the gap between current systems reporting isolated events and an automated cooperating network capable of inferring and reporting threats, a function currently being performed by humans. The prototype DETER system is installed at the parking lot of Honeywell Laboratories (HL) in Minneapolis. The computer vision module of DETER reliably tracks pedestrians and vehicles and reports their annotated trajectories to the threat assessment module for evaluation. DETER features a systematic optical and system design that sets it apart from “toy” surveillance systems.

In Section II of this paper, we analyze the current state of the security market and how it affected our research and development effort. Then, in Section III, we move on to describe the recent technical developments reported in the research literature. Sections IV–VIII describe and analyze the characteristics of our prototype surveillance system. In Section IX, we report extensive experimental results from actual field operations that highlight some strong as well as some weak points of DETER. Finally, in Section X, we conclude our paper by summarizing the business and technical results, drawing conclusions, and outlining our strategic and tactical plan for the future.

1478 PROCEEDINGS OF THE IEEE, VOL. 89, NO. 10, OCTOBER 2001

Authorized licensed use limited to: Universitat Autonoma De Barcelona. Downloaded on March 10,2010 at 05:34:51 EST from IEEE Xplore. Restrictions apply.

II. THE CURRENT STATE OF THE SECURITY MARKET

The security business has a surprisingly long history. For example, Pinkerton, one of the premier security services companies, recently celebrated its 150th anniversary. Traditionally, the security industry relies primarily on its human resources. Technology is not always highly regarded and sometimes is viewed with suspicion. The last universally accepted technological change in the security industry is the adoption of radio communication between guarding parties. Many of us may have the impression that analog video recording is another technology universally adopted by the security industry. This is, however, far from true. There are significant portions of the security market that do not use video recording at all and rely exclusively on human labor. A good example is the majority of stake-out operations performed by law enforcement agencies in the United States.

An understanding of the industry’s peculiarities and the forces that shape its current profile is essential for anyone who is interested in performing technology transfer in the security domain. Below, we enumerate what we consider the most important characteristics of the current security market.

Low Profit Margin: The security market is very cost sensitive. One can identify two major segments in the security market: Home Security and Building Security. The competition in the Home Security segment is fierce and the profit margin very low. The average subscription to a home security service in the U.S. is about $20 per month in year 2000 valuation. The initial installation cost is sometimes waived as a means to attract customers. The Building Security segment is at the upper end of the market, but cost is still a major issue and the volume of this segment is much smaller than that of the Home Security segment. In an era where quarterly profits make or break corporate giants in areas with much higher profit margins, the security industry always struggles to “make the numbers.” Its strategic horizon usually does not extend beyond six months. It is frequently cited in the technical literature that the current low cost of computational power and cameras will open up the way for the automation of surveillance products and services. As it turns out, “low cost” is a relative term. A Pentium II PC box running at 233 MHz and priced around $200 in year 2000 valuation is considered a high-end device with a substantial price tag.

Resistance to Change: Like most traditional industries, the security industry is not an advocate of innovation by nature. Part of the problem is that some of its customers are also resistant to change. A typical example is the failed attempt to introduce GPS receivers into police cruisers in several North American cities. The GPS receivers would enable police departments to know the exact position of all their cruisers at all times. There are obvious benefits to personnel safety and resource scheduling from this technology. The police unions, however, opposed the plan because they considered it an invasion of the privacy of the individual officers.

Low-Tech Culture: The security industry is permeated by low-tech culture. The management and the engineers of the security business units are trained and brought up within a low-tech environment and are ignorant and suspicious of state-of-the-art developments. Their users and customers are often underpaid and undereducated security guards who also view high technology with skepticism.

Hardware Mentality: The most advanced members of the security industry are probably the camera manufacturers. Even these, although they produce some advanced electronic products, have difficulty outfitting them with the necessary software. An example is Sony, which recently produced some excellent 1394 security cameras like the DFW-VL500 and started selling them in the market without the necessary software drivers. Since these cameras can send their video output only to a computer, without software drivers they were useless.

The problem is compounded by the mentality of the research community that can support the security industry with advanced video surveillance concepts. Computer vision researchers both in academia and corporate labs used to perform research for military surveillance projects where cost was not an issue. Even corporate researchers who performed research and development with a commercial application in mind used to do so in isolation, hypothesizing the problems and needs to be addressed. In most cases, the development concluded with a demo without addressing system design issues and without performing some rudimentary cost and market analysis. When they tried to sell the idea to a business unit for productization, the result was a predictable failure.

Despite the presence of many negative factors, the future of the security industry can be viewed only in a positive light. Although the transformation of the industry and the market will take time to complete, it has already started happening in small steps. As a result of upcoming technology offerings, the Freedonia Group is projecting significant growth of the security market during the next several years (see Fig. 1). This growth will fuel further research and development and will hopefully bootstrap the process of incorporating the security industry into the new economy.

Taking into account the practical realities, we decided to cooperate very closely both with the business unit that would ultimately productize our surveillance prototype and with potential customers. Out of this cooperation we quickly formed a very specific strategy.

1) The prototype should be developed and tested within an actual environment and not in the lab. This would be the ultimate proof of its fitness.

2) The prototype system should address security needs of buildings and not homes since the profit margin of a potential building product would be far greater and the competition in this market segment less fierce.

3) The prototype system should be geared toward perimeter surveillance and not toward indoor building surveillance. Admittedly, indoor building surveillance correlates more aptly with the notion of “big brother” and would generate bad publicity among the customers’ workforce for our prospective product offering. The majority of U.S. business buildings are surrounded by parking lots. The same is true of the perimeter of suburban malls and other public places. Therefore, we paid particular attention to the parking lot scenario without overconstraining ourselves.

Fig. 1. The security service market by region in U.S. dollars (billions). The numbers after the year 2000 are projections (source: The Freedonia Group).

4) We did not try to invent needs. We paid particular attention to the real needs of our potential customers and tried to address them through state-of-the-art technological solutions. Two potential customers were interviewed extensively and their needs were factored into the design of the system to the degree possible. One customer was the security personnel of our building and the other was the Dade County Sheriff’s office in Miami. One could group their feedback into two trends. The security personnel emphasized the necessity for a few simple automated alarms. An example was the capture of any vehicle that enters the parking lot of the building after hours. The sheriff’s office showed an interest in the capture of more complicated traffic patterns. An example was a vehicle that enters the parking lot and exits after wandering for a while without ever parking. The sheriff’s office also placed a lot of emphasis on the portability of the system. They wanted a system that they could set up easily and quickly, use for a period, and then move to another location. This is consistent with the mode of stakeout operations they perform. We decided to accommodate in the system design the detection of simple as well as somewhat more complicated traffic patterns but to leave out any portability considerations. We determined that the portability question would substantially increase the technical risk and the development cost while it is of importance only to a small percentage of the customer base (law enforcement agencies). Our strategy is to address the portability problem in a subsequent stage of development, after our baseline product offering generates some revenue.

5) We decided to design the system in a manner that would be able to perform multiple functions (beyond the security domain). This would increase its appeal to potential customers. We have particularly focused on analyzing traffic statistics for the benefit of building operations. For example, traffic statistics may provide an insight into parking lot utilization during different times and days. This insight can support a functional redesign of the open space to better facilitate transportation and safety needs.

6) On one hand, the cost of the hardware components and installation for the system should be kept to a minimum because of the cost sensitivity of the security industry. On the other hand, the computer vision algorithms require substantial computational power and full coverage of the surveyed area. As a compromise, we chose not the low-end processors (233 MHz) currently in wide use by the security industry, but rather mid-end processors (500 MHz) that we project will become the mainstay at about the same time the prototype system moves into production in 2002. We also identified the need for an optimization method that, given the CAD design of the surveyed area, will produce the minimum number of cameras and their locations for full coverage.

7) We decided to choose off-the-shelf hardware and software development components and adopt an open architecture strategy. For example, we used off-the-shelf PCs, cameras, and nonembedded software tools. This was a radical move in the framework of an industry that is used to producing “proprietary systems.” Our rationale is that open systems reduce development and maintenance cost and time. Open systems can also capitalize upon existing assets at the customer’s site and make the technology transition more appealing. We also believe that nowadays the best way to outmaneuver the competition is not by building proprietary systems, but by delivering continuous value to the customers through innovation and streamlined operations.

8) In addressing the technical challenges for our surveillance system, we decided not to start from scratch. The purpose of a corporate R&D effort is not innovation for the sake of innovation but innovation for the sake of results. We performed a careful evaluation of the technical literature to find an appropriate starting point. Then, we filled in the gaps and improved the initial idea in step with our experimental experience and results.

III. RELEVANT TECHNICAL WORK

The computer vision community has performed extensive research in the area of video-based surveillance for the past 20 years. Initially, the research was focused almost exclusively on military applications and employed nonvisible band cameras (e.g., thermal, laser, and radar). The emphasis was on the recognition of military targets (automatic target recognition or ATR). An interesting survey of this type of work can be found in [1]. Upon the end of the cold war in the 1990s, attention shifted gradually to surveillance applications in nonmilitary settings using visible band cameras. The research emphasis also shifted from object recognition to tracking of human and vehicular motion. Even the military participated in this research shift to prepare for the so-called “asymmetric threat.” Asymmetric threat refers to the possibility of terrorist activities against animate and inanimate government assets (e.g., government officials, embassies, etc.). The Video Surveillance and Monitoring (VSAM) program exemplifies the shift to urban surveillance scenarios. The VSAM program was funded by DARPA in 1997-99 [2], [3] and pushed the state of the art to a point where future commercial application of the technology is not unthinkable anymore. No large-scale research and development effort has been undertaken since then in the area of surveillance. Isolated research efforts, however, continued to push the state of the art in a variety of urban surveillance applications [4], [5]. In these efforts, we witness increased participation by commercial R&D labs [6], [7].

The latest research activities in the area of commercial surveillance applications are ripe as they are aided by improvements in computational power and camera technology and by the introduction of robust statistical methods in computer vision. All these research efforts try to address, to one degree or another, the fundamental urban surveillance question: motion detection. If a system can reliably detect motion, then it can reason about motion patterns, record intrusion, and issue alerts (reason-record-issue). It is worth mentioning that existing commercial security systems cannot perform the sequence of the above three functions. They rely exclusively on human attention and labor to close the feedback loop.

Some research groups reach a lot further than the basic reason-record-issue paradigm and perform research on analyzing human motion or modeling human interactions [8], [9]. Although these investigations are scientifically elegant, their value to the security industry in the near- and mid-term is minimal. The industry is preoccupied with a lot more mundane problems at the moment.

A variety of moving object segmenters has been reported in the literature. There are two conventional approaches to moving object segmentation with respect to a static camera: temporal differencing [10] and background subtraction [11]. Temporal differencing is very adaptive to dynamic environments, but generally does a poor job of extracting all the relevant object pixels. Background subtraction provides the most complete object data, but is extremely sensitive to dynamic scene changes due to lighting and extraneous events. Most researchers have abandoned nonadaptive methods of backgrounding, which are useful only in highly supervised, short-term tracking applications without significant changes in the scene. More recent adaptive backgrounding methods [12] can cope much better with environmental dynamism. They still, however, cannot handle bimodal backgrounds and have problems in scenes with many moving objects. Stauffer et al. [13], [14] have proposed a more advanced object detection method based on a mixture of Normals representation at the pixel level. This method features far better adaptability and can handle even bimodal backgrounds (e.g., swaying tree branches). The secret is in the powerful representation scheme. Each Normal reflects the expectation that samples of the same scene point are likely to display Gaussian noise distributions. The mixture of Normals reflects the expectation that more than one process may be observed over time. Elgammal et al., in [15], propose a generalization of the Normal mixture model where density estimation is achieved through a Normal kernel function. Their method features some improved behavior with respect to the method proposed in [13] including the suppression of shadows. In general, the mixture of Normals paradigm is not only theoretically elegant but has also produced promising test results in challenging outdoor conditions. It is for this reason we chose it as the baseline algorithm for our surveillance system.
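
To make the mixture of Normals idea concrete, the following Python sketch maintains such a mixture for a single grayscale pixel. It is a simplified illustration in the spirit of [13], [14], not the DETER implementation: the class name `PixelMixture`, the single learning rate shared by weights, means, and variances, and all numeric defaults (three Normals, a 2.5 standard deviation match window, a 0.7 background weight threshold) are assumptions chosen for the example.

```python
import math


class PixelMixture:
    """Mixture of Normals for one grayscale pixel (illustrative sketch)."""

    def __init__(self, k=3, alpha=0.05, match_sigmas=2.5,
                 background_weight=0.7, init_var=225.0, min_var=4.0):
        self.k = k                              # maximum number of Normals
        self.alpha = alpha                      # learning rate (assumed value)
        self.match_sigmas = match_sigmas        # match window in std. deviations
        self.background_weight = background_weight
        self.init_var = init_var                # variance of a newly created Normal
        self.min_var = min_var                  # floor that keeps Normals from collapsing
        self.components = []                    # each entry: [weight, mean, variance]

    @staticmethod
    def _fitness(comp):
        # Heavy, narrow Normals are the most plausible background processes.
        return comp[0] / math.sqrt(comp[2])

    def update(self, x):
        """Fold sample x into the model; return True if x looks like foreground."""
        matched = None
        for comp in self.components:            # list is kept sorted by fitness
            if abs(x - comp[1]) <= self.match_sigmas * math.sqrt(comp[2]):
                matched = comp
                break

        # Decay every weight, then reward the matched Normal (if any).
        for comp in self.components:
            comp[0] *= 1.0 - self.alpha
        if matched is not None:
            matched[0] += self.alpha
            delta = x - matched[1]
            matched[1] += self.alpha * delta
            matched[2] = max(self.min_var,
                             (1.0 - self.alpha) * matched[2]
                             + self.alpha * delta * delta)
        else:
            # No Normal explains x: start a wide, low-weight one,
            # evicting the least fit Normal if the mixture is full.
            matched = [self.alpha, float(x), self.init_var]
            if len(self.components) == self.k:
                self.components.pop()
            self.components.append(matched)

        total = sum(comp[0] for comp in self.components)
        for comp in self.components:
            comp[0] /= total
        self.components.sort(key=self._fitness, reverse=True)

        # Background = fittest Normals whose cumulative weight passes the threshold.
        cumulative = 0.0
        for comp in self.components:
            cumulative += comp[0]
            if comp is matched:
                return False
            if cumulative > self.background_weight:
                break
        return True
```

With a pixel that alternates between two intensities (a crude stand-in for a swaying branch), both values end up in background Normals, while a value that fits neither is reported as foreground. A full segmenter would hold one such model per pixel and per color channel.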

Clearly, most of the research in urban surveillance systems was directed toward moving object segmentation. There is a good reason for that since failures at the segmentation level can seal the fate of the entire surveillance system. Nevertheless, a comprehensive surveillance system involves technologies in addition to moving object segmentation. These technologies include tracking, multicamera fusion, and threat assessment. The research community addressed these problems to various degrees. Tracking refers to the association of segmented objects across the timeline. The tracking methods employed in surveillance systems are usually borrowed from research performed for radar applications. The issue of multicamera fusion is an important one for seamless tracking in large open spaces that cannot be covered by a single camera. Researchers have addressed this issue in the surveillance and other contexts [16]–[18]. The interested reader can look at [19] for a thorough presentation of the relevant mathematics. The stage of threat assessment is the least explored. It is the one that interfaces with the human operator, however, and in this respect is very important. In Section VIII, we present our approach to threat assessment, which focuses on the detection of a few threatening motion patterns.

IV. SYSTEM ARCHITECTURE

A comprehensive urban video surveillance system, such as DETER, depends primarily on two different technologies: computer vision and threat assessment. The computer vision part consists of the optical and system design, the moving object segmentation and tracking, and the multicamera fusion stages. The threat assessment part consists of the feature assembly, the off-line training, and the threat classification stages (see Fig. 2). We will give a brief overview of each stage and compare our solutions to others proposed in the literature.

Our system is probably the only one that features a formal optical and system design stage. Most of the efforts reported in the literature had as their main objective to demonstrate the feasibility of a novel idea and did not pay any attention to the practical aspects of fielding a surveillance system. There are a number of requirements that a surveillance system needs to fulfill to function properly and be commercially viable. First, it should ensure full coverage of the open space, or blind spots may pose the threat of a security breach. It is often argued in the technical literature that video sensors and computational power are getting cheaper and therefore can be employed en masse to provide coverage for any open space [20]. In reality, things are not so rosy. Most of the cheap video sensors still do not have the required resolution to accommodate high-quality object tracking. Both cheap and expensive cameras also need to be weatherproofed for deployment outdoors, which increases their cost substantially. Then there is the issue of installation cost, which includes the provision of power and the transmission of video signals, sometimes at significant distances from the building. The installation cost for each camera is usually many times the price of the camera itself. Even if there were no cost considerations, cameras cannot be deployed arbitrarily in public places. There are restrictions due to the topography of the area (e.g., streets, tree lines) and due to city and building ordinances (e.g., aesthetics). All these considerations severely curtail the allowable number and positions of cameras for an urban surveillance system.

In addition to optical considerations there are also system design considerations including the type of computational resources, the computer network bandwidth, and the display capabilities. Due to the cost sensitivity of the security market, all these become critical issues and should be addressed in an optimal manner.

We achieve motion segmentation through a multi-Normal representation at the pixel level. Our method resembles the method described in [14] with some interesting modifications. The method identifies foreground pixels in each new frame while updating the description of each pixel’s mixture model. The labeled foreground pixels can then be assembled into objects using a connected components algorithm [21]. Establishing correspondence of objects between frames (tracking) is accomplished using a linearly predictive multiple hypotheses tracking algorithm which incorporates both position and size.
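
As a rough illustration of the two steps just described, the Python sketch below groups foreground pixels into blobs with a 4-connected flood fill and extrapolates an object's position and size linearly from its last two observations. The function names, the choice of 4-connectivity, and the `(x, y, width, height)` state layout are assumptions made for the example; the multiple hypotheses bookkeeping of the actual tracker is not shown.

```python
from collections import deque


def connected_components(mask):
    """Return one blob summary per 4-connected foreground region of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            blobs.append({
                "area": len(pixels),
                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),       # (x, y)
                "size": (max(xs) - min(xs) + 1, max(ys) - min(ys) + 1),   # (w, h)
            })
    return blobs


def predict_next(previous, current):
    """Linearly extrapolate an (x, y, width, height) state one frame ahead."""
    return tuple(2 * c - p for p, c in zip(previous, current))
```

A tracker along these lines would compare each predicted state with the blobs found in the next frame and keep the most consistent associations.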

Fig. 2. Architecture of the DETER system.

No single camera is able to cover large open spaces, like parking lots, in their entirety. Therefore, we need to fuse the fields of view (FOV) of the various cameras into a coherent super picture to maintain global awareness. We fuse (calibrate) multiple cameras by computing the respective homography matrices. The computation is based on the identification of several landmark points in the common FOV between camera pairs.
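
The homography computation can be sketched as follows, under the simplifying assumption that exactly four landmark correspondences on the ground plane are available, so that the eight unknown matrix entries follow from an 8 x 8 linear system. With more landmarks, as the text suggests, a least-squares fit would be used instead. The function names and the small Gauss-Jordan solver are illustrative and not taken from the DETER code.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate landmark configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_landmarks(src, dst):
    """Estimate the 3x3 homography mapping four src points onto four dst points."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("this sketch expects exactly four landmark pairs")
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per landmark, with the bottom-right entry fixed to 1.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map a point from one camera's image plane into the other's."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Once the matrix for a camera pair is known, a tracked object's image coordinates in one camera can be mapped into the other with `apply_homography`, which is what lets trajectories continue across the overlap region.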

The threat assessment portion of DETER consists of a feature assembly module followed by a threat classifier. Feature assembly extracts various security-relevant statistics from object tracks and groups of tracks. The threat classifier decides in real time whether a particular point in feature space constitutes a threat. The classifier is assisted by an off-line threat modeling component (see Fig. 2).
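
The split between feature assembly and threat classification can be illustrated with a small Python sketch. The features chosen here (duration, path length, longest stop), the track format, and the fixed thresholds are all hypothetical; in particular, a hand-written rule stands in for the trained classifier, and the rule simply mirrors the "enters, wanders, and leaves without parking" pattern mentioned in Section II.

```python
import math


def assemble_features(track, stop_speed=0.2):
    """Summarize one track given as time-ordered (t_seconds, x_m, y_m) samples."""
    path_length = 0.0
    longest_stop = 0.0
    stop_started = None
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        path_length += step
        if step / (t1 - t0) < stop_speed:       # object is (nearly) stationary
            if stop_started is None:
                stop_started = t0
            longest_stop = max(longest_stop, t1 - stop_started)
        else:
            stop_started = None
    return {
        "duration": track[-1][0] - track[0][0],
        "path_length": path_length,
        "longest_stop": longest_stop,
    }


def looks_like_wandering(features, min_duration=120.0, max_stop=30.0):
    """Hypothetical rule: a long presence in the lot without ever parking."""
    return (features["duration"] >= min_duration
            and features["longest_stop"] < max_stop)
```

A vehicle that circles for several minutes without stopping would satisfy the rule, whereas one that drives in and then stays put would not.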

V. OPTICAL AND SYSTEM DESIGN

The optical and overall system design for DETER includes the specification of a camera set arrangement that optimally covers the HL Minneapolis parking lot. It also includes the specification of the computational resources necessary to run the DETER algorithms in real time. Finally, it includes the specification of the display hardware and software.

The optical design effort, in particular, has the following objectives.

1) Specify the camera model.

2) Specify the camera lens.

3) Specify the number of cameras.

4) Specify the camera locations.

We decided to employ dual channel camera systems. These systems utilize a medium-resolution color camera during the day and a high-resolution grayscale camera during the night. Switching from day to night operation is controlled automatically through a photo-sensor. The dual channel technology capitalizes upon the fact that color information is lost in the low light conditions at night. Therefore, there is no reason for employing color cameras during nighttime conditions. Instead we can employ cheaper and higher resolution grayscale cameras to compensate for the loss of color information. We have selected the DSE DS-5000 dual channel system as the camera model. The color day camera has a resolution of […]. The grayscale night camera has a resolution of […]. The DSE DS-5000 camera system has a […] mm vari-focal auto iris lens for both day and night cameras. This permits us to vary the FOV of the cameras from 44.4° to 82.4°.

We seek an optimal solution that provides coverage of the entire parking lot area with the minimum number of cameras and the minimum installation cost. There are practical constraints imposed by the topography of the area under surveillance. For example, we cannot place a camera pole in the middle of the road. Existing poles and rooftops should be utilized to the extent possible to reduce the installation cost, and city codes regarding aesthetics have to be obeyed. Taking into account all these considerations, we can delineate the possible installation sites in the computer-aided design (CAD) of the parking lot. These are usually only a small fraction of the entire open area and, therefore, our search space is drastically reduced.
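
Once the candidate installation sites are known, choosing among them resembles a set cover problem. The Python sketch below shows a generic greedy heuristic for it, assuming the lot has been discretized into cells and that the set of cells each candidate site can see is already known. The site names and the cell discretization are hypothetical, greedy selection does not guarantee a minimum camera count, and this is not the placement procedure used for DETER.

```python
def greedy_camera_selection(coverage, cells):
    """Pick candidate sites until every cell is covered.

    coverage maps a site name to the set of cell ids visible from that site;
    cells is the set of cell ids that must be covered.
    """
    uncovered = set(cells)
    chosen = []
    while uncovered:
        # Take the site that sees the most still-uncovered cells.
        best = max(coverage, key=lambda site: len(coverage[site] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            raise ValueError("cells cannot be covered: %s" % sorted(uncovered))
        chosen.append(best)
        uncovered -= gained
    return chosen
```

For instance, with four candidate sites covering overlapping groups of six cells, the heuristic may find that two sites already suffice. Overlap and range constraints of the kind discussed next would have to be added as extra checks on each candidate.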

The installation search space is reduced even further when we consider the constraints imposed by the computer vision algorithms. Specifically:

1) An urban surveillance system such as DETER monitors

two kinds of objects: vehicles and people. In

terms of size, people are the smallest objects under

surveillance. Therefore, their footprint should drive the

requirements for the limiting range of the cameras. In

turn, the determination of the limiting range will help

us to verify if there is any space in the parking lot that

is not covered under any given camera configuration.

2) Each camera should have an overlapping FOV with at

least one more camera. The overlapping arrangement

should be done in such a way, so that we are able to

transition from one camera to the other through in-

dexing of the overlapped areas and manage to visit all

the cameras in a unidirectional trip without encoun-

tering any discontinuity.

3) The overlapping in the FOVs should be between

25% and 50% for the multicamera calibration algorithm

to perform reliably. This requirement stems from the

need to get several well-spread landmark points in

the common field of view for accurate homography.

Usually, a portion of the overlapping area cannot

be utilized for landmarking because it is covered

by nonplanar structures like tree lines. Therefore, at

times the common area between two cameras may be

required to cover as much as half of the individual

FOVs.

As we mentioned earlier, the DSE DS-5000 cameras fea-

ture a vari-focal lens with a FOV that can range between

44.4° and 82.4°. We choose the intermediate value of FOV

as the basis of our calculations. To satisfy the overlap-

ping constraints, we may need to increase or decrease the

FOV of some of the cameras from this average value. The

camera placement algorithm proceeds as follows.

1) In one of the allowed installation sites place a camera

in such a way that its FOV borders the outer edge of

the parking lot.

2) Continue adding cameras around the initial camera

until you reach the next outer edge of the parking lot.

Make sure there is at least 25% overlapping in neigh-

boring FOVs.

3) Compute the limiting range of the installed cameras.

By knowing the FOV and the limiting range, we know

the full useful coverage area for each camera.

4) Continue with the next installation site that is just outside

of the already covered area. Make sure that at least

one of the new cameras overlaps at least 25% with one

of the previous cameras.

5) Repeat the above three steps until the entire parking lot

area is covered.

6) Make some post-processing adjustments. These usually

involve increasing or reducing the FOV for

some of the cameras. This FOV adjustment is meant

to either trim some excessive overlapping or add some

extra overlapping in areas where there is little planar

space (lots of trees).

Of particular interest is the computation of the camera's limiting range $R_c$. It is computed from the equation

$$R_c = \frac{P_f}{\tan(\mathrm{IFOV})}$$

where $P_f$ is the smallest acceptable pixel footprint of a human and IFOV is the instantaneous field of view.

Based on our experimental experience, the signature

of the human body should not become smaller than a

rectangle on the focal plane

array (FPA). Clusters with fewer than 27 pixels are likely to

be below the noise level. If we assume that the width of an

average person is about

in then the pixel footprint

. The IFOV is computed from the following formula:

$$\mathrm{IFOV} = \frac{\mathrm{FOV}}{L_{\mathrm{FPA}}}$$

where $L_{\mathrm{FPA}}$ is the resolution for the camera. For FOV

and (color day camera), the lim-

iting range is

ft. For FOV and

(grayscale night camera), the lim-

iting range is

ft. In other words, between two

cameras with the same FOV, the higher resolution camera

has larger useful range. Conversely, if two cameras have

the same resolution, then the one with the smaller FOV has

larger useful range. During post-processing, we needed to

reduce the FOV (FOV

) in some of the lower resolu-

tion day camera channels to increase their effective range

limit. Extended tree lines in the HL parking lot necessitate

larger overlapping areas than the anticipated minimum.
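The two observations above (higher resolution or smaller FOV gives a larger useful range) can be checked numerically. The following is a minimal sketch, assuming the limiting range is the pixel footprint divided by tan(IFOV) and that IFOV is the FOV divided by the resolution. The footprint value and the two resolutions are hypothetical placeholders chosen for illustration; only the 44.4° and 82.4° FOV limits come from the text.

```python
import math


def ifov_deg(fov_deg: float, resolution_px: int) -> float:
    """Instantaneous field of view: the angle covered by one pixel, in degrees."""
    return fov_deg / resolution_px


def limiting_range(footprint: float, fov_deg: float, resolution_px: int) -> float:
    """Range at which a single pixel covers `footprint` length units on the target.

    The result is in the same length unit as `footprint`.
    """
    return footprint / math.tan(math.radians(ifov_deg(fov_deg, resolution_px)))


# Hypothetical inputs, for illustration only (not taken from the paper).
FOOTPRINT_IN = 8.0          # assumed footprint per pixel, in inches
RESOLUTIONS = (480, 720)    # assumed sensor resolutions, in pixels

for fov in (44.4, 82.4):    # FOV limits of the vari-focal lens quoted in the text
    for res in RESOLUTIONS:
        r_ft = limiting_range(FOOTPRINT_IN, fov, res) / 12.0
        print(f"FOV={fov:5.1f} deg  resolution={res} px  range={r_ft:7.1f} ft")
```

Because the per-pixel angle is tiny, the range scales almost linearly with resolution and inversely with FOV, which is why narrowing the FOV of a lower resolution channel extends its effective range.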

Fig. 3. Camera configuration scheme for DETER in the HL

parking lot. This figure depicts the FOVs for the day channels of

the cameras.

Fig. 4. Camera configuration scheme for DETER in the HL

parking lot. This figure depicts the FOVs for the night channels

of the cameras.

A good optical design is essential to the success of

an urban surveillance system and many computer vision

projects often ignore this aspect altogether. The principles,

algorithms, and computations we used for the DETER optical

design can be codified to automate the optical design of

future similar security systems in any other parking lot or

open area.

Our study concluded that seven cameras in the configu-

ration shown in Figs. 3 and 4 is the recommended arrange-

ment for our parking lot. We have assigned one standard PC

(500-MHz Pentium) for the processing requirements of each

camera. One of the seven PCs is designated as the server and

this is where the fusion of information from all seven cam-

eras takes place. As a way of comparison, see in Fig. 5 the

camera arrangement in the parking lot of our building before

DETER. The inadequate coverage is the typical outcome of

bad design and budgetary restrictions.

The fused video information is displayed in a 44-inch flat panel display along with all the necessary annotation. This comprehensive high-quality picture allows the security operator to maintain instant awareness without the distraction of multiple fragmented views. It also underlines our ultimate goal, which is the enhancement and not the replacement of the role of the security guard.

Fig. 5. Camera configuration scheme in the HL parking lot before DETER. The ad-hoc and inadequate coverage is obvious.

Our design philosophy is geared toward open systems.

We chose to use standard NTSC cameras that are favored

by the security industry. We do not aim at developing smart

on-the-chip cameras. We project that the cost of developing

special hardware and embedded software is quite substan-

tial. Also, a product based on smart cameras would appeal

only to new customers. The management of existing build-

ings would much rather prefer to upgrade their legacy in-

frastructure than scrap it altogether. With our design, they

can use their old cameras and possibly add a few more to

achieve complete coverage. Also, the computational hard-

ware could be found for free. Most corporations renew their

PCs every 2–3 years. Since DETER is designed to run on

moderate PC hardware, recycled PC units can be used for the

processing of the video information. There is no bandwidth

problem between the camera and the PC since the standard

coaxial cable can comfortably accommodate video transmissions

of 30 frames per second. After the information is processed

at the PC, it is either stored locally or transmitted across

the building’s intranet on an event basis. Based on the above

description, DETER can be sold more as an upgrade service

instead of a new security product. We believe that this busi-

ness model is necessary for the rapid spread of high tech-

nology in the security marketplace.

VI. OBJECT SEGMENTATION AND TRACKING

A. Initialization

The goal of the initialization phase is to provide statisti-

cally valid values for the pixels corresponding to the scene.

These values are then used as starting points for the dynamic

process of foreground and background awareness. Initializa-

tion happens only once and there are no strict real-time pro-

cessing requirements for this phase. We process a certain

number of frames

( ) on-line or off-line.

Fig. 6. Visualization of the mixture of normals model at the pixel level. The normals of a gray

channel are depicted for simplicity.

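Each pixel $\mathbf{x}$ is considered as a mixture of five time-varying trivariate normal distributions

$$\mathbf{x} \sim \sum_{i=1}^{5} \pi_i N_3(\boldsymbol{\mu}_i, \Sigma_i)$$

where $\pi_i \geq 0$, $i = 1, \ldots, 5$, and $\sum_{i=1}^{5} \pi_i = 1$ are the mixing proportions (weights) and $N_3(\boldsymbol{\mu}, \Sigma)$ denotes a trivariate Normal distribution with vector mean $\boldsymbol{\mu}$ and variance-covariance matrix $\Sigma$. The distributions are trivariate to account for the three component colors (red, green, and blue) of each pixel in the general case of a color camera. Please note that

$$\mathbf{x} = (x^R, x^G, x^B)^T$$

where $x^R$, $x^G$, and $x^B$ stand for the measurement we received from the red, green, and blue channel of the camera for the specific pixel.

For simplification, the variance-covariance matrix is assumed to be diagonal with $x^R$, $x^G$, $x^B$ having identical variance within each Normal component, but not across all components (i.e., $\sigma_k^2 \neq \sigma_l^2$ for $k \neq l$ components). Therefore,

$$\mathbf{x} \sim \sum_{i=1}^{5} \pi_i N_3\left(\boldsymbol{\mu}_i, \sigma_i^2 I\right)$$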

Other similar methods reported in the literature initialize

the pixel distributions either randomly or with the K-means

algorithm. Random initialization results in slow learning

during the dynamic mixture model update phase. Some-

times, it even results in instability. Initialization with the

K-means or the expectation-maximization (EM) method

[22] gives significantly better results. The EM algorithm is

computationally intensive and takes the initialization process

off-line for about 1 min. In the parking lot application where

human and vehicular traffic is small, the short off-line

interval is not a problem. Actually, the EM initialization

performs a little better particularly if the weather conditions

are dynamic (e.g., fast moving clouds). But, if the area under

surveillance were a busy plaza (many moving humans and

vehicles), the on-line K-means initialization might have

been preferable.

B. Segmentation of Moving Objects

The initial mixture model is updated dynamically there-

after. The update mechanism is based on the incoming ev-

idence (new camera frames). Several things could change

during an update cycle.

1) The form of some of the distributions could change

(weight $\pi_i$, mean $\boldsymbol{\mu}_i$, and variance $\sigma_i^2$).

2) Some of the foreground states could revert to back-

ground and vice versa.

3) One of the existing distributions could be dropped and

replaced with a new distribution.

At every point in time, the distribution with the strongest

evidence is considered to represent the pixel's most probable background state. Fig. 6 presents a visualization of the mixture of Normals model while Fig. 7 depicts the update mechanism for the mixture model.

Fig. 7. Visualization of the mixture model update mechanism. The normals of a gray channel are depicted for simplicity. The small ellipse marks the pixel area under monitoring.

The update cycle for each pixel proceeds as follows:

1) First, the existing distributions are ordered in de-

scending order based on their weight values.

2) Second, the algorithm selects the first $B$ distributions that account for a predefined fraction $T$ of the evidence

$$B = \arg\min_{b} \left\{ \sum_{i=1}^{b} \pi_i > T \right\}$$

where $\pi_i$, $i = 1, \ldots, b$, are the respective distribution weights. These $B$ distributions are considered as background distributions while the remaining $5 - B$ distributions are considered foreground distributions.

3) Third, the algorithm checks if the incoming pixel value

can be ascribed to any of the existing Normal distribu-

tions. The matching criterion we use is the Jeffreys (J)

divergence measure and is a key differentiator of our

approach from other similar approaches.

4) Fourth, the algorithm updates the mixture of distribu-

tions and their parameters. The nature of the update

depends on the outcome of the matching operation. If

a match is found, the update is performed using the

method of moments. This is also a key differentiator of

our approach. If a match is not found, then the weakest

distribution is replaced with a new distribution. The

update performed in this case guarantees the inclusion

of the new distribution in the foreground set, which is

another novelty of our method.

The matching and model update operations are quite in-

volved [23] and are described in detail in the next three sub-

sections.
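The first two steps of the update cycle (ordering by weight, then splitting into background and foreground) can be sketched as follows. This is an illustrative sketch only: the function name `split_background`, the example weights, and the threshold value are assumptions, not values from the paper.

```python
def split_background(weights, threshold):
    """Split one pixel's mixture into background and foreground components.

    weights:   mixing proportions of the distributions (expected to sum to 1).
    threshold: predefined fraction of the evidence the background must exceed.
    Returns (background_indices, foreground_indices), both ordered by
    descending weight.
    """
    # Step 1: order the distributions by descending weight.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)

    # Step 2: take the shortest prefix whose cumulative weight exceeds the threshold.
    cumulative = 0.0
    for b, idx in enumerate(order, start=1):
        cumulative += weights[idx]
        if cumulative > threshold:
            return order[:b], order[b:]
    return order, []  # threshold never exceeded (e.g., threshold >= 1)


# Hypothetical weights and threshold, for illustration only.
bg, fg = split_background([0.05, 0.50, 0.10, 0.30, 0.05], threshold=0.7)
print("background:", bg, "foreground:", fg)
```

With these example numbers the two heaviest components already account for more than the threshold, so the remaining three are treated as foreground.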

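1) The Matching Operation: We use the Jeffreys divergence measure $J(f, g)$ [24] to determine whether or not the incoming data point belongs to one of the existing five distributions. The Jeffreys number measures how unlikely it is that one distribution ($g$) was drawn from the population represented by the other ($f$). For a presentation of the theoretical properties of the Jeffreys divergence measure, see [25]. The five existing Normal distributions are $f_i \sim N_3(\boldsymbol{\mu}_i, \sigma_i^2 I)$, $i = 1, \ldots, 5$. Since $J(f, g)$ relates to distributions and not to data points, we need to associate the incoming data point with a distribution. We construct the incoming distribution as $g \sim N_3(\boldsymbol{\mu}_g, \sigma_g^2 I)$. We assume that $\boldsymbol{\mu}_g = \mathbf{x}_t$ and that $\sigma_g^2$ is a fixed value, where $\mathbf{x}_t$ is the incoming data point. The choice of $\sigma_g^2$ is the result of experimental observation about the typical spread of successive pixel values in small time windows. The five divergence measures between $g$ and $f_i$, $i = 1, \ldots, 5$, are computed by the following formula:

$$J(f_i, g) = \frac{3}{2}\left(\frac{\sigma_i}{\sigma_g} - \frac{\sigma_g}{\sigma_i}\right)^2 + \frac{1}{2}\left(\frac{1}{\sigma_i^2} + \frac{1}{\sigma_g^2}\right)(\boldsymbol{\mu}_g - \boldsymbol{\mu}_i)^T(\boldsymbol{\mu}_g - \boldsymbol{\mu}_i)$$

Once the five divergence measures have been calculated, we find the distribution $f_j$ ($1 \leq j \leq 5$) for which

$$J(f_j, g) = \min_{1 \leq i \leq 5} J(f_i, g)$$

and we have a match between $f_j$ and $g$ if and only if

$$J(f_j, g) \leq K^*$$

where $K^*$ is a prespecified cutoff value. In the case where $J(f_j, g) > K^*$, the incoming distribution $g$ cannot be matched to any of the existing distributions.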

The key point here is that we measure dissimilarity against

all the available distributions. Other approaches, like [13],

measure dissimilarity against the existing distributions in a

certain order. Depending on the satisfaction of a certain con-

dition the process may stop before all five measurements are

taken and compared. We will see in Section VI-B4 how this

“preferential” treatment can weaken the performance of the

segmenter under certain weather scenarios.
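The matching operation can be sketched in a few lines. The sketch below uses the closed form of the Jeffreys divergence between two trivariate Normal distributions with isotropic covariance, which follows from the definition of the divergence; the function names, the component parameters, the variance assigned to the incoming point, and the cutoff are all hypothetical values chosen for illustration.

```python
def jeffreys(mu_f, var_f, mu_g, var_g):
    """Jeffreys divergence between N(mu_f, var_f * I) and N(mu_g, var_g * I)."""
    d = len(mu_f)  # 3 for an RGB pixel
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu_f, mu_g))
    ratio = var_f / var_g
    # (sigma_f/sigma_g - sigma_g/sigma_f)^2 == ratio + 1/ratio - 2
    spread_term = 0.5 * d * (ratio + 1.0 / ratio - 2.0)
    mean_term = 0.5 * (1.0 / var_f + 1.0 / var_g) * sq_dist
    return spread_term + mean_term


def match(components, pixel, var_g, cutoff):
    """Return the index of the closest component, or None if none is close enough.

    components: list of (mean_vector, variance) pairs.
    The incoming pixel is wrapped in a narrow Normal centered on its own value.
    """
    scores = [jeffreys(mu, var, pixel, var_g) for mu, var in components]
    j = min(range(len(scores)), key=scores.__getitem__)
    return j if scores[j] <= cutoff else None


# Hypothetical mixture: a wide component listed first and a narrow one listed second.
components = [((100.0, 100.0, 100.0), 400.0), ((120.0, 118.0, 119.0), 30.0)]
pixel = (119.0, 119.0, 119.0)
print(match(components, pixel, var_g=25.0, cutoff=10.0))
```

In this example the pixel lies within 2.5 standard deviations of the wide first component, so an ordered test would stop there, whereas comparing against all components selects the narrow second one.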

2) Model Update When a Match is Found: If the incoming distribution matches one of the existing distributions, we pool them together into a new Normal distribution. This new Normal distribution is considered to represent the current state of the pixel. The state is labeled either background or foreground depending on the position of the matched distribution in the ordered list of distributions. The next issue to clarify is how we update the parameters of the mixture. We use the Method of Moments. First, we introduce a learning parameter $\alpha$, which weighs on the weights of the existing distributions. So we subtract a fraction $\alpha$ of the weight from each of the five existing weights and we assign it to the incoming distribution's weight. In other words, the incoming distribution has weight $\alpha$ since

$$\sum_{i=1}^{5} \alpha \pi_i = \alpha \sum_{i=1}^{5} \pi_i = \alpha$$

and the five existing distributions have weights $\pi_i (1 - \alpha)$, $i = 1, \ldots, 5$.

Obviously, for $\alpha$ we need to have $0 < \alpha < 1$. The choice of $\alpha$ depends mainly on the choice of the matching cutoff $K^*$. The two quantities are inversely related. The smaller the value of $K^*$, the higher the value of $\alpha$ and vice versa. The values of $K^*$ and $\alpha$ are also affected by how much noise we have in the monitoring area. So if, for example, we were monitoring an outside region and had a lot of noise due to environmental conditions (rain, snow, etc.), then we would need a "high" value of $K^*$ and thus a "small" value of $\alpha$, since non-match to one of the distributions is very likely to be caused by background noise. On the other hand, if we were recording indoors where the noise is almost nonexistent, then we would prefer a "small" value of $K^*$ and thus a "higher" value of $\alpha$, because any time that we do not get a match to one of the existing five distributions, that is very likely to occur due to some foreground movement (since the background has almost no noise at all).

Let us assume that we have a match between the new distribution $g$ and one of the existing distributions $f_j$ where $1 \leq j \leq 5$. Then, we update the weights of the mixture model as follows:

$$\pi_{i,t} = (1 - \alpha)\pi_{i,t-1}, \quad i = 1, \ldots, 5 \text{ and } i \neq j$$

and

$$\pi_{j,t} = (1 - \alpha)\pi_{j,t-1} + \alpha$$

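We also update the mean vectors and the variances. If we call $w_1 = (1 - \alpha)\pi_{j,t-1}$, i.e., $w_1$ is the weight of the $j$th component (which is the winner in the match) before pooling it with the new distribution $g$, and if we call $w_2 = \alpha$, i.e., the weight of the new observation, then define

$$\rho = \frac{w_2}{w_1 + w_2} = \frac{\alpha}{(1 - \alpha)\pi_{j,t-1} + \alpha}$$

Using the method of moments [26], we get

$$\boldsymbol{\mu}_{j,t} = (1 - \rho)\boldsymbol{\mu}_{j,t-1} + \rho\,\boldsymbol{\mu}_g$$

$$\sigma_{j,t}^2 = (1 - \rho)\sigma_{j,t-1}^2 + \rho\,\sigma_g^2 + \rho(1 - \rho)(\mathbf{x}_t - \boldsymbol{\mu}_{j,t-1})^T(\mathbf{x}_t - \boldsymbol{\mu}_{j,t-1})$$

while the other four (unmatched) distributions keep the same mean and variance that they had at time $t - 1$.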
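The update for a matched component can be sketched as below. This is a hedged illustration, not the authors' implementation: it assumes every weight gives up a fraction of itself equal to the learning parameter, the matched component receives that amount, and the matched mean and variance are pooled with a narrow distribution centered on the incoming pixel using the standard two-component moment formulas. The function name and all numeric values are hypothetical.

```python
def update_on_match(weights, means, variances, j, pixel, var_g, alpha):
    """Return updated (weights, means, variances) after component j matched.

    Assumed pooling rule: mixing fraction rho = w2 / (w1 + w2), where w1 is the
    decayed weight of the matched component and w2 = alpha is the weight of the
    incoming observation.
    """
    # Every weight gives up a fraction alpha; the matched one receives alpha.
    new_weights = [(1.0 - alpha) * w for w in weights]
    w1 = new_weights[j]
    w2 = alpha
    rho = w2 / (w1 + w2)
    new_weights[j] = w1 + w2

    mu_old = means[j]
    sq_dist = sum((x - m) ** 2 for x, m in zip(pixel, mu_old))

    new_means = list(means)
    new_variances = list(variances)
    # Pooled mean and variance of the matched component; others are untouched.
    new_means[j] = tuple((1.0 - rho) * m + rho * x for m, x in zip(mu_old, pixel))
    new_variances[j] = (
        (1.0 - rho) * variances[j] + rho * var_g + rho * (1.0 - rho) * sq_dist
    )
    return new_weights, new_means, new_variances


# Hypothetical five-component mixture, for illustration only.
weights = [0.40, 0.30, 0.15, 0.10, 0.05]
means = [(100.0,) * 3, (120.0,) * 3, (30.0,) * 3, (200.0,) * 3, (60.0,) * 3]
variances = [400.0, 30.0, 50.0, 80.0, 60.0]

w, m, v = update_on_match(weights, means, variances, j=1,
                          pixel=(119.0, 119.0, 119.0), var_g=25.0, alpha=0.1)
print(w, m[1], v[1])
```

The weights still sum to one after the update, and only the matched component changes its mean and variance.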

3) Model Update When a Match is Not Found: In the case

where a match is not found (i.e.,

then we commit the current pixel state to be foreground and

we replace the last distribution in the ordered list with a new

one. The parameters of the new distribution are computed as

follows.

1) The mean vector

is replaced with the incoming

pixel value.

2) The variance

is replaced with the minimum vari-

ance from the list of distributions.

3) The weight of the new distribution is computed as fol-

lows:

where is the background threshold index. This for-

mula guarantees the classification of the current pixel

state as foreground. The weights of the remaining four

distributions are updated according to the following

formula:

4) Justification of the Modifications Introduced to Normal

Mixture Modeling: We initially implemented the Normal

Mixture Modeling reported in [13] . The performance of the

moving object segmenter under that scheme was satisfactory

in the experimental trials and we did not plan on modifying

the approach in any way. During late spring and early

summer of 2000, however, weather phenomena in Min-

neapolis revealed some weak points of the method. During

this time of year, the weather in Minneapolis features broken

clouds, due to increased evaporation from the lakes and

brisk Canadian winds. Small clouds of various density pass

rapidly across the camera’s field of view in high frequency.

This type of weather substantially affected the performance

of the segmenter and either increased dramatically the false

alarms or reduced the detection sensitivity depending on

how we set the algorithmic parameters.

In [13], the distributions of the mixture model are always

kept in a descending order according to $\pi/\sigma$, where $\pi$ is the weight and $\sigma^2$ the variance of each distribution. Then, incoming pixels are matched against the ordered distributions

in turn from the top toward the bottom of the list. If the in-

coming pixel value is found to be within 2.5 standard de-

viations of a distribution, then a match is declared and the

process stops. This method is vulnerable to the following scenario: An incoming pixel value is more likely to belong, for example, to distribution 4 but still satisfies the 2.5 standard deviation criterion for a distribution earlier in the queue (e.g., 2). Then, the process stops before it reaches the right distribution and a match is declared early (see Fig. 8). The match is followed by a model update that unjustly favors the wrong distribution. These cumulative errors can affect the performance of the system after a certain point. They can even have an immediate and serious effect if one distribution (e.g., 2) happens to be background and the other (e.g., 4) foreground.

The above scenario can be put into motion by fast moving

clouds. In [13], when a new distribution is introduced into

the system it is centered around the incoming pixel value

and is given an initially high variance and small weight. As

more evidence accumulates, the variance of the distribution

drops and its weight increases. Consequently, the distribu-

tion advances in the ordered list of distributions. Because,

however, the weather pattern is very active, the variance of

the distribution remains relatively high since supporting evidence is switched on and off at high frequency. This results in a mixture model with distributions that are relatively spread out. If an object of a certain color happens to move in the scene during this time, it generates incoming pixel values that may marginally match distributions at the top of the queue and are therefore interpreted as background. Since the moving clouds affect wide areas of the camera's field of view, post-processing cannot save the day.

Fig. 8. Visualization of the failure mode of the method described in [13].

In contrast, our method does not try to match the incoming

pixel value from the top to the bottom of the ordered distri-

bution list. It rather creates a narrow distribution that repre-

sents the incoming data point. Then, it performs the match

by finding the minimum divergence value between the in-

coming distribution and all the distributions of the mixture

model (see Fig. 9). In this manner, the incoming data point

has a much better chance of being matched to the right dis-

tribution than in [13].

C. Multiple Hypotheses Predictive Tracking

In the previous section we described a statistical procedure

to perform on-line segmentation of foreground pixels corre-

sponding to moving objects of interest, i.e., people and ve-

hicles. In this section, we describe how to form trajectories

traced by the various moving objects. Fig. 10 shows a snap-

shot of the output from the various computer vision modules

of DETER. The basic requirement for forming object trajec-

tories is the calculation of blob centroids (corresponding to

moving objects). Blobs are formed after we apply a standard

8-connected component analysis algorithm to the foreground

pixels. The connected component algorithm filters out blobs

with area less than 27 pixels as noise. According

to our optical computation in Section V, this is the minimal

pixel footprint of the smallest object of interest (human) in

the camera’s FOV.

A Multiple Hypotheses Tracking (MHT) algorithm is then

employed that groups the blob centroids of foreground ob-

jects into distinct trajectories. MHT is considered to be the

best approach to multitarget tracking applications. It is a re-

cursive Bayesian probabilistic procedure that maximizes the

probability of correctly associating input data with tracks. Its

superiority against other tracking algorithms stems from the

fact that it does not commit early to a trajectory. Early commitment usually leads to mistakes. MHT groups the input data into trajectories only after enough information has been collected and processed. In this context, it forms a number of candidate hypotheses regarding the association of input data with existing trajectories. MHT has been shown to be the method of choice for applications with heavy clutter and dense traffic. In difficult multitarget tracking problems with crossed trajectories, MHT performs very well as opposed to other tracking procedures such as the Nearest Neighbor (NN) correlation and the Joint Probabilistic Data Association (JPDA) [27].

Fig. 9. Visual representation of the way our method matches incoming data points to existing distributions.

Fig. 11 depicts the architecture of our MHT algorithm. An

integral part of any tracking system is the prediction module.

Prediction provides estimates of moving objects’ states and

in the DETER system is implemented as a Kalman filter.

Kalman filter predictions are made based on a priori models

for target dynamics and measurement noise. Validation is

a process which precedes the generation of hypotheses re-

garding associations between input data (blob centroids) and

the current set of trajectories (tracks). Its function is to ex-

clude, early on, associations that are unlikely to happen thus

limiting the number of possible hypotheses to be generated.

Central to the implementation of the MHT algorithm is the

generation and representation of track hypotheses. Tracks are

generated based on the assumption that a new measurement

may:

1) belong to an existing track;

2) be the start of a new track;

3) be a false alarm.

Assumptions are validated through the validation process

before they are incorporated into the hypothesis structure. The complete set of track hypotheses can be represented by a hypothesis matrix as shown in Table 1. The hypothetical situation in Table 1 corresponds to two scans of 2 and 1 measurements, respectively, each made on a separate frame.

Some notation clarification is in order. A measurement $z_j(k)$ is the $j$th observation (blob centroid) made on frame $k$. In addition, a false alarm is denoted by 0, while a new track is shown together with the old track from which it was generated. The first column in this table is the Hypothesis index. In our example case we have a total of four hypotheses generated during scan 1, and eight more are generated during scan 2. The last column lists the tracks that the particular hypothesis contains (e.g., one hypothesis contains tracks 1 and 4). The row cells in the hypothesis table denote the tracks to which the particular measurement $z_j(k)$ belongs (e.g., under one hypothesis a measurement belongs to track 5). A hypothesis matrix is represented computationally by a tree structure as it is schematically shown in Fig. 12. The branches of the tree are in essence the hypotheses about measurement-track associations.

Fig. 10. Visualization of the computer vision operation of DETER. The snapshot was taken "live" on March 3, 2000. (a) Live video feed. (b) Segmented moving object. (c) Dynamically updated background. (d) Trajectories of the current moving objects. (e) Centroids of the moving objects. (f) Results of the blob analysis. (g) Cumulative trajectory visualization of human and vehicle traffic for the past hour.

Fig. 11. Architecture of the MHT algorithm.
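The hypothesis counts quoted for the example (four after the first scan, eight more after the second) can be reproduced with a small enumeration. The sketch below is an illustration under stated assumptions, not the DETER implementation: a track is represented by the tuple of measurements assigned to it, a hypothesis is a set of tracks, and within one scan a track may receive at most one measurement. The function name `expand_scan` is hypothetical.

```python
def expand_scan(hypotheses, scan, n_measurements):
    """Expand every hypothesis with one scan of measurements.

    Each measurement is a false alarm, the start of a new track, or the
    continuation of an existing track not already updated in this scan.
    """
    # Pair each hypothesis with the tracks already touched during this scan.
    states = [(h, frozenset()) for h in hypotheses]
    for m in range(1, n_measurements + 1):
        z = (scan, m)  # label of the m-th measurement of this scan
        next_states = []
        for tracks, used in states:
            # 1) false alarm: the hypothesis is unchanged
            next_states.append((tracks, used))
            # 2) start of a new track
            born = (z,)
            next_states.append((tracks | {born}, used | {born}))
            # 3) continuation of an existing, not yet updated, track
            for t in tracks - used:
                extended = t + (z,)
                next_states.append(((tracks - {t}) | {extended}, used | {extended}))
        states = next_states
    return [tracks for tracks, _ in states]


after_scan_1 = expand_scan([frozenset()], scan=1, n_measurements=2)
after_scan_2 = expand_scan(after_scan_1, scan=2, n_measurements=1)
print(len(after_scan_1), len(after_scan_2) - len(after_scan_1))
```

Even in this tiny example the number of hypotheses triples in one scan, which illustrates why pruning measures are needed in practice.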

As is evident from the above example, the hypothesis tree can grow exponentially with the number of measurements. We apply two measures to reduce the number of hypotheses. Our first measure is to cluster the hypotheses into disjoint sets [28]. In this sense, tracks that do not compete for the same measurements compose disjoint sets which in turn are associated with disjoint hypothesis trees. Our second measure is to assign probabilities to every branch of the hypothesis trees. Only the set of branches with the highest probabilities is considered. The derivation of hypothesis probabilities is out of the scope of this paper. However, the interested reader is referred to [28] and [29]. It suffices to say that a recursive Bayesian methodology is followed for calculating hypothesis probabilities from frame to frame.

Table 1

Complete Set of Track Hypotheses with the Associated Sets of Tracks

Fig. 12. Formation of a hypothesis tree.

VII. MULTICAMERA FUSION

Monitoring of large sites (such as parking lots) can be

accomplished only through the coordinated use of multiple

cameras. In DETER, we need to have seamless tracking of

humans and vehicles across the whole geographical area

covered by all cameras. We produce a panoramic view of

the HL parking lot by fusing the individual camera FOVs.

Then, object motion is registered against a global coordinate

system. We achieve multicamera registration (fusion) by

computing the Homography transformation between pairs

of cameras (see Fig. 13). Our homography computation

procedure takes advantage of the overlapping that exists

between pairs of camera FOVs. We use the pixel coordinates

of more than four points to calculate the homography trans-

formation matrix. These points are projections of physical

ground plane points that fall in the overlapping area between

the two camera FOVs. We select and physically mark these

points on the ground with paint during the installation phase.

We then sample the corresponding projected image points

through the DETER graphical user interface (GUI). This

process happens only once, at the beginning, and is never

repeated once the camera cross-registration is complete.

A. Homography Computation

The homography computation is challenging primarily for

two reasons.

• It is an underconstrained problem that is usually based

on a small number of matching points.

• It introduces inaccuracies in specialized transforma-

tions (e.g., pure rotation or translation).

A very popular and relatively simple method for the com-

putation of the homography matrices is the so-called least

squares method [16]. This method may provide a poor so-

lution to the underconstrained system of equations due to

biased estimation. It also cannot effectively specialize the

general homography computation when special cases are at

hand.
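For reference, the basic least squares approach mentioned above can be sketched in plain Python. This sketch is an assumption-laden illustration of the simple method only (it is not the statistically optimized algorithm adopted for DETER): it fixes the last entry of the homography matrix to 1 and solves the normal equations, which is simple but numerically less robust than an SVD-based solution. All function names and the example points are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_homography(src, dst):
    """Least squares homography from >= 4 point pairs, with H[2][2] fixed to 1."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    # Normal equations: (A^T A) h = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(8)]
    h = solve(ata, atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_h(H, point):
    """Map a pixel coordinate through the homography H."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Synthetic, noise-free landmark points generated from a made-up homography.
H_true = [[1.2, 0.1, 5.0], [-0.05, 0.9, -3.0], [0.01, 0.02, 1.0]]
src = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (5.0, 2.0), (3.0, 8.0)]
dst = [apply_h(H_true, p) for p in src]
H = fit_homography(src, dst)
print([[round(v, 4) for v in row] for row in H])
```

With noisy landmark coordinates the same code still returns a solution, but that solution is the biased estimate the text warns about.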

We have adopted the algorithm by Kanatani [17] to com-

pute the homography matrices. The algorithm is based on

a statistical optimization theory for geometric computer vi-

sion [18] and cures the deficiencies exhibited by the least

squares method. The basic premise is that the epipolar con-

straint may be violated by various noise sources due to the

statistical nature of the imaging problem (see Fig. 14).

VIII. THREAT ASSESSMENT

Automation is clearly necessary to allow limited and fal-

lible human attention to monitor a large protected space. The

primary objective of DETER is to alert security personnel to

just those activities that require their scrutiny, while ignoring

innocuous use. DETER achieves its objective by processing

the computer vision information through its threat assess-

ment module. All of the threat assessment analysis is done

after converting the pixel coordinates of the object tracks

into a world coordinate system set by the CAD drawing of

the facility. Thus, we can use well-known landmarks to pro-

vide content for evaluating intent. Such landmarks include

individual parking spots, lot perimeter, power poles, and tree

lines. The coordinate transformation is achieved through the

use of the optical computation package CODE V.

The feature assembly uses the trajectory information pro-

vided by the computer vision module to compute relevant

higher level features on a per-vehicle/pedestrian basis. The

features are designed to capture “common sense” beliefs

about innocuous, law abiding trajectories, and the known or

supposed patterns of intruders. In the current prototype, the

features calculated include the following:

• number of sample points;

• starting position (

);

• ending position (

);

• path length;

• distance covered (straight line);

• distance ratio (path length/distance covered);

• start time (local wall clock);

• end time (local wall clock);

• duration;

• average speed;

• maximum speed;

• speed ratio (average/maximum);

• total turn angles (radians);

• average turn angles;

• number of “M” crossings.

Most of these are self explanatory, but a few are not so

obvious. The wall clock is relevant since activities on some

paths are automatically suspect at certain times of day—par-

ticularly late night and early morning.

The turn angles and distance ratio features capture aspects

of how circuitous the path followed was. The legitimate users

of the facility tend to follow the most direct paths permitted

by the lanes. “Browsers” may take a more serpentine course.
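Several of the listed features follow directly from the sampled track. The sketch below shows one plausible way to compute them, assuming each sample is a tuple (t, x, y) with time in seconds and position in world coordinates. The function name `trajectory_features` and the dictionary keys are illustrative and do not come from the DETER implementation.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[float, float, float]  # (t in seconds, x, y) in world coordinates


def trajectory_features(track: List[Sample]) -> Dict[str, float]:
    """Compute path, speed, and turn features for one tracked object."""
    if len(track) < 2:
        raise ValueError("need at least two samples")
    steps = list(zip(track, track[1:]))
    seg_len = [math.hypot(b[1] - a[1], b[2] - a[2]) for a, b in steps]
    seg_dt = [b[0] - a[0] for a, b in steps]
    speeds = [d / dt for d, dt in zip(seg_len, seg_dt) if dt > 0]
    # Heading of every segment that actually moves.
    headings = [math.atan2(b[2] - a[2], b[1] - a[1])
                for (a, b), d in zip(steps, seg_len) if d > 0]
    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap the heading change into [-pi, pi) before taking its magnitude.
        diff = (h1 - h0 + math.pi) % (2 * math.pi) - math.pi
        turns.append(abs(diff))
    path_length = sum(seg_len)
    straight = math.hypot(track[-1][1] - track[0][1], track[-1][2] - track[0][2])
    duration = track[-1][0] - track[0][0]
    avg_speed = path_length / duration if duration > 0 else 0.0
    max_speed = max(speeds) if speeds else 0.0
    return {
        "num_samples": float(len(track)),
        "path_length": path_length,
        "distance_covered": straight,
        "distance_ratio": path_length / straight if straight > 0 else math.inf,
        "duration": duration,
        "average_speed": avg_speed,
        "maximum_speed": max_speed,
        "speed_ratio": avg_speed / max_speed if max_speed > 0 else 0.0,
        "total_turn": sum(turns),
        "average_turn": sum(turns) / len(turns) if turns else 0.0,
    }


# An L-shaped walk: 3 m east, then 4 m north, one sample per second.
features = trajectory_features([(0.0, 0.0, 0.0), (1.0, 3.0, 0.0), (2.0, 3.0, 4.0)])
```

A direct path gives a distance ratio close to 1 and a small total turn angle, while a serpentine course inflates both.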

The “M” crossings feature attempts to monitor a

well-known tendency of car thieves to systematically check

multiple parking stalls along a lane, looping repeatedly back

to the car doors for a good look or lock check (two loops

yielding a letter “M” profile). This can be monitored by

keeping reference lines for the parking stalls and counting

the number of traversals into stalls. An “M” type pedestrian

crossing captured by DETER is illustrated in Fig. 15.
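Counting traversals across a stall reference line can be sketched as follows. The example assumes that a row of stalls is bounded by a single straight reference line through the points a and b, treated as infinite, and that the stalls lie to the left of the direction a to b. Both conventions and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def side(a: Point, b: Point, p: Point) -> float:
    """Signed area test: positive when p lies left of the directed line a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def count_stall_entries(track: List[Point], a: Point, b: Point) -> int:
    """Count traversals from the lane side to the stall side of line a -> b."""
    entries = 0
    inside = side(a, b, track[0]) > 0 if track else False
    for p in track[1:]:
        now_inside = side(a, b, p) > 0
        if now_inside and not inside:
            entries += 1  # the object has just crossed into the stall area
        inside = now_inside
    return entries


# A pedestrian who dips into the stall area twice traces an "M" profile.
walk = [(0.0, -1.0), (1.0, 1.0), (2.0, -1.0), (3.0, 1.0), (4.0, -1.0)]
m_crossings = count_stall_entries(walk, (0.0, 0.0), (10.0, 0.0))
```

A practical version would keep one bounded reference segment per stall so that crossings can be attributed to individual stalls.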

Fig. 13. Fused view from two DETER cameras. Because we compute a near optimal camera

configuration scheme (coverage versus cost), the cameras are far apart and their optical axes form

angles that vary wildly. As a result, one can notice the substantial image skewing produced by the

highly nonlinear homography transformation. Despite the nonlinearity we achieve smooth image

display thanks to a proprietary Honeywell warping algorithm.

Fig. 14. The statistical nature of the imaging problem affects the epipolar constraint. The figure shows the optical centers of the two cameras and a point P in the scene that falls in the common area between the two camera FOVs. Ideally, the corresponding viewing vectors are coplanar. Due to the noisy imaging process, however, the actual vectors may not be coplanar.

Fig. 15. An M-pattern traced by DETER. The centroids constituting the track are superimposed on the parking lot’s CAD drawing. The M-pattern is a stroll mode favored by potential car thieves and it was one of the events staged during the benchmark recording.

The output of the feature assembly module for trajectories recorded from the site over some period of time is fed into the off-line training module. The goal of off-line training is to produce threat models based on a database of features. In the current system, we have gathered data by running DETER over a period of several hours. During this period, we staged several suspicious events (like “M” type strolls) to enrich our data collection. We then manually labeled the individual object trajectories as either innocuous (OK) or suspicious (THREAT). In the future, a clustering algorithm (see Fig. 2) will assist in the production of more parsimonious descriptions of object behavior. The complete training data consist of the labeled trajectories and the corresponding feature vectors. They are all processed together by a classification tree induction algorithm based on CART [30]. The trained classifier is then used on-line to classify incoming live data as either innocuous or suspicious.
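To make the training step concrete, the following sketch grows a small classification tree by greedy splitting on Gini impurity, in the spirit of CART. It is a simplified stand-in and not the induction algorithm of [30]. The two features used (distance ratio and number of "M" crossings) and all training values are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple, Union

Row = Sequence[float]
Tree = Union[str, tuple]  # a leaf label, or (feature, threshold, left, right)


def gini(labels: List[str]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows: List[Row], labels: List[str]) -> Optional[Tuple[int, float]]:
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= thr]
            right = [y for r, y in zip(rows, labels) if r[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score - 1e-12:
                best, best_score = (j, thr), score
    return best


def grow(rows: List[Row], labels: List[str], depth: int = 0, max_depth: int = 3) -> Tree:
    """Recursively split until nodes are pure or the depth limit is reached."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    j, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > thr]
    return (j, thr,
            grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def classify(tree: Tree, row: Row) -> str:
    """Walk from the root to a leaf and return its label."""
    while not isinstance(tree, str):
        j, thr, left, right = tree
        tree = left if row[j] <= thr else right
    return tree


# Hypothetical labeled feature vectors: (distance ratio, "M" crossings).
train_rows = [(1.1, 0), (1.2, 0), (1.3, 1), (2.5, 2), (3.0, 3), (1.4, 2)]
train_labels = ["OK", "OK", "OK", "THREAT", "THREAT", "THREAT"]
tree = grow(train_rows, train_labels)
```

The on-line step then reduces to calling `classify(tree, feature_vector)` for every completed track.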

IX. EXPERIMENTAL RESULTS

At the time of writing, DETER has been operating for

almost a year. During this time there have been incremental

improvements at the algorithmic and software level. We use

the experience of the building’s guards as the primary feed-

back mechanism. This feedback is primarily qualitative but

is very important since this is the way products are evaluated

in the security market place. The fundamental criteria they

use in their evaluation are as follows.

• Is the system trustworthy? In other words, does it pro-

duce a lot of false alarms or does it miss important

events?

• How does it compare with the legacy system?

• Does it add value to their function?

• Is it easy to learn and operate?

In the matter of trustworthiness, the guards were the first to pinpoint the faulty behavior of the system when the weather featured broken clouds and brisk winds. This prompted the investigation by the R&D team that led to the modification of the moving object segmenter. After the modified computer vision subsystem was put into use in August 2000, the problem was fixed and no other major complaints arose.

The guards were also very excited about some functions of DETER that did not relate directly to automated surveillance. An example was the fusion of the multiple camera fields of view into a super picture and its projection on a big flat panel display. This gives the guards a comprehensive view of the entire perimeter of the building and does not fragment their attention. This attitude is a testament to the anthropocentric character of the security market.

The only persistent complaint that still stands concerns the user interface portion of DETER. Ultimately, the guards would like to add functions, like the detection of overspeeding, by pointing and clicking. Right now they need the help of a member of the R&D team whenever they want to set a new function for the threat assessment module.

In addition to the qualitative testing performed by the ac-

tual users, we also performed quantitative testing for bench-

marking purposes. Since August 11, 2000, we measured the

tracking performance of DETER in the HL parking lot for

8 h. The testing was done in 1 h increments spread over dif-

ferent days, times of day, and seasons. Meticulous ground-

truthing was performed by two R&D engineers and their

results were compared and reconciled for accuracy. We se-

lected this data set to fulfill certain requirements.

1) Sizeable duration (several hours).

2) Scenarios with significant traffic and others predominantly inactive. Typical busy times that were captured were in mid-afternoon during a workday when people were leaving for their homes. Typical inactive times were late night hours.

3) Inclusion of some unusual events. We have induced these events ourselves in the absence of criminal activity.

4) Challenging weather conditions. We have included a partly cloudy day with strong winds (1 h). We have also included a snowy day (1 h) and a rainy day (1 h).

Table 2. Experimental Results for the 8-Hour-Long Data Set

Table 2 shows the results of the DETER performance in

the field tests. The ground truth was established by indexing back

the actual events on the video clip to the annotated output

of DETER on the CAD design of our lot (see Fig. 16).

Parking lot activity included walking and running of a single

individual, simultaneous walking of a number of individuals

(following crossing or parallel paths), driving of a single

and multiple cars, and finally a combination of cars and

humans in motion. As we explained earlier, staged events

included geometrically interesting walking patterns such as

the ones we call M-Patterns (see Fig. 15) and dangerous

driving. These events were identified as suspicious by the

Threat Assessment classifier.

DETER detected and tracked perfectly 554 objects out

of 666. In 77 instances, DETER momentarily lost track

of the object but regained it very quickly. The result was

a split track. That was typically the case with pedestrians

as they ventured momentarily under the tree lines (summer

and early fall trials). Tracking was correctly resumed once

the pedestrians were again out of the tree line and in clear

view. We do not consider the split tracks of pedestrians as a

sign of algorithmic weakness. DETER employs a relatively

small number of cameras because it is a cost-sensitive ap-

plication. Therefore, during summer time when the trees are

in full leaf, coverage under the tree lines is not perfect.

The problem can be solved by employing additional cam-

eras if split tracks prove to be a serious security loophole

(cost versus risk analysis). Alternatively, DETER can main-

tain the same number of cameras and recognize objects that

appear and disappear from the FOV within short time in-

tervals. To perform this recognition function, DETER needs

cameras with higher resolution to capture detailed features

of cars and especially humans. A solution would be to have

the DETER cameras equipped with automated zoom mech-

anisms. Then they will be able to zoom in momentarily on

every detected object and capture a detailed object signature.

This capability will substantially increase the algorithmic

and software complexity of DETER.

Another type of event that was prone to split tracks was the unparking of vehicles in the parking lot. As the vehicles back up to get out of the parking stall, they stop temporarily before

they start moving forward. This results in the loss of track

association. This is a predictive tracking problem and not an

object segmentation problem. For all practical purposes, it

does not have any substantial effect on the intended use of

the system and we have decided to ignore it.

On a few occasions (16) where pedestrians were moving next to each other (party of two), DETER correctly detected and tracked the motion but as a single object. This is a camera resolution problem. If we covered less area with each camera, the resolution would have been better and the segmentation of closely spaced moving objects more accurate. This loss of information would have been important only if we were interested in monitoring human interaction.

Fig. 16. Live video snapshot of a car moving out of the parking lot and its itinerary (line marked by the letter D) as it is recorded by DETER at the CAD design level.

DETER produced a small number of false alarms. Four of the five false alarms were produced on a snowy day as accumulated icy snow was hanging from the top cover of one of the cameras.

Finally, DETER missed altogether three objects—all

pedestrians. The puzzling thing is that all three cases were

recorded on a clear day and the objects were in clear view

of the cameras. The issue is under study. Although the

number of missed objects is small, it is clearly a concern

since it relates to DETER’s most important requirement—to

function as a sophisticated motion detector.

In general, the computer vision part of DETER and partic-

ularly the moving object segmenter performed very well for

the purposes of its intended use.
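The counts reported above can be turned into rates with simple arithmetic. The sketch uses only the figures stated in the text (666 objects, 554 tracked perfectly, 77 split tracks, 3 missed). How the 16 merged pairs and the unparking splits relate to the total of 666 cannot be determined from the text alone, so they are left out. The dictionary keys are illustrative.

```python
# Counts reported in the text for the 8 h field test.
TOTAL_OBJECTS = 666
counts = {
    "tracked perfectly": 554,
    "split track (momentary loss)": 77,
    "missed": 3,
}

# Fraction of all ground-truth objects falling in each category.
rates = {name: n / TOTAL_OBJECTS for name, n in counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

Roughly 83% of the objects were tracked without interruption, and fewer than 1% were missed altogether.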

We have also set up a laboratory experiment to quantify the performance of our latest moving object segmenter with regard to the old moving object segmenter modeled after [13]. The experiment was geared to gauge the performance of the two systems under frequent global illumination changes. The experiment took place in our lab where we had a model train that was running up and down a fixed track. During the experiment, we were switching on and off some of the overhead lights randomly from time to time to emulate the effect of passing clouds (see Fig. 17). The experiment ran for 15 min. During this period, the model train made 30 passes through the camera’s field of view and, thus, a perfect detection and tracking performance would have produced 30 tracks. Table 3 shows the results of the experiment.

Fig. 17. Three different snapshots from the lab experimental setup. The scene appears in three different lighting conditions. One can notice the proximity in tones of the train and the floor background.

Table 3. Experimental Results for the Comparative Experiment in the Laboratory

The older system modeled after [13] produced a rather high number of instances of split and missed tracks, verifying the field test indications and our theoretical analysis (see Section VI-B4). This behavior can be rectified if one lowers substantially the background threshold B that defines how many of the distributions can be considered background at each point. Of course, the system then performs at a high false alarm rate, which is worse because it affects performance during normal weather conditions. Our modified system exhibited substantially better detecting power at only a slightly higher false alarm rate.
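The role of the background threshold can be illustrated with the selection rule of the mixture model in [13]: the distributions at a pixel are ranked by weight over standard deviation, and the leading ones whose cumulative weight first exceeds the threshold are treated as background. The sketch below follows that rule as an assumption about the older segmenter. The function name, the parameter name `threshold`, and the example mixture are hypothetical.

```python
from typing import List, Tuple

Gaussian = Tuple[float, float]  # (weight, standard deviation)


def background_count(mixture: List[Gaussian], threshold: float) -> int:
    """Number of leading distributions treated as background at one pixel."""
    # Stable, heavily weighted distributions come first.
    ordered = sorted(mixture, key=lambda g: g[0] / g[1], reverse=True)
    total = 0.0
    for count, (weight, _sigma) in enumerate(ordered, start=1):
        total += weight
        if total > threshold:
            return count
    return len(ordered)


# Hypothetical three-component mixture at one pixel.
mixture = [(0.5, 2.0), (0.3, 4.0), (0.2, 10.0)]
n_high = background_count(mixture, 0.7)  # two distributions count as background
n_low = background_count(mixture, 0.4)   # only the dominant one does
```

With a lower threshold fewer distributions qualify as background, so more pixels are declared foreground. That raises detection power and, as noted above, the false alarm rate.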

X. CONCLUSION AND FUTURE WORK

We have presented DETER, a prototype urban surveil-

lance system for monitoring large open spaces. We have

provided the context of the current state of the security

market and how it affected the design of DETER. DETER

reliably tracks humans and vehicles both day and night. It

consists of a computer vision module and a threat assess-

ment module. The two primary components of the computer

vision module are the moving object segmenter and the

associated tracker. We have adopted the general approach

described in [13]. We have introduced, however, some

modifications that improve the performance of the system

when there is high frequency of global illumination changes.

Based on the object segmentation results, tracks are formed

using an MHT algorithm and external multicamera calibration is achieved through the computation of homographies.

The calibrated scene is mapped into the CAD design of the

area under surveillance to facilitate higher level reasoning.

The threat assessment module reports suspicious patterns

detected in the annotated trajectory data at the CAD level.

The threat assessor also uses the information produced by

the computer vision module to perform some nonsecurity

functions, like monitoring the capacity of the parking lot.

DETER is the result of a compromise between lofty research

and development ideals and the business and market reali-

ties. It is characteristic that the information produced by the

computer vision module is used only for a small number

of relatively simple functions (e.g., motion detection, recog-

nition of a few specific motion patterns, and detection of

overspeeding). The current experimental users of the proto-

type find these features nearly overwhelming. Our ongoing

work focuses on the development of a more sophisticated

user interface that will allow naive users of the system to

introduce new behaviors at the CAD level by pointing and

clicking away. Additionally, we are working toward the im-

provement of the threat assessment module with the inclu-

sion of a clustering algorithm. The clustering algorithm will

help in the partial automation of the off-line training, cur-

rently performed manually.

DETER is scheduled for productization in 2002, after the above-mentioned improvements are incorporated into the prototype. It is characteristic of the global nature of the security industry that the software maintenance of the product (or service) has been assigned to the Honeywell division in Bangalore, India, to keep the price competitive, and the marketing to the Honeywell Australian security division.

ACKNOWLEDGMENT

We would like to thank a number of individuals for con-

tributing to the success of this project, including K. Haigh,

M. Bazakos, J. Droesller, R. Van Riper, P. Reutiman, and T.

Faltesek.

REFERENCES

[1] J. A. Ratches, “Aided and automatic target recognition based upon sensory inputs from image forming systems,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, pp. 1004–1019, Sept. 1997.

[2] Vsam home page [Online]. Available: www.cs.cmu.edu/~vsam/vsamhome.html

[3] R. T. Collins, A. J. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto, O. Hasegawa, P. Burt, and L. Wixson, “A system for video surveillance and monitoring: Vsam final report,” Robotics Institute, Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-RI-TR-00–12, 2000.

[4] E. Stringa and C. S. Regazzoni, “Real-time video-shot detection for scene surveillance applications,” IEEE Trans. Image Processing, vol. 9, pp. 69–79, Jan. 2000.

[5] C. Sacchi and C. S. Regazzoni, “A distributed surveillance system for detection of abandoned objects in unmanned railway environments,” IEEE Trans. Veh. Technol., vol. 49, pp. 2013–2026, Sept. 2000.

[6] X. Gao, T. E. Boult, F. Coetzee, and V. Ramesh, “Error analysis of background adaptation,” in Proc. 2000 IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, Hilton Head Island, SC, June 2000, pp. 503–510.

[7] D. Comaniciu, V. Ramesh, and P. Meer, “Real-time tracking of non-rigid objects using mean shift,” in Proc. 2000 IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, Hilton Head Island, SC, June 2000, pp. 142–149.

[8] D. Ormoneit, H. Sidenbladh, M. J. Black, T. Hastie, and D. J. Fleet, “Learning and tracking human motion using functional analysis,” in Proc. 2000 IEEE Workshop Human Modeling, Analysis and Synthesis, Hilton Head Island, SC, June 2000, pp. 2–9.

[9] N. M. Oliver, B. Rosario, and A. P. Pentland, “A Bayesian computer vision system for modeling human interactions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, pp. 831–843, Aug. 2000.

[10] C. H. Anderson, P. J. Burt, and G. S. V. D. Wal, “Change detection and tracking using pyramid transform techniques,” in Proc. SPIE Int. Soc. Opt. Eng., vol. 579, Cambridge, MA, Sept. 16–20, 1985, pp. 72–78.

[11] I. Haritaoglu, D. Harwood, and L. S. Davis, “W4S: A real-time system for detecting and tracking people in 2 1/2 D,” in Proc. 5th Eur. Conf. Computer Vision, vol. 1, Freiburg, Germany, June 2–6, 1998, pp. 877–892.

[12] T. Kanade, R. T. Collins, A. J. Lipton, P. Burt, and L. Wixson, “Advances in cooperative multi-sensor video surveillance,” in Proc. DARPA Image Understanding Workshop, Monterey, CA, Nov. 1998, pp. 3–24.

[13] C. Stauffer and W. E. L. Grimson, “Adaptive background mixture models for real-time tracking,” in Proc. 1999 IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, Fort Collins, CO, June 23–25, 1999, pp. 246–252.

[14] C. Stauffer and W. E. L. Grimson, “Learning patterns of activity using real-time tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, pp. 747–757, Aug. 2000.

[15] A. Elgammal, D. Harwood, and L. Davis, “Non-parametric model for background subtraction,” in Proceedings IEEE FRAME-RATE Workshop, Corfu, Greece, Sept. 2000, www.eecs.lehigh.edu/FRAME.

[16] L. Lee, R. Romano, and G. Stein, “Monitoring activities from multiple video streams: Establishing a common coordinate frame,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, pp. 758–767, Aug. 2000.

[17] K. Kanatani, “Optimal homography computation with a reliability measure,” in Proc. IAPR Workshop Machine Vision Applications, Makuhari, Chiba, Japan, Nov. 1998, pp. 426–429.

[18] K. Kanatani, Statistical Optimization for Geometric Computer Vision: Theory and Practice. Amsterdam, The Netherlands: Elsevier, 1996.

[19] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2000.

[20] W. E. L. Grimson, C. Stauffer, R. Romano, and L. Lee, “Using adaptive tracking to classify and monitor activities in a site,” in Proc. 1998 IEEE Conf. Computer Vision and Pattern Recognition, Santa Barbara, CA, June 23–25, 1998, pp. 22–29.

[21] B. K. P. Horn, Robot Vision. Cambridge, MA: MIT Press, 1986, pp. 66–69.

[22] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm (with discussion),” J. Roy. Stat. Soc. B, vol. 39, pp. 1–38, 1977.

[23] P. Tsiamyrtzis, “A Bayesian approach to quality control problems,” Ph.D. dissertation, School of Statistics, Minneapolis, MN, 2000.

[24] H. Jeffreys, Theory of Probability. London, U.K.: Oxford Univ. Press, 1948.

[25] J. Lin, “Divergence measures based on the Shannon entropy,” IEEE Trans. Inform. Theory, vol. 37, pp. 145–151, Jan. 1991.

[26] G. J. McLachlan and K. E. Basford, Mixture Models Inference and Applications to Clustering. New York: Marcel Dekker, 1988.

[27] S. S. Blackman, Multiple-Target Tracking with Radar Applications. Norwood, MA: Artech House, 1986.

[28] D. B. Reid, “An algorithm for tracking multiple targets,” IEEE Trans. Automat. Contr., vol. 24, pp. 843–854, 1979.

[29] I. J. Cox and S. L. Hingorani, “An efficient implementation of Reid’s multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, pp. 138–150, Feb. 1996.

[30] W. Buntine, “Learning classification trees,” Stat. Comput., vol. 2, no. 2, pp. 63–73, 1992.

[31] “World security services to 2004,” The Freedonia Group, Tech. Rep. 1348, 2000.

Dr. Ioannis Pavlidis (Senior Member, IEEE) received the B.S. degree in electrical engineering from the Democritus University, Greece, the M.S. degree in robotics from the Imperial College of the University of London, and the M.S. and Ph.D. degrees in computer science from the University of Minnesota.

He joined the Honeywell Laboratories, Min-

neapolis, MN, immediately upon his graduation

in January 1997. His expertise is in the areas of

computer vision beyond the visible spectrum and

pattern recognition of highly variable patterns. He published extensively in

these areas in major journals and refereed conference proceedings over the

past several years. He is the co-chair of the IEEE series of Workshops in

Computer Vision Beyond the Visible Spectrum and serves as a Program

Committee member in several other major conferences.

Dr. Pavlidis is a Fulbright Fellow and a Member of ACM.

Vassilios Morellas (Member, IEEE) received the B.S. degree in mechanical engineering from the National Technical University of Athens, Greece, the M.S. degree in mechanical engineering from Columbia University, and the Ph.D. degree in mechanical engineering from the University of Minnesota.

He has been with the Honeywell Laboratories,

Minneapolis, MN, since 1998. His expertise is in

the areas of computer vision, sensor integration,

and learning theories as they apply to enhancing

robot autonomy and advancing machine intelligence. Prior to his current position, he pioneered the SAFETRUCK research project while working at the

University of Minnesota as a Research Associate. SAFETRUCK success-

fully demonstrated the use of differential GPS (global positioning system)

and radar sensing technologies to enhance safety of semi-tractor-trailers

by developing lane departure detection and collision avoidance systems.

SAFETRUCK won the second prize in the 1997 ITS World GPS Showcase

competition.

Panagiotis Tsiamyrtzis received the B.S. degree in mathematics from the Aristotle University, Greece, and the Ph.D. degree in statistics from the University of Minnesota.

He served as a faculty member in the Department of Statistics of the Uni-

versity of Minnesota in Fall 2000. He is currently with the Greek Army. His

expertise is in the area of quality control.

Dr. Tsiamyrtzis was the recipient of the best student paper award in 2000

from the American Statistical Association.

Steve Harp received the Ph.D. degree in

psychology (program in perception) from North-

western University in 1986, where his research

was on the perception of visual motion and

camouflage, and the M.S. degree in statistics

from the University of Minnesota in 1994.

He has been with the Honeywell Laboratories,

Minneapolis, MN, since 1985, when he was first

employed as an intern. Since then, he has worked

on a wide range of projects involving artificial

intelligence, statistical analysis, communications

networks, and user interfaces. He has delivered numerous public talks and

papers on these topics.

Dr. Harp is the recipient of two technical achievement awards and the

Honeywell Sweatt award. He is a Member of the American Statistical Association and the American Association for Artificial Intelligence.

PAVLIDIS et al.: URBAN SURVEILLANCE SYSTEMS 1497

Authorized licensed use limited to: Universitat Autonoma De Barcelona. Downloaded on March 10,2010 at 05:34:51 EST from IEEE Xplore. Restrictions apply.

A High-Precision Algorithm for DOA Estimation Using a Long-Baseline Array Based on the Hearing Mechanism of the Ormia Ochracea

Article

Full-text available

Feb 2022
SENSORS-BASEL

Inspired by the Ormia Ochracea hearing mechanism, a new direction of arrival estimation using multiple antenna arrays has been considered in spatially colored noise fields. This parasitoid insect can locate s cricket’s position accurately using the small distance between its ears, far beyond the standard array with the same aperture. This phenomenon can be understood as a mechanical coupled structure existing between the Ormia ears. The amplitude and phase differences between the received signals are amplified by the mechanical coupling, which is functionally equivalent to a longer baseline. In this paper, we regard this coupled structure as a multi-input multi-output filter, where coupling exists between each pair of array elements. Then, an iterative direction-finding algorithm based on fourth-order cumulants with fully coupled array is presented. In this manner, the orientation of the mainlobe can direct at the incident angle. Hence, the direction-finding accuracy can be improved in all possible incident angles. We derive the Cramér-Rao lower bound for our proposed algorithm and validate its performance based on simulations. Our proposed DOA estimation algorithm is superior to the existing biologically inspired direction-finding and fourth-order cumulants-based estimation algorithms.

Optimizing Visual Sensors Placement With Risk Maps Using Dynamic Programming

Article

Full-text available

Nov 2021

Typically, optimizing the poses and placement of surveillance cameras is usually formulated as a discrete combinatorial optimization problem. The traditional aspects of solving the camera placement problem attempt to maximize the area monitored by the camera array and/or reduce the cost of installing a set of surveillance cameras. Several approximate optimization techniques have been proposed to locate near-optimal solution to the placement problem. Thus, related surveillance planning methods optimize the placement of visual sensors based on equally significance grids by not limiting to demand of coverage. This article explores the efficiency of the visual sensor placement based on a combination of two methods namely, a deterministic risk estimation for the risk assessment and a dynamic programming for optimizing the placement of surveillance cameras. That is, the enhanced efficiency of coverage is obtained by developing a prior grid assessment practice to stress on the security sensitive zones. Then, the dynamic programming algorithm operates on security quantified maps rather than uniform grids. The attained result is compared to the respective heuristic search algorithm outcomes. The overall assessment shows the reliability of the proposed methods’ combinations.

Using artificial intelligence search in solving the camera placement problem

Chapter

Full-text available

Jan 2022

Due to the impact of optimal camera placement on the efficiency and the cost of surveillance systems as well as the rapid development in sensor technologies and the pressing security needs, the last two decades witnessed an increasing interest in developing and introducing efficient methods for solving what is known as the camera placement problem. Given some monitoring quality measures coupled with the specifications of the visual sensors in hand, the goal of the camera placement framework is to capitalize the area seen by a set of visual sensors. This problem is considered a discrete optimization problem and is known to have an NP-hard problem complexity. In order to solve the camera placement problem, a crucial fundamental step is modeling the coverage of the cameras in use. Following the coverage modeling, an optimization method needs to be used to locate the optimal poses and/or camera positions. In general, artificial intelligence search strategies are extensively used in solving discrete optimization problems. In particular, this chapter discusses formulating the camera placement problem in order to be solved by artificial intelligence search. Moreover, the chapter applies selective artificial intelligence search strategies to solve the camera placement problem. Most of these search formulations investigate the problem from a greedy-based perspective. Thus the target is to maximize the primary coverage of the camera network. Additionally, the initiation of the camera model and the subsequent coverage table are counted as key steps prior to applying the optimization method. Thus all instances of the camera coverage over the potential locations must be computed and stored in tabular form, usually known as a coverage table. The computation of the coverage table offers essential data to formulate the search space. Furthermore, in order to locate the solution to the problem, each search strategy defines a unique path throughout the search space. 
However, regardless of the selection of the search technique, the solutions are usually attained by utilizing randomization restart settings. The chapter also carries out an analytical review of three main searching algorithms namely, generate and test, uninformed search, and hill climbing search algorithms. Two case studies are used to evaluate those algorithms, and the camera placement problem is formulated as a coverage maximization problem. The various searching algorithms are implemented to seek the maximum coverage of the camera array. The placement results obtained based on those algorithms are critically compared in terms of the algorithms’ efficiency and performance. Finally, the chapter highlights the strengths and weaknesses of each approach.

FogSurv: A Fog-Assisted Architecture for Urban Surveillance Using Artificial Intelligence and Data Fusion

Article

Full-text available

Aug 2021

Urban surveillance, of which airborne urban surveillance is a vital constituent, provides situational awareness (SA) and timely response to emergencies. The significance and scope of urban surveillance has increased manyfold in recent years due to the proliferation of unmanned aerial vehicles (UAVs), Internet of things (IoTs), and multitude of sensors. In this article, we propose FogSurv—a fog-assisted surveillance architecture and framework leveraging artificial intelligence (AI) and information/data fusion for enabling real-time SA and monitoring. We also propose an AI- and data-driven information fusion model for FogSurv to help provide (near) real-time SA, threat assessment, and automated decision-making. We further present a latency model for AI and information fusion processing in FogSurv. We then discuss several use cases of FogSurv that can have a huge impact on multifarious fronts of national significance ranging from safeguarding national security to monitoring of critical infrastructures. We conduct an extensive set of experiments to demonstrate that FogSurv using AI and data fusion help provide near real-time inferences and SA. Experimental results demonstrate that FogSurv provides a latency improvement of 37% on average over cloud architectures for the selected benchmarks. Results further indicate that combining AI with data fusion as in FogSurv can provide a speedup of up to 9.8× over AI without data fusion while also maintaining or improving the inference accuracy. Additionally, results show that AI combined with fusion of different image modalities obtained through UAVs in FogSurv results in improved average precision of target detection for surveillance as compared to AI without data fusion for different target scales and environment complexity.

An Integrated Mechanical Intelligence and Control Approach Towards Flight Control of Aerobat

Conference Paper

May 2021

Biologically inspired direction‐finding for short baseline

Article

May 2021
IET Radar, Sonar & Navigation

The optimal implementation for a biologically inspired coupling structure to overcome the limitation of short baseline direction‐finding is determined. This approach is inspired by Ormia ochracea, a parasitoid insect living in North America. Despite the very small distance between its ears, it can locate a cricket's call with an accuracy far beyond that of an interferometer with the same baseline. This outstanding performance depends on the mechanical coupling in its auditory system. The first research focus is on the mechanism of the coupling structure, considering not only the amplification of the phase difference but also the effect on output power, which leads to the performance improvement in comparison with the traditional method. Then, the biologically inspired coupling structure is optimised to achieve the best direction‐finding performance at the crickets' sound frequency, reducing the estimation error by 75% when the signal is incident at boresight. To implement the actual coupling structure with optimal direction‐finding performance, both the analogue circuit and the digital filter implementation are discussed, and the latter attains the theoretical optimal performance. Finally, a direction‐finding system prototype is built to verify the advantage of the digitally implemented coupling structure, and the measurement result approximates the corresponding Cramér–Rao lower bound.

An Integrated Mechanical Intelligence and Control Approach Towards Flight Control of Aerobat

Preprint

Mar 2021

Our goal in this work is to expand the theory and practice of robot locomotion by addressing critical challenges associated with the robotic biomimicry of bat aerial locomotion. Bats are known for their pronounced, fast wing articulations; for example, bats can mobilize as many as forty joints during a single wingbeat, with some joints reaching over one thousand degrees per second in angular speed. Copying bat flight is a significant challenge, but a very rewarding one. Aerial drones with morphing bodies similar to bats can be safer, more agile, and more energy-efficient owing to their articulated and soft wings. Current design paradigms have failed to copy bat flight because they assume only closed-loop feedback roles and ignore the computational roles carried out by morphology. In response, a design framework called Morphing via Integrated Mechanical Intelligence and Control (MIMIC) is proposed. In this paper, using the dynamic model of Northeastern University's Aerobat, which is designed to test the effectiveness of the MIMIC framework, it is shown that computational structures and closed-loop feedback can be successfully used to mimic bats' stable flight apparatus.

Mechanical design and fabrication of a kinetic sculpture with application to bioinspired drone design

Conference Paper

Apr 2021

Orientation stabilization in a bioinspired bat-robot using integrated mechanical intelligence and control

Conference Paper

Apr 2021

Mechanical design and fabrication of a kinetic sculpture with application to bioinspired drone design

Preprint

Mar 2021

Biologically inspired robots are a very interesting and difficult branch of robotics due to their rich dynamical and morphological complexity. Among them, flying animals such as bats have been among the most difficult to take inspiration from, as they exhibit complex wing articulation. We attempt to capture several of the key degrees of freedom that are present in the natural flapping gait of a bat. In this work, we present the mechanical design and analysis of our flapping wing robot, the Aerobat, where we capture the plunging and flexion-extension in the bat's flapping modes. This robot utilizes gears, cranks, and four-bar linkage mechanisms to actuate the arm-wing structure, which is composed of rigid and flexible components monolithically fabricated using PolyJet 3D printing. The resulting robot exhibits wing expansion and retraction during the downstroke and upstroke respectively, which minimizes the negative lift and results in a more efficient flapping gait.

Mixture Models: Inference and Applications to Clustering.

Article

Jan 1989

Learning patterns of activity using real-time tracking

Article

Jan 2000

A Bayesian computer vision system for modeling human interactions

Article

Jan 2000

Algorithm for Tracking Multiple Targets

Article

Jan 1978

Donald B. Reid

An algorithm for tracking multiple targets in a cluttered environment is developed. The algorithm is capable of initiating tracks, accounting for false or missing reports, and processing sets of dependent reports. As each measurement is received, probabilities are calculated for the hypotheses that the measurement came from previously known targets in a target file, or from a new target, or that the measurement is false. Target states are estimated from each such data-association hypothesis, using a Kalman filter. As more measurements are received, the probabilities of joint hypotheses are calculated recursively using all available information such as density of unknown targets, density of false targets, probability of detection, and location uncertainty. The branching technique allows correlation of a measurement with its source based on subsequent, as well as previous, data.

Maximum likelihood from incomplete data via the EM algorithm (With discussion)

Article

Jan 1977

Optimal homography computation with a reliability measure

Article

Jul 2000

We describe a theoretically optimal algorithm for computing the homography between two images. First, we derive a theoretical accuracy bound based on a mathematical model of image noise and do simulation to confirm that our renormalization technique effectively attains that bound. Then, we apply our technique to mosaicing of images with small overlaps. By using real images, we show how our algorithm reduces the instability of the image mapping.

Divergence measures based on the Shannon entropy

Article

Jan 1991

J. Lin

Theory of Probability

Article

Dec 1999

Change Detection and Tracking Using Pyramid Transform Techniques

Conference Paper

Dec 1985
Proceedings of SPIE

An automated, or "smart", surveillance system must be sensitive to small object motion wherever it may occur within a large field of view. The system must also be capable of distinguishing changes of interest from other image activity or noise. Yet the data processing capabilities of practical systems are often quite limited. To achieve these performance objectives at a low data rate, a pyramid-based image preprocessor has been constructed that can compute frequency-tuned "change energy" measures in real time. A microprocessor then examines a relatively small set of these measures and follows a foveal search strategy to isolate moving objects for tracking or for more detailed analysis.

Maximum-likelihood estimation from incomplete data via the EM algorithm (with discussion)

Article

Jan 1977
