How panoramic visualization can support human
supervision of intelligent surveillance
Alexander M. Morison David D. Woods
Human Systems Integration
Ohio State University
Columbus, OH
James W. Davis
Dept. of Computer Science and Engineering
Ohio State University
Columbus, OH
In video-based surveillance people monitor a wide spatial area through video sensors for anomalous events related to safety and security.
The size of the area, the number of video sensors, and the camera’s narrow field-of-view make this a challenging cognitive task. Computer
vision researchers have developed a wide range of algorithms to recognize patterns in the video stream (intelligent cameras). These advances
create a challenge for human supervision of these intelligent surveillance camera networks. This paper presents a new visualization that has
been developed and implemented to integrate video-based computer vision algorithms with control of pan-tilt-zoom cameras in a manner
that supports the human supervisory role.
SUPERVISION OF SMART CAMERAS
In everyday activities people move through many areas moni-
tored by security cameras. Surveillance centers use these cam-
eras to meet safety and security goals by looking for unusual hu-
man activities and anomalous events in the physical areas under
camera surveillance (Haering et al., 2008). Surveillance centers
tend to be quite similar. One or more walls of monitors show
live video feeds from sensors (e.g., video cameras, news organi-
zations). The centers are staffed 24/7; interesting events occur
relatively infrequently, and the centers communicate with secu-
rity personnel in the physical world and with other security and
safety related organizations.
The video camera, which is still the workhorse sensor tech-
nology in surveillance, can be thought of as a stand-in for hu-
man security personnel. Each video sensor, fixed mounted but
with pan, tilt, and zoom (PTZ) capability, can monitor a portion
of the total space to be surveilled, which can be spatially large.
Security personnel, rather than patrol the physical area, often
monitor video feeds and use their experience to look for and
identify anomalous patterns of activity. The task of monitoring
is challenging given the large area under surveillance, difficul-
ties in recognizing anomalous activity (e.g., high context sensi-
tivity), a low base rate of anomalous events, a large number of
video sensors feeding the surveillance center, and the ability to
capture only a fraction of the entire area at any given moment.
One goal of computer vision algorithms in video surveillance
is to reduce the need for and the burden on human security per-
sonnel by creating smart automation to monitor the array of
sensor feeds. The result has been the development and deploy-
ment of intelligent algorithms to detect human motion (Gavrila,
1999), track people moving through a scene (Aggarwal and Cai,
1999), and analyze the types of motions people carry out (Wang
et al., 2003). The design question is how to couple responsible
supervisory human security personnel to the results of the algo-
rithmic analysis of the actual video feeds from the scene, or, in
other words, what kinds of supervisory displays are needed for
smart surveillance systems?
This paper presents a new concept for supervisory visualiza-
tions of smart surveillance systems designed for single smart
PTZ cameras. The visualization is based on a static panoramic
frame of reference that captures the entire space of views for a
single smart PTZ camera. This is overlaid with a brightness
coded activity map that represents outputs from smart algo-
rithms monitoring for human activity. Scan-path algorithms are
overlaid onto this map, capturing how the PTZ camera will monitor
the space based on the output of the smart algorithms.
The scan-path panoramic display serves as a longshot display
(Woods, 1984; Woods and Watts, 1997) for human supervi-
sors to understand how diverse sets of smart algorithms cap-
ture the flow of human activity through a physical space being
monitored. In addition, the display also conveys how the scan-
paths of a camera will progress over a scene based on the out-
put of these smart algorithms. A longshot display provides an
overview of the system status, orientation, and movement be-
tween detailed views. The longshot is always displayed in par-
allel with detailed views to minimize the attention re-orienting
costs associated with moving between isolated detailed views.
The panoramic representation also provides a base for human
supervisors to interact with the smart camera (visual program-
ming interface). This provides a means to meet the directabil-
ity functional requirement for human supervision of automation
(Woods and Hollnagel, 2006).
The software to create these visualizations has been built and
tested on actual video feeds from PTZ cameras that monitor hu-
man activities on the Ohio State University campus (see Figures
2(a-c)) for a smart algorithm (detecting translating motion) and
three different scan-path algorithms (Davis et al., 2007a,b). In
addition, the base panorama visualization can be built automati-
cally and continuously (Sankaranarayanan and Davis, 2008a,b).
COGNITIVE COMPLEXITIES
This section highlights some of the issues in human supervi-
sion of smart video surveillance by exploring the analogy be-
tween experienced human security personnel moving through
the scene of interest and intelligent processing of the video
stream from multiple cameras placed in and around the scene
of interest.
Patrolling
The patrol or in-scene agent is embedded on the ground within
the physical environment they are observing. The in-scene pa-
trol officer directly perceives the world they are moving through
as a continuous physical topology. The scene of interest is experienced as a
series of views the patrol officer takes along the path he or she
follows. The observing behavior of the patrol officer defines a
field of view (FOV) that is not scalable (i.e., the FOV cannot be externally
expanded or shrunk). The in-scene patrol officer is sensi-
tive to the temporal evolution of activity only at a human scale
(i.e., cannot see patterns of activity defined over different tem-
poral scales, e.g., the last month). Additional constraints, such
as physical structures, layout of physical forms, and the environ-
ment influence what an in-scene agent can observe and where
they can move. For an in-scene agent, all possible view direc-
tions are, for our purposes, represented by a full sphere. Also,
the maximum distance between consecutive points of observa-
tion is defined by the type of environment (e.g., city, suburb,
etc.) and the mode of transportation (e.g., on foot, by Segway, or
car). Finally, the embedded agent has the ability to directly in-
teract with the environment by moving objects, speaking with
people, and by visible presence.
Surveilling
The out-of-scene agent understands, moves around, and inter-
acts with the world with a different set of constraints. Within
the surveillance center, personnel are external to the environ-
ment being observed. Consequently, they do not have a sin-
gle perceptual experience but multiple, narrow “keyhole” views
generated by video sensors on the world. These video sensors,
depending on their configuration in the world, can create oppor-
tunities to view the world at multiple spatial scales (e.g., from
the rooftops of buildings at differing heights). In addition, a
region of interest is spatially scalable by organizing a set of
cameras (i.e., 2 or more) to observe an area larger than the view-
able field of any single camera. Surveillance should be sensitive
to events defined over multiple temporal scales from extremely
short (notice the bag left behind) to extremely long (the organi-
zation of a protest gathering). Activities of interest can also play
out across multiple temporal and spatial scales, such as when a
small organized protest in one place interacts with other events
that transform the situation into a chaotic, violent confrontation
that spills out over a wider area.
Movement through the environment differs for an out-of-
scene agent as compared to an in-scene agent. Interaction with
the world for an out-of-scene agent consists of switching be-
tween different cameras. But there are a large number of video
feeds mapped onto a set or even a wall of display monitors, cre-
ating the potential for a form of data overload. Selecting among
the feeds creates, in some sense, a virtual patrol even though
the sequence of selecting camera feeds can create tortuous paths
and jumps. The virtual paths are not constrained by the spatial
topology, rather, only by the configuration of the sensor network
and the method of monitoring and controlling the network (e.g.,
through a mapping of camera feeds to monitors with a single
control for all cameras). Distance between points of observa-
tion is no longer meaningful given the structure of the control
room (i.e., wall of monitors). View directions are restricted to
a downward pointed hemisphere for the majority of PTZ cam-
eras, as opposed to the full sphere for the in-scene agent. The
context for out-of-scene surveillance creates risks for impaired
spatial understanding of the actual physical environment, rela-
tionships, and activities.
Smart surveilling
Intelligent algorithms create an opportunity to overcome the
complexities that arise from trying to understand in-scene hu-
man activities from a distant surveillance center (such as the
problem of selecting among a very large number of video feeds
for display onto a set of monitors). The current trend is to al-
low the automation to detect, track, and raise alarms for human
activities that could be anomalous. This creates a human supervisor–
automation system design problem. Commercial surveillance
system designers typically assume that alerting human super-
visors to potentially anomalous behaviors and popping up the
relevant video feed is an acceptable base design for human su-
pervisors even though years of human factors research have
demonstrated that this is a very poor joint system design which
produces a variety of predictable problems and failures (e.g.,
(Woods and Sarter, 2000; Woods, 1995)). These include the
false alert problem, getting lost effects in navigating over mul-
tiple cameras (Guerlain, 2006), and spatial disorientation from
view sequences that jump from place to place (Woods, 1984).
Past research on coordinating human-agent activities in such
joint systems has specified basic functional requirements for ef-
fective designs: observability, directability, directed attention,
and shifting perspectives (Woods and Hollnagel, 2006). The
task for human factors of smart surveillance systems is to de-
velop specific visualizations that meet these functional require-
ments.
Extended perception and smart surveilling
The design direction we have been exploring for a wide range of
new sensor capabilities and systems is called extended percep-
tion. In this paradigm new technology extends a remote human
observer’s ability to perceive and explore the world as if they
were present in the scene (Murphy and Burke, 2008). For the
case of a smart PTZ camera in a surveillance task, we concep-
tualize the visualization opportunity created by computer vision
algorithms as: (a) support the out-of-scene agent's ability to take
virtual patrols as if they were exploring a continuous space, and
(b) integrate the structure of activity and events in the monitored
physical scene extracted by computer vision algorithms with a
direct view of that physical scene.
The visualization design, for the case of a single PTZ camera,
first requires a visible spatial frame of reference that surveil-
lance personnel can modify, i.e., one that is directable (Woods, 1995).
Figure 2(a) shows a panoramic frame of reference that captures
all of the views possible from a fixed PTZ camera in the mon-
itored scene (panels (a) through (c) show the base panorama
from three different cameras that are part of the research surveil-
lance network on the OSU campus).
Second, the visualization requires an overlay that captures
the results of the intelligent processing of activity in the scene.
We chose a history of translating motion through the monitored
scene as a baseline and representative exemplar of smart algo-
rithms (Davis et al., 2007a). Detecting translating motion is an
interesting problem in computer vision, and it is often a base for
more sophisticated algorithms such as tracking a person mov-
ing through a scene. Translating motion or activity paths can
also serve as a backdrop for displaying the output of algorithms
that detect specific patterns of activity such as walking versus
running. The visualization uses brightness coding to provide an
overlay that indicates those areas where the activity algorithms
have seen translating motion, cumulated over a past temporal
window. The brighter the area the more motion the algorithm
has seen in that position over the time interval. Figure 3 shows
the brightness-coded overlays for the actual motion histories for
the cameras/scenes of the OSU campus in Figure 2. Note that the dark-
est areas correspond to the rooftops and structures where the
cameras are mounted (generally high on buildings), where hu-
man activity occurs very rarely, and the bright areas correspond to
roads and pathways.
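For concreteness, one way such a brightness overlay could be rendered is to normalize the accumulated motion map and blend it over a grayscale panorama; the normalization and blending weight in the sketch below are illustrative assumptions, not the exact rendering used in the implemented display.

```python
import cv2
import numpy as np

def brightness_overlay(panorama_gray, activity_map, alpha=0.6):
    # Resize the activity map to the panorama, stretch it to 0-255, and
    # blend it over the panorama so brighter pixels indicate more motion.
    h, w = panorama_gray.shape
    act = cv2.resize(activity_map.astype(np.float32), (w, h))
    act = cv2.normalize(act, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.addWeighted(panorama_gray, 1.0 - alpha, act, alpha, 0.0)
```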
Given the base frame of reference and the activity history
overlay, one can now consider the scan path of the camera or
the spatial-based virtual patrol: where should or will the camera
point next? Scan-path algorithms use the motion history data to
tailor the PTZ camera movement to the activity in the physical
scene. Figure 4 illustrates scan paths for three different scan-
ning algorithms for the motion history data of the scene in Fig-
ure 2(a) and the brightness coding overlay in Figure 3(a) (see
(Davis et al., 2007a,b) for details of the scanning algorithms).
The scan-path in Figure 4(a) moves probabilistically from loca-
tion to location to sample areas with high activity (probabilistic
jump), while the scan-paths in Figure 4(b-c) create smooth con-
tinuous pathways. There are many different criteria (e.g., ac-
tivity value, staleness of data, operator comprehensibility) that
should be considered and balanced in designing any automatic
scan-path algorithm. The algorithms presented balance these
criteria differently resulting in distinct scan-path behaviors.
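As a hedged illustration of how a scan-path policy can consume the activity data, the sketch below samples the next pan/tilt cell with probability proportional to its accumulated activity; the grid representation and weighting are assumptions for illustration, not the published algorithms of Davis et al. (2007a,b).

```python
import numpy as np

def probabilistic_jump(activity_map, rng=None):
    # Pick the next pan/tilt cell with probability proportional to its
    # accumulated activity (a sketch of the "probabilistic jump" idea).
    rng = rng or np.random.default_rng()
    weights = activity_map.ravel().astype(float)
    weights = weights / weights.sum()                 # normalize to a distribution
    idx = rng.choice(weights.size, p=weights)         # sample one cell
    return np.unravel_index(idx, activity_map.shape)  # (tilt_row, pan_col)

# Example: a small grid of summed motion per pan/tilt cell.
activity = np.array([[0.0, 1.0, 4.0],
                     [2.0, 8.0, 1.0]])
row, col = probabilistic_jump(activity)
```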
EXTENDED PERCEPTION DISPLAY
The panoramic frame of reference is constructed from individ-
ual images taken by the PTZ camera and combined through
an image-based stitching process. The panoramic construction
process uses a mapping that converts the camera's pan and tilt
orientation to an x, y pixel position. The inverse mapping con-
verts pixel position to camera orientation and is the foundation
for communication between supervisor and smart algorithms.
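A rough sketch of this mapping is given below, assuming a simple equirectangular layout over a downward-pointed hemisphere; the implemented system uses the calibrated active-camera model of Sankaranarayanan and Davis (2008a) rather than this simplification.

```python
def pan_tilt_to_pixel(pan_deg, tilt_deg, width, height,
                      pan_range=(0.0, 360.0), tilt_range=(-90.0, 0.0)):
    # Forward mapping: camera orientation to an (x, y) panorama pixel.
    x = (pan_deg - pan_range[0]) / (pan_range[1] - pan_range[0]) * (width - 1)
    y = (tilt_deg - tilt_range[0]) / (tilt_range[1] - tilt_range[0]) * (height - 1)
    return x, y

def pixel_to_pan_tilt(x, y, width, height,
                      pan_range=(0.0, 360.0), tilt_range=(-90.0, 0.0)):
    # Inverse mapping: a clicked panorama pixel back to a pan/tilt command,
    # which is what lets the supervisor direct the camera from the display.
    pan = pan_range[0] + x / (width - 1) * (pan_range[1] - pan_range[0])
    tilt = tilt_range[0] + y / (height - 1) * (tilt_range[1] - tilt_range[0])
    return pan, tilt
```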
The smart algorithm implemented and demonstrated sepa-
rates patterns of translating motion from background noise.
The algorithm accumulates individual pixel differences between
consecutive images into a single motion history image (Brad-
ski and Davis, 2002). Over 6 seconds (72 images) the salience
and robustness of translating pedestrians, cyclists, and vehicles
in the motion history image emerge against background noise
sources such as camera noise, changes in illumination, and ran-
dom motion (e.g., moving tree leaves and branches).
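A minimal sketch of this accumulation is shown below, assuming grayscale frames, a floating-point history image, and an illustrative difference threshold; the cited method (Bradski and Davis, 2002) and the deployed parameters differ in detail.

```python
import numpy as np

def update_mhi(mhi, prev_gray, curr_gray, timestamp, duration=6.0, thresh=30):
    # Threshold the frame difference to a binary silhouette, stamp moving
    # pixels with the current time, and let pixels older than the temporal
    # window (about 6 s here, matching the 72-image window) decay to zero.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    silhouette = diff > thresh
    mhi[silhouette] = timestamp
    mhi[mhi < timestamp - duration] = 0.0
    return mhi
```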
Using the same mapping that generates the panoramic frame
of reference, the output of the smart algorithms is transformed
into a panoramic representation. Instead of a set of images as
the input to the mapping function, however, the input is the re-
sult or output of the smart algorithms sampled across the pan-
tilt viewspace. The full process for generating an activity map
requires moving the camera to a pan/tilt position, capturing a se-
quence of images, performing the motion analysis, storing the
results, and then moving the camera to a new pan/tilt position.
One complete pass of the entire scene is sufficient to generate a
single activity map over a short temporal window (20 min).
Collecting and merging multiple passes (single activity maps)
of the scene results in a global activity map such as shown in
Figure 3.
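The collection loop might look roughly like the sketch below; the camera driver object, dwell length, and the choice to merge passes by summation are assumptions for illustration, not the implemented procedure.

```python
import numpy as np

def collect_activity_pass(camera, pan_tilt_grid, dwell_frames=72, thresh=30):
    # One pass over the scene: visit each pan/tilt cell, accumulate motion
    # for a short dwell, and record the total motion energy per cell.
    # 'camera' is a hypothetical driver with move(pan, tilt) and grab_gray().
    energy = np.zeros(len(pan_tilt_grid))
    for i, (pan, tilt) in enumerate(pan_tilt_grid):
        camera.move(pan, tilt)
        prev = camera.grab_gray()
        for _ in range(dwell_frames):
            curr = camera.grab_gray()
            diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
            energy[i] += float((diff > thresh).sum())  # count of moving pixels
            prev = curr
    return energy

def merge_passes(single_pass_maps):
    # Merge several single-pass maps into a global activity map.
    return np.sum(single_pass_maps, axis=0)
```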
EXPLORING VIRTUAL PATHWAYS
We introduced a representation to supplement the raw camera
view from surveillance cameras. Integrating spatial structure,
activity data, and algorithm generated scan-paths over time
supports observability and directability for human supervisors.
This smart surveilling or extended perception redefines the unit
of analysis for surveillance from a sequence of single camera
feeds to virtual pathways or patrols through the viewable space.
The scan-path panoramic display and temporal displays support
these virtual pathways through spatial-, temporal-, and activity-
based frames of reference. These frames of reference are inher-
ently coupled and a virtual pathway necessarily defines each of
these dimensions; however, for clarity, we define virtual path-
ways and the forms of exploration for each dimension individ-
ually.
Spatial
The spatial pathway of a PTZ camera differs from that of an
in-scene agent, which was defined as a moving point of obser-
vation with all possible view directions represented by a full
sphere. The spatial-based virtual patrols for a PTZ camera are,
instead, a sequence of pan and tilt positions within a down-
ward pointed hemisphere, from a fixed location in space. The
scan-path panoramic display in Figure 4 supports exploration
of the viewable scene for a PTZ camera by making observable
for the human supervisor the virtual pathway or sequence of
pan and tilt positions. An intrinsic quality of this longshot dis-
play is that not only can a human supervisor apprehend what
the camera will see in the future, but also what the camera will
not see. This display provides a mechanism for the human su-
pervisor to act on this information to re-direct the scan-path
algorithms, through the activity overlay, to explore the view-
able scene through a different virtual pathway. The exploration
through spatial-based virtual pathways is thus a collaboration
between human supervisor, smart algorithms, and scan-path al-
gorithms with the scan-path panoramic display as the medium
of communication and interaction.
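One hedged illustration of this directability is to let a supervisor-drawn region on the panorama re-weight the activity values that drive the scan-path; the mask-and-gain mechanism below is an assumption for illustration, not the implemented visual-programming interface.

```python
import numpy as np

def redirect_activity(activity_map, mask, gain=5.0):
    # Boost (gain > 1) or suppress (gain < 1) activity inside a region the
    # supervisor painted on the panorama, so the scan-path algorithm is
    # attracted to, or steered away from, that region on its next pass.
    adjusted = activity_map.astype(float).copy()
    adjusted[mask] *= gain
    return adjusted
```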
Figure 1: The temporal display allows users to explore different temporal
rhythms through two independently scalable temporal scales versus smart al-
gorithm activity output. In this case, the display plots hours versus days to
capture the rhythms of different days over a one week period.
Temporal
The out-of-scene supervisory agent must monitor and explore
across multiple temporal rhythms. Temporal-based virtual path-
ways are a new construct for analysis of the temporal dimen-
sion. The relevant temporal rhythms may occur over differ-
ent temporal scales (minutes vs. hours), temporal intervals
(last month vs. last week), or in different temporal patterns
(Mondays and Wednesdays vs. Tuesdays and Thursdays). A
temporal-based virtual pathway is defined by a temporal win-
dow size, scale, location, and orientation for a PTZ camera and
exploration of the temporal dimension along a virtual pathway
consists of adjusting these different dimensions. The display
in Figure 1 captures these temporal dimensions and allows a su-
pervisor to create a temporal-based virtual pathway. The current
scan-path algorithms do not incorporate temporal information;
however, this is a natural extension for smart algorithms that
will likely inform new designs for temporal-based displays for
video surveillance.
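A simple sketch of the aggregation behind such a display is shown below; binning activity by summing within hour-of-day and day-of-week cells over one week is an illustrative choice, not the display's exact definition.

```python
import numpy as np

def hour_by_day_matrix(timestamps, values, days=7):
    # Aggregate per-sample activity into an hours-versus-days grid, in the
    # spirit of the temporal display in Figure 1. 'timestamps' are Python
    # datetime objects; 'values' are the smart-algorithm activity outputs.
    grid = np.zeros((24, days))
    for t, v in zip(timestamps, values):
        grid[t.hour, t.weekday() % days] += v
    return grid
```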
Activity
The out-of-scene supervisory agent through new smart algo-
rithms monitors activity patterns over multiple spatial and tem-
poral scales. New algorithms are constantly created to detect
new types of activity and as the data extracted from these algo-
rithms increases, the potential for data overload also increases.
Escaping from data overload requires new forms of organiza-
tion (Woods et al., 2002) and the spatial longshot provided by
the scan-path panoramic display is precisely tuned to this re-
quirement. Independent of the type of smart algorithm or re-
sulting data, if the point of extraction is the video feed, then the
pan-tilt positions necessarily provide a spatial frame of refer-
ence, and the output is therefore transformable into a spherical overlay rep-
resentation, as illustrated in Figure 3. While integration of this
data into the current visualization emphasizes the usefulness of
the panoramic longshot, these data also create a new activity-
based dimension for exploration, which can be supported by
new forms of activity-based virtual pathways. An area for future
research is to understand how activity-based virtual pathways
inform the organization of algorithm overlays, what manipula-
tions of the activity dimension are necessary (scaling, translating,
etc.), and how the activity-based virtual pathway construct
informs the design of new smart algorithms.
Summary
This paper presented a panoramic display for supervisory visu-
alization of smart algorithms for a single PTZ camera within
a video-based security surveillance context. This display in-
tegrates the capability of video sensors, computer vision algo-
rithms, and cognitive systems principles to overcome the cogni-
tive challenges inherent in understanding a distant environment
through a video feed with a narrow field-of-view.
References
Aggarwal, J. K. and Cai, Q. (1999). Human motion analysis: A review. Com-
puter Vision and Image Understanding, 73(3):428–440.
Bradski, G. R. and Davis, J. W. (2002). Motion segmentation and pose recog-
nition with motion history gradients. Machine Vision and Applications,
13(3):174–184.
Davis, J. W., Morison, A. M., and Woods, D. D. (2007a). An adaptive focus-of-
attention model for video surveillance and monitoring. Mach. Vision Appl.,
18(1):41–64.
Davis, J. W., Morison, A. M., and Woods, D. D. (2007b). Building adaptive
camera models for video surveillance. In Applications of Computer Vision,
2007. WACV ’07. IEEE Workshop on.
Gavrila, D. M. (1999). The visual analysis of human movement: A survey.
Computer Vision and Image Understanding, 73(1):82–98.
Guerlain, S. (2006). Software navigation design. Applied Spatial Cognition:
From Research to Cognitive Technology.
Haering, N., Venetianer, P. L., and Lipton, A. (2008). The evolution of video
surveillance: an overview. Machine Vision and Applications, 19(5-6):279–290.
Murphy, R. R. and Burke, J. L. (2008). From remote tool to shared roles.
Robotics & Automation Magazine, IEEE, 15(4):39–49.
Sankaranarayanan, K. and Davis, J. W. (2008a). An efficient active camera
model for video surveillance. In Applications of Computer Vision, 2008.
WACV 2008. IEEE Workshop on, pages 1–7.
Sankaranarayanan, K. and Davis, J. W. (2008b). A fast linear registration frame-
work for Multi-Camera GIS coordination. In Advanced Video and Signal
Based Surveillance, 2008. AVSS’08. IEEE Fifth International Conference
on, pages 245–251.
Wang, L., Hu, W., and Tan, T. (2003). Recent developments in human motion
analysis. Pattern Recognition, 36(3):585–601.
Woods, D. (1995). The alarm problem and directed attention in dynamic fault
management. Ergonomics, 38(11):2371–2393.
Woods, D. D. (1984). Visual momentum: a concept to improve the cognitive
coupling of person and computer. International Journal of Man-Machine
Studies, 21(3):229–244.
Woods, D. D. and Hollnagel, E. (2006). Joint Cognitive Systems: Patterns in
Cognitive Systems Engineering. CRC Press.
Woods, D. D., Patterson, E. S., and Roth, E. M. (2002). Can we ever escape
from data overload? A cognitive systems diagnosis. Cognition, Technology
& Work, 4:22–36.
Woods, D. D. and Sarter, N. B. (2000). Learning from automation surprises and
going sour accidents. Cognitive Engineering in the Aviation Domain, pages
327–353.
Woods, D. D. and Watts, J. C. (1997). How not to have to navigate through too
many displays. Handbook of Human-Computer Interaction, 2:617–650.
Figure 2: The panoramic frame of reference for three separate PTZ cameras (panels a-c) on the OSU campus, capturing all of the views possible from a fixed point relative
to the monitored scene.
Figure 3: The brightness-coded motion history overlays for the cameras/scenes in Figure 2(a-c). Note that the darkest areas correspond to the rooftops and structures where
human activity occurs very rarely and bright regions correspond to locations of expected human activity such as walkways and roads.
Figure 4: The brightness-coded activity map in Figure 3(a), which represents outputs from smart algorithms monitoring for human activity patterns, is overlaid with a scan
path that represents how the camera will move to monitor the space. The three scan-paths are created using three different algorithms: (a) a probabilistic
jump, (b) an inhibited probabilistic walk, and (c) reinforcement learning paths.