City Eyes: A Unified Computational Framework for Intelligent
Video Surveillance in a Cloud Environment
Yi-Ling Chen∗, Tse-Shih Chen†, Liang-Chun Yin†, Tsiao-Wen Huang†, Shiou-Yaw Wang† and Tzi-cker Chiueh†
∗Intel-NTU Connected Context Computing Center, National Taiwan University, Taipei, Taiwan
†Cloud Computing Center for Mobile Applications, Industrial Technology Research Institute, Hsinchu, Taiwan
Email: yiling.chen.ntu@gmail.com, {tschen,marvinyin,Norman.Huang,karenwang,tcc}@itri.org.tw
Abstract—Over the past few years we have seen explosive
growth in the amount of data, and researchers now face
greater challenges in dealing with heterogeneous data that
increases at an unprecedented speed. There is a clear need
for intelligent surveillance systems, since the volume of video
data has become too large to be reviewed by humans. In this
paper, we propose a unified computational framework which
aims to simplify the integration of various video analysis
techniques and enables rapid application development to meet
diverse needs in different application domains. It is built upon
a cloud-based architecture to appropriately deal with the
massive data analysis problem. Several example applications
are implemented to demonstrate the effectiveness of the
proposed system in assisting users with different video
surveillance problems in real-world cases.
Keywords-Cloud computing; intelligent surveillance system;
video analysis;
I. INTRODUCTION
Nowadays, video surveillance systems are ubiquitously
installed and continuously generate huge amounts of long-
running and unprocessed content. There is information in
the data, but most of the time it cannot possibly be reviewed
in detail. There is thus a clear need for automatic schemes,
beyond human inspection, that assist users in accessing long
videos. One example is the development of intelligent video
surveillance systems (IVS) [1], in which computer vision
and pattern recognition techniques are exploited to provide
intelligent video analysis and event-based real-time alerts,
e.g. object detection and tracking [2], [3], to enhance public
security. To better address the increasingly challenging data
analysis problem, next-generation IVS should not only
provide high-performance computing and scalable storage
but also be highly flexible in integrating a wide variety of
video sources and video analysis technologies to meet the
diverse demands of different application scenarios.
Traditional IVS are usually deployed as isolated applica-
tions suitable for particular environment settings. For ex-
ample, automatic license plate recognition (ALPR) systems
[4] are usually installed in specific areas, e.g. unattended
parking lots, for purposes such as security control or auto-
matic toll collection. IBM S3 [5] is an early work that
attempts to provide an open and extensible framework to
aggregate various IVS engines through standard interfaces
such that higher-level analysis, e.g. situation awareness and
behaviour analysis, can be accomplished. However, it did
not put enough emphasis on dealing with the rapid growth
of data scale. Cloud computing provides a new opportunity
to effectively solve increasingly demanding data analysis
problems. Significant research efforts have been devoted
to computation- and data-intensive problems in different
problem domains by employing distributed computing
infrastructures. Some examples include the use of Windows
Azure for data mining [6] and MapReduce for k-means
clustering [7]. In [8], a JavaScript-based framework is
proposed to support the design and execution of data analysis
workflows on clouds.
In this paper, we introduce a cloud-based system called
City Eyes, which aims to provide a unified computational
framework to facilitate the development of smart video
surveillance applications and appropriately deal with the
challenges of large-scale data analysis and management. The
rest of this paper is organized as follows. In Section II, we
explain the architecture of City Eyes and the details of the
platform for accomplishing video analysis tasks. In Section III,
we demonstrate several example applications built on
top of City Eyes to deal with real-world cases. Section IV
concludes this paper.
II. PROPOSED SYSTEM
A. Architecture
City Eyes is built upon a distributed data storage and
computing environment. Most existing video surveillance
systems only capture and archive video data and possess
little intelligent video analysis capability. In contrast, in the
scenario of machine-to-machine (M2M) networks, or the
Internet of Things (IoT), mobile devices and advanced video
sensors [9] could also act as sources of computing power
and video feeds. As shown in Figure 1, the ecosystem of City
Eyes comprises a fully featured cloud-based architecture
including the following main components:
Figure 1. The architecture of City Eyes

•Infrastructure-as-a-Service (IaaS): Any cloud computing
platform, such as AWS or Windows Azure, could
be used to provide City Eyes with elastically scalable
computing resources. We adopted the Cloud OS1 self-
developed by ITRI as the underlying cloud computing
infrastructure, which possesses advanced features
such as an all-layer-2 network [10] and fast fail-over [11].
•Video Analysis Platform-as-a-Service (PaaS): Undoubt-
edly, no single video analysis technology can cover and
satisfy the diverse demands of smart video surveillance.
It is thus essential to provide a general platform on
which a wide range of video processing engines can be
executed efficiently in a unified way. Such a platform is
beneficial since it lets computer vision researchers
focus on delivering effective algorithms without wor-
rying about the complexity of parallel processing. On
the other hand, application developers may also build
sophisticated applications based on a rich collection of
advanced IVS technologies.
•Software-as-a-Service (SaaS): On top of the overall
software stack, there are a variety of web services
developed for IVS applications, which can be easily
accessed through web browsers.
A cloud-based architecture is particularly suitable for video
analysis since it often involves intensive computation.
The heavy load of complex computations can thus be
shifted to cloud servers to reduce the power consumption
of mobile devices or video sensors. In addition, the cen-
tralized service-oriented model of cloud computing relieves
service providers from the burden of deploying and upgrad-
ing their software.
1http://www.itri.org.tw/eng/econtent/about/about09 02.aspx?sid=5
B. Video Analysis PaaS
The main design challenge of City Eyes is to provide a
general framework that handles the diverse needs of smart
video surveillance applications. To this end, the proposed
PaaS has been designed to include the following main
functionalities to meet some basic requirements, such as
simple integration of various video analysis engines and
configurable workflows to accomplish different tasks.
1) Controller: The PaaS controller is the central part of the
proposed video analysis PaaS. It is the universal interface
for all SaaS developers and provides standard APIs for
tasks such as submitting jobs and querying job results. Each
call to the submitJob() API associates the job with
a unique appID to identify its owner. Developers may
also specify the priority of jobs, and a job message will
then be assigned to a queue belonging to the
specified priority. Each job queue is consumed by worker
instances at a rate in accordance with its priority to
provide reasonable QoS. The PaaS controller is also responsible
for accepting the analysis results reported by worker instances.
It is worth noting that each appID has its own dedicated
job queues and worker instances (see Figure 1).
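The queueing model described above can be sketched as follows. This is an illustrative model rather than the actual controller code: the function names are hypothetical, and the strict-priority draw is a simplification of the rate-based, per-priority consumption described in the text.

```python
import itertools
from collections import defaultdict, deque

# Priorities in decreasing order of service rate (names assumed).
PRIORITIES = ("HIGH", "NORMAL")

_job_ids = itertools.count(1)
# Each appID owns its own set of per-priority queues (cf. Figure 1).
_queues = defaultdict(lambda: {p: deque() for p in PRIORITIES})

def submit_job(app_id, workflow, priority="NORMAL"):
    """Counterpart of the submitJob() API: tag the job with its owner's
    appID and publish it to the queue of the requested priority."""
    job_id = next(_job_ids)
    _queues[app_id][priority].append(
        {"jobId": job_id, "appId": app_id, "engineWorkflow": workflow})
    return job_id

def next_job(app_id):
    """Worker-side draw. Here higher-priority queues are simply drained
    first; the paper's controller consumes each queue at a rate tied to
    its priority, which this strict-priority rule only approximates."""
    for priority in PRIORITIES:
        queue = _queues[app_id][priority]
        if queue:
            return queue.popleft()
    return None
```

Because the queues are keyed by appID, jobs of one application never compete directly with those of another, which matches the dedicated-queue design noted above.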
2) Worker instances: All dispatched jobs are retrieved
and performed by various video analysis engines
running on worker instances (i.e. virtual machines). Before
an engine can be called as a service of the PaaS, it must
first be packaged into a VM image and registered
to obtain a unique engineID. To maximize simplicity
and compatibility, each engine is only required to be
launchable through a command line interface and to accept
optional parameters. To coordinate job execution and
report results, a bridge between the PaaS controller and the
worker instances is required. To this end, as shown in Figure 2,
a daemon program called the orchestrator runs on each worker
instance and is responsible for launching the specified
engines and monitoring their status. More specifically, the
orchestrator communicates with the engines being executed by
interpreting the messages printed to stdout for progress
updates or error handling.

Figure 2. Control and data flows of the video analysis PaaS
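The stdout-based coordination can be sketched as below. The `PROGRESS <pct>` and `ERROR <msg>` message conventions are assumptions for illustration; the paper does not specify the exact protocol between the orchestrator and the engines.

```python
import subprocess

def run_engine(cmd, on_progress, timeout=3600):
    """Launch an engine through its command line interface and interpret
    the lines it prints to stdout. The 'PROGRESS <pct>' and 'ERROR <msg>'
    conventions are hypothetical. Returns True iff the engine succeeded."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            tokens = line.split()
            if tokens[:1] == ["PROGRESS"]:
                on_progress(tokens[1])   # forward progress to the controller
            elif tokens[:1] == ["ERROR"]:
                proc.kill()              # terminate the failed engine
                proc.wait()
                return False
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        proc.kill()                      # hung engine: time out and fail
        return False
    finally:
        if proc.stdout:
            proc.stdout.close()
```

A failure or timeout reported here is what triggers the republish-and-retry behaviour described in the error-handling subsection below.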
3) Workflow configuration: Typically, a video analysis
job can be further divided into a set of subtasks. When
submitting a job, we allow developers to explicitly specify
the desired workflow, as long as each subtask is carried out
by a valid engine. For example, the following message in
JSON format specifies a high-priority job composed of a
two-stage workflow:

[{"appId": 1},
 {"engineWorkflow": "1|2"},
 {"jobPriority": "HIGH"},
 {"engineId": 1, "param": "value"},
 {"engineId": 2, "param1": "value1", "param2": "value2"}]
Note that the parameter lists are optional and may be
of variable length. The orchestrator will then launch the
engines specified in the given workflow and automatically
append the corresponding parameters as input. In practice,
an engine on the PaaS is not necessarily limited to video
analysis tasks. For example, it could also be a video
crawler that sends HTTP requests or CGI commands to a
remote DVR/NVR to retrieve video clips for subsequent
processing.
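For illustration, a message of this shape can be decoded into an ordered list of stages as follows; `parse_job` is a hypothetical helper, not part of the published API.

```python
import json

def parse_job(message):
    """Flatten the job message (a JSON array of small objects, as in the
    example above) and expand the 'engineWorkflow' field into an ordered
    list of (engineId, params) stages for the orchestrator to launch."""
    fields = {}
    stages = {}
    for obj in json.loads(message):
        if "engineId" in obj:
            # Everything besides engineId is an optional parameter list.
            stages[obj["engineId"]] = {k: v for k, v in obj.items()
                                       if k != "engineId"}
        else:
            fields.update(obj)
    # "1|2" means: run engine 1, then feed its output to engine 2.
    order = [int(e) for e in fields["engineWorkflow"].split("|")]
    return fields, [(e, stages[e]) for e in order]
```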
4) Load balancing and error handling: To achieve ro-
bustness, two-level error handling is performed in the pro-
posed PaaS, as illustrated in Figure 2. Firstly, each
orchestrator is responsible for periodically sending heart-
beat signals to the PaaS controller to reflect the status of its
host worker instance, as well as monitoring the progress
of the job being executed. If an engine fails or crashes, it
will be detected (or eventually timed out) and terminated
by the orchestrator. A failed job will be republished to the job
queue and retried by other workers up to a prescribed
number of times until it is marked as permanently failed. Secondly,
the PaaS controller will restart a worker instance if it stops
sending heartbeats for a certain amount of time. Moreover,
the PaaS controller will adaptively start up or shut down worker
instances according to the number of waiting jobs in the
queues.

Figure 3. Comparison of the computation time of video summarization on
videos of lengths ranging from 1 to 10 hours, using 1, 10 and 20 worker
instances, respectively.
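The republish-and-retry policy can be sketched as follows; `MAX_RETRIES` stands in for the unspecified "prescribed number of times", and the function names are illustrative.

```python
from collections import deque

MAX_RETRIES = 3  # the "prescribed number of times" (value assumed)

def process(queue, run_engine):
    """Drain a job queue, republishing each failed job up to MAX_RETRIES
    attempts before marking it permanently failed. `run_engine` returns
    True on success; the permanently failed jobs are returned."""
    failed = []
    while queue:
        job = queue.popleft()
        if run_engine(job):
            continue
        job["attempts"] = job.get("attempts", 0) + 1
        if job["attempts"] < MAX_RETRIES:
            queue.append(job)      # republish: another worker may retry it
        else:
            failed.append(job)     # permanently failed
    return failed
```

In the real system the queue is shared, so a republished job is typically picked up by a different worker instance than the one on which it failed.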
III. APPLICATIONS AND EVALUATIONS
A. Applications for Intelligent Video Surveillance
Based on the proposed computational framework, it becomes
very easy to develop various smart video surveillance
applications on top of it. In the following, we
briefly describe several applications developed under such a
programming paradigm.
1) Video summarization: Due to the rapid development
of video capturing technology, surveillance cameras are
installed everywhere and continuously generate large numbers
of videos. Relying on human inspection for threat detection
is becoming impractical. Video summarization techniques
[12] aim to deal with this issue by eliminating irrelevant
video content before human inspection. We apply the
background subtraction technique [13] to delete still frames
containing no moving objects and produce a compact version
of the source videos without loss of salient information.
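A minimal stand-in for this filtering step is sketched below, using a running-average background model instead of the adaptive Gaussian mixture model of [13]; all threshold values are illustrative.

```python
import numpy as np

def summarize(frames, alpha=0.05, threshold=25, min_changed=0.01):
    """Return the indices of frames that contain moving objects.
    A frame is kept when the fraction of pixels differing from the
    running-average background by more than `threshold` exceeds
    `min_changed`; still frames are dropped. Simplified stand-in for
    the adaptive Gaussian mixture model of [13]."""
    background = None
    kept = []
    for i, frame in enumerate(frames):
        f = frame.astype(np.float32)
        if background is None:
            background = f            # bootstrap the background model
            continue
        changed = np.mean(np.abs(f - background) > threshold)
        # Slowly adapt the background to gradual scene changes.
        background = (1 - alpha) * background + alpha * f
        if changed >= min_changed:
            kept.append(i)
    return kept
```

Under the PaaS, a step like this runs as one engine in a workflow, typically downstream of a video crawler that fetches and decodes the raw footage.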
2) Vehicle detection and tracking: In [14], the authors
combined automatic license plate recognition technology
with the geographic information of street surveillance
cameras to recover the trajectory of a vehicle given its
license plate number. We have implemented this concept
on the proposed PaaS. Ideally, such a system may greatly
simplify and accelerate the investigation of certain
security problems, e.g. searching for stolen cars. However,
its success also depends heavily on factors such as the
density of cameras and the image quality they offer.
3) Camera anomaly detection: Maintenance of a large
surveillance camera network is important to ensure that
each surveillance camera has good image quality and a
correct field of view. To minimize the efforts of system
administrators, we have built a map-based web application
which enables users to easily bring up live video feeds as
well as locate abnormal surveillance cameras with
broken connections. In addition to hardware failures, we also
adopted the image-based approach of [15] to automatically de-
tect camera anomaly events such as spray painting, blockage
and defocusing.
B. Implementation and Evaluations
Currently, City Eyes has been integrated into the surveil-
lance camera networks owned by the police departments of two
cities in Taiwan, which have over 3,000 and 23,000 cameras,
respectively. One can imagine the great diversity
of video sources among such large camera networks.
Therefore, the greatest implementation effort for City Eyes
was developing various video crawlers to retrieve videos
from the DVRs of different vendors. Fortunately,
under the framework of City Eyes, only minor
modifications, e.g. reconfiguration of workflows, are required
to migrate the IVS applications between different physical
environments. As illustrated in Figure 3, City Eyes can
summarize long-running videos in a very short time given a
sufficient number of workers. A user study conducted
with 15 test users indicated that City Eyes can effectively
reduce the time spent on their daily routines, such as
watching long-running videos when a security problem occurs
or regularly examining the status of surveillance cameras.
IV. CONCLUSION
In this paper, we presented City Eyes, a cloud-based
platform designed for building intelligent video surveillance
applications. The main innovation of City Eyes is a general
framework which facilitates the development and deploy-
ment of video analysis engines. A suite of IVS applications
has been built upon the proposed platform, and it has
proven to be very useful in real-world video surveillance
practice.
REFERENCES
[1] Arun Hampapur, Lisa Brown, Jonathan Connell, Sharat
Pankanti, Andrew Senior and Yingli Tian, “Smart surveillance:
Applications, technologies and implications,” in Proceedings of
Pacific Rim Conference on Multimedia (PCM ’03), 2003.
[2] Omar Javed and Mubarak Shah, “Tracking and Object Clas-
sification for Automated Surveillance,” in Proceedings of Eu-
ropean Conference on Computer Vision (ECCV ’02), pp. 343-
357, 2002.
[3] Jonathan H. Connell, Andrew W. Senior, Arun Hampapur,
Ying-li Tian, Lisa M. G. Brown, and Sharath Pankanti, “De-
tection and Tracking in the IBM PeopleVision System”, in
Proceedings of IEEE International Conference on Multimedia
and Expo (ICME ’04), 2004.
[4] Shyang-Lih Chang, Li-Shien Chen, Yun-Chung Chung and
Sei-Wan Chen, “Automatic License Plate Recognition,” IEEE
Transactions on Intelligent Transportation Systems, Vol. 5,
No. 1, pp. 42-53, March 2004.
[5] Ying-li Tian, Lisa Brown, Arun Hampapur, Max Lu, Andrew
Senior, and Chiao-fe Shu, “IBM smart surveillance system
(S3): event based video surveillance system with an open
and extensible framework,” Machine Vision and Applications
Journal, Vol. 19, No. 5-6, pp. 315-327, September 2008.
[6] Fabrizio Marozzo, Domenico Talia and Paolo Trunfio, “A
Cloud Framework for Parameter Sweeping Data Mining Ap-
plications,” in Proceedings of IEEE International Conference
on Cloud Computing Technology and Science (CloudCom ’11),
pp. 367-374, 2011.
[7] Ekanayake, Jaliya and Pallickara, Shrideep and Fox, Geoffrey,
“MapReduce for Data Intensive Scientific Analyses,” in Pro-
ceedings of the 2008 Fourth IEEE International Conference on
eScience, pp. 277-284, 2008.
[8] F. Marozzo, D. Talia, P. Trunfio, “Scalable Script-based Data
Analysis Workflows on Clouds,” in Proceedings of Workshop
on Workflows in Support of Large-Scale Science, November
2013.
[9] W.-K. Chan, J.-Y. Chang, T.-W. Chen, Y.-H. Tseng, and S.-Y.
Chien, “Efficient content analysis engine for visual surveillance
network,” IEEE Trans. Circuits Syst. Video Technol., vol. 19,
no. 5, pp. 693-703, May 2009.
[10] Tzi-cker Chiueh, Cheng-Chun Tu, Yu-Cheng Wang, Pai-Wei
Wang, Kai-Wen Li and Yu-Ming Huang, “Peregrine: An All-
Layer-2 Container Computer Network,” in Proceedings of
IEEE Cloud, 2012.
[11] Chien-Yung Lee, Yu-Wei Lee, Cheng-Chun Tu, Pai-Wei
Wang, Yu-Cheng Wang, Chih-Yu Lin and Tzi-cker Chiueh,
“Autonomic Fail-over for a Software-Defined Container Com-
puter Network,” in Proceedings of International Conference on
Autonomic Computing (ICAC ’13), 2013.
[12] A. G. Money and H. Agius, “Video summarisation: A con-
ceptual framework and survey of the state of the art,” J. Vis.
Commun. Image Representation, vol. 19, no. 2, pp. 121-143,
Feb. 2008.
[13] Z. Zivkovic, “Improved adaptive Gaussian mixture model for
background subtraction,” in Proceedings of IEEE International
Conference on Pattern Recognition (ICPR ’04), pp. 28-31,
2004.
[14] Yi-Ling Chen, Tse-Shih Chen, Tsiao-Wen Huang, Liang-
Chun Yin, Shiou-Yaw Wang and Tzi-cker Chiueh, “Intelligent
Urban Video Surveillance System for Automatic Vehicle De-
tection and Tracking in Clouds,” in Proceedings of the IEEE
International Conference on Advanced Information Networking
and Applications (AINA 2013),
2013.
[15] Yuan-Kai Wang, Ching-Tan Fan, Ke-Yu Cheng and
P. S. Deng, “Real-Time Camera Anomaly Detection for Real-
World Video Surveillance”, in Proceedings of International
Conference on Machine Learning and Cybernetics, 2011.