Large Scale Video Analytics
On-demand, iterative inquiry for moving image research
Virginia Kuhn
USC
Los Angeles, CA, USA
Ritu Arora
TACC
Austin, TX, USA
Alan Craig, Kevin Franklin, Michael Simeone
ICHASS
Urbana, IL, USA
Dave Bock, Luigi Marini
NCSA
Urbana, IL, USA
Abstract- Video is exploding as a means of communication
and expression, and the resultant archives are massive,
disconnected datasets. Thus, scholars’ ability to research this
crucial aspect of contemporary culture is severely hamstrung
by limitations in semantic image retrieval, incomplete
metadata, and the lack of a precise understanding of the actual
content of any given archive. Our aim in the Large Scale
Video Analytics (LSVA) project is to address obstacles in
both image retrieval and research on extreme-scale archives
of video data by employing a human-machine hybrid
process for analyzing moving images. We propose an
approach that 1) places more interpretive power in the hands
of the human user through novel visualizations of video data,
and 2) uses a customized on-demand configuration that
enables iterative queries.
Index Terms- High Performance Computing, Image Edge
Detection, Image Retrieval, Multimedia Databases, Software,
Visualization.
I. INTRODUCTION
The process of understanding and utilizing the content of
large databases of video archives has remained both time-
consuming and laborious. Aside from the massive size of
contemporary archives and the challenges that have faced
semantically-sensitive image retrieval for the last 20 years,
other key challenges to effectively analyzing video archives
with existing methods include limited metadata and the lack of
a precise understanding of the actual content of the archive. A
final difficulty lies in the incompleteness of translation across
semiotic registers - words can never fully represent sounds and
images, leaving a gap in meaning when labels alone are
employed to describe and search for content.
The real-time, interactive and iterative analysis of large
video archives can be both compute-intensive and memory-
intensive. High Performance Computing (HPC) platforms and
storage resources are therefore needed to handle the large
volume, velocity and variety associated with such video
archives. Given that about 72 hours of video are uploaded to
YouTube alone every minute (volume and velocity), and the
videos come in diverse formats and codecs (variety), large-scale
video analytics is actually a Big Data problem [1] in which the
data is semi-structured or unstructured.
Though HPC is indispensable for analyzing such large
databases of videos, for a humanities researcher, one of the
obstacles associated with working in an open-science HPC
environment is the long wait time associated with job
processing when a job is submitted to a regular queue. The
nature of humanities research, especially video analysis,
necessitates that the researcher be able to quickly get results
from one query in order to formulate the next one. Therefore, a
truly interactive system for video analysis that can function in
an HPC environment is required to support researchers’ goals.
The Large Scale Video Analytics (LSVA) research project
explores new possibilities offered by both an innovative use of
the Gordon supercomputer at the San Diego Supercomputing
Center (SDSC), and the conjoined interests of HPC and the
cultural and historical study of moving images. We aim to
facilitate humanities research on moving images at a scale
heretofore unthinkable, demonstrating the possibility for
humanists to productively inform policies and infrastructure at
the supercomputing centers, even as the affordances of HPC
enliven and extend humanities research.
Our aim in this project is to address obstacles in both
image retrieval and research on extreme-scale archives of
video data. The searching, tagging, and analysis enabled by
image retrieval face the semantic gap problem of
satisfactorily using low-level image features and actions to
retrieve user-identified objects. This gap is only
exacerbated as queries by historians and cinema and media
scholars demand a high degree of precision and nuance in
their study of moving images. To solve this problem we
propose a two-pronged approach that 1) places more
interpretive power in the hands of the human user through
novel visualizations of video data, and 2) uses a customized
on-demand configuration of Gordon that enables iterative
queries over a short period of time.
The rest of the paper describes our efforts towards
enabling real-time, interactive and iterative video analysis
in an open-science HPC environment.
II. BACKGROUND: THE STUDY OF MOVING IMAGES BY THE
HUMANITIES
Traditionally, cinema scholars’ methods consist of
conducting close readings of individual films or genres of
films, much the way that those in the field of English studies
explicate literature. By and large this inquiry is confined to
theatrical films, that is, the mainly fictive films produced by
studios for entertainment. The challenge of analyzing the 115 years of
cinema across the globe that is being digitized is already
daunting. But that is only the tip of the moving image dataset.
With the explosion of affordable recording devices, from
consumer-level video cameras to cell phone recorders, video
has exploded as a common form of authoring, and this content
is widely shared across multiple online platforms in various
forms and lengths. In this environment, the notion of a discrete
film or even a single demarcated archive is quickly becoming
obsolete and irrelevant. It is as though we are building an
alphabet of images and sounds but have neither a dictionary nor
a grammar to help us understand the impact of extra-linguistic
communication. These datasets demand critical analysis of
both form and content.
Like all photorealistic media, video combines two
contradictory features: it carries the presumed objectivity of
machine-recorded evidence that neutrally documents, and yet it
always has a point of view. To shoot footage is to frame, and to
frame is to exclude. No longer confined to theatres, moving
images saturate contemporary culture and inundate human
beings. However, the role and impact of these ubiquitous
images and sounds that form time-based media is difficult if
not impossible to gauge without innovations in research
methodologies that allow a researcher access to vast archives
that s/he would never be able to view in a single lifetime.
III. SUPERCOMPUTING ON-DEMAND
The arrival the XSEDE resource “Gordon”, the
supercomputer having extensive flash memory, has opened the
possibility for researchers to interactively, and on-demand,
query large databases in real-time, including databases of
digital videos. Additionally, the computational capability of
Gordon is sufficient for extensive analysis of video-assets in
real-time for determining which videos to return in response to
a query. This is a compute- and memory- intensive process
involving queries that cannot be anticipated ahead of time.
This project will use the Gordon supercomputer not only to
pre-process videos and automatically extract meaningful
metadata, but also as an interactive engine that allows
researchers to generate queries on the fly for which the
metadata extracted a priori is not sufficient. In order to be
useful to researchers, we are combining an interactive database,
a robust web-based front-end (Medici [2]), and powerful
visualization representations to aid the researcher in
understanding the contents of the video-footage without
requiring them to watch every frame of every movie.
Due to the need for a high-quality end-user experience (low
latency and high throughput), the LSVA project has received
dedicated and interactive access to Gordon’s I/O nodes. The
overall system architecture is shown in Figure 1. As the figure
shows, besides the database of metadata extracted from the
videos, the video repository itself will also reside on Gordon's
I/O node to minimize the time spent in I/O. This approach will
be revisited after the complete workflow has been prototyped
and tested on Gordon, in order to address the scalability issues
raised by the massive growth in dataset size during the
production stage.
Fig. 1. Overview of the System Architecture
Fig. 2. Medici Interface
A. Robust Front-End
Medici is a scalable content management system that
allows users to upload and run analytics on a variety of file
types like images, audio, video and PDF. It supports both
automatic metadata extraction and user-defined content
tagging. Automatic metadata extraction services are driven by
file MIME type and include: image extractor, gamera extractor,
document extractor, PDF extractor, and video extractor. For the
LSVA project, Medici will be extended to extract metadata of
interest to cinema researchers such as shot-length and color-
palette. But perhaps more profoundly, the custom
visualizations we will add to Medici will allow new knowledge
about the role and impact of video data to emerge. The
metadata will be stored in a relational database for running
various tools in the analytical pipeline in a batch-mode.
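The kind of color-palette metadata extractor mentioned above might be sketched as follows. This is a rough illustration only: the function name, the coarse quantization scheme, and the synthetic frame are all invented for the example, and Medici's actual extractors are not shown here.

```python
import numpy as np

def color_palette(frame: np.ndarray, bits: int = 2, top: int = 3):
    """Return the `top` dominant quantized RGB colors of an HxWx3 uint8 frame.

    Hypothetical sketch: each channel is quantized to `bits` bits, codes are
    counted, and the most frequent bins are reported as RGB bin centers.
    """
    q = frame >> (8 - bits)                         # coarse channels, 0..(2^bits - 1)
    codes = ((q[..., 0].astype(int) << (2 * bits))
             | (q[..., 1].astype(int) << bits)
             | q[..., 2].astype(int))
    vals, counts = np.unique(codes, return_counts=True)
    order = np.argsort(counts)[::-1][:top]          # most frequent codes first
    scale = 256 // (1 << bits)
    palette = []
    for code in vals[order]:
        r = (code >> (2 * bits)) & ((1 << bits) - 1)
        g = (code >> bits) & ((1 << bits) - 1)
        b = code & ((1 << bits) - 1)
        # Represent each bin by its center color.
        palette.append((int(r * scale + scale // 2),
                        int(g * scale + scale // 2),
                        int(b * scale + scale // 2)))
    return palette

# A synthetic frame that is mostly red with a blue stripe at the top:
frame = np.zeros((90, 120, 3), dtype=np.uint8)
frame[..., 0] = 220                                  # red background
frame[:10, :, :] = (0, 0, 220)                       # blue stripe
print(color_palette(frame, top=2))                   # → [(224, 32, 32), (32, 32, 224)]
```

A production extractor would of course aggregate palettes over sampled frames of a whole video rather than a single synthetic image.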
B. Interactive Database at the Backend
As mentioned in [3], existing video analysis applications
generally fail to scale because the majority of platforms for
video processing treat databases merely as a storage engine
rather than a computation engine. In the LSVA project, the
rich metadata associated with the video repositories will be
stored in the database along with the additional information
related to the processing of videos (e.g., algorithms to be used
in the work-flow) such that the analytics can be performed
proactively in a batch-mode with minimal end-user
interaction. Such proactive processing along with optimization
schemes will result in near real-time end-user experience. The
metadata extraction service in Medici, by default, extracts
standard metadata elements and writes them as RDF tuples. In
this research, not only will this metadata extraction service be
modified to launch multiple concurrent processes for faster
metadata extraction but it will also be modified so that
additional metadata that is of interest to cinema scholars can
be extracted and stored in a relational database schema for
faster querying and access in the analytical pipeline.
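The pipeline described here, concurrent metadata extraction feeding a relational schema, might be sketched as follows. The video names, the stand-in extractor, and the table schema are illustrative assumptions for this sketch, not the project's actual code; threads stand in for the concurrent processes the text proposes, to keep the example self-contained.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def extract_metadata(video: str) -> tuple:
    """Stand-in extractor: a real one would decode frames; here we derive
    toy values (a fake shot count and mean shot length) from the name."""
    shots = len(video)                       # placeholder "detected" shot count
    mean_shot_len = 24.0 / shots             # placeholder statistic (seconds)
    return (video, shots, mean_shot_len)

videos = ["safety_patrol.mpg", "parade.avi", "newsreel.mov"]

# Run the extractors concurrently, mirroring the multiple concurrent
# processes proposed for faster metadata extraction.
with ThreadPoolExecutor(max_workers=4) as pool:
    rows = list(pool.map(extract_metadata, videos))

# Store the results in a relational schema for fast querying downstream.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metadata (video TEXT PRIMARY KEY,"
           " shots INT, mean_shot_len REAL)")
db.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)

n = db.execute("SELECT COUNT(*) FROM metadata").fetchone()[0]
print(n)  # → 3
```

The relational step is the point: once metadata lands in indexed tables, the analytical pipeline can query it without re-touching the videos.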
Currently, the size of the sample data that is being used to
establish the complete workflow for this research is under 4
TB and can hence be stored in the flash memory on the I/O
node allocated for this project. However, with the increase in
the amount of data, and the need for compute-intensive
processing for some of the steps in the batch-mode (e.g.,
massive amounts of metadata extraction from hundreds of
terabytes of videos with short turn-around times), the Lustre
filesystem on Gordon will be used to avoid fetching the files at
the start of a batch job.
C. Visualization
Insight and understanding are greatly enhanced when
information is explored from multiple perspectives. To provide
such perspectives, information design must continue to evolve
and to experiment with the latest tools and technologies,
providing effective means of communicating ever-increasing
volumes of complex information [4, 5, 6]. Fundamental principles of
spatial and temporal simultaneity, metamorphosis, time-
modification, and juxtaposition are investigated using advanced
information design and visualization tools in order to develop
methods to effectively represent large collections of video
databases. Our goal is to experiment with presenting video
collections in novel representations, both as a means of
visualizing video data and as image tags for searchable video
databases.
Movie Cube
One visualization method involves the concept of a movie
cube. In this study, we explore ways in which we can apply
visualization tools to analyze a movie sequence. A movie
sequence is first converted into a three-dimensional dataset by
extracting and ordering each frame of the sequence along the Z
axis. Once in this form, we can use a variety of visualization
techniques to examine the data. We use our custom
visualization system to examine this dataset as shown in the
examples below (see Figures 3, 4, 5, and 6) by using a
sequence from one of the Internet Archive movies in the
Prelinger Collection (Safety Patrol, 1937 [7]). We begin by
rendering slice planes in various locations along the Z axis
along with the bounds of the dataset. As expected, we see the
individual frames of the movies as shown in Figure 3. Note that
time progresses from front to back in our movie cube.
Fig. 3. Rendering slice planes.
Fig. 4. Rendering slice planes across time (vertical)
Experimenting with different orientations of our slice plane,
we begin to see some interesting patterns emerge. As shown in
Figures 4 and 5, we render slice planes cutting across time
revealing only a single row (left) or column (right) from each
frame in each instance of time. Note that these visualizations
give us a clear representation of camera shots. Specifically, we
can see when in time (along the Z axis) new camera shots
occur as well as the relative duration of each camera shot. We
can also see patterns of movement in time.
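The cube-and-slice construction above can be sketched numerically. This is a hedged illustration using a synthetic two-shot clip: the frame sizes, intensity values, and the cut-detection threshold are invented for the example and are not taken from the LSVA pipeline.

```python
import numpy as np

# Stack grayscale frames along a time (Z) axis to form a movie cube,
# then slice *across* time so that shot cuts appear as abrupt edges.
H, W, T = 48, 64, 120                 # frame height, width, frame count
cube = np.zeros((T, H, W), dtype=np.uint8)
cube[:60] = 40                        # shot 1: dark frames (t = 0..59)
cube[60:] = 200                       # shot 2: bright frames (t = 60..119)

# A slice perpendicular to Z is an ordinary frame:
frame = cube[10]                      # shape (H, W)

# A slice across time keeps one row from every frame, yielding a 2-D
# image whose vertical axis is time; cuts show up as horizontal edges,
# and the distance between edges gives the relative shot duration.
row_slice = cube[:, H // 2, :]        # shape (T, W)

# A crude cut locator along Z: large frame-to-frame change in mean
# intensity (threshold of 50 is an arbitrary choice for this sketch).
means = cube.reshape(T, -1).mean(axis=1)
cuts = (np.abs(np.diff(means)) > 50).nonzero()[0] + 1
print(cuts.tolist())                  # → [60]
```

Real footage is noisier than this two-level clip, but the same geometry applies: the temporal slices in Figures 4 and 5 are exactly such cross-time sections of the cube.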
Fig. 5. Rendering slice planes across time (horizontal)
We can also sample our movie cube volume using any type
of shape. In Figure 6, we map our movie data to a cylinder
positioned within the volume.
Fig. 6. Mapping movie data to a cylinder
By employing these visualizations, which treat videos as
signals over time rather than as cinematographic creations, we
are able to create two-dimensional images that contain within
them activity over time. Thus, while it is possible to use our
data to search two-dimensional images extracted from films by
segmenting individual shots and scenes (a search better suited
to seeking out specific objects or persons, and fraught with the
obstacles that have impeded image retrieval since its
beginning), our search will explore the possibilities offered by
searching for activity instead of objects.
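One way to make "activity" machine-comparable is to reduce a clip to a one-dimensional motion-energy signature. The following is a speculative sketch of that idea, not the project's actual search method; the function, the clips, and the thresholds are invented for illustration.

```python
import numpy as np

def activity_signature(cube: np.ndarray) -> np.ndarray:
    """Reduce a (T, H, W) stack of grayscale frames to a (T-1,) signature:
    the mean absolute frame-to-frame change, i.e., a crude motion energy."""
    return np.abs(np.diff(cube.astype(float), axis=0)).mean(axis=(1, 2))

rng = np.random.default_rng(0)

# A static clip (the same frame repeated) versus a clip with constant churn:
still_frame = rng.integers(0, 255, (1, 32, 32))
static = np.repeat(still_frame, 40, axis=0)          # no motion at all
noisy = rng.integers(0, 255, (40, 32, 32))           # maximal frame change

sig_static = activity_signature(static)
sig_noisy = activity_signature(noisy)

# The signatures separate the two clips without identifying any object:
print(bool(sig_static.max() == 0.0), bool(sig_noisy.mean() > 50))  # → True True
```

Two clips could then be compared by correlating their signatures, matching patterns of activity over time rather than the contents of any single frame.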
It is our hope that by searching for activity types
(something more easily translatable to machine-readable
pattern) we may prototype an equitable model for hybrid
systems used to navigate large-scale archives of moving
images. In this model the human user retains more
interpretative power to help mitigate the distortion often
introduced to searches by semantic gaps that are often found in
image retrieval.
IV. ANTICIPATED OUTCOMES
To date, data visualization tools have successfully rendered
“snapshots” of large video datasets, but these produce “meta-images”
that, while informative, hold little explanatory power
on their own and, as such, are difficult to evaluate (see Figures
7 and 8 [8, 9]). Considered on their own, they become little
more than visual indices on the front-end and graphs of code
tolerances on the back-end, neither of which holds up as a
generalizable knowledge object. Thus one of our main goals is
to use interpretive frameworks to draw useful conclusions
about these large datasets by versioning approaches (e.g.,
crowd-sourced verification of machine recognition [9]).
Fig. 7. Visualization using Image Plot algorithm
Fig. 8. Visualization using Cinemetrics algorithm
As detailed in [10], the conceptual issues inherent in
labeling images with words are another major theme
interrogated here, and, as such, content tagging will be extremely
important. We will endeavor to create a mix of standard
tags as well as idiosyncratic labels in order to more fully
represent the possibilities presented by a vocabulary of images.
In this way, we will leverage the power of HPC with the
expertise and interpretive strategies of humanities scholars in
order to arrive at a robust system that makes possible
sophisticated analysis of the vast video archives that
characterize contemporary culture. We are also evaluating
existing scene-completion techniques [11] for integration into
our analytical workflow. Such a tool will be useful for
completing scenes from a repository of semantically related
pictures or videos.
ACKNOWLEDGEMENT
This work uses the Extreme Science and Engineering
Discovery Environment (XSEDE), which is supported by
National Science Foundation grant number OCI-1053575.
We are grateful to XSEDE for providing us the resources
required for development and deployment of this project.
REFERENCES
[1] Paul Zikopoulos and Chris Eaton. 2011.
Understanding Big Data: Analytics for Enterprise Class Hadoop
and Streaming Data, First Edition, pp. 1-166.
http://www-01.ibm.com/software/data/bigdata/
[2] Medici multi-media content management system:
http://medici.ncsa.illinois.edu/
[3] Qiming Chen, Meichun Hsu, Rui Liu, and Weihong Wang.
2009. Scaling-Up and Speeding-Up Video Analytics Inside
Database Engine. In Proceedings of the 20th International
Conference on Database and Expert Systems Applications
(DEXA '09), 244-254.
[4] BarrySalt, 2006. “The NumbersSpeak,” Moving Into Pictures.
Starwood P.
[5] James E. Cutting, Jordan E. DeLong and Christine E. Nothelfer.
Attention and the Evolution of Hollywood Film. Psychological
Science, published online 5 February 2010. DOI:
10.1177/0956797610361679
[6] Yuri Tsivian and Gunars Civjans. Cinemetrics: Movie
Measurement and Study Tool Database.
http://www.cinemetrics.lv/.
[7] Safety Patrol film. 1937. Producer: Handy (Jam) Organization
Sponsor: Chevrolet Division, General Motors Corporation.
[8] Software Studies Initiative, Image Plot visualization software:
explore patterns in large image collections
http://lab.softwarestudies.com/p/imageplot.html
[9] Brodbeck, Frederic. Cinemetrics thesis project:
http://cinemetrics.fredericbrodbeck.de/
[10] VirginiaKuhn,2010.“FilmicTextsandtheRiseoftheFifth
Estate,”International Journal of Learning and Media, MIT P.
Volume 2, Issue 2-3 doi: 10.1162/IJLM_a_00057
[11] James Hays, Alexei A. Efros. Scene Completion Using Millions
of Photographs. ACM Transactions on Graphics (SIGGRAPH
2007). August 2007, vol. 26, No. 3.