Large Scale Video Analytics
On-demand, iterative inquiry for moving image research
Virginia Kuhn
USC
Los Angeles, CA, USA
Ritu Arora
TACC
Austin, TX, USA
Alan Craig, Kevin Franklin, Michael Simeone
ICHASS
Urbana, IL, USA
Dave Bock, Luigi Marini
NCSA
Urbana, IL, USA
Abstract- Video is exploding as a means of communication
and expression, and the resultant archives are massive,
disconnected datasets. Thus, scholars’ ability to research this
crucial aspect of contemporary culture is severely hamstrung
by limitations in semantic image retrieval, incomplete
metadata, and the lack of a precise understanding of the actual
content of any given archive. Our aim in the Large Scale Video Analytics (LSVA) project is to address obstacles in both image retrieval and research that uses extreme-scale archives of video data by employing a human-machine hybrid process for analyzing moving images. We propose an approach that 1) places more interpretive power in the hands of the human user through novel visualizations of video data, and 2) uses a customized on-demand configuration that enables iterative queries.
Index Terms- High Performance Computing, Image Edge
Detection, Image Retrieval, Multimedia Databases, Software,
Visualization.
I. INTRODUCTION
The process of understanding and utilizing the content of
large databases of video archives has remained both time-
consuming and laborious. Aside from the massive size of
contemporary archives and the challenges that have faced
semantically-sensitive image retrieval for the last 20 years,
other key challenges to effectively analyzing video archives
with existing methods include limited metadata and the lack of
a precise understanding of the actual content of the archive. A
final difficulty lies in the incompleteness of translation across semiotic registers: words can never fully represent sounds and images, leaving a gap in meaning when labels alone are employed to describe and search for content.
The real-time, interactive and iterative analysis of large
video archives can be both compute-intensive and memory-
intensive. High Performance Computing (HPC) platforms and
storage resources are therefore needed to handle the large
volume, velocity and variety associated with such video
archives. Given that about 72 hours of video are uploaded to YouTube alone every minute (volume and velocity), and that these videos come in diverse formats and codecs (variety), large-scale video analytics is a Big Data problem [1] in which the data are semi-structured or unstructured.
Though HPC is indispensable for analyzing such large databases of videos, one of the obstacles a humanities researcher faces when working in an open-science HPC environment is the long wait-time for job processing when a job is submitted to a regular queue. The nature of humanities research, especially video analysis, requires that the researcher be able to get results from one query quickly in order to formulate the next one. Therefore, a truly interactive system for video analysis that can function in an HPC environment is required to support researchers' goals.
The Large Scale Video Analytics (LSVA) research project
explores new possibilities offered by both an innovative use of
the Gordon supercomputer at the San Diego Supercomputer Center (SDSC), and the conjoined interests of HPC and the cultural and historical study of moving images. We aim to facilitate humanities research on moving images at a scale heretofore unthinkable, demonstrating the possibility for humanists to productively inform policies and infrastructure at supercomputing centers, even as the affordances of HPC enliven and extend humanities research.
Our aim in this project is to address obstacles in both
image retrieval and research that uses extreme-scale
archives of video data. The searching, tagging, and analysis enabled by image retrieval face the semantic-gap problem: low-level image features and actions cannot satisfactorily retrieve user-identified objects. This gap is only exacerbated as queries by historians and by cinema and media scholars demand a high degree of precision and nuance in the study of moving images. To address this problem we propose a two-pronged approach that 1) places more interpretive power in the hands of the human user through novel visualizations of video data, and 2) uses a customized on-demand configuration of Gordon that enables iterative queries over a short period of time.
The rest of the paper describes our efforts towards
enabling real-time, interactive and iterative video analysis
in an open-science HPC environment.
II. BACKGROUND: THE STUDY OF MOVING IMAGES BY THE
HUMANITIES
Traditionally, cinema scholars’ methods consist of
conducting close readings of individual films or genres of
films, much the way that those in the field of English studies
explicate literature. By and large this inquiry is confined to theatrical films, that is, the mainly fictive films produced by studios for entertainment. The challenge of analyzing the 115 years of cinema being digitized across the globe is already daunting. But theatrical film is only the tip of the moving-image dataset.
With the spread of affordable recording devices, from consumer-level video cameras to cell phone recorders, video has exploded as a common form of authoring, and this content is widely shared across multiple online platforms in various forms and lengths. In this environment, the notion of a discrete film, or even of a single demarcated archive, is quickly becoming obsolete. It is as though we are building an alphabet of images and sounds with no dictionary or grammar to help us understand the impact of this extra-linguistic communication. These datasets demand critical analysis of both form and content.
Like all photorealistic media, video combines two
contradictory features: it carries the presumed objectivity of
machine-recorded evidence that neutrally documents, and yet it
always has a point of view. To shoot footage is to frame, and to
frame is to exclude. No longer confined to theatres, moving images saturate contemporary culture. However, the role and impact of the ubiquitous images and sounds that form time-based media are difficult if not impossible to gauge without innovations in research methodologies that give a researcher access to vast archives that s/he could never view in a single lifetime.
III. SUPERCOMPUTING ON-DEMAND
The arrival of the XSEDE resource “Gordon”, a supercomputer with extensive flash memory, has opened the possibility for researchers to query large databases interactively, on demand, and in real time, including databases of digital videos. Additionally, Gordon's computational capability is sufficient for extensive real-time analysis of video assets to determine which videos to return in response to a query. This is a compute- and memory-intensive process involving queries that cannot be anticipated ahead of time.
This project will use the Gordon supercomputer not only to pre-process videos to automatically extract meaningful metadata, but also as an interactive engine that allows researchers to generate queries on the fly for which the metadata extracted a priori is not sufficient. In order to be useful to researchers, we are combining an interactive database, a robust web-based front-end (Medici [2]), and powerful visualization representations to aid researchers in understanding the contents of the video footage without requiring them to watch every frame of every movie.
Due to the need for a high-quality end-user experience (low latency and high throughput), the LSVA project has received dedicated and interactive access to Gordon's I/O nodes. The overall system architecture is shown in Figure 1. As the figure shows, besides the database of metadata extracted from the videos, the repository of the videos will also reside on a Gordon I/O node to minimize the time spent in I/O. It should be noted that this approach will be modified after the complete workflow has been prototyped and tested on Gordon, to address the scalability issues raised by the massive increase in dataset size during the production stage.
Fig. 1. Overview of the System Architecture
Fig. 2. Medici Interface
A. Robust Front-End
Medici is a scalable content management system that allows users to upload and run analytics on a variety of file types such as images, audio, video, and PDF documents. It supports both automatic metadata extraction and user-defined content tagging. Automatic metadata extraction services are driven by file MIME type and include an image extractor, Gamera extractor, document extractor, PDF extractor, and video extractor. For the
LSVA project, Medici will be extended to extract metadata of
interest to cinema researchers such as shot-length and color-
palette. But perhaps more profoundly, the custom
visualizations we will add to Medici will allow new knowledge
about the role and impact of video data to emerge. The
metadata will be stored in a relational database for running
various tools in the analytical pipeline in a batch-mode.
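To make this kind of derived metadata concrete, the following is a minimal sketch, not Medici's actual extractor interface, of shot-length and color-palette extraction; it assumes OpenCV (cv2) and NumPy are available, and its cut-detection threshold is purely illustrative.

import cv2
import numpy as np

def extract_shot_metadata(path, diff_threshold=40.0):
    """Detect hard cuts via mean absolute frame difference; report each
    shot's span in frames and its average BGR color."""
    cap = cv2.VideoCapture(path)
    shots = []                                  # (start_frame, end_frame, mean_bgr)
    prev_gray = None
    shot_start, frame_idx = 0, 0
    color_sum, color_count = np.zeros(3), 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # A large jump in mean pixel difference marks a probable cut.
        if prev_gray is not None and \
           np.mean(cv2.absdiff(gray, prev_gray)) > diff_threshold:
            shots.append((shot_start, frame_idx - 1, color_sum / color_count))
            shot_start, color_sum, color_count = frame_idx, np.zeros(3), 0
        color_sum += frame.reshape(-1, 3).mean(axis=0)
        color_count += 1
        prev_gray = gray
        frame_idx += 1
    if color_count:
        shots.append((shot_start, frame_idx - 1, color_sum / color_count))
    cap.release()
    return shots

Dividing a shot's length in frames by the frame rate yields its duration in seconds, and the per-shot mean color is a crude stand-in for a fuller palette.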
B. Interactive Database at the Backend
As mentioned in [3], existing video analysis applications
generally fail to scale because the majority of platforms for
video processing treat databases merely as a storage engine
rather than a computation engine. In the LSVA project, the
rich metadata associated with the video repositories will be
stored in the database along with the additional information
related to the processing of videos (e.g., algorithms to be used
in the work-flow) such that the analytics can be performed
proactively in a batch-mode with minimal end-user
interaction. Such proactive processing along with optimization
schemes will result in a near real-time end-user experience. The metadata extraction service in Medici, by default, extracts standard metadata elements and writes them as RDF tuples. In this research, the metadata extraction service will be modified both to launch multiple concurrent processes for faster extraction and to extract additional metadata of interest to cinema scholars, which will be stored in a relational database schema for faster querying and access in the analytical pipeline.
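As a hedged sketch of this concurrency, the following fans per-video extraction out across worker processes while a single writer populates a simple relational table. The schema, database file name, and function names are illustrative rather than the LSVA schema, and extract_shot_metadata refers to the sketch in Section III.A.

import sqlite3
from multiprocessing import Pool

def shots_for_video(video_path):
    # Per-video extraction; reuses the illustrative extractor from III.A.
    shots = extract_shot_metadata(video_path)
    return [(video_path, i, int(start), int(end))
            for i, (start, end, _color) in enumerate(shots)]

def build_metadata_db(video_paths, db_path="lsva_metadata.db", workers=8):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS shots (
                        video TEXT, shot_index INTEGER,
                        start_frame INTEGER, end_frame INTEGER)""")
    # Extraction is CPU-bound, so fan it out across processes; keeping a
    # single writer avoids SQLite write contention.
    with Pool(workers) as pool:
        for rows in pool.imap_unordered(shots_for_video, video_paths):
            conn.executemany("INSERT INTO shots VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()

With metadata in this form, questions such as the average shot length per video become simple SQL aggregates rather than passes over the raw footage.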
Currently, the sample data being used to establish the complete workflow for this research is under 4 TB and can therefore be stored in the flash memory on the I/O node allocated to this project. However, as the amount of data grows, and as some steps in the batch mode demand compute-intensive processing (e.g., massive amounts of metadata extraction from hundreds of terabytes of video with short turn-around times), the Lustre filesystem on Gordon will be used to avoid fetching the files at the start of a batch job.
C. Visualization
Insight and understanding are greatly enhanced when information is explored from multiple perspectives. To provide such perspectives, information design must continue to evolve, experimenting with the latest tools and technologies to communicate ever-increasing and ever more complex information effectively [4, 5, 6]. Fundamental principles of spatial and temporal simultaneity, metamorphosis, time-modification, and juxtaposition are investigated using advanced information design and visualization tools in order to develop methods for effectively representing large collections of video. Our goal is to experiment with presenting video collections in novel representations, both as a means of visualizing video data and as image tags for searchable video databases.
Movie Cube
One visualization method involves the concept of a movie cube. In this study, we explore ways in which visualization tools can be applied to analyze a movie sequence. A movie sequence is first converted into a three-dimensional dataset by extracting each frame of the sequence and ordering the frames along the Z axis. Once in this form, we can use a variety of visualization techniques to examine the data. We use our custom visualization system to examine this dataset, as shown in the examples below (Figures 3, 4, 5, and 6), using a sequence from one of the Internet Archive movies in the Prelinger Collection (Safety Patrol, 1937 [7]). We begin by rendering slice planes at various locations along the Z axis, along with the bounds of the dataset. As expected, we see the individual frames of the movie, as shown in Figure 3. Note that time progresses from front to back in our movie cube.
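A minimal sketch of this construction follows, with OpenCV and NumPy standing in for our custom visualization system; the function name is ours for illustration.

import cv2
import numpy as np

def build_movie_cube(path, max_frames=None):
    """Stack decoded frames along axis 0 (Z/time) into a (T, H, W, 3) volume."""
    cap = cv2.VideoCapture(path)
    frames = []
    while max_frames is None or len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)   # cube[z] is the frame at time step z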
Fig. 3. Rendering slice planes.
Fig. 4. Rendering slice planes across time (vertical)
Experimenting with different orientations of our slice plane, we begin to see some interesting patterns emerge. As shown in Figures 4 and 5, we render slice planes cutting across time, revealing only a single row (left) or column (right) from each frame at each instant of time. Note that these visualizations give us a clear representation of camera shots. Specifically, we can see when in time (along the Z axis) new camera shots occur, as well as the relative duration of each shot. We can also see patterns of movement in time.
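In terms of the earlier sketch, such a cross-time slice is simply a fixed row or column taken from every frame; the file name below is hypothetical, and matplotlib stands in for our renderer.

import matplotlib.pyplot as plt

cube = build_movie_cube("safety_patrol.mp4", max_frames=2000)   # hypothetical path
row_slice = cube[:, cube.shape[1] // 2, :, :]   # (T, W, 3): middle row over time
col_slice = cube[:, :, cube.shape[2] // 2, :]   # (T, H, 3): the column variant (Fig. 5)

# Hard cuts appear as abrupt seams across the time axis, so shot
# boundaries and relative shot durations are visible at a glance.
plt.imshow(row_slice)
plt.xlabel("frame width (pixels)")
plt.ylabel("time (frame index)")
plt.show()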
Fig. 5. Rendering slice planes across time (horizontal)
We can also sample our movie cube volume using any type
of shape. In Figure 6, we map our movie data to a cylinder
positioned within the volume.
Fig. 6. Mapping movie data to a cylinder
By employing these visualizations, which treat videos as signals over time rather than as cinematographic creations, we are able to create two-dimensional images that contain within them activity over time. Thus, while it is possible to use our data to search two-dimensional images extracted from films by segmenting individual shots and scenes (a search better oriented toward seeking out specific objects or persons, and one fraught with the obstacles that have impeded image retrieval since its beginning), our search will explore the possibilities offered by searching for activity instead of objects.
It is our hope that by searching for activity types (something more easily translated into machine-readable patterns) we may prototype an equitable model for the hybrid systems used to navigate large-scale archives of moving images. In this model the human user retains more interpretive power, helping to mitigate the distortion introduced into searches by the semantic gaps common in image retrieval.
IV. ANTICIPATED OUTCOMES
To date, data visualization tools have successfully rendered “snapshots” of large video datasets, but these produce “meta-images” that, while informative, hold little explanatory power on their own and, as such, are difficult to evaluate (see Figures 7 and 8 [8, 9]). Considered on their own, they become little more than visual indices on the front-end and graphs of code tolerances on the back-end, neither of which holds up as a generalizable knowledge object. Thus one of our main goals is to use interpretive frameworks to draw useful conclusions about these large datasets by versioning approaches (e.g., crowd-sourced verification of machine recognition [9]).
Fig. 7. Visualization using Image Plot algorithm
Fig. 8. Visualization using Cinemetrics algorithm
As detailed in [10], the conceptual issues that inhere in labeling images with words are another major theme interrogated here; as such, content tagging will be extremely important. We will endeavor to create a mix of standard tags and idiosyncratic labels in order to more fully represent the possibilities presented by a vocabulary of images.
In this way, we will leverage the power of HPC together with the expertise and interpretive strategies of humanities scholars in order to arrive at a robust system that makes possible sophisticated analysis of the vast video archives that characterize contemporary culture. We are also evaluating existing scene-completion techniques [11] for integration into our analytical workflow; such a tool will be useful for completing scenes from a repository of semantically related pictures or videos.
ACKNOWLEDGEMENT
This work uses the Extreme Science and Engineering
Discovery Environment (XSEDE), which is supported by
National Science Foundation grant number OCI-1053575.
We are grateful to XSEDE for providing us the resources
required for development and deployment of this project.
REFERENCES
[1] Paul Zikopoulos and Chris Eaton. 2011. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, First Edition, pp. 1-166. http://www-01.ibm.com/software/data/bigdata/
[2] Medici multi-media content management system:
http://medici.ncsa.illinois.edu/
[3] Qiming Chen, Meichun Hsu, Rui Liu, and Weihong Wang.
2009. Scaling-Up and Speeding-Up Video Analytics Inside
Database Engine. In Proceedings of the 20th International
Conference on Database and Expert Systems Applications
(DEXA '09), 244-254.
[4] Barry Salt. 2006. “The Numbers Speak,” Moving Into Pictures. Starword Press.
[5] James E. Cutting, Jordan E. DeLong, and Christine E. Nothelfer. 2010. “Attention and the Evolution of Hollywood Film.” Psychological Science, published online 5 February 2010. DOI: 10.1177/0956797610361679
[6] Yuri Tsivian and Gunars Civjans. Cinemetrics: Movie
Measurement and Study Tool Database.
http://www.cinemetrics.lv/.
[7] Safety Patrol film. 1937. Producer: Handy (Jam) Organization
Sponsor: Chevrolet Division, General Motors Corporation.
[8] Software Studies Initiative, Image Plot visualization software:
explore patterns in large image collections
http://lab.softwarestudies.com/p/imageplot.html
[9] Brodbeck, Frederic. Cinemetrics thesis project:
http://cinemetrics.fredericbrodbeck.de/
[10] Virginia Kuhn. 2010. “Filmic Texts and the Rise of the Fifth Estate,” International Journal of Learning and Media, MIT Press, Volume 2, Issue 2-3. DOI: 10.1162/IJLM_a_00057
[11] James Hays, Alexei A. Efros. Scene Completion Using Millions
of Photographs. ACM Transactions on Graphics (SIGGRAPH
2007). August 2007, vol. 26, No. 3.