Semantic Video Adaptation based on Automatic
Annotation of Sport Videos
Marco Bertini, Alberto Del Bimbo
Dipartimento di Sistemi e Informatica
University of Florence
Via S. Marta, 3
Firenze, Italy
{bertini,delbimbo}@dsi.unifi.it
Rita Cucchiara, Andrea Prati
Dipartimento di Ingegneria dell’Informazione
University of Modena and Reggio Emilia
Via Vignolese, 905
Modena, Italy
{cucchiara.rita, prati.andrea}@unimore.it
ABSTRACT
Semantic video adaptation improves traditional adaptation by taking into account the degree of relevance of the different portions of the content. It employs solutions to detect the significant parts of the video and applies different compression ratios to elements that have different importance. Performance of semantic adaptation heavily depends on the precision of the automatic annotation and on the way of operation of the codec which is used to perform adaptation at the event or object level. In this paper, we discuss critical factors that affect performance of automatic annotation and define new performance measures of semantic adaptation, Viewing Quality Loss and Bitrate Cost Increase, that are obtained from classical PSNR and Bit Rate, but relate the results of semantic adaptation with the user's preferences and expectations. The new measures are discussed in detail for a system of sport annotation and adaptation with reference to different user profiles.
Categories and Subject Descriptors
H.3.7 [Information Storage and Retrieval]: Digital Libraries—
Systems issues, User issues; H.2.4 [Systems]: Multimedia databases;
I.4.2 [Compression (Coding)]
General Terms
Performance, Human factors
Keywords
Video adaptation, automatic video annotation, transcoding
1. INTRODUCTION
Universal multimedia access is becoming more and more popular due to the diffusion of new devices that can access multimedia data from any place. Among multimedia data, videos are probably the most challenging, since they require high bandwidth to preserve as much as possible of the original quality. However,
meeting the constraints of the device and the requirements of the user, while at the same time keeping the costs of the transmission low (in terms of data transferred and time required), is not a trivial task.
Video adaptation techniques have been widely studied in the last years [13, 8] in order to enable Universal Multimedia Access (UMA) from any place and also with devices with limited resources. Most of the video adaptation techniques provide syntactic video adaptation, performing scaling, color subsampling, temporal downscaling or changing the compression factor [9]. As a result, the whole video is adapted uniformly. Therefore there is, on the one side, bandwidth waste for preserving the quality of useless parts of the video, and, on the other side, excessive degradation of meaningful parts.
As a consequence, many researchers have recently concentrated their efforts on defining new “semantics-based” or “content-based” video adaptation approaches. The rationale is that the user can elicit relevant video elements (either objects or events of interest) and define for each of them a degree of relevance. Relevant elements should be detected automatically in the video, possibly with computer vision-based annotation modules, and the quality of their transmission should be adapted to their user-defined relevance. This selective adaptation can be done at object level (connected regions in a frame) or at event level (sequences of frames with common meaning). For example, in the transmission of a video of a soccer game, we can send good quality video only for the frames where interesting actions take place, or, within the individual frames, provide high resolution sampling only for the most relevant objects (e.g., the regions surrounding the players).
Video adaptation in terms of the relevance of the objects detected in each frame has been addressed by [14] and [2] for video surveillance applications. In [14], Vetro et al. presented an object-based transcoding framework that uses dynamic programming or metadata for the allocation of bits among the multiple objects in the scene. In [7] the advantages of representing visual data, and thus semantics, in terms of regions corresponding to objects are clearly evidenced. Chang et al. [6] have filtered live video content according to events and highlights. In [2] we have developed a prototype system for annotation and adaptation of soccer sport videos, with adaptation based on objects and events. However, a still open problem is the choice of the granularity of the elements to be exploited for the adaptation, that is, deciding whether to work at object or event level. A detailed comparison of the possible approaches has been discussed in [3].
In addition, there is the need of a reliable and consistent performance evaluation of content-based video adaptation systems. Most of the measures for performance evaluation of video adaptation systems are, however, still based on the PSNR (Peak Signal-to-Noise Ratio) [6, 2], with some noticeable exceptions that take into account non-linear distortion effects on the human perception system [14, 5]. However, in the case of content-based video adaptation, none of them can take into account the user's satisfaction and how much it is affected by errors in the video annotation system. A few approaches in this direction have been proposed recently. A weighted PSNR has been defined in [2] to include the user's preferences. Chang et al. [6] have defined a function that takes into account both quality in the video transfer (by means of PSNR) and the consumed bandwidth (using bit rate, BR).
In this paper we present a new metric for performance evaluation of content-based video adaptation systems that takes into account the overall user's satisfaction by merging the effects of annotation errors and adaptation distortions. The new performance measures, Viewing Quality Loss and Bitrate Cost Increase, are obtained from classical PSNR and Bit Rate, but relate the results of semantic adaptation with the user's preferences and expectations. They can be used with any annotation system and any content-based adaptation module.
2. ANNOTATION AND SEMANTIC
ADAPTATION SYSTEMS
The reference framework is a system resulting from the integration of an automatic annotation engine and a content-based adaptation module. Video annotation has been widely studied over the last few years, resulting in many research prototypes and several commercial tools. Among the possible application contexts, sports annotation is very widespread, due to its deployment in broadcasting, post-production logging, indexing, and so on [1, 10, 16]. Known context is usually structured in an ontology, the definition of which is beneficial not only in the annotation process, but also for information retrieval. When video annotation is associated with video access and delivery, and thus with content adaptation, the most common frameworks for knowledge representation come from the MPEG-7 and MPEG-21 standards [12, 11]. In MPEG-7 the description schemes (DS) are modeled on XML schemas, easing the use of parsing tools for indexing, querying, and retrieving information. Furthermore, efforts have been made to standardize techniques and rules for modeling the users' requests and preferences. Recently, the MPEG-21 standardization committee has addressed the UMA-related problems by including a Digital Item Adaptation (DIA) section in Part 7 of the standard (ISO/IEC 21000-7) in order to adapt the media content to the device's limitations [12].
2.1 Ontology
According to the MPEG-7 terminology, each frame of a video
can be divided into spatial segments, which aresets of not neces-
sarily connected pixelsof a frame. Within them, we call theregions
with associated semantics ROIs, regions of interest. Thus, we can
define the set of meaningful objects of a video as
O={ROIi}∪{o};O={ROI1,ROI
2, ..., R OIn}∪{o}
where oare the parts of a frame that do notbelong to any ROI. The
ROIs are segmented by means of visual descriptors able to extract
and classify objects, and to perform temporaland spatial reasoning
on the scene.
Then, we shall use the concept of temporal segments as defined by MPEG-7. A temporal segment in MPEG-7 is a set of not necessarily contiguous frames. We shall use the term events to define the types of temporal segments with a specific meaning. Unlike MPEG-7, which uses the word “event” to define the condition that connects objects to each other in any instant, in our case “event” refers only to the continuous presence of a fact along the time sequence. In practice, we consider a set of events $E$ defined as:

$$E = \{h_i\} \cup \{e\}; \qquad E = \{h_1, h_2, \ldots, h_m\} \cup \{e\}$$

where each $h_i$ can be viewed as a highlight, while the category $e$ comprises all the non-relevant parts of the video. The ontology is thus defined in terms of objects, events, and their relationships, as described by means of acyclic graphs.
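As a purely illustrative aid, the following Python sketch shows one possible in-memory representation of the sets $O$ and $E$ defined above; the class and field names are our own assumptions and do not correspond to the MPEG-7 description schemes or to the actual implementation.

```python
# Minimal sketch (illustrative only) of the ontology sets defined above:
# objects O = {ROI_1..ROI_n} U {o} and events E = {h_1..h_m} U {e}.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ROI:
    """A region of interest: a spatial segment with attached semantics."""
    label: str                      # e.g. "playfield", "player"
    pixels: List[Tuple[int, int]]   # (x, y) coordinates belonging to the region

@dataclass
class Event:
    """A temporal segment with a specific meaning (a highlight h_i)."""
    label: str                      # e.g. "shot_on_goal"
    frames: List[int]               # frame indices covered by the event

@dataclass
class FrameAnnotation:
    rois: List[ROI] = field(default_factory=list)
    # Everything not covered by a ROI implicitly belongs to the residual object "o".

@dataclass
class VideoAnnotation:
    frames: List[FrameAnnotation]
    events: List[Event]
    # Frames not covered by any event implicitly belong to the residual event "e".
```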
2.2 User device’s requirements
The device's requirements represent the constraints that the client device imposes on the access to the video content. For instance, the maximum resolution of the device's display limits the spatial dimension of the video. In this case, spatial downscaling is mandatory and has the positive knock-on effect of reducing the required bandwidth. Furthermore, current mobile devices have limited color resolution (typically, no more than 65,535 colors). Consequently, a reduction in color depth might be necessary in order to adapt to the handset's capabilities. Although these two alterations are unavoidable and bring benefits in terms of required bandwidth, they may also entail notable image degradation, especially with regard to color reduction. Tests on basic adaptation techniques have been carried out in [4]. Mobile devices normally have a limited available memory and a computational capability that sometimes circumscribes the possibility to run sophisticated codecs and browsers. Thus, the video adaptation server should supply different encoded versions of the video, for instance with the MPEG-2 or MPEG-4 standards. Another requirement that must be taken into account is the maximum bandwidth available for the connection. Current telecommunication standards for mobile devices are GPRS (General Packet Radio Service) and UMTS (Universal Mobile Telecommunications System), whose maximum dedicated bandwidths are about 115 kbps and 2 Mbps, respectively. Since typical bandwidth requirements for videos at PAL/NTSC frame rate are much higher, suitable and effective compression techniques must be employed. In particular, the selective adaptation of the compression based on content and user's interests can improve performance considerably.
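To give a rough idea of the compression ratios involved, the following back-of-envelope calculation (ours, not taken from the paper or from any standard) compares the raw bandwidth of an uncompressed PAL stream with the nominal GPRS and UMTS figures above.

```python
# Back-of-envelope check of why aggressive, content-aware compression is
# needed to fit GPRS/UMTS links (assumes uncompressed 4:2:0 PAL at 25 fps).
pal_width, pal_height, fps = 720, 576, 25
bits_per_pixel = 12                                        # 4:2:0 chroma subsampling
raw_bps = pal_width * pal_height * bits_per_pixel * fps    # ~124 Mbps uncompressed

gprs_bps = 115_000        # nominal GPRS bandwidth (~115 kbps)
umts_bps = 2_000_000      # nominal UMTS bandwidth (~2 Mbps)

print(f"raw PAL:        {raw_bps / 1e6:.1f} Mbps")
print(f"ratio for GPRS: {raw_bps / gprs_bps:.0f}:1")   # on the order of 1000:1
print(f"ratio for UMTS: {raw_bps / umts_bps:.0f}:1")   # on the order of 60:1
```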
2.3 User’s interests
The user's interests can be basically defined in terms of viewing quality and service costs. Therefore, the basic performance analysis parameters that can be used are PSNR and BR. In [15], a utility function has been defined showing the relationships between different types of resources (bandwidth, display, etc.) and utilities (objective or subjective quality, user's satisfaction, etc.). Here, bitrate and PSNR are the straightforward parameters adopted for measuring the costs and the quality of the video output.

The quality adaptation can be improved by exploiting semantic annotation and the user's interests. In particular, we may define a set $C$ of classes of relevance which groups together the parts of the video that are of the same degree of interest to the user.

Specifically, a class of relevance groups entities of the ontology (objects and events) with the same degree of relevance for the user. Formally, given the set of classes of relevance ordered by ascending relevance, $C = \{C_0, \ldots, C_{N_{CL}}\}$, each element is defined as:

$$C_i = \langle o_i, e_i \rangle \quad \text{with } o_i \subseteq O,\ e_i \subseteq E \qquad (1)$$
The relevance associated to each class is quantified by means of a weight assigned by the user. In this paper, we employed three classes as an example, namely $C_0$, $C_1$ and $C_2$ of low, medium, and high quality, respectively. The user can assign a relative weight to each class, indicating the respective ratios in the relevance, that will basically map onto the compression levels. As an example, setting the weights to $\{w_{C_0}, w_{C_1}, w_{C_2}\} = \{0.1, 0.4, 1.0\}$ means that the quality of class $C_2$ should be ten times better than that of class $C_0$. In this case the performance evaluation depends on the user's interests. Actually, the user can select his preferences according to the semantics of the video (e.g., s/he could be more interested in a shot on goal than in a placed kick). The user gives the relative interest of each class w.r.t. the others and the degree of quality (and consequent cost) needed for the most interesting class. The system selects the compression level of the classes of relevance accordingly. Then, the final performance parameters, such as PSNR and BR, are in accordance with the user's satisfaction. Nevertheless, while the variation of PSNR and BR as a function of the compression is almost known, the effect of annotation errors on the final performance cannot be estimated a priori.
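The exact rule that maps the class weights onto compression levels depends on the codec; the following minimal sketch only illustrates the idea, under the assumption that the quantizer of the less relevant classes is coarsened in inverse proportion to their weight (the function name and the base quantizer value are ours, not the rule actually used by the system).

```python
# Sketch (assumption, not the paper's exact rule) of mapping user relevance
# weights, e.g. {w_C0, w_C1, w_C2} = {0.1, 0.4, 1.0}, onto MPEG-2 quantization
# scales: the most relevant class gets a base quantizer and the others are
# coarsened in inverse proportion to their weight.
def quantizer_for_class(weight: float, base_qs: int = 2, qs_max: int = 31) -> int:
    """Return the quantization scale for a class of relevance.

    weight  : relative relevance in (0, 1], 1.0 = most relevant class
    base_qs : quantizer assigned to the most relevant class
    """
    qs = round(base_qs / weight)          # lower weight -> coarser quantization
    return max(1, min(qs_max, qs))

weights = {"C0": 0.1, "C1": 0.4, "C2": 1.0}
print({c: quantizer_for_class(w) for c, w in weights.items()})
# e.g. {'C0': 20, 'C1': 5, 'C2': 2} -> C2 quantized 10x more finely than C0
```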
3. ANNOTATION AND ADAPTATION OF
SOCCER VIDEOS
In sports videos, users are usually interested in watching certain areas of the images, such as the playfield or the zone around the goal box in soccer, or the zone near the start or the arrival in a race. These regions of interest are extracted by the automatic annotation system for two purposes: the first one is to provide a selective compression at object level, preserving as much quality as possible for the objects in which the user is more interested; the second purpose is to use the objects as inputs for the classification of events. In fact, in sports certain events can happen only in given areas and under given conditions (think, for instance, of the shot on goal in soccer).

The objects that are detected and extracted in soccer videos are the playfield (PF), the players and the ball (PL):

$$O_{soccer} = \{PF, PL\} \cup \{o\}$$

where $o$ is the area outside the playfield (e.g., the crowd), which is of no interest for the detection of highlights nor to the viewer of the video.
The playfield shape is obtained by applying color analysis and binarization to the video frames. The frame bitmap is processed using K-fill and flood fill, followed by erosion and dilation. The shape of the playfield is represented as a polygon for the purpose of automatic annotation, while for the purpose of video adaptation the polygon is used for soccer videos and a bitmap representation is used for swimming videos. This difference is due to the fact that accurate detection of the playfield shape and polygonal approximation are obtained precisely if the color of the playfield area is uniform, and playfield lines and player “blobs” are of a small size: the soccer field is a typical example in which polygonal shapes can be extracted reliably in most frames. The portion of playfield that is framed (and hence the playfield zone where the play takes place) is identified by the aspect of the playfield shape and the playfield lines extracted from the edge image; recognition is performed using Naïve Bayes classifiers. Player and ball blobs are extracted by color differencing and represented as “binary blobs”. Constraints on the side ratio and area of the blobs' bounding boxes are used to discard non-player blobs. In order to provide users with a better understanding of the video content, the blobs of players and ball are enlarged in order to include a small part of the area around them.
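For illustration only, the sketch below approximates this kind of color-based segmentation with standard OpenCV operations; the HSV thresholds, morphological kernel and blob constraints are assumptions chosen for a grass-colored soccer field, and K-fill is replaced by ordinary closing/opening, so this is not the processing chain actually used by the annotation engine.

```python
# Illustrative sketch of color-based playfield segmentation and player-blob
# extraction; all thresholds and parameters are assumed values.
import cv2
import numpy as np

def segment_playfield(frame_bgr: np.ndarray) -> np.ndarray:
    """Return a binary mask (255 = playfield) of the grass-colored field."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))    # assumed green range
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill field lines, small blobs
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # drop isolated noise
    return mask

def player_blobs(field_mask: np.ndarray, min_area: int = 30,
                 max_aspect: float = 4.0, margin: int = 2) -> list:
    """Return slightly enlarged bounding boxes of candidate player/ball blobs."""
    contours, _ = cv2.findContours(field_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return []
    # Fill the convex hull of the field so that players become "holes" in it.
    hull = cv2.convexHull(np.vstack(contours))
    field_filled = np.zeros_like(field_mask)
    cv2.fillConvexPoly(field_filled, hull, 255)
    candidates = cv2.bitwise_and(field_filled, cv2.bitwise_not(field_mask))
    blobs, _ = cv2.findContours(candidates, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in blobs:
        x, y, w, h = cv2.boundingRect(c)
        if cv2.contourArea(c) < min_area:
            continue                                    # too small to be a player
        if max(w, h) / max(1, min(w, h)) > max_aspect:
            continue                                    # discard line-like blobs
        boxes.append((x - margin, y - margin, w + 2 * margin, h + 2 * margin))
    return boxes
```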
The problem of modeling highlights can be seen as part of the problem of detecting special occurrences within the temporal sequences. In fact, a generic highlight can be regarded as a concatenation of consecutive phases of the competition. Each phase typically occurs in a distinct zone of the playfield, while transitions between phases are related to the movement of objects such as the athletes and/or the ball. In our approach, highlights are modeled using finite state machines (FSMs). Each highlight is described using a directed graph that models the relevant steps in the progression of the game or race, such as moving from one part of the playfield to another, accelerating or decelerating, etc.
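As an illustrative example (the states, cues and thresholds below are ours, not the models actually encoded in the system), a highlight FSM can be sketched as a small state machine driven per frame by the recognized playfield zone and by a ball-motion cue:

```python
# Illustrative finite-state sketch of a "shot on goal" highlight model, driven
# per frame by the framed playfield zone and a ball-motion cue (assumed cues).
from dataclasses import dataclass

@dataclass
class FrameCues:
    zone: str                 # e.g. "midfield", "goal_area"
    ball_toward_goal: bool

class ShotOnGoalFSM:
    def __init__(self, min_goal_frames: int = 10):
        self.state = "idle"
        self.goal_frames = 0
        self.min_goal_frames = min_goal_frames

    def step(self, cues: FrameCues) -> bool:
        """Consume one frame's cues; return True when the highlight is recognized."""
        if self.state == "idle" and cues.zone == "midfield" and cues.ball_toward_goal:
            self.state = "advancing"
        elif self.state == "advancing":
            if cues.zone == "goal_area":
                self.state = "near_goal"
                self.goal_frames = 1
            elif not cues.ball_toward_goal:
                self.state = "idle"                 # attack broke down
        elif self.state == "near_goal":
            if cues.zone == "goal_area":
                self.goal_frames += 1
                if self.goal_frames >= self.min_goal_frames:
                    self.state = "idle"
                    return True                     # shot on goal detected
            else:
                self.state = "idle"
        return False
```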
Meaningful events that are extracted by the annotation subsystem are the most important highlights. In particular, for soccer, the highlights that have been modeled are: forward launches (FL), shots on goal (SG), spot kicks such as penalty kicks (PK), free kicks near the goal post, and corner kicks, as well as attack actions (AA) and other plays that may lead to a shot on goal. In this paper, we use an ontology as follows:

$$E_{soccer} = \{FL, SG, PK, AA\} \cup \{e\} \qquad (2)$$

where $e$ indicates that no highlight is present in the video stream being processed.

Table 1 reports the Detection Rate (DR) and False Alarm Rate (FAR) figures for playfield zones and players, in terms of pixels classified as belonging to these objects.
Sports video     Object     DR      FAR
Soccer videos    Playfield  99.9%   0.16%
                 Players    99.8%   5.51%

Table 1: Performance figures of object automatic detection over 90' of soccer video.
Table 2 reports the confusion matrix, showing the precision in highlight detection and the errors in highlight classification. The percentage in the “other” column indicates false highlight detections. Finally, Table 3 reports the percentages of missed detections.

The adaptation module performs content-based video adaptation according to the bandwidth requirements and the weights of the classes of relevance. Different compression techniques have been implemented that perform coding at the semantic level.
The first one exploits the standard adaptive quantization of MPEG-2 to select the quantization scale $QS_i$ ($QS_i \in [0, 31]$) of each macroblock $i$ of each frame of the video. This approach is referred to as S-MPEG2. For each $i$, the dominant class of relevance and the corresponding $QS_i$ are computed, depending on which objects and events are involved.
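A minimal sketch of this per-macroblock rule is given below; it assumes that the annotation engine supplies a per-pixel map of class indices, and it interprets the “dominant” class of a macroblock as the highest-relevance class that touches it (both the interpretation and the names are ours, not the actual codec integration).

```python
# Sketch of the S-MPEG2 rule: each 16x16 macroblock takes the quantization
# scale of the dominant (here: highest-relevance) class among its pixels.
import numpy as np

def macroblock_quantizers(class_map: np.ndarray, qs_per_class: dict,
                          mb: int = 16) -> np.ndarray:
    """class_map    : HxW array of class indices (0 = least relevant class).
    qs_per_class : e.g. {0: 20, 1: 5, 2: 2}, quantization scale per class.
    Returns one quantization scale per (complete) macroblock."""
    h, w = class_map.shape
    rows, cols = h // mb, w // mb
    qs = np.empty((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            block = class_map[r * mb:(r + 1) * mb, c * mb:(c + 1) * mb]
            dominant = int(block.max())       # highest-relevance class in the block
            qs[r, c] = qs_per_class[dominant]
    return qs
```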
Two other coding policies have been implemented based on MPEG-4 and, particularly, on the Xvid open source software (http://www.xvid.org). Differently from MPEG-2, in MPEG-4 the quantization values for the macroblocks within the same Video Object Plane (VOP) are sent in a differential format: each value for a macroblock (except for the first) is coded as {-2, -1, 1, 2} with respect to the base value of the VOP. This allows MPEG-4 to reduce the bandwidth required for the adaptive quantization (2 bits for each quantization value w.r.t. 5 bits), but restricts the flexibility, practically preventing the use of different quantization scales for the macroblocks.
The most straightforward way is to employ the MPEG-4 Simple Profile (S-MPEG4-SP): it does not consider objects (and, thus, does not allow different quantization factors within the same frame), but only events, i.e. different quantization scales are used in different groups of frames. Instead, working at object level, the Core Profile of MPEG-4 can be used (S-MPEG4-CP), creating a different VOP for each object extracted by the annotation system.
                    True highlight
Recog. highlight    Fwd. Launch  Shot on goal  Placed kick  Attack act.  Other
Fwd. Launch         89.75%       1.67%         0%           0%           8.58%
Shot on goal        1.525%       93.9%         0%           0%           4.575%
Placed kick         0%           0%            89.75%       0%           10.25%
Attack action       1.6%         1.0%          0%           97.4%        1.0%

Table 2: Performance figures of highlight automatic detection over 90' of soccer video: precision and misclassification errors.
                      True highlight
Recognized highlight  Fwd. Launch  Shot on goal  Placed kick  Attack act.
Misses                5%           13%           7%           25%

Table 3: Soccer highlight miss percentages.
Compr. Techn.  Avg. bandwidth  Standard   Semantic
MPEG-2         530.30 kbps     32.67 dB   35.57 dB
MPEG-4         179.94 kbps     33.47 dB   36.22 dB

Table 4: Average PSNR for MPEG-2 and MPEG-4, both standard and semantic approaches, over 90' of soccer video.
In this way, we can assign a different quantization scale to each object depending on its relevance for the user. However, this approach has proven not to be suitable in the case of sports videos [3].
In Table 4, we provide a comparison of the performance of the S-MPEG2 and S-MPEG4-SP techniques. Results have been obtained under the hypothesis of an ideal (error-free) annotation engine (events and objects are detected manually), from DV source videos. According to the weights assigned by the user, we select different compression factors for the objects and events of interest w.r.t. the non-interesting elements.
The average PSNR is calculated at fixed bandwidth. In order to maintain the frame rate of 10 fps, and comparable viewing quality, compressed outputs have been obtained at an average bandwidth of 530 kbps for the MPEG-2-based solutions and 180 kbps for the MPEG-4-based solutions (note that the MPEG-4-based solutions achieve similar viewing quality with less bandwidth). The average PSNR improvement with semantic adaptation is about 8.5%. Results have been obtained with a reference user profile (see Table 5). The videos included in the test set take into consideration different sources from different broadcasters and different conditions, and they are selected considering the typical average percentage of highlights in a soccer match, as provided by the UEFA organization.
4. PERFORMANCE MEASURE FOR
ANNOTATION AND ADAPTATION
Let us consider the case of access to a whole soccer game (90 minutes) from a mobile device connected via GPRS, whose real average bandwidth can be considered 35 kbps. The use of semantic adaptation makes it possible to achieve acceptable quality for the significant entities even with this very strict limit. The 90 minutes of video are downscaled from the PAL format to a 220x176 frame size, with reference to an off-the-shelf latest-generation cellular phone (Motorola V525). An example of the quality of a relevant frame is reported in Fig. 1(b) (and zoomed on the player in Fig. 1(d)), corresponding to about 32.7 dB. If a standard approach is employed, without exploiting the semantics, results are much poorer, as demonstrated in Figs. 1(a) and 1(c), corresponding to about 30.4 dB.
Nevertheless, standard metrics, such as PSNR and BR for the adaptation module and DR and FAR for the annotation engine, present two main drawbacks for our purposes:

- these metrics evaluate the performance of the single module (annotation or adaptation), but not of the integrated system; in particular, our proposal aims at evaluating how much the annotation errors affect the overall performance of the system;

- these metrics do not take the user's preferences into account; for instance, degrading the quality of different parts of the video can have different impacts on the user, depending on the relevance that those parts have for her/him; standard PSNR does not consider this.
The errors of automatic annotation can affect the user's satisfaction. Since objects and events are divided into classes of relevance by the users, errors can cause under- or over-estimation of objects or events. In particular, under-estimation and miss conditions have a negative impact on the user's satisfaction from the viewpoint of viewing quality loss. In fact, in this case, events and/or objects are compressed more than necessary. Instead, the costs paid by the user are lowered, since under-estimated objects and events are more compressed. On the other hand, over-estimation and false detection conditions negatively affect the user's satisfaction with respect to the cost paid by the user (for transmission, downloading, and storage). These two effects could compensate each other: two differently annotated videos could be compressed with the same PSNR and the same BR, but with a large negative impact on the user's satisfaction: the user can lose details of interest and waste bits on useless parts.
Starting from the usual figures of PSNR at the pixel level and Bit Rate, we can derive new indexes of performance that do not take into account the parts correctly annotated and adapted, but only the errors: i) Viewing Quality Loss (VQL), resulting from the over-compression due to under-estimation and miss conditions occurred in the annotation; ii) Bitrate Cost Increase (BCI), resulting from the higher bitrate due to over-estimations and false detections. Let us call $Err^t_Q$ the set of points of frame $t$ that have been under-estimated, i.e. all the points that are supposed to belong to a class $C_i$ and are, instead, detected as belonging to a class $C_j$, with $j < i$. Correspondingly, let us call $Err^t_C$ the set of points of frame $t$ that have been over-estimated.
The VQL is evaluated on the pixels that result under-estimated for each frame $I_t$. Using the standard PSNR definition on this set $Err^t_Q$, a comparison between the ideal (error-free) and the actual annotation is provided.
Profile       C2                                C1                       C0         Weights (w_C2, w_C1, w_C0)
Profile Ref.  <{SG, FL}, *>                     <{PK, AA}, players>      residuals  (1.0, 0.3, 0.1)
Profile A     <SG, *>                           <{FL, PK, AA}, players>  residuals  (1.0, 0.3, 0.1)
Profile B     <{SG, FL}, *>                     <{PK, AA}, players>      residuals  (1.0, 0.6, 0.5)
Profile C     <{SG, FL}, players>               <{PK, AA}, players>      residuals  (1.0, 0.3, 0.1)
Profile D     <{SG, FL}, {playfield, players}>  <{PK, AA}, players>      residuals  (1.0, 0.3, 0.1)

Table 5: User profiles used to evaluate average VQL and BCI.
The PSNR of the under-estimated pixels in the case of the actual annotation is denoted by $PSNR_{Err^t_Q}$ and defined as:

$$PSNR_{Err^t_Q} = 10 \log_{10} \frac{V_{MAX}^2}{MSE_{Err^t_Q}} \qquad (3)$$
where $V_{MAX}$ is the maximum (peak-to-peak) value of the signal to be measured and $MSE_{Err^t_Q}$ is the Mean Square Error of the frame (limited to $Err^t_Q$), defined as follows:

$$MSE_{Err^t_Q} = \frac{\sum_{p \in Err^t_Q} d^2(p)}{|Err^t_Q|} \qquad (4)$$

with $d(p)$ a properly defined distance measuring the error between the original and the distorted image. As distance, we used the Euclidean distance in the RGB color space.
The same measure of Eq. 3 can be carried out for the ideal (ground-truthed) annotation: $PSNR^{ID}_{Err^t_Q}$ is also computed on the set $Err^t_Q$ and is affected by a non-null $MSE^{ID}_{Err^t_Q}$, due only to the selected compression standard and quantization scale. The viewing quality loss of the frame is thus defined as:

$$VQL_t = 1 - \frac{PSNR_{Err^t_Q}}{PSNR^{ID}_{Err^t_Q}} \qquad (5)$$

Since $PSNR_{Err^t_Q}$ is computed only on the under-estimated pixels of the frame, its value is lower than or equal to that obtained in the case of error-free annotation ($PSNR^{ID}_{Err^t_Q}$). Consequently, the ratio in Eq. 5 is between 1 (ideal annotation) and 0 (maximum distortion due to the annotation and adaptation processes).
Similarly, the bitrate cost increase, for objects and events, is defined for a frame $I_t$ in terms of the ratio between the bandwidth requested in the ideal and in the actual case, computed on the set of over-estimated pixels $Err^t_C$:

$$BCI_t = 1 - \frac{BR^{ID}_{Err^t_C}}{BR_{Err^t_C}} \qquad (6)$$
Viewing quality loss (VQL) and bitrate cost increase (BCI) at the video level are obtained directly by averaging $VQL_t$ and $BCI_t$:

$$VQL = \frac{\sum_{t=0}^{N} VQL_t}{N}; \qquad BCI = \frac{\sum_{t=0}^{N} BCI_t}{N} \qquad (7)$$

where $N$ is equal to the number of frames of true highlights plus the number of frames falsely detected.
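A compact sketch of Eqs. 3-7 is given below for reference. It assumes that, for each frame, per-pixel Euclidean RGB distances from the original frame are available for the video adapted with the actual and with the ideal annotation, together with the mask of under-estimated pixels and the bits spent on the over-estimated pixels in the two cases; the variable and function names are ours.

```python
# Sketch of the per-frame measures of Eqs. (3)-(7); inputs are assumed.
import numpy as np

V_MAX = 255.0 * np.sqrt(3.0)   # peak Euclidean distance in RGB space

def psnr_on_set(dist: np.ndarray, err_mask: np.ndarray) -> float:
    """PSNR restricted to the pixels selected by err_mask (Eqs. 3-4)."""
    mse = np.mean(dist[err_mask] ** 2)
    return 10.0 * np.log10(V_MAX ** 2 / mse)

def vql_frame(dist_actual: np.ndarray, dist_ideal: np.ndarray,
              under_mask: np.ndarray) -> float:
    """Viewing Quality Loss of one frame (Eq. 5)."""
    return 1.0 - psnr_on_set(dist_actual, under_mask) / psnr_on_set(dist_ideal, under_mask)

def bci_frame(bits_ideal: float, bits_actual: float) -> float:
    """Bitrate Cost Increase of one frame (Eq. 6), on over-estimated pixels."""
    return 1.0 - bits_ideal / bits_actual

def video_level(per_frame_values) -> float:
    """Average over the N frames of true plus falsely detected highlights (Eq. 7)."""
    return float(np.mean(per_frame_values))
```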
The graphs reported in Fig. 2 compare the performance analysis achievable with classical metrics (such as PSNR and Bit Rate) and with the new metrics, VQL and BCI, for a sample case. The reported example presents a set of annotation errors. The ideal event annotation should detect a “forward launch” (FL) event (associated with class of relevance C1) between frames 0 and 42, and a “shot on goal” (SG) event (of class C2) between frames 282 and 375. Annotation with the actual system results in a FL detected between frames 12 and 26, leading to two partial misses (represented by frame intervals 1 and 3 in Fig. 2), and in a SG between frames 251 and 322, resulting in a partial false detection (number 4) and a partial missed detection (number 6). In addition to the errors in event detection, the actual system makes some errors in the segmentation of the objects (the playfield in this case) that result in the small cost increase in intervals 2 and 5, and in a more relevant (especially in interval 5) loss of viewing quality. Please note that the descending PSNR in intervals 4 and 5 is due to the decreasing area of the playfield, since the amount of playfield in the image decreases while approaching the goal. It is also worth noting that the effect of a missed event in the case of FL (intervals 1 and 3) and in that of SG (interval 6) is different, since the relevance of the missed event is different. The false event in interval 4 results in a BCI of about 80%. In fact, in this interval, the average occupation of a frame in the case of correct annotation is about 100 Kbits, which grows to 650 Kbits since the actual system misclassifies the frame.
From the graphs of Fig. 2 it is evident that the use of classic metrics is not sufficient. The PSNR reported in the upper graph is computed over the whole frame $I_t$ and can mix both quality and cost effects of incorrect annotation. As a limit case, these two effects can neutralize each other, resulting in two videos with the same average PSNR but very different user satisfaction levels. From PSNR and BR alone it is not possible, for instance, to understand how much of the PSNR decrease in interval 5 is due to annotation errors and how much is due to the reduced playfield size.
According to the definitions above, both viewing quality loss and bitrate cost increase depend on the statistics of the objects and events present in the video, the performance of the annotation (measured in terms of misses, misclassifications and losses), the performance of the adaptation, and, ultimately, the way in which objects and events are clustered into each class of relevance and the relative importance weights. Thus it appears that the most important conditions potentially influencing user satisfaction are those related to events. However, player objects are also important for their impact on viewing quality, especially in the presence of meaningful actions. Among the event conditions, the most critical one for performance is event miss. In fact, in this case all the frames during the whole duration of the event are compressed at a lower rate, proportional to the relevance weight of the residual $C_0$ class that comprises the events that are less interesting for the user.
5. PERFORMANCE EVALUATION
To evaluate how VQL and BCI change according to different sets of user preferences (classes of relevance and weights) and to the performance of the automatic annotation system (event and object misses, and wrongly recognized highlights), selected soccer videos taken from the video test set have been manually annotated, performing object and event segmentation, to precisely estimate the under- and over-estimation of objects and events. A reference user profile has been compared against 4 other possible user profiles, defined as reported in Table 5.
[Figure 1: Examples of standard compression compared with the semantic approach. (a) An example frame of standard compression at GPRS bandwidth; (b) an example frame of semantic compression at GPRS bandwidth; (c) zoomed portion of (a); (d) zoomed portion of (b).]
     Ref.     Profile A  Profile B  Profile C  Profile D
FL   4974.95  711.21     4994.88    1222.39    3904.62
SG   5497.50  5497.50    5502.80    1476.17    3683.73
AA   598.89   598.89     797.74     598.89     598.89
PK   485.72   485.72     713.20     485.72     485.72

Table 6: Bitrate of the video obtained with the actual annotation (figures are in kbps).
The test set consists in a set of selected clips from different soccer videos: in total, about 6000 frames have been annotated, with 2361 highlight frames (624 frames of forward launches, 760 of shots on goal, 320 of attack actions, 650 of placed kicks). The annotation at object level has also been provided, with players and playfield. Both the manually annotated and the automatically annotated versions have been adapted with S-MPEG2, according to the classes of relevance of the 5 users. S-MPEG2 has been preferred to S-MPEG4-SP for two main reasons: first, because MPEG-2 is less computationally intensive and, thus, more suitable for low-power devices; second, as stated above, MPEG-2 enables selective compression at both object and event level.
     Ref.   Profile A  Profile B  Profile C  Profile D
FL   35.47  31.92      35.47      31.85      32.55
SG   33.64  33.64      33.64      31.35      32.39
AA   32.29  32.29      34.01      32.29      32.29
PK   30.88  30.88      32.57      30.88      30.88

Table 7: Peak Signal-to-Noise Ratio (PSNR) of the video obtained with the actual annotation (figures are in dB).
The content-based adaptation system provides compression at constant quality for the pixels belonging to the highest classes of relevance. For this reason, similarly to the Constant Quality (CQ) method of MPEG, we define this approach as Constant Best Quality (CBQ). Tables 6 and 7 report the average values divided by type of highlight. For instance, User Ref and User A differ only regarding FL. User B has the same PSNR and BR for the FL and SG highlights (both of class C2), while obtaining a higher overall quality (PSNR) than User Ref in the AA and PK highlights (both of class C1). User C and User D have an apparently lower PSNR than Ref because it is averaged over all the pixels of the frame. These profiles, indeed, ask for the best quality only for the regions of interest (players and playfield, respectively). The video quality in the interesting areas is almost the same, but a decrease in the required bandwidth is evident.
Figure 2: Comparison between PSNR-BR classical metrics and newly defined VQL and BCI.
Finally, Tables 8 and 9 report the results achieved with the new metrics. Here, many considerations about the goodness of the whole annotation and adaptation system can be drawn. First, there is a high bitrate cost increase due to the errors in shots on goal. This is due to the high number of false positives and to the average length of a shot on goal, i.e. to the number of frames erroneously classified as SG. It is worth noting that, from Table 2, FL also presents a higher false alarm rate at the event level than SG, but its BCI is always lower than that of SG. This can be explained because shots on goal are highlights that last more frames, and thus the over-estimated frames in the case of erroneous events are more numerous (the average length of a shot on goal is about 140 frames, while that of a forward launch is 60 frames). Another interesting result provided by the BCI measure is the one obtained by comparing User Ref and User D. It can be easily noted that, at least for FL and SG, the BCI is higher for User D. The BCI is due to two factors: false/over-estimated events and false/over-estimated objects. In the first case, the BCI is similar to that of User Ref, since the playfield is usually very large in SG and FL actions.
     Ref.    Profile A  Profile B  Profile C  Profile D
FL   9.07%   1.23%      8.14%      5.58%      9.39%
SG   11.78%  11.78%     11.07%     8.10%      13.76%
AA   0.45%   0.45%      0.56%      0.45%      0.45%
PK   0.27%   0.27%      0.30%      0.27%      0.27%

Table 8: Bitrate Cost Increase.
     Ref.   Profile A  Profile B  Profile C  Profile D
FL   2.75%  0.61%      1.47%      2.49%      3.60%
SG   1.31%  1.31%      0.83%      3.04%      2.54%
AA   1.34%  1.34%      1.14%      1.34%      1.34%
PK   0.24%  0.24%      0.24%      0.24%      0.24%

Table 9: Viewing Quality Loss.
In addition, in the case of User D, there is the over-estimation of the playfield that contributes to increasing the BCI. A similar consideration holds for User C, but in this case the error due to missed/over-estimated events is lower, since the objects (players) are smaller and the number of over-estimated pixels is smaller. Thus, the overall BCI is lower than in the case of User Ref. and User D.

Regarding VQL, the error in quality is limited, especially for User B. In fact, User B accepts higher costs (due to a higher bitrate) that limit the effects of miss detections. User A, who is less interested in FL, is not affected by significant errors. AA and PK highlights are always considered of average importance.
Acknowledgments
The work has been carried out in the context of DELOS, the Network of Excellence in digital libraries of the European VI Framework Programme.
6. REFERENCES
[1] J. Assfalg, M. Bertini, C. Colombo, A. Del Bimbo, and W. Nunziati. Semantic annotation of soccer videos: automatic highlights identification. Computer Vision and Image Understanding, 92(2-3):285–305, November-December 2003.
[2] M. Bertini, R. Cucchiara, A. Del Bimbo, and A. Prati. An integrated framework for semantic annotation and transcoding. Multimedia Tools and Applications, to appear.
[3] M. Bertini, R. Cucchiara, A. Del Bimbo, and A. Prati. Object-based and event-based semantic video adaptation. In Proceedings of Int'l Conference on Pattern Recognition, to appear, Aug. 2004.
[4] R. Cucchiara, C. Grana, and A. Prati. Semantic video transcoding using classes of relevance. International Journal of Image and Graphics, 3(1):145–169, Jan. 2003.
[5] N. Damera-Venkata, T. D. Kite, W. S. Geisler, B. L. Evans, and A. C. Bovik. Image quality assessment based on a degradation model. IEEE Transactions on Image Processing, 9(4):636–650, Apr. 2000.
[6] J.-G. Kim, Y. Wang, and S.-F. Chang. Content-adaptive utility-based video adaptation. In Proc. of IEEE Int'l Conference on Multimedia & Expo, pages 281–284, July 2003.
[7] M. Kunt. Object-based Video Coding, chapter 6.3, pages 585–596. In 'Handbook of Image and Video Processing'. Academic Press, 2000.
[8] R. Mohan, J. Smith, and C. Li. Adapting multimedia internet content for universal access. IEEE Transactions on Multimedia, 1(1):104–114, March 1999.
[9] T. Shanableh and M. Ghanbari. Heterogeneous video transcoding to lower spatio-temporal resolution and different encoding formats. IEEE Transactions on Multimedia, 2(2):101–110, June 2000.
[10] S. Nepal, U. Srinivasan, and G. Reynolds. Automatic detection of 'goal' segments in basketball videos. In Proc. of ACM Multimedia, pages 261–269, 2001.
[11] B. L. Tseng, C.-Y. Lin, and J. R. Smith. Using MPEG-7 and MPEG-21 for personalizing video. IEEE Multimedia, 11(1):42–52, Jan.-Mar. 2004.
[12] A. Vetro. MPEG-21 digital item adaptation: enabling universal multimedia access. IEEE Multimedia, 11(1):84–87, Jan.-Mar. 2004.
[13] A. Vetro, C. Christopoulos, and H. Sun. Video transcoding architectures and techniques: An overview. IEEE Signal Processing Magazine, 20(2):18–29, Mar. 2003.
[14] A. Vetro, T. Haga, K. Sumi, and H. Sun. Object-based coding for long-term archive of surveillance video. In Proc. of IEEE Int'l Conference on Multimedia & Expo, volume 2, pages 417–420, 2003.
[15] Y. Wang, J.-G. Kim, and S.-F. Chang. Content-based utility function prediction for real-time MPEG-4 video transcoding. In Proc. of IEEE Int'l Conference on Image Processing, volume 1, pages 189–192, 2003.
[16] W. Zhou, A. Vellaikal, and C. Kuo. Rule-based video classification system for basketball video indexing. In Proc. ACM Multimedia 2000 Workshop, pages 213–216, 2000.