Fig 1 - uploaded by Marcin Grzegorzek
MPEG-7 annotation example of an image adapted from Wikipedia, http://en.wikipedia.org/wiki/Yalta_Conference


Source publication
Conference Paper
Multimedia constitutes an interesting field of application for Semantic Web and Semantic Web reasoning, as the access and management of multimedia content and context depends strongly on the semantic descriptions of both. At the same time, multimedia resources constitute complex objects, the descriptions of which are involved and require the founda...

Contexts in source publication

Context 1
... D. Roosevelt, and Josef Stalin, respectively. Having these tools, she would like to run the face recognition web services on images and import the extraction results into the authoring tool in order to automatically generate links from the detected face regions to detailed textual information about Churchill, Roosevelt, and Stalin (image in Fig. ...
Context 2
... descriptors, which can be represented in either XML or a binary format. While it is possible to specify very detailed annotations using these descriptors, it is not possible to guarantee that MPEG-7 metadata generated by different agents will be mutually understood, due to the lack of formal semantics in this language [32,87]. The XML code of Fig. 1-B illustrates the inherent interoperability problems of MPEG-7: several semantically equivalent descriptors, representing the same information with different syntax, can coexist [88]. As Nathalie used three different face recognition web services, the extraction results of the regions SR1, SR2, and SR3 differ from each ...
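The interoperability problem can be made concrete with a small sketch. The two fragments below are hypothetical, simplified MPEG-7-style descriptions (the element names are illustrative, not verbatim from the standard); both label the same region with the same person, yet a purely syntactic tool cannot see that they agree:

```python
import xml.etree.ElementTree as ET

# Two hypothetical MPEG-7-style fragments describing the same fact.
frag_a = """<StillRegion id="SR1">
  <TextAnnotation><FreeTextAnnotation>Winston Churchill</FreeTextAnnotation></TextAnnotation>
</StillRegion>"""

frag_b = """<StillRegion id="SR2">
  <Semantic><Label><Name>Winston Churchill</Name></Label></Semantic>
</StillRegion>"""

a = ET.fromstring(frag_a)
b = ET.fromstring(frag_b)

# Both fragments convey the same information, but a syntactic comparison
# fails: the label sits at different paths in the two documents.
path_a = a.find("./TextAnnotation/FreeTextAnnotation")
path_b = b.find("./TextAnnotation/FreeTextAnnotation")
print(path_a is not None)  # True
print(path_b is not None)  # False: same meaning, different syntax
```

Without formal semantics, nothing in MPEG-7 itself tells an agent that these two structures denote the same annotation.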
Context 3
... MPEG-7 code example given in Fig. 1 highlights that the formalization of data structures alone is not yet sufficient. Complex MPEG-7 types can include nested types that again have to be represented by structured-data-descriptions. In our example, the MPEG-7 SemanticType contains the element Definition, which is of the complex type TextAnnotationType. The digital data pattern ...
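The nesting described above can be sketched as follows; the class names mirror the MPEG-7 type names from the excerpt, but the Python representation itself is only an illustration of why structured-data-descriptions must recurse:

```python
from dataclasses import dataclass

# Illustrative only: a complex MPEG-7 type (SemanticType) embeds another
# complex type (TextAnnotationType), so any faithful formalization must
# represent nested structured data, not just flat attribute lists.
@dataclass
class TextAnnotationType:
    free_text: str

@dataclass
class SemanticType:
    # The Definition element is itself of complex type TextAnnotationType.
    definition: TextAnnotationType

annotation = SemanticType(definition=TextAnnotationType(free_text="Yalta Conference, 1945"))
print(annotation.definition.free_text)  # Yalta Conference, 1945
```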
Context 4
... application of the Winston Churchill face recognizer results in an annotation RDF graph that is depicted in the upper part of Fig. 7 (visualized by a UML object diagram 27 ). The decomposition of Fig. 1-A, whose content is represented by id0, into one still region (the bounding box of Churchill's face) is represented by the lighter middle part of the UML diagram. The segment is represented by the image-data instance id1 that plays the still-region-role srr1. It is located by the digital-data instance dd1 which expresses the ...
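The graph structure described above can be sketched as plain (subject, predicate, object) triples. The individual identifiers (id0, id1, srr1, dd1) follow the excerpt, while the predicate names are assumptions chosen for illustration, not the actual COMM vocabulary:

```python
# A hedged sketch of the annotation graph: the image id0 is decomposed
# into one segment id1 (Churchill's bounding box), which plays the
# still-region-role srr1 and is located by the digital-data instance dd1.
graph = {
    ("id0", "rdf:type", "comm:image-data"),        # the whole image
    ("id1", "rdf:type", "comm:image-data"),        # the face bounding box segment
    ("id0", "comm:has-segment", "id1"),            # decomposition (predicate name assumed)
    ("id1", "comm:plays-role", "srr1"),            # segment plays a role
    ("srr1", "rdf:type", "comm:still-region-role"),
    ("dd1", "comm:locates", "id1"),                # digital data localizing the segment
    ("dd1", "rdf:type", "comm:digital-data"),
}

# Query: which segments does the image id0 decompose into?
segments = [o for (s, p, o) in graph if s == "id0" and p == "comm:has-segment"]
print(segments)  # ['id1']
```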
Context 5
... the COMM, an image is formalized as some image-data, that plays a root-segment-role. This is abstracted in the API by creating an image object and assigning a still region (which refers to the image-data) to it (lines 1-3). The bounding box that refers to the recognized face is added as a decomposition to the root still region representing the image. ...
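A minimal sketch of that API abstraction, assuming hypothetical class and method names (the excerpt describes the behavior of lines 1-3, not the actual signatures of the COMM API):

```python
# Hypothetical classes mirroring the described abstraction; not the real API.
class StillRegion:
    def __init__(self, x, y, width, height):
        self.box = (x, y, width, height)
        self.decompositions = []

    def add_decomposition(self, region):
        # A sub-region (e.g. a face bounding box) decomposes this region.
        self.decompositions.append(region)

class Image:
    def __init__(self):
        self.root_region = None

    def set_root_region(self, region):
        # Internally this corresponds to image-data playing a root-segment-role.
        self.root_region = region

img = Image()
root = StillRegion(0, 0, 800, 600)       # still region covering the whole image
img.set_root_region(root)
face = StillRegion(120, 40, 90, 110)     # bounding box of the recognized face (values assumed)
root.add_decomposition(face)
print(len(img.root_region.decompositions))  # 1
```

The point of the abstraction is that the author manipulates images and regions directly, while the ontology-level roles and data instances stay hidden behind the API.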

Similar publications

Article
Knowledge Engineering defines methodologies and tools for acquiring and modeling knowledge in order to make it independent of individual people, to formalize it, and to allow its appropriation by organizations or systems. In this way, it serves the goals of Knowledge Management. Knowledge Engineering focuses on the creation of formal...

Citations

... However, there is the semantic gap, which is the disagreement between features automatically extracted by machines and semantic meanings perceived by humans (Djordjevic, Izquierdo, & Grzegorzek, 2007;Smeulders, Worring, Santini, Gupta, & Jain, 2000;Staab et al., 2008). The semantic gap is attributed to the internal dissimilarity and the external similarity in terms of features. ...
Article
Large-Scale Multimedia Retrieval (LSMR) is the task of quickly analyzing a large amount of multimedia data, such as images or videos, and accurately finding the items relevant to a certain semantic meaning. Although LSMR has been investigated for more than two decades in the fields of multimedia processing and computer vision, a more interdisciplinary approach is necessary to develop an LSMR system that is really meaningful for humans. To this end, this paper aims to stimulate attention to the LSMR problem from diverse research fields. After explaining basic terminologies in LSMR, we first survey several representative methods in chronological order. This reveals that, due to prioritizing generality and scalability for large-scale data, recent methods interpret semantic meanings with a completely different mechanism from humans, though such human-like mechanisms were used in classical heuristic-based methods. Based on this, we discuss human-machine cooperation, which incorporates knowledge about human interpretation into LSMR without sacrificing generality and scalability. In particular, we present three approaches to human-machine cooperation (cognitive, ontological, and adaptive), which are attributed to cognitive science, ontology engineering, and metacognition, respectively. We hope that this paper will create a bridge enabling researchers in different fields to communicate about the LSMR problem and lead to a ground-breaking next generation of LSMR systems.
Article
Recent Large-Scale Multimedia Retrieval (LSMR) methods seem to rely heavily on analysing a large amount of data using high-performance machines. This paper aims to warn against this research trend. We advocate that the above methods are useful only for recognising certain primitive meanings; knowledge about human interpretation is necessary to derive high-level meanings from primitive ones. We emphasise this by conducting a retrospective survey on machine-based methods, which build classifiers based on features, and human-based methods, which exploit user annotation and interaction. Our survey reveals that, due to prioritising generality and scalability for large-scale data, knowledge about human interpretation is left out by recent methods, while it was fully used in classical methods. Thus, we defend the importance of human-machine cooperation, which incorporates the above knowledge into LSMR. In particular, we define its three future directions (cognition-based, ontology-based and adaptive learning) depending on the type of knowledge involved, and suggest exploring each direction by considering its relation to the others.
Article
This paper presents a novel method for multimedia document content analysis through modeling multimodal data correlations. We hypothesize that the correlation of different modalities from the same data source can help achieve better multimedia content understanding results than exploring a single modality alone. We split this task into two parts: multimedia data fusion and multimodal correlation propagation. During the first stage, we re-organize the training multimedia data into Modality semAntic Documents (MADs) after extracting quantized multimodal features, and then use multivariate Gaussian distributions to characterize the continuous quantity by latent topic modeling. Model parameters are asymmetrically learned to initialize multimodal correlations in the latent topic space. Accordingly, during the second stage, we construct a Multimodal Correlation Network (MCN) based on the initialized multimodal correlations, and further propose a new mechanism for propagating inter-modality correlations and intra-modality similarities in the MCN, exploiting the complementarity of cross-modalities to facilitate multimedia content analysis. The experimental results of image-audio data retrieval on a 10-category dataset and content-oriented web page recommendation on the USTODAY dataset show the effectiveness of our method.
Article
For a long time, it was difficult to automatically extract meanings from video shots because, even for a particular meaning, shots are characterized by significantly different visual appearances depending on camera techniques and shooting environments. One promising approach has recently been devised in which a large amount of shots is statistically analyzed to cover the diverse visual appearances of a meaning. Inspired by the significant performance improvement, concept-based video retrieval has received much research attention. Here, concepts are abstracted names of meanings that humans can perceive from shots, like objects, actions, events, and scenes. For each concept, a detector is built in advance by analyzing a large amount of shots. Then, given a query, shots are retrieved based on concept detection results. Since each detector can detect a concept robustly across diverse visual appearances, effective retrieval can be achieved using concept detection results as “intermediate” features. However, despite the recent improvement, it is still difficult to accurately detect every kind of concept. In addition, shots can be taken with arbitrary camera techniques and in arbitrary shooting environments, which unboundedly increases the diversity of visual appearances. Thus, concepts cannot be expected to be detected with 100 % accuracy. This chapter explores how to utilize such uncertain detection results to improve concept-based video retrieval.
Conference Paper
Higher-level semantics for multimedia content is essential to answer questions like ``Give me all presentations of German Physicists of the 20th century''. The tutorial provides an introduction and overview to such semantics and the developments in multimedia metadata. It introduces current advancements for describing media on the web using Linked Open Data and other more expressive semantic technologies. The application of such technologies will be shown at concrete examples.
Article
In this paper, we investigate the issue of amateur production in order to leverage its integration into professional production. We define a conceptual model of the shooting script that represents information about the shooting realization. It enables us to provide the amateur cameraman with prior shooting guidance on an intelligent camcorder. We use image processing algorithms and methods to provide the amateur with real-time shooting feedback. After the shooting, these algorithms produce more accurate descriptions that can be compared to the initial prescription. The comparison is guided by satisfaction rules defined by the professional to sort out non-conforming sequences. Such rules are also used as queries during video shot reviewing. Finally, we discuss our approach in relation to existing works.
Article
One of the challenges in image retrieval is dealing with concepts which have no visual appearance in the images or are not used as keywords in their annotations. To address this problem, this paper proposes an unsupervised concept-based image indexing technique which uses a lexical ontology to extract semantic signatures called ‘semantic chromosomes’ from image annotations. A semantic chromosome is an information structure, which carries the semantic information of an image; it is the semantic signature of an image in a collection expressed through a set of semantic DNA (SDNA), each of them representing a concept. Central to the concept-based indexing technique discussed is the concept disambiguation algorithm developed, which identifies the most relevant ‘semantic DNA’ (SDNA) by measuring the semantic importance of each word/phrase in the annotation. The concept disambiguation algorithm is evaluated using crowdsourcing. The experiments show that the algorithm has better accuracy (79.4%) than the accuracy demonstrated by other unsupervised algorithms (73%) in the 2007 Semeval competition. It is also comparable with the accuracy achieved in the same competition by the supervised algorithms (82–83%) which contrary to the approach proposed in this paper have to be trained with large corpora. The approach is currently applied to the automated generation of mood boards used as an inspirational tool in concept design.
Chapter
In recent years the variety of different multimedia devices, as well as the amount of available multimedia content, has dramatically increased. To serve multimedia consumers according to their needs, personalization aspects are essential. For this, MPEG-7/21 offers a variety of possibilities to describe user preferences, terminal capabilities, and transcoding hints within its ‘Digital Item Adaptation’ part. User preferences include information about the specific user, for instance what movie genre he/she prefers. Terminal capabilities describe the user's client device, e.g. the maximum possible display resolution. In the transcoding hints, the authors of multimedia content can define adaptation constraints regarding, e.g., minimum video size. This chapter investigates the provided user preference model and discusses necessary extensions to provide overarching preference descriptions. Three main approaches are currently discussed in the community: the usage of semantic Web languages and ontologies, XML databases and query languages, or more expressive preference models.
Conference Paper
Manipulation of XML data is becoming increasingly important for multimedia retrieval. Several encouraging attempts at developing methods for searching such data have been proposed. However, efficiency and simplicity are still a barrier to further development. This paper gives a global overview of our multimodal system for retrieving annotated multimedia documents. The system combines four retrieval modalities: first by keywords, second by visual graph, third by Query by Resources, and finally using standardized languages.