An Integrative Semantic Framework for Image Annotation and Retrieval
Taha Osman1, Dhavalkumar Thakker1, Gerald Schaefer2, Phil Lakin3
1 School of Computing & Informatics, Nottingham Trent University, Nottingham, NG11 8NS, UK
{taha.osman, dhavalkumar.thakker}@ntu.ac.uk
2 School of Engineering & Applied Science, Aston University, Aston Triangle, Birmingham B4 7ET, UK
g.schaefer@aston.ac.uk
3 PA Photos, Pavilion House, 16 Castle Boulevard, Nottingham, NG7 1FL, UK
phil.lakin@paphotos.com
Abstract
Most public image retrieval engines utilise free-text
search mechanisms, which often return inaccurate
matches as they in principle rely on statistical analysis
of query keyword recurrence in the image annotation
or surrounding text. In this paper we present a
semantically-enabled image annotation and retrieval
engine that relies on methodically structured
ontologies for image annotation, thus allowing for
more intelligent reasoning about the image content and
subsequently obtaining a more accurate set of results
and a richer set of alternatives matchmaking the
original query. Our semantic retrieval technology is
designed to satisfy the requirements of the commercial
image collections market in terms of both accuracy
and efficiency of the retrieval process. We also present
our efforts in further improving the recall of our
retrieval technology by deploying an efficient query
expansion technique.
1. Introduction
Affordable access to digital technology and
advances in Internet communications have contributed
to the unprecedented growth of digital media
repositories (audio, images, and video). Retrieving
relevant media from these ever-increasing repositories
is an impossible task for the user without the aid of
search tools. Most public image retrieval engines rely
on analysing the text accompanying the image to
matchmake it with the user query. Various
optimisations have been developed, including weighting systems where, for instance, higher regard is given to the proximity of the keyword to the image location, or advanced text analysis techniques that use a term weighting method relying on the proximity between the anchor to an image and each word in an HTML file [1]. Despite the optimisation
efforts, these search techniques remain hampered by
the fact that they rely on free-text search that, while
cost-effective to perform, can return irrelevant results
as it primarily relies on the recurrence of exact words
in the text accompanying the image. The inaccuracy of
the results increases with the complexity of the query.
For instance, while performing this research we found that using the Yahoo™ search engine to look for images of the football player Zico returns some good pictures of the player, mixed with photos of cute dogs (apparently Zico is also a popular name for pet dogs); but adding the action of scoring to the search text seems to completely confuse the search engine, and only one picture of Zico is returned, in which he is standing still!
Any significant contribution to the accuracy of
matchmaking results can be achieved only if the search
engine can “comprehend” the meaning of the data that
describes the stored images, for instance, if the search
engine can understand that scoring is an act associated
with sport activities performed by humans. Semantic
annotation techniques have gained wide popularity in
associating plain data with “structured” concepts that
software programs can reason about [2]. This effort presents a comprehensive semantic-based solution to image annotation and retrieval, and deploys query expansion techniques to improve the recall rate. It specifically targets the commercial image
collections market and acknowledges their
requirements for high-quality recall without sacrificing the performance of the retrieval process.
The paper begins with an overview of the Semantic
web technologies. In section 3 we review the case
study that was the motivation for this work. Sections 4,
5, 6, and 7 detail the implementation roadmap of our
semantic-based retrieval system, i.e. ontology
engineering, annotation, retrieval, and query
expansion. We present our conclusions and plans for
further work in section 8.
2007 IEEE/WIC/ACM International Conference on Web Intelligence
0-7695-3026-5/07 $25.00 © 2007 IEEE
DOI 10.1109/WI.2007.69
366
2. Overview of the semantic web
2.1. Ontologies (domain conceptualisation)
The fundamental premise of the semantic web is to
extend the Web’s current human-oriented interface to a
format that is comprehensible to software programmes.
Naturally this requires a standardised and rich
knowledge representation scheme or Ontology.
One of the most comprehensive definitions of
ontologies is that expressed in [3]: “Ontology is a shared conceptualisation of a domain and typically consists of a comprehensive set of concept classes, relationships between them, and instance information showing how the classes are populated in the application domain.” This comprehensive representation of knowledge from a particular domain allows reasoning software to make sense of domain-related entities (images, documents, services, etc.) and aid in the process of their retrieval and use.
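To make the idea concrete, the class/instance structure described above can be sketched as a small in-memory model. This is an illustrative sketch only; the class names (Footballer, Athlete, Person) are invented and not taken from any published ontology:

```python
# A minimal sketch of an ontology: classes, subclass links, and instances.
# All names here are illustrative, not part of the ontology described in the text.
subclass_of = {
    "Footballer": "Athlete",
    "Athlete": "Person",
}

instance_of = {
    "DavidBeckham": "Footballer",
}

def is_a(instance, cls):
    """Reason over the class hierarchy: does `instance` belong to `cls`?"""
    current = instance_of.get(instance)
    while current is not None:
        if current == cls:
            return True
        current = subclass_of.get(current)
    return False

# A reasoner can now infer that David Beckham is a Person,
# even though that fact was never stated directly.
print(is_a("DavidBeckham", "Person"))  # True
```

This inference over subclass links is what lets a search engine conclude, for example, that a footballer is a person likely to express emotions.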
2.2. Caption-based semantic annotation
Applied to image retrieval, the semantic annotation
of images allows retrieval engines to make more
intelligent decisions about the relevance of the image
to a particular user query, especially for complex
queries. For instance to retrieve images of the football
star David Beckham expressing anger, it is natural to
type the keywords ‘David Beckham angry’ into the
Google™ Image Search engine. However, at the time of the experiment, the search engine returned 14 images of David Beckham, and he looks upset in only two of them. The other retrieved images were completely irrelevant, with one of them displaying an angry moose!
The use of Semantic technologies can significantly
improve the computer’s understanding of the image
objects and their interactions by providing a machine-
understandable conceptualisation of the various
domains that the image represents. This conceptualisation integrates concepts and inter-entity relations from different domains, such as Sport, People and Emotions in relation to the query above [4], thus allowing the search engine to infer that David Beckham is a person and thus likely to express emotions, and that he is also an English footballer playing for Real Madrid FC.
2.3. Content-based semantic annotation
The success of caption-based semantic image
retrieval largely depends on the quality of the semantic
caption (annotation) itself. However, the caption is not
always available largely because the annotation is a
labour intensive process. In such situations, image
recognition techniques are applied, which is better
known as content-based retrieval. However, the best
content-based techniques deliver only partial success
as image recognition is an extremely complex problem
[5], especially in the absence of accompanying text that can aid in inferring the relationships between the recognized objects in the image. Moreover, from a
query composition point of view, it is much easier to
use a textual interface rather than a visual interface (by
providing sample training image or sketch) [6].
3. Case study for semantic-based image
retrieval
An opportunity to experiment with our research
findings in semantic-based search technology was
gratefully provided by PA Photos™. PA Photos is a
Nottingham-based company which is part of the Press
Association Photo Group Company [7]. As well as
owning a huge image database of over 4 million annotated images dating back to the early 1900s,
the company processes a colossal amount of images
each day from varying events ranging from sport to
politics and entertainment. The company also receives
annotated images from a number of partners that rely
on a different photo indexing schema.
More significantly, initial investigation has shown that the accuracy of the result sets matching user queries does not measure up to the rich repository of photos in the company’s library.
The goal of the case study is two-fold. Initially, we
intend to investigate the use of semantic technology to
build a classification and indexing system that
critically unifies the annotation infrastructure for all the
sources of the incoming stream of photos. Subsequently, we will conduct a feasibility study aiming to improve the end-user experience of the company’s image search engine. At the moment the PA Photos search engine relies on free-text search to return a set of images matching user requests; the returned results can therefore go off on a tangent if the search keywords do not exactly recur in the photo annotations. A significant
improvement can result from semantically enabling the
photo search engine. Semantic-based image search will
ultimately enable the search engine software to
understand the “concept” or “meaning” of the user
request and hence return more accurate results
(images) and a richer set of alternatives.
It is important here to comment about the dynamics
of the retrieval process for this case study as it
represents an important and wide-spread class of
application areas where there is a commercial
opportunity for exploiting semantic technologies:
1. The images in the repository have not been
extracted from the web. Consequently the
extensive research into using the surrounding
text and information in the HTML document to improve the quality of the annotation, such as in [2] [6], is irrelevant.
2. A significant sector of this market relies on fast
relay of images to customers. Consequently this
confines advanced but time-consuming image
analysis techniques [5] to off-line aid with the
annotation of caption-poor images.
3. The usually colossal amount of legacy images annotated to a particular (non-semantic) schema
necessitates the integration of these
heterogeneous schemas into any new,
semantically-enabled and more comprehensive
ontologies.
4. Ontology development
4.1. Domain Analysis
Our domain analysis started from an advanced
point as we had access to the photo agency’s current
classification system. Hence, we adopted a top-down
approach to ontology construction that starts by
integrating the existing classification with published
evidence of more inclusive public taxonomies [8]. At
the upper level, two ontological trees were identified;
the first captures knowledge about the event (objects
and their relationships) in the image, and the second is
a simple upper class that characterises the image
attributes (frame, size, creation date, etc.), which is
extensible in view of future utilisation of content-
recognition techniques.
Building knowledge-management systems using
ontologies and reasoning engines is a more
cumbersome task than the traditional database-based
approach. Hence, it is wise to be prudent with the scale
of semantic-based projects until feasibility of the
semantic approach is ascertained, particularly in
commercial contexts, where emphasis is on
deliverables rather than the methodology. At the initial
stages of the research, we made the following
decisions:
1. To limit our domain of investigation to sport-
related images
2. Address the sports participants “action” and
“emotion” in our ontology to demonstrate the
advantage of using semantics in expressing
relationships between objects in the image.
3. Defer research into content-based methods,
which mainly targets aid in annotating legacy
images, until the feasibility of caption-based
semantic retrieval proves successful.
A bottom-up approach was used to populate the
lower tiers of the ontology class structure by
examining the free-text and non-semantic caption
accompanying a sample set of sport images. Domain
terms were acquired from approximately 65k image
captions. The terms were purged of redundancies and
verified against publicly available related taxonomies
such as the media classification taxonomy detailed in
[8]. An added benefit of this approach is that it allows
existing annotations to be seamlessly parsed and
integrated into the semantic annotation.
Wherever advantageous, we integrated external
ontologies (e.g., [9]) into our knowledge
representation. However, bearing in mind the
responsiveness requirements of on-line retrieval
applications, we applied caching methods to localise
the access in order to reduce its time overhead.
Figure 1 Subset of the ontology tree
4.2. Consistency Checking
Unlike database structures, ontologies represent knowledge rather than data; hence any structural problems will have a detrimental effect on the corresponding reasoning agents, especially as ontologies are open and distributed by nature, which might cause wide-spread propagation of any inconsistencies [10]. For
instance, in traditional structuring methodologies,
usually the part-of relationship is followed to express
relationships between interdependent concepts. So, for
players that are part-of a team performing in a
particular event, the following is a commonly taken
approach:
Figure 2 Traditional part-of relationships
[Figure 2 depicts the traditional part-of design: Player (FirstName, LastName, hasNationality, hasTeam), Team (Name, hasNationality, hasTournament) and Tournament (Name). Figure 1 depicts the ontology tree rooted at Image Collection, spanning the Sports Domain branch (Event, Sport, Federation, Team, Stadium, Person, Player, Manager, Action, Feeling, Human Characteristic) and the Image Attributes branch (Size, Contrast, Format).]
However logical the above description appears at
first sight, further analysis reveals inconsistency
problems. When a player plays for two different teams at the same time (e.g. his club and his national team) or changes clubs every year, it is almost impossible to determine which team the player plays for. Hence, the
order of definition (relationship direction) should
always be the reversal sequence of the part-of
relationship as redesigned below:
Figure 3 Re-organization of the player classification
4.3. Coverage
Although consistent, the structural solution in
Figure 3 is incomplete as players’ membership is
temporal. The same problem occurs with tournaments, as from one year to another the teams taking part in the tournament change. This problem can be solved by
adding a start and end date for the tournament (see
Figure 4), rather than by engineering more complex
object property solutions. Hence, as far as the semantic
reasoner is concerned, the “FIFA World Cup 2004” is
a different instance from “FIFA World Cup 2008”. The
same reasoning can be applied to the class team, as
players can change team every season. These
considerations, although basic for a human reasoning,
need to be explicitly defined in the ontology.
Figure 4 Resolving Coverage problems in ontology
4.4. Normalisation: reducing the redundancy
The objective of normalisation is to reduce
redundancy. In ontology design, redundancy is often caused by temporal characteristics that can generate redundant information and negatively affect the performance of the reasoning process.
Direct adoption of the ontology description in Figure 4 above will result in creating a new team each season, which is rather inefficient, as the team should be a non-temporal class regardless of the varying player membership or tournament participation every season. Hence, Arsenal or Glasgow Rangers football clubs need to remain abstract entities. Our approach was to introduce an intermediary temporal membership concept that serves as an indispensable link between teams and players, as well as between teams and tournaments, as illustrated in Figure 5 below.
The temporal instances from the Membership class
link instances from two perpetual classes as follows:
memberEntity links to a person (Player, Manager,
Supporter, Photographer, etc.)
isMemberOf refers to the organisation (Club, Press
Association, Company, etc.)
fromPeriod and toPeriod depict membership
temporal properties
Figure 5 Membership class in the final ontology
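A minimal sketch of the Membership concept might look as follows; periods are simplified to years, and the sample career spans are used purely for illustration:

```python
from dataclasses import dataclass

# Sketch of the temporal Membership concept: Player and Club stay perpetual,
# while time-bounded Membership instances link them.
@dataclass(frozen=True)
class Membership:
    member_entity: str   # memberEntity: a Player, Manager, Supporter, etc.
    is_member_of: str    # isMemberOf: a Club, Association, Company, etc.
    from_period: int     # fromPeriod (simplified to a year)
    to_period: int       # toPeriod

memberships = [
    Membership("DavidBeckham", "ManchesterUnited", 1992, 2003),
    Membership("DavidBeckham", "RealMadrid", 2003, 2007),
]

def club_in(player, year):
    """Resolve which club a player belonged to in a given year."""
    for m in memberships:
        if m.member_entity == player and m.from_period <= year < m.to_period:
            return m.is_member_of
    return None

print(club_in("DavidBeckham", 2005))  # RealMadrid
```

Because membership, not the club, carries the temporal properties, the same perpetual Player and Club instances can be reused across seasons.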
5. Image Annotation
The Protégé® ontology editor was utilised to construct the sport domain ontology. Protégé uses
frame-based knowledge representation [11] and adopts
OWL as the ontology language. The Web Ontology Language (OWL) [12] has become the de-facto standard for expressing ontologies; it adds extensive vocabulary to describe properties and classes and to express relations between them (such as disjointness), cardinality (for example, "exactly one"), equality, richer typing of properties, and characteristics of properties (such as symmetry). The Jena [13] Java API
was used to build the annotation portal to the
constructed ontology.
The central component of the annotation is the set of images stored (as OWL descriptions) in the image library, as illustrated in Figure 6. Each image comprises an object whose main features are stored within an independent object library. Similarly, the object characteristics, event location, etc. are kept distinct from the image library. This highly modular annotation model facilitates the reuse of semantic information and reduces redundancy.
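The modular library structure can be sketched as follows. The identifiers (o1, oc1, l1) mirror the naming style of Figure 6, but the records themselves are invented for illustration:

```python
# Sketch of the modular annotation model: an image entry references entries
# in separate object / characteristic / location libraries by identifier,
# so shared semantic information is stored once and reused.
object_library = {
    "o1": {"class": "Person", "name": "David Beckham"},
}
characteristic_library = {
    "oc1": {"object": "o1", "characteristic": "angry"},
}
location_library = {
    "l1": {"city": "Nottingham", "country": "England"},
}
image_library = {
    "img1": {"object": "o1", "object_characteristic": "oc1",
             "location": "l1", "date": "2007-06-01"},
}

def describe(image_id):
    """Assemble a full description by following the cross-library links."""
    img = image_library[image_id]
    obj = object_library[img["object"]]
    char = characteristic_library[img["object_characteristic"]]
    loc = location_library[img["location"]]
    return f'{obj["name"]} ({char["characteristic"]}) in {loc["city"]}'

print(describe("img1"))  # David Beckham (angry) in Nottingham
```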
[Figure 5 depicts the temporal Membership class (membershipEntity, isMemberOf, fromPeriod) linking Player instances (Thierry Henry, David Beckham, Ronaldinho) to Club instances (Arsenal FC, Real Madrid, Barcelona). Figures 3 and 4 depict the redesigned Player (FirstName, LastName, hasNationality, isPlayerOf), Team (Name, Season, hasNationality, hasPlayer, isTeamOf) and Tournament (Name, hasStartDate, hasEndDate, hasTeam) classes.]
Figure 6 Architecture of the annotation
Taking into account the dynamic, motion-oriented nature of the sport domain, our research concluded that a variation of the sentence structure suggested in [14] is best suited to our annotation template. We opted
for an “Actor – Action – Object” structure that will
allow the natural annotation of motion or emotion-type
relationships without the need to involve NLP
techniques [15]. For instance, “Beckham – Smiles –
null”, or “Gerrard – Tackles – Henry”. An added
benefit of the structure is that it simplifies the task of
the reasoner in matching actor and action annotations
with entities that have similar characteristics.
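A minimal sketch of the “Actor – Action – Object” template, using the example annotations above:

```python
from typing import NamedTuple, Optional

# Sketch of the "Actor - Action - Object" annotation sentence.
# The object is optional, as in "Beckham - Smiles - null".
class Annotation(NamedTuple):
    actor: str
    action: str
    obj: Optional[str] = None

annotations = [
    Annotation("Beckham", "Smiles"),
    Annotation("Gerrard", "Tackles", "Henry"),
]

def find(actor=None, action=None, obj=None):
    """Return annotations whose set fields match the given query terms."""
    return [a for a in annotations
            if (actor is None or a.actor == actor)
            and (action is None or a.action == action)
            and (obj is None or a.obj == obj)]

print(find(action="Tackles"))  # [Annotation(actor='Gerrard', action='Tackles', obj='Henry')]
```

In the full system the matching is of course semantic rather than exact-string, with the reasoner comparing actor and action annotations against similar concepts.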
6. Image Retrieval
The image retrieval user interface is illustrated in
Figure 7. The search query can include sentence-based
relational terms (Actor-Emotion/Action-Object) and/or
key domain terms (such as tournament and team). If multiple terms are selected for the query, the user needs to specify which term represents the main search preference (criterion).
Figure 7 Snapshot of the retrieval interface
For instance, in Figure 7 the relational term
(Gerrard Tackles Rooney) is the primary search term
and team Liverpool is the secondary search term. The
preference setting is used to improve the ranking of
retrieved images.
Figure 8 gives a high level view of the annotation
and retrieval mechanism. The semantic description
generator allows the annotator to transparently annotate
new images and also transforms the user query into
OWL format. The semantic reasoning engine applies our matchmaking algorithm in two phases: the first
retrieves images with annotations matching all
concepts in the query; in the second phase further
matchmaking is performed to improve the ranking of
the retrieved images in response to user preferences.
Figure 8 Schematic diagram of the Semantic Web
Image Retrieval software
Our reasoning engine uses a variation of the nearest
neighbour matchmaking algorithm [16] to serve both
the semantic retrieval and the ranking phases. Our algorithm continues traversing from the matched instances up through the class hierarchy, matching instances until there are no super classes left, i.e. the root of the tree is reached, giving a degree of match equal to 0. The degree of match (DoM) is calculated according to the following equation:

DoM = MN / GN (Equation 1)

where MN is the total number of matching nodes in the selected traversal path, and GN is the total number of nodes in that path. This is exemplified in Figure 9. The comparison values
[Figure 6 depicts the modular annotation architecture: the Image Library (Image#1 referencing Object#o1, ObjectCharac#oc1, Location#l1 and a date) cross-references the Object Library (Object#o1: class=person, name, size, date of creation), the Object Characteristic Library (ObjectCharac#oc1: object=o1, characteristic=angry) and the Location Library (Location#l1: City#city1, Country#country1). Figure 8 depicts the retrieval architecture: user and admin requests pass through the semantic description generator (new annotations and OWL queries) into the indexed annotation library, and the reasoning engine performs semantic-based retrieval of matching annotations followed by preference-based ranking to produce the final image set.]
are weighted using the user preferences according to
the formula [16]:
m = |lr − la|; p ∈ [0,1]; v = p^m (Equation 2)

where v is the value assigned to the comparison, m the matching level of the individuals, p the user preference setting, lr the level of the request, and la the level of the annotation.
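Assuming the reading v = p^m for Equation 2 (the preference raised to the level difference), the weighting can be sketched as follows; with p = 0.5, each level of mismatch halves the comparison value, which matches the level scores shown in Figure 9:

```python
# Sketch of the preference weighting of Equation 2, under our reading
# v = p**m where m = |lr - la| is the level difference between the
# requested and annotated concepts and p is the user preference in [0, 1].
def weighted_value(level_request, level_annotation, preference):
    m = abs(level_request - level_annotation)
    return preference ** m

# An exact-level match keeps the full weight ...
print(weighted_value(0, 0, 0.5))  # 1.0
# ... while each level of mismatch halves it when p = 0.5.
print(weighted_value(0, 1, 0.5))  # 0.5
print(weighted_value(0, 2, 0.5))  # 0.25
```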
For example, if the query is Object–
hasCharacteristic-happy, and image1 and image2 are
annotated with Object-hasCharacteristic-happy and
Object-hasCharacteristic-smile respectively, the DoM
for image1 is 1, as the instances match at the level of the leaf node (Figure 9). For image2, however, the instances match at the level of the Positive Feeling–Mild class, one level above the leaf node, giving DoM = 0.5.
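The degree-of-match computation can be sketched over a deliberately tiny, invented hierarchy in which happy and smile are siblings; an exact match then yields DoM 1 and a sibling match 0.5, in line with the worked example above:

```python
# Equation 1 sketch: DoM = MN / GN over the traversal path from the queried
# concept up to the root. The two-level hierarchy below is invented purely
# for illustration; happy and smile are siblings under Characteristic.
parent = {
    "happy": "Characteristic",
    "smile": "Characteristic",
}

def path_to_root(concept):
    """Collect the node path from a concept up to the hierarchy root."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def degree_of_match(requested, annotated):
    """MN (shared path nodes) divided by GN (all nodes on the query path)."""
    goal = path_to_root(requested)
    annotated_nodes = set(path_to_root(annotated))
    mn = sum(1 for node in goal if node in annotated_nodes)
    return mn / len(goal)

print(degree_of_match("happy", "happy"))  # 1.0
print(degree_of_match("happy", "smile"))  # 0.5
```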
Figure 9 Traversing the Ontology Tree [the feeling hierarchy: leaves such as happy, smile and proud at Level 0 score 1; the Mild/Moderate/Strong/Intense intensity classes at Level 1 score 0.5; Positive/Negative Feeling at Level 2 scores 0.25; Feeling at Level 3 scores 0.125; and the Characteristic root at Level 4 scores 0]
7. Semantic Web-based Query Expansion to achieve better precision and recall
Lately, query expansion (QE) techniques have gained much attention as a means of improving the recall of document and media queries. QE methods fit
naturally into our image retrieval technology as we rely
on computing the aggregate degree of match (ADoM)
for the semantic relations describing a particular image
to determine its match to the original query. Hence, we can easily determine the quality of the returned results in terms of accuracy and volume, and decide whether to apply QE techniques to replace or expand the query concepts to improve the quality of the recall. This is
particularly feasible for semantic-based knowledge bases, as they provide the language expressiveness for specifying the similarity of concepts (implicit and explicit) at different granularities.
Query expansion techniques can be broadly
classified into two categories: the first category uses
statistical and probabilistic methods [17] to extract
frequently occurring terms from successfully recalled
documents and image annotations. These terms are
then used to expand the keyword set of similar future
queries. The main shortcoming of statistics-based QE techniques is that they are only as good as the statistics they rely on, and they share the disadvantages of free-text-based search engines in that they lack structure and are difficult to generalise or reuse for other domains.
The second category [18] utilises lexical databases to
expand user queries. A lexical database similar to
WordNet [19] is employed, in which language nouns,
verbs, adjectives and adverbs are organized into
synonym sets that can potentially replace or expand the
original query concepts. However, lexical databases lack the semantic conceptualisation necessary to interrelate concepts in complex queries and render them comprehensible to search engines.
A semantic relations-based QE technique expands the query with related concepts rather than simple terms. Next we discuss the semantic-based QE algorithm we designed to extend our image retrieval technology.
Step 1: If the query has concept Cp as the primary search concept and Cs as the secondary search concept provided by the searcher, then we define query expansion on Cp as follows. Let Cp′ be an alternative concept, δ the distance between the Cp and Cp′ concepts, and Ψ the expected distance between two concepts that implies they are related; the expansion function is:
f(Cp) = { (Cp′_i, δ_i, Ψ_i) : δ_i ≤ Ψ_i, i = 1, …, n } (Equation 3)
The equation implies that the concepts Cp′_i are related to Cp if they are at an acceptable distance from Cp.
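The expansion function of Equation 3 can be sketched as a simple filter; the candidate concepts and distances below are invented for illustration:

```python
# Sketch of the query-expansion function (Equation 3): candidate concepts
# Cp' are kept as related to Cp when their distance d_i stays within the
# expected distance psi_i. Distances here are invented ontology-tree hops.
def expand(candidates):
    """candidates: list of (concept, distance, expected_distance) triples."""
    return [c for c, d, psi in candidates if d <= psi]

candidates = [
    ("TeamBrazil", 1, 2),     # sibling national team: close enough
    ("TeamChelsea", 2, 2),    # club team: at the limit, still accepted
    ("StadiumWembley", 4, 2), # unrelated branch: rejected
]
print(expand(candidates))  # ['TeamBrazil', 'TeamChelsea']
```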
7.1. Formalizing relatedness between two
concepts
A major concern in QE techniques is the
formalization of relatedness between two concepts in
order to select an optimal set of alternatives.
For the benefit of the discussion, we feel it is
necessary to revisit the following components of
Semantic web formalism and their representation in the
OWL ontology language:
Taxonomy Relationships (TR): Taxonomy is the concept classification system facilitated by the Semantic Web. Class and Individual are the two main elements
of this structure, where a class is simply a name and a collection of properties that describe a set of individuals. Examples of relationships between
concepts at the taxonomy level are class, subclass,
superclass, equivalent class, individual, sameAs,
oneOf, disjointWith, differentFrom, AllDifferent.
Rule-based Relationships (RR): The Semantic Web Rule Language (SWRL) defines rule-based semantics using a subset of OWL together with sublanguages of the Rule Markup Language; it extends OWL with Horn-like first-order logic rules to increase the expressivity of the language.
We use this relationship formalism to identify
explicit and implicit relatedness of concepts. To
evaluate implicit relationships we use subsumption and
classification to perform semantic tree traversal and
compare the concepts with respect to the semantic
network tree as detailed in our image retrieval
algorithm earlier. In contrast, an explicit relationship between two concepts always has a Degree of Match (DoM) of 0 or 1, as it explicitly equates or distinguishes two individuals. For example, owl:sameAs equates two individuals to unify two distinct ontology elements, while owl:differentFrom has the exact opposite effect, making the individuals mutually distinct.
If the taxonomy- and rule-based implicit and explicit relationships result in n equivalent concepts represented by {C1, C2, C3, …, Cn}, or Cp′, then to calculate the DoM for these likely replacement concepts we employ another Semantic Web relationship formalism, which we will refer to as property-based relationships.
Property Relationships (PR): Properties can be
used to state relationships between individuals or from
individuals to data values. These relationships are
achieved through data-type or object-type properties (e.g., hasTeam, hasTournament, isMemberOf).
Step 2: Assuming the query preference concept Cp has properties R_i with value instances I_i^R, and the annotation matching the alternative concept Cp′ has properties R′_i with value instances I_i^R′, then we can compare I_i^R and I_i^R′ semantically using Equation 2.
7.2. Illustrative example
In this section we illustrate how our QE algorithm works by discussing the following case. If a user is searching for pictures of the England Team, possibly in the 2006 FIFA World Cup tournament, the system treats England Team as the user’s primary search criterion and the 2006 FIFA World Cup tournament as the secondary search criterion in the query.
Without expanding the query, the retrieval
algorithm returns zero results if there are no images
annotated with Team England (Table 1). The following section explains the process of expanding the query under these circumstances using our algorithm.
Cp property (Ri)             I_i^R (property value)
hasNationality               Country (England)
hasSport                     Sport (Football)
isWinnerOf                   Tournament (Fifawc66)
hasNationalTeamTournament    Fifawc66, 70, …
Table 1 Preference Concept (England Team, Cp)
In our sports domain ontology, an implicit subsumption relationship is applied to find relevant primary concepts. For instance, to find alternative terms for Team England, the reasoner first retrieves siblings of the National Team, such as Team Brazil and Team Spain, and then less adjacent siblings among the Team instances, such as Team Chelsea and Team Barcelona. In the following step we compare the relationships as defined in Step 2, as illustrated in Table 2 below:
Property                    Query             Team Brazil       DoM   Team Chelsea      DoM
hasNationality              England           Brazil            0     England           1
hasSport                    Football          Football          1     Football          1
isWinnerOf                  Fifawc 06         Fifawc 70         0.5   Prem. 06          0
hasNationalTeamTournament   Fifawc 66, 70, …  Fifawc 66, 70, …  1     Prem. 93, 94, …   0
Aggregate DoM                                 Brazil            2.5   Chelsea           2
Table 2 Comparing relationships
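The per-property comparison of Step 2 and its aggregation into an overall score can be sketched as follows, using the scores from Table 2 (exact matches score 1, partial matches 0.5, mismatches 0):

```python
# Sketch of Step 2: per-property degrees of match are aggregated into an
# overall score for each candidate concept, which then drives the ranking.
def aggregate_dom(per_property_scores):
    return sum(per_property_scores.values())

team_brazil = {"hasNationality": 0, "hasSport": 1,
               "isWinnerOf": 0.5, "hasNationalTeamTournament": 1}
team_chelsea = {"hasNationality": 1, "hasSport": 1,
                "isWinnerOf": 0, "hasNationalTeamTournament": 0}

ranking = sorted(
    [("TeamBrazil", aggregate_dom(team_brazil)),
     ("TeamChelsea", aggregate_dom(team_chelsea))],
    key=lambda pair: pair[1], reverse=True)
print(ranking)  # [('TeamBrazil', 2.5), ('TeamChelsea', 2)]
```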
Step 3: If the ranked images from stage 2 are {X1, X2, X3, …} and Cs is the secondary search term provided by the searcher, and these ranked images have a concept Cs^X present in their annotation, then repeat Step 2 with Cp = Cs and Cp′ = Cs^X.
In our image database this results in the images retrieved in the first stage being associated with the relevant concepts, namely: Image 1 (Team Brazil in the 2006 FIFA World Cup) and Image 2 (Chelsea in the 2007 Premiership).
Property        Query       Image 1       DoM   Image 2    DoM
hasTournament   Fifawc 06   Fifawc 06     1     Prem. 07   0
Aggregate DoM               Team Brazil   2.5   Chelsea    2
Table 3 Analysing secondary terms in the query
8. Conclusions
In this paper we presented a comprehensive
solution for image retrieval applications that takes full
advantage of advances in semantic web technologies to
coherently implement the annotation, retrieval and
query expansion components of the integrative
framework. We claim that our solution is particularly
attractive to commercial image providers where
emphasis is on the efficiency of the retrieval process as
much as on improving the accuracy and volume of
returned results. For instance, we shied away from employing expensive content-based recognition techniques at the retrieval stage, deployed public ontology caching to reduce the reasoning overhead, and designed an efficient query expansion algorithm to improve the quality of the image recall.
The first stage of the development was producing
ontologies that conceptualise the objects and their
relations in the selected domain. We methodically
verified the consistency of our ontology, optimised its coverage, and applied normalisation methods to rid it of concept redundancies. Our annotation approach was
based on a variation of the “sentence” structure to
obtain the semantic-relational capacity for
conceptualising the dynamic motion nature of the
targeted sport domain.
The retrieval algorithm is based on a variation of
the nearest-neighbour search technique for traversing
the ontology tree and can accommodate complex,
relationship-driven user queries. The algorithm also
provides for user-defined weightings to improve the
ranking of the returned images and was extended to
embrace query expansion technology in a bid to
improve the quality of the recall.
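The role of user-defined weightings in the ranking can be sketched as follows. This is a minimal illustration under our own assumptions (the function names, the 0–1 score scale and the default weight of 1.0 are ours, not the paper's implementation): each query term contributes its degree-of-match score scaled by the searcher's weighting for that term.

```python
# Illustrative sketch (not the paper's implementation) of folding
# user-defined weightings into the ranking of returned images.

def weighted_score(term_scores, weights):
    # term_scores: query term -> DoM for one image; terms the searcher
    # did not weight explicitly count with a default weight of 1.0.
    return sum(dom * weights.get(term, 1.0)
               for term, dom in term_scores.items())

def rank_images(per_image_scores, weights):
    # per_image_scores: image -> {query term: DoM}
    return sorted(per_image_scores,
                  key=lambda im: weighted_score(per_image_scores[im], weights),
                  reverse=True)

# Example: the searcher doubles the importance of the tournament term.
scores = {"Image1": {"hasTournament": 1.0, "team": 1.0},
          "Image2": {"hasTournament": 0.0, "team": 0.4}}
ranking = rank_images(scores, {"hasTournament": 2.0})
```

Here Image 1 scores 1.0 × 2.0 + 1.0 = 3.0 against Image 2's 0.4, so the tournament weighting reinforces its lead.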
Although we recognise that image analysis techniques might impose a large time overhead on the on-line retrieval process, we intend to investigate utilising advances in semantically-enabled content recognition technology to aid in semi-automating the annotation of legacy, caption-poor images.