A SURVEY OF APPLICATIONS AND RESEARCHES ON SCHEMA MATCHING BETWEEN GIS SPATIAL DATA
WANG Yu-hong (a), ZHANG He-bing (a), XU Jun (b)
(a) School of Surveying and Land Information Engineering, Henan Polytechnic University, Jiaozuo, China - wangyh@hpu.edu.cn
(b) School of Economics and Management, Henan Polytechnic University, Jiaozuo, China - xujun@hpu.edu.cn
KEY WORDS: Implementation Approach; Efficiency Optimization; Result Representation; Capability Evaluation
ABSTRACT:
As a fundamental problem in data management and application technology, schema matching has attracted wide attention from researchers worldwide in recent years. To deepen the understanding of schema matching between spatial data and to identify its uses, this paper first uses a literature review to summarize the foundational and guiding role of schema matching in typical applications such as spatial data integration (both schema-level and instance-level integration), updating information propagation, semantic query processing and geo web service finding. Then, because schema matching is still performed manually in most systems, previous work on schema matching is discussed from four aspects relevant to designing automated approaches and systems: matching implementation approaches, matching efficiency optimization, matching result representation and matching capability evaluation. The related theories, models, approaches, limitations and new trends of current research on schema matching are analyzed in turn. These analyses lead to the conclusion that schema matching research still faces many theoretical and technological problems, and that matching between spatial data schemas will be even more difficult and therefore needs further study, because such schemas are more heterogeneous, larger and structurally more complex than the schemas of common data.
1. INTRODUCTION
With the increasing maturity and wide popularization of GIS science and technology, the volume of GIS spatial data is growing rapidly. To take full advantage of the data already collected, reduce the cost of system development and promote comprehensive analysis and application, the sharing and interoperation of spatial data have always been a central concern in GIS research. Spatial data sharing and interoperation raise numerous theoretical and technological problems, such as data schema integration (or merging), data instance integration, updating information propagation, semantic query processing and geo web service finding. Although these problems differ in their concrete solutions, they share a common key link: schema matching.
Schema matching is the process of finding semantically identical or related elements in two or more data schemas based on various kinds of auxiliary decision-making information, and specifying the actual mapping relationships among them according to the application requirements. For example, from the partial schemas of two GIS databases shown in the left part of Figure 1, schema matching can find the related elements at different levels and specify the mapping relationships shown in the right part of Figure 1.
To deepen the understanding of schema matching between spatial data and to provide a theoretical basis and technical reference for developing efficient and practical schema matching systems, this paper first summarizes the typical applications of schema matching and then discusses the related contents, principles, models and approaches of current research.
Figure 1. Diagram of Schema Matching
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-7/W4, 2015
2015 International Workshop on Image and Data Fusion, 21 – 23 July 2015, Kona, Hawaii, USA
This contribution has been peer-reviewed.
doi:10.5194/isprsarchives-XL-7-W4-175-2015
2. MAIN APPLICATIONS OF SCHEMA MATCHING
To show the importance of schema matching, the following subsections briefly summarize its applications in GIS domains.
2.1 Spatial Data Schema Integration
Most work on schema matching has been motivated by schema integration, the process of constructing a global schema from a group of independently developed schemas. Because of differences in application fields, development habits and preferences, the schemas to be integrated may differ in logical structure and form of expression, even when they describe the same phenomena or things. Therefore, the first step of schema integration is to identify the related elements and specify the mappings between them through schema matching. Only on the basis of these mappings can operations such as merging, redundancy elimination and reorganization be performed on the local schemas to produce a comprehensive global schema (Wang, et al., 2007; Volz, et al., 2008).
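As a rough illustration of how mappings drive the merge step, the Python sketch below assumes that each local schema is a dict from class name to a set of attribute names and that matching has already produced class and attribute correspondences. The helper `merge_schemas` and the sample schemas are hypothetical; a real integration step must also resolve type, unit and structural conflicts.

```python
from typing import Dict, Set, Tuple

Schema = Dict[str, Set[str]]                          # class name -> attribute names
AttrMatch = Tuple[Tuple[str, str], Tuple[str, str]]   # ((class_a, attr_a), (class_b, attr_b))


def merge_schemas(a: Schema, b: Schema, class_map: Dict[str, str],
                  attr_matches: Set[AttrMatch]) -> Schema:
    """Merge schema b into schema a using matching results (hypothetical helper).

    class_map maps a class of b to its matched class of a. Attributes of b that
    already match an attribute of a are dropped as redundant; the rest are kept.
    """
    merged: Schema = {cls: set(attrs) for cls, attrs in a.items()}  # copy; a is not mutated
    covered = {b_side for _, b_side in attr_matches}                # matched (class_b, attr_b)
    for cls_b, attrs_b in b.items():
        target = class_map.get(cls_b, cls_b)  # unmatched classes keep their own name
        merged.setdefault(target, set())
        for attr in attrs_b:
            if (cls_b, attr) not in covered:
                merged[target].add(attr)
    return merged


road_db = {"Road": {"name", "width"}}
street_db = {"Street": {"street_name", "lanes"}, "River": {"name"}}
global_schema = merge_schemas(
    road_db, street_db,
    class_map={"Street": "Road"},
    attr_matches={(("Road", "name"), ("Street", "street_name"))},
)
# "lanes" is added to Road; "street_name" is folded into the existing "name".
print(global_schema)
```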
2.2 Spatial Data Instance Integration
Data instance integration combines the actual data records from various sources into a coherent whole so that they can be accessed and used transparently and seamlessly. Its core is building schema mapping relationships through schema matching (Liu, et al., 2006). According to these schema mappings, the required instances in the data sources are filtered, extracted, transformed, fused and cleaned, and then loaded into the target database or exposed through a uniform retrieval interface that hides the differences in instance expression among data sources (Li, et al., 2012).
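The filter and transform steps can be sketched in Python as below, assuming the schema mapping has already been reduced to a dict from each source attribute to a target attribute plus a value-conversion function. The attribute names (`RD_NAME`, `WIDTH_FT`, `OBJ_ID`) and the feet-to-metres conversion are invented for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical attribute-level mapping: source attribute -> (target attribute, converter).
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]


def transform_records(records: List[Dict[str, Any]],
                      mapping: Mapping) -> List[Dict[str, Any]]:
    """Rewrite source records into the target schema according to `mapping`.

    Source attributes without a mapping are filtered out; mapped values are
    passed through their converter before being stored under the target name.
    """
    result = []
    for rec in records:
        target_rec = {}
        for src_attr, value in rec.items():
            if src_attr in mapping:
                tgt_attr, convert = mapping[src_attr]
                target_rec[tgt_attr] = convert(value)
        result.append(target_rec)
    return result


def feet_to_metres(ft: float) -> float:
    return round(ft * 0.3048, 2)


mapping: Mapping = {
    "RD_NAME": ("name", str.strip),            # trim stray whitespace
    "WIDTH_FT": ("width_m", feet_to_metres),   # unify length units
}
rows = transform_records([{"RD_NAME": " Main St ", "WIDTH_FT": 20, "OBJ_ID": 7}], mapping)
print(rows)  # [{'name': 'Main St', 'width_m': 6.1}]
```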
2.3 Updating Information Propagation
Updating information propagation is the process of using the updates and change information of spatial entities (or features) in a newly updated spatial database to revise, refine and correct the content of other databases built from an original copy of it, so that they also have an up-to-date representation of the real world. One of its basic requirements is that the updated databases remain as autonomous, complete, correct and consistent as before. To meet this requirement, researchers have proposed various operations, such as schema matching, change detection, entity identification and updates integration (Laurent, 1998; Arnaud, et al., 2004; Wang, et al., 2010). The most important of these is schema matching, because it is the basis for the other operations and its results can be used to guide and facilitate them.
2.4 Semantic Query Processing
At present, most spatial data queries are based on keyword matching. If the keywords entered by users are not identical to the names of the schema elements of the queried data, irrelevant or useless results will be returned. To overcome this defect, semantic query has been proposed (Wang, et al., 2007). Semantic query, also called semantic retrieval or concept matching, transforms the keywords in query statements into the corresponding schema elements through matching operations between query statements and schema elements, so that accurate results are returned.
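A minimal sketch of the keyword-rewriting idea, assuming the matches between query terms and schema elements have already been found and stored in a lookup table. The table `TERM_TO_ELEMENT` and the element names are hypothetical; the cited work (Wang, et al., 2007) bases semantic query on an ontology rather than on a hand-written dict.

```python
from typing import Dict, List, Set

# Hypothetical results of matching query terms against schema elements.
TERM_TO_ELEMENT: Dict[str, str] = {
    "street": "Road",
    "highway": "Road",
    "stream": "River",
    "lake": "WaterBody",
}


def rewrite_query(keywords: List[str], schema_elements: Set[str]) -> List[str]:
    """Replace each query keyword with the schema element it matches.

    Keywords that already name a schema element are kept unchanged;
    keywords with no match are dropped instead of being searched literally.
    """
    rewritten = []
    for kw in keywords:
        if kw in schema_elements:
            rewritten.append(kw)
        elif kw.lower() in TERM_TO_ELEMENT:
            rewritten.append(TERM_TO_ELEMENT[kw.lower()])
    return rewritten


elements = {"Road", "River", "WaterBody"}
print(rewrite_query(["Highway", "stream", "bridge"], elements))  # ['Road', 'River']
```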
2.5 Geo Web Services Finding
Geo web services are Internet applications that supply basic geographical operations, such as address matching, map drawing and route planning, to developers and allow them to integrate spatial data and the related functions into their own web applications without implementing them themselves (Bernard, et al., 2003). As more and more geo web services appear, finding the needed ones rapidly and accurately becomes particularly important. If the service requestor and provider use different terms to describe the same concept, or the same term to describe different concepts, service finding will return no services or only unsatisfactory ones. Moreover, semantic heterogeneity resulting from version differences among geo web services further increases the difficulty of service finding. As with semantic query, schema matching can help solve this problem (He, et al., 2011).
3. EXISTING RESEARCH ON SCHEMA MATCHING
As a fundamental problem in data management and application technology, schema matching has attracted wide attention from researchers worldwide in recent years. Much research on it has been carried out in fields such as databases, artificial intelligence, information retrieval, knowledge management and the semantic web. In summary, existing research focuses mainly on four aspects: implementation approach, efficiency optimization, result representation and capability evaluation.
3.1 Implementation Approach
Currently, schema matching is typically performed manually, perhaps supported by a visual interface such as the attribute transfer mapping functions in ArcGIS 10.0 or the Workbench component in FME 2011. Manually specifying schema matches assumes that users have sufficient knowledge of both the source and target schemas. Moreover, as the number of elements to be matched increases, it becomes a tedious, time-consuming, error-prone and therefore expensive process.
To overcome the limitations of manual matching, various automated (or semi-automated) approaches have been proposed. Their basic idea is to evaluate and express the similarity of elements between schemas: if a sufficient degree of similarity is detected, two elements can be assigned to each other. According to the source of information used for similarity evaluation, automated approaches fall into two main kinds: element-based approaches and instance-based approaches.
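The similarity-plus-threshold idea can be written generically as follows. This is a sketch rather than any particular system's algorithm: `sim` stands for whatever similarity measure an approach uses, and the default threshold of 0.8 is an arbitrary assumption.

```python
from typing import Callable, List, Tuple


def match_elements(source: List[str], target: List[str],
                   sim: Callable[[str, str], float],
                   threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Return every (source, target, score) pair whose similarity reaches threshold.

    `sim` is any similarity measure with values in [0, 1]; each source element
    is compared with each target element.
    """
    pairs = []
    for s in source:
        for t in target:
            score = sim(s, t)
            if score >= threshold:
                pairs.append((s, t, score))
    return pairs


def same_ignoring_case(a: str, b: str) -> float:
    """Toy similarity measure: 1.0 for case-insensitive equality, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


print(match_elements(["Name", "Width"], ["name", "lanes"], same_ignoring_case))
# [('Name', 'name', 1.0)]
```

Any of the element-based or instance-based measures discussed below can be plugged in as `sim`.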
The element-based approach determines schema matches by comparing information about the elements themselves (such as names, documents and specifications), on the premise that similar elements tend to have similar representations. String-matching methods and linguistic tools are used to measure the similarity between class names and attribute names (Stephen, et al., 1990). Descriptions of classes and attributes in design documents can be compared with document-similarity measures developed in the information-retrieval field (Benkley, et al., 1995). Specifications of attributes (including data type, length, value range, optionality, etc.) can be compared according to predefined rules (Li, et al., 1993; Qiang, et al., 2003).
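One way to combine a name comparison with a specification rule is sketched below. It uses Python's standard `difflib.SequenceMatcher` as a stand-in string-matching method; the `Attribute` record, the name normalization and the length tolerance of 10 are illustrative assumptions, not the rules used in the cited works.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Attribute:
    name: str
    data_type: str  # e.g. "str", "int" (illustrative type labels)
    length: int     # declared field length


def normalize(name: str) -> str:
    """Lower-case a name and remove underscores and spaces."""
    return name.lower().replace("_", "").replace(" ", "")


def name_similarity(a: str, b: str) -> float:
    """String similarity of two normalized names, in [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def spec_compatible(a: Attribute, b: Attribute) -> bool:
    """Hypothetical predefined rule: same data type and lengths within 10."""
    return a.data_type == b.data_type and abs(a.length - b.length) <= 10


def element_similarity(a: Attribute, b: Attribute) -> float:
    """Name similarity, set to 0 when the specifications are incompatible."""
    return name_similarity(a.name, b.name) if spec_compatible(a, b) else 0.0


x = Attribute("ROAD_NAME", "str", 50)
y = Attribute("roadname", "str", 60)
z = Attribute("road_no", "int", 8)
print(element_similarity(x, y))  # 1.0: identical after normalization
print(element_similarity(x, z))  # 0.0: data types differ
```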
However, the element-based approach often produces imperfect results because element-level information is unreliable and incomplete. For example, two elements that share the same name can refer to different classes or attributes, and two elements with different names can refer to the same class or attribute. Two schemas may contain several elements with similar specifications but different meanings. Design documents are often outdated, incomplete, incorrect, ambiguous or simply unavailable.
Instances can give important hints about the contents and meaning of schema elements (Erhard, et al., 2001). They are therefore usually used for attribute-level matching based on a simple principle: attributes with similar statistical characteristics or data values are likely to correspond. For example, summary instance information about attributes (such as mean, standard deviation, maximum and minimum) is used together with the specifications to measure attribute similarity (Li, et al., 1993; Qiang, et al., 2003). However, summary instance information is necessary but not sufficient for schema matching (Cecil, et al., 2003). Lu, et al. (1997) first compute the statistical correlation coefficients between numerical attributes over overlapping instances and then use them as attribute similarity measures. Cecil, et al. (2003) and Bilke, et al. (2005) use the literal similarity among overlapping instances to measure the similarity between textual attributes.
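The two kinds of instance-based measures can be contrasted in a short sketch. `summary_similarity` compares attributes by mean, standard deviation, minimum and maximum using an equal-weight score that is an assumption of this example, while `pearson` computes the correlation coefficient over overlapping instances, where position i in both lists must describe the same object. The sample data (road lengths stored in kilometres in one database and in metres in the other) is hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def _stats(values: Sequence[float]) -> List[float]:
    return [mean(values), pstdev(values), min(values), max(values)]


def summary_similarity(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Average per-statistic agreement of mean, std, min and max, in [0, 1]."""
    scores = []
    for a, b in zip(_stats(xs), _stats(ys)):
        denom = max(abs(a), abs(b))
        scores.append(1.0 if denom == 0 else max(0.0, 1.0 - abs(a - b) / denom))
    return sum(scores) / len(scores)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation over overlapping instances (xs[i], ys[i]: same object)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


lengths_km = [1.2, 3.5, 0.8]          # length attribute in database A (kilometres)
lengths_m = [1200.0, 3500.0, 800.0]   # length attribute in database B (metres), same roads
print(summary_similarity(lengths_km, lengths_m))  # close to 0: the scales differ
print(pearson(lengths_km, lengths_m))             # about 1.0: perfectly linear relation
```

Because correlation is insensitive to linear rescaling, it recognizes the correspondence that the summary statistics miss here, but it only works when overlapping instances have been identified.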
However, because data instances are difficult to compare and analyze while the schema matches are still unknown, and because instance representations are diverse (Zhao, 2007), instance-based approaches still face at least three problems. First, they focus only on attribute-level matches and leave the problem of determining class-level matches unsolved. Second, overlapping instances are often obtained manually or by comparing common IDs between objects, which requires that common IDs exist and have already been matched; in practice, common IDs usually do not exist. Third, some real attribute-level matches are missed because the discrepancies between overlapping instances, such as different formats, different scales, spelling errors and different codes, are not comprehensively considered.
3.2 Efficiency Optimization
The current difficulty of schema matching lies not only in the lack of practical strategies or rules for deciding whether schema elements match, but also in the high cost of matching based on predefined rules, which generally requires a large number of computations and comparisons to find the possible matches. Research on efficiency optimization models and algorithms for schema matching therefore needs to be strengthened; only a few systems consider or handle the problem of performance. Four techniques for improving performance have been introduced in different types of systems (Eric, et al., 2010).
Divide and conquer: A number of systems apply a divide-and-conquer strategy when matching large schemas. They first manually or automatically identify relevant fragments, blocks, partitions or clusters, and then perform further matching on these schema parts, which reduces the search space. Unfortunately, this approach can worsen the overall result quality.
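A minimal sketch of matching within partitions, assuming elements are dotted paths and that an element's partition is simply its first path segment (a hypothetical rule; real systems derive fragments from the schema structure or by clustering). The function also counts comparisons so that the reduction in search space is visible.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def partitioned_match(source: List[str], target: List[str],
                      part_of: Callable[[str], str],
                      sim: Callable[[str, str], float],
                      threshold: float) -> Tuple[List[Tuple[str, str]], int]:
    """Compare only elements assigned to the same partition.

    Returns the matches found and the number of comparisons performed.
    """
    partitions: Dict[str, List[str]] = defaultdict(list)
    for t in target:
        partitions[part_of(t)].append(t)
    matches, comparisons = [], 0
    for s in source:
        for t in partitions.get(part_of(s), []):  # other partitions are skipped
            comparisons += 1
            if sim(s, t) >= threshold:
                matches.append((s, t))
    return matches, comparisons


def first_segment(path: str) -> str:
    """Hypothetical partition rule: the top-level class in a dotted path."""
    return path.split(".")[0]


def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


src = ["road.name", "road.width", "river.name"]
tgt = ["road.label", "road.width", "river.name", "river.depth"]
print(partitioned_match(src, tgt, first_segment, exact, 1.0))
# ([('road.width', 'road.width'), ('river.name', 'river.name')], 6), versus 12 without partitioning
```

The same mechanism explains the quality risk: a true correspondence whose two elements land in different partitions is never compared, so it cannot be found.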
Filtering schema parts: Some systems reduce the schema upfront by filtering it down to the relevant context or by involving the user through a questionnaire. Some systems automatically identify unneeded edges in the schema graph or apply heuristics that reduce the number of comparisons at some cost in quality. The well-known edit-distance algorithm can also be accelerated by pruning comparisons early. Similar strategies for reducing the search space, called blocking, have been proposed in the record linkage area; they try to reduce the number of candidate record pairs to compare while still maintaining reasonable linkage accuracy.
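Early pruning of edit-distance comparisons can be sketched as a bounded Levenshtein distance: the caller states the largest distance it still cares about (`k`), and the computation gives up as soon as that bound cannot be met. This is one common pruning scheme shown for illustration; it is not necessarily the variant used by the systems surveyed in the cited work.

```python
from typing import Optional


def bounded_edit_distance(a: str, b: str, k: int) -> Optional[int]:
    """Levenshtein distance between a and b, or None once it must exceed k.

    Pruning 1: if the lengths differ by more than k, no DP is run at all.
    Pruning 2: the minimum of a DP row never decreases from one row to the
    next, so once every entry of the current row exceeds k we can stop.
    """
    if abs(len(a) - len(b)) > k:
        return None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,                  # delete ca
                         cur[j - 1] + 1,               # insert cb
                         prev[j - 1] + int(ca != cb))  # substitute or keep
        if min(cur) > k:
            return None
        prev = cur
    return prev[-1] if prev[-1] <= k else None


print(bounded_edit_distance("kitten", "sitting", 3))  # 3
print(bounded_edit_distance("kitten", "sitting", 2))  # None
```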
Avoiding repetitions: A general performance technique is to avoid repeatedly executing the same subtask. For example, a pre-matching step that tokenizes all labels once avoids repeated tokenization in later match comparisons.
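In Python, the same effect can be obtained lazily with `functools.lru_cache`, which memoizes the tokenizer so that each label is split only once however many comparisons it takes part in. This is a variant of the up-front pre-matching step described above, and the camelCase and underscore splitting rules are assumptions of the sketch.

```python
import re
from functools import lru_cache
from typing import FrozenSet


@lru_cache(maxsize=None)
def tokenize(label: str) -> FrozenSet[str]:
    """Split a label into lower-case tokens; each distinct label is split only once."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)  # break camelCase boundaries
    return frozenset(t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t)


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; repeated labels reuse cached tokenizations."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


print(jaccard("RoadName", "road_name"))  # 1.0
print(jaccard("RoadName", "RoadWidth"))  # 0.3333333333333333
print(tokenize.cache_info().hits)        # 1: "RoadName" was tokenized only once
```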
Improved data structures: A number of techniques use special data structures such as indexes or hash tables to improve performance. Indexing helps to quickly identify the right elements to compare. For instance, the B-Match approach indexes tokens and their labels, which saves string comparisons under the assumption that two similar labels share at least one common token. Other techniques remove the nested-loop effort, in which each source element is compared with each target element, by introducing a hash-join-like method. They also cache already computed results for later reuse.
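The token-index idea can be sketched with an inverted index from tokens to target labels, built once and then probed with each source label, in the spirit of a hash join. This is a simplified illustration, not the B-Match implementation; the underscore-and-space tokenizer is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def tokens(label: str) -> Set[str]:
    """Lower-case tokens of a label, split on underscores and spaces (simplified)."""
    return set(label.lower().replace("_", " ").split())


def candidate_pairs(source: List[str], target: List[str]) -> Set[Tuple[str, str]]:
    """Source/target label pairs that share at least one token.

    The token -> labels index over the target is built once (like the build
    side of a hash join) and probed with each source label, so pairs with no
    common token are never generated or compared.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for t in target:
        for tok in tokens(t):
            index[tok].append(t)
    pairs: Set[Tuple[str, str]] = set()
    for s in source:
        for tok in tokens(s):
            for t in index.get(tok, []):
                pairs.add((s, t))
    return pairs


src = ["road_name", "river_depth"]
tgt = ["name_of_road", "water_depth", "land_use"]
print(sorted(candidate_pairs(src, tgt)))
# [('river_depth', 'water_depth'), ('road_name', 'name_of_road')]: 2 of 6 possible pairs
```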
3.3 Result Representation
The main task of matching result representation is to organize and store the related schema elements, together with the actual mappings between them found by schema matching, and to build the necessary access and retrieval approaches that guide and simplify other operations in various applications. At present, some matching tools store results directly in plain text files formatted to suit their own needs. This kind of representation lacks sufficient semantic expressiveness and processing capability, which makes matching results complicated to access and difficult to share among systems.
Several tools store matching results in relational databases. Because matching results are semi-structured, a relational representation yields many redundant fields with “null” values in the result tables and cannot effectively record some complex matching or mapping relationships, such as conditional matches, partial matches and computational matches (Han, et al., 2006). Moreover, as the schema elements to be matched change, the structure of the result tables may also have to change, which makes the matching results inconvenient to manage and maintain.
To address the limitations of the above representations, some researchers have used logic-based languages or semi-structured models to represent and store matching results. For example, Yuan et al. (2005) used first-order logic to express complex mappings between XML schemas and OWL ontologies. To provide a better understanding of the commonalities and differences of existing proposals for ontology mapping languages, Serafini et al. (2005) used distributed first-order logic (DFOL) to compare existing mapping languages formally. In the BRICKS system (Kearney, et al., 2007), an XML model is used to store and manage ontology mappings.
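To illustrate why a semi-structured model suits this data, the sketch below stores correspondences as XML with Python's standard `xml.etree.ElementTree`. The vocabulary (`mappings`, `mapping`, `source`, `target`, `condition`) is invented for this example and is not the format used by any of the cited systems. The optional `condition` child appears only for conditional matches, instead of occupying a mostly-null column in a relational table.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# (source element, target element, relation, confidence, optional condition)
Corr = Tuple[str, str, str, float, Optional[str]]


def mappings_to_xml(corrs: List[Corr]) -> str:
    """Serialize correspondences to a small, made-up XML vocabulary."""
    root = ET.Element("mappings")
    for src, tgt, rel, conf, cond in corrs:
        m = ET.SubElement(root, "mapping", relation=rel, confidence=f"{conf:.2f}")
        ET.SubElement(m, "source").text = src
        ET.SubElement(m, "target").text = tgt
        if cond is not None:  # only conditional matches get a <condition> child
            ET.SubElement(m, "condition").text = cond
    return ET.tostring(root, encoding="unicode")


def xml_to_mappings(xml_text: str) -> List[Corr]:
    """Parse the XML written by mappings_to_xml back into tuples."""
    corrs = []
    for m in ET.fromstring(xml_text).iter("mapping"):
        cond = m.find("condition")
        corrs.append((m.findtext("source"), m.findtext("target"),
                      m.get("relation"), float(m.get("confidence")),
                      cond.text if cond is not None else None))
    return corrs


corrs = [
    ("Road.name", "Street.label", "equivalent", 0.9, None),
    ("Building.type", "House.kind", "equivalent", 0.75, "Building.use = 'residential'"),
]
xml_text = mappings_to_xml(corrs)
print(xml_text)
```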
3.4 Capability Evaluation
To choose a solution for a particular match problem, it is important to understand which of the proposed techniques performs best. The performance capabilities of matching systems involve different (possibly conflicting) aspects such as effectiveness, efficiency, genericity and usability. Effectiveness concerns the accuracy and correctness of the matching results. Efficiency concerns resource consumption (time, memory, etc.). Genericity concerns the application domains of matching systems: ideal matching methods should be applicable to different match tasks from various domains and to different data models. Usability concerns the ease of use of match systems and the manual effort they save (Do, et al., 2002; Algergawy, et al., 2008; Köpcke, et al., 2010).
At present, the evaluation of matching capabilities concentrates mainly on effectiveness. To show effectiveness, researchers have usually applied their proposed tools to some real-world scenarios or conducted a study using a range of schema matching tasks. However, evaluating matching systems is quite difficult for several reasons. First, the systems are not always available as a demo, so it is not possible to test them against specific sets of schemas. Second, some systems require specific resources to be efficient, such as an ontology or a thesaurus, which are not always available. Finally, some matching tools take specific additional files as input (Duchateau, et al., 2007a). Thus, a generally accepted benchmark is particularly important for users, developers and researchers to compare and evaluate these systems comprehensively (Zohra et al., 2011). Some useful attempts and prototypes (e.g. XBenchMatch, STBenchmark) have been made (Duchateau, et al., 2007b; Bogdan, et al., 2008).
4. CONCLUSIONS
After more than 30 years of unremitting effort, studies of schema matching have achieved gratifying results: from simple matchers based on information from the schemas themselves, to composite matchers that use various kinds of information such as instances and structures, to human-based matching tools and systems, and to systematic theoretical support. Nevertheless, schema matching is still a challenging issue because of its subjectivity, uncertainty and complexity.
According to the literature, research on schema matching for GIS spatial data is currently very weak. Apart from a few works on the design and realization of actual approaches and systems, existing discussions mostly mention the concept of schema matching in passing and lack pertinent, detailed analysis. Given that spatial data schemas are of many types, large in scale and complex in structure, current research is not sufficient to meet the requirements of an ideal system for genericity, robustness, flexibility, interactivity and so on. It is therefore necessary to carry out further in-depth research on schema matching so as to provide theoretical support and technical guarantees for the effective sharing and intelligent services of spatial data resources.
ACKNOWLEDGEMENTS
The work described in this paper is supported by the joint fund of the National Natural Science Foundation of China and the People’s Government of Henan Province (U1304401) and by a sub-project of the 12th Five-Year Program for Science and Technology Development of China (2012BAJ23B04-2).
REFERENCES
Algergawy A., Schallehn E., Saake G., 2008. Combining
Effectiveness and Efficiency for Schema Matching Evaluation.
Proceedings of 1st International Workshop on Model-Based
Software and Data Integration, Germany, pp.19-30.
Arnaud Braun, 2004. From the Schema Matching to the
Integration of Updating Information into User Geographic
Database, Proceedings of Geoinformatics 2004, pp.211-218.
Benkley, Fandozzi, Housman, Woodhouse, 1995. Data element
tool-based analysis (DELTA). Technical Report MTR
95B0000147, The MITRE Corporation, Bedford.
Bernard L., Einspanier U., Lutz M., et al., 2003. Interoperability
in GI Service Chains — The Way Forward. The 6th AGILE
Conference on Geographic Information Science, Lyon, France.
Bilke Alexander, Naumann Felix, 2005. Schema Matching
using Duplicates. Proceedings of the 21st International
Conference on Data Engineering, pp.69-80.
Bogdan A., Tan W., Velegrakis Y., 2008. STBenchmark:
Towards a Benchmark for Mapping Systems. Proceeding of
VLDB '08, August 23-28, Auckland, New Zealand, pp.230-244.
Cecil E. H. C., Roger H. L. C., Ee-Peng L., 2003. Instance-
based attribute identification in database integration, VLDB
Journal, Vol.12, No.3, pp.228–243.
Do H., Melnik S., Rahm E., 2002. Comparison of schema
matching evaluations. Lecture Notes in Computer Science: Web,
Web-Services, and Database Systems, 2593, pp.221-237.
Duchateau F., Bellahsène Z., 2007b. Designing a Benchmark
for the Assessment of XML Schema Matching Tools,
Proceeding of VLDB 2007, September 23-28, Vienna, Austria.
Duchateau F., Bellahsène Z., Hunt E., 2007a. XBenchMatch: a
Benchmark for XML Schema Matching Tools. Proceeding of
VLDB 2007, September 23-28, Vienna, Austria, pp.1318-1321.
Erhard Rahm, Philip Bernstein, 2001. A survey of approaches
to automatic schema matching. VLDB Journal, Vol.10, No.4,
pp.334-350.
Eric P., Henrike B., Erhard R., 2010. Rewrite Techniques for
Performance Optimization of Schema Matching Processes.
Proceeding of 13th International Conference on Extending
Database Technology, Lausanne, Switzerland, pp.433-464.
HAN Zhongming, CHEN Dehua, LE Jiajin, 2006. Schema
Mapping and Representation. Journal of Huadong University,
22(2), pp.42-45.
HE Jie, CHEN Neng-cheng, WANG Wei, et al., 2011. A
uniform approach for multi-version web feature service retrieve
based on dynamic schema matching. Science of Surveying and
Mapping, Vol. 36, No.1, pp. 169-172.
Kearney K., 2007. Ontology Mapping in BRICKS. Proceedings
of Workshop on Ontology-Driven Interoperability for Cultural
Heritage Objects.
Köpcke H., Rahm E., 2010. Frameworks for entity matching: A
comparison. Data & Knowledge Engineering, 69(2), pp.197-210.
Laurent Spery, 1998. Spatial Data Transfer in the case of
Update. International Archives of Photogrammetry and Remote
Sensing, vol.32, no.4, pp.586-593.
LI Jun, SU Guo-zhong, LI Meng, 2012. Shielding the
heterogeneity of geospatial data sources by using GML schema
mapping. Science of Surveying and Mapping, Vol. 37, No.1, pp.
38-41.
Li Wen-Syan, Clifton Chris, 1993. Using field specifications to
determine attribute equivalence in heterogeneous databases.
Proceeding of Third International Workshop on Research
Issues in Data Engineering - Interoperability in Multidatabase
Systems, Vienna, Austria, pp.174-177.
LIU Min-chao, LIU Wei-dong, 2006. Research on key problems
in data integration system. Journal of Computer Applications,
Vol.26, No.7, pp.1507-1510.
Lu Hongjun, Fan Weiguo, Cheng Hian Goh, 1997. Discovering
and Reconciling Semantic Conflicts: A Data Mining
Perspective. Proceedings of the 7th IFIP 2.6 Working
Conference on Database Semantics, Leysin, Switzerland.
Qiang Baohua, Wu Kaiwei, Wu Zhongfu, 2003. A data-type-
based approach for identifying corresponding attributes in
heterogeneous databases, Proceedings of the Second
International Conference on Machine Learning and
Cybernetics, Xi’an, pp.299-344.
Serafini L., Stuckenschmidt H., Wache H., 2005. A formal
investigation of mapping languages for terminological
knowledge. Proceedings of the 19th international joint
conference on Artificial intelligence, pp.576–581.
Stephen Hayne, Sudha Ram, 1990. Multi-user view integration
system (MUVIS) an expert system for view integration,
Proceedings of the Sixth International Conference on Data
Engineering, pp.402-409.
Volz S., Danielas N., Grossmann M., et al., 2008. On Creating a
Spatial Integration Schema for Global, Context-aware
Applications. Proceedings of GeoInfo 2008, pp.13-24.
WANG Hongding, TAN Shaohua, TANG Shiwei, et al., 2007.
Schema Merging Study with Semantic Relationships of Schema
Elements. Acta Scientiarum Naturalium Universitatis
Pekinensis, Vol.43, No.3, pp.405-411.
WANG Yandong, GONG Jianya, DAI Jingjing, 2007. Spatial
Data Semantic Query Based on Ontology. Journal of Geomatics,
Vol.32, No.2, pp.32-35.
WANG Yu-hong, CHEN Jun, 2010. Implementation Approach
for Propagating Updates of Fundamental Geographic Database.
Geomatics and Information Science of Wuhan University, Vol.
39, No.1, pp.1116-1120.
Yuan A., Borgida A., Mylopoulos J., 2005. Constructing
Complex Semantic Mappings between XML Data and
Ontologies. Proceedings of the 4th International Semantic Web
Conference, Ireland, pp.6-19.
Zhao Huimin, 2007. Semantic matching across heterogeneous
data sources. Communications of the ACM, Vol.50, No.1,
pp.45-50.
Zohra B., Angela B., Fabien D., et al., 2011. On Evaluating
Schema Matching and Mapping. Schema Matching and
Mapping, Springer Berlin Heidelberg, pp.253-291.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-7/W4, 2015
2015 International Workshop on Image and Data Fusion, 21 – 23 July 2015, Kona, Hawaii, USA
This contribution has been peer-reviewed.
doi:10.5194/isprsarchives-XL-7-W4-175-2015
179
ResearchGate has not been able to resolve any citations for this publication.
Chapter
Full-text available
Schema matching plays a central role in many applications that require interoperability among heterogeneous data sources. A good evaluation for different capabilities of schema matching systems has become vital as the complexity of such systems arises. The capabilities of matching systems incorporate different (possibly conflicting) aspects among them match quality and match efficiency. The analysis of efficiency of a schema matching system, if it is done, tends to be done in a way separate from the analysis of effectiveness. In this paper, we present the trade-off between schema matching effectiveness and efficiency as a multi-objective optimization problem. This representation enables us to obtain a combined measure as a compromise between them. We combine both performance aspects in a weighted-average function to determine the cost-effectiveness of a schema matching system. We apply our proposed approach to evaluate two currently existing mainstream schema matching systems namely COMA++ and BTreeMatch. Experimental results showed that, by carefully utilizing both small-scale and large-scale schemas, it is necessary to take the response time of the matching process into account especially in large-scale schemas.
Conference Paper
Full-text available
Much data is published on the Web in XML format satisfying schemas, and to make the Semantic Web a reality, such data needs to be interpreted with respect to ontologies. Interpretation is achieved through a semantic mapping between the XML schema and the ontology. We present work on the heuristic construction of complex such semantic mappings, when given an initial set of simple correspondences from XML schema attributes to datatype properties in the ontology. To accomplish this, we first offer a mapping formalism to capture the semantics of XML schemas. Second, we present our heuristic mapping construction algorithm. Finally, we show through an empirical study that considerable effort can be saved when constructing complex mappings by using our prototype tool.
Conference Paper
Full-text available
Current approaches to semantic interoperability require human intervention in detecting potential conflicts and in defining how those conflicts may be resolved. This is a major impedance to achieving “logical connectivity”, especially when the dumber of disparate sources is large. In this paper, we demonstrate that the detection and reconciliation of semantic conflicts can be automated using tools and techniques developed by the data mining community. We describe a process for discovering such rules that represent the relationships among semanticaly related attributes and illustrate the effectiveness of our approach with examples.
Article
Update propagation is the process of using the update information for changed features in a new version of a fundamental geographic database (FGDB) to perform updating operations, such as revision, addition and deletion, on the corresponding features and their related data in a client database (CDB), so that the CDB also stays current. At present, update propagation is often implemented manually; it requires extensive human-computer interaction and is time-consuming, laborious and error-prone. To propagate updates from FGDB to CDB automatically and effectively, four basic operations necessary for update propagation (schema matching, change extraction, entity identification and update integration) are discussed and analyzed with respect to their basic implementation requirements and the influence of semantic heterogeneity on them. Automated approaches to executing these four operations are comparatively analyzed and evaluated. These studies help clarify the concepts, implementation requirements, difficulties and solutions of update propagation, as well as the key problems that need further research, and they provide the research foundations needed for designing and developing automated algorithms and a software tool.
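The four operations can be chained as in the sketch below. It is a toy illustration under stated assumptions, not the paper's algorithm: features are plain dictionaries keyed by identifier, `SCHEMA_MAP` is a hypothetical output of schema matching, and entity identification is reduced to a precomputed identifier lookup (`id_link`), whereas real systems usually match features by geometry and attributes.

```python
from __future__ import annotations

# Hypothetical schema-matching result: FGDB attribute name -> CDB attribute name.
SCHEMA_MAP: dict[str, str] = {"NAME": "road_name", "WIDTH_M": "width"}


def translate(feature: dict, schema_map: dict[str, str]) -> dict:
    """Rename FGDB attributes to their matched CDB names (unmatched names kept)."""
    return {schema_map.get(k, k): v for k, v in feature.items()}


def extract_changes(old: dict, new: dict):
    """Change extraction: diff two FGDB versions keyed by feature id."""
    added = {fid: new[fid] for fid in new.keys() - old.keys()}
    deleted = old.keys() - new.keys()
    revised = {fid: new[fid] for fid in new.keys() & old.keys() if new[fid] != old[fid]}
    return added, revised, deleted


def propagate(old_fgdb: dict, new_fgdb: dict, cdb: dict, id_link: dict) -> dict:
    """Update integration: apply FGDB changes to the CDB in place and return it."""
    added, revised, deleted = extract_changes(old_fgdb, new_fgdb)
    for fid, feat in added.items():
        # Addition: insert under the FGDB id (an assumption) and record the link.
        cdb[fid] = translate(feat, SCHEMA_MAP)
        id_link[fid] = fid
    for fid, feat in revised.items():
        cid = id_link.get(fid)  # entity identification (stand-in lookup)
        if cid in cdb:
            # Revision: overwrite matched attributes, keep CDB-only attributes.
            cdb[cid].update(translate(feat, SCHEMA_MAP))
    for fid in deleted:
        cid = id_link.get(fid)
        if cid is not None:
            cdb.pop(cid, None)  # Deletion
    return cdb
```

Because revisions are applied with `dict.update`, attributes that exist only in the client database (such as a user-maintained `owner` field) survive the update, which reflects the abstract's concern with the features' related data in the CDB.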
Article
A spatial information query structure based on ontology is put forward. It can effectively express the hierarchical relationships and semantics of spatial information, and it queries spatial data from different databases on the basis of consistent semantics. The structure realizes not only spatial data sharing but also semantic sharing.
Article
This article presents the results of a study on the definition of models and functionalities of tools for integrating updating information into geographic databases. This updating information is delivered by a geographic information producer (such as a National Mapping Agency). These tools should be generic, i.e. independent of particular geographic databases and GIS software. This can be achieved through rigorous schema matching between the user and producer database schemas. Based on such schema matching, a four-stage process is proposed: scheduling and grouping, filtering, integration of updates, and management of conflicts. In conclusion, the implementation of a portable prototype is proposed.
Article
People increasingly exchange geospatial information, for example when a user obtains data from a producer. To transfer updated data, a producer provides users with a new geographic dataset. Nowadays, such an update transfer is usually a bulk transfer, so the transferred data describe a snapshot of the world; the successive changes undergone by features are not described. To improve the integration process, these modifications should be isolated in the transferred dataset. We study how metadata could be used to represent change in a bulk data transfer. We propose to use dating and lineage information to detect altered features; only these altered features are then integrated into the user's information system. For a bulk data transfer, this metadata improves the integration process. Moreover, we extend the use of metadata: it is used during the data transfer itself and is not limited to a documentation role.
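The idea of isolating changes in a bulk snapshot with dating and lineage metadata can be sketched as follows. The `Feature` record, its `modified_on` date and its `lineage` tuple (identifiers of the features it derives from) are hypothetical stand-ins, not the metadata model proposed in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Feature:
    feature_id: str
    modified_on: date                 # hypothetical dating metadata
    lineage: tuple[str, ...] = ()     # hypothetical lineage: ids this feature derives from


def classify_changes(snapshot: list[Feature], user_ids: set[str],
                     last_sync: date) -> dict[str, set[str]]:
    """Isolate changes in a bulk snapshot using dating and lineage metadata."""
    snapshot_ids = {f.feature_id for f in snapshot}
    # Dating: anything modified after the user's last integration counts as altered.
    altered = {f.feature_id for f in snapshot if f.modified_on > last_sync}
    # Lineage: predecessors of altered features were replaced, not simply deleted.
    predecessors = {p for f in snapshot if f.feature_id in altered for p in f.lineage}
    vanished = user_ids - snapshot_ids
    return {
        "altered": altered,
        "replaced": vanished & predecessors,
        "deleted": vanished - predecessors,
    }
```

Only the identifiers in `altered` would then be integrated into the user's information system; the lineage check lets a feature that disappeared because it was, say, split or renumbered be handled differently from a true deletion.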
Chapter
The increasing demand for matching and mapping tasks in modern integration scenarios has led to a plethora of tools for facilitating these tasks. While this plethora made such tools available to a broader audience, it also led to some confusion regarding the exact nature, goals, core functionalities, expected features and basic capabilities of these tools. Above all, it made measuring the performance of these systems and distinguishing between them a difficult task. The need for designing and developing comparison standards that allow these tools to be evaluated is becoming apparent. These standards are particularly important to users of mapping and matching systems, since they allow them to evaluate the relative merits of the systems and make the right business decisions. They are also important to mapping system developers, since they offer a way of comparing their systems against competitors and motivate improvements and further development. Finally, they are important to researchers, as they illustrate the limitations of existing systems and trigger further research in the area. In this work, we provide a generic overview of the existing efforts on benchmarking schema matching and mapping tasks. We offer a comprehensive description of the problem, list the basic comparison criteria and techniques, and describe the main functionalities and characteristics of existing systems.
Conference Paper
Web Feature Service (WFS) is an important part of geospatial data interoperability and enables GIS data in different formats to be operated on at the feature level. The Open Geospatial Consortium (OGC) has developed standard implementation specifications for Web Feature Service and defines different schemas for different WFS versions, such as WFS1.0 and WFS1.1. Because different WFS versions have different schemas and support different data formats, interoperability among Web Feature Services of different versions is problematic. For example, a user can request only the WFS versions that the WFS server supports; when the server finds that the requested version is not supported, it may stop communicating with the user. To support multi-version Web Feature Service requests through a uniform interface, this paper designs a plug-and-play service retrieval middleware that dynamically matches the schemas of different versions and automatically transforms the XML result document into a form that meets the requirements of the user request, ultimately making WFS service requests independent of the service version. Finally, the schema matching quality and information retrieval efficiency of the whole system were tested on GeoServer WFS services, and the test results demonstrate the feasibility of the designed approach.
Article
Entity matching is a crucial and difficult task for data integration. Entity matching frameworks provide several methods and their combination to effectively solve different match tasks. In this paper, we comparatively analyze 11 proposed frameworks for entity matching. Our study considers both frameworks which do or do not utilize training data to semi-automatically find an entity matching strategy to solve a given match task. Moreover, we consider support for blocking and the combination of different match algorithms. We further study how the different frameworks have been evaluated. The study aims at exploring the current state of the art in research prototypes of entity matching frameworks and their evaluations. The proposed criteria should be helpful to identify promising framework approaches and enable categorizing and comparatively assessing additional entity matching frameworks and their evaluations.
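Blocking and the combination of match algorithms, two of the criteria used in this comparison, can be illustrated with a small sketch. It is not taken from any of the surveyed frameworks: the blocking key (first letter of the name), the two matchers (string similarity from Python's standard `difflib` and exact type equality), the weights and the threshold are all hypothetical choices.

```python
from __future__ import annotations

from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record: dict) -> str:
    # Hypothetical blocking key: only records sharing the first letter are compared.
    return record["name"][:1].lower()


def name_similarity(a: dict, b: dict) -> float:
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def type_similarity(a: dict, b: dict) -> float:
    return 1.0 if a["type"] == b["type"] else 0.0


def match(records: list[dict], weights=(0.7, 0.3), threshold=0.8) -> list[tuple]:
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    matches = []
    for block in blocks.values():
        # Candidate pairs are generated only within a block, not across all records.
        for a, b in combinations(block, 2):
            # Combine the two matchers with a weighted sum.
            score = weights[0] * name_similarity(a, b) + weights[1] * type_similarity(a, b)
            if score >= threshold:
                matches.append((a["id"], b["id"], round(score, 3)))
    return matches
```

Without blocking, n records yield n(n-1)/2 candidate pairs; blocking cuts the number of comparisons at the cost of never comparing true matches whose keys differ, which is one reason frameworks are compared on their blocking support.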