Article

Learning to Map between Ontologies on the Semantic Web

Authors:
AnHai Doan, Jayant Madhavan, Pedro Domingos, Alon Halevy

Abstract

Ontologies play a prominent role on the Semantic Web. They make possible the widespread publication of machine understandable data, opening myriad opportunities for automated information processing. However, because of the Semantic Web's distributed nature, data on it will inevitably come from many different ontologies. Information processing across ontologies is not possible without knowing the semantic mappings between their elements. Manually finding such mappings is tedious, error-prone, and clearly not possible at the Web scale. Hence, the development of tools to assist in the ontology mapping process is crucial to the success of the Semantic Web.


...  GLUE [27] (University of Washington) is an example of an approach that uses learning techniques to find mappings. GLUE uses multiple learners to exploit the information in the concept instances and the taxonomy of the ontologies. ...
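The excerpts above describe GLUE's core idea: several base learners each score the similarity of a concept pair, and their predictions are combined before the best match is selected. The following is a minimal sketch of that combination step, assuming toy ontologies, hypothetical learners, and illustrative weights; it is not GLUE's actual implementation.

```python
# Illustrative sketch (not GLUE's code): combine the predictions of several
# base learners into one similarity score per concept pair, in the spirit of
# the multi-learner approach described above. All names, data, and weights
# are hypothetical.

def name_learner(concept_a: str, concept_b: str) -> float:
    """Jaccard similarity over the words in the two concept names."""
    a, b = set(concept_a.lower().split()), set(concept_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def instance_learner(instances_a: set, instances_b: set) -> float:
    """Overlap of the instance sets attached to the two concepts."""
    if not instances_a or not instances_b:
        return 0.0
    return len(instances_a & instances_b) / len(instances_a | instances_b)

def combined_similarity(pair, weights=(0.5, 0.5)):
    """Meta-combination: a weighted average of the base learners' scores."""
    (name_a, inst_a), (name_b, inst_b) = pair
    scores = (name_learner(name_a, name_b), instance_learner(inst_a, inst_b))
    return sum(w * s for w, s in zip(weights, scores))

# For each concept in ontology A, keep the most similar concept in ontology B
# (the 1-to-1 style of mapping the surrounding excerpts attribute to GLUE).
ontology_a = {"Faculty Member": {"alice", "bob"}, "Course": {"cs101"}}
ontology_b = {"Academic Staff": {"alice", "carol"}, "Subject": {"cs101", "ma201"}}

for name_a, inst_a in ontology_a.items():
    best = max(ontology_b.items(),
               key=lambda kv: combined_similarity(((name_a, inst_a), kv)))
    print(name_a, "->", best[0])
```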
... This ontology will provide an overview of all the attributes used to describe the data. It is used to define the mappings (correspondences) between the concepts through equivalence relations [27]. ...
... The local ontology must be linked to the real information. Therefore, the mappings must be provided between the local ontology and each source of information [27]. ...
... 3) The hierarchies are generally organized in a way that similar categories are closer to each other (Doan et al. 2002). Such organization of the categories allows users to find what they want conveniently. ...
... Other taxonomy integration approaches have impractical assumptions for CQA integration. The GLUE system (Doan et al. 2002) performs 1-to-1 mapping from a category in the source taxonomy to a category in the target taxonomy. However, the notion of 1-to-1 mapping is too restrictive for general CQA integration. ...
Article
Question and answer pairs in Community Question Answering (CQA) services are organized into hierarchical structures or taxonomies so that users can find answers to their questions conveniently. We observe that different CQA services have their own knowledge focus and use different taxonomies to organize the question and answer pairs in their archives. As there are no simple semantic mappings between the taxonomies of the CQA services, the integration of CQA services is a challenging task. The existing approaches to integrating taxonomies ignore the hierarchical structure of the source taxonomy. In this paper, we propose a novel approach that is capable of incorporating the parent-child and sibling information in the hierarchical structure of the source taxonomy for accurate taxonomy integration. Our experimental results with real-world CQA data demonstrate that the proposed method significantly outperforms state-of-the-art methods.
... Approaches that merge and fuse heterogeneous information structures. [32], [88], [87], [36], [84], [81], [35], [37] ...
... The authors in [32] proposed GLUE, a system that employs machine learning techniques to find mappings between two ontologies. For each concept in one ontology, GLUE finds the most similar concept in the other ontology. ...
Thesis
Software networks have the potential to take the network infrastructure to a more advanced level, one at which configuration becomes autonomic. This ability can overcome the rapidly growing complexity of current networks and allow management entities to enact effective behavior in the network for overall performance improvement without any human intervention. Configuration parameters can be selected automatically for network resources to cope with the various situations networks encounter, such as errors and performance degradation. Unfortunately, several challenges must be tackled to reach that advanced level. Currently, the configuration is still often written manually by domain experts in huge semi-structured files in XML, JSON, and YAML. This is a complex, error-prone, and tedious task for humans. Also, there is no formal strategy, other than the experience and best practices of domain experts, for designing the configuration files; different experts may choose different configurations for the same performance goal. This situation makes it harder to extract features from the configuration files and to learn models that could generate or recommend configurations automatically. Moreover, there is still no consensus on a common configuration data model in software networks, which has resulted in heterogeneous solutions, such as TOSCA, YANG, and HOT, that make end-to-end network management difficult. In this thesis, we present contributions that tackle the aforementioned challenges related to automating configuration in software networks. To tackle the heterogeneity of configuration files, we propose a semantic framework based on ontologies that can federate common elements from different configuration files. To tackle the problem of automatically generating the configuration, we propose two contributions: one that uses deep neural networks to learn models from configuration files for recommending configurations, and another based on a model-driven approach to automatically assist the design of configuration files.
... Data types and uniqueness constraints on attributes were exploited in [35]-[42], while TranScm [12], Autoplex [43], Automatch [44]-[46], GLUE [47]-[48], SCM [49], and DUMAS [50] use instance-based methods. ...
Article
Schema matching plays a vital role in the process of integrating information from heterogeneous databases. Generally, schema matching receives two databases as input (one as the source and another as the target), matches similar attributes, and produces as output a mapping of the attribute pairs judged to correspond. The user then assesses these attribute pairs to determine whether the results are correct or still need revision. Our previous study developed a model and software prototype for hybrid schema matching using a combination of a constraint-based method and an instance-based method. In this study, the model is improved by adding new features. This paper discusses the increase in effectiveness obtained by adding features to customize the weights of the matching criteria and the string sizes used in matching. The hybrid model's best effectiveness is obtained when the weight of instance is 0.286, type is 0.238, width is 0.190, nullable is 0.143, unique is 0.095, and domain is 0.048. Matching with a larger string size increases the model's effectiveness, with the highest precision of 97.66% when the string size interval is between (length-100) and (length+100). The best combination of weights and string size variation obtains 97.66% precision, 99.90% recall, and an F-measure of 98.74%.
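A minimal sketch of how the per-criterion weights reported above could be combined into a single score for an attribute pair; the per-criterion similarity values below are invented placeholders, and the scoring functions that would produce them are assumed to exist elsewhere.

```python
# A minimal sketch, assuming each matching criterion has already been scored
# in [0, 1] for an attribute pair. The weights are the ones reported in the
# abstract above; the example scores are made up.

WEIGHTS = {
    "instance": 0.286,
    "type":     0.238,
    "width":    0.190,
    "nullable": 0.143,
    "unique":   0.095,
    "domain":   0.048,
}

def weighted_match_score(criterion_scores: dict) -> float:
    """Combine per-criterion similarities into one weighted score."""
    return sum(WEIGHTS[c] * criterion_scores.get(c, 0.0) for c in WEIGHTS)

example_pair = {"instance": 0.9, "type": 1.0, "width": 0.8,
                "nullable": 1.0, "unique": 0.0, "domain": 0.5}
print(round(weighted_match_score(example_pair), 3))  # 0.814
```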
... Global knowledge bases have been used by many existing models to learn ontologies for web information gathering. For instance, Gauch et al. [8] and Sieg et al. [9] learned personalized ontologies from the Open Directory Project to specify users' preferences and interests in web search. It was used by Downey et al. [9] to help understand underlying user interests in queries. ...
Article
As a model for knowledge description and formalization, ontologies are widely used to represent user profiles in personalized web information gathering. However, when representing user profiles, many models have utilized only knowledge from either a world knowledge base or a user's local information. In this article, a personalized ontology model is proposed for knowledge representation and reasoning over user profiles. This model learns ontological user profiles from both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against benchmark models in web information gathering. The results show that this ontology model is successful.
... Demand for supporting tools and methods is increasing as more and more information and services become available in e-libraries, e-commerce, blogs, and forums [4], [14], [15], [31]. On the other hand, clear texts about the subject area are also very important in information-processing techniques for knowledge discovery. This task can be facilitated by providing means to oversee the classification of the set of texts examined so that it matches domain services [5], [13], [32]. This paper proposes a "Semantic based Terms Relation Approach (STRA)" to classifying information for effective classification of WIS. ...
Article
Full-text available
The rapid growth of web information and its services in different areas such as e-commerce, healthcare, digital marketing, and online booking is a challenge for providing accurate information about the domain services related to a user's query. Current web information services support retrieval of the relevant service and assist classification by relying on the knowledge and classifications of the specific service information. Because of these limitations, the complexity of the automatic update mechanisms needed to track this service information, and the large amount of unrelated service information returned for a requested query, obtaining the required web information of services is a cumbersome problem. This paper proposes a Semantic based Terms Relation Approach (STRA) for classifying information for effective classification of WIS on the web. The approach utilizes a Concept Terms Similarity (CTS) method to find the most relevant terms in a service domain and constructs a Related Terms Hierarchal Model (RTHM), which is then used for classification. A modified Naive Bayes classifier performs the classification of the web information of services using RTHM, to categorize and present it accurately. The experimental evaluation of the proposed approach shows an improvement in the classification of information and achieves highly relevant matching results for different numbers of user queries.
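The abstract classifies service descriptions with a modified Naive Bayes classifier over the RTHM term hierarchy. As a rough illustration of the classification step only, here is a plain multinomial Naive Bayes text classifier in scikit-learn; the service descriptions and category labels are invented, and the RTHM-specific modifications are not reproduced.

```python
# Illustrative only: a plain multinomial Naive Bayes text classifier standing
# in for the modified classifier described above. Texts and labels are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "book flight ticket online reservation",
    "hotel room booking accommodation",
    "doctor appointment clinic health record",
    "online payment credit card checkout",
]
train_labels = ["travel", "travel", "healthcare", "e-commerce"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

print(clf.predict(["cheap flight and hotel booking"]))   # likely ['travel']
```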
... Many existing models learn ontologies for web information gathering. For instance, Gauch et al. [10] and Sieg et al. [9] learned personalized ontologies from the Open Directory Project to specify users' preferences and interests in web search. On the basis of the Dewey Decimal Classification, King et al. ...
Article
Full-text available
As a model for knowledge representation and formalization, ontologies are widely used to represent user profiles in personalized web information gathering. However, when representing user profiles, many models have exploited only knowledge from either a global knowledge base or a user's local data. In this article, a personalized ontology model is proposed for knowledge representation and reasoning over user profiles. This model learns ontological user profiles from both a global knowledge base and user local instance repositories. The ontology model is assessed by evaluating it against benchmark models in web information gathering. The results show that this ontology model is effective.
... Many works have addressed ontology matching in the context of ontology design [5], [6], [7]. These works do not deal with explicit notions of similarity. ...
Preprint
Full-text available
In this work we propose a new approach to semantic web matching to improve the performance of Web Service replacement. Because automatic systems must ensure self-healing, self-configuration, self-optimization and self-management, all services should always be available, and if one of them crashes it should be replaced with the most similar one. Candidate services are advertised in Universal Description, Discovery and Integration (UDDI), all in the Web Ontology Language (OWL). With the help of a bipartite graph, we perform the matching between the crashed service and a candidate one. Then we choose the best service, the one with the maximum rate of matching. In effect, we compare two services' functionalities and capabilities to see how well they match. We found that the best way to match two web services is to compare their functionalities.
... Among the identified problems is that when ontologies differ in context and background knowledge, some correct mappings will fail to be discovered (Aleksovski et al., 2006; Sabou et al., 2006). Nonetheless, when tools are used for mapping two schemas or even ontologies, there is a high possibility of missing information because not all concepts are mapped between them (Doan et al., 2002). In another aspect, the computer cannot make decisions and possess vocabulary understanding like a human can (Lilac and Al-Abdullatif, 2010). ...
Article
Full-text available
Problem statement: The wave of ontology has spread drastically in the cultural heritage domain. The impact can be seen in the growing number of cultural heritage web information systems, the available textile ontologies, and the harmonization works with the core ontology, CIDOC CRM. The aim of this study is to provide a basis for common views in automating the process of mapping between the revised TMT Knowledge Model and CIDOC CRM. Approach: Manual mapping was conducted to find similar or overlapping concepts which are aligned to each other in order to achieve ontology similarity. This is done after the TMT Knowledge Model has undergone a transformation process to match the CIDOC CRM structure. Results: Although several problems were encountered during the mapping process, the result gives an immediate view of the classes which were found to be easily mapped between both models. Conclusion/Recommendations: Future research will focus on the construction of a Batik Heritage Ontology by using the mapping result obtained in this study. Further testing, evaluation and refinement using real collections of cultural artifacts in museums will also be conducted in the near future.
... These phenomena, together with the arrival of deep learning (DL) as an efficient and effective method for ML, have caused ML to expand into an increasing number of fields (Jordan and Mitchell, 2015). Pioneered by Doan et al. (2002), the use of ML in data integration has been expected for some time now (Halevy et al., 2006). Recently, widespread use of ML in data integration appears to be the new norm (see review by Dong and Rekatsinas, 2018). ...
Article
Full-text available
Oceanographic research is a multidisciplinary endeavor that involves the acquisition of an increasing amount of in-situ and remotely sensed data. A large and growing number of studies and data repositories are now available on-line. However, manually integrating different datasets is a tedious and grueling process leading to a rising need for automated integration tools. A key challenge in oceanographic data integration is to map between data sources that have no common schema and that were collected, processed, and analyzed using different methodologies. Concurrently, artificial agents are becoming increasingly adept at extracting knowledge from text and using domain ontologies to integrate and align data. Here, we deconstruct the process of ocean science data integration, providing a detailed description of its three phases: discover, merge, and evaluate/correct. In addition, we identify the key missing tools and underutilized information sources currently limiting the automation of the integration process. The efforts to address these limitations should focus on (i) development of artificial intelligence-based tools for assisting ocean scientists in aligning their schema with existing ontologies when organizing their measurements in datasets; (ii) extension and refinement of conceptual coverage of – and conceptual alignment between – existing ontologies, to better fit the diverse and multidisciplinary nature of ocean science; (iii) creation of ocean-science-specific 'entity resolution' benchmarks to accelerate the development of tools utilizing ocean science terminology and nomenclature; (iv) creation of ocean-science-specific schema matching and mapping benchmarks to accelerate the development of matching and mapping tools utilizing semantics encoded in existing vocabularies and ontologies; (v) annotation of datasets, and development of tools and benchmarks for the extraction and categorization of data quality and preprocessing descriptions from scientific text; and (vi) creation of large-scale word embeddings trained upon ocean science literature to accelerate the development of information extraction and matching tools based on artificial intelligence.
... Image ontology mapping is one of the two processes in querying heterogeneously distributed image resources. The other process is rewriting the query based on the image ontology mapping [3], that is, reformatting the local-mode-based query as a remote-mode-based query. This paper applies the H-Match algorithm to find the mapping relationships between image ontologies in a peer-to-peer (P2P) environment. ...
... Our tests have shown that on several real-world domains we can correctly match 66-97% of the nodes [10]. Every node is regarded as a trusted computer in the current system, and so an intruder finds it simple to attack the system with false messages. ...
... According to [19], constraint-based matching belongs to the structure-level group of schema-matching models, although it is not described in detail which properties are explored and included as constraints. Instance-based methods are used in TRANSCM [20], Autoplex [78], Automatch [79]-[81], GLUE [82], [83], SCM [84], as well as DUMAS [85]. ...
Article
Full-text available
Schema matching is a critical problem in many applications for data/information integration, for achieving interoperability, and in other cases caused by schematic heterogeneity. Schema matching has evolved from manual work on specific domains to new models and methods that are semi-automatic and more general, so that the user can be directed effectively in generating a mapping among the elements of two schemas or ontologies. This paper summarizes a literature review of models and prototypes for schema matching over the last 25 years, describing the progress made and the research challenges and opportunities for new models, methods, and/or prototypes.
... al. [15] propose a flexible framework that is able to add new approaches for matching entities at the schema level and at the instance level. In this sense, our two-phase instance-based approach could be added to the framework and combined with schema-based matching approaches. Other approaches adopt machine learning techniques, such as LSD [4], GLUE [5] and SemInt [10,11]. Although most of the machine learning techniques show good results, their accuracy sometimes depends on a non-trivial manual effort, which we avoid by adopting genetic programming. ...
... According to the TF-IDF concept, the TF-IDF method calculates the weight value of the feature t_i as follows [7][8][9]: ...
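The formula itself is cut off in the excerpt above. For reference, the conventional TF-IDF weight of a term t_i in a document d_j (the cited works [7]-[9] may use a variant of this form) is:

```latex
% Conventional TF-IDF weight; the cited papers may use a normalized variant.
w_{i,j} = \mathrm{tf}_{i,j} \times \log\frac{N}{\mathrm{df}_i}
% tf_{i,j}: frequency of term t_i in document d_j
% df_i:     number of documents containing t_i
% N:        total number of documents in the collection
```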
Article
Full-text available
Information retrieval has a well-established tradition of performing laboratory experiments on test collections to compare the relative effectiveness of different retrieval approaches. The experimental design specifies the evaluation criterion to be used to determine if one approach is better than another. Retrieval behavior is sufficiently complex to be difficult to summarize in one number, so many different effectiveness measures have been proposed. A concept model is implicitly possessed by users and is generated from their background knowledge. This model learns ontological user profiles from both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against benchmark models in web information gathering. The results show that this ontology model is successful.
Article
Full-text available
The ontology alignment process is a big challenge, especially in open environments like the semantic web. In this process, the construction of the global ontology corresponds to the application of a succession of change operations (adding new mappings). This is a critical task because the implementation of new changes can make the global ontology incoherent; inevitably, some mappings will be a source of contradictions in the ontology. In order to handle these contradictions, the authors propose in this study a new formal solution based on CTL model checking for a consistent ontology alignment. This model uses a Kripke structure satisfying CTL logic formulas to model the behavior of the alignment ontology. To test the functional utility of the proposed approach, they applied it to several examples, and the results showed that the proposed framework works correctly and that inconsistencies are detected in the CTL models.
Chapter
Security has become the biggest concern in recent technologies like the Internet of Things (IoT), as they involve heterogeneous users and resources sharing sensitive data through the cloud. These heterogeneous users and service providers have multifarious privacy and security requirements and need a common mechanism to share them across heterogeneous environments. Ontology proves to be an efficient means of handling heterogeneous environments for the following reasons: it provides a simple means to share domain knowledge among entities, it makes domain knowledge easy to reuse, and it facilitates convenient management and manipulation of domain entities and the interrelationships among them. The performance of ontologies is determined by their reasoning ability. In IoT, many devices are involved, and hence multiple ontologies are involved. Still, most works focus on only a single ontology for reasoning and do not consider reasoning involving multiple ontologies. This article proposes a deep learning method for associating various ontology rule bases, thus learning new inference rules and thereby providing efficient security in IoT applications. To verify the usefulness of the proposed work, it is realized in a healthcare application and shown to achieve better security.
Conference Paper
Inference Enterprise Modeling (IEM) is a methodology developed to address test and evaluation limitations that insider threat detection enterprises face due to a lack of ground truth and/or missing data. IEM uses a collection of statistical, data processing, analysis, and machine learning techniques to estimate and forecast the performance of these enterprises. As part of developing the IEM method, models satisfying various detection system evaluation requirements were created. In this work, we extend IEM as a digital twin generation technique by representing modeled processes as executable UML Activity Diagrams and tracing solution processes to problem requirements using ontologies. Using the proposed framework, we can rapidly prototype a digital twin of a detection system that can also be imported and executed in systems engineering simulation software tools such as the Cameo Enterprise Architecture Simulation Toolkit. Cyber security and threat detection is a continuous process that requires regular maintenance and testing throughout its lifecycle, but there are often access issues for sensitive and private data and proprietary detection model details when performing adequate test and evaluation activities in the live production environment. To solve this issue, organizations can use a digital twin technique to create a real-time virtual counterpart of the physical system. We describe a method for creating digital twins of live and/or hypothetical insider threat detection enterprises for the purpose of performing test and evaluation activities on continuous monitoring systems that are sensitive to disruptions. In this work, we use UML Activity Diagrams to leverage the integrated simulation capabilities of Model-Based Systems Engineering (MBSE).
Article
Full-text available
Background Clear language makes communication easier between any two parties. A layman may have difficulty communicating with a professional due to not understanding the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical terminology, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen medical terms to professional medical terms and vice versa. Objective Many of the presented vocabularies are built manually or semi-automatically, requiring large investments of time and human effort, and consequently these vocabularies grow slowly. In this paper, we present an automatic method to enrich laymen's vocabularies that has the benefit of being applicable to vocabularies in any domain. Methods Our entirely automatic approach uses machine learning, specifically Global Vectors for Word Embeddings (GloVe), on a corpus collected from a social media healthcare platform to extend and enhance consumer health vocabularies. Our approach further improves the consumer health vocabularies by incorporating synonyms and hyponyms from the WordNet ontology. The basic GloVe and our novel algorithms incorporating WordNet were evaluated using two laymen datasets from the National Library of Medicine (NLM), the Open-Access Consumer Health Vocabulary (OAC CHV) and the MedlinePlus Healthcare Vocabulary. Results The results show that GloVe was able to find new laymen terms with an F-score of 48.44%. Furthermore, our enhanced GloVe approach outperformed basic GloVe with an average F-score of 61%, a relative improvement of 25%. Furthermore, the enhanced GloVe showed statistical significance over the two ground truth datasets with P < 0.001. Conclusions This paper presents an automatic approach to enrich consumer health vocabularies using the GloVe word embeddings and an auxiliary lexical source, WordNet. Our approach was evaluated using healthcare text downloaded from MedHelp.org, a healthcare social media platform, and two standard laymen vocabularies, OAC CHV and MedlinePlus. We used the WordNet ontology to expand the healthcare corpus by including synonyms, hyponyms, and hypernyms for each layman term occurrence in the corpus. Given a seed term selected from a concept in the ontology, we measured our algorithms' ability to automatically extract synonyms for those terms that appeared in the ground truth concept. We found that enhanced GloVe outperformed GloVe with a relative improvement of 25% in the F-score.
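A sketch of the general recipe described above: take embedding nearest neighbours of a seed layman term and add WordNet synonyms and hyponyms as further candidates. It uses publicly available GloVe vectors from gensim in place of embeddings trained on a healthcare corpus, and it is not the authors' code.

```python
# Sketch of the recipe described above, not the authors' pipeline: combine
# embedding nearest neighbours with WordNet synonyms/hyponyms to propose new
# layman terms. The pre-trained GloVe vectors used here stand in for vectors
# trained on a healthcare corpus such as MedHelp posts.
import gensim.downloader as api
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)
vectors = api.load("glove-wiki-gigaword-100")   # downloads on first use

def candidate_laymen_terms(seed: str, topn: int = 10) -> set:
    # Embedding neighbours: words used in contexts similar to the seed term.
    neighbours = {w for w, _ in vectors.most_similar(seed, topn=topn)}
    # WordNet expansion: synonyms and hyponyms of every sense of the seed.
    related = set()
    for synset in wn.synsets(seed):
        related.update(l.lower() for l in synset.lemma_names())
        for hypo in synset.hyponyms():
            related.update(l.lower() for l in hypo.lemma_names())
    return neighbours | related

print(sorted(candidate_laymen_terms("headache")))
```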
Preprint
Full-text available
Since environments built according to a service-oriented architecture give us more effective and dynamic applications, the semantic matchmaking process, which finds valuable service candidates for substitution, is a very important aspect of using semantic Web Services. Our proposed matchmaker algorithm performs semantic matching of Web Services on the basis of the input and output descriptions of the semantic Web Services being matched. The technique takes advantage of a graph structure and flow networks. Our novel approach assigns matchmaking scores to the semantics of the input and output parameters and their types. It builds a flow network in which the weights of the edges are these scores and, using the Ford-Fulkerson algorithm, finds the matching rate of two web services. Therefore, all services should be described in the same Ontology Web Language. Among the candidates, the best one is chosen for substitution in the case of an execution failure. Our approach uses the algorithm with the least running time among all those that can be used for bipartite matching. The importance of the problem is that in real systems many fundamental problems arise from late answers, so system services should always be on, and if one of them crashes it should be replaced quickly. The semantic web matchmaker eases this process.
Preprint
Full-text available
In this work, we show how to discover a semantic web service in a repository of web services, using a new approach for web service discovery based on calculating the similarity of their functions. We define the Web service functions with the Ontology Web Language (OWL). We wrote rules for comparing two web services' parameters. Our algorithm compares the parameters of two web services' inputs/outputs by building a bipartite graph. We compute the similarity rate using the Ford-Fulkerson algorithm. The higher the similarity, the smaller the differences between their functions. Finally, our algorithm chooses the service with the highest similarity. As a consequence, our method is useful when we need to find a web service suitable to replace an existing one that has failed. Especially in autonomic systems, this situation is very common and important, since we need to ensure the availability of the application that is based on the failed web service. We use a Universal Description, Discovery and Integration (UDDI)-compliant web service registry.
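Both abstracts above match the parameters of two services by building a bipartite graph and running a Ford-Fulkerson-style computation over it. A compact sketch of that idea on unit-capacity edges (maximum bipartite matching by augmenting paths) is shown below; the compatibility test and the parameter lists are invented.

```python
# Illustrative sketch: match the parameters of a failed service against those
# of a candidate service as a bipartite graph, then compute a maximum matching
# with the augmenting-path idea underlying Ford-Fulkerson (unit capacities).
# The compatibility rule and parameter lists are invented examples.

def compatible(p: str, q: str) -> bool:
    """Stand-in for a real semantic comparison of two parameter concepts."""
    return p.lower() == q.lower()

def maximum_matching(left, right):
    match = {}                       # right parameter -> left parameter

    def augment(u, visited):
        for v in right:
            if compatible(u, v) and v not in visited:
                visited.add(v)
                if v not in match or augment(match[v], visited):
                    match[v] = u
                    return True
        return False

    return sum(augment(u, set()) for u in left)

failed_outputs = ["Price", "Currency", "DeliveryDate"]
candidate_outputs = ["price", "currency", "weight"]
matched = maximum_matching(failed_outputs, candidate_outputs)
print(f"matching rate: {matched}/{len(failed_outputs)}")   # 2/3
```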
Article
Research on the quality of data in a structural calculation document (SCD) is lacking, although the SCD of a bridge is used as an essential reference during the entire lifecycle of the facility. XML Schema matching enables qualitative improvement of the stored data. This study aimed to enhance the applicability of XML Schema matching, which improves the speed and quality of information stored in bridge SCDs. First, the authors proposed a method of reducing the computing time for the schema matching of bridge SCDs. The computing speed of schema matching was increased by 13 to 1800 times by reducing the checking process of the correlations. Second, the authors developed a heuristic solution for selecting the optimal weight factors used in the matching process to maintain a high accuracy by introducing a decision tree. The decision tree model was built using the content elements stored in the SCD, design companies, bridge types, and weight factors as input variables, and the matching accuracy as the target variable. The inverse-calculation method was applied to extract the weight factors from the decision tree model for high-accuracy schema matching results.
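A rough sketch of the weight-selection idea described above: fit a decision tree that predicts matching accuracy from schema/bridge attributes and the weight factors, then search candidate weight factors through the fitted tree. The features, data values, and grid are invented, and the authors' actual inverse-calculation procedure is not reproduced.

```python
# A rough sketch of the idea described above (not the authors' pipeline):
# learn accuracy = f(bridge type, design company, weight factors) with a
# decision tree, then search candidate weight factors through the fitted tree
# and keep the setting with the highest predicted matching accuracy.
# All data values here are invented.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# columns: bridge_type_id, company_id, w_name, w_structure, w_datatype
X = np.array([[0, 0, 0.6, 0.3, 0.1],
              [0, 1, 0.4, 0.4, 0.2],
              [1, 0, 0.5, 0.2, 0.3],
              [1, 1, 0.3, 0.5, 0.2]])
y = np.array([0.91, 0.84, 0.88, 0.79])        # observed matching accuracy

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)

# "Inverse" use of the model: try a grid of weight factors for a given
# bridge type / company and keep the best-scoring combination.
candidates = [(w1, w2, round(1 - w1 - w2, 2))
              for w1 in np.arange(0.2, 0.7, 0.1)
              for w2 in np.arange(0.1, 0.6, 0.1) if w1 + w2 < 1]
best = max(candidates, key=lambda w: tree.predict([[0, 0, *w]])[0])
print("suggested weight factors:", best)
```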
Article
Information retrieval has a well-established tradition of performing laboratory experiments on test collections to compare the relative effectiveness of different retrieval approaches. The experimental design specifies the evaluation criterion to be used to determine if one approach is better than another. Retrieval behavior is sufficiently complex to be difficult to summarize in one number, so many different effectiveness measures have been proposed. A concept model is implicitly possessed by users and is generated from their background knowledge. This model learns ontological user profiles from both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against benchmark models in web information gathering. The results show that this ontology model is successful.
Chapter
This chapter introduces the reader to Part IV of the book, proposing and discussing a hybrid approach that may serve, not only to synthesize and represent knowledge obtained from the data, but also to explore possible future online learning environment (OLE) states, given different management, policy or environmental scenarios. Pragmatically, this chapter explores the potentiality of the quality of collaboration (QoC) within an Internet-based computer-supported collaborative learning environment and quality of interaction (QoI) with a LMS, both involving fuzzy logic-based modeling, as vehicles to improve the personalization and intelligence of an OLE. Furthermore, QoC and QoI can form the basis for a more pragmatic approach of OLEs under the perspective of semantic Web 3.0, within the context of Higher Education. Finally, a potential case study of the examined hybrid modeling, referring to the “i-TREASURES” European FP7 Programme, is discussed, to explore its applicability and functionality under pragmatic learning scenarios.
Chapter
Methods for the automatic extraction of taxonomies and concept hierarchies from data have recently emerged as essential assistance for humans in ontology construction. The objective of this chapter is to show how the extraction of concept hierarchies and finding relations between them can be effectively coupled with a multi-label classification task. The authors introduce a data mining system which performs classification and addresses both issues by means of association rule mining. The proposed system has been tested on two real-world datasets with the class labels of each dataset coming from two different class hierarchies. Several experiments on hierarchy extraction and concept relation were conducted in order to evaluate the system and three different interestingness measures were applied, to select the most important relations between concepts. One of the measures was developed by the authors. The experimental results showed that the system is able to infer quite accurate concept hierarchies and associations among the concepts. It is therefore well suited for classification-based reasoning.
Chapter
A multitude of approaches to match, merge, and integrate ontologies, and more recently, to interlink RDF data sets, have been proposed over the past years, making ontology alignment one of the most active and at the same time mature area of research and development in semantic technologies. While advances in the area cannot be contested, it is equally true that full automation of the ontology-alignment process is far from being feasible; human input is often indispensable for the bootstrapping of the underlying methods, and for the validation of the results. The question of acquiring and leveraging such human input remains largely unaddressed, in particular when it comes to the incentives and motivators that are likely to make users invest their valuable time and effort in alignment tasks such as entity interlinking and schema matching, which can be domain-knowledge-intensive, technical, or both. In this chapter, the authors present SpotTheLink, a game whose purpose addresses this challenge, demonstrating how knowledge-intensive tasks in the area of the Semantic Web can be collaboratively solved by a community of non-experts in an entertaining fashion.
Chapter
This chapter studies what semantic technologies can bring to the e-business domain and how they can be applied to it. After an overview of the goals to be achieved by e-business applications, a large panel of existing e-business standards is detailed, with a specific focus on B2B (Business to Business) standards and their current modus operandi. Furthermore, some of the most relevant e-business ontologies are also presented. Next, the chapter argues that the use of semantic technologies will simplify the automatic management of many e-business partnerships. However, the construction of ontologies brings a new level of complexity that might be eased by automating a large part of the generation process. For this purpose, the Janus system was developed: a prototype that helps with the automatic derivation of ontologies from XML Schemas, the de-facto format adopted in e-business standard applications. Differently from existing systems, it permits the automatic retrieval of conceptual knowledge from large XML corpus sources and is based on the use of the Semantic Data Model for Ontology (SDMO), whose advantages are presented in this chapter.
Chapter
Yannis Kalfoglou and Bo Hu argue for the use of a streamlined approach to integrate semantic integration systems. The authors elaborate on the abundance and diversity of semantic integration solutions and how this impairs strict engineering practice and ease of application. The versatile and dynamic nature of these solutions comes at a price: they are not working in sync with each other neither is it easy to align them. Rather, they work as standalone systems often leading to diverse and sometimes incompatible results. Hence the irony that we might need to address the interoperability issue of tools tackling information interoperability. Kalfoglou and Hu also report on an exemplar case from the field of ontology mapping where systems that used seemingly similar integration algorithms and data, yield different results which are arbitrary formatted and annotated making interpretation and reuse of the results difficult. This makes it difficult to apply semantic integration solutions in a principled manner. The authors argue for a holistic approach to streamline and glue together different integration systems and algorithms. This will bring uniformity of results and effective application of the semantic integration solutions. If the proposed streamlining respects design principles of the underlying systems, then the engineers will have maximum configuration power and tune the streamlined systems in order to get uniform and well understood results. The authors propose a framework for building such streamlined system based on engineering principles and an exemplar, purpose built system, CROSI Mapping System (CMS), which targets the problem of ontology mapping.
Article
Full-text available
In the era of mobile big data, data-driven intelligent Internet of Things (IoT) applications are becoming widespread, and knowledge-based reasoning is one of the essential tasks of these applications. While most knowledge-based reasoning work is conducted with knowledge graphs, ontology-based reasoning methods can inherently achieve higher-level intelligence by leveraging both explicit and tacit knowledge in specific domains, and their performance is determined by precise refinement of the inference rules. However, most ontology-based reasoning work concentrates on semantic reasoning in a single ontology and fails to utilize associations between multiple ontologies in various domains to extend reasoning capacity. This is even the case for IoT applications, where knowledge from multiple domains needs to be utilized. To overcome this issue, we propose a deep learning-based method to associate multiple ontology rule bases and thereby discover new inference rules. In our method, we first use a regression tree model to determine the threshold values for parameters in the inference rules that comprise the ontology rule base, avoiding the influence of uncertainty factors on knowledge reasoning results. Then, a two-way GRU (Gated Recurrent Unit) neural network with an attention mechanism is used to discover semantic relations among the rule bases of the ontologies. Thus, the association of multiple ontology rule bases is realized, and the rule base for knowledge reasoning is expanded by acquiring previously unspecified rules. To the best of our knowledge, this work is the first to leverage deep learning in reasoning with multiple ontologies. In order to verify the effectiveness of our method, we apply it in a real traffic safety monitoring application by relating the rule bases of a vehicle ontology and a traffic management ontology, and achieve effective knowledge reasoning.
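A minimal PyTorch sketch of the building block named in the abstract: a bidirectional (two-way) GRU whose hidden states are pooled with an attention layer, here used to embed two tokenized rules and score their relatedness. The vocabulary size, dimensions, and inputs are placeholders, not the paper's setup.

```python
# Minimal sketch of a bidirectional GRU with attention pooling, used here to
# produce an embedding for a tokenized inference rule. Vocabulary, dimensions,
# and the example inputs are placeholders, not the paper's configuration.
import torch
import torch.nn as nn

class AttentiveBiGRU(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True,
                          bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)   # one score per time step

    def forward(self, token_ids):                  # (batch, seq_len)
        h, _ = self.gru(self.embed(token_ids))     # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)
        return (weights * h).sum(dim=1)            # attention-pooled rule vector

model = AttentiveBiGRU()
rule_a = torch.randint(0, 1000, (1, 12))           # two toy "rules"
rule_b = torch.randint(0, 1000, (1, 12))
similarity = torch.cosine_similarity(model(rule_a), model(rule_b))
print(similarity.item())
```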
Article
Full-text available
There is a growing awareness that the complexity of managing Big Data is one of the main challenges in the developing field of the Internet of Things (IoT). Complexity arises from several aspects of the Big Data life cycle, such as gathering data, storing them onto cloud servers, cleaning and integrating the data, a process involving the last advances in ontologies, such as Extensible Markup Language (XML) and Resource Description Framework (RDF), and the application of machine learning methods to carry out classifications, predictions, and visualizations. In this review, the state of the art of all the aforementioned aspects of Big Data in the context of the Internet of Things is exposed. The most novel technologies in machine learning, deep learning, and data mining on Big Data are discussed as well. Finally, we also point the reader to the state-of-the-art literature for further in-depth studies, and we present the major trends for the future.
Article
Full-text available
In the field of learning, we are witnessing the introduction of more and more new environments in order to better meet the specific needs of the main actors of the process. The shift from face-to-face learning to distance learning or e-learning has overcome some of the challenges of availability, location, and prerequisites, but has been rapidly impacted by the development of mobile technology. As a result, m-learning appeared and quickly evolved into p-learning. The arrival of the "Open Software" concept has given birth to several "open-something" initiatives, among which are the Open Educational Resource (OER) and the Massive Online Open Course (MOOC). These learning resources have also made progress, although they are fairly recent. Admittedly, this diversity of environments offers a wealth and a multitude of pedagogical resources. However, the question of capitalizing on the contents, knowledge and know-how of each of these environments arises. How can the exchange and reuse of pedagogical resources be guaranteed between these different learning environments? In other words, how can the interoperability of these resources be guaranteed? In order to contribute to the creation of a pedagogical heritage, we propose to design a case-based system allowing the author, when creating a course in a particular context and environment, to exploit the resources that are already available. The goal is to put in place an intelligent production system based on case-based reasoning. It is based on four phases ranging from indexing to reuse, through similarity measurement and evaluation. In the first part, we detail the evolution of learning environments. In the second part, we review the existing course production platforms, their principles and their challenges. In the third part, we present case-based reasoning systems, and then we introduce our target system.
Conference Paper
In this paper, we propose a new way for the ontology alignment formalization using the Kripke structure in order to be able to exploiting a reliable tool as the Model checking. This later is a powerful mechanism for system verification. Here, the Kripke structures are used to model the behaviors of the ontology of alignment.
Article
Mining frequent patterns is widely used in many applications such as supermarkets, diagnostics, and other real-time applications. The performance of an algorithm is judged by its computation cost, and computing frequent patterns is a tedious mining task. Many algorithms and techniques have been implemented and studied to produce high-performance algorithms, such as PrePost+, which employs the N-list to represent itemsets and directly discovers frequent itemsets using a set-enumeration search tree. But due to its pruning strategy, its computation time for processing the search space is high. It enumerates all itemsets from the datasets by the principle of exhaustion and does not sort them based on utility, giving only statistical evidence of the most recurring itemsets. In this paper, we propose the Enhanced Ontologies based Alignment Algorithm (EOBAA) to identify, extract, and sort out the high-utility itemsets (HUIs) from the frequent itemsets (FIs). To improve the similarity measure, the proposed system adopts cosine similarity. The experiments were conducted on one real dataset and show the performance of EOBAA in terms of computation time and accuracy.
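The abstract adopts cosine similarity as its similarity measure. A small sketch of cosine similarity between two itemsets represented as count vectors follows; the itemsets are toy data, not the paper's.

```python
# Cosine similarity between two itemsets represented as count vectors over a
# shared item vocabulary; the itemsets below are toy data.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

itemset_1 = Counter({"milk": 3, "bread": 2, "butter": 1})
itemset_2 = Counter({"milk": 1, "bread": 2, "jam": 2})
print(round(cosine(itemset_1, itemset_2), 3))   # ~0.624
```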
Article
Folksonomy is nowadays considered one of the most prominent trends on the Internet: it is a fertile branch for growth and an essential part of Web 2.0 applications, referring in part to the ability of Internet users to add to, change, and update the contents of the World Wide Web (1). Credit for devising the folksonomy approach goes to the well-known information architect Thomas Vander Wal, who coined the term folksonomy and noted that a folksonomy arises when users add keywords or tags of their own choosing to items on the World Wide Web, which can then be used for search and information retrieval purposes.
Article
Purpose This paper aims to apply the vector space model (VSM)-PCR model to compute the similarity of fault zone ontology semantics, which verifies the feasibility and effectiveness of applying the VSM-PCR method to the uncertainty mapping of ontologies. Design/methodology/approach The authors first define the concept of uncertainty ontology and then propose the method of ontology mapping. The proposed method fully considers the properties of the ontology in measuring the similarity of concepts. It expands the single VSM of concept meaning or instance set to a three-dimensional "meaning, properties, instance" VSM and uses membership degree or correlation to express the level of uncertainty. Findings The method provides relatively better accuracy, which verifies the feasibility and effectiveness of the VSM-PCR method in treating the uncertainty mapping of ontologies. Research limitations/implications Future work will focus on exploring the similarity measures and combination methods in every dimension. Originality/value This paper presents an uncertain mapping method for ontology concepts based on a three-dimensional combination-weighted VSM, namely VSM-PCR. It expands the single VSM of concept meaning or instance set to a three-dimensional "meaning, properties, instance" VSM. The model uses membership degree or correlation to express the degree of uncertainty; as a result, a three-dimensional VSM is obtained. The authors finally provide an example to verify the feasibility and effectiveness of the VSM-PCR method in treating the uncertainty mapping of ontologies.
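A minimal sketch of the three-dimensional combination the abstract describes: a similarity score per dimension (meaning, properties, instances) merged with weights into one concept similarity. The per-dimension scores and the weights are placeholders, not the paper's values.

```python
# Sketch of the three-dimensional weighted combination described above: each
# dimension (meaning, properties, instances) is scored separately and the
# scores are merged with weights. All numbers below are placeholders.

def combined_concept_similarity(sim_by_dim: dict, weights: dict) -> float:
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[d] * sim_by_dim[d] for d in weights)

sim_by_dim = {"meaning": 0.82, "properties": 0.60, "instances": 0.45}
weights    = {"meaning": 0.5,  "properties": 0.3,  "instances": 0.2}
print(combined_concept_similarity(sim_by_dim, weights))   # 0.68
```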
Chapter
This paper presents the decisions taken during the implementation of DSSim (DSSim stands for Similarity based on Dempster-Shafer) our multi-agent ontology mapping system. It describes several types of agents and their roles in the DSSim architecture. These agents are mapping agents which are able to perform either semantic or syntactic similarity. Our architecture is generic as no mappings need to be learned in advance and it could be easily extended by adding new mapping agents in the framework. The new added mapping agents could run different similarity algorithms (either semantic or syntactic). In this way, DSSim could assess which algorithm has a better performance. Additionally, this paper presents the algorithms used in our ontology alignment system DSSim.
Conference Paper
After years of research and development, standards and technologies for semantic data are sufficiently mature to be used as the foundation of novel data science projects that employ semantic technologies in various application domains such as bio-informatics, materials science, criminal intelligence, and social science. Typically, such projects are carried out by domain experts who have a conceptual understanding of semantic technologies but lack the expertise to choose and to employ existing data management solutions for the semantic data in their project. For such experts, including domain-focused data scientists, project coordinators, and project engineers, our tutorial delivers a practitioner's guide to semantic data management. We discuss the following important aspects of semantic data management and demonstrate how to address these aspects in practice by using mature, production-ready tools: i) storing and querying semantic data; ii) understanding, iii) searching, and iv) visualizing the data; v) automated reasoning; vi) integrating external data and knowledge; and vii) cleaning the data.
Article
Full-text available
Ontology mapping indicates the semantic interconnection between the concepts of ontologies, while multi-domain ontology mapping is usually used to solve the semantic interconnection problem between domain ontologies. However, due to differences in definition approaches, a certain degree of heterogeneity exists among domain ontologies. This paper proposes a probability-based and similarity-based ontology mapping algorithm, the purpose of which is to calculate the similarity between the concepts of multi-domain ontologies. Using the ESA algorithm based on Wikipedia and the principle that the similarity between concepts with the same name equals 1, the paper proposes a new concept, the ontology mapping association graph, to represent mapping results. The experiments show that the accuracy of the probability-based and similarity-based ontology mapping algorithm can reach 80% on two Chinese test sets, WordSimilarity-353 and Words-240. Compared with other algorithms, it stands out in terms of accuracy.
Article
Full-text available
Researchers in the ontology-design field have developed the content for ontologies in many domain areas. Recently, ontologies have become increasingly common on the World- Wide Web where they provide semantics for annotations in Web pages. This distributed nature of ontology development has led to a large number of ontologies covering overlapping domains. In order for these ontologies to be reused, they first need to be merged or aligned to one another. The processes of ontology alignment and merging are usually handled manually and often constitute a large and tedious portion of the sharing process. We have developed and implemented PROMPT, an algorithm that provides a semi-automatic approach to ontology merging and alignment. PROMPT performs some tasks automatically and guides the user in performing other tasks for which his intervention is required. PROMPT also determines possible inconsistencies in the state of the ontology, which result from the user's actions, and suggests ways to remedy these inconsistencies. PROMPT is based on an extremely general knowledge model and therefore can be applied across various platforms. Our formative evaluation showed that a human expert followed 90% of the suggestions that PROMPT generated and that 74% of the total knowledge-base operations invoked by the user were suggested by PROMPT.
Article
Full-text available
Researchers in the ontology-design field have developed the content for ontologies in many domain areas. Recently, ontologies have become increasingly common on the World- Wide Web where they provide semantics for annotations in Web pages. This distributed nature of ontology development has led to a large number of ontologies covering overlapping domains, which researchers now need to merge or align to one another. The processes of ontology alignment and merging are usually handled manually and often constitute a large and tedious portion of the sharing process. We have developed and implemented Anchor-PROMPT—an algorithm that finds semantically similar terms automatically. Anchor-PROMPT takes as input a set of anchors—pairs of related terms defined by the user or automatically identified by lexical matching. Anchor- PROMPT treats an ontology as a graph with classes as nodes and slots as links. The algorithm analyzes the paths in the subgraph limited by the anchors and determines which classes frequently appear in similar positions on similar paths. These classes are likely to represent semantically similar concepts. Our experiments show that when we use Anchor-PROMPT with ontologies developed independently by different groups of researchers, 75% of its results are correct.
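A much-simplified sketch of the path-comparison idea described above (not the Anchor-PROMPT implementation): walk equal-length paths between two anchor pairs in both ontology graphs and give credit to class pairs that occupy the same position on those paths. The graphs and anchors are toy data.

```python
# Simplified illustration of the path-analysis idea described above: classes
# that appear in the same position on equal-length paths between anchors in
# the two ontologies accumulate similarity credit. Toy graphs and anchors.
from collections import Counter
import networkx as nx

onto_a = nx.DiGraph([("Person", "Employee"), ("Employee", "Manager"),
                     ("Person", "Student")])
onto_b = nx.DiGraph([("Human", "Worker"), ("Worker", "Boss"),
                     ("Human", "Pupil")])
anchors = [("Person", "Human"), ("Manager", "Boss")]   # user-provided pairs

scores = Counter()
(src_a, src_b), (dst_a, dst_b) = anchors
for path_a in nx.all_simple_paths(onto_a, src_a, dst_a):
    for path_b in nx.all_simple_paths(onto_b, src_b, dst_b):
        if len(path_a) == len(path_b):                 # compare similar paths
            for cls_a, cls_b in zip(path_a[1:-1], path_b[1:-1]):
                scores[(cls_a, cls_b)] += 1            # same position -> credit

print(scores.most_common())   # e.g. [(('Employee', 'Worker'), 1)]
```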
Article
Full-text available
This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation's crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory.
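A compact sketch of the basic stacking scheme described above, using scikit-learn: the out-of-fold predictions of the level-0 generalizers become the training inputs of a level-1 generalizer. The particular models, folds, and data are arbitrary illustrative choices.

```python
# Compact sketch of stacked generalization: out-of-fold guesses of the level-0
# generalizers become the training inputs of a level-1 generalizer.
# Models, folds, and data are arbitrary choices for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

level0 = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

# Level-1 training set: one column of out-of-fold predictions per level-0 model.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5) for m in level0
])

level1 = LogisticRegression().fit(meta_features, y)

# At prediction time, level-0 models (refit on all data) feed the level-1 model.
fitted = [m.fit(X, y) for m in level0]
new_meta = np.column_stack([m.predict(X[:5]) for m in fitted])
print(level1.predict(new_meta))
```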
Conference Paper
Full-text available
Researchers in the ontology-design field have developed the content for ontologies in many domain areas. Recently, ontologies have become increasingly common on the World- Wide Web where they provide semantics for annotations in Web pages. This distributed nature of ontology development has led to a large number of ontologies covering overlapping domains. In order for these ontologies to be reused, they first need to be merged or aligned to one another. The processes of ontology alignment and merging are usually handled manually and often constitute a large and tedious portion of the sharing process. We have developed and implemented PROMPT, an algorithm that provides a semi-automatic approach to ontology merging and alignment. PROMPT performs some tasks automatically and guides the user in performing other tasks for which his intervention is required. PROMPT also determines possible inconsistencies in the state of the ontology, which result from the user's actions, and suggests ways to remedy these inconsistencies. PROMPT is based on an extremely general knowledge model and therefore can be applied across various platforms. Our formative evaluation showed that a human expert followed 90% of the suggestions that PROMPT generated and that 74% of the total knowledge-base operations invoked by the user were suggested by PROMPT.
Article
Full-text available
The Semantic Web relies heavily on the formal ontologies that structure underlying data for the purpose of comprehensive and transportable machine understanding. Therefore, the success of the Semantic Web depends strongly on the proliferation of ontologies, which requires fast and easy engineering of ontologies and avoidance of a knowledge acquisition bottleneck. Ontology Learning greatly facilitates the construction of ontologies by the ontology engineer. The vision of ontology learning that we propose here includes a number of complementary disciplines that feed on different types of unstructured, semi-structured and fully structured data in order to support a semi-automatic, cooperative ontology engineering process. Our ontology learning framework proceeds through ontology import, extraction, pruning, refinement, and evaluation giving the ontology engineer a wealth of coordinated tools for ontology modeling. Besides of the general framework and architecture, we show in thi...
Article
Full-text available
Schema matching is a basic problem in many database application domains, such as data integration, e-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
Article
Full-text available
Without semantically enriched content, the Web cannot reach its full potential. The authors discuss tools and techniques for generating and processing such content, thus setting a foundation upon which to build the Semantic Web. The authors put a Semantic Web language through its paces and answer questions about how people can use it, such as: how do authors generate semantic descriptions; how do agents discover these descriptions; how can agents integrate information from different sites; and how can users query the Semantic Web.
Article
Full-text available
We here give a short overview and research challenges with respect to the combination of machine learning research with Semantic Web research: extraction of ontologies from existing data on the Web. The task of extracting ontologies is a typical re-engineering task. In general one may roughly distinguish between existing ontologies (such as thesauri, lexical-semantic nets), schemata (such as relational database, web schemata), instances (in data- and knowledge bases), semi-structured data (e.g. in the form of XML documents), and natural language documents. Each of these different kinds of data requires its specific import and processing techniques and learning algorithms. To derive ontologies from existing data on the Web a common picture and framework for re-engineering existing data as given in [8] is required. The integration of multiple resources seems to be a promising approach for the difficult task of extracting ontologies from the existing Web data. Extr
Article
Full-text available
The thesis describes the application of the relaxation labelling algorithm to NLP disambiguation. Language is modelled through context constraints inspired by Constraint Grammars. The constraints enable the use of a real value stating "compatibility". The technique is applied to POS tagging, Shallow Parsing and Word Sense Disambiguation. Experiments and results are reported. The proposed approach enables the use of multi-feature constraint models, the simultaneous resolution of several NL disambiguation tasks, and the collaboration of linguistic and statistical models.
Article
Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a ‘black art’ in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones. We demonstrate the effectiveness of stacked generalization for combining three different types of learning algorithms for classification tasks. We also compare the performance of stacked generalization with majority vote and published results of arcing and bagging.
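A compact sketch of that recommendation, feeding the level-0 learners' class-probability estimates (confidences) rather than their hard predictions to the level-1 combiner; the dataset and learners are again arbitrary illustrative choices.

```python
# Sketch of confidence-based stacking: the level-1 combiner is trained on
# the level-0 learners' per-class probabilities, not their hard guesses.
# Dataset and learners are illustrative assumptions only.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
level0 = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

# method="predict_proba" yields each learner's per-class confidence.
meta_tr = np.hstack(
    [cross_val_predict(c, X_tr, y_tr, cv=5, method="predict_proba") for c in level0]
)
for c in level0:
    c.fit(X_tr, y_tr)
level1 = LogisticRegression(max_iter=1000).fit(meta_tr, y_tr)

meta_te = np.hstack([c.predict_proba(X_te) for c in level0])
print("confidence-based stacking accuracy:", level1.score(meta_te, y_te))
```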
Conference Paper
A data-integration system provides access to a multitude of data sources through a single mediated schema. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the source schemas and the mediated schema. We describe LSD, a system that employs and extends current machine-learning techniques to semi-automatically find such mappings. LSD first asks the user to provide the semantic mappings for a small set of data sources, then uses these mappings together with the sources to train a set of learners. Each learner exploits a different type of information either in the source schemas or in their data. Once the learners have been trained, LSD finds semantic mappings for a new data source by applying the learners, then combining their predictions using a meta-learner. To further improve matching accuracy, we extend machine learning techniques so that LSD can incorporate domain constraints as an additional source of knowledge, and develop a novel learner that utilizes the structural information in XML documents. Our approach thus is distinguished in that it incorporates multiple types of knowledge. Importantly, its architecture is extensible to additional learners that may exploit new kinds of information. We describe a set of experiments on several real-world domains, and show that LSD proposes semantic mappings with a high degree of accuracy.
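The sketch below is a hypothetical stand-in for a single content-based base learner of the kind described above, not the LSD system itself: a Naive Bayes text classifier is trained on data values labeled with mediated-schema elements and then used to guess labels for a new source's column values. All schema names and data values are made up.

```python
# Hypothetical sketch of one content-based base learner (not LSD itself):
# a Naive Bayes text classifier over data values, labeled with
# mediated-schema elements.  All names and data below are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Training data: (data value, mediated-schema element) pairs taken from
# sources whose mappings the user has already provided.
values = ["Alice Smith", "Bob Jones", "206-555-0199", "425-555-0123",
          "Seattle WA 98105", "Portland OR 97201"]
labels = ["contact-name", "contact-name", "phone", "phone",
          "address", "address"]

matcher = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 3)),  # character n-grams
    MultinomialNB(),
)
matcher.fit(values, labels)

# A new source's column, described by a sample of its data values; a
# column-level guess could simply aggregate the per-value predictions.
new_column = ["503-555-0188", "212-555-0110"]
print(matcher.predict(new_column))        # expected: ['phone', 'phone']
```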
Article
A large class of problems can be formulated in terms of the assignment of labels to objects. Frequently, processes are needed which reduce ambiguity and noise, and select the best label among several possible choices. Relaxation labeling processes are just such a class of algorithms. They are based on the parallel use of local constraints between labels. This paper develops a theory to characterize the goal of relaxation labeling. The theory is founded on a definition of consistency in labelings, extending the notion of constraint satisfaction. In certain restricted circumstances, an explicit functional exists that can be maximized to guide the search for consistent labelings. This functional is used to derive a new relaxation labeling operator. When the restrictions are not satisfied, the theory relies on variational calculus. It is shown that the problem of finding consistent labelings is equivalent to solving a variational inequality. A procedure nearly identical to the relaxation operator derived under restricted circumstances serves in the more general setting. Further, a local convergence result is established for this operator. The standard relaxation labeling formulas are shown to approximate our new operator, which leads us to conjecture that successful applications of the standard methods are explainable by the theory developed here. Observations about convergence and generalizations to higher order compatibility relations are described.
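A simplified form of the standard relaxation labeling update analysed here can be sketched as follows: each object's label probabilities are repeatedly pushed toward labels supported by neighbouring objects. The two-object compatibility matrix and the initial label probabilities below are made-up toy values.

```python
# Simplified sketch of the standard relaxation labelling update:
#   q[i, a] = sum_j sum_b r[i, j, a, b] * p[j, b]     (support for label a on object i)
#   p[i, a] <- p[i, a] * (1 + q[i, a]), renormalised per object.
# Compatibilities and initial probabilities are made-up toy values.
import numpy as np

n_objects, n_labels = 2, 2
# r[i, j, a, b]: compatibility of label a on object i with label b on object j
r = np.zeros((n_objects, n_objects, n_labels, n_labels))
r[0, 1] = r[1, 0] = np.array([[ 0.8, -0.8],    # equal labels support each other,
                              [-0.8,  0.8]])   # different labels inhibit each other

p = np.array([[0.6, 0.4],        # initial (ambiguous) label probabilities
              [0.5, 0.5]])

for _ in range(20):
    q = np.einsum("ijab,jb->ia", r, p)     # support from neighbouring objects
    p = p * (1.0 + q)                      # reward well-supported labels
    p = p / p.sum(axis=1, keepdims=True)   # renormalise per object
print(np.round(p, 3))   # both objects converge toward the same label
```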
Conference Paper
At the heart of many data-intensive applications is the problem of quickly and accurately transforming data into a new form. Database researchers have long advocated the use of declarative queries for this process. Yet tools for creating, managing and understanding the complex queries necessary for data transformation are still too primitive to permit widespread adoption of this approach. We present a new framework that uses data examples as the basis for understanding and refining declarative schema mappings. We identify a small set of intuitive operators for manipulating examples. These operators permit a user to follow and refine an example by walking through a data source. We show that our operators are powerful enough both to identify a large class of schema mappings and to distinguish effectively between alternative schema mappings. These operators permit a user to quickly and intuitively build and refine complex data transformation queries that map one data source into another.
Article
It is shown that the relaxation labelling process of Rosenfeld, Hummel and Zucker is a suboptimal minimization of a cost function measuring inconsistency and ambiguity. Two new algorithms which minimize this cost function more efficiently are introduced. Finally, some general comments on relaxation are presented.
Article
Integration and grounding are key AI challenges for human-robot dialogue. The author and his team are tackling these issues using language games and have experimented with them on progressively more complex platforms. A language game is a sequence of verbal interactions between two agents situated in a specific environment. Language games both integrate the various activities required for dialogue and ground unknown words or phrases in a specific context, which helps constrain possible meanings.
Article
In this paper, we give an overview of a system (CAIMAN) that can facilitate the exchange of relevant documents between geographically dispersed people in Communities of Interest. The nature of Communities of Interest prevents the creation and enforcement of a common organizational scheme for documents, to which all community members adhere. Each community member organizes her documents according to her own categorization scheme (ontology). CAIMAN exploits this personal ontology, which is essentially the perspective of a user on a domain, for information retrieval. Related documents are retrieved on a concept granularity level from a central community document repository.
Article
To manage information such as ontologies, categorization based on concept hierarchies is commonly used. Such concept hierarchies are maintained individually for each system because of the many differences among them. Consequently, it is difficult to reuse information across computer-based systems. Here, we propose a new concept alignment method for concept hierarchies as a solution to this problem, and construct a system to evaluate the performance of our method. The results of this experiment reveal that the proposed method can induce appropriate alignment rules for concept hierarchies and classify information into appropriate categories within another concept hierarchy.
Article
Integration of knowledge from multiple independent sources presents problems due to their semantic heterogeneity. Careful handling of semantics is important for reliable interaction with autonomous sources. This paper highlights some of the issues involved in automating the process of selective integration and details the techniques to deal with them. The approach taken is semi-automatic in nature focusing on identifying the articulation over two ontologies, i.e., the terms where linkage occurs among the sources. A semantic knowledge articulation tool (SKAT) based on simple lexical and structural matching works well in our experiments and semi-automatically detects the intersection of two web sources. An expert can initially provide both positive and negative matching rules on the basis of which the articulation is to be determined and then override the automatically generated articulation before it is finalized. The articulation may be stored or generated on demand and is used to answe...
Article
. The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier's probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be opti...
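For reference, the decision rule at issue is simply argmax over classes c of P(c) multiplied by the product over attributes of P(x_i | c). A minimal sketch of that rule, with a made-up training table, is below.

```python
# Minimal sketch of the simple Bayesian classifier's decision rule
# argmax_c P(c) * prod_i P(x_i | c), estimated from counts with Laplace
# smoothing.  The tiny training table is made up for illustration.
from collections import Counter, defaultdict

train = [                       # (attribute tuple, class)
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"), "yes"),
    (("overcast", "mild"), "yes"),
]

class_counts = Counter(c for _, c in train)
attr_counts = defaultdict(Counter)           # (position, class) -> value counts
for x, c in train:
    for i, v in enumerate(x):
        attr_counts[(i, c)][v] += 1

def predict(x):
    scores = {}
    for c, nc in class_counts.items():
        p = nc / len(train)                  # prior P(c)
        for i, v in enumerate(x):
            n_values = len({t[0][i] for t in train})
            p *= (attr_counts[(i, c)][v] + 1) / (nc + n_values)  # P(x_i | c)
        scores[c] = p
    return max(scores, key=scores.get)

print(predict(("sunny", "mild")))            # expected: 'no'
```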
Article
One of the basic problems in the development of techniques for the semantic web is the integration of ontologies. In this paper we deal with a situation where we have various local ontologies, developed independently from each other, and we are required to build an integrated, global ontology as a means for extracting information from the local ones. In this context, the problem of how to specify the mapping between the global ontology and the local ontologies is a fundamental one, and its solution is essential for establishing an ontology of integration. Description Logics (DLs) are an ideal candidate to formalize ontologies, due to their ability to express complex relationships between concepts. We argue, however, that, for capturing the mapping between different ontologies, the direct use of a DL, even a very expressive one, is not sufficient, and it is necessary to resort to more flexible mechanisms based on the notion of query. Also, we elaborate on the observation that, in the semantic web, the case of mutually inconsistent local ontologies will be very common, and we present the basic ideas in order to extend the integration framework with suitable nonmonotonic features for dealing with this case.
Article
The next generation of the Web, called the Semantic Web, has to improve the Web with semantic (ontological) page annotations to enable knowledge-level querying and searches. Manual construction of these ontologies will require tremendous effort, which forces future integration of machine learning with knowledge acquisition to enable highly automated ontology learning. In this paper we present the state of the art in the field of ontology learning from the Web to see how it can contribute to the task of semantic Web querying. We consider three components of the query processing system: natural language ontologies, domain ontologies and ontology instances. We discuss the requirements for machine learning algorithms to be applied for the learning of ontologies of each type from Web documents, and survey existing ontology learning and other closely related approaches.
van Rijsbergen, C. J. Information Retrieval. Second Edition. London: Butterworths, 1979.
Lin, D. An Information-Theoretic Definition of Similarity.
Maedche, A. A Machine Learning Perspective for the Semantic Web. Semantic Web Working Symposium (SWWS) Position Paper, 2001.