A labeled example of SR-CRF.

Source publication
Article
Full-text available
Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised-learning-based Chinese ontology that contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the...

Similar publications

Article
Full-text available
Nowadays, there is a huge amount of textual data coming from online social communities like Twitter, as well as encyclopedic data provided by Wikipedia and similar platforms. This Big Data era has created novel challenges in making sense of large data stores and in efficiently finding specific information within them. In a more domain-...

Citations

... In some cases, methods or resources have been proposed to collect and generate training datasets automatically, without the need for manual intervention. As an example, Hu et al. [31] proposed online encyclopedias as a good source for automatically collecting and generating training datasets. ...
Article
Full-text available
With people's widespread access to the Internet and the increasing usage of social networks in all nations, social networks have become a new source for studying cultural similarities and differences. We identified major issues in traditional methods of data collection in cross-cultural studies: difficulty in accessing people from many nations, a limited number of samples, negative effects of translation, positive self-enhancement illusion, and a few unreported problems. These issues either make it difficult to perform a cross-cultural study or have negative impacts on the validity of the final results. In this paper, we propose a framework that aims to calculate the cultural distance among several countries using information and cultural features extracted from social networks. To this end, the framework estimates the distribution of news-oriented tweets for each nation and computes the cultural distance from these sets of distributions. Based on a sample of more than 17 million tweets from late 2017, our framework calculated the cultural distance between 22 countries. Our results show a positive correlation between cultural distances computed by our framework and distances computed from Hofstede's cultural scores, and also identify connections between some of the cultural features.
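The snippet does not spell out the distance measure, but the idea of computing a cultural distance from per-country distributions of news-oriented tweets can be sketched as follows; the use of Jensen-Shannon divergence and the toy topic shares are assumptions for illustration only, not the paper's exact formula.

```python
# Hypothetical sketch: cultural distance as a divergence between per-country
# distributions of news-oriented tweet topics. Jensen-Shannon divergence is
# assumed here for illustration; the paper may use a different measure.
import numpy as np

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy topic-share vectors (politics, sports, religion, economy) per country.
dist = {
    "US": [0.40, 0.30, 0.05, 0.25],
    "IR": [0.35, 0.15, 0.30, 0.20],
}
print(jensen_shannon(dist["US"], dist["IR"]))  # pairwise cultural distance
```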
... In this phase, a number of large-scale Chinese knowledge bases have also emerged, including Zhishi.me [16] and SSCO [17]. ... [20] and SNOMED-CT [21] promote standardization and interoperability for biomedical information systems and services. DrugBank [22] and SIDER [23] contain drug-related information. ...
Article
Full-text available
Background: Diabetes has become one of the hot topics in life science research. To support the analytical procedures, researchers and analysts spend considerable manual effort collecting experimental data, a process that is also error-prone. To reduce the cost and ensure data quality, there is a growing trend of extracting clinical events, in the form of knowledge, from electronic medical records (EMRs). To do so, we first need a high-coverage knowledge base (KB) of a specific disease to support the above extraction tasks, called KB-based extraction. Methods: We propose an approach to build a diabetes-centric knowledge base (a.k.a. DKB) by mining the Web. In particular, we first extract knowledge from semi-structured contents of vertical portals, fuse individual knowledge from each site, and further map it to a unified KB. The target DKB is then extracted from the overall KB using a distance-based Expectation-Maximization (EM) algorithm. Results: During the experiments, we selected eight popular vertical portals in China as data sources to construct DKB. There are 7703 instances and 96,041 edges in the final diabetes KB, covering diseases, symptoms, western medicines, traditional Chinese medicines, examinations, departments, and body structures. The accuracy of DKB is 95.91%. Besides the quality assessment of knowledge extracted from vertical portals, we also carried out detailed experiments evaluating the knowledge-fusion performance as well as the convergence of the distance-based EM algorithm, with positive results. Conclusions: In this paper, we introduced an approach to constructing DKB. A knowledge extraction and fusion pipeline was first used to extract semi-structured data from vertical portals, and individual KBs were further fused into a unified knowledge base. After that, we developed a distance-based Expectation-Maximization algorithm to extract a subset from the overall knowledge base, forming the target DKB. Experiments showed that the data in DKB are rich and of high quality.
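The abstract names a distance-based EM algorithm for carving the diabetes-centric subset out of the overall KB. A minimal sketch of that idea, assuming a two-component 1-D Gaussian mixture over each instance's graph distance to a diabetes seed set (the paper's exact model is not reproduced here):

```python
# Illustrative sketch (not the authors' exact algorithm): a two-component EM
# over graph distances to a diabetes seed set. Instances whose distance is
# better explained by the "in-domain" component are kept for the target DKB.
import numpy as np

def em_1d(distances, iters=50):
    d = np.asarray(distances, dtype=float)
    # Initialize: a near (in-domain) component and a far (out-of-domain) one.
    mu = np.array([d.min(), d.max()])
    var = np.array([d.var() + 1e-6] * 2)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each component per instance.
        lik = pi * np.exp(-(d[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = lik / lik.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances.
        nk = resp.sum(axis=0)
        mu = (resp * d[:, None]).sum(axis=0) / nk
        var = (resp * (d[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / len(d)
    return resp[:, 0]  # probability of belonging to the near component

dists = [0.1, 0.2, 0.15, 2.5, 3.0, 0.3, 2.8]  # toy distances to seed concepts
keep = em_1d(dists) > 0.5
print(keep)  # True -> instance goes into the diabetes-centric KB
```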
... Here is a list of tools recommended by the W3C, since it promotes web standards. However, there are others (the Chinese Emotion Ontology (Hu et al., 2014), EMOTIME, Senti-TUT). ...
Conference Paper
Full-text available
The immense contribution of the social web has greatly motivated researchers. This has led to the emergence of techniques that have proven their effectiveness in customized opinion and emotion modeling for applications such as NLP and machine learning. On the other hand, when it comes to interoperability and a unique encoding of opinions and emotions, there are some weaknesses. This has prompted a new research direction that combines work on opinion analysis with work on Linked Data. In this article, we present different solutions and projects along with some of their limitations. The reasons why we believe linked data is important, and how we would like to conduct this research, are also detailed in this article.
... A domain ontology is a formal description of a discourse domain. It typically consists of a finite list of concepts and the relationships between these concepts [4]. When using the domain ontology in an extraction task, we only need to parse the reviews with shallow semantic analysis (tokenization and part-of-speech tagging) and then run heuristic matching algorithms to get the features. ...
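A minimal sketch of this matching idea, assuming NLTK for tokenization and POS tagging and a made-up four-term ontology lexicon (the cited work's actual ontology and heuristics are richer):

```python
# Minimal sketch: POS-tag a review, then look up noun tokens in a
# domain-ontology lexicon to extract product features. The lexicon is
# invented for illustration; assumes nltk plus its 'punkt' and
# 'averaged_perceptron_tagger' data packages are installed.
import nltk

ONTOLOGY_CONCEPTS = {"battery", "screen", "camera", "charger"}  # hypothetical

def extract_features(review):
    tokens = nltk.word_tokenize(review.lower())
    tagged = nltk.pos_tag(tokens)
    # Heuristic: a noun that appears in the ontology is a product feature.
    return [w for w, tag in tagged if tag.startswith("NN") and w in ONTOLOGY_CONCEPTS]

print(extract_features("The battery drains fast and the screen cracked."))
# -> ['battery', 'screen']
```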
... Since the infobox and the navbox are edited by users according to specified forms, they are written with reference to professional dictionaries, books, or research articles [6]. In [4], researchers adopt the titles of documents as terms for concepts and instances and use the infobox modules for attribute learning. [7] describes an autonomous system for refining Wikipedia's infoboxes into an ontology. ...
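To make the infobox-based attribute learning concrete, here is a toy sketch in which the page title serves as the concept term and each "| key = value" row becomes an attribute triple. Real encyclopedia markup (nested templates, references) is messier; the regex below only handles the simple case.

```python
# Toy sketch of infobox-based attribute learning: the page title becomes the
# concept term and each "| key = value" row becomes an attribute triple.
import re

def infobox_triples(title, infobox_text):
    triples = []
    for line in infobox_text.splitlines():
        m = re.match(r"\s*\|\s*([^=]+?)\s*=\s*(.+)", line)
        if m:
            triples.append((title, m.group(1), m.group(2).strip()))
    return triples

raw = """{{Infobox city
| country = China
| population = 24,870,895
}}"""
print(infobox_triples("Shanghai", raw))
# -> [('Shanghai', 'country', 'China'), ('Shanghai', 'population', '24,870,895')]
```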
... [6] and [10] derive patterns from PoS-tagging and syntactic-parsing results, such as NN (Noun-Noun) and S-V-O (Subject-Verb-Object). In [4], researchers summarize several synonymous-relationship patterns by calculating the frequency of certain phrases within 1000 sentences. However, linguistic patterns rely heavily on the individual language. ...
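The frequency-based pattern selection in [4] can be illustrated with a small sketch: count how often each candidate synonymy phrase occurs in a sentence sample and rank the patterns. The English patterns below are illustrative stand-ins for the Chinese phrases the paper actually counts.

```python
# Rough sketch of ranking candidate synonymy patterns by corpus frequency,
# in the spirit of counting phrases over a sample of sentences.
from collections import Counter

PATTERNS = ["also known as", "also called", "referred to as", "abbreviated as"]

def rank_patterns(sentences):
    counts = Counter()
    for s in sentences:
        for p in PATTERNS:
            if p in s.lower():
                counts[p] += 1
    return counts.most_common()

corpus = [
    "NLP, also known as natural language processing, is a field of AI.",
    "The disease is also called diabetes mellitus.",
    "SVM, also known as support vector machine, is a classifier.",
]
print(rank_patterns(corpus))
# -> [('also known as', 2), ('also called', 1)]
```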
Conference Paper
With the rapid development of e-commerce in China, the quality of products on online shopping platforms has caused wide concern. Customer reviews, written by people who bought the product, have become one of the most important resources for analyzing a product's quality risk. We can get fine-grained, aspect-oriented risk information about a product by mining its reviews. Unfortunately, people tend to write reviews with casual grammar or simply omit parts of a sentence. Both of these features have negative impacts when parsing raw customer reviews directly. Thus a knowledge base built entirely independently of the reviews can be used to analyze them despite the drawbacks above. In this paper, we generate a domain ontology from raw text in an online encyclopedia. It can be viewed as a graph whose nodes represent domain concepts and whose edges represent the relations between these concepts. In our work, we integrate syntactic tree structure into linear-chain CRFs for recognizing domain concepts and train SVM and MaxEnt models on elaborate features for classifying three types of relationships, namely "Attribute-of", "Part-of", and "Instance-of". Once the ontology has been built, product properties with potential risk are extracted by our matching method. Experiments show that our approach achieves 64.4% precision and 82.4% recall on the risky-property extraction task.
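A hedged sketch of the concept-recognition step described here: a linear-chain CRF over token features, with a placeholder feature standing in for the syntactic-tree information the paper integrates. It uses the sklearn-crfsuite package; the feature set and toy data are invented for illustration.

```python
# Sketch of domain-concept recognition with a linear-chain CRF. The
# "tree_depth" feature is a stand-in for syntactic-tree features (an
# assumption); training data here is a single toy sentence.
import sklearn_crfsuite

def token_features(sent, i):
    word = sent[i][0]
    return {
        "word": word.lower(),
        "is_first": i == 0,
        "prev_word": sent[i - 1][0].lower() if i > 0 else "<BOS>",
        # Placeholder for features derived from a parse tree (assumption):
        "tree_depth": sent[i][1],
    }

# Toy training data: (word, tree_depth) pairs with BIO concept labels.
train_sents = [[("battery", 3), ("life", 3), ("is", 2), ("short", 2)]]
train_labels = [["B-CONCEPT", "I-CONCEPT", "O", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_labels)
print(crf.predict(X))
```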
... In the education domain, knowledge points are the basic elements, and the relationships between them are the foundation. Hence, automatic extraction of knowledge is the key to ontology learning [7]. Generally, there are three approaches to automatic knowledge extraction in the field of education: linguistic methods, statistical methods, and hybrid methods [8]. ...
Article
Full-text available
In recent years, Massive Open Online Courses (MOOCs) have become very popular among college students and have a powerful impact on academic institutions. In the MOOC environment, knowledge discovery and knowledge sharing are very important, and they are currently often achieved with ontology techniques. In building ontologies, automatic extraction technology is crucial. Because general text-mining algorithms perform poorly on online course material, we designed an automatic extraction of course knowledge points (AECKP) algorithm for online courses. It includes document classification, Chinese word segmentation, and POS tagging for each document. The Vector Space Model (VSM) is used to calculate similarity and to design weights that adjust the TF-IDF output values; the terms with the highest scores are selected as knowledge points. Course documents for "C programming language" were selected for the experiment in this study. The results show that the proposed approach achieves satisfactory accuracy and recall rates.
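A simplified sketch of the scoring idea: compute TF-IDF over course documents, weight each document's contribution by its VSM cosine similarity to the course centroid, and keep the top-scoring terms as candidate knowledge points. The centroid-based weighting is a guess at how the similarity enters; the paper's exact weight design is not reproduced.

```python
# Sketch of TF-IDF knowledge-point scoring with a VSM similarity weight.
# The weighting scheme is an assumption, not the paper's exact formula.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

docs = [
    "pointers and arrays in the C programming language",
    "for loops and while loops control flow",
    "functions parameters and return values",
]
vec = TfidfVectorizer()
tfidf = vec.fit_transform(docs)                 # document-term TF-IDF matrix
centroid = np.asarray(tfidf.mean(axis=0))       # course centroid in the VSM
sim = cosine_similarity(centroid, tfidf).ravel()  # doc-to-centroid similarity

# Weight each document's TF-IDF row by its similarity, then sum per term.
scores = np.asarray(tfidf.multiply(sim[:, None]).sum(axis=0)).ravel()
top = np.argsort(scores)[::-1][:5]
print([vec.get_feature_names_out()[i] for i in top])  # candidate knowledge points
```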
Conference Paper
The prevailing way to construct an ontology is to rely on ontology experts for manual construction. Because manual construction requires extensive human participation, it has great limitations. Since text is one of the main forms of source data, how to construct a domain ontology automatically from texts, and how to use the ontology to quickly provide semantic retrieval over text, are current hotspots of ontology research. Aiming at these problems, an automatic construction method for domain ontologies based on knowledge graphs and association rule mining is presented. It can extract the concepts, hierarchical relations, and non-hierarchical relations of a domain ontology from text, and finally forms the ontology with Jena. It also provides semantic retrieval of text by associating texts with concepts during ontology construction. Finally, the effectiveness of automatic ontology construction is verified through the quality of text retrieval.
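The association-rule-mining step can be illustrated with a short sketch: concept pairs that co-occur across documents with sufficient support and confidence are proposed as non-hierarchical relations. The thresholds and toy corpus are illustrative; the paper's full pipeline (and the final Jena serialization) is not reproduced.

```python
# Sketch of mining non-hierarchical concept relations via association rules:
# a rule X -> Y is kept if the pair's document co-occurrence count (support)
# and the conditional frequency (confidence) clear illustrative thresholds.
from itertools import combinations
from collections import Counter

def mine_rules(doc_concepts, min_support=2, min_conf=0.6):
    single = Counter()
    pair = Counter()
    for concepts in doc_concepts:
        cs = set(concepts)
        single.update(cs)
        pair.update(frozenset(p) for p in combinations(sorted(cs), 2))
    rules = []
    for p, sup in pair.items():
        if sup < min_support:
            continue
        a, b = tuple(p)
        for x, y in ((a, b), (b, a)):
            conf = sup / single[x]
            if conf >= min_conf:
                rules.append((x, y, sup, round(conf, 2)))
    return rules

docs = [
    ["ontology", "concept", "hierarchy"],
    ["ontology", "concept"],
    ["ontology", "retrieval"],
]
print(mine_rules(docs))
# -> rules linking 'ontology' and 'concept' with support 2
```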
Conference Paper
This paper proposes a machine-learning-based electric vehicle (EV) reserved charging service system that takes into consideration impacts from both the power system and the transportation system. The proposed charging-network operation service platform links the power system with the transportation system through the charging navigation of massive numbers of EVs. The "reserved charging + consumption" integrated service model would be of great significance for dealing with large-scale integration of electric vehicles. The system applies the concept of a charging time window to optimize EV charging prediction for the reserved charging service, and designs a dynamic dispatching model based on a sliding time axis that frees users' charging processes from the constraints of queuing time and the charging-service-fee period.
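A hypothetical sketch of the charging-time-window idea: each reservation specifies a feasible window on the time axis, and a greedy dispatcher fills the earliest free slot within that window, serving tighter windows first. Slot granularity, station capacity, and the greedy rule are all assumptions; the paper's optimization model is not reproduced.

```python
# Toy dispatch on a sliding time axis: assign each EV reservation the
# earliest free slot inside its requested window [earliest, latest].
def dispatch(requests, n_stations, horizon):
    occupancy = [0] * horizon  # chargers busy per time slot
    plan = {}
    # Serve tighter windows first so they are not crowded out.
    for ev, (earliest, latest) in sorted(requests.items(), key=lambda r: r[1][1] - r[1][0]):
        for t in range(earliest, latest + 1):
            if occupancy[t] < n_stations:
                occupancy[t] += 1
                plan[ev] = t
                break
        else:
            plan[ev] = None  # no feasible slot; deferred to the next horizon
    return plan

reqs = {"EV1": (0, 2), "EV2": (0, 0), "EV3": (1, 3)}
print(dispatch(reqs, n_stations=1, horizon=4))
# -> {'EV2': 0, 'EV1': 1, 'EV3': 2}
```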
Chapter
In order to improve the integration and access efficiency of agricultural information, this paper proposes an agricultural information integration framework based on a knowledge graph. A knowledge graph of agricultural product production and management was constructed, covering the basic process of “planting - farming - processing - quality inspection - warehousing - transportation - sales” and realizing the storage, mapping, and querying of the knowledge graph. The framework improves the method of mapping data linkage based on database mapping relations, realizing the conversion of database elements into knowledge-graph elements, while the iterative discovery of relations and patterns in text information is achieved by weakly supervised machine learning. The method is integrated into the Green-Cloud-Grid platform and improves the efficiency of information-source integration, correlation analysis, and mining utilization on the platform.
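The database-to-knowledge-graph conversion described here can be sketched as a mapping table that names the entity-key column and maps the remaining columns to predicates, turning each row into triples. The table, column, and predicate names below are invented for the example.

```python
# Illustrative sketch of a database-to-knowledge-graph mapping: a mapping
# declaration converts each database row into (subject, predicate, object)
# triples. All names here are hypothetical.
MAPPING = {
    "table": "product_batch",
    "key_column": "batch_id",
    "columns": {"origin": "producedAt", "crop": "hasCrop", "inspector": "inspectedBy"},
}

def rows_to_triples(rows, mapping):
    triples = []
    for row in rows:
        subject = f'{mapping["table"]}/{row[mapping["key_column"]]}'
        for col, predicate in mapping["columns"].items():
            if row.get(col) is not None:
                triples.append((subject, predicate, row[col]))
    return triples

rows = [{"batch_id": "B001", "origin": "Shandong", "crop": "wheat", "inspector": "QA-3"}]
for t in rows_to_triples(rows, MAPPING):
    print(t)
# -> ('product_batch/B001', 'producedAt', 'Shandong'), ...
```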