Figure 2: SVMs try to maximize the margin of separation between positive and negative examples.
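As a concrete illustration of the margin the figure depicts, the sketch below (illustrative only, not code from the paper) fits a linear SVM on toy 2-D data with scikit-learn and computes the geometric margin 2/||w||; the toy data and the large-C choice are assumptions.

```python
# Illustration of Figure 2's idea (not code from the paper): fit a
# linear SVM on toy 2-D data and compute the width of its margin.
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data: two Gaussian clusters.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 2, rng.randn(20, 2) + 2])
y = np.array([0] * 20 + [1] * 20)

# A large C approximates the hard-margin SVM, which picks the
# separating hyperplane with the widest margin.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# For a linear SVM the boundary is w.x + b = 0, and the distance
# between the two supporting hyperplanes is 2 / ||w||.
w = clf.coef_[0]
print("margin width:", 2 / np.linalg.norm(w))
print("number of support vectors:", len(clf.support_))
```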

Source publication
Conference Paper
In this paper, a method is presented to recognize multilingual Wikipedia named entity articles. This method classifies multilingual Wikipedia articles using a variety of structured and unstructured features and is aided by cross-language links and features in Wikipedia. Adding multilingual features helps boost classification accuracy and is shown t...
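The core idea of enriching an article's features with evidence from its cross-language counterparts could be sketched roughly as follows; every name and the toy data here are hypothetical illustrations, not the paper's implementation.

```python
# Hypothetical sketch of multilingual feature augmentation: extend an
# article's bag-of-features with prefixed features taken from the
# articles it reaches via cross-language links. All names and the toy
# data are illustrative assumptions.
from collections import Counter

def multilingual_features(article_feats, crosslang_feats):
    """Merge per-language feature counts into one feature vector.

    article_feats: Counter of features from the article itself.
    crosslang_feats: dict mapping language code -> Counter of features
        from the cross-language-linked article in that language.
    """
    merged = Counter(article_feats)
    for lang, feats in crosslang_feats.items():
        for feat, count in feats.items():
            merged[f"{lang}:{feat}"] += count  # prefix avoids collisions
    return merged

# Toy example: an English article about a person, with an Arabic
# cross-language link contributing extra evidence.
en = Counter({"infobox:person": 1, "word:born": 2})
xl = {"ar": Counter({"category:people": 1})}
print(multilingual_features(en, xl))
```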

Similar publications

Article
This article discusses the usefulness of geo-linguistic analysis for Internet studies by presenting two techniques to frame and visualize the linguistic development of the World Wide Web, in particular the geo-linguistic development amongst different language versions of Wikipedia. An emergent research agenda has been set to explore the multilingua...
Article
We present the design of a project to develop Wikipedia content on general vaccine safety and the COVID-19 vaccines, specifically. This proposal describes what a team would need to distribute public health information in Wikipedia in multiple languages in response to a disaster or crisis, and to measure and report the communication impact of the sa...
Preprint
Cross-lingual Entity Linking (XEL) aims to ground entity mentions written in any language to an English Knowledge Base (KB), such as Wikipedia. XEL for most languages is challenging, owing to limited availability of resources as supervision. We address this challenge by developing the first XEL approach that combines supervision from multiple langu...
Preprint
Recent studies have shown that multilingual pretrained language models can be effectively improved with cross-lingual alignment information from Wikipedia entities. However, existing methods only exploit entity information in pretraining and do not explicitly use entities in downstream tasks. In this study, we explore the effectiveness of leveragin...
Preprint
Multilingual language models have been a crucial breakthrough as they considerably reduce the need of data for under-resourced languages. Nevertheless, the superiority of language-specific models has already been proven for languages having access to large amounts of data. In this work, we focus on Catalan with the aim to explore to what extent a m...

Citations

... Targeting non-English Wikipedia classifiers [21] and language-independent feature sets was the next evolution of this problem, aiming to use the same classifier for any Wikipedia language [22]. The problem then extended to fine-grained classification. ...
... Forward feature selection was adopted to reduce the set of features in order to fit with a small number of training examples. We started to combine the features in one ...

Comparison table embedded in the excerpt (column labels not recoverable):

[23]                              73%  61%  55%  36%
Saleh et al. [22]                 76%  64%  50%  33%
Tardif et al. [24]                60%  52%  35%  21%
Nothman et al. [7]                70%  58%  48%  33%
Dakka & Cucerzan [20] (Baseline)  30%  25%  16%  11%
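The forward feature selection the excerpt mentions can be outlined with scikit-learn's SequentialFeatureSelector, which greedily adds the feature whose inclusion most improves cross-validated performance; the dataset and base classifier below are placeholders, not the cited paper's setup.

```python
# Sketch of forward feature selection; dataset and classifier are
# placeholder assumptions.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# Greedily add one feature at a time, keeping whichever addition most
# improves cross-validated accuracy; useful when a small number of
# training examples makes a large feature set unreliable.
selector = SequentialFeatureSelector(
    LinearSVC(), n_features_to_select=2, direction="forward", cv=5
)
selector.fit(X, y)
print("selected feature indices:", selector.get_support(indices=True))
```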
... Dakka and Cucerzan [14] trained SVM and Naïve Bayes classifiers using page-based and context features, and their experimental results showed that structural features (such as the data in tables) are distinctive in identifying the NE type of Wikipedia articles. Saleh et al. [16] extracted features from the abstract, infobox, category, and persondata structures, and improved the recall of different NE types by using beta-gamma threshold adjustment. Tkatchenko et al. [17] adopted similar features to Tardif et al. [18]. ...
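The beta-gamma threshold adjustment attributed to Saleh et al. [16] is not reproduced here; the sketch below only illustrates the underlying idea that lowering a per-class decision threshold trades precision for recall. The data, classifier, and threshold values are all assumptions.

```python
# Illustration of decision-threshold adjustment (the general idea
# behind recall-oriented tuning; NOT the beta-gamma algorithm itself).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, weights=[0.8, 0.2],
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(Xtr, ytr)
proba = clf.predict_proba(Xte)[:, 1]

for threshold in (0.5, 0.3):  # lowering the threshold boosts recall
    pred = (proba >= threshold).astype(int)
    print(f"t={threshold}:",
          f"precision={precision_score(yte, pred):.2f}",
          f"recall={recall_score(yte, pred):.2f}")
```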
... Then, a multi-class classifier was trained on the given features. All experiments were conducted with the SVM algorithm using the libSVM toolkit (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) with linear kernels, which showed excellent performance in [14,16]. We evaluated the models using 5-fold cross-validation and adopted the widely used Precision, Recall, and F1 measures to assess classification performance. ...
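That evaluation protocol (linear-kernel SVM, 5-fold cross-validation, Precision/Recall/F1) maps directly onto scikit-learn, whose SVC wraps the same libSVM library; the 20-newsgroups subset below merely stands in for the Wikipedia articles.

```python
# Sketch of the excerpt's protocol: linear-kernel SVM (scikit-learn's
# SVC wraps libSVM), 5-fold cross-validation, macro-averaged
# Precision/Recall/F1. The 20-newsgroups subset (downloaded on first
# use) stands in for the Wikipedia data.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

data = fetch_20newsgroups(subset="train",
                          categories=["sci.space", "rec.autos"])
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))

scores = cross_validate(model, data.data, data.target, cv=5,
                        scoring=["precision_macro", "recall_macro",
                                 "f1_macro"])
for name in ("precision_macro", "recall_macro", "f1_macro"):
    print(name, round(scores[f"test_{name}"].mean(), 3))
```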
Article
Named entity classification of Wikipedia articles is a fundamental research area that can be used to automatically build large-scale corpora for named entity recognition or to support other entity-processing tasks, such as entity linking. This paper describes a method for classifying named entities in Chinese Wikipedia with fine-grained types. We considered multi-faceted information in Chinese Wikipedia to construct four feature sets, designed different feature selection methods for each feature set, and fused the different features into a single vector space using different strategies. Experimental results show that the explored feature sets and their combinations can effectively improve the performance of named entity classification.
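One common way to fuse several feature sets into a single vector space, in the spirit of this abstract, is feature concatenation; the sketch below uses scikit-learn's FeatureUnion with two placeholder extractors, which are assumptions rather than the paper's actual feature sets.

```python
# Sketch of fusing heterogeneous feature sets into one vector space
# via FeatureUnion; the two extractors and toy data are illustrative.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

# Two views of the same documents, word unigrams and character
# n-grams, concatenated into a single sparse feature matrix.
features = FeatureUnion([
    ("words", TfidfVectorizer(analyzer="word")),
    ("chars", CountVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])
model = make_pipeline(features, LinearSVC())

docs = ["Beijing is the capital of China",
        "Alan Turing was a mathematician"]
labels = ["LOC", "PER"]
model.fit(docs, labels)
print(model.predict(["Ada Lovelace wrote the first program"]))
```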
... Beyond newswire-based corpora, Wikipedia has become more attractive for different NLP tasks. Some researchers have exploited the unrestricted accessibility of Wikipedia to automatically build fully annotated NE corpora with different granularities, while others focus on partially utilising Wikipedia to achieve specific goals, such as developing an NE gazetteer (Attia et al., 2010) or classifying Wikipedia articles into NE semantic classes (Saleh et al., 2010). Tkatchenko et al. (2011) expanded the classification into an 18-class fine-grained taxonomy extracted from BBN. ...
... Several similar features have been selected (e.g. Saleh et al., 2010; Dakka and Cucerzan, 2008). ...
Conference Paper
This paper presents a methodology to exploit the potential of Arabic Wikipedia to assist in the automatic development of a large fine-grained Named Entity (NE) corpus and gazetteer. The cornerstone of this approach is the efficient classification of Wikipedia articles into target NE classes. The resources developed were thoroughly evaluated to ensure reliability and high quality. Results show the developed gazetteer boosts the performance of the NE classifier on a newswire domain by at least 2 points of F-measure. Moreover, by combining a learning-based NE classifier with the developed corpus, a high F-measure of 85.18% is achieved. The developed resources overcome the limitations of traditional Arabic NE tasks through more fine-grained analysis and provide a beneficial route for further studies.
... Naïve Bayes and the Support Vector Machine (SVM) were chosen as the statistical classifiers, exploiting a specific set of features such as bag-of-words, structured data, and unigram and bigram context. Recently, Saleh et al. (2010) proposed a similar approach to classifying multilingual Wikipedia articles into traditional NE classes. The assumption in that case was that most Wikipedia articles relate to a named entity. ...
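The feature types named in this excerpt (bag-of-words with unigram and bigram context) translate naturally into an n-gram vectorizer feeding Naïve Bayes and an SVM; the toy articles and labels below are illustrative assumptions.

```python
# Sketch of the excerpt's feature set: bag-of-words with unigrams and
# bigrams, classified by Naive Bayes and a linear SVM.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

docs = [
    "born 1952 in Cairo he studied law",       # PERSON
    "the company was founded in 1998",         # ORGANIZATION
    "a city on the northern coast of Egypt",   # LOCATION
]
labels = ["PER", "ORG", "LOC"]

vec = CountVectorizer(ngram_range=(1, 2))  # unigram + bigram features
X = vec.fit_transform(docs)

for clf in (MultinomialNB(), LinearSVC()):
    clf.fit(X, labels)
    print(type(clf).__name__,
          clf.predict(vec.transform(["founded in 2005"])))
```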
Conference Paper
This paper describes a comprehensive set of experiments conducted in order to classify Arabic Wikipedia articles into predefined sets of Named Entity classes. We tackle the task using four different classifiers, namely Naïve Bayes, Multinomial Naïve Bayes, Support Vector Machines, and Stochastic Gradient Descent. We report on several aspects of the classification models, namely feature representation, feature sets, and statistical modelling. The results show that we are able to correctly classify the articles with scores of 90% for Precision, Recall, and balanced F-measure.
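For concreteness, the four classifiers this abstract names are all available in scikit-learn; the sketch below runs them on a toy text task, with the documents, labels, and parameters being assumptions rather than the paper's setup.

```python
# Sketch of the abstract's four classifiers on a toy text task; the
# documents, labels, and parameters are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.svm import LinearSVC

docs = ["river delta in the north", "prime minister since 2004",
        "airline headquartered in Dubai", "mountain range in Asia"]
labels = ["LOC", "PER", "ORG", "LOC"]

# Dense count features so GaussianNB can run alongside the others.
X = CountVectorizer().fit_transform(docs).toarray()

for clf in (GaussianNB(), MultinomialNB(), LinearSVC(),
            SGDClassifier(random_state=0)):
    clf.fit(X, labels)
    print(type(clf).__name__, "train accuracy:", clf.score(X, labels))
```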
... Prior works on classifying Wikipedia articles [10,11,2,6,7,4,9,8] target named entity recognition (NER) [5] rather than suggesting infobox template types. The consequence is that they deal with only a very small number of classes (between 3 and 18), such as PERSON, ORGANIZATION, and LOCATION, which is also the classic setup in NER-related studies. ...
Conference Paper
Given the sheer amount of work and expertise required in authoring Wikipedia articles, automatic tools that help Wikipedia contributors generate and improve content are valuable. This paper presents our initial step towards building a full-fledged author assistant, particularly for suggesting infobox templates for articles. We build SVM classifiers to suggest infobox template types, among a large number of possible types, for Wikipedia articles without infoboxes. Unlike prior work on Wikipedia article classification, which deals with only a few label classes for named entity recognition, the much larger 337-class setup in our study is geared towards realistic deployment of an infobox suggestion tool. We also emphasize testing on articles without infoboxes, because labeled and unlabeled data exhibit different feature distributions, which departs from the typical assumption that they are drawn from the same underlying population.
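A linear SVM scales to such a large label set because scikit-learn's LinearSVC trains one one-vs-rest binary classifier per class; the sketch below shows the setup with a handful of template types standing in for the paper's 337 (the articles and labels are illustrative assumptions).

```python
# Sketch of multiclass infobox-template suggestion with a linear SVM;
# LinearSVC trains one one-vs-rest classifier per class, so the same
# setup scales to hundreds of template types. Toy data is assumed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

articles = [
    "is a software company based in",
    "is a freshwater lake located in",
    "is a 2010 studio album by",
    "served as a member of parliament",
]
templates = ["Infobox company", "Infobox lake",
             "Infobox album", "Infobox officeholder"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(articles, templates)

# Suggest a template for an article that has no infobox yet.
print(model.predict(["is a shallow lake in the west of"]))
```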
Conference Paper
Recognition of named entities (people, companies, locations, etc.) is an essential task of text analytics. We address a subproblem of this task, namely named entity classification. We propose a novel approach that constructs an effective fine-grained named entity classifier. Its key highlights are semi-automatic training-set construction from Wikipedia articles and additional feature selection. We justify our solution by creating an 18-class classifier and demonstrating its effectiveness and efficiency.
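Semi-automatic training-set construction from Wikipedia can be sketched as rule-based labeling from category strings; the rules, class inventory, and examples below are hypothetical illustrations, not the paper's method.

```python
# Hypothetical sketch of semi-automatic training-set construction:
# assign a coarse NE label to an article from its Wikipedia category
# strings via keyword rules. Rules and categories are assumptions.
RULES = {
    "PER": ("births", "deaths", "people"),
    "ORG": ("companies", "organizations", "clubs"),
    "LOC": ("cities", "countries", "rivers"),
}

def label_from_categories(categories):
    """Return the first NE class whose keywords match a category."""
    for cls, keywords in RULES.items():
        for cat in categories:
            if any(kw in cat.lower() for kw in keywords):
                return cls
    return None  # unmatched articles are left for manual annotation

print(label_from_categories(["1952 births", "Egyptian lawyers"]))  # PER
print(label_from_categories(["Rivers of Egypt"]))                  # LOC
```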