Introducing New Features to Wikipedia: Case Studies for Web Science

Abstract

Wikipedia is a free Web-based encyclopedia produced by hundreds of thousands of online contributors. Today, Wikipedia offers more than 10 million articles and is available in more than 250 languages. For some of these languages, Wikipedia is not merely a free encyclopedia, but the only one. Wikipedia is the product of a complex sociotechnical process.
... Research in understanding how systems for collaborative knowledge creation are impacted by events like data migration is still in its early stages [20], in particular for structured knowledge [12]. Most of the research is focused on Wikipedia [6], which is understandable considering the availability of its data sets, in particular the whole edit history [26] and the availability of tools for working with Wikipedia [19]. ...
... What is the formula for Continued fraction? 26. What is the formula for Pythagorean theorem? ...
Preprint
We present an open source math-aware Question Answering System based on Ask Platypus. Our system returns a single mathematical formula for a natural language question in English or Hindi. These formulae originate from the knowledge base Wikidata. We translate these formulae into computable data by integrating the calculation engine sympy into our system. This way, users can enter numeric values for the variables occurring in the formula. Moreover, the system loads numeric values for constants occurring in the formula from Wikidata. In a user study, our system outperformed a commercial computational mathematical knowledge engine by 13%. However, the performance of our system heavily depends on the size and quality of the formula data available in Wikidata. Since only a few items in Wikidata contained formulae when we started the project, we facilitated the import process by suggesting formula edits to Wikidata editors. With the simple heuristic that the first formula is the significant one for the article, 80% of the suggestions were correct.
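The evaluation step described above can be sketched with sympy. The formula string and the numeric values are hypothetical examples, not taken from the system's actual data; the point is only how a text formula becomes computable:

```python
from sympy import symbols, sympify

# Hypothetical example: a formula string as it might be stored in Wikidata
# (here the Pythagorean theorem, solved for the hypotenuse).
formula_src = "sqrt(a**2 + b**2)"

# Parse the string into a symbolic sympy expression.
expr = sympify(formula_src)

# Substitute user-supplied numeric values for the variables and evaluate.
a, b = symbols("a b")
value = float(expr.subs({a: 3, b: 4}))
print(value)  # → 5.0
```

Constants found in the formula would be resolved the same way, with their values loaded from Wikidata instead of entered by the user.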
... But the communities of the many Wikimedia projects have repeatedly shown that they can meet complex challenges with ingenious combinations of processes and technological advancements. ...
Chapter
Wikipedia's first twenty years: how what began as an experiment in collaboration became the world's most popular reference work. We have been looking things up in Wikipedia for twenty years. What began almost by accident—a wiki attached to a nascent online encyclopedia—has become the world's most popular reference work. Regarded at first as the scholarly equivalent of a Big Mac, Wikipedia is now known for its reliable sourcing and as a bastion of (mostly) reasoned interaction. How has Wikipedia, built on a model of radical collaboration, remained true to its original mission of “free access to the sum of all human knowledge” when other tech phenomena have devolved into advertising platforms? In this book, scholars, activists, and volunteers reflect on Wikipedia's first twenty years, revealing connections across disciplines and borders, languages and data, the professional and personal. The contributors consider Wikipedia's history, the richness of the connections that underpin it, and its founding vision. Their essays look at, among other things, the shift from bewilderment to respect in press coverage of Wikipedia; Wikipedia as “the most important laboratory for social scientific and computing research in history”; and the acknowledgment that “free access” includes not just access to the material but freedom to contribute—that the summation of all human knowledge is biased by who documents it. Contributors Phoebe Ayers, Omer Benjakob, Yochai Benkler, William Beutler, Siko Bouterse, Rebecca Thorndike-Breeze, Amy Carleton, Robert Cummings, LiAnna L. Davis, Siân Evans, Heather Ford, Stephen Harrison, Heather Hart, Benjamin Mako Hill, Dariusz Jemielniak, Brian Keegan, Jackie Koerner, Alexandria Lockett, Jacqueline Mabey, Katherine Maher, Michael Mandiberg, Stephane Coillet-Matillon, Cecelia A. Musselman, Eliza Myrie, Jake Orlowitz, Ian A. Ramjohn, Joseph Reagle, Anasuya Sengupta, Aaron Shaw, Melissa Tamani, Jina Valentine, Matthew Vetter, Adele Vrana, Denny Vrandečić
... Research in understanding how systems for collaborative knowledge creation are impacted by events like this data migration is still in its early stages [23], in particular for structured knowledge [18]. Most of the research is focused on Wikipedia [11], which is understandable considering the availability of its data sets, in particular the whole edit history [27] and the availability of tools for working with Wikipedia [22]. ...
Conference Paper
Collaborative knowledge bases that make their data freely available in a machine-readable form are central for the data strategy of many projects and organizations. The two major collaborative knowledge bases are Wikimedia's Wikidata and Google's Freebase. Due to the success of Wikidata, Google decided in 2014 to offer the content of Freebase to the Wikidata community. In this paper, we report on the ongoing transfer efforts and data mapping challenges, and provide an analysis of the effort so far. We describe the Primary Sources Tool, which aims to facilitate this and future data migrations. Throughout the migration, we have gained deep insights into both Wikidata and Freebase, and share and discuss detailed statistics on both knowledge bases.
... That said, there are times when the Foundation staff had to make a controversial decision about whether to modify the basic functionality of MediaWiki. However, decisions that affect the entire community (or even just entire language versions of Wikipedia) can take years to sort out – as did, for example, the controversies over the rollout of the Flagged Revisions extension (Birn et al., 2013; Schindler & Vrandecic, 2009). For most of Wikipedia's history, controversial platform-level changes to MediaWiki's code were so rare that there is still no agreed-upon model for how those kinds of decisions are even to be made – as we would presume would be in place in Lessig's ideal model of platform governance. ...
Article
This article introduces and discusses the role of bespoke code in Wikipedia, which is code that runs alongside a platform or system, rather than being integrated into server-side codebases by individuals with privileged access to the server. Bespoke code complicates the common metaphors of platforms and sovereignty that we typically use to discuss the governance and regulation of software systems through code. Specifically, the work of automated software agents (bots) in the operation and administration of Wikipedia is examined, with a focus on the materiality of code. As bots extend and modify the functionality of sites like Wikipedia, but must be continuously operated on computers that are independent from the servers hosting the site, they involve alternative relations of power and code. Instead of taking for granted the pre-existing stability of Wikipedia as a platform, bots and other bespoke code require that we examine not only the software code itself, but also the concrete, historically contingent material conditions under which this code is run. To this end, this article weaves a series of autobiographical vignettes about the author's experiences as a bot developer alongside more traditional academic discourse.
... Despite the decentralized nature of Wikipedia, several studies that analyse the quality of its contents find that this quality is almost as good as that of other well-reputed encyclopedias [39, 40]. In such a vast environment, where millions of encyclopedic entries and millions of users interact, it has been necessary to introduce new tools and features [41] to improve not only the quality of the entries, but also the coordination [42, 27] and cooperation [43] among users, the social transparency of the articles [44] and the semantic annotation of the contents [45]. However, it is still necessary to develop new tools to avoid conflict [46] and increase the consensus of the decisions taken in Wikipedia. ...
Article
Web 2.0 communities are a quite recent phenomenon which involve large numbers of users and where communication between members is carried out in real time. Despite these good characteristics, there is still a need for tools that help users reach decisions with a high level of consensus in these new virtual environments. In this contribution a new consensus reaching model is presented which uses linguistic preferences and is designed to minimize the main problems that this kind of organization presents (low and intermittent participation rates, difficulty of establishing trust relations and so on) while incorporating the benefits that a Web 2.0 community offers (rich and diverse knowledge due to a large number of users, real-time communication, etc.). The model includes some delegation and feedback mechanisms to improve the speed of the process and its convergence towards a solution of consensus. Its possible application to some of the decision-making processes carried out in Wikipedia is also shown.
... For effective interoperability, a LO must be a stand-alone, modular entity that incorporates its learning context (semantic relationships) in its own metadata. Metadata labeling is a key issue for semantic annotation, encoding, exchanging and reusing LOs, as it offers a successful way to catalogue and navigate its content, context, usage and structure [4, 2, 5, 3, 6]. This is valid for adaptive courseware generation, where the goal is to ensure a student completes the required activities, and dynamic courseware generation, where the goal is to assist students in navigating a complex hypermedia space. ...
Article
The aim of educational systems is to assemble learning objects on a set of topics tailored to the goals and individual students' styles. Given the amount of available learning objects, the challenge of e-learning is to select the proper objects, define their relationships, and adapt their sequencing (i.e. course composition) to the specific needs, objectives and background of the student. This paper describes the general requirements for this course adaptation, the full potential of applying planning techniques on the construction of personalized e-learning routes, and how to accommodate the temporal and resource constraints to make the course applicable in a real scenario.
Article
Purpose This paper aims to present an open source math-aware Question Answering System based on Ask Platypus. Design/methodology/approach The system returns a single mathematical formula for a natural language question in English or Hindi. These formulae originate from the knowledge base Wikidata. The authors translate these formulae into computable data by integrating the calculation engine sympy into the system. This way, users can enter numeric values for the variables occurring in the formula. Moreover, the system loads numeric values for constants occurring in the formula from Wikidata. Findings In a user study, this system outperformed a commercial computational mathematical knowledge engine by 13 per cent. However, the performance of this system heavily depends on the size and quality of the formula data available in Wikidata. As only a few items in Wikidata contained formulae when the project started, the authors facilitated the import process by suggesting formula edits to Wikidata editors. With the simple heuristic that the first formula is the significant one for the article, 80 per cent of the suggestions were correct. Originality/value This research was presented at the JCDL17 KDD workshop.
Article
In the last decade, collaborative open production communities have provided an effective platform for geographically dispersed users to collaborate and generate content in a well-structured and consistent form. Wikipedia is a prominent example in this area. What is of great importance in production communities is the prioritization and evolution of features with regards to the community lifecycle. Users are the cornerstone of such communities and their needs and attitudes constantly change as communities grow. The increasing amount and versatility of content and users requires modifications in areas ranging from user roles and access levels to content quality standards and community policies and goals. In this paper, we draw on two pertinent theories in terms of the lifecycle of online communities and open collaborative communities in particular by focusing on the case of Wikipedia. We conceptualize three general stages (Rising, Organizing, and Stabilizing) within the lifecycle of collaborative open production communities. The salient factors, features and focus of attention in each stage are provided and the chronology of features is visualized. These findings, if properly generalized, can help designers of other types of open production communities effectively allocate their resources and introduce new features based on the needs of both community and users.
Article
Commons-based Peer Production is the process by which internet communities create media and software artefacts. Learning is integral to the success of these communities as it encourages contribution on an individual level, helps to build and sustain commitment on a group level and provides a means for adaption at an organisational level. While some communities have established ways to support organisational learning -- through a forum or thread reserved for community discussion -- few have investigated how more in-depth visual and analytic interfaces could help formalise this process. In this paper, we explore how social network visualisation can be used to encourage reflection and thus support organisational learning in online communities. We make the following contributions: First, we describe Commons-Based Peer Production, in terms of a socio-technical learning system that includes individual, group and organisational learning. Second, we present a novel visualisation environment that embeds social network visualisation in an asynchronous collaborative architecture. Third, we present results from an evaluation and discuss the potential for visualisation to support the process of organisational reflection in online communities.
Article
Despite the huge success of the World Wide Web as a technology, and the significant amount of computing infrastructure on which it sits, the Web, as an entity, remains surprisingly unstudied. In this article, we look at some of the issues that need to be explored to model the Web as a whole, to keep it growing, and to understand its continuing social impact. We argue that a "systems" approach, in the sense of "systems biology", is needed if we are to be able to understand and engineer the future of the Web.
Conference Paper
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human- and machine-consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.
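The "sophisticated queries" the abstract mentions are expressed in SPARQL against DBpedia's public endpoint. A minimal sketch of constructing such a request follows; the query itself is an illustrative example, not one from the paper, and issuing the actual HTTP call is omitted:

```python
from urllib.parse import urlencode

# DBpedia exposes a public SPARQL endpoint at this URL.
ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: five cities with their English labels.
query = """
SELECT ?city ?name WHERE {
  ?city a dbo:City ;
        rdfs:label ?name .
  FILTER (lang(?name) = "en")
}
LIMIT 5
"""

# The endpoint accepts the query and the desired result format as
# ordinary URL parameters on a GET request.
params = urlencode({"query": query,
                    "format": "application/sparql-results+json"})
request_url = ENDPOINT + "?" + params
print(request_url)
```

Sending the resulting URL over HTTP returns the bindings as JSON, which is how "human- and machine-consumption" clients typically consume DBpedia data.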
Article
Wikipedia is the world's largest collaboratively edited source of encyclopaedic knowledge. But in spite of its utility, its content is barely machine-interpretable and only weakly structured. With Semantic MediaWiki we provide an extension that enables wiki-users to semantically annotate wiki pages, based on which the wiki contents can be browsed, searched, and reused in novel ways. In this paper, we give an extended overview of Semantic MediaWiki and discuss experiences regarding performance and current applications.
Article
This paper explores the system of categories that is used to classify articles in Wikipedia. It is compared to collaborative tagging systems like del.icio.us and to hierarchical classification like the Dewey Decimal Classification (DDC). Specifics and commonalities of these systems of subject indexing are exposed. Analysis of structural and statistical properties (descriptors per record, records per descriptor, descriptor levels) shows that the category system of Wikipedia is a thesaurus that combines collaborative tagging and hierarchical subject indexing in a special way.