Introducing New Features to Wikipedia: Case Studies for Web Science

Abstract

Wikipedia is a free Web-based encyclopedia produced by hundreds of thousands of online contributors. Today, Wikipedia offers more than 10 million articles and is available in more than 250 languages. For some of these languages, Wikipedia is not merely a free encyclopedia, but the only one. Wikipedia is the product of a complex sociotechnical process.
... Research in understanding how systems for collaborative knowledge creation are impacted by events like data migration is still in its early stages [20], in particular for structured knowledge [12]. Most of the research is focused on Wikipedia [6], which is understandable considering the availability of its data sets, in particular the whole edit history [26] and the availability of tools for working with Wikipedia [19]. ...
... What is the formula for Continued fraction? 26. What is the formula for Pythagorean theorem? ...
Preprint
We present an open source math-aware Question Answering System based on Ask Platypus. Our system returns a single mathematical formula for a natural language question in English or Hindi. These formulae originate from the knowledge base Wikidata. We translate these formulae into computable data by integrating the calculation engine sympy into our system. This way, users can enter numeric values for the variables occurring in the formula. Moreover, the system loads numeric values for constants occurring in the formula from Wikidata. In a user study, our system outperformed a commercial computational mathematical knowledge engine by 13%. However, the performance of our system heavily depends on the size and quality of the formula data available in Wikidata. Since only a few items in Wikidata contained formulae when we started the project, we facilitated the import process by suggesting formula edits to Wikidata editors. With the simple heuristic that the first formula is the significant one for the article, 80% of the suggestions were correct.
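The evaluation step described above can be sketched with sympy. The formula string and the numeric values are hypothetical examples, not taken from the system's actual data; the point is only how a text formula becomes computable:

```python
from sympy import symbols, sympify

# Hypothetical example: a formula string as it might be stored in Wikidata
# (here the Pythagorean theorem, solved for the hypotenuse).
formula_src = "sqrt(a**2 + b**2)"

# Parse the string into a symbolic sympy expression.
expr = sympify(formula_src)

# Substitute user-supplied numeric values for the variables and evaluate.
a, b = symbols("a b")
value = float(expr.subs({a: 3, b: 4}))
print(value)  # → 5.0
```

Constants found in the formula would be resolved the same way, with their values loaded from Wikidata instead of entered by the user.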
... But the communities of the many Wikimedia projects have repeatedly shown that they can meet complex challenges with ingenious combinations of processes and technological advancements. ...
Chapter
Wikipedia's first twenty years: how what began as an experiment in collaboration became the world's most popular reference work. We have been looking things up in Wikipedia for twenty years. What began almost by accident—a wiki attached to a nascent online encyclopedia—has become the world's most popular reference work. Regarded at first as the scholarly equivalent of a Big Mac, Wikipedia is now known for its reliable sourcing and as a bastion of (mostly) reasoned interaction. How has Wikipedia, built on a model of radical collaboration, remained true to its original mission of “free access to the sum of all human knowledge” when other tech phenomena have devolved into advertising platforms? In this book, scholars, activists, and volunteers reflect on Wikipedia's first twenty years, revealing connections across disciplines and borders, languages and data, the professional and personal. The contributors consider Wikipedia's history, the richness of the connections that underpin it, and its founding vision. Their essays look at, among other things, the shift from bewilderment to respect in press coverage of Wikipedia; Wikipedia as “the most important laboratory for social scientific and computing research in history”; and the acknowledgment that “free access” includes not just access to the material but freedom to contribute—that the summation of all human knowledge is biased by who documents it. Contributors Phoebe Ayers, Omer Benjakob, Yochai Benkler, William Beutler, Siko Bouterse, Rebecca Thorndike-Breeze, Amy Carleton, Robert Cummings, LiAnna L. Davis, Siân Evans, Heather Ford, Stephen Harrison, Heather Hart, Benjamin Mako Hill, Dariusz Jemielniak, Brian Keegan, Jackie Koerner, Alexandria Lockett, Jacqueline Mabey, Katherine Maher, Michael Mandiberg, Stephane Coillet-Matillon, Cecelia A. Musselman, Eliza Myrie, Jake Orlowitz, Ian A. Ramjohn, Joseph Reagle, Anasuya Sengupta, Aaron Shaw, Melissa Tamani, Jina Valentine, Matthew Vetter, Adele Vrana, Denny Vrandečić
... Research in understanding how systems for collaborative knowledge creation are impacted by events like this data migration is still in its early stages [23], in particular for structured knowledge [18]. Most of the research is focused on Wikipedia [11], which is understandable considering the availability of its data sets, in particular the whole edit history [27] and the availability of tools for working with Wikipedia [22]. ...
Conference Paper
Collaborative knowledge bases that make their data freely available in a machine-readable form are central for the data strategy of many projects and organizations. The two major collaborative knowledge bases are Wikimedia's Wikidata and Google's Freebase. Due to the success of Wikidata, Google decided in 2014 to offer the content of Freebase to the Wikidata community. In this paper, we report on the ongoing transfer efforts and data mapping challenges, and provide an analysis of the effort so far. We describe the Primary Sources Tool, which aims to facilitate this and future data migrations. Throughout the migration, we have gained deep insights into both Wikidata and Freebase, and share and discuss detailed statistics on both knowledge bases.
... That said, there are times when the Foundation staff had to make a controversial decision about whether to modify the basic functionality of MediaWiki. However, decisions that affect the entire community (or even just entire language versions of Wikipedia) can take years to sort out – as did, for example, the controversies over the rollout of the Flagged Revisions extension (Birn et al., 2013; Schindler & Vrandecic, 2009). For most of Wikipedia's history, controversial platform-level changes to MediaWiki's code were so rare that there is still no agreed-upon model for how those kinds of decisions are even to be made – as we would presume would be in place in Lessig's ideal model of platform governance. ...
Article
This article introduces and discusses the role of bespoke code in Wikipedia, which is code that runs alongside a platform or system, rather than being integrated into server-side codebases by individuals with privileged access to the server. Bespoke code complicates the common metaphors of platforms and sovereignty that we typically use to discuss the governance and regulation of software systems through code. Specifically, the work of automated software agents (bots) in the operation and administration of Wikipedia is examined, with a focus on the materiality of code. As bots extend and modify the functionality of sites like Wikipedia, but must be continuously operated on computers that are independent from the servers hosting the site, they involve alternative relations of power and code. Instead of taking for granted the pre-existing stability of Wikipedia as a platform, bots and other bespoke code require that we examine not only the software code itself, but also the concrete, historically contingent material conditions under which this code is run. To this end, this article weaves a series of autobiographical vignettes about the author's experiences as a bot developer alongside more traditional academic discourse.
... Despite the decentralized nature of Wikipedia, several studies that analyse the quality of its contents find that this quality is almost as good as that of other well-reputed encyclopedias [39, 40]. In such a vast environment, where millions of encyclopedic entries and millions of users interact, it has been necessary to introduce new tools and features [41] to improve not only the quality of the entries, but also the coordination [42, 27] and cooperation [43] among users, the social transparency of the articles [44] and the semantic annotation of the contents [45]. However, it is still necessary to develop new tools to avoid conflict [46] and increase the consensus of the decisions taken in Wikipedia. ...
Article
Web 2.0 communities are a quite recent phenomenon which involve large numbers of users and where communication between members is carried out in real time. Despite these good characteristics, there is still a need for tools that help users reach decisions with a high level of consensus in these new virtual environments. In this contribution a new consensus reaching model is presented which uses linguistic preferences and is designed to minimize the main problems that this kind of organization presents (low and intermittent participation rates, difficulty of establishing trust relations and so on) while incorporating the benefits that a Web 2.0 community offers (rich and diverse knowledge due to a large number of users, real-time communication, etc.). The model includes some delegation and feedback mechanisms to improve the speed of the process and its convergence towards a solution of consensus. Its possible application to some of the decision-making processes carried out in Wikipedia is also shown.
... For effective interoperability, a LO must be a stand-alone, modular entity that incorporates its learning context (semantic relationships) in its own metadata. Metadata labeling is a key issue for semantic annotation, encoding, exchanging and reusing LOs, as it offers a successful way to catalogue and navigate its content, context, usage and structure [4, 2, 5, 3, 6]. This is valid for adaptive courseware generation, where the goal is to ensure a student completes the required activities, and dynamic courseware generation, where the goal is to assist students in navigating a complex hypermedia space. ...
Article
The aim of educational systems is to assemble learning objects on a set of topics tailored to the goals and individual students' styles. Given the amount of available learning objects, the challenge of e-learning is to select the proper objects, define their relationships, and adapt their sequencing (i.e. course composition) to the specific needs, objectives and background of the student. This paper describes the general requirements for this course adaptation, the full potential of applying planning techniques on the construction of personalized e-learning routes, and how to accommodate the temporal and resource constraints to make the course applicable in a real scenario.
Article
Purpose This paper aims to present an open source math-aware Question Answering System based on Ask Platypus. Design/methodology/approach The system returns a single mathematical formula for a natural language question in English or Hindi. These formulae originate from the knowledge base Wikidata. The authors translate these formulae into computable data by integrating the calculation engine sympy into the system. This way, users can enter numeric values for the variables occurring in the formula. Moreover, the system loads numeric values for constants occurring in the formula from Wikidata. Findings In a user study, this system outperformed a commercial computational mathematical knowledge engine by 13 per cent. However, the performance of this system heavily depends on the size and quality of the formula data available in Wikidata. As only a few items in Wikidata contained formulae when the project started, the authors facilitated the import process by suggesting formula edits to Wikidata editors. With the simple heuristic that the first formula is the significant one for the article, 80 per cent of the suggestions were correct. Originality/value This research was presented at the JCDL17 KDD workshop.
Article
In the last decade, collaborative open production communities have provided an effective platform for geographically dispersed users to collaborate and generate content in a well-structured and consistent form. Wikipedia is a prominent example in this area. What is of great importance in production communities is the prioritization and evolution of features with regards to the community lifecycle. Users are the cornerstone of such communities and their needs and attitudes constantly change as communities grow. The increasing amount and versatility of content and users requires modifications in areas ranging from user roles and access levels to content quality standards and community policies and goals. In this paper, we draw on two pertinent theories in terms of the lifecycle of online communities and open collaborative communities in particular by focusing on the case of Wikipedia. We conceptualize three general stages (Rising, Organizing, and Stabilizing) within the lifecycle of collaborative open production communities. The salient factors, features and focus of attention in each stage are provided and the chronology of features is visualized. These findings, if properly generalized, can help designers of other types of open production communities effectively allocate their resources and introduce new features based on the needs of both community and users.
Article
Commons-based Peer Production is the process by which internet communities create media and software artefacts. Learning is integral to the success of these communities as it encourages contribution on an individual level, helps to build and sustain commitment on a group level and provides a means for adaption at an organisational level. While some communities have established ways to support organisational learning -- through a forum or thread reserved for community discussion -- few have investigated how more in-depth visual and analytic interfaces could help formalise this process. In this paper, we explore how social network visualisation can be used to encourage reflection and thus support organisational learning in online communities. We make the following contributions: First, we describe Commons-Based Peer Production, in terms of a socio-technical learning system that includes individual, group and organisational learning. Second, we present a novel visualisation environment that embeds social network visualisation in an asynchronous collaborative architecture. Third, we present results from an evaluation and discuss the potential for visualisation to support the process of organisational reflection in online communities.
Article
Despite the huge success of the World Wide Web as a technology, and the significant amount of computing infrastructure on which it sits, the Web, as an entity, remains surprisingly unstudied. In this article, we look at some of the issues that need to be explored to model the Web as a whole, to keep it growing, and to understand its continuing social impact. We argue that a "systems" approach, in the sense of "systems biology", is needed if we are to be able to understand and engineer the future of the Web.
Conference Paper
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human- and machine-consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.
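The "sophisticated queries" the abstract mentions are expressed in SPARQL against DBpedia's public endpoint. A minimal sketch of constructing such a request follows; the query itself is an illustrative example, not one from the paper, and issuing the actual HTTP call is omitted:

```python
from urllib.parse import urlencode

# DBpedia exposes a public SPARQL endpoint at this URL.
ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: five cities with their English labels.
query = """
SELECT ?city ?name WHERE {
  ?city a dbo:City ;
        rdfs:label ?name .
  FILTER (lang(?name) = "en")
}
LIMIT 5
"""

# The endpoint accepts the query and the desired result format as
# ordinary URL parameters on a GET request.
params = urlencode({"query": query,
                    "format": "application/sparql-results+json"})
request_url = ENDPOINT + "?" + params
print(request_url)
```

Sending the resulting URL over HTTP returns the bindings as JSON, which is how "human- and machine-consumption" clients typically consume DBpedia data.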
Article
Wikipedia is the world's largest collaboratively edited source of encyclopaedic knowledge. But in spite of its utility, its content is barely machine-interpretable and only weakly structured. With Semantic MediaWiki we provide an extension that enables wiki-users to semantically annotate wiki pages, based on which the wiki contents can be browsed, searched, and reused in novel ways. In this paper, we give an extended overview of Semantic MediaWiki and discuss experiences regarding performance and current applications.
Article
This paper explores the system of categories that is used to classify articles in Wikipedia. It is compared to collaborative tagging systems like del.icio.us and to hierarchical classification like the Dewey Decimal Classification (DDC). Specifics and commonalities of these systems of subject indexing are exposed. Analysis of structural and statistical properties (descriptors per record, records per descriptor, descriptor levels) shows that the category system of Wikipedia is a thesaurus that combines collaborative tagging and hierarchical subject indexing in a special way.