Figure 1 - uploaded by Alexander Zipf
The OpenStreetMap Infrastructure/Geostack (simplified).

Source publication
Article
Full-text available
The OpenStreetMap (OSM) project, a well-known source of freely available worldwide geodata collected by volunteers, has experienced a consistent increase in popularity in recent years. One of the main caveats closely related to this increase in popularity is the different types of vandalism that occur in the project's database. Since the applicabil...

Context in source publication

Context 1
... database), the OSM project provides so-called OSM Change Files (OSC), also referred to as "Diffs", which can be downloaded. These files only contain the latest changes to the database and are available for different time frames, such as every minute, hour or day. The format of the OSC files will be explained in more detail in a later section of this paper. Figure 1 shows a simplified version of the OSM project infrastructure. By using one of the freely available OSM editors, contributors can edit any object in the project's database. External applications, such as routing or mapping applications, can use the project's data by retrieving the dump and diff files from the database. On average, nearly 700 new members registered to the project each day between January and March 2012. According to [23], nearly 30% of those newly registered contributors will become active contributors (and not only registered users). That is, each day in 2012, about 230 new OSM members started contributing to OSM. Table 1 contains the average number of edits per OSM object type (node, way and relation) per day between January and June 2012. Using these numbers to estimate future processing workloads for our suggested tool, it can be determined that (on average) every minute, 830 node creations, 190 node modifications and 135 node deletions will have to be processed. Furthermore, 90 new, 48 modified and 11 deleted ways and one new, two modified and 0.2 deleted relations have to be processed per minute. These numbers can obviously vary during the day, but they give a first indication of how much data will be edited. The open approach to data collection in the OSM project can lead to a variety of types of vandalism. It is possible that a contributor purposely or accidentally makes changes to the dataset that harm the project's main goal. Common vandalism types that appear in the actual OSM geodata database are (based upon ...
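As a rough illustration of the per-minute workload figures above, the following minimal Python sketch counts creations, modifications and deletions in an OSC diff. It assumes a minutely diff has already been downloaded (e.g. from the planet.osm.org replication service) and decompressed to a local file named diff.osc; the file name and the counting logic are illustrative, not part of the tool proposed in the paper.

```python
# Minimal sketch: count per-action, per-object-type edits in one OSC diff.
# Assumption: diff.osc is a decompressed minutely change file.
import xml.etree.ElementTree as ET
from collections import Counter

def count_changes(osc_path: str) -> Counter:
    counts = Counter()
    tree = ET.parse(osc_path)
    # <osmChange> groups edits into <create>, <modify> and <delete> blocks,
    # each containing <node>, <way> and <relation> elements.
    for block in tree.getroot():
        if block.tag in ("create", "modify", "delete"):
            for obj in block:
                counts[(block.tag, obj.tag)] += 1
    return counts

if __name__ == "__main__":
    for (action, obj_type), n in sorted(count_changes("diff.osc").items()):
        print(f"{action:>7} {obj_type:>8}: {n}")
```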

Similar publications

Article
Full-text available
Argumentation is understood as an intellectual and social activity focused on defending or refuting a point of view, with the purpose of reaching an agreement on ideas. Objective: The purpose of this research was to characterize the arguments of Basic Education students during the development of didactic situations media...
Article
Full-text available
The current global economic instability and the vulnerability of small nations provide the impetus for greater integration between the countries of the South Pacific region. This exercise is critical for their survival. Past efforts of regional integration in the South Pacific have mostly failed. However, today's IT collaborative capabilities provi...
Article
Full-text available
Of late, social tagging has become a popular trend in information organisation. In the context of digital resources, the tags assigned by users also play a vital role in information retrieval. For information discovery, the 'terms' used to retrieve results also depend upon the 'relevancy' or 'weightage' of the keywords. This study investigates 'relevancy...

Citations

... OSM enables contributors to edit, create, and delete content on the world map. Numerous studies have demonstrated that contributions to OSM are not uniform, revealing areas with substantial activity as well as those with minimal contributions [7,8]. ...
... Consequently, the quality of spatial data in the OSM project cannot be assured. Additionally, a small number of vandalism cases, where errors are introduced into the OSM map intentionally, have been detected [8]. ...
Article
Full-text available
OpenStreetMap (OSM) is among the most prominent Volunteered Geographic Information (VGI) initiatives, aiming to create a freely accessible world map. Despite its success, the data quality of OSM remains variable. This study begins by identifying the quality metrics proposed by earlier research to assess the quality of OSM building footprints. It then evaluates the quality of OSM building data from 2018 and 2023 for five cities within Québec, Canada. The analysis reveals a significant quality improvement over time. In 2018, the completeness of OSM building footprints in the examined cities averaged around 5%, while by 2023, it had increased to approximately 35%. However, this improvement was not evenly distributed. For example, Shawinigan saw its completeness surge from 2% to 99%. The study also finds that OSM contributors were more likely to digitize larger buildings before smaller ones. Positional accuracy saw enhancement, with the average error shrinking from 3.7 m in 2018 to 2.3 m in 2023. The average distance measure suggests a modest increase in shape accuracy over the same period. Overall, while the quality of OSM building footprints has indeed improved, this study shows that the extent of the improvement varied significantly across different cities. Shawinigan experienced a substantial increase in data quality compared to its counterparts.
... Over the past two decades, several researchers have attempted to assess the usefulness and quality of Volunteered Geographic Information (VGI) in different geomatics applications [1][2][3][4][5][6][7][8][9][10][11][12][13]. OpenStreetMap (OSM) has garnered significant attention due to its provision of a free, editable, and readily available map of the world, which is collaboratively created by thousands of contributors [14]. ...
... OpenStreetMap (OSM) has garnered significant attention due to its provision of a free, editable, and readily available map of the world, which is collaboratively created by thousands of contributors [14]. OSM emerged with Web 2.0 technology, which allows users (OSM contributors) to create content (such as text, images or video) and send it to the server to be shared with other users [2,[15][16][17][18]. Since contributors are not required to have any particular level of knowledge in geomatics to modify OSM data, the quality of OSM data is not guaranteed [19][20][21]. ...
Chapter
Full-text available
Numerous studies have attempted to assess the quality of OpenStreetMap's building data by comparing it to reference datasets. Map matching (feature matching) is a critical step in this method of quality assessment, involving the matching of polygons in the two datasets. Researchers commonly use two main polygon matching algorithms: 1) the buffer intersection method and 2) the centroid comparison method. While these methods are effective for the majority of OSM building footprints, they may not achieve high accuracy in complex situations. One possible reason is that both methods only consider the position of the OSM polygon compared to that of the reference polygon. To improve these matching algorithms and propose a more robust solution, this study proposes an algorithm that considers shape similarity (using the average distance method) in addition to position similarity to better identify corresponding polygons in the two datasets. The experimental results for five cities in the Province of Quebec indicate that the proposed algorithm can reduce the matching error of previous map matching algorithms from approximately 8% to approximately 3%. Furthermore, the study found that the proposed polygon matching algorithm performs more accurately than previous methods when buildings consist of multiple polygons. Keywords: OpenStreetMap, Feature Matching, Map Matching, OSM Polygon Matching, OSM Buildings Footprint Quality, Volunteered Geographic Information, Spatial Data Quality
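A minimal sketch of the general idea of combining a positional check with a shape-similarity term is shown below. It uses shapely and substitutes the Hausdorff distance for the authors' average-distance measure; the buffer size and weighting are assumptions for illustration, not the chapter's actual algorithm.

```python
# Sketch of a combined position + shape matching score for OSM vs. reference
# building polygons. The shape term uses the Hausdorff distance as a stand-in
# for the average-distance measure described in the abstract (an assumption).
from shapely.geometry import Polygon

def match_score(osm_poly: Polygon, ref_poly: Polygon,
                buffer_m: float = 5.0) -> float:
    # Position check: buffered OSM polygon must overlap the reference footprint
    # or contain its centroid (mirroring the two classical methods).
    buffered = osm_poly.buffer(buffer_m)
    if not (buffered.intersects(ref_poly) or buffered.contains(ref_poly.centroid)):
        return 0.0
    # Shape similarity: normalise the Hausdorff distance by the outline length,
    # so values near 1.0 indicate near-identical shapes.
    size = max(osm_poly.length, ref_poly.length, 1e-9)
    shape_sim = max(0.0, 1.0 - osm_poly.hausdorff_distance(ref_poly) / size)
    # Positional component: intersection over union of the two footprints.
    iou = osm_poly.intersection(ref_poly).area / osm_poly.union(ref_poly).area
    return 0.5 * iou + 0.5 * shape_sim
```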
... Further works also observed the relationship between editing behavior, link structure, and article quality on Wikipedia [19]. The study of vandalism detection in other open-source platforms like Wikidata and OpenStreetMap, which observe vandalism patterns and propose detection approaches, also provide valuable insights applicable to our tasks due to shared similarities [10,15]. ...
... Each inference service has production images that are published in the WMF Docker Registry. 15 The final model was deployed to the ML staging cluster and allocated up to 4 CPUs and 6 Gi memory resources to the pod. The service endpoint is exposed using Istio as an ingress to the API consumers. ...
Preprint
This paper presents a novel design of a system aimed at supporting the Wikipedia community in addressing vandalism on the platform. To achieve this, we collected a massive dataset covering 47 languages and applied advanced filtering and feature engineering techniques, including multilingual masked language modeling, to build the training dataset from human-generated data. The performance of the system was evaluated through comparison with the one used in production in Wikipedia, known as ORES. Our research results in a significant increase in the number of languages covered, making Wikipedia patrolling more efficient for a wider range of communities. Furthermore, our model outperforms ORES, ensuring that the results provided are not only more accurate but also less biased against certain groups of contributors.
... The risk of vandalism, while not completely avoided, is mitigated by a series of automated multi-criteria alerts and tools under continuous development, applied to edits and user profiles, which allow changes to be flagged and checked one by one by "patrols" dedicated to verification. Authors are contacted personally, and each case is treated appropriately and accepted, corrected, or reverted (Neis, Goetz, and Zipf 2012). Improvements are always possible in this domain, and many methodologies exist 5 . ...
Article
Full-text available
This paper presents an overview of the integration of participatory processes in the production of official data. Through a series of interviews with strategic stakeholders, we identified the key elements needed to institutionalize citizen science in the production of geospatial information. This article discusses practical contexts of use of data produced or complemented by citizens in Mexico. We analyze the institutional processes that facilitate or hinder integration into official mechanisms for generating more accurate cartographic information in various institutions, focusing on possible adoption, in particular by the National Institute of Statistics and Geography (INEGI) of Mexico. Resources, data integration models, workflows, and an organizational structure are needed to benefit from citizen science. We find that the adoption of citizen science within an organization is subject to a well-defined and structured interest driven by leadership and implemented collectively. This represents a paradigm shift in obtaining information: citizen science as official data, delivered through concrete and functional information products, will allow end users to benefit from timely and accurate data. The purpose of this article is thus to generate organizational knowledge on how to use citizen science in public institutions, with a long-term perspective, to mitigate the lack of current and accurate spatial data and to participate in social innovation.
... Besides general quality assessment studies applicable to both commercial and crowdsourcing sources, such as the one in Section 5, more specific applications of the framework and its implementation are possible. For example, in the context of VSVI, the work may be used to associate the quality metrics to each contributor (a quality contribution score can be assigned to reflect trustworthiness and effort of different contributors) and to detect vandalism, a topic that gained interest in other instances of VGI (Neis et al., 2012;Li et al., 2021a). Further, the derived quality elements may be adopted by services and help users filter for imagery suitable for their analysis and help contributors identify areas in need of data of better quality or updated coverage. ...
Article
Full-text available
Street view imagery (SVI) is increasingly in competition with traditional remote sensing sources and is becoming dominant in a myriad of studies, mainly thanks to the omnipresence of commercial services such as Google Street View. Similar to other spatial data, SVI may be of variable quality and burdened with a variety of errors. Recently, this concern has been amplified with the rise of volunteered SVI such as Mapillary and KartaView, which – akin to other instances of Volunteered Geographic Information (VGI) – are of heterogeneous quality. However, unlike with many other forms of spatial data, there has not been much discussion about the quality of SVI datasets, let alone a standard and mechanism to assess them. Further, current spatial data quality standards are not entirely applicable to SVI due to its particularities. Following a multi-pronged method, we establish a comprehensive framework for describing and assessing the quality of SVI. We present a categorised set of 48 elements that describe the quality of imagery and associated data such as geographic information and metadata. The framework is applicable to any source of SVI, including both commercial and crowdsourcing services. In the implementation, which we release open-source, we assess several quality elements of SVI datasets across 9 cities. The results expose the varying quality of SVI and affirm the importance of the work. Given the exponential volume of studies taking advantage of SVI, but largely overlooking quality aspects, this work is a timely contribution that will benefit data providers, contributors, and users. It may also be applied to other forms of image-based VGI, and underpin the establishment of a formal international standard in the future. From a broader perspective, while providing an overdue definition of SVI, this work also reveals issues and open questions that impede delineating and assessing this diverse form of urban and terrestrial imagery.
... Solutions for this also exist. Systems have been developed both to monitor the quality of the work and to check for unwanted mappings (Juhász et al., 2020; Neis et al., 2012; Sehra et al., 2014). ...
Article
Full-text available
The Enlightenment, around the end of the 18th century, marked the beginning of the scientific paradigm and the social and societal order we know today. There is a general belief that society develops through social and scientific innovation, and that progress rests on a foundation of education and learning. This paradigm has continuously been institutionalized through schools, universities and the academic constitution of science. But something is changing in the way institutions conduct research today. The change comes, among other things, from citizens taking their mobile phone (smartphone) in hand and going out to do citizen science. This article presents examples from the research project PERICLES, where Citizen Science has been used in connection with data collection.
... On VGI platforms (e.g., OpenStreetMap), many POIs have missing labels, which can affect the integrity and accuracy of related data products. To address this issue, developers have proposed incorporating control mechanisms into the platforms to ensure the quality and validation of uploaded data (Hung et al., 2016; Neis et al., 2012). There are also many studies on predicting object types and patterns by integrating spatial relationship information and contextual content. ...
Article
Full-text available
The point of interest (POI) is a critical part of a spatial database and has been widely used in many fields, such as navigation, mapping and urban planning. The data quality of POIs (e.g., missing and incorrect labels) has a large impact on the effectiveness of geospatial applications, especially given the non-professional collection characteristics of OpenStreetMap (OSM) data. Conventional neural network models predict multi-category data labels directly at a single level and do not consider the uneven distribution of data among POI categories. The predicted labels tend to be the types with larger data volumes, and it is difficult to generalize to small-sample categories. Taking into account the large differences in data volume among POI categories, this paper proposes a neural network prediction method based on a multi-level POI category organization. Through the hierarchical aggregation of small-sample categories, we established a POI category tree structure, which achieves a relatively balanced division of data volume at the different levels of the tree. Specifically, the proposed method first roughly classifies the POIs at an abstract level and then infers the detailed labels according to the hierarchy of the tree structure. We conducted extensive experiments on two datasets, and the results demonstrate that our method outperforms traditional methods by a large margin.
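The coarse-to-fine idea described above can be sketched as follows. This is a minimal illustration using a hypothetical two-level category tree and logistic-regression stand-ins rather than the authors' neural network; the taxonomy and class names are invented for the example.

```python
# Sketch of coarse-to-fine POI classification over a (hypothetical) category
# tree: predict the top-level branch first, then refine within that branch.
import numpy as np
from sklearn.linear_model import LogisticRegression

CATEGORY_TREE = {
    "amenity": ["restaurant", "school", "hospital"],
    "shop": ["supermarket", "bakery", "clothes"],
}

class HierarchicalPOIClassifier:
    def __init__(self):
        self.coarse = LogisticRegression(max_iter=1000)
        self.fine = {parent: LogisticRegression(max_iter=1000)
                     for parent in CATEGORY_TREE}

    def fit(self, X, coarse_labels, fine_labels):
        X = np.asarray(X)
        coarse_labels = np.asarray(coarse_labels)
        self.coarse.fit(X, coarse_labels)
        for parent in CATEGORY_TREE:
            mask = coarse_labels == parent
            self.fine[parent].fit(X[mask], np.asarray(fine_labels)[mask])

    def predict(self, X):
        X = np.asarray(X)
        parents = self.coarse.predict(X)
        # Refine each prediction within the branch chosen at the coarse level.
        return [(p, self.fine[p].predict(x.reshape(1, -1))[0])
                for p, x in zip(parents, X)]
```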
... OSM users can produce geographical data in various ways. Initially, the main way to produce geographic data was to use a GPS receiver, collect some coordinates, and later import them into the OSM database (Neis et al. 2012). After November 2010, Bing map aerial imagery was added to OSM, which allowed users to produce data not only by GPS but also by digitizing aerial imagery (Neis et al. 2012). ...
... Initially, the main way to produce geographic data was to use a GPS receiver, collect some coordinates, and later import them into the OSM database (Neis et al. 2012). After November 2010, Bing map aerial imagery was added to OSM, which allowed users to produce data not only by GPS but also by digitizing aerial imagery (Neis et al. 2012). In fact, it was no longer necessary to be present at the location, and users began to perform remote mapping. ...
... In the case of OSM, the issue of quality is even more important, because data contributors are not necessarily experts. In addition to the inherent uncertainties in the OSM data, Neis et al. (2012) found various cases of intentional vandalism (deliberately introducing errors into the database) in the OSM project. In this section, the components of geographic data quality, as well as the VGI/OSM quality measures, are synthesized and discussed. ...
Article
Full-text available
OpenStreetMap (OSM) is one of the most well-known volunteered geographic information (VGI) projects that aims to produce a free world map. However, there are serious concerns about its quality. Numerous studies have assessed the quality of OSM by comparing the OSM database with a reference database. Several researchers have proposed the use of quality indicators as variables that can describe OSM quality in regions where no reference data are available. A quality indicator is a variable that has a significant monotonic relationship with quality measures. In this study, a literature review was conducted to identify and define the main quality measures proposed for assessing the quality of linear features. Owing to limited access to current data, only three quality elements—completeness, positional accuracy, and attribute accuracy—were evaluated in this study. These quality measures were then used to assess the quality of the OSM roads in the province of Quebec. Finally, Spearman's rank correlation coefficient test was applied to determine whether there was a significant correlation between the quality measures related to the three quality elements and five potential quality indicators: population, average income, density of OSM roads, density of OSM buildings, and number of points of interest (POI). The main contribution of this study is testing the following hypothesis: "There is a significant correlation between the five mentioned variables and the measures related to the three quality elements". Statistical analysis showed that, in terms of completeness, the density of OSM roads and population were the best indicators; in terms of positional accuracy, population and income were the best indicators; and in terms of attribute accuracy, completeness was the best indicator. All five variables have significant correlations with the measures of the three elements of quality, except for the following two pairs: (attribute accuracy, density of OSM roads) and (attribute accuracy, density of OSM buildings). This study proposes the density of OSM roads and the number of POIs as two new quality indicators that have not been found in the literature.
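The indicator test described above reduces to a rank-correlation check between a quality measure and a candidate indicator across analysis units. Below is a minimal sketch using scipy; the column names, the CSV file, and the 0.05 significance threshold are illustrative assumptions, not the paper's actual data or setup.

```python
# Minimal sketch: Spearman's rank correlation between a quality measure
# (e.g. road completeness) and a candidate indicator (e.g. population).
import pandas as pd
from scipy.stats import spearmanr

def test_indicator(df: pd.DataFrame, measure: str, indicator: str,
                   alpha: float = 0.05) -> dict:
    rho, p_value = spearmanr(df[measure], df[indicator])
    return {
        "measure": measure,
        "indicator": indicator,
        "rho": rho,
        "p_value": p_value,
        "significant": p_value < alpha,  # evidence of a monotonic relationship
    }

# Hypothetical usage with one row per analysis unit (e.g. grid cell):
# df = pd.read_csv("quebec_road_quality.csv")
# print(test_indicator(df, "completeness", "population"))
```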
... While the existing literature has considered OSM vandalism previously, only a few automated approaches for vandalism detection in OSM exist. An early approach proposed in [5] adopts a rule-based method to identify suspicious edits. This approach relies on numerous manually tuned thresholds. ...
... OSMPatrol. This model is an early approach to detect vandalism in OSM [5]. OSMPatrol is a rule-based system aiming to identify vandalism at the level of OSM edits. ...
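To make the rule-based idea concrete, here is a hedged sketch of what such a threshold-driven check might look like. The rules and threshold values are invented for illustration and do not reproduce OSMPatrol's actual rule set.

```python
# Illustrative rule-based vandalism check in the spirit of the cited approach.
# Thresholds are placeholders, not OSMPatrol's configuration.
from dataclasses import dataclass

@dataclass
class Edit:
    user_edit_count: int    # contributor's previous edits
    account_age_days: int   # age of the contributor's account
    objects_created: int    # objects created in this changeset
    objects_deleted: int    # objects deleted in this changeset

def flag_suspicious(edit: Edit,
                    max_deletes_new_user: int = 50,
                    min_account_age_days: int = 1,
                    mass_edit_threshold: int = 10000) -> list:
    reasons = []
    if edit.account_age_days < min_account_age_days and edit.objects_deleted > 0:
        reasons.append("deletions from a brand-new account")
    if edit.user_edit_count < 10 and edit.objects_deleted > max_deletes_new_user:
        reasons.append("mass deletion by an inexperienced user")
    if edit.objects_created + edit.objects_deleted > mass_edit_threshold:
        reasons.append("unusually large changeset")
    return reasons

print(flag_suspicious(Edit(user_edit_count=2, account_age_days=0,
                           objects_created=5, objects_deleted=120)))
```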
Preprint
OpenStreetMap is a unique source of openly available worldwide map data, increasingly adopted in real-world applications. Vandalism detection in OpenStreetMap is critical and remarkably challenging due to the large scale of the dataset, the sheer number of contributors, various vandalism forms, and the lack of annotated data to train machine learning algorithms. This paper presents Ovid - a novel machine learning method for vandalism detection in OpenStreetMap. Ovid relies on a neural network architecture that adopts a multi-head attention mechanism to effectively summarize information indicating vandalism from OpenStreetMap changesets. To facilitate automated vandalism detection, we introduce a set of original features that capture changeset, user, and edit information. Our evaluation results on real-world vandalism data demonstrate that the proposed Ovid method outperforms the baselines by 4.7 percentage points in F1 score.
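The core idea, attending over a changeset's per-edit feature vectors and pooling them into a single vandalism score, can be sketched as follows in PyTorch. The dimensions, mean pooling, and classifier head are assumptions for illustration only, not the Ovid architecture as published.

```python
# Minimal PyTorch sketch: multi-head attention over per-edit features,
# pooled into one vandalism probability per changeset. Layer sizes are
# illustrative assumptions, not Ovid's published configuration.
import torch
import torch.nn as nn

class ChangesetScorer(nn.Module):
    def __init__(self, feat_dim: int = 32, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(feat_dim, 16), nn.ReLU(),
                                  nn.Linear(16, 1))

    def forward(self, edits: torch.Tensor) -> torch.Tensor:
        # edits: (batch, num_edits, feat_dim) - one feature vector per edit
        attended, _ = self.attn(edits, edits, edits)
        pooled = attended.mean(dim=1)             # summarize the changeset
        return torch.sigmoid(self.head(pooled))   # vandalism probability

# Example: score a batch of 2 changesets with 5 edits each.
scores = ChangesetScorer()(torch.randn(2, 5, 32))
```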
... The existing literature has considered OSM vandalism previously, but only a few automated approaches for vandalism detection in OSM exist. An early approach proposed in [18] adopts a rule-based method to identify suspicious edits. However, configuring the rules manually is tedious and error-prone. ...
... No. past creates [18], past modifications, past deletes. User experience plays a vital role in quantifying user credibility [18]. ...
... No. past creates [18], past modifications, past deletes. User experience plays a vital role in quantifying user credibility [18]. We quantify user experience as the number of previously created, modified, and deleted objects and add these numbers as features. ...
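A short sketch of how such user-experience counts could be derived from a contributor's edit history is given below. The edit-record structure is hypothetical; a real pipeline would read these counts from the OSM full-history dump or an API.

```python
# Sketch: derive the user-experience features mentioned above (numbers of
# previously created, modified and deleted objects) from an edit history.
from collections import Counter

def user_experience_features(edit_history: list) -> dict:
    counts = Counter(edit["action"] for edit in edit_history)
    return {
        "past_creates": counts["create"],
        "past_modifications": counts["modify"],
        "past_deletes": counts["delete"],
    }

history = [
    {"action": "create", "object": "node/1"},
    {"action": "modify", "object": "way/7"},
    {"action": "create", "object": "node/2"},
]
print(user_experience_features(history))
# {'past_creates': 2, 'past_modifications': 1, 'past_deletes': 0}
```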
Preprint
OpenStreetMap (OSM), a collaborative, crowdsourced Web map, is a unique source of openly available worldwide map data, increasingly adopted in Web applications. Vandalism detection is a critical task to support trust and maintain OSM transparency. This task is remarkably challenging due to the large scale of the dataset, the sheer number of contributors, various vandalism forms, and the lack of annotated data. This paper presents Ovid - a novel attention-based method for vandalism detection in OSM. Ovid relies on a novel neural architecture that adopts a multi-head attention mechanism to summarize information indicating vandalism from OSM changesets effectively. To facilitate automated vandalism detection, we introduce a set of original features that capture changeset, user, and edit information. Furthermore, we extract a dataset of real-world vandalism incidents from the OSM edit history for the first time and provide this dataset as open data. Our evaluation conducted on real-world vandalism data demonstrates the effectiveness of Ovid.