Article

Data quality assessment

Authors: Leo L. Pipino, Yang W. Lee, Richard Y. Wang

... Table 1 - Description of attributes by dimension. Source: adapted from Pipino et al. (2002) ...
... In this sense, when assessing data quality one must be prepared, on the one hand, to capture the users' subjective perceptions and, on the other, to carry out an objective evaluation. The objective evaluation was performed through metrics that made it possible to unequivocally validate the four data quality dimensions (Pipino et al., 2002). ...
... There are two types of metrics, process-dependent and process-independent, intended to carry out the objective evaluation in three classes: simple ratio, min or max value, and arithmetic mean (Pipino et al., 2002 ...
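The metric forms named in the snippet above can be illustrated with a minimal Python sketch: the simple ratio counts undesirable outcomes against total outcomes, and the min operation conservatively aggregates several normalized indicators. The column names and numbers below are invented for illustration only.

```python
from typing import Iterable

def simple_ratio(violations: int, total: int) -> float:
    """Simple ratio: 1 minus the fraction of undesirable outcomes (Pipino et al., 2002)."""
    if total == 0:
        return 1.0  # assumption: an empty set has no violations
    return 1 - violations / total

def min_operation(indicators: Iterable[float]) -> float:
    """Min operation: a conservative aggregate of several normalized indicators."""
    return min(indicators)

# Hypothetical example: completeness of one column, believability from three sources.
completeness = simple_ratio(violations=120, total=10_000)   # 0.988
believability = min_operation([0.95, 0.80, 0.99])           # 0.80
print(completeness, believability)
```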
Conference Paper
Information quality is essential for the survival of any institution, so health services need quality systems that respond effectively to the demands of their users. The quality of e-Vacinas was evaluated using the Data Quality Assessment methodology in two districts of the Alentejo, which revealed that this information system plays an important role in the quality of care delivery, namely in vaccination, in terms of the efficiency and effectiveness of patient service and in reducing the risk of errors. e-Vacinas is a quality system, since the evaluation of the four dimensions proposed by the authors shows a strong overall result, further demonstrating that this application is an essential working tool in care delivery, facilitating communication and information sharing and thus promoting coordinated interventions among professionals.
... Professors Pipino, Lee and Wang described principles that can help organizations create their own metrics. Studies have confirmed that data quality is a multidimensional concept (Pipino et al., 2002). Companies have to deal with two things, namely: the subjective perceptions of the individuals involved with the data, and the objective measurements based on the data set in question. ...
... Other dimensions that can be evaluated using this form include concise representation, relevancy, and ease of manipulation (Pipino, 2002). Min or Max Operation -- to handle dimensions that require the aggregation of multiple data quality indicators, the minimum or maximum operation can be applied. ...
... The company wants to ensure the rating is normalized: each weighting factor should be between zero and one, and the weighting factors should add to one. Regarding believability, if the company can specify the degree of importance of each of the variables to the overall believability measure, the weighted average may be an appropriate form to use (Pipino et al., 2002). ...
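As a companion to the snippet above, a small Python sketch of the weighted-average form with the normalization checks it mentions (weights in [0, 1] that sum to one); the variable names and weights are hypothetical.

```python
def weighted_average(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted-average form: each weight lies in [0, 1] and all weights sum to one."""
    total_weight = sum(weights.values())
    if abs(total_weight - 1.0) > 1e-9 or any(not 0 <= w <= 1 for w in weights.values()):
        raise ValueError("weights must lie in [0, 1] and sum to one")
    return sum(scores[k] * weights[k] for k in scores)

# Hypothetical believability assessment built from three variables.
scores = {"source_reputation": 0.9, "internal_consistency": 0.7, "age_of_data": 0.6}
weights = {"source_reputation": 0.5, "internal_consistency": 0.3, "age_of_data": 0.2}
print(weighted_average(scores, weights))  # 0.78
```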
... (e.g., reliability, accessibility, timeliness) of data quality (Schelling and Robertson, 2020). Wider computing research illustrates different frameworks to assess data quality (Batini et al., 2009;Cichy and Rass, 2019) including combined subjective and objective methods (Pipino et al., 2003). However, we opted to refer to the assessment of information quality rather than data quality as theoretically (from the DIEK hierarchy in Figure 1) practitioners are more inclined to create evidence and knowledge from information sources rather than direct data sources (Dammann, 2018). ...
... However, in future, there is a definite need to conduct sport informatics research for developing more objective methods for assessing information quality in sporting environments. Researchers could refer to information sciences research on objective assessments of data quality to guide such developments (Pipino et al., 2003). Finally, especially due to the optimization of the resistance information flow (Figure 5B), there were reductions in the workloads of the HPU staff (e.g., S&C Coaches). ...
Article
Full-text available
Practical case studies elaborating end-to-end attempts to improve the quality of information flows associated with athlete management processes are scarce in the current sport literature. Therefore, guided by a Business Process Management (BPM) approach, the current study presents the outcomes from a case study to optimize the quality of strength and conditioning (S&C) information flow in the performance department of a professional rugby union club. Initially, the S&C information flow was redesigned using integral technology, activity elimination and activity automation redesign heuristics. Utilizing the Lean Startup framework, the redesigned information flow was digitally transformed by designing data collection, management and visualization systems. The usability of the data collection systems was assessed against industry benchmarks using the System Usability Scale (SUS) administered to 55 players; the mean SUS score of 87.6 ± 10.76 was well above the average benchmark for similar systems (Grade A on the SUS scale). In the data visualization system, 14 minor usability problems were identified from 9 cognitive walkthroughs conducted with the High-Performance Unit (HPU) staff. Pre-post optimization information quality was subjectively assessed by administering a standardized questionnaire to the HPU members. The results indicated positive improvements in all of the information quality dimensions (with major improvements to accessibility) relating to the S&C information flow. Additionally, the methods utilized in the study would be especially beneficial for sporting environments requiring cost-effective and easily adoptable information flow digitization initiatives that need to be implemented by their internal staff members.
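For readers unfamiliar with the SUS figure quoted above, the conventional System Usability Scale scoring rule can be sketched as follows; this is the standard scoring procedure, not code from the study, and the sample responses are invented.

```python
def sus_score(responses: list[int]) -> float:
    """Standard System Usability Scale scoring for one respondent.

    `responses` holds the ten 1-5 Likert answers in questionnaire order:
    odd-numbered items contribute (answer - 1), even-numbered items (5 - answer),
    and the sum is scaled by 2.5 to give a 0-100 score.
    """
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("expected ten answers, each between 1 and 5")
    total = sum(r - 1 if i % 2 == 0 else 5 - r for i, r in enumerate(responses))
    return total * 2.5

# Hypothetical respondent: strong agreement with positive items, disagreement with negative ones.
print(sus_score([5, 1, 5, 2, 4, 1, 5, 1, 4, 2]))  # 90.0
```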
... Data are considered one of the most important assets for any kind of organization [1]. Making use of data with an inadequate level of quality may lead organizations to experience operational and, consequently, economic losses [2][3][4][5][6]. Brief examples of this statement are the following [6,7]: the cost of data quality problems is 8-25% of an organization's revenue; 40-50% of companies' budgets is dedicated to solving problems associated with the low quality of the handled data; organizations can experience fraudulent reimbursements, exposure to non-delimited risks, payroll overpayments and under-billing [8]; and things could go even worse: although 11% of USA firms acknowledged having problems managing data quality, only 48% have plans for managing data quality [7]. ...
... Some examples of these experiences were quantified and tabulated in [33]. Thus, if the users assess the level of quality of data as poor, their tasks may become negatively influenced by this assessment [4]. ...
... As far as ensuring data quality is concerned, in the literature [13][14][15] the following four main dimensions can be distinguished through which a level of data quality as high as possible can be ensured [1]: i) data accuracy measures the degree to which the data stored in databases represent the real-world elements they stand for; ii) data consistency refers to the data's property of respecting the integrity constraints; ...
... Data currency or timeliness is an indicator that measures the degree to which the data are up to date in relation to the specific activity for which they are used [15]. As in the case of the API provided by the National Institute of Statistics of Romania, the API provided by Eurostat presents the same deficiency: the update date is available for the data set as a whole and not for the individual records within the data set [1]. ...
Article
Full-text available
With the continuous evolution of society, the analysis of the quality of life of the population has become an increasingly complex process, for which it is necessary to evaluate not only the factors that measure the financial power and the degree of economic development of the region, but also those through which the integration of individuals in society and their involvement in a well-functioning community can be appreciated. The importance of such an analysis is revealed by the implications that insufficient, or even absent, measures to improve the standard of living have on members of society. Thus, as a result of the need to determine living conditions, the implementation of the European Life Index Framework has been proposed. The Framework aims to automate the process of determining the quality of life of the population, data which the public authorities can use to easily determine the necessary steps for integrating disadvantaged people, reducing the poverty rate of the population, and improving quality of life. After analyzing the level of quality of life in the European Union for the period 2007-2017, we have noticed that in the case of the former communist states, the quality of life standard is lower than that of the states whose political trajectory lay outside the influence of the communist dictatorial regime. Also, due to public policies mainly oriented towards citizens, the Nordic states have registered the highest values of the Quality of Life Index, surpassing even the countries of Continental Europe.
... Data collection was performed at LANEM-IQ-UNAM with a Rigaku MicroMax-007HF rotating anode and a Dectris Pilatus3R 200K-A detector (Rigaku, The Woodlands, TX, US). Data were processed using HKL3000 [26], and the spatial group was revised with the program Pointless from CCP4 [27]. The .mtz file from pointless was then used in AutoSol (Phenix) [28] to obtain the phases. ...
... The .mtz file from pointless was then used in AutoSol (Phenix) [28] to obtain the phases. Initial modeling was done automatically with ArpWarp (CCP4) [27], followed by manual modeling using Coot [29]. ...
Article
Full-text available
Escherichia coli (E. coli), one of the most frequently used hosts for the expression of recombinant proteins, is often affected by the toxic effect of the exogenous proteins that it is required to express. Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) is a multi-functional protein that has been used as a control marker for basal function, and it is known to undergo cysteine oxidation under different types of cellular stress. Here, we report the 3D structure of the endogenous GAPDH purified from stressed E. coli cells expressing a eukaryotic protein. The structure was solved at 1.64 Å using single-wavelength anomalous dispersion (SAD) phasing with a selenium-modified enzyme. Interestingly, each GAPDH monomer contains a molecule of glyceraldehyde-3 phosphate in a previously unidentified site. Furthermore, the catalytic Cys149 is covalently attached to a 300 Da molecule, possibly glutathione. This modification alters the conformation of an adjacent alpha helix in the catalytic domain, right opposite the NAD+ binding site. The conformation of the alpha helix is stabilized after soaking the crystals with NAD+. These results exemplify the effects that the overexpression of an exogenous protein has on the host proteins and shed light on the structural changes that large oxidant molecules on the catalytic cysteine produce in the GAPDH enzyme.
... Although HDI and WHOQOF are two indexes useful in determining a country's level of development, the European Union's Statistical Office (Eurostat) proposed that, in official reporting, the quality of life be measured using the following dimensions [4]: a) material and living conditions; b) productive or main activity; c) health; d) education; e) leisure and social interactions; f) economic and physical safety; g) governance and basic rights; h) natural and living environment; i) overall experience of life. As far as the quality of the data is concerned, in the literature [5][6][7] the following four central dimensions can be distinguished, which ensure a level of data quality as high as possible: i) data accuracy - measures the degree to which the data in the database represent the real-life elements they stand for; ii) data consistency - the data's property of respecting the integrity constraints; iii) information completeness - measures the capacity of the database to offer complete information in response to user queries; iv) data currency - reflects the degree to which the data are up to date. ...
... Data currency or timeliness is an indicator which measures the quality of data by reflecting the degree of their timeliness in relation to the nature of the activity for which the data are being used [7]. Although the API of the INSSE includes an entry regarding the last date of update of the data sets, this date is available only for the data sets as entities, and not for the records within the data sets. ...
Conference Paper
Full-text available
The analysis of the quality of life has become an increasingly complex process through which a number of economic, political and social factors are examined in order to identify the necessary measures to be taken for the social inclusion of disadvantaged people, to reduce the poverty rate of the population and to improve living conditions. Through a detailed analysis of the determinants of living standards, we identified the progress that Romania has made from 2007, the year of accession to the European Union, until 2016, the year for which the latest open data sets are available. Thus, an important discrepancy has been identified between the level of quality of life in the region that obtained the highest score (excluding the region which covers the Romanian capital) and the region that obtained the lowest score. With an income difference of 28% and with four times more foreign investment attracted in the Central Region than in the North-Eastern Region, the identification and adoption of administrative policies becomes imperative for ensuring a unitary development of the eight Romanian regions of economic development. Keywords: quality of life index; quality of life dimensions; open data quality; QoLI Framework.
... This concerns, for example, the economic value for one's own company or for other companies. [5,6] ...
Technical Report
Full-text available
This whitepaper addresses data governance in collaborative value creation, using the SealedServices infrastructure as an example. Data governance is first defined as a framework and divided into four dimensions. Each of these four dimensions is described in terms of its effects on the three different infrastructure levels and the three associated sub-levels. The dimensions considered are data management, data quality management, data security and regulatory requirements. Data management is concerned in particular with the distribution of responsibilities. The data quality management dimension is in turn subdivided into five sub-dimensions for classifying data quality as objectively as possible. In the area of data security, options for securing the components and connections through various technologies are presented.
... Finally, once the data are converted and stored in the Formatted Zone, they are ready for further exploitation. Notice that these data, although reformatted, might still suffer from original data quality issues (e.g., missing values, inconsistencies, redundancy), which should be addressed depending on the final analytical purposes [33]. Addressing such data quality issues is out of the scope of this paper, but having the data uniformly structured largely facilitates data cleaning tasks [34]. ...
Article
Full-text available
The ability to cross data from multiple sources represents a competitive advantage for organizations. Yet, the governance of the data lifecycle, from the data sources into valuable insights, is largely performed in an ad-hoc or manual manner. This is specifically concerning in scenarios where tens or hundreds of continuously evolving data sources produce semi-structured data. To overcome this challenge, we develop a framework for operationalizing and automating data governance. For the first, we propose a zoned data lake architecture and a set of data governance processes that allow the systematic ingestion, transformation and integration of data from heterogeneous sources, in order to make them readily available for business users. For the second, we propose a set of metadata artifacts that allow the automatic execution of data governance processes, addressing a wide range of data management challenges. We showcase the usefulness of the proposed approach using a real world use case, stemming from the collaborative project with the World Health Organization for the management and analysis of data about Neglected Tropical Diseases. Overall, this work contributes on facilitating organizations the adoption of data-driven strategies into a cohesive framework operationalizing and automating data governance.
... For this study the anterior axial refraction, posterior axial refraction, anterior elevation, posterior elevation and pachymetry were used. Some faulty scans by the device with missing or invalid data points were filtered out as part of a data quality assessment 21 . ...
Preprint
Full-text available
Cornea topography maps allow ophthalmologists to screen and diagnose cornea pathologies. We aim to automatically identify any cornea abnormalities based on such cornea topography maps, with focus on diagnosing keratoconus. A set of 1946 consecutive screening scans from the Saarland University Hospital Clinic for Ophthalmology was annotated and used for model training and validation. All scans were recorded with a CASIA2 anterior segment Optical Coherence Tomography (OCT) scanner. We propose to represent the OCT scans as images and apply Convolutional Neural Networks (CNNs) for the automatic analysis. The developed model is based on a state-of-the-art ConvNeXt CNN architecture with weights fine-tuned for the given specific application using the cornea scans dataset. On a new dataset, our model achieves a sensitivity of 97% and a specificity of 97% when distinguishing between Healthy and Pathological corneas. While a comparison to previous work is intricate due to significant variations in the experimental setup, our model outperforms other published studies, either in terms of detection performance and/or in terms of the number of potential cornea abnormalities the model can identify. Furthermore, the proposed approach is independent of the topography scanner and allows to visually represent scan regions that drive the model's decision.
... In this paper, we propose the combination of different types of analysis that allow for more robust classification of each data entry quality and the different data dimensions themselves. We adapt the nomenclature proposed in [7,8,9] and consider four types of individual quality assessment: ...
Chapter
Data quality is essential for a correct understanding of the concepts the data represent. This is especially relevant in data mining, where data of inferior quality may be used in algorithms that depend on correct data to create accurate models and predictions. In this work, we introduce the issue of errors of identifiers in an anonymous database. The work proposes a quality evaluation approach that considers individual attributes and a contextual analysis that allows additional quality evaluations. The proposed quality analysis model is a robust means of minimizing anonymization costs. Keywords: Data pre-processing; Anonymized data; Data quality
... Our findings further point to several quality dimensions that are considered important in this context. This includes access to data, completeness and the amount of data available, characteristics that were covered in the literature (Pipino et al., 2002). We believe that to judge the relevance of a dataset for a task, users need to be aware of these characteristics. ...
Thesis
Data is one of the most important digital assets in the world thanks to its business and social value. As data is becoming increasingly available online, in order to use it effectively we need tools that allow us to retrieve the most relevant datasets that match our information needs. Web search engines are not well suited for this task as they are designed for documents, not data. In recent years several bespoke search engines have been proposed to help with finding datasets, such as Google Dataset Search, crawling the whole web, or DataMed, focused on creating an index of biomedical datasets. In this work we look closer into the problem of searching for data using the example of Open Data platforms. We first applied a mixed-methods approach aimed at deepening our understanding of users of Open Data portals and the types of queries they issue while searching for datasets, accompanied by an analysis of search sessions over one of these data portals. Based on our findings we look into a particular problem of dataset interpretation - the meaning of numerical columns. We propose a novel approach for assigning semantic labels to numerical columns. We conclude our work with an analysis of the future work needed in the field in order to potentially improve the searchability of datasets on the web.
... The best data were obtained from a Nurr1-LBD-PGA2 co-crystal that diffracted to 2.34 Å. The data were indexed, integrated, merged, and scaled using the software iMOSFLM (Battye et al., 2011) and SCALA (Evans, 2006) from the CCP4 suite of programs (Winn et al., 2011). The crystal belonged to the orthorhombic space group P2 1 2 1 2 1 , with four molecules in the asymmetric unit. ...
Article
Full-text available
The orphan nuclear receptor Nurr1 is critical for the development, maintenance, and protection of midbrain dopaminergic neurons. Recently, we demonstrated that prostaglandins E1 (PGE1) and PGA1 directly bind to the ligand-binding domain (LBD) of Nurr1 and stimulate its transcriptional activation function. In this direction, here we report the transcriptional activation of Nurr1 by PGA2, a dehydrated metabolite of PGE2, through physical binding ably supported by NMR titration and crystal structure. The co-crystal structure of Nurr1-LBD bound to PGA2 revealed the covalent coupling of PGA2 with Nurr1-LBD through Cys566. PGA2 binding also induces a 21° shift of the activation function 2 (AF-2) helix H12 away from the protein core, similar to that observed in the Nurr1-LBD-PGA1 complex. We also show that PGA2 can rescue the locomotor deficits and neuronal degeneration in LRRK2 G2019S transgenic fly models.
... X-ray diffraction data from a single crystal was collected over 3600 images at 0.1 o oscillation at the Australian National Synchrotron MX2 beamline (Eiger X 16 M detector). The data were processed in iMosflm 21 , scaled in Aimless 22 and phased using molecular replacement in Phaser 23 with 4UAD 19 as the search model. The structure was modelled and refined in Coot 24 and Phenix 25,26 , respectively. ...
Article
Full-text available
Shuttling of macromolecules between nucleus and cytoplasm is a tightly regulated process mediated through specific interactions between cargo and nuclear transport proteins. In the classical nuclear import pathway, importin alpha recognizes cargo exhibiting a nuclear localization signal, and this complex is transported through the nuclear pore complex by importin beta. Humans possess seven importin alpha isoforms that can be grouped into three subfamilies, with many cargoes displaying specificity towards these importin alpha isoforms. The cargo binding sites within importin alpha isoforms are highly conserved in sequence, suggesting that specificity potentially relies on structural differences. Structures of some importin alpha isoforms, both in cargo-bound and free states, have been previously solved. However, there are currently no known structures of cargo free importin alpha isoforms within subfamily 3 (importin alpha 5, 6, 7). Here, we present the first crystal structure of human importin alpha 7 lacking the IBB domain solved at 2.5 Å resolution. The structure reveals a typical importin alpha architecture comprised of ten armadillo repeats and is most structurally conserved with importin alpha 5. Very little difference in structure was observed between the cargo-bound and free states, implying that importin alpha 7 does not undergo conformational change when binding cargo. These structural insights provide a strong platform for further evaluation of structure–function relationships and understanding how isoform specificity within the importin alpha family plays a role in nuclear transport in health and disease.
... There are several ways in which this deterioration can be quantified. One way is to simply count failing rules, which has been formalized in [48] and been applied in numerous cases [32,47]. In a more general approach, one can use a capacity function to map sets of rules to levels of quality [9]. ...
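The rule-counting idea mentioned in the snippet above can be sketched in a few lines of Python; the rules and records below are hypothetical, and in the more general approach a capacity function over sets of violated rules would replace the simple pass ratio.

```python
from typing import Callable, Iterable

Rule = Callable[[dict], bool]

def rule_based_quality(records: Iterable[dict], rules: list[Rule]) -> float:
    """Quality as the fraction of (record, rule) checks that pass.

    A minimal version of "counting failing rules"; a capacity function could
    instead map each subset of violated rules to a quality level.
    """
    records = list(records)
    checks = [rule(rec) for rec in records for rule in rules]
    return sum(checks) / len(checks) if checks else 1.0

# Hypothetical rules over toy records.
rules: list[Rule] = [
    lambda r: r.get("age") is not None and 0 <= r["age"] <= 120,
    lambda r: bool(r.get("email")),
]
records = [{"age": 34, "email": "a@example.org"}, {"age": -5, "email": ""}]
print(rule_based_quality(records, rules))  # 0.5
```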
Chapter
Data quality is a problem studied in many different research disciplines like computer science, statistics and economics. More often than not, these different disciplines come with different perspectives and emphases. This paper provides a state-of-the-art overview of data quality management across these disciplines and organizes techniques on two levels: the macro-level and the micro-level. At the macro-level, the emphasis lies on the assessment and improvement of processes that affect data quality. In contrast, the micro-level has a strong focus on the current database and aims at the detection and repair of specific artefacts or errors. We sketch the general methodology for both of these views on the management of data quality and list common methodologies. Finally, we provide a number of open problems and challenges that provide interesting research paths for the future.
... and taking on values ranging from 0 (totally stale data) to 1 (absolutely fresh data) [25]. Considering that the p_ij's are independent with respect to j [18], we write: ...
Article
Full-text available
Machine learning has emerged as a promising paradigm for enabling connected, automated vehicles to autonomously cruise the streets and react to unexpected situations. Reacting to such situations requires accurate classification for uncommon events, which in turn depends on the selection of large, diverse, and high-quality training data. In fact, the data available at a vehicle (e.g., photos of road signs) may be affected by errors or have different levels of resolution and freshness. To tackle this challenge, we propose an active learning framework that, leveraging the information collected through onboard sensors as well as received from other vehicles, effectively deals with scarce and noisy data. Given the information received from neighboring vehicles, our solution: (i) selects which vehicles can reliably generate high-quality training data, and (ii) obtains a reliable subset of data to add to the training set by trading off between two essential features, i.e., quality and diversity. The results, obtained with different real-world datasets, demonstrate that our framework significantly outperforms state-of-the-art solutions, providing high classification accuracy with a limited bandwidth requirement for the data exchange between vehicles.
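The quality/diversity trade-off described in the abstract above can be illustrated with a generic sketch; this is not the authors' actual selection criterion, just a convex combination of two normalized scores with invented values.

```python
import numpy as np

def select_training_data(quality: np.ndarray, diversity: np.ndarray,
                         k: int, alpha: float = 0.5) -> np.ndarray:
    """Pick k samples by a convex combination of quality and diversity scores.

    Only a generic sketch of a quality/diversity trade-off; the paper's actual
    selection criterion may differ. Both score vectors are assumed normalized to [0, 1].
    """
    combined = alpha * quality + (1 - alpha) * diversity
    return np.argsort(combined)[::-1][:k]  # indices of the k highest-scoring samples

# Hypothetical scores for five candidate samples.
quality = np.array([0.9, 0.4, 0.8, 0.6, 0.2])
diversity = np.array([0.1, 0.9, 0.6, 0.75, 0.3])
print(select_training_data(quality, diversity, k=2))  # [2 3]
```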
... Images were integrated with XDS 17 and scaled with Aimless. 18 The initial phases were obtained by molecular replacement with Phaser, 19 using the 6-P-β-glucosidase BglA-2 from Streptococcus pneumoniae (PDB id 4IPL) as the search model. The BlBglH crystallographic model was built by cycles of manual adjustments with Coot 20 interspersed with structure refinements with PHENIX. ...
... Timeliness of a digital object is the extent to which it is sufficiently up-to-date for the task at hand. 48 Timeliness measures whether the resource includes metadata about when it was created, stored, accessed or cited. Users expect up-to-date objects, and the time of the last refresh is a relevant quality indicator. ...
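A common way to turn last-update metadata into a timeliness score is the form popularized by Pipino et al. (2002), max(0, 1 - currency/volatility) raised to a task-dependent sensitivity exponent. The sketch below assumes currency can be approximated by the age of the last update; all parameter values are illustrative.

```python
from datetime import datetime, timezone
from typing import Optional

def timeliness(last_update: datetime, volatility_days: float,
               sensitivity: float = 1.0,
               now: Optional[datetime] = None) -> float:
    """Timeliness in [0, 1] following the form max(0, 1 - currency/volatility) ** s.

    `currency` is approximated here by the age of the last update; `volatility_days`
    is how long the data remains valid, and `sensitivity` tunes how sharply the
    score decays (both are task-dependent assumptions).
    """
    now = now or datetime.now(timezone.utc)
    currency_days = (now - last_update).total_seconds() / 86_400
    return max(0.0, 1.0 - currency_days / volatility_days) ** sensitivity

# Hypothetical record last refreshed 30 days ago, valid for roughly 90 days.
last = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(timeliness(last, volatility_days=90,
                 now=datetime(2024, 1, 31, tzinfo=timezone.utc)))  # ~0.667
```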
Article
Cultural heritage institutions have recently started to share their metadata as Linked Open Data (LOD) in order to disseminate and enrich them. The publication of large bibliographic data sets as LOD is a challenge that requires the design and implementation of custom methods for the transformation, management, querying and enrichment of the data. In this report, the methodology defined by previous research for the evaluation of the quality of LOD is analysed and adapted to the specific case of Resource Description Framework (RDF) triples containing standard bibliographic information. The specified quality measures are reported in the case of four highly relevant libraries.
... After workup, the crude product was purified by flash column chromatography (EtOAc, 100%) to give the intermediate, N-(1benzylpiperidin-4-yl)-4-(2-(methoxymethyl)pyrrolidin-1-yl)-Nmethylthieno [3,2-d]pyrimidin-2-amine, a colorless oil (140 mg, 0.31 mmol, 89%). 1 The intermediate (140 mg, 0.31 mmol) was then debenzylated according to general method C. After workup, the crude product was purified by reverse-phase HPLC (50−98% MeOH/water/0.1% formic acid), partially evaporated and then lyophilized to give the title compound (41) (43). According to general method D for reductive amination, (S)-2-(1-(2-(methyl-(piperidin-4-yl)amino)thieno [3,2-d]pyrimidin-4-yl)pyrrolidin-2-yl)acetonitrile (44, 15 mg, 0.042 mmol) was reacted with paraformaldehyde (6.0 mg, 0.2 mmol) and sodium triacetoxyborohydride (85 mg, 0.4 mmol). ...
Article
Full-text available
The leishmaniases, caused by Leishmania species of protozoan parasites, are neglected tropical diseases with 12-15 million cases worldwide. Current therapeutic approaches are limited by toxicity, resistance and cost. N-Myristoyltransferase (NMT), an enzyme ubiquitous and essential in all eukaryotes, has been validated via genetic and pharmacological methods as a promising antileishmanial target. Here we describe a comprehensive structure activity relationship study of a thienopyrimidine series previously identified in a high throughput screen against Leishmania NMT, across 68 compounds in enzyme- and cell-based assay formats. Using a chemical tagging target engagement biomarker assay we identify the first inhibitor in this series with on-target NMT activity in leishmania parasites. Furthermore, crystal structure analyses of 12 derivatives in complex with Leishmania major NMT revealed key factors important for future structure-guided optimization delivering IMP-105 (43), a compound with modest activity against L. donovani intracellular amastigotes and excellent selectivity (>660-fold) for Leishmania NMT over human NMTs.
... APS 24-ID-C synchrotron beamline at an aperture of 30 μm at 5% beam strength and cryo-cooled to 100K. The resulting diffraction data from a single crystal were processed using RAPD [42,43]. All structures were phased using the molecular replacement method in PHENIX [44] using previously published PltB apo structure (PDB: 4RHR) as a search model [19]. ...
Article
Full-text available
Typhoid toxin is an A2B5 toxin secreted from Salmonella Typhi-infected cells during human infection and is suggested to contribute to typhoid disease progression and the establishment of chronic infection. To deliver the enzymatic ‘A’ subunits of the toxin to the site of action in host cells, the receptor-binding ‘B’ subunit PltB binds to the trisaccharide glycan receptor moieties terminated in N-acetylneuraminic acid (Neu5Ac) that is α2–3 or α2–6 linked to the underlying disaccharide, galactose (Gal) and N-acetylglucosamine (GlcNAc). Neu5Ac is present in both unmodified and modified forms, with 9-O-acetylated Neu5Ac being the most common modification in humans. Here we show that host cells associated with typhoid toxin-mediated clinical signs express both unmodified and 9-O-acetylated glycan receptor moieties. We found that PltB binds to 9-O-acetylated α2–3 glycan receptor moieties with a markedly increased affinity, while the binding affinity to 9-O-acetylated α2–6 glycans is only slightly higher, as compared to the affinities of PltB to the unmodified counterparts, respectively. We also present X-ray co-crystal structures of PltB bound to related glycan moieties, which supports the different effects of 9-O-acetylated α2–3 and α2–6 glycan receptor moieties on the toxin binding. Lastly, we demonstrate that the cells exclusively expressing unmodified glycan receptor moieties are less susceptible to typhoid toxin than the cells expressing 9-O-acetylated counterparts, although typhoid toxin intoxicates both cells. These results reveal a fine-tuning mechanism of a bacterial toxin that exploits specific chemical modifications of its glycan receptor moieties for virulence and provide useful insights into the development of therapeutics against typhoid fever.
... with the values of f_ij ranging from 0 (totally stale data) to 1 (absolutely fresh data) [13]. Considering that the p_ij's are independent with respect to j [14], we write: ...
Preprint
Full-text available
Machine learning has emerged as a promising paradigm for enabling connected, automated vehicles to autonomously cruise the streets and react to unexpected situations. A key challenge, however, is to collect and select real-time and reliable information for the correct classification of unexpected, and often rare, situations that may happen on the road. Indeed, the data generated by vehicles, or received from neighboring vehicles, may be affected by errors or have different levels of resolution and freshness. To tackle this challenge, we propose an active learning framework that, leveraging the information collected through onboard sensors as well as received from other vehicles, effectively deals with scarce and noisy data. In particular, given the available information, our solution selects the data to add to the training set by trading off between two essential features, namely, quality and diversity. The results, obtained using real-world data sets, show that the proposed method significantly outperforms state-of-the-art solutions, providing high classification accuracy at the cost of a limited bandwidth requirement for the data exchange between vehicles.
... with the values of f_ij ranging from 0 (totally stale data) to 1 (absolutely fresh data) [13]. Considering that the p_ij's are independent with respect to j [14], we write: ...
Conference Paper
Full-text available
Machine learning has emerged as a promising paradigm for enabling connected, automated vehicles to autonomously cruise the streets and react to unexpected situations. A key challenge, however, is to collect and select real-time and reliable information for the correct classification of unexpected, and often rare, situations that may happen on the road. Indeed, the data generated by vehicles, or received from neighboring vehicles, may be affected by errors or have different levels of resolution and freshness. To tackle this challenge, we propose an active learning framework that, leveraging the information collected through onboard sensors as well as received from other vehicles, effectively deals with scarce and noisy data. In particular, given the available information, our solution selects the data to add to the training set by trading off between two essential features, namely, quality and diversity. The results, obtained using real-world data sets, show that the proposed method significantly outperforms state-of-the-art solutions, providing high classification accuracy at the cost of a limited bandwidth requirement for the data exchange between vehicles.
... X-ray diffraction data collection was carried out at the MASSIF-1 automatic beamline of the European Synchrotron (ESRF), Grenoble, France [21]. The dataset was processed and integrated using the program XDS [22] in combination with the program SCALA [23] from the CCP4 package [24]. The crystal belongs to P212121 space group. ...
Article
Full-text available
Tyrosinase-related protein 1 (TYRP1) is one of the three human melanogenic enzymes involved in the biosynthesis of melanin, a pigment responsible for the color of the skin, hair, and eyes. It shares high sequence identity with tyrosinase, but has two zinc ions in its active site rather than two copper ions as in tyrosinase. Typical tyrosinase inhibitors do not directly coordinate to the zinc ions of TYRP1. Here, we show, from an X-ray crystal structure determination, that phenylthiourea, a highly potent tyrosinase inhibitor, does not coordinate the active-site zinc ions either, but binds differently from other structurally characterized TYRP1-inhibitor complexes. Its aromatic ring is directed outwards from the active site, apparently as a result of the absence of polar oxygen substituents that could take the position of water molecules bound in the active site. The compound binds via hydrophobic interactions, thereby blocking substrate access to the active site.
... X-ray datasets were collected at X11 and X12 beamlines at EMBL/DESY, Hamburg (DORIS storage ring) for M1 and M7, respectively, and at the European Synchrotron Radiation Facility beamlines ID23-1 and ID14-1, Grenoble, France, at 100K, for M3 and M4/M10, respectively. Data were processed with XDS [31] and scaled using XSCALE (M1, M7, M10), SCALA [32] (M4) and Aimless [33] (M3). ...
Article
Full-text available
In striated muscles, molecular filaments are largely composed of long protein chains with extensive arrays of identically folded domains, referred to as “beads-on-a-string”. It remains a largely unresolved question how these domains have developed a unique molecular profile such that each carries out a distinct function without false-positive readout. This study focuses on the M-band segment of the sarcomeric protein titin, which comprises ten identically folded immunoglobulin domains. Comparative analysis of high-resolution structures of six of these domains ‒ M1, M3, M4, M5, M7, and M10 ‒ reveals considerable structural diversity within three distinct loops and a non-conserved pattern of exposed cysteines. Our data allow us to structurally interpret distinct pathological readouts that result from titinopathy-associated variants. Our findings support general principles that could be used to identify individual structural/functional profiles of hundreds of identically folded protein domains within the sarcomere and other densely crowded cellular environments.
... This potential synergistic effect among data is unique and different than the synergistic effect of the interaction of other resources in organizations (e.g., IT or organizational resources) that has been explored in the literature. We justify this argument by explaining crucial and unique characteristics of data that enable the mutual interaction among two or more data sets to a certain degree (see Table 2): shareability, transportability, combinability, nonconsumability, and versatility (Eaton & Bawden, 1991;Gorla, Somers, & Wong, 2010;Levitin & Redman, 1998;Nelson et al., 2005;Pipino, Lee, & Wang, 2002;Wang & Strong, 1996). The shareability and almost instantaneous transportability of data simplifies the interaction of data from various information systems. ...
Article
Full-text available
Organizations sit on a treasure trove of data. Combining data from a plurality of sources is challenging but comes with enormous potential. Although this phenomenon is crucial for generating value, its underlying synergistic effect is virtually absent in IS literature. Grounded in systems theory, we developed a conceptual framework of data synergy by means of reviewing literature and conducting 24 semi-structured interviews. We reveal various enabling conditions, facilitating super-additive informational and transactional value generation.
Conference Paper
The abundance of information available today makes it necessary to select the highest-quality documents. For this purpose, it is necessary to deepen knowledge of information quality systems. The different dimensions of quality are analyzed, and different problems related to these dimensions are discussed. The paper groups these issues into different facets: primary information, its manipulation and interpretation, and the publication and dissemination of information. The impact of these interdependent facets on the production of untruthful information is discussed. Finally, ChatGPT is analyzed as a use case. It is shown how these problems and facets have an impact on the quality of the system, and the remarks made by experts are analyzed. The paper concludes with the different challenges that artificial intelligence systems face.
Chapter
Full-text available
In the early development phase, vehicle developers face the challenge that, due to confidentiality and financially constrained projects, developed concepts can hardly be tested and evaluated with target customers. This can lead to increased market uncertainty, which is reflected in unclear customer requirements for future vehicle generations. As a result, companies run the risk that the technically implemented requirements and those actually demanded by customers are not congruent. The development of a new technical system is always based on a reference system, from which a new system generation is developed by varying the reference system elements (RSE) it contains. Empirical studies have shown the potential of analyzing usage data from these references: decisions can be objectified, existing market uncertainties reduced, and new customer needs identified. This contribution therefore presents a method for triangulating qualitative and quantitative product usage data. The result of the method is a set of usage requirements of customers and users for a future system generation. The method was developed in cooperation with a German OEM, applied in the concept phase and initially evaluated. The method complements developers' experience with fact-based decision support and thereby makes a decisive contribution to reducing market uncertainties. The development team confirmed the reduction of perceived market uncertainty in a concluding survey. In addition, a proxy variable for market uncertainty is used to holistically examine the method's contribution to success. At the last measurement point, the indexed market uncertainty had been reduced by 54% compared to the start of the project.
Chapter
Organizations continuously generate and manage extensive amounts of data for specific purposes, such as making informed decisions or monitoring certain parameters. It is not only essential to obtain the data; how it is obtained, stored, and maintained is equally, if not more, valuable. Data quality is a crucial factor for any organization because if the data does not meet the required level of quality, its use will not yield the best results. To maintain adequate levels of quality, organizations need to identify the data requirements or business rules that their data must adhere to for the intended purpose. The most common problem is organizations’ lack of knowledge in identifying business rules adequately. In this regard, a model based on ISO/IEC 25012 enables the assessment of data quality based on an organization’s requirements. As a solution, this work presents a methodology to facilitate the identification and classification of business rules for an organization, as well as their association with each data quality characteristic defined by the ISO/IEC 25012 standard.
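To make the chapter's idea concrete, here is a minimal, hypothetical Python sketch in which each business rule is tagged with the ISO/IEC 25012 characteristic it measures and a per-characteristic pass ratio is reported; the rules, characteristic assignments and records are invented and are not taken from the standard or from the chapter's methodology.

```python
from collections import defaultdict
from typing import Callable

# Each business rule is tagged with the ISO/IEC 25012 characteristic it measures.
# The rules and characteristic assignments below are purely illustrative.
RULES: list[tuple[str, str, Callable[[dict], bool]]] = [
    ("birth_date_not_in_future", "Accuracy",     lambda r: r["birth_date"] <= "2024-12-31"),
    ("customer_id_present",      "Completeness", lambda r: bool(r.get("customer_id"))),
    ("country_code_iso2",        "Consistency",  lambda r: len(r.get("country", "")) == 2),
]

def assess(records: list[dict]) -> dict[str, float]:
    """Per-characteristic pass ratio over all (record, rule) checks."""
    passed, total = defaultdict(int), defaultdict(int)
    for record in records:
        for _name, characteristic, check in RULES:
            total[characteristic] += 1
            passed[characteristic] += int(check(record))
    return {c: passed[c] / total[c] for c in total}

records = [
    {"customer_id": "C1", "birth_date": "1990-04-12", "country": "ES"},
    {"customer_id": "",   "birth_date": "2031-01-01", "country": "ESP"},
]
print(assess(records))  # {'Accuracy': 0.5, 'Completeness': 0.5, 'Consistency': 0.5}
```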
Book
Full-text available
Financing Sustainable Development in Egypt
Technical Report
Full-text available
This whitepaper addresses the added value of data-driven services, using the SealedServices ecosystem as an example. It first discusses the importance of data and the basic definition of data-driven services. In addition, obstacles to data-driven services are illustrated with examples and explained using the SealedServices ecosystem. At the core of the whitepaper, various theoretical models are explained that represent different dimensions of the added value of data-driven services. Furthermore, a value-added scheme for data-driven services is presented, which was created on the basis of guided interviews with partner companies involved in SealedServices. The findings are then compared with the theoretical representations and differences are highlighted.
Article
Full-text available
By suppressing gene transcription through the recruitment of corepressor proteins, B-cell lymphoma 6 (BCL6) protein controls a transcriptional network required for the formation and maintenance of B-cell germinal centres. As BCL6 deregulation is implicated in the development of Diffuse Large B-Cell Lymphoma, we sought to discover novel small molecule inhibitors that disrupt the BCL6-corepressor protein–protein interaction (PPI). Here we report our hit finding and compound optimisation strategies, which provide insight into the multi-faceted orthogonal approaches that are needed to tackle this challenging PPI with small molecule inhibitors. Using a 1536-well plate fluorescence polarisation high throughput screen we identified multiple hit series, which were followed up by hit confirmation using a thermal shift assay, surface plasmon resonance and ligand-observed NMR. We determined X-ray structures of BCL6 bound to compounds from nine different series, enabling a structure-based drug design approach to improve their weak biochemical potency. We developed a time-resolved fluorescence energy transfer biochemical assay and a nano bioluminescence resonance energy transfer cellular assay to monitor cellular activity during compound optimisation. This workflow led to the discovery of novel inhibitors with respective biochemical and cellular potencies (IC50s) in the sub-micromolar and low micromolar range.
Article
Aim: To offer high-quality healthcare services, individuals need to utilize high-quality information. The present study aims to evaluate the data quality in the hospital information system (HIS) at a selected educational hospital. Method: This descriptive and cross-sectional study was conducted in 2018. The statistical population consisted of 202 users of the HIS at Amiralmomenin Hospital in Zabol. The respondents were selected using stratified random sampling. Data were collected using a researcher-made questionnaire. Then, they were analyzed through SPSS-20 and descriptive statistics. Results: It was found that 45 of the respondents considered the hospital information comprehensible, while 76 considered it not very understandable. Moreover, 34.7% believed that the hospital information would be rapidly accessible when needed. The average scores of the dimensions were found to be 5-8.5, and there were significant, positive relationships between all the dimensions under study (P-value<0.05). Conclusion: The findings suggest that only a small number of staff had complete information on the HIS and associated subsystems. Other respondents lacked sufficient awareness of the HIS or were unaware of its existence. The authors suggest that the needs of users be evaluated before designing a HIS in order to ensure that it will meet those needs. Despite the use of HIS subsystems in all the units of the hospital under study, respondents had insufficient information on how these subsystems could be used.
Article
This article aims to explore the quality of the information provided by public authorities to citizens, businesses, and other stakeholders as part of the implementation of e-services of “A State in a Smartphone” in Ukraine. The article presents the structure of the authors’ model of information quality assessment, which includes three levels of characteristics and allows calculating the integral indicator of information quality. The model involves the use of expert research methods. The results of the study indicate that the information provided to users by public authorities has a fairly high level of quality, but there are reserves for improvement.
Conference Paper
Full-text available
The presented article provides results on the application of a Deep Learning model for internet traffic classification. The classification task is performed for video traffic of two types: real-time streaming video and on-demand video records. Multiple Machine Learning methods have been used to solve the specified task, and this article presents the results obtained with the Convolutional Neural Network method. The article describes data pre-processing and formatting, as well as the model parameters used. The article ends with some of the obtained results presented in the form of tables and 3D surface plots, followed by a conclusion.
Article
Full-text available
Open Government Data (OGD) has the potential to support social and economic progress. However, this potential can be frustrated if this data remains unused. Although the literature suggests that OGD datasets metadata quality is one of the main factors affecting their use, to the best of our knowledge, no quantitative study provided evidence of this relationship. Considering about 400,000 datasets of 28 national, municipal, and international OGD portals, we have programmatically analyzed their usage, their metadata quality, and the relationship between the two. Our analysis has highlighted three main findings. First of all, regardless of their size, the software platform adopted, and their administrative and territorial coverage, most OGD datasets are underutilized. Second, OGD portals pay varying attention to the quality of their datasets' metadata. Third, we did not find clear evidence that datasets usage is positively correlated to better metadata publishing practices. Finally, we have considered other factors, such as datasets' category, and some demographic characteristics of the OGD portals, and analyzed their relationship with datasets usage, obtaining partially affirmative answers.
Article
Full-text available
Box C/D ribonucleoprotein complexes are RNA-guided methyltransferases that methylate the ribose 2’-OH of RNA. The central ‘guide RNA’ has box C and D motifs at its ends, which are crucial for activity. Archaeal guide RNAs have a second box C’/D’ motif pair that is also essential for function. This second motif is poorly conserved in eukaryotes and its function is uncertain. Conflicting literature data report that eukaryotic box C’/D’ motifs do or do not bind proteins specialized to recognize box C/D-motifs and are or are not important for function. Despite this uncertainty, the architecture of eukaryotic 2’-O-methylation enzymes is thought to be similar to that of their archaeal counterpart. Here, we use biochemistry, X-ray crystallography and mutant analysis to demonstrate the absence of functional box C’/D’ motifs in more than 80% of yeast guide RNAs. We conclude that eukaryotic Box C/D RNPs have two non-symmetric protein assembly sites and that their three-dimensional architecture differs from that of archaeal 2’-O-methylation enzymes.
Chapter
For an economic view of data quality (DQ), and in particular for planning DQ measures under cost-benefit aspects, DQ metrics are indispensable. The following therefore takes up the question of how DQ can be quantified adequately and in a purpose-oriented manner. To this end, metrics are developed and presented that, on the one hand, enable a quantitative analysis of the DQ present at the time of measurement in order to identify needs for action. On the other hand, effects on DQ, such as decay over time or the execution of DQ measures, should be analyzable in a targeted manner by comparing the DQ level at two or more measurement points.
Article
Inductively coupled plasma mass spectrometry (ICP-MS) has become indispensable for the quantitative determination of trace and ultra-trace elements in geological samples. Inferences derived from ICP-MS data sets and other supportive evidence have revolutionized theories on geology/geodynamics. In this scenario, validation through interlaboratory studies plays an important role in assuring the quality of measurements, along with the performance and accreditation programs of the laboratory. The Geological Survey of India (GSI) initiated an interlaboratory testing program on soil and stream sediment testing in which about fifteen laboratories, including our geochemical laboratory, participated. Trace and rare-earth elements (REE) were determined on the reference material PKS-1 by high-resolution ICP-MS and validated against the certified values for PKS-1 provided by GSI after compilation. The fitness of the acquired data for either “pure geochemistry” or “applied geochemistry” was determined based on the Z-scores. The data for most of the analytes fell within the -2 < Z < +2 range, which included the majority of the trace elements and REE used in petrogenetic and provenance studies. These results help to review the analytical mismatches observed in the data, optimize necessary aspects and minimize the interference effects caused during analysis.
Thesis
Full-text available
With an increasing amount of research data being generated and processed, as well as an increasing spread of data reuse (secondary data use), there is also an increased need for the evaluation of data quality. Through opinions, recommendations and guidelines from professional societies and scientific associations such as the Rat für Informationsinfrastrukturen (RfII) or the Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF), the research community is becoming more and more conscious of the evaluation and improvement of data quality. Demands for the integration of data quality management into the FAIR process and the GCP regulations further underline this increasing awareness. In the context of this work, an overview of the challenges, and the resulting possibilities for operationalizing data quality, is provided. In addition to the theoretical preparation and conceptual delimitation of the subject area, the work also contributes to the establishment of a partially automated solution for evaluating and displaying data quality. The concepts "categories", "dimensions", "indicators" and "parameters" of data quality identified in the literature are explained and related to each other. The work describes a possible procedure for evaluating the quality of a data record regardless of the type and scope of the information contained. In a second step, indicators for operationalizing data quality are defined and examined for their possible use in a generic evaluation algorithm. The indicators suitable for a generic assessment approach are described in more detail and first formalized in preparation for implementation as an automatable script. Finally, the calculation and visualization of these quality indicators are implemented in a statistical script in the language R. To promote the automation of the evaluation, the developed R script was additionally implemented as a SmartR plugin. The American Gut Project data set is used as the reference data set for the development of the R script and SmartR plugin. The developed plugin is designed for integration into the integrated research data platform tranSMART, where it can evaluate the data quality of the data records contained there. The overview is rounded off by excursions into the areas of data cleaning and statistics with R. Also, the different forms of "Data Repositories", "Data Warehouses", and "Data Lakes" are differentiated. The final part of the thesis focuses on a description of further use cases and potential extensions, as well as a critical examination of the existing quality concepts. In summary, only a few indicators turned out to be fundamentally suitable for a generic approach, since the calculation of the majority of the indicators defined in the literature sometimes requires considerable preliminary and additional information. In the end, only descriptive statistics for incomplete or missing entries, completeness and outliers prove to be suitable. The chosen approach of maximum genericity therefore leads to reduced informative value compared to other approaches, so that further adjustments are necessary for productive use.
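The generic indicators the thesis retains (missing entries, completeness, outliers) can be sketched with a dataset-agnostic routine. The thesis implements this in R within a SmartR plugin; the Python version below is only an illustration with an invented toy data set.

```python
import pandas as pd

def generic_quality_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Dataset-agnostic indicators: completeness per column and IQR-based outlier counts.

    Mirrors the kind of generic, type-independent checks described above
    (the thesis implements them in R; this Python sketch is only illustrative).
    """
    report = pd.DataFrame({
        "completeness": 1 - df.isna().mean(),
        "missing": df.isna().sum(),
    })
    outliers = {}
    for col in df.select_dtypes("number"):
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
        outliers[col] = int(mask.sum())
    report["outliers"] = pd.Series(outliers)
    return report

# Hypothetical toy data set.
df = pd.DataFrame({"age": [25, 30, None, 29, 180], "site": ["A", "B", "B", None, "A"]})
print(generic_quality_indicators(df))
```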
Article
Factor XIa inhibitors are promising novel anticoagulants which show excellent efficacy in preclinical thrombosis models with minimal effects on hemostasis. The discovery of potent and selective FXIa inhibitors which are also orally bioavailable has been a challenge. Here, we describe optimization of the imidazole-based macrocyclic series and our initial progress towards meeting this challenge. A two-pronged strategy, which focused on replacement of the imidazole scaffold and the design of new P1 groups, led to the discovery of potent, orally bioavailable pyridine-based macrocyclic FXIa inhibitors. Moreover, pyridine-based macrocycle 19, possessing the phenylimidazole carboxamide P1, exhibited excellent selectivity against relevant blood coagulation enzymes and displayed antithrombotic efficacy in a rabbit thrombosis model.
Preprint
Full-text available
Viral escape from CD8⁺ cytotoxic T lymphocyte responses correlates with disease progression and represents a significant challenge for vaccination. Here, we demonstrate that CD8⁺ T cell recognition of the naturally occurring MHC-I-restricted LCMV-associated immune escape variant Y4F is restored following vaccination with a proline-altered peptide ligand (APL). The APL increases MHC/peptide (pMHC) complex stability, rigidifies the peptide and facilitates T cell receptor (TCR) recognition through reduced entropy costs. Structural analyses of pMHC complexes before and after TCR binding, combined with biophysical analyses, revealed that although the TCR binds similarly to all complexes, the p3P modification alters the conformations of a very limited number of specific MHC and peptide residues, facilitating efficient TCR recognition. This approach can be easily introduced in peptides restricted to other MHC alleles, and can be combined with currently available and future vaccination protocols in order to prevent viral immune escape. Author Summary: Viral escape mutagenesis often correlates with disease progression and represents a major hurdle for vaccination-based therapies. Here, we have designed and developed a novel generation of altered epitopes that re-establish and significantly enhance CD8⁺ T cell recognition of a naturally occurring viral immune escape variant. Biophysical and structural analyses provide a clear understanding of the molecular mechanisms underlying this re-established recognition. We believe that this approach can be implemented in currently available or novel vaccination approaches to efficiently restore T cell recognition of virus escape variants and control disease progression.