Article

Data quality assessment

Authors: Leo L. Pipino, Yang W. Lee, Richard Y. Wang

... Table 1 - Description of attributes by dimension. Source: adapted from Pipino et al. (2002) ...
... In this sense, when assessing data quality one must be prepared, on the one hand, to capture the users' subjective perceptions and, on the other, to carry out an objective evaluation. The objective evaluation was performed through metrics that made it possible to unequivocally validate the four data quality dimensions (Pipino et al., 2002). ...
... There are two types of metrics, process-dependent and process-independent, intended to carry out the objective evaluation in three classes: simple ratio, min or max value, and arithmetic mean (Pipino et al., 2002 ...
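The metric forms named in the snippet above can be illustrated with a minimal Python sketch: the simple ratio counts undesirable outcomes against total outcomes, and the min operation conservatively aggregates several normalized indicators. The column names and numbers below are invented for illustration only.

```python
from typing import Iterable

def simple_ratio(violations: int, total: int) -> float:
    """Simple ratio: 1 minus the fraction of undesirable outcomes (Pipino et al., 2002)."""
    if total == 0:
        return 1.0  # assumption: an empty set has no violations
    return 1 - violations / total

def min_operation(indicators: Iterable[float]) -> float:
    """Min operation: a conservative aggregate of several normalized indicators."""
    return min(indicators)

# Hypothetical example: completeness of one column, believability from three sources.
completeness = simple_ratio(violations=120, total=10_000)   # 0.988
believability = min_operation([0.95, 0.80, 0.99])           # 0.80
print(completeness, believability)
```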
Conference Paper
Information quality is essential for the survival of any institution, so health services need quality systems that respond effectively to the demands of their users. The quality of e-Vacinas was evaluated using the Data Quality Assessment methodology in two districts of the Alentejo, which revealed that this information system plays an important role in the quality of care delivery, namely in vaccination, in terms of the efficiency and effectiveness of patient service and in reducing the risk of errors. e-Vacinas is a quality system, since the evaluation of the four dimensions proposed by the authors shows a strong overall result, further demonstrating that this application is an essential working tool in care delivery, facilitating communication and information sharing and thus promoting coordinated interventions among professionals.
... Professors Pipino, Lee and Wang described principles that can help organizations create their own metrics. Studies have confirmed that data quality is a multidimensional concept (Pipino et al., 2002). Companies have to deal with two things, namely: the subjective perceptions of the individuals involved with the data, and the objective measurements based on the data set in question. ...
... Other dimensions that can be evaluated using this form include concise representation, relevancy, and ease of manipulation (Pipino, 2002). Min or Max Operation -- to handle dimensions that require the aggregation of multiple data quality indicators, the minimum or maximum operation can be applied. ...
... The company wants to ensure the rating is normalized: each weighting factor should be between zero and one, and the weighting factors should add to one. Regarding believability, if the company can specify the degree of importance of each of the variables to the overall believability measure, the weighted average may be an appropriate form to use (Pipino et al., 2002). ...
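As a companion to the snippet above, a small Python sketch of the weighted-average form with the normalization checks it mentions (weights in [0, 1] that sum to one); the variable names and weights are hypothetical.

```python
def weighted_average(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted-average form: each weight lies in [0, 1] and all weights sum to one."""
    total_weight = sum(weights.values())
    if abs(total_weight - 1.0) > 1e-9 or any(not 0 <= w <= 1 for w in weights.values()):
        raise ValueError("weights must lie in [0, 1] and sum to one")
    return sum(scores[k] * weights[k] for k in scores)

# Hypothetical believability assessment built from three variables.
scores = {"source_reputation": 0.9, "internal_consistency": 0.7, "age_of_data": 0.6}
weights = {"source_reputation": 0.5, "internal_consistency": 0.3, "age_of_data": 0.2}
print(weighted_average(scores, weights))  # 0.78
```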
... (e.g., reliability, accessibility, timeliness) of data quality (Schelling and Robertson, 2020). Wider computing research illustrates different frameworks to assess data quality (Batini et al., 2009;Cichy and Rass, 2019) including combined subjective and objective methods (Pipino et al., 2003). However, we opted to refer to the assessment of information quality rather than data quality as theoretically (from the DIEK hierarchy in Figure 1) practitioners are more inclined to create evidence and knowledge from information sources rather than direct data sources (Dammann, 2018). ...
... However, in future, there is a definite need to conduct sport informatics research for developing more objective methods for assessing information quality in sporting environments. Researchers could refer to information sciences research on objective assessments of data quality to guide such developments (Pipino et al., 2003). Finally, especially due to the optimization of the resistance information flow (Figure 5B), there were reductions in the workloads of the HPU staff (e.g., S&C Coaches). ...
Article
Full-text available
Practical case studies elaborating end-to-end attempts to improve the quality of information flows associated with athlete management processes are scarce in the current sport literature. Therefore, guided by a Business Process Management (BPM) approach, the current study presents the outcomes from a case study to optimize the quality of strength and conditioning (S&C) information flow in the performance department of a professional rugby union club. Initially, the S&C information flow was redesigned using integral technology, activity elimination and activity automation redesign heuristics. Utilizing the Lean Startup framework, the redesigned information flow was digitally transformed by designing data collection, management and visualization systems. The usability of the data collection systems was assessed against industry benchmarks using the System Usability Scale (SUS) administered to 55 players; the mean SUS score of 87.6 ± 10.76 was well above the average benchmark for similar systems (Grade A on the SUS scale). In the data visualization system, 14 minor usability problems were identified from 9 cognitive walkthroughs conducted with the High-Performance Unit (HPU) staff. Pre-post optimization information quality was subjectively assessed by administering a standardized questionnaire to the HPU members. The results indicated positive improvements in all of the information quality dimensions (with major improvements to accessibility) relating to the S&C information flow. Additionally, the methods utilized in the study would be especially beneficial for sporting environments requiring cost-effective and easily adoptable information flow digitization initiatives that need to be implemented by their internal staff members.
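For readers unfamiliar with the SUS figure quoted above, the conventional System Usability Scale scoring rule can be sketched as follows; this is the standard scoring procedure, not code from the study, and the sample responses are invented.

```python
def sus_score(responses: list[int]) -> float:
    """Standard System Usability Scale scoring for one respondent.

    `responses` holds the ten 1-5 Likert answers in questionnaire order:
    odd-numbered items contribute (answer - 1), even-numbered items (5 - answer),
    and the sum is scaled by 2.5 to give a 0-100 score.
    """
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("expected ten answers, each between 1 and 5")
    total = sum(r - 1 if i % 2 == 0 else 5 - r for i, r in enumerate(responses))
    return total * 2.5

# Hypothetical respondent: strong agreement with positive items, disagreement with negative ones.
print(sus_score([5, 1, 5, 2, 4, 1, 5, 1, 4, 2]))  # 90.0
```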
... Data are considered one of the most important assets for any kind of organization [1]. Making use of data with an inadequate level of quality may lead organizations to experience operational and, consequently, economic losses [2][3][4][5][6]. Brief examples of this statement are the following [6,7]: the cost of data quality problems is 8-25% of an organization's revenue; 40-50% of companies' budgets is dedicated to solving problems associated with the low quality of the handled data; organizations can experience fraudulent reimbursements, exposure to non-delimited risks, payroll overpayments and under-billing [8]; and things could go even worse: although 11% of USA firms acknowledged having problems managing data quality, only 48% have plans for managing data quality [7]. ...
... Some examples of these experiences were quantified and tabulated in [33]. Thus, if the users assess the level of quality of data as poor, their tasks may become negatively influenced by this assessment [4]. ...
... As far as ensuring data quality is concerned, in the literature [13][14][15] the following four main dimensions can be distinguished through which a level of data quality as high as possible can be ensured [1]: i) data accuracy measures the degree to which the data stored in databases represent the real-world elements they stand for; ii) data consistency refers to the data's property of respecting the integrity constraints; ...
... Data currency or timeliness is an indicator that measures the degree to which the data are up to date in relation to the specific activity for which they are used [15]. As in the case of the API provided by the National Institute of Statistics of Romania, the API provided by Eurostat presents the same deficiency: the update date is available for the data set as a whole and not for the individual records within the data set [1]. ...
Article
Full-text available
With the continuous evolution of society, the analysis of the quality of life of the population has become an increasingly complex process, for which it is necessary to evaluate not only the factors that measure the financial power and the degree of economic development of the region, but also those through which the integration of individuals in society and their involvement in a well-functioning community can be appreciated. The importance of such an analysis is revealed by the implications that insufficient, or even absent, measures to improve the standard of living have on members of society. Thus, as a result of the need to determine living conditions, the implementation of the European Life Index Framework has been proposed. The Framework aims to automate the process of determining the quality of life of the population, data which the public authorities can use to easily determine the necessary steps for integrating disadvantaged people, reducing the poverty rate of the population, and improving quality of life. After analyzing the level of quality of life in the European Union for the period 2007-2017, we have noticed that in the case of the former communist states, the quality of life standard is lower than that of the states whose political trajectory lay outside the influence of the communist dictatorial regime. Also, due to public policies mainly oriented towards citizens, the Nordic states have registered the highest values of the Quality of Life Index, surpassing even the countries of Continental Europe.
... Data collection was performed at LANEM-IQ-UNAM with a Rigaku MicroMax-007HF rotating anode and a Dectris Pilatus3R 200K-A detector (Rigaku, The Woodlands, TX, US). Data were processed using HKL3000 [26], and the spatial group was revised with the program Pointless from CCP4 [27]. The .mtz file from pointless was then used in AutoSol (Phenix) [28] to obtain the phases. ...
... The .mtz file from pointless was then used in AutoSol (Phenix) [28] to obtain the phases. Initial modeling was done automatically with ArpWarp (CCP4) [27], followed by manual modeling using Coot [29]. ...
Article
Full-text available
Escherichia coli (E. coli), one of the most frequently used hosts for the expression of recombinant proteins, is often affected by the toxic effect of the exogenous proteins that it is required to express. Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) is a multi-functional protein that has been used as a control marker for basal function, and it is known to undergo cysteine oxidation under different types of cellular stress. Here, we report the 3D structure of the endogenous GAPDH purified from stressed E. coli cells expressing a eukaryotic protein. The structure was solved at 1.64 Å using single-wavelength anomalous dispersion (SAD) phasing with a selenium-modified enzyme. Interestingly, each GAPDH monomer contains a molecule of glyceraldehyde-3 phosphate in a previously unidentified site. Furthermore, the catalytic Cys149 is covalently attached to a 300 Da molecule, possibly glutathione. This modification alters the conformation of an adjacent alpha helix in the catalytic domain, right opposite the NAD+ binding site. The conformation of the alpha helix is stabilized after soaking the crystals with NAD+. These results exemplify the effects that the overexpression of an exogenous protein has on the host proteins and shed light on the structural changes that large oxidant molecules on the catalytic cysteine produce in the GAPDH enzyme.
... Although HDI and WHOQOF are two indexes useful in determining a country's level of development, the European Union's Statistical Office (Eurostat) proposed that, in official reporting, the quality of life be measured using the following dimensions [4]: a) material and living conditions; b) productive or main activity; c) health; d) education; e) leisure and social interactions; f) economic and physical safety; g) governance and basic rights; h) natural and living environment; i) overall experience of life. As far as the quality of the data is concerned, in the literature [5][6][7] the following four central dimensions can be distinguished, which ensure a level of data quality as high as possible: i) data accuracy - measures the degree to which the data in the database represent the real-life elements they stand for; ii) data consistency - the data's property of respecting the integrity constraints; iii) information completeness - measures the capacity of the database to offer complete information in response to user queries; iv) data currency - reflects the degree to which the data are up to date. ...
... Data currency or timeliness is an indicator which measures the quality of data by reflecting the degree of their timeliness in relation to the nature of the activity for which the data are being used [7]. Although the API of the INSSE includes an entry regarding the last date of update of the data sets, this date is available only for the data sets as entities, and not for the records within the data sets. ...
Conference Paper
Full-text available
The analysis of the quality of life has become an increasingly complex process through which a number of economic, political and social factors are examined in order to identify the necessary measures to be taken for the social inclusion of disadvantaged people, to reduce the poverty rate of the population and to improve living conditions. Through a detailed analysis of the determinants of living standards, we identified the progress that Romania has made from 2007, the year of accession to the European Union, until 2016, the year for which the latest open data sets are available. Thus, an important discrepancy has been identified between the level of quality of life in the region that obtained the highest score (excluding the region which covers the Romanian capital) and the region that obtained the lowest score. With an income difference of 28% and with four times more foreign investment attracted in the Central Region than in the North-Eastern Region, the identification and adoption of administrative policies becomes imperative for ensuring a unitary development of the eight Romanian regions of economic development. Keywords: quality of life index; quality of life dimensions; open data quality; QoLI Framework.
... This concerns, for example, the economic value for one's own company or for other companies. [5,6] ...
Technical Report
Full-text available
This whitepaper addresses data governance in collaborative value creation, using the SealedServices infrastructure as an example. Data governance is first defined as a framework and divided into four dimensions. Each of these four dimensions is described in terms of its effects on the three different infrastructure levels and the three associated sub-levels. The dimensions considered are data management, data quality management, data security and regulatory requirements. Data management is concerned in particular with the distribution of responsibilities. The data quality management dimension is in turn subdivided into five sub-dimensions for classifying data quality as objectively as possible. In the area of data security, options for securing the components and connections through various technologies are presented.
... Finally, once the data are converted and stored in the Formatted Zone, they are ready for further exploitation. Notice that these data, although reformatted, might still suffer from original data quality issues (e.g., missing values, inconsistencies, redundancy), which should be addressed depending on the final analytical purposes [33]. Addressing such data quality issues is out of the scope of this paper, but having the data uniformly structured largely facilitates data cleaning tasks [34]. ...
Article
Full-text available
The ability to cross data from multiple sources represents a competitive advantage for organizations. Yet, the governance of the data lifecycle, from the data sources into valuable insights, is largely performed in an ad-hoc or manual manner. This is specifically concerning in scenarios where tens or hundreds of continuously evolving data sources produce semi-structured data. To overcome this challenge, we develop a framework for operationalizing and automating data governance. For the first, we propose a zoned data lake architecture and a set of data governance processes that allow the systematic ingestion, transformation and integration of data from heterogeneous sources, in order to make them readily available for business users. For the second, we propose a set of metadata artifacts that allow the automatic execution of data governance processes, addressing a wide range of data management challenges. We showcase the usefulness of the proposed approach using a real world use case, stemming from the collaborative project with the World Health Organization for the management and analysis of data about Neglected Tropical Diseases. Overall, this work contributes on facilitating organizations the adoption of data-driven strategies into a cohesive framework operationalizing and automating data governance.
... For this study the anterior axial refraction, posterior axial refraction, anterior elevation, posterior elevation and pachymetry were used. Some faulty scans by the device with missing or invalid data points were filtered out as part of a data quality assessment 21 . ...
Preprint
Full-text available
Cornea topography maps allow ophthalmologists to screen and diagnose cornea pathologies. We aim to automatically identify any cornea abnormalities based on such cornea topography maps, with focus on diagnosing keratoconus. A set of 1946 consecutive screening scans from the Saarland University Hospital Clinic for Ophthalmology was annotated and used for model training and validation. All scans were recorded with a CASIA2 anterior segment Optical Coherence Tomography (OCT) scanner. We propose to represent the OCT scans as images and apply Convolutional Neural Networks (CNNs) for the automatic analysis. The developed model is based on a state-of-the-art ConvNeXt CNN architecture with weights fine-tuned for the given specific application using the cornea scans dataset. On a new dataset, our model achieves a sensitivity of 97% and a specificity of 97% when distinguishing between Healthy and Pathological corneas. While a comparison to previous work is intricate due to significant variations in the experimental setup, our model outperforms other published studies, either in terms of detection performance and/or in terms of the number of potential cornea abnormalities the model can identify. Furthermore, the proposed approach is independent of the topography scanner and allows to visually represent scan regions that drive the model's decision.
... In this paper, we propose the combination of different types of analysis that allow for more robust classification of each data entry quality and the different data dimensions themselves. We adapt the nomenclature proposed in [7,8,9] and consider four types of individual quality assessment: ...
Chapter
Data quality is essential for a correct understanding of the concepts the data represent. This is especially relevant in data mining, where data of inferior quality may be used in algorithms that depend on correct data to create accurate models and predictions. In this work, we introduce the issue of errors of identifiers in an anonymous database. The work proposes a quality evaluation approach that considers individual attributes and a contextual analysis that allows additional quality evaluations. The proposed quality analysis model is a robust means of minimizing anonymization costs. Keywords: Data pre-processing; Anonymized data; Data quality
... Our findings further point to several quality dimensions that are considered important in this context. This includes access to data, completeness and the amount of data available, characteristics that were covered in the literature (Pipino et al., 2002). We believe that to judge the relevance of a dataset for a task, users need to be aware of these characteristics. ...
Thesis
Data is one of the most important digital assets in the world thanks to its business and social value. As data is becoming increasingly available online, in order to use it effectively we need tools that allow us to retrieve the most relevant datasets that match our information needs. Web search engines are not well suited for this task as they are designed for documents, not data. In recent years several bespoke search engines have been proposed to help with finding datasets, such as Google Dataset Search, crawling the whole web, or DataMed, focused on creating an index of biomedical datasets. In this work we look closer into the problem of searching for data using the example of Open Data platforms. We first applied a mixed-methods approach aimed at deepening our understanding of users of Open Data portals and the types of queries they issue while searching for datasets, accompanied by an analysis of search sessions over one of these data portals. Based on our findings we look into a particular problem of dataset interpretation - the meaning of numerical columns. We propose a novel approach for assigning semantic labels to numerical columns. We conclude our work with an analysis of the future work needed in the field in order to potentially improve the searchability of datasets on the web.
... The best data were obtained from a Nurr1-LBD-PGA2 co-crystal that diffracted to 2.34 Å. The data were indexed, integrated, merged, and scaled using the software iMOSFLM (Battye et al., 2011) and SCALA (Evans, 2006) from the CCP4 suite of programs (Winn et al., 2011). The crystal belonged to the orthorhombic space group P2 1 2 1 2 1 , with four molecules in the asymmetric unit. ...
Article
Full-text available
The orphan nuclear receptor Nurr1 is critical for the development, maintenance, and protection of midbrain dopaminergic neurons. Recently, we demonstrated that prostaglandins E1 (PGE1) and PGA1 directly bind to the ligand-binding domain (LBD) of Nurr1 and stimulate its transcriptional activation function. In this direction, here we report the transcriptional activation of Nurr1 by PGA2, a dehydrated metabolite of PGE2, through physical binding ably supported by NMR titration and crystal structure. The co-crystal structure of Nurr1-LBD bound to PGA2 revealed the covalent coupling of PGA2 with Nurr1-LBD through Cys566. PGA2 binding also induces a 21° shift of the activation function 2 (AF-2) helix H12 away from the protein core, similar to that observed in the Nurr1-LBD-PGA1 complex. We also show that PGA2 can rescue the locomotor deficits and neuronal degeneration in LRRK2 G2019S transgenic fly models.
... X-ray diffraction data from a single crystal was collected over 3600 images at 0.1 o oscillation at the Australian National Synchrotron MX2 beamline (Eiger X 16 M detector). The data were processed in iMosflm 21 , scaled in Aimless 22 and phased using molecular replacement in Phaser 23 with 4UAD 19 as the search model. The structure was modelled and refined in Coot 24 and Phenix 25,26 , respectively. ...
Article
Full-text available
Shuttling of macromolecules between nucleus and cytoplasm is a tightly regulated process mediated through specific interactions between cargo and nuclear transport proteins. In the classical nuclear import pathway, importin alpha recognizes cargo exhibiting a nuclear localization signal, and this complex is transported through the nuclear pore complex by importin beta. Humans possess seven importin alpha isoforms that can be grouped into three subfamilies, with many cargoes displaying specificity towards these importin alpha isoforms. The cargo binding sites within importin alpha isoforms are highly conserved in sequence, suggesting that specificity potentially relies on structural differences. Structures of some importin alpha isoforms, both in cargo-bound and free states, have been previously solved. However, there are currently no known structures of cargo free importin alpha isoforms within subfamily 3 (importin alpha 5, 6, 7). Here, we present the first crystal structure of human importin alpha 7 lacking the IBB domain solved at 2.5 Å resolution. The structure reveals a typical importin alpha architecture comprised of ten armadillo repeats and is most structurally conserved with importin alpha 5. Very little difference in structure was observed between the cargo-bound and free states, implying that importin alpha 7 does not undergo conformational change when binding cargo. These structural insights provide a strong platform for further evaluation of structure–function relationships and understanding how isoform specificity within the importin alpha family plays a role in nuclear transport in health and disease.
... There are several ways in which this deterioration can be quantified. One way is to simply count failing rules, which has been formalized in [48] and been applied in numerous cases [32,47]. In a more general approach, one can use a capacity function to map sets of rules to levels of quality [9]. ...
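The rule-counting idea mentioned in the snippet above can be sketched in a few lines of Python; the rules and records below are hypothetical, and in the more general approach a capacity function over sets of violated rules would replace the simple pass ratio.

```python
from typing import Callable, Iterable

Rule = Callable[[dict], bool]

def rule_based_quality(records: Iterable[dict], rules: list[Rule]) -> float:
    """Quality as the fraction of (record, rule) checks that pass.

    A minimal version of "counting failing rules"; a capacity function could
    instead map each subset of violated rules to a quality level.
    """
    records = list(records)
    checks = [rule(rec) for rec in records for rule in rules]
    return sum(checks) / len(checks) if checks else 1.0

# Hypothetical rules over toy records.
rules: list[Rule] = [
    lambda r: r.get("age") is not None and 0 <= r["age"] <= 120,
    lambda r: bool(r.get("email")),
]
records = [{"age": 34, "email": "a@example.org"}, {"age": -5, "email": ""}]
print(rule_based_quality(records, rules))  # 0.5
```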
Chapter
Data quality is a problem studied in many different research disciplines like computer science, statistics and economics. More often than not, these different disciplines come with different perspectives and emphases. This paper provides a state-of-the-art overview of data quality management across these disciplines and organizes techniques on two levels: the macro-level and the micro-level. At the macro-level, the emphasis lies on the assessment and improvement of processes that affect data quality. In contrast, the micro-level has a strong focus on the current database and aims at the detection and repair of specific artefacts or errors. We sketch the general methodology for both of these views on the management of data quality and list common methodologies. Finally, we provide a number of open problems and challenges that provide interesting research paths for the future.
... and taking on values ranging from 0 (totally stale data) to 1 (absolutely fresh data) [25]. Considering that the p_ij's are independent with respect to j [18], we write: ...
Article
Full-text available
Machine learning has emerged as a promising paradigm for enabling connected, automated vehicles to autonomously cruise the streets and react to unexpected situations. Reacting to such situations requires accurate classification for uncommon events, which in turn depends on the selection of large, diverse, and high-quality training data. In fact, the data available at a vehicle (e.g., photos of road signs) may be affected by errors or have different levels of resolution and freshness. To tackle this challenge, we propose an active learning framework that, leveraging the information collected through onboard sensors as well as received from other vehicles, effectively deals with scarce and noisy data. Given the information received from neighboring vehicles, our solution: (i) selects which vehicles can reliably generate high-quality training data, and (ii) obtains a reliable subset of data to add to the training set by trading off between two essential features, i.e., quality and diversity. The results, obtained with different real-world datasets, demonstrate that our framework significantly outperforms state-of-the-art solutions, providing high classification accuracy with a limited bandwidth requirement for the data exchange between vehicles.
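The quality/diversity trade-off described in the abstract above can be illustrated with a generic sketch; this is not the authors' actual selection criterion, just a convex combination of two normalized scores with invented values.

```python
import numpy as np

def select_training_data(quality: np.ndarray, diversity: np.ndarray,
                         k: int, alpha: float = 0.5) -> np.ndarray:
    """Pick k samples by a convex combination of quality and diversity scores.

    Only a generic sketch of a quality/diversity trade-off; the paper's actual
    selection criterion may differ. Both score vectors are assumed normalized to [0, 1].
    """
    combined = alpha * quality + (1 - alpha) * diversity
    return np.argsort(combined)[::-1][:k]  # indices of the k highest-scoring samples

# Hypothetical scores for five candidate samples.
quality = np.array([0.9, 0.4, 0.8, 0.6, 0.2])
diversity = np.array([0.1, 0.9, 0.6, 0.75, 0.3])
print(select_training_data(quality, diversity, k=2))  # [2 3]
```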
... Images were integrated with XDS 17 and scaled with Aimless. 18 The initial phases were obtained by molecular replacement with Phaser, 19 using the 6-P-β-glucosidase BglA-2 from Streptococcus pneumoniae (PDB id 4IPL) as the search model. The BlBglH crystallographic model was built by cycles of manual adjustments with Coot 20 interspersed with structure refinements with PHENIX. ...
... Timeliness of a digital object is the extent to which it is sufficiently up-to-date for the task at hand. 48 Timeliness measures whether the resource includes metadata about when it was created, stored, accessed or cited. Users expect up-to-date objects, and the time of the last refresh is a relevant quality indicator. ...
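A common way to turn last-update metadata into a timeliness score is the form popularized by Pipino et al. (2002), max(0, 1 - currency/volatility) raised to a task-dependent sensitivity exponent. The sketch below assumes currency can be approximated by the age of the last update; all parameter values are illustrative.

```python
from datetime import datetime, timezone
from typing import Optional

def timeliness(last_update: datetime, volatility_days: float,
               sensitivity: float = 1.0,
               now: Optional[datetime] = None) -> float:
    """Timeliness in [0, 1] following the form max(0, 1 - currency/volatility) ** s.

    `currency` is approximated here by the age of the last update; `volatility_days`
    is how long the data remains valid, and `sensitivity` tunes how sharply the
    score decays (both are task-dependent assumptions).
    """
    now = now or datetime.now(timezone.utc)
    currency_days = (now - last_update).total_seconds() / 86_400
    return max(0.0, 1.0 - currency_days / volatility_days) ** sensitivity

# Hypothetical record last refreshed 30 days ago, valid for roughly 90 days.
last = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(timeliness(last, volatility_days=90,
                 now=datetime(2024, 1, 31, tzinfo=timezone.utc)))  # ~0.667
```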
Article
Cultural heritage institutions have recently started to share their metadata as Linked Open Data (LOD) in order to disseminate and enrich them. The publication of large bibliographic data sets as LOD is a challenge that requires the design and implementation of custom methods for the transformation, management, querying and enrichment of the data. In this report, the methodology defined by previous research for the evaluation of the quality of LOD is analysed and adapted to the specific case of Resource Description Framework (RDF) triples containing standard bibliographic information. The specified quality measures are reported in the case of four highly relevant libraries.
... After workup, the crude product was purified by flash column chromatography (EtOAc, 100%) to give the intermediate, N-(1benzylpiperidin-4-yl)-4-(2-(methoxymethyl)pyrrolidin-1-yl)-Nmethylthieno [3,2-d]pyrimidin-2-amine, a colorless oil (140 mg, 0.31 mmol, 89%). 1 The intermediate (140 mg, 0.31 mmol) was then debenzylated according to general method C. After workup, the crude product was purified by reverse-phase HPLC (50−98% MeOH/water/0.1% formic acid), partially evaporated and then lyophilized to give the title compound (41) (43). According to general method D for reductive amination, (S)-2-(1-(2-(methyl-(piperidin-4-yl)amino)thieno [3,2-d]pyrimidin-4-yl)pyrrolidin-2-yl)acetonitrile (44, 15 mg, 0.042 mmol) was reacted with paraformaldehyde (6.0 mg, 0.2 mmol) and sodium triacetoxyborohydride (85 mg, 0.4 mmol). ...
Article
Full-text available
The leishmaniases, caused by Leishmania species of protozoan parasites, are neglected tropical diseases with 12-15 million cases worldwide. Current therapeutic approaches are limited by toxicity, resistance and cost. N-Myristoyltransferase (NMT), an enzyme ubiquitous and essential in all eukaryotes, has been validated via genetic and pharmacological methods as a promising antileishmanial target. Here we describe a comprehensive structure activity relationship study of a thienopyrimidine series previously identified in a high throughput screen against Leishmania NMT, across 68 compounds in enzyme- and cell-based assay formats. Using a chemical tagging target engagement biomarker assay we identify the first inhibitor in this series with on-target NMT activity in leishmania parasites. Furthermore, crystal structure analyses of 12 derivatives in complex with Leishmania major NMT revealed key factors important for future structure-guided optimization delivering IMP-105 (43), a compound with modest activity against L. donovani intracellular amastigotes and excellent selectivity (>660-fold) for Leishmania NMT over human NMTs.
... APS 24-ID-C synchrotron beamline at an aperture of 30 μm at 5% beam strength and cryo-cooled to 100K. The resulting diffraction data from a single crystal were processed using RAPD [42,43]. All structures were phased using the molecular replacement method in PHENIX [44] using previously published PltB apo structure (PDB: 4RHR) as a search model [19]. ...
Article
Full-text available
Typhoid toxin is an A2B5 toxin secreted from Salmonella Typhi-infected cells during human infection and is suggested to contribute to typhoid disease progression and the establishment of chronic infection. To deliver the enzymatic ‘A’ subunits of the toxin to the site of action in host cells, the receptor-binding ‘B’ subunit PltB binds to the trisaccharide glycan receptor moieties terminated in N-acetylneuraminic acid (Neu5Ac) that is α2–3 or α2–6 linked to the underlying disaccharide, galactose (Gal) and N-acetylglucosamine (GlcNAc). Neu5Ac is present in both unmodified and modified forms, with 9-O-acetylated Neu5Ac being the most common modification in humans. Here we show that host cells associated with typhoid toxin-mediated clinical signs express both unmodified and 9-O-acetylated glycan receptor moieties. We found that PltB binds to 9-O-acetylated α2–3 glycan receptor moieties with a markedly increased affinity, while the binding affinity to 9-O-acetylated α2–6 glycans is only slightly higher, as compared to the affinities of PltB to the unmodified counterparts, respectively. We also present X-ray co-crystal structures of PltB bound to related glycan moieties, which supports the different effects of 9-O-acetylated α2–3 and α2–6 glycan receptor moieties on the toxin binding. Lastly, we demonstrate that the cells exclusively expressing unmodified glycan receptor moieties are less susceptible to typhoid toxin than the cells expressing 9-O-acetylated counterparts, although typhoid toxin intoxicates both cells. These results reveal a fine-tuning mechanism of a bacterial toxin that exploits specific chemical modifications of its glycan receptor moieties for virulence and provide useful insights into the development of therapeutics against typhoid fever.
... with the values of f_ij ranging from 0 (totally stale data) to 1 (absolutely fresh data) [13]. Considering that the p_ij's are independent with respect to j [14], we write: ...
Preprint
Full-text available
Machine learning has emerged as a promising paradigm for enabling connected, automated vehicles to autonomously cruise the streets and react to unexpected situations. A key challenge, however, is to collect and select real-time and reliable information for the correct classification of unexpected, and often rare, situations that may happen on the road. Indeed, the data generated by vehicles, or received from neighboring vehicles, may be affected by errors or have different levels of resolution and freshness. To tackle this challenge, we propose an active learning framework that, leveraging the information collected through onboard sensors as well as received from other vehicles, effectively deals with scarce and noisy data. In particular, given the available information, our solution selects the data to add to the training set by trading off between two essential features, namely, quality and diversity. The results, obtained using real-world data sets, show that the proposed method significantly outperforms state-of-the-art solutions, providing high classification accuracy at the cost of a limited bandwidth requirement for the data exchange between vehicles.
... with the values of f_ij ranging from 0 (totally stale data) to 1 (absolutely fresh data) [13]. Considering that the p_ij's are independent with respect to j [14], we write: ...
Conference Paper
Full-text available
Machine learning has emerged as a promising paradigm for enabling connected, automated vehicles to autonomously cruise the streets and react to unexpected situations. A key challenge, however, is to collect and select real-time and reliable information for the correct classification of unexpected, and often rare, situations that may happen on the road. Indeed, the data generated by vehicles, or received from neighboring vehicles, may be affected by errors or have different levels of resolution and freshness. To tackle this challenge, we propose an active learning framework that, leveraging the information collected through onboard sensors as well as received from other vehicles, effectively deals with scarce and noisy data. In particular, given the available information, our solution selects the data to add to the training set by trading off between two essential features, namely, quality and diversity. The results, obtained using real-world data sets, show that the proposed method significantly outperforms state-of-the-art solutions, providing high classification accuracy at the cost of a limited bandwidth requirement for the data exchange between vehicles.
... X-ray diffraction data collection was carried out at the MASSIF-1 automatic beamline of the European Synchrotron (ESRF), Grenoble, France [21]. The dataset was processed and integrated using the program XDS [22] in combination with the program SCALA [23] from the CCP4 package [24]. The crystal belongs to P212121 space group. ...
Article
Full-text available
Tyrosinase-related protein 1 (TYRP1) is one of the three human melanogenic enzymes involved in the biosynthesis of melanin, a pigment responsible for the color of the skin, hair, and eyes. It shares high sequence identity with tyrosinase, but has two zinc ions in its active site rather than two copper ions as in tyrosinase. Typical tyrosinase inhibitors do not directly coordinate to the zinc ions of TYRP1. Here, we show, from an X-ray crystal structure determination, that phenylthiourea, a highly potent tyrosinase inhibitor, does not coordinate the active-site zinc ions either, but binds differently from other structurally characterized TYRP1-inhibitor complexes. Its aromatic ring is directed outwards from the active site, apparently as a result of the absence of polar oxygen substituents that could take the position of water molecules bound in the active site. The compound binds via hydrophobic interactions, thereby blocking substrate access to the active site.
... X-ray datasets were collected at X11 and X12 beamlines at EMBL/DESY, Hamburg (DORIS storage ring) for M1 and M7, respectively, and at the European Synchrotron Radiation Facility beamlines ID23-1 and ID14-1, Grenoble, France, at 100K, for M3 and M4/M10, respectively. Data were processed with XDS [31] and scaled using XSCALE (M1, M7, M10), SCALA [32] (M4) and Aimless [33] (M3). ...
Article
Full-text available
In striated muscles, molecular filaments are largely composed of long protein chains with extensive arrays of identically folded domains, referred to as “beads-on-a-string”. It remains a largely unresolved question how these domains have developed a unique molecular profile such that each carries out a distinct function without false-positive readout. This study focuses on the M-band segment of the sarcomeric protein titin, which comprises ten identically folded immunoglobulin domains. Comparative analysis of high-resolution structures of six of these domains ‒ M1, M3, M4, M5, M7, and M10 ‒ reveals considerable structural diversity within three distinct loops and a non-conserved pattern of exposed cysteines. Our data allow us to structurally interpret distinct pathological readouts that result from titinopathy-associated variants. Our findings support general principles that could be used to identify individual structural/functional profiles of hundreds of identically folded protein domains within the sarcomere and other densely crowded cellular environments.
... This potential synergistic effect among data is unique and different than the synergistic effect of the interaction of other resources in organizations (e.g., IT or organizational resources) that has been explored in the literature. We justify this argument by explaining crucial and unique characteristics of data that enable the mutual interaction among two or more data sets to a certain degree (see Table 2): shareability, transportability, combinability, nonconsumability, and versatility (Eaton & Bawden, 1991;Gorla, Somers, & Wong, 2010;Levitin & Redman, 1998;Nelson et al., 2005;Pipino, Lee, & Wang, 2002;Wang & Strong, 1996). The shareability and almost instantaneous transportability of data simplifies the interaction of data from various information systems. ...
Article
Full-text available
Organizations sit on a treasure trove of data. Combining data from a plurality of sources is challenging but comes with enormous potential. Although this phenomenon is crucial for generating value, its underlying synergistic effect is virtually absent in IS literature. Grounded in systems theory, we developed a conceptual framework of data synergy by means of reviewing literature and conducting 24 semi-structured interviews. We reveal various enabling conditions, facilitating super-additive informational and transactional value generation.
Conference Paper
The abundance of information available today makes it necessary to select the highest-quality documents. For this purpose, it is necessary to deepen knowledge of information quality systems. The different dimensions of quality are analyzed, and different problems related to these dimensions are discussed. The paper groups these issues into different facets: primary information, its manipulation and interpretation, and the publication and dissemination of information. The impact of these interdependent facets on the production of untruthful information is discussed. Finally, ChatGPT is analyzed as a use case. It is shown how these problems and facets have an impact on the quality of the system, and the remarks made by experts are analyzed. The paper concludes with the different challenges that artificial intelligence systems face.
Chapter
Full-text available
In the early development phase, vehicle developers face the challenge that, due to confidentiality and financially constrained projects, developed concepts can hardly be tested and evaluated with target customers. This can lead to increased market uncertainty, which is reflected in unclear customer requirements for future vehicle generations. As a result, companies run the risk that the technically implemented requirements and those actually demanded by customers are not congruent. The development of a new technical system is always based on a reference system, from which a new system generation is developed by varying the reference system elements (RSE) it contains. Empirical studies have shown the potential of analyzing usage data from these references: decisions can be objectified, existing market uncertainties reduced, and new customer needs identified. This contribution therefore presents a method for triangulating qualitative and quantitative product usage data. The result of the method is a set of usage requirements of customers and users for a future system generation. The method was developed in cooperation with a German OEM, applied in the concept phase and initially evaluated. The method complements developers' experience with fact-based decision support and thereby makes a decisive contribution to reducing market uncertainties. The development team confirmed the reduction of perceived market uncertainty in a concluding survey. In addition, a proxy variable for market uncertainty is used to holistically examine the method's contribution to success. At the last measurement point, the indexed market uncertainty had been reduced by 54% compared to the start of the project.
Chapter
Organizations continuously generate and manage extensive amounts of data for specific purposes, such as making informed decisions or monitoring certain parameters. It is not only essential to obtain the data; how it is obtained, stored, and maintained is equally, if not more, valuable. Data quality is a crucial factor for any organization because if the data does not meet the required level of quality, its use will not yield the best results. To maintain adequate levels of quality, organizations need to identify the data requirements or business rules that their data must adhere to for the intended purpose. The most common problem is organizations’ lack of knowledge in identifying business rules adequately. In this regard, a model based on ISO/IEC 25012 enables the assessment of data quality based on an organization’s requirements. As a solution, this work presents a methodology to facilitate the identification and classification of business rules for an organization, as well as their association with each data quality characteristic defined by the ISO/IEC 25012 standard.
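To make the chapter's idea concrete, here is a minimal, hypothetical Python sketch in which each business rule is tagged with the ISO/IEC 25012 characteristic it measures and a per-characteristic pass ratio is reported; the rules, characteristic assignments and records are invented and are not taken from the standard or from the chapter's methodology.

```python
from collections import defaultdict
from typing import Callable

# Each business rule is tagged with the ISO/IEC 25012 characteristic it measures.
# The rules and characteristic assignments below are purely illustrative.
RULES: list[tuple[str, str, Callable[[dict], bool]]] = [
    ("birth_date_not_in_future", "Accuracy",     lambda r: r["birth_date"] <= "2024-12-31"),
    ("customer_id_present",      "Completeness", lambda r: bool(r.get("customer_id"))),
    ("country_code_iso2",        "Consistency",  lambda r: len(r.get("country", "")) == 2),
]

def assess(records: list[dict]) -> dict[str, float]:
    """Per-characteristic pass ratio over all (record, rule) checks."""
    passed, total = defaultdict(int), defaultdict(int)
    for record in records:
        for _name, characteristic, check in RULES:
            total[characteristic] += 1
            passed[characteristic] += int(check(record))
    return {c: passed[c] / total[c] for c in total}

records = [
    {"customer_id": "C1", "birth_date": "1990-04-12", "country": "ES"},
    {"customer_id": "",   "birth_date": "2031-01-01", "country": "ESP"},
]
print(assess(records))  # {'Accuracy': 0.5, 'Completeness': 0.5, 'Consistency': 0.5}
```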
Book
Full-text available
Financing Sustainable Development in Egypt
Technical Report
Full-text available
This whitepaper addresses the added value of data-driven services, using the SealedServices ecosystem as an example. It first discusses the importance of data and the basic definition of data-driven services. In addition, obstacles to data-driven services are illustrated with examples and explained using the SealedServices ecosystem. At the core of the whitepaper, various theoretical models are explained that represent different dimensions of the added value of data-driven services. Furthermore, a value-added scheme for data-driven services is presented, which was created on the basis of guided interviews with partner companies involved in SealedServices. The findings are then compared with the theoretical representations and differences are highlighted.
Article
Full-text available
By suppressing gene transcription through the recruitment of corepressor proteins, B-cell lymphoma 6 (BCL6) protein controls a transcriptional network required for the formation and maintenance of B-cell germinal centres. As BCL6 deregulation is implicated in the development of Diffuse Large B-Cell Lymphoma, we sought to discover novel small molecule inhibitors that disrupt the BCL6-corepressor protein–protein interaction (PPI). Here we report our hit finding and compound optimisation strategies, which provide insight into the multi-faceted orthogonal approaches that are needed to tackle this challenging PPI with small molecule inhibitors. Using a 1536-well plate fluorescence polarisation high throughput screen we identified multiple hit series, which were followed up by hit confirmation using a thermal shift assay, surface plasmon resonance and ligand-observed NMR. We determined X-ray structures of BCL6 bound to compounds from nine different series, enabling a structure-based drug design approach to improve their weak biochemical potency. We developed a time-resolved fluorescence energy transfer biochemical assay and a nano bioluminescence resonance energy transfer cellular assay to monitor cellular activity during compound optimisation. This workflow led to the discovery of novel inhibitors with respective biochemical and cellular potencies (IC50s) in the sub-micromolar and low micromolar range.
Article
Aim: To offer high-quality healthcare services, individuals need to utilize high-quality information. The present study aims to evaluate the data quality in the hospital information system (HIS) at a selected educational hospital. Method: This descriptive and cross-sectional study was conducted in 2018. The statistical population consisted of 202 users of the HIS at Amiralmomenin Hospital in Zabol. The respondents were selected using stratified random sampling. Data were collected using a researcher-made questionnaire. Then, they were analyzed through SPSS-20 and descriptive statistics. Results: It was found that 45 of the respondents considered the hospital information comprehensible, while 76 considered it not very understandable. Moreover, 34.7% believed that the hospital information would be rapidly accessible when needed. The average scores of the dimensions were found to be 5-8.5, and there were significant, positive relationships between all the dimensions under study (P-value<0.05). Conclusion: The findings suggest that only a small number of staff had complete information on the HIS and associated subsystems. Other respondents lacked sufficient awareness of the HIS or were unaware of its existence. The authors suggest that the needs of users be evaluated before designing a HIS in order to ensure that it will meet those needs. Despite the use of HIS subsystems in all the units of the hospital under study, respondents had insufficient information on how these subsystems could be used.
Article
This article aims to explore the quality of the information provided by public authorities to citizens, businesses, and other stakeholders as part of the implementation of e-services of “A State in a Smartphone” in Ukraine. The article presents the structure of the authors’ model of information quality assessment, which includes three levels of characteristics and allows calculating the integral indicator of information quality. The model involves the use of expert research methods. The results of the study indicate that the information provided to users by public authorities has a fairly high level of quality, but there are reserves for improvement.
Conference Paper
Full-text available
The presented article provides results on the application of a Deep Learning model for internet traffic classification. The classification task is performed for video traffic of two types: real-time streaming video and on-demand video records. Multiple Machine Learning methods have been used to solve the specified task, and this article presents the results obtained with the Convolutional Neural Network method. The article describes data pre-processing and formatting, as well as the model parameters used. The article ends with some of the obtained results presented in the form of tables and 3D surface plots, followed by a conclusion.
Article
Full-text available
Open Government Data (OGD) has the potential to support social and economic progress. However, this potential can be frustrated if this data remains unused. Although the literature suggests that OGD datasets metadata quality is one of the main factors affecting their use, to the best of our knowledge, no quantitative study provided evidence of this relationship. Considering about 400,000 datasets of 28 national, municipal, and international OGD portals, we have programmatically analyzed their usage, their metadata quality, and the relationship between the two. Our analysis has highlighted three main findings. First of all, regardless of their size, the software platform adopted, and their administrative and territorial coverage, most OGD datasets are underutilized. Second, OGD portals pay varying attention to the quality of their datasets' metadata. Third, we did not find clear evidence that datasets usage is positively correlated to better metadata publishing practices. Finally, we have considered other factors, such as datasets' category, and some demographic characteristics of the OGD portals, and analyzed their relationship with datasets usage, obtaining partially affirmative answers.
Article
Full-text available
Box C/D ribonucleoprotein complexes are RNA-guided methyltransferases that methylate the ribose 2’-OH of RNA. The central ‘guide RNA’ has box C and D motifs at its ends, which are crucial for activity. Archaeal guide RNAs have a second box C’/D’ motif pair that is also essential for function. This second motif is poorly conserved in eukaryotes and its function is uncertain. Conflicting literature data report that eukaryotic box C’/D’ motifs do or do not bind proteins specialized to recognize box C/D-motifs and are or are not important for function. Despite this uncertainty, the architecture of eukaryotic 2’-O-methylation enzymes is thought to be similar to that of their archaeal counterpart. Here, we use biochemistry, X-ray crystallography and mutant analysis to demonstrate the absence of functional box C’/D’ motifs in more than 80% of yeast guide RNAs. We conclude that eukaryotic Box C/D RNPs have two non-symmetric protein assembly sites and that their three-dimensional architecture differs from that of archaeal 2’-O-methylation enzymes.
Chapter
For an economic view of data quality (DQ), and in particular for planning DQ measures under cost-benefit aspects, DQ metrics are indispensable. The following therefore takes up the question of how DQ can be quantified adequately and in a purpose-oriented manner. To this end, metrics are developed and presented that, on the one hand, enable a quantitative analysis of the DQ present at the time of measurement in order to identify needs for action. On the other hand, effects on DQ, such as decay over time or the execution of DQ measures, should be analyzable in a targeted manner by comparing the DQ level at two or more measurement points.
Article
Inductively coupled plasma mass spectrometry (ICP-MS) has become indispensable for the quantitative determination of trace and ultra-trace elements in geological samples. Inferences derived from ICP-MS data sets and other supportive evidence have revolutionized theories on geology/geodynamics. In this scenario, validation through interlaboratory studies plays an important role in assuring the quality of measurements, along with the performance and accreditation programs of the laboratory. The Geological Survey of India (GSI) initiated an interlaboratory testing program on soil and stream sediment testing in which about fifteen laboratories, including our geochemical laboratory, participated. Trace and rare-earth elements (REE) were determined on the reference material PKS-1 by high-resolution ICP-MS and validated against the certified values for PKS-1 provided by GSI after compilation. The fitness of the acquired data for either “pure geochemistry” or “applied geochemistry” was determined based on the Z-scores. The data for most of the analytes fell within the -2 < Z < +2 range, which included the majority of the trace elements and REE used in petrogenetic and provenance studies. These results help to review the analytical mismatches observed in the data, optimize necessary aspects and minimize the interference effects caused during analysis.
Thesis
Full-text available
With an increasing amount of research data being generated and processed, as well as an increasing spread of data reuse (secondary data use), there is also an increased need for the evaluation of data quality. Through opinions, recommendations and guidelines from professional societies and scientific associations such as the Rat für Informationsinfrastrukturen (RfII) or the Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF), the research community is becoming more and more conscious of the evaluation and improvement of data quality. Demands for the integration of data quality management into the FAIR process and the GCP regulations further underline this increasing awareness. In the context of this work, an overview of the challenges, and the resulting possibilities for operationalizing data quality, is provided. In addition to the theoretical preparation and conceptual delimitation of the subject area, the work also contributes to the establishment of a partially automated solution for evaluating and displaying data quality. The concepts "categories", "dimensions", "indicators" and "parameters" of data quality identified in the literature are explained and related to each other. The work describes a possible procedure for evaluating the quality of a data record regardless of the type and scope of the information contained. In a second step, indicators for operationalizing data quality are defined and examined for their possible use in a generic evaluation algorithm. The indicators suitable for a generic assessment approach are described in more detail and first formalized in preparation for implementation as an automatable script. Finally, the calculation and visualization of these quality indicators are implemented in a statistical script in the language R. To promote the automation of the evaluation, the developed R script was additionally implemented as a SmartR plugin. The American Gut Project data set is used as the reference data set for the development of the R script and SmartR plugin. The developed plugin is designed for integration into the integrated research data platform tranSMART, where it can evaluate the data quality of the data records contained there. The overview is rounded off by excursions into the areas of data cleaning and statistics with R. Also, the different forms of "Data Repositories", "Data Warehouses", and "Data Lakes" are differentiated. The final part of the thesis focuses on a description of further use cases and potential extensions, as well as a critical examination of the existing quality concepts. In summary, only a few indicators turned out to be fundamentally suitable for a generic approach, since the calculation of the majority of the indicators defined in the literature sometimes requires considerable preliminary and additional information. In the end, only descriptive statistics for incomplete or missing entries, completeness and outliers prove to be suitable. The chosen approach of maximum genericity therefore leads to reduced informative value compared to other approaches, so that further adjustments are necessary for productive use.
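The generic indicators the thesis retains (missing entries, completeness, outliers) can be sketched with a dataset-agnostic routine. The thesis implements this in R within a SmartR plugin; the Python version below is only an illustration with an invented toy data set.

```python
import pandas as pd

def generic_quality_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Dataset-agnostic indicators: completeness per column and IQR-based outlier counts.

    Mirrors the kind of generic, type-independent checks described above
    (the thesis implements them in R; this Python sketch is only illustrative).
    """
    report = pd.DataFrame({
        "completeness": 1 - df.isna().mean(),
        "missing": df.isna().sum(),
    })
    outliers = {}
    for col in df.select_dtypes("number"):
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
        outliers[col] = int(mask.sum())
    report["outliers"] = pd.Series(outliers)
    return report

# Hypothetical toy data set.
df = pd.DataFrame({"age": [25, 30, None, 29, 180], "site": ["A", "B", "B", None, "A"]})
print(generic_quality_indicators(df))
```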
Article
Factor XIa inhibitors are promising novel anticoagulants which show excellent efficacy in preclinical thrombosis models with minimal effects on hemostasis. The discovery of potent and selective FXIa inhibitors which are also orally bioavailable has been a challenge. Here, we describe optimization of the imidazole-based macrocyclic series and our initial progress towards meeting this challenge. A two-pronged strategy, which focused on replacement of the imidazole scaffold and the design of new P1 groups, led to the discovery of potent, orally bioavailable pyridine-based macrocyclic FXIa inhibitors. Moreover, pyridine-based macrocycle 19, possessing the phenylimidazole carboxamide P1, exhibited excellent selectivity against relevant blood coagulation enzymes and displayed antithrombotic efficacy in a rabbit thrombosis model.
Preprint
Full-text available
Viral escape from CD8⁺ cytotoxic T lymphocyte responses correlates with disease progression and represents a significant challenge for vaccination. Here, we demonstrate that CD8⁺ T cell recognition of the naturally occurring MHC-I-restricted LCMV-associated immune escape variant Y4F is restored following vaccination with a proline-altered peptide ligand (APL). The APL increases MHC/peptide (pMHC) complex stability, rigidifies the peptide and facilitates T cell receptor (TCR) recognition through reduced entropy costs. Structural analyses of pMHC complexes before and after TCR binding, combined with biophysical analyses, revealed that although the TCR binds similarly to all complexes, the p3P modification alters the conformations of a very limited number of specific MHC and peptide residues, facilitating efficient TCR recognition. This approach can be easily introduced in peptides restricted to other MHC alleles, and can be combined with currently available and future vaccination protocols in order to prevent viral immune escape. Author Summary: Viral escape mutagenesis often correlates with disease progression and represents a major hurdle for vaccination-based therapies. Here, we have designed and developed a novel generation of altered epitopes that re-establish and significantly enhance CD8⁺ T cell recognition of a naturally occurring viral immune escape variant. Biophysical and structural analyses provide a clear understanding of the molecular mechanisms underlying this re-established recognition. We believe that this approach can be implemented in currently available or novel vaccination approaches to efficiently restore T cell recognition of virus escape variants and control disease progression.