
Data Governance and Quality Management in Data Engineering

November 2023
International Journal of Computer Trends and Technology 71(11):40-45

DOI:10.14445/22312803/IJCTT-V71I11P106

Authors:

Alekhya Achanta

Institute of Electrical and Electronics Engineers

Roja Bo

Independent Researcher

International Journal of Computer Trends and Technology Volume 71 Issue 11, 40-45, November 2023

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Original Article

Data Governance and Quality Management in Data Engineering

Alekhya Achanta1, Roja Boina2

1DataOps Engineer, Continental Properties Company Inc, Wisconsin, USA.

2Independent Researcher, North Carolina, United States of America.

Corresponding Author: alekhya.achanta@gmail.com

Received: 16 September 2023 Revised: 21 October 2023 Accepted: 08 November 2023 Published: 25 November 2023

Abstract - Data has become one of the most valuable assets for organizations today. With the exponential growth in data,

effectively governing and managing its quality is critical for gaining business insights and maintaining regulatory

compliance. This paper examines the importance of data governance and quality management in data engineering. It

outlines the fundamental principles, processes, and best practices for implementing robust data governance frameworks and

quality management programs. The roles of key stakeholders, such as data owners, stewards, and engineers, are discussed.

It also explores the challenges, such as inadequate data quality culture and lack of executive support. The focus is on new

technologies, such as machine learning and automation, which can potentially improve data governance and quality. The

paper concludes by emphasizing the need for a holistic strategy, strong leadership, and a collaborative culture for successful

data governance and quality management outcomes.

Keywords - Data governance frameworks, Data profiling and monitoring, Data validation and standards, Data quality

assurance, Metadata management.

1. Introduction

In the era of digitization, the ability to harness data has revolutionized the competitive landscape for organizations. In their pursuit to be data-centric, these entities continuously amass vast troves of structured and unstructured data from diverse sources such as social media, the Internet of Things, sensors, clickstreams, and transactions (Ghavami, 2020). While the potential to extract value from this data is immense, there exists a significant research gap and practical challenge: maintaining the quality of this data. Without robust governance and judicious management, poor-quality data can obscure genuine insights and thereby misguide business strategies. Recent empirical studies paint a dire picture: on average, businesses lose over $15 million annually due to poor data quality. Furthermore, compromised data quality not only leads to flawed business decisions but also poses severe regulatory risks, which have culminated in penalties amounting to millions (Wang et al., 2022).

Given this backdrop, the criticality of instituting strong

frameworks and methodologies to govern and ensure data

quality cannot be overemphasized. This paper delves deep

into the realm of data governance and quality management,

particularly within the ambit of data engineering. Data

engineering, which encapsulates the myriad processes and

systems for procuring, housing, and dissecting data at scale,

forms the bedrock of any data-driven decision-making

mechanism. Our discourse begins by elucidating the

fundamental concepts of data governance and quality.

Following this, we highlight the pivotal role of data quality

management in the intricate pipelines of data engineering.

As we progress, the paper delineates the cardinal principles,

best practices, processes, and hierarchical roles pivotal for

ensuring data integrity. In addition, we probe into the

challenges that organizations grapple with and spotlight

emerging technological innovations poised to bolster data

governance and quality. Lastly, our exploration accentuates

the indispensable cultural shifts and leadership imperatives

that underpin effective governance and assiduous quality

management of data.

2. Defining Data Governance and Quality

Data governance refers to the overall strategy, policies,

standards, and processes that ensure high-quality data assets

across the organization. It establishes accountability and

oversight for managing data as a critical enterprise asset.

Data governance helps align regulatory, operational, and

strategic objectives with data strategies. Key activities

include developing data policies, standards, and procedures;

governing data architecture and quality; providing

stewardship and ownership; and monitoring compliance.

Effective data governance requires involvement across

functions - from legal, compliance, IT, and lines of business

to executive leadership.

Alekhya Achanta & Roja Boina / IJCTT, 71(11), 40-45, 2023

Data quality encompasses the accuracy, completeness, consistency, timeliness, validity, and uniqueness of data (Wende, 2007). High-quality data that meets these characteristics enhances business value and reduces risks. Data quality management involves practices that ensure data adheres to quality requirements throughout its lifecycle, from creation, acquisition, and storage to processing, distribution, and archival. It provides standards for data quality via metrics, monitoring, issue resolution, and improvement initiatives (Wende & Otto, 2007).
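
As a rough illustration of how some of these quality characteristics can be turned into measurable metrics, the following minimal sketch computes completeness, uniqueness, and validity ratios over a handful of records. The record layout, the field names, and the simplistic e-mail rule are assumptions made for this example and are not taken from the paper.

```python
from datetime import date

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "signup": date(2023, 1, 5)},
    {"id": 2, "email": None, "signup": date(2023, 2, 1)},
    {"id": 2, "email": "b@example.com", "signup": date(2023, 2, 1)},
    {"id": 3, "email": "not-an-email", "signup": date(2023, 3, 9)},
]


def completeness(rows, field):
    """Share of rows in which the field is present (not None or empty)."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, key):
    """Ratio of distinct key values to the total number of rows."""
    if not rows:
        return 0.0
    return len({r[key] for r in rows}) / len(rows)


def validity(rows, field, predicate):
    """Share of non-missing values that satisfy a validation rule."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if predicate(v)) / len(values)


def looks_like_email(value):
    # Deliberately simple rule for illustration, not a full address check.
    return "@" in value and "." in value.split("@")[-1]


print(completeness(records, "email"))                 # 3 of 4 rows have an e-mail
print(uniqueness(records, "id"))                      # 3 distinct ids over 4 rows
print(validity(records, "email", looks_like_email))   # 2 of 3 present e-mails pass
```

Ratios of this kind can then be compared against agreed thresholds and tracked over time.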

3. Significance in Data Engineering Context

In data engineering pipelines, large volumes of data flow through various phases such as acquisition, storage, processing, and consumption. Governance and quality must be ingrained across these pipelines to drive confidence in the data products. Data engineers are deeply involved in developing and operating these pipelines. Hence, they must adopt governance practices and build quality into data infrastructure and processes (Dai et al., 2016). Some key reasons why governance and quality matter in this context are:

3.1. Compliance

Various regulations like GDPR mandate data

governance through accountability, transparency, and

quality. This requires implementing policies, access

controls, lineage tracking, and quality checks in data

platforms.

3.2. Trustworthy Analytics

Quality issues like errors, duplication, inconsistencies, and incompleteness can propagate and get magnified in downstream analytics, leading to incorrect insights. Governance and quality help prevent "garbage-in, garbage-out."

3.3. Metadata

Governance requires rich metadata with definitions, standards, and rules, enabled by data catalogs, dictionaries, and lineage tools. This aids discoverability and interoperability (Lis & Otto, 2020).
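
To make the idea of catalog metadata and lineage more concrete, the sketch below models a tiny in-memory catalog and walks it to find every upstream dataset of a report. The `CatalogEntry` structure, the dataset names, and the owners are hypothetical; real catalog and lineage tools expose their own models.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Minimal metadata for one dataset (hypothetical structure)."""
    name: str
    description: str
    owner: str
    upstream: list = field(default_factory=list)  # names of direct source datasets


catalog = {
    "raw_orders": CatalogEntry(
        "raw_orders", "Orders as received from the source system", "sales-ops"),
    "raw_customers": CatalogEntry(
        "raw_customers", "Customer records from the CRM", "crm-team"),
    "clean_orders": CatalogEntry(
        "clean_orders", "Validated and de-duplicated orders", "data-eng",
        upstream=["raw_orders"]),
    "revenue_report": CatalogEntry(
        "revenue_report", "Monthly revenue by customer", "finance",
        upstream=["clean_orders", "raw_customers"]),
}


def trace_lineage(catalog, name):
    """Return every dataset that `name` depends on, directly or indirectly."""
    seen = []
    stack = list(catalog[name].upstream)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(catalog[current].upstream)
    return sorted(seen)


print(trace_lineage(catalog, "revenue_report"))
```

With lineage recorded this way, a quality issue found in a report can be traced back to the source datasets and their owners.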

3.4. Monitoring

Continuous, automated data quality monitoring via

statistical profiling, rules, and machine learning models

helps identify issues early. Data engineers need to build

these capabilities.
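
One simple way to approach such monitoring is sketched below: a basic column profile plus a z-score check that flags a new measurement far from its historical mean. The threshold of three standard deviations and the sample row counts are assumptions for illustration; production monitoring would typically use more robust statistics or learned models.

```python
from statistics import mean, stdev


def profile(values):
    """Summarise a numeric column: count, missing values, min, max, mean."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` standard deviations
    from the mean of `history` (assumed rule, chosen for simplicity)."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold


# Hypothetical daily row counts for one pipeline load.
daily_row_counts = [1000, 1020, 990, 1010, 1005]

print(profile([10, None, 30]))
print(is_anomalous(daily_row_counts, 400))    # sharp drop in volume
print(is_anomalous(daily_row_counts, 1015))   # within the normal range
```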

3.5. Automation

Technologies like data quality rules engines and

machine learning can automate quality checks and

corrections, freeing up engineers.

With robust governance and quality, data teams save significant time that would otherwise be spent on non-value-adding cleaning and reconciliation. Governance provides standards, while quality management helps systematically improve data assets (Koltay, 2016).

4. Fundamental Principles and Best Practices

The fundamental principles and best practices for data governance and quality are as follows:

4.1. Business Alignment

Business outcomes and requirements should drive data

quality initiatives rather than just IT preferences. The

priorities and use cases should come from business teams,

while technology teams enable implementation. This

ensures governance and quality efforts deliver maximum

business value.

4.2. Shared Accountability

Data quality cannot be the sole responsibility of IT

teams. Business teams who enter or depend on data for

decisions are equally accountable for reporting issues,

resolving them at source, and implementing quality

practices. A collaborative culture between data producers,

consumers, and enablers is essential.

4.3. Data Lifecycle Approach

Data quality must be assessed and managed across the

entire lifecycle - from creation, acquisition, storage, and

processing to consumption and archival. For example,

assess quality at intake, build checks into the ETL process,

profile before analysis, retain integrity during archival, etc.

This end-to-end view is critical.

4.4. Continuous Monitoring

Quality levels must be systematically measured via

metrics and monitored through dashboards. Issues must be

rapidly identified and resolved. Increase monitoring

coverage through automation using data quality tools.

Issue resolution - Superficial bug fixes result in quality issues reappearing downstream. Perform root cause analysis to identify systemic gaps, and address those gaps to prevent recurrence.

Prevention over correction - Defining quality standards upfront and building validations into systems is more efficient than retrofitting quality. Controls during entry and processing prevent issues downstream.
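
The "prevention over correction" idea can be sketched as a validation gate inside a load step: records that break a rule are quarantined with their reasons instead of flowing downstream. The rules, field names, and currency list below are hypothetical and only illustrate the pattern.

```python
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}  # assumed reference list


def validate_order(order):
    """Return a list of rule violations for one record (empty if it passes)."""
    errors = []
    if not order.get("order_id"):
        errors.append("order_id is required")
    amount = order.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    if order.get("currency") not in ALLOWED_CURRENCIES:
        errors.append("currency must be one of " + ", ".join(sorted(ALLOWED_CURRENCIES)))
    return errors


def load_step(orders):
    """Split incoming records into accepted rows and quarantined rows."""
    accepted, quarantined = [], []
    for order in orders:
        errors = validate_order(order)
        if errors:
            # Keep the record and the reasons so the issue can be fixed at source.
            quarantined.append({"record": order, "errors": errors})
        else:
            accepted.append(order)
    return accepted, quarantined


incoming = [
    {"order_id": "A1", "amount": 10.0, "currency": "USD"},
    {"order_id": "", "amount": -5, "currency": "XYZ"},
]
accepted, quarantined = load_step(incoming)
print(len(accepted), len(quarantined))
```

Keeping the rejected records together with their violation messages supports root cause analysis rather than silent dropping.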

4.5. Incremental Improvement

Start with a few critical metrics and data sets. Quick

wins build momentum for expanding systematically across

other data assets based on value, risk, etc.

4.6. Reusable Frameworks

Leverage consistent governance models across data

initiatives. Do not reinvent the wheel. Promote reuse of

metrics, policies, standards, and patterns.

4.7. Risk-Based Approach

Priorities for data quality must be driven by business

risk and impact analysis. Focus on high-value, sensitive, or

compliance-related data first.
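
A risk-based ordering of data sets could be approximated as in the sketch below. The 1 to 5 scales, the data set names, and the scoring formula are assumptions made for this example; the paper does not prescribe a particular scoring method.

```python
# Hypothetical data sets scored on 1-5 scales by business and compliance teams.
datasets = [
    {"name": "customer_master", "business_impact": 5, "sensitivity": 5, "issue_likelihood": 3},
    {"name": "web_clickstream", "business_impact": 2, "sensitivity": 1, "issue_likelihood": 4},
    {"name": "financial_ledger", "business_impact": 5, "sensitivity": 4, "issue_likelihood": 2},
]


def risk_score(ds):
    # Assumed scoring: the higher of impact and sensitivity,
    # weighted by how likely quality issues are.
    return max(ds["business_impact"], ds["sensitivity"]) * ds["issue_likelihood"]


def prioritise(items):
    """Order data sets from highest to lowest risk score."""
    return sorted(items, key=risk_score, reverse=True)


for ds in prioritise(datasets):
    print(ds["name"], risk_score(ds))
```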

4.8. Master Data Foundation

High-quality customer, product, and financial master

data are necessary for reliable downstream analytics. Bad

master data amplifies downstream issues.

4.9. Security and Privacy

The governance framework must consistently incorporate data security controls, access policies, and privacy protections.

These principles require change management, executive mandate, shared accountability, and robust processes. With persistent execution, data quality becomes an organizational capability.

5. Key Roles and Responsibilities

The critical roles involved are Chief Data Officer,

Data/Domain Owners, Data Stewards, Data Engineers, Data

Architects, Data Analysts, and Legal/Compliance.

5.1. Responsibilities

5.1.1. Chief Data Officer (CDO)

• Responsible for data strategy and governance at the

executive level. Establishes policies and standards and

focuses on data quality.

• Evangelizes the importance of data quality across the

organization.

• Sponsors data governance programs and drives

adoption top-down.

• Chairs data stewardship committees and councils that

define governance practices.

• Secures funding and investments for improvement

initiatives.

• Measures effectiveness of governance through quality

metrics and benefits tracking.

5.1.2. Data/Domain Owners

• Business teams who generate, consume and are

accountable for domain data.

• Define business data requirements and quality needs for

their domain or functions.

• Participate in stewardship committees to evaluate

proposals and issues.

• Implement data quality practices mandated by

governance within their team.

• Fix data quality issues at the source systems under their

control.

5.1.3. Data Stewards

• Cross-functional team that defines and oversees data

standards and quality.

• Document data definitions, standards, rules, metrics,

and SOPs for governance.

• Support data certification for critical data assets to

assure quality compliance.

• Help troubleshoot data quality issues, analyze root

causes, and guide remediation.

• Provide tools and training to data producers on quality

practices.

• Monitor quality metrics and track issue resolution.

5.1.4. Data Engineers

• Implement data quality checks, automation, and

monitoring per governance rules.

• Embed data quality capabilities within data

infrastructure like ETL pipelines and models.

• Support integration of new data quality tools like

profiling, cleansing, and matching.

• Monitor data quality metrics across the data lifecycle

and report issues.

• Remediate quality issues found in upstream systems

and pipelines.

5.1.5. Data Architects

• Develop overall data models, architecture principles,

and standards for consistency.

• Define the master data model for critical domains like

customer, product, finance, etc.

• Create data dictionaries, taxonomy, and metadata

standards for governance.

• Ensure architectural components follow recommended

data quality patterns and capabilities.

5.1.6. Data Analysts

• Analyze reports and dashboards to detect data quality

issues that affect insights.

• Report upstream data issues found in consumption

systems like BI tools.

• Validate reports and metrics for accuracy and

completeness.

• Provide inputs to improve quality based on analytical

needs and pain points.

5.1.7. Legal/Compliance

• Recommend policies based on regulatory and

compliance needs like GDPR and CCPA.

• Conduct audits to ensure governance practices meet

compliance requirements.

• Determine retention rules and access policies per

regulatory guidelines.

• Enforce security standards for sensitive data like PII.

• Validate quality practices to meet compliance

expectations around reporting accuracy.

5.2. Key Governance Processes Enabled by the Above

Stakeholders

• Data quality standards - Dimensions, metrics,

acceptance criteria, frequency

• Data policy and principles - Usage, integrity, retention,

security, access

• Data models - Master, transactional, analytics,

integration models

• Metadata standards - Define and maintain data

taxonomy, dictionaries, lineage

• Issue tracking - Document issues found, severity, status,

root cause, remediation

• Data quality tools - Select and implement automated

profiling, monitoring, and metadata tools

• Training and communication - Increase quality

awareness and skills across teams
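
As one possible illustration of how quality standards and issue tracking might connect, the sketch below compares measured metrics with acceptance criteria and opens an issue record for each breach. The `QualityStandard` and `Issue` structures, the thresholds, and the severity rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QualityStandard:
    dataset: str
    metric: str
    minimum: float  # acceptance criterion: measured value must be >= minimum


@dataclass
class Issue:
    dataset: str
    metric: str
    measured: float
    expected: float
    severity: str
    status: str = "open"


def evaluate(standards, measurements):
    """Compare measured metrics with their standards; open one issue per breach."""
    issues = []
    for s in standards:
        measured = measurements.get((s.dataset, s.metric))
        if measured is None or measured >= s.minimum:
            continue  # not measured yet, or within the acceptance criterion
        # Assumed severity rule: a shortfall above 0.10 counts as "high".
        severity = "high" if s.minimum - measured > 0.10 else "low"
        issues.append(Issue(s.dataset, s.metric, measured, s.minimum, severity))
    return issues


standards = [
    QualityStandard("orders", "completeness", 0.98),
    QualityStandard("orders", "uniqueness", 1.0),
]
measurements = {("orders", "completeness"): 0.80, ("orders", "uniqueness"): 1.0}

for issue in evaluate(standards, measurements):
    print(issue)
```

Root cause and remediation fields could be added to the issue record as stewards work through each case.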

The interplay between the roles, supported by standardized governance processes and executive sponsorship, can help ingrain data quality accountability across the data lifecycle.

Stewards and IT teams enable the above processes while analysts and engineers embed quality in upstream pipelines. Collaboration between roles is vital for end-to-end governance.

6. Challenges in Implementation

• Lack of executive support - For data governance to be

effective, it must be championed and funded by

executive leadership. They must set the vision, strategy,

and urgency for enterprise-wide governance. Without

their active sponsorship, governance councils and

working groups will lack authority and struggle to drive

adoption. Executive focus on data quality and backing

for improvement programs is also essential.

• Poor data culture - Data quality needs to be everyone's responsibility, not just IT's. Building a culture that values high-quality data, avoids "data dumping," and fixes issues at source requires behavioral change across teams. It is difficult to promote shared accountability and break siloed attitudes. Data producers need to take more ownership, while consumers should provide feedback.

• Distributed systems - With data and workloads

increasingly distributed across multi-cloud architectures

in siloed groups, maintaining consistent governance

standards becomes challenging. Data in legacy systems

also creates fragmentation. Governance processes break

down due to a lack of visibility and coordination across

systems.

• Inconsistent metrics - Measuring data quality through standardized dimensions and metrics provides necessary visibility. However, inconsistent definitions and calculations across teams undermine that clarity. The lack of automated monitoring using quality metrics also hampers systematic improvement.

• Technical complexity - With hundreds of upstream data

sources, ETL pipelines, databases, and systems,

implementing end-to-end governance with checks,

controls, and monitoring is complex. Diverse

technologies and integration points make it worse.

• Manual processes - Governance processes like

validating data against standards, visual inspection,

reconciliation, issue reporting, etc., remain

predominantly manual. This leads to limited coverage,

errors, delays, and rework.

• Legacy practices - Ingrained processes not designed for data quality continue unchecked. Siloed teams unwilling to adopt new governance practices and standards thwart progress.

• Skill gaps - Data engineers, stewards, and architects

need diverse skills in data modeling, metadata

management, and statistical quality techniques. Lack of

these skills impedes design and implementation.

Addressing these requires a strategic focus on culture,

skills, systems integration, metrics standardization, issue

resolution processes, and increased automation.

7. Emerging Trends and Technologies

Several emerging technologies enable organizations to

enhance data governance and quality in a scalable manner:

1. Machine learning and rules engines automate quality checks and issue remediation, replacing manual processes. They can also uncover complex data relationships and the root causes of issues.

2. Metadata management, data catalogs, and data lineage

tools provide visibility into data assets and their usage

across systems. This aids governance, reporting, and

issue resolution.

3. Data virtualization creates abstraction layers that

insulate consumers from underlying changes while

providing transformation capabilities. This facilitates

quality enhancement.

4. Data quality monitoring tools perform statistical

profiling, custom checks, and anomaly detection over

data. They generate metrics and dashboards to track

quality over time.

5. Natural language interfaces allow users to state quality

requirements and definitions in business terms rather

than technical queries. This simplifies governance

processes.

6. Cloud-scale data platforms provide built-in governance

capabilities for access control, lineage, provenance, and

compliance monitoring. Leveraging them accelerates

implementation.

7. Augmented data management via platforms that automatically detect issues, recommend fixes, and enable remediation helps continuously improve quality.

Adopting the above technologies helps enhance quality, reduce manual efforts, and improve adoption across the data lifecycle. However, more than technology is needed to address the cultural and organizational challenges outlined earlier. Holistic governance frameworks and change management are vital for success.
 
8. Conclusion 
Data has emerged as a strategic asset critical to the digital transformation of organizations. As data volumes grow across disparate systems, consistently governing and managing its quality provides tremendous value but poses challenges. This requires a systematic approach driven by executive leadership, shared business-IT accountability, and collaborative culture. Core principles include continuous monitoring, prevention versus correction, and incremental data quality improvement supported by standard processes.

Technologies like machine learning and cloud-scale data platforms are valuable enablers. However, cultural transformation and persistence are necessary to embed data governance and quality across the data value chain, from acquisition to consumption. With robust implementation, organizations can accelerate their data-driven ambitions and unlock more business value responsibly.

References
[1] Peter Ghavami, Big Data Management: Data Governance Principles for Big Data Analytics, Walter De Gruyter GmbH and Co KG, pp. 1-174, 2020.
[2] Miye Wang et al., “Big Data Health Care Platform with Multisource Heterogeneous Data Integration and Massive High-Dimensional Data Governance for Large Hospitals: Design, Development, and Application,” JMIR Medical Informatics, vol. 10, no. 4, pp. 1-15, 2022.
[3] Kristin Wende, “A Model for Data Governance-Organising Accountabilities for Data Quality Management,” Association for Information Systems Electronic Library, vol. 80, pp. 1-10, 2007.
[4] Kristin Wende, and Boris Otto, “A Contingency Approach to Data Governance,” International Conference on Information Quality, pp. 163-176, 2007.
[5] Wei Dai et al., “Data Profiling Technology of Data Governance Regarding Big Data: Review and Rethinking,” Information Technology: New Generations: 13th International Conference on Information Technology, pp. 439-450, 2016.
[6] Dominik Lis, and Boris Otto, “Data Governance in Data Ecosystems-Insights from Organizations,” Association for Information Systems Electronic Library, pp. 1-11, 2020.
[7] Tibor Koltay, “Data Governance, Data Literacy and the Management of Data Quality,” International Federation of Library Associations and Institutions Journal, vol. 42, no. 4, pp. 303-312, 2016.
[8] Soňa Karkošková, “Data Governance Model to Enhance Data Quality in Financial Institutions,” Information Systems Management, vol. 40, no. 1, pp. 90-110, 2023.
[9] Sung Une Lee, Liming Zhu, and Ross Jeffery, “A Contingency-Based Approach to Data Governance Design for Platform Ecosystems,” Association for Information Systems Electronic Library, pp. 1-15, 2018.
[10] Rene Abraham, Johannes Schneider, and Jan vom Brocke, “Data Governance: A Conceptual Framework, Structured Review, and Research Agenda,” International Journal of Information Management, vol. 49, pp. 424-438, 2019.
[11] Ibrahim Alhassan, David Sammon, and Mary Daly, “Data Governance Activities: An Analysis of the Literature,” Journal of Decision Systems, vol. 25, no. 1, pp. 64-75, 2016.
[12] John A. Pearce, and Shaker A. Zahra, “Board Composition from a Strategic Contingency Perspective,” Journal of Management Studies, vol. 29, no. 4, pp. 411-438, 1992.
[13] Chunxia Wang, and Jian Xie, “Constructing a Computer Model for Discipline Data Governance using the Contingency Theory and Data Mining,” 2021 4th International Conference on Information Systems and Computer Aided Education, pp. 1967-1970, 2021.
[14] Majid Al-Ruithe, Elhadj Benkhelifa, and Khawar Hameed, “A Systematic Literature Review of Data Governance and Cloud Data Governance,” Personal and Ubiquitous Computing, vol. 23, pp. 839-859, 2019.
[15] Marijn Janssen et al., “Data Governance: Organizing Data for Trustworthy Artificial Intelligence,” Government Information Quarterly, vol. 37, no. 3, 2020.
[16] Zeljko Panian, “Some Practical Experiences in Data Governance,” World Academy of Science, Engineering and Technology, vol. 62, no. 1, pp. 939-946, 2010.
[17] Boris Otto, “A Morphology of the Organisation of Data Governance,” Association for Information Systems Electronic Library, pp. 1-13, 2011.
[18] Marina Micheli et al., “Emerging Models of Data Governance in the Age of Datafication,” Big Data and Society, vol. 7, no. 2, pp. 1-15, 2020.
[19] Stephanie Russo Carroll, Desi Rodriguez-Lonebear, and Andrew Martinez, “Indigenous Data Governance: Strategies from United States Native Nations,” Data Science Journal, vol. 18, no. 31, pp. 1-15, 2019.
[20] Steve Sarsfield, The Data Governance Imperative: A Business Strategy for Corporate Data, IT Governance Publishing, pp. 1-161, 2009.
[21] Ibrahim Alhassan, David Sammon, and Mary Daly, “Data Governance Activities: A Comparison between Scientific and Practice-Oriented Literature,” Journal of Enterprise Information Management, vol. 31, no. 2, pp. 300-316, 2018.
[22] Krassimira Paskaleva et al., “Data Governance in the Sustainable Smart City,” Informatics, vol. 4, no. 4, pp. 1-19, 2017.
[23] A. Michael Huberman, and Matthew B. Miles, “Data Management and Analysis Methods,” Handbook of Qualitative Research, pp. 428-444, 1994.
Article ID: IJCET_15_02_007 Cite this Article: Sandeep Kumar and Manoj Kumar Vandanapu, Natural Language Generation and Artificial Intelligence in Financial Reporting: Transforming Financial Data into Strategic Insights for Executive Leadership

Research

Full-text available

Apr 2024

This article explores the innovative integration of Natural Language Generation (NLG) and Artificial Intelligence (AI) in financial reporting, spotlighting their transformative impact on turning financial data into strategic insights for executive leadership through automatic commentary generation. At the intersection of AI advancements and financial analysis, NLG emerges as a pivotal technology, enabling the automated transformation of complex financial datasets into narrative or commentary reports that are coherent, insightful, and accessible to decision-makers. The discussion extends to the mechanisms underlying NLG's functionality in financial reporting ranging from data interpretation and language processing to commentary generation and customization thereby enhancing report efficiency, accuracy, and accessibility. Through detailed exploration, this article articulates NLG's contributions to financial reporting commentary, such as improved efficiency, accuracy, scalability, and the personalized delivery of financial insights. It also addresses the challenges inherent in NLG application, including technical and ethical considerations, and the limitations of current technologies in capturing financial nuances. Looking forward, the article envisions a future where ongoing advancements in AI and machine learning further refine NLG's capabilities, offering even richer, more nuanced financial insights to support strategic decision-making at the highest levels of business leadership.

Unleashing Digital Transformation with SAP Rise

Article

Full-text available

Apr 2024

Sandeep Kumar

Organizations are traversing extraordinary encounters and breaks driven by technological evolutions in dynamic business ecosystem. To stay aggressive and irrepressible, businesses must embark on a holistic digital transformation journey. SAP RISE, a state-of-the-art offering from SAP, stands at the forefront of empowering organizations to realize their digital potential. This white paper explores SAP RISE as a catalyst for driving transformative change, uncovering its key features, benefits, and considerations for organizations seeking to leverage this solution.

Construction of a Big Data Platform in Healthcare with Multi-source, Heterogeneous Data Integration and Massive High-Dimensional Data Governance for Large Hospitals: Design, Development, and Application (Preprint)

Article

Full-text available

Jan 2022

Background: With the advent of data-intensive science, a full integration of big data science and health care will bring a cross-field revolution to the medical community in China. The concept big data represents not only a technology but also a resource and a method. Big data are regarded as an important strategic resource both at the national level and at the medical institutional level, thus great importance has been attached to the construction of a big data platform for health care. Objective: We aimed to develop and implement a big data platform for a large hospital, to overcome difficulties in integrating, calculating, storing, and governing multisource heterogeneous data in a standardized way, as well as to ensure health care data security. Methods: The project to build a big data platform at West China Hospital of Sichuan University was launched in 2017. The West China Hospital of Sichuan University big data platform has extracted, integrated, and governed data from different departments and sections of the hospital since January 2008. A master-slave mode was implemented to realize the real-time integration of multisource heterogeneous massive data, and an environment that separates heterogeneous characteristic data storage and calculation processes was built. A business-based metadata model was improved for data quality control, and a standardized health care data governance system and scientific closed-loop data security ecology were established. Results: After 3 years of design, development, and testing, the West China Hospital of Sichuan University big data platform was formally brought online in November 2020. It has formed a massive multidimensional data resource database, with more than 12.49 million patients, 75.67 million visits, and 8475 data variables. Along with hospital operations data, newly generated data are entered into the platform in real time. 
Since its launch, the platform has supported more than 20 major projects and provided data service, storage, and computing power support to many scientific teams, facilitating a shift in the data support model-from conventional manual extraction to self-service retrieval (which has reached 8561 retrievals per month). Conclusions: The platform can combine operation systems data from all departments and sections in a hospital to form a massive high-dimensional high-quality health care database that allows electronic medical records to be used effectively and taps into the value of data to fully support clinical services, scientific research, and operations management. The West China Hospital of Sichuan University big data platform can successfully generate multisource heterogeneous data storage and computing power. By effectively governing massive multidimensional data gathered from multiple sources, the West China Hospital of Sichuan University big data platform provides highly available data assets and thus has a high application value in the health care field. The West China Hospital of Sichuan University big data platform facilitates simpler and more efficient utilization of electronic medical record data for real-world research.

Data Governance in Data Ecosystems – Insights from Organizations

Conference Paper

Jul 2020

The emergence of data ecosystems with platform-based infrastructures continuously demonstrates the potential of value creation based on data. In this context, data governance has become an emerging topic in recent IS literature as organizations require a more sophisticated approach to the management of their data assets. However, the implications of organizations interacting within new forms of inter-organizational collaborations have not been sufficiently explored. We first elaborate on the topic of data governance to derive intra- and inter-organizational characteristics of data governance. Using a case study approach, we examine three firms developing data-driven business models that rely on ecosystems in their business-to-business (B2B) operations. We analyze new implications arising for each firm with regard to data governance and indicate that data governance requires a broader research perspective, one that takes the dynamics outside a single organization in ecosystems into consideration.

Data Governance: A conceptual framework, structured review, and research agenda

Article

Jul 2019
INT J INFORM MANAGE

Data governance refers to the exercise of authority and control over the management of data. The purpose of data governance is to increase the value of data and minimize data-related cost and risk. Despite data governance gaining in importance in recent years, a holistic view on data governance, which could guide both practitioners and researchers, is missing. In this review paper, we aim to close this gap and develop a conceptual framework for data governance, synthesize the literature, and provide a research agenda. We base our work on a structured literature review including 145 research papers and practitioner publications published during 2001-2019. We identify the major building blocks of data governance and decompose them along six dimensions. The paper supports future research on data governance by identifying five research areas and displaying a total of 15 research questions. Furthermore, the conceptual framework provides an overview of antecedents, scoping parameters, and governance mechanisms to assist practitioners in approaching data governance in a structured manner.

Data Governance in the Sustainable Smart City

Article

Nov 2017

The wisdom of ‘smart’ development increasingly shapes urban sustainability in Europe and beyond. Yet, the ‘smart city’ paradigm has been critiqued for favouring technological solutions and business interests over social inclusion and urban innovation. Despite the rhetoric of ‘citizen-centred approaches’ and ‘user-generated data’, the level of stakeholder engagement and public empowerment is still in question. It is unclear how smart city initiatives are developing common visions according to the principles of sustainable urban development. This paper examines how data governance in particular is framed in the new smart city agenda that is focused on sustainability. The challenges and opportunities of data governance in sustainability-driven smart city initiatives are articulated within a conceptual Framework on Sustainable Smart City Data Governance. Drawing on three cases from European countries and a stakeholder survey, the paper shows how governance of data can underpin urban smart and sustainable development solutions. The paper presents insights and lessons from this multi-case study, and discusses risks, challenges, and future research.

Board composition from a strategic contingency perspective

Article

Jan 1992

A Model for Data Governance - Organising Accountabilities for Data Quality Management

Conference Paper

Dec 2007

Kristin Weber

Enterprises need data quality management (DQM) that combines business-driven and technical perspectives to respond to strategic and operational challenges that demand high-quality corporate data. Hitherto, companies have assigned accountabilities for DQM mostly to IT departments. They have thereby ignored the organisational issues that are critical to the success of DQM. With data governance, however, companies implement corporate-wide accountabilities for DQM that encompass professionals from business and IT. This paper outlines a data governance model comprised of three components. Data quality roles, decision areas and responsibilities build a matrix, comparable to a RACI chart. The data governance model documents the data quality roles and their type of interaction with DQM activities. Companies can structure their company-specific data governance model based on these findings.

A Contingency Approach to Data Governance

Conference Paper

Nov 2007

Enterprises need data quality management (DQM) to respond to strategic and operational challenges demanding high-quality corporate data. Hitherto, companies have assigned accountabilities for DQM mostly to IT departments. They have thereby ignored the organizational issues that are critical to the success of DQM. With data governance, however, companies implement corporate-wide accountabilities for DQM that encompass professionals from business and IT. This paper outlines a data governance model comprised of three components that build a matrix comparable to a RACI chart: data quality roles, decision areas, and responsibilities. The data governance model documents the data quality roles and their type of interaction with DQM activities. In addition, the paper identifies contingency factors that impact the model configuration. Companies can structure their company-specific data governance model based on these findings.

Big Data Management: Data Governance Principles for Big Data Analytics

Book

Jan 2021

Peter Ghavami

Data governance: Organizing data for trustworthy Artificial Intelligence

Article

Jun 2020
GOV INFORM Q

The rise of Big, Open and Linked Data (BOLD) enables Big Data Algorithmic Systems (BDAS) which are often based on machine learning, neural networks and other forms of Artificial Intelligence (AI). As such systems are increasingly requested to make decisions that are consequential to individuals, communities and society at large, their failures cannot be tolerated, and they are subject to stringent regulatory and ethical requirements. However, they all rely on data which is not only big, open and linked but varied, dynamic and streamed at high speeds in real-time. Managing such data is challenging. To overcome such challenges and utilize opportunities for BDAS, organizations are increasingly developing advanced data governance capabilities. This paper reviews challenges and approaches to data governance for such systems, and proposes a framework for data governance for trustworthy BDAS. The framework promotes the stewardship of data, processes and algorithms, the controlled opening of data and algorithms to enable external scrutiny, trusted information sharing within and between organizations, risk-based governance, system-level controls, and data control through shared ownership and self-sovereign identities. The framework is based on 13 design principles and is proposed incrementally, for a single organization and multiple networked organizations.

Some practical experiences in data governance

Article

Feb 2010

Z. Panian

If managed correctly, data can become an organization's most valuable asset, helping it to remain competitive and agile, to proactively meet customer needs, and to keep costs in check. Companies and government organizations of all sizes are striving to manage data as an enterprise asset, to be shared and reused across multiple software applications and systems, business processes, and users throughout the organization. They're finding that they need to establish standards, policies, and processes for the usage, development, and management of data. They recognize that creating the right organizational structure and developing the technology infrastructure to support the governance of their data is also critical.
