Article
PDF Available

Statistical Data Quality Model for Data Migration Business Enterprise

Authors:
  • BMS Institute of Technology and Management

Abstract

In the current information landscape, sound decision making is essential for any organization or enterprise that wants to position itself in the business market. The data held in data warehouses or decision databases must therefore be highly accurate so that it supports proper decisions. When organizations or enterprises undergo a merger or takeover, data must be migrated from legacy systems to modern systems or decision databases, i.e., target systems. If a target or decision database is very large, assuring its quality is tedious, and the resources required to conduct full data verification are exorbitant. This research proposes a mathematical model based on deterministic statistical methods that reduces resource utilization while assuring high data quality. The proposed method is validated on various data sets and volumes against human effort, CPU time, defects raised and cost. It also gives end users the confidence to rely on the data quality for decision making.
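The abstract does not reproduce the model itself; purely as an illustration of the idea of verifying a migrated table from a statistical sample rather than a full comparison, a minimal sketch might look like the following. The sample-size formula (Cochran's formula with a finite-population correction), the confidence and error settings, and the dictionary-based row comparison are assumptions for the sketch, not the authors' exact model.

```python
# Minimal sketch of sample-based verification of a migrated table.
# Assumptions (not from the paper): source/target rows are dicts keyed by a
# primary key, 95% confidence, 2% margin of error, worst-case p = 0.5.
import math
import random

def sample_size(population: int, confidence_z: float = 1.96,
                margin_of_error: float = 0.02, p: float = 0.5) -> int:
    """Cochran's formula with finite-population correction."""
    n0 = (confidence_z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))

def verify_migration(source: dict, target: dict, seed: int = 42) -> float:
    """Compare a random sample of rows and return the observed defect rate."""
    keys = list(source)
    n = sample_size(len(keys))
    rng = random.Random(seed)
    sample = rng.sample(keys, n)
    defects = sum(1 for k in sample if target.get(k) != source[k])
    return defects / n

# Example: a 100,000-row table needs only ~2,345 sampled rows instead of a
# full comparison, which is where the effort and CPU-time savings come from.
```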
... Raman et al. [30] evaluated the distribution of students' motivation by adopting a programming competition. Data mining can be applied to such databases in order to obtain interesting outputs [1,9,11,23,24,31,34]. Data mining helps users extract useful information from large databases. ...
... Statistics has much the same general uses and results as data mining. Regression is used in statistics quite often: it creates models that predict behavior, and these models are built from large stores of historical data [23] (a minimal regression sketch follows these excerpts). Data mining effectively automates this statistical process, thereby relieving the user's burden. ...
... This results in a tool that is easier to use. In educational data mining we primarily investigate analytics for good insights [1,11,13,14,23,28,34]. Data mining is broadly classified into two parts: unsupervised and supervised learning. ...
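None of these excerpts shows code; the following is a minimal sketch of the regression-on-historical-data workflow alluded to above, using scikit-learn. The library choice and the feature/target names are assumptions for illustration only.

```python
# Minimal sketch of building a predictive regression model from historical
# data. The hours/score example is hypothetical, not data from the cited work.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=(200, 1))                # hypothetical study hours
score = 35 + 5.5 * hours[:, 0] + rng.normal(0, 4, 200)   # hypothetical exam scores

X_train, X_test, y_train, y_test = train_test_split(hours, score, random_state=0)
model = LinearRegression().fit(X_train, y_train)         # learn from "historical" data
print("R^2 on held-out data:", model.score(X_test, y_test))
```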
Chapter
This study focuses on students' internet use in their personal lives. Various aspects were examined with the help of data mining techniques in an attempt to uncover hidden patterns in students' internet behavior. The special focus is on testing significance across gender, location and financial income groups in order to discriminate behavioral patterns. An online survey was carried out and information on 217 students was gathered, with random sampling used for data collection. Unsupervised and supervised learning analyses were carried out with the SPSS 22.0 software package. The obtained results help in planning the future direction of appropriate internet use by students.
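The chapter reports its significance tests in SPSS; purely as an illustration of a comparable gender-wise test, a chi-square test of independence could be run in Python as sketched below. The contingency counts and usage categories are hypothetical, not the chapter's survey data.

```python
# Hypothetical contingency table: rows = gender, columns = daily internet use
# (<2h, 2-4h, >4h). Values are illustrative only.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [40, 55, 25],   # male respondents
    [38, 45, 22],   # female respondents
])
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.3f}")
# p < 0.05 would indicate that usage pattern depends on gender at the 5% level.
```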
... The token is generated and signed based on the JWT standard. The client then presents the cluster token to the HSSO service provider to request the access token [9]. ...
... We discuss a set of strategic and tactical responses to address these challenges. Our objective is to help individuals tasked with Hadoop security concentrate on threats to the cluster, as well as construct a control framework to support operational requirements [8]. ...
Chapter
Every minute, an enormous volume of data is generated on the internet, turning into big data, so a new paradigm of data storage and processing is essential. There is no question that Hadoop is a fundamentally disruptive technology: new innovations in scalability, performance, and data processing capability have been arriving every few months over the last few years, and this ecosystem is the very definition of innovation. Big data has transformed data analytics, providing scale, performance, and flexibility that were simply not possible a few years ago, at a cost that was equally unimaginable. But as Hadoop becomes the new standard of information technology, developers and security teams are playing catch-up to understand Hadoop security, even though established security theory and mechanisms exist. In this paper, we lay out a series of recommended security controls for Hadoop along with an access control framework that enforces access control policies dynamically based on the sensitivity of the data, covering systemic security, operational security and architecture for data security. A comparative study of recent advances in big data security is presented, and a number of prospective methods for big data security and privacy are discussed.
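The chapter does not include an implementation of its framework; purely to illustrate the idea of enforcing access dynamically based on data sensitivity, a minimal sketch might look like this. The sensitivity levels, role names and clearance table are assumptions for the sketch, not the chapter's framework.

```python
# Minimal sketch of dynamic, sensitivity-based access control (illustrative only).
from dataclasses import dataclass
from enum import IntEnum

class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Hypothetical role clearances; a real deployment would load these from policy.
ROLE_CLEARANCE = {
    "analyst": Sensitivity.INTERNAL,
    "data_engineer": Sensitivity.CONFIDENTIAL,
    "security_admin": Sensitivity.RESTRICTED,
}

@dataclass
class DataAsset:
    name: str
    sensitivity: Sensitivity

def is_access_allowed(role: str, asset: DataAsset) -> bool:
    """Allow access only when the role's clearance covers the asset's sensitivity."""
    clearance = ROLE_CLEARANCE.get(role, Sensitivity.PUBLIC)
    return clearance >= asset.sensitivity

print(is_access_allowed("analyst", DataAsset("salaries", Sensitivity.CONFIDENTIAL)))       # False
print(is_access_allowed("data_engineer", DataAsset("salaries", Sensitivity.CONFIDENTIAL))) # True
```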
... By communicating with customers, processing their conversations and simply understanding clients in their own words, organizations can better comprehend their customers' needs and improve their relationships with them. - Text Analytics: Many organizations use natural language processing to approach text problems and improve activities such as knowledge management and big data analytics [12]. Morphological, linguistic [13], syntactic and semantic analyses of language enable the identification and extraction of elements such as topics, locations, people, organizations and dates, and produce metadata that can be used to tag and classify content in the most accurate way. ...
Chapter
In the modern age of information explosion, millions of gigabytes of data are generated every day in the form of documents, web pages, e-mail, social media text, blogs, etc., so effective and efficient Natural Language Processing techniques are crucial for information retrieval systems, text summarization, sentiment analysis, information extraction, named entity recognition, relationship extraction, social media monitoring, text mining, language translation programs, and question answering systems. Natural Language Processing is a computational technique that applies different levels of linguistic analysis to convert natural language into a representation useful for further processing. NLP is recognized as a challenging task in computer science and artificial intelligence because understanding human natural language depends not only on the words but also on how those words are linked together to form a precise meaning. Although language is one of the easiest concepts for humans to learn, training computers to understand natural language is difficult due to the ambiguity of its syntax and semantics. Natural language processing techniques involve processing documents or text, which reduces storage space and index size, and understanding the given information so that it satisfies the user's need. NLP techniques improve the efficiency of information retrieval and the effectiveness of documentation processes. Common natural language processing techniques include tokenization, stop word removal, stemming, lemmatization, part-of-speech tagging, chunking and named entity recognition, which enhance the performance of NLP applications (a minimal pipeline is sketched below). The Natural Language Toolkit is a good starting point for learning the ropes of the NLP domain: NLTK is a collection of packages that supports researchers and learners in natural language processing, computational linguistics and artificial intelligence.
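The chapter names NLTK but gives no code; the following is a minimal sketch of the preprocessing steps it lists (tokenization, stop word removal, stemming, lemmatization, part-of-speech tagging and named entity chunking), assuming the standard NLTK corpora and models have been downloaded. The sample sentence is illustrative only.

```python
# Minimal NLTK preprocessing sketch covering the steps listed above.
# Requires nltk.download() for: punkt, stopwords, wordnet,
# averaged_perceptron_tagger, maxent_ne_chunker, words.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "Natural Language Processing helps computers understand human language."

tokens = nltk.word_tokenize(text)                              # tokenization
stop_set = set(stopwords.words("english"))
filtered = [t for t in tokens if t.lower() not in stop_set]    # stop word removal
stems = [PorterStemmer().stem(t) for t in filtered]            # stemming
lemmas = [WordNetLemmatizer().lemmatize(t) for t in filtered]  # lemmatization
pos_tags = nltk.pos_tag(tokens)                                # part-of-speech tagging
named_entities = nltk.ne_chunk(pos_tags)                       # named entity chunking

print(filtered, stems, lemmas, sep="\n")
```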
... Set controls at the field and job level for various recipients (one target, differential access). Randomization is another approach to anonymize or de-identify personally identifiable data [8], [9]. Figure 2 shows the security needs of big data set migrations. ...
Article
Full-text available
In today's predictive analytics world, data engineering plays a vital role: data acquisition is carried out from various source systems and the data is processed according to the business applications and domain. Big data platforms integrate, govern, and secure big data with repeatable, reliable, and maintainable processes. Through the volume, velocity, and variety of data characteristics, organizations try to reveal business value from big data. However, when that data is frequently incomplete, inconsistent, ungoverned, and unprotected, big data becomes a risk instead of an advantage. Moreover, with conventional approaches that are manual and unpredictable, big data projects take too long to realize business value; delivering business value from big data sustainably and repeatedly requires a new approach. In this connection, raw data has to be moved between onsite and offshore environments, and during this process data privacy is a major concern and challenge. A big data privacy platform can make it easier to detect, investigate, assess, and remediate threats from intruders. We carried out a study of big data privacy using data masking methods on various data loads and data types. This work will help data quality analysts and big data developers while building big data applications. © 2018 Institute of Advanced Engineering and Science. All rights reserved.
... Muralidhar K and Sarathy R, "Interval Responses for Queries on Confidential Attributes: A Security Evaluation," Journal of Information Privacy and Security, 9(1), 3-16, 2013. A white paper by the data masking specialist Camouflage, titled "A Proactive Approach to Data Security for Cloud-Based Testing and Development," May 2014, emphasizes that cloud-based application development offers organizations many tangible benefits, yet organizations struggle with how to work with data in the cloud, including big data, while complying with key regulations and meeting data security requirements. ...
Article
Full-text available
Due to the Internet of Things and social media platforms, raw data is generated by the systems around us in all directions with respect to time, volume and type. Social networking is growing rapidly to exploit business advertising as business demands. In this regard there are many challenges for data management service providers, and security is one among them. Data management service providers need to ensure security for their privileged customers while providing accurate and valid data. Since the underlying transactional data has varying characteristics such as huge volume, variety and complexity, it is essential to deploy such data sets on big data platforms that can handle structured, semi-structured and unstructured data. In this regard we propose a data masking technique for big data security. Data masking replaces the original dataset with a different dataset that is not real but looks realistic. The given data set is masked using the modulus operator and the concept of keys. Our experiments indicate that enhanced modulus-based data masking is better with respect to execution time and space utilization for larger data sets when compared to modulus-based data masking. This work will help big data developers and quality analysts in the business domains and gives end users confidence in data security. © 2017 Institute of Advanced Engineering and Science. All rights reserved.
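The abstract does not reproduce the masking algorithm; purely to illustrate the general idea of masking a numeric identifier with a modulus and a key, a minimal sketch might look like this. The key value, modulus and field choice are assumptions for the sketch, not the authors' exact scheme.

```python
# Illustrative sketch of masking numeric identifiers with a modulus and a key.
# Not the authors' exact algorithm: the masked value stays in a realistic range
# while no longer exposing the original identifier.
def mask_value(value: int, key: int = 7919, modulus: int = 10**9) -> int:
    """Deterministically map a numeric value into the same domain."""
    return (value * key + key // 2) % modulus

customer_ids = [100234, 100235, 987654321]
masked = [mask_value(v) for v in customer_ids]
print(list(zip(customer_ids, masked)))
```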
Chapter
In the contemporary information technology world, data engineering plays a vital role in creating scenario-based test data for development and testing purposes, which is done by cloning production data to generate realistic data for those purposes. Data movement between onsite and offshore sites for development and testing is inevitable. Data is acquired from different source systems and processed according to the business applications. Big data platforms integrate, govern and protect large volumes of information with repeatable, reliable and maintainable processes. When big data is frequently fragmented, inconsistent, ungoverned and unprotected, organizations run the risk of big data becoming a liability rather than an asset. Business ventures take too long to realize the value of data quality and data validity; to be sustainable, organizations require a new procedure to check data in real time for validity and quality during the development cycle. At present, data must be moved between onsite and offshore environments, and during this process data security is a significant concern. A big data security platform can make it simpler to recognize, evaluate and remediate threats from intruders. We investigated big data security using data masking techniques on different data loads and various types of data. The proposed work will support data quality specialists and data engineers in building big data applications with stronger security features.
Book
Full-text available
This book comprises the best deliberations on the theme "Machine Learning Technologies and Applications" from the "International Conference on Advances in Computer Engineering and Communication Systems (ICACECS 2020)," organized by the Department of Computer Science and Engineering, VNR Vignana Jyothi Institute of Engineering and Technology. The book provides insights into recent trends and developments in the field of computer science with a special focus on machine learning and big data. It covers advanced topics in artificial intelligence, machine learning, data mining and big data computing, cloud computing, the Internet of Things, distributed computing and smart systems.
Article
Full-text available
Saraph et al. systematically attempted to organize and synthesize the various perceptions offered by other authors on the critical factors of quality management. The authors provided a synthesis of the quality literature by identifying eight critical factors of quality management in a business unit and stated that the measures were both valid and reliable. In the light of this, the present study empirically tests their resulting instrument more extensively and from an international perspective. The instrument's operational measures of the developed factors are tested for reliability and validity using data collected from 424 general managers and quality managers in the United Arab Emirates. Results provide strong evidence that the measures are both valid and reliable. The empirical replication on a more broadly based sample provides further corroboration of Saraph et al.'s results. The study also examines the level of practice of quality management factors in the UAE and suggests that the instrument is best used jointly with other instruments that measure customer satisfaction.
Article
Full-text available
Over the past decades, data mining has proved to be a successful approach for extracting hidden knowledge from huge collections of structured digital data stored in databases. From its inception, data mining was applied primarily to numerical data sets. Nowadays, large multimedia data sets such as audio, speech, text, web, image, video and combinations of several types are becoming increasingly available; being largely unstructured or semi-structured by nature, they make it difficult for human beings to extract information without powerful tools. This drives the need to develop data mining techniques that can work on all kinds of data, such as documents, images, and signals. This paper surveys the current state of multimedia data mining and knowledge discovery, data mining efforts aimed at multimedia data, and current approaches and well-known techniques for mining multimedia data.
Article
Full-text available
The rapid growth of the Internet as an environment for information exchange and the lack of enforceable standards regarding the information it contains have led to numerous information quality problems. A major issue is the inability of search engine technology to wade through the vast expanse of questionable content and return "quality" results to a user's query. This paper attempts to address some of the issues involved in determining what quality is, as it pertains to information retrieval on the Internet. The IQIP model is presented as an approach to managing the choice and implementation of quality-related algorithms of an Internet crawling search engine.
Article
Many previous studies of data quality have focused on the realization and evaluation of both data value quality and data service quality. These studies revealed that poor data value quality and poor data service quality were caused by poor data structure. In this study we focus on metadata management, namely, data structure quality and introduce the data quality management maturity model as a preferred maturity model. We empirically show that data quality improves as data management matures.