Article
PDF Available

Statistical Data Quality Model for Data Migration Business Enterprise

Authors:
  • BMS Institute of Technology and Management

Abstract

In the current information landscape, sound decision making is essential for any organization or enterprise that wants to position itself in the business market. The data held in data warehouses or decision databases must therefore be highly accurate so that it supports proper decisions. When organizations or enterprises undergo a merger or takeover, data must be migrated from legacy systems to modern systems or decision databases, i.e., target systems. If a target or decision database is very large, assuring its quality is tedious, and the resources required to conduct full data verification are exorbitant. This research proposes a mathematical model based on deterministic statistical methods that reduces resource utilization while assuring high data quality. The proposed method is validated on various data sets and volumes against human effort, CPU time, defects raised and cost. It also gives end users the confidence to rely on the data quality for decision making.
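The abstract does not reproduce the model itself; purely as an illustration of the idea of verifying a migrated table from a statistical sample rather than a full comparison, a minimal sketch might look like the following. The sample-size formula (Cochran's formula with a finite-population correction), the confidence and error settings, and the dictionary-based row comparison are assumptions for the sketch, not the authors' exact model.

```python
# Minimal sketch of sample-based verification of a migrated table.
# Assumptions (not from the paper): source/target rows are dicts keyed by a
# primary key, 95% confidence, 2% margin of error, worst-case p = 0.5.
import math
import random

def sample_size(population: int, confidence_z: float = 1.96,
                margin_of_error: float = 0.02, p: float = 0.5) -> int:
    """Cochran's formula with finite-population correction."""
    n0 = (confidence_z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))

def verify_migration(source: dict, target: dict, seed: int = 42) -> float:
    """Compare a random sample of rows and return the observed defect rate."""
    keys = list(source)
    n = sample_size(len(keys))
    rng = random.Random(seed)
    sample = rng.sample(keys, n)
    defects = sum(1 for k in sample if target.get(k) != source[k])
    return defects / n

# Example: a 100,000-row table needs only ~2,345 sampled rows instead of a
# full comparison, which is where the effort and CPU-time savings come from.
```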
... Raman et al. [30] evaluated the distribution of students' motivation by adopting a programming competition. Data mining can be applied to such databases in order to obtain interesting outputs [1,9,11,23,24,31,34]. Data mining helps users extract useful information from large databases. ...
... Statistics has much the same general uses and results as data mining. Regression is used in statistics quite often: it creates models that predict behavior, and these models are built from large stores of historical data [23] (a minimal regression sketch follows these excerpts). Data mining effectively automates this statistical process, thereby relieving the user's burden. ...
... This results in a tool that is easier to use. In educational data mining we primarily investigate analytics for good insights [1,11,13,14,23,28,34]. Data mining is broadly classified into two parts: unsupervised and supervised learning. ...
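None of these excerpts shows code; the following is a minimal sketch of the regression-on-historical-data workflow alluded to above, using scikit-learn. The library choice and the feature/target names are assumptions for illustration only.

```python
# Minimal sketch of building a predictive regression model from historical
# data. The hours/score example is hypothetical, not data from the cited work.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=(200, 1))                # hypothetical study hours
score = 35 + 5.5 * hours[:, 0] + rng.normal(0, 4, 200)   # hypothetical exam scores

X_train, X_test, y_train, y_test = train_test_split(hours, score, random_state=0)
model = LinearRegression().fit(X_train, y_train)         # learn from "historical" data
print("R^2 on held-out data:", model.score(X_test, y_test))
```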
Chapter
This study focuses on students' internet use in their personal lives. Various aspects were examined with the help of data mining techniques in an attempt to uncover hidden patterns in students' internet behavior. The special focus is on testing significance across gender, location and financial income groups in order to discriminate behavioral patterns. An online survey was carried out and information on 217 students was gathered, with random sampling used for data collection. Unsupervised and supervised learning analyses were carried out with the SPSS 22.0 software package. The obtained results help in planning the future direction of appropriate internet use by students.
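The chapter reports its significance tests in SPSS; purely as an illustration of a comparable gender-wise test, a chi-square test of independence could be run in Python as sketched below. The contingency counts and usage categories are hypothetical, not the chapter's survey data.

```python
# Hypothetical contingency table: rows = gender, columns = daily internet use
# (<2h, 2-4h, >4h). Values are illustrative only.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [40, 55, 25],   # male respondents
    [38, 45, 22],   # female respondents
])
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.3f}")
# p < 0.05 would indicate that usage pattern depends on gender at the 5% level.
```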
... The token is generated and signed based on the JWT standard. The client then presents the cluster token to the HSSO service provider to request the access token [9]. ...
... We discuss a set of strategic and tactical responses to address these challenges. Our objective is to help individuals tasked with Hadoop security concentrate on threats to the cluster, as well as construct a control framework to support operational requirements [8]. ...
Chapter
Every minute, an enormous volume of data is generated on the internet, turning into big data, so a new paradigm of data storage and processing is essential. There is no question that Hadoop is a fundamentally disruptive technology: new innovations in scalability, performance, and data processing capability have been arriving every few months over the last few years, and this ecosystem is the very definition of innovation. Big data has transformed data analytics, providing scale, performance, and flexibility that were simply not possible a few years ago, at a cost that was equally unimaginable. But as Hadoop becomes the new standard of information technology, developers and security teams are playing catch-up to understand Hadoop security, even though established security theory and mechanisms exist. In this paper, we lay out a series of recommended security controls for Hadoop along with an access control framework that enforces access control policies dynamically based on the sensitivity of the data, covering systemic security, operational security and architecture for data security. A comparative study of recent advances in big data security is presented, and a number of prospective methods for big data security and privacy are discussed.
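The chapter does not include an implementation of its framework; purely to illustrate the idea of enforcing access dynamically based on data sensitivity, a minimal sketch might look like this. The sensitivity levels, role names and clearance table are assumptions for the sketch, not the chapter's framework.

```python
# Minimal sketch of dynamic, sensitivity-based access control (illustrative only).
from dataclasses import dataclass
from enum import IntEnum

class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Hypothetical role clearances; a real deployment would load these from policy.
ROLE_CLEARANCE = {
    "analyst": Sensitivity.INTERNAL,
    "data_engineer": Sensitivity.CONFIDENTIAL,
    "security_admin": Sensitivity.RESTRICTED,
}

@dataclass
class DataAsset:
    name: str
    sensitivity: Sensitivity

def is_access_allowed(role: str, asset: DataAsset) -> bool:
    """Allow access only when the role's clearance covers the asset's sensitivity."""
    clearance = ROLE_CLEARANCE.get(role, Sensitivity.PUBLIC)
    return clearance >= asset.sensitivity

print(is_access_allowed("analyst", DataAsset("salaries", Sensitivity.CONFIDENTIAL)))       # False
print(is_access_allowed("data_engineer", DataAsset("salaries", Sensitivity.CONFIDENTIAL))) # True
```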
... By communicating with customers, processing their conversations and simply understanding clients in their own words, organizations can better comprehend their customers' needs and improve their relationships with them. - Text Analytics: Many organizations use natural language processing to approach text problems and improve activities such as knowledge management and big data analytics [12]. Morphological, linguistic [13], syntactic and semantic analyses of language enable the identification and extraction of elements such as topics, locations, people, organizations and dates, and produce metadata that can be used to tag and classify content in the most accurate way. ...
Chapter
In the modern age of information explosion, millions of gigabytes of data are generated every day in the form of documents, web pages, e-mail, social media text, blogs, etc., so effective and efficient Natural Language Processing techniques are crucial for information retrieval systems, text summarization, sentiment analysis, information extraction, named entity recognition, relationship extraction, social media monitoring, text mining, language translation programs, and question answering systems. Natural Language Processing is a computational technique that applies different levels of linguistic analysis to convert natural language into a representation useful for further processing. NLP is recognized as a challenging task in computer science and artificial intelligence because understanding human natural language depends not only on the words but also on how those words are linked together to form a precise meaning. Although language is one of the easiest concepts for humans to learn, training computers to understand natural language is difficult due to the ambiguity of its syntax and semantics. Natural language processing techniques involve processing documents or text, which reduces storage space and index size, and understanding the given information so that it satisfies the user's need. NLP techniques improve the efficiency of information retrieval and the effectiveness of documentation processes. Common natural language processing techniques include tokenization, stop word removal, stemming, lemmatization, part-of-speech tagging, chunking and named entity recognition, which enhance the performance of NLP applications (a minimal pipeline is sketched below). The Natural Language Toolkit is a good starting point for learning the ropes of the NLP domain: NLTK is a collection of packages that supports researchers and learners in natural language processing, computational linguistics and artificial intelligence.
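The chapter names NLTK but gives no code; the following is a minimal sketch of the preprocessing steps it lists (tokenization, stop word removal, stemming, lemmatization, part-of-speech tagging and named entity chunking), assuming the standard NLTK corpora and models have been downloaded. The sample sentence is illustrative only.

```python
# Minimal NLTK preprocessing sketch covering the steps listed above.
# Requires nltk.download() for: punkt, stopwords, wordnet,
# averaged_perceptron_tagger, maxent_ne_chunker, words.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "Natural Language Processing helps computers understand human language."

tokens = nltk.word_tokenize(text)                              # tokenization
stop_set = set(stopwords.words("english"))
filtered = [t for t in tokens if t.lower() not in stop_set]    # stop word removal
stems = [PorterStemmer().stem(t) for t in filtered]            # stemming
lemmas = [WordNetLemmatizer().lemmatize(t) for t in filtered]  # lemmatization
pos_tags = nltk.pos_tag(tokens)                                # part-of-speech tagging
named_entities = nltk.ne_chunk(pos_tags)                       # named entity chunking

print(filtered, stems, lemmas, sep="\n")
```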
... Set controls at the field and job level for various recipients (one target, differential access). Randomization is another approach to anonymize or de-identify personally identifiable data [8], [9]. Figure 2 shows the security needs of big data set migrations. ...
Article
Full-text available
In today's predictive analytics world, data engineering plays a vital role: data acquisition is carried out from various source systems and the data is processed according to the business applications and domain. Big data platforms integrate, govern, and secure big data with repeatable, reliable, and maintainable processes. Through the volume, velocity, and variety of data characteristics, organizations try to reveal business value from big data. However, when that data is frequently incomplete, inconsistent, ungoverned, and unprotected, big data becomes a risk instead of an advantage. Moreover, with conventional approaches that are manual and unpredictable, big data projects take too long to realize business value; delivering business value from big data sustainably and repeatedly requires a new approach. In this connection, raw data has to be moved between onsite and offshore environments, and during this process data privacy is a major concern and challenge. A big data privacy platform can make it easier to detect, investigate, assess, and remediate threats from intruders. We carried out a study of big data privacy using data masking methods on various data loads and data types. This work will help data quality analysts and big data developers while building big data applications. © 2018 Institute of Advanced Engineering and Science. All rights reserved.
... Muralidhar K and Sarathy R, "Interval Responses for Queries on Confidential Attributes: A Security Evaluation," Journal of Information Privacy and Security, 9(1), 3-16, 2013. A white paper by the data masking specialist Camouflage, titled "A Proactive Approach to Data Security for Cloud-Based Testing and Development," May 2014, emphasizes that cloud-based application development offers organizations many tangible benefits, yet organizations struggle with how to work with data in the cloud, including big data, while complying with key regulations and meeting data security requirements. ...
Article
Full-text available
Due to the Internet of Things and social media platforms, raw data is generated by the systems around us in all directions with respect to time, volume and type. Social networking is growing rapidly to exploit business advertising as business demands. In this regard there are many challenges for data management service providers, and security is one among them. Data management service providers need to ensure security for their privileged customers while providing accurate and valid data. Since the underlying transactional data has varying characteristics such as huge volume, variety and complexity, it is essential to deploy such data sets on big data platforms that can handle structured, semi-structured and unstructured data. In this regard we propose a data masking technique for big data security. Data masking replaces the original dataset with a different dataset that is not real but looks realistic. The given data set is masked using the modulus operator and the concept of keys. Our experiments indicate that enhanced modulus-based data masking is better with respect to execution time and space utilization for larger data sets when compared to modulus-based data masking. This work will help big data developers and quality analysts in the business domains and gives end users confidence in data security. © 2017 Institute of Advanced Engineering and Science. All rights reserved.
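The abstract does not reproduce the masking algorithm; purely to illustrate the general idea of masking a numeric identifier with a modulus and a key, a minimal sketch might look like this. The key value, modulus and field choice are assumptions for the sketch, not the authors' exact scheme.

```python
# Illustrative sketch of masking numeric identifiers with a modulus and a key.
# Not the authors' exact algorithm: the masked value stays in a realistic range
# while no longer exposing the original identifier.
def mask_value(value: int, key: int = 7919, modulus: int = 10**9) -> int:
    """Deterministically map a numeric value into the same domain."""
    return (value * key + key // 2) % modulus

customer_ids = [100234, 100235, 987654321]
masked = [mask_value(v) for v in customer_ids]
print(list(zip(customer_ids, masked)))
```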
Chapter
In the contemporary information technology world, data engineering plays a vital role in creating scenario-based test data for development and testing purposes, which is done by cloning production data to generate realistic data for those purposes. Data movement between onsite and offshore sites for development and testing is inevitable. Data is acquired from different source systems and processed according to the business applications. Big data platforms integrate, govern and protect large volumes of information with repeatable, reliable and maintainable processes. When big data is frequently fragmented, inconsistent, ungoverned and unprotected, organizations run the risk of big data becoming a liability rather than an asset. Business ventures take too long to realize the value of data quality and data validity; to be sustainable, organizations require a new procedure to check data in real time for validity and quality during the development cycle. At present, data must be moved between onsite and offshore environments, and during this process data security is a significant concern. A big data security platform can make it simpler to recognize, evaluate and remediate threats from intruders. We investigated big data security using data masking techniques on different data loads and various types of data. The proposed work will support data quality specialists and data engineers in building big data applications with stronger security features.
Book
Full-text available
This book comprises the best deliberations on the theme "Machine Learning Technologies and Applications" from the "International Conference on Advances in Computer Engineering and Communication Systems (ICACECS 2020)," organized by the Department of Computer Science and Engineering, VNR Vignana Jyothi Institute of Engineering and Technology. The book provides insights into recent trends and developments in the field of computer science with a special focus on machine learning and big data. It covers advanced topics in artificial intelligence, machine learning, data mining and big data computing, cloud computing, the Internet of Things, distributed computing and smart systems.
Article
Full-text available
Saraph et al. systematically attempted to organize and synthesize the various perceptions offered by other authors on the critical factors of quality management. The authors provided a synthesis of the quality literature by identifying eight critical factors of quality management in a business unit and stated that the measures were both valid and reliable. In the light of this, the present study empirically tests their resulting instrument more extensively and from an international perspective. The instrument's operational measures of the developed factors are tested for reliability and validity using data collected from 424 general managers and quality managers in the United Arab Emirates. Results provide strong evidence that the measures are both valid and reliable. The empirical replication on a more broadly based sample provides further corroboration of Saraph et al.'s results. The study also examines the level of practice of quality management factors in the UAE and suggests that the instrument is best used jointly with other instruments that measure customer satisfaction.
Article
Full-text available
Over the past decades, data mining has proved to be a successful approach for extracting hidden knowledge from huge collections of structured digital data stored in databases. From its inception, data mining was applied primarily to numerical data sets. Nowadays, large multimedia data sets such as audio, speech, text, web, image, video and combinations of several types are becoming increasingly available; being largely unstructured or semi-structured by nature, they make it difficult for human beings to extract information without powerful tools. This drives the need to develop data mining techniques that can work on all kinds of data, such as documents, images, and signals. This paper surveys the current state of multimedia data mining and knowledge discovery, data mining efforts aimed at multimedia data, and current approaches and well-known techniques for mining multimedia data.
Article
Full-text available
The rapid growth of the Internet as an environment for information exchange and the lack of enforceable standards regarding the information it contains have led to numerous information quality problems. A major issue is the inability of search engine technology to wade through the vast expanse of questionable content and return "quality" results to a user's query. This paper attempts to address some of the issues involved in determining what quality is, as it pertains to information retrieval on the Internet. The IQIP model is presented as an approach to managing the choice and implementation of quality-related algorithms of an Internet crawling search engine.
Article
Many previous studies of data quality have focused on the realization and evaluation of both data value quality and data service quality. These studies revealed that poor data value quality and poor data service quality were caused by poor data structure. In this study we focus on metadata management, namely, data structure quality and introduce the data quality management maturity model as a preferred maturity model. We empirically show that data quality improves as data management matures.