Discussion
Started 18th Jul, 2019
Deleted profile

What are issues in data mining?

Data mining is not an easy task: the algorithms used can get very complex, and the data is not always available in one place, so it needs to be integrated from various heterogeneous data sources. These factors create several issues. The major issues concern −
  • Mining Methodology and User Interaction
  • Performance Issues
  • Diverse Data Types Issues
Mining methodology and user interaction issues include the following
  • Mining different kinds of knowledge in databases − Different users may be interested in different kinds of knowledge. Therefore data mining needs to cover a broad range of knowledge discovery tasks.
  • Interactive mining of knowledge at multiple levels of abstraction − The data mining process needs to be interactive so that users can focus the search for patterns, providing and refining data mining requests based on the returned results.
  • Incorporation of background knowledge − Background knowledge can be used to guide the discovery process and to express the discovered patterns, not only in concise terms but also at multiple levels of abstraction.
  • Data mining query languages and ad hoc data mining − A data mining query language that allows the user to describe ad hoc mining tasks should be integrated with a data warehouse query language and optimized for efficient and flexible data mining.
  • Presentation and visualization of data mining results − Once patterns are discovered, they need to be expressed in high-level languages and visual representations. These representations should be easily understandable.
  • Handling noisy or incomplete data − Data cleaning methods are required to handle noise and incomplete objects while mining data regularities. Without data cleaning, the accuracy of the discovered patterns will be poor.
  • Pattern evaluation − Many of the patterns discovered may be uninteresting, because they either represent common knowledge or lack novelty. Interestingness measures are therefore needed to evaluate them.
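To make the pattern evaluation point concrete, here is a minimal sketch of three widely used interestingness measures for an association rule (support, confidence and lift). The toy transactions and the function names `support` and `rule_metrics` are assumptions made for illustration; they are not taken from any particular library.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    c = frozenset(consequent)
    s_a = support(transactions, a)
    s_c = support(transactions, c)
    s_ac = support(transactions, a | c)
    confidence = s_ac / s_a if s_a else 0.0
    # Lift compares the rule with what independence would predict:
    # around 1 means the rule tells us little, above 1 a positive association.
    lift = confidence / s_c if s_c else 0.0
    return {"support": s_ac, "confidence": confidence, "lift": lift}


# Hypothetical toy data, one frozenset of items per transaction.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

metrics = rule_metrics(transactions, {"bread"}, {"milk"})
print(metrics)
```

On this toy data the rule bread -> milk has a lift below 1, so despite a fairly high confidence it would likely be filtered out as uninteresting.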
Performance-related issues include the following
  • Efficiency and scalability of data mining algorithms − In order to effectively extract information from huge amounts of data in databases, data mining algorithms must be efficient and scalable.
  • Parallel, distributed, and incremental mining algorithms − Factors such as the huge size of databases, the wide distribution of data, and the complexity of data mining methods motivate the development of parallel and distributed data mining algorithms. These algorithms divide the data into partitions, which are then processed in parallel, and the results from the partitions are merged. Incremental algorithms incorporate database updates without mining all the data again from scratch.
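The partition, process and merge idea can be sketched with simple co-occurrence counting. This is only an illustration under stated assumptions: the partitions, the helper names `count_pairs` and `merge_counts`, and the toy data are hypothetical, and real parallel or distributed miners are considerably more involved.

```python
from collections import Counter
from itertools import combinations


def count_pairs(partition):
    """Count how often each pair of items co-occurs within one partition."""
    counts = Counter()
    for transaction in partition:
        for pair in combinations(sorted(transaction), 2):
            counts[pair] += 1
    return counts


def merge_counts(partial_counts):
    """Merge per-partition counts into global counts."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


# Hypothetical data, already split into two partitions.
partitions = [
    [{"bread", "milk"}, {"bread", "butter"}],
    [{"bread", "milk", "butter"}, {"milk"}],
]

# Each call is independent, so it could run on a separate worker or machine.
partials = [count_pairs(p) for p in partitions]
merged = merge_counts(partials)

# Incremental step: only the new batch is scanned, not the old partitions.
new_batch = [{"bread", "milk"}]
updated = merged + count_pairs(new_batch)
print(updated.most_common(3))
```

Because counts are additive, merging the per-partition results gives the same totals as a single pass over all the data, and the incremental update only has to look at the new transactions.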
Diverse Data Types Issues
  • Handling of relational and complex types of data − The database may contain complex data objects, multimedia data objects, spatial data, temporal data, etc. It is not possible for one system to mine all these kinds of data.
  • Mining information from heterogeneous databases and global information systems − The data is available from different data sources on a LAN or WAN. These data sources may be structured, semi-structured or unstructured. Therefore mining knowledge from them adds challenges to data mining.
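As a rough illustration of why heterogeneous sources add work, the sketch below maps a structured source (CSV) and a semi-structured source (JSON) onto one common record layout before any mining happens. The sample data, the field names and the helpers `from_csv` and `from_json` are all assumptions made for this example.

```python
import csv
import io
import json

# Hypothetical structured source: fixed columns.
CSV_SOURCE = "id,name,city\n1,Ada,London\n2,Lin,Taipei\n"

# Hypothetical semi-structured source: nested and partly missing fields.
JSON_SOURCE = (
    '[{"id": 3, "name": "Sam", "address": {"city": "Accra"}},'
    ' {"id": 4, "name": "Ana"}]'
)


def from_csv(text):
    """Yield records in the common layout from CSV text."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["id"]), "name": row["name"], "city": row["city"] or None}


def from_json(text):
    """Yield records in the common layout from JSON text."""
    for obj in json.loads(text):
        city = obj.get("address", {}).get("city")  # may be absent
        yield {"id": obj["id"], "name": obj["name"], "city": city}


# One integrated list that a mining step could then consume.
records = list(from_csv(CSV_SOURCE)) + list(from_json(JSON_SOURCE))
for record in records:
    print(record)
```

Each new source type needs its own adapter, and missing values (the `None` city here) still have to be handled by the cleaning step before mining.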

Most recent answer

Bestoon Faraj
University of Sulaimani
Thank you Kamal M for your explanation
1 Recommendation

Popular replies (1)

Eduard Babulak Dear, my impression is that your question touches on various aspects of crime prevention, each of which can be a challenge for humans. Financial fraud, petty crime and organized crime are just examples of threats that compromise security. Here are some big data-based predictive analytics tools that are already in use around the globe:
1. Palantir Technologies
Preventing terrorist attacks. Preparing for major political and economic transformations. Anticipating emerging threats. Intelligence organizations have been charged with high-stakes missions. Unfortunately, predicting the future remains a hard problem. Intelligence officers need intuitive ways to extract insight from massive-scale data of disparate types. Palantir provides a range of big data analysis solutions that:
  • Enable low-friction interaction between domain experts and high-quality information, which is one of the best ways to stay ahead of adaptive threats and disruptive events
  • Enrich raw data, from signals intelligence to unstructured data, with analytic insight, so that others can benefit from the intelligence contributed by colleagues
  • Secure data in a highly granular way, so that users can collaborate safely while respecting privacy, civil liberties and data handling policies
2. IBM Intelligent Operations Center
The IBM Intelligent Operations Center synchronizes and analyzes information gathered from diverse data-collection systems. Patterns revealed through analytics help decision-makers anticipate and respond to problems and despatch responders to the scene faster. The results, such as predictive policing, can mean better citizen-centred services and crime prevention.
IBM Intelligent Operations Center enables:
  • Monitoring and management of resources, events and incidents through situational awareness
  • Optimization of city growth and operations through deep analysis of the city environment and resources
  • Unbroken connection with citizens, addressing their concerns through citizen collaboration tools and services
  • Better citizen safety through crime risk hot-spot analytics
  • Data-integration from various departments and agencies through a common platform
Former policeman Shaun Hipgrave, now a security consultant at IBM, says: "It's about using big data and analytics in a smarter way. You are just giving them access to information that they never used to have before."
3. Beware, Intrado
Used by the police force of Fresno, California, Beware can be accessed through any browser (fixed or mobile) on any Internet-enabled device, including tablets, smartphones, laptops and desktop computers. Beware:
  • Searches, sorts and scores billions of publicly available commercial records in a matter of seconds, alerting responders to potentially dangerous situations while en route to, or at the location of, a 9-1-1 request for assistance
  • Is a tool to help first responders understand the nature of the environment they may encounter during the window of a 9-1-1 event
  • Augments established protocols and procedures used by public safety personnel and presents data in a way that is typically unavailable to the first responder, helping them to be better prepared to render aid in response to an emergency situation
4. DAS (Microsoft & NYPD)
Built for & used by only the NYPD, Domain Awareness System (DAS) is a software kit for law enforcement agencies.
  • DAS pulls together data from CCTV cameras, radiation detectors and license-plate readers, and uses a detailed database to give instant tracking when things take a turn for the worse. It also takes some cues from the city's CompStat, using geographical patterns to help deploy officers effectively in areas statistically likely to suffer more crime
  • DAS is being marketed worldwide to countries that can use the intelligence to prevent crimes
5. COPLINK
IBM® i2® COPLINK® is police software that consolidates data from many sources, aids collaboration and helps generate tactical leads.
It enables law enforcement professionals to generate photo line-ups, save their search history and organize investigations to generate reports more easily.
i2 COPLINK police software helps law enforcement officers to:
  • Discover investigative case leads by organizing and providing tactical, strategic and command-level access to vast quantities of seemingly unrelated data.
  • Visualize and analyze data on maps through time-sequence playback
  • Centralize multiple data stores in one system and discover hidden value in existing information stores
  • Share data with other law enforcement organizations with security-rich features, including password protection and data encryption
  • Search where and when needed—at the desk, in the car or with a mobile device
4 Recommendations

All replies (10)

Eduard Babulak
National Science Foundation
What are the current practical solutions to use big data in crime prevention and human safety?
1 Recommendation
Olivier Serrat
Georgetown University
Mining methodology and user interaction, performance issues, diverse data type issues … With more and more organizations accessing computer power, data mining is going to increase, more often than not for the motive of profit: should the ethics of data mining, for instance where they impact privacy, not be a fourth major issue?
2 Recommendations
Olivier Serrat You are right, dear friend
I thank you for the guidance
1 Recommendation
Hayder Naser Khraibet Al-Behadili
Shat Al Arab University College
A major issue for data mining tasks is that the majority of data mining models are black-box approaches that lack transparency and hence do not foster trust and acceptance among end users.
1 Recommendation
Hayder Naser Khraibet Al-Behadili thank you for the guidance
2 Recommendations
Subrata Datta
Netaji Subhash Engineering College
Olivier Serrat: Thank you for your observation.
Jaydip Datta
Independent Academician
For Data Structure & Database Management System.
4 Recommendations
Kamal M Alsaad
University of Basrah
Data mining is a dynamic and fast-expanding field with great strengths. In this section, we briefly outline the major issues in data mining research, partitioning them into five groups: mining methodology, user interaction, efficiency and scalability, diversity of data types, and data mining and society.
Some of the data mining challenges are listed below:
  • Security and Social Challenges.
  • Noisy and Incomplete Data.
  • Distributed Data.
  • Complex Data.
  • Performance.
  • Scalability and Efficiency of the Algorithms.
  • Improvement of Mining Algorithms.
  • Incorporation of Background Knowledge.
