Performance comparison of various ordering criteria using 21 attributes.

Source publication

Towards Real Time Discovery from Distributed Information Sources.

Conference Paper

Apr 1998

Many successful knowledge discovery or data mining techniques and systems have been developed. These techniques usually apply to centralized databases with less restrictive requirements on learning and response time. Comparatively little effort has yet been put into mining distributed databases and addressing real-time issues. In this paper, we investigate issues of...

Distributed Data Mining Bibliography

Article

May 2004

Advances in computing and communication over wired and wireless networks have resulted in many pervasive distributed computing environments. Many of these environments deal with different distributed sources of voluminous data, multiple compute nodes, and distributed user communities. Analyzing and monitoring these distributed data sources requires a data mining technology designed for distributed applications. The field of distributed data mining (DDM) deals with this problem: mining distributed data by paying careful attention to the distributed resources. The goal of this paper is to maintain and distribute a bibliography of DDM-related publications. We hope that DDM researchers and practitioners find this service useful. We welcome any help from the community in maintaining the bibliography.

A New Distributed Data Mining Model Based on Similarity.

Conference Paper

Mar 2003

Distributed Data Mining (DDM) has been very active and has enjoyed growing attention since its inception. Current DDM techniques regard the distributed data sets as a single virtual table and assume there exists a global model which could be generated if the data were combined/centralized. This paper proposes a similarity-based distributed data mining (SBDDM) framework which explicitly takes the differences among distributed sources into consideration. A new similarity measure is introduced and its effectiveness is then evaluated and validated. This paper also illustrates the limitations of current DDM techniques through three concrete case studies. Finally, distributed clustering within the SBDDM framework is also discussed.

Distributed Multivariate Regression Using Wavelet-Based Collective Data Mining

Article

Mar 2001
J PARALLEL DISTR COM

This paper presents a method for distributed multivariate regression using wavelet-based collective data mining (CDM). The method seamlessly blends machine learning and the theory of communication with the statistical methods employed in parametric multivariate regression to provide an effective data mining technique for use in a distributed data and computation environment. The technique is applied to two benchmark data sets, producing results that are consistent with those obtained by applying standard parametric regression techniques to centralized data sets. Evaluation of the method in terms of model accuracy as a function of appropriateness of the selected wavelet function, relative number of nonlinear cross-terms, and sample size demonstrates that accurate parametric multivariate regression models can be generated from distributed, heterogeneous data sets with minimal data communication overhead compared to that required to aggregate a distributed data set. Application of this method to linear discriminant analysis, which is related to parametric multivariate regression, produced classification results on the Iris data set that are comparable to those obtained with centralized data analysis.

Daily prediction of Major Stock Indices from Textual WWW Data

Article

Jun 1998

We predict stock markets using information contained in articles published on the Web. Mostly textual articles appearing in the leading and the most influential financial newspapers are taken as input. From those articles the daily closing values of major stock market indices in Asia, Europe and America are predicted. Textual statements contain not only the effect (e.g., stocks down) but also the possible causes of the event (e.g., stocks down because of weakness in the dollar and consequently a weakening of the treasury bonds). Exploiting textual information therefore increases the quality of the input. The forecasts are available real-time via www.cs.ust.hk/~beat/Predict daily at 7:45 am Hong Kong time. Hence all predictions are available before the major Asian markets, Tokyo, Hong Kong and Singapore, start trading. Several techniques, such as rule-based methods, the k-NN algorithm and neural nets, have been employed to produce the forecast. Those techniques are compared with one another. A trading strategy based on the system’s forecast is suggested. This strategy is shown to potentially outperform stock fund managers. This suggests that it will be extremely difficult to further improve the system’s accuracy. Hence the performance is very close to what can be expected in the best case from a system or even from human beings.

Towards a knowledge discovery framework for yield management in the Hong Kong hotel industry

Article

Mar 2000
Int J Hospit Manag

Yield management is a technique that focuses management decision making on maximizing revenue or profit from the sale of hotel rooms. In our study, we develop a yield management technique for maximizing revenue using a probabilistic rule-based knowledge discovery framework. The paper starts the investigation with a theoretical framework that sets the profit maximization criteria for a hotel. The key role of yield management is to provide hotel managers with an effective means to achieve maximum revenue in a changing market environment by allocating the sale of accommodations to different customer categories through a set of decision rules.

An extensible service oriented distributed data mining framework

Conference Paper

Jan 2004

This paper discusses a new approach for developing a service-oriented infrastructure for distributed data mining applications. The proposed architecture hides the complexity of implementation details and enables users to perform data mining in a utility-like fashion. The service-oriented architecture provides an autonomic data mining framework where self-describing data mining services can be automatically discovered on the Internet. Moreover, this structure allows for the implementation of data mining algorithms for processing data on more than one site in a distributed manner. The performance of the proposed distributed data mining framework is compared to a standard data mining approach to demonstrate its effectiveness.

The Data Wave: Data Management and Mining

Conference Paper

Jan 2010

M-Tahar Kechadi

Nowadays, massive amounts of data that are often geographically distributed and owned by different organisations are being mined. As a consequence, a large amount of knowledge is being produced. This causes the problem of efficient knowledge management and mining. The main aim is to develop DM infrastructures to fully exploit the benefit of the knowledge contained in these very large data repositories. To this end, we introduced the "knowledge map" approach to represent easily and efficiently the knowledge mined in a large-scale platform such as a Grid. This also facilitates the integration and coordination of local mining processes along with existing knowledge to increase the accuracy of the final models. In this paper, we discuss its advantages and its design issues.
