Figure 4 - uploaded by Vincent Cho
Performance comparison of various ordering criteria using 21 attributes.  

Source publication
Conference Paper
Many successful knowledge discovery or data mining techniques and systems have been developed. These techniques usually apply to centralized databases with less restrictive requirements on learning and response time. So far, little effort has been put into mining distributed databases or into real-time issues. In this paper, we investigate issues of...

Citations

... [10] ...
Article
Advances in computing and communication over wired and wireless networks have resulted in many pervasive distributed computing environments. Many of these environments deal with different distributed sources of voluminous data, multiple compute nodes, and a distributed user community. Analyzing and monitoring these distributed data sources requires a data mining technology designed for distributed applications. The field of distributed data mining (DDM) deals with this problem: mining distributed data by paying careful attention to the distributed resources. The goal of this paper is to maintain and distribute a bibliography of DDM-related publications. We hope that DDM researchers and practitioners find this service useful. We welcome any help from the community in maintaining the bibliography.
... Yamanishi [21] developed a distributed cooperative Bayesian learning approach, in which different Bayesian agents estimate the parameters of the target distribution and a global learner combines the outputs of each local model. Cho and Wuthrich [6] described the fragmented approach to mining classifiers from distributed information sources, in which a global rule set is formed from the rules generated at each local site. Lam and Segre [13] suggested a technique to derive a Bayesian belief network from distributed data. ...
Conference Paper
Distributed Data Mining (DDM) has been very active and has enjoyed a growing amount of attention since its inception. Current DDM techniques regard the distributed data sets as a single virtual table and assume there exists a global model which could be generated if the data were combined/centralized. This paper proposes a similarity-based distributed data mining (SBDDM) framework which explicitly takes the differences among distributed sources into consideration. A new similarity measure is introduced and its effectiveness is then evaluated and validated. This paper also illustrates the limitations of current DDM techniques through three concrete case studies. Finally, distributed clustering within the SBDDM framework is also discussed.
... The fragmented approach to mining classifiers from distributed data sources is suggested by [10]. In this method, a single best rule is generated in each distributed data source. ...
Article
This paper presents a method for distributed multivariate regression using wavelet-based collective data mining (CDM). The method seamlessly blends machine learning and the theory of communication with the statistical methods employed in parametric multivariate regression to provide an effective data mining technique for use in a distributed data and computation environment. The technique is applied to two benchmark data sets, producing results that are consistent with those obtained by applying standard parametric regression techniques to centralized data sets. Evaluation of the method in terms of model accuracy as a function of appropriateness of the selected wavelet function, relative number of nonlinear cross-terms, and sample size demonstrates that accurate parametric multivariate regression models can be generated from distributed, heterogeneous data sets with minimal data communication overhead compared to that required to aggregate a distributed data set. Application of this method to linear discriminant analysis, which is related to parametric multivariate regression, produced classification results on the Iris data set that are comparable to those obtained with centralized data analysis.
... For example, the final decision is that the Nky moves up. Though maximum likelihood yields already good results for making this final decision, we found a slight improvement over maximum likelihood for this application (this is described in detail in Cho and Wüthrich (1998)). After the direction of the stock market movement is determined (up, down or steady), the closing value of the stock index is computed. ...
Article
We predict stock markets using information contained in articles published on the Web. Mostly textual articles appearing in the leading and the most influential financial newspapers are taken as input. From those articles the daily closing values of major stock market indices in Asia, Europe and America are predicted. Textual statements contain not only the effect (e.g., stocks down) but also the possible causes of the event (e.g., stocks down because of weakness in the dollar and consequently a weakening of the treasury bonds). Exploiting textual information therefore increases the quality of the input. The forecasts are available real-time via www.cs.ust.hk/~beat/Predict daily at 7:45 am Hong Kong time. Hence all predictions are available before the major Asian markets, Tokyo, Hong Kong and Singapore, start trading. Several techniques, such as rule-based, k-NN algorithm and neural net have been employed to produce the forecast. Those techniques are compared with one another. A trading strategy based on the system’s forecast is suggested. This strategy is shown to potentially outperform stock fund managers. This suggests that it will be extremely difficult to further improve the system’s accuracy. Hence the performance is very close to what can be expected in the best case from a system or even from human beings.
Article
Yield management is a technique that focuses management decision making on maximizing revenue or profit from the sale of hotel rooms. In our study, we develop a yield management technique for maximizing revenue using a probabilistic rule-based framework drawn from knowledge discovery techniques. The paper starts the investigation with a theoretical framework that sets the profit maximization criteria for a hotel. The key role of yield management is to provide hotel managers with an effective means to achieve maximum revenue in a changing market environment by allocating the sale of accommodations to different customer categories through a set of decision rules.
Conference Paper
This paper discusses a new approach for developing a service-oriented infrastructure for distributed data mining applications. The proposed architecture hides the complexity of implementation details and enables users to perform data mining in a utility-like fashion. The service-oriented architecture provides an autonomic data mining framework where self-describing data mining services can be automatically discovered on the Internet. Moreover, this structure allows for the implementation of data mining algorithms for processing data on more than one site in a distributed manner. The performance of the proposed distributed data mining framework is compared to a standard data mining approach to demonstrate its effectiveness.
Conference Paper
Nowadays, massive amounts of data that are often geographically distributed and owned by different organisations are being mined. As a consequence, a large amount of knowledge is being produced. This raises the problem of managing and mining that knowledge efficiently. The main aim is to develop DM infrastructures to fully exploit the benefit of the knowledge contained in these very large data repositories. To this end, we introduce a “knowledge map” approach to represent the knowledge mined in a large-scale platform such as the Grid easily and efficiently. This also facilitates the integration and coordination of local mining processes along with existing knowledge to increase the accuracy of the final models. In this paper, we discuss its advantages and its design issues.