Figure 4 - uploaded by Christian Kaiser
A simulated power consumption profile of a residential building for a typical weekday [24].

Source publication
Conference Paper
Local regression models are one of the backbones of spatial analytics. The computational scalability of such models can be addressed with a distributed implementation, which requires truly local modelling because communication between components is limited or absent at some stages. The use of such models in a streaming context imposes further restrictions. A...

Context in source publication

Context 1
... presented experimental study therefore investigates computational and exploratory abilities of SLR and cannot be used to draw conclusions on the real energy supply system of the region. A sample daily power consumption profile of a large residential building is shown in Figure 4. The data stream used for regression modelling was generated according to the following model, which is an exact weighted locally linear combination (5): ...

Similar publications

Article
Gastric cancer is the most prevalent cancer and the leading cause of cancer death in Colombia. Altitude in the Andean region and acquisition of the bacterium Helicobacter (H.) pylori in childhood have been identified as possible associated risk factors. The aim was to explain the behavior of this pathology in the department of Boyacá, Colombia fo...

Citations

... However, these studies do not deal efficiently with frequently updated datasets. In the literature, there is a limited number of GWR studies dealing with frequently updated datasets, and they require distributed and/or parallel computing resources [13,14]. ...
... Tran et al. [49] developed distributed GWR to handle large datasets by leveraging the Spark framework. Pozdnoukhov and Kaiser [13] proposed a spatially distributed incremental local regression model to handle streaming data. The proposed approach used the MapReduce distributed computing framework for storage management and for handling intensive computations. ...
Article
Geographically weighted regression (GWR) is a local spatial regression technique for modelling varying relationships in many application domains, such as ecology, environmental management, public health, meteorology, and tourism. In the literature, most GWR studies do not take into account whether the dataset is frequently updated, so these techniques handle such datasets inefficiently. In this study, a computationally efficient GWR approach, RNN-GWR, which uses a reverse nearest neighbor (RNN) strategy, is proposed to handle frequently updated data at given locations. The performance of the proposed RNN-GWR approach is compared with that of the Naïve-GWR and FastGWR approaches. Experimental evaluations show that the proposed approach is more computationally efficient than the other approaches in handling frequently updated data.
... The modesty of this improvement despite utilizing 100 nodes on the grid is probably the result of the algorithm not parallelizing the calculation of the model diagnostics, which can require significant computational costs. Pozdnoukhov and Kaiser (2011) implemented a scalable local regression algorithm using a MapReduce parallelization framework to analyze streaming geo-referenced data. The algorithm is able to handle a large amount of streaming data for a relatively small number of locations, but it is severely limited in handling a large number of locations. ...
Article
Geographically Weighted Regression (GWR) is a widely used tool for exploring spatial heterogeneity of processes over geographic space. GWR computes location-specific parameter estimates, which makes its calibration process computationally intensive. The maximum number of data points that can be handled by current open-source GWR software is approximately 15,000 observations on a standard desktop. In the era of big data, this places a severe limitation on the use of GWR. To overcome this limitation, we propose a highly scalable, open-source FastGWR implementation based on Python and the Message Passing Interface (MPI) that scales to the order of millions of observations. FastGWR optimizes memory usage along with parallelization to boost performance significantly. To illustrate the performance of FastGWR, a hedonic house price model is calibrated on approximately 1.3 million single-family residential properties from a Zillow dataset for the city of Los Angeles, which is the first effort to apply GWR to a dataset of this size. The results show that FastGWR scales linearly as the number of cores within the High-Performance Computing (HPC) environment increases. It also outperforms currently available open-source GWR software packages on a standard desktop, with drastic reductions in run time (up to thousands of times faster).
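To make the cost of "location-specific parameter estimates" concrete, here is a minimal pure-Python sketch of GWR calibration: one weighted least-squares fit per location. It is not the FastGWR implementation. The function names (`gwr_fit_at`, `gwr_calibrate`), the Gaussian kernel, the fixed bandwidth, and the single-predictor model are all assumptions made for illustration.

```python
import math

def gwr_fit_at(target, coords, x, y, bandwidth):
    """Fit y ~ b0 + b1*x at one location by Gaussian-weighted least squares.

    Hypothetical helper for illustration; kernel and bandwidth are assumptions.
    """
    # Gaussian kernel: observations far from the target get small weights.
    w = [math.exp(-0.5 * (math.dist(target, c) / bandwidth) ** 2) for c in coords]
    # Weighted sums that make up the 2x2 normal equations.
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx ** 2
    if abs(det) < 1e-12:
        raise ValueError("locally singular design; try a larger bandwidth")
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0, b1

def gwr_calibrate(coords, x, y, bandwidth):
    """Run one local fit per observation location.

    Each fit touches all n observations, so this naive loop costs O(n^2).
    """
    return [gwr_fit_at(c, coords, x, y, bandwidth) for c in coords]
```

Because every local fit scans the full dataset, the naive loop grows quadratically with the number of observations, which is the kind of cost that parallel and distributed implementations aim to reduce.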
... While data sets have typically been processed in batch, there is now increasing interest in real-time systems that require so-called online stream-processing (Michalak et al., 2012). Other studies outside of semiparametric regression that developed online methods for horizontally partitioned data are Guestrin et al. (2004), Bhaduri and Kargupta (2008), Pozdnoukhov and Kaiser (2011) and Yan et al. (2013). Interestingly, the first three studies included mechanisms for concept drift in the algorithms for distributed regression. ...
Article
This paper proposes a method for semiparametric regression analysis of large-scale data that are distributed over multiple hosts. The method enables modeling of nonlinear relationships and addresses both the batch approach, where analysis starts after all data have been collected, and the real-time setting. The methodology is extended to operate in evolving environments, where it can no longer be assumed that model parameters remain constant over time. Two areas of application for the methodology are presented: regression modeling when there are multiple data owners and regression modeling within the MapReduce framework. A website, realtime-semiparametric-regression.net, illustrates the use of the proposed method on United States domestic airline data in real-time.
Article
Spatial analyses have become important today and are used in many different application areas. Geographically Weighted Regression (GWR), one of the widely used location-based analysis methods, is a local spatial regression technique for modelling relationships that vary over geography. Geographically and Temporally Weighted Regression (GTWR) is an approach developed by extending GWR to take temporal relationships in the data into account. Although GTWR produces better models than GWR when the dataset exhibits spatio-temporal heterogeneity, the complexity of spatio-temporal models increases the time complexity of the algorithm. For this reason, the GTWR models run in the literature have been able to operate only on a limited amount of data. In this study, FastGTWR, a fast GTWR approach, is proposed to speed up the GTWR algorithm and thereby overcome the data size limitation. The performance of the proposed FastGTWR approach is compared with that of the classical GWR and GTWR approaches using real data. Experimental results show that the proposed FastGTWR approach runs faster than the GWR and GTWR approaches.
Conference Paper
Stochastic analysis and prediction is an important component of space-time data processing for a broad spectrum of Geographic Information Systems scientists and end users. For this task, a variety of numerical tools are available that are based on established statistical techniques. We present an original software tool that implements stochastic data analysis and prediction based on the Bayesian Maximum Entropy methodology, which has attractive advanced analytical features and has been known to address shortcomings of common mainstream techniques. The proposed tool contains a library of Bayesian Maximum Entropy analytical functions, and is available in the form of a plugin for the Quantum GIS open source Geographic Information System software.