Figure 4 - uploaded by Alexander S. Szalay
Cartesian coordinates allow quick tests for point-in-polygon and point-near-point. Each lat/lon point has a corresponding (x,y,z) unit vector.
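The conversion the caption describes can be sketched in a few lines. This is a minimal Python illustration of the standard lat/lon-to-unit-vector mapping; the function names are ours, not the library's:

```python
import math

def unit_vector(ra_deg, dec_deg):
    """Map a (longitude/RA, latitude/Dec) pair in degrees to an
    (x, y, z) point on the unit sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def angular_distance_deg(p, q):
    """Angle between two unit vectors, in degrees; this is the
    great-circle distance between the corresponding sky positions."""
    dot = sum(a * b for a, b in zip(p, q))
    dot = max(-1.0, min(1.0, dot))  # clamp rounding noise before acos
    return math.degrees(math.acos(dot))
```

Because the angle enters only through its cosine, a point-near-point test reduces to one dot product per pair, which is what makes the Cartesian representation fast.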


Source publication
Article
Full-text available
This article explains how to add spatial search functions (point-near-point and point in polygon) to Microsoft SQL Server 2005 using C# and table-valued functions. It is possible to use this library to add spatial search to your application without writing any special code. The library implements the public-domain C# Hierarchical Triangular Mesh (H...
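Both tests the abstract names can be phrased as dot-product comparisons on unit vectors. The following sketch is ours, not the library's C# API: point-near-point becomes a single cosine threshold, and point-in-polygon (for a convex spherical polygon) becomes a half-space test against each edge's great-circle plane:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def point_near_point(p, q, radius_deg):
    """True if unit vectors p and q are within radius_deg of each other:
    the angle is small iff the dot product is large."""
    return dot(p, q) >= math.cos(math.radians(radius_deg))

def point_in_convex_spherical_polygon(p, vertices):
    """True if unit vector p lies inside the convex spherical polygon with
    the given unit-vector vertices (counter-clockwise seen from outside).
    Each edge defines a great-circle plane; p must lie on the inner side
    of every one."""
    n = len(vertices)
    return all(dot(cross(vertices[i], vertices[(i + 1) % n]), p) >= 0
               for i in range(n))
```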

Similar publications

Article
Full-text available
Data mining, a process for assembling and analysing data into useful information, can be applied as a rapid measure for malaria diagnosis. In this research work we implemented a knowledge-based inference engine that will help in mining sample patient records to discover interesting relationships in malaria-related cases. The computer programming langua...
Article
Full-text available
An online data submission and retrieval system for coordinated research trials in food legumes has been developed with the aim of reducing the time and cost of collection, compilation, data analysis, retrieval and report generation. In the first instance, efforts were made to use plant breeding data, as it covers more than 60% of total trials. The system...
Article
Full-text available
The increasing need for information dissemination and the tremendous population growth in today's organizations call for migrating most applications and their associated data to the network, where several million people can access them concurrently. Several challenges, such as security risk and management/maintenance of the database, have bee...

Citations

... For example, VizieR and Topcat could only process small datasets because they adopted in-memory algorithms. The US National Virtual Observatory (NVO) proposed a sophisticated efficiency-optimizing approach for this problem, named "zoned" [2][3][4], which was carried out entirely in SQL commands to avoid expensive procedures or table-valued functions. However, the approach is hard to grasp, which has to some extent limited its adoption by other astronomical organizations. ...
Article
Astronomical cross-matching is a basic method for aggregating the observational data of different wavelengths. Through data aggregation, the properties of astronomical objects can be understood comprehensively. Aiming at decreasing the time spent on I/O operations, several improved methods are introduced: a processing flow based on the boundary growing model, which can reduce the database query operations; the concept of the biggest growing block and its determination, which can improve the performance of task partition and resolve the data-sparse problem; and a fast bitwise algorithm to compute the index numbers of the neighboring blocks, which is a significant efficiency guarantee. Experiments show that the methods can effectively speed up cross-matching on both sparse datasets and high-density datasets. Keywords: astronomical cross-matching, boundary growing model, HEALPix, task partition, data-sparse problem
... Goodchild [2,3] and Song et al. [4] created the Discrete Global Grid, with precisely equal areas. Szalay et al. [1] and Gray [5] used the HTM as a spatial index based on spherical partitioning, mapping it onto the B-Tree index in SQL Server. In this paper we followed their HTM theory but took a different approach to implementing the system architecture. ...
... IV. SYSTEM IMPLEMENTATION Gray [5] has implemented a database system that supports spatial queries based on SQL Server 2005. Unlike them, in this paper we have re-implemented the HTM library in Java and deployed it on a distributed system. ...
... The globe's orientation in the three-dimensional Cartesian coordinate system [5]. ...
Conference Paper
Full-text available
Spatial indexing is one of the most important techniques in the field of spatial data management. Many kinds of spatial indexing techniques have been successfully developed, and each has advantages for particular applications. As a type of spatial data structure, the Hierarchical Triangular Mesh (HTM) has excellent properties of global continuity, stability, hierarchy and uniformity, which have attracted the interest of researchers for many years. This paper investigates a method that uses the HTM as an index for global geographical data (currently only point-like objects). The HTM is defined by recursively subdividing a unit sphere; its basic elements are spherical triangles that are encoded as integers, called HTM codes, in the computer system. At the global scale, all regions on the sphere are spherical and can be intersected with HTM elements according to certain equations. The spatial position of each input object can also be represented by an HTM code. HTM codes thus become the bridge between query regions and input objects. Our system is based on the combination of a database management system (DBMS) and a distributed file system. The major information of the input files is extracted as metadata stored in DBMS tables, while the original files are stored on the distributed file system (HDFS), which has the potential to support parallel processing. Millions of point-like objects across the globe were examined, and the experiments indicated that the system's performance was acceptable.
... We have used all three methods extensively since that article was written [2], [4], [5], [6], [7]. The Zone Algorithm is particularly well suited to point-near-point queries with a search radius known in advance. ...
... Pushing the logic entirely into SQL allows the query optimizer to do a very efficient job of filtering the objects. In particular, the Zone design gives point-near-point performance comparable to that of the C# HTM sample code described in [5]. Both execute the following statement on the sample USGS Place table at a rate of about 600 lookups per second: ...
Article
Full-text available
Zones index an N-dimensional Euclidean or metric space to efficiently support points-near-a-point queries either within a dataset or between two datasets. The approach uses relational algebra and the B-Tree mechanism found in almost all relational database systems. Hence, the Zones Algorithm gives a portable-relational implementation of points-near-point, spatial cross-match, and self-match queries. This article corrects some mistakes in an earlier article we wrote on the Zones Algorithm and describes some algorithmic improvements. The Appendix includes an implementation of point-near-point, self-match, and cross-match using the USGS city and stream gauge database.
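The core of the Zones idea fits in a few lines: map each declination to an integer zone, bucket one catalog by zone, and for each probe point scan only the zones that could contain a match before applying the exact distance test. Below is an in-memory Python sketch of that idea (the relational version does the same zone arithmetic as a join on zoneID backed by a B-Tree, plus an RA range pre-filter we omit for brevity; ZONE_HEIGHT is an assumed tuning parameter):

```python
import math
from collections import defaultdict

ZONE_HEIGHT = 1.0  # zone height in degrees of declination (a tuning knob)

def zone_id(dec_deg):
    """Integer zone for a declination in [-90, 90]."""
    return int(math.floor((dec_deg + 90.0) / ZONE_HEIGHT))

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation via the dot product of unit vectors."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    dot = (math.cos(d1) * math.cos(d2) * math.cos(r1 - r2) +
           math.sin(d1) * math.sin(d2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def cross_match(cat1, cat2, radius_deg):
    """All (p1, p2) pairs, p1 in cat1 and p2 in cat2, within radius_deg."""
    zones = defaultdict(list)
    for ra, dec in cat2:
        zones[zone_id(dec)].append((ra, dec))
    span = int(math.ceil(radius_deg / ZONE_HEIGHT))  # zones per side
    matches = []
    for ra, dec in cat1:
        z = zone_id(dec)
        for dz in range(-span, span + 1):
            for ra2, dec2 in zones.get(z + dz, ()):
                if angular_sep_deg(ra, dec, ra2, dec2) <= radius_deg:
                    matches.append(((ra, dec), (ra2, dec2)))
    return matches
```

The zone filter prunes almost all candidates before the (comparatively expensive) trigonometric distance test runs, which is why the algorithm maps so well onto a B-Tree range scan.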
... SQL Server 2005 lacks adequate support for spatial data, as testified by the article of Fekete, Szalay, and Gray (2005), which explains how to add spatial search functions to Microsoft SQL Server 2005 using C# and a set of scalar-valued and table-valued functions. ...
Article
The relevance of merging spatial data with standard relational data is widely recognized, since adding geometry to databases enhances the stored information and expands the possibilities of business intelligence as well. This paper reports on the current state of the technological transfer of research results on "spatial SQL" into relational database management systems, and sets out a list of the next first-class features that will hopefully be implemented to further enhance business intelligence.
Chapter
Working with Jim Gray, we set out more than 20 years ago to design and build the archive for the Sloan Digital Sky Survey (SDSS), the SkyServer. The SDSS project collected a huge data set over a large fraction of the Northern Sky and turned it into an open resource for the world’s astronomy community. Over the years the project has changed astronomy. Now the project is faced with the problem of how to ensure that the data will be preserved and kept alive for active use for another 15 to 20 years. At the time there were very few examples to learn from and we had to invent much of the system ourselves. The paper discusses the lessons learned, future directions and recalls some memorable moments of our collaboration.
Article
Twenty years ago, work commenced on the Sloan Digital Sky Survey. The project aimed to collect a statistically complete dataset over a large fraction of the sky and turn it into an open data resource for the world’s astronomy community. There were few examples to learn from, and those of us who worked on it had to invent much of the system ourselves. The project has made fundamental changes to astronomy, and we are now faced with the problem of ensuring that the data will be preserved and kept in active use for another 20 years. In redesigning this very large, open archive of data, we made a system that is able to serve a much broader set of communities. In this article, I discuss what we have learned by rebuilding a massive dataset that is available to an increasingly sophisticated set of users, and how we have been challenged and motivated to incorporate more of the patterns of data analytics required by contemporary science.
Conference Paper
To enable historical analyses of logged data streams by SQL queries, the Stream Log Analysis System (SLAS) bulk loads data streams derived from sensor readings into a relational database system. SQL queries over such log data often involve numerical conditions containing inequalities, e.g. to find suspected deviations from normal behavior based on some function over measured sensor values. However, such queries are often slow to execute because the query optimizer is unable to utilize ordered indexed attributes inside numerical conditions. To speed up the queries, they need to be reformulated to utilize available indexes. In SLAS, the query transformation algorithm AQIT (Algebraic Query Inequality Transformation) automatically transforms SQL queries involving a class of algebraic inequalities into more scalable SQL queries utilizing ordered indexes. The experimental results show that the queries execute substantially faster on a commercial DBMS when AQIT has been applied to preprocess them.
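The kind of transformation AQIT automates can be illustrated with the simplest cases: an inequality wrapping an indexed column in a function is rewritten into an equivalent range predicate the B-Tree can serve directly. The toy rewriters below generate SQL text; the predicate shapes and names are ours, and AQIT itself handles a far broader class of algebraic inequalities:

```python
def rewrite_abs_lt(col, center, delta):
    """Rewrite ABS(col - center) < delta into an index-friendly range.
    Valid because |x - c| < d  <=>  c - d < x < c + d."""
    return f"{col} > {center - delta} AND {col} < {center + delta}"

def rewrite_sqrt_gt(col, bound):
    """Rewrite SQRT(col) > bound (for col >= 0, bound >= 0) into
    col > bound^2; valid because squaring is monotone on non-negatives."""
    return f"{col} > {bound * bound}"
```

After the rewrite, the optimizer sees a plain range condition on the bare column and can drive it from an ordered index instead of scanning and evaluating the function row by row.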
Article
High Performance Computing is becoming an instrument in its own right. The largest simulations performed on our supercomputers are now approaching petabytes. As the volume of these simulations grows, it is becoming harder to access, analyze and visualize the data. At the same time, for broad community buy-in, we need to provide public access to some of the simulation results. This is becoming another Big Data challenge, where we have to move the analyses and visualizations right to where the data is. The paper discusses the challenges in creating such interactive numerical laboratories.
Conference Paper
Multi-wavelength data cross-matching among multiple catalogs is a basic and unavoidable step in making distributed digital archives accessible and interoperable. As current catalogs often contain millions or billions of objects, this is a typical data-intensive computation problem. In this paper, a highly efficient parallel approach to astronomical cross-matching is introduced. We present our partitioning and parallelization approach, and then address some problems introduced by task partitioning and give the corresponding solutions, including the sky-splitting function we selected, HEALPix, which plays a key role in both task partitioning and database indexing, and a quick bit-operation algorithm we developed to resolve the block-edge problem. Our experiments prove that the approach has a marked performance advantage over previous functions and is fully applicable to large-scale cross-matching.
Article
Our collaboration with Jim Gray has created some of the world's largest astronomy databases, and has enabled us to test many avant-garde ideas in practice. The astronomers have been very receptive to these and embraced Jim as a 'card carrying member' of their community. Jim's contributions have made a permanent mark on astronomy, and eScience in general.