Optimal Ridge Detection using Coverage Risk

June 2015

Source
arXiv

Authors:

Yen-Chi Chen

University of Washington Seattle

Christopher R. Genovese

Carnegie Mellon University

Larry Wasserman

Carnegie Mellon University

We introduce the concept of coverage risk as an error measure for density ridge estimation. The coverage risk generalizes the mean integrated square error to set estimation. We propose two estimators of the coverage risk and show that tuning parameters can be selected by minimizing the estimated risk. We study the rate of convergence of the coverage risk and prove consistency of the risk estimators. We apply our method to three simulated datasets and to cosmology data. In all the examples, the proposed method successfully recovers the underlying density structure.
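As a rough illustration of measuring error between two sets, the following Python sketch computes a symmetrized average nearest-point distance between two ridges represented as finite point sets. The function names, the point-set representation, and the symmetrized form are assumptions made for illustration. The abstract does not state the paper's formal definition, so this should not be read as the authors' estimator.

```python
import math


def nearest_distance(point, points):
    """Euclidean distance from `point` to the closest element of `points`."""
    return min(math.dist(point, q) for q in points)


def coverage_distance(ridge_a, ridge_b, power=1):
    """Symmetrized average nearest-point distance between two point sets.

    Hypothetical helper: power=1 gives an L1-style average distance,
    power=2 an L2-style average squared distance.
    """
    a_to_b = sum(nearest_distance(p, ridge_b) ** power for p in ridge_a) / len(ridge_a)
    b_to_a = sum(nearest_distance(q, ridge_a) ** power for q in ridge_b) / len(ridge_b)
    return 0.5 * (a_to_b + b_to_a)
```

Under these assumptions, a tuning parameter could be chosen by evaluating such a distance between ridge estimates computed on two halves of the data for each candidate value and keeping the value with the smallest result.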

The cosmic web. This is a slice of the observed Universe from the Sloan Digital Sky Survey. We apply the density ridge method to detect filaments [7]. The top row shows one example of the detected filaments. The bottom row shows the effect of smoothing. Bottom-Left: optimal smoothing. Bottom-Middle: under-smoothing. Bottom-Right: over-smoothing. Under optimal smoothing, we detect an intricate filament network. If we under-smooth or over-smooth the dataset, we cannot find the structure.

…

Another slice of the cosmic web data from the Sloan Digital Sky Survey. The leftmost panel shows the estimated L1 coverage risk for estimating density ridges under different smoothing parameters. We estimated the L1 coverage risk by data splitting. The remaining panels show, from left to right, under-smoothing, optimal smoothing, and over-smoothing. Optimal smoothing allows the SCMS algorithm to detect the intricate cosmic network structure.

…

Normal-Bundle Bootstrap

Article

May 2021

Strong evidence for cross-cultural regularities in music and speech

Article

Full-text available

Jul 2023

Chris Chambers

Normal-bundle Bootstrap

Preprint

Jul 2020

Probabilistic models of data sets often exhibit salient geometric structure. Such a phenomenon is summed up in the manifold distribution hypothesis, and can be exploited in probabilistic learning. Here we present normal-bundle bootstrap (NBB), a method that generates new data which preserve the geometric structure of a given data set. Inspired by algorithms for manifold learning and concepts in differential geometry, our method decomposes the underlying probability measure into a marginalized measure on a learned data manifold and conditional measures on the normal spaces. The algorithm estimates the data manifold as a density ridge, and constructs new data by bootstrapping projection vectors and adding them to the ridge. We apply our method to the inference of density ridge and related statistics, and data augmentation to reduce overfitting.

Multivariate Long-Memory Processes and Nonparametric Density Estimation, with Applications to Ridge Detection

Thesis

Jan 2018

Klaus Telkmann

Globally, songs and instrumental melodies are slower and higher and use more stable pitches than speech: A Registered Report

Article

Full-text available

May 2024

Both music and language are found in all known human societies, yet no studies have compared similarities and differences between song, speech, and instrumental music on a global scale. In this Registered Report, we analyzed two global datasets: (i) 300 annotated audio recordings representing matched sets of traditional songs, recited lyrics, conversational speech, and instrumental melodies from our 75 coauthors speaking 55 languages; and (ii) 418 previously published adult-directed song and speech recordings from 209 individuals speaking 16 languages. Of our six preregistered predictions, five were strongly supported: Relative to speech, songs use (i) higher pitch, (ii) slower temporal rate, and (iii) more stable pitches, while both songs and speech used similar (iv) pitch interval size and (v) timbral brightness. Exploratory analyses suggest that features vary along a “musi-linguistic” continuum when including instrumental melodies and recited lyrics. Our study provides strong empirical evidence of cross-cultural regularities in music and speech.

Linear convergence of the subspace constrained mean shift algorithm: from Euclidean to directional data

Article

Apr 2022

This paper studies the linear convergence of the subspace constrained mean shift (SCMS) algorithm, a well-known algorithm for identifying a density ridge defined by a kernel density estimator. By arguing that the SCMS algorithm is a special variant of a subspace constrained gradient ascent (SCGA) algorithm with an adaptive step size, we derive the linear convergence of such SCGA algorithm. While the existing research focuses mainly on density ridges in the Euclidean space, we generalize density ridges and the SCMS algorithm to directional data. In particular, we establish the stability theorem of density ridges with directional data and prove the linear convergence of our proposed directional SCMS algorithm.

On nonparametric ridge estimation for multivariate long-memory processes

Article

May 2020

We consider nonparametric estimation of the ridge of a probability density function for multivariate linear processes with long-range dependence. We derive functional limit theorems for estimated eigenvectors and eigenvalues of the Hessian matrix. We use these results to obtain the weak convergence for the estimated ridge and asymptotic simultaneous confidence regions.

Importance sampling and its optimality for stochastic simulation models

Article

Jan 2019

Empirical evolution equations

Article

Jan 2018

Evolution equations comprise a broad framework for describing the dynamics of a system in a general state space: when the state space is finite-dimensional, they give rise to systems of ordinary differential equations; for infinite-dimensional state spaces, they give rise to partial differential equations. Several modern statistical and machine learning methods concern the estimation of objects that can be formalized as solutions to evolution equations, in some appropriate state space, even if not stated as such. The corresponding equations, however, are seldom known exactly, and are empirically derived from data, often by means of nonparametric estimation. This induces uncertainties on the equations and their solutions that are challenging to quantify, and moreover the diversity and the specifics of each particular setting may obscure the path for a general approach. In this paper, we address the problem of constructing general yet tractable methods for quantifying such uncertainties, by means of asymptotic theory combined with bootstrap methodology. We demonstrate these procedures in important examples including gradient line estimation, diffusion tensor imaging tractography, and local principal component analysis. The bootstrap perspective is particularly appealing as it circumvents the need to simulate from stochastic (partial) differential equations that depend on (infinite-dimensional) unknowns. We assess the performance of the bootstrap procedure via simulations and find that it demonstrates good finite-sample coverage. © 2018, Institute of Mathematical Statistics. All rights reserved.

Bandwidth selection for nonparametric modal regression

Article

Full-text available

Oct 2017
COMMUN STAT-SIMUL C

In the context of estimating local modes of a conditional density based on kernel density estimators, we show that existing bandwidth selection methods developed for kernel density estimation are unsuitable for mode estimation. We propose two methods to select bandwidths tailored for mode estimation in the regression setting. Numerical studies using synthetic data and a real-life data set are carried out to demonstrate the performance of the proposed methods in comparison with several well-received bandwidth selection methods for density estimation.

Cosmic Web Reconstruction through Density Ridges: Method and Algorithm

Article

Full-text available

Jan 2015
MON NOT R ASTRON SOC

The detection and characterization of filamentary structures in the cosmic web allows cosmologists to constrain parameters that dictate the evolution of the Universe. While many filament estimators have been proposed, they generally lack estimates of uncertainty, reducing their inferential power. In this paper, we demonstrate how one may apply the subspace constrained mean shift (SCMS) algorithm (Ozertem & Erdogmus 2011; Genovese et al. 2014) to uncover filamentary structure in galaxy data. The SCMS algorithm is a gradient ascent method that models filaments as density ridges, one-dimensional smooth curves that trace high-density regions within the point cloud. We also demonstrate how augmenting the SCMS algorithm with bootstrap-based methods of uncertainty estimation allows one to place uncertainty bands around putative filaments. We apply the SCMS first to the data set generated from the Voronoi model. The density ridges show strong agreement with the filaments from the Voronoi method. We then apply the SCMS method to data sets sampled from a P3M N-body simulation, with galaxy number densities consistent with SDSS and WFIRST-AFTA, and to LOWZ and CMASS data from the Baryon Oscillation Spectroscopic Survey (BOSS). To further assess the efficacy of SCMS, we compare the relative locations of BOSS filaments with galaxy clusters in the redMaPPer catalogue, and find that redMaPPer clusters are significantly closer (with p-values $< 10^{-9}$) to SCMS-detected filaments than to randomly selected galaxies.
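To make the gradient-ascent description concrete, here is a minimal pure-Python sketch of a subspace constrained mean shift iteration for two-dimensional data and one-dimensional ridges. It assumes a Gaussian kernel with a single bandwidth `h` and uses the Hessian of the kernel density estimate itself (some presentations use the Hessian of the log-density instead). The function names and the toy data generator are hypothetical, and this is an illustration rather than the implementation used in any of the papers listed here.

```python
import math


def smallest_eigvec_2x2(a, b, c):
    """Unit eigenvector for the smallest eigenvalue of [[a, b], [b, c]]."""
    lam = 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    # Two equivalent eigenvector formulas; keep the better-conditioned one.
    v1 = (b, lam - a)
    v2 = (lam - c, b)
    v = v1 if math.hypot(*v1) >= math.hypot(*v2) else v2
    norm = math.hypot(*v)
    if norm == 0.0:  # Hessian is a multiple of the identity: any direction works
        return (1.0, 0.0)
    return (v[0] / norm, v[1] / norm)


def scms_step(point, data, h):
    """One SCMS update: mean shift projected onto the lowest-curvature eigenvector."""
    x, y = point
    sw = mx = my = hxx = hxy = hyy = 0.0
    for px, py in data:
        ux, uy = x - px, y - py
        w = math.exp(-(ux * ux + uy * uy) / (2.0 * h * h))  # Gaussian weight
        sw += w
        mx += w * px
        my += w * py
        # Hessian of the (unnormalized) Gaussian KDE at `point`
        hxx += w * (ux * ux / h**4 - 1.0 / h**2)
        hxy += w * (ux * uy / h**4)
        hyy += w * (uy * uy / h**4 - 1.0 / h**2)
    # Assumes `point` is close enough to the data that sw does not underflow.
    shift = (mx / sw - x, my / sw - y)  # ordinary mean shift vector
    vx, vy = smallest_eigvec_2x2(hxx, hxy, hyy)
    proj = shift[0] * vx + shift[1] * vy
    return (x + proj * vx, y + proj * vy)


def scms(point, data, h, tol=1e-9, max_iter=500):
    """Iterate SCMS updates until the step length drops below `tol`."""
    for _ in range(max_iter):
        new_point = scms_step(point, data, h)
        if math.dist(new_point, point) < tol:
            return new_point
        point = new_point
    return point


def noisy_line():
    """Toy data: points along y = 0 with alternating offsets of +/- 0.05."""
    return [(i / 20.0, 0.05 if i % 2 == 0 else -0.05) for i in range(41)]
```

For example, `scms((1.0, 0.15), noisy_line(), h=0.3)` moves the starting point across the filament toward the line y = 0 while leaving its position along the line essentially unchanged.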

Bandwidth Selection for Mean-shift based Unsupervised Learning Techniques: a Unified Approach via Self-coverage

Article

Full-text available

Jun 2011

Jochen Einbeck

Nonparametric Modal Regression

Article

Full-text available

Dec 2014
ANN STAT

Modal regression estimates the local modes of the distribution of Y given X = x, instead of the mean, as in the usual regression sense, and can hence reveal important structure missed by usual regression methods. We study a simple nonparametric method for modal regression, based on a kernel density estimate (KDE) of the joint distribution of Y and X. We derive asymptotic error bounds for this method, and propose techniques for constructing confidence sets and prediction sets. The latter is used to select the smoothing bandwidth of the underlying KDE. The idea behind modal regression is connected to many others, such as mixture regression and density ridge estimation, and we discuss these ties as well.
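One common way to locate conditional local modes from a joint KDE is a partial mean shift that updates only the response coordinate while holding x fixed. The Python sketch below assumes a Gaussian product kernel with one shared bandwidth `h`. The function names are hypothetical, and the abstract above does not specify the algorithm, so this is an illustrative sketch, not the paper's procedure.

```python
import math


def gauss(u):
    """Unnormalized Gaussian kernel."""
    return math.exp(-0.5 * u * u)


def partial_mean_shift(x, y0, xs, ys, h, tol=1e-10, max_iter=1000):
    """Climb from y0 to a local mode in y of the joint KDE at fixed x."""
    y = y0
    for _ in range(max_iter):
        num = den = 0.0
        for xi, yi in zip(xs, ys):
            w = gauss((x - xi) / h) * gauss((y - yi) / h)
            num += w * yi
            den += w
        y_new = num / den  # weighted mean of responses near (x, y)
        if abs(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```

Starting the iteration from several initial values `y0` at the same `x` can recover several conditional modes, which is the structure a mean regression would average away.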

Asymptotic Theory for Density Ridges

Article

Full-text available

Jun 2014
ANN STAT

The large sample theory of estimators for density modes is well-understood. In this paper we consider density ridges, which are a higher-dimensional extension of modes. Modes correspond to zero-dimensional, local high-density regions in point clouds. Density ridges correspond to $s$-dimensional, local high-density regions in point clouds. We establish three main results. First we show that, under appropriate regularity conditions, the local variation of the estimated ridge can be approximated by an empirical process. Second, we show that the distribution of the estimated ridge converges to a Gaussian process. Third, we establish that the bootstrap leads to valid confidence sets for density ridges.

Generalized Mode and Ridge Estimation

Article

Full-text available

Jun 2014

The generalized density is a product of a density function and a weight function. For example, the average local brightness of an astronomical image is the probability of finding a galaxy times the mean brightness of the galaxy. We propose a method for studying the geometric structure of generalized densities. In particular, we show how to find the modes and ridges of a generalized density function using a modification of the mean shift algorithm and its variant, subspace constrained mean shift. Our method can be used to perform clustering and to calculate a measure of connectivity between clusters. We establish consistency and rates of convergence for our estimator and apply the methods to data from two astronomical problems.

Local tracing of curvilinear structures in volumetric color images: Application to the Brainbow analysis

Article

Full-text available

Nov 2012
J VIS COMMUN IMAGE R

In this study, we compare two vectorial tracing methods for 3D color images: (i) a conventional piecewise linear generalized cylinder algorithm that uses color and edge information and (ii) a principal curve tracing algorithm that uses the gradient and Hessian of a given density estimate. We tested the algorithms on synthetic and Brainbow datasets to show the effectiveness of the proposed algorithms. Results indicate that the proposed methods can successfully trace multiple axons in dense neighborhoods.

Multivariate Density Estimation, Theory, Practice and Visualization.

Article

Jan 1994

Multivariate Density Estimation. Theory, Practice and Visualization

Article

Jan 1993

A Method for Accurate Road Centerline Extraction From a Classified Image

Article

Dec 2014

Accurate road centerline extraction plays an important role in practical remote sensing applications. Most existing centerline extraction methods have many limitations when the classified image contains complicated objects such as curvilinear, close, or short extent features. To cope with these limitations, this study presents a novel accurate centerline extraction method that integrates tensor voting, principal curves, and the geodesic method. The proposed method consists of three main steps. Tensor voting is first used to extract feature points from the classified image. The extracted feature points are then projected onto the principal curves. Finally, the feature points are linked by the geodesic method to create the central line. The experimental results demonstrate that the proposed method, which is automatic, provides a comparatively accurate solution for centerline extraction from a classified image.

Rates of strong consistency for multivariate kernel density estimators

Article
