We introduce the concept of coverage risk as an error measure for density
ridge estimation. The coverage risk generalizes the mean integrated square
error to set estimation. We propose two risk estimators for the coverage risk
and we show that we can select tuning parameters by minimizing the estimated
risk. We study the rate of convergence for coverage risk and prove consistency
of the risk estimators. We apply our method to three simulated datasets and to
cosmology data. In all the examples, the proposed method successfully recovers
the underlying density structure.
All content in this area was uploaded by Yen-Chi Chen on Jun 11, 2015
... This paper handles the same problem but explicitly estimates the manifold by the density ridge and generates new data by bootstrapping, which avoids the computational cost of MCMC sampling. Our method makes heavy use of ridge estimation; see related development in [25,17,11,9,10,16]. ...
... We use an oversmoothing parameter α, usually between 2 and 4, and good estimates can be often obtained across a wide range of α values. The authors of [9] gave a method of selecting h that minimizes coverage risk estimates. ...
... The search space of bandwidth is set as as minimum following Genovese et al. (2016). The maximum bandwidth value is set as Silverman's rule-of-thumb (Silverman, 1986) since this bandwidth selection is usually considered oversmoothing (Hall et al., 1991), and this idea was previously also used for ridge detection analysis (Chen et al., 2015). Removing low density data points (outliers) to infer the persistent homology features is recommended (Chazal et al., 2018), so we set the threshold to eliminate data points that is where is a kernel density function with the bandwidth parameter and is kernel density estimate using all data points. ...
... We use an oversmoothing parameter α, usually between 2 and 4, and good estimates can be often obtained across a wide range of α values. [8] gave a method to select h that minimizes coverage risk estimates. ...
Probabilistic models of data sets often exhibit salient geometric structure. Such a phenomenon is summed up in the manifold distribution hypothesis, and can be exploited in probabilistic learning. Here we present normal-bundle bootstrap (NBB), a method that generates new data which preserve the geometric structure of a given data set. Inspired by algorithms for manifold learning and concepts in differential geometry, our method decomposes the underlying probability measure into a marginalized measure on a learned data manifold and conditional measures on the normal spaces. The algorithm estimates the data manifold as a density ridge, and constructs new data by bootstrapping projection vectors and adding them to the ridge. We apply our method to the inference of density ridge and related statistics, and data augmentation to reduce overfitting.
... More recent contributions regarding computational issues are due to Ozertem and Erdogmus (2011) and Sasaki et al. (2014, 2017). Asymptotic results for ridge detection using kernel estimators are derived in Chen et al. (2014, 2015b). show that kernel estimation of density ridges leads to consistent estimators under mild regularity assumptions and establish bounds for the Hausdorff distance between the true and estimated ridge. ...
Both music and language are found in all known human societies, yet no studies have compared similarities and differences between song, speech, and instrumental music on a global scale. In this Registered Report, we analyzed two global datasets: (i) 300 annotated audio recordings representing matched sets of traditional songs, recited lyrics, conversational speech, and instrumental melodies from our 75 coauthors speaking 55 languages; and (ii) 418 previously published adult-directed song and speech recordings from 209 individuals speaking 16 languages. Of our six preregistered predictions, five were strongly supported: Relative to speech, songs use (i) higher pitch, (ii) slower temporal rate, and (iii) more stable pitches, while both songs and speech used similar (iv) pitch interval size and (v) timbral brightness. Exploratory analyses suggest that features vary along a “musi-linguistic” continuum when including instrumental melodies and recited lyrics. Our study provides strong empirical evidence of cross-cultural regularities in music and speech.
This paper studies the linear convergence of the subspace constrained mean shift (SCMS) algorithm, a well-known algorithm for identifying a density ridge defined by a kernel density estimator. By arguing that the SCMS algorithm is a special variant of a subspace constrained gradient ascent (SCGA) algorithm with an adaptive step size, we derive the linear convergence of such an SCGA algorithm. While the existing research focuses mainly on density ridges in Euclidean space, we generalize density ridges and the SCMS algorithm to directional data. In particular, we establish the stability theorem of density ridges with directional data and prove the linear convergence of our proposed directional SCMS algorithm.
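As a rough illustration of the SCMS iteration mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian kernel, Euclidean data, and the Hessian of the density estimate itself (some variants use the Hessian of the log-density instead). The names `scms_step`, `scms`, `h`, and `d` are hypothetical, chosen for this example only.

```python
import numpy as np

def scms_step(x, data, h, d=1):
    """One SCMS update at point x for a Gaussian KDE with bandwidth h.

    d is the assumed ridge dimension (d=1 for curves).
    """
    diff = data - x                                      # shape (n, dim)
    w = np.exp(-0.5 * np.sum(diff**2, axis=1) / h**2)    # Gaussian kernel weights
    shift = w @ diff / w.sum()                           # ordinary mean shift vector
    dim = data.shape[1]
    # Hessian of the KDE at x, up to a positive constant factor
    hess = (diff.T * w) @ diff / h**2 - w.sum() * np.eye(dim)
    # eigh returns eigenvalues in ascending order, so the first dim - d
    # eigenvectors span the estimated normal space of the ridge
    _, vecs = np.linalg.eigh(hess)
    normal = vecs[:, : dim - d]
    # move only along the normal space (the "subspace constraint")
    return x + normal @ (normal.T @ shift)

def scms(x0, data, h, d=1, n_iter=500, tol=1e-9):
    """Iterate scms_step from x0 until the update is smaller than tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x_new = scms_step(x, data, h, d)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy usage: points on the unit circle, start slightly outside the circle
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
start = np.array([1.1 * np.cos(0.3), 1.1 * np.sin(0.3)])
ridge_point = scms(start, circle, h=0.2)
```

In this toy setting the iterate is pulled toward the ring of high density. Because each step is the mean shift vector projected onto a subspace, the update can be read as a gradient ascent step with a data-dependent step size, which is the viewpoint the abstract describes.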
We consider nonparametric estimation of the ridge of a probability density function for multivariate linear processes with long-range dependence. We derive functional limit theorems for estimated eigenvectors and eigenvalues of the Hessian matrix. We use these results to obtain the weak convergence for the estimated ridge and asymptotic simultaneous confidence regions.
In the context of estimating local modes of a conditional density based on kernel density estimators, we show that existing bandwidth selection methods developed for kernel density estimation are unsuitable for mode estimation. We propose two methods to select bandwidths tailored for mode estimation in the regression setting. Numerical studies using synthetic data and a real-life data set are carried out to demonstrate the performance of the proposed methods in comparison with several well-received bandwidth selection methods for density estimation.
The detection and characterization of filamentary structures in the cosmic web allows cosmologists to constrain parameters
that dictate the evolution of the Universe. While many filament estimators have been proposed, they generally lack estimates
of uncertainty, reducing their inferential power. In this paper, we demonstrate how one may apply the subspace constrained
mean shift (SCMS) algorithm (Ozertem & Erdogmus 2011; Genovese et al. 2014) to uncover filamentary structure in galaxy
data. The SCMS algorithm is a gradient ascent method that models filaments as density ridges, one-dimensional smooth curves
that trace high-density regions within the point cloud. We also demonstrate how augmenting the SCMS algorithm with bootstrap-based
methods of uncertainty estimation allows one to place uncertainty bands around putative filaments. We apply the SCMS first
to the data set generated from the Voronoi model. The density ridges show strong agreement with the filaments from the Voronoi
method. We then apply the SCMS method to data sets sampled from a P3M N-body simulation, with galaxy number densities consistent with SDSS and WFIRST-AFTA, and to LOWZ and CMASS data from the Baryon Oscillation Spectroscopic Survey (BOSS). To further assess the efficacy of SCMS,
we compare the relative locations of BOSS filaments with galaxy clusters in the redMaPPer catalogue, and find that redMaPPer
clusters are significantly closer (with p-values $< 10^{-9}$) to SCMS-detected filaments than to randomly selected galaxies.
Modal regression estimates the local modes of the distribution of Y given X =
x, instead of the mean, as in the usual regression sense, and can hence reveal
important structure missed by usual regression methods. We study a simple
nonparametric method for modal regression, based on a kernel density estimate
(KDE) of the joint distribution of Y and X. We derive asymptotic error bounds
for this method, and propose techniques for constructing confidence sets and
prediction sets. The latter is used to select the smoothing bandwidth of the
underlying KDE. The idea behind modal regression is connected to many others,
such as mixture regression and density ridge estimation, and we discuss these
ties as well.
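One common way to compute local modes of Y given X = x from a joint KDE is a mean shift that updates only the y-coordinate while x stays fixed. The sketch below illustrates that idea under stated assumptions: a product Gaussian kernel, scalar X and Y, and separate bandwidths `hx` and `hy`. The function name `conditional_modes` and its parameters are hypothetical, and the abstract does not confirm that this is the exact procedure used.

```python
import numpy as np

def conditional_modes(x0, X, Y, hx, hy, n_iter=200, tol=1e-8):
    """Approximate local modes of y -> p_hat(x0, y) by mean shift in y only.

    Starts one iterate at every observed response and returns the
    converged values (one per starting point, duplicates not merged).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    wx = np.exp(-0.5 * ((x0 - X) / hx) ** 2)     # weights in x never change
    y = Y.copy()
    for _ in range(n_iter):
        # kernel weights between each current iterate and each response
        wy = np.exp(-0.5 * ((y[:, None] - Y[None, :]) / hy) ** 2)
        w = wy * wx[None, :]
        y_new = (w @ Y) / w.sum(axis=1)          # weighted average of responses
        if np.max(np.abs(y_new - y)) < tol:
            return y_new
        y = y_new
    return y
```

Distinct modes can then be read off by grouping the returned values that agree up to a small tolerance. On data with two response branches, the returned values concentrate around two levels, whereas a conditional mean would fall between them.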
The large sample theory of estimators for density modes is well-understood.
In this paper we consider density ridges, which are a higher-dimensional
extension of modes. Modes correspond to zero-dimensional, local high-density
regions in point clouds. Density ridges correspond to $s$-dimensional, local
high-density regions in point clouds. We establish three main results. First we
show that, under appropriate regularity conditions, the local variation of the
estimated ridge can be approximated by an empirical process. Second, we show
that the distribution of the estimated ridge converges to a Gaussian process.
Third, we establish that the bootstrap leads to valid confidence sets for
density ridges.
The generalized density is a product of a density function and a weight
function. For example, the average local brightness of an astronomical image is
the probability of finding a galaxy times the mean brightness of the galaxy. We
propose a method for studying the geometric structure of generalized densities.
In particular, we show how to find the modes and ridges of a generalized
density function using a modification of the mean shift algorithm and its
variant, subspace constrained mean shift. Our method can be used to perform
clustering and to calculate a measure of connectivity between clusters. We
establish consistency and rates of convergence for our estimator and apply the
methods to data from two astronomical problems.
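A natural way to adapt mean shift to a density multiplied by a weight function is to attach a weight to every observation in the kernel average. The sketch below assumes a Gaussian kernel and non-negative per-point weights (for example, a brightness value per galaxy). The name `weighted_mean_shift` and its parameters are hypothetical, and the paper's actual modification may differ in detail.

```python
import numpy as np

def weighted_mean_shift(x0, data, weights, h, n_iter=500, tol=1e-9):
    """Ascend the weighted KDE  sum_i weights[i] * K((x - data[i]) / h)."""
    x = np.asarray(x0, dtype=float)
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    for _ in range(n_iter):
        # observation weight times Gaussian kernel weight
        k = weights * np.exp(-0.5 * np.sum((data - x) ** 2, axis=1) / h**2)
        x_new = k @ data / k.sum()               # weighted local average
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy usage: two points, the second carrying three times the weight
points = np.array([[0.0, 0.0], [1.0, 0.0]])
mode_equal = weighted_mean_shift([0.5, 0.0], points, [1.0, 1.0], h=1.0)
mode_heavy = weighted_mean_shift([0.5, 0.0], points, [1.0, 3.0], h=1.0)
```

With equal weights the mode sits midway between the two points; raising the weight of the second point pulls the mode toward it. The same weights could be inserted into a subspace constrained update to look for ridges of the weighted estimate.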
In this study, we compare two vectorial tracing methods for 3D color images: (i) a conventional piecewise linear generalized cylinder algorithm that uses color and edge information and (ii) a principal curve tracing algorithm that uses the gradient and Hessian of a given density estimate. We tested the algorithms on synthetic and Brainbow datasets to show the effectiveness of the proposed algorithms. Results indicate that the proposed methods can successfully trace multiple axons in dense neighborhoods.
Accurate road centerline extraction plays an important role in practical remote sensing applications. Most existing centerline extraction methods have many limitations when the classified image contains complicated objects such as curvilinear, close, or short extent features. To cope with these limitations, this study presents a novel accurate centerline extraction method that integrates tensor voting, principal curves, and the geodesic method. The proposed method consists of three main steps. Tensor voting is first used to extract feature points from the classified image. The extracted feature points are then projected onto the principal curves. Finally, the feature points are linked by the geodesic method to create the central line. The experimental results demonstrate that the proposed method, which is automatic, provides a comparatively accurate solution for centerline extraction from a classified image.