Stephan Mandt's research works | University of California, Irvine, CA (UCI) and other places

Anomaly Detection of Tabular Data Using LLMs

Preprint

Full-text available

Jun 2024

Large language models (LLMs) have shown their potential in long-context understanding and mathematical reasoning. In this paper, we study the problem of using LLMs to detect tabular anomalies and show that pre-trained LLMs are zero-shot batch-level anomaly detectors. That is, without extra distribution-specific model fitting, they can discover hidd...

Neural NeRF Compression

Preprint

Jun 2024

Neural Radiance Fields (NeRFs) have emerged as powerful tools for capturing detailed 3D scenes through continuous volumetric representations. Recent NeRFs utilize feature grids to improve rendering quality and speed; however, these representations introduce significant storage overhead. This paper presents a novel method for efficiently compressing...

Preserving Identity with Variational Score for General-purpose 3D Editing

Preprint

Jun 2024

We present Piva (Preserving Identity with Variational Score Distillation), a novel optimization-based method for editing images and 3D models based on diffusion models. Specifically, our approach is inspired by the recently proposed method for 2D image editing, Delta Denoising Score (DDS). We pinpoint the limitations in DDS for 2D and 3D editing,...

Fast Samplers for Inverse Problems in Iterative Refinement Models

Preprint

Full-text available

May 2024

Constructing fast samplers for unconditional diffusion and flow-matching models has received much attention recently; however, existing methods for solving inverse problems, such as super-resolution, inpainting, or deblurring, still require hundreds to thousands of iterative steps to obtain high-quality results. We propose a plug-and-play framework...

Author Correction: Comparing storm resolving models and climates via unsupervised machine learning

Article

Full-text available

Mar 2024

Understanding precipitation changes through unsupervised machine learning

Article

Full-text available

Feb 2024

Despite the importance of quantifying how the spatial patterns of heavy precipitation will change with warming, we lack tools to objectively analyze the storm-scale outputs of modern climate models. To address this gap, we develop an unsupervised, spatial machine-learning framework to quantify how storm dynamics affect changes in heavy precipitatio...

Comparing storm resolving models and climates via unsupervised machine learning

Article

Full-text available

Dec 2023

Global storm-resolving models (GSRMs) have gained widespread interest because of the unprecedented detail with which they resolve the global climate. However, it remains difficult to quantify objective differences in how GSRMs resolve complex atmospheric formations. This lack of comprehensive tools for comparing model similarities is a problem in m...

Diffusion Probabilistic Modeling for Video Generation

Article

Full-text available

Oct 2023

Denoising diffusion probabilistic models are a promising new class of generative models that mark a milestone in high-quality image generation. This paper showcases their ability to sequentially generate video, surpassing prior methods in perceptual and probabilistic forecasting metrics. We propose an autoregressive, end-to-end optimized video diff...

Computationally-Efficient Neural Image Compression with Shallow Decoders

Conference Paper

Oct 2023

A Complete Recipe for Diffusion Generative Models

Conference Paper

Oct 2023

Fully Bayesian Autoencoders with Latent Sparse Gaussian Processes

Article

Jul 2023

Understanding Pathologies of Deep Heteroskedastic Regression

Preprint

Full-text available

Jun 2023

Several recent studies have reported negative results when using heteroskedastic neural regression models to model real-world data. In particular, for overparameterized models, the mean and variance networks are powerful enough to either fit every single data point (while shrinking the predicted variances to zero), or to learn a constant prediction...

ClimSim: An open large-scale dataset for training high-resolution physics emulators in hybrid multi-scale climate simulators

Preprint

Full-text available

Jun 2023

Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise prediction of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep...

Deep Anomaly Detection on Tennessee Eastman Process Data

Article

Apr 2023

This paper provides the first comprehensive evaluation and analysis of modern (deep-learning-based) unsupervised anomaly detection methods for chemical process data. We focus on the Tennessee Eastman process dataset, a standard litmus test to benchmark anomaly detection methods for nearly three decades. Our extensive study will facilitate choosing...

Asymmetrically-powered Neural Image Compression with Shallow Decoders

Preprint

Apr 2023

Neural image compression methods have seen increasingly strong performance in recent years. However, they suffer orders of magnitude higher computational complexity compared to traditional codecs, which stands in the way of real-world deployment. This paper takes a step forward in closing this gap in decoding complexity by adopting shallow or even...

Generative Diffusions in Augmented Spaces: A Complete Recipe

Preprint

Mar 2023

Score-based Generative Models (SGMs) have achieved state-of-the-art synthesis results on diverse tasks. However, the current design space of the forward diffusion process is largely unexplored and often relies on physical intuition or simplifying assumptions. Leveraging results from the design of scalable Bayesian posterior samplers, we present a c...

Deep Anomaly Detection under Labeling Budget Constraints

Preprint

Full-text available

Feb 2023

Selecting informative data points for expert feedback can significantly improve the performance of anomaly detection (AD) in various contexts, such as medical diagnostics or fraud detection. In this paper, we determine a set of theoretical conditions under which anomaly scores generalize from labeled queries to unlabeled data. Motivated by these re...

Zero-Shot Anomaly Detection without Foundation Models

Preprint

Full-text available

Feb 2023

Anomaly detection (AD) tries to identify data instances that deviate from the norm in a given data set. Since data distributions are subject to distribution shifts, our concept of "normality" may also drift, raising the need for zero-shot adaptation approaches for anomaly detection. However, the fact that current zero-shot AD methods rely on found...

Fully Bayesian Autoencoders with Latent Sparse Gaussian Processes

Preprint

Full-text available

Feb 2023

Autoencoders and their variants are among the most widely used models in representation learning and generative modeling. However, autoencoder-based models usually assume that the learned representations are i.i.d. and fail to capture the correlations between the data samples. To address this issue, we propose a novel Sparse Gaussian Process Bayesi...

An Introduction to Neural Data Compression

Article

Jan 2023

An Introduction to Neural Data Compression

Book

Jan 2023

The goal of data compression is to reduce the number of bits needed to represent useful information. Neural, or learned compression, is the application of neural networks and related machine learning techniques to this task. This monograph aims to serve as an entry point for machine learning researchers interested in compression by reviewing the pr...

Probabilistic Querying of Continuous-Time Event Sequences

Preprint

Full-text available

Nov 2022

Continuous-time event sequences, i.e., sequences consisting of continuous time stamps and associated event types ("marks"), are an important type of sequential data with many applications, e.g., in clinical medicine or user behavior modeling. Since these data are typically modeled autoregressively (e.g., using neural Hawkes processes or their class...

An Unsupervised Learning Perspective on the Dynamic Contribution to Extreme Precipitation Changes

Preprint

Full-text available

Nov 2022

Despite the importance of quantifying how the spatial patterns of extreme precipitation will change with warming, we lack tools to objectively analyze the storm-scale outputs of modern climate models. To address this gap, we develop an unsupervised machine learning framework to quantify how storm dynamics affect precipitation extremes and their cha...

Predictive Querying for Autoregressive Neural Sequence Models

Preprint

Full-text available

Oct 2022

In reasoning about sequential events it is natural to pose probabilistic queries such as "when will event A occur next" or "what is the probability of A occurring before B", with applications in areas such as user modeling, medicine, and finance. However, with machine learning shifting towards neural autoregressive models such as RNNs and transform...

Lossy Image Compression with Conditional Diffusion Models

Preprint

Sep 2022

Diffusion models are a new class of generative models that mark a milestone in high-quality image generation while relying on solid probabilistic principles. This makes them promising candidate models for neural image compression. This paper outlines an end-to-end optimized framework based on a conditional diffusion model for image compression. Bes...

Comparing Storm Resolving Models and Climates via Unsupervised Machine Learning

Preprint

Full-text available

Aug 2022

Storm-resolving climate models (SRMs) have gained international interest for the unprecedented detail with which they globally resolve convection. However, this high resolution also makes it difficult to quantify the emergent differences or similarities among complex atmospheric formations induced by different parameterizations of sub-grid inform...

Raising the Bar in Graph-level Anomaly Detection

Conference Paper

Jul 2022

Graph-level anomaly detection has become a critical topic in diverse areas, such as financial fraud detection and detecting anomalous activities in social networks. While most research has focused on anomaly detection for visual data such as images, where high detection accuracies have been obtained, existing deep learning approaches for graphs cur...

Raising the Bar in Graph-level Anomaly Detection

Preprint

May 2022

Graph-level anomaly detection has become a critical topic in diverse areas, such as financial fraud detection and detecting anomalous activities in social networks. While most research has focused on anomaly detection for visual data such as images, where high detection accuracies have been obtained, existing deep learning approaches for graphs cur...

Learning to simulate high energy particle collisions from unlabeled data

Article

Full-text available

May 2022

In many scientific fields which rely on statistical inference, simulations are often used to map from theoretical models to experimental data, allowing scientists to test model predictions against experimental results. Experimental data is often reconstructed from indirect measurements causing the aggregate transformation from theoretical models to...

Making Thermodynamic Models of Mixtures Predictive by Machine Learning: Matrix Completion of Pair Interactions

Preprint

Apr 2022

Predictive models of thermodynamic properties of mixtures are paramount in chemical engineering and chemistry. Classical thermodynamic models are successful in generalizing over (continuous) conditions like temperature and concentration. On the other hand, matrix completion methods (MCMs) from machine learning successfully generalize over (discrete...

Hybridizing Physical and Data-driven Prediction Methods for Physicochemical Properties

Preprint

Apr 2022

We present a generic way to hybridize physical and data-driven methods for predicting physicochemical properties. The approach ‘distills’ the physical method's predictions into a prior model and combines it with sparse experimental data using Bayesian inference. We apply the new approach to predict activity coefficients at infinite dilution and obt...

Making Thermodynamic Models of Mixtures Predictive by Machine Learning: Matrix Completion of Pair Interactions

Article

Full-text available

Apr 2022

Predictive models of thermodynamic properties of mixtures are paramount in chemical engineering and chemistry. Classical thermodynamic models are successful in generalizing over (continuous) conditions like temperature and concentration. On the other hand, matrix completion methods (MCMs) from machine learning successfully generalize over (discrete...

SC2: Supervised Compression for Split Computing

Preprint

Mar 2022

Split computing distributes the execution of a neural network (e.g., for a classification task) between a mobile device and a more powerful edge server. A simple alternative to splitting the network is to carry out the supervised task purely on the edge server while compressing and transmitting the full data, and most approaches have barely outperf...

Diffusion Probabilistic Modeling for Video Generation

Preprint

Mar 2022

Denoising diffusion probabilistic models are a promising new class of generative models that are competitive with GANs on perceptual metrics. In this paper, we explore their potential for sequentially generating video. Inspired by recent advances in neural video compression, we use denoising diffusion models to stochastically generate a residual to...

Latent Outlier Exposure for Anomaly Detection with Contaminated Data

Preprint

Feb 2022

Anomaly detection aims at identifying data points that show systematic deviations from the majority of data in an unlabeled dataset. A common assumption is that clean training data (free of anomalies) is available, which is often violated in practice. We propose a strategy for training an anomaly detector in the presence of unlabeled anomalies that...

An Introduction to Neural Data Compression

Preprint

Feb 2022

Neural compression is the application of neural networks and other machine learning methods to data compression. While machine learning deals with many concepts closely related to compression, entering the field of neural compression can be difficult due to its reliance on information theory, perceptual metrics, and other knowledge specific to the...

Detecting Anomalies within Time Series using Local Neural Transformations

Preprint

Feb 2022

We develop a new method to detect anomalies within time series, which is essential in many application domains, ranging from self-driving cars, finance, and marketing to medical diagnosis and epidemiology. The method is based on self-supervised deep learning that has played a key role in facilitating deep anomaly detection on images, where powerfu...

Supervised Compression for Resource-Constrained Edge Computing Systems

Conference Paper

Jan 2022

Analyzing High-Resolution Clouds and Convection using Multi-Channel VAEs

Preprint

Full-text available

Dec 2021

Understanding the details of small-scale convection and storm formation is crucial to accurately represent the larger-scale planetary dynamics. Presently, atmospheric scientists run high-resolution, storm-resolving simulations to capture these kilometer-scale weather details. However, because they contain abundant information, these simulations can...

History Marginalization Improves Forecasting in Variational Recurrent Neural Networks

Article

Full-text available

Nov 2021

Deep probabilistic time series forecasting models have become an integral part of machine learning. While several powerful generative models have been proposed, we provide evidence that their associated inference models are oftentimes too limited and cause the generative model to predict mode-averaged dynamics. Mode-averaging is problematic since m...

Towards Empirical Sandwich Bounds on the Rate-Distortion Function

Preprint

Nov 2021

The rate-distortion (R-D) function, a key quantity in information theory, characterizes the fundamental limit of how much a data source can be compressed subject to a fidelity criterion, by any compression algorithm. As researchers push for ever-improving compression performance, establishing the R-D function of a given data source is not only of scien...

Lossless Compression with Probabilistic Circuits

Preprint

Nov 2021

Despite extensive progress on image generation, deep generative models are suboptimal when applied to lossless compression. For example, models such as VAEs suffer from a compression cost overhead due to their latent variables that can only be partially eliminated with elaborate schemes such as bits-back coding, resulting in oftentimes poor single...

Improving sequential latent variable models with autoregressive flows

Article

Full-text available

Nov 2021

We propose an approach for improving sequence modeling based on autoregressive normalizing flows. Each autoregressive transform, acting across time, serves as a moving frame of reference, removing temporal correlations and simplifying the modeling of higher-level dynamics. This technique provides a simple, general-purpose method for improving seque...

Hybridizing Physical and Data-driven Prediction Methods for Physicochemical Properties

Preprint

Full-text available

Oct 2021

We present a generic way to hybridize physical and data-driven methods for predicting physicochemical properties. The approach ‘distills’ the physical method's predictions into a prior model and combines it with sparse experimental data using Bayesian inference. We apply the new approach to predict activity coefficients at infinite dilution and obt...

Supervised Compression for Resource-constrained Edge Computing Systems

Preprint

Aug 2021

There has been much interest in deploying deep learning algorithms on low-powered devices, including smartphones, drones, and medical sensors. However, full-scale deep neural networks are often too resource-intensive in terms of energy and storage. As a result, the bulk of the machine learning operation is often carried out on an edg...

Insights from Generative Modeling for Neural Video Compression

Preprint

Jul 2021

While recent machine learning research has revealed connections between deep generative models such as VAEs and rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modelin...

Structured Stochastic Gradient MCMC

Preprint

Full-text available

Jul 2021

Stochastic gradient Markov chain Monte Carlo (SGMCMC) is considered the gold standard for Bayesian inference in large-scale models, such as Bayesian neural networks. Since practitioners face speed versus accuracy tradeoffs in these models, variational inference (VI) is often the preferable option. Unfortunately, VI makes strong assumptions on both...

Neural Transformation Learning for Deep Anomaly Detection Beyond Images

Preprint

Mar 2021

Data transformations (e.g. rotations, reflections, and cropping) play an important role in self-supervised learning. Typically, images are transformed into different views, and neural networks trained on tasks involving these views produce useful feature representations for downstream tasks, including anomaly detection. However, for anomaly detecti...

Foundations of a Fast, Data-Driven, Machine-Learned Simulator

Preprint

Jan 2021

We introduce a novel strategy for machine-learning-based fast simulators, which is the first that can be trained in an unsupervised manner using observed data samples to learn a predictive model of detector response and other difficult-to-model transformations. Across the physical sciences, a barrier to interpreting observed data is the lack of kno...

Variational Beam Search for Online Learning with Distribution Shifts

Preprint

Dec 2020

We consider the problem of online learning in the presence of sudden distribution shifts as frequently encountered in applications such as autonomous navigation. Distribution shifts require constant performance monitoring and re-training. They may also be hard to detect and can lead to a slow but steady degradation in model performance. To address...

User-Dependent Neural Sequence Models for Continuous-Time Event Data

Preprint

Nov 2020

Continuous-time event data are common in applications such as individual behavior data, financial transactions, and medical health records. Modeling such data can be very challenging, in particular for applications with many different types of events, since it requires a model to predict the event types as well as the time of occurrence. Recurrent...

Scalable Gaussian Process Variational Autoencoders

Preprint

Oct 2020

Conventional variational autoencoders fail in modeling correlations between data points due to their use of factorized priors. Amortized Gaussian process inference through GP-VAEs has led to significant improvements in this regard, but is still inhibited by the intrinsic complexity of exact GP inference. We improve the scalability of these methods...

Variational Dynamic Mixtures

Preprint

Oct 2020

Deep probabilistic time series forecasting models have become an integral part of machine learning. While several powerful generative models have been proposed, we provide evidence that their associated inference models are oftentimes too limited and cause the generative model to predict mode-averaged dynamics. Mode-averaging is problematic since ma...

Hierarchical Autoregressive Modeling for Neural Video Compression

Preprint

Oct 2020

Recent work by Marino et al. (2020) showed improved performance in sequential density estimation by combining masked autoregressive flows with hierarchical latent variable models. We draw a connection between such autoregressive generative models and the task of lossy video compression. Specifically, we view recent neural video compression methods...

Improving Sequential Latent Variable Models with Autoregressive Flows

Preprint

Oct 2020

We propose an approach for improving sequence modeling based on autoregressive normalizing flows. Each autoregressive transform, acting across time, serves as a moving frame of reference, removing temporal correlations, and simplifying the modeling of higher-level dynamics. This technique provides a simple, general-purpose method for improving sequ...

Generative Modeling of Atmospheric Convection

Conference Paper

Sep 2020

Hybridizing physical and data-driven prediction methods for physicochemical properties

Article

Sep 2020

We present a generic way to hybridize physical and data-driven methods for predicting physicochemical properties. The approach 'distills' the physical method's predictions into a prior model and combines it with sparse experimental data using Bayesian inference. We apply the new approach to predict activity coefficients at infinite dilution and obt...

Generative Modeling for Atmospheric Convection

Preprint

Jul 2020

To improve climate modeling, we need a better understanding of multi-scale atmospheric dynamics (the relationship between large-scale environment and small-scale storm formation, morphology and propagation) as well as superior stochastic parameterization of convective organization. We analyze raw output from ~6 million instances of explicitly simul...

Improving Inference for Neural Image Compression

Preprint

Jun 2020

We consider the problem of lossy image compression with deep latent variable models. State-of-the-art methods build on hierarchical variational autoencoders (VAEs) and learn inference networks to predict a compressible latent representation of each data point. Drawing on the variational inference perspective on compression, we identify three approx...

Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding

Preprint

Feb 2020

Deep Bayesian latent variable models have enabled new approaches to both model and data compression. Here, we propose a new algorithm for compressing latent representations in deep probabilistic models, such as variational autoencoders, in post-processing. The approach thus separates model design and training from the compression task. Our algorith...

Extreme Classification via Adversarial Softmax Approximation

Preprint

Feb 2020

Training a classifier over a large number of classes, known as 'extreme classification', has become a topic of major interest with applications in technology, science, and e-commerce. Traditional softmax regression induces a gradient cost proportional to the number of classes C, which often is prohibitively expensive. A popular scalable softmax a...

The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks

Preprint

Feb 2020

Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work developing this class of methods has explored ever richer parameterizations of the approximate posterior in the hope of improving performance. In contrast, here we share a curious experimental finding t...

How Good is the Bayes Posterior in Deep Neural Networks Really?

Preprint

Feb 2020

During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are, as of early...

Machine Learning in Thermodynamics: Prediction of Activity Coefficients by Matrix Completion

Preprint

Full-text available

Jan 2020

Activity coefficients, which are a measure of the non-ideality of liquid mixtures, are a key property in chemical engineering with relevance to modeling chemical and phase equilibria as well as transport processes. Although experimental data on thousands of binary mixtures are available, prediction methods are needed to calculate the activity coeff...

Machine Learning in Thermodynamics: Prediction of Activity Coefficients by Matrix Completion

Article

Full-text available

Jan 2020

Activity coefficients, which are a measure of the non-ideality of liquid mixtures, are a key property in chemical engineering with relevance to modeling chemical and phase equilibria as well as transport processes. Although experimental data on thousands of binary mixtures are available, prediction methods are needed to calculate activity coefficie...

Hydra: Preserving Ensemble Diversity for Model Distillation

Preprint

Jan 2020

Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty. However, they are expensive in computation and memory. Therefore, recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble while trying to...

Tightening bounds for variational inference by revisiting perturbation theory

Article

Dec 2019

Tightening Bounds for Variational Inference by Revisiting Perturbation Theory

Preprint

Full-text available

Sep 2019

Variational inference has become one of the most widely used methods in latent variable modeling. In its basic form, variational inference employs a fully factorized variational distribution and minimizes its KL divergence to the posterior. As the minimization can only be carried out approximately, this approximation induces a bias. In this paper,...

Autoregressive Text Generation Beyond Feedback Loops

Preprint

Aug 2019

Autoregressive state transitions, where predictions are conditioned on past predictions, are the predominant choice for both deterministic and stochastic sequential models. However, autoregressive feedback exposes the evolution of the hidden state trajectory to potential biases from well-known train-test discrepancies. In this paper, we combine a l...

Multivariate Time Series Imputation with Variational Autoencoders

Preprint

Jul 2019

Multivariate time series with missing values are common in many areas, for instance in healthcare and finance. To face this problem, modern data imputation approaches should (a) be tailored to sequential data, (b) deal with high dimensional and complex data distributions, and (c) be based on the probabilistic modeling paradigm for interpretability...

A Quantum Field Theory of Representation Learning

Preprint

Jul 2019

Continuous symmetries and their breaking play a prominent role in contemporary physics. Effective low-energy field theories around symmetry breaking states explain diverse phenomena such as superconductivity, magnetism, and the mass of nucleons. We show that such field theories can also be a useful tool in machine learning, in particular for loss f...

Augmenting and Tuning Knowledge Graph Embeddings

Preprint

Jul 2019

Knowledge graph embeddings rank among the most successful methods for link prediction in knowledge graphs, i.e., the task of completing an incomplete collection of relational facts. A downside of these models is their strong sensitivity to model hyperparameters, in particular regularizers, which have to be extensively tuned to reach good performanc...

Mobile Robotic Painting of Texture

Conference Paper

May 2019

Image Anomaly Detection with Generative Adversarial Networks: Recognizing Outstanding Ph.D. Research

Chapter

Full-text available

Jan 2019

Many anomaly detection methods exist that perform well on low-dimensional problems; however, there is a notable lack of effective methods for high-dimensional spaces, such as images. Inspired by recent successes in deep learning, we propose a novel approach to anomaly detection using generative adversarial networks. Given a sample under consideration,...

Autoregressive Text Generation Beyond Feedback Loops

Conference Paper

Jan 2019

Quasi-Monte Carlo Flows

Conference Paper

Full-text available

Nov 2018

Normalizing flows provide a general approach to construct flexible variational posteriors. The parameters are learned by stochastic optimization of the variational bound, but inference can be slow due to high variance of the gradient estimator. We propose Quasi-Monte Carlo (QMC) flows which reduce the variance of the gradient estimator by one order...

Deep Probabilistic Video Compression

Preprint

Oct 2018

We propose a variational inference approach to deep probabilistic video compression. Our model uses advances in variational autoencoders (VAEs) for sequential data and combines them with recent work on neural image compression. The approach jointly learns to transform the original video into a lower-dimensional representation as well as to entropy co...

Image Anomaly Detection with Generative Adversarial Networks

Conference Paper

Full-text available

Sep 2018

Many anomaly detection methods exist that perform well on low-dimensional problems; however, there is a notable lack of effective methods for high-dimensional spaces, such as images. Inspired by recent successes in deep learning, we propose a novel approach to anomaly detection using generative adversarial networks. Given a sample under consideration,...

Iterative Amortized Inference

Preprint

Full-text available

Jul 2018

Inference models are a key component in scaling variational inference to deep latent variable models, most notably as encoder networks in variational auto-encoders (VAEs). By replacing conventional optimization-based inference with a learned model, inference is amortized over data examples and therefore more computationally efficient. However, stan...

Quasi-Monte Carlo Variational Inference

Preprint

Full-text available

Jul 2018

Many machine learning problems involve Monte Carlo gradient estimators. As a prominent example, we focus on Monte Carlo variational inference (MCVI) in this paper. The performance of MCVI crucially depends on the variance of its stochastic gradients. We propose variance reduction by means of Quasi-Monte Carlo (QMC) sampling. QMC replaces N i.i.d. s...

Iterative Amortized Inference

Conference Paper

Jul 2018

Inference models are a key component in scaling variational inference to deep latent variable models, most notably as encoder networks in variational auto-encoders (VAEs). By replacing conventional optimization-based inference with a learned model, inference is amortized over data examples and therefore more computationally efficient. However, stan...

Quasi-Monte Carlo Variational Inference

Conference Paper

Full-text available

Jun 2018

Many machine learning problems involve Monte Carlo gradient estimators. As a prominent example, we focus on Monte Carlo variational inference (MCVI) in this paper. The performance of MCVI crucially depends on the variance of its stochastic gradients. We propose variance reduction by means of Quasi-Monte Carlo (QMC) sampling. QMC replaces N i.i.d....

Active Mini-Batch Sampling Using Repulsive Point Processes

Article

Full-text available

Apr 2018

The convergence speed of stochastic gradient descent (SGD) can be improved by actively selecting mini-batches. We explore sampling schemes where similar data points are less likely to be selected in the same mini-batch. In particular, we prove that such repulsive sampling schemes lower the variance of the gradient estimator. This generalizes recen...

Scalable Generalized Dynamic Topic Models

Article

Full-text available

Mar 2018

Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that word co-occurrence statistics change continuously and therefore impose continuous stochastic process priors on their model parameters. These dynamical priors make inference much harder than in regular...

Scalable Generalized Dynamic Topic Models

Conference Paper

Full-text available

Mar 2018

Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that word co-occurrence statistics change continuously and therefore impose continuous stochastic process priors on their model parameters. These dynamical priors make inference much harder than in regular...

Improving Optimization in Models With Continuous Symmetry Breaking

Article

Mar 2018

Many loss functions in representation learning are invariant under a continuous symmetry transformation. As an example, consider word embeddings (Mikolov et al., 2013), where the loss remains unchanged if we simultaneously rotate all word and context embedding vectors. We show that representation learning models with a continuous symmetry and a qua...

A Deep Generative Model for Disentangled Representations of Sequential Data

Article

Mar 2018

We present a VAE architecture for encoding and generating high dimensional sequential data, such as video or audio. Our deep generative model learns a latent representation of the data which is split into a static and dynamic part, allowing us to approximately disentangle latent time-dependent features (dynamics) from features which are preserved o...

Continuous Word Embedding Fusion via Spectral Decomposition

Conference Paper

Jan 2018

1711.05597

Data

Dec 2017

Diversified Mini-Batch Sampling using Repulsive Point Processes

Technical Report

Full-text available

Dec 2017

Non-uniform mini-batch sampling may be beneficial in stochastic variational inference (SVI) and more generally in stochastic gradient descent (SGD). In particular, sampling data points with repulsive interactions, i.e., suppressing the probability of similar data points in the same mini-batch, was shown to reduce the stochastic gradient noise, lead...

Generalizing and Scaling up Dynamic Topic Models via Inducing Point Variational Inference

Conference Paper

Full-text available

Dec 2017

Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that topics change continuously over time and therefore impose continuous stochastic process priors on their model parameters. In this paper, we extend the class of tractable priors from Wiener processes to...

Advances in Variational Inference

Article

Full-text available

Nov 2017

Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and thus require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. T...

Bayesian Paragraph Vectors

Article

Full-text available

Nov 2017

Word2vec (Mikolov et al., 2013) has proven to be successful in natural language processing by capturing the semantic relationships between different words. Built on top of single-word embeddings, paragraph vectors (Le and Mikolov, 2014) find fixed-length representations for pieces of text with arbitrary lengths, such as documents, paragraphs, and s...

Sparse probit linear mixed model

Article

Full-text available

Oct 2017

Linear mixed models (LMMs) are important tools in statistical genetics. When used for feature selection, they make it possible to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity and population structure. Formulated as models for lin...

Perturbative Black Box Variational Inference

Article

Full-text available

Sep 2017

Black box variational inference (BBVI) with reparameterization gradients triggered the exploration of divergence measures other than the Kullback-Leibler (KL) divergence, such as alpha divergences. In this paper, we view BBVI with generalized divergences as a form of estimating the marginal likelihood via biased importance sampling. The choice of d...

Factorized Variational Autoencoders for Modeling Audience Reactions to Movies

Conference Paper

Jul 2017

Matrix and tensor factorization methods are often used for finding underlying low-dimensional patterns from noisy data. In this paper, we study non-linear tensor factorization methods based on deep variational autoencoders. Our approach is well-suited for settings where the relationship between the latent representation to be learned and the raw da...

Structured Black Box Variational Inference for Latent Time Series Models

Article

Full-text available

Jul 2017

Continuous latent time series models are prevalent in Bayesian modeling; examples include the Kalman filter, dynamic collaborative filtering, or dynamic topic models. These models often benefit from structured, non-mean-field variational approximations that capture correlations between time steps. Black box variational inference with reparameteriza...

Stochastic Learning on Imbalanced Data: Determinantal Point Processes for Mini-batch Diversification

Article

Full-text available

May 2017

We study a mini-batch diversification scheme for stochastic gradient descent (SGD). While classical SGD relies on uniformly sampling data points to form a mini-batch, we propose a non-uniform sampling scheme based on the Determinantal Point Process (DPP). The DPP relies on a similarity measure between data points and gives low probabilities to mini...

Stochastic Gradient Descent as Approximate Bayesian Inference

Article

Apr 2017

Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results. (1) We show that constant SGD can be used as an approximate Bayesian posterior inference algorithm. Specifically, we show how to adjust the tuning parameters of const...
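The first claim of this abstract, that constant SGD behaves like a Markov chain with a stationary distribution, can be illustrated with a toy sketch. The setup below is an assumption made purely for illustration and is not taken from the paper: a one-dimensional quadratic loss 0.5 * a * theta^2, Gaussian gradient noise with standard deviation sigma, and hypothetical helper names (`constant_sgd_chain`, `stationary_variance`). Under these assumptions the iterates form an AR(1) process whose stationary variance is eps * sigma^2 / (a * (2 - eps * a)), which the simulation should approximately reproduce.

```python
import random


def constant_sgd_chain(a, sigma, eps, n_steps, burn_in, seed=0):
    """Run constant-step SGD on the toy loss 0.5 * a * theta**2.

    Each gradient is corrupted by Gaussian noise N(0, sigma**2), standing in
    for mini-batch noise (an illustrative assumption, not the paper's setup).
    """
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for step in range(n_steps + burn_in):
        noisy_grad = a * theta + rng.gauss(0.0, sigma)
        theta -= eps * noisy_grad  # constant learning rate eps
        if step >= burn_in:
            samples.append(theta)
    return samples


def stationary_variance(a, sigma, eps):
    """Stationary variance of the AR(1) chain theta' = (1 - eps*a)*theta - eps*noise."""
    return eps * sigma ** 2 / (a * (2.0 - eps * a))


a, sigma, eps = 1.0, 1.0, 0.1
samples = constant_sgd_chain(a, sigma, eps, n_steps=200_000, burn_in=1_000)

mean = sum(samples) / len(samples)
empirical_var = sum((s - mean) ** 2 for s in samples) / len(samples)
predicted_var = stationary_variance(a, sigma, eps)

print(f"empirical variance: {empirical_var:.4f}")
print(f"predicted variance: {predicted_var:.4f}")
```

In this toy model the iterates never converge to the minimum; they keep fluctuating around it with a spread set by the learning rate, which is the behavior the abstract describes as sampling from a stationary distribution.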

Dynamic Word Embeddings

Article

Full-text available

Feb 2017

We present a probabilistic language model for time-stamped text data which tracks the semantic evolution of individual words over time. The model represents words and contexts by latent trajectories in an embedding space. At each moment in time, the embedding vectors are inferred from a probabilistic version of word2vec [Mikolov et al., 2013]. Thes...

Stephan Mandt's research while affiliated with University of California, Irvine and other places

Publications (115)