Zhanxing Zhu
Peking University | PKU · School of Mathematical Sciences

PhD in machine learning

About

87 Publications
21,283 Reads
5,061 Citations
Additional affiliations
September 2012 - present
The University of Edinburgh
Position
  • PhD Student
Description
  • Large-scale distributed and parallel optimization in machine learning; market mechanisms
November 2009 - December 2012
Aalto University
Position
  • Research Assistant
Description
  • Research projects: Local linear models, supervised dimensionality reduction and its application to chemical process monitoring, nonnegative learning.
October 2007 - July 2009
Beihang University (BUAA)
Position
  • Research Assistant
Description
  • Nonnegative matrix factorization for hyperspectral image unmixing

Publications

Publications (87)
Article
Aim: To investigate whether stratifying participants with prediabetes according to their diabetes progression risks (PR) could affect their responses to interventions. Methods: We developed a machine learning‐based model to predict the 1‐year diabetes PR (ML‐PR) with the fewest predictors. The model was developed and internally validated in participa...
Conference Paper
Full-text available
Differentiable physics modeling combines physics models with gradient-based learning to provide model explicability and data efficiency. It has been used to learn dynamics, solve inverse problems and facilitate design, and its impact is just beginning. Current successes have concentrated on general physics models such as rigid bodies, deformable s...
Article
Despite the empirical success in various domains, it has been revealed that deep neural networks are vulnerable to maliciously perturbed input data that can dramatically degrade their performance. These are known as adversarial attacks. To counter adversarial attacks, adversarial training formulated as a form of robust optimization has been demonst...
Preprint
This is the Proceedings of ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI. Deep neural networks (DNNs) have undoubtedly brought great success to a wide range of applications in computer vision, computational linguistics, and AI. However, foundational principles underlying the DNNs' success and their r...
Article
The continual learning paradigm learns from a continuous stream of tasks in an incremental manner and aims to overcome the notorious issue of catastrophic forgetting. In this work, we propose a new adaptive progressive network framework including two models for continual learning: Reinforced Continual Learning (RCL) and Bayesian Optimized Continual L...
Article
Despite the empirical success in various domains, it has been revealed that deep neural networks are vulnerable to maliciously perturbed input data that can severely degrade their performance. These are known as adversarial attacks. To counter adversarial attacks, adversarial training formulated as a form of robust optimization has been demonstrated to be ef...
Article
Spatial-temporal data forecasting of traffic flow is a challenging task because of the complicated spatial dependencies and dynamic temporal patterns between different roads. Existing frameworks usually utilize a given spatial adjacency graph and sophisticated mechanisms for modeling spatial and temporal correlations. However, limited represen...
Preprint
It is well known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essential for both the optimization and generalization of deep networks. Some works have attempted to artificially simulate SGN by injecting random noise to improve deep learning. However, it turned out that the injected simple random n...
Article
Full-text available
Stochastic Gradient Langevin Dynamics (SGLD) has been widely used for Bayesian sampling from certain probability distributions, incorporating derivatives of the log-posterior. With the derivative evaluation of the log-posterior distribution, SGLD methods generate samples from the distribution by performing as a thermostat dynamics that trave...
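
To make the update such samplers perform concrete, here is a minimal SGLD sketch in Python; the one-dimensional standard-normal target, step size, and iteration count are illustrative assumptions, not details taken from the paper.

    # Minimal SGLD sketch: sample from a toy 1-D posterior N(0, 1).
    # Target, step size, and iteration count are illustrative assumptions.
    import numpy as np

    def grad_log_post(theta):
        return -theta  # gradient of log N(0, 1)

    rng = np.random.default_rng(0)
    eps = 1e-2                 # step size
    theta, samples = 5.0, []
    for _ in range(10_000):
        # Langevin step: half gradient step plus Gaussian noise of scale sqrt(eps)
        theta += 0.5 * eps * grad_log_post(theta) + np.sqrt(eps) * rng.standard_normal()
        samples.append(theta)
    print(np.mean(samples[2000:]), np.var(samples[2000:]))  # ~0 and ~1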
Preprint
Despite the empirical success in various domains, it has been revealed that deep neural networks are vulnerable to maliciously perturbed input data that can severely degrade their performance. These are known as adversarial attacks. To counter adversarial attacks, adversarial training formulated as a form of robust optimization has been demonstrated to be ef...
Preprint
Spatial-temporal data forecasting of traffic flow is a challenging task because of the complicated spatial dependencies and dynamic temporal patterns between different roads. Existing frameworks typically utilize a given spatial adjacency graph and sophisticated mechanisms for modeling spatial and temporal correlations. However, limited repres...
Preprint
Full-text available
We consider the fundamental problem of how to automatically construct summary statistics for implicit generative models, where evaluation of the likelihood function is intractable but sampling / simulating data from the model is possible. The idea is to frame the task of constructing sufficient statistics as learning mutual information maximizing re...
Preprint
Knowledge distillation is a strategy for training a student network under the guidance of the soft output from a teacher network. It has been a successful method for model compression and knowledge transfer. However, knowledge distillation currently lacks a convincing theoretical understanding. On the other hand, recent findings on the neural tangent kernel enabl...
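
For orientation, the following Python sketch shows the standard soft-target distillation loss (a temperature-scaled KL divergence); the temperature value and random logits are illustrative assumptions, and this is not the theoretical construction of the preprint.

    # Soft-target distillation loss sketch (temperature-scaled KL divergence).
    import numpy as np

    def softmax(z, T=1.0):
        z = z / T
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def distill_loss(student_logits, teacher_logits, T=4.0):
        # KL(teacher || student) on temperature-softened distributions,
        # scaled by T^2 to keep gradient magnitudes comparable across T.
        p_t = softmax(teacher_logits, T)
        p_s = softmax(student_logits, T)
        return (T ** 2) * np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()

    rng = np.random.default_rng(0)
    print(distill_loss(rng.standard_normal((8, 10)), rng.standard_normal((8, 10))))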
Preprint
Full-text available
Data augmentation is an effective and universal technique for improving the generalization performance of deep neural networks. It can enrich the diversity of training samples, which is essential in medical image segmentation tasks because 1) the scale of medical image datasets is typically small, which may increase the risk of overfitting; 2) the shape a...
Chapter
Data augmentation is an effective and universal technique for improving the generalization performance of deep neural networks. It can enrich the diversity of training samples, which is essential in medical image segmentation tasks because 1) the scale of medical image datasets is typically small, which may increase the risk of overfitting; 2) the shape a...
Preprint
Convolutional Neural Networks (CNNs) are known to rely more on local texture than on global shape when making decisions. Recent work also indicates a close relationship between a CNN's texture bias and its robustness against distribution shift, adversarial perturbation, random corruption, etc. In this work, we attempt to improve various kinds o...
Conference Paper
Deep neural networks have been proven vulnerable to so-called adversarial attacks. Recently there have been efforts to defend against such attacks with deep generative models. These defenses often predict by inverting the deep generative models rather than by simple feedforward propagation. Such defenses are difficult to attack due to the obfuscated...
Preprint
Full-text available
We comprehensively reveal the learning dynamics of deep neural networks (DNNs) with batch normalization (BN) and weight decay (WD), named Spherical Motion Dynamics (SMD). Our theorem on SMD is based on the scale-invariance of weights caused by BN and the regularization effect of WD. SMD shows that the optimization trajectory of weights is like a...
Preprint
Full-text available
The scarcity of class-labeled data is a ubiquitous bottleneck in a wide range of machine learning problems. While abundant unlabeled data normally exist and provide a potential solution, it is extremely challenging to exploit them. In this paper, we address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation...
Preprint
The wide deployment of deep neural networks, though achieving great success in many domains, raises severe safety and reliability concerns. Existing adversarial attack generation and automatic verification techniques cannot formally verify whether a network is globally robust, i.e., whether adversarial examples are absent from the entire input space. To add...
Chapter
Full-text available
In statistics and machine learning, approximation of an intractable integral is often achieved using the unbiased Monte Carlo estimator, but the variance of the estimate is generally high in many applications. Control variate approaches are well known to reduce the variance of the estimation. These control variates are typically construc...
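
A minimal numerical sketch of the control-variate idea, assuming a toy integrand: the choice f(x) = exp(x) with control g(x) = x (known mean zero under N(0, 1)) is purely illustrative.

    # Control-variate sketch: estimate E[f(X)] for X ~ N(0, 1).
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(100_000)
    f = np.exp(x)                            # integrand; true mean is e^0.5
    g = x                                    # control variate with known E[g] = 0
    beta = np.cov(f, g)[0, 1] / np.var(g)    # variance-minimizing coefficient
    plain = f.mean()                         # plain Monte Carlo estimate
    cv = (f - beta * g).mean()               # same expectation, lower variance
    print(plain, cv, np.exp(0.5))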
Article
Full-text available
Neural architecture search (NAS) attracts much research attention because of its ability to identify better architectures than handcrafted ones. Recently, differentiable search methods have become the state of the art in NAS, obtaining high-performance architectures in several days. However, they still suffer from huge computation costs and infe...
Article
Graph Convolutional Networks (GCNs) play a crucial role in graph learning tasks; however, learning graph embeddings with few supervised signals is still a difficult problem. In this paper, we propose a novel training algorithm for Graph Convolutional Networks, called the Multi-Stage Self-Supervised (M3S) Training Algorithm, combined with self-supervised
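
For context, a single GCN propagation layer (in the common Kipf-and-Welling form) looks like the Python sketch below; the toy graph and random weights are illustrative assumptions, and the M3S training stages are not reproduced here.

    # One GCN layer: H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W), on a toy graph.
    import numpy as np

    A = np.array([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])              # adjacency of a 3-node path graph
    A_hat = A + np.eye(3)                     # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))  # symmetric normalization

    rng = np.random.default_rng(0)
    X = rng.standard_normal((3, 4))           # node features
    W = rng.standard_normal((4, 2))           # layer weights
    H = np.maximum(A_norm @ X @ W, 0.0)       # ReLU activation
    print(H)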
Preprint
Full-text available
Randomized classifiers have been shown to provide a promising approach for achieving certified robustness against adversarial attacks in deep learning. However, most existing methods only leverage Gaussian smoothing noise and only work for $\ell_2$ perturbation. We propose a general framework of adversarial certification with non-Gaussian noise and...
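
As background, the Gaussian special case that the paper generalizes can be sketched as a majority vote under input noise; the stand-in base classifier, noise level, and sample count below are illustrative assumptions.

    # Randomized smoothing sketch: predict by majority vote under Gaussian noise.
    import numpy as np

    def base_classifier(x):
        # Hypothetical hard classifier: sign of the first coordinate
        return int(x[0] > 0)

    def smoothed_predict(x, sigma=0.5, n=1000, seed=0):
        rng = np.random.default_rng(seed)
        votes = np.zeros(2, dtype=int)
        for _ in range(n):
            votes[base_classifier(x + sigma * rng.standard_normal(x.shape))] += 1
        return int(votes.argmax())

    print(smoothed_predict(np.array([0.3, -1.0])))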
Article
Text-based CAPTCHAs remain a popular scheme for distinguishing between a legitimate human user and an automated program. This article presents a novel generic text captcha solver based on a generative adversarial network. As a departure from prior text captcha solvers that require a labor-intensive and time-consuming process to construct, our sc...
Article
Adversarial attacks have stimulated research interest in the field of deep learning security. However, most existing adversarial attack methods are developed for classification. In this paper, we use Projected Gradient Descent (PGD), the strongest first-order attack method for classification, to produce adversarial examples on the total loss of F...
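
A generic L-infinity PGD loop, for reference, looks like the sketch below; the quadratic stand-in loss is an assumption for self-containment, not the detection loss used in the paper.

    # Generic L_inf PGD sketch: ascend a differentiable loss within an eps-ball.
    import numpy as np

    def loss_grad(x):
        # Hypothetical loss L(x) = 0.5 * ||x - target||^2 and its gradient
        target = np.ones_like(x)
        return x - target

    def pgd_attack(x0, eps=0.1, alpha=0.02, steps=20):
        x = x0.copy()
        for _ in range(steps):
            x = x + alpha * np.sign(loss_grad(x))   # signed gradient ascent step
            x = np.clip(x, x0 - eps, x0 + eps)      # project back into the L_inf ball
        return x

    x_adv = pgd_attack(np.zeros(5))
    print(np.abs(x_adv).max())  # never exceeds eps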
Preprint
Regularization plays a crucial role in machine learning models, especially for deep neural networks. Existing regularization techniques mainly rely on the i.i.d. assumption and employ only the information of the current sample, without leveraging the neighboring information between samples. In this work, we propose a general regularizer calle...
Conference Paper
Full-text available
Transfer learning has frequently been used to improve deep neural network training by incorporating the weights of pre-trained networks as the starting point of optimization, for regularization. While deep transfer learning can usually boost performance with better accuracy and faster convergence, transferring weights from inappropriate networ...
Preprint
Full-text available
Transfer learning has frequently been used to improve deep neural network training by incorporating the weights of pre-trained networks as the starting point of optimization, for regularization. While deep transfer learning can usually boost performance with better accuracy and faster convergence, transferring weights from inappropriate networ...
Chapter
The effectiveness of Graph Convolutional Networks (GCNs) has been demonstrated in a wide range of graph-based machine learning tasks. However, the update of parameters in GCNs is driven only by labeled nodes, leaving unlabeled data unexploited. In this paper, we apply Virtual Adversarial Training (VAT), an adversarial regularization method based...
Chapter
In this paper, we study how to improve the performance of Question Answering over Knowledge Bases (KBQA) by utilizing factoid Question Generation (QG). The task of question generation (QG) is to generate a corresponding natural language question given the input answer, while question answering (QA) is the reverse task of finding a proper answer given
Preprint
Full-text available
Data-driven modeling of human motions is ubiquitous in computer graphics and computer vision applications, such as synthesizing realistic motions or recognizing actions. Recent research has shown that such problems can be approached by learning a natural motion manifold using deep learning to address the shortcomings of traditional data-driven appr...
Preprint
Full-text available
Data-driven modeling of human motions is ubiquitous in computer graphics and vision applications. Such problems can be approached by deep learning on a large amount of data. However, existing methods can be sub-optimal for two reasons. First, skeletal information has not been fully utilized. Unlike images, it is difficult to define spatial proximity i...
Preprint
The design of deep graph models remains to be investigated, and the crucial part is how to explore and exploit the knowledge from different hops of neighbors in an efficient way. In this paper, we propose a novel RNN-like deep graph neural network architecture by incorporating AdaBoost into the computation of the network; the proposed graph co...
Article
Full-text available
Data-driven modeling of human motions is ubiquitous in computer graphics and computer vision applications, such as synthesizing realistic motions or recognizing actions. Recent research has shown that such problems can be approached by learning a natural motion manifold using deep learning on a large amount of data, to address the shortcomings of trad...
Conference Paper
Full-text available
Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) methods have been widely used to sample from certain probability distributions, incorporating (kernel) density derivatives and/or given datasets. Instead of exploring new samples from kernel spaces, this work proposes a novel SGHMC sampler, namely Spectral Hamiltonian Monte Carlo (SpHMC),...
Preprint
Full-text available
The randomness in Stochastic Gradient Descent (SGD) is considered to play a central role in the observed strong generalization capability of deep learning. In this work, we re-interpret the stochastic gradient of vanilla SGD as a matrix-vector product of the matrix of gradients and a random noise vector (namely multiplicative noise, M-Noise). Compa...
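
The matrix-vector reading from the abstract can be checked in a few lines; the synthetic per-sample gradients below are an illustrative assumption.

    # Mini-batch SGD gradient as (gradient matrix) x (random sampling vector).
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, B = 100, 5, 10
    G = rng.standard_normal((d, n))      # column i = gradient of sample i (synthetic)

    idx = rng.choice(n, size=B, replace=False)
    s = np.zeros(n)
    s[idx] = 1.0 / B                     # random selection vector; E[s] = (1/n) * ones

    g_batch = G @ s                      # mini-batch stochastic gradient
    g_full = G.mean(axis=1)              # full-batch gradient = E[G @ s]
    print(np.linalg.norm(g_batch - g_full))  # the multiplicative-noise deviation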
Conference Paper
Full-text available
Understanding the generalization of deep learning has attracted much attention recently, and learning algorithms such as stochastic gradient descent (SGD) play an important role in generalization performance. Along this line, we particularly study the anisotropic noise introduced by SGD and investigate its importance for the generalization i...
Article
The progression of prediabetes to diabetes and its reversion to normal glucose tolerance (NGT) are highly heterogeneous. To develop a refined classification of prediabetes, we used data-driven K-means clustering in patients diagnosed with IGT and IFG according to the WHO 1999 criteria in the Pinggu Study (a prospective population-based survey for diabetes in suburban Beijing...
Preprint
Full-text available
Neural architecture search (NAS) has recently attracted much research attention because of its ability to identify better architectures than handcrafted ones. However, many NAS methods, which optimize the search process in a discrete search space, need many GPU days to converge. Recently, DARTS, which constructs a differentiable search space and the...
Preprint
Convolutional neural networks (CNNs) have achieved remarkable performance in various fields, particularly in the domain of computer vision. However, why this architecture works well remains a mystery. In this work we take a small step toward understanding the success of CNNs by investigating the learning dynamics of a two-layer nonlinear conv...
Preprint
Full-text available
We attempt to interpret how adversarially trained convolutional neural networks (AT-CNNs) recognize objects. We design systematic approaches to interpret AT-CNNs in both qualitative and quantitative ways and compare them with normally trained models. Surprisingly, we find that adversarial training alleviates the texture bias of standard CNNs when t...
Preprint
Full-text available
Though neural networks have achieved much progress in various applications, it is still highly challenging for them to learn from a continuous stream of tasks without forgetting. Continual learning, a new learning paradigm, aims to solve this issue. In this work, we propose a new model for continual learning, called Bayesian Optimized Continual Lea...
Preprint
Full-text available
Deep learning achieves state-of-the-art results in many areas. However, recent works have shown that deep networks can be vulnerable to adversarial perturbations, which slightly change the input but lead to incorrect predictions. Adversarial training is an effective way of improving robustness to adversarial examples, typically formulated as...
Preprint
Spatio-temporal graph learning is becoming an increasingly important object of graph study. Many application domains involve highly dynamic graphs where temporal information is crucial, e.g., traffic networks and financial transaction graphs. Despite the constant progress made on learning structured data, there is still a lack of effective means...
Preprint
Spatio-temporal prediction plays an important role in many application areas, especially the traffic domain. However, due to complicated spatio-temporal dependencies and highly non-linear dynamics in road networks, the traffic prediction task is still challenging. Existing works either exhibit heavy training costs or fail to accurately capture the spatio-temp...
Preprint
Full-text available
Deep neural networks have been widely deployed in various machine learning tasks. However, recent works have demonstrated that they are vulnerable to adversarial examples: carefully crafted small perturbations to cause misclassification by the network. In this work, we propose a novel defense mechanism called Boundary Conditional GAN to enhance the...
Preprint
Most previous works explained adversarial examples from several specific perspectives, lacking an integral understanding of the problem. In this paper, we present a systematic study of adversarial examples from three aspects: the amount of training data, task-dependent factors, and model-specific factors. In particular, we show that adver...
Preprint
The effectiveness of Graph Convolutional Networks (GCNs) has been demonstrated in a wide range of graph-based machine learning tasks. However, the update of parameters in GCNs is driven only by labeled nodes, leaving unlabeled data unexploited. In this paper, we apply Virtual Adversarial Training (VAT), an adversarial regularization method based...
Preprint
Graph Convolutional Networks (GCNs) play a crucial role in graph learning tasks; however, learning graph embeddings with few supervised signals is still a difficult problem. In this paper, we propose a novel training algorithm for Graph Convolutional Networks, called the Multi-Stage Self-Supervised (M3S) Training Algorithm, combined with self-supervised le...
Preprint
Full-text available
We interpret the variational inference of the Stochastic Gradient Descent (SGD) as minimizing a new potential function named the quasi-potential. We analytically construct the quasi-potential function in the case when the loss function is convex and admits only one global minimum point. We show in this case that the quasi-potential function is rela...
Conference Paper
Full-text available
Although several attacks have been proposed, text-based CAPTCHAs are still widely used as a security mechanism. One of the reasons for the pervasive use of text captchas is that many of the prior attacks are scheme-specific and require a labor-intensive and time-consuming process to construct. This means that a change in the captcha security f...
Preprint
The ever-increasing size of modern datasets combined with the difficulty of obtaining label information has made semi-supervised learning of significant practical importance in modern machine learning applications. Compared with supervised learning, the key difficulty in semi-supervised learning is how to make full use of the unlabeled data. In ord...
Conference Paper
In this paper, we propose a novel stochastic fractional Hamiltonian Monte Carlo approach which generalizes the Hamiltonian Monte Carlo method within the framework of fractional calculus and Lévy diffusion. Due to the large "jumps" introduced by Lévy noise and the momentum term, the proposed dynamics is capable of exploring the parameter space mor...
Conference Paper
Full-text available
Timely and accurate traffic forecasting is crucial for urban traffic control and guidance. Due to the high nonlinearity and complexity of traffic flow, traditional methods cannot satisfy the requirements of mid- and long-term prediction tasks and often neglect spatial and temporal dependencies. In this paper, we propose a novel deep learning framework, Spa...
Preprint
Most artificial intelligence models have limited ability to solve new tasks quickly without forgetting previously acquired knowledge. The recently emerging paradigm of continual learning aims to solve this issue, in which the model learns various tasks in a sequential fashion. In this work, a novel approach for continual learning is proposed, whic...
Preprint
Full-text available
In statistics and machine learning, approximation of an intractable integral is often achieved using the unbiased Monte Carlo estimator, but the variance of the estimate is generally high in many applications. Control variate approaches are well known to reduce the variance of the estimation. These control variates are typically construc...
Article
Full-text available
State-of-the-art deep neural networks are known to be vulnerable to adversarial examples, formed by applying small but malicious perturbations to the original inputs. Moreover, the perturbations can transfer across models: adversarial examples generated for a specific model will often mislead other unseen models. Consequently, the adversary...
Article
Full-text available
Industrial control systems (ICS), which in many cases are components of critical national infrastructure, are increasingly being connected to other networks and the wider internet motivated by factors such as enhanced operational functionality and improved efficiency. However, set in this context, it is easy to see that the cyber attack surface of...
Article
The goal of traffic forecasting is to predict the future vital indicators (such as speed, volume, and density) of the local traffic network within a reasonable response time. Due to the dynamics and complexity of traffic network flow, typical simulation experiments and classic statistical methods cannot satisfy the requirements of medium- and long-term...
Article
Full-text available
It is widely observed that deep learning models with learned parameters generalize well, even with many more model parameters than training samples. We systematically investigate the underlying reasons why deep neural networks often generalize well, and reveal the difference between minima (with the same training error) that gener...
Article
Full-text available
Distant supervision significantly reduces human efforts in building training data for many classification tasks. While promising, this technique often introduces noise to the generated training data, which can severely affect the model performance. In this paper, we take a deep look at the application of distant supervision in relation extraction....
Article
Minimizing non-convex, high-dimensional objective functions is challenging, especially when training modern deep neural networks. In this paper, a novel approach is proposed which divides the training process into two consecutive phases to obtain better generalization performance: Bayesian sampling and stochastic optimization. The first phase i...
Thesis
In practice, machine learners often care about two key issues: one is how to obtain a more accurate answer with limited data, and the other is how to handle large-scale data (often referred to as “Big Data” in industry) for efficient inference and optimization. One solution to the first issue might be aggregating learned predictions from diverse lo...
Article
We consider convex-concave saddle point problems with a separable structure and non-strongly convex functions. We propose an efficient stochastic block coordinate descent method using adaptive primal-dual updates, which enables flexible parallel optimization for large-scale problems. Our method shares the efficiency and flexibility of block coordin...
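
As a toy illustration of block-coordinate primal-dual updates (with fixed rather than adaptive step sizes, and a made-up equality-constrained problem), consider:

    # Stochastic block-coordinate primal-dual sketch for the saddle problem
    #   min_x max_y 0.5*||x||^2 + y^T (A x - b)
    # i.e. the Lagrangian of: min 0.5*||x||^2 subject to A x = b.
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, k = 10, 20, 5                  # constraints, variables, block size
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    x, y = np.zeros(n), np.zeros(m)
    tau = sigma = 1e-2                   # fixed primal/dual step sizes (assumed)

    for _ in range(20_000):
        j = rng.choice(n, size=k, replace=False)   # random primal block
        x[j] -= tau * (x[j] + A[:, j].T @ y)       # partial primal descent step
        y += sigma * (A @ x - b)                   # dual ascent step

    print(np.linalg.norm(A @ x - b))     # constraint residual shrinks toward 0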
Article
Monte Carlo sampling for Bayesian posterior inference is a common approach used in machine learning. The Markov Chain Monte Carlo procedures that are used are often discrete-time analogues of associated stochastic differential equations (SDEs). These SDEs are guaranteed to leave invariant the required posterior distribution. An area of current rese...
Conference Paper
We consider a generic convex-concave saddle point problem with a separable structure, a form that covers a wide range of machine learning applications. Under this problem structure, we follow the framework of primal-dual updates for saddle point problems, and incorporate stochastic block coordinate descent with adaptive stepsizes into this framework....
Conference Paper
Trading in information markets, such as machine learning markets, has been shown to be an effective approach for aggregating the beliefs of different agents. In a machine learning context, aggregation commonly uses forms of linear opinion pools, or logarithmic (log) opinion pools. It is interesting to relate information market aggregation to the ma...
Article
We consider a generic convex-concave saddle point problem with separable structure, a form that covers a wide range of machine learning applications. Under this problem structure, we follow the framework of primal-dual updates for saddle point problems, and incorporate stochastic block coordinate descent with adaptive stepsizes into this framework. We...
Article
In this work, we discuss a recently proposed approach for supervised dimensionality reduction, the Supervised Distance Preserving Projection (SDPP), and we investigate its applicability to monitoring material properties from spectroscopic observations. Motivated by continuity preservation, the SDPP is a linear projection method where the proximit...
Data
In this work, we consider dimensionality reduction in supervised settings and, specifically, we focus on regression problems. A novel algorithm, the supervised distance preserving projection (SDPP), is proposed. The SDPP minimizes the difference between pairwise distances among projected input covariates and distances among responses locally. This...
Conference Paper
In this work, we discuss a recently proposed approach for supervised dimensionality reduction, the Supervised Distance Preserving Projection (SDPP), and we investigate its applicability to monitoring material properties from spectroscopic observations. Motivated by continuity preservation, the SDPP is a linear projection method where the local geometry
Article
Full-text available
In this work, we consider dimensionality reduction in supervised settings and, specifically, we focus on regression problems. A novel algorithm, the supervised distance preserving projection (SDPP), is proposed. The SDPP minimizes the difference between pairwise distances among projected input covariates and distances among responses locally. This...
Conference Paper
Full-text available
In this work, we discuss a recently proposed approach for supervised dimensionality reduction, the Supervised Distance Preserving Projection (SDPP), and we investigate its applicability to monitoring material properties from spectroscopic observations using Local Linear Regression (LLR). An experimental evaluation is conducted to show the perform...
Conference Paper
Full-text available
Stochastic matrices are arrays whose elements are discrete probabilities. They are widely used in techniques such as Markov chains, probabilistic latent semantic analysis, etc. In such learning problems, the learned matrices, being stochastic matrices, are non-negative, and all or part of their elements sum to one. Conventional multiplicative updat...
Conference Paper
Soft sensors for estimating important quality variables in real time are a key technology in the modern process industry. The successful development of a soft sensor whose performance does not deteriorate with time and changing process characteristics is troublesome and only seldom achieved in real-world setups. The design of soft sensors based on loca...
Conference Paper
Hyperspectral unmixing is a process by which pixel spectra in a scene are decomposed into constituent materials and their corresponding fractions. Nonnegative matrix factorization (NMF) is a recently developed method for such matrix factorization problems. This paper proposes a hyperspectral unmixing algorithm using auto-NMF based on L-curve theory....
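
For reference, the basic NMF step underneath such unmixing methods can be sketched with the standard multiplicative updates; the synthetic data, rank, and iteration count are assumptions, and the L-curve-based automatic rank selection is not reproduced here.

    # NMF sketch with Lee-Seung multiplicative updates: V ~ W H, all nonnegative.
    import numpy as np

    rng = np.random.default_rng(0)
    V = rng.random((50, 40))             # nonnegative data (e.g., pixels x bands)
    r = 4                                # assumed number of endmembers
    W = rng.random((50, r)) + 1e-3
    H = rng.random((r, 40)) + 1e-3

    for _ in range(200):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update abundances H
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update endmembers W

    print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative error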
Conference Paper
Full-text available
Projective Nonnegative Matrix Factorization (PNMF) has demonstrated advantages in both sparse feature extraction and clustering. However, PNMF requires users to specify the column rank of the approximative projection matrix, the value of which is unknown beforehand. In this paper, we propose a method called ARDPNMF to automatically determine the co...
Conference Paper
The idea of Non-negative Matrix Factorization (NMF) has been applied in a wide variety of real-world applications. To improve the usability of NMF, people usually add regularization terms to constrain the process of matrix factorization. Regularized Non-negative Matrix Factorization (RNMF) mainly relies on prior knowledge to set the re...
Conference Paper
This paper proposes a blind source separation (BSS) method based on the quadratic form innovation of original sources, which includes linear predictability and energy (square) predictability as special cases. A simple algorithm is presented by minimizing a loss function of the quadratic form innovation. Simulations by source signals with linear or...
Conference Paper
This paper proposes a fixed-point algorithm for nonnegative independent component analysis, based on the mutual independence of source signals and 'nonpositive' parts of source signals. The algorithm is computationally simple, provides fast convergence, and does not need to choose any learning step sizes. Simulations with independent source signals which...
