Zhanxing Zhu
Peking University | PKU · School of Mathematical Sciences

PhD in machine learning

About

87 Publications
21,283 Reads
5,061 Citations
Additional affiliations
September 2012 - present
The University of Edinburgh
Position
  • PhD Student
Description
  • Large-scale distributed and parallel optimization in machine learning; market mechanisms
November 2009 - December 2012
Aalto University
Position
  • Research Assistant
Description
  • Research projects: Local linear models, supervised dimensionality reduction and its application to chemical process monitoring, nonnegative learning.
October 2007 - July 2009
Beihang University (BUAA)
Position
  • Research Assistant
Description
  • Nonnegative matrix factorization for hyperspectral image unmixing

Publications

Publications (87)
Article
Aim: To investigate whether stratifying participants with prediabetes according to their diabetes progression risks (PR) could affect their responses to interventions. Methods: We developed a machine learning‐based model to predict the 1‐year diabetes PR (ML‐PR) with the fewest predictors. The model was developed and internally validated in participa...
Conference Paper
Full-text available
Differentiable physics modeling combines physics models with gradient-based learning to provide model explicability and data efficiency. It has been used to learn dynamics, solve inverse problems and facilitate design, and its impact is just beginning. Current successes have concentrated on general physics models such as rigid bodies, deformable s...
Article
Despite the empirical success in various domains, it has been revealed that deep neural networks are vulnerable to maliciously perturbed input data that can dramatically degrade their performance. These are known as adversarial attacks. To counter adversarial attacks, adversarial training formulated as a form of robust optimization has been demonst...
Preprint
This is the Proceedings of ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI. Deep neural networks (DNNs) have undoubtedly brought great success to a wide range of applications in computer vision, computational linguistics, and AI. However, foundational principles underlying the DNNs' success and their r...
Article
The continual learning paradigm learns from a continuous stream of tasks in an incremental manner and aims to overcome the notorious issue of catastrophic forgetting. In this work, we propose a new adaptive progressive network framework including two models for continual learning: Reinforced Continual Learning (RCL) and Bayesian Optimized Continual L...
Article
Despite the empirical success in various domains, it has been revealed that deep neural networks are vulnerable to maliciously perturbed input data that can severely degrade their performance. These are known as adversarial attacks. To counter adversarial attacks, adversarial training formulated as a form of robust optimization has been demonstrated to be ef...
Article
Spatial-temporal data forecasting of traffic flow is a challenging task because of the complicated spatial dependencies and dynamic temporal patterns between different roads. Existing frameworks usually utilize a given spatial adjacency graph and sophisticated mechanisms for modeling spatial and temporal correlations. However, limited represen...
Preprint
It is well known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essential for both the optimization and generalization of deep networks. Some works have attempted to artificially simulate SGN by injecting random noise to improve deep learning. However, it turned out that the injected simple random n...
Article
Full-text available
Stochastic Gradient Langevin Dynamics (SGLD) has been widely used for Bayesian sampling from certain probability distributions, incorporating derivatives of the log-posterior. With the derivative evaluation of the log-posterior distribution, SGLD methods generate samples from the distribution by performing as a thermostat dynamics that trave...
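
To make the update such samplers perform concrete, here is a minimal SGLD sketch in Python; the one-dimensional standard-normal target, step size, and iteration count are illustrative assumptions, not details taken from the paper.

    # Minimal SGLD sketch: sample from a toy 1-D posterior N(0, 1).
    # Target, step size, and iteration count are illustrative assumptions.
    import numpy as np

    def grad_log_post(theta):
        return -theta  # gradient of log N(0, 1)

    rng = np.random.default_rng(0)
    eps = 1e-2                 # step size
    theta, samples = 5.0, []
    for _ in range(10_000):
        # Langevin step: half gradient step plus Gaussian noise of scale sqrt(eps)
        theta += 0.5 * eps * grad_log_post(theta) + np.sqrt(eps) * rng.standard_normal()
        samples.append(theta)
    print(np.mean(samples[2000:]), np.var(samples[2000:]))  # ~0 and ~1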
Preprint
Despite the empirical success in various domains, it has been revealed that deep neural networks are vulnerable to maliciously perturbed input data that can severely degrade their performance. These are known as adversarial attacks. To counter adversarial attacks, adversarial training formulated as a form of robust optimization has been demonstrated to be ef...
Preprint
Spatial-temporal data forecasting of traffic flow is a challenging task because of the complicated spatial dependencies and dynamic temporal patterns between different roads. Existing frameworks typically utilize a given spatial adjacency graph and sophisticated mechanisms for modeling spatial and temporal correlations. However, limited repres...
Preprint
Full-text available
We consider the fundamental problem of how to automatically construct summary statistics for implicit generative models, where evaluation of the likelihood function is intractable but sampling / simulating data from the model is possible. The idea is to frame the task of constructing sufficient statistics as learning mutual information maximizing re...
Preprint
Knowledge distillation is a strategy for training a student network under the guidance of the soft output from a teacher network. It has been a successful method for model compression and knowledge transfer. However, knowledge distillation currently lacks a convincing theoretical understanding. On the other hand, recent findings on the neural tangent kernel enabl...
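
For orientation, the following Python sketch shows the standard soft-target distillation loss (a temperature-scaled KL divergence); the temperature value and random logits are illustrative assumptions, and this is not the theoretical construction of the preprint.

    # Soft-target distillation loss sketch (temperature-scaled KL divergence).
    import numpy as np

    def softmax(z, T=1.0):
        z = z / T
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def distill_loss(student_logits, teacher_logits, T=4.0):
        # KL(teacher || student) on temperature-softened distributions,
        # scaled by T^2 to keep gradient magnitudes comparable across T.
        p_t = softmax(teacher_logits, T)
        p_s = softmax(student_logits, T)
        return (T ** 2) * np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()

    rng = np.random.default_rng(0)
    print(distill_loss(rng.standard_normal((8, 10)), rng.standard_normal((8, 10))))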
Preprint
Full-text available
Data augmentation is an effective and universal technique for improving the generalization performance of deep neural networks. It can enrich the diversity of training samples, which is essential in medical image segmentation tasks because 1) the scale of medical image datasets is typically small, which may increase the risk of overfitting; 2) the shape a...
Chapter
Data augmentation is an effective and universal technique for improving the generalization performance of deep neural networks. It can enrich the diversity of training samples, which is essential in medical image segmentation tasks because 1) the scale of medical image datasets is typically small, which may increase the risk of overfitting; 2) the shape a...
Preprint
Convolutional Neural Networks (CNNs) are known to rely more on local texture than on global shape when making decisions. Recent work also indicates a close relationship between a CNN's texture bias and its robustness against distribution shift, adversarial perturbation, random corruption, etc. In this work, we attempt to improve various kinds o...
Conference Paper
Deep neural networks have been proven vulnerable to so-called adversarial attacks. Recently there have been efforts to defend against such attacks with deep generative models. These defenses often predict by inverting the deep generative models rather than by simple feedforward propagation. Such defenses are difficult to attack due to the obfuscated...
Preprint
Full-text available
We comprehensively reveal the learning dynamics of deep neural networks (DNNs) with batch normalization (BN) and weight decay (WD), named Spherical Motion Dynamics (SMD). Our theorem on SMD is based on the scale-invariance of weights caused by BN and the regularization effect of WD. SMD shows that the optimization trajectory of weights is like a...
Preprint
Full-text available
The scarcity of class-labeled data is a ubiquitous bottleneck in a wide range of machine learning problems. While abundant unlabeled data normally exist and provide a potential solution, it is extremely challenging to exploit them. In this paper, we address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation...
Preprint
The wide deployment of deep neural networks, though achieving great success in many domains, raises severe safety and reliability concerns. Existing adversarial attack generation and automatic verification techniques cannot formally verify whether a network is globally robust, i.e., whether adversarial examples are absent from the entire input space. To add...
Chapter
Full-text available
In statistics and machine learning, approximation of an intractable integral is often achieved using the unbiased Monte Carlo estimator, but the variance of the estimate is generally high in many applications. Control variate approaches are well known to reduce the variance of the estimation. These control variates are typically construc...
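
A minimal numerical sketch of the control-variate idea, assuming a toy integrand: the choice f(x) = exp(x) with control g(x) = x (known mean zero under N(0, 1)) is purely illustrative.

    # Control-variate sketch: estimate E[f(X)] for X ~ N(0, 1).
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(100_000)
    f = np.exp(x)                            # integrand; true mean is e^0.5
    g = x                                    # control variate with known E[g] = 0
    beta = np.cov(f, g)[0, 1] / np.var(g)    # variance-minimizing coefficient
    plain = f.mean()                         # plain Monte Carlo estimate
    cv = (f - beta * g).mean()               # same expectation, lower variance
    print(plain, cv, np.exp(0.5))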
Article
Full-text available
Neural architecture search (NAS) attracts much research attention because of its ability to identify better architectures than handcrafted ones. Recently, differentiable search methods have become the state of the art in NAS, obtaining high-performance architectures in several days. However, they still suffer from huge computation costs and infe...
Article
Graph Convolutional Networks (GCNs) play a crucial role in graph learning tasks; however, learning graph embeddings with few supervised signals is still a difficult problem. In this paper, we propose a novel training algorithm for Graph Convolutional Networks, called the Multi-Stage Self-Supervised (M3S) Training Algorithm, combined with self-supervised
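
For context, a single GCN propagation layer (in the common Kipf-and-Welling form) looks like the Python sketch below; the toy graph and random weights are illustrative assumptions, and the M3S training stages are not reproduced here.

    # One GCN layer: H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W), on a toy graph.
    import numpy as np

    A = np.array([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])              # adjacency of a 3-node path graph
    A_hat = A + np.eye(3)                     # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))  # symmetric normalization

    rng = np.random.default_rng(0)
    X = rng.standard_normal((3, 4))           # node features
    W = rng.standard_normal((4, 2))           # layer weights
    H = np.maximum(A_norm @ X @ W, 0.0)       # ReLU activation
    print(H)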
Preprint
Full-text available
Randomized classifiers have been shown to provide a promising approach for achieving certified robustness against adversarial attacks in deep learning. However, most existing methods only leverage Gaussian smoothing noise and only work for $\ell_2$ perturbation. We propose a general framework of adversarial certification with non-Gaussian noise and...
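
As background, the Gaussian special case that the paper generalizes can be sketched as a majority vote under input noise; the stand-in base classifier, noise level, and sample count below are illustrative assumptions.

    # Randomized smoothing sketch: predict by majority vote under Gaussian noise.
    import numpy as np

    def base_classifier(x):
        # Hypothetical hard classifier: sign of the first coordinate
        return int(x[0] > 0)

    def smoothed_predict(x, sigma=0.5, n=1000, seed=0):
        rng = np.random.default_rng(seed)
        votes = np.zeros(2, dtype=int)
        for _ in range(n):
            votes[base_classifier(x + sigma * rng.standard_normal(x.shape))] += 1
        return int(votes.argmax())

    print(smoothed_predict(np.array([0.3, -1.0])))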
Article
Text-based CAPTCHAs remain a popular scheme for distinguishing between a legitimate human user and an automated program. This article presents a novel generic text captcha solver based on a generative adversarial network. As a departure from prior text captcha solvers that require a labor-intensive and time-consuming process to construct, our sc...
Article
Adversarial attacks have stimulated research interest in the field of deep learning security. However, most existing adversarial attack methods are developed for classification. In this paper, we use Projected Gradient Descent (PGD), the strongest first-order attack method for classification, to produce adversarial examples on the total loss of F...
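
A generic L-infinity PGD loop, for reference, looks like the sketch below; the quadratic stand-in loss is an assumption for self-containment, not the detection loss used in the paper.

    # Generic L_inf PGD sketch: ascend a differentiable loss within an eps-ball.
    import numpy as np

    def loss_grad(x):
        # Hypothetical loss L(x) = 0.5 * ||x - target||^2 and its gradient
        target = np.ones_like(x)
        return x - target

    def pgd_attack(x0, eps=0.1, alpha=0.02, steps=20):
        x = x0.copy()
        for _ in range(steps):
            x = x + alpha * np.sign(loss_grad(x))   # signed gradient ascent step
            x = np.clip(x, x0 - eps, x0 + eps)      # project back into the L_inf ball
        return x

    x_adv = pgd_attack(np.zeros(5))
    print(np.abs(x_adv).max())  # never exceeds eps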
Preprint
Regularization plays a crucial role in machine learning models, especially for deep neural networks. Existing regularization techniques mainly rely on the i.i.d. assumption and employ only the information of the current sample, without leveraging the neighboring information between samples. In this work, we propose a general regularizer calle...
Conference Paper
Full-text available
Transfer learning has frequently been used to improve deep neural network training by incorporating the weights of pre-trained networks as the starting point of optimization, for regularization. While deep transfer learning can usually boost performance with better accuracy and faster convergence, transferring weights from inappropriate networ...
Preprint
Full-text available
Transfer learning has frequently been used to improve deep neural network training by incorporating the weights of pre-trained networks as the starting point of optimization, for regularization. While deep transfer learning can usually boost performance with better accuracy and faster convergence, transferring weights from inappropriate networ...
Chapter
The effectiveness of Graph Convolutional Networks (GCNs) has been demonstrated in a wide range of graph-based machine learning tasks. However, the update of parameters in GCNs is driven only by labeled nodes, leaving unlabeled data unexploited. In this paper, we apply Virtual Adversarial Training (VAT), an adversarial regularization method based...
Chapter
In this paper, we study how to improve the performance of Question Answering over Knowledge Bases (KBQA) by utilizing factoid Question Generation (QG). The task of question generation (QG) is to generate a corresponding natural language question given the input answer, while question answering (QA) is the reverse task of finding a proper answer given
Preprint
Full-text available
Data-driven modeling of human motions is ubiquitous in computer graphics and computer vision applications, such as synthesizing realistic motions or recognizing actions. Recent research has shown that such problems can be approached by learning a natural motion manifold using deep learning to address the shortcomings of traditional data-driven appr...
Preprint
Full-text available
Data-driven modeling of human motions is ubiquitous in computer graphics and vision applications. Such problems can be approached by deep learning on a large amount of data. However, existing methods can be sub-optimal for two reasons. First, skeletal information has not been fully utilized. Unlike images, it is difficult to define spatial proximity i...
Preprint
The design of deep graph models remains to be investigated, and the crucial part is how to explore and exploit the knowledge from different hops of neighbors in an efficient way. In this paper, we propose a novel RNN-like deep graph neural network architecture by incorporating AdaBoost into the computation of the network; the proposed graph co...
Article
Full-text available
Data-driven modeling of human motions is ubiquitous in computer graphics and computer vision applications, such as synthesizing realistic motions or recognizing actions. Recent research has shown that such problems can be approached by learning a natural motion manifold using deep learning on a large amount of data, to address the shortcomings of trad...
Conference Paper
Full-text available
Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) methods have been widely used to sample from certain probability distributions, incorporating (kernel) density derivatives and/or given datasets. Instead of exploring new samples from kernel spaces, this work proposes a novel SGHMC sampler, namely Spectral Hamiltonian Monte Carlo (SpHMC),...
Preprint
Full-text available
The randomness in Stochastic Gradient Descent (SGD) is considered to play a central role in the observed strong generalization capability of deep learning. In this work, we re-interpret the stochastic gradient of vanilla SGD as a matrix-vector product of the matrix of gradients and a random noise vector (namely multiplicative noise, M-Noise). Compa...
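
The matrix-vector reading from the abstract can be checked in a few lines; the synthetic per-sample gradients below are an illustrative assumption.

    # Mini-batch SGD gradient as (gradient matrix) x (random sampling vector).
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, B = 100, 5, 10
    G = rng.standard_normal((d, n))      # column i = gradient of sample i (synthetic)

    idx = rng.choice(n, size=B, replace=False)
    s = np.zeros(n)
    s[idx] = 1.0 / B                     # random selection vector; E[s] = (1/n) * ones

    g_batch = G @ s                      # mini-batch stochastic gradient
    g_full = G.mean(axis=1)              # full-batch gradient = E[G @ s]
    print(np.linalg.norm(g_batch - g_full))  # the multiplicative-noise deviation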
Conference Paper
Full-text available
Understanding the generalization of deep learning has attracted much attention recently, and learning algorithms such as stochastic gradient descent (SGD) play an important role in generalization performance. Along this line, we particularly study the anisotropic noise introduced by SGD and investigate its importance for the generalization i...
Article
The progression of prediabetes to diabetes and its reversion to normal glucose tolerance (NGT) are highly heterogeneous. To develop a refined classification of prediabetes, we used data-driven K-means clustering in patients diagnosed with IGT and IFG according to the WHO 1999 criteria in the Pinggu Study (a prospective population-based survey for diabetes in suburban Beijing...
Preprint
Full-text available
Neural architecture search (NAS) has recently attracted much research attention because of its ability to identify better architectures than handcrafted ones. However, many NAS methods, which optimize the search process in a discrete search space, need many GPU days to converge. Recently, DARTS, which constructs a differentiable search space and the...
Preprint
Convolutional neural networks (CNNs) have achieved remarkable performance in various fields, particularly in the domain of computer vision. However, why this architecture works well remains a mystery. In this work we take a small step toward understanding the success of CNNs by investigating the learning dynamics of a two-layer nonlinear conv...
Preprint
Full-text available
We attempt to interpret how adversarially trained convolutional neural networks (AT-CNNs) recognize objects. We design systematic approaches to interpret AT-CNNs in both qualitative and quantitative ways and compare them with normally trained models. Surprisingly, we find that adversarial training alleviates the texture bias of standard CNNs when t...
Preprint
Full-text available
Though neural networks have achieved much progress in various applications, it is still highly challenging for them to learn from a continuous stream of tasks without forgetting. Continual learning, a new learning paradigm, aims to solve this issue. In this work, we propose a new model for continual learning, called Bayesian Optimized Continual Lea...
Preprint
Full-text available
Deep learning achieves state-of-the-art results in many areas. However, recent works have shown that deep networks can be vulnerable to adversarial perturbations, which slightly change the input but lead to incorrect predictions. Adversarial training is an effective way of improving robustness to adversarial examples, typically formulated as...
Preprint
Spatio-temporal graph learning is becoming an increasingly important object of graph study. Many application domains involve highly dynamic graphs where temporal information is crucial, e.g., traffic networks and financial transaction graphs. Despite the constant progress made on learning structured data, there is still a lack of effective means...
Preprint
Spatio-temporal prediction plays an important role in many application areas, especially the traffic domain. However, due to complicated spatio-temporal dependencies and highly non-linear dynamics in road networks, the traffic prediction task is still challenging. Existing works either exhibit heavy training costs or fail to accurately capture the spatio-temp...
Preprint
Full-text available
Deep neural networks have been widely deployed in various machine learning tasks. However, recent works have demonstrated that they are vulnerable to adversarial examples: carefully crafted small perturbations to cause misclassification by the network. In this work, we propose a novel defense mechanism called Boundary Conditional GAN to enhance the...
Preprint
Most previous works explained adversarial examples from several specific perspectives, lacking an integral understanding of the problem. In this paper, we present a systematic study of adversarial examples from three aspects: the amount of training data, task-dependent factors, and model-specific factors. In particular, we show that adver...
Preprint
The effectiveness of Graph Convolutional Networks (GCNs) has been demonstrated in a wide range of graph-based machine learning tasks. However, the update of parameters in GCNs is driven only by labeled nodes, leaving unlabeled data unexploited. In this paper, we apply Virtual Adversarial Training (VAT), an adversarial regularization method based...
Preprint
Graph Convolutional Networks (GCNs) play a crucial role in graph learning tasks; however, learning graph embeddings with few supervised signals is still a difficult problem. In this paper, we propose a novel training algorithm for Graph Convolutional Networks, called the Multi-Stage Self-Supervised (M3S) Training Algorithm, combined with self-supervised le...
Preprint
Full-text available
We interpret the variational inference of the Stochastic Gradient Descent (SGD) as minimizing a new potential function named the quasi-potential. We analytically construct the quasi-potential function in the case when the loss function is convex and admits only one global minimum point. We show in this case that the quasi-potential function is rela...
Conference Paper
Full-text available
Although several attacks have been proposed, text-based CAPTCHAs are still widely used as a security mechanism. One of the reasons for the pervasive use of text captchas is that many of the prior attacks are scheme-specific and require a labor-intensive and time-consuming process to construct. This means that a change in the captcha security f...
Preprint
The ever-increasing size of modern datasets combined with the difficulty of obtaining label information has made semi-supervised learning of significant practical importance in modern machine learning applications. Compared with supervised learning, the key difficulty in semi-supervised learning is how to make full use of the unlabeled data. In ord...
Conference Paper
In this paper, we propose a novel stochastic fractional Hamiltonian Monte Carlo approach which generalizes the Hamiltonian Monte Carlo method within the framework of fractional calculus and Lévy diffusion. Due to the large "jumps" introduced by Lévy noise and the momentum term, the proposed dynamics is capable of exploring the parameter space mor...
Conference Paper
Full-text available
Timely and accurate traffic forecasting is crucial for urban traffic control and guidance. Due to the high nonlinearity and complexity of traffic flow, traditional methods cannot satisfy the requirements of mid- and long-term prediction tasks and often neglect spatial and temporal dependencies. In this paper, we propose a novel deep learning framework, Spa...
Preprint
Most artificial intelligence models have limited ability to solve new tasks quickly without forgetting previously acquired knowledge. The recently emerging paradigm of continual learning aims to solve this issue, in which the model learns various tasks in a sequential fashion. In this work, a novel approach for continual learning is proposed, whic...
Preprint
Full-text available
In statistics and machine learning, approximation of an intractable integral is often achieved using the unbiased Monte Carlo estimator, but the variance of the estimate is generally high in many applications. Control variate approaches are well known to reduce the variance of the estimation. These control variates are typically construc...
Article
Full-text available
State-of-the-art deep neural networks are known to be vulnerable to adversarial examples, formed by applying small but malicious perturbations to the original inputs. Moreover, the perturbations can transfer across models: adversarial examples generated for a specific model will often mislead other unseen models. Consequently, the adversary...
Article
Full-text available
Industrial control systems (ICS), which in many cases are components of critical national infrastructure, are increasingly being connected to other networks and the wider internet motivated by factors such as enhanced operational functionality and improved efficiency. However, set in this context, it is easy to see that the cyber attack surface of...
Article
The goal of traffic forecasting is to predict the future vital indicators (such as speed, volume, and density) of the local traffic network within a reasonable response time. Due to the dynamics and complexity of traffic network flow, typical simulation experiments and classic statistical methods cannot satisfy the requirements of medium- and long-term...
Article
Full-text available
It is widely observed that deep learning models with learned parameters generalize well, even with many more model parameters than training samples. We systematically investigate the underlying reasons why deep neural networks often generalize well, and reveal the difference between minima (with the same training error) that gener...
Article
Full-text available
Distant supervision significantly reduces human efforts in building training data for many classification tasks. While promising, this technique often introduces noise to the generated training data, which can severely affect the model performance. In this paper, we take a deep look at the application of distant supervision in relation extraction....
Article
Minimizing non-convex, high-dimensional objective functions is challenging, especially when training modern deep neural networks. In this paper, a novel approach is proposed which divides the training process into two consecutive phases to obtain better generalization performance: Bayesian sampling and stochastic optimization. The first phase i...
Thesis
In practice, machine learners often care about two key issues: one is how to obtain a more accurate answer with limited data, and the other is how to handle large-scale data (often referred to as “Big Data” in industry) for efficient inference and optimization. One solution to the first issue might be aggregating learned predictions from diverse lo...
Article
We consider convex-concave saddle point problems with a separable structure and non-strongly convex functions. We propose an efficient stochastic block coordinate descent method using adaptive primal-dual updates, which enables flexible parallel optimization for large-scale problems. Our method shares the efficiency and flexibility of block coordin...
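
As a toy illustration of block-coordinate primal-dual updates (with fixed rather than adaptive step sizes, and a made-up equality-constrained problem), consider:

    # Stochastic block-coordinate primal-dual sketch for the saddle problem
    #   min_x max_y 0.5*||x||^2 + y^T (A x - b)
    # i.e. the Lagrangian of: min 0.5*||x||^2 subject to A x = b.
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, k = 10, 20, 5                  # constraints, variables, block size
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    x, y = np.zeros(n), np.zeros(m)
    tau = sigma = 1e-2                   # fixed primal/dual step sizes (assumed)

    for _ in range(20_000):
        j = rng.choice(n, size=k, replace=False)   # random primal block
        x[j] -= tau * (x[j] + A[:, j].T @ y)       # partial primal descent step
        y += sigma * (A @ x - b)                   # dual ascent step

    print(np.linalg.norm(A @ x - b))     # constraint residual shrinks toward 0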
Article
Monte Carlo sampling for Bayesian posterior inference is a common approach used in machine learning. The Markov Chain Monte Carlo procedures that are used are often discrete-time analogues of associated stochastic differential equations (SDEs). These SDEs are guaranteed to leave invariant the required posterior distribution. An area of current rese...
Conference Paper
We consider a generic convex-concave saddle point problem with a separable structure, a form that covers a wide range of machine learning applications. Under this problem structure, we follow the framework of primal-dual updates for saddle point problems, and incorporate stochastic block coordinate descent with adaptive stepsizes into this framework....
Conference Paper
Trading in information markets, such as machine learning markets, has been shown to be an effective approach for aggregating the beliefs of different agents. In a machine learning context, aggregation commonly uses forms of linear opinion pools, or logarithmic (log) opinion pools. It is interesting to relate information market aggregation to the ma...
Article
We consider a generic convex-concave saddle point problem with separable structure, a form that covers a wide range of machine learning applications. Under this problem structure, we follow the framework of primal-dual updates for saddle point problems, and incorporate stochastic block coordinate descent with adaptive stepsizes into this framework. We...
Article
In this work, we discuss a recently proposed approach for supervised dimensionality reduction, the Supervised Distance Preserving Projection (SDPP), and we investigate its applicability to monitoring material properties from spectroscopic observations. Motivated by continuity preservation, the SDPP is a linear projection method where the proximit...
Data
In this work, we consider dimensionality reduction in supervised settings and, specifically, we focus on regression problems. A novel algorithm, the supervised distance preserving projection (SDPP), is proposed. The SDPP minimizes the difference between pairwise distances among projected input covariates and distances among responses locally. This...
Conference Paper
In this work, we discuss a recently proposed approach for supervised dimensionality reduction, the Supervised Distance Preserving Projection (SDPP), and we investigate its applicability to monitoring material properties from spectroscopic observations. Motivated by continuity preservation, the SDPP is a linear projection method where the local geometry
Article
Full-text available
In this work, we consider dimensionality reduction in supervised settings and, specifically, we focus on regression problems. A novel algorithm, the supervised distance preserving projection (SDPP), is proposed. The SDPP minimizes the difference between pairwise distances among projected input covariates and distances among responses locally. This...
Conference Paper
Full-text available
In this work, we discuss a recently proposed approach for supervised dimensionality reduction, the Supervised Distance Preserving Projection (SDPP), and we investigate its applicability to monitoring material properties from spectroscopic observations using Local Linear Regression (LLR). An experimental evaluation is conducted to show the perform...
Conference Paper
Full-text available
Stochastic matrices are arrays whose elements are discrete probabilities. They are widely used in techniques such as Markov chains, probabilistic latent semantic analysis, etc. In such learning problems, the learned matrices, being stochastic matrices, are non-negative, and all or part of their elements sum to one. Conventional multiplicative updat...
Conference Paper
Soft sensors for estimating important quality variables in real time are a key technology in the modern process industry. The successful development of a soft sensor whose performance does not deteriorate with time and changing process characteristics is troublesome and only seldom achieved in real-world setups. The design of soft sensors based on loca...
Conference Paper
Hyperspectral unmixing is a process by which pixel spectra in a scene are decomposed into constituent materials and their corresponding fractions. Nonnegative matrix factorization (NMF) is a recently developed method for such matrix factorization problems. This paper proposes a hyperspectral unmixing algorithm using auto-NMF based on L-curve theory....
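
For reference, the basic NMF step underneath such unmixing methods can be sketched with the standard multiplicative updates; the synthetic data, rank, and iteration count are assumptions, and the L-curve-based automatic rank selection is not reproduced here.

    # NMF sketch with Lee-Seung multiplicative updates: V ~ W H, all nonnegative.
    import numpy as np

    rng = np.random.default_rng(0)
    V = rng.random((50, 40))             # nonnegative data (e.g., pixels x bands)
    r = 4                                # assumed number of endmembers
    W = rng.random((50, r)) + 1e-3
    H = rng.random((r, 40)) + 1e-3

    for _ in range(200):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update abundances H
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update endmembers W

    print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative error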
Conference Paper
Full-text available
Projective Nonnegative Matrix Factorization (PNMF) has demonstrated advantages in both sparse feature extraction and clustering. However, PNMF requires users to specify the column rank of the approximative projection matrix, the value of which is unknown beforehand. In this paper, we propose a method called ARDPNMF to automatically determine the co...
Conference Paper
The idea of Non-negative Matrix Factorization (NMF) has been applied in a wide variety of real-world applications. To improve the usability of NMF, people usually add regularization terms to constrain the process of matrix factorization. Regularized Non-negative Matrix Factorization (RNMF) mainly relies on prior knowledge to set the re...
Conference Paper
This paper proposes a blind source separation (BSS) method based on the quadratic form innovation of original sources, which includes linear predictability and energy (square) predictability as special cases. A simple algorithm is presented by minimizing a loss function of the quadratic form innovation. Simulations by source signals with linear or...
Conference Paper
This paper proposes a fixed-point algorithm for nonnegative independent component analysis, based on the mutual independence of source signals and 'nonpositive' parts of source signals. The algorithm is computationally simple, provides fast convergence, and does not need to choose any learning step sizes. Simulations with independent source signals which...
