Jun Wan

Institute of Automation, Chinese Academy of Sciences · National Laboratory of Pattern Recognition

Doctor of Philosophy

About

138
Publications
37,232
Reads
3,636
Citations

Publications (138)
Preprint
Continual learning (CL) aims to extend deep models from static and enclosed environments to dynamic and complex scenarios, enabling systems to continuously acquire new knowledge of novel categories without forgetting previously learned knowledge. Recent CL models have gradually shifted towards the utilization of pre-trained models (PTMs) with param...
Article
Rehearsal methods based on knowledge distillation (KD) have been widely used in continual learning (CL). However, given memory constraints, few exemplars contain limited variations of previously learned tasks, impeding the effectiveness of KD in retaining long-term knowledge. The decision boundaries learned by the typical KD strategy overfit the li...
Chapter
Facial attributes indicate the intuitive semantic descriptions of a human face like gender, race, expression, and so on. In the past few years, automated facial attribute analysis has become an active field in the area of biometric recognition due to its wide range of possible applications, such as face verification [5, 59], face identification [63...
Article
Facial age estimation has received a lot of attention for its diverse application scenarios. Most existing studies treat each sample equally and aim to reduce the average estimation error for the entire dataset, which can be summarized as General Age Estimation. However, due to the long-tailed distribution prevalent in the dataset, treating all...
Preprint
Full-text available
Sign Language Translation (SLT) is a challenging task due to its cross-domain nature, involving the translation of visual-gestural language to text. Many previous methods employ an intermediate representation, i.e., gloss sequences, to facilitate SLT, thus transforming it into a two-stage task of sign language recognition (SLR) followed by sign lan...
Preprint
Facial age estimation has received a lot of attention for its diverse application scenarios. Most existing studies treat each sample equally and aim to reduce the average estimation error for the entire dataset, which can be summarized as General Age Estimation. However, due to the long-tailed distribution prevalent in the dataset, treating all sam...
Chapter
Through the release of three large-scale datasets and the successful holding of three competitions, we have promoted the development of the face anti-spoofing community. In this chapter, we will summarize our work in recent years from two aspects of datasets and competitions, including the characteristics of datasets CASIA-SURF, CASIA-SURF CeFA and...
Chapter
In recent years, the security of face recognition systems has been increasingly threatened. Face Anti-spoofing (FAS) is essential to secure face recognition systems primarily from various attacks. In order to attract researchers and push forward the state of the art in Face Presentation Attack Detection (PAD), we organized three editions of Face An...
Chapter
In this chapter, we first report the results obtained by each team that has participated in the face anti-spoofing challenge series, including ablation study results when available. Then, we analyze the advantages and disadvantages of the analyzed methods based on the experimental results. Finally, we outline the common characteristics we identifie...
Chapter
The PAD competitions we organized attracted more than 835 teams from home and abroad, most of them from the industry, which shows that the topic of face anti-spoofing is closely related to daily life, and there is an urgent need for advanced algorithms to solve its application needs. Specifically, the Chalearn LAP multi-modal face anti-spoofing att...
Chapter
With the ubiquity of facial authentication systems and the prevalence of security cameras around the world, the impact that facial presentation attack techniques may have is huge. However, research progress in this field has been slowed by a number of factors, including the lack of appropriate and realistic datasets, ethical and privacy issues that...
Preprint
Long-tailed visual recognition has received increasing attention in recent years. Due to the extremely imbalanced data distribution in long-tailed learning, the learning process shows great uncertainties. For example, the predictions of different experts on the same image vary remarkably despite the same training settings. To alleviate the uncertai...
Article
Exemplar rehearsal-based methods with knowledge distillation (KD) have been widely used in class incremental learning (CIL) scenarios. However, they still suffer from performance degradation because of the severe distribution discrepancy between the training and test sets caused by the limited storage memory on previous classes. In this paper, we mathemat...
Article
Motion recognition is a promising direction in computer vision, but the training of video classification models is much harder than images due to insufficient data and considerable parameters. To get around this, some works strive to explore multimodal cues from RGB-D data. Although improving motion recognition to some extent, these methods still f...
Preprint
The availability of handy multi-modal (i.e., RGB-D) sensors has brought about a surge of face anti-spoofing research. However, the current multi-modal face presentation attack detection (PAD) has two defects: (1) The framework based on multi-modal fusion requires providing modalities consistent with the training input, which seriously limits the de...
Preprint
Full-text available
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, most of the studies lacked consideration of long-distance scenarios. Specifically, compared with FAS in traditional scenes such as phone unlocking, face payment, and self-service security inspection, FAS in long-distance such as station...
Preprint
Full-text available
Face anti-spoofing (FAS) is an essential mechanism for safeguarding the integrity of automated face recognition systems. Despite substantial advancements, the generalization of existing approaches to real-world applications remains challenging. This limitation can be attributed to the scarcity and lack of diversity in publicly available FAS dataset...
Preprint
Full-text available
Rehearsal approaches in class incremental learning (CIL) suffer from decision boundary overfitting to new classes, which is mainly caused by two factors: insufficiency of old classes data for knowledge distillation and imbalanced data learning between the learned and new classes because of the limited storage memory. In this work, we present a simp...
Preprint
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in...
Article
The availability of handy multi-modal (i.e., RGB-D) sensors has brought about a surge of face anti-spoofing research. However, the current multi-modal face presentation attack detection (PAD) has two defects: (1) The framework based on multi-modal fusion requires providing modalities consistent with the training input, which seriously limits the...
Article
Source-free unsupervised domain adaptation (SFUDA) aims to conduct prediction on the target domain by leveraging knowledge from the well-trained source model. Due to the absence of source data in the SFUDA setting, the existing methods mainly build the target classifier by fine-tuning the source model incorporated with empirical adaptation losses....
Article
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this...
Preprint
Full-text available
Motion recognition is a promising direction in computer vision, but the training of video classification models is much harder than images due to insufficient data and considerable parameters. To get around this, some works strive to explore multimodal cues from RGB-D data. Although improving motion recognition to some extent, these methods still f...
Article
Most face verification systems verify a person’s identity by comparing the ID document with the live face (also called spot face). More specifically, the spot face can be regarded as the probe image and the face in the ID document can be regarded as the reference image. The identity verification is then conducted by calculating the similarity of the two...
Preprint
Full-text available
Vision Transformers (ViTs) have shown promising performance compared with Convolutional Neural Networks (CNNs), but the training of ViTs is much harder than CNNs. In this paper, we define several metrics, including Dynamic Data Proportion (DDP) and Knowledge Assimilation Rate (KAR), to investigate the training process, and divide it into three peri...
Article
Unsupervised domain adaptation (UDA) is an emerging learning paradigm that models on unlabeled datasets by leveraging model knowledge built on other labeled datasets, in which the statistical distributions of these datasets are usually not identical. Formally, UDA is to leverage knowledge from a labeled source domain to promote an unlabeled target...
Article
Context prediction plays a crucial role in implementing autonomous driving applications. As one of the important context-prediction tasks, crowd-and-vehicle counting is critical for achieving real-time traffic and crowd analysis, consequently facilitating decision-making processes for autonomous vehicles. However, the completion of crowd-and-vehicle co...
Article
Recently, owing to the superior performances, knowledge distillation-based (kd-based) methods with the exemplar rehearsal have been widely applied in class incremental learning (CIL). However, we discover that they suffer from the feature uncalibration problem, which is caused by directly transferring knowledge from the old model immediately to the...
Article
In our daily life, a large number of activities require identity verification, e.g., ePassport gates. Most of those verification systems recognize who you are by matching the ID document photo (ID face) to your live face image (spot face). The ID vs. Spot (IvS) face recognition is different from general face recognition where each dataset usually c...
Preprint
Full-text available
The networks trained on the long-tailed dataset vary remarkably, despite the same training settings, which shows the great uncertainty in long-tailed learning. To alleviate the uncertainty, we propose a Nested Collaborative Learning (NCL), which tackles the problem by collaboratively learning multiple experts together. NCL consists of two core comp...
Preprint
In this paper, a novel approach via embedded tensor manifold regularization for 2D+3D facial expression recognition (FERETMR) is proposed. Firstly, 3D tensors are constructed from 2D face images and 3D face shape models to keep the structural information and correlations. To maintain the local structure (geometric information) of 3D tensor samples...
Article
Full-text available
Face presentation attack detection (PAD) is essential to secure face recognition systems primarily from high-fidelity mask attacks. Most existing 3D mask PAD benchmarks suffer from several drawbacks: 1) a limited number of mask identities, types of sensors, and a total number of videos; 2) low-fidelity quality of facial masks. Basic deep models and...
Article
Facial age estimation has attracted considerable attention owing to its great potential in applications. However, it still falls short of reliable age estimation due to the lack of sufficient training data with accurate age labels. Using conventional semi-supervised methods to exploit unlabeled data appears to be a good solution, but it does not yi...
Preprint
Full-text available
Decoupling spatiotemporal representation refers to decomposing the spatial and temporal features into dimension-independent factors. Although previous RGB-D-based motion recognition methods have achieved promising performance through the tightly coupled multi-modal spatiotemporal representation, they still suffer from (i) optimization difficulty un...
Chapter
Full-text available
Facial age estimation is an important yet very challenging problem in computer vision. To improve the performance of facial age estimation, we first formulate a simple standard baseline and build a much stronger one by collecting the tricks in pre-training, data augmentation, model architecture, and so on. Compared with the standard baseline, the pro...
Preprint
Full-text available
Facial age estimation is an important yet very challenging problem in computer vision. To improve the performance of facial age estimation, we first formulate a simple standard baseline and build a much stronger one by collecting the tricks in pre-training, data augmentation, model architecture, and so on. Compared with the standard baseline, the pro...
Article
Full-text available
Multi-label pedestrian attribute recognition in surveillance is inherently a challenging task due to poor imaging quality, large pose variations, and so on. In this paper, we improve its performance from the following two aspects: (1) We propose a cascaded Split-and-Aggregate Learning (SAL) to capture both the individuality and commonality for all...
Preprint
Full-text available
The threat of 3D masks to face recognition systems is increasingly serious and has attracted wide concern among researchers. To facilitate the study of the algorithms, a large-scale High-Fidelity Mask dataset, namely CASIA-SURF HiFiMask (briefly HiFiMask) has been collected. Specifically, it consists of a total amount of 54,600 videos which are record...
Article
Face presentation attack detection, also termed Face Anti-Spoofing (FAS) [item 1), 2) in the Appendix], is a hot and challenging research topic that has received much attention from the computer vision and pattern recognition communities in the past. Owing to the development of deep learning and big data, recent advances in this and related fields...
Preprint
Full-text available
Dealing with incomplete information is a well studied problem in the context of machine learning and computational intelligence. However, in the context of computer vision, the problem has only been studied in specific scenarios (e.g., certain types of occlusions in specific types of images), although it is common to have incomplete information in...
Article
Full-text available
Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has been made recently in multi-modal learning methods, existing methods still lack effective integration to fully explore synergies among spatio-temporal modalities for gesture recognition. The problems are...
Article
Background: Gesture recognition has attracted significant attention because of its wide range of potential applications. Although multi-modal gesture recognition has made significant progress in recent years, a popular method still is simply fusing prediction scores at the end of each branch, which often ignores complementary features among differen...
Article
Human gesture recognition has drawn much attention in the area of computer vision. However, the performance of gesture recognition is always influenced by some gesture-irrelevant factors like the background and the clothes of performers. Therefore, focusing on the regions of hand/arm is important to the gesture recognition. Meanwhile, a more adapti...
Preprint
Full-text available
Face presentation attack detection (PAD) is essential to secure face recognition systems primarily from high-fidelity mask attacks. Most existing 3D mask PAD benchmarks suffer from several drawbacks: 1) a limited number of mask identities, types of sensors, and a total number of videos; 2) low-fidelity quality of facial masks. Basic deep models and...
Article
Face Presentation Attack Detection (PAD) approaches based on multi-modal data have attracted increasing attention from the research community. However, they require multi-modal face data consistently involved in both the training and testing phases. This severely limits their applicability because most Face Anti-spoofing (FAS) systems are only equip...
Preprint
Full-text available
Human gesture recognition has drawn much attention in the area of computer vision. However, the performance of gesture recognition is always influenced by some gesture-irrelevant factors like the background and the clothes of performers. Therefore, focusing on the regions of hand/arm is important to the gesture recognition. Meanwhile, a more adapti...
Article
Full-text available
Face anti‐spoofing is critical to prevent face recognition systems from a security breach. The biometrics community has achieved impressive progress recently due to the excellent performance of deep neural networks and the availability of large datasets. Although ethnic bias has been verified to severely affect the performance of face recognition s...
Article
Full-text available
First impressions strongly influence social interactions, having a high impact in the personal and professional life. In this paper, we present a deep Classification-Regression Network (CR-Net) for analyzing the Big Five personality problem and further assisting on job interview recommendation in a first impressions setup. The setup is based on the...
Article
Face anti-spoofing (FAS) plays a vital role in securing face recognition systems. Existing methods heavily rely on the expert-designed networks, which may lead to a sub-optimal solution for FAS task. Here we propose the first FAS method based on neural architecture search (NAS), called NAS-FAS, to discover the well-suited task-aware networks. Unlik...
Preprint
Full-text available
Face anti-spoofing (FAS) plays a vital role in securing face recognition systems. Existing methods heavily rely on the expert-designed networks, which may lead to a sub-optimal solution for FAS task. Here we propose the first FAS method based on neural architecture search (NAS), called NAS-FAS, to discover the well-suited task-aware networks. Unlik...
Preprint
Full-text available
Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has been made recently in multi-modal learning methods, existing methods still lack effective integration to fully explore synergies among spatio-temporal modalities for gesture recognition. The problems are...
Article
The ChaLearn large-scale gesture recognition challenge has run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams around the world. This challenge has two tracks, focusing on isolated and continuous ges...
Article
The diversity of multimedia data in the real world usually forms heterogeneous types of feature sets. How to explore the structure information and the relationships among multiple features is still an open problem. In this paper, we propose an unsupervised subspace learning method, named the shared low-rank correlation embedding (SLRCE) for multipl...
Article
Full-text available
Human behaviour analysis has introduced several challenges in various fields, such as applied information theory, affective computing, robotics, biometrics and pattern recognition [...]
Article
Full-text available
The papers in this special issue comprise all aspects of computer vision and pattern recognition devoted to image and video inpainting, including related tasks like denoising, deblurring, sampling, super-resolution enhancement, restoration, hallucination, etc. The special issue was associated with the 2018 Chalearn Looking at People Satellite ECCV Wor...
Preprint
Full-text available
Face anti-spoofing is critical to prevent face recognition systems from a security breach. The biometrics community has achieved impressive progress recently due to the excellent performance of deep neural networks and the availability of large datasets. Although ethnic bias has been verified to severely affect the performance of face recog...
Article
In this paper, we propose a new end-to-end network, named Joint Learning of Attribute and Contextual relations (JLAC), to solve the task of pedestrian attribute recognition. It includes two novel modules: Attribute Relation Module (ARM) and Contextual Relation Module (CRM). For ARM, we construct an attribute graph with attribute-specific features w...
Preprint
Full-text available
Ethnic bias has proven to negatively affect the performance of face recognition systems, and it remains an open research problem in face anti-spoofing. In order to study the ethnic bias for face anti-spoofing, we introduce the largest up-to-date CASIA-SURF Cross-ethnicity Face Anti-spoofing (CeFA) dataset (briefly named CeFA), covering 3 ethnicit...
Chapter
In this chapter, the results obtained by the thirteen teams that qualified for the final phase of the challenge are presented. We first present the performance of the top three teams [Parkin and Grinchuk, 2019, Shen et al., 2019, Zhang et al., 2019a]. Then, the effectiveness of the proposed algorithms is analyzed and we point out some limitations of th...
Chapter
This section describes the top-ranked solutions developed in the context of the ChaLearn Face Anti-spoofing attack detection challenge. Additionally, we also describe the baseline which we have developed for the competition.
Chapter
In this chapter, we first introduce CASIA-SURF, the largest multi-modal dataset for the study of face anti-spoofing. Then, we briefly describe the challenge organized around this dataset.
Preprint
Full-text available
Regardless of the usage of deep learning and handcrafted methods, the dynamic information from videos and the effect of cross-ethnicity are rarely considered in face anti-spoofing. In this work, we propose a static-dynamic fusion mechanism for multi-modal face anti-spoofing. Inspired by motion divergences between real and fake faces, we incorporate...
Chapter
In this paper, we focus on isolated gesture recognition from RGB-D videos. Our main idea is to design an algorithm that can extract global and local information from multi-modality inputs. To this end, we propose a novel attention-based method with 3D convolutional neural network (CNN) for isolated gesture recognition. It includes two part...
Chapter
Dealing with incomplete information is a well studied problem in the context of machine learning and computational intelligence. However, in the context of computer vision, the problem has only been studied in specific scenarios (e.g., certain types of occlusions in specific types of images), although it is common to have incomplete information in...
Preprint
Pedestrian detection has achieved significant progress with the availability of existing benchmark datasets. However, there is a gap in the diversity and density between real world requirements and current pedestrian detection benchmarks: 1) most of existing datasets are taken from a vehicle driving through the regular traffic scenario, usually lea...
Preprint
Face anti-spoofing is essential to prevent face recognition systems from a security breach. Much progress has been made thanks to the availability of face anti-spoofing benchmark datasets in recent years. However, existing face anti-spoofing benchmarks have a limited number of subjects (≤ 170) and modalities (≤ 2), w...
Conference Paper
Full-text available
In this paper, we propose a novel unified network named Deep Hybrid-Aligned Architecture for facial age estimation. It contains global, local and global-local branches. They are jointly optimized and thus can capture multiple types of features with complementary information. In each branch, we employ a separate loss for each sub-network to extract...
Conference Paper
Full-text available
Face anti-spoofing is essential to prevent face recognition systems from a security breach. Much progress has been made thanks to the availability of face anti-spoofing benchmark datasets in recent years. However, existing face anti-spoofing benchmarks have a limited number of subjects (≤ 170) and modalities (≤ 2), which hinder the further develop...
Preprint
Full-text available
The ChaLearn large-scale gesture recognition challenge has been run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams around the world. This challenge has two tracks, focusing on isolated and continuo...
Article
Pedestrian detection has achieved significant progress with the availability of existing benchmark datasets. However, there is a gap in the diversity and density between real world requirements and current pedestrian detection benchmarks: 1) most existing datasets are taken from a vehicle driving through the regular traffic scenario, usually lead...
Article
Full-text available
Recognizing the pedestrian attributes in surveillance scenes is an inherently challenging task, especially for the pedestrian images with large pose variations, complex backgrounds and various camera viewing angles. To select important and discriminative regions or pixels against the variations, three attention mechanisms are proposed, including pa...
Article
Full-text available
Deep multitask learning for face analysis has received increasing attention. In the literature, most existing methods focus on optimizing a main task by jointly learning several auxiliary tasks. It is challenging to consider the performance of each task in a multitask framework due to the following reasons: 1) different face tasks usually rely on di...
Article
The recent studies for face alignment have involved developing an isolated algorithm on well-cropped face images. It is difficult to obtain the expected input by using an off-the-shelf face detector in practical applications. In this paper, we attempt to bridge between face detection and face alignment by establishing a novel joint multi-task model...
Preprint
Human motion recognition is one of the most important branches of human-centered research activities. In recent years, motion recognition based on RGB-D data has attracted much attention. Along with the development in artificial intelligence, deep learning techniques have gained remarkable success in computer vision. In particular, convolutional ne...
Article
In this paper, a 4D tensor model is firstly constructed to explore efficient structural information and correlations from multi-modal data (both 2D and 3D face data). As the dimensionality of the generated 4D tensor is high, a tensor dimensionality reduction technique is in need. Since many real-world high-order data often reside in a low dimension...
Book
The problem of dealing with missing or incomplete data in machine learning and computer vision arises in many applications. Recent strategies make use of generative models to impute missing or corrupted data. Advances in computer vision using deep generative models have found applications in image/video processing, such as denoising, restoration, s...
Preprint
Face anti-spoofing is essential to prevent face recognition systems from a security breach. Much progress has been made thanks to the availability of face anti-spoofing benchmark datasets in recent years. However, existing face anti-spoofing benchmarks have a limited number of subjects (≤ 170) and modalities (≤ 2), w...
Preprint
Full-text available
Deep learning based computer vision fails to work when labeled images are scarce. Recently, Meta learning algorithm has been confirmed as a promising way to improve the ability of learning from few images for computer vision. However, previous Meta learning approaches expose problems: 1) they ignored the importance of attention mechanism for the Me...
