Zhenbo Yu's research while affiliated with Renji Hospital and other places

Publications (14)

Article
Various methods have been proposed to defend against adversarial attacks. However, there is a lack of sufficient theoretical guarantees on performance, thus leading to two problems: First, a deficiency of necessary adversarial training samples might attenuate the normal gradient's back-propagation, which leads to overfitting and gradient masking...
Article
Supervised learning is a mainstay for large discriminative models in 3D computer vision, while large amounts of human-annotated data are the key to achieving state-of-the-art performance. This limitation is particularly notable for large-scale point cloud sequence segmentation tasks, because point-level annotations are very time-consuming and especia...
Article
Recent works on learning-based frameworks for Lagrangian (i.e., particle-based) fluid simulation, though bypassing iterative pressure projection via efficient convolution operators, are still time-consuming due to the excessive number of particles. To address this challenge, we propose a dynamic multi-scale gridding method to reduce the magnitude of el...
Preprint
Humans have a strong intuitive understanding of physical processes such as fluid falling by just a glimpse of such a scene picture, i.e., quickly derived from our immersive visual experiences in memory. This work achieves such a photo-to-fluid-dynamics reconstruction functionality learned from unannotated videos, without any supervision of ground-t...
Article
The key point for an experienced craftsman to repair broken objects effectively is that he must know about them deeply. Similarly, we believe that a model can capture rich geometry information from a shape/scene and generate discriminative representations if it is able to find distorted parts of shapes/scenes and restore them. Inspired by this obse...
Chapter
Given a single chair image, could we wake it up by reconstructing its 3D shape and skeleton, as well as animating its plausible articulations and motions, similar to that of human modeling? It is a new problem that not only goes beyond image-based object reconstruction but also involves articulated animation of generic objects in 3D, which could gi...
Preprint
Human motion synthesis is a long-standing problem with various applications in digital twins and the Metaverse. However, modern deep learning based motion synthesis approaches barely consider the physical plausibility of synthesized motions and consequently they usually produce unrealistic human motions. In order to solve this problem, we propose a...
Article
We address cross-species 3D face morphing (i.e., 3D face morphing from human to animal), a novel problem with promising applications in social media and the movie industry. It remains challenging to preserve target structural information and source fine-grained facial details simultaneously. To this end, we propose an Alignment-aware 3D Face Morphin...
Preprint
Given a single chair image, could we extract its 3D shape and animate its plausible articulations and motions? This is an interesting new question that may have numerous downstream augmented reality and virtual reality applications. In this paper, we propose an automated approach to tackle the entire process of reconstructing such 3D generic objects...

Citations

... Current approaches for fully supervised point cloud semantic segmentation heavily rely on deep learning networks. However, progress in this area is hindered by the laborious process of annotating large-scale point cloud datasets, impeding the development of deep point cloud semantic segmentation methods [4,5]. It is an important topic for 3D scene understanding with a wide range of applications, such as robotics, autonomous driving, and augmented reality. ...
... In our systematically undertaken review of published literature, only one prior work was identified that investigated unsupervised 2D-3D lifting from a single image, OCR-Pose by Wang et al. [27]. OCR-Pose incorporates two modules: a topology invariant contrastive learning (TiCLR) module and a view equivariant contrastive learning (VeCLR) module. ...
... Works in motion synthesis are predominantly directed towards generating controllable, general actions for use in animation [42,27,36,3]. Yan et al. [54] proposed a convolutional architecture named Convolutional Sequence Generation Network (CSGN) for generating skeleton sequences for action recognition. The authors employed spatial graph downsampling and temporal downsampling to generate the whole sequence in a single pass, using latent vectors sampled from Gaussian processes. ...
... Yan et al. [27] proposed an alignment-aware 3D face morphing framework utilizing an encoder-decoder structure, effectively morphing three-dimensional human face mesh data into an animal face with alignment-aware control. Wang et al. [28] proposed neural cages, a method for cage-based deformations that predicts deformations in a more natural space with greater accuracy than existing techniques. ...
... In monocular approaches, temporally correlated 2D poses can be estimated from an input video and used as a supervision signal for a frame-specific 3D pose estimation [15,55,30]. In-the-wild approaches have mostly relied on 2D pose as ground truths to supervise intermediate 3D estimates [54,12]. Other approaches perform monocular 3D pose estimation using only 2D pose supervision [27,9,2,50]. ...
... However, we face challenges in adopting a diffusion-based approach to tackle human mesh recovery. Firstly, it is difficult to directly produce complicated 3D mesh outputs with a single RGB image as input; as shown in previous works [7,58,42,29,8,22,41,59], it is important to leverage some prior knowledge (e.g., pose information, segmentation maps) as input to guide the mesh recovery process, which is not performed in the standard diffusion process. Secondly, it is difficult to predict an accurate mesh output that corresponds to the RGB image, using only the standard diffusion process [16]. ...
... This concept was further extended to 3D [46]. Another keypoint-based approach is 2D compositional human pose (CHP) [47], which introduces a blend of bone and limb vectors and has been further developed into 3D [48,49]. Alternatively, richer information can be obtained using model-based representations. ...