Article

DiffusionNet: Discretization Agnostic Learning on Surfaces

Authors: Nicholas Sharp, Souhaib Attaiki, Keenan Crane, Maks Ovsjanikov

Abstract

We introduce a new general-purpose approach to deep learning on three-dimensional surfaces based on the insight that a simple diffusion layer is highly effective for spatial communication. The resulting networks are automatically robust to changes in resolution and sampling of a surface—a basic property that is crucial for practical applications. Our networks can be discretized on various geometric representations, such as triangle meshes or point clouds, and can even be trained on one representation and then applied to another. We optimize the spatial support of diffusion as a continuous network parameter ranging from purely local to totally global, removing the burden of manually choosing neighborhood sizes. The only other ingredients in the method are a multi-layer perceptron applied independently at each point and spatial gradient features to support directional filters. The resulting networks are simple, robust, and efficient. Here, we focus primarily on triangle mesh surfaces and demonstrate state-of-the-art results for a variety of tasks, including surface classification, segmentation, and non-rigid correspondence.
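The abstract's central mechanism can be made concrete in a few lines. Below is a minimal sketch of a learned spectral diffusion layer, assuming a precomputed truncated Laplace-Beltrami eigenbasis (`evals`, `evecs`) and lumped vertex masses (`mass`); it is an illustrative reduction of the idea, not the authors' implementation.

```python
# Minimal sketch of a learned-diffusion layer in the spirit of DiffusionNet.
# Assumes a precomputed truncated Laplace-Beltrami eigenbasis: `evals` (k,)
# eigenvalues, `evecs` (n, k) eigenvectors, and lumped vertex masses `mass` (n,).
import torch

class LearnedDiffusion(torch.nn.Module):
    """Diffuses each feature channel for a learned time t >= 0."""

    def __init__(self, n_channels):
        super().__init__()
        # One diffusion time per channel, optimized like any other weight.
        self.log_t = torch.nn.Parameter(torch.zeros(n_channels))

    def forward(self, x, mass, evals, evecs):
        # Project features into the spectral basis (mass-weighted inner product).
        x_spec = evecs.T @ (mass[:, None] * x)              # (k, c)
        # Closed-form heat-equation solution: damp mode i by exp(-lambda_i * t).
        t = self.log_t.exp()                                # (c,) positive times
        x_spec = x_spec * torch.exp(-evals[:, None] * t[None, :])
        # Return to the vertex basis.
        return evecs @ x_spec                               # (n, c)
```

Because the diffusion time is an ordinary continuous parameter, training can place each channel's receptive field anywhere between a near-identity (t → 0) and a global average (large t), which is what removes the manual choice of neighborhood size.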


... Unlike functional map-based methods that utilize handcrafted features (descriptors), deep functional maps aim to learn features directly from training data by using deep learning. The pioneering work FMNet [26] optimized the SHOT shape descriptor [51] via residual multilayer perceptron (MLP) layers, and its strategy has been followed by subsequent works [2,10,13,14,48,52,53], where DiffusionNet [53] became known as the most advanced feature extractor since it can capture discretization-resistant and orientation-aware shape features. Most of the follow-up work is based on it, such as [4,6,7,9,18,23,24,28,30,56]. ...
... Bold corresponds to the best method. The numbers in the table are the average geodesic errors (×10³) for readability. We use the discretization-robust correspondence benchmark provided by [53] to test the robustness of the methods across different shape discretizations. We compute the correspondences using the shapes of the orig, iso, dense, qes, and mc datasets, respectively, as the targets, and the shapes of orig as the sources. Fig. 6: Qualitative comparisons of robustness to shape discretizations. ...
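At the core of the deep functional maps pipeline discussed in the excerpts above is a small closed-form problem: given learned descriptors on two shapes, the functional map C is the matrix that best transfers their spectral coefficients. A hedged sketch, with all names illustrative:

```python
# Sketch of the descriptor-preservation step of deep functional maps: a learned
# feature extractor (e.g., DiffusionNet) produces descriptors on each shape,
# and C is recovered by least squares in the Laplace-Beltrami eigenbasis.
import torch

def functional_map(feat_x, feat_y, evecs_x, evecs_y, mass_x, mass_y):
    """feat_*: (n_*, d) learned descriptors; evecs_*: (n_*, k) LB eigenbases."""
    # Spectral coefficients of the descriptors (mass-weighted projection).
    a = evecs_x.T @ (mass_x[:, None] * feat_x)    # (k, d)
    b = evecs_y.T @ (mass_y[:, None] * feat_y)    # (k, d)
    # Solve C a = b in the least-squares sense (descriptor preservation).
    return torch.linalg.lstsq(a.T, b.T).solution.T
```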
Article
Full-text available
The functional maps framework has achieved remarkable success in non-rigid shape matching. However, the traditional functional map representations do not explicitly encode surface orientation, which can easily lead to orientation-reversing correspondence. The complex functional map addresses this issue by linking oriented tangent bundles to favor orientation-preserving correspondence. Nevertheless, the absence of effective restrictions on the complex functional maps hinders them from obtaining high-quality correspondences. To this end, we introduce novel and powerful constraints to determine complex functional maps by incorporating multiple complex spectral filter operator preservation constraints with a rigorous theoretical guarantee. Such constraints encode the surface orientation information and enforce the isometric property of the map. Based on these constraints, we propose a novel and efficient method to obtain orientation-preserving and accurate correspondences across shapes by alternatively updating the functional maps, complex functional maps, and pointwise maps. Extensive experiments demonstrate our significant improvements in correspondence quality and computing efficiency. In addition, our constraints can be easily adapted to other functional maps-based methods to enhance their performance.
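The alternating updates of functional and pointwise maps mentioned in this abstract typically rest on two standard conversions, sketched below under the usual ZoomOut-style conventions (illustrative, not the authors' code):

```python
# Generic conversions between a spectral map C (X -> Y) and a pointwise map
# T (Y -> X), using the property  Phi_Y @ C  ~  Pi @ Phi_X.
import numpy as np
from scipy.spatial import cKDTree

def fmap_to_pointwise(C, evecs_x, evecs_y):
    """Row j of evecs_y @ C should match row T[j] of evecs_x."""
    _, T = cKDTree(evecs_x).query(evecs_y @ C)
    return T

def pointwise_to_fmap(T, evecs_x, evecs_y):
    """Least-squares fit of C from vertex correspondences T: Y -> X."""
    return np.linalg.lstsq(evecs_y, evecs_x[T], rcond=None)[0]
```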
... In lieu of this, spectral approaches (Wiersma et al., 2022;Sharp et al., 2022) operate on the functional space characterized by differential operators. Network architectures that operate on these function spaces, such as the eigenvectors of the Laplace operator, are more robust to discretization and are able to generalize to other domains (e.g., point clouds) as long as the differential operators are defined. ...
... Starting with a set of tangent vectors u ∈ ℂ^{n×c_l} (represented as complex numbers, see Sec. 3.1), where n denotes the number of vertices and c_l denotes the number of tangent vector fields, we harness information from local neighborhoods by solving the vector heat equation with the implicit Euler integration in Eq. 3, inspired by (Sharp et al., 2022). As implicit Euler requires an expensive step of solving a linear system, we utilize spectral acceleration (Donati et al., 2022) to speed up the process. ...
... Inspired by (Sharp et al., 2022), we treat the time-step size s in Eq. 5 as a trainable parameter. Intuitively, the network learns whether to diffuse the vectors over a small or a large local neighborhood. ...
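A minimal sketch of the implicit-Euler vector-diffusion step these excerpts describe, with tangent vectors stored as complex numbers in local bases; the sparse complex connection Laplacian `L_conn` and vertex masses are assumed precomputed (illustrative, not the authors' code):

```python
# One implicit-Euler step of vector heat diffusion: (M + s L) u_new = M u.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def vector_heat_step(u, L_conn, mass, s):
    """u: (n, c) complex tangent fields; L_conn: sparse (n, n); s: time step."""
    A = (sp.diags(mass) + s * L_conn).tocsc()
    solve = spla.factorized(A)      # factor once, reuse for every channel
    return np.column_stack([solve(mass * u[:, j]) for j in range(u.shape[1])])
```

The spectral acceleration the excerpt mentions replaces this sparse solve by expanding u in a truncated eigenbasis of the connection Laplacian and damping each coefficient by 1 / (1 + s·λ).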
Preprint
Full-text available
Vector fields are widely used to represent and model flows for many science and engineering applications. This paper introduces a novel neural network architecture for learning tangent vector fields that are intrinsically defined on manifold surfaces embedded in 3D. Previous approaches to learning vector fields on surfaces treat vectors as multi-dimensional scalar fields, using traditional scalar-valued architectures to process channels individually, thus fail to preserve fundamental intrinsic properties of the vector field. The core idea of this work is to introduce a trainable vector heat diffusion module to spatially propagate vector-valued feature data across the surface, which we incorporate into our proposed architecture that consists of vector-valued neurons. Our architecture is invariant to rigid motion of the input, isometric deformation, and choice of local tangent bases, and is robust to discretizations of the surface. We evaluate our Vector Heat Network on triangle meshes, and empirically validate its invariant properties. We also demonstrate the effectiveness of our method on the useful industrial application of quadrilateral mesh generation.
... learning graph neighborhoods, alleviating the limitations of discrete GNNs [16,18,20]. Despite these advantages, CGNNs mostly rely on a single graph and lack a principled framework for learning joint multi-graph interactions. ...
... The continuity and differentiability of these filters allow the graph receptive fields to be learned adaptively during the training process, eliminating the need for a grid search. Additionally, CITRUS maintains low computational complexity by relying on the spectral decomposition of the small factor graphs [16,20]. Furthermore, unlike non-graph approaches, the number of learnable parameters in CITRUS is independent of the factor graphs, ensuring scalability. ...
... Let K_p ≤ N_p be the number of selected eigenvalue-eigenvector pairs of the p-th factor Laplacian. When K_p = N_p, it can be shown [16,20] that we can rewrite (6) as follows: ...
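The separability CITRUS exploits is that the Laplacian of a Cartesian graph product is the Kronecker sum of the factor Laplacians, so the heat kernel factors as exp(-tL) = exp(-tL₁) ⊗ exp(-tL₂) and can be applied using only the small factor eigendecompositions (possibly truncated to K_p ≤ N_p pairs). An illustrative sketch:

```python
# Apply exp(-t L) on a Cartesian product graph using factor eigenpairs only.
import numpy as np

def factor_heat_apply(X, evals1, evecs1, evals2, evecs2, t):
    """X: (N1, N2) signal on the product graph; evals/evecs per factor."""
    H1 = (evecs1 * np.exp(-t * evals1)[None, :]) @ evecs1.T   # exp(-t L1)
    H2 = (evecs2 * np.exp(-t * evals2)[None, :]) @ evecs2.T   # exp(-t L2)
    # (H1 kron H2) vec(X): apply H1 along the first axis, H2 along the second.
    return H1 @ X @ H2.T
```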
Preprint
Full-text available
Processing multidomain data defined on multiple graphs holds significant potential in various practical applications in computer science. However, current methods are mostly limited to discrete graph filtering operations. Tensorial partial differential equations on graphs (TPDEGs) provide a principled framework for modeling structured data across multiple interacting graphs, addressing the limitations of the existing discrete methodologies. In this paper, we introduce Continuous Product Graph Neural Networks (CITRUS) that emerge as a natural solution to the TPDEG. CITRUS leverages the separability of continuous heat kernels from Cartesian graph products to efficiently implement graph spectral decomposition. We conduct thorough theoretical analyses of the stability and over-smoothing properties of CITRUS in response to domain-specific graph perturbations and graph spectra effects on the performance. We evaluate CITRUS on well-known traffic and weather spatiotemporal forecasting datasets, demonstrating superior performance over existing approaches.
... Noise causes deformation in urban meshes, especially for small objects, making them difficult to recognize. The diffusion perceptron is inspired by diffusion operations (Sharp et al., 2022), which can smooth or blur sharp and incoherent geometric details in urban meshes (Wang et al., 2016), thereby reducing noise and enhancing the spatial features of urban meshes. The proposed approach learns a diffusion time, which dynamically adjusts the receptive field of the diffusion perceptron (see Section 3.4.1) to model mesh features gradually from local to global. ...
... Spectral methods can extract geometric features well and reduce the computational costs (Li et al., 2023a). Thus, other works (Smirnov and Solomon, 2021;Sharp et al., 2022;Feng et al., 2019) apply spectral methods to model the spatial geometric features. ...
... However, both GDC and MkdCNN only use receptive fields of consistent sizes to acquire local features and lack global features. In this paper, we adopt a learnable diffusion process (Sharp et al., 2022) to directly model dependencies among different regions and to capture features ranging from local to entirely global. Following approaches leveraging spectral acceleration (Zi et al., 2021a; Sharp et al., 2022; Shi and Rajkumar, 2020), we accelerate diffusion via spectral functions. ...
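A tiny demonstration of the behavior these excerpts rely on: a single diffusion time sweeps the receptive field from essentially local to fully global (path graph, illustrative only):

```python
# Diffusing a unit impulse: small t stays local, large t tends to the mean.
import numpy as np

n = 50
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # path-graph Laplacian
evals, evecs = np.linalg.eigh(L)
x = np.zeros(n)
x[n // 2] = 1.0                                        # unit impulse

for t in (0.1, 10.0, 1e4):
    y = evecs @ (np.exp(-t * evals) * (evecs.T @ x))
    print(f"t={t:g}  max={y.max():.3f}  std={y.std():.5f}")  # large t: flat field
```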
Article
Full-text available
Urban mesh semantic segmentation is essential for comprehending 3D real-world environments, as it plays a vital role across various application domains, including digital twins, 3D navigation, and smart cities. Nevertheless, the inherent topological complexities of urban meshes impede the precise representation of dependencies and local structures, yielding compromised segmentation accuracy, especially for small or irregularly shaped objects like vegetation and vehicles. To address this challenge, we introduce UrbanSegNet, a novel end-to-end model incorporating diffusion perceptron blocks and a vertex spatial attention mechanism. The diffusion perceptron blocks can dynamically enlarge receptive fields to capture features from local to completely global, enabling effective representation of urban meshes using multi-scale features and increasing segmentation accuracy for small and irregularly shaped objects. The vertex spatial attention mechanism extracts the internal correlations within urban meshes to enhance semantic segmentation performance. In addition, a tailored loss function is designed to further enhance overall performance. Comprehensive experiments on two datasets demonstrate that the proposed method outperforms state-of-the-art models in terms of mean F1 score, recall, and mean intersection over union (mIoU). The experimental results also show that UrbanSegNet achieves higher segmentation accuracy on vehicles and high vegetation than the state-of-the-art methods, highlighting the superiority of the proposed model in extracting features of small and irregularly shaped objects.
... DiffusionNet (Sharp et al., 2022) on the original meshes; 2) realignment of the meshes based on the roughly predicted landmarks; 3) segmentation of the facial region through fitting of a template facial mesh using a morphable model; 4) refined landmark prediction on the segmented meshes using a final DiffusionNet. The DiffusionNet models used spatial features only and did not use texture information for the automated landmarking task. ...
... Step 1: Rough prediction of landmarks. A DiffusionNet (Sharp et al., 2022), a state-of-the-art and robust deep learning network for 3D surfaces, was utilized for initial prediction of the exocanthions, endocanthions, nasion, nose tip, alares, and cheilions, as visualized in Figure 2. Preprocessing: To speed up the training process, each mesh was downsampled to a maximum of 25,000 vertices (Garland & Heckbert, 1997). Subsequently, a mask was applied, assigning a value of 1 to all vertices located within 5 mm Euclidean distance of the manually annotated landmarks and a value of 0 to the remaining vertices. ...
... In contrast to step 1, the meshes were not downsampled. A mask was created in which the vertices within 3.5 mm of the manually annotated landmarks were assigned value 1 and the other vertices were assigned value 0. Default mesh normalization and scaling were applied as provided by the DiffusionNet package (Sharp et al., 2022). ...
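The masking described in these excerpts amounts to thresholding Euclidean distances to the annotated landmarks; a minimal sketch (parameter names illustrative):

```python
# Binary vertex mask: 1 within `radius_mm` of any landmark, else 0.
import numpy as np

def landmark_mask(verts, landmarks, radius_mm=3.5):
    """verts: (n, 3); landmarks: (m, 3); returns (n,) float mask."""
    d = np.linalg.norm(verts[:, None, :] - landmarks[None, :, :], axis=-1)
    return (d.min(axis=1) <= radius_mm).astype(np.float32)
```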
Preprint
Full-text available
Three-dimensional facial stereophotogrammetry provides a detailed representation of craniofacial soft tissue without the use of ionizing radiation. While manual annotation of landmarks serves as the current gold standard for cephalometric analysis, it is a time-consuming process and is prone to human error. The aim in this study was to develop and evaluate an automated cephalometric annotation method using a deep learning-based approach. Ten landmarks were manually annotated on 2897 3D facial photographs by a single observer. The annotation process was repeated by the first observer, a second observer, and a third observer on 50 randomly selected 3D photos to assess intra-observer and inter-observer variability. The automated landmarking workflow involved two successive DiffusionNet models and additional algorithms for facial segmentation. The dataset was randomly divided into a training (85%) and test (15%) dataset. The training dataset was used to train the deep learning networks, whereas the test dataset was used to evaluate the performance of the automated workflow. The landmarks were also annotated using a semi-automatic method on all 3D photographs. The precision of the workflow was evaluated by calculating the Euclidean distances between the automated and manual landmarks and compared to the intra-observer and inter-observer variability of manual annotation and the semi-automated landmarking method. The workflow was successful in 98.6% of all test cases. The deep learning-based landmarking method achieved precise and consistent landmark annotation. The mean precision of 1.69 ± 1.15 mm was comparable to the inter-observer variability (1.31 ± 0.91 mm) of manual annotation. The Euclidean distance between the automated and manual landmarks was within 2 mm in 69%. Automated landmark annotation on 3D photographs was achieved with the DiffusionNet-based approach. The proposed method allows quantitative analysis of large datasets and may be used in diagnosis, follow-up, and virtual surgical planning. Repository: https://github.com/rumc3dlab/3dlandmarkdetection/
... Instead of learning from labeled maps, unsupervised approaches [20,45] demonstrate that it is sufficient to learn from geometric map priors. More recently, with the development of robust mesh feature extractors [47], more frameworks [9,29,11,3] have been proposed to learn directly from geometry, yielding state-of-the-art performance. Cycle Consistency. Cycle consistency has long been used as a strong prior for joint map optimization among a collection of shapes. ...
... Specifically, we use DiffusionNet [47] as our feature extractor, and WKS [?] descriptors are fed into it as the initialization of the learned features. ...
... We implement our network with PyTorch [41]. We use four DiffusionNet blocks [47] as the feature backbone and borrow the functional map block with the Laplacian regularizer from [12]. The dimension of the Laplace-Beltrami eigenbasis is set to 50. ...
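The WKS descriptors used as feature initialization can be computed directly from the Laplace-Beltrami eigendecomposition; below is a sketch of the standard formulation (normalization details vary between implementations):

```python
# Wave Kernel Signature: band-pass filters over log-eigenvalue energies.
import numpy as np

def wks(evals, evecs, n_energies=128, sigma_factor=7.0):
    """evals: (k,); evecs: (n, k); returns (n, n_energies) descriptors."""
    log_l = np.log(np.clip(evals, 1e-8, None))
    energies = np.linspace(log_l[1], log_l[-1], n_energies)
    sigma = sigma_factor * (energies[1] - energies[0])
    # Gaussian weight of each eigenpair in each energy band: (E, k).
    w = np.exp(-(energies[:, None] - log_l[None, :]) ** 2 / (2 * sigma ** 2))
    desc = (evecs ** 2) @ w.T                    # (n, E)
    return desc / w.sum(axis=1)[None, :]         # normalize each band
```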
Preprint
Cycle consistency has long been exploited as a powerful prior for jointly optimizing maps within a collection of shapes. In this paper, we investigate its utility in the approaches of Deep Functional Maps, which are considered state-of-the-art in non-rigid shape matching. We first justify that under certain conditions, the learned maps, when represented in the spectral domain, are already cycle consistent. Furthermore, we identify the discrepancy that spectrally consistent maps are not necessarily spatially, or point-wise, consistent. In light of this, we present a novel design of unsupervised Deep Functional Maps, which effectively enforces the harmony of learned maps under the spectral and the point-wise representation. By taking advantage of cycle consistency, our framework produces state-of-the-art results in mapping shapes even under significant distortions. Beyond that, by independently estimating maps in both spectral and spatial domains, our method naturally alleviates over-fitting in network training, yielding superior generalization performance and accuracy within an array of challenging tests for both near-isometric and non-isometric datasets. Codes are available at https://github.com/rqhuang88/Spatiallyand-Spectrally-Consistent-Deep-Functional-Maps.
... While graph neural networks have been successfully applied to classification tasks such as shape recognition, segmentation, and registration [15][16][17][18], regression tasks where the network should predict a (potentially vector-valued) function on the input geometry are much less studied [19,20]. Moreover, graph neural networks that are used for processing discrete surface data have mostly been applied to closed surfaces, i.e., surfaces without boundary. ...
... → ℝ³ in (18) results in the smallest possible residual. The neural network should then approximate an operator that assigns to each sufficiently large set of input features a parameterization that is optimal with respect to (18). Without fixing some of the parameter values a priori, this operator is not uniquely defined, making it difficult to directly train neural networks for this task. ...
Article
Full-text available
Surface reconstruction from scattered point clouds is the process of generating surfaces from unstructured data configurations retrieved using an acquisition device such as a laser scanner. Smooth surfaces are possible with the use of spline representations, an established mathematical tool in computer-aided design and related application areas. One key step in the surface reconstruction process is the parameterization of the points, that is, the construction of a proper mapping of the 3D point cloud to a planar domain that preserves surface boundary and interior points. Despite remarkable progress, existing heuristics for generating a suitable parameterization face challenges related to the accuracy, the robustness with respect to noise, and the computational efficiency of the results. In this work, we propose a boundary-informed dynamic graph convolutional network (BIDGCN) characterized by a novel boundary-informed input layer, with special focus on applications related to adaptive spline approximation of scattered data. The newly introduced layer propagates given boundary information to the interior of the point cloud, in order to let the input data be suitably processed by successive graph convolutional network layers. We apply our BIDGCN model to the problem of parameterizing three-dimensional unstructured data sets over a planar domain. A selection of numerical examples shows the effectiveness of the proposed approach for adaptive spline fitting with (truncated) hierarchical B-spline constructions. In our experiments, improved accuracy is obtained, e.g., from 60% up to 80% for noisy data, while speedups ranging from 4 up to 180 times are observed with respect to classical algorithms. Moreover, our method automatically predicts the local neighborhood graph, leading to much more robust results without the need for delicate free parameter selection.
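A hedged sketch of what the boundary-informed input layer could look like: each interior point aggregates edge features from its k nearest boundary points, so the known boundary parameterization can influence the interior from the very first layer. This is a plausible minimal reading of the abstract, not the actual BIDGCN layer:

```python
# Illustrative boundary-informed aggregation for point-cloud parameterization.
import torch
from torch import nn

class BoundaryInformedLayer(nn.Module):
    def __init__(self, out_dim, k=8):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(6, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, interior, boundary):
        """interior: (n, 3) points; boundary: (m, 3) boundary points."""
        d = torch.cdist(interior, boundary)                   # (n, m)
        idx = d.topk(self.k, largest=False).indices           # k nearest boundary
        nbrs = boundary[idx]                                  # (n, k, 3)
        edge = torch.cat([interior[:, None].expand(-1, self.k, -1),
                          nbrs - interior[:, None]], dim=-1)  # (n, k, 6)
        return self.mlp(edge).max(dim=1).values               # (n, out_dim)
```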
... It is worth noting that the continuous CGProNet can propagate global and local information within each time step with flexible and learnable parameters. Therefore, the effective receptive field over the neighboring nodes is automatically optimized through the training process, alleviating the over-smoothing and over-squashing issues [33][34][35]. Further theoretical and experimental analysis of the continuous CGProNet model can be found in Appendix C. ...
... are the learnable graph filter coefficients. Using this approach, both the global and local information can be propagated within each time step with flexible and learnable parameters {t_i}_{i=1}^M [33,34]. The main challenge here is the need to perform an EVD of A (O(N³)), which can be precomputed only once as a preprocessing step. ...
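A sketch of the continuous graph filter pattern these excerpts describe: with the eigendecomposition of the graph operator precomputed once, a filter with learnable coefficients θᵢ and times tᵢ is evaluated entirely in the spectral domain (illustrative):

```python
# Continuous spectral graph filter: h(lambda) = sum_i theta_i * exp(-t_i lambda).
import torch

class ContinuousGraphFilter(torch.nn.Module):
    def __init__(self, evals, evecs, n_terms=3):
        super().__init__()
        self.register_buffer("evals", evals)    # (N,), precomputed once (EVD)
        self.register_buffer("evecs", evecs)    # (N, N)
        self.theta = torch.nn.Parameter(0.1 * torch.randn(n_terms))
        self.log_t = torch.nn.Parameter(torch.zeros(n_terms))

    def forward(self, x):                       # x: (N, c) node signals
        x_spec = self.evecs.T @ x
        h = sum(th * torch.exp(-t * self.evals)
                for th, t in zip(self.theta, self.log_t.exp()))   # (N,)
        return self.evecs @ (h[:, None] * x_spec)
```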
Preprint
Full-text available
Graph Neural Networks (GNNs) have advanced spatiotemporal forecasting by leveraging relational inductive biases among sensors (or any other measuring scheme) represented as nodes in a graph. However, current methods often rely on Recurrent Neural Networks (RNNs), leading to increased runtimes and memory use. Moreover, these methods typically operate within 1-hop neighborhoods, exacerbating the reduction of the receptive field. Causal Graph Processes (CGPs) offer an alternative, using graph filters instead of MLP layers to reduce parameters and minimize memory consumption. This paper introduces the Causal Graph Process Neural Network (CGProNet), a non-linear model combining CGPs and GNNs for spatiotemporal forecasting. CGProNet employs higher-order graph filters, optimizing the model with fewer parameters, reducing memory usage, and improving runtime efficiency. We present a comprehensive theoretical and experimental stability analysis, highlighting key aspects of CGProNet. Experiments on synthetic and real data demonstrate CGProNet's superior efficiency, minimizing memory and time requirements while maintaining competitive forecasting performance.
... The main challenge lies in the need to add triangles to the missing regions of the input mesh while processing it using a deep neural network. Unfortunately, even with state-of-the-art networks [14], [15], [16], [17], dynamic triangulation involving changes in vertex connections remains challenging. This limitation hinders the development of deep-learning-based mesh inpainting methods that do not rely on an intermediate format, despite the existence of several large-scale mesh datasets [18], [19]. ...
... Although there are many studies on shape inpainting using voxel grids, point clouds, and implicit functions, few studies have been conducted for mesh inpainting because general-purpose deep neural networks for meshes have not been well established despite the recent efforts [14], [15], [16], [17]. In addition, all the methods described above require long pretraining times with large-scale datasets. ...
Article
Full-text available
This study presents a self-prior-based mesh inpainting framework that requires only an incomplete mesh as input, without the need for any training datasets. Additionally, our method maintains the polygonal mesh format throughout the inpainting process without converting the shape format to an intermediate, such as a voxel grid, a point cloud, or an implicit function, which are typically considered easier for deep neural networks to process. To achieve this goal, we introduce two graph convolutional networks (GCNs): single-resolution GCN (SGCN) and multi-resolution GCN (MGCN), both trained in a self-supervised manner. Our approach refines a watertight mesh obtained from the initial hole filling to generate a completed output mesh. Specifically, we train the GCNs to deform an oversmoothed version of the input mesh into the expected completed shape. To supervise the GCNs for accurate vertex displacements, despite the unknown correct displacements at real holes, we utilize multiple sets of meshes with several connected regions marked as fake holes. The correct displacements are known for vertices in these fake holes, enabling network training with loss functions that assess the accuracy of displacement vectors estimated by the GCNs. We demonstrate that our method outperforms traditional dataset-independent approaches and exhibits greater robustness compared to other deep-learning-based methods for shapes that less frequently appear in shape datasets.
... Table 2: Segmentation results on the HumanBody dataset (Maron et al. 2017), reported as accuracy (Acc.). The † rows are evaluations on the original meshes, and the ‡ rows are evaluations on the processed inputs.
†: (Masci et al. 2015) 85.4%; PointNet (Qi et al. 2017a) 74.7%; PointNet++ (Qi et al. 2017b) 82.3%; MeshCNN (Hanocka et al. 2019) 87.8%; DGCNN (Wang et al. 2019) 87.8%; CGConv (Yang et al. 2021) 89.9%; PD-MeshNet (Milano et al. 2020) 86.9%; DiffusionNet (Sharp et al. 2022) 91.7%; SubdivNet (Hu et al. 2022a) 93.0%; MeshMAE (Liang et al. 2022) 90.0%; Ours ...
‡: MeshCNN (Hanocka et al. 2019) 85.4%; PD-MeshNet (Milano et al. 2020) 85.6%; HodgeNet (Smirnov and Solomon 2021) 85.0%; DiffusionNet (Sharp et al. 2022) 90.8%; SubdivNet (Hu et al. 2022a) 90.8%; MeshMAE (Liang et al. 2022) 90.1%; Ours 91.1%. ...
Preprint
Meshes are widely used in 3D computer vision and graphics, but their irregular topology poses challenges in applying them to existing neural network architectures. Recent advances in mesh neural networks turn to remeshing and push the boundary of pioneer methods that solely take the raw meshes as input. Although the remeshing offers a regular topology that significantly facilitates the design of mesh network architectures, features extracted from such remeshed proxies may struggle to retain the underlying geometry faithfully, limiting the subsequent neural network's capacity. To address this issue, we propose SieveNet, a novel paradigm that takes into account both the regular topology and the exact geometry. Specifically, this method utilizes structured mesh topology from remeshing and accurate geometric information from distortion-aware point sampling on the surface of the original mesh. Furthermore, our method eliminates the need for hand-crafted feature engineering and can leverage off-the-shelf network architectures such as the vision transformer. Comprehensive experimental results on classification and segmentation tasks well demonstrate the effectiveness and superiority of our method.
... As such, we pass the projected encodings to a head consisting of an O(k)-equivariant vector neuron (VN) MLP [36]. We note that for large-scale tasks NIso is potentially well-suited to pair with DiffusionNet [42], which can make use of the learned eigenbasis to perform accelerated operations in the latent space, though we do not consider this regime here. ...
Preprint
Real-world geometry and 3D vision tasks are replete with challenging symmetries that defy tractable analytical expression. In this paper, we introduce Neural Isometries, an autoencoder framework which learns to map the observation space to a general-purpose latent space wherein encodings are related by isometries whenever their corresponding observations are geometrically related in world space. Specifically, we regularize the latent space such that maps between encodings preserve a learned inner product and commute with a learned functional operator, in the same manner as rigid-body transformations commute with the Laplacian. This approach forms an effective backbone for self-supervised representation learning, and we demonstrate that a simple off-the-shelf equivariant network operating in the pre-trained latent space can achieve results on par with meticulously-engineered, handcrafted networks designed to handle complex, nonlinear symmetries. Furthermore, isometric maps capture information about the respective transformations in world space, and we show that this allows us to regress camera poses directly from the coefficients of the maps between encodings of adjacent views of a scene.
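The two latent-space regularizers stated in this abstract can be written compactly; the following is a hedged sketch with illustrative names, not the authors' implementation:

```python
# Latent map tau between encodings z1, z2 should (i) commute with a learned
# operator Omega, (ii) preserve a learned inner product M, (iii) map z1 to z2.
import torch

def isometry_losses(tau, Omega, M, z1, z2):
    """tau, Omega, M: (k, k); z1, z2: (b, k) encoding coefficients."""
    commute = ((tau @ Omega - Omega @ tau) ** 2).mean()   # [tau, Omega] = 0
    inner = ((tau.T @ M @ tau - M) ** 2).mean()           # preserve <., .>_M
    recon = ((z1 @ tau.T - z2) ** 2).mean()               # tau transports z1
    return commute + inner + recon
```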
... Then, the ground-truth correspondences are obtained by a template model. The remeshed datasets, in which each shape contains approximately 5k vertices, have already been widely used in deformable shape matching tasks [29], [58], [122], [123], [129]-[131]. In this case, the source and target shapes come from the remeshed datasets. ...
Article
A large number of 3D spectral descriptors have been proposed in the literature, which act as an essential component for 3D deformable shape matching and related applications. An outstanding descriptor should have desirable natures including high-level descriptive capacity, cheap storage, and robustness to a set of nuisances. It is, however, unclear which descriptors are more suitable for a particular application. This paper fills the gap by comprehensively evaluating nine state-of-the-art spectral descriptors on ten popular deformable shape datasets as well as perturbations such as mesh discretization, geometric noise, scale transformation, non-isometric setting, partiality, and topological noise. Our evaluated terms for a spectral descriptor cover four major concerns, i.e., distinctiveness, robustness, compactness, and computational efficiency. In the end, we present a summary of the overall performance and several interesting findings that can serve as guidance for the following researchers to construct a new spectral descriptor and choose an appropriate spectral feature in a particular application.
... Recently, diffusion-based approaches have shown remarkable results on several generative tasks, ranging from image generation [Nichol et al. 2022a; Ramesh et al. 2022; Rombach et al. 2022; Saharia et al. 2022], audio synthesis [Kong et al. 2020; Popov et al. 2021], pose estimation, natural language generation, and motion synthesis [Alexanderson et al. 2023; Dabral et al. 2023], to point cloud generation [Luo and Hu 2021; Nichol et al. 2022b], 3D object synthesis [Poole et al. 2022; Seo et al. 2023; Xiang et al. 2023], and scene creation [Sharp et al. 2022; Vuong et al. 2023; Zeng et al. 2022]. Diffusion models have shown that they can achieve high mode coverage, unlike GANs, while still maintaining high sample quality [Ulhaq et al. 2022]. ...
Article
Music-driven group choreography poses a considerable challenge but holds significant potential for a wide range of industrial applications. The ability to generate synchronized and visually appealing group dance motions that are aligned with music opens up opportunities in many fields such as entertainment, advertising, and virtual performances. However, most of the recent works are not able to generate high-fidelity long-term motions, or fail to enable controllable experience. In this work, we aim to address the demand for high-quality and customizable group dance generation by effectively governing the consistency and diversity of group choreographies. In particular, we utilize a diffusion-based generative approach to enable the synthesis of flexible number of dancers and long-term group dances, while ensuring coherence to the input music. Ultimately, we introduce a Group Contrastive Diffusion (GCD) strategy to enhance the connection between dancers and their group, presenting the ability to control the consistency or diversity level of the synthesized group animation via the classifier-guidance sampling technique. Through intensive experiments and evaluation, we demonstrate the effectiveness of our approach in producing visually captivating and consistent group dance motions. The experimental results show the capability of our method to achieve the desired levels of consistency and diversity, while maintaining the overall quality of the generated group choreography.
... In contrast to step 1, the meshes were not downsampled. A mask was created in which the vertices within 3.5 mm of the manually annotated landmarks were assigned value 1 and the other vertices were assigned value 0. Default mesh normalization and scaling were applied as provided by the DiffusionNet package [14]. ...
Preprint
Full-text available
Three-dimensional facial stereophotogrammetry provides a detailed representation of craniofacial soft tissue without the use of ionizing radiation. While manual annotation of landmarks serves as the current gold standard for cephalometric analysis, it is a time-consuming process and is prone to human error. The aim in this study was to develop and evaluate an automated cephalometric annotation method using a deep learning-based approach. Ten landmarks were manually annotated on 2897 3D facial photographs. The automated landmarking workflow involved two successive DiffusionNet models. The dataset was randomly divided into a training and test dataset. The precision of the workflow was evaluated by calculating the Euclidean distances between the automated and manual landmarks and compared to the intra-observer and inter-observer variability of manual annotation and a semi-automated landmarking method. The workflow was successful in 98.6% of all test cases. The deep learning-based landmarking method achieved precise and consistent landmark annotation. The mean precision of 1.69 ± 1.15 mm was comparable to the inter-observer variability (1.31 ± 0.91 mm) of manual annotation. Automated landmark annotation on 3D photographs was achieved with the DiffusionNet-based approach. The proposed method allows quantitative analysis of large datasets and may be used in diagnosis, follow-up, and virtual surgical planning.
... Feature extractor. We employ a feature extraction module based on DiffusionNet [SACO22], a geometric neural network that relies on intrinsic operations. This architecture has been shown to be robust to various shape discretizations and invariant to isometries. ...
Article
Full-text available
Learning functions defined on non‐flat domains, such as outer surfaces of non‐rigid shapes, is a central task in computer vision and geometry processing. Recent studies have explored the use of neural fields to represent functions like light reflections in volumetric domains and textures on curved surfaces by operating in the embedding space. Here, we choose a different line of thought and introduce a novel formulation of partial shape matching by learning a piecewise smooth function on a surface. Our method begins with pairing sparse landmarks defined on a full shape and its part, using feature similarity. Next, a neural representation is optimized to fit these landmarks, efficiently interpolating between the matched features that act as anchors. This process results in a function that accurately captures the partiality. Unlike previous methods, the proposed neural model of functions is intrinsically defined on the given curved surface, rather than the classical embedding Euclidean space. This representation is shown to be particularly well‐suited for representing piecewise smooth functions. We further extend the proposed framework to the more challenging part‐to‐part setting, where both shapes exhibit missing parts. Comprehensive experiments highlight that the proposed method effectively addresses partiality in shape matching and significantly outperforms leading state‐of‐the‐art methods in challenging benchmarks. Code is available at https://github.com/davidgip74/Learning-Partiality-with-Implicit-Intrinsic-Functions
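One minimal way to realize a neural field that is intrinsic to the surface is to feed a small MLP the Laplace-Beltrami eigenfunctions at each vertex as an intrinsic positional encoding and fit it to the sparse landmark values, interpolating in between. The sketch below is a plausible simplification under that reading, not the paper's exact model:

```python
# Fit an intrinsic neural field to sparse landmark values on a surface.
import torch

def fit_intrinsic_field(evecs, lm_idx, lm_vals, steps=500):
    """evecs: (n, k) LB eigenfunctions; lm_idx: (m,); lm_vals: (m, d)."""
    mlp = torch.nn.Sequential(
        torch.nn.Linear(evecs.shape[1], 128), torch.nn.ReLU(),
        torch.nn.Linear(128, 128), torch.nn.ReLU(),
        torch.nn.Linear(128, lm_vals.shape[1]))
    opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((mlp(evecs[lm_idx]) - lm_vals) ** 2).mean()
        loss.backward()
        opt.step()
    return mlp(evecs)          # piecewise-smooth function on the whole surface
```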
Preprint
Full-text available
Constructing well-behaved Laplacian and mass matrices is essential for tetrahedral mesh processing. Unfortunately, the de facto standard linear finite elements exhibit bias on tetrahedralized regular grids, motivating the development of finite-volume methods. In this paper, we place existing methods into a common construction, showing how their differences amount to the choice of simplex centers. These choices lead to satisfaction or breakdown of important properties: continuity with respect to vertex positions, positive semi-definiteness of the implied Dirichlet energy, positivity of the mass matrix, and unbiasedness on regular grids. Based on this analysis, we propose a new method for constructing dual volumes which explicitly satisfy all of these properties via convex optimization.
Article
Full-text available
Mesh texture synthesis is a key component in the automatic generation of 3D content. Existing learning‐based methods have drawbacks—either by disregarding the shape manifold during texture generation or by requiring a large number of different views to mitigate occlusion‐related inconsistencies. In this paper, we present a novel surface‐aware approach for mesh texture synthesis that overcomes these drawbacks by leveraging the pre‐trained weights of 2D Convolutional Neural Networks (CNNs) with the same architecture, but with convolutions designed for 3D meshes. Our proposed network keeps track of the oriented patches surrounding each texel, enabling seamless texture synthesis and retaining local similarity to classical 2D convolutions with square kernels. Our approach allows us to synthesize textures that account for the geometric content of mesh surfaces, eliminating discontinuities and achieving comparable quality to 2D image synthesis algorithms. We compare our approach with state‐of‐the‐art methods where, through qualitative and quantitative evaluations, we demonstrate that our approach is more effective for a variety of meshes and styles, while also producing visually appealing and consistent textures on meshes.
Article
3D shape segmentation is a fundamental and crucial task in the field of image processing and 3D shape analysis. To segment 3D shapes using data-driven methods, a fully labeled dataset is usually required. However, obtaining such a dataset can be a daunting task, as manual face-level labeling is both time-consuming and labor-intensive. In this paper, we present a semi-supervised framework for 3D shape segmentation that uses a small, fully labeled set of 3D shapes, as well as a weakly labeled set of 3D shapes with sparse scribble labels. Our framework first employs an auxiliary network to generate initial fully labeled segmentation labels for the sparsely labeled dataset, which helps in training the primary network. During training, the self-refine module uses increasingly accurate predictions of the primary network to improve the labels generated by the auxiliary network. Our proposed method achieves better segmentation performance than previous semi-supervised methods, as demonstrated by extensive benchmark tests, while also performing comparably to supervised methods.
Chapter
Calculating correspondences between non-rigidly deformed shapes is the backbone of many applications in 3D computer vision and graphics. The functional map approach offers an efficient solution to this problem and has been very popular in learning frameworks due to its low-dimensional and continuous nature. However, most methods rely on the eigenfunctions of the Laplace-Beltrami operator as a basis for the underlying function spaces. While these have many advantages, they are also sensitive to non-isometric deformations and noise. Recently a method to learn the basis functions along with suitable descriptors has been proposed by Marin et al.. We do an in-depth analysis of the architecture proposed, including a new training scheme to increase robustness against sampling inconsistencies and an extension to unsupervised training which still obtains results on-par with the supervised approach.
Conference Paper
Deep learning-based approaches for three-dimensional (3D) grid understanding and processing tasks have been extensively studied in recent years. Despite great success in various scenarios, existing approaches fail to effectively utilize the velocity information in the flow field, so the extracted features struggle to meet the actual requirements of post-processing tasks. To fully integrate the structural information of the 3D grid with the velocity information, this paper constructs a flow-field-aware network (FFANet) for 3D grid classification and segmentation tasks. The main innovations include: (i) using the self-attention mechanism to build a multi-scale feature learning network that learns the distribution features of the velocity field and the structure features at different scales in the 3D flow-field grid, generating a global feature with more discriminative representation information; (ii) constructing a fine-grained semantic learning network based on a co-attention mechanism to adaptively learn the weight matrix between the above two features, enhancing the effective semantic utilization of the global feature; (iii) according to the practical requirements of post-processing in numerical simulation, we designed two downstream tasks: 1) a surface grid identification task and 2) a feature edge extraction task. The experimental results show that the accuracy (Acc) and intersection-over-union (IoU) performance of FFANet compare favourably with existing 3D mesh data analysis approaches.
Article
3D mesh classification deep neural networks (3D DNNs) have been widely applied in many safety-critical domains. Backdoor attacks are a serious threat that occurs during the training stage. Previous backdoor attacks from the 2D image and 3D point cloud domains are not suitable for 3D meshes due to data structure restrictions. Therefore, in a pioneering effort, this paper presents two types of backdoor attacks on 3D meshes. The first attack is a Mesh Geometrical Feature guided 3D Mesh Backdoor Attack, named MGF-MBA. Most 3D DNNs have to convert a 3D mesh to a regular matrix (mesh geometrical feature), which is a refinement of the input 3D mesh. The 3D DNN learns the 3D shape directly from the mesh geometrical feature, which enables attackers to implant a backdoor through it. Hence, the proposed MGF-MBA generates a backdoored 3D mesh under the guidance of the mesh geometrical feature. The second attack is a Remeshing based 3D Mesh Backdoor Attack, named ReMBA. The quality of samples backdoored by existing backdoor attacks always decreases. Although many efforts have been made to limit this quality degradation in return for stealthiness, the degradation persists. For better stealthiness, we regard the backdoor implantation process as a way to increase the quality of the backdoored sample rather than reduce it. Specifically, ReMBA designs a new isotropic remeshing method that attempts to represent a 3D mesh by equilateral triangles while keeping the number of vertices, edges, and faces unchanged. Numerous experimental results show that both MGF-MBA and ReMBA achieve guaranteed attack performance on 3D DNNs. Furthermore, transferability experiments demonstrate that ReMBA can even attack 3D point cloud networks with an increased ability to resist defenses.
Article
K-surfaces are an interactive modeling technique for Bézier-spline surfaces. Inspired by k-curves [Yan et al. 2017], each patch provides a single control point that is interpolated at a local extremum of Gaussian curvature. The challenge is to solve the inverse problem of finding the center control point of a Bézier patch given the boundary control points and the handle. Unlike the situation in 2D, bi-quadratic Bézier patches may exhibit zero, one, or several extrema, and finding them is non-trivial. We solve this difficult inverse problem, including the possible selection among several extrema, by learning the desired function from samples generated by computing the Gaussian curvature of random patches. This approximation provides a stable solution to the ill-defined inverse problem and is much more efficient than direct numerical optimization, facilitating the interactive modeling framework. The local solution is used in an iterative optimization incorporating continuity constraints across patches. We demonstrate that the surface varies smoothly with the handle location and that the resulting modeling system provides local and generally intuitive control. The idea of learning the inverse mapping from handles to patches may be applicable to other parametric surfaces.
Article
Full-text available
A 3D mesh is a popular representation of 3D shapes. For mesh analysis tasks, one typical method is to map 3D mesh data into 1D sequence data with random walk sampling. However, existing random walk-based approaches cannot make full use of attentive regions, which limits the capability of 3D shape analysis. In addition, existing methods process the random walk as sequence data in the discovery order, which results in computational overhead. In this paper, we propose a novel neural framework named WalkFormer, which applies a transformer to a random walk to fully exploit semantic information in a 3D mesh. First, we propose a novel transformer-based framework to learn semantic information from a random walk of a 3D mesh. Second, to capture the attentive regions of the random walk, our approach extends the multi-head self-attention mechanism to specific 3D mesh analysis tasks. To establish the long-range interactions between vertices in the random walk, our approach adopts a novel relative position encoding module. Thus, the local–global information in the random walk can be obtained and learned in our approach. Third, we discover that for 3D mesh analysis, the sequential operations for the random walk sequence are redundant. Different from previous random walk methods, our approach can be executed in a parallelized manner, which greatly improves computational efficiency. Numerous experiments demonstrate the effectiveness of the proposed method on typical 3D shape analysis tasks.
Article
Full-text available
In physics‐based cloth animation, rich folds and detailed wrinkles are achieved at the cost of expensive computational resources and huge labor tuning. Data‐driven techniques make efforts to reduce the computation significantly by utilizing a preprocessed database. One type of methods relies on human poses to synthesize fitted garments, but these methods cannot be applied to general cloth animations. Another type of methods adds details to the coarse meshes obtained through simulation, which does not have such restrictions. However, existing works usually utilize coordinate‐based representations which cannot cope with large‐scale deformation, and requires dense vertex correspondences between coarse and fine meshes. Moreover, as such methods only add details, they require coarse meshes to be sufficiently close to fine meshes, which can be either impossible, or require unrealistic constraints to be applied when generating fine meshes. To address these challenges, we develop a temporally and spatially as‐consistent‐as‐possible deformation representation (named TS‐ACAP) and design a DeformTransformer network to learn the mapping from low‐resolution meshes to ones with fine details. This TS‐ACAP representation is designed to ensure both spatial and temporal consistency for sequential large‐scale deformations from cloth animations. With this TS‐ACAP representation, our DeformTransformer network first utilizes two mesh‐based encoders to extract the coarse and fine features using shared convolutional kernels, respectively. To transduct the coarse features to the fine ones, we leverage the spatial and temporal Transformer network that consists of vertex‐level and frame‐level attention mechanisms to ensure detail enhancement and temporal coherence of the prediction. Experimental results show that our method is able to produce reliable and realistic animations in various datasets at high frame rates with superior detail synthesis abilities compared to existing methods.
Article
Full-text available
We describe HalfedgeCNN, a collection of modules to build neural networks that operate on triangle meshes. Taking inspiration from the (edge‐based) MeshCNN, convolution, pooling, and unpooling layers are consistently defined on the basis of halfedges of the mesh, pairs of oppositely oriented virtual instances of each edge. This provides benefits over alternative definitions on the basis of vertices, edges, or faces. Additional interface layers enable support for feature data associated with such mesh entities in input and output as well. Due to being defined natively on mesh entities and their neighborhoods, lossy resampling or interpolation techniques (to enable the application of operators adopted from image domains) do not need to be employed. The operators have various degrees of freedom that can be exploited to adapt to application‐specific needs.
Article
Full-text available
Graph neural networks (GNNs) extend the functionality of traditional neural networks to graph-structured data. Similar to CNNs, an optimized design of graph convolution and pooling is key to success. Borrowing ideas from physics, we propose path integral-based GNNs (PAN) for classification and regression tasks on graphs. Specifically, we consider a convolution operation that involves every path linking the message sender and receiver with learnable weights depending on the path length, which corresponds to the maximal entropy random walk. It generalizes the graph Laplacian to a new transition matrix that we call the maximal entropy transition (MET) matrix derived from a path integral formalism. Importantly, the diagonal entries of the MET matrix are directly related to the subgraph centrality, thus leading to a natural and adaptive pooling mechanism. PAN provides a versatile framework that can be tailored for different graph data with varying sizes and structures. We can view most existing GNN architectures as special cases of PAN. Experimental results show that PAN achieves state-of-the-art performance on various graph classification/regression tasks, including a new benchmark dataset from statistical mechanics that we propose to boost applications of GNN in physical sciences.
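A simplified sketch of the path-integral idea: weight powers of the adjacency matrix (all paths of each length up to a cutoff) and normalize the sum into a transition matrix, whose diagonal supplies the centrality used for pooling. This is an illustrative reduction; the paper derives the MET matrix more carefully:

```python
# Path-integral-style transition matrix and subgraph-centrality pooling scores.
import numpy as np

def met_matrix(A, L=4, T=1.0):
    """A: (n, n) adjacency; L: maximal path length; T: 'temperature'."""
    M = sum(np.exp(-l / T) * np.linalg.matrix_power(A, l) for l in range(L + 1))
    M = M / M.sum(axis=1, keepdims=True)   # row-normalize into transitions
    centrality = np.diag(M)                # drives the adaptive pooling
    return M, centrality
```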
Article
Full-text available
The Euclidean scattering transform was introduced nearly a decade ago to improve the mathematical understanding of convolutional neural networks. Inspired by recent interest in geometric deep learning, which aims to generalize convolutional neural networks to manifold and graph-structured domains, we define a geometric scattering transform on manifolds. Similar to the Euclidean scattering transform, the geometric scattering transform is based on a cascade of wavelet filters and pointwise nonlinearities. It is invariant to local isometries and stable to certain types of diffeomorphisms. Empirical results demonstrate its utility on several geometric learning tasks. Our results generalize the deformation stability and local translation invariance of Euclidean scattering, and demonstrate the importance of linking the used filter structures to the underlying geometry of the data.
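A sketch of a two-layer geometric scattering transform built from lazy random-walk diffusion wavelets, the standard graph construction that this line of work carries over to manifolds (illustrative):

```python
# Geometric scattering: cascade wavelets Psi_j = P^(2^(j-1)) - P^(2^j) with
# pointwise absolute values, then average to obtain (near-)invariant features.
import numpy as np

def scattering(A, x, J=3):
    d = A.sum(axis=1)
    P = 0.5 * (np.eye(len(d)) + A * (1.0 / d)[None, :])   # lazy random walk

    def wavelets(sig):
        out, z = [], P @ sig                               # z = P^(2^0) sig
        for j in range(1, J + 1):
            z_next = z
            for _ in range(2 ** (j - 1)):
                z_next = P @ z_next                        # z_next = P^(2^j) sig
            out.append(z - z_next)
            z = z_next
        return out

    first = [np.abs(w) for w in wavelets(x)]                    # |Psi_j x|
    second = [np.abs(v) for u in first for v in wavelets(u)]    # |Psi_k |Psi_j x||
    return np.array([f.mean() for f in first + second])
```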
Conference Paper
Full-text available
Cell complexes are topological spaces constructed from simple blocks called cells. They generalize graphs, simplicial complexes, and polyhedral complexes, which form important domains for practical applications. They also provide a combinatorial formalism that allows the inclusion of complicated relationships that restrictive structures such as graphs and meshes cannot capture. In this paper, we propose Cell Complex Neural Networks (CXNs), a general, combinatorial, and unifying construction for performing neural network-type computations on cell complexes. We introduce an inter-cellular message-passing scheme on cell complexes that takes the topology of the underlying space into account and generalizes the message-passing scheme on graphs. Finally, we introduce a unified cell complex encoder-decoder framework that enables learning representations of cells for a given complex inside Euclidean spaces. In particular, we show how our cell complex autoencoder construction gives, in the special case cell2vec, a generalization of node2vec.
Article
Full-text available
Discrete Laplacians for triangle meshes are a fundamental tool in geometry processing. The so‐called cotan Laplacian is widely used since it preserves several important properties of its smooth counterpart. It can be derived from different principles: either considering the piecewise linear nature of the primal elements or associating values to the dual vertices. Both approaches lead to the same operator in the two‐dimensional setting. In contrast, for tetrahedral meshes, only the primal construction is reminiscent of the cotan weights, involving dihedral angles. We provide explicit formulas for the lesser‐known dual construction. In both cases, the weights can be computed by adding the contributions of individual tetrahedra to an edge. The resulting two different discrete Laplacians for tetrahedral meshes only retain some of the properties of their two‐dimensional counterpart. In particular, while both constructions have linear precision, only the primal construction is positive semi‐definite and only the dual construction generates positive weights and provides a maximum principle for Delaunay meshes. We perform a range of numerical experiments that highlight the benefits and limitations of the two constructions for different problems and meshes.
Article
Full-text available
We describe a discrete Laplacian suitable for any triangle mesh, including those that are nonmanifold or nonorientable (with or without boundary). Our Laplacian is a robust drop‐in replacement for the usual cotan matrix, and is guaranteed to have nonnegative edge weights on both interior and boundary edges, even for extremely poor‐quality meshes. The key idea is to build what we call a “tufted cover” over the input domain, which has nonmanifold vertices but manifold edges. Since all edges are manifold, we can flip to an intrinsic Delaunay triangulation; our Laplacian is then the cotan Laplacian of this new triangulation. This construction also provides a high‐quality point cloud Laplacian, via a nonmanifold triangulation of the point set. We validate our Laplacian on a variety of challenging examples (including all models from Thingi10k), and a variety of standard tasks including geodesic distance computation, surface deformation, parameterization, and computing minimal surfaces.
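For reference, the interior construction this paper builds on is the standard cotan Laplacian, which can be assembled in a few lines; a minimal sketch:

```python
# Cotan Laplacian L = D - W with w_ij = 0.5 * (cot(alpha_ij) + cot(beta_ij)).
import numpy as np
import scipy.sparse as sp

def cotan_laplacian(verts, faces):
    """verts: (n, 3) float; faces: (m, 3) int. Returns sparse (n, n) PSD L."""
    n = len(verts)
    I, J, W = [], [], []
    for tri in faces:
        for k in range(3):
            i, j, o = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
            u, v = verts[i] - verts[o], verts[j] - verts[o]
            # Edge (i, j) receives half the cotangent of the opposite angle.
            cot = np.dot(u, v) / np.linalg.norm(np.cross(u, v))
            I += [i, j]; J += [j, i]; W += [0.5 * cot] * 2
    Lw = sp.coo_matrix((W, (I, J)), shape=(n, n)).tocsr()
    return sp.diags(np.asarray(Lw.sum(axis=1)).ravel()) - Lw
```

The contribution above is what makes this familiar construction usable on nonmanifold and nonorientable input: the tufted cover supplies a triangulation on which, after flipping to an intrinsic Delaunay triangulation, these cotan weights are guaranteed nonnegative.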
Article
Full-text available
The discrete Laplace‐Beltrami operator for surface meshes is a fundamental building block for many (if not most) geometry processing algorithms. While Laplacians on triangle meshes have been researched intensively, yielding the cotangent discretization as the de‐facto standard, the case of general polygon meshes has received much less attention. We present a discretization of the Laplace operator which is consistent with its expression as the composition of divergence and gradient operators, and is applicable to general polygon meshes, including meshes with non‐convex, and even non‐planar, faces. By virtually inserting a carefully placed point we implicitly refine each polygon into a triangle fan, but then hide the refinement within the matrix assembly. The resulting operator generalizes the cotangent Laplacian, inherits its advantages, and is empirically shown to be on par or even better than the recent polygon Laplacian of Alexa and Wardetzky [AW11] — while being simpler to compute.
Article
Full-text available
In this paper, we propose a novel formulation extending convolutional neural networks (CNN) to arbitrary two‐dimensional manifolds using orthogonal basis functions called Zernike polynomials. In many areas, geometric features play a key role in understanding scientific trends and phenomena, where accurate numerical quantification of geometric features is critical. Recently, CNNs have demonstrated a substantial improvement in extracting and codifying geometric features. However, the progress is mostly centred around computer vision and its applications where an inherent grid‐like data representation is naturally present. In contrast, many geometry processing problems deal with curved surfaces and the application of CNNs is not trivial due to the lack of canonical grid‐like representation, the absence of globally consistent orientation and the incompatible local discretizations. In this paper, we show that the Zernike polynomials allow rigorous yet practical mathematical generalization of CNNs to arbitrary surfaces. We prove that the convolution of two functions can be represented as a simple dot product between Zernike coefficients and the rotation of a convolution kernel is essentially a set of 2 × 2 rotation matrices applied to the coefficients. The key contribution of this work is in such a computationally efficient but rigorous generalization of the major CNN building blocks.
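The rotation property can be stated very compactly; a sketch of the claimed identity, assuming coefficients are stored as (cosine, sine) pairs per azimuthal order m.

```python
import numpy as np

def rotate_zernike(coeffs_m, phi):
    """Sketch of the paper's key identity: rotating a kernel by phi acts on
    each (cos, sin) pair of order-m Zernike coefficients as a 2x2 rotation by
    m * phi. coeffs_m : dict mapping azimuthal order m >= 1 to (a_m, b_m)."""
    out = {}
    for m, (a, b) in coeffs_m.items():
        c, s = np.cos(m * phi), np.sin(m * phi)
        out[m] = (c * a - s * b, s * a + c * b)
    return out
```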
Article
Full-text available
Deep learning methods have achieved great success in analyzing traditional data such as texts, sounds, images and videos. More and more research works are carrying out to extend standard deep learning technologies to geometric data such as point cloud or voxel grid of 3D objects, real life networks such as social and citation network. Many methods have been proposed in the research area. In this work, we aim to provide a comprehensive survey of geometric deep learning and related methods. First, we introduce the relevant knowledge and history of geometric deep learning field as well as the theoretical background. In the method part, we review different graph network models for graphs and manifold data. Besides, practical applications of these methods, datasets currently available in different research area and the problems and challenges are also summarized.
Article
Full-text available
SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments.
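For example, a few lines of SciPy suffice for the kind of sparse spectral computation that recurs throughout this literature; a toy sketch using the sparse symmetric eigensolver with shift-invert to target the lowest modes.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as sla

A = sp.random(500, 500, density=0.01, random_state=0)
A = A + A.T                                           # toy symmetric graph
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A   # combinatorial Laplacian
evals, evecs = sla.eigsh(L, k=16, sigma=-1e-8, which='LM')  # 16 lowest modes
```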
Article
Full-text available
Mesh is an important and powerful type of data for 3D shapes, widely studied in computer vision and computer graphics. For the task of 3D shape representation, extensive research efforts have concentrated on representing 3D shapes with volumetric grids, multi-view images, and point clouds. However, mesh data have received little attention in recent years due to their complexity and irregularity. In this paper, we propose a mesh neural network, named MeshNet, to learn 3D shape representations from mesh data. In this method, face-unit and feature splitting are introduced, and a general architecture with effective blocks is proposed. In this way, MeshNet is able to handle the complexity and irregularity of meshes and represent 3D shapes well. We have applied the proposed MeshNet method to the applications of 3D shape classification and retrieval. Experimental results and comparisons with state-of-the-art methods demonstrate that MeshNet achieves satisfactory 3D shape classification and retrieval performance, which indicates the effectiveness of the proposed method for 3D shape representation.
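As a rough illustration of face-unit inputs, the sketch below assembles per-face quantities (center, corner offsets, normal) in the spirit of MeshNet; the exact feature layout is an assumption, not the paper's specification.

```python
import numpy as np

def face_unit_features(verts, face):
    """Hypothetical MeshNet-style per-face input: face center, the three
    corner-to-center vectors, and the unit normal (13 values total)."""
    p = verts[face]                      # (3, 3) triangle corners
    center = p.mean(axis=0)
    corners = (p - center).ravel()       # corner offsets from the center
    n = np.cross(p[1] - p[0], p[2] - p[0])
    n = n / np.linalg.norm(n)            # unit face normal
    return np.concatenate([center, corners, n])
```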
Conference Paper
Full-text available
We propose a method for efficiently computing orientation-preserving and approximately continuous correspondences between non-rigid shapes, using the functional maps framework. We first show how orientation preservation can be formulated directly in the functional (spectral) domain without using landmark or region correspondences and without relying on external symmetry information. This allows us to obtain functional maps that promote orientation preservation, even when using descriptors that are invariant to orientation changes. We then show how higher quality, approximately continuous and bijective pointwise correspondences can be obtained from initial functional maps by introducing a novel refinement technique that aims to simultaneously improve the maps both in the spectral and spatial domains. This leads to a general pipeline for computing correspondences between shapes that results in high-quality maps, while admitting an efficient optimization scheme. We show through extensive evaluation that our approach improves upon state-of-the-art results on challenging isometric and non-isometric correspondence benchmarks according to both measures of continuity and coverage as well as producing semantically meaningful correspondences as measured by the distance to ground truth maps.
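For context, the descriptor-preservation core that such functional map pipelines build on is a plain linear least-squares problem; a generic sketch, not the paper's full orientation-aware energy.

```python
import numpy as np

def fit_functional_map(A, B):
    """Solve min_C ||C A - B||_F^2, where A (k1, d) and B (k2, d) hold the
    spectral coefficients of d corresponding descriptors on source and target."""
    return np.linalg.lstsq(A.T, B.T, rcond=None)[0].T  # C has shape (k2, k1)
```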
Conference Paper
Full-text available
We present PPF-FoldNet for unsupervised learning of 3D local descriptors on pure point cloud geometry. Based on folding-based auto-encoding of the well-known point pair features, PPF-FoldNet offers many desirable properties: it requires neither supervision nor a sensitive local reference frame, benefits from point-set sparsity, is end-to-end, fast, and can extract powerful rotation-invariant descriptors. Thanks to a novel feature visualization, its evolution can be monitored to provide interpretable insights. Our extensive experiments demonstrate that despite having six degree-of-freedom invariance and lacking training labels, our network achieves state-of-the-art results on standard benchmark datasets and outperforms its competitors when rotations and varying point densities are present. PPF-FoldNet achieves 9% higher recall on standard benchmarks, 23% higher recall when rotations are introduced into the same datasets, and finally a margin of >35% when point density is significantly decreased.
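The underlying point pair feature is simple to state; a sketch of the classic 4D feature (after Drost et al.) that PPF-FoldNet auto-encodes, rotation- and translation-invariant by construction.

```python
import numpy as np

def angle(a, b):
    """Unsigned angle between two 3-vectors."""
    a = a / np.linalg.norm(a); b = b / np.linalg.norm(b)
    return np.arccos(np.clip(a.dot(b), -1.0, 1.0))

def point_pair_feature(p1, n1, p2, n2):
    """4D point pair feature of two oriented points (positions p, normals n):
    (||d||, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = p2 - p1."""
    d = p2 - p1
    return np.array([np.linalg.norm(d), angle(n1, d), angle(n2, d), angle(n1, n2)])
```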
Article
The efficient treatment of long-range interactions (LRIs) for point clouds is a challenging problem in many scientific machine learning applications. To extract global information, one usually needs a large window size, a large number of layers, and/or a large number of channels. This can often significantly increase the computational cost. In this work, we present a novel neural network layer that directly incorporates long-range information for a point cloud. This layer, dubbed the long-range convolutional (LRC)-layer, leverages the convolutional theorem coupled with the non-uniform Fourier transform. In a nutshell, the LRC-layer mollifies the point cloud to an adequately sized regular grid, computes its Fourier transform, multiplies the result by a set of trainable Fourier multipliers, computes the inverse Fourier transform, and finally interpolates the result back to the point cloud. The resulting global all-to-all convolution operation can be performed in nearly-linear time asymptotically with respect to the number of input points. The LRC-layer is a particularly powerful tool when combined with local convolution as together they offer efficient and seamless treatment of both short- and long-range interactions. We showcase this framework by introducing a neural network architecture that combines LRC-layers with short-range convolutional layers to accurately learn the energy and force associated with an N-body potential. We also exploit the induced two-level decomposition and propose an efficient strategy to train the combined architecture with a reduced number of samples.
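A toy 1D sketch of the layer's pipeline, with nearest-bin scattering standing in for the mollified non-uniform Fourier transform; illustrative only.

```python
import numpy as np

def lrc_layer_1d(x, f, multipliers, n_grid=64):
    """Toy 1D long-range convolution: scatter point values onto a regular
    grid (a crude stand-in for the mollified NUFFT), filter in Fourier space
    with trainable multipliers, and gather back to the points.
    x : (N,) positions in [0, 1); f : (N,) values; multipliers : (n_grid,)."""
    idx = np.minimum((x * n_grid).astype(int), n_grid - 1)
    grid = np.zeros(n_grid)
    np.add.at(grid, idx, f)                    # mollify / bin the point cloud
    ghat = np.fft.fft(grid) * multipliers      # global convolution, O(n log n)
    out_grid = np.real(np.fft.ifft(ghat))
    return out_grid[idx]                       # interpolate back to the points
```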
Article
Constrained by the limitations of learning toolkits engineered for other applications, such as those in image processing, many mesh-based learning algorithms employ data flows that would be atypical from the perspective of conventional geometry processing. As an alternative, we present a technique for learning from meshes built from standard geometry processing modules and operations. We show that low-order eigenvalue/eigenvector computation from operators parameterized using discrete exterior calculus is amenable to efficient approximate backpropagation, yielding spectral per-element or per-mesh features with similar formulas to classical descriptors like the heat/wave kernel signatures. Our model uses few parameters, generalizes to high-resolution meshes, and exhibits performance and time complexity on par with past work.
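The heat kernel signature mentioned above has a one-line spectral formula; a sketch computing it from precomputed low-order eigenpairs.

```python
import numpy as np

def heat_kernel_signature(evals, evecs, times):
    """hks[v, t] = sum_i exp(-lambda_i * t) * phi_i(v)^2, built from the
    low-order eigenpairs the abstract refers to.
    evals : (k,); evecs : (V, k); times : (T,). Returns (V, T)."""
    return (evecs[:, None, :] ** 2 * np.exp(-np.outer(times, evals))[None]).sum(-1)
```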
Article
This paper introduces a novel geometric multigrid solver for unstructured curved surfaces. Multigrid methods are highly efficient iterative methods for solving systems of linear equations. Despite the success in solving problems defined on structured domains, generalizing multigrid to unstructured curved domains remains a challenging problem. The critical missing ingredient is a prolongation operator to transfer functions across different multigrid levels. We propose a novel method for computing the prolongation for triangulated surfaces based on intrinsic geometry, enabling an efficient geometric multigrid solver for curved surfaces. Our surface multigrid solver achieves better convergence than existing multigrid methods. Compared to direct solvers, our solver is orders of magnitude faster. We evaluate our method on many geometry processing applications and a wide variety of complex shapes with and without boundaries. By simply replacing the direct solver, we upgrade existing algorithms to interactive frame rates, and shift the computational bottleneck away from solving linear systems.
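Once a prolongation operator P is available, the surrounding multigrid machinery is standard; a dense two-level sketch with damped-Jacobi smoothing and a Galerkin coarse operator (illustrative, not the paper's solver).

```python
import numpy as np

def two_level_cycle(A, b, x, P, smooth_iters=5, omega=0.6):
    """One two-level multigrid cycle on a dense toy system A x = b.
    P : (n_fine, n_coarse) prolongation, the ingredient the paper constructs
    intrinsically for curved surfaces."""
    Dinv = 1.0 / A.diagonal()
    for _ in range(smooth_iters):                 # pre-smooth (damped Jacobi)
        x = x + omega * Dinv * (b - A @ x)
    r = b - A @ x                                 # restrict the residual
    Ac = P.T @ A @ P                              # Galerkin coarse operator
    x = x + P @ np.linalg.solve(Ac, P.T @ r)      # coarse-grid correction
    for _ in range(smooth_iters):                 # post-smooth
        x = x + omega * Dinv * (b - A @ x)
    return x
```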
Article
Recently, many deep neural networks were designed to process 3D point clouds, but a common drawback is that rotation invariance is not ensured, leading to poor generalization to arbitrary orientations. In this paper, we introduce a new low-level purely rotation-invariant representation to replace common 3D Cartesian coordinates as the network inputs. Also, we present a network architecture to embed these representations into features, encoding local relations between points and their neighbors, and the global shape structure. To alleviate the inevitable global information loss caused by the rotation-invariant representations, we further introduce a region relation convolution to encode local and non-local information. We evaluate our method on multiple point cloud analysis tasks, including (i) shape classification, (ii) part segmentation, and (iii) shape retrieval. Extensive experimental results show that our method consistently achieves the best performance on inputs at arbitrary orientations, compared with all the state-of-the-art methods.
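As a generic illustration of such low-level inputs, the sketch below encodes a point pair by distances and angles only, which are unchanged under any global rotation; the paper designs its own specific representation, so treat this as an illustrative stand-in.

```python
import numpy as np

def rotation_invariant_rep(p, q, c):
    """Generic rotation-invariant encoding of a point pair (p, q) relative to
    the shape centroid c: pairwise distances and angles only."""
    def ang(a, b):
        den = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
        return np.arccos(np.clip(a.dot(b) / den, -1.0, 1.0))
    dp, dq, dpq = p - c, q - c, q - p
    return np.array([np.linalg.norm(dp), np.linalg.norm(dq), np.linalg.norm(dpq),
                     ang(dp, dq), ang(dp, dpq), ang(dq, dpq)])
```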
Article
Researchers are pushing beyond the limitations of convolutional neural networks using geometric deep learning techniques.
Article
Most attempts to represent 3D shapes for deep learning have focused on volumetric grids, multi-view images and point clouds. In this paper we look at the most popular representation of 3D shapes in computer graphics, the triangular mesh, and ask how it can be utilized within deep learning. The few attempts to answer this question propose to adapt convolutions & pooling to suit Convolutional Neural Networks (CNNs). This paper proposes a very different approach, termed MeshWalker, to learn the shape directly from a given mesh. The key idea is to represent the mesh by random walks along the surface, which "explore" the mesh's geometry and topology. Each walk is organized as a list of vertices, which in some manner imposes regularity on the mesh. The walk is fed into a Recurrent Neural Network (RNN) that "remembers" the history of the walk. We show that our approach achieves state-of-the-art results for two fundamental shape analysis tasks: shape classification and semantic segmentation. Furthermore, even a very small number of examples suffices for learning. This is highly important, since large datasets of meshes are difficult to acquire.
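A sketch of the walk serialization, assuming an adjacency-list mesh graph and the paper's stated preference for stepping to unvisited vertices.

```python
import numpy as np

def random_walk(adj, start, length, rng=np.random.default_rng()):
    """Serialize a mesh as a random walk: step to a uniformly random neighbor,
    preferring unvisited vertices (falling back to any neighbor when stuck).
    adj : list of neighbor-index lists; returns the visited vertex sequence."""
    walk, visited = [start], {start}
    for _ in range(length - 1):
        nbrs = adj[walk[-1]]
        fresh = [v for v in nbrs if v not in visited]
        nxt = int(rng.choice(fresh if fresh else nbrs))
        walk.append(nxt); visited.add(nxt)
    return walk
```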
Chapter
We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations, as well as invariant to permutations of the input points. The operator receives a sparse set of local reference frames, computed from an input point cloud and establishes end-to-end transformation equivariance through a novel dynamic routing procedure on quaternions. Further, we theoretically connect dynamic routing between capsules to the well-known Weiszfeld algorithm, a scheme for solving iterative re-weighted least squares (IRLS) problems with provable convergence properties. It is shown that such group dynamic routing can be interpreted as robust IRLS rotation averaging on capsule votes, where information is routed based on the final inlier scores. Based on our operator, we build a capsule network that disentangles geometry from pose, paving the way for more informative descriptors and a structured latent space. Our architecture allows joint object classification and orientation estimation without explicit supervision of rotations. We validate our algorithm empirically on common benchmark datasets.
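The Weiszfeld scheme the paper connects dynamic routing to is short enough to state in full; the classic IRLS iteration for the geometric median.

```python
import numpy as np

def weiszfeld(points, iters=50, eps=1e-9):
    """Weiszfeld iteration for the geometric median of a point set: re-weight
    each point by the inverse distance to the current estimate and average.
    points : (N, d). Returns the (d,) median estimate."""
    y = points.mean(axis=0)
    for _ in range(iters):
        w = 1.0 / np.maximum(np.linalg.norm(points - y, axis=1), eps)
        y = (w[:, None] * points).sum(axis=0) / w.sum()
    return y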
Chapter
We present the first spatial-spectral joint consistency network for self-supervised dense correspondence mapping between non-isometric shapes. The task of alignment in non-Euclidean domains is one of the most fundamental and crucial problems in computer vision. As 3D scanners can generate highly complex and dense models, the mission of finding dense mappings between those models is vital. The novelty of our solution is based on a cyclic mapping between metric spaces, where the distance between a pair of points should remain invariant after the full cycle. As the same learnable rules that generate the point-wise descriptors apply in both directions, the network learns invariant structures without any labels while coping with non-isometric deformations. We show here state-of-the-art results by a large margin for a variety of tasks compared to known self-supervised and supervised methods.
Article
This paper is concerned with a fundamental problem in geometric deep learning that arises in the construction of convolutional neural networks on surfaces. Due to curvature, the transport of filter kernels on surfaces results in a rotational ambiguity, which prevents a uniform alignment of these kernels on the surface. We propose a network architecture for surfaces that consists of vector-valued, rotation-equivariant features. The equivariance property makes it possible to locally align features, which were computed in arbitrary coordinate systems, when aggregating features in a convolution layer. The resulting network is agnostic to the choices of coordinate systems for the tangent spaces on the surface. We implement our approach for triangle meshes. Based on circular harmonic functions, we introduce convolution filters for meshes that are rotation-equivariant at the discrete level. We evaluate the resulting networks on shape correspondence and shape classification tasks and compare their performance to other approaches.
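The equivariance at the heart of the construction can be shown in miniature: a degree-m circular-harmonic feature stored as a complex number transforms under a tangent-frame rotation by a simple phase, which is what allows features computed in arbitrary frames to be aligned before aggregation (a generic sketch).

```python
import numpy as np

def align_feature(f, m, theta):
    """A degree-m circular-harmonic feature f (complex scalar) expressed in a
    tangent frame rotated by theta picks up the phase exp(i * m * theta)."""
    return np.exp(1j * m * theta) * f
```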
Conference Paper
Graph convolution is the core of most Graph Neural Networks (GNNs) and usually approximated by message passing between direct (one-hop) neighbors. In this work, we remove the restriction of using only the direct neighbors by introducing a powerful, yet spatially localized graph convolution: Graph diffusion convolution (GDC). GDC leverages generalized graph diffusion, examples of which are the heat kernel and personalized PageRank. It alleviates the problem of noisy and often arbitrarily defined edges in real graphs. We show that GDC is closely related to spectral-based models and thus combines the strengths of both spatial (message passing) and spectral methods. We demonstrate that replacing message passing with graph diffusion convolution consistently leads to significant performance improvements across a wide range of models on both supervised and unsupervised tasks and a variety of datasets. Furthermore, GDC is not limited to GNNs but can trivially be combined with any graph-based model or algorithm (e.g. spectral clustering) without requiring any changes to the latter or affecting its computational complexity. Our implementation is available online.
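A dense sketch of GDC with the personalized PageRank kernel, one of the two closed-form diffusions named above, including the sparsification step that keeps the operator spatially localized.

```python
import numpy as np

def gdc_ppr(A, alpha=0.15, eps=1e-4):
    """Graph diffusion convolution with the personalized PageRank kernel:
    S = alpha * (I - (1 - alpha) * T)^(-1), T the symmetric transition matrix,
    followed by thresholding small entries. A : (n, n) adjacency (dense toy)."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    T = D_inv_sqrt @ A @ D_inv_sqrt
    S = alpha * np.linalg.inv(np.eye(len(A)) - (1 - alpha) * T)
    S[S < eps] = 0.0                              # sparsify: keep it localized
    return S
```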
Article
We present a simple and efficient method for refining maps or correspondences by iterative upsampling in the spectral domain that can be implemented in a few lines of code. Our main observation is that high quality maps can be obtained even if the input correspondences are noisy or are encoded by a small number of coefficients in a spectral basis. We show how this approach can be used in conjunction with existing initialization techniques across a range of application scenarios, including symmetry detection, map refinement across complete shapes, non-rigid partial shape matching and function transfer. In each application we demonstrate an improvement with respect to both the quality of the results and the computational speed compared to the best competing methods, with up to two orders of magnitude speed-up in some applications. We also demonstrate that our method is both robust to noisy input and is scalable with respect to shape complexity. Finally, we present a theoretical justification for our approach, shedding light on structural properties of functional maps.
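True to the abstract's claim, the refinement fits in a few lines; a sketch of the iterative spectral upsampling loop, assuming precomputed Laplace-Beltrami bases and an initial pointwise map.

```python
import numpy as np
from scipy.spatial import cKDTree

def spectral_upsample(phi1, phi2, p2p, k0=20, k1=100, step=1):
    """Iteratively upsample a map in the spectral domain: alternate between a
    functional map of growing size and a pointwise map.
    phi1, phi2 : (V1, k1), (V2, k1) eigenbases of source and target;
    p2p : (V2,) initial map, p2p[j] = source vertex matched to target vertex j."""
    for k in range(k0, k1 + 1, step):
        # Functional map of size k from the current pointwise map
        C = np.linalg.lstsq(phi2[:, :k], phi1[p2p, :k], rcond=None)[0]
        # Pointwise map from the functional map via nearest neighbors
        p2p = cKDTree(phi1[:, :k]).query(phi2[:, :k] @ C)[1]
    return p2p
```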
Conference Paper
This paper proposes a unified and consistent set of flexible tools to approximate important geometric attributes, including normal vectors and curvatures on arbitrary triangle meshes. We present a consistent derivation of these first and second order differential properties using averaging Voronoi cells and the mixed Finite-Element/Finite-Volume method, and compare them to existing formulations. Building upon previous work in discrete geometry, these operators are closely related to the continuous case, guaranteeing an appropriate extension from the continuous to the discrete setting: they respect most intrinsic properties of the continuous differential operators. We show that these estimates are optimal in accuracy under mild smoothness conditions, and demonstrate their numerical quality. We also present applications of these operators, such as mesh smoothing, enhancement, and quality checking, and show results of denoising in higher dimensions, such as for tensor images.
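One representative estimate from this toolset is Gaussian curvature via the angle deficit; the sketch below substitutes a simple one-third barycentric area for the paper's mixed Voronoi area.

```python
import numpy as np

def angle_deficit_curvature(verts, faces):
    """Per-vertex Gaussian curvature estimate:
    kappa_G(v) = (2*pi - sum of incident triangle angles at v) / A(v),
    with A(v) a one-third face-area weight (a simplification of the mixed area)."""
    deficit = np.full(len(verts), 2 * np.pi)
    area = np.zeros(len(verts))
    for f in faces:
        p = verts[f]
        for a in range(3):
            u = p[(a + 1) % 3] - p[a]
            w = p[(a + 2) % 3] - p[a]
            cosang = u.dot(w) / (np.linalg.norm(u) * np.linalg.norm(w))
            deficit[f[a]] -= np.arccos(np.clip(cosang, -1.0, 1.0))
            area[f[a]] += np.linalg.norm(np.cross(u, w)) / 6.0  # third of face area
    return deficit / area
```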
Conference Paper
Graph convolutional networks have gained remarkable success in semi-supervised learning on graph-structured data. The key to graph-based semi-supervised learning is capturing the smoothness of labels or features over nodes exerted by the graph structure. Previous methods, both spectral and spatial, define graph convolution as a weighted average over neighboring nodes and then learn graph convolution kernels that leverage this smoothness to improve the performance of graph-based semi-supervised learning. One open challenge is how to determine an appropriate neighborhood that reflects the relevant information of smoothness manifested in the graph structure. In this paper, we propose GraphHeat, leveraging the heat kernel to enhance low-frequency filters and enforce smoothness in the signal variation on the graph. GraphHeat uses the local structure of the target node under heat diffusion to determine its neighboring nodes flexibly, without the fixed-order constraint suffered by previous methods. GraphHeat achieves state-of-the-art results in the task of graph-based semi-supervised classification across three benchmark datasets: Cora, Citeseer and Pubmed.
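The smoothing operator GraphHeat builds on is the heat kernel e^(-sL); applying its action to node features is a one-liner with SciPy (toy graph; the paper itself works with a polynomial approximation, so this is illustrative).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as sla

n = 200
A = sp.random(n, n, density=0.02, random_state=0)
A = A + A.T                                            # toy symmetric graph
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A    # combinatorial Laplacian
X = np.random.rand(n, 16)                              # node features
s = 1.0                                                # diffusion scale
X_smooth = sla.expm_multiply(-s * L, X)                # heat-kernel low-pass
```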
Article
Polygonal meshes provide an efficient representation for 3D shapes. They explicitly capture both shape surface and topology, and leverage non-uniformity to represent large flat regions as well as sharp, intricate features. This non-uniformity and irregularity, however, inhibits mesh analysis efforts using neural networks that combine convolution and pooling operations. In this paper, we utilize the unique properties of the mesh for a direct analysis of 3D shapes using MeshCNN, a convolutional neural network designed specifically for triangular meshes. Analogous to classic CNNs, MeshCNN combines specialized convolution and pooling layers that operate on the mesh edges, by leveraging their intrinsic geodesic connections. Convolutions are applied on edges and the four edges of their incident triangles, and pooling is applied via an edge collapse operation that retains surface topology, thereby generating new mesh connectivity for the subsequent convolutions. MeshCNN learns which edges to collapse, thus forming a task-driven process where the network exposes and expands the important features while discarding the redundant ones. We demonstrate the effectiveness of MeshCNN on various learning tasks applied to 3D meshes.
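The ordering ambiguity among an edge's four neighbors is resolved with symmetric functions before a shared filter is applied; a sketch of that convolution step, with shapes and names as assumptions.

```python
import numpy as np

def meshcnn_edge_conv(fe, fa, fb, fc, fd, W):
    """One MeshCNN-style edge convolution: combine the four incident edges
    into order-invariant pairs (sums and absolute differences of opposites),
    then apply a shared linear filter.
    fe..fd : (C,) features of the edge and its four neighbors; W : (C_out, 5*C)."""
    x = np.concatenate([fe, np.abs(fa - fc), fa + fc, np.abs(fb - fd), fb + fd])
    return W @ x
```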
Article
This article describes a method for efficiently computing parallel transport of tangent vectors on curved surfaces, or more generally, any vector-valued data on a curved manifold. More precisely, it extends a vector field defined over any region to the rest of the domain via parallel transport along shortest geodesics. This basic operation enables fast, robust algorithms for extrapolating level set velocities, inverting the exponential map, computing geometric medians and Karcher/Fréchet means of arbitrary distributions, constructing centroidal Voronoi diagrams, and finding consistently ordered landmarks. Rather than evaluate parallel transport by explicitly tracing geodesics, we show that it can be computed via a short-time heat flow involving the connection Laplacian. As a result, transport can be achieved by solving three prefactored linear systems, each akin to a standard Poisson problem. To implement the method, we need only a discrete connection Laplacian, which we describe for a variety of geometric data structures (point clouds, polygon meshes, etc.). We also study the numerical behavior of our method, showing empirically that it converges under refinement, and augment the construction of intrinsic Delaunay triangulations so that they can be used in the context of tangent vector field processing.
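A structural sketch of the three prefactorable solves the method reduces to, assuming a complex-valued connection Laplacian Lconn, a scalar Laplacian L, and a mass matrix M are already assembled; all variable names are hypothetical.

```python
import numpy as np
import scipy.sparse.linalg as sla

def vector_heat_step(Lconn, L, M, Y0, u0, phi0, t):
    """Three short-time heat flows, each a Poisson-like linear solve:
    Y0 : (n,) complex source vectors (tangent vectors as complex numbers);
    u0, phi0 : (n,) magnitude and indicator sources; t : short diffusion time."""
    Yt = sla.spsolve(M + t * Lconn, M @ Y0)    # 1. connection heat flow
    ut = sla.spsolve(M + t * L, M @ u0)        # 2. scalar flow for magnitudes
    phit = sla.spsolve(M + t * L, M @ phi0)    # 3. scalar flow for normalization
    return (Yt / np.abs(Yt)) * (ut / phit)     # renormalize, rescale transport
```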
Chapter
The question of representation of 3D geometry is of vital importance when it comes to leveraging the recent advances in the field of machine learning for geometry processing tasks. For common unstructured surface meshes, state-of-the-art methods rely on patch-based or mapping-based techniques that introduce resampling operations in order to encode neighborhood information in a structured and regular manner. We investigate whether such resampling can be avoided, and propose a simple and direct encoding approach. Not only does it increase processing efficiency due to its simplicity; its direct nature also avoids any loss in data fidelity. To evaluate the proposed method, we perform a number of experiments in the challenging domain of intrinsic, non-rigid shape correspondence estimation. In comparisons to current methods we observe that our approach is able to achieve highly competitive results.
Conference Paper
We propose a novel approach for performing convolution of signals on curved surfaces and show its utility in a variety of geometric deep learning applications. Key to our construction is the notion of directional functions defined on the surface, which extend the classic real-valued signals and which can be naturally convolved with real-valued template functions. As a result, rather than trying to fix a canonical orientation or only keeping the maximal response across all alignments of a 2D template at every point of the surface, as done in previous works, we show how information across all rotations can be kept across different layers of the neural network. Our construction, which we call multi-directional geodesic convolution, or directional convolution for short, allows, in particular, directional information to be propagated and related across layers and thus across different regions of the shape. We first define directional convolution in the continuous setting, prove its key properties and then show how it can be implemented in practice, for shapes represented as triangle meshes. We evaluate directional convolution in a wide variety of learning scenarios ranging from classification of signals on surfaces, to shape segmentation and shape matching, where we show a significant improvement over several baselines.
Chapter
We present a new deep learning approach for matching deformable shapes by introducing Shape Deformation Networks, which jointly encode 3D shapes and correspondences. This is achieved by factoring the surface representation into (i) a template, which parameterizes the surface, and (ii) a learnt global feature vector that parameterizes the transformation of the template into the input surface. By predicting this feature for a new shape, we implicitly predict correspondences between this shape and the template. We show that these correspondences can be improved by an additional step which refines the shape feature by minimizing the Chamfer distance between the input and the transformed template. We demonstrate that our simple approach improves on state-of-the-art results on the difficult FAUST-inter challenge, with an average correspondence error of 2.88 cm. We show, on the TOSCA dataset, that our method is robust to many types of perturbations, and generalizes to non-human shapes. This robustness allows it to perform well on real, unclean meshes from the SCAPE dataset.
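The Chamfer objective used in that refinement step is compact; a sketch over two point sets using SciPy k-d trees.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (N, 3) and Q (M, 3):
    mean squared nearest-neighbor distance, averaged in both directions."""
    d_pq = cKDTree(Q).query(P)[0]   # for each p in P, distance to nearest q
    d_qp = cKDTree(P).query(Q)[0]   # for each q in Q, distance to nearest p
    return (d_pq ** 2).mean() + (d_qp ** 2).mean()
```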