Table 1. Reconstruction Results (ModelNet10 test set): 99.39% true-positive and 92.36% true-negative reconstruction accuracy.

Source publication
Article
Full-text available
When working with three-dimensional data, choice of representation is key. We explore voxel-based models, and present evidence for the viability of voxellated representations in applications including shape modeling and object classification. Our key contributions are methods for training voxel-based variational autoencoders, a user interface for e...

Context in source publication

Context 1
... reconstruction accuracy of our fully-trained VAE, evaluated on the ModelNet10 test set, is displayed in Table 1. The model attains 99.39% true-positive and 92.36% true-negative reconstruction accuracy, indicating that it learns to reconstruct with high fidelity, but tends to slightly overestimate the probability of a voxel being present. ...
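These two rates can be read directly off a binarized reconstruction. A minimal NumPy sketch (the helper name and the 0.5 threshold are our own illustrative choices, not taken from the paper):

```python
import numpy as np

def reconstruction_rates(target, probs, threshold=0.5):
    """Voxel-wise true-positive and true-negative rates between a binary
    occupancy grid and predicted occupancy probabilities."""
    pred = probs >= threshold                 # binarize predicted occupancies
    occupied = target.astype(bool)
    tp_rate = pred[occupied].mean()           # occupied voxels correctly recovered
    tn_rate = (~pred[~occupied]).mean()       # empty voxels correctly left empty
    return tp_rate, tn_rate

# Toy check on a random 32x32x32 grid:
rng = np.random.default_rng(0)
target = rng.random((32, 32, 32)) < 0.1
probs = np.clip(target + rng.normal(0, 0.2, target.shape), 0, 1)
print(reconstruction_rates(target, probs))
```

Under this metric, a high true-positive rate paired with a lower true-negative rate is exactly what "overestimating the probability of a voxel being present" looks like.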

Similar publications

Preprint
Full-text available
The Latent Space Clustering in Generative Adversarial Networks (ClusterGAN) method has been successful with high-dimensional data. However, the method assumes uniformly distributed priors during the generation of modes, which is a restrictive assumption for real-world data and causes loss of diversity in the generated modes. In this paper, we propose sel...
Article
Full-text available
Sensor data analysis is used in many application areas, for example, the Artificial Intelligence of Things (AIoT), with the rapid development of deep neural network learning expanding its application areas. In this work, we propose a Depth- and Width-Changeable Deep Kernel Learning-based hyperspectral sensing data analysis algorithm. Compared wi...
Article
Full-text available
The state-of-the-art image classification models, generally including feature coding and pooling, have been widely adopted to generate discriminative and robust image representations. However, the coding schemes available in these models only preserve salient features, which results in information loss in the process of generating final image repres...
Article
Full-text available
Training a deep architecture using a ranking loss has become standard for the person re-identification task. Increasingly, these deep architectures include additional components that leverage part detections, attribute predictions, pose estimators and other auxiliary information, in order to more effectively localize and align discriminative image...
Chapter
Full-text available
We consider the semi-supervised multi-class classification problem of learning from sparse labelled and abundant unlabelled training data. To address this problem, existing semi-supervised deep learning methods often rely on the up-to-date “network-in-training” to formulate the semi-supervised learning objective. This ignores both the discriminativ...

Citations

... The projection-based method involves projecting three-dimensional spatial information into two-dimensional images to learn point cloud features (Su et al., 2015; Aksoy et al., 2020). Voxel-based methods involve converting point clouds into voxel grids, similar to image pixels, and then using deep convolutional networks for feature extraction (Maturana et al., 2015; Brock et al., 2016; Zhu et al., 2022). However, these methods can cause discretization artifacts and information loss during the data conversion or projection process, affecting the segmentation performance. ...
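For concreteness, the voxel conversion these methods rely on, and the discretization it entails, can be as simple as the following sketch (grid size and normalization scheme are illustrative assumptions):

```python
import numpy as np

def voxelize(points, grid=32):
    """Map an (N, 3) point cloud into a binary occupancy grid. Real
    pipelines may add padding, density counts, or surface dilation."""
    # Normalize into the unit cube, then snap to grid coordinates; this
    # rounding is where discretization artifacts and information loss arise.
    mins, maxs = points.min(0), points.max(0)
    idx = ((points - mins) / (maxs - mins + 1e-9) * (grid - 1)).round().astype(int)
    vox = np.zeros((grid, grid, grid), dtype=bool)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return vox

print(voxelize(np.random.rand(2048, 3)).sum(), "occupied voxels")
```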
... The KL divergence loss ensures that the learned latent distribution is penalized if it deviates from the prior distribution. We use a specialized form of Binary Cross-Entropy (BCE) proposed by Brock et al. [2] as the reconstruction loss for the VAE model. ...
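Putting the two terms together, a minimal PyTorch sketch of such a VAE objective, assuming a Gaussian posterior and the γ-weighted BCE described in a citation further below (γ = 0.97 is an assumed setting; the original formulation also rescales targets, which this sketch omits):

```python
import torch

def weighted_bce(o, t, gamma=0.97, eps=1e-7):
    """gamma-weighted binary cross-entropy over voxel occupancies: gamma > 0.5
    penalizes false negatives (missed occupied voxels) more than false positives."""
    o = o.clamp(eps, 1 - eps)
    return -(gamma * t * torch.log(o)
             + (1 - gamma) * (1 - t) * torch.log(1 - o)).mean()

def vae_loss(o, t, mu, logvar, gamma=0.97):
    """Reconstruction term plus the analytic KL divergence between the learned
    Gaussian posterior N(mu, sigma^2) and a standard normal prior."""
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return weighted_bce(o, t, gamma) + kl
```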
Preprint
Full-text available
Motion planning for robot manipulators in a cluttered environment is one of the most researched areas in the field of robotics. For high degrees of freedom (DOF) robotic systems, optimization-based motion planners are often preferred for trajectory planning, as they are computationally efficient and provide a smooth, locally optimal solution. To converge to an optimal solution, an optimization-based motion planner needs a good initial trajectory. Finding an initial trajectory that is not far from the basin of attraction of the optimum is not a trivial task. In this work, we propose an Initial Trajectory Prediction Network (ITPNet), a deep neural network framework for predicting an initial trajectory to warm-start optimization-based motion planners. Given a planning task in the form of task and environment features, ITPNet predicts the best initial trajectory for warm-starting an optimization-based motion planner. Two task-feature representations (joint and Cartesian) and three environment-feature extraction techniques (Principal Component Analysis (PCA), Variational Autoencoder (VAE), and Signed Spatial Distance (SSD)) are compared. The learned models are evaluated on an upper-torso humanoid system in two different scenarios. The results show that the model using the Cartesian task features and the SSD-based environment features efficiently learns the mapping between planning tasks and optimal trajectories. Warm-starting the planner with the predicted initial trajectory, even in an unseen environment, results in a higher success rate and requires fewer iterations.
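The core mapping ITPNet learns, from task and environment features to a warm-start trajectory, can be pictured with a small regression head; everything below (layer sizes, waypoint count, joint count) is a hypothetical sketch, not the published architecture:

```python
import torch
import torch.nn as nn

class InitTrajectoryMLP(nn.Module):
    """Hypothetical warm-start predictor in the spirit of ITPNet: maps a
    concatenated (task, environment) feature vector to a fixed number of
    joint-space waypoints for seeding an optimization-based planner."""
    def __init__(self, feat_dim=64, n_waypoints=20, n_joints=7):
        super().__init__()
        self.n_waypoints, self.n_joints = n_waypoints, n_joints
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_waypoints * n_joints),
        )

    def forward(self, features):
        out = self.net(features)
        return out.view(-1, self.n_waypoints, self.n_joints)

traj = InitTrajectoryMLP()(torch.randn(1, 64))   # -> (1, 20, 7) initial trajectory
```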
... In order to bring the advantages of deep learning to point cloud processing, previous researchers have tried various methods for handling point cloud data, such as voxelizing the point cloud [8,9] or using multi-view images [10,11], so that classical CNN architectures could be applied. However, these transformation operations produce a large amount of information loss when converting 3D data to 2D data, and the transformed data are complex and computationally intensive, so they are not the best approach for point cloud processing. The PointNet [12] network, proposed by Stanford University in 2017, takes point cloud data directly as input; it effectively extracts individual and global features of the point cloud and solves the disorder and irregularity problems of point cloud data, but it still cannot extract local features of the point cloud. ...
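The order-invariance trick at the heart of PointNet, a shared per-point MLP followed by a symmetric max-pool, fits in a few lines; this is a toy sketch rather than the published architecture (which adds input and feature transforms):

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Minimal PointNet-style classifier: a per-point MLP shared across all
    points, then a max-pool that makes the global feature order-invariant."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.point_mlp = nn.Sequential(        # applied to every point independently
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 1024), nn.ReLU(),
        )
        self.head = nn.Linear(1024, n_classes)

    def forward(self, pts):                    # pts: (batch, n_points, 3)
        feats = self.point_mlp(pts)            # (batch, n_points, 1024)
        global_feat = feats.max(dim=1).values  # symmetric, order-invariant pooling
        return self.head(global_feat)

logits = TinyPointNet()(torch.randn(2, 1024, 3))   # -> (2, 10)
```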
Preprint
Full-text available
In the field of deep learning, point clouds are used as the basic input format for 3D data, which can provide detailed geometric information about objects in the original 3D space. PointNet++ is a deep learning network that uses point cloud data as its input format, avoiding the losses associated with converting a point cloud into a 3D voxelization or a collection of 2D images. Although PointNet++ can directly process point cloud data in various ways, the disordered, irregular, and unevenly distributed nature of point cloud data means the extracted point cloud features are not ideal, and the large volume of point cloud data also causes the trained model to fall into local optima, which affects the training results. To address these problems, some effective methods and strategies have emerged in recent years. In this thesis, three methods are proposed on the basis of the PointNet++ network: feature-similarity-based attention pooling, an adaptive regularization term, and a fixed-random-seed method to improve the performance of PointNet++. Experiments show that the improvement methods proposed in this paper effectively improve feature extraction accuracy, which in turn improves the classification accuracy of PointNet++ on the ModelNet40 dataset, with an overall improvement of 0.68% compared with PointNet++.
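Of the three methods, the feature-similarity-based attention pooling admits a compact illustration; the formulation below is our own guess at the idea (weighting points by how well their features agree with the mean feature), not the paper's equations:

```python
import torch
import torch.nn.functional as F

def similarity_attention_pool(feats):
    """Pool (B, N, C) point features into (B, C): points whose features align
    with the mean feature receive larger softmax pooling weights."""
    mean = feats.mean(dim=1, keepdim=True).expand_as(feats)
    weights = torch.softmax(F.cosine_similarity(feats, mean, dim=-1), dim=1)
    return (weights.unsqueeze(-1) * feats).sum(dim=1)

pooled = similarity_attention_pool(torch.randn(2, 1024, 128))  # -> (2, 128)
```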
... [22] uses a B-spline wavelet to represent the geometry and attributes of the compressed point cloud. [25] points out that converting 3D point clouds into 2D maps and using conventional algorithms for compression will lead to the loss of key feature information. ...
Article
Digital twin technology has recently gathered pace in engineering communities as it allows for the convergence of the real structure and its digital counterpart. 3D point cloud data is a more effective way to describe the real world and to reconstruct the digital counterpart than conventional 2D images or 360-degree images. Large-scale, e.g., city-scale digital twins typically collect point cloud data via internet-of-things (IoT) devices and transmit it over wireless networks. However, existing wireless transmission technology cannot carry real-time point cloud transmission for digital twin reconstruction due to the mass data volume, high processing overheads, and low delay tolerance. We propose a novel artificial intelligence (AI) powered end-to-end framework, termed AIRec, for efficient digital twin communication spanning point cloud compression, wireless channel coding, and digital twin reconstruction. AIRec adopts an encoder-decoder architecture. In the encoder, a novel importance-aware pooling scheme is designed to adaptively select important points with learnable thresholds to reduce the transmission volume. We also design a novel noise-aware joint source and channel coding scheme that adaptively adjusts the transmission strategy based on SNR and maps the features to error-resilient channel symbols for wireless transmission, achieving a good tradeoff between transmission rate and reconstruction quality. The decoder can accurately reconstruct the digital twins from the received symbols. Extensive experiments on typical datasets and comparisons with baselines show that we achieve good reconstruction quality under a 24× compression ratio.
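AIRec's design is only described at the level above, but "importance-aware pooling with learnable thresholds" suggests a gating pattern along these lines (a speculative sketch; all names and shapes are ours):

```python
import torch
import torch.nn as nn

class ImportancePool(nn.Module):
    """Illustrative importance-aware pooling: a learned scalar score per
    point, with a learnable threshold softly gating which points survive."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)
        self.threshold = nn.Parameter(torch.zeros(1))  # learnable cutoff

    def forward(self, feats):                    # feats: (B, N, C)
        s = self.score(feats).squeeze(-1)        # (B, N) importance scores
        gate = torch.sigmoid(s - self.threshold) # soft keep/drop decision
        return feats * gate.unsqueeze(-1)        # down-weight unimportant points

out = ImportancePool()(torch.randn(2, 512, 128))
```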
... Self-supervised learning algorithms employ unlabeled data to infer the hidden structure, which is particularly advantageous when dealing with high-dimensional data, such as images and 3D models, which are significant parameters in building stock energy simulation, yet remain relatively unexplored. Recent studies include Generative Adversarial Networks (GANs) (Wu et al. 2017), Variational Autoencoders (VAEs) (Brock et al. 2016), Vector Quantized Variational Autoencoders (VQ-VAEs) (Oord, Vinyals, and Kavukcuoglu 2018), and Auto Decoders (Park et al. 2019). These methods employ the latent space as a dimension-reduction tool to learn useful representations of the input dataset and generate new data that has the implicit features of the datasets (Kleineberg, Fey, and Weichert 2020; Zhuang et al. 2023). ...
... Despite early research suggesting that data compression and the use of massively parallel systems outperformed raw processing of high-dimensional data, we are currently seeing the reverse trend in our data analysis. HAR outperforms 2D projection techniques for object detection, according to Brock et al. and Carreira and Zisserman [47,48]. Fusion across several processing layers and stages appears to outperform all other techniques. ...
Article
Full-text available
Deep learning (DL) has revolutionized advanced digital picture processing, enabling significant advancements in computer vision (CV). However, it is important to note that older CV techniques, developed prior to the emergence of DL, still hold value and relevance. Particularly in the realm of more complex, three-dimensional (3D) data such as video and 3D models, CV and multimedia retrieval remain at the forefront of technological advancements. We provide critical insights into the progress made in developing higher-dimensional qualities through the application of DL, and also discuss the advantages and strategies employed in DL. With the widespread use of 3D sensor data and 3D modeling, the analysis and representation of the world in three dimensions have become commonplace. This progress has been facilitated by the development of additional sensors, driven by advancements in areas such as 3D gaming and self-driving vehicles. These advancements have enabled researchers to create feature description models that surpass traditional two-dimensional approaches. This study reveals the current state of advanced digital picture processing, highlighting the role of DL in pushing the boundaries of CV and multimedia retrieval in handling complex, 3D data.
... This approach enables the same CNN operations used for images to be easily extended to three-dimensional grids, allowing for a seamless transfer of traditional image-based techniques to the realm of shapes. Wu et al. [21] were the first to explore this idea for 3D shapes, and subsequent works have expanded on this approach [22,23,24,25,26]. However, volumetric representations and image-based methods demand significant computational resources and extensive memory usage. ...
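The "seamless transfer" is visible in code: the image convolution operator simply gains one spatial axis (shapes below are illustrative):

```python
import torch
import torch.nn as nn

# A 2D image convolution, extended by one spatial axis to a voxel grid.
conv3d = nn.Conv3d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
voxels = torch.rand(8, 1, 32, 32, 32)    # batch of 32^3 occupancy grids
features = conv3d(voxels)                # -> (8, 16, 32, 32, 32)
```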
Preprint
Full-text available
Computational Fluid Dynamics (CFD) is widely used in different engineering fields, but accurate simulations are dependent upon proper meshing of the simulation domain. While highly refined meshes may ensure precision, they come with high computational costs. Similarly, adaptive remeshing techniques require multiple simulations and come at a great computational cost. This means that the meshing process is reliant upon expert knowledge and years of experience. Automating mesh generation can save significant time and effort and lead to a faster and more efficient design process. This paper presents a machine learning-based scheme that utilizes Graph Neural Networks (GNN) and expert guidance to automatically generate CFD meshes for aircraft models. In this work, we introduce a new 3D segmentation algorithm that outperforms two state-of-the-art models, PointNet++ and PointMLP, for surface classification. We also present a novel approach to project predictions from 3D mesh segmentation models to CAD surfaces using the conformal predictions method, which provides marginal statistical guarantees and robust uncertainty quantification and handling. We demonstrate that the addition of conformal predictions effectively enables the model to avoid under-refinement, hence failure, in CFD meshing even for weak and less accurate models. Finally, we demonstrate the efficacy of our approach through a real-world case study that demonstrates that our automatically generated mesh is comparable in quality to expert-generated meshes and enables the solver to converge and produce accurate results. Furthermore, we compare our approach to the alternative of adaptive remeshing in the same case study and find that our method is 5 times faster in the overall process of simulation. The code and data for this project are made publicly available at https://github.com/ahnobari/AutoSurf.
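The conformal-predictions component mentioned here follows, in its generic split-conformal form, a calibrate-then-threshold recipe; the paper's exact procedure may differ from this sketch:

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold so that prediction sets cover the true
    class with probability >= 1 - alpha (split conformal prediction)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]     # nonconformity
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)   # finite-sample correction
    return np.quantile(scores, level, method="higher")

def prediction_sets(test_probs, q):
    """Keep every class whose nonconformity score is within the threshold."""
    return [np.where(1.0 - p <= q)[0] for p in test_probs]
```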
... Recently, approaches that focus on learning point features with neural networks have been widely investigated to overcome such issues. One category of methods projects the point cloud to a regular representation, e.g., voxel grids [8], multi-view 2D images [9,10], or a combination [11], so that 2D/3D convolutional operations can be performed on top of these intermediate data in Euclidean space. Apart from these, PointNet [12] leverages a multilayer perceptron (MLP) to obtain pointwise features and a max-pool to aggregate global features, but without local information. ...
Article
Full-text available
Purpose: Middle ear infection is the most prevalent inflammatory disease, especially among the pediatric population. Current diagnostic methods are subjective and depend on visual cues from an otoscope, which is limited for otologists to identify pathology. To address this shortcoming, endoscopic optical coherence tomography (OCT) provides both morphological and functional in vivo measurements of the middle ear. However, due to the shadow of prior structures, interpretation of OCT images is challenging and time-consuming. To facilitate fast diagnosis and measurement, improvement in the readability of OCT data is achieved by merging morphological knowledge from ex vivo middle ear models with OCT volumetric data, so that OCT applications can be further promoted in daily clinical settings. Methods: We propose C2P-Net: a two-stage non-rigid registration pipeline for complete to partial point clouds, which are sampled from ex vivo and in vivo OCT models, respectively. To overcome the lack of labeled training data, a fast and effective generation pipeline in Blender3D is designed to simulate middle ear shapes and extract in vivo noisy and partial point clouds. Results: We evaluate the performance of C2P-Net through experiments on both synthetic and real OCT datasets. The results demonstrate that C2P-Net generalizes to unseen middle ear point clouds and is capable of handling realistic noise and incompleteness in synthetic and real OCT data. Conclusions: In this work, we aim to enable diagnosis of middle ear structures with the assistance of OCT images. We propose C2P-Net: a two-stage non-rigid registration pipeline for point clouds to support the interpretation of in vivo noisy and partial OCT images for the first time. Code is available at: https://gitlab.com/nct_tso_public/c2p-net.
... Modeling of 3D shapes has been a central research topic in the computer graphics domain for decades. However, although great progress has been made since the seminal work of ShapeNet [5], most existing 3D modeling methods only focus on full shape modeling and are part relation-oblivious [3,15,41]. Since no semantic information is used in those methods, details of the 3D shapes, especially those of small volume yet exquisite parts, are often badly generated in a blurred or shattered manner. ...
... Since the seminal work of ShapeNet [5], which is a large 3D shape model dataset that is publicly available, lots of research has been conducted in building 3D generative models for modeling 3D shapes. Brock et al. [3] use a 3D VAE neural network to encode and decode 3D shapes. Wu et al. use a 3D-GAN [41] to generate 3D shapes from latent vectors. ...
... (1) For the part reconstruction loss L_part, following [3,40], a modified binary cross-entropy loss is used, introducing a hyper-parameter γ that weights the relative importance of false positives against false negatives. ...
Preprint
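The "dedicated network layers that can generate transformation matrices for each part" reduce, in the simplest reading, to a per-part regression head; the sketch below is our own illustration, not the VoxAttention architecture:

```python
import torch
import torch.nn as nn

class PartTransformHead(nn.Module):
    """From a per-part feature vector, predict a flattened 3x4 affine
    transform placing each decomposed part into the assembled shape."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 12)               # 3x4 = rotation + translation

    def forward(self, part_feats):                      # (B, P, feat_dim)
        B, P, _ = part_feats.shape
        return self.fc(part_feats).view(B, P, 3, 4)     # one transform per part

transforms = PartTransformHead()(torch.randn(2, 8, 256))  # -> (2, 8, 3, 4)
```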
Modeling a 3D volumetric shape as an assembly of decomposed shape parts is much more challenging, but semantically more valuable than direct reconstruction from a full shape representation. The neural network needs to implicitly learn part relations coherently, which is typically performed by dedicated network layers that can generate transformation matrices for each part. In this paper, we propose a VoxAttention network architecture for attention-based part assembly. We further propose a variant of using channel-wise part attention and show the advantages of this approach. Experimental results show that our method outperforms most state-of-the-art methods for the part relation-aware 3D shape modeling task.