Article

Real-Time 3D Model Acquisition

Authors: Szymon Rusinkiewicz, Olaf Hall-Holt, Marc Levoy

Abstract

The digitization of the 3D shape of real objects is a rapidly expanding field, with applications in entertainment, design, and archaeology. We propose a new 3D model acquisition system that permits the user to rotate an object by hand and see a continuously-updated model as the object is scanned. This tight feedback loop allows the user to find and fill holes in the model in real time, and determine when the object has been completely covered. Our system is based on a 60 Hz structured-light rangefinder, a real-time variant of ICP (iterative closest points) for alignment, and point-based merging and rendering algorithms. We demonstrate the ability of our prototype to scan objects faster and with greater ease than conventional model acquisition pipelines.
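For orientation, the alignment stage of such a pipeline alternates correspondence search with a closed-form rigid-body fit. Below is a minimal single-iteration sketch in Python (illustrative only: brute-force nearest neighbors and a Kabsch/SVD fit; the paper's real-time ICP variant replaces this expensive correspondence search with much faster matching):

    import numpy as np
    from scipy.spatial import cKDTree

    def icp_step(src, dst):
        """One ICP iteration: match each source point to its nearest
        destination point, then solve the best-fit rigid transform."""
        matched = dst[cKDTree(dst).query(src)[1]]          # correspondences
        src_c, dst_c = src.mean(axis=0), matched.mean(axis=0)
        H = (src - src_c).T @ (matched - dst_c)            # cross-covariance
        U, _, Vt = np.linalg.svd(H)                        # Kabsch solution
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                           # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = dst_c - R @ src_c
        return src @ R.T + t                               # moved source cloud

Iterating this step until the alignment error stops improving is the classical ICP loop the abstract refers to.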

... In general, to obtain multiple measurements of the tested object from different views, existing methods can be classified into three main categories: methods based on a turntable [9,10], methods based on a movable robot arm [11,12], and measurement systems with plane mirrors [13][14][15]. In the first category, the tested object is placed on a turntable and the whole 3D data set is acquired over multiple rotations. ...
... Likewise, the other two formulas can be derived from Eqs. (10), (11), and (12): ...
... Next, Eqs. (10), (11), and (12) can be rewritten again based on the Levenberg-Marquardt algorithm: ...
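The quoted Eqs. (10), (11), and (12) are not reproduced on this page, but the refinement step the excerpt describes is a standard nonlinear least-squares fit. A hedged sketch of how such calibration parameters are typically refined with Levenberg-Marquardt, using a placeholder residual (the `project` function and `x0` below are hypothetical):

    import numpy as np
    from scipy.optimize import least_squares

    def residuals(params, observed, project):
        # Hypothetical reprojection residual: `project` maps the current
        # calibration parameters to predicted feature locations.
        return (project(params) - observed).ravel()

    # x0: initial estimate, e.g. from a linear (closed-form) solution.
    # refined = least_squares(residuals, x0, args=(observed, project),
    #                         method="lm")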
Chapter
Full-text available
In this paper, we propose a system calibration method for panoramic 3D shape measurement with plane mirrors. By introducing plane mirrors into traditional fringe projection profilometry (FPP), our system can capture fringe images of the measured object from three different perspectives simultaneously, one real camera and two virtual cameras produced by the plane mirrors, realizing panoramic 3D shape reconstruction from a single-shot measurement. Furthermore, a flexible new technique is proposed to easily calibrate the mirrors. In the proposed technique, the calibration of the mirror is derived mathematically to ensure the effectiveness and rationality of the calibration process; it only requires the camera to observe a set of feature point pairs (real points and their virtual, mirror-reflected counterparts) to solve for the reflection matrix of each plane mirror. The acquired calibration information is used to convert the 3D point cloud data obtained from the real and virtual perspectives into a common world coordinate system, making it possible to obtain full-surface 3D data of the object. Finally, benefiting from the robust and high-performance calibration method, experimental results verify that our system can achieve high-accuracy, panoramic 3D shape measurement.
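For context, the reflection a plane mirror induces has a standard closed form, which is what such a calibration solves for. For a mirror plane with unit normal n and distance d from the origin (points x on the mirror satisfy n^T x = d), a point reflects as

    \mathbf{x}' = (\mathbf{I} - 2\,\mathbf{n}\mathbf{n}^{\top})\,\mathbf{x} + 2d\,\mathbf{n}

so the virtual camera is the real camera conjugated by this transform (with handedness flipped by the reflection). Estimating n and d from observed real/virtual point pairs determines the reflection matrix the excerpt mentions.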
... Reconstructing 3D models of unknown objects from multi-view images is a long-standing computer vision problem which has received considerable attention [8]. With a single camera, a user can capture multiple views of an object by manually moving the camera around a static object [21,25,41] or by turning the object in front of the camera [26,30,33,34]. The latter approach is often referred to as in-hand object scanning and is particularly convenient. ...
... Using an RGB-D sensor, several early in-hand scanning systems [26,[33][34][35] rely on tracking and are able to recover the shape of small objects manipulated by hands. Later, [30] showed how to use the motion of the hand and its contact points with the object to add constraints useful to deal with texture-less and highly symmetric objects, while restricting the contact points to stay fixed during the scanning. ...
Preprint
Full-text available
We propose a method for in-hand 3D scanning of an unknown object from a sequence of color images. We cast the problem as reconstructing the object surface from un-posed multi-view images and rely on a neural implicit surface representation that captures both the geometry and the appearance of the object. By contrast with most NeRF-based methods, we do not assume that the camera-object relative poses are known and instead simultaneously optimize both the object shape and the pose trajectory. As global optimization over all the shape and pose parameters is prone to fail without coarse-level initialization of the poses, we propose an incremental approach which starts by splitting the sequence into carefully selected overlapping segments within which the optimization is likely to succeed. We incrementally reconstruct the object shape and track the object poses independently within each segment, and later merge all the segments by aligning poses estimated at the overlapping frames. Finally, we perform a global optimization over all the aligned segments to achieve full reconstruction. We experimentally show that the proposed method is able to reconstruct the shape and color of both textured and challenging texture-less objects, outperforms classical methods that rely only on appearance features, and its performance is close to recent methods that assume known camera poses.
... In structured-light techniques (Park et al. 2001; Pagès et al. 2003; Caspi et al. 1998; Page et al. 2003; Rusinkiewicz et al. 2002; Chen and Kak 1987; Joaquim et al. 2004; Salvi et al. 1998; Kiyasu et al. 1995; Griffin et al. 1992; Morano et al. 1998), a light pattern is projected at a known angle onto the surface of interest and an image of the resulting pattern, reflected by the surface, is captured. The image is then analyzed to calculate the coordinates of the data points on the surface. ...
... The principle behind all time-of-flight (TOF) implementations (Rusinkiewicz et al. 2002) is to measure the amount of time (t) that a light pulse (i.e., laser electromagnetic radiation) takes to travel to the object and return. Because the speed of light (C) is known, it is possible to determine the distance traveled. ...
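Concretely, since the measured time covers the round trip, the one-way distance is

    d = \frac{C\,t}{2}

For example, a pulse returning after t of roughly 6.67 ns gives d = (3x10^8 m/s x 6.67x10^-9 s) / 2, which is about 1 m.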
Book
Reverse engineering (RE) is generally defined as a process of analyzing an object or existing system (hardware and software) to identify its components and their interrelationships and to investigate how it works to redesign or produce a copy without access to the design from which it was originally produced. In areas related to 3-D graphics and modeling, RE technology is used for reconstructing 3-D models of an object in different geometric formats. RE hardware is used for RE data acquisition, which, for 3-D modeling, is the collection of geometric data that represent a physical object. There are three main technologies for RE data acquisition: contact, noncontact and destructive. Outputs of the RE data acquisition process are 2-D cross-sectional images and point clouds that define the geometry of an object. RE software is employed to transform the RE data produced by RE hardware into 3-D geometric models. The final outputs of the RE data processing chain can be one of two types of 3-D data: (i) polygons or (ii) NURBS (nonuniform rational B-splines). Polygon models, which are normally in the STL, VRML, or DXF format, are commonly used for rapid prototyping, laser milling, 3-D graphics, simulation, and animations. NURBS surfaces or solids are frequently used in computer-aided design, manufacturing, and engineering (CAD-CAM-CAE) applications. In this chapter, hardware and software for RE are presented. Commercially available RE hardware based on different 3-D data collection techniques is briefly introduced. The advantages and disadvantages of various data acquisition methods are outlined to help in selecting the right RE hardware for specific applications. In the RE software section, end-use RE applications are classified and typical commercial RE packages are reviewed. The four RE phases of the RE data processing chain are highlighted, and the fundamental RE operations that are necessary for completing the RE data processing chain are presented and discussed in detail.
... The method of Machat et al. could be used to identify proteins with similar molecular surfaces despite a low sequence identity, which could be beneficial to the protein structure prediction community, notably in threading, where folds could be enriched with surface shapes. This could also be useful for identifying possible interaction partners [4], since molecular shape plays a crucial role in binding [3,44]. Shape retrieval methods could then be used to create a structural classification of proteins based on their surfaces [6], rather than on evolutionary distances or fold categories as in SCOP [40] or CATH [41], opening the possibility of extending the protein structure-function paradigm towards a protein structure-surface(s)-function paradigm. ...
... ShapeDNA uses the eigenvalues of the Laplace-Beltrami operator. Another work, the Global Point Signature [44], adds the eigenfunctions to the descriptor, which is then called a signature. The Heat Kernel Signature (HKS) [41] is derived from the heat kernel, which represents the diffusion of heat over an object as a function of time. ...
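For reference, both descriptors mentioned have compact standard definitions in terms of the Laplace-Beltrami eigenvalues \lambda_i and eigenfunctions \phi_i:

    GPS(x) = \left( \phi_1(x)/\sqrt{\lambda_1},\ \phi_2(x)/\sqrt{\lambda_2},\ \ldots \right)

    HKS(x, t) = \sum_{i \ge 0} e^{-\lambda_i t}\, \phi_i(x)^2

The HKS value at a point is the diagonal of the heat kernel: how much heat remains at x after time t when a unit of heat is placed there at t = 0.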
Thesis
Protein-protein interactions play a crucial role in life processes such as cell communication, immunity, growth, proliferation and cell death. These interactions take place through protein surfaces, and the disruption of protein-protein interactions underlies many pathological processes. It is therefore necessary to understand and characterize protein surfaces and their mutual interactions in order to better understand life processes. Various methods for comparing protein surfaces have been developed in recent years, but none is powerful enough to process all the structures available in the various databases. The goal of this thesis project is therefore to develop fast surface-comparison methods and to apply them to the surfaces of macromolecules.
... Subsampling the point cloud can sometimes be useful, either to reduce information redundancy [BBH08] or to speed up processing (for example, to allow real-time rendering) [RHHL02]. In the first case, a tree structure taken from [BHGS06] was used; in the second, redundant points were merged via a voxel grid. ...
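The voxel-grid merging mentioned at the end of the excerpt can be stated compactly. A minimal sketch (assumed behavior, not the cited implementation): points falling into the same voxel are averaged into one representative point.

    import numpy as np

    def voxel_downsample(points, voxel_size):
        """Merge redundant points: average all points sharing a voxel."""
        keys = np.floor(points / voxel_size).astype(np.int64)
        _, inverse, counts = np.unique(keys, axis=0,
                                       return_inverse=True,
                                       return_counts=True)
        sums = np.zeros((counts.size, points.shape[1]))
        np.add.at(sums, inverse, points)       # accumulate per-voxel sums
        return sums / counts[:, None]          # per-voxel centroids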
... Obtaining a dense point cloud: structured light [RHHL02] ...
Thesis
French Sign Language (LSF) is part of the identity and culture of the deaf community in France. One way of promoting this language is to generate content through virtual characters called signing avatars. The system we propose is part of a broader project on concatenative gesture synthesis of LSF, which generates new sentences by editing existing data from a corpus of annotated motion data captured with a marker-based motion-capture (MoCap) device. In LSF, facial expressivity conveys a great deal of information (e.g., affective, clausal or adjectival), hence its importance. This thesis aims to integrate the facial aspect of LSF into the concatenative synthesis system described above. We therefore propose an information-processing pipeline ranging from data capture with a MoCap device to facial animation of the avatar from these data, together with automatic annotation of the resulting corpora. The first contribution of this thesis concerns the methodology employed and the blendshape representation used both for facial animation synthesis and for automatic annotation. It allows the analysis/synthesis system to be handled at a certain level of abstraction, with homogeneous and meaningful descriptors. The second contribution concerns the development of an automatic annotation approach based on the recognition of emotional facial expressions using machine-learning techniques. The final contribution lies in the synthesis method, which is expressed as a fairly classical optimization problem into which we have included a Laplacian-based energy that quantifies surface deformations and serves as a regularization energy.
... More importantly, 2D fingerprints cannot truly represent natural three-dimensional (3D) ones because they lose 3D information during acquisition when a curved 3D finger is flattened against a 2D plane. Therefore, AFRSs for 3D fingerprints have been proposed in recent years [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13]. ...
... In recent years, several methods proposed for 3D fingerprint reconstruction can be classified into three main categories based on their imaging techniques: 1) photometric stereo [3], [4], [5], [6]; 2) structured light scanning [7], [8], [9], [10]; and 3) stereo vision [11], [12], [13]. ...
Article
Full-text available
Automated fingerprint recognition system (AFRS) for 3D fingerprints is essential and highly promising in biometric security. Despite the progress in 3D AFRSs, high-quality real-time 3D fingerprint reconstruction and high-accuracy 3D fingerprint recognition remain two challenging issues. To address these issues, we propose a robust 3D AFRS based on ridge-valley-guided 3D fingerprint reconstruction and 3D topology feature extraction. The proposed 3D fingerprint reconstruction considers the unique fingerprint characteristic of ridge-valley (RV) structure and achieves real-time reconstruction. Different from traditional triangulation-based methods that establish correspondence between points by cross-correlation-based searching, we propose to establish RV correspondence (RVC) between ridges/valleys by defining and calculating a RVC matrix based on the topology of RV curves. To enhance depth reconstruction, curve-based smoothing is proposed to refine our novel RV disparity map. The proposed 3D fingerprint recognition is based on three-dimensional topology polymer (TTP) feature extraction. The TTP encodes 3D topology by projecting the 3D minutiae onto multiple planes and extracting their corresponding 2D topologies, which has proven to be effective and efficient. Comprehensive experimental results demonstrate that our method outperforms the state-of-the-art methods in terms of both reconstruction and recognition accuracy. Thanks to its significantly short running time, our method is suitable for practical applications.
... 3D imaging has seen rapid development and increased awareness in recent years, emerging as an invaluable technique for archaeological projects and digital archaeology by adding a new dimension to the recording and analysis of finds (Tanasi, 2020; Wessels et al., 2022). 3D imaging technologies have seen significant development in quality, speed, affordability and user friendliness, reducing the barriers to entry posed by complicated or highly technical methodologies and expensive equipment (Athanasiou et al., 2013; Campana and Remondino, 2008; Hirst et al., 2018; Kaiser and Dědič, 2024; Pavelka et al., 2021; Rusinkiewicz et al., 2002; Siebke et al., 2018). However, there is still very little methodological consistency observed in 3D scanning strategies for producing digital collections. ...
Article
Full-text available
Archaeology has faced increased pressure to digitise collections and make artefacts available and accessible to a wider audience. 3D imaging involves producing a 3D digital or printed model of an object or site. 3D models have the potential to augment the traditional approaches to museum engagement whilst breaking down the barriers to access, whether through providing 3D printed proxies in museums or sharing digital models online. 3D imaging has clear value in archaeology and public engagement but there is no standardisation or accessible pipelines available for achieving professional 3D imaging output. There is very little consensus in 3D modelling and worldwide, digital collections are being created with no methodological consistency. This research observed each stage necessary for producing high-quality 3D models with structured light scanning (SLS) technology. SLS was effective on a range of textures that may be encountered in archaeological scenarios, although highly reflective objects, or pale objects with black areas, may fail to be captured even with an altered strategy. In order to make the 3D model most representative of the archaeological find, it is recommended that a range of scanner settings such as brightness or shutter speed are tested on the object before committing these settings to the rest of the scans. Generalised 3D scanning pipelines are provided to inform archaeological teams on a 3D digital and printing strategy.
... These techniques can learn complex, high-dimensional representations of deformable objects from large amounts of image data, enabling robots to handle a variety of objects with different shapes and properties. Template matching and model-based methods are also employed to estimate the pose and deformation of known deformable objects [92,93]. These methods involve matching the observed image features to a pre-defined template or a 3D model, allowing for an accurate estimation of the object pose and deformation under varying conditions. ...
Article
Full-text available
This paper reviews the robotic manipulation of deformable objects in caregiving scenarios. Deformable objects like clothing, food, and medical supplies are ubiquitous in care tasks, yet pose modeling, control, and sensing challenges. This paper categorises caregiving deformable objects and analyses their distinct properties influencing manipulation. Key sections examine progress in simulation, perception, planning, control, and system designs for deformable object manipulation, along with end-to-end deep learning’s potential. Hybrid analytical data-driven modeling shows promise. While laboratory successes have been achieved, real-world caregiving applications lag behind. Enhancing safety, speed, generalisation, and human compatibility is crucial for adoption. The review synthesises critical technologies, capabilities, and limitations, while also pointing to open challenges in deformable object manipulation for robotic caregiving. It provides a comprehensive reference for researchers tackling this socially valuable domain. In conclusion, multi-disciplinary innovations combining analytical and data-driven methods are needed to advance real-world robot performance and safety in deformable object manipulation for patient care.
... The implementation of the microphones required me to change the design again, carving out holes and turning the carved-out circle into an interior cylinder to accommodate the microphones. The implementation of the stepper motor with gear required me to 3D model a gear that fits the outer gear on the inside of the top of the Circle of Command, which allows the top to rotate. It also required another design change to add a holder for the stepper motor: without one, the motor might move along with the top as it rotates it, leaving no central point to rotate from and preventing the top from rotating at all, which would defeat the purpose of a project built on the assumption that the top rotates. The design was therefore revised yet again to add a holder that keeps the stepper motor in place [13]. The implementation of the analog-to-digital converter did not involve any design changes; it only required connecting all of the wires to their proper places on the Raspberry Pi and the analog-to-digital converter, running wires from the converter to the three microphones, and using some tape to hold some of the wires down. ...
Conference Paper
According to the CDC, 13.7% of adults in the United States have mobility issues; that is about 42 million people. The purpose of this paper is to utilize artificial intelligence and robotics, packaged in a 3D printed Circle of Command, to help make the lives of those 42 million people easier [5]. A Raspberry Pi is used to power and process the voice control commands and turn them into actions that respond to the commands [6]. The voice control is achieved through three microphones that are embedded from the inside of the Circle of Command, using holes to listen for any potential voice commands. As the Raspberry Pi cannot read the analog output of the microphones, the microphones and the Raspberry Pi are connected to an analog-to-digital converter that allows the Raspberry Pi and the microphones to exchange information smoothly. Robotics is used in the stepper motor and the gear that turn the top of the lazy Susan, which is 3D printed so that there are gears on the inside of the top. There is also a home switch that will home the lazy Susan. There is no practical application of this AI Robotic Circle of Command yet, as it is just a prototype, and so there are no results, but it is hoped that it will be able to change some of those 42 million lives once production of the production model begins. The production model will be larger than the prototype, as the prototype serves only as a proof of concept and shows that this idea is actually doable [7].
... There would be hardware with multiple sensors to record data based on the movements of the skate and then the data would be processed. After that, a skater can upload this data onto an app that would have a 3D model of a skate on ice, which would move based on the data [4]. Then, the skater would be able to see whether the skate moved the wrong way, fell on the wrong edge, or did other wrong techniques. ...
Conference Paper
Ever since the start of figure skating, there has been an emphasis on skating technique, especially in the step sequences of a skater's choreography [1]. But figure skaters often cannot detect the motion, edge, or placement of their blade on the ice without watching themselves skate. The solution to this problem would be a skate analyzer: a device that records the movements of a skate on ice so that one can play back the recorded data and view the skate motion precisely [2]. The three main components that my project links together are the QTPY-ESP32 microcontroller, a sensor that combines an accelerometer, gyroscope, and magnetometer, and an SD card reader. The QTPY-ESP32 is a microcontroller that acts as the main computer controlling the whole board. The QTPY is connected to the sensor board through the I2C protocol, and to the SD card reader through the SPI protocol. After skaters finish recording, they can insert the SD card into a computer, upload the data into the app, and play it back. There is also a slider at the top of the screen that the skater can drag back and forth to view the skate at specific times in the file. This would be a great technology for skaters, as they can play back their movements on ice and improve their technique [3].
... Paper prototypes are sketches of the digital application that enable designers to quickly bring out changes in features based on customer feedback [13]. Digital prototypes [14] and 3D models [15] are more creative and attractive ways to present product features. Although they are somewhat costly, requiring manpower and a particular skill set, they can represent the features in more depth than paper prototypes. ...
Conference Paper
A minimum viable product (MVP) is a product with a minimal but sufficient number of features to be used by end users to assess the product in terms of its functionality and ease of use. Startups, specifically digital startups, use MVPs to gain better insight into design issues and features of newly made products in order to improve them in the future. A better product, in terms of desired features and innovation, ensures its applicability in the market and enables startups to scale with the passage of time. As there is very little research on the usage and impact of MVPs on startup performance, we have tried to gain insight into the different usages of MVPs in digital startups. Our findings show that different kinds of MVPs, such as digital prototypes and 3D MVPs, serve different purposes, such as feature validation and idea extraction, at different phases of a startup. We have also identified different challenges faced by technical teams while designing an MVP. These include challenges regarding time, budget, communicating ideas with stakeholders, and identifying the most valuable features for the MVP.
... Early in-hand scanning uses depth images as input. [Rusinkiewicz et al. 2002] proposes an in-hand 3D model acquisition system, which allows the user to manipulate the object and see the scanning mesh in real time. The system captures local surface patches, then aligns and integrates them into a complete 3D mesh. ...
Preprint
This paper presents an approach that reconstructs a hand-held object from a monocular video. In contrast to many recent methods that directly predict object geometry by a trained network, the proposed approach does not require any learned prior about the object and is able to recover more accurate and detailed object geometry. The key idea is that the hand motion naturally provides multiple views of the object and the motion can be reliably estimated by a hand pose tracker. Then, the object geometry can be recovered by solving a multi-view reconstruction problem. We devise an implicit neural representation-based method to solve the reconstruction problem and address the issues of imprecise hand pose estimation, relative hand-object motion, and insufficient geometry optimization for small objects. We also provide a newly collected dataset with 3D ground truth to validate the proposed approach.
... SL enables 3D reconstruction without line-by-line scanning by projecting a known set of parallel line patterns onto the object scene and measuring how much it has been altered by the object. The correspondence matching between the elements in the projected and captured patterns requires an encoding of the pattern, and hence numerous encoding algorithms using time-multiplexing [54-57], line-shifting [58], and dynamic scenes [56,59] have been studied. The proposed full-space diffractive metasurface can be designed to project the patterns needed in these algorithms. ...
Article
Full-text available
Structured light (SL)-based depth-sensing technology illuminates objects with an array of dots, and the backscattered light is monitored to extract three-dimensional information. Conventionally, diffractive optical elements have been used to form the laser dot array; however, their field-of-view (FOV) and diffraction efficiency are limited due to their micron-scale pixel size. Here, we propose a metasurface-enhanced SL-based depth-sensing platform that scatters a high-density ~10 K dot array over a 180° FOV by manipulating light at the subwavelength scale. As a proof of concept, we place two face masks, one on the beam axis and the other 50° off axis, within a distance of 1 m, and estimate the depth information using a stereo matching algorithm. Furthermore, we demonstrate the replication of the metasurface using the nanoparticle-embedded-resin (nano-PER) imprinting method, which enables high-throughput manufacturing of metasurfaces on arbitrary substrates. Such a full-space diffractive metasurface may afford an ultra-compact depth perception platform for face recognition and automotive robot vision applications.
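The stereo matching step mentioned above recovers depth from the disparity of each projected dot between views; with focal length f (in pixels), baseline B, and disparity d (in pixels), triangulation gives

    Z = \frac{f\,B}{d}

so a denser dot array over a wider FOV directly improves the coverage of the recovered depth map.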
... Structured light projection is one of the most promising techniques. Several review articles (see, for example, [1,2]) summarize the patterns currently used in structured light projection: statistical patterns [3-5], binary patterns [6], sinusoidal fringe patterns [7], pseudo-sine-wave patterns produced from binary patterns with defocus [8], and some additional novel patterns [9-11]. The projection of sinusoidal fringe patterns is one of the most widely used methods, due to its superior measurement accuracy. ...
Article
Full-text available
Projection rates of up to 30,000 greyscale fringe patterns per second have been achieved recently by defocusing binary fringe patterns from a digital micromirror device (DMD) based projector. Part 1 of this two-part paper describes the design of a binary phase mask, based on a virtual scatter plate, for the purpose of enhancing the performance of a binary fringe projector. The phase mask's anisotropic point spread function (PSF) produces a well-defined blur of the fringes parallel to the fringe direction, thereby minimising degradation of fringe contrast. The shape of the PSF is also shown, by means of a polychromatic Fourier optics model, to be insensitive to projection distance over a range of ±10% of the standoff distance. Two new binary fringe design methods are proposed, including extensions to optimize the system performance in the case of a mismatch between camera and projector framing rates. Expressions for the phase noise are derived as a function of the phase mask design parameters, which demonstrate that fringe quality comparable to traditional 8-bit greyscale fringes is achievable at projection rates over two orders of magnitude higher.
... The other active method is the structured-light approach, where the measuring process and reconstruction algorithm depend on the codification of the projected structured patterns. Popular encoding methods include binary coding [6], sinusoidal phase encoding [7], [8], binary defocusing [9], [10], the triangular phase-shifting method [11], and color-coded structured patterns [12], [13]. These encoding methods are mostly based on computer-controlled DMD-chip-based digital-light-processing (DLP) projectors, which suffer from limitations such as the requirement of precise synchronization, the speed limit of measurement, and the nonlinear gamma effect of DLP projection [14]. ...
Article
Full-text available
This investigation aims to develop a method for in-situ 3D imaging and reconstruction of objects in the rain. The proposed method is based on the use of a monochromatic sinusoidal fringe pattern generated by the designed optical system, the polarization technique, and a Fourier-transform-based reconstruction algorithm. Theoretical analyses and experimental results show that the generated laser-beam-based signal remains coherent until reaching the observed object. The coherence of the projected sinusoidal signal is the key feature of the proposed method and ensures the accuracy of measurement and reconstruction of objects under rainy conditions. Moreover, the effects resulting from spectral absorption and multiple scattering on the propagation of the projected sinusoidal fringe pattern in the rain can be removed using the polarization technique. The developed method is capable of obtaining accurate 3D reconstructions of objects under rainy conditions in the presence of background illumination, multiply-scattered light, vibration from environmental influences including wind, and the inhomogeneous medium of rain. The application of this method does not suffer from the limitations of focusing, precise synchronization, and measurement speed that can be a problem for the time-of-flight technique and digital-light-processing-based profilometry.
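The Fourier-transform-based reconstruction the abstract refers to is classically done per image row (Takeda-style fringe analysis): isolate a one-sided band around the fringe carrier in the spectrum and take the phase of the resulting analytic signal. A minimal sketch, with hypothetical parameters `carrier_bin` and `halfwidth`:

    import numpy as np

    def ftp_phase(row, carrier_bin, halfwidth):
        """Recover the (unwrapped) fringe phase of one image row."""
        F = np.fft.fft(row - row.mean())
        band = np.zeros_like(F)
        lo, hi = carrier_bin - halfwidth, carrier_bin + halfwidth + 1
        band[lo:hi] = F[lo:hi]                 # keep one-sided carrier band
        analytic = np.fft.ifft(band)           # complex fringe signal
        return np.unwrap(np.angle(analytic))   # wrapped -> continuous phase

The linear carrier ramp still present in the result is typically subtracted using a reference-plane measurement, leaving the phase modulation that encodes height.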
... Infini-TAM [12] proposed a highly efficient implementation of voxel hashing to achieve real-time scanning of large objects on a Nvidia Shield Tablet externally connected with a depth camera. Recently, some interactive in-hand object modeling systems have been proposed by [33,42], which provide real-time registration of the input RGBD frames, while the in-hand interactivity enables the user to guide the object scanning process. Tzionas and Gall [40] improve the in-hand scanning pipeline to effectively facilitate the reconstruction of featureless and highly symmetric objects by 3D hand motion extraction. ...
Article
Full-text available
We present a novel online 3D scanning system for high-quality object reconstruction with a mobile device, called Mobile3DScanner. Using a mobile device equipped with an embedded RGBD camera, our system provides online 3D object reconstruction capability for users to acquire high-quality textured 3D object models. Starting with a simultaneous pose tracking and TSDF fusion module, our system allows users to scan an object with a mobile device to get a 3D model for real-time preview. After the real-time scanning process is completed, the scanned 3D model is globally optimized and mapped with multi-view textures as an efficient post process to get the final textured 3D model on the mobile device. Unlike most existing state-of-the-art systems which can only scan homeware objects such as toys with small dimensions due to the limited computation and memory resources of mobile platforms, our system can reconstruct objects with large dimensions such as statues. We propose a novel visual-inertial ICP approach to achieve real-time accurate 6DoF pose tracking of each incoming frame on the front end, while maintaining a keyframe pool on the back end where the keyframe poses are optimized by local BA. Simultaneously, the keyframe depth maps are fused by the optimized poses to a TSDF model in real-time. Especially, we propose a novel adaptive voxel resizing strategy to solve the out-of-memory problem of large dimension TSDF fusion on mobile platforms. In the post-process, the keyframe poses are globally optimized and the keyframe depth maps are optimized and fused to obtain a final object model with more accurate geometry. The experiments with quantitative and qualitative evaluation demonstrate the effectiveness of the proposed 3D scanning system based on a mobile device, which can successfully achieve online high-quality 3D reconstruction of natural objects with larger dimensions for efficient AR content creation.
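The TSDF fusion at the core of such systems is a per-voxel weighted running average of truncated signed distances, in the style of Curless and Levoy. A minimal sketch of the update applied to every voxel seen by a new depth frame:

    def tsdf_update(D, W, d_new, w_new=1.0):
        """D: stored TSDF value, W: accumulated weight,
        d_new: truncated signed distance from the current frame."""
        D = (W * D + w_new * d_new) / (W + w_new)
        W = W + w_new
        return D, W

The adaptive voxel resizing the abstract describes changes the resolution of the grid this update runs over, trading detail for memory as the scanned object grows.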
... This can be addressed with spatial index structures and so-called splatting. Here, the individual points are rendered as ellipses with a given radius, normal vector and color (Rusinkiewicz et al., 2002). ...
Article
Full-text available
Real-time 3D reconstruction enables fast dense mapping of the environment which benefits numerous applications, such as navigation or live evaluation of an emergency. In contrast to most real-time capable approaches, our method does not need an explicit depth sensor. Instead, we only rely on a video stream from a camera and its intrinsic calibration. By exploiting the self-motion of the unmanned aerial vehicle (UAV) flying with oblique view around buildings, we estimate both camera trajectory and depth for selected images with enough novel content. To create a 3D model of the scene, we rely on a three-stage processing chain. First, we estimate the rough camera trajectory using a simultaneous localization and mapping (SLAM) algorithm. Once a suitable constellation is found, we estimate depth for local bundles of images using a Multi-View Stereo (MVS) approach and then fuse this depth into a global surfel-based model. For our evaluation, we use 55 video sequences with diverse settings, consisting of both synthetic and real scenes. We evaluate not only the generated reconstruction but also the intermediate products and achieve competitive results both qualitatively and quantitatively. At the same time, our method can keep up with a 30 fps video for a resolution of 768 × 448 pixels.
... • a novel joint formulation of segmentation-by-reconstruction and object pose tracking that enables 4D reconstructions through decomposing the underlying TSDF rotations in individually reconstructed fused volumes; • an end-to-end pipeline that runs at interactive rates combined with a human semantic segmentation network to address a wide range of real-world scanning cases; • in addition to challenging real-world scenes, we introduce a new dataset for quantitative evaluation (20 scenes) and training semantic priors (39,068 scenes), where our method shows substantial improvements, especially for foreground reconstruction and tracking, over competing methods. A number of dense RGB-D reconstruction approaches have been introduced. The core idea of these methods is to accumulate a series of RGB-D frames into a shared voxel representation, often a signed distance field [CL96], even at interactive rates [RHL02]. Most recent approaches are highly parallelized and run on the GPU, with KinectFusion being one of the most popular methods [NIH*11]. ...
Article
Full-text available
Although surface reconstruction from depth data has made significant advances in recent years, handling changing environments remains a major challenge. This is unsatisfactory, as humans regularly move objects in their environments. Existing solutions focus on a restricted set of objects (e.g., those detected by semantic classifiers), possibly with template meshes, assume a static camera, or mark objects touched by humans as moving. We remove these assumptions by introducing RigidFusion. Our core idea is a novel asynchronous moving-object detection method, combined with a modified volumetric fusion. This is achieved by a model-to-frame TSDF decomposition leveraging free-space carving of tracked depth values of the current frame with respect to the background model during run-time. As output, we produce separate volumetric reconstructions for the background and each moving object in the scene, along with its trajectory over time. Our method does not rely on object priors (e.g., semantic labels or pre-scanned meshes) and is insensitive to the motion residuals between objects and the camera. In comparison to state-of-the-art methods (e.g., Co-Fusion, MaskFusion), we handle significantly more challenging reconstruction scenarios involving a moving camera and improve moving-object detection (26% on the miss-detection ratio), tracking (27% on MOTA), and reconstruction (3% on the reconstruction F1) on the synthetic dataset. Please refer to the supplementary material and the project website for the video demonstration (geometry.cs.ucl.ac.uk/projects/2021/rigidfusion).
... When rendering 3-D models in 3-D computer graphics, most models are represented by a set of polygons composed of arbitrary surfaces (Rusinkiewicz et al., 2002; Zhang & Chen, 2001). In addition, since arbitrary surfaces can be divided into triangles, a polygon model can be represented as a set of triangular meshes. ...
Article
Full-text available
In the field of 4-D space visualization, the information of 4-D space is obtained by projecting 4-D data onto 3-D space. Most previous research has been aimed at the recognition of 4-D space, whereas the target of the recognition has been limited to the geometrical information of 4-D objects in 4-D space or static spatial information without dynamics. Our research aims to develop a visualization system for providing the human experience of a physics-based environment in 4-D space. In this research, we mainly focus on collision detection and the behaviour of 4-D objects in 4-D space in order to construct the physics-based environment of 4-D space. Our contribution in this paper is the development of a collision detection algorithm for 4-D objects and a calculation method for the physics-based behaviour of 4-D objects. Our proposed collision detection algorithm is based on the intersection test of tetrahedrons in 4-D space, so that 4-D objects in our system are represented as tetrahedral meshes. The tetrahedron-based collision detection algorithm is performed by a combination of half-space tests with the use of 5-D homogeneous processing to enhance the calculation accuracy of the collision detection. Our proposed method calculates the behaviour of the 4-D objects after the collision by solving the motion equation based on the principle of physics. Consequently, the visualization system with the proposed algorithm allows us to observe the physics-based environment in 4-D space.
... Curless and Levoy (1996) introduced the fundamental work on volumetric fusion, which inspired most modern approaches. The ability to provide real-time RGB-D reconstruction appeared in 2002 with a system based on a 60 Hz structured-light rangefinder (Rusinkiewicz et al., 2002). Although it is no longer a recent algorithm, KinectFusion (Newcombe et al., 2011) had a significant impact on the computer graphics and vision communities. ...
Conference Paper
This paper describes a system made up of several inward-facing cameras able to perform reconstruction of deformable objects through synchronous acquisition of RGBD data. The configuration of the camera system allows the acquisition of 3D omnidirectional images of the objects. The paper describes the structure of the system as well as an approach for the extrinsic calibration, which allows the estimation of the coordinate transformations between the cameras. Reconstruction results are also presented.
... One of them is the relatively mature fringe projection profilometry (FPP) technique [7] that extracts the parameters required for triangulation calculation from the phases of the light field. This method combines a camera with a digital light projection (DLP) device, where multiple cameras [8,9] capture the complete profile of the object, and the projection device varies from projectors to micro-mirrors [10]. The projected fringe is meticulously coded, the phase of the projected light fringe modulated by the surface of the object is calculated through algorithms [11,12], and the 3D profile is restored according to the phase information. ...
Article
Full-text available
This paper conducts a trade-off between the efficiency and accuracy of three-dimensional (3D) shape measurement based on the triangulation principle, and introduces a flying and precise 3D shape measurement method based on multiple parallel line lasers. Firstly, we establish the measurement model of the multiple parallel line laser system and introduce the concept of multiple base planes, which help to deduce the unified formula of the measurement system and simplify the calibration process. Then, the constraint on the line spatial frequency, which maximizes measurement efficiency while ensuring accuracy, is determined according to the height distribution of the object. Secondly, a simulation quantitatively analyzing the variation of the system resolution under a set of specific parameters is performed, which provides a basis for choosing the four system parameters. Thirdly, for precision measurement applications in the industrial field, additional profiles are acquired to improve the lateral resolution by applying a motor to scan the 3D surface. Finally, the experimental study shows that the present method, obtaining 41,220 points per frame, improves the measurement efficiency compared with a single line laser. Furthermore, the accuracy and the calibration process are improved in comparison with the existing multiple-line laser methods, and the structured light achieves an accuracy better than 0.22 mm at a distance of 956.02 mm.
... The fringe patterns have limitations of lower accuracy and resolution. Rusinkiewicz et al. [13] and Hall-Holt et al. [14] developed a real-time 3D shape measurement system based on the stripe boundary code. In their approach the image acquisition and processing time was high and four patterns were required to reconstruct one 3D model. ...
Article
Full-text available
The structured light system (SLS) is a general concept and one of the cheapest methods for non-contact 3D reconstruction. Existing single-shot SLSs, which are primarily based on spatial encoding techniques, are not optimal in terms of resolution and digitally encoded patterns. Those schemes are not flexible or controllable, and are not designed down to the pixel level. So, to increase the resolution and to implement a flexible, controllable pattern, we propose a novel heuristic method based on the spatial neighborhood. In this paper, we propose a multi-resolution SLS which can be implemented with a set of 25 geometrically shaped distinct symbols, or alphabets, used as shape primitives in the projection pattern. The size of each symbol is well defined in pixels, which enables access and control up to the full resolution of the projector. The shape-descriptive parameters for each symbol are also defined and computed. To spread the alphabets in a controllable manner, a method is defined to generate a robust pseudo-random sequence of any required size with a certain number of alphanumeric bases, to be employed in the projection pattern with respect to the measured resolution. This arrangement enables us to design projection patterns according to the required surface area and resolution. A new technique is developed for decoding the captured image pattern. The decoding process depends upon the classification of symbols based on the shape-descriptive parameters. The search in the neighborhood of a symbol is carried out by computing the location information, grid distance, and direction information to find the codewords, which are used to establish correspondence.
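To make the pseudo-random spreading concrete: the property such sequences need is that every window of k consecutive symbols is unique, so a local neighborhood of symbols identifies its position in the pattern. A hedged illustrative sketch (a greedy generator, not the paper's method):

    import random

    def unique_window_sequence(alphabet, k, length, seed=0):
        """Greedily build a sequence over `alphabet` (a list) in which
        every window of k consecutive symbols occurs at most once.
        Assumes k >= 2; may stop early at a dead end."""
        rng = random.Random(seed)
        seq = [rng.choice(alphabet) for _ in range(k)]
        seen = {tuple(seq)}
        while len(seq) < length:
            candidates = alphabet[:]
            rng.shuffle(candidates)
            for s in candidates:
                window = tuple(seq[-(k - 1):] + [s])
                if window not in seen:
                    seen.add(window)
                    seq.append(s)
                    break
            else:
                break    # no symbol extends the sequence without a repeat
        return seq

Decoding then only needs to classify the k symbols around a detected primitive and look the window up in a table mapping windows to pattern positions.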
... To achieve high-speed measurement and reduce motion artifacts, structured patterns must be switched rapidly, and captured in a short period of time. For example, Rusinkiewicz and Levoy [27] developed a real-time 3D shape measurement system using the stripe boundary code [28] that only requires four binary patterns for codification. Such a system achieved 15 Hz 3D data acquisition speed. ...
Article
Full-text available
This paper reviews recent developments of non-contact three-dimensional (3D) surface metrology using an active structured optical probe. We focus primarily on those active non-contact 3D surface measurement techniques that could be applicable to the manufacturing industry. We discuss principles of each technology, and its advantageous characteristics as well as limitations. Towards the end, we discuss our perspectives on the current technological challenges in designing and implementing these methods in practical applications.
... We have developed a prototype projection mapping (PM) system that consists of a projector and a camera. It is used to measure the shape of the breast surface by the structured light method [8,9] and can project the MIP image of a breast MRI scan taken in a supine position onto the breast skin. The aim of this study was to evaluate whether the location and spread of invasive breast cancer shown by the PM method were consistent with those identified by conventional methods or pathological findings. ...
Article
Full-text available
Aim: To evaluate the feasibility of a newly developed prototype MRI projection mapping (PM) system for localization of invasive breast cancer before breast-conserving surgery. Methods: This prospective study enrolled 10 women with invasive breast cancer. MRI was performed in both prone and supine positions. The tumor location was drawn on the breast skin using palpation and sonography while referring to the prone MRI (i.e., a conventional method). A maximum intensity projection image generated from the supine MRI was projected using our PM system, and the tumor location was drawn. The PM system consisted of a projector and a camera and was used to measure the shape of the breast surface using the structured light method. Breast-conserving surgery was performed based on the conventional method. We compared the tumor size and location between the PM and conventional methods or pathology. Results: There were no significant differences in the maximum diameters of invasive cancers between the PM system and the conventional method or pathology. The maximum discrepancy in tumor location between the PM and conventional method was 3-8 mm. Conclusions: This PM system may support breast-conserving surgery by showing the tumor size and location on the breast surface.
... Precision refers to the closeness of agreement between test results. Accordingly, the trueness of the intraoral scan often gradually decreases as the scan progresses away from the point of scan origin [21-23,26]. In order to reveal the trueness deviation due to the stitching, a new methodology was developed [27] in which superimposition was performed at the scanning origin only after carefully aligning this initial data point. ...
Article
Full-text available
Backgrounds: Intraoral scanner (IOS) accuracy is commonly evaluated using full-arch surface comparison, which fails to take into consideration the starting position of the scanning (scan origin). Previously a novel method was developed, which takes into account the scan origin and calculates the deviation of predefined identical points between references and test models. This method may reveal the error caused by stitching individual images during intraoral scan. This study aimed to validate the novel method by comparing the trueness of seven IOSs (Element 1, Element 2, Emerald, Omnicam, Planscan, Trios 3, CS 3600) to a physical impression digitized by laboratory scanner which lacks linear stitching problems. Methods: Digital test models of a dentate human cadaver maxilla were made by IOSs and by laboratory scanner after polyvinylsiloxane impression. All scans started on the occlusal surface of the tooth #15 (universal notation, scan origin) and finished at tooth #2. The reference model and test models were superimposed at the scan origin in GOM Inspect software. Deviations were measured between identical points on three different axes, and the complex 3D deviation was calculated. The effect of scanners, tooth, and axis was statistically analyzed by the generalized linear mixed model. Results: The deviation gradually increased as the distance from scan origin increased for the IOSs but not for the physical impression. The highest deviation occurred mostly at the apico-coronal axis for the IOSs. The mean deviation of the physical impression (53 ± 2 μm) was not significantly different from the Trios 3 (156 ± 8 μm) and CS 3600 (365 ± 29 μm), but it was significantly lower than the values of Element 1 (531 ± 26 μm), Element 2 (246 ± 11 μm), Emerald (317 ± 13 μm), Omnicam (174 ± 11 μm), Planscan (903 ± 49 μm). Conclusions: The physical impression was superior compared to the IOSs on dentate full-arch of human cadaver. The novel method could reveal the stitching error of IOSs, which may partly be caused by the difficulties in depth measurement.
... To obtain a precise 3D shape, laser scanners and structured-light patterns were used in [22], [38], and [56]. Based on a precise 3D reconstruction, parametric reflectance functions can be fitted at each surface point according to the image observations, as in [39] and [21]. ...
Preprint
We present a method to capture both 3D shape and spatially varying reflectance with a multi-view photometric stereo (MVPS) technique that works for general isotropic materials. Our algorithm is suitable for perspective cameras and nearby point light sources. Our data capture setup is simple, which consists of only a digital camera, some LED lights, and an optional automatic turntable. From a single viewpoint, we use a set of photometric stereo images to identify surface points with the same distance to the camera. We collect this information from multiple viewpoints and combine it with structure-from-motion to obtain a precise reconstruction of the complete 3D shape. The spatially varying isotropic bidirectional reflectance distribution function (BRDF) is captured by simultaneously inferring a set of basis BRDFs and their mixing weights at each surface point. In experiments, we demonstrate our algorithm with two different setups: a studio setup for highest precision and a desktop setup for best usability. According to our experiments, under the studio setting, the captured shapes are accurate to 0.5 millimeters and the captured reflectance has a relative root-mean-square error (RMSE) of 9%. We also quantitatively evaluate state-of-the-art MVPS on a newly collected benchmark dataset, which is publicly available for inspiring future research.
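A simplified special case helps make the per-viewpoint step concrete: under the Lambertian assumption (the paper itself handles general isotropic BRDFs), the pixel intensities under m known distant lights determine the albedo-scaled normal by linear least squares. A minimal sketch:

    import numpy as np

    def lambertian_normal(I, L):
        """I: (m,) intensities of one pixel, L: (m, 3) unit light
        directions. Solves I = L @ (albedo * n) for the scaled normal."""
        g, *_ = np.linalg.lstsq(L, I, rcond=None)
        albedo = np.linalg.norm(g)
        return g / albedo, albedo            # unit normal and albedo

The paper's pipeline goes further: it uses the photometric stereo images from one viewpoint to find surface points at equal camera distance and infers a basis of BRDFs with per-point mixing weights.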
... [49]) enjoy a wide range of applications. Examples include 3D reconstruction from a few views [12] (e.g., RGB-D scans of an indoor scene captured at a few distinctive locations), enhancing the performance of interactive scanning [42,35] when there are interruptions during the acquisition process, early detection of loop closure [2], and solving jigsaw puzzles [8]. ...
Preprint
Full-text available
In this paper, we introduce a novel RGB-D based relative pose estimation approach that is suitable for small-overlapping or non-overlapping scans and can output multiple relative poses. Our method performs scene completion and matches the completed scans. However, instead of using a fixed representation for completion, the key idea is to utilize hybrid representations that combine 360° images, 2D image-based layouts, and planar patches. This approach offers adaptive feature representations for relative pose estimation. Besides, we introduce a global-to-local matching procedure, which utilizes initial relative poses obtained during the global phase to detect and then integrate geometric relations for pose refinement. Experimental results justify the potential of this approach across a wide range of benchmark datasets. For example, on ScanNet, the rotation/translation errors of the top-1 and top-5 predictions of our approach are 34.9°/0.69 m and 19.6°/0.57 m, respectively. Our approach also considerably boosts the performance of multi-scan reconstruction in few-view reconstruction settings.
... Basically, the surface data acquired by IOSs are arranged in a common coordinate system, and then image reconstruction is carried out. The surface of an object is recognized as point clouds, and highly specialized software algorithms then reconstruct the virtual surface image by stitching together, filtering, and converting the various point clouds [31-33]. Due to the humid environment, the different materials and textures in the oral cavity (such as enamel and various restorative materials), and the patient's movements, direct intraoral scanning can be especially challenging. ...
Article
Optical technology has provided a paradigm shift in implant dentistry. However, there is little information about the use of optical technology in implant dentistry, since this technology is relatively new and still evolving. In the present narrative literature review, the effects of intraoral scanner (IOS) use on accuracy and operating time, as well as on safety and patient perception, in implant dentistry were evaluated from the clinical perspective. The accuracy of digital scans with IOSs was comparable to conventional impression techniques for single or partial prostheses, and digital scans with IOSs are time efficient when taking impressions for single or double abutments. However, accuracy and time efficiency decrease for multiple-implant scans or large-area scans with IOSs. Patient satisfaction with and preference for IOS scans are generally superior to those for conventional impression procedures.
... The most common solution for pairwise registration of 3D point data is the Iterative Closest Point (ICP) algorithm, which alternates correspondence estimation and parameter optimisation until convergence criteria are satisfied, as illustrated in Figure 2.8 and discussed in this section. ICP-based registration frameworks are popular for tracking depth cameras through a video stream, aligning each frame to the previous one (frame-to-frame) [260], aligning the current frame to an incrementally updated scene representation built from all previous frames (frame-to-model) [48,220,310], or aligning a shape template to the current frame [329]. ICP registration of two frames at a time is also used to derive all camera pose relations in unordered image collections. ...
Thesis
Breast cancer is one of the most prevalent yet increasingly treatable cancer types. Clinical studies suggest a significant impact of breast cancer treatment on female patients' wellbeing and quality of life. While the oncological prognosis for mastectomy is on par with breast-conserving surgery, the latter still leads to poor or suboptimal results in nearly a third of cases. Geometric 3D models of the breast have the potential to aid planning, assessment and prediction of treatment, but currently require sustaining costly, infrastructure-heavy commercial scanning solutions. This cross-disciplinary work, within the scope of a European project, investigates recently marketed consumer depth cameras as low-cost, easy-to-operate imaging devices for dense 3D breast surface reconstruction. Clinical data acquisition software in accordance with a predefined protocol is implemented and deployed. Preliminary breast surface models obtained by extending a state-of-the-art static-scene reconstruction method are validated on synthetic, phantom and clinical data. Contemporary publicly available reconstruction frameworks from the computer vision and robotics community are subsequently evaluated. Their shortcomings with respect to the characteristics of the captured breast data are addressed in a new tailored non-rigid reconstruction pipeline. Favourable accuracy and precision are underpinned by an extensive clinical data validation, including a breast volume comparison study against the gold standard.
... For such problems, the overall shape of the object is generally obtained by multiple measurements from different views. Existing methods can be classified into three main categories: methods based on turntables [9,10], methods based on movable robot arms [11,12], and measurement systems with plane mirrors [13][14][15][16]. In the first category, the tested object is placed on a turntable, and the whole 3D data is acquired through multiple rotations. ...
Preprint
Full-text available
In this paper, we propose a novel LiDAR(-inertial) odometry and mapping framework to achieve simultaneous localization and meshing in real-time. This framework, termed ImMesh, comprises four tightly-coupled modules: receiver, localization, meshing, and broadcaster. The localization module utilizes the preprocessed sensor data from the receiver, estimates the sensor pose online by registering LiDAR scans to maps, and dynamically grows the map. Then, our meshing module takes the registered LiDAR scan and incrementally reconstructs the triangle mesh on the fly. Finally, the real-time odometry, map, and mesh are published via our broadcaster. The key contribution of this work is the meshing module, which represents a scene by an efficient hierarchical voxel structure, performs fast retrieval of voxels observed by new scans, and reconstructs triangle facets in each voxel incrementally. This voxel-wise meshing operation is carefully designed for efficiency: it first performs a dimension reduction by projecting 3D points to a 2D local plane contained in the voxel, and then executes the meshing operation with pull, commit and push steps for incremental reconstruction of triangle facets. To the best of our knowledge, this is the first work in the literature that can reconstruct the triangle mesh of large-scale scenes online, relying only on a standard CPU without GPU acceleration. To share our findings and contribute to the community, we make our code publicly available on GitHub: https://github.com/hku-mars/ImMesh.
Thesis
Modeling hand-object manipulations is essential for understanding how humans interact with their environment. Recent efforts to recover 3D information from RGB images have been directed towards fully-supervised methods, which require large amounts of labeled training samples. However, collecting 3D ground-truth data for hand-object interactions is costly, tedious, and error-prone. In this thesis, we propose several contributions to overcome this challenge. First, we propose a fully automatic method to generate synthetic data with hand-object interactions for training. We generate ObMan, a synthetic dataset with automatically generated labels, and use it to train a deep convolutional neural network to reconstruct the observed object and the hand pose from a single RGB frame. We present an end-to-end learnable model that exploits a novel contact loss to favor physically plausible hand-object constellations. We investigate the domain gap and validate that our synthesized training data allows our model to reconstruct hand-object interactions from real images, provided the captured grasps are similar to the ones in the synthetic images. While costly, curating annotations from real images allows us to obtain samples from the distribution of natural hand-object interactions. Next, we investigate a strategy to make the most of manual annotation efforts: we propose to leverage the temporal context in videos when sparse annotations are available. In a learnable framework which jointly reconstructs hands and objects in 3D by inferring the poses of known models, we leverage photometric consistency across time. Given our estimated reconstructions, we differentiably render the optical flow between pairs of images and use it to warp one frame to another. We then apply a self-supervised photometric loss that relies on the visual consistency between nearby images. We display competitive results on 3D hand-object reconstruction benchmarks and demonstrate that our approach improves pose estimation accuracy by leveraging information from neighboring frames in low-data regimes. Finally, we explore automatic annotation of real RGB data by proposing a learning-free fitting approach for hand-object reconstruction. We rely on 2D cues obtained with common learnt methods for detection, hand pose estimation and instance segmentation, and integrate hand-object interaction priors. We evaluate our approach and show that it can be applied to datasets with varying levels of complexity. Our method can seamlessly handle two-hand object interactions and can provide noisy pseudo-labels for learning-based approaches. In summary, our contributions are the following: (i) we generate synthetic data for hand-object grasps that allows training CNNs for joint hand-object reconstruction, (ii) we propose a strategy to leverage the temporal context in videos when sparse annotations are provided, and (iii) we propose to recover hand-object interactions for short video clips by fitting models to noisy predictions from learnt models.
Article
Full-text available
Contactless fingerprint biometrics has developed rapidly in the past decades thanks to its inherent advantages: no physical contact between finger and sensor, no contamination by latent fingerprints, and improved hygiene. These advantages have paved the way for new 2D and 3D contactless fingerprint-based applications and have led to a growing number of academic publications in recent years. It is therefore necessary and important to conduct a comprehensive survey of contactless fingerprint biometric technology, review the latest research findings on 2D and 3D contactless fingerprint recognition systems, and point out future development directions for contactless fingerprint biometrics. In this work, a comprehensive survey is presented that reviews 2D and 3D contactless fingerprint biometrics from four essential aspects: contactless fingerprint capture, fingerprint preprocessing, feature extraction, and template comparison. To serve as a good reference, we provide a well-structured taxonomy of contactless fingerprint biometrics. We also identify related research problems and future research directions.
Thesis
Full-text available
This thesis is part of the CPER Bramms project, one of whose objectives was to develop a method for acquiring the surface of the female torso. The work therefore aimed at the design, development and realization of a three-dimensional measurement machine suited to living subjects. Among the large number of existing 3D measurement methods, attention focused on stereovision matching and on the use of structured light. Stereovision matching consists of finding homologous pixels in two images of the same scene taken from two different viewpoints. One way to perform this matching is to use correlation measures. The algorithms then face certain difficulties: changes in illumination, noise, deformations, occlusions, poorly textured areas and large homogeneous regions. In this work, structured light was used primarily to add information in homogeneous areas. Building on this approach, an original reconstruction method was designed, based on a specific pattern projected onto the surface. A matching scheme based on comparing signatures of particular points of the pattern was implemented. This process allows a sparse reconstruction from a single acquisition and simplifies the processing step that turns the point cloud into a surface mesh.
Article
We present a novel real-time framework for non-rigid 3D reconstruction from a single depth camera that is robust to noise, varying camera poses, and large deformations. KinectFusion achieved high-quality real-time 3D object reconstruction by implicitly representing an object's surface with a signed distance field (SDF) from a single depth camera. Many studies of incremental reconstruction have been presented since then, with surface estimation improving over time. Previous works primarily focused on improving conventional SDF matching and deformation schemes. In contrast, the proposed framework tackles the temporal inconsistency caused by SDF approximation and fusion, manipulating SDFs so as to reconstruct a target more accurately over time. In our reconstruction pipeline, we introduce a refinement evolution method, in which an erroneous SDF from a depth sensor is recovered more accurately in a few iterations by propagating erroneous SDF values away from the surface. Reliable gradients of the refined SDFs enable more accurate non-rigid tracking of a target object. Furthermore, we propose a level-set evolution for SDF fusion, enabling SDFs to be manipulated stably in the reconstruction pipeline over time. The proposed methods are fully parallelizable and can be executed in real time. Qualitative and quantitative evaluations show that incorporating the refinement and fusion methods into the reconstruction pipeline improves 3D reconstruction accuracy and temporal reliability by avoiding cumulative errors over time. Evaluation results show that our pipeline yields more accurate reconstruction that is robust to noise and large motions, and outperforms previous state-of-the-art reconstruction methods.
Chapter
Registration is the problem of bringing together two or more 3D shapes, either of the same object or of two different but similar objects. This chapter first introduces the classical Iterative Closest Point (ICP) algorithm which represents the gold standard registration method. Current limitations of ICP are addressed, and the most popular variants of ICP are described to improve the basic implementation in several ways. Challenging registration scenarios are analyzed, and a taxonomy of recent and promising alternative registration techniques is introduced. Four case studies are then described with an increasing level of difficulty. The first case study describes a simple but effective technique to detect outliers. The second case study uses the Levenberg–Marquardt (LM) optimization procedure to solve standard pairwise registration. The third case study focuses on the challenging problem of deformable object registration. The fourth case study introduces an ICP method for preoperative data registration in laparoscopy. Finally, open issues and directions for future work are discussed, and conclusions are drawn.
Article
Data-driven modeling of human hand contact dynamics starts with a tedious process of data collection. Contact dynamics data consist of an input describing an applied action and the response stimuli from the environment. The quality and stability of the model depend mainly on how well the data points cover the model space. Thus, in order to build a reliable data-driven model, a user usually collects data dozens of times. In this article, we aim to build an interactive system that assists a user in data collection. We develop an online segmentation framework that partitions a multivariate streaming signal. Real-time segmentation allows for tracking how the model space is being populated. We applied the proposed framework to a haptic texture modeling use-case. In order to guide a user in data collection, we designed a user interface mapping applied input to alternative visual modalities based on the theory of direct perception. The combination of the segmentation framework and user interface implements a human-in-the-loop system, where the user interface assigns the target combination of input variables and the user tries to acquire them. Experimental results show that the proposed data collection scheme considerably increases the approximation quality of the model, while the proposed user interface considerably reduces the mental workload experienced during data collection.
Article
Full-text available
We present a simple, effective, and efficient technique for approximating arbitrary polyhedra. It is based on triangulation and vertex-clustering, and produces a series of 3D approximations (also called "levels of detail") that resemble the original object from all viewpoints, but contain an increasingly smaller number of faces and vertices. The simplification is more efficient than competing techniques because it does not require building and maintaining a topological adjacency graph. Furthermore, it is better suited for mechanical CAD models, which often exhibit patterns of small features, because it automatically groups and simplifies features that are geometrically close but need not be topologically close or even part of a single connected component. Using a lower level of detail when displaying small, distant, or background objects improves graphics performance without a significant loss of perceptual information, and thus enables real-time inspection of complex scenes or a convenient environment for animation or walkthrough preview.
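The clustering idea above is straightforward to sketch: quantize vertices to a uniform grid, merge each cell's vertices into one representative, and drop triangles that collapse. A minimal numpy sketch under those assumptions (the cell size, the mean-based representative, and the function names are illustrative; the paper's feature grading is omitted):

```python
# A minimal sketch of grid-based vertex clustering in the spirit of the
# abstract: snap vertices to a uniform grid, merge vertices in the same
# cell, and drop degenerate triangles.
import numpy as np

def cluster_simplify(vertices, triangles, cell_size):
    cells = np.floor(vertices / cell_size).astype(np.int64)
    # map each occupied cell to one representative vertex (the cell mean)
    keys, inverse = np.unique(cells, axis=0, return_inverse=True)
    reps = np.zeros((len(keys), 3))
    counts = np.zeros(len(keys))
    np.add.at(reps, inverse, vertices)
    np.add.at(counts, inverse, 1)
    reps /= counts[:, None]
    # re-index triangles and discard those whose corners collapsed together
    tri = inverse[triangles]
    keep = (tri[:, 0] != tri[:, 1]) & (tri[:, 1] != tri[:, 2]) & (tri[:, 0] != tri[:, 2])
    return reps, tri[keep]
```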
Conference Paper
Full-text available
We describe the design of a system to augment a light striping camera for three-dimensional scanning with a photometric system to capture bump maps and approximate reflectances. In contrast with scanning an object at very high spatial resolution, this allows the relatively efficient and inexpensive acquisition of input for high-quality rendering.
Article
Full-text available
The structure-from-motion problem has been extensively studied in the field of computer vision. Yet, the bulk of the existing work assumes that the scene contains only a single moving object. The more realistic case where an unknown number of objects move in the scene has received little attention, especially for its theoretical treatment. In this paper we present a new method for separating and recovering the motion and shape of multiple independently moving objects in a sequence of images. The method does not require prior knowledge of the number of objects, nor does it depend on any grouping of features into an object at the image level. For this purpose, we introduce a mathematical construct of object shapes, called the shape interaction matrix, which is invariant to both the object motions and the selection of coordinate systems. This invariant structure is computable solely from the observed trajectories of image features without grouping them into individual objects. Once the matrix is computed, it allows for segmenting features into objects by the process of transforming it into a canonical form, as well as recovering the shape and motion of each object. The theory works under a broad set of projection models (scaled orthography, paraperspective and affine), but these must be linear, so projective "cameras" are excluded.
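A minimal sketch of the central construct, assuming noise-free tracks and a known total rank; the grouping step itself (permuting Q into block-diagonal form) is omitted:

```python
# A minimal sketch of the shape interaction matrix described in the
# abstract: for a trajectory matrix W (2F x P) of P features tracked over
# F frames, Q = V V^T (from the rank-r SVD of W) is invariant to object
# motion, and Q[i, j] is (near) zero for features on different objects.
import numpy as np

def shape_interaction_matrix(W, rank):
    # rank is the sum of per-object ranks (e.g., 4 per rigid object under
    # affine projection); choosing it is the hard part in practice.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    V = Vt[:rank].T                      # P x rank
    return V @ V.T                       # P x P interaction matrix

# Features i and j are then grouped into the same object when |Q[i, j]|
# is large; the paper sorts Q into block-diagonal form to segment them.
```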
Article
Full-text available
The problem of creating a complete model of a physical object is studied. Although this may be possible using intensity images, the authors use range images, which directly provide access to three-dimensional information. The first problem that needs to be solved is to find the transformation between the different views. Previous approaches have either assumed this transformation to be known (which is extremely difficult for a complete model) or computed it with feature matching (which is not accurate enough for integration). The authors propose an approach that works on range data directly and registers successive views with enough overlapping area to get an accurate transformation between views. This is performed by minimizing a functional that does not require point-to-point matches. Details of the registration method and modeling procedure are given and illustrated on range images of complex objects.
Article
Full-text available
Three-dimensional (3-D) surface reconstructions provide a method to view complex anatomy contained in a set of computed tomography (CT), magnetic resonance imaging (MRI), or single photon emission computed tomography tomograms. Existing methods of 3-D display generate images based on the distance from an imaginary observation point to a patch on the surface and on the surface normal of the patch. We believe that the normalized gradient of the original values in the CT or MRI tomograms provides a better estimate for the surface normal and hence results in higher quality 3-D images. Then two algorithms that generate 3-D surface models are presented. The new methods use polygon and point primitives to interface with computer-aided design equipment. Finally, several 3-D images of both bony and soft tissue show the skull, spine, internal air cavities of the head and abdomen, and the abdominal aorta in detail.
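The key idea above, estimating the surface normal as the normalized gradient of the volume, is a short computation with central differences. A minimal numpy sketch with illustrative names (the original operates on stacks of CT/MRI tomograms):

```python
# A minimal sketch: estimate per-voxel surface normals as the normalized
# gradient of a scalar intensity volume, via central differences.
import numpy as np

def gradient_normals(volume):
    gx, gy, gz = np.gradient(volume.astype(np.float64))
    g = np.stack([gx, gy, gz], axis=-1)
    norm = np.linalg.norm(g, axis=-1, keepdims=True)
    return g / np.maximum(norm, 1e-12)  # unit normals, safe where gradient ~ 0
```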
Conference Paper
Full-text available
We present a multiview registration method for aligning range data. We first align scans pairwise with each other and use the pairwise alignments as constraints that the multiview step enforces while evenly diffusing the pairwise registration errors. This approach is especially suitable for registering large data sets, since using constraints from pairwise alignments does not require loading the entire data set into memory to perform the alignment. The alignment method is efficient, is less likely to get stuck in a local minimum than previous methods, and can be used in conjunction with any pairwise method based on aligning overlapping surface sections.
Conference Paper
Full-text available
The structure from motion problem has been extensively studied in the field of computer vision. Yet, the bulk of the existing work assumes that the scene contains only a single moving object. The more realistic case where an unknown number of objects move in the scene has received little attention, especially for its theoretical treatment. We present a new method for separating and recovering the motion and shape of multiple independently moving objects in a sequence of images. The method does not require prior knowledge of the number of objects, nor is dependent on any grouping of features into an object at the image level. For this purpose, we introduce a mathematical construct of object shapes, called the shape interaction matrix, which is invariant to both the object motions and the selection of coordinate systems. This invariant structure is computable solely from the observed trajectories of image features without grouping them into individual objects. Once the structure is computed, it allows for segmenting features into objects by the process of transforming it into a canonical form, as well as recovering the shape and motion of each object
Article
Full-text available
Structures of dynamic scenes can only be recovered using a real-time range sensor. Depth from defocus offers an effective solution to fast and dense range estimation. However, accurate depth estimation requires theoretical and practical solutions to a variety of problems including recovery of textureless surfaces, precise blur estimation, and magnification variations caused by defocusing. Both textured and textureless surfaces are recovered using an illumination pattern that is projected via the same optical path used to acquire images. The illumination pattern is optimized to maximize accuracy and spatial resolution in computed depth. The relative blurring in two images is computed using a narrow-band linear operator that is designed by considering all the optical, sensing, and computational elements of the depth from defocus system. Defocus invariant magnification is achieved by the use of an additional aperture in the imaging optics. A prototype focus range sensor has been developed that has a workspace of 1 cubic foot and produces up to 512×480 depth estimates at 30 Hz with an average RMS error of 0.2%. Several experimental results are included to demonstrate the performance of the sensor
Article
Full-text available
This paper concerns the problem of range image registration for the purpose of building surface models of 3D objects. The registration task involves finding the translation and rotation parameters which properly align overlapping views of the object, so as to reconstruct from these partial surfaces an integrated surface representation of the object. The registration task is expressed as an optimization problem. We define a function which measures the quality of the alignment between the partial surfaces contained in two range images as produced by a set of motion parameters. This function computes a sum of Euclidean distances from control points on one surface to corresponding points on the other. The strength of this approach is in the method used to determine point correspondences: it reverses the rangefinder calibration process, resulting in equations which can be used to directly compute the location of a point in a range image corresponding to an arbitrary point in 3D space. A stochastic optimization technique, very fast simulated reannealing (VFSR), is used to minimize the cost function. Dual-view registration experiments yielded excellent results in very reasonable time, while a multiview registration experiment took considerably longer. A complete surface model was then constructed from the integration of multiple partial views. The effectiveness with which registration of range images can be accomplished makes this method attractive for many practical applications where surface models of 3D objects must be constructed.
Article
Full-text available
A strategy for acquiring 3-D data of an unknown scene, using range images obtained by a light stripe range finder, is addressed. The foci of attention are occluded regions: only the scene at the borders of the occlusions is modeled to compute the next move. Since the system has knowledge of the sensor geometry, it can resolve the appearance of occlusions by analyzing them. The problem of 3-D data acquisition is divided into two subproblems, corresponding to two types of occlusion. An occlusion arises either when the reflected laser light does not reach the camera or when the directed laser light does not reach the scene surface. After taking the range image of a scene, the regions of no data due to the first kind of occlusion are extracted. The missing data are acquired by rotating the sensor system in the scanning plane, which is defined by the first scan. After a complete image of the surface illuminated from the first scanning plane has been built, the regions of missing data due to the second kind of occlusion are located. Then the directions of the next scanning planes for further 3-D data acquisition are computed.
Article
Full-text available
The authors describe a general-purpose, representation-independent method for the accurate and computationally efficient registration of 3-D shapes including free-form curves and surfaces. The method handles the full six degrees of freedom and is based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point. The ICP algorithm always converges monotonically to the nearest local minimum of a mean-square distance metric, and the rate of convergence is rapid during the first few iterations. Therefore, given an adequate set of initial rotations and translations for a particular class of objects with a certain level of `shape complexity', one can globally minimize the mean-square distance metric over all six degrees of freedom by testing each initial registration. One important application of this method is to register sensed data from unfixtured rigid objects with an ideal geometric model, prior to shape inspection. Experimental results show the capabilities of the registration algorithm on point sets, curves, and surfaces
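A minimal sketch of the basic point-to-point ICP loop described above: alternate closest-point correspondences with a closed-form (SVD-based) rigid update. The fixed iteration count and the absence of outlier rejection are simplifications for illustration:

```python
# A minimal point-to-point ICP sketch: iterate closest-point matching and
# a closed-form least-squares rigid update (Kabsch/SVD).
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iterations=30):
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    for _ in range(iterations):
        moved = source @ R.T + t
        _, idx = tree.query(moved)            # closest-point correspondences
        matched = target[idx]
        # closed-form least-squares rigid motion between matched point sets
        mu_s, mu_t = moved.mean(0), matched.mean(0)
        H = (moved - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflections
        R_step = Vt.T @ D @ U.T
        # compose the incremental motion with the running estimate
        R, t = R_step @ R, R_step @ (t - mu_s) + mu_t
    return R, t
```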
Article
Full-text available
We describe the design of a system to augment a light striping camera for three dimensional scanning with a photometric system to capture bump maps and approximate reflectances. In contrast with scanning an object with very high spatial resolution, this allows the relatively efficient and inexpensive acquisition of input for high quality rendering.
Article
Full-text available
This paper describes an interactive sensor planning system that can be used to select viewpoints subject to camera visibility, field of view, and task constraints. Application areas for this method include surveillance planning, safety monitoring, architectural site design planning, and automated site modeling. Given a description of the sensor's characteristics, the objects in the 3-D scene, and the targets to be viewed, our algorithms compute the set of admissible viewpoints that satisfy the constraints. The system first builds topologically correct solid models of the scene from a variety of data sources. Viewing targets are then selected, and visibility volumes and field of view cones are computed and intersected to create viewing volumes where cameras can be placed. The user can interactively manipulate the scene and select multiple target features to be viewed by a camera. The user can also select candidate viewpoints within this volume to synthesize views and verify the correctness of the camera placements.
Article
Full-text available
Surface elements (surfels) are a powerful paradigm to efficiently render complex geometric objects at interactive frame rates. Unlike classical surface discretizations, i.e., triangles or quadrilateral meshes, surfels are point primitives without explicit connectivity. Surfel attributes comprise depth, texture color, normal, and others. As a pre-process, an octree-based surfel representation of a geometric object is computed. During sampling, surfel positions and normals are optionally perturbed, and different levels of texture colors are prefiltered and stored per surfel. During rendering, a hierarchical forward warping algorithm projects surfels to a z-buffer. A novel method called visibility splatting determines visible surfels and holes in the z-buffer. Visible surfels are shaded using texture filtering, Phong illumination, and environment mapping using per-surfel normals. Several methods of image reconstruction, including supersampling, offer flexible speed-quality tradeoffs.
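At its core, forward warping of point primitives is a projection followed by a z-buffer test. A minimal sketch under simplified assumptions (pinhole intrinsics, single-pixel splats, and none of the paper's visibility splatting or texture filtering):

```python
# A minimal sketch of forward-warping point primitives into a z-buffer,
# the core of the surfel rendering described above.
import numpy as np

def splat(points, colors, K, width, height):
    zbuf = np.full((height, width), np.inf)
    image = np.zeros((height, width, 3))
    p = points @ K.T                      # project with 3x3 intrinsics K
    z = p[:, 2]
    ok = z > 1e-9                         # keep points in front of the camera
    u = (p[ok, 0] / z[ok]).astype(int)
    v = (p[ok, 1] / z[ok]).astype(int)
    inb = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for ui, vi, zi, ci in zip(u[inb], v[inb], z[ok][inb], colors[ok][inb]):
        if zi < zbuf[vi, ui]:             # nearest surfel wins the pixel
            zbuf[vi, ui] = zi
            image[vi, ui] = ci
    return image, zbuf
```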
Article
Full-text available
We describe a hardware and software system for digitizing the shape and color of large fragile objects under non-laboratory conditions. Our system employs laser triangulation rangefinders, laser time-of-flight rangefinders, digital still cameras, and a suite of software for acquiring, aligning, merging, and viewing scanned data. As a demonstration of this system, we digitized 10 statues by Michelangelo, including the well-known figure of David, two building interiors, and all 1,163 extant fragments of the Forma Urbis Romae, a giant marble map of ancient Rome. Our largest single dataset is of the David - 2 billion polygons and 7,000 color images. In this paper, we discuss the challenges we faced in building this system, the solutions we employed, and the lessons we learned. We focus in particular on the unusual design of our laser triangulation scanner and on the algorithms and software we developed for handling very large scanned models.
Conference Paper
We describe a hardware and software system for digitizing the shape and color of large fragile objects under non-laboratory conditions. Our system employs laser triangulation rangefinders, laser time-of-flight rangefinders, digital still cameras, and a suite of software for acquiring, aligning, merging, and viewing scanned data. As a demonstration of this system, we digitized 10 statues by Michelangelo, including the well-known figure of David, two building interiors, and all 1,163 extant fragments of the Forma Urbis Romae, a giant marble map of ancient Rome. Our largest single dataset is of the David - 2 billion polygons and 7,000 color images. In this paper, we discuss the challenges we faced in building this system, the solutions we employed, and the lessons we learned. We focus in particular on the unusual design of our laser triangulation scanner and on the algorithms and software we developed for handling very large scanned models.
Article
We study the problem of creating a complete model of a physical object. Although this may be possible using intensity images, we here use range images, which directly provide access to three-dimensional information. The first problem that we need to solve is to find the transformation between the different views. Previous approaches either assume this transformation to be known (which is extremely difficult for a complete model), or compute it with feature matching (which is not accurate enough for integration). In this paper, we propose a new approach which works on range data directly, and registers successive views with enough overlapping area to get an accurate transformation between views. This is performed by minimizing a functional which does not require point-to-point matches. We give the details of the registration method and modelling procedure, and illustrate them on real range images of complex objects.
Conference Paper
A method is presented that extracts the 3D shape of objects together with their surface texture. Both shape and texture are obtained from a single image. The underlying principle is based on an active technique: a high-resolution pattern is projected onto the object, and the deformations observed by a single camera yield the third dimension. Furthermore, the surface texture is extracted from the same image. Because the whole procedure is based on a single image, frame-by-frame reconstruction of a video taken with the pattern projected throughout yields 3D shape dynamics. The paper sketches the complete system but focuses on the problem of texture extraction.
Conference Paper
Range imaging offers an inexpensive and accurate means for digitizing the shape of three-dimensional objects. Because most objects self occlude, no single range image suffices to describe the entire object. We present a method for combining a collection of range images into a single polygonal mesh that completely describes an object to the extent that it is visible from the outside. The steps in our method are: 1) align the meshes with each other using a modified iterated closest-point algorithm, 2) zipper together adjacent meshes to form a continuous surface that correctly captures the topology of the object, and 3) compute local weighted averages of surface positions on all meshes to form a consensus surface geometry. Our system differs from previous approaches in that it is incremental; scans are acquired and combined one at a time. This approach allows us to acquire and combine large numbers of scans with minimal storage overhead. Our largest models contain up to 360,000 triangles. All the steps needed to digitize an object that requires up to 10 range scans can be performed using our system with five minutes of user interaction and a few hours of compute time. We show two models created using our method with range data from a commercial rangefinder that employs laser stripe technology.
Conference Paper
This paper presents an example-based method for calculating skeleton-driven body deformations. Our example data consists of range scans of a human body in a variety of poses. Using markers captured during range scanning, we construct a kinematic skeleton and identify the pose of each scan. We then construct a mutually consistent parameterization of all the scans using a posable subdivision surface template. The detail deformations are represented as displacements from this surface, and holes are filled smoothly within the displacement maps. Finally, we combine the range scans using k-nearest neighbor interpolation in pose space. We demonstrate results for a human upper body with controllable pose, kinematics, and underlying surface shape.
Article
This paper describes a general purpose, representation independent method for the accurate and computationally efficient registration of 3-D shapes including free-form curves and surfaces. The method handles the full six degrees of freedom and is based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point. The ICP algorithm always converges monotonically to the nearest local minimum of a mean-square distance metric, and experience shows that the rate of convergence is rapid during the first few iterations. Therefore, given an adequate set of initial rotations and translations for a particular class of objects with a certain level of 'shape complexity', one can globally minimize the mean-square distance metric over all six degrees of freedom by testing each initial registration. For example, a given 'model' shape and a sensed 'data' shape that represents a major portion of the model shape can be registered in minutes by testing one initial translation and a relatively small set of rotations to allow for the given level of model complexity. One important application of this method is to register sensed data from unfixtured rigid objects with an ideal geometric model prior to shape inspection. The described method is also useful for deciding fundamental issues such as the congruence (shape equivalence) of different geometric representations as well as for estimating the motion between point sets where the correspondences are not known. Experimental results show the capabilities of the registration algorithm on point sets, curves, and surfaces.
Article
In this paper, we discuss a novel strategy for rapid acquisition of the range map of a scene employing color-encoded structured light. This technique offers several potential advantages including increased speed and improved accuracy. In this approach we illuminate the scene with a single encoded grid of colored light stripes. The indexing problem, that of matching a detected image plane stripe with its position in the projection grid, is solved from a knowledge of the color grid encoding. In fact, the possibility exists for the first time to acquire high-resolution range data in real time for modest cost, since only a single projection and single color image are required. Grid to grid alignment problems associated with previous multistripe techniques are eliminated, as is the requirement for dark interstices between grid stripes. Scene illumination is more uniform, simplifying the stripe detection problem, and mechanical difficulties associated with the equipment design are significantly reduced.
Article
In this paper, we describe an efficient image-based approach to computing and shading visual hulls from silhouette image data. Our algorithm takes advantage of epipolar geometry and incremental computation to achieve a constant rendering cost per rendered pixel. It does not suffer from the computation complexity, limited resolution, or quantization artifacts of previous volumetric approaches. We demonstrate the use of this algorithm in a real-time virtualized reality application running off a small number of video streams.
Article
Submitted to the Department of Computer Science. Copyright by the author. Thesis (Ph. D.)--Stanford University, 2001.
Conference Paper
We present a novel approach to real-time structured light range scanning. After an analysis of the underlying assumptions of existing structured light techniques, we derive a new set of illumination patterns based on coding the boundaries between projected stripes. These stripe boundary codes allow range scanning of moving objects, with only modest assumptions about scene continuity and reflectance. We describe an implementation that integrates these new codes with real-time algorithms for tracking stripe boundaries and determining depths. Our system uses a standard video camera and DLP projector, and produces dense range images at 60 Hz with 100 μm accuracy over a 10 cm working volume. As an application, we demonstrate the creation of complete models of rigid objects: the objects are rotated in front of the scanner by hand, and successive range images are automatically aligned
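Once a boundary code identifies which projected stripe plane a camera pixel observed, depth follows from a ray-plane intersection. A minimal sketch with an assumed calibration parameterization (inverse intrinsics K_inv, stripe plane n·x = d in camera coordinates; names are illustrative):

```python
# A minimal sketch of structured-light triangulation: intersect the
# camera ray through a pixel with the identified projector stripe plane.
import numpy as np

def triangulate(pixel, K_inv, plane_n, plane_d):
    """pixel: (u, v); camera ray r = K_inv @ [u, v, 1]; plane n.x = d."""
    ray = K_inv @ np.array([pixel[0], pixel[1], 1.0])
    t = plane_d / (plane_n @ ray)   # point p = t * ray lies on the plane
    return t * ray                   # 3D point in camera coordinates
```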
Conference Paper
The ICP (Iterative Closest Point) algorithm is widely used for geometric alignment of three-dimensional models when an initial estimate of the relative pose is known. Many variants of ICP have been proposed, affecting all phases of the algorithm from the selection and matching of points to the minimization strategy. We enumerate and classify many of these variants, and evaluate their effect on the speed with which the correct alignment is reached. In order to improve convergence for nearly-flat meshes with small features, such as inscribed surfaces, we introduce a new variant based on uniform sampling of the space of normals. We conclude by proposing a combination of ICP variants optimized for high speed. We demonstrate an implementation that is able to align two range images in a few tens of milliseconds, assuming a good initial guess. This capability has potential application to real-time 3D model acquisition and model-based tracking
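The normal-space sampling variant named above can be sketched in a few lines: bucket points by quantized normal direction, then draw samples evenly across buckets so sparse orientations (the small features of nearly-flat meshes) still constrain the alignment. The bucket resolution and the with-replacement sampling are illustrative simplifications:

```python
# A minimal sketch of uniform sampling in the space of normals.
import numpy as np

def normal_space_sample(points, normals, n_samples, bins=8):
    theta = np.arccos(np.clip(normals[:, 2], -1, 1))        # polar angle
    phi = np.arctan2(normals[:, 1], normals[:, 0]) + np.pi  # azimuth in [0, 2pi]
    t_bin = np.floor(theta / np.pi * bins).clip(0, bins - 1)
    p_bin = np.floor(phi / (2 * np.pi) * (2 * bins)).clip(0, 2 * bins - 1)
    bucket = t_bin * (2 * bins) + p_bin
    buckets = [np.flatnonzero(bucket == b) for b in np.unique(bucket)]
    rng = np.random.default_rng(0)
    chosen = []
    while len(chosen) < n_samples:        # round-robin over occupied buckets
        for idx in buckets:
            if len(chosen) >= n_samples:
                break
            chosen.append(rng.choice(idx))  # with replacement, for simplicity
    return points[np.array(chosen)]
```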
Conference Paper
Laser range scanners are a popular method for acquiring three-dimensional geometry due to their accuracy and robustness. Maximizing scanner accuracy while minimizing engineering costs is a key challenge to future scanner designs. Engineering costs arise from both expensive components and difficult calibration requirements. We propose a two camera range scanner design, specifically chosen to minimize calibration complexity and cost. This design eliminates all actuated components from the calibrated geometry. Since it is difficult to ensure absolute repeatability of moving parts, a design with only statically arranged components can dramatically reduce the costs associated with calibration
Conference Paper
The paper addresses the problem of recovering 3D non-rigid shape models from image sequences. For example, given a video recording of a talking person, we would like to estimate a 3D model of the lips and the full face and its internal modes of variation. Many solutions that recover 3D shape from 2D image sequences have been proposed; these so-called structure-from-motion techniques usually assume that the 3D object is rigid. For example, C. Tomasi and T. Kanade's (1992) factorization technique is based on a rigid shape matrix, which produces a tracking matrix of rank 3 under orthographic projection. We propose a novel technique based on a non-rigid model, where the 3D shape in each frame is a linear combination of a set of basis shapes. Under this model, the tracking matrix is of higher rank, and can be factored in a three-step process to yield pose, configuration and shape. To the best of our knowledge, this is the first model-free approach that can recover non-rigid shape models from single-view video sequences. We demonstrate this new algorithm on several video sequences. We were able to recover 3D non-rigid human face and animal models with high accuracy.
Conference Paper
For registration of 3-D free-form surfaces we have developed a representation which requires no knowledge of the transformation between views. The representation comprises descriptive images associated with oriented points on the surface of an object. Constructed using single point bases, these images are data level shape descriptions that are used for efficient matching of oriented points. Correlation of images is used to establish point correspondences between two views; from these correspondences a rigid transformation that aligns the views is calculated. The transformation is then refined and verified using a modified iterative closest point algorithm. To demonstrate the generality of our approach, we present results from multiple sensing domains
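The oriented-point images described here are commonly known as spin images: a 2D histogram of neighboring points' cylindrical coordinates (radial distance alpha, signed height beta) about the point's normal. A minimal sketch with illustrative support radius and bin count:

```python
# A minimal sketch of a spin image descriptor for one oriented point.
import numpy as np

def spin_image(center, normal, points, radius=0.1, bins=16):
    d = points - center
    beta = d @ normal                                           # height along normal
    alpha = np.sqrt(np.maximum((d * d).sum(1) - beta**2, 0.0))  # radial distance
    keep = (alpha < radius) & (np.abs(beta) < radius)
    hist, _, _ = np.histogram2d(alpha[keep], beta[keep], bins=bins,
                                range=[[0, radius], [-radius, radius]])
    return hist  # matched across views by image correlation, as in the abstract
```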
Conference Paper
A portable three-dimensional digitizer using a monocular camera is presented in this paper. The digitizer automatically acquires the shape of a target object as well as its texture. The digitizer has the following advantages: 1) it is compact and inexpensive, 2) it offers skill-free 3D image acquisition, and 3) it handles a wide range of objects of various materials. The digitizing algorithm is based on the "Shape-from-Silhouette" framework, in which several novel techniques are embedded. In the silhouette extraction, not only pixel-level subtraction between images but also region-level subtraction is used, so as to achieve precise extraction. The texture acquisition is treated as a labeling problem in an energy minimization framework, which enables us to obtain realistic textures with a simple operation. Our experiments showed that the digitizing speed was practical.
Conference Paper
We have developed a video-rate stereo machine that has the capability of generating a dense depth map at the video rate. The performance benchmarks of the CMU video-rate stereo machine are: 1) multi-image input of up to 6 cameras; 2) throughput of 30 million point × disparity measurements per second; 3) frame rate of 30 frames/sec; 4) a dense depth map of up to 256×240 pixels; 5) disparity search range of up to 60 pixels; 6) high precision of depth output of up to 8 bits (with interpolation). The capability of passively producing such a dense depth map (3D representation) of a scene at the video rate can open up a new class of applications of 3D vision: merging real and virtual worlds in real time.
Conference Paper
A multiresolution surface modeling technique is presented. Several registered range views obtained from different viewpoints are first integrated into a nonredundant surface triangulation. The integration technique is based on the reparameterization of the canonic subsets of the Venn diagram of the set of views. The resulting triangulation is then input to a sequential optimization process that computes different levels of resolution of the surfaces of interest
Conference Paper
A simple imaging range sensor is described, based on the measurement of focal error, as described by A. Pentland (1982 and 1987). The current implementation can produce range over a 1 m³ workspace with a measured standard error of 2.5% (4.5 significant bits of data). The system is implemented using relatively inexpensive commercial image-processing equipment. Experience shows that this ranging technique can be both economical and practical for tasks which require quick and reliable but coarse estimates of range. Examples of such tasks are initial target acquisition or obtaining the initial coarse estimate of stereo disparity in a coarse-to-fine stereo algorithm.
Article
In this paper, we propose a new method, the RANSAC-based DARCES method (data-aligned rigidity-constrained exhaustive search based on random sample consensus), which can solve the partially overlapping 3D registration problem without any initial estimate. For the noiseless case, the basic algorithm of our method is guaranteed to find the true solution, and its time complexity can be shown to be relatively low. An additional characteristic is that our method can be used even when there are no local features in the 3D data sets.
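A minimal RANSAC-flavored sketch of initial-estimate-free rigid alignment in the spirit of the abstract: hypothesize a transform from a triplet of correspondences whose pairwise distances agree (the rigidity constraint), then score it by inlier count. Thresholds and trial counts are illustrative, and the paper's data-aligned exhaustive search is replaced here by plain random sampling:

```python
# A minimal sketch of rigidity-constrained RANSAC alignment of two clouds.
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import pdist

def rigid_from_triplet(P, Q):
    """Least-squares rotation/translation mapping 3 points P onto Q."""
    muP, muQ = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - muP).T @ (Q - muQ))
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, muQ - R @ muP

def ransac_align(source, target, trials=5000, inlier_dist=0.01, seed=0):
    rng = np.random.default_rng(seed)
    tree, best = cKDTree(target), (0, None)
    for _ in range(trials):
        si = rng.choice(len(source), 3, replace=False)
        ti = rng.choice(len(target), 3, replace=False)   # hypothesized match
        # rigidity constraint: the triangles' side lengths must agree
        if not np.allclose(pdist(source[si]), pdist(target[ti]), atol=inlier_dist):
            continue
        R, t = rigid_from_triplet(source[si], target[ti])
        d, _ = tree.query(source @ R.T + t)
        inliers = int((d < inlier_dist).sum())
        if inliers > best[0]:
            best = (inliers, (R, t))
    return best[1]  # None if no hypothesis survived the rigidity test
```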
Article
A number of techniques have been developed for reconstructing surfaces by integrating groups of aligned range images. A desirable set of properties for such algorithms includes: incremental updating, representation of directional uncertainty, the ability to fill gaps in the reconstruction, and robustness in the presence of outliers. Prior algorithms possess subsets of these properties. In this paper, we present a volumetric method for integrating range images that possesses all of these properties.
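The incremental, uncertainty-weighted integration the abstract describes can be sketched as a running weighted average of truncated signed distances per voxel. The truncation distance and weighting here are illustrative choices, not the paper's:

```python
# A minimal sketch of weighted signed-distance fusion: each new range
# observation updates a voxel's distance as a running weighted average,
# which is what makes the method incremental and robust to noise.
import numpy as np

def fuse(tsdf, weight, new_dist, new_weight, trunc=0.05):
    d = np.clip(new_dist, -trunc, trunc)          # truncate far-surface distances
    w = weight + new_weight
    fused = (tsdf * weight + d * new_weight) / np.maximum(w, 1e-12)
    return fused, w
```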
Article
For registration of 3-D free-form surfaces we have developed a representation which requires no knowledge of the transformation between views. The representation comprises descriptive images associated with oriented points on the surface of an object. Constructed using single point bases, these images are data level shape descriptions that are used for efficient matching of oriented points. Correlation of images is used to establish point correspondences between two views; from these correspondences a rigid transformation that aligns the views is calculated. The transformation is then refined and verified using a modified iterative closest point algorithm. To demonstrate the generality of our approach, we present results from multiple sensing domains.
Article
In this paper we present a new method for separating and recovering the motion and shape of multiple independently moving objects in a sequence of images. The method does not require prior knowledge of the number of objects, nor does it depend on any grouping of features into an object at the image level. For this purpose, we introduce a mathematical construct of object shapes, called the shape interaction matrix, which is invariant to both the object motions and the selection of coordinate systems. This invariant structure is computable solely from the observed trajectories of image features, without grouping them into individual objects.
Article
We have developed a video-rate stereo machine that has the capability of generating a dense depth map at the video rate. The performance benchmarks of the CMU video-rate stereo machine are: 1) multi-image input of up to 6 cameras; 2) throughput of 30 million point × disparity measurements per second; 3) frame rate of 30 frames/sec; 4) a dense depth map of up to 256 × 240 pixels; 5) disparity search range of up to 60 pixels; 6) high precision of depth output of up to 8 bits (with interpolation). The capability of passively producing such a dense depth map (3D representation) of a scene at the video rate can open up a new class of applications of 3D vision: merging real and virtual worlds in real time.
Article
The ICP (Iterative Closest Point) algorithm is widely used for geometric alignment of three-dimensional models when an initial estimate of the relative pose is known. Many variants of ICP have been proposed, affecting all phases of the algorithm from the selection and matching of points to the minimization strategy. We enumerate and classify many of these variants, and evaluate their effect on the speed with which the correct alignment is reached. In order to improve convergence for nearly-flat meshes with small features, such as inscribed surfaces, we introduce a new variant based on uniform sampling of the space of normals. We conclude by proposing a combination of ICP variants optimized for high speed. We demonstrate an implementation that is able to align two range images in a few tens of milliseconds, assuming a good initial guess. This capability has potential application to real-time 3D model acquisition and model-based tracking.
Article
Advances in 3D scanning technologies have enabled the practical creation of meshes with hundreds of millions of polygons. Traditional algorithms for display, simplification, and progressive transmission of meshes are impractical for data sets of this size. We describe a system for representing and progressively displaying these meshes that combines a multiresolution hierarchy based on bounding spheres with a rendering system based on points. A single data structure is used for view frustum culling, backface culling, level-of-detail selection, and rendering. The representation is compact and can be computed quickly, making it suitable for large data sets. Our implementation, written for use in a large-scale 3D digitization project, launches quickly, maintains a user-settable interactive frame rate regardless of object complexity or camera position, yields reasonable image quality during motion, and refines progressively when idle to a high final image quality. We have demonstrated the system on scanned models containing hundreds of millions of samples.
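The rendering loop the abstract describes reduces to a simple recursive traversal: cull a bounding sphere if it is invisible, draw it as a splat once its projected size is small enough, and otherwise descend. A minimal sketch with an assumed node layout and camera helpers (projected_diameter and draw_splat are hypothetical, not the system's API):

```python
# A minimal sketch of bounding-sphere-hierarchy traversal for point
# rendering: frustum-cull, stop at a pixel-size threshold, recurse.
def render(node, camera, pixel_threshold=1.0):
    size = camera.projected_diameter(node.center, node.radius)
    if size <= 0:                     # behind the camera or frustum-culled
        return
    if size < pixel_threshold or not node.children:
        camera.draw_splat(node.center, node.radius, node.color)
        return                        # coarse enough: draw and stop descending
    for child in node.children:
        render(child, camera, pixel_threshold)
```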
LEVOY, M., PULLI, K., CURLESS, B., RUSINKIEWICZ, S., KOLLER, D., PEREIRA, L., GINZTON, M., ANDERSON, S., DAVIS, J., GINSBERG, J., SHADE, J., AND FULK, D. 2000. "The Digital Michelangelo Project: 3D Scanning of Large Statues," Proc. ACM SIGGRAPH 2000.
MATSUMOTO, Y., TERASAKI, H., SUGIMOTO, K., AND ARAKAWA, T. 1997. "A Portable Three-Dimensional Digitizer," Proc. 3DIM 1997.
MATUSIK, W., BUEHLER, C., RASKAR, R., GORTLER, S., AND MCMILLAN, L. 2000. "Image-Based Visual Hulls," Proc. ACM SIGGRAPH 2000.
MAVER, J. AND BAJCSY, R. 1993. "Occlusions as a Guide for Planning the Next View," Trans. PAMI, Vol. 15, No. 5.
MIYAZAKI, D., OOISHI, T., NISHIKAWA, T., SAGAWA, R., NISHINO, K., TOMOMATSU, T., TAKASE, Y., AND IKEUCHI, K. 2000. Proc. VSMM 2000.
NAYAR, S. K., WATANABE, M., AND NOGUCHI, M. 1996. "Real-Time Focus Range Sensor," Trans. PAMI, Vol. 18, No. 12.
PENTLAND, A., DARRELL, T., TURK, M., AND HUANG, W. 1989. "A Simple, Real-Time Range Camera," Proc. CVPR 1989.
PFISTER, H., ZWICKER, M., VAN BAAR, J., AND GROSS, M. 2000. "Surfels: Surface Elements as Rendering Primitives," Proc. ACM SIGGRAPH 2000.
ROSSIGNAC, J. AND BORREL, P. 1993. "Multi-Resolution 3D Approximations for Rendering Complex Scenes," in Geometric Modeling in Computer Graphics.
RUSHMEIER, H., BERNARDINI, F., MITTLEMAN, J., AND TAUBIN, G. "Acquiring Input for Rendering at Appropriate Levels of Detail: Digitizing a Pietà."
RUSINKIEWICZ, S. AND LEVOY, M. 2001. "Efficient Variants of the ICP Algorithm," Proc. 3DIM 2001.
SOUCY, M. AND LAURENDEAU, D. 1992. "Multi-Resolution Surface Modeling from Multiple Range Views," Proc. CVPR 1992.
STAMOS, I. AND ALLEN, P. 1998. "Interactive Sensor Planning," Proc. CVPR 1998.
TURK, G. AND LEVOY, M. 1994. "Zippered Polygon Meshes from Range Images," Proc. ACM SIGGRAPH 94.