Article

A depth map fusion algorithm with improved efficiency considering pixel region prediction

Article
Full-text available
Because of the complex structure and different shapes of building contours, the uneven density distribution of airborne LiDAR point clouds, and occlusion, existing building contour extraction algorithms suffer from poor robustness, difficult parameter setting, and low extraction efficiency. To solve these problems, a building contour extraction algorithm based on multidirectional bands was proposed in this study. Firstly, the point clouds were divided into bands of equal width in one direction; the points within each band were vertically projected onto the band's central axis; the two projection points farthest apart were determined, and their corresponding original points were regarded as roof contour points. Given that the contour points obtained from single-direction bands were sparse and discontinuous, different banding directions were selected to repeat the above marking process, and the contour points extracted from the different directions were integrated as the initial contour points. Then, the initial contour points were sorted and connected according to the principle of joining the nearest points in the forward direction, and edges longer than a given threshold were recognized as long edges to be further densified. Finally, each long edge was densified by selecting the non-initial contour point closest to its midpoint, and the densification process was repeated for the updated long edge. In the end, a building roof contour line with complete details and correct topological relationships was obtained. Three point cloud datasets of representative building roofs were chosen for the experiments. The results show that the proposed algorithm can extract high-quality outer contours from point clouds with various boundary structures and is robust to point clouds that differ in density and density variation. Moreover, the algorithm features easy parameter setting and high efficiency in extracting outer contours. For the experimental data selected in this study, the PoLiS values of the outer contour extraction results were always smaller than 0.2 m, and the RAE values were smaller than 7%. Hence, the proposed algorithm can provide high-precision outer contour information on buildings for applications such as 3D building model reconstruction.
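To make the banding idea concrete, here is a minimal sketch of marking candidate contour points for a single banding direction and taking the union over several directions; the function names, the rotation-based band construction, and the fixed set of directions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def contour_candidates_one_direction(points_xy, theta, band_width):
    # Rotate the horizontal point coordinates so that bands become axis-aligned.
    c, s = np.cos(theta), np.sin(theta)
    p = points_xy @ np.array([[c, -s], [s, c]]).T
    band_idx = np.floor(p[:, 1] / band_width).astype(int)
    keep = []
    for b in np.unique(band_idx):
        idx = np.where(band_idx == b)[0]
        proj = p[idx, 0]                     # projection onto the band's central axis
        keep.append(idx[np.argmin(proj)])    # the two farthest projections delimit
        keep.append(idx[np.argmax(proj)])    # the band: mark them as contour candidates
    return np.unique(keep)

def initial_contour_points(points_xy, band_width, n_dirs=4):
    # Union of candidates over several banding directions, as described above.
    dirs = np.linspace(0.0, np.pi, n_dirs, endpoint=False)
    return np.unique(np.concatenate(
        [contour_candidates_one_direction(points_xy, t, band_width) for t in dirs]))
```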
Article
Full-text available
3D mapping is an increasingly important capability of recent photogrammetry and remote sensing systems. Nowadays, unmanned aerial vehicles (UAVs) have become one of the most widely used remote sensing platforms due to their timeliness and flexibility in data acquisition and the high spatial resolution of the recorded images. UAV-based 3D mapping has clear advantages over traditional data sources from satellite and aerial platforms. Generally, the workflow of UAV-based 3D mapping consists of four major steps: (i) data acquisition using an optimal trajectory configuration, (ii) image matching to obtain reliable correspondences, (iii) aerial triangulation to recover accurate camera poses, and (iv) dense image matching to generate point clouds with high density. The performance of the algorithms used in each step determines the reliability and precision of the final 3D mapping products. With the rapid development of new techniques in computer vision and deep learning, many newly designed methods have been documented in recent years. Thus, to give an overall understanding of these four aspects, this review surveys existing techniques and presents specialized applications related to UAV-based 3D mapping, as well as the challenges of data acquisition in complex environments, the integration of multi-source images, 3D mapping of fine-scale objects, and 3D mapping with deep learning techniques. We hope that this review summarizes recent developments and inspires new ideas.
Conference Paper
Full-text available
Scene understanding of full-scale 3D models of an urban area remains a challenging task. While advanced computer vision techniques offer cost-effective approaches to analysing 3D urban elements, a precise and densely labelled dataset is essential. This paper presents the first labelled dataset for a highly dense Aerial Laser Scanning (ALS) point cloud at city scale. The work introduces a novel benchmark dataset comprising a manually annotated point cloud of over 260 million laser scanning points grouped into approximately 100,000 assets from the 2015 Dublin LiDAR point cloud (Laefer et al.). Objects are labelled into 13 classes using hierarchical levels of detail, from large elements (i.e. building, vegetation and ground) to refined ones (i.e. window, door and tree). To validate the usefulness of the dataset, two different applications are showcased. Firstly, the labelled point cloud is employed to train Convolutional Neural Networks (CNNs) to classify urban elements; the dataset is tested on well-known state-of-the-art CNNs (i.e. PointNet, PointNet++ and So-Net). Secondly, the complete ALS dataset is used as detailed ground truth for city-scale image-based 3D reconstruction.
Article
Full-text available
Photorealistic three-dimensional (3D) models are fundamental to the spatial data infrastructure of a digital city, and have numerous potential applications in areas such as urban planning, urban management, urban monitoring, and urban environmental studies. Recent developments in aerial oblique photogrammetry based on aircraft or unmanned aerial vehicles (UAVs) offer promising techniques for 3D modeling. However, 3D models generated from aerial oblique imagery in urban areas with densely distributed high-rise buildings may show geometric defects and blurred textures, especially on building façades, due to problems such as occlusion and large camera tilt angles. Meanwhile, mobile mapping systems (MMSs) can capture terrestrial images of close-range objects from a complementary view on the ground at a high level of detail, but do not offer full coverage. The integration of aerial oblique imagery with terrestrial imagery therefore offers promising opportunities to optimize 3D modeling in urban areas. This paper presents a novel method that integrates the two image types through automatic feature matching and combined bundle adjustment, and then uses the integrated results to optimize the geometry and texture of the 3D models generated from aerial oblique imagery. Experimental analyses were conducted on two datasets of aerial and terrestrial images collected in Dortmund, Germany and in Hong Kong. The results indicate that the proposed approach effectively integrates images from the two platforms and thereby improves 3D modeling in urban areas.
Conference Paper
Full-text available
In this paper we present a scalable approach for robustly computing a 3D surface mesh from multi-scale multi-view stereo point clouds that can handle extreme jumps of point density (in our experiments, three orders of magnitude). The backbone of our approach is a combination of octree data partitioning, local Delaunay tetrahedralization and graph cut optimization. Graph cut optimization is used twice: once to extract surface hypotheses from local Delaunay tetrahedralizations, and once to merge overlapping surface hypotheses even when the local tetrahedralizations do not share the same topology. This formulation allows us to obtain a constant memory consumption per sub-problem while at the same time retaining the density-independent interpolation properties of the Delaunay-based optimization. On multiple public datasets, we demonstrate that our approach is highly competitive with the state-of-the-art in terms of accuracy, completeness and outlier resilience. Further, we demonstrate the multi-scale potential of our approach by processing a newly recorded dataset with 2 billion points and a point density variation of more than four orders of magnitude, requiring less than 9 GB of RAM per process.
Conference Paper
Full-text available
Motivated by the limitations of existing multi-view stereo benchmarks, we present a novel dataset for this task. Towards this goal, we recorded a variety of indoor and outdoor scenes using a high-precision laser scanner and captured both high-resolution DSLR imagery as well as synchronized low-resolution stereo videos with varying fields-of-view. To align the images with the laser scans, we propose a robust technique which minimizes photometric errors conditioned on the geometry. In contrast to previous datasets, our benchmark provides novel challenges and covers a diverse set of viewpoints and scene types, ranging from natural scenes to man-made indoor and outdoor environments. Furthermore, we provide data at significantly higher temporal and spatial resolution. Our benchmark is the first to cover the important use case of hand-held mobile devices while also providing high-resolution DSLR camera images. We make our datasets and an online evaluation server available at http://www.eth3d.net.
Conference Paper
Full-text available
This work presents a Multi-View Stereo system for robust and efficient dense modeling from unstructured image collections. Our core contributions are the joint estimation of depth and normal information, pixelwise view selection using photometric and geometric priors, and a multi-view geometric consistency term for the simultaneous refinement and image-based depth and normal fusion. Experiments on benchmarks and large-scale Internet photo collections demonstrate state-of-the-art performance in terms of accuracy, completeness, and efficiency.
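The fusion stage in systems of this kind relies on cross-view geometric agreement. A minimal sketch of such a per-point consistency test is given below, assuming each neighbouring view provides its intrinsics K, pose (R, t), a depth map and a camera-frame normal map; the thresholds and the exact form of the test are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def geometrically_consistent(p_world, n_world, K, R, t, depth_map, normal_map,
                             max_rel_depth=0.01, max_normal_deg=10.0):
    # Check whether a fused 3D point/normal agrees with one neighbouring view.
    p_cam = R @ p_world + t
    if p_cam[2] <= 0:                       # behind the camera
        return False
    u, v = (K @ p_cam)[:2] / p_cam[2]       # project into the neighbouring image
    ui, vi = int(round(u)), int(round(v))
    h, w = depth_map.shape
    if not (0 <= ui < w and 0 <= vi < h):
        return False
    d = depth_map[vi, ui]
    if d <= 0:                              # no depth estimate at that pixel
        return False
    depth_ok = abs(p_cam[2] - d) / d < max_rel_depth
    n_view = R.T @ normal_map[vi, ui]       # bring the view's normal to the world frame
    angle = np.degrees(np.arccos(np.clip(np.dot(n_world, n_view), -1.0, 1.0)))
    return depth_ok and angle < max_normal_deg
```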
Conference Paper
Full-text available
We present a new 3D reconstruction pipeline for digital preservation of natural and cultural assets. This application requires high quality results, making time and space constraints less important than the achievable precision. Besides the high quality models generated, our work allows an overview of the entire reconstruction process, from range image acquisition to texture generation. Several contributions are shown, which improve the overall quality of the obtained 3D models. We also identify and discuss many practical problems found during the pipeline implementation. Our objective is to help future work by other researchers facing the challenge of creating accurate 3D models of real objects.
Conference Paper
Full-text available
Depth-map merging approaches have become more and more popular in multi-view stereo (MVS) because of their flexibility and superior performance. The quality of the depth maps used for merging is vital for accurate 3D reconstruction. While traditional depth map estimation has been performed in a discrete manner, we suggest the use of a continuous counterpart. In this paper, we first integrate silhouette information and the epipolar constraint into a variational method for continuous depth map estimation. Then, several depth candidates are generated based on a multiple starting scales (MSS) framework. From these candidates, refined depth maps for each view are synthesized according to a patch-based NCC (normalized cross correlation) metric. Finally, the multi-view depth maps are merged to produce 3D models. Our algorithm excels at capturing detail and produces one of the most accurate results among current algorithms on sparse MVS datasets according to the Middlebury benchmark. Additionally, our approach shows outstanding robustness and accuracy in the free-viewpoint video scenario.
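As an illustration of NCC-based candidate selection, a minimal sketch follows; `warp_patch` is a hypothetical helper that samples the neighbouring image at the pixel locations induced by a hypothesized depth, and is not part of the paper.

```python
import numpy as np

def ncc(patch_a, patch_b, eps=1e-8):
    # Normalized cross correlation between two equally sized image patches.
    a = patch_a.astype(np.float64).ravel()
    b = patch_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def best_depth_candidate(candidate_depths, ref_patch, warp_patch):
    # Keep the depth hypothesis whose warped neighbour patch matches the
    # reference patch best under NCC.
    scores = [ncc(ref_patch, warp_patch(d)) for d in candidate_depths]
    return candidate_depths[int(np.argmax(scores))]
```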
Conference Paper
Full-text available
We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (ICP) algorithm, which uses all of the observed depth data available. We demonstrate the advantages of tracking against the growing full surface model compared with frame-to-frame tracking, obtaining tracking and mapping results in constant time within room sized scenes with limited drift and high accuracy. We also show both qualitative and quantitative results relating to various aspects of our tracking and mapping system. Modelling of natural scenes, in real-time with only commodity sensor and GPU hardware, promises an exciting step forward in augmented reality (AR); in particular, it allows dense surfaces to be reconstructed in real-time, with a level of detail and robustness beyond any solution yet presented using passive computer vision.
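The fusion of streamed depth frames into a single implicit surface is commonly realized as a truncated signed distance function (TSDF) volume updated by a weighted running average. Below is a simplified sketch of fusing one depth frame under several assumptions (nearest-neighbour projection, a unit per-observation weight, a world-to-camera pose `T_cam_from_world`); it illustrates the idea rather than reproducing the KinectFusion implementation.

```python
import numpy as np

def fuse_depth_frame(tsdf, weights, depth, K, T_cam_from_world,
                     voxel_origin, voxel_size, trunc=0.03, max_weight=64.0):
    # Integrate one metric depth image (H, W) into TSDF/weight volumes of shape (X, Y, Z).
    X, Y, Z = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij")
    pts_w = voxel_origin + voxel_size * np.stack([ii, jj, kk], -1).reshape(-1, 3)
    R, t = T_cam_from_world[:3, :3], T_cam_from_world[:3, 3]
    pts_c = pts_w @ R.T + t                          # voxel centres in the camera frame
    z = pts_c[:, 2]
    z_safe = np.where(z > 0, z, 1.0)                 # avoid division warnings; masked below
    u = np.round(K[0, 0] * pts_c[:, 0] / z_safe + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * pts_c[:, 1] / z_safe + K[1, 2]).astype(int)
    H, W = depth.shape
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d_obs = np.zeros_like(z)
    d_obs[valid] = depth[v[valid], u[valid]]
    sdf = d_obs - z                                  # signed distance along the viewing ray
    valid &= (d_obs > 0) & (sdf > -trunc)            # drop voxels far behind the surface
    f_obs = np.clip(sdf / trunc, -1.0, 1.0)
    f, w = tsdf.reshape(-1), weights.reshape(-1)
    # Weighted running average with a per-observation weight of 1.
    f[valid] = (w[valid] * f[valid] + f_obs[valid]) / (w[valid] + 1.0)
    w[valid] = np.minimum(w[valid] + 1.0, max_weight)
    return f.reshape(X, Y, Z), w.reshape(X, Y, Z)
```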
Conference Paper
Full-text available
We present a viewpoint-based approach for the quick fusion of multiple stereo depth maps. Our method selects, for each pixel, depth estimates that minimize violations of visibility constraints, thus removing errors and inconsistencies from the depth maps to produce a consistent surface. We advocate a two-stage process in which the first stage generates potentially noisy, overlapping depth maps from a set of calibrated images and the second stage fuses these depth maps to obtain an integrated surface with higher accuracy, suppressed noise, and reduced redundancy. We show that by dividing the processing into two stages we are able to achieve a very high throughput, because we are able to use a computationally cheap stereo algorithm and because this architecture is amenable to hardware-accelerated (GPU) implementations. A rigorous formulation based on the notion of stability of a depth estimate is presented first. It aims to determine the validity of a depth estimate by rendering multiple depth maps into the reference view as well as rendering the reference depth map into the other views in order to detect occlusions and free-space violations. We also present an approximate alternative formulation that selects and validates only one hypothesis based on confidence. Both formulations enable us to perform video-based reconstruction at up to 25 frames per second. We show results on the Multi-View Stereo Evaluation benchmark datasets and several outdoor video sequences. Extensive quantitative analysis is performed using an accurately surveyed model of a real building as ground truth.
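A heavily simplified, one-directional reading of such a stability test might look as follows; the full formulation also renders the reference depth map into the other views, so this is only an illustrative sketch, not the authors' algorithm.

```python
def stability_score(ref_depth, rendered_depths, tol=0.01):
    # Score one reference-pixel depth hypothesis against the other views' depth
    # maps rendered into the reference view. A rendered depth clearly in front of
    # the hypothesis conflicts with it (one of the two surfaces lies in space the
    # other camera observed as empty); one within the tolerance supports it.
    violations = sum(1 for d in rendered_depths
                     if d > 0 and d < ref_depth * (1.0 - tol))
    support = sum(1 for d in rendered_depths
                  if d > 0 and abs(d - ref_depth) <= ref_depth * tol)
    return support - violations   # keep the estimate only if the score is non-negative
```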
Article
Full-text available
We present a new approach for large scale multi-view stereo matching, which is designed to operate on ultra high resolution image sets and efficiently compute dense 3D point clouds. We show that, by using a robust descriptor for matching purposes and high resolution images, we can skip the computationally expensive steps other algorithms require. As a result, our method has low memory requirements and low computational complexity while producing 3D point clouds containing virtually no outliers. This makes it exceedingly suitable for large scale reconstruction. The core of our algorithm is the dense matching of image pairs using DAISY descriptors, implemented so as to eliminate redundancies and optimize memory access. We use a variety of challenging data sets to validate and compare our results against other algorithms.
Article
Full-text available
This paper proposes a novel algorithm for multiview stereopsis that outputs a dense set of small rectangular patches covering the surfaces visible in the images. Stereopsis is implemented as a match, expand, and filter procedure, starting from a sparse set of matched keypoints, and repeatedly expanding these before using visibility constraints to filter away false matches. The keys to the performance of the proposed algorithm are effective techniques for enforcing local photometric consistency and global visibility constraints. Simple but effective methods are also proposed to turn the resulting patch model into a mesh which can be further refined by an algorithm that enforces both photometric consistency and regularization constraints. The proposed approach automatically detects and discards outliers and obstacles and does not require any initialization in the form of a visual hull, a bounding box, or valid depth ranges. We have tested our algorithm on various data sets including objects with fine surface details, deep concavities, and thin structures, outdoor scenes observed from a restricted set of viewpoints, and "crowded" scenes where moving obstacles appear in front of a static structure of interest. A quantitative evaluation on the Middlebury benchmark shows that the proposed method outperforms all others submitted so far for four out of the six data sets.
Article
Full-text available
In this paper we want to start the discussion on whether image-based 3-D modelling techniques can possibly be used to replace LIDAR systems for outdoor 3D data acquisition. Two main issues have to be addressed in this context: (i) camera calibration (internal and external) and (ii) dense multi-view stereo. To investigate both, we have acquired test data from outdoor scenes with both LIDAR and cameras. Using the LIDAR data as reference we estimated the ground-truth for several scenes. Evaluation sets are prepared to evaluate different aspects of 3D model building. These are: (i) pose estimation and multi-view stereo with known internal camera parameters; (ii) camera calibration and multi-view stereo with the raw images as the only input and (iii) multi-view stereo.
Article
In this paper, we propose efficient multi-view stereo methods for accurate and complete depth map estimation. We first present our basic methods with Adaptive Checkerboard sampling and Multi-Hypothesis joint view selection (ACMH & ACMH+). Based on these basic models, we develop two frameworks to deal with depth estimation in ambiguous regions (especially low-textured areas) from two different perspectives: multi-scale information fusion and planar geometric clue assistance. For the former, we propose a multi-scale geometric consistency guidance framework (ACMM) to obtain reliable depth estimates for low-textured areas at coarser scales and guarantee that they can be propagated to finer scales. For the latter, we propose a planar prior assisted framework (ACMP), in which a probabilistic graphical model contributes a novel multi-view aggregated matching cost. Finally, by taking advantage of the above frameworks, we further design a multi-scale geometric consistency guided and planar prior assisted multi-view stereo method (ACMMP). This greatly enhances the discrimination of ambiguous regions and helps their depth estimation. Experiments on extensive datasets show that our methods achieve state-of-the-art performance, recovering depth estimates not only in low-textured areas but also in fine details. Related code is available at https://github.com/GhiXu.
Article
While Structure from Motion (SfM) has achieved great success in 3D reconstruction, it still faces challenges on large-scale scenes. Incremental SfM approaches are robust to outliers but are limited by low efficiency and are prone to drift. Global SfM methods are more efficient than incremental approaches, but they are sensitive to outliers and also run into memory limitations and time bottlenecks. In this work, large-scale SfM is cast as a graph problem, with graphs constructed both in the image clustering step and in the step that merges local reconstructions. By leveraging the graph structure, we are able to handle large-scale datasets in a divide-and-conquer manner. First, images are modelled as graph nodes, with edges derived from the geometric information obtained after feature matching. The images are then divided into independent clusters by an image clustering algorithm; in a subsequent subgraph expansion step, the connectivity and completeness of the scene are enhanced by walking along a maximum spanning tree, which is used to construct overlapping images between clusters. Second, the image clusters are distributed to servers to run SfM in parallel. Third, after the local reconstructions are complete, we construct a minimum spanning tree to find accurate similarity transformations. The minimum spanning tree is then transformed into a minimum height tree to find a proper anchor node and is further used to prevent error accumulation. We evaluate our approach on various kinds of datasets, and it shows superiority over the state-of-the-art in accuracy and efficiency. Our algorithm is open-sourced at https://github.com/AIBluefisher/GraphSfM.
Article
Common depth image fusion methods use each original image as a reference plane and fuse the depth images via mutual projection. These methods can eliminate inconsistencies between the depth images, but they cannot alleviate point cloud redundancy and computational complexity. This article proposes a virtual view method for depth image fusion that defines a limited number of virtual views by means of view clustering, reducing redundant calculations while covering the scene as completely as possible. The depth images are merged ray by ray, and a reliable depth value is obtained via an F-test. Compared with the modified semiglobal matching (TSGM) dense stereo matching algorithm, the accuracy is improved by approximately 50% and the roughness is improved by approximately 50%. Compared with the classic surface reconstruction (SURE) fusion algorithm, more depth values are fused in each ray, and the accuracy and roughness are slightly improved. In addition, the proposed algorithm greatly reduces the number of reference planes.
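A rough, purely illustrative sketch of ray-by-ray fusion in this spirit is shown below; the grouping of candidates and the exact statistic are assumptions, not the article's actual test.

```python
import numpy as np
from scipy import stats

def fuse_ray(candidate_depths, alpha=0.05):
    # Fuse the depth candidates that the source views project onto one virtual-view ray.
    # Splits the candidates into two interleaved groups and runs a two-sided F-test on
    # their variances as a crude consistency check before averaging.
    d = np.asarray(candidate_depths, dtype=float)
    if d.size < 4:
        return float(np.median(d)) if d.size else None
    a, b = d[::2], d[1::2]
    F = np.var(a, ddof=1) / max(np.var(b, ddof=1), 1e-12)
    p = 2.0 * min(stats.f.cdf(F, a.size - 1, b.size - 1),
                  stats.f.sf(F, a.size - 1, b.size - 1))
    return float(d.mean()) if p > alpha else None   # inconsistent ray -> no fused depth
```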
Article
Multi-view stereo (MVS) plays a critical role in many practically important vision applications. Among existing MVS methods, one typical approach is to fuse the depth maps from different views via minimization of an energy functional. However, these methods usually have an expensive computational cost and are inflexible to extend to large neighborhoods, leading to long run times and reconstruction artifacts. In this work, we propose a simple, efficient and flexible depth-map-fusion-based MVS reconstruction method: CoD-Fusion. The core idea of the method is to minimize the anisotropic or isotropic TV+L1 energy functional using the coordinate descent (CoD) algorithm. CoD performs TV+L1 minimization by solving a series of voxel-wise L1 minimization sub-problems, which can be solved efficiently using fast weighted median filtering (WMF). We then extend WMF to larger neighborhoods to suppress reconstruction artifacts. The results of quantitative and qualitative evaluation validate the flexibility and efficiency of CoD-Fusion as a promising method for large-scale MVS reconstruction.
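The connection exploited here is that, for values v_i with weights w_i, the weighted median minimizes sum_i w_i * |x - v_i|, which is exactly the form of each voxel-wise L1 sub-problem. A minimal sketch (not the paper's implementation):

```python
import numpy as np

def weighted_median(values, weights):
    # The weighted median minimizes sum_i w_i * |x - v_i|, i.e. the voxel-wise
    # L1 sub-problem that each coordinate-descent step has to solve.
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cum = np.cumsum(w)
    return float(v[np.searchsorted(cum, 0.5 * cum[-1])])
```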
Conference Paper
Ray tracing and rasterization have long been considered as two fundamentally different approaches to rendering images of 3D scenes, although they compute the same results for primary rays. Rasterization projects every triangle onto the image plane and enumerates all covered pixels in 2D, while ray tracing operates in 3D by generating rays through every pixel and then finding the closest intersection with a triangle. In this paper we introduce a new view on the two approaches: based on the Plücker ray-triangle intersection test, we define 3D triangle edge functions, resembling (homogeneous) 2D edge functions. Then both approaches become identical with respect to coverage computation for image samples (or primary rays). This generalized "3D rasterization" perspective enables us to exchange concepts between both approaches: we can avoid applying any model or view transformation by instead transforming the sample generator, and we can also eliminate the need for perspective division and render directly to non-planar viewports. While ray tracing typically uses floating point with its intrinsic numerical issues, we show that it can be implemented with the same consistency rules as 2D rasterization. With 3D rasterization the only remaining differences between the two approaches are the scene traversal and the enumeration of potentially covered samples on the image plane (binning). 3D rasterization allows us to explore the design space between traditional rasterization and ray casting in a formalized manner. We discuss performance/cost trade-offs and evaluate different implementations and compare 3D rasterization to traditional ray tracing and 2D rasterization.
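As a concrete illustration of the 3D edge-function idea, the following sketch implements the Plücker-based sign test for whether a ray covers a triangle; it is the textbook formulation of that test, not the paper's renderer, and it checks the infinite line through the ray rather than clipping to positive ray parameters.

```python
import numpy as np

def plucker(a, b):
    # Plücker coordinates (direction, moment) of the oriented line through a -> b.
    a, b = np.asarray(a, float), np.asarray(b, float)
    return b - a, np.cross(a, b)

def side(l1, l2):
    # Permuted inner product of two lines; its sign encodes their relative orientation.
    d1, m1 = l1
    d2, m2 = l2
    return float(np.dot(d1, m2) + np.dot(d2, m1))

def ray_covers_triangle(origin, direction, v0, v1, v2, eps=1e-12):
    # The ray covers the triangle iff its signed tests against the three
    # consistently oriented edges (v0->v1, v1->v2, v2->v0) all agree.
    origin, direction = np.asarray(origin, float), np.asarray(direction, float)
    ray = (direction, np.cross(origin, direction))
    s = [side(ray, plucker(v0, v1)),
         side(ray, plucker(v1, v2)),
         side(ray, plucker(v2, v0))]
    return all(x >= -eps for x in s) or all(x <= eps for x in s)
```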
Article
In this paper we propose a depth-map merging based Multiple View Stereo method for large-scale scenes which takes both accuracy and efficiency into account. In the proposed method, an efficient patch-based stereo matching process is used to generate a depth map for each image with acceptable errors, followed by a depth-map refinement process to enforce consistency over neighboring views. Compared to state-of-the-art methods, the proposed method can reconstruct quite accurate and dense point clouds with high computational efficiency. Besides, the proposed method can easily be parallelized at the image level, i.e., each depth map is computed individually, which makes it suitable for large-scale scene reconstruction with high-resolution images. The accuracy and efficiency of the proposed method are evaluated quantitatively on benchmark data and qualitatively on large datasets.
Conference Paper
We present a new formulation of multi-view stereo that treats the problem as probabilistic 3D segmentation. Previous work has used the stereo photo-consistency criterion as a detector of the boundary between the 3D scene and the surrounding empty space. Here we show how the same criterion can also provide a foreground/background model that can predict whether a 3D location is inside or outside the scene. This model replaces the commonly used naive foreground model based on ballooning, which is known to perform poorly in concavities. We demonstrate how the probabilistic visibility is linked to previous work on depth-map fusion, and we present a multi-resolution graph-cut implementation using the new ballooning term that is very efficient both in terms of computation time and memory requirements.
Article
This paper proposes a quasi-dense approach to 3D surface model acquisition from uncalibrated images. First, correspondence information and geometry are computed based on new quasi-dense point features that are resampled subpixel points from a disparity map. The quasi-dense approach gives more robust and accurate geometry estimations than the standard sparse approach. The robustness is measured as the success rate of fully automatic geometry estimation with all involved parameters fixed. The accuracy is measured by a fast gauge-free uncertainty estimation algorithm. The quasi-dense approach also works for more widely separated images than the sparse approach and therefore requires fewer images for modeling. More importantly, the quasi-dense approach delivers a high density of reconstructed 3D points on which a surface representation can be reconstructed. This fills the gap left by the sparse approach, whose point density is insufficient for the surface reconstruction essential for modeling and visualization applications. Second, surface reconstruction methods from the given quasi-dense geometry are also developed. The algorithm optimizes new unified functionals integrating both 3D quasi-dense points and 2D image information, including silhouettes. Combining both 3D data and 2D images is more robust than existing methods using only 2D information or only 3D data. An efficient bounded regularization method is proposed to implement the surface evolution by level-set methods. Its properties are discussed and proven for some cases. As a whole, a complete, automatic and practical system of 3D modeling from raw images captured by hand-held cameras to surface representation is proposed. Extensive experiments demonstrate the superior performance of the quasi-dense approach with respect to the standard sparse approach in robustness, accuracy, and applicability.
Conference Paper
We present a multi-view stereo algorithm that addresses the extreme changes in lighting, scale, clutter, and other effects in large online community photo collections. Our idea is to intelligently choose images to match, both at a per-view and per-pixel level. We show that such adaptive view selection enables robust performance even with dramatic appearance variability. The stereo matching technique takes as input sparse 3D points reconstructed from structure-from-motion methods and iteratively grows surfaces from these points. Optimizing for surface normals within a photoconsistency measure significantly improves the matching results. While the focus of our approach is to estimate high-quality depth maps, we also show examples of merging the resulting depth maps into compelling scene reconstructions. We demonstrate our algorithm on standard multi-view stereo datasets and on casually acquired photo collections of famous scenes gathered from the Internet.
Article
A number of techniques have been developed for reconstructing surfaces by integrating groups of aligned range images. A desirable set of properties for such algorithms includes: incremental updating, representation of directional uncertainty, the ability to fill gaps in the reconstruction, and robustness in the presence of outliers. Prior algorithms possess subsets of these properties. In this paper, we present a volumetric method for integrating range images that possesses all of these properties.
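For reference, the incremental weighted update underlying this style of volumetric integration (also used in the TSDF sketch earlier in this list) is the standard cumulative rule, where d_{i+1}(x) and w_{i+1}(x) are the signed distance and weight contributed by the new range image at location x:

```latex
D_{i+1}(x) = \frac{W_i(x)\, D_i(x) + w_{i+1}(x)\, d_{i+1}(x)}{W_i(x) + w_{i+1}(x)},
\qquad
W_{i+1}(x) = W_i(x) + w_{i+1}(x).
```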