Fig 1 - uploaded by Djamila Aouada
Undefined pixels using state-of-the-art SR methods (red indicates the closest objects, green the farthest).

Source publication
Conference Paper
Full-text available
We use multi-frame super-resolution, specifically, Shift & Add, to increase the resolution of depth data. In order to be able to deploy such a framework in practice, without requiring a very high number of observed low resolution frames, we improve the initial estimation of the high resolution frame. To that end, we propose a new data model that le...

Contexts in source publication

Context 1
... Γ(x) is a regularization term added to compensate for undetermined cases by enforcing prior information about x, with λ being the regularization parameter. Starting with an accurate initial guess z0 has a strong impact on the final solution of (6). We show the effect of undefined pixels in Z0 caused by classical S&A in Fig. 1(b). A similar phenomenon is observed using interpolation-based initialization such as variational Bayesian SR (VBSR) [9], as seen in Fig. 1(c), suggesting that interpolation is not sufficient to remove undefined pixels. Moreover, it creates additional artifacts on depth data, such as jagged values on edges. It is common to face ...
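The minimization (6) referenced in the snippet is not reproduced on this page. As a hedged sketch, the standard multi-frame SR cost (in the common MAP formulation of Farsiu et al.; the paper's exact operators and norms may differ) has this general shape:

```latex
\hat{x} = \arg\min_{x} \; \sum_{k=1}^{N} \left\| \mathbf{D}\,\mathbf{H}\,\mathbf{F}_k\, x - y_k \right\|_1 \; + \; \lambda\, \Gamma(x)
```

where the $y_k$ are the observed LR frames, $\mathbf{F}_k$ the registration warps, $\mathbf{H}$ the camera blur, and $\mathbf{D}$ the downsampling operator.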
Context 2
... and λ being the regularization parameter. Starting with an accurate initial guess z0 has a strong impact on the final solution of (6). We show the effect of undefined pixels in Z0 caused by classical S&A in Fig. 1(b). A similar phenomenon is observed using interpolation-based initialization such as variational Bayesian SR (VBSR) [9], as seen in Fig. 1(c), suggesting that interpolation is not sufficient to remove undefined pixels. Moreover, it creates additional artifacts on depth data, such as jagged values on edges. It is common to face this serious problem of undefined pixels in practice. It is dealt with by restricting the SR factor to low values, e.g., r = 2, and by ...
Context 3
... non-zero initialization in (9) relaxes the condition in (5), thus solving the problem of undefined pixels. In order not to suffer from the same artifacts as those present with interpolation-based SR approaches, e.g., VBSR (Fig. 1(c)), it is necessary to perform the filling operation from registered and clustered LR images as in (4). Indeed, the values from LR frames remain more reliable sources of information than the ones due to upsampling. They are further processed by a (3 × 3) median filtering to smooth out noisy depth pixels. We point out that the higher ...
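The initialization described above (fill from registered LR frames, then a 3×3 median filter) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the NaN convention for undefined pixels, and the global-median fallback are assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

def initial_estimate(registered_lr_stack):
    """Hypothetical sketch of a median-based initial HR estimate.

    `registered_lr_stack` is a (frames, H, W) array of registered,
    upsampled LR depth frames; NaN marks pixels with no LR observation
    (the "undefined pixel" problem discussed above).
    """
    # Pixel-wise median across the temporal stack of registered frames.
    z0 = np.nanmedian(registered_lr_stack, axis=0)
    # Fill any pixels still undefined with the global median
    # (illustrative only -- the paper fills from clustered LR frames).
    z0 = np.where(np.isnan(z0), np.nanmedian(z0), z0)
    # (3 x 3) median filtering to smooth out noisy depth pixels.
    return median_filter(z0, size=3)
```

Because every pixel gets a value before deblurring, this initialization avoids the undefined-pixel artifacts shown in Fig. 1(b)-(c).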

Similar publications

Conference Paper
Full-text available
The analysis of dynamic scenes in video is a very useful task especially for the detection and monitoring of natural hazards such as floods and fires. In this work, we focus on the challenging problem of real-world dynamic scene understanding, where videos contain dynamic textures that have been recorded in the "wild". These videos feature large il...

Citations

... The above framework was first proposed in the case of static 2D scenes in [17] and for static depth scenes in [44]. In [8] and [10] it was extended to dynamic depth scenes, defining the UP-SR algorithm. ...
Article
Full-text available
We propose a novel approach for enhancing depth videos containing non-rigidly deforming objects. Depth sensors are capable of capturing depth maps in real-time but suffer from high noise levels and low spatial resolutions. While solutions for reconstructing 3D details in static scenes, or in scenes with rigid global motions, have recently been proposed, handling unconstrained non-rigid deformations in relatively complex scenes remains a challenge. Our solution consists in a recursive dynamic multi-frame super-resolution algorithm where the relative local 3D motions between consecutive frames are directly accounted for. We rely on the assumption that these 3D motions can be decoupled into lateral motions and radial displacements. This allows performing a simple local per-pixel tracking where both depth measurements and deformations are dynamically optimized. The geometric smoothness is subsequently added using a multi-level L1 minimization with a bilateral total variation regularization. The performance of this method is thoroughly evaluated on both real and synthetic data. As compared to alternative approaches, the results show a clear improvement in reconstruction accuracy and in robustness to noise, to relatively large non-rigid deformations, and to topological changes.
... In this work, we adopt Maximum A Posteriori (MAP) estimation using the robust bilateral total variation (BTV) as a regularization term as defined in [19]. This choice is motivated by the fact that the properties of a bilateral filter, namely, noise reduction while preserving edges, are now established as appropriate for depth data processing [12,32,33]. The BTV regularization is defined as follows: ...
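The snippet cuts off before the definition. In its commonly cited form (due to Farsiu et al.; the referenced paper may use slightly different notation), the BTV regularizer is:

```latex
\Gamma_{\mathrm{BTV}}(x) \;=\; \sum_{l=-P}^{P} \sum_{m=-P}^{P} \alpha^{|m|+|l|} \left\| x - S_x^{l} S_y^{m} x \right\|_1
```

where $S_x^{l}$ and $S_y^{m}$ shift the image $x$ by $l$ and $m$ pixels in the horizontal and vertical directions, $P$ bounds the shift window, and $0 < \alpha < 1$ applies a spatial decay to the shift terms.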
Article
Full-text available
Multi-frame super-resolution is the process of recovering a high resolution image or video from a set of captured low resolution images. Super-resolution approaches have been largely explored in 2-D imaging. However, their extension to depth videos is not straightforward due to the textureless nature of depth data, and to their high frequency contents coupled with fast motion artifacts. Recently, a few attempts have been introduced, addressing only the super-resolution of static depth scenes. In this work, we propose to enhance the resolution of dynamic depth videos with non-rigidly moving objects. The proposed approach is based on a new data model that uses densely upsampled, and cumulatively registered versions of the observed low resolution depth frames. We show the impact of upsampling in increasing the sub-pixel accuracy and reducing the rounding error of the motion vectors. Furthermore, with the proposed cumulative motion estimation, a high registration accuracy is achieved between non-successive upsampled frames with relatively large motions. A statistical performance analysis is derived in terms of mean square error explaining the effect of the number of observed frames and the effect of the super-resolution factor at a given noise level. We evaluate the accuracy of the proposed algorithm theoretically and experimentally as a function of the SR factor, and the level of contamination with noise. Experimental results on both real and synthetic data show the effectiveness of the proposed algorithm on dynamic depth videos as compared to state-of-the-art methods.
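The claim that upsampling reduces the rounding error of the motion vectors can be illustrated with a toy worst-case computation (this is an illustration of the quantization argument only, not the paper's derivation; the function name is hypothetical):

```python
import numpy as np

def rounding_error(true_shift, r):
    """Quantization error when a sub-pixel shift is estimated on a grid
    upsampled by factor r: the shift is rounded to a multiple of 1/r,
    so the worst-case rounding error shrinks from 0.5 to 0.5/r pixels."""
    quantized = np.round(true_shift * r) / r
    return abs(true_shift - quantized)

# A sub-pixel shift of 0.3 LR pixels:
#   on the LR grid (r = 1) it rounds to 0     -> error 0.3
#   on a 4x upsampled grid it rounds to 0.25  -> error 0.05
```

This is why registering densely upsampled frames, as the data model above does, systematically improves sub-pixel registration accuracy.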
... This work has been published in [38] and [39] and some extended parts are under review in [40]. ...
... In this work, we adopt Maximum A Posteriori (MAP) estimation using the robust bilateral total variation (BTV) as a regularization term as defined in [7]. This choice is motivated by the fact that the properties of a bilateral filter, namely, noise reduction while preserving edges, are now established as appropriate for depth data processing [9,27,38]. The BTV regularization is defined as follows: ...
Thesis
Full-text available
Sensing using 3D technologies has seen a revolution in recent years, with cost-effective depth sensors now part of accessible consumer electronics. Their ability to directly capture depth videos in real-time has opened tremendous possibilities for multiple applications in computer vision. These sensors, however, have major shortcomings due to their high noise contamination, including missing and jagged measurements, and their low spatial resolutions. In order to extract detailed 3D features from this type of data, a dedicated data enhancement is required. We propose a generic depth multi-frame super-resolution framework that addresses the limitations of state-of-the-art depth enhancement approaches. The proposed framework does not need any additional hardware or coupling with different modalities. It is based on a new data model that uses densely upsampled low resolution observations. This results in a robust median initial estimation, further refined by a deblurring operation using a bilateral total variation as the regularization term. The upsampling operation ensures a systematic improvement in the registration accuracy. This is explored in different scenarios based on the motions involved in the depth video. For the general and most challenging case of objects deforming non-rigidly in full 3D, we propose a recursive dynamic multi-frame super-resolution algorithm where the relative local 3D motions between consecutive frames are directly accounted for. We rely on the assumption that these 3D motions can be decoupled into lateral motions and radial displacements. This allows performing a simple local per-pixel tracking where both depth measurements and deformations are optimized. As compared to alternative approaches, the results show a clear improvement in reconstruction accuracy and in robustness to noise, to relatively large non-rigid deformations, and to topological changes.
Moreover, the proposed approach, implemented on a CPU, is shown to be computationally efficient, running in real-time.
... lan and P. Kiran, 2003; D. Robinson and P. Milanfar, 2006). Those, however, do not consider the bias of an SR estimator despite it being always part of an image reconstruction solution (P. Chatterjee and P. Milanfar, 2009). Moreover, they assume a Gaussian noise model while UP-SR exploits an additive Laplace noise model. Recently, Al Ismaeil et al. (K. Al Ismaeil, D. Aouada, B. Mirbach, and B. Ottersten, 2013a) proposed a new multi-frame SR approach for the enhancement of static depth scenes captured with these cameras. In (K. Al Ismaeil, D. Aouada, B. Mirbach, and B. Ottersten, 2013b), they extended this work to dynamic depth scenes subject to local motions, i.e., scenes containing one or more moving objects. This algorithm is referred to a ...
... as Upsampling for Precise Super-Resolution (UP-SR). It is based on upsampling the observed LR frames prior to their registration. This has led to rewriting the general SR data model to a simplified image denoising problem from multiple noisy and blurred observations. The denoising is then achieved using a Maximum Likelihood (ML) approach. In both (K. Al Ismaeil, D. Aouada, B. Mirbach, and B. Ottersten, 2013a) and (K. Al Ismaeil, D. Aouada, B. Mirbach, and B. Ottersten, 2013b) the performance of UP-SR was characterized experimentally. In this paper, in order to reach a better understanding of this algorithm, and to separate the effect of the number of frames and the effect of the SR factor, we derive its performance in terms of mean square e ...
Conference Paper
Full-text available
All existing methods for the statistical analysis of super-resolution approaches have stopped at the variance term, not accounting for the bias. In this paper we give an original derivation of the bias term. We propose to use a patch-based method inspired by the work of (P. Chatterjee and P. Milanfar, 2009). Our approach, however, is completely new as we derive a new affine bias model dedicated to the multi-frame super-resolution framework. We apply the proposed statistical performance analysis to the Upsampling for Precise Super-Resolution (UP-SR) algorithm. This algorithm was shown experimentally to be a good solution for enhancing the resolution of depth sequences in both cases of global and local motions. Its performance is herein analyzed theoretically in terms of its approximated mean square error, using the proposed derivation of the bias. This analysis is validated experimentally on simulated static and dynamic depth sequences with a known ground truth. This provides an insightful understanding of the effects of noise variance, number of observed low resolution frames, and super-resolution factor on the final and intermediate performance of UP-SR. Our conclusion is that increasing the number of frames should improve the performance, while the error is increased due to local motions and to the upsampling which is part of UP-SR.
... tic limits of SR (Rajagopalan and Kiran, 2003; Robinson and Milanfar, 2006). Those, however, do not consider the bias of an SR estimator despite it being always part of an image reconstruction solution (Chatterjee and Milanfar, 2009). Moreover, they assume a Gaussian noise model while UP-SR exploits an additive Laplace noise model. Recently, Al Ismaeil et al. (K. Al Ismaeil, 2013a) proposed a new multi-frame SR approach for the enhancement of static depth scenes captured with these cameras. In (K. Al Ismaeil, 2013b), the authors have extended this work to dynamic depth scenes subject to local motions, i.e., scenes containing one or more moving objects. This algorithm is referred to as Upsampling for Precise Su ...
Conference Paper
Full-text available
All existing methods for the statistical analysis of super-resolution approaches have stopped at the variance term, not accounting for the bias in the mean square error. In this paper we give an original derivation of the bias term. We propose to use a patch-based method inspired by the work of (Chatterjee and Milanfar, 2009). Our approach, however, is completely new as we derive a new affine bias model dedicated to the multi-frame super-resolution framework. We apply the proposed statistical performance analysis to the Upsampling for Precise Super-Resolution (UP-SR) algorithm. This algorithm was shown experimentally to be a good solution for enhancing the resolution of depth sequences in both cases of global and local motions. Its performance is herein analyzed theoretically in terms of its approximated mean square error, using the proposed derivation of the bias. This analysis is validated experimentally on simulated static and dynamic depth sequences with a known ground truth. This provides an insightful understanding of the effects of noise variance, number of observed low resolution frames, and super-resolution factor on the final and intermediate performance of UP-SR. Our conclusion is that increasing the number of frames should improve the performance, while the error is increased due to local motions and to the upsampling which is part of UP-SR.
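The bias-plus-variance structure of the MSE that this analysis builds on can be checked numerically with a Monte-Carlo sketch (illustrative only; the paper derives an analytical affine bias model rather than estimating it from samples):

```python
import numpy as np

def mse_decomposition(estimates, ground_truth):
    """Split the empirical MSE of an estimator into squared bias and
    variance. `estimates` is a (runs, H, W) stack of reconstructions
    of the same ground truth; the identity MSE = bias^2 + variance
    holds exactly for the population (ddof=0) variance."""
    mean_est = estimates.mean(axis=0)
    bias_sq = ((mean_est - ground_truth) ** 2).mean()
    variance = estimates.var(axis=0).mean()
    mse = ((estimates - ground_truth) ** 2).mean()
    return mse, bias_sq, variance
```

Stopping at the variance term, as the prior analyses cited above do, drops the `bias_sq` contribution entirely, which is the gap this paper's bias derivation closes.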
... Recently, Berretti et al. proposed to use SR on facial depth images once back-projected in 3-D, and defined the superfaces approach [9]. The SR algorithm they deployed is similar in principle to the initial blurred estimate provided in the enhanced Shift & Add algorithm proposed by Al Ismaeil et al. in [7]. Later on, this work was extended to the dynamic case where the considered multiple realizations were ordered frames constituting a video sequence [8]. ...
Conference Paper
Full-text available
We address the limitation of low resolution depth cameras in the context of face recognition. Considering a face as a surface in 3-D, we reformulate the recently proposed Upsampling for Precise Super-Resolution algorithm as a new approach on three-dimensional points. This reformulation allows an efficient implementation, and leads to a largely enhanced 3-D face reconstruction. Moreover, combined with a dedicated face detection and representation pipeline, the proposed method provides an improved face recognition system using low resolution depth cameras. We show experimentally that this system increases the face recognition rate as compared to directly using the low resolution raw data.
Article
Guided depth map enhancement based on Markov Random Field (MRF) normally assumes edge consistency between the color image and the corresponding depth map. Under this assumption, the low-quality depth edges can be refined according to the guidance from the high-quality color image. However, such consistency is not always true, which leads to texture-copying artifacts and blurred depth edges. In addition, the previous MRF-based models always calculate the guidance affinities in the regularization term via a non-structural scheme which ignores the local structure on the depth map. In this paper, a novel MRF-based method is proposed. It computes these affinities via the distance between pixels in a space consisting of Minimum Spanning Trees (a forest) to better preserve depth edges. Furthermore, inside each Minimum Spanning Tree, the weights of edges are computed based on an explicit edge inconsistency measurement model, which significantly mitigates texture-copying artifacts. To further tolerate the effects caused by noise and better preserve depth edges, a bandwidth adaptation scheme is proposed. Our method is evaluated for depth map super-resolution and depth map completion problems on synthetic and real datasets including Middlebury, ToF-Mark and NYU. A comprehensive comparison against 16 state-of-the-art methods is carried out. Both qualitative and quantitative evaluations demonstrate the improved performance.
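The spanning-tree construction underlying such affinities can be sketched in a few lines (a generic illustration, not the paper's exact model: the function name, 4-connectivity, epsilon term, and absolute-difference edge weights are assumptions):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def grid_mst(guidance):
    """Build a 4-connected grid graph over the guidance image, with edge
    weights given by absolute intensity differences, and extract its
    minimum spanning tree. Distances along the tree could then serve as
    edge-aware affinities between pixels."""
    h, w = guidance.shape
    g = guidance.ravel()
    idx = np.arange(h * w).reshape(h, w)
    rows, cols, weights = [], [], []
    # Horizontal and vertical neighbor edges (upper-triangular indices).
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        rows.append(a.ravel())
        cols.append(b.ravel())
        # Small epsilon keeps zero-difference edges in the sparse graph.
        weights.append(np.abs(g[a.ravel()] - g[b.ravel()]) + 1e-6)
    graph = csr_matrix((np.concatenate(weights),
                        (np.concatenate(rows), np.concatenate(cols))),
                       shape=(h * w, h * w))
    return minimum_spanning_tree(graph)
```

Because the tree avoids crossing strong guidance gradients wherever possible, affinities measured along it naturally respect depth edges.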
Article
This paper proposes a unified multi-lateral filter to efficiently increase the spatial resolution of low-resolution and noisy depth maps in real-time. Time-of-Flight (ToF) cameras have become a very promising alternative to stereo-based range sensing systems as they provide depth measurements at a high frame rate. However, there are two main drawbacks that restrict their use in a wide range of applications; namely, their fairly low spatial resolution and the amount of noise within the depth estimation. In order to address these drawbacks, we propose a new approach based on sensor fusion. That is, we couple a low-resolution ToF camera with a higher-resolution 2-D camera, to which the low-resolution depth map will be efficiently upsampled. In this paper, we first review the existing depth map enhancement approaches based on sensor fusion and discuss their limitations. We then propose a unified multi-lateral filter that accounts for the inaccuracy of depth edge positions due to the low-resolution ToF depth maps. By doing so, unwanted artefacts such as texture copying and edge blurring are almost entirely eliminated. Moreover, the proposed filter is configurable to behave as most of the alternative depth enhancement approaches. Using a convolution-based formulation and data quantization and downsampling, the described filter has been effectively and efficiently implemented for dynamic scenes in real-time applications. The experimental results show a noticeable qualitative as well as quantitative improvement on raw depth maps, outperforming state-of-the-art multi-lateral filters.
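A naive joint bilateral upsampling routine, a simpler relative of the unified multi-lateral filter described above, illustrates the fusion idea: each high-resolution pixel averages nearby low-resolution depth samples, weighted by spatial distance and by similarity in the high-resolution guidance image. All names and parameter values here are hypothetical.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, guide_hr, r, sigma_s=1.0, sigma_i=10.0):
    """Sketch of joint bilateral upsampling of a LR depth map guided by
    a HR intensity image; r is the upsampling factor."""
    H, W = guide_hr.shape
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            yl, xl = y / r, x / r          # position on the LR grid
            y0, x0 = int(yl), int(xl)
            ws, vs = 0.0, 0.0
            # 3x3 LR neighborhood around the corresponding LR position.
            for j in range(max(0, y0 - 1), min(depth_lr.shape[0], y0 + 2)):
                for i in range(max(0, x0 - 1), min(depth_lr.shape[1], x0 + 2)):
                    spatial = np.exp(-((j - yl) ** 2 + (i - xl) ** 2)
                                     / (2 * sigma_s ** 2))
                    # Guidance similarity: HR pixel vs. the HR pixel
                    # nearest each LR sample.
                    gy, gx = min(H - 1, j * r), min(W - 1, i * r)
                    rng = np.exp(-(guide_hr[y, x] - guide_hr[gy, gx]) ** 2
                                 / (2 * sigma_i ** 2))
                    w_ = spatial * rng
                    ws += w_
                    vs += w_ * depth_lr[j, i]
            out[y, x] = vs / ws
    return out
```

The paper's filter additionally models the positional inaccuracy of depth edges, which this plain guidance-weighted average does not capture.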
Conference Paper
Full-text available
In this work we propose KinectDeform, an algorithm which targets enhanced 3D reconstruction of scenes containing non-rigidly deforming objects. It provides an innovation over the existing class of algorithms, which either target scenes with rigid objects only, allow for very limited non-rigid deformations, or use precomputed templates to track them. KinectDeform combines a fast non-rigid scene tracking algorithm based on octree data representation and hierarchical voxel associations with a recursive data filtering mechanism. We analyze its performance on both real and simulated data and show improved results in terms of smoothness and feature-preserving 3D reconstructions with reduced noise.