DYNAMIC SUPER RESOLUTION OF DEPTH SEQUENCES WITH NON-RIGID MOTIONS
Kassem Al Ismaeil*, Djamila Aouada*, Bruno Mirbach†, Björn Ottersten*
*SnT - University of Luxembourg    †Advanced Engineering - IEE S.A.
{kassem.alismaeil, djamila.aouada, bjorn.ottersten}@uni.lu    bruno.mirbach@iee.lu
ABSTRACT
We enhance the resolution of depth videos acquired with low resolution time-of-flight cameras. To that end, we propose a new dedicated dynamic super-resolution method capable of accurately super-resolving a depth sequence containing one or multiple moving objects without strong constraints on their shape or motion, thus clearly outperforming existing super-resolution techniques, which perform poorly on depth data and are either restricted to global motions or imprecise due to an implicit estimation of motion. Our proposed approach is based on a new data model that leads to a robust registration of all depth frames after a dense upsampling. The textureless nature of depth images allows us to robustly handle sequences with multiple moving objects, as confirmed by our experiments.
Index Terms—Depth sequence, dynamic super-resolution,
motion estimation, upsampling, ToF data, moving object.
1. INTRODUCTION
Super resolution (SR) is the process of recovering a high resolution (HR) image from a set of captured low resolution (LR) frames. SR was originally defined for static scenes, i.e., scenes where the motion between the observed images is global, as opposed to dynamic scenes containing moving objects. The past two decades have witnessed tremendous work on SR for static scenes. As presented in [3], these algorithms, commonly referred to as classical SR, are numerically limited to small global motions, even for an increased number of LR frames. Moreover, they cannot handle scenes with moving objects, and treat the corresponding frames as outliers. As a solution to these major limitations, example-based SR algorithms have been proposed [4], as well as their combinations with classical multi-frame SR [5]. However, such algorithms depend on a heavy training phase, and the quality of the super-resolved image depends on the suitability of the training data. Relatively little attention has been given to the SR of dynamic scenes. Farsiu et al. [1] have proposed a dynamic shift and add model (dynamic S&A) as a mere extension of the static case [2], hence suffering from the same restrictions. Other methods [6, 7, 8, 9] tackle the problem of dynamic SR by segmenting the moving object before super-resolving it. Such methods do not handle pixels on the boundary of the object, causing major artifacts. In 2010, van Eekeren et al. [9] proposed an algorithm to solve the problem of boundary pixels; however, this algorithm is computationally heavy and based upon strong assumptions.
In 2009, dynamic SR models were proposed with an implicit motion estimation, e.g., steering kernels for SR (SKSR) [17]. While the idea is theoretically attractive, it is very impractical, as it relies on heavy computations and on many empirical parameters. Moreover, these methods are dedicated to 2D intensity sequences and fail strongly on depth data, because of its abrupt value changes around edges and its textureless nature. Such data, usually captured with a time-of-flight (ToF) camera, require a resolution enhancement. Fusion based methods have been proposed as a solution for dynamic depth scenes [10, 11, 12, 13, 14], where an HR 2D camera is coupled with an LR depth camera. These methods often suffer from texture copying problems, and require a perfect alignment and synchronization of the 2D and depth sequences. In this work, we propose to relax the limitations on scale and motion of SR algorithms for dynamic depth scenes containing one or multiple moving objects, without prior assumptions on their shape or motion, and without engaging in an additional learning stage. The proposed algorithm takes advantage of the textureless nature of depth data, leading to a robust median estimation without fusing with 2D data; hence, it avoids blurring and texture copying artifacts. This algorithm is based on a new data model that starts by densely upsampling the LR measurements for an accurate registration.
The organization of the paper starts by formulating the
problem of dynamic SR in Section 2. We then provide our
key concepts for a robust motion estimation in Section 3. In
Section 4, we propose a new data model that leads to a robust
dynamic depth SR algorithm. In Section 5, we experimentally compare its performance with state-of-the-art techniques using depth sequences. A conclusion is given in Section 6.
2. PROBLEM FORMULATION
The aim of dynamic SR algorithms is to estimate a sequence of $N$ HR images $\{\mathbf{x}_t\}_{t=1}^{N}$ of size $(m \times n)$ from an observed LR sequence $\{\mathbf{y}_t\}_{t=1}^{N}$, where each LR image $\mathbf{y}_t$ is of size $(m_0 \times n_0)$ pixels, with $n = r \cdot n_0$ and $m = r \cdot m_0$, such that $r$ is the SR factor. Every image $\mathbf{y}_t$ may be viewed as an LR noisy and deformed realization of $\mathbf{x}_{t_0}$ at the acquisition time $t$, with $t_0 \leq t$. Rearranging all images $\mathbf{x}_t$ and $\mathbf{y}_t$, $t = 1, \cdots, N$, in lexicographic order, i.e., as column vectors of lengths $mn$ and $m_0 n_0$, respectively, we consider the following classical data model:

$$\mathbf{y}_t = \mathbf{D}\mathbf{H}\mathbf{L}_t^{t_0}\mathbf{x}_{t_0} + \mathbf{n}_t, \quad t_0 \leq t \;\text{ and }\; t, t_0 \in [1, N] \subset \mathbb{N}^*, \qquad (1)$$

where $\mathbf{D}$ is a matrix of dimension $(m_0 n_0 \times mn)$ that represents the downsampling operator, and which we assume to be known and constant over time. The system blur is represented by the time and space invariant matrix $\mathbf{H}$. The vector $\mathbf{n}_t$ is an additive Laplacian noise at time $t$, as justified in [2]. The matrices $\mathbf{L}_t^{t_0}$ are $(mn \times mn)$ matrices corresponding to the geometric motion between the considered HR image $\mathbf{x}_{t_0}$ and the observed LR image $\mathbf{y}_t$ prior to its downsampling.
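To make the data model (1) concrete, here is a minimal NumPy sketch that simulates one LR depth observation from an HR frame. The operator choices are assumptions for illustration only: a Gaussian kernel stands in for the system blur $\mathbf{H}$, plain decimation for $\mathbf{D}$, and a global translation for $\mathbf{L}_t^{t_0}$ (the paper itself allows general non-rigid motion):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift

def simulate_lr_frame(x_hr, r=4, blur_sigma=1.0, dy=0.0, dx=0.0, noise_scale=0.01):
    """Sketch of y_t = D H L x_{t0} + n_t for one frame (all operators assumed)."""
    warped = shift(x_hr, (dy, dx), order=1, mode='nearest')  # L: geometric motion
    blurred = gaussian_filter(warped, sigma=blur_sigma)      # H: system blur
    decimated = blurred[::r, ::r]                            # D: downsampling by r
    noise = np.random.laplace(scale=noise_scale, size=decimated.shape)  # Laplacian n_t [2]
    return decimated + noise

# Example: a synthetic depth map, background at 2.5 m, a square object at 1 m
x = np.full((128, 128), 2.5)
x[40:80, 40:80] = 1.0
y = simulate_lr_frame(x, r=4, dy=1.5, dx=-0.5)  # one 32x32 LR observation
```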
The dynamic SR problem is simplified by reconstructing one HR image at a time using the full observed sequence. From now on, we fix the reference time to $t_0$, and focus on the reconstruction of $\mathbf{x}_{t_0}$ from $\{\mathbf{y}_t\}_{t=t_0}^{N}$. The operation may be repeated for $t_0 = 1, \cdots, N$. Based on the data model in (1), and using an $L_1$ norm between the observations and the model, the Maximum Likelihood (ML) estimate of $\mathbf{x}_{t_0}$ is obtained as follows:

$$\hat{\mathbf{x}}_{t_0} = \arg\min_{\mathbf{x}_{t_0}} \sum_{t=t_0}^{N} \left\| \mathbf{D}\mathbf{H}\mathbf{L}_t^{t_0}\mathbf{x}_{t_0} - \mathbf{y}_t \right\|_1. \qquad (2)$$
Using the same approach as in [2, 17], we consider that $\mathbf{H}$ and $\mathbf{L}_t^{t_0}$ are block circulant matrices. Therefore:

$$\mathbf{H}\mathbf{L}_t^{t_0} = \mathbf{L}_t^{t_0}\mathbf{H}. \qquad (3)$$
The minimization in (2) can therefore be decomposed into two steps: estimation of a blurred HR image $\mathbf{z}_{t_0} = \mathbf{H}\mathbf{x}_{t_0}$, followed by a deblurring step. In what follows, we assume that $\mathbf{y}_t$ is simply the noisy and decimated version of $\mathbf{z}_t$, without any geometric warp. We may thus write $\mathbf{L}_t^{t} = \mathbf{I}$, $\forall t$, $\mathbf{I}$ being the identity matrix; hence, $\mathbf{L}_t^{t_0}\mathbf{z}_{t_0} = \mathbf{z}_t = \mathbf{H}\mathbf{x}_t$. This operation can be assimilated to registering $\mathbf{z}_{t_0}$ to $\mathbf{z}_t$. We draw attention to the fact that in the case of static multi-frame SR, a set of observed LR images is considered instead of a sequence, i.e., there is no order between frames. Such an order becomes crucial in dynamic SR, because the estimation of motion, based on the optical flow paradigm, happens between consecutive frames only. An accurate dynamic SR estimation is consequently highly dependent on the accuracy of estimating the registration matrices $\mathbf{L}_t^{t-1}$, as well as $\mathbf{L}_t^{t_0}$. In the case of one moving object with a very small translational motion across a few frames, a subpixel motion estimation would be sufficient to guarantee a good HR image. This assumption is no longer valid if the object moves fast or the scene contains multiple objects moving with different motions. In this case, the SR process becomes more challenging, and a robust registration method using a dense optical flow is required. Most SR algorithms rely on a registration based on a pixel correspondence that is too coarse compared to the scale of details in the scene. It is therefore necessary to call upon a very accurate subpixel correspondence. In what follows, we argue that this accuracy is highly increased by an upsampling of the observed sequence, as presented in Section 3. We accordingly propose a new data formulation for dynamic depth SR and give its corresponding algorithm in Section 4.
3. MOTION ESTIMATION AND REGISTRATION
It has been shown in [16] that higher image resolutions help increase the accuracy of motion estimation, which justifies applying an upsampling framework to obtain higher scale images. Moreover, performing the registration process on upsampled images guarantees a better result, with a higher accuracy, than registering the LR images $\mathbf{y}_t$ and upsampling them afterwards. This is due to the fact that registration parameters are approximated by rounding the motion vectors, with an expected error of $\pm\frac{1}{2}$ pixel. The effect of this error is related to the size of the registered images; the upsampling process reduces it from $\pm\frac{1}{2m}$ in the LR case to $\pm\frac{1}{2rm}$. Hence, we propose to upsample the observed LR images even before registering them. Due to the specific nature of depth data, classical interpolation-based methods (e.g., bicubic) cannot be used, as they lead to jagged values and blurring effects, especially for boundary pixels. Thus, we propose to densely upsample $\mathbf{y}_t$, $t = 1, \ldots, N$, up to the size of the super-resolved image, $(m \times n)$. We define the resulting image as:
$$\mathbf{y}_t\!\uparrow\; = \mathbf{U}\cdot\mathbf{y}_t, \qquad (4)$$

where $\mathbf{U}$ is a dense upsampling matrix of size $(mn \times m_0 n_0)$, which we choose to be the transpose of $\mathbf{D}$, s.t., $\mathbf{U}\mathbf{D} = \mathbf{A}$, where $\mathbf{A}$ is a block circulant matrix that defines a new blurring matrix $\mathbf{B} = \mathbf{A}\mathbf{H}$. Therefore, we redefine $\mathbf{z}_t$ as:

$$\mathbf{z}_t = \mathbf{B}\mathbf{x}_t. \qquad (5)$$
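As a minimal sketch of this choice of $\mathbf{U}$ (assuming, for illustration, that $\mathbf{D}$ averages each $r \times r$ block, so that $\mathbf{U} = \mathbf{D}^{\top}$ amounts, up to a constant factor, to replicating every LR pixel into its block, i.e., nearest-neighbor upsampling; the paper only requires $\mathbf{D}$ to be known and constant):

```python
import numpy as np

def downsample_block_mean(img, r):
    """One possible D: average each (r x r) block (an assumption for this sketch)."""
    m0, n0 = img.shape[0] // r, img.shape[1] // r
    return img[:m0 * r, :n0 * r].reshape(m0, r, n0, r).mean(axis=(1, 3))

def upsample_dense(img_lr, r):
    """U = D^T (up to a constant factor): replicate every LR pixel into its
    (r x r) block, i.e., nearest-neighbor upsampling."""
    return np.repeat(np.repeat(img_lr, r, axis=0), r, axis=1)

# U D acts as the block-constant blur A: each (r x r) block becomes its mean.
y = np.random.rand(32, 32)
y_up = upsample_dense(y, r=5)   # (160 x 160), eq. (4)
```

Replication introduces no new depth values, which is what keeps depth discontinuities from being smeared the way bicubic interpolation would smear them.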
Since the optical flow approach works under the assumption of small motions, frames that are further from the reference frame $\mathbf{y}_{t_0}\!\uparrow$ introduce a higher registration error than those closer to it, and will thus be considered as outliers. The percentage of these outliers is related to two main factors: the speed of the moving objects and the length of the sequence $N$. For example, a long sequence with a fast moving object would most likely lead to more than 50% outliers, in which case the SR process fails even with a robust estimator of high breakdown value such as the median. To tackle this problem, we herein propose a new registration method based on a cumulative motion compensation.
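The 50% breakdown point of the median can be checked with a toy NumPy example (not from the paper): a depth pixel observed over nine frames survives three badly registered frames, but not five:

```python
import numpy as np

true_depth = 2.0
n_frames = 9
for n_outliers in (3, 5):              # below vs. above the 50% breakdown point
    stack = np.full(n_frames, true_depth)
    stack[:n_outliers] = 0.4           # badly registered frames hit the background
    print(f"{n_outliers}/{n_frames} outliers -> median = {np.median(stack):.2f}")
# 3/9 outliers -> median = 2.00 ;  5/9 outliers -> median = 0.40
```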
Considering two consecutive upsampled frames $\mathbf{y}_{t-1}\!\uparrow$ and $\mathbf{y}_t\!\uparrow$, the optimal registration solution is:

$$\hat{\mathbf{M}}_{t-1}^{t} = \arg\min_{\mathbf{M}} \Psi\left(\mathbf{y}_{t-1}\!\uparrow,\, \mathbf{y}_t\!\uparrow,\, \mathbf{M}\right), \qquad (6)$$

where $\Psi$ is a dense optical flow-related cost function and

$$\mathbf{y}_t\!\uparrow\; = \mathbf{M}_{t-1}^{t}\,\mathbf{y}_{t-1}\!\uparrow + \mathbf{v}_t. \qquad (7)$$
The vector $\mathbf{v}_t$ contains the innovation, which we assume negligible in this framework. In addition, similarly to [20], for analytical convenience, we assume that all pixels in $\mathbf{y}_t\!\uparrow$ originate from pixels in $\mathbf{y}_{t-1}\!\uparrow$ in a one-to-one mapping. Therefore, each row in $\mathbf{M}_{t-1}^{t}$ contains a single 1, at the position corresponding to the address of the source pixel in $\mathbf{y}_{t-1}\!\uparrow$. This bijective property implies that the matrix $\hat{\mathbf{M}}_{t-1}^{t}$ is an invertible permutation, s.t., $[\hat{\mathbf{M}}_{t-1}^{t}]^{-1} = \hat{\mathbf{M}}_{t}^{t-1}$. Furthermore, its estimate leads to the following registration to $\mathbf{y}_{t-1}$:

$$\tilde{\mathbf{y}}_t\!\uparrow\; = \hat{\mathbf{M}}_{t}^{t-1}\,\mathbf{y}_t\!\uparrow. \qquad (8)$$
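A sketch of the pairwise registration step (6)-(8): the permutation matrix $\hat{\mathbf{M}}_t^{t-1}$ is realized here as a dense backward warp. OpenCV's Farnebäck flow is used as one possible stand-in for the cost $\Psi$ (the paper does not commit to a particular flow estimator), and `register_to_previous` is an illustrative helper name, not the authors' code:

```python
import numpy as np
import cv2

def register_to_previous(y_prev_up, y_up):
    """Warp y_up onto y_prev_up's grid: an approximation of (8) with the
    permutation matrix M replaced by a dense backward flow and a remap."""
    def to_u8(d):  # rescale depth to 8-bit only for flow estimation
        d = (d - d.min()) / (np.ptp(d) + 1e-9)
        return (255 * d).astype(np.uint8)

    # Dense flow from y_prev_up to y_up (Farneback: one choice of Psi)
    flow = cv2.calcOpticalFlowFarneback(to_u8(y_prev_up), to_u8(y_up), None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = y_up.shape
    xx, yy = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (xx + flow[..., 0]).astype(np.float32)
    map_y = (yy + flow[..., 1]).astype(np.float32)
    # Nearest-neighbor sampling keeps depth edges sharp (no value mixing)
    return cv2.remap(y_up.astype(np.float32), map_x, map_y,
                     interpolation=cv2.INTER_NEAREST)
```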
We then need to define $\mathbf{y}_t^{t_0}\!\uparrow$, the registered version of $\mathbf{y}_t\!\uparrow$ with respect to the reference $\mathbf{y}_{t_0}\!\uparrow$. To that end, we use all the registered upsampled images $\tilde{\mathbf{y}}_t\!\uparrow$, as defined in (8), for $t > t_0$. We propose, similarly to our work in [16], a cumulative motion compensation approach with an additional improvement, where we further reduce the cumulated motion error by recomputing $\hat{\mathbf{M}}_{t-1}^{t}$ as follows:

$$\hat{\mathbf{M}}_{t-1}^{t} = \arg\min_{\mathbf{M}} \Psi\left(\tilde{\mathbf{y}}_{t-1}\!\uparrow,\, \mathbf{y}_t\!\uparrow,\, \mathbf{M}\right). \qquad (9)$$

We prove by induction the following registration relationship for non-consecutive frames:

$$\mathbf{y}_t^{t_0}\!\uparrow\; = \hat{\mathbf{M}}_t^{t_0}\,\mathbf{y}_t\!\uparrow\; = \underbrace{\hat{\mathbf{M}}_{t_0+1}^{t_0}\cdots\,\hat{\mathbf{M}}_t^{t-1}}_{(t-t_0)\ \text{times}}\cdot\;\mathbf{y}_t\!\uparrow. \qquad (10)$$

Considering the bijection simplification, we further write:

$$\hat{\mathbf{M}}_t^{t_0} \approx \mathbf{L}_{t_0}^{t} = [\mathbf{L}_t^{t_0}]^{-1}. \qquad (11)$$
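The cumulative composition in (10) can be sketched by chaining the pairwise displacement fields: for every pixel of the reference grid we follow the flow chain up to frame $t$ and sample the frame there. A minimal NumPy/SciPy version, assuming the pairwise flows `flows[k]` (from frame $k$ to frame $k+1$) have already been estimated, e.g., with the routine above:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def register_to_reference(frames_up, flows):
    """frames_up: list of upsampled frames y_t (2D arrays), t = t0..N.
    flows[k]: (H, W, 2) displacement field (dy, dx) from frame k to frame k+1.
    Returns the stack registered to frame 0, composing the pairwise maps
    as in eq. (10)."""
    h, w = frames_up[0].shape
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = np.stack([yy, xx])               # track positions, start on the t0 grid
    registered = [frames_up[0]]
    for k, flow in enumerate(flows):
        # Evaluate the k -> k+1 flow at the current (subpixel) track positions
        dy = map_coordinates(flow[..., 0], coords, order=1, mode='nearest')
        dx = map_coordinates(flow[..., 1], coords, order=1, mode='nearest')
        coords = coords + np.stack([dy, dx])  # compose one more M-hat factor
        # Sample frame k+1 where the reference pixels have moved to
        registered.append(map_coordinates(frames_up[k + 1], coords,
                                          order=0, mode='nearest'))
    return np.stack(registered)               # (N - t0 + 1, H, W)
```

Composing the displacement fields once, rather than resampling the image $(t - t_0)$ times, mirrors the matrix product in (10) and avoids accumulating interpolation error.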
4. PROPOSED ALGORITHM
The subpixel accuracy in motion estimation induced by the combined upsampling and cumulative motion compensation proposed in Section 3 makes it feasible to handle a depth sequence with a moving object without using any prior information on its shape, rigidity, or motion. These advantages extend to the much more complex case of multiple moving objects. Indeed, the textureless nature of depth images categorizes them as images containing gross information only, i.e., with no texture information, as per Mallikarjuna et al.'s composite image model [21]. This property, combined with the impulsive SR noise $\mathbf{n}_t$, suggests that a temporal median estimator is a robust equivalent to the ML formulation of (2).
We reformulate the data model in (1) to introduce the upsampling strategy of Section 3. Combining (1), (3), (4), (5), and (11), we find¹:

$$\mathbf{y}_t^{t_0}\!\uparrow\; = \mathbf{z}_{t_0} + \mathbf{w}_t, \quad t_0 \leq t \;\text{ and }\; t, t_0 \in [1, N] \subset \mathbb{N}^*, \qquad (12)$$

where $\mathbf{w}_t = \hat{\mathbf{M}}_t^{t_0}\mathbf{U}\cdot\mathbf{n}_t$ is an additive Laplacian noise at $t$.

¹ A full proof of (12) will be provided in another paper.
This work was supported by the National Research Fund, Luxembourg, under the CORE project C11/BM/1204105/FAVE/Ottersten.
The estimation in (2) becomes:

$$\hat{\mathbf{z}}_{t_0} = \arg\min_{\mathbf{z}_{t_0}} \sum_{t=t_0}^{N} \left\| \mathbf{z}_{t_0} - \mathbf{y}_t^{t_0}\!\uparrow \right\|_1, \qquad (13)$$

which corresponds to the pixel-wise temporal median estimator, i.e., $\hat{\mathbf{z}}_{t_0} = \mathrm{med}_t\,\{\mathbf{y}_t^{t_0}\!\uparrow\}_{t=t_0}^{N}$. A simple image deblurring then recovers $\hat{\mathbf{x}}_{t_0}$ from $\hat{\mathbf{z}}_{t_0}$. We hence propose a new dynamic SR algorithm corresponding to this new SR estimation, which we refer to as Upsampling for Precise Super-Resolution (UP-SR), summarized below:
UP-SR algorithm
for $t_0 = 1, \cdots, N$ do
   1. Choose the reference frame $\mathbf{y}_{t_0}$.
   for $t$ s.t. $t_0 < t \leq N$ do
      2. Compute $\mathbf{y}_t\!\uparrow$ using (4).
      3. Estimate the registration matrix $\hat{\mathbf{M}}_t^{t_0}$ using (10).
      4. Compute $\mathbf{y}_t^{t_0}\!\uparrow$ using (10).
   end for
   5. Find $\hat{\mathbf{z}}_{t_0}$ by applying the median estimator (13).
   6. Deduce $\hat{\mathbf{x}}_{t_0}$ by deblurring.
end for
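Steps 5 and 6 reduce to a few lines once the registered stack is available. A sketch, assuming the registered, upsampled stack produced by the routines above, with a Wiener deconvolution and a small Gaussian PSF standing in for the unspecified deblurring of $\mathbf{B}$:

```python
import numpy as np
from skimage.restoration import wiener

def upsr_estimate(registered_stack, psf, balance=0.1):
    """Steps 5-6 of UP-SR: pixel-wise temporal median of the registered,
    upsampled stack (eq. (13)), followed by a generic deblurring step.
    registered_stack: (N, H, W) output of the cumulative registration."""
    z_hat = np.median(registered_stack, axis=0)   # robust ML under Laplacian noise
    return wiener(z_hat, psf, balance, clip=False)  # one generic choice of deblurrer

# Illustration with an assumed small Gaussian PSF for the unknown blur B
k = np.arange(-3, 4, dtype=float)
g = np.exp(-k**2 / 2.0)
psf = np.outer(g, g) / np.outer(g, g).sum()
stack = np.random.rand(9, 160, 160)               # stands in for {y_t^{t0} up}
x_hat = upsr_estimate(stack, psf)
```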
5. EXPERIMENTS
We test the performance of the proposed UP-SR algorithm on depth data acquired with a ToF camera. SR estimation is well suited to such data, which suffer from a very low resolution. We start with a simple case of one moving object (a hand) with a translational motion. We compare the performance of the proposed algorithm in two configurations: registration of the measured LR depth images $\mathbf{y}_t^{t_0}$, and registration of the densely upsampled depth images $\mathbf{y}_t^{t_0}\!\uparrow$, $t_0 < t < N$. Results show that in the latter case (Fig. 1(b)), the registration is more accurate and leads to sharper edges. Relying directly on LR images, however, leads to blurred edges (Fig. 1(a)), necessitating a special treatment or a segmentation step to reduce the artifacts caused by boundary pixels. This experimentally confirms the benefit of our upsampling strategy. Next, we tested UP-SR on a real sequence of LR depth images containing multiple moving objects. We mounted an LR ToF camera at a height of 2.5 meters, looking down at the ground, with two persons sitting on chairs sliding in two different directions. A sequence of 9 LR depth images, of size $(56 \times 61)$ pixels, was super-resolved with an SR scale factor $r = 5$ using UP-SR, 2D/depth fusion [13], SKSR [17], and dynamic S&A [1]. Visual results for one frame are given in Fig. 2 (c), (d), (e) and (f).
Fig. 1. UP-SR results: (a) proposed method using motion estimated from an LR sequence; (b) proposed method using motion estimated from a densely upsampled sequence ($r = 5$).
They clearly show that SKSR and dynamic S&A fail badly on depth data, mainly at boundary pixels, while 2D/depth fusion, although computationally efficient, often suffers from strong 2D texture copying on the final super-resolved depth frame. Fig. 2(f) shows the result of UP-SR, where we obtain clear, sharp edges in addition to an efficient removal of noisy pixel values. This is mostly due to the proposed subpixel motion estimation combined with an accurate registration, leading to a successful temporal fusion of the sequence. Finally, in order to provide a quantitative evaluation, we generated an LR depth sequence by downsampling an available HR depth sequence with a factor $r = 4$, and further degrading it with additive white Gaussian noise (AWGN) at signal to noise ratios (SNR) of 15, 25, 35, and 45 dB. We quantitatively compare our proposed algorithm with SKSR and dynamic S&A, testing these methods with the corresponding software provided in [18] and [19]. Since we have a known ground truth $\{\mathbf{x}_t\}_{t=1}^{N}$, we measure the quality of an estimated HR depth frame $\hat{\mathbf{x}}_{t_0}$ using the peak SNR (PSNR), defined as: $\mathrm{PSNR} = 10\log_{10}\frac{m \times n}{\|\mathbf{x}_{t_0} - \hat{\mathbf{x}}_{t_0}\|^2}$. The obtained results show the superiority of the UP-SR algorithm, which provides the best results among the discussed state-of-the-art SR methods across all noise levels. As illustrated in Fig. 3, it is not surprising that even for a very high noise level (SNR = 15 dB) the results are good. This is due to the key components of UP-SR, namely, its subpixel motion estimation and accurate multi-frame registration, combined with a robust median filtering that matches the textureless property of depth data. Therefore, our algorithm yields good quality depth images without having to call upon an additional regularization step. This type of treatment could not be applied to 2D images, where the results would be blurry with lost details. This may be explained by the fact that depth data generally fall under the model in (12).
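For completeness, the PSNR measure used above translates directly into code; note that, as written in the paper, it normalizes by the number of pixels $m \times n$ rather than by a peak intensity:

```python
import numpy as np

def psnr_paper(x_true, x_est):
    """PSNR = 10 log10( (m * n) / ||x_t0 - x_hat_t0||^2 ), as defined in Section 5."""
    m, n = x_true.shape
    return 10.0 * np.log10((m * n) / np.sum((x_true - x_est) ** 2))
```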
Fig. 2. UP-SR example of a dynamic depth scene ($r = 5$): (a) 2D image corresponding to the last frame. (b) Last frame of 9 LR $(56 \times 61)$ depth images. (c) 2D/depth fusion [13]. (d) SKSR [17]. (e) Dynamic S&A [1]. (f) Proposed UP-SR.

Fig. 3. PSNR for different SR methods applied to the moving hand sequence ($r = 4$).

6. CONCLUSION

A new algorithm has been presented to enhance the quality of LR depth videos of dynamic scenes containing one or multiple moving objects. This algorithm is based on the SR framework, without strong constraints on the objects' shape or motion. It takes advantage of the textureless nature of depth data to achieve a robust SR estimation after densely upsampling the LR frames. Experimental results on both synthetic and real ToF depth images showed that this new approach to SR, although conceptually simple, provides a more accurate motion estimation, leading it to greatly outperform existing methods such as fusion based techniques, SKSR, and dynamic S&A.
7. REFERENCES
[1] S. Farsiu, D. Robinson, M. Elad, and P. Milanfar, "Advances and Challenges in Super-Resolution", International Journal of Imaging Systems and Technology, vol. 14, pp. 47-57, 2004.
[2] S. Farsiu, D. Robinson, M. Elad, and P. Milanfar, "Fast and Robust Multi-Frame Super-Resolution", IEEE TIP, vol. 13, pp. 1327-1344, 2004.
[3] Z. Lin and H. Shum, "Fundamental Limits of Reconstruction-Based Superresolution Algorithms under Local Translation", IEEE PAMI, vol. 26, no. 1, Jan. 2004.
[4] O. Mac Aodha, N. Campbell, A. Nair, and G. Brostow, "Patch Based Synthesis for Single Depth Image Super-Resolution", ECCV 2012.
[5] D. Glasner, S. Bagon, M. Irani, “Super-Resolution from
a Single Image”, ICCV 2009.
[6] R. Hardie, T. Tuinstra, K. Barnard, J. Bognar, and E.
Armstrong, “High Resolution Image Reconstruction from
Digital Video with In-Scene Motion”, ICIP 1997, pp. 153-
156.
[7] S. Farsiu, M. Elad, and P. Milanfar, "Video-to-Video Dynamic Super-Resolution for Grayscale and Color Sequences", EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 61859.
[8] A. W. M. van Eekeren, K. Schutte, J. Dijk, D.-J. de Lange, and L. J. van Vliet, "Super-Resolution on Moving Objects and Background", ICIP 2006, pp. 153-156... wait
[9] A. W. M. van Eekeren, K. Schutte, and L. J. van Vliet,
“Multiframe Super-Resolution Reconstruction of Small
Moving Objects”, IEEE TIP, vol. 19, pp. 2901-2912, 2010.
[10] J. Diebel and S. Thrun, “An application of markov ran-
dom fields to range sensing”, NIPS 18, pp. 291-298, 2006.
[11] J. Kopf, M. Cohen, D. Lischinski, and M. Uyttendaele,
“Joint bilateral upsampling”, ACM TOG, 26(3), 2007.
[12] Q. Yang, R. Yang, J. Davis, and D. Nister, “Spatial-
depth super resolution for range images”, CVPR, 2007.
[13] F. Garcia, D. Aouada, B. Mirbach, T. Solignac, B. Ot-
tersten, “Real-time Hybrid ToF Multi-Camera Rig Fusion
System for Depth Map Enhancement”, IEEE CVPRW,
pp.1-8, 20-25, Jun. 2011.
[14] X. Xiang, G. Li, J. Tong, Z. Pan, “Fast and Simple Super
Resolution for Range Data”, IEEE Conf on Cyberworlds
(CW), pp.319-324, 20-22 Oct. 2010.
[15] F. Garcia, D. Aouada, B. Mirbach, B Ottersten, “Spatio-
Temporal ToF Data Enhancement by Fusion”, ICIP 2012,
pp.981-984.
[16] L. Xu, J. Jia, S. B. Kang, “Improving sub-pixel corre-
spondence through upsampling”, CVIU, vol 116, Issue 2,
February 2012, pp. 250-261, ISSN 1077-3142.
[17] H. Takeda, P. Milanfar, M. Protter, and M. Elad, “Super-
resolution without Explicit Subpixel Motion Estimation”,
IEEE TIP, Vol. 18, No. 9, September 2009.
[18] http://users.soe.ucsc.edu/htakeda/SpaceTimeSKR.htm
[19] http://users.soe.ucsc.edu/milanfar/software/superresolution.html
[20] M. Elad and A. Feuer, "Super-Resolution Reconstruction of Continuous Image Sequence", IEEE PAMI, vol. 21, no. 9, pp. 817-834, Sep. 1999.
[21] H. S. Mallikarjuna and L. F. Chaparro, "Iterative composite filtering for image restoration", IEEE PAMI, vol. 14, no. 6, pp. 674-678, Jun. 1992.