Stereo correspondence with slanted surfaces: critical implications of horizontal
slant
Abhijit S. Ogale and Yiannis Aloimonos
Center for Automation Research, University of Maryland, College Park, MD 20742
{ogale, yiannis}@cfar.umd.edu
Abstract
We examine the stereo correspondence problem in the
presence of slanted scene surfaces. In particular, we high-
light a previously overlooked geometric fact: a horizontally
slanted surface (i.e. having depth variation in the direction
of the separation of the two cameras) will appear horizon-
tally stretched in one image as compared to the other image.
Thus, while corresponding two images, N pixels on a scan-
line in one image may correspond to a different number of
pixels M in the other image. This leads to three important
modifications to existing stereo algorithms: (a) due to un-
equal sampling, intensity matching metrics such as the pop-
ular Birchfield-Tomasi procedure must be modified, (b) un-
equal numbers of pixels in the two images must be allowed
to correspond to each other, and (c) the uniqueness con-
straint, which is often used for detecting occlusions, must
be changed to a 3D uniqueness constraint. This paper dis-
cusses these new constraints and provides a simple scanline
based matching algorithm for illustration. We experimen-
tally demonstrate test cases where existing algorithms fail,
and how the incorporation of these new constraints provides
correct results. Experimental comparisons of the scanline
based algorithm with standard data sets are also provided.
1. Introduction
The dense stereo correspondence problem consists of
finding a mapping between the points in two images of a
scene. If the images have been rectified, then a point S in one image may correspond to a point S′ in the other image, where S and S′ lie on the same horizontal scanline. The difference in the horizontal positions of S and S′ is termed the horizontal disparity. In this paper, we assume that we are
dealing with a rectified pair of images.
1.1. Previous work
There exists a considerable body of work on the dense
stereo correspondence problem. Scharstein and Szeliski
[19] have provided an exhaustive comparison of dense
stereo correspondence algorithms. Most algorithms gen-
erally utilize local measurements such as image intensity
(or color) and phase, and aggregate information from mul-
tiple pixels using smoothness constraints. The simplest
method of aggregation is to minimize the matching error
within rectangular windows of fixed size [16]. Better ap-
proaches utilize multiple windows [8, 7], adaptive win-
dows [10] which change their size in order to minimize the
error, shiftable windows [4, 21], or predicted windows [14],
all of which give performance improvements at discontinu-
ities.
Global approaches to solving the stereo correspondence
problem rely on the extremization of a global cost func-
tion or energy. The energy functions which are used in-
clude terms for local property matching (‘data term’), ad-
ditional smoothness terms, and in some cases, penalties for
occlusions. Depending on the form of the energy function,
the most efficient energy minimization scheme can be cho-
sen. These include dynamic programming [15], simulated
annealing [9, 1], relaxation labeling [20], non-linear diffu-
sion [18], maximum flow [17] and graph cuts [5, 11]. Max-
imum flow and graph cut methods provide better computa-
tional efficiency than simulated annealing for energy func-
tions which possess a certain set of properties. Some of
these algorithms treat the images symmetrically and explic-
itly deal with occlusions (e.g. [11]). The uniqueness con-
straint [13] is often used to find regions of occlusion. Egnal
and Wildes [6] provide comparisons of various approaches
for finding occlusions.
Recently, some algorithms [3] have explicitly incorpo-
rated the estimation of slant while performing the estima-
tion of horizontal disparity. Lin and Tomasi [12] explicitly
model the scene using smooth surface patches and also find
occlusions; they initialize their disparity map with integer
disparities obtained using graph cuts, after which surface
fitting and segmentation are performed repeatedly.
1.2. Our approach
We explicitly examine the stereo correspondence prob-
lem in the presence of horizontally slanted scene surfaces.
In particular, we lay emphasis on the following geometric
effect: a horizontally slanted surface (i.e. having depth vari-
ation in the direction of the separation of the two cameras)
will appear horizontally stretched in one image as compared
to the other image. Thus, when we correspond two images,
N pixels on a scanline in one image must be allowed to cor-
respond with a different number of pixels M in the other
image. Furthermore, it is evident that the intensity function
on the true horizontally slanted scene surface is sampled
differently by the two cameras, which is another low-level effect that needs to be dealt with. Also, the uniqueness
constraint, which is often used to find occlusions by forc-
ing a one-to-one correspondence between pixels, is not true
for horizontally slanted surfaces, since an N-to-M correspon-
dence is possible. Hence, the uniqueness constraint must
be reformulated in terms of scene visibility in the presence
of horizontally slanted surfaces. In Section 2, we exam-
ine the above ideas and underscore the need for the treat-
ment of horizontal slant in the first stage of any stereo al-
gorithm during disparity estimation itself, rather than as a
post-processing or a feedback step. For the sake of illustra-
tion, we present a simple scanline based algorithm in Sec-
tion 3 which makes use of these constraints, and provide
experimental comparisons with existing algorithms using
standard data sets in Section 4.
Figure 1. (Left) Unequal projection lengths of a horizontally slanted line. (Right) Equal projection lengths of a fronto-parallel line.
Figure 2. Sampling problem for a horizontally slanted line.
2. Effects of Horizontal Slant
2.1. Unequal projection lengths
Using a 1D camera, Figure 1 shows on the left how a horizontally slanted line AB in the scene projects onto the line segment a1b1 in camera C1, and a2b2 in camera C2. Clearly, the lengths of a1b1 and a2b2 are not equal. Assume that the cameras have focal length equal to 1. Let the point A have coordinates (X_A, Z_A) in space with respect to camera 1, and point B have coordinates (X_B, Z_B), where the X-axis is along the scanline and the Z-axis is normal to the scanline. Then, if the cameras are separated by a translation t, we can immediately find the lengths L_1 and L_2 of the projected line segments in the two cameras:

L_1 = X_B / Z_B − X_A / Z_A
L_2 = (X_B − t) / Z_B − (X_A − t) / Z_A        (1)

Clearly, in general, L_1 and L_2 are not equal. For the fronto-parallel line shown in Figure 1 on the right, Z_A = Z_B = Z, hence

L_1 = L_2 = (X_B − X_A) / Z        (2)

Thus, except for the fronto-parallel case, horizontally slanted line segments in space will always project onto segments of different lengths in the two cameras. Hence, N pixels on a scanline in one image can correspond to a different number of pixels M on a scanline in the other image.
We must therefore make a provision in our stereo algorithms
to permit unequal correspondences of this nature.
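As a quick numeric illustration of Eqs. (1) and (2) (our own sketch, not part of the paper), the following Python snippet computes the two projected lengths for a slanted and for a fronto-parallel segment; the specific coordinates and baseline are arbitrary example values.

```python
# Numeric check of Eqs. (1) and (2); the coordinates below are arbitrary examples.
def projection_lengths(X_A, Z_A, X_B, Z_B, t):
    """Projected lengths of segment AB in camera 1 and in camera 2, where
    camera 2 is translated by t along the X-axis and the focal length is 1."""
    L1 = X_B / Z_B - X_A / Z_A
    L2 = (X_B - t) / Z_B - (X_A - t) / Z_A
    return L1, L2

print(projection_lengths(0.0, 2.0, 1.0, 4.0, t=0.5))  # slanted: (0.25, 0.375) differ
print(projection_lengths(0.0, 2.0, 1.0, 2.0, t=0.5))  # fronto-parallel: (0.5, 0.5) equal
```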
2.2. Sampling
Since a horizontally slanted line segment in space has
different projection lengths in the two cameras, its intensity function is also sampled differently by the two cameras
as shown in Figure 2. Birchfield and Tomasi [2] have pro-
vided a very useful method for matching pixel intensities,
which is insensitive to image sampling. However, due to
unequal sampling in the presence of horizontal slant, we
must first resample each scanline correctly, and then apply
the Birchfield-Tomasi matching procedure, which only uses
nearest neighbor pixels for interpolation. In other words, we
first stretch (resample) one of the scanlines, by an amount
related to the slant we are considering, and then match this
stretched scanline with the other unstretched scanline us-
ing the Birchfield-Tomasi matching process as usual. For
example, if we are considering the linear correspondence
function x_2 = m · x_1 + d between points of cameras 1 and 2, then we must stretch the image of camera 1 by a factor m before performing the intensity-based matching.
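A minimal sketch of this resample-then-match idea is given below. It is our own illustration, not the authors' code: it assumes grayscale scanlines stored as 1-D NumPy arrays, and the helper names such as stretch_scanline are ours. The dissimilarity function follows the standard symmetric Birchfield-Tomasi measure with linearly interpolated half-sample neighbours.

```python
import numpy as np

def stretch_scanline(scanline, m):
    """Resample a scanline by a slant factor m using linear interpolation,
    so it can be compared sample-by-sample with the unstretched scanline."""
    n = len(scanline)
    new_n = int(round(m * (n - 1))) + 1
    positions = np.arange(new_n) / m          # locations in the original scanline
    return np.interp(positions, np.arange(n), scanline)

def bt_dissimilarity(a, i, b, j):
    """Symmetric Birchfield-Tomasi dissimilarity between sample i of scanline a
    and sample j of scanline b, using interpolated half-sample neighbours."""
    def one_sided(p, u, q, v):
        lo = 0.5 * (q[v] + q[max(v - 1, 0)])
        hi = 0.5 * (q[v] + q[min(v + 1, len(q) - 1)])
        q_min, q_max = min(lo, hi, q[v]), max(lo, hi, q[v])
        return max(0.0, p[u] - q_max, q_min - p[u])
    return min(one_sided(a, i, b, j), one_sided(b, j, a, i))

# Example: stretch the left scanline by m = 2, then compare it at one position
# against the right scanline (here with zero additional offset).
left = np.array([10., 20., 30., 40.])
right = np.array([10., 15., 20., 25., 30., 35., 40., 40.])
stretched = stretch_scanline(left, 2.0)
print(bt_dissimilarity(stretched, 2, right, 2))   # 0.0: the samples agree
```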
2.3. Occlusions and the uniqueness constraint
The uniqueness constraint [13] is often used to find oc-
clusions. In its present form, the uniqueness constraint
forces a one-to-one correspondence between pixels in the
two images. In the end, the unpaired pixels are the occlu-
sions. However, since horizontal slant allows N pixels in
one image to match with a different number of pixels M in
the other image, we can no longer impose a one-to-one cor-
respondence for finding occlusions. We must modify the
uniqueness constraint so that we enforce a one-to-one map-
ping between continuous intervals (line segments) in the
two scanlines, instead of pixels. An interval in one scan-
line may correspond to an interval of a different length in
the other scanline, as long as the correspondence is unique.
This is equivalent to enforcing uniqueness in the scene
space instead of the image space, hence we may also refer
to this constraint as the 3D uniqueness constraint.
Figure 3 shows how the modified uniqueness constraint
is used. Part (a) shows an existing one-to-one correspon-
dence between intervals on the left and right scanlines. This
denotes an intermediate state in the progress of a stereo
matching and segmentation algorithm. Notice that the in-
tervals may correspond in any order (i.e. the ordering con-
straint is not needed). Now, in part (b), we wish to insert a
new pair of corresponding intervals, shown by dashed lines.
(This new pair of matching intervals improves upon the
existing matches according to some energy metric which
depends on the stereo algorithm being used). In part (c),
we see that the insertion of this pair of intervals conflicts
with existing intervals (shown in gray). In order to enforce
uniqueness, the gray pair of intervals on the right must be
removed, while the gray pair of intervals on the left must be
resized. In part (d), we see the new correspondences. The
interval pair which was resized is shown in gray, and the
inserted interval is shown as dashed.
3. Scanline stereo algorithm
We now describe a simple algorithm to illustrate how the
above ideas may be implemented. For simplicity, the al-
gorithm processes a pair of scanlines I_L(x) and I_R(x) at a time without using any vertical consistency constraints (the results are post-processed by a simple median filter). Horizontal disparities Δ_L(x) are assigned to the left scanline within a given range [Δ_1, Δ_2], and Δ_R(x) to the right scanline in the range [−Δ_2, −Δ_1]. Notice that the disparities are not assigned to pixels, but continuously over the whole scanline. The disparities are not directly estimated; instead, we search for functions m_L(x) and d_L(x) for the left scanline, and m_R(x) and d_R(x) for the right scanline, such that given a point x_L on the left scanline, its corresponding point x_R in the right scanline would be

x_R = m_L(x_L) · x_L + d_L(x_L)

and reciprocally:

x_L = m_R(x_R) · x_R + d_R(x_R)

Clearly,

m_R(x_R) = 1 / m_L(x_L)
d_R(x_R) = −d_L(x_L) / m_L(x_L)

The disparities are then computed as:

Δ_L(x_L) = x_R − x_L = (m_L(x_L) − 1) · x_L + d_L(x_L)
Δ_R(x_R) = x_L − x_R = (m_R(x_R) − 1) · x_R + d_R(x_R)
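As a small sanity check (ours, not from the paper), the snippet below verifies the reciprocal relations and the two disparity expressions for one point and one hypothetical (m_L, d_L) pair.

```python
# One-point check of the reciprocal slant/offset relations and the disparities.
m_L, d_L = 1.5, -4.0           # hypothetical slant and offset for the left scanline
x_L = 10.0
x_R = m_L * x_L + d_L           # corresponding right position: 11.0
m_R, d_R = 1.0 / m_L, -d_L / m_L
assert abs(m_R * x_R + d_R - x_L) < 1e-9      # maps back to the same left point
delta_L = (m_L - 1.0) * x_L + d_L             # x_R - x_L =  1.0
delta_R = (m_R - 1.0) * x_R + d_R             # x_L - x_R = -1.0
print(delta_L, delta_R)
```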
The functions m_L and m_R are horizontal slants, which allow line segments of different lengths in the two scanlines to correspond. The scanlines are represented continuously by linearly interpolating intensities between pixel locations. Thus, if m_L = 2, then the left scanline is stretched (resampled) by a factor of 2, and then matched with the unstretched right scanline using the Birchfield-Tomasi method. By stretching one scanline before performing the intensity-based matching, we automatically modify the traditional Birchfield-Tomasi method to properly deal with horizontal slant. For each possible m_L and d_L, absolute intensity differences between corresponding points are computed and thresholded by a threshold t. The best value of m_L and d_L for a point is chosen such that it maximizes the size of the matching line segment containing that point. This is the simple global optimization which we perform to choose among the possible disparities.
Figure 3. The modified uniqueness constraint operates by preserving a one-to-one correspondence between intervals on the left and right scanlines, instead of pixels. (a) Initial correspondence; (b) insert new pair of matching intervals; (c) enforce uniqueness constraint; (d) final correspondence.
The values of the horizontal slant which are to be examined are provided as inputs, i.e. m_L, m_R ∈ M, where M = {m_1, m_2, ..., m_n}. Thus, given the possible slants M and the disparity search range [Δ_1, Δ_2], the possible values of d_L and d_R for each position can be restricted.
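Putting the pieces above together, a rough sketch of the per-point search is given below. This is our own simplified illustration, not the authors' implementation: it uses plain thresholded absolute differences in place of the Birchfield-Tomasi measure, assumes scanlines given as 1-D float arrays, and the helper names are ours.

```python
import numpy as np

def match_runs(left, right, m, d, thresh=10.0):
    """For one hypothesis x_R = m * x_L + d, mark which left positions have an
    intensity match in the right scanline, and return the length of the
    contiguous matching run containing each position."""
    n_l, n_r = len(left), len(right)
    x_l = np.arange(n_l, dtype=float)
    x_r = m * x_l + d
    inside = (x_r >= 0) & (x_r <= n_r - 1)
    r_vals = np.interp(np.clip(x_r, 0, n_r - 1), np.arange(n_r), right)
    matches = inside & (np.abs(left - r_vals) <= thresh)
    runs = np.zeros(n_l, dtype=int)
    start = None
    for i in range(n_l + 1):                 # accumulate run lengths
        if i < n_l and matches[i]:
            if start is None:
                start = i
        elif start is not None:
            runs[start:i] = i - start
            start = None
    return runs

def best_disparity(left, right, slants, d_min, d_max, thresh=10.0):
    """For every left position, keep the (m, d) hypothesis whose matching run
    through that position is longest, and return the resulting disparity."""
    n_l = len(left)
    x_l = np.arange(n_l, dtype=float)
    best_len = np.zeros(n_l, dtype=int)
    disparity = np.full(n_l, np.nan)
    for m in slants:
        for d in range(d_min, d_max + 1):
            runs = match_runs(left, right, m, float(d), thresh)
            better = runs > best_len
            best_len[better] = runs[better]
            disparity[better] = (m - 1.0) * x_l[better] + d
    return disparity
```

Positions for which no hypothesis produces a match (left as NaN here) are natural occlusion candidates in this simplified picture.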
In order to find the occlusions, we enforce the uniqueness constraint in its modified form as shown in Figure 3. We maintain a one-to-one correspondence between intervals in the two scanlines. Hence, at any stage of the process, we have a set S_L of non-overlapping intervals in the left scanline and a set S_R of non-overlapping intervals in the right scanline. An interval i is of the form [x_1, x_2). The uniqueness constraint enforces a one-to-one mapping U between the elements of S_L and the elements of S_R. When a new corresponding pair of intervals i_L and i_R is found, the previous correspondences of segments in S_L which overlap with i_L are removed, and the same is done for i_R and S_R. Then, i_L is added to S_L, and i_R to S_R, and the one-to-one mapping U is updated. Thus, we always ensure that a line segment in the left scanline uniquely maps to a line segment in the right scanline. In the end, line segments which remain unmapped are the occlusions.
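A compact sketch of this interval bookkeeping is shown below. It is our own simplification rather than the paper's data structure: conflicting pairs are removed outright instead of being resized as in Figure 3(d), and the class and method names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a = [a0, a1) and b = [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]

class IntervalCorrespondence:
    """Maintains the one-to-one mapping between matched intervals of the
    left (S_L) and right (S_R) scanlines."""
    def __init__(self):
        self.pairs = []                      # list of (interval_L, interval_R)

    def insert(self, i_L, i_R):
        # 3D uniqueness: drop previously matched pairs that conflict with the
        # new pair in either scanline (a simplification of the resizing step).
        self.pairs = [(l, r) for (l, r) in self.pairs
                      if not overlaps(l, i_L) and not overlaps(r, i_R)]
        self.pairs.append((i_L, i_R))

    def occlusions(self, n_left, n_right):
        """Integer positions of each scanline not covered by any matched
        interval; these remain unmapped and are reported as occlusions."""
        cov_L, cov_R = [False] * n_left, [False] * n_right
        for (l0, l1), (r0, r1) in self.pairs:
            for x in range(l0, l1):
                cov_L[x] = True
            for x in range(r0, r1):
                cov_R[x] = True
        return ([x for x, c in enumerate(cov_L) if not c],
                [x for x, c in enumerate(cov_R) if not c])
```

Removing whole conflicting pairs keeps the sketch short; the procedure described above instead resizes partially overlapping intervals, which preserves more of the existing correspondence.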
4. Experiments
Scharstein and Szeliski [19] have set up a test suite
(at www.middlebury.edu/stereo) of stereo image pairs along
with ground truth disparities for comparing the results of
dense stereo algorithms. The disparity map d_out generated by an algorithm is compared to the true disparity d_true, and the pixels which deviate by more than 1 unit from their true disparity are termed 'bad' pixels. The percentages of
bad pixels in the entire image, in the untextured regions and
near depth discontinuities are used to compare the results of
various algorithms. The percentages of bad pixels are re-
ported in Table 1, which was generated by submitting our disparity maps (Figure 4), obtained using the scanline algorithm, to the Middlebury website created by Scharstein and Szeliski (mentioned earlier). The simple scanline algorithm presented earlier (denoted 'slanted scanline' in the table) ranks ninth overall, while the ranks in each column are shown in brackets,
below the error percentages. This performance evaluation is
presented only for the sake of completeness, since the pri-
mary purpose of this paper is not to provide an algorithm,
but rather to understand the effects of horizontal slant, and
propose methods for correctly dealing with them. We ex-
pect that the constraints presented above will improve the
results of many existing stereo algorithms.
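For reference, the 'bad pixel' criterion used above can be written in a few lines. This is our own helper (the name bad_pixel_percentage is hypothetical), not code from the evaluation site.

```python
import numpy as np

def bad_pixel_percentage(d_out, d_true, threshold=1.0, mask=None):
    """Percentage of pixels whose disparity deviates from ground truth by more
    than `threshold` (the Middlebury criterion described above). `mask`
    optionally restricts evaluation to a region such as untextured areas or
    neighbourhoods of depth discontinuities."""
    bad = np.abs(np.asarray(d_out, float) - np.asarray(d_true, float)) > threshold
    if mask is not None:
        bad = bad[np.asarray(mask, bool)]
    return 100.0 * bad.mean()
```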
The correctness of our approach immediately becomes
evident when dealing with the stereo pair shown in Figure 5.
This pair of test images shows a black object which is hor-
Table 1. Performance comparison from the Middlebury Stereo Vision Page (overall rank is 9th among 29 algorithms). The table shows only the top ten algorithms (and, for reference, the lowest-ranked one). Error percentages and the rank in each column (in brackets) are shown.

Rank | Algorithm | Tsukuba all | Tsukuba untex | Tsukuba disc. | Sawtooth all | Sawtooth untex | Sawtooth disc. | Venus all | Venus untex | Venus disc. | Map all | Map disc.
1 | Segm.-based GC | 1.23 (3) | 0.29 (2) | 6.94 (4) | 0.30 (1) | 0.00 (1) | 3.24 (1) | 0.08 (1) | 0.01 (1) | 1.39 (1) | 1.49 (21) | 15.46 (26)
2 | Layered | 1.58 (5) | 1.06 (7) | 8.82 (6) | 0.34 (2) | 0.00 (1) | 3.35 (2) | 1.52 (9) | 2.96 (18) | 2.62 (3) | 0.37 (11) | 5.24 (11)
3 | Belief prop. | 1.15 (1) | 0.42 (3) | 6.31 (1) | 0.98 (9) | 0.30 (14) | 4.83 (6) | 1.00 (5) | 0.76 (5) | 9.13 (13) | 0.84 (18) | 5.27 (12)
4 | MultCam GC | 1.85 (9) | 1.94 (14) | 6.99 (5) | 0.62 (6) | 0.00 (1) | 6.86 (11) | 1.21 (7) | 1.96 (9) | 5.71 (7) | 0.31 (8) | 4.34 (10)
5 | GC+occl. 2b | 1.19 (2) | 0.23 (1) | 6.71 (2) | 0.73 (8) | 0.11 (8) | 5.71 (8) | 1.64 (12) | 2.75 (16) | 5.41 (6) | 0.61 (14) | 6.05 (13)
6 | Impr. Coop. | 1.67 (6) | 0.77 (5) | 9.67 (10) | 1.21 (13) | 0.17 (11) | 6.90 (12) | 1.04 (6) | 1.07 (6) | 13.68 (18) | 0.29 (6) | 3.65 (7)
7 | GC+occl. 2a | 1.27 (4) | 0.43 (4) | 6.90 (3) | 0.36 (3) | 0.00 (1) | 3.65 (3) | 2.79 (20) | 5.39 (21) | 2.54 (2) | 1.79 (22) | 10.08 (20)
8 | Disc. pres. | 1.78 (7) | 1.22 (10) | 9.71 (11) | 1.17 (11) | 0.08 (7) | 5.55 (7) | 1.61 (11) | 2.25 (12) | 9.06 (12) | 0.32 (9) | 3.33 (6)
9 | Slanted scanline | 1.82 (8) | 1.09 (8) | 9.47 (8) | 0.72 (7) | 0.24 (13) | 6.00 (9) | 3.25 (21) | 5.73 (22) | 8.51 (11) | 0.22 (2) | 3.10 (4)
10 | Graph cuts | 1.94 (11) | 1.09 (9) | 9.49 (9) | 1.30 (15) | 0.06 (6) | 6.34 (10) | 1.79 (15) | 2.61 (15) | 6.91 (8) | 0.31 (7) | 3.88 (8)
29 | Max. surf. | 11.10 (29) | 10.70 (27) | 41.99 (29) | 5.51 (29) | 5.56 (29) | 27.39 (28) | 4.36 (24) | 4.78 (20) | 41.13 (28) | 4.17 (28) | 27.88 (28)
Figure 4. Top row (Left frames), Middle row (ground truth), Bottom row (our results). Occlusions were
filled in before performing the evaluation.
izontally slanted (depth decreases from left to right). The
second row of the figure shows on the left the output of the
graph cuts algorithm of Kolmogorov and Zabih [11]. The graph
cuts result was obtained using software kindly provided by
the authors (www.cs.cornell.edu/People/vnk/software.html).
Our results are shown in the second row on the right hand
side. The graph cuts algorithm finds a constant disparity
value in the interior of the slanted object, which is clearly
incorrect. Our algorithm correctly shows the disparity of the
slanted object linearly decreasing from left to right (from
white to dark gray). The detected occlusions are shown in
black.
Figure 5. Horizontally slanted object. Top
row: left image, right image. Bottom row:
(left) results using graph cuts [11], (right) our
results. Occlusions are shown in black.
5. Conclusions
We have discussed the effects of horizontal slant on the
stereo correspondence problem. We have shown that hor-
izontal slant leads to unequal projections in the two cam-
eras, which requires us to modify stereo algorithms to allow M-to-N pixel correspondences. Furthermore, we
have shown that horizontal slant leads to uneven sampling
of a surface by the two cameras, and hence local inten-
sity matching metrics must be suitably modified. Finally,
the uniqueness constraint for finding occlusions, which im-
poses a one-to-one correspondence between image pixels,
must be modified to enforce a one-to-one correspondence
between scanline intervals instead of pixels. We have also
presented a simple scanline based algorithm which imple-
ments these constraints, and provided experimental compar-
isons with existing methods.
References
[1] S. T. Barnard. Stochastic stereo matching over scale. IJCV,
3(1):17–32, 1989.
[2] S. Birchfield and C. Tomasi. A pixel dissimilarity measure
that is insensitive to image sampling. IEEE Trans. PAMI,
20(4):401–406, 1998.
[3] S. Birchfield and C. Tomasi. Multiway cut for stereo and
motion with slanted surfaces. ICCV, 1:489–495, 1999.
[4] A. F. Bobick and S. S. Intille. Large occlusion stereo. IJCV,
33(3):181–200, Sept 1999.
[5] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate
energy minimization via graph cuts. IEEE Trans. PAMI,
23(11):1222–1239, Nov 2001.
[6] G. Egnal and R. Wildes. Detecting binocular half-
occlusions: empirical comparisons of five approaches. IEEE
Trans. PAMI, 24(8):1127–1133, Aug 2002.
[7] A. Fusiello, V. Roberto, and E. Trucco. Efficient stereo with
multiple windowing. CVPR, pages 858–863, June 1997.
[8] D. Geiger, B. Ladendorf, and A. Yuille. Occlusions and
binocular stereo. ECCV, pages 425–433, 1992.
[9] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. PAMI, 6(6):721–741, Nov 1984.
[10] T. Kanade and M. Okutomi. A stereo matching algorithm
with an adaptive window: theory and experiment. IEEE
Trans. PAMI, 16(9):920–932, 1994.
[11] V. Kolmogorov and R. Zabih. Computing visual correspon-
dence with occlusions using graph cuts. ICCV, pages 508–
515, July 2001.
[12] M. Lin and C. Tomasi. Surfaces with occlusions from lay-
ered stereo. CVPR, 1:I–710–I–717, June 2003.
[13] D. Marr and T. Poggio. A computational theory of human
stereo vision. Proc. Royal Soc. London B, 204:301–328,
1979.
[14] J. Mulligan and K. Daniilidis. Predicting disparity windows
for real-time stereo. Lecture Notes in Computer Science,
1842:220–235, 2000.
[15] Y. Ohta and T. Kanade. Stereo by intra- and inter-scanline
search using dynamic programming. IEEE Trans. PAMI,
7(2):139–154, March 1985.
[16] M. Okutomi and T. Kanade. A multiple baseline stereo.
IEEE Trans. PAMI, 15(4):353–363, April 1993.
[17] S. Roy and I. Cox. A maximum-flow formulation of the n-
camera stereo correspondence problem. ICCV, pages 492–
499, 1998.
[18] D. Scharstein and R. Szeliski. Stereo matching with nonlin-
ear diffusion. IJCV, 28(2):155–174, 1998.
[19] D. Scharstein and R. Szeliski. A taxonomy and evaluation
of dense two-frame stereo correspondence algorithms. IJCV,
47(1):7–42, April 2002.
[20] R. Szeliski. Bayesian modeling of uncertainty in low-level
vision. IJCV, 5(3):271–302, Dec 1990.
[21] H. Tao, H. Sawhney, and R. Kumar. A global matching
framework for stereo computation. ICCV, 1:532–539, July
2001.