Stereo correspondence with slanted surfaces: critical implications of horizontal
slant
Abhijit S. Ogale and Yiannis Aloimonos
Center for Automation Research, University of Maryland, College Park, MD 20742
{ogale, yiannis}@cfar.umd.edu
Abstract
We examine the stereo correspondence problem in the
presence of slanted scene surfaces. In particular, we high-
light a previously overlooked geometric fact: a horizontally
slanted surface (i.e. having depth variation in the direction
of the separation of the two cameras) will appear horizon-
tally stretched in one image as compared to the other image.
Thus, when matching two images, N pixels on a scanline
in one image may correspond to a different number of
pixels M in the other image. This leads to three important
modifications to existing stereo algorithms: (a) due to un-
equal sampling, intensity matching metrics such as the pop-
ular Birchfield-Tomasi procedure must be modified, (b) un-
equal numbers of pixels in the two images must be allowed
to correspond to each other, and (c) the uniqueness con-
straint, which is often used for detecting occlusions, must
be changed to a 3D uniqueness constraint. This paper dis-
cusses these new constraints and provides a simple scanline
based matching algorithm for illustration. We experimen-
tally demonstrate test cases where existing algorithms fail,
and how the incorporation of these new constraints provides
correct results. Experimental comparisons of the scanline-based
algorithm on standard data sets are also provided.
1. Introduction
The dense stereo correspondence problem consists of
finding a mapping between the points in two images of a
scene. If the images have been rectified, then a point S in
one image may correspond to a point S′ in the other image,
where S and S′ lie on the same horizontal scanline. The
difference in the horizontal position of S and S′ is termed
the horizontal disparity. In this paper, we assume that we are
dealing with a rectified pair of images.
1.1. Previous work
There exists a considerable body of work on the dense
stereo correspondence problem. Scharstein and Szeliski
[19] have provided an exhaustive comparison of dense
stereo correspondence algorithms. Most algorithms gen-
erally utilize local measurements such as image intensity
(or color) and phase, and aggregate information from mul-
tiple pixels using smoothness constraints. The simplest
method of aggregation is to minimize the matching error
within rectangular windows of fixed size [16]. Better ap-
proaches utilize multiple windows [8, 7], adaptive win-
dows [10] which change their size in order to minimize the
error, shiftable windows [4, 21], or predicted windows [14],
all of which give performance improvements at discontinu-
ities.
Global approaches to solving the stereo correspondence
problem rely on the extremization of a global cost func-
tion or energy. The energy functions which are used in-
clude terms for local property matching (‘data term’), ad-
ditional smoothness terms, and in some cases, penalties for
occlusions. Depending on the form of the energy function,
the most efficient energy minimization scheme can be cho-
sen. These include dynamic programming [15], simulated
annealing [9, 1], relaxation labeling [20], non-linear diffu-
sion [18], maximum flow [17] and graph cuts [5, 11]. Max-
imum flow and graph cut methods provide better computa-
tional efficiency than simulated annealing for energy func-
tions which possess a certain set of properties. Some of
these algorithms treat the images symmetrically and explicitly
deal with occlusions (e.g. [11]). The uniqueness constraint
[13] is often used to find regions of occlusion. Egnal
and Wildes [6] provide comparisons of various approaches
for finding occlusions.
Recently, some algorithms [3] have explicitly incorpo-
rated the estimation of slant while performing the estima-
tion of horizontal disparity. Lin and Tomasi [12] explicitly
model the scene using smooth surface patches and also find
occlusions; they initialize their disparity map with integer
disparities obtained using graph cuts, after which surface
fitting and segmentation are performed repeatedly.
Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'04)
1063-6919/04 $20.00 © 2004 IEEE
1.2. Our approach
We explicitly examine the stereo correspondence prob-
lem in the presence of horizontally slanted scene surfaces.
In particular, we lay emphasis on the following geometric
effect: a horizontally slanted surface (i.e. having depth variation
in the direction of the separation of the two cameras)
will appear horizontally stretched in one image as compared
to the other image. Thus, when we correspond two images,
N pixels on a scanline in one image must be allowed to cor-
respond with a different number of pixels M in the other
image. Furthermore, it is evident that the intensity function
on the true horizontally slanted scene surface is sampled
differently by the two cameras, which is another low-level
effect which needs to be dealt with. Also, the uniqueness
constraint, which is often used to find occlusions by forc-
ing a one-to-one correspondence between pixels, is not true
for horizontally slanted surfaces, since an N-to-M correspondence
is possible. Hence, the uniqueness constraint must
be reformulated in terms of scene visibility in the presence
of horizontally slanted surfaces. In Section 2, we exam-
ine the above ideas and underscore the need for the treat-
ment of horizontal slant in the first stage of any stereo al-
gorithm during disparity estimation itself, rather than as a
post-processing or a feedback step. For the sake of illustra-
tion, we present a simple scanline based algorithm in Sec-
tion 3 which makes use of these constraints, and provide
experimental comparisons with existing algorithms using
standard data sets in Section 4.
Figure 1. (Left) unequal projection lengths of
a horizontally slanted line. (Right) equal projection
lengths of a fronto-parallel line.
Figure 2. Sampling problem for a horizontally
slanted line.
2. Effects of Horizontal Slant
2.1. Unequal projection lengths
Using a 1D camera, Figure 1 shows on the left how a
horizontally slanted line AB in the scene projects onto the
line segment a1b1 in camera C1, and a2b2 in camera C2.
Clearly, the lengths of a1b1 and a2b2 are not equal. Assume
that the cameras have focal length equal to 1. Let the point
A have coordinates (X_A, Z_A) in space with respect to camera
1, and point B have coordinates (X_B, Z_B), where the
X-axis is along the scanline, and the Z-axis is normal to
the scanline. Then, if the cameras are separated by a translation
t, we can immediately find the lengths L1 and L2 of
the projected line segments in the two cameras:

L1 = X_B/Z_B − X_A/Z_A
L2 = (X_B − t)/Z_B − (X_A − t)/Z_A
(1)
Clearly, in general, L1 and L2 are not equal. For the
fronto-parallel line shown in Figure 1 on the right, Z_A =
Z_B = Z, hence

L1 = L2 = (X_B − X_A)/Z (2)
Thus, except for the fronto-parallel case, horizontally
slanted line segments in space will always project onto seg-
ments of different lengths in the two cameras. Hence, N
pixels on a scanline in one image can correspond to a dif-
ferent number of pixels M on a scanline in the other image.
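As a quick numerical illustration of Eqs. (1) and (2), the projected lengths can be computed directly; the coordinates and baseline below are made-up values, not examples from the paper.

```python
# Projected length of a scene segment AB in a camera translated by t along
# the X-axis, with focal length 1 (Eq. 1). Coordinates are illustrative.

def projected_length(XA, ZA, XB, ZB, t=0.0):
    """Length of the image of segment AB in a camera displaced by t."""
    return (XB - t) / ZB - (XA - t) / ZA

# Horizontally slanted segment: depth varies from A to B.
L1 = projected_length(-1.0, 2.0, 1.0, 4.0, t=0.0)   # left camera
L2 = projected_length(-1.0, 2.0, 1.0, 4.0, t=0.5)   # right camera
print(L1, L2)   # 0.75 0.875 -- unequal projection lengths

# Fronto-parallel segment (ZA == ZB): both cameras see the same length.
F1 = projected_length(-1.0, 3.0, 1.0, 3.0, t=0.0)
F2 = projected_length(-1.0, 3.0, 1.0, 3.0, t=0.5)
print(F1, F2)   # equal, as predicted by Eq. (2)
```

The baseline t cancels only when Z_A = Z_B, which is exactly the fronto-parallel case.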
We must therefore make a provision in our stereo algorithms
to permit unequal correspondences of this nature.
2.2. Sampling
Since a horizontally slanted line segment in space has
different projection lengths in the two cameras, its intensity
function is also sampled differently by the two cameras,
as shown in Figure 2. Birchfield and Tomasi [2] have provided
a very useful method for matching pixel intensities
which is insensitive to image sampling. However, due to
unequal sampling in the presence of horizontal slant, we
must first resample each scanline correctly, and then apply
the Birchfield-Tomasi matching procedure, which only uses
nearest-neighbor pixels for interpolation. In other words, we
first stretch (resample) one of the scanlines by an amount
related to the slant we are considering, and then match this
stretched scanline with the other, unstretched scanline using
the Birchfield-Tomasi matching process as usual. For
example, if we are considering the linear correspondence
function x2 = m·x1 + d between points of cameras 1 and 2,
then we must stretch the image of camera 1 by a factor m
before performing the intensity-based matching.
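The stretch-then-match step can be sketched as follows. Both helpers are our own illustrative code, not from the paper: `resample` stretches a scanline by a factor m with linear interpolation, and `bt_dissimilarity` is a simplified form of the Birchfield-Tomasi idea of comparing a pixel against the linearly interpolated neighborhood of its candidate match.

```python
# Sketch of the stretch-then-match step, assuming grayscale scanlines
# stored as float lists. Illustrative helpers only.

def resample(scanline, m):
    """Stretch a scanline by factor m using linear interpolation."""
    n_out = int(round((len(scanline) - 1) * m)) + 1
    out = []
    for i in range(n_out):
        x = i / m                          # position in the original scanline
        j = min(int(x), len(scanline) - 2)
        frac = x - j
        out.append((1 - frac) * scanline[j] + frac * scanline[j + 1])
    return out

def bt_dissimilarity(left, i, right, j):
    """Simplified Birchfield-Tomasi dissimilarity: compare left[i] against
    the interpolated half-pixel neighborhood of right[j], and symmetrically."""
    def half(a, ia, b, jb):
        samples = (b[jb],
                   (b[jb] + b[max(jb - 1, 0)]) / 2,
                   (b[jb] + b[min(jb + 1, len(b) - 1)]) / 2)
        return max(0.0, a[ia] - max(samples), min(samples) - a[ia])
    return min(half(left, i, right, j), half(right, j, left, i))

# A linear ramp stretched by m = 2 stays a ramp with twice the samples:
print(resample([0.0, 2.0, 4.0], 2.0))   # [0.0, 1.0, 2.0, 3.0, 4.0]
```

For a hypothesized slant m, one would resample the left scanline first and then evaluate `bt_dissimilarity` between the stretched left scanline and the unstretched right one.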
2.3. Occlusions and the uniqueness constraint
The uniqueness constraint [13] is often used to find oc-
clusions. In its present form, the uniqueness constraint
forces a one-to-one correspondence between pixels in the
two images. In the end, the unpaired pixels are the occlu-
sions. However, since horizontal slant allows N pixels in
one image to match with a different number of pixels M in
the other image, we can no longer impose a one-to-one cor-
respondence for finding occlusions. We must modify the
uniqueness constraint so that we enforce a one-to-one map-
ping between continuous intervals (line segments) in the
two scanlines, instead of pixels. An interval in one scan-
line may correspond to an interval of a different length in
the other scanline, as long as the correspondence is unique.
This is equivalent to enforcing uniqueness in the scene
space instead of the image space, hence we may also refer
to this constraint as the 3D uniqueness constraint.
Figure 3 shows how the modified uniqueness constraint
is used. Part (a) shows an existing one-to-one correspon-
dence between intervals on the left and right scanlines. This
denotes an intermediate state in the progress of a stereo
matching and segmentation algorithm. Notice that the in-
tervals may correspond in any order (i.e. the ordering constraint
is not needed). Now, in part (b), we wish to insert a
new pair of corresponding intervals, shown by dashed lines.
(This new pair of matching intervals improves upon the
existing matches according to some energy metric which
depends on the stereo algorithm being used). In part (c),
we see that the insertion of this pair of intervals conflicts
with existing intervals (shown in gray). In order to enforce
uniqueness, the gray pair of intervals on the right must be
removed, while the gray pair of intervals on the left must be
resized. In part (d), we see the new correspondences. The
interval pair which was resized is shown in gray, and the
inserted interval is shown as dashed.
3. Scanline stereo algorithm
We now describe a simple algorithm to illustrate how the
above ideas may be implemented. For simplicity, the algorithm
processes a pair of scanlines I_L(x) and I_R(x) at a
time, without using any vertical consistency constraints (the
results are post-processed by a simple median filter). Horizontal
disparities δ_L(x) are assigned to the left scanline
within a given range [δ1, δ2], and δ_R(x) to the right scanline
in the range [−δ2, −δ1]. Notice that the disparities
are not assigned to pixels, but continuously over the whole
scanline. The disparities are not directly estimated; instead,
we search for functions m_L(x) and d_L(x) for the left
scanline, and m_R(x) and d_R(x) for the right scanline, such
that given a point x_L on the left scanline, its corresponding
point x_R in the right scanline would be

x_R = m_L(x_L) · x_L + d_L(x_L)

and reciprocally:

x_L = m_R(x_R) · x_R + d_R(x_R)

Clearly,

m_R(x_R) = 1 / m_L(x_L)
d_R(x_R) = −d_L(x_L) / m_L(x_L)

The disparities are then computed as:

δ_L(x_L) = x_R − x_L = (m_L(x_L) − 1) · x_L + d_L(x_L)
δ_R(x_R) = x_L − x_R = (m_R(x_R) − 1) · x_R + d_R(x_R)
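The reciprocal relations between the left-to-right map and its inverse can be checked numerically; the slant and offset values below (written m_L, d_L here) are arbitrary illustrations, not values from the paper.

```python
# Numerical check of the reciprocal relations between the map
# x_R = m_L * x_L + d_L and its inverse. Values are arbitrary.

m_L, d_L = 2.0, -3.0          # slant and offset for the left scanline
m_R = 1.0 / m_L               # reciprocal slant
d_R = -d_L / m_L              # reciprocal offset

x_L = 5.0
x_R = m_L * x_L + d_L         # 7.0
x_back = m_R * x_R + d_R      # maps back to x_L

delta_L = (m_L - 1.0) * x_L + d_L     # equals x_R - x_L
delta_R = (m_R - 1.0) * x_R + d_R     # equals x_L - x_R

print(x_back, delta_L, delta_R)   # 5.0 2.0 -2.0
```

Note that the two disparities are consistent by construction: δ_R at the matched point is the negative of δ_L.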
The functions m_L and m_R are horizontal slants, which
allow line segments of different lengths in the two scanlines
to correspond. The scanlines are represented continuously
by linearly interpolating intensities between pixel locations.
Thus, if m_L = 2, then the left scanline is stretched (resampled)
by a factor of 2, and then matched with the unstretched
right scanline using the Birchfield-Tomasi method. By stretching
one scanline before performing the intensity-based
matching, we automatically modify the traditional
Birchfield-Tomasi method to properly deal with horizontal
slant. For each possible m_L and d_L, absolute intensity
differences between corresponding points are computed
and thresholded by a threshold t. The best value of
m_L and d_L for a point is chosen such that it maximizes
(a) Initial correspondence (b) Insert new pair of matching intervals
(c) Enforce uniqueness constraint (d) Final correspondence
Figure 3. The modified uniqueness constraint operates by preserving a one-to-one correspondence
between intervals on the left and right scanlines, instead of pixels
the size of the matching line segment containing that point.
This is the simple global optimization which we perform to
choose among the possible disparities.
The values of the horizontal slant which are to be examined
are provided as inputs, i.e. m_L, m_R ∈ M, where
M = {m1, m2, ..., mk}. Thus, given the possible slants M
and the disparity search range [δ1, δ2], the possible values
of d_L and d_R for each position can be restricted.
In order to find the occlusions, we enforce the uniqueness
constraint in its modified form, as shown in Figure 3.
We maintain a one-to-one correspondence between intervals
in the two scanlines. Hence, at any stage of the process,
we have a set S_L of non-overlapping intervals in the
left scanline and a set S_R of non-overlapping intervals in
the right scanline. An interval i is of the form [x1, x2). The
uniqueness constraint enforces a one-to-one mapping U between
the elements of S_L and the elements of S_R. When a
new corresponding pair of intervals i_L and i_R is found, the
previous correspondences of segments in S_L which overlap
with i_L are removed, and the same is done for i_R and S_R.
Then, i_L is added to S_L, and i_R to S_R, and the one-to-one
mapping U is updated. Thus, we always ensure that a line
segment in the left scanline uniquely maps to a line segment
in the right scanline. In the end, line segments which remain
unmapped are the occlusions.
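This interval bookkeeping can be sketched as follows. The sketch simplifies Figure 3 in one respect: partially overlapping pairs are removed outright rather than resized. `overlaps` and `insert_match` are hypothetical helper names of our own.

```python
# Minimal sketch of the 3D uniqueness bookkeeping: when a new matching
# interval pair arrives, old pairs that conflict on either scanline are
# removed, then the new pair is inserted. Intervals are half-open [x1, x2);
# a match pairs one left interval with one right interval.

def overlaps(a, b):
    """True if half-open intervals a and b intersect."""
    return a[0] < b[1] and b[0] < a[1]

def insert_match(matches, i_L, i_R):
    """Keep a one-to-one mapping between intervals on the two scanlines."""
    kept = [(l, r) for (l, r) in matches
            if not overlaps(l, i_L) and not overlaps(r, i_R)]
    kept.append((i_L, i_R))
    return kept

matches = [((0, 4), (2, 6)), ((10, 14), (10, 14))]
# The new pair conflicts with the first pair on the left scanline:
matches = insert_match(matches, (3, 8), (5, 10))
print(matches)   # [((10, 14), (10, 14)), ((3, 8), (5, 10))]
```

At the end of the process, any part of a scanline covered by no interval in S_L (or S_R) would be reported as occluded.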
4. Experiments
Scharstein and Szeliski [19] have set up a test suite
(at www.middlebury.edu/stereo) of stereo image pairs along
with ground truth disparities for comparing the results of
dense stereo algorithms. The disparity map d_out generated
by an algorithm is compared to the true disparity d_true, and
the pixels which deviate by more than 1 unit from their true
disparity are termed 'bad' pixels. The percentages of
bad pixels in the entire image, in the untextured regions and
near depth discontinuities are used to compare the results of
various algorithms. The percentages of bad pixels are reported
in Table 1, which was generated by submitting our
disparity maps (Figure 4), produced by the scanline algorithm,
to the Middlebury website created by Scharstein et al.
(mentioned earlier). The simple scanline algorithm presented
earlier (denoted 'Slanted Scanline' in the table) ranks ninth
overall; the ranks in each column are shown in brackets
beside the error percentages. This performance evaluation is
presented only for the sake of completeness, since the pri-
mary purpose of this paper is not to provide an algorithm,
but rather to understand the effects of horizontal slant, and
propose methods for correctly dealing with them. We ex-
pect that the constraints presented above will improve the
results of many existing stereo algorithms.
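The bad-pixel metric described above can be sketched directly; `bad_pixel_percentage` is our own helper, and plain nested lists stand in for disparity images.

```python
# Fraction (as a percentage) of pixels whose disparity deviates from
# ground truth by more than a threshold (1 unit in the Middlebury setup).

def bad_pixel_percentage(d_out, d_true, threshold=1.0):
    total, bad = 0, 0
    for row_out, row_true in zip(d_out, d_true):
        for a, b in zip(row_out, row_true):
            total += 1
            if abs(a - b) > threshold:
                bad += 1
    return 100.0 * bad / total

d_true = [[5.0, 5.0, 4.0, 3.0]]
d_out  = [[5.0, 7.0, 4.5, 1.0]]   # two pixels deviate by 2 units
print(bad_pixel_percentage(d_out, d_true))   # 50.0
```

The benchmark evaluates this percentage over the whole image and separately over untextured regions and regions near depth discontinuities.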
The correctness of our approach immediately becomes
evident when dealing with the stereo pair shown in Figure 5.
This pair of test images shows a black object which is hor-
Table 1. Performance comparison from the Middlebury Stereo Vision Page (overall rank is 9th among
29 algorithms). The table shows the top ten algorithms and the lowest-ranked one. Error percentages
and the per-column rank (in brackets) are shown. Columns per image pair are all / untextured /
near discontinuities (Map: all / near discontinuities).

Rank  Algorithm        | Tsukuba: all, untex, disc        | Sawtooth: all, untex, disc       | Venus: all, untex, disc          | Map: all, disc
 1    Segm.-based GC   | 1.23 (3)   0.29 (2)   6.94 (4)   | 0.30 (1)   0.00 (1)   3.24 (1)   | 0.08 (1)   0.01 (1)   1.39 (1)   | 1.49 (21)  15.46 (26)
 2    Layered          | 1.58 (5)   1.06 (7)   8.82 (6)   | 0.34 (2)   0.00 (1)   3.35 (2)   | 1.52 (9)   2.96 (18)  2.62 (3)   | 0.37 (11)   5.24 (11)
 3    Belief prop      | 1.15 (1)   0.42 (3)   6.31 (1)   | 0.98 (9)   0.30 (14)  4.83 (6)   | 1.00 (5)   0.76 (5)   9.13 (13)  | 0.84 (18)   5.27 (12)
 4    MultCam GC       | 1.85 (9)   1.94 (14)  6.99 (5)   | 0.62 (6)   0.00 (1)   6.86 (11)  | 1.21 (7)   1.96 (9)   5.71 (7)   | 0.31 (8)    4.34 (10)
 5    GC+occl 2b       | 1.19 (2)   0.23 (1)   6.71 (2)   | 0.73 (8)   0.11 (8)   5.71 (8)   | 1.64 (12)  2.75 (16)  5.41 (6)   | 0.61 (14)   6.05 (13)
 6    Impr. Coop.      | 1.67 (6)   0.77 (5)   9.67 (10)  | 1.21 (13)  0.17 (11)  6.90 (12)  | 1.04 (6)   1.07 (6)   13.68 (18) | 0.29 (6)    3.65 (7)
 7    GC+occl. 2a      | 1.27 (4)   0.43 (4)   6.90 (3)   | 0.36 (3)   0.00 (1)   3.65 (3)   | 2.79 (20)  5.39 (21)  2.54 (2)   | 1.79 (22)  10.08 (20)
 8    Disc. pres.      | 1.78 (7)   1.22 (10)  9.71 (11)  | 1.17 (11)  0.08 (7)   5.55 (7)   | 1.61 (11)  2.25 (12)  9.06 (12)  | 0.32 (9)    3.33 (6)
 9    Slanted Scanline | 1.82 (8)   1.09 (8)   9.47 (8)   | 0.72 (7)   0.24 (13)  6.00 (9)   | 3.25 (21)  5.73 (22)  8.51 (11)  | 0.22 (2)    3.10 (4)
10    Graph cuts       | 1.94 (11)  1.09 (9)   9.49 (9)   | 1.30 (15)  0.06 (6)   6.34 (10)  | 1.79 (15)  2.61 (15)  6.91 (8)   | 0.31 (7)    3.88 (8)
29    Max. surf.       | 11.10 (29) 10.70 (27) 41.99 (29) | 5.51 (29)  5.56 (29)  27.39 (28) | 4.36 (24)  4.78 (20)  41.13 (28) | 4.17 (28)  27.88 (28)
Figure 4. Top row (Left frames), Middle row (ground truth), Bottom row (our results). Occlusions were
filled in before performing the evaluation.
izontally slanted (depth decreases from left to right). The
second row of the figure shows on the left the output of the
graph cuts algorithm of Kolmogorov and Zabih [11]. The graph
cuts result was obtained using software kindly provided by
the authors (www.cs.cornell.edu/People/vnk/software.html).
Our results are shown in the second row on the right hand
side. The graph cuts algorithm finds a constant disparity
value in the interior of the slanted object, which is clearly
incorrect. Our algorithm correctly shows the disparity of the
slanted object linearly decreasing from left to right (from
white to dark gray). The detected occlusions are shown in
black.
Figure 5. Horizontally slanted object. Top
row: left image, right image. Bottom row:
(left) results using graph cuts [11], (right) our
results. Occlusions are shown in black
5. Conclusions
We have discussed the effects of horizontal slant on the
stereo correspondence problem. We have shown that hor-
izontal slant leads to unequal projections in the two cam-
eras, which requires us to modify stereo algorithms to allow
M-to-N pixel correspondences. Furthermore, we
have shown that horizontal slant leads to uneven sampling
of a surface by the two cameras, and hence local inten-
sity matching metrics must be suitably modified. Finally,
the uniqueness constraint for finding occlusions, which im-
poses a one-to-one correspondence between image pixels,
must be modified to enforce a one-to-one correspondence
between scanline intervals instead of pixels. We have also
presented a simple scanline based algorithm which imple-
ments these constraints, and provided experimental compar-
isons with existing methods.
References
[1] S. T. Barnard. Stochastic stereo matching over scale. IJCV,
3(1):17–32, 1989.
[2] S. Birchfield and C. Tomasi. A pixel dissimilarity measure
that is insensitive to image sampling. IEEE Trans. PAMI,
20(4):401–406, 1998.
[3] S. Birchfield and C. Tomasi. Multiway cut for stereo and
motion with slanted surfaces. ICCV, 1:489–495, 1999.
[4] A. F. Bobick and S. S. Intille. Large occlusion stereo. IJCV,
33(3):181–200, Sept 1999.
[5] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate
energy minimization via graph cuts. IEEE Trans. PAMI,
23(11):1222–1239, Nov 2001.
[6] G. Egnal and R. Wildes. Detecting binocular half-
occlusions: empirical comparisons of five approaches. IEEE
Trans. PAMI, 24(8):1127–1133, Aug 2002.
[7] A. Fusiello, V. Roberto, and E. Trucco. Efficient stereo with
multiple windowing. CVPR, pages 858–863, June 1997.
[8] D. Geiger, B. Ladendorf, and A. Yuille. Occlusions and
binocular stereo. ECCV, pages 425–433, 1992.
[9] S. Geman and D. Geman. Stochastic relaxation, Gibbs distri-
butions, and the Bayesian restoration of images. IEEE Trans.
PAMI, 6(6):721–741, Nov 1984.
[10] T. Kanade and M. Okutomi. A stereo matching algorithm
with an adaptive window: theory and experiment. IEEE
Trans. PAMI, 16(9):920–932, 1994.
[11] V. Kolmogorov and R. Zabih. Computing visual correspon-
dence with occlusions using graph cuts. ICCV, pages 508–
515, July 2001.
[12] M. Lin and C. Tomasi. Surfaces with occlusions from lay-
ered stereo. CVPR, 1:I–710–I–717, June 2003.
[13] D. Marr and T. Poggio. A computational theory of human
stereo vision. Proc. Royal Soc. London B, 204:301–328,
1979.
[14] J. Mulligan and K. Daniilidis. Predicting disparity windows
for real-time stereo. Lecture Notes in Computer Science,
1842:220–235, 2000.
[15] Y. Ohta and T. Kanade. Stereo by intra- and inter-scanline
search using dynamic programming. IEEE Trans. PAMI,
7(2):139–154, March 1985.
[16] M. Okutomi and T. Kanade. A multiple baseline stereo.
IEEE Trans. PAMI, 15(4):353–363, April 1993.
[17] S. Roy and I. Cox. A maximum-flow formulation of the n-
camera stereo correspondence problem. ICCV, pages 492–
499, 1998.
[18] D. Scharstein and R. Szeliski. Stereo matching with nonlin-
ear diffusion. IJCV, 28(2):155–174, 1998.
[19] D. Scharstein and R. Szeliski. A taxonomy and evaluation
of dense two-frame stereo correspondence algorithms. IJCV,
47(1):7–42, April 2002.
[20] R. Szeliski. Bayesian modeling of uncertainty in low-level
vision. IJCV, 5(3):271–302, Dec 1990.
[21] H. Tao, H. Sawhney, and R. Kumar. A global matching
framework for stereo computation. ICCV, 1:532–539, July
2001.