Hindawi Publishing Corporation
Mathematical Problems in Engineering
Volume 2012, Article ID 856523, 17 pages
doi:10.1155/2012/856523
Research Article
Key Issues in Modeling of Complex 3D Structures
from Video Sequences
Shengyong Chen,1 Yuehui Wang,2 and Carlo Cattani3
1 College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
2 College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
3 Department of Mathematics, University of Salerno, Via Ponte Don Melillo, 84084 Fisciano, Italy
Correspondence should be addressed to Shengyong Chen, sy@ieee.org
Received 1 July 2011; Accepted 22 August 2011
Academic Editor: Gani Aldashev
Copyright © 2012 Shengyong Chen et al. This is an open access article distributed under the
Creative Commons Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
Construction of three-dimensional structures from video sequences has wide applications for intelligent video analysis. This paper summarizes the key issues of the theory and surveys the recent advances in the state of the art. Reconstruction of a scene object from video sequences often takes the basic principle of structure from motion with an uncalibrated camera. This paper lists the typical strategies and summarizes the typical solutions or algorithms for modeling of complex three-dimensional structures. Open difficult problems are also suggested for further study.
1. Introduction
Over the past two decades, many researchers have sought to reconstruct the model of a three-dimensional (3D) scene structure and camera motion from video sequences taken with an uncalibrated camera, or from unordered photo collections on the Internet. Most traditionally, depth measurement and 3D metric reconstruction can be done from two uncalibrated stereo images [1]. Nowadays, reconstructing a 3D scene from a moving camera is one of the most important issues in the field of computer vision. This is a very challenging task because of the demands it places on computational efficiency, generality, and exactitude, and because of the complexity of the problem. In this paper, we aim to show the development and current status of 3D reconstruction algorithms on this topic.

The basic concepts and knowledge of the problem can be found in the fundamentals of multiview geometry, through books and theses such as Multiple View Geometry in Computer Vision [2], The Geometry of Multiple Images [3], Triangulation [4], and some typical publications [5-8], which are largely self-contained references for implementing an entire system. Multiple-view geometry is fundamental in computer vision, and structure-from-motion algorithms are based on perspective geometry, affine geometry, and Euclidean geometry. For simultaneous computation of 3D points and camera positions, there is a linear algorithmic framework for Euclidean structure recovery that utilizes a scaled orthographic view and perspective views, based on having a reference plane visible in all views [9]. There is an affine framework for perspective views that is captured by a single, extremely simple equation based on a viewer-centered invariant, called relative affine structure [10]. A comprehensive method estimates scene structure and camera motion from an image sequence taken by affine cameras and can incorporate all point, line, and conic features in a unified manner [11]. Another approach calculates the cameras along with the 3D points, relying only on established correspondences between the observed images. These systems and their improvements are covered in many publications [2, 6, 12-15]. The literature gives a compact yet accessible overview covering a complete reconstruction system.
For multiview modeling of a rigid scene, an approach is presented in [16] which merges traditional approaches of reconstructing image-extractable features with modeling via user-provided geometry; it includes steps to obtain features for a first guess of the structure and motion, fit geometric primitives, correct the structure so that reconstructed features lie exactly on the geometric primitives, and optimize both structure and motion in a bundle adjustment manner. A nonlinear least-squares algorithm is presented in [17] for recovering 3D shape and motion from image streams.

Sparse 3D measurements of real scenes are readily estimated from N-view image sequences using structure-from-motion techniques. A fast algorithm for rigid structure from image sequences is given in [18]. Hilton presents a geometric theory for reconstruction of surface models from sparse 3D data captured from N camera views [19], and vanishing points can be used for 3D shape reconstruction [20]. Relative affine structure is given as a canonical model for 3D-from-2D geometry and its applications [10].
The work in [12] describes progress in automatically recovering 3D scene structure together with 3D camera positions from a sequence of images acquired by an unknown camera undergoing unknown movement. The main departure from previous structure-from-motion strategies is that the processing is not sequential; instead, a hierarchical approach is employed, building from image triplets and associated trifocal tensors. A method is presented for dealing with hundreds of images without precise calibration knowledge [21]. Optimizing just over the motion unknowns is fast and, given the recovered motion, one can recover the optimal structure algebraically for two images [4].
In fact, reconstruction of nonrigid scenes is very important in structure from motion. The recovery of 3D structure and camera motion for nonrigid scenes from single-camera video footage is a key problem in computer vision. For an implicit imaging model of nonrigid scenes, there is a nonrigid structure-from-motion algorithm based on computing matching tensors over subsequences; each nonrigid matching tensor is computed, along with the rank of the subsequence, using a robust estimator incorporating a model selection criterion that detects erroneous image points [22]. Uncalibrated motion capture can exploit articulated structure constraints [23], such as those of the human body; the technique shows promise as a means of creating 3D animations of dynamic activities such as sports events. For the problem of 3D reconstruction of nonrigid objects from uncalibrated image sequences, under the assumption of an affine camera and that the nonrigid object is composed of a rigid part and a deformation part, a stratification approach can be used to recover the structure of nonrigid objects by first reconstructing the structure in affine space and then upgrading it to Euclidean space [24]. In addition, a general framework of locally rigid motion for solving the M-point and N-view structure-from-motion problem for unknown bodies deforming under orthography is presented in [25]. An incremental approach is presented in [26], where a new framework for nonrigid structure from motion simultaneously addresses three significant challenges: severe occlusion, perspective camera projection, and large nonlinear deformation.
With the development of structure-from-motion algorithms, geometric constraints and optimization are necessary for reconstructing a good 3D model of an object or scene, and many useful approaches have been proposed. For example, a technique is proposed in [27] for estimating piecewise planar models of objects from their images and geometric constraints, and 3D structure can be recovered from a single calibrated view using distance constraints [28]. Marques and Costeira present an approach to estimating 3D shape from degenerate sequences with missing data [29]. Going beyond the epipolar constraint improves the performance of structure from motion [30].
3D affine measurements may be computed from a single perspective view of a scene, given only minimal geometric information determined from the image. This minimal information is typically the vanishing line of a reference plane and a vanishing point for a direction not parallel to the plane. Without camera parameters, Criminisi et al. [31] show how to (i) compute the distance between planes parallel to the reference plane, (ii) compute area and length ratios on any plane parallel to the reference plane, and (iii) determine the camera's location.

Direct estimation is the estimation of scene structure and camera motion directly from a sequence of images, with no computation of optical flow or feature correspondences required [32]. A good critique of structure-from-motion algorithms can be found in [33] by Oliensis.
The remainder of this paper is organized as follows. Section 2 briefly gives some typical applications of structure from video sequences. Section 3 introduces the general reconstruction principle of structure from video sequences and unstructured photo collections. Section 4 outlines the methods for structure and motion estimation. Section 5 discusses the relevant available algorithms for every step to obtain a better result. We offer our impressions of current and future trends in the topic and conclude the development in Sections 6 and 7.
2. Typical Applications
2.1. Modeling and Reconstruction of 3D Buildings or Landmarks
For 3D reconstruction of an object or building, Pollefeys et al. present a complete system to build visual models with a hand-held camera [6]. There is a system for photorealistic 3D reconstruction from hand-held cameras [34]. Sinha et al. [35] present an algorithm for interactive 3D architectural modeling from unordered photo collections. There is a fully automated 3D reconstruction and visualization system for architectural scenes, including interiors and exteriors [36]; the system combines structure from motion, multiview stereo, and a stereo algorithm.
The 3D models of historical relics and buildings, for example, the Emperor Qin's Terracotta Warriors and Piazza San Marco, have very significant meaning for archeologists. A system that can match and reconstruct 3D scenes from extremely large collections of photographs has been developed by Agarwal et al. [37]. A method for enabling existing multiview stereo algorithms to operate on extremely large unstructured photograph collections has been contrived by Furukawa et al. [38]; the approach is to decompose the collection into a set of overlapping sets of photos that can be processed in parallel and to merge the resulting reconstructions [38]. People who want to sightsee famous buildings or landscapes from the Internet can tour the world via a web-scale landmark recognition engine [39].

Modeling and recognizing landmarks at world scale is a useful yet challenging task. There exists no readily available list of worldwide landmarks. Obtaining reliable visual models for each landmark can also pose problems, and efficiency is another challenge for such a large-scale system. Zheng et al. leverage the vast amount of multimedia data on the web, the availability of Internet image search engines, and advances in object recognition and clustering techniques to address these issues [39].
2.2. Urban Reconstruction
Modeling the world and reconstructing a city present many challenges for a visualization system in computer vision; examples can be seen in products such as Google Earth and Google Maps. For instance, Pollefeys et al. [40] present a system for automatic, georegistered, real-time multiview stereo 3D reconstruction from long image sequences of urban scenes. The system collects video streams, as well as GPS and inertia measurements, in order to obtain the georegistered coordinates of the 3D models [40]. Faugeras et al. [41] address the problem of recovering a realistic textured model of a scene from a sequence of images, without any prior knowledge either about the parameters of the cameras or about their motion.
2.3. Navigation
If the world model or the city reconstruction is exhaustively completed, we can obtain the relative locations of buildings and find related views for navigation of robots or other vision systems. Photo Tourism can enable full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps [14]. It gives several modes of navigation, including free-flight navigation, moving between related views, object-based navigation, and creating stabilized slideshows. The system by Pollefeys et al. also contains a navigation function [40]. Supplying realistically textured 3D city models at ground level promises to be useful for previsualizing upcoming traffic situations in car navigation systems [42].
2.4. Visual Servoing
In the literature, there are applications that employ SfM algorithms successfully in practical engineering. For instance, based on structure from controlled motion or on robust statistics, a visual servoing system is presented in [43]. A general-purpose image understanding system via a control structure is designed by Marengoni et al. [44], and 3D video compression is achieved via topology matching [45]. More applications are being developed by researchers and engineers in the community.
2.5. Scene Recognition and Understanding
3D reconstruction has important applications in face recognition, facial expression analysis, and so on. Fidaleo and Medioni [46] design a model-assisted system for reconstruction of 3D faces from a single consumer-quality camera using a structure-from-motion approach. Park and Jain [47] present an algorithm for 3D-model-based face recognition in video.

Reconstruction of 3D scene geometry is an important element for scene understanding, autonomous vehicle and robot navigation, image retrieval, and 3D television [48]. Nedovic et al. propose accounting for the inherent structure of the visual world when trying to solve the scene reconstruction problem [48].
3. Information Organization
The goal of structure from motion is the automatic recovery of camera motion and scene structure from two or more images. The problem of using pixel correspondences or tracked points to determine camera and point geometry in this manner is known as structure from motion. It is a self-calibration technique, also called automatic camera tracking or match moving. We must consider several questions:

(1) Correspondence (feature extraction and tracking, or matching): given a point in one image, how does it constrain the position of the corresponding point in other images?

(2) Scene geometry (structure): given point matches in two or more images, where are the corresponding points in 3D?

(3) Camera geometry (motion): given a set of corresponding points in two or more images, what are the camera matrices for these views?

Based on these questions, we can give the 3D reconstruction pipeline as in Figure 1.
The goal of correspondence is to build a set of matching 2D pixel coordinates across the video sequence. It is a significant step in the structure-from-motion pipeline. Correspondence is always a challenging task in computer vision; so far, many researchers have developed practical and robust algorithms. Given a video sequence of a scene, how can we find matching points?

Firstly, there are some well-known algorithms for image sequences or videos; one popular choice is the KLT tracker [49-51]. It gives us an integrated system that can automatically detect KLT feature points and track them. However, it cannot cope with situations such as wide baselines, illumination changes, scale variation, duplicate and similar structure, occlusion, noise, and image distortion. Generally speaking, for video sequences the KLT tracker performs well. Figures 2 and 3 show examples of the feature points output by the KLT detector, with example images from http://www.ces.clemson.edu/stb/klt/.
In the KLT tracker [49-51], if the time interval between two frames of video is sufficiently short, we can suppose that the positions of feature points move but their intensities do not change; that is,

I(x, t) = I(δ(x), t + Δt),   (3.1)

where x is the position of a feature point and δ(x) is a transformation function.

In the papers of Lucas and Kanade [49], Tomasi and Kanade [50], and Shi and Tomasi [51], the authors made an important hypothesis that, for high enough frame rates, δ(x) can be approximated with a displacement vector d:

I(x, t) = I(x + d, t + Δt).   (3.2)
Figure 1: 3D reconstruction pipeline (video sequences → feature detection → feature matching → structure from motion → 3D point positions and camera poses).
Figure 2: Example set of detected KLT features.
Figure 3: Tracking trajectory of the KLT tracker through a video sequence.

Then a symmetric definition of the dissimilarity between two windows, one in image I(x, t) and one in image I(x + d, t + Δt), is as follows:

ε = ∫∫_W [I(x + d, t + Δt) − I(x, t)]² ω(x) dx,   (3.3)

where ω(x) is a weighting function, usually set to the constant 1. The algorithm calculates the vector d which minimizes ε. Utilizing the first-order Taylor expansion of I(x + d, t + Δt), truncating to the linear term, and setting the derivative of ε with respect to d to 0, we obtain the linear equation

Z d = e,   (3.4)

where Z is the following 2 × 2 matrix:

Z = ∫∫_W g(x) gᵀ(x) ω(x) dx,   (3.5)

and e is the following 2 × 1 vector:

e = ∫∫_W [I(x + d, t + Δt) − I(x, t)] g(x) ω(x) dx,   (3.6)

where g(x) = ∂I/∂x.
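As a concrete illustration, the Lucas-Kanade normal equations above can be sketched in a few lines. This is a minimal single-step version (solving Z d = e once from d = 0, with ω(x) = 1) on a synthetic quadratic patch; the function and variable names are ours, and NumPy is assumed:

```python
import numpy as np

def klt_step(I, J):
    """One Lucas-Kanade step: solve Z d = e for the displacement d that
    aligns frame I(x, t) with frame J(x) = I(x - d, t + dt), where g is the
    spatial image gradient and the weight w(x) is 1 everywhere."""
    gy, gx = np.gradient(I)                     # g(x) = dI/dx, per pixel
    g = np.stack([gx.ravel(), gy.ravel()], axis=1)
    Z = g.T @ g                                 # the 2 x 2 matrix of eq. (3.5)
    e = g.T @ (I - J).ravel()                   # the 2 x 1 vector of eq. (3.6)
    return np.linalg.solve(Z, e)                # d from Z d = e, eq. (3.4)

# synthetic frames: a smooth quadratic patch shifted by a known sub-pixel motion
xs, ys = np.meshgrid(np.arange(32.0), np.arange(32.0))

def patch(dx, dy):
    u, v = xs - dx, ys - dy
    return 0.05 * u**2 + 0.08 * v**2 + 0.02 * u * v

I, J = patch(0.0, 0.0), patch(0.3, -0.2)        # scene moved by (0.3, -0.2) px
d = klt_step(I, J)                              # d is close to (0.3, -0.2)
```

In a real tracker this step is applied iteratively per feature window and per pyramid level; a single step suffices here because the synthetic motion is small and the patch is smooth.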
On the other hand, for a completely unorganized set of images, the tracker becomes invalid. Another popular algorithm in the computer vision area is the scale-invariant feature transform (SIFT) [52]. It is effective for feature detection and matching under a wide class of image transformations, including rotations, scale changes, and changes in brightness or contrast, and for recognizing panoramas [53]. Figures 4 and 5 show examples of the feature points output by SIFT, with example images from http://www.cs.ubc.ca/lowe/keypoints/.
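Once descriptors have been extracted, SIFT-style matching is typically done by nearest-neighbour search with Lowe's ratio test. The following is a minimal sketch on toy descriptors (the 4-D vectors, ratio value, and names are ours, not from any particular implementation), assuming NumPy:

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8):
    """Nearest-neighbour descriptor matching with Lowe's ratio test: keep a
    match only when the best distance beats the second best by a clear
    margin, which rejects ambiguous matches on repeated structure."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j, k = np.argsort(dists)[:2]            # best and second-best neighbour
        if dists[j] < ratio * dists[k]:
            matches.append((i, j))
    return matches

# toy 4-D "descriptors": image 2 sees the same features, permuted and noisy
rng = np.random.default_rng(0)
desc1 = rng.normal(size=(5, 4))
perm = [3, 0, 4, 1, 2]
desc2 = desc1[perm] + 0.01 * rng.normal(size=(5, 4))
matches = match_descriptors(desc1, desc2)       # recovers the permutation
```

Real SIFT descriptors are 128-dimensional and the search is usually accelerated with a k-d tree, but the ratio-test logic is the same.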
Figure 4: Example set of detected SIFT features.

Figure 5: SIFT feature matches between images.

4. Structure and Motion Estimation

Assume that we have obtained a set of correspondences between images of a video sequence; we then use this set to reconstruct the 3D structure of each point in the set of correspondences and to recover the motion of the camera. This task is called structure from motion. The problem has been an active research topic in computer vision since the development of the Longuet-Higgins eight-point algorithm [54], which focused on reconstructing geometry from two views. In the literature [2], several different approaches to solving the structure-from-motion problem are given.
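The eight-point algorithm mentioned above reduces to a linear least-squares problem. Below is a sketch of the Hartley-normalized variant on a synthetic two-view scene; the camera intrinsics, point layout, and names are our own illustrative choices, and NumPy is assumed:

```python
import numpy as np

def eight_point(x1, x2):
    """Normalized eight-point estimate of the fundamental matrix F from
    n >= 8 correspondences x1[i] <-> x2[i] (inhomogeneous 2-D points),
    so that x2_h^T F x1_h = 0 for homogeneous image points."""
    def normalize(x):
        # translate to the centroid and scale to mean distance sqrt(2)
        c = x.mean(axis=0)
        s = np.sqrt(2) / np.mean(np.linalg.norm(x - c, axis=1))
        T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
        return np.column_stack([x, np.ones(len(x))]) @ T.T, T

    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # each correspondence gives one linear constraint on the 9 entries of F
    A = np.column_stack([p2[:, [0]] * p1, p2[:, [1]] * p1, p1])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)   # least-squares null vector
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0]) @ Vt       # enforce the rank-2 constraint
    F = T2.T @ F @ T1                           # undo the normalization
    return F / np.linalg.norm(F)

# synthetic two-view scene: random points in front of two translated cameras
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(20, 3)) + [0.0, 0.0, 5.0]
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1.0]])
P1 = K @ np.column_stack([np.eye(3), np.zeros(3)])
P2 = K @ np.column_stack([np.eye(3), np.array([0.5, 0.1, 0.0])])

def project(P, X):
    x = np.column_stack([X, np.ones(len(X))]) @ P.T
    return x[:, :2] / x[:, 2:]

x1, x2 = project(P1, X), project(P2, X)
F = eight_point(x1, x2)                         # x2_h^T F x1_h is ~0 for all pairs
```

The normalization step is what makes the linear solution numerically usable on pixel coordinates; without it, the design matrix A is badly conditioned.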
4.1. Factorization

There is a popular factorization algorithm for image streams under orthography, using many images and tracking many feature points to obtain highly redundant feature position information, first developed by Tomasi and Kanade [55] in the 1990s. The main idea of this algorithm is to factorize the tracking matrix into structure and motion matrices simultaneously via the singular value decomposition (SVD) with a low-rank approximation, taking advantage of the linear algebraic properties of orthographic projection.

However, an orthographic formulation limits the range of motions the method can accommodate. Paraperspective projection is a projection model that closely approximates perspective projection by modeling several effects not modeled under orthographic projection, while retaining linear algebraic properties [56, 57]. Poelman and Kanade [56] have developed
a paraperspective factorization method that can be applied to a much wider range of motion
scenarios, including image sequences containing motion toward the camera and aerial image
sequences of terrain taken from a low-altitude airplane.
With the development of the factorization method, a factorization-based algorithm for multi-image projective structure and motion was developed by Sturm and Triggs [57]. This technique is a practical approach for the recovery of scaled feature points, using the fundamental matrix and epipoles estimated from the image sequences.

Because matrix factorization is a key component for solving several computer vision problems, Tardif et al. have proposed batch algorithms for matrix factorization [58] that are based on closure and basis constraints and handle the presence of missing or erroneous data, which often arises in structure from motion.
In the mathematical expression of the factorization algorithm, assume that the tracked points are {(x_i^j, y_i^j) | i = 1, ..., n; j = 1, ..., m}. The algorithm defines the measurement matrix W = [U; V], where the m × n block U collects the x-coordinates and the m × n block V the y-coordinates. The rows of U and V are then registered by subtracting from each entry the mean of the entries in that row:

x̃_i^j = x_i^j − (1/n) Σ_{k=1}^{n} x_k^j,
ỹ_i^j = y_i^j − (1/n) Σ_{k=1}^{n} y_k^j.   (4.1)

The goal of the Tomasi-Kanade algorithm [55] is to factorize the registered matrix W̃ into two matrices:

W̃ = M X,   (4.2)

where the motion matrix M is a 2m × 3 matrix which represents the camera rotation in each frame, and the structure matrix X is a 3 × n matrix which denotes the positions of the feature points in object space. So, in the absence of Gaussian noise, rank(W̃) = 3.

We can then compute the SVD of W̃:

W̃ = U D Vᵀ,   (4.3)

where, with the singular values of W̃ ordered as σ_1 ≥ σ_2 ≥ σ_3, we can take M = [σ_1 u_1, σ_2 u_2, σ_3 u_3] and X = [v_1, v_2, v_3]ᵀ.
The method can also handle and obtain a full solution from a partially filled-in measurement matrix, which occurs when features appear and disappear in the video due to occlusions or tracking failures [55]. This method gives accurate results and does not introduce smoothing into structure and motion. Using the above method, the problem can be solved for video of general scenes such as buildings and sculptures (Figure 6).
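The rank-3 factorization of equations (4.1)-(4.3) can be sketched directly with an SVD. This is a minimal version on a synthetic orthographic sequence (the rotation schedule and names are ours), assuming NumPy, and it stops at the affine factorization, before the metric upgrade applied in the full Tomasi-Kanade method:

```python
import numpy as np

def factorize(W):
    """Tomasi-Kanade-style factorization: register the 2m x n measurement
    matrix W (eq. 4.1) and split it into motion M (2m x 3) and structure
    X (3 x n) via a rank-3 SVD (eqs. 4.2-4.3). The split is unique only up
    to an invertible 3 x 3 ambiguity, removed later by metric constraints."""
    Wt = W - W.mean(axis=1, keepdims=True)      # subtract each row's mean
    U, S, Vt = np.linalg.svd(Wt, full_matrices=False)
    M = U[:, :3] * S[:3]                        # motion: scaled singular vectors
    X = Vt[:3]                                  # structure
    return M, X, Wt

# synthetic orthographic sequence: m views of n points, camera rotating about y
rng = np.random.default_rng(2)
n, m = 12, 6
P = rng.normal(size=(3, n))                     # 3-D points
W = np.zeros((2 * m, n))
for j in range(m):
    a = 0.3 * j
    R = np.array([[np.cos(a), 0, np.sin(a)],
                  [0, 1, 0],
                  [-np.sin(a), 0, np.cos(a)]])
    # orthographic projection: first two rows of R, plus a per-view 2-D shift
    W[2 * j:2 * j + 2] = R[:2] @ P + rng.uniform(size=(2, 1))

M, X, Wt = factorize(W)                         # M @ X reproduces Wt exactly
```

On noiseless orthographic data the registered matrix has rank exactly 3, so the truncated SVD reproduces it without error; with noise, the same truncation gives the best rank-3 approximation.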
Figure 6: Example of recovering structure and motion.

4.2. Bundle Adjustment

Bundle adjustment is a significant component of most structure-from-motion systems. It is the joint nonlinear refinement of camera and point parameters, so it can consume a large amount of time for large problems. Unfortunately, the optimization underlying structure from motion involves a complex, nonlinear objective function with no closed-form solution, owing to nonlinearities in perspective geometry. Most modern approaches use nonlinear least-squares algorithms [17] to minimize this objective function, a process known as bundle adjustment [53]; the basic mathematics of the bundle adjustment problem is well understood [59]. Generally speaking, bundle adjustment is a global algorithm, but it consumes much time and cannot achieve real-time performance when solving the minimization. Mouragnon et al. [60] propose an approach for generic and real-time structure from motion using local bundle adjustment, which allows 3D points and camera poses to be refined simultaneously through the image sequence. Zhang et al. [61] apply bundle optimization to further improve the results of consistent depth maps from a video sequence.
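To make the objective concrete, the following is a deliberately tiny sketch of the reprojection-error minimization behind bundle adjustment. It refines only the 3-D points with the cameras held fixed (a "structure-only" slice of the full joint problem), uses a dense finite-difference Jacobian, and invents its own toy cameras and points; production solvers instead exploit the sparse block structure and refine cameras and points jointly. NumPy is assumed:

```python
import numpy as np

def reprojection_residuals(params, cams, obs, n_pts):
    """Stacked 2-D reprojection errors of all points in all views.
    params holds the flattened 3-D points; cams are fixed 3x4 matrices."""
    X = params.reshape(n_pts, 3)
    Xh = np.column_stack([X, np.ones(n_pts)])
    res = []
    for P, x in zip(cams, obs):                 # P: 3x4 camera, x: n x 2 points
        ph = Xh @ P.T
        res.append(ph[:, :2] / ph[:, 2:] - x)   # perspective division
    return np.concatenate(res).ravel()

def gauss_newton(params, fun, iters=10, eps=1e-6):
    """Plain Gauss-Newton with a forward-difference Jacobian."""
    for _ in range(iters):
        r = fun(params)
        J = np.empty((r.size, params.size))
        for k in range(params.size):
            dp = np.zeros_like(params)
            dp[k] = eps
            J[:, k] = (fun(params + dp) - r) / eps
        params = params - np.linalg.lstsq(J, r, rcond=None)[0]
    return params

# toy problem: two fixed cameras, five 3-D points, exact observations
rng = np.random.default_rng(3)
K = np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1.0]])
P1 = K @ np.column_stack([np.eye(3), np.zeros(3)])
P2 = K @ np.column_stack([np.eye(3), np.array([-1.0, 0.0, 0.0])])
X_true = rng.uniform(-1, 1, size=(5, 3)) + [0.0, 0.0, 6.0]
Xh = np.column_stack([X_true, np.ones(5)])
obs = [(Xh @ P.T)[:, :2] / (Xh @ P.T)[:, 2:] for P in (P1, P2)]

fun = lambda p: reprojection_residuals(p, (P1, P2), obs, 5)
x0 = (X_true + 0.05 * rng.normal(size=X_true.shape)).ravel()  # noisy init
x_opt = gauss_newton(x0, fun)                   # residuals driven to ~0
```

Even this toy version shows why bundle adjustment is expensive: the Jacobian grows with both the number of observations and the number of parameters, which is exactly what local and hierarchical variants such as [60] try to contain.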
4.3. Self-Calibration

To upgrade a projective or affine reconstruction to a metric reconstruction (i.e., one determined up to an arbitrary Euclidean transformation and a scale factor), calibration techniques, for which we follow the approach described in [2, 6, 9, 15, 62], can deal with this problem. It can be done by imposing some constraints on the intrinsic camera parameters. This approach, called self-calibration, has received a lot of attention in recent years. The ambiguity of the reconstruction is restricted from projective to metric through self-calibration [6]. Most self-calibration algorithms are concerned with unknown but constant intrinsic camera parameters [2, 4, 12]. The problem of 3D Euclidean reconstruction of structured scenes from uncalibrated images based on the property of vanishing points is presented in [63]. The authors propose a multistage linear approach, with a structure-from-motion technique based on point and vanishing-point matches in images [64].
4.4. Correlative Improvement

Traditional SfM algorithms using just two images often produce inaccurate 3D reconstructions, mainly due to incorrect estimation of the camera motion. Thomas and Oliensis [65] present a practical algorithm that can deal with noise in multiframe structure from motion. It describes a new incremental algorithm for reconstructing structure from multi-image sequences which estimates and corrects for the error in computing the camera motion.
Research on structure from motion has shown great progress over several decades, but structure-from-motion algorithms still exhibit some shortcomings, and their results remain unsatisfactory in many situations. Many researchers have therefore presented improvements, such as the dual computation of projective shape and camera positions from multiple images [66].

As an alternative to incremental algorithms that solve progressively larger bundle adjustment problems, Crandall et al. present a formulation for structure from motion based on finding a coarse initial solution using a hybrid discrete-continuous optimization and then improving the solution using bundle adjustment. The initial optimization step uses a discrete Markov random field (MRF) formulation, coupled with a continuous Levenberg-Marquardt refinement [67].

For time efficiency, Havlena et al. present a method of efficient structure from motion by graph optimization [68]. Gherardi et al. improve efficiency with hierarchical structure and motion [69].

For duplicate or similar structure, Roberts et al. couple an expectation-maximization (EM) algorithm with structure from motion for scenes with large duplicate structures [70]. A hierarchical framework resamples 3D reconstructed points to reduce the time and memory cost of very-large-scale structure from motion [71]. Savarese and Bao propose a formulation called semantic structure from motion (SSFM), where SSFM takes advantage of both the semantic and geometric properties associated with objects in the scene [72].
5. Relevant Algorithms

5.1. Features

(1) Line

For the problem of camera motion and 3D structure reconstruction from line correspondences across multiple views, there is a triangulation algorithm that outperforms standard linear and bias-corrected quasi-linear algorithms; bundle adjustment using an orthonormal representation yields results similar to the standard maximum-likelihood trifocal-tensor algorithm, while being usable for any number of views [73]. Spetsakis and Aloimonos [74] present a system for structure from motion using line correspondences. The recovery algorithm is formulated in terms of an objective function which measures the total squared distance in the image plane between the observed edge segments and the projections of the reconstructed lines [75]. A linear method has been developed for reconstruction using lines and points simultaneously [76].
(2) Curve

Tubic et al. [77] present an approach for reconstructing a surface from a set of arbitrary, unorganized, and intersecting curves. There is an approach for reconstructing open surfaces from image data [78]. Kaminski and Shashua [79] introduce a number of new results in the context of multiview geometry from general algebraic curves, starting with the recovery of camera geometry from matching curves. Berthilsson et al. present a method for reconstruction of general curves, using factorization and bundle adjustment [80].
(3) Silhouette

Liang and Wong [81] develop an approach that produces relatively complete 3D models, similar to volumetric approaches, with the topology conforming to what is observed from the silhouettes. In addition, the method neither assumes nor depends on the spatial order of viewpoints. Hartley and Kahl give critical configurations for projective reconstruction from multiple views in [82]. Joshi et al. design an algorithm for structure and motion estimation from dynamic silhouettes under perspective projection [83]. Liu et al. present a shape-from-silhouette method using an adaptive dandelion model [84]. Yemez and Wetherilt develop a volumetric fusion technique for surface reconstruction from silhouettes and range data [85].
5.2. Other Aspects

(1) Multiview Stereo

Multiview stereo (MVS) techniques take as input a set of images with known camera parameters (i.e., position and orientation of the camera, focal length, and image distortion parameters) [38, 53, 86]. We refer to [87] for a classification and evaluation of recent MVS techniques.

(2) Clustering

There are clustering techniques to partition the image set into groups of related images, based on the visual structure represented in the image connectivity graph for the collection [88, 89].
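A minimal stand-in for this clustering step is to treat each successful pairwise match as an edge of the image connectivity graph and take connected components. The union-find sketch below is our own illustration (the cited systems use richer criteria such as match counts and iconic-image selection), in plain Python:

```python
def image_clusters(n_images, match_pairs):
    """Partition an image set into groups of related images: treat pairwise
    match results as edges of the image connectivity graph and return its
    connected components via union-find."""
    parent = list(range(n_images))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]       # path halving
            a = parent[a]
        return a

    for a, b in match_pairs:                    # an edge = enough feature matches
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n_images):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# images 0-2 show one landmark, 3-4 another, image 5 matches nothing
clusters = image_clusters(6, [(0, 1), (1, 2), (3, 4)])
print(clusters)   # [[0, 1, 2], [3, 4], [5]]
```

Each resulting component can then be reconstructed independently, which is one reason clustering helps parallelize large-scale pipelines.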
6. Existing Problems and Future Trends

While structure-from-motion algorithms have been developed for 3D reconstruction in many applications, some problems of reconstructing geometry from video sequences still exist in computer vision and photography. Until recently, however, there have been no good computer vision techniques for recovering this kind of structure from motion. Many researchers are still making efforts to improve the methods, mainly in the following aspects.
6.1. Feature Tracking and Matching

Zhang et al. give a robust and efficient algorithm for nonconsecutive feature tracking for structure from motion via two main steps, namely, consecutive point tracking and nonconsecutive track matching [90]. They improve the KLT tracker with invariant feature points and a two-pass matching strategy to significantly extend the track lifetime and reduce the sensitivity of feature points to scale variation, duplicate and similar structure, noise, and image distortion. The results can be found at http://www.cad.zju.edu.cn/home/gfzhang/.
6.2. Active Vision

One method is based on structure from controlled motion, which constrains camera motions to obtain an optimal estimation of the 3D structure of a geometric primitive [91]. Stereo geometry can be acquired from 3D egomotion streams [92]. Wide-area egomotion estimation is acquired from known 3D structure [93]. Work on estimating the surface reflectance properties of a complex scene under captured natural illumination can be found in [94]. Other algorithms attempt to exploit the selective attention of human eyes.
6.3. Unorganized Images

To solve the resulting large-scale nonlinear optimization, the scene is reconstructed incrementally, starting from a single pair of images, then adding new images and points in rounds, and running a global nonlinear optimization after each round [53]. Structure from motion can be applied to photos found in the wild, reconstructing scenes from several large Internet photo collections [14]. The large redundancy in online photo collections means that a small fraction of the images may be sufficient to produce high-quality reconstructions. An investigation has begun to explore this by extracting image "skeletons" from large collections [95]. Perhaps the most important challenge is to find ways to effectively parallelize all the steps of the reconstruction pipeline to take advantage of multicore architectures and cloud computing [37, 38, 53, 89].
7. Conclusion

This paper has summarized the recent development of structure-from-motion algorithms that are able to metrically reconstruct complex scenes and objects. The wide applications have been addressed in the computer vision area. Typical contributions are introduced for feature point detection, tracking, matching, factorization, bundle adjustment, multiview stereo, self-calibration, line detection and matching, modeling, and so forth. Representative works are listed for readers to gain a general overview of the state of the art. Finally, a summary of existing problems and future trends in structure modeling is given.
Acknowledgments
This work was supported by the National Natural Science Foundation of China and Microsoft Research Asia (NSFC nos. 61173096, 60870002, and 60802087), the Zhejiang Provincial S&T Department (2010R10006, 2010C33095), and the Zhejiang Provincial Natural Science Foundation (R1110679).
References
[1] H. Kuar and K. Takaya, "Depth measurement and 3D metric reconstruction from two uncalibrated stereo images," in Proceedings of the Canadian Conference on Electrical and Computer Engineering (CCECE '07), pp. 1460–1463, April 2007.
[2] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2nd edition, 2004.
[3] O. Faugeras and Q. T. Luong, The Geometry of Multiple Images, MIT Press, Cambridge, Mass, USA, 2001.
[4] R. I. Hartley and P. Sturm, "Triangulation," in Proceedings of the American Image Understanding Workshop, pp. 957–966, 1994.
[5] P. F. McLauchlan and D. W. Murray, "Unifying framework for structure and motion recovery from image sequences," in Proceedings of the 5th International Conference on Computer Vision (ICCV '95), pp. 314–320, June 1995.
[6] M. Pollefeys, L. Van Gool, M. Vergauwen et al., "Visual modeling with a hand-held camera," International Journal of Computer Vision, vol. 59, no. 3, pp. 207–232, 2004.
[7] S. Avidan and A. Shashua, "Trajectory triangulation: 3D reconstruction of moving points from a monocular image sequence," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 4, pp. 348–357, 2000.
[8] N. Molton and M. Brady, "Practical structure and motion from stereo when motion is unconstrained," International Journal of Computer Vision, vol. 39, no. 1, pp. 5–23, 2000.
[9] A. Marugame, J. Katto, and M. Ohta, "Structure recovery with multiple cameras from scaled orthographic and perspective views," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 7, pp. 628–633, 1999.
[10] A. Shashua and N. Navab, "Relative affine structure: canonical model for 3D from 2D geometry and applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 9, pp. 873–883, 1996.
[11] F. Kahl and A. Heyden, "Affine structure and motion from points, lines and conics," International Journal of Computer Vision, vol. 33, no. 3, pp. 163–180, 1999.
[12] A. W. Fitzgibbon and A. Zisserman, "Automatic camera recovery for closed or open image sequences," in Proceedings of the ECCV, pp. 311–326, 1998.
[13] F. Schaffalitzky and A. Zisserman, "Multi-view matching for unordered image sets, or 'How do I organize my holiday snaps?'," in Proceedings of the IEEE Conference on Computer Vision, vol. 1, pp. 414–431, 2002.
[14] N. Snavely, S. M. Seitz, and R. Szeliski, "Photo tourism: exploring photo collections in 3D," in Proceedings of the International Conference on Computer Graphics and Interactive Technologies, pp. 835–846, 2006.
[15] R. Hartley, "Euclidean reconstruction from uncalibrated views," in Applications of Invariance in Computer Vision, J. L. Mundy, A. Zisserman, and D. Forsyth, Eds., Lecture Notes in Computer Science, 1994.
[16] A. Bartoli and P. Sturm, "Constrained structure and motion from multiple uncalibrated views of a piecewise planar scene," International Journal of Computer Vision, vol. 52, no. 1, pp. 45–64, 2003.
[17] R. Szeliski and S. B. Kang, "Recovering 3D shape and motion from image streams using nonlinear least squares," Journal of Visual Communication and Image Representation, vol. 5, no. 1, pp. 10–28, 1994.
[18] P. M. Q. Aguiar and J. M. F. Moura, "A fast algorithm for rigid structure from image sequences," in Proceedings of the IEEE International Conference on Image Processing (ICIP '99), vol. 3, pp. 125–129, Kobe, Japan, 1999.
[19] A. Hilton, "Scene modelling from sparse 3D data," Image and Vision Computing, vol. 23, no. 10, pp. 900–920, 2005.
[20] P. Parodi and G. Piccioli, "3D shape reconstruction by using vanishing points," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 2, pp. 211–217, 1996.
[21] M. Lhuillier, "Toward flexible 3D modeling using a catadioptric camera," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1560–1567, June 2007.
[22] A. Bartoli and S. I. Olsen, "A batch algorithm for implicit non-rigid shape and motion recovery," in Proceedings of the International Conference on Dynamical Vision, 2006.
[23] D. Liebowitz and S. Carlsson, "Uncalibrated motion capture exploiting articulated structure constraints," International Journal of Computer Vision, vol. 51, no. 3, pp. 171–187, 2003.
[24] G. Wang and Q. M. J. Wu, "Stratification approach for 3-D Euclidean reconstruction of nonrigid objects from uncalibrated image sequences," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 38, no. 1, pp. 90–101, 2008.
[25] J. Taylor, A. D. Jepson, and K. N. Kutulakos, "Non-rigid structure from locally-rigid motion," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 2761–2768, June 2010.
[26] S. Zhu, L. Zhang, and B. M. Smith, "Model evolution: an incremental approach to non-rigid structure from motion," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 1165–1172, June 2010.
[27] M. Farenzena and A. Fusiello, "Stabilizing 3D modeling with geometric constraints propagation," Computer Vision and Image Understanding, vol. 113, no. 11, pp. 1147–1157, 2009.
[28] R. Gong and G. Xu, "3D structure from a single calibrated view using distance constraints," IEICE Transactions on Information and Systems, vol. 87, no. 6, pp. 1527–1536, 2004.
[29] M. Marques and J. Costeira, "Estimating 3D shape from degenerate sequences with missing data," Computer Vision and Image Understanding, vol. 113, no. 2, pp. 261–272, 2009.
[30] T. Brodsky, C. Fermuller, and Y. Aloimonos, "Structure from motion: beyond the epipolar constraint," International Journal of Computer Vision, vol. 37, no. 3, pp. 231–258, 2000.
[31] A. Criminisi, I. Reid, and A. Zisserman, "Single view metrology," International Journal of Computer Vision, vol. 40, no. 2, pp. 123–148, 2000.
[32] J. Heel, "Direct estimation of structure and motion from multiple frames," MIT AI Lab. Memo 1190, Massachusetts Institute of Technology, Mass, USA, 1990.
[33] J. Oliensis, "Critique of structure-from-motion algorithms," Computer Vision and Image Understanding, vol. 80, no. 2, pp. 172–214, 2000.
[34] T. Rodriguez, P. Sturm, P. Gargallo et al., "Photorealistic 3D reconstruction from handheld cameras," Machine Vision and Applications, vol. 16, no. 4, pp. 246–257, 2005.
[35] S. N. Sinha, D. Steedly, R. Szeliski, M. Agrawala, and M. Pollefeys, "Interactive 3D architectural modeling from unordered photo collections," ACM Transactions on Graphics, vol. 27, no. 5, article 159, 2008.
[36] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, "Reconstructing building interiors from images," in Proceedings of the International Conference on Computer Vision, pp. 80–87, 2009.
[37] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski, "Building Rome in a day," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 72–79, Kyoto, Japan, October 2009.
[38] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, "Towards internet-scale multi-view stereo," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 1434–1441, San Francisco, Calif, USA, June 2010.
[39] Y. T. Zheng, M. Zhao, Y. Song et al., "Tour the world: building a web-scale landmark recognition engine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '09), pp. 1085–1092, June 2009.
[40] M. Pollefeys et al., "Detailed real-time urban 3D reconstruction from video," International Journal of Computer Vision, vol. 78, no. 2-3, pp. 143–167, 2008.
[41] O. Faugeras, L. Robert, S. Laveau et al., "3-D reconstruction of urban scenes from image sequences," Computer Vision and Image Understanding, vol. 69, no. 3, pp. 292–309, 1998.
[42] N. Cornelis, B. Leibe, K. Cornelis, and L. Van Gool, "3D urban scene modeling integrating recognition and reconstruction," International Journal of Computer Vision, vol. 78, no. 2-3, pp. 121–141, 2008.
[43] C. Collewet and F. Chaumette, "Visual servoing based on structure from controlled motion or on robust statistics," IEEE Transactions on Robotics, vol. 24, no. 2, pp. 318–330, 2008.
[44] M. Marengoni, A. Hanson, S. Zilberstein, and E. Riseman, "Decision making and uncertainty management in a 3D reconstruction system," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 852–858, 2003.
[45] T. Tung, F. Schmitt, and T. Matsuyama, "Topology matching for 3D video compression," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 2719–2726, June 2007.
[46] D. Fidaleo and G. Medioni, "Model-assisted 3D face reconstruction from video," in Proceedings of the 3rd International Workshop on Analysis and Modeling of Faces and Gestures (AMFG '07), vol. 4778 of Lecture Notes in Computer Science, pp. 124–138, 2007.
[47] U. Park and A. Jain, "3D model-based face recognition in video," in Proceedings of the International Conference on Advances in Biometrics (ICB '07), vol. 4642 of Lecture Notes in Computer Science, pp. 1085–1094, 2007.
[48] V. Nedovic, A. W. M. Smeulders, A. Redert, and J. M. Geusebroek, "Stages as models of scene geometry," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1673–1687, 2010.
[49] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674–679, 1981.
[50] C. Tomasi and T. Kanade, "Detection and tracking of point features," Tech. Rep. CMU-CS-91-132, CMU, 1991.
[51] J. Shi and C. Tomasi, "Good features to track," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593–600, June 1994.
[52] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[53] N. Snavely, I. Simon, M. Goesele, R. Szeliski, and S. M. Seitz, "Scene reconstruction and visualization from community photo collections," Proceedings of the IEEE, vol. 98, no. 8, Article ID 5483186, pp. 1370–1390, 2010.
[54] H. C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," Nature, vol. 293, no. 5828, pp. 133–135, 1981.
[55] C. Tomasi and T. Kanade, "Shape and motion from image streams under orthography: a factorization method," International Journal of Computer Vision, vol. 9, no. 2, pp. 137–154, 1992.
[56] C. J. Poelman and T. Kanade, "A paraperspective factorization method for shape and motion recovery," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 3, pp. 206–218, 1997.
[57] P. Sturm and B. Triggs, "A factorization based algorithm for multi-image projective structure and motion," in Proceedings of the 4th European Conference on Computer Vision, 1996.
[58] J. P. Tardif, A. Bartoli, M. Trudeau, N. Guilbert, and S. Roy, "Algorithms for batch matrix factorization with application to structure-from-motion," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007.
[59] B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, "Bundle adjustment—a modern synthesis," in Proceedings of the International Workshop on Vision Algorithms, pp. 298–372, 1999.
[60] E. Mouragnon, M. Lhuillier, M. Dhome, F. Dekeyser, and P. Sayd, "Generic and real-time structure from motion using local bundle adjustment," Image and Vision Computing, vol. 27, no. 8, pp. 1178–1193, 2009.
[61] G. Zhang, J. Jia, T. T. Wong, and H. Bao, "Consistent depth maps recovery from a video sequence," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 6, pp. 974–988, 2009.
[62] T. Jebara, A. Azarbayejani, and A. Pentland, "3D structure from 2D motion," IEEE Signal Processing Magazine, vol. 16, no. 3, pp. 66–84, 1999.
[63] G. Wang, H. T. Tsui, and Q. M. Jonathan Wu, "What can we learn about the scene structure from three orthogonal vanishing points in images," Pattern Recognition Letters, vol. 30, no. 3, pp. 192–202, 2009.
[64] S. N. Sinha, D. Steedly, and R. Szeliski, "A multi-stage linear approach to structure from motion," in Proceedings of the European Conference on Computer Vision (ECCV '10), 2010.
[65] J. I. Thomas and J. Oliensis, "Dealing with noise in multiframe structure from motion," Computer Vision and Image Understanding, vol. 76, no. 2, pp. 109–124, 1999.
[66] S. Carlsson and D. Weinshall, "Dual computation of projective shape and camera positions from multiple images," International Journal of Computer Vision, vol. 27, no. 3, pp. 227–241, 1998.
[67] D. Crandall, A. Owens, N. Snavely, and D. Huttenlocher, "Discrete-continuous optimization for large-scale structure from motion," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), 2011.
[68] M. Havlena, A. Torii, and T. Pajdla, "Efficient structure from motion by graph optimization," in Proceedings of the European Conference on Computer Vision (ECCV '10), 2010.
[69] R. Gherardi, M. Farenzena, and A. Fusiello, "Improving the efficiency of hierarchical structure-and-motion," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 1594–1600, June 2010.
[70] R. Roberts, S. Sinha, R. Szeliski, and D. Steedly, "Structure from motion for scenes with large duplicate structures," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), 2011.
[71] T. Fang and L. Quan, "Resampling structure from motion," in Proceedings of the European Conference on Computer Vision (ECCV '10), 2010.
[72] S. Savarese and S. Y. Z. Bao, "Semantic structure from motion," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), 2011.
[73] A. Bartoli and P. Sturm, "Structure-from-motion using lines: representation, triangulation, and bundle adjustment," Computer Vision and Image Understanding, vol. 100, no. 3, pp. 416–441, 2005.
[74] M. E. Spetsakis and J. Aloimonos, "Structure from motion using line correspondences," International Journal of Computer Vision, vol. 4, no. 3, pp. 171–183, 1990.
[75] C. J. Taylor and D. J. Kriegman, "Structure and motion from line segments in multiple images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 11, pp. 1021–1032, 1995.
[76] R. I. Hartley, "Linear method for reconstruction from lines and points," in Proceedings of the 5th International Conference on Computer Vision (ICCV '95), pp. 882–887, June 1995.
[77] D. Tubic, P. Hebert, and D. Laurendeau, "3D surface modeling from curves," Image and Vision Computing, vol. 22, no. 9, pp. 719–734, 2004.
[78] J. E. Solem and A. Heyden, "Reconstructing open surfaces from image data," International Journal of Computer Vision, vol. 69, no. 3, pp. 267–275, 2006.
[79] J. Y. Kaminski and A. Shashua, "Multiple view geometry of general algebraic curves," International Journal of Computer Vision, vol. 56, no. 3, pp. 195–219, 2004.
[80] R. Berthilsson, K. Astrom, and A. Heyden, "Reconstruction of general curves, using factorization and bundle adjustment," International Journal of Computer Vision, vol. 41, no. 3, pp. 171–182, 2001.
[81] C. Liang and K. Y. K. Wong, "3D reconstruction using silhouettes from unordered viewpoints," Image and Vision Computing, vol. 28, no. 4, pp. 579–589, 2010.
[82] R. Hartley and F. Kahl, "Critical configurations for projective reconstruction from multiple views," International Journal of Computer Vision, vol. 71, no. 1, pp. 5–47, 2007.
[83] T. Joshi, N. Ahuja, and J. Ponce, "Structure and motion estimation from dynamic silhouettes under perspective projection," International Journal of Computer Vision, vol. 31, no. 1, pp. 31–50, 1999.
[84] X. Liu, H. Yao, and W. Gao, "Shape from silhouette outlines using an adaptive dandelion model," Computer Vision and Image Understanding, vol. 105, no. 2, pp. 121–130, 2007.
[85] Y. Yemez and C. J. Wetherilt, "A volumetric fusion technique for surface reconstruction from silhouettes and range data," Computer Vision and Image Understanding, vol. 105, no. 1, pp. 30–41, 2007.
[86] N. Snavely, S. M. Seitz, and R. Szeliski, "Modeling the world from Internet photo collections," International Journal of Computer Vision, vol. 80, no. 2, pp. 189–210, 2008.
[87] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, "A comparison and evaluation of multi-view stereo reconstruction algorithms," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 519–526, June 2006.
[88] I. Simon, N. Snavely, and S. M. Seitz, "Scene summarization for online image collections," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), October 2007.
[89] S. Agarwal, Y. Furukawa, N. Snavely, B. Curless, S. M. Seitz, and R. Szeliski, "Reconstructing Rome," Computer, vol. 43, no. 6, pp. 40–47, 2010.
[90] G. Zhang, Z. Dong, J. Jia, T. T. Wong, and H. Bao, "Efficient non-consecutive feature tracking for structure-from-motion," in Proceedings of the European Conference on Computer Vision (ECCV '10), 2010.
[91] E. Marchand and F. Chaumette, "Active vision for complete scene reconstruction and exploration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 1, pp. 65–72, 1999.
[92] F. Dornaika and C. K. R. Chung, "Stereo geometry from 3-D ego-motion streams," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 33, no. 2, pp. 308–323, 2003.
[93] O. Koch and S. Teller, "Wide-area egomotion estimation from known 3D structure," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 437–444, June 2007.
[94] P. Debevec, C. Tchou et al., "Estimating surface reflectance properties of a complex scene under captured natural illumination," Tech. Rep. ICT-TR-06.2004, University of Southern California Institute for Creative Technologies, Marina del Rey, Calif, USA, 2004.
[95] N. Snavely, S. M. Seitz, and R. Szeliski, "Skeletal graphs for efficient structure from motion," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008.
Submit your manuscripts at
http://www.hindawi.com
Operations
Research
Advances in
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
Mathematical Problems
in Engineering
Abstract and
Applied Analysis
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
ISRN
Applied
Mathematics
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
Hindawi Publishing Corporation
http://www.hindawi.com
Volume 2013
International Journal of
Combinatorics
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
Journal of Function Spaces
and Applications
International
Journal of
Mathematics and
Mathematical
Sciences
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
ISRN
Geometry
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
Discrete Dynamics in
Nature and Society
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
Hindawi Publishing Corporation
http://www.hindawi.com
Volume 2013
Advances in
Mathematical Physics
ISRN
Algebra
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
Probability
and
Statistics
Journal of
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
ISRN
Mathematical
Analysis
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
Journal of
Applied Mathematics
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
Advances in
Decision
Sciences
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
Stochastic Analysis
International Journal of
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
The Scientic
World Journal
Hindawi Publishing Corporation
http://www.hindawi.com Volume 2013
ISRN
Discrete
Mathematics
Hindawi Publishing Corporation
http://www.hindawi.com
Dierential Equations
International Journal of
Volume 2013
... They found that the error was always generated by all methods and represented as quartic form in the unknowns. However, most methods might ignore some of the terms with nested leastsquares solutions [36,37,51]. It might lead an error greatly with noisy data. ...
Article
Non-rigid Structure from Motion (NRSfM) is a classical computer vision problem. And the main methods used to solve it are in general based on shape models or trajectory models. This paper will provide an overview over kinds of solutions proposed in these researches. It not only gives out the theoretical insights proposed by researchers in recent years, but also discusses them with their pros and cons. At the same time, the progress of the research about this topic is described in detail and its long-term trend is introduced at the end. This paper is very easy to understand, which mainly introduces two practical, everyday models for the NRSfM problem, namely trajectories based model and shape based model. Both of them are based on matrix factorization technology. Inevitably, some relevant optimization methods are mentioned to solve the projection matrix and corresponding coefficients effectively. © 2015 Binary Information Press & Textile Bioengineering and Informatics Society December 2015.
... Such methods are usually not very robust to the environment. Moreover, monocular vision methods can gain 3D information from a sequence of images from different views (shape from motion, SFM) (Chen et al., 2012). The SFM method has a high time cost and space cost. ...
Article
Full-text available
This paper presents an automatic three-dimensional reconstruction method based on multi-view stereo vision for the Mogao Grottoes. 3D digitization technique has been used in cultural heritage conservation and replication over the past decade, especially the methods based on binocular stereo vision. However, mismatched points are inevitable in traditional binocular stereo matching due to repeatable or similar features of binocular images. In order to reduce the probability of mismatching greatly and improve the measure precision, a portable four-camera photographic measurement system is used for 3D modelling of a scene. Four cameras of the measurement system form six binocular systems with baselines of different lengths to add extra matching constraints and offer multiple measurements. Matching error based on epipolar constraint is introduced to remove the mismatched points. Finally, an accurate point cloud can be generated by multi-images matching and sub-pixel interpolation. Delaunay triangulation and texture mapping are performed to obtain the 3D model of a scene. The method has been tested on 3D reconstruction several scenes of the Mogao Grottoes and good results verify the effectiveness of the method.
... Image sequence contains more depth information than a single image [20]. Assuming that an object moves from left to right in image, the object with large motion distance appears to be closer than the object with small motion distance. ...
Article
Full-text available
A method for estimating the depth information of a general monocular image sequence and then creating a 3D stereo video is proposed. Distinguishing between foreground and background is possible without additional information, and then foreground pixels are moved to create the binocular image. The proposed depth estimation method is based on coarse-to-fine strategy. By applying the CID method in the spatial domain, the sharpness and the contrast of an image can be improved by the distance of the region based on its color. Then a coarse depth map of the image can be generated. An optical-flow method based on temporal information is then used to search and compare the block motion status between previous and current frames, and then the distance of the block can be estimated according to the amount of block motion. Finally, the static and motion depth information is integrated to create the fine depth map. By shifting foreground pixels based on the depth information, a binocular image pair can be created. A sense of 3D stereo can be obtained without glasses by an autostereoscopic 3D display.
Chapter
Full-text available
Medical imaging techniques are a vital tool in disease diagnosis. The images are being developed to satisfy the growing need for important information from medical image scans by anticipating constitutional tissues for clinical analysis. The application of deep learning techniques is increasing with the demand for automatic diagnosis of medical imaging. Different layers are used in deep learning models to represent data abstraction and construct computational models. Imaging techniques allow medical experts such as radiologists to correctly recognize a patient’s condition, making medical procedures more accessible and automated. The review’s primary goal is to present a study on recent brain tumor detection segmentation and classification approaches. Brain tumors are reviewed because of their importance compared to other tumors and their high illness rate. Many brain tumor segmentation models have been described to grasp these methodologies well, along with their limits and benefits. The study focuses primarily on contemporary deep learning-based brain tumor detection technologies, such as deep generative and deep learning networks. The more advanced and recent techniques available in the literature are also reviewed to describe the methods for performing image segmentation and to emphasize the importance of segmentation models that are not used in real-time due to little or no interaction between clinicians and developers. Most research does not consider the data augmentation element of brain tumor segmentation, which is critical for improving performance. The most challenging feature, or limitation, is the fluctuation in the morphology of tumors or the intensity degree of tumors, both of which still require study in this arena.
Article
Based on a new definition for derivative and integral of fractional-order, several fractional masks have been presented for the use of image denoising. In each method, the process involves constructing a square and then applying it to all the corresponding blocks in the noisy image. We have measured the denoising performance of our proposed masks by employing some known indexes. They are the peak signal-to-noise ratio (PSNR), ENTROPY, and SSIM. The obtained experimental results show that our proposed masks are computationally efficient, and their performances are compatible with other standard and fractional smoothing filters.
Article
An automatic three-dimensional (3D) reconstruction method based on four-view stereo vision using checkerboard pattern is presented. Mismatches easily exist in traditional binocular stereo matching due to the repeatable or similar features of binocular images. In order to reduce the probability of mismatching and improve the measure precision, a four-camera measurement system which can add extra matching constraints and offer multiple measurements is applied in this work. Moreover, a series of different checkerboard patterns are projected onto the object to obtain dense feature points and remove mismatched points. Finally, the 3D model is generated by performing Delaunay triangulation and texture mapping on the point cloud obtained by four-view matching. This method was tested on the 3D reconstruction of a terracotta soldier sculpture and the Buddhas in the Mogao Grottoes. Their point clouds without mismatched points were obtained and less processing time was consumed in most cases relative to binocular matching. These good reconstructed models show the effectiveness of the method. © 2017, Central South University Press and Springer-Verlag GmbH Germany.
Conference Paper
This paper presents a novel Matching Propagation Framework for addressing the problem of finding better matching pairs between each two images, which is one of the most fundamental tasks in computer vision and pattern recognition. We first select initial seed points by original matching method like SIFT, and then use T-CM to explore more seed points. Finally, a triangle constraint based quasi-dense algorithm is adopted to propagate better matches around seed points. The experimental evaluation shows that our method can get a more precise matching result than classical quasi-dense algorithm. And the 3D reconstruction of the scene from our method has a good visual effect. Both experiments demonstrate the robust performance of our method.
Conference Paper
This paper introduces a Kinect driven 3D character animation system using semantically skeleton. The system obtains the capture character performance animation data from the Kinect device. For the motion data from Kinect is semantically, a semantic skeleton is embedded into the character model. The semantic skeleton was firstly selected from a motion capture database, and then using the normal characteristic value around joint can be computed out in the called joint surface. In the real time animation synthesis process, the system embedded the skeleton based on the feature points to make the 3D character movement with the performer before Kinect realistically. Several experiments are carried out to demonstrate the efficiency of the proposed method.
Article
Non-rigid structure-from-motion is one of the difficult and challenging problems in computer vision, especially when the only input available is 2D correspondences in monocular video sequence. This paper proposed a new constraint based framework for underconstrained non-rigid structure-from-motion problem to constrain the space of solution. The proposed method is based on a point trajectory approach with an additional uniqueness constraint applied to shape coefficients to reduce the basis required to construct the non-rigid 3D shape. A framework for occluded and incomplete measured data is also proposed using low rank matrix fitting which is a robust factorization scheme for the matrix completion problem. This method offers not only new theoretical insight, but also a practical, everyday solution, to non-rigid structure-from-motion. The proposed method is positively compared to the state-of-the-art in non-rigid structure-from-motion, providing improved results on high-frequency deformations of both articulated and simpler deformable shapes.
Article
Full-text available
We present a process for estimating spatially-varying surface re- flectance of a complex scene observed under natural illumination conditions. The process uses a laser-scanned model of the scene's geometry, a set of digital images viewing the scene's surfaces under a variety of natural illumination conditions, and a set of correspond- ing measurements of the scene's incident illumination in each pho- tograph. The process then employs an iterative inverse global illu- mination technique to compute surface colors for the scene which, when rendered under the recorded illumination conditions, best re- produce the scene's appearance in the photographs. In our process we measure BRDFs of representative surfaces in the scene to better model the non-Lambertian surface reflectance. Our process uses a novel lighting measurement apparatus to record the full dynamic range of both sunlit and cloudy natural illumination conditions. We employ Monte-Carlo global illumination, multiresolution geome- try, and a texture atlas system to perform inverse global illumina- tion on the scene. The result is a lighting-independent model of the scene that can be re-illuminated under any form of lighting. We demonstrate the process on a real-world archaeological site, show- ing that the technique can produce novel illumination renderings consistent with real photographs as well as reflectance properties that are consistent with ground-truth reflectance measurements.
Article
We present a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface. Our system consists of an image-based modeling front end that automatically computes the viewpoint of each photograph as well as a sparse 3D model of the scene and image-to-model correspondences. Our photo explorer uses image-based rendering techniques to smoothly transition between photographs, while also enabling full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps. Our system also makes it easy to construct photo tours of scenic or historic locations, and to annotate image details, which are automatically transferred to other relevant images. We demonstrate our system on several large personal photo collections as well as images gathered from Internet photo sharing sites.
Article
The factorization method described in this series of reports requires an algorithm to track the motion of features in an image stream. Given the small inter-frame displacement made possible by the factorization approach, the best tracking method turns out to be the one proposed by Lucas and Kanade in 1981. The method defines the measure of match between fixed-size feature windows in the past and current frame as the sum of squared intensity differences over the windows. The displacement is then defined as the one that minimizes this sum. For small motions, a linearization of the image intensities leads to a Newton-Raphson style minimization. In this report, after rederiving the method in a physically intuitive way, we answer the crucial question of how to choose the feature windows that are best suited for tracking. Our selection criterion is based directly on the definition of the tracking algorithm, and expresses how well a feature can be tracked. As a result, the criterion is optima...
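The window-selection criterion described in this abstract keeps a window when the smaller eigenvalue of its 2×2 gradient matrix is large, i.e. when the sum-of-squared-differences surface is well conditioned in both directions. A hedged sketch of that score, using toy gradient windows (names and values are illustrative only):

```python
import numpy as np

def trackability(Ix, Iy):
    """Smaller eigenvalue of the 2x2 gradient matrix
    G = sum over the window of [Ix^2, Ix*Iy; Ix*Iy, Iy^2].
    A window is well suited for tracking when this value is large.
    Illustrative sketch of the selection criterion, not the original code."""
    G = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return np.linalg.eigvalsh(G)[0]  # eigenvalues in ascending order

# a corner-like window (gradients in both directions) beats a pure edge,
# whose gradient matrix is rank deficient
gx = np.array([[1.0, -1.0], [1.0, -1.0]])
gy = np.array([[1.0, 1.0], [-1.0, -1.0]])
print(trackability(gx, gy) > trackability(gx, np.zeros_like(gx)))  # True
```

The same matrix G also appears in the Newton-Raphson displacement update itself, which is why the criterion follows directly from the definition of the tracker.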
Conference Paper
The paper proposes a statistical framework that enables 3D structure and motion to be computed optimally from an image sequence, on the assumption that feature measurement errors are independent and Gaussian distributed. The analysis and results demonstrate that computing both camera/scene motion and 3D structure is essential to computing either with any accuracy. Having computed optimal estimates of structure and motion over a small number of initial images, a recursive version of the algorithm (previously reported) recomputes suboptimal estimates given new image data. The algorithm is designed explicitly for real-time implementation, and the complexity is proportional to the number of tracked features. 3D projective, affine, and Euclidean models of structure and motion recovery have been implemented, incorporating both point and line features into the computation. The framework can handle any feature type and camera model that may be encapsulated as a projection equation from scene to image.
Article
A simple algorithm for computing the three-dimensional structure of a scene from a correlated pair of perspective projections is described here, when the spatial relationship between the two projections is unknown. This problem is relevant not only to photographic surveying [1] but also to binocular vision [2], where the non-visual information available to the observer about the orientation and focal length of each eye is much less accurate than the optical information supplied by the retinal images themselves. The problem also arises in monocular perception of motion [3], where the two projections represent views which are separated in time as well as space. As Marr and Poggio [4] have noted, the fusing of two images to produce a three-dimensional percept involves two distinct processes: the establishment of a 1:1 correspondence between image points in the two views—the 'correspondence problem'—and the use of the associated disparities for determining the distances of visible elements in the scene. I shall assume that the correspondence problem has been solved; the problem of reconstructing the scene then reduces to that of finding the relative orientation of the two viewpoints.
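Once the correspondence problem is assumed solved, the relative orientation can be estimated linearly from the epipolar constraint x₂ᵀ E x₁ = 0, the idea this article introduced and which later became known as the eight-point algorithm. A minimal sketch on synthetic noise-free data (function and variable names are illustrative):

```python
import numpy as np

def essential_eight_point(x1, x2):
    """Linear estimate of the essential matrix E satisfying x2^T E x1 = 0,
    from >= 8 correspondences given as 3 x N arrays of homogeneous rays.
    Sketch of the classical linear recipe, not the article's original code."""
    A = np.stack([np.kron(x2[:, i], x1[:, i]) for i in range(x1.shape[1])])
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)   # null vector of A, reshaped row-major

# synthetic two-view setup: camera 2 is camera 1 rotated by R and moved by t
rng = np.random.default_rng(1)
t = np.array([1.0, 0.2, 0.1])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
X = rng.uniform(1, 5, (3, 12))        # scene points in front of camera 1
x1 = X
x2 = R @ X + t[:, None]
E = essential_eight_point(x1, x2)
res = np.abs(np.einsum('in,ij,jn->n', x2, E, x1)).max()
print(res < 1e-8)  # True: noise-free data satisfies the constraint exactly
```

The recovered E is defined only up to scale; in practice it is then decomposed into the rotation and translation direction between the two viewpoints.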
Conference Paper
We present a new structure from motion (Sfm) technique based on point and vanishing point (VP) matches in images. First, all global camera rotations are computed from VP matches as well as relative rotation estimates obtained from pairwise image matches. A new multi-staged linear technique is then used to estimate all camera translations and 3D points simultaneously. The proposed method involves first performing pairwise reconstructions, then robustly aligning these in pairs, and finally aligning all of them globally by simultaneously estimating their unknown relative scales and translations. In doing so, measurements inconsistent in three views are efficiently removed. Unlike sequential Sfm, the proposed method treats all images equally, is easy to parallelize, and does not require intermediate bundle adjustments. There is also a reduction of drift and significant speedups of up to two orders of magnitude over sequential Sfm. We compare our method with a standard Sfm pipeline [1] and demonstrate that our linear estimates are accurate on a variety of datasets, and can serve as good initializations for final bundle adjustment. Because we exploit VPs when available, our approach is particularly well-suited to the reconstruction of man-made scenes.
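One ingredient behind such global rotation estimation is combining several (noisy, linear) rotation estimates and projecting the result back onto the rotation group via SVD. The sketch below shows only that projection step, not the paper's full multi-staged pipeline; all names are hypothetical:

```python
import numpy as np

def project_to_so3(M):
    """Return the rotation matrix nearest to M in Frobenius norm (via SVD)."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:   # guard against an improper (reflected) result
        U[:, -1] *= -1
        R = U @ Vt
    return R

def rot_z(a):
    """Rotation by angle a about the z axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# average two noisy estimates of the same rotation, then re-project:
# the linear mean is not a rotation, but its SO(3) projection is
R_avg = project_to_so3(0.5 * (rot_z(0.30) + rot_z(0.32)))
print(np.allclose(R_avg @ R_avg.T, np.eye(3)))  # True: a valid rotation
```

For rotations about a common axis, this projection recovers the rotation at the mean angle, which is why linear averaging followed by SO(3) projection is a common building block in global Sfm pipelines.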