
Key Issues in Modeling of Complex 3D Structures from Video Sequences

January 2012
Mathematical Problems in Engineering 2012(9)


DOI:10.1155/2012/856523

License
CC BY 3.0

Authors:

Carlo Cattani

Tuscia University

Construction of three-dimensional structures from video sequences has wide applications for intelligent video analysis. This paper summarizes the key issues of the theory and surveys the recent advances in the state of the art. Reconstruction of a scene object from video sequences often takes the basic principle of structure from motion with an uncalibrated camera. This paper lists the typical strategies and summarizes the typical solutions or algorithms for modeling of complex three-dimensional structures. Open difficult problems are also suggested for further study.


Hindawi Publishing Corporation

Mathematical Problems in Engineering

Volume 2012, Article ID 856523, 17 pages

doi:10.1155/2012/856523

Research Article

Key Issues in Modeling of Complex 3D Structures from Video Sequences

Shengyong Chen,1 Yuehui Wang,2 and Carlo Cattani3

1 College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China

2 College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

3 Department of Mathematics, University of Salerno, Via Ponte Don Melillo, 84084 Fisciano, Italy

Correspondence should be addressed to Shengyong Chen, sy@ieee.org

Received 1 July 2011; Accepted 22 August 2011

Academic Editor: Gani Aldashev

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


1. Introduction

Over the past two decades, many researchers have sought to reconstruct the model of a three-dimensional (3D) scene structure and camera motion from video sequences taken with an uncalibrated camera or from unordered photo collections from the Internet. Most traditionally, depth measurement and 3D metric reconstruction can be done from two uncalibrated stereo images [1]. Nowadays, reconstructing a 3D scene from a moving camera is one of the most important issues in the field of computer vision. This is a very challenging task because of the demands it places on computational efficiency, generality, complexity, and exactitude. In this paper, we aim to show the development and current status of the 3D reconstruction algorithms on this topic.

The basic concepts and background of the problem can be found in the fundamentals of multiview geometry, through books and theses such as Multiple View Geometry in Computer Vision [2], The Geometry of Multiple Images [3], and Triangulation [4], and some typical publications [5–8], which are self-contained references for implementing an entire system. Multiple-view geometry is most fundamental in computer vision, and the algorithms of structure from motion are based on perspective geometry, affine geometry, and Euclidean geometry. For simultaneous computation of 3D points and camera positions, there is a linear algorithm framework for Euclidean structure recovery utilizing a scaled orthographic view and perspective views, based on having a reference plane visible in all views [9]. There is an affine framework for perspective views that is captured by a single extremely simple equation based on a viewer-centered invariant, called relative affine structure [10]. A comprehensive method is used for estimating scene structure and camera motion from an image sequence taken by affine cameras, which can incorporate all point, line, and conic features in a unified manner [11]. The other approach tries to calculate the cameras along with the 3D points, relying only on established correspondences between the observed images. These systems and improvements are covered in many publications [2, 6, 12–15]. The literature gives a compact yet accessible overview covering a complete reconstruction system.

For multiview modeling of a rigid scene, an approach is presented in [16] which merges traditional approaches to reconstructing image-extractable features with modeling via user-provided geometry. It includes steps to obtain features for a first guess of the structure and motion, fit geometric primitives, correct the structure so that reconstructed features lie exactly on geometric primitives, and optimize both structure and motion in a bundle adjustment manner. A nonlinear least squares algorithm is presented in [17] for recovering 3D shape and motion from image streams.

Sparse 3D measurements of real scenes are readily estimated from N-view image sequences using structure-from-motion techniques. There is a fast algorithm for rigid structure from image sequences in [18]. Hilton presents a geometric theory for reconstruction of surface models from sparse 3D data captured from N camera views [19], and 3D shape can also be reconstructed by using vanishing points [20]. Relative affine structure is given as a canonical model for 3D from 2D geometry and applications [10].

Progress in automatically recovering 3D scene structure together with 3D camera positions from a sequence of images acquired by an unknown camera undergoing unknown movement is described in [12]. The main departure from previous structure-from-motion strategies is that the processing is not sequential. Instead, a hierarchical approach is employed, building from image triplets and associated trifocal tensors. A method is presented for dealing with hundreds of images without precise calibration knowledge [21]. Optimizing just over the motion unknowns is fast, and given the recovered motion, one can recover the optimal structure algebraically for two images [4].

In fact, reconstruction of nonrigid scenes is very important in structure from motion. The recovery of 3D structure and camera motion for nonrigid scenes from single-camera video footage is a key problem in computer vision. For an implicit imaging model of nonrigid scenes, there is a nonrigid structure-from-motion algorithm based on computing matching tensors over subsequences; each nonrigid matching tensor is computed, along with the rank of the subsequence, using a robust estimator incorporating a model selection criterion that detects erroneous image points [22]. Uncalibrated motion capture exploits articulated structure constraints [23], such as those of humans. The technique shows promise as a means of creating 3D animations of dynamic activities such as sports events. For the problem of 3D reconstruction of nonrigid objects from uncalibrated image sequences, under the assumption of an affine camera and that the nonrigid object is composed of a rigid part and a deformation part, a stratification approach can be used to recover the structure of nonrigid objects by first reconstructing the structure in affine space and then upgrading it to Euclidean space [24]. In addition, a general framework of locally rigid motion for solving the M-point and N-view structure-from-motion problem for unknown bodies deforming under orthography is presented in [25]. An incremental approach is presented in [26], where a new framework for nonrigid structure from motion simultaneously addresses three significant challenges: severe occlusion, perspective camera projection, and large nonlinear deformation.

With the development of structure-from-motion algorithms, geometric constraints and optimization are necessary for reconstructing a good 3D model of the object or scene. Many researchers have contributed useful approaches. For example, a technique is proposed in [27] for estimating piecewise planar models of objects from their images and geometric constraints, and 3D structure can be recovered from a single calibrated view using distance constraints [28]. Marques and Costeira present an approach to estimating 3D shape from degenerate sequences with missing data [29]. Going beyond the epipolar constraint improves the results of structure from motion [30].

3D affine measurements may be computed from a single perspective view of a scene given only minimal geometric information determined from the image. This minimal information is typically the vanishing line of a reference plane and a vanishing point for a direction not parallel to the plane. Without camera parameters, Criminisi et al. [31] show how to (i) compute the distance between planes parallel to the reference plane; (ii) compute area and length ratios on any plane parallel to the reference plane; (iii) determine the camera's location.

Direct estimation recovers scene structure and camera motion directly from a sequence of images; no computation of optical flow or feature correspondences is required [32]. A good critique of structure-from-motion algorithms is given by Oliensis in [33].

The remainder of this paper is organized as follows. Section 2 briefly gives some typical applications of structure from video sequences. Section 3 introduces the general reconstruction principle of structure from video sequences and unstructured photo collections. Section 4 outlines the methods for structure and motion estimation. Section 5 discusses the relevant available algorithms for every step to obtain a better result. We offer our impressions of current and future trends in the topic and conclude in Sections 6 and 7.

2. Typical Applications

2.1. Modeling and Reconstruction of 3D Buildings or Landmarks

For 3D reconstruction of an object or building, Pollefeys et al. present a complete system to build visual models with a hand-held camera [6]. There is a system for photorealistic 3D reconstruction from hand-held cameras [34]. Sinha et al. [35] present an algorithm for interactive 3D architectural modeling from unordered photo collections. There is a fully automated 3D reconstruction and visualization system for architectural scenes, including interiors and exteriors [36]. The system utilizes structure from motion, multiview stereo, and a stereo algorithm.

The 3D models of historical relics and buildings, for example, the Emperor Qin's Terracotta Warriors and Piazza San Marco, are of great significance to archeologists. A system that can match and reconstruct 3D scenes from extremely large collections of photographs has been developed by Agarwal et al. [37]. A method for enabling existing multiview stereo algorithms to operate on extremely large unstructured photograph collections has been devised by Furukawa et al. [38]. This approach decomposes the collection into a set of overlapping sets of photos that can be processed in parallel and merges the resulting reconstructions [38]. People who want to see famous buildings or landscapes from the Internet could tour the world via a web-scale landmark recognition engine [39].

Modeling and recognizing landmarks at world scale is a useful yet challenging task. There exists no readily available list of worldwide landmarks. Obtaining reliable visual models for each landmark can also pose problems, and efficiency is another challenge for such a large-scale system. Zheng et al. leverage the vast amount of multimedia data on the web, the availability of an Internet image search engine, and advances in object recognition and clustering techniques to address these issues [39].

2.2. Urban Reconstruction

Modeling the world and reconstructing a city present many challenges for a visualization system in computer vision. Such a system can make use of products such as Google Earth and Google Maps. For instance, Pollefeys et al. [40] present a system for automatic, georegistered, real-time multiview stereo 3D reconstruction from long image sequences of urban scenes. The system collects video streams, as well as GPS and inertial measurements, in order to obtain the georegistered coordinates of the 3D models [40]. Faugeras et al. [41] address the problem of recovering a realistic textured model of a scene from a sequence of images, without any prior knowledge either about the parameters of the cameras or about their motion.

2.3. Navigation

If the world's model or the city's reconstruction is exhaustively completed, we can obtain the relative locations of buildings and find related views for the navigation of robots or other vision systems. Photo Tourism enables full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps [14]. It gives several modes for navigation, including free-flight navigation, moving between related views, object-based navigation, and creating stabilized slideshows. The system by Pollefeys et al. also contains a navigation function [40]. Supplying realistically textured 3D city models at ground level promises to be useful for previsualizing upcoming traffic situations in car navigation systems [42].

2.4. Visual Servoing

In the literature, there are applications that employ SfM algorithms successfully in practical engineering. For instance, based on structure from controlled motion or on robust statistics, a visual servoing system is presented in [43]. A general-purpose image understanding system via a control structure is designed by Marengoni et al. [44], and 3D video compression via topology matching is presented in [45]. More applications are being developed by researchers and engineers in the community.

2.5. Scene Recognition and Understanding

3D reconstruction has important applications in face recognition, facial expression analysis, and so on. Fidaleo and Medioni [46] design a model-assisted system for reconstruction of 3D faces from a single consumer-quality camera using a structure-from-motion approach. Park and Jain [47] present an algorithm for 3D-model-based face recognition in video.

Reconstruction of 3D scene geometry is an important element for scene understanding, autonomous vehicle and robot navigation, image retrieval, and 3D television [48]. Nedovic et al. propose accounting for the inherent structure of the visual world when trying to solve the scene reconstruction problem [48].

3. Information Organization

The goal of structure from motion is the automatic recovery of camera motion and scene structure from two or more images. The problem of using pixel correspondences or tracked points to determine camera and point geometry in this manner is known as structure from motion. It is a self-calibration technique, also called automatic camera tracking or match moving. We must consider several questions:

1Correspondence feature extracting and tracking or matching: given a point in

one image, how does it constrain the position of the corresponding point in other

images?

2Scene geometry structure: given point matches in two or more images, where are

the corresponding points in 3D?

3Camera geometry motion: given a set of corresponding points in two or more

images, what are the camera matrices for these views?

Based on these questions, we can give the 3D reconstruction pipeline as in Figure 1.
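Question (2) above can be illustrated with linear triangulation. The following is a minimal sketch, assuming two known 3x4 projection matrices and one noise-free match; the function names `triangulate_dlt` and `project` are hypothetical and are not taken from the surveyed papers.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: 2-vectors with the image coordinates of the same scene point.
    Returns the 3D point as a 3-vector (inhomogeneous coordinates).
    """
    # Each view contributes two linear equations in the homogeneous point X:
    #   x * (P[2] @ X) - P[0] @ X = 0   and   y * (P[2] @ X) - P[1] @ X = 0
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point X (3-vector) with camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]
```

With noisy matches the same linear system is solved in the least squares sense, and the estimate is usually refined afterwards by nonlinear optimization.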

The goal of correspondence is to build a set of matching 2D pixel coordinates across the video sequence. It is a significant step in the structure-from-motion pipeline. Correspondence is always a challenging task in computer vision. So far, many researchers have developed practical and robust algorithms. Given a video sequence of a scene, how can we find matching points?

Firstly, there are some well-known algorithms for image sequences or videos; a popular one is the KLT tracker [49–51]. It gives us an integrated system that can automatically detect KLT feature points and track them. However, it does not cope well with wide baselines, illumination changes, scale variation, duplicate and similar structures, occlusion, noise, image distortion, and so on. Generally speaking, for video sequences, the KLT tracker performs well. Figures 2 and 3 show examples of the feature points output by the KLT detector, with example images from http://www.ces.clemson.edu/∼stb/klt/.

In the KLT tracker [49–51], if the time interval between two frames of video is sufficiently short, we can suppose that the positions of feature points move but their intensities do not change; that is,

$$I(x, t) = I(\delta(x), t + \Delta t), \tag{3.1}$$

where $x$ is the position of a feature point and $\delta(x)$ is a transformation function.

In the papers of Lucas and Kanade [49], Tomasi and Kanade [50], and Shi and Tomasi [51], the authors made the important hypothesis that, for high enough frame rates, $\delta(x)$ can be approximated with a displacement vector $d$:

$$I(x, t) = I(x + d, t + \Delta t). \tag{3.2}$$

Figure 1: 3D reconstruction pipeline (video sequences → feature detection → feature matching → structure from motion → 3D point positions and camera poses).

Figure 2: Example set of detected KLT features.

Figure 3: Tracking trajectory of KLT tracker through a video sequence.

Then the dissimilarity between two windows, one in image $I(x, t)$ and one in image $I(x + d, t + \Delta t)$, is defined as follows:

$$\varepsilon = \iint_W \left[ I(x + d, t + \Delta t) - I(x, t) \right]^2 \omega(x)\, dx, \tag{3.3}$$

where $\omega(x)$ is the weighting function, usually set to the constant 1. The algorithm calculates the vector $d$ which minimizes $\varepsilon$. Truncating the first-order Taylor expansion of $I(x + d, t + \Delta t)$ about $d = 0$ to the linear term and setting the derivative of $\varepsilon$ with respect to $d$ to 0, we obtain the linear equation

$$Z d = e, \tag{3.4}$$

where $Z$ is the following $2 \times 2$ matrix:

$$Z = \iint_W g(x)\, g^T(x)\, \omega(x)\, dx \tag{3.5}$$

and $e$ is the following $2 \times 1$ vector:

$$e = \iint_W \left[ I(x, t) - I(x, t + \Delta t) \right] g(x)\, \omega(x)\, dx, \tag{3.6}$$

where $g(x) = \partial I / \partial x$ is the spatial gradient of the image at time $t + \Delta t$.

On the other hand, for a completely unorganized set of images, the tracker becomes invalid. There is another popular algorithm in the computer vision area, named the scale-invariant feature transform (SIFT) [52]. It is effective for feature detection and matching under a wide class of image transformations, including rotations, scale changes, and changes in brightness or contrast, and for recognizing panoramas [53]. Figures 4 and 5 show examples of the feature points output by SIFT, with example images from http://www.cs.ubc.ca/∼lowe/keypoints/.
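Once descriptors have been extracted, matching between two images is commonly done by nearest-neighbor search with a distance-ratio test. The sketch below assumes the descriptors are already available as NumPy arrays (one row per feature); the function name `match_descriptors` and the threshold 0.8 are illustrative assumptions, and the descriptor extraction itself is not shown.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match rows of desc_a to rows of desc_b with a distance-ratio test.

    desc_a: (Na, D) array, desc_b: (Nb, D) array with Nb >= 2.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    # Pairwise Euclidean distances between all descriptors.
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    matches = []
    for i, row in enumerate(dist):
        order = np.argsort(row)
        best, second = order[0], order[1]
        # Keep the match only if the best candidate is clearly closer
        # than the runner-up; ambiguous matches are discarded.
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches
```

The brute-force distance matrix is adequate for a few thousand features; larger collections typically rely on approximate nearest-neighbor search instead.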

4. Structure and Motion Estimation

Figure 4: Example set of detected SIFT features.

Figure 5: SIFT feature matches between images.

Assume that we have obtained a set of correspondences between images or video frames; we then use this set to reconstruct the 3D position of each corresponding point and recover the motion of the camera. This task is called structure from motion. The problem has been an active research topic in computer vision since the development of the Longuet-Higgins eight-point algorithm [54], which focused on reconstructing geometry from two views. In the literature [2], several different approaches to solving the structure-from-motion problem are given.
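For two views, the eight-point idea can be sketched as a linear estimate of the fundamental matrix $F$ satisfying $x_2^T F x_1 = 0$ for each match. The version below adds coordinate normalization and a rank-2 projection, which are common later refinements rather than part of the original formulation in [54]; the function names are hypothetical.

```python
import numpy as np

def _normalize(pts):
    """Translate points to zero mean and scale to mean distance sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.linalg.norm(pts - mean, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return homog @ T.T, T

def eight_point(x1, x2):
    """Estimate F such that x2_h^T F x1_h = 0 from N >= 8 matches.

    x1, x2: (N, 2) arrays of corresponding image points.
    """
    p1, T1 = _normalize(x1)
    p2, T2 = _normalize(x2)
    # Each match gives one linear equation in the 9 entries of F (row-major).
    A = np.column_stack([p2[:, [0]] * p1, p2[:, [1]] * p1, p1])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization.
    return T2.T @ F @ T1
```

With real matches the estimate is usually wrapped in a robust scheme such as RANSAC to reject outliers, which is omitted here.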

4.1. Factorization

There is a popular factorization algorithm for image streams under orthography, which uses many images and tracks many feature points to obtain highly redundant feature position information; it was first developed by Tomasi and Kanade [55] in the 1990s. The main idea of this algorithm is to factorize the tracking matrix into structure and motion matrices simultaneously via the singular value decomposition (SVD) with low-rank approximation, taking advantage of the linear algebraic properties of orthographic projection.

However, an orthographic formulation limits the range of motions the method can accommodate. Paraperspective projection is a projection model that closely approximates perspective projection by modeling several effects not modeled under orthographic projection, while retaining linear algebraic properties [56, 57]. Poelman and Kanade [56] have developed a paraperspective factorization method that can be applied to a much wider range of motion scenarios, including image sequences containing motion toward the camera and aerial image sequences of terrain taken from a low-altitude airplane.

With the development of the factorization method, a factorization-based algorithm for multi-image projective structure and motion was developed by Sturm and Triggs [57]. This technique is a practical approach that recovers scaled feature points using fundamental matrices and epipoles estimated from the image sequences.

Because matrix factorization is a key component for solving several computer vision problems, Tardif et al. have proposed batch algorithms for matrix factorization [58] that are based on closure and basis constraints and handle the presence of missing or erroneous data, which often arises in structure from motion.

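In the mathematical expression of the factorization algorithm, assume that the tracked points are $\{(x_i^j, y_i^j) \mid i = 1, \ldots, n;\ j = 1, \ldots, m\}$, where $i$ indexes the feature points and $j$ indexes the frames. The algorithm defines the measurement matrix $W$: $W = \begin{bmatrix} U \\ V \end{bmatrix}$, where the $m \times n$ matrices $U$ and $V$ collect the $x$ and $y$ coordinates, with one row per frame and one column per point. The rows of $U$ and $V$ are then registered by subtracting from each entry the mean of the entries in that row:

$$\tilde{x}_i^j = x_i^j - \frac{1}{n} \sum_{k=1}^{n} x_k^j, \qquad \tilde{y}_i^j = y_i^j - \frac{1}{n} \sum_{k=1}^{n} y_k^j. \tag{4.1}$$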

The goal of the Tomasi-Kanade algorithm [55] is to factorize $W$ into two matrices as follows:

$$W = M X, \tag{4.2}$$

where $M$, named the motion matrix, is a $2m \times 3$ matrix which represents the camera rotation in each frame and $X$, named the structure matrix, is a $3 \times n$ matrix which denotes the positions of the feature points in object space. So in the absence of noise, $\operatorname{rank}(W) \leq 3$.

Then we can compute the SVD of $W$ to obtain $U D V^T$:

$$W = U D V^T, \tag{4.3}$$

where, if the three largest singular values of $W$ are $\sigma_1, \sigma_2, \sigma_3$ with corresponding singular vectors $u_k$ and $v_k$, we can get the matrices $M = [\sigma_1 u_1, \sigma_2 u_2, \sigma_3 u_3]$ and $X = [v_1, v_2, v_3]^T$.

The method can also handle and obtain a full solution from a partially filled-in measurement matrix, which occurs when features appear and disappear in the video due to occlusions or tracking failures [55]. This method gives accurate results and does not introduce smoothing in structure and motion. Using the above method, the problem can be solved for videos of general scenes such as buildings and sculptures (Figure 6).
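The registration and rank-3 SVD steps described above can be sketched in a few lines. This is a minimal sketch under the assumption of a fully observed measurement matrix (no missing entries); the function name `factorize` is hypothetical, and the metric upgrade that resolves the remaining 3x3 ambiguity is not shown.

```python
import numpy as np

def factorize(W):
    """Rank-3 factorization of a 2m x n measurement matrix.

    Returns (M, X) with M of shape (2m, 3) and X of shape (3, n) such that
    M @ X approximates the registered (row-centred) W. The result is only
    defined up to an invertible 3x3 transform; the metric upgrade using
    orthonormality constraints on the rows of M is not shown.
    """
    # Register: subtract from each row its mean over the n points.
    W_reg = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W_reg, full_matrices=False)
    # Keep the three dominant singular triplets (low-rank approximation).
    M = U[:, :3] * s[:3]
    X = Vt[:3]
    return M, X
```

With noisy tracks the discarded singular values are small but nonzero, so `M @ X` is then the best rank-3 approximation of the registered matrix in the least squares sense.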

4.2. Bundle Adjustment

Figure 6: Example of recovering structure and motion.

Bundle adjustment is a significant component of most structure-from-motion systems. It is the joint nonlinear refinement of camera and point parameters, so it can consume a large amount of time for large problems. Unfortunately, the optimization underlying structure from motion involves a complex, nonlinear objective function with no closed-form solution, due to nonlinearities in perspective geometry. Most modern approaches use nonlinear least squares algorithms [17] to minimize this objective function, a process known as bundle adjustment [53]; the basic mathematics of the bundle adjustment problem is well understood [59]. Generally speaking, bundle adjustment is a global algorithm, but it consumes much time and cannot run in real time. Mouragnon et al. [60] propose an approach for generic and real-time structure from motion using local bundle adjustment. It allows 3D points and camera poses to be refined simultaneously through the image sequence. Zhang et al. [61] apply bundle optimization to further improve the results of consistent depth maps from a video sequence.
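The objective function minimized by bundle adjustment is usually written as a sum of squared reprojection errors. The notation below is an assumption introduced for illustration and does not come from the paper: $P_i$ denotes the parameters of camera $i$, $X_j$ the 3D point $j$, $x_{ij}$ the observed image position of point $j$ in image $i$, $\pi$ the projection function, and $v_{ij} \in \{0, 1\}$ indicates whether point $j$ is visible in image $i$.

```latex
\min_{\{P_i\},\,\{X_j\}} \; \sum_{i=1}^{m} \sum_{j=1}^{n} v_{ij} \,
\bigl\lVert x_{ij} - \pi(P_i, X_j) \bigr\rVert^{2}
```

Each residual depends on only one camera and one point, so the Jacobian is sparse; this sparsity is what makes nonlinear least squares solvers practical for large problems, and local variants restrict the sums to the most recent cameras and the points they observe.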

4.3. Self-Calibration

To upgrade a projective or affine reconstruction to a metric reconstruction (i.e., determined up to an arbitrary Euclidean transformation and a scale factor), calibration techniques, for which we follow the approaches described in [2, 6, 9, 15, 62], can be used. This can be done by imposing some constraints on the intrinsic camera parameters. This approach, called self-calibration, has received a lot of attention in recent years. The ambiguity of the reconstruction is restricted from projective to metric through self-calibration [6]. Most self-calibration algorithms are concerned with unknown but constant intrinsic camera parameters [2, 4, 12]. The problem of 3D Euclidean reconstruction of structured scenes from uncalibrated images is addressed in [63] based on the properties of vanishing points. A multistage linear approach is proposed in [64], with a structure-from-motion technique based on point and vanishing point matches in images.
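One common way to express the constraints on the intrinsic parameters, following the standard treatment in multiview geometry texts such as [2], uses the absolute dual quadric. The notation is an assumption introduced here for illustration: $P_i$ are the projective camera matrices, $K_i$ the intrinsic calibration matrices, $Q_\infty^{*}$ the absolute dual quadric (a symmetric $4 \times 4$ matrix of rank 3), and $\simeq$ denotes equality up to scale.

```latex
\omega_i^{*} \;=\; K_i K_i^{\top} \;\simeq\; P_i \, Q_\infty^{*} \, P_i^{\top},
\qquad
Q_\infty^{*} \;=\; H \,\operatorname{diag}(1, 1, 1, 0)\, H^{\top}
```

Assumptions on $K_i$, such as constant intrinsics or zero skew, become equations in the entries of $Q_\infty^{*}$. Once it is estimated, the homography $H$ from the decomposition on the right gives a metric reconstruction with cameras $P_i H$ and points $H^{-1} X_j$.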

4.4. Correlative Improvement

Traditional SFM algorithms using just two images often produce inaccurate 3D reconstructions, mainly due to incorrect estimation of the camera's motion. Thomas and Oliensis [65] present a practical algorithm that can deal with noise in multiframe structure from motion. It is a new incremental algorithm for reconstructing structure from multi-image sequences which estimates and corrects for the error in computing the camera motion.

Research on structure from motion has shown great progress over several decades, but the algorithms still exhibit some faults and shortcomings. The results of structure from motion are unsatisfactory in many situations. However, many researchers have presented improved approaches, such as dual computation of projective shape and camera positions from multiple images [66].

As an alternative to incremental algorithms that solve progressively larger bundle adjustment problems, Crandall et al. present a formulation for structure from motion based on finding a coarse initial solution using a hybrid discrete-continuous optimization and then improving the solution using bundle adjustment. The initial optimization step uses a discrete Markov random field (MRF) formulation, coupled with a continuous Levenberg-Marquardt refinement [67].

For time efficiency, Havlena et al. present a method of efficient structure from motion by graph optimization [68]. Gherardi et al. improve efficiency with hierarchical structure and motion [69].

For duplicate or similar structures, Roberts et al. couple an expectation-maximization (EM) algorithm with structure from motion for scenes with large duplicate structures [70]. A hierarchical framework that resamples 3D reconstructed points reduces the computation cost in time and memory for very-large-scale structure from motion [71]. Savarese and Bao propose a formulation called semantic structure from motion (SSFM), which takes advantage of both semantic and geometrical properties associated with objects in the scene [72].

5. Relevant Algorithms

5.1. Features

(1) Line

For the problem of camera motion and 3D structure reconstruction from line correspondences across multiple views, there is a triangulation algorithm that outperforms standard linear and bias-corrected quasi-linear algorithms; bundle adjustment using an orthonormal representation yields results similar to the standard maximum likelihood trifocal tensor algorithm, while being usable for any number of views [73]. Spetsakis and Aloimonos [74] present a system for structure from motion using line correspondences. In [75], the recovery algorithm is formulated in terms of an objective function which measures the total squared distance in the image plane between the observed edge segments and the projections of the reconstructed lines. A linear method is developed for reconstruction using lines and points simultaneously [76].

(2) Curve

Tubic et al. 77present an approach for reconstructing a surface from a set of arbitrary,

unorganized, and intersecting curves. There is an approach for reconstructing open surfaces

from image data 78. Kaminski and Shashua 79introduce a number of new results in the

context of multiview geometry from general algebraic curves, which start with the recoveryof

camera geometry from matching curves. Berthilsson et al. present a method for reconstruction

of general curves, using factorization and bundle adjustment 80.


(3) Silhouette

Liang and Wong 81develop an approach that produces relatively complete 3D models

similar to volumetric approaches, with the topology conforming to what is observed from

the silhouettes. In addition, the method neither assumes nor depends on the spatial order

of viewpoints. Hartley and Kahl give us critical conﬁgurations for projective reconstruction

from multiple views in 82. Joshi et al. design an algorithm for structure and motion

estimation from dynamic silhouettes under perspective projection 83.Liuetal.present

a method that is shaped from silhouette outlines using an adaptive dandelion model 84.

Yemez and Wetherilt develop a volumetric fusion technique for surface reconstruction from

silhouettes and range data 85.

5.2. Other Aspects

(1) Multiview Stereo

Multiview stereo MVStechniques take as input a set of images with known camera param-

eters i.e., position and orientation of the camera, focal length, image distortion parameters

38,53,86.Wecanreferto87for a classiﬁcation and evaluation of recent MVS techniques.

(2) Clustering

There are clustering techniques to partition the image set into groups of related images, based on the visual structure represented in the image connectivity graph for the collection [88, 89].

6. Existing Problems and Future Trends

While structure-from-motion algorithms have been developed for 3D reconstruction in many applications, some problems of reconstructing geometry from video sequences still exist in computer vision and photography. Until recently, however, there have been no good computer vision techniques for recovering this kind of structure from motion. Many researchers are still making efforts to improve the methods, mainly in the following aspects.

6.1. Feature Tracking and Matching

Zhang et al. give a robust and efficient algorithm for nonconsecutive feature tracking for structure from motion via two main steps, that is, consecutive point tracking and nonconsecutive track matching [90]. They improve the KLT tracker with invariant feature points and a two-pass matching strategy to significantly extend the track lifetime and reduce the sensitivity of feature points to scale variation, duplicate and similar structures, noise, and image distortion. The results can be found at http://www.cad.zju.edu.cn/home/gfzhang/.

6.2. Active Vision

One method is based on structure from controlled motion, which constrains camera motions to obtain an optimal estimation of the 3D structure of a geometrical primitive [91]. Stereo geometry is acquired from 3D egomotion streams [92]. Wide-area egomotion estimation is acquired from known 3D structure [93]. Work on estimating surface reflectance properties of a complex scene under captured natural illumination can be found in [94]. Other algorithms have also been attempted based on the selective attention of human eyes.

6.3. Unorganized Images

To solve the large-scale nonlinear optimization that arises with unorganized images, the scene can be reconstructed incrementally, starting from a single pair of images, then adding new images and points in rounds, and running a global nonlinear optimization after each round [53]. Structure from motion can be applied to photos found in the wild, reconstructing scenes from several large Internet photo collections [14]. The large redundancy in online photo collections means that a small fraction of images may be sufficient to produce high-quality reconstructions. This redundancy has begun to be exploited by extracting image “skeletons” from large collections [95]. Perhaps the most important challenge is to find ways to effectively parallelize all the steps of the reconstruction pipeline to take advantage of multicore architectures and cloud computing [37, 38, 53, 89].

7. Conclusion

This paper has summarized recent developments in structure-from-motion algorithms that are able to metrically reconstruct complex scenes and objects. Their wide applications in computer vision have been addressed. Typical contributions have been introduced for feature point detection, tracking, matching, factorization, bundle adjustment, multiview stereo, self-calibration, line detection and matching, modeling, and so forth. Representative works are listed to give readers a general overview of the state of the art. Finally, existing problems and future trends in structure modeling have been summarized.

Acknowledgments

This work was supported by the National Natural Science Foundation of China and Microsoft Research Asia (NSFC nos. 61173096, 60870002, and 60802087), Zhejiang Provincial S&T Department (2010R10006, 2010C33095), and Zhejiang Provincial Natural Science Foundation (R1110679).

References

[1] H. Kuffar and K. Takaya, “Depth measurement and 3D metric reconstruction from two uncalibrated stereo images,” in Proceedings of the Canadian Conference on Electrical and Computer Engineering (CCECE ’07), pp. 1460–1463, April 2007.

[2] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2nd edition, 2004.

[3] O. Faugeras and Q. T. Luong, The Geometry of Multiple Images, MIT Press, Cambridge, Mass, USA, 2001.

[4] R. I. Hartley and P. Sturm, “Triangulation,” in Proceedings of the American Image Understanding Workshop, pp. 957–966, 1994.

[5] P. F. McLauchlan and D. W. Murray, “Unifying framework for structure and motion recovery from image sequences,” in Proceedings of the 5th International Conference on Computer Vision (ICCV ’95), pp. 314–320, June 1995.


[6] M. Pollefeys, L. Van Gool, M. Vergauwen et al., “Visual modeling with a hand-held camera,” International Journal of Computer Vision, vol. 59, no. 3, pp. 207–232, 2004.

[7] S. Avidan and A. Shashua, “Trajectory triangulation: 3D reconstruction of moving points from a monocular image sequence,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 4, pp. 348–357, 2000.

[8] N. Molton and M. Brady, “Practical structure and motion from stereo when motion is unconstrained,” International Journal of Computer Vision, vol. 39, no. 1, pp. 5–23, 2000.

[9] A. Marugame, J. Katto, and M. Ohta, “Structure recovery with multiple cameras from scaled orthographic and perspective views,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 7, pp. 628–633, 1999.

[10] A. Shashua and N. Navab, “Relative affine structure: canonical model for 3D from 2D geometry and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 9, pp. 873–883, 1996.

[11] F. Kahl and A. Heyden, “Affine structure and motion from points, lines and conics,” International Journal of Computer Vision, vol. 33, no. 3, pp. 163–180, 1999.

[12] A. W. Fitzgibbon and A. Zisserman, “Automatic camera recovery for closed or open image sequences,” in Proceedings of the ECCV, pp. 311–326, 1998.

[13] F. Schaffalitzky and A. Zisserman, “Multi-view matching for unordered image sets, or “How do I organize my holiday snaps?”,” in Proceedings of the IEEE Conference on Computer Vision, vol. 1, pp. 414–431, 2002.

[14] N. Snavely, S. M. Seitz, and R. Szeliski, “Photo tourism: exploring photo collections in 3D,” in Proceedings of the International Conference on Computer Graphics and Interactive Technologies, pp. 835–846, 2006.

[15] R. Hartley, “Euclidean reconstruction from uncalibrated views,” in Applications of Invariance in Computer Vision, J. L. Mundy, A. Zisserman, and D. Forsyth, Eds., Lecture Notes in Computer, 1994.

[16] A. Bartoli and P. Sturm, “Constrained structure and motion from multiple uncalibrated views of a piecewise planar scene,” International Journal of Computer Vision, vol. 52, no. 1, pp. 45–64, 2003.

[17] R. Szeliski and S. B. Kang, “Recovering 3D shape and motion from image streams using nonlinear least squares,” Journal of Visual Communication and Image Representation, vol. 5, no. 1, pp. 10–28, 1994.

[18] P. M. Q. Aguiar and J. M. F. Moura, “A fast algorithm for rigid structure from image sequences,” in Proceedings of the IEEE International Conference on Image Processing (ICIP ’99), vol. 3, pp. 125–129, Kobe, Japan, 1999.

[19] A. Hilton, “Scene modelling from sparse 3D data,” Image and Vision Computing, vol. 23, no. 10, pp. 900–920, 2005.

[20] P. Parodi and G. Piccioli, “3D shape reconstruction by using vanishing points,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 2, pp. 211–217, 1996.

[21] M. Lhuillier, “Toward flexible 3D modeling using a catadioptric camera,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’07), pp. 1560–1567, June 2007.

[22] A. Bartoli and S. I. Olsen, “A batch algorithm for implicit non-rigid shape and motion recovery,” in Proceedings of the International Conference on Dynamical Vision, 2006.

[23] D. Liebowitz and S. Carlsson, “Uncalibrated motion capture exploiting articulated structure constraints,” International Journal of Computer Vision, vol. 51, no. 3, pp. 171–187, 2003.

[24] G. Wang and Q. M. J. Wu, “Stratification approach for 3-D Euclidean reconstruction of nonrigid objects from uncalibrated image sequences,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 38, no. 1, pp. 90–101, 2008.

[25] J. Taylor, A. D. Jepson, and K. N. Kutulakos, “Non-rigid structure from locally-rigid motion,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’10), pp. 2761–2768, June 2010.

[26] S. Zhu, L. Zhang, and B. M. Smith, “Model evolution: an incremental approach to non-rigid structure from motion,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’10), pp. 1165–1172, June 2010.

[27] M. Farenzena and A. Fusiello, “Stabilizing 3D modeling with geometric constraints propagation,” Computer Vision and Image Understanding, vol. 113, no. 11, pp. 1147–1157, 2009.

[28] R. Gong and G. Xu, “3D structure from a single calibrated view using distance constraints,” IEICE Transactions on Information and Systems, vol. 87, no. 6, pp. 1527–1536, 2004.

[29] M. Marques and J. Costeira, “Estimating 3D shape from degenerate sequences with missing data,” Computer Vision and Image Understanding, vol. 113, no. 2, pp. 261–272, 2009.


[30] T. Brodsky, C. Fermuller, and Y. Aloimonos, “Structure from motion: beyond the epipolar constraint,” International Journal of Computer Vision, vol. 37, no. 3, pp. 231–258, 2000.

[31] A. Criminisi, I. Reid, and A. Zisserman, “Single view metrology,” International Journal of Computer Vision, vol. 40, no. 2, pp. 123–148, 2000.

[32] H. Joachim, “Direct estimation of structure and motion from multiple frames,” MIT AI Lab. Memo 1190, Massachusetts Institute of Technology, Mass, USA, 1990.

[33] J. Oliensis, “Critique of structure-from-motion algorithms,” Computer Vision and Image Understanding, vol. 80, no. 2, pp. 172–214, 2000.

[34] T. Rodriguez, P. Sturm, P. Gargallo et al., “Photorealistic 3D reconstruction from handheld cameras,” Machine Vision and Applications, vol. 16, no. 4, pp. 246–257, 2005.

[35] S. N. Sinha, D. Steedly, R. Szeliski, M. Agrawala, and M. Pollefeys, “Interactive 3D architectural modeling from unordered photo collections,” ACM Transactions on Graphics, vol. 27, no. 5, article 159, 2008.

[36] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, “Reconstructing building interiors from images,” in Proceedings of the International Conference on Computer Vision, pp. 80–87, 2009.

[37] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski, “Building Rome in a day,” in Proceedings of the 12th International Conference on Computer Vision (ICCV ’09), pp. 72–79, Kyoto, Japan, October 2009.

[38] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, “Towards internet-scale multi-view stereo,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’10), pp. 1434–1441, San Francisco, Calif, USA, June 2010.

[39] Y. T. Zheng, M. Zhao, Y. Song et al., “Tour the World: building a web-scale landmark recognition engine,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR ’09), pp. 1085–1092, June 2009.

[40] M. Pollefeys et al., “Detailed real-time urban 3D reconstruction from video,” International Journal of Computer Vision, vol. 78, no. 2-3, pp. 143–167, 2008.

[41] O. Faugeras, L. Robert, S. Laveau et al., “3-D reconstruction of urban scenes from image sequences,” Computer Vision and Image Understanding, vol. 69, no. 3, pp. 292–309, 1998.

[42] N. Cornelis, B. Leibe, K. Cornelis, and L. Van Gool, “3D urban scene modeling integrating recognition and reconstruction,” International Journal of Computer Vision, vol. 78, no. 2-3, pp. 121–141, 2008.

[43] C. Collewet and F. Chaumette, “Visual servoing based on structure from controlled motion or on robust statistics,” IEEE Transactions on Robotics, vol. 24, no. 2, pp. 318–330, 2008.

[44] M. Marengoni, A. Hanson, S. Zilberstein, and E. Riseman, “Decision making and uncertainty management in a 3D reconstruction system,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 852–858, 2003.

[45] T. Tung, F. Schmitt, and T. Matsuyama, “Topology matching for 3D video compression,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’07), pp. 2719–2726, June 2007.

[46] D. Fidaleo and G. Medioni, “Model-assisted 3D face reconstruction from video,” in Proceedings of the 3rd International Workshop on Analysis and Modeling of Faces and Gestures (AMFG ’07), vol. 4778 of Lecture Notes in Computer Science, pp. 124–138, 2007.

[47] U. Park and A. Jain, “3D model-based face recognition in video,” in Proceedings of the International Conference on Advances in Biometrics (ICB ’07), vol. 4642 of Lecture Notes in Computer Science, pp. 1085–1094, 2007.

[48] V. Nedovic, A. W. M. Smeulders, A. Redert, and J. M. Geusebroek, “Stages as models of scene geometry,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1673–1687, 2010.

[49] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674–679, 1981.

[50] C. Tomasi and T. Kanade, “Detection and tracking of point features,” Tech. Rep. CMU-91-132, CMU, 1991.

[51] J. Shi and C. Tomasi, “Good features to track,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593–600, June 1994.

[52] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

[53] N. Snavely, I. Simon, M. Goesele, R. Szeliski, and S. M. Seitz, “Scene reconstruction and visualization from community photo collections,” Proceedings of the IEEE, vol. 98, no. 8, Article ID 5483186, pp. 1370–1390, 2010.


[54] H. C. Longuet-Higgins, “A computer algorithm for reconstructing a scene from two projections,” Nature, vol. 293, no. 5828, pp. 133–135, 1981.

[55] C. Tomasi and T. Kanade, “Shape and motion from image streams under orthography: a factorization method,” International Journal of Computer Vision, vol. 9, no. 2, pp. 137–154, 1992.

[56] C. J. Poelman and T. Kanade, “A paraperspective factorization method for shape and motion recovery,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 3, pp. 206–218, 1997.

[57] P. Sturm and B. Triggs, “A factorization based algorithm for multi-image projective structure and motion,” in Proceedings of the 4th European Conference on Computer Vision, 1996.

[58] J. P. Tardif, A. Bartoli, M. Trudeau, N. Guilbert, and S. Roy, “Algorithms for batch matrix factorization with application to structure-from-motion,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’07), June 2007.

[59] B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, “Bundle adjustment—a modern synthesis,” in Proceedings of the International Workshop on Vision Algorithms, pp. 298–372, 1999.

[60] E. Mouragnon, M. Lhuillier, M. Dhome, F. Dekeyser, and P. Sayd, “Generic and real-time structure from motion using local bundle adjustment,” Image and Vision Computing, vol. 27, no. 8, pp. 1178–1193, 2009.

[61] G. Zhang, J. Jia, T. T. Wong, and H. Bao, “Consistent depth maps recovery from a video sequence,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 6, pp. 974–988, 2009.

[62] T. Jebara, A. Azarbayejani, and A. Pentland, “3D structure from 2D motion,” IEEE Signal Processing Magazine, vol. 16, no. 3, pp. 66–84, 1999.

[63] G. Wang, H. T. Tsui, and Q. M. Jonathan Wu, “What can we learn about the scene structure from three orthogonal vanishing points in images,” Pattern Recognition Letters, vol. 30, no. 3, pp. 192–202, 2009.

[64] S. N. Sinha, D. Steedly, and R. Szeliski, “A multi-stage linear approach to structure from motion,” in Proceedings of the European Conference on Computer Vision (ECCV ’10), 2010.

[65] J. I. Thomas and J. Oliensis, “Dealing with noise in multiframe structure from motion,” Computer Vision and Image Understanding, vol. 76, no. 2, pp. 109–124, 1999.

[66] S. Carlsson and D. Weinshall, “Dual computation of projective shape and camera positions from multiple images,” International Journal of Computer Vision, vol. 27, no. 3, pp. 227–241, 1998.

[67] D. Crandall, A. Owens, N. Snavely, and D. Huttenlocher, “Discrete-continuous optimization for large-scale structure from motion,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’11), 2011.

[68] M. Havlena, A. Torii, and T. Pajdla, “Efficient structure from motion by graph optimization,” in Proceedings of the European Conference on Computer Vision (ECCV ’10), 2010.

[69] R. Gherardi, M. Farenzena, and A. Fusiello, “Improving the efficiency of hierarchical structure-and-motion,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’10), pp. 1594–1600, June 2010.

[70] R. Roberts, S. Sinha, R. Szeliski, and D. Steedly, “Structure from motion for scenes with large duplicate structures,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’11), 2011.

[71] T. Fang and L. Quan, “Resampling structure from motion,” in Proceedings of the European Conference on Computer Vision (ECCV ’10), 2010.

[72] S. Savarese and S. Y. Z. Bao, “Semantic structure from motion,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’11), 2011.

[73] A. Bartoli and P. Sturm, “Structure-from-motion using lines: representation, triangulation, and bundle adjustment,” Computer Vision and Image Understanding, vol. 100, no. 3, pp. 416–441, 2005.

[74] M. E. Spetsakis and J. Aloimonos, “Structure from motion using line correspondences,” International Journal of Computer Vision, vol. 4, no. 3, pp. 171–183, 1990.

[75] C. J. Taylor and D. J. Kriegman, “Structure and motion from line segments in multiple images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 11, pp. 1021–1032, 1995.

[76] R. I. Hartley, “Linear method for reconstruction from lines and points,” in Proceedings of the 5th International Conference on Computer Vision (ICCV ’95), pp. 882–887, June 1995.

[77] D. Tubic, P. Hebert, and D. Laurendeau, “3D surface modeling from curves,” Image and Vision Computing, vol. 22, no. 9, pp. 719–734, 2004.

[78] J. E. Solem and A. Heyden, “Reconstructing open surfaces from image data,” International Journal of Computer Vision, vol. 69, no. 3, pp. 267–275, 2006.

[79] J. Y. Kaminski and A. Shashua, “Multiple view geometry of general algebraic curves,” International Journal of Computer Vision, vol. 56, no. 3, pp. 195–219, 2004.


[80] R. Berthilsson, K. Astrom, and A. Heyden, “Reconstruction of general curves, using factorization and bundle adjustment,” International Journal of Computer Vision, vol. 41, no. 3, pp. 171–182, 2001.

[81] C. Liang and K. Y. K. Wong, “3D reconstruction using silhouettes from unordered viewpoints,” Image and Vision Computing, vol. 28, no. 4, pp. 579–589, 2010.

[82] R. Hartley and F. Kahl, “Critical configurations for projective reconstruction from multiple views,” International Journal of Computer Vision, vol. 71, no. 1, pp. 5–47, 2007.

[83] T. Joshi, N. Ahuja, and J. Ponce, “Structure and motion estimation from dynamic silhouettes under perspective projection,” International Journal of Computer Vision, vol. 31, no. 1, pp. 31–50, 1999.

[84] X. Liu, H. Yao, and W. Gao, “Shape from silhouette outlines using an adaptive dandelion model,” Computer Vision and Image Understanding, vol. 105, no. 2, pp. 121–130, 2007.

[85] Y. Yemez and C. J. Wetherilt, “A volumetric fusion technique for surface reconstruction from silhouettes and range data,” Computer Vision and Image Understanding, vol. 105, no. 1, pp. 30–41, 2007.

[86] N. Snavely, S. M. Seitz, and R. Szeliski, “Modeling the world from Internet photo collections,” International Journal of Computer Vision, vol. 80, no. 2, pp. 189–210, 2008.

[87] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, “A comparison and evaluation of multi-view stereo reconstruction algorithms,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’06), pp. 519–526, June 2006.

[88] I. Simon, N. Snavely, and S. M. Seitz, “Scene summarization for online image collections,” in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV ’07), October 2007.

[89] S. Agarwal, Y. Furukawa, N. Snavely, B. Curless, S. M. Seitz, and R. Szeliski, “Reconstructing Rome,” Computer, vol. 43, no. 6, pp. 40–47, 2010.

[90] G. Zhang, Z. Dong, J. Jia, T. T. Wong, and H. Bao, “Efficient non-consecutive feature tracking for structure-from-motion,” in Proceedings of the European Conference on Computer Vision (ECCV ’10), 2010.

[91] E. Marchand and F. Chaumette, “Active vision for complete scene reconstruction and exploration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 1, pp. 65–72, 1999.

[92] F. Dornaika and C. K. R. Chung, “Stereo geometry from 3-D ego-motion streams,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 33, no. 2, pp. 308–323, 2003.

[93] O. Koch and S. Teller, “Wide-area egomotion estimation from known 3D structure,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’07), pp. 437–444, June 2007.

[94] P. Debevec, C. Tchou et al., “Estimating surface reflectance properties of a complex scene under captured natural illumination,” Tech. Rep. ICT-TR-06.2004, University of Southern California Institute for Creative Technologies, Marina del Rey, Calif, USA, 2004.

[95] N. Snavely, S. M. Seitz, and R. Szeliski, “Skeletal graphs for efficient structure from motion,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’08), June 2008.


Research on Non-rigid Structure from Motion: A Literature Review

Article

Dec 2015

Yaming Wang

Non-rigid Structure from Motion (NRSfM) is a classical computer vision problem. And the main methods used to solve it are in general based on shape models or trajectory models. This paper will provide an overview over kinds of solutions proposed in these researches. It not only gives out the theoretical insights proposed by researchers in recent years, but also discusses them with their pros and cons. At the same time, the progress of the research about this topic is described in detail and its long-term trend is introduced at the end. This paper is very easy to understand, which mainly introduces two practical, everyday models for the NRSfM problem, namely trajectories based model and shape based model. Both of them are based on matrix factorization technology. Inevitably, some relevant optimization methods are mentioned to solve the projection matrix and corresponding coefficients effectively. © 2015 Binary Information Press & Textile Bioengineering and Informatics Society December 2015.

AN AUTOMATIC 3D RECONSTRUCTION METHOD BASED ON MULTI-VIEW STEREO VISION FOR THE MOGAO GROTTOES

Article

Full-text available

May 2015

This paper presents an automatic three-dimensional reconstruction method based on multi-view stereo vision for the Mogao Grottoes. 3D digitization technique has been used in cultural heritage conservation and replication over the past decade, especially the methods based on binocular stereo vision. However, mismatched points are inevitable in traditional binocular stereo matching due to repeatable or similar features of binocular images. In order to reduce the probability of mismatching greatly and improve the measure precision, a portable four-camera photographic measurement system is used for 3D modelling of a scene. Four cameras of the measurement system form six binocular systems with baselines of different lengths to add extra matching constraints and offer multiple measurements. Matching error based on epipolar constraint is introduced to remove the mismatched points. Finally, an accurate point cloud can be generated by multi-images matching and sub-pixel interpolation. Delaunay triangulation and texture mapping are performed to obtain the 3D model of a scene. The method has been tested on 3D reconstruction several scenes of the Mogao Grottoes and good results verify the effectiveness of the method.

Merging Static and Dynamic Depth Cues with Optical-Flow Recovery for Creating Stereo Videos

Article

Full-text available

Jan 2013
MATH PROBL ENG

A method for estimating the depth information of a general monocular image sequence and then creating a 3D stereo video is proposed. Distinguishing between foreground and background is possible without additional information, and then foreground pixels are moved to create the binocular image. The proposed depth estimation method is based on coarse-to-fine strategy. By applying the CID method in the spatial domain, the sharpness and the contrast of an image can be improved by the distance of the region based on its color. Then a coarse depth map of the image can be generated. An optical-flow method based on temporal information is then used to search and compare the block motion status between previous and current frames, and then the distance of the block can be estimated according to the amount of block motion. Finally, the static and motion depth information is integrated to create the fine depth map. By shifting foreground pixels based on the depth information, a binocular image pair can be created. A sense of 3D stereo can be obtained without glasses by an autostereoscopic 3D display.

A Review: Recent automatic Algorithms for the Detection of Brain Tumor

Book

Feb 2022

A Review: Recent Automatic Algorithms for the Segmentation of Brain Tumor MRI

Chapter

Full-text available

Jan 2022

Medical imaging techniques are a vital tool in disease diagnosis. The images are being developed to satisfy the growing need for important information from medical image scans by anticipating constitutional tissues for clinical analysis. The application of deep learning techniques is increasing with the demand for automatic diagnosis of medical imaging. Different layers are used in deep learning models to represent data abstraction and construct computational models. Imaging techniques allow medical experts such as radiologists to correctly recognize a patient’s condition, making medical procedures more accessible and automated. The review’s primary goal is to present a study on recent brain tumor detection segmentation and classification approaches. Brain tumors are reviewed because of their importance compared to other tumors and their high illness rate. Many brain tumor segmentation models have been described to grasp these methodologies well, along with their limits and benefits. The study focuses primarily on contemporary deep learning-based brain tumor detection technologies, such as deep generative and deep learning networks. The more advanced and recent techniques available in the literature are also reviewed to describe the methods for performing image segmentation and to emphasize the importance of segmentation models that are not used in real-time due to little or no interaction between clinicians and developers. Most research does not consider the data augmentation element of brain tumor segmentation, which is critical for improving performance. The most challenging feature, or limitation, is the fluctuation in the morphology of tumors or the intensity degree of tumors, both of which still require study in this arena.

A new application of fractional Atangana-Baleanu derivatives: Designing ABC-fractional masks in image processing

Article

Nov 2019
PHYSICA A

Based on a new definition for derivative and integral of fractional-order, several fractional masks have been presented for the use of image denoising. In each method, the process involves constructing a square and then applying it to all the corresponding blocks in the noisy image. We have measured the denoising performance of our proposed masks by employing some known indexes. They are the peak signal-to-noise ratio (PSNR), ENTROPY, and SSIM. The obtained experimental results show that our proposed masks are computationally efficient, and their performances are compatible with other standard and fractional smoothing filters.

Automatic three-dimensional reconstruction based on four-view stereo vision using checkerboard pattern

Article

May 2017

An automatic three-dimensional (3D) reconstruction method based on four-view stereo vision using checkerboard pattern is presented. Mismatches easily exist in traditional binocular stereo matching due to the repeatable or similar features of binocular images. In order to reduce the probability of mismatching and improve the measure precision, a four-camera measurement system which can add extra matching constraints and offer multiple measurements is applied in this work. Moreover, a series of different checkerboard patterns are projected onto the object to obtain dense feature points and remove mismatched points. Finally, the 3D model is generated by performing Delaunay triangulation and texture mapping on the point cloud obtained by four-view matching. This method was tested on the 3D reconstruction of a terracotta soldier sculpture and the Buddhas in the Mogao Grottoes. Their point clouds without mismatched points were obtained and less processing time was consumed in most cases relative to binocular matching. These good reconstructed models show the effectiveness of the method. © 2017, Central South University Press and Springer-Verlag GmbH Germany.

Propagation for feature matching using triangular constraints

Conference Paper

Dec 2013

This paper presents a novel Matching Propagation Framework for addressing the problem of finding better matching pairs between each two images, which is one of the most fundamental tasks in computer vision and pattern recognition. We first select initial seed points by original matching method like SIFT, and then use T-CM to explore more seed points. Finally, a triangle constraint based quasi-dense algorithm is adopted to propagate better matches around seed points. The experimental evaluation shows that our method can get a more precise matching result than classical quasi-dense algorithm. And the 3D reconstruction of the scene from our method has a good visual effect. Both experiments demonstrate the robust performance of our method.

Kinect driven 3D character animation using semantical skeleton

Conference Paper

Oct 2012

This paper introduces a Kinect driven 3D character animation system using semantically skeleton. The system obtains the capture character performance animation data from the Kinect device. For the motion data from Kinect is semantically, a semantic skeleton is embedded into the character model. The semantic skeleton was firstly selected from a motion capture database, and then using the normal characteristic value around joint can be computed out in the called joint surface. In the real time animation synthesis process, the system embedded the skeleton based on the feature points to make the 3D character movement with the performer before Kinect realistically. Several experiments are carried out to demonstrate the efficiency of the proposed method.

Non-Rigid Structure-From-Motion With Uniqueness Constraint and Low Rank Matrix Fitting Factorization

Article

Aug 2014

Imran Khan

Non-rigid structure-from-motion is one of the most challenging problems in computer vision, especially when the only input available is 2D correspondences in a monocular video sequence. This paper proposes a new constraint-based framework that constrains the solution space of the underconstrained non-rigid structure-from-motion problem. The proposed method is based on a point trajectory approach with an additional uniqueness constraint applied to the shape coefficients, which reduces the basis required to construct the non-rigid 3D shape. A framework for occluded and incompletely measured data is also proposed using low-rank matrix fitting, a robust factorization scheme for the matrix completion problem. This method offers not only new theoretical insight but also a practical solution to non-rigid structure-from-motion. The proposed method compares favorably with the state of the art in non-rigid structure-from-motion, providing improved results on high-frequency deformations of both articulated and simpler deformable shapes.

Estimating Surface Reflectance Properties of a Complex Scene under Captured Natural Illumination

Article

Jan 2004

We present a process for estimating spatially-varying surface reflectance of a complex scene observed under natural illumination conditions. The process uses a laser-scanned model of the scene's geometry, a set of digital images viewing the scene's surfaces under a variety of natural illumination conditions, and a set of corresponding measurements of the scene's incident illumination in each photograph. The process then employs an iterative inverse global illumination technique to compute surface colors for the scene which, when rendered under the recorded illumination conditions, best reproduce the scene's appearance in the photographs. In our process we measure BRDFs of representative surfaces in the scene to better model the non-Lambertian surface reflectance. Our process uses a novel lighting measurement apparatus to record the full dynamic range of both sunlit and cloudy natural illumination conditions. We employ Monte-Carlo global illumination, multiresolution geometry, and a texture atlas system to perform inverse global illumination on the scene. The result is a lighting-independent model of the scene that can be re-illuminated under any form of lighting. We demonstrate the process on a real-world archaeological site, showing that the technique can produce novel illumination renderings consistent with real photographs as well as reflectance properties that are consistent with ground-truth reflectance measurements.

The Geometry of Multiple Images

Article

Jan 2001

Photo tourism: Exploring photo collections in 3D

Article

Jan 2006

We present a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface. Our system consists of an image-based modeling front end that automatically computes the viewpoint of each photograph as well as a sparse 3D model of the scene and image-to-model correspondences. Our photo explorer uses image-based rendering techniques to smoothly transition between photographs, while also enabling full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps. Our system also makes it easy to construct photo tours of scenic or historic locations, and to annotate image details, which are automatically transferred to other relevant images. We demonstrate our system on several large personal photo collections as well as images gathered from Internet photo sharing sites.

Triangulation

Article

Jan 1994

Detection and Tracking of Point Features

Article

Jan 1991
INT J COMPUT VISION

The factorization method described in this series of reports requires an algorithm to track the motion of features in an image stream. Given the small inter-frame displacement made possible by the factorization approach, the best tracking method turns out to be the one proposed by Lucas and Kanade in 1981. The method defines the measure of match between fixed-size feature windows in the past and current frame as the sum of squared intensity differences over the windows. The displacement is then defined as the one that minimizes this sum. For small motions, a linearization of the image intensities leads to a Newton-Raphson style minimization. In this report, after rederiving the method in a physically intuitive way, we answer the crucial question of how to choose the feature windows that are best suited for tracking. Our selection criterion is based directly on the definition of the tracking algorithm, and expresses how well a feature can be tracked. As a result, the criterion is optima...
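
The tracking step summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lucas_kanade_step`, the parameters `half` and `min_eig_threshold`, and the use of central differences for the image gradients are all assumptions made for illustration. The sketch linearizes the sum of squared intensity differences over a fixed-size window, solves the resulting 2x2 system for the displacement, and uses the smaller eigenvalue of the gradient matrix as a measure of how well the window can be tracked.

```python
import math


def lucas_kanade_step(prev, curr, cx, cy, half=2, min_eig_threshold=1e-6):
    """One linearized Lucas-Kanade update for the window centred at (cx, cy).

    prev, curr: 2D lists of intensities indexed as image[y][x].
    Model (assumed): curr(x, y) = prev(x - dx, y - dy).
    The window plus a one-pixel border must lie inside the image.
    Returns (dx, dy, min_eig), where min_eig is the smaller eigenvalue
    of the 2x2 gradient matrix G and indicates trackability.
    """
    gxx = gxy = gyy = 0.0  # entries of G = sum(grad * grad^T)
    bx = by = 0.0          # right-hand side b = -sum(grad * I_t)
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            # Central-difference spatial gradients of the previous frame.
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0
            # Temporal difference between the two frames.
            it = curr[y][x] - prev[y][x]
            gxx += ix * ix
            gxy += ix * iy
            gyy += iy * iy
            bx -= ix * it
            by -= iy * it

    # Smaller eigenvalue of the symmetric 2x2 matrix G.
    trace = gxx + gyy
    disc = math.sqrt((gxx - gyy) ** 2 + 4.0 * gxy ** 2)
    min_eig = (trace - disc) / 2.0
    if min_eig < min_eig_threshold:
        raise ValueError("window has too little texture to be tracked")

    # Solve G d = b with the closed-form 2x2 inverse.
    det = gxx * gyy - gxy * gxy
    dx = (gyy * bx - gxy * by) / det
    dy = (gxx * by - gxy * bx) / det
    return dx, dy, min_eig


# Usage: a synthetic quadratic bowl shifted by (0.3, -0.2) pixels.
prev_img = [[(x - 4) ** 2 + (y - 4) ** 2 for x in range(9)] for y in range(9)]
curr_img = [[(x - 4.3) ** 2 + (y - 3.8) ** 2 for x in range(9)] for y in range(9)]
print(lucas_kanade_step(prev_img, curr_img, 4, 4))
```

On this synthetic example the single step recovers the shift of (0.3, -0.2) up to rounding, because the second-order terms cancel over the symmetric window. On real images the step would typically be iterated, with the window resampled at the updated position, which is the Newton-Raphson style minimization the abstract refers to.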

Multiple view geometry in computer vision. With foreword by Olivier Faugeras. 2nd edition

Article

Jan 2003

A unifying framework for structure and motion recovery from image sequences

Conference Paper

Jan 1995

The paper proposes a statistical framework that enables 3D structure and motion to be computed optimally from an image sequence, on the assumption that feature measurement errors are independent and Gaussian distributed. The analysis and results demonstrate that computing both camera/scene motion and 3D structure is essential to computing either with any accuracy. Having computed optimal estimates of structure and motion over a small number of initial images, a recursive version of the algorithm (previously reported) recomputes suboptimal estimates given new image data. The algorithm is designed explicitly for real-time implementation, and its complexity is proportional to the number of tracked features. 3D projective, affine, and Euclidean models of structure and motion recovery have been implemented, incorporating both point and line features into the computation. The framework can handle any feature type and camera model that can be encapsulated as a projection equation from scene to image.

A Computer Algorithm for Reconstructing a Scene from Two Projections

Article

Sep 1981
NATURE

H. C. Longuet-Higgins

A simple algorithm for computing the three-dimensional structure of a scene from a correlated pair of perspective projections is described here, when the spatial relationship between the two projections is unknown. This problem is relevant not only to photographic surveying but also to binocular vision, where the non-visual information available to the observer about the orientation and focal length of each eye is much less accurate than the optical information supplied by the retinal images themselves. The problem also arises in monocular perception of motion, where the two projections represent views which are separated in time as well as space. As Marr and Poggio have noted, the fusing of two images to produce a three-dimensional percept involves two distinct processes: the establishment of a 1:1 correspondence between image points in the two views (the 'correspondence problem') and the use of the associated disparities for determining the distances of visible elements in the scene. I shall assume that the correspondence problem has been solved; the problem of reconstructing the scene then reduces to that of finding the relative orientation of the two viewpoints.

Shape and motion from image streams under orthography: A factorization approach

Article

Jan 1992

A Multi-stage Linear Approach to Structure from Motion

Conference Paper

Sep 2010

We present a new structure from motion (Sfm) technique based on point and vanishing point (VP) matches in images. First, all global camera rotations are computed from VP matches as well as relative rotation estimates obtained from pairwise image matches. A new multi-staged linear technique is then used to estimate all camera translations and 3D points simultaneously. The proposed method involves first performing pairwise reconstructions, then robustly aligning these in pairs, and finally aligning all of them globally by simultaneously estimating their unknown relative scales and translations. In doing so, measurements inconsistent in three views are efficiently removed. Unlike sequential Sfm, the proposed method treats all images equally, is easy to parallelize, and does not require intermediate bundle adjustments. There is also a reduction of drift and significant speedups of up to two orders of magnitude over sequential Sfm. We compare our method with a standard Sfm pipeline [1] and demonstrate that our linear estimates are accurate on a variety of datasets, and can serve as good initializations for final bundle adjustment. Because we exploit VPs when available, our approach is particularly well-suited to the reconstruction of man-made scenes.