Research on Indoor Environment Reconstruction Based on Omnidirectional Laser Structured-Light Vision
Ivan Kholodilin
December 2021
Chinese Library Classification (CLC): TP13
UDC Classification: 621.3
Research on Indoor Environment Reconstruction Based on Omnidirectional Laser Structured-Light Vision
Candidate: Ivan Kholodilin
School: School of Automation
Supervisor: Prof. Wang Qinglin
Chair of the Defense Committee: Research Professor Xu De
Degree Applied for: Doctor of Engineering
Major: Control Science and Engineering
Degree-Granting Institution: Beijing Institute of Technology
Date of Thesis Defense: December 2021
Reconstruction of the Indoor Environment based on
Omnidirectional Vision with the Laser Structured Lights
Candidate Name: Ivan Kholodilin
School or Department: School of Automation
Faculty Mentor: Prof. Qinglin Wang
Chair, Thesis Committee: Prof. De Xu
Degree Applied: Doctor of Philosophy
Major: Control Science and Engineering
Degree Conferred by: Beijing Institute of Technology
Date of Defense: December 2021
Declaration of Originality
I hereby solemnly declare that the submitted thesis presents the research results obtained through my own work under the guidance of my supervisor. To the best of my knowledge, except where specifically noted and acknowledged in the text, this thesis contains no research results previously published or written by others, nor any material that has been used to obtain a degree or certificate at Beijing Institute of Technology or any other educational institution. Any contribution to this research made by colleagues with whom I have worked has been explicitly acknowledged in the thesis.
I hereby make this declaration.
Signature:            Date:
Statement on the Use of the Thesis
I fully understand the regulations of Beijing Institute of Technology on the preservation and use of dissertations, which include the following: the university has the right to preserve the original and copies of the thesis and to submit them to relevant authorities; the university may reproduce and preserve the thesis by photocopying, reduced-size printing, or other copying means; the university may allow the thesis to be consulted or borrowed; the university may, for the purpose of academic exchange, reproduce, give away, and exchange copies of the thesis; and the university may publish all or part of the thesis (for classified theses, these provisions apply after declassification).
Signature:            Date:
Supervisor's signature:            Date:
Abstract
Indoor reconstruction is an important topic in computer vision, contributing to various applications such as virtual and augmented reality, layout recovery, and mobile robot navigation. Perception and sensing are an essential part of reconstructing unknown environments. This thesis contributes to methods for the extrinsic calibration of a vision system and for the 3D reconstruction of indoor environments based on omnidirectional vision with laser illumination.
Calibration techniques for vision systems with laser illumination have improved steadily. However, for omnidirectional vision systems these techniques remain limited to particular configurations. Therefore, an improved omnidirectional vision system with laser illumination in a flexible configuration, together with a method for its extrinsic calibration, is proposed. Results of experiments and simulations demonstrate that the proposed calibration method is more robust and provides higher measurement accuracy.
In general, reconstruction methods can be based on passive or active sensing techniques. Passive vision systems depend on detected environmental features and operate under ambient lighting conditions. As a result, when the reconstructed surface has insufficient texture, e.g. simple long corridors, such methods yield low matching accuracy. These problems can be alleviated by active vision systems, e.g. a structured light pattern can be easily detected by a camera. Recent studies also apply deep learning to recover the 3D structure of the environment. In this thesis, a novel approach for 3D reconstruction is proposed, which combines deep learning with an active omnidirectional vision system. The proposed reconstruction technique is simple, robust, and accurate.
Recent advances in deep learning require large amounts of annotated training data gathered in environments with a variety of conditions. Thus, developing and testing navigation algorithms for mobile robots can be expensive and time-consuming. Motivated by these problems, a photo-realistic simulation platform is developed. Built with Unity, the simulation platform integrates sensors, mobile robots, and elements of the indoor environment, and facilitates the generation of synthetic photorealistic datasets with automatic ground-truth annotations. The proposed simulator and supporting materials are available online: http://www.ilabit.org or https://ilabit4.wixsite.com/mysite-1
Keywords: Extrinsic Calibration, Measurement, Omnidirectional Vision System,
Structured Light, Simulation, Mapping, Environment Reconstruction
Acknowledgement
Studying in China completely changed my perception of the educational process. From childhood, Chinese kids become familiar with educational centers, mostly for studying the English language. The educational process never stops: during their school and university years, Chinese people study very hard, spending all of their time in the library or in the laboratory. It seemed unusual and strange to me in the beginning, but now I am sure this approach is the best investment in one's future. No wonder that China is so powerful today, and it was a pleasure for me to be a part of this country, which is why I have been intensively studying the Chinese language! However, the Covid-19 pandemic has changed the rules, and I currently cannot return to Beijing for the defense procedure or to realize my future plans, but I am trying to keep a Chinese presence of mind, namely to work hard and not get depressed.
I would like to express my deepest appreciation and gratitude to my research supervisor, Professor Wang Qinglin. I would not have made it through these years at the university without the support you gave me in finding my way in life. I hope my occasional recklessness and spontaneity did not give you too many grey hairs. Thank you for teaching me everything I know about computer vision and robotics. The knowledge you have given me made it possible for me to reach new heights and do things that would have been unimaginable a few years ago. Thank you for being like a father to me all these years, for giving me guidance on various academic matters, and for sharing your life with me. Thank you and Associate Professor Li Yuan for giving me the opportunity to work in the State Key Laboratory of Intelligent Control and Decision of Complex Systems, and for your continuous support throughout the years!
I would also like to thank my lab-mates Qi Yang, Jiang Zhao Guo, and Li Meng Meng. I am grateful for their advice as well as their active support in conducting various experiments for my research and in my academic life.
I thank my fellows in China: Luis Lago, Dmitry Zuev, Mark Rudak, Kanat Orazaliev, Shazad, and Imran, for their helpful insights into my research and for accompanying me throughout these years.
Last but not least, I would like to thank my mom for her support and love.
List of Publications
Journal Publications:
1. I. Kholodilin, Y. Li and Q. Wang, “Omnidirectional Vision System With Laser Illumination
in a Flexible Configuration and Its Calibration by One Single Snapshot,” in IEEE
Transactions on Instrumentation and Measurement, vol. 69, no. 11, pp. 9105-9118, Nov.
2020.
SCI Journal, Rank – Q1, Impact Factor – 3.658
doi: 10.1109/TIM.2020.2998598.
2. I. Kholodilin, Y. Li, Q. Wang, and P. Bourke, “Calibration and three-dimensional reconstruction with a photorealistic simulator based on the omnidirectional vision system,” 2021.
Accepted on 25 Oct. 2021: International Journal of Advanced Robotic Systems.
SCI Journal, Rank – Q2, Impact Factor – 1.652
doi: 10.1177/17298814211059313.
Conference Publications:
1. I. Kholodilin, A. Nesterov, “Future of the Electrical Engineering Education on the AR and
VR Basis,” in Proc. of the International Conference on Video, Signal and Image Processing,
pp. 113–117, Oct. 2019.
doi: 10.1145/3369318.3369337.
Table of Contents
Abstract .......................................................................................................................................... 6
Acknowledgement .......................................................................................................................... 7
List of Publications ........................................................................................................................ 8
Table of Contents ........................................................................................................................... I
List of Figures ............................................................................................................................... V
Abbreviations and Symbols .......................................................................................................... 9
Chapter 1 Introduction ......................................................................................................... 10
1.1 2D Mapping and 3D Reconstruction of the Indoor Environment ................................... 11
1.1.1 General reconstruction techniques ......................................................................... 11
1.1.2 Passive vision ........................................................................................................ 12
1.1.3 Active vision .......................................................................................................... 16
1.2 Research Challenges ....................................................................................................... 20
1.2.1 Sensing techniques ................................................................................................ 20
1.2.2 Field of view .......................................................................................................... 20
1.2.3 Results validation .................................................................................................. 21
1.2.4 Structured light ...................................................................................................... 21
1.3 Problem Statement .......................................................................................................... 21
1.4 Thesis Aims and Objectives ............................................................................................ 23
1.4.1 Aim 1 – Literature review ..................................................................................... 23
1.4.2 Aim 2 – A novel omni-vision system .................................................................... 23
1.4.3 Aim 3 – Novel methods of extrinsic calibration .................................................... 24
1.4.4 Aim 4 – A novel 3D reconstruction technique ...................................................... 24
1.4.5 Tasks ...................................................................................................................... 25
1.5 Thesis Contributions ....................................................................................................... 25
1.6 Thesis Road Map ............................................................................................................ 26
Chapter 2 Literature Review ................................................................................................ 27
2.1 Virtual Environment ....................................................................................................... 27
2.1.1 Introduction ........................................................................................................... 27
2.1.2 Review on simulators ............................................................................................ 28
2.1.3 Omni-vision simulation ......................................................................................... 34
2.2 Extrinsic Calibration ....................................................................................................... 35
2.2.1 Review on methods ............................................................................................... 36
2.2.2 Problem formulation .............................................................................................. 40
2.3 Reconstruction of the indoor environment ..................................................................... 41
2.3.1 Stereo vision .......................................................................................................... 41
2.3.2 Kinect ..................................................................................................................... 42
2.3.3 Lidars ..................................................................................................................... 43
2.3.4 Structured light ...................................................................................................... 44
2.3.5 Deep learning ......................................................................................................... 45
2.4 Conclusion ...................................................................................................................... 46
Chapter 3 Omni-vision System Research Platform ............................................................ 47
3.1 Main Contributions of this chapter ................................................................................. 47
3.2 An Improved Omnidirectional Vision System ............................................................... 47
3.3 Intrinsic Calibration ........................................................................................................ 50
3.3.1 Calibration principle .............................................................................................. 50
3.3.2 Camera model ........................................................................................................ 53
3.3.3 Implementation ...................................................................................................... 55
3.3.4 Results ................................................................................................................... 56
3.4 Preliminary Work on Extrinsic Calibration .................................................................... 57
3.4.1 System model ........................................................................................................ 57
3.4.2 Calibration procedure ............................................................................................ 59
3.4.3 Discussion .............................................................................................................. 62
3.5 Development of the Simulation Environment ................................................................ 63
3.5.1 Motivation ............................................................................................................. 63
3.5.2 Simulation Environment ........................................................................................ 65
3.5.3 Platform features .................................................................................................... 67
3.5.4 Capabilities ............................................................................................................ 69
3.5.5 Map transform ....................................................................................................... 70
3.5.6 Experiment setup ................................................................................................... 72
3.5.7 Experiment results ................................................................................................. 73
3.5.8 Discussion .............................................................................................................. 74
Chapter 4 Extrinsic Calibration and 2D Mapping ............................................................. 75
4.1 Main Contributions of this chapter ................................................................................. 75
4.2 Proposed Method of Extrinsic Calibration ..................................................................... 75
4.2.1 System model ........................................................................................................ 75
4.2.2 Calibration procedure in a general form ................................................................ 76
4.2.3 Camera Calibration Placed to the Indoor Environment ......................................... 77
4.2.4 Laser Plane Calibration ......................................................................................... 80
4.3 Experiment Setup ............................................................................................................ 82
4.3.1 Self-evaluation technique ...................................................................................... 83
4.3.2 Comparison with the state-of-the-art calibration method ...................................... 84
4.3.3 2D mapping ........................................................................................................... 85
4.3.4 Real data ................................................................................................................ 85
4.4 Experiment Results ......................................................................................................... 86
4.4.1 Self-evaluation technique ...................................................................................... 86
4.4.2 Comparison with the state-of-the-art calibration method ...................................... 89
4.4.3 2D mapping ........................................................................................................... 89
4.4.4 Real data ................................................................................................................ 92
Chapter 5 Extrinsic Calibration and 3D Reconstruction ................................................... 95
5.1 Motivation ....................................................................................................................... 95
5.2 System Model ................................................................................................................. 95
5.3 Calibration Procedure ..................................................................................................... 96
5.3.1 Extrinsic parameters of the camera ....................................................................... 96
5.3.2 Extrinsic parameters of the laser plane .................................................................. 98
5.4 Evaluation ....................................................................................................................... 99
5.4.1 Experiment Setup .................................................................................................. 99
5.4.2 Results ................................................................................................................. 101
5.4.3 Discussion ............................................................................................................ 102
5.5 3D Reconstruction Method ........................................................................................... 102
5.5.1 Semantic Segmentation ....................................................................................... 103
5.5.2 Feature Extraction ................................................................................................ 104
5.5.3 Experimental Setup .............................................................................................. 105
5.5.4 Results ................................................................................................................. 105
5.5.5 Discussion ............................................................................................................ 106
5.6 Perspective Projection ................................................................................................... 107
5.6.1 Preparation ........................................................................................................... 107
5.6.2 Perspective projection .......................................................................................... 107
5.6.3 Proposed 3D Reconstruction Technique ............................................................. 111
5.6.4 Comparison with commonly used reconstruction methods ................................. 114
Chapter 6 Conclusions ......................................................................................................... 117
6.1 Proposed Simulator ....................................................................................................... 117
6.2 Omni-Vision System and its Extrinsic Calibration ....................................................... 117
6.3 3D Reconstruction of the Indoor Environment ............................................................. 118
References ................................................................................................................................... 120
List of Figures
Figure 1.1: Applications related to the indoor reconstruction. ...................................................... 11
Figure 1.2: Stereo vision. ............................................................................................................... 14
Figure 1.3: Structure from motion. ................................................................................................ 16
Figure 1.4: Time-of-Flight scanners. ............................................................................................. 18
Figure 1.5: Laser range finders. ..................................................................................................... 19
Figure 1.6: Structured light. ........................................................................................................... 19
Figure 1.7: Mapping results when extrinsic calibration was not considered. ................................ 22
Figure 1.8: Configuration of the vision system with several laser emitters. ................................. 23
Figure 2.1: Webots simulator. ....................................................................................................... 29
Figure 2.2: Robotics System Toolbox. .......................................................................................... 30
Figure 2.3: Microsoft Robotics Developer Studio. ........................................................................ 30
Figure 2.4: USARSim simulator. .................................................................................................. 31
Figure 2.5: OpenSim simulator. .................................................................................................... 32
Figure 2.6: Robot Soccer Simulator with ÜberSim. ...................................................................... 32
Figure 2.7: Simbad simulator. ....................................................................................................... 33
Figure 2.8: Ways of simulating the omni-camera. ........................................................................ 34
Figure 2.9: Ways of simulating the omni-camera. ........................................................................ 35
Figure 2.10: Calibration methods for obtaining extrinsic parameters of the vision system. ......... 37
Figure 2.11: Setup Laser – Camera used in Zhang and Pless [100]. ............................................. 37
Figure 2.12: Setup Laser – Camera used in Vasconcelos et al [109]. ........................................... 38
Figure 2.13: Setup Laser – Camera used in Bok et al [110]. ......................................................... 39
Figure 2.14: Example of 3D points cloud. ..................................................................................... 42
Figure 2.15: 3D reconstruction based on the Kinect sensor. The image above shows single Kinect.
The image below shows multiple Kinects. .................................................................................... 43
Figure 2.16: 3D reconstruction based on the Lidar sensor. ........................................................... 44
Figure 2.17: 3D reconstruction based on the rotated structured light. .......................................... 45
Figure 2.18: Layout recovery with deep learning. ......................................................................... 46
Figure 3.1: Configuration of the vision system. A – fisheye camera Ricoh Theta S. B –
omnidirectional laser emitter. C – snapshot captured by fisheye camera. .................................... 47
Figure 3.2: Types of Cameras. A – Perspective camera. B – Catadioptric camera. C – Fisheye
camera. ........................................................................................................................................... 48
Figure 3.3: Mei’s model projection steps. ..................................................................................... 52
Figure 3.4: Pre-calibration procedure: preparing images for the calibration procedure. .............. 56
Figure 3.5: The evaluation of the intrinsic camera calibration. ..................................................... 57
Figure 3.6: Calibration images. Left image shows the single checkerboard patterns. Right image shows the checkerboard patterns with the laser beam. .................................................................. 59
Figure 3.7: Initial projection of the target (up- and front-views). ................................................. 60
Figure 3.8: Calibrated projection of the target (up- and front-views). .......................................... 60
Figure 3.9: From left to right: Input image; Extracted laser beam; Projection of the laser and
points of the left checkerboard pattern; Projection of the laser and points of the front
checkerboard pattern. ..................................................................................................................... 61
Figure 3.10: Fitting plane to the laser points. ................................................................................ 61
Figure 3.11: Verification by mapping. Left image shows checkerboard patterns with the laser beam. Middle image shows the non-calibrated map. Right image shows the calibrated map. .................... 62
Figure 3.12: Omnidirectional vision system. Left image shows elements of the system. Right
image shows snapshot captured by fisheye camera ....................................................................... 66
Figure 3.13: Several modes supported by the simulator, from left to right: ordinary mode,
semantic labeling mode, depth mode. ........................................................................................... 68
Figure 3.14: The calibration screen. .............................................................................................. 69
Figure 3.15: The measurement screen. .......................................................................................... 70
Figure 3.16: Left image shows initial map obtained with laser emitter; right image shows shifted
map. ............................................................................................................................................... 71
Figure 3.17: Left image shows binary map supported by the Robotics System Toolbox; right
image shows logical map supported by the Robotics Toolbox. .................................................... 72
Figure 3.18: Left image shows navigation to the target position with Robotics System Toolbox;
right image shows navigation to the target position with Robotics Toolbox. ............................... 73
Figure 4.1: Calibration target. Green border shows target in a cross-section. .............................. 77
Figure 4.2: Blue – initial projection. Red – projection with the optimized camera pitch and roll.
....................................................................................................................................................... 78
Figure 4.3: Optimized camera yaw. ............................................................................................... 80
Figure 4.4: Blue – initial projection. Red – projection with the optimized laser pitch and roll. ... 81
Figure 4.5: Optimized laser distance. ............................................................................................ 82
Figure 4.6: Salt-and-pepper noise. The left image relates to the camera calibration; the right image relates to the laser plane calibration. Magenta shows the extracted image points. The red curve is fitted to the majority of the points. The blue points are the curve points used for calibration; they are also superimposed on the bottom image for better visual understanding.
.......................................................................................................................................
Figure 4.7: Mapping of the indoor environment. Red color – proposed method; blue color – standard method. Below each map, the index of the particular experiment is shown together with the configuration of the vision system, where the values in brackets give [pitch; roll; yaw] in degrees, respectively. ......................................................................................................................
Figure 4.8: Calibration target. Green border shows target in a cross-section. .............................. 92
Figure 4.9: Extrinsic calibration. ................................................................................................... 93
Figure 4.10: Mapping of the indoor environment. ........................................................................ 94
Figure 5.1: A shows previous configuration. B shows proposed configuration. ........................... 95
Figure 5.2: The proposed calibration target. .................................................................................. 96
Figure 5.3: A – initial projection. B – projection with the optimized camera pitch and roll ........ 97
Figure 5.4: A – initial projection. B – projection with the optimized laser pitch, roll and distance.
....................................................................................................................................................... 98
Figure 5.5: Configuration of the vision system #1. A – Method 1. B – Proposed. C – Method 2.
..................................................................................................................................................... 100
Figure 5.6: Configuration of the vision system #2. A – Proposed. B – Method 2. ..................... 100
Figure 5.7: Overview: From a single fisheye snapshot, the proposed method combines semantic
segmentation and laser data to generate the 3D model of the indoor environment. .................... 103
Figure 5.8: Neural network performance evaluation. ResNet18 is shown in red color and
ResNet50 is shown in blue color. ................................................................................................ 105
Figure 5.9: Training results. ........................................................................................................ 106
Figure 5.10: Upper row shows extracted regions of the indoor environment (floor, ceiling, walls,
and doors) with the visible laser beam. Lower row shows corresponding perspective projection
for extracted regions of the indoor environment. ........................................................................ 107
Figure 5.11: Initialization of the virtual camera. ......................................................................... 108
Figure 5.12: Perspective image. .................................................................................................. 109
Figure 5.13: Transformation between world and camera coordinates. ....................................... 110
Figure 5.14: Transformation to the image coordinates. ............................................................... 110
Figure 5.15: Reconstruction results (single reconstruction and global map are presented). ....... 114
Figure 5.16: Left image shows reconstruction results on a passive method. Right image shows
reconstruction results on the proposed method. .......................................................................... 115
Figure 5.17: Images above show ground truth point clouds. Images below show reconstruction
results on the proposed method. .................................................................................................. 116
List of Tables
Table 3-1: Configuration parameters ............................................................................................. 56
Table 3-2: The Evaluation of the Intrinsic Camera Calibration .................................................... 57
Table 3-3: The Evaluation of the Extrinsic Calibration ................................................................ 62
Table 3-4: Summary of the considered calibration method .......................................................... 62
Table 3-5: Operating Modes of the Simulator ............................................................................... 68
Table 3-6: Comparison Analysis ................................................................................................... 74
Table 4-1: The Configurations of the Calibration Methods .......................................................... 84
Table 4-2: The Configuration of the Simulation Environment ..................................................... 85
Table 4-3: The Configuration of the Real Environment ................................................................ 85
Table 4-4: The Evaluation of the Intrinsic Camera Calibration .................................................... 87
Table 4-5: The Evaluation of the Extrinsic Calibration ................................................................ 89
Table 4-6: The Evaluation of the Mapping Results ....................................................................... 90
Table 4-7: The Evaluation of the Mapping Results ....................................................................... 91
Table 4-8: The Evaluation of the Extrinsic Calibration ................................................................ 93
Table 4-9: The Evaluation of the Mapping Results ....................................................................... 94
Table 5-1: The Configurations of the Calibration Methods ......................................................... 99
Table 5-2: The Evaluation of the Extrinsic Calibration .............................................................. 101
Table 5-3: The Evaluation of the Trained Networks ................................................................... 106
Table 5-4: Rotations of the cube face .......................................................................................... 108
Abbreviations and Symbols
2D – Two-dimensional space
3D – Three-dimensional space
AE – Absolute error
CV – Computer vision
d – Noise density
DOF – Degree of freedom
FOV – Field of view
HRI – Human-robot interaction
IMU – Inertial Measurement Unit
LRF – Laser range finders
MAE – Mean absolute error
MLE – Maximum likelihood estimation
PRM – Probabilistic Roadmap
RMSE – Root mean squared error
SFM – Structure from motion
SLAM – Simultaneous localization and mapping
TCP – Transmission Control Protocol
ToF – Time-of-Flight
UAV – Unmanned aerial vehicles
UI – User interface
Notation
X, Y, Z – World coordinates
λ – Depth scale
u', v' – Distorted coordinates in the sensor image plane
u, v – Undistorted coordinates in a virtual normalized image plane
f(·) – Polynomial
R – Rotation matrix
T – Translation matrix
σ – Standard deviation
Chapter 1 Introduction
Robots and unmanned systems are on the rise in the new era of artificial intelligence and the fourth industrial revolution [1–13]. Advances in digital image processing have improved the development and the intelligence of robots and made them more popular [14]. One of the main reasons we do not have mobile robots around us today is the difficulty of processing and interpreting exteroceptive information from the world. This is key for mobile robotics, since a robot must understand its environment before it can interact with it.
The quality and performance of applications that employ any form of sensing depend largely on the precision and accuracy of the sensors integrated into the robotic system. Unfortunately, measurements from any type of sensor are always corrupted by noise; in many cases, the characteristics of the physical sensor (e.g., its transfer function in the case where it is modeled as a linear time-invariant system) deviate from the ideal characteristics. Therefore, there is a constant technological demand for more accurate and precise sensors.
However, increasing the accuracy and precision of sensors may come at a cost; for example, it may require modifying the hardware design of the sensing part, which typically increases the cost of the sensors and in turn potentially limits their applications. This intuitively explains why there is a strong drive to increase the accuracy and precision of sensors not by modifying their hardware but through statistical signal processing tools. In other words, assuming the availability of extra computational resources or information, one may implement calibration techniques.
A calibrated sensor that provides depth perception allows a robot to build a representation of the free space and to avoid collisions, thus creating useful maps for self-navigation, and to perform recognition through shape description. In particular, the above-mentioned applications can substantially benefit from an approach based on omnidirectional cameras, in the sense that information covering a wide field of interest can be obtained in a single capture. However, with the camera alone (structure from motion) or several cameras (stereo vision), it can be difficult to handle such images for feature extraction because of their high distortion. Moreover, the inherent characteristics of the environment also have a deep impact on the results. For instance, the illumination level and pixel similarity are two well-known factors that may impose several constraints on the feature extraction process. The effect of these limiting characteristics, inherent to real environments, is
difficult to handle and is known as the correspondence problem of stereo vision. A solution can be found by integrating structured light into the system. An overview of the sensing techniques underlying reconstruction methods is given throughout this section.
1.1 2D Mapping and 3D Reconstruction of the Indoor Environment
1.1.1 General reconstruction techniques
Awareness of the mobile robot's 3D surroundings, estimating its own position, defining motion, and understanding depth and range/field of view are integral parts of computer vision. Modern technologies, and computer vision in particular, try to approach the perception of depth and volume achieved by human vision. At the current stage, the capabilities of computer vision are still quite far from those of human vision; nevertheless, many methods have been developed for obtaining 3D information about the surroundings. Rather than discussing them all, this thesis focuses on reconstruction techniques relevant to our topic, namely the reconstruction of the indoor environment.
Indoor reconstruction is a crucial technique in computer vision, contributing to various applications (see Figure 1.1), such as virtual and augmented reality [15, 16]; layout recovery [17, 18]; and mobile navigation [19-21]. A wide variety of approaches and algorithms have been proposed to tackle this complex problem. Perception and sensing are an integral part of reconstructing a previously unknown environment: the robot must be able to estimate the three-dimensional structure of the environment in order to perform useful tasks. In general, reconstruction methods can be based on passive or active sensing techniques, and each has its own relative merits.
Figure 1.1: Applications related to the indoor reconstruction.
Many methods for evaluating the depth of a scene operate similarly to human visual perception:
• stereo vision [22-24];
• structure from motion [25-27];
• shadow-based methods [28-30];
• tactile methods [31-33].
There are also methods based on other physical principles:
• estimating the travel time of light to obstacles and back (time-of-flight) [34-36];
• phase methods, based on the principles of interference and holography [37-39];
• structured light [40-43].
Methods for measuring 3D objects can also be divided into contact methods (mechanical sensors) and non-contact methods (stereo vision; laser and X-ray scanning).
Contact methods obtain information about the object by direct physical contact with its surface. Such devices are used mainly in manufacturing processes and must be sufficiently accurate. Several measurements can be taken to prepare a surface grid for building the future model. One of the main advantages of this approach is a high degree of modeling control, owing to the manual hand movements of the operator. However, moving the probe by hand to obtain a high-quality model can be very slow: the scanning rate does not exceed several hundred hertz, and additional time must then be dedicated to manual adjustments to obtain the final model. Moreover, another shortcoming of contact scanners is the need for direct contact with the surface of the scanned object; consequently, there is a risk of damaging the object during the scanning process.
In recent years, significant progress has been achieved in the development of contactless vision methods. Non-contact methods can be divided into two groups: active and passive vision.
1.1.2 Passive vision
Range sensing is extremely important in mobile robotics, since it is a basic input for successful obstacle avoidance. As discussed in this chapter, a number of sensors are popular in robotics explicitly for their ability to recover depth estimates: ultrasonic sensors, laser rangefinders, and time-of-flight cameras. It is natural to attempt to implement ranging functionality using vision chips as well.
However, a fundamental problem with visual images makes range finding relatively
difficult. Any vision chip collapses the 3D world into a 2D image plane, thereby losing depth
information. If one can make strong assumptions regarding the size of objects in the world, or
their particular color and reflectance, then one can directly interpret the appearance of the 2D
image to recover depth. But such assumptions are rarely possible in real-world mobile robot
applications. Without such assumptions, a single picture does not provide enough information to
recover spatial information.
The general solution is to recover depth by looking at several images of the scene to gain
more information, hopefully enough to at least partially recover depth. The images used must be
different, so that taken together they provide additional information. They could differ in camera
geometry—such as the focus position or lens iris—yielding depth from focus (or defocus)
techniques. An alternative is to create different images, not by changing the camera geometry,
but by changing the camera viewpoint to a different camera position. This is the fundamental
idea behind structure from stereo (i.e., stereo vision) and structure from motion. As we will see,
stereo vision processes two distinct images taken at the same time and assumes that the relative
pose between the two cameras is known. Structure from motion conversely processes two images
taken with the same or a different camera at different times and from different unknown
positions; the problem consists in recovering both the relative motion between the views and the
depth. The 3D scene that we want to reconstruct is usually called structure.
Stereoscopic system.
Nowadays, stereo cameras are commonly available. Stereo cameras work on the principle of two-view triangulation and can be of two types: binocular stereo and structured-light stereo. A binocular-stereo camera is made of two cameras placed side by side. The relative positions and orientations of the two cameras with respect to each other are fixed in such devices and can also be easily calibrated. By analyzing the differences between the images, namely by establishing point correspondences between them, we can determine the distance from the vision system to each point in 3D space. In principle, this method is similar to stereoscopic human vision. The simplest approach is to capture two snapshots of the scene and convert them into a three-dimensional representation. As mentioned earlier, obtaining three-dimensional points requires establishing point correspondences between the two pictures; after that, triangulation makes it possible to determine the depth of the scene. In stereo vision we can identify two major problems:
1. The correspondence problem;
2. 3D reconstruction.
The first consists in matching (pairing) points of the two images that are projections of the same point in the scene. These matching points are called corresponding points or correspondences (Figure 1.2). The correspondence search is based on the assumption that the two images of the same scene do not differ too much, that is, a feature in the scene is expected to appear very similar in both images. Using a suitable image similarity metric, a given point in the first image can be paired with one point in the second image (a minimal matching sketch is given after the list below). The problem of false correspondences makes the correspondence search challenging. False correspondences occur when a point is paired with another that is not its real conjugate. This happens because the assumption of image similarity does not always hold, for instance when the part of the scene to be paired appears under different illumination or geometric conditions. Other problems that make the correspondence search difficult are:
• Occlusions: the scene is seen by two cameras at different viewpoints, and therefore there are parts of the scene that appear in only one of the images. This means there exist points in one image that have no correspondent in the other image.
• Distortion: some surfaces in the scene are not perfectly Lambertian, that is, their behavior is partly specular. Therefore, the intensity observed by the two cameras differs for the same scene point, the more so the farther apart the cameras are.
• Projective distortion: because of perspective distortion, an object in the scene is projected differently onto the two images, the more so the farther apart the cameras are.
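As an illustration of what such an image similarity metric looks like in practice, the following fragment is a minimal sketch, not part of this thesis's pipeline: it matches the patch around a left-image pixel along the same row of a rectified right image by minimizing the sum of squared differences (SSD). The function name and parameters are hypothetical, and practical matchers add subpixel refinement, smoothness constraints, and left-right consistency checks.

```python
import numpy as np

def match_along_row(left, right, u, v, patch=5, max_disp=64):
    """Estimate the disparity of pixel (u, v) in a rectified grayscale pair
    by minimizing the sum of squared differences (SSD) between patches."""
    h = patch // 2
    ref = left[v - h:v + h + 1, u - h:u + h + 1].astype(float)
    best_disp, best_cost = 0, np.inf
    for disp in range(max_disp + 1):          # candidate disparities
        if u - disp - h < 0:                  # stop at the image border
            break
        cand = right[v - h:v + h + 1, u - disp - h:u - disp + h + 1].astype(float)
        cost = np.sum((ref - cand) ** 2)      # SSD similarity metric
        if cost < best_cost:
            best_cost, best_disp = cost, disp
    return best_disp
```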
Figure 1.2: Stereo vision.
Knowing the correspondences between the two images, the relative orientation and position of the two cameras, and the intrinsic parameters of the two cameras, it is possible to reconstruct the scene points (i.e., the structure). This reconstruction process requires the prior calibration of the stereo camera; that is, we need to calibrate the two cameras separately to estimate their intrinsic parameters, and we also need to determine their extrinsic parameters, i.e. their relative position and orientation.
Structure from motion (SFM).
This method is applied to evaluate the spatial structure of the scene. SFM aims to establish correspondences between several camera views and to simultaneously estimate the camera positions and the three-dimensional points, for example by the factorization method. However, such methods are useful mainly for offline applications that do not operate in real time, because many frames are required for pose and point computation. Mostly, SFM methods are based on a single camera, where the baseline is determined by methods estimating the camera's own motion, e.g. SLAM (simultaneous localization and mapping) [44, 45]. Correct estimation of the camera position and of the fundamental matrix itself is extremely important for defining the geometry of the scene and computing its three-dimensional structure.
When only two images are captured, the geometric situation can be modeled either by a homography, when the camera undergoes pure rotation or observes a single plane in the scene, or by epipolar geometry, when the camera motion is general and an articulated scene is observed. Measurements in the images, namely the pixel coordinates of the corresponding projections of unknown 3D scene points, can be used to form a system of equations according to the geometric constraints of the appropriate model.
As the systems of equations become more complicated when more than two cameras are involved, there is no closed-form solution to the SFM problem [46]. An obvious drawback of SFM methods is their high sensitivity to noise in the measurements. These methods are also sensitive to various degeneracies of the scene, and specific actions have to be taken to prevent failures [47].
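For reference, the two-view constraints mentioned above can be written compactly. If x̃ and x̃' are corresponding image points in homogeneous pixel coordinates, then

\[
\tilde{\mathbf{x}}' \simeq H\,\tilde{\mathbf{x}} \quad \text{(pure rotation or a planar scene)}, \qquad
\tilde{\mathbf{x}}'^{\top} F\,\tilde{\mathbf{x}} = 0 \quad \text{(general motion)},
\]

where H is a 3×3 homography and F is the fundamental matrix; these are standard relations from multiple-view geometry rather than results specific to this thesis.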
Figure 1.3: Structure from motion.
1.1.3 Active vision
A passive vision system usually consists of one or more cameras. By passive, we mean that no energy is emitted for sensing purposes. The system receives the light reflected from its surroundings passively, and the images are the only input data. On the whole, a passive system works in a similar way to the human visual system, and its equipment is simple and low-cost. Another advantage is that the properties of an object's surface can be analyzed directly, since sensing is based on ambient light reflected off the object surfaces. However, this type of technique suffers from some nontrivial difficulties. Firstly, since passive systems mainly rely on cues from the scene or illumination, e.g. texture or shading, the unknown scene becomes a source of uncertainty; as a result, the major disadvantage is low accuracy and speed. Secondly, extracting enough features and establishing point correspondences from an image sequence is a time-consuming and difficult task in many cases. Usually, there is an awkward trade-off between disparity and the overlapping area. With a large overlapping area between the images, it is easy to extract corresponding features, but the 3D reconstruction algorithm is sensitive to noise or even ill-conditioned, since the images have small disparity. On the other hand, the algorithm becomes more robust as the disparity is enlarged, but it then becomes difficult to extract the corresponding features due to occlusion, limited field of view, etc.
While the passive vision system suffers from many such difficulties, the active vision system, especially the structured light system, is designed to overcome or alleviate these problems to a certain degree. In this kind of system, one of the cameras of the passive system is replaced by an external projecting device (e.g. a laser or an LCD/DLP projector). The scene is illuminated by
light patterns emitted from the device and observed by the remaining cameras. Compared with the passive approach, active vision systems are in general more accurate and reliable.
Range cameras.
The current level of electronics development allows the time taken by light to reach an object and return to be measured directly, and the corresponding distance to be computed. 3D scanners working on this principle are called Time-of-Flight (ToF) scanners. This type of scanner illuminates the scene with near-infrared light and has a special CCD/CMOS array that can demodulate the reflected light at every pixel. By measuring the phase difference between the emitted and received amplitude-modulated light at every pixel location, the 3D geometric information of the scene can be captured instantaneously in a single exposure. There is another category of 3D range cameras that is based on the triangulation principle instead; examples from this category include the Microsoft Kinect, Asus Xtion Pro, and Fotonic P70. Some of the advantages and disadvantages of 3D range cameras over traditional photogrammetric systems and TLS instruments are listed below:
Advantages:
• 3D data capture at video frame rates: 3D geometric information can be captured at up to 100 Hz without any scan time delay.
• Active sensor: there is no correspondence issue, as the camera illuminates the scene and acquires dense 3D geometric information in a single exposure.
Disadvantages:
• Short maximum distance: the maximum unambiguous range (typically under 10 m) of a phase-based 3D range camera is limited by the modulation frequency (see the relation after this list). The maximum range of a triangulation-based 3D range camera is restricted by the baseline separation between the emitter and receiver.
• Small FOV: the largest angular FOV of 3D cameras currently on the market is 90°.
• Low measurement accuracy: even in close-range applications, only centimeter-level accuracy should be expected for each pixel due to systematic errors and a low signal-to-noise ratio.
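For the first disadvantage above, the limit can be made explicit with the standard relation (not specific to any particular camera) between the maximum unambiguous range and the modulation frequency f_mod:

\[
R_{\max} = \frac{c}{2 f_{\mathrm{mod}}},
\]

so that, for example, a modulation frequency of 15 MHz gives R_max = (3×10^8 m/s) / (2 × 15×10^6 Hz) = 10 m, consistent with the "typically under 10 m" figure quoted above.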
Figure 1.4: Time-of-Flight scanners.
Devices based on laser range finders (LRF).
The LRF is a sensor that achieves significant improvements over the ultrasonic range sensor owing to the use of laser light instead of sound. This type of sensor consists of a transmitter that illuminates a target with a collimated beam (e.g., a laser) and a receiver capable of detecting the component of light that is essentially coaxial with the transmitted beam. The LRF emits a signal that propagates at a known speed, is reflected from an object, and returns. The time of flight determines the distance traveled: if the speed of the signal is known, multiplying this speed by half the time between emitting the signal and receiving it back gives the distance from the emitter to the object. The basic operating principles of LRFs include pulsed and phase-based distance measurement methods as well as the triangulation method. A mechanical mechanism with a mirror sweeps the light beam to cover the required scene in a plane, or even in three dimensions, using a rotating, nodding mirror.
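In symbols, the pulsed time-of-flight principle described above reduces to the standard relation

\[
d = \frac{c\,\Delta t}{2},
\]

where Δt is the round-trip time of the pulse and c is the speed of light. A range resolution of 1 cm therefore requires resolving Δt = 2 × 0.01 m / (3×10^8 m/s) ≈ 67 ps, which explains the need for the picosecond-resolution electronics mentioned next.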
One way to measure the time of flight for the light beam is to use a pulsed laser and then
measure the elapsed time directly. Electronics capable of resolving picoseconds are required in
such devices, and they are therefore very expensive. Besides the cost of the LRF, attention should also be paid to an error mode that involves coherent reflection of the energy. With light, this occurs only when the beam strikes a highly polished surface. In practice, a mobile robot may encounter such surfaces in the form of a polished desktop, a file cabinet or, of course, a mirror.
Unlike ultrasonic sensors, laser rangefinders cannot detect the presence of optically transparent
materials such as glass, and this can be a significant obstacle in environments like, for example,
museums, where glass is commonly used.
Figure 1.5: Laser range finders.
Structured light.
The use of structured light is one of the most reliable methods for recovering 3D information about a scene. The method is based on projecting a light template (the structured light) onto the scene, which is then viewed by one or several cameras. Since this template is encoded, correspondences between image points and scene points can easily be found. Various templates have been developed for use in structured-light vision systems, ranging from temporally varying pattern sequences to static patterns with a variety of color encoding options [48]. Regardless of how it is created, the projected light has a known structure, and therefore the image taken by the camera can be filtered to identify the pattern’s reflection.
Note that the problem of recovering depth is in this case far simpler than the problem of passive image analysis. In passive image analysis, existing features in the environment must be used to perform correlation, whereas the present method projects a known pattern onto the environment and thereby avoids the standard correlation problem altogether. Furthermore, the structured-light sensor is an active device, so it will continue to work in dark environments as well as in environments where the objects are featureless (e.g., uniformly colored and edgeless). In contrast, stereo vision would fail in such texture-free circumstances.
Figure 1.6: Structured light.
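As an illustration of why the detection step is simple, the sketch below (Python with OpenCV; the file name and the HSV thresholds are placeholders chosen for illustration, not values used in this Thesis) extracts the pixels of a red laser stripe by color thresholding, which is essentially the filtering for the pattern’s reflection described above:

```python
import cv2
import numpy as np

def extract_red_stripe(bgr: np.ndarray) -> np.ndarray:
    """Return a binary mask of pixels likely to belong to a red laser stripe."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so two hue ranges are combined.
    lower = cv2.inRange(hsv, (0, 120, 80), (10, 255, 255))
    upper = cv2.inRange(hsv, (170, 120, 80), (180, 255, 255))
    mask = cv2.bitwise_or(lower, upper)
    # Remove isolated noise pixels with a small morphological opening.
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

if __name__ == "__main__":
    image = cv2.imread("snapshot.png")   # placeholder file name
    stripe = extract_red_stripe(image)
    ys, xs = np.nonzero(stripe)          # pixel coordinates of the stripe
    print(len(xs), "stripe pixels detected")
```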
1.2 Research Challenges
1.2.1 Sensing techniques
As mentioned earlier, passive vision systems suffer from several nontrivial difficulties, which are summarized here. Firstly, passive systems mainly rely on cues from the scene (e.g. texture). They may therefore be less effective in areas consisting of plain walls or long simple corridors, from which few features can be identified and extracted for model reconstruction. Secondly, extracting enough features and establishing point correspondences across an image sequence is, in many cases, a time-consuming and difficult task. As a result, the major disadvantage is low accuracy and speed. Moreover, there is an awkward dilemma between disparity and the overlapping area. With a large overlapping area between the images, it is easy to extract corresponding features, but the 3D reconstruction algorithm is sensitive to noise or even ill-conditioned when the images have small disparity. On the other hand, the algorithm becomes more and more robust as the disparity is enlarged, but it then becomes difficult to extract the corresponding features due to occlusion.
1.2.2 Field of view
Besides sensing techniques, another performance indicator for indoor reconstruction is the field of view (FOV). Conventional visual sensors such as normal or even wide-angle CCD cameras still have relatively modest FOVs, which complicates the reconstruction of the whole surroundings. For example, the ceiling is not usually visible [49], despite being an important component of the main structure of the indoor environment. Therefore, a more recent research direction looks to improve the situation by extending the FOV through the deployment of omnidirectional cameras. The main benefit of the omnidirectional-camera approach is that information covering a wide field of interest can be obtained in a single capture. However, whether a single camera (structure from motion) or several cameras (stereo vision) are used, such images can be difficult to process for feature extraction as a result of their high distortion. Moreover, the inherent characteristics of the environment also have a strong impact on the results. For instance, the illumination level and pixel similarity are two well-known factors that may impose several constraints on the feature extraction process.
The effect of these limiting characteristics, which are inherent to real environments, is difficult to handle and is known as the correspondence problem of stereo vision.
1.2.3 Results validation
In the real environment it is difficult to estimate the error between the real orientations (of the camera and laser plane) and those obtained during the calibration process. The real orientation cannot be measured precisely, because manual inspection can introduce measurement errors. Therefore, analysis of the calibration results becomes more challenging. In addition, it is difficult to compare how one parameter or another may influence the final calibration results.
1.2.4 Structured light
First of all, it is worth mentioning that indoor environments have certain constraints between the floor, walls, and ceiling, which can be taken into consideration for mobile robot navigation with laser illumination. Secondly, in contrast to passive vision systems, active vision systems may rely on energy (e.g. structured light) being projected into the scene intentionally. The main benefit of using structured light for data analysis is its simple detection and extraction from the given image. Furthermore, the image of the indoor environment obtained from both the omnidirectional camera and the projected laser includes more information for data understanding compared with that obtained by means of, for instance, sonar. Indeed, in the former scenario the distance can easily be superimposed on the image, providing a visual representation. The sonar-based approach, on the other hand, can provide distance-related information but lacks a visual representation. Therefore, a vision system for indoor navigation consisting of an omnidirectional camera in combination with structured light has gained widespread attention among scholars, owing to its large scene range and high measurement efficiency. However, an important step here is the calibration between the camera and the laser source. Otherwise, the measurement results may not be as accurate as expected.
1.3 Problem Statement
In order to obtain reliable mapping results, a vision system must be calibrated. This is
mainly due to the fact that without a known relationship between the camera and laser plane, it is
not possible to carry out the measurements in an appropriate way (see Figure 1.7). This statement
can be justified by analyzing several existing works, presented below. Some experiments available in the literature on this topic have been carried out under certain pre-defined assumptions, namely that the camera and laser planes were installed parallel to the floor. A possible reason for these assumptions is that existing calibration techniques are suitable only when a single laser plane is present in the scene, whereas the omnidirectional vision systems presented in the literature [50-52] are based on several laser emitters (see Figure 1.8). This is probably why measurements to the obstacles were carried out on a laser plane adjusted parallel to the floor. Consequently, these experimental results were not as accurate as expected, e.g. see Figure 1.7. In other works, the calibration procedure for the extrinsic parameters was not considered at all, an ideal system being assumed in which calibration methods were not required, which may not be the case in practical systems [53, 54]. At the same time, even small misalignments can lead to incorrect measurements, which is all the more important for an omnidirectional vision system characterized by a wide field of view. Therefore, not only should new calibration techniques be considered, but it is also important to look for new configurations of the vision system that can be calibrated with existing calibration methods.
Figure 1.7: Mapping results when extrinsic calibration was not considered.
Figure 1.8: Configuration of the vision system with several laser emitters.
1.4 Thesis Aims and Objectives
The overall goal of this Thesis is to achieve more accurate and robust 2D and 3D mapping results with less input data from the sensors involved in the vision system. This work is made possible by recent advances in deep learning and by platforms that allow a large amount of annotated training data to be generated in environments with a variety of conditions. A series of three descriptive studies is proposed to systematically approach more advanced measurements for an omnidirectional vision system with laser illumination:
1.4.1 Aim 1 – Literature review
Aim 1 is to carry out a complete literature review, analysis, comparison and evaluation of existing successful schemes for extrinsic calibration of the vision system as well as for 2D and 3D mapping of the indoor environment. We identify the weaknesses of these methods and the research gaps, perform the analysis and state the contributions.
Outcome: with an extensive overview of the related work it will be possible to determine what is known on the topic, how well this knowledge is established and where future research might best be directed.
1.4.2 Aim 2 – A novel omni-vision system
Aim 2 is to investigate a novel omnidirectional vision system with laser illumination in a flexible configuration, using both simulated and real data, in order to eliminate the assumptions previously relied upon for measurements. In the proposed vision system, a flexible configuration means that the camera and laser plane are not assumed to be parallel to the
floor, and the relationship between them can be obtained by means of calibration. The calibration procedure is possible because the proposed vision system consists of a single camera and a single laser emitter. During real experiments it can be difficult or impossible to compare methods with each other because of the lack of ground truth and the measurement uncertainty. In contrast, inside the simulation environment all of the variables are known. Therefore, this research also focuses on the development of a virtual environment, which will be helpful before verifying the vision system by real experiments.
Outcome: with the proposed vision system it will be possible to achieve higher accuracy and reliability of measurements while involving fewer sensors, which makes the system more affordable. As for the virtual environment, the simulator itself will be able to generate photo-realistic images of objects and environments. The simulator will be helpful for comparing methods with each other and for testing theories before carrying out experiments in real environments.
1.4.3 Aim 3 – Novel methods of extrinsic calibration
Aim 3 is to investigate calibration methods for 2D and 3D mapping with more effective and automatic algorithms for computing the extrinsic parameters. Preliminary work has revealed particular limitations of existing calibration methods, such as sensitivity to noise and their complexity. These limitations will be addressed by the proposed calibration methods.
Outcome: with the aid of a deep analysis of existing works on calibration and their limitations, it will be possible to extract only the relevant information for developing new calibration methods for the proposed vision system. Thus, with new calibration targets and new calibration techniques it will be possible to obtain accurate and reliable extrinsic parameters of the vision system from a single input image.
1.4.4 Aim 4 – A novel 3D reconstruction technique
Aim 4 is to investigate reconstruction of the indoor environment with the proposed vision system in combination with semantic segmentation, allowing a 3D model of the indoor environment to be obtained from a single snapshot. This study more directly addresses the practical applications of the proposed vision system. Since structured light is easy to detect in the input image, it will be possible to extract the depth of the scene. By involving a
semantic segmentation network in combination with the depth data it will be possible to
reconstruct the indoor environment.
Outcome: the proposed reconstruction method will significantly improve reconstruction results by eliminating the disadvantages of passive vision systems, which are not effective in non-textured environments. Thus, it is expected that by involving structured light and deep learning, the accuracy and reliability of the reconstruction results will increase significantly.
1.4.5 Tasks
In order to achieve the defined aims of the Thesis, the main tasks are listed below:
1- Comprehensive review of related works, analyzing methods based on their effectiveness and relevance to our topic.
2- Complete understanding of the models of vision systems and their weaknesses; identification of research gaps and issues encountered in existing models. On the basis of these findings, propose a suitable and robust solution for 2D/3D mapping in the indoor environment.
3- To verify and validate the performance of methods in terms of effectiveness and performance, select a suitable virtual environment, or propose a novel one, for carrying out experiments before testing the proposed vision system in the real environment.
4- On the basis of the literature review, propose a method for extrinsic calibration of the vision system that is beneficial for both the efficiency and the accuracy of the calibration parameters.
5- On the basis of the literature review, propose methods for 2D/3D mapping that are beneficial for both the efficiency and the accuracy of the mapping results.
6- Collection of prominent results based on virtual/real data and their comparison with existing traditional and supervised calibration/mapping methods in terms of both efficiency and accuracy.
1.5 Thesis Contributions
The contributions of the Thesis are as follows:
1- A complete review of the related works.
2- A novel omnidirectional vision system with laser illumination in a flexible configuration.
3- A calibration method for 2D mapping, allowing the extrinsic parameters to be obtained from a single snapshot.
4- A calibration method for 3D mapping, allowing the extrinsic parameters to be obtained from a single snapshot.
5- A reconstruction method for the omnidirectional vision system based on laser illumination in combination with semantic segmentation, allowing a 3D model of the indoor environment to be obtained from a single snapshot.
6- A customizable photo-realistic simulator for the CV community working with omnidirectional vision systems with laser illumination in indoor scenarios.
1.6 Thesis Road Map
Chapter-1: Describes the main problems in 2D mapping and 3D reconstruction of the indoor environment. We present the ultimate aim, objectives, motivation and major challenges, and list the possible applications of the research study. The Thesis contributions are then given at the end of the chapter. The remainder of the Thesis is structured as follows:
Chapter-2: Provides a comprehensive literature review of simulators, novel calibration techniques and reconstruction methods. We present the evolution of these methods over the last three decades, together with their types of features and operating techniques. Next, their merits are discussed; on this basis the research problem is identified and appropriate solutions are proposed.
Chapter-3: Considers preliminary work dedicated to understanding the operating principles of state-of-the-art calibration techniques, in order to evaluate their advantages and disadvantages. Here we also present the simulation environment for conducting experiments and testing theories.
Chapter-4: Proposes a novel method of extrinsic calibration and verifies the results by 2D mapping with simulated and real data.
Chapter-5: Introduces another method of extrinsic calibration and verifies the results with the proposed reconstruction technique.
Chapter-6: Summarizes and draws conclusions; we discuss our achievements and contributions. We also suggest the scope for future work from short- and long-term perspectives.
Chapter 2 Literature Review
The content of this Chapter is based on the topic of “Calibration and 3D Reconstruction with a Photo-Realistic Simulator Based on the Omnidirectional Vision System”. In the literature, we study and investigate the new challenges, new requirements and existing key models. This Chapter also summarizes the current domestic and international research status and development trends. On the basis of this comprehensive literature review, the research problem is identified and appropriate solutions to the related issues are proposed.
2.1 Virtual Environment
2.1.1 Introduction
Over the last decades, we have observed a huge increase of interest in engineering and computer science courses, e.g. electrical engineering, electronic engineering and mechanical engineering. More and more students choose this direction for their future career. At the same time, mobile robots are gradually being adopted in these courses. For example, in some universities students are able to acquire valuable experience and useful skills by working with real robots [55]. Teamwork and problem-solving are important aspects of a flexible educational process, as they allow students to satisfy their curiosity by exploring practical tasks. However, some educational centers may suffer from a lack of real robots. Consequently, various online platforms where students can acquire practical skills in the fields of engineering and computer science are becoming popular [56]. These trends make the question of teaching methods for these disciplines more relevant. It is also worth mentioning that students do not always have enough motivation and involvement in the educational process; as a result, we do not always observe high academic performance in those fields. Teaching materials presented in a plain form, overloaded with diagrams and text, may cause the problems defined above. Such standard teaching materials also make the educational process repetitious. Thus, students often lack real, practical experience related to the disciplines they are studying. Mostly, the standard educational process gives only a theoretical understanding of how computing and engineering knowledge can be applied to science, business, industry, and other fields. In standard programming or mathematics classes, students usually solve various problems that are poorly related to their future profession. Moreover, over the past year, the situation with Covid-19 has required the adoption of drastic
and necessary measures by various countries around the world. The Covid-19 pandemic has affected nearly 1.6 billion students in more than 190 countries. In this context, students at different levels of education have witnessed the interruption of the educational process that would otherwise take place under normal conditions at universities and other educational centers. The effects of this interruption have been particularly difficult for engineering students, who are typically required to carry out laboratory experiments in order to conduct certain proofs of concept and to validate research results on a particular subject. Thus, maintaining the educational process during the pandemic has been broadly discussed by scholars [57-60]. By analyzing those works, we can say that the most appropriate way to combine theory with practice under the current situation, and within the trends defined earlier, is the use of simulation platforms.
In our published paper we demonstrated the capability of the simulation environment by comparing different calibration techniques with each other [61]. In real experiments there is measurement uncertainty, which makes the comparison between methods more complicated. Moreover, in real cases it is difficult or impossible to estimate the real values of some of the parameters, e.g. the real location or orientation of the laser plane, whereas inside the simulation environment they are known. Our work also showed that modern game engines (the Unity platform in our case) allow users to create photo-realistic virtual environments, which are suitable for testing theories before experiments are performed in real conditions. In this work we decided to release the simulator to the CV community; to the best of our knowledge, this is the first customizable simulator allowing the investigation of omnidirectional vision systems with laser illumination.
2.1.2 Review on simulators
Many simulators available on the market provide simulations for agents, whether wheeled, legged or aerial (UAV). These simulators are pertinent for such robots because of the features instilled in them by extensive modern research. Some of these simulators are provided free of charge in order to encourage maximum cooperation from skilled contributors to the simulator code; these are commonly known as ‘open-source simulators’, e.g. OpenSim. In contrast, simulators produced and sold commercially are ‘commercially available’ ones, e.g. Cyberbotics’ Webots. All these simulators differ in the features provided and in performance. They differ largely in 3D visualization, cost, fidelity, simulation engines, governing architecture and more.
Commercial Simulators:
Webots: Webots is a 3D mobile-robot simulation software package that facilitates robotics research and robot modeling. It was developed by Cyberbotics Ltd. and EPFL. The Open Dynamics Engine [62] provides accurate physics simulation. Complex worlds can be created using OpenGL technologies and a built-in 3D editor. In addition, 3D models can be imported from other software (MATLAB, LabVIEW, Lisp, etc.) through the VRML standard [63], with the controller and application linked via a TCP/IP interface. Robot controllers can be transferred to real robots such as the Aibo, Lego Mindstorms, Khepera, Koala, Hemisson, Boe-Bot, E-puck and many more [64-67].
Figure 2.1: Webots simulator.
MATLAB: MATLAB is a high-level language and interactive environment for numerical computation, algorithm development, data visualization and data analysis. Simulink [68] creates 3D animations and models motors and sensors. The Robotics Toolbox provides kinematics, dynamics, and trajectory generation. In addition, toolboxes can interact with each other and with other simulators to provide the necessary support [69-74].
Figure 2.2: Robotics System Toolbox.
Microsoft Robotics Developer Studio (MRDS): a Windows-compatible development environment for robot control and simulation across a wide variety of platforms. It is used by academic, hobbyist, and commercial developers. Robot interaction can be achieved using web browsers or Windows-based interfaces built with HTML and JavaScript. Robotics Studio 2008 includes an IDE (Integrated Development Environment) for producing code visually and graphically [75-77].
Figure 2.3: Microsoft Robotics Developer Studio.
Open-Source Simulators:
USARSim: a high-fidelity simulation system designed for automation research and for overcoming modern robot design problems. Originally, it was used for urban search and rescue. It supports multi-robot coordination and human-robot interaction (HRI) through accurate
representation of the robot, the remote environment and the user-interface behavior. It supports a variety of robots, including wheeled, tracked, legged and flying robots. High fidelity and low cost are achieved through the integration of development tools and advanced editing that widens the range of platforms that can be modeled. The system architecture is based on a client/server model consisting of controllers (client), GameBots and the Unreal Engine (server). UnrealScript facilitates the creation of new objects in the game. GameBots communicate between the Unreal Engine and the controllers via a TCP/IP socket interface. The simulation is of three types: environment, sensor and robot. The system is extensively used by the RoboCup Federation, and IEEE supports two robotic competitions and provides robotics education with USARSim [78-80].
Figure 2.4: USARSim simulator.
OpenSimulator: referred to as OpenSim, it is application software, written in C#, enabling the creation of 3D virtual environments. OpenSim is an implementation of the Linden Lab SecondLife [81] server. OpenSim is in the alpha testing phase; however, it is already used formally by educational organizations and companies such as IBM, Microsoft, Nokia and Intel [82-83].
Figure 2.5: OpenSim simulator.
ÜberSim: ÜberSim is an open-source (GPL-released) high-fidelity multi-robot simulation engine primarily intended for the rapid development of robot control systems and their easy transfer to real robots. It is based on a client/server architecture in which the client and server are synchronous at all times. The client and server exchange functions to provide increased interchangeability between the simulation control code and the real robot control code. The high-fidelity simulation engine and extensible robot classes aid the simulation of a wide variety of robot types, ranging from small-size soccer robots to legged robots (Aibo). A simulated robot is modeled with an XML description language, which obviates changes to the control algorithms when physical changes (sensors/actuators) are made to the robot structure [84-85].
Figure 2.6: Robot Soccer Simulator with ÜberSim.
Simbad: an open-source Java 3D simulator aimed at computer simulation of mobile robots rather than accurate real-world simulation. It uses built-in physics simulation. The complete Simbad package comprises a simulation engine with two standalone libraries: a neural network library (PicoNode) and an artificial evolution library (PicoEvo). It works on any system with the Java language and the Java 3D library. Robots operate in time-sharing mode. It is used for research in AI and machine learning pertaining to autonomous robotics [86-88].
Figure 2.7: Simbad simulator.
Breve: Breve is a 3D simulation environment that provides frameworks for the simulation of decentralized systems and artificial life. To enable customization of application functionality, Breve defines a ‘plugin’ architecture that allows users to create plug-ins. The simulation engine does not aim at physically accurate simulation; rather, it aims at making the simulation ‘realistic’, with features such as rigid-body simulation, collision detection/response, and articulated-body simulation. 3D visualization is obtained using an OpenGL display engine. The display engine also produces special effects such as shadows, reflections, lighting, semi-transparent bitmaps, lines connecting neighboring objects, texturing of objects and the ability to treat objects as light sources. It is a free software package released under the GPL license [89-91].
Figure 2.8: Breve simulator.
This subsection has reviewed a wide variety of simulators with different features. The ongoing support and improvement of currently available simulators, as well as the appearance of new ones, show the importance of simulator development in the field of robotics. At the same time, during the literature review we did not find simulators able to simulate omnidirectional vision systems with laser illumination. However, a couple of works from the literature that can be beneficial for the development of such a simulator deserve consideration.
2.1.3 Omni-vision simulation
In the last few decades a wide variety of robotic simulators have been developed commercially or in research laboratories [92], resulting in a considerable number of publications in this area. An exhaustive review is beyond the scope of this dissertation, so this subsection considers only those most relevant to our work, namely the simulators supporting omnidirectional cameras and structured light. Widely used simulators such as Gazebo [93] and USARSim [94] support laser plugins but, unfortunately, do not include omnidirectional cameras. In contrast, in [95, 96] the authors managed to integrate omnidirectional cameras into these simulators. They rendered images of the environment onto the faces of a cube, after which the resulting cube map could be used as a texture for creating a hyperbolic mirror or a fisheye camera. However, such manipulations require certain programming skills, which could be problematic for some users. In recent years NVIDIA released a photorealistic robotics simulator, NVIDIA Isaac Sim [97], and its latest version supports a fisheye camera. However, the use of this simulator is limited to computers with NVIDIA GPUs. Support for multiple platforms is provided by programs such as Blender and Unity.
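For completeness, a simplified sketch of the cube-map idea is given below (Python/NumPy). It only computes, for each pixel of an ideal equidistant fisheye image, the 3D viewing direction that would have to be sampled from the rendered cube map; the actual texture lookup is engine-specific and therefore omitted. The function name, image size and 180° FOV are assumptions for illustration, not details of the works cited above.

```python
import numpy as np

def fisheye_pixel_directions(width: int, height: int, fov_deg: float = 180.0):
    """Viewing direction (unit vector) for every pixel of an ideal equidistant
    fisheye image: the angle from the optical axis is proportional to the
    distance of the pixel from the image center."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    max_r = min(cx, cy)                       # image-circle radius in pixels
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    dx, dy = (u - cx) / max_r, (v - cy) / max_r
    r = np.sqrt(dx ** 2 + dy ** 2)            # normalized radius, 1 at the rim
    theta = r * np.radians(fov_deg) / 2.0     # angle from the optical axis
    phi = np.arctan2(dy, dx)                  # azimuth around the axis
    dirs = np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)
    dirs[r > 1.0] = 0.0                       # pixels outside the image circle
    return dirs                               # (H, W, 3); sample the cube map along these rays

directions = fisheye_pixel_directions(640, 640)
print(directions.shape)  # (640, 640, 3)
```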
In order to generate photorealistic synthetic images, a couple of works [98, 99] considered Blender as the basis for the creation of omnidirectional vision systems (see Figure 2.9). Blender is an open-source suite of tools for 3D modelling, rendering and animation. However, it is not well suited to programming tasks and to communication with other programs, which restricts its use in certain cases. More flexibility is provided by game engines such as Unity, UNIGINE, CRYENGINE, and Unreal Engine 4, which allow their core functionality to be extended with their native programming languages. By taking advantage of modern game engines, P. Bourke released a publicly available fisheye camera with a variable FOV, simulated in Unity [100]. Thus, users familiar with the Unity platform may capture omnidirectional images of their 3D scenes (see Figure 2.9). However, the process of developing a new scene is time-consuming and requires certain skills in Unity. Therefore, in this Thesis we propose a simulator that targets the study of omnidirectional vision systems in indoor environments. No particular Unity skills are needed to use our simulator: it is built like a video game that can simply be installed without any additional dependencies.
Figure 2.9: Ways of simulating the omni-camera.
2.2 Extrinsic Calibration
A critical task in autonomous navigation is acquiring information about the environment. Sensors can be classified as proprioceptive or exteroceptive. A proprioceptive sensor measures values internal to the robot, whereas exteroceptive sensors acquire information about the environment in which the robot operates. This information can be processed in order to extract meaningful environmental features.
However, the relative poses of the sensors with respect to each other are needed to transform all measurements into a common coordinate frame for tasks like localization or sensor fusion. These so-called extrinsic sensor parameters are defined by 3 DoFs (degrees of freedom) for translation and 3 DoFs for rotation (e.g., yaw, pitch and roll components). The situation is even more critical for safety sensors: an error of 1° in the orientation of a safety laser can lead to position errors of up to 0.5 m at a 30 m range. Such a 0.5 m error can endanger the robot or persons moving through the environment, since an obstacle about to collide with the robot may not be detected correctly. Thus, the literature on extrinsic sensor calibration is reviewed throughout this Section.
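The quoted figure can be verified with a one-line calculation (a sketch of the geometry only, not taken from any cited work):

```python
import math

range_m = 30.0
error_deg = 1.0
lateral_error = range_m * math.tan(math.radians(error_deg))
print(f"{lateral_error:.2f} m")  # ~0.52 m, consistent with the 0.5 m figure above
```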
2.2.1 Review on methods
Many applications in the field of mobile robotics employ a variety of sensors, from proprioceptive sensors (such as GPS, Inertial Measurement Units (IMU) or shaft encoders) to exteroceptive sensors (including vision, range or contact devices). In order to exploit efficiently the information provided by such sensorial systems, the sensors must be calibrated so as to interpret the acquired data correctly (intrinsic calibration) and to put all the measurements in a common reference frame (extrinsic calibration). This Chapter is focused on sensor extrinsic calibration. For more specific literature on this topic, the reader is referred to [101].
A structured-light vision system is basically composed of a camera and a laser projector; the working principle is laser triangulation. When the laser plane is projected onto a scene, the camera captures the image of the modulated light stripe. In order to obtain reliable reconstruction results, any vision system must be calibrated. If the sensor is calibrated, the 2D data in the laser plane can be recovered from the image. The goal of sensor calibration is to establish the mapping relationship between the laser plane and the computer image plane, and the key procedure is to collect calibration points. Extrinsic calibration consists of finding the mapping relationship between the laser plane and the camera. The key technique in this stage is to determine calibration points in the laser plane and their corresponding points on the image plane.
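A minimal sketch of this triangulation principle for a pinhole camera is given below (Python/NumPy). The intrinsic matrix and plane parameters are illustrative values, not those of any system described in this Thesis: given a calibrated laser plane expressed in the camera frame, every detected laser pixel is back-projected to a viewing ray and intersected with the plane.

```python
import numpy as np

def triangulate_laser_pixel(pixel, K, plane_n, plane_d):
    """Intersect the camera ray through 'pixel' with the laser plane n . X = d
    (both expressed in the camera frame).  Returns the 3D point."""
    u, v = pixel
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # direction of the viewing ray
    t = plane_d / (plane_n @ ray)                    # scale so that n . (t * ray) = d
    return t * ray

# Illustrative values only.
K = np.array([[400.0, 0.0, 320.0],
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])
n = np.array([0.0, -0.966, 0.259])   # unit normal of the laser plane (camera frame)
d = 0.10                             # plane offset in metres
point_3d = triangulate_laser_pixel((350.0, 300.0), K, n, d)
print(point_3d)
```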
Different calibration techniques have been presented in the literature (see Figure 2.10), such as methods based on raised blocks [102, 103], calibration using a ball as a target [104], and a geometrical calibration method based on the theory of vanishing points and vanishing lines [105]. The most commonly used approach employs a checkerboard pattern in the calibration process [106-120]. The intersections of the laser stripe with the patterns can be obtained, after which the relationship between the camera and the laser plane can be calculated. In order to calibrate
the laser plane, at least three non-collinear points are required; a unique solution can then be obtained by analyzing the extracted laser points belonging to the pattern placed at different positions.
Figure 2.10: Calibration methods for obtaining extrinsic parameters of the vision system.
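The plane-fitting step mentioned above can be implemented in a few lines. The sketch below (Python/NumPy, with illustrative points only) fits the normal n and offset d of the plane n·X = d to a set of 3D laser points in the least-squares sense via SVD:

```python
import numpy as np

def fit_plane(points: np.ndarray):
    """Least-squares plane through 3D points (shape (N, 3), N >= 3, non-collinear).
    Returns the unit normal n and offset d of the plane n . X = d."""
    centroid = points.mean(axis=0)
    # The normal is the direction of least variance of the centred points.
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]
    return n, float(n @ centroid)

# Three (or more) laser points extracted from the calibration pattern(s), illustrative values.
pts = np.array([[0.1, 0.2, 1.0],
                [0.4, 0.2, 1.1],
                [0.2, 0.5, 1.3]])
n, d = fit_plane(pts)
print(n, d)
```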
Zhang and Pless [106] first initialized the extrinsic parameters with a linear approach, using a checkerboard (see Figure 2.11) as a calibration pattern to define a geometric constraint between a 2D laser scanner and a camera. Next, a non-linear optimization procedure was developed to minimize the point-to-plane error, and outlier detection was also implemented, based on the noise extracted from fitting a line to the laser data. Finally, a global optimization procedure minimized the combined reprojection error and point-to-plane error to refine the extrinsic parameters; the Levenberg-Marquardt algorithm was used for both optimization procedures. Note that [106] does not need an initial estimation.
Figure 2.11: Setup Laser – Camera used in Zhang and Pless [106].
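As an illustration of the kind of non-linear refinement used in [106] and in several of the works reviewed below (a sketch under simplifying assumptions, not a reproduction of any particular implementation), the code assumes that laser points are expressed in the laser frame and that, for each view, the calibration-plane normal and offset are known in the camera frame; the point-to-plane residual is then minimized over the 6-DoF laser-to-camera transform with Levenberg-Marquardt:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def point_to_plane_residuals(params, laser_pts, plane_ns, plane_ds):
    """params = [rx, ry, rz, tx, ty, tz]: laser-to-camera rotation vector and
    translation.  Residual = signed distance of each transformed laser point
    to the calibration plane of its view."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    t = params[3:]
    res = []
    for pts, n, d in zip(laser_pts, plane_ns, plane_ds):
        cam_pts = pts @ R.T + t          # laser points expressed in the camera frame
        res.append(cam_pts @ n - d)      # distance to the plane n . X = d
    return np.concatenate(res)

# laser_pts: list of (N_i, 3) arrays, plane_ns: list of unit normals,
# plane_ds: list of offsets -- placeholders to be filled with real observations.
def refine_extrinsics(x0, laser_pts, plane_ns, plane_ds):
    sol = least_squares(point_to_plane_residuals, x0, method="lm",
                        args=(laser_pts, plane_ns, plane_ds))
    return Rotation.from_rotvec(sol.x[:3]).as_matrix(), sol.x[3:]
```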
A similar calibration technique was adopted for an omnidirectional vision system in [107]. Instead of moving the pattern to different positions, the authors used an alternative solution based on two perpendicular checkerboard patterns (see Figure 2.10). The spatial coordinates of the laser line on the two calibration patterns can be obtained
by means of the homography matrix. Afterwards, the plane equation can be fitted to these
coordinates.
Based on [106], Mei and Rives [108] calibrated an omnidirectional camera based on a central catadioptric sensor together with a 2D laser scanner. [108] considered both visible and invisible laser sensors. The first setup (visible laser–camera) can be calibrated using two different approaches: minimization of the reprojection error (association between the visible laser points and the images) or minimization based on the association between the visible laser trace and the images. For the second setup (invisible laser–camera), similarly to [106], [108] minimized the reprojection error considering a checkerboard as the calibration object. [108] does not need an initial estimation.
Although Vasconcelos et al. [109] based their work on [106], they did not use point-to-plane correspondences. Instead, [109] fitted lines to the laser points to compute an initial estimation, and each plane obtained from the calibration object (checkerboard) must pass through these lines, as illustrated in Figure 2.12. Finally, the extrinsic parameters are refined by minimizing the reprojection error using bundle adjustment. [109] noted that a minimum of three different poses and orientations of the checkerboard is required.
Figure 2.12: Setup Laser – Camera used in Vasconcelos et al [109].
Gong et al. [110] used a trihedron (orthogonal or not) as the calibration object to compute the 3D laser scanner–camera extrinsic parameters. [110] defined four constraints: a trihedral constraint (the trihedron defines the relative sensor pose with respect to the world frame), a planarity constraint between two frames (a coplanar point lies on a plane independently of the reference frame), a planarity constraint between two images (correspondence of coplanar features in two different images), and a motion constraint (sensor rigidity). These constraints were combined into a
non-linear least-squares problem, which was then solved with the Levenberg-Marquardt algorithm. Even though [110] does not need initial estimations, the user must select the trihedron planes in the observations.
Gomez-Ojeda et al. [111] used maximum likelihood estimation (MLE) to minimize line-to-plane (rotation) and point-to-plane (translation) errors using an orthogonal trihedron (more restrictive than [110] in terms of the calibration object). The MLE processes were formulated independently. First, the rotation matrix is estimated by formulating the optimization procedure on the tangent space using Lie algebra. Next, the translation component is obtained using the Levenberg-Marquardt algorithm. [111] requires an initial estimation for the extrinsic parameters.
A method that does not require an overlapping field of view (2D laser–camera setup) was proposed by Bok et al. [112]. Two approaches were developed: one, similar to [106], assumed that the checkerboard is perpendicular to a plane detected by the laser (left side of Figure 2.13), and the other assumed that the intersection line of two planes (orthogonal or not) is aligned with the checkerboard’s coordinate system (right side of Figure 2.13). The authors of [112] initialized the extrinsic parameters using the least-squares algorithm. An optimization procedure then refined the extrinsic parameters by minimizing the point-to-plane error for the first approach and the distance error between the line and the feature points for the second. A further cost function was formulated for the second approach to minimize the point-to-plane error. [112] needs five and six different checkerboard poses for the first and second method, respectively.
Figure 2.13: Setup Laser – Camera used in Bok et al [112].
Pereira et al. [113] also calibrated camera–laser (2D or 3D) setups and proposed the
minimization of the reprojection error using the Levenberg-Marquardt algorithm. This error is
relative to a 3D sensor frame (e.g., that of a 3D laser scanner), instead of reprojecting the ball’s center into the camera’s frame. Note that the point-to-point error remains valid for both the 2D and 3D laser–camera setups. Similarly to [113], Guindel et al. [114] implemented the ICP algorithm to calibrate a 3D laser scanner–camera setup with closed-form equations minimizing the point-to-point error, using a planar object with four symmetric circular holes. This error was computed using the correspondences of the holes’ centers between the sensors. [114] needs a stereo-pair configuration (two cameras with a known pose between them).
(two cameras with a known pose between them).
Yousef et al. [115] proposed another method that does not require an overlapping field of view. [115] adapted the robot-world hand-eye calibration problem to the calibration of a 2D laser scanner–camera setup. Different transformations between the laser, camera, floor, and world/checkerboard frames were formulated to perform a Euler parameterization using the Levenberg-Marquardt optimization algorithm. [115] requires that the floor frame remain stationary relative to the calibration object (checkerboard) and be detectable by the laser scanner. As the calibration environment, the authors used the corner of a wall (taking advantage of the orthogonality assumption for the wall planes) with a checkerboard fixed on one of the wall planes. However, [115] has a restriction: it assumes that the laser plane is parallel to the floor (zero pitch and roll).
Kühner and Kümmerle [116] calibrated camera–laser setups (with 2D or 3D laser scanners) using least squares. However, [116] minimized the point-to-ray distance error. The method has the same requirements as for the laser–laser setup, and it was noted that the problem is only fully constrained if scale information is provided by a range measurement (at least one observation). Even though an error function for 2D lasers was proposed, [116] was only tested with 3D laser scanners.
Lastly, Oliveira et al. [117] also estimated the extrinsic parameters for the 2D laser–
camera setup and minimized the reprojection error using bundle adjustment.
2.2.2 Problem formulation
However, some types of omnidirectional vision systems do not meet the requirements of the considered calibration methods. The previously mentioned calibration techniques assume that only one laser plane is present in the scene. However, for mapping, in order to obtain distance information about the environment, omnidirectional vision systems are based on several laser emitters [56-58]. That is why, in the previously mentioned works, the distance measurements to
the obstacles were carried out on a laser plane adjusted parallel to the floor, for which the offset between the origin of the camera and the origin of the laser plane is not significant. Otherwise, once the emitters are tilted, the number of laser planes increases. Calibrating every emitter is a tedious process and, at the same time, hardly feasible, e.g. when the lasers emit the same red light and their stripes cannot be distinguished. In another work, in order to estimate the position of the structured light in an omnidirectional vision system, X. Chen et al. proposed a structured-light calibration method for estimating the 3D information of objects [121]. As omnidirectional imaging technology can capture the light of objects within a 360° field of view via a conic mirror, this method can be used to calibrate the positions of multiple structured lights (emitting the same red light) simultaneously. However, this method is only applicable to point-structured lights, which are not really suitable for navigation and mapping of the indoor environment.
Laser-plane calibration with checkerboard patterns requires additional steps, which make the calibration process more complicated. Namely, for every pattern position two snapshots should be taken: one with the laser beam (for laser extraction) and another without the laser beam (for obtaining the pattern points). If only a single image is used, the pattern points might be extracted incorrectly because of the laser points falling on the pattern. Therefore, in order to obtain reliable measurement results and simplify the calibration process, new configurations of omnidirectional vision systems based on laser illumination must be implemented.
2.3 Reconstruction of the indoor environment
Perception and sensing are an integral part of reconstructing a previously unknown environment: a robot must be able to estimate the three-dimensional structure of the environment in order to perform useful tasks. In general, reconstruction methods can be based on passive or active sensing techniques, and each approach has its own relative merits.
2.3.1 Stereo vision
Numerous techniques have been studied for 3D reconstruction of the indoor environment. A popular and conventional approach to creating a digital representation of the scene is to generate a 3D point cloud (see Figure 2.14) from multiple digital images [122]. In this case, feature points matched between multiple images can be used to determine the camera poses (e.g. with Bundler), and 3D point clouds can subsequently be created. The performance of these methods
depends on being able to reliably detect features in the surroundings; therefore, methods based on passive vision systems may fail in featureless environments.
Figure 2.14: Example of 3D points cloud.
2.3.2 Kinect
On the other hand, with the current popularization of RGB-D cameras such as Microsoft’s Kinect, several techniques have recently been proposed to model scenes with a depth camera [123, 124]. However, a key limitation of the Kinect sensor is its limited FOV (see Figure 2.15). In an attempt to address these deficiencies, F. Tsai et al. proposed a vision system consisting of multiple RGB-D (Kinect) and DSLR cameras [124]. By merging the conventional images with the depth images, the authors were able to reconstruct the environment even in featureless areas (see Figure 2.15). At the same time, the results presented in their work show that even when multiple sensors are involved, there can still be unreconstructed regions, which makes this method less applicable to certain applications, e.g. mobile navigation. This problem might be solved by integrating even more vision sensors into the system, but this increases the computational load and the overall expense of the vision system.
Figure 2.15: 3D reconstruction based on the Kinect sensor. The image above shows single Kinect. The image below
shows multiple Kinects.
2.3.3 Lidars
In order to achieve a wide horizontal FOV with long range and high accuracy, several Kinect sensors can be replaced with a LIDAR sensor [116]. This involves fewer elements in the vision system, which makes it more reliable, but a single LIDAR unit is still generally insufficient for analyzing an indoor scene in the vertical direction (see Figure 2.16). At the same time, vision systems with multiple LIDARs are problematic due to their cost, size, and weight. Several approaches based on a single LIDAR sensor have addressed this problem [125-129]. The general idea of these works, which provide a cost-effective vision system and achieve a wide vertical FOV, is to shift from rigid vision systems to more flexible configurations by rotating the LIDAR sensor (see Figure 2.16). This makes it possible to extract more features of the environment with a single LIDAR unit. Even though these LIDAR-based approaches provide a fully omnidirectional depth-sensing capability, LIDAR is still a relatively costly sensor for the indoor environment. A more cost-effective and lightweight solution is achieved by using a structured-light approach.
Figure 2.16: 3D reconstruction based on the Lidar sensor.
2.3.4 Structured light
Structured light is not only cost-effective and lightweight; it also allows easy detection of the projected features by the camera and the subsequent calculation of depth information via laser triangulation. This approach provides a wide FOV while achieving portability and affordability. Y. Son et al. proposed a tiny palm-sized vision sensor composed of a fisheye camera, structured light and a rotating motor; with the rotational movement, a 3D omnidirectional sensing capability is achieved [130]. Particular attention needs to be paid to the type of encoder used: magnetic encoders, for example, may suffer from nonlinearity problems, and angular position measurement errors may have an adverse impact on the final reconstruction results. P. De Ruvo et al. also proposed a vision system based on a rotating platform [131]. By ensuring accurate control of the angular velocity, the authors achieved a high-precision 3D omnidirectional reconstruction (see Figure 2.17A). However, this type of vision system has a relatively large and complex structure, which is challenging to recreate. Moreover, both methods described above focus on the acquisition and analysis of the individual scans, and a remaining challenge is that the reconstructed models are not textured. To overcome this problem, X. Lian et al. proposed an omnidirectional vision system in which a vertically mounted laser sensor is used to acquire the geometrical data [132]. After joint calibration of the vertical laser sensors and the omnidirectional camera, the authors combined the extracted laser points with the corresponding pixels in the panoramic images. As a result, they were able to reconstruct the 3D model by merging the range data and color
information (see Figure 2.17B). However, reconstructing the whole environment in this way would be time-consuming, as it additionally depends on the movement of the mobile robot. In order to reduce the reconstruction time, it is necessary to review other methods.
Figure 2.17: 3D reconstruction based on the rotated structured light.
2.3.5 Deep learning
In recent years, advances in deep learning have been applied to the structure reconstruction of indoor scenes [17, 18]. The main advantage of these methods is that the 3D layout of an indoor scene can be recovered from an image captured from a single position in space (see Figure 2.18). Their main limitation is that depth data is not involved in the reconstruction procedure. Thus, how to create a reliable 3D digital representation of the indoor environment with less input data from the vision sensors is still a relevant research problem.
Figure 2.18: Layout recovery with deep learning.
2.4 Conclusion
In this Chapter we analyzed the existing methods for the calibration of vision systems and for the subsequent reconstruction of the indoor environment. The particular limitations of those techniques have been listed, together with directions for their improvement. It is also worth mentioning that, since we operate inside the indoor environment, particular attention should be paid to configurations of the vision system, which can be significantly simplified compared to those considered in the literature review. Moreover, in this Thesis we also focus on the creation of a simulation environment, which can be helpful for testing theories before their practical application.
Chapter 3 Omni-vision System Research Platform
3.1 Main Contributions of this chapter
In this Chapter we consider and analyze the most popular calibration method, which is based on perpendicular checkerboard patterns; this method was adopted from [107]. The analysis was carried out in order to understand the limitations of currently existing methods and to propose a more reliable and robust calibration technique. The contributions of this Chapter are five-fold:
• Presenting an improved omni-vision system.
• Calibration of the proposed omni-vision system with existing calibration methods.
• Analysis of existing calibration methods.
• Presenting the simulation environment.
• Testing of the simulation environment with existing navigation algorithms.
3.2 An Improved Omnidirectional Vision System
In this Thesis, in order to eliminate the shortcomings of previous vision systems [50-54], namely the inability to carry out extrinsic calibration, we propose a different configuration. By replacing several laser emitters with a single omnidirectional laser emitter (see Figure 3.1), it is possible to create one single laser plane, which can be calibrated by adopting the algorithms of existing methods [100-120]. A certain modification of the vision system was also made with regard to the camera.
Figure 3.1: Configuration of the vision system. A – fisheye camera Ricoh Theta S. B – omnidirectional laser emitter. C – snapshot captured by the fisheye camera.
Conventional cameras are generally treated as perspective devices (pinhole model), which is convenient for modeling and algorithm design. Moreover, they exhibit small distortions, so the acquired images can easily be interpreted. Unfortunately, conventional cameras suffer from a restricted field of view (see Figure 3.2A). For example, a camera with a 1/3-inch sensor and an 8 mm lens can only provide about 50° in the horizontal direction and 40° in the vertical direction. Thus, a vision system with this kind of camera can easily miss the objects of interest or their feature points in a dynamic environment. Furthermore, it is often difficult to obtain sufficient overlap between different images, which leads to difficulties when performing vision tasks. This problem can generally be solved if the field of view of the system is enlarged. In the literature, several methods have been proposed for increasing the field of view [133]. The first method is to simultaneously use a set of conventional cameras and combine their fields of view. The main problem in this kind of vision system is how to synchronize the cameras. The second method consists of using a moving camera or a system made of multiple cameras [134-138]. However, involving a large number of elements makes the system less robust and reliable, and increases its total cost.
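The dependence of the angular FOV on sensor size and focal length follows from simple geometry; the sketch below (Python) uses nominal sensor dimensions that are assumptions for illustration, not a specification of the camera mentioned in the text:

```python
import math

def angular_fov_deg(sensor_dim_mm: float, focal_length_mm: float) -> float:
    """Full angle of view for one sensor dimension of a pinhole/thin-lens camera."""
    return math.degrees(2.0 * math.atan(sensor_dim_mm / (2.0 * focal_length_mm)))

# Illustrative numbers only: a longer focal length or a smaller sensor narrows the FOV.
f = 8.0                           # focal length, mm
print(angular_fov_deg(7.2, f))    # ~48 deg for a 7.2 mm wide sensor
print(angular_fov_deg(5.4, f))    # ~37 deg for a 5.4 mm tall sensor
```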
Thanks to developments in optics manufacturing and to decreasing prices on the camera market, catadioptric cameras (see Figure 3.2B) and dioptric omnidirectional (fisheye) cameras (see Figure 3.2C) are being used more and more in different research fields. A catadioptric camera combines a conventional camera with mirrors. A fisheye camera is an imaging system that combines a fisheye lens with a conventional camera.
Figure 3.2: Types of cameras. A – perspective camera. B – catadioptric camera. C – fisheye camera.
The camera-lens-mirror system is considered the third category; it combines a traditional camera with a curved mirror of special shape to enhance the sensor’s field of view. Since both reflective (catadioptric) and refractive (dioptric) rays are involved, the system is also called a catadioptric camera system or an omnidirectional system. Compared to the traditional
system with a narrow field of view, such systems have several advantages. Firstly, the search for feature correspondences is easier since the corresponding points do not often disappear from the images; secondly, a large field of view stabilizes the motion estimation algorithms; last but not least, more information about the scene or the objects of interest can be reconstructed from fewer images. Many different types of mirrors can be employed in such a system, including planar, elliptical, parabolic and hyperbolic mirrors. Accordingly, these systems can be categorized into parabolic camera systems (combining a parabolic mirror with an orthographic camera), hyperbolic camera systems (a hyperbolic mirror with a perspective camera), etc. On the other hand, depending on whether or not all the incident rays pass through a single point called the center of projection, the system can be classified as a noncentral or a central projection system. Each has its advantages and disadvantages. In a noncentral system, the relative position and orientation between the camera and the mirror can be arbitrary, which allows zooming and resolution enhancement in selected regions of the image. However, such a flexible configuration results in high complexity of the system model. Therefore, there is no robust linear calibration algorithm for such systems, and their applications often require less accuracy. The resulting imaging systems are termed central catadioptric when a single projection center describes the world-to-image mapping [139-141]. In [142], a projection model valid for the entire class of central catadioptric cameras was proposed. According to this generic model, all central catadioptric cameras can be modeled by a central projection onto a unitary sphere followed by a perspective projection onto the image plane. This model provides a closed-form expression for the projection, which maps 3D space points to image pixels. Hence, the complexity of system modeling is considerably reduced.
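A compact sketch of this generic model is given below (Python/NumPy; the mirror parameter ξ and the camera matrix are illustrative placeholders, not calibrated values). A 3D point is first projected onto the unit sphere, the projection center is then shifted by ξ along the axis, and finally a conventional perspective projection is applied:

```python
import numpy as np

def central_catadioptric_project(X, xi, K):
    """Unified sphere model: project the 3D point X (camera frame) onto the
    unit sphere, shift the projection centre by xi along the z-axis, and
    apply a perspective projection with camera matrix K."""
    Xs = X / np.linalg.norm(X)              # point on the unit sphere
    m = np.array([Xs[0], Xs[1], Xs[2] + xi])
    m = m / m[2]                            # normalized image coordinates
    u = K @ m                               # pixel coordinates (homogeneous)
    return u[:2]

K = np.array([[300.0, 0.0, 320.0],
              [0.0, 300.0, 240.0],
              [0.0, 0.0, 1.0]])
print(central_catadioptric_project(np.array([0.5, -0.2, 2.0]), xi=0.8, K=K))
```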
In nature, most species with lateral eye placement have an almost spherical viewing angle, for example insects. In the vision community, a large field of view can also be achieved with purely dioptric elements. This kind of system falls into the last category, i.e. a camera system with a fisheye lens. The distinct advantage of such a system is that it can itself provide an extremely wide viewing angle. Hence, it can provide images of large areas of the surrounding scene with a single shot. The fundamental difference between a fisheye lens and the classical pinhole lens is that the projection from 3D rays to the 2D image is intrinsically non-perspective in the former, which makes the classical imaging model invalid. As a result, it is difficult to give a
北京理工大学博士学位论文!
50
closed-from model for the imaging process. In the literature, the Taylor expansion model,
rational model or division model is frequently adopted for such vision system.
Miniature dioptric and catadioptric cameras are now used by the automobile industry in
addition to sonars for improving safety, by providing to the driver an omnidirectional view of the
surrounding environment. Miniature fisheye cameras are used in endoscopes for surgical
operations or on board microaerial vehicles for pipeline inspection as well as rescue operations.
Other examples involve meteorology for sky observation. Roboticists have also been using
omnidirectional vision with very successful results on robot localization, mapping, and aerial and
ground robot navigation [143-148]. Omnidirectional vision allows the robot to recognize places
more easily than with standard perspective cameras [149]. Furthermore, landmarks can be
tracked in all directions and over longer periods of time, making it possible to estimate motion
and build maps of the environment with better accuracy than with standard cameras. For an in-
depth study on omnidirectional vision, we refer the reader to [150-152].
After analyzing catadioptric and dioptric cameras in more detail, we decided to integrate a dioptric camera into our vision system. The main disadvantage of catadioptric cameras lies in the images they produce, which have a large dead area in the center (see Figure 3.2 B); this can be a significant drawback. Such sensors also require a mirror, which results in a more cumbersome and more fragile imaging system. Therefore, we decided to integrate a fisheye camera into our vision system (see Figure 3.2).
3.3 Intrinsic Calibration
In this Thesis we propose a calibration method for obtaining the extrinsic parameters of the vision system, which assumes that the intrinsic parameters of the camera are already known. However, it is important to obtain reliable intrinsic calibration results, because the accuracy of the extrinsic parameters depends on the intrinsic ones as well. Therefore, this Section aims to investigate existing calibration methods, evaluate their merits, and select the most reliable one.
3.3.1 Calibration principle
The camera calibration is an important step towards structure from motion, automobile navigation, and many other vision tasks. The intrinsic calibration of a sensor consists of providing a model that interprets the raw measurements, so that the data can be put in correspondence with world properties. Examples of intrinsic parameters are the focal lengths, the skew factor, the principal point, and the distortion parameters. Such parameters are usually provided by the manufacturer; however, it is sometimes crucial to estimate them in order to model deviations from the construction parameters or particular circumstances of the system. The intrinsic parameters of the camera have to be estimated before conducting the extrinsic calibration, since they are needed to interpret the sensor measurements.
The pinhole camera model accompanied by lens distortion models is a fair approximation for most conventional cameras with narrow-angle or even wide-angle lenses [153-155]. However, it is still not suitable for fisheye lens cameras. Fisheye lenses are designed to cover the whole hemispherical field in front of the camera, and the angle of view is very large, about 180°. Therefore, Kannala and Brandt [156] proposed a new model. However, their perspective projection model had a number of drawbacks, so later a more appropriate equidistant projection model was proposed [157]. The equidistant model differs from the regular radial distortion model by representing the distortions from the shifted center of the image. There are a number of other models related to the use of polynomial or rational functions for various projection models: the spherical projection model, where straight lines are projected onto a circle in the image in spherical perspective [158]; the perspective projection model; and the use of polynomial functions to optimize the shape of the distortions [159].
In general, calibration methods can be divided into two groups:
marker-based calibration;
autocalibration.
In turn, marker-based calibration can be divided into calibrations that use 2D patterns [160-165] and calibrations that use 3D patterns [166]. In contrast to marker-based calibration, auto-calibration refers to self-calibration techniques that do not use markers. Most of these techniques require more than one image per scene or a special scene structure, often because they exploit the epipolar geometry between views. Other techniques require the presence of straight lines in the scene, by detecting which the distortion parameters can be determined.
Using different models to calibrate fisheye cameras requires appropriate software to carry out the necessary calculations. In [167], a calibration toolbox is proposed for estimating camera parameters in Matlab. It can be used to calibrate and evaluate parabolic, catadioptric, and dioptric camera types. The dioptric camera model can be used for fisheye cameras, as they only use optics, as opposed to the mirrors used in catadioptric cameras. As described in [168], a projection model was used in which the points are projected onto a unit sphere with a subsequent projection onto a normalized image plane, allowing for a shift of the projection center on the image plane. A flat calibration grid is used for the calibration procedure, and the center of the image is used as the estimate of the principal point for the dioptric camera model. The Mei projection model projects 3D points as shown in the figure below in five steps:
Step 1. The point is projected onto a unit sphere.
Step 2. The points are shifted.
Step 3. The points are projected onto the normalized image plane.
Step 4. The radial offset is added.
Step 5. The point is projected onto the image.
Figure 3.3: Mei’s model projection steps.
Another calibration tool is the Matlab toolbox offered by Scaramuzza [169]. The tool is applicable to any panoramic camera, including fisheye cameras. In many ways it has the same functionality as the one mentioned above, but it allows a more precise adjustment of the external parameters (offset and rotations), as well as the elimination of various internal distortions. During the literature review, an extension of this toolbox was also found [170], with which it becomes possible to achieve more stable, robust, and accurate calibration results. The authors replaced the residual function and introduced a joint refinement of all parameters. In doing so, they achieved more stable, robust, and accurate calibration results and reduced the number of necessary calibration steps from five to three. Experimental results showed a significant performance increase in comparison with other calibration methods [162, 171]. Therefore, this extension is used in this work in order to obtain the intrinsic camera parameters.
3.3.2 Camera model
The camera calibration process can be carried out by using the model proposed by Scaramuzza et al. in [171]. This model assumes that the camera is a central system, i.e., all the perceived rays intersect at a single point. The model is rotationally symmetric with respect to the z-axis. It maps a 2D point (u, v) in a virtual normalized image plane to a 3D vector P (the direction of the ray) through a polynomial function f, as shown in Equations (3.1) and (3.2). The degree of the polynomial can be freely chosen; a fourth-degree polynomial proved to be a good compromise, as stated by Scaramuzza and confirmed in our own experiments.
$$\lambda\,\mathbf{p} = \lambda\,[\,u,\; v,\; f(\rho)\,]^{T} = \mathbf{P}\,\mathbf{X} \qquad (3.1)$$
where λ is a depth scale; the vector p, with components u and v, represents the point in the camera image plane corresponding to a scene point X expressed in homogeneous coordinates; P is the perspective projection matrix. The polynomial f(ρ), which approximates the function that back-projects every pixel point into 3D space, has the following form:
$$f(\rho) = a_0 + a_1\rho + a_2\rho^{2} + \dots + a_N\rho^{N}, \qquad \rho = \sqrt{(u - u_0)^{2} + (v - v_0)^{2}} \qquad (3.2)$$
where $a_0, \dots, a_N$ are the coefficients and N is the degree of the polynomial, both determined by the calibration; $u_0$ and $v_0$ represent the coordinates of the center of the omnidirectional image.
Small misalignments between the sensor, lens, or mirror are modeled using an affine 2D transformation as shown in Equation (3.3), where (u', v') stands for the real distorted coordinates in the sensor image plane, (u, v) are the ideal undistorted ones in a virtual normalized image plane, and c, d, e are the affine transformation parameters:
$$\begin{bmatrix} u' \\ v' \end{bmatrix} = \begin{bmatrix} c & d \\ e & 1 \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix} \qquad (3.3)$$
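As an illustration only, the following Python sketch back-projects a pixel into a viewing ray using the polynomial of Equation (3.2) and the affine correction of Equation (3.3). The coefficient values are made up for the example and would normally come from the intrinsic calibration; how the image center is handled in practice depends on the toolbox, and here it is simply subtracted before the affine inversion.

```python
import numpy as np

def pixel_to_ray(u_d, v_d, a, c, d, e, u0, v0):
    """Back-project a distorted pixel (u_d, v_d) into a 3D ray direction.

    a      -- polynomial coefficients [a0, a1, ..., aN] of f(rho), Eq. (3.2)
    c,d,e  -- affine misalignment parameters, Eq. (3.3)
    u0,v0  -- center coordinates of the omnidirectional image
    """
    # Invert the affine transformation of Eq. (3.3): [u'; v'] = A [u; v].
    A = np.array([[c, d], [e, 1.0]])
    u, v = np.linalg.solve(A, np.array([u_d - u0, v_d - v0]))
    # Evaluate the polynomial f(rho) of Eq. (3.2).
    rho = np.hypot(u, v)
    f_rho = np.polyval(a[::-1], rho)      # a[0] + a[1]*rho + ... + a[N]*rho**N
    ray = np.array([u, v, f_rho])
    return ray / np.linalg.norm(ray)      # unit direction of the viewing ray

# Example with made-up calibration values.
a = [-300.0, 0.0, 6.0e-4, 0.0, 1.0e-9]
print(pixel_to_ray(900.0, 1100.0, a, c=1.0, d=0.0, e=0.0, u0=960.0, v0=960.0))
```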
Equation (3.1) can be rewritten as:
$$\lambda \begin{bmatrix} u \\ v \\ f(\rho) \end{bmatrix} = \mathbf{P}\,\mathbf{X} = \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{r}_3 & \mathbf{t} \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \qquad (3.4)$$
where $\mathbf{r}_1$, $\mathbf{r}_2$, $\mathbf{r}_3$, $\mathbf{t}$ are the column vectors of the transformation matrix related to the rotation and translation parameters, respectively.
The checkerboard pattern is a planar object on which we can detect the X, Y coordinates while the Z value is constant (Z = 0). Since the coordinate Z is zero for every point of the pattern, the column vector $\mathbf{r}_3$ of the matrix P is multiplied by zero and drops out. Therefore, Equation (3.4) transforms to:
$$\lambda \begin{bmatrix} u_j \\ v_j \\ f(\rho_j) \end{bmatrix} = \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{bmatrix}\begin{bmatrix} X_j \\ Y_j \\ 1 \end{bmatrix} \qquad (3.5)$$
Before showing how to determine the extrinsic parameters, it is crucial to eliminate the dependence on the depth scale λ. This is done by multiplying both sides of the equation vectorially by $\mathbf{p}_j$:
$$\lambda\,\mathbf{p}_j \times \mathbf{p}_j = \mathbf{p}_j \times \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{bmatrix}\begin{bmatrix} X_j \\ Y_j \\ 1 \end{bmatrix} \;\;\Longrightarrow\;\; \begin{bmatrix} u_j \\ v_j \\ f(\rho_j) \end{bmatrix} \times \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{bmatrix}\begin{bmatrix} X_j \\ Y_j \\ 1 \end{bmatrix} = \mathbf{0} \qquad (3.6)$$
Now, let us focus on a particular observation i of the calibration pattern. From Equation (3.6), each point $\mathbf{p}_j$ on the pattern contributes three homogeneous equations (we removed the superscript i to facilitate the reading):
$$v_j\,(r_{31}X_j + r_{32}Y_j + t_3) - f(\rho_j)\,(r_{21}X_j + r_{22}Y_j + t_2) = 0 \qquad (3.7)$$
$$f(\rho_j)\,(r_{11}X_j + r_{12}Y_j + t_1) - u_j\,(r_{31}X_j + r_{32}Y_j + t_3) = 0 \qquad (3.8)$$
$$u_j\,(r_{21}X_j + r_{22}Y_j + t_2) - v_j\,(r_{11}X_j + r_{12}Y_j + t_1) = 0 \qquad (3.9)$$
with $\rho_j = \sqrt{u_j^{2} + v_j^{2}}$. Observe that here $X_j$, $Y_j$ and $Z_j$ are known, and so are $u_j$, $v_j$. Also, observe that only (3.9) is linear in the unknowns $r_{11}$, $r_{12}$, $r_{21}$, $r_{22}$, $t_1$, $t_2$. Thus, by stacking all the unknown entries of (3.9) into a vector, we can rewrite Equation (3.9) for L points of the calibration pattern as a system of linear equations:
$$\mathbf{M}\,\mathbf{H} = \mathbf{0} \qquad (3.10)$$
where $\mathbf{H} = [\,r_{11},\, r_{12},\, r_{21},\, r_{22},\, t_1,\, t_2\,]^{T}$ and
$$\mathbf{M} = \begin{bmatrix} -v_1 X_1 & -v_1 Y_1 & u_1 X_1 & u_1 Y_1 & -v_1 & u_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ -v_L X_L & -v_L Y_L & u_L X_L & u_L Y_L & -v_L & u_L \end{bmatrix}$$
A linear estimate of H can be obtained by minimizing the least-squares criterion $\min \lVert \mathbf{M}\mathbf{H} \rVert^{2}$, subject to $\lVert \mathbf{H} \rVert = 1$. This is accomplished by using the Singular Value Decomposition (SVD). The solution of Equation (3.10) is known up to a scale factor, which can be determined unambiguously since the vectors $\mathbf{r}_1$, $\mathbf{r}_2$ are orthonormal. Because of the orthonormality, the unknown entries $r_{31}$, $r_{32}$ can also be computed uniquely. The calibration thus allows us to determine the extrinsic parameters $r_{11}$, $r_{12}$, $r_{21}$, $r_{22}$, $r_{31}$, $r_{32}$, $t_1$, $t_2$ for each pose i of the calibration pattern, except for the translation parameter $t_3$. Once the polynomial $f(\rho)$ is known, the parameter $t_3$ can be estimated as well.
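A minimal sketch of this linear step is shown below: given the stacked matrix M of Equation (3.10), the vector H is taken as the right singular vector associated with the smallest singular value, which minimizes ||MH|| under ||H|| = 1. The matrix values here are random placeholders rather than real calibration data.

```python
import numpy as np

def solve_homogeneous(M):
    """Solve M @ H = 0 in the least-squares sense with the constraint ||H|| = 1."""
    # The minimizer of ||M H|| subject to ||H|| = 1 is the right singular
    # vector of M associated with its smallest singular value.
    _, _, Vt = np.linalg.svd(M)
    return Vt[-1]

# Toy example: M would normally have one row per calibration point (Eq. 3.10).
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 6))
H = solve_homogeneous(M)             # H = [r11, r12, r21, r22, t1, t2] up to scale
print(H, np.linalg.norm(M @ H))
```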
3.3.3 Implementation
This Subsection explains how the intrinsic camera parameters were estimated. For the calibration procedure we followed an approach similar to the one introduced by D. Scaramuzza in [169]. Several important pieces of advice regarding the calibration process were found there, such as the recommended number of checkerboard pattern poses (from 6 to 10 should be enough) and the pattern positions (the pattern should be clearly visible to the camera, while the poses should cover the whole visible area of the camera, e.g., all around the camera lens). Based on these recommendations, 9 snapshots were captured as shown in Figure 3.4. Once these images were
captured, the calibration process was carried out by means of the Toolbox extension [170]. The configurations of the simulation and real environments are depicted in Table 3-1.
Figure 3.4: Pre-calibration procedure: preparing images for the calibration procedure.
Table 3-1: Configuration parameters
  Image resolution, pixels: 1920x1920
  Checkerboard pattern size: 9x6
  Checkerboard square size, mm: 36x36
3.3.4 Results
After the calibration procedure, the above-mentioned Toolbox [169] can be used for evaluating the calibration results. The average error (the mean of the reprojection error computed over all checkerboards) and the sum of the squared reprojection errors are presented in Table 3-2. Figure 3.5 shows the distribution of the reprojection error of each point for all the checkerboards; different colors refer to different images of the checkerboard. The data reveal that more accurate calibration results were obtained in the case of the simulated data. This might be related to the higher image resolution. Moreover, the fisheye camera placed in the simulation environment does not have any distortion, in contrast to the real case.
Table 3-2: The Evaluation of the Intrinsic Camera Calibration
  Average error, pixels: 0.357560
  Sum of squared errors: 115.091981
Figure 3.5: The evaluation of the intrinsic camera calibration.
3.4 Preliminary Work on Extrinsic Calibration
3.4.1 System model
The omnidirectional vision system consists of the fisheye camera and structured light.
The primary objective of this system is to obtain the distance information from the mobile robot
to the surrounding obstacles located nearby. The distance information can be obtained by
processing the laser features from the given image. The camera model was considered in the
previous Section. By taking this information into consideration, the equation of the laser plane
projection can be written in the following way:
$$\begin{bmatrix} u \\ v \\ f(\rho) \end{bmatrix} \times \begin{bmatrix} \mathbf{r}_1^{c} & \mathbf{r}_2^{c} & \mathbf{r}_3^{c} \end{bmatrix}\begin{bmatrix} \mathbf{r}_1^{l} & \mathbf{r}_2^{l} & \mathbf{r}_3^{l} & \mathbf{t}^{l} \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = \mathbf{0} \qquad (3.11)$$
In Equation (3.11) we separated the orientation matrix of the camera, represented by the vectors $\mathbf{r}_1^{c}$, $\mathbf{r}_2^{c}$, $\mathbf{r}_3^{c}$, and the transformation matrix of the laser plane, represented by the vectors $\mathbf{r}_1^{l}$, $\mathbf{r}_2^{l}$, $\mathbf{r}_3^{l}$ and $\mathbf{t}^{l}$. The laser plane is located at a constant distance from the camera optical center, which corresponds to the 3rd row of the column vector $\mathbf{t}^{l}$. Therefore, the world coordinates along the Z-axis (if the camera looks toward the floor, see Figure 3.2) do not change, which means that Z = 0 in Equation (3.11).
Owing to the above-mentioned condition, i.e., Z=0, Equation (3.11) can be transformed into:
$$\begin{bmatrix} u \\ v \\ f(\rho) \end{bmatrix} \times \begin{bmatrix} \mathbf{r}_1^{c} & \mathbf{r}_2^{c} & \mathbf{r}_3^{c} \end{bmatrix}\begin{bmatrix} \mathbf{r}_1^{l} & \mathbf{r}_2^{l} & \mathbf{t}^{l} \end{bmatrix}\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} = \mathbf{0} \qquad (3.12)$$
Moreover, the 1st and 2nd rows of the vector $\mathbf{t}^{l}$ represent the offset between the origin of the camera and the origin of the laser plane, and both are equal to zero. Knowing the value of this offset is important for reconstruction tasks, where the laser plane has to be rotated for each scanning frame. For 2D mapping, however, the laser plane is fixed, and only its orientation and its distance to the camera origin along the Z-axis need to be known. Consequently, for the extrinsic calibration of the vision system we need to estimate the orientation of the camera as well as of the laser plane, and to obtain the distance information between the vision sensors. Afterwards, the results of this calibration can be verified by mapping of the indoor environment.
Equation (3.12) can be rewritten after multiplying the camera rotation matrix by the transformation matrix of the laser plane, so that the relationship between the world and image points takes the following form:
$$\begin{bmatrix} u \\ v \\ f(\rho) \end{bmatrix} \times \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} = \mathbf{0} \qquad (3.13)$$
After carrying out the cross product in Equation (3.13), it can be represented as a series of equations:
$$v\,(h_{31}X + h_{32}Y + h_{33}) - f(\rho)\,(h_{21}X + h_{22}Y + h_{23}) = 0 \qquad (3.14)$$
$$f(\rho)\,(h_{11}X + h_{12}Y + h_{13}) - u\,(h_{31}X + h_{32}Y + h_{33}) = 0 \qquad (3.15)$$
$$u\,(h_{21}X + h_{22}Y + h_{23}) - v\,(h_{11}X + h_{12}Y + h_{13}) = 0 \qquad (3.16)$$
The world points X, Y can be calculated from the above equations. Equations (3.15) and (3.16) can be rewritten as the linear system
$$\begin{cases} a_1 X + b_1 Y + c_1 = 0 \\ a_2 X + b_2 Y + c_2 = 0 \end{cases} \qquad (3.17)$$
where $a_1 = f(\rho)h_{11} - u\,h_{31}$, $b_1 = f(\rho)h_{12} - u\,h_{32}$, $c_1 = f(\rho)h_{13} - u\,h_{33}$, $a_2 = u\,h_{21} - v\,h_{11}$, $b_2 = u\,h_{22} - v\,h_{12}$, $c_2 = u\,h_{23} - v\,h_{13}$.
Finally, each world coordinate of the laser's projection is represented by the distance from the camera to the laser plane (the Z-coordinate) and by the two coordinates (X, Y) obtained after solving the system of Equations (3.17):
$$X = \frac{-c_1 - b_1 Y}{a_1}; \qquad Y = \frac{a_2 c_1 - a_1 c_2}{a_1 b_2 - a_2 b_1} \qquad (3.18)$$
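For clarity, the small sketch below solves the linear system of Equation (3.17) for one laser pixel, given the entries h_ij of the combined matrix of Equation (3.13) and the value f(ρ) of that pixel; the numeric inputs are placeholders.

```python
import numpy as np

def laser_pixel_to_world(u, v, f_rho, H):
    """Recover the world point (X, Y) on the laser plane for one image pixel.

    H is the 3x3 matrix of Eq. (3.13), obtained by multiplying the camera
    rotation matrix with the transformation matrix of the laser plane.
    """
    # Coefficients of Eq. (3.17), derived from Eqs. (3.15) and (3.16).
    a1 = f_rho * H[0, 0] - u * H[2, 0]
    b1 = f_rho * H[0, 1] - u * H[2, 1]
    c1 = f_rho * H[0, 2] - u * H[2, 2]
    a2 = u * H[1, 0] - v * H[0, 0]
    b2 = u * H[1, 1] - v * H[0, 1]
    c2 = u * H[1, 2] - v * H[0, 2]
    # Solve the 2x2 linear system a*X + b*Y + c = 0 (Eq. 3.18).
    A = np.array([[a1, b1], [a2, b2]])
    rhs = -np.array([c1, c2])
    X, Y = np.linalg.solve(A, rhs)
    return X, Y

H = np.eye(3)                         # placeholder for the combined matrix
print(laser_pixel_to_world(0.3, -0.1, -1.0, H))
```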
3.4.2 Calibration procedure
Step 1. Preparing images for the calibration procedure:
- the first image should contain the calibration target based on two perpendicular checkerboard patterns (see Figure 3.6). We also placed two additional checkerboard patterns in the environment at distances different from the target; because of the lack of ground truth, these additional patterns are used for the mapping verification of the calculated extrinsic parameters;
- the second image should contain the laser beam crossing the checkerboard patterns (see Figure 3.6).
Figure 3.6: Calibration images. Left image shows the checkerboard patterns alone. Right image shows the checkerboard patterns with the laser beam.
Step 2. Estimating the camera orientation with regard to the target:
- first of all, we have to extract the checkerboard points from the target;
- secondly, as we know the square size of the pattern, we can take this information into consideration and project these image points to the world ones. Here we do not know the orientation of the target; therefore, in the rotation matrix of the camera in Equation (3.12) we set the orientation parameters equal to zero. The projection results are shown below:
Figure 3.7: Initial projection of the target (up- and front-views).
- From the projection presented in Figure 3.7, simple geometry makes it possible to calculate the orientation of the camera (pitch, roll, and yaw) with regard to the target around the corresponding axes X, Y, and Z. Once the calculations are done, the previously defined zeros in Equation (3.12) can be replaced with the found angles. The projection results are shown below:
Figure 3.8: Calibrated projection of the target (up- and front-views).
Step 3. Extracting the laser beam from the second image:
- once we have estimated the position and orientation of the target, we can move to the 2nd image and extract the laser beam from it with the following algorithm:
a. laser strip segmentation by thresholding;
b. applying a morphological operation such as skeletonization;
c. adding a mask containing the checkerboard borders, in order to work only with the region belonging to the pattern;
d. fitting two curves to the extracted laser points belonging to the checkerboard patterns, one for each pattern;
e. collecting the points from these curves in two arrays;
f. after that, the previously processed image points can be projected to the world ones.
The results are shown below:
Figure 3.9: From left to right: Input image; Extracted laser beam; Projection of the laser and points of the left
checkerboard pattern; Projection of the laser and points of the front checkerboard pattern.
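Steps (a)-(c) of the extraction can be prototyped in a few lines. The sketch below uses OpenCV thresholding and scikit-image skeletonization as stand-ins for the operations named above, and the threshold values are illustrative only.

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

def extract_laser_points(bgr_image, mask=None, red_thresh=150, other_thresh=100):
    """Return (row, col) coordinates of the thinned red laser strip."""
    b, g, r = cv2.split(bgr_image)
    # (a) segment the laser strip by thresholding the dominant red channel.
    strip = (r > red_thresh) & (g < other_thresh) & (b < other_thresh)
    # (b) thin the strip to a one-pixel-wide skeleton.
    skeleton = skeletonize(strip)
    # (c) keep only the region belonging to the calibration pattern.
    if mask is not None:
        skeleton &= mask.astype(bool)
    return np.column_stack(np.nonzero(skeleton))

# Example on a synthetic image with a bright red horizontal stripe.
img = np.zeros((200, 200, 3), np.uint8)
img[100:104, :, 2] = 255
print(extract_laser_points(img).shape)
```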
Step 4. Fitting a plane to the laser points:
- a plane is fitted to the laser world points and its inclination is found by simple geometry. As a result, we obtain the orientation of the laser plane (pitch and roll) as well as the distance between the camera and the laser plane.
Figure 3.10: Fitting plane to the laser points.
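As an illustration of Step 4, the sketch below fits a plane z = ax + by + c to the projected laser points by linear least squares and derives inclination angles and the camera-to-plane offset from it. This is a generic fit under a small-angle assumption, not necessarily the exact geometric procedure used in the thesis, and the synthetic data are invented for the example.

```python
import numpy as np

def fit_laser_plane(points):
    """Fit z = a*x + b*y + c to Nx3 laser points; return (pitch, roll, distance)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x, y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)
    # Inclination angles of the fitted plane along the X- and Y-directions,
    # taken here as the pitch and roll of the laser plane (degrees).
    pitch = np.degrees(np.arctan(a))
    roll = np.degrees(np.arctan(b))
    # The offset c is the plane height at the camera origin (x = y = 0).
    return pitch, roll, c

# Synthetic plane tilted by a few degrees, 466 mm below the camera origin.
rng = np.random.default_rng(1)
xy = rng.uniform(-500, 500, size=(200, 2))
z = 0.03 * xy[:, 0] - 0.02 * xy[:, 1] - 466.0 + rng.normal(0, 0.5, 200)
print(fit_laser_plane(np.column_stack([xy, z])))
```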
Step 5. Mapping verification:
- once the extrinsic parameters have been estimated, we can carry out mapping to evaluate the quality of the calibration. In order to highlight the importance of the calibration, we depict the calibrated and non-calibrated maps (see Figure 3.11). From this figure it can be seen that the extrinsic parameters were found correctly, as the calibrated map matches the structure of the obstacles in the input image. A comparison of the experimental distances with the real ones is presented in Table 3-3.
Figure 3.11: Verification by mapping. Left image shows the checkerboard patterns with the laser beam. Middle image shows the non-calibrated map. Right image shows the calibrated map.
Table 3-3: The Evaluation of the Extrinsic Calibration
                      Real   Experiment   Absolute Error
Right obstacle, mm    410    416          6
Bottom obstacle, mm   630    641          11
3.4.3 Discussion
This Subsection discusses the main drawbacks found while working with the calibration method based on the checkerboard patterns:
Table 3-4: Summary of the considered calibration method
Noise: One of the problems of existing techniques that include a target in the calibration procedure [105-120] is that they are not robust to noise. These methods are based on converting images to binary ones; thus, noise pixels may prevent the algorithms from extracting the points of interest from the target. For example, when an image containing the checkerboard pattern in combination with noise was analyzed by the Matlab calibration Toolbox [169], the following message was obtained: "Image omitted -- Not all corners found."
Complexity: Laser plane calibration by checkerboard patterns requires additional steps, which make the calibration process more complicated. Namely, in order to obtain the equation of the laser plane with regard to the camera, the checkerboard pattern has to be moved to different locations. For every pattern position two snapshots should be taken: one with the laser beam (for laser extraction) and another one without the laser beam (for obtaining the pattern points); otherwise, the pattern points might be extracted incorrectly.
Illumination: The image has to be bright in order to extract the points of the checkerboard pattern, but dark in order to extract the laser beam.
Limitations: When the calibration of the vision system with the structured light is carried out with checkerboard patterns, the laser information belonging to the black squares, where the laser beam is not visible, is lost.
In Section 3.1 we presented an improved vision system consisting of a single omnidirectional laser emitter. The proposed vision system showed a positive outcome for mapping of the indoor environment. Recovering the correct geometrical structure of the environment was possible thanks to calibration, which shows the importance of obtaining the extrinsic parameters before conducting measurements. At the same time, particular drawbacks of the state-of-the-art calibration method were found (see Table 3-4). Consequently, it is crucial to propose new calibration methods for obtaining the extrinsic parameters. Moreover, during the experiments in this Section we faced particular difficulties in evaluating the parameters of the extrinsic calibration as well as in providing ground truth data for evaluating the experimental results. Thus, in the next Section we focus on building a simulation environment able to provide more opportunities for testing theories and methods and for validating experimental results.
3.5 Development of the Simulation Environment
3.5.1 Motivation
One of the main goals of the proposed simulation environment is to provide greater opportunities for testing theories and algorithms to researchers and engineers working with omnidirectional vision systems, who might have a software solution but suffer from a lack of hardware. Moreover, empowering omnidirectional vision systems with CV capabilities (e.g., SLAM, path planning, semantic segmentation, etc.) is becoming a very important research direction in the field of mobile robots. However, in real-world applications it might be difficult or even impossible to generate ground truth data for a comparative analysis with the experimental data. For example, the following issues may influence the ground truth data generation: drift of the mobile robot wheels, noise and drift of sensors, accuracy of the semantically labelled data, measurement uncertainty, etc. Considering all the above-mentioned issues, we propose a high-fidelity simulation environment aimed at bridging the gap between simulation and reality, making it relevant for CV researchers and engineers. Our simulation environment allows experiments to be performed in a cost-effective way: compared to real experiments, it is easier and cheaper to set up, works faster, and is more convenient to use. With the proposed simulator it is also possible to generate depth data, semantically labelled data, and path data, which can be tested within the photo-realistic simulated indoor scenarios. This opens up new ways of evaluating performance across a diverse set of experiments. Moreover, the simulator can be used in combination with other educational programs, e.g., Matlab.
Matlab is a piece of software widespread among universities for teaching engineering and computer science courses. It is an interactive environment for programming, data analysis, and the development and visualization of algorithms. An important feature of this program is the possibility of working with a variety of toolboxes, which are able to simulate practical tasks, e.g., to model and program mobile robots. The Robotics System Toolbox [172] developed by the Matlab team and the Robotics Toolbox [173] developed by Peter Corke can both be regarded as compact education systems with tutorials and educational materials. These toolboxes have the following capabilities: planning the trajectory of mobile robots, creating algorithms, localization, monitoring trajectories and, in general, controlling the mobile robot. Their main drawback is the lack of realistic graphics and of an appropriate interaction with objects in the scene. Those drawbacks can be eliminated with our simulation environment. In the experimental part of this Section we show the main capabilities of our simulator by navigating the mobile robot inside the indoor environment in combination with the Robotics System Toolbox and the Robotics Toolbox. In doing so, we not only prepare the testing environments for the next Chapters, but users familiar with those toolboxes will also be able, with the aid of our extension, to obtain a realistic rendering mode and interactive scenarios for operating the mobile robot equipped with the fisheye camera and structured light. Moreover, since we decided to operate with the Robotics System Toolbox and the Robotics Toolbox, we conduct a comparison analysis between them and discuss their main features and capabilities.
3.5.2 Simulation Environment
A simulation environment similar to the real one was built with Unity (see Figure 3.12). Unity is one of the world's most widely used game development platforms. Unity provides a marketplace in which users can contribute and sell assets to the community; e.g., an indoor environment was taken from there and modified for our own needs.
Figure 3.12: Omnidirectional vision system in the real and simulation environments. Left image shows elements of the system. Right image shows a snapshot captured by the fisheye camera.
The omnidirectional vision system proposed in this Thesis consists of the fisheye camera and the laser emitter, and it can capture 360° horizontal scene distance information with one single image (see Figure 3.12). Distances to the obstacles can be calculated by means of the calibrated vision system and the pixels of the laser strip extracted from the given image. The fisheye camera integrated into the simulation environment is mounted above the mobile robot and has a field of view (FOV) of 180°. It is noteworthy that this camera can also be replaced with 210° or 240° FOV cameras [100]. Usually, several laser emitters are used in order to create a continuous projected line covering the field of interest around the mobile robot. The distinctive feature of the proposed vision system is that it has only one laser emitter, which covers the surroundings of the mobile robot. In the simulation environment, the distance between the laser plane and the camera can vary, and both the camera and the laser plane orientations can be changed (pitch, roll, yaw).
3.5.3 Platform features
The proposed simulator can be easily installed and configured on Windows, macOS, and Linux operating systems. The simulator package includes: (1) interaction with scenes and objects, (2) communication with other programs through the Transmission Control Protocol (TCP/IP), (3) fully implemented applications related to the calibration of the vision system as well as 2D/3D mapping. All these features allow researchers to configure their own experimental setups and design better algorithms for these purposes. An attempt has been made to create realistic scenes by using rendering capabilities such as light sources, reflections, shadows, etc. Figure 3.13 shows a snapshot taken by our simulator illustrating these rendering capabilities.
Figure 3.13 shows a snapshot taken by our simulator illustrating these rendering capabilities.
Figure 3.13: Several modes supported by the simulator, from left to right: ordinary mode, semantic labeling mode,
depth mode.
The features listed above can be helpful for testing theories and modelling systems while operating with types of sensors that were not previously included in the Robotics System Toolbox and the Robotics Toolbox. The simulator supports two modes: a manual mode and an automatic mode, the latter representing communication with other software via TCP/IP. In this Thesis we consider navigation inside the indoor environment based on the Matlab toolboxes and the proposed extension; the functions required for these programs are presented in Table 3-5.
Table 3-5: Operating Modes of the Simulator
Action | Manual mode from the user interface (UI) of the simulator | Communication with other programs by TCP/IP
Image capturing | Image can be saved with the corresponding button, Section "Camera" (see Figure 3.13) | Image can be obtained in Matlab via TCP/IP
Movements of the mobile robot | Mobile robot can be controlled by keyboard or input fields, Section "Robot" (see Figure 3.13) | Mobile robot can be controlled in Matlab via TCP/IP
Laser activation | Laser can be controlled via Section "Laser Plane" | Laser can be controlled in Matlab via TCP/IP
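To give an idea of the automatic mode, the snippet below shows a generic TCP client of the kind that could drive the simulator from a script. The port number and the text command format ("LASER ON", "MOVE vx wz", "CAPTURE") are purely hypothetical placeholders; the actual protocol is defined by the simulator itself.

```python
import socket

HOST, PORT = "127.0.0.1", 55000        # hypothetical address of the simulator

def send_command(sock, text):
    """Send one newline-terminated text command and read a short reply."""
    sock.sendall((text + "\n").encode("utf-8"))
    return sock.recv(4096).decode("utf-8").strip()

with socket.create_connection((HOST, PORT), timeout=5.0) as sock:
    print(send_command(sock, "LASER ON"))       # hypothetical command names
    print(send_command(sock, "MOVE 0.2 0.0"))   # linear / angular velocity
    print(send_command(sock, "CAPTURE"))        # request a fisheye snapshot
```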
3.5.4 Capabilities
The simulator consists of several screens: simulation, calibration, and measurement tool.
Simulation is the main screen where experiments take place (see Figure 3.13). Elements included
in the simulation can be configured and controlled with the panels on the left and right. For
example, changing the resolution of the camera, its FOV, activating/changing/moving objects,
tracking of the mobile robot and so on.
The calibration screen consists of the camera and a checkerboard pattern. By moving the camera or the checkerboard pattern it is possible to collect data for the intrinsic calibration of the lens (see Figure 3.14). Similar to the simulation screen, it supports changing the resolution of the camera, its FOV, and the relative size of the checkerboard pattern.
Figure 3.14: The calibration screen.
The measurement screen includes the measurement tool, which is highlighted in yellow
in Figure 3.15. By moving this tool, it is possible to measure distances from the camera to certain
objects in the simulated scene, which may not be presented in the panels of the simulation screen.
In Figure 3.15 this tool was moved to the laser strip associated with the sofa.
Figure 3.15: The measurement screen.
3.5.5 Map transform
This Section considers the process of obtaining the map and transforming it to the formats supported by the Robotics Toolbox and the Robotics System Toolbox for the navigation of the mobile robot. Mapping is possible with the aid of the fisheye camera and laser emitter included in the simulation environment. The system model for the camera with the Z-axis looking down at the floor was described earlier in this Chapter. The camera in Figure 3.13 has a different configuration, namely the Z-axis is looking forward. This means that only the coordinates of obstacles along the Y- and Z-axes change, while the distance from the camera to the laser plane along the X-axis does not change; the latter is represented by the 1st row of the column vector $\mathbf{t}^{l}$. Knowing that only the (Y, Z) coordinates change, we can transform Equation (3.11) into:
$$\begin{bmatrix} u \\ v \\ f(\rho) \end{bmatrix} \times \begin{bmatrix} \mathbf{r}_1^{c} & \mathbf{r}_2^{c} & \mathbf{r}_3^{c} \end{bmatrix}\begin{bmatrix} \mathbf{r}_2^{l} & \mathbf{r}_3^{l} & \mathbf{t}^{l} \end{bmatrix}\begin{bmatrix} Y \\ Z \\ 1 \end{bmatrix} = \mathbf{0} \qquad (3.19)$$
By knowing the intrinsic and extrinsic parameters of the vision system, the image containing the laser beam belonging to obstacles can be transformed into a map. First of all, the laser beam must be detected in the fisheye image and extracted from it. This procedure can be carried out by thresholding, as the laser beam has its unique red color. Once the laser beam has been extracted, we can transform the laser points in the image to world coordinates by Equation (3.19). The transformed coordinates are represented by blue dots (see Figure 3.16). The initial map presented in Figure 3.16 contains negative coordinates of obstacles, which cannot be transformed into the binary map of the Robotics System Toolbox or the logical map of the Robotics Toolbox. Thus, the initial map has to be shifted to eliminate those negative coordinates. The shifted coordinates along the X- and Y-axes can be obtained by Equation (3.20) and Equation (3.21):
$$X_{shifted} = X_i + \lvert X_{min} \rvert \qquad (3.20)$$
$$Y_{shifted} = Y_i + \lvert Y_{min} \rvert \qquad (3.21)$$
where $X_{shifted}$ and $Y_{shifted}$ represent the shifted coordinates; $X_i$ and $Y_i$ represent the current coordinates; $X_{min}$ and $Y_{min}$ represent the minimum coordinates. Finally, the shifted map is shown in Figure 3.16.
Figure 3.16: Left image shows initial map obtained with laser emitter; right image shows shifted map.
Once the shifted map with positive coordinates has been created, it can be converted to the maps of the corresponding toolboxes. Firstly, the shifted map is converted to the format suitable for the Robotics System Toolbox, namely to a binary map, with the aid of the Matlab function "binaryOccupancyMap". The binary map is shown in Figure 3.17. Next, the coordinates of the binary map can simply be converted to the format suitable for the Robotics Toolbox, namely to a logical map, with the aid of the Matlab function "occupancyMatrix". The logical map is shown in Figure 3.17.
Figure 3.17: Left image shows binary map supported by the Robotics System Toolbox; right image shows logical
map supported by the Robotics Toolbox.
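Outside of Matlab, the same shift-and-rasterize step can be sketched in a few lines of Python. The cell size below is an arbitrary example value, and the resulting boolean matrix merely plays the role of the binary/logical maps produced by "binaryOccupancyMap" and "occupancyMatrix".

```python
import numpy as np

def points_to_occupancy(points_xy, cell_size=0.05):
    """Shift laser points to non-negative coordinates and rasterize them."""
    pts = np.asarray(points_xy, dtype=float)
    # Eqs. (3.20)-(3.21): remove negative coordinates by shifting with |min|.
    shifted = pts - pts.min(axis=0)
    # Convert metric coordinates to integer grid indices.
    idx = np.floor(shifted / cell_size).astype(int)
    grid = np.zeros(idx.max(axis=0) + 1, dtype=bool)
    grid[idx[:, 0], idx[:, 1]] = True   # occupied cells
    return grid

laser_points = [(-0.4, 1.2), (0.3, 2.5), (1.0, -0.2)]
occ = points_to_occupancy(laser_points)
print(occ.shape, occ.sum())
```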
Once the input maps for the corresponding toolboxes have been created, we can move to the navigation part, which is considered in the next Section. An important observation concerns the trajectory points generated by those toolboxes: in order to control the mobile robot inside the proposed virtual environment and navigate around the obstacles (see Figure 3.13), the generated trajectory points must be converted back to the range of coordinates of the initial plot (see Figure 3.16).
3.5.6 Experiment setup
The main goal of the experiment was to investigate the compatibility of our simulator with the Robotics System Toolbox and the Robotics Toolbox. It was decided to demonstrate this compatibility by means of a practical task, namely navigation of the mobile robot inside the indoor environment. Additionally, a comparison analysis between these two toolboxes is provided. Before conducting experiments, the vision system must be calibrated; the calibration process itself will be presented in the next Chapter. Since in this Section we are only testing the simulation environment, we assume that the parameters of the extrinsic calibration are known, namely that the laser plane is parallel to the floor and the distance between the camera and laser plane is known. These assumptions can be guaranteed by the virtual environment. The OCamCalib toolbox was used for obtaining the intrinsic parameters of the fisheye camera. More details about the experiment are provided below.
In order to perform navigation of the mobile robot, several obstacles were placed inside the indoor environment (see Figure 3.13). The initial position of the robot is (0; 0) meters and the target position is (-0.2; 2.5) meters (see Figure 3.16). The Robotics System Toolbox uses a binary matrix of obstacles, whereas the Robotics Toolbox uses a two-dimensional matrix. Consequently, the extracted laser beam belonging to obstacles was first converted to the formats readable by these toolboxes. Next, the path between the start and target points was created with the corresponding toolboxes. After that, the trajectory points were converted back and transmitted via TCP/IP to the simulator for controlling the mobile robot. Finally, the experimental target position was compared with the position used for the experiment setup.
3.5.7 Experiment results
Visual results of the conducted experiment are presented in Figure 3.18 for the Robotics System Toolbox and for the Robotics Toolbox, respectively. From these figures it can be seen that the mobile robot reached the desired target position without any visible misalignment. A more detailed comparison is presented in Table 3-6. From the values in Table 3-6 it can be seen that the experimental coordinates are close to the coordinates from the experiment setup; the difference between them is very small, in the range of a few millimeters. Table 3-6 also demonstrates that with the Robotics System Toolbox the mobile robot reached the target position faster than with the Robotics Toolbox.
Figure 3.18: Left image shows navigation to the target position with Robotics System Toolbox; right image shows
navigation to the target position with Robotics Toolbox.
Table 3-6: Comparison Analysis
Parameter                          | Robotics System Toolbox | Robotics Toolbox
Input target position, meters      | (-0.200; 2.500)         | (-0.200; 2.500)
Experiment target position, meters | (0.195; 2.505)          | (0.199; 2.499)
Operating time, seconds            | 67.68                   | 178.85
3.5.8 Discussion
In general, it can be mentioned that the Robotics System Toolbox provides more powerful functionality for modeling and controlling mobile robots. For example, for navigation with the Robotics System Toolbox it was possible to take the dimensions of the robot into consideration in order not to collide with any obstacles. More details about the experiment results and the merits of those toolboxes are discussed below:
Operating time: the experiment results showed that both toolboxes can generate and provide an accurate trajectory path to the target point. Meanwhile, with the Robotics System Toolbox the target point was reached almost three times faster than with the Robotics Toolbox. The reason for this may be related to the format of the navigation map: the Robotics System Toolbox uses a binary matrix, which occupies less space in memory and is consequently processed faster than the ordinary two-dimensional matrix of the Robotics Toolbox.
Path modeling: in order to reach the target, the Robotics Toolbox moves the robot to the neighboring cell with the smallest distance to the target. The process is repeated until the robot reaches a cell with a distance value of zero, which is the target. Meanwhile, the Robotics System Toolbox operates with a more advanced path planner based on the Probabilistic Roadmap (PRM). The PRM path planner constructs a roadmap in the free space of a given map using randomly sampled nodes in the free space and connecting them with each other.
Robot movements: instead of the discrete movements of the mobile robot provided by the Robotics Toolbox, the Robotics System Toolbox can provide more realistic navigation scenarios, namely with the aid of the Pure Pursuit controller. This controller is used to drive the simulated robot along the desired trajectory towards the target point.
Chapter 4 Extrinsic Calibration and 2D Mapping
4.1 Main Contributions of this chapter
The contributions of our work can be summarized as follows. First, we present a novel omnidirectional vision system with laser illumination in a flexible configuration and propose a suitable method for its calibration. In this dissertation, we consider the more realistic scenario, i.e., a flexible configuration where neither the camera nor the laser plane is assumed to be parallel to the floor. As for calibration, we propose a method based on one single snapshot and show that reliable measurements can be obtained for our vision system. It is noteworthy that the term "flexible configuration" is defined differently in another work [174]. C. Paniagua et al. proposed a new omnidirectional structured light system based on a conic pattern light emitter in a flexible configuration, which can be used as a personal assistance system. There, flexible configuration means that the system does not need to be calibrated beforehand; namely, the relationship between the camera and laser plane can be obtained during usage, which is possible by reconstruction of the projected conic and the known distance between the camera and laser emitter. This distance was calculated by attaching a small ball with a known radius to the endpoint of the laser. According to the data presented in their paper, the contour extraction of the ball does not provide an appropriate circular shape, which also shows the difficulty of extracting small objects from omnidirectional images. Thus, adopting calibration procedures such as using a ball as the target [104] to omnidirectional images might not provide reliable calibration results, especially for vision systems where the laser emitter is located at a significant distance from the camera. Other authors presented a self-calibration technique using a projected grid pattern for one-shot scanning to densely measure shapes of objects with a projector and camera [175]. However, these methods are not suitable for line emitters, where the image features are not sufficient for extrinsic self-calibration from one single capture.
4.2 Proposed Method of Extrinsic Calibration
4.2.1 System model
The system model was described in Chapter 3; here we only recall the general equation of the relationship between image and world points, namely Equation (3.11). From this equation we know that the laser plane is located at a constant distance from the camera optical center. This means that the distance to the laser plane along the Z-axis is constant, while the coordinates of the laser beam belonging to obstacles change along the X- and Y-axes (see Figure 3.12). The mapping equation (3.12) corresponds to this particular condition.
4.2.2 Calibration procedure in a general form
In order to place the camera or laser emitter above the mobile robot, intermediate links and joints are required. Therefore, it is difficult to adjust the sensors' orientation in a desirable way, e.g., parallel to the floor, and, as a consequence, small deviations from these assumptions might occur. That is why calibration is important. The calibration process consists of finding the camera rotation matrix and the transformation matrix of the laser plane. In a general form, these parameters can be found efficiently by solving the following optimization problem:
$$\begin{aligned}
&\underset{R_c,\;[R_l\;\mathbf{t}^{l}]}{\arg\min}\; \big\lVert f\big(R_c,\,[R_l\;\mathbf{t}^{l}]\big) \big\rVert^{2} \\
&\text{subject to}\quad f\big(R_c,\,[R_l\;\mathbf{t}^{l}]\big) = \begin{bmatrix} u \\ v \\ f(\rho) \end{bmatrix} \times R_c\,[R_l\;\mathbf{t}^{l}] \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}, \\
&R_c = \begin{bmatrix} \mathbf{r}_1^{c} & \mathbf{r}_2^{c} & \mathbf{r}_3^{c} \end{bmatrix}, \qquad [R_l\;\mathbf{t}^{l}] = \begin{bmatrix} \mathbf{r}_1^{l} & \mathbf{r}_2^{l} & \mathbf{t}^{l} \end{bmatrix}
\end{aligned} \qquad (4.1)$$
4.2.3 Camera Calibration Placed to the Indoor Environment
Equation (4.1) provides substantial insight, in particular that among the camera extrinsic parameters we are only interested in the orientation parameters forming the camera rotation matrix $R_c$, namely pitch, roll, and yaw. Figure 4.1 illustrates the box target developed for the calibration procedure. This target is used to calibrate both the camera and the laser plane; the calibration of the latter is explained in the next step. Similar to the indoor environment, the target has a rectangular structure. Therefore, the calibration process aims at finding the camera orientation with regard to the sides of the target by computing the pitch, roll, and yaw which minimize Equation (3.12), as formulated in Equation (4.1).
Figure 4.1: Calibration target. Green border shows target in a cross-section.
The target has white and black regions, and we are interested in extracting the border between them, which is parallel to the floor. This means that the orientation parameters of the border are equal to zero (the border is used here in order to obtain the relationship between its known orientation and the unknown camera rotation matrix $R_c$). Firstly, we want to show that the projected border points will not have the desired rectangular shape even when only small errors in the camera orientation are present. We therefore set the orientation parameters of the camera in $R_c$ equal to zero. Furthermore, since we are only interested in the orientation parameters, the distance between the camera and the border can be set arbitrarily (the geometry does not change). Once this has been done, the pixel coordinates are projected to the world ones by Equation (3.12) and the mapping result is shown in Figure 4.2 (blue color). In Figure 4.2, it can be seen that the projected points do not have the desired rectangular shape and orientation, because the actual parameters of the camera orientation (pitch, roll, yaw) differ from the zero values previously set in Equation (3.12). The desired shape of the border projection is as follows:
Vectors of the two opposite horizontal sides of the projected border are collinear;
Vectors of the two opposite vertical sides are collinear;
The horizontal and vertical side vectors are orthogonal.
Figure 4.2: Blue – initial projection. Red – projection with the optimized camera pitch and roll.
In order to solve the minimization problem, we use the Matlab function "fmincon", which finds the minimum of a predefined constrained nonlinear function. The minimized pitch can be found by making the two projected horizontal side vectors, denoted $\vec{h}_1$ and $\vec{h}_2$, collinear to each other. The above-mentioned minimization problem can be written in the following way:
$$\begin{aligned}
&\min_{pitch}\; \lVert f(pitch) \rVert^{2} \\
&\text{subject to}\quad f(pitch) = \frac{h_{1,x}}{\sqrt{h_{1,x}^{2} + h_{1,y}^{2}}} - \frac{h_{2,x}}{\sqrt{h_{2,x}^{2} + h_{2,y}^{2}}}, \qquad -30^{\circ} \le pitch \le 30^{\circ}
\end{aligned} \qquad (4.2)$$
where the vectors $\vec{h}_1$ and $\vec{h}_2$ depend only on the pitch; the roll and yaw do not influence their collinearity and are therefore set equal to zero. The limits of ±30° established for the pitch in the minimization problem (4.2), and below for the other orientation parameters, were chosen based on the observation that, for a better performance of the vision system, the orientation parameters of the camera and the laser plane should not be very large, so that a wide range of the indoor environment can be covered. With these limits the minimized orientation parameters can be successfully found.
The minimized roll is found in a similar way as the pitch, but different vectors, the projected vertical side vectors $\vec{v}_1$ and $\vec{v}_2$, are used in this case. These vectors depend on the roll; the pitch and yaw are therefore set equal to zero. The formulation has the following form:
$$\begin{aligned}
&\min_{roll}\; \lVert f(roll) \rVert^{2} \\
&\text{subject to}\quad f(roll) = \frac{v_{1,y}}{\sqrt{v_{1,x}^{2} + v_{1,y}^{2}}} - \frac{v_{2,y}}{\sqrt{v_{2,x}^{2} + v_{2,y}^{2}}}, \qquad -30^{\circ} \le roll \le 30^{\circ}
\end{aligned} \qquad (4.3)$$
Afterwards, the previously defined zero values in Equation (3.12) can be replaced with the optimized pitch and roll. In Figure 4.2 (red color), it can be seen that the projected border points now have an appropriate rectangular shape. However, one parameter is still unknown: the resulting rectangular projection is rotated around the Z-axis (yaw), and our goal is to obtain this angle. The yaw is obtained by minimizing the product of the slopes of the horizontal and vertical vectors $\vec{h}_1$ and $\vec{v}_1$. These vectors depend on the yaw, whereas the pitch and roll are kept constant. The formulation is as follows:
$$\begin{aligned}
&\min_{yaw}\; \lVert f(yaw) \rVert^{2} \\
&\text{subject to}\quad f(yaw) = \frac{h_{1,y}}{\sqrt{h_{1,x}^{2} + h_{1,y}^{2}}} \cdot \frac{v_{1,x}}{\sqrt{v_{1,x}^{2} + v_{1,y}^{2}}}, \qquad -30^{\circ} \le yaw \le 30^{\circ}
\end{aligned} \qquad (4.4)$$
Finally, the desired projection is shown in Figure 4.3.
Figure 4.3: Optimized camera yaw.
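To illustrate the structure of these sequential minimizations, the sketch below recovers a single tilt angle with SciPy's bounded scalar optimizer, standing in for Matlab's "fmincon". The toy projection is a simple pinhole view of a rectangle, not the fisheye model of the thesis, and the names used here are invented for the example; the roll and yaw would be recovered analogously with the residuals of Equations (4.3) and (4.4).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

# Rectangular border lying on the plane Z = 2 in the world frame.
BORDER = np.array([[-1, -1, 2], [1, -1, 2], [1, 1, 2], [-1, 1, 2]], float)
TRUE_PITCH = np.radians(7)                     # unknown camera tilt to recover

# Synthetic observation: normalized image coordinates seen by the tilted camera.
cam_pts = BORDER @ rot_x(TRUE_PITCH).T
obs = cam_pts[:, :2] / cam_pts[:, 2:3]

def back_project(pitch):
    """Re-project the observed corners onto the Z = 2 plane assuming 'pitch'."""
    rays = np.column_stack([obs, np.ones(len(obs))]) @ rot_x(pitch)   # world rays
    return rays[:, :2] * (2.0 / rays[:, 2:3])                         # hit Z = 2

def residual(pitch):
    """Collinearity residual in the spirit of Eq. (4.2): the two projected
    horizontal sides should have the same normalized direction."""
    A, B, C, D = back_project(pitch)
    ab = (B - A) / np.linalg.norm(B - A)
    dc = (C - D) / np.linalg.norm(C - D)
    return np.sum((ab - dc) ** 2)

# Bounded scalar minimization, playing the role of Matlab's fmincon.
res = minimize_scalar(residual, bounds=np.radians((-30, 30)), method="bounded")
print(np.degrees(res.x))                       # close to 7 degrees
```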
4.2.4 Laser Plane Calibration
This Subsection describes how the orientation of the laser plane and the distance between the camera and the laser plane are calculated. The pitch and roll determine the rotation matrix of the laser plane, whereas the distance is associated with the translation represented by $\mathbf{t}^{l}$. First of all, the laser strip has to be extracted from the given image with the aid of the following algorithm:
Laser beam segmentation by thresholding;
Applying a morphological operation such as skeletonization;
Creation of a mask for the laser beam belonging to every side of the target (the corners of the target have blue patterns, see Figure 4.1; the mask is created after their segmentation);
Fitting curves to the extracted laser points belonging to the sides, for noise reduction;
Afterwards, the previously processed image points can be projected to the world ones.
The angle values obtained during the previous step, which are related to the camera orientation, are used here in order to calibrate the laser plane with a similar approach as in the previous Subsection. The only difference is that now we operate with the laser transformation matrix. The initial projection of the laser strip with the known camera rotation matrix and an unknown laser rotation matrix (pitch and roll set equal to zero) is shown in Figure 4.4 (blue color). The laser plane does not have a yaw, which means that once the pitch and roll are calculated, the projected laser beam has an appropriate rectangular shape:
Vectors of the projected laser segments on the two opposite horizontal sides are collinear;
Vectors of the projected laser segments on the two opposite vertical sides are collinear.
Figure 4.4: Blue – initial projection. Red – projection with the optimized laser pitch and roll.
By taking this information into consideration, the minimized pitch of the laser plane can be found by making the two projected side vectors, $\vec{p}_1$ and $\vec{p}_2$, collinear to each other. These vectors depend on the pitch; the roll is initially set equal to zero, whereas $R_c$ is known and constant. The minimization problem can be written in the following way:
$$\begin{aligned}
&\min_{pitch}\; \lVert f(pitch) \rVert^{2} \\
&\text{subject to}\quad f(pitch) = \frac{p_{1,x}}{\sqrt{p_{1,x}^{2} + p_{1,y}^{2}}} - \frac{p_{2,x}}{\sqrt{p_{2,x}^{2} + p_{2,y}^{2}}}, \qquad -30^{\circ} \le pitch \le 30^{\circ}
\end{aligned} \qquad (4.5)$$
The minimized roll is found in a similar way as the pitch, but different vectors, $\vec{q}_1$ and $\vec{q}_2$, are used here. These vectors depend on the roll; the pitch is set equal to zero, whereas $R_c$ is known and constant. The formulation has the following form:
$$\begin{aligned}
&\min_{roll}\; \lVert f(roll) \rVert^{2} \\
&\text{subject to}\quad f(roll) = \frac{q_{1,y}}{\sqrt{q_{1,x}^{2} + q_{1,y}^{2}}} - \frac{q_{2,y}}{\sqrt{q_{2,x}^{2} + q_{2,y}^{2}}}, \qquad -30^{\circ} \le roll \le 30^{\circ}
\end{aligned} \qquad (4.6)$$
Once the roll and pitch of the laser rotation matrix are calculated, the projected laser points have an appropriate shape (see Figure 4.4). However, one parameter related to the transformation matrix of the laser plane is still unknown, i.e., the distance between the camera and the laser plane. The size of the target is known, hence this distance can be obtained. The surface area of the target, S1, can be calculated from the known real distances between its sides. The same area can also be calculated from the projected world coordinates of the laser beam, giving S2. The world points are calculated with the known rotation matrices of the camera and the laser plane, which means that S2 depends only on the distance associated with the translation vector $\mathbf{t}^{l}$. Thus, by minimizing the difference between S1 and S2 in the objective function, it becomes possible to find the dependent variable, which represents the distance between the camera and the laser plane. The aforementioned procedure has the following form:
$$\min_{dist}\; \lVert f(dist) \rVert^{2}, \qquad f(dist) = S_1 - S_2, \qquad S_2 = \big(\bar{X}_{1} - \bar{X}_{2}\big)\cdot\big(\bar{Y}_{3} - \bar{Y}_{4}\big) \qquad (4.7)$$
where $\bar{X}_{1}$, $\bar{X}_{2}$ are the mean X-coordinates of the projected laser points on the two opposite sides parallel to the Y-axis and $\bar{Y}_{3}$, $\bar{Y}_{4}$ are the mean Y-coordinates on the other two sides; the mean values of the coordinates are used in order to obtain the distances between the sides. The rationale behind the use of the mean values is that the projected sides are not strictly parallel to each other, as their coplanarity is only achieved through the minimization.
Figure 4.5 shows the mapping results achieved with the calibrated vision system, representing measurements of the calibration target. It is worth mentioning that the real size of the target in this specific case was set equal to 1084.5x1084.5 mm. Similar results, equal to 1084.9x1084.1 mm, were obtained experimentally through the calibration.
Figure 4.5: Optimized laser distance.
4.3 Experiment Setup
The calibration of the aforementioned extrinsic parameters is complete once all parameters of the camera rotation matrix and the laser transformation matrix in Equation (3.12) are known. These extrinsic parameters cannot be measured precisely in real experiments; as a result, it is difficult to prove that the proposed methodology is reliable and robust, or to compare it with other methods. Thus, the following experiments are carried out in the simulation environment:
4.3.1 Self-evaluation technique
The experiments included in this Section were aimed at examining the robustness of the proposed calibration method with regard to noise and to the location of the target, and at assessing the most appropriate conditions for the calibration process itself. The initial configuration of the vision system is the following: the distances between the parallel sides of the calibration target described above are equal to 1084.5 mm; the distance from the camera to the left side is 646 mm, to the front side is 512 mm, and to the laser plane is 466 mm. The image resolution is 1920x1920 pixels. The calibration procedure was tested under different conditions as well as in different configurations, which are set out below (10 experiments were carried out for each of them):
Adding noise to the border pixels. Salt & pepper noise was added to the image (see Figure 4.). Generally, this type of noise can be caused by defects of the camera sensor, software failure, or hardware failure in image capturing or transmission [176]. The noise was added with the Matlab function "imnoise"; the noise density "d" was increased by 0.005 in each step, so that approximately d × numel(I) pixels are affected, where the function "numel" returns the number of array elements of the image I. The aim is to estimate the robustness of obtaining the orientation parameters of the camera with regard to the noise (a minimal sketch of this kind of noise injection is given after this list);
Adding noise to the laser pixels. Similar salt & pepper noise consisting of RGB pixels was added to the image. However, for the extraction of the laser beam pixels only the red pixels are significant; thus, the noise density "d" was increased to 0.01 in order to expand the number of red pixels included in the noise. The pitch and roll of the laser plane were calculated by Equation (4.5) and Equation (4.6), respectively, with the known constant parameters of the camera orientation inserted into these equations, since here we estimate only the influence of the noise on the calibration of the laser plane;
Changing the distance between the camera and the target in the horizontal direction. The target is moved forward and to the right by 50 mm in each step, and the parameters related to the extrinsic calibration are estimated. As mentioned before, the pitch and roll of the laser plane depend on the camera orientation parameters. Thus, in order not to include redundant information, the analysis of the data concerns the extrinsic parameters of the laser plane, which are based on the camera orientation parameters determined during the calibration process; the next two configurations follow a similar approach. It is also worth mentioning that the parameters of the camera orientation are presented when the proposed method is compared with the state-of-the-art calibration technique, where it can be analyzed how these parameters correlate between the methods;
Changing the distance between the camera and the target in the vertical direction. The camera is moved up by 50 mm in each step, and the parameters related to the extrinsic calibration are estimated;
Changing the size of the target. The distances between the sides of the box target are decreased by 50 mm in each step, and the parameters related to the extrinsic calibration are estimated.
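A minimal sketch of the salt & pepper corruption used in these robustness tests is given below; it mirrors the behaviour of Matlab's "imnoise(I, 'salt & pepper', d)" in NumPy, with the density value d chosen only as an example.

```python
import numpy as np

def salt_and_pepper(image, d=0.05, rng=None):
    """Corrupt approximately a fraction d of the pixels with salt & pepper noise."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    r = rng.random(image.shape[:2])
    noisy[r < d / 2] = 0                    # pepper: black pixels
    noisy[(r >= d / 2) & (r < d)] = 255     # salt: white pixels
    return noisy

img = np.full((1920, 1920), 128, np.uint8)  # gray test image
noisy = salt_and_pepper(img, d=0.05, rng=np.random.default_rng(0))
print(np.mean(noisy != 128))                # fraction of corrupted pixels ~ 0.05
```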
4.3.2 Comparison with the state-of-the-art calibration method
The calibration included in this Section was conducted for 40 different configurations of the vision system generated by the simulation environment, in order to verify how such changes might influence the final calibration results. The orientations of the camera and laser plane were set randomly between -10 and 10 degrees. The goal was to understand the difference between the values of the real extrinsic parameters, contained in $R_c$, $R_l$ and $\mathbf{t}^{l}$, and the ones obtained during the calibration procedure.
The proposed calibration method was compared with the state-of-the-art technique based on two perpendicular checkerboards (the standard method) described in [107]; a similar configuration is shown in Figure 3.6. The configurations of these two methods inside the simulation environment are depicted in Table 4-1.
Table 4-1: The Configurations of the Calibration Methods
Target:
  Proposed – Box target
  Standard – Checkerboard patterns
Configuration of the target:
  Proposed – Distances between the parallel sides of the target are 1084.5 mm
  Standard – Size of the checkerboard is 9x6; the square size is 97x97 mm
Location of the target (both methods):
  Distance from the camera to the left side/pattern is 646 mm and to the front side/pattern is 512 mm
Other configurations (both methods):
  Distance from the camera to the floor is 927 mm and to the laser plane is 466 mm; image resolution is 1920x1920 pixels
4.3.3 2D mapping
The primary objective of the vision system is to obtain correct distance information to the obstacles. In order to compare the mapping results between different methods, four obstacles were placed in the simulation environment at distances different from those of the calibration target. The configuration of the simulation environment is depicted in Table 4-2.
Table 4-2: The Configuration of the Simulation Environment
                                 Left   Bottom   Front   Right
Distance to the obstacles, mm     466      750     992    1584
The rationale behind placing the obstacles at different distances is that this arrangement reveals the performance of the calibration results in the environment as a whole.
4.3.4 Real data
Finally, in order to show that our approach has a practical implementation in real scenarios, conditions similar to those described in the simulation environment were created with real elements (see Chapter 3). The distance between the parallel sides of the calibration target was equal to 390 mm. Images with a resolution of 960x960 pixels were generated by the fisheye camera Ricoh Theta S. A laser emitter with a wavelength of 650 nm was chosen. The distances from the camera to the four obstacles were measured manually because of the lack of ground truth. The configuration of the real environment is depicted in Table 4-3.
Table 4-3: The Configuration of the Real Environment
                                 Left   Bottom   Front   Right
Distance to the obstacles, mm     300      405     490     605
4.4 Experiment Results
The calibration procedure of the vision system and results validation were tested on the
data obtained from the simulated and real environments. Firstly, the results of the intrinsic
camera calibration are shown. Secondly, the error analysis of the extrinsic parameters is
presented. In order to conduct the comparison analysis of the calibration results between
different methods, only error analysis of the extrinsic parameters is required. However, as
mentioned before, the main objective of the vision system is an accurate measurement of the
distance to the obstacles, located around the mobile robot. Thus, the calibration results can also
be compared by means of the mapping of the indoor environment. Aiming at a better understanding, we provide visual data for particular configurations of the vision system and, in order to show the measurement uncertainty, the standard deviation is calculated among all configurations for each of the measured distances to the obstacles. After that, we show the calibration results obtained from real data, where the calibration results are evaluated and the mapping of the indoor environment is presented.
4.4.1 Self-evaluation technique
The proposed method searches for black and white pixels and then extracts the border between them. The extracted border still contains noise, but only the noise that lies within the searching range of the black and white pixels. The noise pixels outside the extracted border can be easily eliminated from the image. Furthermore, the noise points belonging to the border line do not have an adverse impact on the final results, because the fitting curve is mostly determined by the majority of the border points, as shown for one of the sides of the target in Figure 4.6. Part 1 of Table 4-4 also shows that when the noise density is increased, the parameters of the camera orientation do not differ much from the initial state, which does not contain noise.
Figure 4.6: Noise Salt & Pepper. Left image is related with the camera calibration. Right image is related with the
laser plane calibration. Magenta color shows extracted image points. Red curve is fitted among the majority of the
points. Blue points are the curve points, which are used for calibration and are also superimposed on the bottom
image, for better visual understanding.
Table 4-4: The Evaluation of the Intrinsic Camera Calibration
Absolute Error per step (steps 1 to 10)

Part 1. Adding noise to the border pixels (camera)
Pitch, degrees:   0.052  0.053  0.051  0.043  0.044  0.065  0.046  0.053  0.052  0.077
Roll, degrees:    0.033  0.033  0.033  0.034  0.028  0.023  0.015  0.015  0.012  0.038
Yaw, degrees:     0.093  0.093  0.095  0.114  0.120  0.066  0.012  0.074  0.069  0.023

Part 2. Adding noise to the laser pixels (laser plane)
Pitch, degrees:   0.009  0.009  0.002  0.004  0.012  0.033  0.037  0.033  0.030  0.031
Roll, degrees:    0.002  0.002  0.010  0.049  0.028  0.032  0.047  0.060  0.034  0.058
Distance, mm:     2.419  2.406  2.412  2.322  2.238  2.061  2.173  2.293  2.069  2.368

Part 3. Changing distance between the camera and target in the horizontal direction (laser plane)
Pitch, degrees:   0.086  0.101  0.060  0.030  0.053  0.079  0.086  0.081  0.102  0.107
Roll, degrees:    0.050  0.078  0.046  0.023  0.069  0.100  0.101  0.105  0.113  0.145
Distance, mm:     2.243  2.189  2.363  2.495  2.201  2.963  2.870  2.745  2.541  2.625

Part 4. Changing distance between the camera and target in the vertical direction (laser plane)
Pitch, degrees:   0.086  0.063  0.074  0.070  0.059  0.038  0.036  0.039  0.042  0.022
Roll, degrees:    0.050  0.057  0.023  0.010  0.018  0.023  0.032  0.022  0.051  0.034
Distance, mm:     2.243  2.691  2.691  2.915  3.032  3.053  3.039  3.353  3.652  3.426

Part 5. Changing size of the target (laser plane)
Pitch, degrees:   0.086  0.052  0.063  0.004  0.002  0.026  0.056  0.060  0.76   0.075
Roll, degrees:    0.050  0.033  0.080  0.056  0.024  0.058  0.024  0.072  0.057  0.062
Distance, mm:     2.243  2.677  3.024  3.162  3.394  3.466  3.572  3.585  3.584  3.712
Similar to the border between the black and white regions, the laser beam also belongs to the whole side of the target. During this step, a different approach from the camera calibration is used. Namely, the border points for camera calibration were obtained with the separate noise pixels eliminated, whereas for the laser plane calibration the noise pixels outside the laser beam were not removed from the image (see Figure 4.6). However, even in that case the curve is still appropriately fitted among the extracted laser points. Part 2 of Table 4-4 also shows that when the noise density is increased, the parameters of the laser orientation do not differ much from the initial state, which does not contain noise.
Analyzing the experiment related to the changing distance between the camera and target in the horizontal direction, it is worth mentioning that small translations from the initial position provide similar results (see Table 4-4, Part 3). However, once the target was moved farther away, the quality of the calibration parameters slightly decreased. Thus, for the calibration procedure it is recommended to locate the target closer to the camera.
The last two parts of the experiment are related to the movement of the camera up from the target and to the reduction of the target size. Both of them provide quite similar calibration results (see Table 4-4, Part 4 and Part 5). The parameters of the camera orientation remain stable in the different configurations; these results also support the aforementioned statement that the target should be placed closer to the camera along the horizontal direction. However, it can also be seen that if the distance between the camera and the target along the vertical direction is increased or the size of the target is decreased, the error related to the distance between the camera and laser plane becomes bigger. Thus, the recommendation for calibration is to keep the camera closer to the target and to use a target of an appropriate size.
4.4.2 Comparison with the state-of-the-art calibration method
For the extrinsic parameter estimation, the mean absolute error (MAE) and the root mean squared error (RMSE) among all of the configurations of the vision system were calculated for the proposed method and for the standard method based on the checkerboard patterns. These values are depicted in Table 4-5. Moreover, Table 4-5 also includes the values corresponding to the maximum absolute error (AE), which was obtained during a specific experiment (the index of the experiment is shown in brackets). For the experiments corresponding to the maximum AE, mapping of the indoor environment is provided in the next Subsection. Mapping is essential for conducting the visual comparison analysis between the different calibration methods. In Table 4-5, it can be seen that better calibration results were obtained by the proposed method.
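For reference, the sketch below shows how the three statistics of Table 4-5 can be computed from a set of per-configuration absolute errors; the error values are hypothetical and only illustrate the definitions.

```python
import numpy as np

# Hypothetical absolute errors of one parameter over several configurations.
abs_err = np.array([0.03, 0.05, 0.02, 0.11, 0.04])

mae = abs_err.mean()                      # mean absolute error
rmse = np.sqrt((abs_err ** 2).mean())     # root mean squared error
worst = int(abs_err.argmax()) + 1         # 1-based index of the worst configuration
print(f"MAE {mae:.2f}, RMSE {rmse:.2f}, max AE {abs_err.max():.2f} (experiment {worst})")
```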
Table 4-5: The Evaluation of the Extrinsic Calibration
                              Camera Parameters                      Laser Parameters
                     Pitch, deg  Roll, deg   Yaw, deg     Pitch, deg  Roll, deg   Distance, mm
MAE         Standard    0.13        0.11        0.10          0.28        0.20         2.04
            Proposed    0.04        0.02        0.03          0.04        0.04         1.49
RMSE        Standard    0.18        0.18        0.15          0.43        0.31         2.47
            Proposed    0.04        0.03        0.04          0.05        0.06         1.70
Maximum AE  Standard  0.54 (21)   0.61 (10)   0.44 (10)     1.50 (13)   1.17 (10)    7.89 (10)
            Proposed  0.11 (11)   0.10 (13)   0.09 (10)     0.15 (39)   0.19 (13)    2.58 (7)
4.4.3 2D mapping
Table 4-5 listed the indices associated with each particular experiment. Figure 4.7 illustrates the corresponding maps for these configurations. For configurations 7 and 39 both methods provide quite similar results. For the remaining maps, the difference between the methods seems insignificant for obstacles located at small distances from the camera. On the other hand, the difference in the outcome of the two methods is quite significant for obstacles placed far away from the mobile robot. Accordingly, it can be noticed that the proposed method provides better results, outperforming the standard one.
Figure 4.7: Mapping of the indoor environment. Red color: proposed method; blue color: standard method. The index of each experiment is shown with the configuration of the vision system, where the values in brackets are [pitch; roll; yaw] in degrees:
No. 7: camera [0;5.5;0.5], laser plane [0;-1;0]
No. 10: camera [-10;5;7], laser plane [-9;-5;0]
No. 11: camera [9;0;7], laser plane [-1;7;0]
No. 13: camera [0.5;8;9], laser plane [5.5;1;0]
No. 21: camera [0;9;7], laser plane [-0.5;-7.5;0]
No. 39: camera [-2.5;1.5;5], laser plane [-4.5;3.5;0]
For a better visual analysis more mapping samples would be required, and it is difficult to include all of them in the Thesis. Therefore, the main comparison analysis was conducted by estimating the extrinsic parameters, and the measurement uncertainty is calculated below, whereas the mapping of some configurations was added additionally for a better visual understanding of the main goal of the vision system (obtaining the distances to the obstacles in an appropriate way). Table 4-6 shows the AE for the distances from the mobile robot to the obstacles placed in the indoor environment. The experimental distances were calculated as the mean value of all points belonging to the particular obstacle. This approach provides an approximate understanding of how these distances correlate between the different methods. For some configurations the standard method provided more accurate results; however, the proposed method proved its robustness among all the configurations.
Table 4-6: The Evaluation of the Mapping Results
                           Left           Bottom          Front           Right
Real Distance, mm         466.00          750.00          992.00         1584.00
AE, mm: Proposed (Standard)
Exp. 7                  1.19 (0.44)     1.74 (0.74)     3.76 (2.59)     7.00 (0.6)
Exp. 10                 2.73 (8.46)     2.90 (5.63)     4.07 (35.20)    27.10 (113.10)
Exp. 11                 1.81 (4.99)     2.82 (21.15)    13.00 (23.00)   5.30 (60.00)
Exp. 13                 1.67 (15.87)    0.09 (13.22)    0.52 (9.40)     15.30 (159.30)
Exp. 21                 1.21 (16.62)    3.10 (0.32)     7.41 (20.30)    1.50 (134.80)
Exp. 39                 0.98 (1.22)     1.33 (1.67)     8.40 (0.33)     23.10 (3.50)
This Section also evaluates the calibration results by estimating the distances from the
mobile robot to the obstacles placed in the indoor environment. When the configuration of the vision system is changed (orientation of the camera and laser plane), the distance from the camera to the obstacles does not change, because the simulation environment allows rotating the camera around its optical center. By taking this advantage of the simulation environment into consideration, it is possible to evaluate the calibration results by mapping in an appropriate way.
Thus, in order to estimate the quality of the measurements and compare the calibration results
between the different methods, the measurement uncertainty obtained by means of the standard
deviation is calculated in the following way:
\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}}    (4.8)
where x_i is the i-th measured value; \bar{x} is the mean measured value, which is taken among all the laser points belonging to the particular obstacle; and i = 1, ..., N indexes the measurements of the experiment.
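A minimal sketch of Equation (4.8), assuming a hypothetical set of distances measured at the laser points of a single obstacle:

```python
import numpy as np

# Hypothetical distances (mm) of the laser points belonging to one obstacle.
points_mm = np.array([466.8, 467.1, 468.0, 467.5, 467.0])

mean = points_mm.mean()                                                  # mean measured value
sigma = np.sqrt(((points_mm - mean) ** 2).sum() / (points_mm.size - 1))  # Eq. (4.8)
print(f"{mean:.2f} +/- {sigma:.2f} mm")
```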
Table 4-7 presents the complete measurement results for the proposed and the standard
methods, respectively. In this table, it can be seen that the standard deviation becomes bigger
when the distance to the obstacle increases. However, it can also be seen that better measurement results were obtained by the proposed calibration method.
Table 4-7: The Evaluation of the Mapping Results
           Real, mm    Proposed x̄ ± σ, mm    Standard x̄ ± σ, mm
Left        466.00       467.62 ± 0.98          467.84 ± 3.82
Bottom      750.00       751.47 ± 3.45          751.94 ± 6.43
Front       992.00       997.14 ± 2.62          999.67 ± 7.88
Right      1584.00      1589.14 ± 7.15         1597.68 ± 35.64
4.4.4 Real data
In the previous Section, the proposed calibration technique proved its robustness on the simulated data. In order to analyze the practical application of the proposed method in real scenarios, we recreated conditions similar to the simulation environment and carried out the calibration procedure as well as the mapping. It is worth mentioning that during the experiments carried out in real environments we realized that the process of creating the mask for the laser beam belonging to the target sides should be performed in a different way. We attached blue patterns to the corners; however, the illumination of the real environment was not sufficient for their detection. A solution was found by attaching black stripes to the corners, so that the red beam becomes invisible where it falls on these regions (see Figure 4.8). This figure also provides a useful insight: when the calibration of the vision system with structured light is carried out with checkerboard patterns, the laser information belonging to the black squares is lost. The proposed calibration target outperforms the standard one by taking advantage of the continuous laser beam in all directions.
Figure 4.8: Calibration target. Green border shows target in a cross-section.
Figure 4.9 shows the results of the extrinsic calibration. Owing to the lack of ground truth, we cannot compare the experimental results with the real ones in the same manner as was done in the simulation environment. However, we know the real size of the target and we have the final map (see Figure 4.9C). Thus, we can estimate the calibration results by means of this information, as depicted in Table 4-8. The results show that the system was calibrated quite accurately. These results can be improved by achieving more accurate intrinsic camera parameters and by manufacturing more professional targets. A simple box was used in this case, in order to show that the proposed method is accessible to everyone.
Figure 4.9: Extrinsic calibration. A: blue shows the initial projection [0;0;0], red shows the projection with the optimized camera orientation [-3.24;4.70;-3.56]. B: blue shows the initial projection [0;0;0], red shows the projection with the optimized laser orientation [2.81;2.61;0]. C: optimized laser distance 215.96 mm; the box size is equal to 390x390 mm.
Table 4-8: The Evaluation of the Extrinsic Calibration
              Real      Experiment    Absolute Error
Width, mm    390.00       391.57          1.57
Height, mm   390.00       389.88          0.12
The calibration results were also evaluated by means of the mapping of the indoor environment (see Figure 4.10). A more appropriate geometrical structure of the environment was achieved once the system was calibrated. The calibration process is not difficult (only one single image is required) and at the same time it significantly improves the final mapping results. This can indeed be seen in the mapping verification (see Table 4-9).
Figure 4.10: Mapping of the indoor environment. A: input image for mapping. B: blue shows the mapping of the non-calibrated system, red shows the mapping of the calibrated system.
Table 4-9: The Evaluation of the Mapping Results
              Front, mm   Right, mm   Left, mm   Bottom, mm
Real            300.00      405.00     490.00      605.00
Experiment      303.20      411.56     498.27      593.77
AE                3.20        6.56       8.27       11.23
Chapter 5 Extrinsic Calibration and 3D Reconstruction
5.1 Motivation
In the previous Section we presented a novel calibration technique for obtaining the extrinsic parameters between the camera and the laser plane. Our calibration method was based on the box target (see Figure 5.6) and proved its effectiveness and robustness in comparison with other state-of-the-art calibration methods. However, this approach has a limitation: it is not suited to the configuration of the vision system proposed for 3D reconstruction, where the camera is looking forward, so that part of the target required for the calibration procedure is lost (see Figure 5.6). In this Section we consider an improved calibration technique and estimate its robustness in comparison with our previous calibration method.
Figure 5.1: A shows the previous configuration. B shows the proposed configuration.
5.2 System Model
The system model was described in Chapter 3 and only a brief overview is presented in this Section. The configuration of the proposed vision system is different (the camera is rotated, see Figure 5.6). How this difference affects the equations is explained below. World coordinates of the laser plane (X, Y, Z) can be obtained by Equation (3.11). The laser plane is located at a fixed distance from the camera optical center: along the Z-axis in Figure 5.1A and along the X-axis in Figure 5.1B. For the proposed vision system this distance corresponds to the 1st row of the translation column vector. Consequently, the world coordinates along the X-axis do not change, and Equation (3.11) is transformed into Equation (3.19).
5.3 Calibration Procedure
The extrinsic calibration procedure consists of finding the camera rotation matrix and
transformation matrix of the laser plane. These parameters can be found by solving the following
optimization problem:
\{\hat{R}_{C},\,\hat{T}_{L}\} = \arg\min_{R_{C},\,T_{L}} E\left(R_{C}, T_{L}\right)    (5.1)
where R_C is the camera rotation matrix, T_L is the transformation matrix of the laser plane, and the cost E combines the residuals that are minimized step by step in Equations (5.2)-(5.7).
In order to solve this optimization problem for the proposed vision system, an improved calibration target was developed (see Figure 5.2). The main advantage of this target is its versatility, as it can be applied to various configurations of a vision system, and its flexibility, as it can simply be placed in front of the mobile robot. The proposed target allows an extrinsic calibration to be performed by capturing only a single snapshot; this procedure is explained below.
Figure 5.2: The proposed calibration target.
5.3.1 Extrinsic parameters of the camera
This Subsection explains the process of obtaining the parameters forming the camera rotation matrix described in Equation (5.1). In order to obtain the camera extrinsic parameters, first of all, the pixel coordinates belonging to the border
(between white and black regions) of the target are projected by Equation (3.19) to the world
coordinate system (see Figure 5.3A). After that for every parameter, namely for the pitch, roll, and
yaw, the minimization problem can be written as the series of Equations (5.2)-(5.4). The minimized pitch can be found by making two projected border vectors of the target collinear to each other. The formulation has the following form:
\hat{\theta}_{\mathrm{pitch}} = \arg\min_{\theta_{\mathrm{pitch}}} \left| k_{1}(\theta_{\mathrm{pitch}}) - k_{2}(\theta_{\mathrm{pitch}}) \right|    (5.2)
where k_1 and k_2 are the slopes of the two projected border vectors (collinear vectors have equal slopes).
Figure 5.3: A shows the initial projection. B shows the projection with the optimized camera pitch and roll.
The yaw is obtained by minimizing the product of the slopes of two projected border vectors. These vectors depend on the yaw, whereas the pitch obtained during the previous step is kept constant. The formulation is as follows:
\hat{\theta}_{\mathrm{yaw}} = \arg\min_{\theta_{\mathrm{yaw}}} \left| k_{1}(\theta_{\mathrm{yaw}}) \cdot k_{2}(\theta_{\mathrm{yaw}}) \right|    (5.3)
Once the pitch and yaw are known, it is possible to calculate the roll. The roll can be found by minimizing the slope of a projected border vector, whereas the pitch and yaw obtained during the previous steps are kept constant. This minimization problem can be written as follows:
\hat{\theta}_{\mathrm{roll}} = \arg\min_{\theta_{\mathrm{roll}}} \left| k(\theta_{\mathrm{roll}}) \right|    (5.4)
where k is the slope of the projected border vector.
At this point, pixel coordinates belonging to the border of the target can be projected by
Equation (3.19) to the world ones with the pitch, roll, and yaw, determined during the calibration
procedure (see Figure 5.3B). Once the camera is calibrated, we can move to the calibration of the
laser plane.
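The sketch below illustrates the kind of one-dimensional slope search behind Equations (5.2)-(5.4) on synthetic data: a target edge that should be horizontal after projection is observed under an unknown roll, and the roll is recovered as the angle that drives its slope to zero (cf. Equation (5.4)); the pitch and yaw steps follow the same pattern with their respective residuals. The data, helper names, and use of SciPy are illustrative assumptions, not the thesis implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rotate(points, angle_deg):
    """Rotate 2xN points counter-clockwise by angle_deg."""
    a = np.radians(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return R @ points

edge = np.vstack([np.linspace(0.0, 1.0, 20), np.zeros(20)])  # horizontal edge in the world
observed = rotate(edge, 4.2)                                 # unknown roll of 4.2 degrees

def slope_residual(roll_deg):
    p = rotate(observed, -roll_deg)            # undo the candidate roll
    return abs(np.polyfit(p[0], p[1], 1)[0])   # |slope|, cf. Eq. (5.4)

roll = minimize_scalar(slope_residual, bounds=(-10.0, 10.0), method="bounded").x
print(f"recovered roll: {roll:.2f} degrees")
```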
5.3.2 Extrinsic parameters of the laser plane
This Subsection outlines the process of obtaining the parameters forming the transformation matrix of the laser plane, which are part of Equation (5.1), whereas the camera parameters are known and constant. In order to obtain the extrinsic parameters of the laser plane, first of all, the pixel coordinates of the laser beam belonging to the sides of the target are projected by Equation (3.19) to the world ones (see Figure 5.4A). After that, for every parameter, namely for the pitch, roll, and distance between the camera and laser plane, the minimization problem can be written as the series of Equations (5.5)-(5.7). The minimized pitch can be found by making two projected laser vectors of the target collinear to each other. The formulation has the following form:
\hat{\varphi}_{\mathrm{pitch}} = \arg\min_{\varphi_{\mathrm{pitch}}} \left| k_{1}(\varphi_{\mathrm{pitch}}) - k_{2}(\varphi_{\mathrm{pitch}}) \right|    (5.5)
where k_1 and k_2 are the slopes of the two projected laser vectors.
Figure 5.4: A shows the initial projection. B shows the projection with the optimized laser pitch, roll and distance.
Another parameter of the laser plane orientation is the roll. The roll can be found by minimizing the slope of the projected laser vector. This minimization problem can be written as follows:
\hat{\varphi}_{\mathrm{roll}} = \arg\min_{\varphi_{\mathrm{roll}}} \left| k(\varphi_{\mathrm{roll}}) \right|    (5.6)
Once the orientation parameters of the laser plane are known, it is possible to calculate the distance between the camera and the laser plane, which is part of the translation matrix. The real distance D1 between the left and right sides of the target is known. The experimental distance D2 between the sides of the target can be calculated from the image coordinates of the laser beam projected to the world coordinates. Thus, by minimizing the difference between D1 and D2 in the objective function, it is possible to find the dependent variable, which represents the distance between the camera and the laser plane. This procedure takes the following form:
\hat{d} = \arg\min_{d} \left| D_{1} - D_{2}(d) \right|    (5.7)
where d is the distance between the camera and the laser plane and D_2(d) is the distance computed between the projected laser points on the opposite sides of the target.
Afterwards, pixel coordinates of the laser beam can be projected by Equation (3.19) to
the world ones with the pitch, roll, and distance between the camera and laser plane, determined
during the calibration procedure (see Figure 5.4B).
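A small sketch of the search in Equation (5.7), under the simplifying and purely illustrative assumption that the reconstructed width D2 grows linearly with the assumed camera-to-laser distance; the sensitivity value below is made up so that the recovered distance lands near the simulated 466 mm.

```python
from scipy.optimize import minimize_scalar

D1 = 1084.5            # known distance between the parallel sides of the target, mm
width_per_mm = 2.33    # assumed (toy) sensitivity of D2 to the camera-laser distance

def D2(distance_mm):
    # In the real pipeline D2 comes from projecting the laser pixels with Eq. (3.19);
    # here it is replaced by a linear toy model.
    return width_per_mm * distance_mm

result = minimize_scalar(lambda d: abs(D1 - D2(d)), bounds=(0.0, 2000.0), method="bounded")
print(f"estimated camera-to-laser distance: {result.x:.1f} mm")
```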
5.4 Evaluation
5.4.1 Experiment Setup
In order to evaluate the accuracy and robustness of the proposed calibration method, as
well as to demonstrate the quality of its performance, it was compared with two other calibration
techniques. The experiment setup is depicted in Table 5-1. Method 1 is based on the box target and the calibration technique considered in the previous Section. Method 2 is based on the perpendicular checkerboard patterns and was adopted from [107]. Firstly, the comparison of the extrinsic parameters was based on the
configuration of the vision system where all calibration targets were visible (see Figure 5.5).
Secondly, the performance of the calibration method was estimated for the configuration of the
vision system considered in this chapter (see Figure 5.6). For this configuration one side of the
box target (method 1) was not visible; thus, the proposed method was compared only with method 2. Experiments were carried out for 35 different configurations of the vision system,
where orientations of the camera and laser plane were randomly changed from -10 to 10 degrees.
Table 5-1: The Configurations of the Calibration Methods
Calibration Target
  Proposed: Target with 3 sides
  Method 1: Target with 4 sides
  Method 2: Checkerboard patterns
Configuration of the target
  Proposed / Method 1: Distances between parallel sides of the target are 1084.5 mm
  Method 2: Size of the checkerboard is 9x6. The square size is 97x97 mm
Location of the target (Configuration of the vision system #1)
  Proposed / Method 1 / Method 2: Distance from the camera to the left side is 646 mm and to the front side is 746 mm
Location of the target (Configuration of the vision system #2)
  Proposed / Method 2: Distance from the camera to the left side is 646 mm and to the front side is 1346 mm
Other configurations
  Proposed / Method 1 / Method 2: Distance from the camera to the floor is 927 mm and to the laser plane is 466 mm. Image resolution is 1920x1920 pixels
Figure 5.5: Configuration of the vision system #1. A: Method 1. B: Proposed. C: Method 2.
Figure 5.6: Configuration of the vision system #2. A: Proposed. B: Method 2.
5.4.2 Results
For the extrinsic parameter estimation, the mean absolute error (MAE) and the root mean squared error (RMSE) among all of the vision system configurations were calculated for the comparative analysis between the calibration methods. These values are depicted in Table 5-2. For configuration of the vision system #1, it can be seen that the best calibration results were obtained by method 1, followed by the proposed method, and lastly method 2. As for configuration of the vision system #2, method 1 is no longer applicable; thus, among the other two methods, better calibration results were obtained by the proposed method. It is also worth mentioning that the proposed method works much faster than the one based on the checkerboard patterns: the average run time among all configurations is 39.5 seconds for the proposed method and 95.8 seconds for method 2. Table 5-2 also includes the values corresponding to the maximum absolute error (AE), which was obtained during a specific experiment (the index of the experiment is shown in brackets). For the experiments corresponding to the maximum AE, a 3D reconstruction of the indoor environment is provided in the next Section, which was added for the visual analysis between calibration methods.
Table 5-2: The Evaluation of the Extrinsic Calibration
                              Camera Parameters                      Laser Parameters
                     Pitch, deg  Roll, deg   Yaw, deg     Pitch, deg  Roll, deg   Distance, mm
Configuration of the vision system #1
MAE         Proposed    0.05        0.11        0.06          0.16        0.09         2.81
            Method 1    0.05        0.03        0.04          0.05        0.05         1.89
            Method 2    0.10        0.13        0.08          0.27        0.06         3.32
RMSE        Proposed    0.05        0.13        0.07          0.18        0.11         2.97
            Method 1    0.05        0.06        0.04          0.08        0.09         2.01
            Method 2    0.14        0.15        0.10          0.35        0.25         3.89
Configuration of the vision system #2
MAE         Proposed    0.10        0.25        0.05          0.18        0.06         3.68
            Method 2    0.18        0.38        0.12          0.21        0.24         3.89
RMSE        Proposed    0.13        0.31        0.07          0.25        0.08         4.11
            Method 2    0.26        0.42        0.17          0.28        0.49         5.14
Maximum AE  Proposed  0.40 (6)    0.69 (19)   0.17 (3)      0.78 (19)   0.16 (16)    7.24 (10)
            Method 2  0.86 (10)   0.76 (9)    0.66 (12)     0.81 (34)   2.54 (34)   14.64 (12)
5.4.3 Discussion
In the previous Section it was shown that for configuration of the vision system #1, method 1 works better than method 2. The current work is aimed at evaluating the proposed method against other calibration techniques. It was assumed that by modifying the calibration target and the calibration technique of method 1 it would be possible to achieve similar calibration results. However, Table 5-2 shows that the calibration results of method 1 are better than those of both other methods. Thus, for configuration of the vision system #1 it is better to use method 1. The performance of the proposed method is not as good as that of method 1, but the proposed method is more universal and can be implemented for different vision system configurations. It is also worth mentioning that for both configurations of the vision system, the proposed method showed better calibration results than method 2, which is based on the checkerboard patterns.
5.5 3D Reconstruction Method
Once the vision system is calibrated, we can move on to the reconstruction of the 3D structure of the indoor environment. Our method consists of several primary steps, outlined in Figure 5.7. The input color image is first segmented into a set of objects of interest with semantic labels. Secondly, images of these objects are extracted and transformed so as to be imaged in a perspective projection. After that, the depth information is recovered based on the laser data. Finally, the 3D model is assembled. These steps are described below in more detail.
Figure 5.7: Overview: From a single fisheye snapshot, the proposed method combines semantic segmentation and
laser data to generate the 3D model of the indoor environment.
5.5.1 Semantic Segmentation
The proposed reconstruction technique is aimed at obtaining the 3D model of an indoor
environment by one single snapshot. By fusing data from the laser, fisheye image, and semantic
segmentation it is possible to recover layout of the indoor environment as well as calculate depth
information to the corresponding elements, such as floor, walls, ceiling, and doors. In doing so,
we also solve the problem of perception of doors that has been intensively researched for over a
decade.
The problem of door identification has been tackled by Anguelov et al. [177] who present
an interesting approach based on laser range scans and a panoramic camera. They identify doors
that have been observed in different opening angles by the laser range scanner. The identified
doors allow them to learn how to distinguish walls and doors by color, such that similar doors
can be identified in the camera data. While limited in determining the exact geometry of the
doors, their approach provides valuable annotations of the doors in a map.
Limketkai et al. [178] propose a system to identify doors in 2D occupancy grid maps by
learning common properties of the doors in the specific environment, such as the width and the
indent from the wall. While based on strong assumptions, the method has the advantage of not relying on observing doors in different states. Rusu et al. [179] propose a system for identifying
doors and extracting information about the geometry. They detect doors in 3D point clouds from
a tilting laser range scanner by searching for offset planes that follow the standards for
wheelchair accessible doors.
Several methods have been proposed to detect doors using visual features [180, 181] or a laser range scanner [182]. Other methods additionally obtain the exact location and dimensions of the door frame, e.g. using active vision with a stereo camera [183] or a tilting laser range scanner [179]. In order to eliminate the shortcomings of existing methods and provide a more reliable and robust solution, we decided to operate with structured light and deep learning.
One of the benefits of the proposed simulator is that it can provide automatic ground truth labeling for the main parts of the scene (see Figure 5.7). The problem with manual labeling is that the process itself is time-consuming, as images may contain a wide range of elements; this is especially so for omnidirectional images. This Section demonstrates the capacity of the automatic ground truth labeling by training a semantic segmentation network using deep learning.
5.5.2 Feature Extraction
There is a variety of features for better image understanding, which in general can be divided into hand-crafted features and learned features. Hand-crafted features are extracted manually using an algorithm defined by an expert. Learned features can be extracted with the use of Convolutional Neural Networks (CNNs) [184, 185]. CNN architectures are used in fields such as image recognition, image annotation, image retrieval, etc. [186]. For image classification, a CNN architecture consists of several convolutional layers followed by one or more fully connected layers [187]. Image feature extraction based on CNNs has demonstrated its effectiveness in a number of applications [188-190].
In this dissertation, the goal of the CNN is the detection of features of an indoor environment (floor, ceiling, walls, and doors) by labeling them with different colors (semantic segmentation). He et al. pointed out that the deeper the neural network, the more difficult it is to train [191]. This problem was solved by using the residual learning framework, namely ResNet. Experimental results showed a better performance in training and testing on the ILSVRC 2015 (ImageNet Large Scale Visual Recognition Challenge) validation set with a top-1 recognition accuracy of about 80% [191]. The operating principle of a Residual Network is that residual functions (instead of unreferenced functions) with reference to the layer inputs are learned by each layer of the network. These architectures are easier to optimize and it is possible to obtain improved accuracy by significantly increasing the depth [191]. Thus, these networks were considered in our work.
5.5.3 Experimental Setup
The dataset was generated by our simulator and contains 300 labeled images. 80% of the dataset was partitioned into training data and the remaining 20% was used as test data. The dataset is composed of 240-by-240-pixel images and was tested with two networks: ResNet18 and ResNet50. The networks were trained on a single CPU with a clock speed of 2.5 GHz and 10 GB of RAM; the GPU was an Intel HD Graphics 4000 with 1.5 GB of memory. We used 64-bit macOS as the operating system.
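The 80/20 split can be reproduced with a few lines. The sketch below is a generic Python illustration (the class list and random seed are assumptions), not the training script used in the thesis.

```python
import numpy as np

CLASSES = ["floor", "ceiling", "walls", "doors"]   # assumed label set
n_images = 300
rng = np.random.default_rng(42)

indices = rng.permutation(n_images)
split = int(0.8 * n_images)                        # 80% training, 20% testing
train_idx, test_idx = indices[:split], indices[split:]
print(f"{len(train_idx)} training images, {len(test_idx)} test images, "
      f"{len(CLASSES)} semantic classes")
```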
5.5.4 Results
Figure 5.8 shows that the behaviors of the two ResNets are similar to each other. From Table 5-3 it can be seen that the performance of ResNet18 is not inferior to that of ResNet50, while at the same time ResNet18 took less time to train and its network size is smaller. Figure 5.9 shows some of the output results of the trained networks. From the visual representation it can be seen that both networks were trained accurately in comparison with the ground truth.
Figure 5.8: Neural network performance evaluation. ResNet18 is shown in red color and ResNet50 is shown in blue
color.
Table 5-3: The Evaluation of the Trained Networks
Network     Validation accuracy    Training time      Network size
ResNet18         96.60 %           129 min 11 sec       103.4 MB
ResNet50         96.65 %           284 min 59 sec       236.6 MB
Figure 5.9: Training results (ResNet18, ResNet50, and ground truth).
5.5.5 Discussion
It was demonstrated that the labeled data generated by our simulator is suitable for training neural networks. The automatic labeling itself can significantly simplify the process of collecting data for testing theories and verifying experimental results. It was also found that, using deep learning, the semantic segmentation network can be well trained with a small number of network layers. Moreover, this approach is fast and does not increase the output network size.
5.6 Perspective Projection
5.6.1 Preparation
Once the semantic segmentation network is trained, the portions of interest can be extracted from the input fisheye image. First of all, masks are created for every element (floor, ceiling, walls, and doors) (see Figure 5.7). It is worth mentioning that the reconstruction method proposed in the Thesis recovers the 3D model of the indoor scene within the visible region of the laser beam, which belongs to the walls in the fisheye image (see Figure 5.7). Next, with the previously created masks and the working region of the laser beam, it is possible to extract the interesting portions of the scene (see Figure 5.10). Finally, when the elements of interest are extracted from the fisheye image, the perspective projections can be created.
Figure 5.10: Upper row shows extracted regions of the indoor environment (floor, ceiling, walls, and doors) with the
visible laser beam. Lower row shows corresponding perspective projection for extracted regions of the indoor
environment.
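The mask-creation step can be sketched as a simple per-class comparison on the label image; the class id and the synthetic label map below are assumptions for illustration only.

```python
import numpy as np

labels = np.zeros((240, 240), dtype=np.uint8)  # stand-in for a segmented fisheye frame
labels[150:, :] = 2                            # pretend class id 2 ("floor") fills the lower part

floor_mask = labels == 2                       # binary mask for one element of interest
print(f"floor pixels: {int(floor_mask.sum())} of {labels.size}")
```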
5.6.2 Perspective projection
An equirectangular projection represents everything visible from a particular point in space. As such, any other 3D-to-2D projection can be created from it, including a standard perspective projection. Since a perspective projection only captures a relatively small field of view, compared to the 360 by 180 degrees of the equirectangular projection, there are infinitely many possible perspective projections that might be calculated. Each of these possible perspective projections can be characterized by the pitch, roll and yaw rotation of the camera view frustum, as well as by the horizontal and vertical field of view.
A single perspective projection cannot capture everything represented within an
equirectangular projection, however multiple perspective projections can. An industry standard is
to create 6 perspective views with the camera view frustum corresponding to the 6 faces of a
cube centered at the camera position. Each perspective projection has 90 degrees field of view
both horizontally and vertically.
The algorithm employed here to create a perspective projection starts by considering a
virtual camera located at the origin with a view "forward" direction pointing down the positive y
axis and with the z axis being the "up" vector. In the conventions used here a right-hand
coordinate system is used so the camera "right" vector is along the positive x axis.
Figure 5.11: Initialization of the virtual camera.
To create a particular perspective view, one rotates this initial camera direction and orientation about any axis, or combination of axes. A roll in the chosen coordinate system is a rotation about the y axis (forward), panning is a rotation about the z axis (up) and tilting is a rotation about the x axis (right). To create the 6 faces of the cube map, the initial camera view direction is rotated as shown in Table 5-4, with the horizontal and vertical field of view set to 90 degrees.
Table 5-4: Rotations of the cube faces
Cube face    Rotation
front        -
left         Pan by 90 degrees
right        Pan by -90 degrees
back         Pan by 180 degrees
top          Tilt by -90 degrees
bottom       Tilt by 90 degrees
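The rotations of Table 5-4 can be stored as pan/tilt pairs; the sketch below shows one possible encoding in degrees.

```python
# Pan/tilt pairs (degrees) for the six cube faces of Table 5-4.
CUBE_FACES = {
    "front":  (0.0,    0.0),
    "left":   (90.0,   0.0),
    "right":  (-90.0,  0.0),
    "back":   (180.0,  0.0),
    "top":    (0.0,  -90.0),
    "bottom": (0.0,   90.0),
}
for face, (pan, tilt) in CUBE_FACES.items():
    print(f"{face:6s}: pan {pan:6.1f} deg, tilt {tilt:6.1f} deg")
```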
The process of creating the perspective projection image is performed in the reverse direction, that is, for every pixel (or subpixel for anti-aliasing) in the perspective image plane, one needs to find the best RGB estimate in the fisheye image.
The high-level process is as follows:
Initialize the virtual camera, located at the origin, looking down the y axis and with a
horizontal and vertical FOV of 90 degrees.
For every pixel (i, j) in the camera projection plane, derive the corresponding 3D vector P in world coordinates by Equation (5.8).
P(i, j) = \left(\frac{2i}{w} - 1,\;\; 1,\;\; \frac{2j}{h} - 1\right)    (5.8)
where w and h are the width and height of the perspective image in pixels.
Figure 5.12: Perspective image.
Rotate this vector P about the axes corresponding to roll, pitch and yaw to orientate the perspective camera as desired; call this vector P'.
Calculate the angles ø and θ as shown below.
\phi = \operatorname{atan2}\!\left(\sqrt{P_{x}'^{\,2} + P_{z}'^{\,2}},\; P_{y}'\right), \qquad \theta = \operatorname{atan2}\!\left(P_{z}',\; P_{x}'\right)    (5.9)
Figure 5.13: Transformation between world and camera coordinates.
Determine the image index (I, J) in normalized fisheye image coordinates, given the angles θ and ø and the linear relationship between ø and the radius r in a fisheye projection. This gives the RGB value to assign to pixel (i, j) in the perspective image.
I = \frac{2\,\phi\cos\theta}{FOV}, \qquad J = \frac{2\,\phi\sin\theta}{FOV}    (5.10)
where FOV is the field of view of the fisheye lens.
Results of the perspective projection are shown in Figure 5.10.
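The backward mapping of Equations (5.8)-(5.10) for a single perspective pixel can be sketched as follows; the image sizes, the fisheye FOV, and the fact that only panning is implemented in the rotation step are assumptions made to keep the example short.

```python
import numpy as np

W, H = 512, 512                    # perspective image size (assumed)
FOV_FISHEYE = np.radians(190.0)    # fisheye field of view (assumed)

def perspective_pixel_to_fisheye(i, j, pan_deg=0.0):
    # Eq. (5.8): pixel -> ray of the virtual camera (forward = +y, up = +z, right = +x).
    P = np.array([2.0 * i / W - 1.0, 1.0, 2.0 * j / H - 1.0])
    # Orient the virtual camera: here only a pan (rotation about the up axis).
    a = np.radians(pan_deg)
    Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
    P = Rz @ P
    # Eq. (5.9): angle from the forward axis and azimuth in the image plane.
    phi = np.arctan2(np.hypot(P[0], P[2]), P[1])
    theta = np.arctan2(P[2], P[0])
    # Eq. (5.10): linear (equidistant) fisheye model, normalised image coordinates.
    r = 2.0 * phi / FOV_FISHEYE
    return r * np.cos(theta), r * np.sin(theta)

print(perspective_pixel_to_fisheye(256, 256))   # centre pixel maps to the image centre
```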
Figure 5.14: Transformation to the image coordinates.
5.6.3 Proposed 3D Reconstruction Technique
The last step is the 3D reconstruction of the indoor environment. In order to reconstruct the depth of the scene, several steps are required. First of all, the image coordinates of the laser beam are extracted and the distances to the corresponding walls and doors are calculated by Equation (3.19). Once the location of the wall is known, it is possible to calculate the distances to the floor and ceiling. Now, the labeled image regions between the parts of interest can be extracted as follows:
Floor. Region between the magenta color and yellow color (see Figure 5.7);
Ceiling. Region between the magenta color and aqua color (see Figure 5.7).
In a similar fashion to the laser plane, the distances to the floor and ceiling can be found by triangulation. The distance from the mobile robot to the particular wall along the Y-axis (see Figure 5.2) is known and constant. Consequently, the world coordinates along the Y-axis do not change. Knowing this, the equation for calculating the world coordinates of the floor and ceiling can be written as:
[Equation (5.11): Equation (3.19) with the Y-coordinate fixed to the known wall distance, yielding the world coordinates of the floor and ceiling boundary points.]    (5.11)
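A hedged sketch of the triangulation idea behind Equation (5.11): once the wall distance along the Y-axis is known from the laser data, a viewing ray through a floor (or ceiling) boundary pixel is stretched until it reaches that wall, which fixes the remaining coordinates. The ray direction below is hypothetical.

```python
import numpy as np

Y_wall = 992.0                           # wall distance known from the laser data, mm
ray = np.array([0.10, 1.00, -0.55])      # hypothetical ray through a floor-boundary pixel

scale = Y_wall / ray[1]                  # stretch the ray until it meets the wall plane
X, Y, Z = scale * ray
print(f"floor point on the wall: X = {X:.1f} mm, Y = {Y:.1f} mm, Z = {Z:.1f} mm")
```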
With our reconstruction method it is also possible to reconstruct a corner. For this procedure, first of all, the endpoints of the laser beam have to be extracted. Secondly, the orientation of every wall is calculated and the wall in the corner is divided into two parts. Finally, the 3D model can be assembled from the segmented parts of the scene in combination with the corresponding distances.
Figure 5.15 shows individual reconstructed 3D models as well as the global map. The results show that the proposed reconstruction technique provides accurate and robust 3D models for different configurations of indoor scenes. The results also show that with a single input image it is possible to reconstruct not only the layout of the indoor environment, but the depth as well.
Table 5-2 listed the indices associated with each particular experiment with the maximum AE. Figure 5.15 illustrates the corresponding 3D models for some of these configurations. The goal is to compare the calibration methods by visual analysis. For configuration 6 both methods provide quite similar results. However, for the rest of the 3D models, the difference between the methods is quite obvious. From the visual analysis it can also be noticed that the proposed method provides better results, outperforming the standard one. For an improved visual analysis more samples of the 3D models would be required, and it is difficult to include all of them in the Thesis. Thus, the main comparison analysis between the calibration methods was conducted by estimating the extrinsic parameters, whereas a visual analysis of some configurations was added additionally.
Figure 5.15: Reconstruction results (single reconstruction and global map are presented).
5.6.4 Comparison with commonly used reconstruction methods
- Qualitative Comparison
As mentioned before, passive vision systems depend on the detected environmental features and work under ambient lighting conditions. Representative examples include stereo vision and structure from motion (SFM), where images are captured from multiple perspectives and correspondence has to be established between pixels in the different images. This correspondence determination process relies heavily on image features. As a result, when textures on the reconstructed surface are insufficient, such methods present low matching accuracy. In order to justify the statements mentioned above, a well-established Visual
Structure from Motion (VSfM) tool [192] was applied for generating sparse point clouds. Figure
5.16 presents a qualitative comparison of reconstructed models between our method and VSfM.
Figure 5.16: Left image shows the reconstruction results of a passive method. Right image shows the reconstruction results of the proposed method.
From the reconstruction results illustrated in Figure 5.16 it can be seen that our system outperforms reconstruction based on passive methods. The proposed reconstruction method is comparable to a rigid indoor reconstruction. In contrast, passive methods introduce more noise and artifacts into the final 3D model, as may be seen in Figure 5.16. Moreover, their reconstruction is mostly performed in textured areas, e.g. at edges and corners, since textureless areas such as walls, floor and ceiling do not contain enough distinguishable features for detection. In contrast, our method is able to recover the textureless surfaces in an accurate and smooth way.
- Quantitative Comparison
Because of the lack of ground truth in real cases, authors usually compare the surface reconstruction of their methods with the geometrical data acquired by a LIDAR system [193-196]. LIDAR systems serve as generators of high-precision ground truth 3D point clouds. In contrast to real cases, inside the simulation environment the ground truth is known, e.g. the distances from the mobile robot to the obstacles. Moreover, point clouds can also be generated inside the simulation environment (see Figure 5.17). This generation of point clouds can also be considered a benefit of the simulation environment, as they can be used for deep learning purposes or as ground truth data, for instance.
Figure 5.17: Images above show ground truth point clouds. Images below show reconstruction results on the
proposed method.
The quality of the reconstruction results of the proposed method depends on the quality of the extrinsic calibration. The comparison of the distance estimation was illustrated in Table 4-7, which demonstrates that a more reliable estimation of the distance to the structured light belonging to the obstacles can be obtained with the proposed calibration technique. Consequently, as the structured light is used as a reference tool for generating the 3D model of the indoor scene, the quality of the reconstruction depends on how accurately the distance to a particular obstacle is estimated.
Chapter 6 Conclusions
6.1 Proposed Simulator
In this Thesis, we presented a high-fidelity simulator that can be used across various applications of computer vision. We demonstrated the versatility of the simulator for evaluating vision problems by studying several applications within simulated environments, namely the extrinsic calibration of the vision system and 2D/3D mapping of the indoor environment. We also showed how the mobile robot equipped with the fisheye camera and structured light can be controlled inside the indoor environment. To the best of our knowledge, this simulator is the first based on an omnidirectional camera and laser illumination. The simulator goes beyond providing just synthetic data: it provides controlled environmental conditions that are not easily replicated in the real world. We strongly believe that this simulator can be of assistance to researchers, and enable those with the requisite hardware to perform experiments in a safe manner.
6.2 Omni-Vision System and its Extrinsic Calibration
This Thesis proposed an improved omnidirectional vision system in a flexible configuration and a technique for its calibration. The main contribution towards the vision system is that we reduced the number of elements by leaving only one laser emitter covering the whole robot's surroundings and, at the same time, we eliminated the ambiguous problem of how several laser emitters included in the vision system can be calibrated. As for the proposed calibration technique, the process itself is simple and with one single capture it is possible to achieve reliable calibration results. By developing a rectangular target, the laser beam becomes more visible in comparison with its projection onto the checkerboard patterns, and our method also includes more useful information by covering a 360-degree field of view as a result of the projection onto the four sides. Experiments were performed and analyzed in order to verify the accuracy and robustness of the proposed vision system and of the calibration method itself. The results showed that our calibration method outperforms the most common one based on the checkerboard patterns and that our vision system can be used in real scenarios.
The contributions mentioned above were achieved by means of the simulation environment. We spent a lot of time on its development; however, it has been helpful in testing theories before their practical application. For the current work, the main benefit of using the simulation environment was related to the comparison analysis of the extrinsic parameters among the different calibration methods as well as the comparison of the mapping results.
6.3 3D Reconstruction of the Indoor Environment
Finally, based on our understanding of the accuracy and precision of a structured-light based vision system, we also developed a pipeline for building geometrically consistent 3D reconstructions of the indoor environment. We proposed a fusion of the information from the structured-light omni-vision system and the semantic segmentation neural network to build high quality 3D reconstructions. While these methods separately have their own advantages and limitations, our fusion approach takes the advantages of both methods and also overcomes the limitations of the individual methods. Our approach does not suffer any significant coarse-level deformation, while traditional photometric stereo is very much prone to this problem. Moreover, in order to perform the proposed technique, only one input image is required for recovering the layout of the room as well as the depth data. Experiments were performed and analyzed in order to verify the accuracy and robustness of the proposed methods. The results showed that our reconstruction method is able to recover not only the layout of the indoor scene but also the depth information. The experimental results also demonstrated that passive vision systems often fail on challenging scene parts, such as textureless surfaces, where they can produce depth estimation errors, whereas the proposed approach, which is based on an active vision system, provides stable layout recovery with depth estimation.
6.4 Summary of the Dissertation and further work
In this Dissertation, we have investigated some of the important problems in the domain
of extrinsic calibration and 3D reconstruction in geometric computer vision, by developing
robust and efficient techniques in these fields. We have analyzed the accuracy of the proposed omni-vision system with laser illumination by means of the developed simulator as well as with real data. We have also developed a novel method for 3D reconstruction by fusing the information from structured-light stereo and semantic segmentation.
In the future, we aim to further investigate the transfer of capabilities learned in simulated worlds to the real world. One of the key advantages of employing virtual environments is their ability to represent a diverse and dynamic range of real-world conditions. In order to add more world dynamics and diversity, it is planned to extend the capability of the current version of the simulator by adding pedestrians and by creating manual as well as automated environment generation systems. Thus, users will be able to interact with standardized blocks representing
elements such as walls, floor, ceiling, furniture or obstacles. This approach will easily allow
generation of a wide variety of training and testing environments. The differences in appearance
between the simulated and real-world scenarios will need to be smoothed through deep transfer
learning techniques. Moreover, recent work by F. Sadeghi et al. showed that transfer from virtual
environments to the real-world is possible even without a strong degree of photorealism [197].
References
[1] R. Al-Harasis, E. Al-Zmaily, H. Al-Bishawi, J. Abu Shash, M. Shreim, and B. Sababha, “Design and
Implementation of an Autonomous UGV for the Twenty Second Intelligent Ground Vehicle
Competition,” The International Conference on Software Engineering, Mobile Computing and Media
Informatics, 2015.
[2] A. Bechar, and C. Vigneault, “Agricultural robots for field operations. Part 2: Operations and systems,”
Biosyst Eng, pp. 110-128, 2017.
[3] N. Bellas, S. Chai, M. Dwyer, and D. Linzmeier, “Real-time fisheye lens distortion correction using
automatically generated streaming accelerators,” In: Field Programmable Custom Computing
Machines, 2009.
[4] O. Rawashdeh, H. Yang, R. AbouSleiman, and B. Sababha, “Microraptor: A low-cost autonomous
quadrotor system,” In: ASME 2009 International Design Engineering Technical Conferences and
Computers and Information in Engineering Conference, 2009.
[5] B. Sababha, H. Al Zu'bi, and O. Rawashdeh, “A rotor-tilt-free tricopter UAV: design, modelling, and
stability control,” International Journal of Mechatronics and Automation, pp. 107-113, 2015.
[6] H. Sawalmeh, H. Bjanthala, M. Al-Lahham, and B. Sababha, “A Surveillance 3D Hand-Tracking-
Based Tele-Operated UGV,” In: The 6th International Conference on Information and
Communication Systems, 2015.
[7] V. Semwal, A. Bhushan, and G. Nandi, “Study of humanoid Push recovery based on experiments,” In:
Control, Automation, Robotics and Embedded Systems (CARE), 2013.
[8] V. Semwal, P. Chakraborty, and G. Nandi, “Less computationally intensive fuzzy logic (type-1)-based
controller for humanoid push recovery,” Robot Auton Syst, pp. 122-135, 2015.
[9] V. Semwal, N. Gaud, and G. Nandi, “Human Gait State Prediction Using Cellular Automata and
Classification Using ELM,” In: MISP-2017.
[10] V. Semwal, S. Katiyar, R. Chakraborty, and G. Nandi, “Biologically-inspired push recovery capable
bipedal locomotion modeling through hybrid automata,” Robot Auton Syst, pp. 181-190, 2015.
[11] V. Semwal, K. Mondal, and G. Nandi, “Robust and accurate feature selection for humanoid push
recovery and classification: deep learning approach,” Neural Comput & Applic, pp. 565-574, 2017.
[12] V. Semwal, and G. Nand, “Generation of joint trajectories using hybrid automate-based model: a
rocking block-based approach,” IEEE Sensors J, pp. 5805-5816, 2016.
[13] V. Semwal, M. Raj, and G. Nandi, “Biometric gait identification based on a multilayer perceptron,”
Robot Auton Syst, pp. 65-75, 2016.
[14] T. Liu, C. Ju, Y. Huang, T. Chang, K. Yang, and Y. Lin, “A 360-degree 4Kx2K Panorama Video
Processing Over Smart-phones,” IEEE International Conference on Consumer Electronics (ICCE),
2017.
[15] K. Ma, F. Lu, and X. Chen, "Robust Planar Surface Extraction from Noisy and Semi-Dense 3D Point
Cloud for Augmented Reality," 2016 International Conference on Virtual Reality and Visualization
(ICVRV), pp. 453-458, Sep. 2016, doi: 10.1109/ICVRV.2016.83.
[16] A. Mossel, and M. Kroeter, "Streaming and Exploration of Dynamically Changing Dense 3D
Reconstructions in Immersive Virtual Reality," 2016 IEEE International Symposium on Mixed and
Augmented Reality (ISMAR-Adjunct), pp. 43-48, Sep. 2016, doi: 10.1109/ISMARAdjunct.
2016.0035.
[17] C. Fernandez-Labrador, A. Perez-Yus, G. Lopez-Nicolas, and J. Guerrero, "Layouts From Panoramic
Images With Geometry and Deep Learning," in IEEE Robotics and Automation Letters, vol. 3, no. 4,
pp. 3153-3160, Oct. 2018, doi: 10.1109/LRA.2018.2850532.
[18] C. Fernandez-Labrador, J. Facil, A. Perez-Yus, C. Demonceaux, J. Civera, and J. Guerrero, "Corners
for Layout: End-to-End Layout Recovery From 360 Images," in IEEE Robotics and Automation
Letters, vol. 5, no. 2, pp. 1255-1262, Apr. 2020, doi: 10.1109/LRA.2020.2967274.
[19] S. Shah, and J. Aggarwal, “Mobile robot navigation and scene modeling using stereo fish-eye lens
system,” in Machine Vision and Applications, vol. 10, pp. 159-173, Oct. 1996, doi:
10.1007/s001380050069.
[20] M. Nakagawa, T. Yamamoto, S. Tanaka, M. Shiozaki, and T. Ohhashi. “Topological 3D Modeling
Using Indoor Mobile Lidar Data,” in ISPRS - International Archives of the Photogrammetry, Remote
Sensing and Spatial Information Sciences, pp. 13-18, May 2015 doi: 10.5194/isprsarchives-XL-4-W5-
13-2015.
[21] X. Lian, Z. Liu, X. Wang, and L. Dou, “Reconstructing Indoor Environmental 3D Model Using Laser Range Scanners and Omnidirectional Camera,” In Proc. of the 7th World Congress on Intelligent Control and Automation, vol. 1, no. 23, pp. 1640–1644, Jun. 2008.
[22] E. Dandil, and K. K. Çevik, “Computer Vision Based Distance Measurement System using Stereo Camera View,” In Proc. of the 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), pp. 1-4, 2019.
[23] D. V. Alekseevich, and M. D. Alexandrovich, “Construction of a depth map using stereo vision based on a developed stereo camera for the anthropomorphic robot AR600E,” In Proc. of the 2nd School on Dynamics of Complex Networks and their Application in Intellectual Robotics (DCNAIR), pp. 27-30, 2018.
[24] P. N. Koundinya, Y. Ikeda, S. N.T., P. Rajalakshmi, and T. Fukao, “Comparative Analysis of Depth Detection Algorithms using Stereo Vision,” In Proc. of the 6th World Forum on Internet of Things (WF-IoT), pp. 1-5, 2020.
[25] M. Song, H. Watanabe, and J. Hara, “Robust 3D reconstruction with omni-directional camera based on structure from motion,” 2018 International Workshop on Advanced Image Technology (IWAIT), pp. 1-4, 2018.
[26] Q. Zhang, P. An, S. Wang, X. Bai, and W. Zhang, “Image-based Space Object Reconstruction and Relative Motion Estimation using Incremental Structure from Motion,” In Proc. of the IEEE CSAA Guidance, Navigation and Control Conference (CGNCC), pp. 1-6, 2018.
[27] A. D. Sergeeva, and V. A. Sablina, “Using structure from motion for monument 3D reconstruction from images with heterogeneous background,” In Proc. of the 7th Mediterranean Conference on Embedded Computing (MECO), pp. 1-4, 2018.
[28] M. Daum, and G. Dudek, “On 3-D surface reconstruction using shape from shadows,” In Proc. of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 461-468, 1998.
[29] R. Gouiaa, and J. Meunier, “3D Reconstruction by Fusioning Shadow and Silhouette Information,” In Proc. of the Canadian Conference on Computer and Robot Vision, pp. 378-384, 2014.
[30] C. Wohler, “3D surface reconstruction by self-consistent fusion of shading and shadow features,” In Proc. of the 17th International Conference on Pattern Recognition, pp. 204-207, 2004.
[31] R. Taira, S. Saga, T. Okatani, and K. Deguchi, “3D reconstruction of reflective surface on reflection type tactile sensor using constraints of geometrical optics,” In Proc. of the SICE Annual Conference, pp. 3144-3149, 2010.
[32] X. Zongwu, Z. Zhaojie, and L. Yechao, “3-D Object Reconstruction Using Tactile Sensing for Multi-fingered Hand,” 2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), pp. 209-212, 2018.
[33] F. Liu, and T. Hasegawa, “Reconstruction of curved surfaces using active tactile sensing and surface normal information,” In Proc. of the IEEE International Conference on Robotics and Automation, vol. 4, pp. 4029-4034, 2001.
[34] E. Le Francois, J. Herrnsdorf, J. J. D. McKendry, L. Broadbent, M. D. Dawson, and M. J. Strain, “Combined Time of Flight and Photometric Stereo Imaging for Surface Reconstruction,” In Proc. of the IEEE Photonics Conference (IPC), pp. 1-2, 2020.
[35] M. Xie, and J. R. Cooperstock, “Time-of-Flight Camera Calibration for Improved 3D Reconstruction of Indoor Scenes,” 2014 Seventh International Symposium on Computational Intelligence and Design, pp. 478-481, 2014.
[36] J. M. Gutierrez-Villalobos, T. Dimas, and J. C. Mora-Vazquez, “Simple and low cost scanner 3D system based on a Time-of-Flight ranging sensor,” In Proc. of the XIII International Engineering Congress (CONIIN), pp. 1-5, 2017.
[37] B. Zhang, H. Zhang, J. Zhu, H. Xu, and Y. Zhao, “Research On Self-Mixing Interference Displacement Reconstruction Method Based On Ensemble Empirical Mode Decomposition,” In Proc. of the IEEE International Conference on Mechatronics and Automation (ICMA), pp. 723-727, 2019.
[38] J. Zalev and M. C. Kolios, “Image Reconstruction Combined With Interference Removal Using a Mixed-Domain Proximal Operator,” IEEE Signal Processing Letters, vol. 25, no. 12, pp. 1840-1844, Dec. 2018.
[39] Y. Awatsuji, Y. Wang, P. Xia, and O. Matoba, “3D image reconstruction of transparent gas flow by parallel phase-shifting digital holography,” In Proc. of the 15th Workshop on Information Optics (WIO), pp. 1-2, 2016.
[40] Z. Hu, Q. Guan, S. Liu, and S. Y. Chen, “Robust 3D Shape Reconstruction from a Single Image Based on Color Structured Light,” In Proc. of the International Conference on Artificial Intelligence and Computational Intelligence, pp. 168-172, 2009.
[41] H. Lin, and Z. Song, “3D reconstruction of specular surface via a novel structured light approach,” In Proc. of the IEEE International Conference on Information and Automation, pp. 530-534, 2015.
[42] J. Deng, B. Chen, X. Cao, B. Yao, Z. Zhao, and J. Yu, “3D Reconstruction of Rotating Objects Based on Line Structured-Light Scanning,” In Proc. of the International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), pp. 244-247, 2018.
[43] C. Albitar, P. Graebling, and C. Doignon, “Robust Structured Light Coding for 3D Reconstruction,” In Proc. of IEEE International Conference on Computer Vision, 2007.
[44] E.V. German, and E.R. Muratov, “Assessment of combining heterogeneous images,” World & Science: Materials of the international research and practice conference, Brno, Czech Rep., May 2014.
[45] H. Chao, Y. Gu, and M. Napolitano, “A survey of optical flow techniques for robotics navigation
applications,” Journal of Intelligent & Robotic Systems, vol. 73, no. 1, pp. 361-372, 2014.
[46] D. Nistér, F. Kahl, and H. Stewénius, “Structure from motion with missing data is np-hard,” In Proc. of ICCV’07, pp. 1–7, 2007.
[47] O. Chum, T. Werner, and J. Matas, “Two-view geometry estimation unaffected by a dominant plane,” In Proc. of CVPR’05, pp. 772–780, 2005.
[48] J. Salvi, J. Pages, and J. Battle, “Pattern codification strategies in structured light systems,” Pattern Recognition, vol. 37, pp. 827-849, 2004.
[49] M. Adachi, S. Shatari, and R. Miyamoto, "Visual Navigation Using a Webcam Based on Semantic
Segmentation for Indoor Robots," 2019 15th International Conference on Signal-Image Technology &
Internet-Based Systems (SITIS), pp. 15-21, Nov. 2019 doi: 10.1109/SITIS.2019.00015.
[50] X. Lian, Z. Liu, X. Wang, and L. Dou, “Reconstructing Indoor Environmental 3D Model Using Laser Range Scanners and Omnidirectional Camera,” In Proc. of the 7th World Congress on Intelligent Control and Automation, vol. 1, no. 23, pp. 1640–1644, Jun. 2008.
[51] Y. Li, and Q. Wang, “Catadioptric Omni-direction Vision System based on Laser Illumination,” In Proc. of the 2013 IEEE International Symposium on Industrial Electronics (ISIE), May. 2013.
[52] J. Shin, and S. Yi, “Development of Omnidirectional Ranging System Based on Structured Light
Image,” Journal of Institute of Control, Robotics and Systems, vol. 18, no. 5, pp. 479-486, May. 2012.
[53] J. Shin, S. Yi, Y. Hong, and J. Suh, “Omnidirectional Distance Measurement based on Active
Structured Light Image,” Journal of Institute of Control, Robotics and Systems, vol. 16, no. 8, pp.
751-755, Aug. 2010.
[54] S. Yi, B. Choi, and N. Ahuja, “Real-time omni-directional distance measurement with active
panoramic vision,” International Journal of Control, Automation and Systems, vol. 5, no. 2, pp. 184-
191, Apr. 2007.
[55] A. Lenskiy, H. Junho, K. Dongyun, and P. Junsu, “Educational platform for learning programming via
controlling mobile robots,” in Proc. International Conference on Data and Software Engineering, 2014,
pp. 1-4.
[56] M. Sadiku, P. Adebo, and S. Musa, “Online Teaching and Learning,” International Journals of
Advanced Research in Computer Science and Software Engineering, vol. 8, no.2, pp. 73-75, Feb.
2018.
[57] A. S. Alves Gomes, J. F. Da Silva and L. R. De Lima Teixeira, “Educational Robotics in Times of
Pandemic: Challenges and Possibilities,” 2020 Latin American Robotics Symposium, 2020 Brazilian
Symposium on Robotics and 2020 Workshop on Robotics in Education, Natal, Brazil, 2020, pp. 1-5.
[58] L. Ma, H. Bai, Q. Dai, and H. Wang, “Practice and Thinking of Online Teaching During Epidemic Period,” in Proc. 15th International Conference on Computer Science & Education, 2020, pp. 568-571.
[59] J. Hu, and B. Zhang, “Application of SalesForce Platform in Online Teaching in Colleges and
Universities under Epidemic Situation,” in Proc. International Conference on Big Data, Artificial
Intelligence and Internet of Things Engineering, 2020, pp. 276-279.
[60] L. Kexin, Q. Yi, S. Xiaoou, and L. Yan, “Future Education Trend Learned From the Covid-19 Pandemic: Take Artificial Intelligence Online Course As an Example,” in Proc. International Conference on Artificial Intelligence and Education, 2020, pp. 108-111.
[61] I. Kholodilin, Y. Li, and Q. Wang, “Omnidirectional Vision System With Laser Illumination in a
Flexible Configuration and Its Calibration by One Single Snapshot,” IEEE Transactions on
Instrumentation and Measurement, vol. 69, no. 11, pp. 9105-9118, Nov. 2020.
[62] Open Dynamics Engine Website [Online]. Available: http://www.ode.org/
[63] VRML [Online]. Available: http://en.wikipedia.org/wiki/VRML
[64] O. Michel, “Cyberbotics Ltd. Webots™: Professional Mobile Robot Simulation,” International Journal of Advanced Robotic Systems, vol. 1, no. 1, pp. 39-42, 2004.
[65] Webots™ 6 Fast Prototyping & Simulation of Mobile Robots, Cyberbotics Ltd, 2009.
[66] Webots [Online]. Available: http://www.cyberbotics.com/products/webots/
[67] Webots [Online]. Available: http://en.wikipedia.org/wiki/Webots
[68] Simulink - Simulation and Model based Design [Online]. Available:
http://www.mathworks.com/products/simulink/
[69] SimRobot - Robotics Simulator [Online]. Available: http://www.informatik.uni-
bremen.de/simrobot/index_e.htm
[70] MATLAB - The Language of Technical Computing [Online]. Available:
http://www.mathworks.com/products/matlab/
[71] MATLAB Product Help
[72] T. Petrinić, E. Ivanjko, and I. Petrović, “AMORsim − A Mobile Robot Simulator for Matlab” in Proc.
of 15th International Workshop on Robotics in Alpe-Adria-Danube Region (RAAD), Balatonfüred,
Hungary, June 2006.
[73] P. I. Corke, “A computer tool for simulation and analysis: The Robotics Toolbox for MATLAB”
Robotics & Automation Magazine, IEEE, vol. 3, pp. 24-32, March 1996.
[74] H. Aezman, “Mobile Robot Simulation and Controller Design with MATLAB/SIMULINK,” B.Eng. Thesis, Kolej Universiti Teknikal Kebangsaan, Malaysia, Mar. 2005.
[75] Microsoft Robotics Developer Studio [Online]. Available:
http://en.wikipedia.org/wiki/Microsoft_Robotics_Developer_Studio
[76] J. Fernando: “Microsoft Robotics Studio: A Technical Introduction”
[77] J. Cogswell, “Microsoft Robotics Studio 2008 Makes Controlling Robots Easier” [Online]. Available: http://www.eweek.com/c/a/Application-Development/Microsoft-Robotics-Studio-2008-Makes-Controlling-Robots-Easier/
[78] B. Balaguer, S. Balakirsky, S. Carpin, M. Lewis, and C. Scrapper, “USARSim: A Validated Simulator for Research in Robotics and Automation,” Workshop on Robot Simulators: Available Software, Scientific Applications, and Future Trends at IEEE/RSJ, 2008.
[79] J. Wang, “USARSim: A Game-based Simulation of the NIST Reference Arenas,” University of Pittsburgh and School of Computer Science, Carnegie Mellon Publication.
[80] S. Carpin, M. Lewis, J. Wang, S. Balakirsky, and C. Scrapper, “USARSim: a robot simulator for
research and education” in Proc. of the IEEE International Conference on Robotics and Automation,
pp. 1400-1405, 2007.
[81] Second Life [Online]. Available: http://en.wikipedia.org/wiki/Second_Life
[82] T. Censullo, “Tutorial: Architecture of Open Simulator”.
[83] OpenSimulator [Online]. Available: http://en.wikipedia.org/wiki/OpenSimulator
[84] B. Browning, and E. Tryzelaar, “ÜberSim: A Multi-Robot Simulator for Robot Soccer,” in Proc. of Autonomous Agents and Multi-Agent Systems, AAMAS'03, Australia, pp. 948-949, Jul. 2003.
[85] J. Go, B. Browning, and M. Veloso, “Accurate and Flexible Simulation for Dynamic, Vision-Centric Robots,” in Proc. of the 3rd International Joint Conference on Autonomous Agents and Multi Agent Systems, 2004.
[86] Simbad Project Home [Online]. Available: http://simbad.sourceforge.net/index.php
[87] P. Reiners: Robots, mazes, and subsumption architecture [Online]. Available:
http://www.ibm.com/developerworks/java/library/j-robots/
[88] L. Hugues, and N. Bredeche, “Simbad: an Autonomous Robot Simulation Package for Education and
Research” in Proc. of the 9th International Conference on Simulation of Adaptive Behavior, SAB
2006, Rome, Italy, Sept. 2006.
[89] J. Klein, “BREVE: a 3D Environment for the Simulation of Decentralized Systems and Artificial Life,” in Proc. of the Eighth International Conference on Artificial Life.
[90] Breve software [Online]. Available: http://en.wikipedia.org/wiki/Breve(software)
[91] The breve Simulation Environment [Online]. Available: http://www.spiderland.org/
[92] Sapounidis, T., and Demetriadis, S. Educational robots driven by tangible programming languages: A review on the field. Adv. Intell. Syst. Comput. 2017, 560: 205–214.
[93] Koenig, N., and Howard, A. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In: International Conference on Intelligent Robots and Systems, ser. IROS ’04, 2004, 3: 2149–2154.
[94] Wang, J., Lewis, M., and Gennari, K. USAR: A game-based simulation for teleoperation. In: IEEE International Conference on Systems, Man and Cybernetics, 2003, 493–497.
[95] Schmits, T., and Visser, A. An Omnidirectional Camera Simulation for the USARSim World. Lecture Notes in Artificial Intelligence, 2009, 5339: 296–307.
[96] Beck, D., Ferrein, A., and Lakemeyer, G. A simulation environment for middle-size robots with multi-level abstraction. Lecture Notes in Artificial Intelligence, 2008, 5001: 136–147.
[97] NVIDIA Isaac Sim. [Online]. Available: https://developer.nvidia.com/isaac-sim. Accessed:
01.10.2021.
[98] C. Won, J. Ryu, and J. Lim, “SweepNet: Wide-baseline Omnidirectional Depth Estimation,” 2019 International Conference on Robotics and Automation (ICRA), pp. 6073-6079, 2019.
[99] Z. Zhang, H. Rebecq, C. Forster, and D. Scaramuzza, “Benefit of large field-of-view cameras for
visual odometry,” 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 801-
808, 2016.
[100] B. Paul, “Creating fisheye image sequences with Unity3D,” 2015. [Online]. Available: https://www.researchgate.net/publication/279963195_Creating_fisheye_image_sequences_with_Unity3D. Accessed: 29.06.2020.
[101] K. Whitehouse, and D. Culler, “Macrocalibration in sensor/actuator networks,” Mobile Networks and Applications, vol. 8, no. 4, pp. 463-472, 2003.
[102] G. Zhang, and Z. Wei, “A novel calibration approach to structured light 3D vision inspection,” Optics and Laser Technology, vol. 34, no. 5, pp. 373–380, Jul. 2002.
[103] Z. Xie, W. Zhu, Z. Zhang, and M. Jin, “A novel approach for the field calibration of line structured-light sensors,” Measurement, vol. 43, no. 2, pp. 190–196, Feb. 2010.
[104] Z. Liu, X. Li, F. Li, and G. Zhang, “Calibration method for line-structured light vision sensor based on a single ball target,” Optics and Lasers in Engineering, vol. 69, pp. 20–28, Jun. 2015.
[105] Z. Wei, M. Shao, G. Zhang, and Y. Wang, “Parallel-based calibration method for line-structured light
vision sensor,” Optical Engineering, vol. 53, no. 3, Mar. 2014.
[106] Q. Zhang, and R. Pless, “Extrinsic calibration of a camera and laser range finder (improves camera
calibration),” In Proc. of the 2004 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS), vol. 3, Jan. 2004.
[107] J. Xu, B. Gao, C. Liu, P. Wang, and S. Gao, “An omnidirectional 3D sensor with line laser scanning,” Optics and Lasers in Engineering, vol. 84, pp. 96–104, Sep. 2016.
[108] C. Mei and P. Rives. Calibration between a central catadioptric camera and a laser range finder for robotic applications. In Proceedings 2006 IEEE International Conference on Robotics and Automation (ICRA 2006), pages 532–537, Orlando, FL, USA, May 2006. IEEE. DOI: 10.1109/ROBOT.2006.1641765.
[109] F. Vasconcelos, J. P. Barreto, and U. Nunes. A minimal solution for the extrinsic calibration of a camera and a laser-rangefinder. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2097–2107, Nov 2012. DOI: 10.1109/TPAMI.2012.18.
[110] X. Gong, Y. Lin, and J. Liu. 3D LIDAR-camera extrinsic calibration using an arbitrary trihedron. Sensors, 13(2):1902–1918, Feb 2013. DOI: 10.3390/s130201902.
[111] R. Gomez-Ojeda, J. Briales, E. Fernandez-Moral, and J. Gonzalez-Jimenez. Extrinsic calibration of a 2D laser-rangefinder and a camera based on scene corners. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 3611–3616, Seattle, WA, USA, May 2015. IEEE. DOI: 10.1109/ICRA.2015.7139700.
[112] Y. Bok, D.-G. Choi, and I. S. Kweon. Extrinsic calibration of a camera and a 2D laser without overlap. Robotics and Autonomous Systems, 78:17–28, April 2016. DOI: 10.1016/j.robot.2015.12.007.
[113] M. Pereira, D. Silva, V. Santos, and P. Dias. Self calibration of multiple LIDARs and cameras on autonomous vehicles. Robotics and Autonomous Systems, 83:326–337, Sep 2016. DOI: 10.1016/j.robot.2016.05.010.
[114] C. Guindel, J. Beltrán, D. Martín, and F. García. Automatic extrinsic calibration for lidar-stereo vehicle sensor setups. In 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), pages 1–6, Yokohama, Japan, Oct 2017. IEEE. DOI: 10.1109/ITSC.2017.8317829.
[115] K. A. Yousef, B. Mohd, K. Al-Widyan, and T. Hayajneh. Extrinsic calibration of camera and 2D laser sensors without overlap. Sensors, 17(10):2346, Oct 2017. DOI: 10.3390/s17102346.
[116] T. Kühner and J. Kümmerle. Extrinsic multi sensor calibration under uncertainties. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 3921–3927, Auckland, New Zealand, Oct 2019. IEEE. DOI: 10.1109/ITSC.2019.8917319.
[117] M. Oliveira, A. Castro, T. Madeira, P. Dias, and V. Santos. A general approach to the extrinsic
calibration of intelligent vehicles using ROS. In M. Silva, J. L. Lima, L. P. Reis, A. Sanfeliu, and D.
Tardioli, editors, Robot 2019: Fourth Iberian Robotics Conference, pages 203–215, Cham, 2020. Springer International Publishing. DOI: 10.1007/978-3-030-35990-4_17.
[118] E. B. Bacca, E. Mouaddib, and X. Cufi, “Embedding Range Information in Omnidirectional Images through Laser Range Finder,” In Proc. of the 2010 IEEE International Conference on Intelligent Robots and Systems, pp. 2053–2058, Oct. 2010.
[119] B. Wang, M. Wu, W. Jia, “The Light Plane Calibration Method of the Laser Welding Vision
Monitoring System,” In Proc. of the 2017 2nd International Conference on Mechatronics and
Electrical Systems (ICMES), vol. 339, 2018.
[120] L. Kurnianggoro, V. Hoang, and K. Jo, “Calibration of a 2D Laser Scanner System and Rotating Platform using a Point-Plane Constraint,” Computer Science and Information, vol. 12, no. 1, pp. 307-322, Jan. 2015.
[121] X. Chen, F. Zhou, and T. Xue, “Omnidirectional field of view structured light calibration method for
catadioptric vision system,” Measurement, vol. 148, Dec. 2019.
[122] M. Nakagawa, T. Yamamoto, S. Tanaka, M. Shiozaki, and T. Ohhashi. “Topological 3D Modeling
Using Indoor Mobile Lidar Data,” in ISPRS - International Archives of the Photogrammetry, Remote
Sensing and Spatial Information Sciences, pp. 13-18, May 2015 doi: 10.5194/isprsarchives-XL-4-W5-
13-2015.
[123] X. Li, S. Li, S. Jia, and C. Xu, "Mobile robot map building based on laser ranging and kinect," 2016
IEEE International Conference on Information and Automation (ICIA), pp. 819-824, Aug. 2016 doi:
10.1109/ICInfA.2016.7831932.
[124] F. Tsai, T. Wu, I. Lee, H. Chang, and A. Su, “Reconstruction of Indoor Models Using Point Clouds
Generated from Single-Lens Reflex Cameras and Depth Images,” ISPRS - International Archives of
the Photogrammetry, Remote Sensing and Spatial Information Sciences, pp. 99-102, May 2015.
[125] M. Bosse, and R. Zlot, “Continuous 3D scan-matching with a spinning 2D laser,” in Proc. IEEE Int. Conf. Robot. Automat., pp. 4312-4319, May 2009.
[126] A. Nuchter, K. Lingemann, J. Hertzberg, and H. Surmann, “6D SLAM with approximate data association,” in Proc. Int. Conf. Adv. Robot., pp. 242-249, 2005.
[127] T. Fujita, “3D sensing and mapping for a tracked mobile robot with a movable laser ranger finder,” in
Int. J. Mech. Mechatron. Eng., vol. 6, no. 2, pp. 501-506, 2012.
[128] T. Ueda, H. Kawata, T. Tomizawa, A. Ohya, and S. Yuta, “Mobile sokuiki sensor system-accurate
range data mapping system with sensor motion,” in Proc. IEEE Int. Conf. Auton. Robots Agents, 2006,
pp. 1-6.
[129] H. Qin, Y. Bi, F. Lin, Y. F. Zhang, and B. M. Chen, “A 3D rotating laser based navigation solution for micro aerial vehicles in dynamic environments,” Unmanned Syst., vol. 6, pp. 1-8, Sep. 2018.
[130] Y. Son, S. Yoon, S. Oh, and S. Han, “A Lightweight and Cost-Effective 3D Omnidirectional Depth
Sensor Based on Laser Triangulation,” in IEEE Access, vol. 7, pp. 58740-58750, 2019, doi:
10.1109/ACCESS.2019.2914220.
[131] P. De Ruvo, G. De Ruvo, A. Distante, M. Nitti, E. Stella, and F. Marino, “An Omnidirectional Range Sensor for Environmental 3-D Reconstruction,” In Proc. of the 2010 IEEE Symposium on Industrial Electronics (ISIE), pp. 396–401, Jul. 2010.
[132] X. Lian, Z. Liu, X. Wang, and L. Dou, “Reconstructing Indoor Environmental 3D Model Using Laser Range Scanners and Omnidirectional Camera,” In Proc. of the 7th World Congress on Intelligent Control and Automation, vol. 1, no. 23, pp. 1640–1644, Jun. 2008.
[133] R. Benosman, and S. Kang, “Panoramic Vision,” Springer-Verlag, ISBN 0-387-95111-3, 2000.
[134] Fermuller C, Aloimonos Y, Baker P, et al., “Multi-camera Networks: Eyes from Eyes,” IEEE
Workshop on Omnidirectional Vision, 2000.
[135] Cutler R., “Distributed meetings: a meeting capture and broadcasting system,” Tenth ACM
International Conference on Multimedia, pp. 503-512, 2002.
[136] F. Huang, and R. Klette, “Stereo panorama acquisition and automatic image disparity adjustment for stereoscopic visualization,” Multimedia Tools Appl., vol. 47, pp. 353–377, 2010.
[137] J. Yu, L. McMillan, and P. Sturm, “Multi-perspective modelling, rendering and imaging,” Comput. Graph. Forum, vol. 29, pp. 227–246, 2010.
[138] D. Zamalieva, and A. Yilmaz, “Background subtraction for the moving camera: a geometric approach,” Comput. Vis. Image Underst., vol. 127, pp. 73–85, 2014.
[139] S. Baker, and S. K. Nayar, “A theory of catadioptric image formation,” in Proc. of the Int. Conf. on Computer Vision, Bombay, pp. 35–42, 1998.
[140] H. H. P. Wu, and S. H. Chang, “Fundamental matrix of planar catadioptric stereo systems,” IET Comput. Vis., vol. 4, pp. 85–104, 2010.
[141] I. Cinaroglu, and Y. Bastanlar, “A direct approach for object detection with catadioptric omnidirectional cameras,” Signal Image Video Process., vol. 10, pp. 413–420, 2016.
[142] C. Geyer, and K. Daniilidis, “A unifying theory for central panoramic systems and practical implications,” in Proc. of the Eur. Conf. on Computer Vision, Dublin, pp. 159–179, 2000.
[143] M. Blösch, S. Weiss, D. Scaramuzza, and R. Siegwart, “Vision based MAV navigation in unknown
and unstructured environments,” In Proc. of the International Conference on Robotics and Automation
(ICRA 2010), Anchorage, Alaska, May 2010.
[144] M. Bosse, R. Rikoski, J. Leonard, and S. Teller, “Vanishing points and 3d lines from omnidirectional
video,” In Proc. of the International Conference on Image Processing, 2002.
[145] P. Corke, D. Strelow, and S. Singh, “Omnidirectional visual odometry for a planetary rover,” In Proc.
of the International Conference on Intelligent Robots and Systems, 2004.
[146] D. Scaramuzza, F. Fraundorfer, and R. Siegwart, “Real-time monocular visual odometry for on-road vehicles with 1-point RANSAC,” In Proc. of the International Conference on Robotics and Automation (ICRA 2009), Kobe, Japan, May 2009.
[147] D. Scaramuzza, and R. Siegwart, “Appearance guided monocular omnidirectional visual odometry for outdoor ground vehicles,” IEEE Transactions on Robotics, vol. 24, no. 5, Oct. 2008.
[148] J. Tardif, Y. Pavlidis, and K. Daniilidis, “Monocular visual odometry in urban environments using an
omnidirectional camera,” In Proc. of the International Conference on Intelligent Robots and Systems,
2008.
[149] D. Scaramuzza, F. Fraundorfer, and M. Pollefeys, “Closing the loop in appearance guided omnidirectional visual odometry by using vocabulary trees,” Robotics and Autonomous Systems Journal (Elsevier), 2010.
[150] R. Benosman, and S. Kang, “Panoramic Vision: Sensors, Theory, and Applications,” New York, Springer-Verlag, 2001.
[151] K. Daniilidis, and R. Klette, “Imaging Beyond the Pinhole Camera,” New York, Springer, 2006.
[152] D. Scaramuzza, “Omnidirectional vision: from calibration to robot motion estimation,” PhD thesis n.
17635, ETH Zurich, February 2008.
[153] Brown, D. C.: “Close-range camera calibration”, Photogrammetric Engineering, 37(8):855-866, 1971.
[154] Heikkilä, J.: “Geometric camera calibration using circular control points”, TPAMI, 22(10):1066-1077, 2000.
[155] Swaminathan, R. and Nayar, S. K.: “Nonmetric calibration of wide-angle lenses and polycameras”,
TPAMI, 22(10):1172-1178, 2000.
[156] J. Kannala and S. Brandt. A generic camera calibration method for fish-eye lenses. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), 2004.
[157] Peter Sturm. Camera models and fundamental concepts used in geometric computer vision. Now Publishers, 2011.
[158] Xianghua Ying, Zhanyi Hu, and Hongbin Zha. Fisheye lenses calibration using straight-line spherical perspective projection constraint. Computer Vision – ACCV 2006.
[159] Jan Heller, Didier Henrion, and Tomas Pajdla. Stable radial distortion calibration by polynomial
matrix inequalities programming, 2014. URL http://arxiv.org/abs/1409.5753.
[160] A. Bechar, and C. Vigneault, “Agricultural robots for field operations. Part 2: Operations and systems,” Biosyst Eng, pp. 110–128, 2017.
[161] Kannala, J.; Brandt, S. A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1335–1340.
[162] C. Mei and P. Rives. Single view point omnidirectional camera calibration from planar grids. In IEEE International Conference on Robotics and Automation, Roma, Italy, April 2007.
[163] D. Xiao-Ming, W. Fu-Chao, and W. Yi-Hong. An easy calibration method for central catadioptric cameras. Acta Automatica Sinica, 33:801–808, 2007.
[164] S. Gasparini, P. F. Sturm, and J. P. Barreto. Plane-based calibration of central catadioptric cameras. In
IEEE International Conference on Computer Vision, 2009.
[165] S. Shah and J. K. Aggarwal. A simple calibration procedure for fish-eye (high-distortion) lens camera. In Proceedings of the 1994 International Conference on Robotics and Automation, San Diego, CA, USA, May 1994.
[166] L. Puig, Y. Bastanlar, P. Sturm, J. J. Guerrero, and J. Barreto. Calibration of central catadioptric cameras using a DLT-like approach. International Journal of Computer Vision, 93(1):101–114, March 2011.
[167] Christopher Mei. Christopher Mei – Research Assistant, 2015b. URL http://www.robots.ox.ac.uk/~cmei/Toolbox.html. (Accessed August 28, 2015).
[168] Christopher Mei, 2015a. URL http://www.robots.ox.ac.uk/~cmei/articles/projection_model.pdf. (Accessed August 23, 2015).
[169] D. Scaramuzza, “OCamCalib: Omnidirectional Camera Calibration Toolbox for Matlab”. [Online].
Available: https://www.sites.google.com/site/scarabotix/ocamcalib-toolbox.
[170] S. Urban, J. Leitloff, and S. Hinz, “Improved Wide-Angle, Fisheye and Omnidirectional Camera Calibration,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 108, pp. 72–79, Oct. 2015.
[171] D. Scaramuzza, A. Martinelli, and R. Siegwart, “A toolbox for easily calibrating omnidirectional
cameras,” In Proc. of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems,
Oct. 2006.
[172] Robotics System Toolbox. [Online]. Available: https://www.mathworks.com/help/robotics/ (accessed
Feb. 6, 2021).
[173] P. I. Corke, “A Robotics Toolbox for Matlab,” IEEE Robotics and Automation Magazine, 1996.
[174] C. Paniagua, L. Puig, and J. Guerrero, “Omnidirectional Structured Light in a Flexible Configuration,”
Sensors, vol. 13, no. 10, pp. 13903-13916, Oct. 2013.
[175] H. Kawasaki, R. Sagawa, Y. Yagi, R. Furukawa, N. Asada, and P. Sturm, “One-shot scanning method
using an uncalibrated projector and camera system,” In Proc. of the 2010 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition Workshops, pp. 104-111, Jul. 2010.
[176] J. Al-Azzeh, B. Zahran, Z. Alqadi, “Salt and Pepper Noise: Effects and Removal,” International
Journal on Informatics Visualization, vol. 2, no. 4, pp. 252-256, Jul. 2018.
[177] D. Anguelov, D. Koller, E. Parker, and S. Thrun, “Detecting and modeling doors with mobile robots,” In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pp. 3777–3784, 2004.
[178] B. Limketkai, L. Liao, and D. Fox, “Relational object maps for mobile robots,” In Proceedings of the 19th International Joint Conference on Artificial Intelligence, IJCAI’05, pp. 1471–1476, San Francisco, CA, USA, 2005.
[179] R. B. Rusu, W. Meeussen, S. Chitta, and M. Beetz, “Laser-based Perception for Door and Handle
Identification,” In Proc. of the Intl. Conf. on Advanced Robotics (ICAR), 2009.
[180] E. Klingbeil, A. Saxena, and A. Ng, “Learning to open new doors,” In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 2751–2757, 2010.
[181] D. Kragic, L. Petersson, and H. I. Christensen, “Visually guided manipulation tasks,” Robotics and Autonomous Systems, pp. 193–203, 2002.
[182] M. Quigley, S. Batra, S. Gould, E. Klingbeil, Q. Le, A. Wellman, and A. Ng, “High-accuracy 3d sensing for mobile manipulation: Improving object detection and door opening,” In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pp. 2816–2822, 2009.
[183] A. Andreopoulos, and J. K. Tsotsos, “Active vision for door localization and door opening using playbot: A computer controlled wheelchair for people with mobility impairments,” In Proc. of the Canadian Conference on Computer and Robot Vision (CRV ’08), pp. 3–10, May 2008.
[184] Cusano C, Napoletano P, Schettini R. Intensity and color descriptors for texture classification. In:
Image Processing: Machine Vision Applications VI, 2013; 8661: 866113.
[185] Napoletano P. Hand-Crafted vs Learned Descriptors for Color Texture Classification. In: International Workshop on Computational Color Imaging, March 2017, 259–271.
[186] Schmidhuber J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61: 85–117.
[187] Napoletano P, Flavio P, and Raimondo S. Anomaly Detection in Nanofibrous Materials by CNN-
Based Self-Similarity. Sensors, 2018, 18(1): 209. https://doi.org/10.3390/s18010209.
[188] Napoletano P. Visual descriptors for content-based retrieval of remote-sensing images. Int. J. Remote
Sens. 2018, 39: 134.
[189] Bianco S, Celona L, Napoletano P, Schettini R. On the Use of Deep Learning for Blind Image Quality
Assessment. arXiv, 2017, arXiv:1602.05531.
[190] Cusano C, Napoletano P, Schettini R. Combining multiple features for color texture classification. J.
Electron. Imaging, 2016, 25: 061410-061410.
[191] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” In Proc. of the
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, Jun. 2016.
[192] C. Wu, “Towards Linear-Time Incremental Structure from Motion,” 2013 International Conference on 3D Vision (3DV), pp. 127-134, 2013.
[193] C. Sui, K. He, C. Lyu, Z. Wang and Y. -H. Liu, "Active Stereo 3-D Surface Reconstruction Using
Multistep Matching," in IEEE Transactions on Automation Science and Engineering, vol. 17, no. 4, pp.
2130-2144, Oct. 2020.
[194] H. Wang, J. Wang and L. Wang, "Online Reconstruction of Indoor Scenes from RGB-D Streams,"
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3271-3279.
[195] S. -H. Baek and F. Heide, "Polka Lines: Learning Structured Illumination and Reconstruction for
Active Stereo," 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
2021, pp. 5753-5763.
[196] Z. Li, P. C. Gogia and M. Kaess, "Dense Surface Reconstruction from Monocular Vision and
LiDAR," 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 6905-6911.
[197] F. Sadeghi, and S. Levine, “CAD2RL: Real single-image flight without a single real image,” 2016, arXiv:1611.04201.