Research on Indoor Environment Reconstruction Based on Omnidirectional Laser Structured-Light Vision
Ivan Kholodilin
December 2021
Chinese Library Classification (CLC): TP13
UDC Classification: 621.3
Research on Indoor Environment Reconstruction Based on Omnidirectional Laser Structured-Light Vision
Candidate: Ivan Kholodilin
School: School of Automation
Supervisor: Prof. Wang Qinglin
Chair of the Defense Committee: Research Professor Xu De
Degree Applied for: Doctor of Engineering
Major: Control Science and Engineering
Degree-Granting Institution: Beijing Institute of Technology
Date of Thesis Defense: December 2021
Reconstruction of the Indoor Environment based on
Omnidirectional Vision with the Laser Structured Lights
Candidate Name: Ivan Kholodilin
School or Department: School of Automation
Faculty Mentor: Prof. Qinglin Wang
Chair, Thesis Committee: Prof. De Xu
Degree Applied: Doctor of Philosophy
Major: Control Science and Engineering
Degree Conferred by: Beijing Institute of Technology
Date of Defense: December 2021
Declaration of Originality
I hereby solemnly declare that the submitted thesis presents the research results obtained through my own work under the guidance of my supervisor. To the best of my knowledge, except where specifically noted and acknowledged in the text, this thesis contains no research results previously published or written by others, nor any material that has been used to obtain a degree or certificate at Beijing Institute of Technology or any other educational institution. Any contribution to this research made by colleagues with whom I have worked has been explicitly acknowledged in the thesis.
I hereby make this declaration.
Signature:            Date:
Statement on the Use of the Thesis
I fully understand the regulations of Beijing Institute of Technology on the preservation and use of dissertations, which include the following: the university has the right to preserve the original and copies of the thesis and to submit them to relevant authorities; the university may reproduce and preserve the thesis by photocopying, reduced-size printing, or other copying means; the university may allow the thesis to be consulted or borrowed; the university may, for the purpose of academic exchange, reproduce, give away, and exchange copies of the thesis; and the university may publish all or part of the thesis (for classified theses, these provisions apply after declassification).
Signature:            Date:
Supervisor's signature:            Date:
Abstract
Indoor reconstruction is an important topic in computer vision, contributing to various applications such as virtual and augmented reality, layout recovery, and mobile robot navigation. Perception and sensing are an essential part of reconstructing unknown environments. This thesis contributes to methods for the extrinsic calibration of a vision system and for the 3D reconstruction of indoor environments based on omnidirectional vision with laser illumination.
Calibration techniques for vision systems with laser illumination have improved steadily. However, for omnidirectional vision systems these techniques remain limited to particular configurations. Therefore, an improved omnidirectional vision system with laser illumination in a flexible configuration, together with a method for its extrinsic calibration, is proposed. Results of experiments and simulations demonstrate that the proposed calibration method is more robust and provides higher measurement accuracy.
In general, reconstruction methods can be based on passive or active sensing techniques. Passive vision systems depend on detected environmental features and operate under ambient lighting conditions. As a result, when the reconstructed surface has insufficient texture, e.g. simple long corridors, such methods yield low matching accuracy. These problems can be alleviated by active vision systems, e.g. a structured light pattern can be easily detected by a camera. Recent studies also apply deep learning to recover the 3D structure of the environment. In this thesis, a novel approach for 3D reconstruction is proposed, which combines deep learning with an active omnidirectional vision system. The proposed reconstruction technique is simple, robust, and accurate.
Recent advances in deep learning require large amounts of annotated training data gathered in environments with a variety of conditions. Thus, developing and testing navigation algorithms for mobile robots can be expensive and time-consuming. Motivated by these problems, a photo-realistic simulation platform is developed. Built with Unity, the simulation platform integrates sensors, mobile robots, and elements of the indoor environment, and facilitates the generation of synthetic photorealistic datasets with automatic ground-truth annotations. The proposed simulator and supporting materials are available online: http://www.ilabit.org or https://ilabit4.wixsite.com/mysite-1
Keywords: Extrinsic Calibration, Measurement, Omnidirectional Vision System,
Structured Light, Simulation, Mapping, Environment Reconstruction
Acknowledgement
Studying in China completely changed my perception of the educational process. From childhood, Chinese kids become familiar with educational centers, mostly for studying the English language. The educational process never stops: during their school and university years, Chinese people study very hard, spending all of their time in the library or in the laboratory. It seemed unusual and strange to me in the beginning, but now I am sure this approach is the best investment in one's future. No wonder that China is so powerful today, and it was a pleasure for me to be a part of this country, which is why I have been intensively studying the Chinese language! However, the Covid-19 pandemic has changed the rules, and I currently cannot return to Beijing for the defense procedure or to realize my future plans, but I am trying to keep a Chinese presence of mind, namely to work hard and not get depressed.
I would like to express my deepest appreciation and gratitude to my research supervisor, Professor Wang Qinglin. I would not have made it through these years at the university without the support you gave me in finding my way in life. I hope my occasional recklessness and spontaneity did not give you too many grey hairs. Thank you for teaching me everything I know about computer vision and robotics. The knowledge you have given me made it possible for me to reach new heights and do things that would have been unimaginable a few years ago. Thank you for being like a father to me all these years, for giving me guidance on various academic matters, and for sharing your life with me. Thank you and Associate Professor Li Yuan for giving me the opportunity to work in the State Key Laboratory of Intelligent Control and Decision of Complex Systems, and for your continuous support throughout the years!
I would also like to thank my lab-mates Qi Yang, Jiang Zhao Guo, and Li Meng Meng. I am grateful for their advice as well as their active support in conducting various experiments for my research and in my academic life.
I thank my fellows in China: Luis Lago, Dmitry Zuev, Mark Rudak, Kanat Orazaliev, Shazad, and Imran, for their helpful insights into my research and for accompanying me throughout these years.
Last but not least, I would like to thank my mom for her support and love.
List of Publications
Journal Publications:
1. I. Kholodilin, Y. Li and Q. Wang, “Omnidirectional Vision System With Laser Illumination
in a Flexible Configuration and Its Calibration by One Single Snapshot,” in IEEE
Transactions on Instrumentation and Measurement, vol. 69, no. 11, pp. 9105-9118, Nov.
2020.
SCI Journal, Rank – Q1, Impact Factor – 3.658
doi: 10.1109/TIM.2020.2998598.
2. I. Kholodilin, Y. Li, Q. Wang, and P. Bourke, “Calibration and three-dimensional reconstruction with a photorealistic simulator based on the omnidirectional vision system,” 2021.
Accepted on 25 Oct. 2021: International Journal of Advanced Robotic Systems.
SCI Journal, Rank – Q2, Impact Factor – 1.652
doi: 10.1177/17298814211059313.
Conference Publications:
1. I. Kholodilin, A. Nesterov, “Future of the Electrical Engineering Education on the AR and
VR Basis,” in Proc. of the International Conference on Video, Signal and Image Processing,
pp. 113–117, Oct. 2019.
doi: 10.1145/3369318.3369337.
Table of Contents
Abstract .......................................................................................................................................... 6
Acknowledgement .......................................................................................................................... 7
List of Publications ........................................................................................................................ 8
Table of Contents ........................................................................................................................... I
List of Figures ............................................................................................................................... V
Abbreviations and Symbols .......................................................................................................... 9
Chapter 1 Introduction ......................................................................................................... 10
1.1 2D Mapping and 3D Reconstruction of the Indoor Environment ................................... 11
1.1.1 General reconstruction techniques ......................................................................... 11
1.1.2 Passive vision ........................................................................................................ 12
1.1.3 Active vision .......................................................................................................... 16
1.2 Research Challenges ....................................................................................................... 20
1.2.1 Sensing techniques ................................................................................................ 20
1.2.2 Field of view .......................................................................................................... 20
1.2.3 Results validation .................................................................................................. 21
1.2.4 Structured light ...................................................................................................... 21
1.3 Problem Statement .......................................................................................................... 21
1.4 Thesis Aims and Objectives ............................................................................................ 23
1.4.1 Aim 1 – Literature review ..................................................................................... 23
1.4.2 Aim 2 – A novel omni-vision system .................................................................... 23
1.4.3 Aim 3 – Novel methods of extrinsic calibration .................................................... 24
1.4.4 Aim 4 – A novel 3D reconstruction technique ...................................................... 24
1.4.5 Tasks ...................................................................................................................... 25
1.5 Thesis Contributions ....................................................................................................... 25
1.6 Thesis Road Map ............................................................................................................ 26
Chapter 2 Literature Review ................................................................................................ 27
2.1 Virtual Environment ....................................................................................................... 27
2.1.1 Introduction ........................................................................................................... 27
2.1.2 Review on simulators ............................................................................................ 28
2.1.3 Omni-vision simulation ......................................................................................... 34
2.2 Extrinsic Calibration ....................................................................................................... 35
2.2.1 Review on methods ............................................................................................... 36
2.2.2 Problem formulation .............................................................................................. 40
2.3 Reconstruction of the indoor environment ..................................................................... 41
2.3.1 Stereo vision .......................................................................................................... 41
2.3.2 Kinect ..................................................................................................................... 42
2.3.3 Lidars ..................................................................................................................... 43
2.3.4 Structured light ...................................................................................................... 44
2.3.5 Deep learning ......................................................................................................... 45
2.4 Conclusion ...................................................................................................................... 46
Chapter 3 Omni-vision System Research Platform ............................................................ 47
3.1 Main Contributions of this chapter ................................................................................. 47
3.2 An Improved Omnidirectional Vision System ............................................................... 47
3.3 Intrinsic Calibration ........................................................................................................ 50
3.3.1 Calibration principle .............................................................................................. 50
3.3.2 Camera model ........................................................................................................ 53
3.3.3 Implementation ...................................................................................................... 55
3.3.4 Results ................................................................................................................... 56
3.4 Preliminary Work on Extrinsic Calibration .................................................................... 57
3.4.1 System model ........................................................................................................ 57
3.4.2 Calibration procedure ............................................................................................ 59
3.4.3 Discussion .............................................................................................................. 62
3.5 Development of the Simulation Environment ................................................................ 63
3.5.1 Motivation ............................................................................................................. 63
3.5.2 Simulation Environment ........................................................................................ 65
3.5.3 Platform features .................................................................................................... 67
3.5.4 Capabilities ............................................................................................................ 69
3.5.5 Map transform ....................................................................................................... 70
3.5.6 Experiment setup ................................................................................................... 72
3.5.7 Experiment results ................................................................................................. 73
3.5.8 Discussion .............................................................................................................. 74
Chapter 4 Extrinsic Calibration and 2D Mapping ............................................................. 75
4.1 Main Contributions of this chapter ................................................................................. 75
4.2 Proposed Method of Extrinsic Calibration ..................................................................... 75
4.2.1 System model ........................................................................................................ 75
4.2.2 Calibration procedure in a general form ................................................................ 76
4.2.3 Camera Calibration Placed to the Indoor Environment ......................................... 77
4.2.4 Laser Plane Calibration ......................................................................................... 80
4.3 Experiment Setup ............................................................................................................ 82
4.3.1 Self-evaluation technique ...................................................................................... 83
4.3.2 Comparison with the state-of-the-art calibration method ...................................... 84
4.3.3 2D mapping ........................................................................................................... 85
4.3.4 Real data ................................................................................................................ 85
4.4 Experiment Results ......................................................................................................... 86
4.4.1 Self-evaluation technique ...................................................................................... 86
4.4.2 Comparison with the state-of-the-art calibration method ...................................... 89
4.4.3 2D mapping ........................................................................................................... 89
4.4.4 Real data ................................................................................................................ 92
Chapter 5 Extrinsic Calibration and 3D Reconstruction ................................................... 95
5.1 Motivation ....................................................................................................................... 95
5.2 System Model ................................................................................................................. 95
5.3 Calibration Procedure ..................................................................................................... 96
5.3.1 Extrinsic parameters of the camera ....................................................................... 96
5.3.2 Extrinsic parameters of the laser plane .................................................................. 98
5.4 Evaluation ....................................................................................................................... 99
5.4.1 Experiment Setup .................................................................................................. 99
5.4.2 Results ................................................................................................................. 101
5.4.3 Discussion ............................................................................................................ 102
5.5 3D Reconstruction Method ........................................................................................... 102
5.5.1 Semantic Segmentation ....................................................................................... 103
5.5.2 Feature Extraction ................................................................................................ 104
5.5.3 Experimental Setup .............................................................................................. 105
5.5.4 Results ................................................................................................................. 105
5.5.5 Discussion ............................................................................................................ 106
5.6 Perspective Projection ................................................................................................... 107
5.6.1 Preparation ........................................................................................................... 107
5.6.2 Perspective projection .......................................................................................... 107
5.6.3 Proposed 3D Reconstruction Technique ............................................................. 111
5.6.4 Comparison with commonly used reconstruction methods ................................. 114
Chapter 6 Conclusions ......................................................................................................... 117
6.1 Proposed Simulator ....................................................................................................... 117
6.2 Omni-Vision System and its Extrinsic Calibration ....................................................... 117
6.3 3D Reconstruction of the Indoor Environment ............................................................. 118
References ................................................................................................................................... 120
List of Figures
Figure 1.1: Applications related to the indoor reconstruction. ...................................................... 11
Figure 1.2: Stereo vision. ............................................................................................................... 14
Figure 1.3: Structure from motion. ................................................................................................ 16
Figure 1.4: Time-of-Flight scanners. ............................................................................................. 18
Figure 1.5: Laser range finders. ..................................................................................................... 19
Figure 1.6: Structured light. ........................................................................................................... 19
Figure 1.7: Mapping results when extrinsic calibration was not considered. ................................ 22
Figure 1.8: Configuration of the vision system with several laser emitters. ................................. 23
Figure 2.1: Webots simulator. ....................................................................................................... 29
Figure 2.2: Robotics System Toolbox. .......................................................................................... 30
Figure 2.3: Microsoft Robotics Developer Studio. ........................................................................ 30
Figure 2.4: USARSim simulator. .................................................................................................. 31
Figure 2.5: OpenSim simulator. .................................................................................................... 32
Figure 2.6: Robot Soccer Simulator with ÜberSim. ...................................................................... 32
Figure 2.7: Simbad simulator. ....................................................................................................... 33
Figure 2.8: Ways of simulating the omni-camera. ........................................................................ 34
Figure 2.9: Ways of simulating the omni-camera. ........................................................................ 35
Figure 2.10: Calibration methods for obtaining extrinsic parameters of the vision system. ......... 37
Figure 2.11: Setup Laser – Camera used in Zhang and Pless [100]. ............................................. 37
Figure 2.12: Setup Laser – Camera used in Vasconcelos et al [109]. ........................................... 38
Figure 2.13: Setup Laser – Camera used in Bok et al [110]. ......................................................... 39
Figure 2.14: Example of 3D points cloud. ..................................................................................... 42
Figure 2.15: 3D reconstruction based on the Kinect sensor. The image above shows single Kinect.
The image below shows multiple Kinects. .................................................................................... 43
Figure 2.16: 3D reconstruction based on the Lidar sensor. ........................................................... 44
Figure 2.17: 3D reconstruction based on the rotated structured light. .......................................... 45
Figure 2.18: Layout recovery with deep learning. ......................................................................... 46
Figure 3.1: Configuration of the vision system. A – fisheye camera Ricoh Theta S. B –
omnidirectional laser emitter. C – snapshot captured by fisheye camera. .................................... 47
Figure 3.2: Types of Cameras. A – Perspective camera. B – Catadioptric camera. C – Fisheye
camera. ........................................................................................................................................... 48
Figure 3.3: Mei’s model projection steps. ..................................................................................... 52
Figure 3.4: Pre-calibration procedure: preparing images for the calibration procedure. .............. 56
Figure 3.5: The evaluation of the intrinsic camera calibration. ..................................................... 57
Figure 3.6: Calibration images. Left image shows the single checkerboard patterns. Right image shows the checkerboard patterns with the laser beam. .................................................................. 59
Figure 3.7: Initial projection of the target (up- and front-views). ................................................. 60
Figure 3.8: Calibrated projection of the target (up- and front-views). .......................................... 60
Figure 3.9: From left to right: Input image; Extracted laser beam; Projection of the laser and
points of the left checkerboard pattern; Projection of the laser and points of the front
checkerboard pattern. ..................................................................................................................... 61
Figure 3.10: Fitting plane to the laser points. ................................................................................ 61
Figure 3.11: Verification by mapping. Left image shows checkerboard patterns with the laser beam. Middle image shows the non-calibrated map. Right image shows the calibrated map. .................... 62
Figure 3.12: Omnidirectional vision system. Left image shows elements of the system. Right
image shows snapshot captured by fisheye camera ....................................................................... 66
Figure 3.13: Several modes supported by the simulator, from left to right: ordinary mode,
semantic labeling mode, depth mode. ........................................................................................... 68
Figure 3.14: The calibration screen. .............................................................................................. 69
Figure 3.15: The measurement screen. .......................................................................................... 70
Figure 3.16: Left image shows initial map obtained with laser emitter; right image shows shifted
map. ............................................................................................................................................... 71
Figure 3.17: Left image shows binary map supported by the Robotics System Toolbox; right
image shows logical map supported by the Robotics Toolbox. .................................................... 72
Figure 3.18: Left image shows navigation to the target position with Robotics System Toolbox;
right image shows navigation to the target position with Robotics Toolbox. ............................... 73
Figure 4.1: Calibration target. Green border shows target in a cross-section. .............................. 77
Figure 4.2: Blue – initial projection. Red – projection with the optimized camera pitch and roll.
....................................................................................................................................................... 78
Figure 4.3: Optimized camera yaw. ............................................................................................... 80
Figure 4.4: Blue – initial projection. Red – projection with the optimized laser pitch and roll. ... 81
Figure 4.5: Optimized laser distance. ............................................................................................ 82
Figure 4.6: Salt-and-pepper noise. The left image relates to the camera calibration; the right image relates to the laser plane calibration. Magenta shows the extracted image points. The red curve is fitted to the majority of the points. The blue points are the curve points used for calibration; they are also superimposed on the bottom image for better visual understanding.
.......................................................................................................................................
Figure 4.7: Mapping of the indoor environment. Red color – proposed method; blue color – standard method. Below each map, the index of the particular experiment is shown together with the configuration of the vision system, where the values in brackets give [pitch; roll; yaw] in degrees, respectively. ......................................................................................................................
Figure 4.8: Calibration target. Green border shows target in a cross-section. .............................. 92
Figure 4.9: Extrinsic calibration. ................................................................................................... 93
Figure 4.10: Mapping of the indoor environment. ........................................................................ 94
Figure 5.1: A shows previous configuration. B shows proposed configuration. ........................... 95
Figure 5.2: The proposed calibration target. .................................................................................. 96
Figure 5.3: A – initial projection. B – projection with the optimized camera pitch and roll ........ 97
Figure 5.4: A – initial projection. B – projection with the optimized laser pitch, roll and distance.
....................................................................................................................................................... 98
Figure 5.5: Configuration of the vision system #1. A – Method 1. B – Proposed. C – Method 2.
..................................................................................................................................................... 100
Figure 5.6: Configuration of the vision system #2. A – Proposed. B – Method 2. ..................... 100
Figure 5.7: Overview: From a single fisheye snapshot, the proposed method combines semantic
segmentation and laser data to generate the 3D model of the indoor environment. .................... 103
Figure 5.8: Neural network performance evaluation. ResNet18 is shown in red color and
ResNet50 is shown in blue color. ................................................................................................ 105
Figure 5.9: Training results. ........................................................................................................ 106
Figure 5.10: Upper row shows extracted regions of the indoor environment (floor, ceiling, walls,
and doors) with the visible laser beam. Lower row shows corresponding perspective projection
for extracted regions of the indoor environment. ........................................................................ 107
Figure 5.11: Initialization of the virtual camera. ......................................................................... 108
Figure 5.12: Perspective image. .................................................................................................. 109
Figure 5.13: Transformation between world and camera coordinates. ....................................... 110
Figure 5.14: Transformation to the image coordinates. ............................................................... 110
Figure 5.15: Reconstruction results (single reconstruction and global map are presented). ....... 114
Figure 5.16: Left image shows reconstruction results on a passive method. Right image shows
reconstruction results on the proposed method. .......................................................................... 115
Figure 5.17: Images above show ground truth point clouds. Images below show reconstruction
results on the proposed method. .................................................................................................. 116
List of Tables
Table 3-1: Configuration parameters ............................................................................................. 56
Table 3-2: The Evaluation of the Intrinsic Camera Calibration .................................................... 57
Table 3-3: The Evaluation of the Extrinsic Calibration ................................................................ 62
Table 3-4: Summary of the considered calibration method .......................................................... 62
Table 3-5: Operating Modes of the Simulator ............................................................................... 68
Table 3-6: Comparison Analysis ................................................................................................... 74
Table 4-1: The Configurations of the Calibration Methods .......................................................... 84
Table 4-2: The Configuration of the Simulation Environment ..................................................... 85
Table 4-3: The Configuration of the Real Environment ................................................................ 85
Table 4-4: The Evaluation of the Intrinsic Camera Calibration .................................................... 87
Table 4-5: The Evaluation of the Extrinsic Calibration ................................................................ 89
Table 4-6: The Evaluation of the Mapping Results ....................................................................... 90
Table 4-7: The Evaluation of the Mapping Results ....................................................................... 91
Table 4-8: The Evaluation of the Extrinsic Calibration ................................................................ 93
Table 4-9: The Evaluation of the Mapping Results ....................................................................... 94
Table 5-1: The Configurations of the Calibration Methods ......................................................... 99
Table 5-2: The Evaluation of the Extrinsic Calibration .............................................................. 101
Table 5-3: The Evaluation of the Trained Networks ................................................................... 106
Table 5-4: Rotations of the cube face .......................................................................................... 108
Abbreviations and Symbols
2D – Two-dimensional space
3D – Three-dimensional space
AE – Absolute error
CV – Computer vision
d – Noise density
DOF – Degree of freedom
FOV – Field of view
HRI – Human-robot interaction
IMU – Inertial Measurement Unit
LRF – Laser range finders
MAE – Mean absolute error
MLE – Maximum likelihood estimation
PRM – Probabilistic Roadmap
RMSE – Root mean squared error
SFM – Structure from motion
SLAM – Simultaneous localization and mapping
TCP – Transmission Control Protocol
ToF – Time-of-Flight
UAV – Unmanned aerial vehicles
UI – User interface
Notation
X, Y, Z – World coordinates
λ – Depth scale
u', v' – Distorted coordinates in the sensor image plane
u, v – Undistorted coordinates in a virtual normalized image plane
f(·) – Polynomial
R – Rotation matrix
T – Translation matrix
σ – Standard deviation
Chapter 1 Introduction
Robots and unmanned systems are on the rise in the new era of artificial intelligence and the fourth industrial revolution [1–13]. Advances in digital image processing have improved the development and the intelligence of robots and made them more popular [14]. One of the main reasons we do not have mobile robots around us today is the difficulty of processing and interpreting exteroceptive information from the world. This is key for mobile robotics, since a robot must understand its environment before it can interact with it.
The quality and performance of applications that employ any form of sensing depend largely on the precision and accuracy of the sensors integrated into the robotic system. Unfortunately, measurements from any type of sensor are always corrupted by noise; in many cases, the characteristics of the physical sensor (e.g., its transfer function in the case where it is modeled as a linear time-invariant system) deviate from the ideal characteristics. Therefore, there is a constant technological demand for more accurate and precise sensors.
However, increasing the accuracy and precision of sensors may come at a cost; for example, it may require modifying the hardware design of the sensing part, which typically increases the cost of the sensors and in turn potentially limits their applications. This intuitively explains why there is a strong drive to increase the accuracy and precision of sensors not by modifying their hardware but through statistical signal processing tools. In other words, assuming the availability of extra computational resources or information, one may implement calibration techniques.
A calibrated sensor that provides depth perception allows a robot to build a representation of the free space and to avoid collisions, thus creating useful maps for self-navigation, and to perform recognition through shape description. In particular, the above-mentioned applications can substantially benefit from an approach based on omnidirectional cameras, in the sense that information covering a wide field of interest can be obtained in a single capture. However, with the camera alone (structure from motion) or several cameras (stereo vision), it can be difficult to handle such images for feature extraction because of their high distortion. Moreover, the inherent characteristics of the environment also have a deep impact on the results. For instance, the illumination level and pixel similarity are two well-known factors that may impose several constraints on the feature extraction process. The effect of these limiting characteristics, inherent to real environments, is
difficult to handle and is known as the correspondence problem of stereo vision. A solution can be found by integrating structured light into the system. An overview of the sensing techniques underlying reconstruction methods is given throughout this section.
1.1 2D Mapping and 3D Reconstruction of the Indoor Environment
1.1.1 General reconstruction techniques
Awareness of the mobile robot's 3D surroundings, estimating its own position, defining motion, and understanding depth and range/field of view are integral parts of computer vision. Modern technologies, and computer vision in particular, try to approach the perception of depth and volume achieved by human vision. At the current stage, the capabilities of computer vision are still quite far from those of human vision; nevertheless, many methods have been developed for obtaining 3D information about the surroundings. Rather than discussing them all, this thesis focuses on reconstruction techniques relevant to our topic, namely the reconstruction of the indoor environment.
Indoor reconstruction is a crucial technique in computer vision, contributing to various applications (see Figure 1.1), such as virtual and augmented reality [15, 16]; layout recovery [17, 18]; and mobile navigation [19-21]. A wide variety of approaches and algorithms have been proposed to tackle this complex problem. Perception and sensing are an integral part of reconstructing a previously unknown environment: the robot must be able to estimate the three-dimensional structure of the environment in order to perform useful tasks. In general, reconstruction methods can be based on passive or active sensing techniques, and each has its own relative merits.
Figure 1.1: Applications related to the indoor reconstruction.
Many methods for evaluating the depth of a scene operate similarly to human visual perception:
• stereo vision [22-24];
• structure from motion [25-27];
• shadow-based methods [28-30];
• tactile methods [31-33].
There are also methods based on other physical principles:
• estimating the travel time of light to obstacles and back (time-of-flight) [34-36];
• phase methods, based on the principles of interference and holography [37-39];
• structured light [40-43].
Methods for measuring 3D objects can also be divided into contact methods (mechanical sensors) and non-contact methods (stereo vision; laser and X-ray scanning).
Contact methods obtain information about the object by direct physical contact with its surface. Such devices are used mainly in manufacturing processes and must be sufficiently accurate. Several measurements can be taken to prepare a surface grid for building the future model. One of the main advantages of this approach is a high degree of modeling control, owing to the manual hand movements of the operator. However, moving the probe by hand to obtain a high-quality model can be very slow: the scanning rate does not exceed several hundred hertz, and additional time must then be dedicated to manual adjustments to obtain the final model. Moreover, another shortcoming of contact scanners is the need for direct contact with the surface of the scanned object; consequently, there is a risk of damaging the object during the scanning process.
In recent years, significant progress has been achieved in the development of contactless vision methods. Non-contact methods can be divided into two groups: active and passive vision.
1.1.2 Passive vision
Range sensing is extremely important in mobile robotics, since it is a basic input for successful obstacle avoidance. As discussed in this chapter, a number of sensors are popular in robotics explicitly for their ability to recover depth estimates: ultrasonic sensors, laser rangefinders, and time-of-flight cameras. It is natural to attempt to implement ranging functionality using vision chips as well.
However, a fundamental problem with visual images makes range finding relatively
difficult. Any vision chip collapses the 3D world into a 2D image plane, thereby losing depth
information. If one can make strong assumptions regarding the size of objects in the world, or
their particular color and reflectance, then one can directly interpret the appearance of the 2D
image to recover depth. But such assumptions are rarely possible in real-world mobile robot
applications. Without such assumptions, a single picture does not provide enough information to
recover spatial information.
The general solution is to recover depth by looking at several images of the scene to gain
more information, hopefully enough to at least partially recover depth. The images used must be
different, so that taken together they provide additional information. They could differ in camera
geometry—such as the focus position or lens iris—yielding depth from focus (or defocus)
techniques. An alternative is to create different images, not by changing the camera geometry,
but by changing the camera viewpoint to a different camera position. This is the fundamental
idea behind structure from stereo (i.e., stereo vision) and structure from motion. As we will see,
stereo vision processes two distinct images taken at the same time and assumes that the relative
pose between the two cameras is known. Structure from motion conversely processes two images
taken with the same or a different camera at different times and from different unknown
positions; the problem consists in recovering both the relative motion between the views and the
depth. The 3D scene that we want to reconstruct is usually called structure.
Stereoscopic system.
Nowadays, stereo cameras are commonly available. Stereo cameras work on the principle of two-view triangulation and can be of two types: binocular stereo and structured-light stereo. A binocular-stereo camera is made of two cameras placed side by side. The relative positions and orientations of the two cameras with respect to each other are fixed in such devices and can also be easily calibrated. By analyzing the differences between the images, namely by establishing point correspondences between them, we can determine the distance from the vision system to each point in 3D space. In principle, this method is similar to stereoscopic human vision. The simplest approach is to capture two snapshots of the scene and convert them into a three-dimensional representation. As mentioned earlier, obtaining three-dimensional points requires establishing point correspondences between the two pictures; after that, triangulation makes it possible to determine the depth of the scene. In stereo vision we can identify two major problems:
1. The correspondence problem;
2. 3D reconstruction.
The first consists in matching (pairing) points of the two images that are projections of the same point in the scene. These matching points are called corresponding points or correspondences (Figure 1.2). The correspondence search is based on the assumption that the two images of the same scene do not differ too much, that is, a feature in the scene is expected to appear very similar in both images. Using a suitable image similarity metric, a given point in the first image can be paired with one point in the second image (a minimal matching sketch is given after the list below). The problem of false correspondences makes the correspondence search challenging. False correspondences occur when a point is paired with another that is not its real conjugate. This happens because the assumption of image similarity does not always hold, for instance when the part of the scene to be paired appears under different illumination or geometric conditions. Other problems that make the correspondence search difficult are:
• Occlusions: the scene is seen by two cameras at different viewpoints, and therefore there are parts of the scene that appear in only one of the images. This means there exist points in one image that have no correspondent in the other image.
• Distortion: some surfaces in the scene are not perfectly Lambertian, that is, their behavior is partly specular. Therefore, the intensity observed by the two cameras differs for the same scene point, the more so the farther apart the cameras are.
• Projective distortion: because of perspective distortion, an object in the scene is projected differently onto the two images, the more so the farther apart the cameras are.
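As an illustration of what such an image similarity metric looks like in practice, the following fragment is a minimal sketch, not part of this thesis's pipeline: it matches the patch around a left-image pixel along the same row of a rectified right image by minimizing the sum of squared differences (SSD). The function name and parameters are hypothetical, and practical matchers add subpixel refinement, smoothness constraints, and left-right consistency checks.

```python
import numpy as np

def match_along_row(left, right, u, v, patch=5, max_disp=64):
    """Estimate the disparity of pixel (u, v) in a rectified grayscale pair
    by minimizing the sum of squared differences (SSD) between patches."""
    h = patch // 2
    ref = left[v - h:v + h + 1, u - h:u + h + 1].astype(float)
    best_disp, best_cost = 0, np.inf
    for disp in range(max_disp + 1):          # candidate disparities
        if u - disp - h < 0:                  # stop at the image border
            break
        cand = right[v - h:v + h + 1, u - disp - h:u - disp + h + 1].astype(float)
        cost = np.sum((ref - cand) ** 2)      # SSD similarity metric
        if cost < best_cost:
            best_cost, best_disp = cost, disp
    return best_disp
```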
Figure 1.2: Stereo vision.
Knowing the correspondences between the two images, the relative orientation and position of the two cameras, and the intrinsic parameters of the two cameras, it is possible to reconstruct the scene points (i.e., the structure). This reconstruction process requires the prior calibration of the stereo camera; that is, we need to calibrate the two cameras separately to estimate their intrinsic parameters, and we also need to determine their extrinsic parameters, i.e. their relative position and orientation.
Structure from motion (SFM).
This method is applied to evaluate the spatial structure of the scene. SFM aims to establish correspondences between several camera views and to simultaneously estimate the camera positions and the three-dimensional points, for example by the factorization method. However, such methods are useful mainly for offline applications that do not operate in real time, because many frames are required for pose and point computation. Mostly, SFM methods are based on a single camera, where the baseline is determined by methods estimating the camera's own motion, e.g. SLAM (simultaneous localization and mapping) [44, 45]. Correct estimation of the camera position and of the fundamental matrix itself is extremely important for defining the geometry of the scene and computing its three-dimensional structure.
When only two images are captured, the geometric situation can be modeled either by a homography, when the camera undergoes pure rotation or observes a single plane in the scene, or by epipolar geometry, when the camera motion is general and an articulated scene is observed. Measurements in the images, namely the pixel coordinates of the corresponding projections of unknown 3D scene points, can be used to form a system of equations according to the geometric constraints of the appropriate model.
As the systems of equations become more complicated when more than two cameras are involved, there is no closed-form solution to the SFM problem [46]. An obvious drawback of SFM methods is their high sensitivity to noise in the measurements. These methods are also sensitive to various degeneracies of the scene, and specific actions have to be taken to prevent failures [47].
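For reference, the two-view constraints mentioned above can be written compactly. If x̃ and x̃' are corresponding image points in homogeneous pixel coordinates, then

\[
\tilde{\mathbf{x}}' \simeq H\,\tilde{\mathbf{x}} \quad \text{(pure rotation or a planar scene)}, \qquad
\tilde{\mathbf{x}}'^{\top} F\,\tilde{\mathbf{x}} = 0 \quad \text{(general motion)},
\]

where H is a 3×3 homography and F is the fundamental matrix; these are standard relations from multiple-view geometry rather than results specific to this thesis.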
Figure 1.3: Structure from motion.
1.1.3 Active vision
A passive vision system usually consists of one or more cameras. By passive, we mean that no energy is emitted for sensing purposes. The system receives the light reflected from its surroundings passively, and the images are the only input data. On the whole, a passive system works in a similar way to the human visual system, and its equipment is simple and low-cost. Another advantage is that the properties of an object's surface can be analyzed directly, since sensing is based on ambient light reflected off the object surfaces. However, this type of technique suffers from some nontrivial difficulties. Firstly, since passive systems mainly rely on cues from the scene or illumination, e.g. texture or shading, the unknown scene becomes a source of uncertainty; as a result, the major disadvantage is low accuracy and speed. Secondly, extracting enough features and establishing point correspondences from an image sequence is a time-consuming and difficult task in many cases. Usually, there is an awkward trade-off between disparity and the overlapping area. With a large overlapping area between the images, it is easy to extract corresponding features, but the 3D reconstruction algorithm is sensitive to noise or even ill-conditioned, since the images have small disparity. On the other hand, the algorithm becomes more robust as the disparity is enlarged, but it then becomes difficult to extract the corresponding features due to occlusion, limited field of view, etc.
While the passive vision system suffers from many such difficulties, the active vision system, especially the structured light system, is designed to overcome or alleviate these problems to a certain degree. In this kind of system, one of the cameras of the passive system is replaced by an external projecting device (e.g. a laser or an LCD/DLP projector). The scene is illuminated by
light patterns emitted from the device and observed by the remaining cameras. Compared with the passive approach, active vision systems are in general more accurate and reliable.
Range cameras.
The current level of electronics development allows the time taken by light to reach an object and return to be measured directly, and the corresponding distance to be computed. 3D scanners working on this principle are called Time-of-Flight (ToF) scanners. This type of scanner illuminates the scene with near-infrared light and has a special CCD/CMOS array that can demodulate the reflected light at every pixel. By measuring the phase difference between the emitted and received amplitude-modulated light at every pixel location, the 3D geometric information of the scene can be captured instantaneously in a single exposure. There is another category of 3D range cameras that is based on the triangulation principle instead; examples from this category include the Microsoft Kinect, Asus Xtion Pro, and Fotonic P70. Some of the advantages and disadvantages of 3D range cameras over traditional photogrammetric systems and TLS instruments are listed below:
Advantages:
• 3D data capture at video frame rates: 3D geometric information can be captured at up to 100 Hz without any scan time delay.
• Active sensor: there is no correspondence issue, as the camera illuminates the scene and acquires dense 3D geometric information in a single exposure.
Disadvantages:
• Short maximum distance: the maximum unambiguous range (typically under 10 m) of a phase-based 3D range camera is limited by the modulation frequency (see the relation after this list). The maximum range of a triangulation-based 3D range camera is restricted by the baseline separation between the emitter and receiver.
• Small FOV: the largest angular FOV of 3D cameras currently on the market is 90°.
• Low measurement accuracy: even in close-range applications, only centimeter-level accuracy should be expected for each pixel due to systematic errors and a low signal-to-noise ratio.
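For the first disadvantage above, the limit can be made explicit with the standard relation (not specific to any particular camera) between the maximum unambiguous range and the modulation frequency f_mod:

\[
R_{\max} = \frac{c}{2 f_{\mathrm{mod}}},
\]

so that, for example, a modulation frequency of 15 MHz gives R_max = (3×10^8 m/s) / (2 × 15×10^6 Hz) = 10 m, consistent with the "typically under 10 m" figure quoted above.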
Figure 1.4: Time-of-Flight scanners.
Devices based on laser range finders (LRF).
The LRF is a sensor that achieves significant improvements over the ultrasonic range sensor owing to the use of laser light instead of sound. This type of sensor consists of a transmitter that illuminates a target with a collimated beam (e.g., a laser) and a receiver capable of detecting the component of light that is essentially coaxial with the transmitted beam. The LRF emits a signal that propagates at a known speed, is reflected from an object, and returns. The time of flight determines the distance traveled: if the speed of the signal is known, multiplying this speed by half the time between emitting the signal and receiving it back gives the distance from the emitter to the object. The basic operating principles of LRFs include pulsed and phase-based distance measurement methods as well as the triangulation method. A mechanical mechanism with a mirror sweeps the light beam to cover the required scene in a plane, or even in three dimensions, using a rotating, nodding mirror.
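In symbols, the pulsed time-of-flight principle described above reduces to the standard relation

\[
d = \frac{c\,\Delta t}{2},
\]

where Δt is the round-trip time of the pulse and c is the speed of light. A range resolution of 1 cm therefore requires resolving Δt = 2 × 0.01 m / (3×10^8 m/s) ≈ 67 ps, which explains the need for the picosecond-resolution electronics mentioned next.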
One way to measure the time of flight for the light beam is to use a pulsed laser and then
measure the elapsed time directly. Electronics capable of resolving picoseconds are required in
such devices, and they are therefore very expensive. Besides the cost of the LRF, attention should also be paid to an error mode that involves coherent reflection of the energy. With light, this occurs only when the beam strikes a highly polished surface. In practice, a mobile robot may encounter such surfaces in the form of a polished desktop, a file cabinet or, of course, a mirror.
Unlike ultrasonic sensors, laser rangefinders cannot detect the presence of optically transparent
materials such as glass, and this can be a significant obstacle in environments like, for example,
museums, where glass is commonly used.
Figure 1.5: Laser range finders.
Structured light.
The use of structured light is one of the most reliable methods for recovering 3D information about a scene. The method is based on projecting a light template (the structured light) onto the scene, which is then viewed by one or several cameras. Since this template is encoded, correspondences between image points and scene points can easily be found. Various templates have been developed for use in structured-light vision systems, ranging from temporally varying pattern sequences to static patterns with a variety of color encoding options [48]. Regardless of how it is created, the projected light has a known structure, and therefore the image taken by the camera can be filtered to identify the pattern’s reflection.
Note that the problem of recovering depth is in this case far simpler than the problem of passive image analysis. In passive image analysis, existing features in the environment must be used to perform correlation, whereas the present method projects a known pattern onto the environment and thereby avoids the standard correlation problem altogether. Furthermore, the structured-light sensor is an active device, so it will continue to work in dark environments as well as in environments where the objects are featureless (e.g., uniformly colored and edgeless). In contrast, stereo vision would fail in such texture-free circumstances.
Figure 1.6: Structured light.
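As an illustration of why the detection step is simple, the sketch below (Python with OpenCV; the file name and the HSV thresholds are placeholders chosen for illustration, not values used in this Thesis) extracts the pixels of a red laser stripe by color thresholding, which is essentially the filtering for the pattern’s reflection described above:

```python
import cv2
import numpy as np

def extract_red_stripe(bgr: np.ndarray) -> np.ndarray:
    """Return a binary mask of pixels likely to belong to a red laser stripe."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so two hue ranges are combined.
    lower = cv2.inRange(hsv, (0, 120, 80), (10, 255, 255))
    upper = cv2.inRange(hsv, (170, 120, 80), (180, 255, 255))
    mask = cv2.bitwise_or(lower, upper)
    # Remove isolated noise pixels with a small morphological opening.
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

if __name__ == "__main__":
    image = cv2.imread("snapshot.png")   # placeholder file name
    stripe = extract_red_stripe(image)
    ys, xs = np.nonzero(stripe)          # pixel coordinates of the stripe
    print(len(xs), "stripe pixels detected")
```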
1.2 Research Challenges
1.2.1 Sensing techniques
As mentioned earlier, passive vision systems suffer from several nontrivial difficulties, which are summarized here. Firstly, passive systems mainly rely on cues from the scene (e.g. texture). They may therefore be less effective in areas consisting of plain walls or long simple corridors, from which few features can be identified and extracted for model reconstruction. Secondly, extracting enough features and establishing point correspondences across an image sequence is, in many cases, a time-consuming and difficult task. As a result, the major disadvantage is low accuracy and speed. Moreover, there is an awkward dilemma between disparity and the overlapping area. With a large overlapping area between the images, it is easy to extract corresponding features, but the 3D reconstruction algorithm is sensitive to noise or even ill-conditioned when the images have small disparity. On the other hand, the algorithm becomes more and more robust as the disparity is enlarged, but it then becomes difficult to extract the corresponding features due to occlusion.
1.2.2 Field of view
Besides sensing techniques, another performance indicator for indoor reconstruction is the field of view (FOV). Conventional visual sensors such as normal or even wide-angle CCD cameras still have relatively modest FOVs, which complicates the reconstruction of the whole surroundings. For example, the ceiling is not usually visible [49], despite being an important component of the main structure of the indoor environment. Therefore, a more recent research direction looks to improve the situation by extending the FOV through the deployment of omnidirectional cameras. The main benefit of the omnidirectional-camera approach is that information covering a wide field of interest can be obtained in a single capture. However, whether a single camera (structure from motion) or several cameras (stereo vision) are used, such images can be difficult to process for feature extraction as a result of their high distortion. Moreover, the inherent characteristics of the environment also have a strong impact on the results. For instance, the illumination level and pixel similarity are two well-known factors that may impose several constraints on the feature extraction process.
The effect of these limiting characteristics, which are inherent to real environments, is difficult to handle and is known as the correspondence problem of stereo vision.
1.2.3 Results validation
In the real environment it is difficult to estimate the error between the real orientations (of the camera and laser plane) and those obtained during the calibration process. The real orientation cannot be measured precisely, because manual inspection can introduce measurement errors. Therefore, analysis of the calibration results becomes more challenging. In addition, it is difficult to compare how one parameter or another may influence the final calibration results.
1.2.4 Structured light
First of all, it is worth mentioning that indoor environments have certain constraints between the floor, walls, and ceiling, which can be taken into consideration for mobile robot navigation with laser illumination. Secondly, in contrast to passive vision systems, active vision systems may rely on energy (e.g. structured light) being projected into the scene intentionally. The main benefit of using structured light for data analysis is its simple detection and extraction from the given image. Furthermore, the image of the indoor environment obtained from both the omnidirectional camera and the projected laser includes more information for data understanding compared with that obtained by means of, for instance, sonar. Indeed, in the former scenario the distance can easily be superimposed on the image, providing a visual representation. The sonar-based approach, on the other hand, can provide distance-related information but lacks a visual representation. Therefore, a vision system for indoor navigation consisting of an omnidirectional camera in combination with structured light has gained widespread attention among scholars, owing to its large scene range and high measurement efficiency. However, an important step here is the calibration between the camera and the laser source. Otherwise, the measurement results may not be as accurate as expected.
1.3 Problem Statement
In order to obtain reliable mapping results, a vision system must be calibrated. This is
mainly due to the fact that without a known relationship between the camera and laser plane, it is
not possible to carry out the measurements in an appropriate way (see Figure 1.7). This statement
can be justified by analyzing several existing works, presented below. Some experiments available in the literature on this topic have been carried out under certain pre-defined assumptions, namely that the camera and laser planes were installed parallel to the floor. A possible reason for these assumptions is that existing calibration techniques are suitable only when a single laser plane is present in the scene, whereas the omnidirectional vision systems presented in the literature [50-52] are based on several laser emitters (see Figure 1.8). This is probably why measurements to the obstacles were carried out on a laser plane adjusted parallel to the floor. Consequently, these experimental results were not as accurate as expected, e.g. see Figure 1.7. In other works, the calibration procedure for the extrinsic parameters was not considered at all, an ideal system being assumed in which calibration methods were not required, which may not be the case in practical systems [53, 54]. At the same time, even small misalignments can lead to incorrect measurements, which is all the more important for an omnidirectional vision system characterized by a wide field of view. Therefore, not only should new calibration techniques be considered, but it is also important to look for new configurations of the vision system that can be calibrated with existing calibration methods.
Figure 1.7: Mapping results when extrinsic calibration was not considered.
Figure 1.8: Configuration of the vision system with several laser emitters.
1.4 Thesis Aims and Objectives
The overall goal of this Thesis is to achieve more accurate and robust 2D and 3D mapping results with less input data from the sensors involved in the vision system. This work is made possible by recent advances in deep learning and by platforms that allow a large amount of annotated training data to be generated in environments with a variety of conditions. A series of three descriptive studies is proposed to systematically approach more advanced measurements for an omnidirectional vision system with laser illumination:
1.4.1 Aim 1 – Literature review
Aim 1 is to carry out a complete literature review, analysis, comparison and evaluation of existing successful schemes for extrinsic calibration of the vision system as well as for 2D and 3D mapping of the indoor environment. We identify the weaknesses of these methods and the research gaps, perform the analysis and state the contributions.
Outcome: with an extensive overview of the related work it will be possible to determine what is known on the topic, how well this knowledge is established and where future research might best be directed.
1.4.2 Aim 2 – A novel omni-vision system
Aim 2 is to investigate a novel omnidirectional vision system with laser illumination in a flexible configuration, using both simulated and real data, in order to eliminate the assumptions previously relied upon for measurements. In the proposed vision system, a flexible configuration means that the camera and laser plane are not assumed to be parallel to the
floor, and the relationship between them can be obtained by means of calibration. The calibration procedure is possible because the proposed vision system consists of a single camera and a single laser emitter. During real experiments it can be difficult or impossible to compare methods with each other because of the lack of ground truth and the measurement uncertainty. In contrast, inside the simulation environment all of the variables are known. Therefore, this research also focuses on the development of a virtual environment, which will be helpful before verifying the vision system by real experiments.
Outcome: with the proposed vision system it will be possible to achieve higher accuracy and reliability of measurements while involving fewer sensors, which makes the system more affordable. As for the virtual environment, the simulator itself will be able to generate photo-realistic images of objects and environments. The simulator will be helpful for comparing methods with each other and for testing theories before carrying out experiments in real environments.
1.4.3 Aim 3 – Novel methods of extrinsic calibration
Aim 3 is to investigate calibration methods for 2D and 3D mapping with more effective and automatic algorithms for computing the extrinsic parameters. Preliminary work has revealed particular limitations of existing calibration methods, such as sensitivity to noise and their complexity. These limitations will be addressed by the proposed calibration methods.
Outcome: with the aid of a deep analysis of existing works on calibration and their limitations, it will be possible to extract only the relevant information for developing new calibration methods for the proposed vision system. Thus, with new calibration targets and new calibration techniques it will be possible to obtain accurate and reliable extrinsic parameters of the vision system from a single input image.
1.4.4 Aim 4 – A novel 3D reconstruction technique
Aim 4 is to investigate reconstruction of the indoor environment with the proposed vision system in combination with semantic segmentation, allowing a 3D model of the indoor environment to be obtained from a single snapshot. This study more directly addresses the practical applications of the proposed vision system. Since structured light is easy to detect in the input image, it will be possible to extract the depth of the scene. By involving a
semantic segmentation network in combination with the depth data it will be possible to
reconstruct the indoor environment.
Outcome: the proposed reconstruction method will significantly improve reconstruction results by eliminating the disadvantages of passive vision systems, which are not effective in non-textured environments. Thus, it is expected that by involving structured light and deep learning, the accuracy and reliability of the reconstruction results will increase significantly.
1.4.5 Tasks
In order to achieve the defined aims of the Thesis, the main tasks are listed below:
1- Comprehensive review of related works, analyzing methods based on their effectiveness and relevance to our topic.
2- Complete understanding of the models of vision systems and their weaknesses; identification of research gaps and issues encountered in existing models. On the basis of these findings, propose a suitable and robust solution for 2D/3D mapping in the indoor environment.
3- To verify and validate the performance of methods in terms of effectiveness and performance, select a suitable virtual environment, or propose a novel one, for carrying out experiments before testing the proposed vision system in the real environment.
4- On the basis of the literature review, propose a method for extrinsic calibration of the vision system that is beneficial for both the efficiency and the accuracy of the calibration parameters.
5- On the basis of the literature review, propose methods for 2D/3D mapping that are beneficial for both the efficiency and the accuracy of the mapping results.
6- Collection of prominent results based on virtual/real data and their comparison with existing traditional and supervised calibration/mapping methods in terms of both efficiency and accuracy.
1.5 Thesis Contributions
The contributions of the Thesis are as follows:
1- A complete review of the related works.
2- A novel omnidirectional vision system with laser illumination in a flexible configuration.
3- A calibration method for 2D mapping, allowing the extrinsic parameters to be obtained from a single snapshot.
4- A calibration method for 3D mapping, allowing the extrinsic parameters to be obtained from a single snapshot.
5- A reconstruction method for the omnidirectional vision system based on laser illumination in combination with semantic segmentation, allowing a 3D model of the indoor environment to be obtained from a single snapshot.
6- A customizable photo-realistic simulator for the CV community working with omnidirectional vision systems with laser illumination in indoor scenarios.
1.6 Thesis Road Map
Chapter-1: Describes the main problems in 2D mapping and 3D reconstruction of the indoor environment. We present the ultimate aim, objectives, motivation and major challenges, and list the possible applications of the research study. The Thesis contributions are then given at the end of the chapter. The remainder of the Thesis is structured as follows:
Chapter-2: Provides a comprehensive literature review of simulators, novel calibration techniques and reconstruction methods. We present the evolution of these methods over the last three decades, together with their types of features and operating techniques. Next, their merits are discussed; on this basis the research problem is identified and appropriate solutions are proposed.
Chapter-3: Considers preliminary work dedicated to understanding the operating principles of state-of-the-art calibration techniques, in order to evaluate their advantages and disadvantages. Here we also present the simulation environment for conducting experiments and testing theories.
Chapter-4: Proposes a novel method of extrinsic calibration and verifies the results by 2D mapping with simulated and real data.
Chapter-5: Introduces another method of extrinsic calibration and verifies the results with the proposed reconstruction technique.
Chapter-6: Summarizes and draws conclusions; we discuss our achievements and contributions. We also suggest the scope for future work from short- and long-term perspectives.
Chapter 2 Literature Review
The content of this Chapter is based on the topic of “Calibration and 3D Reconstruction with a Photo-Realistic Simulator Based on the Omnidirectional Vision System”. In the literature, we study and investigate the new challenges, new requirements and existing key models. This Chapter also summarizes the current domestic and international research status and development trends. On the basis of this comprehensive literature review, the research problem is identified and appropriate solutions to the related issues are proposed.
2.1 Virtual Environment
2.1.1 Introduction
Over the last decades, we have observed a huge increase of interest in engineering and computer science courses, e.g. electrical engineering, electronic engineering and mechanical engineering. More and more students choose this direction for their future career. At the same time, mobile robots are gradually being adopted in these courses. For example, in some universities students are able to acquire valuable experience and useful skills by working with real robots [55]. Teamwork and problem-solving are important aspects of a flexible educational process, as they allow students to satisfy their curiosity by exploring practical tasks. However, some educational centers may suffer from a lack of real robots. Consequently, various online platforms where students can acquire practical skills in the fields of engineering and computer science are becoming popular [56]. These trends make the question of teaching methods for these disciplines more relevant. It is also worth mentioning that students do not always have enough motivation and involvement in the educational process; as a result, we do not always observe high academic performance in those fields. Teaching materials presented in a plain form, overloaded with diagrams and text, may cause the problems defined above. Such standard teaching materials also make the educational process repetitious. Thus, students often lack real, practical experience related to the disciplines they are studying. Mostly, the standard educational process gives only a theoretical understanding of how computing and engineering knowledge can be applied to science, business, industry, and other fields. In standard programming or mathematics classes, students usually solve various problems that are poorly related to their future profession. Moreover, over the past year, the situation with Covid-19 has required the adoption of drastic
and necessary measures by various countries around the world. The Covid-19 pandemic has affected nearly 1.6 billion students in more than 190 countries. In this context, students at different levels of education have witnessed the interruption of the educational process that would otherwise take place under normal conditions at universities and other educational centers. The effects of this interruption have been particularly difficult for engineering students, who are typically required to carry out laboratory experiments in order to conduct certain proofs of concept and to validate research results on a particular subject. Thus, maintaining the educational process during the pandemic has been broadly discussed by scholars [57-60]. By analyzing those works, we can say that the most appropriate way to combine theory with practice under the current situation, and within the trends defined earlier, is the use of simulation platforms.
In our published paper we demonstrated the capability of the simulation environment by comparing different calibration techniques with each other [61]. In real experiments there is measurement uncertainty, which makes the comparison between methods more complicated. Moreover, in real cases it is difficult or impossible to estimate the real values of some of the parameters, e.g. the real location or orientation of the laser plane, whereas inside the simulation environment they are known. Our work also showed that modern game engines (the Unity platform in our case) allow users to create photo-realistic virtual environments, which are suitable for testing theories before experiments are performed in real conditions. In this work we decided to release the simulator to the CV community; to the best of our knowledge, this is the first customizable simulator allowing the investigation of omnidirectional vision systems with laser illumination.
2.1.2 Review on simulators
Many simulators available on the market provide simulations for agents, whether wheeled, legged or aerial (UAV). These simulators are pertinent for such robots because of the features instilled in them by extensive modern research. Some of these simulators are provided free of charge in order to encourage maximum cooperation from skilled contributors to the simulator code; these are commonly known as ‘open-source simulators’, e.g. OpenSim. In contrast, simulators produced and sold commercially are ‘commercially available’ ones, e.g. Cyberbotics’ Webots. All these simulators differ in the features provided and in performance. They differ largely in 3D visualization, cost, fidelity, simulation engines, governing architecture and more.
Commercial Simulators:
Webots: Webots is a 3D mobile-robot simulation software package that facilitates robotics research and robot modeling. It was developed by Cyberbotics Ltd. and EPFL. The Open Dynamics Engine [62] provides accurate physics simulation. Complex worlds can be created using OpenGL technologies and a built-in 3D editor. In addition, 3D models can be imported from other software (MATLAB, LabVIEW, Lisp, etc.) through the VRML standard [63], with the controller and application linked via a TCP/IP interface. Robot controllers can be transferred to real robots such as the Aibo, Lego Mindstorms, Khepera, Koala, Hemisson, Boe-Bot, E-puck and many more [64-67].
Figure 2.1: Webots simulator.
MATLAB: MATLAB is a high-level language and interactive environment for numerical computation, algorithm development, data visualization and data analysis. Simulink [68] creates 3D animations and models motors and sensors. The Robotics Toolbox provides kinematics, dynamics, and trajectory generation. In addition, toolboxes can interact with each other and with other simulators to provide the necessary support [69-74].
Figure 2.2: Robotics System Toolbox.
Microsoft Robotics Developer Studio (MRDS): a Windows-compatible development environment for robot control and simulation across a wide variety of platforms. It is used by academic, hobbyist, and commercial developers. Robot interaction can be achieved using web browsers or Windows-based interfaces built with HTML and JavaScript. Robotics Studio 2008 includes an IDE (Integrated Development Environment) for producing code visually and graphically [75-77].
Figure 2.3: Microsoft Robotics Developer Studio.
Open-Source Simulators:
USARSim: a high-fidelity simulation system designed for automation research and for overcoming modern robot design problems. Originally, it was used for urban search and rescue. It supports multi-robot coordination and human-robot interaction (HRI) through accurate
representation of the robot, the remote environment and the user-interface behavior. It supports a variety of robots, including wheeled, tracked, legged and flying robots. High fidelity and low cost are achieved through the integration of development tools and advanced editing that widens the range of platforms that can be modeled. The system architecture is based on a client/server model consisting of controllers (client), GameBots and the Unreal Engine (server). UnrealScript facilitates the creation of new objects in the game. GameBots communicate between the Unreal Engine and the controllers via a TCP/IP socket interface. The simulation is of three types: environment, sensor and robot. The system is extensively used by the RoboCup Federation, and IEEE supports two robotic competitions and provides robotics education with USARSim [78-80].
Figure 2.4: USARSim simulator.
OpenSimulator: referred to as OpenSim, it is application software, written in C#, enabling the creation of 3D virtual environments. OpenSim is an implementation of the Linden Lab SecondLife [81] server. OpenSim is in the alpha testing phase; however, it is already used formally by educational organizations and companies such as IBM, Microsoft, Nokia and Intel [82-83].
Figure 2.5: OpenSim simulator.
ÜberSim: ÜberSim is an open-source (GPL-released) high-fidelity multi-robot simulation engine primarily intended for the rapid development of robot control systems and their easy transfer to real robots. It is based on a client/server architecture in which the client and server are synchronous at all times. The client and server exchange functions to provide increased interchangeability between the simulation control code and the real robot control code. The high-fidelity simulation engine and extensible robot classes aid the simulation of a wide variety of robot types, ranging from small-size soccer robots to legged robots (Aibo). A simulated robot is modeled with an XML description language, which obviates changes to the control algorithms when physical changes (sensors/actuators) are made to the robot structure [84-85].
Figure 2.6: Robot Soccer Simulator with ÜberSim.
Simbad: an open-source Java 3D simulator aimed at computer simulation of mobile robots rather than accurate real-world simulation. It uses built-in physics simulation. The complete Simbad package comprises a simulation engine with two standalone libraries: a neural network library (PicoNode) and an artificial evolution library (PicoEvo). It works on any system with the Java language and the Java 3D library. Robots operate in time-sharing mode. It is used for research in AI and machine learning pertaining to autonomous robotics [86-88].
Figure 2.7: Simbad simulator.
Breve: Breve is a 3D simulation environment that provides frameworks for the simulation of decentralized systems and artificial life. To enable customization of application functionality, Breve defines a ‘plugin’ architecture that allows users to create plug-ins. The simulation engine does not aim at physically accurate simulation; rather, it aims at making the simulation ‘realistic’, with features such as rigid-body simulation, collision detection/response, and articulated-body simulation. 3D visualization is obtained using an OpenGL display engine. The display engine also produces special effects such as shadows, reflections, lighting, semi-transparent bitmaps, lines connecting neighboring objects, texturing of objects and the ability to treat objects as light sources. It is a free software package released under the GPL license [89-91].
Figure 2.8: Breve simulator.
This subsection has reviewed a wide variety of simulators with different features. The ongoing support and improvement of currently available simulators, as well as the appearance of new ones, show the importance of simulator development in the field of robotics. At the same time, during the literature review we did not find simulators able to simulate omnidirectional vision systems with laser illumination. However, a couple of works from the literature that can be beneficial for the development of such a simulator deserve consideration.
2.1.3 Omni-vision simulation
In the last few decades a wide variety of robotic simulators have been developed commercially or in research laboratories [92], resulting in a considerable number of publications in this area. An exhaustive review is beyond the scope of this dissertation, so this subsection considers only those most relevant to our work, namely the simulators supporting omnidirectional cameras and structured light. Widely used simulators such as Gazebo [93] and USARSim [94] support laser plugins but, unfortunately, do not include omnidirectional cameras. In contrast, in [95, 96] the authors managed to integrate omnidirectional cameras into these simulators. They rendered images of the environment onto the faces of a cube, after which the resulting cube map could be used as a texture for creating a hyperbolic mirror or a fisheye camera. However, such manipulations require certain programming skills, which could be problematic for some users. In recent years NVIDIA released a photorealistic robotics simulator, NVIDIA Isaac Sim [97], and its latest version supports a fisheye camera. However, the use of this simulator is limited to computers with NVIDIA GPUs. Support for multiple platforms is provided by programs such as Blender and Unity.
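For completeness, a simplified sketch of the cube-map idea is given below (Python/NumPy). It only computes, for each pixel of an ideal equidistant fisheye image, the 3D viewing direction that would have to be sampled from the rendered cube map; the actual texture lookup is engine-specific and therefore omitted. The function name, image size and 180° FOV are assumptions for illustration, not details of the works cited above.

```python
import numpy as np

def fisheye_pixel_directions(width: int, height: int, fov_deg: float = 180.0):
    """Viewing direction (unit vector) for every pixel of an ideal equidistant
    fisheye image: the angle from the optical axis is proportional to the
    distance of the pixel from the image center."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    max_r = min(cx, cy)                       # image-circle radius in pixels
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    dx, dy = (u - cx) / max_r, (v - cy) / max_r
    r = np.sqrt(dx ** 2 + dy ** 2)            # normalized radius, 1 at the rim
    theta = r * np.radians(fov_deg) / 2.0     # angle from the optical axis
    phi = np.arctan2(dy, dx)                  # azimuth around the axis
    dirs = np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)
    dirs[r > 1.0] = 0.0                       # pixels outside the image circle
    return dirs                               # (H, W, 3); sample the cube map along these rays

directions = fisheye_pixel_directions(640, 640)
print(directions.shape)  # (640, 640, 3)
```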
In order to generate photorealistic synthetic images, a couple of works [98, 99] considered Blender as the basis for the creation of omnidirectional vision systems (see Figure 2.9). Blender is an open-source suite of tools for 3D modelling, rendering and animation. However, it is not well suited to programming tasks and to communication with other programs, which restricts its use in certain cases. More flexibility is provided by game engines such as Unity, UNIGINE, CRYENGINE, and Unreal Engine 4, which allow their core functionality to be extended with their native programming languages. By taking advantage of modern game engines, P. Bourke released a publicly available fisheye camera with a variable FOV, simulated in Unity [100]. Thus, users familiar with the Unity platform may capture omnidirectional images of their 3D scenes (see Figure 2.9). However, the process of developing a new scene is time-consuming and requires certain skills in Unity. Therefore, in this Thesis we propose a simulator that targets the study of omnidirectional vision systems in indoor environments. No particular Unity skills are needed to use our simulator: it is built like a video game that can simply be installed without any additional dependencies.
Figure 2.9: Ways of simulating the omni-camera.
2.2 Extrinsic Calibration
A critical task in autonomous navigation is acquiring information about the environment. Sensors can be classified as proprioceptive or exteroceptive. A proprioceptive sensor measures values internal to the robot, whereas exteroceptive sensors acquire information about the environment in which the robot operates. This information can be processed in order to extract meaningful environmental features.
However, the relative poses of the sensors with respect to each other are needed to transform all measurements into a common coordinate frame for tasks like localization or sensor fusion. These so-called extrinsic sensor parameters are defined by 3 DoFs (degrees of freedom) for translation and 3 DoFs for rotation (e.g., yaw, pitch and roll components). The situation is even more critical for safety sensors: an error of 1° in the orientation of a safety laser can lead to position errors of up to 0.5 m at a 30 m range. Such a 0.5 m error can endanger the robot or persons moving through the environment, since an obstacle about to collide with the robot may not be detected correctly. Thus, the literature on extrinsic sensor calibration is reviewed throughout this Section.
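The quoted figure can be verified with a one-line calculation (a sketch of the geometry only, not taken from any cited work):

```python
import math

range_m = 30.0
error_deg = 1.0
lateral_error = range_m * math.tan(math.radians(error_deg))
print(f"{lateral_error:.2f} m")  # ~0.52 m, consistent with the 0.5 m figure above
```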
2.2.1 Review on methods
Many applications in the field of mobile robotics employ a variety of sensors, from proprioceptive sensors (such as GPS, Inertial Measurement Units (IMU) or shaft encoders) to exteroceptive sensors (including vision, range or contact devices). In order to exploit efficiently the information provided by such sensorial systems, the sensors must be calibrated so as to interpret the acquired data correctly (intrinsic calibration) and to put all the measurements in a common reference frame (extrinsic calibration). This Chapter is focused on sensor extrinsic calibration. For more specific literature on this topic, the reader is referred to [101].
A structured-light vision system is basically composed of a camera and a laser projector; the working principle is laser triangulation. When the laser plane is projected onto a scene, the camera captures the image of the modulated light stripe. In order to obtain reliable reconstruction results, any vision system must be calibrated. If the sensor is calibrated, the 2D data in the laser plane can be recovered from the image. The goal of sensor calibration is to establish the mapping relationship between the laser plane and the computer image plane, and the key procedure is to collect calibration points. Extrinsic calibration consists of finding the mapping relationship between the laser plane and the camera. The key technique in this stage is to determine calibration points in the laser plane and their corresponding points on the image plane.
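A minimal sketch of this triangulation principle for a pinhole camera is given below (Python/NumPy). The intrinsic matrix and plane parameters are illustrative values, not those of any system described in this Thesis: given a calibrated laser plane expressed in the camera frame, every detected laser pixel is back-projected to a viewing ray and intersected with the plane.

```python
import numpy as np

def triangulate_laser_pixel(pixel, K, plane_n, plane_d):
    """Intersect the camera ray through 'pixel' with the laser plane n . X = d
    (both expressed in the camera frame).  Returns the 3D point."""
    u, v = pixel
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # direction of the viewing ray
    t = plane_d / (plane_n @ ray)                    # scale so that n . (t * ray) = d
    return t * ray

# Illustrative values only.
K = np.array([[400.0, 0.0, 320.0],
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])
n = np.array([0.0, -0.966, 0.259])   # unit normal of the laser plane (camera frame)
d = 0.10                             # plane offset in metres
point_3d = triangulate_laser_pixel((350.0, 300.0), K, n, d)
print(point_3d)
```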
Different calibration techniques have been presented in the literature (see Figure 2.10), such as methods based on raised blocks [102, 103], calibration using a ball as a target [104], and a geometrical calibration method based on the theory of vanishing points and vanishing lines [105]. The most commonly used approach employs a checkerboard pattern in the calibration process [106-120]. The intersections of the laser stripe with the patterns can be obtained, after which the relationship between the camera and the laser plane can be calculated. In order to calibrate
the laser plane, at least three non-collinear points are required; a unique solution can then be obtained by analyzing the extracted laser points belonging to the pattern placed at different positions.
Figure 2.10: Calibration methods for obtaining extrinsic parameters of the vision system.
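The plane-fitting step mentioned above can be implemented in a few lines. The sketch below (Python/NumPy, with illustrative points only) fits the normal n and offset d of the plane n·X = d to a set of 3D laser points in the least-squares sense via SVD:

```python
import numpy as np

def fit_plane(points: np.ndarray):
    """Least-squares plane through 3D points (shape (N, 3), N >= 3, non-collinear).
    Returns the unit normal n and offset d of the plane n . X = d."""
    centroid = points.mean(axis=0)
    # The normal is the direction of least variance of the centred points.
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]
    return n, float(n @ centroid)

# Three (or more) laser points extracted from the calibration pattern(s), illustrative values.
pts = np.array([[0.1, 0.2, 1.0],
                [0.4, 0.2, 1.1],
                [0.2, 0.5, 1.3]])
n, d = fit_plane(pts)
print(n, d)
```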
Zhang and Pless [106] first initialized the extrinsic parameters with a linear approach, using a checkerboard (see Figure 2.11) as a calibration pattern to define a geometric constraint between a 2D laser scanner and a camera. Next, a non-linear optimization procedure was developed to minimize the point-to-plane error, and outlier detection was also implemented, based on the noise extracted from fitting a line to the laser data. Finally, a global optimization procedure minimized the combined reprojection error and point-to-plane error to refine the extrinsic parameters; the Levenberg-Marquardt algorithm was used for both optimization procedures. Note that [106] does not need an initial estimation.
Figure 2.11: Setup Laser – Camera used in Zhang and Pless [106].
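As an illustration of the kind of non-linear refinement used in [106] and in several of the works reviewed below (a sketch under simplifying assumptions, not a reproduction of any particular implementation), the code assumes that laser points are expressed in the laser frame and that, for each view, the calibration-plane normal and offset are known in the camera frame; the point-to-plane residual is then minimized over the 6-DoF laser-to-camera transform with Levenberg-Marquardt:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def point_to_plane_residuals(params, laser_pts, plane_ns, plane_ds):
    """params = [rx, ry, rz, tx, ty, tz]: laser-to-camera rotation vector and
    translation.  Residual = signed distance of each transformed laser point
    to the calibration plane of its view."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    t = params[3:]
    res = []
    for pts, n, d in zip(laser_pts, plane_ns, plane_ds):
        cam_pts = pts @ R.T + t          # laser points expressed in the camera frame
        res.append(cam_pts @ n - d)      # distance to the plane n . X = d
    return np.concatenate(res)

# laser_pts: list of (N_i, 3) arrays, plane_ns: list of unit normals,
# plane_ds: list of offsets -- placeholders to be filled with real observations.
def refine_extrinsics(x0, laser_pts, plane_ns, plane_ds):
    sol = least_squares(point_to_plane_residuals, x0, method="lm",
                        args=(laser_pts, plane_ns, plane_ds))
    return Rotation.from_rotvec(sol.x[:3]).as_matrix(), sol.x[3:]
```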
A similar calibration technique was adopted for an omnidirectional vision system in [107]. Instead of moving the pattern to different positions, the authors used an alternative solution based on two perpendicular checkerboard patterns (see Figure 2.10). The spatial coordinates of the laser line on the two calibration patterns can be obtained
by means of the homography matrix. Afterwards, the plane equation can be fitted to these
coordinates.
Based on [106], Mei and Rives [108] calibrated an omnidirectional camera based on a central catadioptric sensor together with a 2D laser scanner. [108] considered both visible and invisible laser sensors. The first setup (visible laser–camera) can be calibrated using two different approaches: minimization of the reprojection error (association between the visible laser points and the images) or minimization based on the association between the visible laser trace and the images. For the second setup (invisible laser–camera), similarly to [106], [108] minimized the reprojection error considering a checkerboard as the calibration object. [108] does not need an initial estimation.
Although Vasconcelos et al. [109] based their work on [106], they did not use point-to-plane correspondences. Instead, [109] fitted lines to the laser points to compute an initial estimation, and each plane obtained from the calibration object (checkerboard) must pass through these lines, as illustrated in Figure 2.12. Finally, the extrinsic parameters are refined by minimizing the reprojection error using bundle adjustment. [109] noted that a minimum of three different poses and orientations of the checkerboard is required.
Figure 2.12: Setup Laser – Camera used in Vasconcelos et al [109].
Gong et al. [110] used a trihedron (orthogonal or not) as the calibration object to compute the 3D laser scanner–camera extrinsic parameters. [110] defined four constraints: a trihedral constraint (the trihedron defines the relative sensor pose with respect to the world frame), a planarity constraint between two frames (a coplanar point lies on a plane independently of the reference frame), a planarity constraint between two images (correspondence of coplanar features in two different images), and a motion constraint (sensor rigidity). These constraints were combined into a
non-linear least-squares problem, which was then solved with the Levenberg-Marquardt algorithm. Even though [110] does not need initial estimations, the user must select the trihedron planes in the observations.
Gomez-Ojeda et al. [111] used maximum likelihood estimation (MLE) to minimize line-to-plane (rotation) and point-to-plane (translation) errors using an orthogonal trihedron (more restrictive than [110] in terms of the calibration object). The MLE processes were formulated independently. First, the rotation matrix is estimated by formulating the optimization procedure on the tangent space using Lie algebra. Next, the translation component is obtained using the Levenberg-Marquardt algorithm. [111] requires an initial estimation for the extrinsic parameters.
A method that does not require an overlapping field of view (2D laser–camera setup) was proposed by Bok et al. [112]. Two approaches were developed: one, similar to [106], assumed that the checkerboard is perpendicular to a plane detected by the laser (left side of Figure 2.13), and the other assumed that the intersection line of two planes (orthogonal or not) is aligned with the checkerboard’s coordinate system (right side of Figure 2.13). The authors of [112] initialized the extrinsic parameters using the least-squares algorithm. An optimization procedure then refined the extrinsic parameters by minimizing the point-to-plane error for the first approach and the distance error between the line and the feature points for the second. A further cost function was formulated for the second approach to minimize the point-to-plane error. [112] needs five and six different checkerboard poses for the first and second method, respectively.
Figure 2.13: Setup Laser – Camera used in Bok et al [112].
Pereira et al. [113] also calibrated camera–laser (2D or 3D) setups and proposed the
minimization of the reprojection error using the Levenberg-Marquardt algorithm. This error is
relative to a 3D sensor frame (e.g., that of a 3D laser scanner), instead of reprojecting the ball’s center into the camera’s frame. Note that the point-to-point error remains valid for both the 2D and 3D laser–camera setups. Similarly to [113], Guindel et al. [114] implemented the ICP algorithm to calibrate a 3D laser scanner–camera setup with closed-form equations minimizing the point-to-point error, using a planar object with four symmetric circular holes. This error was computed using the correspondences of the holes’ centers between the sensors. [114] needs a stereo-pair configuration (two cameras with a known pose between them).
(two cameras with a known pose between them).
Yousef et al. [115] proposed another method that does not require an overlapping field of view. [115] adapted the robot-world hand-eye calibration problem to the calibration of a 2D laser scanner–camera setup. Different transformations between the laser, camera, floor, and world/checkerboard frames were formulated to perform a Euler parameterization using the Levenberg-Marquardt optimization algorithm. [115] requires that the floor frame remain stationary relative to the calibration object (checkerboard) and be detectable by the laser scanner. As the calibration environment, the authors used the corner of a wall (taking advantage of the orthogonality assumption for the wall planes) with a checkerboard fixed on one of the wall planes. However, [115] has a restriction: it assumes that the laser plane is parallel to the floor (zero pitch and roll).
Kühner and Kümmerle [116] calibrated camera–laser setups (with 2D or 3D laser scanners) using least squares. However, [116] minimized the point-to-ray distance error. The method has the same requirements as for the laser–laser setup, and it was noted that the problem is only fully constrained if scale information is provided by a range measurement (at least one observation). Even though an error function for 2D lasers was proposed, [116] was only tested with 3D laser scanners.
Lastly, Oliveira et al. [117] also estimated the extrinsic parameters for the 2D laser–
camera setup and minimized the reprojection error using bundle adjustment.
2.2.2 Problem formulation
However, some types of omnidirectional vision systems do not meet the requirements of the considered calibration methods. The previously mentioned calibration techniques assume that only one laser plane is present in the scene. However, for mapping, in order to obtain distance information about the environment, omnidirectional vision systems are based on several laser emitters [56-58]. That is why, in the previously mentioned works, the distance measurements to
the obstacles were carried out on a laser plane adjusted parallel to the floor, for which the offset between the origin of the camera and the origin of the laser plane is not significant. Otherwise, once the emitters are tilted, the number of laser planes increases. Calibrating every emitter is a tedious process and, at the same time, hardly feasible, e.g. when the lasers emit the same red light and their stripes cannot be distinguished. In another work, in order to estimate the position of the structured light in an omnidirectional vision system, X. Chen et al. proposed a structured-light calibration method for estimating the 3D information of objects [121]. As omnidirectional imaging technology can capture the light of objects within a 360° field of view via a conic mirror, this method can be used to calibrate the positions of multiple structured lights (emitting the same red light) simultaneously. However, this method is only applicable to point-structured lights, which are not really suitable for navigation and mapping of the indoor environment.
Laser-plane calibration with checkerboard patterns requires additional steps, which make the calibration process more complicated. Namely, for every pattern position two snapshots should be taken: one with the laser beam (for laser extraction) and another without the laser beam (for obtaining the pattern points). If only a single image is used, the pattern points might be extracted incorrectly because of the laser points falling on the pattern. Therefore, in order to obtain reliable measurement results and simplify the calibration process, new configurations of omnidirectional vision systems based on laser illumination must be implemented.
2.3 Reconstruction of the indoor environment
Perception and sensing are an integral part of reconstructing a previously unknown environment: a robot must be able to estimate the three-dimensional structure of the environment in order to perform useful tasks. In general, reconstruction methods can be based on passive or active sensing techniques, and each approach has its own relative merits.
2.3.1 Stereo vision
Numerous techniques have been studied for 3D reconstruction of the indoor environment. A popular and conventional approach to creating a digital representation of the scene is to generate a 3D point cloud (see Figure 2.14) from multiple digital images [122]. In this case, feature points matched between multiple images can be used to determine the camera poses (e.g. with Bundler), and 3D point clouds can subsequently be created. The performance of these methods
depends on being able to reliably detect features in the surroundings; therefore, methods based on passive vision systems may fail in featureless environments.
Figure 2.14: Example of 3D points cloud.
2.3.2 Kinect
On the other hand, with the current popularization of RGB-D cameras such as Microsoft’s Kinect, several techniques have recently been proposed to model scenes with a depth camera [123, 124]. However, a key limitation of the Kinect sensor is its limited FOV (see Figure 2.15). In an attempt to address these deficiencies, F. Tsai et al. proposed a vision system consisting of multiple RGB-D (Kinect) and DSLR cameras [124]. By merging the conventional images with the depth images, the authors were able to reconstruct the environment even in featureless areas (see Figure 2.15). At the same time, the results presented in their work show that even when multiple sensors are involved, there can still be unreconstructed regions, which makes this method less applicable to certain applications, e.g. mobile navigation. This problem might be solved by integrating even more vision sensors into the system, but this increases the computational load and the overall expense of the vision system.
Figure 2.15: 3D reconstruction based on the Kinect sensor. The image above shows single Kinect. The image below
shows multiple Kinects.
2.3.3 Lidars
In order to achieve a wide horizontal FOV with long range and high accuracy, several Kinect sensors can be replaced with a LIDAR sensor [116]. This involves fewer elements in the vision system, which makes it more reliable, but a single LIDAR unit is still generally insufficient for analyzing an indoor scene in the vertical direction (see Figure 2.16). At the same time, vision systems with multiple LIDARs are problematic due to their cost, size, and weight. Several approaches based on a single LIDAR sensor have addressed this problem [125-129]. The general idea of these works, which provide a cost-effective vision system and achieve a wide vertical FOV, is to shift from rigid vision systems to more flexible configurations by rotating the LIDAR sensor (see Figure 2.16). This makes it possible to extract more features of the environment with a single LIDAR unit. Even though these LIDAR-based approaches provide a fully omnidirectional depth-sensing capability, LIDAR is still a relatively costly sensor for the indoor environment. A more cost-effective and lightweight solution is achieved by using a structured-light approach.
Figure 2.16: 3D reconstruction based on the Lidar sensor.
2.3.4 Structured light
Structured light is not only cost-effective and lightweight; it also allows easy detection of the projected features by the camera and the subsequent calculation of depth information via laser triangulation. This approach provides a wide FOV while achieving portability and affordability. Y. Son et al. proposed a tiny palm-sized vision sensor composed of a fisheye camera, structured light and a rotating motor; with the rotational movement, a 3D omnidirectional sensing capability is achieved [130]. Particular attention needs to be paid to the type of encoder used: magnetic encoders, for example, may suffer from nonlinearity problems, and angular position measurement errors may have an adverse impact on the final reconstruction results. P. De Ruvo et al. also proposed a vision system based on a rotating platform [131]. By ensuring accurate control of the angular velocity, the authors achieved a high-precision 3D omnidirectional reconstruction (see Figure 2.17A). However, this type of vision system has a relatively large and complex structure, which is challenging to recreate. Moreover, both methods described above focus on the acquisition and analysis of the individual scans, and a remaining challenge is that the reconstructed models are not textured. To overcome this problem, X. Lian et al. proposed an omnidirectional vision system in which a vertically mounted laser sensor is used to acquire the geometrical data [132]. After joint calibration of the vertical laser sensors and the omnidirectional camera, the authors combined the extracted laser points with the corresponding pixels in the panoramic images. As a result, they were able to reconstruct the 3D model by merging the range data and color
information (see Figure 2.17B). However, reconstructing the whole environment in this way would be time-consuming, as it additionally depends on the movement of the mobile robot. In order to reduce the reconstruction time, it is necessary to review other methods.
Figure 2.17: 3D reconstruction based on the rotated structured light.
2.3.5 Deep learning
In recent years, advances in deep learning have been applied to the structure reconstruction of indoor scenes [17, 18]. The main advantage of these methods is that the 3D layout of an indoor scene can be recovered from an image captured from a single position in space (see Figure 2.18). Their main limitation is that depth data is not involved in the reconstruction procedure. Thus, how to create a reliable 3D digital representation of the indoor environment with less input data from the vision sensors is still a relevant research problem.
Figure 2.18: Layout recovery with deep learning.
2.4 Conclusion
In this Chapter we analyzed the existing methods for the calibration of vision systems and for the subsequent reconstruction of the indoor environment. The particular limitations of those techniques have been listed, together with directions for their improvement. It is also worth mentioning that, since we operate inside the indoor environment, particular attention should be paid to configurations of the vision system, which can be significantly simplified compared to those considered in the literature review. Moreover, in this Thesis we also focus on the creation of a simulation environment, which can be helpful for testing theories before their practical application.
Chapter 3 Omni-vision System Research Platform
3.1 Main Contributions of this chapter
In this Chapter we consider and analyze the most popular calibration method, which is based on perpendicular checkerboard patterns; this method was adopted from [107]. The analysis was carried out in order to understand the limitations of currently existing methods and to propose a more reliable and robust calibration technique. The contributions of this Chapter are five-fold:
• Presenting an improved omni-vision system.
• Calibration of the proposed omni-vision system with existing calibration methods.
• Analysis of existing calibration methods.
• Presenting the simulation environment.
• Testing of the simulation environment with existing navigation algorithms.
3.2 An Improved Omnidirectional Vision System
In this Thesis, in order to eliminate the shortcomings of previous vision systems [50-54], namely the inability to carry out extrinsic calibration, we propose a different configuration. By replacing several laser emitters with a single omnidirectional laser emitter (see Figure 3.1), it is possible to create one single laser plane, which can be calibrated by adopting the algorithms of existing methods [100-120]. A certain modification of the vision system was also made with regard to the camera.
Figure 3.1: Configuration of the vision system. A – fisheye camera Ricoh Theta S. B – omnidirectional laser emitter. C – snapshot captured by the fisheye camera.
Conventional cameras are generally treated as perspective devices (pinhole model), which is convenient for modeling and algorithm design. Moreover, they exhibit small distortions, so the acquired images can easily be interpreted. Unfortunately, conventional cameras suffer from a restricted field of view (see Figure 3.2A). For example, a camera with a 1/3-inch sensor and an 8 mm lens can only provide about 50° in the horizontal direction and 40° in the vertical direction. Thus, a vision system with this kind of camera can easily miss the objects of interest or their feature points in a dynamic environment. Furthermore, it is often difficult to obtain sufficient overlap between different images, which leads to difficulties when performing vision tasks. This problem can generally be solved if the field of view of the system is enlarged. In the literature, several methods have been proposed for increasing the field of view [133]. The first method is to simultaneously use a set of conventional cameras and combine their fields of view. The main problem in this kind of vision system is how to synchronize the cameras. The second method consists of using a moving camera or a system made of multiple cameras [134-138]. However, involving a large number of elements makes the system less robust and reliable, and increases its total cost.
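The dependence of the angular FOV on sensor size and focal length follows from simple geometry; the sketch below (Python) uses nominal sensor dimensions that are assumptions for illustration, not a specification of the camera mentioned in the text:

```python
import math

def angular_fov_deg(sensor_dim_mm: float, focal_length_mm: float) -> float:
    """Full angle of view for one sensor dimension of a pinhole/thin-lens camera."""
    return math.degrees(2.0 * math.atan(sensor_dim_mm / (2.0 * focal_length_mm)))

# Illustrative numbers only: a longer focal length or a smaller sensor narrows the FOV.
f = 8.0                           # focal length, mm
print(angular_fov_deg(7.2, f))    # ~48 deg for a 7.2 mm wide sensor
print(angular_fov_deg(5.4, f))    # ~37 deg for a 5.4 mm tall sensor
```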
Thanks to developments in optics manufacturing and to decreasing prices on the camera market, catadioptric cameras (see Figure 3.2B) and dioptric omnidirectional (fisheye) cameras (see Figure 3.2C) are being used more and more in different research fields. A catadioptric camera combines a conventional camera with mirrors. A fisheye camera is an imaging system that combines a fisheye lens with a conventional camera.
Figure 3.2: Types of cameras. A – perspective camera. B – catadioptric camera. C – fisheye camera.
The camera-lens-mirror system is considered the third category; it combines a traditional camera with a curved mirror of special shape to enhance the sensor’s field of view. Since both reflective (catadioptric) and refractive (dioptric) rays are involved, the system is also called a catadioptric camera system or an omnidirectional system. Compared to the traditional
system with a narrow field of view, such systems have several advantages. Firstly, the search for feature correspondences is easier since the corresponding points do not often disappear from the images; secondly, a large field of view stabilizes the motion estimation algorithms; last but not least, more information about the scene or the objects of interest can be reconstructed from fewer images. Many different types of mirrors can be employed in such a system, including planar, elliptical, parabolic and hyperbolic mirrors. Accordingly, these systems can be categorized into parabolic camera systems (combining a parabolic mirror with an orthographic camera), hyperbolic camera systems (a hyperbolic mirror with a perspective camera), etc. On the other hand, depending on whether or not all the incident rays pass through a single point called the center of projection, the system can be classified as a noncentral or a central projection system. Each has its advantages and disadvantages. In a noncentral system, the relative position and orientation between the camera and the mirror can be arbitrary, which allows zooming and resolution enhancement in selected regions of the image. However, such a flexible configuration results in high complexity of the system model. Therefore, there is no robust linear calibration algorithm for such systems, and their applications often require less accuracy. The resulting imaging systems are termed central catadioptric when a single projection center describes the world-to-image mapping [139-141]. In [142], a projection model valid for the entire class of central catadioptric cameras was proposed. According to this generic model, all central catadioptric cameras can be modeled by a central projection onto a unitary sphere followed by a perspective projection onto the image plane. This model provides a closed-form expression for the projection, which maps 3D space points to image pixels. Hence, the complexity of system modeling is considerably reduced.
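A compact sketch of this generic model is given below (Python/NumPy; the mirror parameter ξ and the camera matrix are illustrative placeholders, not calibrated values). A 3D point is first projected onto the unit sphere, the projection center is then shifted by ξ along the axis, and finally a conventional perspective projection is applied:

```python
import numpy as np

def central_catadioptric_project(X, xi, K):
    """Unified sphere model: project the 3D point X (camera frame) onto the
    unit sphere, shift the projection centre by xi along the z-axis, and
    apply a perspective projection with camera matrix K."""
    Xs = X / np.linalg.norm(X)              # point on the unit sphere
    m = np.array([Xs[0], Xs[1], Xs[2] + xi])
    m = m / m[2]                            # normalized image coordinates
    u = K @ m                               # pixel coordinates (homogeneous)
    return u[:2]

K = np.array([[300.0, 0.0, 320.0],
              [0.0, 300.0, 240.0],
              [0.0, 0.0, 1.0]])
print(central_catadioptric_project(np.array([0.5, -0.2, 2.0]), xi=0.8, K=K))
```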
In nature, most species with lateral eye placement have an almost spherical viewing angle, for example insects. In the vision community, a large field of view can also be achieved with purely dioptric elements. This kind of system falls into the last category, i.e. a camera system with a fisheye lens. The distinct advantage of such a system is that it can itself provide an extremely wide viewing angle. Hence, it can provide images of large areas of the surrounding scene with a single shot. The fundamental difference between a fisheye lens and the classical pinhole lens is that the projection from 3D rays to the 2D image is intrinsically non-perspective in the former, which makes the classical imaging model invalid. As a result, it is difficult to give a
北京理工大学博士学位论文!
50
closed-from model for the imaging process. In the literature, the Taylor expansion model,
rational model or division model is frequently adopted for such vision system.
Miniature dioptric and catadioptric cameras are now used by the automobile industry in
addition to sonars for improving safety, by providing to the driver an omnidirectional view of the
surrounding environment. Miniature fisheye cameras are used in endoscopes for surgical
operations or on board microaerial vehicles for pipeline inspection as well as rescue operations.
Other examples involve meteorology for sky observation. Roboticists have also been using
omnidirectional vision with very successful results on robot localization, mapping, and aerial and
ground robot navigation [143-148]. Omnidirectional vision allows the robot to recognize places
more easily than with standard perspective cameras [149]. Furthermore, landmarks can be
tracked in all directions and over longer periods of time, making it possible to estimate motion
and build maps of the environment with better accuracy than with standard cameras. For an in-
depth study on omnidirectional vision, we refer the reader to [150-152].
After analyzing catadioptric and dioptric cameras in more detail, we decided to integrate a dioptric camera into our vision system. The main disadvantage of catadioptric cameras lies in the images they produce, which have a large dead area in the center (see Figure 3.2 B); this can be a significant drawback. Such sensors also require a mirror, which results in a more cumbersome and more fragile imaging system. Therefore, we decided to integrate a fisheye camera into our vision system (see Figure 3.2).
3.3 Intrinsic Calibration
In this Thesis we propose a calibration method for obtaining the extrinsic parameters of the vision system, which assumes that the intrinsic parameters of the camera are already known. However, it is important to obtain reliable intrinsic calibration results, because the accuracy of the extrinsic parameters depends on the intrinsic ones as well. Therefore, this Section aims to investigate existing calibration methods, evaluate their merits, and select the most reliable one.
3.3.1 Calibration principle
The camera calibration is an important step towards structure from motion, automobile navigation, and many other vision tasks. The intrinsic calibration of a sensor consists of providing a model that interprets the raw measurements, so that the data can be put in correspondence with world properties. Examples of intrinsic parameters are the focal lengths, the skew factor, the principal point, and the distortion parameters. Such parameters are usually provided by the manufacturer; however, it is sometimes crucial to estimate them in order to model deviations from the construction parameters or particular circumstances of the system. The intrinsic parameters of the camera have to be estimated before conducting the extrinsic calibration, since they are needed to interpret the sensor measurements.
The pinhole camera model accompanied by lens distortion models is a fair approximation for most conventional cameras with narrow-angle or even wide-angle lenses [153-155]. However, it is still not suitable for fisheye lens cameras. Fisheye lenses are designed to cover the whole hemispherical field in front of the camera, and the angle of view is very large, about 180°. Therefore, Kannala and Brandt [156] proposed a new model. However, their perspective projection model had a number of drawbacks, so later a more appropriate equidistant projection model was proposed [157]. The equidistant model differs from the regular radial distortion model by representing the distortions from the shifted center of the image. There are a number of other models related to the use of polynomial or rational functions for various projection models: the spherical projection model, where straight lines are projected onto a circle in the image in spherical perspective [158]; the perspective projection model; and the use of polynomial functions to optimize the shape of the distortions [159].
In general, calibration methods can be divided into two groups:
marker-based calibration;
autocalibration.
In turn, marker-based calibration can be divided into calibrations that use 2D patterns [160-165] and calibrations that use 3D patterns [166]. In contrast to marker-based calibration, auto-calibration refers to self-calibration techniques that do not use markers. Most of these techniques require more than one image per scene or a special scene structure, often because they exploit the epipolar geometry between views. Other techniques require the presence of straight lines in the scene, by detecting which the distortion parameters can be determined.
Using different models to calibrate fisheye cameras requires appropriate software to carry out the necessary calculations. In [167], a calibration toolbox is proposed for estimating camera parameters in Matlab. It can be used to calibrate and evaluate parabolic, catadioptric, and dioptric camera types. The dioptric camera model can be used for fisheye cameras, as they only use optics, as opposed to the mirrors used in catadioptric cameras. As described in [168], a projection model was used in which the points are projected onto a unit sphere with a subsequent projection onto a normalized image plane, allowing for a shift of the projection center on the image plane. A flat calibration grid is used for the calibration procedure, and the center of the image is used as the estimate of the principal point for the dioptric camera model. The Mei projection model projects 3D points as shown in the figure below in five steps:
Step 1. The point is projected onto a unit sphere.
Step 2. The points are shifted.
Step 3. The points are projected onto the normalized image plane.
Step 4. The radial offset is added.
Step 5. The point is projected onto the image.
Figure 3.3: Mei’s model projection steps.
Another calibration tool is the Matlab toolbox offered by Scaramuzza [169]. The tool is applicable to any panoramic camera, including fisheye cameras. In many ways it has the same functionality as the one mentioned above, but it allows a more precise adjustment of the external parameters (offset and rotations), as well as the elimination of various internal distortions. During the literature review, an extension of this toolbox was also found [170], with which it becomes possible to achieve more stable, robust, and accurate calibration results. The authors replaced the residual function and introduced a joint refinement of all parameters. In doing so, they achieved more stable, robust, and accurate calibration results and reduced the number of necessary calibration steps from five to three. Experimental results showed a significant performance increase in comparison with other calibration methods [162, 171]. Therefore, this extension is used in this work in order to obtain the intrinsic camera parameters.
3.3.2 Camera model
The camera calibration process can be carried out by using the model proposed by Scaramuzza et al. in [171]. This model assumes that the camera is a central system, i.e., all the perceived rays intersect at a single point. The model is rotationally symmetric with respect to the z-axis. It maps a 2D point (u, v) in a virtual normalized image plane to a 3D vector P (the direction of the ray) through a polynomial function f, as shown in Equations (3.1) and (3.2). The degree of the polynomial can be freely chosen; a fourth-degree polynomial proved to be a good compromise, as stated by Scaramuzza and confirmed in our own experiments.
$$\lambda\,\mathbf{p} = \lambda\,[\,u,\; v,\; f(\rho)\,]^{T} = \mathbf{P}\,\mathbf{X} \qquad (3.1)$$
where λ is a depth scale; the vector p, with components u and v, represents the point in the camera image plane corresponding to a scene point X expressed in homogeneous coordinates; P is the perspective projection matrix. The polynomial f(ρ), which approximates the function that back-projects every pixel point into 3D space, has the following form:
$$f(\rho) = a_0 + a_1\rho + a_2\rho^{2} + \dots + a_N\rho^{N}, \qquad \rho = \sqrt{(u - u_0)^{2} + (v - v_0)^{2}} \qquad (3.2)$$
where $a_0, \dots, a_N$ are the coefficients and N is the degree of the polynomial, both determined by the calibration; $u_0$ and $v_0$ represent the coordinates of the center of the omnidirectional image.
Small misalignments between the sensor, lens, or mirror are modeled using an affine 2D transformation as shown in Equation (3.3), where (u', v') stands for the real distorted coordinates in the sensor image plane, (u, v) are the ideal undistorted ones in a virtual normalized image plane, and c, d, e are the affine transformation parameters:
$$\begin{bmatrix} u' \\ v' \end{bmatrix} = \begin{bmatrix} c & d \\ e & 1 \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix} \qquad (3.3)$$
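As an illustration only, the following Python sketch back-projects a pixel into a viewing ray using the polynomial of Equation (3.2) and the affine correction of Equation (3.3). The coefficient values are made up for the example and would normally come from the intrinsic calibration; how the image center is handled in practice depends on the toolbox, and here it is simply subtracted before the affine inversion.

```python
import numpy as np

def pixel_to_ray(u_d, v_d, a, c, d, e, u0, v0):
    """Back-project a distorted pixel (u_d, v_d) into a 3D ray direction.

    a      -- polynomial coefficients [a0, a1, ..., aN] of f(rho), Eq. (3.2)
    c,d,e  -- affine misalignment parameters, Eq. (3.3)
    u0,v0  -- center coordinates of the omnidirectional image
    """
    # Invert the affine transformation of Eq. (3.3): [u'; v'] = A [u; v].
    A = np.array([[c, d], [e, 1.0]])
    u, v = np.linalg.solve(A, np.array([u_d - u0, v_d - v0]))
    # Evaluate the polynomial f(rho) of Eq. (3.2).
    rho = np.hypot(u, v)
    f_rho = np.polyval(a[::-1], rho)      # a[0] + a[1]*rho + ... + a[N]*rho**N
    ray = np.array([u, v, f_rho])
    return ray / np.linalg.norm(ray)      # unit direction of the viewing ray

# Example with made-up calibration values.
a = [-300.0, 0.0, 6.0e-4, 0.0, 1.0e-9]
print(pixel_to_ray(900.0, 1100.0, a, c=1.0, d=0.0, e=0.0, u0=960.0, v0=960.0))
```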
Equation (3.1) can be rewritten as:
$$\lambda \begin{bmatrix} u \\ v \\ f(\rho) \end{bmatrix} = \mathbf{P}\,\mathbf{X} = \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{r}_3 & \mathbf{t} \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \qquad (3.4)$$
where $\mathbf{r}_1$, $\mathbf{r}_2$, $\mathbf{r}_3$, $\mathbf{t}$ are the column vectors of the transformation matrix related to the rotation and translation parameters, respectively.
The checkerboard pattern is a planar object on which we can detect the X, Y coordinates while the Z value is constant (Z = 0). Since the coordinate Z is zero for every point of the pattern, the column vector $\mathbf{r}_3$ of the matrix P is multiplied by zero and drops out. Therefore, Equation (3.4) transforms to:
$$\lambda \begin{bmatrix} u_j \\ v_j \\ f(\rho_j) \end{bmatrix} = \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{bmatrix}\begin{bmatrix} X_j \\ Y_j \\ 1 \end{bmatrix} \qquad (3.5)$$
Before showing how to determine the extrinsic parameters, it is crucial to eliminate the dependence on the depth scale λ. This is done by multiplying both sides of the equation vectorially by $\mathbf{p}_j$:
$$\lambda\,\mathbf{p}_j \times \mathbf{p}_j = \mathbf{p}_j \times \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{bmatrix}\begin{bmatrix} X_j \\ Y_j \\ 1 \end{bmatrix} \;\;\Longrightarrow\;\; \begin{bmatrix} u_j \\ v_j \\ f(\rho_j) \end{bmatrix} \times \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{bmatrix}\begin{bmatrix} X_j \\ Y_j \\ 1 \end{bmatrix} = \mathbf{0} \qquad (3.6)$$
Now, let us focus on a particular observation i of the calibration pattern. From Equation (3.6), each point $\mathbf{p}_j$ on the pattern contributes three homogeneous equations (we removed the superscript i to facilitate the reading):
$$v_j\,(r_{31}X_j + r_{32}Y_j + t_3) - f(\rho_j)\,(r_{21}X_j + r_{22}Y_j + t_2) = 0 \qquad (3.7)$$
$$f(\rho_j)\,(r_{11}X_j + r_{12}Y_j + t_1) - u_j\,(r_{31}X_j + r_{32}Y_j + t_3) = 0 \qquad (3.8)$$
$$u_j\,(r_{21}X_j + r_{22}Y_j + t_2) - v_j\,(r_{11}X_j + r_{12}Y_j + t_1) = 0 \qquad (3.9)$$
with $\rho_j = \sqrt{u_j^{2} + v_j^{2}}$. Observe that here $X_j$, $Y_j$ and $Z_j$ are known, and so are $u_j$, $v_j$. Also, observe that only (3.9) is linear in the unknowns $r_{11}$, $r_{12}$, $r_{21}$, $r_{22}$, $t_1$, $t_2$. Thus, by stacking all the unknown entries of (3.9) into a vector, we can rewrite Equation (3.9) for L points of the calibration pattern as a system of linear equations:
$$\mathbf{M}\,\mathbf{H} = \mathbf{0} \qquad (3.10)$$
where $\mathbf{H} = [\,r_{11},\, r_{12},\, r_{21},\, r_{22},\, t_1,\, t_2\,]^{T}$ and
$$\mathbf{M} = \begin{bmatrix} -v_1 X_1 & -v_1 Y_1 & u_1 X_1 & u_1 Y_1 & -v_1 & u_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ -v_L X_L & -v_L Y_L & u_L X_L & u_L Y_L & -v_L & u_L \end{bmatrix}$$
A linear estimate of H can be obtained by minimizing the least-squares criterion $\min \lVert \mathbf{M}\mathbf{H} \rVert^{2}$, subject to $\lVert \mathbf{H} \rVert = 1$. This is accomplished by using the Singular Value Decomposition (SVD). The solution of Equation (3.10) is known up to a scale factor, which can be determined unambiguously since the vectors $\mathbf{r}_1$, $\mathbf{r}_2$ are orthonormal. Because of the orthonormality, the unknown entries $r_{31}$, $r_{32}$ can also be computed uniquely. The calibration thus allows us to determine the extrinsic parameters $r_{11}$, $r_{12}$, $r_{21}$, $r_{22}$, $r_{31}$, $r_{32}$, $t_1$, $t_2$ for each pose i of the calibration pattern, except for the translation parameter $t_3$. Once the polynomial $f(\rho)$ is known, the parameter $t_3$ can be estimated as well.
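A minimal sketch of this linear step is shown below: given the stacked matrix M of Equation (3.10), the vector H is taken as the right singular vector associated with the smallest singular value, which minimizes ||MH|| under ||H|| = 1. The matrix values here are random placeholders rather than real calibration data.

```python
import numpy as np

def solve_homogeneous(M):
    """Solve M @ H = 0 in the least-squares sense with the constraint ||H|| = 1."""
    # The minimizer of ||M H|| subject to ||H|| = 1 is the right singular
    # vector of M associated with its smallest singular value.
    _, _, Vt = np.linalg.svd(M)
    return Vt[-1]

# Toy example: M would normally have one row per calibration point (Eq. 3.10).
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 6))
H = solve_homogeneous(M)             # H = [r11, r12, r21, r22, t1, t2] up to scale
print(H, np.linalg.norm(M @ H))
```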
3.3.3 Implementation
This Subsection explains how the intrinsic camera parameters were estimated. For the calibration procedure we followed an approach similar to the one introduced by D. Scaramuzza in [169]. Several important pieces of advice regarding the calibration process were found there, such as the recommended number of checkerboard pattern poses (from 6 to 10 should be enough) and the pattern positions (the pattern should be clearly visible to the camera, while the poses should cover the whole visible area of the camera, e.g., all around the camera lens). Based on these recommendations, 9 snapshots were captured as shown in Figure 3.4. Once these images were
captured, the calibration process was carried out by means of the Toolbox extension [170]. The configurations of the simulation and real environments are depicted in Table 3-1.
Figure 3.4: Pre-calibration procedure: preparing images for the calibration procedure.
Table 3-1: Configuration parameters
  Image resolution, pixels: 1920x1920
  Checkerboard pattern size: 9x6
  Checkerboard square size, mm: 36x36
3.3.4 Results
After the calibration procedure, the above-mentioned Toolbox [169] can be used for evaluating the calibration results. The average error (the mean of the reprojection error computed over all checkerboards) and the sum of the squared reprojection errors are presented in Table 3-2. Figure 3.5 shows the distribution of the reprojection error of each point for all the checkerboards; different colors refer to different images of the checkerboard. The data reveal that more accurate calibration results were obtained in the case of the simulated data. This might be related to the higher image resolution. Moreover, the fisheye camera placed in the simulation environment does not have any distortion, in contrast to the real case.
Table 3-2: The Evaluation of the Intrinsic Camera Calibration
  Average error, pixels: 0.357560
  Sum of squared errors: 115.091981
Figure 3.5: The evaluation of the intrinsic camera calibration.
3.4 Preliminary Work on Extrinsic Calibration
3.4.1 System model
The omnidirectional vision system consists of the fisheye camera and structured light.
The primary objective of this system is to obtain the distance information from the mobile robot
to the surrounding obstacles located nearby. The distance information can be obtained by
processing the laser features from the given image. The camera model was considered in the
previous Section. By taking this information into consideration, the equation of the laser plane
projection can be written in the following way:
$$\begin{bmatrix} u \\ v \\ f(\rho) \end{bmatrix} \times \begin{bmatrix} \mathbf{r}_1^{c} & \mathbf{r}_2^{c} & \mathbf{r}_3^{c} \end{bmatrix}\begin{bmatrix} \mathbf{r}_1^{l} & \mathbf{r}_2^{l} & \mathbf{r}_3^{l} & \mathbf{t}^{l} \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = \mathbf{0} \qquad (3.11)$$
In Equation (3.11) we separated the orientation matrix of the camera, represented by the vectors $\mathbf{r}_1^{c}$, $\mathbf{r}_2^{c}$, $\mathbf{r}_3^{c}$, and the transformation matrix of the laser plane, represented by the vectors $\mathbf{r}_1^{l}$, $\mathbf{r}_2^{l}$, $\mathbf{r}_3^{l}$ and $\mathbf{t}^{l}$. The laser plane is located at a constant distance from the camera optical center, which corresponds to the 3rd row of the column vector $\mathbf{t}^{l}$. Therefore, the world coordinates along the Z-axis (if the camera looks toward the floor, see Figure 3.2) do not change, which means that Z = 0 in Equation (3.11).
Owing to the above-mentioned condition, i.e., Z=0, Equation (3.11) can be transformed into:
$$\begin{bmatrix} u \\ v \\ f(\rho) \end{bmatrix} \times \begin{bmatrix} \mathbf{r}_1^{c} & \mathbf{r}_2^{c} & \mathbf{r}_3^{c} \end{bmatrix}\begin{bmatrix} \mathbf{r}_1^{l} & \mathbf{r}_2^{l} & \mathbf{t}^{l} \end{bmatrix}\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} = \mathbf{0} \qquad (3.12)$$
Moreover, the 1st and 2nd rows of the vector $\mathbf{t}^{l}$ represent the offset between the origin of the camera and the origin of the laser plane, and both are equal to zero. Knowing the value of this offset is important for reconstruction tasks, where the laser plane has to be rotated for each scanning frame. For 2D mapping, however, the laser plane is fixed, and only its orientation and its distance to the camera origin along the Z-axis need to be known. Consequently, for the extrinsic calibration of the vision system we need to estimate the orientation of the camera as well as of the laser plane, and to obtain the distance information between the vision sensors. Afterwards, the results of this calibration can be verified by mapping of the indoor environment.
Equation (3.12) can be rewritten after multiplying the camera rotation matrix by the transformation matrix of the laser plane, so that the relationship between the world and image points takes the following form:
$$\begin{bmatrix} u \\ v \\ f(\rho) \end{bmatrix} \times \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} = \mathbf{0} \qquad (3.13)$$
After carrying out the cross product in Equation (3.13), it can be represented as a series of equations:
$$v\,(h_{31}X + h_{32}Y + h_{33}) - f(\rho)\,(h_{21}X + h_{22}Y + h_{23}) = 0 \qquad (3.14)$$
$$f(\rho)\,(h_{11}X + h_{12}Y + h_{13}) - u\,(h_{31}X + h_{32}Y + h_{33}) = 0 \qquad (3.15)$$
$$u\,(h_{21}X + h_{22}Y + h_{23}) - v\,(h_{11}X + h_{12}Y + h_{13}) = 0 \qquad (3.16)$$
The world points X, Y can be calculated from the above equations. Equations (3.15) and (3.16) can be rewritten as the linear system
$$\begin{cases} a_1 X + b_1 Y + c_1 = 0 \\ a_2 X + b_2 Y + c_2 = 0 \end{cases} \qquad (3.17)$$
where $a_1 = f(\rho)h_{11} - u\,h_{31}$, $b_1 = f(\rho)h_{12} - u\,h_{32}$, $c_1 = f(\rho)h_{13} - u\,h_{33}$, $a_2 = u\,h_{21} - v\,h_{11}$, $b_2 = u\,h_{22} - v\,h_{12}$, $c_2 = u\,h_{23} - v\,h_{13}$.
Finally, each world coordinate of the laser's projection is represented by the distance from the camera to the laser plane (the Z-coordinate) and by the two coordinates (X, Y) obtained after solving the system of Equations (3.17):
$$X = \frac{-c_1 - b_1 Y}{a_1}; \qquad Y = \frac{a_2 c_1 - a_1 c_2}{a_1 b_2 - a_2 b_1} \qquad (3.18)$$
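For clarity, the small sketch below solves the linear system of Equation (3.17) for one laser pixel, given the entries h_ij of the combined matrix of Equation (3.13) and the value f(ρ) of that pixel; the numeric inputs are placeholders.

```python
import numpy as np

def laser_pixel_to_world(u, v, f_rho, H):
    """Recover the world point (X, Y) on the laser plane for one image pixel.

    H is the 3x3 matrix of Eq. (3.13), obtained by multiplying the camera
    rotation matrix with the transformation matrix of the laser plane.
    """
    # Coefficients of Eq. (3.17), derived from Eqs. (3.15) and (3.16).
    a1 = f_rho * H[0, 0] - u * H[2, 0]
    b1 = f_rho * H[0, 1] - u * H[2, 1]
    c1 = f_rho * H[0, 2] - u * H[2, 2]
    a2 = u * H[1, 0] - v * H[0, 0]
    b2 = u * H[1, 1] - v * H[0, 1]
    c2 = u * H[1, 2] - v * H[0, 2]
    # Solve the 2x2 linear system a*X + b*Y + c = 0 (Eq. 3.18).
    A = np.array([[a1, b1], [a2, b2]])
    rhs = -np.array([c1, c2])
    X, Y = np.linalg.solve(A, rhs)
    return X, Y

H = np.eye(3)                         # placeholder for the combined matrix
print(laser_pixel_to_world(0.3, -0.1, -1.0, H))
```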
3.4.2 Calibration procedure
Step 1. Preparing images for the calibration procedure:
- the first image should contain the calibration target based on two perpendicular checkerboard patterns (see Figure 3.6). We also placed two additional checkerboard patterns in the environment at distances different from the target; because of the lack of ground truth, these additional patterns are used for the mapping verification of the calculated extrinsic parameters;
- the second image should contain the laser beam crossing the checkerboard patterns (see Figure 3.6).
Figure 3.6: Calibration images. Left image shows the checkerboard patterns alone. Right image shows the checkerboard patterns with the laser beam.
Step 2. Estimating the camera orientation with regard to the target:
- first of all, we have to extract the checkerboard points from the target;
- secondly, as we know the square size of the pattern, we can take this information into consideration and project these image points to the world ones. Here we do not know the orientation of the target; therefore, in the rotation matrix of the camera in Equation (3.12) we set the orientation parameters equal to zero. The projection results are shown below:
Figure 3.7: Initial projection of the target (up- and front-views).
- From the projection presented in Figure 3.7, simple geometry makes it possible to calculate the orientation of the camera (pitch, roll, and yaw) with regard to the target around the corresponding axes X, Y, and Z. Once the calculations are done, the previously defined zeros in Equation (3.12) can be replaced with the found angles. The projection results are shown below:
Figure 3.8: Calibrated projection of the target (up- and front-views).
Step 3. Extracting the laser beam from the second image:
- once we have estimated the position and orientation of the target, we can move to the 2nd image and extract the laser beam from it with the following algorithm:
a. laser strip segmentation by thresholding;
b. applying a morphological operation such as skeletonization;
c. adding a mask containing the checkerboard borders, in order to work only with the region belonging to the pattern;
d. fitting two curves to the extracted laser points belonging to the checkerboard patterns, one for each pattern;
e. collecting the points from these curves in two arrays;
f. after that, the previously processed image points can be projected to the world ones.
The results are shown below:
Figure 3.9: From left to right: Input image; Extracted laser beam; Projection of the laser and points of the left
checkerboard pattern; Projection of the laser and points of the front checkerboard pattern.
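Steps (a)-(c) of the extraction can be prototyped in a few lines. The sketch below uses OpenCV thresholding and scikit-image skeletonization as stand-ins for the operations named above, and the threshold values are illustrative only.

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

def extract_laser_points(bgr_image, mask=None, red_thresh=150, other_thresh=100):
    """Return (row, col) coordinates of the thinned red laser strip."""
    b, g, r = cv2.split(bgr_image)
    # (a) segment the laser strip by thresholding the dominant red channel.
    strip = (r > red_thresh) & (g < other_thresh) & (b < other_thresh)
    # (b) thin the strip to a one-pixel-wide skeleton.
    skeleton = skeletonize(strip)
    # (c) keep only the region belonging to the calibration pattern.
    if mask is not None:
        skeleton &= mask.astype(bool)
    return np.column_stack(np.nonzero(skeleton))

# Example on a synthetic image with a bright red horizontal stripe.
img = np.zeros((200, 200, 3), np.uint8)
img[100:104, :, 2] = 255
print(extract_laser_points(img).shape)
```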
Step 4. Fitting a plane to the laser points:
- a plane is fitted to the laser world points and its inclination is found by simple geometry. As a result, we obtain the orientation of the laser plane (pitch and roll) as well as the distance between the camera and the laser plane.
Figure 3.10: Fitting plane to the laser points.
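As an illustration of Step 4, the sketch below fits a plane z = ax + by + c to the projected laser points by linear least squares and derives inclination angles and the camera-to-plane offset from it. This is a generic fit under a small-angle assumption, not necessarily the exact geometric procedure used in the thesis, and the synthetic data are invented for the example.

```python
import numpy as np

def fit_laser_plane(points):
    """Fit z = a*x + b*y + c to Nx3 laser points; return (pitch, roll, distance)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x, y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)
    # Inclination angles of the fitted plane along the X- and Y-directions,
    # taken here as the pitch and roll of the laser plane (degrees).
    pitch = np.degrees(np.arctan(a))
    roll = np.degrees(np.arctan(b))
    # The offset c is the plane height at the camera origin (x = y = 0).
    return pitch, roll, c

# Synthetic plane tilted by a few degrees, 466 mm below the camera origin.
rng = np.random.default_rng(1)
xy = rng.uniform(-500, 500, size=(200, 2))
z = 0.03 * xy[:, 0] - 0.02 * xy[:, 1] - 466.0 + rng.normal(0, 0.5, 200)
print(fit_laser_plane(np.column_stack([xy, z])))
```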
Step 5. Mapping verification:
- once the extrinsic parameters have been estimated, we can carry out mapping to evaluate the quality of the calibration. In order to highlight the importance of the calibration, we depict the calibrated and non-calibrated maps (see Figure 3.11). From this figure it can be seen that the extrinsic parameters were found correctly, as the calibrated map matches the structure of the obstacles in the input image. A comparison of the experimental distances with the real ones is presented in Table 3-3.
Figure 3.11: Verification by mapping. Left image shows the checkerboard patterns with the laser beam. Middle image shows the non-calibrated map. Right image shows the calibrated map.
Table 3-3: The Evaluation of the Extrinsic Calibration
                      Real   Experiment   Absolute Error
Right obstacle, mm    410    416          6
Bottom obstacle, mm   630    641          11
3.4.3 Discussion
This Subsection discusses the main drawbacks found while working with the calibration method based on the checkerboard patterns:
Table 3-4: Summary of the considered calibration method
Noise: One of the problems of existing techniques that include a target in the calibration procedure [105-120] is that they are not robust to noise. These methods are based on converting images to binary ones; thus, noise pixels may prevent the algorithms from extracting the points of interest from the target. For example, when an image containing the checkerboard pattern in combination with noise was analyzed by the Matlab calibration Toolbox [169], the following message was obtained: "Image omitted -- Not all corners found."
Complexity: Laser plane calibration by checkerboard patterns requires additional steps, which make the calibration process more complicated. Namely, in order to obtain the equation of the laser plane with regard to the camera, the checkerboard pattern has to be moved to different locations. For every pattern position two snapshots should be taken: one with the laser beam (for laser extraction) and another one without the laser beam (for obtaining the pattern points); otherwise, the pattern points might be extracted incorrectly.
Illumination: The image has to be bright in order to extract the points of the checkerboard pattern, but dark in order to extract the laser beam.
Limitations: When the calibration of the vision system with the structured light is carried out with checkerboard patterns, the laser information belonging to the black squares, where the laser beam is not visible, is lost.
In Section 3.1 we presented an improved vision system consisting of a single omnidirectional laser emitter. The proposed vision system showed a positive outcome for mapping of the indoor environment. Recovering the correct geometrical structure of the environment was possible thanks to calibration, which shows the importance of obtaining the extrinsic parameters before conducting measurements. At the same time, particular drawbacks of the state-of-the-art calibration method were found (see Table 3-4). Consequently, it is crucial to propose new calibration methods for obtaining the extrinsic parameters. Moreover, during the experiments in this Section we faced particular difficulties in evaluating the parameters of the extrinsic calibration as well as in providing ground truth data for evaluating the experimental results. Thus, in the next Section we focus on building a simulation environment able to provide more opportunities for testing theories and methods and for validating experimental results.
3.5 Development of the Simulation Environment
3.5.1 Motivation
One of the main goals of the proposed simulation environment is to provide greater opportunities for testing theories and algorithms to researchers and engineers working with omnidirectional vision systems, who might have a software solution but suffer from a lack of hardware. Moreover, empowering omnidirectional vision systems with CV capabilities (e.g., SLAM, path planning, semantic segmentation, etc.) is becoming a very important research direction in the field of mobile robots. However, in real-world applications it might be difficult or even impossible to generate ground truth data for a comparative analysis with the experimental data. For example, the following issues may influence the ground truth data generation: drift of the mobile robot wheels, noise and drift of sensors, accuracy of the semantically labelled data, measurement uncertainty, etc. Considering all the above-mentioned issues, we propose a high-fidelity simulation environment aimed at bridging the gap between simulation and reality, making it relevant for CV researchers and engineers. Our simulation environment allows experiments to be performed in a cost-effective way: compared to real experiments, it is easier and cheaper to set up, works faster, and is more convenient to use. With the proposed simulator it is also possible to generate depth data, semantically labelled data, and path data, which can be tested within the photo-realistic simulated indoor scenarios. This opens up new ways of evaluating performance across a diverse set of experiments. Moreover, the simulator can be used in combination with other educational programs, e.g., Matlab.
Matlab is a piece of software widespread among universities for teaching engineering and computer science courses. It is an interactive environment for programming, data analysis, and the development and visualization of algorithms. An important feature of this program is the possibility of working with a variety of toolboxes, which are able to simulate practical tasks, e.g., to model and program mobile robots. The Robotics System Toolbox [172] developed by the Matlab team and the Robotics Toolbox [173] developed by Peter Corke can both be regarded as compact education systems with tutorials and educational materials. These toolboxes have the following capabilities: planning the trajectory of mobile robots, creating algorithms, localization, monitoring trajectories and, in general, controlling the mobile robot. Their main drawback is the lack of realistic graphics and of an appropriate interaction with objects in the scene. Those drawbacks can be eliminated with our simulation environment. In the experimental part of this Section we show the main capabilities of our simulator by navigating the mobile robot inside the indoor environment in combination with the Robotics System Toolbox and the Robotics Toolbox. In doing so, we not only prepare the testing environments for the next Chapters, but users familiar with those toolboxes will also be able, with the aid of our extension, to obtain a realistic rendering mode and interactive scenarios for operating the mobile robot equipped with the fisheye camera and structured light. Moreover, since we decided to operate with the Robotics System Toolbox and the Robotics Toolbox, we conduct a comparison analysis between them and discuss their main features and capabilities.
3.5.2 Simulation Environment
A simulation environment similar to the real one was built with Unity (see Figure 3.12). Unity is one of the world's most widely used game development platforms. Unity provides a marketplace in which users can contribute and sell assets to the community; e.g., an indoor environment was taken from there and modified for our own needs.
Figure 3.12: Omnidirectional vision system in the real and simulation environments. Left image shows elements of the system. Right image shows a snapshot captured by the fisheye camera.
The omnidirectional vision system proposed in this Thesis consists of the fisheye camera and the laser emitter, and it can capture 360° horizontal scene distance information with one single image (see Figure 3.12). Distances to the obstacles can be calculated by means of the calibrated vision system and the pixels of the laser strip extracted from the given image. The fisheye camera integrated into the simulation environment is mounted above the mobile robot and has a field of view (FOV) of 180°. It is noteworthy that this camera can also be replaced with 210° or 240° FOV cameras [100]. Usually, several laser emitters are used in order to create a continuous projected line covering the field of interest around the mobile robot. The distinctive feature of the proposed vision system is that it has only one laser emitter, which covers the surroundings of the mobile robot. In the simulation environment, the distance between the laser plane and the camera can vary, and both the camera and the laser plane orientations can be changed (pitch, roll, yaw).
3.5.3 Platform features
The proposed simulator can be easily installed and configured on Windows, macOS, and Linux operating systems. The simulator package includes: (1) interaction with scenes and objects, (2) communication with other programs through the Transmission Control Protocol (TCP/IP), (3) fully implemented applications related to the calibration of the vision system as well as 2D/3D mapping. All these features allow researchers to configure their own experimental setups and design better algorithms for these purposes. An attempt has been made to create realistic scenes by using rendering capabilities such as light sources, reflections, shadows, etc. Figure 3.13 shows a snapshot taken by our simulator illustrating these rendering capabilities.
Figure 3.13 shows a snapshot taken by our simulator illustrating these rendering capabilities.
Figure 3.13: Several modes supported by the simulator, from left to right: ordinary mode, semantic labeling mode,
depth mode.
The features listed above can be helpful for testing theories and modelling systems while operating with types of sensors that were not previously included in the Robotics System Toolbox and the Robotics Toolbox. The simulator supports two modes: a manual mode and an automatic mode, the latter representing communication with other software via TCP/IP. In this Thesis we consider navigation inside the indoor environment based on the Matlab toolboxes and the proposed extension; the functions required for these programs are presented in Table 3-5.
Table 3-5: Operating Modes of the Simulator
Action | Manual mode from the user interface (UI) of the simulator | Communication with other programs by TCP/IP
Image capturing | Image can be saved with the corresponding button, Section "Camera" (see Figure 3.13) | Image can be obtained in Matlab via TCP/IP
Movements of the mobile robot | Mobile robot can be controlled by keyboard or input fields, Section "Robot" (see Figure 3.13) | Mobile robot can be controlled in Matlab via TCP/IP
Laser activation | Laser can be controlled via Section "Laser Plane" | Laser can be controlled in Matlab via TCP/IP
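To give an idea of the automatic mode, the snippet below shows a generic TCP client of the kind that could drive the simulator from a script. The port number and the text command format ("LASER ON", "MOVE vx wz", "CAPTURE") are purely hypothetical placeholders; the actual protocol is defined by the simulator itself.

```python
import socket

HOST, PORT = "127.0.0.1", 55000        # hypothetical address of the simulator

def send_command(sock, text):
    """Send one newline-terminated text command and read a short reply."""
    sock.sendall((text + "\n").encode("utf-8"))
    return sock.recv(4096).decode("utf-8").strip()

with socket.create_connection((HOST, PORT), timeout=5.0) as sock:
    print(send_command(sock, "LASER ON"))       # hypothetical command names
    print(send_command(sock, "MOVE 0.2 0.0"))   # linear / angular velocity
    print(send_command(sock, "CAPTURE"))        # request a fisheye snapshot
```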
3.5.4 Capabilities
The simulator consists of several screens: simulation, calibration, and measurement tool.
Simulation is the main screen where experiments take place (see Figure 3.13). Elements included
in the simulation can be configured and controlled with the panels on the left and right. For
example, changing the resolution of the camera, its FOV, activating/changing/moving objects,
tracking of the mobile robot and so on.
The calibration screen consists of the camera and a checkerboard pattern. By moving the camera or the checkerboard pattern it is possible to collect data for the intrinsic calibration of the lens (see Figure 3.14). Similar to the simulation screen, it supports changing the resolution of the camera, its FOV, and the relative size of the checkerboard pattern.
Figure 3.14: The calibration screen.
The measurement screen includes the measurement tool, which is highlighted in yellow
in Figure 3.15. By moving this tool, it is possible to measure distances from the camera to certain
objects in the simulated scene, which may not be presented in the panels of the simulation screen.
In Figure 3.15 this tool was moved to the laser strip associated with the sofa.
Figure 3.15: The measurement screen.
3.5.5 Map transform
This Section considers the process of obtaining the map and transforming it to the formats supported by the Robotics Toolbox and the Robotics System Toolbox for the navigation of the mobile robot. Mapping is possible with the aid of the fisheye camera and laser emitter included in the simulation environment. The system model for the camera with the Z-axis looking down at the floor was described earlier in this Chapter. The camera in Figure 3.13 has a different configuration, namely the Z-axis is looking forward. This means that only the coordinates of obstacles along the Y- and Z-axes change, while the distance from the camera to the laser plane along the X-axis does not change; the latter is represented by the 1st row of the column vector $\mathbf{t}^{l}$. Knowing that only the (Y, Z) coordinates change, we can transform Equation (3.11) into:
$$\begin{bmatrix} u \\ v \\ f(\rho) \end{bmatrix} \times \begin{bmatrix} \mathbf{r}_1^{c} & \mathbf{r}_2^{c} & \mathbf{r}_3^{c} \end{bmatrix}\begin{bmatrix} \mathbf{r}_2^{l} & \mathbf{r}_3^{l} & \mathbf{t}^{l} \end{bmatrix}\begin{bmatrix} Y \\ Z \\ 1 \end{bmatrix} = \mathbf{0} \qquad (3.19)$$
By knowing the intrinsic and extrinsic parameters of the vision system, the image containing the laser beam belonging to obstacles can be transformed into a map. First of all, the laser beam must be detected in the fisheye image and extracted from it. This procedure can be carried out by thresholding, as the laser beam has its unique red color. Once the laser beam has been extracted, we can transform the laser points in the image to world coordinates by Equation (3.19). The transformed coordinates are represented by blue dots (see Figure 3.16). The initial map presented in Figure 3.16 contains negative coordinates of obstacles, which cannot be transformed into the binary map of the Robotics System Toolbox or the logical map of the Robotics Toolbox. Thus, the initial map has to be shifted to eliminate those negative coordinates. The shifted coordinates along the X- and Y-axes can be obtained by Equation (3.20) and Equation (3.21):
$$X_{shifted} = X_i + \lvert X_{min} \rvert \qquad (3.20)$$
$$Y_{shifted} = Y_i + \lvert Y_{min} \rvert \qquad (3.21)$$
where $X_{shifted}$ and $Y_{shifted}$ represent the shifted coordinates; $X_i$ and $Y_i$ represent the current coordinates; $X_{min}$ and $Y_{min}$ represent the minimum coordinates. Finally, the shifted map is shown in Figure 3.16.
Figure 3.16: Left image shows initial map obtained with laser emitter; right image shows shifted map.
Once the shifted map with positive coordinates has been created, it can be converted to the maps of the corresponding toolboxes. Firstly, the shifted map is converted to the format suitable for the Robotics System Toolbox, namely to a binary map, with the aid of the Matlab function "binaryOccupancyMap". The binary map is shown in Figure 3.17. Next, the coordinates of the binary map can simply be converted to the format suitable for the Robotics Toolbox, namely to a logical map, with the aid of the Matlab function "occupancyMatrix". The logical map is shown in Figure 3.17.
Figure 3.17: Left image shows binary map supported by the Robotics System Toolbox; right image shows logical
map supported by the Robotics Toolbox.
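Outside of Matlab, the same shift-and-rasterize step can be sketched in a few lines of Python. The cell size below is an arbitrary example value, and the resulting boolean matrix merely plays the role of the binary/logical maps produced by "binaryOccupancyMap" and "occupancyMatrix".

```python
import numpy as np

def points_to_occupancy(points_xy, cell_size=0.05):
    """Shift laser points to non-negative coordinates and rasterize them."""
    pts = np.asarray(points_xy, dtype=float)
    # Eqs. (3.20)-(3.21): remove negative coordinates by shifting with |min|.
    shifted = pts - pts.min(axis=0)
    # Convert metric coordinates to integer grid indices.
    idx = np.floor(shifted / cell_size).astype(int)
    grid = np.zeros(idx.max(axis=0) + 1, dtype=bool)
    grid[idx[:, 0], idx[:, 1]] = True   # occupied cells
    return grid

laser_points = [(-0.4, 1.2), (0.3, 2.5), (1.0, -0.2)]
occ = points_to_occupancy(laser_points)
print(occ.shape, occ.sum())
```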
Once the input maps for the corresponding toolboxes have been created, we can move to the navigation part, which is considered in the next Section. An important observation concerns the trajectory points generated by those toolboxes: in order to control the mobile robot inside the proposed virtual environment and navigate around the obstacles (see Figure 3.13), the generated trajectory points must be converted back to the range of coordinates of the initial plot (see Figure 3.16).
3.5.6 Experiment setup
The main goal of the experiment was to investigate the compatibility of our simulator with the Robotics System Toolbox and the Robotics Toolbox. It was decided to demonstrate this compatibility by means of a practical task, namely navigation of the mobile robot inside the indoor environment. Additionally, a comparison analysis between these two toolboxes is provided. Before conducting experiments, the vision system must be calibrated; the calibration process itself will be presented in the next Chapter. Since in this Section we are only testing the simulation environment, we assume that the parameters of the extrinsic calibration are known, namely that the laser plane is parallel to the floor and the distance between the camera and laser plane is known. These assumptions can be guaranteed by the virtual environment. The OCamCalib toolbox was used for obtaining the intrinsic parameters of the fisheye camera. More details about the experiment are provided below.
In order to perform navigation of the mobile robot, several obstacles were placed inside the indoor environment (see Figure 3.13). The initial position of the robot is (0; 0) meters and the target position is (-0.2; 2.5) meters (see Figure 3.16). The Robotics System Toolbox uses a binary matrix of obstacles, whereas the Robotics Toolbox uses a two-dimensional matrix. Consequently, the extracted laser beam belonging to obstacles was first converted to the formats readable by these toolboxes. Next, the path between the start and target points was created with the corresponding toolboxes. After that, the trajectory points were converted back and transmitted via TCP/IP to the simulator for controlling the mobile robot. Finally, the experimental target position was compared with the position used for the experiment setup.
3.5.7 Experiment results
Visual results of the conducted experiment are presented in Figure 3.18 for the Robotics System Toolbox and for the Robotics Toolbox, respectively. From these figures it can be seen that the mobile robot reached the desired target position without any visible misalignment. A more detailed comparison is presented in Table 3-6. From the values in Table 3-6 it can be seen that the experimental coordinates are close to the coordinates from the experiment setup; the difference between them is very small, in the range of a few millimeters. Table 3-6 also demonstrates that with the Robotics System Toolbox the mobile robot reached the target position faster than with the Robotics Toolbox.
Figure 3.18: Left image shows navigation to the target position with Robotics System Toolbox; right image shows
navigation to the target position with Robotics Toolbox.
Table 3-6: Comparison Analysis
Parameter                          | Robotics System Toolbox | Robotics Toolbox
Input target position, meters      | (-0.200; 2.500)         | (-0.200; 2.500)
Experiment target position, meters | (0.195; 2.505)          | (0.199; 2.499)
Operating time, seconds            | 67.68                   | 178.85
3.5.8 Discussion
In general, it can be mentioned that the Robotics System Toolbox provides more powerful functionality for modeling and controlling mobile robots. For example, for navigation with the Robotics System Toolbox it was possible to take the dimensions of the robot into consideration in order not to collide with any obstacles. More details about the experiment results and the merits of those toolboxes are discussed below:
Operating time: the experiment results showed that both toolboxes can generate and provide an accurate trajectory path to the target point. Meanwhile, with the Robotics System Toolbox the target point was reached almost three times faster than with the Robotics Toolbox. The reason for this may be related to the format of the navigation map: the Robotics System Toolbox uses a binary matrix, which occupies less space in memory and is consequently processed faster than the ordinary two-dimensional matrix of the Robotics Toolbox.
Path modeling: in order to reach the target, the Robotics Toolbox moves the robot to the neighboring cell with the smallest distance to the target. The process is repeated until the robot reaches a cell with a distance value of zero, which is the target. Meanwhile, the Robotics System Toolbox operates with a more advanced path planner based on the Probabilistic Roadmap (PRM). The PRM path planner constructs a roadmap in the free space of a given map using randomly sampled nodes in the free space and connecting them with each other.
Robot movements: instead of the discrete movements of the mobile robot provided by the Robotics Toolbox, the Robotics System Toolbox can provide more realistic navigation scenarios, namely with the aid of the Pure Pursuit controller. This controller is used to drive the simulated robot along the desired trajectory towards the target point.
Chapter 4 Extrinsic Calibration and 2D Mapping
4.1 Main Contributions of this chapter
The contributions of our work can be summarized as follows. First, we present a novel omnidirectional vision system with laser illumination in a flexible configuration and propose a suitable method for its calibration. In this dissertation, we consider the more realistic scenario, i.e., a flexible configuration where neither the camera nor the laser plane is assumed to be parallel to the floor. As for calibration, we propose a method based on one single snapshot and show that reliable measurements can be obtained for our vision system. It is noteworthy that the term "flexible configuration" is defined differently in another work [174]. C. Paniagua et al. proposed a new omnidirectional structured light system based on a conic pattern light emitter in a flexible configuration, which can be used as a personal assistance system. There, flexible configuration means that the system does not need to be calibrated beforehand; namely, the relationship between the camera and laser plane can be obtained during usage, which is possible by reconstruction of the projected conic and the known distance between the camera and laser emitter. This distance was calculated by attaching a small ball with a known radius to the endpoint of the laser. According to the data presented in their paper, the contour extraction of the ball does not provide an appropriate circular shape, which also shows the difficulty of extracting small objects from omnidirectional images. Thus, adopting calibration procedures such as using a ball as the target [104] to omnidirectional images might not provide reliable calibration results, especially for vision systems where the laser emitter is located at a significant distance from the camera. Other authors presented a self-calibration technique using a projected grid pattern for one-shot scanning to densely measure shapes of objects with a projector and camera [175]. However, these methods are not suitable for line emitters, where the image features are not sufficient for extrinsic self-calibration from one single capture.
4.2 Proposed Method of Extrinsic Calibration
4.2.1 System model
The system model was described in Chapter 3; here we only recall the general equation of the relationship between image and world points, namely Equation (3.11). From this equation we know that the laser plane is located at a constant distance from the camera optical center. This means that the distance to the laser plane along the Z-axis is constant, while the coordinates of the laser beam belonging to obstacles change along the X- and Y-axes (see Figure 3.12). The mapping equation (3.12) corresponds to this particular condition.
4.2.2 Calibration procedure in a general form
In order to place the camera or laser emitter above the mobile robot, intermediate links and joints are required. Therefore, it is difficult to adjust the sensors' orientation in a desirable way, e.g., parallel to the floor, and, as a consequence, small deviations from these assumptions might occur. That is why calibration is important. The calibration process consists of finding the camera rotation matrix and the transformation matrix of the laser plane. In a general form, these parameters can be found efficiently by solving the following optimization problem:
$$\begin{aligned}
&\underset{R_c,\;[R_l\;\mathbf{t}^{l}]}{\arg\min}\; \big\lVert f\big(R_c,\,[R_l\;\mathbf{t}^{l}]\big) \big\rVert^{2} \\
&\text{subject to}\quad f\big(R_c,\,[R_l\;\mathbf{t}^{l}]\big) = \begin{bmatrix} u \\ v \\ f(\rho) \end{bmatrix} \times R_c\,[R_l\;\mathbf{t}^{l}] \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}, \\
&R_c = \begin{bmatrix} \mathbf{r}_1^{c} & \mathbf{r}_2^{c} & \mathbf{r}_3^{c} \end{bmatrix}, \qquad [R_l\;\mathbf{t}^{l}] = \begin{bmatrix} \mathbf{r}_1^{l} & \mathbf{r}_2^{l} & \mathbf{t}^{l} \end{bmatrix}
\end{aligned} \qquad (4.1)$$
4.2.3 Camera Calibration Placed to the Indoor Environment
Equation (4.1) provides substantial insight, in particular that among the camera extrinsic parameters we are only interested in the orientation parameters forming the camera rotation matrix $R_c$, namely pitch, roll, and yaw. Figure 4.1 illustrates the box target developed for the calibration procedure. This target is used to calibrate both the camera and the laser plane; the calibration of the latter is explained in the next step. Similar to the indoor environment, the target has a rectangular structure. Therefore, the calibration process aims at finding the camera orientation with regard to the sides of the target by computing the pitch, roll, and yaw which minimize Equation (3.12), as formulated in Equation (4.1).
Figure 4.1: Calibration target. Green border shows target in a cross-section.
The target has white and black regions, and we are interested in extracting the border between them, which is parallel to the floor. This means that the orientation parameters of the border are equal to zero (the border is used here in order to obtain the relationship between its known orientation and the unknown camera rotation matrix $R_c$). Firstly, we want to show that the projected border points will not have the desired rectangular shape even when only small errors in the camera orientation are present. We therefore set the orientation parameters of the camera in $R_c$ equal to zero. Furthermore, since we are only interested in the orientation parameters, the distance between the camera and the border can be set arbitrarily (the geometry does not change). Once this has been done, the pixel coordinates are projected to the world ones by Equation (3.12) and the mapping result is shown in Figure 4.2 (blue color). In Figure 4.2, it can be seen that the projected points do not have the desired rectangular shape and orientation, because the actual parameters of the camera orientation (pitch, roll, yaw) differ from the zero values previously set in Equation (3.12). The desired shape of the border projection is as follows:
Vectors of the two opposite horizontal sides of the projected border are collinear;
Vectors of the two opposite vertical sides are collinear;
The horizontal and vertical side vectors are orthogonal.
Figure 4.2: Blue – initial projection. Red – projection with the optimized camera pitch and roll.
In order to solve the minimization problem, we use the Matlab function "fmincon", which finds the minimum of a predefined constrained nonlinear function. The minimized pitch can be found by making the two projected horizontal side vectors, denoted $\vec{h}_1$ and $\vec{h}_2$, collinear to each other. The above-mentioned minimization problem can be written in the following way:
$$\begin{aligned}
&\min_{pitch}\; \lVert f(pitch) \rVert^{2} \\
&\text{subject to}\quad f(pitch) = \frac{h_{1,x}}{\sqrt{h_{1,x}^{2} + h_{1,y}^{2}}} - \frac{h_{2,x}}{\sqrt{h_{2,x}^{2} + h_{2,y}^{2}}}, \qquad -30^{\circ} \le pitch \le 30^{\circ}
\end{aligned} \qquad (4.2)$$
where the vectors $\vec{h}_1$ and $\vec{h}_2$ depend only on the pitch; the roll and yaw do not influence their collinearity and are therefore set equal to zero. The limits of ±30° established for the pitch in the minimization problem (4.2), and below for the other orientation parameters, were chosen based on the observation that, for a better performance of the vision system, the orientation parameters of the camera and the laser plane should not be very large, so that a wide range of the indoor environment can be covered. With these limits the minimized orientation parameters can be successfully found.
The minimized roll is found in a similar way as the pitch, but different vectors, the projected vertical side vectors $\vec{v}_1$ and $\vec{v}_2$, are used in this case. These vectors depend on the roll; the pitch and yaw are therefore set equal to zero. The formulation has the following form:
$$\begin{aligned}
&\min_{roll}\; \lVert f(roll) \rVert^{2} \\
&\text{subject to}\quad f(roll) = \frac{v_{1,y}}{\sqrt{v_{1,x}^{2} + v_{1,y}^{2}}} - \frac{v_{2,y}}{\sqrt{v_{2,x}^{2} + v_{2,y}^{2}}}, \qquad -30^{\circ} \le roll \le 30^{\circ}
\end{aligned} \qquad (4.3)$$
Afterwards, the previously defined zero values in Equation (3.12) can be replaced with the optimized pitch and roll. In Figure 4.2 (red color), it can be seen that the projected border points now have an appropriate rectangular shape. However, one parameter is still unknown: the resulting rectangular projection is rotated around the Z-axis (yaw), and our goal is to obtain this angle. The yaw is obtained by minimizing the product of the slopes of the horizontal and vertical vectors $\vec{h}_1$ and $\vec{v}_1$. These vectors depend on the yaw, whereas the pitch and roll are kept constant. The formulation is as follows:
$$\begin{aligned}
&\min_{yaw}\; \lVert f(yaw) \rVert^{2} \\
&\text{subject to}\quad f(yaw) = \frac{h_{1,y}}{\sqrt{h_{1,x}^{2} + h_{1,y}^{2}}} \cdot \frac{v_{1,x}}{\sqrt{v_{1,x}^{2} + v_{1,y}^{2}}}, \qquad -30^{\circ} \le yaw \le 30^{\circ}
\end{aligned} \qquad (4.4)$$
Finally, the desired projection is shown in Figure 4.3.
Figure 4.3: Optimized camera yaw.
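To illustrate the structure of these sequential minimizations, the sketch below recovers a single tilt angle with SciPy's bounded scalar optimizer, standing in for Matlab's "fmincon". The toy projection is a simple pinhole view of a rectangle, not the fisheye model of the thesis, and the names used here are invented for the example; the roll and yaw would be recovered analogously with the residuals of Equations (4.3) and (4.4).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

# Rectangular border lying on the plane Z = 2 in the world frame.
BORDER = np.array([[-1, -1, 2], [1, -1, 2], [1, 1, 2], [-1, 1, 2]], float)
TRUE_PITCH = np.radians(7)                     # unknown camera tilt to recover

# Synthetic observation: normalized image coordinates seen by the tilted camera.
cam_pts = BORDER @ rot_x(TRUE_PITCH).T
obs = cam_pts[:, :2] / cam_pts[:, 2:3]

def back_project(pitch):
    """Re-project the observed corners onto the Z = 2 plane assuming 'pitch'."""
    rays = np.column_stack([obs, np.ones(len(obs))]) @ rot_x(pitch)   # world rays
    return rays[:, :2] * (2.0 / rays[:, 2:3])                         # hit Z = 2

def residual(pitch):
    """Collinearity residual in the spirit of Eq. (4.2): the two projected
    horizontal sides should have the same normalized direction."""
    A, B, C, D = back_project(pitch)
    ab = (B - A) / np.linalg.norm(B - A)
    dc = (C - D) / np.linalg.norm(C - D)
    return np.sum((ab - dc) ** 2)

# Bounded scalar minimization, playing the role of Matlab's fmincon.
res = minimize_scalar(residual, bounds=np.radians((-30, 30)), method="bounded")
print(np.degrees(res.x))                       # close to 7 degrees
```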
4.2.4 Laser Plane Calibration
This Subsection describes how the orientation of the laser plane and the distance between the camera and the laser plane are calculated. The pitch and roll determine the rotation matrix of the laser plane, whereas the distance is associated with the translation represented by $\mathbf{t}^{l}$. First of all, the laser strip has to be extracted from the given image with the aid of the following algorithm:
Laser beam segmentation by thresholding;
Applying a morphological operation such as skeletonization;
Creation of a mask for the laser beam belonging to every side of the target (the corners of the target have blue patterns, see Figure 4.1; the mask is created after their segmentation);
Fitting curves to the extracted laser points belonging to the sides, for noise reduction;
Afterwards, the previously processed image points can be projected to the world ones.
The angle values obtained during the previous step, which are related to the camera orientation, are used here in order to calibrate the laser plane with a similar approach as in the previous Subsection. The only difference is that now we operate with the laser transformation matrix. The initial projection of the laser strip with the known camera rotation matrix and an unknown laser rotation matrix (pitch and roll set equal to zero) is shown in Figure 4.4 (blue color). The laser plane does not have a yaw, which means that once the pitch and roll are calculated, the projected laser beam has an appropriate rectangular shape:
Vectors of the projected laser segments on the two opposite horizontal sides are collinear;
Vectors of the projected laser segments on the two opposite vertical sides are collinear.
Figure 4.4: Blue – initial projection. Red – projection with the optimized laser pitch and roll.
By taking this information into consideration, the minimized pitch of the laser plane can be found by making the two projected side vectors, $\vec{p}_1$ and $\vec{p}_2$, collinear to each other. These vectors depend on the pitch; the roll is initially set equal to zero, whereas $R_c$ is known and constant. The minimization problem can be written in the following way:
$$\begin{aligned}
&\min_{pitch}\; \lVert f(pitch) \rVert^{2} \\
&\text{subject to}\quad f(pitch) = \frac{p_{1,x}}{\sqrt{p_{1,x}^{2} + p_{1,y}^{2}}} - \frac{p_{2,x}}{\sqrt{p_{2,x}^{2} + p_{2,y}^{2}}}, \qquad -30^{\circ} \le pitch \le 30^{\circ}
\end{aligned} \qquad (4.5)$$
The minimized roll is found in a similar way as the pitch, but different vectors, $\vec{q}_1$ and $\vec{q}_2$, are used here. These vectors depend on the roll; the pitch is set equal to zero, whereas $R_c$ is known and constant. The formulation has the following form:
$$\begin{aligned}
&\min_{roll}\; \lVert f(roll) \rVert^{2} \\
&\text{subject to}\quad f(roll) = \frac{q_{1,y}}{\sqrt{q_{1,x}^{2} + q_{1,y}^{2}}} - \frac{q_{2,y}}{\sqrt{q_{2,x}^{2} + q_{2,y}^{2}}}, \qquad -30^{\circ} \le roll \le 30^{\circ}
\end{aligned} \qquad (4.6)$$
Once the roll and pitch of the laser rotation matrix are calculated, the projected laser points have an appropriate shape (see Figure 4.4). However, one parameter related to the transformation matrix of the laser plane is still unknown, i.e., the distance between the camera and the laser plane. The size of the target is known, hence this distance can be obtained. The surface area of the target, S1, can be calculated from the known real distances between its sides. The same area can also be calculated from the projected world coordinates of the laser beam, giving S2. The world points are calculated with the known rotation matrices of the camera and the laser plane, which means that S2 depends only on the distance associated with the translation vector $\mathbf{t}^{l}$. Thus, by minimizing the difference between S1 and S2 in the objective function, it becomes possible to find the dependent variable, which represents the distance between the camera and the laser plane. The aforementioned procedure has the following form:
$$\min_{dist}\; \lVert f(dist) \rVert^{2}, \qquad f(dist) = S_1 - S_2, \qquad S_2 = \big(\bar{X}_{1} - \bar{X}_{2}\big)\cdot\big(\bar{Y}_{3} - \bar{Y}_{4}\big) \qquad (4.7)$$
where $\bar{X}_{1}$, $\bar{X}_{2}$ are the mean X-coordinates of the projected laser points on the two opposite sides parallel to the Y-axis and $\bar{Y}_{3}$, $\bar{Y}_{4}$ are the mean Y-coordinates on the other two sides; the mean values of the coordinates are used in order to obtain the distances between the sides. The rationale behind the use of the mean values is that the projected sides are not strictly parallel to each other, as their coplanarity is only achieved through the minimization.
Figure 4.5 shows the mapping results achieved with the calibrated vision system, representing measurements of the calibration target. It is worth mentioning that the real size of the target in this specific case was set equal to 1084.5x1084.5 mm. Similar results, equal to 1084.9x1084.1 mm, were obtained experimentally through the calibration.
Figure 4.5: Optimized laser distance.
4.3 Experiment Setup
The calibration of the aforementioned extrinsic parameters is complete once all parameters of the camera rotation matrix and the laser transformation matrix in Equation (3.12) are known. These extrinsic parameters cannot be measured precisely in real experiments; as a result, it is difficult to prove that the proposed methodology is reliable and robust, or to compare it with other methods. Thus, the following experiments are carried out in the simulation environment:
4.3.1 Self-evaluation technique
The experiments included in this Section were aimed at examining the robustness of the proposed calibration method with regard to noise and to the location of the target, and at assessing the most appropriate conditions for the calibration process itself. The initial configuration of the vision system is the following: the distances between the parallel sides of the calibration target described above are equal to 1084.5 mm; the distance from the camera to the left side is 646 mm, to the front side is 512 mm, and to the laser plane is 466 mm. The image resolution is 1920x1920 pixels. The calibration procedure was tested under different conditions as well as in different configurations, which are set out below (10 experiments were carried out for each of them):
Adding noise to the border pixels. Salt & pepper noise was added to the image (see Figure 4.). Generally, this type of noise can be caused by defects of the camera sensor, software failure, or hardware failure in image capturing or transmission [176]. The noise was added with the Matlab function "imnoise"; the noise density "d" was increased by 0.005 in each step, so that approximately d × numel(I) pixels are affected, where the function "numel" returns the number of array elements of the image I. The aim is to estimate the robustness of obtaining the orientation parameters of the camera with regard to the noise (a minimal sketch of this kind of noise injection is given after this list);
Adding noise to the laser pixels. Similar salt & pepper noise consisting of RGB pixels was added to the image. However, for the extraction of the laser beam pixels only the red pixels are significant; thus, the noise density "d" was increased to 0.01 in order to expand the number of red pixels included in the noise. The pitch and roll of the laser plane were calculated by Equation (4.5) and Equation (4.6), respectively, with the known constant parameters of the camera orientation inserted into these equations, since here we estimate only the influence of the noise on the calibration of the laser plane;
Changing the distance between the camera and the target in the horizontal direction. The target is moved forward and to the right by 50 mm in each step, and the parameters related to the extrinsic calibration are estimated. As mentioned before, the pitch and roll of the laser plane depend on the camera orientation parameters. Thus, in order not to include redundant information, the analysis of the data concerns the extrinsic parameters of the laser plane, which are based on the camera orientation parameters determined during the calibration process; the next two configurations follow a similar approach. It is also worth mentioning that the parameters of the camera orientation are presented when the proposed method is compared with the state-of-the-art calibration technique, where it can be analyzed how these parameters correlate between the methods;
Changing the distance between the camera and the target in the vertical direction. The camera is moved up by 50 mm in each step, and the parameters related to the extrinsic calibration are estimated;
Changing the size of the target. The distances between the sides of the box target are decreased by 50 mm in each step, and the parameters related to the extrinsic calibration are estimated.
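A minimal sketch of the salt & pepper corruption used in these robustness tests is given below; it mirrors the behaviour of Matlab's "imnoise(I, 'salt & pepper', d)" in NumPy, with the density value d chosen only as an example.

```python
import numpy as np

def salt_and_pepper(image, d=0.05, rng=None):
    """Corrupt approximately a fraction d of the pixels with salt & pepper noise."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    r = rng.random(image.shape[:2])
    noisy[r < d / 2] = 0                    # pepper: black pixels
    noisy[(r >= d / 2) & (r < d)] = 255     # salt: white pixels
    return noisy

img = np.full((1920, 1920), 128, np.uint8)  # gray test image
noisy = salt_and_pepper(img, d=0.05, rng=np.random.default_rng(0))
print(np.mean(noisy != 128))                # fraction of corrupted pixels ~ 0.05
```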
4.3.2 Comparison with the state-of-the-art calibration method
The calibration included in this Section was conducted for 40 different configurations of the vision system generated by the simulation environment, in order to verify how such changes might influence the final calibration results. The orientations of the camera and laser plane were set randomly between -10 and 10 degrees. The goal was to understand the difference between the values of the real extrinsic parameters, contained in $R_c$, $R_l$ and $\mathbf{t}^{l}$, and the ones obtained during the calibration procedure.
The proposed calibration method was compared with the state-of-the-art technique based on two perpendicular checkerboards (the standard method) described in [107]; a similar configuration is shown in Figure 3.6. The configurations of these two methods inside the simulation environment are depicted in Table 4-1.
Table 4-1: The Configurations of the Calibration Methods
Target:
  Proposed – Box target
  Standard – Checkerboard patterns
Configuration of the target:
  Proposed – Distances between the parallel sides of the target are 1084.5 mm
  Standard – Size of the checkerboard is 9x6; the square size is 97x97 mm
Location of the target (both methods):
  Distance from the camera to the left side/pattern is 646 mm and to the front side/pattern is 512 mm
Other configurations (both methods):
  Distance from the camera to the floor is 927 mm and to the laser plane is 466 mm; image resolution is 1920x1920 pixels
4.3.3 2D mapping
The primary objective of the vision system is to obtain correct distance information to the obstacles. In order to compare the mapping results between different methods, four obstacles were placed in the simulation environment at distances different from those of the calibration target. The configuration of the simulation environment is depicted in Table 4-2.
Table 4-2: The Configuration of the Simulation Environment
                                 Left   Bottom   Front   Right
Distance to the obstacles, mm     466      750     992    1584
The rationale behind placing the obstacles at different distances is that this arrangement reveals the performance of the calibration results in the environment as a whole.
4.3.4 Real data
Finally, in order to show that our approach has a practical implementation in real scenarios, conditions similar to those described in the simulation environment were created with real elements (see Chapter 3). The distance between the parallel sides of the calibration target was equal to 390 mm. Images with a resolution of 960x960 pixels were generated by the fisheye camera Ricoh Theta S. A laser emitter with a wavelength of 650 nm was chosen. The distances from the camera to the four obstacles were measured manually because of the lack of ground truth. The configuration of the real environment is depicted in Table 4-3.
Table 4-3: The Configuration of the Real Environment
                                 Left   Bottom   Front   Right
Distance to the obstacles, mm     300      405     490     605
4.4 Experiment Results
The calibration procedure of the vision system and results validation were tested on the
data obtained from the simulated and real environments. Firstly, the results of the intrinsic
camera calibration are shown. Secondly, the error analysis of the extrinsic parameters is
presented. In order to conduct the comparison analysis of the calibration results between
different methods, only error analysis of the extrinsic parameters is required. However, as
mentioned before, the main objective of the vision system is an accurate measurement of the
distance to the obstacles, located around the mobile robot. Thus, the calibration results can also
be compared by means of the mapping of the indoor environment. Aiming at a better understanding, we provide visual data for particular configurations of the vision system and, in order to show the measurement uncertainty, the standard deviation is calculated among all configurations for each of the measured distances to the obstacles. After that, we show the calibration results obtained from real data, where the calibration results are evaluated and the mapping of the indoor environment is presented.
4.4.1 Self-evaluation technique
The proposed method searches for black and white pixels and then extracts the border between them. The extracted border still contains noise, but only the noise that lies within the searching range of the black and white pixels. The noise pixels outside the extracted border can be easily eliminated from the image. Furthermore, the noise points belonging to the border line do not have an adverse impact on the final results, because the fitting curve is mostly determined by the majority of the border points, as shown for one of the sides of the target in Figure 4.6. Part 1 of Table 4-4 also shows that when the noise density is increased, the parameters of the camera orientation do not differ much from the initial state, which does not contain noise.
Figure 4.6: Noise Salt & Pepper. Left image is related with the camera calibration. Right image is related with the
laser plane calibration. Magenta color shows extracted image points. Red curve is fitted among the majority of the
points. Blue points are the curve points, which are used for calibration and are also superimposed on the bottom
image, for better visual understanding.
Table 4-4: The Evaluation of the Intrinsic Camera Calibration
Absolute Error per step (steps 1 to 10)

Part 1. Adding noise to the border pixels (camera)
Pitch, degrees:   0.052  0.053  0.051  0.043  0.044  0.065  0.046  0.053  0.052  0.077
Roll, degrees:    0.033  0.033  0.033  0.034  0.028  0.023  0.015  0.015  0.012  0.038
Yaw, degrees:     0.093  0.093  0.095  0.114  0.120  0.066  0.012  0.074  0.069  0.023

Part 2. Adding noise to the laser pixels (laser plane)
Pitch, degrees:   0.009  0.009  0.002  0.004  0.012  0.033  0.037  0.033  0.030  0.031
Roll, degrees:    0.002  0.002  0.010  0.049  0.028  0.032  0.047  0.060  0.034  0.058
Distance, mm:     2.419  2.406  2.412  2.322  2.238  2.061  2.173  2.293  2.069  2.368

Part 3. Changing distance between the camera and target in the horizontal direction (laser plane)
Pitch, degrees:   0.086  0.101  0.060  0.030  0.053  0.079  0.086  0.081  0.102  0.107
Roll, degrees:    0.050  0.078  0.046  0.023  0.069  0.100  0.101  0.105  0.113  0.145
Distance, mm:     2.243  2.189  2.363  2.495  2.201  2.963  2.870  2.745  2.541  2.625

Part 4. Changing distance between the camera and target in the vertical direction (laser plane)
Pitch, degrees:   0.086  0.063  0.074  0.070  0.059  0.038  0.036  0.039  0.042  0.022
Roll, degrees:    0.050  0.057  0.023  0.010  0.018  0.023  0.032  0.022  0.051  0.034
Distance, mm:     2.243  2.691  2.691  2.915  3.032  3.053  3.039  3.353  3.652  3.426

Part 5. Changing size of the target (laser plane)
Pitch, degrees:   0.086  0.052  0.063  0.004  0.002  0.026  0.056  0.060  0.76   0.075
Roll, degrees:    0.050  0.033  0.080  0.056  0.024  0.058  0.024  0.072  0.057  0.062
Distance, mm:     2.243  2.677  3.024  3.162  3.394  3.466  3.572  3.585  3.584  3.712
Similar to the border between the black and white regions, the laser beam also belongs to the whole side of the target. During this step, a different approach from the camera calibration is used. Namely, the border points for camera calibration were obtained with the separate noise pixels eliminated, whereas for the laser plane calibration the noise pixels outside the laser beam were not removed from the image (see Figure 4.6). However, even in that case the curve is still appropriately fitted among the extracted laser points. Part 2 of Table 4-4 also shows that when the noise density is increased, the parameters of the laser orientation do not differ much from the initial state, which does not contain noise.
Analyzing the experiment related to the changing distance between the camera and target in the horizontal direction, it is worth mentioning that small translations from the initial position provide similar results (see Table 4-4, Part 3). However, once the target was moved farther away, the quality of the calibration parameters slightly decreased. Thus, for the calibration procedure it is recommended to locate the target closer to the camera.
The last two parts of the experiment are related to the movement of the camera up from the target and to the reduction of the target size. Both of them provide quite similar calibration results (see Table 4-4, Part 4 and Part 5). The parameters of the camera orientation remain stable in the different configurations; these results also support the aforementioned statement that the target should be placed closer to the camera along the horizontal direction. However, it can also be seen that if the distance between the camera and the target along the vertical direction is increased or the size of the target is decreased, the error related to the distance between the camera and laser plane becomes bigger. Thus, the recommendation for calibration is to keep the camera closer to the target and to use a target of an appropriate size.
4.4.2 Comparison with the state-of-the-art calibration method
For the extrinsic parameter estimation, the mean absolute error (MAE) and the root mean squared error (RMSE) among all of the configurations of the vision system were calculated for the proposed method and for the standard method based on the checkerboard patterns. These values are depicted in Table 4-5. Moreover, Table 4-5 also includes the values corresponding to the maximum absolute error (AE), which was obtained during a specific experiment (the index of the experiment is shown in brackets). For the experiments corresponding to the maximum AE, mapping of the indoor environment is provided in the next Subsection. Mapping is essential for conducting the visual comparison analysis between the different calibration methods. In Table 4-5, it can be seen that better calibration results were obtained by the proposed method.
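For reference, the sketch below shows how the three statistics of Table 4-5 can be computed from a set of per-configuration absolute errors; the error values are hypothetical and only illustrate the definitions.

```python
import numpy as np

# Hypothetical absolute errors of one parameter over several configurations.
abs_err = np.array([0.03, 0.05, 0.02, 0.11, 0.04])

mae = abs_err.mean()                      # mean absolute error
rmse = np.sqrt((abs_err ** 2).mean())     # root mean squared error
worst = int(abs_err.argmax()) + 1         # 1-based index of the worst configuration
print(f"MAE {mae:.2f}, RMSE {rmse:.2f}, max AE {abs_err.max():.2f} (experiment {worst})")
```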
Table 4-5: The Evaluation of the Extrinsic Calibration
                              Camera Parameters                      Laser Parameters
                     Pitch, deg  Roll, deg   Yaw, deg     Pitch, deg  Roll, deg   Distance, mm
MAE         Standard    0.13        0.11        0.10          0.28        0.20         2.04
            Proposed    0.04        0.02        0.03          0.04        0.04         1.49
RMSE        Standard    0.18        0.18        0.15          0.43        0.31         2.47
            Proposed    0.04        0.03        0.04          0.05        0.06         1.70
Maximum AE  Standard  0.54 (21)   0.61 (10)   0.44 (10)     1.50 (13)   1.17 (10)    7.89 (10)
            Proposed  0.11 (11)   0.10 (13)   0.09 (10)     0.15 (39)   0.19 (13)    2.58 (7)
4.4.3 2D mapping
Table 4-5 listed the indices associated with each particular experiment. Figure 4.7 illustrates the corresponding maps for these configurations. For configurations 7 and 39 both methods provide quite similar results. For the remaining maps, the difference between the methods seems insignificant for obstacles located at small distances from the camera. On the other hand, the difference in the outcome of the two methods is quite significant for obstacles placed far away from the mobile robot. Accordingly, it can be noticed that the proposed method provides better results, outperforming the standard one.
Figure 4.7: Mapping of the indoor environment. Red color: proposed method; blue color: standard method. The index of each experiment is shown with the configuration of the vision system, where the values in brackets are [pitch; roll; yaw] in degrees:
No. 7: camera [0;5.5;0.5], laser plane [0;-1;0]
No. 10: camera [-10;5;7], laser plane [-9;-5;0]
No. 11: camera [9;0;7], laser plane [-1;7;0]
No. 13: camera [0.5;8;9], laser plane [5.5;1;0]
No. 21: camera [0;9;7], laser plane [-0.5;-7.5;0]
No. 39: camera [-2.5;1.5;5], laser plane [-4.5;3.5;0]
For a better visual analysis more mapping samples would be required, and it is difficult to include all of them in the Thesis. Therefore, the main comparison analysis was conducted by estimating the extrinsic parameters, and the measurement uncertainty is calculated below, whereas the mapping of some configurations was added additionally for a better visual understanding of the main goal of the vision system (obtaining the distances to the obstacles in an appropriate way). Table 4-6 shows the AE for the distances from the mobile robot to the obstacles placed in the indoor environment. The experimental distances were calculated as the mean value of all points belonging to the particular obstacle. This approach provides an approximate understanding of how these distances correlate between the different methods. For some configurations the standard method provided more accurate results; however, the proposed method proved its robustness among all the configurations.
Table 4-6: The Evaluation of the Mapping Results
                           Left           Bottom          Front           Right
Real Distance, mm         466.00          750.00          992.00         1584.00
AE, mm: Proposed (Standard)
Exp. 7                  1.19 (0.44)     1.74 (0.74)     3.76 (2.59)     7.00 (0.6)
Exp. 10                 2.73 (8.46)     2.90 (5.63)     4.07 (35.20)    27.10 (113.10)
Exp. 11                 1.81 (4.99)     2.82 (21.15)    13.00 (23.00)   5.30 (60.00)
Exp. 13                 1.67 (15.87)    0.09 (13.22)    0.52 (9.40)     15.30 (159.30)
Exp. 21                 1.21 (16.62)    3.10 (0.32)     7.41 (20.30)    1.50 (134.80)
Exp. 39                 0.98 (1.22)     1.33 (1.67)     8.40 (0.33)     23.10 (3.50)
This Section also evaluates the calibration results by estimating the distances from the
mobile robot to the obstacles placed in the indoor environment. When the configuration of the vision system is changed (orientation of the camera and laser plane), the distance from the camera to the obstacles does not change, because the simulation environment allows rotating the camera around its optical center. By taking this advantage of the simulation environment into consideration, it is possible to evaluate the calibration results by mapping in an appropriate way.
Thus, in order to estimate the quality of the measurements and compare the calibration results
between the different methods, the measurement uncertainty obtained by means of the standard
deviation is calculated in the following way:
\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}}    (4.8)
where x_i is the i-th measured value; \bar{x} is the mean measured value, which is taken among all the laser points belonging to the particular obstacle; and i = 1, ..., N indexes the measurements of the experiment.
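A minimal sketch of Equation (4.8), assuming a hypothetical set of distances measured at the laser points of a single obstacle:

```python
import numpy as np

# Hypothetical distances (mm) of the laser points belonging to one obstacle.
points_mm = np.array([466.8, 467.1, 468.0, 467.5, 467.0])

mean = points_mm.mean()                                                  # mean measured value
sigma = np.sqrt(((points_mm - mean) ** 2).sum() / (points_mm.size - 1))  # Eq. (4.8)
print(f"{mean:.2f} +/- {sigma:.2f} mm")
```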
Table 4-7 presents the complete measurement results for the proposed and the standard
methods, respectively. In this table, it can be seen that the standard deviation becomes bigger
when the distance to the obstacle increases. However, it can also be seen that better measurement results were obtained by the proposed calibration method.
Table 4-7: The Evaluation of the Mapping Results
           Real, mm    Proposed x̄ ± σ, mm    Standard x̄ ± σ, mm
Left        466.00       467.62 ± 0.98          467.84 ± 3.82
Bottom      750.00       751.47 ± 3.45          751.94 ± 6.43
Front       992.00       997.14 ± 2.62          999.67 ± 7.88
Right      1584.00      1589.14 ± 7.15         1597.68 ± 35.64
4.4.4 Real data
In the previous Section, the proposed calibration technique proved its robustness on the simulated data. In order to analyze the practical application of the proposed method in real scenarios, we recreated conditions similar to the simulation environment and carried out the calibration procedure as well as the mapping. It is worth mentioning that during the experiments carried out in real environments we realized that the process of creating the mask for the laser beam belonging to the target sides should be performed in a different way. We attached blue patterns to the corners; however, the illumination of the real environment was not sufficient for their detection. A solution was found by attaching black stripes to the corners, so that the red beam becomes invisible where it falls on these regions (see Figure 4.8). This figure also provides a useful insight: when the calibration of the vision system with structured light is carried out with checkerboard patterns, the laser information belonging to the black squares is lost. The proposed calibration target outperforms the standard one by taking advantage of the continuous laser beam in all directions.
Figure 4.8: Calibration target. Green border shows target in a cross-section.
Figure 4.9 shows the results of the extrinsic calibration. Owing to the lack of ground truth, we cannot compare the experimental results with the real ones in the same manner as was done in the simulation environment. However, we know the real size of the target and we have the final map (see Figure 4.9C). Thus, we can estimate the calibration results by means of this information, as depicted in Table 4-8. The results show that the system was calibrated quite accurately. These results can be improved by achieving more accurate intrinsic camera parameters and by manufacturing more professional targets. A simple box was used in this case, in order to show that the proposed method is accessible to everyone.
Figure 4.9: Extrinsic calibration. A: blue shows the initial projection [0;0;0], red shows the projection with the optimized camera orientation [-3.24;4.70;-3.56]. B: blue shows the initial projection [0;0;0], red shows the projection with the optimized laser orientation [2.81;2.61;0]. C: optimized laser distance 215.96 mm; the box size is equal to 390x390 mm.
Table 4-8: The Evaluation of the Extrinsic Calibration
              Real      Experiment    Absolute Error
Width, mm    390.00       391.57          1.57
Height, mm   390.00       389.88          0.12
The calibration results were also evaluated by means of the mapping of the indoor environment (see Figure 4.10). A more appropriate geometrical structure of the environment was achieved once the system was calibrated. The calibration process is not difficult (only one single image is required) and at the same time it significantly improves the final mapping results. This can indeed be seen in the mapping verification (see Table 4-9).
Figure 4.10: Mapping of the indoor environment. A: input image for mapping. B: blue shows the mapping of the non-calibrated system, red shows the mapping of the calibrated system.
Table 4-9: The Evaluation of the Mapping Results
              Front, mm   Right, mm   Left, mm   Bottom, mm
Real            300.00      405.00     490.00      605.00
Experiment      303.20      411.56     498.27      593.77
AE                3.20        6.56       8.27       11.23
Chapter 5 Extrinsic Calibration and 3D Reconstruction
5.1 Motivation
In the previous Section we presented a novel calibration technique for obtaining the extrinsic parameters between the camera and the laser plane. Our calibration method was based on the box target (see Figure 5.6) and proved its effectiveness and robustness in comparison with other state-of-the-art calibration methods. However, this approach has a limitation: it is not suited to the configuration of the vision system proposed for 3D reconstruction, where the camera is looking forward, so that part of the target required for the calibration procedure is lost (see Figure 5.6). In this Section we consider an improved calibration technique and estimate its robustness in comparison with our previous calibration method.
Figure 5.1: A shows the previous configuration. B shows the proposed configuration.
5.2 System Model
The system model was described in Chapter 3 and only a brief overview is presented in this Section. The configuration of the proposed vision system is different (the camera is rotated, see Figure 5.6). How this difference affects the equations is explained below. World coordinates of the laser plane (X, Y, Z) can be obtained by Equation (3.11). The laser plane is located at a fixed distance from the camera optical center: along the Z-axis in Figure 5.1A and along the X-axis in Figure 5.1B. For the proposed vision system this distance corresponds to the 1st row of the translation column vector. Consequently, the world coordinates along the X-axis do not change, and Equation (3.11) is transformed into Equation (3.19).
5.3 Calibration Procedure
The extrinsic calibration procedure consists of finding the camera rotation matrix and
transformation matrix of the laser plane. These parameters can be found by solving the following
optimization problem:
\{\hat{R}_{C},\,\hat{T}_{L}\} = \arg\min_{R_{C},\,T_{L}} E\left(R_{C}, T_{L}\right)    (5.1)
where R_C is the camera rotation matrix, T_L is the transformation matrix of the laser plane, and the cost E combines the residuals that are minimized step by step in Equations (5.2)-(5.7).
In order to solve this optimization problem for the proposed vision system, an improved calibration target was developed (see Figure 5.2). The main advantage of this target is its versatility, as it can be applied to various configurations of a vision system, and its flexibility, as it can simply be placed in front of the mobile robot. The proposed target allows an extrinsic calibration to be performed by capturing only a single snapshot; this procedure is explained below.
Figure 5.2: The proposed calibration target.
5.3.1 Extrinsic parameters of the camera
This Subsection explains the process of obtaining the parameters forming the camera rotation matrix described in Equation (5.1). In order to obtain the camera extrinsic parameters, first of all, the pixel coordinates belonging to the border
(between white and black regions) of the target are projected by Equation (3.19) to the world
coordinate system (see Figure 5.3A). After that for every parameter, namely for the pitch, roll, and
yaw, the minimization problem can be written as the series of Equations (5.2)-(5.4). The minimized pitch can be found by making two projected border vectors of the target collinear to each other. The formulation has the following form:
\hat{\theta}_{\mathrm{pitch}} = \arg\min_{\theta_{\mathrm{pitch}}} \left| k_{1}(\theta_{\mathrm{pitch}}) - k_{2}(\theta_{\mathrm{pitch}}) \right|    (5.2)
where k_1 and k_2 are the slopes of the two projected border vectors (collinear vectors have equal slopes).
Figure 5.3: A shows the initial projection. B shows the projection with the optimized camera pitch and roll.
The yaw is obtained by minimizing the product of the slopes of two projected border vectors. These vectors depend on the yaw, whereas the pitch obtained during the previous step is kept constant. The formulation is as follows:
\hat{\theta}_{\mathrm{yaw}} = \arg\min_{\theta_{\mathrm{yaw}}} \left| k_{1}(\theta_{\mathrm{yaw}}) \cdot k_{2}(\theta_{\mathrm{yaw}}) \right|    (5.3)
Once the pitch and yaw are known, it is possible to calculate the roll. The roll can be found by minimizing the slope of a projected border vector, whereas the pitch and yaw obtained during the previous steps are kept constant. This minimization problem can be written as follows:
\hat{\theta}_{\mathrm{roll}} = \arg\min_{\theta_{\mathrm{roll}}} \left| k(\theta_{\mathrm{roll}}) \right|    (5.4)
where k is the slope of the projected border vector.
At this point, pixel coordinates belonging to the border of the target can be projected by
Equation (3.19) to the world ones with the pitch, roll, and yaw, determined during the calibration
procedure (see Figure 5.3B). Once the camera is calibrated, we can move to the calibration of the
laser plane.
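The sketch below illustrates the kind of one-dimensional slope search behind Equations (5.2)-(5.4) on synthetic data: a target edge that should be horizontal after projection is observed under an unknown roll, and the roll is recovered as the angle that drives its slope to zero (cf. Equation (5.4)); the pitch and yaw steps follow the same pattern with their respective residuals. The data, helper names, and use of SciPy are illustrative assumptions, not the thesis implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rotate(points, angle_deg):
    """Rotate 2xN points counter-clockwise by angle_deg."""
    a = np.radians(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return R @ points

edge = np.vstack([np.linspace(0.0, 1.0, 20), np.zeros(20)])  # horizontal edge in the world
observed = rotate(edge, 4.2)                                 # unknown roll of 4.2 degrees

def slope_residual(roll_deg):
    p = rotate(observed, -roll_deg)            # undo the candidate roll
    return abs(np.polyfit(p[0], p[1], 1)[0])   # |slope|, cf. Eq. (5.4)

roll = minimize_scalar(slope_residual, bounds=(-10.0, 10.0), method="bounded").x
print(f"recovered roll: {roll:.2f} degrees")
```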
5.3.2 Extrinsic parameters of the laser plane
This Subsection outlines the process of obtaining the parameters forming the transformation matrix of the laser plane, which are part of Equation (5.1), whereas the camera parameters are known and constant. In order to obtain the extrinsic parameters of the laser plane, first of all, the pixel coordinates of the laser beam belonging to the sides of the target are projected by Equation (3.19) to the world ones (see Figure 5.4A). After that, for every parameter, namely for the pitch, roll, and distance between the camera and laser plane, the minimization problem can be written as the series of Equations (5.5)-(5.7). The minimized pitch can be found by making two projected laser vectors of the target collinear to each other. The formulation has the following form:
\hat{\varphi}_{\mathrm{pitch}} = \arg\min_{\varphi_{\mathrm{pitch}}} \left| k_{1}(\varphi_{\mathrm{pitch}}) - k_{2}(\varphi_{\mathrm{pitch}}) \right|    (5.5)
where k_1 and k_2 are the slopes of the two projected laser vectors.
Figure 5.4: A shows the initial projection. B shows the projection with the optimized laser pitch, roll and distance.
Another parameter of the laser plane orientation is the roll. The roll can be found by minimizing the slope of the projected laser vector. This minimization problem can be written as follows:
\hat{\varphi}_{\mathrm{roll}} = \arg\min_{\varphi_{\mathrm{roll}}} \left| k(\varphi_{\mathrm{roll}}) \right|    (5.6)
Once the orientation parameters of the laser plane are known, it is possible to calculate the distance between the camera and the laser plane, which is part of the translation matrix. The real distance D1 between the left and right sides of the target is known. The experimental distance D2 between the sides of the target can be calculated from the image coordinates of the laser beam projected to the world coordinates. Thus, by minimizing the difference between D1 and D2 in the objective function, it is possible to find the dependent variable, which represents the distance between the camera and the laser plane. This procedure takes the following form:
\hat{d} = \arg\min_{d} \left| D_{1} - D_{2}(d) \right|    (5.7)
where d is the distance between the camera and the laser plane and D_2(d) is the distance computed between the projected laser points on the opposite sides of the target.
Afterwards, pixel coordinates of the laser beam can be projected by Equation (3.19) to
the world ones with the pitch, roll, and distance between the camera and laser plane, determined
during the calibration procedure (see Figure 5.4B).
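A small sketch of the search in Equation (5.7), under the simplifying and purely illustrative assumption that the reconstructed width D2 grows linearly with the assumed camera-to-laser distance; the sensitivity value below is made up so that the recovered distance lands near the simulated 466 mm.

```python
from scipy.optimize import minimize_scalar

D1 = 1084.5            # known distance between the parallel sides of the target, mm
width_per_mm = 2.33    # assumed (toy) sensitivity of D2 to the camera-laser distance

def D2(distance_mm):
    # In the real pipeline D2 comes from projecting the laser pixels with Eq. (3.19);
    # here it is replaced by a linear toy model.
    return width_per_mm * distance_mm

result = minimize_scalar(lambda d: abs(D1 - D2(d)), bounds=(0.0, 2000.0), method="bounded")
print(f"estimated camera-to-laser distance: {result.x:.1f} mm")
```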
5.4 Evaluation
5.4.1 Experiment Setup
In order to evaluate the accuracy and robustness of the proposed calibration method, as
well as to demonstrate the quality of its performance, it was compared with two other calibration
techniques. The experiment setup is depicted in Table 5-1. Method 1 is based on the box target and the calibration technique considered in the previous Section. Method 2 is based on the perpendicular checkerboard patterns and was adopted from [107]. Firstly, the comparison of the extrinsic parameters was based on the
configuration of the vision system where all calibration targets were visible (see Figure 5.5).
Secondly, the performance of the calibration method was estimated for the configuration of the
vision system considered in this chapter (see Figure 5.6). For this configuration one side of the
box target (method 1) was not visible; thus, the proposed method was compared only with method 2. Experiments were carried out for 35 different configurations of the vision system,
where orientations of the camera and laser plane were randomly changed from -10 to 10 degrees.
Table 5-1: The Configurations of the Calibration Methods
Calibration Target
  Proposed: Target with 3 sides
  Method 1: Target with 4 sides
  Method 2: Checkerboard patterns
Configuration of the target
  Proposed / Method 1: Distances between parallel sides of the target are 1084.5 mm
  Method 2: Size of the checkerboard is 9x6. The square size is 97x97 mm
Location of the target (Configuration of the vision system #1)
  Proposed / Method 1 / Method 2: Distance from the camera to the left side is 646 mm and to the front side is 746 mm
Location of the target (Configuration of the vision system #2)
  Proposed / Method 2: Distance from the camera to the left side is 646 mm and to the front side is 1346 mm
Other configurations
  Proposed / Method 1 / Method 2: Distance from the camera to the floor is 927 mm and to the laser plane is 466 mm. Image resolution is 1920x1920 pixels
Figure 5.5: Configuration of the vision system #1. A: Method 1. B: Proposed. C: Method 2.
Figure 5.6: Configuration of the vision system #2. A: Proposed. B: Method 2.
5.4.2 Results
For the extrinsic parameter estimation, the mean absolute error (MAE) and the root mean squared error (RMSE) among all of the vision system configurations were calculated for the comparative analysis between the calibration methods. These values are depicted in Table 5-2. For configuration of the vision system #1, it can be seen that the best calibration results were obtained by method 1, followed by the proposed method, and lastly method 2. As for configuration of the vision system #2, method 1 is no longer applicable; thus, among the other two methods, better calibration results were obtained by the proposed method. It is also worth mentioning that the proposed method works much faster than the one based on the checkerboard patterns: the average run time among all configurations is 39.5 seconds for the proposed method and 95.8 seconds for method 2. Table 5-2 also includes the values corresponding to the maximum absolute error (AE), which was obtained during a specific experiment (the index of the experiment is shown in brackets). For the experiments corresponding to the maximum AE, a 3D reconstruction of the indoor environment is provided in the next Section, which was added for the visual analysis between calibration methods.
Table 5-2: The Evaluation of the Extrinsic Calibration
                              Camera Parameters                      Laser Parameters
                     Pitch, deg  Roll, deg   Yaw, deg     Pitch, deg  Roll, deg   Distance, mm
Configuration of the vision system #1
MAE         Proposed    0.05        0.11        0.06          0.16        0.09         2.81
            Method 1    0.05        0.03        0.04          0.05        0.05         1.89
            Method 2    0.10        0.13        0.08          0.27        0.06         3.32
RMSE        Proposed    0.05        0.13        0.07          0.18        0.11         2.97
            Method 1    0.05        0.06        0.04          0.08        0.09         2.01
            Method 2    0.14        0.15        0.10          0.35        0.25         3.89
Configuration of the vision system #2
MAE         Proposed    0.10        0.25        0.05          0.18        0.06         3.68
            Method 2    0.18        0.38        0.12          0.21        0.24         3.89
RMSE        Proposed    0.13        0.31        0.07          0.25        0.08         4.11
            Method 2    0.26        0.42        0.17          0.28        0.49         5.14
Maximum AE  Proposed  0.40 (6)    0.69 (19)   0.17 (3)      0.78 (19)   0.16 (16)    7.24 (10)
            Method 2  0.86 (10)   0.76 (9)    0.66 (12)     0.81 (34)   2.54 (34)   14.64 (12)
5.4.3 Discussion
In the previous Section it was shown that for configuration of the vision system #1, method 1 works better than method 2. The current work is aimed at evaluating the proposed method against other calibration techniques. It was assumed that by modifying the calibration target and the calibration technique of method 1 it would be possible to achieve similar calibration results. However, Table 5-2 shows that the calibration results of method 1 are better than those of both other methods. Thus, for configuration of the vision system #1 it is better to use method 1. The performance of the proposed method is not as good as that of method 1, but the proposed method is more universal and can be implemented for different vision system configurations. It is also worth mentioning that for both configurations of the vision system, the proposed method showed better calibration results than method 2, which is based on the checkerboard patterns.
5.5 3D Reconstruction Method
Once the vision system is calibrated, we can move on to the reconstruction of the 3D structure of the indoor environment. Our method consists of several primary steps, outlined in Figure 5.7. The input color image is first segmented into a set of objects of interest with semantic labels. Secondly, images of these objects are extracted and transformed so as to be imaged in a perspective projection. After that, the depth information is recovered based on the laser data. Finally, the 3D model is assembled. These steps are described below in more detail.
Figure 5.7: Overview: From a single fisheye snapshot, the proposed method combines semantic segmentation and
laser data to generate the 3D model of the indoor environment.
5.5.1 Semantic Segmentation
The proposed reconstruction technique is aimed at obtaining the 3D model of an indoor
environment by one single snapshot. By fusing data from the laser, fisheye image, and semantic
segmentation it is possible to recover layout of the indoor environment as well as calculate depth
information to the corresponding elements, such as floor, walls, ceiling, and doors. In doing so,
we also solve the problem of perception of doors that has been intensively researched for over a
decade.
The problem of door identification has been tackled by Anguelov et al. [177] who present
an interesting approach based on laser range scans and a panoramic camera. They identify doors
that have been observed in different opening angles by the laser range scanner. The identified
doors allow them to learn how to distinguish walls and doors by color, such that similar doors
can be identified in the camera data. While limited in determining the exact geometry of the
doors, their approach provides valuable annotations of the doors in a map.
Limketkai et al. [178] propose a system to identify doors in 2D occupancy grid maps by
learning common properties of the doors in the specific environment, such as the width and the
indent from the wall. While based on strong assumptions, the method has the advantage of not relying on observing doors in different states. Rusu et al. [179] propose a system for identifying
doors and extracting information about the geometry. They detect doors in 3D point clouds from
a tilting laser range scanner by searching for offset planes that follow the standards for
wheelchair accessible doors.
Several methods have been proposed to detect doors using visual features [180, 181] or a laser range scanner [182]. Other methods additionally obtain the exact location and dimensions of the door frame, e.g. using active vision with a stereo camera [183] or a tilting laser range scanner [179]. In order to eliminate the shortcomings of existing methods and provide a more reliable and robust solution, we decided to operate with structured light and deep learning.
One of the benefits of the proposed simulator is that it can provide automatic ground truth labeling for the main parts of the scene (see Figure 5.7). The problem with manual labeling is that the process itself is time-consuming, as images may contain a wide range of elements; this is especially so for omnidirectional images. This Section demonstrates the capacity of the automatic ground truth labeling by training a semantic segmentation network using deep learning.
5.5.2 Feature Extraction
There is a variety of features for better image understanding, which in general can be divided into hand-crafted features and learned features. Hand-crafted features are extracted manually using an algorithm defined by an expert. Learned features can be extracted with the use of Convolutional Neural Networks (CNNs) [184, 185]. CNN architectures are used in fields such as image recognition, image annotation, image retrieval, etc. [186]. For image classification, a CNN architecture consists of several convolutional layers followed by one or more fully connected layers [187]. Image feature extraction based on CNNs has demonstrated its effectiveness in a number of applications [188-190].
In this dissertation, the goal of the CNN is the detection of features of an indoor environment (floor, ceiling, walls, and doors) by labeling them with different colors (semantic segmentation). He et al. pointed out that the deeper the neural network, the more difficult it is to train [191]. This problem was solved by using the residual learning framework, namely ResNet. Experimental results showed a better performance in training and testing on the ILSVRC 2015 (ImageNet Large Scale Visual Recognition Challenge) validation set with a top-1 recognition accuracy of about 80% [191]. The operating principle of a Residual Network is that residual functions (instead of unreferenced functions) with reference to the layer inputs are learned by each layer of the network. These architectures are easier to optimize and it is possible to obtain improved accuracy by significantly increasing the depth [191]. Thus, these networks were considered in our work.
5.5.3 Experimental Setup
The dataset was generated by our simulator and contains 300 labeled images. 80% of the dataset was partitioned into training data and the remaining 20% was used as test data. The dataset is composed of 240-by-240-pixel images and was tested with two networks: ResNet18 and ResNet50. The networks were trained on a single CPU with a clock speed of 2.5 GHz and 10 GB of RAM; the GPU was an Intel HD Graphics 4000 with 1.5 GB of memory. We used 64-bit macOS as the operating system.
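The 80/20 split can be reproduced with a few lines. The sketch below is a generic Python illustration (the class list and random seed are assumptions), not the training script used in the thesis.

```python
import numpy as np

CLASSES = ["floor", "ceiling", "walls", "doors"]   # assumed label set
n_images = 300
rng = np.random.default_rng(42)

indices = rng.permutation(n_images)
split = int(0.8 * n_images)                        # 80% training, 20% testing
train_idx, test_idx = indices[:split], indices[split:]
print(f"{len(train_idx)} training images, {len(test_idx)} test images, "
      f"{len(CLASSES)} semantic classes")
```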
5.5.4 Results
Figure 5.8 shows that the behaviors of the two ResNets are similar to each other. From Table 5-3 it can be seen that the performance of ResNet18 is not inferior to that of ResNet50, while at the same time ResNet18 took less time to train and its network size is smaller. Figure 5.9 shows some of the output results of the trained networks. From the visual representation it can be seen that both networks were trained accurately in comparison with the ground truth.
Figure 5.8: Neural network performance evaluation. ResNet18 is shown in red color and ResNet50 is shown in blue
color.
Table 5-3: The Evaluation of the Trained Networks
Network     Validation accuracy    Training time      Network size
ResNet18         96.60 %           129 min 11 sec       103.4 MB
ResNet50         96.65 %           284 min 59 sec       236.6 MB
Figure 5.9: Training results (ResNet18, ResNet50, and ground truth).
5.5.5 Discussion
It was demonstrated that the labeled data generated by our simulator is suitable for training neural networks. The automatic labeling itself can significantly simplify the process of collecting data for testing theories and verifying experimental results. It was also found that, using deep learning, the semantic segmentation network can be well trained with a small number of network layers. Moreover, this approach is fast and does not increase the output network size.
5.6 Perspective Projection
5.6.1 Preparation
Once the semantic segmentation network is trained, the portions of interest can be extracted from the input fisheye image. First of all, masks are created for every element (floor, ceiling, walls, and doors) (see Figure 5.7). It is worth mentioning that the reconstruction method proposed in the Thesis recovers the 3D model of the indoor scene within the visible region of the laser beam, which belongs to the walls in the fisheye image (see Figure 5.7). Next, with the previously created masks and the working region of the laser beam, it is possible to extract the interesting portions of the scene (see Figure 5.10). Finally, when the elements of interest are extracted from the fisheye image, the perspective projections can be created.
Figure 5.10: Upper row shows extracted regions of the indoor environment (floor, ceiling, walls, and doors) with the
visible laser beam. Lower row shows corresponding perspective projection for extracted regions of the indoor
environment.
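The mask-creation step can be sketched as a simple per-class comparison on the label image; the class id and the synthetic label map below are assumptions for illustration only.

```python
import numpy as np

labels = np.zeros((240, 240), dtype=np.uint8)  # stand-in for a segmented fisheye frame
labels[150:, :] = 2                            # pretend class id 2 ("floor") fills the lower part

floor_mask = labels == 2                       # binary mask for one element of interest
print(f"floor pixels: {int(floor_mask.sum())} of {labels.size}")
```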
5.6.2 Perspective projection
An equirectangular projection represents everything visible from a particular point in space. As such, any other 3D-to-2D projection can be created from it, including a standard perspective projection. Since a perspective projection only captures a relatively small field of view, compared to the 360 by 180 degrees of the equirectangular projection, there are infinitely many possible perspective projections that might be calculated. Each of these possible perspective projections can be characterized by the pitch, roll and yaw rotation of the camera view frustum, as well as by the horizontal and vertical field of view.
A single perspective projection cannot capture everything represented within an
equirectangular projection, however multiple perspective projections can. An industry standard is
to create 6 perspective views with the camera view frustum corresponding to the 6 faces of a
cube centered at the camera position. Each perspective projection has 90 degrees field of view
both horizontally and vertically.
The algorithm employed here to create a perspective projection starts by considering a
virtual camera located at the origin with a view "forward" direction pointing down the positive y
axis and with the z axis being the "up" vector. In the conventions used here a right-hand
coordinate system is used so the camera "right" vector is along the positive x axis.
Figure 5.11: Initialization of the virtual camera.
To create a particular perspective view, one rotates this initial camera direction and orientation about any axis, or combination of axes. A roll in the chosen coordinate system is a rotation about the y axis (forward), panning is a rotation about the z axis (up) and tilting is a rotation about the x axis (right). To create the 6 faces of the cube map, the initial camera view direction is rotated as shown in Table 5-4, with the horizontal and vertical field of view set to 90 degrees.
Table 5-4: Rotations of the cube faces
Cube face    Rotation
front        -
left         Pan by 90 degrees
right        Pan by -90 degrees
back         Pan by 180 degrees
top          Tilt by -90 degrees
bottom       Tilt by 90 degrees
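The rotations of Table 5-4 can be stored as pan/tilt pairs; the sketch below shows one possible encoding in degrees.

```python
# Pan/tilt pairs (degrees) for the six cube faces of Table 5-4.
CUBE_FACES = {
    "front":  (0.0,    0.0),
    "left":   (90.0,   0.0),
    "right":  (-90.0,  0.0),
    "back":   (180.0,  0.0),
    "top":    (0.0,  -90.0),
    "bottom": (0.0,   90.0),
}
for face, (pan, tilt) in CUBE_FACES.items():
    print(f"{face:6s}: pan {pan:6.1f} deg, tilt {tilt:6.1f} deg")
```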
The process of creating the perspective projection image is performed in the reverse direction, that is, for every pixel (or subpixel for anti-aliasing) in the perspective image plane, one needs to find the best RGB estimate in the fisheye image.
The high-level process is as follows:
Initialize the virtual camera, located at the origin, looking down the y axis and with a
horizontal and vertical FOV of 90 degrees.
For every pixel (i, j) in the camera projection plane, derive the corresponding 3D vector P in world coordinates by Equation (5.8).
P(i, j) = \left(\frac{2i}{w} - 1,\;\; 1,\;\; \frac{2j}{h} - 1\right)    (5.8)
where w and h are the width and height of the perspective image in pixels.
Figure 5.12: Perspective image.
Rotate this vector P about the axes corresponding to roll, pitch and yaw to orientate the perspective camera as desired; call this vector P'.
Calculate the angles ø and θ as shown below.
\phi = \operatorname{atan2}\!\left(\sqrt{P_{x}'^{\,2} + P_{z}'^{\,2}},\; P_{y}'\right), \qquad \theta = \operatorname{atan2}\!\left(P_{z}',\; P_{x}'\right)    (5.9)
Figure 5.13: Transformation between world and camera coordinates.
Determine the image index (I, J) in normalized fisheye image coordinates, given the angles θ and ø and the linear relationship between ø and the radius r in a fisheye projection. This gives the RGB value to assign to pixel (i, j) in the perspective image.
I = \frac{2\,\phi\cos\theta}{FOV}, \qquad J = \frac{2\,\phi\sin\theta}{FOV}    (5.10)
where FOV is the field of view of the fisheye lens.
Results of the perspective projection are shown in Figure 5.10.
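The backward mapping of Equations (5.8)-(5.10) for a single perspective pixel can be sketched as follows; the image sizes, the fisheye FOV, and the fact that only panning is implemented in the rotation step are assumptions made to keep the example short.

```python
import numpy as np

W, H = 512, 512                    # perspective image size (assumed)
FOV_FISHEYE = np.radians(190.0)    # fisheye field of view (assumed)

def perspective_pixel_to_fisheye(i, j, pan_deg=0.0):
    # Eq. (5.8): pixel -> ray of the virtual camera (forward = +y, up = +z, right = +x).
    P = np.array([2.0 * i / W - 1.0, 1.0, 2.0 * j / H - 1.0])
    # Orient the virtual camera: here only a pan (rotation about the up axis).
    a = np.radians(pan_deg)
    Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
    P = Rz @ P
    # Eq. (5.9): angle from the forward axis and azimuth in the image plane.
    phi = np.arctan2(np.hypot(P[0], P[2]), P[1])
    theta = np.arctan2(P[2], P[0])
    # Eq. (5.10): linear (equidistant) fisheye model, normalised image coordinates.
    r = 2.0 * phi / FOV_FISHEYE
    return r * np.cos(theta), r * np.sin(theta)

print(perspective_pixel_to_fisheye(256, 256))   # centre pixel maps to the image centre
```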
Figure 5.14: Transformation to the image coordinates.
5.6.3 Proposed 3D Reconstruction Technique
The last step is the 3D reconstruction of the indoor environment. In order to reconstruct the depth of the scene, several steps are required. First of all, the image coordinates of the laser beam are extracted and the distances to the corresponding walls and doors are calculated by Equation (3.19). Once the location of the wall is known, it is possible to calculate the distances to the floor and ceiling. Now, the labeled image regions between the parts of interest can be extracted as follows:
Floor. Region between the magenta color and yellow color (see Figure 5.7);
Ceiling. Region between the magenta color and aqua color (see Figure 5.7).
In a similar fashion to the laser plane, the distances to the floor and ceiling can be found by triangulation. The distance from the mobile robot to the particular wall along the Y-axis (see Figure 5.2) is known and constant. Consequently, the world coordinates along the Y-axis do not change. Knowing this, the equation for calculating the world coordinates of the floor and ceiling can be written as:
[Equation (5.11): Equation (3.19) with the Y-coordinate fixed to the known wall distance, yielding the world coordinates of the floor and ceiling boundary points.]    (5.11)
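A hedged sketch of the triangulation idea behind Equation (5.11): once the wall distance along the Y-axis is known from the laser data, a viewing ray through a floor (or ceiling) boundary pixel is stretched until it reaches that wall, which fixes the remaining coordinates. The ray direction below is hypothetical.

```python
import numpy as np

Y_wall = 992.0                           # wall distance known from the laser data, mm
ray = np.array([0.10, 1.00, -0.55])      # hypothetical ray through a floor-boundary pixel

scale = Y_wall / ray[1]                  # stretch the ray until it meets the wall plane
X, Y, Z = scale * ray
print(f"floor point on the wall: X = {X:.1f} mm, Y = {Y:.1f} mm, Z = {Z:.1f} mm")
```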
With our reconstruction method it is also possible to reconstruct a corner. For this procedure, first of all, the endpoints of the laser beam have to be extracted. Secondly, the orientation of every wall is calculated and the wall in the corner is divided into two parts. Finally, the 3D model can be assembled from the segmented parts of the scene in combination with the corresponding distances.
Figure 5.15 shows individual reconstructed 3D models as well as the global map. The results show that the proposed reconstruction technique provides accurate and robust 3D models for different configurations of indoor scenes. The results also show that with a single input image it is possible to reconstruct not only the layout of the indoor environment, but the depth as well.
Table 5-2 listed the indices associated with each particular experiment with the maximum AE. Figure 5.15 illustrates the corresponding 3D models for some of these configurations. The goal is to compare the calibration methods by visual analysis. For configuration 6 both methods provide quite similar results. However, for the rest of the 3D models, the difference between the methods is quite obvious. From the visual analysis it can also be noticed that the proposed method provides better results, outperforming the standard one. For an improved visual analysis more samples of the 3D models would be required, and it is difficult to include all of them in the Thesis. Thus, the main comparison analysis between the calibration methods was conducted by estimating the extrinsic parameters, whereas a visual analysis of some configurations was added additionally.
Figure 5.15: Reconstruction results (single reconstruction and global map are presented).
5.6.4 Comparison with commonly used reconstruction methods
- Qualitative Comparison
As mentioned before, passive vision systems depend on the detected environmental features and work under ambient lighting conditions. Representative examples include stereo vision and structure from motion (SFM), where images are captured from multiple perspectives and correspondence has to be established between pixels in the different images. This correspondence determination process relies heavily on image features. As a result, when textures on the reconstructed surface are insufficient, such methods present low matching accuracy. In order to justify the statements mentioned above, a well-established Visual
Structure from Motion (VSfM) tool [192] was applied for generating sparse point clouds. Figure
5.16 presents a qualitative comparison of reconstructed models between our method and VSfM.
Figure 5.16: Left image shows the reconstruction results of a passive method. Right image shows the reconstruction results of the proposed method.
From the reconstruction results illustrated in Figure 5.16 it can be seen that our system outperforms reconstruction based on passive methods. The proposed reconstruction method is comparable to a rigid indoor reconstruction. In contrast, passive methods introduce more noise and artifacts into the final 3D model, as may be seen in Figure 5.16. Moreover, their reconstruction is mostly performed in textured areas, e.g. at edges and corners, since textureless areas such as walls, floor and ceiling do not contain enough distinguishable features for detection. In contrast, our method is able to recover the textureless surfaces in an accurate and smooth way.
- Quantitative Comparison
Because of the lack of ground truth in real cases, authors usually compare the surface reconstruction of their methods with the geometrical data acquired by a LIDAR system [193-196]. LIDAR systems serve as generators of high-precision ground truth 3D point clouds. In contrast to real cases, inside the simulation environment the ground truth is known, e.g. the distances from the mobile robot to the obstacles. Moreover, point clouds can also be generated inside the simulation environment (see Figure 5.17). This generation of point clouds can also be considered a benefit of the simulation environment, as they can be used for deep learning purposes or as ground truth data, for instance.
Figure 5.17: Images above show ground truth point clouds. Images below show reconstruction results on the
proposed method.
The quality of the reconstruction results of the proposed method depends on the quality of the extrinsic calibration. The comparison of the distance estimation was illustrated in Table 4-7, which demonstrates that a more reliable estimation of the distance to the structured light belonging to the obstacles can be obtained with the proposed calibration technique. Consequently, as the structured light is used as a reference tool for generating the 3D model of the indoor scene, the quality of the reconstruction depends on how accurately the distance to a particular obstacle is estimated.
Chapter 6 Conclusions
6.1 Proposed Simulator
In this Thesis, we presented a high-fidelity simulator that can be used across various applications of computer vision. We demonstrated the versatility of the simulator for evaluating vision problems by studying several applications within simulated environments, namely the extrinsic calibration of the vision system and 2D/3D mapping of the indoor environment. We also showed how the mobile robot equipped with the fisheye camera and structured light can be controlled inside the indoor environment. To the best of our knowledge, this simulator is the first based on an omnidirectional camera and laser illumination. The simulator goes beyond providing just synthetic data: it provides controlled environmental conditions that are not easily replicated in the real world. We strongly believe that this simulator can be of assistance to researchers, and enable those with the requisite hardware to perform experiments in a safe manner.
6.2 Omni-Vision System and its Extrinsic Calibration
This Thesis proposed an improved omnidirectional vision system in a flexible configuration and a technique for its calibration. The main contribution towards the vision system is that we reduced the number of elements by leaving only one laser emitter covering the whole robot's surroundings and, at the same time, we eliminated the ambiguous problem of how several laser emitters included in the vision system can be calibrated. As for the proposed calibration technique, the process itself is simple and with one single capture it is possible to achieve reliable calibration results. By developing a rectangular target, the laser beam becomes more visible in comparison with its projection onto the checkerboard patterns, and our method also includes more useful information by covering a 360-degree field of view as a result of the projection onto the four sides. Experiments were performed and analyzed in order to verify the accuracy and robustness of the proposed vision system and of the calibration method itself. The results showed that our calibration method outperforms the most common one based on the checkerboard patterns and that our vision system can be used in real scenarios.
The contributions mentioned above were achieved by means of the simulation environment. We spent a lot of time on its development; however, it has been helpful in testing theories before their practical application. For the current work, the main benefit of using the simulation environment was related to the comparison analysis of the extrinsic parameters among the different calibration methods as well as the comparison of the mapping results.
6.3 3D Reconstruction of the Indoor Environment
Finally, based on our understanding of the accuracy and precision of a structured-light based vision system, we also developed a pipeline for building geometrically consistent 3D reconstructions of the indoor environment. We proposed a fusion of the information from the structured-light omni-vision system and the semantic segmentation neural network to build high quality 3D reconstructions. While these methods separately have their own advantages and limitations, our fusion approach takes the advantages of both methods and also overcomes the limitations of the individual methods. Our approach does not suffer any significant coarse-level deformation, while traditional photometric stereo is very much prone to this problem. Moreover, in order to perform the proposed technique, only one input image is required for recovering the layout of the room as well as the depth data. Experiments were performed and analyzed in order to verify the accuracy and robustness of the proposed methods. The results showed that our reconstruction method is able to recover not only the layout of the indoor scene but also the depth information. The experimental results also demonstrated that passive vision systems often fail on challenging scene parts, such as textureless surfaces, where they can produce depth estimation errors, whereas the proposed approach, which is based on an active vision system, provides stable layout recovery with depth estimation.
6.4 Summary of the Dissertation and further work
In this Dissertation, we have investigated some of the important problems in the domain
of extrinsic calibration and 3D reconstruction in geometric computer vision, by developing
robust and efficient techniques in these fields. We have analyzed the accuracy of the proposed omni-vision system with laser illumination by means of the developed simulator as well as with real data. We have also developed a novel method for 3D reconstruction by fusing the information from structured-light stereo and semantic segmentation.
In the future, we aim to further investigate the transfer of capabilities learned in simulated worlds to the real world. One of the key advantages of employing virtual environments is their ability to represent a diverse and dynamic range of real-world conditions. In order to add more world dynamics and diversity, it is planned to extend the capability of the current version of the simulator by adding pedestrians and by creating manual as well as automated environment generation systems. Thus, users will be able to interact with standardized blocks representing
elements such as walls, floor, ceiling, furniture or obstacles. This approach will easily allow
generation of a wide variety of training and testing environments. The differences in appearance
between the simulated and real-world scenarios will need to be smoothed through deep transfer
learning techniques. Moreover, recent work by F. Sadeghi et al. showed that transfer from virtual
environments to the real-world is possible even without a strong degree of photorealism [197].
References
[1] R. Al-Harasis, E. Al-Zmaily, H. Al-Bishawi, J. Abu Shash, M. Shreim, and B. Sababha, “Design and
Implementation of an Autonomous UGV for the Twenty Second Intelligent Ground Vehicle
Competition,” The International Conference on Software Engineering, Mobile Computing and Media
Informatics, 2015.
[2] A. Bechar, and C. Vigneault, “Agricultural robots for field operations. Part 2: Operations and systems,”
Biosyst Eng, pp. 110-128, 2017.
[3] N. Bellas, S. Chai, M. Dwyer, and D. Linzmeier, “Real-time fisheye lens distortion correction using
automatically generated streaming accelerators,” In: Field Programmable Custom Computing
Machines, 2009.
[4] O. Rawashdeh, H. Yang, R. AbouSleiman, and B. Sababha, “Microraptor: A low-cost autonomous
quadrotor system,” In: ASME 2009 International Design Engineering Technical Conferences and
Computers and Information in Engineering Conference, 2009.
[5] B. Sababha, H. Al Zu'bi, and O. Rawashdeh, “A rotor-tilt-free tricopter UAV: design, modelling, and
stability control,” International Journal of Mechatronics and Automation, pp. 107-113, 2015.
[6] H. Sawalmeh, H. Bjanthala, M. Al-Lahham, and B. Sababha, “A Surveillance 3D Hand-Tracking-
Based Tele-Operated UGV,” In: The 6th International Conference on Information and
Communication Systems, 2015.
[7] V. Semwal, A. Bhushan, and G. Nandi, “Study of humanoid Push recovery based on experiments,” In:
Control, Automation, Robotics and Embedded Systems (CARE), 2013.
[8] V. Semwal, P. Chakraborty, and G. Nandi, “Less computationally intensive fuzzy logic (type-1)-based
controller for humanoid push recovery,” Robot Auton Syst, pp. 122-135, 2015.
[9] V. Semwal, N. Gaud, and G. Nandi, “Human Gait State Prediction Using Cellular Automata and
Classification Using ELM,” In: MISP-2017.
[10] V. Semwal, S. Katiyar, R. Chakraborty, and G. Nandi, “Biologically-inspired push recovery capable
bipedal locomotion modeling through hybrid automata,” Robot Auton Syst, pp. 181-190, 2015.
[11] V. Semwal, K. Mondal, and G. Nandi, “Robust and accurate feature selection for humanoid push
recovery and classification: deep learning approach,” Neural Comput & Applic, pp. 565-574, 2017.
[12] V. Semwal, and G. Nand, “Generation of joint trajectories using hybrid automate-based model: a
rocking block-based approach,” IEEE Sensors J, pp. 5805-5816, 2016.
[13] V. Semwal, M. Raj, and G. Nandi, “Biometric gait identification based on a multilayer perceptron,”
Robot Auton Syst, pp. 65-75, 2016.
[14] T. Liu, C. Ju, Y. Huang, T. Chang, K. Yang, and Y. Lin, “A 360-degree 4Kx2K Panorama Video
Processing Over Smart-phones,” IEEE International Conference on Consumer Electronics (ICCE),
2017.
[15] K. Ma, F. Lu, and X. Chen, "Robust Planar Surface Extraction from Noisy and Semi-Dense 3D Point
Cloud for Augmented Reality," 2016 International Conference on Virtual Reality and Visualization
(ICVRV), pp. 453-458, Sep. 2016, doi: 10.1109/ICVRV.2016.83.
[16] A. Mossel, and M. Kroeter, "Streaming and Exploration of Dynamically Changing Dense 3D
Reconstructions in Immersive Virtual Reality," 2016 IEEE International Symposium on Mixed and
Augmented Reality (ISMAR-Adjunct), pp. 43-48, Sep. 2016, doi: 10.1109/ISMARAdjunct.
2016.0035.
[17] C. Fernandez-Labrador, A. Perez-Yus, G. Lopez-Nicolas, and J. Guerrero, "Layouts From Panoramic
Images With Geometry and Deep Learning," in IEEE Robotics and Automation Letters, vol. 3, no. 4,
pp. 3153-3160, Oct. 2018, doi: 10.1109/LRA.2018.2850532.
[18] C. Fernandez-Labrador, J. Facil, A. Perez-Yus, C. Demonceaux, J. Civera, and J. Guerrero, "Corners
for Layout: End-to-End Layout Recovery From 360 Images," in IEEE Robotics and Automation
Letters, vol. 5, no. 2, pp. 1255-1262, Apr. 2020, doi: 10.1109/LRA.2020.2967274.
[19] S. Shah, and J. Aggarwal, “Mobile robot navigation and scene modeling using stereo fish-eye lens
system,” in Machine Vision and Applications, vol. 10, pp. 159-173, Oct. 1996, doi:
10.1007/s001380050069.
[20] M. Nakagawa, T. Yamamoto, S. Tanaka, M. Shiozaki, and T. Ohhashi. “Topological 3D Modeling
Using Indoor Mobile Lidar Data,” in ISPRS - International Archives of the Photogrammetry, Remote
Sensing and Spatial Information Sciences, pp. 13-18, May 2015 doi: 10.5194/isprsarchives-XL-4-W5-
13-2015.
[21] X. Lian, Z. Liu, X. Wang, and L. Dou, “Reconstructing Indoor Environmental 3D Model Using Laser Range Scanners and Omnidirectional Camera,” In Proc. of the 7th World Congress on Intelligent Control and Automation, vol. 1, no. 23, pp. 1640–1644, Jun. 2008.
[22] E. Dandil, and K. K. Çevik, “Computer Vision Based Distance Measurement System using Stereo Camera View,” In Proc. of the 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), pp. 1-4, 2019.
[23] D. V. Alekseevich, and M. D. Alexandrovich, “Construction of a depth map using stereo vision based on a developed stereo camera for the anthropomorphic robot AR600E,” In Proc. of the 2nd School on Dynamics of Complex Networks and their Application in Intellectual Robotics (DCNAIR), pp. 27-30, 2018.
[24] P. N. Koundinya, Y. Ikeda, S. N.T., P. Rajalakshmi, and T. Fukao, “Comparative Analysis of Depth Detection Algorithms using Stereo Vision,” In Proc. of the 6th World Forum on Internet of Things (WF-IoT), pp. 1-5, 2020.
[25] M. Song, H. Watanabe, and J. Hara, “Robust 3D reconstruction with omni-directional camera based on structure from motion,” 2018 International Workshop on Advanced Image Technology (IWAIT), pp. 1-4, 2018.
[26] Q. Zhang, P. An, S. Wang, X. Bai, and W. Zhang, “Image-based Space Object Reconstruction and Relative Motion Estimation using Incremental Structure from Motion,” In Proc. of the IEEE CSAA Guidance, Navigation and Control Conference (CGNCC), pp. 1-6, 2018.
[27] A. D. Sergeeva, and V. A. Sablina, “Using structure from motion for monument 3D reconstruction from images with heterogeneous background,” In Proc. of the 7th Mediterranean Conference on Embedded Computing (MECO), pp. 1-4, 2018.
[28] M. Daum, and G. Dudek, “On 3-D surface reconstruction using shape from shadows,” In Proc. of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 461-468, 1998.
[29] R. Gouiaa, and J. Meunier, “3D Reconstruction by Fusioning Shadow and Silhouette Information,” In Proc. of the Canadian Conference on Computer and Robot Vision, pp. 378-384, 2014.
[30] C. Wohler, “3D surface reconstruction by self-consistent fusion of shading and shadow features,” In Proc. of the 17th International Conference on Pattern Recognition, pp. 204-207, 2004.
[31] R. Taira, S. Saga, T. Okatani, and K. Deguchi, “3D reconstruction of reflective surface on reflection type tactile sensor using constraints of geometrical optics,” In Proc. of the SICE Annual Conference, pp. 3144-3149, 2010.
[32] X. Zongwu, Z. Zhaojie, and L. Yechao, “3-D Object Reconstruction Using Tactile Sensing for Multi-fingered Hand,” 2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), pp. 209-212, 2018.
[33] F. Liu, and T. Hasegawa, “Reconstruction of curved surfaces using active tactile sensing and surface normal information,” In Proc. of the IEEE International Conference on Robotics and Automation, vol. 4, pp. 4029-4034, 2001.
[34] E. Le Francois, J. Herrnsdorf, J. J. D. McKendry, L. Broadbent, M. D. Dawson, and M. J. Strain, “Combined Time of Flight and Photometric Stereo Imaging for Surface Reconstruction,” In Proc. of the IEEE Photonics Conference (IPC), pp. 1-2, 2020.
[35] M. Xie, and J. R. Cooperstock, “Time-of-Flight Camera Calibration for Improved 3D Reconstruction of Indoor Scenes,” 2014 Seventh International Symposium on Computational Intelligence and Design, pp. 478-481, 2014.
[36] J. M. Gutierrez-Villalobos, T. Dimas, and J. C. Mora-Vazquez, “Simple and low cost scanner 3D system based on a Time-of-Flight ranging sensor,” In Proc. of the XIII International Engineering Congress (CONIIN), pp. 1-5, 2017.
[37] B. Zhang, H. Zhang, J. Zhu, H. Xu, and Y. Zhao, “Research On Self-Mixing Interference Displacement Reconstruction Method Based On Ensemble Empirical Mode Decomposition,” In Proc. of the IEEE International Conference on Mechatronics and Automation (ICMA), pp. 723-727, 2019.
[38] J. Zalev and M. C. Kolios, “Image Reconstruction Combined With Interference Removal Using a Mixed-Domain Proximal Operator,” IEEE Signal Processing Letters, vol. 25, no. 12, pp. 1840-1844, Dec. 2018.
[39] Y. Awatsuji, Y. Wang, P. Xia, and O. Matoba, “3D image reconstruction of transparent gas flow by parallel phase-shifting digital holography,” In Proc. of the 15th Workshop on Information Optics (WIO), pp. 1-2, 2016.
[40] Z. Hu, Q. Guan, S. Liu, and S. Y. Chen, “Robust 3D Shape Reconstruction from a Single Image Based on Color Structured Light,” In Proc. of the International Conference on Artificial Intelligence and Computational Intelligence, pp. 168-172, 2009.
[41] H. Lin, and Z. Song, “3D reconstruction of specular surface via a novel structured light approach,” In Proc. of the IEEE International Conference on Information and Automation, pp. 530-534, 2015.
[42] J. Deng, B. Chen, X. Cao, B. Yao, Z. Zhao, and J. Yu, “3D Reconstruction of Rotating Objects Based on Line Structured-Light Scanning,” In Proc. of the International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), pp. 244-247, 2018.
[43] C. Albitar, P. Graebling, and C. Doignon, “Robust Structured Light Coding for 3D Reconstruction,” In Proc. of IEEE International Conference on Computer Vision, 2007.
[44] E.V. German, and E.R. Muratov, “Assessment of combining heterogeneous images,” World & Science: Materials of the international research and practice conference, Brno, Czech Rep., May 2014.
[45] H. Chao, Y. Gu, and M. Napolitano, “A survey of optical flow techniques for robotics navigation
applications,” Journal of Intelligent & Robotic Systems, vol. 73, no. 1, pp. 361-372, 2014.
[46] D. Nistér, F. Kahl, and H. Stewénius, “Structure from motion with missing data is np-hard,” In Proc. of ICCV’07, pp. 1–7, 2007.
[47] O. Chum, T. Werner, and J. Matas, “Two-view geometry estimation unaffected by a dominant plane,” In Proc. of CVPR’05, pp. 772–780, 2005.
[48] J. Salvi, J. Pages, and J. Battle, “Pattern codification strategies in structured light systems,” Pattern Recognition, vol. 37, pp. 827-849, 2004.
[49] M. Adachi, S. Shatari, and R. Miyamoto, "Visual Navigation Using a Webcam Based on Semantic
Segmentation for Indoor Robots," 2019 15th International Conference on Signal-Image Technology &
Internet-Based Systems (SITIS), pp. 15-21, Nov. 2019 doi: 10.1109/SITIS.2019.00015.
[50] X. Lian, Z. Liu, X. Wang, and L. Dou, “Reconstructing Indoor Environmental 3D Model Using Laser Range Scanners and Omnidirectional Camera,” In Proc. of the 7th World Congress on Intelligent Control and Automation, vol. 1, no. 23, pp. 1640–1644, Jun. 2008.
[51] Y. Li, and Q. Wang, “Catadioptric Omni-direction Vision System based on Laser Illumination,” In Proc. of the 2013 IEEE International Symposium on Industrial Electronics (ISIE), May. 2013.
[52] J. Shin, and S. Yi, “Development of Omnidirectional Ranging System Based on Structured Light
Image,” Journal of Institute of Control, Robotics and Systems, vol. 18, no. 5, pp. 479-486, May. 2012.
[53] J. Shin, S. Yi, Y. Hong, and J. Suh, “Omnidirectional Distance Measurement based on Active
Structured Light Image,” Journal of Institute of Control, Robotics and Systems, vol. 16, no. 8, pp.
751-755, Aug. 2010.
[54] S. Yi, B. Choi, and N. Ahuja, “Real-time omni-directional distance measurement with active
panoramic vision,” International Journal of Control, Automation and Systems, vol. 5, no. 2, pp. 184-
191, Apr. 2007.
[55] A. Lenskiy, H. Junho, K. Dongyun, and P. Junsu, “Educational platform for learning programming via
controlling mobile robots,” in Proc. International Conference on Data and Software Engineering, 2014,
pp. 1-4.
[56] M. Sadiku, P. Adebo, and S. Musa, “Online Teaching and Learning,” International Journals of
Advanced Research in Computer Science and Software Engineering, vol. 8, no.2, pp. 73-75, Feb.
2018.
[57] A. S. Alves Gomes, J. F. Da Silva and L. R. De Lima Teixeira, “Educational Robotics in Times of
Pandemic: Challenges and Possibilities,” 2020 Latin American Robotics Symposium, 2020 Brazilian
Symposium on Robotics and 2020 Workshop on Robotics in Education, Natal, Brazil, 2020, pp. 1-5.
[58] L. Ma, H. Bai, Q. Dai, and H. Wang, “Practice and Thinking of Online Teaching During Epidemic Period,” in Proc. 15th International Conference on Computer Science & Education, 2020, pp. 568-571.
[59] J. Hu, and B. Zhang, “Application of SalesForce Platform in Online Teaching in Colleges and
Universities under Epidemic Situation,” in Proc. International Conference on Big Data, Artificial
Intelligence and Internet of Things Engineering, 2020, pp. 276-279.
[60] L. Kexin, Q. Yi, S. Xiaoou, and L. Yan, “Future Education Trend Learned From the Covid-19 Pandemic: Take Artificial Intelligence Online Course As an Example,” in Proc. International Conference on Artificial Intelligence and Education, 2020, pp. 108-111.
[61] I. Kholodilin, Y. Li, and Q. Wang, “Omnidirectional Vision System With Laser Illumination in a
Flexible Configuration and Its Calibration by One Single Snapshot,” IEEE Transactions on
Instrumentation and Measurement, vol. 69, no. 11, pp. 9105-9118, Nov. 2020.
[62] Open Dynamics Engine Website [Online]. Available: http://www.ode.org/
[63] VRML [Online]. Available: http://en.wikipedia.org/wiki/VRML
[64] O. Michel, “Cyberbotics Ltd. Webots™: Professional Mobile Robot Simulation,” International Journal of Advanced Robotic Systems, vol. 1, no. 1, pp. 39-42, 2004.
[65] Webots™ 6 Fast Prototyping & Simulation of Mobile Robots, Cyberbotics Ltd, 2009.
[66] Webots [Online]. Available: http://www.cyberbotics.com/products/webots/
[67] Webots [Online]. Available: http://en.wikipedia.org/wiki/Webots
[68] Simulink - Simulation and Model based Design [Online]. Available:
http://www.mathworks.com/products/simulink/
[69] SimRobot - Robotics Simulator [Online]. Available: http://www.informatik.uni-
bremen.de/simrobot/index_e.htm
[70] MATLAB - The Language of Technical Computing [Online]. Available:
http://www.mathworks.com/products/matlab/
[71] MATLAB Product Help
[72] T. Petrinić, E. Ivanjko, and I. Petrović, “AMORsim − A Mobile Robot Simulator for Matlab” in Proc.
of 15th International Workshop on Robotics in Alpe-Adria-Danube Region (RAAD), Balatonfüred,
Hungary, June 2006.
[73] P. I. Corke, “A computer tool for simulation and analysis: The Robotics Toolbox for MATLAB”
Robotics & Automation Magazine, IEEE, vol. 3, pp. 24-32, March 1996.
[74] H. Aezman, “Mobile Robot Simulation and Controller Design with MATLAB/SIMULINK,” B.Eng. Thesis, Kolej Universiti Teknikal Kebangsaan, Malaysia, Mar. 2005.
[75] Microsoft Robotics Developer Studio [Online]. Available:
http://en.wikipedia.org/wiki/Microsoft_Robotics_Developer_Studio
[76] J. Fernando: “Microsoft Robotics Studio: A Technical Introduction”
[77] J. Cogswell, “Microsoft Robotics Studio 2008 Makes Controlling Robots Easier” [Online]. Available: http://www.eweek.com/c/a/Application-Development/Microsoft-Robotics-Studio-2008-Makes-Controlling-Robots-Easier/
[78] B. Balaguer, S. Balakirsky, S. Carpin, M. Lewis, and C. Scrapper, “USARSim: A Validated Simulator for Research in Robotics and Automation,” Workshop on Robot Simulators: Available Software, Scientific Applications, and Future Trends at IEEE/RSJ, 2008.
[79] J. Wang, “USARSim: A Game-based Simulation of the NIST Reference Arenas,” University of Pittsburgh and School of Computer Science, Carnegie Mellon Publication.
[80] S. Carpin, M. Lewis, J. Wang, S. Balakirsky, and C. Scrapper, “USARSim: a robot simulator for
research and education” in Proc. of the IEEE International Conference on Robotics and Automation,
pp. 1400-1405, 2007.
[81] Second Life [Online]. Available: http://en.wikipedia.org/wiki/Second_Life
[82] T. Censullo, “Tutorial: Architecture of Open Simulator”.
[83] OpenSimulator [Online]. Available: http://en.wikipedia.org/wiki/OpenSimulator
[84] B. Browning, and E. Tryzelaar, “ÜberSim: A Multi-Robot Simulator for Robot Soccer,” in Proc. of Autonomous Agents and Multi-Agent Systems, AAMAS'03, Australia, pp. 948-949, Jul. 2003.
[85] J. Go, B. Browning, and M. Veloso, “Accurate and Flexible Simulation for Dynamic, Vision-Centric Robots,” in Proc. of the 3rd International Joint Conference on Autonomous Agents and Multi Agent Systems, 2004.
[86] Simbad Project Home [Online]. Available: http://simbad.sourceforge.net/index.php
[87] P. Reiners: Robots, mazes, and subsumption architecture [Online]. Available:
http://www.ibm.com/developerworks/java/library/j-robots/
[88] L. Hugues, and N. Bredeche, “Simbad: an Autonomous Robot Simulation Package for Education and
Research” in Proc. of the 9th International Conference on Simulation of Adaptive Behavior, SAB
2006, Rome, Italy, Sept. 2006.
[89] J. Klein, “BREVE: a 3D Environment for the Simulation of Decentralized Systems and Artificial Life,” in Proc. of the Eighth International Conference on Artificial Life.
[90] Breve software [Online]. Available: http://en.wikipedia.org/wiki/Breve(software)
[91] The breve Simulation Environment [Online]. Available: http://www.spiderland.org/
[92] Sapounidis, T., and Demetriadis, S. Educational robots driven by tangible programming languages: A review on the field. Adv. Intell. Syst. Comput. 2017, 560: 205–214.
[93] Koenig, N., and Howard, A. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In: International Conference on Intelligent Robots and Systems, ser. IROS ’04, 2004, 3: 2149–2154.
[94] Wang, J., Lewis, M., and Gennari, K. USAR: A game-based simulation for teleoperation. In: IEEE International Conference on Systems, Man and Cybernetics, 2003, 493–497.
[95] Schmits, T., and Visser, A. An Omnidirectional Camera Simulation for the USARSim World. Lecture Notes in Artificial Intelligence, 2009, 5339: 296–307.
[96] Beck, D., Ferrein, A., and Lakemeyer, G. A simulation environment for middle-size robots with multi-level abstraction. Lecture Notes in Artificial Intelligence, 2008, 5001: 136–147.
[97] NVIDIA Isaac Sim. [Online]. Available: https://developer.nvidia.com/isaac-sim. Accessed:
01.10.2021.
[98] C. Won, J. Ryu, and J. Lim, “SweepNet: Wide-baseline Omnidirectional Depth Estimation,” 2019 International Conference on Robotics and Automation (ICRA), pp. 6073-6079, 2019.
[99] Z. Zhang, H. Rebecq, C. Forster, and D. Scaramuzza, “Benefit of large field-of-view cameras for
visual odometry,” 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 801-
808, 2016.
[100] B. Paul, “Creating fisheye image sequences with Unity3D,” 2015. [Online]. Available: https://www.researchgate.net/publication/279963195_Creating_fisheye_image_sequences_with_Unity3D. Accessed: 29.06.2020.
[101] K. Whitehouse, and D. Culler, “Macrocalibration in sensor/actuator networks,” Mobile Networks and Applications, vol. 8, no. 4, pp. 463-472, 2003.
[102] G. Zhang, and Z. Wei, “A novel calibration approach to structured light 3D vision inspection,” Optics and Laser Technology, vol. 34, no. 5, pp. 373–380, Jul. 2002.
[103] Z. Xie, W. Zhu, Z. Zhang, and M. Jin, “A novel approach for the field calibration of line structured-light sensors,” Measurement, vol. 43, no. 2, pp. 190–196, Feb. 2010.
[104] Z. Liu, X. Li, F. Li, and G. Zhang, “Calibration method for line-structured light vision sensor based on a single ball target,” Optics and Lasers in Engineering, vol. 69, pp. 20–28, Jun. 2015.
[105] Z. Wei, M. Shao, G. Zhang, and Y. Wang, “Parallel-based calibration method for line-structured light
vision sensor,” Optical Engineering, vol. 53, no. 3, Mar. 2014.
[106] Q. Zhang, and R. Pless, “Extrinsic calibration of a camera and laser range finder (improves camera
calibration),” In Proc. of the 2004 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS), vol. 3, Jan. 2004.
[107] J. Xu, B. Gao, C. Liu, P. Wang, and S. Gao, “An omnidirectional 3D sensor with line laser scanning,” Optics and Lasers in Engineering, vol. 84, pp. 96–104, Sep. 2016.
[108] C. Mei and P. Rives. Calibration between a central catadioptric camera and a laser range finder for robotic applications. In Proceedings 2006 IEEE International Conference on Robotics and Automation (ICRA 2006), pages 532–537, Orlando, FL, USA, May 2006. IEEE. DOI: 10.1109/ROBOT.2006.1641765.
[109] F. Vasconcelos, J. P. Barreto, and U. Nunes. A minimal solution for the extrinsic calibration of a camera and a laser-rangefinder. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2097–2107, Nov 2012. DOI: 10.1109/TPAMI.2012.18.
[110] X. Gong, Y. Lin, and J. Liu. 3D LIDAR-camera extrinsic calibration using an arbitrary trihedron. Sensors, 13(2):1902–1918, Feb 2013. DOI: 10.3390/s130201902.
[111] R. Gomez-Ojeda, J. Briales, E. Fernandez-Moral, and J. Gonzalez-Jimenez. Extrinsic calibration of a 2D laser-rangefinder and a camera based on scene corners. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 3611–3616, Seattle, WA, USA, May 2015. IEEE. DOI: 10.1109/ICRA.2015.7139700.
[112] Y. Bok, D.-G. Choi, and I. S. Kweon. Extrinsic calibration of a camera and a 2D laser without overlap. Robotics and Autonomous Systems, 78:17–28, April 2016. DOI: 10.1016/j.robot.2015.12.007.
[113] M. Pereira, D. Silva, V. Santos, and P. Dias. Self calibration of multiple LIDARs and cameras on autonomous vehicles. Robotics and Autonomous Systems, 83:326–337, Sep 2016. DOI: 10.1016/j.robot.2016.05.010.
[114] C. Guindel, J. Beltrán, D. Martín, and F. García. Automatic extrinsic calibration for lidar-stereo vehicle sensor setups. In 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), pages 1–6, Yokohama, Japan, Oct 2017. IEEE. DOI: 10.1109/ITSC.2017.8317829.
[115] K. A. Yousef, B. Mohd, K. Al-Widyan, and T. Hayajneh. Extrinsic calibration of camera and 2D laser sensors without overlap. Sensors, 17(10):2346, Oct 2017. DOI: 10.3390/s17102346.
[116] T. Kühner and J. Kümmerle. Extrinsic multi sensor calibration under uncertainties. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 3921–3927, Auckland, New Zealand, Oct 2019. IEEE. DOI: 10.1109/ITSC.2019.8917319.
[117] M. Oliveira, A. Castro, T. Madeira, P. Dias, and V. Santos. A general approach to the extrinsic
calibration of intelligent vehicles using ROS. In M. Silva, J. L. Lima, L. P. Reis, A. Sanfeliu, and D.
Tardioli, editors, Robot 2019: Fourth Iberian Robotics Conference, pages 203–215, Cham, 2020. Springer International Publishing. DOI: 10.1007/978-3-030-35990-4_17.
[118] E. B. Bacca, E. Mouaddib, and X. Cufi, “Embedding Range Information in Omnidirectional Images through Laser Range Finder,” In Proc. of the 2010 IEEE International Conference on Intelligent Robots and Systems, pp. 2053–2058, Oct. 2010.
[119] B. Wang, M. Wu, W. Jia, “The Light Plane Calibration Method of the Laser Welding Vision
Monitoring System,” In Proc. of the 2017 2nd International Conference on Mechatronics and
Electrical Systems (ICMES), vol. 339, 2018.
[120] L. Kurnianggoro, V. Hoang, and K. Jo, “Calibration of a 2D Laser Scanner System and Rotating Platform using a Point-Plane Constraint,” Computer Science and Information, vol. 12, no. 1, pp. 307-322, Jan. 2015.
[121] X. Chen, F. Zhou, and T. Xue, “Omnidirectional field of view structured light calibration method for
catadioptric vision system,” Measurement, vol. 148, Dec. 2019.
[122] M. Nakagawa, T. Yamamoto, S. Tanaka, M. Shiozaki, and T. Ohhashi. “Topological 3D Modeling
Using Indoor Mobile Lidar Data,” in ISPRS - International Archives of the Photogrammetry, Remote
Sensing and Spatial Information Sciences, pp. 13-18, May 2015 doi: 10.5194/isprsarchives-XL-4-W5-
13-2015.
[123] X. Li, S. Li, S. Jia, and C. Xu, "Mobile robot map building based on laser ranging and kinect," 2016
IEEE International Conference on Information and Automation (ICIA), pp. 819-824, Aug. 2016 doi:
10.1109/ICInfA.2016.7831932.
[124] F. Tsai, T. Wu, I. Lee, H. Chang, and A. Su, “Reconstruction of Indoor Models Using Point Clouds
Generated from Single-Lens Reflex Cameras and Depth Images,” ISPRS - International Archives of
the Photogrammetry, Remote Sensing and Spatial Information Sciences, pp. 99-102, May 2015.
[125] M. Bosse, and R. Zlot, “Continuous 3D scan-matching with a spinning 2D laser,” in Proc. IEEE Int. Conf. Robot. Automat., pp. 4312-4319, May 2009.
[126] A. Nuchter, K. Lingemann, J. Hertzberg, and H. Surmann, “6D SLAM with approximate data association,” in Proc. Int. Conf. Adv. Robot., pp. 242-249, 2005.
[127] T. Fujita, “3D sensing and mapping for a tracked mobile robot with a movable laser ranger finder,” in
Int. J. Mech. Mechatron. Eng., vol. 6, no. 2, pp. 501-506, 2012.
[128] T. Ueda, H. Kawata, T. Tomizawa, A. Ohya, and S. Yuta, “Mobile sokuiki sensor system-accurate
range data mapping system with sensor motion,” in Proc. IEEE Int. Conf. Auton. Robots Agents, 2006,
pp. 1-6.
[129] H. Qin, Y. Bi, F. Lin, Y. F. Zhang, and B. M. Chen, “A 3D rotating laser based navigation solution for micro aerial vehicles in dynamic environments,” Unmanned Syst., vol. 6, pp. 1-8, Sep. 2018.
[130] Y. Son, S. Yoon, S. Oh, and S. Han, “A Lightweight and Cost-Effective 3D Omnidirectional Depth
Sensor Based on Laser Triangulation,” in IEEE Access, vol. 7, pp. 58740-58750, 2019, doi:
10.1109/ACCESS.2019.2914220.
[131] P. De Ruvo, G. De Ruvo, A. Distante, M. Nitti, E. Stella, and F. Marino, “An Omnidirectional Range Sensor for Environmental 3-D Reconstruction,” In Proc. of the 2010 IEEE Symposium on Industrial Electronics (ISIE), pp. 396–401, Jul. 2010.
[132] X. Lian, Z. Liu, X. Wang, and L. Dou, “Reconstructing Indoor Environmental 3D Model Using Laser Range Scanners and Omnidirectional Camera,” In Proc. of the 7th World Congress on Intelligent Control and Automation, vol. 1, no. 23, pp. 1640–1644, Jun. 2008.
[133] R. Benosman, and S. Kang, “Panoramic Vision,” Springer-Verlag, ISBN 0-387-95111-3, 2000.
[134] Fermuller C, Aloimonos Y, Baker P, et al., “Multi-camera Networks: Eyes from Eyes,” IEEE
Workshop on Omnidirectional Vision, 2000.
[135] Cutler R., “Distributed meetings: a meeting capture and broadcasting system,” Tenth ACM
International Conference on Multimedia, pp. 503-512, 2002.
[136] F. Huang, and R. Klette, “Stereo panorama acquisition and automatic image disparity adjustment for stereoscopic visualization,” Multimedia Tools Appl., vol. 47, pp. 353–377, 2010.
[137] J. Yu, L. McMillan, and P. Sturm, “Multi-perspective modelling, rendering and imaging,” Comput. Graph. Forum, vol. 29, pp. 227–246, 2010.
[138] D. Zamalieva, and A. Yilmaz, “Background subtraction for the moving camera: a geometric approach,” Comput. Vis. Image Underst., vol. 127, pp. 73–85, 2014.
[139] S. Baker, and S. K. Nayar, “A theory of catadioptric image formation,” in Proc. of the Int. Conf. on Computer Vision, Bombay, pp. 35–42, 1998.
[140] H. H. P. Wu, and S. H. Chang, “Fundamental matrix of planar catadioptric stereo systems,” IET Comput. Vis., vol. 4, pp. 85–104, 2010.
[141] I. Cinaroglu, and Y. Bastanlar, “A direct approach for object detection with catadioptric omnidirectional cameras,” Signal Image Video Process., vol. 10, pp. 413–420, 2016.
[142] C. Geyer, and K. Daniilidis, “A unifying theory for central panoramic systems and practical implications,” in Proc. of the Eur. Conf. on Computer Vision, Dublin, pp. 159–179, 2000.
[143] M. Blösch, S. Weiss, D. Scaramuzza, and R. Siegwart, “Vision based MAV navigation in unknown
and unstructured environments,” In Proc. of the International Conference on Robotics and Automation
(ICRA 2010), Anchorage, Alaska, May 2010.
[144] M. Bosse, R. Rikoski, J. Leonard, and S. Teller, “Vanishing points and 3d lines from omnidirectional
video,” In Proc. of the International Conference on Image Processing, 2002.
[145] P. Corke, D. Strelow, and S. Singh, “Omnidirectional visual odometry for a planetary rover,” In Proc.
of the International Conference on Intelligent Robots and Systems, 2004.
[146] D. Scaramuzza, F. Fraundorfer, and R. Siegwart, “Real-time monocular visual odometry for on-road vehicles with 1-point RANSAC,” In Proc. of the International Conference on Robotics and Automation (ICRA 2009), Kobe, Japan, May 2009.
[147] D. Scaramuzza, and R. Siegwart, “Appearance guided monocular omnidirectional visual odometry for outdoor ground vehicles,” IEEE Transactions on Robotics, vol. 24, no. 5, Oct. 2008.
[148] J. Tardif, Y. Pavlidis, and K. Daniilidis, “Monocular visual odometry in urban environments using an
omnidirectional camera,” In Proc. of the International Conference on Intelligent Robots and Systems,
2008.
[149] D. Scaramuzza, F. Fraundorfer, and M. Pollefeys, “Closing the loop in appearance guided omnidirectional visual odometry by using vocabulary trees,” Robotics and Autonomous Systems Journal (Elsevier), 2010.
[150] R. Benosman, and S. Kang, “Panoramic Vision: Sensors, Theory, and Applications,” New York, Springer-Verlag, 2001.
[151] K. Daniilidis, and R. Klette, “Imaging Beyond the Pinhole Camera,” New York, Springer, 2006.
[152] D. Scaramuzza, “Omnidirectional vision: from calibration to robot motion estimation,” PhD thesis n.
17635, ETH Zurich, February 2008.
[153] Brown, D. C.: “Close-range camera calibration”, Photogrammetric Engineering, 37(8):855-866, 1971.
[154] Heikkilä, J.: “Geometric camera calibration using circular control points”, TPAMI, 22(10):1066-1077, 2000.
[155] Swaminathan, R. and Nayar, S. K.: “Nonmetric calibration of wide-angle lenses and polycameras”,
TPAMI, 22(10):1172-1178, 2000.
[156] J. Kannala and S. Brandt. A generic camera calibration method for fish-eye lenses. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), 2004.
[157] Peter Sturm. Camera models and fundamental concepts used in geometric computer vision. Now Publishers, 2011.
[158] Xianghua Ying, Zhanyi Hu, and Hongbin Zha. Fisheye lenses calibration using straight-line spherical perspective projection constraint. Computer Vision – ACCV 2006.
[159] Jan Heller, Didier Henrion, and Tomas Pajdla. Stable radial distortion calibration by polynomial
matrix inequalities programming, 2014. URL http://arxiv.org/abs/1409.5753.
[160] A. Bechar, and C. Vigneault, “Agricultural robots for field operations. Part 2: Operations and systems,” Biosyst Eng, pp. 110–128, 2017.
[161] Kannala, J.; Brandt, S. A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1335–1340.
[162] C. Mei and P. Rives. Single view point omnidirectional camera calibration from planar grids. In IEEE International Conference on Robotics and Automation, Roma, Italy, April 2007.
[163] D. Xiao-Ming, W. Fu-Chao, and W. Yi-Hong. An easy calibration method for central catadioptric cameras. Acta Automatica Sinica, 33:801–808, 2007.
[164] S. Gasparini, P. F. Sturm, and J. P. Barreto. Plane-based calibration of central catadioptric cameras. In
IEEE International Conference on Computer Vision, 2009.
[165] S. Shah and J. K. Aggarwal. A simple calibration procedure for fish-eye (high-distortion) lens camera. In Proceedings of the 1994 International Conference on Robotics and Automation, San Diego, CA, USA, May 1994.
[166] L. Puig, Y. Bastanlar, P. Sturm, J. J. Guerrero, and J. Barreto. Calibration of central catadioptric cameras using a DLT-like approach. International Journal of Computer Vision, 93(1):101–114, March 2011.
[167] Christopher Mei. Christopher Mei – Research Assistant, 2015b. URL http://www.robots.ox.ac.uk/~cmei/Toolbox.html. (Accessed August 28, 2015).
[168] Christopher Mei, 2015a. URL http://www.robots.ox.ac.uk/~cmei/articles/projection_model.pdf. (Accessed August 23, 2015).
[169] D. Scaramuzza, “OCamCalib: Omnidirectional Camera Calibration Toolbox for Matlab”. [Online].
Available: https://www.sites.google.com/site/scarabotix/ocamcalib-toolbox.
[170] S. Urban, J. Leitloff, and S. Hinz, “Improved Wide-Angle, Fisheye and Omnidirectional Camera Calibration,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 108, pp. 72–79, Oct. 2015.
[171] D. Scaramuzza, A. Martinelli, and R. Siegwart, “A toolbox for easily calibrating omnidirectional
cameras,” In Proc. of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems,
Oct. 2006.
[172] Robotics System Toolbox. [Online]. Available: https://www.mathworks.com/help/robotics/ (accessed
Feb. 6, 2021).
[173] P. I. Corke, “A Robotics Toolbox for Matlab,” IEEE Robotics and Automation Magazine, 1996.
[174] C. Paniagua, L. Puig, and J. Guerrero, “Omnidirectional Structured Light in a Flexible Configuration,”
Sensors, vol. 13, no. 10, pp. 13903-13916, Oct. 2013.
[175] H. Kawasaki, R. Sagawa, Y. Yagi, R. Furukawa, N. Asada, and P. Sturm, “One-shot scanning method
using an uncalibrated projector and camera system,” In Proc. of the 2010 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition Workshops, pp. 104-111, Jul. 2010.
[176] J. Al-Azzeh, B. Zahran, Z. Alqadi, “Salt and Pepper Noise: Effects and Removal,” International
Journal on Informatics Visualization, vol. 2, no. 4, pp. 252-256, Jul. 2018.
[177] D. Anguelov, D. Koller, E. Parker, and S. Thrun, “Detecting and modeling doors with mobile robots,” In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pp. 3777–3784, 2004.
[178] B. Limketkai, L. Liao, and D. Fox, “Relational object maps for mobile robots,” In Proceedings of the 19th International Joint Conference on Artificial Intelligence, IJCAI’05, pp. 1471–1476, San Francisco, CA, USA, 2005.
[179] R. B. Rusu, W. Meeussen, S. Chitta, and M. Beetz, “Laser-based Perception for Door and Handle
Identification,” In Proc. of the Intl. Conf. on Advanced Robotics (ICAR), 2009.
[180] E. Klingbeil, A. Saxena, and A. Ng, “Learning to open new doors,” In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pp. 2751–2757, 2010.
[181] D. Kragic, L. Petersson, and H. I. Christensen, “Visually guided manipulation tasks,” Robotics and Autonomous Systems, pp. 193–203, 2002.
[182] M. Quigley, S. Batra, S. Gould, E. Klingbeil, Q. Le, A. Wellman, and A. Ng, “High-accuracy 3d sensing for mobile manipulation: Improving object detection and door opening,” In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pp. 2816–2822, 2009.
[183] A. Andreopoulos, and J. K. Tsotsos, “Active vision for door localization and door opening using playbot: A computer controlled wheelchair for people with mobility impairments,” In Proc. of the Canadian Conference on Computer and Robot Vision (CRV ’08), pp. 3–10, May 2008.
[184] Cusano C, Napoletano P, Schettini R. Intensity and color descriptors for texture classification. In:
Image Processing: Machine Vision Applications VI, 2013; 8661: 866113.
[185] Napoletano P. Hand-Crafted vs Learned Descriptors for Color Texture Classification. In: International Workshop on Computational Color Imaging, March 2017, 259–271.
[186] Schmidhuber J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61: 85–117.
[187] Napoletano P, Flavio P, and Raimondo S. Anomaly Detection in Nanofibrous Materials by CNN-
Based Self-Similarity. Sensors, 2018, 18(1): 209. https://doi.org/10.3390/s18010209.
[188] Napoletano P. Visual descriptors for content-based retrieval of remote-sensing images. Int. J. Remote
Sens. 2018, 39: 134.
[189] Bianco S, Celona L, Napoletano P, Schettini R. On the Use of Deep Learning for Blind Image Quality
Assessment. arXiv, 2017, arXiv:1602.05531.
[190] Cusano C, Napoletano P, Schettini R. Combining multiple features for color texture classification. J.
Electron. Imaging, 2016, 25: 061410-061410.
[191] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” In Proc. of the
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, Jun. 2016.
[192] C. Wu, “Towards Linear-Time Incremental Structure from Motion,” 2013 International Conference on 3D Vision (3DV), pp. 127-134, 2013.
[193] C. Sui, K. He, C. Lyu, Z. Wang and Y. -H. Liu, "Active Stereo 3-D Surface Reconstruction Using
Multistep Matching," in IEEE Transactions on Automation Science and Engineering, vol. 17, no. 4, pp.
2130-2144, Oct. 2020.
[194] H. Wang, J. Wang and L. Wang, "Online Reconstruction of Indoor Scenes from RGB-D Streams,"
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3271-3279.
[195] S. -H. Baek and F. Heide, "Polka Lines: Learning Structured Illumination and Reconstruction for
Active Stereo," 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
2021, pp. 5753-5763.
[196] Z. Li, P. C. Gogia and M. Kaess, "Dense Surface Reconstruction from Monocular Vision and
LiDAR," 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 6905-6911.
[197] F. Sadeghi, and S. Levine, “CAD2RL: Real single-image flight without a single real image,” 2016, arXiv:1611.04201.