Robust Method for Removing Dynamic Objects from Point Clouds
Shishir Pagad, Divya Agarwal, Sathya Narayanan Kasturi Rangan, Hyungjin Kim, Ganesh Yalla
Abstract— 3D point cloud maps are an accumulation of laser
scans obtained at different positions and times. Since laser scans
represent a snapshot of the surroundings at the time of capture,
they often contain moving objects, which may not be observed
at all times. Dynamic objects in point cloud maps decrease map
quality and affect localization accuracy; it is therefore important
to remove dynamic objects from 3D point cloud
maps. In this paper, we present a robust method to remove
dynamic objects from 3D point cloud maps. Given a registered
set of 3D point clouds, we build an occupancy map in which the
voxels represent the occupancy state of the volume of space over
an extended time period. After building the occupancy map, we
use it as a filter to remove dynamic points in lidar scans before
adding the points to the map. Furthermore, we accelerate the
process of building occupancy maps using object detection and a
novel voxel traversal method. Once the occupancy map is built,
dynamic object removal can run in real-time. Our approach
works well on wide urban roads with stopped or moving traffic
and the occupancy maps get better with the inclusion of more
lidar scans from the same scene.
I. INTRODUCTION
Various autonomous robotic systems rely on maps for pre-
cise localization and navigation. Maps serve as a redundant
source of information for finding the location of the au-
tonomous robotic systems and also improve the localization
accuracy. 3D point cloud maps are one of the common map
formats used for this purpose and they represent a snapshot of
the static environment around the robotic systems. However,
most of the 3D point cloud maps are built from data collected
by driving a mobile mapping system on roads filled with
dynamic objects such as vehicles and pedestrians. It is
therefore important to remove dynamic objects from maps so that
autonomous robotic systems can use clean point cloud
maps for localization.
Furthermore, 3D point cloud maps are not a scalable map
representation, as they occupy a lot of memory. Many
popular approaches therefore model 3D environments more compactly
[1], [2] or use sub-sampled representations [3], [4]. However, full
resolution 3D point clouds form a basis for extracting useful
information from the map. For example, we can extract static
features like road and lane markers from the 3D point cloud
maps. But dynamic object points are rendered as "ghost"
tracks in the point cloud map, often overlapping and occluding
the view of road markers, traffic signs and other important
static features, making it difficult to extract these static
features from the road (refer Fig. 1). For
The authors contributed equally. All authors are with
Autonomous Driving, Perception Team, NIO USA Inc., San
Jose, CA, USA [shishir.pagad, divyaagarwal31,
sathya.aeronautics, hjkim0508]@gmail.com,
gyalla@us.toyota-itc.com
Fig. 1. Dynamic objects, highlighted in red, causing a ghost-trail effect in
a point cloud map
this reason, the dynamic object points need to be filtered out
from point cloud when building the 3D point cloud maps.
In order to solve the problem discussed above, we borrow
valuable ideas from [3] and [5] and build an occupancy map
using an octree data structure, in which the voxels represent
the occupancy state of a volume of space over an extended
period of time. After building the occupancy map, we identify the
points which fall in free voxels and remove them from 3D
point cloud scans. The contributions of our work are the
following:
• We propose a new occupancy probability update strategy
which builds persistent occupancy maps by considering
the occupancy history of voxels. Unlike our approach,
[3] favors quick updates of voxel occupancy scores,
preferring the most recently observed occupancy states.
• We provide an optional method to accelerate the occupancy
map building process by classifying object points using
an object detection method, together with a strategy for
updating the occupancy map using these points.
• Furthermore, we provide a unique way of generating
artificial endpoints which are used to update the
occupancy scores of the voxels.
The input to our algorithm is a set of registered 3D point
clouds, typically acquired by a 3D laser scanner. First, we
classify the points in the point cloud into 3 categories: object
points, ground points and unknown points using ground
plane detection and object detection algorithms. Next, we
perform voxel traversal on the unknown and ground points
and decrease occupancy scores of all the voxels which are on
the path of the ray from the sensor origin to the endpoints,
but increase the occupancy score of the endpoint voxel.
Similarly, we perform voxel traversal on the object points,
but instead of increasing the occupancy score of the endpoint
voxel, we decrease its occupancy score as we already know
from object detection that the endpoint falls on a moving
object. We also maintain a set of ground voxels which
correspond to the ground points, and prevent them from
being marked free during the two voxel traversal steps. We
repeat this process for all the point clouds in the registered
set and build an occupancy map. Finally, we overlay the fully
built occupancy map on a point cloud map and remove points
which are in free voxels (refer Fig. 2). It should be noted
that the occupancy map grows more stable and accurate with
integration of more point clouds over time.
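
To make this per-scan update concrete, the sketch below outlines the loop described above in Python. All names (insert_scan, the log-odds increments, the ground vote threshold) are illustrative assumptions, not the paper's implementation; ray_voxels is sketched in Section III-B.

from collections import defaultdict

VOXEL_SIZE = 0.3            # leaf voxel edge length in meters (Sec. III-A)
L_HIT, L_MISS = 0.85, -0.4  # assumed per-measurement log-odds increments

log_odds = defaultdict(float)           # voxel index -> occupancy log-odds
free_counter = defaultdict(lambda: 1)   # voxel index -> free counter
ground_votes = defaultdict(int)         # voxel index -> times seen as ground

def voxel_key(p, size=VOXEL_SIZE):
    """Map a 3D point to the integer index of its containing voxel."""
    return tuple(int(c // size) for c in p)

def update(v, occupied):
    """Single occupancy update; the weighted form is detailed in Sec. III-D."""
    if occupied:
        log_odds[v] += L_HIT / free_counter[v]
        free_counter[v] = max(1, free_counter[v] - 1)
    else:
        log_odds[v] += L_MISS
        free_counter[v] += 1

def insert_scan(origin, ground_pts, object_pts, unknown_pts, ground_threshold=5):
    for p in ground_pts:                        # remember ground voxels (Sec. III-C)
        ground_votes[voxel_key(p)] += 1

    def free_if_allowed(v):
        if ground_votes[v] < ground_threshold:  # protect true ground voxels
            update(v, occupied=False)

    # Non-object points first: free the voxels along the ray, occupy the endpoint.
    for p in list(unknown_pts) + list(ground_pts):
        for v in ray_voxels(origin, p):         # sketched in Sec. III-B
            free_if_allowed(v)
        update(voxel_key(p), occupied=True)

    # Object points last: the endpoint is known to be dynamic, so free it too.
    for p in object_pts:
        for v in ray_voxels(origin, p):
            free_if_allowed(v)
        free_if_allowed(voxel_key(p))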
The paper is organized as follows. Section II discusses
related work; Section III describes our methodology pipeline
and the algorithm; Section IV presents experiments and their
analysis; finally, Section V provides concluding thoughts and
future work.
II. RELATED WORK
A substantial amount of work has been done on detecting
and removing dynamic objects in laser scans, and different
methods have been proposed to solve this problem. These
methods can be broadly classified into three categories:
• Model-free, change-detection-based approaches, which rely
on comparing the current laser scan with one or more
previous scans and future scans [6], [7], [8], [9], [10].
• Neural-network-model-based approaches, which classify
the points corresponding to dynamic objects [11], or
generate bounding boxes around objects in a laser scan
[12], [13], [14] and classify the points inside these
bounding boxes as dynamic object points.
• Map-based approaches, in which a global occupancy
map or voxel grid is built from laser scans using Bayes'
rule [3], or by simply storing laser scan identifiers in the
voxels [5], and the occupancy map is used as a filter
to remove points in free space.
The dynamic object removal method proposed in this paper
is a hybrid of the neural-network-model-based and
occupancy-map-based approaches.
In the model-free, change-detection based approaches, [6]
compares query scan against a reference scan and uses either
point to point or point to plane error metric to identify the
moving objects. Misclassified dynamic points are corrected
by comparing them against the free space of another scan;
points not in free space are not dynamic and are relabeled
as static. [7] uses Dempster-Shafer Theory (DST) to
extract mobile objects from a lidar point cloud and maps them
back to images to extract images of moving objects; the current
lidar scan is compared against n scans before and after it.
[8] identifies the dynamic points in a dataset by
constructing occupancy grid using DST of source and target
datasets and finding conflicts between the two occupancy
grids. [9], similar to [8], uses DST to find conflicting data
between two laser scans. Finally, [10] segments and
tracks moving objects using motion cues and performs
point-level matching between consecutive scans. The main
drawback of the model-free approach is that, for a dynamic object
to be fully detected and removed, it must move entirely
out of the volume it occupies between the two frames
being compared. One common scenario where this category
of algorithms won't work well is when vehicles have stopped
at a traffic signal. Our method is independent of dynamic
object speed.
The model based dynamic object removal methods do
not require comparison of multiple scans to detect moving
objects. The probability scores are generated for individual
scans. [15] uses a neural network to predict the probability
of 3D laser points being reflected by dynamic objects. The
computed probabilities are used to build a 3D grid map
where each cell represents the probability that a beam is
reflected by a static or dynamic object. [11], [12], [13] use
neural networks to detect static/dynamic objects in laser scans
and generate bounding boxes around objects. Once we get
the bounding boxes, removing the points which lie inside
the bounding boxes is a trivial task. However, model based
dynamic object removal methods have a few drawbacks: they
can’t detect objects which they have not been trained on and
they occasionally fail to detect objects.
In map based approaches, both [3] and [5] build occupancy
map of the area being mapped. [3] uses octree [16] data
structure to store the occupancy information, and each node
in the octree stores occupancy probability of the node in the
form of log-odds for efficient update. [5] uses voxel grid
instead of octree and each voxel in the grid stores identifiers
of all the laser rays that end in the voxel. Both [3] and [5] use
voxel traversal [17] to update the occupancy information of the
nodes/voxels, where all the voxels which are in the line of sight
of the sensor but contain a non-empty set of laser endpoints
are marked as dynamic. The fully built occupancy grid acts
as a binary classifier to filter the points from the actual point
cloud map. However, the probability update function used in
[3] is very sensitive; it is suited to mapping free space
and to quickly adding obstacles to the scene. Our goal,
in contrast, is to map the static objects in the scene, so we
make the probability update function less sensitive to dynamic objects.
Our approach is a hybrid of model-based and occupancy
map based approaches. We use the best of both approaches
to remove dynamic objects more effectively and maintain
occupancy maps which represent long-term occupancy states
of the area being mapped.
III. METHODOLOGY
In Section III-A we describe why we prefer the octree data
structure, and in Section III-B we describe how we use object
detection to speed up the process of inserting point clouds
into the map. Section III-C explains how ground points are
detected and protected. Sections III-D, III-E and III-F discuss
the use of a free counter to make occupancy maps favor
persistence over easy updatability, and how we prune nodes with
free counter values. Finally, Section III-G describes a unique
filtering strategy to improve the quality of our occupancy map.
A. Occupancy Octree
An octree is a hierarchical data structure used for storing in-
formation about 3D space. Each node in an octree represents
a volume of space, called a voxel. We use the log odds form
to represent the occupancy information of a space. We also
store, in each node, the number of times the node was measured
free as a free counter. We explain how we use the free counter
to update the occupancy of a voxel in Section III-D.

Fig. 2. The overall pipeline of our system. (top) The input point cloud goes
through ground point detection and object detection; the processed point cloud
contains points classified as ground, object and free. Voxel traversal is then
performed and an occupancy map is built. (bottom) The octree occupancy map
is applied as a binary filter to the input point cloud map to obtain a clean
point cloud map.
A 3D space can be represented using several data structures,
prominent among them the voxel grid, the k-d tree and the
octree. We chose the octree for occupancy maps because the
hierarchical structure of octrees allows for a compact and
efficient representation of space. In contrast to a voxel grid,
we only need to create nodes where occupancy information
is measured. In this paper, we use octrees with a fixed
maximum depth of 16 and a leaf node voxel size of 0.3 meters.
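
A minimal sketch of the node payload and the map extent implied by these parameters follows; the field names are our assumptions, not taken from the paper's code.

from dataclasses import dataclass
from typing import List, Optional

MAX_DEPTH = 16      # fixed maximum octree depth used in this paper
LEAF_SIZE = 0.3     # leaf voxel edge length in meters

# Edge length of the cube the tree can cover: 2^16 * 0.3 m = 19660.8 m.
MAP_EXTENT_M = (2 ** MAX_DEPTH) * LEAF_SIZE

@dataclass
class OctreeNode:
    """One voxel: log-odds occupancy plus a free counter (Secs. III-D, III-E)."""
    log_odds: float = 0.0
    free_counter: int = 1
    children: Optional[List["OctreeNode"]] = None  # None for leaf / pruned nodes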
B. Object Detection and Voxel Traversal
Object detection can be used to accelerate the process of
generating the occupancy map and to improve the precision and
recall scores for dynamic object removal. Our approach is
not dependent on any specific object detection method, but
in our experiments we used AVOD (Aggregate View Object
Detection) [13] network to get bounding boxes. Currently,
the network is trained to detect small and large vehicles.
However, many neural-network-based object detection
methods occasionally fail to detect objects. The models
can only detect the object classes which they have been
trained on. Furthermore, the bounding boxes often don't
completely encompass the detected objects, as shown in Fig.
3. So, point cloud maps built using the above method alone
will still contain some dynamic points.
To remove the dynamic objects which were missed by the
object detection method, we use the above method to classify
the points in a point cloud into two classes: object points,
points which lie inside bounding boxes of detected objects,
and non-object points, points outside the bounding boxes. For
each of the non-object points in the point cloud, we perform
voxel traversal to find all the voxels along the laser ray from
sensor origin to the endpoint. We decrease the occupancy
probability of all these voxels except the endpoint voxel,
and increase the occupancy probability of endpoint voxel.
Next, we follow the same steps for inserting the object points,
but instead of increasing the occupancy probability of the
endpoint voxel, we decrease its occupancy probability. This
is because we already know, from object detection, that the
object points correspond to a dynamic object. We explain
how the free counter value of a node/voxel is updated in Section
III-D.
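
For reference, below is a simplified version of the fast voxel traversal of [17] on a uniform grid; the octree case follows the same idea. This sketch ignores floating-point edge cases and excludes the endpoint voxel, which the insertion steps above handle separately.

import math

def ray_voxels(origin, endpoint, size=0.3):
    """Yield the integer voxel indices crossed by the ray from origin to
    endpoint (Amanatides & Woo [17]), excluding the endpoint's own voxel."""
    cur = [int(c // size) for c in origin]
    end = [int(c // size) for c in endpoint]
    direction = [e - o for o, e in zip(origin, endpoint)]
    step, t_max, t_delta = [], [], []
    for i in range(3):
        if direction[i] > 0:
            step.append(1)
            bound = (cur[i] + 1) * size      # next voxel boundary ahead
        else:
            step.append(-1)
            bound = cur[i] * size            # next voxel boundary behind
        if direction[i] != 0:
            t_max.append((bound - origin[i]) / direction[i])
            t_delta.append(size / abs(direction[i]))
        else:
            t_max.append(math.inf)           # ray never crosses this axis
            t_delta.append(math.inf)
    while cur != end:
        yield tuple(cur)
        axis = t_max.index(min(t_max))       # step across the closest boundary
        cur[axis] += step[axis]
        t_max[axis] += t_delta[axis]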
As mentioned previously, occasionally, the bounding
boxes generated around objects do not completely encompass
the object. We partially solve this problem by inserting object
points after non-object points have been inserted into the
occupancy map. This ordering ensures that the part of the
dynamic object not encompassed by the bounding box will be
removed when performing voxel traversal on object points.
C. Ground Point Detection
As noted in [3], performing voxel traversal on laser rays
sweeping a flat surface at shallow angles leads to undesirable
discretization effects. Voxels which have been measured
occupied during voxel traversal may be marked free when
traversing another nearby voxel. This effect usually occurs
on flat surfaces like the ground and flat walls, and is
rendered as holes in the flat surface, as shown in Fig. 4.
Octomap [3] overcomes this effect by updating a node only
once for a given point cloud and by giving preference
to occupied nodes over free nodes. However, this method
does not work in our case because we insert object points
after inserting non-object points, and we mark the nodes
corresponding to endpoints in the object point cloud as free.
Furthermore, the bounding box generated by object detection
may include the ground points under the detected objects and
falsely classify the ground points as dynamic object points.
As a result of this, some of the ground voxels may still be
marked as free.
To solve this problem, we maintain a counter per ground
voxel which indicates the number of times the voxel has been
classified as ground. We also maintain a set of all the detected
ground voxels and update this set for every lidar scan. Only
those voxels in the ground voxel set whose counter
value is greater than a certain threshold are considered true
ground voxels and are prevented from being marked as free
voxels during ray traversal. Thus, the voxels which have
been wrongly classified as ground voxels in a few instances
will have smaller counter value compared to the true ground
voxels which have been consistently detected in most of the
lidar scans. We use RANSAC based ground plane detection
described in [14], [18].
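
The bookkeeping for this voting scheme could look like the following sketch; the threshold value is an assumption, since the paper does not report one.

from collections import defaultdict

GROUND_VOTE_THRESHOLD = 5          # assumed value; tune per dataset

ground_votes = defaultdict(int)    # voxel index -> times classified as ground

def vote_ground(voxel):
    """Called for every ground-classified point's voxel in each lidar scan."""
    ground_votes[voxel] += 1

def is_true_ground(voxel):
    """True ground voxels may not be marked free during ray traversal; voxels
    only occasionally misclassified as ground stay below the threshold."""
    return ground_votes[voxel] >= GROUND_VOTE_THRESHOLD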
Fig. 3. Dynamic object points which are outside the bounding box become
part of the point cloud, causing false negatives. Object points are in red;
non-object points are in yellow.
Fig. 4. Holes on flat walls caused by laser rays incident on the surface at
shallow angles
D. Weighted Probabilities
Octomap [3] uses a clamping policy to allow easy updatability
and compressibility of the occupancy octree map. The
clamping policy ensures that the log-odds value of a node
does not fall below the low threshold lmin and does not go
beyond the high threshold lmax. A node is considered stable
when its log-odds value reaches either of the thresholds; such
nodes have been measured free or occupied with high
confidence.
The clamping policy ensures that all stable free and
occupied nodes have the same log-odds values, thus enabling
neighboring nodes with the same log-odds value to be
pruned. It also ensures that the occupancy states of the nodes
are easily updatable. For example, consider a robot which has
mapped a certain area and has measured a few nodes in front
of it as free. Now, if a person walks in front of the robot and
stands in its path, the robot should be able to quickly update
the nodes as occupied.
L(posterior) = L(measurements) + L(prior)                  (1)

L(posterior) = L(measurements) / free counter + L(prior)   (2)

where L represents log odds.
Fig. 5. We use the log-odds occupancy probability and free counter value
for pruning and expanding the nodes.

However, the occupancy update policy in [3] favors the latest
occupancy state of a voxel. In contrast, our goal is to
create an occupancy map of an area which represents the long-term
occupancy state of the environment, so we want our occupancy
update algorithm to be less sensitive to dynamic objects. We
achieve this by maintaining a free counter for each voxel and
by using a weighted probability. The free counter of a voxel
is used to count the number of times the voxel has been
measured free. It is incremented by one every time the voxel
is measured free during voxel traversal, and decremented by
one if its value is greater than one and the voxel is measured
occupied during voxel traversal.
To understand how this works, consider two scenarios.
In the first scenario, a voxel has a free counter value greater
than one and has been measured occupied when inserting
the current point cloud into the occupancy map. Since
the free counter value is greater than one, the voxel was
measured free previously, so there is a high possibility that
the voxel is occupied by a dynamic object in the current
point cloud. We therefore soften the probability update by
dividing the hit probability of the voxel by the free counter
value, as seen in equation (2). The intent is to make it hard
to increase the occupancy probability of a voxel which has
previously been measured free: the higher the free counter
value of a voxel, the harder it is to increase its occupancy
probability. The second scenario is when the voxel has been
wrongly marked as occupied due to a wrong detection by the
object detection method. In that case, we use the original
equation (1), which allows for easy updatability of the voxels
belonging to dynamic points.
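
A sketch of this update rule on the node type from Section III-A follows. The constants are assumptions (common OctoMap-style defaults), and the sketch applies equation (2) uniformly, since with a free counter of 1 it reduces to equation (1).

L_HIT, L_MISS = 0.85, -0.4   # assumed per-measurement log-odds increments
L_MIN, L_MAX = -2.0, 3.5     # clamping thresholds as in [3] (assumed values)
FC_MAX = 10                  # assumed free counter clamp (Sec. III-E)

def update_voxel(node, occupied):
    if occupied:
        # Eq. (2): weight the hit by the free counter; with a free
        # counter of 1 this is exactly the plain update of Eq. (1).
        node.log_odds += L_HIT / node.free_counter
        if node.free_counter > 1:
            node.free_counter -= 1
    else:
        node.log_odds += L_MISS
        node.free_counter = min(node.free_counter + 1, FC_MAX)
    # Clamp log-odds as in [3]; the free counter stays in [1, FC_MAX].
    node.log_odds = min(max(node.log_odds, L_MIN), L_MAX)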
E. Pruning and Expanding Nodes
The hierarchical structure of the octree enables pruning of
nodes for an efficient representation of space. If all the children
of an inner node have the same occupancy probability and
free counter value, the children can be pruned and their
occupancy probability and free counter value stored in
the parent node (refer Fig. 5). We also clamp the
occupancy probability in the same way as done in [3],
and clamp the free counter value to a maximum of
fcmax and a minimum of 1. After adding a sufficient
number of point clouds to the occupancy map, the nodes
corresponding to the static area will converge to the same maximum
occupancy probability score and maximum free counter value of
fcmax.
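
A sketch of the pruning test is given below; expansion is the inverse, copying the parent's values into eight fresh children.

def try_prune(parent):
    """Collapse the eight children into the parent when they have converged
    to identical log-odds and free counter values (refer Fig. 5)."""
    kids = parent.children
    if kids is None or any(k.children is not None for k in kids):
        return False                      # only prune directly above leaves
    first = kids[0]
    if all(k.log_odds == first.log_odds and
           k.free_counter == first.free_counter for k in kids):
        parent.log_odds = first.log_odds
        parent.free_counter = first.free_counter
        parent.children = None            # children are discarded
        return True
    return False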
Fig. 6. Virtual sphere used to generate artificial endpoints.
F. Spherical Projection
The endpoints far from the laser sensor are often noisy, so
when mapping an outdoor space it is necessary to trim the
point cloud to a certain distance from the sensor to avoid
noisy data. However, by removing the endpoints which fall
beyond a certain range from the laser sensor, we lose valu-
able information about free space. Even if these endpoints
are noisy, we know the area between the laser and these
endpoints is free. We use this information to refine our object
detection algorithm. The endpoints which lie beyond a radius
r from the sensor are projected back onto a virtual sphere
of radius r, centered at the sensor origin, as shown in Fig. 6.
Then, we perform voxel traversal for each endpoint projected
on the virtual sphere and decrease occupancy probability
of all the voxels along the rays from sensor origin to the
endpoints, including the voxel corresponding to the endpoint.
This allows us to artificially generate endpoints, for better
occupancy updates.
To project the points onto the virtual sphere, we transform
the points from Cartesian coordinates to spherical coordinates
(r, θ, φ). Keeping θ and φ constant, we transform the points
back to Cartesian space by replacing the radius of each
point with the radius of the sphere, rsphere.
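
Since θ and φ are unchanged, the projection amounts to rescaling the ray vector, as in this sketch (the function name is ours):

import math

def clip_to_sphere(point, origin, r_sphere):
    """Project an endpoint beyond r_sphere back onto the virtual sphere
    (same theta and phi, radius replaced by r_sphere); see Fig. 6."""
    dx, dy, dz = (point[i] - origin[i] for i in range(3))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r <= r_sphere:
        return point, False   # real endpoint: keep, endpoint voxel is occupied
    s = r_sphere / r          # rescaling == replacing r in (r, theta, phi)
    artificial = (origin[0] + dx * s, origin[1] + dy * s, origin[2] + dz * s)
    return artificial, True   # artificial endpoint: its voxel is freed too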
G. Occupancy Octree Map as a Binary Filter
The stability and accuracy of the occupancy octree map
increase with the integration of data from multiple data collection
drives over the same region. With the assumption that the
point cloud registration works reasonably well, the static
parts of the map from different data collection drives should
be mapped to the same set of voxels in the occupancy octree
map, thus increasing the occupancy score and stabilizing
these voxels. In contrast, the voxels corresponding to
the area of the occupancy octree map traversed by dynamic
objects would see fluctuations in occupancy values. With our
approach, only the voxels which are repeatedly measured
as occupied are marked as occupied, and once a voxel has been
measured as free, it becomes hard to mark it as occupied.
After the occupancy octree has stabilized, it can be super-
imposed on the point cloud map and used as a binary filter
to remove all the points which fall in the free space of the
octree map.
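
Given the stabilized map, the filter itself is a lookup per map point. A sketch follows, reusing voxel_key and the log_odds store from the Section I sketch; the 0.8 occupancy threshold matches the one reported in Table III.

import math

def filter_map(map_points, log_odds, occ_threshold=0.8):
    """Keep only points whose voxel is occupied in the stabilized octree map."""
    l_thresh = math.log(occ_threshold / (1.0 - occ_threshold))  # prob -> log-odds
    return [p for p in map_points
            if log_odds.get(voxel_key(p), 0.0) >= l_thresh]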
Fig. 7. The pipeline, from top to bottom: a) input lidar scan
with ghosting effect; b) octomap generated from the input scan; c) output
clean lidar scan. Generated on the KITTI dataset.
IV. EXPERIMENTS AND RESULTS
The goal of our experiments is to show that our approach
to dynamic object point removal is superior to removing
dynamic object points using object detection alone. We test
our approach on real-world data collected from busy streets.
We also benchmark our algorithm on the KITTI dataset.
A. Dataset
Our approach to removing dynamic object points from a
point cloud relies on building occupancy maps. It is hard to
estimate the time-invariant occupancy state of an environment
from a single set of scans of the area, as the scans may contain
moving objects. Therefore, we need multiple sets of scans
collected from the same area, preferably at different times, to
identify temporarily static objects like parked cars which would
otherwise be marked as occupied. To the best of our knowledge,
the KITTI dataset has only a few loops of driving through
the same parts of the map, mostly for loop closure. So, we also
build our own dataset using the setup described below.
Our setup consists of a Velodyne VLP-32C lidar mounted
on the car roof and a Novatel RTK GPS; the two sensors are
synchronized. The obtained point clouds are motion-compensated
using GPS data. We collected four sets of data, each consisting of
a single loop around the urban roads near the NIO office. We
use NDT [19] to match lidar scans.
B. Evaluation
We assess the performance of our approach statistically
using precision and recall. Precision represents the percentage
of removed points which actually correspond to dynamic
objects. Recall represents the percentage of all dynamic object
points that were actually removed.
Precision scores are negatively affected by false positives,
i.e., classifying static object points as dynamic object points.
Laser rays incident on flat surfaces at shallow angles are one
of the main causes of decreased precision scores. This effect
is very clearly explained in [3]. Furthermore, false positives
in object detection can also decrease precision score. To solve
the first problem, we use the ground plane detection method
explained in Section III-C. This strategy helps minimize false
positives but does not eliminate them.
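
Concretely, with the removed points and the ground-truth dynamic points treated as sets, the two scores are computed as in this sketch:

def precision_recall(removed, dynamic):
    """removed: points our filter discarded; dynamic: ground-truth dynamic points."""
    true_pos = len(removed & dynamic)
    precision = true_pos / len(removed) if removed else 1.0
    recall = true_pos / len(dynamic) if dynamic else 1.0
    return precision, recall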
C. Results
To evaluate our algorithm's accuracy, we test our
proposed method on the KITTI dataset and compare it against [9].
Table I gives an overview of how our algorithm performed on
various KITTI sequences. We used KITTI sequences that have
wide roads and moving cars to better demonstrate our algorithm's
capabilities. KITTI provides ground truth for moving objects,
which we use to calculate P&R values. Table I shows better
recall than precision values. This is because we do not run our
object detection pipeline on the KITTI dataset and we only have
one loop of each KITTI sequence. The values would improve
significantly with more loops of the same sequence.
We also ran our method on the sequences mentioned in [9], as
shown in Table II, but we observe poor P&R values, as those
sequences have narrow roads surrounded by flat building walls.
This causes false positives, leading to poor precision values.
Table III shows the proposed method's performance on our
dataset. We show differences in P&R values with and without
object detection. We also show that spherical projection helps
reduce false positives and false negatives. The P&R values
are much better compared to Table I for a number of reasons.
Firstly, we use object detection to remove dynamic objects.
Secondly, we process the point cloud in the particular order
explained in Section III. Thirdly, we have multiple loops of
the data, as shown in Table III, which improves precision
values. We do not perform manual labelling of our dataset,
hence we do not have ground truth; instead, we treat the output
of the object detection method as absolute ground truth and
calculate the P&R scores against it. Even so, our method does
significantly better, even with one loop of driving data. The
overall pipeline is shown in Fig. 7.

TABLE I
KITTI DATA CATEGORY: WIDE ROADS / HIGHWAYS

            number      w/o Obj Detection
            of scans    P        R
seq 004     345         0.738    0.570
seq 015     303         0.492    0.579
seq 016     285         0.495    0.738
seq 042     1176        0.460    0.827

TABLE II
KITTI DATA CATEGORY: NARROW CITY ROADS

            number      Proposed Method      Postica et al. [9]
            of scans    P        R           P        R
seq 091     346         0.14     0.51        0.25     0.75
seq 095     274         0.21     0.53        0.19     0.79
seq 104     318         0.11     0.49        0.44     0.87
Since the method of [5] uses very dense point clouds
generated from a terrestrial scanner, we cannot directly
compare our results against it. It also requires expensive
surface normal computation, while our method relies on the
weighted probability update described in Section III-D.
Another observation from our dataset is that, as we collect
more data, our method gives almost identical P&R numbers with
and without object detection, as seen for loop4 in Table III.
This is why we describe object detection as an optional
component of our method.
Unlike [6], our pipeline is robust to scenarios with stop-and-go
traffic, and it also performs well under consistent motion.
The pipeline struggles with narrow roads surrounded by
tall flat buildings due to discretization effects on the flat walls.
The results would improve further with perfect object detection
and ground detection.
V. CONCLUSIONS
In this paper, we propose a novel and robust method to
remove dynamic objects from point cloud maps. We use the
occupancy octree map to create clean point clouds. There are
limitations with the implementation of the occupancy octree in
Octomap: it is not scalable to large-scale outdoor maps. The
area covered by the occupancy octree map is limited by the
maximum depth and the size of the voxels at maximum depth.
In our case, with a maximum octree depth of 16 and a voxel
size of 0.3 meters, it can only cover a volume of (2^16 × 0.3 m)^3,
which is about 7599.82 cubic kilometers. This limitation can
be overcome by tiling, which is the next topic of our research.
We rely on multiple loops of data in scenarios with heavy
traffic due to lidar occlusions. Our method is robust and can
be considered for building long-term, good-quality 3D point
cloud maps.
ACKNOWLEDGMENT
This work was supported and developed at NIO USA Inc.
We would like to thank NIO USA Inc. for sponsoring this
R&D work.
TABLE III
PRECISION AND RECALL RESULTS ON OUR DATASET
(PROBABILITY OCCUPANCY THRESHOLD = 0.8)

         w/o Obj Detection    w. Obj Detection    w/o Virtual Sphere
         P        R           P        R          P        R
loop1    0.163    0.939       0.161    0.917      0.177    0.659
loop2    0.191    0.890       0.192    0.874      0.229    0.790
loop3    0.215    0.878       0.214    0.870      0.268    0.793
loop4    0.226    0.868       0.225    0.860      0.307    0.811
REFERENCES
[1] R. Triebel, P. Pfaff, and W. Burgard, “Multi-level surface maps
for outdoor terrain mapping and loop closing,” in 2006 IEEE/RSJ
international conference on intelligent robots and systems. IEEE,
2006, pp. 2276–2282.
[2] I.-S. Kweon, M. Hebert, E. Krotkov, and T. Kanade, “Terrain mapping
for a roving planetary explorer,” in IEEE International Conference on
Robotics and Automation. IEEE, 1989, pp. 997–1002.
[3] A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Bur-
gard, “Octomap: An efficient probabilistic 3d mapping framework
based on octrees,” Autonomous robots, vol. 34, no. 3, pp. 189–206,
2013.
[4] Y. Roth-Tabak and R. Jain, “Building an environment model using
depth information,” Computer, vol. 22, no. 6, pp. 85–90, 1989.
[5] J. Schauer and A. Nüchter, “The peopleremover—removing dynamic
objects from 3-d point cloud data by traversing a voxel occupancy
grid,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1679–
1686, 2018.
[6] D. Yoon, T. Tang, and T. Barfoot, “Mapless online detection of
dynamic objects in 3d lidar,” in 2019 16th Conference on Computer
and Robot Vision (CRV). IEEE, 2019, pp. 113–120.
[7] B. Vallet, W. Xiao, and M. Brédif, “Extracting mobile objects in
images using a velodyne lidar point cloud,” ISPRS Annals of the
Photogrammetry, Remote Sensing and Spatial Information Sciences,
vol. 2, no. 3, p. 247, 2015.
[8] W. Xiao, B. Vallet, and N. Paparoditis, “Change detection in 3d
point clouds acquired by a mobile mapping system,” ISPRS Annals of
Photogrammetry, Remote Sensing and Spatial Information Sciences,
vol. 1, no. 2, pp. 331–336, 2013.
[9] G. Postica, A. Romanoni, and M. Matteucci, “Robust moving objects
detection in lidar data exploiting visual cues,” in 2016 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS).
IEEE, 2016, pp. 1093–1098.
[10] A. Dewan, T. Caselitz, G. D. Tipaldi, and W. Burgard, “Motion-based
detection and tracking in 3d lidar scans,” in 2016 IEEE International
Conference on Robotics and Automation (ICRA). IEEE, 2016, pp.
4508–4513.
[11] Y. Zhou and O. Tuzel, “Voxelnet: End-to-end learning for point cloud
based 3d object detection,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2018, pp. 4490–4499.
[12] S. Shi, X. Wang, and H. Li, “Pointrcnn: 3d object proposal gener-
ation and detection from point cloud,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, 2019, pp.
770–779.
[13] J. Ku, M. Mozifian, J. Lee, A. Harakeh, and S. L. Waslander, “Joint
3d proposal generation and object detection from view aggregation,”
in 2018 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS). IEEE, 2018, pp. 1–8.
[14] M. A. Fischler and R. C. Bolles, “Random sample consensus: a
paradigm for model fitting with applications to image analysis and
automated cartography,” Communications of the ACM, vol. 24, no. 6,
pp. 381–395, 1981.
[15] P. Ruchti and W. Burgard, “Mapping with dynamic-object probabilities
calculated from single 3d range scans,” in 2018 IEEE International
Conference on Robotics and Automation (ICRA). IEEE, 2018, pp.
6331–6336.
[16] D. Meagher, “Geometric modeling using octree encoding,” Computer
graphics and image processing, vol. 19, no. 2, pp. 129–147, 1982.
[17] J. Amanatides, A. Woo et al., “A fast voxel traversal algorithm for ray
tracing,” in Eurographics, vol. 87, no. 3, 1987, pp. 3–10.
[18] M. Y. Yang and W. Förstner, “Plane detection in point cloud data,” in
Proceedings of the 2nd int conf on machine control guidance, Bonn,
vol. 1, 2010, pp. 95–104.
[19] P. Biber and W. Straßer, “The normal distributions transform: A new
approach to laser scan matching,” in Proceedings 2003 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS
2003)(Cat. No. 03CH37453), vol. 3. IEEE, 2003, pp. 2743–2748.