Fig 3. Relative pose of the Velodyne scanner and the omni-directional camera.
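A sensor-to-sensor pose like this is an SE(3) extrinsic calibration. The sketch below shows the generic way such a transform is applied, mapping Velodyne points into the camera frame; the name T_cam_lidar and the rotation/translation values are illustrative assumptions, not the calibration reported with this figure.

```python
import numpy as np

# Hypothetical rigid-body extrinsics from the Velodyne (LiDAR) frame to
# the camera frame; R and t are illustrative placeholders, not the
# calibration result reported with this figure.
R = np.eye(3)                       # 3x3 rotation, LiDAR -> camera
t = np.array([0.0, -0.08, -0.27])   # translation in meters (assumed)

T_cam_lidar = np.eye(4)             # 4x4 homogeneous transform
T_cam_lidar[:3, :3] = R
T_cam_lidar[:3, 3] = t

def lidar_to_camera(points_lidar):
    """Map an (N, 3) array of Velodyne points into the camera frame."""
    homo = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    return (T_cam_lidar @ homo.T).T[:, :3]

print(lidar_to_camera(np.array([[5.0, 0.5, -0.2]])))  # one LiDAR return
```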

Similar publications

Chapter
Full-text available
We propose in this paper a minimal speed-based pedestrian model for which particle dynamics are intrinsically collision-free. The speed model is an optimal velocity function depending on the agent length (i.e. particle diameter), maximum speed and time gap parameters. The direction model is a weighted sum of exponential repulsion from the neighbour...
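A minimal sketch of this kind of collision-free speed model, under common assumptions: speed is an optimal-velocity function of the spacing to the nearest neighbour ahead, and direction is the preferred direction plus a weighted sum of exponential neighbour repulsion. Since the abstract is truncated here, every parameter name and value (ell, v_max, time_gap, a, D) is an assumption for illustration, not the paper's calibration.

```python
import numpy as np

def optimal_speed(spacing, ell=0.3, v_max=1.34, time_gap=1.0):
    """Collision-free speed: zero until the headway exceeds the agent
    diameter ell, then linear in the spacing, capped at v_max.
    Parameter values are typical pedestrian defaults, assumed here."""
    return np.minimum(v_max, np.maximum(0.0, (spacing - ell) / time_gap))

def desired_direction(pos_i, neighbours, e0, a=5.0, D=0.1):
    """Direction model: preferred direction e0 plus a weighted sum of
    exponential repulsion from each neighbour (a, D assumed weights)."""
    e = np.asarray(e0, dtype=float).copy()
    for pos_j in neighbours:
        diff = pos_i - pos_j
        dist = max(np.linalg.norm(diff), 1e-9)
        e += a * np.exp(-dist / D) * diff / dist
    return e / np.linalg.norm(e)

# One agent 0.8 m behind another, walking in +x.
print(optimal_speed(spacing=0.8))                     # reduced speed
print(desired_direction(np.array([0.0, 0.0]),
                        [np.array([0.8, 0.1])],
                        e0=np.array([1.0, 0.0])))     # slight avoidance
```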

Citations

... The concept of semantic segmentation was first proposed by Lee et al. [20] and is defined as assigning a pre-defined label to each pixel in an image to represent its semantic category. Being able to extract the information an image conveys, based on its texture, scene, and other high-level semantic features, is more practical. ...
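The per-pixel labeling definition above can be illustrated in a few lines; the toy sizes and random scores below are assumptions standing in for a real network's output.

```python
import numpy as np

# Per-pixel labeling in miniature: a network outputs a class-score map,
# and each pixel is assigned its highest-scoring semantic category.
num_classes, H, W = 3, 4, 4                  # toy sizes (assumed)
logits = np.random.randn(num_classes, H, W)  # stand-in for network output
label_map = logits.argmax(axis=0)            # (H, W) map of class indices
print(label_map)       # e.g. 0 = road, 1 = vehicle, 2 = pedestrian
```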
Article
Full-text available
This paper elaborates on three-dimensional scene reconstruction methods based on point clouds, images, and the fusion of the two. Relevant evaluation indicators are used to assess the performance of traffic scene reconstruction techniques. Problems in traffic scene reconstruction from image and point-cloud data are analyzed and summarized. Finally, the challenges and future research directions in image- and point-cloud-based traffic scene reconstruction are pointed out.
... Knowledge of the 3D environment structure and the motion of dynamic objects is essential for autonomous navigation (Shashua, Gdalyahu, and Hayun 2004; Geiger et al. 2014). The 3D structure is valuable because it implicitly models the relative position of the agent, and it is also utilized to improve the performance of high-level scene understanding tasks such as detection and segmentation (Lee et al. 2015, 2017; Yang et al. 2018; Shin, Kwon, and Tomizuka 2019; Behley et al. 2019; Lee et al. 2019b). Besides scene structure, the 3D motion of the agent and traffic participants such as pedestrians and vehicles is also required for safe driving. ...
Article
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion, and depth in a monocular camera setup without supervision. Our technical contributions are three-fold. First, we highlight the fundamental difference between inverse and forward projection while modeling the individual motion of each rigid object, and propose a geometrically correct projection pipeline using a neural forward projection module. Second, we design a unified instance-aware photometric and geometric consistency loss that holistically imposes self-supervisory signals for every background and object region. Lastly, we introduce a general-purpose auto-annotation scheme using any off-the-shelf instance segmentation and optical flow models to produce video instance segmentation maps that will be utilized as input to our training pipeline. These proposed elements are validated in a detailed ablation study. Through extensive experiments conducted on the KITTI and Cityscapes datasets, our framework is shown to outperform the state-of-the-art depth and motion estimation methods. Our code, dataset, and models are publicly available.
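The inverse-projection side of the inverse/forward distinction drawn above is the standard warping step in self-supervised depth pipelines: sample the source image where each target pixel, back-projected through its depth and the relative pose, lands. The sketch below is that generic inverse warp, not the paper's forward-projection module; all shapes and names (T_tgt_to_src, K) are assumptions.

```python
import torch
import torch.nn.functional as F

def inverse_warp(src_img, tgt_depth, T_tgt_to_src, K):
    """Generic inverse warp: back-project target pixels with their depth,
    move them into the source frame, re-project, and bilinearly sample.
    Shapes: src_img (B,3,H,W), tgt_depth (B,1,H,W), T (B,4,4), K (B,3,3)."""
    B, _, H, W = src_img.shape
    dev = src_img.device
    ys, xs = torch.meshgrid(
        torch.arange(H, device=dev, dtype=torch.float32),
        torch.arange(W, device=dev, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(1, 3, -1)
    rays = torch.linalg.inv(K) @ pix                    # (B,3,HW)
    pts = rays * tgt_depth.reshape(B, 1, -1)            # scale by depth
    pts_h = torch.cat([pts, torch.ones(B, 1, H * W, device=dev)], 1)
    src_pts = (T_tgt_to_src @ pts_h)[:, :3]             # source-frame 3D
    uvw = K @ src_pts                                   # re-project
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)        # perspective divide
    gx = 2.0 * uv[:, 0] / (W - 1) - 1.0                 # to [-1, 1]
    gy = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], -1).reshape(B, H, W, 2)
    return F.grid_sample(src_img, grid, align_corners=True)
```

A photometric consistency loss then compares the warped source image against the target image; forward projection, as the paper argues, instead pushes source pixels into the target frame, which changes the geometry of where occlusions and holes appear.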
Preprint
Full-text available
Preprint version of the article above, with the same citing context and abstract; code, dataset, and models are available at https://github.com/SeokjuLee/Insta-DM .