Fig 3 - uploaded by Amnon Shashua
Three-dimensional rendering of the estimated depth map from the images in Fig. 1. These images show the inverse depth, k̂ = 1/Z, which is the natural value to compute. In (b), the texture map was removed to show the detail and the flaws.

Source publication
Article
Full-text available
We describe a direct method for estimating structure and motion from image intensities of multiple views. We extend the direct methods of Horn and Weldon (1988) to three views. Adding the third view enables us to solve for motion and compute a dense depth map of the scene, directly from image spatio-temporal derivatives in a linear manner without f...
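As a hedged illustration of the direct-method idea (a minimal sketch, not the authors' three-view formulation): under brightness constancy and a purely translational camera motion with known translation, the image spatio-temporal derivatives determine the inverse depth at each pixel linearly. The function name and the pure-translation assumption below are illustrative only.

```python
import numpy as np

def inverse_depth(Ix, Iy, It, x, y, t, f=1.0, eps=1e-8):
    """Per-pixel inverse depth k = 1/Z from brightness constancy,
    assuming a purely translational camera motion t = (tx, ty, tz).

    Brightness constancy:  Ix*u + Iy*v + It = 0
    Translational flow:    u = (x*tz - f*tx) * k,  v = (y*tz - f*ty) * k
    =>  k = -It / (Ix*(x*tz - f*tx) + Iy*(y*tz - f*ty))
    """
    tx, ty, tz = t
    denom = Ix * (x * tz - f * tx) + Iy * (y * tz - f * ty)
    # Guard against near-zero denominators (aperture/degenerate pixels).
    return -It / np.where(np.abs(denom) < eps, eps, denom)
```

In practice such per-pixel estimates are noisy; the paper's point is that adding a third view lets motion and a dense depth map be solved together, still linearly in the derivatives.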

Similar publications

Article
Full-text available
The aim of this study is to present a fast parallel implementation of the Horn and Schunck method using a new kind of recurrent neural network called discrete Zhang neural networks. This network is characterized by the few iterations it needs to converge, which makes it very suitable for real-time motion estimation. To compute the optical flow, we propose to sol...
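For reference, the Horn and Schunck scheme being parallelized can be sketched in its classic Jacobi-iteration form (a minimal illustration, not the discrete Zhang neural network of the paper):

```python
import numpy as np

def horn_schunck(Ix, Iy, It, alpha=1.0, n_iter=100):
    """Classic Horn-Schunck optical flow via fixed-point iteration.
    Ix, Iy, It: spatial/temporal image derivatives (same shape).
    alpha: regularization weight on flow smoothness."""
    u = np.zeros_like(Ix, dtype=float)
    v = np.zeros_like(Ix, dtype=float)

    def avg(f):  # 4-neighbour average with replicated borders
        p = np.pad(f, 1, mode='edge')
        return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(n_iter):
        u_bar, v_bar = avg(u), avg(v)
        t = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - Ix * t
        v = v_bar - Iy * t
    return u, v
```

The many cheap, local, identical updates are exactly what makes the method attractive for the parallel, few-iteration solvers the abstract describes.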
Conference Paper
Full-text available
In this paper we present a method for achieving real-time view interpolation in a virtual navigation application that uses a collection of pre-captured panoramic views as a representation of the environment. In this context, viewpoint interpolation is essential to achieve smooth and realistic viewpoint transition while the user is moving from one p...
Conference Paper
Full-text available
We present a mapping approach to road scene awareness based on active stereo vision. We generalise traditional static multi-camera rectification techniques to enable active epipolar rectification with a mosaic representation of the output. The approach is used to apply standard static depth mapping and optical flow techniques to the active case. We...
Conference Paper
Full-text available
In this paper, we present an observation model based on the Lucas and Kanade algorithm for computing optical flow, to track objects using particle filter algorithms. Although optical flow information enables us to know the displacement of objects present in a scene, it cannot be used directly to displace an object model since flow calculation techn...
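The Lucas and Kanade computation used as the observation model can be sketched for a single window: stack the per-pixel brightness-constancy equations of the patch and solve the 2x2 least-squares system. A minimal single-patch sketch; the function name is illustrative.

```python
import numpy as np

def lucas_kanade(Ix, Iy, It):
    """Lucas-Kanade flow for one patch: solve, in least squares,
        [Ix Iy] [u v]^T = -It   over all pixels of the window,
    i.e. the normal equations with the structure tensor on the left."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # (N, 2) design matrix
    b = -It.ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # (u, v)
```

As the abstract notes, such flow gives displacement but not directly an object-model displacement; in a particle filter it would instead score or shift hypotheses.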
Conference Paper
Full-text available
In this paper we address the problem of dense stereo matching and computation of optical flow. We propose a generalized dense correspondence computation algorithm, so that stereo matching and optical flow can be performed robustly and efficiently at the same time. We particularly target automotive applications and tested our method on real sequence...

Citations

... Prior to the development of robust feature point descriptors, pixel-intensity-based techniques for depth estimation from image sequences were also common [30,19,29,18,15,39,33]. Our method thus falls in this category. ...
Preprint
We present a feature-free photogrammetric technique that enables quantitative 3D mesoscopic (mm-scale height variation) imaging with tens-of-micron accuracy from sequences of images acquired by a smartphone at close range (several cm) under freehand motion without additional hardware. Our end-to-end, pixel-intensity-based approach jointly registers and stitches all the images by estimating a coaligned height map, which acts as a pixel-wise radial deformation field that orthorectifies each camera image to allow homographic registration. The height maps themselves are reparameterized as the output of an untrained encoder-decoder convolutional neural network (CNN) with the raw camera images as the input, which effectively removes many reconstruction artifacts. Our method also jointly estimates both the camera's dynamic 6D pose and its distortion using a nonparametric model, the latter of which is especially important in mesoscopic applications when using cameras not designed for imaging at short working distances, such as smartphone cameras. We also propose strategies for reducing computation time and memory, applicable to other multi-frame registration problems. Finally, we demonstrate our method using sequences of multi-megapixel images captured by an unstabilized smartphone on a variety of samples (e.g., painting brushstrokes, circuit board, seeds).
... In image-based (photometric) optimization there is always a distinguished reference frame providing fixed measurements [38,39,9]. Selecting a single reference in photometric VSLAM is unnecessary and may be inadvisable. It is unnecessary as the density of reconstruction is not our main goal. ...
... The use of direct algorithms for SFM applications was studied for small-scale problems [68,69,70,39], but feature-based alignment has proven more successful in handling wide-baseline matching problems [12], as small pixel displacement is an integral assumption of direct methods. Nonetheless, with the increasing availability of high frame-rate cameras, video applications, and increasing computational power, direct methods are demonstrating great promise [5,6,9]. ...
... In this work, in contrast to previous research in direct image-based alignment [9,39], we show that provided good initialization, it is possible to jointly refine the structure and motion parameters by minimizing the photometric error and without restricting the camera motion or the scene structure. ...
Article
Full-text available
We propose a novel algorithm for the joint refinement of structure and motion parameters from image data directly, without relying on fixed and known correspondences. In contrast to traditional bundle adjustment (BA), where the optimal parameters are determined by minimizing the reprojection error using tracked features, the proposed algorithm relies on maximizing the photometric consistency and estimates the correspondences implicitly. Since the proposed algorithm does not require correspondences, its application is not limited to corner-like structure; any pixel with nonvanishing gradient could be used in the estimation process. Furthermore, we demonstrate the feasibility of refining the motion and structure parameters simultaneously using the photometric error in unconstrained scenes and without requiring restrictive assumptions such as planarity. The proposed algorithm is evaluated on a range of challenging outdoor datasets, and it is shown to improve upon the accuracy of state-of-the-art VSLAM methods obtained by minimizing the reprojection error with traditional BA as well as loop closure.
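The photometric-consistency objective described above can be illustrated with a minimal sketch: back-project pixels of a reference image using hypothesised depths, reproject them into a target view, and sum squared intensity differences. The function names and the simple pinhole model are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinear intensity sample at continuous coordinates (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x0 + 1]
            + (1 - dx) * dy * img[y0 + 1, x0] + dx * dy * img[y0 + 1, x0 + 1])

def photometric_error(I_ref, I_tgt, pts, depths, R, t, K):
    """Sum of squared intensity differences after reprojecting pixels
    `pts` (list of (u, v)) of I_ref, with hypothesised `depths`, into
    I_tgt via rotation R, translation t and intrinsics K (minimal sketch)."""
    Kinv = np.linalg.inv(K)
    err = 0.0
    for (u, v), z in zip(pts, depths):
        X = z * (Kinv @ np.array([u, v, 1.0]))   # back-project to 3D
        x = K @ (R @ X + t)                      # reproject into target view
        u2, v2 = x[0] / x[2], x[1] / x[2]
        err += (bilinear(I_tgt, u2, v2) - I_ref[int(v), int(u)]) ** 2
    return err
```

Joint refinement would minimize this error over (R, t) and the depths, with the bilinear sampling supplying the implicit, sub-pixel correspondences.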
... Partially, the problem of their presence can be solved by post-correction of the displacement vector field, such as spatial filtering [7], smoothing [8] and so on. Another way to reduce their influence is to increase the subset size [9,10]. This size must be large enough to compare several portions of the images and minimize the influence of noise, yet small enough to cause minimal smoothing of the displacement vector fields. ...
Article
Full-text available
An algorithm for selecting the size of the correlation kernel in displacement vector field construction by the digital image correlation method is proposed. The algorithm has been tested on simulated and experimental optical images with different textures. The influence of the correlation kernel size and image texture on noise immunity in determining displacements has been studied. It is shown that the proposed algorithm finds the size that provides the minimum error in the determination of displacements and estimation of deformation.
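The subset (correlation kernel) matching underlying digital image correlation can be sketched as an exhaustive integer-pixel search maximizing zero-normalized cross-correlation; the `size` parameter below is the kernel size the paper's algorithm selects. A minimal illustration, not the proposed selection algorithm itself.

```python
import numpy as np

def zncc(a, b):
    """Zero-normalised cross-correlation between two equal-size patches."""
    a = a - a.mean()
    b = b - b.mean()
    d = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / d if d > 0 else 0.0

def match_subset(ref, img, top_left, size, search=5):
    """Integer-pixel displacement of one subset: exhaustively search
    +/- `search` pixels around its position for the best ZNCC score."""
    r, c = top_left
    subset = ref[r:r + size, c:c + size]
    best, best_dv = -2.0, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            if rr < 0 or cc < 0 or rr + size > img.shape[0] or cc + size > img.shape[1]:
                continue  # candidate window falls outside the image
            score = zncc(subset, img[rr:rr + size, cc:cc + size])
            if score > best:
                best, best_dv = score, (dr, dc)
    return best_dv, best
```

The size trade-off discussed above shows up directly here: a larger `size` averages out noise in the ZNCC score but smooths over spatial variation of the true displacement field.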
... Another method of solving the problem is to increase the aperture (computation window) size; in this case, some constraints are imposed on the global model [10,11]. It then becomes possible to use computation windows whose size is comparable with that of the entire image, which eliminates the main "bottleneck" of this approach, i.e., the lack of local information. ...
Article
An algorithm for constructing displacement vector fields, based on an incremental approach to estimating displacements of image fragments, is proposed. Its application to processing model and experimental optical images is analyzed. The noise immunity of the incremental algorithm for determining displacements is compared with that of the combined algorithm developed by the authors. It is shown that the proposed incremental algorithm ensures a more noise-immune estimate of displacements when processing images of a deformation relief that varies substantially in time.
... where γ is the orientation of the normal flow at image position x. Stein et al. proposed the use of three views imaging the same scene to estimate the camera motion [9]. Suppose the general motion between view 1 and view 2 is motion 1: (t₁, w₁), and between view 1 and view 3 is motion 2: (t₂, w₂), as shown in Fig. 1. ...
... Unlike the work of [7,9], we derived (7) using the vector entities tₐ and wₐ as defined in (6). This facilitates the development of our proposed method later. ...
Conference Paper
Full-text available
Calibrating hand-eye geometry is often based on explicit feature correspondences. This article presents an alternative method that uses the apparent flow induced by the motion of the camera to achieve self-calibration. To make the method more robust against noise, the strategy is to use the orientation of the normal flow field which is more noise-immune, to recover first the direction component of the hand-eye geometry. Outliers in the extracted flow data are identified using some intrinsic properties of the flow field together with the partially recovered hand-eye geometry. The final complete solution is refined using a robust process. The proposed method can also be used for determining the relative geometry of multiple cameras without demanding overlap in the visual fields of the cameras. Experimental results on synthetic data and real image data are shown to illustrate the feasibility of the method.
... Another way of incorporating photometric and geometric constraints for structure and motion analysis was studied by Stein and Shashua in [SS00]. Optical flow provides a dense motion field that can be regarded as photometric information, but it suffers from the aperture problem, which is troublesome for scenes with long straight edges, and also from the constant-intensity assumption, which is, on the contrary, overcome by contrast-invariant descriptors such as SIFT. ...
Thesis
The analysis of structure and motion makes it possible to estimate the shape of 3D objects and the position of the camera from photos or videos. It is most often carried out through the following steps: 1) extraction of interest points; 2) matching of interest points between images using photometric descriptors of point neighbourhoods; 3) filtering of the matches produced in the previous step so as to keep only those compatible with a fixed geometric constraint, whose parameters can then be computed. However, the photometric resemblance alone used in the second step is insufficient when several points have the same appearance. Moreover, the last step is performed by a robust filtering algorithm, Ransac, which requires setting thresholds, and this turns out to be a delicate operation. The starting point of this work is the A Contrario Ransac approach of Moisan and Stival, which makes it possible to dispense with thresholds. Our first contribution then consisted in developing an a contrario model that performs matching using photometric and geometric criteria, together with robust filtering, in a single step. This method makes it possible to match scenes containing repeated patterns, which is not possible with the usual approach. Our second contribution extends this result to strong viewpoint changes, improving the ASift method of Morel and Yu. It yields correspondences that are more numerous and more densely distributed in difficult scenes containing repeated patterns observed from very different angles.
... Another way of incorporating photometric and geometric constraints for structure and motion analysis was studied by Stein and Shashua in [SS00]. Optical flow provides a dense motion field that can be regarded as photometric information, but it suffers from the aperture problem, which is troublesome for scenes with long straight edges, and also from the constant-intensity assumption, which is, on the contrary, overcome by contrast-invariant descriptors such as SIFT. ...
Article
The analysis of structure from motion allows one to estimate the shape of 3D objects and the position of the camera from pictures or videos. It usually follows these three steps: 1) extracting points of interest, 2) matching points of interest using photometric descriptors computed on point neighborhoods, 3) filtering previous matches so as to retain only those compatible with a geometric constraint, whose parameters can then be computed. However, for the second step, the photometric criterion is not enough on its own when several points are alike. As for the third step, it uses the Ransac robust filtering scheme, which requires setting thresholds, and that can be a difficult task. This work is based on Moisan and Stival's A Contrario Ransac approach, which allows one to set thresholds automatically. After assessing that method, the first contribution was the elaboration of an a contrario model which simultaneously achieves robust filtering and matching through both geometric and photometric criteria. That method allows one to match scenes with repeated patterns, which is impossible with the usual approach. The second contribution extended that result to strong viewpoint changes, improving the ASift method. The matches obtained are both more numerous and more densely distributed, in scenes containing many repeated patterns seen from very different angles.
... Another way of incorporating photometric and geometric constraints for structure and motion estimation has been investigated by Stein and Shashua in [56]. Optical flow provides photometric information but suffers from the well-known aperture problem, which is troublesome for scenes with long straight edges (and also suffers from the constant-intensity assumption, which is, on the contrary, overcome by contrast-invariant descriptors such as SIFT). ...
... Optical flow provides photometric information but suffers from the well-known aperture problem, which is troublesome for scenes with long straight edges (and also suffers from the constant-intensity assumption, which is, on the contrary, overcome by contrast-invariant descriptors such as SIFT). To avoid this, the authors of [56] propose to build the so-called tensor brightness constraint, which is based on both the optical flow and the trifocal tensor encoding the geometry between three views [22]. However, like every method based on optical flow, this cannot be extended to large transformations between views, which would make the optical flow estimation unreliable. ...
Article
Full-text available
Matching or tracking interest points between several views is one of the keystones of many computer vision applications. The procedure generally consists of several independent steps: basically, interest point extraction, then interest point matching by keeping only the "best correspondences" with respect to similarity between some local descriptors, and final correspondence pruning to keep those that are consistent with a realistic camera motion (here, consistent with epipolar constraints or a homography transformation). Each step in itself is a delicate task which may endanger the whole process. In particular, repeated patterns give lots of false correspondences in descriptor-based matching which are hardly, if ever, recovered by the final pruning step. We discuss here the specific difficulties raised by repeated patterns in the point correspondence problem. Then we show to what extent it is possible to address these difficulties. Starting from a statistical model by Moisan and Stival, we propose a one-stage approach for matching interest points based on simultaneous descriptor similarity and geometric constraints. The resulting algorithm has adaptive matching thresholds and is able to pick up point correspondences beyond the nearest neighbour. We also discuss Generalized Ransac, and we show how to improve Morel and Yu's ASift, an effective point matching algorithm, to make it more robust to the presence of repeated patterns.
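The threshold-dependent pruning step that the a contrario approach replaces can be illustrated with a plain RANSAC sketch under a pure-translation model (hypothetical function names; real pipelines use epipolar or homography models):

```python
import numpy as np

def ransac_translation(p1, p2, thresh=1.0, n_iter=200, rng=None):
    """Prune correspondences with RANSAC under a pure-translation model.
    p1, p2: (N, 2) arrays of matched points. Returns (translation, inlier mask).
    The fixed inlier threshold `thresh` is exactly what a contrario
    methods replace with an adaptive, statistically grounded criterion."""
    rng = np.random.default_rng(rng)
    best_t, best_mask = None, np.zeros(len(p1), dtype=bool)
    for _ in range(n_iter):
        i = rng.integers(len(p1))          # 1-point minimal sample
        t = p2[i] - p1[i]                  # hypothesised translation
        mask = np.linalg.norm(p2 - (p1 + t), axis=1) < thresh
        if mask.sum() > best_mask.sum():
            best_t, best_mask = t, mask
    best_t = (p2[best_mask] - p1[best_mask]).mean(axis=0)  # refit on inliers
    return best_t, best_mask
```

With repeated patterns, many false matches can be mutually consistent, which is why a fixed `thresh` is fragile and a one-stage, adaptive criterion is attractive.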
... Direct approaches are also called intensity-based, appearance-based, template-based, or even texture-based in the computer vision community. In these approaches the intensity values of the pixels are directly exploited to recover the related parameters (Irani and Anandan (1999); Stein and Shashua (2000)). This means that, in contrast with feature-based approaches, there is no feature extraction step. ...
Article
Autonomous navigation in unknown environments has been the focus of attention in the mobile robotics community for the last three decades. When neither the location of the robot nor a map of the region is known, localization and mapping are two tasks that are highly inter-dependent and must be performed concurrently. This problem is known as Simultaneous Localization and Mapping (SLAM). In order to gather accurate information about the environment, mobile robots are equipped with a variety of sensors that together form a perception system allowing accurate localization and reconstruction of reliable and consistent representations of the environment. We believe that a perception system composed of the odometry of the robot, an omnidirectional camera and a 2D laser range finder provides enough information to solve the SLAM problem robustly. In this context we propose an appearance-based approach to solve the SLAM problem and reconstruct a reliable 3D representation of the environment. This approach relies on a tightly-coupled laser/omnidirectional sensor in order to take advantage of the complementarity of each sensor modality. A novel generic robot-centered representation that is well adapted to appearance-based SLAM is proposed. This augmented spherical view is constructed using the depth information from the laser range finder and the floor plane, together with lines extracted from the omnidirectional image. The appearance-based localization method minimizes a non-linear cost function built directly from the augmented spherical view. However, recursive optimization methods suffer from convergence problems when initialized far from the solution. This is also true for our method, where an initialization sufficiently close to the solution is needed to ensure rapid convergence and reduce computational cost. An Enhanced Polar Scan Matching algorithm is used to obtain this initial guess of the robot's position to initialize the algorithm.
... Another way of incorporating photometric and geometric constraints for structure and motion estimation has been investigated by Stein and Shashua in [51]. Optical flow [19] provides photometric information but suffers from the well-known aperture problem, which is troublesome for scenes with long straight edges (and also suffers from the constant-intensity assumption, which is, on the contrary, overcome by contrast-invariant descriptors such as SIFT). ...
... Optical flow [19] provides photometric information but suffers from the well-known aperture problem, which is troublesome for scenes with long straight edges (and also suffers from the constant-intensity assumption, which is, on the contrary, overcome by contrast-invariant descriptors such as SIFT). To avoid this, the authors of [51] propose to build the so-called tensor brightness constraint, which is based on both the optical flow and the trifocal tensor encoding the geometry between three views [18]. However, like every method based on optical flow, this cannot be extended to large transformations between views, which makes the optical flow estimation unreliable. ...
Article
Full-text available
Matching or tracking points of interest between several views is one of the keystones of many computer vision applications, especially when considering structure and motion estimation. The procedure generally consists of several independent steps: 1) point of interest extraction, 2) point of interest matching by keeping only the "best correspondences" with respect to similarity between some local descriptors, 3) correspondence pruning to keep those consistent with an estimated camera motion (here, consistent with epipolar constraints or a homography transformation). Each step in itself is a delicate task which may endanger the whole process. In particular, repeated patterns give lots of false matches in step 2) which are hardly, if ever, recovered by step 3). Starting from a statistical model by Moisan and Stival, we propose a new one-stage approach to steps 2) and 3), which does not require delicate parameter tuning. The advantage of the proposed method is its robustness to repeated patterns.