Example of rolling shutter imagery.

Source publication

Rectifying rolling shutter video from hand-held devices

Conference Paper

Full-text available

Jul 2010

This paper presents a method for rectifying video sequences from rolling shutter (RS) cameras. In contrast to previous RS rectification attempts we model distortions as being caused by the 3D motion of the camera. The camera motion is parametrised as a continuous curve, with knots at the last row of each frame. Curve parameters are solved for using...

Context 1

... can now loop over all values of x , and use (30) to find the pixel locations in the distorted image, and cubically interpolate these. In order to do a controlled evaluation of algorithms for RS compensation we have generated six test sequences (available at [18]), using the Autodesk Maya software pack- age. Each sequence consists of 12 RS distorted frames of size 640 × 480 , corresponding ground-truth global shutter (GS) frames, and masks that indicate pixels in the ground- truth frames that can be reconstructed from the corresponding rolling-shutter frame. In order to suit all algorithms, the ground-truth frames and masks come in three variants: for rectification to the time instant when the first, middle and last row of the RS frame were imaged. Each synthetic RS frame is created from a GS sequence with one frame for each RS image row. One row in each GS image is used, starting at the top row and sequentially down to the bottom row. In order to simulate an inter-frame delay, we also generate a number of GS frames that are not used to build any of the RS frames. The camera is however still moving during these frames. We have generated four kinds of synthetic sequences, using different camera motions in a static scene, see figure 4. In the first sequence type, the camera rotates around its centre in a spiral fashion, see figure 4 top left. Three different versions of this sequence exist to test the importance of modelling the inter-frame delay. The different inter-frame delays are N b = 0 , 20 and 40 blank rows (i.e. the number of unused GS frames). In the second sequence type, the camera makes a pure translation to the right and has an inter-frame delay of 40 blank rows, see figure 4 top right. In the last two sequence types the camera makes an up/down rotating movement, with a superimposed rotation from left to right, see figure 4 bottom left. There is also a back-and-forth rotation with the viewing direction as axis. The last sequence type is the same as the third except that a translation parallel to the image plane has been added, see figure 4 bottom right. For each frame in the ground-truth sequences, we have created masks that indicate pixels that can be reconstructed from the corresponding RS frame, see figure 5. These masks were rendered by inserting one light source for each image row, into an otherwise dark scene. The light sources had a rectangular shape that illuminates exactly the part of the scene that was imaged by the RS camera when located at that particular place. To acquire the mask, a global shutter render is triggered at the desired location (e.g. corresponding to first, middle or last row in the RS-frame). We have compared the errors of the three interpolation approaches described in section 3, in figure 6. Here we have used known ground-truth rotations to rectify each frame in two pure camera rotation sequences, sequence type #1, with N b = 0 , and sequence type #3, with N b = 40 (see section 4 for a description of the sequences). We have used two pure rotation sequences, as for these an almost perfect reconstruction is possible, and thus the errors shown are due to interpolation only. The error measure used is average Euclidean distance to the RGB pixel values in the ground truth images, within the valid mask. In some frames, the methods differ quite a bit, while in others they are very similar. The reason for this is that only for larger rotations, do the neighbours in the distorted and undistorted images start to differ. As can be seen in figure 6, griddata and our forward interpolation are superior to inverse sampling. Among the three methods, griddata stands out, by being approximately 40 × more expensive on 640 × 480 images. As our forward interpolation scheme is both fast and accurate, we recommend it over the other methods. For very fast motions, and a slow rolling shutter, the 3 × 3 grid used in forward interpolation may be too small. The interpolated image would then have pixels where the value is undefined. In our experiments on real video we have however not experienced this. Should this effect occur, one could simply increase the grid size to 5 × 5 . We have compared our methods to the global affine model (GA) [13], and the global shift model (GS) [8] on our synthetic sequences, see section 4. The comparison is done using thresholded Euclidean colour distance. Pixels that de- viate more than d thr = 0 . 3 are counted as incorrect. We have also tried other threshold values, and while the exact choice changes the locations of the curves, it does not change their order (for reasonable values of d thr ). As evaluation measure we use the fraction of correctly reconstructed pixels within the mask of valid locations. For clarity of presentation, we only present a subset of the results on our synthetic dataset. As a baseline, all plots contain the errors for uncorrected frames, with respect to the first frame ground-truth. As our reconstruction solves for several cameras in each frame interval, we have simply chosen to present all of them in the following plots. E.g. Rotation 1, 2, and 3 in figure 7 are the three solutions in a 3-frame reconstruction. In figure 7 we compare the GA, and GS methods with our pure rotation model. The sequence used is type #1 (rotation only), with N b = 0 . As can be seen, our methods do better than GA, GS, and the baseline. In figure 8 we compare the GA, and GS methods with our pure rotation model on sequence type #3 (rotation only), with N b = 40 . As can be seen our methods do better than both GA, GS, and the baseline. GA, and GS, have problems with this sequence, and sometimes fall below the baseline. In general, other values of N b give very similar results for our methods. For GA and GS the variability is larger, but we have not seen any consistent degradation or improvement. In figure 9, left, we compare the GA, and GS methods with our pure rotation model. The sequence used is type #2 (translation only), with N b = 40 . As can be seen our methods do slightly worse than GA and GS, but they still improve on the uncorrected input. In figure 9, right, we compare GA, GS, and our translation-only model. The translation reconstruction for the first frame is still worse than GA and GS, but the other two do significantly better. In figure 10 we have compared GA, GT, with our rotation only model (left) and with the full model (right). As can be seen, the rotation only model does consistently better than the others. Note that the full model currently does worse than the rotation only model. When we gave the optimiser different starting points, (e.g. the result from the rotation model) we obtained different solutions, thus we conclude that the cost function for the full model is not convex. A better initialisation may solve this problem, but this is out of the scope of this paper. We have done a simple comparison of RS compensation algorithms on real imagery, using image stabilisation. Such a comparison requires that the imaged scene is static, and that the camera translation is negligible. We do this by tracking two points through the RS frames, using the KLT-tracker [15, 19]. After rolling-shutter compensation, we perform a virtual rotation of the frames (using a global homography), such that two points in the scene are placed symmetrically about the image centre, along a vertical line, see figure 11. The only manual input to this approach is that the two points are indicated manually in the first frame. We supply two such stabilised sequences as supplemen- tal material (one for the iPhone 3GS and one from the W890i), together with the corresponding uncorrected RS sequences, and results for the GA and GS methods. A single frame comparison of the rectification step is also shown in figure 1, for the iPhone 3GS. In this paper, we have demonstrated rolling-shutter rectification by modelling the camera motion, and shown this to be superior to techniques that model movements in the image plane only. We even saw that image-plane techniques occasionally perform worse than the uncorrected baseline. This is especially true for motions that they do not model, e.g. rotations for the Global shift model [8]. The method we currently see as the best one is the rotation only model. In addition to being the overall best method, it is also the fastest of our models. Note that even this model corrects for more types of camera motion than does mechanical image stabilisation (MIS). In future work we plan to improve our approach by replacing the linear interpolation with a higher order spline. We will also in- vestigate better initialisations for the full model. Another obvious improvement is to optimise parameters over full sequences. However, we wish to stress that our aim is currently to allow the algorithm to run on mobile platforms, which excludes optimisation over longer frame intervals than the 2-4 that we currently use. In general, the quality of the reconstruction should bene- fit from more measurements. In MIS systems, camera rotations are measured by MEMS gyro sensors [4]. It would be interesting to see how such measurements could be combined with measurements from KLT-tracking when rectifying video. There are also accelerometers in many cellphones, and measurements from these could also be useful in ego-motion ...

View in full-text

Context 2

... consumer products that allow video capture are quite common. Examples are cell-phones, music players, and regular cameras. Most of these devices, as well as cam- corders in the consumer price range, have CMOS image sensors. CMOS sensors have several advantages over the conventional CCD sensors: they are cheaper to manufac- ture, and typically offer on-chip processing [9], for e.g. au- tomated white balance and auto-focus measurements. However, most CMOS sensors, by design make use of what is known as a rolling shutter (RS). In an RS camera, detector rows are read and reset sequentially. As the detectors collect light right until the time of readout, this means that each row is exposed during a slightly different time window. The more conventional CCD sensors on the other hand use a global shutter (GS), where all pixels are reset simultane- ously, and collect light during the same time interval. The downside with a rolling shutter is that since pixels are acquired at different points in time, motion of either camera or target will cause geometrical distortions in the acquired images. Figure 1 shows an example of geometric distortions caused by using a rolling shutter, and how this frame is rectified by our proposed method, as well as two others. A camera motion between two points in time can be described with a three element translation vector, and a 3DOF (degrees-of-freedom) rotation. For hand-held footage, the rotation component is typically the dominant cause of image plane motion. (A notable exception to this is footage from a moving platform, such as a car.) Many new cam- corders thus have mechanical image stabilisation (MIS) systems that move the lenses (some instead move the sensor) to compensate for small pan and tilt rotational motions (image plane rotations, and large motions, are not hand- led). The MIS parameters are typically optimised to the frequency range caused by a person holding a camera, and thus work well for such situations. However, since lenses have a certain mass, and thus inertia, MIS has problems keeping up with faster motions, such as caused by vibrations from a car engine. Furthermore, cell phones, and lower end ...

View in full-text

Performance evaluation of accelerated functional MRI acquisition using compressed sensing

Conference Paper

Full-text available

Aug 2009

Functional MRI (fMRI) has been widely accepted as a standard tool to study the function of brain. However, because of the limited temporal resolution of MR scanning, researchers have experienced difficulties in various event related cognitive studies which usually require higher temporal resolution than the current acquisition protocol. Even if sev...

Application of the Karhunen-Loève Transform to the C5G7 benchmark in the Response Matrix Method

Conference Paper

Full-text available

Nov 2015

Adaptive Karhunen-Lòeve Transform for Multichannel Audio

Conference Paper

Full-text available

Oct 2007

In previous works, the authors proposed a hierarchical bandwidth limitation technique based on the Karhunen-Lòeve Transform (KLT) to reduce the bandwidth for multichannel audio transmission. The subjective results proved that this technique could be used to reduce the overall bandwidth without significant audio quality degradation. Further study fo...

The crowd congestion level — A new measure for risk assessment in video-based crowd monitoring

Conference Paper

Full-text available

Dec 2016

In this paper, we propose a new characteristic measure relative people density and motion dynamics for the purpose of long-term crowd monitoring. While many related works focus on direct people counting and absolute density estimation, we will show that relative densities provide reliable information on crowd behaviour. Furthermore, we will discuss...

Coding of spectrally homogeneous regions in multispectral image compression

Conference Paper

Full-text available

Oct 1996

In this paper we present a new approach in the compression of multispectral images. It is based on the merging of two processing tools, namely Karhunen-Loeve transform (KLT) as a spectral decorrelator, and object based image coding schemes. The potential use of the principal components analysis in multispectral imagery is described and it is used t...

Towards High-Frequency Tracking and Fast Edge-Aware Optimization

Preprint

Full-text available

Sep 2023

Akash Bapat

This dissertation advances the state of the art for AR/VR tracking systems by increasing the tracking frequency by orders of magnitude and proposes an efficient algorithm for the problem of edge-aware optimization. AR/VR is a natural way of interacting with computers, where the physical and digital worlds coexist. We are on the cusp of a radical change in how humans perform and interact with computing. Humans are sensitive to small misalignments between the real and the virtual world, and tracking at kilo-Hertz frequencies becomes essential. Current vision-based systems fall short, as their tracking frequency is implicitly limited by the frame-rate of the camera. This thesis presents a prototype system which can track at orders of magnitude higher than the state-of-the-art methods using multiple commodity cameras. The proposed system exploits characteristics of the camera traditionally considered as flaws, namely rolling shutter and radial distortion. The experimental evaluation shows the effectiveness of the method for various degrees of motion. Furthermore, edge-aware optimization is an indispensable tool in the computer vision arsenal for accurate filtering of depth-data and image-based rendering, which is increasingly being used for content creation and geometry processing for AR/VR. As applications increasingly demand higher resolution and speed, there exists a need to develop methods that scale accordingly. This dissertation proposes such an edge-aware optimization framework which is efficient, accurate, and algorithmically scales well, all of which are much desirable traits not found jointly in the state of the art. The experiments show the effectiveness of the framework in a multitude of computer vision tasks such as computational photography and stereo.

Experimental Tests and Simulations on Correction Models for the Rolling Shutter Effect in UAV Photogrammetry

Article

Full-text available

May 2023

Many unmanned aerial vehicles (UAV) host rolling shutter (RS) cameras, i.e., cameras where image rows are exposed at slightly different times. As the camera moves in the meantime, this causes inconsistencies in homologous ray intersections in the bundle adjustment, so correction models have been proposed to deal with the problem. This paper presents a series of test flights and simulations performed with different UAV platforms at varying speeds over terrain of various morphologies with the objective of investigating and possibly optimising how RS correction models perform under different conditions, in particular as far as block control is concerned. To this aim, three RS correction models have been applied in various combinations, decreasing the number of fixed ground control points (GCP) or exploiting GNSS-determined camera stations. From the experimental tests as well as from the simulations, four conclusions can be drawn: (a) RS affects primarily horizontal coordinates and varies notably from platform to platform; (b) if the ground control is dense enough, all correction models lead practically to the same mean error on checkpoints; however, some models may cause large errors in elevation if too few GCP are used; (c) in most cases, a specific correction model is not necessary since the affine deformation caused by RS can be adequately modelled by just applying the extended Fraser camera calibration model; (d) using GNSS-assisted block orientation, the number of necessary GCP is strongly reduced.

Rolling Shutter Inversion: Bring Rolling Shutter Images to High Framerate Global Shutter Video

Preprint

Full-text available

Oct 2022

A single rolling-shutter (RS) image may be viewed as a row-wise combination of a sequence of global-shutter (GS) images captured by a (virtual) moving GS camera within the exposure duration. Although RS cameras are widely used, the RS effect causes obvious image distortion especially in the presence of fast camera motion, hindering downstream computer vision tasks. In this paper, we propose to invert the RS image capture mechanism, i.e., recovering a continuous high framerate GS video from two time-consecutive RS frames. We call this task the RS temporal super-resolution (RSSR) problem. The RSSR is a very challenging task, and to our knowledge, no practical solution exists to date. This paper presents a novel deep-learning based solution. By leveraging the multi-view geometry relationship of the RS imaging process, our learning-based framework successfully achieves high framerate GS generation. Specifically, three novel contributions can be identified: (i) novel formulations for bidirectional RS undistortion flows under constant velocity as well as constant acceleration motion model. (ii) a simple linear scaling operation, which bridges the RS undistortion flow and regular optical flow. (iii) a new mutual conversion scheme between varying RS undistortion flows that correspond to different scanlines. Our method also exploits the underlying spatial-temporal geometric relationships within a deep learning framework, where no additional supervision is required beyond the necessary middle-scanline GS image. Building upon these contributions, we represent the very first rolling-shutter temporal super-resolution deep-network that is able to recover high framerate GS videos from just two RS frames. Extensive experimental results on both synthetic and real data show that our proposed method can produce high-quality GS image sequences with rich details, outperforming the state-of-the-art methods.

Context-Aware Video Reconstruction for Rolling Shutter Cameras

Preprint

Full-text available

May 2022

With the ubiquity of rolling shutter (RS) cameras, it is becoming increasingly attractive to recover the latent global shutter (GS) video from two consecutive RS frames, which also places a higher demand on realism. Existing solutions, using deep neural networks or optimization, achieve promising performance. However, these methods generate intermediate GS frames through image warping based on the RS model, which inevitably result in black holes and noticeable motion artifacts. In this paper, we alleviate these issues by proposing a context-aware GS video reconstruction architecture. It facilitates the advantages such as occlusion reasoning, motion compensation, and temporal abstraction. Specifically, we first estimate the bilateral motion field so that the pixels of the two RS frames are warped to a common GS frame accordingly. Then, a refinement scheme is proposed to guide the GS frame synthesis along with bilateral occlusion masks to produce high-fidelity GS video frames at arbitrary times. Furthermore, we derive an approximated bilateral motion field model, which can serve as an alternative to provide a simple but effective GS frame initialization for related tasks. Experiments on synthetic and real data show that our approach achieves superior performance over state-of-the-art methods in terms of objective metrics and subjective visual quality. Code is available at \url{https://github.com/GitCVfb/CVR}.

Bringing Rolling Shutter Images Alive with Dual Reversed Distortion

Preprint

Mar 2022

Rolling shutter (RS) distortion can be interpreted as the result of picking a row of pixels from instant global shutter (GS) frames over time during the exposure of the RS camera. This means that the information of each instant GS frame is partially, yet sequentially, embedded into the row-dependent distortion. Inspired by this fact, we address the challenging task of reversing this process, i.e., extracting undistorted GS frames from images suffering from RS distortion. However, since RS distortion is coupled with other factors such as readout settings and the relative velocity of scene elements to the camera, models that only exploit the geometric correlation between temporally adjacent images suffer from poor generality in processing data with different readout settings and dynamic scenes with both camera motion and object motion. In this paper, instead of two consecutive frames, we propose to exploit a pair of images captured by dual RS cameras with reversed RS directions for this highly challenging task. Grounded on the symmetric and complementary nature of dual reversed distortion, we develop a novel end-to-end model, IFED, to generate dual optical flow sequence through iterative learning of the velocity field during the RS time. Extensive experimental results demonstrate that IFED is superior to naive cascade schemes, as well as the state-of-the-art which utilizes adjacent RS images. Most importantly, although it is trained on a synthetic dataset, IFED is shown to be effective at retrieving GS frame sequences from real-world RS distorted images of dynamic scenes.

SUNet: Symmetric Undistortion Network for Rolling Shutter Correction

Preprint

Full-text available

Aug 2021

The vast majority of modern consumer-grade cameras employ a rolling shutter mechanism, leading to image distortions if the camera moves during image acquisition. In this paper, we present a novel deep network to solve the generic rolling shutter correction problem with two consecutive frames. Our pipeline is symmetrically designed to predict the global shutter image corresponding to the intermediate time of these two frames, which is difficult for existing methods because it corresponds to a camera pose that differs most from the two frames. First, two time-symmetric dense undistortion flows are estimated by using well-established principles: pyramidal construction, warping, and cost volume processing. Then, both rolling shutter images are warped into a common global shutter one in the feature space, respectively. Finally, a symmetric consistency constraint is constructed in the image decoder to effectively aggregate the contextual cues of two rolling shutter images, thereby recovering the high-quality global shutter image. Extensive experiments with both synthetic and real data from public benchmarks demonstrate the superiority of our proposed approach over the state-of-the-art methods.

Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes

Preprint

Apr 2021

Joint rolling shutter correction and deblurring (RSCD) techniques are critical for the prevalent CMOS cameras. However, current approaches are still based on conventional energy optimization and are developed for static scenes. To enable learning-based approaches to address real-world RSCD problem, we contribute the first dataset, BS-RSCD, which includes both ego-motion and object-motion in dynamic scenes. Real distorted and blurry videos with corresponding ground truth are recorded simultaneously via a beam-splitter-based acquisition system. Since direct application of existing individual rolling shutter correction (RSC) or global shutter deblurring (GSD) methods on RSCD leads to undesirable results due to inherent flaws in the network architecture, we further present the first learning-based model (JCD) for RSCD. The key idea is that we adopt bi-directional warping streams for displacement compensation, while also preserving the non-warped deblurring stream for details restoration. The experimental results demonstrate that JCD achieves state-of-the-art performance on the realistic RSCD dataset (BS-RSCD) and the synthetic RSC dataset (Fastec-RS). The dataset and code are available at https://github.com/zzh-tech/RSCD.

Solving Rolling Shutter 3D Vision Problems using Analogies with Non-rigidity

Article

Full-text available

Jan 2021
INT J COMPUT VISION

We propose an original approach to absolute pose and structure-from-motion (SfM) which handles rolling shutter (RS) effects. Unlike most existing methods which either augment global shutter projection with velocity parameters or impose continuous time and motion through pose interpolation, we use local differential constraints. These are established by drawing analogies with non-rigid 3D vision techniques, namely shape-from-template and non-rigid SfM (NRSfM). The proposed idea is to interpret the images of a rigid surface acquired by a moving RS camera as those of a virtually deformed surface taken by a GS camera. These virtually deformed surfaces are first recovered by relaxing the RS constraint using SfT or NRSfM. Then we upgrade the virtually deformed surface to the actual rigid structure and compute the camera pose and ego-motion by reintroducing the RS constraint. This uses a new 3D-3D registration procedure that minimizes a cost function based on the Euclidean 3D point distance. This is more stable and physically meaningful than the reprojection error or the algebraic distance used in previous work. Experimental results obtained with synthetic and real data show that the proposed methods outperform existing ones in terms of accuracy and stability, even in the known critical configurations.

Object Distance Estimation Using a Single Image Taken from a Moving Rolling Shutter Camera

Article

Full-text available

Jul 2020
SENSORS-BASEL

This paper proposes a technique to estimate the distance between an object and a rolling shutter camera using a single image. The implementation of this technique uses the principle of the rolling shutter effect (RSE), a distortion within the rolling-shutter-type camera. The proposed technique has a mathematical strength compared to other single photo-based distance estimation methods that do not consider the geometric arrangement. The relationship between the distance and RSE angle was derived using the camera parameters (focal length, shutter speed, image size, etc.). Mathematical equations were derived for three different scenarios. The mathematical model was verified through experiments using a Nikon D750 and Nikkor 50 mm lens mounted on a car with varying speeds, object distances, and camera parameters. The results show that the mathematical model provides an accurate distance estimation of an object. The distance estimation error using the RSE due to the change in speed remained stable at approximately 10 cm. However, when the distance between the object and camera was more than 10 m, the estimated distance was sensitive to the RSE and the error increased dramatically.

Rolling Shutter Homography and its Applications

Article

Mar 2020
IEEE T PATTERN ANAL

In this article we study the adaptation of the concept of homography to Rolling Shutter (RS) images. This extension has never been clearly adressed despite the many roles played by the homography matrix in multi-view geometry. We first show that a direct point-to-point relationship on a RS pair can be expressed as a set of 3 to 8 atomic 3x3 matrices depending on the kinematic model used for the instantaneous-motion during image acquisition. We call this group of matrices the RS Homography. We then propose linear solvers for the computation of these matrices using point correspondences. Finally, we derive linear and closed form solutions for two famous problems in computer vision in the case of RS images: image stitching and plane-based relative pose computation. Extensive experiments with both synthetic and real data from public benchmarks show that the proposed methods outperform state-of-art techniques.

Example of rolling shutter imagery.

Contexts in source publication

Similar publications

Citations