Fig 4 - uploaded by Jian-Guang Lou
Geometry of the pinhole camera model. (a) Geometric relation between a point P in 3-D space and its projective point P′ in the retinal plane. (b) Coordinate relation between P and P′.

Source publication
Article
Full-text available
Many existing multiview image/video coding techniques remove inter-viewpoint redundancy by applying disparity compensation within a conventional video coding framework, e.g., H.264/MPEG-4 AVC. However, this conventional methodology is ineffective because it ignores the special characteristics of inter-viewpoint disparity. In this paper, we propose a geometr...

Context in source publication

Context 1
... computer vision and optical measurement, the well-known pinhole camera model [16] (Fig. 4) is often used to describe the geometric relationship of image formation. In the pinhole model, the geometric process of image formation is completely determined by a perspective projection center and a retinal plane. The projection of a scene point can be modeled ...
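The perspective projection just described can be sketched in a few lines; the focal length and principal point below are made-up illustrative values, not parameters from the paper.

```python
# Pinhole projection of a 3-D point P = (X, Y, Z), given in camera
# coordinates, onto the retinal (image) plane:
#   p = (f * X / Z + cx,  f * Y / Z + cy)
# f is the focal length (in pixels) and (cx, cy) the principal point.
def project_pinhole(P, f=800.0, cx=320.0, cy=240.0):
    X, Y, Z = P
    if Z <= 0:
        raise ValueError("point must lie in front of the projection center")
    return (f * X / Z + cx, f * Y / Z + cy)

print(project_pinhole((1.0, 0.5, 4.0)))  # -> (520.0, 340.0)
```

The division by Z is what makes the model projective rather than affine: points twice as far away move half as far across the retinal plane.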

Similar publications

Conference Paper
Full-text available
Multi-view video coding exploits inter-view redundancies to compress the video streams and their associated depth information. These techniques utilize disparity estimation to obtain disparity vectors (DVs) across different views. However, these methods account for the majority of the computational power needed for multi-view video enc...
Conference Paper
Full-text available
With the growing demand for 3D and multi-view video content, efficient depth data coding becomes a vital issue in the image and video coding field. In this paper, we propose a simple depth coding scheme using multiple prediction modes that exploit the temporal correlation of the depth map. Current depth coding techniques mostly depend on intra-coding mode that...
Article
Full-text available
In this paper, a novel skeleton-based approach to human time-varying mesh (H-TVM) compression is presented. The topic of TVM compression is new and has many challenges, such as handling the lack of obvious mapping of vertices across frames and handling the variable connectivity across frames, while maintaining efficiency, which are the most importa...

Citations

... MVC performs ME and DE along with rate-distortion optimization (RDO) to achieve a high coding efficiency; however, the computational complexity of these methods is extremely high. Consequently, several fast algorithms have been proposed to address these problems (Xu and He 2008; Zhu et al. 2010a; Li et al. 2007, 2008; San et al. 2007; Lu et al. 2007; Shen et al. 2009, 2010, 2011; Peng et al. 2008; Kuo et al. 2010; Pan et al. 2015). The works in Xu and He (2008) and Zhu et al. (2010a) were aimed at accelerating the encoding time for DE by estimating the search range. ...
... Zhu et al. selected a search center by using the spatio-temporal correlation of the disparity field and predicted the search range adaptively according to the temporal variation of the disparity field (Zhu et al. 2010a). Some studies are based on the camera geometry and can significantly reduce the encoding time for DE, or for both DE and ME; however, additional information about the multi-view camera parameters is required (Li et al. 2007, 2008; San et al. 2007; Lu et al. 2007). Li et al. proposed a fast inter-frame prediction algorithm that contained two stages. ...
Article
Full-text available
An on-site multiview video viewing system can provide a more immersive viewing experience of live video shows, such as sports and concerts. To address the high computational complexity of multi-view video coding, several studies have worked on fast motion estimation, fast disparity estimation, and/or fast mode decision. In this paper, we propose a low-complexity encoding scheme that further exploits inter-view dependencies and depth information. A fast mode selection with threshold-based early termination is presented to select a minimal set of mode candidates according to the reference macroblocks of neighbouring views and expedite the mode decision process. Further, this study also demonstrates that the disparity estimation computation and the search ranges of both motion estimation and disparity estimation can be significantly refined by using inter-view dependencies and depth information. The experimental results show that the proposed encoding method reduces the coding time by 80% with a negligible PSNR loss.
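The threshold-based early termination described above can be sketched as follows; the mode names, costs, and threshold value are illustrative assumptions, not taken from the paper.

```python
# Sketch of threshold-based early termination for mode decision: candidate
# modes are tried in a likelihood-ranked order (e.g. derived from the
# reference macroblocks of neighbouring views), and the search stops as
# soon as one mode's rate-distortion cost falls below a threshold.
def fast_mode_decision(candidate_modes, rd_cost, threshold):
    best_mode, best_cost = None, float("inf")
    for mode in candidate_modes:
        cost = rd_cost(mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        if cost < threshold:  # early termination: good enough, stop here
            break
    return best_mode, best_cost

costs = {"SKIP": 120.0, "16x16": 90.0, "8x8": 70.0}
print(fast_mode_decision(["SKIP", "16x16", "8x8"], costs.get, threshold=100.0))
# -> ('16x16', 90.0): the 8x8 mode is never evaluated
```

The saving comes from the modes that are never evaluated once the threshold is met, at the risk of missing a slightly better mode further down the list.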
... As an RGB-D sensor can provide a continuous measurement of the 3D structure of the environment, the relative pose between two RGB-D sensors can be estimated through explicit matching of surface geometries in the overlapping regions within their FoVs. A variety of algorithms have been proposed to determine whether multiple cameras are looking at the same scene, such as vision-based [27,51] or geometry-based [29,52] methods. Here, we assume that the sensors use one of these approaches to detect whether they are observing the same scene. ...
Article
Full-text available
In this paper, the Relative Pose based Redundancy Removal (RPRR) scheme is presented, which has been designed for mobile RGB-D sensor networks operating under bandwidth-constrained operational scenarios. The scheme considers a multiview scenario in which pairs of sensors observe the same scene from different viewpoints and detect the redundant visual and depth information to prevent its transmission, leading to a significant improvement in wireless channel usage efficiency and power savings. We envisage applications in which the environment is static and rapid 3D mapping of an enclosed area of interest is required, such as disaster recovery and support operations after earthquakes or industrial accidents. Experimental results show that wireless channel utilization is improved by 250% and battery consumption is halved when the RPRR scheme is used instead of sending the sensor images independently.
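The redundancy test underlying this idea can be sketched roughly as follows; this is not the actual RPRR scheme, and the intrinsics and pose values are made up for illustration.

```python
# Sketch: with the relative pose (R, t) between two RGB-D sensors known,
# a 3-D point measured by sensor A is transformed into sensor B's frame;
# if it projects inside B's image, the corresponding data is visible to B
# as well, hence redundant, and A need not transmit it.
def is_redundant(P_a, R, t, f=500.0, cx=320.0, cy=240.0, w=640, h=480):
    # P_b = R * P_a + t  (row-major 3x3 rotation, translation vector)
    P_b = [sum(R[i][j] * P_a[j] for j in range(3)) + t[i] for i in range(3)]
    if P_b[2] <= 0:          # point is behind sensor B
        return False
    u = f * P_b[0] / P_b[2] + cx
    v = f * P_b[1] / P_b[2] + cy
    return 0 <= u < w and 0 <= v < h

I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(is_redundant((0.0, 0.0, 2.0), I, (0.1, 0.0, 0.0)))  # -> True
```

A real system would additionally check for occlusion and depth consistency before declaring a point redundant; the sketch only tests field-of-view overlap.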
... As an RGB-D sensor can provide a continuous measurement of the 3-D structure of the environment, the relative pose between two RGB-D sensors can be estimated through explicit matching of surface geometries in the overlapping regions within their FoVs. A variety of algorithms have been proposed to determine whether multiple cameras are looking at the same scene, such as vision-based [23], [41], or geometry-based [42], [43] methods. Here, we assume that the sensors use one of these approaches to detect whether they are observing the same scene. ...
Preprint
Full-text available
The Relative Pose based Redundancy Removal (RPRR) scheme is presented, which has been designed for mobile RGB-D sensor networks operating under bandwidth-constrained operational scenarios. Participating sensor nodes detect the redundant visual and depth information to prevent its transmission, leading to a significant improvement in wireless channel usage efficiency and power savings. Experimental results show that wireless channel utilization is improved by 250% and battery consumption is halved when the RPRR scheme is used instead of sending the sensor images independently.
... High order prediction (HOP) models, e.g., geometric transformations with more DoF, have been studied during the last two decades in traditional 2D and 3D image coding scenarios. Several geometric models, such as translation, rotation, scale, shear and perspective changes, have been used to improve the coding efficiency by exploiting spatial [7], temporal [8]-[13] and inter-view [14]-[17] redundancy. In most proposals, these models have been applied image-wise (instead of block-wise), for two main reasons: (i) the high computational complexity of block-wise model parameter estimation, and (ii) the significant additional bit rate required for parameter transmission. ...
... The Affine GT can be defined by any three of the four corner vectors (v⃗0, v⃗1, v⃗2, v⃗3). In this paper, the first three vectors, v⃗0, v⃗1 and v⃗2, are generated using the second stage of the proposed approach, while the remaining vector, v⃗3, is calculated by setting its two free parameters to zero, so that it follows directly from v⃗0, v⃗1 and v⃗2. Using (14), the individual parameters in the submatrices can then be calculated by (15) for the Projective GT: ...
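The hierarchy of prediction models discussed above, from pure translation up to projective transformations, can be illustrated with a minimal sketch; the parameter names follow the usual homogeneous-coordinate convention and are not taken from the cited paper.

```python
# Predicting a pixel's source coordinate under models of increasing order:
# a pure translation has 2 DoF, an affine model 6 DoF, and a projective
# (perspective) model 8 DoF.
def translate(x, y, tx, ty):
    return (x + tx, y + ty)

def affine(x, y, a, b, tx, c, d, ty):
    return (a * x + b * y + tx, c * x + d * y + ty)

def projective(x, y, a, b, tx, c, d, ty, g, h):
    w = g * x + h * y + 1.0   # the perspective divisor
    return ((a * x + b * y + tx) / w, (c * x + d * y + ty) / w)

# With g = h = 0 the projective model degenerates to the affine one, and
# with a = d = 1, b = c = 0 the affine model degenerates to a translation.
print(projective(2.0, 3.0, 1, 0, 5, 0, 1, 7, 0, 0))  # -> (7.0, 10.0)
```

This nesting is why a two-stage scheme can choose the model order per block: a lower-order model is always a special case of the higher-order one, so the extra DoF only need to be transmitted when they pay off in rate-distortion terms.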
Article
Full-text available
This paper proposes a two-stage high order intra block prediction method for light field image coding. This method exploits the spatial redundancy in lenslet light field images by predicting each image block, through a geometric transformation applied to a region of the causal encoded area. Light field images comprise an array of micro-images that are related by complex geometric transformations that cannot be efficiently compensated by state-of-the-art image coding techniques, which are usually based on low order translational prediction models. The two-stage nature of the proposed method allows to choose the order of the prediction model most suitable for each block, ranging from pure translations to projective or bilinear transformations, optimized according to an appropriate rate-distortion criterion. The proposed higher order intra block prediction approach was integrated into an HEVC codec and evaluated for both unfocused and focused light field camera models, using different resolutions and microlens arrays. Experimental results show consistent bitrate savings, which can go up to 12.62%, when compared to a lower order intra block prediction solution and 49.82% when compared to HEVC still picture coding.
... Several fast algorithms specifically designed for DE in MVC have been proposed in the literature. San et al. [13] employed an epipolar-based fast DE algorithm that greatly reduces the search range by searching only around the epipolar lines on which the optimal Disparity Vectors (DVs) should lie. Kim et al. [14] utilized the geometry of the camera arrangement to determine the reliability of the DV and Motion Vector (MV) predictors based on their relationship and, depending on the accuracy of both predictors, adaptively adjusted the search areas for the DE and the ME, respectively. ...
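The epipolar constraint that such fast DE schemes exploit can be sketched as follows; the fundamental matrix F below is a made-up example, not one from the cited works.

```python
# Given a fundamental matrix F, a point x in one view maps to the epipolar
# line l' = F x in the other view; the matching point must lie on that line,
# so DE candidates can be restricted to a narrow band around it.
def epipolar_line(F, x):
    # l' = F x, with x in homogeneous coordinates (x, y, 1)
    return tuple(sum(F[i][j] * x[j] for j in range(3)) for i in range(3))

def dist_to_line(line, p):
    a, b, c = line
    return abs(a * p[0] + b * p[1] + c) / (a * a + b * b) ** 0.5

F = [[0.0, -1e-3, 0.1], [1e-3, 0.0, -0.2], [-0.1, 0.2, 0.0]]
l = epipolar_line(F, (100.0, 80.0, 1.0))

# Keep only candidate positions within a 2-pixel band around the line,
# instead of searching a full 2-D window.
candidates = [(x, y) for x in range(98, 103) for y in range(78, 83)
              if dist_to_line(l, (x, y)) <= 2.0]
```

The saving is the ratio of the band's area to the full 2-D search window, which is why restricting DE to the epipolar band cuts the search points so sharply.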
Article
Full-text available
3D video coding for transmission exploits Disparity Estimation (DE) to remove the inter-view redundancies present within both the texture and the depth map multi-view videos. Good estimation accuracy can be achieved by partitioning the macro-block into smaller sub-block partitions. However, the DE process must be performed on each individual sub-block to determine the optimal mode and its disparity vectors in terms of rate-distortion efficiency. This vector estimation process is heavy on computational resources; thus, the coding computational cost becomes proportional to the number of search points and the inter-view modes tested during the rate-distortion optimization. In this paper, a solution that exploits the available depth map data, together with the multi-view geometry, is proposed to identify a better DE search area, such that it allows a reduction in its search points. It also exploits the number of different depth levels present within the current macro-block to determine which modes can be used for DE, to further reduce its computations. Simulation results demonstrate that this can save up to 95% of the encoding time, with little influence on the coding efficiency of the texture and the depth map multi-view video coding. This makes 3D video coding more practical for consumer devices, which tend to have limited computational power.
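The depth-to-disparity relation that makes the depth map useful for centering the DE search can be stated in one line; the focal length and baseline below are illustrative values, not parameters from the paper.

```python
# For rectified cameras with focal length f (pixels) and baseline B (metres),
# a point at depth Z has horizontal disparity d = f * B / Z, so each depth
# map sample directly predicts where the DE search should be centred.
def disparity_from_depth(Z, f=500.0, B=0.1):
    return f * B / Z

print(disparity_from_depth(2.0))  # -> 25.0
```

Because disparity falls off as 1/Z, blocks containing several distinct depth levels span a wide disparity range, which is why counting the depth levels inside a macro-block is a useful cue for choosing the DE partition modes.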
... In addition, the lifting implementation [9] has been used to ensure invertibility and reduce the complexity. A number of proposed approaches [10], [11], [12], [13] exploit the geometrical structure in the scene by 3D modeling prior to encoding the data so that the compression efficiency is improved. The bit rate allocation problem between texture and scene geometry has, for example, been studied in [14] and [15], where the geometry is defined using a per pixel disparity map. ...
Article
In this paper, we present a novel wavelet-based compression algorithm for multiview images. This method uses a layer-based representation, where the 3-D scene is approximated by a set of depth planes with their associated constant disparities. The layers are extracted from a collection of images captured at multiple viewpoints and transformed using the 3-D discrete wavelet transform (DWT). The DWT consists of the 1-D disparity compensated DWT across the viewpoints and the 2-D shape-adaptive DWT across the spatial dimensions. Finally, the wavelet coefficients are quantized and entropy coded along with the layer contours. To improve the rate-distortion performance of the entire coding method, we develop a bit allocation strategy for the distribution of the available bit budget between encoding the layer contours and the wavelet coefficients. The achieved performance of our proposed scheme outperforms the state-of-the-art codecs for several data sets of varying complexity.
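The lifting implementation mentioned in the context above guarantees invertibility by construction: each predict or update step is undone by repeating it with the opposite sign. A minimal integer sketch in the spirit of the Le Gall 5/3 filter (this is an illustration, not the codec from the paper):

```python
# One integer lifting level: split into even/odd samples, predict the odd
# (detail) samples from the even ones, then update the even (approximation)
# samples from the details.  Symmetric boundary handling via index clamping.
def lift_forward(x):
    even, odd = x[0::2], x[1::2]
    # Predict: detail = odd sample minus average of neighbouring evens.
    d = [o - (even[i] + even[min(i + 1, len(even) - 1)]) // 2
         for i, o in enumerate(odd)]
    # Update: smooth the even samples using the new detail samples.
    s = [e + (d[max(i - 1, 0)] + d[min(i, len(d) - 1)] + 2) // 4
         for i, e in enumerate(even)]
    return s, d

def lift_inverse(s, d):
    # Undo the update, then undo the prediction: same expressions, opposite sign.
    even = [e - (d[max(i - 1, 0)] + d[min(i, len(d) - 1)] + 2) // 4
            for i, e in enumerate(s)]
    odd = [o + (even[i] + even[min(i + 1, len(even) - 1)]) // 2
           for i, o in enumerate(d)]
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out

x = [10, 12, 14, 13, 11, 9, 8, 8]
assert lift_inverse(*lift_forward(x)) == x  # perfect reconstruction
```

Note that the integer floor divisions never have to be invertible on their own: the inverse subtracts exactly the same rounded quantity that the forward pass added, which is the structural reason lifting is lossless even with integer arithmetic.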
... To reduce the complexity of finding the best matching, the multiview geometry is employed in [7] to predict the disparity values, but only multiview image coding is considered. However, the translational inter-view motion assumed by the disparity compensation method cannot accurately represent the geometric relationships between different cameras; therefore, this method is not always efficient. ...
... When the rate increases, the gain of our method over JMVC starts to diminish, because the prediction residual signal produces most of the output bits. The same phenomenon also exists in other MVC algorithms [7]- [9], [12]. ...
Article
Full-text available
In this paper, we first develop improved projective rectification-based view interpolation and extrapolation methods, and apply them to view synthesis prediction-based multiview video coding (MVC). A geometric model for these view synthesis methods is then developed. We also propose an improved model to study the rate-distortion (R-D) performances of various practical MVC schemes, including the current joint multiview video coding standard. Experimental results show that our schemes achieve superior view synthesis results, and can lead to better R-D performance in MVC. Simulation results with the theoretical models help explaining the experimental results.
... Only moving objects have motion displacements, while objects in the background often have none. However, the inter-view disparity is different from the temporal motion that has been widely used in single-view video coding algorithms [11], [12]. For multiview video data, the inter-view disparity depends on the depth of the object and the camera setup. ...
... Thus, the DE accuracy of anchor frames will affect the following non-anchor frames. In order to obtain a high picture quality, a larger R_MIN value, γ1, is needed in (12) to enhance the search accuracy of anchor frames. ...
Article
Full-text available
Disparity estimation is adopted by multiview video coding (MVC) to reduce the inter-view redundancy. However, it consumes an enormous computational load. In this paper, a fast disparity estimation is proposed by using the spatio-temporal correlation and the temporal variation of the disparity field. For each macroblock, a temporal prediction of the disparity vector is calculated first by utilizing the smoothed disparity field of the previously coded frame. Then, the search center is selected among the candidates obtained from spatio-temporal neighboring disparity vectors, and deemed to be a preliminary disparity vector. Finally, the search range is predicted adaptively by using the distance between the search center and the temporal prediction of the disparity vector, and the search is then performed within this limited range. The distance represents the temporal variation of the disparity vector. Compared to the full search algorithm in the MVC reference software, experimental results show an average 96% reduction of the computational complexity while the rate-distortion performance remains the same. Compared to the fast search algorithm in the MVC reference software, an average 43% reduction of the computational complexity is achieved, and the rate-distortion performance of the proposed algorithm is improved.
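The search-centre selection and adaptive range prediction described above can be sketched roughly as follows; the function name, candidate sets, and range bounds are illustrative assumptions, not the paper's actual parameters.

```python
# Sketch: pick the search centre as the spatio-temporal candidate closest to
# the temporal prediction of the disparity vector, then shrink the search
# range when the disparity field is temporally stable.
def select_search(center_candidates, temporal_pred, r_min=4, r_max=32):
    # Search centre: candidate closest (L1 distance) to the temporal prediction.
    center = min(center_candidates,
                 key=lambda v: abs(v[0] - temporal_pred[0])
                             + abs(v[1] - temporal_pred[1]))
    # Temporal variation = distance between the chosen centre and the
    # temporal prediction; small variation allows a small search range.
    variation = abs(center[0] - temporal_pred[0]) + abs(center[1] - temporal_pred[1])
    search_range = max(r_min, min(r_max, 2 * variation))
    return center, search_range

print(select_search([(10, 0), (14, 1), (40, 2)], (12, 0)))  # -> ((10, 0), 4)
```

The clamped range is what yields the reported complexity reduction: when the disparity field barely changes between frames, almost all macroblocks are searched within the minimal window.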
... We thus increase the resolution of the down-scaled regions by a factor of √m_o to compensate for the loss of resolution, as in [9]. Note that basic epipolar geometry is also exploited in [15], but it assumes that the fundamental matrix is known, and it only uses the theory to predict the 2-D disparity vectors, whereas our scheme uses epipolar geometry for rectification, 1-D disparity estimation, as well as view interpolation. In addition, [15] only considers multiview image coding instead of multiview video coding. ...
Conference Paper
Full-text available
A projective rectification-based view interpolation algorithm is developed for multiview video coding and free viewpoint video. It first calculates the fundamental matrix between two views without using any camera parameters. The two views are then resampled to have horizontal and matched epipolar lines. One-dimensional disparity is estimated next, which is used to interpolate the image at an intermediate viewpoint. After unrectification, the interpolated view can be displayed directly for free viewpoint video purposes. It can also be used as a reference to encode the data of an intermediate camera. Experimental results show that the interpolated views can be 3 dB better than those of an existing method. Video coding results illustrate that the method can provide up to 1.3 dB improvement over JMVC.
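After rectification, disparity is purely horizontal, so an intermediate view can be forward-warped by a scaled 1-D disparity. A toy sketch with made-up values (occlusion handling and hole filling, which the actual method must address, are deliberately ignored):

```python
# After rectification, a pixel at column x with disparity d in the left view
# appears near column x - a*d in a virtual view at fractional baseline
# position a (a = 0.5 is the midpoint between the two cameras).
def interpolate_row(left_row, disparity, a=0.5):
    w = len(left_row)
    out = [None] * w
    for x, (val, d) in enumerate(zip(left_row, disparity)):
        xi = x - int(round(a * d))
        if 0 <= xi < w:
            out[xi] = val  # forward-warp; later writes overwrite earlier
                           # ones (crude occlusion), holes remain None
    return out

row = [10, 20, 30, 40]
print(interpolate_row(row, [0, 2, 2, 0], a=0.5))  # -> [20, 30, None, 40]
```

The `None` entry is a disocclusion hole: scene content visible at the virtual viewpoint but hidden in the left view, which real interpolators fill from the other view or by inpainting.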
... The creation and transmission of autostereoscopic content has to be designed with the broadcast constraints in mind, especially two of them: adaptivity with respect to the different receiver capabilities (size, number of views, depth perception, etc.) and backward compatibility, allowing the extraction of the 2D information for existing 2D displays. Among the various studies [1-6], recent research gives much attention to 3DTV [7], more specifically to depth image-based rendering (DIBR) approaches. Indeed, the DIBR technique has been recognized as a promising tool that can synthesize new "virtual" views from the so-called video-plus-depth data representation, instead of using the former 3DTV proposals, such as 3D models or stereoscopic images. ...
Article
Full-text available
The video-plus-depth data representation uses a regular texture video enriched with a so-called depth map, providing the depth distance for each pixel. The compression efficiency is usually higher for the smooth, gray-level data representing the depth map than for classical video texture. However, improvements of the coding efficiency are still possible, taking into account the fact that the video and the depth map sequences are strongly correlated. Classically, the correlation between the texture motion vectors and the depth map motion vectors is not exploited in the coding process. The aim of this paper is to reduce the amount of information for describing the motion of the texture video and of the depth map sequences by sharing one common motion vector field. Furthermore, in the literature, the bitrate control scheme generally assigns the depth map sequence a fixed percentage of 20% of the texture stream bitrate. However, this fixed percentage can affect the depth coding efficiency, and it should also depend on the content of each sequence. We propose a new bitrate allocation strategy between the texture and its associated per-pixel depth information. We provide a comparative analysis to measure the quality of the resulting 3D+t sequences.
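The fixed-percentage allocation the authors argue against amounts to a one-line split; a toy illustration (not from the paper), where the depth stream gets 20% of the texture bitrate:

```python
# Fixed split: depth = 0.2 * texture, so texture + depth = 1.2 * texture
# and the texture share of a total budget is total / 1.2, regardless of
# how complex the depth map actually is for a given sequence.
def fixed_split(total_kbps, ratio=0.2):
    texture = total_kbps / (1.0 + ratio)
    depth = total_kbps - texture
    return texture, depth

texture, depth = fixed_split(1200.0)
print(round(texture), round(depth))  # -> 1000 200
```

A content-adaptive strategy, as proposed in the paper, would instead let the depth share vary with the sequence, spending more bits on depth only where depth-map complexity (and hence synthesis quality) demands it.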