Figure - available from: Cluster Computing
Calculation of median filters, for nine pixels case

Source publication
Article
Many image filtering techniques can be accelerated with compute unified device architecture (CUDA)-based massively parallel implementations. In this paper, we show the major issues in our acceleration techniques, along with their implementation details. We implemented various image filtering operations in our own CUDA kernel programs, and they are comb...
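The figure caption refers to computing a median filter over nine pixels. As a rough illustration only, the following is a minimal sequential sketch in Python, not the paper's CUDA kernel. The function names `median9` and `median_filter_3x3` are hypothetical, and copying border pixels unchanged is an assumption of this sketch. In a GPU version, each output pixel would typically be handled by its own thread.

```python
def median9(values):
    """Return the median (5th smallest) of exactly nine values."""
    if len(values) != 9:
        raise ValueError("expected exactly nine values")
    return sorted(values)[4]


def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2D list of numbers.

    Assumption of this sketch: border pixels are copied unchanged.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]  # start from a copy so borders are kept
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Gather the pixel and its eight neighbours.
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median9(window)
    return out


# A single bright outlier in an otherwise flat image.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
]
filtered = median_filter_3x3(noisy)
```

Because the median ignores extreme values in the window, the isolated outlier in `noisy` is replaced by the value of its neighbours.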

Citations

... The quality of real-time realistic rendering is often described as highly dependent on approximating the indirect lighting in the scene [5,6]. Because of the excessive amount of computation this requires, the game industry has been investing heavily in efficient and scalable illumination methods for decades. ...
Article
Graphical user experiences are now ubiquitous. In particular, the computer graphics field and the game industry have continually favored the ambient occlusion post-processing method for its superb indirect light approximation and its effectiveness. Despite its canonical performance, its operation on non-occluded surfaces is often seen as redundant and unfavorable. In this paper, we propose a new perspective for handling such issues by highlighting the corners where ambient occlusion is likely to occur. Potential illumination occlusions are highlighted by checking the corners of the surfaces in screen space. Our algorithm showed that renderers can feasibly avoid unwanted computations, achieving 15% to 28% acceleration in comparison to previous work.
... In the computer graphics field, interactive realistic graphics results have long been pursued [17,18]. Ray tracing is one such approach, but it has only recently been applied and commercialized because of its immense real-time computing cost. ...
... Recent advances in graphics hardware have allowed a great leap towards real-time ray-traced rendering [17,18]. Nvidia's latest graphics processing units, the RTX series, are the first of their kind; they support real-time ray tracing with a specialized architecture [29]. ...
Article
Recently, ray tracing techniques have been widely adopted to produce high-quality images and animations. In this paper, we present our design and implementation of a real-time ray-traced rendering engine. We achieved real-time capability for triangle primitives, based on ray tracing techniques on GPGPU (general-purpose graphics processing unit) compute shaders. To accelerate the ray tracing engine, we used a set of acceleration techniques, including bounding volume hierarchy, its roped representation, joint up-sampling, and bilateral filtering. Our current implementation shows remarkable speed-ups, with acceptable error values. Experimental results show 2.5–13.6 times acceleration, and less than 3% error values for the 95% confidence range. Our next step will be enhancing bilateral filter behaviors.
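The abstract above lists bilateral filtering among its acceleration techniques. To illustrate the general idea only, here is a minimal one-dimensional sketch in Python of a textbook bilateral filter with Gaussian weights. It is not the engine's compute-shader implementation. The function name and the parameters `radius`, `sigma_space`, and `sigma_range` are hypothetical choices for this example.

```python
import math


def bilateral_filter_1d(signal, radius=2, sigma_space=1.0, sigma_range=10.0):
    """Smooth a 1D signal while preserving sharp edges.

    Each neighbour is weighted by its distance in position (spatial term)
    and by its difference in value (range term).
    """
    n = len(signal)
    out = []
    for i in range(n):
        weighted_sum = 0.0
        weight_total = 0.0
        # The window is clipped at the ends of the signal.
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            spatial = math.exp(-((i - j) ** 2) / (2.0 * sigma_space ** 2))
            tonal = math.exp(-((signal[i] - signal[j]) ** 2)
                             / (2.0 * sigma_range ** 2))
            weight = spatial * tonal
            weighted_sum += weight * signal[j]
            weight_total += weight
        # weight_total is at least 1 because j == i contributes weight 1.
        out.append(weighted_sum / weight_total)
    return out


# A step edge: samples across the edge get almost no weight,
# so the edge stays sharp after filtering.
step = [0.0, 0.0, 0.0, 100.0, 100.0, 100.0]
smoothed = bilateral_filter_1d(step)
```

The range term is what distinguishes this from a plain Gaussian blur: values on the far side of a large intensity jump contribute a negligible weight.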
... Digital filters find large-scale applications in a variety of areas, including chemical pollution modeling [1], control systems [2], speech processing [3][4][5], data validation and reconciliation [6], image processing [7], vehicle navigation and movement analysis [8], biomedical signal processing [9], etc. Therefore, many researchers have paid considerable attention to the stability analysis of digital filters over the years [10][11][12][13][14][15][16][17][18][19]. ...
Article
The problem of exponential stability and the H∞ performance criterion for externally disturbed state-delayed digital filters employing fixed-point arithmetic in the presence of saturation arithmetic is highlighted in this paper. A limit-cycle-free condition for a class of state-delayed digital filters in the presence of external disturbance and saturation arithmetic is derived by employing a Lyapunov function. The presented result ensures exponential stability and reduces the effect of external disturbances to the H∞ performance index. A numerical example simulated using the MATLAB linear matrix inequality control toolbox is provided to validate the effectiveness of the achieved criterion.
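To make the setting above more concrete, the following is a minimal Python sketch of a scalar state-delayed filter with saturation arithmetic. The abstract does not state the model, so the recursion x(k+1) = sat(a·x(k) + a_d·x(k − d) + w(k)), the saturation limit of 1, and all names and coefficient values here are assumptions chosen for illustration. It does not reproduce the paper's criterion or its MATLAB example.

```python
def saturate(value, limit=1.0):
    """Clamp a value to [-limit, limit], modelling saturation overflow."""
    return max(-limit, min(limit, value))


def simulate_delayed_filter(a, a_d, delay, x0, disturbance, steps):
    """Simulate x(k+1) = sat(a*x(k) + a_d*x(k - delay) + w(k)).

    Assumption of this sketch: the state equals x0 for all k <= 0.
    Returns the list [x(0), x(1), ..., x(steps)].
    """
    history = [x0] * (delay + 1)  # x(-delay), ..., x(0)
    for k in range(steps):
        current = history[-1]          # x(k)
        delayed = history[-1 - delay]  # x(k - delay)
        history.append(saturate(a * current + a_d * delayed + disturbance(k)))
    return history[delay:]


# Hypothetical coefficients with |a| + |a_d| < 1 and no external disturbance.
trajectory = simulate_delayed_filter(
    a=0.5, a_d=0.2, delay=2, x0=0.9,
    disturbance=lambda k: 0.0, steps=50,
)
```

With these illustrative coefficients and zero disturbance, the state decays towards zero and never leaves the saturation bounds, which is the kind of limit-cycle-free behavior the stability criterion is meant to guarantee.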
... Their proposed algorithm's performance was superior and enabled recognition of objects with less. Baek and Kim [2] developed various image filtering operations in their own CUDA kernel programs. This approach is combined to build an artifact-detection scheme for the PCB soldering process. It shows correctness and feasibility, with considerable execution speedup. ...
Article
The main objective of this paper is to propose a Reliable and Speedy Communication for Upstream Emergency (RESCUE) framework to create 3D remote sensing videos. The proposed framework uses little memory on mobile devices with the help of efficient pipelining processes. Many methods for converting 2D videos to 3D have been proposed, in which developers convert high-quality 2D video to 3D. We propose a method that applies the pre-classification process using either training on a dataset or heuristics based on the quality of streaming attributes. The evaluation framework automatically converts remote sensing videos from 2D to 3D by using a depth profile with method filter, segmentation, and sharpening transformation of the videos. The traditional Hough transformation algorithm is not suitable for hardware implementation, and saliency is based on the colour histogram to support slow-motion objects in frames, which leads to inordinate delay in converting 2D to 3D. Therefore, the RESCUE framework, with its extended vanishing point and line algorithm, is applied to provide suitable solutions and overcome the drawbacks of existing techniques. The framework is a lightweight process when compared with JAVRAE, and it utilizes limited phone memory and consumes less processing time.
Article
The issue of the exponential stability of interfered discrete-time delayed systems with saturation is considered in this paper. The state saturation is constrained by a convex hull, allowing for the application of a suitable Lyapunov–Krasovskii functional to derive an exponential stability criterion. Improved summation inequalities are used to manage the sum terms in the forward difference of the Lyapunov–Krasovskii functional. The results can be used to assure the nonexistence of limit cycles in the system. Compared to previous methods, the present method leads to improved results. Two examples are given to highlight the importance of the obtained results.
Article
Today, many big data applications require massively parallel tasks to compute complicated mathematical operations. To perform parallel tasks, platforms like CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are widely used and developed to enhance the throughput of massively parallel tasks. There is also a need for high-level abstractions and platform independence over those massively parallel computing platforms. Recently, the Khronos Group announced SYCL (C++ Single-source Heterogeneous Programming for OpenCL), a new cross-platform abstraction layer, to provide an efficient way for single-source heterogeneous computing, with C++-template-level abstractions. However, since there has been no official implementation of SYCL, we currently have several different implementations from various vendors. In this paper, we analyse the characteristics of those SYCL implementations. We also show performance measures of those SYCL implementations, especially for well-known massively parallel tasks. We show that each implementation has its own strength in computing different types of mathematical operations, along with different sizes of data. Our analysis can serve as a fundamental measurement for the abstract-level, cost-effective use of massively parallel computations, especially for big-data applications.
Article
With the development of image processing technology, pencil drawing has been widely used in video games and mobile phone applications. However, the existing pencil drawing algorithms require a large amount of time to convert a real picture into a pencil drawing; hence, it is difficult to apply them to real-time systems. This paper proposes a parallel fast pencil drawing generation algorithm based on the graphics processing unit (GPU) to accelerate the real-time rendering process of sketch painting. The parallelism of the pencil drawing generation algorithm is first identified via theoretical analysis. Then, sub-algorithms of the sequential algorithm are designed in parallel using the compute unified device architecture (CUDA) programming model and executed via thread-level parallel techniques. Furthermore, an optimal cache pattern of data that reduces the access time of the most frequently used data is structured using shared memory and constant memory. Finally, task-level parallelism is achieved by CUDA stream technology, which overlaps independent sub-tasks for further acceleration. On the CUDA platform, the experimental results demonstrate that the proposed parallel algorithm can achieve a significant speedup. The proposed algorithm achieves a performance improvement of 448.59 times compared to the sequential algorithm on 2560×1920-resolution images, and maintains a high degree of similarity with real pencil paintings. Hence, the proposed algorithm is suitable for real-time pencil drawing rendering and has promising application prospects in non-photorealistic rendering.