Table 4 - uploaded by Miguel Bordallo Lopez
Average power consumption of the LBP calculations.

Source publication
Article
Full-text available
The future multi-modal user interfaces of battery-powered mobile devices are expected to require computationally costly image analysis techniques. The use of Graphic Processing Units for computing is very well suited for parallel processing, and the addition of programmable stages and high precision arithmetic provide for opportunities to implemen...

Similar publications

Conference Paper
Full-text available
The forces and torques due to atmospheric drag and solar radiation pressure (SRP) acting on complex and articulated space objects are efficiently calculated by utilizing the highly parallelized hardware available in commodity desktop PC graphics processing units. The calculations are performed by combining traditional OpenGL rendering of 3D models...
Article
Full-text available
With the recent advances in the programmability and performance of mobile Graphics Processing Units (GPUs), General-Purpose Graphics Processing Unit (GPGPU) technologies have become available even in mobile devices such as smartphones and tablets. Among the available GPGPU technologies for mobile devices, Open Computing Language (OpenCL) and Render...

Citations

... Computer vision on embedded/IoT devices is also evolving (Bordallo López et al., 2011; Nitsche and De Cristóforis, 2012; Cavus et al., 2014; Lee et al., 2016) with the support of embedded GPUs for general-purpose programming models. However, to the best of our knowledge, there are no generic solutions available for embedded devices like Raspberry Pi (Brahmbhatt, 2013). ...
Article
Full-text available
The ubiquitous interconnected smart devices enabled by the recent evolution of low-cost, generic, small-size, powerful computing platforms gave rise to the term Internet of Things (IoT), which cross-cuts many areas of modern-day living. IoT applications go well beyond simple sensing and actuation to sophisticated localized processing and decision-making. Recent advances in embedded systems have produced a long list of IoT boards equipped with powerful central processing units (CPUs) and graphics processing units (GPUs). Unfortunately, even with the limited energy consumption and high processing power of such GPUs, CPUs are usually the only computational element utilized by the hosted applications, thus hindering the capabilities of the entire board. This is mainly due to the complicated nature of GPU-based programming. In this paper, we present a case study showing the effect of offloading the computationally intensive part of a latency-sensitive educational game to a low-cost Raspberry Pi’s GPU, thus enabling the board to seamlessly host the entire game operations. Relying mainly on the board’s CPU results in a very long interaction latency, i.e., 4.82 seconds. By efficiently leveraging the powerful coprocessor, the VideoCore GPU, we are able to significantly improve the interaction latency to a fraction of a second, making the game conveniently playable.
... Extracting image features is to find out the features belonging to the image itself from the substitution matching image to complete the matching with the template image. The features of [16] include color, texture, shape, and spatial features. There are different ways to extract each feature. ...
Article
Full-text available
Intelligent transportation systems need to solve the main problems in traffic safety. This paper focuses on the traffic safety caused by fatigue driving based on image recognition of key technologies for research and analysis. This paper proposes that the location of face and facial feature points and the classification of fatigue detection are the key links to determine the fatigue driving detection rate. In the analysis of face localization algorithm based on skin color modeling, a corner-based optimization method is proposed to optimize the face region. Based on the analysis of the binary algorithm of human eye localization algorithm, a bi-directional integral projection method is proposed to achieve accurate human eye localization. Then the commonly used fatigue classification algorithm (KNN algorithm) is analyzed. Finally, the proposed method is verified by the simulation test of fatigue driving. Experimental results show that the algorithm based on skin color modeling can accurately locate the driver’s face region. The eye location algorithm based on the two-valued algorithm can also locate the eye location of the tester accurately. The accuracy of KNN fatigue detection model is 87.82%. It can identify driver’s fatigue state with high accuracy.
... However, Leskela et al [12] proposed a heterogeneous computing method in which they analyzed the frame timing and introduced an optimization technique for work scheduling. Lopez et al [13] and Cheng et al [14] implemented image and face recognition systems on mobile platforms. The systems, which were implemented with CPU and GPU operations combined, exhibited 2x to 10x faster execution times for face recognition algorithms. ...
Article
On mobile devices, image sequences are widely used for multimedia applications such as computer vision, video enhancement, and augmented reality. However, the real-time processing of mobile devices is still a challenge because of constraints and demands for higher resolution images. Recently, heterogeneous computing methods that utilize both a central processing unit (CPU) and a graphics processing unit (GPU) have been researched to accelerate the image sequence processing. This paper deals with various optimizing techniques such as parallel processing by the CPU and GPU, distributed processing on the CPU, frame buffer object, and double buffering for parallel and/or distributed tasks. Using the optimizing techniques both individually and combined, several heterogeneous computing structures were implemented and their effectiveness was analyzed. The experimental results show that the heterogeneous computing facilitates executions up to 3.5 times faster than CPU-only processing.
... There are some image processing studies on mobile programming. Parallel programming based on GPU is a popular topic to improve the speed of computation for image processing on mobile programming [13][14]. DERMA/care is an online mobile application for skin cancer detection that is implemented using multimodal images [15]. ...
Conference Paper
Full-text available
The pattern of signature and handwriting are unique, so they can be utilised as an authentication system. This research proposed a method of signature and handwriting recognition on a mobile device using the Gray Level Co-occurrence Matrix for texture-based feature extraction and the Bootstrap for performing single classifier model. The proposed method is successfully implemented in the offline and online application. The offline experiment of signature and handwriting from the same user produces accuracy 100%. In a cross evaluation using different users as model and target, the experiment performs accuracy around 34% and 44% for signature and handwriting data, respectively. In the case study of the training and testing data from the same user on mobile devices, the experiment using stylus and finger produces accuracy 84.62% and 88.46% respectively for online signature recognition, and 70% and 90% for online handwriting recognition.
... Many authors are attempting to increase power efficiency of heterogeneous architectures by dividing workloads between the heterogeneous processing elements [3,9,12,14,21]. These target mobile SoCs such as the Tegra 2 [3,21], Samsung S4, Samsung Note II, Google Nexus 7 and Tegra 250 [14], Tegra 3 [9] and Texas Instruments' OMAP 3530 platform [12]. A typical application area is SIFT [9,14], but Huang and Lai [9] also experiment with BLAS benchmarks, mobile face recognition [3,21] and face tracking [12]. ...
... These target mobile SoCs such as the Tegra 2 [3,21], Samsung S4, Samsung Note II, Google Nexus 7 and Tegra 250 [14], Tegra 3 [9] and Texas Instruments' OMAP 3530 platform [12]. A typical application area is SIFT [9,14], but Huang and Lai [9] also experiment with BLAS benchmarks, mobile face recognition [3,21] and face tracking [12]. The common approach in these studies is to offload certain computation blocks entirely to the on-board GPU using the OpenGL ES graphics library. ...
Conference Paper
Energy efficiency is a timely topic for modern mobile computing. Reducing the energy consumption of devices not only increases their battery lifetime, but also reduces the risk of hardware failure. Many researchers strive to understand the relationship between software activity and hardware power usage. A recurring strategy for saving power is to reduce operating frequencies. It is widely acknowledged that standard frequency scaling algorithms generally overreact to changes in hardware utilisation. More recent and original efforts attempt to balance software workloads on heterogeneous multicore architectures, such as the Tegra K1, which includes a quad-core CPU and a CUDA-capable GPU. However, it is not known whether it is possible to utilise these processor elements in parallel to save energy. Research into these types of systems is unfortunately often evaluated with the Performance Per Watt (PPW) metric, which is an inaccurate method because it ignores constant power usage from idle components. We show that this metric can end up increasing energy usage on the Tegra K1 and give a false impression of how such systems consume energy. In reality, we show that it is much harder to save energy by balancing workloads between the heterogeneous cores of the Tegra K1, where we demonstrate only a 5% energy saving by offloading 10% DCT workload from the GPU to the CPU. Significantly more energy can be saved (up to 50%) using the appropriate processor for different workloads.
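The objection to Performance Per Watt raised in the abstract above can be illustrated with a minimal sketch. It assumes a simple energy model that is not taken from the paper: total energy is (active power + constant platform power) multiplied by runtime, while PPW is computed from active power alone. The `Run` class, the function names, and every number below are hypothetical and chosen only for illustration; they are not Tegra K1 measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One hypothetical execution of a fixed workload."""
    name: str
    work_units: float      # amount of work completed (arbitrary units)
    runtime_s: float       # wall-clock time in seconds
    active_power_w: float  # power attributed to the busy processor only


def ppw_active_only(run: Run) -> float:
    """Performance per watt, ignoring constant platform power."""
    return (run.work_units / run.runtime_s) / run.active_power_w


def total_energy_j(run: Run, constant_power_w: float) -> float:
    """Energy in joules, including power drawn constantly for the whole run."""
    return (run.active_power_w + constant_power_w) * run.runtime_s


# Assumed constant draw from idle components (hypothetical value).
CONSTANT_POWER_W = 3.0

# Two hypothetical ways of completing the same 1000 work units.
fast = Run("fast, higher active power", 1000.0, 10.0, 4.0)
slow = Run("slow, lower active power", 1000.0, 20.0, 1.5)

for run in (fast, slow):
    print(
        f"{run.name}: PPW = {ppw_active_only(run):.2f}, "
        f"energy = {total_energy_j(run, CONSTANT_POWER_W):.1f} J"
    )
```

Under these assumed numbers, the slower configuration scores higher on PPW (about 33.3 versus 25 work units per second per watt) yet consumes more total energy (90 J versus 70 J), because the constant 3 W accrues over its longer runtime.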
... There are also various proposed approaches for vision-based pose estimation that can be realized using the cameras and GPGPUs of portable mobile devices to control the robotic manipulator [20,21]. Object recognition and classification in natural visual scenes also have numerous practical applications that can be physically realized using powerful portable mobile devices [22]. Adaptation of existing desktop-based GPU implementations on portable mobile devices is a challenging task because of their fewer cores and reduced memory resources [23]. ...
... This is because the embedded controllers discussed in previous sections are suitable for single instruction single data (SISD) arithmetic and logical operations while the GPU is suitable for SIMD executions. The GPU can execute required arithmetic operations on a large set of data with higher speed and lower power consumption [22,40]. The poor performance of GPU (for a single test point) is because of overhead related to the data transfer and kernel launch. ...
Article
Robotic controllers have to execute various complex independent tasks repeatedly. Massive processing power is required by the motion controllers to compute the solution of these computationally intensive algorithms. General-purpose graphics processing unit (GPGPU)-enabled mobile phones can be leveraged for acceleration of these motion controllers. Embedded GPUs can replace several dedicated computing boards by a single powerful and less power-consuming GPU. In this paper, the inverse kinematic algorithm based numeric controllers is proposed and realized using the GPGPU of a handheld mobile device. This work is the extension of a desktop GPU-accelerated robotic controller presented at DAS'16 where the comparative analysis of different sequential and concurrent controllers is discussed. First of all, the inverse kinematic algorithm is sequentially realized using Arduino-Due microcontroller and the field-programmable gate array (FPGA) is used for its parallel implementation. Execution speeds of these controllers are compared with two different GPGPU architectures (Nvidia Quadro K2200 and Nvidia Shield K1 Tablet), programmed with Compute Unified Device Architecture (CUDA) computing language. Experimental data shows that the proposed mobile platform-based scheme outperforms the FPGA by 5× and boasts a 100× speedup over the Arduino-based sequential implementation.
... GPU computing has been utilized for object recognition as well. For instance, Harvey [43] has implemented a multi-GPU multi-class SVM classifier, López et al. [45] have used LBP features and boosting, Uetz and Behnke [46] and Ciresan et al. [47] have utilized neural networks, Kim et al. [48] have adapted scale-invariant feature transform (SIFT) while Cornelis and Gool [49] have used speeded-up robust features for this task. ...
... All the above solutions, except for [27,28,38] (which utilize the OpenCL framework), [45] (which uses OpenGL) and [26] (which uses the Cm platform [50]), have been developed using the CUDA platform (it should be noted that in a few instances, other frameworks, such as OpenCL [37], OpenGL [25] and OpenMP and SSE [48], have been employed in conjunction with CUDA). All these approaches have been reported to achieve higher speeds than the corresponding CPU-only implementations even though in most cases the CPU has been optimized for performance. ...
... It should be noted that very few of these systems (e.g., [7,45]) have been developed for mobile platforms since a mobile GPU usually has fewer cores, lower memory bandwidth, and variant architecture when compared to the desktop GPUs [7]. However, since the powerful GPU on the Project Tango Tablet Development Kit significantly allays these limitations, we believe that there is a compelling need to adapt and test some of the GPU-accelerated algorithms mentioned above for face and object detection and recognition on this newly introduced mobile device, especially given the huge implications that this will have for developing real-time assistive solutions for disabled individuals, particularly those with visual impairments. ...
Article
Full-text available
An application for the recently introduced Google Project Tango Tablet Development Kit to assist visually impaired (VI) users in understanding their environmental context by identifying and locating multiple faces and objects in their vicinity in real-time is presented. CUDA-based GPU-accelerated algorithms would be utilized to detect and recognize faces and objects from the visual data, while the locations of these entities relative to the user would be estimated from the depth data acquired via the tablet. The interaction would be speech based with the user being offered several options for requesting information about the identities and/or relative locations of face and objects. The aim is to create a portable, affordable, power-efficient, standalone assistive application to increase the autonomy of VI users which can run in real time on the device itself.
... Training has the largest computational and memory complexity; however, it can be performed offline on dedicated server infrastructures. Conversely, depending on the application constraints, the actual image classification may have to be performed online on the same embedded device used to acquire the image (e.g., on a smart phone) [8][9][10]. Recent embedded devices boast heterogeneous CPU-GPU architectures and are suited for challenging applications, such as robotics, control and image processing [3,4,11,12]. ...
Article
Full-text available
Deep convolutional neural networks achieve state-of-the-art performance in image classification. The computational and memory requirements of such networks are however huge, and that is an issue on embedded devices due to their constraints. Most of this complexity derives from the convolutional layers and in particular from the matrix multiplications they entail. This paper proposes a complete approach to image classification providing common layers used in neural networks. Namely, the proposed approach relies on a heterogeneous CPU-GPU scheme for performing convolutions in the transform domain. The Compute Unified Device Architecture(CUDA)-based implementation of the proposed approach is evaluated over three different image classification networks on a Tegra K1 CPU-GPU mobile processor. Experiments show that the presented heterogeneous scheme boasts a 50× speedup over the CPU-only reference and outperforms a GPU-based reference by 2×, while slashing the power consumption by nearly 30%.
... GPU-based programming has been widely developed to increase the processing speed of image processing algorithms on mobile devices [12] [13]. ...
Technical Report
Full-text available
Signatures and handwriting have unique patterns and can be used as a biometric identity. In this study, signature and handwriting pattern recognition is performed using the Gray Level Co-occurrence Matrix feature extraction technique and the bootstrap technique to obtain model features from the training data. The signature and handwriting recognition application was developed as a mobile application for Android-based smartphones. All computation is performed offline on the mobile device. Testing was conducted with 61 respondents and was divided into signature and handwriting tests using stylus and finger input. There are four test schemes: the first and second tests use training and test data from the same respondent, while the third and fourth tests use training and test data from different respondents. Each test was carried out with 26 respondents as training data. Pattern recognition is divided into weak-score, medium-score, and strong-score categories by thresholding the similarity value. The signature recognition experiments yielded an average similarity value of 0.6 when testing with the same respondent. Signature testing with different respondents yielded a recognition rate of 0% in the strong-score category. The handwriting recognition experiments yielded an average similarity value of 0.58 when testing with the same respondent. Handwriting recognition with different respondents yielded recognition rates in the strong-score category of 12.50% and 20.20% for stylus and finger input, respectively.
... Future research will consider the addition of gender classification, expression recognition, and heart-rate measurements. In addition, the inclusion of a parallel pipeline that makes use of GP-GPU capabilities [3], will further reduce the latency and power consumption of the face analysis process. ...
Data
Face detection and recognition are key components in multiple camera-based devices and applications. Smart glasses are a type of optical head mounted displays that integrate first-person cameras and hands free displays with immediate access to processing power able to analyze first person images in real time with hands free operation. In this context, we have constructed an application prototype that detects and recognizes faces in real-time, and runs independently on the device. We provide a description of the embedded implementation at a system-level where we highlight the application development challenges and trade-offs that need to be dealt with battery powered wearable devices. The implementation includes a parallel pipeline that reduces the latencies of the application.