Average power consumption of the LBP calculations.

Source publication

Accelerating image recognition on mobile devices using GPGPU

Article

Full-text available

Jan 2011

The future multi-modal user interfaces of battery-powered mobile devices are expected to require computationally costly image analysis techniques. The use of Graphic Processing Units for computing is very well suited for parallel processing and the addition of programmable stages and high precision arithmetic provide for opportunities to implemen...

GPU-Accelerated Computation of SRP Forces with Graphical Encoding of Surface Normals

Conference Paper

Full-text available

Aug 2015

The forces and torques due to atmospheric drag and solar radiation pressure (SRP) acting on complex and articulated space objects are efficiently calculated by utilizing the highly parallelized hardware available in commodity desktop PC graphics processing units. The calculations are performed by combining traditional OpenGL rendering of 3D models...

Comparison of OpenCL and RenderScript for mobile devices

Article

Full-text available

Nov 2016

With the recent advances in the programmability and performance of mobile Graphics Processing Units (GPUs), General-Purpose Graphics Processing Unit (GPGPU) technologies have become available even in mobile devices such as smartphones and tablets. Among the available GPGPU technologies for mobile devices, Open Computing Language (OpenCL) and Render...

GPU-Accelerated Computation of SRP and Drag Forces and Torques with Graphical Encoding of Surface Normals

Conference Paper

Full-text available

Feb 2016

Unleashing the Hidden Powers of Low-cost IoT Boards: GPU-based Edutainment Case Study

Article

Full-text available

Feb 2020

Ubiquitous interconnected smart devices, enabled by the recent evolution of low-cost, generic, small, powerful computing platforms, gave rise to the term Internet of Things (IoT), which cuts across many areas of modern-day living. IoT applications go well beyond simple sensing and actuation to sophisticated localized processing and decision-making. Recent advances in embedded systems have produced a long list of IoT boards equipped with powerful central processing units (CPUs) and graphics processing units (GPUs). Unfortunately, even with the limited energy consumption and high processing power of such GPUs, the CPU is usually the only computational element used by the hosted applications, which limits the capabilities of the entire board. This is mainly due to the complicated nature of GPU-based programming. In this paper, we present a case study showing the effect of offloading the computationally intensive part of a latency-sensitive educational game to the GPU of a low-cost Raspberry Pi, thus enabling the board to seamlessly host all of the game's operations. Relying mainly on the board's CPU results in a very long interaction latency of 4.82 seconds. By efficiently leveraging the powerful VideoCore GPU coprocessor, we are able to reduce the interaction latency to a fraction of a second, making the game conveniently playable.

Research on key technologies of intelligent transportation based on image recognition and anti-fatigue driving

Article

Full-text available

Feb 2019
Int J Image Video Process

Intelligent transportation systems need to solve the main problems in traffic safety. This paper focuses on the traffic safety problems caused by fatigued driving and analyzes the key image recognition technologies for detecting it. It proposes that locating the face and facial feature points and classifying fatigue are the key steps that determine the detection rate for fatigued driving. In the analysis of the face localization algorithm based on skin color modeling, a corner-based optimization method is proposed to refine the face region. Based on an analysis of the binary algorithm for human eye localization, a bi-directional integral projection method is proposed to achieve accurate eye localization. Then the commonly used fatigue classification algorithm (the KNN algorithm) is analyzed. Finally, the proposed method is verified by a simulation test of fatigued driving. Experimental results show that the algorithm based on skin color modeling can accurately locate the driver’s face region. The eye localization algorithm based on the binary (two-valued) algorithm can also accurately locate the tester's eyes. The accuracy of the KNN fatigue detection model is 87.82%. It can identify the driver’s fatigue state with high accuracy.

Analysis of Implementing Mobile Heterogeneous Computing for Image Sequence Processing

Article

Oct 2017
KSII T INTERNET INF

On mobile devices, image sequences are widely used for multimedia applications such as computer vision, video enhancement, and augmented reality. However, real-time processing on mobile devices is still a challenge because of resource constraints and demands for higher resolution images. Recently, heterogeneous computing methods that utilize both a central processing unit (CPU) and a graphics processing unit (GPU) have been researched to accelerate image sequence processing. This paper deals with various optimization techniques such as parallel processing by the CPU and GPU, distributed processing on the CPU, frame buffer objects, and double buffering for parallel and/or distributed tasks. Using the optimization techniques both individually and in combination, several heterogeneous computing structures were implemented and their effectiveness was analyzed. The experimental results show that heterogeneous computing executes up to 3.5 times faster than CPU-only processing.

Hand signature and handwriting recognition as identification of the writer using gray level cooccurrence matrix and bootstrap

Conference Paper

Full-text available

Sep 2017

Signature and handwriting patterns are unique, so they can be used for authentication. This research proposes a method of signature and handwriting recognition on a mobile device using the Gray Level Co-occurrence Matrix for texture-based feature extraction and the Bootstrap to build a single classifier model. The proposed method is successfully implemented in offline and online applications. The offline experiment using signature and handwriting data from the same user yields 100% accuracy. In a cross evaluation using different users as model and target, the accuracy is around 34% and 44% for signature and handwriting data, respectively. In the case study using training and testing data from the same user on mobile devices, the experiment using stylus and finger yields accuracies of 84.62% and 88.46%, respectively, for online signature recognition, and 70% and 90% for online handwriting recognition.

Load Balancing of Multimedia Workloads for Energy Efficiency on the Tegra K1 Multicore Architecture

Conference Paper

Jun 2017

Energy efficiency is a timely topic for modern mobile computing. Reducing the energy consumption of devices not only increases their battery lifetime, but also reduces the risk of hardware failure. Many researchers strive to understand the relationship between software activity and hardware power usage. A recurring strategy for saving power is to reduce operating frequencies. It is widely acknowledged that standard frequency scaling algorithms generally overreact to changes in hardware utilisation. More recent and original efforts attempt to balance software workloads on heterogeneous multicore architectures, such as the Tegra K1, which includes a quad-core CPU and a CUDA-capable GPU. However, it is not known whether it is possible to utilise these processor elements in parallel to save energy. Research into these types of systems is unfortunately often evaluated with the Performance Per Watt (PPW) metric, which is an inaccurate method because it ignores constant power usage from idle components. We show that this metric can end up increasing energy usage on the Tegra K1, and give a false impression of how such systems consume energy. In reality, we show that it is much harder to save energy by balancing workloads between the heterogeneous cores of the Tegra K1, where we demonstrate only a 5% energy saving by offloading 10% of the DCT workload from the GPU to the CPU. Significantly more energy can be saved (up to 50%) by using the appropriate processor for different workloads.

Comparison of GPGPU based robotic manipulator with other embedded controllers

Article

May 2017

Robotic controllers have to execute various complex independent tasks repeatedly. Motion controllers require massive processing power to solve these computationally intensive algorithms. General-purpose graphics processing unit (GPGPU)-enabled mobile phones can be leveraged to accelerate these motion controllers. Embedded GPUs can replace several dedicated computing boards with a single powerful GPU that consumes less power. In this paper, a numeric controller based on the inverse kinematic algorithm is proposed and realized using the GPGPU of a handheld mobile device. This work extends a desktop GPU-accelerated robotic controller presented at DAS'16, where the comparative analysis of different sequential and concurrent controllers is discussed. First, the inverse kinematic algorithm is realized sequentially on an Arduino Due microcontroller, and a field-programmable gate array (FPGA) is used for its parallel implementation. The execution speeds of these controllers are compared with two different GPGPU architectures (Nvidia Quadro K2200 and Nvidia Shield K1 Tablet), programmed with the Compute Unified Device Architecture (CUDA) computing language. Experimental data shows that the proposed mobile platform-based scheme outperforms the FPGA by 5× and achieves a 100× speedup over the Arduino-based sequential implementation.

A GPU-accelerated real-time contextual awareness application for the visually impaired on Google’s project Tango device

Article

Full-text available

Feb 2017
J SUPERCOMPUT

Rabia Jafri

An application for the recently introduced Google Project Tango Tablet Development Kit is presented that assists visually impaired (VI) users in understanding their environmental context by identifying and locating multiple faces and objects in their vicinity in real time. CUDA-based GPU-accelerated algorithms would be utilized to detect and recognize faces and objects from the visual data, while the locations of these entities relative to the user would be estimated from the depth data acquired via the tablet. The interaction would be speech based, with the user being offered several options for requesting information about the identities and/or relative locations of faces and objects. The aim is to create a portable, affordable, power-efficient, standalone assistive application that increases the autonomy of VI users and can run in real time on the device itself.

GPGPU Accelerated Deep Object Classification on a Heterogeneous Mobile Platform

Article

Full-text available

Dec 2016

Deep convolutional neural networks achieve state-of-the-art performance in image classification. The computational and memory requirements of such networks are, however, huge, which is an issue on embedded devices due to their constraints. Most of this complexity derives from the convolutional layers and in particular from the matrix multiplications they entail. This paper proposes a complete approach to image classification providing common layers used in neural networks. Namely, the proposed approach relies on a heterogeneous CPU-GPU scheme for performing convolutions in the transform domain. The Compute Unified Device Architecture (CUDA)-based implementation of the proposed approach is evaluated over three different image classification networks on a Tegra K1 CPU-GPU mobile processor. Experiments show that the presented heterogeneous scheme achieves a 50× speedup over the CPU-only reference and outperforms a GPU-based reference by 2×, while reducing power consumption by nearly 30%.

Pengembangan Aplikasi Identifikasi Biometrik Berbasis Perangkat Mobile Untuk Alternatif Sistem Keamanan Digital

Technical Report

Full-text available

Aug 2016

Signatures and handwriting have unique patterns and can be used as biometric identities. In this study, signature and handwriting pattern recognition is performed using Gray Level Co-occurrence Matrix feature extraction and a bootstrap technique to obtain model features from the training data. The signature and handwriting recognition application was developed as a mobile application for Android-based smartphones. All computation is performed offline on the mobile device. Testing was carried out with 61 respondents, divided into signature and handwriting tests using input from a stylus and a finger. There are four test schemes: the first and second tests use training and test data from the same respondent, while the third and fourth tests use training and test data from different respondents. Each test was carried out with 26 respondents as training data. Pattern recognition is divided into weak, medium, and strong score categories using a threshold on the similarity value. In the signature recognition experiments, the average similarity value was 0.6 when testing with the same respondent. Signature tests with different respondents yielded a recognition rate of 0% in the strong score category. In the handwriting recognition experiments, the average similarity value was 0.58 when testing with the same respondent. Handwriting recognition with different respondents yielded recognition rates in the strong score category of 12.50% and 20.20% for stylus and finger input, respectively.

Face detection and recognition for smart glasses

Data

Nov 2015

Face detection and recognition are key components in multiple camera-based devices and applications. Smart glasses are a type of optical head mounted displays that integrate first-person cameras and hands free displays with immediate access to processing power able to analyze first person images in real time with hands free operation. In this context, we have constructed an application prototype that detects and recognizes faces in real-time, and runs independently on the device. We provide a description of the embedded implementation at a system-level where we highlight the application development challenges and trade-offs that need to be dealt with battery powered wearable devices. The implementation includes a parallel pipeline that reduces the latencies of the application.
