
A Technique for Hand Gesture Recognition on Real Time Basis

August 2018
International Journal of Computer Applications 181(9):43-46


DOI:10.5120/ijca2018917613

Authors:

Radhika Agrawal

University of Glasgow


International Journal of Computer Applications (0975 – 8887)

Volume 181 – No. 9, August 2018

A Technique for Hand Gesture Recognition on Real Time Basis

Ayushi Shrivastav

Department of Computer Science

and Engineering,

Shri Ramdeobaba College Of

Engineering and Management

Maharashtra, India

Radhika Agrawal

Department of Computer Science

and Engineering,

Shri Ramdeobaba College Of

Engineering and Management

Maharashtra, India

S. G. Mundada

Department of Computer Science

and Engineering,

Shri Ramdeobaba College Of

Engineering and Management

Maharashtra, India

ABSTRACT

Sign language is a non-verbal, visual language. It differs from spoken language in its medium of communication but serves the same function for the hearing- and speech-impaired community. Gesture recognition, and more specifically hand gesture recognition, is one of the typical methods used to interpret sign language. Members of the hearing- and speech-impaired community often find it very difficult to communicate their ideas and creativity to people who do not know sign language. This paper discusses different methods of identifying a gesture. Hand segmentation is discussed in terms of the different approaches to the sub-components of gesture identification. The evaluation parameters are accuracy in real-time performance, processing time, processor utilization, etc.

Keywords

Hand gesture recognition, Image processing, Human

computer interaction (HCI), K-means clustering, Hand

segmentation, hand gestures

1. INTRODUCTION

Sign language is a non-verbal language in which signs are made by moving one or both hands, combined with facial expressions and body postures. Each sign has a particular meaning, and people who understand the language know what the sign stands for. It is one of the most widely used methods of communication in the hearing- and speech-impaired community. In sign language, a gesture is any specific pattern or movement of the hands, face or body that conveys a meaning. In other words, gestures can be expressed through facial expressions, limb movements or any meaningful bodily state.

A static hand gesture can be represented by a single image, i.e. its meaning is conveyed by a particular fixed position or posture captured in that image. The typical input for a static gesture is therefore a single picture. A dynamic hand gesture is a series of hand postures connected by continuous motion over a definite period of time. Such a sequence is captured as a video, possibly in real time, and given as input to the hand gesture recognition application. The video is then processed frame by frame, and the gesture is identified by applying recognition techniques such as neural networks, hand segmentation, etc.

In this paper, a vision-based hand gesture recognition methodology is proposed for dynamic input. Hand segmentation, centroid calculation and direction tracking are performed on the frames, and a string of code bits is generated, which is then used to recognize the gesture from a predefined database. All of this is done on video by splitting it into frames, under the assumption that the continuous motion is a gesture that carries a meaning. The proposed method consists of the following steps: selection of frames, background segmentation using the K-means algorithm, centroid calculation for detecting change in direction, thresholding for generating the string of code bits, hand gesture interpretation and speech generation.

2. LITERATURE REVIEW

Hand gesture recognition methods can be broadly classified into three categories: glove-based, depth-based and vision-based [5][10]. Glove-based methods predict the gesture from data extracted by a hardware device called a data glove. A data glove, also known as a wired glove, uses various sensors for tactile sensing and fine-motion control sensing to capture the position of the hand and its joints, from which the gesture is extracted. S. Oniga & I. Orha [5] implemented a hardware-based approach to gesture recognition, using a bracelet that captures the movement of the hand with accelerometers and a Field Programmable Gate Array (FPGA); the desired network was modeled, trained and simulated using Neural Network Toolbox [6].

In the vision-based approach, a camera or video recorder captures information about the hand position, which is then analyzed to understand the gesture. Y. Fang et al. [2] use an extended AdaBoost method for hand detection, and perform hand segmentation by collecting the color of the hand from the neighborhood of the mean position of the features. They further use scale-space feature detection to detect blob and ridge structures, i.e. palm and finger structures.

Another approach is depth-based recognition. The Senz3D camera captures an RGB video frame along with the associated depth data. Depth-based thresholding is performed to remove the background, and segmentation based on the depth data then isolates the object closest to the camera. The segmented region may also include pixels that belong to the arm. Color-based filtering is therefore applied to these pixels to check, against a predefined color model, whether they actually represent the hand. If they are not recognized as belonging to the hand region, the algorithm waits for the next frame [9].

Further, we explore the different methods for hand

segmentation, hand tracking, feature extraction and gesture

identification in dynamic gesture recognition.


Image segmentation is typically performed to locate the hand in the image. A suitable background is crucial to the overall efficacy of a hand gesture recognition algorithm. The background is usually chosen so that the variation in pixel intensity between the hand and the background is maximal and any occlusion of the hand by other body parts is avoided. Hand segmentation can thus be regarded as detecting the Region of Interest (ROI). S. Bhowmick, S. Kumar and A. Kumar [10] used a skin-colour-based hand segmentation technique that exploits a hybrid HSV+YCbCr colour model. These colour spaces have an advantage over RGB in that they separate brightness from colour information: in RGB, a change in illumination alters all three channels, whereas in HSV and YCbCr the brightness is carried by a single component (V and Y respectively), so skin colour can be characterized by the remaining chrominance components.
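The separation of brightness from colour can be illustrated with a short sketch. It assumes 8-bit RGB input and the full-range BT.601 (JFIF) weights for YCbCr; the function names `rgb_to_ycbcr` and `hybrid_features` are hypothetical, and no skin-colour thresholds from [10] are reproduced here.

```python
import colorsys
from typing import Dict, Tuple


def rgb_to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert 8-bit RGB to full-range YCbCr (BT.601 / JFIF weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def hybrid_features(r: float, g: float, b: float) -> Dict[str, float]:
    """Return the HSV and YCbCr components of one 8-bit RGB pixel."""
    # colorsys expects channel values in the range [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    y, cb, cr = rgb_to_ycbcr(r, g, b)
    return {"H": h, "S": s, "V": v, "Y": y, "Cb": cb, "Cr": cr}


bright = hybrid_features(200, 150, 100)
dim = hybrid_features(100, 75, 50)  # same colour at half the brightness
```

Halving the brightness changes all three RGB channels, but in the converted pixel only V and Y halve while hue and saturation stay the same, which is why thresholds on the colour components are less sensitive to lighting.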

Another method that can be used for hand detection is background subtraction. Rather than detecting the ROI directly, the background is removed by applying a clustering algorithm [1]. The proposed method uses the K-means algorithm for this purpose.

Tracking in computer vision refers to techniques that continuously monitor the consecutive positions of the region of interest (ROI) [10]. Here, the ROI is the hand being tracked. To find the gesture trajectory, the centroid, or center of gravity (CoG), of the segmented hand is found first. The centroid can be obtained by moment calculation. A moment is a gross characteristic of a contour, computed by integrating or summing over all of the pixels of the contour.

3. PROPOSED METHOD

The problem is divided into two modules: gesture recognition, where the input is a video, and speech conversion of the identified gesture. For speech conversion, a text-to-speech API is used. A combination of direction and code bits is used to identify the gesture. The gesture recognition method involves selecting frames, calculating the direction and code bits of the gesture, detecting a change of gesture (when the video contains multiple gestures) and passing the identified gesture to the text-to-speech API. The preprocessing steps involve selecting frames from the video, detecting the code bits and direction of the gesture, calculating the least squared error between frames and then identifying the gesture or string of gestures.

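Fig. 1. Flowchart of the proposed algorithm. Box labels: START; INPUT VIDEO; DIVIDE IN N FRAMES; CALCULATE ERROR BETWEEN SUBSEQUENT FRAMES; CODE BIT GENERATED; APPEND CODEBITS; IF ERROR <= 60; FOR EACH FRAME; STOP.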

3.1. Splitting of video into frames

The video is preprocessed by splitting it into frames, and these frames are used to identify the gestures. Each frame has a unique frame_id. Using the frame_rate returned by a built-in OpenCV method, the total number of frames in the video is calculated. The first frame of every minute is selected, and the proposed algorithm is applied to the selected frames. The number of frames is calculated by using the formula:
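The frame bookkeeping can be sketched as follows, assuming the total is the frame rate multiplied by the video duration (an assumption, since the text names only the inputs). The function names and the `interval_seconds` parameter are hypothetical. In OpenCV the frame rate itself can be read with `cv2.VideoCapture.get(cv2.CAP_PROP_FPS)`, but the sketch takes it as a plain number so that it runs without a video file.

```python
from typing import List


def total_frames(frame_rate: float, duration_seconds: float) -> int:
    """Estimate the frame count as frame rate x duration (assumed formula)."""
    return int(round(frame_rate * duration_seconds))


def selected_frame_ids(frame_rate: float, duration_seconds: float,
                       interval_seconds: float = 60.0) -> List[int]:
    """Return the frame_id of the first frame in every interval of the video."""
    n_frames = total_frames(frame_rate, duration_seconds)
    # Number of frames between two selected frames (at least one).
    step = max(1, int(round(frame_rate * interval_seconds)))
    return list(range(0, n_frames, step))
```

For a two-minute clip at 30 frames per second this gives 3600 frames, of which frames 0 and 1800 are selected with the default one-minute interval. A shorter `interval_seconds` selects frames more densely.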

3.2. Hand detection & Background elimination

For this step, K-means clustering can be used to form two clusters, i.e. a background cluster and a hand cluster. K-means is an iterative technique that segments the image into K clusters. An initial set of centroid seeds is picked at random, and each remaining point is assigned to its closest seed. After each assignment, the centroid concerned is updated to include the coordinates of the new point. Assigning all points to a set of successively updated centroids constitutes one iteration of the K-means algorithm. Each subsequent iteration re-assigns all points, until no point can be moved to a centroid closer than that of the cluster it already belongs to. Every time a point is re-assigned, its old centroid must be downdated and its new centroid updated.

The RGB image is converted to a black-and-white image by clustering the pixels and changing each pixel's color according to its cluster. If the pixel belongs to the hand cluster, its color is changed to white; if it belongs to the background cluster, its color is changed to black. Applying K-means clustering with k = 2 to the image therefore yields a black-and-white image with two clusters, one containing the hand region and the other the background region. The background cluster is discarded, and the hand cluster is used for the further pre-processing steps.
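A minimal sketch of this two-cluster binarization on grayscale intensities follows. It uses the batch (Lloyd) form of K-means rather than the point-by-point centroid updating described above, and it seeds the centroids with the darkest and brightest pixel instead of random seeds so that the result is reproducible. Both choices, the function names, and the assumption that the hand is the brighter cluster are illustrative only.

```python
from typing import List


def kmeans_two_clusters(pixels: List[float], max_iter: int = 100) -> List[int]:
    """Label each pixel 0 or 1 with batch (Lloyd) K-means, k = 2."""
    centroids = [min(pixels), max(pixels)]  # deterministic seeds (assumption)
    labels = [0] * len(pixels)
    for _ in range(max_iter):
        # Assignment step: each pixel joins its nearest centroid.
        new_labels = [
            0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
            for p in pixels
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(pixels, new_labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
        if new_labels == labels:
            break  # no pixel changed cluster, so the partition is stable
        labels = new_labels
    return labels


def binarize(image: List[List[float]], hand_is_brighter: bool = True) -> List[List[int]]:
    """Paint hand-cluster pixels white (255) and background pixels black (0)."""
    flat = [p for row in image for p in row]
    labels = kmeans_two_clusters(flat)
    hand_label = 1 if hand_is_brighter else 0  # label 1 is the brighter cluster
    width = len(image[0])
    return [
        [255 if labels[r * width + c] == hand_label else 0 for c in range(width)]
        for r in range(len(image))
    ]
```

K-means only separates the pixels into two groups; deciding which group is the hand needs extra knowledge about the scene, which is what the `hand_is_brighter` flag stands in for here.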

Centroid Calculation & Finger Detection

Using the first frame as input, the centroid of the hand is calculated from image moments, which are weighted averages of the pixel intensities of the image [1]. The image moment is first calculated using

$$M_{ij} = \sum_{x} \sum_{y} x^{i} y^{j} I(x, y) \tag{1}$$

where $M_{ij}$ is the image moment and $I(x, y)$ is the intensity at coordinate $(x, y)$. Equation (1) gives the moments needed to calculate the coordinates of the centroid:

$$\bar{x} = \frac{M_{10}}{M_{00}}, \qquad \bar{y} = \frac{M_{01}}{M_{00}} \tag{2}$$

In equation (2), $(\bar{x}, \bar{y})$ are the coordinates of the centroid and $M_{00}$ is the area for a binary image.
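The moment and centroid calculation can be sketched directly from the definitions, assuming the binary hand mask is stored as a list of rows with x as the column index and y as the row index. The function names are hypothetical. OpenCV offers `cv2.moments`, which returns the same raw moments under keys such as `m00`, `m10` and `m01`.

```python
from typing import List, Tuple


def image_moment(image: List[List[float]], i: int, j: int) -> float:
    """Raw moment M_ij: sum over all pixels of x**i * y**j * I(x, y)."""
    return sum(
        (x ** i) * (y ** j) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )


def centroid(image: List[List[float]]) -> Tuple[float, float]:
    """Centroid (x_bar, y_bar) = (M10 / M00, M01 / M00)."""
    m00 = image_moment(image, 0, 0)
    if m00 == 0:
        raise ValueError("empty mask: no hand pixels")
    return image_moment(image, 1, 0) / m00, image_moment(image, 0, 1) / m00


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
```

For this 2 x 2 block of hand pixels, M00 is 4 (the area) and the centroid falls at (1.5, 1.5), the middle of the block.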

For peak detection, the concept of the convex hull is used. The palm is separated from the hand by creating a palm mask, i.e. by eroding the fingers away from the hand mask through successive erosion and dilation (morphological operations). Then, using convexity defects, the center of the hand and the number of fingers are detected.
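To illustrate why the convex hull is useful for peak detection, the sketch below computes the hull of a small hand-like contour with Andrew's monotone chain algorithm. This is a stand-in chosen for self-containment and is not necessarily the hull routine used in the implementation; the palm mask and the convexity-defect step are not shown. In OpenCV the corresponding calls are `cv2.convexHull` and `cv2.convexityDefects`.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def cross(o: Point, a: Point, b: Point) -> int:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Toy contour: three "fingertips" with two "valleys" between them.
contour = [(0, 0), (1, 5), (2, 2), (3, 6), (4, 2), (5, 5), (6, 0)]
hull = convex_hull(contour)
```

The fingertips (1, 5), (3, 6) and (5, 5) end up on the hull, while the valleys (2, 2) and (4, 2) lie inside it. Those interior gaps are what convexity defects measure.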

Direction observation

The centroid calculated for each frame is used to observe any change in the direction of the gesture. Consecutive frames are compared, and the change in the centroid between them is observed. If the x-coordinate changes, the direction is taken to be right when the change is positive and left when it is negative. Similarly, the change in the y-coordinate is observed, and the direction is assigned as up or down according to the sign of the change. This change in direction is stored and used later to detect the gesture.
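A minimal sketch of the direction rule, given the centroids of two consecutive frames. Two assumptions are made that the text does not fix: image rows grow downward, so a positive change in y is read as "down", and a hypothetical `min_shift` tolerance lets small jitters be ignored.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def direction(prev: Point, curr: Point,
              min_shift: float = 0.0) -> Tuple[Optional[str], Optional[str]]:
    """Return (horizontal, vertical) direction of the centroid movement."""
    dx = curr[0] - prev[0]
    dy = curr[1] - prev[1]
    horizontal = None
    if dx > min_shift:
        horizontal = "right"
    elif dx < -min_shift:
        horizontal = "left"
    vertical = None
    if dy > min_shift:
        vertical = "down"  # assumption: image rows grow downward
    elif dy < -min_shift:
        vertical = "up"
    return horizontal, vertical
```

For example, a centroid moving from (10, 10) to (15, 10) gives `("right", None)`, and a move from (10, 10) to (4, 2) gives `("left", "up")`.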

Thresholding and conversion into code

To classify raised and folded fingers, i.e. significant and insignificant peaks among the detected peaks, the distance of each peak from the centroid is calculated using the city block distance. A threshold, Th, is then set. If the distance > Th, the code bit is set to 1; if the distance < Th, it is set to 0. A bit value of 1 indicates that the finger is open, and a value of 0 indicates that it is closed.

The resulting string of code bits is the code of the input image, which is used to map the gesture to its meaning.
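The thresholding step can be sketched as below. The peak coordinates, the threshold value and the function names are made up for illustration, and a distance exactly equal to Th is treated as closed, a case the rule leaves open.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def city_block(p: Point, q: Point) -> float:
    """City block (Manhattan) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def code_bits(peaks: List[Point], centroid: Point, threshold: float) -> str:
    """One bit per peak: '1' if farther than the threshold from the centroid."""
    return "".join(
        "1" if city_block(peak, centroid) > threshold else "0"
        for peak in peaks
    )


# Five hypothetical peaks, one per finger, around a centroid at (50, 60).
peaks = [(20, 55), (40, 10), (50, 5), (62, 12), (58, 52)]
bits = code_bits(peaks, (50, 60), threshold=40)
```

The distances here are 35, 60, 55, 60 and 16, so with Th = 40 the code is `01110`: the three middle fingers are open and the outer two are closed.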

3.3. Finding error between frames

For each frame, the error with respect to the previous frame is calculated using the least squared error formula. If the error is less than 60%, it is ignored. If it is more than 60%, the gesture has changed: the next frame is treated as a different gesture and is sent for processing of code bits.
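One way to express the frame comparison is sketched below. How the squared error is turned into a percentage is not specified, so the sketch assumes the mean squared difference of pixel values scaled to [0, 1], multiplied by 100; for black-and-white masks this equals the percentage of pixels that differ. The function names are hypothetical, and an error of exactly 60% is treated as no change.

```python
from typing import List

Frame = List[List[float]]


def frame_error_percent(prev: Frame, curr: Frame, max_value: float = 255.0) -> float:
    """Mean squared pixel difference between two frames, as a percentage."""
    total = 0.0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += ((p - c) / max_value) ** 2
            count += 1
    return 100.0 * total / count


def is_new_gesture(prev: Frame, curr: Frame, threshold_percent: float = 60.0) -> bool:
    """True when the error exceeds the threshold, i.e. the gesture has changed."""
    return frame_error_percent(prev, curr) > threshold_percent


frame_a = [[0, 0], [255, 255]]
frame_b = [[255, 255], [0, 255]]  # three of the four pixels differ from frame_a
```

Comparing `frame_a` with itself gives 0%, while comparing it with `frame_b` gives 75%, which is above the threshold and would start a new gesture.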

3.4. Mapping of code with words/characters and speech as output

The code obtained in the above step is looked up in the stored code-word pairs, where each code corresponds to a character or word, i.e. the meaning of the gesture. The word is extracted from the matching code-word pair and passed as text to the text-to-speech conversion API, which gives speech as output.
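The lookup can be sketched with a dictionary keyed by code bits and direction. The table entries below are invented for illustration, since the stored code-word pairs are not listed, and the text-to-speech call is left out because the API is not named; the returned string is what would be handed to it.

```python
from typing import Dict, Iterable, Optional, Tuple

GestureKey = Tuple[str, Optional[str]]  # (code bits, direction)

# Hypothetical code-word pairs, for illustration only.
CODE_WORDS: Dict[GestureKey, str] = {
    ("11111", None): "hello",
    ("01100", "right"): "next",
    ("00000", None): "stop",
}


def gestures_to_text(keys: Iterable[GestureKey],
                     table: Dict[GestureKey, str] = CODE_WORDS) -> str:
    """Map each recognized (code, direction) pair to its word; skip unknown ones."""
    words = [table[key] for key in keys if key in table]
    return " ".join(words)


sentence = gestures_to_text([("11111", None), ("10101", "up"), ("00000", None)])
```

The unknown pair `("10101", "up")` is skipped, so `sentence` is `"hello stop"`.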

4. COMPARATIVE STUDY

Vision-based, glove-based and depth-based methods are all widely used in hand gesture recognition. However, vision-based recognition struggles to work well in poor conditions, such as poor lighting. Glove-based recognition has drawbacks of its own, although it has the advantages of requiring less input data, being fast and obtaining 3D information about hand or finger movement directly; it can also recognize a large number of hand gestures in real time [1]. The depth camera, a recently developed distance-measuring device, gives a depth image that reflects 3D features directly and is not affected by factors such as illumination, shadow and color. Even when one object partly covers another, the different distances obtained from the depth image allow the different parts of the covered object to be separated. At present, however, depth cameras are too expensive for widespread use.

The glove-based method, however, cannot work without its supporting equipment, and it is impractical for users to wear bulky gloves all the time in natural conditions. This obvious disadvantage severely limits its usefulness, so a new technology is needed. Depth-based recognition is highly robust, and because it offers real-time identification and high precision, it is a promising research direction. However, depth cameras based on technologies such as time of flight (TOF), structured light and 3D laser scanning are so expensive that their use has been limited.

TABLE 1: COMPARISON OF TECHNIQUES DISCUSSED

| Technique | Advantages | Drawbacks |
|---|---|---|
| Thresholding | Simple; easy; fast | Sensitive to intensity of light |
| K-means clustering | Independent of image intensity; forms clusters at run time | Takes background as Region of Interest (ROI) |
| Convex hull | Easy to implement; every point in the contour need not be accessed | Does not detect fingers that are half folded |
| Peak detection using slope | Detects more fingertips than convex hull; detects half-folded fingers | Needs to access each point in the contour of the ROI |

5. CONCLUSION

In this study of human hand gesture recognition, hand tracking using the centroid and observation of direction change were applied to the hand to detect the gesture. From this outline information, the coordinates of the centroid and the fingertips of the hand were obtained and the differences between them were calculated. The time taken for detection was minimal, almost real time. The limited number of gestures that we were able to detect proved to be the only stumbling block, and we hope to overcome it by further refining our algorithm in the future. Overall, the proposed method proved to be a considerable success when compared with standard methods in terms of accuracy. Further work can be carried out to show the efficiency of the system across a broad range of implementations.

6. REFERENCES

[1] M. Panwar (Centre for Development of Advanced Computing, Noida), ‘Hand Gesture Recognition based on Shape Parameters’

[2] Y. Fang et al., ‘A Real-Time Hand Gesture Recognition Method’, 2007

[3] T. Nguyen & H. Huynh, ‘Static Hand Gesture Recognition Using Artificial Neural Network’, Journal of Image and Graphics, Volume 1, No. 1, March 2013

[4] M. Quraishi et al., ‘A Novel Human Hand Finger Gesture Recognition Using Machine Learning’, 2012 2nd IEEE International Conference on Parallel, Distributed and Grid Computing

[5] S. Oniga & I. Orha, ‘Intelligent Human-Machine Interface Using Hand Gestures Recognition’

[6] L. Chen et al., ‘A Survey on Hand Gesture Recognition’, 2013 International Conference on Computer Sciences and Applications

[7] M. Murugeswari (PG Scholar, Communication Systems, Anna University, Tamil Nadu) & S. Veluchamy (Assistant Professor, Communication Systems, Anna University, Tamil Nadu), ‘Hand Gesture Recognition System for Real-Time Application’, 2014 IEEE International Conference on Advanced Communication Control and Computing Technologies (ICACCCT)

[8] M. Tao & L. Ma, ‘A Hand Gesture Recognition Model Based on Semi-supervised Learning’, 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics

[9] R. Agrawal & N. Gupta, ‘Real Time Hand Gesture Recognition for Human Computer Interaction’, 2016 IEEE 6th International Conference on Advanced Computing

[10] S. Bhowmick et al., ‘Hand Gesture Recognition of English Alphabets using Artificial Neural Network’, 2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS)

[11] S. Gawande & N. Chopde, ‘Neural Network based Hand Gesture Recognition’, International Journal of Emerging Research in Management & Technology, ISSN: 2278-9359 (Volume-2, Issue-3)

IJCATM : www.ijcaonline.org
