Article
Publisher preview available

Deep learning-based sign language recognition system for static signs

Authors: Ankita Wadhawan, Parteek Kumar

Abstract and Figures

Sign language is an effective means of communication for humans, and active research on its recognition is in progress in computer vision. The earliest work on Indian Sign Language (ISL) recognition considered only significantly differentiable hand signs and therefore often selected just a few signs from ISL for recognition. This paper deals with robust modeling of static signs in the context of sign language recognition using deep learning-based convolutional neural networks (CNN). In this research, a total of 35,000 sign images of 100 static signs were collected from different users. The efficiency of the proposed system is evaluated on approximately 50 CNN models. The results are also evaluated for different optimizers, and the proposed approach achieves the highest training accuracy of 99.72% and 99.90% on colored and grayscale images, respectively. The performance of the proposed system has also been evaluated in terms of precision, recall and F-score. The system also demonstrates its effectiveness over earlier works in which only a few hand signs are considered for recognition.
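As an illustration of these evaluation metrics, the sketch below (not the authors' code; the label arrays are placeholders) shows how per-class precision, recall and F-score can be computed for a multi-class static-sign classifier with scikit-learn.

```python
# Minimal sketch: precision, recall and F-score for a multi-class sign classifier.
# The y_true/y_pred arrays are placeholders standing in for the 100-class test labels
# and the argmax of a CNN's softmax outputs.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([0, 1, 2, 1, 0])   # ground-truth class indices (placeholder)
y_pred = np.array([0, 1, 2, 0, 0])   # predicted class indices (placeholder)

precision, recall, fscore, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"precision={precision:.4f} recall={recall:.4f} F-score={fscore:.4f}")
```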
S.I.: HYBRID ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES
Deep learning-based sign language recognition system for static signs
Ankita Wadhawan¹ · Parteek Kumar¹
Received: 3 December 2018 / Accepted: 18 December 2019 / Published online: 1 January 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020
Keywords: Sign language · Data acquisition · Convolutional neural network · Max-pooling · Softmax · Optimizer
1 Introduction
Sign language is a complete, visual language that comprises signs shaped by the movements of the hands in combination with facial expressions. It is a natural language used by people with little or no hearing for communication. A sign language can be used to communicate letters, words or sentences using different hand signs. This type of communication makes it easier for hearing-impaired people to express their views and also helps bridge the communication gap between hearing-impaired people and others.
Humans have used sign language to communicate since ancient times; hand gestures are as ancient as human civilization itself [1]. Hand signs are especially useful for expressing any word or feeling. Therefore, people around the world have constantly used hand signals to express themselves, despite the formulation of writing conventions.
In recent times, much research has gone into developing systems that can classify signs of different sign languages into their corresponding classes. Such systems have found applications in games, virtual reality environments, robot control and natural language communication. At present, Indian Sign Language systems are still at a developing stage, and no sign language recognition system is available for recognizing signs in real time. So, there is a need to develop a complete recognizer that identifies signs of Indian Sign Language.
The automatic recognition of human signs is a complex multidisciplinary problem that has not yet been completely solved. In past years, a number of approaches based on machine learning techniques have been used for sign language recognition. Since the advent of deep learning techniques, there have been attempts to recognize …
✉ Ankita Wadhawan, ankita.wadhawan@thapar.edu
Parteek Kumar, parteek.bhatia@thapar.edu
¹ Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
Neural Computing and Applications (2020) 32:7957–7968
https://doi.org/10.1007/s00521-019-04691-y
... The method for recognizing hand gestures involves several basic steps: data capture, localization of the hand, feature separation, and identification using the recovered features. Wadhawan and Kumar (2020) proposed a convolutional neural network (CNN)-based sign language recognition system, whose efficiency was evaluated using fifty different CNN models. ...
... The recent research in gesture recognition (pre-processing techniques and recognition rate): Wadhawan and Kumar (2020) — 100 static signs of different individuals, image processing + ORB feature extraction, application accuracy of 96.96%; Damaneh et al. (2023) — three datasets were used: the Massey test set of 758 images, ASL with 7,020 test images, and ASL Alphabet with 26,100 test images ...
Article
Full-text available
Real-time speech-to-text and text-to-speech technologies have significantly influenced the accessibility of communication for individuals who are deaf or mute. This research aims to assess the efficacy of these technologies in facilitating communication between deaf or mute individuals and those who are neither deaf nor mute. A mixed-method approach will incorporate qualitative and quantitative data collection and analysis techniques. The study will involve participants from deaf or mute and non-deaf or non-mute communities. The research will scrutinize the precision and efficiency of communication using these technologies and evaluate user experience and satisfaction. Furthermore, the study intends to pinpoint potential obstacles and limitations of these technologies and offer suggestions for enhancing their effectiveness in fostering inclusivity. The study proposes an active learning framework for sign language gesture recognition, termed Active Convolutional Neural Networks—Sign Language (ActiveCNN-SL). ActiveCNN-SL aims to minimize the labeled data required for training and augment the accuracy of sign language gesture recognition through iterative human feedback. This proposed framework holds the potential to enhance communication accessibility for deaf and mute individuals and encourage inclusivity across various environments. The proposed framework is trained using two primary datasets: (i) the Sign Language Gesture Images Dataset and (ii) the American Sign Language Letters (ASL)—v1. The framework employs ResNet50 and YOLOv8 to train on these datasets. It has demonstrated high performance in terms of precision and accuracy. The ResNet model achieved a remarkable accuracy rate of 99.98% during training, and it also exhibited a validation accuracy of 100%, surpassing the baseline CNN and RNN models. The YOLOv8 model outperformed previous methods on the ASL alphabet dataset, achieving an overall mean average accuracy for all classes of 97.8%.
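The following is a hedged sketch of the kind of uncertainty-sampling loop an active-learning framework such as the one described above could use; the Keras-style model, the unlabeled pool and the oracle callback are all assumptions for illustration, not the paper's actual ActiveCNN-SL implementation.

```python
# Illustrative active-learning loop (least-confidence sampling). All handles are
# placeholders: `model` is any Keras-style classifier, `oracle` simulates human labeling.
import numpy as np

def active_learning_loop(model, x_labeled, y_labeled, x_pool, oracle,
                         rounds=5, batch=32):
    for _ in range(rounds):
        model.fit(x_labeled, y_labeled, epochs=1, verbose=0)
        probs = model.predict(x_pool, verbose=0)
        uncertainty = 1.0 - probs.max(axis=1)       # low max-probability = uncertain
        query = np.argsort(uncertainty)[-batch:]    # most uncertain samples to label
        y_new = oracle(x_pool[query])               # human feedback on queried gestures
        x_labeled = np.concatenate([x_labeled, x_pool[query]])
        y_labeled = np.concatenate([y_labeled, y_new])
        x_pool = np.delete(x_pool, query, axis=0)
    return model
```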
... In the realm of empirical investigations using a curated dataset [19], a novel convolutional neural network (CNN) architecture was devised, incorporating two convolutional layers, max-pooling, dropout, and densely connected layers, totaling 4,073,540 parameters. This approach yields remarkable accuracy scores of 99.72% for color data and 99.9% for grayscale data. ...
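A minimal Keras sketch in the spirit of the architecture summarized above (two convolutional layers, max-pooling, dropout and dense layers) is shown below; the filter counts, kernel sizes and the 128x128 input resolution are assumptions for illustration, not the exact configuration reported in the cited work.

```python
# Illustrative two-conv-layer CNN for static sign classification (assumed hyperparameters).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_static_sign_cnn(num_classes=100, input_shape=(128, 128, 3)):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),   # one unit per static sign
    ])
    model.compile(optimizer="sgd",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```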
Article
Full-text available
Sign language, a vital medium for communication, particularly for individuals with speech and hearing impairments, is gaining recognition for its efficacy. To evaluate the efficacy of sign language alphabet recognition systems, three prominent image classification deep learning models (ResNeXt101, VGG19, and ViT) were chosen due to their established relevance and popularity in the field. The study aimed to identify the most effective model for accurate and efficient sign language classification using the NUS hand posture dataset-II. The study utilized Bayesian optimization for hyperparameter tuning, recognizing its superiority in systematically exploring the hyperparameter space compared to other optimization methods. This approach significantly enhanced the performance of the models by tailoring their configurations, leading to improved accuracy and robustness in sign language recognition across various experimental scenarios. While the findings consistently favored ResNeXt101 over VGG19, with a notable 2% higher F1 score, ViT also showcased comparable performance in certain experiments, achieving an impressive F1 score of 99%. Despite these successes, the study encountered limitations, including dataset bias and generalization challenges, which underscore the need for further research in this domain to address these complexities.
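A hedged sketch of Bayesian hyperparameter tuning with KerasTuner is given below to illustrate the kind of search described; the tiny model, the search ranges and the dataset handles are assumptions, not the study's actual configuration or models.

```python
# Illustrative Bayesian hyperparameter search (assumed ranges and toy model).
import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(hp):
    model = models.Sequential([
        layers.Conv2D(hp.Choice("filters", [16, 32, 64]), (3, 3),
                      activation="relu", input_shape=(64, 64, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(hp.Float("dropout", 0.1, 0.5, step=0.1)),
        layers.Dense(10, activation="softmax"),
    ])
    lr = hp.Float("learning_rate", 1e-5, 1e-2, sampling="log")
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = kt.BayesianOptimization(build_model, objective="val_accuracy", max_trials=10)
# tuner.search(x_train, y_train, validation_split=0.2, epochs=5)  # x_train/y_train: user data
# best_hp = tuner.get_best_hyperparameters(1)[0]
```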
... With the Hidden Markov Model (HMM) as methodology, the experimental outcomes demonstrate the effectiveness of our proposed framework, achieving an accuracy of 83.77% in recognizing occluded sign gestures. In a 2020 study by Wadhawan et al. [10], deep learning-based Convolutional Neural Networks (CNN) were utilized to create a robust model for recognizing static signs in sign language. The researchers collected 35,000 hand images from multiple users for 100 words in sign language. ...
Article
Sign language recognition is an assistive technology that has garnered significant attention from researchers, particularly with respect to its potential benefits for individuals with hearing impairments. This paper proposes an effective technique for sign language recognition based on the Contourlet Transform (CT) and deep learning. The CT is employed in the pre-processing stage to reduce complexity and processing time, while deep learning is utilized to extract and classify sign language features. The proposed method was evaluated using two sign language databases: a direct feed database and an American Sign Language database. The experimental analysis demonstrated that the proposed method improves processing time by more than 70% while maintaining high accuracy.
... The system achieved an accuracy of 99.90% on 35,000 samples of 100 classes. In the performance comparison, SGD demonstrated superior results over Adam and RMSProp optimizers [18]. A paper on CSLR showcased a video-based identification technique for the CNN system. ...
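As a rough illustration of such an optimizer comparison, the sketch below trains the same Keras model separately with SGD, Adam and RMSProp and records the best validation accuracy; the model factory and dataset handles are placeholders, not the cited work's setup.

```python
# Illustrative optimizer comparison on identical models (placeholder data handles).
import tensorflow as tf

def compare_optimizers(make_model, x_train, y_train, x_val, y_val, epochs=10):
    optimizers = {
        "sgd": tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
        "adam": tf.keras.optimizers.Adam(learning_rate=1e-3),
        "rmsprop": tf.keras.optimizers.RMSprop(learning_rate=1e-3),
    }
    results = {}
    for name, opt in optimizers.items():
        model = make_model()                       # fresh weights for a fair comparison
        model.compile(optimizer=opt,
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        hist = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                         epochs=epochs, verbose=0)
        results[name] = max(hist.history["val_accuracy"])
    return results
```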
Article
Full-text available
Sign language is the primary form of communication for individuals with auditory impairment. In Bangladesh, Bangla Sign Language (BdSL) is widely used among the hearing-impaired population. However, due to the general public’s limited awareness of sign language, communicating with them using BdSL can be challenging. Consequently, there is a growing demand for an automated system capable of efficiently understanding BdSL. For automation, various Deep Learning (DL) architectures can be employed to translate Bangla Sign Language into readable digital text. The automation system incorporates live cameras continuously capturing images, which are then processed by a DL model. However, factors such as lighting, background noise, skin tone, hand orientations, and other aspects of the image circumstances may introduce uncertainty variables. To address this, we propose a procedure that reduces these uncertainties by considering three different modalities: spatial information, skeleton awareness, and edge awareness. We introduce three image pre-processing techniques alongside three CNN models. The CNN models are combined using nine distinct ensemble meta-learning algorithms, with five of them being modifications of averaging and voting techniques. In the result analysis, our individual CNN models achieved higher training accuracy at 99.77%, 98.11%, and 99.30%, respectively, than most of the other state-of-the-art image classification architectures, except for ResNet50, which achieved 99.87%. Meanwhile, the ensemble model achieved the highest accuracy of 95.13% on the testing set, outperforming all individual CNN models. This analysis demonstrates that considering multiple modalities can significantly improve the system’s overall performance in hand pattern recognition.
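The averaging and hard-voting ensembles described above can be sketched as follows; the list of trained Keras-style CNNs and the test tensor are placeholders, and this is only an illustration of the general technique, not the paper's nine meta-learning algorithms.

```python
# Illustrative soft (averaging) and hard (majority-vote) ensembling of CNN outputs.
import numpy as np

def ensemble_predict(models, x_test):
    # probs: (n_models, n_samples, n_classes) stacked softmax outputs
    probs = np.stack([m.predict(x_test, verbose=0) for m in models])
    averaged = probs.mean(axis=0).argmax(axis=1)           # soft voting / averaging
    votes = probs.argmax(axis=2)                           # each model's hard prediction
    hard_voted = np.apply_along_axis(
        lambda col: np.bincount(col, minlength=probs.shape[2]).argmax(), 0, votes)
    return averaged, hard_voted
```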
... k-Nearest Neighbours with Dynamic Time Warping (a nonparametric technique) and Convolutional Neural Networks were used as classifiers. For static signs, a deep learning-based sign language recognition system was proposed by Wadhawan and Kumar [10]. The paper discusses the use of deep learning-based convolutional neural networks (CNN) for robust modeling of static signs in the context of sign language recognition. ...
Article
Sign language is mainly used by deaf (hard-of-hearing) and mute people to exchange information within their own community and with other people. It is a language in which people use hand gestures to communicate, as they cannot speak or hear. Sign Language Recognition (SLR) deals with recognizing hand gestures, starting from gesture acquisition and continuing until text or speech is generated for the corresponding hand gestures. Hand gestures for sign language can be classified as static and dynamic. Deep learning computer vision is used to recognize the hand gestures by building deep neural network architectures (convolutional neural network architectures) in which the model learns to recognize hand gesture images over epochs.
Article
Full-text available
Tailored support is crucial for deaf and hearing-impaired children to overcome learning difficulties, particularly during primary education. The absence of listening profoundly hinders the progression of the learning journey, as it plays a pivotal role in language acquisition. Employing assistive technology is one approach to address this issue in the field of education. This paper introduces RSA, an interactive system designed for the recognition and simulation of letters in Arabic Sign Language. Our system’s objective is to enrich language learning in an engaging manner. RSA utilizes artificial intelligence to identify and recognize the gestures corresponding to Arabic letters in real-time. Additionally, the system has the capability to replicate these letters through the utilization of a robotic arm. Thanks to its simplicity, the system holds promise in enhancing the acquisition of Arabic sign language skills for deaf children.
Article
This article presents an innovative approach for the task of isolated sign language recognition (SLR); this approach centers on the integration of pose data with motion history images (MHIs) derived from these data. Our research combines spatial information obtained from body, hand, and face poses with the comprehensive details provided by three-channel MHI data concerning the temporal dynamics of the sign. Particularly, our developed finger pose-based MHI (FP-MHI) feature significantly enhances the recognition success, capturing the nuances of finger movements and gestures, unlike existing approaches in SLR. This feature improves the accuracy and reliability of SLR systems by more accurately capturing the fine details and richness of sign language. Additionally, we enhance the overall model accuracy by predicting missing pose data through linear interpolation. Our study, based on the randomized leaky rectified linear unit (RReLU) enhanced ResNet-18 model, successfully handles the interaction between manual and non-manual features through the fusion of extracted features and classification with a support vector machine (SVM). This innovative integration demonstrates competitive and superior results compared to current methodologies in the field of SLR across various datasets, including BosphorusSign22k-general, BosphorusSign22k, LSA64, and GSL, in our experiments.
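As a rough sketch of the motion history image idea used above, the following accumulates an MHI from frame differences; the single-channel formulation, decay constant and threshold are assumptions for illustration and do not reproduce the article's three-channel MHI or FP-MHI features.

```python
# Illustrative motion history image (MHI): recently moving pixels are bright,
# older motion decays toward zero. `frames_gray` is a list of uint8 grayscale frames.
import cv2
import numpy as np

def motion_history(frames_gray, tau=30, diff_thresh=25):
    mhi = np.zeros(frames_gray[0].shape, np.float32)
    prev = frames_gray[0]
    for frame in frames_gray[1:]:
        motion = cv2.absdiff(frame, prev) > diff_thresh     # where the hand/body moved
        mhi = np.where(motion, float(tau), np.maximum(mhi - 1.0, 0.0))
        prev = frame
    return (255.0 * mhi / tau).astype(np.uint8)
```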
Article
Full-text available
In the tapestry of rich human communication, sign language gleams like one of its basic threads, giving voice to hundreds of deaf and hard-of-hearing individuals in the region, yet the technology for recognizing and translating sign language has fallen far behind what these communities need. Therefore, the present investigation compares the performance of three top-performing deep learning algorithms in recognizing signs drawn from a database of Kurdish Sign Language. The models are put to a rigorous test on a variety of signs. All three models perform well or even excellently; MobileNetV2 stands out as a strong candidate that strikes a fine balance between high accuracy, low space complexity, and acceptable time complexity. We conclude by looking at some exciting opportunities for future research, including integrating our models into hardware devices and expanding our study to a larger variety of sign languages. Like any good journey, this one raises as many questions as it answers, leaving us inspired by the many possibilities that remain to be explored to enhance communication for all.
Article
Full-text available
This manuscript introduces the end-to-end embedding of a CNN into a HMM, while interpreting the outputs of the CNN in a Bayesian framework. The hybrid CNN-HMM combines the strong discriminative abilities of CNNs with the sequence modelling capabilities of HMMs. Most current approaches in the field of gesture and sign language recognition disregard the necessity of dealing with sequence data both for training and evaluation. With our presented end-to-end embedding we are able to improve over the state-of-the-art on three challenging benchmark continuous sign language recognition tasks by between 15 and 38% relative reduction in word error rate and up to 20% absolute. We analyse the effect of the CNN structure, network pretraining and number of hidden states. We compare the hybrid modelling to a tandem approach and evaluate the gain of model combination.
Article
Full-text available
This paper introduces a novel method to bring sign language recognition closer to real-time application on mobile platforms. Selfie-captured sign language video is processed under the computing-power constraints of a smartphone. Pre-filtering, segmentation and feature extraction on video frames create a sign language feature space. Minimum Distance and Artificial Neural Network classifiers on the sign feature space are trained and tested iteratively. The Sobel edge operator is enhanced with morphology and adaptive thresholding, giving a near-perfect segmentation of hand and head portions that compensates for the small vibrations of the selfie stick. Word matching score (WMS) gives the performance of the proposed method, with an average WMS of around 85.58% for MDC and 90% for ANN, with a small variation of 0.3 s in classification times. Neural network classifiers with fast training algorithms will certainly help bring this novel selfie sign language recognizer into app stores.
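A rough OpenCV sketch of combining the Sobel operator with adaptive thresholding and morphology to segment hand and head regions is shown below; the kernel sizes, block size and constants are illustrative assumptions, not the paper's implementation.

```python
# Illustrative Sobel + adaptive threshold + morphology segmentation of a video frame.
import cv2
import numpy as np

def segment_frame(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)                      # pre-filtering
    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    edges = cv2.convertScaleAbs(cv2.magnitude(sobel_x, sobel_y))  # gradient magnitude, uint8
    mask = cv2.adaptiveThreshold(edges, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)        # fill small gaps
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)         # remove speckle noise
    return mask
```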
Article
Currently, one of the most challenging and interesting human action recognition (HAR) problems is 3D sign language recognition. The sign in a 3D video can be characterized by 3D joint location information in 3D space over time. Therefore, the objective of this study is to construct color-coded topographical descriptors from joint distances and angles computed from joint locations. We call these two color-coded images the joint distance topographical descriptor (JDTD) and joint angle topographical descriptor (JATD), respectively. For classification, we propose a two-stream convolutional neural network (2CNN) architecture, which takes as input the color-coded images JDTD and JATD. The two independent streams are merged, concatenating features from both streams in the dense layer. For a given query 3D sign (or action), a list of class scores is obtained as a text label corresponding to the sign. The results showed improved classifier performance over the predecessors due to the mixing of distance and angular features for predicting closely related spatio-temporal discriminative features. To benchmark the performance of our proposed model, we compared our results with state-of-the-art baseline action recognition frameworks using our own 3D sign language dataset and two publicly available 3D mocap action datasets, namely HDM05 and CMU.
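The following is a hedged sketch of how a JDTD-style image could be formed from pairwise joint distances over time; the joint count, the upper-triangle layout and the 0..255 normalization are assumptions for illustration, not the authors' exact color-coding scheme.

```python
# Illustrative joint-distance descriptor: time along one axis, joint pairs along the other.
import numpy as np

def joint_distance_descriptor(joints):
    """joints: (T, J, 3) array of 3D joint positions over T frames."""
    T, J, _ = joints.shape
    iu = np.triu_indices(J, k=1)                               # unique joint pairs
    rows = []
    for t in range(T):
        diff = joints[t, :, None, :] - joints[t, None, :, :]  # (J, J, 3) pairwise offsets
        dist = np.linalg.norm(diff, axis=-1)                  # (J, J) Euclidean distances
        rows.append(dist[iu])
    desc = np.stack(rows)                                      # (T, J*(J-1)/2)
    desc = 255.0 * (desc - desc.min()) / (np.ptp(desc) + 1e-8) # scale like an 8-bit image
    return desc.astype(np.uint8)
```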
Article
In the realm of multimodal communication, sign language is, and continues to be, one of the most understudied areas. In line with recent advances in the field of deep learning, neural networks have far-reaching implications and applications for sign language interpretation. In this paper, we present a method for using deep convolutional networks to classify images of both the letters and digits in American Sign Language.
Conference Paper
Sign language is considered the main language for deaf and mute people, so a translator is needed when a hearing person wants to talk with a deaf or mute person. In this paper, we present a framework for recognizing Bangla Sign Language (BSL) using a Support Vector Machine. The Bangla hand sign alphabets for both vowels and consonants have been used to train and test the recognition system. Bangla sign alphabets are recognized by analyzing their shape and comparing the features that differentiate each sign. In the proposed system, hand signs are first converted from RGB to the HSV color space. Gabor filters are then used to acquire the desired hand sign features. Since the feature vector obtained using Gabor filters is high-dimensional, a nonlinear dimensionality reduction technique, Kernel PCA, is used to reduce the dimensionality. Lastly, a Support Vector Machine (SVM) is employed for classification of the candidate features. The experimental results show that our proposed method outperforms existing work on Bengali hand sign recognition.
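A compact sketch of the described pipeline (RGB to HSV conversion, Gabor filtering, Kernel PCA and an SVM) is given below; the filter-bank parameters, image size and the use of the HSV value channel are assumptions for illustration, not the paper's exact settings.

```python
# Illustrative HSV + Gabor + Kernel PCA + SVM pipeline (assumed parameters).
import cv2
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

def gabor_features(image_bgr, size=(64, 64)):
    hsv = cv2.cvtColor(cv2.resize(image_bgr, size), cv2.COLOR_BGR2HSV)
    value = hsv[:, :, 2].astype(np.float32)                    # brightness channel
    feats = []
    for theta in np.arange(0, np.pi, np.pi / 4):               # 4 filter orientations
        kernel = cv2.getGaborKernel((15, 15), 4.0, theta, 10.0, 0.5, 0)
        feats.append(cv2.filter2D(value, cv2.CV_32F, kernel).ravel())
    return np.concatenate(feats)

# X: stacked gabor_features() vectors, y: sign labels (user-supplied placeholders)
# clf = make_pipeline(KernelPCA(n_components=100, kernel="rbf"), SVC(kernel="rbf"))
# clf.fit(X, y)
```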