Article
Publisher preview available

Deep learning-based sign language recognition system for static signs

Authors: Ankita Wadhawan, Parteek Kumar

Abstract and Figures

Sign language is an effective means of communication for humans, and active research on its recognition is in progress in computer vision. The earliest work on Indian Sign Language (ISL) recognition considered only significantly differentiable hand signs and therefore often selected just a few signs from ISL for recognition. This paper deals with robust modeling of static signs in the context of sign language recognition using deep learning-based convolutional neural networks (CNN). In this research, a total of 35,000 sign images of 100 static signs were collected from different users. The efficiency of the proposed system is evaluated on approximately 50 CNN models. The results are also evaluated for different optimizers, and the proposed approach achieves the highest training accuracy of 99.72% and 99.90% on colored and grayscale images, respectively. The performance of the proposed system has also been evaluated in terms of precision, recall and F-score. The system also demonstrates its effectiveness over earlier works in which only a few hand signs are considered for recognition.
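As an illustration of these evaluation metrics, the sketch below (not the authors' code; the label arrays are placeholders) shows how per-class precision, recall and F-score can be computed for a multi-class static-sign classifier with scikit-learn.

```python
# Minimal sketch: precision, recall and F-score for a multi-class sign classifier.
# The y_true/y_pred arrays are placeholders standing in for the 100-class test labels
# and the argmax of a CNN's softmax outputs.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([0, 1, 2, 1, 0])   # ground-truth class indices (placeholder)
y_pred = np.array([0, 1, 2, 0, 0])   # predicted class indices (placeholder)

precision, recall, fscore, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"precision={precision:.4f} recall={recall:.4f} F-score={fscore:.4f}")
```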
S.I.: HYBRID ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES
Deep learning-based sign language recognition system for static signs
Ankita Wadhawan¹ · Parteek Kumar¹
Received: 3 December 2018 / Accepted: 18 December 2019 / Published online: 1 January 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020
Keywords: Sign language · Data acquisition · Convolutional neural network · Max-pooling · Softmax · Optimizer
1 Introduction
Sign language is a complete, visual language that comprises signs shaped by the movements of the hands in combination with facial expressions. It is a natural language used by people with little or no hearing for communication. A sign language can be used to communicate letters, words or sentences using different hand signs. This type of communication makes it easier for hearing-impaired people to express their views and also helps bridge the communication gap between hearing-impaired people and others.
Humans have used sign language to communicate since ancient times; hand gestures are as ancient as human civilization itself [1]. Hand signs are especially useful for expressing any word or feeling. Therefore, people around the world have constantly used hand signals to express themselves, despite the formulation of writing conventions.
In recent times, much research has gone into developing systems that can classify signs of different sign languages into their corresponding classes. Such systems have found applications in games, virtual reality environments, robot control and natural language communication. At present, Indian Sign Language systems are still at a developing stage, and no sign language recognition system is available for recognizing signs in real time. So, there is a need to develop a complete recognizer that identifies signs of Indian Sign Language.
The automatic recognition of human signs is a complex multidisciplinary problem that has not yet been completely solved. In past years, a number of approaches based on machine learning techniques have been used for sign language recognition. Since the advent of deep learning techniques, there have been attempts to recognize …
✉ Ankita Wadhawan, ankita.wadhawan@thapar.edu
Parteek Kumar, parteek.bhatia@thapar.edu
¹ Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
Neural Computing and Applications (2020) 32:7957–7968
https://doi.org/10.1007/s00521-019-04691-y
... The method for recognizing hand gestures involves several basic steps: data capture, localization of the hand, feature separation, and identification using the recovered features. Wadhawan and Kumar (2020) proposed a convolutional neural network (CNN)-based sign language recognition system, whose efficiency was evaluated using fifty different CNN models. ...
... The recent research in gesture recognition (pre-processing techniques and recognition rate): Wadhawan and Kumar (2020) — 100 static signs of different individuals, image processing + ORB feature extraction, application accuracy of 96.96%; Damaneh et al. (2023) — three datasets were used: the Massey test set of 758 images, ASL with 7,020 test images, and ASL Alphabet with 26,100 test images ...
Article
Full-text available
Real-time speech-to-text and text-to-speech technologies have significantly influenced the accessibility of communication for individuals who are deaf or mute. This research aims to assess the efficacy of these technologies in facilitating communication between deaf or mute individuals and those who are neither deaf nor mute. A mixed-method approach will incorporate qualitative and quantitative data collection and analysis techniques. The study will involve participants from deaf or mute and non-deaf or non-mute communities. The research will scrutinize the precision and efficiency of communication using these technologies and evaluate user experience and satisfaction. Furthermore, the study intends to pinpoint potential obstacles and limitations of these technologies and offer suggestions for enhancing their effectiveness in fostering inclusivity. The study proposes an active learning framework for sign language gesture recognition, termed Active Convolutional Neural Networks—Sign Language (ActiveCNN-SL). ActiveCNN-SL aims to minimize the labeled data required for training and augment the accuracy of sign language gesture recognition through iterative human feedback. This proposed framework holds the potential to enhance communication accessibility for deaf and mute individuals and encourage inclusivity across various environments. The proposed framework is trained using two primary datasets: (i) the Sign Language Gesture Images Dataset and (ii) the American Sign Language Letters (ASL)—v1. The framework employs ResNet50 and YOLOv8 to train on these datasets. It has demonstrated high performance in terms of precision and accuracy. The ResNet model achieved a remarkable accuracy rate of 99.98% during training, and it also exhibited a validation accuracy of 100%, surpassing the baseline CNN and RNN models. The YOLOv8 model outperformed previous methods on the ASL alphabet dataset, achieving an overall mean average accuracy for all classes of 97.8%.
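The following is a hedged sketch of the kind of uncertainty-sampling loop an active-learning framework such as the one described above could use; the Keras-style model, the unlabeled pool and the oracle callback are all assumptions for illustration, not the paper's actual ActiveCNN-SL implementation.

```python
# Illustrative active-learning loop (least-confidence sampling). All handles are
# placeholders: `model` is any Keras-style classifier, `oracle` simulates human labeling.
import numpy as np

def active_learning_loop(model, x_labeled, y_labeled, x_pool, oracle,
                         rounds=5, batch=32):
    for _ in range(rounds):
        model.fit(x_labeled, y_labeled, epochs=1, verbose=0)
        probs = model.predict(x_pool, verbose=0)
        uncertainty = 1.0 - probs.max(axis=1)       # low max-probability = uncertain
        query = np.argsort(uncertainty)[-batch:]    # most uncertain samples to label
        y_new = oracle(x_pool[query])               # human feedback on queried gestures
        x_labeled = np.concatenate([x_labeled, x_pool[query]])
        y_labeled = np.concatenate([y_labeled, y_new])
        x_pool = np.delete(x_pool, query, axis=0)
    return model
```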
... In the realm of empirical investigations using a curated dataset [19], a novel convolutional neural network (CNN) architecture was devised, incorporating two convolutional layers, max-pooling, dropout, and densely connected layers, totaling 4,073,540 parameters. This approach yields remarkable accuracy scores of 99.72% for color data and 99.9% for grayscale data. ...
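A minimal Keras sketch in the spirit of the architecture summarized above (two convolutional layers, max-pooling, dropout and dense layers) is shown below; the filter counts, kernel sizes and the 128x128 input resolution are assumptions for illustration, not the exact configuration reported in the cited work.

```python
# Illustrative two-conv-layer CNN for static sign classification (assumed hyperparameters).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_static_sign_cnn(num_classes=100, input_shape=(128, 128, 3)):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),   # one unit per static sign
    ])
    model.compile(optimizer="sgd",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```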
Article
Full-text available
Sign language, a vital medium for communication, particularly for individuals with speech and hearing impairments, is gaining recognition for its efficacy. To evaluate the efficacy of sign language alphabet recognition systems, three prominent image classification deep learning models (ResNeXt101, VGG19, and ViT) were chosen due to their established relevance and popularity in the field. The study aimed to identify the most effective model for accurate and efficient sign language classification using the NUS hand posture dataset-II. The study utilized Bayesian optimization for hyperparameter tuning, recognizing its superiority in systematically exploring the hyperparameter space compared to other optimization methods. This approach significantly enhanced the performance of the models by tailoring their configurations, leading to improved accuracy and robustness in sign language recognition across various experimental scenarios. While the findings consistently favored ResNeXt101 over VGG19, with a notable 2% higher F1 score, ViT also showcased comparable performance in certain experiments, achieving an impressive F1 score of 99%. Despite these successes, the study encountered limitations, including dataset bias and generalization challenges, which underscore the need for further research in this domain to address these complexities.
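A hedged sketch of Bayesian hyperparameter tuning with KerasTuner is given below to illustrate the kind of search described; the tiny model, the search ranges and the dataset handles are assumptions, not the study's actual configuration or models.

```python
# Illustrative Bayesian hyperparameter search (assumed ranges and toy model).
import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(hp):
    model = models.Sequential([
        layers.Conv2D(hp.Choice("filters", [16, 32, 64]), (3, 3),
                      activation="relu", input_shape=(64, 64, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(hp.Float("dropout", 0.1, 0.5, step=0.1)),
        layers.Dense(10, activation="softmax"),
    ])
    lr = hp.Float("learning_rate", 1e-5, 1e-2, sampling="log")
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = kt.BayesianOptimization(build_model, objective="val_accuracy", max_trials=10)
# tuner.search(x_train, y_train, validation_split=0.2, epochs=5)  # x_train/y_train: user data
# best_hp = tuner.get_best_hyperparameters(1)[0]
```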
... With the Hidden Markov Model (HMM) as methodology, the experimental outcomes demonstrate the effectiveness of our proposed framework, achieving an accuracy of 83.77% in recognizing occluded sign gestures. In a 2020 study by Wadhawan et al. [10], deep learning-based Convolutional Neural Networks (CNN) were utilized to create a robust model for recognizing static signs in sign language. The researchers collected 35,000 hand images from multiple users for 100 words in sign language. ...
Article
Sign language recognition is an assistive technology that has garnered significant attention from researchers, particularly with respect to its potential benefits for individuals with hearing impairments. This paper proposes an effective technique for sign language recognition based on the Contourlet Transform (CT) and deep learning. The CT is employed in the pre-processing stage to reduce complexity and processing time, while deep learning is utilized to extract and classify sign language features. The proposed method was evaluated using two sign language databases: a direct feed database and an American Sign Language database. The experimental analysis demonstrated that the proposed method improves processing time by more than 70% while maintaining high accuracy.
... The system achieved an accuracy of 99.90% on 35,000 samples of 100 classes. In the performance comparison, SGD demonstrated superior results over Adam and RMSProp optimizers [18]. A paper on CSLR showcased a video-based identification technique for the CNN system. ...
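As a rough illustration of such an optimizer comparison, the sketch below trains the same Keras model separately with SGD, Adam and RMSProp and records the best validation accuracy; the model factory and dataset handles are placeholders, not the cited work's setup.

```python
# Illustrative optimizer comparison on identical models (placeholder data handles).
import tensorflow as tf

def compare_optimizers(make_model, x_train, y_train, x_val, y_val, epochs=10):
    optimizers = {
        "sgd": tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
        "adam": tf.keras.optimizers.Adam(learning_rate=1e-3),
        "rmsprop": tf.keras.optimizers.RMSprop(learning_rate=1e-3),
    }
    results = {}
    for name, opt in optimizers.items():
        model = make_model()                       # fresh weights for a fair comparison
        model.compile(optimizer=opt,
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        hist = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                         epochs=epochs, verbose=0)
        results[name] = max(hist.history["val_accuracy"])
    return results
```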
Article
Full-text available
Sign language is the primary form of communication for individuals with auditory impairment. In Bangladesh, Bangla Sign Language (BdSL) is widely used among the hearing-impaired population. However, due to the general public’s limited awareness of sign language, communicating with them using BdSL can be challenging. Consequently, there is a growing demand for an automated system capable of efficiently understanding BdSL. For automation, various Deep Learning (DL) architectures can be employed to translate Bangla Sign Language into readable digital text. The automation system incorporates live cameras continuously capturing images, which are then processed by a DL model. However, factors such as lighting, background noise, skin tone, hand orientations, and other aspects of the image circumstances may introduce uncertainty variables. To address this, we propose a procedure that reduces these uncertainties by considering three different modalities: spatial information, skeleton awareness, and edge awareness. We introduce three image pre-processing techniques alongside three CNN models. The CNN models are combined using nine distinct ensemble meta-learning algorithms, with five of them being modifications of averaging and voting techniques. In the result analysis, our individual CNN models achieved higher training accuracy at 99.77%, 98.11%, and 99.30%, respectively, than most of the other state-of-the-art image classification architectures, except for ResNet50, which achieved 99.87%. Meanwhile, the ensemble model achieved the highest accuracy of 95.13% on the testing set, outperforming all individual CNN models. This analysis demonstrates that considering multiple modalities can significantly improve the system’s overall performance in hand pattern recognition.
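The averaging and hard-voting ensembles described above can be sketched as follows; the list of trained Keras-style CNNs and the test tensor are placeholders, and this is only an illustration of the general technique, not the paper's nine meta-learning algorithms.

```python
# Illustrative soft (averaging) and hard (majority-vote) ensembling of CNN outputs.
import numpy as np

def ensemble_predict(models, x_test):
    # probs: (n_models, n_samples, n_classes) stacked softmax outputs
    probs = np.stack([m.predict(x_test, verbose=0) for m in models])
    averaged = probs.mean(axis=0).argmax(axis=1)           # soft voting / averaging
    votes = probs.argmax(axis=2)                           # each model's hard prediction
    hard_voted = np.apply_along_axis(
        lambda col: np.bincount(col, minlength=probs.shape[2]).argmax(), 0, votes)
    return averaged, hard_voted
```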
... k-Nearest Neighbours with Dynamic Time Warping (a nonparametric technique) and Convolutional Neural Networks were used as classifiers. For static signs, a deep learning-based sign language recognition system was proposed by Wadhawan and Kumar [10]. The paper discusses the use of deep learning-based convolutional neural networks (CNN) for robust modeling of static signs in the context of sign language recognition. ...
Article
Sign language is mainly used by deaf (hard-of-hearing) and mute people to exchange information within their own community and with other people. It is a language in which people use hand gestures to communicate, as they cannot speak or hear. Sign Language Recognition (SLR) deals with recognizing hand gestures, starting from gesture acquisition and continuing until text or speech is generated for the corresponding hand gestures. Hand gestures for sign language can be classified as static and dynamic. Deep learning computer vision is used to recognize the hand gestures by building deep neural network architectures (convolutional neural network architectures) in which the model learns to recognize hand gesture images over epochs.
Article
Full-text available
Tailored support is crucial for deaf and hearing-impaired children to overcome learning difficulties, particularly during primary education. The absence of listening profoundly hinders the progression of the learning journey, as it plays a pivotal role in language acquisition. Employing assistive technology is one approach to address this issue in the field of education. This paper introduces RSA, an interactive system designed for the recognition and simulation of letters in Arabic Sign Language. Our system’s objective is to enrich language learning in an engaging manner. RSA utilizes artificial intelligence to identify and recognize the gestures corresponding to Arabic letters in real-time. Additionally, the system has the capability to replicate these letters through the utilization of a robotic arm. Thanks to its simplicity, the system holds promise in enhancing the acquisition of Arabic sign language skills for deaf children.
Article
This article presents an innovative approach for the task of isolated sign language recognition (SLR); this approach centers on the integration of pose data with motion history images (MHIs) derived from these data. Our research combines spatial information obtained from body, hand, and face poses with the comprehensive details provided by three-channel MHI data concerning the temporal dynamics of the sign. Particularly, our developed finger pose-based MHI (FP-MHI) feature significantly enhances the recognition success, capturing the nuances of finger movements and gestures, unlike existing approaches in SLR. This feature improves the accuracy and reliability of SLR systems by more accurately capturing the fine details and richness of sign language. Additionally, we enhance the overall model accuracy by predicting missing pose data through linear interpolation. Our study, based on the randomized leaky rectified linear unit (RReLU) enhanced ResNet-18 model, successfully handles the interaction between manual and non-manual features through the fusion of extracted features and classification with a support vector machine (SVM). This innovative integration demonstrates competitive and superior results compared to current methodologies in the field of SLR across various datasets, including BosphorusSign22k-general, BosphorusSign22k, LSA64, and GSL, in our experiments.
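As a rough sketch of the motion history image idea used above, the following accumulates an MHI from frame differences; the single-channel formulation, decay constant and threshold are assumptions for illustration and do not reproduce the article's three-channel MHI or FP-MHI features.

```python
# Illustrative motion history image (MHI): recently moving pixels are bright,
# older motion decays toward zero. `frames_gray` is a list of uint8 grayscale frames.
import cv2
import numpy as np

def motion_history(frames_gray, tau=30, diff_thresh=25):
    mhi = np.zeros(frames_gray[0].shape, np.float32)
    prev = frames_gray[0]
    for frame in frames_gray[1:]:
        motion = cv2.absdiff(frame, prev) > diff_thresh     # where the hand/body moved
        mhi = np.where(motion, float(tau), np.maximum(mhi - 1.0, 0.0))
        prev = frame
    return (255.0 * mhi / tau).astype(np.uint8)
```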
Article
Full-text available
In the tapestry of rich human communication, sign language gleams like one of its basic threads, giving voice to hundreds of deaf and hard-of-hearing individuals in the region, yet the technology for recognizing and translating sign language has fallen far behind what these communities need. Therefore, the present investigation compares the performance of three top-performing deep learning algorithms in recognizing signs drawn from a database of Kurdish Sign Language. The models are put to a rigorous test on a variety of signs. All three models perform well or even excellently; MobileNetV2 stands out as a strong candidate that strikes a fine balance between high accuracy, low space complexity, and acceptable time complexity. We conclude by looking at some exciting opportunities for future research, including integrating our models into hardware devices and expanding our study to a larger variety of sign languages. Like any good journey, this one raises as many questions as it answers, leaving us inspired by the many possibilities that remain to be explored to enhance communication for all.
Article
Full-text available
This manuscript introduces the end-to-end embedding of a CNN into a HMM, while interpreting the outputs of the CNN in a Bayesian framework. The hybrid CNN-HMM combines the strong discriminative abilities of CNNs with the sequence modelling capabilities of HMMs. Most current approaches in the field of gesture and sign language recognition disregard the necessity of dealing with sequence data both for training and evaluation. With our presented end-to-end embedding we are able to improve over the state-of-the-art on three challenging benchmark continuous sign language recognition tasks by between 15 and 38% relative reduction in word error rate and up to 20% absolute. We analyse the effect of the CNN structure, network pretraining and number of hidden states. We compare the hybrid modelling to a tandem approach and evaluate the gain of model combination.
Article
Full-text available
This paper introduces a novel method to bring sign language recognition closer to real-time application on mobile platforms. Selfie-captured sign language video is processed under the computing-power constraints of a smartphone. Pre-filtering, segmentation and feature extraction on video frames create a sign language feature space. Minimum Distance and Artificial Neural Network classifiers on the sign feature space are trained and tested iteratively. The Sobel edge operator is enhanced with morphology and adaptive thresholding, giving a near-perfect segmentation of hand and head portions that compensates for the small vibrations of the selfie stick. Word matching score (WMS) gives the performance of the proposed method, with an average WMS of around 85.58% for MDC and 90% for ANN, with a small variation of 0.3 s in classification times. Neural network classifiers with fast training algorithms will certainly help bring this novel selfie sign language recognizer into app stores.
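A rough OpenCV sketch of combining the Sobel operator with adaptive thresholding and morphology to segment hand and head regions is shown below; the kernel sizes, block size and constants are illustrative assumptions, not the paper's implementation.

```python
# Illustrative Sobel + adaptive threshold + morphology segmentation of a video frame.
import cv2
import numpy as np

def segment_frame(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)                      # pre-filtering
    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    edges = cv2.convertScaleAbs(cv2.magnitude(sobel_x, sobel_y))  # gradient magnitude, uint8
    mask = cv2.adaptiveThreshold(edges, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)        # fill small gaps
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)         # remove speckle noise
    return mask
```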
Article
Currently, one of the most challenging and interesting human action recognition (HAR) problems is 3D sign language recognition. The sign in a 3D video can be characterized by 3D joint location information in 3D space over time. Therefore, the objective of this study is to construct color-coded topographical descriptors from joint distances and angles computed from joint locations. We call these two color-coded images the joint distance topographical descriptor (JDTD) and joint angle topographical descriptor (JATD), respectively. For classification, we propose a two-stream convolutional neural network (2CNN) architecture, which takes as input the color-coded images JDTD and JATD. The two independent streams are merged, concatenating features from both streams in the dense layer. For a given query 3D sign (or action), a list of class scores is obtained as a text label corresponding to the sign. The results showed improved classifier performance over the predecessors due to the mixing of distance and angular features for predicting closely related spatio-temporal discriminative features. To benchmark the performance of our proposed model, we compared our results with state-of-the-art baseline action recognition frameworks using our own 3D sign language dataset and two publicly available 3D mocap action datasets, namely HDM05 and CMU.
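The following is a hedged sketch of how a JDTD-style image could be formed from pairwise joint distances over time; the joint count, the upper-triangle layout and the 0..255 normalization are assumptions for illustration, not the authors' exact color-coding scheme.

```python
# Illustrative joint-distance descriptor: time along one axis, joint pairs along the other.
import numpy as np

def joint_distance_descriptor(joints):
    """joints: (T, J, 3) array of 3D joint positions over T frames."""
    T, J, _ = joints.shape
    iu = np.triu_indices(J, k=1)                               # unique joint pairs
    rows = []
    for t in range(T):
        diff = joints[t, :, None, :] - joints[t, None, :, :]  # (J, J, 3) pairwise offsets
        dist = np.linalg.norm(diff, axis=-1)                  # (J, J) Euclidean distances
        rows.append(dist[iu])
    desc = np.stack(rows)                                      # (T, J*(J-1)/2)
    desc = 255.0 * (desc - desc.min()) / (np.ptp(desc) + 1e-8) # scale like an 8-bit image
    return desc.astype(np.uint8)
```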
Article
In the realm of multimodal communication, sign language is, and continues to be, one of the most understudied areas. In line with recent advances in the field of deep learning, neural networks have far-reaching implications and applications for sign language interpretation. In this paper, we present a method for using deep convolutional networks to classify images of both the letters and digits in American Sign Language.
Conference Paper
Sign language is considered the main language for deaf and mute people, so a translator is needed when a hearing person wants to talk with a deaf or mute person. In this paper, we present a framework for recognizing Bangla Sign Language (BSL) using a Support Vector Machine. The Bangla hand sign alphabets for both vowels and consonants have been used to train and test the recognition system. Bangla sign alphabets are recognized by analyzing their shape and comparing the features that differentiate each sign. In the proposed system, hand signs are first converted from RGB to the HSV color space. Gabor filters are then used to acquire the desired hand sign features. Since the feature vector obtained using Gabor filters is high-dimensional, a nonlinear dimensionality reduction technique, Kernel PCA, is used to reduce the dimensionality. Lastly, a Support Vector Machine (SVM) is employed for classification of the candidate features. The experimental results show that our proposed method outperforms existing work on Bengali hand sign recognition.
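A compact sketch of the described pipeline (RGB to HSV conversion, Gabor filtering, Kernel PCA and an SVM) is given below; the filter-bank parameters, image size and the use of the HSV value channel are assumptions for illustration, not the paper's exact settings.

```python
# Illustrative HSV + Gabor + Kernel PCA + SVM pipeline (assumed parameters).
import cv2
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

def gabor_features(image_bgr, size=(64, 64)):
    hsv = cv2.cvtColor(cv2.resize(image_bgr, size), cv2.COLOR_BGR2HSV)
    value = hsv[:, :, 2].astype(np.float32)                    # brightness channel
    feats = []
    for theta in np.arange(0, np.pi, np.pi / 4):               # 4 filter orientations
        kernel = cv2.getGaborKernel((15, 15), 4.0, theta, 10.0, 0.5, 0)
        feats.append(cv2.filter2D(value, cv2.CV_32F, kernel).ravel())
    return np.concatenate(feats)

# X: stacked gabor_features() vectors, y: sign labels (user-supplied placeholders)
# clf = make_pipeline(KernelPCA(n_components=100, kernel="rbf"), SVC(kernel="rbf"))
# clf.fit(X, y)
```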