Fig. 1 - uploaded by Cemil Oz
Fig. 1. CyberGlove™ with 18 sensors. Fig. 2. Flock of Birds® 3-D motion tracker.

Source publication
Article
Full-text available
An American Sign Language (ASL) recognition system developed based on multi-dimensional Hidden Markov Models (HMM) is presented in this paper. A CyberGlove™ sensory glove and a Flock of Birds® motion tracker are used to extract the features of ASL gestures. The data obtained from the strain gages in the glove defines the hand shape while the data f...

Contexts in source publication

Context 1
... use a right-hand CyberGlove™ (Fig. 1) to obtain the joint angle values. It has 18 sensors and a data recording frequency of up to 150 Hz. The data used in our ASL recognition system comes from 15 sensors: 3 sensors for the thumb, 2 sensors for each of the other four fingers, and 4 sensors, one between each pair of neighboring fingers. To track the position and ...
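As a rough illustration of this sensor layout, the sketch below assembles one feature vector per frame from the 15 selected glove channels plus the six position/orientation values reported by the Flock of Birds tracker. The channel ordering, function names, and the 15 + 6 = 21-dimensional composition are assumptions made for illustration (consistent with the 21-dimensional feature vector mentioned in the citing literature below), not the authors' exact encoding.

```python
import numpy as np

# Hypothetical indices of the 15 CyberGlove channels used
# (3 thumb joints, 2 joints per remaining finger, 4 between-finger sensors);
# the actual channel order depends on the glove driver.
GLOVE_CHANNELS = list(range(15))

def build_feature_vector(glove_frame, tracker_frame):
    """Concatenate glove joint angles with tracker pose into one feature vector.

    glove_frame:   array-like of 18 raw joint-angle readings from the CyberGlove
    tracker_frame: array-like of 6 values from the Flock of Birds
                   (x, y, z position and azimuth, elevation, roll)
    Returns a 21-dimensional feature vector (15 + 6).
    """
    angles = np.asarray(glove_frame, dtype=float)[GLOVE_CHANNELS]
    pose = np.asarray(tracker_frame, dtype=float)
    return np.concatenate([angles, pose])
```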
Context 2
... the recognition accuracy decreased to 95%. This was because the 8th sample from user 3 had a relatively large variation, which influenced the overall distributions of some gestures and, as a result, the system misrecognized some signs. Table 2 shows the detailed training and recognition performance of the system for user 1. Fig. 10 shows how a speed tracker monitors the hand motion while the system is running. When the tracker finds that the velocity of the hand is below a threshold, in this case 0.05 (units/second), it sends a message to the main program to request the recognition process. Once recognition is triggered, the communication between the speed ...
Context 3
... When the tracker finds that the velocity of the hand is below a threshold, in this case 0.05 (units/second), it sends a message to the main program to request the recognition process. Once recognition is triggered, the communication between the speed tracker and the recognition processor is turned off until a high speed reactivates it. Fig. 10 shows the reactivation velocity set at 0.5. Fig. 11 illustrates four recognized ASL alphabet letters, 'A', 'F', 'X', and ...
Context 4
... is below a threshold, in this case 0.05 (units/second), it sends a message to the main program to request the recognition process. Once recognition is triggered, the communication between the speed tracker and the recognition processor is turned off until a high speed reactivates it. Fig. 10 shows the reactivation velocity set at 0.5. Fig. 11 illustrates four recognized ASL alphabet letters, 'A', 'F', 'X', and ...
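The velocity-gated triggering described in Contexts 2-4 can be sketched as a simple hysteresis gate. The two thresholds (0.05 units/second to request recognition, 0.5 to re-arm) come from the text above; the class and callback names are illustrative rather than the paper's actual implementation.

```python
# Minimal sketch of the velocity-gated recognition trigger described above.
TRIGGER_THRESHOLD = 0.05   # units/second: hand considered paused, request recognition
REARM_THRESHOLD = 0.5      # units/second: hand moving again, re-enable the trigger

class SpeedGate:
    def __init__(self, recognize):
        self.recognize = recognize  # callback into the recognition processor
        self.armed = True           # whether a new recognition may be requested

    def update(self, velocity):
        """Call once per tracker frame with the current hand speed."""
        if self.armed and velocity < TRIGGER_THRESHOLD:
            self.armed = False      # mute the gate until the hand moves again
            self.recognize()
        elif not self.armed and velocity > REARM_THRESHOLD:
            self.armed = True       # high speed reactivates the gate

# Example usage with a dummy recognizer:
if __name__ == "__main__":
    gate = SpeedGate(lambda: print("recognition requested"))
    for v in [0.8, 0.3, 0.04, 0.02, 0.6, 0.7, 0.03]:
        gate.update(v)
```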

Similar publications

Article
Full-text available
This paper proposes a Japanese sign-language recognition system using acceleration sensors, position sensors and datagloves to understand human dynamic motions and finger geometry. The sensor integration method achieved more robust gesture recognition than a single-sensor method. The sign-language recognition is done by referring to a Japan...
Conference Paper
Full-text available
This paper deals with robust modelling of mouth shapes in the context of sign language recognition using deep convolutional neural networks. Sign language mouth shapes are difficult to annotate and thus hardly any publicly available annotations exist. As such, this work exploits related information sources as weak supervision. Humans mainly look a...
Article
Full-text available
Sign language is the only means of communication for speech- and hearing-impaired people. Using machine translation, Sign Language Recognition (SLR) systems provide a medium of communication between the speech- and hearing-impaired and others who have difficulty understanding such languages. However, most SLR systems require the signer to sign in...
Article
Full-text available
Subunit segmentation and modelling in medical sign language is one of the important studies in linguistic-oriented and vision-based Sign Language Recognition (SLR). Many previous efforts focused on functional subunits from the viewpoint of linguistic syllables, but the problem is that implementing such subunit extraction using syllables is n...
Conference Paper
Full-text available
This work presents an iterative realignment approach applicable to visual sequence labelling tasks such as gesture recognition, activity recognition and continuous sign language recognition. Previous methods dealing with video data usually rely on given frame labels to train their classifiers. Looking at recent data sets, these labels often tend t...

Citations

... The majority of these studies have focused on specific sign languages, particularly American Sign Language (ASL) and Chinese Sign Language (CSL). Various technologies have been employed, including Microsoft Kinect [4,5], Leap Motion [6][7][8][9][10][11], data gloves [12,13], cameras [14][15][16][17][18][19][20][21][22][23][24][25], surface electromyography (sEMG), and inertial measurement units (IMU) [26][27][28][29][30][31][32][33][34][35][36]. Although some systems have achieved high levels of accuracy, none are likely to be suitable for everyday real-life situations. ...
Preprint
Full-text available
The aim of this study is to develop a software solution for real-time recognition of sign language words using two arms. This will enable communication between hearing-impaired individuals and those who can hear. Several sign language recognition systems have been developed using different technologies, including cameras, armbands, and gloves. The system developed in this study utilizes surface electromyography (muscle activity) and inertial measurement unit (motion dynamics) data from both arms. Other methods often have drawbacks, such as high costs, low accuracy due to ambient light and obstacles, and complex hardware requirements, which have prevented their practical application. Software was developed that can run on different operating systems, using digital signal processing and machine learning methods specific to the study. For the test, we created a dataset of 80 words based on their frequency of use in daily life and performed a thorough feature extraction process. We tested the recognition performance using various classifiers and parameters and compared the results. The Random Forest algorithm was found to have the highest success rate with 99.875% accuracy, while the Naive Bayes algorithm had the lowest success rate with 87.625% accuracy. Feedback from a test group of 10 people indicated that the system is user-friendly, aesthetically appealing, and practically useful. The new system enables smoother communication for people with hearing disabilities and promises seamless integration into daily life without compromising user comfort or lifestyle quality.
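As a hedged illustration of the kind of pipeline this abstract describes (windowed sEMG/IMU features fed to a Random Forest), the sketch below uses common time-domain features and scikit-learn. The feature choices, window sizes, channel counts, and synthetic data are assumptions and not the authors' actual processing.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def window_features(signal, win=200, step=100):
    """Per-window statistics (mean absolute value, RMS, waveform length)
    over a multi-channel sEMG/IMU recording of shape (samples, channels)."""
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        mav = np.mean(np.abs(w), axis=0)
        rms = np.sqrt(np.mean(w ** 2, axis=0))
        wl = np.sum(np.abs(np.diff(w, axis=0)), axis=0)
        feats.append(np.concatenate([mav, rms, wl]))
    return np.asarray(feats)

# Synthetic data just to make the sketch runnable:
# 400 recordings, 1 s at 1 kHz, 16 channels, 80 word classes.
rng = np.random.default_rng(0)
X = np.vstack([window_features(rng.normal(size=(1000, 16))).mean(axis=0, keepdims=True)
               for _ in range(400)])
y = rng.integers(0, 80, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```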
... The process of manual feature extraction starts with converting images into feature vectors, and the vector representation is then used as input for further analysis. The work of [2] introduced sign language translation using multidimensional Hidden Markov Models (HMM), utilizing data from a sensory glove to identify hand shape and motion tracking. HMM extracted the constituent signs implicitly in multiple dimensions through a stochastic process, allowing for interactive learning and recognition. ...
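The source paper's multi-dimensional HMM formulation is not reproduced here, but a conventional one-HMM-per-sign baseline conveys the basic idea this context describes: each gesture is an observation sequence of feature vectors, one model is trained per sign, and classification picks the model with the highest likelihood. The sketch below uses the hmmlearn library; the state count and function names are assumptions.

```python
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

def train_sign_models(sequences_by_sign, n_states=4):
    """Fit one Gaussian HMM per sign.

    sequences_by_sign: dict mapping sign label -> list of (T_i, D) feature arrays
    """
    models = {}
    for sign, seqs in sequences_by_sign.items():
        X = np.vstack(seqs)                 # stacked observation frames
        lengths = [len(s) for s in seqs]    # per-sequence boundaries
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                                n_iter=50, random_state=0)
        model.fit(X, lengths)
        models[sign] = model
    return models

def classify(models, sequence):
    """Return the sign whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda sign: models[sign].score(sequence))
```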
Article
Full-text available
Between 2019 and 2022, as the Covid-19 pandemic unfolded, numerous countries implemented lockdown policies, leading most corporate companies to permit employees to work from home. Communication and meetings transitioned to online platforms, replacing face-to-face interactions. This shift posed challenges for deaf or hearing-impaired individuals who rely on sign language, using hand gestures for communication. However, it also affected those who can hear clearly but lack knowledge of sign language. Unfortunately, many online meeting platforms lack sign language translation features. This study addresses this issue, focusing on Thai sign language. The objective is to develop a model capable of translating Thai sign language in real-time. The Long Short-Term Memory (LSTM) architecture is employed in conjunction with MediaPipe Holistic for data collection. MediaPipe Holistic captures keypoints of hand, pose, and head, while the LSTM model translates hand gestures into a sequence of words. The model’s efficiency is assessed based on accuracy, with real-time testing achieving an 86% accuracy, slightly lower than the performance on the test dataset. Nonetheless, there is room for improvement, such as expanding the dataset by collecting data from diverse individuals, employing data augmentation techniques, and incorporating an attention mechanism to enhance model accuracy.
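A minimal sketch of the keypoint-to-LSTM pipeline this abstract outlines, assuming MediaPipe Holistic for per-frame keypoints and a small Keras LSTM over fixed-length sequences. The keypoint subset, sequence length, and layer sizes are illustrative guesses, not the authors' configuration.

```python
import numpy as np
import cv2
import mediapipe as mp
import tensorflow as tf

mp_holistic = mp.solutions.holistic

def frame_keypoints(results):
    """Flatten pose (33x4) and both hands (21x3 each) into one vector per frame,
    using zeros when a part is not detected."""
    def flat(lms, n, dims):
        if lms is None:
            return np.zeros(n * dims)
        return np.array([[getattr(p, a) for a in ("x", "y", "z", "visibility")[:dims]]
                         for p in lms.landmark]).flatten()
    return np.concatenate([flat(results.pose_landmarks, 33, 4),
                           flat(results.left_hand_landmarks, 21, 3),
                           flat(results.right_hand_landmarks, 21, 3)])

def video_to_sequence(path, n_frames=30):
    """Extract a fixed-length keypoint sequence from a video file."""
    seq, cap = [], cv2.VideoCapture(path)
    with mp_holistic.Holistic(static_image_mode=False) as holistic:
        while len(seq) < n_frames:
            ok, frame = cap.read()
            if not ok:
                break
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            seq.append(frame_keypoints(results))
    cap.release()
    return np.array(seq)

# Sequence classifier: LSTM over per-frame keypoint vectors (258 = 33*4 + 2*21*3).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30, 258)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 example word classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```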
... Wang et al. [6] proposed a multidimensional hidden Markov model (HMM) based system for ASL recognition. They used a CyberGlove and a Flock of Birds motion tracker to interpret ASL gestures. ...
... HMMs are a popular method that has proven useful in several fields, including computer vision, voice recognition, molecular biology, and SLR [5]. In addition to the HMM, KNN has been used to classify hand gestures [6], and the KNN classifier, in conjunction with SVM, has been utilized to categorize postures. KNN has also contributed to research on improving the recognition of ASL signs [3]. ...
Article
Full-text available
span lang="EN-US">One way of communicating with the deaf is to speak sign language. The chief barrier to little Indian sign language (ISL) research was the language diversity and variations in place. It is essential to learn sign language to communicate with them. Most learning takes place in peer groups. There are very few materials available for teaching signs. Thus, signing is very challenging to learn. Fingerspelling is the first step in sign learning and is used whenever there is no appropriate sign or if the signatory is unfamiliar with it. Sign language learning tools currently available use expensive external sensors. Through this project, we will take this field further by collecting a dataset and extracting functionally helpful information used in several supervised learning methods. Our current work presents four validated fold cross results for multiple approaches. The difference from the previous work is that we used different figures for our validation set than the training set in four-fold cross-validations.</span
... In this context, recognizing a sign involves extracting features from a single frame, while body movement is also informative in dynamic approaches (Escalera, Guyon, & Athitsos, 2017). For example, Hidden Markov Models (HMM) were applied to estimate the probability of an observed sequence (Vogler & Metaxas, 2003; Wang, Leu, & Oz, 2006). However, in video analysis, numerous factors must be considered, such as the variation of appearance, the position of a person, and the variation of illumination (Dalal, Triggs, & Schmid, 2006). ...
Article
Sign languages play an essential role in the cognitive and social development of the deaf, consisting of a natural form of communication and being a symbol of identity and culture. However, hearing loss has a severe social impact due to an existing communication barrier, preventing access to essential services such as education and health. A bi-directional sign language translation may be the solution to bridging the communication gap between the deaf and the listener, completing a two-way communication cycle. Virtual personal assistants can benefit from this technology by extending how users interact with the intelligent system. With this idea, in this work we develop a multi-stream deep learning model to recognize signs of Brazilian (BSL), Indian (ISL), and Korean (KSL) Sign Languages. We combine different types of information for the classification task, using single-stream and multi-stream 3D Convolutional Neural Networks. In addition, considering the largest source of sign data globally (the internet), we propose a depth sensor-free classification method, with depth maps artificially generated through Generative Adversarial Networks. In order to consider the main parameters that encode sign languages, the final architecture is composed of a multi-stream network that receives the segmented hands, the faces, the distances and speeds of the points of articulation, and the RGB frames associated with artificial depth maps. Finally, we provide a visual explanation to understand which regions were important for model decision-making. The best models were obtained using the multi-stream network, presenting an accuracy of 0.91 ± 0.07 and an f1-score of 0.90 ± 0.08 on the publicly available BSL data set. The results suggest that the multi-stream network with artificially generated depth maps is suitable for the task of sign recognition in different languages.
... The accuracy of the model improves to 91.3% when trained on the dataset of both signers. Wang et al. [21] used a multi-dimensional Hidden Markov Model for recognizing American Sign Language (ASL) and achieved an accuracy of 96.7%, where the input data stream was segmented and represented as a 21-dimensional feature vector. The classifier recognizes the sign language from this stochastic data. ...
Preprint
Full-text available
Sign Language is the primary mode of communication for the hearing impaired community and between the community and the outside world. This paper proposes a vision-based sign language gesture recognition model to identify the sign gesture (word) from a hand gesture video. The proposed model consists of three modules: Pre-processing, Convolutional Neural Network, Recurrent Neural Network. Pre-processing module is used to extract the frames, segment the region of interest, and convert them into a grayscale image. Convolutional Neural Network is used to extract the spatial features for each frame and each video is represented by a sequence of spatial features. Recurrent Neural Network recognizes the gestures based on spatio-temporal relation in the sequence of features. The evaluation of the proposed model is done on two different datasets. With Argentinian Sign Language (LSA 64), the model is able to achieve an accuracy of 100%. The model achieved an accuracy of 97.70% with the data set: Indian Sign Language (ISL) for Emergency Situations.
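The three-module design described in this abstract (per-frame spatial CNN features followed by a recurrent network over the sequence) can be sketched with a TimeDistributed CNN and a GRU (one common RNN variant) in Keras. The frame size, sequence length, and layer sizes below are assumptions, not the authors' architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative sizes: grayscale frame sequences of shape (frames, height, width, 1),
# 64 classes as in the LSA64 data set mentioned above.
SEQ_LEN, H, W, N_CLASSES = 20, 64, 64, 64

# Per-frame CNN feature extractor, applied to every frame with TimeDistributed.
cnn = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(H, W, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
])

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, 1)),
    layers.TimeDistributed(cnn),          # spatial features per frame
    layers.GRU(64),                       # temporal modelling across frames
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```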
... In this paper, we propose a Sinhala Sign Language recognition system using a Naive Bayes classifier. A compact and low-cost Leap Motion sensor has been used to capture hand gesture data, in contrast to other data-capturing methods used in sign language recognition systems, such as the CyberGlove [14] or Microsoft Kinect [15]. We examined and calibrated the sensory data obtained from the application programming interface (API) of the Leap sensor and derived the features of hand gestures of Sinhala signs. ...
Article
Full-text available
A sign language recognition system for low-resource Sinhala Sign Language using Leap Motion (LM) and Deep Neural Networks (DNN) is presented in this paper. The study extracts static and dynamic features of hand movements of Sinhala Sign Language (SSL) using an LM controller, which acquires the position of the palm, the radius of the hand sphere and the positions of the five fingers, and the proposed system is tested with 24 selected letters and 6 words. The experimental results show that the proposed DNN model, with an average testing accuracy of 89.2%, outperforms a Naïve Bayes model with 73.3% testing accuracy and a Support Vector Machine (SVM) based model with 81.2% testing accuracy. Therefore, the proposed system, which uses a 3D non-contact LM controller and a machine learning model, has great potential to be an affordable solution for people with hearing impairment when they communicate with hearing people in their day-to-day life in all service sectors.
... Taking this into account, Image Recognition-based systems were an alternative research approach. Initial Image Recognition-based systems mostly relied on conventional methods like Hidden Markov Models (HMMs) [2]. HMMs were initially also used with sensor data and have proven to be quite accurate in Sign Language Recognition systems. ...
Chapter
Sign Language Recognition and Translation systems involve the use of human body pose and hand pose estimation. Sign Language Recognition has conventionally been performed with preliminary sensors and has later evolved to various advanced Deep Learning-based Computer Vision systems. This chapter deals with the past, present, and future of Sign Language Recognition systems. Sign Language Translation is also briefly discussed, giving insights on Natural Language Processing techniques to accurately convert Sign Language into translated sentences.
... 2) vision-based, which utilizes cameras and image processing systems to obtain gesture features for later recognition. Vision-based methods are more natural and realistic than device-based methods [6], which require special equipment. Historically, numerous works in the literature followed the classical vision-based approach, which works by segmenting the hand from a sequence of digital video frames acquired from a camera feed. ...
Conference Paper
Sign language is a critical means of communication that helps deaf people express their needs, feelings and emotions using a variety of hand gestures throughout their daily life. This language has evolved in parallel with spoken languages; however, it does not resemble its counterparts in the same way. Moreover, it is as complex as any other spoken language, as each sign language embodies hundreds of signs that differ from one another by slight changes in hand shape, position, motion direction, and the face and body parts contributing to each sign. Unfortunately, sign languages are not globally standardized: the language differs between countries, each with its own vocabulary, and varies even though signs might look similar. Furthermore, publicly available datasets are limited in quality, and most of the available translation services are expensive due to the required skilled human personnel. This paper proposes a deep learning approach for sign language detection that is finely tailored to the Egyptian sign language (a special case of the generic sign language problem). The model is built to harness the power of convolutional and recurrent networks by integrating them to better recognize the spatio-temporal sign language data feed. In addition, the paper proposes the first Egyptian sign language dataset for emotion words and pronouns. The experimental results demonstrated promising results for the proposed approach on the introduced dataset using combined CNN and RNN models.
... The output from the accelerometer sensor for each action is given to the microcontroller, where it is processed for output in voice format. The sensor detects the slightest changes during an action and, based on its angle [Honggang Wang et al., 2006], gives analog output values for the action's corresponding x, y and z axes. The axis values from both sensors are used to set the ranges that map to the words stored in the microcontroller. ...
Article
The computer recognition of sign language is an important process for enabling communication with visually and hearing impaired people. This proposed project introduces an efficient way of recognizing sign language by computer, using a simplified method based on an accelerometer sensor, which is a three-axis sensor, and a voice IC. The main objective of our project is to convert the sign language into a voice format and display the corresponding message on an LCD screen. The basic idea of this project is to have accelerometer sensors attached to gloves worn by the impaired person. When the person flexes his/her hand for the pre-coded commands, the accelerometer sensors sense the change due to the angular movement of the fingers and produce a corresponding output voltage. The sensed analog signal is converted to a digital signal by an ADC and transmitted to the voice IC via a microcontroller. The task of the microcontroller is to match the obtained hex code with its corresponding pre-coded commands using Keil software. Once the code is matched with its pre-coded command, the output is delivered through a speaker via the voice IC and the command is also displayed on an LCD screen.
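To illustrate the range-matching idea this abstract describes, the sketch below maps per-axis accelerometer readings to pre-coded words. The actual system performs its hex-code matching in microcontroller firmware built with Keil; all ranges and words here are invented for illustration, and the sketch is written in Python to keep the examples in this document in one language.

```python
# Hypothetical mapping from (x, y, z) accelerometer value ranges to pre-coded words.
WORD_RANGES = {
    "hello": ((300, 400), (500, 600), (100, 200)),
    "water": ((100, 200), (300, 450), (350, 500)),
    "help":  ((450, 600), (100, 250), (200, 350)),
}

def match_word(x, y, z):
    """Return the first word whose per-axis ranges contain the reading, else None."""
    for word, ((x0, x1), (y0, y1), (z0, z1)) in WORD_RANGES.items():
        if x0 <= x <= x1 and y0 <= y <= y1 and z0 <= z <= z1:
            return word
    return None

print(match_word(350, 550, 150))   # -> "hello"
```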