2022 IEEE World Conference on Applied Intelligence and Computing
978-1-6654-7988-2/22/$31.00 ©2022 IEEE 829
DOI: 10.1109/aic.2022.139
A Proposed Framework for Stutter Detection:
Implementation on Embedded Systems
Bassem Alhalabi, Jonathan Taylor, Harshal A. Sanghvi, Abhijit S. Pandya
Department of CEECS
Florida Atlantic University
Boca Raton, USA
alhalabi@fau.edu, jtaylor@fau.edu, hsanghvi2020@fau.edu, pandya@fau.edu
Abstract— It is estimated that more than 70 million people
in the world stutter. One of the major problems facing speech
professionals who work with patients who stutter is
quantitatively monitoring and tracking improvement in and
outside of therapy sessions. After extensive research, we
propose a biomedical device that patients can wear daily to
monitor everyday conversations and record instances of
stuttering for later analysis by speech professionals. The
device is intended to assist health professionals and
caretakers in helping individuals who stutter overcome this
behavior and compete in the real world. This paper describes a
feasibility study and a prototype developed for such a device
and considers its future uses and developments. The device
provides data on the stuttering parameters that need to be
evaluated, and this evaluation can speed up the therapy
provided by health professionals.
Keywords—Stuttering, Embedded Systems, Bio-Medical
Innovation, Stuttering Software, Communication Disorder
I. INTRODUCTION
Stuttering is a speech impediment that affects approximately
1% of the world's population [1], in which speech is
impaired by prolongations, repetitions, and breaks. While
speech professionals can monitor patients during their
sessions, the majority of patients' conversations outside of
the office go unmonitored and represent a wealth of information
that could be useful in quantifying a patient's progress during
treatment. The proposed biomedical device is an embedded
system that is battery operated, small enough to be worn daily
by a patient, and capable of using signal processing techniques
to detect instances of stuttering without the need to record
actual conversations. Instances of properly pronounced words,
along with silences and detected stutters, will be recorded to
a removable memory device whose contents can later be
downloaded by a speech professional. Patients can experience
various kinds of stuttering, one of which is hidden stuttering.
Hidden stuttering is a concept that is still being defined. It
is discussed in blogs, podcasts, stuttering support groups,
books, and the writings of people who stutter. Speech-language
pathologists still know very little about it because few
reports have been published. As a result, the definition of
stuttering is still insufficient and very little research is
carried out in this area. Evidence-based therapy is a very
important part of ethical practice, yet it is hardly provided
in this area. Studying individuals who stutter helps to clarify
the underlying issues in therapy, so that speech pathologists
can provide treatments for hidden stuttering that are not only
more meaningful but also more effective, informed by the unique
experiences of these individuals. The results from the study
can be used to guide clinical decision-making.
II. LITERATURE REVIEW
A number of communication disorders affect the
ability to produce and form the typical speech sounds that are
necessary to speak with others [2,3]. The symptoms of
dysarthria, apraxia, stuttering, cluttering, and lisping, among
others, can vary from person to person [4,5,6]. In the
medical dictionary, dysarthria is characterized as a disorder
of speech produced by nervous-system-controlled weakening of
the muscles of the voice box. When a person has dysarthria,
their speech is slurred or mumbled, and their intonation or
tempo of speech is flat or fast, making it difficult to
understand what they are saying [7]. If a patient's speech is
either excessively slow or too fast (or both), the condition is
known as cluttering. Stuttering, also known as stammering or
disfluency, is a neurodevelopmental condition defined by
fundamental behaviors such as involuntary stops, repetitions,
and prolongations of sounds, syllables, sentences, or phrases
[8]. The most prevalent type of stuttering, known as
developmental stuttering, occurs between the ages of two and
seven. According to recent studies, stuttering is a
multifaceted problem involving neurological and hereditary
factors [9].
Human speech is used to convey a person's innermost
feelings, ideas, and thoughts to others. Stuttering, or
stammering, is a speech disorder in which the normal
flow of speech is disrupted, so that people
have trouble saying what they want to say.
Communication is challenging for those who stutter, and this
can affect a person's overall well-being and the
quality of their personal relationships. It also negatively
affects work performance and opportunities. More than 70
million people around the world, approximately one percent of
the overall population, are affected [10]. In conversation,
listeners may become frustrated by prolonged words and often do
not comprehend what is said. Stuttering interferes with the
retention and comprehension of story content for listeners, and
the reactions
2022 IEEE World Conference on Applied Intelligence and Computing (AIC) | 978-1-6654-7988-2/22/$31.00 ©2022 IEEE | DOI: 10.1109/AIC55036.2022.9848966
of listeners to methods and therapy programs for stuttering are
discussed by E. Charles Healey in his paper [11].
Evidence-based information about the cited subjects is readily
accessible through the existing literature. Muscular stiffness,
the latency of verbal replies, and the coordination
and patterns of the muscles (respiratory, glottal,
oromandibular) involved in speaking can be used to identify
various neurological illnesses and stuttering [12].
Over the past few decades, stuttering speech recognition has
been a subject of interest for researchers in a variety of
fields, including speech pathology, psychology, speech
physiology, acoustics, and signal analysis. Manually counting
and categorizing dysfluencies has traditionally been
used to measure how severe the disorder is. In addition to the
length of pauses in total speech, stuttering speech can be
measured by the amount of time spent in silence. However,
speech-language pathologists (SLPs) use a variety of methods to
evaluate stuttering, so manual assessment takes a long time and
is susceptible to error. For this reason, the dysfluency count
and the classification of dysfluency type in stuttering speech
assessments are automated using an ASR system [13].
III. STUTTERING DETECTION
There is no single known cause of stuttering, and it can affect
a multitude of people at various times in their lives. There
are some links to family history and brain deficiencies, and
stuttering events can be triggered by a range of environmental
and personal variables such as mood, temperament, stress, and
major life events [1]. While the development of a stutter can
be random, therapy can be effective in improving the fluency of
patients with a stutter.
Figure 1 shows the initial sketch of the proposed wearable
device. The proposed device consists of an LED display, an SD
card slot, menu buttons, a microphone sensor port, and a lapel
mic.
Figure 1 – Initial sketch of proposed wearable device
Stuttering can be broken down into 3 main groups:
Blocks – when a patient struggles with
getting a word out.
E.g., I went to the ……. Mall.
Prolongations – when a syllable gets
stretched out.
E.g., I went to the MMMMall.
Repetitions – when a part of a word or a syllable is
repeated before the word is completed.
E.g., I went to the Ma-Ma-Mall. -or- My name is T-T-Tom.
IV. INITIAL ANALYSIS
To find a starting point for detecting stuttering events,
recordings of known stuttering and recordings of simulated
stuttering were run through a number of readily available
signal processing techniques. The two that showed promise in
our initial experiments were the Fast Fourier Transform (FFT),
used to look for periods of prolonged dominant frequency, and
silence detection algorithms of the kind used for removing
whitespace (silent gaps) in audio and video recordings. Each
showed potential for a different type of stuttering. The FFT
method showed promise for detecting prolongations: typical
human speech is inherently variable in the frequency domain,
whereas prolongation events hold a relatively constant
frequency for a longer period of time than unimpeded speech.
Silence detection showed promise for repetitions: the length of
time a word is spoken in casual conversation is highly
variable, whereas repetition events tended to produce very
quick bursts of silence events that could be detected as a
repetitive frequency between words or syllables. Figure 2
shows the burst of silence events produced when an individual
repeats a sound.
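The prolongation idea can be illustrated with a minimal sketch, which is not the code used in our experiments. It assumes an 8 kHz sample rate, 256-sample frames, and a 40 Hz tolerance, all of which are hypothetical values chosen for illustration. Each frame is transformed with a small radix-2 FFT, the strongest non-DC bin is taken as the dominant frequency, and the longest run of frames with a nearly constant dominant frequency is reported.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz, assumed for illustration
FRAME_LEN = 256      # samples per frame (power of two), assumed


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(frame, sample_rate=SAMPLE_RATE):
    """Frequency (Hz) of the strongest non-DC bin in one frame."""
    spectrum = fft(frame)
    k = max(range(1, len(frame) // 2), key=lambda i: abs(spectrum[i]))
    return k * sample_rate / len(frame)


def longest_stable_run(samples, tol_hz=40.0):
    """Longest time (s) the dominant frequency stays within tol_hz
    of the first frame of the current run."""
    best = run = 0
    anchor = None
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        f = dominant_frequency(samples[start:start + FRAME_LEN])
        if anchor is not None and abs(f - anchor) <= tol_hz:
            run += 1
        else:
            anchor, run = f, 1
        best = max(best, run)
    return best * FRAME_LEN / SAMPLE_RATE


def tone(freq_hz, n=FRAME_LEN):
    """Synthetic test signal: n samples of a sine at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


# A held sound (constant pitch) versus a signal whose pitch changes
# on every frame, standing in for normal, variable speech.
held = tone(250, 16 * FRAME_LEN)
varied = [s for i in range(16) for s in tone(250 * (1 + i % 4))]
print(longest_stable_run(held), longest_stable_run(varied))
```

A prolongation candidate would then be any run longer than a chosen duration; that threshold is not specified in this work and would have to be tuned on recorded speech.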
Figure 2 – Silence detection algorithm output showing that,
where repetition events happen, there is a burst of silence
events at a measurable frequency.
As illustrated in Figure 2, the silence detection algorithm
exposes repetitions as bursts of silence events, which can be
compared with the silence events of normal speech, and the
frequency of the bursts can be measured.
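A minimal sketch of this kind of silence detection follows. It is an illustration under stated assumptions and not the algorithm of any particular audio tool: the 8 kHz sample rate, the 10 ms frame, the amplitude threshold, and the function names are all hypothetical. A frame counts as silent when its mean absolute amplitude falls below the threshold, consecutive silent frames are merged into one silence event, and the rate of events gives the measurable frequency.

```python
def silence_events(samples, sample_rate=8000, frame_len=80, threshold=0.05):
    """Return (start_s, duration_s) for each run of quiet frames."""
    events = []
    run_start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames + 1):
        if i < n_frames:
            frame = samples[i * frame_len:(i + 1) * frame_len]
            quiet = sum(abs(s) for s in frame) / frame_len < threshold
        else:
            quiet = False  # sentinel: closes a run that reaches the end
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            events.append((run_start * frame_len / sample_rate,
                           (i - run_start) * frame_len / sample_rate))
            run_start = None
    return events


def event_rate_hz(events):
    """Silence events per second between the first and last event start."""
    starts = [start for start, _ in events]
    if len(starts) < 2 or starts[-1] == starts[0]:
        return 0.0
    return (len(starts) - 1) / (starts[-1] - starts[0])


# Synthetic example: three 100 ms sounds separated by two 50 ms silences.
loud = [0.5, -0.5] * 400    # 800 samples = 100 ms at 8 kHz
quiet = [0.0] * 400         # 400 samples = 50 ms
signal = loud + quiet + loud + quiet + loud
events = silence_events(signal)
print(events, event_rate_hz(events))
```

In this synthetic example the two short silences start 150 ms apart, so the event rate is roughly 6.7 Hz; real thresholds would depend on the microphone and amplifier levels.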
V. PROPOSED METHOD
After an initial analysis of recorded instances of stuttering,
the detection of repetitive silence events was chosen because
it appeared in all recordings analyzed and promised to be easy
to detect with an embedded system.
For speech detection we need a microcontroller that can
measure the incoming audio signal and perform the analysis
algorithms described above within a reasonable time. We also
need a microphone and an amplifier circuit to bring the ~50 mV
microphone output up to a voltage easily readable by a
microcontroller (3-5 VDC). For the amplifier circuit we had
two options: fixed gain control and automatic gain control.
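As a rough worked example of the amplification this implies, the sketch below computes the voltage ratio and the gain in decibels needed to raise a 50 mV signal to 3 V and to 5 V. It assumes that both figures describe the same kind of amplitude (for example, both peak-to-peak); the function name is hypothetical.

```python
import math


def required_gain(v_in, v_out):
    """Return (voltage ratio, gain in dB) needed to scale v_in to v_out."""
    ratio = v_out / v_in
    return ratio, 20 * math.log10(ratio)


for target in (3.0, 5.0):
    ratio, gain_db = required_gain(0.05, target)
    print(f"{target:.0f} V target: x{ratio:.0f} ({gain_db:.1f} dB)")
```

Under these assumptions the amplifier needs a gain of roughly 60 to 100 times, or about 36 to 40 dB.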
Fixed gain control uses a simpler and cheaper circuit and has
the benefit of a known output amplitude in controlled
environments (i.e., with a known input range). Automatic gain
control varies the amplification automatically to keep
the output amplitude constant across a range of input levels.
Automatic gain control excels over manual gain control in its
ability to normalize a larger range of audio inputs (e.g.,
varying distance of the microphone from the speaker) without
clipping the peaks of the signal, as illustrated in Figures 3
and 4. Clipping makes detection and timing measurements
difficult for the microprocessor.
Traditional amplifier circuits with manual gain control may
have issues dealing with variable distance from speaker to
microphone and with clipping of usable signals, which may make
stutter detection difficult. The MAX9814 amplifier with
automatic gain control was chosen because its automatic gain
control and on-chip filtering would allow for more reliable
audio measurements. Figures 3 and 4 show the amplifier output
graphically; time and voltage are the two parameters plotted.
Figure 3 shows the output with automatic gain control disabled:
an audio signal of high amplitude is clipped.
Figure 4 shows the output with automatic gain control enabled:
the amplified voltage output is proportional to the mV input.
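The difference between the two behaviours can be sketched with an idealized model. This is not a model of the MAX9814 itself, whose attack and release behaviour is more involved; the output rail, target level, gain limit, and block size below are hypothetical. The fixed-gain path multiplies by a constant and clips at the rail, while the block-wise gain control picks a gain per block so that the block peak reaches a target level.

```python
import math

RAIL_V = 1.5  # assumed output swing limit (+/- volts), hypothetical


def clip(v, rail=RAIL_V):
    """Limit a voltage to the available output swing."""
    return max(-rail, min(rail, v))


def fixed_gain(samples, gain):
    """Constant amplification followed by clipping at the rail."""
    return [clip(gain * s) for s in samples]


def block_agc(samples, target_peak=1.0, max_gain=60.0, block=100):
    """Idealized gain control: scale each block so its peak hits target."""
    out = []
    for i in range(0, len(samples), block):
        chunk = samples[i:i + block]
        peak = max(abs(s) for s in chunk)
        gain = max_gain if peak == 0 else min(max_gain, target_peak / peak)
        out.extend(clip(gain * s) for s in chunk)
    return out


def sine(peak, n=400, period=50):
    """Synthetic input: a sine wave with the given peak voltage."""
    return [peak * math.sin(2 * math.pi * i / period) for i in range(n)]


quiet_in, loud_in = sine(0.02), sine(0.2)   # e.g. far and near speaker
for name, signal in (("quiet", quiet_in), ("loud", loud_in)):
    fixed = fixed_gain(signal, 60.0)
    agc = block_agc(signal)
    clipped = sum(1 for v in fixed if abs(v) >= RAIL_V)
    print(name, "fixed peak", round(max(map(abs, fixed)), 3),
          "clipped samples", clipped,
          "agc peak", round(max(map(abs, agc)), 3))
```

With these numbers the fixed gain handles the quiet input but clips the loud one, whereas the gain-controlled output peaks at the same level for both inputs.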
Processing Technique:
Using the MSP430, the audio signal was smoothed
by averaging the A/D readings to remove reverberation.
Peak-to-peak timing was measured to get the interval
between words or syllables.
A limit was applied to remove erroneous results
that were out of bounds for a repetition.
Occurrences of 3 or more approximately equal
peak-to-peak times within range were
recorded as a stutter instance.
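The first three steps above can be sketched as follows. This is a minimal illustration and not the MSP430 firmware: the window length, threshold, and bounds are hypothetical, the readings are assumed to already represent signal magnitude, and "peak-to-peak time" is read here as the time between the starts of successive bursts of sound.

```python
def smooth(values, window=8):
    """Moving average over the most recent `window` readings."""
    out = []
    total = 0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]
        out.append(total / min(i + 1, window))
    return out


def onset_times_ms(envelope, threshold, sample_period_ms=1.0):
    """Times at which the smoothed level rises through the threshold."""
    times = []
    above = False
    for i, level in enumerate(envelope):
        if level >= threshold and not above:
            times.append(i * sample_period_ms)
        above = level >= threshold
    return times


def bounded_intervals(times, min_ms, max_ms):
    """Intervals between successive onsets, keeping only those in bounds."""
    intervals = [b - a for a, b in zip(times, times[1:])]
    return [d for d in intervals if min_ms <= d <= max_ms]


# Synthetic readings: three bursts 40 ms apart, a long pause, one more burst.
readings = ([0] * 20 + [100] * 20) * 3 + [0] * 200 + [100] * 20
onsets = onset_times_ms(smooth(readings), threshold=50)
print(onsets, bounded_intervals(onsets, min_ms=20, max_ms=150))
```

The long pause before the last burst falls outside the bounds and is discarded, leaving only the two short, equal intervals for the final grouping step.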
VI. EXECUTION
To execute the proof of concept, the MSP430 platform by
Texas Instruments was chosen for its 12-bit analog-to-digital
converter and low-power modes that would be needed for
future integration into a final battery-operated device. The
MAX9814 microphone amplifier package was chosen for its
onboard filtering, which has proved to excel at removing
background noise and amplifying voice signals, as well as for
the robust automatic gain control built into the chip, which
should be ideal for the type of voice detection we are trying
to achieve with this project. The MAX9814 was wired to the
analog-to-digital converter of the MSP430, and the A/D
converter was configured in software to sample the amplifier
chip's output.
Routines were written into software for averaging and
smoothing of the analog signal to make detection of unique
Figure 3 – Automatic Gain Control Disabled showcasing
clipping of an audio signal that is too high an amplitude [14]
Figure 4 – Automatic Gain Control Enabled showing
amplified voltage output proportional to the mV input [3]
A series of tests was conducted by speaking into the device
to determine the basic level of functionality, and adjustments
were made to the averaging and smoothing functions until we
were comfortable with the demonstrated ability to detect the
difference between spoken words and silence. At this point we
were ready to begin measuring peak-to-peak times and analyzing
the results through the prototype shown in Figure 5.
Figure 5 – MSP430 and MAX9814 chips with microphone
and screen for display wired together on breadboard.
Figure 5 shows the embedded-system prototype used to analyze
stuttering: the MSP430 and MAX9814 chips, along with the
microphone and the LCD screen for display, are wired together
on the breadboard.
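To make the sampling step concrete, the sketch below converts 12-bit readings to volts and decides whether a block of readings contains sound. It is written in Python for readability and is not the MSP430 firmware. The 3.3 V reference, the 0.05 V threshold, and the function names are hypothetical, and the convention that the full-scale code 4095 corresponds to the reference voltage should be checked against the device documentation.

```python
ADC_MAX = 4095   # largest code of a 12-bit converter
V_REF = 3.3      # assumed reference voltage, hypothetical


def counts_to_volts(count, v_ref=V_REF):
    """Convert one A/D reading to volts."""
    return count * v_ref / ADC_MAX


def audio_level(counts, v_ref=V_REF):
    """Mean absolute deviation (V) of a block from its own mean.

    Subtracting the block mean removes the DC bias on which the
    amplified audio signal rides.
    """
    volts = [counts_to_volts(c, v_ref) for c in counts]
    bias = sum(volts) / len(volts)
    return sum(abs(v - bias) for v in volts) / len(volts)


def is_speech(counts, threshold_v=0.05):
    """True when the block's audio level exceeds the threshold."""
    return audio_level(counts) > threshold_v


silence_block = [2048] * 16          # constant reading: bias only
speech_block = [1548, 2548] * 8      # swings of 500 counts around the bias
print(is_speech(silence_block), is_speech(speech_block))
```

A block of constant readings yields a level of zero, while swings of 500 counts around the bias correspond to about 0.4 V under the assumed reference.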
VII. ALGORITHM DEVELOPMENT
To create the algorithm, we began by creating a timer and
writing a function to start the timer when no words were
detected (silence) and reset the timer when a word was
detected. Every time a word was detected, the timer value at
that instant was pushed into an array for later analysis. Once
we had an array of the times between words, we moved
on to analyzing the timing of the sequences. Based on our
initial analysis, we started with the assumption that three
equal silences would represent a repetition or stutter event.
A window of approximately +/- 5 ms was needed to give the
algorithm enough tolerance to pass this test with reasonable
detection rates. Experiments were done with lowering the
requirement to 2 instances, but too many false positives were
detected, and 3 was settled on as the final number.
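The timer and the three-silence test can be sketched as follows. This is an illustration and not the firmware: the input is assumed to be one word/silence flag per millisecond tick, and "equal within the window" is read here as each silence lying within +/- 5 ms of the first silence in the run, which is only one possible interpretation. Function names are hypothetical.

```python
TOLERANCE_MS = 5   # window from the text
RUN_LENGTH = 3     # number of equal silences required, from the text


def silence_durations(word_flags, tick_ms=1):
    """Log the silence timer each time a word is detected, then reset it."""
    gaps = []
    timer = 0
    for is_word in word_flags:
        if is_word:
            if timer > 0:
                gaps.append(timer)   # timer value at the instant of the word
            timer = 0
        else:
            timer += tick_ms         # timer runs only during silence
    return gaps


def count_repetitions(gaps, tolerance_ms=TOLERANCE_MS, run_length=RUN_LENGTH):
    """Count non-overlapping runs of roughly equal consecutive silences."""
    events = 0
    i = 0
    while i + run_length <= len(gaps):
        window = gaps[i:i + run_length]
        if all(abs(g - window[0]) <= tolerance_ms for g in window):
            events += 1
            i += run_length
        else:
            i += 1
    return events


# Synthetic example: four syllables separated by silences of 30, 32 and
# 29 ms, then a 200 ms pause before the next word.
word, flags = [True] * 50, []
for gap in (30, 32, 29, 200):
    flags += word + [False] * gap
flags += word
gaps = silence_durations(flags)
print(gaps, count_repetitions(gaps))
```

Lowering run_length to 2 makes the test fire on any pair of similar pauses, which is consistent with the larger number of false positives observed when only 2 instances were required.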
VIII. RESULTS
After the algorithm was refined, we reached a point
where we were detecting basic stuttering events for repetitions
of letters and syllables, which was the goal of our project. We
estimate our success rate in capturing a stuttering event at
between 70 and 80% for repetitions, with false positives in
normal conversation in the low single-digit percentages. Our
primitive algorithm runs into problems in normal speech with
words of four or more syllables and with laughing, both of
which create high percentages of false positives for
stuttering.
IX. PACKAGING
Additional work needs to be done to complete the packaging
to meet the initial design criteria. We have developed
SolidWorks models of 3D-printable packaging that would fit the
proposed development system and microphone. If a custom
circuit board were created to combine the devices and remove
the demo boards, space would be freed and the
device could be made smaller. The proposed
3D-printed packaging is shown in Figure 6 and Figure 7.
Figure 6 – 3D-printable packaging for base.
Figure 7 – 3D-printable packaging for lapel microphone.
X. CONCLUSION AND FUTURE WORK
To bring this project from a proof of concept to a prototype
phase, additional algorithm work would be needed to
determine whether our execution of peak-to-peak timing is
indeed good enough for reliable detection of repetitions in
patients. Additional work with a wider range of recorded
patients would be key to fleshing out the algorithm. Also, the
inclusion of FFT algorithms to detect prolongations would be a
good next step. Detection of blocks would only be possible by a
non-audible method, since from the microprocessor's standpoint
a block is no different from silence and cannot be
distinguished from normal speech patterns. Future work will
include a cloud-based model to detect the stuttering
parameters, which would be useful to the health professionals
and caretakers of stuttering patients by assisting them with
real-time data. A web or mobile application would be created
as a tool.
ACKNOWLEDGMENT
We would like to acknowledge Dr. Dale Williams,
Department of Communication Sciences & Disorders, Florida
Atlantic University, for his idea to create this project to
help his patients track their progress. The Audacity freeware
program (https://www.audacityteam.org/) was also extremely
helpful in our initial investigation for finding different
algorithms quickly. We would also like to express our sincere
gratitude to the Department of CEECS, Florida Atlantic
University, for its support during the project.
REFERENCES
[1] Penman, A., Hill, A. E., Hewat, S., & Scarinci, N. (2021). Speech–
language pathology students’ perceptions of simulation‐based
learning experiences in stuttering. International journal of
language & communication disorders, 56(6), 1132-1146.
[2] Azios, M., Irani, F., Rutland, B., Ratinaud, P., & Manchaiah, V.
(2020). Representation of Stuttering in the United States Newspaper
Media. Journal of Consumer Health on the Internet, 24(4), 329-345.
[3] Joseph R Duffy. Motor Speech Disorders-E-Book: Substrates,
Differential Diagnosis, and Management. Elsevier Health Sciences,
2013.
[4] Nan Bernstein Ratner and Brian MacWhinney. Fluency bank: A
new resource for fluency research and practice. Journal of Fluency
Disorders, 56:69, 2018.
[5] David Ward. Stuttering and Cluttering: Frameworks for
Understanding and Treatment. Psychology Press, 2008.
[6] Thomas D Kehoe and Wikibooks Contributors. Speech Language
Pathology-Stuttering. Kiambo Ridge, 2006.
[7] Joseph Kalinowski, Sandra Noble, Joy Armson, and Andrew Stuart.
Pretreatment and posttreatment speech naturalness ratings of adults
with mild and severe stuttering. American Journal of
Speech-Language Pathology, 3(2):61–66, 1994.
[8] Anne Smith and Christine Weber. How stuttering develops: The
multifactorial dynamic pathways theory. Journal of Speech,
Language, and Hearing Research, 60(9):2483–2505, 2017.
[9] Andrew C Etchell, Oren Civier, Kirrie J Ballard, and Paul F
Sowman. A systematic literature review of neuroimaging research
on developmental stuttering between 1995 and 2016. Journal of
Fluency Disorders, 55:6–45, 2018.
[10] Banerjee, N., Borah, S., & Sethi, N. (2022). Intelligent stuttering
speech recognition: A succinct review. Multimedia Tools and
Applications, 1-22.
[11] Healey EC (2010) What the literature tells us about listeners'
reactions to stuttering: implications for the clinical management of
stuttering. Sem Speech Language 31(4):227–235. © Thieme Medical
Publishers
[12] Pinelli P (1992) Neurophysiology in the science of speech. Curr
Opinion Neurol Neurosurg 5(5):744–755
[13] Manjula G, Kumar S (2016) Overview of Analysis and
Classification of Stuttered Speech Proceed 11th IRF Int Conf
[14] “MAX9814 Microphone Amplifier with AGC and Low-Noise
Microphone Bias.” Available:
https://datasheets.maximintegrated.com/en/ds/MAX9814.pdf
[Accessed: 01-May-2019].