Disfluency in dialogue: An intentional signal from the speaker?

June 2012
Psychonomic Bulletin & Review 19(5):921-8

DOI:10.3758/s13423-012-0279-x

Source
PubMed

Authors:

Ian Finlayson

Martin Corley

The University of Edinburgh

Disfluency is a characteristic feature of spontaneous human speech, commonly seen as a consequence of problems with production. However, the question remains open as to why speakers are disfluent: Is it a mechanical by-product of planning difficulty, or do speakers use disfluency in dialogue to manage listeners' expectations? To address this question, we present two experiments investigating the production of disfluency in monologue and dialogue situations. Dialogue affected the linguistic choices made by participants, who aligned on referring expressions by choosing less frequent names for ambiguous images where those names had previously been mentioned. However, participants were no more disfluent in dialogue than in monologue situations, and the distribution of types of disfluency used remained constant. Our evidence rules out at least a straightforward interpretation of the view that disfluencies are an intentional signal in dialogue.

…

EEG Correlates of Distractions and Hesitations in Human–Robot Interaction: A LabLinking Pilot Study

Article

Mar 2023

In this paper, we investigate the effect of distractions and hesitations as a scaffolding strategy. Recent research points to the potential beneficial effects of a speaker’s hesitations on the listeners’ comprehension of utterances, although results from studies on this issue indicate that humans do not make strategic use of them. The role of hesitations and their communicative function in human-human interaction is a much-discussed topic in current research. To better understand the underlying cognitive processes, we developed a human–robot interaction (HRI) setup that allows the measurement of the electroencephalogram (EEG) signals of a human participant while interacting with a robot. We thereby address the research question of whether we find effects on single-trial EEG based on the distraction and the corresponding robot’s hesitation scaffolding strategy. To carry out the experiments, we leverage our LabLinking method, which enables interdisciplinary joint research between remote labs. This study could not have been conducted without LabLinking, as the two involved labs needed to combine their individual expertise and equipment to achieve the goal together. The results of our study indicate that the EEG correlates in the distracted condition are different from the baseline condition without distractions. Furthermore, we could differentiate the EEG correlates of distraction with and without a hesitation scaffolding strategy. This proof-of-concept study shows that LabLinking makes it possible to conduct collaborative HRI studies in remote laboratories and lays the first foundation for more in-depth research into robotic scaffolding strategies.

Occurrences and Durations of Filled Pauses in Relation to Words and Silent Pauses in Spontaneous Speech

Article

Mar 2023

Mária Gósy

Filled pauses (i.e., gaps in speech production filled with non-lexical vocalizations) have been studied for more than sixty years in different languages. These studies utilize many different approaches to explore the origins, specific patterns, forms, incidents, positions, and functions of filled pauses. The present research examines the presence of filled pauses by considering the adjacent words and silent pauses that define their immediate positions as well as the influence of the immediate position on filled pause duration. The durations of 2450 filled pauses produced in 30 narratives were analyzed in terms of their incidence, immediate positions, neighboring silent pauses, and surrounding word types. The data obtained showed that filled pauses that were attached to a word on one side were the most frequent. Filled pauses occurring within a word and between two silent pauses were the longest of all. Hence, the durations of filled pauses were significantly influenced by the silent pauses occurring in their vicinity. The durations and occurrence of filled pauses did not differ when content or function words preceded the filled pause or followed it. These findings suggest that the incidence and duration of filled pauses as influenced by the neighboring words and silent pauses may be indicative of their information content, which is related to the processes of transforming ideas into grammatical structures.

Change in Degree vs. Change in Kind: Bridging the Quantitative/Qualitative Divide in Second Language Fluency

Article

Oct 2023

Reid Evans

The present study investigated the development of second language fluency from the perspective of complex dynamic systems theory in an untutored adult learner of English as a second language. To complement traditional studies of second language (L2) utterance fluency, this study adopted a functional, usage-based perspective and sought to understand how fluency changes not just in terms of numerical increases or decreases in the frequency of disfluency features, but also in how the discursive function of fluency changes over time. Data were collected from two oral tasks on a weekly basis for one academic year and were analyzed quantitatively and qualitatively in an attempt to achieve dynamic method integration (Hiver & Al-Hoorie, 2020). Though many disfluency features showed either stagnation or even increases in frequency over the period of data collection, the way in which the learner used these features changed over time. Specifically, pause clusters and self-repair acquired new roles associated with the emergence, and the subsequent monitoring and repair, of novel syntactic structures. Results suggest that L2 utterance fluency development is complex and that understanding it requires more than simple frequency counts of disfluency features. Instead, definitions of L2 fluency must account for the discursive function of disfluencies and how these functions change as speakers develop proficiency in the L2.

Hezitace: jejich fonetická realizace a zaznamenávání (přehledová studie) / Hesitation: their phonetic realization and notation (review study)

Article

Jan 2024

Lucie Jílková

This paper gives an overview of current knowledge about the phonetic realization and transcription of hesitations. Most authors distinguish two basic types: vocalic and consonantal hesitations. In the vast majority of cases, and across languages, vocalic hesitations are realized as a neutral central vowel, schwa [ə], whereas consonantal hesitations are realized as the consonant [m] without a final release. Some authors also mention possible nasalization of the vocalic hesitation [ə]. The two basic types are most often transcribed as uh (vocalic hesitation) and um (consonantal hesitation), a convention used mainly by authors writing in English who analyze (American) English. However, the repertoire of hesitation transcriptions varies from author to author, partly depending on the languages the authors analyze.

Disfluencies Revisited—Are They Speaker-Specific?

Article

Jun 2023

The forensic application of phonetics relies on individuality in speech. In the forensic domain, individual patterns of verbal and paraverbal behavior are of interest which are readily available, measurable, consistent, and robust to disguise and to telephone transmission. This contribution is written from the perspective of the forensic phonetic practitioner and seeks to establish a more comprehensive concept of disfluency than previous studies have. A taxonomy of possible variables forming part of what can be termed disfluency behavior is outlined. It includes the “classical” fillers, but extends well beyond these, covering, among others, additional types of fillers as well as prolongations, but also the way in which fillers are combined with pauses. In the empirical section, the materials collected for an earlier study are re-examined and subjected to two different statistical procedures in an attempt to approach the issue of individuality. Recordings consist of several minutes of spontaneous speech by eight speakers on three different occasions. Beyond the established set of hesitation markers, additional aspects of disfluency behavior which fulfill the criteria outlined above are included in the analysis. The proportion of various types of disfluency markers is determined. Both statistical approaches suggest that these speakers can be distinguished at a level far above chance using the disfluency data. At the same time, the results show that it is difficult to pin down a single measure which characterizes the disfluency behavior of an individual speaker. The forensic implications of these findings are discussed.

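A megakadásjelenségek és a temporális paraméterek szerepe a borderline személyiségzavar felismerésében [The role of disfluency phenomena and temporal parameters in recognizing borderline personality disorder]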

Article

Dec 2022

Borderline personality disorder (BPD) is characterized by a pervasive pattern of instability of identity, emotions, and interpersonal relationships, and by difficulty with emotional and impulse control. Due to the various co-occurrence patterns of symptoms and the interactions between them, the BPD population is relatively heterogeneous, making it difficult for clinicians to diagnose individuals. Since an individual's state of mind may to some extent be reflected in their speech behavior, an analysis of speech patterns may make a useful contribution to efforts to identify the disorder. Our goal is to characterize the spontaneous speech of BPD individuals based on patterns of disfluencies and temporal parameters, obtained with the forced alignment method of automatic speech recognition, and to differentiate BPD individuals (N = 27) from healthy controls (N = 27) with a logistic regression model. Our results show that the spontaneous speech of BPD individuals can be characterized by the frequency of silent pauses, filled pauses, and disturbances of grammatical encoding (grammatical errors and blendings), and that the two groups can be differentiated by these features with an AUC of 0.834.
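As a reading aid for the AUC figure reported above, the following minimal sketch shows what the statistic measures: the probability that a randomly chosen case from one group receives a higher classifier score than a randomly chosen case from the other. It does not fit a logistic regression and is not the authors' pipeline. The function name `auc` and all scores are hypothetical and chosen only for illustration.

```python
from itertools import product


def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the share of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs count as half a win.
    """
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores (NOT data from the study).
bpd_scores = [0.9, 0.8, 0.6, 0.4]
control_scores = [0.7, 0.3, 0.2, 0.1]

print(auc(bpd_scores, control_scores))  # 14 of 16 pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, so a value such as 0.834 means that roughly five out of six cross-group pairs are ranked correctly.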

Exploring the status of filled pauses as pragmatic markers: The role of gaze and gesture

Article

Apr 2023

Loulou Kosmala

The present study aims to explore the status of filled pauses as pragmatic markers by taking into account their accompanying visual and gestural behavior. This aspect has not yet been widely explored, and the current study breaks new ground by demonstrating that the analysis of gaze and gesture can shed substantial light on the pragmatic functions of filled pauses and other pausing phenomena. Filled pauses (FPs) serve several pragmatic functions in speech, mainly planning but also turn-holding and emphasis, and their use is also highly determined by register and setting. This research explores the different pragmatic functions of FPs by analyzing their distribution in two different communication settings (conversation vs presentation setting), combining a quantitative and a qualitative methodology, following Kosmala & Crible’s (2021) study on the same data. Particular attention was paid to the co-occurring gestural activity of uh/ums and gaze behavior. Analyses show that the pragmatic functions of FPs are also embodied in kinetic activities which differ according to the setting: more pragmatic and referential ones were found during FPs in conversation than in the presentation setting, as well as more eye-contact, which reflects their potential communicative role during interactional sequences.

Identifying the Speech Errors in a Talkshow From a Podcast: A Case on Speech Disfluency

Article

Mar 2023

This study took a psycholinguistic approach to speech disfluency in an Ellen Degeneres podcast. Its goal was to identify the most prevalent type of speech disfluency in the podcast by analyzing the different types that occur. Because the data were gathered from documents, the research method was descriptive qualitative research with content analysis. The researchers chose a podcast episode aired on July 3, 2021, with a duration of 31 minutes 45 seconds. The theory of speech disfluency by Clark and Clark (1977) was used for the analysis. The analysis revealed 77 instances of speech disfluency in the talk show, including silent pauses (1%), filled pauses (35%), repetitions (10%), false starts (1%), retraced false starts (0%), corrections (14%), and stutters (6%). Filled pauses were the most prevalent type of speech disfluency in "Ellen Degeneres Explains Why She's Ending Her Show". According to the authors, filled pauses occur there because the speaker pauses in order to choose the following words carefully, which helps the conversation run smoothly.

Defining Filler Particles: A Phonetic Account of the Terminology, Form, and Grammatical Classification of "Filled Pauses"

Article

Feb 2023

Malte Belz

The terms hesitation, planner, filler, and filled pause do not always refer to the same phonetic entities. This terminological conundrum is approached by investigating the observational, explanatory, and descriptive inadequacies of the terms in use. Concomitantly, the term filler particle is motivated and a definition is proposed that identifies its phonetic exponents and describes them within the linguistic category of particles. The definition of filler particles proposed here is grounded both theoretically and empirically and then applied to a corpus of spontaneous dialogues with 32 speakers of German, showing that in addition to the prototypical phonetic forms, there is a substantial amount of non-prototypical forms, i.e., 9.5%, comprising both glottal (e.g., [ʔ]) and vocal forms (e.g., [ɛɸ], [j̰ɛvə]). The grammatical classification and the results regarding the phonetic forms are discussed with respect to their theoretical relevance in filler particle research and corpus studies. The phonetic approach taken here further suggests a continuum of phonetic forms of filler particles, ranging from singleton segments to multi-syllabic entities.

Though this be hesitant, yet there is method in ’t: Effects of disfluency patterns in neural speech synthesis for cultural heritage presentations

Article

Apr 2024
Comput Speech Lang

Word Frequency Effects in Speech Production: Retrieval of Syntactic Information and of Phonological Form

Article

Jul 1994

In 7 experiments the authors investigated the locus of word frequency effects in speech production. Experiment 1 demonstrated a frequency effect in picture naming that was robust over repetitions. Experiments 2, 3, and 7 excluded contributions from object identification and initiation of articulation. Experiments 4 and 5 investigated whether the effect arises in accessing the syntactic word (lemma) by using a grammatical gender decision task. Although a frequency effect was found, it dissipated under repeated access to a word's gender. Experiment 6 tested whether the robust frequency effect arises in accessing the phonological form (lexeme) by having subjects translate words that produced homophones. Low-frequent homophones behaved like high-frequent controls, inheriting the accessing speed of their high-frequent homophone twins. Because homophones share the lexeme, not the lemma, this suggests a lexeme-level origin of the robust effect.

A Standardized Set of 260 Pictures: Norms for Name Agreement, Image Agreement, Familiarity, and Visual Complexity

Article

Mar 1980
J Exp Psychol Hum Learn Mem

Presents a standardized set of 260 pictures for use in experiments investigating differences and similarities in the processing of pictures and words. The pictures are black-and-white line drawings executed according to a set of rules that provide consistency of pictorial representation. They have been standardized on 4 variables of central relevance to memory and cognitive processing: name agreement, image agreement, familiarity, and visual complexity. The intercorrelations among the 4 measures were low, suggesting that they are indices of different attributes of the pictures. The concepts were selected to provide exemplars from several widely studied semantic categories. Sources of naming variance, and mean familiarity and complexity of the exemplars, differed significantly across the set of categories investigated. The potential significance of each of the normative variables to a number of semantic and episodic memory tasks is discussed.

Why is conversation so easy?

Article

Feb 2004

Traditional accounts of language processing suggest that monologue – presenting and listening to speeches – should be more straightforward than dialogue – holding a conversation. This is clearly not the case. We argue that conversation is easy because of an interactive processing mechanism that leads to the alignment of linguistic representations between partners. Interactive alignment occurs via automatic alignment channels that are functionally similar to the automatic links between perception and behaviour (the so-called perception – behaviour expressway) proposed in recent accounts of social interaction. We conclude that humans are 'designed' for dialogue rather than monologue. Whereas many people find it difficult to present a speech or even listen to one, we are all very good at talking to each other. This might seem a rather obvious and banal observation, but from a cognitive point of view the apparent ease of conversation is paradoxical. The range and complexity of the information that is required in monologue (preparing and listening to speeches) is much less than is required in dialogue (holding a conversation). In this article we suggest that dialogue processing is easy because it takes advantage of a processing mechanism that we call 'interactive alignment'. We argue that interactive alignment is automatic and reflects the fact that humans are designed for dialogue rather than monologue. We show how research in social cognition points to other similar automatic alignment mechanisms.

Gesturing on the telephone: Independent effects of dialogue and visibility

Article

Feb 2008

Speakers often gesture in telephone conversations, even though they are not visible to their addressees. To test whether this effect is due to being in a dialogue, we separated visibility and dialogue with three conditions: face-to-face dialogue (10 dyads), telephone dialogue (10 dyads), and monologue to a tape recorder (10 individuals). For the rate of gesturing, both dialogue and visibility had significant, independent effects, with the telephone condition consistently higher than the tape recorder. Also, as predicted, visibility alone significantly affected how speakers gestured: face-to-face speakers were more likely to make life-size gestures, to put information in their gestures that was not in their words, to make verbal reference to their gestures, and to use more gestures referring to the interaction itself. We speculate that demonstration, as a modality, may underlie these findings and may be intimately tied to dialogue while being suppressed in monologue.

References in Conversation Between Experts and Novices

Article

Mar 1987

In conversation, two people inevitably know different amounts about the topic of discussion, yet to make their references understood, they need to draw on knowledge and beliefs that they share. An expert and a novice talking with each other, therefore, must assess each other's expertise and accommodate to their differences. They do this in part, it is proposed, by assessing, supplying, and acquiring expertise as they collaborate in completing their references. In a study of this accommodation, pairs of people who were or were not familiar with New York City were asked to work together to arrange pictures of New York City landmarks by talking about them. They were able to assess each other's level of expertise almost immediately and to adjust their choice of proper names, descriptions, and perspectives accordingly. In doing so, experts supplied, and novices acquired, specialized knowledge that made referring more efficient.

Approximate inference in GLMM

Article

Jan 1993

Trouble in mind: Paralinguistic indices of effort and uncertainty in communication

Article

Jan 2001

D.J. Barr

Approximate Inference In Generalized Linear Mixed Models

Article

Mar 1993

Statistical approaches to overdispersion, correlated errors, shrinkage estimation, and smoothing of regression relationships may be encompassed within the framework of the generalized linear mixed model (GLMM). Given an unobserved vector of random effects, observations are assumed to be conditionally independent with means that depend on the linear predictor through a specified link function and conditional variances that are specified by a variance function, known prior weights and a scale factor. The random effects are assumed to be normally distributed with mean zero and dispersion matrix depending on unknown variance components. For problems involving time series, spatial aggregation and smoothing, the dispersion may be specified in terms of a rank deficient inverse covariance matrix. Approximation of the marginal quasi-likelihood using Laplace's method leads eventually to estimating equations based on penalized quasi-likelihood or PQL for the mean parameters and pseudo-likelihood for the variances. Implementation involves repeated calls to normal theory procedures for REML estimation in variance components problems. By means of informal mathematical arguments, simulations and a series of worked examples, we conclude that PQL is of practical value for approximate inference on parameters and realizations of random effects in the hierarchical model. The applications cover overdispersion in binomial proportions of seed germination; longitudinal analysis of attack rates in epilepsy patients; smoothing of birth cohort effects in an age-cohort model of breast cancer incidence; evaluation of curvature of birth cohort effects in a case-control study of childhood cancer and obstetric radiation; spatial aggregation of lip cancer rates in Scottish counties; and the success of salamander matings in a complicated experiment involving crossing of male and female effects. PQL tends to underestimate somewhat the variance components and (in absolute value) fixed effects when applied to clustered binary data, but the situation improves rapidly for binomial observations having denominators greater than one.
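The model described in this abstract can be written compactly as follows. This is a sketch of the standard GLMM formulation rather than a quotation from the paper. The symbols are assumptions introduced here for illustration: $y_i$ for an observation, $b$ for the random effects, $x_i$ and $z_i$ for the fixed- and random-effect covariates, $\alpha$ for the fixed effects, $g$ for the link function, $v$ for the variance function, $a_i$ for a known constant derived from the prior weights, $\phi$ for the scale factor, and $D(\theta)$ for the dispersion matrix with variance components $\theta$.

```latex
\begin{align*}
  % Conditional mean and variance of y_i given the random effects b
  \mathrm{E}(y_i \mid b) &= \mu_i^{b}, &
  \operatorname{var}(y_i \mid b) &= \phi \, a_i \, v(\mu_i^{b}), \\
  % The link function g relates the conditional mean to the linear predictor
  g(\mu_i^{b}) &= x_i^{\top} \alpha + z_i^{\top} b, \\
  % Random effects: normal, mean zero, dispersion depending on \theta
  b &\sim \mathcal{N}\bigl(0, \, D(\theta)\bigr).
\end{align*}
```

Under this reading, a logistic mixed model for clustered binary data corresponds to the logit link $g(\mu) = \log\{\mu / (1 - \mu)\}$ with $v(\mu) = \mu(1 - \mu)$ and $\phi = a_i = 1$.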

Pronouncing “The” as “Thee” to signal problems in speaking

Article

Feb 1997

In spontaneous speaking, "the" is normally pronounced as "thuh", with the reduced vowel schwa (rhyming with the first syllable of "about"). But it is sometimes pronounced as "thiy", with a nonreduced vowel (rhyming with "see"). In a large corpus of spontaneous English conversation, speakers were found to use "thiy" to signal an immediate suspension of speech to deal with a problem in production. Fully 81% of the instances of "thiy" in the corpus were followed by a suspension of speech, whereas only 7% of a matched sample of instances of "thuh" were followed by such suspensions. The problems people dealt with after "thiy" were at many levels of production, including articulation, word retrieval, and choice of message, but most were in the following nominal.

The use of uh and um by 3- and 4-year-old native English-speaking children: Not quite right but not completely wrong

Article

Aug 2008
First Lang

The delay markers (DMs) uh and um are often used by adult English speakers to indicate that an upcoming pause is due to a speech disruption, not the end of a conversational turn. Moreover, uh and um indicate different degrees of disruption (Clark & Fox Tree, 2002). Thus, it appears that children must learn how to use DMs appropriately. In the current study we examined DM use in elicited speech samples from 24 3- and 4-year-old children. We found that pauses following DMs were longer than those not following a DM, but that there was no difference between the pauses following uh and um. Children at this age, then, appear to understand the basic use of DMs, but do not yet differentiate between them.

Recommended publications

The effects of speech production and speech comprehension on driving performance

(Not so) Great Expectations: Listening to Foreign-Accented Speech Reduces the Brain’s Anticipatory P...

Mit Rechten über Rechte reden. Michael Köhlmeiers Rede vor/zu Rechten und übers Recht. In: Studia Au...

Perceptual Evaluation of Early versus Late F0 Peaks in the Intonation Structure of Czech Question-Wo...