Article

On the Band Width of Vowel Formants

Abstract

Measurements were made of the band widths of the first three formants in a sample of vowel utterances by male talkers. The band width was found to be essentially constant and independent of the particular vowel. The mean values for bars 1, 2, and 3 were 130, 100, and 185 cps, respectively. Ten percent of the 300 band widths measured were less than 90 cps, and ten percent were greater than 260 cps.

... Measurements of the formants in the radiated voice sound give an estimate of B with a resolution comparable to the fundamental frequency f0 (e.g., Bogert, 1953). Improved frequency resolution has been achieved with swept sine excitation at the neck with the glottis held closed (Van den Berg, 1955) and held open. ...
... The top graph in Figure 4 determined an empirical f0-dependent relationship between fFi and BFi based on published swept sine excitation measurements with a closed glottis (Fant, 1961; ...). This relationship is shown in Figure 4 ... and 185 Hz respectively (Bogert, 1953) ... (where fF1 is close to the fR1 measured in this study) and up to 140% for higher resonances. ...
Thesis
Speech and singing are of enormous importance to human culture, yet the physics that underlies the production and control of the voice is incompletely understood, and its parameters are not well known, mainly due to the difficulty of accessing them in vivo. In the simplified but well-accepted source-filter model, non-linear vocal fold oscillation produces a sound source at a fundamental frequency and its multiples; the resonances of the vocal tract filter the spectral envelope of the sound to produce voice formants. In this thesis, both source and tract properties are studied experimentally, and an in vitro experiment investigates how the filter can affect the source. The control of fundamental frequency by either air supply or mechanical control parameters is investigated ex vivo using excised human larynges. All else being equal, and excluding the four types of discontinuity or hysteresis observed, the fundamental frequency was found to be proportional to the square root of subglottal pressure, which has implications for singing and speech production, particularly in tonal languages. Additionally, airflow through the glottis causes a narrowing of the aryepiglottic tube and can initiate ventricular and/or aryepiglottic fold oscillation without muscular control. The acoustic impedance of the vocal tract was measured in vivo over a range of 9 octaves and 80 dB dynamic range with the glottis closed and during phonation. The frequencies, magnitudes and bandwidths were measured for the acoustic and for the mechanical resonances of the surrounding tissues. The bandwidths and the energy losses in the vocal tract that cause them were found to be five-fold higher than the viscothermal losses of a dry, smooth rigid cylinder, and to increase during phonation. Using a simple vocal tract model and measurements during inhalation, the subglottal system resonances were also estimated.
The possible effects of the filter on the source are demonstrated in an experiment on a water-filled latex vocal fold replica: changing the aero-acoustic load of the model tract by inserting a straw at the model lips changes the fundamental frequency. This result is discussed in the context of straw phonation used in speech therapy.
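The square-root relationship reported in this abstract can be illustrated with a short worked example. The sketch below assumes only the stated proportionality (fundamental frequency proportional to the square root of subglottal pressure, all else being equal); the reference frequency and the pressure values are hypothetical and are not taken from the thesis.

```python
import math

def scaled_f0(f0_ref_hz, p_ref, p_new):
    """Predict f0 at pressure p_new, assuming f0 is proportional to sqrt(pressure).

    f0_ref_hz is the fundamental frequency observed at pressure p_ref.
    Both pressures must be positive and in the same unit.
    """
    return f0_ref_hz * math.sqrt(p_new / p_ref)

def semitone_shift(p_ref, p_new):
    """Pitch change in semitones implied by the same square-root assumption."""
    return 12.0 * math.log2(math.sqrt(p_new / p_ref))

# Hypothetical values: 100 Hz at 0.8 kPa, pressure then doubled to 1.6 kPa.
f0_new = scaled_f0(100.0, 0.8, 1.6)
shift = semitone_shift(0.8, 1.6)
print(f"predicted f0: {f0_new:.1f} Hz, shift: {shift:.2f} semitones")
```

Under this assumption, doubling the pressure raises the pitch by six semitones (half an octave), because the frequency ratio is the square root of 2.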
... The goal of these post-war studies was to specify identifiable characteristics of individual segments of speech, both acoustic features and theoretical assumptions. A fixed consonantal environment, /hVd/, was often employed as listening stimuli (e.g., Bogert, 1953; Peterson & Barney, 1952; Potter & Steinberg, 1950); however, the justification was distinct from Fletcher's. Rather than employing the fixed frame for practicality, it was adopted to obtain a "practically steady state" (Peterson & Barney, 1952, p. 177), reasoned to reflect the vowel as it was intended by the speaker or a vowel's most characteristic representation (Lehiste & Peterson, 1961). ...
Thesis
A growing body of research is exploring second language (L2) learners’ listening perception of vowel contrasts. Conventionally, researchers have estimated how well listeners differentiate between L2 vowels with isolated words (or syllables) in a fixed consonantal frame, such as b-vowel-t (e.g., beat-bit). However, there is a dearth of research that systematically examines how well results generalise beyond isolated frames or the suitability of employing more phonologically and sententially diverse listening prompt types for assessing L2 vowel perception. To address this gap, two studies investigated the effects of using b-vowel-t and more diverse prompt types for assessing intermediate-advanced adult L2 perception of English /i/-/ɪ/ and /ɛ/-/æ/ vowel pairs. Prompt performance was measured for internal consistency, congruence with the Perceptual Assimilation Model for L2 speech learning (Best & Tyler, 2007), and listeners’ subjective experiences with each prompt type. Mixed effects modelling investigated the predictive power of b-vowel-t performance on more diverse prompt types. Study 1 explored prompt performance using closed-set, forced choice tasks with first language (L1) Mandarin and Korean listeners. Study 2 investigated the effect of Mandarin and Spanish L1 listeners’ target word familiarity and associations with sentence prompts using transcription-response tasks and self-report surveys. Both studies found that diverse prompts had adequate internal consistency and aligned with PAM-L2 predictions. B-vowel-t prompts poorly generalised to diverse prompts and accorded less with PAM-L2 predictions. Survey results showed increased demands from more diverse prompt types based on participants’ ratings; however, this did not always correspond to lower performance. Collectively, results indicate utility in employing prompts beyond isolated words in a fixed consonantal frame for laboratory and at-home administrations. 
These findings contribute to the vowel perception literature by evaluating and extending the scope of prompts which may be used.
... The first popular approach involves the application of parametric models, such as linear prediction (LP) analysis, to derive speech parameters that compactly describe the resonance properties of a series of acoustic tubes (Atal and Hanauer, 1971). Alternatively, model-free estimation of formant parameters can be performed directly on the time-domain waveform (House and Stevens, 1958) or speech spectrum (Bogert, 1953; Dunn, 1961). Bandwidth estimation has proven difficult using these approaches, and thus investigators have resorted to applying empirically derived relationships between formant frequency and bandwidth (Fant, 1972; Hawks and Miller, 1995; Tappert et al., 1963) or to simply fixing the formant bandwidths to standard values (Olive, 1971; Iseli et al., 2007; Deng et al., 2006). ...
Article
Formant bandwidth estimation is often observed to be more challenging than the estimation of formant center frequencies due to the presence of multiple glottal pulses within a period and short closed-phase durations. This study explores inherently different statistical properties between linear prediction (LP)–based estimates of formant frequencies and their corresponding bandwidths that may be explained in part by the statistical bounds on the variances of estimated LP coefficients. A theoretical analysis of the Cramér-Rao bounds on LP estimator variance indicates that the accuracy of bandwidth estimation is approximately half that of center frequency estimation. Monte Carlo simulations of all-pole vowels with stochastic and mixed-source excitation demonstrate that the distributions of estimated LP coefficients exhibit, as expected, different variances for each coefficient. Transforming the LP coefficients to formant parameters results in variances of bandwidth estimates being typically larger than the variances of respective center frequency estimates, depending on vowel type and fundamental frequency. These results provide additional evidence underlying the challenge of formant bandwidth estimation due to inherent statistical properties of LP-based speech analysis.
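The step of transforming LP coefficients to formant parameters can be sketched with the common pole-based conversion, in which each complex pole of the all-pole filter gives a center frequency from its angle and a bandwidth from its radius. This is a minimal illustration and not necessarily the procedure used in the article; the function names, the sampling rate, and the formant values are hypothetical.

```python
import numpy as np

def formants_to_lp(freqs_hz, bws_hz, fs):
    """Build LP coefficients [1, a1, ..., ap] from formant frequencies and bandwidths.

    Each formant contributes a conjugate pole pair with radius
    r = exp(-pi * B / fs) and angle theta = 2 * pi * F / fs.
    """
    a = np.array([1.0])
    for f, b in zip(freqs_hz, bws_hz):
        r = np.exp(-np.pi * b / fs)
        theta = 2.0 * np.pi * f / fs
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    return a

def lp_to_formants(a, fs):
    """Recover (frequencies, bandwidths) in Hz from LP coefficients."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one pole of each conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bws[order]

# Hypothetical three-formant vowel at a 10 kHz sampling rate.
fs = 10000.0
true_f = [500.0, 1500.0, 2500.0]
true_b = [60.0, 90.0, 120.0]
a = formants_to_lp(true_f, true_b, fs)
est_f, est_b = lp_to_formants(a, fs)
print(np.round(est_f, 3), np.round(est_b, 3))
```

Because the bandwidth depends on the logarithm of the pole radius, and the radius is close to 1 for narrow formants, small perturbations of the coefficients move the bandwidth estimate proportionally more than the frequency estimate. That observation is consistent with the difficulty the abstract describes, although the sketch itself uses exact coefficients and does not simulate estimation error.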
Chapter
Whereas we have so far followed the path of the information-bearing signals exclusively within the domain of the physical transmission media, we now turn to the fate of the signals at the receiving communication partner, the percipient, i.e., in the psychophysiological domain.
Chapter
The most important laws of sound radiation can be illustrated clearly using idealized radiator types, the "spherical radiators", and a further radiator type, the "piston diaphragm".
Chapter
Thus far we have considered the structure and function of the portions of the respiratory system involved in phonation, some of the basic principles of sound production, and the special adaptation of the breathing mechanism for the voice. It now remains for us to examine the processes by which vocal sound is converted into speech and further refined for song. In this chapter our attention will be turned especially upon a consideration of what is good speech, how teaching can be brought to bear on the improvement of speech, questions of special importance to those whose livelihood is more or less dependent upon speech, and finally a few speech problems.
Article
Telephone, radio broadcasting, public-address and bandwidth conserving systems are discussed with particular attention being given to the latter two. The relation of certain properties of hearing (e.g., the Haas effect) to public address system design is reviewed along with several bandwidth conserving techniques including speech interpolation systems and vocoders.
Article
We describe a computer model of the human vocal cords and vocal tract that is amenable to dynamic control by parameters directly identified in the human physiology. The control format consequently provides an efficient, parsimonious description of speech information. The control parameters represent subglottal lung pressure, vocal-cord tension and rest opening, vocal-tract shape, and nasal coupling. Using these inputs, we synthesize vowel-consonant-vowel syllables to demonstrate the dynamic behavior of the cord/tract model. We show that inherent properties of the model duplicate phenomena observed in human speech; in particular, cord/tract acoustic interaction, cord vibration, and tract-wall radiation during occlusion, and voicing onset-offset behavior. Finally, we describe an approach to deriving the physiological controls automatically from printed text, and we present sentence-length synthesis obtained from a preliminary system.
Technical Report
This paper describes a dataset of formant patterns measured in the steady-states of recorded Japanese vowels. Five adult, male, native speakers of Japanese were selected from the "ETL-WD-I and II" balanced word dataset; and for each of the five vowels / i, e, a, o, /, 22 different words were selected on the basis of consistently finding the lengthiest and most steady-state vocalic nuclei. A semi-supervised method based on linear-prediction (LP) analysis of the speech waveform was then used to carefully extract the first four formants in five consecutive frames of each vocalic nucleus, thereby yielding a total of 2750 patterns of formant-frequencies {F1, F2, F3, F4} and formant-bandwidths {B1, B2, B3, B4}. These formant patterns are offered in electronic form, with the aim of contributing to the small but growing body of publicly available formant data. The formant data can be downloaded from here: http://isd.pu-toyama.ac.jp/~parham/proj_FormantDataETL.html
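The reported total of 2750 formant patterns can be checked against the counts given in this abstract. The sketch below assumes that the 22 words were selected per vowel for each of the five speakers, a reading that the abstract does not state explicitly but that reproduces the reported total; the variable names are illustrative.

```python
# Counts taken from the abstract; the per-speaker reading of "22 words" is an assumption.
speakers = 5
vowels = 5
words_per_vowel = 22
frames_per_nucleus = 5

total_patterns = speakers * vowels * words_per_vowel * frames_per_nucleus
print(total_patterns)  # 2750
```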
Article
This paper reviews the history of stereophonic sound reproduction. It includes a comprehensive bibliography; the philosophical and psychological research on stereophony; methods used to make disc recordings and tape recordings on two channels; description of large auditorium motion picture systems for stereophonic effects; FM broadcasting of stereophonic programs on a multiplex basis and brief descriptions of pseudo-stereophonic reproduction systems for the home.
Article
This paper represents an effort to update and expand an earlier paper, "Speech Analysis, Synthesis, and Processing - A Selected Bibliography," prepared and published by this author at Texas Instruments Inc., Dallas, in 1963. Selections included in this paper will provide the researcher with a fairly extensive and representative presentation of relevant source material in the areas to which the title refers.
Article
An automated technique is presented which employs the systems identification properties of the digital inverse filter (IF) [8] for the classification and assessment of laryngeal dysfunction. The information is contained in the positions of the IF polynomial zeros in the complex plane as the IF is computed repeatedly over small analysis segments of a speech sample. A graphic display of the z-plane roots and a vector of pattern features of that display result for each case. The vectors are then processed by an automated clustering procedure to classify the cases in the feature space. The results of the analysis of a large test battery of acoustically degraded synthetic vowel sounds using the IF method are presented.