FIG 4 - uploaded by Jont B. Allen
(Color online) Peak response of Radio Shack sound level meters.


Source publication
Article
The Volume-Unit (VU) meter, used in speech research prior to the advent of computers and modern signal processing methods, is described in signal processing terms. There are no known software implementations of this meter that meet the 1954 ASA standard and provide the instantaneous needle level. Important speech applications will be explored, s...

Contexts in source publication

Context 1
... 6. (Color online) The solid line shows the cumulative distribution of VU levels (generated by VUSOFT) relative to the rms of speech, and compares it to the method of level measurement used by Dunn and White (1940). The dash-dotted line shows the cumulative distribution of rms levels in 1/8 s intervals, which is identical to the data shown in Fig. 4 of French and Steinberg (1947), taken from Dunn and White (1940). The idealized result of French and Steinberg is shown with the dashed line. For the solid line, the abscissa is the VU level (in dB vu) minus the long-term rms level in decibels (computed over the whole speech recording, typically several minutes). For the dashed and dash-dotted lines, the abscissa is the ratio (in decibels) of the rms in 1/8 s intervals to the long-term rms level. The ordinate is the percentage of 1/8 s intervals or VU levels (equally spaced in time) that are greater than the level shown on the abscissa. ...
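The dash-dotted curve described in this caption (percentage of 1/8 s intervals whose rms exceeds a given level relative to the long-term rms) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `interval_levels_db` and `percent_exceeding` are hypothetical, and the four-interval toy signal stands in for real speech.

```python
import math

def interval_levels_db(samples, fs, interval_s=0.125):
    """Return the rms level of each interval in dB relative to the long-term rms."""
    n = int(round(fs * interval_s))              # samples per 1/8 s interval
    long_term_ms = sum(x * x for x in samples) / len(samples)
    levels = []
    for start in range(0, len(samples) - n + 1, n):
        block = samples[start:start + n]
        ms = sum(x * x for x in block) / n       # mean square of this interval
        # Silent intervals have no finite level in dB.
        levels.append(10.0 * math.log10(ms / long_term_ms) if ms > 0 else -math.inf)
    return levels

def percent_exceeding(levels_db, threshold_db):
    """Percentage of intervals whose level is greater than threshold_db."""
    above = sum(1 for level in levels_db if level > threshold_db)
    return 100.0 * above / len(levels_db)

# Toy signal (an assumption, not speech): three quiet intervals and one loud one.
fs = 800                                          # hypothetical sample rate, Hz
toy = [1.0] * 300 + [3.0] * 100
levels = interval_levels_db(toy, fs)
for threshold in (-10.0, 0.0, 10.0):
    print(threshold, percent_exceeding(levels, threshold))
```

Sweeping `threshold_db` over a range of levels and plotting the returned percentages would give a curve of the same kind as the one in the caption; the solid curve would use periodically sampled VU levels in place of the interval rms values.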
Context 2
... rise time. This difference in the transient response causes an average difference in the VU reading for short syllables of −1.6 dB, as shown in Fig. 3. The recent vintage VU meter meets the specified transient response, as does VUSOFT. To understand the sensitivity of these differences with speech as the input, we measured peak VU meter levels of 40 speech recordings. We tested speech material consisting of isolated consonant-vowel pairs. A computer was used to store and play back the sounds into the two hardware VU meters, and the largest displacement of the VU meter needle was recorded. All speech sounds were normalized to read 0 dB vu using VUSOFT. A calibration tone, specified by the ASA standard (ASA, 1954), was used to ensure that all three VU meters were identically calibrated. Figure 3 shows a histogram of the peak VU levels of the hardware VU meters and VUSOFT. The mean difference between the 1950s vintage VU meter and VUSOFT is −1.6 dB vu, with a standard deviation of 0.37 dB vu. The mean difference between the recent vintage VU meter and VUSOFT is 0.009 dB vu, with a standard deviation of 0.09 dB vu. The recent vintage VU meter provides readings that are more consistent with VUSOFT because the two have more similar transient responses. Two Radio Shack sound level meters, one digital and one analog (catalog numbers 33-2055 and 33-4050), were purchased and tested to determine whether they would be a suitable substitute for a VU meter. The peak responses of these instruments are shown in Fig. 4, compared to VUSOFT. The digital sound level meter has a peak response that rises much faster than VUSOFT (and thus also faster than the ASA standard), while the analog meter response is slower. It was also determined that the response of both of the Radio Shack sound level meters depends on the SPL range setting (i.e., the transient response is different depending on whether it is set to read 60–70 dB or 70–80 dB, etc.).
Thus, as noted in Radio Shack’s manual, neither of these meters conforms to any VU meter standard specification. Reading a VU meter is more of an art than a science. The duration of the recording turns out to be a critical variable, as we shall show next. With regard to the reading method, the ASA standard for VU meters reads as follows (ASA, 1954): “The reading is determined by the greatest deflections occurring in a period of about a minute for program waves, or a shorter period (e.g., 5 to 10 s) for message telephone speech waves, excluding not more than one or two deflections of unusual amplitude.” The authors asked several “experts” how they read VU meters. We were told to pick the three highest levels for a segment of speech material and average them together. This method is claimed to be less subjective, and purported by the experts to be the true “standard method” for reading the VU level of speech material. Figure 5 shows the waveform of a speech signal along with the VUSOFT output. High speech levels occur less frequently than low speech levels. Because the tails of the probability distribution have small probability, the longer the recording, the higher the peak level. In other words, “The longer you measure, the larger the VU level you will record.” The goal in the following study is to quantify the relationship between the rms level, the time duration of the speech sample, and the peak VU level. Our results are derived from a histogram of the VUSOFT output for 26 hours of speech as well as a count of the VUSOFT output peaks and the amplitude of those peaks. VU levels reported on in this section were generated exclusively by VUSOFT. All the speech material was normalized to the same rms level (computed over the whole speech file, typically several minutes). The speech material was from a corpus titled “ICSI Meeting Speech” produced by the Linguistic Data Consortium, catalog number 2004S02.
The speech involved approximately equal numbers of male and female talkers conversing. This speech material was chosen because it was conversational in nature, involved a large number of speakers, and was never compressed or otherwise modified. Figure 5 illustrates the peaks in the VU meter output for a particular speech phrase. The term percentage of intervals refers to the VU level compared to the distribution of VU levels (with VU levels sampled periodically). When we speak of a percentage level of 90%, the level is greater than 90% of other levels observed in speech for a fixed speech rms level. The horizontal lines in Fig. 5 show the 80%, 85%, and 90% levels for a particular speech recording. The solid line in Fig. 6 shows the cumulative distribution of VU levels relative to the rms of speech. This figure was generated by computing levels for the speech material described above, and making a histogram of those levels. The histogram was converted to a cumulative level distribution where the levels are given relative to the rms level. The dashed line in Fig. 6 is the result from Fig. 4 of French and Steinberg (1947), which was computed from the data of Dunn and White (1940) and Sivian (1929). It is not surprising that the relationship for the cumulative distribution of VU levels is similar to the result of Dunn and White (1940) because the meter has a similar frequency response to the 1/8 s window used by Dunn and White (1940), as illustrated in Fig. 7. Figure 8 shows the relationship between the time duration that the VU meter level is monitored and the ratio of the VU peak level to the rms level, in dB. For each level, the number of peaks of that level was counted. The average length of time between the peaks of each level was computed by dividing the length of the speech material by the number of peaks counted.
Figure 8 is particularly important because it allows one to compare the VU meter method described in the ASA standard to the rms level. We are unaware of any ASA 1954 compliant software VU meter simulations that provide the instantaneous numerical needle position. Such a software simulation is necessary for comparison with other speech level measures (such as rms) and also for automated level control using the VU meter in modern computer-controlled speech experiments. The ideal VU meter is a full-wave rectifier followed by a second-order low-pass system. The VU meter level is reported in dB vu referenced to a 1 kHz sine wave that will dissipate 1 mW into a 600 Ω resistor. MATLAB code (called VUSOFT) that implements the standard can be found in Appendix B. Our VU meter reading method is to observe the highest peak. Figure 8 shows how the largest peak depends on observation duration. The ASA-specified reading method states that the VU level is the “greatest deflections occurring in a period of about a minute for program waves, or a shorter period (e.g., 5 to 10 s) for message telephone speech waves, excluding not more than one or two deflections of unusual amplitude.” From Fig. 8 we conclude that the VU level observed over 5 to 10 s intervals will be 6–9 dB higher than the rms level, and that the VU level observed over a 1 min interval will be roughly 12 dB higher than the rms level. The transient responses of a 1950s vintage VU meter and a recent vintage VU meter were evaluated to confirm that we have accurately duplicated their behavior with VUSOFT. All three VU meters were very close to the standard specified response, leading us to conclude that we had properly interpreted the standard and duplicated it in VUSOFT. The 1950s vintage VU meter had an overshoot that was 1.75% greater, and a peak time 0.06 s longer, than that of the standard, while the recent vintage VU meter had a nearly identical transient response to the standard (Fig. 2).
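The reading methods and the peak-spacing computation discussed in this excerpt can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the trace is assumed to be a list of VU levels in dB vu sampled at `fs` samples per second, peaks are taken to be simple local maxima, and counting peaks "at or above" a level is an assumption about how the levels were binned.

```python
import math

def find_peaks(trace):
    """Indices of local maxima (a flat top is counted once, at its first sample)."""
    return [i for i in range(1, len(trace) - 1)
            if trace[i - 1] < trace[i] >= trace[i + 1]]

def highest_peak(trace_db):
    """Reading method used in the excerpt: the largest observed peak."""
    return max(trace_db[i] for i in find_peaks(trace_db))

def mean_of_top_peaks(trace_db, count=3):
    """The 'experts' method: average the `count` highest peak levels (in dB vu)."""
    peaks = sorted((trace_db[i] for i in find_peaks(trace_db)), reverse=True)
    top = peaks[:count]
    return sum(top) / len(top)

def mean_time_between_peaks(trace_db, fs, level_db):
    """Recording length divided by the number of peaks at or above level_db."""
    n_peaks = sum(1 for i in find_peaks(trace_db) if trace_db[i] >= level_db)
    duration_s = len(trace_db) / fs
    return duration_s / n_peaks if n_peaks else math.inf

# Toy trace (made-up numbers, one sample per second), not real VU data.
toy_trace = [0, 1, 0, 3, 0, 2, 0, 5, 0, 4, 0]
print(highest_peak(toy_trace))                        # 5
print(mean_of_top_peaks(toy_trace))                   # 4.0
print(mean_time_between_peaks(toy_trace, 1.0, 3.0))   # 11 s / 3 peaks
```

Evaluating `mean_time_between_peaks` over a range of levels gives, for each peak level, the typical observation time needed to see it, which is the kind of relationship the excerpt attributes to Fig. 8.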
For short speech sounds, the peak level measured by the 1950s vintage VU meter was 1.6 dB vu lower on average than that measured by VUSOFT, while for the same set of speech sounds, the recent vintage VU meter differed from VUSOFT by 0.009 dB vu, on average. The transient responses of two Radio Shack “Sound Level Meters” were compared to the transient response of the VU meter to determine whether they would make a suitable substitute for a VU meter. The sound level meters had a significantly different transient response and therefore would result in different observed levels. The sound level meter standard published by the American National Standards Institute (ANSI) is different from the ASA VU meter standard, and will provide different level measurements for speech as a result of its different transient response. For example, the ANSI meter standard indicates that the needle level shall have an overshoot of 0 to 1.1 dB for the “fast response” setting and 0 to 1.6 dB for the “slow response” setting, which is significantly larger than the 0.09 to 0.13 dB overshoot specified for ASA standard VU meters. An ANSI sound level meter could potentially be used to measure speech levels; however, its specifications are less tight than those of the ASA VU meter standard and would therefore not be conducive to reproducibility between sound level meter instruments. It is important when measuring speech levels to know that the transient response of the measurement device has a significant impact on the observed level, that the “VU meter” has tight specifications, and that not every level measurement device is a VU meter. Figures 2 and 3 illustrate how a small difference in transient response leads to an average difference of 1.6 dB vu for short speech sounds. The intensity just noticeable difference (JND) is less than this value.
The noise level and the signal-to-noise ratio (SNR) are critical components of many types of speech perception experiments; thus we would like to know how the rms measurement of noise compares to the VU-based measurement. For Gaussian noise the average absolute value is $\sigma\sqrt{2/\pi}$, where $\sigma^2$ is the variance of the noise (measured in volts squared). The VU level of the noise is then 20 ...
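The average-absolute-value result for Gaussian noise quoted above can be checked numerically. The sketch below is an illustration, not part of the paper: the value of `sigma`, the sample count, and the seed are arbitrary choices, and the closing comparison with a sine wave is a general property of the two waveforms rather than a reconstruction of the truncated expression.

```python
import math
import random

def mean_abs_gaussian(sigma):
    """Closed-form average absolute value of zero-mean Gaussian noise."""
    return sigma * math.sqrt(2.0 / math.pi)

# Monte Carlo check with arbitrary, illustrative settings.
rng = random.Random(0)
sigma = 0.5
n = 200_000
empirical = sum(abs(rng.gauss(0.0, sigma)) for _ in range(n)) / n
print(empirical, mean_abs_gaussian(sigma))

# Ratio of average absolute value to rms, expressed in dB.
noise_db = 20.0 * math.log10(math.sqrt(2.0 / math.pi))      # Gaussian noise
sine_db = 20.0 * math.log10(2.0 * math.sqrt(2.0) / math.pi)  # sine wave
print(noise_db, sine_db, noise_db - sine_db)
```

Because a VU meter responds to the rectified average and is calibrated with a sine wave, the last line suggests that Gaussian noise would read roughly 1 dB lower than a sine wave of the same rms level.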
Context 3
... signal, with the proper second-order system, and scaling and conversion to decibels. The second-order system with the response described by the standard is a low-pass filter with a very low cutoff frequency (around 8 Hz). Conceptually, that means the VU level is a moving average of the absolute value of the input signal. For periodic or steady signals such as a tone or noise, the VU level is the average absolute value of the signal. The parameters of the continuous and discrete time second-order systems are derived in Appendix A. The MATLAB code that implements the VU meter standard, dubbed VUSOFT, is given in Appendix B. A VU meter reads in decibels, $20\log_{10}(V/V_{\mathrm{ref}})$, where $V$ is the meter voltage and $V_{\mathrm{ref}}$ is the level of a 1 kHz tone that will deliver 1 mW into a 600 Ω impedance. Thus $V_{\mathrm{ref}} = (2/\pi)\sqrt{2 \cdot 600 \cdot 0.001}$ V, which is about −3 dBV. A full-wave rectifier generates harmonics. In a discrete time simulation of a VU meter such harmonics alias, causing the simulated VU meter to breach the standard (i.e., no variations are allowed larger than 0.2 dB from the response to a steady tone at 1 kHz). This problem is solved by an up-sample rate conversion of the discrete time input signal to at least eight times its original rate before the full-wave rectifier (Oppenheim and Schafer, 1998). The ASA standard refers to a nonlinearity in the rectifier used in VU meters, “the exponent of whose characteristic is 1.2 ± 0.2.” A 1950s vintage VU meter was examined (further details in Sec. III and in Appendix C) to determine the effects of any such nonlinearity on the ballistics of that VU meter. It was discovered that the VU meter faceplate is graduated in a way that removes the effect of the nonlinearity, and that the nonlinearity has a negligible effect on the ballistics of the VU meter needle. VUSOFT was designed based on the specifications in Sec. II. The MATLAB code and derivation can be found in Appendices B and A, respectively.
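The signal chain described in this excerpt (full-wave rectifier, second-order low-pass system, conversion to dB vu) can be sketched as below. This is not VUSOFT: the damping ratio `ZETA` and natural frequency `OMEGA_N` are illustrative placeholders rather than the values derived in Appendix A, the filter is integrated with a simple semi-implicit Euler step, and the eight-times upsampling step is omitted, so the sketch makes no claim to meet the 0.2 dB tolerance for arbitrary inputs.

```python
import math

R_OHMS = 600.0
P_WATTS = 0.001
V_PEAK_REF = math.sqrt(2.0 * R_OHMS * P_WATTS)   # peak of the 1 mW reference sine
V_REF = (2.0 / math.pi) * V_PEAK_REF             # its average absolute value
print(V_REF, 20.0 * math.log10(V_REF))           # about 0.697 V, about -3.13 dBV

ZETA = 0.81      # assumed damping ratio (placeholder)
OMEGA_N = 15.0   # assumed natural frequency in rad/s (placeholder)

def vu_trace(samples, fs):
    """Rectify the input and smooth it with a unity-gain second-order low-pass."""
    dt = 1.0 / fs
    y = 0.0      # simulated needle position, in volts
    v = 0.0      # needle velocity
    out = []
    for x in samples:
        r = abs(x)                                           # full-wave rectifier
        v += dt * (OMEGA_N ** 2 * (r - y) - 2.0 * ZETA * OMEGA_N * v)
        y += dt * v
        out.append(y)
    return out

def to_db_vu(level_volts):
    """Convert a needle level in volts to dB vu."""
    return 20.0 * math.log10(level_volts / V_REF)

# A steady 1 kHz reference tone should settle near 0 dB vu.
fs = 48000
tone = [V_PEAK_REF * math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(fs)]
needle = vu_trace(tone, fs)
final_db_vu = to_db_vu(needle[-1])
print(final_db_vu)
```

The small residual offset from 0 dB vu in this sketch comes from sampling the rectified tone at a finite rate, which is the aliasing issue that the upsampling step in the excerpt is meant to address.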
To verify that VUSOFT implements the ASA VU meter standard correctly, it was compared with a 1950s vintage VU meter and a recent vintage VU meter. The 1950s vintage hardware VU meter was labeled “VOLUME INDICATOR, Type 911-B, Ser. No. D-8941, The Daven Co., Newark NJ.” The recent vintage VU meter was manufactured by Simpson Electric Co. (520 Simpson Avenue, Lac du Flambeau, WI 54538). The transient responses of the three meters were compared, along with the peak VU level with short speech sounds. The response of a second-order system can be described by any two of several parameters. The two easiest parameters to measure are the peak time $t_p$ and the overshoot $M_p$. The peak time $t_p$ is the amount of time it takes for the step response of a system to reach its highest level. The overshoot is the amount by which the step response of a system will exceed its final value. The overshoot is measured by applying a long-duration reference tone and then noting by how much the meter needle exceeds its final value. The peak time is measured by playing successively longer reference tones, until increasing the length of the reference tone no longer increases the maximum level the needle reaches. The length of the tone at which the maximum level reached no longer increases is taken as the peak time. Figure 2 shows the step response of the three VU meters. Note that the 1950s vintage hardware VU meter used in this study does not meet the ASA specification, since the overshoot is too large, and it has a slightly longer rise time. This difference in the transient response causes an average difference in the VU reading for short syllables of −1.6 dB, as shown in Fig. 3. The recent vintage VU meter meets the specified transient response, as does VUSOFT. To understand the sensitivity of these differences with speech as the input, we measured peak VU meter levels of 40 speech recordings. We tested speech material consisting of isolated consonant-vowel pairs.
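As an illustration of how the peak time $t_p$ and overshoot $M_p$ pin down a second-order system, the sketch below uses the standard textbook relations for an underdamped system to recover the damping ratio and natural frequency, then confirms them against the analytic step response. The numbers are assumptions chosen for illustration (a 1.25% overshoot and a 0.35 s peak time); they are not the values from the standard or from Appendix A.

```python
import math

def second_order_params(overshoot, peak_time):
    """Damping ratio and natural frequency (rad/s) of an underdamped system.

    overshoot is fractional (0.0125 means 1.25%); peak_time is in seconds.
    """
    log_mp = math.log(overshoot)
    zeta = -log_mp / math.sqrt(math.pi ** 2 + log_mp ** 2)
    omega_d = math.pi / peak_time                 # damped frequency
    omega_n = omega_d / math.sqrt(1.0 - zeta ** 2)
    return zeta, omega_n

def step_response(t, zeta, omega_n):
    """Analytic unit-step response of the underdamped second-order system."""
    root = math.sqrt(1.0 - zeta ** 2)
    omega_d = omega_n * root
    decay = math.exp(-zeta * omega_n * t)
    return 1.0 - decay * (math.cos(omega_d * t) + (zeta / root) * math.sin(omega_d * t))

def measure_peak(zeta, omega_n, dt=1e-4, t_end=2.0):
    """Scan the step response and return (time of maximum, fractional overshoot)."""
    best_t, best_y = 0.0, 0.0
    for k in range(int(t_end / dt) + 1):
        t = k * dt
        y = step_response(t, zeta, omega_n)
        if y > best_y:
            best_t, best_y = t, y
    return best_t, best_y - 1.0

zeta, omega_n = second_order_params(0.0125, 0.35)   # illustrative inputs
t_meas, mp_meas = measure_peak(zeta, omega_n)
print(zeta, omega_n, t_meas, mp_meas)
print(20.0 * math.log10(1.0 + mp_meas))             # overshoot in dB, about 0.11
```

The scan in `measure_peak` mirrors the measurement described above: the peak time is where the response stops growing, and the overshoot is how far that maximum exceeds the final value.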

Citations

... The DR of a signal would increase if the upper threshold is increased or if the lower threshold is reduced. It is for this reason that while Dunn and White (1940) and Beranek (1947) have reported the DR for English speech as 30 dB (as they used 99 percentile and 10 percentile thresholds), the same was reported in the 40- to 59-dB range by Jin et al. (2014), Lobdell and Allen (2007), and Rhebergen et al. (2009; as they chose 1 percentile as the lower threshold). Given the sensitivity of DR values to the choice of threshold, Moore et al. (2008) have argued against setting 1 percentile level as the lower threshold. ...
Article
Purpose In this work, we have determined the long-term average speech spectra (LTASS) and dynamic ranges (DR) of 17 Indian languages. This work is important because LTASS and DR are language-dependent functions used to fit hearing aids, calculate the Speech Intelligibility Index, and recognize speech automatically. Currently, LTASS and DR functions for English are used to fit hearing aids in India. Our work may help improve the performance of hearing aids in the Indian context. Method Speech samples from native talkers were used as stimuli in this study. Each speech sample was initially cleaned for extraneous sounds and excessively long pauses. Next, LTASS and DR functions for each language were calculated for different frequency bands. Similar analysis was also performed for English for reference purposes. Two-way analysis of variance was also conducted to understand the effects of important parameters on LTASS and DR. Finally, a one-sample t test was conducted to assess the significance of important statistical attributes of our data. Results We showed that LTASS and DR for Indian languages are 5–10 dB and 11 dB less than those for English. These differences may be due to lesser use rate of high-frequency dominant phonemes and preponderance of vowel-ending words in Indian languages. We also showed that LTASS and DR do not differ significantly across Indian languages. Hence, we propose a common LTASS and DR for Indian languages. Conclusions We showed that differences in LTASS and DR for Indian languages vis-à-vis English are large and significant. Such differences may be attributed to phonetic and linguistic characteristics of Indian languages.
... Such computations are performed for different frequency bands. The DR of speech is affected by several factors: integration time [20–22], DR definition [21,22], and frequency band [3,13,20–23]. ...
Article
Purpose: The Long-Term Average Speech Spectrum (LTASS) and Dynamic Range (DR) of speech strongly influence estimates of Speech Intelligibility Index (SII), gain and compression required for hearing aid fitting. It is also known that acoustic and linguistic characteristics of a language have a bearing on its LTASS and DR. Thus, there is a need to estimate LTASS and DR for Indian languages. The present work on three Indian languages fills this gap and contrasts LTASS and DR attributes of these languages against British English. Methods: For this purpose, LTASS and DR were measured for 21 one-third octave bands in the frequency range of 0.1 to 10 kHz for Hindi, Kannada, Indian English and British English. Results: Our work shows that the DR of Indian languages studied is 7-10 dB less relative to that of British English. We also report that LTASS levels for Indian languages are 7 dB lower relative to British English for frequencies above 1 kHz. Finally, we observed that LTASS and DR attributes across genders were more or less the same. Conclusions: Given the evidence presented in this work that LTASS and DR characteristics for Indian languages analyzed are markedly different than those for BE, there is a need to determine Indian language specific SII, as well as gain and compression parameters used in hearing aids.
... Vowels have a maximum, sustained amplitude that is usually <100 ms. This limitation and the difficulty reading a moving needle (Levitt and Bricker, 1970; Lobdell and Allen, 2007) preclude the vu meter as an appropriate instrument for the accurate measurement of speech-signal amplitudes. With digital technology, the rms of a signal is a precise way to quantify the amplitude. ...
Article
Background: The Auditec of St. Louis and the Department of Veterans Affairs (VA) recorded versions of the Northwestern University Auditory Test No. 6 (NU-6) are in common usage. Data on young adults with normal hearing for pure tones (YNH) demonstrate equal recognition performances on the two versions when the VA version is presented 5 dB higher but similar data on older listeners with sensorineural hearing loss (OHL) are lacking. Purpose: To compare word-recognition performances on the Auditec and VA versions of NU-6 presented at six presentation levels with YNH and OHL listeners. Research design: A quasi-experimental, repeated-measures design was used. Study sample: Twelve YNH (M = 24.0 years; PTA = 9.9-dB HL) and 36 OHL listeners (M = 71.6 years; PTA = 26.7-dB HL) participated in three, one-hour sessions. Data collection and analyses: Each listener received 100 stimulus words that were randomized by 6 presentation levels for each of two speakers (YNH, -2 to 28-dB SL; OHL, -2 to 38-dB SL). The sessions were limited to 25 practice and 400 experimental words. Digital versions of the 16, 25-word tracks for each session were alternated between speakers. Results: Each of the 48 listeners had higher recognition performances on the Auditec version of NU-6 than on the VA version. The respective overall recognition performances on the Auditec and VA versions were 71.4% and 64.1% (YNH) and 68.7% and 58.2% (OHL). At the highest presentation levels, recognition performances on the two versions differed by only 0.5% (YNH) and 3.3% (OHL). At the 50% correct point, performances on the Auditec version were 3.2 dB (YNH) and 6.1 dB (OHL) better than those on the VA version. The slopes at the 50% points on the mean functions for both speakers were about 4.9%/dB (YNH) and 3.0%/dB (OHL); however, the slopes evaluated from the individual listener data were steeper, 5.2 to 5.3%/dB (YNH) and 3.3 to 3.5%/dB (OHL). 
When the individual data were transformed from dB SL to dB HL, the differences between the two listener groups were emphasized. The four functions (2 speakers by 2 listener groups) were plotted for each of the 48 participants and each of the 200 words, which revealed the gamut of relations among the datasets. Examination of the data for each speaker across test sessions, in the traditional 50-word lists, and in the typically used 25-word lists of Randomization A revealed no differences of clinical concern. Finally, introspective reports from the listeners revealed that 91.7% and 83.3% of the YNH and OHL listeners, respectively, thought the Auditec speaker was easier to understand than the VA speaker. Recognition performances on each participant and on each word are presented.
... These findings suggest that either both limits of the SNR range are shifted to more positive SNRs or the width of the EDRS and the SNR range exceeds the value of 30 dB adopted by models like the SII. Other authors have estimated the EDRS for unprocessed speech to be larger than 30 dB, either based on the physical properties of speech sound or on measurable SR performance (French and Steinberg, 1947; Fletcher and Galt, 1950; Zeng et al., 2002; Lobdell and Allen, 2007; Rhebergen et al., 2009). It has been shown that for some high-level listening conditions, an EDRS of 40 dB is better suited for predicting speech intelligibility in NH listeners (Studebaker et al., 1999). ...
Article
Speech recognition was measured in 24 normal-hearing subjects for unprocessed speech and for speech processed by a cochlear implant Advanced Combination Encoder (ACE) coding strategy in quiet and at various signal-to-noise ratios (SNRs). All signals were low- or high-pass filtered to avoid ceiling effects. Surprisingly, speech recognition performance plateaus at approximately 22 dB SNR for both speech types, implying that ACE processing has no effect on the upper limit of the effective SNR range. Speech recognition improved significantly above 15 dB SNR, suggesting that the upper limit used in the Speech Intelligibility Index should be reconsidered.
... For each CV, the most intelligible recording of each talker was selected in this study. The levels of the tokens were equalized using VUSOFT, a software implementation of an analog VU-meter developed by Lobdell and Allen (2007), such that all CVs showed the same VUSOFT peak value. This equalization strategy is mainly based on the vowel levels, thus ensuring realistic relations between the levels of the individual consonants. ...
Article
This study examined the perceptual consequences of three speech enhancement schemes based on multiband nonlinear expansion of temporal envelope fluctuations between 10 and 20 Hz: (a) “idealized” envelope expansion of the speech before the addition of stationary background noise, (b) envelope expansion of the noisy speech, and (c) envelope expansion of only those time-frequency segments of the noisy speech that exhibited signal-to-noise ratios (SNRs) above −10 dB. Linear processing was considered as a reference condition. The performance was evaluated by measuring consonant recognition and consonant confusions in normal-hearing and hearing-impaired listeners using consonant-vowel nonsense syllables presented in background noise. Envelope expansion of the noisy speech showed no significant effect on the overall consonant recognition performance relative to linear processing. In contrast, SNR-based envelope expansion of the noisy speech improved the overall consonant recognition performance equivalent to a 1- to 2-dB improvement in SNR, mainly by improving the recognition of some of the stop consonants. The effect of the SNR-based envelope expansion was similar to the effect of envelope-expanding the clean speech before the addition of noise.
... The DR also depends on the procedure used to determine the speech maxima and minima. Lobdell and Allen (2007) suggested that the DR of speech exceeds 40 dB when it is defined as the difference between a 99% maximum criterion and a 1% minimum criterion, meaning that 99% of the speech signals are at or below the peak level and 1% of the speech signals fall at or below the minimum speech level. Compared to Lobdell and Allen (2007), the current DR (30 dB) in the SII is based on a narrower DR criterion (Beranek, 1947; Dunn & White, 1940). ...
... Lobdell and Allen (2007) suggested that the DR of speech exceeds 40 dB when it is defined as the difference between a 99% maximum criterion and a 1% minimum criterion, meaning that 99% of the speech signals are at or below the peak level and 1% of the speech signals fall at or below the minimum speech level. Compared to Lobdell and Allen (2007), the current DR (30 dB) in the SII is based on a narrower DR criterion (Beranek, 1947; Dunn & White, 1940). The 30-dB DR of speech was proposed by Beranek (1947) on the basis of the data from Dunn and White (1940), and the current 30-dB DR in the SII was used by French and Steinberg (1947) in their original formulation of the articulation index. ...
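The percentile-based definition of dynamic range quoted above can be illustrated with a minimal Python sketch. It is not the procedure of any of the cited papers: the function names, the nearest-rank percentile rule, and the 1/8 s analysis interval (borrowed from the Dunn and White style of interval measurement) are all assumptions made for illustration.

```python
import math

def interval_levels_db(samples, fs, interval_s=0.125):
    """RMS level in dB (re amplitude 1.0) of consecutive non-overlapping intervals.

    ASSUMPTION: 1/8 s intervals; the quoted text does not fix the interval length.
    """
    n = int(fs * interval_s)
    levels = []
    for start in range(0, len(samples) - n + 1, n):
        block = samples[start:start + n]
        rms = math.sqrt(sum(s * s for s in block) / n)
        if rms > 0.0:  # skip digital silence, whose dB level is undefined
            levels.append(20.0 * math.log10(rms))
    return levels

def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least q% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def dynamic_range_db(levels_db, upper=99, lower=1):
    """Dynamic range as the level difference between the upper and lower percentiles."""
    return percentile(levels_db, upper) - percentile(levels_db, lower)
```

With `upper=99` and `lower=1` this corresponds to the wider criterion described in the excerpt; narrowing the two percentiles gives a smaller dynamic range for the same recording.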
Article
Purpose: This study aims to evaluate the sensitivity of the speech intelligibility index (SII) to the assumed speech dynamic range (DR) in different languages and with different types of stimuli. Method: Intelligibility prediction uses the absolute transfer function (ATF) to map the SII value to the predicted intelligibility for a given stimulus. To evaluate the sensitivity of the predicted intelligibility to the assumed DR, ATF-transformed SII scores for English (words), Korean (sentences), and Mandarin (sentences) were derived for DRs ranging from 10 dB to 60 dB. Results: Increasing the assumed DR caused steeper ATFs for all languages. However, high correlation coefficients between predicted and measured intelligibility scores were observed for DRs from 20 dB to 60 dB for ATFs in English, Korean, and Mandarin. Conclusions: Results of the present study indicate that the intelligibility computed from the SII is not sensitive to the assumed DR. The 30-dB DR commonly used in computing the SII is thus a reasonable assumption that produces accurate predictions for different languages and different types of stimuli.
... Six stop consonants were spoken by five talkers and the remaining by another five talkers, resulting in 80 tokens (5 talkers × 16 CVs) total. As acoustic analysis and stability of these tokens were carefully evaluated by Phatak and Allen [13], each token was level-normalized before presentation using VU-METER software [14]. The purpose of dividing syllables among talkers was to create a diversity of talkers and simultaneously shorten experiment time. ...
... The speech tokens were equalized based on the peak level of an analog VU-meter simulation that responds sluggishly to the input signal (VUSOFT; Lobdell and Allen, 2007), such that they exhibited similar vowel levels while the consonant levels differed (cf. Zaar and Dau, 2015). ...
Article
The perception of consonants in background noise has been investigated in various studies and was shown to critically depend on fine details in the stimuli. In this study, a microscopic speech perception model is proposed that represents an extension of the auditory signal processing model by Dau, Kollmeier, and Kohlrausch [(1997). J. Acoust. Soc. Am. 102, 2892–2905]. The model was evaluated based on the extensive consonant perception data set provided by Zaar and Dau [(2015). J. Acoust. Soc. Am. 138, 1253–1267], which was obtained with normal-hearing listeners using 15 consonant-vowel combinations mixed with white noise. Accurate predictions of the consonant recognition scores were obtained across a large range of signal-to-noise ratios. Furthermore, the model yielded convincing predictions of the consonant confusion scores, such that the predicted errors were clustered in perceptually plausible confusion groups. The large predictive power of the proposed model suggests that adaptive processes in the auditory preprocessing in combination with a cross-correlation based template-matching back end can account for some of the processes underlying consonant perception in normal-hearing listeners. The proposed model may provide a valuable framework, e.g., for investigating the effects of hearing impairment and hearing-aid signal processing on phoneme recognition.
... [2,4] This electromechanical instrument measures the average value, but its ballistics and calibration bring its reading close to the RMS value. In the document, this is indicated by the exponent p=1.2, which lies between the linear (p=1) and the quadratic (p=2) [2,5] values. With reference to the ballistics, critically damped or slightly overdamped instruments had a more "jittery" action than slightly underdamped instruments, causing eyestrain during reading [2]. ...
... To analyze the response of a VU meter, the model given by Lobdell and Allen was used. [5] Figure 1 shows that the ballistics of a VU-meter moving-coil system require a second-order low-pass filter with ζ = 0.81272 and ωn = 13,512. [5] In software implementations, the use of 2nd-order FIR or IIR filters is necessary, each with its advantages and disadvantages, but their treatment is beyond the scope of this work. ...
... [5] Figure 1 shows that the ballistics of a VU-meter moving-coil system require a second-order low-pass filter with ζ = 0.81272 and ωn = 13,512. [5] In software implementations, the use of 2nd-order FIR or IIR filters is necessary, each with its advantages and disadvantages, but their treatment is beyond the scope of this work. ...
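To make the quoted second-order model concrete, the following Python sketch evaluates the textbook closed-form step response of an underdamped second-order low-pass filter. It is an illustration only, not the implementation from the cited work. One assumption matters: the value "ωn = 13,512" is read here as 13.512 rad/s (a decimal comma), which the excerpt does not confirm; the function names are likewise hypothetical.

```python
import math

ZETA = 0.81272    # damping ratio quoted in the excerpt
OMEGA_N = 13.512  # ASSUMPTION: "13,512" read as 13.512 rad/s (decimal comma)

def step_response(t, zeta=ZETA, wn=OMEGA_N):
    """Unit-step response of an underdamped second-order low-pass at time t (seconds)."""
    root = math.sqrt(1.0 - zeta * zeta)
    wd = wn * root  # damped natural frequency in rad/s
    decay = math.exp(-zeta * wn * t)
    return 1.0 - decay * (math.cos(wd * t) + (zeta / root) * math.sin(wd * t))

def overshoot(zeta=ZETA):
    """Fractional peak overshoot of the step response (valid for 0 < zeta < 1)."""
    return math.exp(-math.pi * zeta / math.sqrt(1.0 - zeta * zeta))

if __name__ == "__main__":
    print(f"response at 0.3 s: {step_response(0.3):.3f}")
    print(f"overshoot: {100 * overshoot():.2f} %")
```

Under this reading of the parameters, the modeled needle reaches roughly 99% of its final deflection after 0.3 s with an overshoot of a little over 1%, which is the kind of slightly underdamped behavior the excerpt describes.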
Article
The change brought by loudness normalization sets new requirements in audio measurements, motivating the characterization of the available meters. The method presented proposes a procedure for characterizing the dynamic response of an audio meter and provides a means to identify it. Laboratory measurements made on classic electromechanical VU meters using the method described in this paper show that they are close to RMS values and provide better representation of volume than today's more common LED ladder semi-peak responding meters. By utilizing a simplified variation of the above-referenced method, various audio measuring devices were evaluated, quantifying their dynamic characteristics.
... The individual speech tokens were cut and faded in and out manually. Their levels were equalized using VUSOFT, a software implementation of an analog VU-meter developed by Lobdell and Allen (2007), which was also used for level equalization in Phatak et al. (2008). The level equalization was performed such that all CVs showed the same VUSOFT peak value. ...
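The peak-based level equalization described in this excerpt can be sketched in Python as follows. This is not VUSOFT: the meter below is a simplified stand-in (full-wave rectification followed by a second-order smoother integrated with a semi-implicit Euler step), and the function names, default parameters, and target peak are assumptions chosen for illustration.

```python
import math

def vu_like_peak(samples, fs, zeta=0.81272, wn=13.512):
    """Largest deflection of a simplified needle model driven by the rectified signal.

    ASSUMPTION: default zeta and wn (rad/s) are illustrative, not taken from VUSOFT.
    """
    dt = 1.0 / fs
    y = v = peak = 0.0  # needle position, needle velocity, running maximum
    for x in samples:
        u = abs(x)  # full-wave rectification
        accel = wn * wn * (u - y) - 2.0 * zeta * wn * v
        v += accel * dt  # semi-implicit Euler: update velocity first,
        y += v * dt      # then position with the new velocity
        peak = max(peak, y)
    return peak

def equalize_to_peak(tokens, fs, target_peak=1.0):
    """Scale each token so that its meter peak equals target_peak."""
    equalized = []
    for token in tokens:
        peak = vu_like_peak(token, fs)
        if peak <= 0.0:
            raise ValueError("cannot equalize a silent token")
        gain = target_peak / peak
        equalized.append([gain * s for s in token])
    return equalized
```

Because both the rectifier and the smoother scale linearly with input amplitude, a single gain per token is enough to bring every token to the same meter peak. The longer, higher-energy vowel portion tends to dominate the peak of such a sluggish meter.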
Article
Responses obtained in consonant perception experiments typically show a large variability across stimuli of the same phonetic identity. The present study investigated the influence of different potential sources of this response variability. It was distinguished between source-induced variability, referring to perceptual differences caused by acoustical differences in the speech tokens and/or the masking noise tokens, and receiver-related variability, referring to perceptual differences caused by within- and across-listener uncertainty. Consonant-vowel combinations consisting of 15 consonants followed by the vowel /i/ were spoken by two talkers and presented to eight normal-hearing listeners both in quiet and in white noise at six different signal-to-noise ratios. The obtained responses were analyzed with respect to the different sources of variability using a measure of the perceptual distance between responses. The speech-induced variability across and within talkers and the across-listener variability were substantial and of similar magnitude. The noise-induced variability, obtained with time-shifted realizations of the same random process, was smaller but significantly larger than the amount of within-listener variability, which represented the smallest effect. The results have implications for the design of consonant perception experiments and provide constraints for future models of consonant perception.