Book

Psychoacoustics: Facts and Models

Authors: Hugo Fastl, Eberhard Zwicker

Abstract

© Springer-Verlag Berlin Heidelberg 1990, 1999, 2007. All rights reserved.
... In Foundations of Modern Auditory Theory, the critical band is defined as the bandwidth at which subjective responses change abruptly (Scharf 1970). The critical band of a masker sound (Fastl and Zwicker 2007) determines how much it effectively adds to noise masking within specific frequencies. Fastl and Zwicker (2007) listed five methods for determining the critical bandwidth: threshold measurements, masking in frequency gaps, detectability of phase changes, loudness measurements and binaural hearing. Three common approximations for estimating the masker critical bandwidth based on the existing research are listed below: ...
... (1) Bark scale critical bandwidth: proposed by Zwicker (1961), the definition (Fastl and Zwicker 2007) is shown below with conditions: ...
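The equation itself is not reproduced in the excerpt. For reference, the closed-form approximation of the critical bandwidth usually quoted from Zwicker and Terhardt (1980), with the centre frequency f expressed in kHz, is:

```latex
\Delta f_G = 25 + 75\left[1 + 1.4\,(f/\mathrm{kHz})^2\right]^{0.69}\ \mathrm{Hz}
```

This yields the familiar benchmark values of roughly 100 Hz below 500 Hz and about 160 Hz at 1 kHz.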
Article
Full-text available
People move to cities to enjoy the conveniences of modern life, but the resulting population growth comes with issues such as increased noise. While public awareness of noise pollution is increasing, noise emanating from construction sites remains a source of complaint. Despite the actions taken by different parties to control construction noise, newly constructed structures continue to exacerbate the harmful consequences of noise. The acoustic environment in some areas of cities, such as London, can no longer provide the tranquility expected by residents, and the risk of daily disruptions for those living near construction sites is high. Given the various limitations and complications of current construction noise control methods, the campaign to limit construction noise must embrace other strategies. Sound masking can potentially abate noise without resorting to time-consuming procedures. In this paper, five raw construction noise datasets were collected around existing noise-sensitive premises. Five masker sound datasets were recorded to form seven mixed sounds to create a comparative experiment on masking construction noise. A case study and additional research are presented to demonstrate the mechanism of sound masking and to investigate the factors involved in effectively masking construction noise using the sound masking concept. Based on the case study and theoretical research findings, suggestions for masker selection for specific noise datasets and conclusions regarding ideal simultaneous masking of construction noise are provided.
... A - Single-source component localizations: Taking Figs. 1 and 2 as an example, when the lead vocal is a source composed of red spatial components, Table I shows their frequency and azimuth positions. In Table I, the Freq column indicates the absolute Hz value obtained by averaged FFT, while Freq Pos, used to map the frequency position of each component, uses a value rounded to third-octave resolution, to achieve a simple method closer to the critical bands [9]. Also, RMS L & R indicate dBFS values for each channel, and L-R Diff is the subtraction between L and R. ...
... This method is based on the ITD and ILD concepts (Interaural Time Difference and Interaural Level Difference, respectively) [6], as well as the precedence effect and the frequency radiation patterns of acoustic sources. 10 dB differences between channels are used both on ILD for coincident stereo microphone takes [5] and in the precedence effect [9] to settle a source full L or full R in the azimuth position, as well as broadband criteria concerning musical sources contemplating loudness curves [9]. In this way, a +10 dB value obtained as L-R is assigned as full L, and -10 dB as full R. ...
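A hypothetical sketch of the ±10 dB panning rule described above (the function name and normalization are illustrative assumptions, not the authors' code):

```python
def azimuth_from_level_diff(l_minus_r_db, full_db=10.0):
    """Map an L-R level difference in dB to a normalized azimuth in [-1, +1].

    -1.0 is full left, +1.0 is full right; the full_db threshold follows the
    10 dB criterion described in the excerpt (hypothetical sketch).
    """
    clamped = max(-full_db, min(full_db, l_minus_r_db))
    # A positive L-R difference (left channel louder) maps toward full left.
    return -clamped / full_db
```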
Conference Paper
Full-text available
Research on musical aesthetic patterns in different decades of the second half of the twentieth century allowed specific studies based on frequency and amplitude data. This approach is carried out mainly to locate musical instruments and their relations inside the stereo image (the traditional audio virtual space), so the development of a bidimensional map was determined to be a key part of finding spatial distribution patterns in different periods of stereo music masters, in order to characterize timbral aesthetics from a measured perspective. This article presents the concepts, methodology and the analysis performed, as well as all their results. Hence, a new approach to the conceptualization of musical sound sources inside the stereo image is presented, where they are either spectral or dynamic component constellations. Additionally, this concept and approach could apply to any kind of sound source inside a recorded signal.
... Auditory perception is based on the critical band analysis in the inner ear, where a frequency-to-location transformation takes place along the basilar membrane. The power spectra of the received sounds are not represented on a linear frequency scale but on limited frequency bands called critical bands [34,35]. The auditory system is usually modeled as a bandpass filterbank, consisting of strongly overlapping bandpass filters [20] with bandwidths around 100 Hz for bands with a central frequency below 500 Hz and up to 5000 Hz for bands placed at high frequencies. ...
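As a concrete illustration of this Bark-scale representation, here is a minimal Python sketch using the widely cited Zwicker and Terhardt (1980) approximations (an illustrative aid, not code from the cited works):

```python
import numpy as np

def hz_to_bark(f_hz):
    """Convert frequency in Hz to critical-band rate in Bark
    (Zwicker & Terhardt 1980 approximation)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def critical_bandwidth_hz(f_hz):
    """Closed-form critical bandwidth (Hz) at centre frequency f_hz (Hz):
    roughly 100 Hz below 500 Hz, widening toward high frequencies."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

# Example: hz_to_bark(1000.0) -> ~8.5 Bark; critical_bandwidth_hz(1000.0) -> ~160 Hz
```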
... - Frequency (simultaneous) masking is a frequency-domain phenomenon where a low-level signal, e.g. a pure tone (the maskee), can be made inaudible (masked) by a simultaneously appearing stronger signal (the masker), e.g. a narrow-band noise, if the masker and maskee are close enough to each other in frequency [34]. A masking threshold can be derived below which any signal will not be audible. ...
... - In addition to frequency masking, two phenomena of the HAS in the time domain also play an important role in human auditory perception: pre-masking and post-masking in time [34]. The temporal masking effects appear before and after a masking signal has been switched on and off, respectively. ...
Article
Full-text available
Robustness, imperceptibility and embedding capacity are the preliminary requirements of any digital audio watermarking technique. However, research has concluded that these requirements are difficult to achieve at the same time. Thus, the watermarking technique is closely dependent on the solution that manages the robustness/imperceptibility trade-off. A large majority of research work has been devoted to improving this trade-off by implementing increasingly advanced techniques. For conciseness and efficiency, the comprehensive review reported in this paper mainly considers imperceptibility and robustness among the criteria, as they determine the key performance of most existing audio watermarking systems. In this paper we have introduced the basic concepts of digital audio watermarking, the performance characteristics, and a classification of digital audio watermarking systems according to the extraction/detection process or to human perception. We have also presented various digital audio watermarking applications. Further, we have presented classifications of unintentional and intentional attacks that can be performed on audio watermarking systems and we have highlighted the impact of these attacks on the watermarked audio quality. We have presented two classifications made by researchers: the first one categorizes these attacks into basic and advanced attacks, while the second one classifies the attacks by group according to the process performed on the watermarked audio file. Furthermore, after presenting an overview of the properties of the Human Auditory System (HAS), we have presented several evaluation aspects of audio watermarking systems and we have reviewed various recent robust and imperceptible audio watermarking methods in the spatial, transform and hybrid domains.
... Loudness (N) calculation uses Fastl & Zwicker's model for time-varying signals and assumes free field frontal incidence, as detailed in ISO 532-1 method B [13]. Sharpness (S) calculation is also described in Fastl and Zwicker [18]. Roughness (R) is given by Daniel and Weber [8] and fluctuation strength (FS) is based on the Zhou et al. [45] model. ...
... The following indicators are discussed: L_A,max and N_50 are explored as sound intensity measures. Loudness describes perceived noise intensity better than sound pressure level, as it accounts for spectral and temporal masking effects caused by the frequency selectivity characteristic of the human hearing system [18]. R_50 and FS_50 are both measures of noise amplitude modulation. ...
... Information about the spectral envelope, given through the spectrum's centre of gravity, is retrieved by S_50. Sharpness of narrow-band noises increases sharply at high centre frequencies; for this reason, it is considered a descriptor of a noise's high-frequency content [18]. ...
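Percentile indicators such as N_50 or S_50 denote the value exceeded during the stated percentage of the measurement time. A minimal sketch of how such a statistic can be obtained from a uniformly sampled metric trace (illustrative only, not the cited authors' tooling):

```python
import numpy as np

def exceedance_percentile(trace, p=50.0):
    """Return the value exceeded p% of the time, e.g. N_50 from a
    time-varying loudness trace (assumes uniform sampling)."""
    return float(np.percentile(trace, 100.0 - p))
```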
Article
Environmental noise control regulations typically employ noise level descriptors to set limits for noise exposure. However, other characteristics of noise, such as frequency content, temporal patterns and masking, have been proven to influence the perception of acoustic environments. In this sense, psychoacoustic indicators offer an objective means of establishing relationships between physical characteristics of noise and the human auditory sensation phenomena. This study explored psychoacoustic indicators of pass-by vehicle noise across different vehicle categories, driving speeds, and temperatures. Moreover, the indicators were exploited as features to train a classification algorithm to predict vehicle category. Over 2000 vehicle noise samples were collected using the Statistical Pass-By (SPB) method, categorized into three classes according to ISO 11819-1, besides an additional class for delivery vans. Correction coefficients were obtained for temperature and speed to noise levels, loudness, roughness, sharpness and fluctuation strength. The differences in these indicators based on vehicle category were then discussed. A vehicle-category predictive model using the three vehicle categories defined in ISO 11819-1 yielded 84% accuracy. Including vans as an extra vehicle category dropped accuracy to 72% due to their misclassification with passenger cars. Combining these two categories increased overall accuracy to 86%. These findings could enable a less visual-dependent vehicle categorization so that vehicle fleets worldwide are more consistently classified in terms of noise. Additionally, psychoacoustic indicators appear to be valuable features for vehicle classification systems aimed to resemble the human auditory experience.
... Wake interaction effects significantly impacted the UAM vehicle's aerodynamic performance and noise characteristics. Additionally, a psychological noise impact analysis using SEL and Zwicker's PA metric [21] revealed the UAV's higher psychological impact despite lower physical noise. Ko et al. [22] presented a framework for auralizing multirotor noise during flyovers and takeoffs. ...
... Their study reported that modulation metrics, such as fluctuation strength and roughness, significantly impacted the predicted annoyance, with wind conditions and drone velocity also influencing psychoacoustic metrics. Moreover, Zwicker's PA metric [21] of the cross-type quadrotor was higher than that of the plus-type quadrotor. While previous research has offered valuable insights into the psychoacoustic impact of drones in real operational conditions, there remains a gap concerning the significance of impulsiveness under these circumstances. ...
Conference Paper
Full-text available
This study investigated the acoustic and psychoacoustic properties of five quadcopters drones during realistic flyover scenarios, utilizing a 64-microphone array for outdoor recordings. Acoustic analyses encompassed signal-to-noise ratio (SNR) values, time-frequency sound pressure levels, and noise spectra at overhead positions. An analysis based on A-weighted SNR revealed discernible drone noise despite background noise. Significant noise levels were observed up to 12 kHz. Harmonics of blade passage frequencies were evident, influencing noise spectra up to 1 kHz. Unlike traditional aircraft, drones' proximity to the ground limits the atmospheric absorption effects of high-frequency noise. A psychoacoustic analysis focused on sound quality metrics (SQMs) and annoyance assessment. SQMs exhibited consistent patterns across attributes, such as sharpness, tonality, roughness, and impulsiveness, with notable drone-specific perceptions. Different annoyance models indicated varying degrees of annoyance perception, with the Autel EVO II drone (lowest installation ratio, defined as the ratio between the drone diagonal size and the propeller diameter) perceived as the most annoying and the DJI Phantom 4 (heaviest) as the least one. Propeller positioning, represented by the parameter of installation ratio, correlated significantly with annoyance levels, suggesting an influence on both noise signature and psychoacoustic response. These findings highlight the importance of understanding the acoustic and psychoacoustic impact of drones, particularly in urban environments.
... The quantitative analysis comprises sound quality analysis [53] and user fatigue analysis [24]. Sound quality analysis is based on the psycho-acoustic metrics of Zwicker's model [54], including psycho-acoustic annoyance, loudness, sharpness, roughness, and fluctuation strength. These metrics are used to quantify the human sensations evoked by a sound, facilitating sound quality analysis of different composite methods. ...
... (4)), incorporating the weighting of loudness (L), sharpness (S), roughness (R), and fluctuation strength (F). Further implementation details can be found in [54]. Additionally, user fatigue with sound composition is assessed by considering the number of individuals evaluated, the time spent by the user, and the number of generations invested in the process [24]. ...
Article
Full-text available
Interactive evolutionary computation (IEC) finds diverse applications in the domain of sound. However, there is a lack of methods and operations for combining sound elements into a composite sound during the crossover phase of sound composition. To this end, we propose a novel composite crossover operation using paired comparison-based interactive differential evolution. This operation integrates the target and mutant vectors into a composite sound through linear rescaling, enabling users to synthesize and assess their preferred sounds during the evaluation phase. Moreover, an optimization-stopping mechanism is incorporated to allow users to halt the process when a satisfactory sound is produced. This helps to alleviate user fatigue by eliminating unnecessary candidate sounds and improving the efficiency of the composition process by reducing redundant generations and time consumption. The efficacy of this operation was demonstrated through comparative testing and sound quality analysis. Furthermore, this paper presents an original analytical approach based on four fundamental attributes of sound for analyzing human auditory preferences in composite sounds. This method combines both quantitative and qualitative paradigms, achieved through principal component analysis (PCA) and human subjective auditory analysis. These discoveries have significant implications for both sound composition and the analysis of human auditory perception.
... It is measured in sone; one sone corresponds to the loudness of a 1 kHz tone at 40 dB SPL. Loudness perception is a function of the SPL and frequency, and it is calculated based on the following equation 1. [9,11] ...
... The specific loudness (N') is a function of the critical bandwidth measured in Bark (unit). [9,11] Exposure to harmful noise increases mental workload (MWL), leading to a detrimental effect on human cognitive function. In human-system interaction, various aspects of cognitive function, including working memory, perception, attention, decision-making, and learning, are important because they affect the mental performance of the operator. ...
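For orientation (the standard textbook relation, not necessarily the excerpt's Eq. 1): above about 40 phon, loudness N in sone and loudness level L_N in phon are related by

```latex
N = 2^{(L_N - 40)/10}\ \mathrm{sone}, \qquad L_N \gtrsim 40\ \mathrm{phon}
```

so a 1 kHz tone at 40 dB SPL has a loudness of 1 sone, and every 10 phon increase doubles the perceived loudness.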
Article
Full-text available
Background Noise is one of the most important harmful factors in the environment. There are limited studies on the effect of noise loudness on brain signals and attention. The main objective of this study was to investigate the relationship between exposure to different loudness levels with brain index, types of attention, and subjective evaluation. Methods Four noises with different loudness levels were generated. Sixty-four male students participated in this study. Each subject performed the integrated visual and auditory continuous performance test (IVA-2) before and during exposure to noise loudness signals while their electroencephalography was recorded. Finally, the alpha-to-gamma ratio (AGR), five types of attention, and the subjective evaluation results were examined. Results During exposure to loudness levels, the AGR and types of attention decreased while the NASA Task Load Index (NASA-TLX) scores increased. The noise exposure at lower loudness levels (65 and 75 phon) leads to greater attention dysfunction than at higher loudness. The AGR was significantly changed during exposure to 65 and 75 phon and audio stimuli. This significant change was observed in exposure at all loudness levels except 85 phon and visual stimuli. The divided and sustained attention changed significantly during exposure to all loudness levels and visual stimuli. The AGR had a significant inverse correlation with the total score of NASA-TLX during noise exposure. Conclusions These results can lead to the design of methods to control the psychological effects of noise at specific frequencies (250 and 4000 Hz) and can prevent non-auditory damage to human cognitive performance in industrial and urban environments.
... The binaural recording system can be seen in Figure 2. The data obtained from the binaural recordings were post-processed in Matlab 2021. From these measurements, the sound pressure level parameters [12] and the psychoacoustic parameters loudness [13], roughness [14], sharpness [15] and fluctuation strength [14] were calculated for each channel (left and right). ...
... This fast amplitude modulation is responsible for the harsh or rough perceptual character of many nonverbal vocalizations. It has previously been shown to extend well up to 200 Hz in particularly rough screams [21], but the peak between 50 and 70 Hz observed here falls right in the middle of the perceptual roughness zone [22]. Slower temporal modulation is present in some nonverbal vocalizations, notably laughs with their syllable-like rhythm at about 5 Hz [23], but most nonverbal vocalizations appear to lack a well-defined temporal structure. ...
... In addition to low-pass filtering of frequency modulation, we applied an amplitude threshold when calculating the number of inflections: two putative inflection points had to be separated by at least 20 cents (0.2 semitones). This cutoff is still above the resolution of pitch perception (just noticeable differences in the voice-typical frequency range can be as low as 10 cents [22]), ensuring that the detected inflections are clearly audible and thus perceptually relevant, and at the same time it is high enough to guard against counting tiny "false" inflections caused by measurement error. ...
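For reference, the cent is the standard logarithmic interval unit:

```latex
\text{interval in cents} = 1200\,\log_2\!\left(\frac{f_2}{f_1}\right)
```

so the 20-cent criterion corresponds to a frequency ratio of about 2^(20/1200) ≈ 1.012, i.e. a change of roughly 1.2% in fundamental frequency.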
Article
Humans have evolved voluntary control over vocal production for speaking and singing, while preserving the phylogenetically older system of spontaneous nonverbal vocalizations such as laughs and screams. To test for systematic acoustic differences between these vocal domains, we analyzed a broad, cross-cultural corpus representing over 2 h of speech, singing, and nonverbal vocalizations. We show that, while speech is relatively low-pitched and tonal with mostly regular phonation, singing and especially nonverbal vocalizations vary enormously in pitch and often display harsh-sounding, irregular phonation owing to nonlinear phenomena. The evolution of complex supralaryngeal articulatory spectro-temporal modulation has been critical for speech, yet has not significantly constrained laryngeal source modulation. In contrast, articulation is very limited in nonverbal vocalizations, which predominantly contain minimally articulated open vowels and rapid temporal modulation in the roughness range. We infer that vocal source modulation works best for conveying affect, while vocal filter modulation mainly facilitates semantic communication.
... Since the lower range 250-1,000 Hz probably included this break frequency (e.g., 700 Hz in O'Shaughnessy, 1987), perceived gesture shape may have covaried with this transition in psychoacoustic scaling below and above the break frequency. Also the Bark scale (Zwicker & Fastl, 1999), used in early models of auditory filters, concerns a linear scaling in absolute 100-Hz bandwidths below 500 Hz, whereas filters above 500 Hz widen with increasing frequency. ...
... Compared to a sine tone at 1 kHz, for instance, the perceived loudness would have depended on most, if not all, auditory filters. Given that loudness summation operates differently within auditory filters than between them (Zwicker & Fastl, 1999) and that loudness dependence on frequency is more pronounced at lower SPLs (ISO, 2003), these complex dependencies likely became relevant here. To account for this complexity, a computational model (ISO, 2017a) that also models time-varying loudness can be consulted. ...
Article
Full-text available
Sound-based trajectories or sound gestures draw links to spatiokinetic processes. For instance, a gliding, decreasing pitch conveys an analogous downward motion or fall. Whereas the gesture’s pitch orientation and range convey its meaning and magnitude, respectively, the way in which pitch changes over time can be conceived of as gesture shape, which to date has rarely been studied in isolation. This article reports on an experiment that studied the perception of shape in uni-directional pitch, loudness, and tempo gestures, each assessed for four physical scalings. Gestures could increase or decrease over time and comprised different frequency and sound level ranges, durations, and different scaling contexts. Using a crossmodal-matching task, participants could reliably distinguish between pitch and loudness gestures and relate them to analogous visual line segments. Scalings based on equivalent-rectangular bandwidth (ERB) rate for pitch and raw signal amplitude for loudness were matched closest to a straight line, whereas other scalings led to perceptions of exponential or logarithmic curvatures. The investigated tempo gestures, by contrast, did not yield reliable differences. The reliable, robust perception of gesture shape for pitch and loudness has implications on various sound-design applications, especially those cases that rely on crossmodal mappings, e.g., visual analysis or control interfaces like audio waveforms or spectrograms. Given its perceptual relevance, auditory shape appears to be an integral part of sound gestures, while illustrating how crossmodal correspondences can underpin auditory perception.
... It has also been questioned if a distinct perception of dissonance influences the general pleasantness of multi-tone sounds. Zwicker and Fastl [4] state that the annoyance of sounds is a measure that depends on their loudness, timbre and temporal structure. In several studies, e.g. ...
... Further analyses are suggested to clarify the poor correlation between models and data, as the models have reportedly proven to work for a wide range of sounds, e.g. [4,11]. In a first analysis it was found that participants more likely rated the tonal sharpness in the sounds rather than an overall sharpness that can also arise from a broadband noise. ...
Conference Paper
Full-text available
Product sounds with clearly audible tonal components are often perceived as unpleasant or annoying. If different simultaneously operating aggregates are present in a system, for example vehicle engines and gearboxes, the interaction of tonal components, similar to music, can evoke additional sensations in human auditory perception. Supplementary to a pronounced tonality, such sounds can also yield distinct degrees of consonance or dissonance between tones. Previous studies showed that the perceived dissonance had a high impact on preference judgements for sounds with similar tonality. In experiments of the present study, sounds that differed in tonality were rated with respect to the auditory sensations sharpness, tonality and dissonance by one group of participants while another group only carried out a preference task. From these, a model for predicting perceived preference is derived from the subjective judgements of auditory sensations. The performance of the preference predictions based on subjective judgements will be compared against purely model-based predictions using different algorithms for acoustic attributes.
... MOC innervation tends to peak broadly in mid-cochlear regions, which then also coincides broadly with the range of most sensitive hearing in the behavioral audiogram of a given species, as previously shown for cat (Fay, 1988; Liberman et al., 1990), human (Zwicker and Fastl, 1990; Liberman and Liberman, 2019), CBA/CaJ mouse (Maison et al., 2003; Radziwon et al., 2009; Grierson et al., 2022) and guinea pig (Heffner et al., 1971; Kujawa and Liberman, 1997; Liberman and Liberman, 2019). This was confirmed here for the gerbil, whose behavioral sensitivity is best between 1 and 20 kHz (Ryan, 1976). ...
Article
Full-text available
Introduction Age-related hearing difficulties have a complex etiology that includes degenerative processes in the sensory cochlea. The cochlea comprises the start of the afferent, ascending auditory pathway, but also receives efferent feedback innervation by two separate populations of brainstem neurons: the medial olivocochlear and lateral olivocochlear pathways, innervating the outer hair cells and auditory-nerve fibers synapsing on inner hair cells, respectively. Efferents are believed to improve hearing under difficult conditions, such as high background noise. Here, we compare olivocochlear efferent innervation density along the tonotopic axis in young-adult and aged gerbils (at ~50% of their maximum lifespan potential), a classic animal model for age-related hearing loss. Methods Efferent synaptic terminals and sensory hair cells were labeled immunohistochemically with anti-synaptotagmin and anti-myosin VIIa, respectively. Numbers of hair cells, numbers of efferent terminals, and the efferent innervation area were quantified at seven tonotopic locations along the organ of Corti. Results The tonotopic distribution of olivocochlear innervation in the gerbil was similar to that previously shown for other species, with a slight apical cochlear bias in presumed lateral olivocochlear innervation (inner-hair-cell region), and a broad mid-cochlear peak for presumed medial olivocochlear innervation (outer-hair-cell region). We found significant, age-related declines in overall efferent innervation to both the inner-hair-cell and the outer-hair-cell region. However, when accounting for the age-related losses in efferent target structures, the innervation density of surviving elements proved unchanged in the inner-hair-cell region. For outer hair cells, a pronounced increase of orphaned outer hair cells, i.e., lacking efferent innervation, was observed. Surviving outer hair cells that were still efferently innervated retained a nearly normal innervation. Discussion A comparison across species suggests a basic aging scenario where outer hair cells, type-I afferents, and the efferents associated with them, steadily die away with advancing age, but leave the surviving cochlear circuitry largely intact until an advanced age, beyond 50% of a species’ maximum lifespan potential. In the outer-hair-cell region, MOC degeneration may precede outer-hair-cell death, leaving a putatively transient population of orphaned outer hair cells that are no longer under efferent control.
... In general, the larger aircraft (A320 and A321) seem to emit less tonal noise (on average) compared to the lighter A319. Despite the importance of tonality in sound perception [65], psychoacoustic annoyance is mainly influenced by loudness. Hence, the PA trend observed between aircraft subtypes (see Fig. 10b) is considerably similar to that of loudness (see Fig. 10c) and not to that of tonality (see Fig. 10d). ...
... Loudness N itself regards the perceived magnitude of a signal and is calculated with the ISO 532-1 standard (2017) [8]. It is also the main factor in Zwicker and Fastl's (1999) [9] calculation of psychoacoustic annoyance PA, where it is weighted with the first three psychoacoustic metrics (R, FS and S). Psychoacoustic annoyance is therefore a single value on a linear scale combining annoying aspects of a sound signal. ...
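A common statement of the Zwicker and Fastl psychoacoustic annoyance model, as usually reproduced in the literature (exact coefficients may vary slightly between editions), is:

```latex
PA = N_5\left(1 + \sqrt{w_S^2 + w_{FR}^2}\right), \qquad
w_S = (S - 1.75)\cdot 0.25\,\lg(N_5 + 10)\ \ (S > 1.75\ \mathrm{acum}), \qquad
w_{FR} = \frac{2.18}{N_5^{0.4}}\left(0.4\,F + 0.6\,R\right)
```

where N_5 is the percentile loudness in sone, S the sharpness in acum, F the fluctuation strength in vacil, and R the roughness in asper.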
Conference Paper
Full-text available
Psychoacoustic metrics and listening tests can be an important part of any acoustic analysis, addressing aspects of human perception in addition to mere physical descriptions. This, however, leads to the necessity of audio files from either physical prototypes or representative auralizations. Within the scope of a product’s early design stages, the former is usually not available, making the latter the better choice. A previous contribution aimed to create a framework for psychoacoustic analysis of auralized vibroacoustic problems and investigated the benefit of perceptually evaluating auralizations in addition to physical metrics. Utilizing said framework, this contribution’s main question is how closely auralizations of FEM-simulated transfer functions can resemble audio measurements of corresponding systems regarding the listening experience as well as the most common physical and perceptual metrics. This is examined for a case study with reduced complexity and accessible, reproducible measurements: A vibrating plate radiating into a free field. The case study is conducted as a means of validating the auralization methodology and its applicability for assessment purposes in the context of vibroacoustic scenarios.
... With a view to standardizing field noise measurement procedures, this would be problematic because it would compromise the reproducibility of the measurements. First, although the median hearing threshold of the otologically normal population is well documented [51], the dispersion around this median remains to be evaluated [52], except perhaps for the young-adult age bracket. Second, hearing loss is a pathology that can go unnoticed for a long time, particularly when it does not affect speech comprehension [53]. ...
Preprint
Full-text available
In the vast majority of environmental noise legislation, the metric used to express limit values is based on the sound pressure level. But some countries have introduced noise emergence limit values, where the compliance of a noisy activity is defined as the maximum admissible difference between the sound pressure level with and without the sound contribution of the regulated activity. This article examines the foundations and merits of this type of differential noise limit. Our literature review indicates that there is very little evidence in favor of using differential noise limits over "absolute" limits. Moreover, while noise emergence limits appear to stem from considerations about the audibility of the regulated noise source, they seem to give little indication of what is audible and what is not. Furthermore, the definition and practical measurement of noise emergence pose several problems that compromise the reproducibility of results. In addition, the reference to background noise makes it very difficult, first, to verify the compliance of noisy installations over the long term, second, to effectively protect residents from excessive noise and, third, to assess compliance on the basis of simulations. Where switching to another metric is not feasible, this article makes recommendations for a more reliable use of noise emergence.
... In order to achieve different situations with respect to speech intelligibility, the speech level was varied during the hearing tests, whereas the level of the respective traffic noise was not altered. The physical evaluation of the loudness of the disturbing traffic noises was carried out with a modern analyzing instrument including also features for loudness calculation for temporally variable sounds [5,6] according to Zwicker's method based on third-octave bands [2]. From the temporally variable loudness values of the disturbing noises a percentile statistics analysis was performed. ...
... B_k is the larger of N'_i (the background noise) or the self-speech masking spectrum, expressed as a spectrum level. Eq. 4 is derived from Ludvigsen (15), which in turn has been distilled from masking curves found in Zwicker (16), (17). The total level of masking in the i-th calculation band is found from the second (summed) term in Eq. 5, where N'_i is the noise spectrum level in band i, C_k is the slope per octave of the upward spread of masking from band k below band i, B_k is the larger of N'_i (the background noise level) or the self-speech masking spectrum, and i, k are the indices for each one-third octave frequency band. ...
... A measure of the annoyance of wind turbine sound that takes into account fluctuation strength and loudness can be derived from the model for psychoacoustic annoyance PA, described in [118]. As suggested in [119], the parameters for sharpness S and roughness R can be set to zero for wind turbines. ...
Thesis
Full-text available
The increasing demand for renewable energy has led to a surge in wind turbine installations. However, the noise emitted by wind turbines is hampering their deployment close to inhabited places because of the annoyance caused to the residents. Despite lower sound levels than other common sources, wind turbine noise ranks as the most annoying sound source compared to wind, road, and rail noises. Therefore, it is essential to understand the characteristics of the noise sources, their propagation in the surrounding environment, and to predict the annoyance perceived by the inhabitants before the installation of the wind farm. Furthermore, wind turbines have to meet specific noise regulations. The norms for onshore horizontal-axis turbines are based on factors like distance from dwellings and the difference between the total noise, comprising both the emitted and background noise, and the background noise alone. Such regulations are justified by the noise annoyance induced by wind turbines in specific conditions. The challenge for the manufacturers is to enhance energy output, reduce noise annoyance, and ensure regulatory compliance. This necessitates careful optimizations in design, location, and operation. Wind turbines emit mechanical and aerodynamic noise. Sources of mechanical noise are the drivetrain and the gearbox. Aerodynamic noise is caused by the interaction of the moving wind turbine blades and the air. The dominant noise sources in the audible frequency range (20 Hz to 20 kHz) are aerodynamic, mainly trailing- and leading-edge noise. Trailing-edge noise is caused by the interaction of the turbulent boundary layer on the blade with the trailing edge of the blade itself, while leading-edge noise is generated by the interaction of the turbulence in the inflow with the leading edge of the blade. This research focuses on developing a wind turbine noise model, accounting for trailing- and leading-edge noise sources, atmospheric propagation, and creating audible sound files (auralization) from a numerical workflow. Furthermore, the impact of terrain and atmospheric conditions on noise emission, propagation, and perception is investigated. Using analytical and empirical methods, the study prioritizes fast computational turn-around methods that can be used in a real environment and practical applications. The methods should be robust, delivering accurate results for various conditions and turbine geometries. The proposed numerical model comprises the noise source model, the atmospheric propagation method, and the auralization technique. It is based on RANS-informed Amiet's theory for the prediction of broadband trailing- and leading-edge noise combined with an engineering ray-based model for atmospheric sound propagation. It starts by dividing the blades of a 3D CAD model into segments, enabling precise airfoil shape representation. Trailing-edge boundary layer parameters, crucial for empirical wall pressure spectra models, are calculated using 2D RANS simulations. Amiet's theory is then used to predict broadband airfoil noise, while the modeling of sound propagation with atmospheric effects is considered with the ray-based engineering model Harmonoise. The resulting sound spectrum is converted into an actual audible sound (auralized) using the spectral shaping synthesis technique and the realism is enhanced with turbulence-induced amplitude fluctuation.
The methodology is applied to three horizontal-axis wind turbines in various conditions, validating and exploring the noise prediction capabilities for single turbines and a wind farm. Auralization realism is confirmed through listening tests. The proposed methodology surpasses industry standards and enables auralization. It offers reliable noise predictions with available or simplified turbine geometries and incorporates weather effects. The methodology, while enhancing environmental noise assessment, also opens avenues for noise annoyance study and wind turbine acceptance promotion.
... At present, commonly used objective parameters include psychoacoustic parameters, sound pressure level, frequency band characteristics, etc. Fastl and Zwicker [18] combined the inherent properties of sound with the masking effect of human hearing and proposed multiple psychoacoustic parameters: Loudness, Sharpness, Roughness, Fluctuation Strength, Tonality, and Articulation Index. When appropriate evaluation program sources are selected, the correlation between subjective and objective evaluation results of nonlinear distortion can reach more than 0.84, and the influence of the frequency response curve on sound quality is also extremely important. ...
... There are 3 environment recordings per category. b) Synthetic noises, SSN [44] and ICRA7 [45]: To evaluate the different algorithms, in this work we also use stationary speech-shaped noise (SSN) and non-stationary modulated seven-speaker babble noise (ICRA7) as synthetic interferers (Table I). 3) Training, evaluation and testing data: The training set was composed of speech from the LibriVox corpus and noise from the DEMAND dataset. ...
Article
Full-text available
Cochlear implants (CIs) provide a solution for individuals with severe sensorineural hearing loss to regain their hearing abilities. When someone experiences this form of hearing impairment in both ears, they may be equipped with two separate CI devices, which will typically further improve the CI benefits. This spatial hearing is particularly crucial when tackling the challenge of understanding speech in noisy environments, a common issue CI users face. Currently, extensive research is dedicated to developing algorithms that can autonomously filter out undesired background noises from desired speech signals. At present, some research focuses on achieving end-to-end denoising, either as an integral component of the initial CI signal processing or by fully integrating the denoising process into the CI sound coding strategy. This work is presented in the context of bilateral CI (BiCI) systems, where we propose a deep-learning-based bilateral speech enhancement model that shares information between both hearing sides. Specifically, we connect two monaural end-to-end deep denoising sound coding techniques through intermediary latent fusion layers. These layers amalgamate the latent representations generated by these techniques by multiplying them together, resulting in an enhanced ability to reduce noise and improve learning generalization. The objective instrumental results demonstrate that the proposed fused BiCI sound coding strategy achieves higher interaural coherence, superior noise reduction, and enhanced predicted speech intelligibility scores compared to the baseline methods. Furthermore, our speech-in-noise intelligibility results in BiCI users reveal that the deep denoising sound coding strategy can attain scores similar to those achieved in quiet conditions.
... It did not appear that the audible response of the speaker aided in the determination of the specific cutoff frequency. The individual sensitivity of ears to modest changes of loudness, termed the just noticeable difference (JND), of the tone at the cutoff (-3 dB) and the slow roll-off of a first-order filter (-20 dB/decade) are suspected to be contributors to this result (Fastl, 2007). ...
... Physically, in free-field conditions, doubling the distance between a sound source and the listener leads to a 6 dB loss in the intensity of the sound (Blauert and Xiang, 2009). A listener is able to detect intensity differences smaller than 1 dB, which makes it possible to detect small variations in distance (Fastl and Zwicker, 2007;Strybel and Perrott, 1984). However, this is only applicable if the intensity of the sound stimulus remains constant and the listener has already gained some experience with the specific sound source and the acoustic environment. ...
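The 6 dB figure follows from spherical spreading in the free field: the level difference between distances d_1 and d_2 from a point source is

```latex
\Delta L = 20\,\log_{10}\!\left(\frac{d_2}{d_1}\right)\ \mathrm{dB}
```

which evaluates to 20 log10(2) ≈ 6.02 dB for a doubling of distance.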
Article
Full-text available
The perception of the distance to a sound source is relevant in many everyday situations, not only in real spaces, but also in virtual reality (VR) environments. Where real rooms often reach their limits, VR offers far-reaching possibilities to simulate a wide range of acoustic scenarios. However, in virtual room acoustics a plausible reproduction of distance-related cues can be challenging. In the present study, we compared the detection of changes of the distance to a sound source and its neurocognitive correlates in a real and a virtual reverberant environment, using an active auditory oddball paradigm and EEG measures. The main goal was to test whether the experiments in the virtual and real environments produced equivalent behavioral and EEG results. Three loudspeakers were placed in front of the participants at ego-centric distances of 2 m (near), 4 m (center), and 8 m (far), each 66 cm below their ear level. Sequences of 500 ms noise stimuli were presented either from the center position (standards, 80% of trials) or from the near or far position (targets, 10% each). The participants (N = 20) had to indicate a target position via a joystick response (“near” or “far”). Sounds were emitted either by real loudspeakers in the real environment or rendered and played back for the corresponding positions via headphones in the virtual environment. In addition, within both environments, loudness of the auditory stimuli was either unaltered (natural loudness) or the loudness cue was manipulated, so that all three loudspeakers were perceived equally loud at the listener's position (matched loudness). The EEG analysis focused on the mismatch negativity (MMN), P3a, and P3b as correlates of deviance detection, attentional orientation, and context-updating/stimulus evaluation, respectively. Overall, behavioral data showed that detection of the target positions was reduced within the virtual environment, and especially when loudness was matched. Except for slight latency shifts in the virtual environment, EEG analysis indicated comparable patterns within both environments and independent of loudness settings. Thus, while the neurocognitive processing of changes in distance appears to be similar in virtual and real spaces, a proper representation of loudness appears to be crucial to achieve a good task performance in virtual acoustic environments.
... It is convenient to represent human formants on a linear frequency scale when the focus is on voice production because vocal tract resonances occur approximately every dF Hz. However, our auditory perception is approximately logarithmic in the relevant frequency range (Fastl & Zwicker, 2006). Furthermore, the invariance in formant ratios between speakers saying the same vowel becomes more obvious if these ratios are log-transformed -that is, if we convert ratios to musical intervals. ...
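For reference, a formant ratio r = F_{i+1}/F_i maps to a musical interval via the standard relation:

```latex
\text{interval in semitones} = 12\,\log_2 r
```

e.g. r = 1.5 gives 12 log2(1.5) ≈ 7 semitones, a perfect fifth.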
Article
Full-text available
Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians in speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software solutions included in R packages soundgen and phonTools . Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context.
... Typically, A-weighted and averaged over time. Most researchers agree that this descriptor cannot predict perceptual responses to complex sonic environments [23,24]. Recently, Kantono et al. [25] and Lin et al. [26] did a groundbreaking experimental study demonstrating that the psychoacoustic metrics in a dining context can substantially impact the enjoyment of the flavour of food and drink. ...
Article
One of the most multisensory experiences in our daily life is eating and drinking. Modifying external sensory properties (such as auditory stimuli) has been proposed as a promising framework for modulating taste perception. To this aim, it is relevant to explore the characteristics of the sound environment where people usually consume juice, which can moderate taste perception when drinking fruit juice. This research conducted a juice tasting experiment focusing on individuals' flavour perception while listening to 7 sound environments in a controlled multisensory laboratory (Sens i-lab). The soundtracks were examined using several psychoacoustic parameters: sound pressure level, loudness, sharpness, roughness, fluctuation, tonality, and prominence. Correlation analysis reveals several patterns between the average taste rating of the orange juice and the psychometrics of the 7 environmental sounds: for example, the sweet flavour of orange juice was negatively correlated with sound level, roughness, and sound prominence. Vice versa, sour flavour was positively correlated with the sound level, roughness, and the average sound fluctuation. Further, the freshness of the juice has a negative correlation with sound roughness, and the juice thickness rating was negatively correlated with the average sharpness of the sound environment.
... All parameters are calculated by B&K Connect software. Loudness and its statistical parameters are calculated according to the method of ISO 532-1 (13), the sharpness parameter group's calculation follows the method described in the DIN 45692 standard (14), and the roughness and fluctuation parameter group is calculated based on the method developed by Zwicker and Aures (15). ...
Conference Paper
Full-text available
Soundscape is an essential environmental element affecting people's experience in urban open spaces. In the complex mechanism of soundscape perception, the acoustic characteristics of the soundscape are the primary influencing factor. Herein, we used ISO 12913 soundscape indicators to carry out auditory perception experiments on typical soundscape materials recorded from 27 urban open spaces. The principal indicators pleasantness and eventfulness were calculated from the evaluation results of 68 participants. They were then taken individually to perform multiple regression with factors including psychoacoustic and physical acoustic indicators, the significance of various sound source types, and other indicators characterizing the sound composition. The results showed that: 1) The regression model for pleasantness (adj. R² = 0.703, F = 21.48) showed that the significance of bird sounds contributed positively, and the significance of mechanical sounds and the level of S_95 in the environment have a negative impact. 2) The regression model for eventfulness (adj. R² = 0.676, F = 19.05) showed that the number of significant sound sources and the level of F_50 in the environment contributed positively. Simultaneously, the significance of mechanical sounds has a negative impact.
... Harmonicity in turn indicates how closely a sonority's spectrum corresponds to a harmonic series [6]. Finally, sharpness denotes the energy at high frequencies which has also been identified as a predictor of C/D [7,8]. ...
Article
Full-text available
There is debate whether the foundations of consonance and dissonance are rooted in culture or in psychoacoustics. In order to disentangle the contribution of culture and psychoacoustics, we considered automatic responses to the perfect fifth and the major second (flattened by 25 cents) intervals alongside conscious evaluations of the same intervals across two cultures and two levels of musical expertise. Four groups of participants completed the tasks: expert performers of Lithuanian Sutartinės, English-speaking musicians in Western diatonic genres, Lithuanian non-musicians and English-speaking non-musicians. Sutartinės singers were chosen as this style of singing is an example of ‘beat diaphony’, where intervals of parts form predominantly rough sonorities and audible beats. There was no difference in automatic responses to intervals, suggesting that an aversion to acoustically rough intervals is not governed by cultural familiarity but may have a physical basis in how the human auditory system works. However, conscious evaluations resulted in group differences, with Sutartinės singers rating the flattened major second as more positive than did the other groups. The results are discussed in the context of recent developments in consonance and dissonance research.
... This section gives a monoaural speech enhancement framework based on the psychoacoustic model of the human hearing system. It is well-known that the sensitivity of the human ear varies unevenly (Zwicker & Fastl, 1990). For this reason, the concept of critical band (CB) is essential to describe how the ear perceives loudness, pitch, and timbre. ...
Article
Full-text available
In this paper, we investigate a psychoacoustic model-driven spectral subtraction framework for enhancement of noisy speech. In the proposed framework, the noisy speech spectrum is separated into six distinct and unevenly frequency-spaced subbands as per the psychoacoustic model of the human hearing system, and spectral over-subtraction is applied independently in each subband. The noise in each subband is estimated using an adaptive noise estimator that does not require a speech pause tracker. To compute and update the noise, the noisy speech power is adaptively smoothed using a smoothing factor controlled by a posterior SNR. The performance of the proposed framework is evaluated using SNR, segmental SNR (SegSNR), and PESQ scores for a variety of non-stationary and stationary noise environments at varying SNR levels. The experimental results show that the proposed framework outperforms various up-to-date speech enhancement technologies on three extensively used objective metrics assessments and speech spectrograms.
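As a rough, self-contained sketch of the over-subtraction idea in a single subband (in the spirit of classic spectral subtraction; the function name and the alpha and beta values are illustrative assumptions, not the authors' exact algorithm):

```python
import numpy as np

def spectral_oversubtract(noisy_mag, noise_mag, alpha=4.0, beta=0.01):
    """Subtract an over-estimated noise power spectrum and apply a spectral floor.

    noisy_mag, noise_mag: magnitude spectra of the noisy signal and the noise
    estimate in one subband. alpha is the over-subtraction factor; beta sets a
    spectral floor that limits musical-noise artifacts.
    """
    clean_power = noisy_mag ** 2 - alpha * noise_mag ** 2
    floor = beta * noisy_mag ** 2
    return np.sqrt(np.maximum(clean_power, floor))
```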
... For psychoacoustic metrics, Zwicker loudness [29,68], fluctuation strength, roughness, and sharpness [17] were analyzed. Their average value over the fragment and their main percentiles were also included. ...
Article
Researchers increasingly look at augmenting soundscapes to reduce Behavioural and Psychological Symptoms of Dementia (BPSD). However, methods to select suitable sounds still need improvement. This study proposes a sound selection methodology to augment the soundscape in nursing homes to lower BPSD, using sound characteristics and recognition methods. To uncover the underlying characteristics of sounds that trigger a positive response, added sounds previously used in nursing homes in Flanders were re-analyzed using a wide range of acoustic and psychoacoustic indices. The results highlight the sound characteristics that lead to positive responses and the need for further studies to understand the sounds most suitable for people with dementia. Results showed that sharpness and high-pitched sounds, such as animal vocalizations or crickets, create a higher chance of a positive response. High-pitched sounds have a higher emergence in a typical nursing home sonic environment, increasing the possibility of being noticed. Sounds recognized as music, however, had a lower chance of a positive response and should be used cautiously. Interestingly, although bird vocalizations are often highly rated in other contexts, the ones considered in the current dataset did not lead to a positive response, highlighting the need for further studies to understand better which sounds are most suitable for people with dementia.
... However, other studies [91][92][93] indicate that psychoacoustic loudness models, or the inclusion of a loudness derivative and duration in multi-metric annoyance models for describing the sound character, can enable a better characterization of the perception of transient sounds than single level-based measures. Metrics including time-varying loudness models, such as those developed by Glasberg and Moore [94] or Zwicker and Fastl [95], may be useful for predicting the impact of impulsive sounds since they incorporate models of the temporal behavior of the human hearing system, which is clearly of importance with rapidly changing sounds. The output of these models is a profile of the loudness heard through time, as opposed to a single number representing the whole or part of a sound event. ...
Article
Full-text available
The reduction of sonic boom levels is the main challenge, but also the key factor, in starting a new era of supersonic commercial flight. Since 1970, an FAA regulation has banned supersonic flight overland because of unacceptable sonic booms at the ground, and many research studies have been carried out since then to understand sonic boom generation, propagation and effects on both the environment and communities. Minimization techniques have also been developed in an attempt to reduce sonic boom annoyance to acceptable levels. In the last 20 years, advances in both knowledge and technologies, together with significant investments by companies and institutions, have renewed interest in the development of new methods and tools for the design of low-boom supersonic aircraft. The exploration of unconventional configurations and exotic solutions and systems seems necessary to effectively reduce sonic boom and allow supersonic flight everywhere. This review describes all aspects of the sonic boom phenomenon related to the design of the next generation of supersonic aircraft. In particular, a critical review of the prediction and minimization methods found in the literature, aimed at identifying their strengths, limitations and gaps, is given, along with a complete overview of disruptive unconventional aircraft configurations and exotic active/passive solutions for boom level reduction. The aim of the work is to give a clear statement of state-of-the-art sonic boom prediction methods and of possible reduction solutions to be explored in the design of the next low-boom supersonic aircraft.
... From these recordings, the main psychoacoustic parameters [13] (loudness, sharpness, roughness and fluctuation strength) were extracted: over the full duration in the case of urban noise, and as the average over all overflights in the case of aircraft. Following the procedure described in [14], short-duration sound samples were then selected whose psychoacoustic parameters were all close to those of the complete recordings. In this way, for the different types of urban noise, it could be ensured that the short-duration samples were representative of the environment. ...
Conference Paper
Full-text available
Sound insulation is generally measured in the range between the 1/3-octave bands from 100 to 3150 Hz, and this is also the range most commonly used to calculate its single-number quantities. However, the standards include two extended high- and low-frequency ranges, covering 3150 to 5000 Hz and 50 to 100 Hz respectively. While the inclusion of the extended high-frequency range poses no problem, measurement of the low-frequency range is tedious, requires a specific measurement procedure and is usually associated with higher uncertainty. Consequently, there is an extensive scientific dialogue on the inclusion of this range in sound insulation measurements and descriptors. Although several studies have shown that low-frequency sound insulation does not have a significant effect on annoyance perception, other researchers continue to advocate its inclusion. To contribute to this dialogue, a listening test was conducted with 100 people using the sound insulation of façade window elements and urban sounds. It was found that low frequencies are perceptually relevant only when the sound insulation in these bands is exceptionally low compared to the average expected window sound insulation. Keywords: insulation, façade, perception, low frequencies
... There are 3 environment recordings per category. b) Synthetic noises; SSN [44] and ICRA7 [45]: To evaluate the different algorithms, in this work we also use stationary speech-shaped noise (SSN) and non-stationary modulated seven-speaker babble noise (ICRA7) as synthetic interferers. ...
Preprint
Full-text available
Cochlear implants (CIs) provide a solution for individuals with severe sensorineural hearing loss to regain their hearing abilities. When someone experiences this form of hearing impairment in both ears, they may be equipped with two separate CI devices, which will typically further improve the CI benefits. This spatial hearing is particularly crucial when tackling the challenge of understanding speech in noisy environments, a common issue CI users face. Currently, extensive research is dedicated to developing algorithms that can autonomously filter out undesired background noises from desired speech signals. At present, some research focuses on achieving end-to-end denoising, either as an integral component of the initial CI signal processing or by fully integrating the denoising process into the CI sound coding strategy. This work is presented in the context of bilateral CI (BiCI) systems, where we propose a deep-learning-based bilateral speech enhancement model that shares information between both hearing sides. Specifically, we connect two monaural end-to-end deep denoising sound coding techniques through intermediary latent fusion layers. These layers amalgamate the latent representations generated by these techniques by multiplying them together, resulting in an enhanced ability to reduce noise and improve learning generalization. The objective instrumental results demonstrate that the proposed fused BiCI sound coding strategy achieves higher interaural coherence, superior noise reduction, and enhanced predicted speech intelligibility scores compared to the baseline methods. Furthermore, our speech-in-noise intelligibility results in BiCI users reveal that the deep denoising sound coding strategy can attain scores similar to those achieved in quiet conditions.
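The fusion idea described above, multiplying the latent representations of two monaural denoisers, can be sketched in a few lines of PyTorch. The encoder/decoder shapes and sizes below are illustrative placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

class FusedBilateralDenoiser(nn.Module):
    """Illustrative bilateral denoiser: two monaural encoder/decoder paths
    coupled by multiplying their latent representations (latent fusion)."""

    def __init__(self, channels=64):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv1d(1, channels, kernel_size=16, stride=8, padding=4),
                nn.ReLU(),
            )
        def decoder():
            return nn.ConvTranspose1d(channels, 1, kernel_size=16,
                                      stride=8, padding=4)
        self.enc_left, self.enc_right = encoder(), encoder()
        self.dec_left, self.dec_right = decoder(), decoder()

    def forward(self, left, right):
        # left, right: (batch, 1, samples) noisy signals from each ear
        z_left, z_right = self.enc_left(left), self.enc_right(right)
        fused = z_left * z_right            # intermediary latent fusion layer
        return self.dec_left(fused), self.dec_right(fused)

# Placeholder usage: two 100 ms frames at 16 kHz per batch element.
model = FusedBilateralDenoiser()
left, right = torch.randn(2, 1, 1600), torch.randn(2, 1, 1600)
den_left, den_right = model(left, right)
```

Because both decoders read the same fused latent, information from each hearing side constrains the other, which is one plausible route to the higher interaural coherence reported in the abstract.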
... Generally depicted as an amalgamation of inharmonious, unsettling auditory elements, noise has the potential to incite irritation or distress. Often characterized as unwanted sound, noise infiltrates our auditory landscape, occasionally clashing with our quest for aural serenity [32]. Yet, the demarcation lines separating sound and noise are nebulous and fluid, intrinsically subjective in nature. ...
Article
Full-text available
Background Noise-intensive processes in dental laboratories include the finishing of crowns, bridges, and removable partial dentures; blowing out workpieces with steam and compressed air; and deflating casting rings. High sound pressure levels are also present around dental vibrators, polishing equipment, and sandblasters. The aim of this study was to evaluate the effect of the noise produced in dental technology laboratories on dental technicians' hearing capacity. Methods For this cross-sectional study, a total of 120 dental technicians were chosen. Otoscopic evaluation and the Weber test were used to establish whether they had sensorineural or conductive hearing loss at 500 Hz, 1000 Hz, 2000 Hz, and 4000 Hz. Then OAER (objective auditory evoked response) and PTA (pure-tone audiometry, clinical audiometer) tests were administered (Neurosoft, Russia). The whole procedure was carried out by an audiologist and an ENT specialist. Results The PTA results showed mild hearing impairment overall, with the loss being more severe in the left ear than in the right. The OAE test results revealed that in the left ear 84.5% of subjects passed and 15.5% were referred to an ear specialist, whereas in the right ear 82.7% passed and 17.3% were referred. According to this study, in right-handed participants the left ear is more vulnerable than the right. Differences in the mean hearing threshold at 4000 and 6000 Hz in the left ear were statistically significant in the groups of workers with eleven to fifteen and twenty-one to twenty-five years of practical experience, respectively (Minervini, et al. J Clin Med 12:2652, 2023). Conclusions A statistically significant threshold shift from 4000 to 6000 Hz is observed as working experience grows, which is suggestive of sensorineural hearing impairment brought on by the noisy dental environment.
... traffic and aircraft noise [7], soundscape pleasantness (considered the opposite of annoyance) more often considers specific audio events such as bird sounds [8] (occasionally even differentiating between different species of birds sounds [9]) or water fountain sounds [10]. Psychoacoustic annoyance is also defined at a coarse-grained level, originally formulated to be applied to consumer products such as vacuums, refrigerators, and car engines [11]. As such, both coarse-grained audio events (cAE) and finegrained audio events (fAE) are considered in this paper. ...
Preprint
Full-text available
Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches only focus on classifying and detecting audio events and scenes, but may ignore their perceptual quality that may impact humans' listening mood for the environment, e.g. annoyance. To this end, this paper proposes a novel hierarchical graph representation learning (HGRL) approach which links objective audio events (AE) with subjective annoyance ratings (AR) of the soundscape perceived by humans. The hierarchical graph consists of fine-grained event (fAE) embeddings with single-class event semantics, coarse-grained event (cAE) embeddings with multi-class event semantics, and AR embeddings. Experiments show the proposed HGRL successfully integrates AE with AR for AEC and ARP tasks, while coordinating the relations between cAE and fAE and further aligning the two different grains of AE information with the AR.
... It is used as the basis for calculating psychoacoustic annoyance (PA), which combines metrics associated with hearing sensations of annoying sounds into a single linear value. The Zwicker and Fastl (1999) model [8] is used in this contribution: it multiplies the percentile loudness N5 of a signal with weights derived from the signal's sharpness (S) and temporal modulations according to the following equation: ...
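The snippet breaks off before the equation itself; for reference, the commonly cited Zwicker and Fastl form of psychoacoustic annoyance (which the cited contribution presumably follows) is

$$ PA = N_5\left(1 + \sqrt{w_S^2 + w_{FR}^2}\right), $$
$$ w_S = (S - 1.75)\cdot 0.25\,\log_{10}(N_5 + 10) \quad \text{for } S > 1.75\ \text{acum}, $$
$$ w_{FR} = \frac{2.18}{N_5^{0.4}}\,(0.4\,F + 0.6\,R), $$

where N5 is the percentile loudness in sone, S the sharpness in acum, F the fluctuation strength in vacil and R the roughness in asper.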
Conference Paper
Full-text available
Improving the listening experience of vibroacoustic scenarios is an important task in many industry applications, be it cabin noise inside a passenger car or machine noise in factory halls. Numerical models enable highly accurate predictions of such sound fields. These models, however, yield physical descriptions only. Within the scope of human perception, a description of the perceptual domain through psychoacoustic metrics and listening tests appears to be an important extension of the analysis. This leads to the necessity of physical prototypes or authentic auralizations of numerical solutions. This contribution aims to create a framework for the psychoacoustic analysis of auralized vibroacoustic problems and to investigate the benefit of perceptually evaluating auralizations in addition to physical metrics. For this purpose, a basic case study is conducted. First, the complex sound pressure is computed in front of a simple vibrating plate. Performing an inverse discrete Fourier transform on the resulting pressure yields a time signal that resembles an impulse response. This response is subsequently used for convolution with several different excitation signals in order to simulate a basic vibroacoustic scenario. Finally, a sensitivity analysis is performed for several perceptual characteristics with respect to geometrical changes of the vibrating plate and compared with sensitivities in the physical domain.
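The auralization step described in the abstract (inverse transform of the computed complex pressure, then convolution with excitation signals) reduces to a few lines; the sketch below uses placeholder data and assumes the pressure is given on the positive-frequency bins of a real FFT.

```python
import numpy as np

def auralize(pressure_spectrum, excitation):
    """Turn a simulated complex sound pressure spectrum into an audible signal.

    pressure_spectrum: complex pressure at the listening point for the
    positive-frequency bins (as produced by a numerical vibroacoustic model).
    excitation: time-domain excitation signal (e.g. a recorded machine noise).
    """
    # The inverse real FFT yields a time signal resembling an impulse response.
    impulse_response = np.fft.irfft(pressure_spectrum)
    # Convolving the excitation with this response simulates listening
    # in front of the vibrating structure.
    return np.convolve(excitation, impulse_response)

# Illustrative use with placeholder data:
spectrum = np.random.randn(513) + 1j * np.random.randn(513)
excitation = np.random.randn(16000)
signal = auralize(spectrum, excitation)
```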
... This zone represents the range of parameter values that produce sonically acceptable or musically pleasing output. Hence, selection of the safety zone and of the ε_u and ε_o margins can be guided by psychoacoustic models [32], musical perception studies [14], or practical audio engineering guidelines [24]. ...
Preprint
Full-text available
Ableton Live, a popular digital audio workstation, can be viewed as an instance of Higher-Order Functional Reactive Programming (HFRP), where audio effects are modeled as higher-order functions and compositional relationships define the behavior of the system. This paper explores various analysis techniques applied within the context of Ableton Live, leveraging the Reactive Normal Form (RNF) framework to facilitate analysis and reasoning about the system's behavior. First, we introduce the concept of HFRP in Ableton Live, highlighting its compositional nature and the role of higher-order functions in modeling audio effects. We then discuss the Reactive Normal Form framework, which provides a structured approach to organizing and sequencing effects, enabling a more systematic analysis of the system. Next, we delve into specific analysis techniques that can be employed within the HFRP framework in Ableton Live. We explore the application of Hoare logic to establish invariants across time, allowing us to reason about the behavior of effect chains and verify certain properties that hold true throughout the system's execution. This formal approach enhances our understanding of the system's behavior and ensures the preservation of desired properties. Furthermore, with certain effects and the Reactive Normal Form (RNF) framework, we can harness the power of DSP analysis to obtain rich information about all possible audio outputs of a track, based on the initial audio and effect sequence. By analyzing the behavior of specific effects and their interactions within the RNF, we can gain valuable insights into the range of audio outputs that can be produced. Additionally, we explore the use of interval bounds on effect parameters to avoid undesirable sonic artifacts. By defining safety zones for parameter values, we can ensure that the system produces sounds that adhere to specific sonic standards or aesthetic guidelines. Applying interval analysis techniques allows us to estimate the proximity of parameters to their safety zones, providing insights into the range of acceptable values and avoiding dissonant, clipped, or harsh sounds. Overall, this paper showcases the applicability of various analysis techniques in the context of HFRP in Ableton Live.
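As a minimal illustration of the interval-bound idea, the sketch below checks an effect parameter against a safety zone with under- and over-margins (named after the ε_u and ε_o margins mentioned in the earlier snippet); the zone, margin values and parameter are hypothetical.

```python
def classify_parameter(value, zone, eps_u, eps_o):
    """Classify an effect parameter against its safety zone.

    zone: (low, high) interval of sonically acceptable values.
    eps_u / eps_o: under/over margins (hypothetical names following the
    epsilon_u and epsilon_o margins mentioned in the text).
    """
    low, high = zone
    if value < low - eps_u or value > high + eps_o:
        return "unsafe"    # expect artifacts (clipping, harshness, ...)
    if value < low or value > high:
        return "marginal"  # inside the margin, approaching the limit
    return "safe"

# Example: a filter resonance parameter with safety zone [0.1, 0.7].
for q in (0.05, 0.08, 0.4, 0.72, 0.9):
    print(q, classify_parameter(q, (0.1, 0.7), eps_u=0.05, eps_o=0.05))
```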
... In the latter study [36], we investigated "short-term" or "psychoacoustic" noise annoyance according to [37,38] for different situations of broadband sounds, including those emitted by WTs. The sounds covered a range of spectral shapes, depths of periodic AM, and occurrence (or absence) of random AM, and were subjectively rated under controlled laboratory conditions. ...
Chapter
It has been suggested that, for patients with obstructive sleep apnea (OSA), identification of the obstruction site in the upper airway is essential in the treatment decision-making process. Due to the fact that all existing techniques for the identification of the obstruction site have drawbacks, there is a continuous search for more feasible methods. Because of the filtering effect of the upper airway on snoring sound, snoring sound parameters are considered as potential predictors of the obstruction site in the upper airway. According to previous studies, using snoring sound parameters to predict the presence of epiglottic obstruction in OSA patients with single-level obstruction can achieve a relatively high accuracy (53.1–96%) as compared with velum, oropharynx, and tongue base obstructions, even though the exact role of the epiglottis in generating snoring sound is still unclear. However, further studies are needed to explore the possibility to predict the presence of epiglottic obstruction in OSA patients with multilevel obstruction based on the acoustic analysis of snoring sound, especially of snoring sound recorded during natural sleep. At this stage, snoring sound analysis does not seem to be a viable diagnostic modality for treatment selection.
Article
Full-text available
Across the millennia, and across a range of disciplines, there has been a widespread desire to connect, or translate between, the senses in a manner that is meaningful, rather than arbitrary. Early examples were often inspired by the vivid, yet mostly idiosyncratic, crossmodal matches expressed by synaesthetes, often exploited for aesthetic purposes by writers, artists, and composers. A separate approach comes from those academic commentators who have attempted to translate between structurally similar dimensions of perceptual experience (such as pitch and colour). However, neither approach has succeeded in delivering consensually agreed crossmodal matches. As such, an alternative approach to sensory translation is needed. In this narrative historical review, focusing on the translation between audition and vision, we attempt to shed light on the topic by addressing the following three questions: (1) How is the topic of sensory translation related to synaesthesia, multisensory integration, and crossmodal associations? (2) Are there common processing mechanisms across the senses that can help to guarantee the success of sensory translation, or, rather, is mapping among the senses mediated by allegedly universal (e.g., amodal) stimulus dimensions? (3) Is the term ‘translation’ in the context of cross-sensory mappings used metaphorically or literally? Given the general mechanisms and concepts discussed throughout the review, the answers we come to regarding the nature of audio-visual translation are likely to apply to the translation between other perhaps less-frequently studied modality pairings as well.
Conference Paper
Systematic reviews can provide valuable evidence to health care and health policy, especially, when clinically important effects of similar magnitude are observed. Whether these strong requirements are met in noise epidemiology is questionable. The medical model does not fully fit with the stress and context driven causal pathways through which the health effects are determined. Heterogeneity of effects is expected due to different background prevalence of vulnerabilities, health and disease. Exposure assessment is more demanding than in related areas (air pollution) due to the need of accounting for perceptional accuracy and sound control options. Furthermore, the applied A-weighted sound level indicators do not correlate in the same way with the actual nervous system arousal for all sound sources. Eventually, most research is of observational nature and randomization and placebo control is largely not feasible. Nevertheless, systematic reviews are indispensable for the further development of noise epidemiology. Furthermore, in order to compare the potential size of the adverse effects of noise on humans at the policy level the calculation of disability adjusted life years (DALYs) is required and dependent on high quality systematic reviews. Examples of limitations will be outlined in detail and suggestions are made for future improvements.
Article
Directly audible phase changes of three-tone complexes were determined for three frequency regions (225 Hz, 1050 Hz, 5250 Hz) as a function of the frequency distance between the components. Three different amplitude configurations were used at three levels (60 dB, 80 dB, 100 dB). The audibility limits of phase lead and phase lag of the highest-frequency component, with respect to the cosine configuration, were ascertained. The research was carried out with six subjects, using the method of constant stimuli. For small frequency distances between the components, phase changes of around 30° are perceptible. The difference limens increase when the distances between the components reach certain limits, depending on frequency and level.
Article
An estimate of the correlation between the sound pressure level and the loudness of speech is proposed. Speech-noise elicits the same loudness as a test sentence if the noise level corresponds to the maximum value of the fluctuating speech level. For the level measurements, A-weighting and "slow" meter damping are preferable. In order to halve the loudness of speech or of speech-noise, the level has to be decreased in both cases by almost 7 dB.
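For context, under the standard loudness-function assumption that stationary sounds above roughly 40 dB double in loudness per 10 dB level increase, halving would nominally require

$$ \frac{N_2}{N_1} = 2^{\Delta L / 10\,\mathrm{dB}} \;\Rightarrow\; \Delta L = -10\ \mathrm{dB}, $$

so the roughly −7 dB measured here for speech and speech-noise indicates that fluctuating signals reach half loudness with a distinctly smaller level reduction.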
Article
Masking of single, short test-tone impulses by critical band noise masker impulses is measured. Simultaneous, forward (post-) as well as backward (pre-) masking are treated. The dependence of the masked threshold on the delay time between test-tone impulse and masker impulse, test-tone duration, test frequency, masker level and masker duration is investigated. In this way, an atlas of temporal masking effects elicited by a single critical band noise masker impulse is produced. By means of these data, threshold values of test-tone impulses masked by critical band noise of various temporal structures can be estimated. In addition, some detailed results should be mentioned: the forward masking pattern of a critical band noise masker exhibits steeper slopes than both the backward and the simultaneous masking critical-band-rate patterns. Increasing the level of a critical band noise masker results in a flattening of the upper slope of its backward, simultaneous and forward masking patterns. For short delay times of the test-tone impulses, very short critical band noise masker impulses produce less backward as well as forward masking than longer maskers; simultaneous masking, however, is almost independent of masker duration. A transient masking pattern, i.e. masked thresholds as a function of both critical-band rate and time, visualizes the influence of a single critical band noise masker impulse. On this basis, thresholds of single short test-tone impulses masked by bursts of a critical band noise masker are estimated; the predicted thresholds yield a rough approximation of the data determined in control experiments.
Article
The method of compensation of difference tones has been used on two subjects with reference to three questions. The results can be summarized in the following generalizations: (a) the nonlinearity reacts independently of time; (b) the compensations of the difference tones of third, fifth and seventh order are independent of each other; (c) the difference tone to be compensated exists as a cochlear travelling wave, which is composed of many vectorial partials.
Article
Difference limens of phase are measured by presenting steady three-tone complexes via headphones or via a loudspeaker in different acoustic environments. When using a loudspeaker in an anechoic room, almost the same phase differences are audible as when using headphones with an equalizer. The mean difference limens of phase in a reverberant room are about twice as large as those in an anechoic chamber. If the same electric signals are reproduced acoustically by the same loudspeaker but in a reverberant room, the amplitudes and phases of the components are altered in such a way that the sound signals at the ear of the listener may be very different. The results of the psychoacoustic experiments reveal that in a reverberant room the signals at the listener's ear will, with high probability, on average be less "phase sensitive" than they are in an anechoic room. Thus the results of the headphone experiments represent some kind of "worst case" with respect to the audibility of phase changes.
Article
Sound pressure measurements with microphones have shown that their frequency response can be affected by mounting devices and floor stands. Using customary devices, an interference error between ±1 dB and ±2 dB can be expected. Especially in the frequency range between 1 kHz and 20 kHz, the microphone's frequency response is affected in a comb-filter-like manner. Theoretical sound-field calculations showed that errors of less than ±0.5 dB can hardly be attained, and errors of less than ±0.2 dB require selected equipment. For example, mounting a 1/4-inch microphone on a 40 cm boom, fixed to the leg (10 mm diameter) of a vertical floor stand, leads to an interference error of ±0.8 dB. An additional error of ±1 dB may occur due to microphone mounting devices (plastic clips). Covering floor stands with absorbing material is not recommended, since the interference error may thereby be increased. These findings are important with respect to measuring standards (e.g. DIN 45 633), where maximum errors of the whole test set-up must not exceed ±1 dB. Best results are obtained by mounting the microphone on a flexible extension rod bent to an angle of approximately 120°. In this case interference errors amount to less than ±0.2 dB.
Chapter
Critical bandwidth represents a very basic magnitude in psychoacoustics. Many hearing sensations such as pitch, loudness and timbre are easily described when the physical frequency scale is transformed into the psychophysical critical-band rate scale. The usefulness of the critical band concept has been proposed and confirmed in many reviews of psychoacoustics (e.g. Zwicker and Feldtkeller, 1967; Scharf, 1970; Zwicker, 1975; Plomp, 1976; Moore, 1982; Zwicker, 1982). As a database for the critical bandwidth, the classic paper by Zwicker, Flottorp and Stevens (1957) is usually referenced; there, the critical band is also related to the concept of the critical ratio as proposed by Fletcher (1940) and others. In an international ISO committee, the numerical values of the critical bands were agreed upon and published by Zwicker (1961). For these numerical values of the critical band, Zwicker and Terhardt (1980) suggested analytical expressions in order to facilitate the incorporation of the critical band concept into computer models of hearing. An alternative formula was suggested by Moore and Glasberg (1983), describing the equivalent rectangular bandwidth, ERB. The ERB is based on a symmetric auditory filter as derived by Patterson (1976) from masking experiments with notched-noise maskers. A detailed description of the ERB concept is given in Patterson et al. (1982); the formula of Moore and Glasberg (1983) was fitted to the notched-noise data of Fidell et al. (1983), Patterson (1976), Patterson et al. (1982) and Weber (1977).
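For reference, the analytical expressions mentioned above are, in their commonly cited forms: the Zwicker and Terhardt (1980) critical bandwidth and critical-band rate

$$ \Delta f_G = 25 + 75\left(1 + 1.4\,(f/\mathrm{kHz})^2\right)^{0.69}\ \mathrm{Hz}, $$
$$ z/\mathrm{Bark} = 13\,\arctan(0.76\,f/\mathrm{kHz}) + 3.5\,\arctan\!\left((f/7.5\,\mathrm{kHz})^2\right), $$

and the Moore and Glasberg (1983) equivalent rectangular bandwidth

$$ \mathrm{ERB} = 6.23\,f^2 + 93.39\,f + 28.52\ \mathrm{Hz}, \qquad f\ \text{in kHz}. $$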
Article
Sixty short pieces of different kinds of music (about 5 s each) were presented to eleven subjects. The subjects reproduced the perceived rhythm of the music on a Morse-key. The results showed significant accumulations of the reproduced time intervals at values of 125, 250, and 500 ms. A comparison with results of the rhythm of speech, which are published in a companion paper, yielded significant differences in the perceived rhythm of speech and music.
Article
Most efforts to transmit information about speech sounds to the profoundly deaf by mechanical stimulation of the skin surface have not been successful, because the transducer elements cannot be kept small enough and their power consumption requires large batteries. Thus, efficient and wearable tactile hearing aids for use in everyday life are not yet available. Ring-shaped transducer elements using thin piezoelectric polyvinylidene fluoride films for stimulating the phalanges of one hand have a large usable dynamic range (30 dB), low power consumption (100 µW at threshold and f = 200 Hz) and very low weight (below 2 g) and volume. The mechanical sturdiness and excellent long-term stability of these transducers make it possible to develop a small, wearable tactile hearing aid that can also be used by deaf pre-school children.
Article
German, English, French and Japanese test sentences and sentences reproduced in reverse were presented to subjects. After presentation, the subjects reproduced the perceived rhythm on a Morse-key. The German, English, and French sentences showed perfect correspondence in the numbers of auditory events and syllables. The rhythm patterns of the Japanese test sentences were similar to the rhythm patterns of the German, English, and French sentences with respect to time intervals between instants of events. The reversed speech yielded what was virtually the temporally-inverted rhythm pattern of the forward speech.
Article
The cochlear potential (CP) of human subjects in response to low-frequency Gaussian-shaped condensation and rarefaction impulses was recorded at the external ear canal and averaged. This CP can be separated into three components: the microphonic potential (MP), the summating potential (SP) and the compound action potential (CAP). The time course of the MP is a non-linear representation of the differentiated time course of the sound pressure at the eardrum, and the extent of the non-linearity is strongly subject-dependent. The time course of the CAP for both condensation and rarefaction pulses shows that the probability of the production of an action potential is high when the corresponding differentiated MP goes through zero in the negative direction. The CAP can be separated into contributions from different, non-overlapping areas of the basilar membrane (BM). The latencies of these contributions increase with increasing distance from the oval window, corresponding to the increasing delay in the arrival time of the travelling wave. The measured values for this increase in latency are, however, clearly smaller than the values given in the literature for "click" responses in the normal ear, but correspond closely to the values given for ears showing recruitment. In the latter case it is assumed that the latency increase measures only the delay of the travelling wave on the BM and not the response time of the "second filter". The SP has a triangular time course, whose duration increases with impulse length but whose amplitude is related to the amplitude of the sound impulse in a highly non-linear fashion. The maximum attainable amplitude of the SP differs greatly from subject to subject.
Article
A system has been designed and built in which the sound level and tone of a music or speech signal are adapted to the existing background noise. The interior noise of a car was chosen as an example. Initial psychoacoustic experiments showed that the level of the desired signal was judged acceptable when the ratio of the peak sound intensities of the desired signal and the interfering noise was on average 2:1. The realized system uses three frequency bands, with separation frequencies of 300 Hz and 200 Hz, within the control loop. These ensure, on the one hand, that the loudness of the desired signal is held in a narrow range above its audibility threshold in the noise and, on the other hand, that deficiencies in the tone quality are corrected.
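The control rule described, holding the desired signal's peak intensity at about twice that of the noise (a 2:1 intensity ratio corresponds to about 3 dB), can be sketched per band as follows; the function and parameter names are illustrative.

```python
import numpy as np

def band_gain(signal_band, noise_band, ratio_db=3.0):
    """Gain keeping the desired signal's peak intensity about twice
    (~3 dB above) the interfering noise's peak intensity in one band.

    A 2:1 intensity ratio corresponds to 10*log10(2) ~ 3 dB.
    """
    peak_sig = 10 * np.log10(np.max(signal_band ** 2) + 1e-12)
    peak_noise = 10 * np.log10(np.max(noise_band ** 2) + 1e-12)
    target = peak_noise + ratio_db            # desired peak level of the signal
    return 10 ** ((target - peak_sig) / 20)   # linear amplitude gain to apply

# Example with placeholder band signals:
rng = np.random.default_rng(0)
music_band = 0.1 * rng.standard_normal(8000)
noise_band = 0.5 * rng.standard_normal(8000)
adjusted = band_gain(music_band, noise_band) * music_band
```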
Article
Thresholds of amplitude modulation and of frequency modulation have been measured for 6 subjects at tone frequencies of 0.25, 1, and 4 kHz, using a modulation frequency of 4 Hz, as a function of the level of the tones in quiet as well as with additional broadband or narrow-band masking noise. The results indicate that modulation thresholds remain unchanged by masking for tone levels exceeding the excitation level of the masking sound by more than 25 dB.
Article
A model to describe the spectral pitch of pure tones is derived from psychoacoustically measured results of pitch shifts, resulting from level changes or partial masking of sinusoids. The magnitudes constituting the model are based on the masking-patterns of the sounds employed during the experiments. It is shown that spectral pitch of pure tones can be calculated on the basis of three magnitudes derived from the masking-patterns. Control experiments and a comparison with data from the literature show a quantitatively correct description of spectral pitch by this model. In addition, neurophysiological data and the pitch effects examined show qualitative agreement.
Article
In order to investigate pleasantness, i.e. sensory consonance, two procedures were carried out. In the first investigation, 17 selected test sounds were compared pair-wise with regard to their pleasantness. In the second investigation, the pleasantness of the test sounds was estimated directly. The test sounds were the following: vacuum cleaner, coffee grinder, telephone bell, typewriter, running water tap, a drill, a circular saw, a car, a motorbike, an aeroplane, a woman's voice, a man's voice, a piece of music, a chord, bell noise, an amplitude-modulated sinusoidal tone, white noise, and a pure tone. In both experiments all test sounds were presented at a loudness level of 78 phon. In both investigations the voices were consistently rated as pleasant: the musical chord was judged as sounding most pleasant, the noise of the circular saw as least pleasant. An investigation of the connection between pleasantness and the fundamental hearing sensations yielded a high correlation between the experimentally determined pleasantness and a combination of the estimated values of the sensations of roughness, sharpness and tonality. Roughness and sharpness influence pleasantness negatively, while tonality represents a positive influence.
Article
The behaviour of the auditory system in the perception of small changes in frequency and amplitude was investigated. The kind of presentation was found to influence the perceptual resolution strongly. It is necessary to distinguish between sound differences and sound variations. 'Differences' refer to sounds that are isolated in time during presentation, while 'variations' are changes of sound properties that take place continuously. Just Noticeable Differences (JNDs) and Just Noticeable Variations (JNVs) were measured for changes both in frequency and in amplitude for sinusoids and broadband stimuli. The ear was found to resolve frequency differences which are about one third as large as the smallest resolvable frequency variations, independent of stimulus bandwidth and level.
Article
Masking patterns produced by critical-band wide noise, by one tone, and by complexes of two, three and five tones of close frequency are compared. The levels of the difference tones necessary for the compensation of internally produced difference tones of odd order are measured as a function of both the level and the frequency separation of the two primary tones. A comparison between the results of the two series of measurements indicates that the nonlinearity of the hearing system influences its frequency selectivity in such a way that sinusoidal tones reflect the largest, but level-dependent, selectivity; narrow-band noises, however, as a consequence of the additional spectra produced by the ear, simulate a less pronounced but almost level-independent frequency selectivity.
Chapter
An interesting characteristic of human hearing is the frequency distance of extreme values of all three kinds of otoacoustic emissions: spontaneous (SOAE), simultaneously evoked (SEOAE), and delayed evoked (DEOAE). This preferred frequency distance was described as being frequency-dependent. Transformed into the critical-band rate, however, it shows a quite constant value. The first data collection (Zwicker and Schloth, 1984) pointed towards a characteristic value of about half a critical bandwidth. Larger data collections (Dallmayr, 1985, 1987) for SOAE and SEOAE revealed this characteristic value to be close to 0.4 Bark. This value seems to be very stable and quite independent of the subject. The reason for it must be hidden somewhere in cochlear mechanics. It is the goal of this paper to search for such a relation.
Chapter
Modulation of a sound elicits one of two different kinds of auditory sensation, depending on the speed of modulation. In the case of low modulation frequencies (typically less than about 20 Hz) the resultant sensation is called fluctuation strength (see Terhardt, 1968). Faster modulation of sounds leads to the perception of roughness (v. Békésy, 1935). As yet, fluctuation strength has received considerably less attention than roughness. Terhardt (1968) and Schöne (1979) studied the dependence of fluctuation strength on some essential signal parameters, using sinusoidally amplitude-modulated pure tones. We have recently repeated and extended these earlier observations to include investigations of the fluctuation strength of AM broadband noise (Fastl, 1982a) and of FM tones.
Chapter
Studies concerned with the neural mechanisms of processing biologically relevant sounds are heavily burdened by the choice of an appropriate stimulus test repertoire. From the physical point of view, acoustic signals are determined by frequency and intensity parameters and their temporal courses. Various combinations of these parameters are also the components of bioacoustic signals and therefore have to be considered in the search for those parts of the signals which transmit the biologically relevant information. Our approach to such problems considers the parameter intensity, i.e. the course of a sound's amplitude. This parameter is the subject of research mostly because it obviously bears information about the distance and the spatial location of a sound source within the acoustic environment of individuals. However, regarding the communicative behavior of species, the amplitude of a sound often signifies emotional states and, beyond that, its course can also deliver intentional information used for intraspecific acoustic communication between the members of a species. The information-bearing parameters can be single or multiple as well as periodically repeated AM elements.
Article
For radiating short sound impulses with as little distortion as possible, an electrostatic loudspeaker was developed and constructed. A symmetrical driving arrangement was chosen, in which a thin graphite-coated plastic foil vibrates between two fixed perforated electrodes. To attain a sound field as planar as possible even at high frequencies, the vibrating surface of the loudspeaker, with a size of about one square metre, was subdivided. At low frequencies the surface vibrates in its entirety; at high frequencies the vibrations are restricted to a smaller partial area at the centre of the loudspeaker. After electrical equalization, a flat frequency response was attained between 200 Hz and 10 kHz. The maximum sound pressure attainable within this range is about 85 dB at a distance of 2 m.
Article
Six musical test sounds, each tuned in normal, stretched, and contracted tempered intonation, were subjectively evaluated by 25 musicians and 25 non-musicians in paired-comparison tests. The results indicate that none of the three intonations is optimal for every kind of musical sound. Rather, for tone sequences and for sounds of the type "high melody plus low accompaniment", stretched intonation was first-rate. In chords of medium spectral complexity, normal intonation was optimal, and in the case of high spectral complexity even contracted intonation was evaluated as suitable. It is concluded that "ideal" intonation must be flexible, i.e. adapting to the sound's structure at each moment. The relation of these findings to principles of pitch and roughness perception, musical performance, and the intonation of conventional as well as electronic keyboard instruments is discussed.