Figure 3 - uploaded by Toni Hirvonen
1: Cone of confusion. θ_cc indicates the azimuth and φ_cc the elevation angle of the sound source within the cone of confusion.
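
As a rough illustration of the caption, the sketch below converts a conventional azimuth/elevation direction into a pair of cone-of-confusion angles. It assumes the common interaural-polar convention (θ_cc measured away from the median plane, φ_cc as the rotation around the interaural axis); the figure itself may use a different convention. The function name `to_cone_of_confusion` and the axis layout are hypothetical choices made for this example only.

```python
import math

def to_cone_of_confusion(azimuth_deg, elevation_deg):
    """Convert (azimuth, elevation) to assumed interaural-polar angles.

    Assumed input convention: azimuth 0 is straight ahead and grows
    toward the left ear; elevation 0 is the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Unit direction vector: x ahead, y toward the left ear, z up (assumed axes).
    x = math.cos(el) * math.cos(az)
    y = math.cos(el) * math.sin(az)
    z = math.sin(el)
    # theta_cc: angle away from the median plane; it selects the cone.
    theta_cc = math.asin(max(-1.0, min(1.0, y)))
    # phi_cc: rotation around the interaural axis; position on the cone.
    phi_cc = math.atan2(z, x)
    return math.degrees(theta_cc), math.degrees(phi_cc)
```

Under these assumptions, a source at 30 degrees azimuth in front and its mirror image at 150 degrees behind the listener share the same θ_cc and differ only in φ_cc, which is why interaural differences alone cannot distinguish them.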

Source publication
Article
Humans have the ability to perceive various spatial auditory attributes, such as the localization and width of sound sources. The study of spatial hearing is important not only in terms of basic perceptual research, but also because ever more sophisticated audio reproduction algorithms and systems are introduced to consumers. From such systems, lis...

Similar publications

Article
A Digilog Book is an augmented reality (AR) based next generation publication supporting both sentimental analog emotions and immersive digital contents to improve a user's experience. This paper enhances the Digilog Book authoring tool, ARtalet. This is a tangible user interface based immersive AR authoring tool providing an intuitive non-programm...
Article
The iterative process of masking minimisation when mixing multitrack audio is a challenging optimisation problem, in part due to the complexity and non-linearity of auditory perception. In this article, we first propose a multitrack masking metric inspired by the MPEG psychoacoustic model. We investigate different audio processing techniques to man...
Conference Paper
Recent advances in multimedia technology opened the path for individual manipulation of the different audio objects within a multichannel mix, for both sampling and karaoke applications. This requires the transmission of these objects as an additional information. Informed Source Separation (ISS) is an adequate framework for this problem. Its main...
Article
When browsing in virtual 3D environments, a complex sound field should be rendered from multiple audio sources. Since it offers good spatial representation and manipulation possibilities, ambisonics is preferred to other surround encoding systems as an intermediate format. In this paper, several ambisonic decoders are reviewed and designed, which may...

Citations

... However, it has been shown to depend on the signal emitted by the source, and on the spatial constellation of the source [Blauert 1997; Santala and Pulkki 2011]. Humans easily make mistakes larger than 15° when asked to describe the spatial distribution of a source, even in optimal listening conditions [Santala and Pulkki 2011], and with short broad-band clicks, the perceived width of broad sources narrows down to point-like auditory events [Hirvonen 2007]. This suggests that the spatial distribution of a sound source can be perceived relatively accurately only if the frequency content is relatively broad and if the sound lasts relatively long. ...
Article
Directional audio coding (DirAC) is a parametric time-frequency domain method for processing spatial audio based on psychophysical assumptions and on energetic analysis of the sound field. Methods to use DirAC in spatial sound synthesis for virtual worlds are presented in this article. Formal listening tests are used to show that DirAC can be used to position and to control the spatial extent of virtual sound sources with good audio quality. It is also shown that DirAC can be used to generate reverberation for N-channel horizontal listening with only two monophonic reverberators without a prominent loss in quality when compared with quality obtained with N-channel reverberators.
... In these cases, the perception corresponds well to the physical situation. When the frequency content is narrower, or the duration of the stimulus is short, the perceived widths of the sources are perceived to be narrower than in reality [PB82, CT03, Hir07, HP08]. When the frequency bands of a broad sound signal are presented using loudspeakers in different directions, the listener perceives the source to be wide, though not as wide as the loudspeaker ensemble is [Hir07]. ...
Chapter
Introduction; Concepts of spatial hearing; Basic spatial effects for stereophonic loudspeaker and headphone playback; Binaural techniques in spatial audio; Spatial audio effects for multichannel loudspeaker layouts; Reverberation; Modeling of room acoustics; Other spatial effects; Conclusion; Acknowledgements; References
... At least the LSO activity reveals the incoherence present near the signal frequency in the (N0Sπ). The BMLD phenomenon together with some binaural pitch phenomena has also been analyzed with an earlier version of the model in [52]. In the results, the N0Sπ case could be explained with variation of MSO output values. ...
... The mechanisms which compute the perceived direction from the cues were hypothesized in this article, but not implemented. It can be assumed that the hypothesized mechanism has to be augmented also with mechanisms which account for gathering the responses from different frequency channels, as it is known that the direction is not perceived separately for different frequency bands in all circumstances [52]. The implementation and tuning of this part of the model is left as a subject for future studies. ...
Article
Some recent neurophysical studies suggest that mammalian binaural decoding is based on count comparison. When a signal is presented earlier or with higher level to one ear, the neural signals are stronger in the auditory pathways leading to the contralateral hemisphere in such mechanisms. This paper describes functional count-comparison models of two brainstem nuclei, medial superior olive (MSO) and lateral superior olive (LSO), both of which exist in both hemispheres. The topology of the organs and the connections between them as presented in the current neuroanatomical studies are imitated in the functional model. The parameters of the functional models are selected to fit existing neurophysiological and psychoacoustical data. It is shown that the proposed MSO and LSO models are sensitive to interaural differences in time and level in a way that accounts for some known psychoacoustical phenomena.