Fig. 1: Main processing blocks in an assisted listening device


Source publication
Article
Full-text available
In everyday environments, we are frequently immersed in unwanted acoustic noise and interference while we want to listen to acoustic signals, most often speech. Technology for assisted listening is then desired to increase the efficiency of speech communication, reduce listener fatigue, or just allow for enjoying undisturbed sounds (e.g., music). F...

Similar publications

Conference Paper
Full-text available
A reacher is an assistive device used by individuals with arthritis, post-surgical total joint replacement, spinal injuries, and more for reaching items placed overhead or on the floor. A forearm support for a reacher device has been developed to decrease the amount of force distributed through the wrist, while also allowing for movement at t...
Article
Full-text available
The 2011 Census estimated that 2.7 crore people in India have some form of disability. Cerebral Palsy (CP) is one of the disabilities recognized by major acts in India. Cerebral palsy is a neurological condition affecting movement and muscle coordination. Assistive devices play an important role in improving the functionality of people with CP. The...
Conference Paper
Full-text available
This paper presents the design process of an exoskeleton for executing human fingers’ extension movement, intended for rehabilitation procedures and for use as an active orthosis. The Fingers Extending eXoskeleton (FEX) is a serial, under-actuated mechanism capable of executing fingers’ extension. The proposed solution is easily adaptable to any finger...
Article
Full-text available
Wearable robots assist individuals with sensorimotor impairment in daily life, or support industrial workers in physically demanding tasks. In such scenarios, low mass and compact design are crucial factors for device acceptance. Remote actuation systems (RAS) have emerged as a popular approach in wearable robots to reduce perceived weight and incr...
Article
Full-text available
Purpose: To describe the use of assistive devices and postural asymmetries in lying, sitting and standing positions in adults with cerebral palsy, and to analyze postural asymmetries and any associations with their ability to maintain or change position and time in these positions. Methods: A cross-sectional study based on data from the Swedish Cer...

Citations

... A number of two-channel (binaural) beamforming algorithms achieve narrow tuning by combining the signals at the left and right ears. 8 One issue with most previous solutions is that they sacrifice natural spatial information, which can counteract any improvements in SNR they provide, especially in complex situations with competing talkers. 9,10 Deep neural network approaches to sound segregation have made impressive leaps in recent years and under the right conditions can effectively isolate a sound of interest from a complex mixture. ...
... To compare to a widely used spatial processing algorithm, stimuli were also processed with a binaural MVDR beamformer. 8,30 To enable a direct comparison to the two-channel BOSSA approach, the binaural MVDR implementation used two virtual microphones (one per ear). Relative transfer function vectors aimed towards the target source angle were calculated for each ear using the same KEMAR HRTFs that were used in BOSSA. ...
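The snippet above outlines the RTF-steered binaural MVDR used as a baseline. A minimal per-frequency sketch of that kind of beamformer follows; the function and variable names, the toy two-channel setup, and the diagonal loading are illustrative assumptions, not the cited implementation:

```python
import numpy as np

def mvdr_weights(noise_cov, rtf, loading=1e-6):
    """MVDR weights for one frequency bin.

    noise_cov : (M, M) complex noise covariance matrix Phi_uu
    rtf       : (M,) relative transfer function vector d (reference mic = 1)
    """
    M = noise_cov.shape[0]
    # Diagonal loading keeps the matrix inversion well conditioned.
    phi_inv = np.linalg.inv(noise_cov + loading * np.eye(M))
    num = phi_inv @ rtf
    return num / (rtf.conj() @ num)  # w = Phi^{-1} d / (d^H Phi^{-1} d)

# Toy usage for a two-microphone (binaural) setup at one frequency bin:
rng = np.random.default_rng(0)
noise = rng.standard_normal((2, 1000)) + 1j * rng.standard_normal((2, 1000))
phi_uu = noise @ noise.conj().T / noise.shape[1]
d = np.array([1.0 + 0j, 0.8 * np.exp(-1j * 0.3)])  # hypothetical RTF
w = mvdr_weights(phi_uu, d)
assert np.isclose(w.conj() @ d, 1.0)  # distortionless toward the target RTF
```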
Preprint
Full-text available
Selective listening in competing-talker situations (restaurants, parties, etc.) is an extraordinarily difficult task for many people. For individuals with hearing loss, this difficulty can be so extreme that it seriously impedes communication and participation in daily life. Directional filtering is one of the only proven ways to improve speech understanding in competition, and most hearing devices now incorporate some kind of directional technology, although real-world benefits are modest, and many approaches fail in competing-talker situations. We recently developed a biologically inspired algorithm that is capable of very narrow spatial tuning and can isolate one talker from a mixture of talkers. The algorithm is based on a hierarchical network model of the auditory system, in which binaural sound inputs drive populations of neurons tuned to specific spatial locations and frequencies, and the spiking responses of neurons in the output layer are reconstructed into audible waveforms. Here we evaluated the algorithm in a group of adults with sensorineural hearing loss, using a challenging competing-talker task. The biologically inspired algorithm led to robust intelligibility gains under conditions in which a standard beamforming approach failed. The results provide compelling support for the potential benefits of biologically inspired algorithms for assisting individuals with hearing loss in “cocktail party” situations.
... The evaluation was based on the source-to-distortion ratio (SDR) [42]. From [23], [24], we utilized the covariance matrix of the interference signal and the relative transfer function (RTF) [43], [44] to estimate the beamformer's filter. The RTF was calculated using the RIR from the target source to each microphone. ...
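As a rough illustration of the RTF computation described in the snippet above (RIRs to transfer functions, normalized by a reference microphone), here is a short sketch; the array shapes and the reference-microphone convention are assumptions:

```python
import numpy as np

def rtf_from_rirs(rirs, n_fft=512, ref_mic=0):
    """Relative transfer functions from room impulse responses.

    rirs : (M, L) array, RIR from the target source to each of M microphones
    Returns an (M, n_fft // 2 + 1) array: ATFs normalized by the reference mic.
    """
    atf = np.fft.rfft(rirs, n=n_fft, axis=1)  # acoustic transfer functions
    return atf / atf[ref_mic]                 # RTF: reference mic becomes 1
```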
Article
Full-text available
In this paper, we present an enhanced method designed to facilitate sound field interpolation (SFI) for rotation-robust beamforming using unequally spaced circular microphone arrays (unes-CMAs). Unlike the previous approach, which necessitated an equally spaced circular microphone array (es-CMA), our method addresses the challenge of handling nonuniformly spaced microphones, making it suitable for real-world applications where unes-CMAs are more prevalent. Our proposed method enables the estimation of a virtual signal of an unes-CMA before rotation, derived from the observed signal after rotation. A modified SFI technique is utilized to compensate for the positional errors of microphones on an unes-CMA and to estimate a virtual signal at equally spaced positions after rotation. As an intermediate step, the previous SFI method is utilized to obtain equally spaced signals before rotation. Subsequently, the target signal of the unes-CMA before rotation is reconstructed, effectively achieving rotation-robust beamforming on the unes-CMA. Moreover, we provide an in-depth analysis of our proposed method's properties. We conducted simulated experiments, including online beamforming applications, to evaluate its performance. The experimental results demonstrated that our method effectively mitigates the adverse effects of unequal microphone placement, yielding significant improvements in estimating the signal before rotation under various conditions. Furthermore, our proposed method consistently outperformed the previous approach, significantly enhancing the performance of beamforming.
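The paper's exact interpolation scheme is not reproduced here, but for an *equally spaced* CMA the underlying idea can be sketched with a spatial DFT: rotating the sound field by an angle multiplies each circular-harmonic coefficient by a phase term. The following toy version assumes the field is spatially bandlimited and is only meant to illustrate that principle:

```python
import numpy as np

def rotate_escma(x, theta):
    """Rotate the sound field observed by an equally spaced circular array.

    x     : (M, T) array, one row per microphone (angles 2*pi*m/M)
    theta : rotation angle in radians
    """
    M = x.shape[0]
    modes = np.fft.fftfreq(M, d=1.0 / M)              # circular-harmonic orders m
    c = np.fft.fft(x, axis=0)                         # spatial DFT across microphones
    c_rot = c * np.exp(-1j * modes * theta)[:, None]  # phase-shift each mode
    return np.fft.ifft(c_rot, axis=0).real            # interpolated rotated signals
```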
... By virtue of the spatial information, multi-channel speech enhancement (MC-SE) can effectively extract the target speech from a noisy mixture and often leads to superior performance over the single-channel (SC) case [1,2]. Recently, with the advent of deep neural networks (DNNs), we have witnessed the proliferation of neural beamformers (NBFs), which have made significant progress over traditional spatial filters [3,4,5,6]. ...
... Speech enhancement (SE) is a longstanding, active area of speech research given its myriad applications in downstream tasks [1][2][3][4]. With the change in working patterns globally due to the COVID-19 pandemic, online remote meetings have become a mainstay in the working world. ...
Preprint
Full-text available
Recent work in the field of speech enhancement (SE) has involved the use of self-supervised speech representations (SSSRs) as feature transformations in loss functions. However, in prior work, very little attention has been paid to the relationship between the language of the audio used to train the self-supervised representation and that used to train the SE system. Enhancement models trained using a loss function which incorporates a self-supervised representation that shares exactly the language of the noisy data used to train the SE system show better performance than those which do not match exactly. This may lead to enhancement systems which are language specific and as such do not generalise well to unseen languages, unlike models trained using traditional spectrogram or time domain loss functions. In this work, SE models are trained and tested on a number of different languages, with self-supervised representations which themselves are trained using different language combinations and with differing network structures as loss function representations. These models are then tested across unseen languages and their performances are analysed. It is found that the training language of the self-supervised representation appears to have a minor effect on enhancement performance, the amount of training data of a particular language, however, greatly affects performance.
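A minimal sketch of the kind of SSSR-based loss the abstract refers to, using torchaudio's pretrained wav2vec 2.0 bundle as a stand-in for whichever representation was actually used; the layer index and the L1 distance are assumptions, not the paper's configuration:

```python
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE  # expects 16 kHz input
sssr = bundle.get_model().eval()

def sssr_feature_loss(enhanced, clean):
    """L1 distance between self-supervised features of enhanced and clean speech.

    enhanced, clean : (batch, time) waveforms at bundle.sample_rate
    """
    with torch.no_grad():  # the SSSR acts as a frozen feature extractor
        ref, _ = sssr.extract_features(clean)
    est, _ = sssr.extract_features(enhanced)
    # Compare a mid-level transformer layer; which layer works best is an open choice.
    return torch.nn.functional.l1_loss(est[6], ref[6])
```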
... Researchers have successfully applied adaptive beamformers (BFs) [1]- [3] to enhance a single speech signal in the captured signal by low-latency processing. Typically, they use the speech's direction-of-arrival (DOA) to estimate the acoustic transfer function (ATF) from the speaker to the microphones (the steering vector) based on a plane-wave assumption and use the estimated ATF to optimize the BF [3], [4]. A method based on generalized eigenvalue decomposition (GEV) has been developed to estimate the ATF accurately in reverberation (i.e., multipath environments) using the spatial covariance matrices (SCMs) of the signals [5], [6]. ...
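The plane-wave steering vector construction mentioned in the excerpt above can be sketched as follows for a 2-D array geometry; the function name and coordinate conventions are illustrative assumptions:

```python
import numpy as np

def planewave_steering(mic_pos, doa_deg, freq, c=343.0):
    """Plane-wave steering vector (far-field ATF model) for one frequency.

    mic_pos : (M, 2) microphone coordinates in meters
    doa_deg : source direction of arrival in degrees
    freq    : frequency in Hz
    """
    doa = np.deg2rad(doa_deg)
    unit = np.array([np.cos(doa), np.sin(doa)])  # direction toward the source
    delays = mic_pos @ unit / c                  # per-mic time advances (s)
    return np.exp(-2j * np.pi * freq * delays)   # steering vector d(f)
```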
Preprint
This paper introduces a novel low-latency online beamforming (BF) algorithm, named Modified Parametric Multichannel Wiener Filter (Mod-PMWF), for enhancing speech mixtures with an unknown and varying number of speakers. Although conventional BFs such as the linearly constrained minimum variance BF (LCMV BF) can enhance a speech mixture, they typically require such attributes of the speech mixture as the number of speakers and the acoustic transfer functions (ATFs) from the speakers to the microphones. When the mixture attributes are unavailable, estimating them by low-latency processing is challenging, hindering the application of these BFs to the problem. In this paper, we overcome this problem by modifying a conventional Parametric Multichannel Wiener Filter (PMWF). The proposed Mod-PMWF can adaptively form a directivity pattern that enhances all the speakers in the mixture without explicitly estimating these attributes. Our experiments show the proposed BF's effectiveness in terms of interference reduction ratios and subjective listening tests.
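For context, the conventional PMWF that the abstract modifies is commonly written as (a standard textbook form, not the paper's Mod-PMWF):

$$\bar{\mathbf{w}}_{\text{PMWF}} = \frac{\boldsymbol{\Phi}_{uu}^{-1}\,\boldsymbol{\Phi}_{ss}\,\mathbf{e}_{\mathrm{ref}}}{\mu + \operatorname{tr}\!\left(\boldsymbol{\Phi}_{uu}^{-1}\boldsymbol{\Phi}_{ss}\right)},$$

where Φss and Φuu are the speech and noise spatial covariance matrices, e_ref selects the reference microphone, and μ trades off noise reduction against speech distortion (μ = 0 recovers the MVDR beamformer, μ = 1 the multichannel Wiener filter).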
... where E{·} denotes the expectation operator and {·} H denotes the Hermitian transpose operator. The MVDR beamformer [28][29][30] aims at minimizing the noise power spectral density (PSD) while preserving the speech component in the reference microphone. It was shown in [28,29] that the MVDR beamformer is given by ...
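The formula referenced (shown in [28,29]) is presumably the standard RTF-based MVDR solution:

$$\mathbf{w}_{\text{MVDR}} = \frac{\boldsymbol{\Phi}_{uu}^{-1}\,\mathbf{d}}{\mathbf{d}^{H}\,\boldsymbol{\Phi}_{uu}^{-1}\,\mathbf{d}},$$

where d is the RTF vector of the target with respect to the reference microphone and Φuu is the noise PSD matrix.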
Preprint
There is an emerging need for comparable data for multi-microphone processing, particularly in acoustic sensor networks. However, commonly available databases are often limited in the spatial diversity of the microphones or only allow for particular signal processing tasks. In this paper, we present a database of acoustic impulse responses and recordings for a binaural hearing aid setup, 36 spatially distributed microphones spanning a uniform grid of 5 m × 5 m, and 12 source positions. This database can be used for a variety of signal processing tasks, such as (multi-microphone) noise reduction, source localization, and dereverberation, as the measurements were performed using the same setup for three different reverberation conditions (T60 ≈ 310, 510, and 1300 ms). The usability of the database is demonstrated for a noise reduction task using a minimum variance distortionless response beamformer based on relative transfer functions, exploiting the availability of spatially distributed microphones.
... speech distortion. Given a standard filter-and-sum beamformer [15] ...
... Following [17,15], and assuming the noise ū(t) and s(t) are uncorrelated, we can rewrite Φxx using (12) ...
... where Φuu(t) represents the undesired noise and interference covariance matrix. With (19), it can be shown [17,15] that the MVDR beamformer can be rewritten to ...
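Reading between the truncated snippets, the chain of reasoning is presumably the standard one: with uncorrelated speech and noise the observed covariance decomposes, and the MVDR weights can then be expressed through it (the Souden-style form, an assumption on our part):

$$\boldsymbol{\Phi}_{xx}(t) = \boldsymbol{\Phi}_{ss}(t) + \boldsymbol{\Phi}_{uu}(t), \qquad \bar{\mathbf{w}}_{\text{MVDR}}(t) = \frac{\left(\boldsymbol{\Phi}_{uu}^{-1}(t)\,\boldsymbol{\Phi}_{xx}(t) - \mathbf{I}\right)\mathbf{e}_{\mathrm{ref}}}{\operatorname{tr}\!\left(\boldsymbol{\Phi}_{uu}^{-1}(t)\,\boldsymbol{\Phi}_{xx}(t)\right) - M},$$

where M is the number of microphones and e_ref selects the reference channel.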
Preprint
Full-text available
Multi-frame algorithms for single-channel speech enhancement are able to take advantage of short-time correlations within the speech signal. Deep filtering (DF) recently demonstrated its capabilities for low-latency scenarios like hearing aids with its complex multi-frame (MF) filter. Alternatively, the complex filter can be estimated via an MF minimum variance distortionless response (MVDR) beamformer or an MF Wiener filter (WF). Previous studies have shown that incorporating algorithm domain knowledge using an MVDR filter might be beneficial compared to direct filter estimation via DF. In this work, we compare the usage of various multi-frame filters such as DF, MF-MVDR, and MF-WF for hearing aids (HAs). We assess different covariance estimation methods for both MF-MVDR and MF-WF and objectively demonstrate improved performance compared to direct DF estimation, significantly outperforming related work while improving runtime performance.
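For orientation, the single-channel MF-MVDR mentioned here has the same algebraic shape as its spatial counterpart but operates on a vector of consecutive STFT frames, with the speech interframe correlation vector γ playing the role of the steering vector (the standard form from the multi-frame filtering literature, not this paper's estimator):

$$\mathbf{w}_{\text{MF-MVDR}} = \frac{\boldsymbol{\Phi}_{uu}^{-1}\,\boldsymbol{\gamma}}{\boldsymbol{\gamma}^{H}\,\boldsymbol{\Phi}_{uu}^{-1}\,\boldsymbol{\gamma}}.$$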
... To achieve spatial selectivity, it is common to use the beamforming technique. Beamforming is a well-established method of designing spatiotemporal filters for array processing (Doclo et al., 2015;Van Trees, 2002;Van Veen and Buckley, 1988). For personal devices, a certain number of microphones can be used to differentiate sound from different directions. ...
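The simplest instance of the spatiotemporal filtering idea referenced in the excerpt above is a delay-and-sum beamformer, sketched below; the geometry conventions and frequency-domain alignment are illustrative choices, not any cited system:

```python
import numpy as np

def delay_and_sum(x, mic_pos, doa_deg, fs, c=343.0):
    """Steer a microphone array toward doa_deg by aligning and averaging.

    x       : (M, T) microphone signals
    mic_pos : (M, 2) microphone coordinates in meters
    fs      : sampling rate in Hz
    """
    doa = np.deg2rad(doa_deg)
    prop = -np.array([np.cos(doa), np.sin(doa)])  # propagation direction of the wave
    delays = mic_pos @ prop / c                   # per-mic propagation delays (s)
    delays -= delays.min()                        # keep all shifts non-negative
    M, T = x.shape
    freqs = np.fft.rfftfreq(T, d=1.0 / fs)
    X = np.fft.rfft(x, axis=1)
    # Fractional-sample alignment via phase shifts in the frequency domain.
    X_aligned = X * np.exp(2j * np.pi * freqs * delays[:, None])
    return np.fft.irfft(X_aligned.mean(axis=0), n=T)
```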
Article
Active noise control (ANC) systems are commonly designed to achieve maximal sound reduction regardless of the incident direction of the sound. When desired sound is present, the state-of-the-art methods add a separate system to reconstruct it. This can result in distortion and latency. In this work, we propose a multi-channel ANC system that only reduces sound from undesired directions, and the system truly preserves the desired sound instead of reproducing it. The proposed algorithm imposes a spatial constraint on the hybrid ANC cost function to achieve spatial selectivity. Based on a six-channel microphone array on a pair of augmented eyeglasses, results show that the system minimized only noise coming from undesired directions. The control performance could be maintained even when the array was heavily perturbed. The proposed algorithm was also compared with the existing methods in the literature. Not only did the proposed system provide better noise reduction, but it also required much less effort. The binaural localization cues did not need to be reconstructed since the system preserved the physical sound wave from the desired source.
... In many hands-free speech communication systems such as hearing aids, mobile phones and smart speakers, interfering sounds, ambient noise and reverberation may degrade the speech quality and intelligibility of the recorded microphone signals [1]. To enhance speech quality and intelligibility, many multi-microphone speech enhancement methods aiming at noise and interferer reduction and dereverberation have been proposed in the last decades [2], [3]. For many of these methods, both non-adaptive versions with time-invariant parameters as well as adaptive versions with time-varying parameters exist. ...
... The block diagram in Fig. 1 shows an overview of the complete algorithm. Note that the MIMO-WPE preprocessing stage does not significantly increase the computation time, since the wBLCMP filter can be computed efficiently from the MIMO-WPE filter: both are based on the convolutional signal model in (2) and can be derived using the ℓp-norm cost function in (12a) [24]. The RTF vector of the j-th source can then be estimated from the generalized eigenvalue decomposition of the dereverberated covariance matrix R_j,t of that source and the dereverberated covariance matrix R_v,j,t of all other sources and the background noise. ...
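The GEV-based RTF estimation step described above can be sketched as follows; covariance estimation and normalization details vary by paper, so this is the generic recipe rather than the cited one:

```python
import numpy as np
from scipy.linalg import eigh

def rtf_via_gev(R_s, R_v, ref_mic=0):
    """RTF of a source via generalized eigenvalue decomposition.

    R_s : (M, M) covariance of the (dereverberated) target-dominated signal
    R_v : (M, M) covariance of everything else (other sources + noise)
    """
    # Principal generalized eigenvector of the matrix pencil (R_s, R_v).
    eigvals, eigvecs = eigh(R_s, R_v)
    v = eigvecs[:, -1]          # eigh returns eigenvalues in ascending order
    a = R_v @ v                 # de-whitening recovers the ATF direction
    return a / a[ref_mic]       # normalize to the reference microphone
```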
Preprint
Interfering sources, background noise, and reverberation degrade speech quality and intelligibility in hearing aid applications. In this paper, we present an adaptive algorithm aiming at dereverberation, noise and interferer reduction, and preservation of binaural cues, based on the wBLCMP beamformer. The wBLCMP beamformer unifies the multi-channel weighted prediction error method (performing dereverberation) and the linearly constrained minimum power beamformer (performing noise and interferer reduction) into a single convolutional beamformer. We propose to compute the optimal filter adaptively by incorporating an exponential window into a sparsity-promoting ℓp-norm cost function, which enables tracking of a moving target speaker. Simulation results with successive target speakers at different positions show that the proposed adaptive version of the wBLCMP beamformer outperforms a non-adaptive version in terms of objective speech enhancement performance measures.
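The exponential window mentioned here typically amounts to a recursive covariance estimate of the form (a common convention; the paper's exact weighting may differ):

$$\mathbf{R}_t = \lambda\,\mathbf{R}_{t-1} + (1-\lambda)\,\mathbf{x}_t\mathbf{x}_t^{H}, \qquad 0 < \lambda < 1,$$

so that older frames are down-weighted geometrically and a moving talker can be tracked.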