Figure 1 - uploaded by Tomasz Letowski
BCM locations investigated in the study.

Source publication
Conference Paper
The goal of the study was to assess intelligibility and quality of speech recorded through a bone conduction microphone (BCM) located at various points on the talker's head. Ten words spoken by a female and a male talker in a quiet environment were recorded through a BCM placed at eight different locations on the talker's head. The sound le...

Context in source publication

Context 1
... BCM locations used in the study were: the mastoid, collarbone, chin angle, forehead, Fz, vertex, inion, and the bone just above the temple (referred to hereafter as the temple). All locations are shown in Figure 1. The BCM was held in place on the talker's head with an adjustable headband. ...

Similar publications

Article
The aim of this study was to assess listeners' ability to localize virtual bone conduction (BC) audio signals delivered to three BC vibrator locations. Test signals were pre-processed with each participant's head related transfer function for 16 evenly-spaced horizontal locations and played back to the listener using a pair of BC vibrators placed e...
Article
Speech signals can be converted into electrical audio signals using either conventional air conduction (AC) microphone or a contact bone conduction (BC) microphone. The goal of this study was to investigate the effects of the location of a BC microphone on the intensity and frequency spectrum of the recorded speech. Twelve locations, 11 on the talk...
Article
The goal of the study was to assess intelligibility and quality of speech recorded through a bone conduction microphone (BCM) located at various points on the talker's head. Ten words spoken by a female and a male talker in a quiet environment were recorded through a BCM placed at eight different locations on the talker's head. The sound levels of...
Conference Paper
In this paper, we use voice activity detection to improve the de-noising ability of the previously proposed pre-image iteration speech enhancement method. We use a speech database consisting of two-channel recordings where the audio signal is recorded by both a bone conductive microphone and a close-talking microphone. The bone channel is used for...

Citations

... Using this unique property, they have already been employed in applications such as audio enhancement by fusion with a standard air conduction signal [43], human sound classification [45], and pitch detection [33]. While they require direct physical contact to be effective, the ear has been shown to be an effective site for bone conduction microphone placement [39], making them a natural fit for integration in tiny earbuds. Furthermore, the different frequency response imparted on the bone-conducted signal, while somewhat detrimental to applications targeting speech enhancement and reconstruction, provides a valuable differentiation between the wearer's voice and adjacent speakers, enabling personalized voice activity detection without enrollment data and significantly less processing overhead. ...
Preprint
The recent ubiquitous adoption of remote conferencing has been accompanied by omnipresent frustration with distorted or otherwise unclear voice communication. Audio enhancement can compensate for low-quality input signals from, for example, small true wireless earbuds, by applying noise suppression techniques. Such processing relies on voice activity detection (VAD) with low latency and the added capability of discriminating the wearer's voice from others - a task of significant computational complexity. The tight energy budget of devices as small as modern earphones, however, requires any system attempting to tackle this problem to do so with minimal power and processing overhead, while not relying on speaker-specific voice samples and training due to usability concerns. This paper presents the design and implementation of a custom research platform for low-power wireless earbuds based on novel, commercial, MEMS bone-conduction microphones. Such microphones can record the wearer's speech with much greater isolation, enabling personalized voice activity detection and further audio enhancement applications. Furthermore, the paper accurately evaluates a proposed low-power personalized speech detection algorithm based on bone conduction data and a recurrent neural network running on the implemented research platform. This algorithm is compared to an approach based on traditional microphone input. The performance of the bone conduction system, achieving detection of speech within 12.8 ms at an accuracy of 95%, is evaluated. Different SoC choices are contrasted, with the final implementation based on the cutting-edge Ambiq Apollo 4 Blue SoC achieving 2.64 mW average power consumption at 14 µJ per inference, reaching 43 h of battery life on a miniature 32 mAh Li-ion cell and without duty cycling.
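The power and battery figures quoted in this abstract (2.64 mW average power, a 32 mAh cell, 43 h of battery life) can be cross-checked with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, the 3.7 V nominal cell voltage is an assumption that the abstract does not state, and regulator losses and capacity derating are ignored.

```python
# Back-of-the-envelope check of the battery figures quoted in the abstract.
# Function names are illustrative; the nominal voltage is an assumption.

def battery_life_hours(capacity_mah: float, voltage_v: float, avg_power_mw: float) -> float:
    """Ideal runtime: stored energy (mWh) divided by average power (mW)."""
    energy_mwh = capacity_mah * voltage_v
    return energy_mwh / avg_power_mw


def implied_voltage_v(capacity_mah: float, avg_power_mw: float, life_hours: float) -> float:
    """Average cell voltage that would make the quoted figures consistent."""
    return avg_power_mw * life_hours / capacity_mah


if __name__ == "__main__":
    CAPACITY_MAH = 32.0       # from the abstract
    AVG_POWER_MW = 2.64       # from the abstract
    QUOTED_LIFE_H = 43.0      # from the abstract
    ASSUMED_VOLTAGE_V = 3.7   # assumption: typical Li-ion nominal voltage

    print(battery_life_hours(CAPACITY_MAH, ASSUMED_VOLTAGE_V, AVG_POWER_MW))
    print(implied_voltage_v(CAPACITY_MAH, AVG_POWER_MW, QUOTED_LIFE_H))
```

With the assumed 3.7 V, the ideal runtime comes out a little under 45 h, and the quoted 43 h corresponds to an average cell voltage of roughly 3.55 V, so the quoted figures are mutually plausible under these assumptions.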
... 3,4 Especially in noisy factories, subways, and city streets, measurement of clear and high-quality voice through air conduction is challenging. As a result, bone-conduction approaches represented by various types of throat microphones 5,6 have been developed to accommodate the needs of voice capture in noisy environments. In addition, advanced throat microphones can pick up very gentle speech such as murmurs or whispered voices, which are almost inaudible even at a short distance. ...
... for tasks such as covert military operations and assisting patients with laryngeal injury. 5,7 Besides, many clinically relevant parameters such as phonation time, fundamental frequency, and sound pressure level can be derived from throat vibration signals, enabling noninvasive approaches for quantifying vocal states and for diagnosis and treatment of voice disorders. [9][10][11] However, the majority of throat microphones are based on rigid components that require extra fixtures to ensure close contact with wearers' necks. ...
Article
Wearable flexible sensors attached on the neck have been developed to measure the vibration of vocal cords during speech. However, high-frequency attenuation caused by the frequency response of the flexible sensors and absorption of high-frequency sound by the skin are obstacles to the practical application of these sensors in speech capture based on bone conduction. In this paper, speech enhancement techniques for enhancing the intelligibility of sensor signals are developed and compared. Four kinds of speech enhancement algorithms based on a fully connected neural network (FCNN), a long short-term memory (LSTM), a bidirectional long short-term memory (BLSTM), and a convolutional-recurrent neural network (CRNN) are adopted to enhance the sensor signals, and their performance after deployment on four kinds of edge and cloud platforms is also investigated. Experimental results show that the BLSTM performs best in improving speech quality, but is poorest with regard to hardware deployment. It improves short-time objective intelligibility (STOI) by 0.18 to nearly 0.80, which corresponds to a good intelligibility level, but it introduces latency as well as being a large model. The CRNN, which improves STOI to about 0.75, ranks second among the four neural networks. It is also the only model that is able to achieve real-time processing with all four hardware platforms, demonstrating its great potential for deployment on mobile platforms. To the best of our knowledge, this is one of the first trials to systematically and specifically develop processing techniques for bone-conduction speech signals captured by flexible sensors. The results demonstrate the possibility of realizing a wearable lightweight speech collection system based on flexible vibration sensors and real-time speech enhancement to compensate for high-frequency attenuation.
... It is a transducer applied to the skin surrounding the larynx to pick up speech signals transmitted through the skin, and hence, it is relatively unaffected by environmental distortions. Another representative non-air conduction detector is a bone conduction microphone [4]. This device obtains the speech signal by picking up the vibration of the vocal cords that is transmitted to the skull. ...
Article
Language has been one of the most effective ways of human communication and information exchange. To solve the problem of non-contact robust speech recognition, recovery, and surveillance, this paper presents a speech recovery technology based on a 24 GHz portable auditory radar and webcam. The continuous-wave auditory radar is utilized to extract the vocal vibration signal, and the webcam is used to obtain the fitted formant frequency. The traditional formant speech synthesizer is selected to synthesize and recover speech, using the vocal vibration signal as the sound source excitation and the fitted formant frequency as the vocal tract resonance characteristics. Experiments on reading single English characters and words are carried out. Using microphone records as a reference, the effectiveness of the proposed speech recovery technology is verified. Mean opinion scores show a relatively high consistency between the synthesized speech and original acoustic speech.
... It can be used as a supplement to AC speech to improve the accuracy of the SR system by enhancing the speech. Placement of the BCM influences the intelligibility of the speech, and the literature describes various locations for placing the BCM [8][9][10]. Locations near the larynx result in higher intensity, while locations near the temple result in higher intelligibility [9]. ...
Article
A review of multimodal speaker recognition (SR) is presented. Speaker recognition has been studied for many decades and still attracts the interest of many researchers. Speaker recognition consists of two stages: system training and system testing. The robustness of a speaker recognition system depends on the training and testing environments as well as on the quality of the speech. Air-conducted (AC) speech is the source from which the speaker is recognized by extracting features, so the performance of the speaker recognition system depends on AC speech. To further improve the robustness and accuracy of the SR system, various other sources (modalities), such as a throat microphone, bone conduction microphone, microphone array, non-audible murmur, and non-auditory information such as video, are used to complement the standard AC microphone. This paper is purely a review of SR and its various complementary modalities.
... To control for additional factors and examine possible interactions, we recorded from two bone microphone locations and under two background noise conditions in this study. Because speech intelligibility over BC is known to vary with background noise (Gripper et al., 2007; McBride et al., 2008a; Osafo-Yeboah et al., 2009) and with the locations at which sound is presented to, or recorded from, the skull (Tran et al., 2008; Stanley and Walker, 2009; McBride et al., 2011; Hodges and McBride, 2012; Tran et al., 2013), we controlled for these factors when examining morphological effects. ...
Article
In bone conduction (BC), acoustic signals travel through an individual's bones and soft tissues rather than travelling through the air. While bone conduction hearing and communication are important in everyday life, nature, and technology, little is known about how individual differences affect the transmission of bone-conducted sound. Individuals differ in the sizes, shapes, and proportions of their craniofacial bones, leading to potentially different bone-conducted sound transmission effects in different individuals. Individual differences may influence the audibility and quality of bone-conducted sound, and this was studied using speech intelligibility as an assessment criterion for bone-conducted sound transmission. Thirty-two human participants were first subjected to a series of anthropometric craniofacial measurements. Eight morphologically diverse talkers were recorded with bone microphones placed at different skull locations, and 24 morphologically diverse listeners listened to these samples over bone conduction headphones. Modified Rhyme Test results suggest that skull morphology influences BC speech intelligibility and does so differently at different skull locations. Understanding morphological effects can improve bone conduction sound transmission models and may help to enhance BC technology for a diverse user population.
... Some studies focused on an ear-insert type microphone and found that it can record voice in a noisy environment [4,5]. Tran et al. investigated the effective location of a BCM and reported that the forehead seems to be the best location for recording BCV [6]. However, since the intelligibility and quality of BCV depend on individual differences, such as the shape of the skull and the thickness of the skin, a method is needed that does not have to account for these differences. ...
Article
We investigated a new communication-aid system focused on bone conduction through a tooth, for listening to and recording voices. In this paper, we developed a tooth-conduction microphone (TCM) and evaluated the articulation of tooth-conducted voice (TCV). Because the TCM has the shape of one's dental mold, it is wearable like a mouthpiece. Moreover, it can extract tooth vibration during phonation as TCV. To evaluate the articulation of TCV, we adopted monosyllable articulation for subjective assessment and linear predictive coding cepstral distance for objective assessment. The articulation results show that TCV is not sufficiently clear compared to air-conducted voice. However, it is confirmed that TCV is robust to environmental noise because the accuracy rate does not decrease when the TCV is recorded under high ambient noise.
... Most published bone conduction studies look at only one side of the BC system, examining recording or listening alone, or they use hybrid AC/BC systems (e.g., Gripper et al., 2007; McBride et al., 2008a; Tran et al., 2008; Stanley and Walker, 2009; but see McBride et al., 2011). In contrast, our study used a full BC-to-BC communication pathway. ...
... Bone conduction recordings made from the forehead yielded greater intelligibility than those made from the mandibular condyle. These results agree with previous work, which has also found the forehead to be a superior location for recording intelligible bone-conducted speech (Tran et al., 2008; McBride et al., 2011). Taken together, these results make a strong case for the forehead as the most favorable microphone location for bone conducted speech intelligibility, and that the forehead location should be preferred if such placement is practical. ...
... Our study found no significant effect for talker age or sex on the intelligibility of forehead-recorded speech. In a BC-to-AC study, Tran et al. (2008) found forehead-recorded male speech to be significantly more intelligible, but only one male and one female talker were compared in that study, and the talkers' vocal traits were not reported. In a comparison between two male talkers and two female talkers, recorded on the forehead and played back on the condyle (BC-to-BC), Listeners' age, race, and regional origin did not predict speech intelligibility for either forehead or condyle recordings in our study. ...
Article
Bone conduction (BC) communication systems provide benefits over air conduction systems but are not in widespread use, partly due to problems with speech intelligibility. Contributing factors like device location and background noise have been explored, but little attention has been paid to the role of individual user differences. Because BC signals travel through an individual’s skull and facial tissues, demographic factors such as user age, sex, race, or regional origin may influence sound transmission. Vocal traits such as pitch, spectral tilt, jitter, and shimmer may also play a role. Along with microphone placement and background noise, these factors can affect BC speech intelligibility. Eight diverse talkers were recorded with bone microphones on two different skull locations and in different background noise conditions. Twenty-four diverse listeners listened to these samples over BC and completed Modified Rhyme Tests for speech intelligibility. Forehead bone recordings were more intelligible than condyle recordings. In condyle recordings, female talkers, talkers with high fundamental frequency, and talkers in background noise were understood better, as were communications between talkers and listeners of the same regional origin. Listeners’ individual traits had no significant effects. Thoughtful application of this knowledge can help improve BC communication for diverse users.
... Therefore, a BCM is not sensitive to external interference, such as background noise. Moreover, the quality of speech collected from a BCM varies as the position of the BCM changes [4]. Although the BCM is robust to noise, several studies have pointed out its weakness in the high-frequency range of speech, which is caused by conduction path loss [5]. ...
... The HRE-5673 situates its bone vibrator and bone microphone on external skull locations (mandibular condyle and forehead, respectively) that are known to be excellent for speech transmission. The forehead has been found to be the best external skull location for speech clarity from bone microphones (Tran et al., 2008, McBride et al., 2011), and the condyle has been found to be the most sensitive external skull location for receiving bone conducted signals (McBride et al., 2008b). A possible reason for the EM20N-T earpiece's improved performance over the HRE-5673 may be the relative ease of use. ...
Technical Report
Difficult environments, such as Chemical, Biological, Radiological, Nuclear, and Explosive (CBRNE) environments, pose a unique communication challenge. Effective communication is essential to stay safe in these environments, yet safety gear itself impedes communication. Personal protective equipment (PPE) (e.g., full-face respirators) and noisy decontamination devices (power sprayers, etc.) can impede successful speech transmission. Bone conduction communication systems are a promising solution. These systems are relatively insensitive to background noise and can capture speech directly from a user’s skull vibrations, before airborne speech is disrupted by a respirator. To assess the potential of bone conduction systems for use by encapsulated personnel, three communication systems were tested for speech intelligibility using the Modified Rhyme Test (MRT). Sixteen participants wore the M50 Joint Service General Purpose Mask (JSGPM) full-face respirator and communicated via radio using three different communication systems in two levels of background noise. A bone conduction earpiece performed best, followed by a mask-mounted bone conduction system. Both bone conduction systems outperformed the currently-fielded air conduction communication system. The results support the use of bone conduction technology for improved encapsulated communication, which may improve safety and effectiveness for CBRNE personnel. Results are discussed and recommendations are provided.
... These subjective ratings were from 1 (very unpleasant or annoying) to 5 (very pleasant). The same ratings were obtained in a repetition of the study using a larger number of participants and longer list of CAT items but only at eight locations (Tran et al., 2008). However, the best and worst speech discrimination ratings were more differentiated, e.g., forehead (88.2%) and temple (82.2%) versus collar bone (40.0%). ...
Article
Speech signals can be converted into electrical audio signals using either conventional air conduction (AC) microphone or a contact bone conduction (BC) microphone. The goal of this study was to investigate the effects of the location of a BC microphone on the intensity and frequency spectrum of the recorded speech. Twelve locations, 11 on the talker's head and 1 on the collar bone, were investigated. The speech sounds were three vowels (/u/, /a/, /i/) and two consonants (/m/, /ʃ/). The sounds were produced by 12 talkers. Each sound was recorded simultaneously with two BC microphones and an AC microphone. Analyzed spectral data showed that the BC recordings made at the forehead of the talker were the most similar to the AC recordings, whereas the collar bone recordings were most different. Comparison of the spectral data with speech intelligibility data collected in another study revealed a strong negative relationship between BC speech intelligibility and the degree of deviation of the BC speech spectrum from the AC spectrum. In addition, the head locations that resulted in the highest speech intelligibility were associated with the lowest output signals among all tested locations. Implications of these findings for BC communication are discussed.