
A smarter way to find pitch

January 2005


Conference: Proc. of Int. Computer Music Conf.


A SMARTER WAY TO FIND PITCH

Philip McLeod, Geoff Wyvill

University of Otago

Department of Computer Science

{pmcleod, geoff}@cs.otago.ac.nz

ABSTRACT

The 'Tartini' project at the University of Otago aims to use the computer as a practical tool for singers and instrumentalists. Sound played into the system is analysed fast enough to create useful feedback for teaching or, at a higher level, for practising musicians to refine their technique. Central to this analysis is the accurate determination of musical pitch.

We describe a fast, accurate and robust method for finding the continuous pitch in monophonic musical sounds. We employ a special normalised version of the Squared Difference Function (SDF) coupled with a peak picking algorithm. We show how to implement the algorithm efficiently. Inherent in our method is a 'clarity' estimate that measures to what extent the sound has a tone. This has already found application in showing defects in a violinist's bowing technique.

1. INTRODUCTION

Over the last three years at the University of Otago we have been investigating ways to use a computer to analyse sound and provide useful, practical feedback to musicians, both amateur and professional. We have informally dubbed this activity the Tartini Project, named for the violinist and composer Giuseppe Tartini. In 1714, Tartini discovered that when two related notes were played simultaneously on a violin, a third sound was heard. He taught his students to listen for this third sound as a device to ensure that their playing was in tune.

In a similar way, we use visual feedback from a computer to give musicians useful information that helps them improve their art. Our system can help beginners learn to hear musical intervals, and help professionals understand some of the subtle choices they need to make in expressive intonation.

Pitch is the perception of how high or low a musical note sounds. It can be treated as a frequency that corresponds closely to the fundamental frequency, $f_0$, or main repetition rate of the signal. Estimation of $f_0$ has a long history. It is used in speech recognition, in music information retrieval, and in handheld 'tuners' that help developing musicians to tune their instruments. Existing algorithms for pitch estimation include the Average Magnitude Difference Function (AMDF), Harmonic Product Spectrum (HPS), Log Harmonic Product Spectrum, Phase Vocoder, Channel Vocoder, Parallel Processing Pitch Detector [6], Square Difference Function (SDF) [1], Cepstral Pitch Determination [5], Subharmonic-to-Harmonic Ratio [8] and Super Resolution Pitch Detector (SRPD) [4].

We have already demonstrated that we can produce useful feedback in real time to musicians [3]. In particular, we have successfully displayed the shape of a professional violinist's vibrato and helped at least one amateur violinist to develop smoother changes in bow direction.

Once $f_0$ is known, a full harmonic analysis of the sound becomes possible in real time, and we can display many other aspects of a sound that are useful to a musician. In this paper, we deal only with the 'McLeod Pitch Method' (MPM), our latest and much improved method of finding the fundamental pitch.

MPM runs in real time at a standard 44.1 kHz sampling rate. It operates without low-pass filtering, so it can work on sounds with strong high harmonics, such as the violin, and it can reliably display pitch changes of one cent. MPM works well without any post-processing to correct the pitch, whereas post-processing is a common requirement in other pitch detectors.

The Tartini system has an option to equalise the levels of the signal to the sensitivity of the inner ear. Standard equal-loudness curves [7] are used. These attenuate the low frequencies, which are perceived less well, relative to frequencies around 3700 Hz, which are heard best. This helps move from a direct fundamental frequency estimate towards something more closely correlated with perceived pitch.

Existing pitch algorithms that work in the Fourier domain suffer from spectral leakage. This is because the finite window taken from the data does not always contain a whole number of periods of the signal. The common solution is to reduce the leakage with a windowing function [2] that smooths the data at the window edges, but this requires a larger window for the same frequency resolution. A similar problem occurs in some time-domain methods, such as autocorrelation, where a window containing a fractional number of periods produces maxima at varying locations depending on the phase of the input. MPM, however, introduces a method of normalisation that is less affected by these edge problems, because it keeps track of the terms on each side of the correlation separately.

To explain our fast calculation of the Normalised Square Difference Function (NSDF), we must first describe the relationship between an Autocorrelation Function (ACF) and the Squared Difference Function (SDF). This we do in Sections 2 and 3. The fast calculation depends on a standard method for computing the ACF [6]; how this is used is described in Section 6. The NSDF automatically generates an estimate of the clarity of the sound, which describes how tone-like it is. This is essentially the value of the chosen maximum of the function (Section 7).

2. AUTOCORRELATION FUNCTION

There are two main ways of defining the Autocorrelation Function (ACF). We will refer to them as type I and type II. When the type is not specified, we are referring to type II.

We define the type I ACF of a discrete signal $x_t$ as:

$$r_t(\tau) = \sum_{j=t}^{t+W-1} x_j x_{j+\tau} \tag{1}$$

where $r_t(\tau)$ is the autocorrelation function of lag $\tau$ calculated starting at time index $t$, and $W$ is the initial window size, i.e. the number of terms in the summation.

We define the type II ACF as:

$$r'_t(\tau) = \sum_{j=t}^{t+W-1-\tau} x_j x_{j+\tau} \tag{2}$$

In this definition the window size decreases with increasing $\tau$. This has a tapering effect, with a smaller number of non-zero terms being used in the calculation at larger $\tau$. Note that the type I and type II ACFs are the same for a zero-padded data set, i.e. $x_k = 0$ for $k > t + W - 1$.
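To make the difference between the two definitions concrete, the sketch below evaluates Equations 1 and 2 by direct summation. It assumes the signal is a plain Python list `x` indexed by sample number, and that for the type I form `x` holds at least $t + W + \tau$ samples. The names `acf_type1` and `acf_type2` are illustrative and are not part of Tartini.

```python
def acf_type1(x, t, W, tau):
    """Type I ACF, Equation 1: always W terms, whatever the lag.

    Reads samples up to x[t + W - 1 + tau], i.e. beyond the window.
    """
    return sum(x[j] * x[j + tau] for j in range(t, t + W))


def acf_type2(x, t, W, tau):
    """Type II ACF, Equation 2: only W - tau terms, all inside x[t:t+W].

    The shrinking number of terms is the tapering effect described above.
    """
    return sum(x[j] * x[j + tau] for j in range(t, t + W - tau))


if __name__ == "__main__":
    # With zero padding after the window, both types agree at every lag.
    padded = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
    print([acf_type1(padded, 0, 4, tau) for tau in range(4)])
    print([acf_type2(padded, 0, 4, tau) for tau in range(4)])
```

Both lines print `[30.0, 20.0, 11.0, 4.0]`. On data without zero padding, the type I sum also picks up samples beyond the window, so the two results differ.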

3. SQUARE DIFFERENCE FUNCTION

Again we define two types of discrete-signal Square Difference Function (SDF). The type I SDF is defined as:

$$d_t(\tau) = \sum_{j=t}^{t+W-1} (x_j - x_{j+\tau})^2 \tag{3}$$

and the type II SDF is defined as:

$$d'_t(\tau) = \sum_{j=t}^{t+W-\tau-1} (x_j - x_{j+\tau})^2 \tag{4}$$

As with the type II ACF, the window size decreases as we increase $\tau$. In both types of SDF, minima occur when $\tau$ is a multiple of the period, whereas the ACFs have maxima there. These minima and maxima do not always coincide. If we expand Equation 4, we see that there is an ACF inside the SDF calculation:

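$$d'_t(\tau) = \sum_{j=t}^{t+W-\tau-1} \left(x_j^2 + x_{j+\tau}^2 - 2 x_j x_{j+\tau}\right) \tag{5}$$

If we define

$$m'_t(\tau) = \sum_{j=t}^{t+W-\tau-1} \left(x_j^2 + x_{j+\tau}^2\right) \tag{6}$$

we can see that

$$d'_t(\tau) = m'_t(\tau) - 2 r'_t(\tau) \tag{7}$$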

When using the type II ACF, it is common to divide $r'_t(\tau)$ by the number of terms to counteract the tapering effect. However, this can introduce artifacts, such as sudden jumps when large changes in the waveform pass out of the edge of the window. Our normalisation method provides a more stable correlation function, even down to a window containing just two periods of a waveform.
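Equation 7 is easy to check numerically. The sketch below evaluates Equations 2, 4 and 6 over the same shrinking range and confirms that $d'_t(\tau) = m'_t(\tau) - 2r'_t(\tau)$ for a test sine. It assumes a plain Python list as input, and all helper names are hypothetical.

```python
import math


def _terms(t, W, tau):
    """Shared summation range j = t .. t+W-tau-1 of Equations 2, 4 and 6."""
    return range(t, t + W - tau)


def r_prime(x, t, W, tau):
    """Type II ACF, Equation 2."""
    return sum(x[j] * x[j + tau] for j in _terms(t, W, tau))


def m_prime(x, t, W, tau):
    """Sum of squares of both operands, Equation 6."""
    return sum(x[j] ** 2 + x[j + tau] ** 2 for j in _terms(t, W, tau))


def sdf_type2(x, t, W, tau):
    """Type II SDF, Equation 4."""
    return sum((x[j] - x[j + tau]) ** 2 for j in _terms(t, W, tau))


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * j / 8) for j in range(32)]  # period of 8 samples
    for tau in range(16):
        lhs = sdf_type2(x, 0, 32, tau)
        rhs = m_prime(x, 0, 32, tau) - 2 * r_prime(x, 0, 32, tau)
        assert math.isclose(lhs, rhs, abs_tol=1e-9)  # Equation 7
    print(sdf_type2(x, 0, 32, 8))  # tiny: a minimum at one full period
```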

4. NORMALIZED SQUARE DIFFERENCE FUNCTION

Once the SDF has been calculated at time $t$, we face the central problem of deciding which $\tau$ corresponds to the pitch. This is not always the overall minimum, but it is usually one of the local minima. Without a grasp of the range of the values, it is difficult to decide which minimum it is. We have discovered a useful way of normalising the values to simplify this problem. We define a Normalised Square Difference Function (NSDF) as follows:

$$n'_t(\tau) = 1 - \frac{m'_t(\tau) - 2 r'_t(\tau)}{m'_t(\tau)} \tag{8}$$

$$n'_t(\tau) = \frac{2 r'_t(\tau)}{m'_t(\tau)} \tag{9}$$

The greatest possible magnitude of $2r'_t(\tau)$ is $m'_t(\tau)$, i.e. $|2r'_t(\tau)| \le m'_t(\tau)$, because $2|x_j x_{j+\tau}| \le x_j^2 + x_{j+\tau}^2$ holds for every term of the sums. This puts $n'_t(\tau)$ in the range -1 to 1, where 1 means perfect correlation, 0 means no correlation and -1 means perfect negative correlation, irrespective of the waveform's amplitude. Equation 9 shows that this is the same as normalising the autocorrelation in the same fashion. Notice that $m'_t$ is a function of $\tau$, summed over exactly the same terms as $r'_t$, which minimises the edge effects of the decreasing window size. These normalised values simplify the problem of choosing the pitch period, because the range is well defined. We refer to the process of choosing the 'best' maximum as peak picking; our algorithm is given in Section 5. Normalisation also lets us define a clarity measure, which is discussed in Section 7.
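A direct $O(Ww)$ sketch of Equation 9 follows. It assumes the window starts at index 0 of a plain Python list; Section 6 covers the fast version. The name `nsdf` and the zero returned for a silent window are our own choices.

```python
import math


def nsdf(x, W, max_lag):
    """Type II NSDF of the window x[0:W], Equation 9: n'(tau) = 2 r'(tau) / m'(tau).

    Both sums run over the same shrinking range j = 0 .. W-tau-1, so every
    value lies in [-1, 1] whatever the signal's amplitude.
    """
    values = []
    for tau in range(max_lag):
        terms = range(W - tau)
        r = sum(x[j] * x[j + tau] for j in terms)            # r'(tau), Equation 2
        m = sum(x[j] ** 2 + x[j + tau] ** 2 for j in terms)  # m'(tau), Equation 6
        values.append(2.0 * r / m if m > 0 else 0.0)         # guard against silence
    return values


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * j / 20) for j in range(200)]  # period of 20 samples
    n = nsdf(tone, 200, 100)
    print(round(n[10], 6), round(n[20], 6))  # about -1 (half period) and 1 (one period)
```

Scaling `tone` by any non-zero constant leaves every NSDF value unchanged. This is the amplitude independence the text relies on for the clarity measure.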

An important property we have found useful in a time-domain pitch detection algorithm is what we call the Symmetry property. It means that, for a given $\tau$, the same number of evenly spaced samples are used on either side of time $t$, and that these samples are symmetric in terms of their distances from $t$. This maximises the cancellation of frequency deviations from opposite sides of time $t$, creating a frequency-averaging effect.

Equation 4 can be made to hold this property by simply shifting the centre to time $t$, yielding

$$d'_t(\tau) = \sum_{j=t-(W-\tau)/2}^{t+(W-\tau)/2-1} (x_j - x_{j+\tau})^2 \tag{10}$$

[Figure 1 plot, "NSDF Delay Space": NSDF correlation $n(\tau)$ against delay $\tau$ (in samples), for delays 0 to 1200.]

Figure 1. An NSDF graph showing the highest key maximum at 650, whereas the pitch has a period of 325. The graph is sprinkled with other unimportant local maxima.

5. PEAK PICKING ALGORITHM

The algorithm so far gives us correlation coefficients at integer $\tau$. We choose the first 'major' peak to represent the pitch period and take it to correspond to the fundamental frequency; this is not always the highest maximum. First, we find all of the useful local maxima, i.e. those whose $\tau$ could represent the period associated with the pitch. We will refer to these as key maxima. As Figure 1 shows, taking all the local maxima gives many unnecessary peaks that are of little use. We find that taking only the highest maximum between each positively sloped zero crossing and the following negatively sloped zero crossing works well for choosing key maxima. The maximum at delay 0 is ignored, and we start from the first positively sloped zero crossing. If there is a positively sloped zero crossing toward the end with no following negatively sloped zero crossing, the highest maximum so far is accepted, if one exists.
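The selection rule can be sketched as a single pass over a list of NSDF values. This sketch is our reading of the description, not the authors' code. Treating a value of exactly zero as non-positive, and the name `key_maxima`, are assumptions.

```python
def key_maxima(n):
    """Indices of the highest local maximum between each positively sloped
    zero crossing and the following negatively sloped zero crossing.

    The region around the lag-0 peak is skipped because nothing is recorded
    until the first positively sloped crossing has been seen.
    """
    keys = []
    best = None       # index of the highest local maximum in the current region
    started = False   # True once the first positively sloped crossing is seen
    for i in range(1, len(n) - 1):
        if n[i - 1] <= 0 < n[i]:
            # Positively sloped zero crossing: open a new positive region.
            started, best = True, None
        elif n[i - 1] > 0 >= n[i]:
            # Negatively sloped zero crossing: close the region.
            if started and best is not None:
                keys.append(best)
            best = None
        if started and n[i] > 0 and n[i - 1] < n[i] >= n[i + 1]:
            # Local maximum inside the current positive region.
            if best is None or n[i] > n[best]:
                best = i
    # A region still open at the end: accept its highest maximum, if any.
    if started and best is not None:
        keys.append(best)
    return keys


if __name__ == "__main__":
    n = [1.0, 0.5, -0.2, -0.5, 0.1, 0.6, 0.9, 0.7, -0.1,
         -0.3, 0.2, 0.4, 0.3, 0.5, 0.1, -0.2, 0.05, 0.3]
    print(key_maxima(n))  # [6, 13]
```

In the example, the smaller local maximum at index 11 is discarded in favour of index 13 in the same positive region. The final region, which is still rising at the end of the list, contributes nothing.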

In the example from Figure 1, this leaves us with three key maxima. Some spurious peaks can become key maxima. For example, if the value at $\tau = 720$ had crossed the zero line, it would add another maximum to our key list. However, such peaks are normally much smaller than the other key maxima, so the later part of the algorithm does not choose them.

Parabolic interpolation is used to find the positions of the maxima to a higher accuracy. This is done using the highest local value and its two neighbours.
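Below is a minimal sketch of three-point parabolic interpolation. It assumes the NSDF is stored in a list `y` and that `i` is the index of a key maximum; `parabolic_peak` is a hypothetical name. It uses the standard vertex formula for a parabola through three equally spaced points.

```python
def parabolic_peak(y, i):
    """Fit a parabola through (i-1, y[i-1]), (i, y[i]), (i+1, y[i+1]).

    Returns (position, value) of the parabola's vertex; needs 0 < i < len(y)-1.
    """
    a, b, c = y[i - 1], y[i], y[i + 1]
    curvature = a - 2.0 * b + c
    if curvature == 0.0:
        return float(i), b                 # collinear points: keep the sample
    delta = 0.5 * (a - c) / curvature      # offset of the vertex from i
    return i + delta, b - 0.25 * (a - c) * delta


if __name__ == "__main__":
    # Samples of -(x - 2.3)**2 + 5 at x = 1, 2, 3 (indices 1..3).
    y = [0.0, 3.31, 4.91, 4.51, 0.0]
    print(parabolic_peak(y, 2))  # close to (2.3, 5.0)
```

The refined position is a fractional lag, which is what allows a pitch resolution finer than one sample of delay.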

From the key maxima we define a threshold equal to the value of the highest maximum, $n_{max}$, multiplied by a constant $k$. We then take the first key maximum that is above this threshold and assign its delay, $\tau$, as the pitch period. The constant $k$ has to be large enough to avoid choosing peaks caused by strong harmonics, such as those in Figure 2, but low enough not to choose unwanted beat or sub-harmonic signals. Choosing an incorrect key maximum causes a pitch error, usually a 'wrong octave'. Pitch is a subjective quantity and cannot always be determined correctly. In special cases, different expert listeners will judge the pitch of a given note differently. We aim to report the pitch that the user or musician would agree with as often as possible. The value of $k$ can be adjusted to achieve this, and is usually in the range 0.8 to 1.0.
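The threshold rule can be sketched as follows. The sketch assumes the key maxima have already been refined by parabolic interpolation into `(tau, value)` pairs. The numbers in the usage example are made up for illustration and were not measured from Figure 2. The default `k = 0.9` is just one value from the quoted range.

```python
def choose_pitch_period(key_maxima, k=0.9):
    """Pick the pitch period from key maxima given as (tau, value) pairs.

    key_maxima must be ordered by increasing tau. Returns the first pair
    whose value reaches k * n_max, or None when there are no key maxima.
    """
    if not key_maxima:
        return None
    n_max = max(value for _, value in key_maxima)
    threshold = k * n_max
    for tau, value in key_maxima:
        # ">=" rather than ">" so that k = 1.0 still selects the highest peak.
        if value >= threshold:
            return tau, value
    return None  # not reached for 0 < k <= 1: the highest peak always qualifies


if __name__ == "__main__":
    # Hypothetical key maxima in the spirit of Figure 2: a strong second
    # harmonic gives a peak near half the true period of 190 samples.
    keys = [(95.2, 0.80), (190.1, 0.97), (380.0, 0.99)]
    print(choose_pitch_period(keys, k=0.9))  # (190.1, 0.97)
    print(choose_pitch_period(keys, k=0.8))  # (95.2, 0.8): a wrong-octave error
```

The two calls show the trade-off described above: lowering $k$ lets the harmonic peak at half the period win, which produces an octave error.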

[Figure 2 plot, "NSDF Delay Space": NSDF correlation $n(\tau)$ against delay $\tau$ (in samples), for delays 0 to 1200.]

Figure 2. A graph showing the NSDF of a signal with a strong second harmonic. The real pitch here has a period of 190, but close matches are made at half this period.

The pitch period is equal to the delay, $\tau$, at the chosen key maximum. The corresponding frequency is obtained by dividing the sample rate by the pitch period (in samples). We turn this into a note on the equal-tempered scale using:

$$\text{note} = \frac{\log_{10}(f / 27.5)}{\log_{10}\sqrt[12]{2}} \tag{11}$$

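where $f$ is the frequency in Hz. The result counts semitones above A0 (27.5 Hz), with the decimal part representing fractions of a semitone. Adding 21, the MIDI note number of A0, gives the corresponding note on the MIDI scale.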
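The period-to-note conversion takes only a few lines. This sketch follows Equation 11 directly. The `frequency_to_midi` helper adds 21 because A0 (27.5 Hz) is MIDI note 21; that helper and all the function names are our own additions.

```python
import math


def period_to_frequency(sample_rate, period):
    """Frequency in Hz from a pitch period measured in samples."""
    return sample_rate / period


def frequency_to_note(f):
    """Equation 11: semitones above A0 (27.5 Hz), fractional part included."""
    return math.log10(f / 27.5) / math.log10(2 ** (1 / 12))


def frequency_to_midi(f):
    """MIDI note number; A0 is MIDI note 21."""
    return frequency_to_note(f) + 21


if __name__ == "__main__":
    f = period_to_frequency(44100, 100.0)     # 441.0 Hz
    print(f, round(frequency_to_midi(f), 3))  # a few cents above A4 (MIDI 69)
```

A fractional period from parabolic interpolation can be passed straight to `period_to_frequency`, so the note value keeps sub-semitone detail.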

6. EFFICIENT CALCULATION OF SDF

Calculating the SDF by direct summation takes $O(Ww)$ time, where $w$ is the desired number of ACF coefficients. By splitting $d'_t(\tau)$ into its two components, $m'_t(\tau)$ and $r'_t(\tau)$, we can calculate these terms more efficiently. The ACF can be calculated in approximately $O((W+w)\log(W+w))$ time by use of the Fast Fourier Transform [6]. The ACF part of the SDF, $r'_t(\tau)$, can be calculated as follows:

1. Zero-pad the window by the number of NSDF values required, $w$. We use $w = W/2$.
2. Take a Fast Fourier Transform of this real signal.
3. Multiply each complex coefficient by its conjugate (giving the power spectral density).
4. Take the inverse Fast Fourier Transform.
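The four steps can be sketched in pure Python. To keep the example self-contained, it uses a small recursive radix-2 FFT rather than a library. As a result it rounds the padded length up to a power of two, which is an assumption: the text only requires padding by $w$. The names `fft` and `acf_via_fft` are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    With invert=True the inverse transform is returned without the 1/n scale.
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def acf_via_fft(window, w):
    """Type II ACF r'(tau) for tau = 0 .. w-1 of a real window."""
    W = len(window)
    size = 1
    while size < W + w:  # step 1: pad by at least w (rounded up to 2**k)
        size *= 2
    padded = [complex(v) for v in window] + [0j] * (size - W)
    spectrum = fft(padded)                         # step 2
    power = [c * c.conjugate() for c in spectrum]  # step 3
    acf = fft(power, invert=True)                  # step 4
    # The zero padding keeps circular wrap-around out of lags below w.
    return [acf[tau].real / size for tau in range(w)]


if __name__ == "__main__":
    window = [3, -1, 4, 1, -5, 9, 2, -6]
    direct = [sum(window[j] * window[j + tau] for j in range(len(window) - tau))
              for tau in range(4)]
    fast = acf_via_fft(window, 4)
    assert all(abs(a - b) < 1e-9 for a, b in zip(fast, direct))
    print(direct)
```

The padding is what makes the circular correlation computed by the FFT equal the type II ACF. Any product that would wrap around the end of the padded buffer involves a padded zero.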

The two terms of $m'_t(\tau)$ from Equation 6 can each be calculated incrementally. We take the result for $\tau - 1$ and subtract the squared sample that has left each sum: $m'_t(\tau) = m'_t(\tau-1) - x_{t+W-\tau}^2 - x_{t+\tau-1}^2$. At $\tau = 0$, both sums equal the total sum of squares of the whole window, which we already have in $r'_t(0)$.
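The incremental update can be sketched as below. It works with window-relative indices ($t = 0$). The optional `r0` argument stands for the $r'_t(0)$ value that the FFT step already provides. The function name is hypothetical.

```python
def m_prime_incremental(window, w, r0=None):
    """m'(tau) of Equation 6 for tau = 0 .. w-1, with t = 0 and W = len(window).

    r0 is r'(0), the total sum of squares; it is recomputed if not supplied.
    """
    W = len(window)
    if r0 is None:
        r0 = sum(v * v for v in window)
    m = [0.0] * w
    m[0] = 2.0 * r0  # both sums start at the total
    for tau in range(1, w):
        # Drop x[W - tau] from the first sum and x[tau - 1] from the second.
        m[tau] = m[tau - 1] - window[W - tau] ** 2 - window[tau - 1] ** 2
    return m


if __name__ == "__main__":
    window = [3, -1, 4, 1, -5, 9, 2, -6]
    direct = [sum(window[j] ** 2 + window[j + tau] ** 2 for j in range(8 - tau))
              for tau in range(4)]
    print(m_prime_incremental(window, 4), direct)
```

Each step costs two subtractions, so all $w$ values take $O(W + w)$ time. The FFT therefore dominates the cost of the NSDF.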

Typical window sizes we use for a 44100 Hz signal are 512, 1024, 2048 or 4096 samples, with 75% overlap in time, i.e. incrementing $t$ by $W/4$.
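The framing arithmetic for these settings can be sketched as follows. The lowest-frequency figure is our own rough estimate, not a number from the paper. It follows from computing $w = W/2$ NSDF values, so the longest usable period is about $W/2$ samples, i.e. two periods fill the window. The function names are hypothetical.

```python
SAMPLE_RATE = 44100  # Hz, the rate quoted in the text


def hop_size(W):
    """75% overlap: the window start t advances by W/4 samples."""
    return W // 4


def frame_starts(num_samples, W):
    """Start indices t of every complete window in a signal of num_samples."""
    return list(range(0, num_samples - W + 1, hop_size(W)))


def lowest_frequency(W, sample_rate=SAMPLE_RATE):
    """Rough lower frequency limit when w = W/2 NSDF values are computed:
    the longest lag is about W/2 samples, i.e. two periods fill the window."""
    return sample_rate / (W // 2)


if __name__ == "__main__":
    for W in (512, 1024, 2048, 4096):
        print(f"W={W}: hop {hop_size(W)} samples, "
              f"window {1000 * W / SAMPLE_RATE:.1f} ms, "
              f"lowest f0 about {lowest_frequency(W):.1f} Hz")
```

For $W = 1024$ this gives a hop of 256 samples (about 5.8 ms), a window of about 23.2 ms and a lower limit of roughly 86 Hz.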

7. THE CLARITY MEASURE

We define clarity as a measure of how coherent the sound of a note is. The more accurately a signal's waveform repeats, the clearer it is. This is similar to the term 'voiced' used in speech recognition. Clarity is independent of the amplitude and harmonic structure of the signal. As a signal becomes more noise-like, its clarity decreases toward zero. The clarity is simply taken as the NSDF value of the chosen key maximum; if no key maxima are found, it is set to zero.
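The sketch below shows the clarity measure end to end. It repeats the NSDF and key-maximum helpers in condensed form so that it is self-contained. It skips parabolic interpolation and assumes $k = 0.9$. Pseudo-random samples from Python's `random` module stand in for an unpitched sound. All function names are hypothetical.

```python
import math
import random


def nsdf(x, W, max_lag):
    """Type II NSDF (Equation 9) of the window x[0:W]."""
    out = []
    for tau in range(max_lag):
        r = sum(x[j] * x[j + tau] for j in range(W - tau))
        m = sum(x[j] ** 2 + x[j + tau] ** 2 for j in range(W - tau))
        out.append(2.0 * r / m if m > 0 else 0.0)
    return out


def key_maxima(n):
    """Highest local maximum in each positive region after the lag-0 peak."""
    keys, best, started = [], None, False
    for i in range(1, len(n) - 1):
        if n[i - 1] <= 0 < n[i]:
            started, best = True, None
        elif n[i - 1] > 0 >= n[i]:
            if started and best is not None:
                keys.append(best)
            best = None
        if started and n[i] > 0 and n[i - 1] < n[i] >= n[i + 1]:
            if best is None or n[i] > n[best]:
                best = i
    if started and best is not None:
        keys.append(best)
    return keys


def clarity(x, max_lag, k=0.9):
    """NSDF value at the chosen key maximum, or 0 when there is none."""
    n = nsdf(x, len(x), max_lag)
    keys = key_maxima(n)
    if not keys:
        return 0.0
    threshold = k * max(n[i] for i in keys)
    chosen = next(i for i in keys if n[i] >= threshold)
    return n[chosen]


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * j / 100) for j in range(512)]  # 441 Hz at 44.1 kHz
    rng = random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(512)]
    print(f"tone clarity {clarity(tone, 256):.3f}, noise clarity {clarity(noise, 256):.3f}")
```

A pure tone scores close to 1, while the noise scores far lower. Silence has no key maxima, so it scores exactly 0.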

We use the clarity measure in combination with the RMS power to weight the alpha value (translucency) of the pitch contour at a given point in time. This maximises the on-screen contrast, displaying the pitch information most relevant to the musician. The clearer the sound, the larger the weight, and the louder the sound, the larger the weight. This ensures that background and non-musical sounds do not clutter the display but are faded into the background. Sounds below the noise threshold level are ignored completely.

8. CONCLUSION

The MPM algorithm can provide real-time pitch contours. Because it can extract pitch from as few as two periods, it can use smaller windows than other algorithms. Smaller windows give a better representation of a changing pitch, such as that during vibrato. Tartini works well on a range of instruments, including strings, woodwind, brass and voice.

Tartini emphasises the importance of a loud and clear pitch by adjusting the alpha of the pitch contours (Section 7), thus hiding unwanted background or pitchless sounds. This can be seen in Figure 3(a), where the pitch contour fades out at either end. The rate and steadiness of the vibrato can also be seen. This direct feedback allows musicians to see where they are going wrong, or how to get the effect they want. Breaks in playing, for example when a violinist changes bow direction, appear as breaks in the contour. A violinist can practise shortening the break, which helps to improve the bowing gesture. Figure 3(b) is a screen-shot showing the strength of each harmonic as a track during a descending violin scale; this uses the pitch as the basis for the harmonic analysis. A number of musicians and singers have shown great interest in the system. Tartini can be downloaded from www.tartini.net.

9. ACKNOWLEDGEMENTS

We would like to thank our professional musicians, Mr Kevin Lefohn (violin) and Miss Judy Bellingham (voice), for numerous discussions and for providing us with samples of good sound. We also thank Dr Don Warrington of the Physics Department, University of Otago, for his advice and encouragement.

Figure 3. Two screen-shots from Tartini. (a) A pitch contour and log-RMS plot of a violin vibrato about a D in the 5th octave. (b) Harmonic tracks from a descending scale beside their equivalent keys on a keyboard.

10. REFERENCES

[1] de Cheveigné, A. and Kawahara, H. "YIN, a fundamental frequency estimator for speech and music", Journal of the Acoustical Society of America, Vol 111(4), pp 1917-1930, April 2002.

[2] Harris, F. "On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform", Proc. of the IEEE, Vol 66, No 1, Jan 1978.

[3] McLeod, P. and Wyvill, G. "Visualization of Musical Pitch", Proc. Computer Graphics International, Tokyo, Japan, July 9-11, 2003, pp 300-303.

[4] Medan, Y., Yair, E. and Chazan, D. "Super Resolution Pitch Determination of Speech Signals", IEEE Trans. Signal Processing, Vol 39(1), pp 40-48, 1991.

[5] Noll, A. "Cepstrum Pitch Determination", Journal of the Acoustical Society of America, Vol 41(2), pp 293-309, 1967.

[6] Rabiner, L. and Schafer, R. Digital Processing of Speech Signals, Prentice Hall, 1978.

[7] Rossing, T. The Science of Sound, 2nd ed., Addison Wesley, 1990.

[8] Sun, X. "Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio", Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, Florida, May 13-17, 2002.

