Figure 1 - uploaded by Emmanouil Benetos
Diagram for the proposed multiple fundamental frequency estimation system.  

Source publication
Article
This paper proposes a system for multiple fundamental frequency estimation of piano sounds using pitch candidate selection rules which employ spectral structure and temporal evolution. As a time-frequency representation, the Resonator Time-Frequency Image of the input signal is employed, a noise suppression model is used, and a spectral whiteni...

Contexts in source publication

Context 1
... this section, the preprocessing steps employed by the proposed multiple-F0 estimation system are described. These steps can also be seen in a diagram for the proposed system, which is displayed in Figure 1. ...
Context 2
... algorithm that was created for multiple-F0 estimation experiments is described in this section. A diagram showing the stages of the proposed system is displayed in Figure 1. ...

Similar publications

Article
In this paper, we present an adaptive spectrum estimation method for non-stationary biomedical signals. The algorithm is based on time-varying autoregressive (TVAR) modeling where the time-varying parameters are estimated by Kalman filtering. The algorithm adaptively generates an estimate of the power spectral density (PSD) at each time instant. A...

Citations

... Since the classical approach is based on the amplitude/energy of the spectrum, it is sensitive to the timbre of the sources. In order to make the estimation more robust against timbre variations, spectral whitening or flattening processes have been proposed [2,6,7,8]. ...
... Comparison to state-of-the-art: In Table 1, we indicate the Pitch F-Measure results of our system in the harmonic setting (P1, β is forced to 0) and in the inharmonic setting (P2, β is estimated). We compare our results to the ones obtained by Emiya et al. [15] and Benetos et al. [7] on the same test-set. Also, the results obtained by directly applying a threshold on the detected peaks are reported as baseline results. ...
... Peak is a fixed threshold on detected peaks, P1 is the proposed method without considering inharmonicity (β forced to 0) and P2 is with the inharmonic model (β is estimated). Emiya et al. is presented in [15] and Benetos et al. in [7]. ...
Conference Paper
In this paper, a novel approach for the computation of a pitch salience function is presented. The aim of a pitch (considered here as a synonym for fundamental frequency) salience function is to estimate the relevance of the most salient musical pitches that are present in a certain audio excerpt. Such a function is used in numerous Music Information Retrieval (MIR) tasks such as pitch and multiple-pitch estimation, melody extraction and audio feature computation (such as chroma or Pitch Class Profiles). In order to compute the salience of a pitch candidate f, the classical approach uses the weighted sum of the energy of the short-time spectrum at its integer multiple frequencies hf. In the present work, we propose a different approach which does not rely on energy but only on frequency location. For this, we first estimate the peaks of the short-time spectrum. From the frequency location of these peaks, we evaluate the likelihood that each peak is a harmonic of a given fundamental frequency. The specificity of our method is to use as likelihood the deviation of the harmonic frequency locations from the pitch locations of the equal-tempered scale. This is used to create a theoretical sequence of deviations which is then compared to an observed one. The proposed method is then evaluated for a task of multiple-pitch estimation using the MAPS test-set.
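The classical salience computation mentioned in the abstract above (a weighted sum of short-time spectrum energy at the integer multiples hf of a candidate f) can be sketched as follows. This is a minimal illustration, not the authors' method: the function name, the linear bin layout, the number of harmonics and the geometric weights are all assumptions made for the example.

```python
def harmonic_sum_salience(spectrum, bin_hz, f0, n_harmonics=5, decay=0.8):
    """Weighted sum of spectral energy at integer multiples of f0.

    spectrum: magnitudes, with bin k assumed to be centred at k * bin_hz
    decay: hypothetical weight; harmonic h is weighted by decay ** (h - 1)
    """
    salience = 0.0
    for h in range(1, n_harmonics + 1):
        k = round(h * f0 / bin_hz)      # nearest bin to the h-th multiple
        if k >= len(spectrum):          # multiple lies above the spectrum
            break
        salience += decay ** (h - 1) * spectrum[k] ** 2
    return salience


# Toy spectrum: 100 bins of 10 Hz, with unit peaks at 220, 440 and 660 Hz.
spectrum = [0.0] * 100
for k in (22, 44, 66):
    spectrum[k] = 1.0

candidates = [110, 220, 440]
scores = {f: harmonic_sum_salience(spectrum, 10.0, f) for f in candidates}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy case the 220 Hz candidate scores highest, while 110 Hz still collects energy from its even multiples. That sensitivity to spectral energy is the kind of behaviour an approach based only on frequency location tries to avoid.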
... Finally, previous work by the authors includes an iterative system for multiple-F0 estimation for piano sounds [11] which incorporates temporal information for pitch estimation based on the common amplitude modulation (CAM) assumption and a public evaluation of the aforementioned system for the MIREX 2010 multiple fundamental frequency estimation task [12]. Results for the MIREX task were encouraging, considering that the system was trained on isolated piano sounds and tested on woodwind and string recordings, noting also that no note tracking procedure was incorporated. ...
... For the isolated chord experiments using the MAPS database, the performance of the proposed transcription system compared with the results shown in [11] and [6] is shown in Fig. 5, organized according to the polyphony level of the ground truth (experiments were performed with unknown polyphony). The mean F-measures for polyphony levels L = 1, . . . ...
... In terms of a general comparison between all systems, the global F-measure for all sounds was used, where the proposed system outperforms all other approaches, reaching 88.54%. The system in [11] reports 87.47%, the system in [6] 83.70%, and finally the algorithm of [24] used for comparison in [6] reports 85.25%. By applying the same significance tests as in [11], it can be seen that the proposed method outperforms the methods of [6], [11], [24] in a statistically significant manner with 95% confidence. ...
Article
In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, while a novel noise suppression technique based on pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used, which is based on a novel spectral envelope estimation procedure for the log-frequency domain, in order to compute the harmonic envelope of candidate pitches. In order to select the optimal pitch combination for each time frame, a score function is proposed which combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed, in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classic and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, where encouraging results are indicated.
... Afterwards, a noise suppression algorithm is applied to the whitened RTFI. A two-stage median filtering procedure with a 1/3-octave span is applied to Y[k], resulting in a noise representation N[k], in a similar way to [4]. Cepstral smoothing using D = 30 coefficients is applied to N[k] (as in [3]) and the resulting smooth curve N′[k] is subtracted from Y[k], resulting in the whitened and noise-suppressed RTFI representation Z[k]. ...
... Approaches to transcription related to the current work include the iterative spectral subtraction-based system in [1], the rule-based system in [2] which employed the resonator time-frequency image (RTFI) as a time-frequency representation, and the score function-based joint multiple-F0 estimation approach in [3]. Previous work by the authors includes a system for iterative multiple-F0 estimation [4], which was also evaluated for the 2010 MIREX multi-F0 estimation task. As far as onset detection is concerned, an overview can be seen in [5], where the spectral flux and phase deviation are combined into a complex onset detection feature. ...
Conference Paper
In this paper, an approach for polyphonic music transcription based on joint multiple-F0 estimation and note onset/offset detection is proposed. For preprocessing, the resonator time-frequency image of the input music signal is extracted and noise suppression is performed. A pitch salience function is extracted for each frame along with tuning and inharmonicity parameters. For onset detection, late fusion is employed by combining a novel spectral flux-based feature which incorporates pitch tuning information and a novel salience function-based descriptor. For each segment defined by two onsets, an overlapping partial treatment procedure is used and a pitch set score function is proposed. A note offset detection procedure is also proposed using HMMs trained on MIDI data. The system was trained on piano chords and tested on classic and jazz recordings from the RWC database. Improved transcription results are reported compared to state-of-the-art approaches.
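The noise suppression step quoted in the citation contexts above (median filtering of Y[k] over a third-octave span, then subtraction of the noise estimate) can be illustrated with a simplified sketch. It assumes a log-frequency axis with a hypothetical `bins_per_octave` resolution, uses a single median pass rather than the two-stage procedure, omits the cepstral smoothing, and clips negative values to zero. None of these choices come from the source.

```python
from statistics import median


def moving_median(y, span):
    """Median of y over a window of about `span` bins centred on each bin."""
    half = span // 2
    return [median(y[max(0, k - half): k + half + 1]) for k in range(len(y))]


def suppress_noise(y, bins_per_octave=120):
    """Subtract a median-based noise floor from a log-frequency spectrum.

    bins_per_octave is a hypothetical resolution; on a log-frequency axis a
    third of an octave corresponds to a fixed number of bins.
    """
    span = max(1, bins_per_octave // 3)
    noise = moving_median(y, span)
    # Clipping at zero is an assumption of this sketch, not a stated step.
    return [max(0.0, yk - nk) for yk, nk in zip(y, noise)]


# Toy input: flat noise floor of 1.0 with two partial-like peaks.
y = [1.0] * 200
y[50] = 5.0
y[150] = 5.0
z = suppress_noise(y)
print(z[50], z[150], z[0])
```

The median is used here because isolated spectral peaks barely affect it, so the estimate follows the noise floor rather than the partials.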
Article
With the increasing importance of smart gadgets in our daily lives, there is a need for an automatic piano transcription system in various multimedia services. For automatic piano transcription, the string inharmonicity coefficient ($B$) and fundamental frequency (${f_0}$) should be detected robustly and accurately. The proposed triplet-sequentially additive partial (SAP) algorithm improves on the current $B$ estimation algorithm in terms of both performance and speed, with less prior knowledge. Additionally, this joint ($B$, ${f_0}$) estimation algorithm is applied directly to the transcription of real piano recordings, and a 4.41% improvement in accuracy was achieved over another transcription system that had both similar processing steps and a similar feature extraction method.
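For orientation, the inharmonicity coefficient $B$ is commonly related to partial frequencies through the stiff-string model $f_h = h f_0 \sqrt{1 + B h^2}$. The abstract does not state this formula, so the sketch below is an assumption about the underlying model, and the function names and example values are illustrative only.

```python
import math


def partial_frequency(f0, h, B):
    """Frequency of the h-th partial under the stiff-string model."""
    return h * f0 * math.sqrt(1.0 + B * h * h)


def deviation_cents(h, B):
    """Offset of the h-th partial from the exact harmonic h * f0, in cents."""
    return 1200.0 * math.log2(math.sqrt(1.0 + B * h * h))


# Illustrative values, not taken from the source.
f0, B = 220.0, 4e-4
for h in (1, 5, 10):
    print(h, round(partial_frequency(f0, h, B), 2), round(deviation_cents(h, B), 2))
```

With $B = 0$ the partials fall on exact integer multiples of $f_0$. For $B > 0$ the upward deviation grows with the partial index, which is why an estimator that assumes exact harmonics can miss higher piano partials.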