Fig 1. Spectrogram of the synthetic test signal.

Source publication
Article
Full-text available
One of the most general models of music signals considers that such signals can be represented as a sum of two distinct components: a tonal part that is sparse in frequency and temporally stable, and a transient (or percussive) part composed of short-term broadband sounds. In this paper, we propose a novel hybrid method built upon Nonnegative Matrix...

Contexts in source publication

Context 1
... compute the STFT with a 512-sample-long (0.128 s) Hann analysis window and 50% overlap. The spectrogram of the signal is represented in Figure 1. As our input signal has four sources, we expect that each source can be represented by one component and therefore that a model of rank 4 (k = 4) should adequately model the signal. ...
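For concreteness, the analysis settings above can be reproduced in a few lines of Python; this is a sketch with illustrative names, not code from the paper.

```python
# A sketch of the analysis settings described above; variable names are
# illustrative, and x stands in for the synthetic test signal.
import numpy as np
from scipy.signal import stft

fs = 4000                    # sampling rate of the test signal (Hz)
nperseg = 512                # 512-sample Hann window = 0.128 s at 4 kHz
x = np.random.randn(5 * fs)  # placeholder; see Context 5 for the real signal

f, t, X = stft(x, fs=fs, window="hann", nperseg=nperseg, noverlap=nperseg // 2)
V = np.abs(X)                # nonnegative spectrogram for the rank-4 NMF
```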
Context 2
... basis functions of the resulting dictionary do not contain drum-specific information, but rather atoms specific to the training signal. Finally, Figure 10 shows the results as a function of the rank of factorization. For k_P > 100, the results in terms of SDR are close to 0 dB. ...
Context 3
... values of the four parameters estimated in [10] are not tuned for a wide variety of audio signals. Figure 11 shows the decomposition results on a specific song of the Medley-dB database. The parameters of the CoNMF are not tuned for this song, and the algorithm does not extract the percussive and harmonic parts correctly. ...
Context 4
... CoNMF algorithm's results are lower than those of the HPNMF. Some transients of the harmonic instruments are decomposed into the percussive part, and some percussive instruments (mainly in the low frequency range) are decomposed into the harmonic part. The parameters used for the CoNMF are not optimal for the Medley-dB database: the values of the four parameters estimated in [10] are not tuned for a wide variety of audio signals. Figure 11 shows the decomposition results on a specific song of the Medley-dB database. The parameters of the CoNMF are not tuned for this song, and the algorithm does not extract the percussive and harmonic parts correctly. The NMPCF and the MF still contain a significant amount of harmonic components in the percussive part; these two methods do not produce a clean separation. Finally, the HPNMF provides the best decomposition on this specific signal. ...
Context 5
... To illustrate how the HPNMF works (Algorithm 1), we use a simple synthetic signal. The test signal models a mix of harmonic and percussive components. The harmonic part is simulated by a sum of sine waves that overlap in time and frequency: the first signal simulates a C3 with fundamental frequency f_0 = 131 Hz, the other a B4 with f_0 = 492 Hz. To simulate the percussive part, we add 0.1 s of Gaussian white noise during the first two seconds. For the last two seconds, we add 0.3 s of Gaussian white noise filtered by a high-pass filter. The signal is 5 s long and the sampling rate is 4000 Hz. We compute the STFT with a 512-sample-long (0.128 s) Hann analysis window and 50% overlap. The spectrogram of the signal is represented in Figure 1. As our input signal has four sources, we expect that each source can be represented by one component and therefore that a model of rank 4 (k = 4) should adequately model the signal. More precisely, for the NMF and the PNMF we choose k = 4, and for the HPNMF the rank of the harmonic part is k_H = 2 and the rank of the percussive part is k_P = 2. The choice of the rank of factorization is an important variable of the problem; in this case, we select it in order to illustrate the performance of the method, and we further discuss its importance in Section IV-D. We compare the HPNMF with the PNMF and the NMF using the KL divergence with multiplicative update rules as stated in [46]. The three algorithms are initialized with the same random positive matrices W_ini ∈ R_+^(n×k) and A_ini ∈ R_+^(k×m). ...
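The paper provides no code for this experiment, but the stated parameters are enough to sketch it. In the sketch below, the note timings, number of harmonics, and high-pass cutoff are assumptions (the context only fixes the fundamentals, burst lengths, duration, sampling rate, and STFT settings); the baseline uses the standard KL multiplicative updates attributed to [46].

```python
# Sketch of the synthetic test signal and the KL-NMF baseline (assumptions
# are flagged in the comments; nothing here is the paper's own code).
import numpy as np
from scipy.signal import stft, butter, lfilter

fs, dur = 4000, 5.0
t = np.arange(int(fs * dur)) / fs

def note(f0, start, stop, n_harm=3):
    """Sum of a fundamental and a few harmonics, gated in time."""
    s = np.zeros_like(t)
    on = (t >= start) & (t < stop)
    for h in range(1, n_harm + 1):
        s[on] += np.sin(2 * np.pi * h * f0 * t[on]) / h
    return s

# Two overlapping notes: C3 (f0 = 131 Hz) and B4 (f0 = 492 Hz).
# The onset/offset times are assumptions.
x = note(131.0, 0.0, 3.0) + note(492.0, 2.0, 5.0)

# Percussive part: a 0.1 s white-noise burst in the first two seconds and
# a 0.3 s high-pass-filtered burst in the last two seconds (placements and
# the 1 kHz cutoff are assumptions).
x[int(1.0 * fs):int(1.1 * fs)] += np.random.randn(int(0.1 * fs))
b, a = butter(4, 1000 / (fs / 2), btype="high")
x[int(3.5 * fs):int(3.8 * fs)] += lfilter(b, a, np.random.randn(int(0.3 * fs)))

# STFT: 512-sample Hann window, 50% overlap.
_, _, X = stft(x, fs=fs, window="hann", nperseg=512, noverlap=256)
V = np.abs(X) + 1e-12

# Baseline NMF with the standard KL multiplicative updates (rank k = 4).
k = 4
rng = np.random.default_rng(0)
W = rng.random((V.shape[0], k)) + 1e-3
A = rng.random((k, V.shape[1])) + 1e-3
for _ in range(200):
    W *= ((V / (W @ A)) @ A.T) / A.sum(axis=1)
    A *= (W.T @ (V / (W @ A))) / W.sum(axis=0)[:, None]
```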
Context 6
... the remainder of this paper, all experiments are performed using the dictionaries built on the audio files of the drummer 2 dataset. Figure 9 shows the influence of the length of the audio signals on the separation results. Above 6 min of training data, the quality of the decomposition decreases: when the training signals are too long, the dictionary created by the NMF becomes very specific to the training signal. In fact, the amount of information to decompose is too large, and in order to minimize the reconstruction error (i.e., the value of the cost function), the NMF favours the basis functions capturing the maximum possible energy. These basis functions of the resulting dictionary do not contain drum-specific information, but rather atoms specific to the training signal. Finally, Figure 10 shows the results as a function of the rank of factorization. For k_P > 100, the results in terms of SDR are close to 0 dB. In this case the dictionary does not match the drums of the test songs, and both the harmonic and percussive sounds of the original signal are decomposed into the harmonic part of the HPNMF. This causes a high SAR because, although the decomposition is not satisfactory, the separated signals do not contain artifacts. In our tests, the optimal value of the rank of factorization is k_P = 12. This value is of course specific to the recordings of the ENST-Drums [44] database and to our evaluation database (in this case, the SiSEC ...
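As a sketch of how such a pre-trained dictionary would be applied at test time (the paper's exact procedure may differ), one can hold the drum basis fixed and update only the activations with the usual KL multiplicative rule; all names and shapes below are illustrative.

```python
# Decomposing a test spectrogram against a fixed drum dictionary: the
# pre-trained percussive basis W_P (rank k_P, e.g. k_P = 12) stays fixed
# and only the activations H are updated.
import numpy as np

def activations_kl(V, W_fixed, n_iter=200, eps=1e-12):
    """KL multiplicative updates for H with W held fixed."""
    rng = np.random.default_rng(0)
    H = rng.random((W_fixed.shape[1], V.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= (W_fixed.T @ (V / (W_fixed @ H + eps))) / W_fixed.sum(axis=0)[:, None]
    return H

# V: magnitude spectrogram of a test song; W_P: dictionary trained on
# ENST-Drums audio (random placeholders here, for illustration only).
V = np.abs(np.random.randn(257, 400))
W_P = np.abs(np.random.randn(257, 12))
H_P = activations_kl(V, W_P)
V_percussive = W_P @ H_P   # percussive estimate; masking/ISTFT would follow
```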

Similar publications

Article
Full-text available
Identifying a scene based on the environment in which the related audio is recorded is known as acoustic scene classification (ASC). In this paper, a bi-level light-weight Convolutional Neural Network (CNN)-based model is presented to perform ASC. The proposed approach performs classification in two levels. The scenes are classified into three broa...

Citations

... It traces the development of guitar playing techniques and chord recognition in chronological order. General music recognition and recommendation tasks are performed by note analysis and instrument classification [4][5][6][7][8][9]. Research in this specific domain can be traced back to earlier systems using augmented reality (AR) displays [10]. ...
Chapter
Music recommendation systems (MRS) for guitar learning have gained traction and popularity during pandemic times and aid in both self-learning and remote learning. The research area gained immense importance because of the changed learning and instructing paradigm during the COVID-19 pandemic. Moreover, the guitar is one of the toughest instruments to play, and the growing number of guitar learners makes system design for guitar learning a very targeted and viable research area. This paper proposes a novel customizable mathematical model for determining the difficulty of guitar triad chords via machine learning and for suggesting easier triad chords, which can help many beginner guitarists. The work proposed in the paper is easily comprehensible, as it rates triad chord difficulty on a Likert scale from 1 (easiest) to 5 (most difficult). The proposed MRS effectively aids in theoretical learning and a deeper understanding of chords, which are spread over the entire fretboard of the guitar and are played across various octaves. This knowledge makes guitar playing more flavorful and effortless. Its key feature is to help a beginner guitar player understand that a single chord (in the proposed MRS, a triad chord) can be played in 12 different variations across various locations on the fretboard with just a basic understanding of the relationship between the notes.

Keywords: Interactive systems; Music recommendation system; Guitar learning; Machine learning; Triad chords
... This algorithm relies on the anisotropic continuity of the spectrogram to separate the signal: the shock (percussive) spectrum is continuously and smoothly distributed in the frequency direction, while the harmonic spectrum is continuously and smoothly distributed in the time direction [17]. Equation (1) is derived from these differences in the spectral representation of impact and harmonic sounds. ...
Article
Full-text available
Digital music has become a hot spot with the rapid development of network technology and digital audio technology, and the general public is increasingly interested in music similarity detection (MSD). Similarity detection is mainly used for music style classification. The core MSD process is to first extract music features, then train a model, and finally input the features into the model for detection. Deep learning (DL) is a relatively new feature extraction technology that improves the efficiency of music feature extraction. This paper first introduces the convolutional neural network (CNN) among DL algorithms and MSD. Then, an MSD algorithm is constructed based on a CNN. Besides, the Harmonic and Percussive Source Separation (HPSS) algorithm separates the spectrogram of the original music signal into two components: harmonics with continuity in time and shocks (percussive events) with continuity in frequency. These two components are input to the CNN together with the original spectrogram for processing. In addition, the training-related hyperparameters are adjusted, and the dataset is expanded to explore the influence of different parameters in the network structure on the music detection rate. Experiments on the GTZAN Genre Collection music dataset show that this method can effectively improve MSD using a single feature. The final detection result is 75.6%, indicating the superiority of this method compared with other classical detection methods.
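The anisotropic-continuity idea invoked in the citing context can be illustrated with classic median-filtering HPSS, which librosa implements directly; this is a generic sketch, not necessarily the exact HPSS variant used in the article above.

```python
# Median filtering along time extracts harmonics (smooth in time); median
# filtering along frequency extracts percussion (smooth in frequency).
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))   # any mono signal works here
S = librosa.stft(y)
H, P = librosa.decompose.hpss(np.abs(S), kernel_size=31)

# Soft masks applied to the complex STFT, then inverted back to audio.
mask_h = H / (H + P + 1e-12)
y_harm = librosa.istft(S * mask_h)
y_perc = librosa.istft(S * (1 - mask_h))
```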
... The guitar is a string instrument with a varying number of strings and is considered one of the toughest [7][8] instruments to play, involving numerous techniques like chord playing, fingerstyle, and riffs. Breakthrough systems assisting Indian classical singers [9], drum players [10], and music theory learning through music document layout analysis [11], as well as the development of datasets for acoustic guitar tablatures [12], all use machine learning at their design core. Music recommendation systems also cater to a variety of other applications, such as software and applications that motivate people to exercise based on their music preferences [13]. ...
Article
Full-text available
Music Recommendation Systems (MRS) for guitar learning have gained traction and popularity during pandemic times and aid in both self-learning and remote learning. The research area gained immense importance because of the changed learning and instructing paradigm during the COVID-19 pandemic. Moreover, the guitar is one of the toughest instruments to play, and the growing number of guitar learners makes system design for guitar learning a very targeted and viable research area. This paper covers the development of music recommendation systems for guitar learning and guitar playing. The state of the art shows that many facets of guitar playing and learning are being explored for creating recommender systems and interactive systems. This knowledge makes guitar playing more flavourful and effortless for users and will aid developers in building machine-learning-based interactive recommender systems.
... Many new technologies have been introduced into BSS research, for example signal sparse component analysis [2,3], dictionary learning [4,5], nonnegative matrix factorization [6,7], bounded component analysis [8,9], tensor decomposition [10,11], and machine learning [12]. However, these algorithms are sensitive to noise. ...
Article
Full-text available
Blind source separation is a widely used technique to analyze multichannel data. In most real-world applications noise is inevitable; it degrades the quality of signal separation and can even make separation fail. In this paper, a new signal processing framework is proposed to separate noisy mixed sources. It is composed of two stages. The first stage processes the mixed signal with a signal transform chosen so that the expected signals exhibit energy concentration in the transform domain. The second stage applies a BSS algorithm to estimate the demixing matrix in the transform domain. In the energy-concentrated segments, the SNR can reach a high level, so the demixing matrix can be estimated accurately. The proposed framework is analyzed by taking the wavelet transform as an example. Numerical experiments demonstrate the efficiency of the proposed algorithm for estimating the mixing matrix in comparison with well-known algorithms.
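A minimal sketch of the two-stage idea, assuming a discrete wavelet transform as the energy-concentrating transform and FastICA as the BSS stage (both are illustrative choices; the paper analyzes the wavelet case, but its BSS algorithm may differ):

```python
# Stage 1: transform the mixtures so the sources concentrate their energy.
# Stage 2: estimate the demixing matrix in the transform domain, then apply
# it to the time-domain mixtures.
import numpy as np
import pywt
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4096)
S = np.vstack([np.sin(2 * np.pi * 40 * t),           # two toy sources
               np.sign(np.sin(2 * np.pi * 7 * t))])
A = np.array([[1.0, 0.6], [0.4, 1.0]])               # mixing matrix
X = A @ S + 0.05 * rng.standard_normal(S.shape)      # noisy mixtures

# Stage 1: keep the approximation band of a level-3 DWT, where these
# sources concentrate energy and the SNR is higher (choice is illustrative).
Xw = np.vstack([pywt.wavedec(x, "db4", level=3)[0] for x in X])

# Stage 2: estimate the demixing matrix on the transform-domain data ...
ica = FastICA(n_components=2, random_state=0)
ica.fit(Xw.T)
W_demix = ica.components_

# ... and apply it to the original time-domain mixtures (recovery is up to
# scale/permutation, assuming roughly zero-mean mixtures).
S_hat = W_demix @ X
```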
... As in Eq. (12), the proposed method directly treats the time-domain signals, and its constraint requires that the separated components satisfy the perfect reconstruction property in the time domain, as in a recent audio source separation method [24]. Some of the conventional HPSS methods (e.g., anisotropic-smoothness-based methods [5,7] and nonnegative matrix factorization based methods [25,26]) assume additivity of power spectrograms, but this requires some statistical assumptions, as discussed in [20]. In contrast, the constraint in the proposed method (additivity in the time domain) is always justified. ...
Preprint
Decomposition of an audio mixture into harmonic and percussive components, namely harmonic/percussive source separation (HPSS), is a useful pre-processing tool for many audio applications. Popular approaches to HPSS exploit the distinctive source-specific structures of power spectrograms. However, such approaches consider only power spectrograms, and the phase is left intact when resynthesizing the separated signals. In this paper, we propose a phase-aware HPSS method based on the structure of the phase of harmonic components. It is formulated as a convex optimization problem in the time domain, which enables the simultaneous treatment of both amplitude and phase. Numerical experiments validate the effectiveness of the proposed method.
Preprint
Full-text available
Existing satellite communication anti-jamming technology mostly realizes anti-jamming communication from the perspective of interference suppression and has defects such as low spectrum efficiency and limited anti-jamming capability. This paper proposes to use the independence of the communication signal and the interference signal to separate them in the waveform domain, thereby realizing anti-jamming communication. This method can achieve anti-jamming communication under strong jamming without reducing spectrum efficiency. The blind source separation problem is essentially a multi-parameter joint optimization problem with no analytical solution. This paper uses an artificial bee colony optimization algorithm to solve the blind separation problem and obtains a sub-optimal solution.
Thesis
The advent of affordable digital sampling technology brought about great changes in music production. Experienced producers of sample-based electronic music (SBEM) are capable of understanding the intricate relationship between a sampled source and its use in a new track, harnessing its properties to shape the structure, timbre, and rhythm of their compositions. However, automated analysis of the phenomena surrounding both SBEM and the sources it uses for its samples is a challenge which involves many tasks from music information retrieval (MIR) and audio processing. In this thesis we develop models and techniques to better understand SBEM at different levels. In particular, we offer four main technical contributions to retrieval and analysis tasks. First, we explore how timbral changes affect the spectral peak maps used in audio fingerprinting as a means to identify overlapping samples. Second, we analyze the structure of typical SBEM tracks using audio decomposition based on non-negative matrix factor deconvolution (NMFD). Third, we investigate the interaction of timbre and structure by designing a mid-level audio feature based on cascaded harmonic-residual-percussive source separation (CHRP). Fourth, we apply random forests to identify the quintessential sampling source: drum breaks. Given their prominent role in SBEM, we devote considerable attention to drum breaks. As an application to computational musicology, we formalize an algorithm for calculating local swing ratio in drum breaks and adapt an autocorrelation-based method to analyze their microrhythmic properties. Finally, we present a creative application to music production which allows combining the temporal and timbral properties of two separate drum breaks (redrumming). Despite the massive commercial success of SBEM and the practice of sampling, they mostly remain outside the attention of formal academic studies and MIR research. In this thesis our overarching goal is to identify and formalize some of the fundamental audio processing tasks related to SBEM, proposing methods that can seed further research.