Fig 1. Spectrogram of the synthetic test signal.

Source publication
Article
Full-text available
One of the most general models of music signals considers that such signals can be represented as a sum of two distinct components: a tonal part that is sparse in frequency and temporally stable, and a transient (or percussive) part composed of short-term broadband sounds. In this paper, we propose a novel hybrid method built upon Nonnegative Matrix...

Contexts in source publication

Context 1
... compute the STFT with a 512-sample-long (0.128 s) Hann analysis window and 50% overlap. The spectrogram of the signal is represented in Figure 1. As our input signal has four sources, we expect that each source can be represented by one component and therefore that a model of rank 4 (k = 4) should adequately model the signal. ...
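For concreteness, the analysis settings above can be reproduced in a few lines of Python; this is a sketch with illustrative names, not code from the paper.

```python
# A sketch of the analysis settings described above; variable names are
# illustrative, and x stands in for the synthetic test signal.
import numpy as np
from scipy.signal import stft

fs = 4000                    # sampling rate of the test signal (Hz)
nperseg = 512                # 512-sample Hann window = 0.128 s at 4 kHz
x = np.random.randn(5 * fs)  # placeholder; see Context 5 for the real signal

f, t, X = stft(x, fs=fs, window="hann", nperseg=nperseg, noverlap=nperseg // 2)
V = np.abs(X)                # nonnegative spectrogram for the rank-4 NMF
```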
Context 2
... basis functions of the resulting dictionary do not contain drum-specific information, but rather atoms specific to the training signal. Finally, Figure 10 shows the results as a function of the rank of factorization. For k_P > 100, the results in terms of SDR are close to 0 dB. ...
Context 3
... values of the four parameters estimated in [10] are not tuned for a wide variety of audio signals. Figure 11 shows the decomposition results on a specific song of the Medley-dB database. The parameters of the CoNMF are not tuned for this song, and the algorithm does not extract the percussive and harmonic parts correctly. ...
Context 4
... CoNMF algorithm's results are lower than those of the HPNMF. Some transients of the harmonic instruments are decomposed into the percussive part, and some percussive instruments (mainly in the low frequency range) are decomposed into the harmonic part. The parameters used for the CoNMF are not optimal for the Medley-dB database: the values of the four parameters estimated in [10] are not tuned for a wide variety of audio signals. Figure 11 shows the decomposition results on a specific song of the Medley-dB database. The parameters of the CoNMF are not tuned for this song, and the algorithm does not extract the percussive and harmonic parts correctly. The NMPCF and the MF still contain a significant amount of harmonic components in the percussive part; these two methods do not produce a clean separation. Finally, the HPNMF provides the best decomposition on this specific signal. ...
Context 5
... To illustrate how the HPNMF works (Algorithm 1), we use a simple synthetic signal. The test signal models a mix of harmonic and percussive components. The harmonic part is simulated by a sum of sine waves that overlap in time and frequency: the first signal simulates a C3 with fundamental frequency f_0 = 131 Hz, the other a B4 with f_0 = 492 Hz. To simulate the percussive part, we add 0.1 s of Gaussian white noise during the first two seconds. For the last two seconds, we add 0.3 s of Gaussian white noise filtered by a high-pass filter. The signal is 5 s long and the sampling rate is 4000 Hz. We compute the STFT with a 512-sample-long (0.128 s) Hann analysis window and 50% overlap. The spectrogram of the signal is represented in Figure 1. As our input signal has four sources, we expect that each source can be represented by one component and therefore that a model of rank 4 (k = 4) should adequately model the signal. More precisely, for the NMF and the PNMF we choose k = 4, and for the HPNMF the rank of the harmonic part is k_H = 2 and the rank of the percussive part is k_P = 2. The choice of the rank of factorization is an important variable of the problem; in this case, we select it in order to illustrate the performance of the method, and we further discuss its importance in Section IV-D. We compare the HPNMF with the PNMF and the NMF using the KL divergence with multiplicative update rules as stated in [46]. The three algorithms are initialized with the same random positive matrices W_ini ∈ R_+^(n×k) and A_ini ∈ R_+^(k×m). ...
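The paper provides no code for this experiment, but the stated parameters are enough to sketch it. In the sketch below, the note timings, number of harmonics, and high-pass cutoff are assumptions (the context only fixes the fundamentals, burst lengths, duration, sampling rate, and STFT settings); the baseline uses the standard KL multiplicative updates attributed to [46].

```python
# Sketch of the synthetic test signal and the KL-NMF baseline (assumptions
# are flagged in the comments; nothing here is the paper's own code).
import numpy as np
from scipy.signal import stft, butter, lfilter

fs, dur = 4000, 5.0
t = np.arange(int(fs * dur)) / fs

def note(f0, start, stop, n_harm=3):
    """Sum of a fundamental and a few harmonics, gated in time."""
    s = np.zeros_like(t)
    on = (t >= start) & (t < stop)
    for h in range(1, n_harm + 1):
        s[on] += np.sin(2 * np.pi * h * f0 * t[on]) / h
    return s

# Two overlapping notes: C3 (f0 = 131 Hz) and B4 (f0 = 492 Hz).
# The onset/offset times are assumptions.
x = note(131.0, 0.0, 3.0) + note(492.0, 2.0, 5.0)

# Percussive part: a 0.1 s white-noise burst in the first two seconds and
# a 0.3 s high-pass-filtered burst in the last two seconds (placements and
# the 1 kHz cutoff are assumptions).
x[int(1.0 * fs):int(1.1 * fs)] += np.random.randn(int(0.1 * fs))
b, a = butter(4, 1000 / (fs / 2), btype="high")
x[int(3.5 * fs):int(3.8 * fs)] += lfilter(b, a, np.random.randn(int(0.3 * fs)))

# STFT: 512-sample Hann window, 50% overlap.
_, _, X = stft(x, fs=fs, window="hann", nperseg=512, noverlap=256)
V = np.abs(X) + 1e-12

# Baseline NMF with the standard KL multiplicative updates (rank k = 4).
k = 4
rng = np.random.default_rng(0)
W = rng.random((V.shape[0], k)) + 1e-3
A = rng.random((k, V.shape[1])) + 1e-3
for _ in range(200):
    W *= ((V / (W @ A)) @ A.T) / A.sum(axis=1)
    A *= (W.T @ (V / (W @ A))) / W.sum(axis=0)[:, None]
```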
Context 6
... the remainder of this paper, all experiments are performed using the dictionaries built on the audio files of the drummer 2 dataset. Figure 9 shows the influence of the length of the audio signals on the separation results. Above 6 min of training data, the quality of the decomposition decreases: when the training signals are too long, the dictionary created by the NMF becomes very specific to the training signal. In fact, the amount of information to decompose is too large, and in order to minimize the reconstruction error (i.e., the value of the cost function), the NMF favours the basis functions capturing the maximum possible energy. These basis functions of the resulting dictionary do not contain drum-specific information, but rather atoms specific to the training signal. Finally, Figure 10 shows the results as a function of the rank of factorization. For k_P > 100, the results in terms of SDR are close to 0 dB. In this case the dictionary does not match the drums of the test songs, and both the harmonic and percussive sounds of the original signal are decomposed into the harmonic part of the HPNMF. This causes a high SAR because, although the decomposition is not satisfactory, the separated signals do not contain artifacts. In our tests, the optimal value of the rank of factorization is k_P = 12. This value is of course specific to the recordings of the ENST-Drums [44] database and to our evaluation database (in this case, the SiSEC ...
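As a sketch of how such a pre-trained dictionary would be applied at test time (the paper's exact procedure may differ), one can hold the drum basis fixed and update only the activations with the usual KL multiplicative rule; all names and shapes below are illustrative.

```python
# Decomposing a test spectrogram against a fixed drum dictionary: the
# pre-trained percussive basis W_P (rank k_P, e.g. k_P = 12) stays fixed
# and only the activations H are updated.
import numpy as np

def activations_kl(V, W_fixed, n_iter=200, eps=1e-12):
    """KL multiplicative updates for H with W held fixed."""
    rng = np.random.default_rng(0)
    H = rng.random((W_fixed.shape[1], V.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= (W_fixed.T @ (V / (W_fixed @ H + eps))) / W_fixed.sum(axis=0)[:, None]
    return H

# V: magnitude spectrogram of a test song; W_P: dictionary trained on
# ENST-Drums audio (random placeholders here, for illustration only).
V = np.abs(np.random.randn(257, 400))
W_P = np.abs(np.random.randn(257, 12))
H_P = activations_kl(V, W_P)
V_percussive = W_P @ H_P   # percussive estimate; masking/ISTFT would follow
```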

Similar publications

Article
Full-text available
Identifying a scene based on the environment in which the related audio is recorded is known as acoustic scene classification (ASC). In this paper, a bi-level light-weight Convolutional Neural Network (CNN)-based model is presented to perform ASC. The proposed approach performs classification in two levels. The scenes are classified into three broa...

Citations

... It traces the development of guitar playing techniques and chord recognition in chronological order. General music recognition and recommendation tasks are performed by note analysis and instrument classification [4][5][6][7][8][9]. Research in this specific domain can be traced back to earlier systems using augmented reality (AR) displays [10]. ...
Chapter
Music recommendation systems (MRS) for guitar learning have gained traction and popularity during pandemic times and aid in both self-learning and remote learning. The research area gained immense importance because of the changed learning and instructing paradigm during the COVID-19 pandemic. Moreover, the guitar is one of the toughest instruments to play, and the growing number of guitar learners makes system design for guitar learning a very targeted and viable research area. This paper proposes a novel customizable mathematical model for determining the difficulty of guitar triad chords via machine learning and for suggesting easier triad chords, which can help many beginner guitarists. The work proposed in the paper is easily comprehensible, as it rates triad chord difficulty on a Likert scale from 1 (easiest) to 5 (most difficult). The proposed MRS effectively aids in theoretical learning and a deeper understanding of chords, which are spread over the entire fretboard of the guitar and are played across various octaves. This knowledge makes guitar playing more flavorful and effortless. Its key feature is to help a beginner guitar player understand that a single chord (in the proposed MRS, a triad chord) can be played in 12 different variations across various locations on the fretboard with just a basic understanding of the relationship between the notes.

Keywords: Interactive systems; Music recommendation system; Guitar learning; Machine learning; Triad chords
... This algorithm relies on the anisotropic continuity of the spectrogram to separate the signal: the shock (percussive) spectrum is continuously and smoothly distributed in the frequency direction, while the harmonic spectrum is continuously and smoothly distributed in the time direction [17]. Equation (1) is derived from these differences in the spectral representation of impact and harmonic sounds. ...
Article
Full-text available
Digital music has become a hot spot with the rapid development of network technology and digital audio technology, and the general public is increasingly interested in music similarity detection (MSD). Similarity detection is mainly used for music style classification. The core MSD process is to first extract music features, then train a model, and finally input the features into the model for detection. Deep learning (DL) is a relatively new feature extraction technology that improves the efficiency of music feature extraction. This paper first introduces the convolutional neural network (CNN) among DL algorithms and MSD. Then, an MSD algorithm is constructed based on a CNN. Besides, the Harmonic and Percussive Source Separation (HPSS) algorithm separates the spectrogram of the original music signal into two components: harmonics with continuity in time and shocks (percussive events) with continuity in frequency. These two components are input to the CNN together with the original spectrogram for processing. In addition, the training-related hyperparameters are adjusted, and the dataset is expanded to explore the influence of different parameters in the network structure on the music detection rate. Experiments on the GTZAN Genre Collection music dataset show that this method can effectively improve MSD using a single feature. The final detection result is 75.6%, indicating the superiority of this method compared with other classical detection methods.
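The anisotropic-continuity idea invoked in the citing context can be illustrated with classic median-filtering HPSS, which librosa implements directly; this is a generic sketch, not necessarily the exact HPSS variant used in the article above.

```python
# Median filtering along time extracts harmonics (smooth in time); median
# filtering along frequency extracts percussion (smooth in frequency).
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))   # any mono signal works here
S = librosa.stft(y)
H, P = librosa.decompose.hpss(np.abs(S), kernel_size=31)

# Soft masks applied to the complex STFT, then inverted back to audio.
mask_h = H / (H + P + 1e-12)
y_harm = librosa.istft(S * mask_h)
y_perc = librosa.istft(S * (1 - mask_h))
```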
... The guitar is a string instrument with a varying number of strings and is considered one of the toughest [7][8] instruments to play, involving numerous techniques like chord playing, fingerstyle, and riffs. Breakthrough systems assisting Indian classical singers [9], drum players [10], and music theory learning through music document layout analysis [11], as well as the development of datasets for acoustic guitar tablatures [12], all use machine learning at their design core. Music recommendation systems also cater to a variety of other applications, such as software and applications that motivate people to exercise based on their music preferences [13]. ...
Article
Full-text available
Music Recommendation Systems (MRS) for guitar learning have gained traction and popularity during pandemic times and aid in both self-learning and remote learning. The research area gained immense importance because of the changed learning and instructing paradigm during the COVID-19 pandemic. Moreover, the guitar is one of the toughest instruments to play, and the growing number of guitar learners makes system design for guitar learning a very targeted and viable research area. This paper covers the development of music recommendation systems for guitar learning and guitar playing. The state of the art shows that many facets of guitar playing and learning are being explored for creating recommender systems and interactive systems. This knowledge makes guitar playing more flavourful and effortless for users and will aid developers in building machine-learning-based interactive recommender systems.
... Many new technologies have been introduced into BSS research, for example signal sparse component analysis [2,3], dictionary learning [4,5], nonnegative matrix factorization [6,7], bounded component analysis [8,9], tensor decomposition [10,11], and machine learning [12]. However, these algorithms are sensitive to noise. ...
Article
Full-text available
Blind source separation is a widely used technique to analyze multichannel data. In most real-world applications noise is inevitable; it degrades the quality of signal separation and can even make separation fail. In this paper, a new signal processing framework is proposed to separate noisy mixed sources. It is composed of two stages. The first stage processes the mixed signal with a signal transform chosen so that the expected signals exhibit energy concentration in the transform domain. The second stage applies a BSS algorithm to estimate the demixing matrix in the transform domain. In the energy-concentrated segments, the SNR can reach a high level, so the demixing matrix can be estimated accurately. The proposed framework is analyzed by taking the wavelet transform as an example. Numerical experiments demonstrate the efficiency of the proposed algorithm for estimating the mixing matrix in comparison with well-known algorithms.
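A minimal sketch of the two-stage idea, assuming a discrete wavelet transform as the energy-concentrating transform and FastICA as the BSS stage (both are illustrative choices; the paper analyzes the wavelet case, but its BSS algorithm may differ):

```python
# Stage 1: transform the mixtures so the sources concentrate their energy.
# Stage 2: estimate the demixing matrix in the transform domain, then apply
# it to the time-domain mixtures.
import numpy as np
import pywt
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4096)
S = np.vstack([np.sin(2 * np.pi * 40 * t),           # two toy sources
               np.sign(np.sin(2 * np.pi * 7 * t))])
A = np.array([[1.0, 0.6], [0.4, 1.0]])               # mixing matrix
X = A @ S + 0.05 * rng.standard_normal(S.shape)      # noisy mixtures

# Stage 1: keep the approximation band of a level-3 DWT, where these
# sources concentrate energy and the SNR is higher (choice is illustrative).
Xw = np.vstack([pywt.wavedec(x, "db4", level=3)[0] for x in X])

# Stage 2: estimate the demixing matrix on the transform-domain data ...
ica = FastICA(n_components=2, random_state=0)
ica.fit(Xw.T)
W_demix = ica.components_

# ... and apply it to the original time-domain mixtures (recovery is up to
# scale/permutation, assuming roughly zero-mean mixtures).
S_hat = W_demix @ X
```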
... As in Eq. (12), the proposed method directly treats the time-domain signals, and its constraint requires that the separated components satisfy the perfect reconstruction property in the time domain, as in a recent audio source separation method [24]. Some of the conventional HPSS methods (e.g., anisotropic-smoothness-based methods [5,7] and nonnegative matrix factorization based methods [25,26]) assume additivity of power spectrograms, but this requires some statistical assumptions, as discussed in [20]. In contrast, the constraint in the proposed method (additivity in the time domain) is always justified. ...
Preprint
Decomposition of an audio mixture into harmonic and percussive components, namely harmonic/percussive source separation (HPSS), is a useful pre-processing tool for many audio applications. Popular approaches to HPSS exploit the distinctive source-specific structures of power spectrograms. However, such approaches consider only power spectrograms, and the phase is left intact when resynthesizing the separated signals. In this paper, we propose a phase-aware HPSS method based on the structure of the phase of harmonic components. It is formulated as a convex optimization problem in the time domain, which enables the simultaneous treatment of both amplitude and phase. Numerical experiments validate the effectiveness of the proposed method.
Preprint
Full-text available
Existing satellite communication anti-jamming technology mostly realizes anti-jamming communication from the perspective of interference suppression and has defects such as low spectrum efficiency and limited anti-jamming capability. This paper proposes to use the independence of the communication signal and the interference signal to separate them in the waveform domain, thereby realizing anti-jamming communication. This method can achieve anti-jamming communication under strong jamming without reducing spectrum efficiency. The blind source separation problem is essentially a multi-parameter joint optimization problem with no analytical solution. This paper uses an artificial bee colony optimization algorithm to solve the blind separation problem and obtains a sub-optimal solution.
Thesis
The advent of affordable digital sampling technology brought about great changes in music production. Experienced producers of sample-based electronic music (SBEM) are capable of understanding the intricate relationship between a sampled source and its use in a new track, harnessing its properties to shape the structure, timbre, and rhythm of their compositions. However, automated analysis of the phenomena surrounding both SBEM and the sources it uses for its samples is a challenge which involves many tasks from music information retrieval (MIR) and audio processing. In this thesis we develop models and techniques to better understand SBEM at different levels. In particular, we offer four main technical contributions to retrieval and analysis tasks. First, we explore how timbral changes affect the spectral peak maps used in audio fingerprinting as a means to identify overlapping samples. Second, we analyze the structure of typical SBEM tracks using audio decomposition based on non-negative matrix factor deconvolution (NMFD). Third, we investigate the interaction of timbre and structure by designing a mid-level audio feature based on cascaded harmonic-residual-percussive source separation (CHRP). Fourth, we apply random forests to identify the quintessential sampling source: drum breaks. Given their prominent role in SBEM, we devote considerable attention to drum breaks. As an application to computational musicology, we formalize an algorithm for calculating local swing ratio in drum breaks and adapt an autocorrelation-based method to analyze their microrhythmic properties. Finally, we present a creative application to music production which allows combining the temporal and timbral properties of two separate drum breaks (redrumming). Despite the massive commercial success of SBEM and the practice of sampling, they mostly remain outside the attention of formal academic studies and MIR research. In this thesis our overarching goal is to identify and formalize some of the fundamental audio processing tasks related to SBEM, proposing methods that can seed further research.