Article

A music recommender system based on compact convolutional transformers

Article
Full-text available
Manual classification of millions of songs of the same or different genres is a challenging task for human beings. Therefore, there should be a machine-intelligent model that can classify the genres of songs very accurately. In this paper, a deep learning-based hybrid model is proposed for the analysis and classification of different music genre files. The proposed hybrid model mainly uses a combination of multimodal and transfer learning-based models for classification. The model is analyzed using the GTZAN and Ballroom datasets. The GTZAN dataset contains 1000 music files classified into 10 different music genres (Metal, Classical, Rock, Reggae, Pop, Disco, Blues, Country, Hip-Hop and Jazz), and the duration of each music file is 30 s. The Ballroom dataset contains 698 music files classified into 8 different music genres (Tango, ChaChaCha, Rumba, Viennese waltz, Jive, Waltz, Quickstep and Samba), and the duration of each music file is 30 s. The performance of the model is evaluated using Python. The macro average and weighted average are taken for computing the percentage accuracy of each model. From the results, it is found that the proposed hybrid model performs better than other deep learning models such as the convolutional neural network model, transfer learning-based model, multimodal model, machine learning models and other existing models in terms of training accuracy, validation accuracy, training loss, validation loss, precision, recall, F1-score and support.
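The abstract above gives no implementation details, so the following is only a minimal sketch of the general transfer-learning idea it describes: 30 s GTZAN clips are converted to log-mel spectrograms and fed to an ImageNet-pretrained CNN backbone with a new 10-way head. The file paths, backbone choice (ResNet-18) and preprocessing are assumptions, not the authors' hybrid multimodal model.

```python
# Minimal sketch (not the authors' exact hybrid model): classify 30 s GTZAN clips
# by feeding log-mel spectrograms to a pretrained CNN backbone (transfer learning).
import librosa
import numpy as np
import torch
import torch.nn as nn
from torchvision import models

GENRES = ["metal", "classical", "rock", "reggae", "pop",
          "disco", "blues", "country", "hiphop", "jazz"]

def clip_to_logmel(path, sr=22050, n_mels=128):
    """Load a 30 s clip and convert it to a log-mel spectrogram 'image'."""
    y, _ = librosa.load(path, sr=sr, duration=30.0)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel, ref=np.max)
    # Replicate to 3 channels so an ImageNet-pretrained backbone accepts it.
    return torch.tensor(logmel, dtype=torch.float32).unsqueeze(0).repeat(3, 1, 1)

class GenreNet(nn.Module):
    """ImageNet-pretrained ResNet-18 with a new 10-way classification head."""
    def __init__(self, n_classes=len(GENRES)):
        super().__init__()
        self.backbone = models.resnet18(weights="IMAGENET1K_V1")
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, n_classes)

    def forward(self, x):
        return self.backbone(x)

if __name__ == "__main__":
    model = GenreNet().eval()
    x = clip_to_logmel("gtzan/blues/blues.00000.wav")   # hypothetical path
    with torch.no_grad():
        probs = torch.softmax(model(x.unsqueeze(0)), dim=1)
    print(GENRES[int(probs.argmax())])
```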
Article
Full-text available
Recommender systems have been applied in a wide range of domains such as e-commerce, media, banking, and utilities. This kind of system provides personalized suggestions based on large amounts of data to increase user satisfaction. These suggestions help clients select products, while organizations can increase the consumption of a product. In the case of social data, sentiment analysis can help gain a better understanding of a user's attitudes, opinions and emotions, which is beneficial to integrate into recommender systems to achieve higher recommendation reliability. On the one hand, this information can be used to complement explicit ratings given to products by users. On the other hand, sentiment analysis of items derived from online news services, blogs, social media or even from the recommender systems themselves is seen as capable of providing better recommendations to users. In this study, we present and evaluate a recommendation approach that integrates sentiment analysis into collaborative filtering methods. The recommender system proposal is based on an adaptive architecture, which includes improved techniques for feature extraction and deep learning models based on sentiment analysis. The results of the empirical study performed with two popular datasets show that sentiment-based deep learning models and collaborative filtering methods can significantly improve the recommender system's performance.
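As a rough illustration of blending sentiment signals into collaborative filtering (not the paper's adaptive deep architecture), the sketch below adjusts explicit ratings with a toy lexicon-based sentiment score and then runs plain user-based collaborative filtering; the blending weight ALPHA and the lexicon are assumptions.

```python
# Minimal sketch: blend a sentiment score derived from review text with explicit
# ratings, then run plain user-based collaborative filtering on the result.
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ALPHA = 0.7  # weight of the explicit rating vs. the sentiment signal (assumed)

def sentiment_score(text):
    """Toy lexicon-based score in [0, 1]; a real system would use a deep model."""
    pos = {"great", "love", "excellent", "good"}
    neg = {"bad", "boring", "awful", "poor"}
    words = text.lower().split()
    p, n = sum(w in pos for w in words), sum(w in neg for w in words)
    return 0.5 if p + n == 0 else p / (p + n)

def adjusted_ratings(df):
    """df columns: user, item, rating (1-5), review. Returns user x item matrix."""
    df = df.copy()
    df["sent"] = df["review"].map(sentiment_score) * 5.0          # rescale to 0-5
    df["adj"] = ALPHA * df["rating"] + (1 - ALPHA) * df["sent"]
    return df.pivot_table(index="user", columns="item", values="adj").fillna(0.0)

def recommend(matrix, user, k=1):
    """User-based CF: score unseen items by ratings of similar users."""
    sim = cosine_similarity(matrix.values)                  # user x user
    u = list(matrix.index).index(user)
    weights = sim[u].copy()
    weights[u] = 0.0                                        # exclude the user itself
    scores = weights @ matrix.values                        # weighted sum over users
    scores[matrix.loc[user].values > 0] = -np.inf           # drop already-seen items
    return matrix.columns[np.argsort(scores)[::-1][:k]].tolist()

if __name__ == "__main__":
    data = pd.DataFrame([
        ("u1", "i1", 5, "love it, excellent"), ("u1", "i2", 2, "boring"),
        ("u2", "i1", 4, "good"),               ("u2", "i3", 5, "great"),
        ("u3", "i2", 3, "poor sound"),         ("u3", "i3", 4, "good"),
    ], columns=["user", "item", "rating", "review"])
    print(recommend(adjusted_ratings(data), "u1", k=1))
```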
Article
Full-text available
Deep learning methods are gaining popularity in different application domains, and especially in natural language processing. It is commonly believed that, using a large enough dataset and an adequate network architecture, almost any processing problem can be solved. A frequent and widely used typology is the encoder-decoder architecture, where the input data is transformed into an intermediate code by means of an encoder, and then a decoder takes this code to produce its output. Different types of networks can be used in the encoder and the decoder, depending on the problem of interest, such as convolutional neural networks (CNNs) or long short-term memory (LSTM) networks. This paper uses for the encoder a recently proposed method called the Causal Feature Extractor (CFE). It is based on causal convolutions (i.e., convolutions that depend only on one direction of the input), dilation (i.e., increasing the aperture size of the convolutions) and bidirectionality (i.e., independent networks in both directions). Some preliminary results are presented on three different tasks and compared with state-of-the-art methods: bilingual translation, LaTeX decompilation and audio transcription. The proposed method achieves promising results, showing its versatility in working with text, audio and images. Moreover, it has a shorter training time, requiring less time per iteration, and makes good use of attention mechanisms based on attention matrices.
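A minimal sketch of the two building blocks named in the abstract, causal dilated convolutions and bidirectionality, is given below; the layer sizes and stacking scheme are assumptions rather than the CFE authors' configuration.

```python
# Minimal sketch of a causal, dilated 1-D convolution (the output at time t only
# sees inputs <= t), applied in both directions for a bidirectional encoding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # left padding only
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                                # x: (batch, ch, time)
        x = F.pad(x, (self.pad, 0))                      # pad the past, not the future
        return self.conv(x)

class BiCausalEncoder(nn.Module):
    """Two independent causal stacks, one on the sequence and one on its reverse."""
    def __init__(self, in_ch, hidden=64, layers=3):
        super().__init__()
        def stack():
            mods, ch = [], in_ch
            for i in range(layers):
                mods += [CausalConv1d(ch, hidden, dilation=2 ** i), nn.ReLU()]
                ch = hidden
            return nn.Sequential(*mods)
        self.fwd, self.bwd = stack(), stack()

    def forward(self, x):
        out_f = self.fwd(x)
        out_b = self.bwd(torch.flip(x, dims=[-1]))
        return torch.cat([out_f, torch.flip(out_b, dims=[-1])], dim=1)

if __name__ == "__main__":
    enc = BiCausalEncoder(in_ch=16)
    print(enc(torch.randn(2, 16, 100)).shape)            # (2, 128, 100)
```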
Article
Full-text available
Agriculture is one of the most important sectors in India, and farmers are among the most essential members of society. A major part of the country's economy comes from the agricultural sector, yet there is no end to the woes of Indian farmers. One of the major causes of continuing farmer distress is a lack of knowledge about the agricultural programs and schemes proposed by the Government of India and their benefits. The Collaborative Recommendation System for the Agriculture Sector is one way to address this problem. Various workshops are conducted to create awareness of government schemes among farmers, but the results are still not as expected; even when farmers are aware of the schemes, their problems often remain unsolved, and hence many NGOs and institutes have come up with various measures to tackle this issue. Our system focuses on helping farmers by answering their agricultural queries: it generates a profile of basic requirements through a web application and recommends the government schemes developed to help farmers. The recommendation system also periodically updates farmers on recent trends in the agricultural field and on new government schemes and programs. Keywords: agriculture, government schemes, web application, recommendation, kNN algorithm, cosine similarity, CRSAS (Collaborative Recommendation System for the Agriculture Sector)
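A minimal sketch of the kNN-with-cosine-similarity idea mentioned in the keywords is shown below; the farmer-profile encoding and the scheme names are hypothetical, chosen only to make the example runnable.

```python
# Minimal sketch: encode a farmer's profile and each scheme's target profile as
# vectors, then recommend the k nearest schemes under the cosine metric.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Feature columns (assumed): [land_hectares, irrigation(0/1), annual_income, crop_id]
SCHEMES = {
    "Crop insurance scheme":     [2.0, 0, 1.5, 1],
    "Irrigation subsidy scheme": [3.0, 0, 2.0, 2],
    "Soil health card scheme":   [1.0, 1, 1.0, 1],
    "Credit support scheme":     [0.5, 0, 0.8, 3],
}

def recommend_schemes(farmer_profile, k=2):
    names = list(SCHEMES)
    X = np.array([SCHEMES[n] for n in names], dtype=float)
    knn = NearestNeighbors(n_neighbors=k, metric="cosine").fit(X)
    _, idx = knn.kneighbors(np.array([farmer_profile], dtype=float))
    return [names[i] for i in idx[0]]

if __name__ == "__main__":
    # A small farmer: 1.2 ha of irrigated land, modest income, growing crop 1.
    print(recommend_schemes([1.2, 1, 1.1, 1]))
```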
Article
Full-text available
Today, music is a very important and perhaps inseparable part of people's daily lives. There are many genres of music, and these genres differ from each other, resulting in people having different music preferences. As a result, classifying music and recommending new music to people in music listening applications and platforms is an important and timely issue. Classifying music by genre is one of the most useful techniques for solving this problem. There are a number of approaches for music classification and recommendation. One approach is based on the acoustic characteristics of music. In this study, a music genre classification system and a music recommendation engine, which focus on extracting representative features obtained by a novel deep neural network model, have been proposed. Acoustic features extracted from these networks have been utilised for music genre classification and music recommendation on a data set.
Article
Full-text available
Several researchers are trying to develop different computer-aided diagnosis systems for breast cancer employing machine learning (ML) methods. The inputs to these ML algorithms are labeled histopathological images which have complex visual patterns, so it is difficult to identify quality features for cancer diagnosis. Pre-trained Convolutional Neural Networks (CNNs) have recently emerged as unsupervised feature extractors. However, only a limited investigation has been done into breast cancer recognition using histopathology images with a CNN as a feature extractor. This work investigates ten different pre-trained CNNs for extracting features from breast cancer histopathology images. The breast cancer histopathological images are obtained from the publicly available BreakHis dataset. The classification models for the different feature sets, which are obtained using the different pre-trained CNNs under consideration, are developed using a linear support vector machine. The proposed method outperforms other state-of-the-art methods for cancer detection, which can be observed from the results obtained.
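The pipeline described above (a pretrained CNN used as a fixed feature extractor, with a linear SVM as the classifier) can be sketched as follows; the choice of ResNet-50 stands in for any of the ten CNNs investigated, and the BreakHis file paths are hypothetical.

```python
# Minimal sketch: use a pretrained CNN as a fixed feature extractor and train a
# linear SVM on the extracted features. Dataset loading is schematic.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image
from sklearn.svm import LinearSVC

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Pretrained ResNet-50 with its classification head removed -> 2048-d features.
backbone = models.resnet50(weights="IMAGENET1K_V1")
extractor = nn.Sequential(*list(backbone.children())[:-1]).eval()

@torch.no_grad()
def extract(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return extractor(x).flatten().numpy()

def train_classifier(paths, labels):
    """labels: 0 = benign, 1 = malignant."""
    X = [extract(p) for p in paths]
    return LinearSVC(C=1.0).fit(X, labels)

# Usage (hypothetical file list):
# clf = train_classifier(["breakhis/benign/a.png", "breakhis/malignant/b.png"], [0, 1])
# print(clf.predict([extract("breakhis/unknown/c.png")]))
```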
Article
Full-text available
With commercial music streaming services that can be accessed from mobile devices, digital music is now abundant compared to previous eras. Sorting through all this digital music is very time-consuming and causes information fatigue. Therefore, it is very useful to develop a music recommender system that can search music libraries automatically and suggest suitable songs to users. By using a music recommender system, the music provider can predict and then offer appropriate songs to users based on the characteristics of the music they have listened to previously. Our research develops a music recommender system that gives recommendations based on the similarity of features of the audio signal. This study uses a convolutional recurrent neural network (CRNN) for feature extraction and a similarity distance to measure the similarity between features. The results of this study indicate that users prefer recommendations that consider music genres over recommendations based solely on similarity.
Article
Full-text available
Text mining is the process by which a computer automatically extracts new information from different textual data sources. Clustering is a grouping technique that is widely used in data mining. The aim of this study was to find the most optimal similarity measure. The similarity methods used were Jaccard similarity, cosine similarity, and a combination of Jaccard and cosine similarity. Combining the two measures is expected to increase the similarity value between two titles. The documents used are only the titles of practical-work reports from the Department of Informatics Engineering, Universitas Ahmad Dahlan. All of these documents were preprocessed beforehand, and the method used was document clustering with Shared Nearest Neighbor (SNN). The result of this study is that the cosine similarity method gives the best proximity or similarity values compared to Jaccard similarity and the combination of both.
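A minimal sketch of the three similarity variants compared in the study is given below; the combination rule (a simple average of Jaccard and cosine) is an assumption, since the abstract does not specify how the two measures were combined.

```python
# Minimal sketch: Jaccard similarity on token sets, cosine similarity on
# term-frequency vectors, and an (assumed) averaged combination of the two.
import math
from collections import Counter

def tokens(title):
    return title.lower().split()

def jaccard(a, b):
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def cosine(a, b):
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def combined(a, b):
    return 0.5 * (jaccard(a, b) + cosine(a, b))

if __name__ == "__main__":
    t1 = "music recommender system using deep learning"
    t2 = "deep learning based music recommendation"
    print(jaccard(t1, t2), cosine(t1, t2), combined(t1, t2))
```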
Article
Full-text available
Background subtraction, or foreground detection, is a challenging problem in video processing. This problem is mainly concerned with a binary classification task, which designates each pixel in a video sequence as belonging to either the background or foreground scene. Traditional approaches for tackling this problem lack the power of capturing deep information in videos from a dynamic environment encountered in real-world applications, thus often achieving low accuracy and unsatisfactory performance. In this paper, we introduce a new 3D atrous convolutional neural network (CNN), used as a deep visual feature extractor, and stack convolutional long short-term memory (ConvLSTM) networks on top of the feature extractor to capture long-term dependencies in video data. This novel architecture is named a 3D atrous convolutional long short-term memory network. The new network can capture not only deep spatial information but also long-term temporal information in the video data. We train the proposed 3D atrous ConvLSTM network with focal loss to tackle the class imbalance problem commonly seen in background subtraction. Experimental results on a wide range of videos demonstrate the effectiveness of our approach and its superiority over existing methods.
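The focal loss mentioned above can be sketched as follows for the binary background/foreground case; the alpha and gamma values are the commonly used defaults, not necessarily those chosen in the paper.

```python
# Minimal sketch of a binary focal loss that down-weights easy (well-classified)
# pixels, countering the background/foreground class imbalance.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits, targets: tensors of per-pixel values; targets in {0, 1}."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)          # prob. of the true class
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()   # easy pixels contribute little

if __name__ == "__main__":
    logits = torch.randn(2, 1, 64, 64)                   # raw per-pixel network outputs
    masks = (torch.rand(2, 1, 64, 64) > 0.9).float()     # sparse foreground mask
    print(focal_loss(logits, masks).item())
```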
Article
Full-text available
Music recommender systems are an important field of research because of the easy availability and use of online music. Most existing models focus only on explicit data such as ratings and other user-item dimensions. A challenging problem in music recommendation is to model a variety of contextual information, such as feedback, time and location. In this article, we propose a competent hybrid music recommender system (HMRS), which combines context-based and collaborative approaches. Timestamps are extracted from users' listening logs to construct contextual behavior, yielding various temporal features such as week and session (morning, evening or night). We use the depth-first search (DFS) algorithm, which traverses the whole graph through paths in different contexts, and the Bellman-Ford algorithm, which provides a ranked list of recommended items over a multi-layer context graph. We enhance the process using particle swarm optimization (PSO), which produces highly optimized results. The dataset is from Last.fm and contains 19,150,868 music listening logs of 992 users (until May 4th, 2009). We extract the properties of music from users' listening histories and evaluate the system's efficiency in recommending music based on users' contextual preferences. Our system noticeably delivers the best recommendations in terms of recall when compared to existing methods.
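A minimal sketch of the temporal-context extraction step described above (deriving week and session features from Last.fm-style listening timestamps) is shown below; the session hour boundaries are assumptions, not the paper's.

```python
# Minimal sketch: derive week, day-of-week and a coarse session label
# (morning / evening / night) from listening-log timestamps.
import pandas as pd

def session_of_day(hour):
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 21:
        return "evening"
    return "night"

def add_context(log):
    """log columns: user, artist, track, timestamp (ISO-8601 string)."""
    log = log.copy()
    ts = pd.to_datetime(log["timestamp"])
    log["week"] = ts.dt.isocalendar().week
    log["day_of_week"] = ts.dt.day_name()
    log["session"] = ts.dt.hour.map(session_of_day)
    return log

if __name__ == "__main__":
    log = pd.DataFrame({
        "user": ["u1", "u1"],
        "artist": ["Radiohead", "Portishead"],
        "track": ["Reckoner", "Roads"],
        "timestamp": ["2009-05-03T08:15:00Z", "2009-05-03T23:40:00Z"],
    })
    print(add_context(log)[["track", "week", "day_of_week", "session"]])
```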
Article
Full-text available
Cosine similarity is one of the most popular measures in text classification problems. In this paper, we use this important measure to investigate the performance of Arabic language text classification. The vector space model (VSM) is generally used to represent textual information as feature vectors. However, Latent Semantic Indexing (LSI) is a better textual representation technique as it maintains semantic information between the words. Hence, we use the singular value decomposition (SVD) method to extract text features based on LSI. In our experiments, we conduct a comparison between cosine similarity and some well-known classifiers such as Naïve Bayes, k-Nearest Neighbors, Neural Network, Random Forest, Support Vector Machine, and classification tree. We use a corpus that contains 4,000 documents on ten topics (400 documents for each topic). The corpus contains 2,127,197 words with about 139,168 unique words. The testing set contains 400 documents, 40 documents for each topic. As a weighting scheme, we use term frequency-inverse document frequency (TF.IDF). This study reveals that LSI-based features significantly outperform the TF.IDF-based features using the cosine similarity measure. It also reveals that the cosine similarity measure outperforms most of the investigated classifiers; the support vector machine (SVM), however, was found to outperform the cosine measure, although not significantly.
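The comparison between TF.IDF features and LSI features under a cosine-similarity classifier can be sketched as follows; the tiny corpus, the centroid-based classification rule and the number of SVD components are illustrative assumptions.

```python
# Minimal sketch: raw TF-IDF vectors versus LSI features obtained with truncated
# SVD, with documents classified by cosine similarity to per-topic centroids.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

train_docs = ["the match ended in a draw", "the team won the cup",
              "the government passed a new law", "parliament debated the budget"]
train_topics = ["sports", "sports", "politics", "politics"]

def centroid_classifier(X, topics):
    labels = sorted(set(topics))
    cents = np.vstack([X[[i for i, t in enumerate(topics) if t == lab]].mean(axis=0)
                       for lab in labels])
    return labels, cents

def predict(X_test, labels, cents):
    sims = cosine_similarity(X_test, cents)
    return [labels[i] for i in sims.argmax(axis=1)]

if __name__ == "__main__":
    tfidf = TfidfVectorizer()
    X_tfidf = tfidf.fit_transform(train_docs).toarray()
    svd = TruncatedSVD(n_components=2, random_state=0)     # LSI via truncated SVD
    X_lsi = svd.fit_transform(X_tfidf)

    test_docs = ["the cup final was a great match"]
    T_tfidf = tfidf.transform(test_docs).toarray()
    T_lsi = svd.transform(T_tfidf)

    for name, Xtr, Xte in [("TF-IDF", X_tfidf, T_tfidf), ("LSI", X_lsi, T_lsi)]:
        labels, cents = centroid_classifier(Xtr, train_topics)
        print(name, predict(Xte, labels, cents))
```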
Article
Full-text available
This study investigates whether taking genre into account is beneficial for automatic music mood annotation in terms of core affects valence, arousal, and tension, as well as several other mood scales. Novel techniques employing genre-adaptive semantic computing and audio-based modelling are proposed. A technique called the ACTwg employs genre-adaptive semantic computing of mood-related social tags, whereas ACTwg-SLPwg combines semantic computing and audio-based modelling, both in a genre-adaptive manner. The proposed techniques are experimentally evaluated at predicting listener ratings related to a set of 600 popular music tracks spanning multiple genres. The results show that ACTwg outperforms a semantic computing technique that does not exploit genre information, and ACTwg-SLPwg outperforms conventional techniques and other genre-adaptive alternatives. In particular, improvements in the prediction rates are obtained for the valence dimension which is typically the most challenging core affect dimension for audio-based annotation. The specificity of genre categories is not crucial for the performance of ACTwg-SLPwg. The study also presents analytical insights into inferring a concise tag-based genre representation for genre-adaptive music mood analysis.
Article
Full-text available
Musical genres are categorical labels created by humans to characterize pieces of music. A musical genre is characterized by the common characteristics shared by its members. These characteristics are typically related to the instrumentation, rhythmic structure, and harmonic content of the music. Genre hierarchies are commonly used to structure the large collections of music available on the Web. Currently, musical genre annotation is performed manually. Automatic musical genre classification can assist or replace the human user in this process and would be a valuable addition to music information retrieval systems. In addition, automatic musical genre classification provides a framework for developing and evaluating features for any type of content-based analysis of musical signals. In this paper, the automatic classification of audio signals into a hierarchy of musical genres is explored. More specifically, three feature sets for representing timbral texture, rhythmic content and pitch content are proposed. The performance and relative importance of the proposed features are investigated by training statistical pattern recognition classifiers using real-world audio collections. Both whole-file and real-time frame-based classification schemes are described. Using the proposed feature sets, a classification accuracy of 61% for ten musical genres is achieved. This result is comparable to results reported for human musical genre classification.
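The three feature families proposed in the paper can be approximated with off-the-shelf tools as in the sketch below, which uses librosa rather than the authors' original implementation; the exact statistics kept per feature are assumptions.

```python
# Minimal sketch of the three feature families: timbral texture (MFCC statistics),
# rhythmic content (tempo) and pitch content (chroma statistics).
import numpy as np
import librosa

def timbral_rhythm_pitch_features(path, sr=22050):
    y, _ = librosa.load(path, sr=sr, duration=30.0)

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)       # timbral texture
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)           # rhythmic content
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)         # pitch content

    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        np.atleast_1d(tempo),
        chroma.mean(axis=1), chroma.std(axis=1),
    ])

# Usage (hypothetical file): the resulting fixed-length vector can be fed to any
# statistical classifier, e.g. a Gaussian classifier or an SVM.
# feats = timbral_rhythm_pitch_features("gtzan/jazz/jazz.00000.wav")
# print(feats.shape)
```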
Article
Full-text available
Musical genres are categorical descriptions that are used to describe music. They are commonly used to structure the increasing amounts of music available in digital form on the Web and are important for music information retrieval. Genre categorization for audio has traditionally been performed manually. A particular musical genre is characterized by statistical properties related to the instrumentation, rhythmic structure and form of its members. In this work, algorithms for the automatic genre categorization of audio signals are described. More specifically, we propose a set of features for representing texture and instrumentation. In addition, a novel set of features for representing rhythmic structure and strength is proposed. The performance of these feature sets has been evaluated by training statistical pattern recognition classifiers on real-world audio collections. Based on the automatic hierarchical genre classification, two graphical user interfaces for browsing and interacting with large audio collections have been developed.
Article
Full-text available
Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information, products or services during a live interaction. These systems, especially the k-nearest neighbor collaborative filtering based ones, are achieving widespread success on the Web. The tremendous growth in the amount of available information and the number of visitors to Web sites in recent years poses some key challenges for recommender systems. These are: producing high quality recommendations, performing many recommendations per second for millions of users and items, and achieving high coverage in the face of data sparsity. In traditional collaborative filtering systems the amount of work increases with the number of participants in the system. New recommender system technologies are needed that can quickly produce high quality recommendations, even for very large-scale problems. To address these issues we have explored item-based collaborative filtering techniques. Item-based techniques first analyze the user-item matrix to identify relationships between different items, and then use these relationships to indirectly compute recommendations for users. In this paper we analyze different item-based recommendation generation algorithms. We look into different techniques for computing item-item similarities (e.g., item-item correlation vs. cosine similarities between item vectors) and different techniques for obtaining recommendations from them (e.g., weighted sum vs. regression model). Finally, we experimentally evaluate our results and compare them to the basic k-nearest neighbor approach. Our experiments suggest that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time p...
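The weighted-sum variant of item-based prediction described above can be sketched as follows on a toy user-item matrix.

```python
# Minimal sketch: compute item-item cosine similarities from the user-item matrix,
# then predict a user's rating for an item as a similarity-weighted sum of that
# user's ratings on other items.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = items; 0 means "not rated".
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
], dtype=float)

def predict(R, user, item):
    sim = cosine_similarity(R.T)                 # item x item similarities
    rated = R[user] > 0                          # items this user has rated
    weights = sim[item, rated]
    if np.abs(weights).sum() == 0:
        return 0.0
    return float(weights @ R[user, rated] / np.abs(weights).sum())

if __name__ == "__main__":
    print(round(predict(R, user=0, item=2), 2))  # predicted rating of item 2 for user 0
```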
Article
In intelligent transportation systems, session data usually represents the user's demands. However, traditional approaches focus only on the sequence information or on the last item clicked by the user, which cannot fully represent user preferences. To address this issue, this paper proposes a Multi-aspect Aware Session-based Recommendation (MASR) model for intelligent transportation services, which comprehensively considers the user's personalized behavior from multiple aspects. In addition, it develops a concise and efficient transformer-style self-attention mechanism to analyze the sequence information of the current session in order to accurately grasp the user's intention. Finally, the experimental results show that MASR is able to improve user satisfaction with more accurate and rapid recommendations, and to reduce the number of user operations, thereby decreasing the safety risk during the transportation service.
Article
Identifying the perceived emotional content of music constitutes an important aspect of easy and efficient search, retrieval, and management of the media. One of the most promising use cases of music organization is an emotion-based playlist, where automatic music emotion recognition plays a significant role in providing emotion-related information, which is otherwise generally unavailable. Based on the importance of the auditory system in emotional recognition and processing, in this study we propose a new cochleogram-based system for detecting affective musical content. To effectively simulate the response of the human auditory periphery, the music audio signal is processed by a detailed biophysical cochlear model, thus obtaining an output that closely matches the characteristics of human hearing. In this proposed approach, based on the cochleogram images, which we construct directly from the response of the basilar membrane, a convolutional neural network (CNN) is used to extract the relevant music features. To validate the practical implications of the proposed approach with regard to its possible integration in different digital music libraries, an extensive study was conducted to evaluate the predictive performance of our approach in different aspects of music emotion recognition. The proposed approach was evaluated on the publicly available 1000 Songs database, and the experimental results showed that it performed better in comparison with common musical features (such as tempo, mode, pitch, clarity, and perceptually motivated mel-frequency cepstral coefficients (MFCC)) as well as official "MediaEval" challenge results on the same reference database. Our findings clearly show that the proposed approach can lead to better music emotion recognition performance and be used as part of a state-of-the-art music information retrieval system.
Article
This paper presents an architecture for real-time music genre classification of broadcast data from the standard Frequency Modulation (FM) radio band. The architecture is composed of an FPGA-based MFCC (Mel-Frequency Cepstral Coefficient) feature extraction stage, followed by a classification procedure. The proposed system enables automatic audio indexing of broadcast data from the standard FM radio band. Using a system-level design approach that reduces overall design time, the system was successfully implemented on a Virtex 6 FPGA clocked at more than 150 MHz. The experiments have shown, on the one hand, close agreement between the FPGA-based MFCC calculation and its Matlab-based reference model and, on the other hand, nearly identical results when used in music genre classification with sequential pattern mining techniques. Furthermore, the proposed system satisfies real-time requirements and multichannel scalability, and is suitable for indexing applications in embedded systems. Results are also discussed regarding operating frequencies and resource utilization.
Article
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
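The scaled dot-product attention at the core of the Transformer can be written in a few lines; multi-head projections, masking and the full encoder-decoder stack are omitted in this sketch.

```python
# Minimal sketch of scaled dot-product attention:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (batch, seq_len, d_k) tensors; returns (output, attention weights)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)    # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

if __name__ == "__main__":
    q = k = v = torch.randn(1, 5, 64)                    # self-attention over 5 tokens
    out, attn = scaled_dot_product_attention(q, k, v)
    print(out.shape, attn.shape)                         # (1, 5, 64) (1, 5, 5)
```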
Article
Diversification has become one of the leading topics of recommender system research, not only as a way to solve the over-fitting problem but also as an approach to increasing the quality of the user's experience with the recommender system. This article aims to provide an overview of research done on this topic from one of the first mentions of diversity in 2001 until now. The articles and research have been divided into three sub-topics for a better overview of the work done in the field of recommendation diversification: the definition and evaluation of diversity; the impact of diversification on the quality of recommendation results; and the development of diversification algorithms themselves. In this way, the article aims both to offer a good overview to a researcher looking for the state of the art on this topic and to help a new developer get familiar with the topic.
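One common way to quantify the diversity of a recommendation list in this line of work is intra-list diversity, the average pairwise dissimilarity between recommended items; the sketch below uses 1 minus cosine similarity over hypothetical item feature vectors, and is only one of the several definitions the survey covers.

```python
# Minimal sketch of intra-list diversity: the mean pairwise (1 - cosine similarity)
# over the items in a recommended list.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def intra_list_diversity(item_vectors):
    """item_vectors: (n_items, n_features) array for the recommended list."""
    n = len(item_vectors)
    if n < 2:
        return 0.0
    dissim = 1.0 - cosine_similarity(item_vectors)
    # Average over the off-diagonal pairs only.
    return float(dissim[np.triu_indices(n, k=1)].mean())

if __name__ == "__main__":
    recommended = np.array([
        [1.0, 0.0, 0.2],    # hypothetical content/genre feature vectors
        [0.9, 0.1, 0.3],
        [0.0, 1.0, 0.8],
    ])
    print(round(intra_list_diversity(recommended), 3))
```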
Conference Paper
Content-based music information retrieval tasks have traditionally been solved using engineered features and shallow processing architectures. In recent years, there has been increasing interest in using feature learning and deep architectures instead, thus reducing the required engineering effort and the need for prior knowledge. However, this new approach typically still relies on mid-level representations of music audio, e.g. spectrograms, instead of raw audio signals. In this paper, we investigate whether it is possible to apply feature learning directly to raw audio signals. We train convolutional neural networks using both approaches and compare their performance on an automatic tagging task. Although they do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase- and translation-invariant feature representations.
Trends in content-based recommendation
  • Lops
Similarity measures for recommender systems: a comparative study
  • Sondur
Deep learning based recommender system: A survey and new perspectives
  • Zhang