Rodrigo Mello

University of São Paulo | USP · Department of Computer Science (ICMC)

PhD in Electrical Engineering

184
Publications
42,011
Reads
1,514
Citations
Publications (184)
Preprint
Full-text available
Implementing insect monitoring systems provides an excellent opportunity to create accurate interventions for insect control. However, selecting the appropriate time for an intervention is still an open question due to the inherent difficulty of implementing on-site monitoring in real-time. This decision is even more critical with insect species th...
Conference Paper
Full-text available
Insect outbreaks can affect forests and agroecosystems, resulting in economic and environmental damage. This problem provides an opportunity for Machine-Learning (ML) applications, mainly studies with cause-effect relationships. However, many studies do not consider causality analysis, focusing solely on feature selection and prediction with ML met...
Conference Paper
Full-text available
Implementing insect monitoring systems provides an excellent opportunity to create accurate interventions for insect control. Growers can use methods enlightened by Integrated Pest Management to prevent economic damage to their crops. However, selecting the appropriate time for applying an intervention is still an open question. This decision is ev...
Article
Supervised learning algorithms consider different learning biases from the universe of all admissible functions to induce classifiers. When using ensembles, one takes advantage of different biases typically built from the same algorithm to combine complementary classifiers into a single model, such as Random Forest, that builds up several trees fro...
Article
Full-text available
Unlabelled data appear in many domains and are particularly relevant to streaming applications, where even though data is abundant, labelled data is rare. To address the learning problems associated with such data, one can ignore the unlabelled data and focus only on the labelled data (supervised learning); use the labelled data and attempt to leve...
Article
The striking results of deep neural networks (DNN) have motivated their wide acceptance to tackle large datasets and complex tasks such as natural language processing, facial recognition, and artificial image generation. However, DNN parameters are often empirically selected through a trial-and-error approach without detailed information on convergence be...
Article
Time series analysis models, understands, and predicts phenomena from different domains such as meteorology, medicine, and economics. In this context, Fuzzy Time Series has stood out due to its capacity to use mathematical functions to represent linguistic variables, resulting in interpretable and more accurate models. Several studies ai...
Article
Full-text available
COVID-19 has spread widely around the world, impacting the health systems of several countries in addition to the collateral damage that societies will face in the coming years. Although the comparison between countries is essential for controlling this disease, the main challenge is that countries are not simultaneously affected by the virus....
Article
The data clustering problem is of central importance for the area of machine learning, given its usefulness to represent data structural similarities from input spaces. However, the literature offers scarce theoretical frameworks with generalization guarantees for data clustering. In this context, this manuscript introduces a new concept, based on...
Preprint
Full-text available
Unlabelled data appear in many domains and are particularly relevant to streaming applications, where even though data is abundant, labelled data is rare. To address the learning problems associated with such data, one can ignore the unlabelled data and focus only on the labelled data (supervised learning); use the labelled data and attempt to leve...
Article
Cover Song Identification (CSI) is a task in Music Information Retrieval (MIR) that attempts to identify other versions of a song containing different structures, tonalities, and tempos, which brings several challenges to this task. Some of the frameworks proposed to identify cover songs were evaluated through the Music Information Retrieval Evaluation...
Article
Full-text available
Visual question answering (VQA) is an emerging task combining natural language processing and computer vision technology. Selecting compelling multi-modality features is the core of visual question answering. In multi-modal learning, the attention network provides an effective way that selectively utilizes the given visual information. However, the...
Article
The Intergovernmental Panel on Climate Change and the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services have emphasised unequivocal evidence about the impact of human actions on climate and biodiversity at alarming rates. In Brazilian terms, 2019 has been marked by controversial discussions among politicians and envi...
Chapter
The increasing relevance of data streams in the context of machine learning and artificial intelligence has motivated this paper, which discusses and draws necessary relationships between the concepts of data streams and time series in an attempt to build on theoretical foundations to support online learning in such scenarios. We unify the concepts o...
Chapter
This paper presents a new approach to quantify temporal novelties in Social Networks and, as a consequence, to identify change points driven by the occurrence of new real-world events that influence public opinion. Our approach starts by using Text Mining tools to highlight the main key terms, which will later be used to create a temporal graph,...
Preprint
Full-text available
Concept Drift (CD) detection intends to continuously identify changes in data stream behaviors, supporting researchers in the study and modeling of real-world phenomena. Motivated by the lack of learning guarantees in current CD algorithms, we decided to take advantage of the Statistical Learning Theory (SLT) to formalize the necessary requirements...
Conference Paper
Full-text available
The unbounded and multidimensional nature, the evolution of data distributions with time, and the requirement of single-pass algorithms comprise the main challenges of data stream classification, which makes it impossible to infer learning models in the same manner as for batch scenarios. Data dimensionality reduction arises as a key factor to tra...
Preprint
Full-text available
The reconstruction of phase spaces is an essential step to analyze time series according to Dynamical System concepts. A regression performed on such spaces unveils the relationships among system states from which we can derive their generating rules, that is, the most probable set of functions responsible for generating observations along time. In...
Article
Full-text available
Predicting the insulating thermal behavior of a multi-component refractory ceramic system could be a difficult task, which can be tackled using the finite element (FE) method to solve the partial differential equations of the heat transfer problem, thus calculating the temperature profiles throughout the system in any given period. Nevertheless, us...
Article
One of the most challenging tasks in volcanic data analysis is the classification of seismic events. By identifying them, it is possible to make decisions in advance, benefiting neighboring communities by, for instance, anticipating how such events may impact people's lives and cropland areas. Although there are several approaches to perform such task, D...
Article
Full-text available
Most of the existing outsourced encrypted data schemes are retrieved based on the query keyword entered by authorised users. However, with the increase of the data scale in the cloud storage system, the retrieval efficiency of existing solutions has not been significantly improved. In this paper, a multi-keyword ranked search scheme for ciphertext...
Article
Full-text available
This data manuscript presents a set of signals collected from the Llaima volcano located at the western edge of the Andes in Araucania Region, Chile. The signals were recorded from the LAV station between 2010 and 2016. After individually processing and analyzing every signal, specialists from the Observatorio Vulcanológico de los Andes Sur (OVDAS)...
Article
Share link (until March 06 2020): https://authors.elsevier.com/a/1aPhK~2-EzJro Materials selection of multi-component systems is a challenging task, which is usually not properly tackled in furnace linings (FL) design. In an attempt to generate a systematic approach to select FL ceramic materials, an evolutionary screening procedure (ESP) is prop...
Article
Full-text available
The Distance Vector-Hop (DV-Hop) algorithm is the most well-known range-free localization algorithm based on the distance vector routing protocol in wireless sensor networks; however, it is widely known that its localization accuracy is limited. In this paper, DEIDV-Hop is proposed, an enhanced wireless sensor node localization algorithm based on t...
Conference Paper
Full-text available
Assigning scores to individual features is a popular method for estimating the relevance of features in supervised learning. An accurate feature score estimation provides essential insights in sensitive domains, which is decisive to explain how features influence a given decision, contributing to the interpretability of the model. Learning from st...
Raw Data
Dataset created to monitor the Brazilian vegetation combining 4 different systems: (i) an inventory of Brazilian seed plants created to map the country biodiversity; (ii) the Fraction of Absorbed Photosynthetically Active Radiation; (iii) the NASA Power database to include meteorological data; and (iv) the DATASUS system which makes available geogr...
Article
Full-text available
In low-resource scenarios, for example, small datasets or a lack of available computational resources, state-of-the-art deep learning methods for speech recognition have been known to fail. It is possible to achieve more robust models if care is taken to ensure the learning guarantees provided by the statistical learning theory. This work presents...
Preprint
The Statistical Learning Theory (SLT) provides the theoretical background to ensure that a supervised algorithm generalizes the mapping $f: \mathcal{X} \to \mathcal{Y}$ given $f$ is selected from its search space bias $\mathcal{F}$. This formal result depends on the Shattering coefficient function $\mathcal{N}(\mathcal{F},2n)$ to upper bound the em...
Preprint
Full-text available
The Data Clustering (DC) problem is of central importance for the area of Machine Learning (ML), given its usefulness to represent data structural similarities from input spaces. Differently from Supervised Machine Learning (SML), which relies on the theoretical frameworks of the Statistical Learning Theory (SLT) and the Algorithm Stability (AS), D...
Article
Temporal data produced by industrial, human, and natural phenomena typically contain deterministic and stochastic influences, the former ideally modelled using Dynamical Systems while the latter is appropriately addressed using Statistical tools. Although such influences have been widely studied as individual components, specific tools are req...
Article
In spite of the relevance of Decision Trees (DTs), there is still a disconnection between their theoretical and practical results while selecting models to address specific learning tasks. A particular criterion is provided by the Shattering coefficient, a growth function formulated in the context of the Statistical Learning Theory (SLT), which mea...
Preprint
Full-text available
Recently, several techniques have been explored to detect unusual behaviour in surveillance videos. Nevertheless, few studies leverage features from pre-trained CNNs and none of them presents a comparison of features generated by different models. Motivated by this gap, we compare features extracted by four state-of-the-art image classification netwo...
Article
The surveillance of active volcanoes around the world has become a critical security issue for many countries, requiring a continuous monitoring of seismic signals. By analyzing such signals, we intend to understand volcanic activities (e.g. explosions, eruptions and depressurization) and take decisions to reduce the effects and damages to the econ...
Article
Motivated by the Statistical Learning Theory (SLT), which provides a theoretical framework to ensure when supervised learning algorithms generalize input data, this manuscript relies on the Algorithmic Stability framework to prove learning bounds for the unsupervised concept drift detection on data streams. Based on such proof, we also designed the...
Chapter
This chapter starts by reviewing the basic concepts on Linear Algebra, then we design a simple hyperplane-based classification algorithm. Next, it provides an intuitive and an algebraic formulation to obtain the optimization problem of the Support Vector Machines. At last, hard-margin and soft-margin SVMs are detailed, including the necessary mathe...
Chapter
In this chapter, we provide the necessary foundation to completely design and implement the SVM optimization algorithm. The concepts are described so that they can be broadly applied to general-purpose optimization problems.
Chapter
Full-text available
The area of Machine Learning (ML) is interested in answering how a computer can “learn” specific tasks such as recognizing characters, supporting the diagnosis of people under severe diseases, classifying wine types, separating some material according to its quality (e.g. wood could be separated according to its weakness, so it could be later used to build e...
Chapter
This chapter starts by describing the necessary concepts and assumptions to ensure supervised learning. Later on, it details the Empirical Risk Minimization (ERM) principle, which is the key point for the Statistical Learning Theory (SLT). The ERM principle provides upper bounds to make the empirical risk a good estimator for the expected risk, giv...
Chapter
Chapter 2 introduced the concepts and formulation developed in the context of the Statistical Learning Theory. In this chapter, those concepts are illustrated using the following algorithms: Distance-Weighted Nearest Neighbors, Perceptron, Multilayer Perceptron, and Support Vector Machines.
Chapter
In the previous chapters, we described the Support Vector Machines as a method that creates an optimal hyperplane separating two classes by minimizing the loss via margin maximization. This maximization led to a dual optimization problem resulting in a Lagrangian function which is quadratic and requires simple inequality constraints. The support ve...
Article
This paper presents our efforts to detect Concept Drifts (changes in data generation processes), using the Cross-Recurrence Quantification Analysis, on time series produced by social network systems. Experiments were performed on the TSViz project (http://www.tsviz.com.br), which collects online tweets associated with predefined hashtags and proces...
Preprint
Full-text available
The Statistical Learning Theory (SLT) provides the theoretical guarantees for supervised machine learning based on the Empirical Risk Minimization Principle (ERMP). Such principle defines an upper bound to ensure the uniform convergence of the empirical risk Remp(f), i.e., the error measured on a given data sample, to the expected value of risk R(f...
Article
Full-text available
When dealing with semi-supervised scenarios, the Positive and Unlabeled (PU) problem is a special case in which few labeled examples from a single class of interest are received to proceed with the classification of unseen instances, according to their similarities with the known class. In the scope of time series, most of the current studies propo...
Article
A wide range of applications based on processing of data streams have emerged in the last decade. They require specialised techniques to obtain representative models and extract information. Traditional data clustering algorithms have been adapted to include continuously arriving data by updating the current model. Most data stream clustering al...
Book
This book presents the Statistical Learning Theory in a detailed and easy-to-understand way, by using practical examples, algorithms and source codes. It can be used as a textbook in graduate or undergraduate courses, for self-learners, or as a reference with respect to the main theoretical concepts of Machine Learning. Fundamental concepts of Li...
Article
Full-text available
Deep Learning (DL) is one of the most common subjects when Machine Learning and Data Science approaches are considered. There are clearly two movements related to DL: the first aggregates researchers in a quest to outperform other algorithms from the literature, trying to win contests by considering often small decreases in the empirical risk; and the se...
Article
The Convolutional Neural Network (CNN) figures among the state-of-the-art Deep Learning (DL) algorithms due to its robustness to support data shift, scale variations, and its capability of extracting relevant information from large-scale input data. However, setting appropriate parameters to define CNN architectures is still a challenging issue, ma...
Article
Full-text available
High-accuracy speech recognition is especially challenging when large datasets are not available. It is possible to bridge this gap with careful and knowledge-driven parsing combined with the biologically inspired CNN and the learning guarantees of the Vapnik-Chervonenkis (VC) theory. This work presents a Shallow-CNN-HTSVM (Hierarchical Tree Suppor...
Article
Concept drift detection plays a very important role in the context of data streams. It allows one to point out data behavior modifications over time, which are intrinsically associated with the phenomena responsible for producing such sequences of observations. By detecting such modifications, one can better understand those phenomena and take better de...
Conference Paper
The huge amount of data daily produced through social networks, such as Twitter, has been motivating several researchers and companies to design approaches to model and study users’ behavior/feelings. Most of the current studies have been focusing on building tweet datasets which are later analysed using offline tools. This paper introduces a new a...
Article
The main objective of this paper is to present a study on how the design and development of new techniques and infrastructure to process and manage data may evolve within the next years. By analyzing recent studies, we have noticed that the huge volume of data currently produced and collected from different real-world applications has been causing changes in...
Article
Full-text available
Brain-Computer Interfaces (BCI) are systems that provide an alternative for people with severe or total loss of motor control to interact with the external environment. To map individual intentions into machine operations, BCI systems employ a set of steps that involve the capture and preprocessing of the signals...
Article
The Statistical Learning Theory (SLT) defines five assumptions to ensure learning for supervised algorithms. Data independence is one of those assumptions, since the SLT relies on the Law of Large Numbers to ensure learning bounds. As a consequence, this assumption imposes a strong limitation to guarantee learning on time-dependent scenarios. In ord...
Article
Cover song identification (CSI) systems typically represent songs as chromagrams, which are pairwise compared using different evaluation measurements. Chromagram comparisons are usually computationally demanding, making most CSI systems unsuitable for real-world scenarios where millions of songs have to be processed. Evaluation mechanisms such as the...
Article
Empirical Mode Decomposition (EMD) is a method to decompose signals into Intrinsic Mode Functions (IMFs) to be analyzed in terms of instantaneous frequencies and amplitudes. By comparing the phase spectra of IMFs, we observed that a subset of them contains more stochastic influences while the other is predominantly deterministic. Considering this o...
Article
Real-world data streams may change their behaviors over time, which is referred to as concept drift. By detecting those changes, researchers obtain relevant information about the phenomena that produced such streams (e.g. temperatures in a region, bacteria population, disease occurrence, etc.). Many concept drift detection algorithms consider super...
Article
Traditionally, computer programs have been developed using the sequential programming paradigm. With the advent of parallel computing systems, such as multicore processors and distributed environments, the sequential paradigm became a barrier to the utilisation of the available resources, since the program is restricted to a single processing unit....
Article
The mining of data streams has been attracting much attention in recent years, especially from Machine Learning researchers. One important task in learning from data streams is to correctly detect changing data characteristics over time, since this is critical to the correct modeling of data behavior. With the understanding that many application...
Article
High-dimensional data streams clustering is an attractive research topic, as there are several applications that generate a high number of attributes, bringing new challenges in terms of partitioning due to the curse of dimensionality. In addition, those applications produce unbounded sequences of data which cannot be stored for later analysis. Alt...
Article
Topology is the branch of mathematics that studies how objects relate to one another for their qualitative structural properties, such as connectivity and shape. In this paper, we present an approach for data clustering based on topological features computed over the persistence diagram, estimated using the theory of persistent homology. The featur...
Article
Full-text available
Music collections are widely available on the Internet and, with the increasing storage and bandwidth capability, users can currently access thousands of songs, which brings challenges to music organization and exploration. Therefore, there is a growing demand for automated Music Information Retrieval (MIR) tools for organizing, retrieving and...
Article
Full-text available
One of the most important issues in the design of distributed systems is process scheduling, which maps applications to resources in an attempt to reduce the application execution time or maximise resource utilisation. The complexity involved in finding good scheduling solutions has motivated the design of several heuristics and approximation algor...
Article
Full-text available
The analysis of temporal geospatial data has provided important insights into global vegetation dynamics, particularly the interaction among different variables such as precipitation and vegetation indices. Nevertheless, this analysis is not a straightforward task due to the complex relationships among different systems driving the dynamics of the...
Article
Full-text available
Surrogate data methods have been widely applied to produce synthetic data, while maintaining the same statistical properties as the original. By using such methods, one can analyze certain properties of time series. In this context, Theiler's surrogate data methods are the most commonly considered approaches. These are based on the Fourier transfor...
Article
Full-text available
Process scheduling is one of the most important issues in distributed computing. However, this problem still requires further formalisation to understand the consequences of scheduler decisions. To overcome this drawback, this paper defines the behaviour of computer workloads in terms of a dynamical system model, in which next workload states depen...
Article
The detection of concept drift allows one to point out when a data stream changes its behaviour over time, which supports further analysis to understand why the phenomenon represented by such data has changed. Nowadays, researchers have been approaching concept drift using unsupervised learning strategies, because data streams are open-ended sequences of...
Article
Learning from continuous streams of data has been receiving increasing attention in recent years. Among the many challenges related to mining data streams, change detection is one topic frequently addressed. Being able to determine whether or not data characteristics are changing over time is a major concern for data stream algorithms, be i...
Conference Paper
Full-text available
Abstract: The automatic detection and tracking of leukocytes in MI video images can ensure more accurate analyses and, consequently, help researchers develop more effective therapeutic strategies. However, in in vivo analysis, the animal's breathing and cardiac activity cause a momentary loss of focus...
Conference Paper
Clustering is one of the most used data mining techniques, while computational topology is a very recent field bridging abstract mathematics with concrete computational techniques. In this paper, we explore the hypothesis that topologically-similar clusters may indicate meaningful relationships. Our approach has an efficient implementation based on...
Conference Paper
Monitoring natural environments is a challenging task on account of their hostile features. The use of wireless sensor networks (WSN) for data collection is a viable method since these domains lack any infrastructure. Further studies are required to handle the data collected to provide a better modeling of behavior and make it possible to forecast...
Article
Full-text available
The ability to detect changes in the data distribution is an important issue in Data Stream mining. Detecting changes in data distribution allows the adaptation of a previously learned model to accommodate the most recent data and, therefore, improve its prediction capability. This paper proposes a framework for non-supervised automatic change dete...
Article
Sensor failures or oversupply in wireless sensor networks (WSNs), especially initial random deployment, create spare sensors (whose area is fully covered by other sensors) and sensing holes. We envision a team of robots to relocate sensors and improve their area coverage. Existing algorithms, including centralized ones and the only localized G-R3S2...
Article
Severe constraints imposed by the nature of endless sequences of data collected from unstable phenomena have pushed the understanding and the development of automated analysis strategies, such as data clustering techniques. However, current clustering validation approaches are inadequate for data streams because they do not properly evaluate represen...
Article
The current ability to produce massive amounts of data and the impossibility of storing it motivated the development of data stream mining strategies. Despite the proposal of many techniques, this research area still lacks approaches to mine data streams composed of multiple time series, which has applications in finance, medicine and science. M...
Article
Player Modelling has been receiving much attention from the game community in recent years. The ability to build accurate models of player behavior can be useful in many aspects of a game. One important aspect is the tracking of a player’s behavior over time, signalling every time a change is perceived. This way, the game Artificial Intelligenc...
Article
This paper proposes a new approach to improve time series modeling by considering stochastic and deterministic influences. Assuming such influences are present in observations, a first decomposition step is required to split them into two components: one stochastic and another deterministic. As second step, models are adjusted on each component and...
Article
Several research fields have described phenomena that produce endless sequences of samples, referred to as data streams. These phenomena are studied using data clustering models continuously obtained throughout the endless data gathering process, whose set of dynamical properties, i.e., behavior, evolves over time. In order to cope with data stream...
Article
Current scientific applications have been producing large amounts of data. The processing, handling and analysis of such data require large-scale computing infrastructures such as clusters and grids. In this area, studies aim at improving the performance of data-intensive applications by optimizing data accesses. In order to achieve this goal, dist...
Conference Paper
Throughout the years, information quality has become an essential tool for marketing decisions and to support other business scenarios and operations. That happens due to the fast pace of increase in both the amount of data to be qualified and the complexity of the qualification process. This has motivated the adoption of High Performance Computing (HPC)...
Article
The understanding of several phenomena requires unbounded data collections, called data streams. These phenomena often present unstable behavior and are studied by means of unsupervised induction processes based on data clustering. Currently, clustering processes have shown serious limitations in their applications to data streams due to the demand...
Chapter
Machine learning is a field of artificial intelligence which aims at developing techniques to automatically transfer human knowledge into analytical models. Recently, those techniques have been applied to time series with unknown dynamics and fluctuations in the established behavior patterns, such as human-computer interaction, inspection robotics a...
Article
By modeling the outputs produced by real-world systems, we can study and, therefore, understand how they work and behave under different circumstances. This is especially interesting to support the prediction of future behaviour and, consequently, decision-making, which is particularly required in certain application domains. In order to proceed wit...
Conference Paper
Real-world datasets commonly present high dimensional data, which means an increased amount of information. However, this does not always imply an improvement in learning technique performance. Furthermore, some features may be correlated or add unexpected noise, thereby reducing data clustering performance. This has motivated the development of fe...
Article
Recently, there has been an increased interest in self-healing systems. These types of systems are able to cope with failures in the environment in which they execute and work continuously by taking proactive actions to correct these problems. The detection of faults plays a prominent role in self-healing systems, as faults are the original causes of failur...