Rodrigo Mello
University of São Paulo | USP · Department of Computer Science (ICMC)

PhD in Electrical Engineering

About

Publications: 184

Reads: 42,011

Citations: 1,514

Skills and Expertise

Algorithms

Time Series Analysis

Machine Learning

Publications

Forecasting insect abundance using time series embedding and machine learning

Preprint

Full-text available

Dec 2023

Implementing insect monitoring systems provides an excellent opportunity to create accurate interventions for insect control. However, selecting the appropriate time for an intervention is still an open question due to the inherent difficulty of implementing on-site monitoring in real-time. This decision is even more critical with insect species th...

Forecasting the abundance of agricultural pests: a new machine learning framework

Poster

Full-text available

Aug 2023

Forecasting the abundance of agricultural pests: a new machine learning framework

Conference Paper

Full-text available

Aug 2023

Insect outbreaks can affect forests and agroecosystems, resulting in economic and environmental damage. This problem provides an opportunity for Machine-Learning (ML) applications, mainly studies with cause-effect relationships. However, many studies do not consider causality analysis, focusing solely on feature selection and prediction with ML met...

Forecasting insect abundance using time series embedding and environmental covariates

Conference Paper

Full-text available

Jul 2023

Implementing insect monitoring systems provides an excellent opportunity to create accurate interventions for insect control. Growers can use methods enlightened by Integrated Pest Management to prevent economic damage to their crops. However, selecting the appropriate time for applying an intervention is still an open question. This decision is ev...

A New Forecasting Tool Based on Time Series Embedding

Conference Paper

May 2023

eXplainable Ensemble Strategy using distinct and restrict learning biases: A case study on the Brazilian Forest

Article

Dec 2022

Supervised learning algorithms consider different learning biases from the universe of all admissible functions to induce classifiers. When using ensembles, one takes advantage of different biases typically built from the same algorithm to combine complementary classifiers into a single model, such as Random Forest, that builds up several trees fro...

A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams

Article

Full-text available

Mar 2022

Unlabelled data appear in many domains and are particularly relevant to streaming applications, where even though data is abundant, labelled data is rare. To address the learning problems associated with such data, one can ignore the unlabelled data and focus only on the labelled data (supervised learning); use the labelled data and attempt to leve...

Neural network training fingerprint: visual analytics of the training process in classification neural networks

Article

Nov 2021

The striking results of deep neural networks (DNN) have motivated their wide acceptance to tackle large datasets and complex tasks such as natural language processing, facial recognition, and artificial image generation. However, DNN parameters are often empirically selected through a trial-and-error approach without detailed information on convergence be...

Using fuzzy clustering to address imprecision and uncertainty present in deterministic components of time series

Article

Oct 2021

Time series analysis models, understands, and predicts phenomena from different domains such as meteorology, medicine, and economics. In this context, Fuzzy Time Series has stood out due to its capacity to use mathematical functions to represent linguistic variables, resulting in interpretable and more accurate models. Several studies ai...

Country transition index based on hierarchical clustering to predict next COVID-19 waves

Article

Full-text available

Jul 2021

COVID-19 has spread widely around the world, impacting the health systems of several countries in addition to the collateral damage that societies will face in the coming years. Although the comparison between countries is essential for controlling this disease, the main challenge is that countries are not simultaneously affected by the virus....

Coarse-refinement dilemma: On generalization bounds for data clustering

Article

Jul 2021

The data clustering problem is of central importance for the area of machine learning, given its usefulness to represent data structural similarities from input spaces. However, the literature on theoretical frameworks with generalization guarantees for data clustering is scarce. In this context, this manuscript introduces a new concept, based on...

A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams

Preprint

Full-text available

Jun 2021

Time complexity evaluation of cover song identification algorithms

Article

Apr 2021

Cover Song Identification (CSI) is a task in Music Information Retrieval (MIR) that attempts to identify other versions of a song containing different structures, tonalities, and tempos, which brings several challenges to this task. Some of the frameworks proposed to identify cover songs were evaluated through the Music Information Retrieval Evaluation...

Cross-modality co-attention networks for visual question answering

Article

Full-text available

Apr 2021

Visual question answering (VQA) is an emerging task combining natural language processing and computer vision technology. Selecting compelling multi-modality features is the core of visual question answering. In multi-modal learning, the attention network provides an effective way that selectively utilizes the given visual information. However, the...

Brazilian Forest Dataset: A new dataset to model local biodiversity

Article

Jan 2021

The Intergovernmental Panel on Climate Change and the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services have emphasised unequivocal evidence of the impact of human actions on climate and biodiversity at alarming rates. In Brazilian terms, 2019 has been marked by controversial discussions among politicians and envi...

Investigating 3D Convolutional Layers as Feature Extractors for Anomaly Detection Systems Applied to Surveillance Videos

Conference Paper

Full-text available

Jan 2021

Data Streams Are Time Series: Challenging Assumptions

Chapter

Oct 2020

The increasing relevance of data streams in the context of machine learning and artificial intelligence has motivated this paper, which discusses and draws necessary relationships between the concepts of data streams and time series in an attempt to build on theoretical foundations to support online learning in such scenarios. We unify the concepts o...

Quantifying Temporal Novelty in Social Networks Using Time-Varying Graphs and Concept Drift Detection

Chapter

Oct 2020

This paper presents a new approach to quantify temporal novelties in Social Networks and, as a consequence, to identify change points driven by the occurrence of new real-world events that influence public opinion. Our approach starts by using Text Mining tools to highlight the main key terms, which will later be used to create a temporal graph,...

Ensuring Learning Guarantees on Concept Drift Detection with Statistical Learning Theory

Preprint

Full-text available

Jun 2020

Concept Drift (CD) detection intends to continuously identify changes in data stream behaviors, supporting researchers in the study and modeling of real-world phenomena. Motivated by the lack of learning guarantees in current CD algorithms, we decided to take advantage of the Statistical Learning Theory (SLT) to formalize the necessary requirements...

Compressed k-Nearest Neighbors Ensembles for Evolving Data Streams

Conference Paper

Full-text available

Jun 2020

The unbounded and multidimensional nature, the evolution of data distributions with time, and the requirement of single-pass algorithms comprise the main challenges of data stream classification, which make it impossible to infer learning models in the same manner as for batch scenarios. Data dimensionality reduction arises as a key factor to tra...

Supporting Optimal Phase Space Reconstructions Using Neural Network Architecture for Time Series Modeling

Preprint

Full-text available

Jun 2020

The reconstruction of phase spaces is an essential step to analyze time series according to Dynamical System concepts. A regression performed on such spaces unveils the relationships among system states from which we can derive their generating rules, that is, the most probable set of functions responsible for generating observations along time. In...

Estimating the thermal insulating performance of multi-component refractory ceramic systems based on a machine learning surrogate model framework

Article

Full-text available

Jun 2020

Predicting the insulating thermal behavior of a multi-component refractory ceramic system could be a difficult task, which can be tackled using the finite element (FE) method to solve the partial differential equations of the heat transfer problem, thus calculating the temperature profiles throughout the system in any given period. Nevertheless, us...

In-depth comparison of deep artificial neural network architectures on seismic events classification

Article

May 2020

One of the most challenging tasks in volcanic data analysis is the classification of seismic events. By knowing them, it is possible to make decisions in advance, providing benefits for the neighboring societies, for instance by assessing how such events may impact users' lives and cropland areas. Although there are several approaches to perform such a task, D...

Multi-Keyword ranked search based on mapping set matching in cloud ciphertext storage system

Article

Full-text available

Apr 2020

Most of the existing outsourced encrypted data schemes are retrieved based on the query keyword entered by authorised users. However, with the increase of the data scale in the cloud storage system, the retrieval efficiency of existing solutions has not been significantly improved. In this paper, a multi-keyword ranked search scheme for ciphertext...

Llaima Volcano Dataset: In-Depth Comparison of Deep Artificial Neural Network Architectures on Seismic Events Classification

Article

Full-text available

Apr 2020

This data manuscript presents a set of signals collected from the Llaima volcano located at the western edge of the Andes in Araucania Region, Chile. The signals were recorded from the LAV station between 2010 and 2016. After individually processing and analyzing every signal, specialists from the Observatorio Vulcanológico de los Andes Sur (OVDAS)...

Materials selection of furnace linings with multi-component refractory ceramics based on an evolutionary screening procedure

Article

Mar 2020

Share link (until March 06 2020): https://authors.elsevier.com/a/1aPhK~2-EzJro Materials selection of multi-component systems is a challenging task, which is usually not properly tackled in furnace linings (FL) design. In an attempt to generate a systematic approach to select FL ceramic materials, an evolutionary screening procedure (ESP) is prop...

Enhancing the Sensor Node Localization Algorithm Based on Improved DV-Hop and DE Algorithms in Wireless Sensor Networks

Article

Full-text available

Jan 2020

The Distance Vector-Hop (DV-Hop) algorithm is the most well-known range-free localization algorithm based on the distance vector routing protocol in wireless sensor networks; however, it is widely known that its localization accuracy is limited. In this paper, DEIDV-Hop is proposed, an enhanced wireless sensor node localization algorithm based on t...

Time series clustering using stochastic and deterministic influences

Article

Jan 2020

Feature Scoring using Tree-Based Ensembles for Evolving Data Streams

Conference Paper

Full-text available

Dec 2019

Assigning scores to individual features is a popular method for estimating the relevance of features in supervised learning. An accurate feature score estimation provides essential insights in sensitive domains, which is decisive to explain how features influence a given decision, contributing to the interpretability of the model. Learning from st...

Brazilian Forest Dataset

Raw Data

Dec 2019

Dataset created to monitor the Brazilian vegetation combining 4 different systems: (i) an inventory of Brazilian seed plants created to map the country's biodiversity; (ii) the Fraction of Absorbed Photosynthetically Active Radiation; (iii) the NASA Power database to include meteorological data; and (iv) the DATASUS system which makes available geogr...

Theoretical learning guarantees applied to acoustic modeling

Article

Full-text available

Dec 2019

In low-resource scenarios, for example, small datasets or a lack of available computational resources, state-of-the-art deep learning methods for speech recognition have been known to fail. It is possible to achieve more robust models if care is taken to ensure the learning guarantees provided by the statistical learning theory. This work presents...

On the Shattering Coefficient of Supervised Learning Algorithms

Preprint

Nov 2019

Rodrigo Mello

The Statistical Learning Theory (SLT) provides the theoretical background to ensure that a supervised algorithm generalizes the mapping $f: \mathcal{X} \to \mathcal{Y}$ given $f$ is selected from its search space bias $\mathcal{F}$. This formal result depends on the Shattering coefficient function $\mathcal{N}(\mathcal{F},2n)$ to upper bound the em...

Coarse-Refinement Dilemma: On Generalization Bounds for Data Clustering

Preprint

Full-text available

Nov 2019

The Data Clustering (DC) problem is of central importance for the area of Machine Learning (ML), given its usefulness to represent data structural similarities from input spaces. Differently from Supervised Machine Learning (SML), which relies on the theoretical frameworks of the Statistical Learning Theory (SLT) and the Algorithm Stability (AS), D...

Redes Neurais Artificiais: Aplicações em biologia [Artificial Neural Networks: Applications in Biology]

Conference Paper

Full-text available

Oct 2019

Redes Neurais Artificiais: Aplicações em biologia [Artificial Neural Networks: Applications in Biology]

Poster

Full-text available

Oct 2019

Decomposing time series into deterministic and stochastic influences: A survey

Article

Oct 2019

Temporal data produced by industrial, human, and natural phenomena typically contain deterministic and stochastic influences, the former ideally modelled using Dynamical Systems while the latter is appropriately addressed using Statistical tools. Although such influences have been widely studied as individual components, specific tools are req...

Measuring the Shattering Coefficient of Decision Tree Models

Article

Jul 2019

In spite of the relevance of Decision Trees (DTs), there is still a disconnection between their theoretical and practical results while selecting models to address specific learning tasks. A particular criterion is provided by the Shattering coefficient, a growth function formulated in the context of the Statistical Learning Theory (SLT), which mea...

Time series clustering using stochastic and deterministic influences

Article

Jan 2019

Are pre-trained CNNs good feature extractors for anomaly detection in surveillance videos?

Preprint

Full-text available

Nov 2018

Recently, several techniques have been explored to detect unusual behaviour in surveillance videos. Nevertheless, few studies leverage features from pre-trained CNNs and none of them presents a comparison of features generated by different models. Motivated by this gap, we compare features extracted by four state-of-the-art image classification netwo...

Discriminating seismic events of the Llaima volcano (Chile) based on spectrogram cross-correlations

Article

Nov 2018

The surveillance of active volcanoes around the world has become a critical security issue for many countries, requiring a continuous monitoring of seismic signals. By analyzing such signals, we intend to understand volcanic activities (e.g. explosions, eruptions and depressurization) and take decisions to reduce the effects and damages to the econ...

Time Series Decomposition Using Spring System Applied on Phase Spaces

Conference Paper

Oct 2018

Color Quantization in Transfer Learning and Noisy Scenarios: An Empirical Analysis Using Convolutional Networks

Conference Paper

Oct 2018

On Learning Guarantees to Unsupervised Concept Drift Detection on Data Streams

Article

Sep 2018

Motivated by the Statistical Learning Theory (SLT), which provides a theoretical framework to ensure when supervised learning algorithms generalize input data, this manuscript relies on the Algorithmic Stability framework to prove learning bounds for the unsupervised concept drift detection on data streams. Based on such proof, we also designed the...

Introduction to Support Vector Machines: A Practical Approach on the Statistical Learning Theory

Chapter

Aug 2018

This chapter starts by reviewing the basic concepts of Linear Algebra, then we design a simple hyperplane-based classification algorithm. Next, it provides an intuitive and an algebraic formulation to obtain the optimization problem of the Support Vector Machines. Finally, hard-margin and soft-margin SVMs are detailed, including the necessary mathe...

In Search for the Optimization Algorithm: A Practical Approach on the Statistical Learning Theory

Chapter

Aug 2018

In this chapter, we provide the necessary foundation to completely design and implement the SVM optimization algorithm. The concepts are described so that they can be broadly applied to general-purpose optimization problems.

A Brief Review on Machine Learning: A Practical Approach on the Statistical Learning Theory

Chapter

Full-text available

Aug 2018

The area of Machine Learning (ML) is interested in answering how a computer can “learn” specific tasks such as recognizing characters, supporting the diagnosis of people with severe diseases, classifying wine types, and separating material according to its quality (e.g. wood could be separated according to its weakness, so it could be later used to build e...

Statistical Learning Theory: A Practical Approach on the Statistical Learning Theory

Chapter

Aug 2018

This chapter starts by describing the necessary concepts and assumptions to ensure supervised learning. Later on, it details the Empirical Risk Minimization (ERM) principle, which is the key point for the Statistical Learning Theory (SLT). The ERM principle provides upper bounds to make the empirical risk a good estimator for the expected risk, giv...

Assessing Supervised Learning Algorithms: A Practical Approach on the Statistical Learning Theory

Chapter

Aug 2018

Chapter 2 introduced the concepts and formulation developed in the context of the Statistical Learning Theory. In this chapter, those concepts are illustrated using the following algorithms: Distance-Weighted Nearest Neighbors, Perceptron, Multilayer Perceptron, and Support Vector Machines.

A Brief Introduction on Kernels: A Practical Approach on the Statistical Learning Theory

Chapter

Aug 2018

In the previous chapters, we described the Support Vector Machines as a method that creates an optimal hyperplane separating two classes by minimizing the loss via margin maximization. This maximization led to a dual optimization problem resulting in a Lagrangian function which is quadratic and requires simple inequality constraints. The support ve...

Concept drift detection on social network data using cross-recurrence quantification analysis

Article

Aug 2018

This paper presents our efforts to detect Concept Drifts (changes in data generation processes), using the Cross-Recurrence Quantification Analysis, on time series produced by social network systems. Experiments were performed on the TSViz project (http://www.tsviz.com.br), which collects online tweets associated with predefined hashtags and proces...

Computing the Shattering Coefficient of Supervised Learning Algorithms

Preprint

Full-text available

May 2018

The Statistical Learning Theory (SLT) provides the theoretical guarantees for supervised machine learning based on the Empirical Risk Minimization Principle (ERMP). Such principle defines an upper bound to ensure the uniform convergence of the empirical risk Remp(f), i.e., the error measured on a given data sample, to the expected value of risk R(f...

Semi-Supervised Time Series Classification on Positive and Unlabeled Problems Using Cross-Recurrence Quantification Analysis

Article

Full-text available

Mar 2018

When dealing with semi-supervised scenarios, the Positive and Unlabeled (PU) problem is a special case in which few labeled examples from a single class of interest are received to proceed with the classification of unseen instances, according to their similarities with the known class. In the scope of time series, most of the current studies propo...

Estimating data stream tendencies to adapt clustering parameters

Article

Jan 2018

A wide range of applications based on the processing of data streams have emerged in the last decade. They require specialised techniques to obtain representative models and extract information. Traditional data clustering algorithms have been adapted to include continuously arriving data by updating the current model. Most data stream clustering al...

Machine Learning: A Practical Approach on the Statistical Learning Theory

Book

Jan 2018

This book presents the Statistical Learning Theory in a detailed and easy-to-understand way, using practical examples, algorithms, and source code. It can be used as a textbook in graduate or undergraduate courses, by self-learners, or as a reference for the main theoretical concepts of Machine Learning. Fundamental concepts of Li...

Providing theoretical learning guarantees to Deep Learning Networks

Article

Full-text available

Nov 2017

Deep Learning (DL) is one of the most common subjects when Machine Learning and Data Science approaches are considered. There are clearly two movements related to DL: the first aggregates researchers in a quest to outperform other algorithms from the literature, trying to win contests by considering often small decreases in the empirical risk; and the se...

Designing Architectures of Convolutional Neural Networks to Solve Practical Problems

Article

Oct 2017

The Convolutional Neural Network (CNN) figures among the state-of-the-art Deep Learning (DL) algorithms due to its robustness to support data shift, scale variations, and its capability of extracting relevant information from large-scale input data. However, setting appropriate parameters to define CNN architectures is still a challenging issue, ma...

Analyzing the Public Opinion on the Brazilian Political and Corruption Issues

Conference Paper

Oct 2017

Acoustic Modeling Using a Shallow CNN-HTSVM Architecture

Conference Paper

Full-text available

Oct 2017

Acoustic Modeling Using a Shallow CNN-HTSVM Architecture

Article

Full-text available

Jun 2017

High-accuracy speech recognition is especially challenging when large datasets are not available. It is possible to bridge this gap with careful and knowledge-driven parsing combined with the biologically inspired CNN and the learning guarantees of the Vapnik-Chervonenkis (VC) theory. This work presents a Shallow-CNN-HTSVM (Hierarchical Tree Suppor...

Multidimensional surrogate stability to detect data stream concept drift

Article

Jun 2017

Concept drift detection plays a very important role in the context of data streams. It allows one to point out data behavior modifications over time, which are intrinsically associated with the phenomena responsible for producing such sequences of observations. By detecting such modifications, one can better understand those phenomena and take better de...

TSViz: A Data Stream Architecture to Online Collect, Analyze, and Visualize Tweets

Conference Paper

Apr 2017

The huge amount of data daily produced through social networks, such as Twitter, has been motivating several researchers and companies to design approaches to model and study users’ behavior/feelings. Most of the current studies have been focusing on building tweet datasets which are later analysed using offline tools. This paper introduces a new a...

Is even data analysis ready today?

Article

Jan 2017

The main objective of this paper is to present a study on how the design and development of new techniques and infrastructure to process and manage data may evolve within the next years. By analyzing recent studies, we have noticed that the huge volume of data currently produced and collected from different real-world applications has been causing changes in...

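Emprego de Banco de Filtros e do Teorema de Imersão de Takens em Padrões Espaciais para a Classificação de Imagética Motora em Interfaces Cérebro-Computador [Using Filter Banks and Takens' Embedding Theorem on Spatial Patterns for Motor Imagery Classification in Brain-Computer Interfaces]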

Article

Full-text available

Dec 2016

Brain-Computer Interfaces (BCI) are systems that provide an alternative for people with severe or total loss of motor control to interact with the external environment. To map individual intentions into machine operations, BCI systems employ a set of steps that involve the capture and pre-processing of the signals...

Applying a kernel function on time-dependent data to provide supervised-learning guarantees

Article

Nov 2016

The Statistical Learning Theory (SLT) defines five assumptions to ensure learning for supervised algorithms. Data independence is one of those assumptions, since the SLT relies on the Law of Large Numbers to ensure learning bounds. As a consequence, this assumption imposes a strong limitation to guarantee learning on time-dependent scenarios. In ord...

On Accuracy and Time Processing Evaluation of Cover Song Identification Systems

Article

Oct 2016

Cover song identification (CSI) systems typically represent songs as chromagrams which are pairwise compared using different evaluation measurements. Chromagram comparisons are usually computationally demanding, making most CSI systems unsuitable for real-world scenarios where millions of songs have to be processed. Evaluation mechanisms such as the...

Applying Empirical Mode Decomposition and mutual information to separate stochastic and deterministic influences embedded in signals

Article

Jul 2016

Empirical Mode Decomposition (EMD) is a method to decompose signals into Intrinsic Mode Functions (IMFs) to be analyzed in terms of instantaneous frequencies and amplitudes. By comparing the phase spectra of IMFs, we observed that a subset of them contains more stochastic influences while the other is predominantly deterministic. Considering this o...

Using Dynamical Systems Tools to Detect Concept Drift in Data Streams

Article

Apr 2016

Real-world data streams may change their behaviors over time, which is referred to as concept drift. By detecting those changes, researchers obtain relevant information about the phenomena that produced such streams (e.g. temperatures in a region, bacteria population, disease occurrence, etc.). Many concept drift detection algorithms consider super...

Detecting Dynamical Changes in Data Streams

Conference Paper

Jan 2016

Estimating data stream tendencies to adapt clustering parameters

Article

Jan 2016

Application execution path analysis for the automatic parallelisation of binary codes in the Intel x86 platform

Article

Dec 2015

Traditionally, computer programs have been developed using the sequential programming paradigm. With the advent of parallel computing systems, such as multicore processors and distributed environments, the sequential paradigm became a barrier to the utilisation of the available resources, since the program is restricted to a single processing unit....

Unsupervised change detection in data streams: an application in music analysis

Article

Nov 2015

The mining of data streams has been attracting much attention in recent years, especially from Machine Learning researchers. One important task in learning from data streams is to correctly detect changing data characteristics over time, since this is critical to the correct modeling of data behavior. With the understanding that many application...

PTS: Projected Topological Stream clustering algorithm

Article

Nov 2015

High-dimensional data stream clustering is an attractive research topic, as there are several applications that generate a high number of attributes, bringing new challenges in terms of partitioning due to the curse of dimensionality. In addition, those applications produce unbounded sequences of data which cannot be stored for later analysis. Alt...

Persistent homology for time series and spatial data clustering

Article

Sep 2015

Topology is the branch of mathematics that studies how objects relate to one another for their qualitative structural properties, such as connectivity and shape. In this paper, we present an approach for data clustering based on topological features computed over the persistence diagram, estimated using the theory of persistent homology. The featur...

Article

Full-text available

Aug 2015

Music collections are widely available on the Internet and, with the increasing storage and bandwidth capability, users can currently access thousands of songs, which brings challenges to music organization and exploration. Therefore, there is a growing demand for automated Music Information Retrieval (MIR) tools for organizing, retrieving and...

Min-heap-based scheduling algorithm: An approximation algorithm for homogeneous and heterogeneous distributed systems

Article

Full-text available

Feb 2015

One of the most important issues in the design of distributed systems is process scheduling, which maps applications to resources in an attempt to reduce the application execution time or maximise resource utilisation. The complexity involved in finding good scheduling solutions has motivated the design of several heuristics and approximation algor...

Estimating determinism rates to detect patterns in geospatial datasets

Article

Full-text available

Jan 2015

The analysis of temporal geospatial data has provided important insights into global vegetation dynamics, particularly the interaction among different variables such as precipitation and vegetation indices. Nevertheless, this analysis is not a straightforward task due to the complex relationships among different systems driving the dynamics of the...

Testing for Linear and Nonlinear Gaussian Processes in Nonstationary Time Series

Article

Full-text available

Jan 2015

Surrogate data methods have been widely applied to produce synthetic data, while maintaining the same statistical properties as the original. By using such methods, one can analyze certain properties of time series. In this context, Theiler's surrogate data methods are the most commonly considered approaches. These are based on the Fourier transfor...

Modelling distributed computing workloads to support the study of scheduling decisions

Article

Full-text available

Jan 2015

Process scheduling is one of the most important issues in distributed computing. However, this problem still requires further formalisation to understand the consequences of scheduler decisions. To overcome this drawback, this paper defines the behaviour of computer workloads in terms of a dynamical system model, in which next workload states depen...

A Stable and Online Approach to Detect Concept Drift in Data Streams

Article

Dec 2014

The detection of concept drift makes it possible to point out when a data stream changes its behaviour over time, which supports further analysis to understand why the phenomenon represented by such data has changed. Nowadays, researchers have been approaching concept drift using unsupervised learning strategies, because data streams are open-ended sequences of...

cbeb2014 submission 057

Data

Dec 2014

Proposal of a new stability concept to detect changes in unsupervised data streams

Article

Nov 2014

Learning from continuous streams of data has been receiving increasing attention in recent years. Among the many challenges related to mining data streams, change detection is one topic frequently addressed. Being able to determine whether or not data characteristics are changing over time is a major concern for data stream algorithms, be i...

TÉCNICA DE ESTABILIZAÇÃO DE MOVIMENTO EM MICROSCOPIA INTRAVITAL UTILIZANDO MÉTODOS DE CO-REGISTRO DE IMAGENS

Conference Paper

Full-text available

Oct 2014

Abstract: The automatic detection and tracking of leukocytes in intravital microscopy (IM) video images can ensure more accurate analyses and, consequently, help researchers develop more effective therapeutic strategies. However, in in vivo analysis, the animal's breathing and cardiac activity cause a momentary loss of focus...

Data Clustering Using Topological Features

Conference Paper

Oct 2014

Clustering is one of the most used data mining techniques, while computational topology is a very recent field bridging abstract mathematics with concrete computational techniques. In this paper, we explore the hypothesis that topologically-similar clusters may indicate meaningful relationships. Our approach has an efficient implementation based on...

An Accurate Flood Forecasting Model Using Wireless Sensor Networks and Chaos Theory: A Case Study with Real WSN Deployment in Brazil

Conference Paper

Sep 2014

Monitoring natural environments is a challenging task on account of their hostile features. The use of wireless sensor networks (WSN) for data collection is a viable method since these domains lack any infrastructure. Further studies are required to handle the data collected to provide a better modeling of behavior and make it possible to forecast...

Unsupervised density-based behavior change detection in data streams

Article

Full-text available

Feb 2014

The ability to detect changes in the data distribution is an important issue in Data Stream mining. Detecting changes in data distribution allows the adaptation of a previously learned model to accommodate the most recent data and, therefore, improve its prediction capability. This paper proposes a framework for non-supervised automatic change dete...

Localized Ant Colony of Robots for Redeployment in Wireless Sensor Networks

Article

Jan 2014

Sensor failures or oversupply in wireless sensor networks (WSNs), especially initial random deployment, create spare sensors (whose area is fully covered by other sensors) and sensing holes. We envision a team of robots to relocate sensors and improve their area coverage. Existing algorithms, including centralized ones and the only localized G-R3S2...

Energy-based function to evaluate data stream clustering

Article

Dec 2013

Severe constraints imposed by the nature of endless sequences of data collected from unstable phenomena have pushed the understanding and the development of automated analysis strategies, such as data clustering techniques. However, current clustering validation approaches are inadequate for data streams because they do not properly evaluate represen...

TS-stream: Clustering time series on data streams

Article

Dec 2013

The current ability to produce massive amounts of data and the impossibility of storing it have motivated the development of data stream mining strategies. Despite the proposal of many techniques, this research area still lacks approaches to mine data streams composed of multiple time series, which has applications in finance, medicine and science. M...

Online behavior change detection in computer games

Article

Nov 2013

Player Modelling has been receiving much attention from the game community in recent years. The ability to build accurate models of player behavior can be useful in many aspects of a game. One important aspect is the tracking of a player’s behavior over time, informing every time a change is perceived. This way, the game Artificial Intelligenc...

Improving time series modeling by decomposing and analyzing stochastic and deterministic influences

Article

Nov 2013

This paper proposes a new approach to improve time series modeling by considering stochastic and deterministic influences. Assuming such influences are present in observations, a first decomposition step is required to split them into two components: one stochastic and another deterministic. As a second step, models are adjusted on each component and...

Intrusion Detection in Unstructured Contexts Using On-line Clustering and Novelty Detection

Article

Full-text available

May 2013

Data stream dynamic clustering supported by Markov chain isomorphisms

Article

May 2013

Several research fields have described phenomena that produce endless sequences of samples, referred to as data streams. These phenomena are studied using data clustering models continuously obtained throughout the endless data gathering process, whose set of dynamical properties, i.e., behavior, evolves over time. In order to cope with data stream...

Article

Full-text available

Jan 2013

An On-Line Data Access Prediction and Optimization Approach for Distributed Systems

Article

Jun 2012

Current scientific applications have been producing large amounts of data. The processing, handling and analysis of such data require large-scale computing infrastructures such as clusters and grids. In this area, studies aim at improving the performance of data-intensive applications by optimizing data accesses. In order to achieve this goal, dist...

Improving Performance of Information Quality Applications: A Scale-Out Approach

Conference Paper

May 2012

Throughout the years, information quality has become an essential tool for marketing decisions and to support other business scenarios and operations. That happens due to the fast pace of increase in both the amount of data to be qualified and the complexity of the qualification process. This has motivated the adoption of High Performance Computing (HPC)...

Formalization of data stream clustering properties and analysis of algorithms

Article

Jan 2012

The understanding of several phenomena requires unbounded data collections, called data streams. These phenomena often present unstable behavior and are studied by means of unsupervised induction processes based on data clustering. Currently, clustering processes have shown serious limitations in their applications to data streams due to the demand...

A Self-Organizing Neural Network to Approach Novelty Detection

Chapter

Jan 2012

Machine learning is a field of artificial intelligence which aims at developing techniques to automatically transfer human knowledge into analytical models. Recently, those techniques have been applied to time series with unknown dynamics and fluctuations in the established behavior patterns, such as human-computer interaction, inspection robotics a...

Classification of Time Series Generation Processes using Experimental Tools: A Survey and Proposal of an Automatic and Systematic Approach

Article

Dec 2011

By modeling the outputs produced by real-world systems, we can study and, therefore, understand how they work and behave under different circumstances. This is especially interesting to support the prediction of future behaviour and, consequently, decision-making, which is particularly required in certain application domains. In order to proceed wit...

Quantifying Features Using False Nearest Neighbors: An Unsupervised Approach

Conference Paper

Nov 2011

Real-world datasets commonly present high dimensional data, which means an increased amount of information. However, this does not always imply an improvement in learning technique performance. Furthermore, some features may be correlated or add unexpected noise, thereby reducing data clustering performance. This has motivated the development of fe...

Learning Process Behavior for Fault Detection

Article

Oct 2011

Recently, there has been an increased interest in self-healing systems. These types of systems are able to cope with failures in the environment in which they execute and work continuously by taking proactive actions to correct these problems. The detection of faults plays a prominent role in self-healing systems, as faults are the original causes of failur...