Antonio Javier Gallego

University of Alicante | UA · Software and Computing Systems

PhD

About

81
Publications
26,088
Reads
1,077
Citations

Publications (81)
Article
Full-text available
Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research. However, some challenges are associated with these datasets, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained from generic images. This work studies...
Article
Full-text available
Classifying logo images is a challenging task as they contain elements such as text or shapes that can represent anything from known objects to abstract shapes. While the current state of the art for logo classification addresses the problem as a multi‐class task focusing on a single characteristic, logos can have several simultaneous labels, such...
Article
Full-text available
Scene understanding is an important area in robotics and autonomous driving. To accomplish these tasks, the 3D structures in the scene have to be inferred to know what the objects and their locations are. To this end, semantic segmentation and disparity estimation networks are typically used, but running them individually is inefficient since they...
Article
Online Judge (OJ) systems are typically considered within programming-related courses as they yield fast and objective assessments of the code developed by the students. Such an evaluation generally provides a single decision based on a rubric, most commonly whether the submission successfully accomplished the assignment. Nevertheless, since in an...
Chapter
The popularity of comics has increased in the digital era, leading to the development of several applications and platforms. These advancements have opened up new opportunities for creating and distributing comics and experimenting with new forms of visual storytelling. One of the most promising research areas in this field is the use of deep learn...
Article
Full-text available
Siamese Neural Networks (SNNs) constitute one of the most representative approaches for addressing Few-Shot Image Classification. These schemes comprise a set of Convolutional Neural Network (CNN) models whose weights are shared across the network, which results in fewer parameters to train and less tendency to overfit. This fact eventually leads t...
Chapter
Full-text available
Document binarization is a well-known process addressed in the document image analysis literature, which aims to isolate the ink information from the background. Current solutions use deep learning, which requires a great amount of annotated data for training robust models. Data augmentation is known to reduce such annotation requirements, and it c...
Chapter
Full-text available
Prototype Generation (PG) methods seek to improve the efficiency of the k-Nearest Neighbor (kNN) classifier by obtaining a reduced version of a given reference dataset following certain heuristics. Despite being a largely addressed topic in multiclass scenarios, few works deal with PG in multilabel environments. Hence, the existing proposals exhibit...
Chapter
Full-text available
Medical image classification datasets usually have a limited availability of annotated data, and pathological samples are usually much scarcer than healthy cases. Furthermore, data is often collected from different sources with different acquisition devices and population characteristics, making the trained models highly dependent on the data domai...
Chapter
Full-text available
The study of tracts—bundles of nerve fibers that are organized together and have a similar function—is of major interest in neurology and related areas of science. Tractography is the medical imaging technique that provides the information to estimate these tracts, which is crucial for clinical applications and scientific research. This is a comple...
Article
Full-text available
The use of deep learning makes it possible to achieve extraordinary results in all kinds of tasks related to computer vision. However, this performance is strongly related to the availability of training data and its relationship with the distribution in the eventual application scenario. This question is of vital importance in areas such as roboti...
Article
The large amount of debris in our oceans is a global problem that dramatically impacts marine fauna and flora. While a large number of human-based campaigns have been proposed to tackle this issue, these efforts have been deemed insufficient due to the insurmountable amount of existing litter. In response to that, there exists a high interest in th...
Article
Prototype Generation (PG) methods are typically considered for improving the efficiency of the k-Nearest Neighbour (kNN) classifier when tackling high-size corpora. Such approaches aim at generating a reduced version of the corpus without decreasing the classification performance when compared to the initial set. Despite their large application in...
Article
Full-text available
Optical music recognition (OMR) is the field that studies how to automatically read music notation from score images. One of the relevant steps within the OMR workflow is the staff-region retrieval. This process is a key step because any undetected staff will not be processed by the subsequent steps. This task has previously been addressed as a sup...
Article
Full-text available
Existing research for the assistance of visually impaired people mainly focuses on solving a single task (such as reading a text or detecting an obstacle), hence forcing the user to switch applications to perform other actions. This paper proposes an interactive system for mobile devices controlled by hand gestures that allow the user to control the...
Preprint
Full-text available
Prototype Generation (PG) methods are typically considered for improving the efficiency of the k-Nearest Neighbour (kNN) classifier when tackling high-size corpora. Such approaches aim at generating a reduced version of the corpus without decreasing the classification performance when compared to the initial set. Despite their large application...
Preprint
Full-text available
This paper proposes an interactive system for mobile devices controlled by hand gestures aimed at helping people with visual impairments. This system allows the user to interact with the device by making simple static and dynamic hand gestures. Each gesture triggers a different action in the system, such as object recognition, scene description or...
Preprint
Full-text available
Logo classification is a particular case of image classification, since these may contain only text, images, or a combination of both. In this work, we propose a system for the multi-label classification and similarity search of logo images. The method allows obtaining the most similar logos on the basis of their shape, color, business sector, sema...
Chapter
Recognition methods based on Deep Learning (DL) currently represent the state of the art in a number of robot-related tasks such as computer vision for autonomous guidance or object manipulation. Nevertheless, the large requirements of annotated data constitute one of their main drawbacks, at least when considering supervised frameworks....
Chapter
In recent years, the large amount of debris scattered throughout the ocean has become one of the major pollution problems, causing extinction of species and accelerating the degradation of our planet, among other environmental issues. Since the manual treatment of this waste represents a considerably tedious task, autonomous frameworks are gaining...
Conference Paper
Full-text available
We present an innovative two-headed attention layer that combines geometric and latent features to segment a 3D scene into semantically meaningful subsets. Each head combines local and global information, using either the geometric or latent features, of a neighborhood of points and uses this information to learn better local relationships. This Ge...
Preprint
Full-text available
We present an innovative two-headed attention layer that combines geometric and latent features to segment a 3D scene into semantically meaningful subsets. Each head combines local and global information, using either the geometric or latent features, of a neighborhood of points and uses this information to learn better local relationships. This Ge...
Article
The k-Nearest Neighbor (kNN) algorithm is widely used in the supervised learning field and, particularly, in search and classification tasks, owing to its simplicity, competitive performance, and good statistical properties. However, its inherent inefficiency prevents its use in most modern applications due to the vast amount of data that the curre...
Article
Full-text available
This work proposes a multimodal approach with which to predict the regional Gross Domestic Product (GDP) by combining historical GDP values with the information embodied in Twitter messages concerning the current economic condition. This proposal is of great interest, since it delivers forecasts at higher frequencies than both the official statisti...
Article
Binarization is a well-known image processing task, whose objective is to separate the foreground of an image from the background. One of the many tasks for which it is useful is that of preprocessing document images in order to identify relevant information, such as text or symbols. The wide variety of document types, alphabets, and formats makes...
Conference Paper
Document analysis is a key step within the typical Optical Music Recognition workflow. It processes an input image to obtain its layered version by extracting the different sources of information. Recently, this task has been formulated as a supervised learning problem, specifically by means of Convolutional Neural Networks due to their high perfor...
Preprint
Full-text available
Binarization is a well-known image processing task, whose objective is to separate the foreground of an image from the background. One of the many tasks for which it is useful is that of preprocessing document images in order to identify relevant information, such as text or symbols. The wide variety of document types, typologies, alphabets, and fo...
Article
Full-text available
In the context of supervised statistical learning, it is typically assumed that the training set comes from the same distribution that draws the test samples. When this is not the case, the behavior of the learned model is unpredictable and becomes dependent upon the degree of similarity between the distribution of the training set and the distribu...
Article
Full-text available
Optical Music Recognition (OMR) is the research field focused on the automatic reading of music from scanned images. Its main goal is to encode the content into a digital and structured format with the advantages that this entails. This discipline is traditionally aligned to a workflow whose first step is the document analysis. This step is respons...
Article
Full-text available
The increasing consideration of Convolutional Neural Networks (CNN) has not prevented the use of the k-Nearest Neighbor (kNN) method. In fact, a hybrid CNN-kNN approach is an interesting option in which the network specializes in feature extraction through its activations (Neural Codes), while the kNN has the advantage of performing a retrieval by...
Conference Paper
Full-text available
The paper presents a working pipeline which integrates hardware and software in an automated robotic rose cutter. To the best of our knowledge, this is the first robot able to prune rose bushes in a natural environment. Unlike similar approaches like tree stem cutting, the proposed method does not require scanning the full plant, having multiple camer...
Article
Full-text available
The method proposed in this paper is part of the vision module of a garden robot capable of navigating towards rose bushes and clipping them according to a set of pruning rules. The method is responsible for performing the segmentation of the branches and recovering their morphology in 3D. The obtained reconstruction allows the manipulator of the robot...
Preprint
In the context of supervised statistical learning, it is typically assumed that the training set comes from the same distribution that draws the test samples. When this is not the case, the behavior of the learned model is unpredictable and becomes dependent upon the degree of similarity between the distribution of the training set and the distribu...
Article
Full-text available
Data augmentation has become a standard step to improve the predictive power and robustness of convolutional neural networks by means of the synthetic generation of new samples depicting different deformations. This step has been traditionally considered to improve the network at the training stage. In this work, however, we study the use of data a...
Conference Paper
Full-text available
This work presents a method to predict the regional Gross Domestic Product (GDP) using the textual information stored in tweets. In particular, we propose the use of a hybrid autoencoder to predict the GDP of the Valencian Community (Spain) using the tweets written by the most influential economists, politicians, newspapers, and institutions in the...
Chapter
Full-text available
The existence of a large number of untranscribed music manuscripts has prompted initiatives that use Machine Learning (ML) for Optical Music Recognition, in order to efficiently transcribe the music sources into a machine-readable format. Although most music manuscripts are similar in nature, they inevitably vary from one another. This fact can negat...
Chapter
Full-text available
The classification of logos is a particular case within computer vision since they have their own characteristics. Logos can contain only text, iconic images or a combination of both, and they usually include figurative symbols designed by experts that vary substantially even though they may share the same semantics. This work presents a method for mul...
Article
Full-text available
This work presents a method that can be used for the efficient detection of small maritime objects. The proposed method employs aerial images in the visible spectrum as inputs to train a categorical Convolutional Neural Network for the classification of ships. A subset of those filters that make the greatest contribution to the classification of th...
Article
Full-text available
We present a method to detect maritime oil spills from Side-Looking Airborne Radar (SLAR) sensors mounted on aircraft in order to enable a quick response of emergency services when an oil spill occurs. The proposed approach introduces a new type of neural architecture named Convolutional Long Short Term Memory Selectional AutoEncoders (CMSAE) which...
Chapter
Full-text available
In this paper, we present a method for locating and recognizing hand gestures from images, based on Deep Learning. Our goal is to provide an intuitive and accessible way to interact with Computer Vision-based mobile applications aimed at assisting visually impaired people (e.g. pointing a finger at an object in a real scene to zoom in for a close-up o...
Article
Full-text available
The use of peer assessment for open-ended activities has advantages for both teachers and students. Teachers might reduce the workload of the correction process and students achieve a better understanding of the subject by evaluating the activities of their peers. In order to ease the process, it is advisable to provide the students with a rubric o...
Chapter
Full-text available
In this work, we present a multimodal approach to perform object recognition from photographs taken using smartphones. The proposed method extracts neural codes from the input image using a Convolutional Neural Network (CNN), and combines them with a series of metadata gathered from the smartphone sensors when the picture was taken. These metadata...
Article
Full-text available
In this study, we use unmanned aerial vehicles equipped with multispectral cameras to search for bodies in maritime rescue operations. A series of flights were performed in open‐water scenarios in the northwest of Spain, using a certified aquatic rescue dummy in dangerous areas and real people when the weather conditions allowed it. The multispectr...
Chapter
Full-text available
Data augmentation is a widely considered technique to improve the performance of Convolutional Neural Networks during training. This step consists of synthetically generating new labeled data by perturbing the samples of the training set, which is expected to provide more robustness to the learning process. The problem is that the augmentation proced...
Article
Full-text available
We present a hybrid approach to improve the accuracy of Convolutional Neural Networks (CNN) without retraining the model. The proposed architecture replaces the softmax layer by a k-Nearest Neighbor (kNN) algorithm for inference. Although this is a common technique in transfer learning, we apply it to the same domain for which the network was train...
Article
Full-text available
This work presents a system for the detection of ships and oil spills using Side-Looking Airborne Radar (SLAR) images. The proposed method employs a two-stage architecture composed of three pairs of Convolutional Neural Networks (CNNs). Each pair of networks is trained to recognize a single class (ship, oil spill and coast) by following two steps:...
Article
Full-text available
In this paper we study the learning of graph languages. We extend the well-known classes of k-testability and k-testability in the strict sense languages to directed graph languages. We propose a grammatical inference algorithm to learn the class of directed acyclic k-testable in the strict sense graph languages. The algorithm runs in polynomial ti...
Article
Full-text available
In the education context, open-ended works generally entail a series of benefits, such as the possibility of developing original ideas and a more productive learning process for the student, compared with closed-answer activities. Nevertheless, such works entail a significant correction workload for the teacher in contrast to the latter ones that can be self-co...
Article
Full-text available
The automatic classification of ships from aerial images is a considerable challenge. Previous works have usually applied image processing and computer vision techniques to extract meaningful features from visible spectrum images in order to use them as the input for traditional supervised classifiers. We present a method for determining if an aeri...
Article
Full-text available
In this work, we use deep neural autoencoders to segment oil spills from Side-Looking Airborne Radar (SLAR) imagery. Synthetic Aperture Radar (SAR) has been much exploited for ocean surface monitoring, especially for oil pollution detection, but few approaches in the literature use SLAR. Our sensor consists of two SAR antennas mounted on an aircraf...
Conference Paper
Full-text available
There are large collections of music manuscripts preserved over the centuries. In order to analyze these documents it is necessary to transcribe them into a machine-readable format. This process can be done automatically using Optical Music Recognition (OMR) systems, which typically consider segmentation plus classification workflows. This work is...
Poster
Full-text available
In this work, the main aim is to detect regions that are candidate oil slicks in Side-Looking Airborne Radar (SLAR) images using Deep Learning techniques. The proposed approach is based on Autoencoders to allow us to automatically discriminate oil spills without hand-crafted features or other features extracted from traditional computer vision techniq...
Article
Full-text available
While standing as one of the most widely considered and successful supervised classification algorithms, the k-Nearest Neighbor (kNN) classifier generally exhibits poor efficiency due to being an instance-based method. In this sense, Approximated Similarity Search (ASS) stands as a possible alternative to improve those efficiency issues at the exp...
Article
Full-text available
Staff-line removal is an important preprocessing stage in most Optical Music Recognition systems. The common procedures employed to carry out this task involve image processing techniques. In contrast to these traditional methods, which are based on hand-engineered transformations, the problem can also be approached from a machine learning...
Article
Full-text available
Binarization plays a key role in the automatic information retrieval from document images. This process is usually performed in the first stages of document analysis systems, and serves as a basis for subsequent steps. Hence it has to be robust in order to allow the full analysis workflow to be successful. Several methods for document image binari...
Article
Full-text available
MirBot is a collaborative application for smartphones that allows users to perform object recognition. This app can be used to take a photograph of an object, select the region of interest and obtain the most likely class (dog, chair, etc.) by means of similarity search using features extracted from a convolutional neural network (CNN). The answers...
Conference Paper
Full-text available
Detecting plagiarism in the work submitted by students is a problem that has traditionally existed when assignments were handed in on paper, but it has grown in recent years owing to the large amount of information available on the Internet, the ease of finding it with search engines, and electronic submission...
Conference Paper
Full-text available
This work presents a new spatial verification technique for image similarity search. The proposed algorithm evaluates the geometry of the detected local keypoints by building segments connecting pairs of points and analyzing their intersections in a 2D plane. We show that these intersections remain constant with respect to different geometric trans...
Article
Full-text available
The potential of integrating multiagent systems and virtual environments has not been exploited to its full extent. This paper proposes a model based on grammars, called Minerva, to construct complex virtual environments that integrate the features of agents. A virtual world is described as a set of dynamic and static elements. The static part is...
Conference Paper
The role of the teacher changes when ICT is used: the teacher tends to plan and guide learning situations rather than being a mere transmitter of information as in the past. Having the necessary knowledge of the appropriate tools for carrying out monitoring and control is essential to relieve the load on the teacher...
Conference Paper
Full-text available
This study presents a multimodal interactive image retrieval system for smartphones (MirBot). The application is designed as a collaborative game where users can categorize photographs according to the WordNet hierarchy. After taking a picture, the region of interest of the target can be selected, and the image information is sent with a set of met...
Conference Paper
Full-text available
In this paper, we tackle the task of graph language learning. We first extend the well-known classes of k-testability and k-testability in the strict sense languages to directed graph languages. Second, we propose a graph automata model for directed acyclic graph languages. This graph automata model is used to propose a grammatical inference algor...
Article
Full-text available
Virtual Worlds Generator is a grammatical model that is proposed to define virtual worlds. It integrates the diversity of sensors and interaction devices, multimodality and a virtual simulation system. Its grammar allows the scenes of the virtual world to be defined and abstracted as strings of symbols, independently of the hardware that is used...
Chapter
Full-text available
We present three new algorithms to model images with graph primitives. Our main goal is to propose algorithms that could lead to a broader use of graphs, especially in pattern recognition tasks. The first method considers the q-tree representation and the neighbourhood of regions. We also propose a method which, given any region of a q-tree, finds...
Chapter
Full-text available
Virtual Worlds Generator is a grammatical model that is proposed to define virtual worlds. It integrates the diversity of sensors and interaction devices, multimodality and a virtual simulation system. Its grammar allows the scenes of the virtual world to be defined and abstracted as strings of symbols, independently of the hardware that is used...
Conference Paper
Full-text available
A formal grammar-based model is presented to integrate the essential characteristics of a Multi-Agent System with the visualization given by an Interactive Graphic System. This model adds several advantages, such as the separation between the implementation of the system activity and the hardware devices, or the easy reusability of components. To...
Conference Paper
Full-text available
This article proposes a new method for robust and accurate detection of the orientation and the location of an object on low-contrast surfaces in an industrial context. To be more efficient and effective, our method employs only artificial vision. Therefore, productivity is increased since it avoids the use of additional mechanical devices to ensur...
Conference Paper
Full-text available
We present the experience gained in a Project-Based Learning teaching programme comprising 4 courses of the Computer Engineering degrees. The aim was to develop a joint project that would allow the new model to be tried out while covering the objectives of each course separately. We show the teaching changes...
Conference Paper
Full-text available
In this paper a new method for reconstructing 3D scenes from stereo images is presented, as well as an algorithm for environment mapping, as an application of the previous method. In the reconstruction process a geometrical rectification filter is used to remove the conical perspective of the images. It is essential to recover the geometry of the s...
Article
Full-text available
A segmentation-based stereoscopic vision system is presented that takes advantage of the information obtained and the strengths of this type of system to detect objects in the scene and estimate their depth. The chosen segmentation process, adaptive thresholding, yields good results with a computation time that is very...
Conference Paper
The reconstruction and mapping of real scenes is a crucial element in several fields such as robot navigation. Stereo vision can be a powerful solution. However, the perspective effect arises, as well as other problems, when the reconstruction is tackled using depth maps obtained from stereo images. A new approach is proposed to avoid the perspectiv...
Article
Full-text available
This work focuses on the process of customizing and improving the Moodle learning management system (LMS) in order to adapt it to the requirements of two courses of the Computer Engineering degree. In tackling this process, certain features were identified that the platform does not yet offer and that would be desirable...
Conference Paper
Full-text available
In this article we present an experience carried out in a teaching programme that uses Project-Based Learning (PBL) to teach four courses of the Computer Engineering degree jointly. To this end, the development of a video game was first proposed as the joint project. The choice of this theme...
Conference Paper
Full-text available
EagleEye is a working environment aimed at teaching and research that simplifies the process of implementing and testing a project. To facilitate overall development, the system is based on the visual definition of a processing graph that describes the data flow of the implemented process. To each of the nodes...
Conference Paper
In this paper we present a method for mapping 3D unknown environments from stereo images. It is based on a dense disparity image obtained by a process of window correlation. To each image in the sequence a geometrical rectification process is applied, which is essential to remove the conical perspective of the images obtained with a photographic ca...
Article
A system to reconstruct three-dimensional scenes from stereo images is presented. It is based on a dense disparity image obtained by a process of window correlation. Starting from these images and after the application of a geometrical rectification, the 3D reconstruction of the scene is obtained. The geometrical rectification is essential to corre...
Article
Full-text available
A system to reconstruct three-dimensional scenes from stereo images is presented. The reconstruction is based on a dense disparity image obtained by a process of window correlation, applying a geometrical rectification before generating a three-dimensional matrix which stores the spatial occupation. The geometrical rectification is essential to cor...