Yoan Gutiérrez
University of Alicante | UA · Department of Software and Computing Systems

Professor

About

91
Publications
13,016
Reads
446
Citations
Introduction
Yoan Gutiérrez currently works at the Department of Software and Computing Systems, University of Alicante. Yoan does research in Artificial Intelligence and Data Mining, and related projects.

Publications (91)
Article
Full-text available
Large language models have shown impressive performance in Natural Language Processing tasks, but their black-box nature makes it difficult both to explain the model's decisions and to integrate semantic knowledge. There has been a growing interest in combining external knowledge sources with language models to address...
Article
The spread of fake news (FN) has attracted attention from disciplines ranging from social sciences to Artificial Intelligence. This work is novel because it explores the news-sharing behaviour of social-media users, focussing on those that spread FN, rather than the psychological motivations behind them. The 14-item Risky News-Sharing Quotient (RNS...
Article
Full-text available
The role played by science and technology parks (STPs) in technology transfer, industrial innovation, and economic growth is examined in this paper. The accurate monitoring of their evolution and impact is hindered by the lack of uniformity in STP models or goals, and the scarcity of high-quality datasets. This work uses existing terminologies, def...
Article
Skipgram modelling is a technique for generating multi-word terms that preserves part of the sequentiality and flexibility of language. However, in some cases the number of skipgrams generated can become excessive as the distance between words increases. Moreover, this distance is not usually taken into account when...
Article
Automatic Machine Learning (Auto-ML) tools enable the automatic solution of real-world problems through machine learning techniques. These tools tend to be more time-consuming than standard machine learning libraries, so making full use of all available resources is a valuable feature. This paper presents a two-phase optimization sy...
Article
Discovering the main features of virality patterns in Twitter is the focus of this research. Five trending topics related to the COVID-19 pandemic were selected for the study, with Spanish as the target language. To carry out the discovery of virality patterns, we applied opinion mining techniques that enable us to structure the information based o...
Article
Full-text available
The framework of the present study was the destination life cycle model, a classical model that describes the development of tourist destinations. We examined mass tourism in Benidorm based on tourist accommodation supply and demand statistics over the January 2016 to October 2018 period, provided by Spain's National Institute for Statistics. The o...
Article
Full-text available
Apart from the economic impact of the online gambling industry, the social, public order and health-related consequences of the industry merit analysis to inform appropriate action, regulatory or otherwise. The omnipresence of ICTs, the inability to use technologies properly, along with the growth of online gambling channels, have acted simultaneou...
Article
Corpora are one of the most valuable resources at present for building machine learning systems. However, building new corpora is an expensive task, which makes the automatic extension of corpora a highly attractive task to develop. Hence, finding new strategies that reduce the cost and effort involved in this task, while at the same time guarantee...
Research
Webpage: https://knowledge-learning.github.io/ehealthkd-2020/ Datasets and tools: https://github.com/knowledge-learning/ehealthkd-2020/tree/master/data Held as part of the evaluation forum IberLEF in the XXXVI edition of the International Conference of the Spanish Society for Natural Language Processing (SEPLN 2020). September 22-25, 2020. Málaga,...
Article
This paper introduces Hierarchical Machine Learning Optimisation (HML-Opt), an AutoML framework that is based on probabilistic grammatical evolution. HML-Opt has been designed to provide a flexible framework where a researcher can define the space of possible pipelines to solve a specific machine learning problem, which can range from high-level de...
Article
The massive amount of biomedical information published online requires the development of automatic knowledge discovery technologies to effectively make use of this available content. To foster and support this, the research community creates linguistic resources, such as annotated corpora, and designs shared evaluation campaigns and academic compe...
Chapter
This research presents NLP-Opt, an Auto-ML technique for optimizing pipelines of machine learning algorithms that can be applied to different Natural Language Processing tasks. The process of selecting the algorithms and their parameters is modelled as an optimization problem and a technique was proposed to find an optimal combination based on the...
Article
The abundance of digital media information coming from different sources completely redefines approaches to media content production, management and distribution for all contexts (i.e. technical, business and operational). Such content includes descriptive information (i.e. metadata) about an asset (e.g. a movie, song or game), as well as playable...
Article
SAM is a social media platform that enhances the experience of watching video content in a conventional living room setting, with a service that lets the viewer use a second screen (such as a smart phone) to interact with content, context and communities related to the main video content. This article describes three key functionalities used in the...
Article
This project is motivated by the need to understand the digital social ecosystem of universities present on social networks. More specifically, the focus is the University of Alicante. To that end, Human Language Technologies (HLT) play a fundamental role. HLT are employed not only to extract meta-data from comments from social networks,...
Article
Social-Univ 2.0 is a web application that monitors the university digital ecosystem in social networks and, specifically, it focuses on analysing the University of Alicante environment. It uses human language technologies to extract meta-data from tweets and represents actors' profiles, as well as their social relations in a time period. This appli...
Article
This paper presents and describes the eHealth-KD corpus. The corpus is a collection of 1173 Spanish health-related sentences manually annotated with a general semantic structure that captures most of the content, without resorting to domain-specific labels. The semantic representation is first defined and illustrated with example sentences from the cor...
Conference Paper
In this paper we present our contribution to the TREC 2018 Real-Time Summarization track. This task contains two scenarios: push notifications and email digest. We participated in both, submitting three runs for each one. Our main goal was to evaluate the effectiveness of the techniques employed in Social Analytics, a reputation analysis platform, wh...
Article
Purpose The purpose of this paper is to propose a mathematical model to determine invariant sets, set covering, orbits and, in particular, attractors in the set of tourism variables. Analysis was carried out based on an algorithm and applying an interpretation of chaos theory developed in the context of General Systems Theory and Big Data. Design...
Article
Full-text available
In recent years, the great popularity of mobile technologies has led most users to become consumers and producers of information on the network. Many studies describe this phenomenon as an activity capable of doubling or tripling existing content on an annual basis. The huge amount of information makes the current user oriented sy...
Article
Ontologies are appropriate structures for capturing and representing the knowledge about a domain or task. However, the design and further population of them are both difficult tasks, normally addressed in a manual or in a semi-automatic manner. The goal of this article is to define and extend a task-oriented ontology schema that semantically repre...
Conference Paper
The recent failures of traditional poll models, such as the predictions for Brexit in the United Kingdom or for the United States presidential election won by Donald Trump, have been noteworthy. With the decline of traditional poll models and the growth of social networks, automatic tools are gaining popularity to make predictions in thi...
Conference Paper
Full-text available
Nowadays, searching for documents on the Internet is becoming increasingly difficult. The reason is the amount of content published by users (articles, comments, blogs, reviews). How can users be helped to find the documents they need? What would be needed to provide useful document meta-data to support search engines? In this articl...
Article
This paper presents an unsupervised approach to solve semantic ambiguity based on the integration of the Personalized PageRank algorithm with word-sense frequency information. Natural Language tasks such as Machine Translation or Recommender Systems are likely to be enriched by our approach, which includes semantic information that obtains the appr...
Article
Full-text available
Web 2.0 has shifted the importance of information away from a few experts on a topic and towards a multitude of opinions expressed by users through various channels on social networks. As a result, there is growing interest in systems capable of determining what users think about a given...
Article
Full-text available
The writing style used in social media usually contains informal elements that can lower the performance of Natural Language Processing applications. For this reason, text normalisation techniques have drawn a lot of attention recently when dealing with informal content. However, not all the texts present the same level of informality and may not r...
Conference Paper
Full-text available
This document describes the Entity Linking system known as REDES, which took part in the Entity Discovery and Linking (EDL) track of the TAC Knowledge Base Population (KBP) 2016 challenge. The system is the result of collaboration among different research projects, in particular SAM, from which specific modules were r...
Article
Full-text available
Mobile devices have significantly changed the way users access the information available on the Internet. These devices allow instant access anytime and anywhere, but they have a number of important limitations with respect to personal computers. The limited screen space and, sometimes, the limited capacity to receive the information, make the selectio...
Article
Nowadays, the vast amount of heterogeneous information available on the Internet makes it difficult for users to find the information they require, since this is a non-trivial task. In this respect, Human Language Technologies (HLT) tools offer great support for this task, being able to provide the specific information requested by...
Article
In this work we present a semantic framework suitable for use as a support tool for recommender systems. Our purpose is to use the semantic information provided by a set of integrated resources to enrich texts by conducting different NLP tasks: WSD, domain classification, semantic similarities and sentiment analysis. After obtaining th...
Conference Paper
Full-text available
Today's generation of Internet-connected devices has changed the way users are interacting with media, exchanging their role from passive and unidirectional to proactive and interactive. Under this new role, users are able to comment or rate a TV show and search for information regarding characters, facts, multimedia content or any other related ma...
Conference Paper
This paper presents an approach to entity linking in the domain of Social TV on two different knowledge bases: Wikipedia and our own ontology of media assets. We provide insights into the main challenges posed by this task, together with a description of different tools and related projects in the field. Since the system described is part of a plat...
Conference Paper
This paper presents an opinion mining approach in the domain of Social TV using two different contexts: Twitter user messages for Spanish and English, as well as movie reviews. The main goal of this paper is to study the benefits of opinion mining approaches using ranking skip-gram techniques for processing user feedback. To carry out this study i...
Article
The great amount of information available online makes it increasingly difficult for users to assimilate such a volume of information, which is almost inconceivable without Human Language Technologies (HLT) tools, for instance, information retrieval systems or automatic summarisers. The interest of this emerging project (a...
Article
Full-text available
Today's generation of Internet devices has changed how users are interacting with media, from passive and unidirectional users to proactive and interactive. Users can use these devices to comment or rate a TV show and search for related information regarding characters, facts or personalities. This phenomenon is known as second screen. This paper d...
Article
Full-text available
ElectionMap is a web application that follows, on Twitter, previously established entities related to politics. Users' opinions about the entities are classified according to their valuation using sentiment analysis processes. The opinions are then represented on a geographic map that shows the social acceptance of Spani...
Article
Social Rankings is a web application that follows different entities in the social networks in real time. It detects and analyses the opinions about these entities using sentiment analysis techniques, to generate a visual report of their reputation and evolution in time. © 2015 Sociedad Española para el Procesamiento del Lenguaje Natural.
Article
Full-text available
In this paper, we present a combination of different types of sentiment analysis approaches in order to improve on their individual performance. These consist of (I) ranking algorithms for scoring sentiment features such as bi-grams and skip-grams extracted from annotated corpora; (II) a polarity classifier based on a deep learning algorithm; an...
Article
Full-text available
This project is focused on intelligent information processing using different sources such as micro-blogs, blogs, forums, specialized websites, etc. The goal is to obtain new knowledge using semantic information. As a result we can determine user requirements or improve organizations' reputation. This paper describes the problems faced, working hypo...
Conference Paper
Full-text available
In this paper we describe the system submitted for the SemEval 2014 Task 9 (Sentiment Analysis in Twitter) Subtask B. Our contribution consists of a supervised approach using machine learning techniques, which uses the terms in the dataset as features. In this work we do not employ any external knowledge and resources. The novelty of our approach l...
Conference Paper
Full-text available
This work introduces a new approach for the aspect-based sentiment analysis task. Its main purpose is to automatically assign the correct polarity for the aspect term in a phrase. It is a probabilistic automaton where each state consists of all the nouns, adjectives, verbs and adverbs found in an annotated corpus. Each one of them contains the number o...
Conference Paper
Full-text available
In this paper, we present our contribution to Task 1 (6-level polarity classification) of the TASS 2013 competition. This contribution consists of two different approaches: a modified version of a ranking algorithm (RA-SR) using bigrams, and a new proposal using a skipgrams scorer. These approaches create sentiment lexicons able to retain the c...
Article
Full-text available
In this work, we present a sentence-level subjectivity detection method, which relies on Subjectivity Word Sense Disambiguation (SWSD). We use an unsupervised sense clustering-based method for SWSD. In our method, semantic resources tagged with emotions and sentiment polarities are used to apply subjectivity detection, intervening Word Sense Disamb...
Conference Paper
Full-text available
This paper describes the specifications and results of UMCC_DLSI system, which participated in the Semantic Textual Similarity task (STS) of SemEval-2013. Our supervised system uses different types of lexical and semantic features to train a Bagging classifier used to decide the correct option. Related to the different features we can highlight t...
Conference Paper
Full-text available
In this paper, we describe the development and performance of the supervised system UMCC_DLSI-(SA). This system uses corpora where phrases are annotated as Positive, Negative, Objective, and Neutral, to achieve new sentiment resources involving word dictionaries with their associated polarity. As a result, new sentiment inventories are obtained and...
Conference Paper
Full-text available
This paper describes the specifications and results of UMCC_DLSI-(EPS) system, which participated in the first Evaluating Phrasal Semantics of SemEval-2013. Our supervised system uses different kinds of semantic features to train a bagging classifier used to select the correct similarity option. Related to the different features we can highlight th...
Conference Paper
Full-text available
This work introduces a new unsupervised approach to multilingual word sense disambiguation. Its main purpose is to automatically choose the intended sense (meaning) of a word in a particular context for different languages. It does so by selecting the correct Babel synset for the word and the various Wiki Page titles that mention the word. BabelNet...
Conference Paper
Full-text available
In this paper we describe UMCC_DLSI-(DDI) system which attempts to detect and classify drug entities in biomedical texts. We discuss the use of semantic class and words relevant domain, extracted with ISR-WN (Integration of Semantic Resources based on WordNet) resource to obtain our goal. Following this approach our system obtained an F-Measure of...
Article
Full-text available
Word Sense Disambiguation extracting features from ISR-WN internal relations. Abstract: To solve the disambiguation problem, we propose a computational procedure that detects features present among...
Conference Paper
Full-text available
Abstract: The Semantic Information Extraction method for ontologies proposes inference over ontologies created in RDF format, through a set of transformations and the identification of each term according to its context. As a result, a content graph is formed as the final model. Once the reading process is finished...
Conference Paper
Full-text available
We present a study of the influence of sentiment polarity (positive, negative and neutral) on Textual Entailment Recognition. The main idea of this paper is to identify the behavior of the sentiment polarity (obtained by a method of sentiment polarity classification based on the construction of Relevant Polarity Trees) on the Recognit...
Conference Paper
Full-text available
This paper describes the specifications and results of UMCC_DLSI system, which participated in the first Semantic Textual Similarity task (STS) of SemEval-2012. Our supervised system uses different kinds of semantic and lexical features to train classifiers and it uses a voting process to select the correct option. Related to the different features...
Conference Paper
Full-text available
In this paper we propose a new graph-based approach to solve semantic ambiguity using a semantic net based on WordNet. Our proposal uses an adaptation of the Clique Partitioning Technique to extract sets of strongly related senses. For that, an initial graph is obtained from senses of WordNet combined with the information of several semantic catego...
Conference Paper
Full-text available
This paper presents a new approach to solve semantic ambiguity using an adaptation of the Cliques Partitioning Technique to N distance. This new approach is able to identify sets of strongly related senses using a multidimensional graph based on different resources: WordNet Domains, SUMO and WordNet Affects. As a result, each Clique will contain re...
Conference Paper
Full-text available
We evaluate the effectiveness of using our edit distance algorithm to improve an unsupervised language-independent stemming method. The main idea is to create morphological families through the automatic grouping of words using our distance. Based on that grouping, we perform stemming. The capacity of the edit distance algorithm in the task...
Article
Abstract: In this paper we present the enrichment of the Integration of Semantic Resources based on WordNet (ISR-WN Enriched). This new proposal improves on the previous one, in which several semantic resources such as SUMO, WordNet Domains and WordNet Affects were related, by adding other semantic resources such as Semantic Classes and SentiWordNet. Firstly...
Conference Paper
Full-text available
In this paper, we concentrate on three of the tracks proposed in the NTCIR 8 MOAT, concerning the classification of sentences according to their opinionatedness, relevance and polarity. We propose a method for the detection of opinions, relevance, and polarity classification, based on ISR-WN (a resource for the multidimensional analysis with Releva...
Conference Paper
Full-text available
Introduction. Nowadays, many Spanish universities have been obliged to adapt their curricula to the Bologna requirements, and the Cuban education system is likewise immersed in a renewal of its curricula. If all the information were in a database, it could be exploited to detect all...
Conference Paper
Full-text available
In this paper we concentrate on the resolution of the semantic ambiguity that arises when a given word has several meanings. This specific task is commonly referred to as Word Sense Disambiguation (WSD). We propose a method that obtains the appropriate senses from a multidimensional analysis (using Relevant Semantic Trees). Our method uses differen...
Conference Paper
Full-text available
This paper describes the UMCC-DLSI system in SemEval-2010 task number 17 (All-words Word Sense Disambiguation on Specific Domain). The main purpose of this work is to evaluate and compare our computational resource of WordNet's mappings using 3 different methods: Relevant Semantic Tree, Relevant Semantic Tree 2 and an Adaptation of k-clique's Techn...
Article
Full-text available
Abstract: This article presents a tool for integrating different resources based on the structure and internal relations of WordNet, using graph techniques. The goal is to centralise in a single tool the access to and handling of interrelations between different resources such as WordNet Domains, WordNet Affect and SUMO. As a resul...
Article
Full-text available
The tourism sector in Cuba is in excellent health, thanks to the boost received on several fronts and to customer demand, which makes it essential to improve the operation of its services; one way of doing so is the computerisation of its processes. The article presents a system for booking services...
Article
Full-text available
S.A. receives the revenue of the tourism sector, its facilities and dependencies. It currently has more than 600 Cuban and foreign clients and is authorised, among other operations, to issue payment instruments in favour of third parties and to process collections and payments for tourism entities. The manual preparation of the payment documents that...