Context in source publication

Context 1
... this purpose, F_S(s) was normalized to the range [0, 1]. Then, for each sentence s, Γ = 11 + 1 = 12 metrics were used as inputs to the Decision Algorithm, as shown in Figure 3. ...
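The normalisation step described above can be sketched as follows; the function and variable names are illustrative, not taken from the paper:

```python
# Hypothetical sketch: min-max normalisation of a raw sentence score F_S(s)
# to [0, 1], and assembly of the Gamma = 11 + 1 = 12 metric inputs per sentence.

def min_max_normalize(scores):
    """Scale a list of raw scores linearly to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                       # degenerate case: all scores equal
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

raw_fs = [2.5, 4.0, 1.0, 3.5]          # illustrative F_S values, one per sentence
fs_norm = min_max_normalize(raw_fs)

def build_inputs(other_metrics, fs):
    """Combine 11 pre-existing metrics with the normalised F_S score
    into the 12-dimensional input vector of the decision algorithm."""
    assert len(other_metrics) == 11
    return other_metrics + [fs]
```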

Citations

... Initiatives such as the Linguistic Linked Open Data cloud 8 (henceforward LLOD) are focused on collecting and publishing language resources in Semantic Web formats according to the Linked Data principles [7]. When developing NLP services, one of the main challenges is finding language resources on a certain subject area that are of acceptable quality and ready to be reused, as revealed, for example, in previous experiments on summarisation or machine translation enhanced with terminological resources [54,60] [5]. Consequently, our motivating scenario is focused on assisting users with different backgrounds and expertise levels in addressing language-related needs (see Fig. 1). ...
Article
Full-text available
Domain-specific terminologies play a central role in many language technology solutions. Substantial manual effort is still involved in the creation of such resources, and many of them are published in proprietary formats that cannot be easily reused in other applications. Automatic term extraction tools help alleviate this cumbersome task. However, their results usually take the form of plain lists of terms, or of unstructured data with limited linguistic information. Initiatives such as the Linguistic Linked Open Data cloud (LLOD) foster the publication of language resources in open structured formats, specifically RDF, and their linking to other resources on the Web of Data. In order to leverage the wealth of linguistic data in the LLOD and speed up the creation of linked terminological resources, we propose TermitUp, a service that generates enriched domain-specific terminologies directly from corpora and publishes them in open and structured formats. TermitUp is composed of five modules performing terminology extraction, terminology post-processing, terminology enrichment, term relation validation and RDF publication. As part of the pipeline implemented by this service, existing resources in the LLOD are linked with the resulting terminologies, contributing in this way to the population of the LLOD cloud. TermitUp has been used in the framework of European projects tackling different fields, such as the legal domain, with promising results. Different alternatives on how to model enriched terminologies are considered, and good practices illustrated with examples are proposed.
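As a rough illustration of the five-stage pipeline named in the abstract, the sketch below chains placeholder implementations of each module; every function body here is a hypothetical stand-in, not TermitUp's actual logic:

```python
# Hypothetical sketch of a TermitUp-style five-stage pipeline; the stage
# names follow the abstract, but the implementations are toy placeholders.

def extract_terms(corpus):
    # Stage 1: terminology extraction (placeholder: long words as candidates).
    return sorted({w.lower() for text in corpus for w in text.split() if len(w) > 6})

def post_process(terms):
    # Stage 2: terminology post-processing (placeholder: drop non-alphabetic candidates).
    return [t for t in terms if t.isalpha()]

def enrich(terms):
    # Stage 3: terminology enrichment (placeholder: empty metadata slots).
    return {t: {"definitions": [], "related": []} for t in terms}

def validate_relations(enriched):
    # Stage 4: term relation validation (placeholder: accept everything).
    return enriched

def publish_rdf(enriched):
    # Stage 5: RDF publication (placeholder: serialise as simple triples).
    return [(t, "rdf:type", "ontolex:LexicalEntry") for t in enriched]

def termitup_like(corpus):
    return publish_rdf(validate_relations(enrich(post_process(extract_terms(corpus)))))
```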
... There are several works in the literature that perform term extraction [6] [7] [8] [9]. Regarding the medical domain, according to Castro and colleagues [10], for English there are several research efforts oriented towards processing texts and data from that domain [11] [12]; however, few initiatives are found for Spanish [13] [14] [15] [16] [17]. ...
... Vivaldi et al. [16] use term extraction in a new algorithm for the automatic summarization of medical texts in Spanish. For the extraction, the authors use YATE [24], the first tool used to identify medical terms from articles. ...
... There are several works in the literature that perform term extraction [6,7,8,9]. Regarding the medical domain, according to Castro and colleagues [10], for English there are several research efforts oriented towards processing texts and data from that domain [11,12]; however, few initiatives are found for Spanish [13,14,15,16,17]. As a contribution, the present work describes two term extractions performed from the identification of noun phrases in a corpus of medical texts in Spanish. ...
... These systems are useful in several domains: medical (e.g., Johnson et al., 2002; Afantenos et al., 2005; da Cunha et al., 2007; Vivaldi et al., 2010), legal (e.g., Farzindar et al., 2004), journalistic (e.g., ...
Conference Paper
Full-text available
In this paper we present REG, a graph approach to study a fundamental problem of Natural Language Processing: the automatic summarization of documents. The algorithm models a document as a graph to obtain weighted sentences. We applied this approach to the INEX@QA 2010 task (question answering). To do so, we extracted the terms from the queries in order to obtain a list of terms related to the main topic of the question. Using this strategy, REG obtained good results with the automatic evaluation system FRESA.
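A minimal sketch of the general idea of graph-based sentence weighting (not the actual REG algorithm, whose graph construction and edge weighting differ) might look like:

```python
# Illustrative graph-based sentence weighting: sentences are nodes, edge
# weights count shared words, and each sentence is scored by the sum of
# its edge weights. A summary keeps the highest-scoring sentences.

def sentence_graph_scores(sentences):
    bags = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    scores = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            w = len(bags[i] & bags[j])   # shared-word edge weight
            scores[i] += w
            scores[j] += w
    return scores

def top_sentences(sentences, k=1):
    scores = sentence_graph_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in ranked[:k]]
```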
... – SUMMTERM [21], a terminology-based summarizer that is used for summarization of medical articles and uses specialized terminology for scoring and ranking sentences; ...
Article
Full-text available
We study a new content-based method for the evaluation of text summarization systems without human models, which is used to produce system rankings. The research is carried out using a new content-based evaluation framework called FRESA to compute a variety of divergences among probability distributions. We apply our comparison framework to various well-established content-based evaluation measures in text summarization, such as COVERAGE, RESPONSIVENESS, PYRAMIDS and ROUGE, studying their associations in various text summarization tasks including generic multi-document summarization in English and French, focus-based multi-document summarization in English, and generic single-document summarization in French and Spanish.
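The kind of distributional comparison described above can be illustrated with a Jensen-Shannon divergence between the word distributions of a source text and its summary; the smoothing and vocabulary handling below are simplified assumptions, not FRESA's exact procedure:

```python
import math
from collections import Counter

# Illustrative content-based comparison: score a summary by the
# Jensen-Shannon divergence between its word distribution and that of the
# source (one of several divergences such a framework can use).

def word_dist(text, vocab):
    """Add-one-smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] + 1 for w in vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def js_divergence(p, q, vocab):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in vocab)
    m = {w: 0.5 * (p[w] + q[w]) for w in vocab}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

source = "the treatment reduced symptoms in most patients"
summary = "treatment reduced symptoms"
vocab = set(source.lower().split()) | set(summary.lower().split())
score = js_divergence(word_dist(source, vocab), word_dist(summary, vocab), vocab)
# lower divergence = the summary's distribution is closer to the source's
```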
... Although there are works on automatic discourse analysis for Portuguese based on RST (Leal, Quaresma, and Chishman, 2006), sentence compression with this strategy was carried out manually, since at present no complete discourse parser exists for Spanish that can detect nuclei and satellites. However, there is an ongoing project on this topic (da Cunha et al., 2007b; da Cunha et al., 2010), so as soon as this discourse parser is operational, we will be able to carry out this type of compression automatically. Figure 1 shows a discourse tree with RST relations, which includes one multinuclear List relation and two nucleus-satellite relations, Concession and Elaboration. ...
Article
Full-text available
Abstract The aim of this research is to confirm whether sentence compression is an adequate resource for optimizing automatic document summarization systems. To this end, we first created a corpus of summaries of specialized documents (medical articles) produced by several automatic summarization systems. We then performed two types of compression on these summaries. On the one hand, we carried out manual compression, following two strategies: compression by intuitively removing certain sentence elements, and compression by removing certain discourse elements within the framework of Rhetorical Structure Theory (RST). On the other hand, we performed automatic compression by means of several strategies based on removing words of certain grammatical categories (adjectives and adverbs), plus a baseline that removes words at random. Finally, we compared the original summaries with the compressed summaries using the Rouge evaluation system. The results show that, under certain conditions, sentence compression can be beneficial for improving automatic document summarization.
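The two automatic compression strategies mentioned in the abstract (deletion by grammatical category and random deletion) can be sketched as follows; the toy tag lookup is a hypothetical stand-in for a real POS tagger:

```python
import random

# Sketch of two automatic sentence-compression strategies:
# (a) delete words of given POS categories (adjectives/adverbs), here with a
#     tiny hand-made tag lookup in place of a real POS tagger, and
# (b) a baseline that deletes words at random.

TOY_TAGS = {"quickly": "ADV", "red": "ADJ", "very": "ADV", "large": "ADJ"}

def compress_by_pos(sentence, drop=("ADJ", "ADV")):
    """Remove words whose (toy) POS tag is in `drop`."""
    return " ".join(w for w in sentence.split()
                    if TOY_TAGS.get(w.lower()) not in drop)

def compress_random(sentence, rate=0.2, seed=0):
    """Baseline: delete each word independently with probability `rate`."""
    rng = random.Random(seed)
    return " ".join(w for w in sentence.split() if rng.random() >= rate)
```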
... For experimentation on the TAC and DUC datasets we directly use the peer summaries produced by systems participating in the evaluations. For experimentation in Spanish and French (single-document summarization) we have created summaries at the compression rates of the model summaries using the following summarization systems:
- CORTEX (Torres-Moreno et al., 2002), a single-document sentence extraction system for Spanish and French that combines various statistical measures of relevance (angle between sentence and topic, various Hamming weights for sentences, etc.) and applies an optimal decision algorithm for sentence selection;
- ENERTEX (Fernandez et al., 2007), a summarizer based on a theory of textual energy;
- SUMMTERM (Vivaldi et al., 2010), a terminology-based summarizer that is used for summarization of medical articles and uses specialized terminology for scoring and ranking sentences;
- the Copernic summarizer. ...
Conference Paper
Full-text available
We study correlation of rankings of text summarization systems using evaluation methods with and without human models. We apply our comparison framework to various well-established content-based evaluation measures in text summarization such as Coverage, Responsiveness, Pyramids and Rouge, studying their associations in various text summarization tasks including generic and focus-based multi-document summarization in English and generic single-document summarization in French and Spanish. The research is carried out using a new content-based evaluation framework called Fresa to compute a variety of divergences among probability distributions.
... domain [7,8]; however, few initiatives are found for Spanish [9,10,11,12,13]. As a contribution, the detection method presented here is based on linguistic information, insofar as it has been observed that the analysis, and subsequent modelling and machine implementation, of certain specific syntactic expressions can assist term extraction tasks. ...
Article
Full-text available
Abstract Starting from the hypothesis that the nominal syntagmatic enumerations (ESN) found in medical texts are composed of domain-specific terms, we present a method for recognizing such enumerations with the aim of contributing to automatic term extraction. The methodology consists of three stages: (i) recognition of nominal syntagmatic enumerations, using exclusively linguistic information, from which syntactic analysis rules are built; (ii) automatic extraction of the term candidates corresponding to unigrams and bigrams; and (iii) evaluation of the extracted candidates with the advice of experts from the medical field. The experiments were carried out on the IULA corpus, composed of medical texts in Spanish. The results obtained were encouraging: 67% and 68% precision was achieved for the enumerations detected for unigrams and bigrams, respectively. Keywords Medical Term Extraction, Nominal Syntagmatic Enumerations, Medical Domain.
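The enumeration-based extraction idea can be illustrated with a simple pattern-based sketch; the actual method relies on handcrafted syntactic rules rather than this regex, and the precision function mirrors the expert-validation step:

```python
import re

# Illustrative sketch: detect a comma-separated enumeration ("A, B and C")
# with a simple pattern, collect the conjuncts as term candidates, and
# compute precision against a set of expert-validated terms.

ENUM = re.compile(r"(\w[\w ]*?)(?:, (\w[\w ]*?))+ (?:and|y) (\w[\w ]*)")

def enumeration_candidates(sentence):
    """Return the conjuncts of the first enumeration found, if any."""
    m = ENUM.search(sentence)
    if not m:
        return []
    parts = re.split(r", | and | y ", m.group(0))
    return [p.strip() for p in parts]

def precision(candidates, validated):
    """Fraction of extracted candidates confirmed by the experts."""
    if not candidates:
        return 0.0
    return sum(c in validated for c in candidates) / len(candidates)
```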
Conference Paper
In this paper we use REG, a graph-based system, to study a fundamental problem of Natural Language Processing: the automatic summarization of documents. The algorithm models a document as a graph to obtain weighted sentences. We applied this approach to the INEX@QA 2011 task (question answering). We extracted the title and some key or related words (selected by two people) from the queries, in order to retrieve 50 documents from the English Wikipedia. Using this strategy, REG obtained good results with the automatic evaluation system FRESA.
Chapter
Full-text available
With the diffusion of online newspapers and social media, users are becoming capable of retrieving dozens of news articles covering the same topic in a short time. News article summarization is the task of automatically selecting a worthwhile subset of news sentences that users could easily explore. Promising research directions in this field are the use of semantics-based models (e.g., ontologies and taxonomies) to identify key document topics and the integration of social data analysis to also consider the current user's interests during summary generation. The chapter overviews the most recent research advances in document summarization and presents a novel strategy to combine ontology-based and social knowledge for addressing the problem of generic (not query-based) multi-document summarization of news articles. To identify the most salient sentences of the news articles, an ontology-based text analysis is performed during the summarization process. Furthermore, the social content acquired from real Twitter messages is separately analyzed to also consider the current interests of social network users for sentence evaluation. The combination of ontological and social knowledge allows the generation of accurate and easy-to-read news summaries. Moreover, the proposed summarizer performs better than the evaluated competitors on real news articles and Twitter messages.