Context in source publication

Context 1
... this purpose, F_S(s) was normalized to the range [0, 1]. Then, for each sentence s, Γ = 11 + 1 = 12 metrics were used as inputs to the Decision Algorithm, as shown in Figure 3. ...
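The normalisation step described above can be sketched as follows; the function and variable names are illustrative, not taken from the paper:

```python
# Hypothetical sketch: min-max normalisation of a raw sentence score F_S(s)
# to [0, 1], and assembly of the Gamma = 11 + 1 = 12 metric inputs per sentence.

def min_max_normalize(scores):
    """Scale a list of raw scores linearly to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                       # degenerate case: all scores equal
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

raw_fs = [2.5, 4.0, 1.0, 3.5]          # illustrative F_S values, one per sentence
fs_norm = min_max_normalize(raw_fs)

def build_inputs(other_metrics, fs):
    """Combine 11 pre-existing metrics with the normalised F_S score
    into the 12-dimensional input vector of the decision algorithm."""
    assert len(other_metrics) == 11
    return other_metrics + [fs]
```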

Citations

... Initiatives such as the Linguistic Linked Open Data cloud 8 (henceforward LLOD) are focused on collecting and publishing language resources in Semantic Web formats according to the Linked Data principles [7]. When developing NLP services, one of the main challenges is finding language resources on a certain subject area that are of acceptable quality and ready to be reused, as revealed, for example, in previous experiments on summarisation or machine translation enhanced with terminological resources [54,60] [5]. Consequently, our motivating scenario is focused on assisting users with different backgrounds and expertise levels in addressing language-related needs (see Fig. 1). ...
Article
Full-text available
Domain-specific terminologies play a central role in many language technology solutions. Substantial manual effort is still involved in the creation of such resources, and many of them are published in proprietary formats that cannot be easily reused in other applications. Automatic term extraction tools help alleviate this cumbersome task. However, their results usually take the form of plain lists of terms, or of unstructured data with limited linguistic information. Initiatives such as the Linguistic Linked Open Data cloud (LLOD) foster the publication of language resources in open structured formats, specifically RDF, and their linking to other resources on the Web of Data. In order to leverage the wealth of linguistic data in the LLOD and speed up the creation of linked terminological resources, we propose TermitUp, a service that generates enriched domain-specific terminologies directly from corpora and publishes them in open and structured formats. TermitUp is composed of five modules performing terminology extraction, terminology post-processing, terminology enrichment, term relation validation and RDF publication. As part of the pipeline implemented by this service, existing resources in the LLOD are linked with the resulting terminologies, contributing in this way to the population of the LLOD cloud. TermitUp has been used in the framework of European projects tackling different fields, such as the legal domain, with promising results. Different alternatives on how to model enriched terminologies are considered, and good practices illustrated with examples are proposed.
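As a rough illustration of the five-stage pipeline named in the abstract, the sketch below chains placeholder implementations of each module; every function body here is a hypothetical stand-in, not TermitUp's actual logic:

```python
# Hypothetical sketch of a TermitUp-style five-stage pipeline; the stage
# names follow the abstract, but the implementations are toy placeholders.

def extract_terms(corpus):
    # Stage 1: terminology extraction (placeholder: long words as candidates).
    return sorted({w.lower() for text in corpus for w in text.split() if len(w) > 6})

def post_process(terms):
    # Stage 2: terminology post-processing (placeholder: drop non-alphabetic candidates).
    return [t for t in terms if t.isalpha()]

def enrich(terms):
    # Stage 3: terminology enrichment (placeholder: empty metadata slots).
    return {t: {"definitions": [], "related": []} for t in terms}

def validate_relations(enriched):
    # Stage 4: term relation validation (placeholder: accept everything).
    return enriched

def publish_rdf(enriched):
    # Stage 5: RDF publication (placeholder: serialise as simple triples).
    return [(t, "rdf:type", "ontolex:LexicalEntry") for t in enriched]

def termitup_like(corpus):
    return publish_rdf(validate_relations(enrich(post_process(extract_terms(corpus)))))
```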
... There are several works in the literature that perform term extraction [6] [7] [8] [9]. Regarding the medical domain, according to Castro and colleagues [10], for English there are several research efforts oriented towards processing texts and data from that domain [11] [12]; however, few initiatives are found for Spanish [13] [14] [15] [16] [17]. ...
... Vivaldi et al. [16] use term extraction in a new algorithm for the automatic summarization of medical texts in Spanish. For the extraction, the authors use YATE [24], the first tool used to identify medical terms from articles. ...
... There are several works in the literature that perform term extraction [6,7,8,9]. Regarding the medical domain, according to Castro and colleagues [10], for English there are several research efforts oriented towards processing texts and data from that domain [11,12]; however, few initiatives are found for Spanish [13,14,15,16,17]. As a contribution, the present work describes two term extractions performed from the identification of noun phrases in a corpus of medical texts in Spanish. ...
... These systems are useful in several domains: medical (e.g., Johnson et al., 2002; Afantenos et al., 2005; da Cunha et al., 2007; Vivaldi et al., 2010), legal (e.g., Farzindar et al., 2004), journalistic (e.g., ...
Conference Paper
Full-text available
In this paper we present REG, a graph approach to study a fundamental problem of Natural Language Processing: the automatic summarization of documents. The algorithm models a document as a graph to obtain weighted sentences. We applied this approach to the INEX@QA 2010 task (question answering). To do so, we extracted the terms from the queries in order to obtain a list of terms related to the main topic of the question. Using this strategy, REG obtained good results with the automatic evaluation system FRESA.
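A minimal sketch of the general idea of graph-based sentence weighting (not the actual REG algorithm, whose graph construction and edge weighting differ) might look like:

```python
# Illustrative graph-based sentence weighting: sentences are nodes, edge
# weights count shared words, and each sentence is scored by the sum of
# its edge weights. A summary keeps the highest-scoring sentences.

def sentence_graph_scores(sentences):
    bags = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    scores = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            w = len(bags[i] & bags[j])   # shared-word edge weight
            scores[i] += w
            scores[j] += w
    return scores

def top_sentences(sentences, k=1):
    scores = sentence_graph_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in ranked[:k]]
```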
... – SUMMTERM [21], a terminology-based summarizer that is used for summarization of medical articles and uses specialized terminology for scoring and ranking sentences; ...
Article
Full-text available
We study a new content-based method for the evaluation of text summarization systems without human models, which is used to produce system rankings. The research is carried out using a new content-based evaluation framework called FRESA to compute a variety of divergences among probability distributions. We apply our comparison framework to various well-established content-based evaluation measures in text summarization, such as COVERAGE, RESPONSIVENESS, PYRAMIDS and ROUGE, studying their associations in various text summarization tasks including generic multi-document summarization in English and French, focus-based multi-document summarization in English, and generic single-document summarization in French and Spanish.
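The kind of distributional comparison described above can be illustrated with a Jensen-Shannon divergence between the word distributions of a source text and its summary; the smoothing and vocabulary handling below are simplified assumptions, not FRESA's exact procedure:

```python
import math
from collections import Counter

# Illustrative content-based comparison: score a summary by the
# Jensen-Shannon divergence between its word distribution and that of the
# source (one of several divergences such a framework can use).

def word_dist(text, vocab):
    """Add-one-smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] + 1 for w in vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def js_divergence(p, q, vocab):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in vocab)
    m = {w: 0.5 * (p[w] + q[w]) for w in vocab}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

source = "the treatment reduced symptoms in most patients"
summary = "treatment reduced symptoms"
vocab = set(source.lower().split()) | set(summary.lower().split())
score = js_divergence(word_dist(source, vocab), word_dist(summary, vocab), vocab)
# lower divergence = the summary's distribution is closer to the source's
```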
... Although there are works on automatic discourse analysis for Portuguese based on RST (Leal, Quaresma, and Chishman, 2006), sentence compression with this strategy was carried out manually, since at present no complete discourse parser exists for Spanish that can detect nuclei and satellites. However, there is an ongoing project on this topic (da Cunha et al., 2007b; da Cunha et al., 2010), so as soon as this discourse parser is operational, we will be able to carry out this type of compression automatically. Figure 1 shows a discourse tree with RST relations, which includes one multinuclear List relation and two nucleus-satellite relations, Concession and Elaboration. ...
Article
Full-text available
Abstract The aim of this research is to confirm whether sentence compression is an adequate resource for optimizing automatic document summarization systems. To this end, we first created a corpus of summaries of specialized documents (medical articles) produced by several automatic summarization systems. We then performed two types of compression on these summaries. On the one hand, we carried out manual compression, following two strategies: compression by intuitively removing certain sentence elements, and compression by removing certain discourse elements within the framework of Rhetorical Structure Theory (RST). On the other hand, we performed automatic compression by means of several strategies based on removing words of certain grammatical categories (adjectives and adverbs), plus a baseline that removes words at random. Finally, we compared the original summaries with the compressed summaries using the Rouge evaluation system. The results show that, under certain conditions, sentence compression can be beneficial for improving automatic document summarization.
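The two automatic compression strategies mentioned in the abstract (deletion by grammatical category and random deletion) can be sketched as follows; the toy tag lookup is a hypothetical stand-in for a real POS tagger:

```python
import random

# Sketch of two automatic sentence-compression strategies:
# (a) delete words of given POS categories (adjectives/adverbs), here with a
#     tiny hand-made tag lookup in place of a real POS tagger, and
# (b) a baseline that deletes words at random.

TOY_TAGS = {"quickly": "ADV", "red": "ADJ", "very": "ADV", "large": "ADJ"}

def compress_by_pos(sentence, drop=("ADJ", "ADV")):
    """Remove words whose (toy) POS tag is in `drop`."""
    return " ".join(w for w in sentence.split()
                    if TOY_TAGS.get(w.lower()) not in drop)

def compress_random(sentence, rate=0.2, seed=0):
    """Baseline: delete each word independently with probability `rate`."""
    rng = random.Random(seed)
    return " ".join(w for w in sentence.split() if rng.random() >= rate)
```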
... For experimentation on the TAC and DUC datasets we directly use the peer summaries produced by systems participating in the evaluations. For experimentation in Spanish and French (single-document summarization) we have created summaries at the compression rates of the model summaries using the following summarization systems:
- CORTEX (Torres-Moreno et al., 2002), a single-document sentence extraction system for Spanish and French that combines various statistical measures of relevance (angle between sentence and topic, various Hamming weights for sentences, etc.) and applies an optimal decision algorithm for sentence selection;
- ENERTEX (Fernandez et al., 2007), a summarizer based on a theory of textual energy;
- SUMMTERM (Vivaldi et al., 2010), a terminology-based summarizer that is used for summarization of medical articles and uses specialized terminology for scoring and ranking sentences;
- the Copernic summarizer. ...
Conference Paper
Full-text available
We study correlation of rankings of text summarization systems using evaluation methods with and without human models. We apply our comparison framework to various well-established content-based evaluation measures in text summarization such as Coverage, Responsiveness, Pyramids and Rouge, studying their associations in various text summarization tasks including generic and focus-based multi-document summarization in English and generic single-document summarization in French and Spanish. The research is carried out using a new content-based evaluation framework called Fresa to compute a variety of divergences among probability distributions.
... domain [7,8]; however, few initiatives are found for Spanish [9,10,11,12,13]. As a contribution, the detection method presented here is based on linguistic information, insofar as it has been observed that the analysis, and subsequent modelling and machine implementation, of certain specific syntactic expressions can assist term extraction tasks. ...
Article
Full-text available
Abstract Starting from the hypothesis that the nominal syntagmatic enumerations (ESN) found in medical texts are composed of domain-specific terms, we present a method for recognizing such enumerations with the aim of contributing to automatic term extraction. The methodology consists of three stages: (i) recognition of nominal syntagmatic enumerations, using exclusively linguistic information, from which syntactic analysis rules are built; (ii) automatic extraction of the term candidates corresponding to unigrams and bigrams; and (iii) evaluation of the extracted candidates with the advice of experts from the medical field. The experiments were carried out on the IULA corpus, composed of medical texts in Spanish. The results obtained were encouraging: 67% and 68% precision was achieved for the enumerations detected for unigrams and bigrams, respectively. Keywords Medical Term Extraction, Nominal Syntagmatic Enumerations, Medical Domain.
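The enumeration-based extraction idea can be illustrated with a simple pattern-based sketch; the actual method relies on handcrafted syntactic rules rather than this regex, and the precision function mirrors the expert-validation step:

```python
import re

# Illustrative sketch: detect a comma-separated enumeration ("A, B and C")
# with a simple pattern, collect the conjuncts as term candidates, and
# compute precision against a set of expert-validated terms.

ENUM = re.compile(r"(\w[\w ]*?)(?:, (\w[\w ]*?))+ (?:and|y) (\w[\w ]*)")

def enumeration_candidates(sentence):
    """Return the conjuncts of the first enumeration found, if any."""
    m = ENUM.search(sentence)
    if not m:
        return []
    parts = re.split(r", | and | y ", m.group(0))
    return [p.strip() for p in parts]

def precision(candidates, validated):
    """Fraction of extracted candidates confirmed by the experts."""
    if not candidates:
        return 0.0
    return sum(c in validated for c in candidates) / len(candidates)
```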
Conference Paper
In this paper we use REG, a graph-based system, to study a fundamental problem of Natural Language Processing: the automatic summarization of documents. The algorithm models a document as a graph to obtain weighted sentences. We applied this approach to the INEX@QA 2011 task (question answering). We extracted the title and some key or related words (selected by two people) from the queries, in order to retrieve 50 documents from the English Wikipedia. Using this strategy, REG obtained good results with the automatic evaluation system FRESA.
Chapter
Full-text available
With the diffusion of online newspapers and social media, users are becoming capable of retrieving dozens of news articles covering the same topic in a short time. News article summarization is the task of automatically selecting a worthwhile subset of news sentences that users could easily explore. Promising research directions in this field are the use of semantics-based models (e.g., ontologies and taxonomies) to identify key document topics and the integration of social data analysis to also consider the current user's interests during summary generation. The chapter overviews the most recent research advances in document summarization and presents a novel strategy to combine ontology-based and social knowledge for addressing the problem of generic (not query-based) multi-document summarization of news articles. To identify the most salient sentences of the news articles, an ontology-based text analysis is performed during the summarization process. Furthermore, the social content acquired from real Twitter messages is separately analyzed to also consider the current interests of social network users for sentence evaluation. The combination of ontological and social knowledge allows the generation of accurate and easy-to-read news summaries. Moreover, the proposed summarizer performs better than the evaluated competitors on real news articles and Twitter messages.