Figure 8 - uploaded by Marilyn A. Walker
Chart comparing distribution of human ratings for SPOT, RBS, ICF, NOAGG and RANDOM.


Contexts in source publication

Context 1
... SPOT was statistically better than both of these systems (p < .01). Figure 8 shows that SPOT got more high rankings than either of the rule-based systems. In a sense this may not be that surprising, because as Hovy and Wanner (1996) point out, it is difficult to construct a rule-based sentence planner that handles all the rule interactions in a reasonable way. ...
Context 2
... it would have been a possible outcome for SPOT not to be different from either system, e.g. if the sp-trees produced by RANDOM were all equally good, or if the aggregation rules that SPOT learned produced output less readable than NOAGG. Figure 8 shows that the distributions of scores for SPOT vs. the baseline systems are very different, with SPOT skewed towards higher scores. ...
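The contexts above describe SPOT choosing among candidate sentence plans (sp-trees) using scores learned from human ratings. As a rough illustration of that ranking step, here is a minimal Python sketch; the candidate plans, feature names, and scoring weights are invented for illustration (the actual system learns a ranking function over sp-tree features from human ratings):

```python
# Minimal sketch of ranking candidate sentence plans with a learned score.
# Candidates, features, and weights are toy stand-ins, not SPOT's real features.

def rank_sentence_plans(candidates, score_fn):
    """Return candidate sentence plans sorted best-first by a score function."""
    return sorted(candidates, key=score_fn, reverse=True)

# Toy candidates: each is a (realization, feature dict) pair.
candidates = [
    ("Leaving from Newark. Going to Dallas. Leaving in the morning.",
     {"num_sentences": 3, "uses_aggregation": 0}),
    ("You are leaving from Newark and going to Dallas in the morning.",
     {"num_sentences": 1, "uses_aggregation": 1}),
]

def toy_score(candidate):
    # Stand-in scorer: prefer aggregated, shorter plans. In the real system the
    # weights are learned from human ratings of candidate plans.
    _, feats = candidate
    return feats["uses_aggregation"] - 0.1 * feats["num_sentences"]

best, _ = rank_sentence_plans(candidates, toy_score)[0]
print(best)
```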

Citations

... This paper applies few-shot PBL to the task of controllable generation of DAs using an overgenerate-and-rank NLG framework. The overgenerate-and-rank paradigm for NLG has primarily used two methods for ranking: (1) language model probability (Langkilde and Knight, 1998); and (2) ranking functions trained from human feedback (Rambow et al., 2001; Bangalore et al., 2000; Liu et al., 2016). We extend this framework by applying it in the context of PBL, by using DA probability in ranking, and by comparing many ranking functions, including Beyond-BLEU and BLEU baselines (Wieting et al., 2019; Papineni et al., 2002). ...
Preprint
Full-text available
Dialogue systems need to produce responses that realize multiple types of dialogue acts (DAs) with high semantic fidelity. In the past, natural language generators (NLGs) for dialogue were trained on large parallel corpora that map from a domain-specific DA and its semantic attributes to an output utterance. Recent work shows that pretrained language models (LLMs) offer new possibilities for controllable NLG using prompt-based learning. Here we develop a novel few-shot overgenerate-and-rank approach that achieves the controlled generation of DAs. We compare eight few-shot prompt styles that include a novel method of generating from textual pseudo-references using a textual style transfer approach. We develop six automatic ranking functions that identify outputs with both the correct DA and high semantic accuracy at generation time. We test our approach on three domains and four LLMs. To our knowledge, this is the first work on NLG for dialogue that automatically ranks outputs using both DA and attribute accuracy. For completeness, we compare our results to fine-tuned few-shot models trained with 5 to 100 instances per DA. Our results show that several prompt settings achieve perfect DA accuracy, and near perfect semantic accuracy (99.81%) and perform better than few-shot fine-tuning.
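The citing work above builds on the overgenerate-and-rank paradigm: generate many candidate realizations from a meaning representation, then keep the highest-scoring one, where the scorer may be a language-model probability or a ranking function trained from human feedback. A minimal sketch of that control flow, with invented templates and a placeholder scorer standing in for a real language model:

```python
# Illustrative overgenerate-and-rank loop. The templates and the scorer are
# stand-ins; a real system would use a generator and a trained ranker or LM.
import random

def overgenerate(meaning_representation, n=10):
    """Produce n candidate realizations (here: trivial templated variants)."""
    templates = [
        "{name} is the best option.",
        "I would recommend {name}.",
        "{name} is a good choice.",
    ]
    return [random.choice(templates).format(**meaning_representation) for _ in range(n)]

def score_with_lm(text):
    # Placeholder for a language-model log-probability (e.g. from an n-gram or neural LM).
    return -len(text.split())

def rank(candidates, scorer):
    """Return the highest-scoring candidate."""
    return max(candidates, key=scorer)

mr = {"name": "Chanpen Thai"}
print(rank(overgenerate(mr), score_with_lm))
```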
... This means that they currently cover a limited set of relations, ones that are populated frequently enough to make writing templates worthwhile. As previous work on dialogue generation has shown, even combinations of existing relations typically require multiple additional templates to be written [34,33,42]. The existing KG-RG entities and relations are in Table 1, along with the novel KG-RG relations and entities that we experiment with below using 2-shot tuning. ...
... We also showed that Athena-Jurassic performs well in 2-shot tuning tests, using completely novel sets of KG triples with relations and entities never seen in tuning. These novel MRs are not currently included in Athena, because the relations are rare, and creating templates for novel relations or sets of relations is typically not worth the human effort [34,33]. For example the MR in M4 in Figure 7 describes the event of Muhammed Ali lighting the Olympic torch in 1996, a rarely populated event for the athlete entity type. ...
Preprint
Full-text available
One challenge with open-domain dialogue systems is the need to produce high-quality responses on any topic. We aim to improve the quality and coverage of Athena, an Alexa Prize dialogue system. We utilize Athena's response generators (RGs) to create training data for two new neural Meaning-to-Text RGs, Athena-GPT-Neo and Athena-Jurassic, for the movies, music, TV, sports, and video game domains. We conduct few-shot experiments, both within and cross-domain, with different tuning set sizes (2, 3, 10), prompt formats, and meaning representations (MRs) for sets of WikiData KG triples, and dialogue acts with 14 possible attribute combinations. Our evaluation uses BLEURT and human evaluation metrics, and shows that with 10-shot tuning, Athena-Jurassic's performance is significantly better for coherence and semantic accuracy. Experiments with 2-shot tuning on completely novel MRs result in a huge performance drop for Athena-GPT-Neo, whose semantic accuracy falls to 0.41, and whose untrue hallucination rate increases to 12%. Experiments with dialogue acts for video games show that with 10-shot tuning, both models learn to control dialogue acts, but Athena-Jurassic has significantly higher coherence, and only 4% untrue hallucinations. Our results suggest that Athena-Jurassic can reliably produce high-quality outputs for live systems with real users. To our knowledge, these are the first results demonstrating that few-shot tuning on a massive language model can create NLGs that generalize to new domains, and produce high-quality, semantically-controlled, conversational responses directly from MRs and KG triples.
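The Athena work above tunes models on meaning representations built from WikiData KG triples. A hypothetical sketch of how such triples might be linearized into a few-shot prompt; the MR format, example triples, and reference text are assumptions for illustration, not the paper's exact prompt format:

```python
# Hypothetical linearization of KG triples into a few-shot Meaning-to-Text prompt.
# Triple sets, reference sentences, and the "MR:/Text:" layout are illustrative.

def triples_to_mr(triples):
    """Linearize (subject, predicate, object) triples into a single MR string."""
    return " | ".join(f"{s} ; {p} ; {o}" for s, p, o in triples)

def build_prompt(tuning_examples, new_triples):
    """Assemble a few-shot prompt: tuning examples followed by the new MR to realize."""
    lines = []
    for triples, text in tuning_examples:
        lines.append(f"MR: {triples_to_mr(triples)}\nText: {text}\n")
    lines.append(f"MR: {triples_to_mr(new_triples)}\nText:")
    return "\n".join(lines)

examples = [
    ([("Muhammad Ali", "sport", "boxing"),
      ("Muhammad Ali", "country of citizenship", "United States")],
     "Muhammad Ali was an American boxer."),
]
new_mr = [("Serena Williams", "sport", "tennis")]
print(build_prompt(examples, new_mr))
```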
... It is an English place. In contrast, earlier models of statistical natural language generation (SNLG) for dialogue were based around the NLG architecture in Figure 1 (Rambow et al., 2001; Stent, 2002). Here the dialogue manager sends one or more dialogue acts and their arguments to the NLG engine, which then makes decisions about how to render the utterance using separate modules for content planning and structuring, sentence planning and surface realization (Reiter and Dale, 2000). ...
Preprint
Responses in task-oriented dialogue systems often realize multiple propositions whose ultimate form depends on the use of sentence planning and discourse structuring operations. For example, a recommendation may consist of an explicitly evaluative utterance, e.g. Chanpen Thai is the best option, along with content related by the justification discourse relation, e.g. It has great food and service, that combines multiple propositions into a single phrase. While neural generation methods integrate sentence planning and surface realization in one end-to-end learning framework, previous work has not shown that neural generators can: (1) perform common sentence planning and discourse structuring operations; (2) make decisions as to whether to realize content in a single sentence or over multiple sentences; (3) generalize sentence planning and discourse relation operations beyond what was seen in training. We systematically create large training corpora that exhibit particular sentence planning operations and then test neural models to see what they learn. We compare models without explicit latent variables for sentence planning with ones that provide explicit supervision during training. We show that only the models with additional supervision can reproduce sentence planning and discourse operations and generalize to situations unseen in training.
... It is an English place. In contrast, earlier models of statistical natural language generation (SNLG) for dialogue were based around the NLG architecture in Figure 1 (Rambow et al., 2001; Stent, 2002; Stent and Molina, 2009). Here the dialogue manager sends one or more dialogue acts and their arguments to the NLG engine, which then makes decisions about how to render the utterance using separate modules for content planning and structuring, sentence planning and surface realization (Reiter and Dale, 2000). The sentence planner's job includes: ...
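The passage above describes the classic modular SNLG pipeline: the dialogue manager hands dialogue acts to the NLG engine, which applies content planning and structuring, sentence planning, and surface realization in turn. A schematic Python sketch of that division of labour; each module body is a toy stand-in rather than a real implementation:

```python
# Schematic sketch of the modular NLG pipeline (content planning, sentence
# planning, surface realization). Module internals are toy stand-ins; real
# systems use much richer representations such as sp-trees and discourse relations.

def content_plan(dialogue_acts):
    # Content planning/structuring: decide which propositions to express and in what order.
    return dialogue_acts

def sentence_plan(propositions):
    # Sentence planning: decide how propositions are grouped into sentences (aggregation).
    # Toy rule: realize each proposition as its own sentence (no aggregation).
    return [[p] for p in propositions]

def surface_realize(sentence_groups):
    # Surface realization: render each planned sentence as a string.
    return " ".join(" and ".join(group).capitalize() + "." for group in sentence_groups)

dialogue_acts = ["chanpen thai is the best option", "it has great food and service"]
print(surface_realize(sentence_plan(content_plan(dialogue_acts))))
```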
... We see Steps 1 to 3 as the overgeneration phase, aimed at vastly expanding the types of stylistic variation possible, while Step 4 is the ranking phase, in a classic overgenerate-and-rank NLG architecture (Langkilde and Knight, 1998; Rambow et al., 2001). We focus in this paper on Steps 1 to 3, expecting to improve these steps before we move on to Step 4. Thus, in this paper, we conducted an evaluation experiment to compare three different types of NLG templates: pre-defined BASIC templates similar to those used in current NLG engines for the restaurant domain (Wen et al., 2015), and the basic templates stylized with hyperbolic language, e.g. Emilio's decor and service are both decent, but its food quality is nothing short of excellent. ...
Article
Many of the creative and figurative elements that make language exciting are lost in translation in current natural language generation engines. In this paper, we explore a method to harvest templates from positive and negative reviews in the restaurant domain, with the goal of vastly expanding the types of stylistic variation available to the natural language generator. We learn hyperbolic adjective patterns that are representative of the strongly-valenced expressive language commonly used in either positive or negative reviews. We then identify and delexicalize entities, and use heuristics to extract generation templates from review sentences. We evaluate the learned templates against more traditional review templates, using subjective measures of "convincingness", "interestingness", and "naturalness". Our results show that the learned templates score highly on these measures. Finally, we analyze the linguistic categories that characterize the learned positive and negative templates. We plan to use the learned templates to improve the conversational style of dialogue systems in the restaurant domain.
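A central step in the template-harvesting approach above is delexicalization: entity mentions in a review sentence are replaced with slots so the sentence can be reused as a generation template. A rough sketch, with an invented slot inventory and example sentence:

```python
# Rough sketch of delexicalizing a review sentence into a template and filling it
# again for a new entity. The slot names and entity list are illustrative.

ENTITY_SLOTS = {
    "Emilio": "[RESTAURANT]",
}

def delexicalize(sentence, entity_slots=ENTITY_SLOTS):
    """Replace known entity mentions with slot names to obtain a reusable template."""
    for entity, slot in entity_slots.items():
        sentence = sentence.replace(entity, slot)
    return sentence

def relexicalize(template, values):
    """Fill slots with entities from a new meaning representation."""
    for slot, value in values.items():
        template = template.replace(slot, value)
    return template

template = delexicalize("Emilio's food quality is nothing short of excellent.")
print(template)
print(relexicalize(template, {"[RESTAURANT]": "Chanpen Thai"}))
```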
... A ranking function is employed to decide whether additional repetitions need to be carried out before the final text is output. The model is also evaluated with the use of a travel system [65], which showcases different aspects of the novel model. ...
Article
Full-text available
Natural Language Generation (NLG) is defined as the systematic approach for producing human-understandable natural language text based on non-textual data or from meaning representations. This is a significant area which empowers human-computer interaction. It has also given rise to a variety of theoretical as well as empirical approaches. This paper intends to provide a detailed overview and a classification of the state-of-the-art approaches in Natural Language Generation. The paper explores NLG architectures and tasks classed under document planning, micro-planning and surface realization modules. Additionally, this paper identifies the gaps in existing NLG research which require further work in order to make NLG a widely usable technology.
... A different direction has been followed by [34], [35] and [6], where an over-generate and rank approach to sentence generation has been suggested. In this approach, the overgeneration phase can follow user- and domain-independent rules to generate a set of possible sentences, and the ranking phase is responsible for ranking the ...
Conference Paper
Full-text available
We propose a novel approach for handling first-time users in the context of automatic report generation from time-series data in the health domain. Handling first-time users is a common problem for Natural Language Generation (NLG) and interactive systems in general: the system cannot adapt to users without prior interaction or user knowledge. In this paper, we propose a novel framework for generating medical reports for first-time users, using multi-objective optimisation (MOO) to account for the preferences of multiple possible user types, where the content preferences of potential users are modelled as objective functions. Our proposed approach outperforms two meaningful baselines in an evaluation with prospective users, yielding large (.79) and medium (.46) effect sizes, respectively.
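The paper above models the content preferences of different possible user types as objective functions and uses multi-objective optimisation to pick report content for first-time users. The sketch below illustrates the general idea with invented user types, content items, and preference scores, and a simple maximin compromise in place of the paper's actual optimiser:

```python
# Illustrative content selection against several user-type objective functions.
# User types, content items, weights, and the maximin rule are all assumptions
# made for illustration, not the paper's method or data.

from itertools import combinations

CONTENT_ITEMS = ["heart_rate_trend", "sleep_summary", "exercise_minutes", "medication_reminders"]

# Each user type's preference for each content item (toy values in [0, 1]).
USER_TYPE_PREFS = {
    "clinician": {"heart_rate_trend": 0.9, "sleep_summary": 0.4,
                  "exercise_minutes": 0.3, "medication_reminders": 0.8},
    "patient":   {"heart_rate_trend": 0.3, "sleep_summary": 0.8,
                  "exercise_minutes": 0.7, "medication_reminders": 0.6},
}

def worst_case_score(content_set):
    """Score a content set by its least-satisfied user type (a maximin compromise)."""
    return min(sum(prefs[item] for item in content_set)
               for prefs in USER_TYPE_PREFS.values())

# Pick the two-item report content that best satisfies the worst-off user type.
best = max(combinations(CONTENT_ITEMS, 2), key=worst_case_score)
print(best)
```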
... This challenge has also been noted by Sripada et al. (2004). We handle this challenge in three different ways: (1) by applying multi-label classification, which is able to handle mismatches in aligned corpora (Chapter 4); (2) by asking users to rate expert-constructed and random summaries in order to derive their preferences, similar to Rambow et al. (2001) (Chapters 3, 4 and 5); and (3) by clustering the experts' responses so that experts with the same preferences belong to the same cluster (Chapter 6). ...