Figure 1: Example of a dialogue move composed of three internal decisions.

Contexts in source publication

Context 1
... Problem Constraints. Figure 1 shows a part of the design of an automated hotline dedicated to ADSL box installation. It reveals that the welcome message can be split up into three internal decisions: the nature of the greeting/presentation, whether or not to insert help messages, and finally the choice of questions. ...
Context 2
... figure 1, the modules are the diamond-shaped boxes labelled 1, 2 and 3. For module 1, the possible internal actions are the 1.1, 1.2 and 1.3 transitions. ...
Context 3
... scope of this contribution is enhanced in three directions. First, figure 1 is intended to illustrate, within an automaton design, how internal decisions can combine into a single dialogue move. But the MVDP framework does not rely on the automaton structure: the system gathers internal decisions independently of the internal structure, collects rewards, builds a corpus and uses it to learn a best-practice policy. ...
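To make the MVDP idea above more concrete, here is a minimal, hypothetical Python sketch (not taken from the cited paper) of how the three internal decisions of Figure 1 could be gathered into a single dialogue move, logged with their local contexts, and later credited with a dialogue-level reward to build a learning corpus. The module names, their options and the random placeholder policy are illustrative assumptions.

import random

# Hypothetical sketch of how internal decisions could be gathered into one
# dialogue move, MVDP-style: each module picks an action from its own local
# context, the choices are logged, and the whole turn is later credited with
# the dialogue-level reward to build a learning corpus.

MODULES = {
    "greeting":  ["formal", "informal", "none"],          # module 1
    "help_msg":  ["insert_help", "skip_help"],            # module 2
    "question":  ["open_question", "yes_no", "menu"],     # module 3
}

corpus = []  # list of (decisions, reward) pairs collected over dialogues

def policy(module, context, options):
    """Placeholder decision rule; a learned policy would go here."""
    return random.choice(options)

def welcome_move(context):
    """Combine three internal decisions into a single dialogue move."""
    decisions = []
    for module, options in MODULES.items():
        action = policy(module, context, options)
        decisions.append((module, dict(context), action))
        context[module] = action          # later modules see earlier choices
    return decisions

# After the dialogue ends, the observed reward is attached to every internal
# decision of the turn, independently of the automaton structure.
def log_dialogue(decisions, reward):
    corpus.append((decisions, reward))

decisions = welcome_move({"caller_type": "new_user"})
log_dialogue(decisions, reward=+1.0)
print(corpus[0])

Once the corpus is large enough, a learned best-practice policy would replace the random choice in policy().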

Citations

... Indeed, the first interactions with the user are crucial to build a bond between the service and the user; taking random actions, using ε-greedy, to gather information about the environment (including the human) may simply end up with user dropouts. To tackle this problem, it is possible to constrain the action space to the relevant actions for a given state (Singh et al. 2002; Jason D. Williams et al. 2008; Laroche, Putois, Bretier, and Bouchon-Meunier 2009). As those solutions require RL expertise, new studies introduced techniques suited to an average developer; to name a few: RL results monitoring, convergence speed prediction (El Asri and Laroche 2013) or interaction quality prediction (El Asri, Khouzaimi, et al. 2014). ...
Thesis
Full-text available
The most powerful artificial intelligence systems are now based on learned statistical models. In order to build efficient models, these systems must collect a huge amount of data on their environment. Personal assistants, smart homes, voice servers and other dialogue applications are no exceptions to this statement. A specificity of those systems is that they are designed to interact with humans, and as a consequence, their training data has to be collected from interactions with these humans. As the number of interactions with a single person is often too scarce to train a proper model, the usual approach to maximise the amount of data consists in mixing data collected with different users into a single corpus. However, one limitation of this approach is that, by construction, the trained models are only efficient with an "average" human and do not include any sort of adaptation; this lack of adaptation makes the service unusable for some specific groups of persons and leads to a restricted customer base and inclusiveness problems. This thesis proposes solutions to construct Dialogue Systems that are robust to this problem by combining Transfer Learning and Reinforcement Learning. It explores two main ideas.
The first idea of this thesis consists in incorporating adaptation in the very first dialogues with a new user. To that end, we use the knowledge gathered with previous users. But how to scale such systems with a growing database of user interactions? The first proposed approach involves clustering of Dialogue Systems (tailored for their respective users) based on their behaviours. We demonstrated through handcrafted and real user-model experiments how this method improves the dialogue quality for new and unknown users. The second approach extends the Deep Q-learning algorithm with a continuous transfer process.
The second idea states that before using a dedicated Dialogue System, the first interactions with a user should be handled carefully by a safe Dialogue System common to all users. The underlying approach is divided into two steps. The first step consists in learning a safe strategy through Reinforcement Learning. To that end, we introduced a budgeted Reinforcement Learning framework for continuous state spaces and the underlying extensions of classic Reinforcement Learning algorithms. In particular, the safe version of the Fitted-Q algorithm has been validated, in terms of safety and efficiency, on a dialogue system task and an autonomous driving problem. The second step consists in using those safe strategies when facing new users; this method is an extension of the classic ε-greedy algorithm.
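As an illustration of the action-space constraint mentioned in the citation context above (and of the careful exploration this abstract calls for), the following hypothetical Python sketch shows ε-greedy exploration restricted to an expert-defined whitelist of relevant actions per state; the state names, allowed actions and Q-table are illustrative assumptions, not the cited systems' actual implementation.

import random

# Minimal sketch of constrained exploration: ε-greedy restricted to a
# hand-authored set of relevant actions per state, so that exploration never
# plays an action an expert has ruled out for that state.
# ALLOWED, the Q-table and the state names are illustrative assumptions.

EPSILON = 0.1
ALLOWED = {                       # expert-defined relevant actions per state
    "greeting": ["ask_name", "ask_issue"],
    "diagnosis": ["ask_issue", "propose_reboot", "transfer_agent"],
}
Q = {(s, a): 0.0 for s, acts in ALLOWED.items() for a in acts}

def constrained_epsilon_greedy(state):
    actions = ALLOWED[state]                      # never leave the allowed set
    if random.random() < EPSILON:
        return random.choice(actions)             # exploration stays "safe"
    return max(actions, key=lambda a: Q[(state, a)])

print(constrained_epsilon_greedy("greeting"))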
... In particular, for 20 years, research has developed and applied RL algorithms for Spoken Dialogue Systems, involving a large range of dialogue models and algorithms to optimise them. Just to cite a few algorithms: Monte Carlo (Levin & Pieraccini, 1997), Q-Learning (Walker et al., 1998), SARSA (Frampton & Lemon, 2006), MVDP algorithms (Laroche et al., 2009), Kalman Temporal Difference (Pietquin et al., 2011), Fitted-Q Iteration, Gaussian Process RL (Gašić et al., 2010), and more recently Deep RL (Fatemi et al., 2016). Additionally, most of them require the setting of hyperparameters and a state space representation. ...
Conference Paper
Full-text available
Dialogue systems rely on a careful reinforcement learning design: the learning algorithm and its state space representation. In the absence of more rigorous knowledge, the designer resorts to their practical experience to choose the best option. In order to automate and to improve the performance of the aforementioned process, this article formalises the problem of online off-policy reinforcement learning algorithm selection. A meta-algorithm takes as input a portfolio constituted of several off-policy reinforcement learning algorithms. It then determines, at the beginning of each new trajectory, which algorithm in the portfolio is in control of the behaviour during the full next trajectory, in order to maximise the return. The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection. Under some assumptions, a thorough theoretical analysis demonstrates its near-optimality considering the structural sampling budget limitations. Then, ESBAS is put to the test in a set of experiments with various portfolios, on a negotiation dialogue game. The results show the practical benefits of algorithm selection for dialogue systems, in most cases even outperforming the best algorithm in the portfolio, even when the aforementioned assumptions are violated.
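The following Python sketch illustrates the control loop described in this abstract: policies are frozen within an epoch, and a freshly reset stochastic bandit selects which algorithm controls each trajectory. It is a minimal reading of the abstract, not the published algorithm: the UCB1 bandit, the doubling epoch schedule and the portfolio/run_trajectory interfaces are all assumptions made for illustration.

import math
import random

# Sketch of an ESBAS-like loop: between epochs the portfolio's policies are
# updated on the collected data; within an epoch they are frozen and a
# rebooted bandit (UCB1 here, as an illustrative choice) picks, trajectory by
# trajectory, which algorithm is in control.

class UCB1:
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self):
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm                       # play each arm once first
        t = sum(self.counts)
        return max(range(len(self.counts)),
                   key=lambda a: self.sums[a] / self.counts[a]
                   + math.sqrt(2 * math.log(t) / self.counts[a]))

    def update(self, arm, ret):
        self.counts[arm] += 1
        self.sums[arm] += ret

def esbas(portfolio, run_trajectory, n_epochs=8):
    data = []
    for epoch in range(n_epochs):
        for algo in portfolio:
            algo.update_policy(data)             # learning happens between epochs
        bandit = UCB1(len(portfolio))            # bandit rebooted at each epoch
        for _ in range(2 ** epoch):              # assumed doubling epoch length
            arm = bandit.select()
            trajectory, ret = run_trajectory(portfolio[arm])  # frozen policy
            data.append(trajectory)
            bandit.update(arm, ret)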
... Research on negotiation dialogue is experiencing a growth of interest. At first, Reinforcement Learning [1], the most popular framework for dialogue management in spoken dialogue systems [2][3][4], was applied to negotiation with mixed results [5,6], because the non-stationary policy of the opposing player prevents those algorithms from converging consistently. Then, Multi-Agent Reinforcement Learning [7] was applied, but still with convergence difficulties [8]. ...
Chapter
This article presents the design of a generic negotiation dialogue game between two or more players. The goal is to reach an agreement, each player having his own preferences over a shared set of options. Several simulated users have been implemented. An MDP policy has been optimised individually with Fitted-Q Iteration for several user instances. Then, the learnt policies have been cross-evaluated with other users. Results show a strong disparity in inter-user performance. This illustrates the importance of user adaptation in negotiation-based dialogue systems.
... Research on negotiation dialogue is experiencing a growth of interest. At first, Reinforcement Learning (Sutton and Barto, 1998), the most popular framework for dialogue management in dialogue systems (Levin and Pieraccini, 1997; Laroche et al., 2009; Lemon and Pietquin, 2012), was applied to negotiation with mixed results (English and Heeman, 2005; Georgila and Traum, 2011; Lewis et al., 2017), because the non-stationary policy of the opposing player prevents those algorithms from converging consistently. ...
Article
Full-text available
This position paper formalises an abstract model for complex negotiation dialogue. This model is to be used for benchmarking optimisation algorithms ranging from Reinforcement Learning to Stochastic Games, through Transfer Learning, One-Shot Learning and others.
... 7. Factorisation of actions [Laroche et al., 2009]: each advisor is responsible for a specific action dimension: for instance, a robot might control its legs and its arms with different advisors. ...
Article
Full-text available
This article deals with a novel branch of Separation of Concerns, called Multi-Advisor Reinforcement Learning (MAd-RL), where a single-agent RL problem is distributed to $n$ learners, called advisors. Each advisor tries to solve the problem with a different focus. Their advice is then communicated to an aggregator, which is in control of the system. For the local training, three off-policy bootstrapping methods are proposed and analysed: local-max bootstraps with the local greedy action, rand-policy bootstraps with respect to the random policy, and agg-policy bootstraps with respect to the aggregator's greedy policy. MAd-RL is positioned as a generalisation of Reinforcement Learning with Ensemble methods. An experiment is held on a simplified version of the Ms. Pac-Man Atari game. The results confirm the theoretical relative strengths and weaknesses of each method.
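A toy Python sketch of the advisor/aggregator split and of the local-max bootstrap named in this abstract may help fix ideas; the tabular setting, the shapes, the learning rate and the fact that the aggregator simply sums local Q-values are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

# Toy sketch of a MAd-RL-style decomposition: each advisor learns a local
# Q-table over the shared action set, and the aggregator sums their values
# and acts greedily. local_max_update shows the local-max bootstrap, where
# the target uses the advisor's own greedy value.

N_STATES, N_ACTIONS, N_ADVISORS = 10, 4, 3
GAMMA, ALPHA = 0.9, 0.1
q_local = np.zeros((N_ADVISORS, N_STATES, N_ACTIONS))

def aggregator_action(state):
    """Greedy action on the sum of the advisors' local Q-values."""
    return int(np.argmax(q_local[:, state, :].sum(axis=0)))

def local_max_update(advisor, s, a, local_reward, s_next):
    """local-max bootstrap: the target uses the advisor's own greedy value."""
    target = local_reward + GAMMA * q_local[advisor, s_next, :].max()
    q_local[advisor, s, a] += ALPHA * (target - q_local[advisor, s, a])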
... The first action consists of informing the user about the price of the Regal Resort and the second action consists of proposing another option, Hotel Globetrotter. Performing more than one action per turn is a challenge when using reinforcement learning (Fatemi et al., 2016; Gašić et al., 2012; Pietquin et al., 2011) and, to our knowledge, this has only been done in a simulated setting (Laroche et al., 2009). ...
Article
This paper presents the Frames dataset (Frames is available at http://datasets.maluuba.com/Frames), a corpus of 1369 human-human dialogues with an average of 15 turns per dialogue. We developed this dataset to study the role of memory in goal-oriented dialogue systems. Based on Frames, we introduce a task called frame tracking, which extends state tracking to a setting where several states are tracked simultaneously. We propose a baseline model for this task. We show that Frames can also be used to study memory in dialogue management and information presentation through natural language generation.
... The first action consists of informing the user about the price of the Regal Resort and the second action consists of proposing another option, Hotel Globetrotter. Performing more than one action per turn is a challenge when using reinforcement learning (Pietquin et al., 2011; Gašić et al., 2012; Fatemi et al., 2016) and, to our knowledge, has only been tackled in a simulated setting (Laroche et al., 2009). ...
... Indeed, clarification plays a major role in multi-domain dialogue due to domain overlaps, disambiguation of the user's intent or wrong assumptions about domains. Reinforcement Learning (RL) [10] based systems [11, 12, 13] are also considered, where dialogue is seen as an optimization problem leading to optimal strategies. Unlike local classification, RL offers the advantage of considering the dialogue as a sequence of action selections to achieve a long-term goal, and can therefore integrate clarification steps for the sake of the whole dialogue's success. ...
Conference Paper
This paper proposes a novel approach to defining and simulating a new generation of virtual personal assistants as multi-application multi-domain distributed dialogue systems. The first contribution is the assistant architecture, composed of independent third-party applications handled by a Dispatcher. In this view, applications are black boxes responding with a self-scored answer to user requests. Next, the Dispatcher distributes the current request to the most relevant application, based on these scores and the context (history of interaction, etc.), and conveys its answer to the user. To address variations in the user-defined portfolio of applications, the second contribution, a stochastic model, automates the online optimisation of the Dispatcher's behaviour. To evaluate the learnability of the Dispatcher's policy, several parametrisations of the user and application simulators are enabled, in such a way that they cover variations of realistic situations. Results confirm, in all considered configurations of interest, that reinforcement learning can learn adapted strategies. Index Terms: multi-application spoken dialogue systems, multi-domain, dialogue strategy, reinforcement learning.
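The Dispatcher architecture described in this abstract can be pictured with a short, hypothetical Python sketch: each application is a black box returning a (answer, self-score) pair, and the Dispatcher routes the turn to the best-scoring application. The EchoApp class, the keyword scoring and the context bonus are illustrative stand-ins for the paper's simulators and learned routing policy.

# Hypothetical sketch of the Dispatcher architecture: applications are black
# boxes returning (answer, self_score); the Dispatcher combines the score
# with a simple context signal (here, a bonus for the previously selected
# application) and conveys the winning answer to the user.

class EchoApp:
    """Stand-in third-party application; real ones are opaque black boxes."""
    def __init__(self, name, keyword):
        self.name, self.keyword = name, keyword

    def answer(self, request):
        score = 1.0 if self.keyword in request.lower() else 0.1
        return f"[{self.name}] handling: {request}", score

class Dispatcher:
    def __init__(self, apps):
        self.apps = apps
        self.history = []                     # interaction context

    def dispatch(self, request):
        scored = []
        for app in self.apps:
            answer, score = app.answer(request)
            bonus = 0.2 if self.history and self.history[-1] is app else 0.0
            scored.append((score + bonus, app, answer))
        _, best_app, best_answer = max(scored, key=lambda x: x[0])
        self.history.append(best_app)
        return best_answer

dispatcher = Dispatcher([EchoApp("weather", "rain"), EchoApp("music", "play")])
print(dispatcher.dispatch("Will it rain tomorrow?"))

In the paper, the routing decision is optimised online with reinforcement learning rather than the fixed bonus used here.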
... System 3 was also Reinforcement Learning-based (RL). In this case, RL was cast as a Module-Variable Decision Process (MVDP, [Laroche et al., 2009]). This system was designed to assess the influence of Text-To-Speech (TTS) prosody on users' behaviour. ...
... RL was integrated into this automaton thanks to the MVDP hybrid framework [Laroche et al., 2009]. MVDP makes it possible to integrate one or several points of choice in a dialogue phase, each point of choice having to choose between different actions according to its current internal context. ...
Conference Paper
Full-text available
This paper describes a French Spoken Dialogue System (SDS) named NASTIA (Negotiating Appointment SeTting InterfAce). Appointment scheduling is a hybrid task halfway between slot-filling and negotiation. NASTIA implements three different negotiation strategies. These strategies were tested on 1734 dialogues with 385 users who interacted at most 5 times with the SDS and gave a rating on a scale of 1 to 10 for each dialogue. Previous appointment scheduling systems were evaluated with the same experimental protocol. NASTIA differs from these systems in that it can adapt its strategy during the dialogue. The highest system task completion rate with these systems was 81%, whereas NASTIA had an 88% average and its best performing strategy even reached 92%. This strategy also significantly outperformed previous systems in terms of overall user rating, with an average of 8.28 against 7.40. The experiment also made it possible to highlight global recommendations for building spoken dialogue systems.
... Reinforcement Learning (RL) [Sutton and Barto, 1998] has been a popular technique to optimise the behaviour of SDS [Levin et al., 1997, Williams and Young, 2007, Pietquin and Dutoit, 2006]. NASTIA's dialogue manager is an RL agent implemented as a Module-Variable Decision Process (MVDP) [Laroche et al., 2009]. Dialogue is modelled as a sequence of states and actions among which the system has to choose. ...
Conference Paper
Full-text available
This paper describes the DINASTI (DIalogues with a Negotiating Appointment SeTting Interface) corpus, which is composed of 1734 dialogues with the French spoken dialogue system NASTIA (Negotiating Appointment SeTting InterfAce). NASTIA is a reinforcement learning-based system. The DINASTI corpus was collected while the system was following a uniform policy. Each entry of the corpus is a system-user exchange annotated with 120 automatically computable features. The corpus contains a total of 21587 entries, with 385 testers. Each tester performed at most five scenario-based interactions with NASTIA. The dialogues last an average of 10.82 dialogue turns, with 4.45 reinforcement learning decisions. The testers filled in an evaluation questionnaire after each dialogue. The questionnaire includes three questions to measure task completion. In addition, it comprises 7 Likert-scaled items evaluating several aspects of the interaction, a numerical overall evaluation on a scale of 1 to 10, and a free text entry. Answers to this questionnaire are provided with DINASTI. This corpus is meant for research on reinforcement learning modelling for dialogue management.