Figure 1: Example of a dialogue move composed of three internal decisions.

Contexts in source publication

Context 1
... Problem Constraints. Figure 1 shows a part of the design of an automated hotline dedicated to ADSL box installation. It reveals that the welcome message can be split up into three internal decisions: the nature of the greeting/presentation, whether or not to insert help messages, and finally the choice of questions. ...
Context 2
... figure 1, the modules are the diamond-shaped boxes labelled 1, 2 and 3. For module 1, the possible internal actions are the 1.1, 1.2 and 1.3 transitions. ...
Context 3
... scope of this contribution is enhanced in three directions. First, figure 1 is intended to illustrate, within an automaton design, how internal decisions can combine into a single dialogue move. But the MVDP framework does not rely on the automaton structure: the system gathers internal decisions independently of the internal structure, collects rewards, builds a corpus and uses it to learn a best-practice policy. ...
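To make the MVDP idea above more concrete, here is a minimal, hypothetical Python sketch (not taken from the cited paper) of how the three internal decisions of Figure 1 could be gathered into a single dialogue move, logged with their local contexts, and later credited with a dialogue-level reward to build a learning corpus. The module names, their options and the random placeholder policy are illustrative assumptions.

import random

# Hypothetical sketch of how internal decisions could be gathered into one
# dialogue move, MVDP-style: each module picks an action from its own local
# context, the choices are logged, and the whole turn is later credited with
# the dialogue-level reward to build a learning corpus.

MODULES = {
    "greeting":  ["formal", "informal", "none"],          # module 1
    "help_msg":  ["insert_help", "skip_help"],            # module 2
    "question":  ["open_question", "yes_no", "menu"],     # module 3
}

corpus = []  # list of (decisions, reward) pairs collected over dialogues

def policy(module, context, options):
    """Placeholder decision rule; a learned policy would go here."""
    return random.choice(options)

def welcome_move(context):
    """Combine three internal decisions into a single dialogue move."""
    decisions = []
    for module, options in MODULES.items():
        action = policy(module, context, options)
        decisions.append((module, dict(context), action))
        context[module] = action          # later modules see earlier choices
    return decisions

# After the dialogue ends, the observed reward is attached to every internal
# decision of the turn, independently of the automaton structure.
def log_dialogue(decisions, reward):
    corpus.append((decisions, reward))

decisions = welcome_move({"caller_type": "new_user"})
log_dialogue(decisions, reward=+1.0)
print(corpus[0])

Once the corpus is large enough, a learned best-practice policy would replace the random choice in policy().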

Citations

... Indeed, the first interactions with the user are crucial to build a bond between the service and the user; taking random actions, using ε-greedy, to gather information about the environment (including the human) may simply end up with user dropouts. To tackle this problem, it is possible to constrain the action space to the relevant actions for a given state (Singh et al. 2002; Jason D. Williams et al. 2008; Laroche, Putois, Bretier, and Bouchon-Meunier 2009). As those solutions require RL expertise, new studies introduced techniques suited to an average developer; to name a few: RL results monitoring, convergence speed prediction (El Asri and Laroche 2013) or interaction quality prediction (El Asri, Khouzaimi, et al. 2014). ...
Thesis
Full-text available
The most powerful artificial intelligence systems are now based on learned statistical models. In order to build efficient models, these systems must collect a huge amount of data on their environment. Personal assistants, smart homes, voice servers and other dialogue applications are no exceptions to this statement. A specificity of those systems is that they are designed to interact with humans, and as a consequence, their training data has to be collected from interactions with these humans. As the number of interactions with a single person is often too scarce to train a proper model, the usual approach to maximise the amount of data consists in mixing data collected with different users into a single corpus. However, one limitation of this approach is that, by construction, the trained models are only efficient with an "average" human and do not include any sort of adaptation; this lack of adaptation makes the service unusable for some specific groups of persons and leads to a restricted customer base and inclusiveness problems. This thesis proposes solutions to construct Dialogue Systems that are robust to this problem by combining Transfer Learning and Reinforcement Learning. It explores two main ideas.
The first idea of this thesis consists in incorporating adaptation in the very first dialogues with a new user. To that end, we use the knowledge gathered with previous users. But how to scale such systems with a growing database of user interactions? The first proposed approach involves clustering of Dialogue Systems (tailored for their respective users) based on their behaviours. We demonstrated through handcrafted and real user-model experiments how this method improves the dialogue quality for new and unknown users. The second approach extends the Deep Q-learning algorithm with a continuous transfer process.
The second idea states that before using a dedicated Dialogue System, the first interactions with a user should be handled carefully by a safe Dialogue System common to all users. The underlying approach is divided into two steps. The first step consists in learning a safe strategy through Reinforcement Learning. To that end, we introduced a budgeted Reinforcement Learning framework for continuous state spaces and the underlying extensions of classic Reinforcement Learning algorithms. In particular, the safe version of the Fitted-Q algorithm has been validated, in terms of safety and efficiency, on a dialogue system task and an autonomous driving problem. The second step consists in using those safe strategies when facing new users; this method is an extension of the classic ε-greedy algorithm.
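As an illustration of the action-space constraint mentioned in the citation context above (and of the careful exploration this abstract calls for), the following hypothetical Python sketch shows ε-greedy exploration restricted to an expert-defined whitelist of relevant actions per state; the state names, allowed actions and Q-table are illustrative assumptions, not the cited systems' actual implementation.

import random

# Minimal sketch of constrained exploration: ε-greedy restricted to a
# hand-authored set of relevant actions per state, so that exploration never
# plays an action an expert has ruled out for that state.
# ALLOWED, the Q-table and the state names are illustrative assumptions.

EPSILON = 0.1
ALLOWED = {                       # expert-defined relevant actions per state
    "greeting": ["ask_name", "ask_issue"],
    "diagnosis": ["ask_issue", "propose_reboot", "transfer_agent"],
}
Q = {(s, a): 0.0 for s, acts in ALLOWED.items() for a in acts}

def constrained_epsilon_greedy(state):
    actions = ALLOWED[state]                      # never leave the allowed set
    if random.random() < EPSILON:
        return random.choice(actions)             # exploration stays "safe"
    return max(actions, key=lambda a: Q[(state, a)])

print(constrained_epsilon_greedy("greeting"))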
... In particular, for 20 years, research has developed and applied RL algorithms for Spoken Dialogue Systems, involving a large range of dialogue models and algorithms to optimise them. Just to cite a few algorithms: Monte Carlo (Levin & Pieraccini, 1997), Q-Learning (Walker et al., 1998), SARSA (Frampton & Lemon, 2006), MVDP algorithms (Laroche et al., 2009), Kalman Temporal Difference (Pietquin et al., 2011), Fitted-Q Iteration, Gaussian Process RL (Gašić et al., 2010), and more recently Deep RL (Fatemi et al., 2016). Additionally, most of them require the setting of hyperparameters and a state space representation. ...
Conference Paper
Full-text available
Dialogue systems rely on a careful reinforcement learning design: the learning algorithm and its state space representation. In the absence of more rigorous knowledge, the designer resorts to their practical experience to choose the best option. In order to automate and to improve the performance of the aforementioned process, this article formalises the problem of online off-policy reinforcement learning algorithm selection. A meta-algorithm takes as input a portfolio constituted of several off-policy reinforcement learning algorithms. It then determines, at the beginning of each new trajectory, which algorithm in the portfolio is in control of the behaviour during the full next trajectory, in order to maximise the return. The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection. Under some assumptions, a thorough theoretical analysis demonstrates its near-optimality considering the structural sampling budget limitations. Then, ESBAS is put to the test in a set of experiments with various portfolios, on a negotiation dialogue game. The results show the practical benefits of algorithm selection for dialogue systems, in most cases even outperforming the best algorithm in the portfolio, even when the aforementioned assumptions are violated.
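The following Python sketch illustrates the control loop described in this abstract: policies are frozen within an epoch, and a freshly reset stochastic bandit selects which algorithm controls each trajectory. It is a minimal reading of the abstract, not the published algorithm: the UCB1 bandit, the doubling epoch schedule and the portfolio/run_trajectory interfaces are all assumptions made for illustration.

import math
import random

# Sketch of an ESBAS-like loop: between epochs the portfolio's policies are
# updated on the collected data; within an epoch they are frozen and a
# rebooted bandit (UCB1 here, as an illustrative choice) picks, trajectory by
# trajectory, which algorithm is in control.

class UCB1:
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self):
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm                       # play each arm once first
        t = sum(self.counts)
        return max(range(len(self.counts)),
                   key=lambda a: self.sums[a] / self.counts[a]
                   + math.sqrt(2 * math.log(t) / self.counts[a]))

    def update(self, arm, ret):
        self.counts[arm] += 1
        self.sums[arm] += ret

def esbas(portfolio, run_trajectory, n_epochs=8):
    data = []
    for epoch in range(n_epochs):
        for algo in portfolio:
            algo.update_policy(data)             # learning happens between epochs
        bandit = UCB1(len(portfolio))            # bandit rebooted at each epoch
        for _ in range(2 ** epoch):              # assumed doubling epoch length
            arm = bandit.select()
            trajectory, ret = run_trajectory(portfolio[arm])  # frozen policy
            data.append(trajectory)
            bandit.update(arm, ret)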
... Research on negotiation dialogue is experiencing a growth of interest. At first, Reinforcement Learning [1], the most popular framework for dialogue management in spoken dialogue systems [2][3][4], was applied to negotiation with mixed results [5,6], because the non-stationary policy of the opposing player prevents those algorithms from converging consistently. Then, Multi-Agent Reinforcement Learning [7] was applied, but still with convergence difficulties [8]. ...
Chapter
This article presents the design of a generic negotiation dialogue game between two or more players. The goal is to reach an agreement, each player having his own preferences over a shared set of options. Several simulated users have been implemented. An MDP policy has been optimised individually with Fitted-Q Iteration for several user instances. Then, the learnt policies have been cross-evaluated with other users. Results show a strong disparity in inter-user performance. This illustrates the importance of user adaptation in negotiation-based dialogue systems.
... Research on negotiation dialogue is experiencing a growth of interest. At first, Reinforcement Learning (Sutton and Barto, 1998), the most popular framework for dialogue management in dialogue systems (Levin and Pieraccini, 1997; Laroche et al., 2009; Lemon and Pietquin, 2012), was applied to negotiation with mixed results (English and Heeman, 2005; Georgila and Traum, 2011; Lewis et al., 2017), because the non-stationary policy of the opposing player prevents those algorithms from converging consistently. ...
Article
Full-text available
This position paper formalises an abstract model for complex negotiation dialogue. This model is to be used for benchmarking optimisation algorithms ranging from Reinforcement Learning to Stochastic Games, through Transfer Learning, One-Shot Learning and others.
... 7. Factorisation of actions [Laroche et al., 2009]: each advisor is responsible for a specific action dimension: for instance, a robot might control its legs and its arms with different advisors. ...
Article
Full-text available
This article deals with a novel branch of Separation of Concerns, called Multi-Advisor Reinforcement Learning (MAd-RL), where a single-agent RL problem is distributed to $n$ learners, called advisors. Each advisor tries to solve the problem with a different focus. Their advice is then communicated to an aggregator, which is in control of the system. For the local training, three off-policy bootstrapping methods are proposed and analysed: local-max bootstraps with the local greedy action, rand-policy bootstraps with respect to the random policy, and agg-policy bootstraps with respect to the aggregator's greedy policy. MAd-RL is positioned as a generalisation of Reinforcement Learning with Ensemble methods. An experiment is held on a simplified version of the Ms. Pac-Man Atari game. The results confirm the theoretical relative strengths and weaknesses of each method.
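A toy Python sketch of the advisor/aggregator split and of the local-max bootstrap named in this abstract may help fix ideas; the tabular setting, the shapes, the learning rate and the fact that the aggregator simply sums local Q-values are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

# Toy sketch of a MAd-RL-style decomposition: each advisor learns a local
# Q-table over the shared action set, and the aggregator sums their values
# and acts greedily. local_max_update shows the local-max bootstrap, where
# the target uses the advisor's own greedy value.

N_STATES, N_ACTIONS, N_ADVISORS = 10, 4, 3
GAMMA, ALPHA = 0.9, 0.1
q_local = np.zeros((N_ADVISORS, N_STATES, N_ACTIONS))

def aggregator_action(state):
    """Greedy action on the sum of the advisors' local Q-values."""
    return int(np.argmax(q_local[:, state, :].sum(axis=0)))

def local_max_update(advisor, s, a, local_reward, s_next):
    """local-max bootstrap: the target uses the advisor's own greedy value."""
    target = local_reward + GAMMA * q_local[advisor, s_next, :].max()
    q_local[advisor, s, a] += ALPHA * (target - q_local[advisor, s, a])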
... The first action consists of informing the user about the price of the Regal Resort and the second action consists of proposing another option, Hotel Globetrotter. Performing more than one action per turn is a challenge when using reinforcement learning (Fatemi et al., 2016; Gašić et al., 2012; Pietquin et al., 2011) and, to our knowledge, this has only been done in a simulated setting (Laroche et al., 2009). ...
Article
This paper presents the Frames dataset (Frames is available at http://datasets.maluuba.com/Frames), a corpus of 1369 human-human dialogues with an average of 15 turns per dialogue. We developed this dataset to study the role of memory in goal-oriented dialogue systems. Based on Frames, we introduce a task called frame tracking, which extends state tracking to a setting where several states are tracked simultaneously. We propose a baseline model for this task. We show that Frames can also be used to study memory in dialogue management and information presentation through natural language generation.
... The first action consists of informing the user about the price of the Regal Resort and the second action consists of proposing another option, Hotel Globetrotter. Performing more than one action per turn is a challenge when using reinforcement learning (Pietquin et al., 2011; Gašić et al., 2012; Fatemi et al., 2016) and, to our knowledge, has only been tackled in a simulated setting (Laroche et al., 2009). ...
... Indeed, clarification plays a major role in multi-domain dialogue due to domain overlaps, disambiguation of the user's intent or wrong assumptions about domains. Reinforcement Learning (RL) [10] based systems [11, 12, 13] are also considered, where dialogue is seen as an optimization problem leading to optimal strategies. Unlike local classification, RL offers the advantage of considering the dialogue as a sequence of action selections to achieve a long-term goal, and can therefore integrate clarification steps for the sake of the whole dialogue's success. ...
Conference Paper
This paper proposes a novel approach to defining and simulating a new generation of virtual personal assistants as multi-application multi-domain distributed dialogue systems. The first contribution is the assistant architecture, composed of independent third-party applications handled by a Dispatcher. In this view, applications are black boxes responding with a self-scored answer to user requests. Next, the Dispatcher distributes the current request to the most relevant application, based on these scores and the context (history of interaction, etc.), and conveys its answer to the user. To address variations in the user-defined portfolio of applications, the second contribution, a stochastic model, automates the online optimisation of the Dispatcher's behaviour. To evaluate the learnability of the Dispatcher's policy, several parametrisations of the user and application simulators are enabled, in such a way that they cover variations of realistic situations. Results confirm, in all considered configurations of interest, that reinforcement learning can learn adapted strategies. Index Terms: multi-application spoken dialogue systems, multi-domain, dialogue strategy, reinforcement learning.
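The Dispatcher architecture described in this abstract can be pictured with a short, hypothetical Python sketch: each application is a black box returning a (answer, self-score) pair, and the Dispatcher routes the turn to the best-scoring application. The EchoApp class, the keyword scoring and the context bonus are illustrative stand-ins for the paper's simulators and learned routing policy.

# Hypothetical sketch of the Dispatcher architecture: applications are black
# boxes returning (answer, self_score); the Dispatcher combines the score
# with a simple context signal (here, a bonus for the previously selected
# application) and conveys the winning answer to the user.

class EchoApp:
    """Stand-in third-party application; real ones are opaque black boxes."""
    def __init__(self, name, keyword):
        self.name, self.keyword = name, keyword

    def answer(self, request):
        score = 1.0 if self.keyword in request.lower() else 0.1
        return f"[{self.name}] handling: {request}", score

class Dispatcher:
    def __init__(self, apps):
        self.apps = apps
        self.history = []                     # interaction context

    def dispatch(self, request):
        scored = []
        for app in self.apps:
            answer, score = app.answer(request)
            bonus = 0.2 if self.history and self.history[-1] is app else 0.0
            scored.append((score + bonus, app, answer))
        _, best_app, best_answer = max(scored, key=lambda x: x[0])
        self.history.append(best_app)
        return best_answer

dispatcher = Dispatcher([EchoApp("weather", "rain"), EchoApp("music", "play")])
print(dispatcher.dispatch("Will it rain tomorrow?"))

In the paper, the routing decision is optimised online with reinforcement learning rather than the fixed bonus used here.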
... System 3 was also Reinforcement Learning-based (RL). In this case, RL was cast as a Module-Variable Decision Process (MVDP, [Laroche et al., 2009]). This system was designed to assess the influence of Text-To-Speech (TTS) prosody on users' behaviour. ...
... RL was integrated into this automaton thanks to the MVDP hybrid framework [Laroche et al., 2009]. MVDP makes it possible to integrate one or several points of choice in a dialogue phase, each point of choice having to choose between different actions according to its current internal context. ...
Conference Paper
Full-text available
This paper describes a French Spoken Dialogue System (SDS) named NASTIA (Negotiating Appointment SeTting InterfAce). Appointment scheduling is a hybrid task halfway between slot-filling and negotiation. NASTIA implements three different negotiation strategies. These strategies were tested on 1734 dialogues with 385 users who interacted at most 5 times with the SDS and gave a rating on a scale of 1 to 10 for each dialogue. Previous appointment scheduling systems were evaluated with the same experimental protocol. NASTIA differs from these systems in that it can adapt its strategy during the dialogue. The highest system task completion rate with these systems was 81%, whereas NASTIA had an 88% average and its best performing strategy even reached 92%. This strategy also significantly outperformed previous systems in terms of overall user rating, with an average of 8.28 against 7.40. The experiment also made it possible to highlight global recommendations for building spoken dialogue systems.
... Reinforcement Learning (RL) [Sutton and Barto, 1998] has been a popular technique to optimise the behaviour of SDS [Levin et al., 1997, Williams and Young, 2007, Pietquin and Dutoit, 2006]. NASTIA's dialogue manager is an RL agent implemented as a Module-Variable Decision Process (MVDP) [Laroche et al., 2009]. Dialogue is modelled as a sequence of states and actions among which the system has to choose. ...
Conference Paper
Full-text available
This paper describes the DINASTI (DIalogues with a Negotiating Appointment SeTting Interface) corpus, which is composed of 1734 dialogues with the French spoken dialogue system NASTIA (Negotiating Appointment SeTting InterfAce). NASTIA is a reinforcement learning-based system. The DINASTI corpus was collected while the system was following a uniform policy. Each entry of the corpus is a system-user exchange annotated with 120 automatically computable features. The corpus contains a total of 21587 entries, with 385 testers. Each tester performed at most five scenario-based interactions with NASTIA. The dialogues last an average of 10.82 dialogue turns, with 4.45 reinforcement learning decisions. The testers filled in an evaluation questionnaire after each dialogue. The questionnaire includes three questions to measure task completion. In addition, it comprises 7 Likert-scaled items evaluating several aspects of the interaction, a numerical overall evaluation on a scale of 1 to 10, and a free text entry. Answers to this questionnaire are provided with DINASTI. This corpus is meant for research on reinforcement learning modelling for dialogue management.