Adaptable Conversational Machines

AI Magazine ◽  
2020 ◽  
Vol 41 (3) ◽  
pp. 28-44
Author(s):  
Nurul Lubis ◽  
Michael Heck ◽  
Carel Van Niekerk ◽  
Milica Gasic

In recent years we have witnessed a surge in machine learning methods that provide machines with conversational abilities. Most notably, neural-network-based systems have set the state of the art for difficult tasks such as speech recognition, semantic understanding, dialogue management, language generation, and speech synthesis. Still, unlike for the ancient game of Go, for instance, we are far from achieving human-level performance in dialogue. The reasons for this are numerous. One property of human–human dialogue that stands out is the infinite number of ways of expressing oneself during a conversation, even when its topic is restricted. The typical solution to this problem has been scaling up the data; the most prominent mantra in speech and language technology has been "there is no data like more data." However, researchers are now focusing on building smarter algorithms: algorithms that can learn efficiently from just a few examples. This is an intrinsic property of human behavior: an average human sees during their lifetime only a fraction of the data that we nowadays present to machines, and can even have an intuition about a solution before ever experiencing an example of one. The human-inspired ability to adapt may be one of the keys to pushing dialogue systems toward human performance. This article reviews advances in dialogue systems research with a focus on adaptation methods for dialogue modeling, and ventures a glance at the future of research on adaptable conversational machines.

2012 ◽  
Vol 28 (1) ◽  
pp. 59-73 ◽  
Author(s):  
Olivier Pietquin ◽  
Helen Hastie

User simulation is an important research area in the field of spoken dialogue systems (SDSs) because collecting and annotating real human–machine interactions is often expensive and time-consuming. However, such data are generally required for designing, training, and assessing dialogue systems. User simulations are especially needed when using machine learning methods for optimizing dialogue management strategies, such as reinforcement learning, where the amount of data necessary for training is larger than existing corpora. The quality of the user simulation is therefore of crucial importance because it dramatically influences the results in terms of SDS performance analysis and the learnt strategy. Assessment of the quality of simulated dialogues and user simulation methods is an open issue and, although assessment metrics are required, there is no commonly adopted metric. In this paper, we give a survey of user simulation metrics in the literature, propose some extensions, and discuss these metrics in terms of a list of desired features.


AI Magazine ◽  
2021 ◽  
Vol 42 (3) ◽  
pp. 31-42
Author(s):  
Joseph Konstan ◽  
Loren Terveen

From the earliest days of the field, recommender systems research and practice has struggled to balance and integrate approaches that focus on recommendation as a machine learning or missing-value problem with ones that focus on machine learning as a discovery tool and perhaps persuasion platform. In this article, we review 25 years of recommender systems research from a human-centered perspective, looking at the interface and algorithm studies that advanced our understanding of how system designs can be tailored to users' objectives and needs. At the same time, we show how external factors, including commercialization and technology developments, have shaped research on human-centered recommender systems. We show how several unifying frameworks have helped developers and researchers alike incorporate thinking about user experience and human decision-making into their designs. We then review the challenges, and the opportunities, in today's recommenders, looking at how deep learning and optimization techniques can integrate with both interface designs and human performance statistics to improve recommender effectiveness and usefulness.


2020 ◽  
Vol 8 ◽  
pp. 281-295
Author(s):  
Qi Zhu ◽  
Kaili Huang ◽  
Zheng Zhang ◽  
Xiaoyan Zhu ◽  
Minlie Huang

To advance multi-domain (cross-domain) dialogue modeling as well as alleviate the shortage of Chinese task-oriented datasets, we propose CrossWOZ, the first large-scale Chinese Cross-Domain Wizard-of-Oz task-oriented dataset. It contains 6K dialogue sessions and 102K utterances for 5 domains: hotel, restaurant, attraction, metro, and taxi. Moreover, the corpus contains rich annotation of dialogue states and dialogue acts on both the user and system sides. About 60% of the dialogues have cross-domain user goals that introduce inter-domain dependencies and encourage natural transitions across domains in conversation. We also provide a user simulator and several benchmark models for pipelined task-oriented dialogue systems, which will help researchers compare and evaluate their models on this corpus. The large size and rich annotation of CrossWOZ make it suitable for investigating a variety of tasks in cross-domain dialogue modeling, such as dialogue state tracking, policy learning, and user simulation.
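The notion of a cross-domain user goal can be made concrete with a small sketch. The field names below ("goal", "domain", "messages") are illustrative of how such a corpus might be organised, not the dataset's actual schema.

```python
# Two toy dialogues in a CrossWOZ-style layout: each dialogue carries a user
# goal made up of sub-goals, one per domain, plus the exchanged messages.
dialogues = {
    "d1": {"goal": [{"domain": "hotel"}, {"domain": "taxi"}],
           "messages": [{"role": "usr", "content": "..."},
                        {"role": "sys", "content": "..."}]},
    "d2": {"goal": [{"domain": "restaurant"}],
           "messages": [{"role": "usr", "content": "..."}]},
}

def is_cross_domain(dialogue):
    """A user goal is cross-domain if its sub-goals span more than one domain."""
    return len({g["domain"] for g in dialogue["goal"]}) > 1

cross = sum(is_cross_domain(d) for d in dialogues.values())
ratio = cross / len(dialogues)
```

In this toy corpus, one of the two dialogues is cross-domain; in CrossWOZ itself the reported proportion is about 60%.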


2020 ◽  
Vol 34 (04) ◽  
pp. 3970-3979
Author(s):  
Sahil Garg ◽  
Irina Rish ◽  
Guillermo Cecchi ◽  
Palash Goyal ◽  
Sarik Ghazarian ◽  
...  

We propose a novel dialogue modeling framework, the first nonparametric, kernel-function-based approach to dialogue modeling, which learns hashcodes as text representations; unlike traditional deep learning models, it handles relatively small datasets well while also scaling to large ones. We also derive a novel lower bound on mutual information, used as a model-selection criterion favoring representations with better alignment between the utterances of participants in a collaborative dialogue setting, as well as higher predictability of the generated responses. As demonstrated on three real-life datasets, most prominently psychotherapy sessions, the proposed approach significantly outperforms several state-of-the-art neural-network-based dialogue systems, both in computational efficiency, reducing training time from days or weeks to hours, and in response quality, achieving an order-of-magnitude improvement over competitors in the frequency of being chosen as the best model by human evaluators.


Author(s):  
Christopher W. Myers

An important goal of training systems research is the ability to train teams to criterion while simultaneously minimizing training resources. One promising approach is to develop synthetic agents that act as full-fledged members of a team. Five experts will highlight successes, failures, and continuing challenges associated with the development, validation, and deployment of synthetic agents as full-fledged teammates. The panel will provide an intimate look “under the hood” of synthetic agents, describe what each has found useful for developing a synthetic teammate that “plays well with others,” and discuss the key roadblocks that must be overcome for the further inclusion of synthetic teammates within human training systems. The lessons learned from these panelists will be of value to those interested in cognitive engineering and human performance modeling.


Author(s):  
Raivis Skadiņš ◽  
Mārcis Pinnis ◽  
Artūrs Vasiļevskis ◽  
Andrejs Vasiļjevs ◽  
Valters Šics ◽  
...  

The paper describes the Latvian e-government language technology platform HUGO.LV. It provides an instant translation of text snippets, formatting-rich documents and websites, an online computer-assisted translation tool with a built-in translation memory, a website translation widget, speech recognition and speech synthesis services, a terminology management and publishing portal, language data storage, analytics, and data sharing functionality. The paper describes the motivation for the creation of the platform, its main components, architecture, usage statistics, conclusions, and future developments. Evaluation results of language technology tools integrated in the platform are provided.


Author(s):  
Roger K. Moore

The past twenty-five years have witnessed a steady improvement in the capabilities of spoken language technology, first in the research laboratory and more recently in the commercial marketplace. Progress has reached a point where automatic speech recognition software for dictating documents onto a computer is available as an inexpensive consumer product in most computer stores, text-to-speech synthesis can be heard in public places giving automated voice announcements, and interactive voice response is becoming a familiar option for people paying bills or booking cinema tickets over the telephone. This article looks at the main computational approaches employed in contemporary spoken language processing. It discusses acoustic modelling, language modelling, pronunciation modelling, and noise modelling. The article also considers future prospects in the context of the obvious shortcomings of current technology, and briefly addresses the potential for achieving a unified approach to human and machine spoken language processing.


2018 ◽  
Vol 28 (3) ◽  
pp. 1133-1138
Author(s):  
Lindita Ademi ◽  
Valbon Ademi

Developing a TTS (text-to-speech) system is a very active field of research. As human–computer interfaces (HCI) come of age, the need for a more ergonomic and natural interface than the current one (keyboard, mouse, etc.) is constantly felt. Speaking of natural interfaces, what comes to mind is sound (speech) and sight (vision); these form the basis of many intelligent-systems research areas such as robotics. Moreover, speech can also serve as an excellent interface for visually impaired people or people with motor neuron disorders. In this paper we attempt to develop a TTS system for the Albanian language. Many commercial systems are available for foreign languages (mostly English), but there is yet to be a competitive system available for Albanian. Although building a very high-quality, unlimited-vocabulary TTS system is still a difficult task with many open research questions, we believe that building reasonable-quality voices for many tasks can serve our needs. Here we have worked with standard Albanian, the most commonly spoken variety. We hope to extend the system easily to other languages, since there are many underlying similarities between languages. Because Albanian is highly phonetic, its letter-to-sound rules are simple. We used standard concatenative synthesis. The main problem we faced was making the synthesized speech sound natural. We investigated the reasons for the mechanical-sounding speech and developed different synthesis models to overcome some of those problems. Moreover, we implemented some standard and also novel intonation and duration modification algorithms, which can be incorporated into the TTS at a later stage. Our main achievement was reasonably legible speech with an unlimited vocabulary. The paper presents a brief overview of the main text-to-speech synthesis problem and its subproblems, and the initial work done in building a TTS for Albanian.
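The simple letter-to-sound rules mentioned above can be sketched as a greedy longest-match segmentation of a word into phoneme-sized units. The digraph list below is a simplified subset of Albanian spelling, and the resulting units are ad hoc labels rather than a standard phone set.

```python
# Common Albanian digraphs, each of which spells a single sound.
DIGRAPHS = {"sh", "dh", "th", "xh", "zh", "gj", "nj", "ll", "rr"}

def letters_to_sounds(word):
    """Greedy longest-match segmentation into phoneme-sized units."""
    units, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:   # digraph spells one sound
            units.append(pair)
            i += 2
        else:                  # single letter spells one sound
            units.append(word[i])
            i += 1
    return units

# "shqip" ("Albanian") segments as sh-q-i-p
```

In a concatenative synthesizer, each such unit would index a recorded acoustic segment (for example a diphone), and the segments are joined to produce the output waveform.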

