Spoken dialogue systems have been successfully employed for machine control and information retrieval. Future advances could extend the abilities of robots to taking part in human-level dialogue.
A recent paper on arXiv.org introduces ERICA, an autonomous android that behaves and interacts like a human in its facial appearance and expression, gaze and gesture, and spoken dialogue.
The robot is designed to perform four social interaction tasks: attentive listening, conducting a job interview, speed dating, and acting as a lab guide. ERICA can flexibly take its turn and use laughs, head nods, and eye contact to engage with a human. Experiments with 40 senior participants showed that they could converse with ERICA for 5-7 minutes without a conversation breakdown. The robot also outperformed its baselines in the job interview task. In the future, robots like ERICA could replace humans in some social roles.
Following the success of spoken dialogue systems (SDS) in smartphone assistants and smart speakers, a number of communicative robots have been developed and commercialized. Compared with conventional SDSs designed as a human-machine interface, interaction with robots is expected to be closer to talking to a human because of the anthropomorphism and physical presence. The goal or task of dialogue may not be information retrieval, but the conversation itself. In order to realize human-level “long and deep” conversation, we have developed an intelligent conversational android ERICA. We set up several social interaction tasks for ERICA, including attentive listening, job interview, and speed dating. To allow for spontaneous, incremental multiple utterances, a robust turn-taking model is implemented based on TRP (transition-relevance place) prediction, and a variety of backchannels are generated based on time frame-wise prediction instead of IPU-based prediction. We have realized an open-domain attentive listening system with partial repeats and elaborating questions on focus words as well as assessment responses. It has been evaluated with 40 senior people, who engaged in conversations of 5-7 minutes without a conversation breakdown. It was also compared against the WOZ setting. We have also realized a job interview system with a set of base questions followed by dynamic generation of elaborating questions. It has also been evaluated with student subjects, showing promising results.
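The attentive listening responses named in the abstract (partial repeats, elaborating questions on focus words, and assessment responses) can be illustrated with a toy sketch. This is not ERICA's implementation. The word lists, the rule for picking a focus word, the response templates, and the function names `tokenize`, `find_focus_word`, and `listener_response` are all assumptions made for illustration only.

```python
import re

# Hypothetical word lists, chosen only for this illustration.
STOPWORDS = {
    "i", "a", "an", "the", "to", "in", "on", "at", "of", "and", "my", "we",
    "was", "were", "is", "are", "it", "with", "some", "so", "very", "really",
}
POSITIVE = {"fun", "great", "wonderful", "happy", "nice"}
NEGATIVE = {"sad", "tired", "terrible", "hard", "sick"}


def tokenize(utterance):
    """Lowercase the utterance and split it into simple word tokens."""
    return re.findall(r"[a-z']+", utterance.lower())


def find_focus_word(tokens):
    """Assumed heuristic: the last token that is neither a stopword
    nor a sentiment word is treated as the focus word."""
    for tok in reversed(tokens):
        if tok not in STOPWORDS and tok not in POSITIVE and tok not in NEGATIVE:
            return tok
    return None


def listener_response(utterance, turn_index=0):
    """Choose one listener response for a user utterance.

    Order of preference in this sketch (an assumption, not the paper's):
    assessment response, then a response on the focus word, then a
    generic backchannel.
    """
    tokens = tokenize(utterance)

    # Assessment response when a sentiment word is present.
    if any(t in POSITIVE for t in tokens):
        return "That sounds nice."
    if any(t in NEGATIVE for t in tokens):
        return "That sounds tough."

    focus = find_focus_word(tokens)
    if focus is None:
        # Nothing to repeat or elaborate on: fall back to a backchannel.
        return "I see."

    # Alternate between a partial repeat and an elaborating question.
    if turn_index % 2 == 0:
        return f"{focus.capitalize()}?"
    return f"What kind of {focus}?"
```

For example, `listener_response("I baked some bread", 0)` yields a partial repeat of the focus word, while the same utterance with `turn_index=1` yields an elaborating question about it. A real system would select the focus word and the response type with trained models rather than fixed lists.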