Project Details
Abstract
Existing chatbots are text-based conversational agents built on predefined response rules or trained on large dialogue datasets. They typically communicate through a text or speech interface and use a sequence-to-sequence approach (e.g., question answering) to generate responses to user dialogues. However, they do not accept multi-modal data, e.g., text, image, audio, and video, together when interpreting the meaning of a conversation. To the best of our knowledge, there is no prior work on building a chatbot that can understand user dialogues composed of different data formats. In addition, existing chatbots lack a reasoning (thinking) function that correlates a response with the user's previous conversations. Based on our study, we have found no research on chatbots that combine a reasoning function with user dialogues in different data formats.

Hence, we propose a two-year project to investigate the design and implementation of an intelligent general-purpose chatbot named "Aaron" that can understand user dialogues in different data formats and has a reasoning capability. The project focuses on two problems. The first is how to understand user dialogues in different data formats. For this, we propose a hierarchical multi-modal data fusion approach based on the bidirectional long short-term memory (Bi-LSTM) network to extract and combine the features of the different data types present in a user dialogue. The second problem is how to design a chatbot with a reasoning function. For this, we present two methods. In the first, we summarize the user's previous conversations with a Bi-LSTM and store the summaries in an external memory. In the second, we use a Bi-LSTM to correlate the fused input features of the current user dialogue with the stored summaries of the user's previous conversations.

To train Aaron, we will use the public Stanford Question Answering Dataset (SQuAD), the DeepMind question-answering corpus, and the Amazon question/answer dataset as text datasets; the TIMIT acoustic-phonetic continuous speech dataset and the LibriSpeech automatic speech recognition dataset as audio datasets; the Common Objects in Context (COCO) dataset and the Flickr 30K dataset as image datasets; and the Kinetics dataset, the YouTube-8M dataset, and the Moments in Time dataset as video datasets.

We will implement Aaron in the Python programming language. Furthermore, we will develop an iOS application and an Android application as the graphical user interface for Aaron, and after development we will release them on the App Store and the Google Play Store for public use.
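The abstract describes the fusion and reasoning components only at a high level. The following is a minimal sketch of how they might fit together, assuming pre-extracted per-modality feature sequences and a single-vector conversation summary as the external memory. All module names (BiLSTMEncoder, AaronSketch), feature dimensions, and pooling choices are illustrative assumptions, not the project's actual implementation.

```python
# A minimal sketch of the hierarchical multi-modal Bi-LSTM fusion and
# memory-based reasoning described in the abstract. Assumes pre-extracted
# feature sequences per modality; names and sizes are illustrative only.
import torch
import torch.nn as nn


class BiLSTMEncoder(nn.Module):
    """Encode one modality's feature sequence with a Bi-LSTM and mean-pool."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, x):                       # x: (batch, seq_len, input_dim)
        out, _ = self.lstm(x)                   # (batch, seq_len, 2*hidden_dim)
        return out.mean(dim=1)                  # (batch, 2*hidden_dim)


class AaronSketch(nn.Module):
    """Hierarchical fusion of per-modality encodings plus a memory summary."""

    def __init__(self, dims: dict, hidden_dim: int = 128):
        super().__init__()
        # First level of the hierarchy: one Bi-LSTM encoder per modality.
        self.encoders = nn.ModuleDict(
            {name: BiLSTMEncoder(d, hidden_dim) for name, d in dims.items()})
        # Second level: a Bi-LSTM fuses the stacked modality vectors.
        self.fusion = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Summarizer compresses the fused turn into the external-memory vector.
        self.summarizer = nn.Linear(2 * hidden_dim, 2 * hidden_dim)
        # Reasoner correlates the current fused features with the stored summary.
        self.reasoner = nn.LSTM(4 * hidden_dim, hidden_dim, batch_first=True,
                                bidirectional=True)

    def forward(self, inputs: dict, memory: torch.Tensor):
        # Encode each modality present in the current user dialogue.
        encoded = [self.encoders[name](x) for name, x in inputs.items()]
        stacked = torch.stack(encoded, dim=1)   # (batch, n_modalities, 2H)
        # Fuse the modality vectors into one dialogue representation.
        fused, _ = self.fusion(stacked)
        fused = fused.mean(dim=1)               # (batch, 2H)
        # Reasoning: concatenate with the memory summary and re-encode.
        joint = torch.cat([fused, memory], dim=-1).unsqueeze(1)
        reasoned, _ = self.reasoner(joint)
        response_features = reasoned.squeeze(1)
        # Update the external-memory summary with the current turn.
        new_memory = torch.tanh(self.summarizer(fused))
        return response_features, new_memory


if __name__ == "__main__":
    dims = {"text": 300, "audio": 40, "image": 2048, "video": 1024}
    model = AaronSketch(dims, hidden_dim=128)
    batch = {name: torch.randn(2, 10, d) for name, d in dims.items()}
    memory = torch.zeros(2, 256)                # empty memory for a new user
    feats, memory = model(batch, memory)
    print(feats.shape, memory.shape)            # both torch.Size([2, 256])
```

In this sketch the "external memory" is a single running summary vector per user; the project's second method (correlating current fused features with stored summaries) is approximated by concatenating the two before the reasoning Bi-LSTM.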
Project IDs
Project ID: PB10901-2437
External Project ID: MOST108-2221-E182-042-MY2
Status | Finished |
---|---|
Effective start/end date | 01/08/20 → 31/07/21 |