Cognitive Attention Network (CAN) for Text and Image Multimodal Visual Dialog Systems

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Visual question answering and visual dialog systems are emerging research areas in natural language processing that exploit image and text modalities to convey an understanding of the contexts and attributes in a conversation, as humans do on online chat platforms. These multimodal dialog techniques are enabling the extended use of chatbots in many open and vertical domains. In this paper, we propose the cognitive attention network (CAN), a visual dialog system capable of answering multiple user questions about an image, identifying similar images from past conversations, and referring to them during an ongoing question-answering (Q&A) chat. Our model comprises Faster R-CNN, pre-trained BERT, late data fusion, and a memory network that serves as a knowledge base for the temporary storage of previous visio-textual dialog data representations. Trained on the VisDial v1.0 benchmark dataset, our model achieves competitive results that outperform some existing state-of-the-art models.
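The abstract names late data fusion and a memory of past visio-textual representations but gives no implementation details. The following is only a minimal sketch of how those two ideas might fit together, not the authors' method: it assumes the image and text encoders (Faster R-CNN and BERT in the paper) have already produced fixed-length feature vectors, fuses them by concatenation, and retrieves the most similar stored dialog turn by cosine similarity. The names `late_fusion`, `DialogMemory`, and the toy feature values are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def late_fusion(image_feat, text_feat):
    """Late fusion (assumed form): each modality is encoded separately,
    then the normalized feature vectors are concatenated."""
    return l2_normalize(image_feat) + l2_normalize(text_feat)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


class DialogMemory:
    """Hypothetical temporary store of fused representations from past turns."""

    def __init__(self):
        self.entries = []  # list of (key, fused_vector) pairs

    def write(self, key, fused):
        self.entries.append((key, fused))

    def most_similar(self, query):
        """Return (key, similarity) of the closest stored entry, or None if empty."""
        if not self.entries:
            return None
        scored = ((key, cosine(query, vec)) for key, vec in self.entries)
        return max(scored, key=lambda pair: pair[1])


# Toy vectors standing in for encoder outputs (assumed, not real features).
memory = DialogMemory()
memory.write("turn-1", late_fusion([1.0, 0.0], [0.0, 1.0]))
memory.write("turn-2", late_fusion([0.0, 1.0], [1.0, 0.0]))

query = late_fusion([0.9, 0.1], [0.1, 0.9])
best_key, best_score = memory.most_similar(query)
print(best_key)
```

Concatenation and nearest-neighbor lookup are chosen here only because they are the simplest forms of fusion and memory retrieval; the paper's attention-based mechanism may differ substantially.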

Original language: English
Title of host publication: 2020 6th International Conference on Applied System Innovation, ICASI 2020
Editors: Shoou-Jinn Chang, Sheng-Joue Young, Artde Donald Kin-Tak Lam, Liang-Wen Ji, Stephen D. Prior
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 37-41
Number of pages: 5
ISBN (Electronic): 9781728175362
DOIs
State: Published - 05 11 2020
Event: 6th International Conference on Applied System Innovation, ICASI 2020 - Zhiben, Taitung, Taiwan
Duration: 05 11 2020 → 08 11 2020

Publication series

Name: 2020 6th International Conference on Applied System Innovation, ICASI 2020

Conference

Conference: 6th International Conference on Applied System Innovation, ICASI 2020
Country/Territory: Taiwan
City: Zhiben, Taitung
Period: 05/11/20 → 08/11/20

Bibliographical note

Publisher Copyright:
© 2020 IEEE.
