Cognitive Attention Network (CAN) for Text and Image Multimodal Visual Dialog Systems

Abstract
Visual question answering and visual dialog systems are emerging research areas in natural language processing that exploit image and text modalities to convey an understanding of the context and attributes of a conversation, as humans do on online chat platforms. These multimodal dialog techniques enable the extended use of chatbots in many open and vertical domains. In this paper, we propose the Cognitive Attention Network (CAN), a visual dialog system capable of answering multiple user questions about an image; it can also identify similar images from past conversations and refer to them during an ongoing question-answering (Q&A) chat. Our model comprises Faster R-CNN, pre-trained BERT, late data fusion, and a memory network serving as a knowledge base for temporary storage of previous visio-textual dialog representations. Trained on the VisDial v1.0 benchmark dataset, we achieve a competitive result that outperforms some existing state-of-the-art models.
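The pipeline named in the abstract (separate visual and textual encoders, late data fusion, and a memory of past visio-textual representations) can be sketched roughly as follows. This is an illustrative stand-in, not the paper's implementation: random vectors replace Faster R-CNN region features and BERT embeddings, and cosine retrieval over a plain list replaces the memory network.

```python
import numpy as np

# Illustrative sketch only: CAN uses Faster R-CNN visual features and
# pre-trained BERT text embeddings; random vectors stand in for both here.
rng = np.random.default_rng(0)

def late_fuse(visual_feat, text_feat):
    """Late data fusion: encode each modality separately, then combine."""
    return np.concatenate([visual_feat, text_feat])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Memory stand-in: fused visio-textual representations from previous
# dialog turns, searched to find similar past images/conversations.
memory = []

def answer_turn(visual_feat, question_feat):
    fused = late_fuse(visual_feat, question_feat)
    # Retrieve the most similar past representation, if any exists.
    recalled = max(memory, key=lambda m: cosine(m, fused), default=None)
    memory.append(fused)
    return fused, recalled

v = rng.standard_normal(2048)  # stand-in for pooled visual features
q = rng.standard_normal(768)   # stand-in for a BERT sentence embedding
fused, recalled = answer_turn(v, q)
print(fused.shape)  # (2816,)
```

The concatenation-then-store pattern is what distinguishes late fusion from early fusion, where raw modalities would be mixed before encoding; the dimensions (2048, 768) are assumptions chosen only to mirror typical Faster R-CNN and BERT feature sizes.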
| Original language | English |
|---|---|
| Title of host publication | 2020 6th International Conference on Applied System Innovation, ICASI 2020 |
| Editors | Shoou-Jinn Chang, Sheng-Joue Young, Artde Donald Kin-Tak Lam, Liang-Wen Ji, Stephen D. Prior |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 37-41 |
| Number of pages | 5 |
| ISBN (Electronic) | 9781728175362 |
| DOIs | |
| State | Published - 5 Nov 2020 |
| Event | 6th International Conference on Applied System Innovation, ICASI 2020 - Zhiben, Taitung, Taiwan |
| Duration | 5 Nov 2020 → 8 Nov 2020 |
Publication series
| Name | 2020 6th International Conference on Applied System Innovation, ICASI 2020 |
|---|
Conference
| Conference | 6th International Conference on Applied System Innovation, ICASI 2020 |
|---|---|
| Country/Territory | Taiwan |
| City | Zhiben, Taitung |
| Period | 05/11/20 → 08/11/20 |
Bibliographical note
Publisher Copyright: © 2020 IEEE.
Fingerprint
Research topics of 'Cognitive Attention Network (CAN) for Text and Image Multimodal Visual Dialog Systems'.