Video reasoning for conflict events through feature extraction

Sheng Tzong Cheng, Chih Wei Hsu, Gwo Jiun Horng*, Ci Ruei Jiang

*Corresponding author for this work

Research output: Contribution to journal › Journal Article › peer-review

2 Scopus citations

Abstract

The rapid growth of multimedia data and the improvement of deep learning technology have made it possible to train high-accuracy models for various fields. Video tools such as video classification, temporal action detection, and video summarization are now available for understanding videos. In daily life, many social incidents begin with a small conflict event. If conflicts and the dangers that follow them can be recognized from video, social incidents can be prevented at an early stage. This research presents a video and audio reasoning network that infers possible conflict events from video and audio features. To make the model more generalizable to other tasks, we also add a predictive network that estimates the risk posed by conflict events. We use multitask learning so that the learned video and audio features transfer better to similar tasks, and we propose several methods for integrating video and audio features that improve the reasoning performance of the model. The proposed model, the Video and Audio Reasoning Network (VARN), is more accurate than competing models; compared with RandomNet, it achieves 2.9 times greater accuracy.
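The abstract describes two ideas that can be illustrated concretely: fusing video and audio features into a joint representation, and training with a multitask objective (conflict-event reasoning plus a risk-prediction head). The sketch below shows one plausible way to wire this up in PyTorch; the feature dimensions, concatenation-based fusion, head names, and loss weighting are assumptions for illustration only, not the published VARN architecture.

```python
# Hypothetical sketch of the fusion-plus-multitask idea from the abstract.
# All layer sizes, module names, and the concatenation-based fusion are
# illustrative assumptions, not the authors' actual VARN design.
import torch
import torch.nn as nn

class FusionMultitaskNet(nn.Module):
    def __init__(self, video_dim=2048, audio_dim=128, hidden_dim=512,
                 num_event_classes=10):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        # Simple fusion: concatenate the projected features and mix them.
        self.fusion = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # Task 1: reasoning head that classifies the conflict event.
        self.event_head = nn.Linear(hidden_dim, num_event_classes)
        # Task 2: predictive head that scores the risk of escalation.
        self.risk_head = nn.Linear(hidden_dim, 1)

    def forward(self, video_feat, audio_feat):
        v = torch.relu(self.video_proj(video_feat))
        a = torch.relu(self.audio_proj(audio_feat))
        fused = self.fusion(torch.cat([v, a], dim=-1))
        return self.event_head(fused), torch.sigmoid(self.risk_head(fused))

# Joint multitask loss over both heads, weighted by a hypothetical 0.5 factor.
model = FusionMultitaskNet()
video_feat = torch.randn(4, 2048)   # e.g., pooled per-clip CNN features
audio_feat = torch.randn(4, 128)    # e.g., pooled audio embeddings
event_logits, risk = model(video_feat, audio_feat)
event_labels = torch.randint(0, 10, (4,))
risk_labels = torch.rand(4, 1)
loss = nn.CrossEntropyLoss()(event_logits, event_labels) \
     + 0.5 * nn.functional.binary_cross_entropy(risk, risk_labels)
loss.backward()
```

Training both heads against a shared fused representation is the standard way multitask learning encourages features that generalize beyond a single task, which is the motivation the abstract gives for the added predictive network.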

Original language: English
Pages (from-to): 6435-6455
Number of pages: 21
Journal: Journal of Supercomputing
Volume: 77
Issue number: 6
DOIs
State: Published - June 2021
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2021, Springer Science+Business Media, LLC, part of Springer Nature.

Keywords

  • Computer vision
  • Deep learning
  • Multitask learning
  • Video reasoning

