Aggregated Spatio-temporal MLP-Mixer for Violence Recognition in Video Clips

Yu Shian Shen, Jenhui Chen*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Existing violent-behavior datasets are limited in both quantity and quality because such data are difficult to collect. Although state-of-the-art Transformer models have shown their capability in behavior recognition, they are ill-suited to short-term behavior understanding (e.g., violent behavior recognition) because they need a large amount of data to reach their best performance. Recently, MLP-Mixer, a simple deep learning architecture built entirely from multilayer perceptrons (MLPs), was proposed as an alternative to the Transformer and has obtained competitive results on few-sample datasets. Motivated by spatio-temporal features on neurons, we propose the aggregated spatio-temporal MLP-Mixer (ASM), an MLP-Mixer-based approach trained on a dual-form dataset, to handle video understanding tasks. We show that ASM outperforms state-of-the-art Transformer models as well as some of the best-performing convolutional neural network (CNN) approaches on three public datasets: the Smart-City CCTV Violence Detection (SCVD) dataset, the Real-Life Violence Situations (RLVS) dataset, and the Hockey Fight dataset. The experimental results further support our approach to improving short-term behavior scene understanding.
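For readers unfamiliar with MLP-Mixer, the following is a minimal NumPy sketch of one standard Mixer layer: a token-mixing MLP followed by a channel-mixing MLP, each preceded by layer normalization and wrapped in a skip connection. It is not the authors' ASM. Flattening frames × patches into a single token axis, the layer sizes, and the helper names (`mixer_layer`, `init_params`, `layer_norm`, `gelu`, `mlp`) are illustrative assumptions, and the learned LayerNorm scale and shift are omitted for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each token's channel vector to zero mean, unit variance
    # (no learned scale/shift in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP applied along the last axis of x.
    return gelu(x @ w1 + b1) @ w2 + b2

def mixer_layer(x, params):
    """One Mixer layer on x of shape (tokens, channels)."""
    # Token mixing: transpose so the MLP acts across tokens, shared over channels.
    y = layer_norm(x).T                      # (channels, tokens)
    x = x + mlp(y, *params["token"]).T       # back to (tokens, channels)
    # Channel mixing: the MLP acts across channels, shared over tokens.
    x = x + mlp(layer_norm(x), *params["channel"])
    return x

def init_params(num_tokens, channels, token_hidden, channel_hidden, rng):
    # Hypothetical small random initialization for both MLPs.
    def dense(n_in, n_out):
        return rng.normal(0.0, 0.02, size=(n_in, n_out)), np.zeros(n_out)
    w1, b1 = dense(num_tokens, token_hidden)
    w2, b2 = dense(token_hidden, num_tokens)
    c1, d1 = dense(channels, channel_hidden)
    c2, d2 = dense(channel_hidden, channels)
    return {"token": (w1, b1, w2, b2), "channel": (c1, d1, c2, d2)}

# Usage: treat each (frame, patch) pair of a short clip as one token (assumption).
rng = np.random.default_rng(0)
frames, patches, channels = 8, 16, 32
tokens = rng.normal(size=(frames * patches, channels))
params = init_params(frames * patches, channels, 64, 128, rng)
out = mixer_layer(tokens, params)
print(out.shape)  # (128, 32)
```

Token mixing lets every channel exchange information across all spatio-temporal positions, while channel mixing acts on each position independently; this split is how an all-MLP model mixes information without self-attention.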

Original language: English
Title of host publication: Proceedings - 2023 6th International Symposium on Computer, Consumer and Control, IS3C 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 44-47
Number of pages: 4
ISBN (Electronic): 9798350301953
State: Published - 2023
Event: 6th International Symposium on Computer, Consumer and Control, IS3C 2023 - Taichung City, Taiwan
Duration: 30 Jun 2023 to 3 Jul 2023

Publication series

Name: Proceedings - 2023 6th International Symposium on Computer, Consumer and Control, IS3C 2023

Conference

Conference: 6th International Symposium on Computer, Consumer and Control, IS3C 2023
Country/Territory: Taiwan
City: Taichung City
Period: 30/06/23 to 03/07/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.
