Abstract
Existing violent behavior datasets are limited in both quantity and quality because such footage is difficult to collect. Although state-of-the-art Transformer models have shown their capability in behavior recognition, they are ill-suited to short-term behavior understanding tasks (e.g., violent behavior recognition) because they need a large amount of data to achieve their best performance. Recently, MLP-Mixer, a simple deep learning architecture built entirely from multilayer perceptrons (MLPs), was proposed as an alternative to the Transformer and obtains competitive results on few-sample datasets. Motivated by spatio-temporal features on neurons, we devise a dual-form dataset for training an MLP-Mixer-based model, called aggregated spatio-temporal MLP-Mixer (ASM), to handle video understanding tasks. We show that ASM outperforms state-of-the-art Transformer models as well as some of the best-performing convolutional neural network (CNN) approaches on three public datasets: the smart-city CCTV violence detection dataset (SCVD), the real-life violence situations (RLVS) dataset, and Hockey Fight. Experimental results further validate our idea for improving short-term behavior scene understanding.
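For readers unfamiliar with the all-MLP design the abstract refers to, the following is a minimal sketch of one generic MLP-Mixer block (token mixing followed by channel mixing, each with a residual connection). It is not the authors' ASM model: the sizes, the random weights, the bias-free MLPs, the layer normalization without learned parameters, and all function names are assumptions made purely for illustration.

```python
import math
import random

def gelu(v):
    # tanh approximation of the GELU activation
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))

def matmul(a, b):
    # plain matrix product of two lists of rows
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def layer_norm(x, eps=1e-6):
    # normalize each token (row) across its channels; no learned scale or shift
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def mlp(x, w1, w2):
    # two-layer perceptron without biases: x @ w1 -> GELU -> @ w2
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def mixer_block(x, tok_w1, tok_w2, ch_w1, ch_w2):
    # x has shape (tokens, channels)
    # token mixing: transpose so the MLP mixes information across tokens
    mixed = transpose(mlp(transpose(layer_norm(x)), tok_w1, tok_w2))
    x = add(x, mixed)
    # channel mixing: the MLP acts on each token's channel vector
    return add(x, mlp(layer_norm(x), ch_w1, ch_w2))

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

# hypothetical sizes: 4 tokens (e.g. patches), 6 channels, hidden width 8
rng = random.Random(0)
tokens, channels, hidden = 4, 6, 8
x = random_matrix(tokens, channels, rng, scale=1.0)
out = mixer_block(
    x,
    random_matrix(tokens, hidden, rng), random_matrix(hidden, tokens, rng),
    random_matrix(channels, hidden, rng), random_matrix(hidden, channels, rng),
)
```

The block preserves the input shape, so several such blocks can be stacked; how ASM aggregates spatial and temporal information on top of this is described in the paper itself, not here.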
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2023 6th International Symposium on Computer, Consumer and Control, IS3C 2023 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 44-47 |
| Number of pages | 4 |
| ISBN (Electronic) | 9798350301953 |
| DOIs | |
| State | Published - 2023 |
| Event | 6th International Symposium on Computer, Consumer and Control, IS3C 2023 - Taichung City, Taiwan. Duration: 30 Jun 2023 → 3 Jul 2023 |
Publication series
| Name | Proceedings - 2023 6th International Symposium on Computer, Consumer and Control, IS3C 2023 |
|---|---|
Conference
| Conference | 6th International Symposium on Computer, Consumer and Control, IS3C 2023 |
|---|---|
| Country/Territory | Taiwan |
| City | Taichung City |
| Period | 30/06/23 → 03/07/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 11 Sustainable Cities and Communities
Fingerprint
Dive into the research topics of 'Aggregated Spatio-temporal MLP-Mixer for Violence Recognition in Video Clips'. Together they form a unique fingerprint.