HIGHER-ORDER RECURRENT NETWORK WITH SPACE-TIME ATTENTION FOR VIDEO EARLY ACTION RECOGNITION

  • Tsung Ming Tai
  • Giuseppe Fiameni
  • Cheng Kuang Lee
  • Oswald Lanz

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

2 Scopus citations

Abstract

Endowing visual agents with predictive capability is a key step towards video intelligence at scale. Early action recognition aims to predict action labels before the complete video has been observed. Unlike standard action recognition, the model must forecast the future, or the effects of an action, from only the first few frames. Strong reasoning ability over the temporal dimension is therefore key to success. To this end, we propose a novel recurrent network with decomposed space-time attention and a higher-order design to capture the temporal dependencies associated with specific actions. Our method achieves state-of-the-art performance on the Something-Something and EPIC-Kitchens datasets under the early action recognition setting, showing evidence of predictive capability that we attribute to our higher-order recurrent design with space-time attention.

Original language: English
Title of host publication: 2022 IEEE International Conference on Image Processing, ICIP 2022 - Proceedings
Publisher: IEEE Computer Society
Pages: 1631-1635
Number of pages: 5
ISBN (Electronic): 9781665496209
State: Published - 2022
Externally published: Yes
Event: 29th IEEE International Conference on Image Processing, ICIP 2022 - Bordeaux, France
Duration: 16 Oct 2022 to 19 Oct 2022

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Conference

Conference: 29th IEEE International Conference on Image Processing, ICIP 2022
Country/Territory: France
City: Bordeaux
Period: 16/10/22 to 19/10/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Keywords

  • Video prediction
  • early action recognition
  • higher-order recurrent networks
  • space-time attention
