SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders

Sheng Wei Li*, Zi Xiang Wei, Wei Jie Chen, Yi Hsin Yu, Chih Yuan Yang, Jane Yung jen Hsu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Existing zero-shot skeleton-based action recognition methods utilize projection networks to learn a shared latent space of skeleton features and semantic embeddings. The inherent imbalance in action recognition datasets, characterized by variable skeleton sequences yet constant class labels, presents significant challenges for alignment. To address the imbalance, we propose SA-DVAE—Semantic Alignment via Disentangled Variational Autoencoders, a method that first adopts feature disentanglement to separate skeleton features into two independent parts—one is semantic-related and another is irrelevant—to better align skeleton and semantic features. We implement this idea via a pair of modality-specific variational autoencoders coupled with a total correction penalty. We conduct experiments on three benchmark datasets: NTU RGB+D, NTU RGB+D 120 and PKU-MMD, and our experimental results show that SA-DAVE produces improved performance over existing methods. The code is available at https://github.com/pha123661/SA-DVAE.

Original languageEnglish
Title of host publicationComputer Vision – ECCV 2024 - 18th European Conference, Proceedings
EditorsAleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
PublisherSpringer Science and Business Media Deutschland GmbH
Pages447-462
Number of pages16
ISBN (Print)9783031726392
DOIs
StatePublished - 2025
Event18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 09 202404 10 2024

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume15074 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference18th European Conference on Computer Vision, ECCV 2024
Country/TerritoryItaly
CityMilan
Period29/09/2404/10/24

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords

  • Feature Disentanglement
  • Skeleton-based Action Recognition
  • Zero-Shot and Generalized Zero-Shot Learning

Fingerprint

Dive into the research topics of 'SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders'. Together they form a unique fingerprint.

Cite this