Abstract
Precisely estimating head poses from RGB images is essential for many applications, such as monitoring vehicle drivers' status for driving safety and recognizing passengers' actions. Because of the COVID-19 pandemic, people have recently been required to wear masks in almost all public places, sometimes even inside vehicles, and head pose estimation becomes considerably more challenging for existing methods when the face is occluded. To tackle this issue, we propose a novel siamese network that integrates global-local attention mechanisms with data augmentation and a multi-task learning strategy. Specifically, we first apply data augmentation that synthesizes facial masks on human faces and add landmark prediction as an auxiliary task during training, which helps the model generalize and become more robust. Next, we design a global-local attention mechanism that learns relationships across entire feature maps and enhances critical spatial and channel information to obtain a better feature representation. Lastly, a feature interpolation regularization module uses pairs of feature embeddings from the siamese network to optimize the feature embedding. We evaluate the proposed method on the AFLW2000, BIWI, and MAFA datasets, and extensive experiments show that it achieves highly promising performance on these public datasets.
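The abstract does not specify how the feature interpolation regularization module works. Purely as an illustration, the sketch below assumes a mixup-style rule: two embeddings from the siamese branches are blended with a weight λ, and the pose head's prediction on the blended embedding is penalized (mean squared error) for deviating from the same blend of the two (yaw, pitch, roll) labels. This is an assumption, not the authors' formulation; `linear_head`, `interpolation_reg_loss`, `sample_lambda`, the Beta(α, α) sampling of λ, and the MSE penalty are all hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Head = Callable[[Sequence[float]], Vector]


def lerp(a: Sequence[float], b: Sequence[float], lam: float) -> Vector:
    """Element-wise linear interpolation: lam * a + (1 - lam) * b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def linear_head(weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Head:
    """Hypothetical linear regression head mapping an embedding to (yaw, pitch, roll)."""
    def head(emb: Sequence[float]) -> Vector:
        return [sum(w * e for w, e in zip(row, emb)) + b for row, b in zip(weights, bias)]
    return head


def sample_lambda(alpha: float = 0.4, rng: Optional[random.Random] = None) -> float:
    """Assumed mixup-style mixing weight drawn from Beta(alpha, alpha), in [0, 1]."""
    rng = rng or random.Random()
    return rng.betavariate(alpha, alpha)


def interpolation_reg_loss(
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    pose_a: Sequence[float],
    pose_b: Sequence[float],
    head: Head,
    lam: float,
) -> float:
    """Assumed regularizer: the prediction on the interpolated embedding should
    match the same interpolation of the two pose labels (mean squared error)."""
    pred = head(lerp(emb_a, emb_b, lam))
    target = lerp(pose_a, pose_b, lam)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    head = linear_head(identity, [0.0, 0.0, 0.0])
    emb_a, emb_b = [10.0, -6.0, 2.0], [4.0, 8.0, -2.0]
    lam = sample_lambda(rng=random.Random(0))
    # The head reproduces both labels exactly here, so the penalty is ~0 for any lam.
    print(interpolation_reg_loss(emb_a, emb_b, emb_a, emb_b, head, lam))
```

Because an affine head satisfies head(λa + (1 − λ)b) = λ·head(a) + (1 − λ)·head(b), this penalty vanishes whenever the head fits both endpoints; under this assumption it mainly pushes the embedding space to behave linearly between paired samples.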
Original language | English |
---|---|
State | Published - 2022 |
Externally published | Yes |
Event | 33rd British Machine Vision Conference Proceedings, BMVC 2022 - London, United Kingdom. Duration: 21 Nov 2022 → 24 Nov 2022 |
Conference
Conference | 33rd British Machine Vision Conference Proceedings, BMVC 2022 |
---|---|
Country/Territory | United Kingdom |
City | London |
Period | 21/11/22 → 24/11/22 |
Bibliographical note
Publisher Copyright: © 2022. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.