Manuscript received March 13, 2023; revised April 18, 2023; accepted May 5, 2023.
Abstract—Skeleton-based human action recognition exploits informative cues about the dynamics of the human body. In this work, we develop a multi-stream model with connections between its parallel streams. It is inspired by FUSION-CPA, a state-of-the-art method that fuses two modalities: infrared input and skeleton input. Because we are interested in improving the skeleton-branch backbone, we use the Spatial-Temporal Graph Convolutional Networks (ST-GCN) model together with an EfficientGCN attention module. Our aim is to improve the capture of spatial and temporal features. In addition, we exploit the Graph Convolutional Network (GCN) implemented in the ST-GCN model to capture the graph connectivity of skeletons. On the large-scale NTU-RGB+D 60 dataset, the model achieves accuracies above 91% and 93% on the cross-subject and cross-view benchmarks, respectively. The proposed model has 9 million fewer trainable parameters than FUSION-CPA.

Keywords—deep learning, Human Action Recognition (HAR), convolutional neural networks, Graph Convolutional Networks (GCNs)

Cite: Amine Mansouri, Toufik Bakir, and Smain Femmam, "Human Action Recognition with Skeleton and Infrared Fusion Model," Journal of Image and Graphics, Vol. 11, No. 4, pp. 309-320, December 2023.

Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.