JOIG 2023 Vol.11(4): 343-352
doi: 10.18178/joig.11.4.343-352

Residual Neural Networks for Human Action Recognition from RGB-D Videos

K. Venkata Subbareddy 1,*, B. Pavani 2, G. Sowmya 3, and N. Ramadevi 4
1. Department of Electronics and Communication Engineering (ECE), Osmania University, Hyderabad, India
2. Indian Institute of Technology (IIT) Bombay, Maharashtra, India
3. Department of Electronics and Communication Engineering (ECE), Santhiram Engineering College, Andhra Pradesh, India
4. Department of Computer Science Engineering (CSE), Santhiram Engineering College, Andhra Pradesh, India
*Correspondence: subvishk03@gmail.com (K.V.S.)

Manuscript received May 25, 2023; revised June 19, 2023; accepted July 21, 2023.

Abstract—RGB-D based Human Action Recognition (HAR) has recently gained significant research attention because different data modalities provide complementary information. However, current models still produce unsatisfactory results due to several problems, including noise and viewpoint variations between actions. To address these problems, this paper proposes two new action descriptors: the Modified Depth Motion Map (MDMM) and the Spherical Redundant Joint Descriptor (SRJD). The MDMM removes noise from depth maps and preserves only action-related information, while the SRJD provides resilience against viewpoint variations and reduces misclassification between actions with similar view properties. To maximize recognition accuracy, a standard deep learning model, the Residual Neural Network (ResNet), is trained on the features extracted through MDMM and SRJD. Simulation experiments show that multiple data modalities outperform a single modality. The proposed approach was evaluated on two public datasets, NTU RGB+D and UTD-MHAD, and the results show that it is superior to earlier HAR methods. On average, the proposed system achieved accuracies of 90.0442% and 92.3850% under cross-subject and cross-view validation, respectively.
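The abstract does not give the MDMM and SRJD formulas, so the following is only a minimal Python sketch of the two standard ideas they build on: a depth motion map that accumulates thresholded inter-frame depth differences (the thresholding stands in for the paper's noise removal), and a spherical re-encoding of skeleton joints relative to a reference joint, a common route to view invariance. The noise_threshold value and the choice of reference joint are illustrative assumptions, not the authors' parameters.

```python
import numpy as np

def depth_motion_map(depth_frames, noise_threshold=20.0):
    """Accumulate absolute inter-frame depth differences into one motion map.
    Small differences (likely sensor noise) are zeroed out before summing;
    the threshold is an illustrative value, not taken from the paper."""
    dmm = np.zeros_like(depth_frames[0], dtype=np.float64)
    for prev, curr in zip(depth_frames[:-1], depth_frames[1:]):
        diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
        diff[diff < noise_threshold] = 0.0  # crude noise suppression
        dmm += diff
    return dmm

def spherical_joint_descriptor(joints, reference_idx=0):
    """Re-express 3-D joint positions as spherical coordinates (r, theta, phi)
    relative to a reference joint, so the descriptor depends on body-relative
    geometry rather than on the camera viewpoint."""
    rel = joints - joints[reference_idx]
    r = np.linalg.norm(rel, axis=1)
    # Guard against division by zero at the reference joint itself.
    theta = np.arccos(np.divide(rel[:, 2], r, out=np.zeros_like(r), where=r > 0))
    phi = np.arctan2(rel[:, 1], rel[:, 0])
    return np.stack([r, theta, phi], axis=1)

# Toy usage: 10 synthetic 240x320 depth frames and a 20-joint skeleton.
frames = [np.random.randint(0, 4000, (240, 320), dtype=np.uint16) for _ in range(10)]
print(depth_motion_map(frames).shape)            # (240, 320)
joints = np.random.rand(20, 3)
print(spherical_joint_descriptor(joints).shape)  # (20, 3)
```

In the paper's pipeline, maps and descriptors of this kind are the inputs from which features are extracted and fed to the ResNet classifier.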

Keywords—human action recognition, depth maps, skeleton joints, view invariance, Residual Neural Network (ResNet), F-score

Cite: K. Venkata Subbareddy, B. Pavani, G. Sowmya, and N. Ramadevi, "Residual Neural Networks for Human Action Recognition from RGB-D Videos," Journal of Image and Graphics, Vol. 11, No. 4, pp. 343-352, December 2023.

Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0), which permits use, distribution, and reproduction in any medium, provided that the article is properly cited, the use is non-commercial, and no modifications or adaptations are made.