Doctor of Engineering

Doctoral postgraduate

Personal Information

Date of Birth:1980-12-14
Date of Employment:2012-11-05

Human Action Recognition Based on 3D Convolution and Multi-Attention Transformer

Release time:2025-07-14

Key Words:NETWORK
Abstract:To address the limitations of traditional two-stream networks, namely inadequate spatiotemporal information fusion, limited feature diversity, and insufficient accuracy, we propose an improved two-stream network for human action recognition that fuses a multi-scale attention Transformer with a 3D convolutional network (C3D). In the temporal stream, the traditional 2D convolutional network is replaced with a C3D network to capture temporal dynamics and spatial features jointly. In the spatial stream, a multi-scale convolutional Transformer encoder extracts features. Using the multi-scale attention mechanism, the model captures and enhances features at several scales, which are then adaptively fused with a weighted strategy to improve the feature representation. Furthermore, experiments comparing feature fusion methods identify the best-performing fusion strategy for the two-stream network. Experimental results on the UCF101 and HMDB51 benchmark datasets show that the proposed model achieves superior performance in action recognition tasks.
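The abstract does not state how the adaptive weights are computed. As a rough illustration only, one common way to realize a weighted fusion of multi-scale features is a convex combination whose weights come from a softmax over per-scale scores. The sketch below assumes that scheme; the names `weighted_fusion`, `features`, and `scores` are hypothetical and do not come from the paper, and in a trained model the scores would be learned rather than fixed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(features, scores):
    """Fuse equal-length feature vectors as a convex combination.

    features: one feature vector per scale (hypothetical toy inputs)
    scores:   one unnormalized score per scale (assumed learnable in practice)
    """
    if len(features) != len(scores):
        raise ValueError("need exactly one score per feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    weights = softmax(scores)
    # Element-wise weighted sum across scales
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


# Toy features from three scales; equal scores reduce to a plain average
scales = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused = weighted_fusion(scales, [0.0, 0.0, 0.0])
```

Because the softmax weights are non-negative and sum to one, the fused vector stays within the range of the input features, and raising one scale's score shifts the result toward that scale.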
Volume:15
Issue:5
Translation or Not:no