Qingdao University of Science and Technology Homepage Platform Management System Qulianen--Home-- Human Action Recognition Based on 3D Convolution and Multi-Attention Transformer

Human Action Recognition Based on 3D Convolution and Multi-Attention Transformer

Release time:2025-07-14

Key Words:NETWORK
Abstract:To address the limitations of traditional two-stream networks, such as inadequate spatiotemporal information fusion, limited feature diversity, and insufficient accuracy, we propose an improved two-stream network for human action recognition that fuses a multi-scale attention Transformer with a 3D convolutional (C3D) network. In the temporal stream, the traditional 2D convolutional network is replaced with a C3D network, which captures temporal dynamics together with spatial features. In the spatial stream, a multi-scale convolutional Transformer encoder is introduced to extract features. Using the multi-scale attention mechanism, the model captures and enhances features at several scales, then adaptively fuses them with a weighted strategy to improve the feature representation. Furthermore, extensive experiments comparing feature fusion methods identify the best-performing fusion strategy for the two-stream network. Experimental results on the benchmark datasets UCF101 and HMDB51 demonstrate that the proposed model achieves superior performance in action recognition tasks.
Volume:15
Issue:5
Translation or Not:no
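
The adaptive weighted fusion of multi-scale features mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the weights are computed, so this example assumes one scalar logit per scale, normalized with a softmax; the names `softmax`, `fuse_scales`, and `scale_logits` are hypothetical and are not taken from the paper. In a real model the logits would be learned and the features would be tensors; plain Python lists are used here only to keep the example self-contained.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(features: Sequence[Sequence[float]],
                scale_logits: Sequence[float]) -> List[float]:
    """Weighted sum of same-length feature vectors, one per scale.

    scale_logits are assumed (hypothetical) learnable scalars, one per
    scale; softmax turns them into weights that sum to 1.
    """
    if len(features) != len(scale_logits):
        raise ValueError("need exactly one logit per scale")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all scales must share one feature dimension")
    weights = softmax(scale_logits)
    return [sum(w * f[i] for w, f in zip(weights, features))
            for i in range(dim)]


# Toy data: three scales, each producing a four-dimensional feature vector.
features = [
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 2.0, 2.0, 2.0],
]

# Equal logits give equal weights, so the fusion reduces to a plain mean.
fused_equal = fuse_scales(features, [0.0, 0.0, 0.0])

# A dominant logit makes the fused vector follow that scale almost exactly.
fused_first = fuse_scales(features, [50.0, 0.0, 0.0])
```

Because the weights always sum to 1, the fused vector stays on the same numeric scale as its inputs, which is one common reason for choosing a softmax here; other normalizations (for example sigmoid gates) would be equally consistent with the abstract's wording.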

