渠连恩

Master's Supervisor

Date of Employment: 2012-11-05

Education: Doctoral degree

Office: Sino-German Science and Technology Building, A5110

Publications


Human Action Recognition Based on 3D Convolution and Multi-Attention Transformer

Published: 2025-07-14

Keywords: NETWORK
Abstract: To address the limitations of traditional two-stream networks, such as inadequate spatiotemporal information fusion, limited feature diversity, and insufficient accuracy, we propose an improved two-stream network for human action recognition that fuses a multi-scale attention Transformer with a 3D convolutional (C3D) network. In the temporal stream, the traditional 2D convolutional network is replaced with a C3D network to effectively capture temporal dynamics and spatial features. In the spatial stream, a multi-scale convolutional Transformer encoder is introduced to extract features. Leveraging the multi-scale attention mechanism, the model captures and enhances features at various scales, which are then adaptively fused using a weighted strategy to improve feature representation. Furthermore, through extensive experiments on feature fusion methods, the optimal fusion strategy for the two-stream network is identified. Experimental results on benchmark datasets such as UCF101 and HMDB51 demonstrate that the proposed model achieves superior performance in action recognition tasks.
Volume: 15
Issue: 5
Translated work:
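
The abstract mentions two fusion steps: multi-scale features are "adaptively fused using a weighted strategy", and the outputs of the spatial and temporal streams are then combined. The record does not give the formulas, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that the per-scale weights come from a softmax over scalar scores (which a real model would learn) and that the two streams are combined by a convex combination of their per-class scores. The names `fuse_scales`, `fuse_streams`, `scale_scores`, and `alpha` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn arbitrary real scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(features: Sequence[Sequence[float]],
                scale_scores: Sequence[float]) -> List[float]:
    """Weighted sum of equal-length feature vectors, one vector per scale.

    ASSUMPTION: weights are softmax(scale_scores); the source only says the
    fusion is "adaptive" and "weighted".
    """
    if not features or len(features) != len(scale_scores):
        raise ValueError("need exactly one score per feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must share one length")
    weights = softmax(scale_scores)
    return [sum(w * f[i] for w, f in zip(weights, features))
            for i in range(dim)]


def fuse_streams(spatial: Sequence[float], temporal: Sequence[float],
                 alpha: float = 0.5) -> List[float]:
    """Late fusion of per-class scores from the two streams.

    ASSUMPTION: a convex combination with mixing factor alpha; the source
    does not state which fusion strategy was found to be optimal.
    """
    if len(spatial) != len(temporal):
        raise ValueError("both streams must score the same classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * s + (1.0 - alpha) * t for s, t in zip(spatial, temporal)]
```

With equal scores, `fuse_scales` reduces to a plain average of the per-scale features. Raising one score shifts the fused feature toward that scale, which is the sense in which such a weighting can be called adaptive.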