Generative Multi-Stream Architecture For American Sign Language Recognition

Paper Title

Generative Multi-Stream Architecture For American Sign Language Recognition

Authors

Huh, Dom; Gurrapu, Sai; Olson, Frederick; Rangwala, Huzefa; Pathak, Parth; Kosecka, Jana

Abstract

With advances in deep model architectures, computer vision tasks can reach optimal convergence given proper data preprocessing and model parameter initialization. However, training on datasets with low feature richness for complex applications limits convergence and keeps performance below the human level. In past work, researchers have supplied external sources of complementary data, at the cost of supplementary hardware, and fed them in as streams to counteract this limitation and boost performance. We propose a generative multi-stream architecture that eliminates the need for additional hardware, with the intent of improving feature richness without risking impracticability. We also introduce the compact spatio-temporal residual block to the standard 3-dimensional convolutional model, C3D. Our rC3D model performs comparably to the top C3D residual variant architecture, the pseudo-3D model, on the FASL-RGB dataset. Our methods achieve 95.62% validation accuracy with a variance of 1.42% from training, outperforming past models by 0.45% in validation accuracy and 5.53% in variance.
