Improvements of Motion Estimation and Coding using Neural Networks

Paper Title

Improvements of Motion Estimation and Coding using Neural Networks

Authors

Raz Birman, Yoram Segal, Ofer Hadar, Jenny Benois-Pineau

Abstract

Inter-prediction is used effectively in multiple video coding standards, including H.264 and HEVC (also known as H.265). It exploits the correlation between blocks of consecutive video frames to perform motion compensation, thereby predicting block pixel values and reducing transmission bandwidth. To reduce the magnitude of the transmitted Motion Vector (MV), and thus the bandwidth, the encoder uses a Predicted Motion Vector (PMV), which is derived by taking the median vector of the corresponding MVs of the neighboring blocks. In this research, we propose innovative methods, based on neural network prediction, for improving the accuracy of the calculated PMV. We begin by showing a straightforward approach: calculating the best matching PMV and signaling its neighbor block index to the decoder, which reduces the number of bits required to represent the result without adding any computational complexity. We then use a classification Fully Connected Neural Network (FCNN) to estimate the PMV from the neighbors without requiring signaling, and show the advantage of this approach for high-motion movies. We demonstrate the advantages using fast-forward movies; however, the same improvements apply to camera streams from autonomous vehicles, drone cameras, Pan-Tilt-Zoom (PTZ) cameras, and similar applications where MV magnitudes are expected to be large. We also introduce a regression FCNN to predict the PMV. We calculate Huffman-coded streams and demonstrate a reduction on the order of 34% in the number of bits required to transmit the best matching calculated PMV, without reducing quality, for fast-forward movies with high motion.
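To make the two PMV choices in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes three neighboring blocks, a component-wise median for the standard predictor, and an L1 distance for picking the "best matching" neighbor. The function names and the sample vectors are hypothetical and chosen only for illustration.

```python
from statistics import median

def median_pmv(neighbors):
    """Component-wise median of the neighboring blocks' motion vectors."""
    xs = [mv[0] for mv in neighbors]
    ys = [mv[1] for mv in neighbors]
    return (median(xs), median(ys))

def best_neighbor_pmv(mv, neighbors):
    """Index and value of the neighbor MV closest to the actual MV.

    Closeness is measured with an L1 distance, which is an assumption
    of this sketch rather than something stated in the abstract.
    """
    def cost(candidate):
        return abs(mv[0] - candidate[0]) + abs(mv[1] - candidate[1])
    index = min(range(len(neighbors)), key=lambda i: cost(neighbors[i]))
    return index, neighbors[index]

def residual(mv, pmv):
    """Motion vector difference that would actually be entropy coded."""
    return (mv[0] - pmv[0], mv[1] - pmv[1])

# Hypothetical neighbor MVs (e.g. left, top, top-right) and current block MV.
neighbors = [(12, -3), (40, 2), (14, -1)]
mv = (38, 1)

pmv_med = median_pmv(neighbors)
idx, pmv_best = best_neighbor_pmv(mv, neighbors)

mvd_median = residual(mv, pmv_med)   # residual with the median predictor
mvd_best = residual(mv, pmv_best)    # residual with the signaled neighbor
```

In this made-up case the median predictor leaves a large residual, while the best matching neighbor leaves a small one at the cost of signaling `idx`. The classification FCNN described in the abstract aims to infer that index at the decoder so it need not be transmitted.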
