NerVast: Compression-Efficient Scaling of Implicit Neural Video Representations via Scene-based Parameter-sharing
Abstract
Implicit neural representation (INR) has emerged as a new data representation for compressing videos and now shows performance on par with conventional codecs. The next challenge in the field is to make INR scalable for practical use. Existing works address this by splitting long, high-resolution videos across small INR models, which improves encoding and decoding speed. However, by encoding the video into multiple separate INRs across time, they fail to fully exploit its temporal redundancy, which leads to sub-optimal compression efficiency. In this work, we propose NerVast, a new encoding scheme for video INR that improves compression efficiency while retaining the low computation and transfer costs of small INR models. When a video is represented as separate INR segments, NerVast reduces the total representation size by sharing parameters between the segment models during encoding. Without expensive training, NerVast selects the most significant parameters to share; it then jointly trains both shared and non-shared parameters in a way that minimizes the quality drop imposed by sharing. While maintaining real-time decoding speed (> 30 fps), NerVast provides better compression (39.9% fewer parameters on average) than compute-efficient INR models. Equivalently, at the same bitrate, NerVast achieves higher encoding quality (1.57 dB higher PSNR).
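The core idea of cross-segment parameter sharing can be illustrated with a minimal sketch. The selection criterion below (tying the positions where per-segment weights vary least) and all function names are illustrative assumptions, not the paper's actual significance measure, which the abstract does not specify:

```python
import numpy as np

def share_parameters(segment_weights, share_ratio=0.4):
    """Tie a fraction of weights across per-segment INR models.

    segment_weights: list of equally shaped 1-D arrays, one flattened
    weight vector per video-segment model.
    Returns (mask, tied): positions where mask is True are replaced by a
    single shared value (the cross-segment mean); the rest stay per-segment.
    """
    W = np.stack(segment_weights)            # (num_segments, num_params)
    # Hypothetical selection rule: share where cross-segment variance is
    # smallest, i.e. where tying weights costs the least in quality.
    variance = W.var(axis=0)
    k = int(share_ratio * W.shape[1])
    shared_idx = np.argsort(variance)[:k]    # k most "agreeable" positions
    mask = np.zeros(W.shape[1], dtype=bool)
    mask[shared_idx] = True
    tied = W.copy()
    tied[:, mask] = W[:, mask].mean(axis=0)  # one shared value per position
    return mask, tied

# Two toy "segment models" with 4 parameters each.
w0 = np.array([1.0, 5.0, 2.0, 9.0])
w1 = np.array([1.1, -5.0, 2.0, 0.0])
mask, tied = share_parameters([w0, w1], share_ratio=0.5)
# Positions 0 and 2 agree most across segments, so they are tied.
```

After this selection step, both the shared values and the remaining per-segment weights would be fine-tuned jointly, so the joint training can compensate for the error introduced by tying.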