Gaussian Representations for Video
Sachin Shah · Anustup Choudhury · Guan-Ming Su · Jaclyn Pytlarz · Christopher Metzler · Trisha Mittal
Abstract
We introduce Gaussian representations for videos (GaRV), a novel video encoding and decoding scheme based upon 3D Gaussians. Unlike traditional representations, which encode videos as sequences of frames, or neural representations, which encode videos within the weights of a neural network, we encode videos as a collection of 3D Gaussians within a space-time volume. The key advantage of our approach is that it enables efficient and flexible rasterization-based video decoding. With a slight drop in overall compression rate, GaRV offers an 8-50$\times$ improvement in decoding time and a 2.5-15$\times$ reduction in GPU memory compared with neural counterparts. Existing Gaussian video techniques require 2-30$\times$ more disk space, while also using more GPU resources than GaRV. Moreover, GaRV offers unique flexibility in how and when pixels are decoded: one can non-sequentially decode frames/regions without penalty and can selectively decode regions at high resolution to enable low-cost foveated video decoding.
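To make the idea of decoding from a space-time volume concrete, the following is a minimal sketch, not the rasterizer described in this work. It assumes each Gaussian carries a center in (x, y, t), an inverse covariance matrix, and an RGB weight, and that pixel colors are formed by simple additive blending; the function name `decode_region` and all parameter names are hypothetical. Because every pixel is evaluated independently, any frame or sub-region can be decoded without touching its neighbors.

```python
import numpy as np

def decode_region(means, inv_covs, colors, t, ys, xs):
    """Evaluate a sum of 3D Gaussians at the pixels (xs, ys) of time t.

    means:    (N, 3) Gaussian centers in (x, y, t)
    inv_covs: (N, 3, 3) inverse covariance matrices
    colors:   (N, 3) RGB weight per Gaussian (additive blending is an assumption)
    Returns an array of shape (len(ys), len(xs), 3).
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    gx, gy = np.meshgrid(xs, ys)                         # each (H, W)
    gt = np.full_like(gx, float(t))                      # constant time slice
    pts = np.stack([gx, gy, gt], axis=-1)                # (H, W, 3)
    d = pts[:, :, None, :] - means[None, None, :, :]     # (H, W, N, 3)
    # Squared Mahalanobis distance from every pixel to every Gaussian.
    m = np.einsum("hwni,nij,hwnj->hwn", d, inv_covs, d)  # (H, W, N)
    weights = np.exp(-0.5 * m)
    return weights @ colors                              # (H, W, 3)

# Two toy Gaussians in a small space-time volume.
means = np.array([[2.0, 3.0, 1.0], [5.0, 1.0, 4.0]])
inv_covs = np.stack([np.eye(3), 0.5 * np.eye(3)])
colors = np.array([[1.0, 0.5, 0.0], [0.0, 0.2, 0.9]])

# Decode frame 4 first, then a sub-region of frame 1: order does not matter.
frame4 = decode_region(means, inv_covs, colors, 4, range(6), range(8))
patch1 = decode_region(means, inv_covs, colors, 1, range(2, 5), range(1, 4))
```

A dense evaluation like this costs time proportional to pixels times Gaussians; a practical decoder would restrict each pixel to nearby Gaussians. Sampling `xs` and `ys` on a finer grid in one region and a coarser grid elsewhere illustrates how foveated decoding could be expressed under the same assumptions.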