Generative model-based video compression
In this paper, we propose a new deep learning method for recovering high-quality video conferencing streams from low frame rate video streams. As a baseline, we build a scheme from existing frame interpolation and lip movement generation methods, which we fine-tune for our use case. We then introduce Wav2FSS, a novel end-to-end framework that generates a high-quality reconstruction of the speaker’s face. Evaluated against this baseline, Wav2FSS achieves state-of-the-art results.