Note: You are reading the documentation for MMEditing 0.x, which will be deprecated by the end of 2022. We recommend you upgrade to MMEditing 1.0 to enjoy the new features and better performance brought by OpenMMLab 2.0. Check out the changelog, code and documentation of MMEditing 1.0 for more details.
Frame-Interpolation Models
CAIN (AAAI’2020)
Abstract
Prevailing video frame interpolation techniques rely heavily on optical flow estimation and require additional model complexity and computational cost; they are also susceptible to error propagation in challenging scenarios with large motion and heavy occlusion. To alleviate the limitation, we propose a simple but effective deep neural network for video frame interpolation, which is end-to-end trainable and is free from a motion estimation network component. Our algorithm employs a special feature reshaping operation, referred to as PixelShuffle, with channel attention, which replaces the optical flow computation module. The main idea behind the design is to distribute the information in a feature map into multiple channels and extract motion information by attending to the channels for pixel-level frame synthesis. The model given by this principle turns out to be effective in the presence of challenging motion and occlusion. We construct a comprehensive evaluation benchmark and demonstrate that the proposed approach achieves outstanding performance compared to the existing models with a component for optical flow computation.
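The core idea can be illustrated in a few lines of PyTorch. The sketch below is not the CAIN implementation shipped with MMEditing; it only shows, under assumed shapes and a hypothetical `ChannelAttention` module, how `pixel_unshuffle` moves spatial information into channels where an attention block can weight it, and how `pixel_shuffle` maps the result back to image space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (illustrative only)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.attn(x)          # re-weight channels, keep spatial size

# Two input frames, one interpolated output (shapes are assumptions).
f0 = torch.rand(1, 3, 64, 64)
f1 = torch.rand(1, 3, 64, 64)

x = torch.cat([f0, f1], dim=1)           # (1, 6, 64, 64)
x = F.pixel_unshuffle(x, 2)              # (1, 24, 32, 32): space -> channels
x = ChannelAttention(24)(x)              # attend over channels (motion cues)
x = nn.Conv2d(24, 12, 3, padding=1)(x)   # stand-in for the synthesis network
out = F.pixel_shuffle(x, 2)              # (1, 3, 64, 64): channels -> space
print(out.shape)
```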

Results and models
Evaluated on RGB channels.
The metrics are PSNR / SSIM.
The learning rate adjustment strategy is a Step LR scheduler with min_lr clipping.
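For reference, a Step LR schedule with min_lr clipping is expressed in an MMEditing 0.x (MMCV-style) config roughly as below; the milestone, gamma and min_lr values are illustrative, not the ones used for the released model.

```python
# Illustrative values only; see the cain_b5_g1b32_vimeo90k_triplet config for
# the actual schedule used to train the released model.
lr_config = dict(
    policy='Step',                   # mmcv StepLrUpdaterHook
    by_epoch=False,                  # step by iteration
    step=[100000, 200000, 300000],   # decay milestones (assumed)
    gamma=0.5,                       # decay factor (assumed)
    min_lr=1e-6)                     # clip the learning rate from below (assumed)
```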
| Method | Vimeo-90K-triplet (PSNR / SSIM) | Download |
| --- | --- | --- |
| cain_b5_g1b32_vimeo90k_triplet | 34.6010 / 0.9578 | model \| log |
Citation
@inproceedings{choi2020channel,
title={Channel attention is all you need for video frame interpolation},
author={Choi, Myungsub and Kim, Heewon and Han, Bohyung and Xu, Ning and Lee, Kyoung Mu},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={07},
pages={10663--10671},
year={2020}
}
FLAVR (arXiv’2020)
Abstract
Most modern frame interpolation approaches rely on explicit bidirectional optical flows between adjacent frames, thus are sensitive to the accuracy of underlying flow estimation in handling occlusions while additionally introducing computational bottlenecks unsuitable for efficient deployment. In this work, we propose a flow-free approach that is completely end-to-end trainable for multi-frame video interpolation. Our method, FLAVR, is designed to reason about non-linear motion trajectories and complex occlusions implicitly from unlabeled videos and greatly simplifies the process of training, testing and deploying frame interpolation models. Furthermore, FLAVR delivers up to 6× speed up compared to the current state-of-the-art methods for multi-frame interpolation while consistently demonstrating superior qualitative and quantitative results compared with prior methods on popular benchmarks including Vimeo-90K, Adobe-240FPS, and GoPro. Finally, we show that frame interpolation is a competitive self-supervised pre-training task for videos via demonstrating various novel applications of FLAVR including action recognition, optical flow estimation, motion magnification, and video object tracking. Code and trained models are provided in the supplementary material.
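A toy sketch of the flow-free idea is given below, assuming an (N, C, T, H, W) frame stack as input. This is not the FLAVR network (which is far deeper); it only illustrates the input/output contract implied by `flavr_in4out1_...`: four input frames in, one interpolated frame out, with no optical-flow branch.

```python
import torch
import torch.nn as nn

class FlowFreeInterpolator(nn.Module):
    """Toy flow-free interpolator: spatio-temporal convolutions map a stack of
    input frames directly to one intermediate frame (no optical-flow branch)."""
    def __init__(self, in_frames=4, mid_channels=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(3, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(mid_channels, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # collapse the temporal dimension and predict one RGB frame
        self.head = nn.Conv3d(mid_channels, 3, kernel_size=(in_frames, 3, 3),
                              padding=(0, 1, 1))

    def forward(self, frames):                     # frames: (N, 3, T, H, W)
        return self.head(self.encoder(frames)).squeeze(2)  # (N, 3, H, W)

# toy usage: 4 input frames -> 1 interpolated frame, matching in4out1
x = torch.rand(1, 3, 4, 64, 64)
print(FlowFreeInterpolator()(x).shape)             # torch.Size([1, 3, 64, 64])
```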

Results and models
Evaluated on RGB channels.
The metrics are PSNR / SSIM.
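As a reference for how the PSNR values can be reproduced on RGB outputs, a minimal NumPy sketch is shown below; MMEditing's own evaluation code handles data ranges and border conventions, which this snippet does not reproduce.

```python
import numpy as np

def psnr_rgb(pred, gt, max_val=255.0):
    """PSNR over all RGB channels jointly; pred/gt are uint8 HxWx3 arrays."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

# toy usage with random images (illustrative only)
gt = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
pred = np.clip(gt + np.random.randint(-5, 6, gt.shape), 0, 255).astype(np.uint8)
print(round(psnr_rgb(pred, gt), 2))
```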
| Method | Scale | Vimeo-90K-triplet (PSNR / SSIM) | Download |
| --- | --- | --- | --- |
| flavr_in4out1_g8b4_vimeo90k_septuplet | x2 | 36.3340 / 0.96015 | model \| log |
Note: FLAVR for the x8 VFI task will be supported in the future.
Citation
@article{kalluri2020flavr,
title={Flavr: Flow-agnostic video representations for fast frame interpolation},
author={Kalluri, Tarun and Pathak, Deepak and Chandraker, Manmohan and Tran, Du},
journal={arXiv preprint arXiv:2012.08512},
year={2020}
}
TOFlow (IJCV’2019)
Abstract
Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is however intractable; and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.
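The joint design can be sketched as follows. This is not the TOFlow architecture; it is only an illustration, with assumed toy modules (`flow_net`, `fusion`), of how a trainable flow estimator and a task network are chained so that backward warping with the predicted flow feeds a synthesis step, and the whole pipeline is trained on the task loss rather than on flow accuracy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def flow_warp(frame, flow):
    """Backward-warp `frame` (N, 3, H, W) with a dense flow field (N, 2, H, W)."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack((xs, ys), dim=0).float().to(frame)      # (2, H, W)
    coords = base.unsqueeze(0) + flow                          # shifted sampling points
    # normalize coordinates to [-1, 1] for grid_sample
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)            # (N, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

class ToyTaskOrientedVFI(nn.Module):
    """Joint flow estimation + processing, trained end to end, so the flow is
    optimized for the interpolation loss rather than for flow accuracy."""
    def __init__(self):
        super().__init__()
        self.flow_net = nn.Conv2d(6, 4, 3, padding=1)   # flows toward both inputs
        self.fusion = nn.Conv2d(6, 3, 3, padding=1)     # task-specific synthesis

    def forward(self, f0, f1):
        flows = self.flow_net(torch.cat([f0, f1], dim=1))
        w0 = flow_warp(f0, flows[:, :2])
        w1 = flow_warp(f1, flows[:, 2:])
        return self.fusion(torch.cat([w0, w1], dim=1))

# toy usage
f0, f1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
print(ToyTaskOrientedVFI()(f0, f1).shape)               # torch.Size([1, 3, 64, 64])
```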

Results and models
Evaluated on RGB channels.
The metrics are PSNR / SSIM.
| Method | Pretrained SPyNet | Vimeo-90K-triplet (PSNR / SSIM) | Download |
| --- | --- | --- | --- |
| tof_vfi_spynet_chair_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3294 / 0.9465 | model \| log |
| tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3339 / 0.9466 | model \| log |
| tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3170 / 0.9464 | model \| log |
| tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3237 / 0.9465 | model \| log |
| tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3426 / 0.9467 | model \| log |
Note: These pretrained SPyNets do not contain BN layers since batch_size=1, which is consistent with https://github.com/Coldog2333/pytoflow.
Citation
@article{xue2019video,
title={Video enhancement with task-oriented flow},
author={Xue, Tianfan and Chen, Baian and Wu, Jiajun and Wei, Donglai and Freeman, William T},
journal={International Journal of Computer Vision},
volume={127},
number={8},
pages={1106--1125},
year={2019},
publisher={Springer}
}