Note

You are reading the documentation for MMEditing 0.x, which will be deprecated by the end of 2022. We recommend upgrading to MMEditing 1.0 for the new features and better performance brought by OpenMMLab 2.0. See the changelog, code, and documentation of MMEditing 1.0 for details.

# Frame-Interpolation Models

## CAIN (AAAI’2020)

### Abstract

Prevailing video frame interpolation techniques rely heavily on optical flow estimation and require additional model complexity and computational cost; it is also susceptible to error propagation in challenging scenarios with large motion and heavy occlusion. To alleviate the limitation, we propose a simple but effective deep neural network for video frame interpolation, which is end-to-end trainable and is free from a motion estimation network component. Our algorithm employs a special feature reshaping operation, referred to as PixelShuffle, with a channel attention, which replaces the optical flow computation module. The main idea behind the design is to distribute the information in a feature map into multiple channels and extract motion information by attending the channels for pixel-level frame synthesis. The model given by this principle turns out to be effective in the presence of challenging motion and occlusion. We construct a comprehensive evaluation benchmark and demonstrate that the proposed approach achieves outstanding performance compared to the existing models with a component for optical flow computation.
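The idea of distributing the information in a feature map into multiple channels can be illustrated with a small space-to-depth rearrangement. This is only a sketch in plain Python, not CAIN's implementation: the function names `pixel_unshuffle` and `channel_means` are hypothetical, the down-shuffle direction and the factor of 2 are assumptions made for illustration, and the learned attention weights are omitted (only the global average pooling that typically feeds a channel-attention gate is shown).

```python
def pixel_unshuffle(channel, factor):
    """Rearrange one H x W channel into factor*factor smaller channels.

    `channel` is a list of rows; H and W must be divisible by `factor`.
    (Hypothetical helper for illustration, not part of MMEditing.)
    """
    height, width = len(channel), len(channel[0])
    if height % factor or width % factor:
        raise ValueError("height and width must be divisible by factor")
    out = []
    for dy in range(factor):
        for dx in range(factor):
            # Each output channel collects the pixel at offset (dy, dx)
            # from every factor x factor block of the input.
            out.append([
                [channel[y][x] for x in range(dx, width, factor)]
                for y in range(dy, height, factor)
            ])
    return out


def channel_means(channels):
    """Global average pooling: one scalar per channel."""
    return [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in channels
    ]


frame = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
shuffled = pixel_unshuffle(frame, 2)   # 4 channels of size 2 x 2
pooled = channel_means(shuffled)       # one statistic per channel
```

After the rearrangement, spatially neighbouring pixels sit in different channels, so a per-channel weighting can emphasise particular spatial offsets without an explicit flow field.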

### Results and models

Evaluated on RGB channels. The metrics are PSNR / SSIM. The learning rate is adjusted by a Step LR scheduler with min_lr clipping.

| Method | PSNR / SSIM | Download |
| :-- | :-: | :-: |
| cain_b5_g1b32_vimeo90k_triplet | 34.6010 / 0.9578 | model / log |
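A step schedule with a lower bound on the learning rate, as mentioned above, can be sketched as follows. This is a minimal illustration rather than the scheduler used for the checkpoint: the function name `step_lr` and every numeric value (`base_lr`, `step_size`, `gamma`, `min_lr`) are assumptions chosen for the example and are not taken from the config.

```python
def step_lr(base_lr, iteration, step_size, gamma=0.5, min_lr=1e-6):
    """Decay base_lr by `gamma` every `step_size` iterations.

    The result is clipped so that it never drops below `min_lr`.
    (Hypothetical helper; all default values are illustrative.)
    """
    num_decays = iteration // step_size
    return max(base_lr * gamma ** num_decays, min_lr)


# Illustrative values only: halve the rate every 100 iterations.
lr_start = step_lr(1e-3, iteration=0, step_size=100)
lr_after_one_step = step_lr(1e-3, iteration=100, step_size=100)
lr_late = step_lr(1e-3, iteration=10**6, step_size=100)  # clipped to min_lr
```

Without the clipping, a long run would drive the learning rate towards zero; the floor keeps late updates from vanishing entirely.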

### Citation

@inproceedings{choi2020channel,
title={Channel attention is all you need for video frame interpolation},
author={Choi, Myungsub and Kim, Heewon and Han, Bohyung and Xu, Ning and Lee, Kyoung Mu},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={07},
pages={10663--10671},
year={2020}
}


## FLAVR (arXiv’2020)

### Abstract

Most modern frame interpolation approaches rely on explicit bidirectional optical flows between adjacent frames, thus are sensitive to the accuracy of underlying flow estimation in handling occlusions while additionally introducing computational bottlenecks unsuitable for efficient deployment. In this work, we propose a flow-free approach that is completely end-to-end trainable for multi-frame video interpolation. Our method, FLAVR, is designed to reason about non-linear motion trajectories and complex occlusions implicitly from unlabeled videos and greatly simplifies the process of training, testing and deploying frame interpolation models. Furthermore, FLAVR delivers up to 6× speed up compared to the current state-of-the-art methods for multi-frame interpolation while consistently demonstrating superior qualitative and quantitative results compared with prior methods on popular benchmarks including Vimeo-90K, Adobe-240FPS, and GoPro. Finally, we show that frame interpolation is a competitive self-supervised pre-training task for videos via demonstrating various novel applications of FLAVR including action recognition, optical flow estimation, motion magnification, and video object tracking. Code and trained models are provided in the supplementary material.

### Results and models

Evaluated on RGB channels. The metrics are PSNR / SSIM.

| Method | Scale | PSNR / SSIM | Download |
| :-- | :-: | :-: | :-: |
| flavr_in4out1_g8b4_vimeo90k_septuplet | x2 | 36.3340 / 0.96015 | model / log |
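For readers unfamiliar with the PSNR metric reported above, the textbook definition over all RGB values can be sketched in plain Python. This is not MMEditing's evaluation code, which may differ in details such as border cropping or data type handling; the function name `psnr_rgb` and the 8-bit peak value of 255 are assumptions for illustration.

```python
import math


def psnr_rgb(pred, target, max_value=255.0):
    """PSNR in dB over all RGB values.

    Images are nested lists shaped [H][W][3] with values in [0, max_value].
    (Hypothetical helper; textbook definition only.)
    """
    squared_error = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for pred_px, target_px in zip(pred_row, target_row):
            for p, t in zip(pred_px, target_px):
                squared_error += (p - t) ** 2
                count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 1 x 2 RGB images that differ by exactly 1 in every value.
target_img = [[[10, 20, 30], [40, 50, 60]]]
pred_img = [[[11, 21, 31], [41, 51, 61]]]
score = psnr_rgb(pred_img, target_img)
```

Higher values indicate a smaller mean squared error between the interpolated frame and the ground truth.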

Note: FLAVR for the x8 VFI task will be supported in the future.

### Citation

@article{kalluri2020flavr,
title={Flavr: Flow-agnostic video representations for fast frame interpolation},
author={Kalluri, Tarun and Pathak, Deepak and Chandraker, Manmohan and Tran, Du},
journal={arXiv preprint arXiv:2012.08512},
year={2020}
}


## TOFlow (IJCV’2019)

### Abstract

Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is however intractable; and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.

### Results and models

Evaluated on RGB channels. The metrics are PSNR / SSIM.

| Method | Pretrained SPyNet | PSNR / SSIM | Download |
| :-- | :-: | :-: | :-: |
| tof_vfi_spynet_chair_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3294 / 0.9465 | model / log |
| tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3339 / 0.9466 | model / log |
| tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3170 / 0.9464 | model / log |
| tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3237 / 0.9465 | model / log |
| tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k | spynet_chairs_final | 33.3426 / 0.9467 | model / log |

Note: These pretrained SPyNets do not contain BN layers because batch_size=1, which is consistent with https://github.com/Coldog2333/pytoflow.

### Citation

@article{xue2019video,