Note

You are reading the documentation for MMEditing 0.x, which will be deprecated by the end of 2022. We recommend upgrading to MMEditing 1.0 for the new features and better performance brought by OpenMMLab 2.0. See the changelog, code, and documentation of MMEditing 1.0 for details.

# Matting Models

## DIM (CVPR’2017)

### Abstract

Image matting is a fundamental computer vision problem and has many applications. Previous algorithms have poor performance when an image has similar foreground and background colors or complicated textures. The main reasons are prior methods 1) only use low-level features and 2) lack high-level context. In this paper, we propose a novel deep learning based algorithm that can tackle both these problems. Our deep model has two parts. The first part is a deep convolutional encoder-decoder network that takes an image and the corresponding trimap as inputs and predict the alpha matte of the image. The second part is a small convolutional network that refines the alpha matte predictions of the first network to have more accurate alpha values and sharper edges. In addition, we also create a large-scale image matting dataset including 49300 training images and 1000 testing images. We evaluate our algorithm on the image matting benchmark, our testing set, and a wide variety of real images. Experimental results clearly demonstrate the superiority of our algorithm over previous methods.

### Results and models

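| Method | SAD | MSE | GRAD | CONN | Download |
| --- | --- | --- | --- | --- | --- |
| stage1 (paper) | 54.6 | 0.017 | 36.7 | 55.3 | - |
| stage3 (paper) | 50.4 | 0.014 | 31.0 | 50.8 | - |
| stage1 (our) | 53.8 | 0.017 | 32.7 | 54.5 | model \| log |
| stage2 (our) | 52.3 | 0.016 | 29.4 | 52.4 | model \| log |
| stage3 (our) | 50.6 | 0.015 | 29.0 | 50.7 | model \| log |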
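
As a rough illustration of what the first two numeric columns measure, the sketch below assumes they are the standard matting metrics SAD and MSE and computes them over the unknown region of the trimap only. The function name `sad_and_mse`, the nested-list image representation, and the division of SAD by 1000 are assumptions based on common matting-benchmark convention, not code taken from MMEditing.

```python
def sad_and_mse(pred_alpha, gt_alpha, unknown_mask):
    """Return (SAD / 1000, MSE) over pixels where unknown_mask is True.

    pred_alpha and gt_alpha are nested lists of floats in [0, 1];
    unknown_mask is a nested list of booleans marking the trimap's
    unknown region. All names here are hypothetical.
    """
    abs_sum = 0.0
    sq_sum = 0.0
    count = 0
    for pred_row, gt_row, mask_row in zip(pred_alpha, gt_alpha, unknown_mask):
        for p, g, m in zip(pred_row, gt_row, mask_row):
            if m:
                diff = p - g
                abs_sum += abs(diff)
                sq_sum += diff * diff
                count += 1
    if count == 0:
        raise ValueError("unknown_mask selects no pixels")
    # SAD is summed and scaled by 1000; MSE is averaged per unknown pixel.
    return abs_sum / 1000.0, sq_sum / count


# Toy 2x2 example with made-up values; only the right column is "unknown".
pred = [[0.0, 0.5], [1.0, 0.75]]
gt = [[0.0, 0.25], [1.0, 0.25]]
mask = [[False, True], [False, True]]
sad, mse = sad_and_mse(pred, gt, mask)
```

Lower is better for both values, which is why the note below speaks of the "best" validation result.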

NOTE

• stage1: train the encoder-decoder part without the refinement part.
• stage2: freeze the encoder-decoder part and train the refinement part.
• stage3: fine-tune the whole network.
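
The three stages above amount to choosing which part of the network receives gradient updates. The following is a minimal, framework-free sketch of that schedule. The `Part` class, the names `encoder_decoder` and `refiner`, and `configure_stage` are hypothetical and do not come from the MMEditing configs; in PyTorch the `trainable` flag would correspond to setting `requires_grad` on each parameter.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """Stand-in for a sub-network; `trainable` mimics requires_grad."""
    name: str
    trainable: bool = False


# Which parts are updated in each stage, following the list above.
STAGE_TRAINABLE = {
    "stage1": {"encoder_decoder"},             # refinement part not used
    "stage2": {"refiner"},                     # encoder-decoder is frozen
    "stage3": {"encoder_decoder", "refiner"},  # fine-tune everything
}


def configure_stage(parts, stage):
    """Set the trainable flag on every part and return the trainable names."""
    trainable = STAGE_TRAINABLE[stage]
    for part in parts:
        part.trainable = part.name in trainable
    return [part.name for part in parts if part.trainable]


parts = [Part("encoder_decoder"), Part("refiner")]
for stage in ("stage1", "stage2", "stage3"):
    print(stage, configure_stage(parts, stage))
```

Each later stage would normally start from the checkpoint produced by the previous one, so the frozen part in stage2 keeps its stage1 weights.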

Model performance is unstable during training, so the reported numbers are not taken from the last checkpoint. They are the best result among all validation runs performed during training.

The best performance reached during training varies widely across random seeds. You may need to run several experiments for each setting to reproduce the numbers above.
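
One way to apply both remarks is to keep every validation result, take the best one per run, and compare runs across seeds. The sketch below assumes a lower-is-better metric such as SAD and a log stored as `(iteration, score)` pairs. The function names and all numbers are made up for illustration and are not results from these models.

```python
def best_validation(history):
    """Return the (iteration, score) pair with the lowest score."""
    return min(history, key=lambda item: item[1])


def summarize_seeds(runs):
    """Map each seed to its best validation, plus the best and worst of those.

    runs: dict of seed -> list of (iteration, score) pairs.
    """
    best_per_seed = {seed: best_validation(h) for seed, h in runs.items()}
    scores = [score for _, score in best_per_seed.values()]
    return best_per_seed, min(scores), max(scores)


# Hypothetical validation logs for two seeds (values are illustrative only).
runs = {
    0: [(1000, 60.0), (2000, 55.0), (3000, 58.0)],
    1: [(1000, 62.0), (2000, 59.0), (3000, 61.0)],
}
best_per_seed, best_score, worst_score = summarize_seeds(runs)
print(best_per_seed, best_score, worst_score)
```

In this toy log neither run is at its best on the last iteration, and the gap between `best_score` and `worst_score` is the seed-to-seed spread the note warns about.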

### Citation

@inproceedings{xu2017deep,
title={Deep image matting},
author={Xu, Ning and Price, Brian and Cohen, Scott and Huang, Thomas},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={2970--2979},
year={2017}
}


## GCA (AAAI’2020)

### Abstract

Over the last few years, deep learning based approaches have achieved outstanding improvements in natural image matting. Many of these methods can generate visually plausible alpha estimations, but typically yield blurry structures or textures in the semitransparent area. This is due to the local ambiguity of transparent objects. One possible solution is to leverage the far-surrounding information to estimate the local opacity. Traditional affinity-based methods often suffer from the high computational complexity, which are not suitable for high resolution alpha estimation. Inspired by affinity-based method and the successes of contextual attention in inpainting, we develop a novel end-to-end approach for natural image matting with a guided contextual attention module, which is specifically designed for image matting. Guided contextual attention module directly propagates high-level opacity information globally based on the learned low-level affinity. The proposed method can mimic information flow of affinity-based methods and utilize rich features learned by deep neural networks simultaneously. Experiment results on Composition-1k testing set and alphamatting.com benchmark dataset demonstrate that our method outperforms state-of-the-art approaches in natural image matting.
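
The idea of propagating high-level information according to low-level affinity can be illustrated with a heavily simplified sketch. It is not the paper's module: it works on single feature vectors instead of patches, uses cosine similarity with a softmax as the affinity, and ignores the trimap. The names `propagate`, `low_feats`, `high_feats`, and `scale` are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def propagate(low_feats, high_feats, scale=10.0):
    """For each location i, average high_feats over all locations j,
    weighted by softmax_j(scale * cosine(low_feats[i], low_feats[j]))."""
    dim = len(high_feats[0])
    out = []
    for query in low_feats:
        weights = softmax([scale * cosine(query, key) for key in low_feats])
        out.append([
            sum(w * h[d] for w, h in zip(weights, high_feats))
            for d in range(dim)
        ])
    return out


# Locations 0 and 1 look alike at the low level; location 2 looks different.
low = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
high = [[1.0], [1.0], [0.0]]  # toy "opacity" features
out = propagate(low, high)
```

Because every location attends to every other location, information can travel across the whole image in one step, which is the "global" propagation the abstract refers to.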

### Results and models

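| Method | SAD | MSE | GRAD | CONN | Download |
| --- | --- | --- | --- | --- | --- |
| baseline (paper) | 40.62 | 0.0106 | 21.53 | 38.43 | - |
| GCA (paper) | 35.28 | 0.0091 | 16.92 | 32.53 | - |
| baseline (our) | 36.50 | 0.0090 | 17.40 | 34.33 | model \| log |
| GCA (our) | 34.77 | 0.0080 | 16.33 | 32.20 | model \| log |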

More results

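| Method | SAD | MSE | GRAD | CONN | Download |
| --- | --- | --- | --- | --- | --- |
| baseline (with DIM pipeline) | 49.95 | 0.0144 | 30.21 | 49.67 | model \| log |
| GCA (with DIM pipeline) | 49.42 | 0.0129 | 28.07 | 49.47 | model \| log |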

### Citation

@inproceedings{li2020natural,
title={Natural Image Matting via Guided Contextual Attention},
author={Li, Yaoyi and Lu, Hongtao},
booktitle={Association for the Advancement of Artificial Intelligence (AAAI)},
year={2020}
}


## IndexNet (ICCV’2019)

### Abstract

We show that existing upsampling operators can be unified with the notion of the index function. This notion is inspired by an observation in the decoding process of deep image matting where indices-guided unpooling can recover boundary details much better than other upsampling operators such as bilinear interpolation. By looking at the indices as a function of the feature map, we introduce the concept of learning to index, and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the pooling and upsampling operators, without the need of supervision. At the core of this framework is a flexible network module, termed IndexNet, which dynamically predicts indices given an input. Due to its flexibility, IndexNet can be used as a plug-in applying to any off-the-shelf convolutional networks that have coupled downsampling and upsampling stages.
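
To make the observation about indices-guided unpooling concrete, here is a small 1D sketch of max pooling that records indices, followed by unpooling with those indices. It uses fixed max-pooling indices, not the learned indices that IndexNet predicts, and it compares against nearest-neighbour repetition as a simple stand-in for interpolation-style upsampling. All function names are hypothetical.

```python
def max_pool_with_indices(xs, size=2):
    """Non-overlapping 1D max pooling that also records argmax positions."""
    pooled, indices = [], []
    for start in range(0, len(xs) - size + 1, size):
        window = xs[start:start + size]
        offset = max(range(size), key=lambda k: window[k])
        pooled.append(window[offset])
        indices.append(start + offset)
    return pooled, indices


def unpool_with_indices(values, indices, length):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = [0.0] * length
    for value, index in zip(values, indices):
        out[index] = value
    return out


def upsample_nearest(values, size=2):
    """Index-free upsampling: repeat every value `size` times."""
    return [v for v in values for _ in range(size)]


signal = [0.0, 0.9, 0.8, 0.1, 0.0, 0.0]  # a sharp edge near position 1
pooled, indices = max_pool_with_indices(signal)
restored = unpool_with_indices(pooled, indices, len(signal))
blurred = upsample_nearest(pooled)
```

`restored` puts the peak values back exactly where they were, while `blurred` smears them over each window, which is a toy version of why recorded indices help recover boundary detail.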

### Results and models

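| Method | SAD | MSE | GRAD | CONN | Download |
| --- | --- | --- | --- | --- | --- |
| M2O DINs (paper) | 45.8 | 0.013 | 25.9 | 43.7 | - |
| M2O DINs (our) | 45.6 | 0.012 | 25.5 | 44.8 | model \| log |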

The best performance reached during training varies widely across random seeds. You may need to run several experiments for each setting to reproduce the numbers above.

More results

@inproceedings{hao2019indexnet,