Note

You are reading the documentation for MMEditing 0.x, which will be deprecated by the end of 2022. We recommend upgrading to MMEditing 1.0 to benefit from the new features and better performance brought by OpenMMLab 2.0. Check out the changelog, code, and documentation of MMEditing 1.0 for more details.

Tutorial 1: Customize Datasets

Supported Data Format

Image Super-Resolution

  • SRAnnotationDataset General paired image dataset with an annotation file for image restoration.

  • SRFolderDataset General paired image folder dataset for image restoration.

  • SRFolderGTDataset General ground-truth image folder dataset for image restoration, where the low-quality images should be generated in the pipeline.

  • SRFolderRefDataset General paired image folder dataset for reference-based image restoration.

  • SRLmdbDataset General paired image lmdb dataset for image restoration.

  • SRFacialLandmarkDataset Facial image and landmark dataset with an annotation file.

Video Super-Resolution

  • SRFolderMultipleGTDataset General dataset for video super resolution, used for recurrent networks.

  • SRREDSDataset REDS dataset for video super resolution.

  • SRREDSMultipleGTDataset REDS dataset for video super resolution for recurrent networks.

  • SRTestMultipleGTDataset Test dataset for video super resolution for recurrent networks.

  • SRVid4Dataset Vid4 dataset for video super resolution.

  • SRVimeo90KDataset Vimeo90K dataset for video super resolution.

  • SRVimeo90KMultipleGTDataset Vimeo90K dataset for video super resolution for recurrent networks.

Video Frame Interpolation

  • VFIVimeo90KDataset Vimeo90K dataset for video frame interpolation.

Matting

  • AdobeComp1kDataset Adobe composition-1k dataset.

Inpainting

  • ImgInpaintingDataset Only uses the image name information from the annotation file.

Generation

  • GenerationPairedDataset General paired image folder dataset for image generation.

  • GenerationUnpairedDataset General unpaired image folder dataset for image generation.

Support new data format

You can reorganize the new data into an existing format.

Alternatively, you can create a new dataset in mmedit/datasets to load the data.

Inheriting from one of the dataset base classes makes it easier to create a new dataset:

  • BaseSRDataset

  • BaseVFIDataset

  • BaseMattingDataset

  • BaseGenerationDataset

Here is an example of creating a dataset for video frame interpolation:

import os
import os.path as osp

from .base_vfi_dataset import BaseVFIDataset
from .registry import DATASETS


@DATASETS.register_module()
class NewVFIDataset(BaseVFIDataset):
    """Introduce the dataset

    Examples of file structure.

    Args:
        pipeline (list[dict | callable]): A sequence of data transformations.
        folder (str | :obj:`Path`): Path to the folder.
        ann_file (str | :obj:`Path`): Path to the annotation file.
        test_mode (bool): Store `True` when building test dataset.
            Default: `False`.
    """

    def __init__(self, pipeline, folder, ann_file, test_mode=False):
        super().__init__(pipeline, folder, ann_file, test_mode)
        self.data_infos = self.load_annotations()

    def load_annotations(self):
        """Load annotations for the dataset.

        Returns:
            list[dict]: A list of dicts for paired paths and other information.
        """
        data_infos = []
        ...
        return data_infos
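
The body of load_annotations is elided above. As a rough illustration of what it might build, here is a minimal, standalone sketch that does not depend on MMEditing. It assumes each annotation line names one clip sub-folder holding three frames (im1.png, im2.png, im3.png). The helper name build_vfi_data_infos, the file names, and the dict keys inputs_path, target_path and key are illustrative assumptions; adapt them to whatever your pipeline expects.

```python
import os.path as osp


def build_vfi_data_infos(folder, ann_lines):
    """Turn annotation lines into a list of sample dicts.

    Assumption (hypothetical layout): each non-empty line names one clip
    sub-folder that holds three frames, im1.png, im2.png and im3.png.
    """
    data_infos = []
    for line in ann_lines:
        key = line.strip()
        if not key:
            continue  # skip blank lines
        clip_dir = osp.join(folder, *key.split('/'))
        data_infos.append(
            dict(
                # The two outer frames are the inputs.
                inputs_path=[
                    osp.join(clip_dir, 'im1.png'),
                    osp.join(clip_dir, 'im3.png'),
                ],
                # The middle frame is the interpolation target.
                target_path=osp.join(clip_dir, 'im2.png'),
                key=key,
            ))
    return data_infos


infos = build_vfi_data_infos('data/vfi',
                             ['00001/0001\n', '\n', '00001/0002\n'])
print(len(infos))
```

Inside a dataset class, the same loop would read the lines from self.ann_file and join paths against self.folder.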

If you want to create a dataset for a new low-level CV task (e.g. denoising, deraining, defogging, or reflection removal), you can inherit from BaseDataset.

Here is an example of creating a base dataset for denoising:

import copy
import os.path as osp
from collections import defaultdict
from pathlib import Path

from mmcv import scandir

from .base_dataset import BaseDataset

IMG_EXTENSIONS = ('.jpg', '.JPG', '.jpeg', '.JPEG', '.png', '.PNG', '.ppm',
                  '.PPM', '.bmp', '.BMP', '.tif', '.TIF', '.tiff', '.TIFF')


class BaseDnDataset(BaseDataset):
    """Base class for denoising datasets.
    """

    # If any extra parameter is required, please rewrite the `__init__`
    # def __init__(self, pipeline, new_para, test_mode=False):
    #     super().__init__(pipeline, test_mode)
    #     self.new_para = new_para

    @staticmethod
    def scan_folder(path):
        """Obtain image path list (including sub-folders) from a given folder.

        Args:
            path (str | :obj:`Path`): Folder path.

        Returns:
            list[str]: Image list obtained from the given folder.
        """

        if isinstance(path, (str, Path)):
            path = str(path)
        else:
            raise TypeError("'path' must be a str or a Path object, "
                            f'but received {type(path)}.')

        images = list(scandir(path, suffix=IMG_EXTENSIONS, recursive=True))
        images = [osp.join(path, v) for v in images]
        assert images, f'{path} has no valid image file.'
        return images

    def __getitem__(self, idx):
        """Get item at each call.

        Args:
            idx (int): Index for getting each item.

        Returns:
            dict: The output dict of pipeline.
        """
        results = copy.deepcopy(self.data_infos[idx])
        return self.pipeline(results)

    def evaluate(self, results, logger=None):
        """Evaluate with different metrics.

        Args:
            results (list[tuple]): The output of forward_test() of the model.

        Returns:
            dict: Evaluation results dict.
        """
        if not isinstance(results, list):
            raise TypeError(f'results must be a list, but got {type(results)}')
        assert len(results) == len(self), (
            'The length of results is not equal to the dataset len: '
            f'{len(results)} != {len(self)}')

        results = [res['eval_result'] for res in results]  # a list of dict
        eval_result = defaultdict(list)  # a dict of list

        for res in results:
            for metric, val in res.items():
                eval_result[metric].append(val)
        for metric, val_list in eval_result.items():
            assert len(val_list) == len(self), (
                f'Length of evaluation result of {metric} is {len(val_list)}, '
                f'should be {len(self)}')

        # average the results
        eval_result = {
            metric: sum(values) / len(self)
            for metric, values in eval_result.items()
        }

        return eval_result
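
To make the averaging step in evaluate easier to follow, the sketch below reproduces the same logic as a standalone function on made-up numbers. The function name average_metrics and the sample values are illustrative assumptions, not part of MMEditing. It divides by the number of collected values, which matches dividing by len(self) once the length checks above have passed.

```python
from collections import defaultdict


def average_metrics(results):
    """Average each metric over a list of per-sample result dicts.

    Each element of `results` is expected to carry an 'eval_result' dict
    that maps a metric name to a number, as in `evaluate` above.
    """
    per_metric = defaultdict(list)  # metric name -> list of values
    for res in results:
        for metric, val in res['eval_result'].items():
            per_metric[metric].append(val)
    return {
        metric: sum(values) / len(values)
        for metric, values in per_metric.items()
    }


# Made-up values, for illustration only.
fake_results = [
    dict(eval_result=dict(PSNR=30.0, SSIM=0.5)),
    dict(eval_result=dict(PSNR=32.0, SSIM=1.0)),
]
averaged = average_metrics(fake_results)
print(averaged)
```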

You are welcome to submit new dataset classes to MMEditing.

Customize datasets by dataset wrappers

Repeat dataset

We use RepeatDataset as a wrapper to repeat a dataset. For example, suppose the original dataset is Dataset_A. To repeat it N times, the config looks like the following:

dataset_A_train = dict(
    type='RepeatDataset',
    times=N,
    dataset=dict(  # This is the original config of Dataset_A
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)
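
Conceptually, a repeat wrapper reports a length of times multiplied by the original length and maps every index back into the original range. The class below is a minimal sketch of that idea, written without any MMEditing or PyTorch dependency. The name RepeatSketch is hypothetical, and this is not MMEditing's actual RepeatDataset implementation.

```python
class RepeatSketch:
    """Illustrative repeat wrapper (hypothetical, not MMEditing's class)."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __getitem__(self, idx):
        # Wrap the index so it always falls inside the original dataset.
        return self.dataset[idx % self._ori_len]

    def __len__(self):
        # One pass over the wrapper visits the original data `times` times.
        return self.times * self._ori_len


repeated = RepeatSketch(['a', 'b', 'c'], times=2)
print(len(repeated), repeated[4])
```

Repeating a small dataset this way makes each epoch longer, so per-epoch overhead such as restarting data loader workers is paid less often.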