Tutorial 1: Customize Datasets
Supported Data Format
Image Super-Resolution
SRAnnotationDataset: General paired image dataset with an annotation file for image restoration.
SRFolderDataset: General paired image folder dataset for image restoration (see the example config after this list).
SRFolderGTDataset: General ground-truth image folder dataset for image restoration, where the low-quality images should be generated in the pipeline.
SRFolderRefDataset: General paired image folder dataset for reference-based image restoration.
SRLmdbDataset: General paired image LMDB dataset for image restoration.
SRFacialLandmarkDataset: Facial image and landmark dataset with an annotation file.
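For example, a paired folder dataset such as SRFolderDataset is typically configured with a low-quality folder, a ground-truth folder, a data pipeline, and an upsampling scale. A minimal sketch (the folder paths, the scale value, and train_pipeline are placeholders for your own setup):

```python
# Minimal sketch of a dataset config entry using SRFolderDataset.
# The folder paths, scale and `train_pipeline` are placeholders; adapt them
# to your own data layout and pipeline definition.
train_dataset = dict(
    type='SRFolderDataset',
    lq_folder='data/my_dataset/lq',  # low-quality (input) images
    gt_folder='data/my_dataset/gt',  # ground-truth images
    pipeline=train_pipeline,
    scale=4,
    test_mode=False)
```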
Video Super-Resolution
SRFolderMultipleGTDataset: General dataset for video super-resolution, used for recurrent networks.
SRREDSDataset: REDS dataset for video super-resolution.
SRREDSMultipleGTDataset: REDS dataset for video super-resolution with recurrent networks.
SRTestMultipleGTDataset: Test dataset for video super-resolution with recurrent networks.
SRVid4Dataset: Vid4 dataset for video super-resolution.
SRVimeo90KDataset: Vimeo90K dataset for video super-resolution.
SRVimeo90KMultipleGTDataset: Vimeo90K dataset for video super-resolution with recurrent networks.
Video Frame Interpolation
VFIVimeo90KDataset: Vimeo90K dataset for video frame interpolation.
Matting
AdobeComp1kDataset: Adobe Composition-1k dataset.
Inpainting
ImgInpaintingDataset: Uses only the image name information from the annotation file.
Generation
GenerationPairedDataset: General paired image folder dataset for image generation.
GenerationUnpairedDataset: General unpaired image folder dataset for image generation.
Support new data format
You can either reorganize a new data format into an existing one, or create a new dataset class in mmedit/datasets to load the data.
Inheriting from one of the dataset base classes makes it easier to create a new dataset:
BaseSRDataset
BaseVFIDataset
BaseMattingDataset
BaseGenerationDataset
Here is an example of creating a dataset for video frame interpolation:
```python
import os
import os.path as osp

from .base_vfi_dataset import BaseVFIDataset
from .registry import DATASETS


@DATASETS.register_module()
class NewVFIDataset(BaseVFIDataset):
    """Introduce the dataset

    Examples of file structure.

    Args:
        pipeline (list[dict | callable]): A sequence of data transformations.
        folder (str | :obj:`Path`): Path to the folder.
        ann_file (str | :obj:`Path`): Path to the annotation file.
        test_mode (bool): Store `True` when building test dataset.
            Default: `False`.
    """

    def __init__(self, pipeline, folder, ann_file, test_mode=False):
        super().__init__(pipeline, folder, ann_file, test_mode)
        self.data_infos = self.load_annotations()

    def load_annotations(self):
        """Load annotations for the dataset.

        Returns:
            list[dict]: A list of dicts for paired paths and other
                information.
        """
        data_infos = []
        ...
        return data_infos
```
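The body of load_annotations depends entirely on how your data is organized. Below is a rough sketch, assuming the base class stores the folder and annotation file paths as self.folder and self.ann_file, and that the annotation file lists one clip folder per line containing two input frames and a target frame; the frame file names and the inputs_path/target_path/key fields are illustrative assumptions and must match whatever keys your data pipeline reads:

```python
    # Inside NewVFIDataset: a possible implementation of load_annotations().
    # Illustrative sketch only: the annotation format, the frame file names
    # and the result keys ('inputs_path', 'target_path', 'key') are
    # assumptions and must match the keys your data pipeline expects.
    def load_annotations(self):
        data_infos = []
        with open(self.ann_file, 'r') as f:
            for line in f:
                clip = line.strip()
                if not clip:
                    continue
                clip_dir = osp.join(self.folder, clip)
                data_infos.append(
                    dict(
                        inputs_path=[
                            osp.join(clip_dir, 'frame1.png'),
                            osp.join(clip_dir, 'frame3.png')
                        ],
                        target_path=osp.join(clip_dir, 'frame2.png'),
                        key=clip))
        return data_infos
```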
If you want to create a dataset for a new low-level CV task (e.g. denoising, deraining, defogging, or de-reflection), you can inherit from BaseDataset.
Here is an example of creating a base dataset for denoising:
```python
import copy
import os.path as osp
from collections import defaultdict
from pathlib import Path

from mmcv import scandir

from .base_dataset import BaseDataset

IMG_EXTENSIONS = ('.jpg', '.JPG', '.jpeg', '.JPEG', '.png', '.PNG', '.ppm',
                  '.PPM', '.bmp', '.BMP', '.tif', '.TIF', '.tiff', '.TIFF')


class BaseDnDataset(BaseDataset):
    """Base class for denoising datasets."""

    # If any extra parameter is required, please rewrite the `__init__`
    # def __init__(self, pipeline, new_para, test_mode=False):
    #     super().__init__(pipeline, test_mode)
    #     self.new_para = new_para

    @staticmethod
    def scan_folder(path):
        """Obtain image path list (including sub-folders) from a given folder.

        Args:
            path (str | :obj:`Path`): Folder path.

        Returns:
            list[str]: Image list obtained from the given folder.
        """
        if isinstance(path, (str, Path)):
            path = str(path)
        else:
            raise TypeError("'path' must be a str or a Path object, "
                            f'but received {type(path)}.')

        images = list(scandir(path, suffix=IMG_EXTENSIONS, recursive=True))
        images = [osp.join(path, v) for v in images]
        assert images, f'{path} has no valid image file.'
        return images

    def __getitem__(self, idx):
        """Get item at each call.

        Args:
            idx (int): Index for getting each item.

        Returns:
            dict: The output dict of pipeline.
        """
        results = copy.deepcopy(self.data_infos[idx])
        return self.pipeline(results)

    def evaluate(self, results, logger=None):
        """Evaluate with different metrics.

        Args:
            results (list[tuple]): The output of forward_test() of the model.

        Returns:
            dict: Evaluation results dict.
        """
        if not isinstance(results, list):
            raise TypeError(f'results must be a list, but got {type(results)}')
        assert len(results) == len(self), (
            'The length of results is not equal to the dataset len: '
            f'{len(results)} != {len(self)}')

        results = [res['eval_result'] for res in results]  # a list of dicts
        eval_result = defaultdict(list)  # a dict of lists
        for res in results:
            for metric, val in res.items():
                eval_result[metric].append(val)
        for metric, val_list in eval_result.items():
            assert len(val_list) == len(self), (
                f'Length of evaluation result of {metric} is {len(val_list)}, '
                f'should be {len(self)}')

        # average the results
        eval_result = {
            metric: sum(values) / len(self)
            for metric, values in eval_result.items()
        }

        return eval_result
```
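With the base class above, a concrete denoising dataset can then be quite small. Here is a rough sketch of a folder-based subclass; the class name DnFolderDataset, its constructor arguments, and the lq_path/gt_path keys are illustrative assumptions and have to match the keys your data pipeline expects:

```python
from .registry import DATASETS


@DATASETS.register_module()
class DnFolderDataset(BaseDnDataset):
    """Paired noisy/clean image folder dataset for denoising.

    Note: this subclass and its `lq_path`/`gt_path` keys are only an
    illustration; adapt them to the keys your pipeline expects.
    """

    def __init__(self, lq_folder, gt_folder, pipeline, test_mode=False):
        super().__init__(pipeline, test_mode)
        self.lq_folder = str(lq_folder)
        self.gt_folder = str(gt_folder)
        self.data_infos = self.load_annotations()

    def load_annotations(self):
        """Pair noisy (lq) and clean (gt) images by their sorted order."""
        lq_paths = self.scan_folder(self.lq_folder)
        gt_paths = self.scan_folder(self.gt_folder)
        assert len(lq_paths) == len(gt_paths), (
            'lq and gt folders contain different numbers of images: '
            f'{len(lq_paths)} != {len(gt_paths)}')
        return [
            dict(lq_path=lq, gt_path=gt)
            for lq, gt in zip(sorted(lq_paths), sorted(gt_paths))
        ]
```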
You are welcome to submit new dataset classes to MMEditing.
Customize datasets by dataset wrappers
Repeat dataset
We use RepeatDataset as a wrapper to repeat a dataset. For example, suppose the original dataset is Dataset_A; to repeat it, the config looks like the following:
```python
dataset_A_train = dict(
    type='RepeatDataset',
    times=N,
    dataset=dict(  # This is the original config of Dataset_A
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)
```
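The wrapped dataset reports a length of N times the length of Dataset_A, so one epoch iterates over the original data N times. This is mainly useful when the dataset is small but data loading is slow, because it reduces how often the data loading workers are restarted between epochs.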