Real-world LQ–HQ pairs from MFMs to expand IR generalization boundaries.
Xiangtao Kong1,2,* | Jixin Zhao1,2,* | Lingchen Sun1,2 | Rongyuan Wu1,2 | Lei Zhang1,2,†
1 The Hong Kong Polytechnic University
2 OPPO Research Institute
* Equal contribution. † Corresponding author.
- 2026-06-01: Released the paper.
- 2026-05-28: Released the GGT-100K dataset, baseline training code, and checkpoints.
Demo: comparison of LQ-GT pairs from GGT-100K (an interactive slider version is available on the Project Page).
Overview of GGT-100K.
Models trained with GGT-100K generalize significantly better to real-world degradations.
- 📰 News
- 🧰 Download GGT-100K Dataset
- 🏗️ Construction Process of GGT-100K (including Restoration Evaluation of SOTA MFMs)
- 🖼️ Experimental Results
- 🏋️ Baseline Models Training
- 🔍 Baseline Models Inference
- 📮 Contact
- 📚 Citation
🧰 Download GGT-100K Dataset
- Hugging Face
- Baidu Disk (password: f38z)
The download links contain three parts:
- GGT-100K: the main paired dataset.
- existing-dataset: external datasets from previous work that are used in our paper. We recommend downloading it and training on it together with GGT-100K.
- pretrained-models: pretrained checkpoints for the baseline models, covering 10 models × 2 settings (20 checkpoints in total): trained on existing data only vs. trained on existing data + GGT-100K.
For convenient use with the baselines, we provide three JSONL files that list LQ-GT pairs as file paths relative to the dataset root:
- Train (existing data, without GGT-100K): train_existing.jsonl
- Train (existing data + GGT-100K): train_existing_GGT.jsonl
- Test (GGT-100K-500): test_GGT_500.jsonl
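Each line of these files is a JSON object holding a "gt" and an "lq" path. If you want to check how many pairs each list contains, or which training pairs come from GGT-100K rather than from the existing data, a small sketch like the one below may help. It assumes (this is not stated above) that every pair in train_existing.jsonl also appears verbatim in train_existing_GGT.jsonl, so the set difference isolates the GGT-100K pairs; load_pairs and ggt_only_pairs are hypothetical helper names, not part of this repository.

```python
import json


def load_pairs(jsonl_path):
    """Read a pair list: one JSON object per non-empty line."""
    pairs = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                pairs.append(json.loads(line))
    return pairs


def pair_key(pair):
    # Identify a pair by its (GT, LQ) paths; the prompt field is ignored.
    return (pair["gt"], pair["lq"])


def ggt_only_pairs(existing_jsonl, combined_jsonl):
    """Return pairs that are in the combined list but not in the existing-only list.

    Assumption: the combined list is a superset of the existing-only list.
    """
    existing = {pair_key(p) for p in load_pairs(existing_jsonl)}
    return [p for p in load_pairs(combined_jsonl) if pair_key(p) not in existing]
```

For example, len(ggt_only_pairs("train_existing.jsonl", "train_existing_GGT.jsonl")) gives the number of pairs that the second setting adds on top of the first.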
Each line describes one pair:
{"gt":"relative/path/to/GT.png","lq":"relative/path/to/LQ.png","prompt":""}
Note: Among the baseline methods in this project, only Qwen-Image-Edit (qwen-image-edit) uses the prompt field; for the other methods, prompt can be left empty.
Before using these lists, join each relative path with your local dataset root directory.
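Since the training and inference launchers recommend absolute paths, one way to do this join is to write a resolved copy of each list once. The following is a minimal sketch under that assumption; make_absolute_jsonl is a hypothetical helper, not a script shipped with the dataset. It also reports any GT/LQ file that cannot be found, which catches an incomplete download or a wrong root early.

```python
import json
from pathlib import Path


def make_absolute_jsonl(src_jsonl, dataset_root, dst_jsonl):
    """Rewrite a pair list so that "gt" and "lq" are absolute paths.

    Returns the resolved paths that do not exist as files.
    The "prompt" field (and any other field) is copied unchanged.
    """
    root = Path(dataset_root).expanduser().resolve()
    missing = []
    with open(src_jsonl, encoding="utf-8") as fin, \
            open(dst_jsonl, "w", encoding="utf-8") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            pair = json.loads(line)
            for key in ("gt", "lq"):
                # If the stored path is already absolute, pathlib keeps it as-is.
                path = root / pair[key]
                if not path.is_file():
                    missing.append(str(path))
                pair[key] = str(path)
            # The provided lists include "prompt"; only fill it in if it is absent.
            pair.setdefault("prompt", "")
            fout.write(json.dumps(pair, ensure_ascii=False) + "\n")
    return missing
```

For example, make_absolute_jsonl("train_existing_GGT.jsonl", "/path/to/dataset_root", "train_existing_GGT_abs.jsonl") writes a copy that can be passed to --train-jsonl; both paths here are placeholders.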
This dataset is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license (see the License text).
🏗️ Construction Process of GGT-100K (including Restoration Evaluation of SOTA MFMs)
GGT-100K is constructed in four steps.
We evaluate existing MFMs and report the quantitative results below.
🖼️ Experimental Results
To demonstrate the effectiveness of GGT-100K, we train 10 restoration models with and without GGT-100K, and report quantitative and visual results.
🏋️ Baseline Models Training
All baselines use the official open-source code. For environment configuration, please refer to each original project.
We also provide a minimal baseline training launcher for the following methods under GGT-100K/:
- DA-CLIP UIR (daclip-uir)
- FoundIR (FoundIR)
- MoCE-IR (MoCE-IR)
- BasicSR family (X-Restormer): XRestormer / NAFNet / PromptIR / MPRNet / SwinIR
- Qwen-Image-Edit (qwen-image-edit)
- Flux-ControlNet (flux-controlnet)
All baseline training commands are unified in train.sh, and the training data is specified via a JSONL path:
- --train-jsonl: training pairs JSONL (each line: {"gt":"...","lq":"...","prompt":"..."}; absolute paths recommended)
  - Note: only Qwen-Image-Edit (qwen-image-edit) uses prompt. For other methods, prompt can be empty.
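Because the with/without comparison differs only in --train-jsonl, both runs for one method can be scripted together. The sketch below only uses flags that appear in the examples of this README (--method, --train-jsonl, and --workdir for the methods that accept it); build_train_commands and run_both_settings are hypothetical helpers, and it defaults to a dry run that just prints the commands. Run it from inside GGT-100K/.

```python
import shlex
import subprocess

# The two training settings compared in the paper.
SETTINGS = {
    "existing": "train_existing.jsonl",          # existing data only
    "existing_GGT": "train_existing_GGT.jsonl",  # existing data + GGT-100K
}


def build_train_commands(method, extra_args=(), workdir_flag=None, workdir_base=None):
    """Build one `bash train.sh` command (as an argv list) per setting."""
    commands = {}
    for name, jsonl in SETTINGS.items():
        cmd = ["bash", "train.sh", "--method", method, "--train-jsonl", jsonl, *extra_args]
        if workdir_flag and workdir_base:
            # Give each setting its own output directory so the runs do not collide.
            cmd += [workdir_flag, f"{workdir_base}_{name}"]
        commands[name] = cmd
    return commands


def run_both_settings(method, extra_args=(), workdir_flag=None, workdir_base=None,
                      dry_run=True):
    """Print the two commands and, unless dry_run, execute them one after another."""
    commands = build_train_commands(method, extra_args, workdir_flag, workdir_base)
    for name, cmd in commands.items():
        print(f"[{name}]", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop if a run fails
```

For example, run_both_settings("flux-controlnet", workdir_flag="--workdir", workdir_base="./outputs_flux_controlnet", dry_run=False) would train the two Flux-ControlNet settings into separate output directories. Whether a method without --workdir overwrites its previous outputs depends on that method's own config.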
Examples:
cd GGT-100K
# (1) Train WITHOUT GGT-100K (existing data only)
bash train.sh --method swinir \
--train-jsonl train_existing.jsonl
# (2) Train WITH GGT-100K (existing + GGT-100K)
bash train.sh --method swinir \
--train-jsonl train_existing_GGT.jsonl
# FoundIR (supports --meta as JSONL)
bash train.sh --method foundir \
--train-jsonl train_existing_GGT.jsonl
# MoCE-IR (paired training via --trainset paired_meta)
bash train.sh --method moce-ir \
--train-jsonl train_existing_GGT.jsonl \
--model MoCE_IR_S --epochs 120
# Qwen-Image-Edit (LoRA finetune; requires full Qwen-Image-Edit-2511 base weights + torchrun)
# train.sh auto-detects weights under qwen-image-edit/Qwen-Image-Edit-2511 or ../Edit_model/Qwen/Qwen-Image-Edit-2511
bash train.sh --method qwen-image-edit \
--train-jsonl train_existing_GGT.jsonl \
--workdir ./outputs_qwenir \
--qwen-pretrained-model ../model/Qwen-Image-Edit-2511
# Flux-ControlNet (uses script defaults; override via extra args if needed)
bash train.sh --method flux-controlnet \
--train-jsonl train_existing_GGT.jsonl \
--workdir ./outputs_flux_controlnet

🔍 Baseline Models Inference
All baseline inference commands are unified in test.sh. Please first download the pretrained models from:
- Hugging Face
- Baidu Disk (password: f38z)
Common arguments:
- --test-jsonl: paired test JSONL (each line: {"gt":"...","lq":"...","prompt":"..."}; absolute paths recommended)
  - Note: only Qwen-Image-Edit (qwen-image-edit) uses prompt. For other methods, prompt can be empty.
- --ckpt: checkpoint path (format depends on the method: .pth/.pt/.ckpt/folder)
- --results-dir: output directory for saving results (recommended to always set)
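Because the --ckpt format differs per method, a quick pre-flight check before calling test.sh can catch a mismatched path early. In the sketch below, the mapping is read off the inference examples in this README only (SwinIR .pth, FoundIR .pt, MoCE-IR .ckpt, Qwen-Image-Edit LoRA .safetensors, Flux-ControlNet checkpoint folder). It is an assumption that these formats hold for every checkpoint of each method, and EXPECTED_CKPT and check_ckpt are hypothetical names, not part of test.sh.

```python
from pathlib import Path

# Checkpoint formats as they appear in this README's inference examples.
# Methods not listed here are not covered by this check.
EXPECTED_CKPT = {
    "swinir": ".pth",
    "foundir": ".pt",
    "moce-ir": ".ckpt",
    "qwen-image-edit": ".safetensors",  # LoRA weights; the base model is passed separately
    "flux-controlnet": "dir",           # a checkpoint folder, e.g. checkpoint-200001
}


def check_ckpt(method, ckpt):
    """Return a list of problems with --ckpt for the given method (empty list = looks OK)."""
    expected = EXPECTED_CKPT.get(method)
    if expected is None:
        return [f"no known checkpoint format for method {method!r}"]
    path = Path(ckpt)
    if not path.exists():
        return [f"{path} does not exist"]
    if expected == "dir":
        return [] if path.is_dir() else [f"{path} should be a directory for {method}"]
    if not path.is_file() or path.suffix != expected:
        return [f"{path} should be a {expected} file for {method}"]
    return []
```

For example, check_ckpt("swinir", "GGT-100K-preatrained/model/SwinIR.pth") returns an empty list once that file has been downloaded to the given path.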
Examples:
cd GGT-100K
# BasicSR family (SwinIR)
bash test.sh --method swinir \
--test-jsonl test_GGT_500.jsonl \
--ckpt GGT-100K-preatrained/model/SwinIR.pth \
--results-dir abs_path/results_swinir
# FoundIR
bash test.sh --method foundir \
--test-jsonl test_GGT_500.jsonl \
--ckpt GGT-100K-preatrained/model/model-2000.pt \
--results-dir abs_path/results_foundir
# MoCE-IR (paired JSONL testing)
bash test.sh --method moce-ir \
--ckpt GGT-100K-preatrained/model/last.ckpt \
--model MoCE_IR_S \
--benchmarks paired_jsonl \
--meta test_GGT_500.jsonl \
--save_results \
--results-dir abs_path/results_moceir
# Qwen-Image-Edit (inference needs base model dir + LoRA weights)
bash test.sh --method qwen-image-edit \
--test-jsonl test_GGT_500.jsonl \
--ckpt GGT-100K-preatrained/model/pytorch_lora_weights.safetensors \
--base-model ../model/Qwen-Image-Edit-2511 \
--results-dir abs_path/results_qwenir
# Flux-ControlNet (inference needs base FLUX model dir + ControlNet checkpoint)
bash test.sh --method flux-controlnet \
--test-jsonl test_GGT_500.jsonl \
--ckpt GGT-100K-preatrained/model/checkpoint-200001 \
--flux-pretrained-model ../model/FLUX.1-dev \
--results-dir abs_path/results_flux_controlnet

📮 Contact
If you have any questions, please feel free to contact: xiangtao.kong@connect.polyu.hk

📚 Citation
@article{kong2026GGT-100K,
title={GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration},
author={Kong, Xiangtao and Zhao, Jixin and Sun, Lingchen and Wu, Rongyuan and Zhang, Lei},
journal={arXiv preprint arXiv:2605.31039},
year={2026}
}






