GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration

Real-world LQ–HQ pairs from MFMs to expand IR generalization boundaries.

Xiangtao Kong^1,2,* | Jixin Zhao^1,2,* | Lingchen Sun^1,2 | Rongyuan Wu^1,2 | Lei Zhang^1,2,†

¹ The Hong Kong Polytechnic University
² OPPO Research Institute

^* Equal contribution. ^† Corresponding author.

📰 News

2026-06-01: Released the paper.
2026-05-28: Released the GGT-100K dataset, baseline training code, and checkpoints.

demo_video.mp4

Demo: a comparison of LQ-GT pairs from GGT-100K. An interactive slider version is available on the Project Page.

Overview of GGT-100K.

GGT-100K significantly improves the generalization capability of the models to real-world degradations.

📌 Quick Links

📰 News
🧰 Download GGT-100K Dataset
🏗️ Construction Process of GGT-100K (including Restoration Evaluation of SOTA MFMs)
🖼️ Experimental Results
🏋️ Baseline Models Training
🔍 Baseline Models Inference
📮 Contact
📚 Citation

🧰 Download GGT-100K Dataset

Download links

Hugging Face
Baidu Disk (password: f38z)

Expected file structure

The download links contain three parts:

GGT-100K: the main paired dataset.
existing-dataset: external/previous datasets used in our paper. We recommend downloading and using it together with GGT-100K for training.
pretrained-models: pretrained checkpoints for baseline models, including 10 models × 2 settings (20 checkpoints in total): trained on existing data only vs. trained on existing data + GGT-100K.

We provide three JSONL files that list paired paths using relative file paths (relative to the dataset root), for convenient baseline usage:

Train (existing data, without GGT-100K): train_existing.jsonl
Train (existing data + GGT-100K): train_existing_GGT.jsonl
Test (GGT-100K-500): test_GGT_500.jsonl

Each line is a pair:

{"gt":"relative/path/to/GT.png","lq":"relative/path/to/LQ.png","prompt":""}

Note: Among the baseline methods in this project, only Qwen-Image-Edit (qwen-image-edit) uses the prompt field. For other methods, prompt can be left empty.

When using these lists, you should join the relative paths with your local dataset root directory.
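
As a minimal sketch of that join, the helper below rewrites a pair list so that gt and lq become absolute paths, which is also the form the launcher arguments recommend. The function name absolutize_jsonl and the dataset_root argument are hypothetical and not part of this repository; the sketch only assumes the one-JSON-object-per-line format shown above.

```python
import json
from pathlib import Path


def absolutize_jsonl(src_jsonl, dst_jsonl, dataset_root):
    """Rewrite a pair list so that 'gt' and 'lq' are absolute paths.

    dataset_root is wherever the download was unpacked (an assumption;
    adjust to your setup). 'prompt' and any other keys are copied unchanged.
    Returns the number of pairs written.
    """
    root = Path(dataset_root).expanduser().resolve()
    count = 0
    with open(src_jsonl, encoding="utf-8") as fin, \
         open(dst_jsonl, "w", encoding="utf-8") as fout:
        for line in fin:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            pair = json.loads(line)
            for key in ("gt", "lq"):
                # Joining onto an already-absolute path leaves it unchanged.
                pair[key] = str(root / pair[key])
            fout.write(json.dumps(pair, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, absolutize_jsonl("train_existing_GGT.jsonl", "train_existing_GGT_abs.jsonl", "/data/GGT-100K") would write a new list next to the original, where /data/GGT-100K stands in for your own dataset root.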

License

This dataset is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license (see the license text).

🏗️ Construction Process of GGT-100K

GGT-100K is constructed in four steps.

Restoration evaluation of MFMs

We evaluate existing MFMs and report the quantitative results below.

🖼️ Experimental Results

To demonstrate the effectiveness of GGT-100K, we train 10 restoration models with and without GGT-100K, and report quantitative and visual results.

Quantitative comparison

Visual comparison

🏋️ Training

Baseline Methods Training

All baselines use their official open-source code. For environment setup, refer to the corresponding source project.

We also provide a minimal baseline training launcher for the following methods under GGT-100K/:

DA-CLIP UIR (daclip-uir)
FoundIR (FoundIR)
MoCE-IR (MoCE-IR)
BasicSR family (X-Restormer): XRestormer / NAFNet / PromptIR / MPRNet / SwinIR
Qwen-Image-Edit (qwen-image-edit)
Flux-ControlNet (flux-controlnet)

All baseline commands are unified in train.sh and can be configured via JSONL paths:

--train-jsonl: training pairs JSONL (each line: {"gt":"...","lq":"...","prompt":"..."}; absolute paths recommended)
- Note: only Qwen-Image-Edit (qwen-image-edit) uses prompt. For other methods, prompt can be empty.

Examples:

cd GGT-100K

# (1) Train WITHOUT GGT-100K (existing data only)
bash train.sh --method swinir \
  --train-jsonl train_existing.jsonl

# (2) Train WITH GGT-100K (existing + GGT-100K)
bash train.sh --method swinir \
  --train-jsonl train_existing_GGT.jsonl

# FoundIR (supports --meta as JSONL)
bash train.sh --method foundir \
  --train-jsonl train_existing_GGT.jsonl

# MoCE-IR (paired training via --trainset paired_meta)
bash train.sh --method moce-ir \
  --train-jsonl train_existing_GGT.jsonl \
  --model MoCE_IR_S --epochs 120

# Qwen-Image-Edit (LoRA finetune; requires full Qwen-Image-Edit-2511 base weights + torchrun)
# train.sh auto-detects weights under qwen-image-edit/Qwen-Image-Edit-2511 or ../Edit_model/Qwen/Qwen-Image-Edit-2511
bash train.sh --method qwen-image-edit \
  --train-jsonl train_existing_GGT.jsonl \
  --workdir ./outputs_qwenir \
  --qwen-pretrained-model ../model/Qwen-Image-Edit-2511

# Flux-ControlNet (uses script defaults; override via extra args if needed)
bash train.sh --method flux-controlnet \
  --train-jsonl train_existing_GGT.jsonl \
  --workdir ./outputs_flux_controlnet
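
Before launching a long training run, it can help to confirm that a pair list is well formed and that every referenced image exists. The following is a minimal sketch of such a check; check_pairs and its dataset_root argument are hypothetical helpers, not part of train.sh, and the sketch only assumes the {"gt", "lq", "prompt"} line format described above.

```python
import json
from pathlib import Path


def check_pairs(jsonl_path, dataset_root="."):
    """Return a list of (line_number, message) problems found in a pair list.

    Relative 'gt'/'lq' paths are resolved against dataset_root; absolute
    paths are checked as given.
    """
    root = Path(dataset_root)
    problems = []
    with open(jsonl_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():  # skip blank lines
                continue
            try:
                pair = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(pair, dict):
                problems.append((lineno, "line is not a JSON object"))
                continue
            for key in ("gt", "lq"):
                value = pair.get(key)
                if not value:
                    problems.append((lineno, f"missing '{key}'"))
                elif not (root / value).is_file():
                    problems.append((lineno, f"'{key}' file not found: {value}"))
    return problems
```

An empty return value means every line parsed and both images of every pair were found. The prompt field is deliberately not checked, since it may be empty for every method except qwen-image-edit.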

🔍 Inference

All baseline inference commands are unified in test.sh. Please first download the pretrained models from:

Hugging Face
Baidu Disk (password: f38z)

Common arguments:

--test-jsonl: paired test JSONL (each line: {"gt":"...","lq":"...","prompt":"..."}; absolute paths recommended)
- Note: only Qwen-Image-Edit (qwen-image-edit) uses prompt. For other methods, prompt can be empty.
--ckpt: checkpoint path (format depends on the method: .pth, .pt, .ckpt, or a folder)
--results-dir: output directory for saving results (recommended to always set)

Examples:

cd GGT-100K

# BasicSR family (SwinIR)
bash test.sh --method swinir \
  --test-jsonl test_GGT_500.jsonl \
  --ckpt GGT-100K-preatrained/model/SwinIR.pth \
  --results-dir abs_path/results_swinir

# FoundIR
bash test.sh --method foundir \
  --test-jsonl test_GGT_500.jsonl \
  --ckpt GGT-100K-preatrained/model/model-2000.pt \
  --results-dir abs_path/results_foundir

# MoCE-IR (paired JSONL testing)
bash test.sh --method moce-ir \
  --ckpt GGT-100K-preatrained/model/last.ckpt \
  --model MoCE_IR_S \
  --benchmarks paired_jsonl \
  --meta test_GGT_500.jsonl \
  --save_results \
  --results-dir abs_path/results_moceir

# Qwen-Image-Edit (inference needs base model dir + LoRA weights)
bash test.sh --method qwen-image-edit \
  --test-jsonl test_GGT_500.jsonl \
  --ckpt GGT-100K-preatrained/model/pytorch_lora_weights.safetensors \
  --base-model ../model/Qwen-Image-Edit-2511 \
  --results-dir abs_path/results_qwenir

# Flux-ControlNet (inference needs base FLUX model dir + ControlNet checkpoint)
bash test.sh --method flux-controlnet \
  --test-jsonl test_GGT_500.jsonl \
  --ckpt GGT-100K-preatrained/model/checkpoint-200001 \
  --flux-pretrained-model ../model/FLUX.1-dev \
  --results-dir abs_path/results_flux_controlnet

📮 Contact

If you have any questions, please feel free to contact: xiangtao.kong@connect.polyu.hk

📚 Citation

@article{kong2026GGT-100K,
  title={GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration},
  author={Kong, Xiangtao and Zhao, Jixin and Sun, Lingchen and Wu, Rongyuan and Zhang, Lei},
  journal={arXiv preprint arXiv:2605.31039},
  year={2026}
}
