PolyU-VCLab/GGT-100K

GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration

Real-world LQ–HQ pairs from MFMs to expand IR generalization boundaries.

Paper | GGT-100K-HuggingFace | GGT-100K-BaiduDisk | ProjectPage

Xiangtao Kong1,2,* | Jixin Zhao1,2,* | Lingchen Sun1,2 | Rongyuan Wu1,2 | Lei Zhang1,2,†

1 The Hong Kong Polytechnic University
2 OPPO Research Institute

* Equal contribution. † Corresponding author.

📰 News

  • 2026-06-01: Released the paper.
  • 2026-05-28: Released the GGT-100K dataset, baseline training code, and checkpoints.

demo_video.mp4

Demo: comparison of LQ-GT pairs from GGT-100K. (An interactive slider version is available on the Project Page.)

GGT-100K overview

Overview of GGT-100K.

GGT-100K compare1

GGT-100K significantly improves the generalization of restoration models to real-world degradations.

🧰 Download GGT-100K Dataset

Download links

Expected file structure

The download links contain three parts:

  • GGT-100K: the main paired dataset.
  • existing-dataset: external/previous datasets used in our paper. We recommend downloading and using it together with GGT-100K for training.
  • pretrained-models: pretrained checkpoints for baseline models, including 10 models × 2 settings (20 checkpoints in total): trained on existing data only vs. trained on existing data + GGT-100K.

We provide three JSONL files that list paired paths using relative file paths (relative to the dataset root), for convenient baseline usage:

  • Train (existing data, without GGT-100K): train_existing.jsonl
  • Train (existing data + GGT-100K): train_existing_GGT.jsonl
  • Test (GGT-100K-500): test_GGT_500.jsonl

Each line is a pair:

{"gt":"relative/path/to/GT.png","lq":"relative/path/to/LQ.png","prompt":""}

Note: Among the baseline methods in this project, only Qwen-Image-Edit (qwen-image-edit) uses the prompt field. For other methods, prompt can be left empty.

When using these lists, you should join the relative paths with your local dataset root directory.
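The path joining described above can be sketched in Python as follows. This is a minimal illustration, not part of the released code: the function name `load_pairs` and the example root `/data/GGT-100K` are assumptions, and only the `gt`, `lq`, and `prompt` keys come from the list format shown above.

```python
import json
from pathlib import Path


def load_pairs(jsonl_path, dataset_root):
    """Read a pair list and join each relative path with the dataset root.

    `load_pairs` is a hypothetical helper, not a function from the repository.
    """
    root = Path(dataset_root)
    pairs = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the list
            record = json.loads(line)
            pairs.append({
                "gt": str(root / record["gt"]),
                "lq": str(root / record["lq"]),
                # prompt may be empty for every method except qwen-image-edit
                "prompt": record.get("prompt", ""),
            })
    return pairs
```

For example, `load_pairs("test_GGT_500.jsonl", "/data/GGT-100K")` would return one dictionary per pair, with `gt` and `lq` resolved under the assumed root.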

License

This dataset is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license: License text


🏗️ Construction Process of GGT-100K

GGT-100K construction overview

GGT-100K is constructed in the four steps shown in the overview figure.

Restoration evaluation of MFMs

We evaluate existing MFMs and report the quantitative results below.

MFM restoration evaluation table


🖼️ Experimental Results

To demonstrate the effectiveness of GGT-100K, we train 10 restoration models with and without GGT-100K, and report quantitative and visual results.

Quantitative comparison

Experimental results table

Visual comparison

Visual comparison (main)

Visual comparison (more)


🏋️ Training

Baseline Methods Training

All baselines use their official open-source code. For environment setup, please refer to each source project.

We also provide a minimal baseline training launcher for the following methods under GGT-100K/:

  • DA-CLIP UIR (daclip-uir)
  • FoundIR (FoundIR)
  • MoCE-IR (MoCE-IR)
  • BasicSR family (X-Restormer): XRestormer / NAFNet / PromptIR / MPRNet / SwinIR
  • Qwen-Image-Edit (qwen-image-edit)
  • Flux-ControlNet (flux-controlnet)

All baseline commands are unified in train.sh and can be configured via JSONL paths:

  • --train-jsonl: training pairs JSONL (each line: {"gt":"...","lq":"...","prompt":"..."}; absolute paths recommended)
    • Note: only Qwen-Image-Edit (qwen-image-edit) uses prompt. For other methods, prompt can be empty.
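Since absolute paths are recommended for `--train-jsonl`, a list with relative paths can be rewritten once before training. The sketch below is one possible way to do this; `write_absolute_jsonl` and the output file name are assumptions for illustration and are not part of train.sh.

```python
import json
from pathlib import Path


def write_absolute_jsonl(src_jsonl, dst_jsonl, dataset_root):
    """Copy a pair list, turning relative gt/lq paths into absolute ones.

    Hypothetical helper; returns the number of pairs written.
    """
    root = Path(dataset_root).resolve()
    count = 0
    with open(src_jsonl, "r", encoding="utf-8") as src, \
            open(dst_jsonl, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            for key in ("gt", "lq"):
                path = Path(record[key])
                if not path.is_absolute():
                    record[key] = str(root / path)
            # other fields such as prompt are written back unchanged
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

The resulting file (for instance an assumed `train_existing_GGT_abs.jsonl`) can then be passed to `--train-jsonl` in place of the original list.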

Examples:

cd GGT-100K

# (1) Train WITHOUT GGT-100K (existing data only)
bash train.sh --method swinir \
  --train-jsonl train_existing.jsonl

# (2) Train WITH GGT-100K (existing + GGT-100K)
bash train.sh --method swinir \
  --train-jsonl train_existing_GGT.jsonl

# FoundIR (supports --meta as JSONL)
bash train.sh --method foundir \
  --train-jsonl train_existing_GGT.jsonl

# MoCE-IR (paired training via --trainset paired_meta)
bash train.sh --method moce-ir \
  --train-jsonl train_existing_GGT.jsonl \
  --model MoCE_IR_S --epochs 120

# Qwen-Image-Edit (LoRA finetune; requires full Qwen-Image-Edit-2511 base weights + torchrun)
# train.sh auto-detects weights under qwen-image-edit/Qwen-Image-Edit-2511 or ../Edit_model/Qwen/Qwen-Image-Edit-2511
bash train.sh --method qwen-image-edit \
  --train-jsonl train_existing_GGT.jsonl \
  --workdir ./outputs_qwenir \
  --qwen-pretrained-model ../model/Qwen-Image-Edit-2511

# Flux-ControlNet (uses script defaults; override via extra args if needed)
bash train.sh --method flux-controlnet \
  --train-jsonl train_existing_GGT.jsonl \
  --workdir ./outputs_flux_controlnet

🔍 Inference

All baseline inference commands are unified in test.sh. Please first download the pretrained models (the pretrained-models part of the download links above).

Common arguments:

  • --test-jsonl: paired test JSONL (each line: {"gt":"...","lq":"...","prompt":"..."}; absolute paths recommended)
    • Note: only Qwen-Image-Edit (qwen-image-edit) uses prompt. For other methods, prompt can be empty.
  • --ckpt: checkpoint path (format depends on the method: .pth / .pt / .ckpt / a folder)
  • --results-dir: output directory for saving results (recommended to always set)
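Before launching inference, it can help to confirm that every path in the test list exists on disk. The following is a small sketch of such a check; `find_missing_files` is a hypothetical helper and is not part of test.sh. It assumes the list already contains paths that resolve from the current working directory (for example, absolute paths).

```python
import json
from pathlib import Path


def find_missing_files(jsonl_path):
    """Return (line_number, key, path) for each gt/lq entry that is not a file.

    Hypothetical pre-flight check; line numbers are 1-based.
    """
    missing = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            for key in ("gt", "lq"):
                value = record.get(key)
                # a missing key, an empty string, or a non-existent file is reported
                if not value or not Path(value).is_file():
                    missing.append((line_no, key, value))
    return missing
```

An empty return value means all listed files were found; otherwise each tuple points to the offending line and field.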

Examples:

cd GGT-100K

# BasicSR family (SwinIR)
bash test.sh --method swinir \
  --test-jsonl test_GGT_500.jsonl \
  --ckpt GGT-100K-preatrained/model/SwinIR.pth \
  --results-dir abs_path/results_swinir

# FoundIR
bash test.sh --method foundir \
  --test-jsonl test_GGT_500.jsonl \
  --ckpt GGT-100K-preatrained/model/model-2000.pt \
  --results-dir abs_path/results_foundir

# MoCE-IR (paired JSONL testing)
bash test.sh --method moce-ir \
  --ckpt GGT-100K-preatrained/model/last.ckpt \
  --model MoCE_IR_S \
  --benchmarks paired_jsonl \
  --meta test_GGT_500.jsonl \
  --save_results \
  --results-dir abs_path/results_moceir

# Qwen-Image-Edit (inference needs base model dir + LoRA weights)
bash test.sh --method qwen-image-edit \
  --test-jsonl test_GGT_500.jsonl \
  --ckpt GGT-100K-preatrained/model/pytorch_lora_weights.safetensors \
  --base-model ../model/Qwen-Image-Edit-2511 \
  --results-dir abs_path/results_qwenir

# Flux-ControlNet (inference needs base FLUX model dir + ControlNet checkpoint)
bash test.sh --method flux-controlnet \
  --test-jsonl test_GGT_500.jsonl \
  --ckpt GGT-100K-preatrained/model/checkpoint-200001 \
  --flux-pretrained-model ../model/FLUX.1-dev \
  --results-dir abs_path/results_flux_controlnet

📮 Contact

If you have any questions, please feel free to contact: xiangtao.kong@connect.polyu.hk

📚 Citation

@article{kong2026GGT-100K,
  title={GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration},
  author={Kong, Xiangtao and Zhao, Jixin and Sun, Lingchen and Wu, Rongyuan and Zhang, Lei},
  journal={arXiv preprint arXiv:2605.31039},
  year={2026}
}
