
DinoDepth

A large, harmonized, pre-shuffled corpus of 358,905 (image, depth) pairs for training monocular affine-invariant depth models. Five complementary sources — indoor, driving, robotics, aerial, and multi-view-stereo — are decoded to a single planar-depth convention, packed into uniform ~1 GB Parquet shards, and globally shuffled so any shard is a representative sample of the whole.

Schema

| column | type | description |
| --- | --- | --- |
| `dataset` | string | source dataset tag |
| `height`, `width` | int32 | native resolution of the stored arrays |
| `image` | binary | JPEG bytes (RGB) |
| `depth` | binary | 16-bit PNG; `depth = depth_png * depth_scale`, `0` = invalid |
| `depth_scale` | float32 | metres per PNG level (arbitrary per-scene units for multi-view stereo) |

import io, numpy as np
from PIL import Image
from datasets import load_dataset

ds = load_dataset("blanchon/dinodepth-dataset", split="train", streaming=True)
row = next(iter(ds))
rgb   = Image.open(io.BytesIO(row["image"]))                                  # RGB
depth = np.asarray(Image.open(io.BytesIO(row["depth"]))) * row["depth_scale"]  # m; 0 = invalid

Composition

| source | samples | domain | depth |
| --- | --- | --- | --- |
| TartanAir | 186,693 | robotics / aerial (synthetic) | metric |
| BlendedMVS | 74,838 | multi-view stereo (real images) | non-metric |
| IRS | 57,819 | indoor (synthetic) | metric |
| Hypersim | 26,912 | indoor (synthetic) | metric |
| Virtual KITTI 2 | 12,643 | driving (synthetic) | metric |
| total | 358,905 | | |
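
The mixture implied by the table can be worked out directly from the sample counts. The short sketch below only redoes that arithmetic; the dictionary and variable names are illustrative and not part of the dataset.

```python
# Sample counts copied from the composition table above.
counts = {
    "TartanAir": 186_693,
    "BlendedMVS": 74_838,
    "IRS": 57_819,
    "Hypersim": 26_912,
    "Virtual KITTI 2": 12_643,
}
# Only BlendedMVS is listed as non-metric in the table.
non_metric = {"BlendedMVS"}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}
non_metric_share = sum(shares[name] for name in non_metric)

for name, share in shares.items():
    print(f"{name:16s} {share:6.1%}")
print(f"non-metric share {non_metric_share:6.1%}")
```

Because the corpus is globally shuffled, a large enough batch drawn from any shard should show roughly these proportions.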

Harmonization

Every source is decoded to planar depth Z, with invalid pixels (sky, far-plane saturation, unreconstructed, non-finite) set to 0:

Hypersim — ray distance → planar Z (per-pixel, focal 886.81 px).
Virtual KITTI 2 — native depth / 100 (cm → m).
IRS — disparity → depth (48 / disparity).
TartanAir — native metres; sky masked.
BlendedMVS — per-scene multi-view-stereo depth (arbitrary, non-metric scale).
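
The per-source conversions listed above can be sketched in NumPy. This is a minimal illustration, not the pipeline used to build the dataset: the function names are hypothetical, the principal point is assumed to sit at the image centre, and the constant 48 is taken verbatim from the IRS line above.

```python
import numpy as np

def hypersim_distance_to_planar(distance, focal=886.81):
    """Convert per-pixel ray distance to planar depth Z for a pinhole camera."""
    h, w = distance.shape
    # Pixel-centre offsets from the principal point (assumed at the image centre).
    xs = np.arange(w, dtype=np.float64) - (w - 1) / 2.0
    ys = np.arange(h, dtype=np.float64) - (h - 1) / 2.0
    x, y = np.meshgrid(xs, ys)
    # Cosine of the angle between each pixel's ray and the optical axis.
    cos_theta = focal / np.sqrt(x**2 + y**2 + focal**2)
    return distance * cos_theta

def vkitti_to_metres(depth_cm):
    """Virtual KITTI 2 stores depth in centimetres."""
    return np.asarray(depth_cm, dtype=np.float64) / 100.0

def irs_disparity_to_depth(disparity, factor=48.0):
    """depth = factor / disparity; non-positive disparity is left at 0 (invalid)."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = factor / disparity[valid]
    return depth

def zero_invalid(depth):
    """Set non-finite and non-positive pixels to 0, the invalid marker."""
    depth = np.asarray(depth, dtype=np.float64)
    return np.where(np.isfinite(depth) & (depth > 0), depth, 0.0)
```

Sky masking and far-plane handling depend on per-source metadata and are left out of this sketch.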

Depth is stored at native resolution as 16-bit PNG with a per-image depth_scale. Metric sources are clamped to a 100 m far plane; the multi-view-stereo source keeps its per-scene scale — a scale-and-shift-invariant objective absorbs it, so the sources mix directly.
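
One way to produce a 16-bit payload with a per-image `depth_scale` is to map the largest valid depth to the top PNG level. The card does not state how `depth_scale` is chosen, so that rule is an assumption of this sketch, and `encode_depth` / `decode_depth` are hypothetical helper names; PNG file I/O is omitted.

```python
import numpy as np

def encode_depth(depth, levels=65535):
    """Quantize a float depth map (0 = invalid) to uint16 plus a per-image scale."""
    depth = np.asarray(depth, dtype=np.float64)
    max_depth = float(depth.max())
    if max_depth <= 0:
        # No valid pixel: everything stays at level 0.
        return np.zeros(depth.shape, dtype=np.uint16), 1.0
    # Assumed rule: the largest depth in the image maps to the top level.
    scale = max_depth / levels
    png = np.round(depth / scale)
    # Keep valid pixels off level 0, which is reserved for "invalid".
    png = np.where(depth > 0, np.clip(png, 1, levels), 0)
    return png.astype(np.uint16), scale

def decode_depth(png, scale):
    """Recover depth (same units as the scale) and a validity mask."""
    depth = png.astype(np.float32) * np.float32(scale)
    valid = png > 0
    return depth, valid
```

Under this rule, a 100 m far plane bounds the scale at 100 / 65535 ≈ 0.0015 m per level, so the quantization error for metric sources stays below a millimetre.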

Splits

train — the full 358,905-image corpus, globally shuffled into 141 shards (~1 GB each, small row groups + page index for fast random access).

Evaluation (held-out)

Zero-shot benchmarks, kept out of training and shipped as separate configs. Evaluate with per-image affine (least-squares scale + shift) alignment of predicted disparity to ground truth, then report AbsRel (mean absolute relative error) and δ₁ (fraction of pixels with max(pred/gt, gt/pred) < 1.25):

| config | benchmark | notes |
| --- | --- | --- |
| `nyuv2` | NYUv2 | indoor RGB-D; Eigen 654 test split; metric depth, capped at 10 m |
| `kitti` | KITTI | driving; Eigen test split; sparse LiDAR depth, capped at 80 m |

ev = load_dataset("blanchon/dinodepth-dataset", name="nyuv2", split="test")
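
The evaluation protocol described above can be sketched as follows. This is a minimal version under stated assumptions: the fit is done in disparity space against `1 / gt_depth`, predictions are capped at the benchmark's depth cap, and the function names are hypothetical. The card does not spell out these details, so treat them as one plausible reading rather than the reference implementation.

```python
import numpy as np

def align_disparity(pred_disp, gt_depth, valid):
    """Least-squares scale s and shift t such that s * pred_disp + t ≈ 1 / gt_depth."""
    x = pred_disp[valid].astype(np.float64)
    y = 1.0 / gt_depth[valid].astype(np.float64)
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(s), float(t)

def evaluate(pred_disp, gt_depth, max_depth):
    """Return (AbsRel, delta_1) for one image after affine alignment."""
    # Valid ground truth: positive and within the benchmark cap (assumed convention).
    valid = (gt_depth > 0) & (gt_depth <= max_depth)
    s, t = align_disparity(pred_disp, gt_depth, valid)
    aligned = s * pred_disp[valid].astype(np.float64) + t
    # Floor the disparity so the recovered depth never exceeds max_depth.
    pred_depth = 1.0 / np.maximum(aligned, 1.0 / max_depth)
    gt = gt_depth[valid].astype(np.float64)
    abs_rel = np.mean(np.abs(pred_depth - gt) / gt)
    ratio = np.maximum(pred_depth / gt, gt / pred_depth)
    delta1 = np.mean(ratio < 1.25)
    return float(abs_rel), float(delta1)
```

For example, `evaluate(pred_disp, gt_depth, max_depth=10.0)` would match the NYUv2 cap and `max_depth=80.0` the KITTI cap; per-image scores are then averaged over the test split.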

License

Released for non-commercial research under CC BY-NC-SA 4.0. Each source retains its original license (Virtual KITTI 2: CC BY-NC-SA 3.0; Hypersim: CC BY-SA 3.0; IRS / BlendedMVS / TartanAir per their upstream terms); respect those terms when using samples from the corresponding source.

Data composition and recipe follow AnyDepth (arXiv:2601.02760).

Paper

AnyDepth: Depth Estimation Made Easy. arXiv:2601.02760, published Jan 6.