Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model
CoRR(2023)
摘要
While methods for monocular depth estimation have made significant strides on
standard benchmarks, zero-shot metric depth estimation remains unsolved.
Challenges include the joint modeling of indoor and outdoor scenes, which often
exhibit significantly different distributions of RGB and depth, and the
depth-scale ambiguity due to unknown camera intrinsics. Recent work has
proposed specialized multi-head architectures for jointly modeling indoor and
outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion
model, with several advancements such as log-scale depth parameterization to
enable joint modeling of indoor and outdoor scenes, conditioning on the
field-of-view (FOV) to handle scale ambiguity and synthetically augmenting FOV
during training to generalize beyond the limited camera intrinsics in training
datasets. Furthermore, by employing a more diverse training mixture than is
common, and an efficient diffusion parameterization, our method, DMD (Diffusion
for Metric Depth) achieves a 25\% reduction in relative error (REL) on
zero-shot indoor and 33\% reduction on zero-shot outdoor datasets over the
current SOTA using only a small number of denoising steps. For an overview see
https://diffusion-vision.github.io/dmd
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要