Dynamo-Depth: Fixing Unsupervised Depth Estimation for Dynamical Scenes

NeurIPS 2023

Cornell University

Dynamo-Depth learns from unlabeled video sequences and predicts monocular depth, rigid flow, independent flow, and motion segmentation.


Unsupervised monocular depth estimation techniques have demonstrated encouraging results but typically assume that the scene is static. These techniques suffer when trained on dynamical scenes, where apparent object motion can equally be explained by hypothesizing the object's independent motion, or by altering its depth. This ambiguity causes depth estimators to predict erroneous depth for moving objects.
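The ambiguity can be made concrete with a small numerical sketch. It assumes a pinhole camera with a one-dimensional image, a focal length of 1000 pixels, and purely forward motion; the function names and all numbers are hypothetical and chosen for illustration, not taken from Dynamo-Depth. A point that is 10 m away and moves forward 0.5 m while the camera moves forward 1 m lands on the same pixel as a static point 20 m away.

```python
# Illustrative only: a 1D pinhole camera with assumed intrinsics.
FOCAL = 1000.0  # focal length in pixels (assumption)
CX = 0.0        # principal point (assumption)


def project(x, z):
    """Project a 3D point (lateral offset x, depth z) to a pixel coordinate."""
    return FOCAL * x / z + CX


def backproject(u, z):
    """Lift pixel u at depth z to its lateral offset x."""
    return (u - CX) * z / FOCAL


def reproject(u1, depth, ego_forward, object_forward=0.0):
    """Pixel in frame 2 of a point seen at pixel u1 with the given depth in frame 1.

    The camera moves forward by ego_forward metres and the point itself
    moves forward by object_forward metres between the two frames.
    """
    x = backproject(u1, depth)
    z2 = depth - ego_forward + object_forward
    return project(x, z2)


u1 = 100.0

# Hypothesis A: true depth of 10 m, object moving independently by 0.5 m.
moving_object = reproject(u1, depth=10.0, ego_forward=1.0, object_forward=0.5)

# Hypothesis B: static object, but at a (wrong) depth of 20 m.
static_object = reproject(u1, depth=20.0, ego_forward=1.0)

# Both hypotheses explain the same observed pixel in frame 2.
same_pixel = abs(moving_object - static_object) < 1e-9
```

A photometric reconstruction loss cannot tell these two hypotheses apart, which is why a static-scene model tends to assign moving objects an incorrect depth.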

To resolve this issue, we introduce Dynamo-Depth, a unifying approach that disambiguates dynamical motion by jointly learning monocular depth, a 3D independent flow field, and motion segmentation from unlabeled monocular videos. Our key insight is that a good initial estimate of motion segmentation is sufficient for jointly learning depth and independent motion, despite the fundamental underlying ambiguity. Our method achieves state-of-the-art performance on monocular depth estimation on the Waymo Open and nuScenes datasets, with significant improvements in the depth of moving objects.
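One way the predicted quantities could fit together is sketched below. It assumes that the total 3D motion at each pixel is the rigid flow induced by camera motion plus the independent flow gated by the motion segmentation mask. This gating rule, the function name `compose_flow`, and the sample values are assumptions made for illustration and are not confirmed by the text above.

```python
# Illustrative only: per-pixel 3D flows are plain (x, y, z) tuples.

def compose_flow(rigid_flow, independent_flow, motion_mask):
    """Combine rigid and independent 3D flow per pixel.

    rigid_flow, independent_flow: lists of (x, y, z) tuples, one per pixel.
    motion_mask: list of values in [0, 1]; 1 marks a pixel as moving.
    Assumed rule: total = rigid + mask * independent.
    """
    return [
        tuple(r + m * i for r, i in zip(rigid, independent))
        for rigid, independent, m in zip(rigid_flow, independent_flow, motion_mask)
    ]


# Two pixels: the camera moves forward, so static points appear to
# move 1 m towards the camera along the z axis.
rigid = [(0.0, 0.0, -1.0), (0.0, 0.0, -1.0)]

# A hypothetical independent-motion prediction for both pixels.
independent = [(0.2, 0.0, 0.5), (0.2, 0.0, 0.5)]

# Pixel 0 is segmented as static, pixel 1 as moving.
mask = [0.0, 1.0]

total = compose_flow(rigid, independent, mask)
```

Under this assumed rule, the mask suppresses independent flow on static pixels, so their motion must be explained by depth and camera motion alone.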



An overview of the proposed Dynamo-Depth.



Qualitative Results

Additional results on Waymo Open Dataset.

Additional results on nuScenes Dataset.


Object shadows are modeled as independently moving on the ground.


  title={Dynamo-Depth: Fixing Unsupervised Depth Estimation for Dynamical Scenes},
  author={Yihong Sun and Bharath Hariharan},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},