Emergent Extreme-View Geometry in 3D Foundation Models
We create a lightweight fine-tuning method for 3D foundation models to improve extreme-view geometry estimation, and release benchmarks for hard unconstrained image collections.
I am a CS Ph.D. student at NYU Courant, advised by Prof. David Fouhey. I received my Bachelor’s in Computer Science from Cornell University, where I was advised by Prof. Noah Snavely.
My research focuses on 3D computer vision for understanding and reconstructing real-world scenes from large, unconstrained image and video collections. I am especially interested in scalable learned systems that turn internet-scale visual data into accurate, generalizable 3D representations of the world.
We create a large-scale, high-quality dataset of dynamic camera poses from 100K internet videos.
We create a dataset of 100K SfM reconstructions from 2M internet photos around the world. We use it to train a model for scene-level novel-view synthesis.
We train a classifier to disambiguate images that depict distinct but visually similar structures, which we call "doppelgangers". We use this classifier to improve reconstruction quality in structure-from-motion.