Hello, I'm Joseph Tung!

I am a CS Ph.D. student at NYU Courant, advised by Prof. David Fouhey. I received my Bachelor’s in Computer Science from Cornell University, where I was advised by Prof. Noah Snavely.

Email Scholar GitHub Twitter

Research Interests

My research focuses on 3D computer vision for understanding and reconstructing real-world scenes from large, unconstrained image and video collections. I am especially interested in scalable learned systems that turn internet-scale visual data into accurate, generalizable 3D representations of the world.

News

February 2026 We released Emergent Extreme-View Geometry in 3D Foundation Models and the MegaUnScene dataset!
April 2025 The DynPose-100K dataset is now available for download!
May 2024 I will be continuing my research this summer at Cornell Tech with Prof. Noah Snavely!

Publications

Emergent Extreme-View Geometry in 3D Foundation Models

Yiwen Zhang, Joseph Tung, Ruojin Cai, David Fouhey, Hadar Averbuch-Elor

CVPR 2026

We create a lightweight fine-tuning method for 3D foundation models to improve extreme-view geometry estimation, and release benchmarks for hard unconstrained image collections.

Project Page PDF arXiv Code Dataset

Dynamic Camera Poses and Where to Find Them

Chris Rockwell, Joseph Tung, Tsung-Yi Lin, Ming-Yu Liu, David Fouhey, Chen-Hsuan Lin

CVPR 2025

We create a large-scale, high-quality dataset of dynamic camera poses from 100K internet videos.

Project Page PDF arXiv Dataset

MegaScenes: Scene-Level View Synthesis at Scale

Joseph Tung*, Gene Chou*, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, Noah Snavely

ECCV 2024

We create a dataset of 100K SfM reconstructions from 2M internet photos around the world. We use it to train a model for scene-level novel-view synthesis.

Project Page PDF arXiv Code Dataset Web-Viewer

Doppelgangers: Learning to Disambiguate Images of Similar Structures

Ruojin Cai, Joseph Tung, Qianqian Wang, Hadar Averbuch-Elor, Bharath Hariharan, Noah Snavely

ICCV 2023 (Oral)

We train a classifier to disambiguate images that depict distinct but visually similar structures, which we call "doppelgangers". We use this classifier to improve reconstruction quality in structure-from-motion.

Project Page PDF arXiv Code