Tool transforms world landmark photos into 4D experiences
The method, which employs deep learning to ingest and synthesize tens of thousands of mostly untagged and undated photos, solves a problem that has eluded experts in computer vision for six decades.
"It's a new way of modeling scenes that not only allows you to move your head and see, say, the fountain from different viewpoints, but also gives you controls for changing the time," said Noah Snavely, associate professor of computer science at Cornell Tech and senior author of "Crowdsampling the Plenoptic Function," presented at the European Conference on Computer Vision, held virtually Aug. 23-28.
"If you really went to the Trevi Fountain on your vacation, the way it would look would depend on what time you went -- at night, it would be lit up by floodlights from the bottom. In the afternoon, it would be sunlit, unless you went on a cloudy day," Snavely said. "We learned the whole range of appearances, based on time of day and weather, from these unorganized photo collections, such that you can explore the whole range and simultaneously move around the scene."
Representing a place in a photorealistic way is challenging for traditional computer vision, partly because of the sheer number of textures to be reproduced. "The real world is so diverse in its appearance and has different kinds of materials -- shiny things, water, thin structures," Snavely said.
Another problem is the inconsistency of the available data. Describing how something looks from every possible viewpoint in space and time -- known as the plenoptic function -- would be a manageable task with hundreds of webcams affixed around a scene, recording data day and night. But since this isn't practical, the researchers had to develop a way to compensate.
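The idea of the plenoptic function can be sketched in a few lines of code. This is a toy illustration, not the paper's method: the function maps a viewing position, a view direction, and a time to the light (color) seen along that ray. The names and the simple day/night lighting model below are invented for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Ray:
    position: tuple   # (x, y, z) viewpoint in the scene
    direction: tuple  # unit vector giving the view direction

def plenoptic(ray: Ray, hour: float) -> tuple:
    """Toy stand-in for the plenoptic function.

    Returns an RGB color in [0, 1] observed along `ray` at `hour` (0-24).
    Here appearance depends only on time of day, blending between a dim
    'night' color and a bright 'day' color -- a hypothetical lighting
    model chosen purely to show the function's signature.
    """
    # Daylight factor: 0 at midnight (hour 0), 1 at noon (hour 12).
    daylight = 0.5 * (1.0 - math.cos(2.0 * math.pi * hour / 24.0))
    night = (0.02, 0.02, 0.10)  # dim blue
    day = (0.95, 0.90, 0.80)    # warm sunlight
    return tuple(n + daylight * (d - n) for n, d in zip(night, day))
```

A webcam array would sample this function densely on a fixed grid of positions; the researchers instead had to recover it from tourist photos scattered unevenly across viewpoints, times, and weather conditions.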