Nvidia GANverse3D uses machine learning to generate fully textured realistic 3D meshes for arch viz from a single 2D photo
Snap a photo of a car, upload it to the cloud, and seconds later have a fully textured 3D asset to use in your architectural visualisation scene. That’s one potential use case for a new Nvidia Research project that uses GPU-accelerated machine learning technology to turn a single 2D image into a textured 3D mesh ready for rendering in Nvidia Omniverse (read our in-depth Omniverse Enterprise article).
GANverse3D has been developed by the Nvidia AI Research Lab in Toronto. What makes it different to other reality modelling applications is that, once trained on multi-view photos, it can create a mesh from a single 2D image.
This is in contrast to tools like Bentley Systems’ ContextCapture and Pix4D, which need multiple photos taken from multiple angles for each model. The resulting meshes from these established applications are engineering-accurate but can take a long time to process.
The output from GANverse3D will be significantly less precise but, according to Nvidia, good enough to provide entourage context for arch viz scenes.
GANverse3D is also exceedingly quick. “It’s real time, essentially,” explains Sanja Fidler, associate professor at the University of Toronto and director of AI at Nvidia. “You upload a picture and boom, there’s a 3D model,” she adds, noting that it takes 65 milliseconds on an Nvidia Tesla V100, a ‘Volta’ GPU that is a couple of generations behind Nvidia’s current Ampere GPUs.
How could GANverse3D be used in arch viz?
Nvidia has already shown GANverse3D to several of its AEC Omniverse Lighthouse accounts, as Richard Kerris, Industry GM for M&E at Nvidia and former Lucasfilm CTO, explained to AEC Magazine.
“Quite often they want to do the visualisation of their building, and they want cars and things like that in the parking lot, not the focus of the thing that they’re building, but to bring more realism to the visualisation process of what they’re presenting,” he said.
“And so, when we’ve shown them this technology, they immediately saw use cases for it because they’re like ‘typically our environments that we show are kind of sterile, unless we spend extra money and go through all these other things. You’ve given us the ability to add this ambient content to it, that kind of brings more realism and focus.’”
Most arch viz tools these days come with a ready supply of entourage, including detailed car models, so we asked Kerris why an architect would choose an asset derived from a single photo over a higher-quality library item.
“Because in the context of AEC, they don’t need the higher quality asset for what they’re doing – it’s kind of overkill in their case,” he replied. “They looked at this as a much easier way for them to solve a visual challenge, quickly and easily, without the cost and overhead of going and downloading models and things that people really wouldn’t get to appreciate if they’re used in an ambient way.”
How does GANverse3D work?
GANverse3D, like other generative adversarial network (GAN) machine learning applications, needs to be trained. To generate a dataset for training, the researchers synthesised images depicting the same object from multiple viewpoints — like a photographer who walks around a parked vehicle, taking shots from different angles.
These multi-view images were then plugged into a rendering framework for inverse graphics, a process of inferring 3D mesh models from 2D images.
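To get a feel for the inverse graphics idea, consider a deliberately tiny example: if the same 3D point is photographed from several known angles, its position can be recovered from the 2D observations alone. The NumPy sketch below is our own toy illustration of that principle, not the GANverse3D pipeline itself, which predicts a whole textured mesh through a rendering framework.

    # Toy illustration of inverse graphics: recover a 3D point from its 2D
    # projections in several known views. GANverse3D does something far richer
    # (predicting whole textured meshes), but the principle -- many 2D views
    # constraining one 3D shape -- is the same.
    import numpy as np

    def rotation_y(theta):
        """Camera rotation about the vertical axis (a 'walk around the car')."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

    true_point = np.array([1.2, 0.4, -0.8])        # the 3D point we pretend not to know
    angles = np.deg2rad([0, 45, 90, 135, 180])     # five viewpoints around the object

    # Each view yields only x/y image coordinates (an orthographic projection).
    rows, obs = [], []
    for theta in angles:
        P = rotation_y(theta)[:2, :]               # drop depth: 2x3 projection matrix
        rows.append(P)
        obs.append(P @ true_point)                 # the 2D observation for this view

    A = np.vstack(rows)                            # stack all projections
    b = np.concatenate(obs)
    recovered, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(recovered)                               # ~ [1.2, 0.4, -0.8]

The point of the toy is simply that each extra viewpoint adds constraints, which is why the training data has to show objects from many angles.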
At the moment the system has only been trained to recognise cars. A total of 55,000 car images, taken from many different angles, have been fed into the system. To recognise other objects, a similar level of training would be required, and this could be done by a third party.
Fidler explained that the system could be trained on a variety of objects. Static objects that don’t deform a lot would be easiest, she said, such as sofas and tables, but humans would be more difficult.
Enter the Hoff
To showcase the technology, Nvidia researchers used a single photo of KITT, David Hasselhoff’s crime-fighting, AI-powered car from the 1980s TV show Knight Rider.
By analysing a single photo of KITT, GANverse3D was able to predict a corresponding 3D textured mesh, as well as different parts of the vehicle such as wheels and headlights.
The researchers then used Nvidia Omniverse Kit and Nvidia PhysX tools to convert the predicted texture into high-quality materials, giving KITT a more realistic look and feel, before placing the car in a dynamic driving sequence alongside other cars.
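For readers curious what the hand-off to Omniverse might look like in practice, the sketch below shows one plausible way to package a generated mesh and its predicted texture as USD, the scene format Omniverse consumes. It uses the open-source pxr (OpenUSD) Python API; the single-triangle geometry and the kitt_texture.png file name are placeholders, and this is our own illustration rather than the researchers’ actual tooling.

    # Hedged sketch: packaging a generated mesh plus predicted texture as USD
    # for Omniverse. Uses the pxr (OpenUSD) Python API; geometry and texture
    # file are placeholders, not output from GANverse3D itself.
    from pxr import Usd, UsdGeom, UsdShade, Sdf

    stage = Usd.Stage.CreateNew("car.usda")
    UsdGeom.Xform.Define(stage, "/World")

    # A single triangle stands in for the predicted car mesh.
    mesh = UsdGeom.Mesh.Define(stage, "/World/Car")
    mesh.CreatePointsAttr([(0, 0, 0), (1, 0, 0), (0, 1, 0)])
    mesh.CreateFaceVertexCountsAttr([3])
    mesh.CreateFaceVertexIndicesAttr([0, 1, 2])

    # Wrap the predicted texture in a simple UsdPreviewSurface material.
    material = UsdShade.Material.Define(stage, "/World/Looks/CarMat")
    shader = UsdShade.Shader.Define(stage, "/World/Looks/CarMat/Surface")
    shader.CreateIdAttr("UsdPreviewSurface")
    texture = UsdShade.Shader.Define(stage, "/World/Looks/CarMat/DiffuseTex")
    texture.CreateIdAttr("UsdUVTexture")
    texture.CreateInput("file", Sdf.ValueTypeNames.Asset).Set("kitt_texture.png")
    texture.CreateOutput("rgb", Sdf.ValueTypeNames.Float3)
    shader.CreateInput("diffuseColor", Sdf.ValueTypeNames.Color3f).ConnectToSource(
        texture.ConnectableAPI(), "rgb")
    shader.CreateOutput("surface", Sdf.ValueTypeNames.Token)
    material.CreateSurfaceOutput().ConnectToSource(shader.ConnectableAPI(), "surface")

    # Bind the material to the mesh and write the stage to disk.
    UsdShade.MaterialBindingAPI.Apply(mesh.GetPrim()).Bind(material)
    stage.Save()

Once saved, a USD file like this can be opened directly in Omniverse and dropped into a larger arch viz scene as entourage.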
Back down to (middle) earth
GANverse3D is still very much in its infancy, and it is important to note that it is a research project, not a commercial product.
In addition, the models it generates are only as good as the data that’s fed in, and it can only create accurate meshes if it has seen similar objects from multiple angles during training.
For example, when testing the system with horses, Fidler explained that the generated models were missing detail on top, as you almost never see a picture of a horse from above.
For GANverse3D to develop into a commercial product, users would have to be confident in the quality of the generated output. The speed at which it can generate models is extremely fast, but that’s of little interest to the visualiser if models have annoying artefacts or are incomplete.
It remains to be seen how useful this technology might be for arch viz, especially as there is already a wealth of textured 3D content out there. It might find a role for niche products, such as vintage cars or custom furniture, or at the early stages of design, in the same way that sketches are still exceedingly popular for design exploration.
One can’t deny how impressive this technology looks. When reality modelling photogrammetry software first came out, it felt like some kind of magic. But it now feels more Paul Daniels than Gandalf – there’s a new wizard in town.
Read the full research project paper here
Meanwhile, find out what we think of Omniverse Enterprise, Nvidia’s new viz focused collaboration platform.