From the rapid generation of high-quality visualisations to process optimisation, diffusion models are having a huge impact in AEC. Nvidia’s Sama Bali explains how this powerful generative AI technology works, how it can be applied to different workflows, and how AEC firms can get on board.
Since the introduction of generative AI, large language models (LLMs) like GPT-4 have been at the forefront, renowned for their versatility in natural language processing, machine translation, and content creation. Alongside these, image generators such as OpenAI’s DALL-E, Google’s Imagen, Midjourney and Stability AI’s Stable Diffusion are changing the way architects, engineers, and construction professionals visualise and design projects, enabling rapid prototyping, enhanced creativity, and more efficient workflows.
At their core, diffusion models possess a distinctive capability. They can generate high-quality data from prompts by learning to remove noise that has been progressively added to the data they are trained on.
Diffusion models are trained by adding noise to millions of images over many iterations and rewarding the model when it recreates the original image in the reverse process. Once trained, the model is ready for inference, whereby a user can generate realistic data, such as images, text, video, audio or 3D models.
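As a rough illustration of that training signal, the sketch below noises a tiny list of pixel values and scores a noise estimate against the noise that was actually added. It assumes the common noise-prediction formulation of diffusion training; the function names, the four-pixel "image" and the `alpha_bar` value are hypothetical and chosen only for illustration.

```python
import math
import random


def add_noise(pixels, alpha_bar, rng):
    """Forward step: blend clean pixels with Gaussian noise.

    alpha_bar close to 1 keeps most of the image; close to 0 leaves mostly noise.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * n
        for p, n in zip(pixels, noise)
    ]
    return noisy, noise


def noise_prediction_loss(predicted_noise, true_noise):
    """Mean squared error between the predicted and the actual noise."""
    squared = [(p - n) ** 2 for p, n in zip(predicted_noise, true_noise)]
    return sum(squared) / len(squared)


def recover_clean(noisy, predicted_noise, alpha_bar):
    """Invert the forward blend, given an estimate of the noise."""
    return [
        (x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for x, e in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)
image = [0.1, 0.5, 0.9, 0.3]  # hypothetical four-pixel "image"
noisy, noise = add_noise(image, alpha_bar=0.5, rng=rng)

perfect_loss = noise_prediction_loss(noise, noise)             # ideal model
lazy_loss = noise_prediction_loss([0.0] * len(noise), noise)   # model that predicts no noise
restored = recover_clean(noisy, noise, alpha_bar=0.5)          # matches the original image
```

A model that predicts the noise exactly gets zero loss and can reconstruct the original pixels; training nudges a real network toward that behaviour across many images and noise levels.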
Why noise? It helps diffusion models mimic random changes, understand the data, prevent overfitting, and ensure smooth transformations.
Imagine you have a sketch of a building design. You start adding random noise to it, making it look more and more like a messy scribble. This is the forward process. The reverse process is like cleaning up that messy scribble step by step until you get back to a detailed and clear architectural rendering.
The model learns how to do this cleaning process so well that it can start with random noise and end up generating a completely new, realistic building design. With this innovative approach diffusion models can produce remarkably accurate and detailed outputs, making them a powerful tool.
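The reverse process described above can be sketched in a few lines. This is a toy under stated assumptions, not a production sampler: each "design" is a single number, the training data is assumed to follow a bell curve with a known mean and spread, and `ideal_noise_estimate` is a hypothetical stand-in for the trained network (for this simple data the best possible noise estimate can be written down exactly). The update rule follows the widely used DDPM sampling step.

```python
import math
import random
import statistics


def make_schedule(num_steps=200, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule: per-step noise (betas) and cumulative signal kept (alpha_bars)."""
    betas = [
        beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        for i in range(num_steps)
    ]
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def ideal_noise_estimate(x_t, alpha_bar, data_mean, data_std):
    """Stand-in for a trained network: the expected noise when the data is Gaussian."""
    variance_t = alpha_bar * data_std ** 2 + (1.0 - alpha_bar)
    return math.sqrt(1.0 - alpha_bar) * (x_t - math.sqrt(alpha_bar) * data_mean) / variance_t


def sample(betas, alpha_bars, data_mean, data_std, rng):
    """Start from pure noise and denoise step by step."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(len(betas))):
        eps_hat = ideal_noise_estimate(x, alpha_bars[t], data_mean, data_std)
        # Remove the estimated noise for this step.
        x = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps_hat) / math.sqrt(1.0 - betas[t])
        # Re-inject a little fresh noise on every step except the last.
        if t > 0:
            x += math.sqrt(betas[t]) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(42)
betas, alpha_bars = make_schedule()
designs = [sample(betas, alpha_bars, data_mean=3.0, data_std=0.5, rng=rng) for _ in range(500)]
print(round(statistics.mean(designs), 2), round(statistics.stdev(designs), 2))
```

Each sample starts as random noise, yet the generated values cluster around the mean and spread of the assumed training data, which is the one-number analogue of producing a new design that resembles what the model was trained on.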
Diffusion models have a reputation for being difficult to control due to the way they learn, interpret, and produce visuals. However, ControlNets, a group of neural networks trained on specific tasks, can enhance the base model’s capabilities. Architects can exert precise structural and visual control over the generation process by providing references.
For example, Sketch ControlNet can transform an architectural drawing into a fully realised render.
Multiple ControlNets can be combined for additional control. For instance, a Sketch ControlNet can be paired with an adaptor, which can incorporate a reference image to apply specific colours and styles to the design.
ControlNets are highly effective as they can process various types of information, empowering architects and designers with new ways to manage their designs and communicate ideas with clients.
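One simplified way to picture how several control signals are blended: each ControlNet contributes an adjustment to the base model's internal features, and each adjustment is weighted by a strength setting before being added. The sketch below assumes that weighted-sum picture; the function name, the two-value feature lists and the strength values are hypothetical and do not represent any particular library's API.

```python
def combine_controls(base_features, control_residuals, scales):
    """Add each control's adjustment to the base features, weighted by its strength."""
    if len(control_residuals) != len(scales):
        raise ValueError("need exactly one scale per control signal")
    combined = list(base_features)
    for residual, scale in zip(control_residuals, scales):
        if len(residual) != len(combined):
            raise ValueError("control residual must match the base feature size")
        for i, value in enumerate(residual):
            combined[i] += scale * value
    return combined


base = [1.0, 2.0]               # hypothetical features from the base model
sketch_residual = [0.5, -1.0]   # hypothetical adjustment from a sketch control
style_residual = [0.2, 0.2]     # hypothetical adjustment from a style reference

# Full-strength sketch guidance, half-strength style guidance.
guided = combine_controls(base, [sketch_residual, style_residual], scales=[1.0, 0.5])

# With every strength at zero, the base model's output is unchanged.
unguided = combine_controls(base, [sketch_residual, style_residual], scales=[0.0, 0.0])
```

Raising or lowering a strength changes how closely the result follows that reference, which is the kind of trade-off a designer tunes when pairing a sketch with a style image.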
Leveraging Nvidia accelerated compute capabilities further enhances the performance of diffusion models. Nvidia-optimised models, such as SDXL Turbo and LCM-LoRA, offer state-of-the-art performance with real-time image generation capabilities. These models significantly improve inference speed and reduce latency, enabling the production of up to four images per second and drastically reducing the time required for high-resolution image generation.
Diffusion models offer several specific benefits to the AEC sector, enhancing various aspects of design, visualisation, and project management:
High-quality visualisations
Diffusion models can generate photorealistic images and videos from simple sketches, textual descriptions, or a combination. This capability is invaluable for creating detailed architectural renderings and visualisations, helping decision-makers understand and visualise proposed projects.
Daylighting and energy efficiency
Diffusion models can generate daylighting maps and analyse the impact of natural light on building designs. This helps optimise window placements and other design elements to enhance indoor daylighting and energy efficiency, ensuring that buildings are comfortable and sustainable.
Rapid prototyping
By automating the generation of design alternatives and visualisations, including variations in materials or object positioning, diffusion models can significantly speed up the design process. Architects and engineers can explore more design options faster, leading to more innovative and optimised solutions.
Cost savings and process optimisation
Diffusion models enable the customisation of BIM policies to suit the needs of specific regions and projects. Directing resources to the areas of greatest need improves resource allocation. This flexibility makes sure that policies are tailored to the unique requirements of different regions and projects, leading to reduced project costs and improved overall efficiency.
Use, customise, or build your diffusion models
Organisations can leverage diffusion models in multiple ways. They can use pretrained models as-is, customise them for specific needs, or build new models from scratch and harness their full potential by tailoring them to a user’s unique requirements.
Pretrained models are deployable immediately, reducing the time to market and minimising initial investment. Customising pretrained models enables the integration of domain-specific data, improving accuracy and relevance for particular applications. Developing models from scratch, although resource-intensive, enables the creation of highly specialised solutions that can address unique challenges and provide a competitive edge.
Consider diffusion models in the AEC industry like architecting a house. Using pretrained models is similar to using standard prefabricated homes—they’re ready to use, saving time and initial costs. Customising pretrained models is like modifying standard off-the-shelf house plans to fit specific requirements, making sure the design meets particular needs and preferences. Building models from scratch is similar to creating entirely new blueprints from the ground up. This approach offers the most flexibility and customisation but requires significant expertise, time, and resources.
Each method has advantages and disadvantages, enabling organisations to select the most suitable approach according to their project objectives and available resources.
Pretrained models for quick deployment
For many organisations, the quickest way to benefit from diffusion models is to use pretrained models. Available through the Nvidia API catalog, these models are optimised for high performance and can be deployed directly into applications.
Nvidia NIM offers a streamlined and efficient way for organisations to deploy diffusion models, enabling the generation of high-resolution, realistic images from text prompts. With prebuilt containers, organisations can quickly set up and run diffusion models on Nvidia accelerated infrastructure (available from Nvidia workstations, data centres, cloud services partners, and private on-prem servers).
This approach simplifies the deployment process and maximises performance, enabling businesses to focus on building innovative generative AI workflows without the complexities of model development and optimisation.
Developers can experience and experiment with Nvidia-hosted NIMs at no charge.
Members of the Nvidia Developer Program can access NIM for free for research, development, and testing on their preferred infrastructure.
Enterprises can deploy AI applications in production with NIM through the Nvidia AI Enterprise software platform.
Customising diffusion models
Customisation can improve the relevance, accuracy, and performance of diffusion models for AEC organisations. It also enables organisations to include their own knowledge and industry-specific terms, and to address specific challenges.
Fine-tuning involves taking a pretrained model and adjusting its parameters using a smaller, domain-specific dataset to better align with the specific needs and nuances of the organisation. This tailored approach improves the quality and utility of the generated content and offers scalability and flexibility. Organisations can adapt the models as their needs evolve.
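The idea of fine-tuning can be shown with a deliberately tiny stand-in: a "model" with a single adjustable weight, nudged by gradient descent toward a small domain-specific dataset. Everything here is hypothetical and chosen for illustration (the one-weight model, the data, the learning rate); real diffusion models have millions or billions of weights, but the principle of starting from pretrained values rather than from zero is the same.

```python
def mean_squared_error(weight, inputs, targets):
    """Average squared gap between the model's predictions and the targets."""
    return sum((weight * x - y) ** 2 for x, y in zip(inputs, targets)) / len(inputs)


def fine_tune(weight, inputs, targets, learning_rate=0.05, steps=10):
    """Take a few gradient-descent steps on a small dataset, starting from the given weight."""
    for _ in range(steps):
        gradient = sum(
            2.0 * (weight * x - y) * x for x, y in zip(inputs, targets)
        ) / len(inputs)
        weight -= learning_rate * gradient
    return weight


# Small, domain-specific dataset (hypothetical): targets follow y = 2.3 * x.
domain_inputs = [1.0, 2.0, 3.0]
domain_targets = [2.3 * x for x in domain_inputs]

pretrained_weight = 2.0  # hypothetical weight learned on general data
tuned = fine_tune(pretrained_weight, domain_inputs, domain_targets)
from_scratch = fine_tune(0.0, domain_inputs, domain_targets)

print(round(tuned, 4), round(from_scratch, 4))
```

With the same number of training steps, the run that starts from the pretrained weight lands closer to the domain-specific target than the run that starts from zero, which is why fine-tuning needs far less data and compute than training from scratch.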
For firms wanting a user-friendly path to start customising diffusion models, Nvidia AI Workbench offers a streamlined environment that lets data scientists and developers get up and running quickly with generative AI. With AI Workbench users can get started with pre-configured projects that are adaptable to different data and use cases. It’s ideal for quick, iterative development and local testing.
Example projects, such as fine-tuning diffusion models, can be modified to support things like generating architectural renderings. Furthermore, this flexibility extends to supported infrastructure. Users can start locally on Nvidia RTX-powered AI Workstations and scale to virtually anywhere—data centre or cloud—in just a few clicks. For more details on how to customise diffusion models, explore the GitHub project.
Another lightweight training technique used for fine-tuning diffusion models is Low-Rank Adaptation or LoRA. LoRA models are ideal for architectural firms due to their small size. They can be managed and trained on local workstations without extensive cloud resources.
Check out how you can seamlessly deploy and scale multiple LoRA adapters with Nvidia NIM.
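To see why LoRA adapters are so small, it helps to count parameters. LoRA leaves a layer's original weight matrix frozen and trains two thin matrices whose product is added to it. The sketch below counts trainable values for a hypothetical 1280 x 1280 layer at rank 8 and applies a toy update to a 2 x 2 matrix; the layer size, rank and all numbers are illustrative assumptions rather than figures for any specific model.

```python
def full_param_count(rows, cols):
    """Trainable values when the whole weight matrix is fine-tuned."""
    return rows * cols


def lora_param_count(rows, cols, rank):
    """Trainable values in a LoRA adapter: B is rows x rank, A is rank x cols."""
    return rank * (rows + cols)


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(weight, lora_b, lora_a, scale=1.0):
    """Return weight + scale * (B @ A); the base weight itself is never modified."""
    update = matmul(lora_b, lora_a)
    return [
        [w + scale * u for w, u in zip(w_row, u_row)]
        for w_row, u_row in zip(weight, update)
    ]


full = full_param_count(1280, 1280)             # hypothetical layer size
adapter = lora_param_count(1280, 1280, rank=8)
ratio = adapter / full

base_weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # 2 x 1
lora_a = [[0.5, 0.0]]     # 1 x 2
adapted = apply_lora(base_weight, lora_b, lora_a)

print(full, adapter, ratio)
```

Under these assumptions the adapter holds just over one percent of the values in the full layer, which is what makes it practical to train and store many style-specific adapters on a local workstation.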
For advanced customisation and high-performance training, Nvidia NeMo offers a comprehensive, scalable, and cloud-native platform. NeMo offers a choice of customisation techniques and is optimised for at-scale inference of diffusion models, with multi-GPU and multi-node configurations.
The DRaFT+ algorithm, integrated into the NeMo framework, enhances the fine-tuning of diffusion models and makes sure that the model produces diverse and high-quality outputs aligned with specific project requirements. For more technical details and to access the DRaFT+ algorithm, visit the NeMo-Aligner library on GitHub.
Nvidia Launchpad provides a free hands-on lab environment where AEC professionals can learn to fine-tune diffusion models with custom images and optimise them for specific tasks, such as generating high-quality architectural renderings or visualising construction projects.
Building diffusion models that match your style
Now that we’ve covered pretrained and customised models, let’s look at building diffusion models from scratch. Investing in custom diffusion models allows AEC organisations to harness the full potential of AI, leading to more efficient, accurate, and innovative project outcomes.
For instance, an architectural firm might build their own diffusion model to generate design concepts that align with their specific architectural style and client preferences, while a construction company could develop a model to optimise resource allocation and project scheduling.
One example of this approach is the work of Heatherwick Studio, a design firm based in London. They’ve been using AI in their design process. The studio is known for its innovative projects around the world, including Google’s headquarters in London and California, Africa’s first museum of contemporary African art in Cape Town, and a new district in Tokyo. Heatherwick Studio has been developing tools that use their data to streamline design processes, rendering, and data access.
“At the studio, we not only believe in the transformational power of AI to improve the industry but are actively developing and deploying in-house custom diffusion models in our everyday work,” said Pablo Zamorano, head of Geometry and Computational Design at Heatherwick Studio.
“We have developed a web-based tool that enables quick design provocations, fast rendering, and image editing as well as a tool that allows for tailored knowledge search from within our BIM tools. These tools empower the work of our designers and visualisers and are now well established.”
Creating custom diffusion models with Nvidia
NeMo is a powerful framework that provides components for building and training custom diffusion models on-premises, across all leading cloud service providers, or in Nvidia DGX Cloud. It includes a suite of customisation techniques from prompt learning to parameter-efficient fine-tuning (PEFT), making it ideal for AEC customers who need to generate high-quality architectural renderings and optimise construction visualisations efficiently.
Alternatively, Nvidia Picasso is an AI foundry leveraged by asset marketplace companies to build and deploy cutting-edge generative AI models with APIs for commercially safe visual content.
Built on Picasso, generative AI services by Getty Images for image generation and Shutterstock for 3D generation create commercially safe visual media from text or images. AEC organisations can fine-tune their choice of Picasso-powered models to create custom diffusion models that generate images from text prompts or sketches in different styles. Picasso supports end-to-end AI model development, from data preparation and model training to model fine-tuning and deployment, making it an ideal solution for developing custom generative AI services.
Responsible innovation with diffusion models
Using AI models involves several critical steps, including data collection, preprocessing, algorithm selection, training, and evaluation. Each of these steps requires careful consideration to make sure the model performs well and meets the specific needs of the project.
However, it’s equally important to integrate responsible AI practices throughout this process. Generative AI models, despite their impressive capabilities, are susceptible to biases, security vulnerabilities, and unintended consequences. Without proper safeguards, these models can produce outputs that reinforce harmful stereotypes, discriminate against certain demographics, or contain security flaws.
Additionally, protecting the security of diffusion models is crucial for generative AI-powered applications. Nvidia introduced accelerated Confidential Computing, a groundbreaking security feature that mitigates threats while providing access to the unprecedented acceleration of Nvidia H100 Tensor Core GPUs for AI workloads. This feature makes sure that sensitive data remains secure and protected, even during processing.
Get started
Generative AI, particularly diffusion models, is revolutionising the AEC industry by enabling the creation of photorealistic renderings and innovative designs from simple sketches or textual descriptions.
To get started, AEC firms should prioritise data collection and management, identify processes that can benefit from automation, and adopt a phased approach to implementation. The Nvidia training program helps organisations train their workforce on the latest technology and bridge the skills gap by offering comprehensive technical hands-on workshops and courses.