A Google research paper on AI task delegation provides the framework for making agentic BIM work – but also exposes a yawning gap between current platforms and what’s actually needed, writes Martyn Day
In our March/April 2026 cover story, the Agentic future of BIM, I argued that the shift from modelling to execution will require BIM platforms to manage delegated authority, not just geometry.
Since then, a Google DeepMind paper, entitled ‘Intelligent AI Delegation’, has made that argument uncomfortably specific.
To be clear, the paper doesn’t mention the AEC industry. Instead, it focuses on a higher level, and in particular, on how AI might be trusted to solve problems and provide reliable outputs. But after reading the paper, I believe it systematically dismantles the assumption that AI agents can be bolted onto legacy BIM platforms and that users can expect them to operate safely at scale.
More importantly, the paper provides a conceptual framework that has been noticeably absent from most discussions on what agentic design systems actually need to do.
Delegation, not assistance
Current AI-in-BIM implementations largely focus on autocomplete functions. An AI application suggests clash resolutions or layouts. A plug-in generates duct routes. A co-pilot drafts schedules and adds up quantities.
These are all big productivity enhancements within the flow of design – but the concept of delegation is something else entirely.
Delegation involves what the Google researchers term ‘authority transfer’. In AEC terms, when an AI generates an apartment layout, is it operating within defined planning constraints, accessibility codes, fire egress requirements and structural coordination limits? The question is: does it provide verifiable proof that all were simultaneously satisfied – or is it generating an optimised proposal that the architect then validates and takes responsibility for?
Current AI layout tools can perform some sophisticated planning work, such as generating layouts that are optimised for zoning, daylight and cost, for example. But they’re producing proposals, not delegating decisions. The architect remains the guarantor of a design. True delegation would mean that the agent takes responsibility for the planning decision, not just assists within it.
The complexity deepens when agents need to work across disciplinary boundaries. An MEP agent optimising duct routes can’t operate in isolation from a structural agent placing columns. An architectural layout agent generating apartment plans can’t ignore fire egress requirements or façade constraints.
Real building design isn’t a matter of siloed optimisation. It’s about integrated problem-solving where constraints interact. And current tools don’t support this.
For example, Finch optimises layouts within architectural rules. Branch 3D integrates structural analysis with real-time cost and carbon feedback. But neither system negotiates with the other. There’s no mechanism for cross-discipline constraint resolution and no way to prove the integrated solution satisfies all requirements simultaneously. The architect still does that manually.
The delegation framework would require agents that can coordinate across boundaries, escalate when constraints conflict and provide verifiable proof that the entire system works, not just the individual parts.
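What cross-discipline negotiation with escalation might look like can be sketched in a few lines. This is a toy Python illustration, not any vendor’s mechanism: the discipline names, numbers and constraint functions are all invented for the example.

```python
from typing import Callable, Dict, Iterable, Optional

def negotiate(proposals: Iterable[int],
              constraints: Dict[str, Callable[[int], bool]]) -> Optional[int]:
    """Return the first proposal that satisfies every discipline's
    constraint, or None, meaning the conflict must be escalated to a
    human. A toy stand-in for real cross-discipline resolution."""
    for p in proposals:
        if all(check(p) for check in constraints.values()):
            return p
    return None  # constraints conflict: escalate

# Example: candidate duct depths (mm) negotiated against structural
# and MEP constraints (illustrative numbers, not engineering limits)
constraints = {
    "structural": lambda depth: depth <= 350,  # beam clearance limit
    "mep": lambda depth: depth >= 250,         # minimum airflow area
}
agreed = negotiate([500, 400, 300], constraints)  # -> 300
```

The point of the sketch is the `None` branch: a delegation-aware system needs an explicit escalation path when the disciplines’ constraints cannot all be met, rather than silently picking a winner.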
Going from the macro to the micro, the Google paper introduces the idea of ‘contract-first decomposition’ as a binding constraint. Each task must be decomposed until its outputs are verifiable. If you can’t validate a result cheaply and reliably, you must decompose the task further until it can be validated. Verification isn’t optional. It needs to be structural to the system.
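Contract-first decomposition can be made concrete with a short sketch. The Python below is a hypothetical illustration of the rule – split a task until every leaf carries a cheap verifier – and all names (`Task`, `decompose_until_verifiable`, the stub checks) are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Task:
    """A delegated task in contract-first form (names are illustrative)."""
    name: str
    verify: Optional[Callable[[object], bool]] = None  # cheap, reliable check
    subtasks: List["Task"] = field(default_factory=list)

def decompose_until_verifiable(task: Task,
                               split: Callable[[Task], List[Task]]) -> Task:
    """Split a task recursively until every leaf has a verifier attached.

    'split' stands in for a domain-specific planner; the rule it
    enforces is that verification is structural, not optional.
    """
    if task.verify is not None:
        return task  # leaf: the output can be validated directly
    subtasks = split(task)
    if not subtasks:
        raise ValueError(f"'{task.name}' can be neither verified nor decomposed")
    task.subtasks = [decompose_until_verifiable(t, split) for t in subtasks]
    return task

# Example: an apartment-layout task splits into checkable sub-contracts
# (the lambdas are stubs standing in for real code and daylight engines)
def split_layout(task: Task) -> List[Task]:
    return [Task("egress_widths", verify=lambda plan: True),
            Task("daylight_factor", verify=lambda plan: True)]

layout = decompose_until_verifiable(Task("apartment_layout"), split_layout)
```

A task with no verifier and no viable split raises an error rather than being delegated – which is precisely the binding constraint the paper describes.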
This maps directly onto the shift I previously described, from BIM-as-container to BIM-as-runtime. In a traditional workflow, geometry is the artefact. The model is a database of objects. Downstream tools merely attach analysis data to those objects and may even have to create a whole new model in another application.
In an agentic system, geometry becomes secondary. The primary artefacts are proofs. Load paths must be demonstrable. Carbon calculations must be auditable. Code compliance must be traceable. The agent’s output isn’t a wall. It is a structural decision with an attached guarantee.
This is not a hypothetical distinction. It is the same distinction that makes Model Context Protocol (MCP) the wrong shape for what vendors call ‘agentic BIM’.
MCP enables a language model to translate natural-language intent into a sequence of application-specific commands: in Revit, a faster way to invoke the API; in any host, a better interface. It doesn’t change who is accountable for the result. No authority has been transferred. The architect is still the decision-maker.
The AEC industry needs process-level transparency. You must be able to see how the AI agent arrived at an answer, not just what its answer was
MCP is a useful protocol for getting human intent into an application quickly, but it is not a framework for delegated authority; to call it one would be erroneous. And, as of Spring 2026, MCP is a protocol under visible pressure, because its credentials as a security architecture are being challenged. MCP servers in the wild have been found to be vulnerable at scale, token costs are biting, and the protocol has been formalised under the Linux Foundation as a hedge.
In short, MCP is an interface-layer technology that is being asked to carry a governance-layer load for which it was simply not designed.
The verifiability problem
Legacy BIM platforms were not designed for delegation either. They coordinate geometry and metadata. They don’t define what an agent is allowed to do. They don’t track how much autonomy to grant based on risk. They don’t manage what happens when design constraints conflict. They don’t record how decisions were made in a way that can be audited later. None of this is natively supported by an IFC schema or a Dynamo graph, and none of it can be bolted on without rethinking the substrate.
The Google paper distinguishes between outcome monitoring (inspecting final output) and process monitoring (tracking intermediate reasoning). In low-risk domains, outcome monitoring may suffice. In high-risk domains – such as structural safety, fire compliance, HVAC performance, envelope thermal behaviour and any other area where a wrong answer could have serious consequences – it doesn’t. The industry needs process-level transparency. You must be able to see how the agent arrived at an answer, not just what its answer was.
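One minimal form of process monitoring is an append-only trace of the agent’s intermediate decisions, chained by hash so it cannot be quietly edited after the fact. The sketch below is illustrative only – `ProcessTrace` and its entry fields are invented for this article, not drawn from any platform.

```python
import hashlib
import json

class ProcessTrace:
    """Append-only record of an agent's intermediate decisions.
    Each entry is hash-chained to the previous one, so tampering
    with any step breaks verification of the whole trace."""

    def __init__(self):
        self.entries = []

    def record(self, step: str, detail: dict) -> str:
        prev = self.entries[-1]["digest"] if self.entries else "genesis"
        payload = json.dumps({"step": step, "detail": detail, "prev": prev},
                             sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        self.entries.append({"step": step, "detail": detail,
                             "prev": prev, "digest": digest})
        return digest

    def verify_chain(self) -> bool:
        prev = "genesis"
        for e in self.entries:
            payload = json.dumps({"step": e["step"], "detail": e["detail"],
                                  "prev": prev}, sort_keys=True)
            if hashlib.sha256(payload.encode()).hexdigest() != e["digest"]:
                return False
            prev = e["digest"]
        return True

# Example: an MEP agent logs why it routed a duct the way it did
trace = ProcessTrace()
trace.record("route_duct", {"clearance_mm": 450, "zone": "ceiling_void_L2"})
trace.record("check_energy", {"within_envelope": True})
```

An auditor can later replay `verify_chain()` to confirm the reasoning record is intact – the ‘how’ behind the answer, not just the answer.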
This is where the gap becomes visible. If an MEP agent routes services across a ceiling void, current platforms can tell you whether clashes were detected. They can’t tell you whether the agent respected energy targets, understood spatial tolerances or applied code requirements correctly. The solver trace is opaque. The reasoning is inaccessible. The system has no mechanism to prove that the constraints were honoured, only that the geometry doesn’t self-collide.
The paper’s solution is verifiable task completion: outputs must be provable. That means mathematical proofs that demonstrate correctness without revealing proprietary methods (zero-knowledge-style, so vendors don’t have to open their solvers), or third-party audit regimes, or formal verification. None of this is exotic. It is the engineering standard for any safety-critical system. It is how we treat bridges, aircraft and medical devices, for example. BIM platforms are not yet treated that way, and the argument of this piece is that they will soon need to be.
This much-needed governance layer is, in fact, missing entirely. The draft revision of the BIM standard, DIS/ISO 19650-1:2026, published for consultation in March 2026, does not mention agents, autonomous workflows or agentic AI.
In other words, the standard that governs information management in BIM has precisely nothing to say about the agentic shift, despite being revised at a time when vendors are actively shipping agent-capable platforms with MCP hooks to customers. The standard continues to assume that information is produced by humans who can be held accountable – but the market is moving faster than the standard – a full revision cycle faster at least. That gap is not a matter of rhetoric. It is where liability will fall when something goes wrong and no one is able to say who took the decision that resulted in that error.
This is what architectural incompatibility looks like in practice. The tools don’t produce verifiable evidence. The protocol the vendors are betting on operates at the wrong level. And the standard that would define responsibility has not been written.
In that respect, today’s BIM platforms are inadequate for running agentic tasks, not through neglect, but because they were built around a workflow whose end product is drawings.
Trust versus reputation
One of the more operationally relevant concepts in the Google paper is the distinction it makes between reputation and trust. Reputation is global, a historical measure of past performance. Trust is contextual and task-specific. An AI agent with strong performance in low-rise residential may not warrant the same autonomy when put to work on a multi-storey hospital project in a seismic area. Trust must be calibrated dynamically.
In AEC terms, this means agent autonomy can’t be binary. It’s not a matter of ‘computer says no’ or ‘computer says yes’. Structural agents might operate with high autonomy in routine scenarios and require explicit approval in exceptional conditions. MEP agents might generate routings freely but need human sign-off when deviating from energy envelopes. It’s not enough to have discipline-specific knowledge. There is also a governance architecture issue to be addressed.
Current BIM tools don’t support graduated autonomy. You either run a script or you don’t. You approve a design option or you reject it. There’s no infrastructure for context-aware delegation, no mechanism for agents to escalate decisions when uncertainty exceeds thresholds and no protocol for re-delegation when trust conditions aren’t met.
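Graduated, context-aware autonomy can be reduced to a small policy sketch. The Python below is a toy: the thresholds, the normalised scores and the three-level `Autonomy` enum are illustrative assumptions, not anything proposed by the paper or shipped by a vendor.

```python
from enum import Enum

class Autonomy(Enum):
    AUTONOMOUS = "act without approval"
    PROPOSE = "act, but require human sign-off"
    ESCALATE = "hand the decision back to a human"

def autonomy_for(task_risk: float, contextual_trust: float,
                 uncertainty: float, *, escalate_above: float = 0.3) -> Autonomy:
    """Toy delegation policy (illustrative thresholds).

    All inputs are assumed normalised to [0, 1]. Trust is contextual:
    the same agent scores differently on a seismic-zone hospital than
    on low-rise residential, so its autonomy differs too.
    """
    if uncertainty > escalate_above:
        return Autonomy.ESCALATE       # re-delegate: trust conditions not met
    if contextual_trust >= task_risk:
        return Autonomy.AUTONOMOUS     # trusted for this class of task
    return Autonomy.PROPOSE            # generate freely, human signs off
```

The useful property is that autonomy is an output of context, not a setting: the same agent lands in different modes as risk and uncertainty change.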
Then there’s the issue of monoculture risk. Highly efficient networks of delegated agents, if built on similar models and trained on similar data, create what the Google researchers call ‘cognitive monocultures’. When errors occur, they propagate rapidly and widely.
Take, for example, Autodesk Assistant, Bentley Copilot, and Trimble Agent Studio: three vendor-controlled platform commitments to agentic AI across the AEC stack. Each runs on a proprietary solver stack and is trained on its own data.
In practice, the AI is not the part making the engineering decisions. Building code checks, structural and MEP analysis, and the geometry itself are all still handled by the traditional, rule-based engines underneath. The LLM sits above that, translating intent and explaining results. The vendors do not control the reasoning substrate they sit on and, in most cases, do not see how it was trained.
With this comes the threat of widespread code misinterpretations, repeated structural blind spots, or carbon miscalculations embedded across portfolios. A flawed assumption in one solver gets replicated across thousands of projects simultaneously. A code interpretation error doesn’t fail once; it fails everywhere the solver is relied upon. The same infrastructure that lets a practice design faster could also mean its failures propagate faster.
Of the three, Bentley’s framing comes closest to naming the problem. Its ‘trustworthy AI’ positioning gestures at the verifiability concerns that the Google DeepMind paper places centre-stage. The big question is whether that translates into concrete mechanisms, such as auditable traces, signed proofs and third-party validation, or stays as marketing.
What is striking is how few of the vendors will say publicly which LLM they use. If Autodesk Assistant, Trimble Agent Studio and Bentley Copilot share even one foundation model between them, then a single flaw in that model – a systematic misreading of fire egress requirements, say, or a hallucinated compliance citation that sounds authoritative – does not fail on one platform. It fails across all of them at once, and the firm downstream has no way to tell which layer produced the error.
In other safety-critical industries this would trigger established response mechanisms such as notices, recalls, audits and disclosure requirements. In an AEC context, there is none of that. There’s no disclosure requirement, no testing protocol and no governance framework that would catch this kind of failure before it reaches production.
The question is no longer whether the next generation of agentic tools will reinforce that concentration. It already has. The reinforcement is named, dated and documented. What is missing is the governance layer that would catch its failures before they reach production at scale.
At this point, it’s important to mention that Autodesk is also developing its own in-house ‘neural CAD’ foundation model for Forma – first announced at AU 2025. Neural CAD is positioned explicitly as a different category from general-purpose LLMs like ChatGPT or Claude, trained to reason about CAD objects and building systems rather than language. Some early capabilities are appearing around the edges, but the core engine is not yet in full production use.
Delegation as a structural property
The Google DeepMind paper doesn’t advocate for or against AI autonomy in design. It provides a framework for thinking about delegation as a structural property of intelligent systems. Authority transfer, verifiable task completion, process monitoring, trust calibration and re-delegation are not optional features. They are the architecture.
Viewed through that lens, the future of BIM is not ‘BIM 1.0 plus AI’. The ability to verify what agents did, grant autonomy conditionally, monitor decisions as they are made and re-delegate when things go wrong cannot be added to a modelling tool as a plug-in, or a copilot, or a protocol layer. It must be built in.
Agentic BIM will not succeed on the basis that it produces geometry faster. It will succeed only if it embeds governance, proof and accountability into the substrate itself. The infrastructure required for that does not yet exist
I would describe a system with these capabilities as a runtime-native BIM platform. It would sign every solver output with a proof that the relevant constraints were honoured, and version those proofs so they could be audited later. It would support graduated autonomy by letting a firm define the envelope in which each agent can act, escalate outside it and re-delegate when conditions change.
It would provide a cross-discipline negotiation layer, so that an MEP agent and a structural agent could resolve conflicts between their constraints without a human acting as manual translator. It would publish provenance for the underlying foundation models it depends on, because a defect in one of those models is now a defect in every project the platform touches. None of this is speculative. It is the same set of mechanisms that safety-critical industries have used for decades, applied to new substrates.
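What ‘signing every solver output with a proof’ might mean in the simplest case can be sketched as follows. This is a minimal illustration, assuming HMAC as a stand-in for whatever signing scheme a real platform would use; the function names, field names and example values are all invented.

```python
import hashlib
import hmac
import json

def sign_solver_output(result: dict, constraints_checked: list,
                       model_provenance: dict, key: bytes) -> dict:
    """Wrap a solver result with the evidence a runtime-native platform
    would need: which constraints were checked, which foundation model
    (and version) was involved, and a signature over all of it."""
    record = {
        "result": result,
        "constraints_checked": constraints_checked,
        "provenance": model_provenance,
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record

def verify_solver_output(record: dict, key: bytes) -> bool:
    """Recompute the signature; any tampering with the result, the
    constraint list or the provenance breaks verification."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])

# Example: a structural decision carries its checks and provenance
key = b"platform-signing-key"
rec = sign_solver_output({"member": "B12", "utilisation": 0.71},
                         ["bending_check", "deflection_limit"],
                         {"model": "example-llm", "version": "1.0"}, key)
```

Note what travels with the geometry-free `result`: the constraint list and the model provenance, so a defect traced to one foundation model version can be traced to every record that names it.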
The implication is uncomfortable but clear. Legacy BIM systems are not incomplete; they are architecturally incompatible with safe agentic delegation. You cannot retrofit accountability into platforms designed to store geometry and produce drawings. And the industry will not reach a runtime-native platform by accident. It will not get there by waiting for the industry incumbents to iterate on tools that they already ship.
Agentic BIM will not succeed on the basis that it produces geometry faster. It will succeed only if it embeds governance, proof and accountability into the substrate itself. The infrastructure required for that does not yet exist.
The question now is not whether it will be built, but who will build it and on what terms. And that ‘who’ could be an established vendor, entirely rebuilding its platform from first principles, but it might also be a software company starting from scratch and prioritising the delegation contract as the primitive around which a new platform is designed.
Read the March/April 2026 cover story
The Agentic future of BIM