From training to inference: the quiet shift redrawing the AI map

June 25 2026, by Gavin Dudley | Category: Data Centres

For three years, AI has been a training story: build the biggest model, feed it the most data, repeat. That race hasn’t stopped – but the centre of gravity has moved. Most of the world’s AI compute no longer builds models. It runs them.

The shift is hard to argue with. Deloitte puts inference at two-thirds of all AI compute in 2026 – up from a third in 2023 and half in 2025.

It’s already showing up in what gets bought. NVIDIA’s data-centre revenue hit a record US$51.2 billion in Q3 FY26, up 66% year-on-year – and the company now names real-time inference, not training, as the primary driver of cloud monetisation. GPUs are now being shipped to run models in production, not just to train them. That change quietly redraws the map of where AI compute has to live.

Where the work actually happens.

If the work has shifted from building models to running them, what does “running them” actually involve? Not all AI compute is the same – and where a workload sits decides where it has to run. First, the basics:

Training builds the model: a one-off, computationally heavy job that runs in massive, distinct blocks of compute, then ships in versions. It’s location-agnostic – train wherever there’s power and cooling.
Inferencing is the model in use: a query goes in, the trained model responds. It’s what almost every live AI product does in production.
Conversational inferencing is the part users feel: chatbots, copilots, voice agents. It sits close to the user, because every millisecond shows up in the experience.
Hybrid runs training and inferencing in the same environment: common where models are fine-tuned on production data.
Agents are inference that fetches its own context: the model doesn’t just answer; it breaks the task down, searches live corporate systems (CRM data, org charts, current policies) via retrieval-augmented generation, and processes that fresh data on the fly.
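As a rough illustration of the retrieval step an agent performs before it answers, here is a minimal sketch. Everything in it is hypothetical: the three documents stand in for live corporate systems, and word overlap stands in for the vector search a real retrieval-augmented pipeline would typically use. The call to the model itself is left out.

```python
# Toy in-memory stand-in for corporate systems; the contents are hypothetical.
DOCUMENTS = {
    "leave-policy": "Annual leave policy: staff accrue 20 days of leave per year.",
    "org-chart": "Org chart: the data centre operations team reports to the COO.",
    "crm-acme": "CRM record: Acme renewed its hosting contract in March.",
}


def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by shared words with the query (a crude stand-in for vector search)."""
    words = set(query.lower().split())
    ranked = sorted(
        DOCUMENTS.values(),
        key=lambda text: len(words & set(text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(task: str) -> str:
    """Fetch fresh context first, then assemble the prompt a model would receive."""
    context = "\n".join(retrieve(task))
    return f"Context:\n{context}\n\nTask: {task}"


print(build_prompt("how many days of leave do staff accrue"))
```

The point of the sketch is the order of operations: the lookup happens at query time, against data that sits wherever the business keeps it, which is part of why agent workloads care about being close to that data.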

That last category – agents – is driving compute per query sharply upward. At GTC 2026, NVIDIA’s Jensen Huang said the compute behind a single AI workload has climbed roughly 10,000-fold in two years as reasoning models replaced simple retrieval: a query no longer returns one answer, it generates thousands of intermediate “thinking” tokens. Inference already accounts for over 40% of NVIDIA’s revenue.

Australia is already building for it.

This isn’t hypothetical for Australian buyers – the capital is already moving. The ABS reported that data-centre spending drove the largest rise of any industry in the March quarter 2026: information, media and telecommunications jumped 96.1% to a new record, as equipment spend nearly tripled to around $6 billion in the quarter. That’s real money following a real shift in where compute has to sit – and it lands here because latency and sovereignty increasingly travel together. Once inference is continuous and close to the user, location becomes a strategic choice, not an operational footnote.

Why location suddenly matters.

Inference is less bursty than training, but continuously active and far more latency-sensitive. A facility planned in Australia today will spend most of its life running inference, not training. That rewrites the playbook: inference demands proximity – a well-connected metropolitan site, close to the end user, where speed and connectivity are the product. And because it runs non-stop, the economics reduce to one question: how efficiently you turn that capacity into useful output.
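To put a number on proximity, the sketch below estimates the floor that distance alone places on round-trip time. It assumes light travels through optical fibre at about 200,000 km/s (roughly two-thirds of its speed in a vacuum) and ignores routing, queuing and processing, so real figures will be higher. The distances are illustrative round numbers, not measured cable routes.

```python
# Best-case network round-trip time from fibre distance alone.
FIBRE_SPEED_KM_PER_S = 200_000  # assumption: about 2/3 of the speed of light in a vacuum


def min_round_trip_ms(distance_km: float) -> float:
    """Return the lower bound on round-trip time in milliseconds."""
    one_way_seconds = distance_km / FIBRE_SPEED_KM_PER_S
    return 2 * one_way_seconds * 1000


# Illustrative distances (hypothetical round numbers).
examples = {
    "same metro (50 km)": 50,
    "interstate (1,000 km)": 1_000,
    "trans-Pacific (12,000 km)": 12_000,
}

for label, km in examples.items():
    print(f"{label}: at least {min_round_trip_ms(km):.1f} ms round trip")
```

An agent that makes several sequential calls per user turn pays that floor on each one, so the gap between a metro site and an offshore one compounds quickly.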

No, it won’t all move to the edge.

There’s an obvious objection. If inference must sit close to the user, surely it just shifts onto the user’s own devices – phones, laptops, cheap local chips – and large data centres stop mattering. The data says otherwise. Globally – and Australia is moving the same way – most inference will still run in data centres, on high-performance chips, in a market worth over US$200 billion. The market for inference-optimised chips alone passes US$50 billion in 2026. The edge takes a slice; the heavy, consequential inference stays where the power, cooling and bandwidth are.

One chip no longer does the job.

Inference is the fastest-growing slice of accelerator demand and is forecast to pass training by the end of the decade – even if it isn’t the clear majority of accelerator revenue yet.

And a GPU isn’t always the right tool. Training rewards raw parallel throughput; inference is a different problem – generating each token means re-reading the model’s weights from memory, so it’s bound by memory bandwidth, not raw compute. That’s opened the door to rival architectures: some suit low-latency, predictable responses (on-chip SRAM designs such as Groq’s LPU), others suit large-parameter models (such as AMD’s MI300X) or ultra-large mixture-of-experts models (such as SambaNova’s).
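The memory-bandwidth point reduces to simple arithmetic. The sketch below assumes a dense model whose full weights are read once per generated token for a single request, with no batching or other optimisations, so it gives a ceiling rather than a prediction. The model size and bandwidth figures are hypothetical placeholders, not vendor specifications.

```python
def max_tokens_per_second(params_billion: float,
                          bytes_per_param: float,
                          mem_bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream generation speed when weight reads dominate."""
    weight_gb = params_billion * bytes_per_param  # billions of params x bytes each = GB
    return mem_bandwidth_gb_s / weight_gb


# Hypothetical: 70 billion parameters, 2 bytes each (16-bit), 3,000 GB/s of memory bandwidth.
ceiling = max_tokens_per_second(70, 2, 3000)
print(f"Ceiling: about {ceiling:.0f} tokens per second")

# Halving the bytes per parameter (for example, 8-bit weights) doubles the ceiling.
print(f"With 1 byte per parameter: about {max_tokens_per_second(70, 1, 3000):.0f}")
```

Under these assumptions, adding raw compute does nothing for a single stream; only faster memory or smaller weights move the ceiling, which is the opening the alternative architectures are aiming at.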

None of this dents NVIDIA – but it means inference is a genuinely different contest from training.

NVIDIA’s answer is to re-architect for inference. It built its entire GTC 2026 keynote around one message – “the age of inference has arrived” – and around token-factory economics: a data centre is a factory producing tokens, and within a fixed footprint, whoever produces the most tokens per unit of capacity has the lowest cost and the most revenue.

The unit of competition is no longer the raw chip. It’s efficiency: how much useful output you get from each unit of capacity.
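One way to read the token-factory idea is as a cost-per-token comparison within a fixed power envelope. The sketch below is illustrative only: the two deployments, their throughput and the all-in hourly cost per kilowatt are made-up numbers chosen to show the arithmetic, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    power_kw: float           # capacity drawn within the fixed footprint
    tokens_per_second: float  # sustained throughput
    cost_per_kw_hour: float   # all-in hourly cost per kW (hypothetical)

    def tokens_per_kwh(self) -> float:
        """Useful output per unit of capacity."""
        return self.tokens_per_second * 3600 / self.power_kw

    def cost_per_million_tokens(self) -> float:
        """Hourly cost divided by hourly output, scaled to a million tokens."""
        hourly_cost = self.power_kw * self.cost_per_kw_hour
        hourly_tokens = self.tokens_per_second * 3600
        return hourly_cost / hourly_tokens * 1_000_000


# Same footprint and cost base; B simply gets twice the throughput from it.
a = Deployment("A", power_kw=100, tokens_per_second=50_000, cost_per_kw_hour=0.50)
b = Deployment("B", power_kw=100, tokens_per_second=100_000, cost_per_kw_hour=0.50)

for d in (a, b):
    print(f"{d.name}: {d.tokens_per_kwh():,.0f} tokens/kWh, "
          f"${d.cost_per_million_tokens():.2f} per million tokens")
```

With capacity and cost held constant, doubling throughput halves the cost per token, which is the sense in which efficiency, rather than the chip itself, becomes the unit of competition.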

Whose laws apply when AI makes the call.

Here the conversation shifts from infrastructure to governance. AI agents are increasingly used to approve or recommend loans, triage patients or trigger operational actions – in a growing number of cases with little or no human review of the output before it takes effect.

As the National Law Review noted in 2025, when agentic systems “operate from servers in one jurisdiction and make decisions affecting parties in multiple other jurisdictions, questions of which legal framework applies become particularly relevant.” The location of the inference run is the location of the decision – with all the regulatory and liability consequences that carries.

Three related questions sit underneath:

Data location – the IP and privacy question. Where training and fine-tuning data resides determines regulatory exposure under the Privacy Act, GDPR and health-records law. Australia’s 2026 Privacy Act amendments introduce mandatory transparency and contestability rights for “substantially automated” decisions that materially affect individuals.
Decision location – the jurisdiction question. Where inference runs increasingly determines which framework governs the output, who can demand an audit trail, and what rights the affected party has. Gartner projects that by 2027, 35% of countries will have locked into regional AI infrastructures, making “sovereignty by design” the norm.
Cultural DNA – whose assumptions are baked in. Models encode the values of the people who built them; one fine-tuned on US data by US teams may produce decisions that are culturally misaligned or legally non-compliant in an Australian context. Who can amend the model matters as much as where it runs.

Regulation is moving in one direction. Australia’s National AI Plan, published December 2025, signals stronger oversight of automated decision-making even without a standalone AI Act.

The global direction is sovereign inference: data centres that can demonstrate local hosting, local fine-tuning and auditable decision trails will be the only places where compliant enterprises and governments can legally run these workloads.

Where to put your inference.

Strip it back to one line. If inference is continuous, latency-sensitive and increasingly run on a mix of silicon, then where you place it is a strategic decision, not a cost line on a spreadsheet: close to your users, close to your data, inside a sovereign jurisdiction.

The shift from training to inference is quiet, but it’s redrawing the map of where AI compute has to live – and the organisations that plan for it now will avoid costly rework later.

If you’re weighing where to place your AI capacity, that’s a conversation worth having properly. Reach out at gdudley@macquariedatacentres.com or macquariedatacentres.com/contact-us. I’m happy to talk through the pragmatic steps worth taking now.