Omdia’s AI factory model turns capex into output
Omdia’s 2026 AI factory playbook reframes infrastructure around token output, sovereignty, and the last mile of AI delivery.

I’ve been watching AI infrastructure get more expensive, more political, and a lot less tidy for a while now. But the thing that kept bothering me was how everyone talked about it like it was still just “more GPUs, more racks, more cloud.” That story stopped making sense the minute I saw teams buying compute and then watching it sit there, waiting on data movement, orchestration, approvals, and the usual enterprise sludge. I’ve seen the same pattern in a few places: the budget gets approved, the cluster gets installed, and then the actual product team realizes they’ve built a very expensive waiting room.
That’s why Omdia’s Light Reading post hit a nerve for me. It doesn’t treat AI infrastructure like a generic cloud upgrade. It treats it like industrial production, with output, bottlenecks, sovereignty, and hard physical constraints. That framing is more honest and more useful. Omdia says cumulative global data center investment could approach $1.6 trillion by 2030, and leading tech enterprises may deploy over $600 billion in AI infrastructure capex in 2026 alone. Those are not “maybe we should optimize later” numbers.
What I want to do here is break down the five dynamics Omdia is pointing at, because the useful part isn’t the headline. It’s the operating model underneath it.
Stop measuring AI by FLOPS and start measuring output
“Budgets for compute hoarding have been frozen as enterprises confront a ‘Zombie GPU’ effect, in which expensive GPUs idle in I/O wait; evaluation metrics are shifting to Time-to-First-Token and vector retrieval speed.”
What this actually means is simple: raw compute is not the bottleneck you should obsess over anymore. A GPU that waits around on data, retrieval, or orchestration is just a very expensive heater. I’ve seen teams brag about cluster size and then quietly admit their model latency is trash because the pipeline is a mess.

Omdia’s point about Time-to-First-Token matters because it forces teams to care about user-visible response, not vanity metrics. If the first token takes forever, nobody cares how many FLOPS you bought. Same deal with vector retrieval speed. If your retrieval layer is slow, your “AI app” feels like a stalled demo.
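If you have never measured TTFT directly, here’s a minimal sketch of how I’d do it for one streamed response. It assumes your client exposes tokens as a Python iterable; `measure_stream` and `fake_stream` are hypothetical names of mine, not anything from Omdia or a specific SDK, and the delays are made up.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, token_count) for one streamed response."""
    start = clock()
    ttft = None
    count = 0
    for _ in tokens:
        if ttft is None:
            # First token arrived: this is the delay the user actually feels.
            ttft = clock() - start
        count += 1
    total = clock() - start
    # An empty stream never produced a first token, so report the full wait.
    return (total if ttft is None else ttft), total, count


def fake_stream(first_delay: float, gap: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model client."""
    time.sleep(first_delay)  # retrieval, queueing, and prefill all hide in here
    for i in range(n):
        if i:
            time.sleep(gap)
        yield f"tok{i}"


ttft, total, count = measure_stream(fake_stream(0.05, 0.01, 5))
print(f"TTFT={ttft:.3f}s total={total:.3f}s tokens={count}")
```

The injectable `clock` is only there so the measurement can be tested without real sleeps. In a real stack you would start the timer when the request leaves the client, so retrieval and queueing count against TTFT too.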
I ran into this pattern when a team I worked with kept adding GPUs to a RAG stack. It helped a little, then plateaued hard. The real fix was not more compute. It was reducing retrieval overhead, tightening indexing, and cutting redundant calls. That’s the annoying part: the answer is usually plumbing, not horsepower.
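To make “cutting redundant calls” concrete, here’s a rough sketch of the kind of cache that catches repeated embedding requests. It is not the fix that team shipped. `EmbeddingCache` and `toy_embed` are hypothetical names, and the whitespace normalization is my assumption about what counts as “the same text”; check that against your own embedding model before relying on it.

```python
import hashlib
from typing import Callable, Dict, List, Sequence


class EmbeddingCache:
    """Wraps an embedding function so identical text is only embedded once."""

    def __init__(self, embed_fn: Callable[[str], Sequence[float]]):
        self._embed_fn = embed_fn  # hypothetical: your real embedding call goes here
        self._store: Dict[str, Sequence[float]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str) -> str:
        # Assumption: collapsing whitespace does not change the meaning of a chunk.
        normalized = " ".join(text.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def embed(self, text: str) -> Sequence[float]:
        key = self._key(text)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]

    def redundancy(self) -> float:
        """Fraction of requests that were repeats and never reached the backend."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def toy_embed(text: str) -> List[float]:
    # Toy stand-in for a real model call.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


cache = EmbeddingCache(toy_embed)
for chunk in ["refund policy", "refund  policy", "shipping times", "refund policy"]:
    cache.embed(chunk)
print(f"backend calls={cache.misses} repeats={cache.hits} redundancy={cache.redundancy():.0%}")
```

Even if you never deploy a cache like this, running your traffic through the counter for a day tells you how much of your embedding spend is repeat work.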
How to apply it:
- Track TTFT, end-to-end latency, and retrieval latency before you track raw GPU utilization.
- Look for idle GPU time caused by data transfer, queueing, or orchestration gaps.
- Audit redundant API calls, repeated embeddings, and unnecessary model hops.
- Optimize the pipeline before you buy the next rack.
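The second bullet is easier to act on once you can put a number on the idle time. This is a sketch under the assumption that you can log, per request or batch, how long the GPU waited on data, how long it waited on the scheduler, and how long it actually computed. The `StepTiming` fields are hypothetical and will not match any particular profiler’s output.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class StepTiming:
    """Wall-clock seconds one request or batch spent in each stage (assumed fields)."""
    data_wait: float   # GPU idle, waiting on storage or network transfer
    queue_wait: float  # GPU idle, waiting on the scheduler or orchestrator
    compute: float     # GPU actually doing work


def idle_breakdown(steps: Iterable[StepTiming]) -> Dict[str, float]:
    """Return the share of total time spent in each stage, as fractions of 1.0."""
    totals = {"data_wait": 0.0, "queue_wait": 0.0, "compute": 0.0}
    for s in steps:
        totals["data_wait"] += s.data_wait
        totals["queue_wait"] += s.queue_wait
        totals["compute"] += s.compute
    wall = sum(totals.values())
    if wall == 0:
        return {name: 0.0 for name in totals}
    return {name: value / wall for name, value in totals.items()}


# Made-up timings for illustration only.
sample = [StepTiming(3.0, 1.0, 4.0), StepTiming(1.0, 1.0, 0.0)]
for stage, share in idle_breakdown(sample).items():
    print(f"{stage}: {share:.0%}")
```

If `compute` is a minority of the wall clock, another rack will mostly add more idle hardware.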
Omdia cites vendor case studies claiming a 12x vector indexing speed-up and up to a 75% cost reduction on API and compute redundancy. Numbers like that should make teams ask hard questions about their own pipelines. Not because the numbers are magic, but because they show where the waste usually lives.
AI factories are not data centers with a shinier name
Omdia defines an AI Factory as infrastructure whose sole objective is producing intelligence, with the token as the fundamental unit of output. That’s a very different mental model from “we have a data center and now we’ll run AI in it.”
Here’s the part I agree with: data centers are being pushed from business support centers into digital product manufacturing centers. That sounds like jargon until you try to run enterprise AI at scale. Then it becomes obvious. You need energy, hardware, scheduling, virtualization, and product delivery all lined up. If one layer is weak, the whole thing feels broken.
Omdia breaks the stack into four layers: energy and physical infrastructure, hardware and network fabric, scheduling and virtualization orchestration, and MaaS plus AI application ecosystem. I like this because it stops people from pretending the model layer is the only layer that matters. It isn’t. I’ve watched teams spend months tuning prompts while ignoring power density, network fabric, and placement policy. That’s how you end up with beautiful slides and ugly production incidents.
The industrial framing also explains why the market is getting so capital-intensive and politically messy. When AI output becomes a strategic asset, the infrastructure behind it starts looking less like IT and more like national capacity planning.
How to apply it:
- Map your AI stack into physical, network, orchestration, and application layers.
- Assign an owner to each layer instead of making “the platform team” swallow everything.
- Design for token production, not just model hosting.
- Review power, cooling, and network assumptions before the first production rollout.
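The ownership check above is simple enough to automate. Here’s a small sketch that uses Omdia’s four layers as keys and flags two things: layers nobody owns, and owners who hold too many layers. The layer identifiers, the `max_layers` threshold, and the team names are my own assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional

# The four layers Omdia describes, bottom to top (identifier spelling is mine).
LAYERS = [
    "energy_and_physical",
    "hardware_and_network",
    "scheduling_and_orchestration",
    "maas_and_applications",
]


def unowned_layers(owners: Dict[str, Optional[str]]) -> List[str]:
    """Return layers with no named owner, in stack order."""
    return [layer for layer in LAYERS if not (owners.get(layer) or "").strip()]


def overloaded_owners(owners: Dict[str, Optional[str]], max_layers: int = 2) -> List[str]:
    """Return owners holding more than max_layers layers (threshold is an assumption)."""
    counts = Counter(
        owners[layer].strip()
        for layer in LAYERS
        if (owners.get(layer) or "").strip()
    )
    return sorted(name for name, n in counts.items() if n > max_layers)


# Hypothetical org chart: one team has quietly inherited three of four layers.
owners = {
    "energy_and_physical": "facilities",
    "hardware_and_network": "platform",
    "scheduling_and_orchestration": "platform",
    "maas_and_applications": "platform",
}
print("unowned:", unowned_layers(owners))
print("overloaded:", overloaded_owners(owners))
```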
If you want a clean reference point for the terminology, Omdia’s framing is in the source post on Light Reading, which is itself republishing Omdia’s press release. The original structure matters because it shows how Omdia wants buyers to think: as operators of an industrial system, not consumers of a cloud feature.
Hyperscalers are splitting into two awkward paths
Omdia says hyperscalers are balancing agility and sovereignty through two delivery paradigms. The first is full-stack drop-in, where vendors like AWS, Huawei, Google Cloud, and OCI deploy integrated AI capability into a customer’s own data center. The second is software/hardware decoupling, where software capabilities localize and the hardware ecosystem gets shaped around regional or policy constraints.

That split makes sense to me because the old cloud story was too neat. Enterprises want cloud-grade AI, but they also want control, residency, and compliance. Those goals collide all the time. So hyperscalers are being forced to package themselves in ways that are more awkward, more local, and frankly more compromise-heavy.
I’ve seen this in procurement conversations where the buyer wants “private AI,” but what they really mean is “we don’t want our sensitive data crossing three jurisdictions and a legal review.” The vendor response is usually some version of: we can drop in a stack, or we can slice the software from the hardware and localize the rest. Neither is elegant. Both are real.
What matters here is that sovereignty is no longer a side requirement. It’s shaping architecture. That means your vendor choice is not just a pricing decision. It’s a policy decision, a deployment decision, and a long-term dependency decision.
How to apply it:
- Decide whether your use case needs integrated deployment or localized control.
- Separate software portability from hardware sourcing in your architecture plan.
- Ask vendors how they handle residency, auditability, and operational boundaries.
- Test whether you can move workloads without rebuilding the whole stack.
For readers who want to dig into the companies Omdia names, start with AWS, Google Cloud, Oracle Cloud Infrastructure, and Huawei. The point is not that they’re identical. The point is that they’re being pulled toward different deployment compromises.
Compute-native AI clouds are becoming the default upgrade path
Omdia says rack power density has risen from 10–15 kW in 2024 to 40–250 kW in 2026. That’s not a small shift. That’s a completely different operational regime. Once you get into that range, “just add more racks” stops being a sentence and starts being a problem.
The report also says workloads are moving from proof-of-concept to production-grade deployment, and it points to players like Nebius and SenseTime shifting from bare-metal leasing toward Model as a Service. That tracks with what I’ve seen: the market is moving from renting boxes to buying outcomes, or at least trying to.
What I find interesting is the energy-computing linkage Omdia mentions. The tighter the power envelope gets, the less useful it is to treat compute and energy as separate conversations. They’re now the same conversation. If you can’t control power, you can’t control service quality. If you can’t control service quality, your AI platform becomes a demo machine.
This is also where the “compute-native” label earns its keep. A compute-native AI cloud is not just a cloud with GPUs bolted on. It is built around dense workloads, high utilization, and production service delivery. If you’re still designing around general-purpose infrastructure assumptions, you’re already behind the workload.
How to apply it:
- Recheck your power and cooling assumptions against current rack density, not last year’s plan.
- Separate PoC infrastructure from production infrastructure.
- Evaluate whether your provider is selling bare metal, MaaS, or an actual service layer.
- Plan for energy procurement and workload scheduling together.
That last point is the one teams keep underestimating. I’ve seen it turn into a nasty surprise when the compute is ready but the facility can’t support the load profile. The infrastructure was “approved,” which is a nice word for “nobody checked the physics closely enough.”
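Checking the physics can start with arithmetic. This sketch takes a fixed IT power budget and asks how many racks it supports at the densities Omdia quotes. The 1,000 kW hall is a made-up figure, and the model ignores cooling overhead, redundancy, and power distribution losses, so treat it as a sanity check, not a facility design.

```python
import math


def racks_supported(facility_it_kw: float, rack_kw: float) -> int:
    """Whole racks that fit inside a fixed IT power budget."""
    if rack_kw <= 0:
        raise ValueError("rack_kw must be positive")
    return math.floor(facility_it_kw / rack_kw)


def fits(facility_it_kw: float, rack_kw: float, racks: int) -> bool:
    """True if the planned rack count stays within the IT power budget."""
    return racks * rack_kw <= facility_it_kw


HALL_KW = 1000.0  # hypothetical hall with 1 MW of usable IT power

# 15 kW is the top of Omdia's 2024 range; 40 and 250 kW bound its 2026 range.
for density_kw in (15.0, 40.0, 250.0):
    print(f"{density_kw:>5.0f} kW/rack -> {racks_supported(HALL_KW, density_kw)} racks")
```

Under those assumptions the same hall goes from 66 racks at 15 kW to 25 racks at 40 kW and 4 racks at 250 kW. A plan written around rack counts instead of kilowatts breaks the moment the density changes.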
The last mile is where AI becomes a business, not a demo
Omdia calls this the “last mile” of AI industrialization, and I think that’s the most useful phrase in the whole piece. Vertical integrators, domain operators, and ISVs are capturing the final value layer through long-cycle data governance, legacy integration, and scenario-specific agent assembly.
That means the real money is not just in building the model or hosting the cluster. It’s in making the thing actually fit the business. That includes data cleanup, workflow integration, permissions, audit trails, and all the annoying bits that everyone likes to skip in slide decks.
I’ve been in enough enterprise AI projects to know the pattern: the model works in isolation, then dies on contact with enterprise reality. Legacy systems are ugly. Data policies are inconsistent. Domain logic is buried in tribal knowledge. The “last mile” is where all of that gets translated into something deployable.
Omdia’s mention of Inspur Cloud and its heavy-asset AI infrastructure strategy fits this theme. The point is not just that infrastructure exists. It’s that someone has to assemble the industrial line around a real use case and keep it operational over time.
How to apply it:
- Budget for integration, governance, and workflow mapping from day one.
- Identify the domain owner, not just the technical owner.
- Build scenario-specific agents instead of generic “AI assistants” that nobody trusts.
- Measure adoption and task completion, not just model accuracy.
If you ignore the last mile, you don’t get a platform. You get a prototype with a billing account.
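Here’s a minimal sketch of what measuring task completion alongside cost can look like. It assumes you can log one record per attempted task with a completion flag, an attributed cost, and an output token count. The `TaskRecord` fields and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TaskRecord:
    completed: bool   # did the user's task actually finish end to end?
    cost_usd: float   # assumed: all compute and API spend attributed to this attempt
    tokens_out: int   # tokens delivered to the user for this attempt


def summarize(records: Iterable[TaskRecord]) -> Dict[str, float]:
    """Roll attempts up into completion rate and two unit costs."""
    records = list(records)
    attempts = len(records)
    successes = sum(r.completed for r in records)
    cost = sum(r.cost_usd for r in records)
    tokens = sum(r.tokens_out for r in records)
    return {
        "completion_rate": successes / attempts if attempts else 0.0,
        # Failed attempts still cost money, so they are charged to the successes.
        "cost_per_successful_task": cost / successes if successes else float("inf"),
        "cost_per_1k_tokens": cost / tokens * 1000 if tokens else float("inf"),
    }


log = [
    TaskRecord(True, 0.10, 500),
    TaskRecord(False, 0.30, 1500),
    TaskRecord(True, 0.20, 2000),
]
for metric, value in summarize(log).items():
    print(f"{metric}: {value:.3f}")
```

Cost per 1,000 tokens can look fine while cost per successful task is ugly. The gap between those two numbers is roughly what the last mile is costing you.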
Sovereign data factories are turning regulation into architecture
Omdia says frameworks like the EU AI Act and DORA are pushing sensitive data to remain inside physically isolated facilities. That elevates regional operators like G42 from cabinet landlords to physical gatekeepers of national-level data.
That’s a big shift, and it’s not just about compliance paperwork. It changes who has power in the stack. If data cannot leave a jurisdiction, then local infrastructure stops being optional. It becomes strategic. The operator who controls the facility controls access, latency, policy enforcement, and often the commercial relationship too.
This is where a lot of old cloud assumptions break. We used to treat location as a deployment detail. Now it can be the whole business model. Regional and industrial operators are getting pulled into a role that mixes infrastructure, regulation, and trust. That is a messy combination, but it’s the one the market is moving toward.
How to apply it:
- Classify workloads by residency, sensitivity, and regulatory exposure.
- Design physical isolation into the infrastructure plan when required.
- Assume your compliance requirements may become architecture requirements.
- Work with regional operators early instead of after the legal review blows up your timeline.
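A first pass at classifying workloads can be a few lines of code that your legal and infrastructure teams argue over. The rules below are purely illustrative. They are my assumptions, not a reading of the EU AI Act or DORA, and names like `Workload` and `place` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Placement(Enum):
    ISOLATED = "physically isolated facility"
    IN_REGION = "shared facility in the required region"
    ANY_REGION = "any region"


@dataclass(frozen=True)
class Workload:
    name: str
    data_must_stay_in: FrozenSet[str]  # e.g. {"EU"}; empty means no residency rule
    sensitive: bool                    # holds regulated or personal data
    requires_isolation: bool           # legal review says isolation is mandatory


def place(w: Workload) -> Placement:
    """Map one workload to the strictest placement its attributes demand."""
    # Assumed rule: sensitive data with a residency constraint gets isolation.
    if w.requires_isolation or (w.sensitive and w.data_must_stay_in):
        return Placement.ISOLATED
    if w.data_must_stay_in:
        return Placement.IN_REGION
    return Placement.ANY_REGION


workloads = [
    Workload("public-faq-bot", frozenset(), False, False),
    Workload("eu-marketing-copy", frozenset({"EU"}), False, False),
    Workload("eu-claims-agent", frozenset({"EU"}), True, False),
]
for w in workloads:
    print(f"{w.name}: {place(w).value}")
```

The specific rules matter less than having them written down early. Once the placement logic is explicit, a change in legal interpretation becomes a one-line edit you can reason about before it reaches your architecture.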
This is also why Omdia expects 2026 and 2027 to be the critical window for AI Factory development. The companies that can combine local control, industrial operations, and real AI delivery are the ones with the clearest path forward.
The template you can copy
# AI Factory Operating Model Template
Use this when you need to turn AI infrastructure from a GPU purchase into a production system.
## 1) Define the output
- Primary output: tokens, completions, retrieval results, agent actions
- Business KPI tied to output: [fill in]
- User-facing latency target: [fill in]
- TTFT target: [fill in]
## 2) Map the four layers
### Layer 1: Energy and physical infrastructure
- Power envelope: [fill in]
- Cooling model: [fill in]
- Facility constraints: [fill in]
- Residency / geography constraints: [fill in]
### Layer 2: Hardware and network fabric
- GPU / accelerator type: [fill in]
- Network topology: [fill in]
- Storage and data movement bottlenecks: [fill in]
- Expected idle time sources: [fill in]
### Layer 3: Scheduling and virtualization orchestration
- Scheduler: [fill in]
- Placement rules: [fill in]
- Multi-tenant isolation model: [fill in]
- Queueing policy: [fill in]
### Layer 4: MaaS and application ecosystem
- Model service layer: [fill in]
- Retrieval layer: [fill in]
- Agent / app layer: [fill in]
- Domain owner: [fill in]
## 3) Measure what matters
Track these before buying more compute:
- TTFT
- End-to-end response latency
- Retrieval latency
- GPU utilization
- GPU idle time caused by I/O
- Cost per successful task
- Cost per 1,000 tokens delivered
- Task completion rate
## 4) Decide the delivery model
Choose one:
- Full-stack drop-in
- Software/hardware decoupling
- Compute-native AI cloud
- Private AI foundation stack
- Regional / sovereign operator
For each model, document:
- Data residency requirements
- Integration effort
- Vendor lock-in risk
- Time-to-production
- Compliance burden
## 5) Kill the zombie GPU problem
Before adding compute, ask:
- Is the GPU waiting on data?
- Is the model waiting on retrieval?
- Is orchestration causing queueing?
- Are we duplicating API calls?
- Can we reduce redundancy before scaling?
## 6) Production readiness checklist
- [ ] Data governance approved
- [ ] Network bottlenecks measured
- [ ] Power and cooling validated
- [ ] Residency requirements mapped
- [ ] Observability in place
- [ ] Domain owner assigned
- [ ] Incident response defined
- [ ] Cost model reviewed
## 7) One-line rule
If the infrastructure cannot produce tokens reliably, it is not an AI factory yet.
That’s the version I’d actually hand to a team. It forces the conversation away from “how many GPUs can we buy?” and toward “what are we producing, what blocks it, and who owns each layer?” That’s a much better way to spend money.
Original source: https://www.lightreading.com/ai-machine-learning/five-dynamics-redefining-ai-infrastructure-in-2026-omdia. I’ve broken down Omdia’s press-release framing and added my own operating advice, examples, and template. The structure and the five dynamics come from the source; the implementation guidance is mine.