Economic Utilization Resources — AI Infrastructure Revenue Benchmarks

The evidence behind economic utilization.
Public benchmarks, third-party revenue estimates, and methodology references behind the Economic Utilization Diagnostic.
Market Brief
The $660B CapEx Problem — and the only lever left
A visual comparison of GPU utilization, highlighting the significant improvement with optimization.
Read the Market Brief →
Executive Brief
One-Page Executive Brief
80% Cost Reduction
5-7x Throughput Increase
3 Key Outcomes
Download the Executive Brief →
Benchmark Data
Performance Benchmark Sheet
View benchmarks →
Methodology
How the Diagnostic Works
The EarthServe Economic Utilization Diagnostic is built on a distinction that most AI infrastructure reporting obscures: the difference between operational utilization and economic utilization. Operational utilization measures whether GPUs are busy. Economic utilization measures the revenue actually generated against the revenue theoretically possible from the same installed capacity. A GPU fleet can run at high compute occupancy and still monetize a fraction of its potential — because throughput, pricing mix, and actual inference demand all interact to determine what a cluster is truly worth commercially. The diagnostic makes that gap legible as a dollar figure.
The calculation begins with two inputs that define the revenue ceiling of any given infrastructure footprint: LLM inference throughput, expressed in tokens per second per node, and a blended price per token derived from the operator's actual input-to-output token mix and the per-model pricing schedule. Multiplying tokens per second per node by the number of nodes (itself derived from total GPU count or total installed power in gigawatts), then by the 31,536,000 seconds in a year, and finally by the blended price per token yields the revenue capacity at 100 percent economic utilization. This is not a projection. It is an arithmetic expression of what the infrastructure would earn if every token of throughput were sold at the operator's own stated price.
From there, the tool infers implied current economic utilization by dividing the operator's observed annual revenue by that theoretical ceiling. That implied utilization figure is computed entirely from the user's own inputs and is never assumed or hardcoded. It is the core diagnostic insight, a mirror held up to real production data. The primary commercial output is additional annual revenue capacity unlocked: the difference between revenue at a user-specified target utilization and current observed revenue. Expressed this way, AI infrastructure economic utilization gives CFOs and AI revenue leaders a single, actionable number rather than a cluster of operational metrics that do not translate to the income statement.
Secondary outputs, equivalent gigawatts unlocked and future capex avoidance, quantify the strategic value of improved GPU fleet monetization by expressing uncaptured revenue capacity in infrastructure terms, using the operator's own capital expenditure per gigawatt as the conversion factor. AI infrastructure ROI framed through capex avoidance is particularly relevant to infrastructure strategy teams evaluating whether new build is warranted before existing capacity is fully monetized.
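The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual implementation: the names (`FleetInputs`, `revenue_ceiling`, and so on) are hypothetical, every value in `example` is a placeholder, and the conversion to equivalent gigawatts assumes new capacity would earn the same revenue per gigawatt as the fleet earns today, which the text does not specify.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 31_536_000  # 365 days, the figure used in the methodology


@dataclass
class FleetInputs:
    nodes: int                      # number of inference nodes
    tokens_per_sec_per_node: float  # LLM inference throughput per node
    price_per_token: float          # blended price, dollars per token
    observed_annual_revenue: float  # dollars per year
    fleet_gw: float                 # installed power in gigawatts
    capex_per_gw: float             # operator's own capex per gigawatt, dollars


def revenue_ceiling(f):
    """Annual revenue if every token of throughput were sold (100% utilization)."""
    return f.tokens_per_sec_per_node * f.nodes * SECONDS_PER_YEAR * f.price_per_token


def implied_utilization(f):
    """Observed revenue divided by the theoretical ceiling."""
    return f.observed_annual_revenue / revenue_ceiling(f)


def additional_revenue(f, target_utilization):
    """Revenue at the target utilization minus current observed revenue."""
    return target_utilization * revenue_ceiling(f) - f.observed_annual_revenue


def equivalent_gw_unlocked(f, target_utilization):
    # ASSUMPTION: new capacity would earn today's observed revenue per GW.
    revenue_per_gw_today = f.observed_annual_revenue / f.fleet_gw
    return additional_revenue(f, target_utilization) / revenue_per_gw_today


def capex_avoided(f, target_utilization):
    """Equivalent gigawatts converted to dollars at the operator's capex per GW."""
    return equivalent_gw_unlocked(f, target_utilization) * f.capex_per_gw


# Placeholder inputs for illustration only; none of these describe a real fleet.
example = FleetInputs(
    nodes=1_000,
    tokens_per_sec_per_node=21_000,
    price_per_token=0.50 / 1_000_000,  # hypothetical $0.50 per million tokens
    observed_annual_revenue=66_225_600,
    fleet_gw=0.01,                     # hypothetical: roughly 10 kW per node
    capex_per_gw=40e9,                 # hypothetical
)

print(f"ceiling:             ${revenue_ceiling(example):,.0f}")
print(f"implied utilization: {implied_utilization(example):.1%}")
print(f"additional revenue:  ${additional_revenue(example, 0.80):,.0f}")
print(f"equivalent GW:       {equivalent_gw_unlocked(example, 0.80):.3f}")
print(f"capex avoided:       ${capex_avoided(example, 0.80):,.0f}")
```

Keeping each output as its own function mirrors the way the methodology layers them: the ceiling feeds implied utilization, which feeds the revenue gap, which feeds the infrastructure-equivalent figures.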
The tool ships with model presets — Llama 70B at 21,000 tokens per second per node across eight GPUs, and a GPT-4o class proxy at 18,000 tokens per second per node — along with published pricing defaults, to allow immediate benchmarking without requiring production telemetry on first use. GPU revenue per watt estimates and AI revenue capacity ceilings produced by those defaults are illustrative starting points, not assertions about any operator's environment. Every field is editable. Users who substitute real production data — actual tokens per second, observed blended pricing, measured node counts, and audited revenue — will obtain an economic utilization diagnostic calibrated precisely to their infrastructure.
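A minimal sketch of how editable presets and a blended price might be represented follows. The throughput figures are the ones quoted above. The prices and token mix are placeholders rather than any provider's published rates, the `ModelPreset` class is hypothetical, and the share-weighted average is only one plausible way to blend input and output prices.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelPreset:
    name: str
    tokens_per_sec_per_node: float
    input_price_per_m: float   # dollars per million input tokens
    output_price_per_m: float  # dollars per million output tokens
    input_share: float         # fraction of billable tokens that are input tokens

    def blended_price_per_token(self):
        """Share-weighted average of input and output prices, in dollars per token."""
        per_million = (self.input_share * self.input_price_per_m
                       + (1.0 - self.input_share) * self.output_price_per_m)
        return per_million / 1_000_000


# Throughput comes from the text; prices and token mix are PLACEHOLDERS.
LLAMA_70B = ModelPreset("Llama 70B", 21_000, 0.50, 1.00, 0.75)
GPT_4O_CLASS = ModelPreset("GPT-4o class proxy", 18_000, 2.00, 8.00, 0.75)

# Every field is editable: swap in measured production values (hypothetical here).
measured = replace(LLAMA_70B, tokens_per_sec_per_node=16_500, input_share=0.60)

print(LLAMA_70B.blended_price_per_token())
print(measured.blended_price_per_token())
```

Using `dataclasses.replace` leaves the shipped default untouched while producing an overridden copy, so benchmark-based and measured results can be compared side by side.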
Frequently Asked Questions
For CFO and analyst-level readers.
1
Q: What is economic utilization and how is it different from operational utilization?
Operational utilization measures whether a GPU is active — processing requests, moving data, or executing compute. It is what infrastructure dashboards and vendor telemetry report, and it can read close to 100% even on a financially underperforming fleet. Economic utilization measures something different: the share of installed AI capacity that is actually converting into billable token output relative to what that capacity could theoretically produce at peak throughput. A fleet can be operationally busy and economically idle at the same time — because GPUs assigned to fragmented dedicated pools, undersized batches, or low-demand workload queues are active but not producing proportional revenue.
2
Q: Where does the $5–6B per GW revenue figure come from?
The $5–6B per GW estimate originates from Bank of America research on AI infrastructure revenue productivity, widely referenced in 2025 discussions of hyperscale and large-fleet AI deployments. It represents an observed relationship between installed AI compute capacity and the annual revenue that capacity generates at current average fleet economics. Unlock AI for Earth uses this figure as a real-world anchor for economic utilization calculations — not as a proprietary claim, but as a publicly available benchmark that any analyst can independently reference.
3
Q: How is the $25–35B per GW theoretical revenue figure calculated?
The theoretical revenue capacity is derived from three public inputs: peak billable tokens per second per node at target latency (sourced from MLPerf-style server benchmarks for production-grade 70B-class models on 8×H100 nodes), blended token pricing from major model providers including AWS Bedrock, GCP Vertex AI, and OpenAI, and seconds of productive capacity per year assuming high economic utilization. Multiplying peak tokens per second by annualized seconds by blended price per token yields a theoretical revenue ceiling for a given installed capacity. The $25–35B range reflects variation across model tiers and pricing assumptions. All inputs are drawn from publicly available benchmark and pricing data.
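Dividing the observed $5–6B per GW figure from the previous question by this theoretical range gives the implied economic utilization that the two benchmarks suggest together. The short calculation below uses only those two published ranges; pairing the extremes to get a lower and upper bound is a simplifying assumption.

```python
observed_per_gw = (5e9, 6e9)        # $5-6B per GW, observed
theoretical_per_gw = (25e9, 35e9)   # $25-35B per GW, theoretical ceiling

# Lowest case: smallest observed revenue against the largest ceiling.
lowest = observed_per_gw[0] / theoretical_per_gw[1]
# Highest case: largest observed revenue against the smallest ceiling.
highest = observed_per_gw[1] / theoretical_per_gw[0]

print(f"{lowest:.1%} to {highest:.1%}")  # prints "14.3% to 24.0%"
```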
4
Q: Does a higher operational utilization number from our dashboard mean our economic utilization is also high?
No, and this is the most important distinction for finance teams evaluating AI infrastructure productivity. Operational utilization and economic utilization are independent measurements. A dedicated pool architecture can show 90–100% operational utilization on individual pools while fleet-level economic utilization remains in the low-20% range. This happens because dedicated pools are sized for peak demand within a single workload profile. When demand is unevenly distributed across low-latency, batch, agentic, and long-context pools, which it almost always is, some pools run hot while others sit cold. The average productive output across the whole fleet stays low even though no individual pool appears idle.
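A small worked example makes the gap concrete. All figures below are hypothetical and chosen only to illustrate the arithmetic: each pool reports high operational utilization, yet billable throughput summed across the fleet is a small share of peak capacity.

```python
# Hypothetical pools: operational utilization from a dashboard, peak billable
# tokens/s the pool could serve, and billable tokens/s actually sold.
pools = {
    "low-latency":  {"operational": 0.95, "peak_tps": 400_000, "billable_tps": 140_000},
    "batch":        {"operational": 0.92, "peak_tps": 300_000, "billable_tps": 60_000},
    "agentic":      {"operational": 0.90, "peak_tps": 200_000, "billable_tps": 20_000},
    "long-context": {"operational": 0.97, "peak_tps": 100_000, "billable_tps": 10_000},
}

# Dashboard view: simple average of per-pool operational utilization.
mean_operational = sum(p["operational"] for p in pools.values()) / len(pools)

# Economic view: billable output as a share of what the whole fleet could produce.
total_peak = sum(p["peak_tps"] for p in pools.values())
total_billable = sum(p["billable_tps"] for p in pools.values())
fleet_economic_utilization = total_billable / total_peak

print(f"mean operational utilization: {mean_operational:.1%}")
print(f"fleet economic utilization:   {fleet_economic_utilization:.1%}")
```

The economic figure is weighted by capacity rather than averaged per pool, so a large pool with little billable output pulls the fleet number down regardless of how busy its GPUs appear.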
5
Q: What inputs does the Economic Utilization Diagnostic require and where do they come from?
The diagnostic requires three primary inputs: total GPU count or fleet size in GW, model class (which auto-populates throughput and pricing assumptions), and annual AI revenue. Fleet size is typically available from infrastructure procurement or finance records. Model class is chosen from three options: the Llama 70B class preset, the GPT-4o class preset, or a custom entry. For the two presets, throughput and pricing are auto-filled from public benchmarks and published provider pricing. Annual AI revenue can be entered at whatever level of precision is available, from a rough order of magnitude to an audited figure. All throughput and pricing defaults are labeled as benchmark-based estimates and can be overridden with measured production data.
6
Q: How does EarthServe improve economic utilization and what is a realistic target range?
EarthServe replaces dedicated pool architecture with a unified inference fabric that disaggregates prefill and decode and dynamically routes requests to available capacity across the full fleet in real time. Because all workload profiles (low-latency, batch, agentic, and long-context) share the same physical GPU pool, scheduled against SLOs rather than fixed allocations, idle fragmentation across pools is eliminated. Billable token throughput rises because every GPU is continuously assigned to the highest-value request it can serve at any given moment. In production configurations, economic utilization targets of 70–90% at the fleet level are achievable on the same installed hardware, compared with the low-20% range typical of dedicated pool architectures. The Economic Utilization Diagnostic models this uplift directly from each customer's fleet inputs.
Ready to see your numbers?
Book a 30-minute session with our team. We'll model your specific GPU fleet, load profile, and revenue assumptions — and show you exactly what's recoverable.
Schedule a 20-min Fleet Diagnostic
No sales pitch. Just your numbers.