Part II – The Economics of Intelligence: A Tale of Two Curves
By Bhanu Nallagonda, Cofounder, Ogha Technologies | March ‘26
The financial dynamics of AI in 2025 were characterized by a dramatic divergence between the unit cost of intelligence and the aggregate cost of deployment. Understanding this paradox is essential to grasping the market forces shaping the industry.
The Training vs. Inference Cost Curve
For years, the industry spotlight was fixed on the astronomical costs of training foundation models – billions of dollars spent on GPU clusters to create a GPT-4, Gemini or Claude, with the labs planning to spend trillions more over the coming years. However, 2025 marked the definitive shift where inference (the cost of running the model) became the dominant economic factor.
The Plummeting Cost of Inference
The unit cost of intelligence dropped precipitously. According to the Stanford AI Index 2025, the inference cost for a system performing at the level of GPT-3.5 dropped over 280-fold between late 2022 and late 2024. This deflation was driven by a confluence of factors:
- Algorithmic Optimization: Techniques like Sparse Activations (Mixture of Experts) allowed models to activate only a fraction of their parameters for any given token, drastically reducing the compute required per operation. Furthermore, distillation—teaching smaller models to mimic larger ones—allowed enterprise-grade performance on much lighter architectures.
- Hardware Efficiency: The deployment of specialized inference chips and improved GPU architectures (like Nvidia’s Rubin platform) reduced the energy-per-token cost by roughly 30-40% annually.
- Price Wars: Intense competition among model providers and aggregators like OpenRouter drove consumer-facing prices down to commodity levels. Developers could now shop for the cheapest intelligent token across dozens of providers.
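The price-shopping dynamic is easy to sketch. The provider names and per-million-token prices below are entirely hypothetical; real aggregator listings vary by model and change daily:

```python
# Hypothetical per-million-token prices (USD) for a comparable mid-tier
# model across three made-up providers.
PRICES = {
    "provider_a": {"input": 0.15, "output": 0.60},
    "provider_b": {"input": 0.10, "output": 0.40},
    "provider_c": {"input": 0.25, "output": 0.50},
}

def job_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at a provider's listed prices."""
    p = PRICES[provider]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def cheapest(input_tokens: int, output_tokens: int) -> tuple[str, float]:
    """Pick the provider with the lowest total cost for this request shape."""
    return min(
        ((name, job_cost(name, input_tokens, output_tokens)) for name in PRICES),
        key=lambda pair: pair[1],
    )

name, cost = cheapest(input_tokens=4_000, output_tokens=1_000)
print(name, cost)  # provider_b undercuts the others on both sides here
```

When every provider serves a near-interchangeable model, routing each request to whichever listing is cheapest at that moment is exactly the commodity behaviour described above.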
The “Inference is the New Margin Killer” Paradox
Despite the collapse in per-token prices, enterprise spending on inference exploded. Industry reports from 2025 indicate that 60–80% of an AI system’s total lifecycle cost is now incurred during inference, not training. This trend is driven by three primary mechanisms:
- Jevons Paradox: As intelligence became cheaper, demand for it spiked. Developers stopped rationing tokens and began building applications that consume them voraciously.
- Agentic Loops and Token Creep: The shift to agentic AI means that a single user request (e.g., “plan a travel itinerary” or “build this whole website or app”) might trigger hundreds or thousands of internal model calls. The agent might search the web, verify the results, draft an itinerary, critique the itinerary, refine and rewrite it—all before the user sees a single word. A simple task that once cost one unit of inference now costs hundreds. Furthermore, RAG (Retrieval-Augmented Generation) systems stuff vast amounts of corporate data into the model’s context window for every query, multiplying the token count per interaction.
- The “Autoscaling Tax”: The requirement for low-latency responses forces companies to keep GPUs “warm” and available 24/7. Unlike training, which is bursty and schedulable, inference demand is unpredictable, leading to utilization inefficiencies that can bloat costs.
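The token-creep arithmetic above can be made concrete. Every number below (the blended price, the call count, the context sizes) is an illustrative assumption, not a measurement:

```python
# Back-of-the-envelope token math: a single chat turn vs. an agentic
# "job" that fans out into many internal calls, each carrying RAG context.
PRICE_PER_M_TOKENS = 0.50  # hypothetical blended USD price per million tokens

def cost(tokens: int) -> float:
    """USD cost of processing a given number of tokens at the flat price."""
    return tokens * PRICE_PER_M_TOKENS / 1_000_000

# One chat turn: a short prompt plus a reply.
chat_turn = cost(500 + 800)

# One agentic job: say 200 internal calls (search, draft, critique,
# rewrite), each stuffing ~6,000 tokens of retrieved context into the
# window and emitting ~400 tokens.
calls = 200
agentic_job = cost(calls * (6_000 + 400))

print(f"chat turn:   ${chat_turn:.6f}")
print(f"agentic job: ${agentic_job:.4f}")
print(f"multiplier:  {agentic_job / chat_turn:.0f}x")
```

Even with these modest assumptions, the same user-visible request costs roughly three orders of magnitude more in the agentic paradigm — which is how per-token deflation and exploding total spend coexist.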
Token Usage and The 100 Trillion Milestone
The volume of data processed in 2025 reached staggering levels. OpenRouter, a leading model aggregator, reported processing over 100 trillion tokens by mid-2025, with daily volumes exceeding 1 trillion tokens. To put this in perspective, this daily volume rivals the entire monthly throughput of major providers from just two years prior.
This growth is not merely a function of more users, but of heavier users. The fastest-growing behaviour on these platforms is “agentic looping”, where models talk to models. This shift toward machine-to-machine communication suggests that in the future, the vast majority of AI text generation will never be read by a human—it will be read by other AIs as part of an intermediate processing step.
The Shift to Agentic AI — From Chat to Action
If 2025’s infrastructure was about size, its software trend was about being agentic. The industry consensus is that the era of the “chatbot”—a passive responder to human queries—is ending. It is being replaced by Agentic AI.
From “Prompt Engineering” to “Outcome Engineering”
The core shift in 2025 was from Generative AI (creating content) to Agentic AI (executing workflows). This transition, often termed “The Age of Autonomy,” involves models that can reason, plan and use tools to achieve high-level outcomes.
- Outcome Engineering: The skill of “prompt engineering” (crafting the perfect text string) began to fade. It was replaced by “outcome engineering”—defining the parameters of success and allowing the agent to figure out the “how”. Users stopped asking models to “write code for a login page” and started asking them to “build a login system that integrates with Auth0 and handles these specific edge cases.”
- The Unit of Work: In a chat paradigm, the unit of work is a “turn” of conversation. In an agentic paradigm, the unit of work is a “job”—booking a shipment, refactoring a codebase, or auditing a financial statement.
Swarm Intelligence and Frameworks
The “Single God Model”—one massive LLM doing everything—proved inefficient for complex tasks. 2025 saw the rise of Swarm Intelligence and multi-agent orchestration.
- Specialization: Frameworks like Microsoft AutoGen, LangGraph and CrewAI allowed developers to build teams of specialized agents. One agent might be the “Researcher”, another the “Writer” and a third the “Critic.”
- The “Critic” Loop: This collaborative approach was found to significantly reduce hallucinations. A “Critic” agent could catch errors made by the “Writer” before the human ever saw them, creating a self-correcting loop that mimicked human peer review.
- Adoption Reality: Despite the hype, true autonomy remains rare. While 62% of companies experimented with agents, only a fraction (estimated 15-20%) had autonomous agents in production by year-end. The industry was stuck in “Pilot Purgatory”, struggling to trust agents with unsupervised execution of critical tasks.
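The Writer/Critic pattern can be sketched without any framework at all. The functions below are hypothetical stand-ins (a real system would make model calls where these stubs return canned strings); frameworks like AutoGen, LangGraph and CrewAI wrap the same draft–critique–revise control flow in richer abstractions:

```python
# Minimal writer/critic loop with stub "agents" (plain functions), so the
# self-correcting control flow is visible without any framework machinery.
from typing import Callable, Optional

def writer(task: str, feedback: Optional[str]) -> str:
    """Stand-in for a drafting agent; a real one would call an LLM."""
    draft = f"Draft for: {task}"
    if feedback:
        draft += " (revised: added sources)"
    return draft

def critic(draft: str) -> Optional[str]:
    """Stand-in for a reviewing agent. Returns feedback, or None to approve."""
    if "sources" not in draft:
        return "Missing citations; add sources."
    return None

def run_job(task: str, writer_fn: Callable, critic_fn: Callable,
            max_rounds: int = 3) -> str:
    """Draft, critique, revise — repeat until the critic approves."""
    feedback = None
    draft = ""
    for _ in range(max_rounds):
        draft = writer_fn(task, feedback)
        feedback = critic_fn(draft)
        if feedback is None:
            return draft  # critic approved; hand to the human
    return draft  # still unapproved after max_rounds; escalate

result = run_job("summarize Q3 revenue", writer, critic)
print(result)
```

The `max_rounds` cap is the interesting design choice: it is the point where autonomy ends and escalation to a human begins, which is precisely where the “Pilot Purgatory” trust problem lives.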
The “Circular Economy” and the AI Bubble
A shadow looming over the 2025 AI landscape is the financial structure supporting this explosive growth. Analysts have identified a “circular funding loop” that resembles the vendor financing schemes of the dot-com era, raising concerns about a potential asset bubble.
The Anatomy of the Loop
The mechanism, as detailed in financial reports from late 2025, operates as follows:
- Chipmakers Invest: Companies like Nvidia invest billions of venture capital into AI startups and labs (e.g., OpenAI, CoreWeave, Mistral).
- Labs Buy Chips: These startups use the investment capital to purchase massive quantities of hardware (GPUs) and cloud services.
- Revenue Recognition: The chipmakers and cloud providers (Microsoft, Oracle, Nvidia) recognize these purchases as revenue, boosting their stock prices.
- Reinvestment: The boosted valuations allow for further investment and borrowing, continuing (or perpetuating!) the loop.
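The accounting effect of one pass through the loop can be traced with a toy ledger. The $10B figure and the two-party structure are purely illustrative; real deals are far messier:

```python
# Toy ledger for one pass of the circular funding loop (figures in $B,
# entirely hypothetical). The point: the chipmaker's own investment
# returns to it as recognized revenue.
chip_maker = {"cash": 0.0, "revenue": 0.0}
ai_lab = {"cash": 0.0}

# Step 1: chipmaker invests venture capital into the lab.
investment = 10.0
chip_maker["cash"] -= investment
ai_lab["cash"] += investment

# Steps 2-3: the lab spends the capital on the chipmaker's GPUs,
# and the chipmaker books the purchase as revenue.
purchase = 10.0
ai_lab["cash"] -= purchase
chip_maker["cash"] += purchase
chip_maker["revenue"] += purchase

# Net effect: the chipmaker's cash is flat, yet it reports $10B of
# revenue without a single end-user dollar entering the loop.
print(chip_maker)  # {'cash': 0.0, 'revenue': 10.0}
```

This is why critics call the revenue “artificial”: the ledger balances, the revenue line grows, and no organic demand was required to produce either.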
Critics argue that this creates “artificial” revenue. For instance, Nvidia’s investment in OpenAI is effectively Nvidia bankrolling its own future sales. This creates a “merry-go-round” of capital that inflates revenue figures without necessarily reflecting genuine, organic end-user demand.
Market Jitters
By late 2025, this fragility began to manifest in the markets. While the S&P 500 remained strong, Nvidia’s stock experienced volatility as investors questioned the sustainability of the $200-300 billion annual rise in AI capital expenditures. The disparity between infrastructure spend (i.e. capex) and actual AI revenue (approx. $10-15 billion in pure API spend) remains the primary risk factor for 2026. If the startups fail to generate enough real-world revenue from end-users (read: SaaS income, not a SaaSpocalypse; more on that later) to cover these hardware costs, the cycle could unravel.

As a proof point, financial disclosures and internal documents leaked in February 2026 indicate that OpenAI has revised its long-term revenue guidance upward by 27% while simultaneously reporting a contraction in profit margins due to surging operational costs. Against an earlier investor target of 40%, margin shrank to 33% in the latest 2025 report, down from 40% in 2024, despite the revenue growth. While this could be due to last-minute purchases of premium compute to meet above-forecast token usage, the math of growing usage and falling inference prices is not exactly helping margins.

The majority of revenue still comes from APIs. That revenue is growing at a very fast clip year over year, yet it remains small relative to the investments being made.
Interestingly, Apple seems to be swimming against the tide (or are they languishing?), with drastically lower capex than the AI-infrastructure super-spenders, though it did increase its capex year over year by about 35%. Apple’s share-price correlation to the Nasdaq-100 has hit its lowest in 20 years, per Bloomberg. Apple has licensed a massive 1.2-trillion-parameter model from Google for about $1 billion per annum, while hosting it on its own Private Cloud Compute infrastructure.
At this layer, the risk takers are capable, deep-pocketed, well funded and, to a large extent, aware. So though the investments are heavy and disproportionate, any failures will ruthlessly affect those who are not strong enough to weather them, or who unscrupulously bet their last penny. Would any of those failures be catastrophic?
‘Part III – The Infrastructure Arms Race’ follows…







