The State of AI

Part III – The Infrastructure Arms Race: Giga-Scale Economics

By Bhanu Nallagonda, Cofounder, Ogha Technologies

March ‘26

If the economics of AI are abstract, the physical manifestation of the industry in 2025 is concrete and overwhelmingly massive. The race to AGI has morphed into a race for gigawatts. The era of the megawatt data centre ended in 2024; 2025 clearly opened the era of the Gigawatt Campus.

The Giga-Projects: Redrawing the Map

Tech giants are no longer building data centres; they are building cities of compute.

Project Stargate (Microsoft/OpenAI)

The most ambitious of these projects is “Stargate,” a $500 billion infrastructure initiative. By late 2025, OpenAI and Microsoft, in partnership with Oracle and SoftBank, were developing five new sites across the US (Texas, New Mexico, Ohio, and the Midwest).

  • Scale: The project targets 10 gigawatts of capacity. For context, a typical nuclear reactor produces about 1 gigawatt (alright, we are going to have SMRs, Small Modular Reactors as well). Stargate is essentially building an energy infrastructure equivalent to ten nuclear power plants solely for AI.
  • Strategic Importance: This project is designed to house millions of next-generation GPUs, including the forthcoming Nvidia Rubin architecture. It represents a bet that compute power is the ultimate commodity of the 21st century.

Subsequently, the project underwent changes, and OpenAI reportedly scrapped its plans to build and own its own dedicated data centres. The flagship site in Texas is partially operational, and the others are in various stages of development.

Meta’s Prometheus and Hyperion

Not to be outdone, Meta announced its “Prometheus” supercluster in Ohio (1 GW) and the “Hyperion” cluster in Louisiana, designed to scale to 5 GW! Mark Zuckerberg described these facilities as having the “footprint of Manhattan”, a physical testament to the company’s pivot to “Superintelligence Labs”. Unlike Microsoft’s cloud-focused approach, Meta’s infrastructure is largely dedicated to training its open-source Llama models and powering its consumer AI products. While OpenAI and Microsoft pivoted to renting, Meta is still doubling down on owning.

Amazon and Google

Amazon Web Services (AWS) committed to a 2.4 GW expansion in Indiana alone, part of a $15 billion investment plan for the region. Google, meanwhile, aggressively acquired power infrastructure, buying Intersect Power for $4.75 billion to secure clean energy for its AI loads.

The Energy Bottleneck and the Nuclear Pivot

The sheer energy density of these projects has collided with the realities of the power grid. In 2025, data centres are projected to consume significant percentages of national electricity output in countries like Ireland and regions like Northern Virginia. This has forced a radical diversification of energy sourcing.

The Nuclear Renaissance: To bypass grid congestion, tech giants turned to nuclear power. Microsoft inked a historic deal to restart Three Mile Island Unit 1, effectively buying the plant’s entire output for decades. Simultaneously, OpenAI-backed Oklo pushed forward with plans for Small Modular Reactors (SMRs), aiming for deployment by 2027-2028. Oklo received its initial regulatory approvals this month, and the Department of Energy (DOE) has set a target for Oklo to achieve criticality (a self-sustaining nuclear reaction) at its pilot reactor by July 4, 2026. Oklo’s stock price, meanwhile, has corrected sharply.

Gas as a Bridge: With nuclear projects taking years to spool up, the immediate demand is being met by natural gas. “Behind-the-meter” gas plants—turbines installed directly at the data centre site—became the standard for rapid deployment. xAI pioneered this approach with its “Colossus” cluster, using rented gas turbines to bring 100,000 GPUs online in months rather than years.

The Infrastructure Giga Projects of 2025

Gigawatt Data Centre Economics

Here is a quick back-of-the-envelope calculation for a 1 GW AI data centre:

Building a GW facility requires approximately $50B of upfront capital, with estimates varying from $35 to $60 billion. With Nvidia’s latest Vera Rubin (GB300) clusters, it can go up to $65B as well!

Out of this capex:

  • Compute (GPUs) takes the lion’s share, 45% or more (48% for Vera Rubin).
  • Construction and cooling take about 25%, for specialized liquid cooling; lower for air cooling.
  • Electrical and power infrastructure costs about 20-23%.
  • High-bandwidth networking requires about 7-10%, at the higher end for million-plus GPU clusters.

Monthly opex could touch a billion dollars, or about $12B per annum, including depreciation on a five-year replacement cycle. Incidentally, power transmission fees have risen globally in 2026. Obsolescence rates are very high, causing nervousness, though there are instances where facilities can plod on. Obsolescence accelerates whenever performance improves: the performance-per-watt of a 2026 Vera Rubin chip is nearly 50x that of a 2023 Hopper chip, so facilities “plodding on” with older gear become too expensive to run relative to the tokens they produce!

With an estimated throughput of 1.6 quadrillion tokens per annum at 70% utilization, the production cost works out to about $7 per million tokens.
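The arithmetic above can be sketched quickly. All figures here are the rough estimates from this section, not vendor data:

```python
# Back-of-envelope economics for a 1 GW AI data centre.
# All figures are rough estimates from this section, not vendor data.

capex_total = 50e9            # ~$50B upfront (range cited: $35-60B)
capex_split = {               # approximate shares of the capex
    "compute_gpus": 0.45,
    "construction_cooling": 0.25,
    "electrical_infra": 0.22,
    "networking": 0.08,
}
annual_opex = 12e9            # ~$12B/yr, incl. depreciation on a 5-yr cycle
tokens_per_year = 1.6e15      # ~1.6 quadrillion tokens at 70% utilization

cost_per_m_tokens = annual_opex / tokens_per_year * 1e6
print(f"~${cost_per_m_tokens:.2f} per million tokens")
# -> ~$7.50, consistent with the rough "$7" figure above
```

Every input here moves the answer: halve the opex through efficiency gains and the cost per million tokens halves with it.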

However, many variables come into the picture: input costs can be brought down through various optimizations, more efficient compute, better models, extending hardware life (older TPUs are still not shut down, given the high demand for compute) and so on. The economics may not work for commodity consumer chat, but they can for high-value reasoning models and enterprise use cases.

Specialized hardwired chips/ASICs can bring this cost down to below a dollar. The companies that own the full stack – processors, models, data centres, software ecosystem, applications – are going to retain much of the profit across all the layers, with the lion’s share at the heart: the chips.

On February 1, 2026, in the Budget speech, Finance Minister Nirmala Sitharaman introduced India’s most aggressive digital infrastructure incentive: a tax holiday until 2047 for any foreign company providing cloud services (SaaS, IaaS, PaaS) to global customers from data centres located in India, with that global revenue exempt from Indian income tax until 2047. To qualify, the foreign entity must serve its global clients from Indian soil, and the holiday is available only to those using “MeitY-notified” data centres. However, any revenue earned from Indian customers must be routed through a local Indian reseller entity, which remains subject to standard Indian corporate tax. This move has fundamentally changed the math for hyperscalers (such as Amazon, Google, Microsoft, Meta) and colocation providers (Equinix, Digital Realty, Yotta etc.). Before the budget there were about $70B worth of projects in the pipeline; after it, about $90B were recorded. The tax holiday makes India more attractive than traditional hubs such as Singapore, which faces land and power constraints, or Dublin.

Crowding-out Effect on Funding

There has been a significant “Great Reallocation” of capital, and the definition of a “venture-scale” startup has shifted. The massive capex for giga-scale projects has created a polarized investment landscape, with AI startups capturing 35% of all global venture capital in the most recent funding cycle. Investors have largely stopped funding “AI wrappers” (simple apps built on top of LLMs). Instead, the money is flowing into infrastructure (power, chips, data centres) and vertical AI (highly specialized models for law, medicine or engineering). High-interest borrowing by the giants has also tightened the overall credit market, making it harder for small, non-AI startups to get cheap loans.

However, AI is also accelerating scientific discovery. A few examples:

In mid-March 2026, the collaboration between Google DeepMind, NVIDIA and EMBL-EBI achieved a milestone that has been called “the completion of the biological periodic table”. While AlphaFold 2 predicted the shape of a protein, AlphaFold 3 (running on NVIDIA’s latest Blackwell clusters) predicts the interaction. It can model how a protein binds to DNA, RNA and ligands (small molecules). For the first time, researchers can see the “lock and key” mechanism of viral entry into human cells in 3D before doing a single wet-lab experiment. This is the Human Interactome—a map of every conversation occurring between molecules in our bodies.

The massive investments in compute are also fundamentally changing the Probability of Technical Success (PTS) in medicine. Traditionally, finding a ‘lead compound’ i.e. a potential drug candidate took 3–5 years. In early 2026, AI-native bio-techs like Isomorphic Labs and Insilico Medicine have started consistently hitting this milestone in 13–18 months. Multiple AI-designed drugs for Idiopathic Pulmonary Fibrosis (IPF) and Solid Tumors are in Phase III trials right now. If these succeed by late 2026, it will prove that AI doesn’t just find more drugs, but better drugs that are less likely to fail in humans.

AI is now used to “twin” patients (digital twin!) – creating digital models of how a specific person might react to a drug based on their genome, which is helping to optimize clinical trial enrolment and reduce the “90% failure rate” that has plagued pharma for decades.

The University of New Hampshire (UNH) breakthrough, published in Nature Communications in February 2026, represents a fundamental shift in how we “mine” human knowledge to solve physical engineering problems. Researchers built the Northeast Materials Database (NEMAD), a repository of 67,573 magnetic compounds, and from it identified 25 entirely new high-temperature magnets.

Traditional magnets lose their “pull” as they get hot. For an EV motor or wind turbine, a material must stay magnetic at temperatures exceeding 150°C to 200°C. The 25 new compounds identified by the UNH AI were specifically selected because they maintain their magnetic properties at these extreme thresholds. Currently neodymium is used; extracting rare earths is infamously toxic, and they carry geopolitical significance.

While the specific chemical formulas of the 25 new compounds are currently being shielded for patent and further testing, early data suggests they utilize more abundant elements like Iron, Cobalt, and Manganese in unique crystal geometries that mimic the strength of rare earths without the actual rare-earth atoms. Traditionally, discovering these 25 materials would have required 50 years of trial-and-error lab work. The AI did it in weeks by “reading” decades of unstructured scientific papers and predicting which combinations humans had missed.

While the AI discovery is a massive leap, the materials must now survive the “Valley of Death” between a database and a factory. Funding has specialized accordingly: generic software startups are struggling to find capital, while deep-tech startups that combine AI with biology, materials, robotics or other real-world areas are receiving the largest Series A and B rounds in history.

Next Part IV: Vibe Coding and the Paradox of Democratization

The State of AI

Part II – The Economics of Intelligence: A Tale of Two Curves

By Bhanu Nallagonda, Cofounder, Ogha Technologies

March ‘26

The financial dynamics of AI in 2025 were characterized by a dramatic divergence between the unit cost of intelligence and the aggregate cost of deployment. Understanding this paradox is essential to grasping the market forces shaping the industry.

The Training vs. Inference Cost Curve

For years, the industry spotlight was fixed on the astronomical costs of training foundation models – billions of dollars spent on GPU clusters to create a GPT-4, Gemini or Claude, with the labs’ plans extending to trillions of dollars over the coming years. However, 2025 marked the definitive shift where inference (the cost of running the model) became the dominant economic factor.

The Plummeting Cost of Inference

The unit cost of intelligence dropped precipitously. According to the Stanford AI Index 2025, the inference cost for a system performing at the level of GPT-3.5 dropped over 280-fold between late 2022 and late 2024. This deflation was driven by a confluence of factors:

  • Algorithmic Optimization: Techniques like Sparse Activations (Mixture of Experts) allowed models to activate only a fraction of their parameters for any given token, drastically reducing the compute required per operation. Furthermore, distillation—teaching smaller models to mimic larger ones—allowed enterprise-grade performance on much lighter architectures.
  • Hardware Efficiency: The deployment of specialized inference chips and improved GPU architectures (like Nvidia’s Rubin platform) reduced the energy-per-token cost by roughly 30-40% annually.
  • Price Wars: Intense competition among model providers and aggregators like OpenRouter drove consumer-facing prices down to commodity levels. Developers could now shop for the cheapest intelligent token across dozens of providers.
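The scale of this deflation is easy to quantify. A minimal sketch using the Stanford AI Index figure cited above, assuming the 280-fold drop is spread evenly over roughly two years:

```python
# Implied annual deflation from the Stanford AI Index figure:
# GPT-3.5-level inference cost fell ~280-fold between late 2022 and late 2024.
fold_drop = 280
years = 2.0

annual_fold = fold_drop ** (1 / years)          # ~16.7x cheaper each year
annual_decline = (1 - 1 / annual_fold) * 100    # ~94% annual price decline

print(f"~{annual_fold:.1f}x cheaper per year (~{annual_decline:.0f}% decline)")
```

In other words, a token that cost a dollar in late 2022 cost well under a cent two years later.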

The “Inference is the New Margin Killer” Paradox

Despite the collapse in per-token prices, enterprise spending on inference exploded. Industry reports from 2025 indicate that 60–80% of an AI system’s total lifecycle cost is now incurred during inference, not training. This trend is driven by three primary mechanisms:

  1. Jevons Paradox: As intelligence became cheaper, demand for it spiked. Developers stopped rationing tokens and began building applications that consume them voraciously.
  2. Agentic Loops and Token Creep: The shift to agentic AI means that a single user request (e.g., “plan a travel itinerary” or “build this whole website or app”) might trigger hundreds or thousands of internal model calls. The agent might search the web, verify the results, draft an itinerary, critique the itinerary, refine and rewrite it—all before the user sees a single word. A simple task that once cost one unit of inference now costs hundreds. Furthermore, RAG (Retrieval-Augmented Generation) systems stuff vast amounts of corporate data into the model’s context window for every query, multiplying the token count per interaction.

  3. The “Autoscaling Tax”: The requirement for low-latency responses forces companies to keep GPUs “warm” and available 24/7. Unlike training, which is bursty and schedulable, inference demand is unpredictable, leading to utilization inefficiencies that can bloat costs.
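The token-creep mechanism can be made concrete with a toy cost estimator. The call counts and token sizes below are illustrative assumptions, with an assumed price of $7 per million tokens:

```python
# Toy estimate of token creep: one chat turn vs. one agentic "job".
# Call counts and token sizes below are illustrative assumptions.

PRICE_PER_M_TOKENS = 7.0      # assumed $/1M tokens

def job_cost(model_calls, tokens_per_call, rag_context_tokens=0):
    """Cost of one user request that fans out into internal model calls."""
    total_tokens = model_calls * (tokens_per_call + rag_context_tokens)
    return total_tokens / 1e6 * PRICE_PER_M_TOKENS

chat_turn = job_cost(model_calls=1, tokens_per_call=2_000)
agentic_job = job_cost(model_calls=200, tokens_per_call=2_000,
                       rag_context_tokens=2_000)  # search/verify/critique loops

print(f"chat: ${chat_turn:.3f}  agentic: ${agentic_job:.2f}  "
      f"multiple: {agentic_job / chat_turn:.0f}x")
# one request becomes hundreds of units of inference (here, 400x)
```

With these assumptions, a $0.014 chat turn becomes a $5.60 job: the per-token price fell, but the tokens per outcome exploded.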

Token Usage and The 100 Trillion Milestone

The volume of data processing in 2025 reached staggering levels. OpenRouter, a leading model aggregator, reported processing over 100 trillion tokens by mid-2025, with daily volumes exceeding 1 trillion tokens. To put this in perspective, this daily volume rivals the entire monthly throughput of major providers from just two years prior.

This growth is not merely a function of more users, but of heavier users. The fastest-growing behaviour on these platforms is “agentic looping” where models talk to models. This shift toward machine-to-machine communication suggests that in the future, the vast majority of AI text generation will never be read by a human—it will be read by other AIs as part of an intermediate processing step.

The Shift to Agentic AI — From Chat to Action

If 2025’s infrastructure was about size, its software trend was about being agentic. The industry consensus is that the era of the “chatbot”—a passive responder to human queries—is ending. It is being replaced by Agentic AI.

From “Prompt Engineering” to “Outcome Engineering”

The core shift in 2025 was from Generative AI (creating content) to Agentic AI (executing workflows). This transition, often termed “The Age of Autonomy,” involves models that can reason, plan and use tools to achieve high-level outcomes.

  • Outcome Engineering: The skill of “prompt engineering” (crafting the perfect text string) began to fade. It was replaced by “outcome engineering”—defining the parameters of success and allowing the agent to figure out the “how”. Users stopped asking models to “write code for a login page” and started asking them to “build a login system that integrates with Auth0 and handles these specific edge cases.”

  • The Unit of Work: In a chat paradigm, the unit of work is a “turn” of conversation. In an agentic paradigm, the unit of work is a “job”—booking a shipment, refactoring a codebase, or auditing a financial statement.

Swarm Intelligence and Frameworks

The “Single God Model”—one massive LLM doing everything—proved inefficient for complex tasks. 2025 saw the rise of Swarm Intelligence and multi-agent orchestration.

  • Specialization: Frameworks like Microsoft AutoGen, LangGraph and CrewAI allowed developers to build teams of specialized agents. One agent might be the “Researcher”, another the “Writer” and a third the “Critic.”
  • The “Critic” Loop: This collaborative approach was found to significantly reduce hallucinations. A “Critic” agent could catch errors made by the “Writer” before the human ever saw them, creating a self-correcting loop that mimicked human peer review.
  • Adoption Reality: Despite the hype, true autonomy remains rare. While 62% of companies experimented with agents, only a fraction (estimated 15-20%) had autonomous agents in production by year-end. The industry was stuck in “Pilot Purgatory”, struggling to trust agents with unsupervised execution of critical tasks.
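The Writer/Critic pattern described above can be sketched in a few lines. Here `llm(role, content)` is a hypothetical callable standing in for any chat-completion API, and the toy stub merely demonstrates the control flow:

```python
# Minimal Writer/Critic loop. `llm(role, content)` is a hypothetical
# stand-in for any chat-completion API; the stub below fakes its replies.

def write_with_critic(task, llm, max_rounds=3):
    """Writer drafts; Critic reviews; revise until APPROVED or rounds run out."""
    draft = llm("writer", task)
    for _ in range(max_rounds):
        review = llm("critic", draft)
        if review.startswith("APPROVED"):
            break                                   # critic is satisfied
        draft = llm("writer", f"Revise to address: {review}\n\n{draft}")
    return draft

# Toy stub: the critic rejects the first draft, then approves the revision.
state = {"reviews": 0}
def toy_llm(role, content):
    if role == "critic":
        state["reviews"] += 1
        return "APPROVED" if state["reviews"] > 1 else "Fix the date format."
    return "Draft v2" if "Revise" in content else "Draft v1"

result = write_with_critic("Summarize Q3 results", toy_llm)
print(result)  # -> Draft v2, after one round of critique
```

Frameworks like AutoGen or CrewAI wrap the same loop in richer abstractions, but the self-correcting core is this simple.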

The “Circular Economy” and the AI Bubble

A shadow looming over the 2025 AI landscape is the financial structure supporting this explosive growth. Analysts have identified a “circular funding loop” that resembles the vendor financing schemes of the dot-com era, raising concerns about a potential asset bubble.

The Anatomy of the Loop

The mechanism, as detailed in financial reports from late 2025, operates as follows:

  1. Chipmakers Invest: Companies like Nvidia invest billions of venture capital into AI startups and labs (e.g., OpenAI, CoreWeave, Mistral).
  2. Labs Buy Chips: These startups use the investment capital to purchase massive quantities of hardware (GPUs) and cloud services.
  3. Revenue Recognition: The chipmakers and cloud providers (Microsoft, Oracle, Nvidia) recognize these purchases as revenue, boosting their stock prices.
  4. Reinvestment: The boosted valuations allow for further investment and borrowing, continuing (or perpetuating!) the loop.
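A toy simulation makes the mechanics of the loop visible. The seed amount, hardware share and reinvestment fraction are illustrative assumptions, not reported figures:

```python
# Toy model of the circular funding loop: the chipmaker invests in a lab,
# the lab spends most of it on chips, and that spend is booked as vendor
# revenue, part of which funds the next investment. Figures are illustrative.

def circular_loop(seed_investment, hardware_share=0.9,
                  reinvest_share=0.5, cycles=3):
    """Total vendor 'revenue' recognized from one seed investment."""
    recognized, investment = 0.0, seed_investment
    for _ in range(cycles):
        chip_purchases = investment * hardware_share   # lab buys GPUs/cloud
        recognized += chip_purchases                   # vendor books revenue
        investment = chip_purchases * reinvest_share   # loop continues
    return recognized

total = circular_loop(seed_investment=10e9)            # $10B of seed capital
print(f"~${total / 1e9:.1f}B of revenue recognized from $10B of seed capital")
```

Under these assumptions, a single $10B seed produces roughly $15B of recognized revenue without any end-user paying a cent, which is precisely the critics’ concern.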

Critics argue that this creates “artificial” revenue. For instance, Nvidia’s investment in OpenAI is effectively Nvidia bankrolling its own future sales. This creates a “merry-go-round” of capital that inflates revenue figures without necessarily reflecting genuine, organic end-user demand.

Market Jitters

By late 2025, this fragility began to manifest in the markets. While the S&P 500 remained strong, Nvidia’s stock experienced volatility as investors questioned the sustainability of the $200-300 billion annual rise in AI capital expenditures. The disparity between infrastructure spend, i.e. capex, and actual AI revenue (approx. $10-15 billion in pure API spend) remains the primary risk factor for 2026. If the startups fail to generate real-world revenue from end-users (read as SaaS income, not SaaSopacalypse, more on it later) sufficient to cover these hardware costs, the cycle could unravel. As a proof point, recent financial disclosures and internal documents leaked in February 2026 show that OpenAI has revised its long-term revenue guidance upward by 27% while simultaneously reporting a contraction in profit margins due to surging operational costs: against the earlier investor target of 40%, the margin shrank to 33% in the latest report for 2025, down from 40% in 2024, despite the revenue growth. While this could be due to last-minute purchases of premium compute to meet token usage above estimates, the math of growing usage and falling inference prices is not exactly helping the margins.

The majority of revenues still come from APIs. Revenues are growing at a very fast clip year over year, yet they remain small relative to the investments being made. OpenAI says it is on pace to generate $25 billion in revenue this year, versus Anthropic’s $19 billion, while their latest ARRs as of March ’26 are only at $5.2B and $1.8B respectively. So they need to grow many times over, as much as 5x to 10x, in 2026!
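A quick sanity check of the growth multiples implied by the figures above:

```python
# Implied growth multiples from the revenue figures cited above.
openai_target, openai_arr = 25e9, 5.2e9        # $25B pace vs $5.2B ARR
anthropic_target, anthropic_arr = 19e9, 1.8e9  # $19B pace vs $1.8B ARR

print(f"OpenAI: ~{openai_target / openai_arr:.1f}x, "
      f"Anthropic: ~{anthropic_target / anthropic_arr:.1f}x")
# roughly 4.8x and 10.6x, i.e. the "5x to 10x" growth needed in 2026
```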

Interestingly, Apple seems to be swimming against the tide (or is it languishing?), with a drastically lower capex spend compared to the AI-infra super-spenders, though it did increase its capex year over year by about 35%. Apple’s share-price correlation to the Nasdaq-100 has hit its lowest in 20 years, per Bloomberg. It has licensed a massive 1.2T-parameter model from Google at about $1B per annum, while hosting it in its own Private Cloud Compute infrastructure.

At this layer, the risk-takers are capable, deep-pocketed, well funded and, to a large extent, aware. So though the investments are heavy and disproportionate, any failures will ruthlessly affect those who are not strong enough to weather them, or who unscrupulously bet their last penny. Would any of those failures be catastrophic?

‘Part III – The Infrastructure Arms Race’ follows…

The State of AI

Part I – The Model Landscape: The Frontier of Reasoning

By Bhanu Nallagonda, Cofounder, Ogha Technologies

March ‘26

The relentless march of model performance continued in 2026, but the metric of success shifted irrevocably. In previous years, the “vibe” of a model, its fluency, creativity and ability to hold a conversation, was the primary differentiator. In 2025, the industry pivoted hard toward reasoning and utility. The question changed from “Can it write a poem?” to “Can it debug this repository, plan a logistics route and execute the API calls without hallucinating?”

The Titans: A Comparative Analysis

The release cycle of the AI Titans was dominated by the intensification of the rivalry between the primary research labs—Google DeepMind, OpenAI and Anthropic—along with the surging capabilities of open-weight contenders that have fundamentally altered the competitive landscape.

Google: The Gemini 3 Era

In 2025, Google successfully shed the perception of being a “fast follower” and reasserted its research dominance with the release of the Gemini 3 family. Unlike its predecessors, which were defined primarily by their native multimodal architecture, Gemini 3 was defined by “big leaps in reasoning” and efficiency.

The Gemini 3 Pro model demonstrated that improvements in agentic capability could be decoupled from massive parameter scaling. Instead, Google focused on architectural refinements that allowed for better decision-making in multi-step workflows. This “reasoning-first” approach allowed Gemini 3 to excel in scientific domains, boosting breakthroughs in genomics (in gene editing, disease interpretation and drug discovery etc.) and quantum computing (helping develop expert level empirical software). Furthermore, the introduction of Gemma 3 continued Google’s aggressive push into the open-model space, offering developers powerful local inference capabilities that rivalled the previous year’s frontier models, effectively commoditizing “GPT-4 class” intelligence for local devices.

OpenAI: The Bifurcation of Intelligence

OpenAI’s strategy in 2025 diverged into two distinct lineages, acknowledging that “creative fluency” and “logical reasoning” might require different architectures, while continuing its consumer focus.

  • If GPT-5.2 was the apex of fluid conversation, GPT-5.4 is the current undisputed master of autonomous execution. It is the first flagship to successfully unify the ‘thinking’ depth of a reasoning model with the ‘doing’ agility of a specialized agent. By scoring a record 83% on the GDPval benchmark across 44 professional occupations, it has effectively moved beyond being a ‘super-spellcheck’ to becoming a digital specialist in law, finance and engineering. While there were rumours of a 2M-token context window, it debuted with a 1M-token long context window, effectively eliminating the need for complex RAG (retrieval) workarounds in 90% of use cases. GPT-5.4 doesn’t just write code; it operates the computer. It can navigate a desktop environment, use a mouse and keyboard to interact with non-API legacy software, and perform multi-step workflows across different applications with a 75% success rate on OSWorld-Verified—surpassing the measured human baseline. It can extend the lifetime of legacy software, while competing for the ‘seat’ of a junior investment analyst or a staff engineer. To elaborate: Google’s Gemini score was based on earlier terminal-based benchmarks, humans score 72.4 on this (for whatever reasons) and Claude 4.6 Sonnet scores 72.5, a shade above the humans!
  • The o-Series (o1, o3, o3-mini): The real paradigm shift, however, was operationalized by the o-series. These models introduced the concept of “test-time compute” or “thinking” phases. When asked a complex math or coding problem, o3 does not simply predict the next token. It generates hidden chains of thought, exploring multiple logical paths, verifying its own assumptions and backtracking if it detects an error, before finally outputting a response. This “System 2” thinking significantly reduced hallucination rates in high-stakes tasks.

Anthropic: The Enterprise Workhorse

Anthropic continued to cultivate its reputation for safety and reliability, a positioning that paid dividends in the enterprise market. The Claude 4.5 series, particularly Claude Opus 4.5, emerged as the heavy lifter for complex engineering tasks.

  • Beyond Coding Dominance: While Claude Opus 4.5 conquered the ‘contamination-free’ benchmarks, Claude Opus 4.6 Thinking has redefined the ‘contamination-free’ organization. It is no longer just solving GitHub issues; it is managing them. By introducing Adaptive Thinking—a native reasoning layer that self-scales its ‘effort’—and a massive 1 million token context window, Opus 4.6 has become the gold standard for full-repository refactoring. It is the first model to score a staggering 80.8% on SWE-bench Verified, effectively ending the era of ‘file-by-file’ coding in favor of ‘system-wide’ orchestration.
  • Contextual Mastery: Its massive context window and superior instruction-following capabilities made it the preferred engine for many enterprise agentic frameworks.

So it is becoming rather obvious that the leaders are pursuing different objectives, which is a good thing; it also makes them not directly comparable, perhaps more so if their paths diverge further, leaving the user to choose what best suits the task at hand.

The Open-Weight Insurgency

Perhaps the most disruptive trend of 2025 was the compression of the performance gap between proprietary (closed) and open-weight models. Stanford’s AI Index Report 2025 highlighted that the performance difference on some benchmarks shrank from a significant 8% to a negligible 1.7% within a single year.

There are allegations that models are gaming the benchmarks via distillation, so while their benchmark performance is excellent, real-world performance is not at the same level (“vibe divergence”). There is some truth in this: most models use synthetic data generated by frontier models for fine-tuning through distillation.

Models like Llama 3.3, DeepSeek-V3 and Qwen 3 provided enterprise-grade performance at a fraction of the cost. DeepSeek-V3, in particular, stunned the industry by offering performance parity with GPT-4 in coding tasks while being available as a free download, forcing closed providers to compete on service, reliability and extreme-frontier capabilities rather than raw intelligence alone.

The Crisis of Measurement: Benchmarking

As models saturated traditional benchmarks like MMLU (Massive Multitask Language Understanding) with scores nearing 90%+, the industry faced a crisis of measurement. “Contamination”—the phenomenon where models memorize test questions present in their training data—rendered many classic benchmarks useless. In response, 2025 saw the rise of “living” benchmarks designed to be un-gameable.

LiveBench and Humanity’s Last Exam

The inadequacy of static benchmarks led to the adoption of LiveBench and “Humanity’s Last Exam” as the new gold standards for frontier evaluation.

  • Methodology: Sponsored by Abacus.AI, LiveBench introduced a regime of regularly released, new questions with objective ground-truth answers. This design specifically limits potential contamination, as the questions did not exist when the models were trained.
  • The Reality Check: On these rigorous tests, the gap between the absolute frontier and the “efficient” tier became starkly visible. While marketing materials claimed near-perfect scores, reality showed that on the hardest tasks, even the best models struggled. Gemini 3 Pro and Kimi K2 Thinking led the pack on Humanity’s Last Exam, but with scores only in the 45-50% range. This sobering data revealed that while AI is superhuman at retrieval, it remains fallible at novel, high-complexity reasoning.

SWE-Bench: The Coding Crucible

For software engineering tasks, SWE-Bench Pro became the definitive arena. Unlike simple coding contests (like HumanEval), SWE-Bench evaluates a model’s ability to navigate a complex, multi-file repository and fix a specific issue—a task representative of a real software engineer’s daily work.

The New Ceiling: By late 2025, Claude Opus 4.5 and Gemini 3 Pro were trading the top spot, achieving resolution rates around 43-46%. While this represents a massive leap from the single-digit success rates seen in 2023, it also highlights that more than half of complex software engineering tasks still require human intervention.

The Frontier Model Leaderboard

Rank | Model | Provider | Benchmark/Signal Highlight
1 | GPT-5.4 Thinking | OpenAI | 75% on OSWorld-Verified; first model to exceed the human baseline in native computer use.
2 | Claude Opus 4.6 (Thinking) | Anthropic | 80.8% on SWE-bench Verified; leader in multi-agent orchestration and architectural refactoring.
3 | Gemini 3.1 Pro (Preview) | Google | 77.1% on ARC-AGI-2; highest verified score in novel logic and pattern inference.
4 | Grok 4.1 Thinking | xAI | 1483 Arena Elo; holds #1 in human preference for creative and “unconstrained” reasoning.
5 | Kimi K2.5 Thinking | Moonshot AI | #1 for agent swarms; can orchestrate up to 100 concurrent sub-agents for massive parallel tasks.
6 | Seed 2.0 Pro | ByteDance | 89.5 on VideoMME; dominates in professional video understanding and temporal reasoning.

The key aspect of the leaderboard is that the leadership keeps changing, with each of the leading players regularly releasing their latest and greatest model and shuffling the deck. I confess that I needed to revise this table as I write this in March 2026.

Another aspect is that the time a model spends at the top is also shrinking, as competition intensifies and more capable models are released ever more frequently; there does not seem to be a finish line for this race anytime soon. Needless to say, the contents of the above table could change between the time I write this and the time you read it.

However, no new players are breaking into the top 3, and it could become increasingly difficult to do so, but let us not rule out any breakthroughs just yet.

As of early 2026, Hugging Face hosts over 2.1M models. The community reached the 1M-model milestone in mid-2024, which gives a sense of the ferocity of the growth rate.

The number of truly frontier-class models is about 50 as of this writing.

While there is a much smaller number of notable models, and only a few original foundational models, there are millions of variants. Then there is an unknown number of private models and internal fine-tunes.

Comparison of Model Tiers (Feb 2026)

| Model Tier | Estimated Count | Key Examples |
|---|---|---|
| Frontier Models | < 50 | GPT-5.2, Claude 4.5, Gemini 3 |
| Notable Models | ~4,000 | DeepSeek-V3, Llama-4, Mistral-Next |
| Open-Source (Public) | 2.1 Million+ | Hugging Face repositories |
| Private Enterprise | ~12–15 Million (estimated) | Proprietary internal tools |

Natural language processing models dominate the landscape with about a 58–60% share. Computer vision models are next with ~20%, audio and speech models account for about 15%, and multimodal and other models form the rest, about 5–6%. The fastest-growing categories are agentic and reasoning models, followed by vision-language models (VLMs).

Company-Specific Roadmaps

| Company | Flagship Model (2026) | Strategic Focus & Current Pursuit |
|---|---|---|
| OpenAI | o3 / o4-mini / GPT-5 | The AI Super-Assistant: moving toward a unified “Frontier” model that acts as a primary interface for all digital tasks. Investing heavily in custom silicon to lower inference costs. |
| Google | Gemini 3 (Pro/Ultra) | The Personalized Ecosystem: deep integration with Google Workspace and Chrome. “Auto Browse” features allow Gemini to book tickets and manage travel natively within the browser. |
| Anthropic | Claude Opus 4.6 | Trust & Computer Use: doubling down on “Computer Use” (Claude moving the mouse/clicking buttons) and legal/financial “High-Stakes Reasoning” where auditability is non-negotiable. |
| Meta | Llama 4 (Scout/Maverick) | The Open Infrastructure: maintaining the open-source lead. Llama 4 uses a sparse MoE architecture with a 10M context window to enable self-hosting for massive enterprise RAG systems. |

OpenAI: From Monolith to Portfolio

OpenAI has abandoned the “one model for everyone” approach. Their 2026 roadmap features GPT-5.2 for premium knowledge work and gpt-oss (open-weight) models to defend against Meta. They are also testing “AI Inboxes” and a “Search AI Mode” that can check out shopping carts directly via a Universal Commerce Protocol.

Google: “Agentic Vision” & Personal Intelligence

Google’s Gemini 3 Flash now features “Agentic Vision.” Unlike passive snapshots, the model “explores” an image or video to find tiny details, drastically reducing hallucinations in visual tasks. Their focus is on Personal Intelligence – connecting to your Photos, Gmail, and History to become a “partner who is already up to speed”.

Anthropic: “Computer Use” & Agentic Coding

Anthropic is the leader in Autonomous Software Engineering. Their report on “Agentic Coding” shows Claude-based agents refactoring 12.5 million lines of code in 7 hours with 99.9% accuracy. They are pursuing “Foundry” – a platform where agents operate with sovereign-level trust and governance.

Meta: Llama Models and massive context window

The Llama 4 Scout model, released by Meta on April 5, 2025, officially supports a 10-million-token context window. This was a significant technical milestone: it was the first publicly available open-weight model to offer such a massive capacity, surpassing rivals like GPT-4o (128K) and Gemini 1.5 Pro (2M) at its launch.

The Llama 4 family consists of different models with varying context capabilities:

  • Llama 4 Scout (109B total / 17B active): Features the full 10 million token window.
  • Llama 4 Maverick (400B total / 17B active): Supports a 1 million token window.
  • Llama 4 Behemoth (2T total / 288B active): Announced as a flagship “teacher” model capable of even more complex reasoning, though initially released in preview for research.
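To get a feel for why a 10-million-token window is such a milestone, here is a rough back-of-the-envelope sketch of the key-value-cache memory that naive full-attention inference over such a window would need. The layer count, head count, and head dimension below are illustrative assumptions, not Scout’s published architecture; real deployments rely on grouped-query attention, cache compression, and chunked attention to make such windows tractable.

```python
def kv_cache_gib(tokens: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size: 2 tensors (keys and values) per layer,
    each of shape [tokens, kv_heads, head_dim], stored at fp16/bf16."""
    total_bytes = 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem
    return total_bytes / 2**30

# Hypothetical configuration (illustrative assumptions only):
print(f"~{kv_cache_gib(10_000_000, layers=48, kv_heads=8, head_dim=128):,.0f} GiB")
```

Even with a modest hypothetical configuration, a fully materialized 10M-token cache runs to terabytes, which is why such context lengths demand architectural tricks rather than brute force.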

Pushing the Frontier in 2027

By 2027, the focus is expected to shift toward Mechanistic Interpretability (understanding why a model thinks) and Verifiable Rewards. This will allow AI to be used in high-risk zones like autonomous surgical assistants or grid-level energy management where “black box” logic is currently a legal blocker.

Mechanistic Interpretability is considered a core pillar of transparent and explainable AI (XAI).

Part II – The Economics of Intelligence: A Tale of Two Curves follows.

The State of AI – Fears, Opportunities and the Promise

By Bhanu Nallagonda, Cofounder, Ogha Technologies

March ‘26

The Age of Autonomy, The Infrastructure of Gigawatts and The Paradox of Intelligence

As the AI-related voices and noise grew louder and louder, I felt it pertinent to sit down, take a hard look, and assimilate the progress so far to figure out where things are headed, so that we can be where the puck is going to be, not where it has been or currently is. The key questions are the status of the frontier models, the AI bubble, vibe coding and its impact, changes to the IT services landscape, the threat to jobs, and the AX. The main objective is to look beneath the surface and not be swayed by all the hype surrounding it.

This blog is divided into multiple parts to make it an easier read. It provides an exhaustive, 360-degree analysis of the state of artificial intelligence as we speak. We will dissect the technical breakthroughs of the models, analyze the plummeting cost curves of inference, map the sprawling infrastructure of the AI arms race, explore the sociological shifts brought about by “vibe coding”, look at the strategies being adopted by the traditional IT industry to cope with these transitions, and assuage the anxieties of investors, students, and entry-level developers. Finally, we will extrapolate these trends to figure out where it is all going and how it will reshape industries, players, and the startup ecosystem.

All opinions expressed are personal. The pictures were generated by my co-founder, Kiran, using AI tools; he also served as a critic.

Introduction

The year 2025 will perhaps go down in the history of computing as the year of AI’s fundamental maturation. 2023 saw the advent of the “chatbot”, which firmly crossed the blurred line of the Turing Test; 2024 was the year of “multimodality”, when models learned to see and hear. 2025 brought in the “Age of Autonomy”, the year artificial intelligence started acting, not just talking. Physical AI and world models have started charting their own paths in the meanwhile.

During the twelve-month cycle of 2025, the industry navigated extreme contradictions. We witnessed the raw intelligence of frontier models shattering benchmarks that were considered “impossible” merely a few months earlier, with systems like Google’s Gemini 3 (and 3.1 Pro this year) and OpenAI’s GPT-5.2 (and 5.3 and 5.4 subsequently) demonstrating reasoning capabilities that rival human experts in narrow domains. Of course, the earlier benchmarks themselves became saturated, paving the way for new ones; more on that later.

Simultaneously, the industry is grappling with a profound economic paradox: the unit cost of raw intelligence has plummeted by nearly three orders of magnitude, while the capital required to train a new model has skyrocketed into the billions, and the aggregate cost of deploying enterprise AI has also gone up manyfold, driven by the voracious appetite of agentic workflows and “swarm” architectures. The paradox does not end there: many reports now indicate that more than 60% of an AI system’s total lifecycle cost comes from inference, not training! Inference’s share will grow with increasing adoption, even as the cost per inference falls and the cost of new models skyrockets under predominantly brute-force approaches aimed at AGI. It is also desirable that usage grows with real business benefits, so that the huge investments being made are paid back.

Training vs Inference Costs

The physical manifestation of this digital revolution has become impossible to ignore. The race to Artificial General Intelligence (AGI) has morphed from a battle of algorithms into a battle of gigawatts. We are witnessing the construction of “giga-scale” infrastructure projects—like the $500 billion “Stargate” initiative and Meta’s “Prometheus” supercluster—that rival the industrial mobilizations of the 20th century. These are not merely data centres; they are modern cathedrals of compute, consuming energy on the scale of nation-states to power the next generation of synthetic cognition. However, a shadow looms over this expansive growth. A “circular economy” of funding has emerged, where chip makers invest in the very cloud providers that purchase their hardware, fuelling fears of a catastrophic asset bubble reminiscent of the dot-com crash. As valuations detach from current revenue realities, the market asks a critical question: is this the buildup to a new industrial revolution, or a prelude to a correction or even a crash?

Please read on in Part I – The Model Landscape: The Frontier of Reasoning

MLOps – Machine Learning Operations

Kiran Kumar Nallagonda

Introduction

Operationalizing machine learning models to get business value is a continuous process that requires observability, monitoring, and a feedback mechanism to retrain the models whenever necessary.

Gartner predicted in 2020 that 80 percent of AI projects would remain alchemy, i.e. run by wizards whose talents will not scale across the organization, and that only 20 percent of analytical insights would deliver business outcomes by 2022. Rackspace corroborated that claim in a survey completed in January 2021, finding that 80 percent of companies were still exploring or struggling to deploy ML models.

The general challenges are that most models are difficult to use, hard to understand, offer little explainability, and are computationally intensive. With these challenges, it is very hard to extract business value. The goal of MLOps is to extract business value from data by efficiently operationalizing ML models at scale. A data scientist may find a model that functions as per business requirements, but deploying that model into production with observability, monitoring, and a feedback loop, complete with automated pipelines, at low expense, high reliability, and at scale requires an entirely different set of skills. This can be achieved in close collaboration with DevOps teams.

An ML engineer builds ML pipelines that can reproduce the results of the models discovered by the data scientist automatically, inexpensively, reliably and at scale.

MLOps Principles

Here are a few principles to follow for better MLOps:

         a) Tracking or Software Configuration

         ML models are software artifacts that need to be deployed. Tracking provenance is critical for deploying any good software and is typically handled through version control systems. But building ML models depends on complex details such as data, model architectures, hyperparameters, and external software. Keeping track of these details is vital, but can be simplified greatly with the right tools, patterns, and practices. For example, this complexity can be reduced by containerizing all components with Docker and/or Kubernetes and overlaying the usual DevOps version controls.

         b) Automation and DevOps

         Automation is key to modern DevOps, but it’s more difficult for ML models. In a traditional software application, a continuous integration and continuous delivery (CI/CD) pipeline would pick up some versioned source code for deployment. For an ML application, the pipeline should not only automate training models, but also automate model retraining along with archival of training data and other artifacts.

         c) Monitoring/Observability

         Monitoring software requires good logging and alerting, but there are special considerations to be made for ML applications. All predictions generated by ML models should be logged in such a way that enables traceability back to the model training job. ML applications should also be monitored for invalid predictions or data drift, which may require models to be retrained.
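One common way to monitor for the data drift mentioned above is the Population Stability Index (PSI), which compares the distribution of live inputs against the training-time reference. Below is a small self-contained sketch; the 0.2 alert threshold is a widespread industry convention rather than a formal standard, and the Gaussian data is purely illustrative.

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a live sample.
    Rule of thumb (a convention, not a standard): PSI > 0.2 suggests
    significant drift and is a candidate trigger for retraining."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range

    def shares(data):
        counts = [0] * bins
        for x in data:
            i = min(int((x - lo) / width), bins - 1)  # clamp max into last bin
            counts[i] += 1
        return [max(c / len(data), 1e-6) for c in counts]  # avoid log(0)

    return sum((a - e) * math.log(a / e)
               for e, a in zip(shares(expected), shares(actual)))

random.seed(0)
reference = [random.gauss(0.0, 1.0) for _ in range(5000)]  # training-time data
live = [random.gauss(0.5, 1.0) for _ in range(5000)]       # shifted live data
print(f"PSI vs itself:  {psi(reference, reference):.3f}")
print(f"PSI vs shifted: {psi(reference, live):.3f}")
```

A monitoring job would compute this per feature on a schedule and page the team (or trigger the retraining pipeline) when the index crosses the chosen threshold.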

         d) Reliability

         ML models can be harder to test and computationally more expensive than traditional software. It is important to make sure your ML applications function as expected and are resilient to failures. Getting reliability right for ML requires some special considerations around security and testing.

         e) Cost Optimization

         MLOps is deeply involved with cost-intensive infrastructure resources and personnel. Continuous cost monitoring, with necessary adjustments from time to time both to optimize cost and to drive more business value, is extremely important. For some models, training is the cost-intensive part of the work compared to the rest of the model’s life cycle and operations. But this cost equation can change entirely when the model gets deployed and scaled to numerous instances. For example, training Alexa’s speech-to-text, NLP, and NLG models was initially the cost-intensive part, in terms of collecting and processing the data and training the models on expensive computational resources. After the models were deployed on the cloud and scaled to planet level, most of the cost shifted to the inference layer of MLOps.

         These cost dynamics can be tackled by estimating and monitoring costs and by adopting the right technologies, architectures, and processes.

         In the above example, the inference-layer cost is partially off-loaded to the device itself, instead of utilizing cloud resources in every instance.

         Even the training cost follows a different equation when federated-learning-style architectures are adopted. Apart from these dynamics, standardizing on the right tools for tracking (and training) models will noticeably reduce the time and effort necessary to transfer models between the data science and data engineering teams.

Model Registry

A model registry acts as a location for data scientists to store models as they are trained, simplifying the bookkeeping process during research and development. Models retrained as part of the production deployment should also be stored in the same registry to enable comparison to the original versions. 

A good model registry should allow tracking of models by name/project and assigning a version number. When a model is registered, it should also include metadata from the training job. At the very least, the metadata should include:

  • Location of the model artifact(s) for deployment.  
  • Revision numbers for custom code used to train the model, such as the git version hash for the relevant project repository.
  • Information on how to reproduce the training environment, such as a Dockerfile, Conda environment YAML file, or PIP requirements file.
  • References to the training data, such as a file path, database table name, or query used to select the data.

Without the original training data, it will be impossible to reproduce the model itself or explore variations down the road. Try to reference a static version of the data, such as a snapshot or immutable file. In the case of very large datasets, it can be impractical to make a copy of the data. Advanced storage technologies (e.g. Amazon S3 versioning or a metadata system like Apache Atlas) are helpful for tracking large volumes of data.

Having a model registry puts structure around the handoff between data scientists and engineering teams. When a model in production produces erroneous output, registries make it easy to determine which model is causing the issue and roll back to a previous version of the model if necessary. Without a model registry, you might run the risk of deleting or losing track of the previous model, making rollback tedious or impossible. Model registries also enable auditing of model predictions.

Some data scientists may resist incorporating model registries into their workflows, citing the inconvenience of having to register models during their training jobs. Bypassing the model-registration step should be discouraged as a discipline and disallowed by policy. It is easy to justify a registry requirement on the grounds of streamlined handoff and auditing, and data scientists usually come to find that registering models can simplify their bookkeeping as they experiment.

Good model-registry tools make tracking of models virtually effortless for data scientists and engineering teams; in many cases, it can be automated in the background or handled with a single API call from model training code.
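To make the metadata requirements above concrete, here is a minimal in-memory registry sketch. The class, field, and model names are hypothetical illustrations; real registries such as MLflow or SageMaker Model Registry persist records durably and add access control, stages, and UI.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ModelRecord:
    name: str
    version: int
    artifact_uri: str        # where the trained model artifact lives
    code_revision: str       # e.g. a git commit hash for the training code
    environment: str         # e.g. path to a Dockerfile or requirements file
    training_data_ref: str   # e.g. an immutable snapshot path or query
    registered_at: float = field(default_factory=time.time)

class ModelRegistry:
    """Toy registry illustrating the minimum metadata discussed above."""
    def __init__(self):
        self._models: dict[str, list[ModelRecord]] = {}

    def register(self, name: str, **metadata) -> ModelRecord:
        version = len(self._models.get(name, [])) + 1  # auto-increment version
        record = ModelRecord(name=name, version=version, **metadata)
        self._models.setdefault(name, []).append(record)
        return record

    def latest(self, name: str) -> ModelRecord:
        return self._models[name][-1]

registry = ModelRegistry()
rec = registry.register(
    "churn-classifier",  # hypothetical model name
    artifact_uri="s3://models/churn/v1/model.pkl",
    code_revision="9f2c1ab",
    environment="environments/churn.Dockerfile",
    training_data_ref="s3://snapshots/customers-2021-01-01.parquet",
)
print(rec.name, rec.version)
```

The single `register(...)` call at the end of a training job is exactly the kind of low-friction step that makes registries easy for data scientists to adopt.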

Model registries come in many shapes and sizes to fit different organizations based on their unique needs.  Common options fall into a few categories:

  • Cloud-provider registries such as Sagemaker Model Registry or Azure Model Registry.  These tools are great for organizations that are committed to a single cloud provider.
  • Open-source registries like MLflow, which enable customization across many environments and technology stacks. Some of these tools might also integrate with external registries; for instance, MLflow can integrate with Sagemaker Model Registry.
  • Registries incorporated into high-end data-science platforms such as Dataiku DSS or DataRobot. These tools work great if your data scientists want to use them and your organization is willing to pay extra for simple and streamlined ML pipelines.

Feature Stores

Feature stores not only make it easier to track what data is being used for ML predictions, but also help data scientists and ML engineers reuse features across multiple models. A feature store provides a repository for data scientists to keep track of features they have extracted or developed for models. In other words, if a data scientist retrieves data for a model (or engineers a new feature based on some existing features), they can commit that to the feature store. Once a feature is in the feature store, it can be reused to train new models – not just by the data scientist who created it, but by anyone within your organization who trains models.

The intent of a feature store is not only to let data scientists iterate quickly by reusing past work, but also to accelerate the work of productionizing models. If features are committed to a feature store, your engineering teams can more easily incorporate the associated logic into the production pipeline. When it’s time to deploy a new model that uses the same feature, there won’t be any additional work to code up new calculations.

Feature stores work the best for organizations that have commonly used data entities that are applicable to many different models or applications. Take, for example, a retailer with many e-commerce customers – most of that company’s ML models will be used to predict customer behavior and trends.  In that case, it makes a lot of sense to build a feature store around the customer entity. Every time a data scientist creates a new feature to better represent customers, it can be committed to the feature store for any ML model making predictions about customers. 

Another good reason to use feature stores is for batch-scoring scenarios. If you are scoring multiple models on large batches of data (rather than one-off/real-time) then it makes sense to pre-compute the features. The pre-computed features can be stored for reuse rather than being recalculated for every model.
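The reuse idea can be sketched with a toy in-memory feature store built around the customer entity from the retail example above. The entity IDs, feature names, and order data are hypothetical; real systems add versioning, freshness tracking, and online/offline serving.

```python
class FeatureStore:
    """Toy feature store: features are computed once per entity and reused
    across models rather than recomputed inside each model's pipeline."""
    def __init__(self):
        self._features = {}  # (entity_id, feature_name) -> value

    def put(self, entity_id, name, value):
        self._features[(entity_id, name)] = value

    def get_vector(self, entity_id, names):
        return [self._features[(entity_id, n)] for n in names]

store = FeatureStore()

# A data scientist commits derived customer features once...
orders = {"cust-1": [120.0, 80.0, 200.0]}  # hypothetical order history
for cust, amounts in orders.items():
    store.put(cust, "avg_order_value", sum(amounts) / len(amounts))
    store.put(cust, "order_count", len(amounts))

# ...and any model (churn, LTV, recommendations) reuses them at scoring time.
features = store.get_vector("cust-1", ["avg_order_value", "order_count"])
print(features)
```

In a batch-scoring setup, the `put` side would be a scheduled pre-computation job, and every model scoring that batch would read from the same precomputed values.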

MLOps Pipeline

More efficient pipelines are constructed in combination with DevOps. The steps are outlined below:

  1. Establish Version Control
  2. Implement CI/CD pipeline
  3. Implement proper logging, with centralized log storage and the ability to retrieve and query the logs.
  4. Monitor
  5. Iterate for continuous improvement
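The steps above can be sketched as a chain of stages passing a shared context. This is a minimal toy skeleton with stand-in stage implementations (the "model" is just a mean), not a production pipeline; real pipelines would run each stage as a CI/CD job with logging and alerting around it.

```python
def pipeline(stages, context):
    """Run stages in order, threading a shared context dict through them."""
    for stage in stages:
        context = stage(context)
    return context

def pull_versioned_data(ctx):
    ctx["data"] = [1, 2, 3, 4]  # stand-in for a versioned data snapshot
    return ctx

def train(ctx):
    ctx["model"] = sum(ctx["data"]) / len(ctx["data"])  # trivial "model": a mean
    return ctx

def evaluate(ctx):
    ctx["ok"] = ctx["model"] > 0  # stand-in for a real quality gate
    return ctx

def register_model(ctx):
    if ctx["ok"]:  # only register models that pass the gate
        ctx["registered_version"] = ctx.get("registered_version", 0) + 1
    return ctx

result = pipeline([pull_versioned_data, train, evaluate, register_model], {})
print(result["registered_version"])
```

The same skeleton covers retraining: a monitoring alert simply re-invokes `pipeline(...)` with a fresh data snapshot.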

Conclusion

Developing an ML production pipeline that delivers business value is extremely challenging, but the challenge can be mitigated with the right deployment of resources, tools, personnel, expertise, and best practices. Remember to keep it simple and iterate continuously until it delivers the necessary business value.

Algorithmic Portfolio Management

Bhanu Nallagonda

When everything goes algorithmic nowadays, why not Portfolio Management?

“Algorithmic Portfolio Management” gets a few thousand results on Google, compared to about 9 million for “Algorithmic Trading”, and a search in LinkedIn training returns zero results as of this date!

In algorithmic trading, or algo trading for short, preprogrammed algorithms or sets of processes execute the trades. Its volumes have steadily increased over the years, reaching about 60–80% of total trading volume depending on the market – higher in advanced equity and forex markets, and about 40–50% of trading volume in commodity markets. It also increases volatility and certain risks, with millions to billions in market value getting wiped off within minutes and then recovering.

The top reasons for using algo trading are ease of use, improved trader productivity, consistency of execution performance, lower costs/commissions, better monitoring, and high speed/low latency. Money-management fund managers use algo trading to implement their investment decisions. There are traditional strategies such as mean reversion, price or earnings momentum, value, and multi-factor combinations of multiple strategies, as well as machine-learning-based ones such as artificial neural networks, k-NN, and Bayesian methods.
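As an illustration of one of the traditional strategies mentioned above, here is a toy z-score mean-reversion signal. The window length and entry threshold are arbitrary illustrative choices, not a recommended strategy, and real systems would add transaction costs, position sizing, and exit rules.

```python
import statistics

def mean_reversion_signal(prices, window=20, z_entry=2.0):
    """Toy z-score signal: +1 = buy (price far below its rolling mean),
    -1 = sell (price far above it), 0 = no position."""
    if len(prices) < window:
        return 0  # not enough history yet
    recent = prices[-window:]
    mu = statistics.fmean(recent)
    sigma = statistics.stdev(recent)
    if sigma == 0:
        return 0  # flat series: no meaningful signal
    z = (prices[-1] - mu) / sigma
    if z <= -z_entry:
        return 1
    if z >= z_entry:
        return -1
    return 0

prices = [100.0] * 19 + [120.0]  # a sudden spike above the recent mean
print(mean_reversion_signal(prices))  # -1 (sell: price well above its mean)
```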

One specific trend over the years has been diminishing alpha: it is increasingly difficult for actively managed funds to beat their benchmark indices after expenses. ETFs have been gaining mind and market share in recent years; in the US, passive ETFs have attracted more investments than passive mutual funds. To counter the tendency of operational and management costs to keep rising, there is an increasing need to leverage technology to be more efficient and effective.

Then there are quant funds, in which the securities to invest in are chosen through quantitative analysis based on numerical data, without any subjective intervention. While their cost of management is lower, as fund managers’ efforts and interventions are much smaller, their performance has not been consistent over the long term.

So, how is Algorithmic Portfolio Management different from algo trading, and is there a case for it to become similarly popular in this algorithm-driven world? It is likely to be so; let us look at it, along with the causes and trends that would drive it up in the future.

Robo-advisory services, which provide algorithmic financial planning to individuals after collecting their information, have been getting popular. They started with passive indexing strategies and moved on to more sophisticated optimization with variants of modern portfolio theory, tax-loss harvesting, and retirement planning.

With the advent of ever-increasing computational power and the availability of broader and deeper data, machine learning brings more sophistication to the algorithms. Machine Learning (ML) and Artificial Intelligence (AI) make analysis of new forms of data, such as unstructured data, practical for the first time. While the absence of investors’ human biases and subjective judgements is touted as an advantage, AI/ML models can have their own biases depending on the data fed to them and the deficiencies and limitations of the algorithms used, and may even reflect the biases and preferences of their constructors.

In Algorithmic Portfolio Management, a portfolio of assets and sub-assets needs to be managed for better risk-adjusted returns. That is a key difference from algo trading, which is more one-dimensional, focused on a single security at a time. So, the key aspects of Algorithmic Portfolio Management are:

  • Asset Allocation
  • Portfolio Construction
  • Portfolio Execution
  • Performance Monitoring and Evaluation
  • Rebalancing

Asset allocation is the single biggest factor determining a large percentage of a portfolio’s returns, or the variance in those returns, over long periods. The efficient frontier can optimize the portfolio for low risk at a given expected return, or vice versa. Diversification with negatively or low-correlated securities lowers the standard deviation, i.e. the risk. Monte Carlo simulation is used for risk analysis by producing distributions of possible outcomes. Apart from these age-old, traditional techniques, principal component analysis can be used for feature selection, i.e. to choose the parameters and aspects that matter, and ML algorithms can be used for better optimization. While diversification has diminishing benefits, a machine-driven algorithmic approach can make managing a larger number of securities more effective and easier than human-based processes.
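The Monte Carlo approach mentioned above can be sketched in a few lines. This is a toy illustration assuming independent, normally distributed annual returns and illustrative return/volatility numbers for a 60/40 stock/bond split; a real implementation would use correlated draws, fat-tailed distributions, and empirically estimated parameters.

```python
import random
import statistics

def simulate_portfolio(weights, means, stdevs, years=10, trials=5000, seed=42):
    """Monte Carlo distribution of terminal values for $1 invested,
    assuming independent normal annual returns per asset (a simplification)."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        value = 1.0
        for _ in range(years):
            # Portfolio return = weighted sum of each asset's sampled return
            r = sum(w * rng.gauss(m, s) for w, m, s in zip(weights, means, stdevs))
            value *= 1 + r
        outcomes.append(value)
    return outcomes

# Illustrative assumptions: stocks 8%/18%, bonds 3%/5% (mean/stdev)
outcomes = simulate_portfolio([0.6, 0.4], [0.08, 0.03], [0.18, 0.05])
outcomes.sort()
print(f"median terminal value: {statistics.median(outcomes):.2f}")
print(f"5th percentile:        {outcomes[len(outcomes) // 20]:.2f}")
```

The gap between the median and the 5th percentile is exactly the kind of downside distribution an algorithmic allocator would feed back into its risk constraints.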

Portfolio construction involves careful selection of securities for better risk-adjusted returns. Algorithmic frameworks incorporating macro- and micro-level decisions can be used for greater alignment with investment objectives and risk profiles.

Portfolio execution, in terms of buying and selling securities, can leverage algo trading for lower market impact and better outcomes. A higher percentage of institutions’ large-ticket trades use algo trading than smaller trades do.

With the availability of real-time and near-real-time data and computational power, portfolio performance monitoring and evaluation can be more frequent, triggering effective rebalancing in near real time based on market data for more optimal returns.

Passive rebalancing, including calendar- and/or percentage-based rebalancing, is used in robo-advisory approaches. Algorithmic management can bring more sophisticated optimization to active and dynamic asset allocation and the rebalancing it drives. Dynamic asset allocation is not driven by fixed percentage allocations, but involves a more nuanced approach of changing the securities and their composition based on analysis or algorithmic output.
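A percentage-based rebalancing rule of the kind described above can be sketched as follows. The 5% band, asset names, and prices are illustrative assumptions; real implementations would also account for transaction costs, taxes, and order sizing.

```python
def rebalance_if_drifted(holdings, prices, targets, threshold=0.05):
    """Percentage-threshold rebalancing: trade back to target weights only
    when some asset's weight drifts more than `threshold` from its target."""
    values = {a: holdings[a] * prices[a] for a in holdings}
    total = sum(values.values())
    weights = {a: v / total for a, v in values.items()}
    drifted = any(abs(weights[a] - targets[a]) > threshold for a in targets)
    if not drifted:
        return holdings  # within tolerance: no trades, no transaction costs
    # Convert target weights back into unit holdings at current prices
    return {a: targets[a] * total / prices[a] for a in targets}

holdings = {"stocks": 60.0, "bonds": 40.0}   # hypothetical unit holdings
targets = {"stocks": 0.6, "bonds": 0.4}
prices = {"stocks": 1.0, "bonds": 1.0}

# After a stock rally, weights drift past the 5% band and trigger a rebalance.
prices_after = {"stocks": 1.3, "bonds": 1.0}
new_holdings = rebalance_if_drifted(holdings, prices_after, targets)
print(new_holdings)
```

Calendar-based rebalancing simply runs the same check on a schedule; dynamic allocation would additionally update `targets` from the algorithm’s output before each check.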

In conclusion, fund managers are expected to leverage algorithmic portfolio management to complement subjective decisions, reduce management costs, and pursue greater alpha, though portfolios may not be driven entirely by algorithms in the near future.