The State of AI

Part I – The Model Landscape: The Frontier of Reasoning

By Bhanu Nallagonda, Cofounder, Ogha Technologies March ‘26

The relentless march of model performance continued in 2026, but the metric of success shifted irrevocably. In previous years, a model’s “vibe” (its fluency, creativity and ability to hold a conversation) was the primary differentiator. In 2025, the industry pivoted hard toward reasoning and utility. The question changed from “Can it write a poem?” to “Can it debug this repository, plan a logistics route and execute the API calls without hallucinating?”

The Titans: A Comparative Analysis

The release cycle of the AI Titans was dominated by the intensification of the rivalry between the primary research labs—Google DeepMind, OpenAI and Anthropic—along with the surging capabilities of open-weight contenders that have fundamentally altered the competitive landscape.

Google: The Gemini 3 Era

In 2025, Google successfully shed the perception of being a “fast follower” and reasserted its research dominance with the release of the Gemini 3 family. Unlike its predecessors, which were defined primarily by their native multimodal architecture, Gemini 3 was defined by “big leaps in reasoning” and efficiency.

The Gemini 3 Pro model demonstrated that improvements in agentic capability could be decoupled from massive parameter scaling. Instead, Google focused on architectural refinements that allowed for better decision-making in multi-step workflows. This “reasoning-first” approach allowed Gemini 3 to excel in scientific domains, driving breakthroughs in genomics (gene editing, disease interpretation and drug discovery) and quantum computing (helping develop expert-level empirical software). Furthermore, the introduction of Gemma 3 continued Google’s aggressive push into the open-model space, offering developers powerful local inference capabilities that rivalled the previous year’s frontier models, effectively commoditizing “GPT-4 class” intelligence for local devices.

OpenAI: The Bifurcation of Intelligence

OpenAI’s strategy in 2025 diverged into two distinct lineages, acknowledging that “creative fluency” and “logical reasoning” might require different architectures, while continuing its consumer focus.

  • GPT-5.4: If GPT-5.2 was the apex of fluid conversation, GPT-5.4 is the current undisputed master of autonomous execution. It is the first flagship to successfully unify the “thinking” depth of a reasoning model with the “doing” agility of a specialized agent. By scoring a record 83% on the GDPval benchmark across 44 professional occupations, it has effectively moved beyond being a “super-spellcheck” to becoming a digital specialist in law, finance and engineering. While there were rumours of a 2M-token context window, it debuted with a 1M-token one, effectively eliminating the need for complex RAG (retrieval) workarounds in 90% of use cases. GPT-5.4 does not just write code; it operates the computer. It can navigate a desktop environment, use a mouse and keyboard to interact with non-API legacy software, and perform multi-step workflows across different applications with a 75% success rate on OSWorld-Verified, surpassing the measured human baseline. It can extend the lifetime of such legacy software while competing for the “seat” of a junior investment analyst or a staff engineer. For context, Google’s Gemini score was based on earlier terminal-based benchmarks, humans score 72.4 on this test (for whatever reason), and Claude 4.6 Sonnet scores 72.5, a shade above the humans!
  • The o-Series (o1, o3, o3-mini): The real paradigm shift, however, was operationalized by the o-series. These models introduced the concept of “test-time compute” or “thinking” phases. When asked a complex math or coding problem, o3 does not simply predict the next token. It generates hidden chains of thought, exploring multiple logical paths, verifying its own assumptions and backtracking if it detects an error, before finally outputting a response. This “System 2” thinking significantly reduced hallucination rates in high-stakes tasks.
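
To make the test-time compute idea concrete, here is a toy illustration in the spirit of self-consistency sampling, not OpenAI’s actual hidden mechanism: spend extra inference compute by sampling many candidate “reasoning paths” and keeping the majority answer. `noisy_solver` is a hypothetical stand-in for a single, fallible model pass.

```python
import random
from collections import Counter

def noisy_solver(true_answer, error_rate=0.3):
    """Stand-in for one forward pass: wrong 30% of the time."""
    if random.random() < error_rate:
        return true_answer + random.choice([-1, 1])
    return true_answer

def answer_with_test_time_compute(true_answer, n_samples=25):
    """Sample many independent 'reasoning paths' and majority-vote.
    More samples = more test-time compute = fewer final errors."""
    candidates = [noisy_solver(true_answer) for _ in range(n_samples)]
    return Counter(candidates).most_common(1)[0][0]

random.seed(0)
single = noisy_solver(42)                    # one cheap pass, 30% error
voted = answer_with_test_time_compute(42)    # many passes plus a vote
print(single, voted)
```

The real models go further, verifying and backtracking within a single hidden chain of thought rather than voting across independent samples, but the economics are the same: accuracy is bought with inference-time compute.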

Anthropic: The Enterprise Workhorse

Anthropic continued to cultivate its reputation for safety and reliability, a positioning that paid dividends in the enterprise market. The Claude 4.5 series, particularly Claude Opus 4.5, emerged as the heavy lifter for complex engineering tasks.

  • Beyond Coding Dominance: While Claude Opus 4.5 conquered the ‘contamination-free’ benchmarks, Claude Opus 4.6 Thinking has redefined the ‘contamination-free’ organization. It is no longer just solving GitHub issues; it is managing them. By introducing Adaptive Thinking—a native reasoning layer that self-scales its ‘effort’—and a massive 1 million token context window, Opus 4.6 has become the gold standard for full-repository refactoring. It is the first model to score a staggering 80.8% on SWE-bench Verified, effectively ending the era of ‘file-by-file’ coding in favor of ‘system-wide’ orchestration.
  • Contextual Mastery: Its massive context window and superior instruction-following capabilities made it the preferred engine for many enterprise agentic frameworks.

So it is becoming rather obvious that the different leaders are pursuing different objectives. That is a good thing, but it also makes them not directly comparable, perhaps more so in the future if their paths diverge further, leaving the user to choose what is best suited for the task at hand.

The Open-Weight Insurgency

Perhaps the most disruptive trend of 2025 was the compression of the performance gap between proprietary (closed) and open-weight models. Stanford’s AI Index Report 2025 highlighted that the performance difference on some benchmarks shrank from a significant 8% to a negligible 1.7% within a single year.

There are allegations that some models game the benchmarks through distillation, so while their benchmark performance is excellent, their real-world performance is not at the same level (“vibe divergence”). There is some truth in this: most of these models are fine-tuned on synthetic data generated by frontier models, i.e. through distillation.

Models like Llama 3.3, DeepSeek-V3 and Qwen 3 provided enterprise-grade performance at a fraction of the cost. DeepSeek-V3, in particular, stunned the industry by offering performance parity with GPT-4 in coding tasks while being available as a free download, forcing closed providers to compete on service, reliability and extreme-frontier capabilities rather than raw intelligence alone.

The Crisis of Measurement: Benchmarking

As models saturated traditional benchmarks like MMLU (Massive Multitask Language Understanding) with scores of 90% and above, the industry faced a crisis of measurement. “Contamination”, the phenomenon where models memorize test questions present in their training data, rendered many classic benchmarks useless. In response, 2025 saw the rise of “living” benchmarks designed to be un-gameable.
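
A simplified sketch of how contamination can be detected (the labs’ actual pipelines are far more sophisticated) is word-level n-gram overlap between a benchmark item and the training corpus: if most of an item’s n-grams appear verbatim in the training data, the item was likely memorized rather than solved.

```python
def ngrams(text, n=8):
    """Word-level n-grams, a common unit in contamination checks."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_score(benchmark_item, training_corpus, n=8):
    """Fraction of the item's n-grams also present in the corpus;
    a high score suggests memorization rather than reasoning."""
    item_grams = ngrams(benchmark_item, n)
    corpus_grams = ngrams(training_corpus, n)
    if not item_grams:
        return 0.0
    return len(item_grams & corpus_grams) / len(item_grams)

question = "what is the capital of france and when was it founded by the romans"
corpus = "trivia dump: what is the capital of france and when was it founded by the romans answer paris"
print(contamination_score(question, corpus))  # 1.0, fully contaminated
```

LiveBench sidesteps this check entirely by releasing questions that post-date every model’s training cutoff.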

LiveBench and Humanity’s Last Exam

The inadequacy of static benchmarks led to the adoption of LiveBench and “Humanity’s Last Exam” as the new gold standards for frontier evaluation.

  • Methodology: Sponsored by Abacus.AI, LiveBench introduced a regime of regularly released, new questions with objective ground-truth answers. This design specifically limits potential contamination, as the questions did not exist when the models were trained.
  • The Reality Check: On these rigorous tests, the gap between the absolute frontier and the “efficient” tier became starkly visible. While marketing materials claimed near-perfect scores, reality showed that on the hardest tasks, even the best models struggled. Gemini 3 Pro and Kimi K2 Thinking led the pack on Humanity’s Last Exam, but with scores only in the 45-50% range. This sobering data revealed that while AI is superhuman at retrieval, it remains fallible at novel, high-complexity reasoning.

SWE-Bench: The Coding Crucible

For software engineering tasks, SWE-Bench Pro became the definitive arena. Unlike simple coding contests (like HumanEval), SWE-Bench evaluates a model’s ability to navigate a complex, multi-file repository and fix a specific issue, a task representative of a real software engineer’s daily work. The New Ceiling: By late 2025, Claude Opus 4.5 and Gemini 3 Pro were trading the top spot, achieving resolution rates around 43-46%. While this represents a massive leap from the single-digit success rates seen in 2023, it also highlights that more than half of complex software engineering tasks still require human intervention.

The Frontier Model Leaderboard

Rank | Model | Provider | Benchmark/Signal Highlight
1 | GPT-5.4 Thinking | OpenAI | 75% on OSWorld-Verified; first model to exceed the human baseline in native computer use.
2 | Claude Opus 4.6 (Thinking) | Anthropic | 80.8% on SWE-bench Verified; leader in multi-agent orchestration and architectural refactoring.
3 | Gemini 3.1 Pro (Preview) | Google | 77.1% on ARC-AGI-2; highest verified score in novel logic and pattern inference.
4 | Grok 4.1 Thinking | xAI | 1483 Arena Elo; holds #1 in human preference for creative and “unconstrained” reasoning.
5 | Kimi K2.5 Thinking | Moonshot AI | #1 for agent swarms; can orchestrate up to 100 concurrent sub-agents for massive parallel tasks.
6 | Seed 2.0 Pro | ByteDance | 89.5 on VideoMME; dominates in professional video understanding and temporal reasoning.

The key aspect of the leaderboard is that the leadership keeps changing, with each of the leading players releasing their latest and greatest model and shuffling the deck. I confess that I have had to revise this table as I write this in March 2026.

Another aspect is that the time a model spends at the top of the leaderboard is also shrinking as the competition intensifies and more capable models are released ever more frequently; there does not seem to be a finish line for this race anytime soon. Needless to say, the contents of the above table could change between the time I write this and the time you read it.

However, no new players are breaking into the top three, and it could become increasingly difficult to do so, but let us not rule out breakthroughs just yet.

As of early 2026, Hugging Face hosts over 2.1M models. The community reached the 1M-model milestone in mid-2024, which gives a sense of the ferocious growth rate.

The number of truly frontier-class models is roughly 50 as of this writing.

While there are a much smaller number of notable models and only a few original foundational models, there are millions of variants. Then there are an unknown number of private models and internal fine-tunes.

Comparison of Model Tiers (Feb 2026)

Model Tier | Estimated Count | Key Examples
Frontier Models | < 50 | GPT-5.2, Claude 4.5, Gemini 3
Notable Models | ~4,000 | DeepSeek-V3, Llama-4, Mistral-Next
Open-Source (Public) | 2.1 Million+ | Hugging Face repositories
Private Enterprise | ~12–15 Million (estimated) | Proprietary internal tools

Natural Language Processing models dominate the landscape with about a 58-60% share. Computer Vision models are next with ~20%, Audio and Speech models take about 15%, and Multimodal and others form the rest, about 5-6%. The fastest-growing categories are Agentic and Reasoning models, followed by Vision-Language Models (VLMs).

Company-Specific Roadmaps

Company | Flagship Model (2026) | Strategic Focus & Current Pursuit
OpenAI | o3 / o4-mini / GPT-5 | The AI Super-Assistant: Moving toward a unified “Frontier” model that acts as a primary interface for all digital tasks. Investing heavily in custom silicon to lower inference costs.
Google | Gemini 3 (Pro/Ultra) | The Personalized Ecosystem: Deep integration with Google Workspace and Chrome. “Auto Browse” features allow Gemini to book tickets and manage travel natively within the browser.
Anthropic | Claude Opus 4.6 | Trust & Computer Use: Doubling down on “Computer Use” (Claude moving the mouse/clicking buttons) and legal/financial “High-Stakes Reasoning” where auditability is non-negotiable.
Meta | Llama 4 (Scout/Maverick) | The Open Infrastructure: Maintaining the open-source lead. Llama 4 uses a sparse MoE architecture with a 10M context window to enable self-hosting for massive enterprise RAG systems.

OpenAI: From Monolith to Portfolio

OpenAI has abandoned the “one model for everyone” approach. Their 2026 roadmap features GPT-5.2 for premium knowledge work and gpt-oss (open-weight) models to defend against Meta. They are also testing “AI Inboxes” and a “Search AI Mode” that checks out shopping carts directly via a Universal Commerce Protocol.

Google: “Agentic Vision” & Personal Intelligence

Google’s Gemini 3 Flash now features “Agentic Vision.” Unlike passive snapshots, the model “explores” an image or video to find tiny details, drastically reducing hallucinations in visual tasks. Their focus is on Personal Intelligence – connecting to your Photos, Gmail, and History to become a “partner who is already up to speed”.

Anthropic: “Computer Use” & Agentic Coding

Anthropic is the leader in Autonomous Software Engineering. Their report on “Agentic Coding” shows Claude-based agents refactoring 12.5 million lines of code in 7 hours with 99.9% accuracy. They are pursuing “Foundry” – a platform where agents operate with sovereign-level trust and governance.

Meta: Llama Models and massive context window

Llama 4 Scout model, released by Meta on April 5, 2025, officially supports a 10-million-token context window. This was a significant technical milestone, making it the first publicly available open-weight model to offer such a massive capacity – surpassing rivals like GPT-4o (128K) and Gemini 1.5 Pro (2M) at its launch.

The Llama 4 family consists of different models with varying context capabilities:

  • Llama 4 Scout (109B total / 17B active): Features the full 10 million token window.
  • Llama 4 Maverick (400B total / 17B active): Supports a 1 million token window.
  • Llama 4 Behemoth (2T total / 288B active): Announced as a flagship “teacher” model capable of even more complex reasoning, though initially released in preview for research.
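
The “total vs. active parameters” split in the list above comes from sparse mixture-of-experts (MoE) routing: each token is dispatched to only the top-k experts, so only a small fraction of the total weights fire per token. A toy sketch of the gating idea follows; the dimensions, gate weights and scalar “experts” are invented for illustration and bear no relation to Llama 4’s actual router.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token_vec, experts, gate_weights, k=2):
    """Sparse MoE: score every expert with a learned gate, then run
    only the top-k, renormalizing their probabilities. The 'active'
    parameter count is thus k/len(experts) of the total."""
    scores = [sum(g * t for g, t in zip(gw, token_vec)) for gw in gate_weights]
    probs = softmax(scores)
    topk = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in topk)
    return sum(probs[i] / norm * experts[i](token_vec) for i in topk)

# 8 tiny stand-in "experts": expert i just scales the first feature.
experts = [(lambda i: (lambda v: (i + 1) * v[0]))(i) for i in range(8)]
random.seed(1)
gate_weights = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
out = moe_forward([1.0, 0.5, -0.2, 0.3], experts, gate_weights)
print(round(out, 3))
```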

Pushing the Frontier in 2027

By 2027, the focus is expected to shift toward Mechanistic Interpretability (understanding why a model thinks) and Verifiable Rewards. This will allow AI to be used in high-risk zones like autonomous surgical assistants or grid-level energy management where “black box” logic is currently a legal blocker.

Mechanistic Interpretability is considered a core pillar of transparent and explainable AI (XAI).

Part II – The Economics of Intelligence: A Tale of Two Curves follows.

The State of AI – Fears, Opportunities and the Promise

By Bhanu Nallagonda, Cofounder, Ogha Technologies

March ‘26

The Age of Autonomy, The Infrastructure of Gigawatts and The Paradox of Intelligence

As the AI-related voices and noise grew louder and louder, I felt it pertinent to sit down, take a hard look, assimilate the progress so far and figure out where things are headed, so that we can be where the puck is going to be, not where it has been or currently is. The key questions concern the status of the frontier models, the AI bubble, vibe coding and its impact, changes to the IT services landscape, the threat to jobs, and the AX. The main objective is to scratch beneath the surface and not be swayed by all the hype surrounding it.

This blog is divided into multiple parts to make it an easier read. It provides an exhaustive, 360-degree analysis of the state of Artificial Intelligence as we speak. We will dissect the technical breakthroughs of the models, analyze the plummeting cost curves of inference, map the sprawling infrastructure of the AI arms race, explore the sociological shifts brought about by “vibe coding”, look at the strategies being adopted by the traditional IT industry to cope with these transitions, and assuage the anxiety of investors, students and entry-level developers. Finally, we will extrapolate these trends to figure out where it is all going and how it will reshape industries, players and the startup ecosystem. All opinions expressed are personal. The pictures are generated by my co-founder, Kiran, using AI tools, in addition to his serving as a critic.

Introduction

The year 2025 will perhaps go down in the history of computing as the year of AI’s fundamental maturation. 2023 saw the advent of the “Chatbot”, firmly crossing the blurred line of the Turing Test; 2024 was the year of “Multimodality”, when models learned to see and hear. 2025 brought in the “Age of Autonomy”, the year when artificial intelligence started acting, beyond just talking. Physical AI and world models have started chalking out their own paths in the meanwhile.

During the twelve-month cycle of 2025, the industry navigated extreme contradictions. We witnessed the raw intelligence of frontier models shattering benchmarks that were considered “impossible” merely a few months earlier, with systems like Google’s Gemini 3 (and 3.1 Pro this year) and OpenAI’s GPT-5.2 (and 5.3 and 5.4 subsequently this year) demonstrating reasoning capabilities that rival human experts in narrow domains. Of course, the earlier benchmarks themselves got saturated, paving the way for new ones; more on that later. Simultaneously, the industry is grappling with a profound economic paradox: the unit cost of raw intelligence has plummeted by nearly three orders of magnitude, while the capital required to train a new model has skyrocketed into the billions and the aggregate cost of deploying enterprise AI has gone up manyfold, driven by the voracious appetite of agentic workflows and “swarm” architectures. The paradox does not end there: many reports now indicate that more than 60% of an AI system’s total lifecycle cost comes from inference, not training! The share of inference will rise with increasing adoption, even as the cost of inference comes down and the cost of new models skyrockets with predominantly brute-force approaches chasing AGI. It is also desirable that usage goes up with real business benefits, so that the huge investments made are paid back.

Training vs Inference Costs

The physical manifestation of this digital revolution has become impossible to ignore. The race to Artificial General Intelligence (AGI) has morphed from a battle of algorithms into a battle of gigawatts. We are witnessing the construction of “giga-scale” infrastructure projects—like the $500 billion “Stargate” initiative and Meta’s “Prometheus” supercluster—that rival the industrial mobilizations of the 20th century. These are not merely data centres; they are modern cathedrals of compute, consuming energy on the scale of nation-states to power the next generation of synthetic cognition. However, a shadow looms over this expansive growth. A “circular economy” of funding has emerged, where chip makers invest in the very cloud providers that purchase their hardware, fuelling fears of a catastrophic asset bubble reminiscent of the dot-com crash. As valuations detach from current revenue realities, the market asks a critical question: is this the buildup to a new industrial revolution, or a prelude to a correction or even a crash?

Please read on to Part I – The Model Landscape: The Frontier of Reasoning.

MLOps – Machine Learning Operations

Kiran Kumar Nallagonda

Introduction

The continuous process of operationalizing machine learning models to derive business value requires observability, monitoring, and a feedback mechanism to retrain the models whenever necessary.

Gartner predicted in 2020 that 80 percent of AI projects would remain alchemy, i.e. run by wizards whose talents will not scale in the organization, and that only 20 percent of analytical insights would deliver business outcomes by 2022. Rackspace corroborated that claim in a survey completed in January 2021, finding that 80 percent of companies are still exploring or struggling to deploy ML models.

The general challenges are that most models are difficult to use, hard to understand, have little explainability and are computationally intensive. With these challenges, it is very hard to extract business value. The goal of MLOps is to extract business value from data by efficiently operationalizing ML models at scale. A data scientist may find a model which functions as per business requirements, but deploying the model into production with observability, monitoring and a feedback loop, complete with automated pipelines, at low expense, high reliability and at scale, requires an entirely different set of skills. This can be achieved in close collaboration with DevOps teams.

An ML engineer builds ML pipelines that can reproduce the results of the models discovered by the data scientist automatically, inexpensively, reliably and at scale.

MLOps Principles

Here are a few principles for better MLOps:

         a) Tracking or Software Configuration

         ML models are software artifacts that need to be deployed. Tracking provenance is critical for deploying any good software and is typically handled through version control systems. But building ML models depends on complex details such as data, model architectures, hyperparameters and external software. Keeping track of these details is vital, but can be simplified greatly with the right tools, patterns and practices. For example, this complexity could be tamed by adopting dockerization and/or kubernetization of all components and overlaying the usual DevOps version controls.

         b) Automation and DevOps

         Automation is key to modern DevOps, but it’s more difficult for ML models. In a traditional software application, a continuous integration and continuous delivery (CI/CD) pipeline would pick up some versioned source code for deployment. For an ML application, the pipeline should not only automate training models, but also automate model retraining along with archival of training data and other artifacts.

         c) Monitoring/Observability

         Monitoring software requires good logging and alerting, but there are special considerations to be made for ML applications. All predictions generated by ML models should be logged in such a way that enables traceability back to the model training job. ML applications should also be monitored for invalid predictions or data drift, which may require models to be retrained.
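
One widely used drift statistic is the Population Stability Index (PSI), which compares the distribution of live inputs or predictions against the training-time distribution. The thresholds in the comment below are a common rule of thumb, not a standard; this is a minimal pure-Python sketch.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a training ('expected')
    distribution and live ('actual') values. Rule of thumb:
    PSI < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 retrain."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def frac(data, b):
        count = sum(lo + b * width <= x < lo + (b + 1) * width or
                    (b == bins - 1 and x == hi) for x in data)
        return max(count / len(data), 1e-6)   # avoid log(0)

    return sum((frac(actual, b) - frac(expected, b)) *
               math.log(frac(actual, b) / frac(expected, b))
               for b in range(bins))

train_scores = [i / 100 for i in range(100)]        # uniform on [0, 1)
live_same = [i / 100 for i in range(100)]           # no drift
live_shifted = [0.5 + i / 200 for i in range(100)]  # mass shifted right

print(psi(train_scores, live_same))                 # 0.0, stable
print(psi(train_scores, live_shifted) > 0.25)       # True, drift alarm
```

A monitoring job would run this periodically over recent predictions and trigger the retraining pipeline when the alarm fires.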

         d) Reliability

         ML models can be harder to test and computationally more expensive than traditional software. It is important to make sure your ML applications function as expected and are resilient to failures. Getting reliability right for ML requires some special considerations around security and testing.

         e) Cost Optimization

         MLOps is more deeply involved with cost-intensive infrastructure resources and personnel. Continuous cost monitoring, and making the necessary adjustments from time to time to optimize cost as well as to drive more business value, is extremely important. For some models, training is the cost-intensive part of the work compared to the entire life cycle of the model and its operations. But this cost equation can change entirely when the model gets deployed and scaled to numerous instances. For example, Alexa’s speech-to-text, NLP and NLG model training was initially cost-intensive in terms of collecting and processing the data and training the models on expensive computational resources. After the models were deployed on the cloud and scaled to planet level, most of the cost shifted to the inference layer of MLOps.

         These kinds of cost dynamics can be tackled by estimating and monitoring the costs, adopting right technologies, architectures and processes.

         In the above example, inference layer cost is off-loaded to the device itself partially, instead of utilizing the cloud resources in every instance.

         Even the training cost will have different equation when federated learning kind of architectures are adopted. Apart from these dynamics, standardizing on the right tools for tracking (and training) models will noticeably reduce the time and effort necessary to transfer models between the data science and data engineering teams.

Model Registry

A model registry acts as a location for data scientists to store models as they are trained, simplifying the bookkeeping process during research and development. Models retrained as part of the production deployment should also be stored in the same registry to enable comparison to the original versions. 

A good model registry should allow tracking of models by name/project and assigning a version number. When a model is registered, it should also include metadata from the training job. At the very least, the metadata should include:

  • Location of the model artifact(s) for deployment.  
  • Revision numbers for custom code used to train the model, such as the git version hash for the relevant project repository.
  • Information on how to reproduce the training environment, such as a Dockerfile, Conda environment YAML file, or PIP requirements file.
  • References to the training data, such as a file path, database table name, or query used to select the data.
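
The metadata list above can be made concrete as a minimal in-memory registry record. All names here are illustrative, not the API of any real registry product (MLflow, SageMaker, etc.); it is a sketch of what every entry should carry and why rollback becomes trivial once it does.

```python
import datetime
from dataclasses import dataclass, field

@dataclass
class ModelRecord:
    """Minimum metadata per registry entry, per the list above."""
    name: str
    version: int
    artifact_uri: str          # where the trained model artifact lives
    code_revision: str         # e.g. a git commit hash
    environment_spec: str      # Dockerfile / conda YAML / requirements file
    training_data_ref: str     # snapshot path, table name, or query
    registered_at: str = field(
        default_factory=lambda: datetime.datetime.utcnow().isoformat())

class ModelRegistry:
    def __init__(self):
        self._models = {}      # name -> list of ModelRecord

    def register(self, record):
        self._models.setdefault(record.name, []).append(record)
        return record.version

    def latest(self, name):
        return max(self._models[name], key=lambda r: r.version)

    def rollback_target(self, name):
        """Previous version, for when production output goes bad."""
        versions = sorted(self._models[name], key=lambda r: r.version)
        return versions[-2] if len(versions) > 1 else None

registry = ModelRegistry()
for v in (1, 2):
    registry.register(ModelRecord(
        name="churn-model", version=v,
        artifact_uri=f"s3://models/churn/v{v}/model.pkl",
        code_revision="a1b2c3d", environment_spec="conda.yaml",
        training_data_ref="s3://data/churn/snapshot-2026-01"))

print(registry.latest("churn-model").version)           # 2
print(registry.rollback_target("churn-model").version)  # 1
```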

Without the original training data, it will be impossible to reproduce the model itself or explore variations down the road. Try to reference a static version of the data, such as a snapshot or immutable file. In the case of very large datasets, it can be impractical to make a copy of the data. Advanced storage technologies (e.g. Amazon S3 versioning or a metadata system like Apache Atlas) are helpful for tracking large volumes of data.

Having a model registry puts structure around the handoff between data scientists and engineering teams. When a model in production produces erroneous output, registries make it easy to determine which model is causing the issue and roll back to a previous version of the model if necessary. Without a model registry, you might run the risk of deleting or losing track of the previous model, making rollback tedious or impossible. Model registries also enable auditing of model predictions.

Some data scientists may resist incorporating model registries into their workflows, citing the inconvenience of having to register models during their training jobs. Bypassing the model-registration step should be discouraged as a discipline and disallowed by policy. It is easy to justify a registry requirement on the grounds of streamlined handoff and auditing, and data scientists usually come to find that registering models can simplify their bookkeeping as they experiment.

Good model-registry tools make tracking of models virtually effortless for data scientists and engineering teams; in many cases, it can be automated in the background or handled with a single API call from model training code.

Model registries come in many shapes and sizes to fit different organizations based on their unique needs.  Common options fall into a few categories:

  • Cloud-provider registries such as Sagemaker Model Registry or Azure Model Registry.  These tools are great for organizations that are committed to a single cloud provider.
  • Open-source registries like MLflow, which enable customization across many environments and technology stacks. Some of these tools might also integrate with external registries; for instance, MLflow can integrate with Sagemaker Model Registry.
  • Registries incorporated into high-end data-science platforms such as Dataiku DSS or DataRobot. These tools work great if your data scientists want to use them and your organization is willing to pay extra for simple and streamlined ML pipelines.

Feature Stores

Feature stores not only make it easier to track what data is being used for ML predictions, but also help data scientists and ML engineers reuse features across multiple models. A feature store provides a repository for data scientists to keep track of features they have extracted or developed for models. In other words, if a data scientist retrieves data for a model (or engineers a new feature based on some existing features), they can commit it to the feature store. Once a feature is in the feature store, it can be reused to train new models, not just by the data scientist who created it, but by anyone in your organization who trains models.

The intent of a feature store is not only to allow data scientists to iterate quickly by reusing past work, but also to accelerate the work of productionizing models. If features are committed to a feature store, your engineering teams can more easily incorporate the associated logic into the production pipeline. When it’s time to deploy a new model that uses the same feature, there won’t be any additional work to code up new calculations.

Feature stores work the best for organizations that have commonly used data entities that are applicable to many different models or applications. Take, for example, a retailer with many e-commerce customers – most of that company’s ML models will be used to predict customer behavior and trends.  In that case, it makes a lot of sense to build a feature store around the customer entity. Every time a data scientist creates a new feature to better represent customers, it can be committed to the feature store for any ML model making predictions about customers. 

Another good reason to use feature stores is for batch-scoring scenarios. If you are scoring multiple models on large batches of data (rather than one-off/real-time) then it makes sense to pre-compute the features. The pre-computed features can be stored for reuse rather than being recalculated for every model.
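
A toy sketch of the idea follows: features are committed once per entity and then served to any model, and batch pre-computation is just committing before scoring. Real feature stores (Feast, Tecton and the like) add point-in-time correctness and online/offline serving, all omitted here.

```python
class FeatureStore:
    """Minimal entity-keyed feature store for illustration only."""
    def __init__(self):
        self._features = {}   # (entity_id, feature_name) -> value

    def commit(self, entity_id, feature_name, value):
        self._features[(entity_id, feature_name)] = value

    def get(self, entity_id, feature_names):
        return [self._features[(entity_id, f)] for f in feature_names]

store = FeatureStore()

# A data scientist engineers customer features once (batch pre-compute)...
orders = {"cust-1": [30.0, 45.0, 15.0], "cust-2": [200.0]}
for cust, amounts in orders.items():
    store.commit(cust, "avg_order_value", sum(amounts) / len(amounts))
    store.commit(cust, "order_count", len(amounts))

# ...and any model (churn, LTV, recommendations) reuses them.
print(store.get("cust-1", ["avg_order_value", "order_count"]))  # [30.0, 3]
```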

MLOps Pipeline

More efficient pipelines are constructed in combination with DevOps. The steps, in outline:

  1. Establish Version Control
  2. Implement CI/CD pipeline
  3. Implement proper logging, with centralized log storage, retrieval and querying of the logs.
  4. Monitor
  5. Iterate for continuous improvement

Conclusion

Developing an ML production pipeline that delivers business value is extremely challenging, but the challenges can be mitigated with the right deployment of resources, tools, personnel, expertise and best practices. Remember to keep it simple, and iterate to continuously improve until it delivers the necessary business value.

Algorithmic Portfolio Management

Bhanu Nallagonda

When everything goes algorithmic nowadays, why not Portfolio Management?

“Algorithmic Portfolio Management” gets a few thousand results on Google, compared to about 9 million for “Algorithmic Trading”, and a search on LinkedIn Learning yields zero results as of this date!

In algorithmic trading, or algo trading for short, preprogrammed algorithms or sets of processes execute the trades. Its volumes have steadily increased over the years, reaching about 60-80% of total trading volume depending on the market, higher in advanced equity and forex markets, and about 40-50% of trading volume in commodity markets. It also increases volatility and certain risks, with millions to billions in market value getting wiped out within minutes and then recovering.

The top reasons for using algo trading are ease of use, improved trader productivity, consistency of execution performance, lower costs/commissions, better monitoring and high speed/lower latency. Fund managers at money management firms use algo trading to implement their investment decisions. There are traditional strategies such as mean reversion, price or earnings momentum, value, and multi-factor combinations of multiple strategies, as well as machine-learning-based ones such as artificial neural networks, k-NN and Bayes classifiers.
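
As a concrete illustration, a mean-reversion strategy can be reduced to a rolling z-score rule: buy when the price sits unusually far below its recent mean, sell when unusually far above. The window and threshold below are arbitrary illustrative choices, not a recommendation.

```python
def zscore_signal(prices, window=5, threshold=1.0):
    """Mean-reversion sketch: z-score of each price against the
    rolling mean/std of the previous `window` prices."""
    signals = []
    for i in range(window, len(prices)):
        hist = prices[i - window:i]
        mean = sum(hist) / window
        var = sum((p - mean) ** 2 for p in hist) / window
        std = var ** 0.5 or 1e-9          # guard against zero std
        z = (prices[i] - mean) / std
        signals.append("BUY" if z < -threshold else
                       "SELL" if z > threshold else "HOLD")
    return signals

prices = [100, 101, 100, 102, 101, 90, 101, 100, 102, 115]
print(zscore_signal(prices))  # the dip to 90 triggers BUY, the spike to 115 SELL
```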

One specific trend over the years has been diminishing alpha: it is increasingly difficult for actively managed funds to beat their benchmark indices after expenses. ETFs have been gaining mind and market share in recent years; in the US, passive ETFs have attracted more investments than passive mutual funds. With operational and management costs tending to rise, there is an increasing need to leverage technology to be more efficient and effective.

Then there are quant funds, in which the securities to invest in are chosen through quantitative analysis of numerical data, without subjective intervention. While their management costs are lower because fund managers’ effort and intervention are minimal, their performance has not been consistent over long periods.

So how is Algorithmic Portfolio Management different from algo trading, and is there a case for it to become similarly popular in this algorithm-driven world? There likely is, so let us examine it, along with the causes and trends that would drive its adoption.

Robo-advisory services, which provide algorithmic financial planning to individuals after collecting their information, have been growing in popularity. They started with passive indexing strategies and moved on to more sophisticated optimization using variants of modern portfolio theory, tax-loss harvesting and retirement planning.

With ever-increasing computational power and the availability of broader and deeper data, Machine Learning brings more sophistication to the algorithms. Machine Learning (ML) and Artificial Intelligence (AI) make it practical to analyze new forms of data, such as unstructured data, that were previously out of reach. While the absence of investors’ human biases and subjective judgements is touted as an advantage, AI/ML models can carry their own biases, depending on the data fed to them and on the deficiencies and limitations of the algorithms used, and may even reflect the biases and preferences of their constructors.

In Algorithmic Portfolio Management, a portfolio of assets and sub-assets needs to be managed for better risk-adjusted returns. That is a key difference from algo trading, which is more one-dimensional, focused on a single security at a time. The key aspects of Algorithmic Portfolio Management are:

  • Asset Allocation
  • Portfolio Construction
  • Portfolio Execution
  • Performance Monitoring and Evaluation
  • Rebalancing

Asset allocation is the single biggest factor determining returns, explaining a large percentage of the variance in portfolio returns over long periods. The efficient frontier can be used to optimize a portfolio for the lowest risk at a given expected return, or vice versa. Diversification with negatively or weakly correlated securities lowers the portfolio’s standard deviation, i.e. its risk. Monte Carlo simulation is used for risk analysis, producing distributions of possible outcomes. Beyond these traditional techniques, principal component analysis can be used for feature selection, i.e. to choose the parameters and aspects that matter, and ML algorithms can be used for better optimization. While diversification brings diminishing benefits, a machine-driven algorithmic approach can manage a larger number of securities more effectively and easily than human-based processes.
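The Monte Carlo idea above can be sketched in a few lines: simulate many multi-year paths of portfolio value under assumed return statistics and read risk off the resulting distribution. The normal i.i.d. return model, the parameter values and the function name are all simplifying assumptions for illustration; real asset returns are fatter-tailed and serially dependent.

```python
import random
import statistics

def monte_carlo_outcomes(mu=0.07, sigma=0.15, years=10,
                         start_value=100_000, n_paths=5_000, seed=42):
    """Distribution of terminal portfolio values under i.i.d. normal
    annual returns with mean mu and volatility sigma (assumed figures)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    finals = []
    for _ in range(n_paths):
        value = start_value
        for _ in range(years):
            value *= 1.0 + rng.gauss(mu, sigma)  # one simulated year
        finals.append(value)
    finals.sort()
    return {
        "median": statistics.median(finals),
        "p5": finals[int(0.05 * n_paths)],   # downside (5th percentile)
        "p95": finals[int(0.95 * n_paths)],  # upside (95th percentile)
    }
```

Reading the 5th percentile as a downside scenario is what makes this a risk-analysis tool rather than a return forecast.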

Portfolio construction involves the careful selection of securities for better risk-adjusted returns. Algorithmic frameworks spanning macro- and micro-level decisions can be used for greater alignment with investment objectives and risk profiles.

Portfolio execution, the buying and selling of securities, can leverage algo trading for lower market impact and better outcomes. Institutions tend to route a higher percentage of their larger-ticket trades through algo trading than of their smaller ones.

With the availability of real-time and near-real-time data and computational power, portfolio performance monitoring and evaluation can be more frequent, triggering effective rebalancing in near real time based on market data for more optimal returns.
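A common monitoring metric for risk-adjusted returns is the Sharpe ratio; a minimal sketch from periodic returns might look like the following. The annual risk-free rate and the function name are illustrative assumptions.

```python
import statistics

def annualized_sharpe(period_returns, risk_free_rate=0.02, periods_per_year=12):
    """Annualized Sharpe ratio from periodic (e.g. monthly) returns:
    mean excess return over its standard deviation, scaled to a year."""
    rf_per_period = risk_free_rate / periods_per_year
    excess = [r - rf_per_period for r in period_returns]
    mean = statistics.fmean(excess)
    stdev = statistics.stdev(excess)
    return (mean / stdev) * (periods_per_year ** 0.5)
```

Tracking a metric like this per period is what lets an algorithmic pipeline decide automatically when performance has drifted enough to warrant rebalancing.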

Passive rebalancing, including calendar- and/or percentage-based rebalancing, is used in robo-advisory approaches. Algorithmic management can bring more sophisticated optimization for active and dynamic asset allocation and the rebalancing driven by it. Dynamic asset allocation is not driven by fixed percentage allocations; instead, it changes the securities held and their composition based on analysis or algorithmic output.
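The percentage-based (threshold) rebalancing mentioned above can be sketched as follows: if any asset's weight drifts beyond a band around its target, compute the trades that restore the target weights. Asset names, the threshold and the function name are illustrative.

```python
def rebalance_orders(holdings, targets, threshold=0.05):
    """Trades (in value terms) to restore target weights.

    holdings: current market value per asset.
    targets:  desired weights summing to 1.0.
    Returns value to buy (+) or sell (-) per asset, or {} when every
    asset is still within the drift band.
    """
    total = sum(holdings.values())
    drifted = any(
        abs(holdings[a] / total - targets[a]) > threshold
        for a in holdings
    )
    if not drifted:
        return {}  # inside the band: trading would only add costs
    return {a: targets[a] * total - holdings[a] for a in holdings}
```

Because the orders sum to zero in value, rebalancing is self-funding; the band exists to avoid churning the portfolio on small fluctuations.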

In conclusion, fund managers are expected to leverage algorithmic portfolio management to complement subjective decisions, reduce management costs and pursue greater alpha, though portfolios are unlikely to be driven entirely by algorithms in the near future.