Agents Over Bubbles – Stratechery by Ben Thompson

There is a weird paradox in terms of AI prognostication: on one hand, you don’t want to be the one to completely dismiss the most terrifying doomsday scenarios; who wants to be found out to be foolishly optimistic? At the same time, there is also pressure to give credence to the possibility that we are in a bubble, and all of this hype and spending is going to go belly up.
While I have argued against the former, I have very much been on board with the latter, making the case that bubbles can be good.
Sitting here in March 2026, however, on the morning of Nvidia’s GTC, I’ve come to a different conclusion: I don’t think we’re in a bubble (which, paradoxically, maybe is the truest evidence we are).
LLM Paradigms
Over the last couple of weeks, first in the context of Nvidia’s earnings, and then last week in the context of Oracle’s, I’ve talked about three LLM inflection points.
ChatGPT: The first LLM inflection point was the November 2022 launch of ChatGPT, which hardly needs an explanation. Yes, transformer-based large language models were introduced in 2017, and the capabilities were both impressive and growing, but under-appreciated; Stratechery started an interview series with Daniel Gross and Nat Friedman in October 2022 under the premise that there was an incredible new technology that was sorely lacking for product applications and startup energy.
Needless to say, that was entirely flipped on its head just weeks later. ChatGPT opened the eyes of the world to what LLMs were capable of, but the initial versions had two flaws that have stuck in many people’s minds, particularly those convinced that we are in a bubble.
The first flaw is that LLMs frequently got things wrong, and worse, would hallucinate when it didn’t know the answer. This made LLMs feel like something of a parlor trick: amazing when it works, but not something that you can count on. The second was related to the first: even in that flawed state LLMs were tremendously useful, but you needed to have an idea of what to use them for, and you needed to proactively take care to manage mistakes and verify the output in case it was hallucinated.
o1: The second LLM inflection point was the release of OpenAI’s o1 model in September 2024. By that point LLMs had improved tremendously, both thanks to new foundation models and also because of continued improvements in post-training; that meant that the stream of tokens that constituted an answer in ChatGPT or Claude were now much more likely to be right, and somewhat less likely to hallucinate. What made o1 different, however, was that it reasoned over its answer before delivering it to you. I explained in an Update at the time:
The big challenge for traditional LLMs is that they are path-dependent; while they can consider the puzzle as a whole, as soon as they commit to a particular guess they are locked in, and doomed to failure. This is a fundamental weakness of what are known as “auto-regressive large language models”, which to date, is all of them.
Reasoning models self-evaluate: they work through an answer and then consider if the answer is correct, or if they should consider other alternatives. To put it in terms of the weaknesses I identified above, they were internally proactive in terms of managing mistakes, reducing the burden on the user to continually actively guide the LLM, and the results were remarkable. From my perspective, if the brilliance of ChatGPT was in making LLMs much more readable and useful, the brilliance of o1 was in making LLMs much more reliable and essential.
Opus 4.5: Anthropic released Opus 4.5 on November 24, 2025, to relatively little fanfare; then, at some point in December, Claude Code with Opus 4.5 suddenly seemed to be able to do things that were never possible previously. OpenAI released GPT-5.2-Codex around the same time, on December 18, and it was similarly capable. People had been talking about “agents” for a while; suddenly, however, both Claude and Codex were actually accomplishing tasks — some of which took hours — and doing them correctly.
That bit about the Opus 4.5 model’s release date is interesting, however: the key thing about agentic workloads is that they are about more than the model, or using the model recursively, like o1. Rather, a critical component of making agentic workloads work is the “harness”, i.e. the software that actually controls the model.
To put it another way, Claude Code and OpenAI’s Codex actually abstract the user away from the model: you give instructions to an agent, which actually directs the model; critically, the agent can also use other deterministic tools as well, which means that it can verify its results. To put it in the context of coding, in paradigm one an LLM would generate code; in paradigm two an LLM would think about the code it was generating and iterate towards a better answer; in this paradigm an agent directs a model to generate code, then checks to see if the code actually works, and if it doesn’t tries again, all without the user needing to be involved.
In other words, many of the biggest flaws from the original ChatGPT have been substantially mitigated, at least for verifiable use cases like coding: LLMs are much more likely to be right the first time, they reason over their results to increase their chances, and now agents actively verify the results without humans needing to be in the loop. That leaves one flaw: actually figuring out what to use these for.
The Decreased Need for Agency
The reason I’ve been writing about these three inflection points over the last couple of weeks has been to explain why it is that the industry is so compute constrained and why the massive investment in capex by the hyperscalers is justified.
- The first paradigm required a lot of compute for training, but inference — actually answering a question — was relatively efficient: you simply sent the user whatever the model spit out.
- The second paradigm dramatically increased the amount of computing needed for inference, for two reasons: first, generating an answer required a lot more tokens, because all of the “reasoning” required tokens, in addition to the answer itself. Second, the fact that reasoning made the models so much more useful meant that they were used more, which drove increased token usage in its own right.
- It’s the third paradigm, however, that has truly tipped the scales in favor of capex expenditure not being speculative investment but rather badly needed investment in meeting demand that far exceeds supply. First, generating an answer will often entail multiple calls to a reasoning model. Second, the agent itself needs compute, and that compute — and the tools the agent uses — is better done by CPUs than GPUs. Third, agents are another step function increase in usefulness, which means they are going to be used even more than even reasoning models in a chatbot.
It’s how this third point will be manifested that I think is under-appreciated. After all, far more people use chatbots than use agents, and I would make the case that most people are not using chatbots as much as they should! It’s been a question of agency: to get the most from AI requires actually taking the initiative to use AI; I wrote in 2024’s MKBHD’s For Everything:
Large language models are intelligent, but they do not have goals or values or drive. They are tools to be used by, well, anyone who is willing and able to take the initiative to use them. I don’t think either Brownlee or I particularly need AI, or, to put it another way, are overly threatened by it…The connection between us and AI, though, is precisely the fact that we haven’t needed it: the nature of media is such that we could already create text and video on our own, and take advantage of the Internet to — at least in the case of Brownlee — deliver finishing blows to $230 million startups.
How many industries, though, are not media, in that they still need a team to implement the vision of one person? How many apps or services are there that haven’t been built, not because one person can’t imagine them or create them in their mind, but because they haven’t had the resources or team or coordination capabilities to actually ship them?
This gets at the vector through which AI impacts the world above and beyond cost savings in customer support, or whatever other obvious low-hanging fruit there may be: as the ability of large language models to understand and execute complex commands — with deterministic computing as needed — increases, so too does the potential power of the sovereign individual telling AI what to do. The Internet removed the necessity — and inherent defensibility — of complex cost structures for media; AI has the potential to do the same for a far greater host of industries.
It’s interesting to read that two years on, realize that I was writing about the latest paradigm shift well before it happened, and yet feel completely blown away by that paradigm shift all the same. That’s how big of a deal actually functional agents are: you can see them coming and yet still be amazed when they arrive — and, as one must say with everything related to AI, in a form that is the worst they will ever be.
It’s the implications on agency, however, that are the most profound: yes, you need agency to use agents, and yes, the number of people who will have that agency are probably far fewer than those who might use a chatbot. Of course you can make the (almost certainly accurate) case that chatbots will become agent managers in their own right, but the more critical observation is that by abstracting humans away from direct model management any one single human can control multiple agents.
What this means in terms of compute — and by extension, economic impact — is that it actually won’t require that many people with agency to drastically increase the amount of compute that is actively utilized to create products with meaningful economic impact. In other words, the rise of agents doesn’t just mean a dramatic increase in compute, but also a narrowing of the need for widescale adoption by humans for that demand to manifest. Yes, AI still needs agency; it just doesn’t need agency from that many people for its impact to be profound.
Enterprise Economic Imperatives
Apple-focused media, in the wake of the recent MacBook Neo launch, latched onto comments from Asus CFO Nick Wu on the company’s recent earnings call describing the $599 computer as “a shock to the entire market”; equally interesting, however, was how Wu sought to downplay the Neo’s potential effects on that market:
Actually, we heard about the MacBook Neo shipments coming online back in the second half of last year. So we made some internal preparations. But after the product officially released, we found the specs to have some limitations. For example, the memory is not upgradable, and it only has 8 gigabytes of memory. So this may limit certain applications. So I think when Apple positioned the product, it’s probably focused more on content consumption. This differs somewhat from mainstream notebook usage scenarios because in that case, the Neo feels more like a tablet because tablets are mostly for content consumption.
This feels like a bit of a cop-out, given just how capable the Neo’s processor is, and how well Mac OS operates on 8GB of RAM, thanks in part to Apple’s deep integration of hardware and software; at the same time, Wu is tapping into something that is true, which is that most consumers mostly do just want to consume content (which, I would add, means he should be more worried about the Neo, not less). This is why your favorite productivity application always ends up pivoting to the enterprise: it is companies who are willing to pay for productivity, because they are the ones actually paying for the workers who they want to be more productive.
It’s reasonable to expect this to apply to AI as well: the most compelling consumer applications of AI, at least in the near term, are Google and Meta’s advertising businesses, which sit alongside content. By the same token, it was always unrealistic for OpenAI to think that it could convert more than a small percentage of consumers into subscribers; that’s both why an ad model is essential, and also why that won’t be enough to pay the bills. It’s definitely the case that most people don’t want to pay for AI; it remains to be seen if they want to use it enough to make the ad model work.
That is another way of saying that Anthropic got it right by focusing almost entirely on the enterprise market: companies have a demonstrated willingness to pay for software that makes their employees more productive, and AI certainly fits the bill in that regard. What makes enterprise executives truly salivate, however, is the prospect of AI not simply eliminating jobs, but doing so precisely because that makes the company as a whole more productive.
It’s always been the case, even in large companies, that a relatively small number of people actually move the needle and drive the company forward in meaningful ways. That drive, however, has been filtered through a huge apparatus, filled with humans, who accelerate the effort in some vectors, and retard it in others. That apparatus makes broad impact possible, but it carries massive coordination costs.
Agents, however, will tilt much more heavily towards pure acceleration, making those drivers of value much more impactful. I’m sympathetic to the argument that the best companies will want to use AI to do more, not simply save money; the reality of large organizations, however, is that the positive impact of AI will not be in eliminating jobs, but rather replacing hard-to-manage-and-motivate human cogs in the organizational machine with agents that not only do what they are told but do so tirelessly and continuously until the job is done.
This only makes the argument that we are not in a bubble that much more compelling:
- First, all of the weaknesses of LLMs are being addressed by exponential increases in compute.
- Second, the number of people who need to wield AI effectively for demand to skyrocket is decreasing.
- Third, the economic returns from using agents aren’t just impactful on the bottom line, but the top line as well.
In this context, is it any wonder that every single hyperscaler says that demand for compute exceeds supply, and that every single hyperscaler is, in the face of stock market skepticism, announcing capex plans that blow away expectations?
This is also why the impending wave of layoffs that are going to be credited to AI shouldn’t be completely dismissed as a useful cover for correcting over-hiring decisions in the COVID era, or right-sizing compensation structures in the wake of multiple contractions. That is all true!
At the same time, it’s worth considering that companies become bloated because that has long been the only way to scale, and it’s hard to know at what point the diminishing returns that come from the drag of coordination costs and a sprawling workforce outweigh the benefits of the marginal employee; you only find that point when you have blown past it, and it’s hard to go backwards.
AI, however, not only gives the aforementioned excuse to undo that bloat, but also moves the “rightsize” point significantly towards a much smaller workforce. More and more companies are not simply going to wonder if they hired too much for a pre-AI world, but also if they hired too much for a post-AI world; the most forward-looking and future-proof approach will likely be to cut more rather than less, with the hope that those who remain have no choice but to rebuild scale with agents. After all, if they don’t, dramatically smaller competitors built with AI from the beginning will soon be nipping at their heels with both smaller cost structures and more capabilities that will structurally increase over time.
There is a good chance this is going to get ugly; I’m not advocating for this outcome, rather analyzing why it is probably going to happen. The economic imperatives are going to be impossible to resist, and will fuel demand for even more compute over time, further supporting the case that this is no bubble.
Agents and the AI Value Chain
Another important bubble question is about the sky-high valuations of Anthropic and OpenAI: sure, maybe all of this stuff is real, but if models are a commodity, is there any profit to be made? Horace Dediu raises these questions at Asymco and wonders if Apple is executing The Most Brilliant Move in Corporate History:
Here is where Apple’s bet becomes genius. AI models are commoditizing faster than anyone predicted. Software and hardware both have tendencies to commodify. Protections exist but they have to do with integration and distribution. DeepSeek built a model for $6 million that matches systems costing $100 million. Open source models now power 80% of startups seeking VC funding. The moat these companies are spending hundreds of billions to build is evaporating.
Apple understood this before anyone else. It didn’t build its own AI model, it licensed Google’s Gemini for about $1 billion a year. Why spend $100 billion building a factory when outsourcing costs a billion? And if a better model appears next year, Apple just switches vendors…Apple didn’t miss the AI revolution. It just bet that the winners won’t be the ones who build the infrastructure. They’ll be the ones who own the customer and no one else on Earth owns the best customers.
I think that nearly all of these assertions were defensible during the first LLM paradigm. It didn’t take long for multiple base models to be more than good enough for what most people use LLMs for, like, say, cooking or basic medical advice, or as a therapist or companion. Moreover, it was reasonable to expect that models of this quality would soon be able to run locally; I made the case that this was Apple’s opportunity myself back when their own models — which they absolutely did try to build, contra Dediu — failed to ship.
The reasoning paradigm, however, blew a significant hole in the local inference case. Not only do reasoning models require fast compute, given the number of tokens generated, but they also need exponentially more memory to accommodate much larger context windows, which is the biggest limitation of local models. Apple makes incredible chips with a compelling unified memory architecture that makes basic inference more plausible for their devices than anyone else; there is also no scenario where capable reasoning models that are remotely competitive with cloud-based models are running locally in the foreseeable future.
It is agents, however, that may strike the fatal blow to Dediu’s argument. Specifically, I noted above that what made Opus 4.5 compelling was not the model release itself, but changes to the Claude Code harness that made it suddenly dramatically more useful. What this means is that model performance isn’t the only thing that matters: the integration between model and harness is where true agent differentiation is found.
This is a very big deal when it comes to figuring out the future structure of the AI industry and where profits will flow, because profits flow away from modular parts of the value chain — which are commoditized — and flow towards integrated parts of the value chain, which are differentiated. Apple is of course the ultimate example of this: its hardware is not commoditized because it is integrated with their software, which is why Apple can charge sustainably higher prices and capture nearly the entirety of the PC and smartphone sector profits.
It follows, then, that if agents require integration between model and harness, that the companies building that integration — specifically Anthropic and OpenAI (Gemini is a strong model, but Google hasn’t yet shipped a compelling harness) — are actually poised to be significantly more profitable than it might have seemed as recently as late last year. And, by the same token, companies who were betting on model commoditization may struggle to deliver competitive products.
The canary in the coal mine in this regard is Microsoft. Microsoft once fancied itself as an integrated AI provider, bragging on earnings call about how its deep integration with OpenAI would mean sustainably differentiated infrastructure; a month later OpenAI nearly imploded and Microsoft pivoted, talking increasingly about models as commodities and a Core AI strategy that entailed building infrastructure around models that themselves would be interchangeable and abstracted away from Microsoft’s customers.
Fast forward to last week, however, when Microsoft revealed how they will handle the potential business impact of AI reducing seats, which is a bit of a problem for their seat-based business model: the company is going to bundle AI into a new higher-tiered enterprise offering, E7, which is going to cost twice as much — $99 per seat per month — as the formerly top-of-the-line E5. That’s a big increase, which Microsoft needs to justify with AI that actually makes those seats more productive, and the product they launched with the new bundle was Copilot Cowork.
If the “Cowork” name sounds familiar, it’s because this is basically the enterprise version of Claude Cowork, a GUI-ified version of Claude Code that the company released earlier this year. There are important differences with the Microsoft version, including the fact that the latter runs in the cloud and is grounded in your organizational data, with all of the permission and access policies that go with it. What is crucial, however, is that Copilot Cowork — unlike the Copilot chatbot — is not model agnostic: Cowork is an agent, which means it needs both a model and a harness, and those are two integrated pieces, not modular components.
The implications of this are significant: Microsoft is admitting, at least for now, that delivering a truly compelling agentic product that enterprises are willing to pay for means abandoning their stated goal of being model agnostic; that, by extension, raises the possibility that models are not and will not be commodities, because agents require more than models.
This certainly raises questions about Apple’s decision to merely license Gemini and build a harness themselves in the form of new Siri. Microsoft decided that they couldn’t deliver a compelling product by going that route; what has Apple done to inspire faith that they can do a better job? If anything, the company’s saving grace is the point that Dediu ended with: consumers may simply not care that much about agents, in which case Apple will be fine with good enough, even as Microsoft, with enterprise customers who do care, realizes it needs to share more margin than it might want to with Anthropic.
What matters in terms of this Article, however, is that if agents are making Anthropic and OpenAI the point of integration in the value chain, then the bubble argument that these companies are overvalued, or that the massive investments other companies are making on their behalf in data centers is unwarranted, may not be correct.
I must, in the end, address my opening parenthetical: I’ve long maintained that there is no need to be worried about a bubble as long as everyone is worried about a bubble; it’s the moment when caution is flung to the wind and assurances are made that this is definitely not a bubble that we might actually be in one. And, well, I think the rise of agents means we are not in a bubble. The capex is warranted, and Anthropic and OpenAI look more durable than ever. If my declaring there is no bubble means there is one, then so be it!




