In Brief
AI vendors charge by the token, the unit of text going into and out of a model, and those costs are spiking fast. Royal Bank of Canada's token usage jumped 500% in six months, and roughly 300 companies addressed AI costs on recent earnings calls.
The bigger problem is that almost no one can predict what they will be charged.
What Happened
Companies are scrambling to control the soaring cost of using AI, and the math behind those bills is far messier than most leaders expect. AI vendors charge by the token, the small unit of text that goes into and comes out of a model. For a model such as Anthropic's Claude, a token is roughly 3.5 characters, so about 750 words equals roughly 1,000 tokens. Everything you send and everything the model writes back gets counted and billed, so the more a company leans on AI, the faster the meter runs.
A recent report on the industry scramble laid out how fast this is moving. Royal Bank of Canada's CEO said its token usage jumped 500% in six months. Cisco's CEO called its own usage "pretty, pretty crazy." Roughly 300 companies brought up AI token costs on earnings calls in April and May, up from just 93 a year earlier. Some, including Meta, Uber, and Salesforce, have started capping how much their employees can use.
The honest truth is that almost nobody, including the experts, has fully figured out how to think about this yet. SmarterX founder and CEO Paul Roetzer discussed why the pricing is so confusing, and why it is built for the wrong audience, on Episode 221 of The Artificial Intelligence Show.
The Key Numbers
500% - RBC's token usage jump in six months
300 - Companies citing AI token costs on April-May earnings calls, up from 93 a year ago
2-5x - How many times more output tokens (your results) cost than input tokens (your prompts)
~$60/day - Cost of a support bot rereading one 20,000-token knowledge base
120 - Daily queries each Gemini Enterprise license adds to a shared pool
Why the Pricing Was Built for Developers
There are two kinds of tokens, and they are not priced the same. Input tokens are everything you send the model. Output tokens are everything it generates back. Output costs significantly more, typically two to five times as much, because the model reads your input in one pass but writes its answer one token at a time, predicting each next word. Hand Claude a 1,500-word brief and ask for a 600-word summary, and a single request costs less than two cents. Cheap, until you remember how many of those requests an organization typically runs.
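The arithmetic behind that "less than two cents" figure can be sketched in a few lines. This is a rough estimate, not a vendor calculator. It uses the conversion of about 750 words per 1,000 tokens given earlier. The per-token prices are assumptions chosen for illustration, with output priced at five times input, so check your vendor's current rate card before relying on them. The function names are hypothetical.

```python
# Conversion from the article: about 750 words per 1,000 tokens.
TOKENS_PER_WORD = 1000 / 750

# ASSUMED prices in USD per million tokens (illustrative only).
# Output is priced at 5x input, the top of the 2-5x range.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def words_to_tokens(words: int) -> int:
    """Approximate token count for a given word count."""
    return round(words * TOKENS_PER_WORD)


def request_cost(input_words: int, output_words: int) -> float:
    """Estimated USD cost of one request, billing input and output separately."""
    input_tokens = words_to_tokens(input_words)
    output_tokens = words_to_tokens(output_words)
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


# The example from the text: a 1,500-word brief and a 600-word summary.
cost = request_cost(1500, 600)
print(f"One request: ${cost:.4f}")
print(f"100,000 requests: ${cost * 100_000:,.2f}")
```

Under these assumed prices the single request comes to a little under two cents, and most of that is the shorter output, not the longer input. Multiply by a hypothetical 100,000 requests and the same task costs well over a thousand dollars.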
Agentic AI is what blew up the bills. Tools such as Claude Code and Codex are agentic, meaning they take many steps on their own to finish a task. The catch is that on every new step, the entire prior conversation gets resent as input. If the model is on phase 10 of a task, it rereads phases one through nine first. A customer support assistant with a 20,000-token knowledge base that gets resent on every request, fielding 1,000 questions a day, burns about 20 million tokens daily, or roughly $60 a day, just to reread the same documents over and over.
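Both effects are easy to model. The sketch below is a simplification under stated assumptions: it treats each agent step as adding a fixed chunk of tokens to the conversation, and it uses an input price of $3 per million tokens, which is inferred from the article's own figures ($60 for 20 million tokens) and is not a quoted vendor rate. The function names are hypothetical.

```python
def cumulative_input_tokens(step_tokens: list[int]) -> int:
    """Total input tokens billed when every step resends all prior steps."""
    total_billed = 0
    context = 0
    for tokens in step_tokens:
        context += tokens        # the conversation grows by this step's content
        total_billed += context  # the whole conversation so far is sent as input
    return total_billed


def daily_reread_cost(kb_tokens: int, requests_per_day: int,
                      input_price_per_mtok: float) -> tuple[int, float]:
    """Tokens and USD spent per day resending the same knowledge base."""
    daily_tokens = kb_tokens * requests_per_day
    return daily_tokens, daily_tokens * input_price_per_mtok / 1_000_000


# A 10-step agent task where each step adds an ASSUMED 2,000 tokens.
steps = [2_000] * 10
print(sum(steps), "tokens of new content")
print(cumulative_input_tokens(steps), "input tokens actually billed")

# The support bot from the text: 20,000-token knowledge base, 1,000 questions a day.
tokens, dollars = daily_reread_cost(20_000, 1_000, input_price_per_mtok=3.00)
print(f"{tokens:,} tokens/day, about ${dollars:.0f}/day")
```

In this toy model, 20,000 tokens of new content turn into 110,000 billed input tokens over ten steps, because the billed total grows roughly with the square of the number of steps, not linearly.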
The fixes exist, but most are band-aids. Prompt caching, where vendors store repeated chunks of text and charge a fraction for them, is a possible fix, though every vendor handles it differently. Right-sizing to cheaper models, capping output length, and trimming context all help at the margins. Only model routing, automatically sending each request to the cheapest model that can handle it, looks like a real long-term fix.
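Two of these fixes can be sketched in code: prompt caching and model routing. Everything here is illustrative. The model tiers, their prices, the capability scores, the keyword heuristic, and the cached-read discount of 10% are all assumptions made up for this example, not any vendor's actual catalog or terms. Production routers typically rely on a trained classifier, not keyword matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                    # hypothetical tier name
    input_price_per_mtok: float  # ASSUMED USD price per million input tokens
    capability: int              # hypothetical score: 1 (basic) to 3 (strongest)


TIERS = [
    ModelTier("small", 0.25, 1),
    ModelTier("medium", 3.00, 2),
    ModelTier("large", 15.00, 3),
]


def estimate_difficulty(prompt: str) -> int:
    """Toy heuristic standing in for a real difficulty classifier."""
    text = prompt.lower()
    if any(word in text for word in ("strategy", "analyze", "architecture")):
        return 3
    if len(prompt.split()) > 50:
        return 2
    return 1


def route(prompt: str) -> ModelTier:
    """Pick the cheapest tier whose capability covers the estimated difficulty."""
    needed = estimate_difficulty(prompt)
    capable = [tier for tier in TIERS if tier.capability >= needed]
    return min(capable, key=lambda tier: tier.input_price_per_mtok)


def daily_cost_with_caching(kb_tokens: int, requests_per_day: int,
                            input_price_per_mtok: float,
                            cached_price_fraction: float) -> float:
    """Daily USD cost if the first request pays full price and the rest hit the cache.

    Simplification: ignores cache-write surcharges and cache expiry.
    """
    full = kb_tokens * input_price_per_mtok / 1_000_000
    cached = full * cached_price_fraction
    return full + cached * (requests_per_day - 1)


print(route("What are your opening hours?").name)
print(route("Analyze our pricing strategy for next year").name)
print(f"${daily_cost_with_caching(20_000, 1_000, 3.00, 0.10):.2f} per day with caching")
```

With the assumed 10% cached rate, the support bot's knowledge-base cost falls from about $60 a day to about $6. Routing helps in a different way: simple questions never reach the expensive tier at all.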
But there is a deeper issue. Most companies do not pay per token at all. They buy per-seat licenses, a flat monthly fee per user, and the usage limits behind those seats are opaque. Gemini Enterprise pools quotas across the whole organization, where each license adds something like 120 queries a day to a shared pool, and simply stops working when you hit the limit. A Claude team plan gives each person hidden individual limits. A Claude Enterprise plan pools them. Every provider is different, and there can be eight different ways it works inside a single tool.
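A small model shows why a pooled quota is hard to plan around. This is a sketch of the general idea only, not how any vendor actually implements its limits. The class name is hypothetical, and the 120 queries per license is the approximate figure cited above.

```python
class PooledQuota:
    """Shared daily query pool: every license adds to one common balance."""

    def __init__(self, licenses: int, queries_per_license: int = 120):
        self.remaining = licenses * queries_per_license

    def try_query(self, count: int = 1) -> bool:
        """Spend `count` queries if available; refuse once the pool runs short."""
        if count > self.remaining:
            return False
        self.remaining -= count
        return True


# Ten seats share 1,200 queries a day.
pool = PooledQuota(licenses=10)

# One heavy user's agent runs all morning and uses 1,150 of them.
pool.try_query(1_150)
print(pool.remaining, "queries left for everyone else")

# A colleague's batch of 100 queries is refused for the rest of the day.
print(pool.try_query(100))
```

No individual user can see that the pool is nearly empty, so one person's heavy morning becomes everyone's afternoon outage.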
That opacity is what frustrates Roetzer most. New routing systems make it worse, he says. "All my model choice is gone. When I want to do my deep thinking, I don't even know which model I'm getting. I'm assuming they're giving me the sh** model because it costs them less per tokens." His larger objection is who this was designed for.
"All of this pricing is built for developers. None of these models make sense to the average knowledge worker. They are built for researchers and developers who understand how to work with APIs."
—Paul Roetzer, founder and CEO, SmarterX, Episode 221 of The Artificial Intelligence Show
The simplest answer, Roetzer argues, is probably a flat per-seat license that lets heavy users subsidize light ones. But the labs cannot just raise the price. "If you do that to a major enterprise that has yet to do AI literacy training," he says, "they're gonna be like, 'No way in hell we're getting $500. We can't justify that.'" The buyers do not yet understand the return well enough to swallow a bigger bill.
SmarterX Take
The size of the bills is only half the problem. The deeper one is the lack of predictability. A business cannot plan a major AI rollout when it has no reliable way to estimate what that rollout will cost. One-off agents built to do a defined task are somewhat measurable. But the open-ended, high-value knowledge work, such as a three-hour strategy conversation with a model, is exactly the work nobody can price in advance. No one is going to ration their thinking by the token.
This came up internally at SmarterX, where hitting Claude usage limits midmorning could stall the building of an online course for hours. The team researched the problem for weeks and found it crazier than expected. For AI leaders, the lesson is to treat cost predictability as a prerequisite, not an afterthought. Until pricing makes sense to the people actually running the agents, ambitious AI transformation plans will keep running into a wall.
What to Watch
The endgame the labs are quietly building toward is constant, around-the-clock AI usage. Roetzer describes the vision plainly: a 300-person marketing team run by 15 to 30 people orchestrating agents that never sleep, running 24/7 loops that tweak campaigns, update creative, and adjust media buys on their own. "You are literally just burning tokens all day long," he says. He sees that outcome arriving in two to three years across every function, which makes the token math even harder to imagine.
Efficiency gains are the wild card that could defuse all of it. If a model three years from now runs a thousand times more efficiently than today's, the cost of those 24/7 loops could fall significantly. Until then, expect more companies to cap usage, more confusion across seat-license versus pay-as-you-go pricing, and more pressure on vendors to make their bills legible to non-technical buyers. As Roetzer put it, "We'll know we've reached AGI when this isn't a problem anymore. When we have a model smart enough to solve how to price models for business usage."
Why the Token Costs Outpace AI Readiness
Only 25% of organizations have reached the Scaling phase of AI adoption, according to the 2026 State of AI for Business Report. The largest share, 47%, is still piloting, and 28% are still in the earliest Understanding phase. Even among professionals personally working in Integration or Transformation, 62% say their organization has not yet reached Scaling.
That gap is exactly why the bills feel so alarming. Companies are ramping AI spend faster than they are building the operational foundation to scale it, so costs look huge while the return stays fuzzy. The full report, built on responses from 2,100+ professionals across roles, functions, and industries, maps where organizations actually stand on adoption, governance, training, and tooling.
Mike Kaput
Mike Kaput is the Chief Content Officer at SmarterX and a leading voice on the application of AI in business. He is the co-author of Marketing Artificial Intelligence and co-host of The Artificial Intelligence Show podcast.

