AI Cost Governance: Architectural Choices to Regain Control of Your Spending

In the latest episode of Bitrock Tech Radio, we tackled the economic sustainability of Artificial Intelligence. We approached it not in the abstract terms of the public debate but in very concrete ones: how much the AI that companies are using really costs, who is covering the gap between market price and real cost, and what happens when that coverage shrinks.

In this article we revisit the main points, examining the current economic mechanism, the risks tied to the flat-rate model adopted by most companies, and proposing an alternative architectural approach that ensures visibility and control over AI-related costs.


The Token Economy: the economic mechanism of Generative AI

The fundamental unit of measurement in generative AI is the token: every input sent to a model and every response received is counted and billed based on this unit. One million tokens corresponds to roughly 750,000 words. Providers bill input and output tokens separately, with rates that vary significantly from model to model.
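The billing rule described above can be sketched in a few lines. This is a minimal illustration, not any provider's billing logic: the function name, the two rates, and the monthly volume are assumptions chosen for the example, and real price lists differ by model.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_usd_per_million: float,
                     output_usd_per_million: float) -> float:
    """Cost of one request when input and output tokens are billed at separate rates."""
    return (input_tokens * input_usd_per_million
            + output_tokens * output_usd_per_million) / 1_000_000


# Hypothetical rates in USD per million tokens, for illustration only.
INPUT_RATE = 2.50
OUTPUT_RATE = 10.00

# One request with 1,200 input tokens and 400 output tokens.
single = request_cost_usd(1_200, 400, INPUT_RATE, OUTPUT_RATE)

# Assumed volume: 50,000 requests of this size per month.
monthly = single * 50_000

print(f"per request: ${single:.4f}, per month: ${monthly:.2f}")
```

In this example the 400 output tokens cost more than the 1,200 input tokens, which is why the length of responses matters as much as the length of prompts when estimating spend.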

Over the past two years, the market has seen a historic collapse in these prices. GPT-4o, for example, used to cost $5 per million input tokens: today the rate has dropped to $2.50. At enterprise scale, the average cost per million tokens across the major providers fell from around $10 to $2.50 in a single year—a deflation so pronounced that some analysts have coined the term “LLMflation” to describe it.

The cause of this deflation, however, is not a structural improvement in the economics of producing the models. It is the result of market competition deliberately waged at the expense of profitability, with the goal of capturing position and building adoption. The distinction may seem irrelevant over a short time horizon, but it becomes decisive over the medium term for companies’ architectural choices.


Sustainability of the economic model and the risks of the flat-rate model

The financial data of the major AI providers is publicly available, even if rarely emphasized in official communications. The picture that emerges is uniform: LLM providers today operate at a structural loss. The cost of serving billions of inference requests per day systematically exceeds revenues, and the main players do not project break-even before the end of the decade. The prices charged to the market are kept artificially below the cost of production, a deliberate choice to build adoption that is financed by venture capital and the cross-subsidies of the hyperscalers.

To further complicate the picture, the shift toward agentic workflows (AI systems that operate autonomously on complex tasks, reason, invoke external tools, and self-correct) introduces a significant consumption multiplier. According to a Gartner estimate from March 2026, this type of architecture consumes between 5 and 30 times more tokens than a normal conversational interaction. The effect is already visible in market data: prices per token have fallen by 80% in a year, yet the average monthly AI spend of enterprises has grown by 36%. Unit price and total spend are moving in opposite directions. When the decline in prices comes to a halt, the consumption volume already built up will turn into a financial risk multiplier.
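The arithmetic behind those two figures can be made explicit. The sketch below assumes the simplest possible model, spend = unit price × token volume, and treats the 80% and 36% figures as if they described the same population of companies, which the source data does not guarantee. The function names and the three price scenarios are illustrative assumptions.

```python
def implied_volume_multiplier(price_change: float, spend_change: float) -> float:
    """Token-volume multiplier implied by relative changes in unit price and total spend.

    Assumes spend = unit_price * volume, so
    volume_after / volume_before = (1 + spend_change) / (1 + price_change).
    """
    if price_change <= -1.0:
        raise ValueError("price cannot fall by 100% or more")
    return (1.0 + spend_change) / (1.0 + price_change)


def spend_multiplier(volume_multiplier: float, price_change: float) -> float:
    """Spend multiplier for a given volume multiplier and relative price change."""
    return volume_multiplier * (1.0 + price_change)


# Figures quoted in the text: prices down 80%, spend up 36%.
past = implied_volume_multiplier(price_change=-0.80, spend_change=0.36)
print(f"implied volume growth: x{past:.1f}")

# Hypothetical next year: the same volume growth repeats under three price scenarios.
scenarios = [("prices fall 80% again", -0.80),
             ("prices stay flat", 0.0),
             ("prices rise 20%", 0.20)]
for label, price_change in scenarios:
    print(f"{label}: spend x{spend_multiplier(past, price_change):.2f}")
```

Under these assumptions the quoted figures imply roughly 6.8 times more tokens consumed in a year. As long as prices keep falling, that growth is largely absorbed; if prices only stay flat, the same growth passes through to the budget in full.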

In this context, the most common response to uncertainty over AI costs is to adopt flat-rate models and subscriptions: a solution that is effective in the short term but structurally fragile in the medium term. A subscription offers spending predictability, but it also creates an operational dependency that tends to consolidate over time. Integrations, automated workflows, and team skills built around a specific tool produce progressive lock-in, whose real cost emerges only when the provider changes the contractual terms on its own timing and priorities, regardless of customers’ needs. At that point the exit price (systematically underestimated at the adoption stage) can become a significant constraint precisely when cost pressure is at its highest.

In other words, the models that large organizations are adopting to manage costs in the pilot phase (flat subscriptions, standardized contracts, direct access to individual providers) reveal their limits as soon as volumes grow and use cases multiply.

The central problem is not the spending itself, but the lack of visibility and control over costs. Without adequate tools, AI spending remains difficult to allocate by business unit, by process, or by request type. It is not possible to compare the ROI of different tools, identify waste, or make optimization decisions based on real data. The result is a budget line that grows without the organization having full awareness of where and why.

On top of this comes concentration risk: many companies have built their AI operations around a limited number of providers, often just one, without having assessed the implications of a possible disruption – in price, in service, or in commercial strategy. In a market still evolving rapidly, a concentrated dependency on one or two providers, with no defined exit plan, is a position that is hard to defend when conditions change.


Regaining control over AI costs: Fortitude Group’s architectural response

Contract negotiation and provider diversification are tactics that make sense, but they do not solve the structural vulnerability. The approach that Fortitude Group proposes to its clients starts from a different level: an architectural choice rather than a commercial one.

The principle is to build a technical layer that sits between the applications and the AI models, independent of any single provider. This is the role of Radicalbit’s AI Gateway, which enables the optimization of AI costs and the management of spending through configurable limiting, caching, and throttling.
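To give an idea of what configurable limiting can look like at this layer, here is a minimal sketch of a per-business-unit spending cap. It is not Radicalbit's API: the class, its method names, and the budget figures are hypothetical, and a real gateway would also handle time windows, concurrency, and persistence.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SpendLimiter:
    """Rejects requests once a business unit exhausts its budget for the period."""
    budgets_usd: Dict[str, float]
    spent_usd: Dict[str, float] = field(default_factory=dict)

    def try_charge(self, unit: str, cost_usd: float) -> bool:
        """Record the cost and return True, or return False if it would exceed the budget."""
        budget = self.budgets_usd.get(unit, 0.0)  # unknown units get no budget
        spent = self.spent_usd.get(unit, 0.0)
        if spent + cost_usd > budget:
            return False
        self.spent_usd[unit] = spent + cost_usd
        return True

    def reset_period(self) -> None:
        """Start a new budget period, for example at the beginning of each month."""
        self.spent_usd.clear()


# Hypothetical budget: 1.00 USD per period for the "support" unit.
limiter = SpendLimiter(budgets_usd={"support": 1.00})
print(limiter.try_charge("support", 0.75))  # fits within the budget
print(limiter.try_charge("support", 0.50))  # would exceed the budget, so it is rejected
print(limiter.try_charge("support", 0.25))  # exactly reaches the budget
```

Because the check sits in a layer shared by every application, the cap applies regardless of which provider or model ends up serving the request.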

In terms of cost control, the Gateway acts across four dimensions.

  • Intelligent routing: routing each request to the most suitable model, based on expected quality, cost, and latency, makes it possible to reduce overall token spending by 40–70% for the same useful output. At enterprise volumes, the difference is measured in hundreds of thousands of euros per year.
  • Semantic caching: a significant share of the queries sent to models in corporate contexts is semantically identical or very similar to requests already processed, such as periodic reports, standard questions about internal documentation, and recurring support interactions. The Gateway identifies these redundancies and returns the already-computed response without generating new tokens. The principle is simple: don’t pay twice for the same answer.
  • Architectural portability: when the Gateway abstracts the models from the applications that use them, replacing a provider becomes a configuration operation, not a code rewrite. This operational independence pays off when more affordable models emerge, when a provider changes its prices, or when you want to experiment with new architectures without impacting the business logic. This function responds directly to the risk of lock-in: with an abstraction layer in place, the ability to migrate toward alternatives is no longer an unknown but a by-design feature of the system.
  • Visibility and spending governance: without a central control layer, AI spending is hard to quantify: it cannot be allocated by use case, by business unit, or by request type, nor can the ROI of each investment be measured precisely. The Gateway turns a cost item that is difficult to govern into structured, measurable spending, creating the conditions for strategic decisions grounded in real data.
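The four dimensions above can be illustrated together in one small sketch. It is a toy model under stated assumptions, not the Gateway's implementation: all class and field names are hypothetical, quality scores and prices are invented, tokens are approximated by word counts, and word overlap (Jaccard similarity) is used as a crude stand-in for the embedding-based matching that a real semantic cache would need.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ModelConfig:
    name: str                   # hypothetical model identifier
    quality: float              # assumed quality score in [0, 1]
    usd_per_million: float      # assumed blended price per million tokens
    call: Callable[[str], str]  # provider adapter: swapping providers changes only this entry


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap, a crude stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


@dataclass
class Gateway:
    models: List[ModelConfig]
    cache_threshold: float = 0.8
    cache: List[Tuple[str, str]] = field(default_factory=list)
    ledger: Dict[str, float] = field(default_factory=dict)  # USD spent per business unit

    def route(self, min_quality: float) -> ModelConfig:
        """Pick the cheapest model that meets the required quality."""
        eligible = [m for m in self.models if m.quality >= min_quality]
        if not eligible:
            raise ValueError("no model meets the requested quality")
        return min(eligible, key=lambda m: m.usd_per_million)

    def complete(self, prompt: str, unit: str, min_quality: float) -> str:
        # Cache lookup: a sufficiently similar prompt reuses the stored answer.
        # Simplification: the lookup ignores which model produced the answer.
        for cached_prompt, cached_answer in self.cache:
            if similarity(prompt, cached_prompt) >= self.cache_threshold:
                return cached_answer  # cache hit: no new tokens, no new cost
        model = self.route(min_quality)
        answer = model.call(prompt)
        tokens = len(prompt.split()) + len(answer.split())  # rough token estimate
        cost = tokens / 1_000_000 * model.usd_per_million
        self.ledger[unit] = self.ledger.get(unit, 0.0) + cost  # attribute spend to the unit
        self.cache.append((prompt, answer))
        return answer


# Two stub "providers" standing in for real model APIs.
gateway = Gateway(models=[
    ModelConfig("small-model", 0.6, 0.50, lambda p: "small answer to: " + p),
    ModelConfig("large-model", 0.9, 10.00, lambda p: "large answer to: " + p),
])
first = gateway.complete("summarize the weekly sales report", "finance", 0.5)
second = gateway.complete("Summarize the weekly sales report", "finance", 0.5)
print(first == second, len(gateway.cache), gateway.ledger)
```

In this sketch the second request differs only in capitalization, so it is served from the cache and adds nothing to the ledger. Replacing a provider means replacing one `ModelConfig` entry, while the applications calling `complete` stay unchanged.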

Conclusion

The picture described contains an element that may seem counterintuitive. Despite the structural fragilities of the current economic model, market conditions have rarely been as favorable for building AI adoption with real room for experimentation.

Prices are historically low, the market is fiercely competitive, and providers have a direct interest in acquiring customers. This creates the conditions to experiment, build architectures, and understand which use cases generate real value, at prices that are unlikely to stay this low. The recommendation is to accompany AI adoption from the outset with architectural choices that preserve operational autonomy.

At this stage, building structural dependencies on a single provider – drawn in by the simplicity of the subscription or the convenience of the preconfigured integration – means accumulating a risk that will materialize when prices stop falling.

AI cost control is neither a contractual problem nor a matter of internal policy. It is an architectural problem, and like all architectural problems, the cost of addressing it increases in proportion to the time spent ignoring it.

At Fortitude Group we support organizations on this journey. If you’re starting to wonder how much control you really have over your AI spending, now is the right time to talk about it. Get in touch.


Main Author: Michele Ridi, Chief Strategy & Presale Officer @ Fortitude Group
