This article reflects MLNavigator Research Group work. Deployment lives in AdapterOS.


Token Billing Disputes Are Real Money Now

December 20, 2025

Large enterprises running inference workloads at scale report a consistent problem: the token counts they estimate do not match the token counts they are billed for. The variance ranges from 5% to 15%, depending on the provider, the caching strategy, and the complexity of the prompt structure.

At low volumes, this variance is a rounding error. At scale, it is a budget line item.

The Problem Is Financial, Not Technical

A 10% variance on a $2 million monthly inference bill is $200,000. Over a fiscal year, that is $2.4 million in unexplained cost. Finance teams cannot reconcile this against internal usage logs. Audit trails do not close. Budget forecasts drift from actuals in ways that cannot be attributed to specific teams, products, or workloads.
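
The arithmetic above can be sketched directly (the bill size and variance rate are the article's illustrative figures, not measured data):

```python
# Illustrative figures from the article: a 10% unexplained variance
# on a $2M monthly inference bill, compounded over a fiscal year.
MONTHLY_BILL_USD = 2_000_000
VARIANCE_RATE = 0.10

monthly_gap = MONTHLY_BILL_USD * VARIANCE_RATE
annual_gap = monthly_gap * 12

print(f"Monthly unexplained cost: ${monthly_gap:,.0f}")  # $200,000
print(f"Annual unexplained cost:  ${annual_gap:,.0f}")   # $2,400,000
```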

This is not a research problem. It is a procurement and audit problem. The tokens were consumed. The invoice arrived. The internal accounting does not match the external charge.

The response from most organizations has been to treat AI inference as an uncontrollable cost center—something to be monitored but not governed. This is not sustainable as inference spend grows from experimental budgets into operating expense.

Caching Increases Opacity

Caching is often presented as a cost optimization. In practice, it introduces accounting complexity that few organizations are prepared to handle.

When a response is served from cache, the provider may bill zero tokens, partial tokens, or full tokens depending on the caching policy and billing model. The user cannot verify which case applied. The cache hit rate is not reported at the granularity needed for reconciliation. The savings are estimated, not measured.

Worse, caching breaks attribution. A request that would have consumed 4,000 tokens on Tuesday consumes 200 tokens on Wednesday because a similar request populated the cache. The internal system logged the same input. The invoice reflects different consumption. The delta cannot be traced to a specific decision or policy.

Finance teams are asked to trust that caching reduced costs. They cannot verify it. They cannot audit it. They cannot explain it to a regulator or an internal review board.
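
The Tuesday/Wednesday scenario above can be sketched as a two-record comparison. The record shapes and field names here are hypothetical; the point is that neither record contains anything that explains the delta:

```python
# Hypothetical records: the internal log captured the same input on
# both days, but the invoice reflects different consumption because
# Wednesday's request was partly served from cache.
internal_log = [
    {"day": "Tue", "request_id": "r1", "logged_tokens": 4000},
    {"day": "Wed", "request_id": "r2", "logged_tokens": 4000},  # same input
]
invoice = {"r1": 4000, "r2": 200}

for rec in internal_log:
    billed = invoice[rec["request_id"]]
    delta = rec["logged_tokens"] - billed
    if delta:
        # No cache-hit flag, no policy reference, no metering event:
        # the delta cannot be traced to a specific decision.
        print(f'{rec["request_id"]}: logged {rec["logged_tokens"]}, '
              f'billed {billed}, unattributed delta {delta}')
```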

Attribution and Reconciliation Fail Together

Token billing disputes arise when three records do not agree: the internal usage log, the provider's metering, and the invoice.

Internal logs capture what was sent. Provider metering captures what was processed. The invoice captures what was charged. These three numbers should match. In practice, they diverge.

The divergence has multiple causes. Prompt preprocessing by the provider may add or remove tokens. System prompts may be metered differently than user prompts. Caching may reduce metered tokens without reducing logged tokens. Retry logic may double-count failed requests. Batching may aggregate requests in ways that obscure per-request attribution.
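
A minimal three-way reconciliation sketch, assuming each of the three systems can export per-request token counts (the field names, tolerance parameter, and sample values are hypothetical):

```python
def reconcile(internal_log, provider_metering, invoice, tolerance=0):
    """Flag requests where the three records do not agree."""
    disputes = []
    for req_id, logged in internal_log.items():
        metered = provider_metering.get(req_id)
        charged = invoice.get(req_id)
        counts = {"logged": logged, "metered": metered, "charged": charged}
        if metered is None or charged is None:
            disputes.append((req_id, "missing record", counts))
        elif max(counts.values()) - min(counts.values()) > tolerance:
            disputes.append((req_id, "count mismatch", counts))
    return disputes

disputes = reconcile(
    internal_log={"a": 1200, "b": 4000, "c": 950},
    provider_metering={"a": 1200, "b": 3100, "c": 950},  # caching? preprocessing?
    invoice={"a": 1200, "b": 3100},                      # "c" never invoiced
)
```

Even this toy version surfaces the article's point: the mismatch on "b" is visible but unexplainable, because none of the three records says *why* the counts diverged.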

None of these causes are errors. They are implementation details that were not designed with auditability in mind. The result is that token billing is not auditable at the resolution required by finance and compliance functions.

Verifiable Token Accounting

The corrective mechanism is verifiable token accounting: a system where every token consumed can be traced from request to response to invoice.

This requires three properties.

Determinism. The same input must produce the same token count every time. If a prompt is tokenized differently on different runs, reconciliation is impossible. The tokenization must be stable, versioned, and reproducible.
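
A sketch of what a stable, versioned count might look like. The whitespace tokenizer and the `TOKENIZER_VERSION` constant are placeholders standing in for a real tokenizer; the point is that the count is pinned to a version and an input fingerprint, so it can be reproduced later:

```python
import hashlib

TOKENIZER_VERSION = "v1.0.0"  # hypothetical; pin and record per request

def count_tokens(text: str) -> dict:
    tokens = text.split()  # placeholder for the real tokenizer
    return {
        "tokenizer_version": TOKENIZER_VERSION,
        "token_count": len(tokens),
        # Fingerprint of the exact input, so a later recount can
        # prove it tokenized the same bytes.
        "input_sha256": hashlib.sha256(text.encode()).hexdigest(),
    }

a = count_tokens("the same prompt, twice")
b = count_tokens("the same prompt, twice")
assert a == b  # identical input, identical count, every run
```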

Replayability. Any request must be replayable against the billing record. If the organization disputes a charge, it must be able to reconstruct the exact input, tokenization, and output that generated that charge. Without replayability, disputes are assertions without evidence.
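
One way to make a charge replayable is to store the billing record with everything needed to re-derive it. A sketch with hypothetical field names and a toy whitespace tokenizer:

```python
import hashlib

def make_billing_record(request_id, raw_input, tokenizer_version,
                        token_count, price_per_1k_usd):
    """Store everything needed to reconstruct the charge later."""
    return {
        "request_id": request_id,
        "input_sha256": hashlib.sha256(raw_input.encode()).hexdigest(),
        "tokenizer_version": tokenizer_version,
        "token_count": token_count,
        "price_per_1k_usd": price_per_1k_usd,
        "charge_usd": round(token_count / 1000 * price_per_1k_usd, 6),
    }

def replay_charge(record, raw_input, recount_fn):
    """Re-derive the charge from the stored inputs and compare."""
    fingerprint = hashlib.sha256(raw_input.encode()).hexdigest()
    assert fingerprint == record["input_sha256"], "input != stored fingerprint"
    recount = recount_fn(raw_input)
    expected = round(recount / 1000 * record["price_per_1k_usd"], 6)
    return {"recount": recount,
            "expected_charge_usd": expected,
            "matches_invoice": expected == record["charge_usd"]}

# Replay with the same (toy) tokenizer the record was made with.
toy_tokenizer = lambda text: len(text.split())
record = make_billing_record("r1", "summarize this report", "v1.0.0",
                             toy_tokenizer("summarize this report"), 0.50)
result = replay_charge(record, "summarize this report", toy_tokenizer)
```

If the replayed count disagrees with the invoiced charge, the organization has evidence rather than an assertion.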

Auditability. The full chain from request to billing must be inspectable. This includes the raw input, the tokenized form, the model routing, the output, and the metering event. Each step must produce a record that can be independently verified.
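
One way to make each step independently verifiable is to hash-link the records, so any later alteration breaks the chain. A sketch with hypothetical step names and payloads:

```python
import hashlib
import json

def chain_step(prev_hash, step, payload):
    """Append-only record that commits to the previous step's hash."""
    record = {"step": step, "prev_hash": prev_hash, "payload": payload}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    return record

def verify_chain(records):
    """Recompute every hash and link; any tampering breaks the chain."""
    prev = "genesis"
    for r in records:
        body = {k: r[k] for k in ("step", "prev_hash", "payload")}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if r["prev_hash"] != prev or r["hash"] != digest:
            return False
        prev = r["hash"]
    return True

# The five steps the article names: raw input, tokenized form,
# model routing, output, metering event (payloads are illustrative).
chain, prev = [], "genesis"
for step, payload in [
    ("raw_input", {"sha256": "d1c0ffee", "bytes": 2048}),
    ("tokenized", {"tokenizer_version": "v1.0.0", "count": 512}),
    ("routing",   {"model": "example-model"}),
    ("output",    {"completion_tokens": 128}),
    ("metering",  {"billed_tokens": 640}),
]:
    rec = chain_step(prev, step, payload)
    chain.append(rec)
    prev = rec["hash"]

assert verify_chain(chain)
```

An auditor holding only the records can recompute every hash without trusting the system that produced them.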

These properties are not features. They are requirements for treating inference as a governed cost center rather than an uncontrolled expense.

The Control Surface

Token accounting is a financial control surface. It determines what can be measured, what can be attributed, and what can be audited.

Organizations that lack verifiable token accounting operate with a gap between their internal view of AI costs and their external obligations. That gap is measured in dollars, and it compounds with scale.

Closing the gap requires infrastructure that was designed for accountability. The tokens must be countable. The counts must be reproducible. The records must be auditable.

This is not an optimization. It is a control.