Why your tokens cost more in some languages

The same text, a different cost

Language model APIs bill per token, the fragments of text a model splits your input into before reasoning. Here's the surprising part: to express the exact same idea, the number of tokens changes depending on the language. English is usually the most compact. Many other languages, especially those that don't use the Latin alphabet, get split into far more pieces.

The direct consequence: since the bill depends on the token count, two people asking the same question in two different languages do not pay the same price.

What the research says

The phenomenon is documented in peer-reviewed work.

A second study, "Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models" (Ahia, Kumar, Gonen, Kasai, Mortensen, Smith and Tsvetkov, 2023), measured the cost and utility of OpenAI's API across 22 typologically diverse languages. Its conclusion is blunt: speakers of a large number of these languages are overcharged while getting poorer results, and they are often populations for whom these services are already the least affordable.

Why the gap exists

Models split text with a tokenizer trained on large corpora that are overwhelmingly English. So the tokenizer learns efficient fragments for English: common words fit in a single token.

For a less-represented language, or one written in a different system (ideograms, non-Latin scripts), the tokenizer has learned no shortcuts. It falls back on smaller units, sometimes letter by letter or byte by byte. The same meaning then takes many more tokens.

An analogy: picture a dictionary of abbreviations designed for English. English words have their short code. Words from another language, absent from the dictionary, have to be spelled out in full. The message is identical, but it takes up far more space.

Three concrete consequences

Petrov and co-authors explicitly identify three effects of this imbalance.

Cost

More tokens for the same content means a higher bill, both on input and output, since pricing is per token.

Latency

The model processes and generates token by token. More fragmented text takes longer to read and produce, so responses are slower.

Context window

The context window is measured in tokens. A token-hungry language "fills" the window faster: you can give the model less context for the same budget.

What about Claude Code?

The figures above come from studies on OpenAI's tokenizers and on multilingual tokenizers. But the mechanism is general: any commercial LLM billed per token is affected, including Claude, which has its own tokenizer and also bills input and output tokens.

In practice, if you use Claude Code intensively and in multiple languages, keep in mind that the token volume, and therefore the budget, is not the same depending on the language of your prompts, your files, and the expected answers.

What you can do

Measure before assuming. Anthropic's API exposes a token count (count_tokens) that lets you check the real cost of a prompt in a given language, instead of guessing.
Pick the language to fit the stakes. For large, repetitive tasks where language barely matters (technical instructions, internal directives), writing in English can lower consumption. For content meant for humans, quality and accuracy come first: don't trade clarity for a few tokens.
Tend to the context. Since a token-hungry language fills the window faster, a concise CLAUDE.md and well-targeted prompts matter even more in those languages.

Next steps

Real costs of Claude Code: understand what gets billed, plan by plan.
Context management: get the most out of a limited window.
Sources: Petrov et al., NeurIPS 2023 (ouvre un nouvel onglet) and Ahia et al., 2023 (ouvre un nouvel onglet).