March 21, 2026

"Inside AI — The Untold Story" Series - Article 1: What Is a Token, Anyway?

Author: Claude AI, under the supervision, prompting, and editing of HocTro


Opening

Think back to the first time you typed something into ChatGPT or Claude. You typed your message, hit Enter, and waited. Half a second later, words started appearing — in little clusters, one after another — like someone was typing back at you from the other side of the screen. It felt strange. Does this machine actually understand what I wrote?

Here's the thing: AI doesn't read your sentence the way you read it. It doesn't scan each letter, encounter "H," then "e," then "l," then "l," then "o" and conclude "Hello." It reads something else entirely — something called a token.

Understanding what a token is means understanding how AI "sees" the world. And once you get it, a lot of things suddenly make sense: why AI sometimes behaves oddly with proper names, why it "forgets" things you said at the start of a long conversation, and why when developers use AI through code, they're billed by token count rather than word count.


You Thought AI Read Words?

My intuition when I first used AI was something like this: I type a sentence, AI reads that sentence the way a person would, then AI replies. Simple as that.

But that's not how it works. AI doesn't "read" in the human sense — it processes numbers. Every piece of language, every sentence you type in, gets converted into numbers before AI does anything at all with it. That conversion process — from text to numbers — is called tokenization.

It sounds abstract, but think of it like an old encoding system: instead of sending words directly, both sides agree on a shared codebook, convert words into numbers, transmit the numbers, and decode on the other end. AI does the same thing — just vastly more complex and millions of times faster.


So What Exactly Is a Token?

A token isn't a word. It isn't a letter either. It's a chunk of text — and that chunk might be:

  • A complete word: "cat" can be 1 token
  • Part of a word: "unbelievable" often breaks into "un" + "believ" + "able" = 3 tokens
  • A space plus a word: leading spaces are typically bundled with the following word
  • A punctuation mark: commas, periods, parentheses — each usually its own token

AI is built with what's called a tokenizer — essentially a massive lookup table listing tens of thousands of common text chunks and the numbers assigned to them. When you type something in, the tokenizer runs through your text, chops it into chunks according to the table, and converts each chunk into a number.

A simple example: the sentence "I love eating pho" might become:

  • "I" → 40
  • " love" → 3812
  • " eating" → 7194
  • " pho" → 45203

Those four numbers are what AI actually processes. Not the words.
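The lookup-table idea above can be sketched in a few lines of Python. This is a toy, not a real tokenizer: the vocabulary holds only the four chunks from the example, and the IDs (40, 3812, …) are the made-up numbers used above. Real tokenizers use tens of thousands of entries learned from data, plus merge rules, but the core move — greedily matching the longest known chunk — looks like this:

```python
# Toy tokenizer: greedy longest-match against a tiny lookup table.
# The vocabulary and IDs are invented for illustration only.
VOCAB = {
    "I": 40,
    " love": 3812,
    " eating": 7194,
    " pho": 45203,
}

def tokenize(text: str) -> list[int]:
    """Split text into token IDs by always taking the longest matching chunk."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, shrinking until something matches.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i:]!r}")
    return ids

print(tokenize("I love eating pho"))  # [40, 3812, 7194, 45203]
```

Note that the spaces are part of the tokens (" love", not "love") — exactly the "space plus a word" behavior described earlier.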


Examples to Make It Click

You can try this yourself — search for "OpenAI tokenizer" to find a free tool where you can type any sentence and see exactly how it gets divided, with different colors for each token.

A few interesting examples:

English:

  • "Hello" → 1 token
  • "ChatGPT" → 2 tokens: "Chat" + "GPT"
  • "unbelievable" → 3 tokens: "un" + "believ" + "able"
  • "1234567890" → as many as 10 tokens, depending on the tokenizer — older ones split every digit into its own token, while newer ones group digits

Vietnamese:

  • "phở" → 2–3 tokens, because Vietnamese tone marks appear less often in tokenizer training data
  • "Xin chào" → 3–4 tokens
  • "Tôi yêu Việt Nam" → roughly 6–8 tokens, while "I love Vietnam" is only 4

Notice anything? The same meaning costs more tokens in Vietnamese than in English. More on that shortly.


Why Tokens Instead of Letters or Words?

Completely fair question: why not just use individual letters? Or whole words?

If you used individual letters (characters), then the sentence "I love pho" becomes 10 separate units — AI has to process 10 steps just to "read" those three words. Scale that across hundreds of sentences in a conversation, and the system gets very slow.

If you used complete words, the vocabulary table becomes enormous — English alone has hundreds of thousands of words, not counting proper nouns, slang, technical terms, and words that didn't exist when the model was built. There's no way to catalog them all.

Tokens are the middle ground: text gets split into chunks that are large enough to carry meaning, but small enough to cover every possible case. Common words stay whole as 1 token; rare, long, or foreign words get broken into smaller pieces with more tokens. It's a good balance between efficiency and flexibility.
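How do tokenizers find that middle ground? Most modern ones use a technique called byte-pair encoding (BPE): start from single characters and repeatedly merge the most frequent adjacent pair into a new, larger token. The sketch below is a deliberately simplified, hypothetical version of one such merge loop — real BPE works over a large training corpus and records the merge rules for later reuse:

```python
from collections import Counter

def most_frequent_pair(tokens: list[str]) -> tuple[str, str]:
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every occurrence of the pair with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Start from single characters and merge the most frequent pair three times.
tokens = list("low lower lowest")
for _ in range(3):
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)
```

After a few merges, the frequent fragment "low" becomes a single token while rarer endings like "er" and "est" remain in pieces — which is why common words end up as 1 token and rare words as several.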


Context Window — AI's Memory Has a Limit

This is the most important part of this article. It explains one of AI's most misunderstood behaviors: why AI "forgets."

AI has what's called a context window — the maximum number of tokens it can see and work with at any one time. Think of it as a desk: AI can only process what's currently on the desk. When the desk gets full, something old has to come off before something new can go on.

Here's how the popular AI models compare:

Model               Context Window      Roughly Equivalent To
Claude Sonnet 4.5   200,000 tokens      ~150,000 English words
GPT-4o              128,000 tokens      ~96,000 English words
Gemini 1.5 Pro      1,000,000 tokens    ~750,000 English words

Those numbers sound huge — and they are. But in a long conversation, especially when you paste in documents, code, or long blocks of text, you'll hit the limit. When that happens, AI can no longer "see" what was said at the beginning of the conversation — it has slid off the desk. AI isn't forgetful or defective; it just has a finite working memory, the same as humans do.
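The "desk" metaphor maps directly to how chat applications actually manage long conversations: when the history no longer fits the budget, the oldest messages are dropped first. Here is a minimal sketch of that idea. The word-count stand-in for token counting is a crude assumption — real systems count with the model's own tokenizer:

```python
def trim_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined token count fits the window."""
    def count_tokens(text: str) -> int:
        # Crude proxy: one token per word. Real code would use the
        # model's tokenizer here, not a word count.
        return len(text.split())

    kept, total = [], 0
    # Walk backwards from the newest message, keeping what still fits.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break  # this message (and everything older) slides off the desk
        kept.append(msg)
        total += cost
    return list(reversed(kept))

history = ["first message here", "second one", "the newest message"]
print(trim_to_window(history, max_tokens=5))  # ['second one', 'the newest message']
```

Note that "first message here" is gone: the model never sees it again, which is exactly the "forgetting" described above.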


Tokens and Money

If you use ChatGPT or Claude through a regular website, you pay a flat monthly subscription — you never need to think about tokens. But if you use AI through an API (Application Programming Interface), meaning you call AI directly from your own code or application, the billing is different: you pay per token, both for what you send in and for what AI sends back.

Current pricing for Claude Sonnet (March 2026):

  • Input (what you send): roughly $3 per million tokens
  • Output (what AI replies): roughly $15 per million tokens

That sounds cheap per token — and it is. But it adds up quickly when you're running an application with many users and many calls per day. Understanding tokens helps you design shorter, tighter prompts (the instructions or messages you send AI). Instead of writing "You are a friendly and enthusiastic AI assistant who always responds in detail and with great politeness in every situation," you can write "Helpful assistant. Be concise." That cuts dozens of tokens per call; multiply by thousands of calls a day, and you're saving real money.
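The arithmetic behind "it adds up quickly" is simple enough to put in a function. This sketch uses the approximate rates quoted above ($3 in, $15 out, per million tokens); the call volumes are invented for illustration:

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 in_rate: float = 3.0, out_rate: float = 15.0) -> float:
    """Cost of one API call. Rates are USD per million tokens
    (the article's approximate figures for Claude Sonnet)."""
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# Hypothetical app: 2,000 input + 500 output tokens per call, 10,000 calls a day.
per_call = api_cost_usd(2_000, 500)
print(round(per_call * 10_000, 2))  # 135.0 USD per day
```

At that volume, trimming even 50 tokens from the prompt saves about $1.50 a day on input alone — small per call, real at scale.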


Vietnamese and Tokens: Something Worth Knowing

Vietnamese costs more tokens than English to express the same idea. The reason is that tokenizers were built primarily from English-language data — English dominated the internet when the first models were trained. Common English words got assigned their own single tokens; Vietnamese words — especially those with complex tone marks like "ổn," "nặng," "phượng," "thưởng" — tend to get split into more pieces.

The practical result: the same passage in English might cost 100 tokens, while the Vietnamese version costs 130–160. This has two consequences:

  • Higher API costs when working in Vietnamese
  • Context window fills up faster in long Vietnamese conversations
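One way to see why diacritics cost more: many modern tokenizers operate on UTF-8 bytes, and a Vietnamese character with a tone mark takes several bytes where a plain Latin letter takes one. This doesn't map one-to-one onto token counts, but it shows where the overhead starts:

```python
# Byte-level view: "phở" has the same number of characters as "pho",
# but the diacritic character "ở" alone needs 3 bytes in UTF-8.
for word in ["pho", "phở"]:
    print(word, "-", len(word), "chars,", len(word.encode("utf-8")), "bytes")
```

A byte-level tokenizer that has rarely seen those byte sequences in training has no single token for them, so it falls back to smaller pieces — more tokens for the same word.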

The good news is that newer models are improving on this. Anthropic, OpenAI, and other AI companies are all expanding their tokenizers to better support languages beyond English, including Vietnamese.


To Wrap Up

Tokens are the first step in everything. Before AI can "think" about your message in any way, your text has to become tokens — then numbers — then it can enter the model. This isn't a small technical detail only developers need to care about. It shapes how AI understands you, how it "remembers" your conversation, and how much you spend when you use AI seriously.

In the next article, we'll follow those tokens on their journey after you hit Enter — where they go, what gates they pass through, before they reach the AI that processes them.


Quick Reference Table

Concept          Vietnamese Term       Short Definition
Token            Đơn vị văn bản        The smallest chunk of text AI processes
Tokenization     Phân tách văn bản     The process of splitting text into tokens
Tokenizer        Bộ phân tách token    The lookup table that does the splitting
Context window   Cửa sổ ngữ cảnh       The max number of tokens AI can hold at once
API              Giao diện lập trình   How developers call AI from their own code
Prompt           Câu lệnh              The message or instruction you send to AI

Key Things to Remember

  • AI doesn't read words — it processes numbers. All text is converted into tokens (numbers) before AI does anything with it.
  • One word does not equal one token. Common words = 1 token; rare, long, or unusual names = multiple tokens.
  • Context window has a hard limit. When a conversation runs too long, AI "forgets" the beginning — this is a technical constraint, not a flaw.
  • Vietnamese costs more tokens than English — because tokenizers were trained mainly on English data.
  • API usage = paying per token. Concise prompts = real money saved.
  • Next article: After you hit Enter, where do those tokens actually go?