3.21.2026

"Inside AI — The Untold Story" Series - Article 2: You Hit Enter — What Happens?

Author: Claude AI, with supervision, prompting, and editing by HocTro


Opening

You finish typing your question. You hit Enter. And half a second later — sometimes less — words start appearing on your screen.

In that tiny sliver of time, what actually happened? Where did your message go? Who did it encounter along the way? How many computers were involved?

The answer is less simple than it looks — but not nearly as complicated as it sounds, if we take it one step at a time. This article follows your message up close, from the moment it leaves your keyboard to the moment AI starts writing back.


First Step: Your Message Becomes Tokens

Remember the previous article? Before your message goes anywhere, it gets converted into tokens — small chunks of text, each one a number. "I love pho" doesn't get sent as those three words. It becomes a sequence of numbers: [40, 3812, 7194, 45203].
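To make that concrete, here is a toy tokenizer in Python. The four-entry vocabulary and the greedy longest-match rule are invented for illustration only; real tokenizers use vocabularies of 100,000+ entries and more sophisticated matching.

```python
# Toy illustration of tokenization. This tiny made-up vocabulary stands in
# for a real one, which would have over 100,000 entries.
vocab = {"I": 40, " love": 3812, " ph": 7194, "o": 45203}

def tokenize(text, vocab):
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    while text:
        # Try the longest pieces first, so " love" beats a shorter match.
        for piece, token_id in sorted(vocab.items(), key=lambda kv: -len(kv[0])):
            if text.startswith(piece):
                tokens.append(token_id)
                text = text[len(piece):]
                break
        else:
            raise ValueError(f"no vocabulary entry matches {text!r}")
    return tokens

print(tokenize("I love pho", vocab))  # → [40, 3812, 7194, 45203]
```

Note how "pho" splits across two tokens (" ph" and "o") — exactly the kind of mid-word split described above.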

But it's not just your current message that gets sent. There's more. The application packages everything AI needs to give you a useful reply:

  • The full conversation history from the very beginning
  • The system prompt — hidden instructions the developer set in advance, like "Always reply in Vietnamese" or "Be concise and polite"
  • Configuration settings — response temperature, maximum length, and so on

All of that gets bundled into one packet — and that packet is sent using the standard that runs the entire internet: an HTTP request.
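What does that bundle actually look like? Below is a simplified sketch in Python. The field names are modeled loosely on Anthropic's public Messages API; the model name and all the values are illustrative, not authoritative.

```python
import json

# A simplified sketch of the bundle an application might send.
# Field names loosely follow Anthropic's public Messages API, trimmed for clarity.
request_body = {
    "model": "claude-sonnet-4-5",             # which model should handle this (illustrative name)
    "system": "Always reply in Vietnamese.",  # the hidden system prompt
    "messages": [                             # the FULL conversation so far
        {"role": "user", "content": "Xin chào!"},
        {"role": "assistant", "content": "Chào bạn!"},
        {"role": "user", "content": "I love pho"},  # your new message, last in line
    ],
    "max_tokens": 1024,     # cap on the reply length
    "temperature": 0.7,     # randomness setting
}

print(json.dumps(request_body, indent=2, ensure_ascii=False))
```

Notice that the new message is just the last entry in a list: the model receives the whole history every single time, which is why long conversations cost more and respond more slowly.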


HTTP Request — The Digital Envelope

HTTP (HyperText Transfer Protocol) is the common language that everything on the internet uses to communicate. Every time you open a webpage, download an image, or submit a form — it all goes through HTTP. AI is no different.

Think of an HTTP request as an envelope. On the outside of the envelope is the recipient's address — in this case, Anthropic's server address. Inside the envelope is the contents: the full conversation encoded as tokens, along with all the configuration settings.

That envelope is encrypted using HTTPS (the "S" stands for "Secure") before it's sent, which means even if someone intercepts the packet in transit, all they see is meaningless scrambled data — the actual contents are unreadable.
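The envelope metaphor maps quite directly onto code. Here is a sketch, using only Python's standard library, of how an application might address and fill that envelope. The URL and header names follow Anthropic's published API documentation, but treat the details as illustrative; the request is built but deliberately not sent.

```python
import json
import urllib.request

def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP 'envelope'."""
    return urllib.request.Request(
        url="https://api.anthropic.com/v1/messages",  # the address on the envelope
        data=json.dumps(body).encode("utf-8"),        # the contents inside
        headers={
            "content-type": "application/json",
            "x-api-key": api_key,                     # your identification (next section)
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )

req = build_request({"model": "...", "messages": []}, api_key="sk-ant-...")
# urllib.request.urlopen(req) would actually send it. Because the URL is
# https://, TLS encrypts the whole payload in transit — that is the "S".
print(req.full_url, req.get_method())
```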

The packet leaves your computer, passes through your home router, travels through your internet service provider, hops through multiple relay points across the global network — and arrives at Anthropic's servers, typically in the United States. Depending on where you are, that trip takes anywhere from a few dozen to a couple hundred milliseconds. Faster than a blink.


API — The Official Gateway

That packet knocks on a door called an API (Application Programming Interface). The name sounds technical, but the idea is familiar: an API is the official channel through which software programs talk to each other.

An easy example: when you use a ride-hailing app and it shows you a Google Maps interface inside the app, that's because the app is using Google Maps' API to pull map data. The app doesn't have its own map — it asks Google through the API, and Google replies.

AI works the same way. Your chat application — whether it's the Claude.ai website or some third-party software that has AI built in — sends your question to Anthropic's API. The API receives the packet, checks whether it's valid, and passes it inward to the model for processing.

One important distinction: the API is not the AI. The API is just the gate — it receives requests and returns responses. The actual AI lives deeper inside, behind that gate.


API Key — Your Identification

Before the API processes anything, it asks one question: Who are you?

The answer is an API key — a long, random string of characters that looks something like: sk-ant-api03-xK9mP2... (much longer in practice). Every user or application gets a unique API key issued by Anthropic.

When the packet arrives, the API key gets checked immediately:

  • Is this key valid?
  • Does this account still have credit?
  • Does this account have permission to use this model?

If everything checks out, the request is accepted and passed inward. If the key is wrong, the account is out of credit, or access has been blocked — the API returns an error instantly, without processing anything.
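Those three checks can be sketched as one small function. This illustrates the logic only, not Anthropic's actual code; the account structure is invented, and the numbers 401, 402, and 403 are standard HTTP error codes for "unauthorized", "payment required", and "forbidden".

```python
# A sketch of the checks a gateway might run before letting a request through.
# The account structure and outcomes are illustrative assumptions.
def validate_request(api_key: str, accounts: dict, requested_model: str):
    account = accounts.get(api_key)
    if account is None:
        return (401, "invalid API key")      # key not recognized
    if account["credit"] <= 0:
        return (402, "out of credit")        # no balance left
    if requested_model not in account["models"]:
        return (403, "model not permitted")  # no access to this model
    return (200, "ok")                       # accepted: pass inward to the model

accounts = {"sk-ant-demo": {"credit": 5.0, "models": {"haiku", "sonnet"}}}
print(validate_request("sk-ant-demo", accounts, "sonnet"))  # → (200, 'ok')
print(validate_request("wrong-key", accounts, "sonnet"))    # → (401, 'invalid API key')
```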

This is why an API key is as sensitive as a password. If someone gets hold of your API key, they can use your account — and you get the bill. Never share your API key. Never put it directly in code that you push to a public GitHub repository.
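The standard way to keep a key out of your code is an environment variable. A minimal sketch: ANTHROPIC_API_KEY is the variable name Anthropic's official SDKs conventionally read, but any name works as long as the key itself never appears in a source file.

```python
import os

# Read the key from the environment instead of hard-coding it.
# The source file stays safe to publish; the key lives only on your machine.
api_key = os.environ.get("ANTHROPIC_API_KEY", "")
if not api_key:
    print("Set the ANTHROPIC_API_KEY environment variable before running.")
```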


Inside the Server: From Packet to Model

The API has received and validated the packet. Now it needs to route it to the right model. Anthropic has several models — Claude Haiku, Claude Sonnet, Claude Opus — each with different speeds and capabilities. The packet specifies which model to use, and the routing system directs it to the right place.

The model receives the full context — conversation history plus the new message plus the system prompt — as one long sequence of tokens. It begins processing, calculating which token should come next, then which one after that. This is the moment AI actually "thinks" — we'll go deep on that in Article 3.

Here's the interesting part: the model doesn't wait until it has calculated the entire answer before sending anything back. It sends each token back the moment it's computed — and this leads to one of the most distinctive features of the AI experience: streaming.


Streaming — Why Do Words Appear in Chunks?

You've probably noticed: when AI replies, the words don't all appear at once. They show up gradually — in small clusters — as if someone is typing from the other side.

That's not a visual effect — it's streaming. Instead of waiting for AI to finish computing the entire response (which could take 5–10 seconds for a long answer) and then displaying it all at once, the system sends each token back the moment it's ready. Your screen receives that token and renders it into text immediately.

The result: you see words appearing within a second, even if the complete answer takes several more seconds to finish. The experience feels dramatically more responsive than waiting for the whole thing.

Think of it like video streaming: Netflix doesn't download the entire movie to your device before letting you watch it — it sends small chunks continuously, and you watch as the rest is still loading. AI streaming works on the same principle.

A small but interesting detail: when you watch AI "type," your screen is actually receiving tokens — not individual letters. Each token, when decoded, might be a word, part of a word, or a small phrase — which is why sometimes words appear in large chunks, sometimes letter by letter, depending on the length of each token.
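Streaming can be imitated in miniature with a Python generator. The tokens and the 50-millisecond delay below are made up; the point is that the consumer renders each piece the moment it arrives rather than waiting for the whole answer.

```python
import time

# A toy "model" that computes one token at a time.
def generate_tokens():
    for token in ["Ph", "o", " is", " deli", "cious", "!"]:
        time.sleep(0.05)   # pretend this token took 50 ms to compute
        yield token        # send it back the moment it's ready

# The "screen" renders each token as it arrives: streaming in miniature.
for token in generate_tokens():
    print(token, end="", flush=True)
print()
```

Run it and the phrase appears in uneven chunks, some a whole word, some a fragment — the same effect you see when AI "types".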


Latency — Why Is AI Sometimes Slow?

All of this sounds fast — and it usually is. But sometimes you hit Enter and wait a few noticeable seconds. What's happening?

Latency is the time from when you send your message to when the first character appears on your screen. Several things affect it:

Geographic distance: Anthropic's servers are primarily in the United States. If you're in Vietnam, your packet has to travel halfway around the world — and even at fiber optic speeds, there's a physical limit. Add in the time at relay points and routing hops, and it adds up.

Server load: If millions of people are all asking AI questions at the same time — especially during peak hours in the US in the evening — the servers are busy. Requests queue up. Think of calling a customer service hotline on a Monday morning.

Context length: The model takes time to process the entire context before it can generate the very first token. The longer the conversation, the more tokens in the context, the longer that initial processing takes — even if the streaming after that runs quickly.

Question complexity: A simple request like "Translate 'hello' to Vietnamese" is processed very differently from "Write me a 2,000-word analysis of the history of Vietnam." The model doesn't know in advance how long its answer will be — it generates tokens until it's done.
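The "time to first token" idea can be demonstrated with a toy measurement. The delays below are invented stand-ins: one longer pause representing initial context processing, then quick per-token pauses representing streaming.

```python
import time

def fake_stream():
    """Stand-in for a model: a pause before the first token (processing the
    whole context), then quick tokens after that. All delays are invented."""
    time.sleep(0.2)                # context processing: grows with history length
    for tok in ["Hi", " there", "!"]:
        time.sleep(0.02)           # per-token generation is fast by comparison
        yield tok

start = time.monotonic()
first_token_at = None
for tok in fake_stream():
    if first_token_at is None:
        first_token_at = time.monotonic() - start  # latency: time to first token
total = time.monotonic() - start
print(f"first token after {first_token_at:.2f}s, full reply after {total:.2f}s")
```

The gap between "first token" and "full reply" is exactly why streaming matters: you start reading long before the answer is finished.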


Summing Up: A Journey in Half a Second

From the moment you hit Enter to the moment the first word appears — usually under one second — here's the sequence of events:

  1. Your message is tokenized into a sequence of numbers
  2. The full context (history + system prompt + settings) is packaged into an HTTP request
  3. The packet is HTTPS-encrypted and sent across the internet to Anthropic's servers
  4. The API receives the packet, checks the API key, and validates the account
  5. The request is routed to the correct model (Haiku, Sonnet, or Opus)
  6. The model processes the full context and starts generating the first token
  7. Tokens are streamed back — your screen renders each cluster of words as they arrive

Seven steps, in half a second. Not bad.
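To close the loop, here is the whole journey as one toy pipeline. Every function is a one-line stand-in for an entire system (real tokenizers, encryption, and routing are vastly more complex), and nothing here is Anthropic's actual code.

```python
# The seven steps, condensed into one heavily simplified pipeline sketch.
def tokenize(text):                    # 1. text → tokens (toy: one per character)
    return [ord(c) for c in text]

def package(history, system, tokens):  # 2. bundle context + settings
    return {"system": system, "messages": history + [tokens]}

def send_https(body):                  # 3. encrypt + transmit (pretend)
    return body

def check_api_key(key):                # 4. authenticate at the API gate
    return key.startswith("sk-ant-")

def route(body, model):                # 5. direct to the chosen model
    return (model, body)

def generate(model, body):             # 6.+7. compute and stream tokens
    yield from ["Hello", "!"]

body = send_https(package([], "Be concise.", tokenize("hi")))
if check_api_key("sk-ant-demo"):
    model, routed = route(body, "sonnet")
    reply = "".join(generate(model, routed))
    print(reply)  # → Hello!
```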

The next article is about step 6 — the moment the model actually "thinks." How does it generate a response? Does it guess or does it understand? And why does it sometimes invent things that sound completely convincing but are entirely made up?


Quick Reference Table

Concept | Vietnamese Term | Short Definition
HTTP / HTTPS | Giao thức truyền tải | The internet's standard communication protocol; S = encrypted
HTTP request | Yêu cầu HTTP | The packet your device sends to the server
API | Giao diện lập trình | The official gateway for software-to-software communication
API key | Khóa API | Your authentication credential for using the API
System prompt | Hướng dẫn hệ thống | Hidden instructions pre-set by the developer
Streaming | Truyền theo luồng | Sending tokens back one by one as they're generated
Latency | Độ trễ | The delay between sending and receiving the first token
Routing | Định tuyến | Directing a request to the correct model or server

Key Things to Remember

  • Your message is more than just your message. It's bundled with the full conversation history, system prompt, and settings — all packaged into one HTTP request.
  • The API is the gate, not the AI. It receives requests, checks the API key, and passes things inward to the model — which is the actual AI.
  • Your API key is as sensitive as a password. Anyone who has it can use your account — and you pay the bill.
  • Streaming is real, not cosmetic. AI genuinely sends each token back the moment it's computed, rather than waiting to finish the whole answer.
  • Latency is affected by: geographic distance, server load, context length, and question complexity.
  • Next article: The model has received all those tokens — now what does it do? How AI generates a reply, one token at a time.