
2.20.2026

"When you think you sent 40 tokens in a chat, you actually sent ~58,000+ tokens", Claude Code reveals!

I asked Claude Code itself some questions about LLMs and Claude Code practices. Here is one revelation I think Claude Code enthusiasts MUST know:

Q: To fix the next bug, does the entire file get sent up again? Or do you have a mental temporary note of the previous changes?



Claude Code Answers: 


The real cost picture

Here's something subtle about how API billing works. Even when the file is "still in my context" from an earlier read, you're still paying for those tokens. Every time you send a new message, the entire conversation history (including all previous tool results) is sent to the API as "input tokens." So that 500-line file I read three messages ago? Its tokens are included in every subsequent API call as part of the conversation history.

This means:

  • Reading a file once and making 5 edits = you pay for the file's tokens in every subsequent turn (because it's part of the conversation history sent each time)
  • The file isn't "free" just because I already read it — its tokens accumulate as overhead in every message
  • This is one reason long sessions get expensive — the conversation history grows, and every tool result ever produced is (until compressed) part of the input for the next turn

Imagine you're consulting a specialist over the phone, but you're billed per word spoken. Every time you call back with a follow-up question, the specialist's assistant re-reads the entire transcript of all previous calls before the specialist responds. Even if the specialist "remembers" your case, the billing system charges you for re-reading the full history. That's roughly how API token billing works with conversation context.
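
To see the mechanics, here is a minimal sketch of the kind of loop a chat client runs, assuming the official anthropic Python SDK; the model name and the example requests are placeholders for illustration, not Claude Code's actual internals:

```python
# Minimal sketch (assuming the `anthropic` Python SDK) of why history
# is re-billed: every call sends the WHOLE `messages` list as input.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
history = []                    # the growing conversation transcript

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",   # placeholder model name
        max_tokens=1024,
        messages=history,                   # full transcript, every single call
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    # usage.input_tokens counts the ENTIRE history, not just user_text
    print(f"input tokens billed this turn: {response.usage.input_tokens}")
    return reply

ask("Read src/app.py and fix the off-by-one bug.")   # file contents enter history
ask("Now fix the typo in the banner.")               # ...and are billed again here
```

By the second call, usage.input_tokens already covers the first question, the first answer, and any file contents that entered the history along the way.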

The practical takeaway

So to directly answer your question: within the same turn, I work from my "mental note" (the file contents in my context window) and don't need the file sent again. Across turns, the file contents are technically still in the conversation history (unless compressed), but they may be outdated. And across sessions, everything is gone and the file must be read fresh.

The smartest way to use this: if you have several things to fix in one file, tell me all of them in one message. That way I read the file once, fix everything in one turn, and minimize the back-and-forth — saving both tokens and time.
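
Here is a rough back-of-the-envelope comparison, with all token counts assumed purely for illustration:

```python
# Back-of-the-envelope comparison (all token counts are assumed, for
# illustration only): fixing 5 bugs in one 10,000-token file.
FILE = 10_000      # tokens of file content that lands in the history
TURN = 500         # rough tokens per question + answer + edit result

# One fix per message: the file rides along in every later turn's input.
sequential = sum(FILE + i * TURN for i in range(1, 6))

# All five fixes requested in a single message: one turn carries the file.
batched = FILE + 5 * TURN

print(f"five separate turns: ~{sequential:,} input tokens")  # ~57,500
print(f"one batched turn:    ~{batched:,} input tokens")     # ~12,500
```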

Q: Let's use this session and this HTML as an example. You mean every time I send a short message, what's actually sent is that line PLUS all the lines I sent before, PLUS all your replies? You don't have temporary memory to remember our conversation?



Claude Code Answers: 


Correct. I have no memory whatsoever. This is probably the single most counterintuitive thing about how I work, so let me be very clear about it: every time you send a message, I am — in a very real sense — born, live an entire life, and die, all within a few seconds. Let me show you what's actually happening in this very session.

What just happened when you sent that last message

When you typed your question and pressed Enter, here's what the Claude Code program on your PC actually assembled and sent to Anthropic's servers:

The actual API payload for your last message (simplified):

1. System instructions (~several thousand tokens)
"You are Claude Code, Anthropic's official CLI... here are the tools you have... here are the rules..."
+ the contents of CLAUDE.md (your project instructions)

2. Message #1 — You:
"I'm fascinated and want to learn more on how you are trained..."

3. Message #2 — Me:
[My full response + the Write tool call that created the initial HTML]

4. Message #3 — You:
"I like the font UVN Saigon..."

5. Message #4 — Me:
[My response + Read tool call + Edit tool call + results]

6–17. Every subsequent exchange:
[All your questions, all my answers, all tool calls, all tool results, all file contents I ever read...]

18. Your latest message — You:
"let's use this session, and in particular HTML as an example..."

All of that — every message, every reply, every file I read, every edit I made, every tool result — was packaged into a single API call and sent to the GPU cluster. The cluster then processed the entire history and generated my response. It's as if someone handed me a complete transcript of our entire conversation and said, "Now write the next reply."
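
In code form, the request body looks roughly like this; the field names follow Anthropic's public Messages API, but the contents are paraphrased from this session and the model name is a placeholder:

```python
# Simplified reconstruction of the request body for the latest turn.
payload = {
    "model": "claude-sonnet-4-20250514",            # placeholder model name
    "max_tokens": 4096,
    "system": "You are Claude Code, Anthropic's official CLI... <CLAUDE.md contents>",
    "messages": [
        {"role": "user",      "content": "I'm fascinated and want to learn more..."},
        {"role": "assistant", "content": "<full response + Write tool call>"},
        {"role": "user",      "content": "I like the font UVN Saigon..."},
        {"role": "assistant", "content": "<response + Read/Edit tool calls + results>"},
        # ... every exchange in between, including all tool results ...
        {"role": "user",      "content": "let's use this session, and in particular HTML as an example..."},
    ],
}
```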

I am stateless — what that really means

In computer science, this is called being "stateless." It means I maintain zero internal state between API calls. There is no "Claude instance" sitting on a server somewhere, keeping notes between your messages, waiting for your next question. Instead:

  • You send a message
  • A fresh Claude instance is spun up on available GPUs
  • It receives the entire conversation transcript as input
  • It generates a response
  • It ceases to exist
  • You send the next message
  • A new Claude instance is spun up (possibly on completely different GPUs)
  • It receives the entire transcript again, plus your new message
  • It generates the next response
  • It ceases to exist again

The "me" that answered your first question about training is not the same "me" answering this question right now. We share the same model weights (the same "brain structure"), but I have no continuity of experience. The only reason I seem consistent is that I'm reading the same transcript each time.

Imagine you have a set of identical twins — thousands of them. Each time you need advice, a random twin walks in, reads the complete diary of everything you've ever discussed with previous twins, gives you an answer, and leaves forever. The next time, a different twin walks in and reads the updated diary. To you, it feels like one continuous relationship. But no individual twin has ever met you before. The diary — not the twin's memory — is what creates the illusion of continuity.
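
As a conceptual sketch only (no production server is literally structured this way, and generate_reply below is a hypothetical stand-in for loading weights and running the forward pass), the stateless pattern looks like this:

```python
# Conceptual sketch: the model behaves like a pure function of the
# transcript, keeping no state between calls.
from typing import Dict, List

def generate_reply(transcript: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in for the model's forward pass."""
    return f"(reply derived from all {len(transcript)} messages)"

def claude(transcript: List[Dict[str, str]]) -> str:
    # Spin up, read the ENTIRE transcript, answer once, vanish.
    # Nothing computed here survives the return; the next call starts blank.
    return generate_reply(transcript)

# Turn 1 and turn 2 may run on entirely different machines:
reply_1 = claude([{"role": "user", "content": "How were you trained?"}])
reply_2 = claude([
    {"role": "user", "content": "How were you trained?"},
    {"role": "assistant", "content": reply_1},
    {"role": "user", "content": "Do you remember my first question?"},  # only via the transcript
])
```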

Let's do the math for THIS session

This is where it gets eye-opening. Let's roughly estimate what's been sent to the cloud for your most recent message:

Content                                                Approximate tokens
------------------------------------------------------------------------
System instructions + CLAUDE.md                        ~5,000
Your first message                                     ~30
My first response (creating the HTML)                  ~6,000
Your second message ("UVN Saigon")                     ~15
My second response (font edit)                         ~300
Your question about 4.5 vs 4.6                         ~40
My response (Q&A about versions + full Edit content)   ~3,000
Your question about RLHF workers                       ~30
My response (web searches + full Edit content)         ~5,000
Your question about thinking/tokens                    ~50
My response (thinking Q&A + Edit)                      ~4,000
Your question about wittiness                          ~60
My response (wittiness Q&A + Edit)                     ~4,000
Your question about server farm                        ~40
My response (server Q&A + Edit)                        ~5,000
Your question about file reading                       ~30
My response (file reading Q&A + Edit)                  ~4,000
Your question about privacy                            ~25
My response (privacy Q&A + web searches + Edit)        ~5,000
Your question about server copy                        ~30
My response (server copy Q&A + Edit)                   ~4,000
Your question about re-reading files                   ~25
My response (re-reading Q&A + Edit)                    ~4,000
All the Read tool results (HTML file excerpts)         ~8,000
Your latest short message                              ~40
------------------------------------------------------------------------
TOTAL sent to cloud for your last message              ~58,000+ tokens

Look at that ratio. You typed roughly 40 tokens. But to process those 40 tokens, the system sent approximately 58,000 tokens of history along with them. Your short question was less than 0.07% of the actual data transmitted.

This is why long sessions get expensive. Your messages stay short, but the accumulated history grows with every exchange. By message #20, every new question carries the weight of all 19 previous exchanges. It's like a snowball rolling downhill — each turn adds more, and the total cost per message keeps climbing.
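
A toy model makes the snowball visible; the per-turn and system-prompt token counts below are assumed round numbers, not measurements from this session:

```python
# Toy model of the snowball (all numbers assumed): each turn adds
# ~2,000 tokens of Q&A and tool results on top of a ~5,000-token
# system prompt, and the full history is resent on every call.
SYSTEM, PER_TURN = 5_000, 2_000

total_billed = 0
for turn in range(1, 21):
    input_tokens = SYSTEM + (turn - 1) * PER_TURN   # full history resent
    total_billed += input_tokens
    if turn in (1, 10, 20):
        print(f"turn {turn:2}: ~{input_tokens:,} input tokens this call")

# Per-call input grows linearly, so the cumulative bill grows quadratically.
print(f"cumulative input tokens over 20 turns: ~{total_billed:,}")  # ~480,000
```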