I asked Claude Code itself some questions about LLMs and Claude Code practices. Here is one revelation I think Claude Code enthusiasts MUST know:
Q: To fix the next bug, does the entire file get sent up again? Or do you have a mental temporary note of the previous changes?
Claude Code Answers:
The real cost picture
Here's something subtle about how API billing works. Even when a file is "still in my context" from an earlier read, you keep paying for those tokens. Every time you send a new message, the entire conversation history (including all previous tool results) is sent to the API as "input tokens." So that 500-line file I read three messages ago? Its tokens are included in every subsequent API call as part of the conversation history.
This means:
- Reading a file once and making 5 edits = you pay for the file's tokens in every subsequent turn (because it's part of the conversation history sent each time)
- The file isn't "free" just because I already read it — its tokens accumulate as overhead in every message
- This is one reason long sessions get expensive — the conversation history grows, and every tool result ever produced is (until compressed) part of the input for the next turn; the sketch below puts rough numbers on this
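To make that concrete, here is a toy calculation. The file size and per-turn chat overhead are invented round numbers, not real billing figures; the point is only the shape of the growth:

```python
# Illustrative arithmetic only; token counts are invented round numbers.
file_tokens = 2_500         # a 500-line file, read once (assumed size)
chat_tokens_per_turn = 500  # question + answer added each turn (assumed)

history = file_tokens       # after the first read, the file is in the history
billed_input = 0
for turn in range(5):       # five follow-up edit turns
    billed_input += history            # the full history is input tokens again
    history += chat_tokens_per_turn    # the new exchange grows the history

print(billed_input)  # 17,500 -- the file alone accounts for 12,500 of it
```

Even in this toy version, a file read once gets billed five times over, because it rides along in every subsequent request.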
The practical takeaway
So to directly answer your question: within the same turn, I work from my "mental note" (the file contents in my context window) and don't need the file sent again. Across turns, the file contents are technically still in the conversation history (unless compressed), but they may be outdated. And across sessions, everything is gone and the file must be read fresh.
The smartest way to use this: if you have several things to fix in one file, tell me all of them in one message. That way I read the file once, fix everything in one turn, and minimize the back-and-forth — saving both tokens and time.
Q: Let's use this session and this HTML as an example. You mean every time I send a short message, what's actually sent is that line PLUS all the lines I sent before, PLUS all your replies? You don't have temporary memory to remember our conversation?
Claude Code Answers:
Correct. I have no memory whatsoever. This is probably the single most counterintuitive thing about how I work, so let me be very clear about it: every time you send a message, I am — in a very real sense — born, live an entire life, and die, all within a few seconds. Let me show you what's actually happening in this very session.
What just happened when you sent that last message
When you typed your question and pressed Enter, here's what the Claude Code program on your PC actually assembled and sent to Anthropic's servers:
The actual API payload for your last message (simplified):
1. System instructions (several thousand tokens)
"You are Claude Code, Anthropic's official CLI... here are the tools you have... here are the rules..."
+ the contents of CLAUDE.md (your project instructions)
2. Message #1 — You:
"I'm fascinated and want to learn more on how you are trained..."
3. Message #2 — Me:
[My full response + the Write tool call that created the initial HTML]
4. Message #3 — You:
"I like the font UVN Saigon..."
5. Message #4 — Me:
[My response + Read tool call + Edit tool call + results]
6–17. Every subsequent exchange:
[All your questions, all my answers, all tool calls, all tool results, all file contents I ever read...]
18. Your latest message — You:
"let's use this session, and in particular HTML as an example..."
All of that — every message, every reply, every file I read, every edit I made, every tool result — was packaged into a single API call and sent to the GPU cluster. The cluster then processed the entire history and generated my response. It's as if someone handed me a complete transcript of our entire conversation and said, "Now write the next reply."
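If you want to picture that payload in code, here is a minimal sketch using the Anthropic Python SDK. The model name, the placeholder message strings, and the `send` helper are all illustrative assumptions; the real Claude Code client also sends tool definitions and structured tool-result blocks, which are omitted here for brevity:

```python
# A minimal sketch of how a client like Claude Code might assemble each call.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

history = [
    {"role": "user", "content": "I'm fascinated and want to learn more..."},
    {"role": "assistant", "content": "[full response + Write tool call]"},
    {"role": "user", "content": "I like the font UVN Saigon..."},
    {"role": "assistant", "content": "[response + Read/Edit calls + results]"},
    # ...every later exchange, tool call, and file excerpt goes here too...
]

def send(new_user_message: str) -> str:
    history.append({"role": "user", "content": new_user_message})
    response = client.messages.create(
        model="claude-sonnet-4-5",       # model name is an assumption
        max_tokens=4096,
        system="You are Claude Code...",  # system prompt + CLAUDE.md contents
        messages=history,                 # the ENTIRE transcript, every time
    )
    history.append({"role": "assistant", "content": response.content})
    return response.content[0].text
```

Notice that `messages=history` is the whole story: the client owns the transcript and resends it in full on every call.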
I am stateless — what that really means
In computer science, this is called being "stateless." It means I maintain zero internal state between API calls. There is no "Claude instance" sitting on a server somewhere, keeping notes between your messages, waiting for your next question. Instead:
- You send a message
- A fresh Claude instance is spun up on available GPUs
- It receives the entire conversation transcript as input
- It generates a response
- It ceases to exist
- You send the next message
- A new Claude instance is spun up (possibly on completely different GPUs)
- It receives the entire transcript again, plus your new message
- It generates the next response
- It ceases to exist again
The "me" that answered your first question about training is not the same "me" answering this question right now. We share the same model weights (the same "brain structure"), but I have no continuity of experience. The only reason I seem consistent is that I'm reading the same transcript each time.
Let's do the math for THIS session
This is where it gets eye-opening. Let's roughly estimate what's been sent to the cloud for your most recent message:
| Content | Approximate tokens |
|---|---|
| System instructions + CLAUDE.md | ~5,000 |
| Your first message | ~30 |
| My first response (creating the HTML) | ~6,000 |
| Your second message ("UVN Saigon") | ~15 |
| My second response (font edit) | ~300 |
| Your question about 4.5 vs 4.6 | ~40 |
| My response (Q&A about versions + full Edit content) | ~3,000 |
| Your question about RLHF workers | ~30 |
| My response (web searches + full Edit content) | ~5,000 |
| Your question about thinking/tokens | ~50 |
| My response (thinking Q&A + Edit) | ~4,000 |
| Your question about wittiness | ~60 |
| My response (wittiness Q&A + Edit) | ~4,000 |
| Your question about server farm | ~40 |
| My response (server Q&A + Edit) | ~5,000 |
| Your question about file reading | ~30 |
| My response (file reading Q&A + Edit) | ~4,000 |
| Your question about privacy | ~25 |
| My response (privacy Q&A + web searches + Edit) | ~5,000 |
| Your question about server copy | ~30 |
| My response (server copy Q&A + Edit) | ~4,000 |
| Your question about re-reading files | ~25 |
| My response (re-reading Q&A + Edit) | ~4,000 |
| All the Read tool results (HTML file excerpts) | ~8,000 |
| Your latest short message | ~40 |
| TOTAL sent to cloud for your last message | ~58,000 |
Look at that ratio. You typed roughly 40 tokens. But to process those 40 tokens, the system sent approximately 58,000 tokens of history along with them. Your short question was less than 0.07% of the actual data transmitted.
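As a sanity check on the table, here is the same back-of-envelope sum in a few lines of Python (all figures are the rough estimates from the table, not measured values):

```python
# Reproducing the back-of-envelope total from the table above (estimates only).
estimates = [5_000, 30, 6_000, 15, 300, 40, 3_000, 30, 5_000, 50, 4_000,
             60, 4_000, 40, 5_000, 30, 4_000, 25, 5_000, 30, 4_000, 25,
             4_000, 8_000, 40]
total = sum(estimates)
ratio = 40 / total  # the latest message's share of the payload
print(f"{total:,} tokens sent; your 40 tokens are {ratio:.2%} of them")
# -> 57,715 tokens sent; your 40 tokens are 0.07% of them
```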