
2.21.2026

The Secrets of Claude Code From the Engineers Who Built It

Boris Cherny (creator of Claude Code) and Cat Wu (product lead) sit down with Dan Shipper to reveal how Claude Code was built, how Anthropic dogfoods it internally, and where the future of AI-powered coding is headed.

From "AI and I" — an Every podcast by Dan Shipper.
Guests: Boris Cherny, Creator and Head of Claude Code at Anthropic; Cat Wu, Product Lead for Claude Code at Anthropic.
Host: Dan Shipper, CEO of Every.








Introduction

"What made it work really well is that Claude Code has access to everything that an engineer does at the terminal. Everything you can do, Claude Code can do. There's nothing in between." — "There's this really old idea in product called latent demand. You build a product in a way that is hackable, that is kind of open-ended enough that people can abuse it for other use cases it wasn't really designed for, and you build for that because you kind of know there's demand for it."

DAN: Cat, Boris, thank you so much for being here.

BORIS: Thanks for having us.

DAN: So for people who don't know you, you are the creators of Claude Code. Thank you very much from the bottom of my heart. I love Claude Code.

BORIS: That's amazing to hear. That's what we love to hear.

DAN: Okay, I think the place I want to start is when I first used it. There was like this moment — I think it was around when Sonnet 3.7 came out where I used it and I was like, "Holy — this is like a completely new paradigm. It's a completely new way of thinking about code." And the big difference was you went all the way and just eliminated the text editor and you're just like all you do is talk to the terminal and that's it. Previous paradigms of AI programming, previous harnesses have been like you have a text editor and you have the AI on the side and it's kind of like — or it's a tab complete. So, take me through that decision process.


I. Claude Code's Origin Story

BORIS: I think the most important thing is it was not intentional at all. We sort of ended up with it. So when I joined Anthropic, we were still on different teams. There was this predecessor to Claude Code. It was called Clide, C-L-I-D-E. And it was this research project, you know, it took like a minute to start up. It was this really heavy Python thing. It had to run a bunch of indexing and stuff. And when I joined I wanted to ship my first PR and I hand wrote it like a, you know, like a noob. I didn't know about any of these tools.

DAN: Thank you for admitting that.

BORIS: I didn't know any better, and then I put up this PR. Adam Wolf, who was the manager for our team for a while, was my ramp-up buddy, and he just rejected the PR. He was like, "You wrote this by hand. What are you doing? Use Clide." Because he was also hacking a lot on Clide at the time. And so I tried Clide. I gave it the description of the task and it just one-shot this thing and this was, you know, Sonnet 3.5. So I still had to fix a thing even for this kind of basic task and the harness was super old. So it took like 5 minutes to churn this thing out and just took forever. But it worked and I was just mind-blown that this was even possible and that just kind of got the gears turning. Maybe you don't actually need an IDE.

And then later on I was prototyping using the Anthropic API and the easiest way to do that was just building a little app in the terminal because that way I didn't have to build a UI or anything. And I started just making a little chat app and then I just started thinking maybe we could do something a little bit like Clide. So let me build a little Clide and it actually ended up being a lot more useful than that without a lot of work. And I think the biggest revelation for me was when we started to give the model tools. It just started using tools and it was this insane moment. Like the model just wants to use tools. We gave it bash and it just started using bash, writing AppleScript to automate stuff in response to questions. And I was like this is just the craziest thing. I've never seen anything like this. Because at the time I had only used IDEs with like text editing, a little one-line autocomplete, multi-line autocomplete, whatever.

So that's where this came from. It was this kind of convergence of prototyping but also seeing what's possible in a very rough way. And this thing ended up being surprisingly useful. And I think it was the same for us. For me it was like kind of Sonnet 4, Opus 4. That's where that magic moment was. I was like, "Oh my god, this thing works."


II. The Tool Moment — Bash and Beyond

DAN: That's interesting. Tell me about that tool moment because I think that is one of the special things about Claude Code — it just writes bash and it's really good at it. And I think a lot of previous agent architectures or even anyone building agents today, your first instinct might be okay, we're going to give it a find file tool and then we're going to give it an open file tool and you build all these custom wrappers for all the different actions you might want the agent to take. But Claude Code just uses bash and it's really good at it. How do you think about what you learned from that?

BORIS: I think we're at this point right now where Claude Code actually has a bunch of tools. I think it's like a dozen or something like this. We actually add and remove tools most weeks. So this changes pretty often. But today there actually is a search tool — there's a tool for searching. And we do this for two reasons. One is the UX, so we can show the result a little nicer to the user because there's still a human in the loop right now for most tasks. And the second one is for permissions. So if you say in your Claude Code settings.json "on this file you cannot read," we have to enforce this. We enforce it for bash but we can do it a little bit more efficiently if we have a specific search tool.

But definitely we want to unship tools and kind of keep it simple for the model. Like last week or two weeks ago we unshipped the LS tool because in the past we needed it but then we actually built a way to enforce this permission system for bash. So in bash, if we know that you're not allowed to read a particular directory, Claude's not allowed to LS that directory. And because we can enforce that consistently, we don't need this tool anymore. And this is nice because it's a little less choice for Claude. A little less stuff in context.
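The idea of enforcing a read restriction on bash itself can be illustrated with a small Python sketch. This is not Claude Code's implementation and does not use its settings format: the deny list, the `/repo` working directory, and the function names are all hypothetical, and a token-level check like this would miss pipes, subshells, globs, and many other cases a real permission system has to handle.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical deny list; the real settings format is not shown in the interview.
DENIED_DIRS = [PurePosixPath("/repo/secrets")]


def is_denied(path: str, cwd: str = "/repo") -> bool:
    """Return True if `path` resolves to or inside any denied directory."""
    p = PurePosixPath(path)
    if not p.is_absolute():
        p = PurePosixPath(cwd) / p
    # Normalize ".." by hand, since pure paths never touch the filesystem.
    parts: list[str] = []
    for part in p.parts:
        if part == "..":
            if len(parts) > 1:  # never pop the root "/"
                parts.pop()
        elif part != ".":
            parts.append(part)
    resolved = PurePosixPath(*parts)
    return any(resolved == d or d in resolved.parents for d in DENIED_DIRS)


def command_allowed(command: str, cwd: str = "/repo") -> bool:
    """Reject a shell command if any non-flag argument names a denied path."""
    tokens = shlex.split(command)
    # Skip the program name and flags; treat every other token as a possible path.
    args = [t for t in tokens[1:] if not t.startswith("-")]
    return not any(is_denied(a, cwd) for a in args)
```

With a check like this applied consistently to every bash call, `ls secrets` is refused the same way a dedicated listing tool would refuse it, which is the condition Boris gives for being able to remove the separate tool.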


III. How Anthropic Dogfoods Claude Code

DAN: And how do you guys split responsibility on the team?

CAT: I would say Boris sets the technical direction and has been the product visionary for a lot of the features that we've come out with. I see myself as more of a supporting role. One, making sure that our pricing and packaging resonates with our users. Two, making sure that we're shepherding all our features across the launch process, from deciding, "All right, these are the prototypes that we should definitely ant-food" to setting the quality threshold for ant-fooding through to communicating that to our end users. And there's definitely some new initiatives that we're working on. I would say historically a lot of Claude Code has been built bottom-up. Boris and a lot of the core team members have just had these great ideas for to-do lists, sub-agents, hooks, and all these are bottom-up. As we think about expanding to more services and bringing Claude Code to more places, I think a lot of those are more like, "All right, let's talk to customers. Let's bring engineers into those conversations and prioritize those services and knock them out."

DAN: What is ant-fooding?

CAT: Oh, ant-fooding. It means dog-fooding. So, Anthropic — ant. Our nickname for internal employees is ant. And so ant-fooding is our version of dog-fooding. Internally over 70 or 80% of ants — technical Anthropic employees — use Claude Code every day. And so every time we are thinking about a new feature, we push it out to people internally and we get so much feedback. We have a feedback channel. I think we get a post every five minutes. And so you get really quick signal on whether people like it, whether it's buggy, or whether it's not good and we should unship it.

DAN: You can tell that someone that is building stuff is using it all the time to build it because the ergonomics just make sense if you're trying to build stuff and that only happens if you're ant-fooding.

BORIS: Yeah. And I think that's a really interesting paradigm for building new stuff, that sort of bottom-up "I make something for myself."

BORIS: And Cat is also so humble. I think Cat has a really big role in the product direction also — like it comes from everyone on the team. And these specific examples — this actually came from everyone on the team. Like to-do lists and sub agents, that was Sid. Hooks, Dixon shipped that. Plugins, Daisy shipped that. So everyone on the team — these ideas come from everyone.

BORIS: And so I think for us, we build this core agent loop and this core experience and then everyone on the team uses the product all the time. And everyone outside the team uses the product all the time. And so there's just all these chances to build things that serve these needs. Like for example, bash mode — you know, the exclamation mark and you can type in bash commands. This was just many months ago. I was using Claude Code and I was going back and forth between two terminals and just thought it was kind of annoying. And just on a whim, I asked Claude to kind of think of ideas. It thought of this exclamation mark bash mode. And then I was like, "Great, make it pink and then ship it." It just did it. And that's the thing that still kind of persisted. And you know, now you see kind of others also catching on to that.

DAN: That's funny. I actually didn't know that. And that's extremely useful because I always have to open up a new tab to run any bash commands. So you just do an exclamation point and then it just runs it directly instead of filtering it through all the Claude stuff.

BORIS: Yeah. And Claude Code sees the full output too.

DAN: Interesting. That's perfect. So anything you see in the Claude Code view, Claude Code also sees.

BORIS: Yeah. And this is kind of a UX thing that we're thinking about. In the past tools were built for engineers, but now it's equal parts engineers and model. And so as an engineer, you can see the output, but it's actually quite useful for the model also. And this is part of the philosophy — everything is dual use. So for example, the model can also call slash commands. Like I have a slash command for /commit where I run through a few different steps like diffing and generating a reasonable commit message and this kind of stuff. I run it manually but also Claude can run this for me. And this is pretty useful because we get to share this logic. We get to define this tool and then we both get to use it.

DAN: What are the differences in designing tools that are dual use from designing tools that are used by one or the other?

BORIS: Surprisingly, it's the same. So far. I sort of feel like this kind of elegant design for humans translates really well to the models. So, you're just thinking about what would make sense to you and the model generally — it makes sense to the model too if it makes sense to you.

CAT: I think one of the really cool things about Claude Code being a terminal UI and what made it work really well is that Claude Code has access to everything that an engineer does at the terminal. And I think when it comes to whether the tool should be dual use or not, making them dual use actually makes the tools a lot easier to understand. It just means that everything you can do, Claude Code can do. There's nothing in between.

DAN: There are a couple of those decisions. No code editor, it's in the terminal, so it has access to your files. And it's on your computer versus in the cloud in a virtual machine. So you get to use it in a repeated way where you can build up your CLAUDE.md file or build slash commands and all that kind of stuff where it becomes very composable and extensible from a very simple starting point. And I'm curious about how you think about, for people who are thinking about "I want to build an agent" — probably not Claude Code, but something else — how you get that simple package that then can extend and be really powerful over time.

BORIS: For me, I start by just thinking about it like developing any kind of product where you have to solve the problem for yourself before you can solve it for others. And this is something that they teach in YC — you have to start with yourself. If you can solve your own problem, it's much more likely you're solving the problem for others. And I think for coding, starting locally is the reasonable thing. And you know now we have Claude Code on the web. So you can also use it with a virtual machine and you can use it in a remote setting. And this is super useful when you're on the go — you want to take that from your phone. And this is sort of — we started proving this out a step at a time where you can do @claude in GitHub and I use this every day. Like on the way to work I'm at a red light, I probably shouldn't be doing this, but I'm on GitHub at a red light and then I'm like @claude, fix this issue or whatever. And so it's just really useful to be able to control it from your phone. And this kind of proves out this experience. I don't know if this necessarily makes sense for every kind of use case. For coding, I think starting local is right. I don't know if this is true for everything, though.


IV. Boris and Cat's Favorite Slash Commands

DAN: What are the slash commands you guys use?

CAT: /commit. Yeah, the /commit command makes it a lot faster for Claude to know exactly what bash commands to run in order to make a commit.

DAN: And what does the /commit slash command do for people who are unfamiliar?

CAT: It just tells it exactly how to make a commit. And you can dynamically say, "Okay, these are the three bash commands that need to be run." And what's pretty cool is also we have this templating system built into slash commands. So we actually run the bash commands ahead of time. They're embedded into the slash command. And you can also pre-allow certain tool invocations. So for that slash command we say allow git commit, git push, gh — and so you don't get asked for permission after you run the slash command because we have a permission-based security system. And then also it uses Haiku, which is pretty cool. So it's a cheaper model and faster.
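The two mechanisms Cat describes, running commands ahead of time so their output is embedded in the prompt and pre-allowing specific tool invocations, can be sketched in Python as follows. The `SlashCommand` class, its field names, and the prefix-matching rule are assumptions made for illustration; they are not Claude Code's slash command format.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable


def run_shell(command: str) -> str:
    """Run a shell command and return its stdout, stripped."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip()


@dataclass
class SlashCommand:
    """Hypothetical model of a templated slash command."""
    name: str
    template: str                      # prompt text with {placeholders}
    context_commands: dict[str, str]   # placeholder -> command run ahead of time
    allowed_prefixes: list[str] = field(default_factory=list)

    def render(self, runner: Callable[[str], str] = run_shell) -> str:
        """Run every context command up front and embed its output in the prompt."""
        outputs = {key: runner(cmd) for key, cmd in self.context_commands.items()}
        return self.template.format(**outputs)

    def is_preallowed(self, command: str) -> bool:
        """True if the command matches a pre-approved prefix on a word boundary."""
        return any(
            command == p or command.startswith(p + " ") for p in self.allowed_prefixes
        )


commit = SlashCommand(
    name="commit",
    template="Status:\n{status}\n\nStaged diff:\n{diff}\n\nWrite a message and commit.",
    context_commands={"status": "git status --short", "diff": "git diff --cached"},
    allowed_prefixes=["git commit", "git push", "gh"],
)
```

Because the status and diff are gathered before the model sees the prompt, the model does not need to spend turns asking for them, and the allow-list means `git commit` runs without a permission prompt while something like `git reset --hard` would still require one.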

BORIS: Yeah, and for me I use commit, PR, and feature dev, which we use a lot. Sid created that one. It's kind of cool. It walks you through step by step building something. So we prompt Claude to first ask me exactly what I want and build the specification, then build a detailed plan, and then make a to-do list and walk through it step by step. So it's kind of like more structured feature development. And then I think the last ones that we probably use a lot are security review, which we use for all of our PRs, and code review. So Claude does all of our code review internally at Anthropic. You know, there's still a human approving it, but Claude does the first step in code review. That's just a /code-review command.


V. How Boris Uses Claude Code to Plan Feature Development

DAN: I would love to go deeper into "how do you make a good plan?" and the feature dev thing, because I think there's a lot of little tricks that I'm starting to find or people are starting to find that work, and I'm curious what things we're missing. So for example, one unintuitive step of the plan development process is that even if I don't exactly know what the thing that needs to be built is, and I just have a little sentence in my mind like "I want feature X," I have Claude just implement it without giving it anything else and I see what it does. And that helps me understand like, "Okay, here's actually what I mean because it made all these different mistakes or it did something that I didn't expect that might be better." And then I use the learning from that sort of throwaway development. I just clear it out. And then that helps me write a better plan spec for the actual feature development, which is something that you would never do before because it'd be too expensive to just YOLO send an engineer on a feature that you hadn't actually specced out. But because you have Claude going through your codebase and doing stuff, you can learn stuff from it. That helps inform the actual plan that you make.

BORIS: Yeah. And I can start and I'm curious how you use it too. I think there's a few different modes. One is prototyping mode. So traditional engineering prototyping — you want to build the simplest possible thing that touches all the systems just so you can get a vague sense of like what are the systems, there's unknowns, and just to trace through everything. And so I do the exact same thing as you, Dan — Claude just does the thing and then I see where it messes up and then I'll ask it to just throw it away and do it again. So just hit escape twice, go back to the old checkpoint and then try again.

I think there's also maybe two other kinds of tasks. One is just things that Claude can one-shot and I feel pretty confident it can do it. So I'll just tell it and then I'll just go to a different tab and I'll Shift-Tab to auto-accept and then just go do something else or go to another one of my Claudes and tend to that while it does this.

But also there's this kind of harder feature development. These are things that maybe in the past it would have taken a few hours of engineering time. And for this usually I'll Shift-Tab into plan mode and then align on the plan first before it even writes any code. And I think what's really hard about this is the boundary changes with every model and in kind of a surprising way — the newer models, they're more intelligent so the boundary of what you need plan mode for got pushed out a little bit. Before you used to need to plan, now you don't. And I think it's this general trend of stuff that used to be scaffolding — with a more advanced model, it gets pushed into the model itself. And the model kind of tends to subsume everything over time.


VI. Building Scaffolding the Model Will Subsume

DAN: How do you think about building an agent harness that isn't just going to be — you're not spending a bunch of time building stuff that is just going to be subsumed into the model in 3 months when the new Claude comes out? How do you know what to build versus what to just say, "It doesn't work quite yet, but next time it's going to work, so we're not going to spend time on it."

CAT: I think we build most things that we think would improve Claude Code's capabilities, even if that means we'll have to get rid of it in 3 months. If anything, we hope that we will get rid of it in three months. I think for now, we just want to offer the most premium experience possible and so we're not too worried about throwaway work.

BORIS: And an example of this is something like even plan mode itself. I think we'll probably unship it at some point when Claude can just figure out from your intent that you probably want to plan first. Or you know, for example, I just deleted like 2,000 tokens or something from the system prompt yesterday just because Sonnet 4.5 doesn't need it anymore. But Opus 4.1 did need it.

DAN: What about the case where the latest frontier model doesn't need it but you're trying to figure out how to make it more efficient because you have so many users that you're maybe not going to use Opus or Sonnet 4.5 for everything. Maybe you're going to use Haiku. So there's a trade-off between having a more elaborate harness for Haiku versus just not spending time on it, using Sonnet, eating the cost, and working on more frontier type stuff.

CAT: In general, we've positioned Claude Code to be a very premium offering. So our north star is making sure that it works incredibly well with the absolutely most powerful model we have, which is Sonnet 4.5 right now. We are investigating how to make it work really well for future generations of smaller models, but it's not the top priority for us.

DAN: One thing that I notice — we get models often and thank you very much for this. We get models a lot before they come out and it's our job to kind of figure out if it's any good. And over the last six months, when I'm testing Claude, for example in the Claude app with a new frontier model, it's actually very hard to tell whether it's better immediately. But it's really easy to tell in Claude Code because the harness matters a lot for the performance that you get out of the model. And you guys have the benefit of building Claude Code inside of Anthropic. So there's a much tighter integration between the fundamental model training and the harness that you're building and they seem to really impact each other. How does that work internally?

BORIS: Yeah, I think the biggest thing is researchers just use this. And so as they see what's working, what's not, they can improve stuff. We do a lot of evals to communicate back and forth and understand where exactly the model's at. But yeah, there's this frontier where you need to give the model a hard enough task to really push the limit of the model. And if you don't do this, then all models are kind of equal. But if you give it a pretty hard task, you can tell the difference.


VII. Everything Anthropic Has Learned About Using Sub-Agents Well

DAN: What sub-agents do you use?

BORIS: I have a few. I have a planner sub-agent that I use. I have a code review sub-agent. Code review is actually something where sometimes I use a sub-agent, sometimes I use a slash command. Usually in CI it's a slash command, but in synchronous use I use a sub-agent for the same thing. It's kind of a matter of taste. I think when you're running synchronously, it's kind of nice to fork off the context window a little bit because all the stuff that's going on in the code review, it's not relevant to what I'm doing next. But in CI, it just doesn't matter.

DAN: Are you ever spawning like 10 sub-agents at once? And for what?

BORIS: For me, I do it mostly for big migrations. Actually we have this code review slash command that we use, and there's a bunch of sub-agents there. And so one of the steps is like find all the issues. So there's one sub-agent that's checking for CLAUDE.md compliance. There's another sub-agent that's looking through git history to see what's going on. Another sub-agent that's looking for obvious bugs. And then we do this deduping quality step after. So they find a bunch of stuff. A lot of these are false positives and so then we spawn like five more sub-agents and these are all just checking for false positives. And in the end, the result is awesome. It finds all the real issues without the false issues.
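The shape of that pipeline (independent finders, a dedupe step, then a second wave that filters false positives) can be sketched in Python. Everything here is a stand-in: plain functions play the role of sub-agents, and `review`, `bug_finder`, `style_finder`, `in_changed_file`, and `CHANGED_FILES` are hypothetical names, not anything from Claude Code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Finding = tuple[str, int, str]            # (file, line, message)
Finder = Callable[[str], list[Finding]]   # stand-in for a "find issues" sub-agent
Verifier = Callable[[Finding], bool]      # stand-in for a false-positive checker


def review(diff: str, finders: list[Finder], verifiers: list[Verifier]) -> list[Finding]:
    """Fan out to finders, dedupe, then keep findings that every verifier confirms."""
    with ThreadPoolExecutor(max_workers=max(1, len(finders))) as pool:
        batches = list(pool.map(lambda finder: finder(diff), finders))

    # Dedupe on (file, line), keeping the first report seen for each location.
    seen: set[tuple[str, int]] = set()
    unique: list[Finding] = []
    for batch in batches:
        for finding in batch:
            key = (finding[0], finding[1])
            if key not in seen:
                seen.add(key)
                unique.append(finding)

    def confirmed(finding: Finding) -> bool:
        return all(verify(finding) for verify in verifiers)

    with ThreadPoolExecutor(max_workers=max(1, len(unique))) as pool:
        keep = list(pool.map(confirmed, unique))
    return [finding for finding, ok in zip(unique, keep) if ok]


# Stand-in finders and verifier, for illustration only.
CHANGED_FILES = {"a.py"}


def bug_finder(diff: str) -> list[Finding]:
    return [("a.py", 3, "possible None dereference")]


def style_finder(diff: str) -> list[Finding]:
    return [("a.py", 3, "duplicate report"), ("b.py", 9, "unrelated file")]


def in_changed_file(finding: Finding) -> bool:
    return finding[0] in CHANGED_FILES


result = review("...", [bug_finder, style_finder], [in_changed_file])
```

Each finder and verifier sees only its own inputs, which loosely mirrors the point about separate context windows: the verification pass does not inherit whatever reasoning produced the finding in the first place.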

DAN: That's great. I actually do that. So one of my non-technical Claude Code use cases is expense filing. So like when I'm in SF, I have all these expenses. And so I built this little Claude project that uses one of these finance APIs to just download all my credit card transactions. And then it decides these are probably the expenses that I'm going to have to file. And then I have two sub-agents, one that represents me and one that represents the company. And they do battle to figure out what's the proper actual set of expenses — it's like an auditor sub-agent and a pro-Dan sub-agent.

BORIS: Yeah, the sort of opponent processor pattern seems to be an interesting one. I feel like when sub-agents were first becoming a thing, actually what inspired us — there's a Reddit thread a while back where someone made sub-agents for like there was a front-end dev and a backend dev and like a designer, testing dev, PM sub-agent. And this is like, you know, it's cute — it feels a little maybe too anthropomorphic — maybe there's something to this. But I think the value is actually the uncorrelated context windows where you have these two context windows that don't know about each other and this is kind of interesting and you tend to get better results this way.

DAN: What about you? Do you have any interesting sub-agents you use?

CAT: I've been tinkering with one that is really good at front-end testing. So it uses Playwright to see all right, what are all the errors that are client side and pull them in and try to test more steps of the app. It's not totally there yet, but I'm seeing signs of life and I think it's the kind of thing that we could potentially bundle in one of our plugin marketplaces.

BORIS: I've used something like that just with Puppeteer and just watching it build something and then open up the browser and then be like, "Oh, I need to change this." It's like, "Oh my god." It's really cool. I think we're starting to see the beginnings of this massive multi-sub-agent thing. I don't know what they call this — swarms or something like that. There's actually an increasing number of people internally at Anthropic that are using a lot of credits every month — like spending over a thousand bucks every month. And this percent of people is growing actually pretty fast. And I think the common use case is code migration. What they're doing is framework A to framework B. There's the main agent, it makes a big to-do list for everything and then just kind of map-reduces over a bunch of sub-agents. So you instruct Claude like "start 10 agents and then just go 10 at a time and just migrate all the stuff over."
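The "go 10 at a time" map-reduce pattern is, in ordinary programming terms, a bounded-concurrency fan-out. A minimal Python sketch follows, assuming a hypothetical async `worker` that stands in for one sub-agent migrating one file; none of these names come from Claude Code.

```python
import asyncio
from typing import Awaitable, Callable


async def migrate_all(
    todo: list[str],
    worker: Callable[[str], Awaitable[str]],
    limit: int = 10,
) -> dict[str, str]:
    """Map `worker` over every to-do item with at most `limit` running at once."""
    gate = asyncio.Semaphore(limit)

    async def run_one(item: str) -> tuple[str, str]:
        async with gate:  # wait here while `limit` workers are already busy
            return item, await worker(item)

    pairs = await asyncio.gather(*(run_one(item) for item in todo))
    # Reduce step: collect the per-item results into one report.
    return dict(pairs)


async def fake_worker(path: str) -> str:
    """Stand-in for a sub-agent; a real one would rewrite the file."""
    await asyncio.sleep(0)
    return f"migrated {path}"


report = asyncio.run(migrate_all(["a.test.js", "b.test.js", "c.test.js"], fake_worker))
```

The main agent's to-do list plays the role of `todo`, and the final dictionary is what it would read back to decide which items still need attention.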

DAN: What would be a concrete example of the kind of migration that you're talking about?

BORIS: I think the most classic is lint rules. There's some kind of lint rule you're rolling out. There's no autofixer because static analysis can't really — it's kind of too simplistic for it. I think other stuff is framework migrations. We just migrated from one testing framework to a different one. That's a pretty common one where it's super easy to verify the output.


VIII. Use Claude Code to Turn Past Code Into Leverage

DAN: One of the things I found — and this is both for projects inside of Every and then just open source projects — if you're someone building a product and you want to build a feature that's been done before, so maybe an example that people might need to implement a bunch is memory. How do you do memory? Because we have a bunch of different products internally, you can just spawn Claude sub-agents to be like, "How do these three other products do it?" And there's possibility for just tacit code sharing where you don't need to have an API or you don't need to ask anyone. You can just be like, "How do we do this already?" And then use the best practices to build your own. And you can also do that with open source because there's tons of open source projects where people have been working on memory for a year and it's really good. You can be like, "What are the patterns that people have figured out and which ones do I want to implement?"

CAT: Totally. You can also connect your version control system. If you've built a similar feature in the past, Claude Code can use those APIs like query GitHub directly and find how people implemented a similar feature in the past and read that code and copy the relevant parts.


IX. Memory, Logs, and Compounding Engineering

DAN: Is there — have you found any use for log files of, "Okay, here's the full history of how I implemented it." And is that important to give to Claude? And how are you making it useful?

BORIS: Some people swear by it. There are some people at Anthropic where for every task they do, they tell Claude Code to write a diary entry in a specific format that just documents like what did it do, what did it try, why didn't it work. And then they even have these agents that look over the past memory and synthesize it into observations. I think this is the starting — the budding — there's something interesting here that we could productize. But it's a new emerging pattern that we're seeing that works well. I think the hard thing about one-shotting memory from just one transcript is that it's hard to know how relevant a specific instruction is to all future tasks. Like our canonical example is if I say "make the button pink," I don't want you to remember to make all buttons pink in the future. And so I think synthesizing memory from a lot of logs is a way to find these patterns more consistently.
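One simple way to picture "synthesizing memory from a lot of logs" is to promote an observation only when it recurs across several tasks. The Python sketch below assumes each diary entry is already reduced to a list of short observation strings, which is a strong simplification; the function name and the threshold of three tasks are hypothetical.

```python
from collections import Counter


def synthesize(diary_entries: list[list[str]], min_tasks: int = 3) -> list[str]:
    """Promote an observation to long-term memory only if it recurs across tasks."""
    counts = Counter()
    for entry in diary_entries:
        # Count each observation once per task so one chatty log cannot dominate.
        counts.update({obs.strip().lower() for obs in entry})
    return sorted(obs for obs, n in counts.items() if n >= min_tasks)


logs = [
    ["Run lint before commit", "make the button pink"],
    ["run lint before commit"],
    ["Run lint before commit ", "tests live in tests/unit"],
]
memory = synthesize(logs, min_tasks=3)
```

Here the one-off "make the button pink" instruction never reaches memory, while the instruction that shows up in every task does. Exact string matching is only a placeholder for whatever similarity judgment a real system, or a model reading the logs, would make.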

DAN: It seems like you probably need — there's some things where you're going to know you'll be able to synthesize or summarize in this sort of top-down way — like this will be useful later — and you'll know the right level of abstraction at which it might be useful. But then there's also a lot of stuff where it's like any given commit log like "make the button pink" could be useful for kind of an infinite number of different reasons that you're not going to know beforehand. So you also need the model to be able to look up all similar past commits and surface that at the right time. Is that something that you're also thinking about?

BORIS: Yeah, I think there could be something like that. And maybe one way to see it is this kind of traditional memory storage work — like memex kind of stuff — where you just want to put all the information into the system and then it's kind of a retrieval problem after that. I think as the model also gets smarter, it naturally — I've seen it start to naturally do this with Sonnet 4.5 where if it's stuck on something, it'll just naturally start looking through git history and be like, "Oh, okay. Yeah, this is kind of an interesting way to do it."

DAN: One of the things we're doing inside of Every, and I feel like it has really changed the way that we do engineering because everyone builds with the Claude Code CLI, is this engineering paradigm that we call compounding engineering. In normal engineering every feature you add makes it harder to add the next feature. And in compounding engineering your goal is to make the next feature easier to build from the feature that you just added. And the way that we do that is we try to codify all the learnings from everything that we've done to build the feature. So like how did we make the plan and what parts of the plan needed to be changed? Or when we started testing it, what issues did we find? What are the things that we missed? And then we codify them back into all the prompts and all the sub-agents and all the slash commands so that the next time when someone does something like this, it catches it and that makes it easier.

And that's why for me, for example, I can hop into one of our codebases and start being productive even though I don't know anything about how the code works because we have this built-up memory system of all the stuff that we've learned as we've implemented stuff. But we've had to build that ourselves. I'm curious, are you working on that kind of loop so that Claude Code does that automatically?

BORIS: Yeah, we're starting to think about it. It's funny. We just heard the same thing from Fiona. She just joined the team. She's our manager. She hasn't coded in like 10 years, something like that. And she was landing PRs on her first day. And she was like, "Yeah, not only did I kind of forget how to code and Claude Code made it super easy to just get back into it, but also I didn't need to ramp up on any context because I kind of knew all this." And I think a lot of it is about when people put up pull requests for Claude Code itself (and I think our customers tell us that they do similar stuff pretty often). If I see a mistake I'll just be like, "@claude add this to CLAUDE.md" so that the next time it just knows this automatically.

You can instill this memory in a variety of ways. You can say "@claude add it to CLAUDE.md." You can also say "@claude write a test." You know, that's an easy way to make sure this doesn't regress. And I don't feel bad asking anyone to write tests anymore, right? It's just super easy. I think probably close to 100% of our tests are just written by Claude. And if they're bad, we just won't commit them. And then the good ones stay committed. And then also lint rules are a big one. For stuff that's enforced pretty often, we actually have a bunch of internal lint rules. Claude writes 100% of these. And this is mostly just "@claude in a PR write this lint rule."


X. The Product Decisions for Building an Agent That's Simple and Powerful

BORIS: And yeah, there's sort of this problem right now about how do you do this automatically? And I think generally how Cat and I think about it is we see this power user behavior and the first step is how do you enable that by making the product hackable so the best users can figure out how to do this cool new thing. But then really the hard work starts of how do you take this and bring it to everyone else.

BORIS: And for me, I keep myself in the "everyone else" bucket. Like, you know, I don't really know how to use Vim. I don't have this crazy tmux setup. I have a pretty vanilla setup. So if you can make a feature that I'll use, it's a pretty good indicator that other average engineers will use it.

DAN: Tell me about that because that's something I think about all the time — making something that is extensible and flexible enough that power users can find novel ways to use it that you would not have even dreamed of. But it's also simple enough that anyone can use it and they can be productive with it. And you can pull what the power users find back into the basic experience. How do you think about making those design and product decisions so that you enable that?

BORIS: In general we think that every engineering environment is a little bit different from the others and so it's really important that every part of our system is extensible. Everything from your status line to adding your own slash commands through to hooks which let you insert a bit of determinism at pretty much any step in Claude Code. So we think these are the basic building blocks that we give to every engineer that they can play with.

CAT: For plugins — plugins is actually our attempt to make it a lot easier for the average user like us to bring these slash commands and hooks into our workflows. And so what plugins does is it lets you browse existing MCP servers, existing hooks, existing slash commands and just write one command in Claude Code to pull that in for yourself.

BORIS: There's this really old idea in product called latent demand which I think is probably the main way that I personally think about product and thinking about what to build next. It's a super simple idea. You build a product in a way that is hackable that is kind of open-ended enough that people can abuse it for other use cases it wasn't really designed for. Then you see how people abuse it and then you build for that because you kind of know there was demand for it. And when I was at Meta, this is how we built all the big products. I think almost every single big product had this nugget of latent demand in it. For example, something like Facebook Dating — it came from this idea that when we looked at who looks at people's profiles, I think 60% of views were between people of opposite gender — kind of traditional setup — that were not friends with each other. And so we're like, "Okay, maybe if we launch a dating product we can harness this demand that exists." For Marketplace it was pretty similar — I think 40% of posts in Facebook groups at the time were buy/sell posts. And so, "Okay, people are trying to use this product to buy and sell. We just build a product around it — that's probably going to work."

And so we think about it kind of similarly. But also we have the luxury of building for developers and developers love hacking stuff and they love customizing stuff. And as a user of our own product, it makes it so fun to build and use this thing. And so we just build the right extension points. We see how people use it and that kind of tells us what to build next. Like for example, we got all these user requests where people were like, "Dude, Claude Code is asking me for all these permissions and I'm out here getting coffee. I don't know that it's asking me for permissions. How could I just get it to ping me on Slack?" And so we built hooks. Dixon built hooks so that people could get pinged on Slack. And you could get pinged on Slack for anything that you want to get pinged on Slack for. And it was very much — people really wanted the ability to do something. We didn't want to build the integration ourselves. And so we exposed hooks for people to do that.
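
A Slack ping of the kind described here could be wired up with a small hook script. The following is a minimal sketch, not the team's implementation. It assumes the hook command receives the event as JSON on standard input with a `message` field, and that a Slack incoming-webhook URL lives in a hypothetical `SLACK_WEBHOOK_URL` environment variable. Neither detail is stated in the conversation, so the exact input shape should be checked against the hooks documentation.

```python
import json
import os
import sys
import urllib.request


def build_slack_payload(event: dict) -> dict:
    """Turn a hook event into the JSON body a Slack incoming webhook accepts."""
    # "message" is an assumed field name; fall back to a generic note if absent.
    message = event.get("message", "waiting for your input")
    return {"text": f"Claude Code: {message}"}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to Slack and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def main() -> None:
    """Hook entry point: read the event from stdin and forward it to Slack."""
    event = json.load(sys.stdin)
    # SLACK_WEBHOOK_URL is a hypothetical variable name chosen for this sketch.
    post_to_slack(os.environ["SLACK_WEBHOOK_URL"], build_slack_payload(event))
```

A script like this would call `main()` at the bottom and be registered as the command for a notification-style hook. Keeping the payload builder separate from the network call makes the formatting easy to check without touching Slack.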


XI. Making Claude Code Accessible to the Non-Technical User

DAN: You recently rebranded how you talk about Claude Code to be this more general purpose agent SDK. Was that driven by some latent demand where you sort of saw there's a more general purpose use case for what you built?

CAT: We realized that similar to how you were talking about using Claude Code for things outside of coding, we saw this happen a lot. We get a ton of stories of people who are using Claude Code to help them write a blog and manage all the data inputs and take a first pass in their own tone. We find people building email assistants on this. I use it for a lot of just market research. Because at the core it's an agent that can just go on for an infinite amount of time as long as you give it a concrete task and it's able to fetch the right underlying data. So one of the things I was working on was I wanted to look at all the companies in the world and how many engineers they had and to create a ranking. And this is something that Claude Code can do even though it's not a traditional coding use case.

So we realized that the underlying primitives were really general. As long as you have an agent loop that can continue running for a long period of time and you're able to access the internet and write code and run code, pretty much — if you squint — you can kind of build anything on it. And by the point where we rebranded it from the Claude Code SDK to the Claude Agent SDK, there were already many thousands of companies using this thing and a lot of those use cases were not about coding. Both internally and externally we saw that — health assistants, financial analysts, legal assistants. It was pretty broad.

DAN: What are the coolest ones?

BORIS: I feel like actually you had Noah Brier on the podcast recently. I thought the Obsidian mind-mapping note-keeping use case is really cool. It's funny — it's insane how many people use it for this particular combination. I think some other coding or coding-adjacent use cases that are kind of cool — we have this issue tracker for Claude Code. The team's just constantly underwater trying to keep up with all the issues coming in. There's just so many. And so Claude dedupes the issues and it automatically finds duplicates and it's extremely good at it. It also does first pass resolution. So usually when there's an issue it'll proactively put up a PR internally — this is a new thing that Enigo on the team built. So this is pretty cool. There's also on-call and collecting signals from other places like getting Sentry logs and getting logs from BigQuery and collating all this. Claude's just really good at doing this because it's all just bash in the end.

DAN: Is it — when it's collating logs or doing issues, is that like you have Claudes continually running in the background? And is that something that you're building for?

BORIS: It gets triggered for that particular one. It gets triggered whenever a new issue is filed. So it runs once but it can choose to run for as long as it needs.

DAN: What about the idea of Claudes always running?

BORIS: Ooh, proactive Claudes. I think it's definitely where we want to get to. I would say right now we're very focused on making Claude Code incredibly reliable for individual tasks. And if you think about multi-line autocomplete and then single-turn agents and then now we're working on Claude Code that can complete tasks — if you trace this curve eventually you go to even higher levels of abstraction, even more complicated tasks. And then hopefully the next step after that is a lot more proactivity. Just understanding what your team's goals are, what your goals are, being able to say, "Hey, I think you probably want to try this feature and here's a first pass at the code and here are the assumptions I made. Are these correct?"

CAT: I can't wait. And I think probably right after that is Claude is now your manager.

BORIS: That's not in the plan.


XII. The Next Form Factor for Coding With AI

DAN: Here's a good one from the team. Why did you choose agentic RAG over vector search in your architecture? And are vector embeddings still relevant?

BORIS: Actually initially we did use vector embeddings. They're just really tricky to maintain because you have to continuously reindex the code and they might get out of date and you have local changes. So those need to make it in. And then as we thought about what does it feel like for an external enterprise to adopt it, we realized that this exposes a lot more surface area and security risk. We also found that actually Claude Code is really good and Claude models are really good at agentic search. So you can get to the same accuracy level with agentic search and it's just a much cleaner deployment story. If you do want to bring semantic search to Claude Code, you can do so via an MCP tool. So if you want to manage your own index and expose an MCP tool that lets Claude Code call that, that would work.
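
To make the contrast with an embedding index concrete, here is a toy version of the kind of search step an agent can issue on demand: a plain regex scan over whatever is on disk right now. It is a sketch of the idea only, not Claude Code's implementation, and `search_files` and its default suffix list are invented for this example. Because it reads the live files, there is no index to rebuild when local changes land.

```python
import re
from pathlib import Path


def search_files(
    root: str,
    pattern: str,
    suffixes: tuple[str, ...] = (".py", ".ts", ".md"),
) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for each matching line."""
    regex = re.compile(pattern)
    hits = []
    # Walk the tree in a stable order so results are reproducible.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path.relative_to(root)), number, line.strip()))
    return hits
```

An agent would typically run several such searches in sequence, refining the pattern based on what the previous call returned, which is where the "agentic" part comes from.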

DAN: What do you think are the top MCPs to use with Claude Code?

BORIS: Puppeteer and Playwright are pretty high up there. Definitely. Sentry has a really good one. Asana has a really good one.

DAN: Do you think there are any power user tips that you see people inside of Anthropic or other big Claude Code power users that people don't know about but should?

BORIS: One thing that Claude Code doesn't naturally like to do, but that I personally find very useful, is — Claude Code doesn't naturally like to ask questions. But if you're brainstorming with a thought partner, a collaborator, usually you do ask questions back and forth. And so this is one of the things that I like to do, especially in plan mode. I'll just tell Claude Code, "Hey, we're just brainstorming this thing. Please ask me questions if there's anything you're unsure about. I want you to ask questions." And it'll do it. And I think that actually helps you arrive at a better answer there.

There's also so many tips that we can share. I think there's a few really common mistakes I see people make. One is not using plan mode enough. This is just super important. And I think people that are kind of new to coding — they kind of assume this thing can do anything and it can't. It's not that good today and it's going to get better but today it can one-shot some tasks. It can't one-shot most things. And so you kind of have to understand the limits and you have to understand where you get in the loop. Something like plan mode can 2–3x success rates pretty easily if you land on the plan first.

Other stuff that I've seen power users do really well — companies that have really big deployments of Claude Code — having settings.json that you check into the codebase is really important because you can use this to pre-allow certain commands so you don't get permission-prompted every time and also to block certain commands. Let's say you don't want web fetch or whatever. And this way as an engineer I don't get prompted and I can check this in and share it with the whole team so everyone gets to use it.
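
A checked-in settings file of the kind described here might look like what the sketch below generates. The `permissions` object with `allow` and `deny` lists is assumed to follow the shape of Claude Code's project settings, and the specific rules (`npm run test`, `git diff`, `WebFetch`) are illustrative choices rather than anything named in the conversation, so the rule syntax should be confirmed against the current settings documentation.

```python
import json

# Illustrative team policy; every rule string below is an assumption for this sketch.
team_settings = {
    "permissions": {
        # Commands pre-approved so nobody on the team is prompted for them.
        "allow": [
            "Bash(npm run test:*)",
            "Bash(git diff:*)",
        ],
        # Tools blocked for everyone who uses this repository.
        "deny": [
            "WebFetch",
        ],
    }
}


def render_settings(settings: dict) -> str:
    """Serialize the policy as the JSON text that would be committed to the repo."""
    return json.dumps(settings, indent=2) + "\n"


print(render_settings(team_settings))
```

The printed JSON is what would be saved and committed, for example as `.claude/settings.json` at the repository root, so every engineer who clones the project picks up the same allow and deny rules.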

DAN: I get around that by just using "dangerously skip permissions."

BORIS: Yeah, we kind of have this but we don't recommend it. It's a model, you know, it can do weird stuff. I think another cool use case that we've seen is people using stop hooks for interesting stuff. So stop hook runs whenever the turn is complete. The assistant did some tool calls back and forth and it's done and it returns control back to the user — then we run the stop hook. And so you can define a stop hook that's like, "If the tests don't pass, return the text 'keep going.'" And essentially you can just make the model keep going until the thing is done. And this is just insane when you combine it with the SDK and this kind of programmatic usage — you know, this is a stochastic thing, it's a nondeterministic thing, but with scaffolding you can get these deterministic outcomes.
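
The "keep going until the tests pass" loop can be sketched as a small stop-hook script. This is a minimal illustration under stated assumptions: that the hook receives the event as JSON on standard input, and that printing a JSON object with `decision` set to `"block"` plus a `reason` sends the model back to work instead of ending the turn. The `pytest -q` command is a placeholder for whatever test runner a project uses.

```python
import json
import subprocess
import sys
from typing import Optional


def stop_decision(tests_passed: bool) -> Optional[dict]:
    """Return a 'block' decision while tests fail, or None to let the turn end."""
    if tests_passed:
        return None
    # Assumed output shape: the reason text is what the model sees as its next instruction.
    return {"decision": "block", "reason": "Tests are failing. Keep going until they pass."}


def run_tests(command: list[str]) -> bool:
    """Run the test command and report whether it exited successfully."""
    return subprocess.run(command, capture_output=True).returncode == 0


def main() -> None:
    """Hook entry point: consume the event, run the tests, emit a decision if needed."""
    json.load(sys.stdin)  # the event payload is read but not otherwise used in this sketch
    decision = stop_decision(run_tests(["pytest", "-q"]))  # placeholder test command
    if decision is not None:
        print(json.dumps(decision))
```

A script like this would call `main()` at the bottom. As written it has no retry cap, so a real setup would likely want a guard against looping forever on a test that can never pass.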

DAN: So you guys started this CLI paradigm shift. Do you think the CLI is the final form factor? Are we going to be using Claude Code in the CLI primarily in a year or in three years, or is there something else that's better?

CAT: I mean, it's not the final form factor, but we are very focused on making sure the CLI is the most intelligent that we can make it and that it's as customizable as possible.

BORIS: Yeah, Cat's asking me to talk about this because no one knows — this stuff's just moving so fast. No one knows what these form factors are. Right now I think our team is in experimentation mode. So we have CLI, then we came out with the IDE extension. Now we have a new IDE extension that's a GUI — it's a little more accessible. We have @claude on GitHub so you can just add Claude anywhere. Now there's @claude, there's Claude on web and on mobile, so you can use it on any of these places. And we're just in experimentation mode, so we're trying to figure out what's next.

I think if we kind of zoom out and see where this stuff is headed, one of the big trends is longer periods of autonomy. And so with every model, we kind of time how long can the model just keep going and do tasks autonomously. And just, you know, in dangerous mode in a container, keep auto-compacting until the task is done. And now we're on the order of double-digit hours. I think the last model is like 30 hours, something like this. And the next model is going to be days.

And as you think about parallelizing models, there's a bunch of problems that come out of this. One is what is the container this thing runs in because you don't want to have to keep your laptop open.

DAN: I have that right now because I'm doing a lot of DSPy prompt optimization and it's on my laptop — I'm in the middle of it with my laptop open because I don't want to close it.

BORIS: Yeah. That's right. We've visited companies before — customers — and everyone's just walking around with their Claude Codes open. "Is this running?" So I think one is kind of getting away from this mode. And then I also think pretty soon we're going to be in this mode of Claudes monitoring Claudes. And I don't know what the right form factor for this is because as a human you need to be able to inspect this and see what's going on. But also it needs to be Claude-optimized where you're optimizing for bandwidth between the Claude-to-Claude communication. So my prediction is terminal is not the final form factor. My prediction is there's going to be a few more form factors in the coming months — maybe like a year or something like that. And it's going to keep changing very quickly.


XIII. UX Discoveries and Terminal Design

DAN: I teach a lot of Claude Code to a lot of Every subscribers. And I think one of the big things is just the terminal is intimidating. And just being on a call with subscribers being like, "Here's how you open the terminal and you're allowed to do this even if you're non-technical" — that is a big deal. How do you think about that?

BORIS: One of the people on our marketing team started using Claude Code because she was writing some content that touched on Claude Code and I was like, "You should really experience it." And she got like 30 popups on her screen where she had to accept various permissions because she'd never used a terminal before. So I completely see eye to eye with you on that. It's definitely hard for non-engineers and there's even some engineers we've found who aren't fully comfortable with working day-to-day in the terminal. Our VS Code GUI extension is our first step in that direction because you don't have to think about the terminal at all. It's like a traditional interface with a bunch of buttons. I think we are working on more graphical interfaces. Claude Code on the web is a GUI. I think that actually might be a good starting point for people who are less technical.

There was this magic moment maybe a few months ago where I walked into the office and the data scientists at Anthropic — they sit right next to the Claude Code team — and the data scientists just had Claude Code running on their computers and I was like, "What is this? How did you figure this out?" I think it was Brandon — he was the first one to do it and he was like, "Oh yeah, I just installed it. I work on this product so I should use it." And I was like, "Oh my god." So he figured out how to use a terminal and JS — he hasn't really done this kind of workflow before. Obviously very technical. So I think now we're starting to see all these code-adjacent functions — people that use Claude Code. And yeah, it's kind of interesting from a latent demand point of view. These are people hacking the product so there's demand to use it for this. And so we want to make it a little bit easier with more accessible interfaces. But at the same time, for Claude Code, we're laser focused on building the best product for the best engineers. We're focused on software engineering and we want to make this really good but we want to make it a thing that other people can hack.

DAN: Sometimes Claude Code will write code that's a bit verbose. But you can just tell it to simplify it and it does a really good job.

BORIS: Yeah. Sometimes you're like, "Hey, this should be a one-line change" and it'll write five lines and you're like, "Simplify it" and it understands immediately what you mean and it'll fix it. I think a lot of people on our team do that, too.

DAN: Why not then push that into a slash command or the harness to make it just happen automatically?

BORIS: We do have instructions for this in the CLAUDE.md. I think it impacts such a low percentage of conversations that we don't want it to over-rotate in the other direction. And the reason why not a slash command is because you actually don't need that much context. I think slash commands are really good for situations where you would otherwise need to write two-three lines. But for "simplify it" you can just write "simplify it" and it gets it.

DAN: How do you keep track of and carry forward the things you learn from prototype to prototype? Especially if one person is prototyping it and then you're like, "I'm going to take it over, I'm going to do 20 more."

BORIS: There's maybe a few elements of it. One is the style guide. There's elements of style that we discover. And I think a lot of this is building for the terminal — we're kind of discovering a new design language for the terminal and building it as we go. And I think some of this you can codify in a style guide. So this is our CLAUDE.md. But then there's this other part that's kind of product sense where I don't think the model totally gets it yet. And maybe we should be trying to find ways to teach the model this product sense about "this works and this doesn't." Because in product, you want to solve the person's problem in the simplest way possible and then delete everything else that's not that and just get everything out of the way. You align the product to the intent as cleanly as possible. And maybe the model doesn't totally get that yet.

DAN: It never — it doesn't really feel what it's like to use Claude Code. The model doesn't use Claude Code.

BORIS: Yeah. And so I think when Claude Code can test itself and it can use itself — and we do this when developing and it can see UI bugs and things like that — I don't know, maybe we should just try prompting it though. Honestly a lot of the stuff is as simple as that. When there's some new idea usually you just prompt it and often it just works. Maybe we should just try that.

CAT: A lot of the prototypes are actually the UX interactions. And so I think once we discover a new UX interaction like Shift-Tab for auto-accept — I think Boris figured out —

BORIS: That was Igor actually. We went back and forth — we did like dueling prototypes for like a week.

CAT: Yeah, Shift-Tab felt really nice. And then one of the current plan mode iterations uses Shift-Tab because it's actually just another way to tell the model how agentic it should be. And so I think as more features use the same interaction, you form a stronger mental model for what should go where.

BORIS: Or like thinking — I think is another really good one. Before we released Claude Code, or maybe it was the first thinking model — was it 3.7? I forget. But it was able to think and we were brainstorming how do we toggle thinking? And then someone was just like, "What if you just ask the model to think in natural language?" And it knows how to think. And we're like, "Okay, sweet, let's do that." And we did that for a while and then we realized that people were accidentally toggling it. So they were like "don't think" and then the model was like, "Oh, I should think." It just started thinking. And so we had to tune it out so "don't think" didn't trigger it. But then it still wasn't obvious. But then we made a UX improvement to highlight the thinking and that was so fun. It felt really magical. When you do "ultra think" it's like rainbow or whatever.

And then with Sonnet 4.5 we actually find a really big performance improvement when you turn on extended thinking. And so we made it really easy to toggle it because sometimes you want it, sometimes you don't — for a really simple task, you don't want the model to think for five minutes. You want it to just do the thing. And so we used Tab as the interaction to toggle it. And then we unshipped a bunch of the thinking words. Although I think we kept "ultra think" just for sentimental reasons. It was such a cool UX.
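
The accidental-toggle problem with "don't think" is easy to reproduce in a toy keyword detector. The sketch below is purely illustrative and makes no claim about how Claude Code actually detected the phrase. The trigger words and the short negation list are assumptions chosen for the example.

```python
import re

# Assumed trigger words for this sketch; the conversation does not list the real ones.
TRIGGER = re.compile(r"\b(?:ultrathink|think)\b", re.IGNORECASE)
# A negation sitting immediately before the trigger word, e.g. "don't think".
NEGATED_PREFIX = re.compile(r"\b(?:don't|do not|never)\s+$", re.IGNORECASE)


def naive_wants_thinking(prompt: str) -> bool:
    """First-cut detector: fires on any trigger word, including negated ones."""
    return TRIGGER.search(prompt) is not None


def wants_thinking(prompt: str) -> bool:
    """Fire only when at least one trigger word is not directly negated."""
    for match in TRIGGER.finditer(prompt):
        if not NEGATED_PREFIX.search(prompt[: match.start()]):
            return True
    return False
```

With these definitions, `naive_wants_thinking("don't think, just rename it")` is true while `wants_thinking` returns false for the same prompt, which mirrors the bug and the tuning described above. An explicit keyboard toggle sidesteps the ambiguity altogether.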


XIV. The Art of Unshipping

DAN: Do you think there's some new metric that's about what you deleted? I think programmers have always felt like deleting a bunch of code feels really good, but there's something about — because you can build stuff so fast, it becomes more important to also delete stuff.

BORIS: I think my favorite kind of diff to see is a red diff. This is the best. Whenever I see one I'm like, "Yeah, bring it on. Another one." But it's hard because anything you ship, people are using it. And so you got to keep people happy. I think generally our principle is if we unship something, we need to ship something even better that people can take advantage of that matches that intent even better.


XV. Productivity and the Competitive Landscape

BORIS: And yeah, I think this is kind of back to how do you measure Claude Code and the impact of it. This is something every company, every customer asks us about. Internally at Anthropic I think we doubled in size since January or something like that but then productivity per engineer has increased almost 70% in that time, measured by — I think we actually measured it in a few ways — but PRs are the simplest one and the main one. But like you said, this doesn't capture the full extent of it because a lot of this is making it easier to prototype, making it easier to try new things, making it easier to do these things that you never would have tried because they're way below the cut line. You're launching a feature and there's this wish list of stuff — now you just do all of it because it's so easy and you just wouldn't have done it.

So yeah, it's really hard to talk about. And then there's this flip side of it where more code is written. So you have to delete more code. You have to code-review more carefully and automate code review as much as you can. There's also an interesting new product management challenge because you can ship so much that you end up — it doesn't feel as cohesive because you could just add a button here and a tab there and a little thing here. It's much easier to build a product that has all the features you want but doesn't have any sort of organizing principle because you're just shipping lots of stuff all the time.

CAT: I think we try to be pretty disciplined about this and making sure that all the abstractions are really easy to understand for someone even if they just hear the name of the feature. We have this principle that I believe Boris brought to the team that I really like where we don't want a "new user experience." Everything should be so intuitive that you just drop in and it just works. And I think that's really set the bar really high for making sure every feature is really intuitive.

DAN: How do you do that with a conversational UI? Because when there's not a bunch of buttons and knobs and it's just a blank text box to start, how do you think about making it intuitive?

BORIS: There's a lot of little things that we do. We teach people that they can use the question mark to see tips. We show tips as Claude Code is working. We have the change log on the side. We tell you about, "Oh, there's a new model that's out" or we show you at the bottom — we have a notification section for thinking. I think there's just subtle ways in which we tell users about features. The other thing that's really important is to just make sure that all the primitives are very clearly defined — hooks have a common meaning in the developer ecosystem. Plugins have a very common meaning. And just making sure that what we build matches what the average developer would immediately think of when they hear that.

There's also this progressive disclosure thing — anytime in Claude Code when you run it you can hit Ctrl-O to see the full raw transcript, the same thing the model sees. And we don't show you this until it's actually relevant. So when there's a tool result that's collapsed, then we'll say "use Ctrl-O to see it." So we don't want to put too much complexity on you at the start because this thing can do anything.

I think there's this other new principle which we've just started exploring which is the model teaches you how to use the thing. So you can ask Claude Code about itself and it kind of knows to look up its own documentation to tell you about it. But we can also go even deeper — for example, slash commands are a thing that people can use but also the model can call slash commands. And maybe you see the model calling it and then you'll be like, "Oh yeah, I guess I can do that too."

DAN: How has it changed — when you first started doing this, Claude Code was this singular thing, this singular way of thinking about using AI through a CLI. Other people had stuff like this but it felt like this shift. And now there's a whole landscape of everyone going "CLI, CLI, CLI." How has that changed how you think about building, how it feels to build, and how are you dealing with the pressure of the race that you're in?

BORIS: I think for me, imitation is the greatest flattery. So it's awesome and it's cool to see all this other stuff that everyone else is building inspired by this. And I think this is ultimately the goal — to inspire people to build this next thing for this incredible technology that's coming. And that's just really exciting. Personally, I don't really use a lot of other tools. Usually when something new comes out, I'll maybe just try it to get a vibe. But otherwise I think we're pretty focused on just solving problems that we have and our customers have and building the next thing.

DAN: I think there's this underlying expectation that using AI shouldn't have to be a skill because it just does whatever you say. And you're like, well, whatever you say is going to matter for what it does. So if you can say things better it's going to do better.

BORIS: It changes with every model though. That's the hard part. Prompt engineer was a job and now famously it's not a job anymore. And there's going to be more jobs that are not jobs anymore — these kind of micro-skills that you have to learn to use this thing. And as the model gets better it can just interpret it better. But I think that's also for us — this is part of this humility that we have to have building a product like this that we just really don't know what's next and we're just trying to figure it out along with everyone else. We're just here for the ride.

DAN: That's why it's cool that you're building it for yourself because I think that's the best way to know. You're sort of living in the future. You're using it all the time. And it's pretty clear what's missing.

BORIS: Yeah. This is the luxurious thing about building dev tools — you're your own customer. I think it's also really a unique thing about AI because it sort of reset the game board for all software. Anything that you do for something that you want to use on your computer — if you're building it with AI, there's a good chance that hasn't been done before because the whole landscape has been reset. And so it's a uniquely exciting time to build stuff for yourself.


XVI. Outro

DAN: I also have my little email response agent that drafts responses for me but I don't use email that much so —

BORIS: Oh, and I knew it wasn't you responding. That's why it's seven days delayed.

DAN: The agent's just doing a very thorough job.

BORIS: Yeah, Agent SDK is cool though. It always just feels amazing how much we're able to build with such a small team. I feel like the other thing that's really cool is that people are just shifting their mindset from docs to demos. Internally, our currency is actually demos. You want people to be excited about your thing — show us 15 seconds of what it can do. And we find that everyone on the team now has this indoctrinated demo culture for sure. And I think that's better because there's a lot of things that you might have in your head that if you're a great writer, maybe you could figure out how to explain it. But it's just really hard to explain. But if someone can see it, they get it immediately.

And I think that's happening for product building, but it's also happening for all sorts of other types of creative endeavors like making a movie for example. You had to pitch it, but now you can just be like, "I made this Sora video" and you can kind of see the glimmer of the thing you're trying to make for very cheap. And so that means you don't have to spend time convincing people as much. You can just be like, "Here, I made it."

DAN: And also as a builder you can just make it and then make it again and then make it again until you're happy. I feel like the flip side is you used to make a doc or whiteboard something or I would draw stuff in Sketch or Figma or whatever. And now we'll just build it until I like how it feels. And it's just so easy to get that feeling out of it now. You could see it visually before or you could describe it in words but you could never get the vibe. And now the vibe is really easy.

CAT: Yeah. And you built plan mode like three times. Yeah, because of this. You built it and then you threw it out and rebuilt it and then threw it out and rebuilt it.

BORIS: Or like to-do's — Sid built the original version, also three or four prototypes, and then I prototyped maybe 20 versions after that in a day. I think pretty much everything we released there was at least a few prototypes behind it.

DAN: I loved this. Did we answer all of your team's questions?

BORIS: I think we did.

DAN: Well, thank you. This was amazing. I'm really glad I got to talk to you and keep building.

BORIS: Thank you for having us.

CAT: Yeah. Thanks.


Transcript source: "AI and I" (Every) — The Secrets of Claude Code From the Engineers Who Built It. Formatted for readability.

Introducing Claude Code, from Boris Cherny and Cat Wu

Introduction

"Should we be doing like big smile or?" — "No, what you're doing—" — "Big smile's creepy." — "That's sort of what I'm getting at."

BORIS: I'm Boris, I'm an engineer.

CAT: I'm Cat, I'm a product manager.

We love seeing what people build with Claude, especially with coding, and we want to make Claude better at coding for everyone. We built some tools, one of which we're sharing today.

We're launching Claude Code as a research preview. Claude Code is an agentic coding tool that lets you work with Claude directly in your terminal. We're gonna show you an example of it in action.


I. Setting Up the Demo

So we have a project here. It's a Next.js app. Let's open it up in an instance of Claude Code.

Now that we've done this, Claude Code has access to all of the files in this repository. We don't know much about this codebase. It looks like an app for chatting with a customer support agent.


II. Exploring the Codebase

Let's get Claude to help explain this codebase to us. Claude starts by reading the higher level files and then it dives in deeper. Now it's going through all the components in the project.

Cool, here's its final analysis.


III. Making Changes — Chat History & New Chat Button

So say I was asked to replace this left sidebar with a chat history, and I'm also gonna add a new chat button. I'm gonna ask Claude to help me out here.

We haven't specified any files or paths and Claude's already finding the right files to update by itself. Claude can also show its thinking and we can see how it's decided to tackle this problem.

Claude's asking me if I wanna accept these changes. I'll say, yeah.

Now Claude's updating the nav bar, adding a button and icons as well. Next, it's updating the logic to ensure the saving state works correctly.

After a bit, Claude completes the task. Here's a summary of what it's done.

Let's take a look at the app. So we're seeing the new chat button and new chat history section on the left. Let's check if I can start a new chat while keeping the previous one saved. I'll try out the new chat button too.

Great, it's all working.


IV. Adding Tests

Now let's ask Claude to add some tests to make sure that the features we just added work.

Claude's asking for permission to run commands. We'll say yes.

Claude is making some changes to run these tests. After getting the results, it continues with its plan until all tests pass.

After a few minutes, it looks like we're good to go.


V. Fixing Build Errors

Now I'm going to ask Claude to compile the app and see if we get any build errors. Let's see what it finds.

Claude identified the build errors and is now fixing them. Then it tries to build again. It'll keep going until it works.


VI. Committing and Pushing to GitHub

Now let's finish everything up by asking Claude to commit its changes and push them to GitHub.

Claude creates a summary and a description of our changes. And it'll push the changes to GitHub.

That's it, that's an example of what Claude Code can do. We can't wait for people to start building with it.


Transcript source: Anthropic — Introducing Claude Code. YouTube. Formatted for readability.

2.20.2026

Claude Code And Its Creator Boris Cherny



Introduction

"At Anthropic, the way that we thought about it is we don't build for the model of today. We build for the model six months from now." — "All of Claude Code has just been written and rewritten and rewritten and rewritten over and over and over. There is no part of Claude Code that was around 6 months ago." — "Maybe in a month, no more need for plan mode in a month. Oh my god."

HOST: Welcome to another episode of the Lightcone and today we have an extremely special guest, Boris Cherny, the creator engineer of Claude Code. Boris, thanks for joining us.

BORIS: Thanks for having me.

HOST: Thanks for creating a thing that has taken away my sleep for about 3 weeks straight. I am very addicted to Claude Code and it feels like rocket boosters. Has it felt like this for people — like for you know months at this point?

BORIS: I think it was like end of November is where a lot of my friends said like something changed. I remember for me I felt this way when I first created Claude Code and I didn't yet know if I was on to something. I kind of felt like I was on to something and then that's when I wasn't sleeping. Yeah. And that was just like three straight months. This was September 2024. It was like three straight months. I didn't take a single day vacation. Worked through the weekends. Worked every single night. I was just like, "Oh my god, this is — I think this is going to be a thing. I don't know if it's useful yet because it couldn't actually code yet."






I. The Most Surprising Moment in the Rise of Claude Code

HOST: If you look back on those moments to now, like what would be the most surprising thing about this moment right now?

BORIS: It's unbelievable that we're still using a terminal. That was supposed to be the starting point. I didn't think that would be the ending point. And then the second one is that it's even useful, because at the beginning it didn't really write code. Even in February when we GA'd it wrote maybe like 10% of my code or something like that. I didn't really use it to write code. It wasn't very good at it. I still wrote most of my code by hand. So the surprise is that our bets paid off and it got good at the thing that we thought it was going to get good at, because that wasn't obvious. At Anthropic, the way that we thought about it is we don't build for the model of today. We build for the model six months from now. And that's actually still my advice to founders that are building on LLMs: just try to think about what is that frontier where the model is not very good today, because it's going to get good at it and you just have to wait.


II. How Boris Came Up with the Idea for Claude Code

HOST: Going back though, do you remember when you first got the idea? Can you just talk us through that? Like was it some spark or what was even the first version of it in your mind?

BORIS: You know, it's funny. It was so accidental that it just kind of evolved into this. I think for Anthropic the bet has been coding for a long time, and the bet has been that the path to safe AGI is through coding. This has kind of always been the idea, and the way you get there is you teach the model how to code, then you teach it how to use tools, then you teach it how to use computers. And you can kind of see that because the first team that I joined at Anthropic was called the Anthropic Labs team and it produced three products: Claude Code, MCP, and the desktop app. So you can kind of see how these weave together.

The particular product that we built: no one asked me to build a CLI. We kind of knew maybe it was time to build some kind of coding product because it seemed like the model was ready, but no one had yet really built the product that harnessed this capability. So still there's this insane feeling of product overhang. But at the time it was even crazier because no one had built this yet. And so I started hacking around and I was like, "Okay, we build a coding product. What do I have to do first? I have to understand how to use the API," because I hadn't used the Anthropic API at that point. And so I just built a little terminal app to use the API. That's all that I did. And it was a little chat app, because if you think about the AI applications of the time, and for non-coders even today, what most people are using is just a chat app. So that's what I built. And it was in a terminal. I could ask questions and it gave answers. Then I think tool use came out. I just wanted to try out tool use because I didn't really understand what it was. I was like, tool use is cool. Is this actually useful? Probably not. Let me just try it.

HOST: You built it in terminal just because it was the easiest way to get something up and running.

BORIS: Yes. Because I didn't have to build a UI. It was just me at that point.

HOST: Okay. The IDEs, Cursor and Windsurf, were taking off. Were you under any pressure or getting lots of suggestions of, hey, we should build this out as a plugin or as a fully featured IDE itself?

BORIS: There was no pressure because we didn't even know what we wanted to build. Like the team was just in explore mode. We knew vaguely we wanted to do something in coding, but it wasn't obvious what. No one was high confidence enough. That was my job to figure out. And so I gave the model the bash tool. That was the first tool that I gave it just because I think that was literally the example in our docs. I just took the example. It was in Python. I just ported it to TypeScript because that's how I wrote it. I didn't know what the model could do with bash. So I asked it to read a file. It could cat the file. So that was cool. And then I was like, "Okay, what can you actually do?" And I asked, "What music am I listening to?" It wrote some AppleScript to script my Mac and look up the music in my music player.

HOST: Oh my god.

BORIS: And this was Sonnet 3.5. And I didn't think the model could do that. And that was my first — I think ever — feel-the-AGI moment where I was just like, "Oh my god, the model — it just wants to use tools. That's all it wants."
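
The bash tool Boris describes can be sketched in a few lines. This is a minimal illustration, not Claude Code's actual implementation (which he says was written in TypeScript): the `BASH_TOOL` description follows the JSON-schema shape used for tool definitions, and the names `run_bash` and `handle_tool_call`, the timeout, and the error strings are all assumptions made for this example.

```python
import subprocess

# Illustrative tool description; the wording is hypothetical, not the real one.
BASH_TOOL = {
    "name": "bash",
    "description": "Run a shell command and return its output.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}


def run_bash(command: str, timeout: float = 30.0) -> str:
    """Execute a shell command and return combined stdout and stderr as text."""
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return f"error: command timed out after {timeout} seconds"
    output = result.stdout + result.stderr
    if result.returncode != 0:
        # Surface failures so the model can react to them on its next turn.
        output += f"\n(exit code {result.returncode})"
    return output


def handle_tool_call(name: str, tool_input: dict) -> str:
    """Dispatch a model-requested tool call to the matching local handler."""
    if name == BASH_TOOL["name"]:
        return run_bash(tool_input["command"])
    return f"error: unknown tool {name!r}"
```

In a real loop, the string returned by `handle_tool_call` would be sent back to the model as the tool result. Running arbitrary shell commands is exactly as powerful, and as risky, as it sounds, which is why a permission prompt sits in front of it in practice.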


III. The Elegant Simplicity of Terminals

HOST: That's kind of fascinating. I mean it's very contrarian that Claude Code works so well in such an elegant simple form factor. Terminals have been around for a really long time and that seemed to be like a good design constraint that allowed a lot of interesting developer experiences. It doesn't feel like working. It just feels fun as a developer. I don't think about files or where everything is, and that came by accident almost.

BORIS: Yeah, it was an accident. I remember after the terminal started to take off internally — and honestly after building this thing I think like 2 days after the first prototype I started giving it to my team just for dogfooding because if you come up with an idea and it seems useful the first thing you want to do is give it to people to see how they use it. And then I came in the next day and then Robert who sits across from me who's another engineer he just had Claude Code on his computer and he was using it to code. I was like, "What are you doing? This thing isn't ready. It's just a prototype." But yeah, it was already useful in that form factor.

And I remember when we did our launch review to kind of launch Claude Code externally, this was in December, November, something like that in 2024. Dario asked and he was like, "The usage chart internally like the DAU chart is vertical. Are you forcing engineers to use it? Why are you mandating them?" And I was just like, "No, no, we didn't. I just posted about it and they've just been telling each other about it." Honestly, it was just accidental. We started with the CLI because it was the cheapest thing and it just kind of stayed there for a bit.


IV. The First Use Cases

HOST: So in that 2024 period, how were the engineers using it? Were they sort of shipping code with it yet or were they using it in a different way?

BORIS: The model was not very good at coding yet. I was using it personally for automating git. I think at this point I've probably forgotten most of my git because Claude Code has just been doing it for so long. But yeah, automating bash commands — that was a very early use case and like operating Kubernetes and things like this. People were using it for coding. There were some early signs of this. I think the first use case was actually writing unit tests because it's a little bit lower risk and the model was still pretty bad at it but people were figuring it out and they were figuring out how to use this thing.

And one thing that we saw is people started writing these markdown files for themselves and then having the model read that markdown file. And this is where CLAUDE.md came from. For me, probably the single biggest principle in product is latent demand. Every bit of this product has been built through latent demand after the initial CLI, and CLAUDE.md is an example of that.

There's this other general principle that I think is maybe interesting. You can build for the model, and then you can build scaffolding around the model in order to improve performance a little bit. Depending on the domain, you can improve performance maybe 10 or 20%, something like that, and then essentially the gain is wiped out with the next model. So either you build the scaffolding, get some performance gain, and then rebuild it again, or you just wait for the next model and kind of get it for free. CLAUDE.md and the scaffolding around it are an example of that. And really I think that's why we stayed in the CLI: we felt there was no UI we could build that would still be relevant in 6 months, because the model was improving so quickly.


V. What's in Boris' CLAUDE.md?

HOST: Earlier we were saying like we should compare CLAUDE.md's but you said something very profound which is yours is actually very short which is almost like the opposite of what people might expect. Why is that? What's in your CLAUDE.md?

BORIS: Okay so I checked this before we came. So my CLAUDE.md has two lines. The first line is whenever you put up a PR, enable automerge. So as soon as someone accepts it, it's merged. That's just so I can code and I don't have to kind of go back and forth with code review or whatever. And then the second one is whenever I put up a PR, post it in our internal team stamps channel. Just so someone can stamp it and I can get unblocked. And the idea is every other instruction is in our CLAUDE.md that's checked into the codebase and it's something our entire team contributes to multiple times a week. And very often I'll see someone's PR and they make some mistake that's totally preventable and I'll just literally tag Claude on the PR. I'll just do like "@Claude, add this to the CLAUDE.md," and I'll do this many times a week.

HOST: Do you have to compact the CLAUDE.md? Like I definitely reached a point where I got the message at the top saying your CLAUDE.md is like thousands of tokens now. What do you do when you guys hit that?

BORIS: So our CLAUDE.md is actually pretty short. I think it's like a couple thousand tokens maybe something like that. If you hit this my recommendation would be delete your CLAUDE.md and just start fresh.

HOST: Interesting.

BORIS: I think a lot of people try to over-engineer this and really the capability changes with every model. And so the thing that you want is do the minimal possible thing in order to get the model on track. And so if you delete your CLAUDE.md and then the model is getting off track, it does the wrong thing. That's when you kind of add back a little bit at a time. And what you're probably going to find is with every model, you have to add less and less.

For me, I consider myself a pretty average engineer to be honest. I don't use a lot of fancy tools. I don't use Vim. I use VS Code because it's simpler.

HOST: Wait, really? I would have assumed that because you built this in the terminal that you were sort of like a diehard terminal Vim-only person. You know, screw those VS Code people.

BORIS: Well, we have people like that on the team. There's Adam Wolf for example, he's on the team, he's like "you will never take Vim from my cold dead hands." So there's definitely a lot of people like that on the team, and this is one of the things that I learned early on: every engineer likes to hold their dev tools differently. They like to use different tools. There's just no one tool that works for everyone. But I think also this is one of the things that makes it possible for Claude Code to be so good, because I kind of think about it as: what is the product that I would use, that makes sense to me? And so to use Claude Code you don't have to understand Vim, you don't have to understand tmux, you don't have to know how to SSH, you don't have to know all this stuff. You just have to open up the tool and it'll guide you. It'll do all this stuff.


VI. How Do You Decide the Terminal's Verbosity?

HOST: How do you decide how verbose you want the terminal to be? Like sometimes you have to go Control-O and check it out. Are there internal bikeshed battles around longer versus shorter? I mean every user probably has an opinion. How do you make those sorts of decisions?

HOST: What's your opinion? Is it too verbose right now?

HOST: Oh, I love the verbosity because basically sometimes it just goes off the deep end and I'm watching and then I can just read very quickly and it's like, "Oh, no, no, it's not that." And then I escape and then just stop it and then it just stops an entire bug farm as it's happening. I mean, that's usually when I didn't do plan mode properly.

BORIS: This is something that we probably change pretty often. I remember early on, this is maybe six months ago, I tried to get rid of bash output just internally just to summarize it because I was like these giant long bash commands, I don't actually care. And then I gave it to Anthropic employees for a day and everyone just revolted. "I want to see my bash" because it actually is quite useful for, you know, like for something like git output, maybe it's not useful, but if you're running Kubernetes jobs or something like this, you actually do want to see it.

We recently hid the file reads and file searches. So you'll notice instead of saying, you know, read foo.md it says, you know, read one file, searched one pattern. And this is something I think we could not have shipped six months ago because the model just was not ready. It would have, you know, it still read the wrong thing pretty often. As a user, you still had to be there and kind of catch it and debug it. But nowadays, I just noticed it's on the right track almost every time. And because it's using tools so much, it's actually a lot better just to summarize it.

But then we shipped it. We dogfooded it for like a month and then people on GitHub didn't like it. So there was a big issue where people were like "no, I want to see the details" and that was really great feedback. And so we added a new verbose mode and so that's just in slash config you can enable verbose mode and if you want to see all the file outputs you can continue to do that. And then I posted on the issue and people still didn't like it which is again awesome because my favorite thing in the world is just hearing people's feedback and hearing how they actually want to use it. And so we just iterated more and more and more to get that really good and to make it the thing that people want.
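
The collapsed-versus-verbose display described here can be sketched as a small rendering function. This is only an illustration of the idea under stated assumptions: the event format, the `LABELS` table, and the `render` function are hypothetical and are not taken from Claude Code's source.

```python
from collections import Counter

# Hypothetical event kinds mapped to (verb, noun) for display.
LABELS = {"read": ("read", "file"), "search": ("searched", "pattern")}


def render(events, verbose=False):
    """Render tool activity either line by line or as a one-line summary.

    `events` is a list of (kind, target) pairs, e.g. ("read", "foo.md").
    Only the kinds listed in LABELS are supported in this sketch.
    """
    if verbose:
        # Verbose mode: show every call with its target.
        return "\n".join(f"{LABELS[kind][0]} {target}" for kind, target in events)
    # Collapsed mode: count calls per kind and hide the targets.
    counts = Counter(kind for kind, _ in events)
    parts = []
    for kind, (verb, noun) in LABELS.items():
        n = counts.get(kind, 0)
        if n:
            parts.append(f"{verb} {n} {noun}{'' if n == 1 else 's'}")
    return ", ".join(parts).capitalize()
```

With `events = [("read", "foo.md"), ("search", "TODO")]`, the collapsed form is `Read 1 file, searched 1 pattern`, while `verbose=True` lists each target on its own line. Keeping both paths behind one flag mirrors the outcome in the story: the summary is the default, and the detail stays one setting away.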

HOST: I'm amazed how much I enjoy fixing bugs now. And then all you have to do is have really good logging and then even just say like hey check out this particular object it messed up in this way and it searches the log. It figures everything out. You can make a production tunnel and it'll look at your production DB for you. It's like this is insane. Bug fixing is just going to Sentry, copy markdown. Pretty soon it's just going to be straight MCP. It's like an auto-bug-fixing and test-making sort of... what's the new term they call it, like making a startup factory.

HOST: Oh yeah. Right. There's all these concepts now of rather than having to review the code — I'm old school, so I like the verbosity. I like to say, "Oh, well, you're doing this, but I want you to do that." Right? But there's a totally different school of thought now that says anytime a real human being has to look at code, that's bad.

BORIS: Yeah. Yeah. Which is fascinating. I think Dan Shipper talks about this a lot as kind of whenever you see the model make a mistake try to put it in the CLAUDE.md, try to put it in skills or something like that so it's reusable. But I think there's this meta point that I actually struggle with a lot. And people talk about agents can do this, agents can do that, but actually what agents can do, it changes with every single model. And so sometimes there's a new person that joins the team and they actually use Claude Code more than I would have used it. And I'm just constantly surprised by this.

Like for example, we had a memory leak and we were trying to debug it. And by the way, Jared Sumner has just been on this crusade killing all the memory leaks and it's just been amazing. But before Jared was on the team, I had to do this and there was this memory leak. I was trying to debug it. And so I took a heap dump. I opened it in DevTools. I was looking through the profile. Then I was looking through the code and I was just trying to figure this out. And then another engineer on the team, Chris, he just asked Claude Code. He was like, "Hey, I think there's a memory leak. Can you run this?" And then try to figure it out. And Claude Code took the heap dump. It wrote a little tool for itself to analyze the heap dump. And then it found the leak faster than I did. And this is just something I have to constantly relearn because my brain is still stuck somewhere six months ago at times.


VII. Beginner's Mindset Is Key as the Models Improve

HOST: So what would be some advice for technical founders to really become maximalists at the latest model release? It sounds like people fresh off of school or that don't have any assumptions might be better suited than maybe sometimes engineers who have been working at it for a long time. And how do the experts get better?

BORIS: I think for yourself it's kind of beginner mindset and I don't know maybe just like humility. I feel like engineers as a discipline we've learned to have very strong opinions and senior engineers are kind of rewarded for this. In my old job at a big company when I hired architects and this type of engineer you look for people that have a lot of experience and really strong opinions. But it actually turns out a lot of this stuff just isn't relevant anymore and a lot of these opinions should change because the model is getting better. So I think actually the biggest skill is people that can think scientifically and can just think from first principles.

HOST: How do you screen for that when you try to hire someone now for your team?

BORIS: I sometimes ask about what's an example of when you're wrong. It's a really good one. Some of these classic behavioral questions — not even coding questions — I think are quite useful because you can see if people can recognize their mistake in hindsight, if they can claim credit for the mistake and if they learned something from it. And I think a lot of these very senior people especially — there are some founder types like this but I think founders in particular are actually quite good at it. But other people sometimes will never really take the blame for a mistake. But I don't know, for me personally I'm wrong probably half the time. Like half my ideas are bad and you just have to try stuff and you try a thing, you give it to users, you talk to users, you learn, and then eventually you might end up at a good idea. Sometimes you don't. And this is the skill that I think in the past was very important for founders, but now I think it's very important for every engineer.

HOST: Do you think you would ever hire someone based on the Claude Code transcript of them working with the agent? Because we're actively doing that right now. We just added as a test — you can upload a transcript of you coding a feature with Claude Code or Codex or whatever it is. Personally, I think it's going to work. I mean, you can figure out how someone thinks, like whether they're looking at the logs or not, can they correct the agent if it goes off the rails? Do they use plan mode? When they use plan mode, do they make sure that there are tests? All of these different things — do they think about systems? Do they even understand systems? There's just so much that's embedded in that. I just want a spider web graph, you know, like in those video games like NBA 2K. It's like, oh, this person's really good at shooting or defense. You could imagine a spiderweb graph of someone's Claude Code skill level.

HOST: What would the skills be? What would those be?

HOST: I mean, I think it's like systems, testing, user behavior. There's got to be a design part, product sense, maybe also just automating stuff.

BORIS: My favorite thing in CLAUDE.md for me is I have a thing that says for every plan decide whether it's overengineered, underengineered, or perfectly engineered and why. I think this is something that we're trying to figure out too.


VIII. Hyper Specialists vs Hyper Generalists

BORIS: When I look at engineers on the team that I think are the most effective, there's essentially two — it's very bimodal. There's one side where it's extreme specialists. And so I named Jared before, he's a really good example of this and kind of the Bun team is a really good example. Just hyper specialist. They understand dev tools better than anyone else. They understand JavaScript runtime systems better than anyone else. And then there's the flip side of kind of hyper generalists and that's the rest of the team. And a lot of people they span product and infra or product and design or product and user research, product and business. I really like to see people that just do weird stuff. I think that's one of these things that was kind of a warning sign in the past because it's like can these people actually build something useful?

HOST: That's the litmus test.

BORIS: Yeah, that's what it must be. But nowadays — for example an engineer on the team Daisy, she was on a different team and then she transferred onto our team and the reason that I wanted her to transfer is she put up a PR for Claude Code a couple weeks after she joined or something and the PR was to add a new feature to Claude Code and then instead of just adding the feature what she did is first she put up a PR to give Claude Code a tool so that it can test an arbitrary tool and verify that that works. And then she put up that PR and then she had Claude write its own tool instead of herself implementing it. And I think it's this kind of out of the box thinking that is just so interesting because not a lot of people get it yet.
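
The "tool that tests an arbitrary tool" pattern can be sketched as a tiny harness. The source does not describe how the real one works, so everything here is an assumption for illustration: `verify_tool`, the case format, and the example `word_count` tool are hypothetical.

```python
def verify_tool(tool, cases):
    """Run `tool` on each (kwargs, expected) pair and collect any failures.

    An empty list means every case passed. Exceptions are captured rather
    than raised, so an agent can read the report and try again.
    """
    failures = []
    for tool_input, expected in cases:
        try:
            actual = tool(**tool_input)
        except Exception as exc:
            failures.append({"input": tool_input, "error": repr(exc)})
            continue
        if actual != expected:
            failures.append(
                {"input": tool_input, "expected": expected, "actual": actual}
            )
    return failures


def word_count(text: str) -> int:
    """Example tool under test: count whitespace-separated words."""
    return len(text.split())
```

For example, `verify_tool(word_count, [({"text": "a b c"}, 3)])` returns an empty list. The point of the pattern is the feedback loop: once the model can check its own tool against cases, it can write the tool and iterate without a human implementing it.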

We use the Claude Agent SDK to automate pretty much every part of development. It automates code review, security review. It labels all of our issues. It shepherds things to production. It does pretty much everything for us. But I think externally I'm seeing a lot of people start to figure this out, but it's actually taken a while to figure out how do you use LLMs in this way? How do you use this new kind of automation? So it's kind of a new skill.

HOST: I guess one of the funnier things that I've been having office hours with various founders about is you have sort of the visionary founder who has the idea, they've built this crystal palace of the product that they want to build. They've totally loaded in their brain who the user is and what they feel and what they're motivated by, and then they're sitting in Claude Code and they can do like 50x work. But then they have engineers who work for them who don't have the crystal memory palace of the platonic ideal of the product that the founder has, and they can only do like 5x work. Are you hearing stories like that?

BORIS: There's usually a person who's the core designer of a thing and they're just trying to blast it out of their brain.

HOST: What's the nature of teams like that? It seems like that's almost a stable configuration. You're going to have the visionary who now is unleashed, but maybe going back to the top of it, I'm experiencing this right now. I was like, "Oh, well, I'm only a solo person and I need to eat and sleep and I have a whole job. How am I going to do this?"


IX. The Vision for Claude Teams

HOST: You just launched Claude Teams. What's the vision for Claude Teams?

BORIS: Just collaboration. There's this whole new field of agent topologies that people are exploring. What are the ways that you can configure agents? There's this one sub-idea which is uncorrelated context windows. And the idea is just multiple agents, they have fresh context windows that aren't essentially polluted with each other's context or their own previous context. And if you throw more context at a problem, that's like a form of test-time compute. And so you just get more capability that way. And then if you have the right topology on top of it, so the agents can communicate in the right way, they're laid out in the right way, then they can just build bigger stuff.

And so Teams is kind of one idea. There's a few more that are coming pretty soon. And the idea is just maybe it can build a little bit more. I think the first kind of big example where it worked is our plugins feature was entirely built by a swarm over a weekend. It just ran for a few days. There wasn't really human intervention. And plugins is pretty much in the form that it was when it came out.

HOST: How did you set that up? Did you spec out the outcome that you were hoping for and then let it figure out the details and let it run?

BORIS: Yeah. An engineer on the team just gave Claude a spec and told Claude to use an Asana board and then Claude just put up a bunch of tickets on Asana and then spawned a bunch of agents and the agents started picking up tasks. The main Claude just gave instructions and they all just figured it out — independent agents that didn't have the context of the bigger spec.
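
The shape of that setup, a lead that turns a spec into tickets and workers that each pick up tickets with a fresh context, can be sketched with a shared queue. This is a toy under stated assumptions: there is no model and no Asana here, and `plan_tickets`, `worker`, and `run_swarm` are hypothetical stand-ins for the real pieces.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def plan_tickets(spec: str) -> list:
    """Stand-in for the lead agent: one ticket per non-empty line of the spec."""
    return [line.strip() for line in spec.splitlines() if line.strip()]


def worker(name: str, board: queue.Queue) -> list:
    """Pull tickets until the board is empty and record what was done."""
    done = []
    while True:
        try:
            ticket = board.get_nowait()
        except queue.Empty:
            return done
        # Fresh context per ticket: the worker never sees the full spec.
        context = [ticket]
        done.append({"agent": name, "ticket": ticket, "context": context})


def run_swarm(spec: str, n_agents: int = 3) -> list:
    """Post tickets to a shared board and let n_agents drain it in parallel."""
    board = queue.Queue()
    for ticket in plan_tickets(spec):
        board.put(ticket)
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        futures = [pool.submit(worker, f"agent-{i}", board) for i in range(n_agents)]
        return [item for future in futures for item in future.result()]
```

The board is the only coordination point, which is what keeps the workers' contexts uncorrelated: each one starts from its ticket alone rather than inheriting the lead's history.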


X. Subagents

BORIS: If you think about the way that our agents actually work nowadays — and I haven't pulled the data on this but I would bet the majority of agents are actually prompted by Claude today in the form of sub-agents because a sub-agent is just a recursive Claude Code. That's all it is in the code. And it's just prompted by — we call her mama Claude — and that's all it is. And I think probably if you look at most agents they're launched in this way.

HOST: My Claude Insights just told me to do this more for debugging. So I spend a lot of time on debugging and it would just be better to have multiple sub-agents spin up and debug something in parallel. And so then I just added that to my CLAUDE.md to be like, hey, next time you try and fix a bug, have one agent that looks in the log, one that looks in the code path. That just seems sort of inevitable.

HOST: For weird scary bugs, I try to fix bugs in plan mode and then it seems to use the agents to sort of search everything. Whereas when you're just trying to do it inline, it's like, okay, I'm going to do this one task instead of search wide.

BORIS: This is something I do all the time too. If the task seems kind of hard, this kind of research task, I'll calibrate the number of sub-agents I ask it to use based on the difficulty of the task. So if it's really hard, I'll say use three or maybe five or even 10 sub-agents, research in parallel and then see what they come up with.

HOST: I'm curious. So then why don't you put that in your CLAUDE.md file?

BORIS: It's kind of case by case. CLAUDE.md — what is it? It's just a shortcut. If you find yourself repeating the same thing over and over, you put it in the CLAUDE.md. But otherwise, you don't have to put everything there. You can just prompt Claude.
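
The two ideas in this section, that a sub-agent is the same agent called recursively and that the number of sub-agents scales with difficulty, can be sketched together. The agent here is a stub with no model behind it, and the difficulty labels, `subagent_count`, and `run_agent` are hypothetical names chosen for this example; only the counts 3, 5, and 10 come from the conversation.

```python
from concurrent.futures import ThreadPoolExecutor


def subagent_count(difficulty: str) -> int:
    """Map a rough difficulty label to a fan-out (labels are illustrative)."""
    return {"hard": 3, "harder": 5, "hardest": 10}.get(difficulty, 0)


def run_agent(prompt: str, fan_out: int = 0) -> dict:
    """Stub agent that starts from a fresh context holding only its prompt.

    With fan_out > 0 it acts as the parent: it writes the prompts for its
    sub-agents and runs the same function for each of them in parallel.
    """
    context = [prompt]
    if fan_out == 0:
        return {"prompt": prompt, "context": context, "children": []}
    child_prompts = [
        f"{prompt} (angle {i + 1} of {fan_out})" for i in range(fan_out)
    ]
    with ThreadPoolExecutor(max_workers=fan_out) as pool:
        # Each child is just run_agent again, with the default fan_out of 0.
        children = list(pool.map(run_agent, child_prompts))
    return {"prompt": prompt, "context": context, "children": children}
```

A call such as `run_agent("why is login slow?", subagent_count("harder"))` yields one parent with five children, each holding only the prompt the parent wrote for it.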


XI. A World Without Plan Mode?

HOST: Are you also in the back of your mind thinking that maybe in six months, you won't need to prompt that explicitly? Like the model will just be good enough to figure it out on its own.

BORIS: Maybe in a month.

HOST: No more need for plan mode in a month. Oh my god.

BORIS: I think plan mode probably has a limited lifespan.

HOST: Interesting. That's some alpha for everyone here. What would the world look like without plan mode? Do you just describe it at the prompt level and it would just do it? One-shot it?

BORIS: Yeah, we've started experimenting with this because Claude Code can now enter plan mode by itself. I don't know if you guys have seen that. So, we're trying to get this experience really good. So, it would enter plan mode at the same point where a human would have wanted to enter it. So, I think it's something like this, but actually plan mode — there's no big secret to it. All it does is it adds one sentence to the prompt that's like "please don't code." That's all it is. You can actually just say that.
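
A minimal sketch of that idea, assuming plan mode really is just one extra sentence appended to the user's prompt as described. The function `buildPrompt` and the constant `PLAN_MODE_SENTENCE` are hypothetical, and the real wording and mechanism inside Claude Code may differ.

```typescript
// Hypothetical sketch: the actual sentence used by Claude Code is not given in the interview.
const PLAN_MODE_SENTENCE = "Please don't code.";

// In plan mode, append the extra instruction; otherwise pass the prompt through unchanged.
function buildPrompt(userPrompt: string, planMode: boolean): string {
  return planMode ? `${userPrompt}\n\n${PLAN_MODE_SENTENCE}` : userPrompt;
}
```

This is also why typing the sentence yourself has the same effect: `buildPrompt("Add retry logic", true)` and a hand-written prompt ending in "Please don't code." are the same string.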

HOST: Yeah. So it sounds like a lot of the feature development for Claude Code is very much what we talk about at YC — talk to your users and then you come and implement it. It wasn't the other way that you had this master plan and then implemented all the features.

BORIS: Yeah. Yeah. I mean that's all it was. Plan mode was — we saw users that were like "hey Claude come up with an idea, plan this out but don't write any code yet." And there were various versions of this. Sometimes it was just talking through an idea. Sometimes it was these very sophisticated specs that they were asking Claude to write, but the common dimension was do a thing without coding yet. And so literally this was like Sunday night at 10 p.m. I was just looking at GitHub issues and kind of seeing what people were talking about and looking at our internal Slack feedback channel and I just wrote this thing in like 30 minutes and then shipped it that night. It went out Monday morning. That was plan mode.

HOST: So do you mean that there will be no need for plan mode in the sense of "I'm worried that the model's going to do the wrong thing or head off in the wrong direction" but there will still be a need for that — you need to think through the idea and figure out exactly what it is that you want and you have to do that somewhere.

BORIS: I kind of think about it in terms of increasing model capabilities. So maybe 6 months ago a plan was insufficient. So you get Claude to make a plan, and let's say even with plan mode you still have to kind of sit there and babysit because it can go off track. Nowadays, probably 80% of my sessions I start in plan mode. I say plan mode has a limited lifespan, but I'm a heavy plan mode user. Claude will start making a plan. I'll move on to my second terminal tab and then I'll have it make another plan, and then when I run out of tabs I open the desktop app and then I go to the code tab and then I just start a bunch of tabs there, and they all start in plan mode probably 80% of the time. Once the plan is good, and sometimes it takes a little back and forth, I just get Claude to execute. And nowadays, what I find with Opus 4.6 (I think it started with 4.5) is that it got really good. Once the plan is good, it just stays on track and it'll just do the thing exactly right almost every time.

And so before you had to babysit after the plan and before the plan, now it's just before the plan. So, maybe the next thing is you just won't have to babysit. You can just give a prompt and Claude will figure it out.

HOST: The next step is Claude just speaks to your users directly.

BORIS: Yeah, it just bypasses you entirely. It's funny. This is actually the current stuff for us. Our Claudes actually talk to each other. They talk to our users on Slack, at least internally pretty often. My Claude will tweet once in a while.

HOST: No way.

BORIS: But I actually delete it. It's just a little cheesy. I don't love the tone.

HOST: What does it want to tweet about?

BORIS: Sometimes it'll just respond to someone because I always have Cowork in the background and it's the Cowork that really loves to do that because it likes using a browser.

HOST: That's funny.

BORIS: A really common pattern is I ask Claude to build something. It'll look in the codebase. It'll see some engineer touched something in the git blame and then it'll message that engineer on Slack. Just asking a clarifying question and then once it gets an answer back, it'll keep going.


XII. Tips for Founders to Build for the Future

HOST: What are some tips for founders now on how to build for the future? Sounds like everything is really changing. What are some principles that will stay on and what will change?

BORIS: So I think some of these are pretty basic, but I think they're even more important now than they were before. So one example is latent demand. I mentioned it a thousand times. For me it's just the single biggest idea in product. It's a thing that no one understands. It's a thing I certainly did not understand my first few startups. And the idea is people will only do a thing that they already do. You can't get people to do a new thing. If people are trying to do a thing and you make it easier, that's a good idea. But if people are doing a thing and you try to make them do a different thing, they're not going to do that. And so you just have to make the thing that they're trying to do easier. And I think Claude is going to get increasingly good at figuring out these product ideas for you just because it can look at feedback, it can look at debug logs, it can kind of figure this out.

HOST: That's what you mean by plan mode was latent demand — that people were already like, I don't know, had their Claude chat window open in a browser and were talking to it to figure out the spec and what it should do. And now plan mode just became that — you just do it in Claude Code.

BORIS: Yeah. Yeah, that's it. Sometimes what I'll do is I'll just walk around the office on our floor and I'll just kind of stand behind people — I'll say hi so it's not creepy — and then I'll just see how they're using Claude Code. And this is also just something I saw a lot but it also came up in GitHub issues, like people were talking about it.


XIII. How Much Life Does the Terminal Still Have?

HOST: It seems like you're surprised how far the terminal has gone and how far it's been pushed. How far do you think it has left to go just given this world of swarm, multiple agents? Do you think there's going to be a need for a different UI on top of it?

BORIS: It's funny. If you asked me this a year ago I would have said the terminal has like a three-month lifespan and then we're going to move on to the next thing. And you can see us experimenting with this right because Claude Code started in a terminal but now it's on web, Claude Code is in the desktop app — we've had that for three months or six months or something just in the code tab. It's in the iOS and Android apps just in the code tab. It's in Slack. It's in GitHub. There's VS Code extensions. There's JetBrains extensions. So we're always experimenting with different form factors for this thing to figure out what's the next thing. I've been wrong so far about the lifespan of the CLI. So, I'm probably not the person to forecast that.


XIV. Advice for Dev Tool Founders

HOST: What about your advice to dev tool founders? Someone's building a dev tool company today. Should they be building for engineers and humans or should they be thinking more about what Claude is going to think and want and build for the agent?

BORIS: The way I would frame it is think about the thing that the model wants to do and figure out how do you make that easier. And that's something that we saw — when I first started hacking on Claude Code, I realized this thing just wants to use tools. It just wants to interact with the world. And how do you enable that? Well, the way you don't do it is you put it in a box and you're like, here's the API, here's how you interact with me, and here's how you interact with the world. The way you do it is you see what tools it wants to use. You see what it's trying to do, and you enable that the same way that you do for your users. And so if you're building a dev tool startup, I would think about what is the problem you want to solve for the user? And then when you apply the model to solving this problem, what is the thing the model wants to do? And then what is the technical and product solution that serves the latent demand of both?


XV. Claude Code and TypeScript Parallels

HOST: Back in the day, more than 10 years ago, you were a very heavy user and you wrote a book about TypeScript, right? Before TypeScript was cool. This is when everyone was deep in JavaScript. This is back in the early 2010s, right?

BORIS: Yeah, something like that.

HOST: Before TypeScript was a thing. Back then it was a very weird language. JavaScript wasn't supposed to be typed, and now it's the right thing. And it feels like Claude Code in the terminal has a lot of parallels with TypeScript at the beginning.

BORIS: TypeScript makes a lot of really weird language decisions. So if you look at the type system, pretty much anything can be a literal type, for example, and this is super weird because even Haskell doesn't do this. It's just too extreme. Or it has conditional types, which I don't think any language thought of at all.

HOST: It was very strongly typed.

BORIS: Yeah, it was very strongly typed and the idea was — when Joe Pamer and Anders and the early team was building this thing, the way they built it is okay, we have these teams with these big untyped JavaScript code bases. We have to get types in there, but we're not going to get engineers to change the way that they code. You're not going to get JavaScript people to have 15 layers of class inheritance like you would a Java programmer, right? They're going to write code the way they're going to write it. They're going to use reflection and they're going to use mutation and they're going to use all these features that traditionally are very very difficult to type.

HOST: They're very unsafe to type to any strong functional programmer.

BORIS: That's right. And so the thing that they did instead of getting people to change the way that they code, they built a type system around this. And it was just brilliant because there's all these ideas that no one was thinking about even in academia. No one thought of a bunch of these ideas. It purely came out of the practice of observing people and seeing how JavaScript programmers want to write code.
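
For readers who have not met the two features named in this exchange, here is a small illustrative sketch of a literal type and a conditional type, applied to the kind of dynamic JavaScript idiom the type system was built around. The names `Direction`, `ElementOf`, and `firstOf` are made up for this example.

```typescript
// Literal types: specific values can themselves be types.
type Direction = "north" | "south";

// Conditional types: pick a type based on a type-level test.
// If T is an array, yield its element type; otherwise yield T itself.
type ElementOf<T> = T extends readonly (infer U)[] ? U : T;

// A dynamic JavaScript idiom: accept either a single value or a list of values.
function firstOf<T>(input: T): ElementOf<T> {
  // The cast is needed because the compiler cannot resolve ElementOf<T> for an unknown T.
  return (Array.isArray(input) ? input[0] : input) as ElementOf<T>;
}

const firstNumber: number = firstOf([1, 2, 3]); // ElementOf<number[]> is number
const wholeString: string = firstOf("hello");   // ElementOf<string> is string
const heading: Direction = "north";             // assigning "west" would not compile
```

The function body is ordinary runtime-checked JavaScript; the types describe what it already does rather than forcing it into a class hierarchy.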

And so for Claude Code there are some ideas that are kind of similar in that you can use it like a Unix utility. You can pipe into it. You can pipe out of it. In some ways it is kind of rigorous in this way but in almost every other way it's just the tool that we wanted. I build a tool for myself and then the team builds the tool for themselves and then for Anthropic employees and then for users and it just ends up being really useful. It's not this principled and academic thing which I think the proof is actually in the results.

HOST: Now fast forward more than 15 years later not many codebases are in Haskell which is more academic and there's tons of them now on TypeScript because it's way more practical, right? Which is interesting.

BORIS: Yeah, it is interesting. It's like TypeScript solves a problem.


XVI. Designing for the Terminal Was Hard

HOST: I guess one thing that's cool, and I don't know how many people know this, is that Claude Code is actually one of the most beautiful terminal apps out there and is actually written with React for the terminal.

BORIS: When I first started building it, I did front-end engineering for a while. And I was also sort of a hybrid — I do design and user research and write code and all this stuff. And we love hiring engineers that are like this. We just love generalists. So for me it's like okay, I'm building a thing for the terminal. I'm actually kind of a shitty Vim user. So how do I build a thing for people like me that are going to be working in a terminal. And I think just the delight is so important. And I feel like at YC this is something you talk about a lot, right? It's build a thing that people love. If the product is useful but you don't fall in love with it, that's not great. So it kind of has to do both.

Designing for the terminal honestly has been hard, right? It's like 80 by 100 characters or whatever. You have 256 colors, you have one font size, you don't have mouse interactions, there's all this stuff you can't do, and there's all these very hard trade-offs. So, a little-known thing, for example, is you can actually enable mouse interactions in a terminal. So, you can enable clicking and stuff.

HOST: Oh, how do you do that in Claude Code? I've been trying to figure out how to do this.

BORIS: We don't have it in Claude Code because we actually prototyped it a few times and it felt really bad because the trade-off is you have to virtualize scrolling and there's all these weird trade-offs because the way terminals work is there's no DOM, right? It's like there's ANSI escape codes and these kind of weird organically evolved specs since the 1960s or whatever.
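
To make the "no DOM, just escape codes" point concrete, here is a sketch of what terminal styling and mouse reporting look like at the byte level. It uses standard xterm control sequences (256-color foreground, and private modes 1000 and 1006 for mouse reporting) and is not taken from Claude Code, which per the interview does not ship mouse support. The helper names `color256`, `parseSgrMouse`, and `TerminalMouseEvent` are hypothetical.

```typescript
const ESC = "\x1b";

// Wrap text in a 256-color foreground sequence, then reset all attributes.
function color256(colorIndex: number, text: string): string {
  if (!Number.isInteger(colorIndex) || colorIndex < 0 || colorIndex > 255) {
    throw new RangeError("colorIndex must be an integer from 0 to 255");
  }
  return `${ESC}[38;5;${colorIndex}m${text}${ESC}[0m`;
}

// xterm private modes: 1000 reports button press/release, 1006 selects SGR encoding.
// Writing ENABLE_MOUSE to stdout asks the terminal to start sending mouse reports.
const ENABLE_MOUSE = `${ESC}[?1000h${ESC}[?1006h`;
const DISABLE_MOUSE = `${ESC}[?1006l${ESC}[?1000l`;

interface TerminalMouseEvent {
  buttonCode: number; // raw code from the terminal; modifier bits are not decoded here
  column: number;     // 1-based
  row: number;        // 1-based
  pressed: boolean;   // true for press ("M"), false for release ("m")
}

// Parse one SGR mouse report of the form ESC [ < code ; column ; row (M|m).
function parseSgrMouse(input: string): TerminalMouseEvent | null {
  const match = /^\x1b\[<(\d+);(\d+);(\d+)([Mm])$/.exec(input);
  if (match === null) return null;
  return {
    buttonCode: Number(match[1]),
    column: Number(match[2]),
    row: Number(match[3]),
    pressed: match[4] === "M",
  };
}
```

Everything arrives as text on stdin, so an app that enables mouse reporting typically also has to take over scrolling and selection itself, which is the trade-off described above.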

HOST: Yeah. It feels like BBSes. It's like a BBS door game.

BORIS: Oh my god. That's like a great compliment. It should feel like you're discovering Legend of the Red Dragon. It's fantastic.

But we've had to discover all these UX principles for building the terminal because no one really writes about this stuff. And if you look at the big terminal apps of the 80s or 90s or 2000s or whatever, they use ncurses and they have all these windows and things like this. And it just looks kind of janky by modern standards. It just looks too heavy and complicated. And so we had to reinvent a lot. And for example, something like the terminal spinner, the spinner words, it's gone through probably I want to say like 50 maybe 100 iterations at this point. And probably 80% of those didn't ship. So we tried it, it didn't feel good, move on to the next one. Try it, didn't feel good, move on to the next one.

And this was sort of one of the amazing things about Claude Code — you can write these prototypes and you can do like 20 prototypes back to back, see which one you like, and then ship that and the whole thing takes maybe a couple hours. Whereas in the past, what you would have had to do is use Origami or Framer or something like this. You built maybe three prototypes, it took two weeks. It just took much longer. And so we have this luxury of we have to discover this new thing. We have to build a thing. We don't know what the right endpoint is, but we can iterate there so quickly and that's what makes it really easy and that's what lets us build a product that's joyous and that people like to use.


XVII. Other Advice for Builders

HOST: Boris, you had other advice for builders and we kept interrupting you because we have so many questions. I would say —

BORIS: So okay, maybe two pieces of advice that are kind of weird because it's about building for the model. So one is don't build for the model of today, build for the model of 6 months from now. This is sort of weird, right? Because you can't find PMF if the product doesn't work. But actually this is the thing that you should do because otherwise what will happen is you spend a bunch of work, you find PMF for the product right now and then you're just going to get leapfrogged by someone else because they're building for the next model and a new model comes out every few months. Use the model, feel out the boundary of what it can do and then build for the model that you think will be the model maybe 6 months from now.

I think the second thing is this: in the Claude Code area where we sit, we have a framed copy of The Bitter Lesson on the wall. This is by Rich Sutton, and everyone should read it if you haven't. The idea is the more general model will always beat the more specific model, and there's a lot of corollaries to this, but essentially what it boils down to is never bet against the model. And so this is just a thing that we always think about, where we could build a feature into Claude Code. We could make it better as a product, and we call this scaffolding. That's all the code that's not the model itself. But we could also just wait a couple months and the model can probably just do the thing instead.

And there's always this trade-off. It's like engineering work now and you can extend the capability a little bit, maybe 10, 20% or whatever in whatever domain on the spider chart of what you're trying to extend. Or you can just wait and the next model will do it. So just always think in terms of this trade-off — where do you actually want to invest and assume that whatever the scaffolding is, it's just temporary.

HOST: How often do you rewrite the code base of Claude Code? Is it every six months? Is there scaffolding that you've deleted because you don't need it anymore because the model just improved?

BORIS: Oh so much. Yeah. All of Claude Code has just been written and rewritten and rewritten and rewritten over and over and over. We unship tools every couple weeks. We add new tools every couple weeks. There is no part of Claude Code that was around six months ago. It's just constantly rewritten.

HOST: Would you say that most of the current Claude Code code base, say 80% of it, is less than a couple months old?

BORIS: Yeah, definitely. It might even be less. Maybe a couple months. That feels about right.

HOST: So the life cycle of code now. That's another alpha — expecting the shelf life to be just a couple months.

BORIS: Yeah. For the best founders.


XVIII. Productivity per Engineer

HOST: Did you see Steve Yegge's post about how awesome working at Anthropic is? I think there's a line in there that says that an Anthropic engineer currently averages 1,000x more productivity than a Google engineer at Google's peak, which is really an insane number honestly. Like 1,000x. Three years ago we were still talking about 10x engineers. Now we're talking about 1,000x on top of a Google engineer in their prime. This is unbelievable honestly.

BORIS: Yeah, I mean internally if you look at technical employees, they all use Claude Code every day. And even non-technical employees — I think like half the sales team uses Claude Code. They've started switching to Cowork because it's a little easier to use. It has a VM, so it's a little bit safer. But yeah, we actually just pulled a stat and I think the team doubled in size last year, but productivity per engineer grew something like 70%. It's measured by just the simplest stupidest measure — pull requests. But we also cross-check that against commits and the lifetime of commits and things like this. And since Claude Code came out, productivity per engineer at Anthropic has grown 150%.

And this is crazy because in my old life I was responsible for code quality at Meta. I was responsible for the quality of all of our code bases across every product — Facebook, Instagram, WhatsApp, whatever. And one of the things that the team worked on was improving productivity. And back then seeing a gain of something like 2% in productivity — that was a year of work by hundreds of people. And so this 100%+ — this is just unheard of. Completely unheard of.


XIX. Why Boris Chose to Join Anthropic

HOST: What drove you to come over to Anthropic? I mean basically as a builder you could go anywhere. What was the moment that made you say this is the set of people or this is the approach?

BORIS: I was living in rural Japan and I was opening up Hacker News every morning and I was reading the news and it was all — it just started to be AI stuff at some point and I started to use some of these early products and I remember the first couple times that I used it I was just — it just took my breath away. That's very cheesy to say, but that was actually the feeling. It was just amazing. As a builder, I've just never kind of felt this feeling using these very early products. That was in the Claude 2 days or something like that.

And so I just started talking to friends at labs just to kind of see what was going on. And I met Ben Mann who's one of the founders at Anthropic and he just immediately won me over. And as soon as I met the rest of the team at Anthropic it just won me over and I think probably in two ways. So one is it operates as a research lab. The product was teeny teeny tiny. It's really all about building a safe model. That's all that matters. And so this idea of just being very close to the model and being very close to development and being not the most important thing because the product isn't anymore. It's just the model is the thing that's the most important. That really resonated with me after building product for many years.

And then the second thing was just how mission-driven it is. I'm a huge sci-fi reader. My bookshelf is just filled with sci-fi. And so I just know how bad this can go. And when I kind of think about what's going to happen this year, it's going to be totally insane. And in the worst case it can go very very bad. And so I just wanted to be at a place that really understood that and kind of really internalized that. And at Anthropic, if you overhear conversations in the lunchroom or in the hallway, people are talking about AI safety. This is really the thing that everyone cares about more than anything. And so I just wanted to be in a place like that. I know for me personally the mission is just so important.


XX. How Coding Will Change

HOST: What is going to happen this year?

BORIS: Okay. So if you think back like six months ago and kind of what are the predictions that people are making? So Dario predicted that 90% of the code at Anthropic would be written by Claude. This is true. For me personally it's been 100% since Opus 4.5. I uninstalled my IDE. I don't edit a single line of code by hand. It's just 100% Claude Code and Opus. And I land like 20 PRs a day every day. If you look at Anthropic overall it ranges between 70 to 90% depending on the team. For a lot of teams it's also 100%. For a lot of people it's 100%.

And I remember making this prediction back in May when we GA'd Claude Code that you wouldn't need an IDE to code anymore. And it was totally crazy to say. I feel like people in the audience gasped because it was such a silly prediction at the time. But really all it is is you just trace the exponential and this is just so deep in the DNA at Anthropic because three of our founders were co-authors of the scaling laws paper. They kind of saw this very early and so this is just tracing the exponential — this is what's going to happen and yes that happened.

So continuing to trace the exponential I think what will happen is coding will be generally solved for everyone. And I think today coding is practically solved for me and I think it'll be the case for everyone. Regardless of domain, I think we're going to start to see the title software engineer go away. And I think it's just going to be maybe builder, maybe product manager, maybe we'll keep the title as kind of a vestigial thing, but the work that people do, it's not just going to be coding. Software engineers are also going to be writing specs. They're going to be talking to users. This thing that we're starting to see right now in our team where engineers are very much generalists and every single function on our team codes — our PMs code, our designers code, our EM codes, our finance guy codes, everyone on our team codes. We're going to start to see this everywhere.

So this is sort of the lower bound if we just continue the trend. The upper bound I think is a lot scarier. And this is something like we hit ASL-4. At Anthropic, we talk about these safety levels. ASL-3 is where the models are right now. ASL-4 is the model is recursively self-improving. And so if this happens, essentially, we have to meet a bunch of criteria before we can release a model. And so the extreme is that this happens or there's some kind of catastrophic misuse like people are using the model to design bioviruses, design zero-days, stuff like this. And this is something that we're really really actively working on so that doesn't happen.

I think it's just been honestly so exciting and humbling seeing how people are using Claude Code. I just wanted to build a cool thing and it ended up being really useful and that was so surprising and so exciting.


XXI. Outro

HOST: My impression from Twitter or just the outside is basically everyone went away over the holidays and then found out about Claude Code and it's just been crazy ever since. Is that how it was for you internally? Were you having a nice Christmas break and then came back and like what happened?

BORIS: Well, actually for all of December, I was traveling around. And I took a coding vacation. So, we were kind of traveling around and I was just coding every day. So, that was really nice. And then I also started to use Twitter at the time because I worked on Threads back then, way back when. So, I've been a Threads user for a while. So, I just tried to see other platforms where people are. Yeah. I think for a lot of people they kind of discovered — that was the moment where they discovered Opus 4.5. I kind of already knew. And internally Claude Code's just been on this exponential tear for many many months now. So that just became even more steep. That's what we saw.

And if you look at Claude Code now, there was some stat from Mercury that 70% of startups are choosing Claude as their model of choice. There was some other stat from SemiAnalysis that 4% of all public commits, of all code written everywhere, are made by Claude Code. All the companies use Claude Code, from the biggest companies to the smallest startups. It plotted the course for Perseverance, the Mars rover. This is just the coolest thing for me. And we even printed posters because the team was like, "Wow, this is just so cool that NASA chooses to use this thing." So, yeah, it's just humbling. But it also feels like the very beginning.

HOST: What's the interaction between Claude Code and then Cowork? Was it a fork of Claude Code? Was it like you had Claude Code look at the Claude Code and say let's make a new spec for non-technical people that keeps all the lessons and then it went off for a couple days and did that? What's the genesis of that and where do you think that goes?

BORIS: This is going to be like my fifth time using the word latent demand. It was just that. We were looking at Twitter and there was that one guy that was using Claude Code to monitor his tomato plants. There was this other person that was using it to recover wedding photos off of a corrupted hard drive. There were people using it for finance. When we looked internally at Anthropic, every designer is using it. The entire finance team at this point is using it. The entire data science team is using it, and not for coding. People are jumping through hoops to install a thing in the terminal so that they can use this. So we knew for a while that we wanted to build something, and so we were experimenting with a bunch of different ideas, and the thing that took off was just a little Claude Code wrapper in a GUI in the desktop app, and that's all it is. It's just Claude Code under the hood. It's the same agent.

HOST: Oh wow.

BORIS: And Felix and the team — Felix was an early Electron contributor. He kind of knows that stack really well and he was hacking on various ideas and they built it in I think something like 10 days. It was just 100% written by Claude Code. And it just felt ready to release. There was a lot of stuff that we had to build for non-technical users. So it's a little bit different than a technical audience. It runs in a — all the code runs in a virtual machine. There's a lot of deletion protections and things like this. There's a lot of permission prompting and other guardrails for users. Yeah, it was honestly pretty obvious.

HOST: Boris, thank you so much for making something that is taking away all my sleep, but in return, it's making me feel creator mode again, sort of founder mode again. It's been an exhilarating 3 weeks. I can't believe I waited that long since November to actually get into it. Thank you so much for being with us. Thank you for building what you're building.

BORIS: Yeah, thanks for having me. And send bugs.

HOST: Sounds good.


Transcript source: "The Lightcone" (Y Combinator) — Inside Claude Code With Its Creator Boris Cherny. YouTube. Formatted for readability.